"Personal thoughts, ramblings, and nonsense from Drew, himself."
This post was written on December 11, 2007 at or around 7:29 pm by Drew. This post is composed of 1,589 words from the English language and currently has 32 comments to its name. Additionally, this article is tagged under Apache, CentOS, htaccess, Sys Admin, The Notebook, UNIX/Linux, Workaround. This post was last updated on Mar 29, 2008. Enough talk, carry on.
You ever hear of Splunk? Splunk enables you to search and navigate all your logs and IT data in real time: logs, configurations, messages, traps and alerts, scripts and metrics. It’s an awesome tool that makes it easier to monitor and watch your log files. Unfortunately, Splunk is expensive. How expensive? Try $5000 a year for the cheapest license. Here’s the main problem: the free version of Splunk does not come with any user authentication, not even admin authentication. This means that anyone can access the admin area of your Splunk install, see any of your log files, and even set up new Splunks (log file watches). Let’s fix this!
I would have thought that at least admin user authentication would be a standard feature of Splunk, but you can only get that with the Professional version of Splunk. You get 30 days of the Professional version, and after that you must purchase the license. Most individuals who just want to manage their log files remotely via the web cannot afford (and should not need to purchase) a Professional license, so the Free version is otherwise perfect. The lack of authentication kind of makes you turn your nose up at Splunk, as this poses a security issue. Note one thing: when I say authentication, I mean a username and password. You can literally access all admin features, including license information, just by going to the web address (which is usually a domain name on the default port 8000, e.g. http://domain.com:8000). This is totally ridiculous. We can get around this by running a proxy within Apache and securing the subdomain (http://splunk.example.com/) with a .htaccess file.
The environment I’m running is Apache 2.x on a CentOS server. You must have root access to the server, as you will need to install Splunk and then make changes to the Apache configuration. I also presume that you already have a domain name and want to create a subdomain called splunk (splunk.domain.tld) that has some sort of user authentication.
Installing Splunk on a system using the RPM is very easy; almost too easy. First, download the current version of Splunk (3.1.3 at the time of writing). You can compile it from the source if you would like, but this article covers installing Splunk from the RPM. After you select the download you want (RPM), you will be redirected to a download page that gives you the wget URL for downloading Splunk; select and copy the full URL it gives you. The link that I provide may be old, depending on when you read this post. Now, in your BASH prompt:
BASH
[root@server ~]# wget 'http://www.splunk.com/index.php/download_track?file=3.1.3/linux/splunk-3.1.3-28524-linux-2.6-x86_64.rpm&ac=&wget=true&name=wget&type=releases'
This will download Splunk into the current directory you are in. When the download has completed, you can start the install. The RPM install is the easiest, you just need to run one command:
BASH
[root@server ~]# rpm -i --force --prefix=/opt/splunk3.0/splunk splunk-path-to-rpm.rpm
You should see something close to the following:
BASH Output
----------------------------------------------------------------------
The Splunk Server has been installed in:
/opt/splunk3.0/splunk/splunk
To start the Splunk Server, run the command:
/opt/splunk3.0/splunk/splunk/bin/splunk start
To use Splunk's web interface, point your browser at:
http://server:8000
Complete documentation is at http://www.splunk.com/r/docs
----------------------------------------------------------------------
When you tell Splunk to start, it creates some files and directories and then checks whether SELinux is enforcing. If you have SELinux enabled, Splunk will not run correctly, and you will need to either disable SELinux or configure SELinux to allow Splunk to run (not covered in this article). Temporarily stopping SELinux is not enough, because Splunk reads the SELinux configuration file and checks whether it is set to enforcing. If it is, you need to change it in the SELinux configuration file, which is located at /etc/sysconfig/selinux: edit the file, change SELINUX=enforcing to SELINUX=disabled, and save it. Changing the configuration file only tells SELinux to disable itself at boot-up, so you also need to stop it from enforcing right away by running setenforce 0, which switches SELinux to permissive mode until the next reboot. Alternatively, you can reboot the system and it will pick up the new SELinux settings.
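If you do this on more than one box, the configuration edit can be scripted. This is a minimal sketch, assuming GNU sed; `disable_selinux_config` is a hypothetical name, and it takes the file path as an argument so you can try it on a copy first. It uses `--follow-symlinks` because /etc/sysconfig/selinux is usually a symlink to /etc/selinux/config on CentOS, and a plain `sed -i` would replace the link with a regular file.

```shell
# Hypothetical helper: change the SELINUX= line in the given config file
# to SELINUX=disabled. Defaults to the CentOS path used in this article.
disable_selinux_config() {
    local conf="${1:-/etc/sysconfig/selinux}"
    if [ ! -f "$conf" ]; then
        echo "no such file: $conf" >&2
        return 1
    fi
    # GNU sed: --follow-symlinks edits the file the symlink points to instead
    # of replacing the symlink. The pattern only matches "SELINUX=", so the
    # SELINUXTYPE= line is left alone.
    sed -i --follow-symlinks 's/^SELINUX=.*/SELINUX=disabled/' "$conf"
}

# Usage, as root:
#   disable_selinux_config   # takes effect at the next boot
#   setenforce 0             # permissive mode right now, until reboot
```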
As the documentation states, start the Splunk server:
BASH
[root@server ~]# /opt/splunk3.0/splunk/splunk/bin/splunk start
You will need to scroll down to the bottom of the license agreement and accept it to continue. It will run its init script and should start with no issues. After it starts, it will let you know that Splunk is running on port 8000 on the host name of the server; you can substitute the host name with the IP address of the server. In this case, the host name of the server is server, so we can access Splunk using http://server:8000. More than likely, you will actually have a domain name on a remote network/server, so you will access it by way of http://example.com:8000.
Remember, if this is a new server, you might not have Apache started, and your firewall might cause issues when you try to access Splunk on your server. Make sure Apache is started by running /etc/init.d/httpd status. If it is not running, it will say httpd is stopped; start it by running /etc/init.d/httpd start. It should start with no issues. Now, try connecting to your server by opening a web browser at http://server:8000 (or whatever your host name is; in this case we are using server). This should display your Splunk startup screen, which means that Splunk has been installed successfully and is ready to be used! Congrats.
As stated before, Splunk doesn’t offer any user authentication by default, so we have to configure Apache to protect Splunk so that no one else can view your log files, which can contain some very valuable information. Let’s fix this drawback of the free version of Splunk by having Apache require a user login with basic authentication, the same mechanism .htaccess files use. To get this working, we have to configure Apache as a reverse proxy for the IP address and the server name.
Before we continue, you need to make sure that you have at least the mod_proxy, mod_proxy_http, and mod_proxy_connect Apache modules installed. Normally, these are installed and loaded by default, so you shouldn’t have to worry about this. To verify this, just type in httpd -M and make sure those modules are loaded.
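If you would rather not eyeball the `httpd -M` list, a small filter can do the check. This is a minimal sketch; `check_proxy_modules` is a hypothetical name, and it only looks for the three module names listed above in whatever text it receives on stdin.

```shell
# Hypothetical helper: read the module list printed by `httpd -M` on stdin
# and report which of the proxy modules this article needs are not loaded.
check_proxy_modules() {
    local loaded mod missing=0
    loaded=$(cat)
    for mod in proxy_module proxy_http_module proxy_connect_module; do
        # httpd -M lists modules one per line, e.g. " proxy_module (shared)"
        if ! printf '%s\n' "$loaded" | grep -qw "$mod"; then
            echo "missing: $mod"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, as root:
#   httpd -M 2>&1 | check_proxy_modules && echo "proxy modules loaded"
```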
Now, it’s time to actually set up the proxy. The goal is to have Apache accept requests for the server name (such as the splunk.example.com subdomain) on your IP address, require a login, and then forward the authenticated requests to Splunk on localhost, on the port Splunk is running on.
Editing /etc/httpd/conf/httpd.conf
# Name-based virtual host for the Splunk subdomain
<VirtualHost x.x.x.x:80>
    ServerAdmin root@localhost
    ServerName splunk.example.com
    # Act only as a reverse proxy, never as an open forward proxy
    ProxyRequests Off
    ProxyPass / http://127.0.0.1:8000/
    ProxyPassReverse / http://127.0.0.1:8000/
    ErrorLog logs/splunk.example.com-error_log
    CustomLog logs/splunk.example.com-access_log common
</VirtualHost>

# Require a valid login before anything is forwarded to Splunk
<Proxy http://127.0.0.1:8000/*>
    Order deny,allow
    Allow from all
    AuthName "splunk.example.com"
    AuthType Basic
    AuthUserFile /var/www/.htpasswd.users
    Require valid-user
</Proxy>
Where x.x.x.x is your public IP address.
Of course, you will need to configure this for your environment: change x.x.x.x to your public IP address and example.com to your own domain. If you would like to, you can also change the splunk subdomain to something else; just make sure you create and update your DNS information as needed. If you are going to use a splunk.example.com subdomain, make sure it is configured in your DNS before you do this (and allow time for it to propagate). Also, make sure that you restart Apache, or the new changes will not take effect:
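If you set this up on several servers, a small generator saves retyping. This is a minimal sketch; `splunk_vhost` is a hypothetical name, and it prints a VirtualHost/Proxy configuration along the lines of the one above to stdout, with your IP, host name, Splunk port and password file filled in, so you can review it before adding it to httpd.conf. It assumes the Apache 2.x `Order`/`Allow` access syntax used in this article.

```shell
# Hypothetical generator: print a reverse-proxy vhost for Splunk protected
# by basic authentication. Review the output before adding it to httpd.conf.
splunk_vhost() {
    local ip="$1" host="$2" port="${3:-8000}" users="${4:-/var/www/.htpasswd.users}"
    if [ -z "$ip" ] || [ -z "$host" ]; then
        echo "usage: splunk_vhost PUBLIC_IP HOSTNAME [PORT] [HTPASSWD_FILE]" >&2
        return 1
    fi
    cat <<EOF
<VirtualHost ${ip}:80>
    ServerName ${host}
    ProxyRequests Off
    ProxyPass / http://127.0.0.1:${port}/
    ProxyPassReverse / http://127.0.0.1:${port}/
    ErrorLog logs/${host}-error_log
    CustomLog logs/${host}-access_log common
</VirtualHost>

<Proxy http://127.0.0.1:${port}/*>
    Order deny,allow
    Allow from all
    AuthName "${host}"
    AuthType Basic
    AuthUserFile ${users}
    Require valid-user
</Proxy>
EOF
}

# Usage (203.0.113.10 is a documentation address; use your own):
#   splunk_vhost 203.0.113.10 splunk.example.com > /tmp/splunk-vhost.conf
```

Writing to a scratch file first, rather than straight into httpd.conf, keeps a typo in the arguments from breaking the running configuration.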
Restarting Apache
/etc/init.d/httpd restart
With the above configuration, you told Apache to use the .htpasswd.users file in the /var/www directory. You can follow my other article on how to configure .htaccess. If you plan on storing your .htaccess/.htpasswd files somewhere else, you will need to update the AuthUserFile line in your httpd.conf file to reflect the absolute location.
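Apache’s own `htpasswd` utility is the usual way to create this file: `htpasswd -c /var/www/.htpasswd.users admin` creates it and prompts for a password (leave out `-c` when adding more users, since `-c` overwrites an existing file). If `htpasswd` is not installed, the sketch below is one alternative, assuming the `openssl` command-line tool is available; `openssl passwd -apr1` produces the Apache-specific MD5 (`$apr1$`) hashes that Apache 2.x accepts. `add_htpasswd_user` is a hypothetical name, and it does not check for duplicate users.

```shell
# Hypothetical helper: append USER with an Apache MD5 ($apr1$) hash to an
# htpasswd file. The password is read from stdin so it does not appear in
# the process list.
add_htpasswd_user() {
    local file="$1" user="$2" hash
    if [ -z "$file" ] || [ -z "$user" ]; then
        echo "usage: add_htpasswd_user FILE USER < password" >&2
        return 1
    fi
    hash=$(openssl passwd -apr1 -stdin) || return 1
    printf '%s:%s\n' "$user" "$hash" >> "$file"
}

# Usage, as root:
#   printf '%s\n' 'your-password' | add_htpasswd_user /var/www/.htpasswd.users admin
#   chgrp apache /var/www/.htpasswd.users && chmod 640 /var/www/.htpasswd.users
```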
Personally, I think the free version of Splunk should at least provide an admin user login, but that just isn’t something they are offering. Splunk is very powerful and extremely helpful for seeing all your log files from one view. I don’t have a lot of data written to my log files; however, the data that does get generated really helps to solve some issues. I guarantee that using Splunk will help you out greatly, especially if you have a lot of custom logs that you are trying to manage manually.
How dare ye not let me know of new posts? :P
heh… I’m going to try this after I reinstall Ubuntu (the network driver reinstall didn’t work too well. :P )
Thought you had my RSS feed? Anyways, I’ve been quite busy with work. Haven’t talked with you online in a while. Hope all is well.
Well, aside from the fact that I killed Ubuntu after a week of using it. :P Not much. Got a lot of projects I’m starting in the next few weeks, so I’ll be busy too.
I am very interested in using this method. Free Splunk should come with basic one-user authentication. I bet some admin is going to get hacked because of the information in Splunk, and it will look bad on Splunk.
What webserver does Splunk use by default? Something embedded? I installed 3.x on my Ubuntu 7.10 server, and the web-interface started working without installing apache.
Will your setup still work?
Thanks,
Tristan
Agreed. Splunk uses its own AppServer. Have a look at two files:
/opt/splunk/etc/bundles/default/web.conf
/opt/splunk/etc/bundles/default/server.conf
Splunk itself starts its AppServer and runs its Python code core. I wouldn’t know how to do this without Apache, though: in this setup Apache is the proxy, so you need Apache. You can install Apache pretty easily with yum or apt-get (if you’re running Ubuntu, apt-get is what you will probably use). A base install of Apache is enough; you don’t need to configure a huge LAMP server.
So, overall, this will not work (that I know of) without Apache, or some type of webserver that can serve as a web proxy.
Let me know how it goes.
Regards,
Drew
lots of hacking and someone really good at python
I didn’t try this yet, but was wondering, does this method stop someone from going to http://yourdomain.com:8000 and getting in? It looks like it just adds another URL where authentication would be required.
Thanks,
Don
Nope. You should have firewall rules running. You should (by default) block anything on your public network, and only allow what you want in, such as port 80 for web traffic, port 22 for SSH (I’ve changed mine to something else), etc.
Like most people, I allow any traffic on my loopback device (device lo), so I can have the proxy running on it and forward all the Splunk traffic through it. So when a request comes in for the domain, it goes to the proxy; the public IP/domain isn’t allowing or listening on port 8000, but your loopback device is.
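The firewall policy described here can be sketched as iptables commands. This is a minimal sketch, assuming iptables on the server, an empty INPUT chain to start from (if your distribution ships its own rules, merge these into them instead), and SSH on port 22 unless you pass another port; `splunk_fw_rules` is a hypothetical name. It only prints the commands so you can review them first; apply them from a console session if you can, since a default DROP policy can lock you out if a rule is wrong.

```shell
# Hypothetical helper: print iptables commands that accept loopback traffic,
# established connections, SSH and HTTP, and drop everything else inbound,
# including direct connections to Splunk's port from other hosts.
splunk_fw_rules() {
    local ssh_port="${1:-22}" splunk_port="${2:-8000}"
    cat <<EOF
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -p tcp --dport ${ssh_port} -j ACCEPT
iptables -A INPUT -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -p tcp --dport ${splunk_port} -j DROP
iptables -P INPUT DROP
EOF
}

# Usage: review, then apply as root and save (CentOS):
#   splunk_fw_rules 22 8000
#   splunk_fw_rules 22 8000 | sh && service iptables save
```

The explicit DROP for port 8000 is redundant with the DROP policy, but it documents the intent; the policy is set last so the ACCEPT rules are already in place when it takes effect.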
Any questions, or help, let me know.
Regards,
Drew
Drew, thanks for your prompt replies. Let me see if I understand this process.
1. Install splunk, and start the built-in python web-server. By default this runs on TCP port 8000.
2. Install Apache2. Configure an httpd.conf file almost identical to the one you posted. This will tell Apache to listen on port 80 and send any received requests to port 8000. The user will only see the port 80 traffic.
Does that sound correct?
If I can get this working with http, I will then try it with https.
Thanks,
Tristan
I guess an additional step would be to configure your firewall to allow port 80 incoming, and deny all other ports (including port 8000).
Hooray! I got this working on Ubuntu 7.10.
I am going to configure SSL on Apache and leave Splunk without SSL (since that communication is local to the server).
Thanks for posting this information, Drew. I might be creating a wiki page for Ubuntu soon.
Tristan,
So sorry about the lack of replies. I never got an email stating you had replied, so I didn’t notice. I’m glad you got it working.
Good luck on your Ubuntu wiki and let me know if you need any suggestions or help.
Regards,
Drew
where do you place the .htaccess file? With Splunk, it’s very difficult to figure out where the actual “home” directory is located.
thanks in advance for your assistance!
You can put your .htaccess file really anywhere, as long as Apache can read it (permissions). I recommend somewhere like /var/www. What distribution of Linux are you running? Let me know if you have any more questions.
Regards,
Drew
Great writeup… I was wondering, to protect access to port 8000 on the LAN.. could you use iptables and only allow the loopback access (since the apache server will tunnel you through after AUTH?)
I am considering this setup, just not sure if we have too much data…is 80mb in logs going to exceed the indexed amount? (500mb)
Thanks and sorry for the off-topic question.
Matteo
This should be fine. It should not exceed the indexed amount.
You can protect access through port 8000 on the LAN, if you would like to, with IPTables. You can configure Apache to only listen to the “virtual host” of the Splunk directory on localhost, also.
Configure SELinux
If you have SELinux active on your system, you must add Splunk to the list of authenticated applications that can run in your SELinux environment.
To configure SELinux to allow Splunk to run, you need to run the chcon command on the Splunk lib directory. Here is what you type:
chcon -c -v -R -u system_u -r object_r -t lib_t $SPLUNK_HOME/lib 2>&1 > /dev/null
You must also disable the check when Splunk starts by adding this line to $SPLUNK_HOME/etc/splunk-launch.conf:
SPLUNK_IGNORE_SELINUX=1
Found that on Splunk’s website:
http://www.splunk.com/doc/3.3/installation/SplunkSELinux
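Adding that line can be scripted so that running it twice does not leave duplicate entries. This is a minimal sketch, assuming GNU sed and the KEY=VALUE format shown above; `set_launch_var` is a hypothetical name. With the RPM prefix used in this article, $SPLUNK_HOME would be /opt/splunk3.0/splunk/splunk.

```shell
# Hypothetical helper: set KEY=VALUE in a KEY=VALUE file such as
# $SPLUNK_HOME/etc/splunk-launch.conf. An existing KEY= line is replaced;
# otherwise a new line is appended. VALUE must not contain "|" or "&",
# which sed would treat specially in the replacement.
set_launch_var() {
    local file="$1" key="$2" value="$3"
    if [ ! -f "$file" ]; then
        echo "no such file: $file" >&2
        return 1
    fi
    if grep -q "^${key}=" "$file"; then
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # Make sure the file ends with a newline before appending.
        if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
            echo >> "$file"
        fi
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Usage, as root:
#   set_launch_var "$SPLUNK_HOME/etc/splunk-launch.conf" SPLUNK_IGNORE_SELINUX 1
```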
I am getting:
503 Service Temporarily Unavailable
I have done exactly what this tutorial says. I also tried small variants I found online. Any help?
Drew,
I followed your instructions on a CentOS 5.2 + Splunk 3.3 box and they worked great. I ended up having to switch to Ubuntu for unrelated reasons, and I cannot get this setup to work for the life of me. Can somebody help me adapt these instructions to Ubuntu Server 8.04? Much appreciated!
What do your log files tell you? Try looking in /var/log/httpd/error_log.
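When the proxy returns a 503, mod_proxy normally records the failed backend connection in the error log with the words "attempt to connect to". This is a minimal sketch of a quick triage, assuming the CentOS log path above; `diagnose_proxy_503` is a hypothetical name. It separates two common causes: Splunk not listening on the proxied port (connection refused) and SELinux blocking httpd’s outbound connection (permission denied). If you keep SELinux enforcing, `setsebool -P httpd_can_network_connect 1` is the usual way to allow that connection.

```shell
# Hypothetical helper: scan an Apache error log for failed mod_proxy
# backend connections and name the likely cause of a 503.
diagnose_proxy_503() {
    local log="${1:-/var/log/httpd/error_log}" attempts found=0
    if [ ! -r "$log" ]; then
        echo "cannot read $log" >&2
        return 2
    fi
    # Failed backend connections contain "attempt to connect to HOST:PORT".
    attempts=$(grep 'attempt to connect to' "$log" || true)
    if printf '%s\n' "$attempts" | grep -q 'Connection refused'; then
        echo "connection refused: is Splunk running on the proxied port?"
        found=1
    fi
    if printf '%s\n' "$attempts" | grep -q 'Permission denied'; then
        echo "permission denied: SELinux may be blocking httpd's connection"
        found=1
    fi
    if [ "$found" -eq 0 ]; then
        echo "no failed proxy connections found in $log"
    fi
}

# Usage, as root:
#   diagnose_proxy_503
```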
Do you still need help with this? If so, I am willing to set up a VM at home this week and write an updated Ubuntu version of this for you. Let me know if you still need assistance.
Regards,
Drew D.
Drew,
I know this is a relatively old post by now… but I’m in the process of configuring Splunk as described above and am REALLY close…
My configuration is slightly different from yours in that I want http://myhost.com/splunk/ to be interpreted instead of the root directory that you describe above.
Apache seems to be fine with this, but none of the dynamically-fetched data (aka everything useful) seems to load whenever I map anything but the root dir to my Splunk server.
Is there another address I have to declare in my httpd.conf? Right now it’s just the /splunk/ ProxyPass and ProxyPassReverse directives.
Thanks!!!
So, are you saying that you got Splunk to work, or that it doesn’t work? What do you mean when you say that none of the dynamically-fetched data (aka everything useful) seems to load?
I can help you with the configuration and such, but you need to give me more information. You can also provide screenshots and link to imageshack.us or something. I’m more than willing to get you setup and going with this. Just need more information.
Regards,
Drew D.
Drew,
Thanks for the step-by-step instruction! Worked out really good for me.
Quick question. How come when I go directly to the URL (http://:8000) it does not prompt for a login?
Thanks!!
Drew,
I got it running. I made Splunk listen locally on 127.0.0.1. Now, nobody can go directly to port 8000 using the hostname. They will get a “failed to connect” error.
Thanks! =)
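To confirm that Splunk’s web port is only reachable from the server itself after a change like this, you can check which address it is bound to. This is a minimal sketch, assuming net-tools `netstat -tln` output (column 4 is the local address) and the default port 8000; `port_is_loopback_only` is a hypothetical name. It fails if the port is not listening at all, or if it is listening on any non-loopback address such as 0.0.0.0.

```shell
# Hypothetical check: read `netstat -tln` output on stdin and succeed only if
# the given port is listening, and only on loopback addresses.
port_is_loopback_only() {
    local port="${1:-8000}"
    awk -v port="$port" '
        # Column 4 is the local address, e.g. 127.0.0.1:8000 or ::1:8000
        $4 ~ (":" port "$") {
            seen = 1
            if ($4 != "127.0.0.1:" port && $4 != "::1:" port) exposed = 1
        }
        END { if (seen && !exposed) exit 0; exit 1 }
    '
}

# Usage:
#   netstat -tln | port_is_loopback_only 8000 && echo "port 8000 is local only"
```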
RAM,
Great to hear. I apologize for not getting back to you fast enough. Things have been quite crazy, so I am glad you figured out the issue. Hope my article was easy enough to understand.
Regards,
Drew D.
You can tell splunk to bind only to localhost by setting SPLUNK_BINDIP=127.0.0.1 before starting it.
Fantastic. Adding BIND before starting up, then adding the info as described to httpd.conf made it all work first time.
Richard,
Thanks for the kind words! Anytime I get great comments, it makes me feel that my time writing here from my experiences is well worth the effort.
Regards,
Drew
Smashing! Thank you for this solution … clearly written and understandable.
I just installed Splunk today – the first thing I did was try to find a password or authentication option. Eeek – none!
So finding your solution was ideal. Worked “out of the box” first time.
Thanks!
Dean,
Thanks! Glad it helped you out. It’s always nice hearing awesome results from my experiences. Makes me feel like I actually did something productive!
Have a great day.
Drew
@Josh: SPLUNK_BINDIP will affect splunkd (the splunk daemon) and not splunkweb (the python web interface, what you really want to proxy).
which is a good thing and what you probably want to do anyway, but not quite the same thing.
to make splunkweb bind to localhost only, you have to edit etc/system/local/web.conf
# Host values may be any IPv4 or IPv6 address, or any valid hostname.
# The string 'localhost' is a synonym for '127.0.0.1' (or '::1', if
# your hosts file prefers IPv6). The string '0.0.0.0' is a special
# IPv4 entry meaning "any active interface" (INADDR_ANY), and '::'
# is the similar IN6ADDR_ANY for IPv6. The empty string or None are
# not allowed.
#server.socket_host = 0.0.0.0
server.socket_host = 127.0.0.1
as a side note, you can also define SPLUNK_BINDIP in etc/splunk-launch.conf.
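The web.conf change above can be scripted as well. This is a minimal sketch, assuming GNU sed and that $SPLUNK_HOME/etc/system/local/web.conf already contains a server.socket_host line, either active or commented out as in the excerpt; `set_splunkweb_host` is a hypothetical name. It refuses to append a new line when none exists, because where the setting belongs in the file is not covered here.

```shell
# Hypothetical helper: make splunkweb listen on HOST (default 127.0.0.1) by
# rewriting the server.socket_host line in a web.conf file.
set_splunkweb_host() {
    local file="$1" host="${2:-127.0.0.1}"
    local live='^[[:space:]]*server\.socket_host[[:space:]]*='
    local commented='^#[[:space:]]*server\.socket_host[[:space:]]*='
    if grep -qE "$live" "$file" 2>/dev/null; then
        # An active setting exists: rewrite it, leave commented examples alone.
        sed -i -E "s/${live}.*/server.socket_host = ${host}/" "$file"
    elif grep -qE "$commented" "$file" 2>/dev/null; then
        # Only the commented-out default exists: turn it into an active line.
        sed -i -E "s/${commented}.*/server.socket_host = ${host}/" "$file"
    else
        echo "no server.socket_host line in $file; edit it by hand" >&2
        return 1
    fi
}

# Usage, as root, then restart Splunk so splunkweb rebinds:
#   set_splunkweb_host "$SPLUNK_HOME/etc/system/local/web.conf"
#   "$SPLUNK_HOME/bin/splunk" restart
```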
@Luke, @Drew:
you cannot reverse-proxy splunk to something other than root (as in “/something/”), because the HTML code makes lots of references to “/” (e.g. “/script.js”).
You can put all your Proxy directives inside a (Name)VirtualHost block, though. Works like a charm.
Hope that helps,
-a