In testing software, I am always concerned about risk – especially the risks that get skipped all the time: performance testing and tuning, security, and those quiet problems that nobody looks into. Logs can reveal the symptoms of future problems, much as high cholesterol signals a future problem for a person’s health – not just the errors that never reached the user interface, but also trends in CPU and memory usage.
I’ve used a tool that was new to me, Splunk, over the past few months. It’s a super logging system that doesn’t interpret anything on its own – it simply remembers what it is told. It’s versatile in the sources it can capture from because of its web service interface. And what a way to search those logs: the output is beautiful for somebody looking for trends and problems.
Graylog2 is an open source alternative. Splunk may be free for many people at their usage levels, but I wanted to go naked open source in my investigation of this kind of tool. Though it may not be the easiest route for a person like me, I chose to work with Graylog2.
What is Graylog2?
Graylog2, in the form that I currently know it, consists of the graylog2 server (Java), MongoDB, the graylog2 web interface (Rails), ElasticSearch (the search engine), and a Ruby gem called gelf. The server and the web interface share the same Mongo database, which can be changed by command line parameters (typically a production, development, and test environment). Logs are sent in through GELF (though David O’Dell has identified other ways to get logs submitted) and then stored for searching.
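Pulling those parts together, starting the whole stack looks roughly like this. This is a sketch from my install, not anything official – the jar name, paths, and ports are assumptions, so check what your download actually unpacked:

```shell
# Assumed file names and paths – adjust to your own install.
mongod --dbpath /var/lib/mongodb &                    # the shared Mongo database
elasticsearch &                                       # search engine the server needs
java -jar graylog2-server.jar &                       # the graylog2 server (Java)
cd graylog2-web-interface && rails s -e production    # the Rails web interface
```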
The user can log into the web interface to create “filters” that make finding messages and activity trends easier.
The installation was a bear. Not any particular part (ok, I’m lying, some parts were hard), but the whole thing was monstrous because there are so many parts involved in something like this.
The server itself was the easiest part of the installation. I downloaded and unzipped it, then installed openjdk-6-jre and mongodb. I had to modify my sources list to properly install mongodb, something I hadn’t done in about a year of deep Linux exposure. So far so good. It required ElasticSearch, which wasn’t hard for me to install – in fact, I don’t remember doing anything to ElasticSearch at all. I had some trouble getting the server started properly (it claimed the port was already in use), which I eventually overcame by killing the java process left over from the first start attempt.
I had to create a new database instance in MongoDB, which was a learning experience because MongoDB is different from anything I had used before.
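For the record, creating it came down to a couple of commands in the mongo shell. A sketch – the database name and credentials here are my assumptions, and db.addUser is the call from the MongoDB versions of that time:

```
use graylog2
db.addUser("grayloguser", "secret")  // the database is created lazily on first write
```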
Graylog2 web interface
This was not as hard to get started. I downloaded and unzipped the Rails app. It probably sounds funny, but I don’t have much experience with Rails. I ran rails s to start the app, had to bundle install, and eventually added the port parameter because I have too many Rails apps running at the same time (redmine). The toughest part was getting the database settings in sync with those on the graylog2 server. Somehow I had no user, but the app wasn’t asking me to create one. After a couple hours of fooling with MongoDB (I learned more about it than I originally intended – not a bad thing), I created a new “environment” by adding one to mongoid.yml and starting with rails s -e production.
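For reference, the new environment amounted to a few lines in mongoid.yml. A sketch – the host and database values are placeholders; the one thing that matters is that the database name match what the graylog2 server is configured to use:

```yaml
production:
  host: localhost
  port: 27017
  database: graylog2
```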
I thought I was going to be installing some kind of monitor or creating a daemon, neither of which I had done in earnest (unless you count AutoIt hacks). I looked at monit but decided to use GELF, which integrates nicely with the server. To get that running, I downloaded the gem and modified an example script that sent messages to graylog2. The key lines were these:
n = GELF::Notifier.new("localhost", 12201)
n.notify!(:host => my_host, :level => msg_level, :short_message => msg, :_ip => my_system, :_method => my_method)
The script wasn’t hard to create, but getting the right inputs while processing a log file (I was prototyping from /var/log/syslog) took some time.
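The fiddly part was exactly that input-mapping, so here is a minimal sketch of what my parsing amounted to. The regex and the field names are my own choices for a stock /var/log/syslog line, not anything the gelf gem requires:

```ruby
# Pull the pieces out of a standard syslog line so they can be handed to
# GELF::Notifier#notify! (custom fields get a leading underscore, like the
# :_ip and :_method fields in the script above).
SYSLOG_LINE = /\A
  (?<timestamp>\w{3}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s  # e.g. "Apr  5 10:15:01"
  (?<host>\S+)\s                                      # host that logged it
  (?<program>[^\[:\s]+)(?:\[(?<pid>\d+)\])?:\s        # program[pid]:
  (?<message>.*)\z/x

# Map one syslog line into a hash of notify! arguments, or nil if it
# doesn't match the expected layout.
def parse_syslog(line)
  m = SYSLOG_LINE.match(line)
  return nil unless m
  { :host => m[:host], :_program => m[:program],
    :_pid => m[:pid], :short_message => m[:message] }
end
```

Feeding each parsed hash to n.notify! is then a one-liner inside the loop that reads the file.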
Overall, I think I could replicate this by parsing almost any log – certainly all the default log4j-formatted logs. I know I could also track resource levels (CPU, memory, network usage, etc.).
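As a sketch of what that would look like for log4j’s common “%d [%t] %-5p %c - %m” layout – the pattern is my assumption, so adjust it to whatever the appender actually emits – along with mapping log4j level names onto the syslog-style numeric levels GELF uses:

```ruby
# Assumed layout: "2011-05-10 12:00:00,123 [main] ERROR com.example.Foo - message"
LOG4J_LINE = /\A(?<timestamp>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2},\d{3})
               \s\[(?<thread>[^\]]+)\]
               \s(?<level>[A-Z]+)
               \s+(?<logger>\S+)\s-\s(?<message>.*)\z/x

# GELF levels follow syslog severities: lower number = more severe.
LOG4J_TO_GELF = { "FATAL" => 2, "ERROR" => 3, "WARN" => 4,
                  "INFO"  => 6, "DEBUG" => 7, "TRACE" => 7 }

def parse_log4j(line)
  m = LOG4J_LINE.match(line)
  return nil unless m
  { :level => LOG4J_TO_GELF[m[:level]], :_logger => m[:logger],
    :_thread => m[:thread], :short_message => m[:message] }
end
```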
I am really pleased with what I learned from this experiment, but even more so with the possibilities for using something like this to keep track of all the “unseen” information. David Cooper once told me that metrics without a decision to be made are just numbers. I mostly agree, but there have to be decisions that don’t get made because people are completely unaware of the information around them. I hope to be wise enough to use it in healthy doses.
I am disappointed that I didn’t get email notifications working. I hooked it up to my Gmail account but never saw an email, despite high doses of “fake” log entries being jammed into the server. If I had more time, I’d sniff the traffic to see what is being sent to Gmail (and what comes back), because the logs I saw were useless. Ironic, huh?
In addition, I would like the ability to have a rules engine that acts on the information. If memory climbs too high, it’s better to restart than to let the system freeze up. I did not see anything in that arena here. Maybe there is something else on the Test Stack for that?
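To make the wish concrete, the kind of rule I mean could be as simple as this – entirely hypothetical, nothing below exists in Graylog2; the metric names, thresholds, and actions are made up for illustration:

```ruby
# A rule pairs a metric name with a threshold and the action to take
# when the metric exceeds it.
Rule = Struct.new(:metric, :threshold, :action)

# Return the actions for every rule whose metric exceeds its threshold;
# something else would then carry those actions out (restart, page, etc.).
def fired_actions(rules, metrics)
  rules.select { |r| metrics.fetch(r.metric, 0) > r.threshold }.map(&:action)
end
```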