
Monitoring (Graylog2)

In testing software, I am always concerned about the risks, especially the risks that get skipped all the time: performance testing and tuning, security, and the quiet problems that nobody looks into. Logs can reveal the symptoms of future problems, the way high cholesterol signals future trouble for a person’s health: not just the errors that never reached the user interface, but trends in CPU and memory usage.

I’ve been using a tool that is new to me, Splunk, over the past few months. It’s a super logging system that doesn’t interpret anything; it simply remembers what it is told. It’s versatile in the sources it can capture from because of its web service interface. And what a way to search those logs: the output is beautiful for somebody who is looking for trends and problems.

Graylog2 is an open source alternative. Splunk may be free for many people at their usage levels, but I wanted to go naked open source in my investigation of this kind of tool. Though it may not be the easiest route for a person like me, that is what I chose to work with.

What is Graylog2?

Graylog2, in the form that I currently know it, is a Graylog2 server (Java), MongoDB, a Graylog2 web interface (Rails), and a Ruby gem called gelf. The server and the web interface both use the same Mongo database, which can be switched with command-line parameters (typically among production, development, and test environments), plus ElasticSearch as the search engine. Logs are sent in through GELF (though David O’Dell has identified other ways to get logs submitted) and then stored in the database.

[Screenshot: fake log messages]
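For context, a GELF message is just a JSON payload, usually compressed and sent over UDP. Roughly, and with invented values, it looks like this (shown here as a Ruby hash):

{
  "version"       => "1.0",
  "host"          => "web01",
  "short_message" => "Connection refused",
  "timestamp"     => 1355011200.0,  # seconds since the epoch
  "level"         => 3,             # syslog-style severity, 3 = error
  "_ip"           => "10.1.2.3"     # underscore prefix marks a custom field
}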

The user can log into the web interface to create “filters” that make finding messages and activity trends easier.

[Screenshot: a basic filter]

In addition, users can search the logs through the web interface by writing what appear to be Mongo-like queries against the database.

[Screenshot: a query]
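I won’t vouch for the exact syntax the web interface accepts, but the queries I typed looked a lot like Mongo criteria. As a made-up example (the field names are my assumptions), error-level-and-worse messages from one machine might look like:

{ "level" => { "$lte" => 3 }, "_ip" => "10.1.2.3" }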

Installation

This was a bear. Not any particular part (OK, I’m lying, some parts were hard), but the whole thing was monstrous because there are so many parts involved in something like this.

Graylog2 Server

This was the easiest part of the installation. I downloaded and unzipped it, then installed openjdk-6-jre and MongoDB. I had to modify my sources list to properly install MongoDB, an activity I hadn’t done in about a year of deep Linux exposure. So far so good. It required ElasticSearch, which wasn’t hard for me to install; in fact, I don’t remember doing anything to ElasticSearch at all. I had some trouble getting the server started properly (it claimed the port was in use), which I eventually overcame by killing the Java process left over from the first start attempt.
I had to create a new database instance in MongoDB, which was a learning experience because it is different from anything I had used before.
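For the record, creating the database and a user with the Ruby driver of that era (the mongo gem, 1.x API) would look roughly like the sketch below; the database name and credentials are placeholders:

require 'mongo'

# Mongo creates a database lazily on first use
db = Mongo::Connection.new("localhost", 27017).db("graylog2")
db.add_user("grayloguser", "secret")  # the account the server and web app share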

Graylog2 web interface

This was not as hard to get started. I downloaded and unzipped the Rails app. This probably sounds funny, but I don’t have much experience using Rails. I ran rails s to start the app, had to bundle install first, and eventually added a port parameter because I have too many Rails apps running at the same time (Redmine, for one). The toughest part was getting the database settings in sync with those on the Graylog2 server. Somehow I had no user, but it wasn’t asking me to create one. After a couple of hours of fooling with MongoDB (I learned more about it than I originally intended, not a bad thing), I created a new “environment” in Mongo by adding one to mongoid.yml and starting with rails s -e production.
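From memory of the Mongoid 2.x format the web interface used at the time, the new environment in mongoid.yml was something like this (the database name and host are whatever you gave the Graylog2 server):

production:
  host: localhost
  port: 27017
  database: graylog2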

GELF

I thought that I was going to be installing some kind of monitor or creating a daemon, neither of which I had done in earnest (unless you count AutoIt hacks). I looked at monit but decided to use GELF, which seems to integrate nicely with the server. To get that running, I downloaded the gem and modified an example script that sent messages to Graylog2. The key lines were these:

require 'gelf'
n = GELF::Notifier.new("localhost", 12201)  # the server's GELF input, UDP port 12201 by default
n.notify!(:host => my_host, :level => msg_level, :short_message => msg, :_ip => my_system, :_method => my_method)  # underscore keys become custom fields

The script wasn’t hard to create, but getting the right inputs while processing a log file (I was prototyping against /var/log/syslog) took some time.
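For the curious, the whole prototype amounted to something like the sketch below. The regex is fitted to my syslog’s format and the field names are my own choices, so treat all of that as assumptions rather than anything canonical:

require 'gelf'
require 'socket'

notifier  = GELF::Notifier.new("localhost", 12201)
my_system = Socket.gethostname

# Syslog lines look like: "Jun  1 10:02:03 myhost sshd[1234]: Accepted password ..."
File.foreach("/var/log/syslog") do |line|
  next unless line =~ /^\w{3}\s+\d+ \d\d:\d\d:\d\d (\S+) ([^:\[]+)(?:\[\d+\])?: (.*)$/
  host, program, msg = $1, $2, $3
  notifier.notify!(:host          => host,
                   :level         => GELF::INFO,  # these lines carry no severity, so assume INFO
                   :short_message => msg,
                   :_ip           => my_system,
                   :_method       => program)
end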

[Screenshot: one fake log message]

Overall, I thought that I could replicate this by parsing almost any log, certainly all of the default log4j-formatted logs. I know that I could also track resource levels (CPU, memory, network usage, etc.).
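The resource-tracking part is nothing Graylog2 ships with; it would just be more of the same GELF trick. As a sketch, assuming Linux’s /proc/meminfo and custom field names of my own invention:

require 'gelf'
require 'socket'

notifier = GELF::Notifier.new("localhost", 12201)

loop do
  # /proc/meminfo contains a line like: "MemFree:   123456 kB"
  mem_free_kb = File.read("/proc/meminfo")[/^MemFree:\s+(\d+)/, 1].to_i
  notifier.notify!(:host          => Socket.gethostname,
                   :level         => GELF::INFO,
                   :short_message => "MemFree #{mem_free_kb} kB",
                   :_memfree_kb   => mem_free_kb)
  sleep 60  # one sample per minute is plenty for trend watching
end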

Summary

I am really pleased with what I learned from this experiment, but even more so about the possibilities for using something like this to keep track of all the “unseen” information. David Cooper once told me that metrics without a decision to be made are just numbers. I mostly agree, but there have to be decisions that don’t get made because people are completely unaware of the information around them. I hope to be wise enough to use it in healthy doses.

[Screenshot: analytics]

I am disappointed that I didn’t get the email notifications working. I hooked it up to my Gmail account but never saw an email, despite high doses of “fake” log entries being jammed into the server. If I had more time, I’d sniff the traffic to see what is being sent to Gmail (and what comes back), because the logs I saw were useless. Ironic, huh?

In addition, I would like the ability to have a rules engine that acts upon the information. If memory climbs too high, better to restart than to let things freeze up. I did not see anything in that arena here. Maybe there is something else on the Test Stack for that?

Test Stack

Earlier this year I started thinking more about the test stack. The test stack is heavily integrated with the dev/CM stack. The automated tests run using one or more languages and libraries. People edit the test files in editors or IDEs. The test code is stored in source control systems. Tests are run by continuous integration environments, and the results are stored. Deployment systems create VMs for the system under test and for running the tests. Testers tie the changes they check into source control to requirements and features in the issue tracking system.

I use these parts, these components, in my job. I have used different versions of them, from different brands and sources, and in different ways. But until last week, almost all of my energy was focused on learning about the automated test languages and libraries. In some cases, I used a “stack in a box” from a vendor like Compuware or HP. Other times those components were provided to me by IT departments and CM professionals.

But I decided earlier this year to learn more about that stack. I will not become an expert on those components, but I will install some of them, integrate them, and get them all working together on my laptop (or from my laptop).

I will try to drop some insight into my experience with them. I have already worked with Windows, Mac, and Linux, including the VMs that the software under test and the test software can run on. I set up Git as a source control repository. This past weekend, I installed Jenkins, which runs deployments and tests every time code is checked into my Git repository. Last, I installed Redmine, where I can record the features I will create. Those will be the parts I write about.

I may install Chef, which deploys VMs and configures them by deploying the SUT and the test software. I haven’t decided on that, or on what kind of monitoring software to use (such as Splunk). If you have suggestions, I would appreciate reading them and possibly being influenced by them. I can’t do it all, but I want to know enough that I could build a better stack should I ever be in the position of a small office.

Update: I want to add a database to the stack; I am using sqlite3 right now. And of course there is data loading.

Update 2: Somebody told me about Flyway, which is a database schema versioning application.
