
Real-Time Web Monitoring Using Lumberjack-Logstash-StatsD-Graphite

For the last few days I have been playing around with two of my favourite tools, Logstash and StatsD. Logstash, StatsD and Graphite together make a killer combination, so I decided to test this combination along with Lumberjack for real-time monitoring. I'm going to use Lumberjack as the log shipper on the web server; Logstash will then stash the logs properly, and using the statsd output plugin I will ship the metrics to Graphite. In my previous blog post I explained how to use Lumberjack with Logstash. Lumberjack will be watching my test web server's access logs.

By default I'm using the combined Apache log format, but it does not include the response time for each request, nor the total response time. So we need to modify the LogFormat in order to add those two fields. Below is the LogFormat I'm using for my test setup.

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %D %>D" combined

Once the LogFormat is modified, restart the Apache service for the change to take effect.
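On a stock Ubuntu/Debian setup that would be something along these lines (the config file path and service name are assumptions; adjust them to your own layout):

$ vi /etc/apache2/apache2.conf    # or the vhost file that defines the LogFormat
$ apachectl configtest
$ service apache2 restart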

Setting up Logstash Server

First, download the latest Logstash jar file from the Logstash site. Now we need to create a Logstash conf file. There is already a grok pattern available for Apache logs called "COMBINEDAPACHELOG", but since we have added the two new response time fields, we need to extend the grok pattern as well. Below is the pattern which is going to be used with Logstash.

pattern => "%{COMBINEDAPACHELOG} %{NUMBER:resptime} %{NUMBER:resptimefull}"
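For reference, a request logged with the modified format would look roughly like the line below (the values are made up purely for illustration; the last two numbers are what the extra %{NUMBER} patterns pick up):

192.168.1.25 - - [25/Jun/2013:15:04:32 +0530] "GET /index.html HTTP/1.1" 200 5043 "-" "Mozilla/5.0" 1520 1520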

So the Logstash conf file will look like this:

input {
  lumberjack {
    type => "apache-access"
    port => 4444
    ssl_certificate => "/etc/ssl/logstash.pub"
    ssl_key => "/etc/ssl/logstash.key"
  }
}

filter {
  grok {
    type => "apache-access"
    pattern => "%{COMBINEDAPACHELOG} %{NUMBER:resptime} %{NUMBER:resptimefull}"
  }
}

output {
  stdout {
    debug => true
  }
  statsd {
    type => "apache-access"
    host => "localhost"
    port => 8125
    debug => true
    timing => [ "apache.servetime", "%{resptimefull}" ]
    increment => "apache.response.%{response}"
  }
}

Setting up StatsD

Now we can start setting up the StatsD daemon. Recent Ubuntu releases ship with reasonably new versions of NodeJS and NPM, so we can install them using APT/Aptitude.

$ apt-get install nodejs npm

Now clone the StatsD github repository to the local machine.

$ git clone git://github.com/etsy/statsd.git

Now create a local config file “localConfig.js” with the below contents.

{
  graphitePort: 2003
, graphiteHost: "127.0.0.1"
, port: 8125
}

Now we can start the StatsD daemon.

$ node /opt/statsd/stats.js /opt/statsd/localConfig.js

The above command will start StatsD in the foreground.
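A quick way to check that StatsD is actually listening is to push a dummy counter over UDP with netcat (the metric name here is just an arbitrary example, and nothing will be graphed until Graphite is up):

$ echo "test.dummy:1|c" | nc -u -w1 127.0.0.1 8125

Now we can go ahead with setting up Graphite.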

Setting up Graphite

First, let’s install the basic python dependencies.

$ apt-get install python-software-properties memcached python-dev python-pip sqlite3 libcairo2 libcairo2-dev python-cairo pkg-config

Then, we can start installing Carbon and Graphite dependencies.

$ cat >> /tmp/graphite_reqs.txt << EOF
django==1.3
python-memcached
django-tagging
twisted
whisper==0.9.9
carbon==0.9.9
graphite-web==0.9.9
EOF

$ pip install -r /tmp/graphite_reqs.txt

Now we can configure Carbon.

$ cd /opt/graphite/conf/

$ cp carbon.conf.example carbon.conf

Now we need to create a storage schema.

$ cat >> /tmp/storage-schemas.conf << EOF
# Schema definitions for Whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds.
# [name]
# pattern = regex
# retentions = timePerPoint:timeToStore, timePerPoint:timeToStore
[stats]
priority = 110
pattern = ^stats\..*
retentions = 10s:6h,1m:7d,10m:1y
EOF


$ cp /tmp/storage-schemas.conf /opt/graphite/conf/storage-schemas.conf

Also, we need to create a log directory for Graphite.

$ mkdir -p /opt/graphite/storage/log/webapp

Now we need to copy over the local settings file and initialize the database.

$ cd /opt/graphite/webapp/graphite/

$ cp local_settings.py.example local_settings.py

$ python manage.py syncdb

Fill in the necessary details, including the superuser details, while initializing the database. Once the database is initialized, we can start the Carbon cache and the Graphite web GUI.

$ /opt/graphite/bin/carbon-cache.py start

$ /opt/graphite/bin/run-graphite-devel-server.py /opt/graphite
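Before wiring everything together, it is worth confirming that carbon-cache is listening on its line receiver port (2003 by default) and that the development web server came up on port 8080:

$ netstat -ntlp | grep -E '2003|8080'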

Now we can access the dashboard using the URL "http://ip-address:8080". Once we have started the Carbon cache, we can start the Logstash server.

$ java -jar logstash-1.1.13-flatjar.jar agent -f logstash.conf -v

Once Logstash has loaded all the plugins successfully, we can start shipping logs from the test web server using Lumberjack. Since I've enabled the stdout plugin, I can see the output coming from the Logstash server, and we can start accessing the real-time graphs from the Graphite GUI. There are several alternatives to the Graphite GUI, like Graphene, Graphiti, Graphitus and GDash. Anyway, Logstash-StatsD-Graphite proves to be a wonderful combination. Sorry that I could not upload any screenshots for now, but I will upload them soon.
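In the meantime, if you want to verify that data is reaching Graphite without the dashboard, a PNG graph can be fetched straight from the render API; the metric path below assumes StatsD's default namespacing, so adjust it if your metric names differ:

$ curl -o servetime.png "http://ip-address:8080/render?target=stats.timers.apache.servetime.mean&from=-1hours"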


Lumberjack – a Light Weight Log Shipper for Logstash

Logstash is one of the coolest projects that I always wanted to play around with. Since I'm a sysadmin, I'm forced to handle multiple apps, each of which logs in a different format. The weirdest part is the timestamps, where most apps use their own time formats. Logstash helps us solve such situations: we can rewrite the timestamp into a standard format, we can use the predefined filters for filtering the logs, and we can even create our own filters using regex. All the documentation is available on the Logstash website. Logstash mainly has three parts: 1) Input, from which the logs are shipped to Logstash; 2) Filter, for filtering the incoming logs to suit our needs; and 3) Output, for storing or relaying the filtered logs to various applications.

Lumberjack is one such input plugin designed for Logstash. Though the plugin is still in a beta state, I decided to give it a try. We can also use Logstash itself for shipping logs to a centralized Logstash server, but the JVM makes it difficult to run on many of my resource-constrained machines. Lumberjack claims to be a lightweight log shipper which uses SSL, and we can add custom fields for each line of log that we ship.

Setting up Logstash Server

Download the latest Logstash jar file from the Logstash website and create a Logstash configuration file for the Logstash instance. In the config file, we have to enable the lumberjack input plugin. Lumberjack uses an SSL CA to verify the server, so we need to generate a certificate for the Logstash server. We can use the command below to generate the SSL certificate and key.

$ openssl req -x509 -newkey rsa:2048 -keyout /etc/ssl/logstash.key -out /etc/ssl/logstash.pub -nodes -days 3650
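The generated certificate can be sanity-checked before it is copied around:

$ openssl x509 -in /etc/ssl/logstash.pub -noout -subject -dates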

Below is the sample Logstash conf file which I used for stashing logs from Socklog.

input {
  lumberjack {
    type => "socklog"
    port => 4545
    ssl_certificate => "/etc/ssl/logstash.pub"
    ssl_key => "/etc/ssl/logstash.key"
  }
}

filter {
  grok {
    type => "socklog"
    pattern => "%{DATA:logfacility}: %{SYSLOGTIMESTAMP:timestamp} %{DATA:program}: *"
  }
  mutate {
    replace => [ "@message", "%{mess}" ]
  }
  date {
    type => "socklog"
    match => [ "timestamp", "MMM dd HH:mm:ss" ]
  }
}

output {
  stdout {
    debug => true
  }
}

Now we can start Logstash using the above config.

$ java -jar logstash-1.1.13-flatjar.jar agent -f logstash.conf -v

Once Logstash has started successfully, we can use netstat to check whether it is listening on port 4545; a quick check is shown below.
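This assumes the usual net-tools netstat is available:

$ netstat -ntlp | grep 4545

I'm currently running Logstash in the foreground, so below is the log output from Logstash.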

Starting lumberjack input listener {:address=>"0.0.0.0:4545", :level=>:info}
Input registered {:plugin=><LogStash::Inputs::Lumberjack type=>"socklog", ssl_certificate=>"/etc/ssl/logstash.pub", ssl_key=>"/etc/ssl/logstash.key", charset=>"UTF-8", host=>"0.0.0.0">, :level=>:info}
Match data {:match=>{"@message"=>["%{DATA:logfacility}: %{SYSLOGTIMESTAMP:timestamp} %{DATA:program}: *"]}, :level=>:info}
Grok compile {:field=>"@message", :patterns=>["%{DATA:logfacility}: %{SYSLOGTIMESTAMP:timestamp} %{DATA:program}: *"], :level=>:info}
Output registered {:plugin=><LogStash::Outputs::Stdout debug_format=>"ruby", message=>"%{@timestamp} %{@source}: %{@message}">, :level=>:info}
All plugins are started and registered. {:level=>:info}

Setting up Lumberjack agent

On the machine from which we are going to ship the logs, clone the Lumberjack GitHub repo.

$ git clone https://github.com/jordansissel/lumberjack.git

Install the fpm Ruby gem, which is required to build the Lumberjack package.

$ gem install fpm

$ cd lumberjack && make

$ make deb

This will build a Debian package of Lumberjack.

$ dpkg -i lumberjack_0.0.30_amd64.deb

The package will install all the files under /opt/lumberjack.

Now copy the SSL certificate which we generated on the Logstash server over to the Lumberjack machine.
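Something like the following will do; the hostname and destination directory here are just placeholders for my test setup:

$ scp /etc/ssl/logstash.pub shipper.test.com:~/ssl/

Once the certificate is in place, we can start the Lumberjack agent.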

$ /opt/lumberjack/bin/lumberjack --ssl-ca-path ./ssl/logstash.pub --host logstash.test.com --port 4545 /var/log/socklog/main/current

Below is the log output from Lumberjack.

2013-06-25T15:04:32.798+0530 Watching 1 files, setting open file limit to 103
2013-06-25T15:04:32.798+0530 Watching 1 files, setting memory usage limit to 1048576 bytes
2013-06-25T15:04:32.878+0530 Connecting to logstash.test.com(192.168.19.19):4545
2013-06-25T15:04:33.186+0530 slow operation (0.307 seconds): connect to 192.168.19.19:4545
2013-06-25T15:04:33.186+0530 Connected successfully to logstash.test.com(192.168.19.19):4545
2013-06-25T15:04:34.653+0530 Declaring window size of 4096
2013-06-25T15:04:36.734+0530 flushing since nothing came in over zmq

Now we will start seeing the output from Logstash on our screen, since we are using the stdout output plugin. A very good, detailed write-up about Lumberjack and Logstash, written by Brian Altenhofel, can be found here. He gave a talk on this at DrupalCon 2013, Portland, and the video of the talk is available here.
