
Monitoring Redis Using CollectD and ELK

Redis is an open-source, networked, in-memory, key-value data store. It's used heavily everywhere, from web stacks to monitoring to message queues. Monitoring tools like Sensu already have some good scripts to monitor Redis. Last month, during PyCon 2014, Plivo open-sourced a new rate-limited queue called SHARQ, which is built on top of Redis. So apart from plain monitoring checks, we decided to keep a time-series record (a tsdb) of what's happening in our Redis cluster. Since we already use the ELK stack heavily to visualize our infrastructure, we decided to go ahead with the same.

CollectD Redis Plugin

There is a cool CollectD plugin for Redis. It pulls a variety of data from Redis, including memory used, commands processed, number of connected clients and slaves, number of blocked clients, number of keys stored per db, uptime, and changes since the last save. The installation is pretty simple and straightforward.

$ apt-get update && apt-get install collectd

$ git clone https://github.com/powdahound/redis-collectd-plugin.git /tmp/redis-collectd-plugin

Now place the redis_info.py file into the collectd plugin folder and enable the Python plugin so that collectd can use this Python file. Below is our collectd conf.

Hostname    "<redis-server-fqdn>"
Interval 10
Timeout 4
Include "/etc/collectd/filters.conf"
Include "/etc/collectd/thresholds.conf"
LoadPlugin network
ReportStats true
LogLevel info
Include "/etc/collectd/redis.conf"      # This is the configuration for the Redis plugin
<Plugin network>
    Server "<logstash-fqdn>" "<logstash-collectd-port>"
</Plugin>

Now copy the Redis Python plugin and its conf file to the collectd folder.

$ mkdir /etc/collectd/plugin            # This is where we are going to place our custom plugins

$ cp /tmp/redis-collectd-plugin/redis_info.py /etc/collectd/plugin/

$ cp /tmp/redis-collectd-plugin/redis.conf /etc/collectd/

By default, the plugin folder in redis.conf is defined as '/opt/collectd/lib/collectd/plugins/python'. Make sure to replace this with the location where we copied the plugin file, in our case "/etc/collectd/plugin"; the relevant part of the adjusted redis.conf is sketched below. Then restart the collectd daemon to enable the Redis plugin.
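This is only a rough sketch, based on the configuration shipped with the plugin; option names and defaults may differ slightly between plugin versions.

<LoadPlugin python>
  Globals true
</LoadPlugin>

<Plugin python>
  ModulePath "/etc/collectd/plugin"
  Import "redis_info"

  <Module redis_info>
    Host "localhost"
    Port 6379
    Verbose false
  </Module>
</Plugin>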

$ /etc/init.d/collectd stop

$ /etc/init.d/collectd start

In my previous blog, I've mentioned how to enable and use the CollectD input plugin in Logstash and how to use Kibana to plot the data coming from collectd. Below are the fields that we are receiving from CollectD on Logstash:

  1) type_instance: blocked_clients
  2) type_instance: evicted_keys
  3) type_instance: connected_slaves
  4) type_instance: commands_processed
  5) type_instance: connected_clients
  6) type_instance: used_memory 
  7) type_instance: <dbname>-keys
  8) type_instance: changes_since_last_save
  9) type_instance: uptime_in_seconds
10) type_instance: connections_received

Now we need to visualize these via Kibana. Let's create some Elasticsearch queries so that we can visualize them directly. Below are some sample queries created in the Kibana UI.

1) type_instance: "commands_processed" AND host: "<redis-host-fqdn>"
2) type_instance: "used_memory" AND host: "<redis-host-fqdn>"
3) type_instance: "connections_received" AND host: "<redis-host-fqdn>"
4) type_instance: "<dbname>-keys" AND host: "<redis-host-fqdn>"
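If you want to sanity-check one of these outside Kibana, the same Lucene string works as a query_string query against Elasticsearch directly (this assumes the default logstash-* index naming and Elasticsearch on localhost:9200):

curl -s 'http://localhost:9200/logstash-*/_search?pretty' -d '{
  "query": {
    "query_string": {
      "query": "type_instance:\"used_memory\" AND host:\"<redis-host-fqdn>\""
    }
  },
  "size": 1
}'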

Now that we have some sample queries, let's visualize them.

Now create histograms following the same procedure, just changing the selected queries.


UCHIWA – Awesome Dashboard for Sensu

It's been more than a year since I started playing with Sensu. It is one of the coolest monitoring projects I've ever worked with: perfect for cloud infrastructure and backed by a great community. Though Sensu is still young (v0.12), it is mature enough to tackle the majority of monitoring problems. Recently I started migrating all our monitoring from traditional Nagios to Sensu, and the really nice thing about Sensu is that we can use Nagios plugins with it directly. That makes migration easy; we don't have to rebuild all the checks to work with Sensu. But most people's complaint was about Sensu's default dashboard. Though the dashboard doesn't look fancy, it can do all the basic functions. Still, a good dashboard which can display the current status is always a time saver.

So I had been searching for a good dashboard, and the first result on Google was Sensu-Admin, a Rails project which needs a backend DB. I was not satisfied with it, so I started looking for something different. The second choice was Sabisu. Sabisu uses Cloudant's hosted CouchDB with Lucene. We just need to store all the events in a Redis list, and a custom script reads the data from that list and pushes it to Cloudant's CouchDB. So we basically need a Cloudant account; the web app makes Lucene queries against the Cloudant DB and displays the results on the Sabisu dashboard. I tried to rebuild the same setup locally, running CouchDB + Lucene and sending the same data to the local CouchDB. With some code hacks I was able to make the web app talk to my local CouchDB and display the results on the dashboard.

But then I found a super cool dashboard project called UCHIWA, started on GitHub a few days back by Simon Plourde. Uchiwa is a simple dashboard built with NodeJS that uses Socket.IO for real-time updates. The screenshots looked super cool, and I decided to give it a try. It has only one dependency, NodeJS; no backend DB is required, as it talks to Sensu's API in real time.

Setting Up NodeJS

For Ubuntu, we can use chris-lea's PPA.

apt-get install python-software-properties        # required for "apt-add-repository" binary

apt-add-repository ppa:chris-lea/node.js

apt-get update

apt-get install nodejs

Now that we have the latest NodeJS on our system, we can start setting up Uchiwa.

Setting Up Uchiwa Dashboard

Uchiwa's source is available on GitHub.

git clone https://github.com/palourde/uchiwa.git

Once cloned, the repository contains the "package.json" file, which lists the necessary dependencies. We can use "npm" (the Node package manager) to install all of these.

cd uchiwa

npm install

Now we need to create a config file for the app; there is a sample config file available in the repo. We need to mention the Sensu API's IP and port number, any auth credentials for it, plus auth credentials for accessing the Uchiwa dashboard page.

Once all these are set, we are done. We just need to start the service.

node app.js 

We can access the page via http://localhost:3000/, or we can proxy pass from the webserver; instructions for Nginx are available in the project's README. Now we need to keep this app running all the time, so it's better to create an init/upstart job for it, so that the process starts automatically when the system reboots. There is a cool Node project called forever, a simple CLI tool for ensuring that a given script runs continuously.
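forever installs globally through npm (assuming the NodeJS setup from earlier):

npm install -g forever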

I've created an upstart script for Uchiwa by putting a conf file "uchiwa.conf" in the "/etc/init" directory. Below is the content of the conf file. Once the file is in place, we have to reload the upstart configuration; initctl reload-configuration will do the trick.

description "uchiwa - dashboard for sensu"
env APP_PATH="/usr/local/uchiwa/"

start on startup
stop on shutdown

script
  cd $APP_PATH
  exec forever start app.js
end script

Uchiwa looks pretty cool and neat, and it also has stash support. There are a couple of features still missing, like "Downtime", but Uchiwa is a pretty new project and I'm sure it is going to grow soon. It has already received 99 stars on GitHub. Kudos to Simon Plourde for open sourcing this awesome project.


Extending ELK Stack to VOIP Infrastructure

Being a DevOps guy, I always love metrics. Visualized metrics give a good picture of what's happening in our live battle stations. There are now quite a lot of open source tools for monitoring and visualizing. It's been more than a year since I started using Logstash, and it has never let me down. Elasticsearch-Logstash-Kibana (ELK) is a killer combination. Though I started with Elasticsearch + Logstash as a log analyzer, StatsD and Graphite later took it to the next level. When we have a simple infrastructure it's easy to monitor, but when the infra starts scaling it becomes quite difficult to keep track of all the events happening inside each node. Service checks help, but they have their limits; I've faced a lot of scenarios where things break while the service checks stay fine. In such scenarios logs are the only hope; they have all these events captured.

At Plivo, we manage a variety of servers: SIP, media, proxy, web servers, DBs, etc. Being a fully cloud-based system, I really wanted something that keeps track of all the live events and the status of what's really happening inside our infra. So my plan was to collect two kinds of stats: 1) server events, 2) application events.

Collectd and Logstash

Collectd is a daemon which collects system performance statistics periodically. Since we have a lot of servers which handle real-time media, this is a very critical component for us: we need to ensure that the servers are not getting overloaded and that there is no latency in the network. I've been using Logstash heavily for stashing all my logs, and there is a stable input plugin for collectd to send all the system metrics to Logstash.

First we need to enable the network plugin, and then mention our Logstash server's IP and port so that collectd can start shipping metrics. Below is a sample collectd configuration.

Hostname    "test.plivo.com"
Interval 10
Timeout 4
Include "/etc/collectd/filters.conf"
Include "/etc/collectd/thresholds.conf"
ReportStats true
LogLevel info
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
LoadPlugin network
<Plugin interface>
    Interface "eth0"
    IgnoreSelected false
</Plugin>
<Plugin network>
    Server "{logstash_server_ip}" "logstash_server_port"    # if no port number is mentioned, it will take the default port number (25826)
</Plugin>

Now, on the Logstash server, we need to add the collectd input plugin to the input section of Logstash's config file.

input {
  collectd {
    port => "5555"    # default port is 25826
  }
}

Now we are set. Based on the plugins enabled in the collectd config file, collectd will start sending the metrics to Logstash at the interval mentioned in the config (default 10s). In my case, I wanted the load, CPU usage, memory usage, bandwidth (TX and RX), etc. There are default plugins for all these metrics, which we can simply enable in the config file. We also had some custom plugins to collect some custom metrics; by the way, writing a custom plugin is pretty easy in collectd.
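As an illustration only (this is not one of our actual plugins), a minimal custom read plugin for collectd's Python interface looks roughly like this:

# /etc/collectd/plugin/custom_metric.py -- illustrative example
import collectd

def read_callback():
    value = 42  # replace with whatever you actually want to measure
    metric = collectd.Values(plugin='custom_metric', type='gauge')
    metric.dispatch(values=[value])

collectd.register_read(read_callback)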

Now, using Logstash's Elasticsearch output plugin, we can store these metrics in Elasticsearch, and this is where Kibana comes in. We can start visualizing these metrics via Kibana: we create a custom Lucene query, and once we have the query, we can create a custom histogram for each of them. Below are some sample Lucene queries that we can use with Kibana.

For Load -> collectd_type:"load" AND host:"test.plivo.com"
For Network usage -> collectd_type:"if_octets" AND host:"test.plivo.com"

Below is the screenshot of the histograms for load and network (TX and RX).

Log Events

Next is collecting the events from the application logs. We use the SIP protocol for all our VoIP sessions, so all our SIP servers are very critical for us. SIP is pretty similar to HTTP; the response codes closely mirror HTTP responses, i.e. 1xx, 2xx, 3xx, 4xx, 5xx, 6xx. So I wrote some custom grok patterns to keep track of all these responses and store them in Elasticsearch.
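The exact pattern depends on the SIP server's log format, so the snippet below is only a hypothetical sketch of the idea: match the SIP status line and pull the response code out as its own field.

filter {
  grok {
    type => "sip"
    # Hypothetical pattern -- adjust to your SIP server's actual log format.
    pattern => "SIP/2.0 %{NUMBER:sip_response_code} %{GREEDYDATA:sip_response_text}"
  }
}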

The second set of stats I was interested in was our SIP registrar server. We provide SIP endpoints to our customers so that they can use them with SIP/soft phones, so I was more interested in stats like the number of registrations per second and auth error rates. Plus, using Elasticsearch's map facets I can create a BetterMap; in my previous blog posts I've mentioned how to create these BetterMaps using Kibana and Elasticsearch. The BetterMap screenshot below shows the SIP endpoint registrations from various locations in the last 2 hours.

Now, using Kibana, we can start visualizing all this data. Below is a sample dashboard that I've created using Kibana.

The ELK stack proved to be an amazing combination. We are currently injecting 3 million events every day, and Elasticsearch has been blazingly fast at indexing all of these.


Near RealTime Dashboard with Kibana and Elasticsearch

Being in DevOps always means multitasking: regular customer queries, monitoring, troubleshooting, and so on. And of course when things break, it becomes hardcore multitasking. Especially with a rapidly scaling infrastructure, we really need to understand what's happening inside it. Yes, we have many new-generation cloud monitoring tools like Sensu, but what if we had a near real-time system that could tell us each and every event happening in our infrastructure? Logs are the best place to keep track of events, even those the monitoring tools have missed. We have a lot of log aggregation tools like Logstash, Splunk, Apache Kafka, etc., and for log-based event collection the common choice is almost always Logstash -> StatsD -> Graphite, with Elasticsearch for indexing.

My requirement was pretty straightforward: record the events, aggregate them, and keep track of them over time. Kibana uses Elasticsearch facets for aggregating the search query results; facets provide aggregated data based on a search query. So as a first task, I decided to visualize the location of users who register their SIP endpoints on our SIP registrar server. Kibana gives us a good interface for a 2D heat map as well as a newer option called BetterMap. BetterMap uses geographic coordinates to create clusters of markers on a map and shades them orange, yellow, and green depending on the density of the cluster. So from the logs, I extracted the register events and used custom regex patterns in Logstash to pull out details like the source IP and usernames. Using Logstash's GeoIP filter, the geo location of each IP can be identified. For the BetterMap, we need coordinates in GeoJSON format: GeoJSON is [longitude, latitude] in an array. From the geo locations identified by the GeoIP filter, we can create this GeoJSON for each event we receive. Below is a sample snippet that I've used in logstash.conf for creating the GeoJSON in Logstash.

if [source_ip]  {
    geoip {
      type => "kamailio-registers"
      source => "source_ip"
      target => "geoip"
      add_field => ["[geoip][coordinates]","%{[geoip][longitude]}"]
      add_field => ["[geoip][coordinates]","%{[geoip][latitude]}"]
    }
    mutate {
      convert => [ "[geoip][coordinates]", "float" ]
    }
  }

The above filter will create a GeoJSON array "geoip.coordinates", which can be used for creating the BetterMap in Kibana. Below are the settings for creating a BetterMap panel in the Kibana dashboard: while adding a new panel, select "bettermap" as the panel type, and the coordinate field should be the one which contains the GeoJSON data. Make sure that the data is of the format [longitude, latitude], i.e. longitude first, followed by latitude.

Moving ahead, I decided to collect the events happening on various other servers. We were one of the earliest companies to use SIP (Session Initiation Protocol). SIP employs design elements similar to the HTTP request/response transaction model, so similarly to web traffic, I decided to collect events related to 4XX, 5XX and 6XX error responses, as these are very important to us. Once the logs are shipped to Logstash, I wrote another custom grok pattern which extracts the error code and error response, including the server which returned it. This data can also be used for future analysis, so I decided to store it in Elasticsearch. Now we have the real-time event data stored, but how do we visualize it? Since I don't need to perform much mathematical analysis on the data, I decided to drop Graphite. Kibana has a wonderful GUI for visualizing the data, so I went ahead with Kibana. One option is the "histogram" panel type: using a histogram we can visualize the data as a regular bar graph or as an area graph. There is another panel type called "terms" which can display the aggregated events as a pie chart, bar chart, or table. Below is what I achieved with Kibana.

This is just an initial setup; I'm going to add more events to this. As of now, Kibana + Elasticsearch proves to be a promising combination for displaying the near real-time events happening in my infrastructure.


Event Monitoring Using Logstash + StatsD + Riemann

Being an ops guy, I love logs a lot. Logs contain lots of sensitive events recorded in them. Though a lot of people rely on monitoring tools, there are plenty of scenarios where we still can't rely on monitoring alone. In such scenarios, logs are the best source for identifying those events in near real time. A common scenario is web operations, where we need to count the various 4xx, 5xx, and auth errors experienced by users. I had a similar requirement: I needed to identify the 4xx, 5xx, 6xx errors and other similar failures on various SIP servers. But apart from just visualizing these errors, I also wanted a notification system which can notify me when a value crosses a threshold.

Logstash and StatsD are a perfect combination for aggregating events from the logs. StatsD has a Graphite backend, where it sends the aggregated metric values for visualizing. But when we have a large number of graphs, and of course being a multitasking ops guy, it's not possible to sit and watch all of them. So we need a notification system which alerts us when things start breaking. Here comes RIEMANN. Riemann aggregates events from your servers and applications with a powerful stream processing language, and it is a pretty lightweight, easy-to-configure monitoring framework. Logstash sends the filtered events from the logs to the StatsD output plugin; based on the flushInterval, StatsD iterates through the received events and sends the aggregated metric values to Graphite. There is also a Riemann output plugin for Logstash, but we need to pass the state/metric to it ourselves. In my case, Logstash only filters the events out of the log, so I would need to convert these events into time-based metric values myself. Since StatsD already converts these events into time series metrics, I decided to write a small backend for StatsD that can send these aggregated metrics to Riemann.
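For context, the Logstash side of this is just the statsd output plugin incrementing a counter per filtered event; a rough sketch, where the bucket name and the field are assumptions based on whatever my grok filter extracts:

output {
  statsd {
    host => "localhost"
    port => 8125
    increment => "sip.response.%{sip_response_code}"
  }
}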

The StatsD backend basically requires two main functions. One is "flush_stats", which gets invoked once the flush interval is reached; it iterates over the received metrics and passes the aggregated values to another function called "post_stats", which sends the metrics to the corresponding application, in our case Riemann. There is a riemann-node client, which we can use here for sending the metrics to the Riemann server. Below is the content of the "flush_stats" and "post_stats" functions. Currently I've added support only for counters; I'll be adding support for gauges and timers later.

flush_stats function
--------------------

var flush_stats = function riemann_flush(ts, metrics) {
  var counters = metrics.counters;
  var key;

  // Iterate over the aggregated counters and forward each one to Riemann.
  for (key in counters) {
    var value = counters[key];
    var valuePerSecond = value / (flushInterval / 1000); // calculate "per second" rate (not sent for now)

    var statString = value;      // raw counter value for this flush interval
    var service_name = key;      // the statsd bucket name becomes the Riemann service
    var time_stamp = ts;
    post_stats(statString, service_name, time_stamp);
  }
};


post_stats function
-------------------

var post_stats = function riemann_post_metrics(statString, service_name, time_stamp) {
  try {
    // Send a single event to the Riemann server via the riemann-node client.
    client.send(client.Event({
      service: service_name,
      metric:  statString,
      time: time_stamp
    }));
  } catch (e) {
    riemannStats.last_exception = Math.round(new Date().getTime() / 1000);
  }
};

So here I'm not going to send the per-second metrics. I'm using the default 10-second flushInterval, so every 10 seconds StatsD will send the incremented metrics to Riemann. The namespace, sender, etc. are defined in the Logstash conf itself. The full plugin file is available here.

To use this Riemann backend, first we need to copy this file into the backends folder of the StatsD repo. Then we need to enable this plugin in the StatsD config file. Below is a sample config file which uses both the Graphite and Riemann backends.

{
  riemannPort: 5555
, riemannHost: "localhost"
, graphitePort: 2003
, graphiteHost: "localhost"
, port: 8125
, backends: [ "./backends/riemann", "./backends/graphite" ]
}
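With both backends listed, StatsD is started the usual way, pointing it at this config file (the path below is illustrative):

node stats.js /path/to/riemannConfig.js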

So now StatsD will send the incremented and per-second metrics to Graphite, and the Riemann backend will send the incremented metric to the Riemann server. Now we can define the metric threshold and the notification method in the Riemann config file. Below is my Riemann metric threshold and notification.

(streams
  (where (>= metric 10)
    (where (service #"SIP")
      (email "deepakmdass88@gmail.com"))))
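For the email stream to exist, the Riemann config also needs a mailer defined above the streams; a minimal sketch using the stock mailer, where the from-address is just an example:

(def email (mailer {:from "riemann@example.com"}))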

With this stream in place, whenever a received metric value goes beyond 10, Riemann notifies me by email. I've done some dry testing with this setup, and so far it has never let me down. Though there are some tweaks still to be done, this setup really suits my requirement. Being an ops guy, my primary focus is to detect outages at a very early stage to minimize the impact, and I hope this will be an added layer of defence for the same.


Phone Call Notification for Sensu Using Plivo

It's almost 2 weeks since I joined Plivo's DevOps family. I was spending a lot of time understanding their APIs, as I'm new to telephony. Plivo's API makes telephony so easy that even a person with basic programming knowledge can build powerful telephony apps to suit their infrastructure. So when I started playing around with the API, I decided to take it to a different level. I'm a strong lover of the Sensu monitoring tool, so I decided to integrate Plivo with Sensu to build a new notification system based on phone calls. Many people rely on PagerDuty for this feature, but that part is completely managed by PagerDuty: for the alerts which we send to PagerDuty, they notify us via phone call to the phone numbers configured there. So I decided to build a similar handler for Sensu, so that we can have the same feature natively. In this blog I will explain how I achieved this with the Plivo framework.

Plivo provides Application Programming Interfaces (APIs) to make and receive calls, send SMS, make a conference call, and more. These APIs are used in conjunction with XML responses to control the flow of a call or a message. Developers can create Session Initiation Protocol (SIP) endpoints to perform telephony operations. The APIs are platform independent and can be used in any programming environment such as PHP, Ruby, Python, etc., and Plivo also provides helper libraries for these languages.

Here Sensu will initiate outbound calls to the required numbers using Plivo's Call API and will read out the alert using Plivo's text-to-speech engine. First we need an account on Plivo Cloud: go to the signup page and create an account. With the default credit we can make test calls, but for longer usage I suggest buying some credits to use the service extensively. Once we log in, the dashboard shows the Plivo AuthID and AuthToken, which are required to access Plivo's API. For making outbound calls, we use the Call API. We also need to provide an "answer_url" which contains XML instructions to direct the progress of the call; Plivo fetches the "answer_url" and executes the XML instructions. So I will be using Sinatra to create a web app that returns the XML. The main challenge is that the text mentioned in the XML needs to be retrieved from the alert, so we cannot predefine the text or the request URL, because they are dynamic in nature.

Solution: here I'm going to use a publicly accessible Redis server which is protected with a password. When the Sensu handler receives the alert, it stores the alert in Redis under a random UUID as the key, and the same UUID is used as the request path for the answer_url. When Sinatra receives the request, the requested path won't exist in Sinatra's routes, since it's dynamically generated, so by default Sinatra would return a 404 response. But Sinatra has an option called "not_found", where we can customize the response instead of returning 404 directly. So I will use the "not_found" option and, instead of directly returning a 404, make Sinatra talk to my Redis server. Since the UUID can be fetched from the request URL, and the same UUID is used as the key in Redis, Sinatra can get the details of the alert from Redis. These details are then used to create the XML, which is returned as the response to Plivo. It's worth going through Plivo's API documentation to understand more about XML responses and outbound calls.

For those who don't have a dedicated server to host the Sinatra app and Redis server, Heroku can be used to host the Sinatra app, and we can use one of the Redis add-ons available on Heroku.

Sinatra App

Below is the code for the Sinatra app. The "not_found" block is the one which talks to Redis.

require 'plivo'
require 'sinatra'
require 'redis'
require 'json'

get '/' do
  play_loop = 1
  lang = "en-US"
  voice = "WOMAN"
  text = "Congratulations! You just made a text to speech app on Plivo cloud!"
  speak_params = {
                   'loop' => play_loop,
                   'language' => lang,
                   'voice' => voice,
                 }

  speak = Plivo::Speak.new(text, speak_params)

  response = Plivo::Response.new()
  response.add(speak)
  return response.to_xml
end

not_found do
  # The request path (minus the leading "/") is the UUID stored in Redis by the Sensu handler.
  keyword = request.path[1..-1]
  redis = Redis.new(:host => "<redis_host_name>", :port => <redis_port>, :password => "<redis_pass>")
  data_red = redis.get("#{keyword}")
  if data_red == nil
    return "404 Not Found"
  else
    data = JSON.parse(data_red)
    text = data["text"]
    play_loop = 1
    lang = "en-US"
    voice = "WOMAN"
    speak_params = {
                     'loop' => play_loop,
                     'language' => lang,
                     'voice' => voice,
                   }

    speak = Plivo::Speak.new(text, speak_params)

    response = Plivo::Response.new()
    response.add(speak)
    return response.to_xml
  end
end

The above app handles all unmatched requests and queries the Redis server using the request path as the key. If a matching key exists, it generates the Plivo XML; otherwise it returns "404 Not Found".
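To sanity-check the app locally, you can seed a key in Redis by hand and hit the corresponding path with curl (4567 is Sinatra's default port; the UUID below is just a placeholder):

curl http://localhost:4567/<uuid>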

Plivo Handler

Below is the Plivo handler code. This is the initial version and needs to be tweaked to suit Sensu's coding standards; a settings option also needs to be added so that all the parameters can be provided in the handler's JSON file. But as of now, this handler has been tested with Sensu and it is working perfectly fine.

require 'rubygems'
require 'sensu-handler'
require 'timeout'
require 'securerandom'
require 'plivo'
require 'redis'
require 'json'


class PLIVO < Sensu::Handler

  def short_name
    @event['client']['name'] + '/' + @event['check']['name']
  end

  def action_to_string
    @event['action'].eql?('resolve') ? "RESOLVED" : "ALERT"
  end

  def handle
    plivo_auth_id = "XXXXXXXXXXXXXXXX"          # => available at the plivo dashboard
    plivo_auth_token = "XXXXXXXXXXXXXXXX"       # => available at the plivo dashboard
    body = <<-BODY.gsub(/^ {14}/, '')
            #{@event['check']['output']}
            Host: #{@event['client']['name']}
           BODY
    uuid = SecureRandom.hex
    r = Redis.new(:host => "<redis_host_name>", :port => <redis_port>, :password => "<redis_pass>")
    temp = {
            'text' => "#{body}"
           }
    plivo_number = "<YOUR PLIVO NUMBER>"
    to_number = "<DESTINATION NUMBER>"
    answer_url = "<YOUR HEROKU APP URL>/#{uuid}"
    call_params = {
                    'from' => plivo_number,
                    'to' => to_number,
                    'answer_url' => answer_url,
                    'answer_method' => 'GET'
                  }
    # Store the alert text in Redis under the UUID so the Sinatra app can fetch it.
    r.set "#{uuid}", temp.to_json
    r.expire "#{uuid}", <EXPIRY TTL>
    sleep 5
    begin
      Timeout.timeout(10) do
        puts "Connecting to Plivo Cloud.."
        plivo = Plivo::RestAPI.new(plivo_auth_id, plivo_auth_token)
        details = plivo.make_call(call_params)
      end
    rescue Timeout::Error
      puts "timed out while attempting to contact Plivo Cloud"
    end
  end
end

The above handler will make an outbound call to the destination number, and Plivo's text-to-speech engine will read out the alert summary on the call. If you want to call multiple users, simply create an array or hash with the contact numbers and iterate over it.
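Wiring the handler into Sensu is the standard pipe handler definition; a minimal sketch, assuming the script above is saved as /etc/sensu/handlers/plivo.rb (the path is just an example):

{
  "handlers": {
    "plivo": {
      "type": "pipe",
      "command": "/etc/sensu/handlers/plivo.rb"
    }
  }
}

Any check that lists "plivo" in its handlers array will then trigger the call.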

This handler gives us the same phone call notification feature as PagerDuty; we just need to pay for the phone call usage. Dedicated to all those who want an affordable yet good notification system with Sensu. I will be merging my code into the sensu-community repository soon.


Real Time Web-Monitoring Using Lumberjack-Logstash-Statsd-Graphite

For the last few days I was playing around with two of my favourite tools, Logstash and StatsD. Logstash, StatsD, and Graphite together make a killer combination, so I decided to test this combination along with Lumberjack for real-time monitoring. I'm going to use Lumberjack as the log shipper from the webserver; Logstash will then stash the logs properly, and using the statsd output plugin I will ship the metrics to Graphite. In my previous blog, I've explained how to use Lumberjack with Logstash. Lumberjack will be watching my test web server's access logs.

By default, I'm using the combined Apache log format, but it doesn't include the per-request response time or the total response time. So we need to modify the LogFormat in order to add the two. Below is the LogFormat which I'm using for my test setup.

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" %D %>D" combined

Once the LogFormat is modified, restart the Apache service for the change to take effect.

Setting up Logstash Server

First, download the latest Logstash jar file from the Logstash site. Now we need to create a Logstash conf file. By default there is a grok pattern available for Apache logs called "COMBINEDAPACHELOG", but since we have added the two new fields for the response time, we need to extend the grok pattern as well. Below is the pattern which is going to be used with Logstash.

pattern => "%{COMBINEDAPACHELOG} %{NUMBER:resptime} %{NUMBER:resptimefull}"

So the Logstash conf file will look like this,

input {
  lumberjack {
    type => "apache-access"
    port => 4444
    ssl_certificate => "/etc/ssl/logstash.pub"
    ssl_key => "/etc/ssl/logstash.key"
  }
}

filter {
  grok {
    type => "apache-access"
    pattern => "%{COMBINEDAPACHELOG} %{NUMBER:resptime} %{NUMBER:resptimefull}"
  }
}

output {
  stdout {
    debug => true
  }
  statsd {
    type => "apache-access"
    host => "localhost"
    port => 8125
    debug => true
    timing => [ "apache.servetime", "%{resptimefull}" ]
    increment => "apache.response.%{response}"
  }
}

Setting up STATSD

Now we can start setting up the StatsD daemon. Ubuntu's latest release ships with a recent version of NodeJS and npm, so we can install them using APT/Aptitude.

$ apt-get install nodejs npm

Now clone the StatsD github repository to the local machine.

$ git clone git://github.com/etsy/statsd.git

Now create a local config file “localConfig.js” with the below contents.

{
graphitePort: 2003
, graphiteHost: "127.0.0.1"
, port: 8125
}

Now we can start the StatsD daemon.

$ node /opt/statsd/stats.js /opt/statsd/localConfig.js

The above command will start StatsD in the foreground. Now we can go ahead with setting up Graphite.

Setting up Graphite

First, let’s install the basic python dependencies.

$ apt-get install python-software-properties memcached python-dev python-pip sqlite3 libcairo2 libcairo2-dev python-cairo pkg-config

Then, we can start installing Carbon and Graphite dependencies.

cat >> /tmp/graphite_reqs.txt << EOF
django==1.3
python-memcached
django-tagging
twisted
whisper==0.9.9
carbon==0.9.9
graphite-web==0.9.9
EOF

$  pip install -r /tmp/graphite_reqs.txt

Now we can configure Carbon.

$ cd /opt/graphite/conf/

$ cp carbon.conf.example carbon.conf

Now we need to create a storage schema.

cat >> /tmp/storage-schemas.conf << EOF
# Schema definitions for Whisper files. Entries are scanned in order,
# and first match wins. This file is scanned for changes every 60 seconds.
# [name]
# pattern = regex
# retentions = timePerPoint:timeToStore, timePerPoint:timeToStore
[stats]
priority = 110
pattern = ^stats\..*
retentions = 10s:6h,1m:7d,10m:1y
EOF


$ cp /tmp/storage-schemas.conf /opt/graphite/conf/storage-schemas.conf

Also we need to create a log directory for graphite.

$ mkdir -p /opt/graphite/storage/log/webapp

Now we need to copy over the local settings file and initialize the database.

$ cd /opt/graphite/webapp/graphite/

$ cp local_settings.py.example local_settings.py

$ python manage.py syncdb

Fill in the necessary details including the super user details while initializing the database. Once the database is initialized we can start the carbon cache and graphite webgui.

$ /opt/graphite/bin/carbon-cache.py start

$ /opt/graphite/bin/run-graphite-devel-server.py /opt/graphite

Now we can access the dashboard using the URL "http://ip-address:8080". Once we have started the carbon cache, we can start the Logstash server.

$ java -jar logstash-1.1.13-flatjar.jar agent -f logstash.conf -v

Once Logstash has loaded all the plugins successfully, we can start shipping logs from the test webserver using Lumberjack. Since I've enabled the stdout plugin, I can see the output coming from the Logstash server, and we can start accessing the real-time graphs from the Graphite GUI. There are several alternatives to the Graphite GUI, like Graphene, Graphiti, Graphitus, and GDash. Anyway, Logstash-StatsD-Graphite proves to be a wonderful combination. Sorry that I could not upload any screenshots for now, but I will upload them soon.
