Monitoring, NodeJS, Sensu

UCHIWA – Awesome Dashboard for Sensu

It’s been more than a year since i’ve started playing with Sensu. It was one of coolest monitoring projects that i’ve ever worked with. Perfect for Cloud infrastructure and backed by a cool community. Though Sensu is still in Juvenile (v 0.12) state, its mature enough to tackle majority of the monitoring issues. Recently i started migrating all our monitoring from traditional Nagios to Sensu and the pretty cool thing with sensu is, we can directly use the Nagios plugins with Sensu. That’s an easy task for migration. We don’t have to rebuild all the check’s to make with Sensu. But most of the people outside was pointing to the sensu’s default dashboard. Though the dashboard doesn’t looks pretty fancy, it can do all the functions. But having a good dashboard which can display the current status is always a time saver.

So i was being searching for a good dashboard and the first choice on the google search was Sensu-Admin. A Rails project, which needs a backend DB. But still i was not satisfied with it, and i started looking out for something different. The second choice was sabisu. Sabisu uses Cloudant’s hosted Couchdb with Lucene. We just need to store all the events in a Redis List and a custom script which reads the data from the Redis List and pushes it to the Cloudant’s CouchDB. So we basically need a Cloudant account and the Webapp makes Lucene queries to the Cloudant DB and displays the results on the Sabisu dashboard. Though i tried to rebuild the same setup locally, like running a CouchDB+Lucene locally and sending the same data to the local couchdb. With some codehack’s i was able to make the webapp talk to my local CouchDB and display the results on the dashboard.

But then, I found a super cool dashboard project called UCHIWA which was started in Github a few days back by Simon Plourde. Uchiwa is simple dashboard built with NodeJS and uses SocketIO for real time updates. The screenshot’s looks super cool and i decided to give it a try. It has only one dependency, NodeJS, no backend DB’s required as it talks to the Sensu’s API in realtime.

Setting Up NodeJS

For Ubuntu, we can use the chris-lea’s PPA.

apt-get install python-software-properties        # required for "apt-add-repository" binary

apt-add-repository ppa:chris-lea/node.js

apt-get update

apt-get install nodejs

now we have the latest NodeJS on our system, we can start setting up Uchiwa.

Setting Up Uchiwa Dashboard

Uchiwa’s source is available in Github.

git clone https://github.com/palourde/uchiwa.git

Once cloned, the repository contains the “package.json” file which contains the list of necessary dependencies. We can use “npm” (node package manager) to install all these.

cd uchiwa

npm install

Now we need to create a config file for the app. There is a sample config file available in the repo. So we need to mention the Sensu’s API IP and Port number and also the auth credentials if any. Plus auth credentials for accessing Uchiwa Dashboard page.

once all these are set, we are done. We just need to start the service.

node app.js 

We can access the page via http://localhost:3000/, or we can proxy pass from the webserver. Instructions for Nginx is available on the Readme of the project. Now we need to keep this app running all the time. So it’s better to create a init/upstart process for the same, so that the process will start automatically when the system reboots. There is a cool Node project called forever which is a simple CLI tool for ensuring that a given script runs continuously.

I’ve created an upstart script for Uchiwa, bu putting a conf file “uchiwa.conf” in the “/etc/init” directory. Below is the content for the conf file. Once the file is in place, we have to do a reload of the upstart configuration. initctl reload-configuration will do the trick.

description "uchiwa - dashboard for sensu"
env APP_PATH = "/usr/local/uchiwa/"

start on startup
stop on shutdown

script
  cd $APP_PATH
  exec forever start app.js
end script

Uchiwa looks pretty cool and neat and it has the stash support also. There are couple of addons required like “Downtime”, but Uchiwa is a pretty new project and i’m sure that this project is gonna grow soon. It has already received 99 stars on the Github. Kudos to Simon Plourde for Open Sourcing this awesome project.

Advertisements
Standard
Elasticsearch, Kibana, logstash, Monitoring, Plivo, SIP, Ubuntu, Voip

Extending ELK Stack to VOIP Infrastructure

Being a DevOps guy, i always love metrics. Visualized metrics gives a good picture of what’s happening in our live battle stations. There are now a quite lot of Open Source tools for monitoring and visualizing. It’s more than a year since i’ve started using Logstash. It never turned me down. ElasticSearch-Logstash-Kibana (ELK) is a killer combination. Though i started Elasticsearch + Logstash as a log analyzer, later StatsD and Graphite took it to the next level. When we have a simple infrastructure it’s easy to monitor. But when the infra starts scaling, it becomes quite difficult to keep track of all the events happening inside each nodes. Though service checks can help, but there is still limitation for it. I faced a lot of scenarios where things breaks but service checks will be fine. Under such scenarios logs are the only hope. They have all these events captured.

At Plivo, we manage a variety of servers from SIP, Media, Proxy, WebServers, DB’s etc. Being a fully Cloud based system, i really wanted to have a system which can keep track of all the live events/status of what’s really happening inside our infra. So my plan was to collect two important stats, 1) Server’s events 2) Application events.

Collectd and Logstash

Collectd is a daemon which collects system performance statistics periodically. Since we have a lot Server’s which handle Realtime Media, it’s a very critical component for us. We need to ensure that the server’s are not getting overloaded and there is no latency in network. I’ve been using Logstash heavily for stashing all my logs. And there is a stable input plugin for collectd to send the all the system metrics to logstash.

First we need to enable the Network Plugin, and then we need to mention our Logstash server IP and port so that collectd can start injecting metrics. Below is a sample colectd configuration.

Hostname    "test.plivo.com"
Interval 10
Timeout 4
Include "/etc/collectd/filters.conf"
Include "/etc/collectd/thresholds.conf"
ReportStats true
    LogLevel info
LoadPlugin interface
LoadPlugin load
LoadPlugin memory
LoadPlugin network
<Plugin interface>
    Interface "eth0"
    IgnoreSelected false
</Plugin>
<Plugin network>
    Server "{logstash_server_ip}" "logstash_server_port"    # if no port number is mentioned, it will take the default port number (25826)
</Plugin>

Now on the Logstash server, we need to add the CollectD plugin on to the input filter in the logstash’s config file.

input {
      collectd {
      port => "5555"    # default port is 25826
      }
}

Now we are set. Based the plugins enabled in the collectd config file, collctd will start sending the metrics to Logstash on the Interval mentioned in the config, default is 10s. So in my case, i wanted the Load, CPU usage, Memory usage, Bandiwdth (TX and RX) etc. There are default plugins for all these metrics, which we can just enable it in the config file. We also had some custom plugins to collect some custom metrics. BTW writing custom plugin is pretty easy in Collectd.

Now using the Logstash’s Elasticsearch output plugin, we can keep these metrics in Elasticsearch. Now this where Kibana comes in. We can start visualizing these metrics via Kibana. We need to create a custom Lucene Query. Once we have the query, we can create a custom histogram’s for each of these queries. Below aresome sample Lucene queries that we can use with Kibana.

For Load -> collectd_type:"load" AND host:"test.plivo.com"
For Network usage -> collectd_type:"if_octets" AND host:"test.plivo.com"

Below is the screenshot of histogram for Load and Network (TX and RX)

Log Events

Now next is to collect the events from the application logs. We use SIP protocol for all our VOIP sessions. So all our SIP server’s are very critical for us. SIP is pretty similar to HTTP. The response codes are very similar to HTTP responses, ie 1xx, 2xx, 3xx, 4xx, 5xx, 6xx. So i wrote some custom grok patterns so keep track of all of these responses and stores the same on the Elasticsearch.

The second stats which i was interested was our SIP registrar server. We provide SIP endpoints to our customers so that they can use the same with SIP/Soft phones. So i was more interested on stats like Number of registrations/sec, Auth error rates. Plus using ElasticSearch’s MAP facet’s i can create BetterMap. In my previous blog post’s i’ve mentioned on how to create these bettermaps using Kibana and Elasticsearch. Below bettermap screenshot shows us the SIP endpoint registrations from various locations in the last 2 hours.

Now using the Kibana we can start visualizing all these data’s. Below is a sample of Dashboard that i’ve created using Kibana.

ELK stack proved to be an amazing combination. We are currently injecting 3 million events every day and ElasticSearch was blazingly fast in indexing all theses.

Standard