confd, Docker, etcd, serf

Scaling HA-Proxy with Docker, ConfD, Serf, ETCD

Containers, Microservices are the hottest topics in the IT world now. Container technology like LXC, Docker, rkt etc are being used heavily in production now. It's easy to use these technologies on stateless services, so even if a container is crashed, a new container can be launched within seconds and can start serving traffic. Containers are widely used for serving web traffics. Tools like Marathon, provides a HA-Proxy based service discovery, wherein it reloads the HA-Proxy service based on the start/stop of the containers in the Mesos cluster. A simplest similar method is to use etcD with Docker and Haproxy.

With this we can apply the same logic here also ie, when a container is launched, we can save the IP and Port that is being mapped from the HOST machine and the same can be updated dynamically to HA-Proxy config file using ConfD and reload the HA-Proxy service. Now imagine if a container is terminated due to some reason, the HA-Proxy host check will fail, but the entry will be still there in the haproxy.cfg. Since we will be launching new containers either manually or using frameworks like Marathon/Helios etc, there is no need to to keep the old entries in the haproxy.cfg as well as in the ETCD.

But we need a way to keep track of the containers which got terminated, so that we can remove those entries. Enter serf, Serf is a decentralized solution for cluster membership, failure detection, and orchestration. Serf relies on an efficient and lightweight gossip protocol to communicate with nodes. The Serf agents periodically exchange messages with each other in much the same way that a zombie apocalypse would occur: it starts with one zombie but soon infects everyone. Serf is able to quickly detect failed members and notify the rest of the cluster. This failure detection is built into the heart of the gossip protocol used by Serf.

We will be running Serf agents on each container. These serf clusters are isolated ie, we can control the joining members, thereby maintaining separate serf clusters. So if a Container goes down, other agents will receive this member-failed event, and we can define an event handler that can preform the key removal from the etcD which in turn invoke the confD. Then ConfD will remove the server entry from the HA-Proxy config file.

Now let's see this in action ;), i'm using Ubuntu 14.04 as the Base Host

Setting up HA-Proxy and Docker Web Containers

Let's install HA-Proxy on the base Host,

$ apt-get install haproxy

Enable the Ha-Proxy service in /etc/default/haproxy and start the service

$ /etc/init.d/haproxy start

Now install Docker container,

$ curl -sSL | sh

Once Docker is installed, lets pull an Ubuntu image from the Docker Registry

$ docker pull ubuntu:14.04

Setting up ETCD

Download the latest version of EtcD

$ curl -L -o etcd-v2.1.3-linux-amd64.tar.gz

$ tar xvzf etcd-v2.1.3-linux-amd64.tar.gz

Let's start the EtcD services,

$ ./etcd -name default --listen-peer-urls "," --listen-client-urls "," --advertise-client-urls "," --initial-advertise-peer-urls "," --initial-cluster 'default=,default='

where is my Host machine IP

Setting up ConfD

Let's setup the ConfD service. For ConfD, we basically need two files, 1) a toml file which contains the info like which keys to lookup in etcd, which etcd cluster lookup, path to the template file etc… and 2) a tmpl (template) file

Download the latest version of confd

$ wget

$ cp -rvf confd-0.10.0-linux-amd64 /usr/local/bin/

$ mkdir -p /etc/confd/{conf.d,templates}

Now lets create the tmpl and toml file

toml: /etc/confd/conf.d/myconfig.toml

src = "haproxy.cfg.tmpl"
dest = "/etc/haproxy/haproxy.cfg"
keys = [
reload_cmd = "/etc/init.d/haproxy reload"

tmpl: /etc/confd/templates/haproxy.cfg.tmpl

    log /dev/log    local0
    log /dev/log    local1 notice
    chroot /var/lib/haproxy
    user haproxy
    group haproxy

    log    global
    mode    http
    option    httplog
    option    dontlognull
        contimeout 5000
        clitimeout 50000
        srvtimeout 50000
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend localnodes
    bind *:80
    mode http
    default_backend nodes

backend nodes
    mode http
    balance roundrobin
    option forwardfor
{{range gets "/proxy/frontend/*"}}
    server {{base .Key}} {{.Value}} check

Start the service in foreground by running the below command,

$ confd -backend etcd -node --log-level="debug" -interval=10  # confd will perform lookup for every 10sec

For now, there is no key present in the ETCD, ConfD will perform md5sum lookup of the current file and expected state of the file, if there is any change it will perform updating the file. Below are the debug logs from the ConfD service

2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: INFO Backend set to etcd
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: INFO Starting confd
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: INFO Backend nodes set to
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: DEBUG Loading template resources from confdir /etc/confd
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: DEBUG Loading template resource from /etc/confd/conf.d/myconfig.toml
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: DEBUG Retrieving keys from store
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: DEBUG Key prefix set to /
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: DEBUG Using source template /etc/confd/templates/haproxy.cfg.tmpl
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: DEBUG Compiling source template /etc/confd/templates/haproxy.cfg.tmpl
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: DEBUG Comparing candidate config to /etc/haproxy/haproxy.cfg
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: INFO /etc/haproxy/haproxy.cfg has md5sum aa00603556cb147c532d5d97f90aaa17 should be fadab72f5cef00c13855a27893d6e39c
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: INFO Target config /etc/haproxy/haproxy.cfg out of sync
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: DEBUG Overwriting target config /etc/haproxy/haproxy.cfg
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: DEBUG Running /etc/init.d/haproxy reload
2015-09-07T07:09:48Z vagrant-ubuntu-trusty-64 ./confd[14378]: DEBUG " * Reloading haproxy haproxy\n   ...done.\n"

Configure the Docker Web Containers

Since in the end, our Containers will be running on a different host, HA-Proxy needs to know the HOST IP where the containers resides. If the host IP has a Public IP we can easily perform a lookup and get the public ip, in my current scenario, im running in a Vagrant VM, so i'll be passing the HOST IP via Docker ENV variable during startup. If we do a strict port mapping like 80:80 or 443:443, we wont be able to run more than one Web container in a single HOST. So if we go with random ports, our HA-Proxy needs to know these random ports. These port info will be stored in the ETCD key store, so ConfD can utilize the same.

let's start our First container,

  $ docker run --rm=true -it -p 9001:80 -p 8010:7373 -p 8011:5000 -h=web1 -e "FD_IP=" -e "FD_PORT=9001" -e "SERF_RPC_PORT=8010" -e "SERF_PORT=8011" ubuntu:apache

Where port 9000 is mapped to containers port 80, and port 8010 and 8011 for the Serf Clients, FD_IP is the host IP

Now, lets setup the services for the container

  $ apt-get update && apt-get install apache2

  $ wget

  $ unzip

Let's check the serf in foreground on a screen/tmux

$ ./serf agent -node=web2 -bind= -rpc-addr= -log-level=debug

If the Serf agent is starting fine, then lets go ahead and create an event handler for the event member-failed, which happens when a container is crashed. Below is a simple event handler script
import sys
import os
import requests
import json

base_url = ""   # Use any config mgmt tool or user-data script to populate this value for each host
base_key = "/v2/keys"
cluster_key = ["/proxy/frontend/", "/proxy/serf/", "/proxy/serf-rpc/"]    # EtcD keys that has to be removed

for line in sys.stdin:
    event_type = os.environ['SERF_EVENT']
    if event_type == 'member-failed':
        address = line.split('\t')[0]
        for key in cluster_key:
            full_url = base_url + base_key + key + address
            print full_url
            r = requests.get(full_url)
            if r.status_code == 200:
                r = requests.delete(full_url)
                print "Key Successfully removed from ETCD"
            if r.status_code == 404:
                print "Key already removed by another Serf Agent"

Also we need a bootstrapping script that can add the necessary keys onto ETCD, when the container has started for the first time. This script has to be invoked every during the starting of the container, so ConfD is aware of the new Containers that have started. Below is a simple shell script for the same,
curl -s -X PUT "$HOST" -d value=$ETCD_VAL > /dev/null
exit_stat=`echo $?`
if [ $exit_stat == 0 ]; then
  curl -s -X POST --data-urlencode 'payload={"channel": "#docker", "username": "etcdbot", "text": "`Host: '"$HOST"'`, FD key has been added successfully to ETCD cluster", "icon_emoji": ":ghost:"}'
  curl -s -X POST --data-urlencode 'payload={"channel": "#docker", "username": "etcdbot", "text": "`Host: '"$HOST"'`, Failed to add FD key to the ETCD cluster", "icon_emoji": ":ghost:"}'
curl -s -X PUT "$HOST" -d value=$ETCD_SERF_PORT > /dev/null
exit_stat=`echo $?`
if [ $exit_stat == 0 ]; then
  curl -s -X POST --data-urlencode 'payload={"channel": "#docker", "username": "etcdbot", "text": "`Host: '"$HOST"'`, SERF PORT key has been added successfully to ETCD cluster", "icon_emoji": ":ghost:"}'
  curl -s -X POST --data-urlencode 'payload={"channel": "#docker", "username": "etcdbot", "text": "`Host: '"$HOST"'`, Failed to add SERF PORT key to the ETCD cluster", "icon_emoji": ":ghost:"}'
curl -s -X PUT "$HOST" -d value=$ETCD_SERF_RPC_PORT > /dev/null
exit_stat=`echo $?`
if [ $exit_stat == 0 ]; then
  curl -s -X POST --data-urlencode 'payload={"channel": "#docker", "username": "etcdbot", "text": "`Host: '"$HOST"'`, SERF RPC PORT key has been added successfully to ETCD cluster", "icon_emoji": ":ghost:"}'
  curl -s -X POST --data-urlencode 'payload={"channel": "#docker", "username": "etcdbot", "text": "`Host: '"$HOST"'`, Failed to add SERF RPC PORT key to the ETCD cluster", "icon_emoji": ":ghost:"}'

Now lets add the necessary keys to ETCD,

$ bash /opt/scripts/

Once the keys are added, lets start the serf agent wit the event handler script,

$ ./serf agent -node=web1 -bind= -rpc-addr= -log-level=debug -event-handler=/opt/scripts/

# Script output
==> Starting Serf agent...
==> Starting Serf agent RPC...
== > Serf agent running!
      Node name: 'web1'
      Bind addr: ''
        RPC addr: ''
      Encrypted: false
        Snapshot: false
        Profile: lan

==> Log data will now stream in as it occurs:

  2015/09/07 07:37:12 [INFO] agent: Serf agent starting
  2015/09/07 07:37:12 [INFO] serf: EventMemberJoin: web1
  2015/09/07 07:37:13 [INFO] agent: Received event: member-join

Now we have a running Web Container, since we have added the keys to ETCD, lets verify that ConfD has detected the changes. Below is the logs from the ConfD,

2015-09-07T07:40:17Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Retrieving keys from store
2015-09-07T07:40:17Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Key prefix set to /
2015-09-07T07:40:17Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Using source template /etc/confd/templates/haproxy.cfg.tmpl
2015-09-07T07:40:17Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Compiling source template /etc/confd/templates/haproxy.cfg.tmpl
2015-09-07T07:40:17Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Comparing candidate config to /etc/haproxy/haproxy.cfg
2015-09-07T07:40:17Z vagrant-ubuntu-trusty-64 ./confd[14970]: INFO /etc/haproxy/haproxy.cfg has md5sum fe4fbaa3f5782c7738a365010a5f6d48 should be fadab72f5cef00c13855a27893d6e39c
2015-09-07T07:40:17Z vagrant-ubuntu-trusty-64 ./confd[14970]: INFO Target config /etc/haproxy/haproxy.cfg out of sync
2015-09-07T07:40:17Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Overwriting target config /etc/haproxy/haproxy.cfg
2015-09-07T07:40:17Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Running /etc/init.d/haproxy reload
2015-09-07T07:40:17Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG " * Reloading haproxy haproxy\n   ...done.\n"
2015-09-07T07:40:17Z vagrant-ubuntu-trusty-64 ./confd[14970]: INFO Target config /etc/haproxy/haproxy.cfg has been updated

Also, lets confirm that the haproxy.cfg file has been updated,

# Updated part from the file
backend nodes
    mode http
    balance roundrobin
    option forwardfor

    server web1 check    # ConfD has updated the server entry based on the key present in the EtcD

Now let's add the second Web container using the steps we followed for the first container, make sure that the ports are updated on the run script (For the second container, im using 9001 as the web port, 8020/8021 as the Serf ports). Once the bootstrapping scrip is executed successfully, lets verify if the haproxy.cfg file is update with the second entry.

backend nodes
    mode http
    balance roundrobin
    option forwardfor

    server web1 check

    server web2 check

For now, the serf agents on both the containers are running as a standalone service and they are not aware of each other. Lets execute serf join command to join the cluster. Below is a simple python script for the same.

import sys
import os
import etcd
import random
import subprocess

node_keys = []
client = etcd.Client(host='', port=4001)

r ='/proxy/serf-rpc', recursive = True)
for child in r.children:
node_key = (random.choice(node_keys))   # Pick one server randomly
remote_serf =
print remote_serf.value
local_serf =  os.environ['FD_IP'] + ":" + os.environ['SERF_PORT']
serf_args = "-rpc-addr=%s %s" %(remote_serf.value, local_serf)
print["/serf", "join", "-rpc-addr=%s" %remote_serf.value, local_serf])

Run the join script on the second container and check for the events on the first container's serf agent logs

$ /opt/scripts/

Below are the event logs on the first container,

2015/09/07 07:53:02 [INFO] agent.ipc: Accepted client:         # Connection from Web2
2015/09/07 07:53:02 [INFO] agent: joining: [] replay: false
2015/09/07 07:53:02 [DEBUG] memberlist: Initiating push/pull sync with:
2015/09/07 07:53:02 [INFO] serf: EventMemberJoin: web2
2015/09/07 07:53:02 [INFO] agent: joined: 1 nodes
2015/09/07 07:53:02 [DEBUG] serf: messageJoinType: web2

Now both the agents are connected, lets verify the same

./serf members     # we can run this command on any one of the containers

web2  alive
web1  alive

Our Serf Cluster is up and running. Now lets terminate one container and see if Serf is able to detect the event. Once the serf detects the event it will call the event_handler script which should remove the keys from the etcD. In this case, i'm terminating the second container. We could see the event popping up in the first container's serf agent logs.

2015/09/07 07:57:37 [INFO] memberlist: Suspect web2 has failed, no acks received
2015/09/07 07:57:39 [INFO] memberlist: Suspect web2 has failed, no acks received
2015/09/07 07:57:41 [INFO] memberlist: Suspect web2 has failed, no acks received
2015/09/07 07:57:42 [INFO] memberlist: Suspect web2 has failed, no acks received
2015/09/07 07:57:42 [INFO] memberlist: Marking web2 as failed, suspect timeout reached
2015/09/07 07:57:42 [INFO] serf: EventMemberFailed: web2
2015/09/07 07:57:42 [INFO] serf: attempting reconnect to web2
2015/09/07 07:57:43 [INFO] agent: Received event: member-failed
2015/09/07 07:57:43 [DEBUG] agent: Event 'member-failed' script output:
Key Successfully removed from ETCD
Key Successfully removed from ETCD
Key Successfully removed from ETCD

As per the event_handler script output, the key has been successfully removed from the ETCD. Lets check the confd logs

# Confd logs
2015-09-07T07:57:47Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Retrieving keys from store
2015-09-07T07:57:47Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Key prefix set to /
2015-09-07T07:57:47Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Using source template /etc/confd/templates/haproxy.cfg.tmpl
2015-09-07T07:57:47Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Compiling source template /etc/confd/templates/haproxy.cfg.tmpl
2015-09-07T07:57:47Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Comparing candidate config to /etc/haproxy/haproxy.cfg
2015-09-07T07:57:47Z vagrant-ubuntu-trusty-64 ./confd[14970]: INFO /etc/haproxy/haproxy.cfg has md5sum fadab72f5cef00c13855a27893d6e39c should be df9f09eb110366f2ecfa43964a3c862d
2015-09-07T07:57:47Z vagrant-ubuntu-trusty-64 ./confd[14970]: INFO Target config /etc/haproxy/haproxy.cfg out of sync
2015-09-07T07:57:47Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Overwriting target config /etc/haproxy/haproxy.cfg
2015-09-07T07:57:47Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG Running /etc/init.d/haproxy reload
2015-09-07T07:57:47Z vagrant-ubuntu-trusty-64 ./confd[14970]: DEBUG " * Reloading haproxy haproxy\n   ...done.\n"
2015-09-07T07:57:47Z vagrant-ubuntu-trusty-64 ./confd[14970]: INFO Target config /etc/haproxy/haproxy.cfg has been updated

# haproxy.cfg file

backend nodes
mode http
balance roundrobin
option forwardfor

server web1 check

Bingo 🙂 , confd has detected the key removal and it has updated the haproxy.cfg file :). We can even remove the bootstrap script and can use Serf event handlers to populate the key when the member-join event is triggered

With ConfD, Serf, ETCD we can keep our service config files up to date with any human intervention. No matter if we scale up or scale down our Docker containers, our system will get updated automatically :). Put an option to log the events onto Slack and that's it, we wont miss any events and our whole team can keep track of what's happening under the hood

Docker, Jenkins

Virtual Cluster Testing Using Jenkins and Docker

Nowadays CI or Conitnous Integration is being implemented in almost all IT companies. Many of the DevOps work’s are in related to the CI. The common scenario is, Developers push the codes to the GIT/SVN repo and triggers jenkins to perform tests and sometimes packaging, and if it’s a fuly automated system the new changes are deployed to the staging. And the QA team takes over the testing part. But when you are in small team, all these has to be achieved with the minimal team. So before the new change is completely pushed to staging, i decided to have a simple testing of all the components quickly. I read about blogs where many DevOps engineers spins up new instances like a full replica of their entire architecture and performs the new code deployment and load test on this new cluster and if all the components are behaving properly with the new code change, it’s then further deployed to Staging for next level of full scale QA.

Though the above step seems to be interesting, i didn’t want to waste up resources by spinnig up a new set of instances each time. Being a hardcore Docker fan, i decided to replace the instance lauch iwth Docker containers. So instead of launching ne instances, Jenkins will launch new Docker containers with SDN(Software Defined Network). Below is simple architecture diagram of my new design.

So the work flow goes like this,

1) Developers pushes the new code changes along with the new Tag to the corresponding Repositories.

2) Github webhook then triggers jenkins to start the Build jobs.

3) Jenkins performs the build and if the build succeeds, jenkins triggers Debian pacakging for the application.

4) Once the packaging is completed, Jenkins will trigger Docker image creation for the corresponding application using the newly build packages.

5) Once the image build is completed, Jenkins uses Docker Compose to build our Virtual clusters which is an exact replica of our Prod/Staging.

6) Once the cluster is up, we perform automated testing of all our components and makes sure that the components are behaving normally with the new code changes.

Now once the test results are normal, we can initiate the code deployment to staging and can start the full scale QA.

By using Docker, i was able to reduce the resource usage. All these containers are running on a Single M3.Medium box. Sice i’m concentrating more on the components working part and not on the load test side, with this smaller box i was able to achieve my results properly.

A bit about docker-compose. I’m using docker-compose for managing the docker cluster. Compose is a tool for defining and running complex applications with Docker. With Compose, we can define a multi-container application in a single file, then spin our applications up in a single command which does everything that needs to be done to get it running. Below is my docker-compose yml file content.

    image:        web:latest
      - redis
      - "8080:80"
      - ENV1
          - ENV2
    image:        my_redis:latest
      - ""
    image:        my_backend:latest
    net:          "host"

From the initial test results, i was very much satisfied. Now i’m planning to extend this setup to next level including a fully automated load test.

Docker, FreeSwitch, sipp, sippy_cup

Sippy_cup – FreeSwitch Load Test Simplified

Ever since the entry of Docker, everyone is busy porting their applications to Docker Containers. Now with the tools like Mesos, CoreOS etc we can easily achieve scalability also. @Plivo we always dedicate ourselves to play around such new technologies. In my previous blog posts, i’ve explained how to containerize the Freeswitch, how to perform some basic load test using simple dialplans etc. My previous load tests required a bunch of basic Freeswitch servers to originate calls to flood the calls to the FreeSwitch container. So this time i’m going to use a simple method, which everyone can use even from their laptops.

Enter SIPp. SIPp is a free Open Source test tool / traffic generator for the SIP protocol. But the main issue for beginer like me is in generating a proper XML for SIPp that can match to my exact production scenarios. After googling, i came across a super simple ruby wrapper over SIPp called sippy_cup. SIPpy_cup is a simple ruby wrapper over SIPp. We just need to create a simple yaml file and sippy_cup parses this yml file and generates the XML equivalent which will be then used to generate calls. sippy_cup can also be used to generate only the XML file for SIPp.

Setting up sippy_cup is very simple. There are only two dependencies

      1) ruby (2.1.2 recomended)
      2) SIPp

Another important dependency is our local internet bandwidth. Flooding too many calls will definitely result in network bottlenecks, which i faced when i generated 1k calls from my laptop. Now let’s install SIPp.

sudo apt-get install pcaputils libpcap-dev libncurses5-dev

wget ''

tar zxvf sipp.svn.tar.gz

# compile sipp

# compile sipp with pcapplay support
make pcapplay

Once we have installed SIPp and ruby, we can install sippy_cup via ruby gems.

gem install sippy_cup

Configuring sippy_cup

First we need to create yml file for our call flow. There is a good documentation available on the Readme on various options that can be used to create the yml to suit to our call flow. My call flow is pretty simple, i’ve a DialPlan in my Docker FS, which will play an mp3 file. So below is a simple yml config for this call flow

source: <local_machine_ip>
destination: <docker_fs_ip>:<fs_port>
max_concurrent: <no_of_concurrent_calls>
calls_per_second: <calls_per_second>
number_of_calls: <total_no_of_calls>
to_user: <to_number>            # => should match the FS Dialplan
steps:                      # call flow steps
  - invite                  # Initial Call INVITE
  - wait_for_answer         # Waiting for Answer, handles 100, 180/183 and finally 200 OK
  - ack_answer              # ACK for the 200 OK
  - sleep 1000              # Sleeps for 1000 seconds
  - send_bye                # Sends BYE signal to FS

Now let’s run sippy_cup using our config yml

sippy_cup -r test.yml

Below is the output of a sample load test. Total 20 calls with 10 concurrent calls

         INVITE ---------->      20        1         0
         100 <----------         20        0         0         0
         180 <----------         0         0         0         0
         183 <----------         0         0         0         0
         200 <----------  E-RTD1 20        0         0         0
         ACK ---------->         20        0
              [ NOP ]
         Pause [    30.0s]       20                            0
         BYE ---------->         20        0
------------------------------ Test Terminated --------------------------------

----------------------------- Statistics Screen ------- [1-9]: Change Screen --
  Start Time             | 2014-10-22   19:12:40.494470 1414030360.494470
  Last Reset Time        | 2014-10-22   19:13:45.355358 1414030425.355358
  Current Time           | 2014-10-22   19:13:45.355609 1414030425.355609
  Counter Name           | Periodic value            | Cumulative value
  Elapsed Time           | 00:00:00:000000           | 00:01:04:861000
  Call Rate              |    0.000 cps              |    0.308 cps
  Incoming call created  |        0                  |        0
  OutGoing call created  |        0                  |       20
  Total Call created     |                           |       20
  Current Call           |        0                  |
  Successful call        |        0                  |       20
  Failed call            |        0                  |        0
  Response Time 1        | 00:00:00:000000           | 00:00:01:252000
  Call Length            | 00:00:00:000000           | 00:00:31:255000
------------------------------ Test Terminated --------------------------------

I, [2014-10-22T19:13:45.357508 #17234]  INFO -- : Test completed successfully!

I tried to perform a large scale load test by making 1k calls with 250 concurrent calls. My local internet was flooding with network traffic as there was real Media packets coming from the servers, though it bottlenecked my internet, but still i was able to make 994 successfull calls. I suggest to do such heavy load test on machines wich has good network throughput. Below are the output for this test.

------------------------------ Scenario Screen -------- [1-9]: Change Screen --
  Call-rate(length)   Port   Total-time  Total-calls  Remote-host
   5.0(0 ms)/1.000s   8836     585.61 s         1000

  Call limit reached (-m 1000), 0.507 s period  1 ms scheduler resolution
  6 calls (limit 250)                    Peak was 176 calls, after 150 s
  0 Running, 8 Paused, 1 Woken up
  604 dead call msg (discarded)          0 out-of-call msg (discarded)
  3 open sockets
  1490603 Total RTP pckts sent           0.000 last period RTP rate (kB/s)

                                 Messages  Retrans   Timeout   Unexpected-Msg
         INVITE ---------->      1000      332       0
         100 <----------         954       53        0         0
         180 <----------         0         0         0         0
         183 <----------         0         0         0         0
         200 <------2014-10-22  19:19:23.202714 1414030763.202714: Dead call 990-17510@ (successful), 

received 'SIP/2.0 200 OK
Via: SIP/2.0/UDP;received=;branch=z9hG4bK-17510-990-8
From: "sipp" <sip:sipp@>;tag=990
To: <sip:14158872327@>;tag=9p6t351mvXZXg
Call-ID: 990-17510@
CSeq: 2 BYE
User-Agent: Plivo
Supported: timer, precondition, path, replaces
Conte----  E-RTD1        994       126       0         0
         ACK ---------->         994       126
              [ NOP ]
       Pause [    30.0s]         994                           0
         BYE ---------->         994       0
------------------------------ Test Terminated --------------------------------

----------------------------- Statistics Screen ------- [1-9]: Change Screen --
  Start Time             | 2014-10-22   19:15:29.941276 1414030529.941276
  Last Reset Time        | 2014-10-22   19:25:15.056475 1414031115.056475
  Current Time           | 2014-10-22   19:25:15.564038 1414031115.564038
  Counter Name           | Periodic value            | Cumulative value
  Elapsed Time           | 00:00:00:507000           | 00:09:45:622000
  Call Rate              |    0.000 cps              |    1.708 cps
  Incoming call created  |        0                  |        0
  OutGoing call created  |        0                  |     1000
  Total Call created     |                           |     1000
  Current Call           |        6                  |
  Successful call        |        0                  |      994
  Failed call            |        0                  |        0
  Response Time 1        | 00:00:00:000000           | 00:00:01:670000
  Call Length            | 00:00:00:000000           | 00:00:31:673000
------------------------------ Test Terminated --------------------------------

Sippy_cup is definitely a good tool for all beginers who finds really hard time to work around with SIPp XML’s. I’m really excited to see how Docker is going to contirbute to VOIP world.

Docker, Marathon, Mesos

Running Mesos With Native Docker Support

In my previous blog, i’ve explained how to manage Docker cluster using Mesos and Marathon. Now Mesos has released 0.20 with Native Docker support. Till Mesos 0.19, we had to use mesosphere’s Deimos for running Docker containers on Mesos slaves. Now from Mesos 0.20, Mesos has the ability to run Docker containers directly. As per Mesos Documentation we can run containers in two ways. 1) as a Task and 2) as an Executor. Currently Mesos 0.20 supports only the host (–net=host) Docker networking mode.

Install the latest Mesos 0.20 using the mesosphere’s Mesos Debian package. Once we have setup the Zookeeper and Docker, we need to make a few changes to enable the Mesos Native Docker support. We need to start all the Mesos-Slave with --containerizers=docker,mesos flag. For enabling this flag, we need to create a file containerizers in /etc/mesos-slave/containerizers with content “docker,mesos”. Mesos slave process, when starting up, will read the folder and enable this flag.

$ echo 'docker,mesos' > /etc/mesos-slave/containerizers

$ service mesos-slave restart

I was trying to start a container with Port 9999 via Marathon REST API. But the task was keep on failing. Up on investigating the logs, i found that the slave when sends the allocatable resource details to mesos master was total allocatable: cpus():0.8; mem():801; disk():35164; ports():[31000-31099, 31101-32000]. So the allowed port range was [31000-31099, 31101-32000]. So if we wnat to use any other custom port, we need to define the same in the /etc/mesos-slave folder by creating a config file.

    $ echo "ports(*):[31000-31099, 31101-32000, 9998-9999]" > "/etc/mesos-slave/resources"

Now, my new allocatable resource details sent to the Mesos master became total allocatable: ports():[31000-31099, 31101-32000, 9998-9998]; cpus():0.7; mem():801; disk():35164. The above config is crucial if we want to allocate our own custom range of ports. So this gave a good idea on how to control my resource allocation by creating custom configuration files. Now let’s restart Mesos-slave process.

    $ service mesos-slave restart

    $ ps axf | grep mesos-slave | grep -v grep
     1889 ?        Ssl    0:26 /usr/local/sbin/mesos-slave --master=zk://localhost:2181,localhost:2182,localhost:2183/mesos --log_dir=/var/log/mesos --containerizers=docker,mesos --resources=ports(*):[31000-31099, 31101-32000, 9998-9999]
     1900 ?        S      0:00  \_ logger -p -t mesos-slave[1889]
     1901 ?        S      0:00  \_ logger -p user.err -t mesos-slave[1889]

Now we have the Mesos slave running with Native Docker support. But the current stable version of Marathon still don’t support the native Docker feature of Mesos. So we need to setup Marathon 0.7 from scratch. I’ve written a blog on upgrading Marathon from 0.6.x to 0.7. Once we have setup the Marathon 0.7, we can start using Mesos with the Native Docker support. Let’s create a new App in Marathon via its REST API.

create a JSON file in the new container format. say ubuntu.json

    "id": "jackex",
    "container": {
        "docker": {
            "image": "ubuntu:14.04"
        "type": "DOCKER",
        "volumes": []
    "cmd": "while sleep 10; do date -u +%T; done",
    "cpus": 0.2,
    "mem": 200,
    "ports": [9999],
    "requirePorts": true,
    "instances": 1

Now, we can use the Marathon API to launch the Docker task,

$ curl -X POST -H "Content-Type: application/json" localhost:8080/v2/apps -d@ubuntu.json

{"id":"/jackex","cmd":"while sleep 10; do date -u +%T; done","args":null,"user":null,"env":{},"instances":1,"cpus":0.2,"mem":200.0,"disk":0.0,"executor":"","constraints":[],"uris":[],"storeUrls":[],"ports":[9999],"requirePorts":true,"backoffSeconds":1,"backoffFactor":1.15,"container":{"type":"DOCKER","volumes":[],"docker":{"image":"ubuntu:14.04"}},"healthChecks":[],"dependencies":[],"upgradeStrategy":{"minimumHealthCapacity":1.0},"version":"2014-08-31T16:03:50.593Z"}

$ docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
25ba3950a9e1        ubuntu:14.04        /bin/sh -c 'while sl   2 minutes ago       Up 2 minutes                            mesos-a19af637-ffb9-4d3c-8b61-778d47087ace

Mesos UI

Marathon UI

With the new Native Docker support, Mesos is becoming more friendly with Docker. And this is a great achievement for people like me who are trying to use Mesos and Docker in our Infrastructure. A big kudos to Mesos and Mesosphere team for their great work 🙂

Docker, Marathon, Mesos

Upgrading Marthon for Mesos Native Docker Support

A few days ago Mesos 0.20 version was released with native Docker support. Till that, Mesos Docker integration was performed using Mesosphere’s Deimos application. But with the new Mesos, there is no need for any external application for launching Docker containers in Mesos Slaves. As per Mesos Documentation we can run containers in two ways. 1) as a Task and 2) as an Executor. Currently Mesos 0.20 supports only the host (--net=host) Docker networking mode.

The current stable release of Marathon is 0.6X. The debian package provided by the Mesosphere also provides the 0.6x version. The container format has been completely changed in 0.7.x version. So we cannot use the 0.6.x version along with Mesos 0.20. More details are available in the Marathon Documentation

The current Mesos Master branch is version := "0.7.0-SNAPSHOT". So we need to compile Marathon from scratch. The only dependency for compiling Marathon is scala-sbt. Get the latest Debian package for sbt from here. The current sbt version is 0.13.5.

$ wget

$ dpkg -i sbt-0.13.5.deb

Now let’s clone the Marathon githib repo.

$ git clone && cd marathon

$ sbt assembly

$ ./bin/build-distribution  # for building jar file

The above build command will create an executable jar file “marathon-runnable.jar” under the “target” folder. We can use this jar file along with Marathon 0.6.x start script. If we check the upstart script of Marathon 0.6.x, it uses /usr/local/bin/marathon binary for starting the service. So we need to edit two lines in this file to use our latest version 0.7’s jar file.

First let’s move our jar file to say “/opt/ folder.

$ cp marathon-runable.jar /opt/marathon.jar

Now let’s edit the Marathon binary. Make the below changes in the binary file,

marathon_jar="/opt/marathon.jar" # Line number 21


exec java "${vm_opts[@]}" -jar "$marathon_jar" "$@"  # -jar option added to use the jar file instead if the default -cp option. Line number 62

Now let’s restart the Marathon service.

$ service marathon restart

Now let’s use the ps command and verify the service status.

$ ps axf | grep marathon | grep -v grep

22653 ?        Ssl    6:09 java -Xmx512m -Djava.library.path=/usr/local/lib -Djava.util.logging.SimpleFormatter.format=%2$s %5$s%6$s%n -jar /opt/marathon.jar --zk zk://localhost:2181,localhost:2182,localhost:2183/marathon --master zk://localhost:2181,localhost:2182,localhost:2183/mesos
22663 ?        S      0:00  \_ logger -p -t marathon[22653]
22664 ?        S      0:00  \_ logger -p user.notice -t marathon[22653]

Now Marathon is running with the version 0.7 jar file. Let’s verify by creating a new Docker task. Make sure that the Mesos version 0.20 is running on the host and not the older versions of Mesos. More detail about the new container format is available in the Marathon Documentation page.

create a JSON file in the new container format. say ubuntu.json

    "id": "mesos-docker-test",
    "container": {
        "docker": {
            "image": "ubuntu:14.04"
        "type": "DOCKER",
        "volumes": []
    "cmd": "while sleep 10; do date -u +%T; done",
    "cpus": 0.2,
    "mem": 200,
    "ports": [9999],
    "requirePorts": false,
    "instances": 1

Now, we can use the Marathon API to launch the Docker task,

$ curl -X POST -H "Content-Type: application/json" localhost:8080/v2/apps -d@ubuntu.json

{"id":"/mesos-docker-test","cmd":"while sleep 10; do date -u +%T; done","args":null,"user":null,"env":{},"instances":1,"cpus":0.2,"mem":200.0,"disk":0.0,"executor":"","constraints":[],"uris":[],"storeUrls":[],"ports":[9999],"requirePorts":false,"backoffSeconds":1,"backoffFactor":1.15,"container":{"type":"DOCKER","volumes":[],"docker":{"image":"ubuntu:14.04"}},"healthChecks":[],"dependencies":[],"upgradeStrategy":{"minimumHealthCapacity":1.0},"version":"2014-08-31T16:03:50.593Z"}

$ docker ps
CONTAINER ID        IMAGE               COMMAND                CREATED             STATUS              PORTS               NAMES
25ba3950a9e1        ubuntu:14.04        /bin/sh -c 'while sl   2 minutes ago       Up 2 minutes                            mesos-a19af637-ffb9-4d3c-8b61-778d47087ace

Here i’ve used Port number 9999. But by default, in Mesos 0.20, default port range resourced by Mesos Slave is [31000-32000]. Please refer my next Blog post on how to modify and set a custom port range.

With the new Native Docker support, Mesos is becoming more friendly with Docker. And this is a great achievement for people like me who are trying to use Mesos and Docker in our Infrastructure. A big kudos to Mesos and Mesosphere team for their great work 🙂

Docker, Marathon, Mesos, Zookeeper

Managing HA Docker Cluster Using Multiple Mesos Masters

In my previous blog, i’ve described how to manage Docker cluster using Mesos and Marathon. It was using a single Mesos Master. But for production, we cannot go with a single Mesos master, as it will result in a sinlge point of failure. Mesos supports Multi master via Zookeeper. So in this blog i’m going to explain how to setup a Multi Master Mesos Cluster using Zookeper for Automatic promotion of Mesos master when the current Active Mesos Master fails. This time i’m going to setup 3 Zookeeper services and 2 Mesos Master on a single Vagrant box.

Setting up ZooKeeper Cluster

First lets download the the latest stable Zookeeper source code.

$ wget

Now extract the tar file and create 3 copies of the same, say zookeeper1, zookeeper2 and zookeeper3. Also Zookeeper needs Java on the machine, so let’s install java dependencies.

$ apt-get install openjdk-7-jdk openjdk-7-jre

Now inside, the extracted zookeeper source folder, we need to create a config file. So in our case, inside each Zookeeper folder, we need a zoo.cfg file.

For zookeeper1 folder,

# content of zoo.cfg

For zookeeper2 folder,

# content of zoo.cfg

For zookeeper3 folder,

    # content of zoo.cfg

Here, since the 3 zookeeper services are running on the same host, i’ve assigned separate ports. If the zookeeper are running on separate instances, then we can have the same ports for all of the zookeeper nodes. There are two port numbers. The first followers use to connect to the leader, and the second is for leader election.

Now we can start the Zookeeper in Foreground using the ./bin/ start-foreground from each of the 3 zookeeper folders. netstat -nltp | grep 288 will display the port of the zookeeper service which is the current master among the cluster. We can also check the connectivity using the zookeepr client binary available in the zookeeper source folder. Once zookeeper cluster is UP, we can go ahead setting up Mesos.

Setting up Multiple Mesos Masters

Mesossphere team has already built packages for Mesos,Marathon and Deimos. So one of my Mesos master will be from the package. But i also wanted to play with the Mesos Source, so i decided to build Mesos from the source. So my second Mesos Master will be from the scratch.

First Master from the Mesosphere package.

$ apt-key adv --keyserver --recv E56151BF

$ DISTRO=$(lsb_release -is | tr '[:upper:]' '[:lower:]')

$ CODENAME=$(lsb_release -cs)

# Add the repository
$ echo "deb${DISTRO} ${CODENAME} main" | sudo tee /etc/apt/sources.list.d/mesosphere.list

$ apt-get -y update

$ apt-get -y install mesos marathon deimos

Now we need to define the Zookeeper cluster details so that Mesos master and slave can connect. Edit /etc/mesos/zk and add zk://localhost:2181,localhost:2182,localhost:2183/mesos. More details about Zookeeper URL is available here. Also More details about installing Mesos from Mesosphere’s Debian package is available in their website.

So now we have one master ready, we can start the Mesos Master, Mesos Slave and Marathon. ANd esnure that they can connect to our Zookeeper cluster properly. We can also see the connection status in the stdout of the zookeeper process as they are running in foreground.

Now building the second Mesos master from scratch. First let’s download the Mesos Source Code.

$ wget

$ tar xvzf mesos-0.19.0.tar.gz && cd mesos-0.19.0

$ ./bootstrap && ./configure --prefix=/usr/local/mesos

$ make && make install

Once the Compilation is succeeded, we can start the new Mesos Master service. The default port 5050 is used by the existing master, so we need to run this new service on a different port.

$ /usr/local/mesos/sbin/mesos-master --zk=zk://localhost:2181,localhost:2182,localhost:2183/mesos --port=5054 --quorum=1 --registry=in_memory --work_dir=/var/lib/mesos/

Once the new Mesos Master is up, we can create a test container via Marathon Rest API. Create a simple json file called ubuntu.json

    "container": {
    "image": "docker:///libmesos/ubuntu",
    "options" : []
  "id": "ubuntu",
  "instances": "1",
  "cpus": ".3",
  "mem": "200",
  "uris": [ ],
  "cmd": "while sleep 10; do date -u +%T; done"

$ curl -X POST -H "Content-Type: application/json" localhost:8080/v2/apps -d@ubuntu.json

This will launch a single container. We can check the status of the container via the Marathon UI as well as via RestAPI also. Now comes the critical part for Production, What happens when the master fails. Mesos Documentation says, if the current master fails, in a Multi master Mesos Cluster, Zookeeper will elect a new Master, in our case the second Mesos master service. And Mesos Claims that the running services will not get crashed and the status will be taken by the newly promoted master. In our case, we have a Docker container running as a long running service.

First i’ll stop one of the Mesos master service,

$ service mesos-master stop

Now immidiately on the stdout of the Zookeeper, we can see the logs corresponding to the Election process. Below is the same,

I0814 19:41:07.889044  9093 contender.cpp:243] New candidate (id='7') has entered the contest for leadership
2014-08-14 19:41:07,896:9088(0x7ff8e2ffd700):ZOO_INFO@check_events@1750: session establishment complete on server [], sessionId=0x347d606d0d90003, negotiated timeout=10000
I0814 19:41:07.899509  9092 group.cpp:310] Group process ((10)@ connected to ZooKeeper
I0814 19:41:07.900049  9092 group.cpp:784] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0814 19:41:07.900069  9092 group.cpp:382] Trying to create path '/mesos' in ZooKeeper
I0814 19:41:07.915621  9092 detector.cpp:135] Detected a new leader: (id='6')
I0814 19:41:07.915699  9092 group.cpp:655] Trying to get '/mesos/info_0000000006' in ZooKeeper
I0814 19:41:07.920266  9094 network.hpp:423] ZooKeeper group memberships changed
I0814 19:41:07.920910  9094 group.cpp:655] Trying to get '/mesos/log_replicas/0000000003' in ZooKeeper
I0814 19:41:07.924144  9095 network.hpp:461] ZooKeeper group PIDs: { log-replica(1)@ }
I0814 19:41:07.926575  9092 detector.cpp:377] A new leading master (UPID=master@ is detected
I0814 19:41:07.926679  9092 master.cpp:957] The newly elected leader is master@ with id 20140814-194033-251789322-5050-8808

As per the above logs, the second Mesos Master running at port 5054 was promoted as New Mesos Master. This will be reflected to Marathon also. We can go to the Marthon UI and make sure that the Docker process that we have started using the old Mesos master is still running fine. We can even try making some scaling changes and can make sure that the new master can alter the process.

In my testing, i’ve tried stoping each of the service and ensured that only one Mesos master is running and also tried scaling the Apps from one Master service to make sure that the new Master was able to keep track of all these changes. The results were quite promising. Mesos + Marrthon + Docker indeed is killer combo. We can really built a cross vendor independent cluster with scaling capabilites.

Docker, FreeSwitch, Ubuntu

Load Test on Docker Freeswitch – Part 1

Docker is a very powerfull tool for managing Linux containers. In my previous blog i’ve explaind on how to setup a Docker Freeswitch. Docker is very mature now, version 1.0 has already been released. Docker is now supported by all major cloud vendors. Docker was showing promising results when i was performing my initial testing. So this time i decided to perform a heavy load test on the Freeswitch container to ensure that Docker can really enter Telephony. Like any normal sys admin, i was googling for Freeswitch load test, and most of the results were pointing to Sipp, an Open Source test tool / traffic generator for the SIP protocol. For me Sipp didnt helped me as it started throwing errors beyond 320 simultaneous calls. The UDP connections were timing out. I tried increasing the timeout, which didn’t helped much.

So next choice is to use a Freeswitch itself, to generate calls. Using the FreeSwitch’s originate command to generate simultaneous calls and hit the Docker Freeswitch container. I also decided to collect all system metrics, so that i knows how the machine behaves under various load tests conditions. For this i deciced to use CollectD and Graphite combo. Collectd 5+ has an inbuild graphite plugin which can send the collectd metrics to a graphite server.

I’ve already setup an Ubuntu-Freeswitch Docker image. First we need to pull the images from the Docker hub.

$ docker pull deepakmdass88/fs-ubuntu

Now i’m going to start the Docker FreeSwitch container in foreground.

$ docker run --rm --privilieged -i -t -p 5060:5060/tcp -p 5060:5060/udp -p 16384:16384/udp -p 16385:16385/udp -p 16386:16386/udp -p 16387:16387/udp -p 16388:16388/udp -p 16389:16389/udp -p 16390:16390/udp -p 16391:16391/udp -p 16392:16392/udp -p 16393:16393/udp -p 5080:5080/tcp -p 5080:5080/udp deepakmdass88/fs-ubuntu /bin/bash

The privilieged option was enabled because, the FreeSwitch init script sets some custom ulimit values, so the container has to be given special privileges. Corresponding SIP and RTP ports are forwarded from the host to the container.

Now before starting the Freeswitch service, we can set up the CollectD agent. By default, the Ubuntu repostiry contains CollectD versio 4.10, but the Graphite plugin is available from version 5.0+ onwards. So we can use somne PPA which has the corresponding version available.

$ apt-get install python-software-properties

$ add-apt-repository ppa:joey-imbasciano/collectd5

$ apt-get update && apt-get install collectd

Now in the /etc/collectd.conf, uncomment LoadPlugin write_graphite. Also, in the same file and uncomment the plugin definition and fill in the server details.

<Plugin write_graphite>
        Host ""
        Port "2003"
        Protocol "tcp"
        LogSendErrors true
        Prefix "collectd."
        StoreRates true
        AlwaysAppendDS false
        EscapeCharacter "_"

I’ve enabled a custom freeswitch plugin, which will extract the current ongoing calls count from freeswitch and sends it to the graphite server. Once the config changes are done we can restart the CollectD service. Now we can check our graphite UI to see if the default metrics like memory, load, cpu etc. are reaching the graphite server. Once CollectD-Graphite setup is ready, we can go ahead with our load test. So, once the call has reached the server, we need some Dialplan to continue the calls. So the simplest method is to create an infinite loop of playing some file, or some conference. Below are some dialplans that i’ve created in the public.xml

# Infinite Play Loop

 <extension name="111222333">
       <condition field="destination_number" expression="^111222333$">
         <action application="answer"/>
         <action application="playback" data="sounds/music/8000/got.wav"/>
         <action application="transfer" data="111222333 XML public"/>

# Test conference

  <extension name="docker-fs-test-conf">
    <condition field="destination_number" expression="^112233">
      <action application="answer"/>
      <action application="sleep" data="500"/>
      <action application="conference" data="docker-test@public"/>

# Default IVR menu

    <extension name="ivr_demo">
      <condition field="destination_number" expression="^5000$">
        <action application="answer"/>
        <action application="sleep" data="2000"/>
        <action application="ivr" data="demo_ivr"/>

Now, we have the dialplans ready, next is authentication. By default there are two ways, Digest auth and IP Whitelist. Here i’m going to use IP whitelist, so we need to whitelist our IP in the acl.conf file.

 <list name="domains" default="deny">
      <!-- domain= is special it scans the domain from the directory to build the ACL -->
      <node type="allow" domain="$${domain}"/>
      <node type="allow" cidr=""/>                 # IP of FS from which we are going to send the calls
      <!-- use cidr= if you wish to allow ip ranges to this domains acl. -->
      <!-- <node type="allow" cidr=""/> -->

Now we can start the Freeswitch service.

$ /etc/init.d/freeswitch start

We can check the freeswich service using the fs_cli command.

$ /usr/local/freeswitch/bin/fs_cli -x "show status"

UP 0 years, 0 days, 6 hours, 34 minutes, 59 seconds, 648 milliseconds, 56 microseconds
FreeSWITCH (Version 1.5.13b git 39200cd 2014-07-02 21:55:21Z 64bit) is ready
1068 session(s) since startup
0 session(s) - peak 299, last 5min 0
0 session(s) per Sec out of max 30, peak 29, last 5min 0
1000 session(s) max
min idle cpu 0.00/100.00
Current Stack Size/Max 240K/8192K

Now freeswitch is ready to accept the connection. We can start sending the calls from our Load test freeswitch. Below is the script that was used to originate the calls from the load test Freeswitch machine. This will create simultaneous calls towards the Docker FS.


while [ 1 ]; do

set -i req
req=$(/usr/local/freeswitch/bin/fs_cli -q -b -x "show channels count" | awk '{print $1}')
if [ $req -lt $MAX_CALLS ]; then
    /usr/local/freeswitch/bin/fs_cli -q -b -x "bgapi originate sofia/external/$SIP_URI loadtest"
    echo "sleep a bit ..."
    sleep 10s


While bulk calls are being made from the Load test freeswitch machines, to test the Quality in real time, it’s better to dial to the extension directly from a Sip Phone/Client and ensure that voice quality is good. Below is my Graphite dashboard for the load test.

Default Graphite UI

Tessera UI

The FS was stable till 500 simultaneous calls, after that there was a sudden drop in calls and also the voice quality started dropping and in a minute the Freeswitch crashed due to Segmentation fault. I’m going to analyze the core dump file to understand more about the crash. The other smaller drops that we see in the graph was caused by the Load test Freeswitch machine, as the load was getting high when the number of calls was increased. But 500 simultaneous calls are pretty decent and the there was no issue in voice quality till the number of calls crossed 500. Though it’s very difficult to make a final confirmation, i decided to go ahead with phase 2 load test.

In the phase 2 test, i’m planning to use multiple FS load test machines to generate large simultaneous calls + running 2 separate FS containers on the same host and split the incoming calls to both these containers. Once the phase 2 test is completed, ill share the test results in an another blog post. Docker is still under heavy development, and i’m sure Docker will be entering Telephony soon.

Docker, Marathon, Mesos, Ubuntu

Managing Docker Clusters Using Mesos and Marathon

Docker has became one of my favourite tool. It’s super cool and super easy tool to manage linux containers. LXC’s are around in IT world for some time, but by the entry of Docker last year, the wave started rising. Thanks to Docker team and Solomon Hykes for open sourcing such a wonderfull project. I’ve already mentioned a lot of stuffs about Docker in my previos blogs, so today im going explain how Docker can be used as a Cluster. There are some interesting tools like CoreOS, Helios etc for managing Docker as a cluster. But today i’m going to explain on how to set up a Docker cluster using Apache Mesos. CoreOS is a custom linux os which comes with SystemD. But the restriction is, we have to use that custom images of coreos.. Indeed CoreOS team open sourced some exciting tools like etcd fleet which works with CoreOS for managing Docker clusters. But Mesos is quite simple, we can install it via package, or even using tar balls available in thier Github repo onto most of the Linux Distro’s and it’s quite easy to configure also. Mesos is heavily used by Twitter to manage their data center’s. And now Mesosphere has opensourced a new tool called Mararthon which now provides a UI and a Rest API for maaging and scheduling Mesos Frameworks aka jobs, in this case containers as a service.

A few weeks ago, Mesos 0.19 was released which comes with an official support for Docker coantiners by intergrating Deimos into it. And a few days ago Marathon has released their new version 0.6.0 supports launching any task in a Docker container via Mesos 0.19+

Setting up Mesos Cluster

In this test setup, i’m going to setup both Mesos master/slave and Zookeeper on the same Ubuntu 14.04 vagrant node. First we can install the dependencies,

$ apt-get install curl python-setuptools python-pip python-dev python-protobuf

Now we can install Zookeeper

$ apt-get install zookeeperd

After the installation, ZooKeeper has 1 configuration. Each Zookeeper needs to know its position in the quorum.

$ echo 1 | sudo dd of=/var/lib/zookeeper/myid

Now we can setup Docker

$ echo "deb docker main" > /etc/apt/sources.list.d/docker.list

$ apt-get update && apt-get install lxc-docker

$ docker version

   Client version: 1.0.0
   Client API version: 1.12
   Go version (client): go1.2.1
   Git commit (client): 63fe64c
   Server version: 1.0.0
   Server API version: 1.12
   Go version (server): go1.2.1
   Git commit (server): 63fe64c

Let’s pull some basic ubuntu images from Docker Hub so that we can use the same for testing.

$ docker pull libmesos/ubuntu

Now we can configure Mesos

$ curl -fL -o /tmp/mesos.deb

$ dpkg -i /tmp/mesos.deb

$ mkdir -p /etc/mesos-master

$ echo in_memory | sudo dd of=/etc/mesos-master/registry

## Mesos Python egg for use in authoring frameworks

$ curl -fL -o /tmp/mesos.egg

$ easy_install /tmp/mesos.egg

We can download the latest Marathon 0.6 from here

$ tar xvzf marathon-0.6.0.tgz

Mesos uses Deimos for managing dockers, Deimos can installed via pip

$ pip install deimos

Also, we need to configure mesos to use Deimos,

$ mkdir -p /etc/mesos-slave

$ echo /usr/local/bin/deimos | sudo dd of=/etc/mesos-slave/containerizer_path

$ echo external | sudo dd of=/etc/mesos-slave/isolation

Now we can start all the services.

$ initctl reload-configuration

$ service docker start

$ service zookeeper start

$ service mesos-master start

$ service mesos-slave start

##### Starting Marathon #####

$ cd marathon-0.6.0

$ ./bin/start --master zk://localhost:2181/mesos --zk_hosts localhost:2181

Marathon will now start listening to port 8080, We can access the UI from the browser via this port, also via rest API using the same port.

curl localhost:8080/help   # gives us some details about the API's

I just went through the Deimos code, so under the hood they are using docker run with some default parameters like --sig-proxy, --rm, --cidfile, -v, -w and extra parameters that we are passing while creating the task via Marathon.

As of now, we still can’t pass details like Container image, Docker options via Marathon GUI. So we can use the Rest API for the time being. Below is a sample curl request for launcing a single container,

curl -X POST -H "Accept: application/json" -H "Content-Type: application/json" \
    localhost:8080/v2/apps -d '{
        "container": {"image": "docker:///libmesos/ubuntu", "options": ["--privileged"]},
        "cpus": 0.5,
        "cmd": "sleep 500",
        "id": "docker-tester",
        "instances": 1,
        "mem": 300

We can pass custom options to the docker run command via “options”. After making the curl request, we can check the syslog, as mesos will be logging into syslog by default. We can even see the Docker run command on the same.

Jun 27 07:24:58 vagrant-ubuntu-trusty-64 deimos[19227]: deimos.containerizer.docker.launch() exit 0 // docker run --sig-proxy --rm --cidfile /tmp/deimos/mesos/00d459fb-22ca-4af7-9a97-ef8a510905f2/cid -w /tmp/mesos-sandbox -v /tmp/deimos/mesos/00d459fb-22ca-4af7-9a97-ef8a510905f2/fs:/tmp/mesos-sandbox --privileged -p 31498:31498 -c 512 -m 300m -e PORT=31498 -e PORT0=31498 -e PORTS=31498 libmesos/ubuntu sh -c 'sleep 500'

We can also use the Marathon Rest API to check the status of the job which we started.

curl -X GET -H "Content-Type: application/json" localhost:8080/v2/apps

Below is the screenshort for the same from the Marathon UI.

We can also check if the container is launched via docker ps command.

A more detailed report about the Docker job which we have launched can be viewed via the default Mesos GUI listening on port 5050 on the Mesos master. Now we can test the scalability of the Job. Currently we have only one container running. So now we can try scaling say adding one more node. We can do it in two ways, like via PUT request using curl or using GUI

curl -X PUT -H "Content-Type: application/json" localhost:8080/v2/apps/docker-tester \
    "container": {"image": "docker:///libmesos/ubuntu", "options": ["--privileged"]},
            "cpus": 0.5,
            "cmd": "sleep 500",
            "id": "docker-tester",
            "instances": 2,     # increasing the instance count to 2
            "mem": 300

Now we can use the docker ps command to see if the new container is launched or not. Also we can see that status in UI also.

Similarly, we can scale down also. I’ve tested the same and all seems to be good. Marathon ensures that the docker process will be running. So incase if the process crashes Marathon will restart the same and ensures that the instances are up and running as per our configuration. There are a few other Open Sourced Mesos Scheduler’s like Apache Aurora, Airbnb’s Chronos. But for my requirement marathon is pretty straight and simple and also provides a very good Rest API layer for managing containers. Mesos, Marathon and Docker are still young, but provides a killer combination for managing clusters built over Docker containers.

Docker, FreeSwitch, Ubuntu, Voip

Dockerizing FreeSwitch – Docker Enters Telephony World

Docker has became one of the hottest topics in IT now a days. Docker is an open-source project that automates the deployment of applications inside software containers. Docker extends a common container format called Linux Containers (LXC), with a high-level API providing lightweight virtualization that runs processes in isolation.Docker uses LXC, cgroups, and the Linux kernel itself. Though i coudn’t make out to the DockerCon 2014 in SF, a lot new developments were announced on the DockerCon. Especially three new Opensource Projects libcontainer, libchan and libswarn. Docker is indeed creating a revolution in the container space, creating a next generation of scalable platform management. There are a lot PAAS services like Deis,, Dokku which are already using Docker in production. Another important and exciting project is CoreOS. CoreOS uses tools like SystemD, Fleet, EtcD to build a fully scalabale docker based cluster management system. I definitely need a separate blog to write about CoreOS, it’s really a super exciting project to play with.

Last week Docker Team released Version 1.0 of Docker. So i’ll be using the same in this new set up. It’s been almost 6 Month’s since i’ve been working @ Plivo as a DevOps Engineer. Telephony was really a very new platform for me. And my first companion was offcourse FreeSwitch,a scalable open source cross-platform telephony platform designed to route and interconnect popular communication protocols using audio, video, text or any other form of media. I was heavily using Vagrant for all my experiments in my mac. But after started using Docker, it really made me crazy. I’ve played for some time wiht LXC’s long back. So this was like a leap back to the container world.

There are a lot of concerns on using Virtual Machines in Telephony world. Especially for the server’s that handles the Real Time voice packets, as voice quality is pretty important in Telephony. Docker’s again more light weight isolated environment, and i decide to see how Docker can perform with such issues. If Docker handle Freeswitch smoothly, then i’m sure that we can use Docker for other telephony app’s like OpenSIPS/Kamailio etc, as they handle only sessions not the Media traffic. I know there are a lot of concerns like CPU load, Network etc, but this is like an initial move to test Docker into Telephony.

Setting Up Docker

Docker 1.0 is available from the Official Docker repo.

$ echo "deb docker main" > /etc/apt/sources.list.d/docker.list

$ apt-get update && apt-get install lxc-docker

Now we can check the Docker version using the docker binary itself.

$ docker version

Client version: 1.0.0
Client API version: 1.12
Go version (client): go1.2.1
Git commit (client): 63fe64c
Server version: 1.0.0
Server API version: 1.12
Go version (server): go1.2.1
Git commit (server): 63fe64c

Now Docker is installed, but we need some OS images to use with docker. We can build custom images using debootstrap etc. But there are official minimal images available in Docker HUB. We can search for the repositories and can pull those images via docker binary itself.

For example to pull the entire Ubuntu images, we can just do,

$ docker pull ubuntu

But this will download all the ubuntu images available in the repo. We can also do selective download by using the tag.

$ docker pull ubuntu:14:04

Once the images are downloaded, we can use images option in docker binary to see all the downloaded images.

$ docker images

REPOSITORY                      TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
ubuntu                          14.04               ad892dd21d60        10 days ago         275.5 MB

Here i’m not going to daemonize the container, i’ll be using the interactive option. But first, let’s start a new container.

$ docker run -t -i ubuntu:14.04 /bin/bash

This command will start a conatiner and will open up a bash session for us and we will be inside the bash session. Now to use an application we need to open up corresponding ports to outside world. We can use the “-p” option while starting a docker container to enable port forwarding. Under the hood, docker is using IPtables for the same. In the case of Freeswitch, we need to open 5060,5080 for the default Sofia profiles (Internal and External). Also we need to open the RTP ports. In this test i’ll be opening a predefined set of ports ie from “16384” to “16394”. (As my Docker host resides on Azure, creating an Endpoint for each port forward is really a pain, so i decided to open only a few). And also i’ll be opening port 22, so that we can have an ssh server inside the container.

$ docker run -t -i -p 2223:22 -p 5060:5060/tcp -p 5060:5060/udp -p 16384:16384/udp -p 16385:16385/udp -p 16386:16386/udp -p 16387:16387/udp -p 16388:16388/udp -p 16389:16389/udp -p 16390:16390/udp -p 16391:16391/udp -p 16392:16392/udp -p 16393:16393/udp -p 5080:5080/tcp -p 5080:5080/udp ubuntu:14.04 /bin/bash

This will start a new container and Docker by default will setup the IPtables for port forwarding. So now my IPtables looks like this.

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
   43 16850 ACCEPT     udp  --  !docker0 docker0             udp dpt:5080
    0     0 ACCEPT     tcp  --  !docker0 docker0             tcp dpt:5080
  988  198K ACCEPT     udp  --  !docker0 docker0             udp dpt:16392
    0     0 ACCEPT     udp  --  !docker0 docker0             udp dpt:16389
    0     0 ACCEPT     udp  --  !docker0 docker0             udp dpt:16385
    0     0 ACCEPT     udp  --  !docker0 docker0             udp dpt:16393
 2026  405K ACCEPT     udp  --  !docker0 docker0             udp dpt:16388
 8817 1763K ACCEPT     udp  --  !docker0 docker0             udp dpt:16384
12144 8684K ACCEPT     udp  --  !docker0 docker0             udp dpt:5060
 4359  257K ACCEPT     tcp  --  !docker0 docker0             tcp dpt:5060
 9917 1983K ACCEPT     udp  --  !docker0 docker0             udp dpt:16390
    0     0 ACCEPT     udp  --  !docker0 docker0             udp dpt:16387
    0     0 ACCEPT     tcp  --  !docker0 docker0             tcp dpt:22
   38  4848 ACCEPT     udp  --  !docker0 docker0             udp dpt:16391
    1   152 ACCEPT     udp  --  !docker0 docker0             udp dpt:16386
    0     0 ACCEPT     all  --  *      lxcbr0  
    0     0 ACCEPT     all  --  lxcbr0 *  
 431K  630M ACCEPT     all  --  *      docker0              ctstate RELATED,ESTABLISHED
 128K   19M ACCEPT     all  --  docker0 !docker0  
   16  2460 ACCEPT     all  --  docker0 docker0  

Now we can go ahead with Freeswitch compilation. In my previous blog, i’ve mentioned how to compile and set up freeswitch. Once freeswitch is ready, we need to make a few changes. By default, Freeswitch uses STUN to route through NAT, but this doesn’t work with Docker. So we have to set the external IP manually. In the Freeswitch installed folder, edit conf/autoload_configs/switch.conf.xml. In this file we can set the External IP manually. Add the below lines to switch_conf.xml.

<X-PRE-PROCESS cmd="set" data="external_sip_ip=<YOUR_EXTERNAL_IP>"/>
<X-PRE-PROCESS cmd="set" data="external_rtp_ip=<YOUR_EXTERNAL_IP>"/>

Also we need to modify the Default Sofia Profiles and need to set the ext-rtp-ip and ext-sip-ip to use our external IP added in the switch_conf.xml file while establishing connections. Add the below lines to the conf/sip_profiles/internal.xml and conf/sip_profiles/external.xml

<param name="ext-rtp-ip" value="$${external_rtp_ip}"/>
<param name="ext-sip-ip" value="$${external_sip_ip}"/>

Now we need to set teh RTP ip range to the range which we have forwarded while creting the container. So we need to edit conf/autoload_configs/switch.conf.xml

<param name="rtp-start-port" value="16384"/>
<param name="rtp-end-port" value="16394"/>

Once the changes are made, we can start the FreeSwitch service. Now to make sure that the External IP is working properly, we can check the sofia profile status using fs_cli. below is a sample output of the sofia profile status.

freeswitch@internal> sofia status profile internal
Name                internal
Domain Name         N/A
Auto-NAT            false
DBName              sofia_reg_internal
Pres Hosts,
Dialplan            XML
Context             public
Challenge Realm     auto_from
Ext-RTP-IP          <my_external_ip>
Ext-SIP-IP          <my_external_ip>
URL                 sip:mod_sofia@<my_external_ip>:5060
BIND-URL            sip:mod_sofia@<my_external_ip>:5060;maddr=;transport=udp,tcp
HOLD-MUSIC          local_stream://moh
TEL-EVENT           101
DTMF-MODE           rfc2833
CNG                 13
SESSION-TO          0
MAX-DIALOG          0
NOMEDIA             false
LATE-NEG            true
PROXY-MEDIA         false
ZRTP-PASSTHRU       true
CALLS-IN            0
CALLS-OUT           0

Now freeswitch ahs started successfully. We can test some basic calls using softphones like Xlite, Telephone etc. By default, there are some default extensions and user’s available, so we can use the same for testing the calls. But i really wanted to try trunkning also and wanted to see the quality of the voice. So i created SIP trunking in Freeswitch using Plivo. And i tested a couple of calls to US and India DID’s and no issues were detected in the quality. But again i need to test the laod of the server’s when it startes handling concurrent calls and also the voice quality. But i decied to d oit as a Phase II. But as of now, Docker FreeSwitch is working perfectly like a physical machine with out issues.

So now we have a working FreeSwitch container, now here comes the main advantage of the Docker. We can create a new image with all these changes, so that nex time i dont need to work from scratch. I can use this saved image and a readymade Docker Freeswitch container can be launched in seconds. Since we are in interactive mode, we should not quit the session before it’s saved or else all the things will be lost,becoz dokcer will destroy the same. So open up a new shell on the docker host and use the commit option. But to use the commit command, we need to know the container id, so here docker ps command comes handy.

$ docker ps

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS           NAMES

e7f3c02346d4        a4196763d248        /bin/bash           32 hours ago        Up 32 hours>22/tcp,>5060/tcp,>5060/udp,>5080/tcp,>5080/udp,>16384/udp,>16385/udp,>16386/udp,>16387/udp,>16388/udp,>16389/udp,>16390/udp,>16391/udp,>16392/udp,>16393/udp   silly_turing

In my case “e7f3c02346d4” is the container ID. So i can use the same for commit. I won’t be commiting to the base Ubuntu image, as i can use the same for other purposes, so here i’ll commiting to a new image say “ubntu-fs-docker”

$ docker commit -m "<commit message>" e7f3c02346d4 ubntu-fs-docker

Now we can use this “ubntu-fs-docker” image to launch a ready made FreeSwitch server’s.

Docker is a very juvenile project about more than a year old. But the use cases are expanding heavily in the Modern IT world. Docker is fueling up a new generation of scalable servers. Wishing all the best for Docker and kudos to Solomon Hykes and the DotCloud team for opensourcing such a powerfull project