Bazel – For Fast, Correct, Reproducible Builds

In my previous blog, I showed Bazel in action by building a Solr Cloud package. In this blog, I’m going to explain a bit more about Bazel.

Bazel is the open source version of Google’s internal build tool, Blaze. Bazel is currently in beta, but a number of companies already use it in production. It has some quite interesting features. Bazel has a good caching mechanism: it caches all input files, all external dependencies and so on. Before running the actual build, Bazel first checks whether the existing cache is valid. If it is, Bazel then checks for changes to the input files and dependencies, and if it detects any, it starts rebuilding the package. We can also use Bazel to build our test targets and have it run our unit/integration tests against the built targets. Bazel can also detect cyclic dependencies within the code. Another important feature is sandboxing. On Linux, Bazel can run builds and tests inside a sandboxed environment and can detect file leaks or broken dependencies. This works because, in sandbox mode, Bazel mounts only the specified input files and data dependencies into the sandbox environment.
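The cyclic dependency check mentioned above can be pictured as a depth-first search over the target graph. The sketch below is only an illustration of that idea in Python, not Bazel’s actual implementation; the function name find_cycle and the target //:util are made up for this example.

```python
def find_cycle(deps):
    """Return one dependency cycle as a list of target names, or None.

    `deps` maps a target name to the list of targets it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    state = {}
    path = []

    def visit(target):
        state[target] = GREY
        path.append(target)
        for dep in deps.get(target, []):
            if state.get(dep, WHITE) == GREY:
                # dep is already on the current path, so we closed a loop
                return path[path.index(dep):] + [dep]
            if state.get(dep, WHITE) == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        state[target] = BLACK
        return None

    for target in list(deps):
        if state.get(target, WHITE) == WHITE:
            cycle = visit(target)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    # Hypothetical graph: //:util is an invented target that closes the loop.
    graph = {
        "//:hello": ["//:dep"],
        "//:dep": ["//:util"],
        "//:util": ["//:hello"],
    }
    print(find_cycle(graph))
```

A graph without a loop returns None, so a build tool following this approach could refuse to build only when a cycle is actually reported.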

Bazel Build Flow

Let’s see how the Bazel build flow works. The first thing we need is a WORKSPACE file. A Bazel workspace is a directory that contains the source files for one or more software projects, as well as a WORKSPACE file and BUILD files that contain the instructions Bazel uses to build the software. It also contains symbolic links to output directories in the Bazel home directory.

Let’s create a simple workspace for testing

$ mkdir bazel-test && cd bazel-test


Now I’m going to build a simple Python package. The primary script is a simple Python script that imports a hello function from a second module, so the primary script has a dependency on that module.

vagrant@trusty-docker:~/bazel-test$ cat
from dep import hello
print hello("Building a simple python package with Bazel")

vagrant@trusty-docker:~/bazel-test$ cat
def hello(msg):
    return msg

Bazel’s build command basically looks for a BUILD file in the target location. This file should contain the necessary Bazel build rules. Bazel’s Python Rule Documentation explains the list of rules that are supported. Applying this to our test scripts, we are going to build a py_binary for our primary script, and this binary has a py_library dependency on the module it imports. So our final BUILD file will be:

py_library(
  name = 'dep',
  srcs = [''],
)

py_binary(
  name = 'hello',
  srcs = [''],
  deps = [':dep'],    # our dependency towards ``
)

So we have the BUILD file now; let’s kick off a build.

vagrant@trusty-docker:~/bazel-test$ bazel build hello
INFO: Found 1 target...
Target //:hello up-to-date:
INFO: Elapsed time: 4.564s, Critical Path: 0.03s

Woohoo, so Bazel has built the package for us. Now if we check our workspace, we will see a bunch of bazel-* symlinks. These point to the Bazel home directory, where our final build output lives.

vagrant@trusty-docker:~/bazel-test$ tree -d
├── bazel-bazel-test -> /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/__main__
├── bazel-bin -> /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/bazel-test/bazel-out/local-fastbuild/bin
├── bazel-genfiles -> /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/bazel-test/bazel-out/local-fastbuild/genfiles
├── bazel-out -> /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/__main__/bazel-out
└── bazel-testlogs -> /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/bazel-test/bazel-out/local-fastbuild/testlogs

So our new Python binary is available at bazel-bin/hello. Bazel also creates something called runfiles, which sits next to the binary. Bazel places our dependencies (input files and data dependencies) in this runfiles folder; as the listing below shows, the entries are symlinks rather than copies.

-r-xr-xr-x 1 vagrant vagrant 4364 Feb 19 20:13 bazel-bin/hello
vagrant@trusty-docker:~/bazel-test$ ls -l bazel-bin/hello
hello                    hello.runfiles/          hello.runfiles_manifest
vagrant@trusty-docker:~/bazel-test$ ls -l bazel-bin/hello.runfiles/__main__/
total 4
lrwxrwxrwx 1 vagrant vagrant  31 Feb 19 20:13 -> /home/vagrant/bazel-test/
lrwxrwxrwx 1 vagrant vagrant 130 Feb 19 20:13 hello -> /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/bazel-test/bazel-out/local-fastbuild/bin/hello
lrwxrwxrwx 1 vagrant vagrant  33 Feb 19 20:13 -> /home/vagrant/bazel-test/

If we go through our Python binary bazel-bin/hello, it’s nothing but a wrapper script that identifies our runfiles directory path, adds that path to the PYTHONPATH environment variable and then invokes our primary script. At the beginning, I mentioned that Bazel has a good caching mechanism. Let’s re-run the build command and look at the output, especially the time taken to complete the build.
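What that wrapper does can be sketched in a few lines of Python. This is a simplified illustration and not the stub Bazel generates: the function name runfiles_env is hypothetical, and the real wrapper handles more cases (for example the per-workspace sub-directory inside runfiles).

```python
import os


def runfiles_env(binary_path, environ):
    """Return a copy of `environ` with the binary's runfiles dir on PYTHONPATH."""
    # The runfiles tree sits next to the binary as `<binary>.runfiles`.
    runfiles_dir = os.path.abspath(binary_path) + ".runfiles"
    env = dict(environ)
    existing = env.get("PYTHONPATH")
    parts = [runfiles_dir] + ([existing] if existing else [])
    env["PYTHONPATH"] = os.pathsep.join(parts)
    return env


if __name__ == "__main__":
    env = runfiles_env("bazel-bin/hello", os.environ)
    print(env["PYTHONPATH"])
    # A real wrapper would now start the main script with this environment,
    # for example via subprocess.call([...], env=env).
```

Putting the runfiles directory first on PYTHONPATH is what lets `from dep import hello` resolve without the dependency being installed anywhere else.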

vagrant@trusty-docker:~/bazel-test$ bazel build hello
INFO: Found 1 target...
Target //:hello up-to-date:
INFO: Elapsed time: 0.247s, Critical Path: 0.00s

Let’s compare the build times of the two runs. The first build took ~4.5 sec, but the second one took ~0.2 sec. This is because Bazel didn’t run the real build process during the second run. It verified the input files against its cache and found no change.
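To make the idea concrete, here is a tiny Python sketch of input-digest caching. It is not how Bazel is implemented internally; it only illustrates skipping work when the hash of the inputs has not changed. The names digest_inputs, build_if_changed and input.txt are made up for this example.

```python
import hashlib
import os
import tempfile


def digest_inputs(paths):
    """Hash the path and contents of every input file into one digest."""
    combined = hashlib.sha256()
    for path in sorted(paths):
        combined.update(path.encode("utf-8"))
        with open(path, "rb") as handle:
            combined.update(hashlib.sha256(handle.read()).digest())
    return combined.hexdigest()


def build_if_changed(paths, cache, action):
    """Run `action` only when the input digest differs from the cached one."""
    key = digest_inputs(paths)
    if cache.get("inputs") == key:
        return "cached"
    action()
    cache["inputs"] = key
    return "rebuilt"


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "input.txt")   # hypothetical input file
    with open(source, "w") as handle:
        handle.write("v1")
    cache = {}
    print(build_if_changed([source], cache, lambda: None))  # first run
    print(build_if_changed([source], cache, lambda: None))  # nothing changed
    with open(source, "w") as handle:
        handle.write("v2")
    print(build_if_changed([source], cache, lambda: None))  # input edited
```

The second call returns without running the action, which mirrors the near-instant second build above; editing the file changes the digest and forces a rebuild.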

Now let’s add a simple unit test and see how bazel can run the same.

vagrant@trusty-docker:~/bazel-test$ cat
import unittest

from dep import hello

class TestHello(unittest.TestCase):

  def test_hello(self):
    self.assertEqual(hello("test message"), "test message")

if __name__ == '__main__':
  unittest.main()

Now let’s add a py_test rule to our BUILD file so that bazel can use it with bazel test.

    name = "hello_test",
    srcs = [""],
    deps = [

We have the py_test rule, now let’s run the bazel test command and verify.

vagrant@trusty-docker:~/bazel-test$ bazel test hello_test
INFO: Found 1 test target...
Target //:hello_test up-to-date:
INFO: Elapsed time: 2.255s, Critical Path: 0.06s
//:hello_test                                                            PASSED in 0.0s

Executed 1 out of 1 test: 1 test passes.

Woohoo, the test seems to run fine. Now let’s manually break the test and see if Bazel picks up the failure too.

vagrant@trusty-docker:~/bazel-test$ bazel test hello_test
INFO: Found 1 test target...
FAIL: //:hello_test (see /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/bazel-test/bazel-out/local-fastbuild/testlogs/hello_test/test.log).
Target //:hello_test up-to-date:
INFO: Elapsed time: 0.199s, Critical Path: 0.05s
//:hello_test                                                            FAILED in 1 out of 2 in 0.0s

Executed 1 out of 1 test: 1 fails locally.

Bingo, Bazel detects the test failure too. During our build process we saw that Bazel caches the build and doesn’t re-run it unless it detects changes to the dependencies. Now let’s see whether Bazel does the same with tests.

vagrant@trusty-docker:~/bazel-test$ bazel test hello_test
INFO: Found 1 test target...
Target //:hello_test up-to-date:
INFO: Elapsed time: 0.169s, Critical Path: 0.04s
//:hello_test                                                            PASSED in 0.0s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
vagrant@trusty-docker:~/bazel-test$ bazel test hello_test
INFO: Found 1 test target...
Target //:hello_test up-to-date:
INFO: Elapsed time: 0.087s, Critical Path: 0.00s
//:hello_test                                                   (cached) PASSED in 0.0s

Executed 0 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.

Bingo, we can see the (cached) marker in the output of the second test run. So, like the build process, Bazel caches test results too.

Customizing Bazel Rules

py_binary, py_library etc. are the default Python rules that ship with Bazel. As with any other product, we might end up in cases where we need custom rules to solve our specific needs. And the good news is, Bazel comes with an extension language called Skylark. With Skylark, we can create custom build rules matching our requirements. Skylark’s syntax is pretty similar to Python. I’ll be writing a more detailed blog on Skylark soon 🙂


Though Bazel is still in beta, it seems to be a really interesting tool for building hermetic packages. Bazel can detect cyclic dependencies and dependency leaks, which is really important. Bazel’s caching also helps us build packages faster than with traditional build tools.


Bazeling Solr Cloud

It’s been a long time since I wrote my last blog. This time I decided to write about something I’ve been working on for the last few months: Bazel. We have been using Bazel heavily in production for the last couple of months and the results seem to be pretty good. Leonid from our Infra team recently gave a talk about how we use Bazel to build hermetic packages. I’ll be writing a more detailed blog on how to play with Bazel. But in this blog, we will only be seeing Bazel in action.

This time I’m going to build a Bazel package for Solr Cloud v6.3.0 and use that package to spin up a Solr Cloud service.


Basically there are 3 main dependencies,

  1. Bazel, for building our solr package
  2. Java 8 (as the latest versions of both Bazel and Solr require Java 8)
  3. ZooKeeper Cluster

Solr uses ZooKeeper as a repository for cluster configuration and coordination. There are tons of blogs on how to set up a simple ZK cluster, so I’m going to skip that part. My test setup has a single-node ZK cluster.

Setting up Bazel Solr Package

The Bazel install page documents well how to set up Bazel locally. If we go through the Solr documentation, there are a bunch of variables that the bin/solr wrapper script looks for, especially when we want to customize our Solr settings. On a local test setup we don’t care about such customization, but in a live environment we definitely need to tweak things like JAVA_HOME, the SOLR_HOME directory where Solr stores its data, or even the ZK hosts list.

My Bazel package is going to be pretty straightforward: it will have a shell binary which is basically a wrapper script. This script will have two Bazel data dependencies: 1) the Solr Cloud source file and 2) the Solr config files. The wrapper binary makes sure that all the necessary runtime variables are set and that SOLR_HOME contains all the necessary config files, including the configs for the collections. The wrapper binary will use the Solr source that is embedded within Bazel’s runfiles folder (where Bazel keeps all data dependencies for a specific build rule).
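The wrapper itself is a shell script and is not reproduced here. Purely as an illustration of the two jobs described above (staging config files into SOLR_HOME and setting runtime variables), here is a rough Python sketch. The function names, the temporary directories and the localhost:2181/solr ZooKeeper address are assumptions for this example, not the actual script.

```python
import os
import shutil
import tempfile


def stage_solr_home(config_dir, solr_home):
    """Copy every file under `config_dir` into `solr_home`, keeping the layout.

    Returns the sorted list of copied paths, relative to `solr_home`.
    """
    copied = []
    for root, _dirs, files in os.walk(config_dir):
        rel = os.path.relpath(root, config_dir)
        target_dir = os.path.normpath(os.path.join(solr_home, rel))
        if not os.path.isdir(target_dir):
            os.makedirs(target_dir)
        for name in files:
            shutil.copy(os.path.join(root, name), os.path.join(target_dir, name))
            copied.append(os.path.normpath(os.path.join(rel, name)))
    return sorted(copied)


def solr_env(environ, solr_home, zk_host):
    """Return a copy of `environ` with the Solr runtime variables set."""
    env = dict(environ)
    env["SOLR_HOME"] = solr_home
    env["ZK_HOST"] = zk_host
    return env


if __name__ == "__main__":
    configs = tempfile.mkdtemp()   # stands in for the configs data dependency
    home = os.path.join(tempfile.mkdtemp(), "solr-home")
    with open(os.path.join(configs, "solr.xml"), "w") as handle:
        handle.write("<solr/>")
    print(stage_solr_home(configs, home))
    # Assumed ZooKeeper address; adjust to the real ZK hosts list.
    print(solr_env({}, home, "localhost:2181/solr")["ZK_HOST"])
```

Keeping the staging step separate from the environment setup makes it easy to point the same wrapper at a different SOLR_HOME or ZK hosts list per environment.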

We need to teach Bazel where to fetch the Solr source file. So let’s create a workspace and tell Bazel where to look for the Solr source. Add the lines below to the WORKSPACE file:

# Custom Solr version to use
      name = 'solr_ver630',
      url = '',
      sha256 = '07692257575fe54ddb8a8f64e96d3d352f2f533aa91b5752be1869d2acf2f544',
      build_file = 'tools/BUILD.extract',
      strip_prefix = 'solr-6.3.0',
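The sha256 attribute pins the exact archive we expect to download. As a rough illustration of what such a check amounts to (not Bazel’s own code), here is a small Python sketch; the function name sha256_matches and the stand-in payload are made up for this example.

```python
import hashlib


def sha256_matches(data, expected_hex):
    """Return True when the SHA-256 of `data` equals the pinned digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()


if __name__ == "__main__":
    payload = b"pretend this is the downloaded archive"   # stand-in bytes
    pinned = hashlib.sha256(payload).hexdigest()
    print(sha256_matches(payload, pinned))                 # same bytes
    print(sha256_matches(payload + b"tampered", pinned))   # altered bytes
```

If the bytes differ from what was pinned, the digest no longer matches, which is how a corrupted or swapped download gets caught before it is used in a build.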

My BUILD.extract is pretty simple; it just exposes all the files present inside the extracted tar file.

package(default_visibility = ["//visibility:public"])

# Exposes all files in a package as `@workspace_name//:files`
  name = 'files',
  srcs = glob(['**/*']),

Let’s create a folder for keeping our various Solr config files like solr.xml, solrconfig.xml etc. Copy the necessary config files and expose them via a BUILD file. We can either use glob to expose everything blindly or create a list of the files we want to expose. If we create a list with specific file names, Bazel will expose only those files. In my case, I’m going to use a glob similar to the one in the BUILD.extract file.

Now let’s add our shell binary rule. I’m going to keep the source file that the shell binary rule uses inside a separate directory instead of polluting the workspace. The shell binary build rule is:

package(default_visibility = ["//visibility:public"])

sh_binary(
  name = 'start_solr',
  srcs = [''],      # This is the src of the wrapper script
  data = [
      '@solr_ver630//:files',    # -> Adding solr source as a data dep
      '//configs/solr:files',    # -> Adding our configs folder as a data dep
  ],
)

My final directory structure looks like this:

├── configs
│   └── solr
│       ├── BUILD
│       ├── collections
│       │   └── vader          #  Follow @depresseddarth :)
│       │       ├── protwords.txt
│       │       ├── schema.xml
│       │       ├── solrconfig.xml
│       │       └── stopwords.txt
│       ├──
│       ├──
│       └── solr.xml
├── services
│   └── solr
│       ├── BUILD
│       └──
├── tools
│   └── BUILD.extract

We have all the rules ready to kick off the Bazel build process.

bazel build //services/solr:start_solr
INFO: Found 1 target...
Target //services/solr:start_solr up-to-date:
INFO: Elapsed time: 17.465s, Critical Path: 0.82s

Bazel has successfully completed the build process and the final package files are available inside bazel-bin/ within the workspace. Below is the final layout of the bazel-bin directory.

 └── services
    └── solr
        ├── start_solr
        └── start_solr.runfiles   # bazel runfiles folder where all data deps are present
            ├── __main__
            │   ├── configs       # `config` folder which was added as a data dep
            │   │   └── solr
            │   │       └── collections
            │   │           └── vader
            │   ├── external
            │   │   └── solr_ver630   # embedded solr source binary. The wrapper script will be using the solr binary from this directory
            │   └── services
            │       └── solr
            │           ├── start_solr
            │           └──
            └── solr_ver630

Spin up Solr Cloud Service

First, let’s spin up our ZK node

$ bin/start-zk-server

Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
    2017-02-16 17:34:25,126 [myid:] - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /usr/local/zookeeper/bin/../conf/zoo.cfg
    2017-02-16 17:34:25,130 [myid:] - INFO  [main:DatadirCleanupManager@78] - autopurge.snapRetainCount set to 20
    2017-02-16 17:34:25,130 [myid:] - INFO  [main:DatadirCleanupManager@79] - autopurge.purgeInterval set to 1
    2017-02-16 17:34:25,131 [myid:] - WARN  [main:QuorumPeerMain@113] - Either no config or no quorum defined in config, running  in standalone mode
    2017-02-16 17:34:25,132 [myid:] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@138] - Purge task started.
    2017-02-16 17:34:25,146 [myid:] - INFO  [main:QuorumPeerConfig@103] - Reading configuration from: /usr/local/zookeeper/bin/../conf/zoo.cfg
    2017-02-16 17:34:25,146 [myid:] - INFO  [main:ZooKeeperServerMain@95] - Starting server
    2017-02-16 17:34:25,151 [myid:] - INFO  [PurgeTask:DatadirCleanupManager$PurgeTask@144] - Purge task completed.
    2017-02-16 17:34:25,155 [myid:] - INFO  [main:Environment@100] - Server environment:zookeeper.version=3.4.6--1, built on 05/31/2016 17:14 GMT
    2017-02-16 17:34:25,155 [myid:] - INFO  [main:Environment@100] - Server
    2017-02-16 17:34:25,155 [myid:] - INFO  [main:Environment@100] - Server environment:java.version=1.8.0_112
    2017-02-16 17:34:25,155 [myid:] - INFO  [main:Environment@100] - Server environment:java.vendor=Oracle Corporation
    2017-02-16 17:34:25,155 [myid:] - INFO  [main:Environment@100] - Server environment:java.home=/usr/lib/jvm/jdk-8-oracle-x64/jre
    2017-02-16 17:34:25,155 [myid:] - INFO  [main:Environment@100] - Server environment:java.class.path=/usr/local/zookeeper/bin/../build/classes:/usr/local/zookeeper/bin/../build/lib/*.jar:/usr/local/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/local/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/local/zookeeper/bin/../lib/netty-3.7.0.Final.jar:/usr/local/zookeeper/bin/../lib/log4j-1.2.16.jar:/usr/local/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/local/zookeeper/bin/../zookeeper-3.4.6.jar:/usr/local/zookeeper/bin/../src/java/lib/*.jar:/usr/local/zookeeper/bin/../conf:
    2017-02-16 17:34:25,155 [myid:] - INFO  [main:Environment@100] - Server environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
    2017-02-16 17:34:25,156 [myid:] - INFO  [main:Environment@100] - Server
    2017-02-16 17:34:25,157 [myid:] - INFO  [main:Environment@100] - Server environment:java.compiler=<NA>
    2017-02-16 17:34:25,157 [myid:] - INFO  [main:Environment@100] - Server
    2017-02-16 17:34:25,157 [myid:] - INFO  [main:Environment@100] - Server environment:os.arch=amd64
    2017-02-16 17:34:25,157 [myid:] - INFO  [main:Environment@100] - Server environment:os.version=3.16.0-33-generic
    2017-02-16 17:34:25,157 [myid:] - INFO  [main:Environment@100] - Server
    2017-02-16 17:34:25,158 [myid:] - INFO  [main:Environment@100] - Server environment:user.home=/root
    2017-02-16 17:34:25,158 [myid:] - INFO  [main:Environment@100] - Server environment:user.dir=/home/sentinelleader
    2017-02-16 17:34:25,159 [myid:] - INFO  [main:ZooKeeperServer@823] - tickTime set to 2000
    2017-02-16 17:34:25,159 [myid:] - INFO  [main:ZooKeeperServer@832] - minSessionTimeout set to -1
    2017-02-16 17:34:25,159 [myid:] - INFO  [main:ZooKeeperServer@841] - maxSessionTimeout set to -1
    2017-02-16 17:34:25,170 [myid:] - INFO  [main:NIOServerCnxnFactory@94] - binding to port
    2017-02-16 17:34:26,144 [myid:] - INFO  [NIOServerCxn.Factory:] - Accepted socket connection from /
    2017-02-16 17:34:26,205 [myid:] - INFO  [NIOServerCxn.Factory:] - Client attempting to renew session 0x15a4747c2ba0037 at /
    2017-02-16 17:34:26,208 [myid:] - INFO  [NIOServerCxn.Factory:] - Established session 0x15a4747c2ba0037 with negotiated timeout 15000 for client /127.

Once ZK is up, let’s try using our newly built wrapper binary to start the solr service. We also need to create a zk node. This node will be used in the ZK_HOST params.

$ solr-6.3.0/server/scripts/cloud-scripts/ -zkhost localhost:2181 -cmd makepath /solr

# lets verify if the node is created successfully

$ solr-6.3.0/server/scripts/cloud-scripts/ -zkhost localhost:2181 -cmd get /solr/clusterstate.json
{}             # the output should be an empty JSON object

Let’s start our solr node

$ bazel-bin/services/solr/start_solr

Rotating solr logs, keeping a max of 9 generations
2017-02-16 19:41:39.222 INFO  (main) [   ] o.e.j.s.Server jetty-9.3.8.v20160314
2017-02-16 19:41:39.540 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter  ___      _       Welcome to Apache Solr™ version 6.3.0
2017-02-16 19:41:39.544 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter / __| ___| |_ _   Starting in cloud mode on port 9301
2017-02-16 19:41:39.544 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter \__ \/ _ \ | '_|  Install dir: /home/sll/.cache/bazel/_bazel_sll/8c5b18e68ee8d852703298c6bc6863a4/external/solr_ver630
2017-02-16 19:41:39.616 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter |___/\___/_|_|    Start time: 2017-02-16T19:41:39.546Z
2017-02-16 19:41:39.638 INFO  (main) [   ] o.a.s.c.SolrResourceLoader Using system property solr.solr.home: /tmp/solr-home
2017-02-16 19:41:39.698 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Loading solr.xml from SolrHome (not found in ZooKeeper)
2017-02-16 19:41:39.700 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading container configuration from /tmp/solr-home/solr.xml
2017-02-16 19:41:40.040 INFO  (main) [   ] o.a.s.u.UpdateShardHandler Creating UpdateShardHandler HTTP client with params: socketTimeout=600000&connTimeout=60000&retry=true
2017-02-16 19:41:40.045 INFO  (main) [   ] o.a.s.c.ZkContainer Zookeeper client=
2017-02-16 19:41:40.139 INFO  (main) [   ] o.a.s.c.OverseerElectionContext I am going to be the leader localhost:9301_solr
2017-02-16 19:41:40.143 INFO  (main) [   ] o.a.s.c.Overseer Overseer (id=97469495097163793-localhost:9301_solr-n_0000000001) starting
2017-02-16 19:41:40.238 INFO  (main) [   ] o.a.s.c.ZkController Register node as live in ZooKeeper:/live_nodes/localhost:9301_solr
2017-02-16 19:41:40.243 INFO  (zkCallback-5-thread-1-processing-n:localhost:9301_solr) [   ] o.a.s.c.c.ZkStateReader Updated live nodes from ZooKeeper... (0) -> (1)
2017-02-16 19:41:40.321 INFO  (main) [   ] o.a.s.c.CorePropertiesLocator Found 0 core definitions underneath /tmp/solr-home
2017-02-16 19:41:40.380 INFO  (main) [   ] o.e.j.s.ServerConnector Started ServerConnector@5386659f{HTTP/1.1,[http/1.1]}{}
2017-02-16 19:41:40.380 INFO  (main) [   ] o.e.j.s.Server Started @1749m

Woohoo the solr node has started successfully. Let’s manually check the status of running service.

$ bin/solr status

Found 1 Solr nodes:

Solr process 15066 running on port 9301
  "version":"6.3.0 a66a44513ee8191e25b477372094bfa846450316 - shalin - 2016-11-02 19:52:42",
  "uptime":"0 days, 0 hours, 0 minutes, 36 seconds",
  "memory":"17.4 MB (%3.6) of 122.7 MB",

Woohoo, the service is up and running. Now let’s create our vader collection.

$ solr-6.3.0/bin/solr create -c vader -d /tmp/solr-home/configsets/vader/conf/ -shards 1 -replicationFactor 1

Connecting to ZooKeeper at ...
Uploading /tmp/solr-home/configsets/vader/conf for config vader to ZooKeeper at

Creating new collection 'vader' using command:


Let’s also verify that our configs are uploaded to ZooKeeper

$ /usr/local/zookeeper/bin/

Connecting to localhost:2181

WatchedEvent state:SyncConnected type:None path:null

[zk: localhost:2181(CONNECTED) 0] ls /solr/
configs             overseer            aliases.json        live_nodes          collections
overseer_elect      security.json       clusterstate.json

[zk: localhost:2181(CONNECTED) 0] ls /solr/configs/vader/

schema.xml       protwords.txt    solrconfig.xml   synonyms.txt     stopwords.txt

We can also verify our collection state from zookeeper

[zk: localhost:2181(CONNECTED) 0] get /solr/collections/vader/state.json

Woohoo, our Solr Cloud service is running with the newly created collection. And we now have a fully hermetic package with all the dependencies embedded within it. Bazel has a pretty good caching mechanism, so it will neither rebuild the package nor re-download the external dependencies when we run the same build command again and again. We can also use Bazel to bundle our packages. Currently Bazel can create tar/deb packages and even Docker images. Bazel has a lot of interesting features, which I’ll explain in detail in my upcoming posts on Bazel 🙂