bazel

Bazel – For Fast, Correct, Reproducible Builds

In my previous blog, i’ve shown bazel in action by building a solr cloud package. In this blog i’m going to explain a bit more about Bazel.

Bazel is the Open Source version of Google’s internal build tool Blaze. Bazel is currently in beta state, but it has been used by a number of companies in production. Bazel has some quite interesting features. Bazel has a good caching mechanism. It caches all input files, all external dependencies etc … Before running the actual build, bazel will first check the existing cache and if the cache is valid. If valid, then bazel will try to check if there are any changes to the input files/ dependencies. If it detect any changes, then bazel will start re-building the package. We can also use bazel to build our test targets and can make bazel to run our unit/integration tests for the built targets. Bazel can also detect cyclic dependencies with in the code. Another important feature is sandboxing. On Linux, Bazel can run build/test inside a sandboxed environment and can detect file leaks or broken dependencies. This is because, during sandbox mode, bazel will mount only the specified input files, data dependencies on to the sandbox environment.

Bazel Build Flow

Let’s see how the bazel build process flow works. First thing that we need is a WORKSPACE file. A bazel workspace is a directory that contains the source files for one or more software projects, as well as a WORKSPACE file and BUILD files that contain the instructions that Bazel uses to build the software. It also contains symbolic links to output directories in the Bazel home directory

Let’s create a simple workspace for testing

$ mkdir bazel-test && cd bazel-test

$ touch WORKSPACE

Now i’m going to build a simple python package. hello.py is a simple python script which imports a hello function from dep.py. So our primary script is hello.py which has a dependency on dep.py

vagrant@trusty-docker:~/bazel-test$ cat hello.py
from dep import hello
print hello("Building a simple python package with Bazel")

vagrant@trusty-docker:~/bazel-test$ cat dep.py
def hello(msg):
    return msg

The Bazel’s build command basically looks for a BUILD file on the target location. This file should contain the necessary bazel build rules. Bazel’s Python Rule Documentation explains the list of rules that are supported. Applying this to our test scripts, we are going to build a py_binary for our hello.py and this binary has a py_library dependency towards dep.py. So our final BUILD file will be,

py_library(
  name = 'dep',
  srcs = ['dep.py'],
)

py_binary(
  name = 'hello',
  srcs = ['hello.py'],
  deps = [':dep'],    # our dependency towards `dep.py`
)

So we have the BUILD file now, let’s kick off a build

vagrant@trusty-docker:~/bazel-test$ bazel build hello
............
INFO: Found 1 target...
Target //:hello up-to-date:
  bazel-bin/hello
INFO: Elapsed time: 4.564s, Critical Path: 0.03s

woohoo, so bazel has build the package for us. Now if we check our workspace, we will see a bunch of bazel-* symlinks. These directories points to the bazel home directory where our final build output lies.

vagrant@trusty-docker:~/bazel-test$ tree -d
.
├── bazel-bazel-test -> /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/__main__
├── bazel-bin -> /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/bazel-test/bazel-out/local-fastbuild/bin
├── bazel-genfiles -> /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/bazel-test/bazel-out/local-fastbuild/genfiles
├── bazel-out -> /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/__main__/bazel-out
└── bazel-testlogs -> /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/bazel-test/bazel-out/local-fastbuild/testlogs

So our new python binary is available in bazel-bin/hello. Also, bazel creates something called runfiles which exists next to the binary. Bazel actually copies our dependencies (input files and data dependencies) onto this runfiles folder.

-r-xr-xr-x 1 vagrant vagrant 4364 Feb 19 20:13 bazel-bin/hello
vagrant@trusty-docker:~/bazel-test$ ls -l bazel-bin/hello
hello                    hello.runfiles/          hello.runfiles_manifest
vagrant@trusty-docker:~/bazel-test$ ls -l bazel-bin/hello.runfiles/__main__/
total 4
lrwxrwxrwx 1 vagrant vagrant  31 Feb 19 20:13 dep.py -> /home/vagrant/bazel-test/dep.py
lrwxrwxrwx 1 vagrant vagrant 130 Feb 19 20:13 hello -> /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/bazel-test/bazel-out/local-fastbuild/bin/hello
lrwxrwxrwx 1 vagrant vagrant  33 Feb 19 20:13 hello.py -> /home/vagrant/bazel-test/hello.py

If we go through our python binary bazel-bin/hello, it’s nothing but a wrapper script which basically identifies our runfiles directory path, add this runfiles path to the PYTHONPATH env variable and then invokes our hello.py file. In the beginning, i’ve mentioned that bazel has a good caching mechanism. Let’s re-run the build command and see the output, especially the time taken to complete the build process.

vagrant@trusty-docker:~/bazel-test$ bazel build hello
INFO: Found 1 target...
Target //:hello up-to-date:
  bazel-bin/hello
INFO: Elapsed time: 0.247s, Critical Path: 0.00s

Let’s compare the build time for both the build process. The first build process took ~ 4.5 sec. But the second one is ~ 0.2 sec. This is because, bazel didnt run real build process during the second run. It actually verified the input files against its cache and found no change.

Now let’s add a simple unit test and see how bazel can run the same.

vagrant@trusty-docker:~/bazel-test$ cat hello_test.py
import unittest

from dep import hello

class TestHello(unittest.TestCase):

  def test_hello(self):
    self.assertEquals(hello("test message"), "test message")

if __name__ == '__main__':
  unittest.main()

Now let’s add a py_test rule to our BUILD file so that bazel can use it with bazel test.

py_test(
    name = "hello_test",
    srcs = ["hello_test.py"],
    deps = [
        ':dep',
    ],
)

We have the py_test rule, now let’s run the bazel test command and verify.

vagrant@trusty-docker:~/bazel-test$ bazel test hello_test
INFO: Found 1 test target...
Target //:hello_test up-to-date:
  bazel-bin/hello_test
INFO: Elapsed time: 2.255s, Critical Path: 0.06s
//:hello_test                                                            PASSED in 0.0s

Executed 1 out of 1 test: 1 test passes.

woohoo the test seems to run fine. Now let’s manually break the test and see if bazel is picking the failure also.

vagrant@trusty-docker:~/bazel-test$ bazel test hello_test
INFO: Found 1 test target...
FAIL: //:hello_test (see /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/bazel-test/bazel-out/local-fastbuild/testlogs/hello_test/test.log).
Target //:hello_test up-to-date:
  bazel-bin/hello_test
INFO: Elapsed time: 0.199s, Critical Path: 0.05s
//:hello_test                                                            FAILED in 1 out of 2 in 0.0s
  /home/vagrant/.cache/bazel/_bazel_vagrant/9dedbe0729180ec68a026adfb67cba5d/execroot/bazel-test/bazel-out/local-fastbuild/testlogs/hello_test/test.log

Executed 1 out of 1 test: 1 fails locally.

Bingo, bazel is detecting the test failure too. During our build process we saw that bazel caches the build and doesnt re-run the build process unless it desont detect any changes to the dependencies. Now lets see if what bazel does with tests too.

vagrant@trusty-docker:~/bazel-test$ bazel test hello_test
INFO: Found 1 test target...
Target //:hello_test up-to-date:
  bazel-bin/hello_test
INFO: Elapsed time: 0.169s, Critical Path: 0.04s
//:hello_test                                                            PASSED in 0.0s

Executed 1 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
vagrant@trusty-docker:~/bazel-test$ bazel test hello_test
INFO: Found 1 test target...
Target //:hello_test up-to-date:
  bazel-bin/hello_test
INFO: Elapsed time: 0.087s, Critical Path: 0.00s
//:hello_test                                                   (cached) PASSED in 0.0s

Executed 0 out of 1 test: 1 test passes.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.

Bingo, we can see the (cached) line in the output of the second tests run. So like the build process, bazel does caches the tests too.

Customizing Bazel Rules

py_binary, py_library etc… are the default bazel python rules which comes with bazel. Unlike any other product, we might endup in cases where we need to have custom rules to solve our specific needs. And the good news is, Bazel comes with an extension called skylark. With skylark, we can create custom build rules matching our requirements. Skylark syntax are pretty similar to python. I’ll be writing a more detailed blog on skyalrk soon 🙂

Conclusion

Though bazel is still in beta, it seems to be a really interesting tool for building hermetic packages. Bazel does has the ability to detect cylic dependencies and dependency leaks which is really an important thing. The caching ability of bazel really helps us to build faster packages compared to other traditional build tools.

Advertisements
Standard

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s