
Scaling Rust builds with Bazel


The CI saga

Over the project's life, we used various tools to mitigate cargo's limitations, with mixed success.

The nix days

When we started the Rust implementation in mid-2019, we relied on nix to build all our software and set up the development environment in a cross-platform way (we develop both on macOS and Linux).

As our code base grew, we started to feel nix's limitations. The unit of caching in nix is a derivation. If we wanted to take full advantage of nix's caching capabilities, we would have to "nixify" all our external dependencies and internal Rust packages (one derivation per Rust package). After a long fight with build reproducibility issues, our glorious dev-infra team implemented fine-grained caching using the cargo2nix project.

Unfortunately, most developers in the team were uncomfortable with nix. It became a constant source of confusion and lost developer productivity. Since nix has a steep learning curve, only a few nix wizards could understand and modify the build rules. This nix-alienation bifurcated our build environment: the CI servers built the code with nix-build, and developers built the code by entering the nix-shell and invoking cargo.

The iceberg

The final blow to the nix story came around late 2020, close to the network launch. Our security team chose Ubuntu as the deployment target and insisted that production binaries link against the regularly updated system libraries (libc, libc++, openssl, etc.) that the deployment platform provides. This setup is hard to achieve in nix without compromising correctness. (We considered using patchelf, but it's a bad idea in general: libc++ from nix packages can be incompatible with the one installed on the deployment platform.)

Furthermore, the infrastructure team got a few new members unfamiliar with nix and decided to switch to a more familiar technology, Docker containers. The team implemented a new build system that runs cargo builds inside a docker container with the versions of dynamic libraries identical to those in the production environment.

The new system grew organically and eventually evolved into a hot mess of a hundred GitLab YAML configuration files calling shell and Python scripts in the correct order. These scripts used well-known filesystem locations and environment variables to pass build artifacts around. Most integration tests ended up as shell scripts that expected inputs produced by the CI pipeline.

The new Docker-based build system lost the granular caching capabilities of nix-build. The infra team attempted to build a custom caching system but eventually abandoned the project. Cache invalidation is a challenging problem indeed.

With the new system, the chasm between the CI and development environments deepened further because the nix-shell didn't go anywhere. The developers continued to use nix-shell for everyday development. It's hard to pinpoint the exact reason. I attribute that to the fact that entering the nix-shell is less invasive than entering a docker container, and nix-shell does not require running in a virtual machine on macOS (Rust compile times are slow). Also, the infra team was so busy rewriting the build system that improving the everyday developer experience was out of reach.

I call this setup an "iceberg": on the surface, a developer needed only nix and cargo to work on the code, but in practice, that was only 10% of the story. Since most tests required a CI environment, developers had to create merge requests to check whether their code worked beyond the basic unit tests. The CI didn't know developers were interested in running a specific test and executed the entire test suite, wasting scarce computing resources and slowing the development cycle.

The tests accumulated over time, the load on the CI system grew, and eventually, the builds became unbearably slow and flaky. It was time for another change.

Enter Bazel

Among about a dozen build systems I have worked with, Bazel is the only one that made sense to me. (It might also be that I never learned to do anything without involving protocol buffers.) One of my favorite features of Bazel is how explicit and intuitive it is for everyday use.

Bazel is like a good videogame: easy to learn, challenging to master. It's easy to define and wire build targets (that's what most engineers do), but adding new build rules requires some expertise. Every engineer at Google can write correct build files without knowing much about Blaze (Google's internal variant of Bazel). The build files are verbose, bordering on plain boring, but that's a good thing: they tell the reader precisely what the module's artifacts and dependencies are.
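
To give a flavor of what this looks like in practice, here is a minimal sketch of a BUILD file using rules_rust; the target names and package paths are made up for illustration, not taken from our repository.

    load("@rules_rust//rust:defs.bzl", "rust_library", "rust_test")

    # A library target: sources and dependencies are declared explicitly.
    rust_library(
        name = "replica",
        srcs = glob(["src/**/*.rs"]),
        deps = [
            "//rs/types",            # an internal package (hypothetical path)
            "@crate_index//:serde",  # an external crate from a dependency index
        ],
    )

    # Unit tests for the library above; Bazel caches their results.
    rust_test(
        name = "replica_test",
        crate = ":replica",
    )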

Bazel offers many features, but we mostly cared about the following:

  • Extensibility. Bazel is extensible enough to cover all our use cases and gracefully handled everything we threw at it: Linux and macOS binaries, WebAssembly programs, OS images, Docker containers, Motoko programs, TLA+ specifications, etc. The best part: we can combine and mix these artifacts in any way we like.
  • Aggressive caching. The sandboxing feature ensures that build actions do not use undeclared dependencies, making it much safer to cache build artifacts and, most importantly for us, test results.
  • Remote caching. We use the cache from our CI system to speed up developer builds.
  • Distributed builds. Bazel can distribute tasks across multiple machines to finish builds even faster.
  • Visibility control. Bazel allows package authors to mark some packages as internal to prevent other teams from importing the code (see the sketch after this list). Controlling dependency graphs is crucial for fast builds.
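
For illustration, here is a sketch of what such a visibility declaration might look like (the package paths are hypothetical):

    load("@rules_rust//rust:defs.bzl", "rust_library")

    rust_library(
        name = "internal_utils",
        srcs = glob(["src/**/*.rs"]),
        # Only packages under //rs/replica may depend on this target;
        # everyone else gets a build error instead of a new dependency edge.
        visibility = ["//rs/replica:__subpackages__"],
    )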

Even more importantly, Bazel unifies our development and CI environments. All our tests are Bazel tests now, meaning that every developer can run any test locally. At its heart, our CI job is bazel test --config=ci //....
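
The --config=ci part selects a named group of options from the repository's .bazelrc. A plausible sketch of such a config (the cache endpoint is a placeholder, not our real setup):

    # .bazelrc
    build:ci --remote_cache=https://cache.example.com   # placeholder cache endpoint
    build:ci --keep_going                                # report all failures, not just the first
    test:ci --test_output=errors                         # print logs only for failing tests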

One nice feature of our Bazel setup is that we can configure versions of our external dependencies in a single file. Ironically, cargo developers implemented support for workspace dependency inheritance a few weeks after we finished the migration.
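
As a rough sketch of that idea, rules_rust's crate_universe lets a WORKSPACE declare every external crate and its version in one place (a simplified sketch, not necessarily our exact setup; the crates and versions below are illustrative):

    load("@rules_rust//crate_universe:defs.bzl", "crate", "crates_repository")

    crates_repository(
        name = "crate_index",
        cargo_lockfile = "//:Cargo.lock",
        packages = {
            # Bumping a version here updates it for the entire repository.
            "serde": crate.spec(version = "1.0", features = ["derive"]),
            "tokio": crate.spec(version = "1", features = ["full"]),
        },
    )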

