
Battle-tested strategies for speeding up CI builds

source link: https://buildingvts.com/battle-tested-strategies-for-speeding-up-ci-builds-41b6ea25bd2b


5 strategies to speed up CI execution times


Have you ever been in a situation where you were required to cut an application's CI execution time to half or a third of the original? Or are builds simply taking forever to finish and you need to improve the run time quickly?

We faced similar challenges at VTS because we have a monolith application whose CI workflows needed ~110 jobs per run. After a lot of research and trial and error, we arrived at the five strategies below for speeding up CI workflows. We wanted to share them to help shorten the feedback loop and increase developer productivity for others in a similar situation.

Motivation

At VTS, we wanted to leverage GitHub Actions, but as a prerequisite we first needed to get GitHub Actions execution times on par with our then-current CI provider.

We implemented these battle-tested strategies to speed up builds by 70% on GitHub Actions for our monolith web application. As a result, CI runtimes were not just on par with our previous CI provider but 15% faster!

What started as a long and slow ordeal of 30 minutes was down to 9 minutes.

Note: These strategies are CI-provider agnostic and should work everywhere.

Caching

The obvious one! Depending on the application stack, there are various sets of dependencies that can be cached to speed up the overall CI runtime. For reference, we have a React frontend and a backend powered by Rails.

Here are some of the folders we aggressively cached in GitHub Actions, and the savings that came with each:

Ruby Dependencies

The path where bundler stores all the application gems

Path: /usr/local/bundle

Savings: 3 minutes

Node Dependencies

The path where npm/yarn stores all project dependencies

Path: node_modules

Savings: 2 minutes

Webpack cache

The path where webpack stores build related cache by default

Path: node_modules/.cache

Savings: 3 minutes for webpack build jobs
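As a sketch, the three cache locations above can be wired up with the `actions/cache` action. The keys here are illustrative; they derive from the lockfiles so a cache entry is invalidated whenever dependencies change:

```yaml
# Illustrative caching steps for a GitHub Actions job.
- name: Cache Ruby gems
  uses: actions/cache@v3
  with:
    path: /usr/local/bundle
    key: gems-${{ runner.os }}-${{ hashFiles('Gemfile.lock') }}

- name: Cache node modules
  uses: actions/cache@v3
  with:
    path: node_modules
    key: node-${{ runner.os }}-${{ hashFiles('yarn.lock') }}

- name: Cache webpack build cache
  uses: actions/cache@v3
  with:
    path: node_modules/.cache
    key: webpack-${{ runner.os }}-${{ hashFiles('yarn.lock') }}
```

Adjust the paths and hashed files to your own Ruby/Node layout.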

It’s important to cache these initial jobs as any time savings here will have a direct impact to the overall runtime. This is because most downstream jobs are dependent on the completion of these initial jobs.

Fallback caching

Fallback (or partial) caching is a feature where, if the primary cache key is not found, the most recent matching cache entry is restored instead. It only kicks in when restore-keys are provided.

Imagine a change that adds a new npm package. Because the primary cache key is based on the hash of the lock file, it misses, and the fallback cache is restored instead. `yarn install` still needs to run, but instead of rebuilding all the packages it only installs the new dependency and updates the cache afterwards. Pretty slick, isn't it?

Furthermore, it allows for more parallelization, as now jobs like linter or tests can run at the start without needing to wait for dependency install jobs (e.g. yarn or bundle install) to complete.

Use fallback caching only if you cache the global cache folders recommended by the package manager, as those are guaranteed to remain pure.

Don't use this feature if the workflow caches entire dependency folders (e.g. node_modules), as that can lead to cache corruption and inconsistency over time.
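A minimal sketch of fallback caching with `restore-keys`, assuming yarn's global cache folder is the cached path rather than `node_modules` (the exact cache directory varies by yarn version and OS):

```yaml
# If no cache matches the exact lockfile hash, the most recent entry
# matching the "yarn-cache-<os>-" prefix is restored instead, so
# `yarn install` only fetches the packages that actually changed.
- name: Cache yarn global cache
  uses: actions/cache@v3
  with:
    path: ~/.cache/yarn   # yarn's global cache dir (assumed location)
    key: yarn-cache-${{ runner.os }}-${{ hashFiles('yarn.lock') }}
    restore-keys: |
      yarn-cache-${{ runner.os }}-
```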

Parallelization

Test Parallelization

Parallelizing test suites on CI is crucial to speeding up runtimes. We have just the recommendation for implementing this: Knapsack Pro.

Knapsack Pro's setup is minimal, and you can parallelize tests on CI in no time: it does most of the heavy lifting while leaving room to customize the configuration as needed. It's a paid tool, but worth it, as it supports multiple frameworks.

We use the tool for parallelizing RSpec, Cypress (in a different application), and Jest tests, and it works really well. We were able to bring execution times down by 6–8 minutes by optimally parallelizing the RSpec and Jest tests across CI nodes.

A no-brainer buy-vs-build decision, because building a test parallelization framework is no easy feat.

Test parallelization benefits can only be realized if tests are atomic. We recommend investing time in making the test suite atomic and stateless if it isn't already, but that's out of scope for this article.
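As an illustration, a Knapsack Pro RSpec matrix in a GitHub Actions workflow might look roughly like this (the node count and secret name are placeholders, not our actual configuration):

```yaml
rspec:
  runs-on: ubuntu-latest
  strategy:
    fail-fast: false
    matrix:
      ci_node_index: [0, 1, 2, 3]   # 4 parallel nodes (illustrative)
  env:
    KNAPSACK_PRO_TEST_SUITE_TOKEN_RSPEC: ${{ secrets.KNAPSACK_PRO_TEST_SUITE_TOKEN_RSPEC }}
    KNAPSACK_PRO_CI_NODE_TOTAL: 4
    KNAPSACK_PRO_CI_NODE_INDEX: ${{ matrix.ci_node_index }}
  steps:
    - uses: actions/checkout@v3
    # Knapsack Pro splits the suite across the nodes based on timing data
    - run: bundle exec rake knapsack_pro:rspec
```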

Job parallelization

Apart from tests, look for any other long-running jobs and see whether they can be broken down and parallelized further, leveraging features like caching to speed up the builds.

For example, we were running the Ruby dependency and database setup jobs sequentially, but caching allowed us to run both in parallel and save a couple more minutes.

Deep diving into every job and demystifying its dependencies saved us another 15–20% in overall execution time.
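The dependency-setup example above can be sketched as a workflow where the two setup jobs have no `needs` edge between them, so they start at the same time (job names and steps are illustrative):

```yaml
jobs:
  bundle-install:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # restore the gem cache here; run `bundle install` only on a miss

  db-setup:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # set up the test database, restoring from cache where possible

  tests:
    needs: [bundle-install, db-setup]   # only this job waits for both
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # run the suite
```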

Building docker images for CI

Any job that requires a lot of setup in terms of installing system dependencies is a great candidate for having its own docker image.

Create a docker image to reduce the number of steps needed to run a job thereby speeding up the execution time.

We created a base image for Ruby with custom utilities installed, and another image for running E2E tests, which required chromedriver and a bunch of other utilities.

Gains with this approach won't be huge, but they will make a difference, and as a bonus the workflow code will be cleaner!
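One way to consume such an image is GitHub Actions' `container` option, so the system-dependency install steps disappear from the job entirely (the image name below is hypothetical):

```yaml
e2e-tests:
  runs-on: ubuntu-latest
  # Hypothetical prebuilt image with chromedriver and other system
  # utilities baked in, so the job skips the usual setup steps.
  container:
    image: ghcr.io/your-org/e2e-base:latest
  steps:
    - uses: actions/checkout@v3
    - run: yarn e2e   # placeholder E2E command
```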

Smart execution

We were facing a bottleneck at VTS: a huge suite of jobs running linters and tests for every small change we made.

We decided to brainstorm whether there were jobs that should run only on changed files rather than the entire repository, and whether we needed to run every job for each pull request.

Answering the questions below will help identify jobs that are redundant or that should only run on changed files:

  • Does a linter need to check all the files on every pull request?
  • Do we need to run all tests every time there's a small change?
  • Are there any redundant steps in the build?
  • Are there health checks waiting for a service container to spin up that take longer than usual?
  • Are there slow tasks that only matter for 5% of code changes, or that don't add much value?

Some linters and test frameworks have the ability to run checks only on changed files. We utilized this feature for Jest to reduce runtime by ~60%, and for ESLint to reduce execution times by ~80%, for these specific steps.

Once you have a list of jobs that can run on changed files or be removed entirely, make the changes and see the savings for yourself!
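As a sketch of the changed-files approach: Jest has a built-in `--changedSince` flag, while for ESLint the changed file list can be produced with `git diff` (the base branch and file globs below are illustrative; the diff requires a checkout with enough history, e.g. `fetch-depth: 0`):

```yaml
- name: Run Jest only on files affected by the change
  run: npx jest --changedSince=origin/main

- name: Lint only changed JS/TS files
  run: |
    git diff --name-only --diff-filter=ACM origin/main... \
      -- '*.js' '*.jsx' '*.ts' '*.tsx' \
      | xargs --no-run-if-empty npx eslint
```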

Results

Now for the results. Here is the before-and-after view of the pull request workflow, with execution times, for our monolith in GitHub Actions:

[Figures: PR workflow before, PR workflow after]

Note: Both workflow snapshots are functionally identical, but the "after" snapshot looks a bit different because:

  • We consolidated multiple linter/test jobs into one job with multiple steps
  • We reduced the required CI nodes for our parallel test matrices by ~40%

The above was to save on CI resources, but that’s a topic for another day ;)

Bonus

One other strategy to speed up builds is to implement docker layer caching.

At VTS we'll soon be implementing this to speed up our deployment pipelines and to boost container initialization times (which affect every single job) in GitHub Actions.
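One common way to get docker layer caching in GitHub Actions is buildx with the GitHub Actions cache backend, along these lines (the tag is illustrative):

```yaml
- uses: docker/setup-buildx-action@v2
- uses: docker/build-push-action@v4
  with:
    push: false
    tags: app:ci
    # Reuse image layers from previous runs via the GitHub Actions
    # cache backend; mode=max also caches intermediate layers.
    cache-from: type=gha
    cache-to: type=gha,mode=max
```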

That's it! Thank you for reading. Give it a 👏 or share the article if it was helpful.

Dev works as a Senior SRE on the platform infrastructure team at VTS. He is passionate about building developer tooling, reliable systems and driving continuous improvements.

