Deploying the Serverless Stack

How I Learned To Stop Worrying and Push To Master

11/22/2021

If you've ever worked on a team of developers of any size, your team has undoubtedly agreed on how your code will be branched and integrated.

For most of my career, I've been following one form or another of a GitFlow branching model.

If you're not familiar with GitFlow, it is a branching model where you isolate feature branches and carefully craft them into release branches that make their way back into your master or main branch. If this sounds complicated, hang tight - I come bearing diagrams!

In the past couple of months, my team at Evolution Virtual has switched from GitFlow to Trunk-Based Development. As a result, we have observed huge gains in both our productivity and our overall morale.

The idea behind Trunk-Based Development is that every commit goes straight to the main branch. This means that you often have many developers committing to the same branch multiple times a day.

Why would anyone do this?

Isn't it a nightmare to manage releases and bugfixes if your main branch is in a constant state of metamorphosis?

I can already feel the nervous twitching from the GitFlow purists. Allow me to make my case. I believe that if you approach your branching model with an open mind, you may even end up agreeing with me.


Why GitFlow Wasn't Working For Us

GitFlow is a set of checks and balances that deliberately slows you down. The theory is that if there are more stopgaps in your workflow, you will catch more mistakes and ensure higher quality code in your main branch.

The GitFlow workflow is essentially:

Branch new features off of the develop branch
Iterate on your feature branch until it is completed
Merge back to develop
Cut a release branch and merge it into the main or master branch, which is the source of truth for what is deployed to production

In practice, slowing down the flow of integration into the main branch can arguably introduce more bugs and lower quality than moving fast and pushing smaller changes more frequently. I'll discuss why that is in a moment.

GitFlow Was Slowing Us Down

As I already said, slowing down the integration flow is partly by design with GitFlow. The issue was that slowing down our integration flow was also slowing down our rate of development.

Our workflow was admittedly tedious, but I'm not sure it involved much more ceremony than most of the other places I've worked.

For a feature to be considered ready to merge into the main branch, it had to go through:

Code review (in GitHub)
Pass all unit tests in CI
Build on CircleCI
Get deployed to a user acceptance testing (UAT) environment
Pass manual and automated QA (there were often multiple rounds of QA if small details were overlooked)

Our velocity was taking a huge hit.

Quality Assurance was under-resourced. They had a huge job of checking and re-checking every feature to verify that there were no regressions. After merging a feature into develop, they had to check again to see if there were any new issues that were introduced by bad merges or conflicting feature requirements.

Our team was consistently completing between 6 and 8 story points per sprint. It was easy to project forward and see that we were going to miss our deadlines if nothing changed.

By changing our flow, we went from 8 points to 16 points per sprint. In the past few weeks, as bugs started pouring in from QA, we have dropped to 12 points, but so far we have never gone lower.
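Putting the sprint numbers side by side gives these rough ratios (simple arithmetic on the figures above, using the old range of 6 to 8 points and the new range of 12 to 16 points):

```latex
\frac{16}{8} = 2.0, \qquad \frac{16}{6} \approx 2.67, \qquad \frac{12}{8} = 1.5, \qquad \frac{12}{6} = 2.0
```

So even the dip to 12 points is at least 1.5 times the old pace, and the best sprints are close to triple the worst old ones.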

Merge Of Doom

One of the biggest issues with a sophisticated branching model is that working branches tend to diverge from each other over time.

No matter how much you try to keep developers working on their own corners of the application, there will inevitably be overlap, and where there is overlap, there are conflicts.

The element of time compounds the problem even further. The more time that passes from the point a branch is broken off from develop to the time it is merged back in, the more opportunity there is for other branches to diverge in drastic ways.

A helpful illustration is to imagine you are writing a novel with one of your friends. You write the first chapter and then you ask your friend to write the second chapter while you tackle the third. Without extremely careful planning, your friend may end his chapter in a place that doesn't flow into yours. You may have different ideas of what the plot should be or who the characters really are.

If each chapter is fifty pages, it's almost impossible that the two will work together without heavy rewriting. However, if every chapter is only a few sentences, you'll have more opportunity to course correct as you discover new elements of the story.

As in the novel example, the more frequently we push to our source of truth, the lower the chance of big conflicts.

Conflicts are costly even when they are managed well. In the past, we would often pair program through our merge conflicts to make sure we weren't accidentally overwriting something important that another developer had worked on.

It would often mean a couple of hours a week with two or more developers comparing notes on a conflict, trying to make sure our changes didn't get lost.

When managed poorly, conflicts can be even more costly.

Features we thought were completed would sometimes be missing important requirements that had been overwritten. Bugs that were fixed would be broken again. New bugs would be introduced.

The risk of conflict grew with the time that passed between when a branch was cut from develop and when it was merged back.

For bigger features, a branch's life could last one or even two weeks. The more time that passed, the greater divergence there would be from the other code.

If a feature was left undone for a month or more, it was almost impossible to resurrect and merge back into the codebase because the shape of the codebase would change so much over time.

The Solution: Delete All the Branches

One day, I was programming something and couldn't get it working. My eight-year-old son came upstairs, and I decided to use him as my rubber duck. After I explained the issue in a way I thought an eight-year-old could grasp, he looked at me and said "why don't you just delete all your code and start over?"

What I'm proposing here is the branching equivalent of deleting all of your code: deleting all of your branches. Well, all of your branches but one.

This all came together for me when I was catching up on YouTube and stumbled across Dave Farley's video Continuous Integration vs Feature Branch Workflow. He explained many of the inherent problems that I was seeing in my team's ability to deliver code quickly.

To be fair, I have always done Trunk-Based Development on personal projects and open-source libraries, but never in a team setting. My question was whether Continuous Integration could work in the real world with multiple developers pushing code constantly. I did some investigating and found that a lot of large companies use the Continuous Integration model. If it worked for them, it should be able to work for us.

With Trunk-Based Development (or Continuous Integration), developers are encouraged to push their code to the main branch frequently. Not just when a feature is finished, but every time there is new meaningful working code.

The trick is to separate code deployments from new features becoming available. We found that features can be toggled on and off with feature flags, so our code releases could become separate from our feature releases. This allowed us to push incomplete features frequently without fear of releasing anything half-baked.
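How the flags are stored isn't covered here, so the following is only a minimal TypeScript sketch of the idea, assuming flags are read from environment variables and default to off. The flag names `newCheckout` and `betaDashboard`, the `FLAG_` prefix, and the `loadFlags` and `renderCheckout` helpers are all hypothetical.

```typescript
// Hypothetical flag names; a real project might read these from a
// config file or a flag service instead of environment variables.
type FlagName = "newCheckout" | "betaDashboard";
type FlagConfig = Record<FlagName, boolean>;

// Every flag defaults to off, so merged-but-unfinished code stays dark.
const defaultFlags: FlagConfig = {
  newCheckout: false,
  betaDashboard: false,
};

// Builds the flag set for one environment. For example,
// FLAG_NEWCHECKOUT=true turns on the newCheckout flag.
function loadFlags(env: Record<string, string | undefined>): FlagConfig {
  const flags: FlagConfig = { ...defaultFlags };
  for (const name of Object.keys(defaultFlags) as FlagName[]) {
    const raw = env[`FLAG_${name.toUpperCase()}`];
    if (raw !== undefined) {
      flags[name] = raw === "true";
    }
  }
  return flags;
}

// The new code path is deployed alongside the old one, but it is
// only reachable when its flag is on.
function renderCheckout(flags: FlagConfig): string {
  return flags.newCheckout ? "new checkout" : "legacy checkout";
}
```

With this shape, shipping the code and releasing the feature are two separate events: the code can sit on staging and production with its flag off, and flipping the flag in one environment (say, staging first) is the release.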

The Changes We Made

Moving to continuous integration meant changing our processes at many levels.

I considered all of the things that were slowing us down from the point of a developer finishing a task to the code finally getting merged. Our team has made the following adjustments:

Code Reviews After Merge

Before moving away from GitFlow, we reviewed our code before merging it using GitHub's pull requests.

Since we didn't have branches anymore, we also didn't have pull requests.

Our code reviews needed to take place at a different point in the process.

We have transitioned to having a meeting once a week where we walk through the code that was committed the previous week.

Code review is a time to make suggestions on ways we can improve our codebase. We find areas where we're duplicating code and refactor it. We use the time as an opportunity to gain a better understanding of the pieces of the system that we haven't personally touched.

I feel that post-merge code reviews have actually been more productive than the asynchronous pull requests we were doing before. It gives each developer an opportunity to explain their intent and the reason they made each choice.

QA In Staging

On a typical day, we will push code to the main branch between 10 and 20 times. If the code builds and the tests pass, it automatically deploys to a staging environment.

Before switching to Trunk-Based Development, we were building ephemeral UAT environments for each feature. Each feature was independently tested before graduating to the develop branch, merging into staging and getting deployed to production.

We have now eliminated every environment but staging and production.

We are now using Serverless Stack (SST) for all of our serverless infrastructure. Since we're able to run our own copy of the infrastructure for local development, our local environment essentially serves as a testing sandbox for our features before they are pushed to the main branch.

We do have a manual QA step in staging, but since we are continuously integrating our code, every issue that is found is treated as a new bug in our system. We have more bug tickets than we did before changing our process, but the new process also lets us complete features more quickly and iterate on them.

Working through bug tickets this way has reduced our context switching. We can finish the task that we're working on and then tackle the bugs queued up behind it.

When we do start work on a bug or a feature, it is always in the context of the latest state of the main branch.

Manual Release to Production

After our code automatically deploys to staging, it goes into a hold state. Once we have manually verified that there are no regressions, we click a button and it releases to production.
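As a rough model of this flow (a sketch, not our actual CI configuration), the gating rules fit in two small TypeScript functions: a push to main that builds and passes its tests deploys to staging automatically, and a staged build reaches production only after someone approves it. The `Build` type, the stage names, and both functions are hypothetical.

```typescript
// Hypothetical stages a commit can end up in.
type Stage = "rejected" | "staging" | "production";

interface Build {
  commit: string;
  compiled: boolean;
  testsPassed: boolean;
  stage?: Stage;
}

// Runs on every push to main.
function onPushToMain(build: Build): Build {
  // A build that fails to compile or fails its tests never leaves CI.
  if (!build.compiled || !build.testsPassed) {
    return { ...build, stage: "rejected" };
  }
  // Green builds deploy to staging automatically and then wait (the hold state).
  return { ...build, stage: "staging" };
}

// The "click a button" step: only a build already on staging can be
// promoted, and only with a named approver.
function approveRelease(build: Build, approvedBy: string | null): Build {
  if (build.stage !== "staging" || !approvedBy) {
    return build;
  }
  return { ...build, stage: "production" };
}
```

If the pipeline runs on CircleCI, which already appears in our old checklist, one way to get this hold is a workflow job with `type: approval` placed before the production deploy job.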

We can release to production as often as needed, but so far, it's usually been once or twice a week.

Are We Even Really Agile Now?

Because of the new flow, we barely have formal sprints. We do continue to do daily standups as well as some informal pop-up sprint planning sessions.

We aren't exactly doing a sprint. We are pushing continual value to our users. Our flow is basically Kanban and it honestly works really well with the Continuous Integration flow.

That said, I think what we're doing is very much in line with the spirit of Agile. The first principle of the Agile manifesto is "Our highest priority is to satisfy the customer through early and continuous delivery of valuable software."

Our process is designed to continuously deliver valuable software, so if Agile is about people over processes, I think we may check the box.

Is It Working?

For our team, the answer is definitely yes.

I'm always hesitant to be overly prescriptive about particular solutions or processes, but in our case, we've doubled, and at times nearly tripled, our productivity. We have become much more responsive to issues and bugs. We rarely have a merge conflict that is more than a couple of lines of easily resolved code.

That said, GitFlow and other git branching strategies work well for a lot of people. If it works for you, I say keep doing what you're doing. For the rest of us, Continuous Integration has been a breath of fresh air!
