Testing anti-pattern: The soviet police-station

October 4, 2022

1,591 Words

There are many good reasons to write automated tests:

They can help drive, or provide feedback on, the implementation.
They act as living documentation / proof of what the program does.
They prevent us from putting bugs in front of users.

But, to me, the most valuable purpose of automated tests are the confidence that they inspire.
When I have a suite of good automated test, I’m confident it will tell me about any bugs I introduce before my users do.
It gives me the confidence to change my code without worrying about breaking it.
In other words – the tests help confident refactoring (defined as the changing of the code’s internals, to make it more readable / extensible etc., without changing its behaviour).

That’s why I was quite surprised when my colleague, Nitish, pointed out how the tests I was writing were hindering, not helping, refactoring.
Here’s a short example:

As part of our e-commerce application, we have a shopping cart, where we place items. We can then calculate the total amount to be paid for all the items in the cart:

class Cart

def total

cost = @items.map(&:price).sum

shipping = Shipment.new(@items).cost

return cost + shipping

end

fairly straightforward: the total to pay is the sum of all items’ prices plus shipping costs.
The calculation of shipment method and cost is delegated to the Shipment class.

class Shipment

def initialize(_items)

end

def cost

15 # TODO: some complex logic that takes into account number, and size of, items

end

Now, as is inevitable, some evil “business” person decides to ruin our beautiful code.
They tell us that premium users will get a discount if they purchase more than 5 items.

OK, so technically, this means that the code to handle Shipment now has to know about the user that owns the cart. Let’s make that change.
Staying true to Kent Beck’s “make the change easy, then make the easy change”, and working in small, safe steps, we make a small change:

class Cart

def total

cost = @items.map(&:price).sum

shipping = Shipment.new(@items, @user).cost # provide user object to shipment

return cost + shipping

end

class Shipment

def initialize(_items, _user)

end

def cost

15

end

This is a textbook refactoring: the behaviour of the code hasn’t changed. It’s not even reading the new user parameter yet. It returns the same result as it did before.

However, a test now fails:

#<Shipment (class)> received :new with unexpected arguments

Let’s have a look at that failing test:

RSpec.describe Cart do

describe '#total' do

before do

# `Shipment` is a complex class;

# we don't want the tests for `Cart` to fail whenever `Shipment` changes

# It's enough to just validate that the correct interaction happens

fake_shipment = instance_double(Shipment, cost: shipment_cost)

allow(Shipment).to receive(:new).with(items).and_return(fake_shipment)

end

let(:items) { [Item.new(price: 5), Item.new(price: 10)] }

let(:shipment_cost) { 20 }

it 'updates the total to be the sum of all prices plus shipping costs' do

expect(Cart.new(items).total).to eq(5+10+20)

end

Can you spot the problem here?

The fake Shipment object that is set up in the test doesn’t yet account for the new user argument.
No problem, just add some more mocking and we’ll be off on our merry way:

allow(Shipment).to receive(:new).with(items, user).and_return(fake_shipment)

That’s what I’ve done for years. Often, with much more complex class interactions, requiring many more changes to tests. Until Nitish pointed out that this is a waste of time.

My tests were essentially a copy-paste of my production code.
I only sprinkled some doubles, allows, and expects on top.
My tests were not going to find bugs, because all interactions were faked and will always return a constant value.
And, as we just saw, they were getting in the way of, rather than helping with, refactoring.
He called it the “Soviet police-station” style of testing.
The process to work with these tests is like so:

Refactor production code.
Observe some tests go red.
Take those tests to the back room.
Beat the sh*t out of them to make them look like the production code.
Tests finally confess to being green.

In other words – when a test fails, the “solution” is to copy over the implementation into the test, rather than fix an actual bug.

what purpose do those tests even serve?

They may have helped us drive the implementation at one point.
But they’re documenting the implementation, rather than what the application actually does.
They will certainly never find any bugs, because they’re just faking everything.

They will either stay green forever, or, if they do go red – only make us beat them up again rather than fix anything in our application.

Test behaviour; not functions

What’s the alternative to the soviet police-station type of useless testing?
We can verify the desired behaviour of the system, rather than individual functions and methods.
For example, instead of the test

Cart 
   #total 
      is the sum of all prices plus shipping costs

we can have something like

Shopping 
   inserting an item into the cart 
      updates the total to be the sum of all prices plus shipping costs

What’s the difference?

We’re not interested in testing a specific method (total) on a specific class (Cart). This is an internal detail of our implementation. It’s not what the users of our software care about.
What they do care about is, when they put an item into the cart, then they can see an updated total amount.
So long as this holds true, then the implementation details are irrelevant.
Our test should remain green, regardless of the implementation we choose.

(There’s much more to be said about this, and Ian Cooper said it much better in this life-changing conference talk.)

How does this work in practice?

What would be a good implementation of a test that verifies the application’s behaviour?
Well, that depends on our architecture.
If we use something like ports and adapters / hexagonal architecture, then our application’s control flow would look something like:

user request (e.g. click on UI, API request) --> adapter (e.g. web controller) --> port (e.g. domain service) --> core application code

In this case, a natural place to test the logic of our application will be the port.
The port exposes an API to accomplish a certain task (such as “add item”, “remove item” or “get cart total”). This is exactly what the user is trying to do.
It’s also decoupled from any fast-changing presentation / transport concerns.
Testing at the port level allows us to test the entire flow of a user request, which is just what we want:
We want to test the behaviour of the application, as seen by the user.

What if I’m using a different architecture?

then you’re wrong, and you should re-architect your application.
Just (mostly) kidding!

For other architectures, we’ll need to figure out where our “business logic” begins.
This is the point in the code that we want to verify.
It’s quite possible that this beginning is intermingled with presentation concerns. (for example: controllers in Rails’s implementation of MVC).

In this case, one option is to bite the bullet and test everything – the business logic, intermingled with other logic.
This is my first technique when using Rails – I use the built in methods for testing Rails controllers.
And it works reasonably well. It can break down in a couple of instances:

When the ‘non-business’ (e.g. presentation) logic keeps changing.
For example – the calculation doesn’t change, but the HTML generated by the controller does.
In this case, we’re forced back to our old role as soviet cops, beating up the tests until they pass.
Performance issues.
Operations such as serializing JSON or rendering HTML may be time-consuming. If these are included as part of the executed code in tests, then the tests may become slow.

To solve those issues, it’s possible introduce a “port” (for example a “domain service” or “action” class). This leaves behind the presentation (or whatever) logic, and encapsulates all ‘business’-y logic in a more testable class.

P.S. we can still write isolated unit tests

The fact that we prefer behavioural tests doesn’t mean that we can’t also test in isolation, where it makes sense:

complex pieces of logic may be easier and clearer to test separately. (think about the logic to calculate shipment costs, above. it could depend on a huge amount of variables, such as number of items, their size, who the user is, delivery address…)
If we want to use mocks and stubs to drive our design (aka the “London” school of TDD).
That’s a useful technique for discovering the implementation, and how the class design and interactions may look.
Personally, I’d consider deleting these tests afterwards. They’ve already served their purpose of guiding our design. From now on, as we’ve seen, they will only serve to hinder changes to that design. They’re certainly not going to find any bugs.

Closing words

Tests should be an enabler for well-maintained code.
Tests should give us confidence that, if they are passing, then the application works as intended.
Additionally, they shouldn’t get in the way of changing the code.

Tests that verify implementation hinder us from changing our code, as well as being less confidence-inspiring.
Tests that verify behaviour enable us to change and evolve our code, as the application’s functionality, and our understanding of the domain, evolve.

Testing anti-pattern: The soviet police-station

Testing anti-pattern: The soviet police-station

Test behaviour; not functions

How does this work in practice?

What if I’m using a different architecture?

P.S. we can still write isolated unit tests

Closing words

Recommend

Google Pixel 4 gets its final guaranteed update, just in time for the Pixel 7

Sony PlayStation stand-alone games will wait at least a year to port to PC

抖音直播变现的五种方法

WiseTech Global donates $2.5 million to kids tech learning platform Grok Academy...

Amazon sale: today's best deals including the Fire TV Stick and Sony OLED TVs

Spookyfest is back: Trick your friends, treat yourself with Minecraft adventures

How to make a For All Mankind duct tape space suit costume

批量检测端口连通性

链游创企Horizon融资4000万美元，育碧与Take-Two Interactive参投

Juice.ai

About Joyk