Away with unit tests! Idiot-proof your data types with F#

 3 years ago
source link: https://medium.com/dmg-tech-space/away-with-unit-tests-idiot-proof-your-data-types-with-f-27e158fa0217

A blog by Ben Gobeil, Tech Lead for Divisions Maintenance Group.

Writing good code is hard. We all need a little help from our compiler sometimes, and getting that help is what makes writing F# code so much fun.

I like to strive for this guiding principle, borrowed from Yaron Minsky: “Make illegal states unrepresentable.”

We have to model our data types so that invalid values are hard, or ideally impossible, to construct. In this blog post, I’ll offer a few strategies to model well-crafted types, setting you up for success.

Canonical data modeling

Here’s a quick example:
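A minimal sketch of such a record. MiddleName and FullName are the fields the discussion refers to; FirstName, LastName and the sample value are assumed here for illustration.

```fsharp
// FirstName and LastName are assumed names; MiddleName and FullName come from the text.
type Person =
    { FirstName: string
      MiddleName: string
      LastName: string
      FullName: string }

// A sample value: every field, including FullName, has to be filled in by hand.
let ben =
    { FirstName = "Ben"
      MiddleName = ""
      LastName = "Gobeil"
      FullName = "Ben Gobeil" }
```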

This is a record type used to describe a person, or at least his or her name. But there are two obvious problems.

Problem No. 1: Not everyone has a middle name. (Frankly, not everyone has a first name — lookin’ at you, McLovin — but let’s put that aside for now.) To solve the middle-name problem, we should clearly document in the data type that MiddleName is an optional field.

Problem No. 2: FullName is a field in the record, but it contains duplicated data from each of the other fields. In other words, it is not canonical and introduces possible error states. We can correct this with a function in a module to compute the full name.
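Both fixes could look roughly like this. It is a sketch, assuming the same hypothetical FirstName and LastName fields and a module function named fullName (the name is my choice for illustration).

```fsharp
type Person =
    { FirstName: string
      MiddleName: string option // None documents "no middle name"
      LastName: string }

module Person =
    // Derives the full name from the canonical fields instead of storing it.
    let fullName (person: Person) =
        match person.MiddleName with
        | Some middle -> sprintf "%s %s %s" person.FirstName middle person.LastName
        | None -> sprintf "%s %s" person.FirstName person.LastName
```

Because the full name is computed rather than stored, it can no longer drift out of sync with the other fields.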

There is no inherent problem with this approach, and it will give us the full name without duplicating data. But since F# runs on .NET, where records compile to classes, we can package this computation with the record type using members. Let’s convert this module function into a member.
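As a member, the computation might be sketched like this (field names other than MiddleName are still assumed):

```fsharp
type Person =
    { FirstName: string
      MiddleName: string option
      LastName: string }
    // Computed on every access, so it cannot disagree with the fields above.
    member this.FullName =
        match this.MiddleName with
        | Some middle -> sprintf "%s %s %s" this.FirstName middle this.LastName
        | None -> sprintf "%s %s" this.FirstName this.LastName
```

Callers then read `person.FullName` exactly as they would a stored field.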

Awesome. We now have the simplest form to represent a person’s name that cannot be invalid and does not have duplicated data.

Modeling our domain

Let’s expand on this idea. Say I want to document what company this person works for and his job. But that only applies if he is employed. What if he is unemployed? Or maybe he runs his own business, in which case I want to know his company’s name but not necessarily his job title.

Here’s our first draft with these changes:
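A sketch of what that draft might look like, with CompanyName and JobTitle added as plain strings (the other field names and the sample value are assumed):

```fsharp
type Person =
    { FirstName: string
      MiddleName: string option
      LastName: string
      CompanyName: string
      JobTitle: string }

// With plain strings, "unemployed" can only be faked with empty values.
let unemployed =
    { FirstName = "Jane"
      MiddleName = None
      LastName = "Doe"
      CompanyName = ""
      JobTitle = "" }
```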

Just like before with MiddleName, we are not clearly documenting that CompanyName and JobTitle are optional. We should make them optional, right?
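Making both fields optional could look like the sketch below. The three sample values (company and title strings included) are invented for illustration.

```fsharp
type Person =
    { FirstName: string
      MiddleName: string option
      LastName: string
      CompanyName: string option
      JobTitle: string option }

// The three situations described above, each expressed with options.
let employee =
    { FirstName = "Jane"; MiddleName = None; LastName = "Doe"
      CompanyName = Some "Example Corp"; JobTitle = Some "Developer" }

let entrepreneur =
    { FirstName = "John"; MiddleName = None; LastName = "Roe"
      CompanyName = Some "Roe Consulting"; JobTitle = None }

let unemployed =
    { FirstName = "Sam"; MiddleName = None; LastName = "Poe"
      CompanyName = None; JobTitle = None }
```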

Here we encounter another problem. If the person is an employee — not running his own company — it is still possible to construct a person with a job title but no company name. How can we make sure that if the person is an employee, the company name is also provided?
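To make the problem concrete, here is a sketch using a record with optional CompanyName and JobTitle fields (the remaining names and values are assumed). The compiler accepts the value without complaint.

```fsharp
type Person =
    { FirstName: string
      MiddleName: string option
      LastName: string
      CompanyName: string option
      JobTitle: string option }

// Compiles fine, yet breaks the rule: a job title with no company name.
let invalid =
    { FirstName = "Jane"
      MiddleName = None
      LastName = "Doe"
      CompanyName = None
      JobTitle = Some "Developer" }
```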

Using the compiler instead of manual/automated tests

Let’s talk briefly about the hierarchy of ideal ways to catch bugs. There are a few principles that apply:

No. 1: We want to catch bugs as quickly as possible.

No. 2: Manual testing is generally more costly than automated testing.

No. 3: Automated testing is generally more costly than compiler errors.

No. 4: Compiler errors are generally more costly than static analysis/design-time errors/red squiggly lines (nowadays, compiler errors and static analysis might return the same errors).

In an ideal world, we would prioritize not being able to even express invalid states, hence the Yaron Minsky quote. But for the sake of this exercise, let’s go down the list of possible solutions.

Remember, our goal is to avoid a case where a person has a job title but doesn’t provide a company name. To prevent this error, we can let QA write a test case that states: A person cannot have a job title without a company name.

But this solution comes with many problems:

No. 1: QA has to check manually.

No. 2: The code does not document this requirement.

No. 3: We can ship erroneous code.

OK, so that solution is out. What about writing unit tests? Every TDD/XP developer LOVES writing unit tests. The culture of automated tests is considered a religion in some countries… probably.

Now, although it might sound like I am against writing unit tests, I’m not. I am just not a fan of doing something I don’t really need to do.

Let’s take a step back.

In F#, record types have public constructors unless they are explicitly set to private. This is a valuable tool when creating types that have validation rules.

For instance, we can privatize the record type constructor and expose a public function to create the object.
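One way to sketch this, assuming a wrapping module called Domain and a function called create (both names are hypothetical):

```fsharp
module Domain =
    type Person =
        private
            { FirstName: string
              MiddleName: string option
              LastName: string
              CompanyName: string option
              JobTitle: string option }

    module Person =
        // Code outside Domain cannot use record syntax to build a Person,
        // so this function is its only way to obtain one.
        let create firstName middleName lastName companyName jobTitle =
            { FirstName = firstName
              MiddleName = middleName
              LastName = lastName
              CompanyName = companyName
              JobTitle = jobTitle }
```

From outside the module, a call such as `Domain.Person.create "Jane" None "Doe" None None` is the only route to a value of this type.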

So far, this changes nothing. But let’s add our jobTitle/CompanyName validation rule:
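A sketch of the rule inside the creation function. The name tryCreate, the Result return type and the error message are my assumptions; the rule itself is the one stated above.

```fsharp
module Domain =
    type Person =
        private
            { FirstName: string
              MiddleName: string option
              LastName: string
              CompanyName: string option
              JobTitle: string option }

    module Person =
        // Rejects the one combination the business rule forbids:
        // a job title without a company name.
        let tryCreate firstName middleName lastName companyName jobTitle =
            match companyName, jobTitle with
            | None, Some _ -> Error "A job title requires a company name."
            | _ ->
                Ok
                    { FirstName = firstName
                      MiddleName = middleName
                      LastName = lastName
                      CompanyName = companyName
                      JobTitle = jobTitle }
```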

Great. Now we get validation at the time of construction to avoid an invalid state. But any function that uses a person in its logic can still be puzzled about what can or can’t be represented.

You will still need a unit test.

Designing with discriminated unions

I see two problems with the last design:

Problem No. 1: The type does not describe the domain rules as much as we would want.

Problem No. 2: We now have logic in our constructor that needs to be unit tested.

As we talked about earlier, there are essentially three types of employment we might encounter here:

No. 1: An employee with a company name and a job title.

No. 2: An entrepreneur with a company name but no job title.

No. 3: An unemployed person with no extra details.

With discriminated unions, we can document this at the type level (i.e. no validation logic required). Here it is:
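A sketch of the three cases as a discriminated union. The type name Employment, the case names and the describe helper are assumed for illustration.

```fsharp
type Employment =
    | Employee of companyName: string * jobTitle: string
    | Entrepreneur of companyName: string
    | Unemployed

type Person =
    { FirstName: string
      MiddleName: string option
      LastName: string
      Employment: Employment }

// Each case carries exactly the data that is valid for it,
// so "job title without company" has no way to be written down.
let describe (person: Person) =
    match person.Employment with
    | Employee (company, title) -> sprintf "%s at %s" title company
    | Entrepreneur company -> sprintf "Owner of %s" company
    | Unemployed -> "Unemployed"
```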

With this data model, we no longer need a private constructor/TryCreate functions since it is not possible to represent an illegal state. We could not even write a unit test to check if we can create an invalid person with these business rules (*cough* maybe with C# interop *cough*). This does not mean the private constructor/TryCreate pattern is inherently bad. It simply means, if we can realistically represent a solid/documented data structure with code, we should probably do so.

Type-driven design (sigh… not another acronym)

Here are two more examples of good and bad type-driven design. First, I want to represent a three-letter word:
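One possible sketch of a three-letter word: a single-case union holding exactly three characters. The type name and the toString helper are assumed. Note that this pins down the length only; it does not check that each character is a letter.

```fsharp
// Exactly three characters, no more and no fewer, by construction.
type ThreeLetterWord = ThreeLetterWord of char * char * char

let toString (ThreeLetterWord (a, b, c)) =
    System.String([| a; b; c |])
```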

This cannot be wrong and does not need unit tests.

This is an example where, in theory, we could avoid unit tests, but maybe we shouldn’t since it is not realistic to do so.

Second, I want to model a string that has fewer than 100 characters. The length can vary but must stay under 100 characters.

It is insane to find a type-driven approach to this problem.
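To give a feel for it, here is the same idea cut down to a limit of three characters, with one case per allowed length. The names are hypothetical; the full version would need a case for every length below 100.

```fsharp
// Illustration only: capped at three characters instead of 99.
type TinyString =
    | Length0
    | Length1 of char
    | Length2 of char * char
    | Length3 of char * char * char

let length tiny =
    match tiny with
    | Length0 -> 0
    | Length1 _ -> 1
    | Length2 _ -> 2
    | Length3 _ -> 3
```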

This would be quite type-safe, but at what cost? We need to balance the cost of designing strong types with the cost of creating and maintaining unit tests.

Prefer strong types, within reason. We have to factor the pros and cons on a case-by-case basis. If we feel it is insane to have 100 constructors to create our type, then maybe the private constructor/TryCreate pattern will suffice:
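A sketch of that pattern for the bounded string. The names Domain, ShortString, tryCreate and value, the error messages and the null check are my assumptions.

```fsharp
module Domain =
    // The case constructor is private, so outside code must go through tryCreate.
    type ShortString = private ShortString of string

    module ShortString =
        let maxLengthExclusive = 100

        let tryCreate (text: string) =
            if isNull text then
                Error "text must not be null"
            elif text.Length < maxLengthExclusive then
                Ok (ShortString text)
            else
                Error (sprintf "text must have fewer than %d characters" maxLengthExclusive)

        // Unwraps the validated string for downstream use.
        let value (ShortString text) = text
```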

Then we only have to test the TryCreate function. All other downstream functions can be sure that this string is fewer than 100 characters.

Summary

Let’s recap what we’ve discussed:

· We can model the domain with options and discriminated unions.

· We can avoid data duplication by using members.

· We can enforce business rules using the TryCreate pattern, but it still requires unit testing.

· We can sometimes get away with not needing unit tests if we model our domain correctly using discriminated unions.

· We should balance type-driven efforts against the effort of creating a TryCreate function with its accompanying unit tests.

Hope you enjoyed this content!

Ben Gobeil is a Tech Lead for Divisions Maintenance Group. You can find more of his work on YouTube, Twitter, GitHub or his website, bengobeil.com.

To learn more about DMG, including career opportunities for engineers, designers, developers and data analysts, visit divisionsmg.com.

