Edge Case Poisoning

TLA+ Workshop Last Call

The TLA+ workshop is next week! Oct 20-22, register here, use the code COMPUTRONSTUFF for $500 off. Learn how a few hours of designing can save weeks of debugging!

I’m looking at my favorite cookbook, the CIA book, and thinking about how I would represent the recipes as data. I open to a random page and get anise sticks:

ANISE STICKS

Heavy cream     160g
Glucose Syrup    50g
Milk Chocolate  570g
Pernod           40g

That’s not too bad. It’s just a list of ingredients, where each ingredient is a food and a mass.[1]

Recipe = list of Ingredient
Ingredient = (Food, Mass)
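As a point of comparison, here is this first model as a minimal Python sketch. It assumes masses are whole grams, and the names `grams` and `total_mass` are hypothetical, not part of any existing recipe format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ingredient:
    food: str   # e.g. "Heavy cream"
    grams: int  # mass in grams (assumed to be a whole number)


# A recipe is nothing more than a list of ingredients.
Recipe = list[Ingredient]

anise_sticks: Recipe = [
    Ingredient("Heavy cream", 160),
    Ingredient("Glucose syrup", 50),
    Ingredient("Milk chocolate", 570),
    Ingredient("Pernod", 40),
]


def total_mass(recipe: Recipe) -> int:
    """Sum the masses; trivial because every ingredient is guaranteed to have one."""
    return sum(i.grams for i in recipe)
```

Every query against this model is a one-liner, which is the baseline the rest of the post erodes.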

That should cover everything, right? Until I open up another random page and get:

CITRUS CONFIT

Lemon skins 25 skins
Sugar       1500g
...

So not every ingredient is going to have a mass. Some are going to be quantities. The data model should instead be:

Recipe = list of Ingredient
Ingredient = (Food, Measure)
Measure = OneOf(Mass, Quantity)

A little more complicated, but still pretty simple. Let’s look at another random recipe:

CARAMEL CREAMS

Foil cups   150 cups
Sugar       570g

“Foil cups” are an ingredient but not a food. They’re inedible. We need to represent them in the recipe, but I want to make it clear it’s not a food.

Recipe = list of Ingredient
Ingredient = (Item, Measure)
Measure = OneOf(Mass, Quantity)
Item = OneOf(Inedible, Food)

Same deal: find a random recipe and see how it breaks our model.

HARD CANDY

Sugar 
Water
Cream of Tartar (optional)
...

Optional? Looks like we can’t just return a single list of ingredients anymore. We should return a dictionary with a list of required ingredients and a list of optional ones.

Recipe = 
  required: list of Ingredient
  optional: list of Ingredient

Ingredient = (Item, Measure)
Measure = OneOf(Mass, Quantity)
Item = OneOf(Inedible, Food)
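A sketch of how the split changes even a simple question like “can I make this with what’s in my pantry?” Ingredients are reduced to bare names, the hard candy example includes only the ingredients listed above, and `can_make` is a hypothetical helper. Only the required list can block a recipe:

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    # Ingredients reduced to bare names for this sketch.
    required: list[str] = field(default_factory=list)
    optional: list[str] = field(default_factory=list)


def can_make(recipe: Recipe, pantry: set[str]) -> bool:
    """Only required ingredients can block a recipe; optional ones never do."""
    return all(name in pantry for name in recipe.required)


hard_candy = Recipe(required=["Sugar", "Water"], optional=["Cream of tartar"])
```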

And again:

NOUGAT MONTELIMAR

Sugar 
...
Dried Pear or Apricot
...

Two distinct ingredients, at least one of which must be present. We can’t make them both optional because then a valid recipe could include neither. And in this case, while the two options have the same measure, it’s possible for them to be different measures. We need to replace our ingredients with a logical proposition:

Recipe = 
  required: list of Ingredient
  optional: list of Ingredient

Ingredient = 
  OneOf(
    (Item, Measure),
    Or(Ingredient, Ingredient)
  )

Measure = OneOf(Mass, Quantity)
Item = OneOf(Inedible, Food)

While it doesn’t appear in this book, some recipes also have an AND conjunction. You see this a lot in baking, where you need (milk AND vinegar) OR buttermilk. But we won’t include it here. Anything else?

HOT CHOCOLATES

1. CINNAMON MARSHMALLOW
Granulated gelatin
...

2. GANACHE
Heavy cream
...

Ah yes, subrecipes. Some recipes involve making two separate recipes and then combining them. For simplicity, we can say that all recipes have subrecipes, some just have a single subrecipe.

Recipe = list of Subrecipe

Subrecipe = 
  required: list of Ingredient
  optional: list of Ingredient

Ingredient = 
  OneOf(
    (Item, Measure),
    Or(Ingredient, Ingredient)
  )

Measure = OneOf(Mass, Quantity)
Item = OneOf(Inedible, Food)
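With subrecipes, even a flat shopping list needs a nested loop. In this sketch ingredients are bare names, the hot chocolate lists include only the ingredients shown above, and `shopping_list` is a hypothetical helper that keeps the first occurrence of each required ingredient:

```python
from dataclasses import dataclass, field


@dataclass
class Subrecipe:
    required: list[str] = field(default_factory=list)
    optional: list[str] = field(default_factory=list)


# Recipe = list of Subrecipe; most recipes have exactly one.
Recipe = list[Subrecipe]


def shopping_list(recipe: Recipe) -> list[str]:
    """Required ingredients across all subrecipes, first occurrence only."""
    seen: dict[str, None] = {}  # dicts preserve insertion order
    for sub in recipe:
        for name in sub.required:
            seen.setdefault(name, None)
    return list(seen)


hot_chocolates: Recipe = [
    Subrecipe(required=["Granulated gelatin"]),  # cinnamon marshmallow
    Subrecipe(required=["Heavy cream"]),         # ganache
]
```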

One last thing we’ll look at:

ORANGE PINWHEELS

Fondant (page 290)
Dark chocolate
...

Some recipes include other recipes as ingredients. This doesn’t necessarily require a change to the data model: we can just represent the ingredient recipe as an edible ingredient. Alternatively, we can make recipes a kind of item and recursively expand them, replacing every ingredient that is itself a recipe with that recipe’s ingredients:

Item = OneOf(Inedible, Food, Recipe)

So we turn to the recipe for fondant:

FONDANT

Sugar             1000g
Glucose Syrup      200g
Water              200g
Fondant (optional) 100g

Wait, what?? Yes, fondant can include fondant as an ingredient.[2] Trying to expand fondant will give us infinite recursion and crash our program. We need to include cycle detection to handle the case of fondant. If we want to include recipe expansion as a feature of our model, we need to make the algorithm more complicated to handle the single edge case of fondant.
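Here is a minimal sketch of expansion with cycle detection. It assumes a hypothetical registry mapping recipe names to ingredient names (measures and the required/optional split are dropped, and only the ingredients shown above are included), and it treats an ingredient that is already being expanded as an opaque food instead of recursing. Raising an error would be an equally reasonable choice.

```python
# Hypothetical registry: recipe name -> ingredient names. Any ingredient
# that is also a key here is itself a recipe. Measures and the
# required/optional split are omitted.
RECIPES: dict[str, list[str]] = {
    "Orange pinwheels": ["Fondant", "Dark chocolate"],
    "Fondant": ["Sugar", "Glucose syrup", "Water", "Fondant"],
}


def expand(name: str, recipes: dict[str, list[str]],
           stack: tuple[str, ...] = ()) -> list[str]:
    """Recursively replace recipe ingredients with their own ingredients.

    `stack` holds the recipes currently being expanded. Meeting one of
    them again means a cycle, so it is kept as an opaque ingredient.
    """
    stack = stack + (name,)
    result: list[str] = []
    for ing in recipes[name]:
        if ing in stack:
            result.append(ing)  # cycle (fondant in fondant): don't recurse
        elif ing in recipes:
            result.extend(expand(ing, recipes, stack))
        else:
            result.append(ing)  # a base food
    return result
```

Without the `stack` check, expanding orange pinwheels would recurse into fondant until Python raises `RecursionError`.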

Let’s compare our original data model to the one we ended with:

ORIGINAL

Recipe = list of Ingredient
Ingredient = (Food, Mass)

FINAL

Recipe = list of Subrecipe

Subrecipe = 
  required: list of Ingredient
  optional: list of Ingredient

Ingredient = 
  OneOf(
    (Item, Measure),
    Or(Ingredient, Ingredient)
  )

Item = OneOf(Inedible, Food, Recipe)
Measure = OneOf(Mass, Quantity)

Each of those extra bits of complexity in the final model exists to handle known complexity in the recipes. But how common are the special cases? I lied when I said I flipped randomly. If I really opened to a random recipe, I’d be unlikely to see any of these complexities. They’re all edge cases. 90% of the recipes in the book could be represented just fine in my original data model, yet working with those recipes is made harder by the existence of the edge cases. How would I find out if a recipe has ingredient X?

original model: Something like Recipe.ingredients.any?{|i| i.food == X}.
final model: Something like Recipe.subrecipe.any?{|s| (s.required + s.optional).any?{|i| if i.one_of then TODO «already lost interest, I’ll make the intern do it»

Now you can argue that the original model is wrong. After all, it doesn’t handle any of the edge cases! There are two “unfortunates” with this argument.[3] First, it’s not necessarily true that the client will encounter any of these edge cases. Consider this model as part of a recipe API, where multiple groups are calling it for their own purposes. Alice needs recipes with inedibles, Barry needs recipes with optionals, and everybody else needs neither. Alice is “punishing” everybody else with her edge case. They have a more complicated API because of her. In turn, she’s “punished” by Barry’s edge case. While she needs an API that handles inedibles, she doesn’t need one that handles optionals, so her model is overcomplicated too.

Second, while the original model isn’t totally correct, it’s philosophically correct. It represents the most common case in a much more understandable way. Someone looking at the code base for the first time will more easily be able to understand what’s going on. If they then see an edge case, they can mentally model it as “the happy path except…”, a perturbation of the base model. If you instead present them with the final model, they’ll struggle much more to see the core idea in the forest of edge case handling.

I call this edge case poisoning. It’s almost impossible to avoid because anything dealing with the real world is going to have tons and tons of edge cases. You think handling time zones is hard? Time zones are only notorious because they are a real-world domain that we all have to deal with. A software engineer in Turkey who works in shipping probably doesn’t have to deal with the intricacies of confectionery recipes. But they will have to deal with time zones. Doesn’t matter if you’re dealing with time zones or recipes or logistics or art collectors, you will have edge cases, meaning you will be forced to take on the essential complexity.

Addressing edge case poisoning

¯\_(ツ)_/¯

I feel like the usual approach is “raise the level of abstraction”, but that 1) makes the base case even less clear than edge case poisoning does, and 2) gets messy when you discover new edge cases after you raise the level of abstraction. I didn’t talk about volume servings, or “as needed” measurements, or duplicate ingredients, or required ingredients with caveats, or the actual recipe instructions. And this is just for confectionery. What happens if we add cheesemaking? Computers are really bad at this kind of “mostly one way with exceptions” modeling.

Another common approach is reducing the number of edge cases by reducing functionality. Most open recipe formats store a lot less information than I did in that example, moving more of the edge cases into free-text strings that don’t represent formal concepts in the model. This works but is quitter talk.

I suspect there are different possible approaches to edge case poisoning if the edge cases make up 10% of your cases vs if they make up 0.1%. Maybe in the latter case you can instead return:

Output = OneOf(RegularRecipe, WeirdRecipe)
RegularRecipe = list of Ingredient
Ingredient = (Food, Mass)
...

What I also like about this is that there’s a “refinement” from the pathological WeirdRecipe case to the RegularRecipe case. You can “project” a normal recipe into the pathological format, and then write your algorithm once for each format. This gives you a test (do both versions give isomorphic results on the same recipe?) and a simpler version of the data and algorithm that a newcomer can read to understand the gist of things before diving into the edge cases. But I’ve never tried this in practice, so I don’t know how well it works. It might require too much redundant code to be worthwhile.
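A sketch of the projection idea, under heavy simplifying assumptions: the weird format is cut down to subrecipes of `(name, measure)` pairs, and `project`, `names_regular`, `names_weird`, and `agrees` are hypothetical names. The check in `agrees` is the test described above, with “isomorphic” simplified to plain equality of ingredient names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RegularIngredient:
    food: str
    grams: int


@dataclass
class WeirdSubrecipe:
    # Stand-in for the full model: (item name, measure text) pairs.
    required: list[tuple[str, str]] = field(default_factory=list)
    optional: list[tuple[str, str]] = field(default_factory=list)


RegularRecipe = list[RegularIngredient]
WeirdRecipe = list[WeirdSubrecipe]


def project(recipe: RegularRecipe) -> WeirdRecipe:
    """Embed a regular recipe in the weird format: one subrecipe, nothing optional."""
    return [WeirdSubrecipe(required=[(i.food, f"{i.grams}g") for i in recipe])]


def names_regular(recipe: RegularRecipe) -> list[str]:
    return [i.food for i in recipe]


def names_weird(recipe: WeirdRecipe) -> list[str]:
    return [name for sub in recipe for name, _ in sub.required + sub.optional]


def agrees(recipe: RegularRecipe) -> bool:
    """Both algorithms should give the same answer on a projected regular recipe."""
    return names_regular(recipe) == names_weird(project(recipe))


anise_sticks: RegularRecipe = [
    RegularIngredient("Heavy cream", 160),
    RegularIngredient("Glucose syrup", 50),
    RegularIngredient("Milk chocolate", 570),
    RegularIngredient("Pernod", 40),
]
```

In a real test suite `agrees` would be run over every regular recipe, or over generated ones, rather than a single example.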

Update for all the website readers

This was sent as part of an email newsletter; you can subscribe here. Common topics are software history, formal methods, and theorycrafting. Updates are 1-2x a week with the first (like this) being public and the second being email-only.

[1] I’m using a Pascal-esque pseudocode here.
[2] To make fondant you agitate a sugar syrup to form seed crystals. Including some premade fondant gives you more seed to start with and dramatically shortens the time you need to spend scraping a thick syrup around. Whenever I make fondant I always throw a hundred grams in the freezer to use for the next batch.
[3] They don’t make the argument wrong. They make it unfortunate for us all that the argument is right.
