Scaling Query Suggestions in a COVID-19 World

15 April 2020

When we launched Yext Answers at ONWARD-19, the world was a very different place. We joked about Todd Munion declaring that "over 1.1 billion Americans suffer from dry eyes." Just a few months later, over 1.1 billion human beings around the world would be eager for information on a different medical condition, COVID-19, and misinformation was everywhere. A lie can travel halfway around the world while the truth is putting on its personal protective equipment.

Answering the Call

That is why we at Yext wanted to apply our expertise, Perfect Answers Everywhere, to this global pandemic. We partnered with the State of New Jersey to power their official COVID-19 website, covid19.nj.gov. Up until that point, we had a few dozen customers, with plans to scale to hundreds or even thousands more in the medium term. We had built a scalable system with extra capacity, expecting to grow into it over the course of the following year. Instead, with a single press conference, Governor Phil Murphy increased our usage by two orders of magnitude over all our previous customers combined:

[Figure: request counts over a six-hour window]

Governor Murphy continued to mention us at every subsequent near-daily press conference. It was exciting to look at our monitoring graphs and see the impact of each press conference on our request counts:

[Figure: request counts over five days]

It was even more exciting to not see the impact of any press conference on our latency graphs:

[Figure: latency over the same period]

Yext Answers comprises a number of different backends, all of which handled the two-order-of-magnitude increase in traffic relatively easily. However, one backend in particular handles 10x more traffic than each of the others: Query Suggest, which provides suggestions for what you might want to search for, based on what you've typed so far. The data it serves is a combination of hardcoded, site-supplied example queries and actual anonymized user queries, ranked by popularity.

In this post, I’ll describe the engineering choices we made to make this service useful and scalable.

Barking up the Wrong Trie

Seasoned engineers may have already guessed what data structure serves as the core of Query Suggest's serving system: a trie. As a refresher, a trie is a tree-like data structure where each node represents a letter of the alphabet. The letters of a string s uniquely define a path through the trie to a node n with an important property: every string in the trie that has s as a prefix will be in the subtree rooted at n. From these candidates, we display the most popular query suggestions to the user.
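To make that concrete, here is a minimal trie sketch in Java. The class and method names are our own illustrations, not Yext's production implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A minimal trie: insert full query strings with a popularity score, then
// find the node for a typed prefix; every suggestion sharing that prefix
// lives in the subtree rooted there.
class Trie {
    static class Node {
        Map<Character, Node> children = new HashMap<>();
        boolean isQuery;      // a complete query ends at this node
        long popularity;      // used to rank suggestions
    }

    final Node root = new Node();

    void insert(String query, long popularity) {
        Node node = root;
        for (char c : query.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isQuery = true;
        node.popularity = popularity;
    }

    // Returns the subtree for a prefix, or null if nothing starts with it.
    Node find(String prefix) {
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return null;
        }
        return node;
    }

    // Collects every complete query underneath the given prefix.
    List<String> suggestions(String prefix) {
        List<String> out = new ArrayList<>();
        Node start = find(prefix);
        if (start != null) collect(start, new StringBuilder(prefix), out);
        return out;
    }

    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isQuery) out.add(path.toString());
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.setLength(path.length() - 1);
        }
    }
}
```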

A trie is so efficient for the problem at hand that we can try (no pun intended) additional lookups, probably wrong lookups, for negligible additional cost. We built a custom trie implementation that recursively attempts typos of the given query. If you type "health", we will return the sub-trie rooted at "health", as well as the ones rooted at "heath" (character deletion), "haelth" (transposed characters), "heelth" (substitution), "heaalth" (addition), and so on. We allow combinations of typos: the longer the query, the more typos we allow. For example:

[Figure: example of typo-tolerant Query Suggest results]
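One way to picture that recursion is the hedged sketch below, written as a method on the Trie class above. The typo budget, and the idea of scaling it with query length, are our illustrative assumptions, not Yext's actual parameters:

```java
// Walks the trie along the typed query, spending from a small "typo budget"
// to explore edited variants of the query. Dead branches short-circuit as
// soon as a child is missing.
void fuzzyFind(Node node, String query, int i, int budget, List<Node> out) {
    if (node == null) return;
    if (i == query.length()) {
        out.add(node);               // this subtree holds candidate suggestions
        return;
    }
    char c = query.charAt(i);

    // Exact match: follow the typed character at no cost.
    fuzzyFind(node.children.get(c), query, i + 1, budget, out);

    if (budget == 0) return;         // no typos left to spend

    // Deletion: drop a typed character ("health" -> "heath").
    fuzzyFind(node, query, i + 1, budget - 1, out);

    // Transposition: swap two adjacent typed characters ("haelth").
    if (i + 1 < query.length()) {
        Node next = node.children.get(query.charAt(i + 1));
        if (next != null) {
            fuzzyFind(next.children.get(c), query, i + 2, budget - 1, out);
        }
    }

    for (Map.Entry<Character, Node> e : node.children.entrySet()) {
        // Addition: the variant has an extra character here ("heaalth").
        fuzzyFind(e.getValue(), query, i, budget - 1, out);
        // Substitution: a different character in place of c ("heelth").
        if (e.getKey() != c) {
            fuzzyFind(e.getValue(), query, i + 1, budget - 1, out);
        }
    }
}
```

A budget that grows with query length, say `fuzzyFind(root, typed, 0, Math.max(1, typed.length() / 4), candidates)`, captures the "longer query, more typos" behavior; the exact ratio here is our guess.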

A single keystroke could result in thousands of recursive lookups through the trie. However, we can cheaply filter out wrong candidates, and the trie can efficiently consider good ones. The vast majority of these typo candidates will quickly short-circuit their recursion. Once we're considering suggestions for "hea", the array of subtrees is already in the L1 or L2 cache, and it's very efficient to see that there is no "heaq" node. On the other hand, if the user typed "heaq" but actually meant "heal", we're able to recurse down the proper sub-trie and generate good query suggestions.

Strapped for Cache

When building highly scalable systems, caching should be one of the first tools considered. There are plenty of generic caches that would have performed quite well. However, this problem has unique features that allow for a cache that is both dirt simple to implement and slightly better performing than any of the off-the-shelf caching solutions.

The index we are serving happens to contain all of the most popular queries, ordered by popularity. And, for each indexed result, we can trivially construct the set of prefixes that would return it. That is, we know the query "how can I protect myself" will be returned for the empty string, "h", "ho", "how", and so on. By walking the set of queries in decreasing popularity and looking up each prefix to get the full list of query suggestions, we can build a cache of popular queries before responding to any user traffic.
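A minimal sketch of that warm-up pass, reusing the Trie sketch from earlier; the cache size cap and names are our own illustrative choices:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Warm the cache before serving: walk queries in decreasing popularity and
// precompute suggestions for every prefix, so the hottest lookups never
// touch the trie at request time.
class SuggestionCache {
    private final Map<String, List<String>> cache = new HashMap<>();

    void warmUp(Trie trie, List<String> queriesByPopularity, int maxEntries) {
        for (String query : queriesByPopularity) {
            for (int len = 0; len <= query.length(); len++) {
                if (cache.size() >= maxEntries) return;
                String prefix = query.substring(0, len);
                // computeIfAbsent skips prefixes already cached by a more
                // popular query that shares them.
                cache.computeIfAbsent(prefix, trie::suggestions);
            }
        }
    }

    List<String> get(String prefix) {
        return cache.get(prefix);   // null means: fall through to the trie
    }
}
```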

Hash Flow Analysis

At this point, we've made the architectural decisions that lead to an efficient system: use the right data structure for serving, use an effective cache, and offload as much work as possible to preprocessing steps (our query filtering, normalizing, and index-building daemon, not discussed here). The next step is to optimize the code itself. It is exciting to have a data structure that can efficiently perform thousands of lookups for every user keystroke, generating useful results in far less time than the keystroke itself took. It also means that even minor improvements in the trie implementation will have a large impact on the system's overall throughput and latency.

And, through CPU profiling and heap profiling, we determined that our bottleneck was… hashing. The same mechanism that makes HashSets and HashMaps so popular was the cause of a great deal of wasted resources. This problem manifested in multiple ways.

For example, our first implementation returned a HashSet of candidate results to be ranked. We chose this because the candidates are unordered and unique, and those properties naturally lend themselves to a HashSet. However, we didn't need this intermediate data structure to enforce uniqueness; the preprocessed index creation and the trie itself already guaranteed it. For this purpose, it turns out that the most efficient Set implementation is a List. This five-line change, written during a hackathon at our annual engineering offsite, doubled throughput and decreased tail latency.
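In spirit, the change looked something like this (our reconstruction, not the actual diff; `collectCandidates` is a hypothetical stand-in for the trie recursion):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class CandidateCollection {
    // Hypothetical stand-in for the trie recursion that emits candidates.
    static void collectCandidates(Consumer<String> sink) {
        sink.accept("how can i protect myself");
        sink.accept("how does covid spread");
    }

    // Before: every add() hashes the candidate string, even though the
    // trie walk can never produce the same suggestion twice.
    static Set<String> before() {
        Set<String> candidates = new HashSet<>();
        collectCandidates(candidates::add);
        return candidates;
    }

    // After: uniqueness is already guaranteed upstream by index creation
    // and the trie itself, so a plain ArrayList skips the hashing entirely.
    static List<String> after() {
        List<String> candidates = new ArrayList<>();
        collectCandidates(candidates::add);
        return candidates;
    }
}
```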

There were also considerable memory costs to all of our HashMaps. Each trie node maintains pointers to its children in a HashMap. When engineers first learn about tries, we tend to focus on the nodes near the root, where there is considerable overlap. However, most nodes have very few children, or none at all. By default, a Java HashMap initializes with space for 16 entries. When we changed this to 1, we found a space savings of 25%. When we changed it so TrieNodes only create a HashMap when there's at least one child, we saw even further savings; at one point, we had 1 GB of RAM holding nothing but empty HashMaps! Sure, this means populating the trie is slower, but population happens before we start responding to user queries. The space savings not only reduce our serving cost, but also let us keep more of the trie in the local processor cache. When performing thousands of lookups in under 25ms, the one-to-two-order-of-magnitude latency gap between reading from RAM and reading from CPU cache is significant.
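A sketch of what the node-level change might look like (again our reconstruction; field and method names are illustrative):

```java
import java.util.HashMap;
import java.util.Map;

class TrieNode {
    // Lazily created and sized for the common case of a single child.
    // Most nodes are leaves, so holding a null here instead of an empty
    // 16-bucket HashMap avoids paying for capacity that is never used.
    private Map<Character, TrieNode> children;  // null until first child

    TrieNode child(char c) {
        return children == null ? null : children.get(c);
    }

    TrieNode addChild(char c) {
        if (children == null) {
            children = new HashMap<>(1);  // initial capacity 1, not the default 16
        }
        return children.computeIfAbsent(c, k -> new TrieNode());
    }
}
```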

It turns out that there are many ways to micro-optimize this data structure in ways that yield real-world savings. For any engineer who loves nitty-gritty optimizations and hates hearing the phrase "premature optimization", this is the ideal project. We experimented with a custom-built TinyHashMap: a data structure that uses a List of Pairs to implement the Map interface for sizes under some threshold N, then switches to a delegate HashMap once the size exceeds it. Here we were able to experiment with different values of N, trading off memory footprint against runtime latency, and we saw significant savings. Soon we plan to experiment with HashMaps that use primitives as their keys, as those can yield further improvements over the general-purpose Java defaults.
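A simplified sketch of the TinyHashMap idea, supporting only get and put (the real implementation fills out the full Map interface; the threshold is the tunable N):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Store entries in a flat list while small (linear scans over a few pairs
// are cache-friendly and allocation-light), then hand off to a real
// HashMap once the size crosses the threshold.
class TinyMap<K, V> {
    private static final int THRESHOLD = 8;      // the tunable "N"
    private List<Map.Entry<K, V>> small = new ArrayList<>();
    private Map<K, V> big;                       // delegate once we outgrow the list

    V get(K key) {
        if (big != null) return big.get(key);
        for (Map.Entry<K, V> e : small) {
            if (e.getKey().equals(key)) return e.getValue();
        }
        return null;
    }

    void put(K key, V value) {
        if (big != null) { big.put(key, value); return; }
        for (Map.Entry<K, V> e : small) {
            if (e.getKey().equals(key)) { e.setValue(value); return; }
        }
        small.add(new SimpleEntry<>(key, value));
        if (small.size() > THRESHOLD) {          // migrate to a real HashMap
            big = new HashMap<>();
            for (Map.Entry<K, V> e : small) big.put(e.getKey(), e.getValue());
            small = null;
        }
    }
}
```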

The Future State of Answers

The success of our New Jersey partnership has drawn a lot of interest in this product. We've launched for the State of Alabama and the US Department of State, and we're in preliminary discussions with other government entities. We've also launched some new commercial clients and helped others through our free 90-day trial. Look for more as we kick off April 15 as No Wrong Answers Day.

