How to optimize an already fast indexing process (advanced use cases)

Fast indexing ensures that search results contain the most up-to-date information in a timely manner. What “timely” means depends on each company’s particular business activity. But as mentioned in our previous article on indexing best practices, and repeated below, fast indexing comes out of the box – that is, there’s nothing you need to do to optimize the indexing process; it’s already fast enough for most use cases.

Optimizing an already fast indexing process might seem unnecessary. But in some situations, optimizing adds speed to the standard indexing process. We see this in a handful of advanced use cases where data needs to be updated even faster than usual, sometimes in real-time. 

We’ll discuss the following advanced use cases:

  • High-frequency data changes
  • Crisis
  • Black Friday or other high-activity events
  • Real-time indexing, scenario 1: booking a hotel, reservations & rental availability
  • Real-time indexing, scenario 2: bidding and stock markets

*A general note on indexing: for all use cases, whether standard or advanced, you’ll always want to perform a “batch indexing process” that updates data in small batches, incrementally, and per attribute.
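
For illustration, here’s a minimal sketch of such a batched, per-attribute update using Algolia’s v4 JavaScript client. The credentials, index name, and attributes are placeholders, not values from this article:

```ts
import algoliasearch from 'algoliasearch';

// Hypothetical credentials and index name, for illustration only.
const client = algoliasearch('YOUR_APP_ID', 'YOUR_ADMIN_API_KEY');
const index = client.initIndex('products');

// A small batch of per-attribute changes: each object carries only the
// objectID and the attributes that actually changed.
const changes = [
  { objectID: 'sku-123', price: 19.99 },
  { objectID: 'sku-456', stock: 12 },
];

// partialUpdateObjects applies the changes incrementally, leaving all
// other attributes of each record untouched.
index
  .partialUpdateObjects(changes, { createIfNotExists: false })
  .then(() => console.log('batch applied'))
  .catch(console.error);
```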

High-frequency data changes

In this scenario, the key is prioritizing some changes over others. We’ll call this “selective updating”. For example, you decide to send some updates now (like price) and others later (like description). This lowers the number of changes in each batch, letting you shorten the batching interval (e.g., from every 5 minutes to every minute).
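
As an example, a selective-updating loop might look like the following sketch, which flushes high-priority price changes every minute and lower-priority description changes every 5 minutes (the queue names, attributes, and intervals are illustrative):

```ts
import algoliasearch from 'algoliasearch';

const client = algoliasearch('YOUR_APP_ID', 'YOUR_ADMIN_API_KEY');
const index = client.initIndex('products');

// Two in-memory queues: high-priority changes (price) and
// lower-priority changes (description).
const priceQueue: Array<{ objectID: string; price: number }> = [];
const descriptionQueue: Array<{ objectID: string; description: string }> = [];

// Flush price changes every minute: small batches, sent often.
setInterval(() => {
  if (priceQueue.length === 0) return;
  const batch = priceQueue.splice(0, priceQueue.length);
  index.partialUpdateObjects(batch).catch(console.error);
}, 60 * 1000);

// Flush description changes every 5 minutes: larger batches, sent less often.
setInterval(() => {
  if (descriptionQueue.length === 0) return;
  const batch = descriptionQueue.splice(0, descriptionQueue.length);
  index.partialUpdateObjects(batch).catch(console.error);
}, 5 * 60 * 1000);
```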

Another technique is “non-delete indexing”, where you set an “is-unavailable” attribute on a record instead of deleting it. Deleting records costs more in terms of performance than merely updating attributes. But it’s a trade-off: while it’s faster to change the value of an attribute, keeping the index small by deleting records ultimately makes indexing faster. Thus, the best practice is to combine the two techniques: update the attribute every 5 minutes and delete the flagged records every 30 minutes.
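
Here’s a minimal sketch of that combined approach, assuming a hypothetical `isUnavailable` boolean attribute that has been declared filterable in the index settings (attributesForFaceting with filterOnly), which deleteBy requires:

```ts
import algoliasearch from 'algoliasearch';

const client = algoliasearch('YOUR_APP_ID', 'YOUR_ADMIN_API_KEY');
const index = client.initIndex('products');

// Every 5 minutes: flag unavailable records instead of deleting them.
// A partial update only touches the listed attribute.
async function flagUnavailable(objectIDs: string[]) {
  await index.partialUpdateObjects(
    objectIDs.map((objectID) => ({ objectID, isUnavailable: true }))
  );
}

// Every 30 minutes: actually delete the flagged records.
// 'isUnavailable' must be a filterable attribute for deleteBy to accept it.
async function purgeUnavailable() {
  await index.deleteBy({ filters: 'isUnavailable:true' });
}
```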

Crisis

A crisis situation happens when, for example, a factory doesn’t deliver a supply of goods, or a ship gets stuck in the Suez Canal, blocking the world’s supply of goods. In these scenarios, the online business has to immediately remove the unavailable items from its website and replace them with another set of products. It may also have to rethink its promotions.

There’s actually no reason to do anything different from a standard indexing process. However, if the crisis goes on for too long, or requires a lot of changes in a short time, it’s important to avoid overloading the indexing process. The best approach then is to categorize the changes:

  • Remove out-of-stock items immediately. You have two options here: delete the items from the index, or use an “out-of-stock” boolean and filter out records whose “out-of-stock” attribute is true (see the sketch after this list). As mentioned in the previous section, the best approach is to change the attributes every 5 minutes and delete the flagged records every 30 minutes.
  • Update the new items as they arrive, in batched updates separate from the other two categories.
  • Change the promotions in their own batched updates, separate from the other two categories.
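
Here’s a minimal sketch of the filtering option, assuming a hypothetical `out_of_stock` boolean attribute that has been declared filterable (attributesForFaceting with filterOnly) on the index:

```ts
import algoliasearch from 'algoliasearch';

const client = algoliasearch('YOUR_APP_ID', 'YOUR_SEARCH_ONLY_API_KEY');
const index = client.initIndex('products');

// Only return items that are still in stock; flagged records stay in the
// index until the periodic delete job removes them.
async function searchAvailableItems(query: string) {
  return index.search(query, {
    filters: 'out_of_stock:false',
  });
}

searchAvailableItems('coffee maker')
  .then((res) => console.log(res.hits.length, 'available items'))
  .catch(console.error);
```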

Black Friday or other high-activity events

Black Friday combines the scenarios of high-frequency data changes and “crisis”, so you’ll want to follow the suggestions for those scenarios. The difference here is that on Black Friday, the high-activity period could last longer than a crisis, or create sudden spikes of activity – but at least you can plan for the event in advance, which makes it easier to manage.

Real-time indexing, scenario 1: booking a hotel, reservations & rental availability

Users expect search results to have the most up-to-date, accurate information. Technically, this means they want changes made in the back office to appear in their search results immediately. Businesses want this as well: outdated information can negatively impact profits and customer trust.

Example: booking a hotel. If a hotel appears available in the search results, users expect it to have the same status and price when they click for more detail. But what if the hotel is booked between the query and the click? You can manage this gracefully with a friendly front-end UX (see below). But you can also mitigate the problem with an additional indexing strategy, often used in real-time systems programming, that relies on a second, smaller index.

You create a tiny index on the Algolia server that collects updates, up to 1000 records. The front-end code merges this tiny index with your main index on every search, removing results (e.g., out-of-stock items) or updating information (e.g., price) on the fly. Here’s the algorithm (a sketch of the front-end merge follows the list):

  • The Main index contains all product data, as normal.
  • The Tiny index gets updated with each change as it occurs; the main index is not updated yet. This tiny index should contain no more than 1000 records.
  • For every query, the front end merges the two indexes. There are two possibilities: 
    • The Tiny index contains one attribute per record, the object ID. If a record with the same object ID exists in the results, it’s removed from the results.
    • The Tiny index contains two attributes (object ID, price). The object ID is used to match records in the results, and the price attribute overwrites the price information in those results.
  • Meanwhile, on the back end, the main index gets updated regularly with the data in the tiny index (following the standard batch-update techniques described previously). The tiny index is then zeroed out, ready to receive new updates.
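
Here’s a minimal sketch of the front-end merge, using Algolia’s v4 JavaScript client and hypothetical index names. In this version, a tiny-index record containing only an object ID marks a removal, and one that also carries a price overrides the price in the main results:

```ts
import algoliasearch from 'algoliasearch';

// Hypothetical credentials and index names, for illustration only.
const client = algoliasearch('YOUR_APP_ID', 'YOUR_SEARCH_ONLY_API_KEY');
const mainIndex = client.initIndex('hotels');
const tinyIndex = client.initIndex('hotels_live_updates');

interface TinyRecord {
  objectID: string;
  price?: number; // present = override the price; absent = remove the record
}

async function searchWithLiveUpdates(query: string) {
  // Query both indices in parallel; the tiny index never exceeds ~1,000 records.
  const [main, tiny] = await Promise.all([
    mainIndex.search<Record<string, any>>(query, { hitsPerPage: 20 }),
    tinyIndex.search<TinyRecord>('', { hitsPerPage: 1000 }),
  ]);

  // Index the pending updates by objectID for fast lookup.
  const updates = new Map<string, TinyRecord>();
  for (const hit of tiny.hits) updates.set(hit.objectID, hit);

  // Merge on the fly: drop records flagged for removal, overwrite prices.
  const hits = main.hits
    .filter((hit) => {
      const update = updates.get(hit.objectID);
      return !(update && update.price === undefined); // no price => removal
    })
    .map((hit) => {
      const update = updates.get(hit.objectID);
      return update?.price !== undefined ? { ...hit, price: update.price } : hit;
    });

  return { ...main, hits };
}

searchWithLiveUpdates('sea view')
  .then((res) => console.log(res.hits))
  .catch(console.error);
```

On the back end, the periodic flush described above could, for example, apply the tiny index’s records to the main index with partialUpdateObjects/deleteObjects and then empty the tiny index with clearObjects().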

Notes:

  • The client-side merge should not impact performance: the merging logic is simple and fast, and it involves only a small number of records.
  • Pagination: removing results will affect pagination, because the removed items create gaps in the list of results. Thus, some search result pages will have fewer items than others. One workaround is to place a banner or a promotion in the gap, but the best solution is to use infinite scrolling or “load more” logic if you need to remove items often.

Real-time indexing, scenario 2: bidding and stock markets

Here, user expectations and business needs are more stringent: this use case requires that changes in the back office show up immediately in the search results. We see this in stock market trading or bidding applications, where prices can change every second or even every millisecond. If this scenario is yours, contact Algolia to discuss the different advanced settings you can adjust on your application, engine, and data to maximize your indexing performance. While engine-level changes should rarely be made, in exceptional circumstances they can shorten indexing times.

Other considerations

Perceived performance – Front end UI/UX solutions  

One important aspect of performance is perceived performance. We won’t cover UI/UX best practices here, but we want to acknowledge the importance of building a front end that gives the feeling of high performance. This is not about creating a false impression; it’s about communicating to the user that there is a (reasonable) waiting time. A friendly progress bar is an example of this: it asks the user nicely to wait, and if the wait isn’t too long, people are OK with that. There are many equally effective ways to manage perceived performance in the UI.

Out-of-the-box performance

As promised, a word about what we mean by “out-of-the-box high performance”. Our indexing comes with the following technologies:

  • A search engine using advanced indexing techniques
  • High-performance bare-metal servers, configured for speed
  • A globally available cluster-based cloud infrastructure, with low latency and server redundancy (i.e., no server downtime)
  • An API with a retry method to ensure (contractually) 99.99% availability 

Next readings

Our first article on indexing presented a high-level overview of standard and advanced indexing use cases. Our next article walked you through indexing best practices and the implementation details of a standard indexing process. This article discussed how to optimize indexing in advanced use cases.

Now it’s time to help you build solutions. Our remaining articles will provide front-end and back-end code for some of the advanced indexing use cases we discussed, starting with real-time pricing.

To get started with indexing, you can upload your data for free, or get a customized demo from our search experts today.

