What is intelligent search and how does it work?

 2 years ago
source link: https://www.algolia.com/blog/ux/what-is-intelligent-search-and-how-does-it-work/


Did you know that your IT service desk is quite likely under increasing pressure? An influx of demand due to an increase in digital transformation projects means many support organizations are struggling to provide adequate IT support to their employees.

To help streamline support processes and improve service levels for stakeholders, many service desk managers are turning to a knowledge-centered-service (KCS) approach. KCS involves sharing and refining knowledge on an on-going basis. The benefits of KCS with customer support are plentiful. Service desk agents access historical company content through search and continuously optimize that content, all with the goal of helping fast-track the resolution process.

KCS: a best-practice methodology

Knowledge-centered service (also known as knowledge-centered support) is an approach that describes how organizations can better use the information they possess in order to improve their service delivery. The KCS methodology involves IT service-desk workers not only solving systems, employee, and stakeholder issues but documenting the solution process for effective knowledge sharing and customer success in the future. The continuous updating of the knowledge base during the KCS process improves service delivery.

What is a knowledge base?

A knowledge base is an online library that contains knowledge articles about products, services, issues, and topics. It might include product FAQs, troubleshooting guides, user manuals, and other documents. To support the collection, retrieval, and sharing of knowledge, today’s best knowledge bases tend to be AI powered and searchable. For support teams that deal with many service requests, the collective knowledge in a knowledge base is essential to speed up the response time for new agents and established ones alike, as they refer to historical company content on a regular basis.
The benefits of knowledge-centered service

Like all good knowledge management, KCS practices can help your organization’s service desk capture knowledge to enhance customer satisfaction in several ways. It helps your team:

Resolve cases faster

A searchable database at a support team’s fingertips means no more second guessing service requests. Past problem-solving incidents are well documented, giving agents something to refer to. Frequently asked questions can be quickly answered.

Create a culture of expertise

A culture of expertise refers to a service organization culture that’s distinguished by knowledge and skills. A rich library of resources helps team members foster a culture of expertise, which equips agents to deal with cases. A KCS approach to service requests produces better-informed service agents and more-skilled resolution of inquiries.

Optimize the use of resources

Having a searchable knowledge base means content can be optimized for search. Especially with large knowledge bases, tagging content with keywords then allows agents searching for the right information to access specific content faster, reducing the frustration of searching for the right information and helping ensure operational efficiency.

Enable a self-service success strategy

A self-serve database gives support agents a level of autonomy and ownership over their work and speeds up the training process for new agents (who can use the database to see if a case has already been documented).

Increase first-contact resolution

An optimized database of resources not only speeds up the resolution process, it increases the likelihood of cases being resolved the first time, facilitating continuous improvement. If first-contact resolution is high, this should indicate that customers are receiving good service in an efficient way.

Creating a KCS-centric knowledge base

Creating an effective knowledge base takes time and is an ongoing process.
However, using the KCS method can help you create a knowledge base faster and effectively maintain it so that it’s always up to date.

Here’s how to create a knowledge base that gives back:

Step 1: Create good documentation

When IT service-desk inquiries are resolved, it’s important that the appropriate documents are created (if they don’t exist). The process of creating support documentation for the customer issues in each case (be it in the form of FAQs, articles, one pagers, case studies, troubleshooting guides, or even complete manuals) will quickly build a comprehensive library of useful resources.

Step 2: Use consistent document structure

Creating a standardized document structure enables consistency across the library and simplifies the process of creating additional documentation. Agents can then just start with a content template when they’re putting together documentation.

Step 3: Use your organization’s knowledge

It’s not enough for a great knowledge base to simply exist. Support teams must be trained in how to use the platform, how to create and add documentation to the library, and how to effectively use search to find the right information when they need it.

The importance of search for a first-rate knowledge base

Your knowledge base should be organized in an intuitive way, as well as easily searchable. To optimize accessibility, you can arrange content by topic or content type. A search bar ensures that users can instantly pull up the right content when they need it.

With a searchable knowledge base, you can use keywords to segment information by product, topic, or any other relevant category.

Keyword best practices

A keyword is a word, phrase, or set of words that best describes what content covers. With a knowledge base, agents can use keywords to locate documents related to their issue.
For example, for lost-password predicaments, agents can find relevant information by entering a keyword such as “password” or “forgotten password” in the search bar. Support teams can define their own rules for logging and finding documents. Here are some tips on doing it right:

Choose keywords an agent is likely to search for: Get inside an agent’s head. What search terms might they use? How would they phrase their search queries?

Relate keywords to content: Each one should describe what the content is about.

Use long-tail keywords: A long-tail keyword is typically a three-to-five-word phrase. Long-tail keywords are longer and therefore more specific, helping to narrow down the content.

Use interlinking in content: Have service-management content creators add hyperlinks that can direct searchers to other relevant content.

Build a keyword directory: As a reference document, create a list of keywords that relate to the information agents might need. For example, here’s a list of keywords that helps people query the Microsoft knowledge base.

Benefits of having a searchable knowledge base

A search bar underpinned by AI-powered software is a crucial feature of a successful knowledge base. When your agents have that, they can:

Retrieve content fast and quickly address their customer’s need

Easily find what they need and enjoy a pleasant user experience

Establish document management processes: routinely structure their documents with keywords and organize content intuitively

Add best-in-class search to upgrade your knowledge base

To recap, you can centralize your IT historical support information and ensure the continuous optimization of your content to keep supporting your service team when they need it the most. And you can make all your relevant articles easy to find (vs. having to be individually excavated from siloed folders).

Algolia’s advanced search API is ready to help on both counts.
Our search functionality is renowned and reliable, with proven algorithms, integrations, and UI libraries. Whether or not you’re implementing a KCS approach, it’s essential to help your service desk agents effortlessly tap the information they need and be able to reuse knowledge. Discover for free how search can help you vastly improve the efficiency (not to mention the user experience) of your service desk. If you like what you see, there’s no massive up-front investment, and you can choose flexible payments. Let us know a good time to chat and we’ll be in touch!

NLP & NLU as part of semantic search

NLP and NLU are two (often confused) technologies that make search more intelligent and ensure that people can search and find what they want without having to type the exact words as they appear on a page or in a product.

NLP and NLU are why you can type “dresses” and find that long sought-after “NYE Party Dress,” and why you can type “Matthew McConnahey” and get Mr. McConaughey back.

NLP stands for “natural language processing.” The term has accumulated so much meaning that it’s easy to overlook the fact that it tells you exactly what it is: NLP processes natural language, specifically into a format that computers can understand. This processing can include tasks such as normalization, spelling correction, and stemming, each of which we’ll look at in more detail.

NLU stands for “natural language understanding.” This technology aims to “understand” what a block of natural language is communicating. It performs tasks that can, for example, identify verbs and nouns in sentences or important items within a text. People or programs can then use this information to complete other tasks.

Computers seem advanced because they can perform many actions in a short period of time. However, in many ways, computers are quite daft. They need information to be structured in specific ways to build upon it. For natural language data, that’s where NLP comes in: it takes messy data (and natural language can be very messy) and processes it into something that computers can work with.

Text normalization

When searchers type text into a search bar, they are trying to find a good match, not play “guess the format.” It would be unfair and unproductive, for example, to require a user to type a query in exactly the same format as the matching words in a record. We use text normalization to do away with this requirement so that the text will be in a standard format no matter where it’s coming from.
As we go through different normalization steps, we’ll see that there is no single approach that everyone follows. Each normalization step generally increases recall and decreases precision.

A quick aside: “recall” measures how many of the results known to be good a search engine finds. “Precision” measures how many of the results it returns are good. Search results could have 100% recall by returning every single document in an index, but precision would be poor. Conversely, a search engine could have 100% precision by only returning documents that it knows for sure to be a perfect fit, but some good results will likely be missed.

Again, normalization generally increases recall and decreases precision. Whether that movement towards one end of the recall-precision spectrum is valuable depends on the use case and the search technology, so it isn’t a question of applying all normalization techniques, but instead of deciding which ones provide the best balance of precision and recall.

Letter normalization

The simplest normalization you could imagine would be the handling of letter case. In English, at least, words are generally capitalized at the beginning of sentences, occasionally in titles, and when they are proper nouns. (There are other rules, too, depending on whom you ask.) But in German, all nouns are capitalized. Other languages have their own rules.

These rules are useful; otherwise, we wouldn’t follow them. For example, capitalizing the first words of sentences helps us quickly see where sentences begin. That usefulness, however, is diminished in an information retrieval context. The meanings of words don’t change simply because they are in a title and have their first letter capitalized.

Even trickier is that there are rules, and then there is how people actually write. If I text my wife, “SOMEONE HIT OUR CAR!”, we all know that what I’m talking about is a car, and not something different because the word is capitalized.
We can see this clearly by reflecting on how many people don’t use any capitalization at all when communicating informally, which is, incidentally, how most case normalization works: everything is converted to lowercase.

Of course, we know that sometimes capitalization does change the meaning of a word or phrase. We can see that “cats” are animals, and “Cats” is a musical. In most cases, though, the extra precision gained by not normalizing case is outweighed by the recall it costs. The difference between the two is easy to tell via context, too, which we’ll be able to leverage through natural language understanding.

While less common in English, handling diacritics is also a form of letter normalization. Diacritics are the marks, or “glyphs,” attached to letters, as in á, ë, or ç. Words can be otherwise spelled the same, but added diacritics can change the meaning. In French, “élève” means “student,” while “élevé” means “elevated.” Nonetheless, many people will not include the diacritics when searching, and so another form of normalization is to strip all diacritics, leaving behind the simple (and now ambiguous) “eleve.”

Tokenization

The next normalization challenge is how to break down the text the searcher has typed in the search bar and the text in the document. This step is necessary because word order does not need to be exactly the same between the query and the document text, except when a searcher wraps the query in quotes.

Breaking queries, phrases, and sentences into words may seem like a simple task: just break up the text at each space. Problems show up quickly with this approach. Again, let’s start with English. Separating on spaces alone means that the phrase “Let’s break up this phrase!” yields “let’s,” “break,” “up,” “this,” and “phrase!” as words. For search, we almost surely don’t want the exclamation point at the end of the word “phrase.” Whether we want to keep the contracted word “let’s” together is not as clear.
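The letter normalization and tokenization steps described so far can be sketched with the Python standard library. This is a minimal illustration, not how any particular search engine implements them: the function names and the token pattern are assumptions made for this example, and the pattern makes one arbitrary choice (keeping contractions such as “let's” in one piece).

```python
import re
import unicodedata


def fold_case(text: str) -> str:
    """Case normalization: casefold() is a slightly more aggressive lower()."""
    return text.casefold()


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD), then drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize_on_spaces(text: str) -> list[str]:
    """Naive tokenization: punctuation stays glued to the tokens."""
    return text.split()


# Hypothetical token pattern: runs of word characters, optionally joined by
# a straight or curly apostrophe so that contractions stay together.
TOKEN_PATTERN = re.compile(r"\w+(?:['\u2019]\w+)*")


def tokenize(text: str) -> list[str]:
    """Regex tokenization: trailing punctuation is dropped."""
    return TOKEN_PATTERN.findall(text)


def normalize(text: str) -> list[str]:
    """Apply case folding, diacritic stripping, then tokenization."""
    return tokenize(strip_diacritics(fold_case(text)))


print(tokenize_on_spaces("Let's break up this phrase!"))
print(normalize("Let's break up this phrase!"))
print(normalize("élève"), normalize("élevé"))
```

With these definitions, splitting on spaces leaves “phrase!” with its exclamation point, while `normalize` drops it, and both French words collapse to the same ambiguous token.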
Some software will break the word down even further (“let” and “’s”) and some won’t. Some will not break down “let’s” while breaking down “don’t” into two pieces.

This process is called “tokenization.” We call it tokenization for reasons that should now be clear: what we end up with are not words but discrete groups of characters. This is even more true for languages other than English.

German speakers, for example, can merge words (more accurately “morphemes,” but close enough) together to form a larger word. The German word for “dog house” is “Hundehütte,” which contains the words for both “dog” (“Hund”) and “house” (“Hütte”).

Nearly all search engines tokenize text, but there are further steps an engine can take to normalize the tokens. Two related approaches are stemming and lemmatization.

Stemming and lemmatization

Stemming and lemmatization take different forms of tokens and break them down so that they can be compared. For example, take the words “calculator” and “calculation,” or “slowing” and “slowly.” We can see there are some clear similarities.

Stemming breaks a word down to its “stem,” or the form on which other variants of the word are based. Stemming is fairly straightforward; you could do it on your own. What’s the stem of “stemming”? You can probably guess that it’s “stem.” Often stemming means removing prefixes or suffixes, as in this case.

There are multiple stemming algorithms, and the most popular is the Porter Stemming Algorithm, which has been around since the 1980s. It is a series of steps applied to a token to get to the stem. Stemming can sometimes lead to results that you wouldn’t foresee.
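One way to see how such surprises arise is to apply a couple of suffix rules by hand. The sketch below implements only two early rules of the Porter algorithm (steps 1a and 1c); it is a deliberately incomplete fragment and not a usable stemmer, since the full algorithm has several more steps and a more careful definition of vowels. The function names are chosen for this example.

```python
VOWELS = set("aeiou")


def contains_vowel(stem: str) -> bool:
    # Simplification: the full algorithm also counts "y" as a vowel
    # when it follows a consonant.
    return any(ch in VOWELS for ch in stem)


def step_1a(word: str) -> str:
    """Plural-like suffixes: sses -> ss, ies -> i, ss -> ss, s -> (nothing)."""
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


def step_1c(word: str) -> str:
    """A final y becomes i when the rest of the word contains a vowel."""
    if word.endswith("y") and contains_vowel(word[:-1]):
        return word[:-1] + "i"
    return word


def partial_stem(word: str) -> str:
    """Apply only steps 1a and 1c; every other Porter step is skipped."""
    return step_1c(step_1a(word.lower()))


for token in ("carry", "carries", "cats"):
    print(token, "->", partial_stem(token))
```

Both “carry” and “carries” come out as “carri”: the two forms now compare as equal, even though the shared stem is not itself a word.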
Looking at the words “carry” and “carries,” you might expect that the stem of each of these is “carry.” The actual stem, at least according to the Porter Stemming Algorithm, is “carri.” This is because stemming attempts to make related words comparable, and breaks words down to their smallest possible parts in order to do so, even if that part is not a word itself.

On the other hand, if you want an output that will always be a recognizable word, then you want lemmatization. Again, there are different lemmatizers, such as the WordNet-based lemmatizer in NLTK.

Lemmatization breaks a token down to its “lemma,” or the word which is considered the base for its derivations. The lemma from WordNet for “carry” and “carries,” then, is what we expected before: “carry.”

Lemmatization will generally not break down words as much as stemming, nor will as many different word forms be considered the same after the operation. The stems for “say,” “says,” and “saying” are all “say,” while the lemmas from WordNet are “say,” “say,” and “saying.” To produce these lemmas, lemmatizers are generally corpus based.

If you want the broadest recall possible, you’ll want to use stemming. If you want the best possible precision, use neither stemming nor lemmatization. Which you go with ultimately depends on your goals, but most searches can generally perform very well with neither stemming nor lemmatization, retrieving the right results and not introducing noise.

Plurals

If you decide not to include lemmatization or stemming in your search engine, there is still one normalization technique that you should consider: the normalization of plurals to their singular form.

Generally, ignoring plurals is done through the use of dictionaries. Even if “de-pluralization” seems as simple as chopping off an “-s,” that’s not always the case.
The first problem is with irregular plurals, such as “deer,” “oxen,” and “mice.” A second problem is with pluralization that happens with an “-es” suffix, such as “potatoes.” Finally, there are simply the words that end in an “s” but aren’t plural, like “always.” A dictionary-based approach ensures that you increase recall without introducing incorrect matches.

Just as with lemmatization and stemming, whether you normalize plurals depends on your goals. Cast a wider net by normalizing plurals, a more precise one by avoiding normalization. Usually, normalizing plurals is the right choice, and you can remove normalization pairs from your dictionary when you find them causing problems.

One area, however, where you will almost always want to introduce increased recall is when handling typos.

Typo tolerance and spell check

We have all encountered typo tolerance and spell check within search, but it’s useful to think about why they’re present. Sometimes, there are typos because fingers slip and hit the wrong key. Other times, the searcher thinks a word is spelled differently than it is. Increasingly, “typos” can also be a result of poor speech-to-text understanding. Finally, words can seem like they have typos but really don’t, such as in comparing “scream” and “cream.”

The simplest way to handle these typos, misspellings, and variations is to avoid trying to correct them at all. Instead, there are algorithms that can compare different tokens and measure how far apart they are. One of these is the Damerau-Levenshtein Distance algorithm.

This measure looks at how many edits would be needed to go from one token to another. You can then filter out all tokens with a distance that is too high. (Two is generally a good threshold, but you will probably want to adjust this based on the length of the token.) After filtering, you can use the distance for sorting results, or to feed into a ranking algorithm.

Many times, context can matter when determining if a word is misspelled or not.
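The filter-then-sort idea based on edit distance can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the restricted “optimal string alignment” variant of Damerau-Levenshtein distance (insertions, deletions, substitutions, and swaps of adjacent characters), and the length-based thresholds in `max_typos` are hypothetical values chosen for this example, not recommendations from the article.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting inserts, deletes, substitutions, and
    adjacent transpositions (optimal string alignment variant)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a
    for j in range(cols):
        d[0][j] = j  # insert every character of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[rows - 1][cols - 1]


def max_typos(token: str) -> int:
    """Hypothetical thresholds: allow more edits for longer tokens."""
    if len(token) < 4:
        return 0
    if len(token) < 8:
        return 1
    return 2


def fuzzy_matches(query_token: str, index_tokens: list[str]) -> list[tuple[int, str]]:
    """Drop tokens that are too far away, then sort the rest by distance."""
    limit = max_typos(query_token)
    scored = [(osa_distance(query_token, t), t) for t in index_tokens]
    return sorted((dist, t) for dist, t in scored if dist <= limit)


print(osa_distance("scream", "cream"))
print(fuzzy_matches("calculater", ["calculator", "calculation", "cream"]))
```

Because the same function compares any two tokens, it works whether the misspelling is in the query or in the indexed document. The returned distances could feed a ranking step instead of a hard cutoff.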
The word “scream” is probably correct after “I,” but not after “ice.” Machine learning can be a solution for this, by bringing context to this NLP task. This kind of spell check software can use the context around a word to identify whether it is likely to be misspelled, and what its most likely correction is.

One important thing that we skipped over before is that words may be misspelled not only when a user types them into a search bar. Words may also be misspelled inside a document. This is especially true when the documents are made up of user-generated content. This detail is relevant because it means that if a search engine is only looking at the query for typos, it is missing half of the information.

The best typo tolerance should work across both query and document, and this is why edit distance generally works best for retrieving and ranking results. Spell check can be used to craft a better query or provide feedback to the searcher, but it is often unnecessary, and should never stand alone.

Natural language understanding

While NLP is all about processing text and natural language, NLU is about understanding that text.

Named entity recognition

A task that can aid in search is that of named entity recognition, or NER. NER identifies key items, or “entities,” inside of text. While some people will call NER natural language processing and others will call it natural language understanding, what’s clear is that it can find what’s important within a text.

For the query “NYE party dress” you would perhaps get back an entity of “dress” that is mapped to a type of “category.” NER will always map an entity to a type, from as generic as “place” or “person,” to as specific as your own facets.

NER can also use context to identify entities.
A query of “white house” may refer to a place, while “white house paint” might refer to a color of “white” and a product category of “paint.”

Query categorization

Named entity recognition is valuable in search because it can be used in conjunction with facet values to provide better search results. Recalling the “white house paint” example, you can use the “white” color and the “paint” product category to filter down your results to only show those that match those two values. This would give you high precision. If you don’t want to go that far, you can simply boost all products that match one of the two values.

Query categorization can also help with recall. For searches that return few results, you can use the entities to include related products. Imagine that there are no products that match the keywords “white house paint.” In this case, leveraging the product category of “paint” can return other paints that might be a decent alternative, such as that nice eggshell color.

Document tagging

Another way that named entity recognition can help with search quality is by moving the task from query time to ingestion time (when the document is added to the search index). When ingesting documents, NER can use the text to tag those documents automatically. These documents will then be easier for searchers to find. Either the searchers use explicit filtering, or the search engine applies automatic query-categorization filtering, to enable searchers to go directly to the right products using facet values.

Intent detection

Related to entity recognition is intent detection, or determining the action that a user wants to take. This is not the same as what we talk about when we say identifying searcher intent. Identifying searcher intent is getting people to the right content that they want at the right time.

Intent detection maps a request to a specific, pre-defined intent and then takes an action based on that intent.
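The mapping from request to pre-defined intent to action can be sketched as a small dispatch table. Everything here is an assumption made for illustration: the trigger phrases, the intent names, and the two stand-in search functions are hypothetical, and a real system could use a trained classifier rather than keyword rules.

```python
# Hypothetical trigger phrases for each pre-defined intent.
INTENT_TRIGGERS = {
    "help": ("how to", "how do i", "return", "refund", "shipping"),
}
DEFAULT_INTENT = "product"


def detect_intent(query: str) -> str:
    """Return the first intent whose trigger phrase appears in the query."""
    text = query.casefold()
    for intent, triggers in INTENT_TRIGGERS.items():
        if any(trigger in text for trigger in triggers):
            return intent
    return DEFAULT_INTENT


def search_help_desk(query: str) -> str:
    """Stand-in for a call to a help-desk index."""
    return f"help-desk results for {query!r}"


def search_products(query: str) -> str:
    """Stand-in for a call to a product index."""
    return f"product results for {query!r}"


# Each intent is tied to exactly one action.
ACTIONS = {"help": search_help_desk, "product": search_products}


def route(query: str) -> str:
    """Detect the intent, then run the action registered for it."""
    return ACTIONS[detect_intent(query)](query)


print(route("Where is my refund?"))
print(route("blue jacket"))
```

The point of the table is that the set of intents is closed: a query that matches no trigger still lands on a known default rather than an undefined action.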
A user searching for “how to make returns” might trigger the “help” intent, while “red shoes” might trigger the “product” intent. In the first case, you could route the search to your help desk search, and in the second one, to the product search. This isn’t so different from what you see when you search for the weather on Google and you get a weather box at the very top of the page. (Newly launched web search engine Andi takes this concept to the extreme, bundling search in a chatbot.)

For most search engines, intent detection as outlined here isn’t necessary. Most search engines only have a single content type on which to search at a time. When there are multiple content types, federated search can perform admirably by showing multiple search results in a single UI at the same time.

Other NLP and NLU tasks

There are plenty of other NLP and NLU tasks, but these are usually less relevant to search. Tasks like sentiment analysis can be useful in some contexts, but search isn’t one of them. You could imagine using translation to search multi-language corpora, but it rarely happens in practice, and is just as rarely needed.

Question answering is an NLU task that is increasingly implemented in search, especially in search engines that expect natural language searches. Once again, you can see this on major web search engines. Google, Bing, and Kagi will all immediately answer the question “how old is the Queen of England?” without needing to click through to any results.

Some search engine technologies have explored implementing question answering for more limited search indices, but outside of help desks or long, action-oriented content, the usage is limited. Few searchers are going to an online clothing store and asking questions of a search bar.

Summarization is an NLU task that is more useful for search. Much like with the use of NER for document tagging, automatic summarization can enrich documents.
Summaries can be used to match documents to queries, or to provide a better display of the search results. This better display can help searchers be confident that they have gotten good results, and get them to the right answers more quickly.

Even including newer search technologies using images and audio, the vast, vast majority of searches happen with text. To get the right results, it’s important to make sure that the search is processing and understanding both the query and the documents. NLP and NLU tasks like tokenization, normalization, tagging, typo tolerance, and others can help make sure that searchers don’t need to be search experts, but instead can go from need to solution “naturally” and quickly.

Dustin Coates

Product and GTM Manager

