source link: https://lucidworks.com/post/deploying-custom-data-science-models-fusion/

Data Science, Fusion Tips, Lucidworks Fusion, Reference Materials, Technical Article

Deploying Custom Data Science Models with Lucidworks Fusion

Learn how Lucidworks Fusion helps teams reduce friction and accelerate the velocity of data science.

by Sanket Shahane on April 3, 2020


Whether in finance, retail, healthcare, or oil and gas, data science and machine learning are pervasive across all domains and business processes. However, there is no “global” ML solution that works for all problems. 

Data science teams are continuously adopting new frameworks and methods to solve challenges in the best possible way. This puts pressure on engineering and DevOps teams to serve the latest solutions, with friction at hand-off points and potentially higher technical debt. The biggest challenge these technical experts face today is taking ML models to production quickly in the context of a fully functional, performant search application.

This post covers in detail how Lucidworks Fusion reduces the friction of deploying custom machine learning models, but if you’d like to see these tools in action, be sure to also sign up for our upcoming webinar, Accelerate Data Science Velocity with Fusion 5.1.

Machine Learning in Retail and Enterprise Search

Retail search and enterprise document discovery applications use data science as an important ingredient for personalizing mission-critical applications. Simple keyword matching is no longer enough to satisfy today’s users. Semantic search applied to product recommendations, user query understanding, document categorization, sentiment analysis, and summarization is critical to providing enhanced, personalized experiences to consumers as well as employees. As data science teams strive to build models that satisfy the requirements of a new generation of users, the ability to smoothly take those models into production is becoming critical.

Data Science Toolkit Integration in Lucidworks Fusion

Lucidworks Fusion is a cloud-native, scalable enterprise document discovery platform built with openness and pluggability at its core. Fusion seamlessly integrates with a variety of commercial and open source machine learning frameworks to derive insights from large unstructured documents. Use cases vary from e-commerce search applications, to conversational frameworks, to support portals and internal enterprise knowledge discovery applications.

Fusion’s Data Science Toolkit Integration is a model service that provides seamless integration with query and index pipelines to add intelligence for processing incoming queries and documents. Fusion integrates with Seldon Core, an open source framework for model deployment management. Fusion’s Data Science Toolkit Integration enables data science teams to develop and validate models built for specific data and use Fusion to deploy them in production. This capability helps teams to:

  • Streamline production of search-focused ML models 
  • Reduce data science teams’ dependencies on DevOps teams and vice versa
  • Increase productivity, drive experimentation to fail fast, iterate, and improve
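
To give a concrete sense of what gets registered with Fusion, below is a minimal sketch of a model class in the style served by Seldon Core’s Python wrapper. The class name, the serialized-artifact path, and the exact predict() signature are illustrative and should be checked against the Seldon Core version used by your Fusion release.

import joblib

class SentimentModel:
    """Wraps a pre-trained scikit-learn pipeline so Seldon Core can serve it."""

    def __init__(self):
        # Load a previously trained, serialized pipeline (hypothetical path).
        self.pipeline = joblib.load("/models/sentiment_pipeline.joblib")

    def predict(self, X, features_names=None, meta=None):
        # X is a batch of input texts; return one positive-sentiment score per text.
        return self.pipeline.predict_proba(X)[:, 1]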

Deployment and Consumption Workflow

Data Science teams will:

  • build models for the organization’s problems,
  • validate them,
  • convert them to versioned Docker images, and
  • register them with Fusion to deploy.

[Diagram: a typical data science team’s workflow, from model development in notebooks to a versioned Docker image registered with Fusion]

The diagram above describes a typical data science team’s workflow. The team first identifies the problem, pulls data from various data stores, and iterates in Jupyter notebooks with Python ML libraries until a satisfactory version is produced. After that, a few simple commands build a Docker image and publish it to Fusion. Fusion needs a one-time access setup for the organization’s private Docker repository to register the image. Fusion can then deploy the models on demand at scale.
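
For illustration, the notebook-side part of that loop might look like the sketch below: train a small model, check its predictions, then serialize the artifact that the Docker image (and the wrapper class above) will load at serving time. The dataset, the scikit-learn pipeline, and the output path are all stand-ins.

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data; a real project would pull this from the data stores above.
texts = ["great product", "terrible support", "works as expected", "would not buy again"]
labels = [1, 0, 1, 0]

pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())
pipeline.fit(texts, labels)
print(pipeline.predict_proba(["battery died after a week"])[:, 1])

# Serialize the artifact; a Dockerfile would copy it into the versioned image
# that gets pushed to the private registry and registered with Fusion.
joblib.dump(pipeline, "sentiment_pipeline.joblib")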

Using models at Query (search) and Index (data ingest) time

Case 1: Processing documents at index time.

When indexing documents from SharePoint, GDrive, or any other data source, machine learning models can enrich each document with entities, summaries, topics, sentiment scores, and more.

Documents pass through the following flow: Fusion Connectors → Index Pipeline → Solr Index

Fusion’s Machine Learning Index Stage interacts with the deployed ML models, passing documents and predictions back and forth between the pipeline and Seldon Core.

[Diagram: documents flowing through the stages of an index pipeline, with the Machine Learning Index Stage calling deployed models via Fusion’s ML Service and Seldon Core]

The image above describes how documents flow through the different stages of an index pipeline, getting enriched at each step before being stored. The Machine Learning Index Stage interacts with Fusion’s ML Service, which in turn talks to Seldon Core. Seldon Core routes each request to the appropriate model while load-balancing between model replicas (copies of the model’s Docker image deployed to increase scalability). Finally, the prediction from the model is returned to the pipeline and the document is enriched with it.
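
Conceptually, the enrichment step amounts to something like the sketch below: send the document text to a deployed model over Seldon Core’s v1 REST protocol and attach the prediction as an extra field before the document reaches Solr. The host, model name, and field names are hypothetical, and in Fusion this round trip is handled by the Machine Learning Index Stage rather than hand-written code.

import requests

# Hypothetical Seldon Core v1 prediction endpoint for a deployed sentiment model.
SELDON_URL = "http://ml-service.example.com/seldon/default/sentiment-model/api/v1.0/predictions"

def enrich(doc: dict) -> dict:
    payload = {"data": {"ndarray": [doc["body_t"]]}}
    resp = requests.post(SELDON_URL, json=payload, timeout=5)
    resp.raise_for_status()
    score = resp.json()["data"]["ndarray"][0]
    # Store the model output as an additional field on the document.
    return {**doc, "sentiment_score_d": score}

print(enrich({"id": "doc-1", "body_t": "The new release fixed our indexing issues."}))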

Case 2: Processing User Queries

When processing user queries in real time (from the search front end, whether an e-commerce website or an internal knowledge discovery portal), queries can be passed through ML models to predict various user-intent attributes such as brand affinity, the product category the query is looking for, the expected color, and so on.

Queries pass from: Front End → Query Pipeline → Solr Index → Response → Front End

3ENnYf3.png!web

The diagram above shows how a user query travels through a Fusion query pipeline: the Machine Learning Query Stage interacts with the deployed ML models, passing queries and predictions back and forth between the pipeline and Seldon Core. The predictions can then be used as Solr boost or filter parameters. For example, a model can predict department:electronics for the query “ipad”.
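
As a rough illustration of how such a prediction becomes a Solr parameter, the sketch below maps a predicted department onto a filter or boost clause. classify_query() stands in for a call to the deployed model; the field and parameter choices are illustrative.

def classify_query(query: str) -> str:
    # Stand-in for a deployed intent model; returns a predicted department.
    return "electronics" if query in {"ipad", "laptop", "headphones"} else "all"

def solr_params(query: str) -> dict:
    params = {"q": query}
    department = classify_query(query)
    if department != "all":
        # Filter strictly on the predicted department...
        params["fq"] = f"department:{department}"
        # ...or boost it softly so other departments can still appear.
        params["bq"] = f"department:{department}^5"
    return params

print(solr_params("ipad"))  # {'q': 'ipad', 'fq': 'department:electronics', 'bq': 'department:electronics^5'}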

Case 3: Post-processing search results at query time.

Responses to user queries from the Fusion backend can also be modified to alter the ranking of the results, redact certain documents, promote personally relevant results based on user information, or surface documents based on semantic similarity in addition to keyword matching.

Queries pass through the following workflow: Front End → Query Pipeline → Solr Index → Response → Front End

[Diagram: search results passing back through the query pipeline, where the Machine Learning Query Stage re-ranks or alters them before they return to the front end]

The Machine Learning Query Stage interacts with the deployed ML models, passing response documents and predictions back and forth between the pipeline and Seldon Core. The re-ranked or otherwise altered results can then be passed on to the front end. Models that do this are popularly known as LTR (learning to rank) models.
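
A toy version of that post-processing step is sketched below: a model score is blended with the original Solr score and the response is reordered before it returns to the front end. score_with_model() stands in for a deployed semantic-similarity or LTR model, and the field names and blending weight are illustrative.

def score_with_model(query: str, doc: dict) -> float:
    # Stand-in for a semantic-similarity score in [0, 1] from a deployed model.
    return 1.0 if query.lower() in doc["title_t"].lower() else 0.2

def rerank(query: str, solr_docs: list, weight: float = 0.7) -> list:
    for doc in solr_docs:
        doc["rerank_score"] = weight * score_with_model(query, doc) + (1 - weight) * doc["score"]
    return sorted(solr_docs, key=lambda d: d["rerank_score"], reverse=True)

results = [
    {"id": "1", "title_t": "USB-C charging cable", "score": 0.9},
    {"id": "2", "title_t": "Apple iPad 10.2-inch", "score": 0.6},
]
print([d["id"] for d in rerank("ipad", results)])  # ['2', '1'] — the model score outweighs the original Solr score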

Lucidworks Models

Lucidworks has deployed multiple deep-learning-based ML models on this framework, available to Fusion users out of the box.

  1. Sentiment analysis small text
  2. Sentiment analysis large text
  3. Semantic search apps
    1. Smart Answers (coming soon)
    2. Zero search results treatment (coming soon)

See Fusion in Action 

If you want to learn more and see Fusion’s capabilities for data science in action, register for our upcoming webinar, Accelerate Data Science Velocity with Fusion 5.1 .

REGISTER NOW 

