
Next generation data insights using natural language queries

 2 years ago
source link: https://blog.twitter.com/engineering/en_us/topics/insights/2022/next-generation-data-insights-using-natural-language-queries


At Twitter, we process approximately 400 billion events in real time and generate petabyte (PB) scale data every day. Teams across Twitter harness this data in different ways to build a better Twitter for everyone.

Taking a broad view, we can group the infrastructure and tools of a comprehensive, robust big data platform into three categories: data processing, data storage, and data consumption. Across the industry, there is powerful infrastructure for processing petabytes of data (such as Spark, Cloud Dataflow, and Airflow) and for storing it (distributed blob stores such as GCS and S3, Hadoop, columnar databases, and BigQuery). However, non-trivial challenges remain in gathering timely, meaningful, and actionable insights from these exabyte-scale data platforms through dashboards, visualizations, and reports.

The problem

One of the biggest hurdles with the data-consumption products currently used in the industry is the need for backroom processing: engineers and analysts must create dashboards, reports, and similar artifacts before anyone can consume the data. This leads to three challenges:

  • decreased time value of the data, thereby impacting Twitter's ability to make timely data-driven decisions.
  • increased total costs of generating insights from new attributes, features, and dashboards. Engineers/analysts have to invest in continuous development and maintenance of the dashboards/reports due to evolving business needs.
  • missed opportunities, as current tools don’t anticipate and proactively surface insights from exabytes of data based on what our internal business customers might find useful. Currently, questions are human initiated rather than human and platform initiated.

The solution

Over the past 20 years, insight products have come a long way, from crosstab reporting (late 90’s) and dashboards (2000’s) to immersive visualizations (2010’s). With recent advancements in natural language processing and machine learning, there is a unique opportunity to make consuming data from exabyte-scale platforms both intuitive and timely.

E.F. Codd expressed a similar thought in his paper ‘Seven Steps to Rendezvous with the Casual User’: “If we are to satisfy the needs of casual users of data bases, we must break through the barriers that presently prevent these users from freely employing their native languages (e.g., English) to specify what they want.”

We built an in-house product called Qurious, which allows our internal business customers to ask questions in natural language. They are then served the insights in real time, without anyone needing to create dashboards. The product includes a webapp and a Slack chatbot, both of which are integrated with BigQuery and Data QnA APIs.
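
As a rough illustration of the flow described above (a natural-language question is turned into a query and answered directly, with no prebuilt dashboard), here is a minimal sketch. Everything in it is an assumption made for illustration: the `events` table, the `answer` function, and the single regex rule are hypothetical, SQLite stands in for BigQuery, and the pattern match stands in for the question-interpretation step, which this post does not describe in detail.

```python
import re
import sqlite3

# Hypothetical sample data; an in-memory SQLite table stands in for BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (country TEXT, event_type TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("US", "like", 120), ("US", "retweet", 30), ("JP", "like", 80)],
)

# One hand-written rule of the form "how many <event_type>s in <country>".
# A real system would use a learned or API-backed interpreter instead.
PATTERN = re.compile(r"how many (\w+?)s? in (\w+)", re.IGNORECASE)


def answer(question):
    """Return the event count for a supported question, or None if unsupported."""
    match = PATTERN.search(question)
    if match is None:
        return None
    event_type = match.group(1).lower()
    country = match.group(2).upper()
    # Parameterized SQL keeps user-supplied text out of the query string.
    row = conn.execute(
        "SELECT COALESCE(SUM(n), 0) FROM events "
        "WHERE event_type = ? AND country = ?",
        (event_type, country),
    ).fetchone()
    return row[0]


print(answer("How many likes in US?"))
print(answer("What is the weather?"))
```

The point of the sketch is the shape of the pipeline: interpret the question, build a parameterized query, and return the result immediately, rather than waiting for someone to build a report.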

Qurious demo

The demo shows the first version of Qurious, which provides an autocomplete search box where our internal business customers type a question. The user can then click a ‘Get Data’ button to get the answer in a data table.
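
An autocomplete box of this kind can be approximated with a prefix lookup over a sorted list of known questions. The sketch below is only one possible approach and is not taken from Qurious itself: the `QUESTIONS` catalogue and the `suggest` function are hypothetical names.

```python
from bisect import bisect_left

# Hypothetical catalogue of questions the search box can suggest,
# stored lowercase and sorted so that matching prefixes are contiguous.
QUESTIONS = sorted([
    "how many likes in us",
    "how many retweets in jp",
    "how many likes in jp",
    "top countries by likes",
])


def suggest(prefix, limit=5):
    """Return up to `limit` catalogue questions that start with `prefix`."""
    p = prefix.lower()
    # Binary search for the first entry that is >= the prefix.
    i = bisect_left(QUESTIONS, p)
    matches = []
    while i < len(QUESTIONS) and QUESTIONS[i].startswith(p) and len(matches) < limit:
        matches.append(QUESTIONS[i])
        i += 1
    return matches


print(suggest("how many l"))
```

Keeping the list sorted makes each lookup a binary search followed by a short scan, which is enough for a small catalogue; a larger one would more likely use a trie or a search index.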

