Insights

Next generation data insights using natural language queries

At Twitter, we process approximately 400 billion events in real time and generate petabyte (PB) scale data every day. There are different ways in which various teams at Twitter harness this data to build a better Twitter for everyone.

Taking a broad view, we could cluster the infrastructure and tools of a comprehensive and robust big data platform into three categories: data processing, data storage, and data consumption. Across the industry we have powerful infrastructure for processing petabytes of data (such as Spark, Cloud Dataflow, and Airflow) and for storing it (such as GCS, S3, Hadoop, columnar databases, and BigQuery). However, non-trivial challenges still exist in gathering timely, meaningful, and actionable insights from these exabyte-scale data platforms through dashboards, visualizations, and reports.

The problem

One of the biggest hurdles with the data-consumption products currently used in the industry is the need for backroom processing: engineers and analysts must create dashboards, reports, and similar artifacts before the data can be consumed. This leads to three challenges:

decreased time value of the data, thereby impacting Twitter's ability to make timely data-driven decisions.
increased total costs of generating insights from new attributes, features, and dashboards. Engineers/analysts have to invest in continuous development and maintenance of the dashboards/reports due to evolving business needs.
missed opportunities, as current tools don’t anticipate and proactively surface insights from exabytes of data based on what our internal business customers might find useful. Currently, questions are human-initiated rather than both human- and platform-initiated.

The solution

Over the past 20 years, insight products have come a long way, from crosstab reporting (late ’90s) and dashboards (2000s) to immersive visualizations (2010s). With the recent advancements in natural language processing and machine learning, there is a unique opportunity to make gathering insights from exabyte-scale data platforms both intuitive and timely.

E.F. Codd expressed a similar thought in his paper ‘Seven Steps to Rendezvous with the Casual User’: “If we are to satisfy the needs of casual users of data bases, we must break through the barriers that presently prevent these users from freely employing their native languages (e.g., English) to specify what they want.”

We built an in-house product called Qurious which allows our internal business customers to ask questions in their natural language. They are then served the insights in real time without the need to create dashboards. The product includes a webapp and a Slack chatbot, both of which are integrated with BigQuery and Data QnA APIs.
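The flow described above (a natural language question goes in, a query runs, and an answer comes back without a pre-built dashboard) can be illustrated with a minimal sketch. This is not how Qurious is implemented: the post does not show its code, and the sketch swaps in simple regex templates for the Data QnA API and an in-memory SQLite database for BigQuery. The `events` table, the templates, and all function names are hypothetical.

```python
import re
import sqlite3

# Hypothetical question templates. Each pairs a regex over the question text
# with a parameterized SQL query. A real system would use a natural language
# interpretation service here instead of hand-written patterns.
TEMPLATES = [
    (re.compile(r"how many events on (\d{4}-\d{2}-\d{2})", re.IGNORECASE),
     "SELECT COUNT(*) AS event_count FROM events WHERE event_date = ?"),
    (re.compile(r"how many (\w+) events", re.IGNORECASE),
     "SELECT COUNT(*) AS event_count FROM events WHERE event_type = ?"),
]


def question_to_sql(question):
    """Return (sql, params) for the first template that matches the question."""
    for pattern, sql in TEMPLATES:
        match = pattern.search(question)
        if match:
            return sql, match.groups()
    raise ValueError(f"No template understands the question: {question!r}")


def answer(conn, question):
    """Translate the question, run the query, and return (columns, rows)."""
    sql, params = question_to_sql(question)
    cursor = conn.execute(sql, params)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()


def build_demo_db():
    """Create an in-memory stand-in for the data warehouse (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_date TEXT, event_type TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [
            ("2021-06-01", "like"),
            ("2021-06-01", "retweet"),
            ("2021-06-01", "like"),
            ("2021-06-02", "like"),
        ],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(answer(conn, "How many events on 2021-06-01?"))
    print(answer(conn, "How many like events?"))
```

The point of the sketch is the shape of the pipeline: because the question is translated and executed on demand, nobody has to build a report in advance for each new question.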

Qurious demo

The demo below shows the first version of Qurious, which provides an autocomplete search box where our internal business customers type a question. Clicking the ‘Get Data’ button returns the answer in a data table.
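One simple way an autocomplete search box of this kind could work is prefix matching over a list of known questions. The post does not say how Qurious generates its suggestions, so the sketch below is only an assumption for illustration; the `QuestionAutocomplete` class and the sample questions are hypothetical.

```python
from bisect import bisect_left


class QuestionAutocomplete:
    """Suggest known questions that start with what the user has typed."""

    def __init__(self, questions):
        # Store lowercased questions in sorted order so that all questions
        # sharing a prefix sit next to each other.
        self._questions = sorted(q.lower() for q in questions)

    def suggest(self, prefix, limit=5):
        prefix = prefix.lower()
        # Binary search for the first question that is >= the prefix.
        start = bisect_left(self._questions, prefix)
        suggestions = []
        for question in self._questions[start:]:
            if len(suggestions) == limit or not question.startswith(prefix):
                break
            suggestions.append(question)
        return suggestions


if __name__ == "__main__":
    box = QuestionAutocomplete([
        "How many events on 2021-06-01",
        "How many like events",
        "Top event types",
    ])
    print(box.suggest("how many"))
```

Keeping the questions sorted means each lookup is a binary search followed by a short scan, which stays fast as the list of known questions grows.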
