
Natural Language to SQL: Use it on your own database

 4 years ago
source link: https://mc.ai/natural-language-to-sql-use-it-on-your-own-database/

This article is structured as follows:

1. Introduction
2. Installing & Running EditSQL on SParC
3. Making Changes to the Code
4. Adding a Custom Database and Building the Vocab
5. Testing your Question
6. Conclusion
Image by author

1. Introduction

Generating SQL queries from user questions involves more than question answering or machine translation. As in question answering, the longer a user interacts with the system, the more likely they are to ask questions that need complex processing, such as resolving a reference to information mentioned earlier or combining several disparate schemas. Such a system helps end users, who are often inexperienced in querying databases, extract information from many complex databases.
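
To make the context dependence concrete, here is a small hand-written interaction in the style of SParC, run against a toy in-memory SQLite database. The schema, data, questions, and SQL are all hypothetical illustrations, not taken from the dataset. Note how each follow-up question ("Which of them...", "How many...") only makes sense given the earlier turns, so its SQL extends the previous query.

```python
import sqlite3

# Hypothetical toy schema and data (not from SParC).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, major TEXT);
INSERT INTO student VALUES (1, 'Ana', 19, 'CS'), (2, 'Ben', 22, 'Math'), (3, 'Cai', 24, 'CS');
""")

# Each turn pairs a question with the SQL a model should produce.
# Later questions refer back to earlier ones, so their SQL builds on the previous query.
interaction = [
    ("Which students major in CS?",
     "SELECT name FROM student WHERE major = 'CS'"),
    ("Which of them are older than 20?",
     "SELECT name FROM student WHERE major = 'CS' AND age > 20"),
    ("How many are there?",
     "SELECT COUNT(*) FROM student WHERE major = 'CS' AND age > 20"),
]

results = {}
for question, sql in interaction:
    results[question] = conn.execute(sql).fetchall()
    print(f"{question} -> {results[question]}")
```

A model that treated each question in isolation could not resolve "them" in the second turn; that is the gap context-dependent models such as EditSQL aim to close.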

Several approaches to the Text-to-SQL task have been proposed on different benchmark datasets, most of them framing it as a semantic parsing problem. This survey [4] gives a good introduction to the task and its challenges.

You can also try this interactive converter built by the AllenNLP team on the ATIS dataset. Like several other, better-performing models, it uses semantic parsing and an encoder-decoder architecture.

Example from SParC dataset. Image from [1]

As of July 2020, the Spider and SParC leaderboards list the following models among the best-performing ones with open-source code available:

You can find the complete leaderboards for Spider and SParC here. For practicality, this article is limited to exploring EditSQL, specifically on SParC.

Another example of SParC given by the team on their leaderboard

The Limitation:

While the listed models perform well on their respective datasets, the released code offers no option to test a model on a custom database, let alone train on one. Even modifying existing queries in the dataset is a tedious and error-prone process.

However, of the models listed above, EditSQL can be modified so that the SParC experiment runs on a custom SQLite database. The changes are minor and provide an easy workaround for testing the model's performance on your own database.
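
Running on your own database means describing its schema the way the dataset does. Spider and SParC describe each database with an entry in a tables.json file, and EditSQL's preprocessing reads schemas from such a file (an assumption worth verifying against the EditSQL repository). The sketch below reads a SQLite file's schema through the standard sqlite3 module and SQLite's PRAGMA statements and emits an entry using the Spider tables.json field names as we understand them (db_id, table_names, column_names, column_types, primary_keys, foreign_keys). The function name and the automatic "human-readable" names are hypothetical; the real dataset uses hand-written names and more column types, so compare the output with an entry from the shipped tables.json before relying on it.

```python
import json
import os
import sqlite3
import tempfile

# Substrings of declared SQLite types that we map to Spider's "number" type (an assumption).
NUMERIC_HINTS = ("INT", "REAL", "NUM", "FLOA", "DOUB", "DEC")

def spider_type(declared):
    decl = (declared or "").upper()
    return "number" if any(h in decl for h in NUMERIC_HINTS) else "text"

def sqlite_to_tables_entry(db_path, db_id):
    """Build a Spider-style tables.json entry (hypothetical helper) from a SQLite file."""
    conn = sqlite3.connect(db_path)
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    column_names = [[-1, "*"]]   # index 0 is reserved for '*'
    column_types = ["text"]
    primary_keys, foreign_keys = [], []
    col_index = {}               # (table, column) -> global column index
    for t_idx, table in enumerate(tables):
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, name, ctype, _nn, _dflt, pk in conn.execute(f'PRAGMA table_info("{table}")'):
            col_index[(table, name)] = len(column_names)
            column_names.append([t_idx, name])
            column_types.append(spider_type(ctype))
            if pk:
                primary_keys.append(col_index[(table, name)])
    for table in tables:
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, on_update, on_delete, match)
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            src = col_index.get((table, row[3]))
            dst = col_index.get((row[2], row[4]))
            if src is not None and dst is not None:
                foreign_keys.append([src, dst])
    conn.close()
    return {
        "db_id": db_id,
        "table_names_original": tables,
        "table_names": [t.replace("_", " ").lower() for t in tables],
        "column_names_original": column_names,
        # Crude normalization; the dataset uses hand-written natural-language names.
        "column_names": [[t, c.replace("_", " ").lower()] for t, c in column_names],
        "column_types": column_types,
        "primary_keys": primary_keys,
        "foreign_keys": foreign_keys,
    }

# Usage on a throwaway database with a hypothetical schema.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "shop.sqlite")
    demo = sqlite3.connect(path)
    demo.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(customer_id),
                         total REAL);
    """)
    demo.close()
    entry = sqlite_to_tables_entry(path, "shop")

print(json.dumps(entry, indent=2))
```

Foreign keys declared without an explicit target column are skipped here, since PRAGMA foreign_key_list reports their target as NULL.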

About EditSQL:

EditSQL addresses context-dependent text-to-SQL query generation. It incorporates the interaction history and uses an utterance-table encoder-decoder to robustly understand the context of a user's utterance (or question). To do this, it uses a BERT-based encoder that helps capture complex database structures and relate them to the utterances. As a result, given an arbitrary question, the model is better able to identify the parts of the database schema the question refers to.

Table Encoder structure given in [1]
Utterance encoder structure given in [1]
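
The core idea of relating utterances to tables can be illustrated with plain dot-product attention in both directions: each question token attends over column encodings, and each column attends over question tokens. This is a simplified sketch in pure Python, not EditSQL's implementation; the 3-dimensional embeddings and the column names are hypothetical stand-ins for what BERT and the Bi-LSTM encoders would produce.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(queries, keys):
    """Dot-product attention: for each query, weights over keys and the weighted sum of keys."""
    outputs = []
    for q in queries:
        weights = softmax([dot(q, k) for k in keys])
        context = [sum(w * k[i] for w, k in zip(weights, keys)) for i in range(len(keys[0]))]
        outputs.append((weights, context))
    return outputs

# Hypothetical embeddings; a real model would compute these with BERT and Bi-LSTMs.
utterance = {"oldest": [0.1, 2.0, 0.0], "students": [2.0, 0.1, 0.0]}
columns = {
    "student.name": [1.5, 0.0, 0.2],
    "student.age": [0.0, 1.8, 0.1],
    "course.title": [0.0, 0.0, 1.0],
}
col_names = list(columns)

# Utterance -> table: each question token finds the columns it is most related to.
token_to_cols = attend(list(utterance.values()), list(columns.values()))
for token, (weights, _) in zip(utterance, token_to_cols):
    print(f"{token!r} attends mostly to {col_names[weights.index(max(weights))]}")

# Table -> utterance: each column gets a context vector summarizing the question.
col_to_tokens = attend(list(columns.values()), list(utterance.values()))
```

In a trained model the context vectors are concatenated with the original encodings, so each token "knows" which columns it mentions and each column "knows" how the question refers to it.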

Furthermore, EditSQL takes into account the relation between the user utterance and the table structures, as well as the encodings of recent utterances in the interaction, to identify the context correctly. As shown in the diagram above, this information is then passed to a table-aware decoder that uses an attention-enhanced LSTM to generate the SQL query.
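
One way to picture what "table-aware" means for the decoder: at each step it scores a fixed set of SQL keywords and, separately, every column of the current database, then normalizes over both. Column scores come from the decoder state and the column encodings, so the output space changes with the schema. The sketch below is a simplified illustration under that assumption; the keyword list, logits, decoder state, and embeddings are hypothetical, and the real decoder computes its scores with learned layers.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# A tiny, hypothetical fixed vocabulary of SQL tokens.
SQL_KEYWORDS = ["SELECT", "FROM", "WHERE", "MAX", "COUNT", "(", ")", "<EOS>"]

def next_token_distribution(keyword_logits, decoder_state, column_encodings):
    """P(next token) over SQL keywords plus the columns of the current schema.

    keyword_logits: one score per SQL_KEYWORDS entry (in a real model, a
    projection of the decoder state). Column scores are dot products between
    the decoder state and each column encoding.
    """
    assert len(keyword_logits) == len(SQL_KEYWORDS)
    col_names = list(column_encodings)
    col_logits = [sum(s * c for s, c in zip(decoder_state, column_encodings[n]))
                  for n in col_names]
    probs = softmax(list(keyword_logits) + col_logits)
    return dict(zip(SQL_KEYWORDS + col_names, probs))

# Hypothetical column encodings and a decoder state after emitting "SELECT MAX (".
columns = {
    "student.name": [1.5, 0.0, 0.2],
    "student.age": [0.0, 1.8, 0.1],
    "course.title": [0.0, 0.0, 1.0],
}
state = [0.0, 2.0, 0.1]
keyword_logits = [0.1, 0.2, 0.0, 0.3, 0.1, 0.0, 0.0, 0.0]

dist = next_token_distribution(keyword_logits, state, columns)
print(max(dist, key=dist.get))  # student.age
```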

However, users often ask questions that reuse information from a previous turn of the interaction. With purely sequential generation, this can lead to redundant processing, because the overlapping parts of the query are regenerated from scratch. EditSQL gets its name from a novel mechanism that addresses this problem by "editing" the previously generated query: that query's tokens are encoded with another Bi-LSTM, and, guided by the attention-based context vector, the decoder can reuse them instead of regenerating them.

Model architecture for editing the queries (given in [1])
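
The reuse step behaves like a copy mechanism: the final distribution over the next token mixes "generate from the vocabulary" with "copy a token from the previous query", weighted by a copy probability. The sketch below is a generic copy-or-generate mixture written to illustrate the editing idea described in [1]; it is not EditSQL's code. In the model, the copy probability and the attention over the previous query's tokens are computed from the decoder state; here they are fixed, hypothetical numbers.

```python
def edit_aware_distribution(gen_probs, prev_query, copy_attention, p_copy):
    """Mix generating from the vocabulary with copying tokens of the previous query.

    gen_probs: token -> probability from the ordinary decoder (sums to 1).
    prev_query: tokens of the previously predicted SQL query.
    copy_attention: attention weights over prev_query positions (sums to 1).
    p_copy: probability of copying rather than generating.
    """
    mixed = {tok: (1.0 - p_copy) * p for tok, p in gen_probs.items()}
    for tok, weight in zip(prev_query, copy_attention):
        # A token appearing in the previous query gains probability mass from copying.
        mixed[tok] = mixed.get(tok, 0.0) + p_copy * weight
    return mixed

# Hypothetical step: the previous turn's WHERE clause is likely to be reused.
prev_query = "SELECT name FROM student WHERE major = 'CS'".split()
copy_attention = [0.0, 0.0, 0.0, 0.0, 0.9, 0.1, 0.0, 0.0]
gen_probs = {"WHERE": 0.3, "ORDER": 0.3, "<EOS>": 0.4}

mixed = edit_aware_distribution(gen_probs, prev_query, copy_attention, p_copy=0.6)
print(max(mixed, key=mixed.get))  # WHERE
```

Because both inputs are probability distributions, the mixture still sums to 1, and copying lets whole clauses of the earlier query carry over to the new one.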
