
I am developing a system that converts natural language into SQL queries.

I have read the answers to similar questions, but could not find the information I was looking for.

Below is the flowchart for such a system, taken from "An Algorithm to Transform Natural Language into SQL Queries for Relational Databases" by Garima Singh and Arun Solanki:

Flowchart

I understand the steps up to part-of-speech tagging, but how do I approach the remaining steps?

  1. Do I need to train a model on all possible SQL queries?
  2. Or, once part-of-speech tagging is done, do I have to work with the tagged words myself to form a SQL query?

Edit: I have successfully implemented the steps from "user query" to "part-of-speech tagging".

Thank you.

Kevin Bowen
deepguy

5 Answers


If you want to tackle the problem from another perspective, you can use end-to-end learning. Instead of specifying the large pipeline you mentioned ahead of time, you only care about the mapping between sentences and their corresponding SQL queries.
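As a rough illustration of what "mapping sentences to SQL" can look like as a learning problem, the sketch below pairs a question with a structured target that a model could predict slot by slot. The field names (`select`, `agg`, `conditions`), the `city` table, and the `render_sql` helper are all assumptions made for this example, not the format of any particular dataset or library.

```python
from dataclasses import dataclass, field


@dataclass
class QueryTarget:
    """Hypothetical structured target: the slots a model would predict."""
    select: str                                       # column to return
    agg: str = ""                                     # e.g. "", "COUNT", "MAX"
    conditions: list = field(default_factory=list)    # (column, operator, value)


def render_sql(table: str, target: QueryTarget) -> str:
    """Turn predicted slots into SQL text (values are inlined for readability only)."""
    column = f"{target.agg}({target.select})" if target.agg else target.select
    sql = f"SELECT {column} FROM {table}"
    if target.conditions:
        clauses = [f"{col} {op} '{val}'" for col, op, val in target.conditions]
        sql += " WHERE " + " AND ".join(clauses)
    return sql


# One training example: the input sentence and the target the model should learn.
example = {
    "question": "How many cities are in China?",
    "target": QueryTarget(
        select="name", agg="COUNT", conditions=[("country", "=", "China")]
    ),
}

print(render_sql("city", example["target"]))
```

Predicting a few slots instead of free-form text keeps the output space small, which is one reason this kind of target is attractive for single-table questions.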

Tutorials:

How to talk to your database

Papers:

Dataset:

A large annotated semantic parsing corpus for developing natural language interfaces.

GitHub code:

  1. seq2sql
  2. SQLNet

There are also commercial solutions, such as nlsql.

Fadi Bakoura

NLTK has an excellent step-by-step guide covering everything you need to convert human language into a SQL query using the nltk package in Python.

It’s rudimentary, but it answers your question.
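To give a feel for the rule-based route once part-of-speech tagging is done, here is a dependency-free sketch. It is not the NLTK guide's grammar-based method: the `TABLES` and `COLUMNS` lexicons, the `tagged_to_sql` function, and the hand-written Penn Treebank style tags are all assumptions for illustration, and the tags would normally come from your tagger.

```python
# Hypothetical lexicon mapping user vocabulary to schema names.
TABLES = {"city": "city", "cities": "city"}
COLUMNS = {"country": "country", "population": "population"}


def tagged_to_sql(tagged):
    """Build a parameterized single-table SELECT from (word, POS tag) pairs."""
    table = None
    pending_column = None      # column name still waiting for its value
    conditions, params = [], []
    for word, tag in tagged:
        key = word.lower()
        if not tag.startswith("NN") and tag != "CD":
            continue           # this sketch ignores verbs, determiners, etc.
        if key in TABLES and table is None:
            table = TABLES[key]
        elif key in COLUMNS:
            pending_column = COLUMNS[key]
        elif pending_column is not None:
            # Any other noun or number after a column name is treated as its value.
            conditions.append(f"{pending_column} = ?")
            params.append(word)
            pending_column = None
    if table is None:
        raise ValueError("no known table mentioned in the question")
    sql = f"SELECT * FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params


# "Show cities whose country is China", tagged by hand for this example.
tagged = [("Show", "VB"), ("cities", "NNS"), ("whose", "WP$"),
          ("country", "NN"), ("is", "VBZ"), ("China", "NNP")]
print(tagged_to_sql(tagged))
```

Returning the values separately from the SQL text lets you pass them as query parameters instead of concatenating user input into the query.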

PyRsquared
  • Thanks @killerT2333. I just had a look, but it is kind of confusing. Is there any simpler doc? – deepguy May 14 '18 at 16:15
  • That's the simplest one I know of. What you're asking is quite a complex task, so there's no simple answer to your question. The nltk documentation takes you through the theory at a high level, and also at a low level with a lot of code examples. For anything more extensive than that, you probably need to search GitHub or research papers. – PyRsquared May 14 '18 at 16:17
  • I will go through that one more time and update you here. – deepguy May 14 '18 at 16:58

To complement Fadi's answer, the following are other useful papers on NL-to-SQL methods. The major difference is that these methods support queries that must be answered using more than one table (by joining different tables), whereas the Salesforce paper (and its dataset) focuses on queries over one table at a time.

Both of these papers use the GeoQuery dataset, available here.
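To make the single-table versus multi-table distinction concrete, here is a small sketch using Python's built-in `sqlite3` module. The `state` and `city` tables and their rows are made-up toy data for illustration, not taken from GeoQuery or any other dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE state (name TEXT PRIMARY KEY, population INTEGER);
    CREATE TABLE city (name TEXT, state_name TEXT REFERENCES state(name));
    INSERT INTO state VALUES ('Texas', 29000000), ('Ohio', 11800000);
    INSERT INTO city VALUES ('Austin', 'Texas'), ('Dallas', 'Texas'),
                            ('Columbus', 'Ohio');
""")

# "Which cities are in Texas?" can be answered from the city table alone.
single_table = conn.execute(
    "SELECT name FROM city WHERE state_name = ? ORDER BY name", ("Texas",)
).fetchall()

# "Which cities are in states with more than 20 million people?" needs a join,
# because the population lives in a different table from the city names.
multi_table = conn.execute(
    """
    SELECT city.name
    FROM city JOIN state ON city.state_name = state.name
    WHERE state.population > ?
    ORDER BY city.name
    """,
    (20_000_000,),
).fetchall()

print(single_table)
print(multi_table)
```

For the second question, a system has to work out which tables are involved and how they connect, in addition to choosing columns and conditions.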

vahid

Just to complement this thread with the latest AI updates on this matter: OpenAI recently released Codex.

Codex is an AI model trained to generate and analyze code in many programming languages (its main focus is Python).

It's also GPT-3 based.

Stephen Rauch
joaopq
  • Codex is terrible at this. The deeper you go, the more you realize how far it is from really working. It makes many syntax errors with GROUPS and subqueries. – hkrlys May 10 '23 at 17:48

There is a lot of work on the text-to-SQL task.

I strongly suggest you check out the WikiSQL and Spider datasets. Studies range from seq2seq with attention mechanisms to BERT-based solutions. Each study also points out the importance of the input representation, where you can feed in the whole table schema or just a column name. It's a pretty deep topic and, as @PyRsquared said, there is no simple answer :)
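As a rough sketch of what "input representation" means here, the two helpers below build the text a model would receive, either with the full schema or with a single candidate column. The `[SEP]` separator, the function names, and the example schema are assumptions for illustration; real models and tokenizers define their own formats.

```python
SEP = " [SEP] "  # assumed separator token; real tokenizers define their own


def with_full_schema(question: str, schema: dict) -> str:
    """Append every table and its columns to the question."""
    tables = [f"{table}: {', '.join(cols)}" for table, cols in schema.items()]
    return SEP.join([question] + tables)


def with_single_column(question: str, column: str) -> str:
    """Pair the question with one candidate column, e.g. to score its relevance."""
    return SEP.join([question, column])


schema = {"city": ["name", "country", "population"]}
question = "How many cities are in China?"

print(with_full_schema(question, schema))
print(with_single_column(question, "country"))
```

Feeding the whole schema gives the model more context in one pass, while the per-column form keeps each input short at the cost of one model call per column.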

iva123