
Towards Complex Text-to-SQL in Cross-Domain Database with Intermediate Representation

Jiaqi Guo1,∗ Zecheng Zhan2,∗ Yan Gao3, Yan Xiao3, Jian-Guang Lou3, Ting Liu1, Dongmei Zhang3
1Xi'an Jiaotong University, Xi'an, China
2Beijing University of Posts and Telecommunications, Beijing, China
3Microsoft Research Asia, Beijing, China
[email protected], [email protected], [email protected]
{Yan.Gao,Yan.Xiao,jlou,dongmeiz}@microsoft.com

NL: Show the names of students who have a grade higher than 5 and have at least 2 friends.
SQL: SELECT T1.name FROM friend AS T1 JOIN highschooler AS T2 ON T1.student_id = T2.id WHERE T2.grade > 5 GROUP BY T1.student_id HAVING count(*) >= 2

Figure 1: An example from the Spider benchmark to illustrate the mismatch between the intent expressed in NL and the implementation details in SQL. The column ‘student_id’ to be grouped by in the SQL query is not mentioned in the question.

Abstract

We present a neural approach called IRNet for complex and cross-domain Text-to-SQL. IRNet aims to address two challenges: 1) the mismatch between intents expressed in natural language (NL) and the implementation details in SQL; 2) the challenge in predicting columns caused by the large number of out-of-domain words. Instead of end-to-end synthesizing a SQL query, IRNet decomposes the synthesis process into three phases. In the first phase, IRNet performs a schema linking over a question and a database schema. Then, IRNet adopts a grammar-based neural model to synthesize a SemQL query, which is an intermediate representation that we design to bridge NL and SQL. Finally, IRNet deterministically infers a SQL query from the synthesized SemQL query with domain knowledge. On the challenging Text-to-SQL benchmark Spider, IRNet achieves 46.7% accuracy, obtaining 19.5% absolute improvement over previous state-of-the-art approaches. At the time of writing, IRNet achieves the first position on the Spider leaderboard.

1 Introduction

Recent years have seen a great deal of renewed interest in Text-to-SQL, i.e., synthesizing a SQL query from a question. Advanced neural approaches synthesize SQL queries in an end-to-end manner and achieve more than 80% exact matching accuracy on public Text-to-SQL benchmarks (e.g., ATIS, GeoQuery and WikiSQL) (Krishnamurthy et al., 2017; Zhong et al., 2017; Xu et al., 2017; Yaghmazadeh et al., 2017; Yu et al., 2018a; Dong and Lapata, 2018; Wang et al., 2018; Hwang et al., 2019). However, Yu et al. (2018c) report unsatisfactory performance of state-of-the-art approaches on a newly released, cross-domain Text-to-SQL benchmark, Spider.

The Spider benchmark brings new challenges that prove to be hard for existing approaches. Firstly, the SQL queries in Spider contain nested queries and clauses like GROUPBY and HAVING, which are far more complicated than those in another well-studied cross-domain benchmark, WikiSQL (Zhong et al., 2017). Considering the example in Figure 1, the column ‘student_id’ to be grouped by in the SQL query is never mentioned in the question. In fact, the GROUPBY clause is introduced in SQL to facilitate the implementation of aggregate functions. Such implementation details, however, are rarely considered by end users and are therefore rarely mentioned in questions. This poses a severe challenge for existing end-to-end neural approaches, which have to synthesize SQL queries in the absence of a detailed specification. The challenge in essence stems from the fact that SQL is designed for effectively querying relational databases rather than for representing the meaning of NL (Kate, 2008). Hence, there inevitably exists a mismatch between intents expressed in natural language and the implementation details in SQL. We regard this challenge as the mismatch problem.

Secondly, given the cross-domain settings of Spider, there are a large number of out-of-domain

∗ Equal contributions. Work done during an internship at MSRA.

Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4524–4535, Florence, Italy, July 28 – August 2, 2019. © 2019 Association for Computational Linguistics

(OOD) words. For example, 35% of the words in database schemas on the development set do not occur in the schemas on the training set of Spider. As a comparison, the number in WikiSQL is only 22%. The large number of OOD words poses another steep challenge in predicting columns in SQL queries (Yu et al., 2018b), because OOD words usually lack accurate representations in neural models. We regard this challenge as the lexical problem.

In this work, we propose a neural approach, called IRNet, towards tackling the mismatch problem and the lexical problem with intermediate representation and schema linking. Specifically, instead of end-to-end synthesizing a SQL query from a question, IRNet decomposes the synthesis process into three phases. In the first phase, IRNet performs a schema linking over a question and a schema. The goal of the schema linking is to recognize the columns and the tables mentioned in a question, and to assign different types to the columns based on how they are mentioned in the question. Incorporating the schema linking can enhance the representations of the question and the schema, especially when OOD words lack accurate representations in neural models during testing. Then, IRNet adopts a grammar-based neural model to synthesize a SemQL query, which is an intermediate representation (IR) that we design to bridge NL and SQL. Finally, IRNet deterministically infers a SQL query from the synthesized SemQL query with domain knowledge.

Figure 2: The context-free grammar of SemQL. column ranges over distinct column names in a schema; table ranges over tables in a schema.

Figure 3: An illustrative example of SemQL. Its corresponding question and SQL query are shown in Figure 1.

The insight behind IRNet is primarily inspired by the success of using intermediate representations (e.g., lambda calculus (Carpenter, 1997), FunQL (Kate et al., 2005) and DCS (Liang et al., 2011)) in various semantic parsing tasks (Zelle and Mooney, 1996; Berant et al., 2013; Pasupat and Liang, 2015; Wang et al., 2017), and by previous attempts at designing IRs to decouple the meaning representations of NL from the database schema and database management system (Woods, 1986; Alshawi, 1992; Androutsopoulos et al., 1993).

On the challenging Spider benchmark (Yu et al., 2018c), IRNet achieves 46.7% exact matching accuracy, obtaining 19.5% absolute improvement over previous state-of-the-art approaches. At the time of writing, IRNet achieves the first position on the Spider leaderboard. When augmented with BERT (Devlin et al., 2018), IRNet reaches up to 54.7% accuracy. In addition, as we show in the experiments, learning to synthesize SemQL queries rather than SQL queries can substantially benefit other neural approaches for Text-to-SQL, such as SQLNet (Xu et al., 2017), TypeSQL (Yu et al., 2018a) and SyntaxSQLNet (Yu et al., 2018b). Such results on the one hand demonstrate the effectiveness of SemQL in bridging NL and SQL. On the other hand, they reveal that designing an effective intermediate representation to bridge NL and SQL is a promising direction for complex and cross-domain Text-to-SQL.

2 Approach

In this section, we present IRNet in detail. We first describe how to tackle the mismatch problem and the lexical problem with intermediate representation and schema linking. Then we present the neural model to synthesize SemQL queries.
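The third, deterministic phase is detailed in Section 2.1: the FROM clause is recovered by connecting the tables a SemQL query declares through the schema's foreign-key graph. A minimal sketch of that idea follows; table names other than those of Figure 1 are hypothetical, and pairwise BFS is our simplification of finding the shortest connecting path, not the authors' implementation.

```python
from collections import deque
from itertools import combinations

def shortest_join_path(foreign_keys, declared_tables):
    """Collect the tables needed in a FROM clause (illustrative sketch).

    foreign_keys: (table_a, table_b) pairs, read as undirected edges of
    the schema graph. declared_tables: tables a SemQL query declares.
    Pairwise shortest paths via BFS approximate the shortest connecting
    subgraph; joining every returned table builds the FROM clause.
    """
    graph = {}
    for a, b in foreign_keys:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)

    def bfs(src, dst):
        # Breadth-first search for one shortest path between two tables.
        queue, seen = deque([[src]]), {src}
        while queue:
            path = queue.popleft()
            if path[-1] == dst:
                return path
            for nxt in graph.get(path[-1], ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return [src, dst]  # no connecting path: fall back to a cross join

    joined = set(declared_tables)
    for a, b in combinations(sorted(declared_tables), 2):
        joined.update(bfs(a, b))
    return sorted(joined)

# Figure 1 declares 'friend' and 'highschooler', which are directly
# linked, so no extra table is pulled in. A hypothetical three-table
# schema needs the bridging 'enrollment' table.
print(shortest_join_path([("friend", "highschooler")],
                         ["friend", "highschooler"]))
print(shortest_join_path([("student", "enrollment"),
                          ("enrollment", "course")],
                         ["student", "course"]))
```

For more than two declared tables, the union of pairwise shortest paths is only an approximation of the minimal connecting subgraph, which is sufficient for a sketch of the inference step.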

2.1 Intermediate Representation

To eliminate the mismatch, we design a domain-specific language, called SemQL, which serves as an intermediate representation between NL and SQL. Figure 2 presents the context-free grammar of SemQL. An illustrative SemQL query is shown in Figure 3. We elaborate on the design of SemQL in the following.

Inspired by lambda DCS (Liang, 2013), SemQL is designed to be tree-structured. This structure, on the one hand, can effectively constrain the search space during synthesis. On the other hand, in view of the tree-structured nature of SQL (Yu et al., 2018b; Yin and Neubig, 2018), following the same structure also makes the translation to SQL more intuitive.

The mismatch problem is mainly caused by the implementation details in SQL queries and the missing specification in questions, as discussed in Section 1. Therefore, it is natural to hide the implementation details in the intermediate representation, which forms the basic idea of SemQL. Considering the example in Figure 3, the GROUPBY, HAVING and FROM clauses in the SQL query are eliminated in the SemQL query, and the conditions in WHERE and HAVING are uniformly expressed in the subtree of Filter in the SemQL query. The implementation details can be deterministically inferred from the SemQL query in the later inference phase with domain knowledge. For example, a column in the GROUPBY clause of a SQL query usually occurs in the SELECT clause, or it is the primary key of a table where an aggregate function is applied to one of its columns.

In addition, we strictly require declaring the table that a column belongs to in SemQL. As illustrated in Figure 3, the column ‘name’ along with its table ‘friend’ is declared in the SemQL query. The declaration of tables helps to differentiate duplicated column names in the schema. We also declare a table for the special column ‘*’ because we observe that ‘*’ usually aligns with a table mentioned in a question. Considering the example in Figure 3, the column ‘*’ in essence aligns with the table ‘friend’, which is explicitly mentioned in the question. Declaring a table for ‘*’ also helps infer the FROM clause in the next inference phase.

When it comes to inferring a SQL query from a SemQL query, we perform the inference based on the assumption that the definition of a database schema is precise and complete. Specifically, if a column is a foreign key of another table, there should be a foreign key constraint declared in the schema. This assumption usually holds, as it is best practice in database design. More than 95% of the examples in the training set of the Spider benchmark satisfy this assumption, and it forms the basis of the inference. Take the inference of the FROM clause in a SQL query as an example. We first identify the shortest path that connects all the tables declared in a SemQL query in the schema (a database schema can be formulated as an undirected graph, where vertices are tables and edges are foreign key relations among tables). Joining all the tables in the path eventually builds the FROM clause. Supplementary materials provide detailed procedures of the inference and more examples of SemQL queries.

2.2 Schema Linking

The goal of schema linking in IRNet is to recognize the columns and the tables mentioned in a question, and to assign different types to the columns based on how they are mentioned in the question. Schema linking is an instantiation of entity linking in the context of Text-to-SQL, where an entity refers to a column, a table or a cell value in a database. We use a simple yet effective string-match based method to implement the linking. In the following, we illustrate how IRNet performs schema linking in detail, based on the assumption that the cell values in a database are not available.

As a whole, we define three types of entities that may be mentioned in a question, namely table, column and value, where value stands for a cell value in the database. In order to recognize entities, we first enumerate all the n-grams of length 1 to 6 in a question. Then, we enumerate them in descending order of length. If an n-gram exactly matches a column name or is a subset of a column name, we recognize this n-gram as a column. The recognition of tables follows the same way. If an n-gram can be recognized as both a column and a table, we prioritize the column. If an n-gram begins and ends with a single quote, we recognize it as a value. Once an n-gram is recognized, we remove other n-grams that overlap with it. In this way, we can recognize all the entities mentioned in a question and obtain a non-overlapping n-gram sequence of the question by joining the recognized n-grams and the remaining 1-grams. We refer to each n-gram in the sequence as a span and assign each

span a type according to its entity. For example, if a span is recognized as a column, we assign it the type COLUMN. Figure 4 depicts the schema linking results of a question.

Figure 4: An overview of the neural model to synthesize SemQL queries. Basically, IRNet is constituted by an NL encoder, a schema encoder and a decoder. As shown in the figure, the column ‘book title’ is selected from the schema, while the second column ‘year’ is selected from the memory.

For those spans recognized as columns, if they exactly match the column names in the schema, we assign these columns the type EXACT MATCH, otherwise the type PARTIAL MATCH. To link a cell value with its corresponding column in the schema, we first query the value span in ConceptNet (Speer and Havasi, 2012), an open, large-scale knowledge graph, and search the results returned by ConceptNet over the schema. We only consider the query results in two categories of ConceptNet, namely ‘is a type of’ and ‘related terms’, as we observe that the column that a cell value belongs to usually occurs in these two categories. If there exists a result that exactly or partially matches a column name in the schema, we assign the column the type VALUE EXACT MATCH or VALUE PARTIAL MATCH, respectively.

2.3 Model

We present the neural model to synthesize SemQL queries, which takes a question, a database schema and the schema linking results as input. Figure 4 depicts the overall architecture of the model via an illustrative example.

To address the lexical problem, we consider the schema linking results when constructing representations for the question and the columns in the schema. In addition, we design a memory augmented pointer network for selecting columns during synthesis. When selecting a column, it first makes a decision on whether to select from the memory or not, which sets it apart from the vanilla pointer network (Vinyals et al., 2015). The motivation behind the memory augmented pointer network is that, according to our observations, the vanilla pointer network is prone to selecting the same columns repeatedly.

NL Encoder. Let x = [(x1, τ1), ···, (xL, τL)] denote the non-overlapping span sequence of a question, where xi is the i-th span and τi is the type of span xi assigned in schema linking. The NL encoder takes x as input and encodes x into a sequence of hidden states Hx. Each word in xi is converted into its embedding vector, and its type τi is also converted into an embedding vector. Then, the NL encoder takes the average of the type and word embeddings as the span embedding e_x^i. Finally, the NL encoder runs a bi-directional LSTM (Hochreiter and Schmidhuber, 1997) over all the span embeddings. The output hidden states of the forward and backward LSTM are concatenated to construct Hx.

Schema Encoder. Let s = (c, t) denote a database schema, where c = {(c1, φ1), ···, (cn, φn)} is the set of distinct columns and the types that we assign to them in schema linking, and t = {t1, ···, tm} is the set of tables. The schema encoder takes s as input and outputs representations Ec for the columns and Et for the tables. We take the column representations as an example below. The construction of table representations follows the same way, except that we do not assign a type to a table in schema linking. Concretely, each word in ci is first converted into its embedding vector, and its type φi is also converted into an embedding vector ϕi. Then, the schema encoder takes the average of the word embeddings as the initial representation ê_c^i of the column. The schema encoder further performs an attention over the span embeddings [...]

[...] and be recorded in the memory. The probability of selecting a column c is calculated as p(a_i = SELECTCOLUMN[c] | x, s, a_{<i}).
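To make the selection step concrete, here is a minimal pure-Python sketch of one plausible reading of the memory-augmented pointer: a gate probability p_memory chooses between the memory of already-selected columns and the remaining schema columns, and the chosen column is then recorded in the memory. The gate, scores and mixing here are illustrative stand-ins, not the paper's exact parameterization.

```python
import math

def softmax(zs):
    # Numerically stable softmax over a list of scores.
    m = max(zs)
    es = [math.exp(z - m) for z in zs]
    total = sum(es)
    return [e / total for e in es]

def select_column(scores, memory, p_memory):
    """One selection step of a memory-augmented pointer (illustrative).

    scores: unnormalized score per schema column (e.g. produced by
    attention over the column representations E_c). memory: indices of
    columns selected at earlier steps. p_memory: the gate's probability
    of selecting from memory. Mixing a memory-restricted and a
    schema-restricted softmax means an already-used column is only
    reachable through the memory branch, discouraging repeats.
    """
    probs = [0.0] * len(scores)
    mem = sorted(memory)
    sch = [i for i in range(len(scores)) if i not in memory]
    if mem:
        for i, p in zip(mem, softmax([scores[i] for i in mem])):
            probs[i] = p_memory * p
    if sch:
        for i, p in zip(sch, softmax([scores[i] for i in sch])):
            probs[i] = (1.0 - p_memory) * p
    chosen = max(range(len(probs)), key=probs.__getitem__)
    memory.add(chosen)  # record the selection for later steps
    return chosen

scores = [2.0, 1.0, 0.5]
memory = set()
first = select_column(scores, memory, p_memory=0.1)   # schema branch: column 0
second = select_column(scores, memory, p_memory=0.1)  # column 0 now memory-only
```

With a small gate probability, the second step prefers the best unused column even though column 0 still has the highest raw score.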

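Pulling the recognition procedure of Section 2.2 together, the string-match linking can be sketched end to end as follows. This is a simplified illustration: tokenization is a naive whitespace split, morphology such as ‘titles’ vs. ‘title’ is ignored, and the ConceptNet value lookup is omitted; the question is the one from Figure 4.

```python
def schema_link(question, columns, tables):
    """Greedy longest-match span recognition (Section 2.2, simplified).

    Enumerates n-grams from length 6 down to 1. An n-gram equal to a
    column name, or whose words are a subset of a column name's words,
    is typed COLUMN (column wins over table on ties); likewise for
    TABLE; n-grams wrapped in single quotes are typed VALUE. Recognized
    spans block overlapping shorter n-grams; leftover 1-grams get NONE.
    """
    toks = question.lower().split()
    cols = [c.lower() for c in columns]
    tabs = [t.lower() for t in tables]
    typed, used = {}, set()
    for n in range(6, 0, -1):                     # longest n-grams first
        for i in range(len(toks) - n + 1):
            if any(j in used for j in range(i, i + n)):
                continue                          # overlaps a recognized span
            gram = " ".join(toks[i:i + n])
            words = set(gram.split())
            if any(gram == c or words <= set(c.split()) for c in cols):
                typ = "COLUMN"
            elif any(gram == t or words <= set(t.split()) for t in tabs):
                typ = "TABLE"
            elif gram.startswith("'") and gram.endswith("'") and len(gram) > 1:
                typ = "VALUE"
            else:
                continue
            typed[i] = (n, typ)
            used.update(range(i, i + n))
    spans, i = [], 0
    while i < len(toks):                          # join spans and leftover 1-grams
        n, typ = typed.get(i, (1, "NONE"))
        spans.append((" ".join(toks[i:i + n]), typ))
        i += n
    return spans

q = "Show the book titles and years for all books in descending order by year"
print(schema_link(q, ["book title", "year"], ["book club"]))
# A multi-word table name is recognized as one span:
print(schema_link("show the book club members", ["member"], ["book club"]))
```

Distinguishing EXACT MATCH from PARTIAL MATCH on the column side is a straightforward refinement of the COLUMN branch (exact string equality vs. word subset) and is left out for brevity.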
We follow the database split for evaluations, where the 206 databases are split into 146 for training, 20 for development and 40 for testing. There are 8625, 1034 and 2147 question-SQL query pairs for training, development and testing, respectively. Just like any competition benchmark, the test set of Spider is not publicly available, and our models are submitted to the data owner for testing. We evaluate IRNet and other approaches using the SQL Exact Matching and Component Matching metrics proposed by Yu et al. (2018c).

Approach                 Dev     Test
Seq2Seq                  1.9%    3.7%
Seq2Seq + Attention      1.8%    4.8%
Seq2Seq + Copying        4.1%    5.3%
TypeSQL                  8.0%    8.2%
SQLNet                  10.9%   12.4%
SyntaxSQLNet            18.9%   19.7%
SyntaxSQLNet(augment)   24.8%   27.2%
IRNet                   53.2%   46.7%
BERT
SyntaxSQLNet(BERT)      25.0%   25.4%
IRNet(BERT)             61.9%   54.7%

Table 1: Exact matching accuracy on SQL queries.

Baselines. We also evaluate the sequence-to-sequence model (Sutskever et al., 2014) augmented with a neural attention mechanism (Bahdanau et al., 2014) and a copying mechanism (Gu et al., 2016), SQLNet (Xu et al., 2017), TypeSQL (Yu et al., 2018a), and SyntaxSQLNet (Yu et al., 2018b), which is the state-of-the-art approach on the Spider benchmark.

Implementations. We implement IRNet and the baseline approaches with PyTorch (Paszke et al., 2017). The dimensions of word embeddings, type embeddings and hidden vectors are set to 300. Word embeddings are initialized with GloVe (Pennington et al., 2014) and shared between the NL encoder and the schema encoder. They are fixed during training. The dimensions of action embeddings and node type embeddings are set to 128 and 64, respectively. The dropout rate is 0.3. We use Adam (Kingma and Ba, 2014) with default hyperparameters for optimization. The batch size is set to 64.

Language model pre-training has been shown to be effective for learning universal language representations. To further study the effectiveness of our approach, inspired by SQLova (Hwang et al., 2019), we leverage BERT (Devlin et al., 2018) to encode questions, database schemas and the schema linking results. The decoder remains the same as in IRNet. Specifically, the sequence of spans in the question is concatenated with all the distinct column names in the schema. Each column name is separated by a special token [SEP]. BERT takes the concatenation as input. The representation of a span in the question is taken as the average hidden state of its words and type. To construct the representation of a column, we first run a bi-directional LSTM (BI-LSTM) over the hidden states of its words. Then, we take the sum of its type embedding and the final hidden state of the BI-LSTM as the column representation. The construction of table representations follows the same way. Supplementary material provides a figure to illustrate the architecture of the encoder. To establish a baseline, we also augment SyntaxSQLNet with BERT. Note that we only use the base version of BERT due to resource limitations. We do not perform any data augmentation, for fair comparison. All our code is publicly available.¹

¹ https://github.com/zhanzecheng/IRNet

3.2 Experimental Results

Table 1 presents the exact matching accuracy of IRNet and various baselines on the development set and the test set. IRNet clearly outperforms all the baselines by a substantial margin. It obtains 27.0% absolute improvement over SyntaxSQLNet on the test set. It also obtains 19.5% absolute improvement over SyntaxSQLNet(augment), which performs large-scale data augmentation. When incorporating BERT, the performance of both SyntaxSQLNet and IRNet is substantially improved, and the accuracy gap between them on both the development set and the test set is widened.

Figure 5: F1 scores of component matching of SyntaxSQLNet, SyntaxSQLNet(BERT), IRNet and IRNet(BERT) on the test set.

To study the performance of IRNet in detail, following Yu et al. (2018b), we measure the average F1 score on different SQL components on the test set.
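Returning to the BERT variant of Section 3.1: the encoder input (question spans concatenated with all distinct column names, each separated by [SEP]) can be sketched as follows. The leading [CLS] token is the usual BERT convention and our assumption here; the exact template may differ.

```python
def build_bert_input(spans, columns):
    """Concatenate question spans with distinct column names for BERT.

    spans: the non-overlapping span sequence from schema linking.
    columns: distinct column names of the schema, each preceded by a
    [SEP] token as described in the text. The leading [CLS] is an
    assumption, not stated in the paper.
    """
    pieces = ["[CLS]"] + list(spans)
    for col in columns:
        pieces += ["[SEP]", col]
    return " ".join(pieces)

print(build_bert_input(["show", "the", "book title", "and", "year"],
                       ["book title", "year"]))
# [CLS] show the book title and year [SEP] book title [SEP] year
```

In practice this string would be tokenized into WordPiece units before being fed to BERT; the sketch only shows the concatenation layout.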

Approach              Easy    Medium   Hard   Extra Hard
SyntaxSQLNet         38.6%    17.6%   16.3%    4.9%
SyntaxSQLNet(BERT)   42.9%    24.9%   21.9%    8.6%
IRNet                70.1%    49.2%   39.5%   19.1%
IRNet(BERT)          77.2%    58.7%   48.1%   25.3%

Table 2: Exact matching accuracy of SyntaxSQLNet, SyntaxSQLNet(BERT), IRNet and IRNet(BERT) on the test set by hardness level.

Approach              SQL      SemQL
Seq2Seq               1.9%    11.4% (+9.5)
Seq2Seq + Attention   1.8%    14.7% (+12.9)
Seq2Seq + Copying     4.1%    18.5% (+14.4)
TypeSQL               8.0%    14.4% (+6.4)
SQLNet               10.9%    17.5% (+6.6)
SyntaxSQLNet         18.9%    27.5% (+8.6)
BERT
SyntaxSQLNet(BERT)   25.0%    35.8% (+10.8)

Table 3: Exact matching accuracy on the development set. The header ‘SQL’ means that the approaches are learned to generate SQL queries, while the header ‘SemQL’ indicates that they are learned to generate SemQL queries.

Technique         IRNet   IRNet(BERT)
Base model        40.5%   53.9%
+SL               48.5%   60.3%
+SL + MEM         51.3%   60.6%
+SL + MEM + CF    53.2%   61.9%

Table 4: Ablation study results. The base model does not use schema linking (SL), the memory augmented pointer network (MEM) or the coarse-to-fine framework (CF).

We compare between SyntaxSQLNet and IRNet. As shown in Figure 5, IRNet outperforms SyntaxSQLNet on all components. There is at least 18.2% absolute improvement on each component except KEYWORDS. When incorporating BERT, the performance of IRNet on each component is further boosted, especially on the WHERE clause.

We further study the performance of IRNet on different portions of the test set according to the hardness levels of SQL defined in Yu et al. (2018c). As shown in Table 2, IRNet significantly outperforms SyntaxSQLNet in all four hardness levels, with or without BERT. For example, compared with SyntaxSQLNet, IRNet obtains 23.3% absolute improvement at the Hard level.

To investigate the effectiveness of SemQL, we alter the baseline approaches and let them learn to generate SemQL queries rather than SQL queries. As shown in Table 3, there are at least 6.6% and up to 14.4% absolute improvements in exact matching accuracy on the development set. For example, when SyntaxSQLNet is learned to generate SemQL queries instead of SQL queries, it registers 8.6% absolute improvement and even outperforms SyntaxSQLNet(augment), which performs large-scale data augmentation. The relatively limited improvement on TypeSQL and SQLNet is because their slot-filling based models only support a subset of SemQL queries. The notable improvement, on the one hand, demonstrates the effectiveness of SemQL. On the other hand, it shows that designing an intermediate representation to bridge NL and SQL is promising in Text-to-SQL.

3.3 Ablation Study

We conduct ablation studies on IRNet and IRNet(BERT) to analyze the contribution of each design choice. Specifically, we first evaluate a base model that does not apply schema linking (SL) or the coarse-to-fine framework (CF), and that replaces the memory augmented pointer network (MEM) with the vanilla pointer network (Vinyals et al., 2015). Then, we gradually apply each component to the base model. The ablation study is conducted on the development set.

Table 4 presents the ablation study results. It is clear that our base model significantly outperforms SyntaxSQLNet, SyntaxSQLNet(augment) and SyntaxSQLNet(BERT). Performing schema linking (‘+SL’) brings about 8.0% and 6.4% absolute improvement on IRNet and IRNet(BERT), respectively. Predicting columns in the WHERE clause is known to be challenging (Yavuz et al., 2018). The F1 score on the WHERE clause increases by 12.5% when IRNet performs schema linking. This significant improvement demonstrates the effectiveness of schema linking in addressing the lexical problem. Using the memory augmented pointer network (‘+MEM’) further improves the performance of IRNet and IRNet(BERT). We observe that the vanilla pointer network is prone to selecting the same columns during synthesis. The number of examples suffering from this problem decreases by 70% when using the memory augmented pointer network. At last, adopting the coarse-to-fine framework (‘+CF’) can further boost performance.

3.4 Error Analysis

To understand the source of errors, we analyze 483 failed examples of IRNet on the development set. We identify three major causes of failures.

Column Prediction. We find that 32.3% of the failed examples are caused by incorrect column predictions based on cell values. That is, the correct column name is not mentioned in the question, but a cell value that belongs to it is mentioned. As pointed out by Yavuz et al. (2018), the cell values of a database are crucial for solving this problem. 15.2% of the failed examples fail to predict correct columns that partially appear in questions or appear in their synonym forms. Such failures may be further resolved by combining our string-match based method with embedding-match based methods (Krishnamurthy et al., 2017) to improve the schema linking.

Nested Query. 23.9% of the failed examples are caused by complicated nested queries. Most of these examples are at the Extra Hard level. In the current training set, the number of SQL queries at the Extra Hard level (∼20%) is the smallest, even less than that at the Easy level (∼23%). In view of the extremely large search space of complicated SQL queries, data augmentation techniques may be indispensable.

Operator. 12.4% of the failed examples make mistakes in the operator, as predicting the correct one requires common knowledge. Considering the following question, ‘Find the name and membership level of the visitors whose membership level is higher than 4, and sort by their age from old to young’, the phrase ‘from old to young’ indicates that sorting should be conducted in descending order. The operators considered here include aggregate functions, operators in the WHERE clause and the sorting orders (ASC and DESC).

Other failed examples cannot be easily categorized into one of the categories above. A few of them are caused by an incorrect FROM clause, because the ground truth SQL queries join tables without foreign key relations defined in the schema. This violates our assumption that the definition of a database schema should be precise and complete.

When incorporating BERT, 30.5% of the failed examples are fixed. Most of them are in the Column Prediction and Operator categories, but the improvement on Nested Query is quite limited.

4 Discussion

Performance Gap. There exists a performance gap for IRNet between the development set and the test set, as shown in Table 1. Considering the explosive combinations of nested queries in SQL and the limited amount of data (1034 examples in development, 2147 in test), the gap is probably caused by different distributions of the SQL queries at the Hard and Extra Hard levels. To verify this hypothesis, we construct a pseudo test set from the official training set. We train IRNet on the remaining data in the training set and evaluate it on the development set and the pseudo test set, respectively. We find that even though the pseudo test set has the same number of complicated SQL queries (Hard and Extra Hard) as the development set, there still exists a performance gap. Other approaches do not exhibit this performance gap because of their relatively poor performance on the complicated SQL queries. For example, SyntaxSQLNet only achieves 4.6% on the SQL queries at the Extra Hard level on the test set. Supplementary material provides detailed experimental settings and results on the pseudo test set.

Limitations of SemQL. There are a few limitations of our intermediate representation. Firstly, it cannot support the self join in the FROM clause of SQL. In order to support the self join, the variable mechanism in lambda calculus (Carpenter, 1997) or the scope mechanism in Discourse Representation Structure (Kamp and Reyle, 1993) may be necessary. Secondly, SemQL has not completely eliminated the mismatch between NL and SQL yet. For example, the INTERSECT clause in SQL is often used to express disjoint conditions. However, when specifying requirements, end users rarely concern themselves with whether two conditions are disjoint or not. Despite these limitations, experimental results demonstrate the effectiveness of SemQL in Text-to-SQL. To this end, we argue that designing an effective intermediate representation to bridge NL and SQL is a promising direction for complex and cross-domain Text-to-SQL. We leave a better intermediate representation as one of our future works.

5 Related Work

Natural Language Interface to Database. The task of Natural Language Interface to Database (NLIDB) has received significant attention since the 1970s (Warren and Pereira, 1981; Androutsopoulos et al., 1995; Popescu et al., 2004; Hallett,

2006; Giordani and Moschitti, 2012). Most of the early proposed systems are hand-crafted for a specific database (Warren and Pereira, 1982; Woods, 1986; Hendrix et al., 1978), making it challenging to accommodate cross-domain settings. Later work focuses on building a system that can be reused for multiple databases with minimal human effort (Grosz et al., 1987; Androutsopoulos et al., 1993; Tang and Mooney, 2000). Recently, with the development of advanced neural approaches to semantic parsing and the release of large-scale, cross-domain Text-to-SQL benchmarks such as WikiSQL (Zhong et al., 2017) and Spider (Yu et al., 2018c), there is renewed interest in the task (Xu et al., 2017; Iyer et al., 2017; Sun et al., 2018; Gur et al., 2018; Yu et al., 2018a,b; Wang et al., 2018; Finegan-Dollak et al., 2018; Hwang et al., 2019). Unlike these neural approaches that synthesize a SQL query end to end, IRNet first synthesizes a SemQL query and then infers a SQL query from it.

Intermediate Representations in NLIDB. Early proposed systems such as LUNAR (Woods, 1986) and MASQUE (Androutsopoulos et al., 1993) also propose intermediate representations (IR) to represent the meaning of questions and then translate them into SQL queries. The predicates in these IRs are designed for a specific database, which sets SemQL apart: SemQL targets wide adoption, and no human effort is needed when it is used in a new domain. Li and Jagadish (2014) propose a query tree in their NLIDB system to represent the meaning of a question; it mainly serves as an interaction medium between users and their system.

Entity Linking. The insight behind performing schema linking is partly inspired by the success of incorporating entity linking in knowledge base question answering and semantic parsing (Yih et al., 2016; Krishnamurthy et al., 2017; Yu et al., 2018a; Herzig and Berant, 2018; Kolitsas et al., 2018). In the context of semantic parsing, Krishnamurthy et al. (2017) propose a neural entity linking module for answering compositional questions on semi-structured tables. TypeSQL (Yu et al., 2018a) proposes to utilize type information to better understand rare entities and numbers in questions. Similar to TypeSQL, IRNet also recognizes the columns and tables mentioned in a question. What sets IRNet apart is that IRNet assigns different types to the columns based on how they are mentioned in the question.

6 Conclusion

We present a neural approach, IRNet, for complex and cross-domain Text-to-SQL, aiming to address the lexical problem and the mismatch problem with schema linking and intermediate representation. Experimental results on the challenging Spider benchmark demonstrate the effectiveness of IRNet.

Acknowledgments

We would like to thank Bo Pang and Tao Yu for evaluating our submitted models on the test set of the Spider benchmark. Ting Liu is the corresponding author.

References

Hiyan Alshawi. 1992. The Core Language Engine. MIT Press.

I. Androutsopoulos, G. Ritchie, and P. Thanisch. 1993. Masque/sql: An efficient and portable natural language query interface for relational databases. In Proceedings of the 6th International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, pages 327–330. Gordon & Breach Science Publishers.

Ion Androutsopoulos, Graeme D. Ritchie, and Peter Thanisch. 1995. Natural language interfaces to databases – an introduction. Natural Language Engineering, 1:29–81.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. Version 7.

Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544. Association for Computational Linguistics.

James Bornholt, Emina Torlak, Dan Grossman, and Luis Ceze. 2016. Optimizing synthesis with metasketches. In Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 775–788. ACM.

Bob Carpenter. 1997. Type-Logical Semantics. MIT Press.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805. Version 1.

Li Dong and Mirella Lapata. 2018. Coarse-to-fine decoding for neural semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 731–742. Association for Computational Linguistics.

Catherine Finegan-Dollak, Jonathan K. Kummerfeld, Li Zhang, Karthik Ramanathan, Sesh Sadasivam, Rui Zhang, and Dragomir Radev. 2018. Improving text-to-sql evaluation methodology. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 351–360. Association for Computational Linguistics.

Alessandra Giordani and Alessandro Moschitti. 2012. Generating sql queries using natural language syntactic dependencies and metadata. In Proceedings of the 17th International Conference on Applications of Natural Language Processing and Information Systems, pages 164–170. Springer-Verlag.

Barbara J. Grosz, Douglas E. Appelt, Paul A. Martin, and Fernando C. N. Pereira. 1987. Team: An experiment in the design of transportable natural-language interfaces. Artificial Intelligence, 32:173–243.

Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 1631–1640. Association for Computational Linguistics.

Izzeddin Gur, Semih Yavuz, Yu Su, and Xifeng Yan. 2018. Dialsql: Dialogue based structured query generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 1339–1349. Association for Computational Linguistics.

Catalina Hallett. 2006. Generic querying of relational databases using natural language generation techniques. In Proceedings of the Fourth International Natural Language Generation Conference, pages 95–102. Association for Computational Linguistics.

Gary G. Hendrix, Earl D. Sacerdoti, Daniel Sagalowicz, and Jonathan Slocum. 1978. Developing a natural language interface to complex data. ACM Transactions on Database Systems, 3:105–147.

Jonathan Herzig and Jonathan Berant. 2018. Decoupling structure and lexicon for zero-shot semantic parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1619–1629. Association for Computational Linguistics.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9:1735–1780.

Wonseok Hwang, Jinyeung Yim, Seunghyun Park, and Minjoon Seo. 2019. A comprehensive exploration on wikisql with table-aware word contextualization. arXiv preprint arXiv:1902.01069. Version 1.

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer. 2017. Learning a neural semantic parser from user feedback. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 963–973. Association for Computational Linguistics.

Hans Kamp and U. Reyle. 1993. From Discourse to Logic: Introduction to Modeltheoretic Semantics of Natural Language, Formal Logic and Discourse Representation Theory.

Rohit Kate. 2008. Transforming meaning representation grammars to improve semantic parsing. In Proceedings of the Twelfth Conference on Computational Natural Language Learning, pages 33–40. Coling 2008 Organizing Committee.

Rohit J. Kate, Yuk Wah Wong, and Raymond J. Mooney. 2005. Learning to transform natural to formal languages. In Proceedings of the 20th National Conference on Artificial Intelligence, pages 1062–1068. AAAI Press.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980. Version 9.

Nikolaos Kolitsas, Octavian-Eugen Ganea, and Thomas Hofmann. 2018. End-to-end neural entity linking. In Proceedings of the 22nd Conference on Computational Natural Language Learning, pages 519–529. Association for Computational Linguistics.

Jayant Krishnamurthy, Pradeep Dasigi, and Matt Gardner. 2017. Neural semantic parsing with type constraints for semi-structured tables. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1516–1526. Association for Computational Linguistics.

Fei Li and H. V. Jagadish. 2014. Constructing an interactive natural language interface for relational databases. Proceedings of the VLDB Endowment, 8:73–84.

Chen Liang, Jonathan Berant, Quoc Le, Kenneth D. Forbus, and Ni Lao. 2017. Neural symbolic machines: Learning semantic parsers on freebase with weak supervision. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 23–33. Association for Computational Linguistics.

Percy Liang. 2013. Lambda dependency-based compositional semantics. arXiv preprint arXiv:1309.4408. Version 2.

Percy Liang, Michael Jordan, and Dan Klein. 2011. Learning dependency-based compositional semantics. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 590–599. Association for Computational Linguistics.

Panupong Pasupat and Percy Liang. 2015. Compositional semantic parsing on semi-structured tables. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing, pages 1470–1480. Association for Computational Linguistics.

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in pytorch. In NIPS-W.

Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543. Association for Computational Linguistics.

Ana-Maria Popescu, Alex Armanasu, Oren Etzioni, David Ko, and Alexander Yates. 2004. Modern natural language interfaces to databases: Composing statistical parsing with semantic tractability. In Proceedings of the 20th International Conference on Computational Linguistics. Association for Computational Linguistics.

Oriol Vinyals, Meire Fortunato, and Navdeep Jaitly. 2015. Pointer networks. In Advances in Neural Information Processing Systems 28, pages 2692–2700. Curran Associates, Inc.

Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Synthesizing highly expressive sql queries from input-output examples. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 452–466. ACM.

Chenglong Wang, Kedar Tatwawadi, Marc Brockschmidt, Po-Sen Huang, Yi Mao, Oleksandr Polozov, and Rishabh Singh. 2018. Robust text-to-sql generation with execution-guided decoding. arXiv preprint arXiv:1807.03100. Version 3.

David H. D. Warren and Fernando Pereira. 1981. Easily adaptable system for interpreting natural language queries. American Journal of Computational Linguistics, 8:110–122.

David H. D. Warren and Fernando C. N. Pereira. 1982. An efficient easily adaptable system for interpreting natural language queries. Computational Linguistics, 8:110–122.

Armando Solar-Lezama. 2008. Program Synthesis by Sketching. Ph.D. thesis, Berkeley, CA, USA. AAI3353225.

Robert Speer and Catherine Havasi. 2012. Representing general relational knowledge in conceptnet 5. In Proceedings of the Eighth International Conference on Language Resources and Evaluation, pages 3679–3686. European Language Resources Association.

Yibo Sun, Duyu Tang, Nan Duan, Jianshu Ji, Guihong Cao, Xiaocheng Feng, Bing Qin, Ting Liu, and Ming Zhou. 2018. Semantic parsing with syntax- and table-aware sql generation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 361–372. Association for Computational Linguistics.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Proceedings of the 27th International Conference on Neural Information Processing Systems, pages 3104–3112. MIT Press.

Lappoon R. Tang and Raymond J. Mooney. 2000. Automated construction of database interfaces: Integrating statistical and relational learning for semantic parsing. In Proceedings of the 2000 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora: Held in Conjunction with the 38th Annual Meeting of the Association for Computational Linguistics, pages 133–141. Association for Computational Linguistics.

W. A. Woods. 1986. Readings in Natural Language Processing, chapter Semantics and Quantification in Natural Language Question Answering, pages 205–248. Morgan Kaufmann Publishers Inc.

Xiaojun Xu, Chang Liu, and Dawn Song. 2017. Sqlnet: Generating structured queries from natural language without reinforcement learning. arXiv preprint arXiv:1711.04436. Version 1.

Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. 2017. Sqlizer: Query synthesis from natural language. Proceedings of the ACM on Programming Languages, 1:63:1–63:26.

Semih Yavuz, Izzeddin Gur, Yu Su, and Xifeng Yan. 2018. What it takes to achieve 100 percent condition accuracy on wikisql. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1702–1711. Association for Computational Linguistics.

Wen-tau Yih, Matthew Richardson, Chris Meek, Ming-Wei Chang, and Jina Suh. 2016. The value of semantic parse labeling for knowledge base question answering. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 201–206. Association for Computational Linguistics.

Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 440–450. Association for Computational Linguistics.

Pengcheng Yin and Graham Neubig. 2018. Tranx: A transition-based neural abstract syntax parser for semantic parsing and code generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 7–12. Association for Computational Linguistics.

Tao Yu, Zifan Li, Zilin Zhang, Rui Zhang, and Dragomir Radev. 2018a. Typesql: Knowledge-based type-aware neural text-to-sql generation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 588–594. Association for Computational Linguistics.

Tao Yu, Michihiro Yasunaga, Kai Yang, Rui Zhang, Dongxu Wang, Zifan Li, and Dragomir Radev. 2018b. Syntaxsqlnet: Syntax tree networks for complex and cross-domain text-to-sql task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1653–1663. Association for Computational Linguistics.

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018c. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3911–3921. Association for Computational Linguistics.

John M. Zelle and Raymond J. Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 1050–1055. AAAI Press.

Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2sql: Generating structured queries from natural language using reinforcement learning. arXiv preprint arXiv:1709.00103.
