Neural Semantic Parsing in Low-Resource Settings with Back-Translation and Meta-Learning
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

Yibo Sun,*1 Duyu Tang,2 Nan Duan,2 Yeyun Gong,2 Xiaocheng Feng,1 Bing Qin,1 Daxin Jiang3
1Harbin Institute of Technology, Harbin, China
2Microsoft Research Asia, Beijing, China
3Microsoft Search Technology Center Asia, Beijing, China
{ybsun, xcfeng, qinb}@ir.hit.edu.cn, {dutang, nanduan, yegong, djiang}@microsoft.com
*This work was done during an internship at MSRA.

Abstract

Neural semantic parsing has achieved impressive results in recent years, yet its success relies on the availability of large amounts of supervised data. Our goal is to learn a neural semantic parser when only prior knowledge about a limited number of simple rules is available, without access to either annotated programs or execution results. Our approach is initialized by rules and improved in a back-translation paradigm using question-program pairs generated by the semantic parser and the question generator. A phrase table of frequent mapping patterns is derived automatically, and updated as training progresses, to measure the quality of generated instances. We train the model with model-agnostic meta-learning to guarantee accuracy and stability on examples covered by rules, while acquiring the versatility to generalize well to examples not covered by rules. Results on three benchmark datasets with different domains and programs show that our approach incrementally improves accuracy. On WikiSQL, our best model is comparable to the state-of-the-art system learned from denotations.

Introduction

Semantic parsing aims to map natural language questions to the logical forms of their underlying meanings, which can be regarded as programs and executed to yield answers, a.k.a. denotations (Berant et al. 2013). In the past few years, neural network based semantic parsers have achieved promising performance (Liang et al. 2017); however, their success is limited to settings with rich supervision, which is costly to obtain. There have been recent attempts at low-resource semantic parsing, including data augmentation methods learned from a small number of annotated examples (Guo et al. 2018), and methods for adapting to unseen domains while being trained only on annotated examples from other domains.

This work investigates neural semantic parsing in a low-resource setting, in which we have only prior knowledge about a limited number of simple mapping rules, possibly including a small amount of domain-independent word-level matching tables, but no access to either annotated programs or execution results. Our key idea is to use these rules to collect a modest set of question-program pairs as the starting point, and then leverage automatically generated examples to improve the accuracy and generality of the model. This presents three challenges: how to generate examples efficiently, how to measure the quality of generated examples, which may contain errors and noise, and how to train a semantic parser that makes robust predictions on examples covered by rules while generalizing well to uncovered examples.

We address these challenges with a framework consisting of three key components. The first component is a data generator. It includes a neural semantic parsing model, which maps a natural language question to a program, and a neural question generation model, which maps a program to a natural language question. We learn these two models in a back-translation paradigm using pseudo-parallel examples, inspired by its success in unsupervised neural machine translation (Sennrich, Haddow, and Birch 2016; Lample et al. 2018). The second component is a quality controller, which filters out the noise and errors contained in the pseudo data. We construct a phrase table of frequent mapping patterns, so that low-frequency noise and errors can be filtered out; a similar idea has been used for posterior regularization in neural machine translation (Zhang et al. 2017; Ren et al. 2019). The third component is a meta-learner. Instead of transferring a model pretrained on examples covered by rules to the generated examples, we leverage model-agnostic meta-learning (Finn, Abbeel, and Levine 2017), an elegant meta-learning algorithm that has been successfully applied to a wide range of tasks including few-shot learning and adaptive control. We regard different data sources as different tasks, and use the outputs of the quality controller for stable training.

We test our approach on three tasks with different programs: SQL (and SQL-like) queries for both single-turn and multi-turn questions over web tables (Zhong, Xiong, and Socher 2017; Iyyer, Yih, and Chang 2017), and subject-predicate pairs over a large-scale knowledge graph (Bordes et al. 2015). The programs for single-turn SQL queries and for subject-predicate pairs over a knowledge graph are simple, while the program for multi-turn SQL queries has top-tier complexity among currently proposed tasks. Results show that our approach yields large improvements over rule-based systems, and that incorporating the different strategies incrementally improves the overall performance. On WikiSQL, our best performing system achieves an execution accuracy of 72.7%, comparable to a strong system learned from denotations (Agarwal et al. 2019), which achieves 74.8%.
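To make the quality-controller idea concrete, the following is a minimal Python sketch of how a phrase table of frequent mapping patterns could be derived from pseudo-parallel pairs and used to filter out low-frequency noise. The function names, the token-level co-occurrence counting, and the thresholds (`min_count`, `threshold`) are illustrative assumptions for the sketch, not the paper's implementation.

```python
from collections import Counter
from itertools import product

def build_phrase_table(pseudo_pairs, min_count=5):
    """pseudo_pairs: iterable of (question_tokens, program_tokens).

    Keeps (question token, program token) pairs that co-occur often
    across the pseudo-parallel data; rare alignments are assumed noise.
    """
    counts = Counter()
    for q_toks, p_toks in pseudo_pairs:
        for q_tok, p_tok in product(set(q_toks), set(p_toks)):
            counts[(q_tok, p_tok)] += 1
    return {pair for pair, c in counts.items() if c >= min_count}

def pair_quality(q_toks, p_toks, phrase_table):
    """Fraction of program tokens supported by some question token."""
    if not p_toks:
        return 0.0
    supported = sum(
        any((q, p) in phrase_table for q in q_toks) for p in set(p_toks)
    )
    return supported / len(set(p_toks))

def filter_pseudo_data(pseudo_pairs, phrase_table, threshold=0.5):
    # Keep only generated pairs whose alignments are frequent enough.
    return [
        (q, p) for q, p in pseudo_pairs
        if pair_quality(q, p, phrase_table) >= threshold
    ]
```

Because the table is rebuilt from the current pseudo data, it can be refreshed each iteration, matching the paper's note that it is updated as training progresses.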
Problem Statement

We focus on the task of executable semantic parsing. The goal is to map a natural language question/utterance q to a logical form/program lf, which can be executed over a world Wld to obtain the correct answer a.

We consider three tasks. The first task is single-turn table-based semantic parsing, in which q is a self-contained question, lf is a SQL query of the form "SELECT agg col1 WHERE col2 = val2 AND ...", and Wld is a web table consisting of multiple rows and columns. We use WikiSQL (Zhong, Xiong, and Socher 2017) as the testbed for this task. The second task is multi-turn table-based semantic parsing. Compared to the first task, q can be a follow-up question whose meaning depends on the conversation history. Accordingly, lf in this task supports additional operations that copy the previous turn's lf to the current turn. We use SequentialQA (Iyyer, Yih, and Chang 2017) for evaluation. In the third task, we change Wld to a large-scale knowledge graph (i.e., Freebase) and consider knowledge-based question answering for single-turn questions. We use SimpleQuestions (Bordes et al. 2015) as the testbed, where lf is a simple λ-calculus expression such as λx.predicate(subject, x), and generating lf is equivalent to predicting the predicate and the subject.

In this low-resource setting, we have a limited number of simple mapping rules based on our prior knowledge about the task, and may also have a small amount of domain-independent word-level matching tables if necessary. These rules are not perfect: they have low coverage, and can even be incorrect in some situations. For instance, when predicting a SQL command in the first task, we have the prior knowledge that (1) WHERE values potentially have co-occurring words with table cells; (2) the words "more" and "greater" tend to be mapped to the WHERE operator ">"; (3) within a WHERE clause, the header and the cell should be in the same column; and (4) the word "average" tends to be mapped to the aggregator "avg". Similarly, when predicting a λ-calculus expression in the third task, the entity name might be present in the question, and among all the predicates connected to the entity, the predicate with the maximum number of co-occurring words might be the correct one. We would like to study what our model can achieve if we use such rules as the starting point.
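To illustrate how such rules can yield seed question-program pairs, the Python sketch below applies the four priors above to a WikiSQL-style question and table. Only the four heuristics come from the text; the table-as-dict representation, the tokenization, and the crude SELECT-column guess are simplifying assumptions.

```python
import re

AGG_WORDS = {"average": "avg"}   # prior (4): "average" -> aggregator avg
GT_WORDS = {"more", "greater"}   # prior (2): "more"/"greater" -> operator >

def rule_parse(question, table):
    """table: dict mapping header name -> list of cell strings.

    Returns a SQL string if the rules fire, else None (the question
    stays unlabeled and is not added to the seed data D0).
    """
    tokens = question.lower().split()
    agg = next((AGG_WORDS[t] for t in tokens if t in AGG_WORDS), "")
    op = ">" if any(t in GT_WORDS for t in tokens) else "="

    where_col, where_val = None, None
    for header, cells in table.items():
        for cell in cells:
            # prior (1): WHERE values co-occur with table cells;
            # prior (3): header and cell are taken from the same column.
            if re.search(r"\b" + re.escape(cell.lower()) + r"\b",
                         question.lower()):
                where_col, where_val = header, cell
                break
        if where_col:
            break

    if where_col is None:
        return None

    # Simplifying assumption: select the first non-WHERE column.
    select_col = next((h for h in table if h != where_col), where_col)
    parts = ["SELECT", agg, select_col] if agg else ["SELECT", select_col]
    return " ".join(parts) + f" WHERE {where_col} {op} '{where_val}'"

# Example: rule_parse("how many wins does mike tyson have",
#                     {"Player": ["Mike Tyson"], "Wins": ["50"]})
# -> "SELECT Wins WHERE Player = 'Mike Tyson'"
```

As the text notes, such rules have low coverage and can be wrong, which is why the generated seed pairs are only a starting point.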
Learning Algorithm

We describe our approach to low-resource neural semantic parsing in this section. We propose to train a neural semantic parser using back-translation and meta-learning. The learning process is summarized in Algorithm 1. We describe its three components below, namely back-translation, quality control, and meta-learning.

Algorithm 1: Low-Resource Neural Semantic Parsing with Back-Translation and MAML
Require: Q: a set of natural language questions; LF: a collection of sampled logical forms.
Require: r: a rule that, if satisfied, maps q to lf; α, β: step-size hyperparameters.
1: Apply r to Q, obtaining the training data D_0
2: Use D_0 to initialize θ_{q→lf} and θ_{lf→q}
3: while not done do
4:   Apply f_{q→lf} to Q and f_{lf→q} to LF
5:   Use f_{q→lf}(Q) and f_{lf→q}(LF) to update the phrase table of the quality controller qc
6:   Update θ_{lf→q} using D_0 and qc(f_{q→lf}(Q))
7:   for task T_i ∈ {r = T, r = F} do
8:     Sample D_i from {D_0, qc(f_{q→lf}(Q)), qc(f_{lf→q}(LF))} for task T_i
9:     Compute gradients using D_i and the loss L; update the learner: θ_i = θ − α∇_θ L(f_θ)
10:    Sample D′_i from {D_0, qc(f_{q→lf}(Q)), qc(f_{lf→q}(LF))} for the meta-update
11:    Compute gradients using D′_i and L; update the meta-learner: θ = θ − β∇_θ L(f_{θ_i})
12:  end for
13:  Update θ^{r=T}_{q→lf} and θ^{r=F}_{q→lf} with θ as the initialization, respectively
14: end while

Back-Translation

Following the back-translation paradigm (Sennrich, Haddow, and Birch 2016; Lample et al. 2018), we have a semantic parser, which maps a natural language question q to a logical form lf, and a question generator, which maps lf to q.
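As a rough illustration of how the inner/outer updates of Algorithm 1 might look in code, here is a PyTorch sketch of one MAML step over the two tasks (rule-covered, r = T, and rule-uncovered, r = F, examples). The model, batches, and loss function are placeholders, and the sketch uses the first-order MAML approximation rather than differentiating through the inner update; it is not the authors' implementation.

```python
import copy
import torch

def maml_step(parser, loss_fn, tasks, alpha=1e-3, beta=1e-3):
    """tasks: list of (support_batch, query_batch) pairs, one per task.

    support_batch plays the role of D_i (line 8) and query_batch the
    role of D'_i (line 10); loss_fn(model, batch) is a stand-in for L.
    """
    meta_grads = [torch.zeros_like(p) for p in parser.parameters()]
    for support, query in tasks:
        learner = copy.deepcopy(parser)          # task-specific learner
        inner_loss = loss_fn(learner, support)
        grads = torch.autograd.grad(inner_loss, learner.parameters())
        with torch.no_grad():
            for p, g in zip(learner.parameters(), grads):
                p -= alpha * g                   # line 9: θ_i = θ − α∇θ L(f_θ)
        outer_loss = loss_fn(learner, query)     # line 11: evaluate at θ_i
        grads = torch.autograd.grad(outer_loss, learner.parameters())
        for mg, g in zip(meta_grads, grads):
            mg += g
    with torch.no_grad():                        # θ = θ − β Σ_i ∇ L(f_{θ_i})
        for p, mg in zip(parser.parameters(), meta_grads):
            p -= beta * mg                       # first-order approximation
```

In the full loop of Algorithm 1, this step would be preceded by regenerating pseudo pairs with the current parser and question generator and filtering them with the quality controller, so each task samples from {D_0, qc(f_{q→lf}(Q)), qc(f_{lf→q}(LF))}.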