Zero-Shot Cross-lingual Semantic Parsing

Tom Sherborne and Mirella Lapata
School of Informatics, University of Edinburgh
[email protected], [email protected]

Abstract

Recent work in cross-lingual semantic parsing has successfully applied machine translation to localize accurate parsing to new languages. However, these advances assume access to high-quality machine translation systems, and tools such as word aligners, for all test languages. We remove these assumptions and study cross-lingual semantic parsing as a zero-shot problem without parallel data for 7 test languages (DE, ZH, FR, ES, PT, HI, TR). We propose a multi-task encoder-decoder model to transfer parsing knowledge to additional languages using only English-logical form paired data and unlabeled, monolingual utterances in each test language. We train an encoder to generate language-agnostic representations jointly optimized for generating logical forms or utterance reconstruction and against language discriminability. Our system frames zero-shot parsing as a latent-space alignment problem and finds that pre-trained models can be improved to generate logical forms with minimal cross-lingual transfer penalty. Experimental results on Overnight and a new executable version of MultiATIS++ find that our zero-shot approach performs above back-translation baselines and, in some cases, approaches the supervised upper bound.

1 Introduction

Executable semantic parsing translates a natural language utterance to a logical form for execution in some knowledge base to return a denotation. The parsing task realizes an utterance as a semantically identical, but machine-interpretable, expression grounded in a denotation. The transduction between natural and formal languages has allowed semantic parsers to become critical infrastructure in the pipeline of human-computer interfaces for question answering from structured data (Berant et al., 2013; Liang, 2016; Kollar et al., 2018).

Sequence-to-sequence approaches have proven capable of producing high-quality parsers (Jia and Liang, 2016; Dong and Lapata, 2016), with further modeling advances in multi-stage decoding (Dong and Lapata, 2018; Guo et al., 2019), schema linking (Shaw et al., 2019; Wang et al., 2019) and grammar-based decoding (Yin and Neubig, 2017; Lin et al., 2019). In addition to modeling developments, recent work has also expanded to multilingual parsing. However, this has primarily required that parallel training data is either available (Jie and Lu, 2014) or requires professional translation to generate (Susanto and Lu, 2017a). This creates an entry barrier to localizing a semantic parser which may not be necessary. Sherborne et al. (2020) and Moradshahi et al. (2020) explore the utility of machine translation, as a cheap alternative to human translation, for creating training data, but found translation artifacts to be a performance-limiting factor. Zhu et al. (2020) and Li et al. (2021) both examine zero-shot spoken language understanding and observe a significant performance penalty from cross-lingual transfer from English to lower-resource languages.

Cross-lingual generative semantic parsing, as opposed to the slot-filling format, has been under-explored in the zero-shot case. This challenging task combines the outstanding difficulty of structured prediction, for accurate parsing, with a latent-space alignment requirement, wherein multiple languages should encode to an overlapping semantic representation. Prior work has identified this penalty from cross-lingual transfer (Artetxe et al., 2020; Zhu et al., 2020; Li et al., 2021), which is insufficiently overcome with pre-trained models alone. While there has been some success in machine-translation-based approaches, we argue that inducing a shared multilingual space without parallel data is superior because (a) this nullifies the introduction of translation or word alignment errors and (b) this approach scales to low-resource languages without reliable machine translation.

In this work, we propose a method of zero-shot executable semantic parsing using only monolingual data for cross-lingual transfer of parsing knowledge. Our approach uses paired English-logical form data for the parsing task and adapts to additional languages using auxiliary tasks trained on unlabeled monolingual corpora. Our motivation is to accurately parse languages for which paired training data is unseen, to examine if any translation is required for accurate parsing. The objective of this work is to parse utterances in some language, l, without observing paired training data, (x_l, y), suitable machine translation, word alignment or bilingual dictionaries between l and English. Using a multi-task objective, our system adapts pre-trained language models to generate logical forms from multiple languages with a minimized penalty for cross-lingual transfer from English to German (DE), Chinese (ZH), French (FR), Spanish (ES), Portuguese (PT), Hindi (HI) and Turkish (TR).

2 Related Work

Cross-lingual Modeling  This area has recently gained increasing interest across a range of natural language understanding settings, with benchmarks such as XGLUE (Liang et al., 2020) and XTREME (Hu et al., 2020) providing classification and generation benchmarks across a range of languages (Zhao et al., 2020). There has been significant interest in cross-lingual approaches to dependency parsing (Tiedemann et al., 2014; Schuster et al., 2019), sentence simplification (Mallinson et al., 2020) and spoken language understanding (SLU; He et al., 2013; Upadhyay et al., 2018). Within this, pre-training has been shown to be widely beneficial for cross-lingual transfer using models such as multilingual BERT (Devlin et al., 2019) or XLM-RoBERTa (Conneau et al., 2020a).

Broadly, pre-trained models trained on massive corpora purportedly learn an overlapping cross-lingual latent space (Conneau et al., 2020b), but have also been identified as under-trained for some tasks (Li et al., 2021). A subset of cross-lingual modeling has focused on engineering the alignment of multilingual word representations (Conneau et al., 2017; Artetxe and Schwenk, 2018; Hu et al., 2021) for tasks such as dependency parsing (Schuster et al., 2019; Xu and Koehn, 2021) and word alignment (Jalili Sabet et al., 2020; Dou and Neubig, 2021).

Semantic Parsing  For parsing, there has been recent investigation into multilingual parsing using multiple-language ensembles with hybrid generative trees (Lu, 2014; Susanto and Lu, 2017b) and LSTM-based sequence-to-sequence models (Susanto and Lu, 2017a). This work has largely affirmed the benefit of "high-resource" multi-language ensemble training (Jie and Lu, 2014). Code-switching in multilingual parsing has also been explored through mixed-language training datasets (Duong et al., 2017; Einolghozati et al., 2021). For adapting a parser to a new language, machine translation has been explored as a mostly reasonable proxy for in-language data (Sherborne et al., 2020; Moradshahi et al., 2020). However, machine translation, in either direction, can introduce limiting artifacts (Artetxe et al., 2020), and generalisation is limited due to a "parallax" error between gold test utterances and "translationese" training data (Riley et al., 2020).

Zero-shot parsing has primarily focused on 'cross-domain' challenges to improve parsing across varying query structures and lexicons (Herzig and Berant, 2018; Givoli and Reichart, 2019) or different databases (Zhong et al., 2020; Suhr et al., 2020). The Spider dataset (Yu et al., 2018) formalises this challenge with unseen tables at test time for zero-shot generalisation. Combining zero-shot parsing with cross-lingual modeling has been examined for the UCCA formalism (Hershcovich et al., 2019) and for task-oriented parsing (see below). Generally, we find that cross-lingual executable parsing has been under-explored in the zero-shot case.

Executable parsing contrasts with more abstract meaning expressions (AMR, λ-calculus) or hybrid logical forms, such as decoupled TOP (Li et al., 2021), which cannot be executed to retrieve denotations. Datasets such as ATIS (Dahl et al., 1994) have both an executable parsing and a spoken language understanding format. We focus on only the former, as a generation task, and note that results for the latter classification task are not comparable.

Dialogue Modeling  Usage of MT has also been extended to a task-oriented setting using the MTOP dataset (Li et al., 2021), employing a pointer-generator network (See et al., 2017) to generate dialogue-act-style logical forms. This has been combined with adjacent SLU tasks, on MTOP and MultiATIS++ (Xu et al., 2020), to combine cross-lingual task-oriented parsing and SLU classification. Generally, these approaches additionally require word alignment to project annotations between languages. Prior zero-shot cross-lingual work in SLU (Li et al., 2021; Zhu et al., 2020; Krishnan et al., 2021) similarly identifies a penalty for cross-lingual transfer and finds that pre-trained models and machine translation can only partially mitigate this error.

Compared with similar explorations of cross-lingual parsing such as Xu et al. (2020) and Li et al. (2021), the zero-shot case is our primary focus. Our study assumes a case of no paired data in the test languages, and our proposed approach is more similar to Mallinson et al. (2020) and Xu and Koehn (2021) in that we objectify the convergence of pre-trained representations for a downstream task. Our approach is also similar to work in zero-resource (Firat et al., 2016) or unsupervised machine translation with monolingual corpora (Lample et al., 2018). Contrastingly, our approach is not pairwise between languages, owing to our single multilingual latent representation.

3 Problem Formulation

The primary challenge for cross-lingual parsing is learning parameters that can parse an utterance, x, from any test language. Typically, a parser trained on language l, or multiple languages {l_1, l_2, ..., l_N}, is only capable for these languages and performs poorly outside this set. For a new language, conventional approaches require parallel datasets and models (Jie and Lu, 2014; Haas and Riezler, 2016; Duong et al., 2017).

In our work, zero-shot parsing refers to parsing utterances in languages without paired data during training. For some language, l, there exists no pairing of x_l to a logical form, y, except for English. During training, the model only generates logical forms conditioned on English input data. Other languages are incorporated into the model using auxiliary objectives detailed in Section 4. Zero-shot parsing happens at test time, when a logical form is predicted conditioned upon an input question from any test language.

Our approach improves the zero-shot case using only monolingual objectives to parse additional languages beyond English¹ without semantic parsing training data. We explore the hypothesis that a multilingual latent space can be learned through auxiliary tasks in tandem with logical form generation. We desire to learn a language-agnostic representation space to minimize the penalty of cross-lingual transfer and improve parsing of languages without training data. To generate this latent space, we posit that only unpaired monolingual data in the target language, and some pre-trained encoder, are required. We remove the assumption that machine translation is suitable and study the inverse case wherein only paired data in English and monolingual data in target languages are available. This frames the cross-lingual parsing task as one of latent representation alignment only, to explore a possible upper bound of parsing accuracy without errors from external dependencies.

The benefit of our zero-shot method is that our approach requires only external corpora in the test language. Using only English paired data and monolingual corpora, we can generate logical forms above back-translation baselines and compete with fully supervised in-language training. For example, consider the German test set of the Overnight dataset (Wang et al., 2015; Sherborne et al., 2020), which lacks German paired training data. To competitively parse this test set, our approach minimally requires only the original English training data and a collection of unlabeled German utterances.

¹English acts as the "source" language in our work, as it is the source language for all explored datasets. We express all other languages as "target" languages.

4 Method

We adopt a multi-task sequence-to-sequence model (Luong et al., 2016) combining logical form generation with two auxiliary objectives. The first is a monolingual reconstruction loss, similar to domain-adaptive pre-training (Gururangan et al., 2020), and the second is a language identification task. We describe each model component below.

Generating Logical Forms  Predicting logical forms is the primary output objective for the system. For an utterance x = (x_1, x_2, ..., x_T), we desire to predict a logical form y = (y_1, y_2, ..., y_M) representing the same meaning in a machine-executable language. We model this transduction task using a sequence-to-sequence neural network (Sutskever et al., 2014) based upon the Transformer architecture (Vaswani et al., 2017).

The sequence x is encoded to a latent representation z = (z_1, z_2, ..., z_T) through Equation (1) using a stacked self-attention Transformer encoder, E, with weights θ_E. The conditional probability of the output sequence y is expressed as Equation (2), as each token y_i is autoregressively generated based upon z and prior outputs, y_{<i}:

\[ z = E(x \mid \theta_E) \tag{1} \]

\[ p(y \mid x) = \prod_{i=1}^{M} p\left(y_i \mid y_{<i}, z\right) \tag{2} \]

The parser is trained by minimizing the negative log-likelihood of gold logical forms over the English paired training set, S_LF:

\[ \mathcal{L}_{LF} = - \sum_{(x,\, y) \in S_{LF}} \log p(y \mid x) \]
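To make Equations (1) and (2) concrete, the following is a minimal PyTorch sketch of the encoder, E, and logical-form decoder, D_LF. It is illustrative only: it uses a small, randomly initialized Transformer where the full system adapts a pre-trained encoder, and all dimensions and names are assumptions rather than the exact configuration used in our experiments.

```python
import torch
import torch.nn as nn

class Seq2SeqParser(nn.Module):
    """Sketch of Equations (1)-(2): encode x into z = E(x | theta_E),
    then autoregressively predict logical-form tokens y_i from y_<i and z."""

    def __init__(self, src_vocab, lf_vocab, d_model=256, nhead=8, layers=6):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.lf_embed = nn.Embedding(lf_vocab, d_model)
        # E with weights theta_E (a pre-trained encoder in the full system)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True),
            num_layers=layers)
        # D_LF: the logical-form decoder
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True),
            num_layers=layers)
        self.project = nn.Linear(d_model, lf_vocab)

    def encode(self, x):
        # Equation (1): one latent vector z_t per input token
        return self.encoder(self.src_embed(x))

    def decode(self, y_prefix, z):
        # One factor of Equation (2): logits for y_i given y_<i and z.
        # The causal mask restricts attention to prior outputs only.
        T = y_prefix.size(1)
        causal = torch.triu(
            torch.full((T, T), float("-inf"), device=y_prefix.device),
            diagonal=1)
        h = self.decoder(self.lf_embed(y_prefix), z, tgt_mask=causal)
        return self.project(h)
```

Training on S_LF then reduces to standard teacher forcing: cross-entropy between decode(y_{<i}, encode(x)) and the shifted gold sequence realizes the negative log-likelihood L_LF above.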

Reconstructing Utterances  The secondary objective encourages multilingual semantic representations in the encoding space, z, in Equation (1), using an additional decoder to recover a noisy input. This co-adaptation strategy steers a latent space suitable for both decoders, improving the parsing accuracy of languages without training logical forms. An utterance, x, is input to the encoder, E, and a reconstruction decoder, D_NL, then tries to recover x from the latent representation. We follow the denoising objective from Lewis et al. (2020) and replace x with the noised input x̃ = N(x) for some noising function N. The output probability of reconstruction, or denoising, is described in Equation (6), with each token predicted through Equation (7) using the decoder, D_NL, with associated weights θ_DNL:

\[ z = E(\tilde{x} \mid \theta_E) \tag{5} \]

\[ p(x \mid \tilde{x}) = \prod_{i=1}^{T} p\left(x_i \mid x_{<i}, z\right) \tag{6} \]

\[ p\left(x_i \mid x_{<i}, z\right) = D_{NL}\left(x_{<i}, z \mid \theta_{D_{NL}}\right) \tag{7} \]
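The text above specifies only that x̃ = N(x) follows the denoising recipe of Lewis et al. (2020); the exact noise schedule is not recoverable here. A minimal sketch of one plausible choice, random token masking, where both the rate and the masking strategy are illustrative assumptions:

```python
import random

def noise(tokens, mask_token="<mask>", rate=0.15, seed=None):
    """Hypothetical noising function N producing x~ = N(x): randomly
    replaces a fraction of tokens with a mask symbol, in the spirit of
    BART-style denoising (Lewis et al., 2020). The 15% rate and the
    token-masking strategy are assumptions, not the paper's setting."""
    rng = random.Random(seed)
    return [mask_token if rng.random() < rate else tok for tok in tokens]

# Usage: noise(utterance_tokens) returns a partially masked copy of the
# utterance for D_NL to reconstruct from the shared encoding z.
```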
Language Prediction  We augment the model with a third objective designed to encourage language-invariant encodings: a language prediction (LP) network trained to identify the source language of an utterance from z (Equations (9) and (10)). Gradients from this objective optimize the LP network; however, we add a gradient reversal layer in the backward pass prior to the LP network to encourage the encoder to produce language-invariant representations (Ganin et al., 2016). The LP network is optimized to discriminate the source language from z, but the encoder is now optimized against this objective². Our intuition here is that discouraging language discriminability in z encourages latent representation similarity across languages, and therefore reduces the penalty for cross-lingual transfer.

\[ \mathrm{LP}(x) = W_{do}\, G\left(W_{di}\, x + b_{di}\right) + b_{do} \tag{9} \]

\[ p(l \mid x) = \mathrm{softmax}\!\left( \frac{1}{T} \sum_{t} \mathrm{LP}(z_t) \right) \tag{10} \]

\[ \mathcal{L}_{LP} = - \sum_{x} \log p(l \mid x) \tag{11} \]

Combined Model  The combined model uses a single encoder, E, and the three objective decoders, {D_LF, D_NL, LP}, to generate outputs. During training [...]
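The gradient reversal trick behind Equations (9)-(11) is mechanically simple but easy to get backwards, so a PyTorch sketch may help: the layer is the identity on the forward pass and negates gradients on the backward pass, so the LP head learns to classify the language while the encoder learns to defeat it. The GELU choice for G and the unweighted loss sum are assumptions, since the combined objective is truncated in the extraction above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; negates gradients on the backward
    pass, so the encoder is updated *against* the LP loss
    (Ganin et al., 2016)."""

    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg()

class LanguagePredictor(nn.Module):
    """Equations (9)-(10): a two-layer classifier applied per time step,
    then mean-pooled over T. We assume G is a GELU nonlinearity
    (cf. Hendrycks and Gimpel, 2020)."""

    def __init__(self, d_model, n_languages, d_hidden=256):
        super().__init__()
        self.w_di = nn.Linear(d_model, d_hidden)      # W_di x + b_di
        self.w_do = nn.Linear(d_hidden, n_languages)  # W_do (.) + b_do

    def forward(self, z):
        z = GradReverse.apply(z)                     # only affects backward
        per_step = self.w_do(F.gelu(self.w_di(z)))   # LP(z_t), Equation (9)
        return per_step.mean(dim=1)                  # (1/T) sum_t LP(z_t), Eq. (10)

# Illustrative combined step (equal loss weights are an assumption):
#   loss = loss_lf + loss_nl + F.cross_entropy(lp(z), lang_id)  # Eq. (11) term
```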

Model                 EN    DE    ZH    FR    ES    PT    HI    TR
(1) D_LF only         77.2  50.2  38.5  61.3  46.5  42.5  40.4  37.3
(2) D_LF + D_NL       77.7  61.1  51.2  62.7  58.2  57.5  49.5  44.7
(3) D_LF + D_NL + LP  76.3  67.1  59.2  66.9  58.2  60.5  54.1  47.1

Table 1: Denotation accuracy for ATIS (Dahl et al., 1994) in English (EN), German (DE), Chinese (ZH), French (FR), Spanish (ES), Portuguese (PT), Hindi (HI) and Turkish (TR), using data from Upadhyay et al. (2018) and Xu et al. (2020). We report results for multiple systems: (1) using only English semantic parsing data; (2) multi-task combined parsing and reconstruction; and (3) multi-task parsing, reconstruction and language prediction. We compare to a back-translation baseline and monolingual upper-bound results where available.
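Denotation accuracy, the metric reported throughout, counts a prediction as correct when executing it returns the same answer as executing the gold logical form. A minimal sketch, assuming a hypothetical execute function for the target knowledge base:

```python
def denotation_accuracy(pred_lfs, gold_lfs, execute):
    """Share of predictions whose execution result matches the gold
    denotation. `execute` is a hypothetical callable mapping a logical
    form to its denotation in the knowledge base; it is not part of
    the model and depends on the dataset's execution backend."""
    matches = sum(execute(p) == execute(g) for p, g in zip(pred_lfs, gold_lfs))
    return matches / len(gold_lfs)
```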

[...] more to be gained in our downstream task. We find less improvement across our models in languages closer to English, finding only a +5.6% improvement for French and +11.7% for Spanish. However, only for these languages do we approach the supervised upper bound, with a −0.9% error for French and a +0.5% gain for Spanish. Given that we used no semantic parsing data in these languages, this represents significant competition to methods requiring dataset translation for success.

We also note that our system improves parsing on Hindi and Turkish, even though these are absent from the auxiliary training objectives. The model is not trained to reconstruct or identify either of these languages; however, we find that our system improves parsing regardless. By adapting our latent representation to multiple generation objectives and encouraging language-agnostic encodings, we find the model can generate better latent representations for two typologically diverse languages without guidance. This result suggests that our approach has a wider benefit to cross-lingual transfer beyond the languages we explicitly desire to transfer towards. We find our system improves above our baseline for Hindi but not Turkish. This could be another consequence of unbalanced pre-training, given the 1,327,206 Hindi sentences compared to 204,200 for Turkish.

Model                 EN    DE    ZH
Baseline
Back-translation      -     60.1  48.1
Our model
(1) D_LF only         80.5  58.4  48.0
(2) D_LF + D_NL       81.3  62.7  49.5
(3) D_LF + D_NL + LP  81.4  64.3  52.7

Table 2: Average denotation accuracy across all domains for Overnight (Wang et al., 2015) for English (EN), German (DE) and Chinese (ZH). We report results for multiple systems: (1) using only English semantic parsing data; (2) multi-task combined parsing and reconstruction; and (3) multi-task parsing, reconstruction and language prediction. We compare to a back-translation baseline for both ZH and DE.

Overnight  Table 2 similarly identifies improvement for both German and Chinese using the monolingual reconstruction and language prediction objectives. We observe smaller improvements between Model (1) and Model (3) compared to ATIS, +5.9% for German and +4.7% for Chinese, which may be a consequence of increased question complexity across Overnight domains. Results for individual domains are given in Appendix A for brevity; however, we did not observe any strong trends across domains in either language. Comparatively, our best system is more accurate for German than Chinese by 9.6%, which may be another consequence of orthographic dissimilarity between languages during pre-training and our approach.

Finally, we also note that the capability of our model for English appears mostly unaffected by our additional objectives. For both ATIS and Overnight, we observe no catastrophic forgetting (McCloskey and Cohen, 1989) for the source language and, in some cases, a marginal performance improvement from our multi-task objectives. While semantic parsing for English is well studied and not the focus of our work, this result suggests there is minimal required compromise in maintaining parsing accuracy for English while transferring this capability to other languages.

8 Conclusion

We present a new approach to zero-shot cross-lingual semantic parsing for accurate parsing of languages without paired training data. We define and evaluate a multi-task model combining logical form generation with auxiliary objectives that require only unlabeled, monolingual corpora. This approach minimizes the error from cross-lingual transfer and improves parsing accuracy across all test languages. We demonstrate that an absence of semantic parsing data can be overcome through aligning latent representations and extend this to examine languages also unseen during our multi-task alignment training. In the future, we plan to explore additional objectives, such as translation, in our model and consider larger and more diverse corpora for our auxiliary objectives.

References

Mikel Artetxe, Gorka Labaka, and Eneko Agirre. 2020. Translation artifacts in cross-lingual transfer learning. arXiv preprint arXiv:2004.04721.

Mikel Artetxe and Holger Schwenk. 2018. Massively multilingual sentence embeddings for zero-shot cross-lingual transfer and beyond. ArXiv, abs/1812.10464.

Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1533–1544, Seattle, Washington, USA.

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020a. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451, Online. Association for Computational Linguistics.

Alexis Conneau, Guillaume Lample, Marc'Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. 2017. Word translation without parallel data. arXiv preprint arXiv:1710.04087.

Alexis Conneau, Shijie Wu, Haoran Li, Luke Zettlemoyer, and Veselin Stoyanov. 2020b. Emerging cross-lingual structure in pretrained language models. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 6022–6034, Online. Association for Computational Linguistics.

Deborah A. Dahl, Madeleine Bates, Michael Brown, William Fisher, Kate Hunicke-Smith, David Pallett, Christine Pao, Alexander Rudnicky, and Elizabeth Shriberg. 1994. Expanding the scope of the ATIS task: The ATIS-3 corpus. In Proceedings of the Workshop on Human Language Technology, HLT '94, pages 43–48, Stroudsburg, PA, USA.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota.

Li Dong and Mirella Lapata. 2016. Language to logical form with neural attention. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 33–43, Stroudsburg, PA, USA.

Li Dong and Mirella Lapata. 2018. Coarse-to-fine decoding for neural semantic parsing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 731–742, Melbourne, Australia.

Zi-Yi Dou and Graham Neubig. 2021. Word alignment by fine-tuning embeddings on parallel corpora.

Long Duong, Hadi Afshar, Dominique Estival, Glen Pink, Philip Cohen, and Mark Johnson. 2017. Multilingual semantic parsing and code-switching. In Proceedings of the 21st Conference on Computational Natural Language Learning, pages 379–389, Vancouver, Canada.

Arash Einolghozati, Abhinav Arora, Lorena Sainz-Maza Lecanda, Anuj Kumar, and Sonal Gupta. 2021. El volumen louder por favor: Code-switching in task-oriented semantic parsing. In EACL 2021. Association for Computational Linguistics.

Orhan Firat, Baskaran Sankaran, Yaser Al-Onaizan, Fatos T. Yarman Vural, and Kyunghyun Cho. 2016. Zero-resource translation with multi-lingual neural machine translation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 268–277, Stroudsburg, PA, USA.

Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. 2016. Domain-adversarial training of neural networks. J. Mach. Learn. Res., 17(1):2096–2030.

Matt Gardner, Joel Grus, Mark Neumann, Oyvind Tafjord, Pradeep Dasigi, Nelson F. Liu, Matthew E. Peters, Michael Schmitz, and Luke S. Zettlemoyer. 2018. AllenNLP: A deep semantic natural language processing platform. ArXiv, abs/1803.07640.

Ofer Givoli and Roi Reichart. 2019. Zero-shot semantic parsing for instructions. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4454–4464, Florence, Italy. Association for Computational Linguistics.

Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems, volume 27. Curran Associates, Inc.

Jiaqi Guo, Zecheng Zhan, Yan Gao, Yan Xiao, Jian-Guang Lou, Ting Liu, and Dongmei Zhang. 2019. Towards complex text-to-SQL in cross-domain database with intermediate representation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 4524–4535, Florence, Italy. Association for Computational Linguistics.

Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. 2020. Don't stop pretraining: Adapt language models to domains and tasks. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8342–8360, Online. Association for Computational Linguistics.

Carolin Haas and Stefan Riezler. 2016. A corpus and semantic parser for multilingual natural language querying of OpenStreetMap. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 740–750, Stroudsburg, PA, USA.

Xiaodong He, Li Deng, Dilek Hakkani-Tür, and Gokhan Tur. 2013. Multi-style adaptive training for robust cross-lingual spoken language understanding. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP).

Dan Hendrycks and Kevin Gimpel. 2020. Gaussian error linear units (GELUs).

Daniel Hershcovich, Zohar Aizenbud, Leshem Choshen, Elior Sulem, Ari Rappoport, and Omri Abend. 2019. SemEval-2019 task 1: Cross-lingual semantic parsing with UCCA. In Proceedings of the 13th International Workshop on Semantic Evaluation, pages 1–10, Minneapolis, Minnesota, USA.

Jonathan Herzig and Jonathan Berant. 2018. Decoupling structure and lexicon for zero-shot semantic parsing. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1619–1629, Brussels, Belgium.

Junjie Hu, Melvin Johnson, Orhan Firat, Aditya Siddhant, and Graham Neubig. 2021. Explicit alignment objectives for multilingual bidirectional encoders. In NAACL 2021.

Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, and Melvin Johnson. 2020. XTREME: A massively multilingual multi-task benchmark for evaluating cross-lingual generalization.

Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, Jayant Krishnamurthy, and Luke Zettlemoyer. 2017. Learning a neural semantic parser from user feedback. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 963–973, Vancouver, Canada.

Masoud Jalili Sabet, Philipp Dufter, François Yvon, and Hinrich Schütze. 2020. SimAlign: High quality word alignments without parallel training data using static and contextualized embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1627–1643, Online. Association for Computational Linguistics.

Robin Jia and Percy Liang. 2016. Data recombination for neural semantic parsing. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 12–22, Stroudsburg, PA, USA.

Zhanming Jie and Wei Lu. 2014. Multilingual semantic parsing: Parsing multiple languages into semantic representations. In Proceedings of the 25th International Conference on Computational Linguistics: Technical Papers, pages 1291–1301, Dublin, Ireland.

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.

Thomas Kollar, Danielle Berry, Lauren Stuart, Karolina Owczarzak, Tagyoung Chung, Lambert Mathias, Michael Kayser, Bradford Snow, and Spyros Matsoukas. 2018. The Alexa meaning representation language. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 3 (Industry Papers), pages 177–184, New Orleans, Louisiana.

Jitin Krishnan, Antonios Anastasopoulos, Hemant Purohit, and Huzefa Rangwala. 2021. Multilingual code-switching for zero-shot cross-lingual intent prediction and slot filling.

Tom Kwiatkowski, Jennimaria Palomaki, Olivia Redfield, Michael Collins, Ankur Parikh, Chris Alberti, Danielle Epstein, Illia Polosukhin, Matthew Kelcey, Jacob Devlin, Kenton Lee, Kristina N. Toutanova, Llion Jones, Ming-Wei Chang, Andrew Dai, Jakob Uszkoreit, Quoc Le, and Slav Petrov. 2019. Natural Questions: A benchmark for question answering research. Transactions of the Association for Computational Linguistics.

Tom Kwiatkowski, Luke Zettlemoyer, Sharon Goldwater, and Mark Steedman. 2011. Lexical generalization in CCG grammar induction for semantic parsing. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1512–1523, Edinburgh, Scotland, UK.

Guillaume Lample, Alexis Conneau, Ludovic Denoyer, and Marc'Aurelio Ranzato. 2018. Unsupervised machine translation using monolingual corpora only. In ICLR 2018.

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics.

Haoran Li, Abhinav Arora, Shuohui Chen, Anchit Gupta, Sonal Gupta, and Yashar Mehdad. 2021. MTOP: A comprehensive multilingual task-oriented semantic parsing benchmark.

Percy Liang. 2016. Learning executable semantic parsers for natural language understanding. Commun. ACM, 59(9):68–76.

Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Bruce Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, and Ming Zhou. 2020. XGLUE: A new benchmark dataset for cross-lingual pre-training, understanding and generation.

Kevin Lin, Ben Bogin, Mark Neumann, Jonathan Berant, and Matt Gardner. 2019. Grammar-based neural text-to-query generation. CoRR, abs/1905.13326.

Yinhan Liu, Jiatao Gu, Naman Goyal, Xian Li, Sergey Edunov, Marjan Ghazvininejad, Mike Lewis, and Luke Zettlemoyer. 2020. Multilingual denoising pre-training for neural machine translation.

Shayne Longpre, Yi Lu, and Joachim Daiber. 2020. MKQA: A linguistically diverse benchmark for multilingual open domain question answering.

Wei Lu. 2014. Semantic parsing with relaxed hybrid trees. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1308–1318, Doha, Qatar.

Thang Luong, Quoc V. Le, Ilya Sutskever, Oriol Vinyals, and Lukasz Kaiser. 2016. Multi-task sequence to sequence learning. In International Conference on Learning Representations.

Jonathan Mallinson, Rico Sennrich, and Mirella Lapata. 2020. Zero-shot crosslingual sentence simplification. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5109–5126, Online. Association for Computational Linguistics.

Michael McCloskey and Neal J. Cohen. 1989. Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of Learning and Motivation - Advances in Research and Theory, 24(C):109–165.

Mehrad Moradshahi, Giovanni Campagna, Sina Semnani, Silei Xu, and Monica Lam. 2020. Localizing open-ontology QA semantic parsers in a day using machine translation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5970–5983, Online. Association for Computational Linguistics.

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop.

Parker Riley, Isaac Caswell, Markus Freitag, and David Grangier. 2020. Translationese as a language in "multilingual" NMT. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7737–7746, Online. Association for Computational Linguistics.

Tal Schuster, Ori Ram, Regina Barzilay, and Amir Globerson. 2019. Cross-lingual alignment of contextual word embeddings, with applications to zero-shot dependency parsing. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1599–1613, Minneapolis, Minnesota. Association for Computational Linguistics.

Abigail See, Peter J. Liu, and Christopher D. Manning. 2017. Get to the point: Summarization with pointer-generator networks. In ACL.

Peter Shaw, Philip Massey, Angelica Chen, Francesco Piccinno, and Yasemin Altun. 2019. Generating logical forms from graph representations of text and entities. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 95–106, Florence, Italy. Association for Computational Linguistics.

Tom Sherborne, Yumo Xu, and Mirella Lapata. 2020. Bootstrapping a crosslingual semantic parser. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 499–517, Online. Association for Computational Linguistics.

Alane Suhr, Ming-Wei Chang, Peter Shaw, and Kenton Lee. 2020. Exploring unexplored generalization challenges for cross-database semantic parsing. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8372–8388, Online. Association for Computational Linguistics.

Raymond Hendy Susanto and Wei Lu. 2017a. Neural architectures for multilingual semantic parsing. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 38–44, Stroudsburg, PA, USA.

Raymond Hendy Susanto and Wei Lu. 2017b. Semantic parsing with neural hybrid trees. In AAAI Conference on Artificial Intelligence, San Francisco, California, USA.

Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. CoRR, abs/1409.3215.

Yuqing Tang, Chau Tran, Xian Li, Peng-Jen Chen, Naman Goyal, Vishrav Chaudhary, Jiatao Gu, and Angela Fan. 2020. Multilingual translation with extensible multilingual pretraining and finetuning.

Jörg Tiedemann, Željko Agić, and Joakim Nivre. 2014. Treebank translation for cross-lingual parser induction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning, pages 130–140, Ann Arbor, Michigan. Association for Computational Linguistics.

Shyam Upadhyay, Manaal Faruqui, Gokhan Tur, Dilek Hakkani-Tur, and Larry Heck. 2018. (Almost) zero-shot cross-lingual spoken language understanding. In Proceedings of the IEEE ICASSP.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems 30, pages 5998–6008. Curran Associates, Inc.

Bailin Wang, Richard Shin, Xiaodong Liu, Oleksandr Polozov, and Matthew Richardson. 2019. RAT-SQL: Relation-aware schema encoding and linking for text-to-SQL parsers.

Yushi Wang, Jonathan Berant, and Percy Liang. 2015. Building a semantic parser overnight. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1332–1342, Stroudsburg, PA, USA.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, and Jamie Brew. 2019. HuggingFace's Transformers: State-of-the-art natural language processing. ArXiv, abs/1910.03771.

Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. 2016. Google's neural machine translation system: Bridging the gap between human and machine translation. CoRR, abs/1609.08144.

Haoran Xu and Philipp Koehn. 2021. Zero-shot cross-lingual dependency parsing through contextual embedding transformation.

Weijia Xu, Batool Haider, and Saab Mansour. 2020. End-to-end slot alignment and recognition for cross-lingual NLU. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 5052–5063, Online. Association for Computational Linguistics.

Pengcheng Yin and Graham Neubig. 2017. A syntactic neural model for general-purpose code generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 440–450, Vancouver, Canada.

Tao Yu, Rui Zhang, Kai Yang, Michihiro Yasunaga, Dongxu Wang, Zifan Li, James Ma, Irene Li, Qingning Yao, Shanelle Roman, Zilin Zhang, and Dragomir Radev. 2018. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium.

Mengjie Zhao, Yi Zhu, Ehsan Shareghi, Roi Reichart, Anna Korhonen, and Hinrich Schütze. 2020. A closer look at few-shot crosslingual transfer: Variance, benchmarks and baselines.

Victor Zhong, Mike Lewis, Sida I. Wang, and Luke Zettlemoyer. 2020. Grounded adaptation for zero-shot executable semantic parsing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6869–6882, Online. Association for Computational Linguistics.

Qile Zhu, Haidar Khan, Saleh Soltan, Stephen Rawls, and Wael Hamza. 2020. Don't parse, insert: Multilingual semantic parsing with insertion based decoding. In Proceedings of the 24th Conference on Computational Natural Language Learning, pages 496–506, Online. Association for Computational Linguistics.

A Additional Results
EN                    Avg.  Ba.   Bl.   Ca.   Ho.   Pu.   Res.  Rec.  So.
D_LF only             80.5  90.0  66.7  82.7  76.7  75.8  87.7  83.3  80.9
D_LF + D_NL           81.3  88.5  60.9  83.3  78.8  83.9  88.6  83.8  82.6
D_LF + D_NL + LP      81.4  87.7  64.4  85.1  77.8  80.1  88.0  85.6  83.0

Table 3: Denotation accuracy for Overnight (Wang et al., 2015) using the source English data. Domains are Basketball (Ba.), Blocks (Bl.), Calendar (Ca.), Housing (Ho.), Publications (Pu.), Recipes (Rec.), Restaurants (Res.) and Social Network (So.).

DE                    Avg.  Ba.   Bl.   Ca.   Ho.   Pu.   Res.  Rec.  So.
Baseline
Back-translation      60.1  75.7  50.9  61.0  55.6  50.4  69.9  46.3  71.4
Our model
D_LF only             58.4  70.3  51.1  61.9  54.0  49.7  65.4  42.1  73.1
D_LF + D_NL           62.7  73.1  56.1  66.1  58.7  49.7  70.2  57.9  70.1
D_LF + D_NL + LP      64.3  78.8  56.6  68.5  58.7  46.6  70.8  59.3  75.5

Table 4: Denotation accuracy for Overnight (Wang et al., 2015) using the German test set from Sherborne et al. (2020). Domains are Basketball (Ba.), Blocks (Bl.), Calendar (Ca.), Housing (Ho.), Publications (Pu.), Recipes (Rec.), Restaurants (Res.) and Social Network (So.).

ZH                    Avg.  Ba.   Bl.   Ca.   Ho.   Pu.   Res.  Rec.  So.
Baseline
Back-translation      48.1  62.3  39.6  49.8  43.1  48.3  51.4  29.2  61.2
Our model
D_LF only             48.0  53.7  49.6  53.0  50.8  36.0  52.1  23.1  65.3
D_LF + D_NL           49.5  56.5  49.4  55.4  56.1  35.4  54.2  24.9  64.1
D_LF + D_NL + LP      52.7  59.1  50.1  67.9  54.5  41.6  59.6  20.8  67.6

Table 5: Denotation accuracy for Overnight (Wang et al., 2015) using the Chinese test set from Sherborne et al. (2020). Domains are Basketball (Ba.), Blocks (Bl.), Calendar (Ca.), Housing (Ho.), Publications (Pu.), Recipes (Rec.), Restaurants (Res.) and Social Network (So.).