
Surface Realisation Using Full Delexicalisation

Anastasia Shimorina, LORIA / Lorraine University ([email protected])
Claire Gardent, LORIA / CNRS ([email protected])

Abstract

Surface realisation (SR) maps a meaning representation to a sentence and can be viewed as consisting of three subtasks: word ordering, morphological inflection and contraction generation (e.g., clitic attachment in Portuguese or elision in French). We propose a modular approach to surface realisation which models each of these components separately, and evaluate our approach on the 10 languages covered by the SR'18 Surface Realisation Shared Task shallow track. We provide a detailed evaluation of how word order, morphological realisation and contractions are handled by the model, and an analysis of the differences in word ordering performance across languages.

1 Introduction

Surface realisation maps a meaning representation to a sentence. In data-to-text generation, it is part of a complex process aiming to select, compress and structure the input data into a text. In text-to-text generation, it can be used as a means to rephrase part or all of the input content. For instance, Takase et al. (2016) used surface realisation to generate a summary based on the meaning representations of multiple input documents, and Liao et al. (2018) used it to improve neural machine translation.

By providing parallel data of sentences and their meaning representations, the SR'18 Surface Realisation shared task (Mille et al., 2018) allows for a detailed evaluation and comparison of surface realisation models. Moreover, as it provides training and test data for multiple languages, it also allows for an analysis of how well these models handle languages with different morphological and topological properties.

The SR'18 shared task includes two tracks: a shallow track, where the input is an unordered, lemmatised dependency tree, and a deep track, where function words are removed and syntactic relations are replaced with semantic ones. In this paper, we focus on the shallow track of the SR'18 Shared Task and propose a neural approach which decomposes surface realisation into three subtasks: word ordering, morphological inflection and contraction generation (e.g., clitic attachment in Portuguese or elision in French). We provide a detailed analysis of how each of these phenomena (word order, morphological realisation and contraction) is handled by the model, and we discuss the differences between languages.

For reproducibility, all our experiments including data and scripts are available at https://gitlab.com/shimorina/emnlp-2019.

2 Related Work

Early approaches to surface realisation adopted statistical methods, including both pipelined (Bohnet et al., 2010) and joint (Song et al., 2014; Puduppully et al., 2017) architectures for word ordering and morphological generation.

Multilingual SR'18 was preceded by the SR'11 surface realisation task for the English language only (Belz et al., 2011). The systems submitted in 2011 were grammar-based and statistical in nature, mostly relying on a pipelined architecture. Recently, Marcheggiani and Perez-Beltrachini (2018) proposed a neural end-to-end approach based on graph convolutional encoders for the SR'11 deep track.

The SR'18 shallow track received submissions from eight teams, with seven of them dividing the task into two subtasks: word ordering and inflection. Only Elder and Hokamp (2018) developed a joint approach; however, they participated only in the English track.

For word ordering, five teams chose an approach based on neural networks, two used a

classifier, and one team resorted to a language model. As for the inflection subtask, five teams applied neural techniques, two used lexicon-based approaches, and one used an SMT approach (Basile and Mazzei, 2018; Castro Ferreira et al., 2018; Elder and Hokamp, 2018; King and White, 2018; Madsack et al., 2018; Puzikov and Gurevych, 2018; Singh et al., 2018; Sobrevilla Cabezudo and Pardo, 2018). Overall, neural components were dominant across all the participants. However, the official scores of the teams that went neural differ greatly. Furthermore, two teams (Elder and Hokamp, 2018; Sobrevilla Cabezudo and Pardo, 2018) applied data augmentation, which makes their results not strictly comparable to the others.

One of the interesting findings of the shared task is reported by Elder and Hokamp (2018), who showed that applying standard neural encoder-decoder models to jointly learn word ordering and inflection is highly challenging; their sequence-to-sequence baseline without data augmentation obtained 43.11 BLEU points on English.

Our model differs from previous work in three main ways. First, it performs word ordering on fully delexicalised data. Delexicalisation has been used previously, but mostly to handle rare words, e.g., named entities. Here we argue that surface realisation and, in particular, word ordering work better when delexicalising all input tokens. This captures the intuition that word ordering is mainly determined by the syntactic structure of the input. Second, we provide a detailed evaluation of how our model handles the three subtasks underlying surface realisation. While all SR'18 participants provided descriptions of their models, not all of them performed an in-depth analysis of model performance. Exceptions are the works of King and White (2018), who provided a separate evaluation for the morphological realisation module, and Puzikov and Gurevych (2018), who evaluated both the word ordering and inflection modules. However, it is not clear how each of those modules affects the global performance when merged in the full pipeline. In contrast, we propose a detailed incremental evaluation of each component of the full pipeline and show how each component impacts the final scores. Third, we introduce a linguistic analysis of the word ordering component, based on the dependency relations, allowing for a deeper error analysis of the developed system. Furthermore, our model explicitly integrates a module for contraction handling, as also done before by Basile and Mazzei (2018). We also address all ten languages proposed by the shared task and outline the importance of handling contractions.

3 Data

The SR'18 data (shallow track) is derived from the ten Universal Dependencies (UD) v2.0 treebanks (Nivre et al., 2017) and consists of (T, S) pairs where S is a sentence and T is the UD dependency tree of S after word order information has been removed and tokens have been lemmatised. The languages are those shown in Table 1, and the size of the datasets (training, dev and test) varies between 7,586 (Arabic) and 85,377 (Czech) instances, with most languages having around 12K instances (for more details about the data see Mille et al. (2018)).

4 Model

As illustrated by Example (1), surface realisation from SR'18 shallow meaning representations can be viewed as consisting of three main steps: word ordering, morphological inflection and contraction generation. Given an unordered dependency tree whose nodes are labelled with lemmas and morphological features (1a)¹, the lemmas must be assigned the appropriate order (1b), they must be inflected (1c), and contractions may take place (1d).

(1) a. the find be not meaning of life it about
    b. it be not about find the meaning of life
    c. It is n't about finding the meaning of life
    d. It isn't about finding the meaning of life

We propose a neural architecture which explicitly integrates these three subtasks as three separate modules in a pipeline: word ordering (WO) is applied first, then morphological realisation (WO+MR), and finally contractions (WO+MR+C) are handled.

4.1 Word Ordering

For word ordering, we combine a factored sequence-to-sequence model with an "extreme delexicalisation" step which replaces matching source and target tokens with an identifier.

¹ Features and tree structures have been omitted.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 3086–3096, Hong Kong, China, November 3–7, 2019. © 2019 Association for Computational Linguistics
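To make the decomposition concrete, here is a minimal sketch of the pipeline run on Example (1); the permutation, the inflection table and the contraction rule are hard-coded stand-ins for the learned WO, MR and C modules, not the paper's actual models.

```python
# Toy pipeline on Example (1): each stage is an illustrative stand-in
# for the corresponding learned module.

def order_words(lemma_bag):
    # Word ordering (WO): hard-coded target permutation yielding (1b).
    permutation = [7, 2, 3, 8, 1, 0, 4, 5, 6]
    return [lemma_bag[i] for i in permutation]

def inflect(lemmas):
    # Morphological realisation (MR): out-of-context lemma-to-form
    # lookup (morphological features omitted in this sketch).
    forms = {"it": "It", "be": "is", "not": "n't", "find": "finding"}
    return [forms.get(lemma, lemma) for lemma in lemmas]

def contract(tokens):
    # Contraction generation (C): merge adjacent word forms.
    return " ".join(tokens).replace("is n't", "isn't")

bag = "the find be not meaning of life it about".split()  # (1a)
print(contract(inflect(order_words(bag))))
# It isn't about finding the meaning of life
```

Each stage only needs the output of the previous one, which is what allows the three modules to be trained and evaluated separately.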

(a) Unordered Source Tree; (b) Output Lemmas with Gold Parse Tree

[Tree diagrams over the lemmas "John", "eat", "the", "apple", with the dependency relations root, nsubj, obj and det; the output sentence is "John eats the apple".]

Input: 2:noun:obj:1 3:det:DET:2 4:pnoun:nsubj:1 1:verb:root:0
Output: 4 1 3 2

Figure 1: Delexicalising and linearising (in the parse tree of the output sentence, the first row shows the lemmas, the second the word forms, the third the POS tags and the fourth the identifiers). Identifiers are assigned to the source tree nodes in the order given by depth-first search.

Delexicalisation. Delexicalisation has frequently been used in neural NLG to help handle unknown or rare items (Wen et al., 2015; Dušek and Jurcicek, 2015; Chen et al., 2018). Rare items are replaced by placeholders both in the input and in the output; models are trained on the delexicalised data; and a post-processing step ensures that the generated text is relexicalised using the placeholders' original values. In these approaches, delexicalisation is restricted to rare items (named entities). In contrast, we apply delexicalisation to all input lemmas. Abstracting away from specific lemmas reduces data sparsity, allows for the generation of rare or unknown words and, last but not least, captures the linguistic intuition that word ordering mainly depends on syntactic information (e.g., in English, the subject generally precedes the verb).

To create the delexicalised data, we need to identify matching input and output elements and to replace them by the same identifier. We also store a mapping (id, L, F) specifying which identifier id refers to which (L, F) pair, where L is a lemma and F is its set of morpho-syntactic features.

We identify matching input and output elements by comparing the unordered input tree provided by the SR'18 task with the parse tree of the output sentence provided by the UD treebanks (cf. Figure 1). Source and target nodes which share the same path to the root are then mapped to the same identifier. For instance, in Figure 1, the lemma "apple" has the same path to the root (obj:eat:root) in both the input and the output tree. Hence the same identifier is assigned to the two nodes. More generally, after linearisation through depth-first, left-to-right traversal of the input tree, each training instance captures the mapping between lemmas in the input tree and the same lemmas in the output sequence. For instance, given the example shown in Figure 1, delexicalisation will yield the training instance:

Input: tkn2 tkn3 tkn4 tkn1
Output: tkn4 tkn1 tkn3 tkn2

where tkni is the factored representation (see below) of each delexicalised input node.

Factored Sequence-to-Sequence Model. Following Elder and Hokamp (2018), we use a factored model (Alexandrescu and Kirchhoff, 2006) as a means of enriching the node representations input to the neural model. Each delexicalised tree node is modelled by a sequence of features. Separate embeddings are learned for each feature type, and the feature embeddings of each input node are concatenated to create its dense representation. As exemplified in Figure 1, we model each input placeholder as a concatenation of four features: the node identifier, its POS tag, its dependency relation to the parent node and its parent identifier².

Sequence-to-sequence model. We use the OpenNMT-py framework (Klein et al., 2017)³ to train factored sequence-to-sequence models with attention (Luong et al., 2015) and the copy and coverage mechanisms described in See et al. (2017). A single-layer LSTM is used for both encoder and decoder. We train using the full vocabulary

² The parent identifier of a root node is represented as 0.
³ Commit e61589d, https://github.com/OpenNMT/OpenNMT-py
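The identifier assignment and matching illustrated in Figure 1 can be sketched as follows. Two simplifications relative to the paper: output tokens are matched to input nodes by lemma rather than by their path to the root, and the morphological features F of the (id, L, F) mapping are omitted.

```python
# Sketch of the "extreme delexicalisation" step on the Figure 1
# example. Factored source tokens are emitted in identifier order;
# in the shared-task data the input node order is arbitrary.

def delexicalise(nodes, children, root, output_lemmas):
    # 1. Assign identifiers by depth-first, left-to-right traversal.
    ids, stack = {}, [root]
    while stack:
        n = stack.pop()
        ids[n] = len(ids) + 1
        stack.extend(reversed(children.get(n, [])))
    # 2. (id, L) mapping, used later to relexicalise the WO output.
    mapping = {ids[n]: nodes[n][0] for n in ids}
    # 3. Factored source tokens: id, POS, deprel, parent id (0 = root).
    parent = {c: p for p in children for c in children[p]}
    src = [f"{ids[n]}:{nodes[n][1]}:{nodes[n][2]}:{ids.get(parent.get(n), 0)}"
           for n in sorted(ids, key=ids.get)]
    # 4. Delexicalised target: identifiers in output-sentence order.
    by_lemma = {nodes[n][0]: ids[n] for n in ids}
    tgt = [by_lemma[lemma] for lemma in output_lemmas]
    return src, tgt, mapping

# Figure 1: "John eats the apple" as (lemma, POS, deprel) nodes.
nodes = {"eat": ("eat", "verb", "root"), "apple": ("apple", "noun", "obj"),
         "the": ("the", "det", "det"), "john": ("John", "pnoun", "nsubj")}
children = {"eat": ["apple", "john"], "apple": ["the"]}

src, tgt, mapping = delexicalise(nodes, children, "eat",
                                 ["John", "eat", "the", "apple"])
print(src)  # ['1:verb:root:0', '2:noun:obj:1', '3:det:det:2', '4:pnoun:nsubj:1']
print(tgt)  # [4, 1, 3, 2]
print([mapping[i] for i in tgt])  # relexicalised: ['John', 'eat', 'the', 'apple']
```

The final line shows the test-time direction: once the word ordering model has output a sequence of identifiers, the stored mapping recovers the lemmas.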

and the maximal length in the source and target for both the baseline and the proposed model. Models were trained for 20 epochs, with a mini-batch size of 64, a word embedding size of 300, and a hidden unit size of 450. They were optimised with SGD with a starting learning rate of 1.0. The learning rate is halved when perplexity does not decrease on the development set. Preliminary experiments showed that the lowest perplexity was reached on average at epoch 17, so this model was kept for decoding. Decoding is done using beam search with a beam size of 5. For each language, we train three models with different random seeds, and report the average performance and standard deviation.

The model is trained on delexicalised data. At test time, the token/identifier mapping is used to relexicalise the model output.

4.2 Morphological Realisation

The morphological realisation (MR) module consists in producing inflected word forms based on lemmas coupled with morphological features. For that module, we used a model recently proposed by Aharoni and Goldberg (2017), which achieves state-of-the-art results on several morphological inflection datasets: the CELEX dataset (Baayen et al., 1993; Dreyer et al., 2008), the Wiktionary dataset (Durrett and DeNero, 2013) and the SIGMORPHON 2016 dataset (Cotterell et al., 2016). Their model is based on a neural encoder-decoder architecture with hard monotonic attention and performs out-of-context morphological realisation: given a lemma and a set of morpho-syntactic features, it produces the corresponding word form.

We trained the model of Aharoni and Goldberg (2017) on (lemma + morpho-syntactic features⁴, form) pairs extracted from the SR'18 training data. We trained the model for 20 epochs with the default parameters provided in the implementation⁵.

In our pipeline architecture, morphological realisation is applied to the output of our word ordering model using the (id, L, F) mapping mentioned above. For each delexicalised token produced by the word ordering component, we retrieve the corresponding lemma and morpho-syntactic features (L, F) and apply our MR model to it so as to produce the corresponding word form.

While associating a lemma and its features with a corresponding form, the MR module operates without taking context into account, so it cannot perform some finer-grained operations, such as contraction, elision, and clitic attachment. We address that issue in the following section.

4.3 Contraction Generation

Contraction handling is the last step of our surface realisation pipeline. Example (2) shows some types of contractions.

(2) French: "Le chat dort." / "L'alouette chante." (Elision of the definite article le before a vowel: Le → L')
    Italian: *"In il mare." → "Nel mare." (Contraction of the preposition in and the article il: In il → Nel)
    Portuguese: *"Eis lo." → "Ei-lo." (Clitic pronoun attachment: Eis lo → Ei-lo)

We developed two modules for contraction generation: one based on regular expressions (Creg) and another based on a sequence-to-sequence model (Cs2s).

The sequence-to-sequence model is trained on pairs of sentences without and with contractions. The sentence with contractions (S+c) is the final sentence, i.e., the reference sentence in the SR'18 data. The sentence without contractions (S−c) is the corresponding sequence of word forms extracted from the UD CoNLL data.

The regular expression module is inspired by the decomposition of multi-word expressions, such as contractions, which is applied during the tokenisation step in parsing (Martins et al., 2009). We reversed the regular expressions given in TurboParser⁶ for the surface realisation task, and also added our own to tackle, for example, elision in French. The Cs2s and Creg modules were created for three languages: French, Italian, and Portuguese⁷.

5 Evaluation

We evaluate each component of our approach separately. We start by providing a detailed evaluation of how the model handles word ordering

⁴ POS and morphological features.
⁵ https://github.com/roeeaharoni/morphological-reinflection/blob/master/src/hard_attention.py
⁶ https://github.com/andre-martins/TurboParser/tree/master/python/tokenizer
⁷ Although contractions are also present in Spanish, we did not develop a module for them, since the UD Spanish AnCora treebank does not split them at the token level, in contrast to other UD treebanks.

     ar          cs          en          es          fi          fr         it          nl          pt          ru
BL   29.6±1.39   48±0.7      53.57±0.15  46.5±0.78   27.2±0.4    46.4±0.5   49.07±0.9   36.6±0.7    44.3±0.26   58.1±0.46
WO   34.9±0.2    57.97±0.06  59.1±0.36   52.33±0.31  43.1±0.53   50.0±0.0   53.17±0.5   47.03±0.59  51.77±0.32  64.73±0.23
∆    +5.3        +9.97       +5.53       +5.83       +15.9       +3.6       +4.1        +10.43      +7.47       +6.63

Table 1: Word Ordering: BLEU scores on lemmatised data. Mean and standard deviation across three random seeds. BL: baseline. All pairwise comparisons of BL and our model showed a statistically significant difference in BLEU via bootstrap resampling (1,000 samples, p < .05).

(Section 5.1). We then go on to analyse the respective contributions of morphological realisation (Section 5.2) and contraction generation (Section 5.3). Finally, we discuss the performance of the overall surface realisation model (Section 5.4). Throughout the evaluation, we used the SR'18 evaluation scripts to compute automatic metrics⁸.

5.1 Word Ordering

5.1.1 BLEU scores

We evaluate our word ordering component by computing the BLEU-4 score (Papineni et al., 2002) between the sequence of lemmas it produces and the lemmatised reference sentence extracted from the UD files. The baseline is the same model without delexicalisation. As Table 1 shows, there is a marked, statistically significant difference between the baseline and our approach, which indicates that delexicalisation does improve word ordering.

5.1.2 Word Ordering Constraints

We also investigate the degree to which our results conform with the word ordering constraints of the various languages, focusing on the following dependency relations: DET (determiner), NSUBJ (nominal subject), OBJ (object), AMOD (adjectival modifier) and ACL (nominal clausal modifier). For each of these dependency relations, we compare the relative ordering of the corresponding (head, dependent) pairs in the reference data and in our system predictions.

To determine whether the dependent should precede or follow its head, we use the gold-standard dependency tree of the UD treebanks. Since for the system predictions we do not have a parse tree, we additionally record the distance between head and dependent (in the reference data) and compare it with the distance between the same two items in the system output. For instance, for the DET relation, given the gold sentence (3a) and the generated sentence (3b), we extract (3c) from the UD parse tree and (3d) from the predicted sentence, where each triple is of the form either (dep, head, distance) or (head, dep, distance) and distance is the distance between head and dependent.

(3) a. GOLD: The yogi tried the advanced asana
    b. PRED: The yogi tried the asana advanced
    c. G-triples: (the_dep, yogi_head, 1), (the_dep, asana_head, 2)
    d. P-triples: (the, yogi, 1), (the, asana, 1)
    Exact match: 1; Approximate match: 2

We then compute exact matches (the order and the distance to the head are exactly the same) and approximate matches (the order is preserved but the distance differs by one token⁹). Table 2 shows the results and compares them with the non-delexicalised approach.

Global Score. The all deprels column summarises the scores for all dependency relations present in the treebanks (not just DET, NSUBJ, OBJ, AMOD and ACL). For the exact match, most languages score above average (from 0.51 to 0.71). That is, the relative word order and the position of the dependent with respect to the head are correctly predicted in more than half of the cases. Approximate match yields higher scores, with most languages scoring between 0.65 and 0.80, suggesting that a higher proportion of correct relative orderings is achieved (modulo mispositioning and false positives).

Long Range Dependencies. It is noticeable that, for all languages, accuracy drops for the ACL relation. We conjecture that two factors make it difficult for the model to make the correct prediction: heterogeneity and long-range dependencies. As the ACL relation captures different types of clausal modifiers (finite and non-finite), it is harder for the model to learn the corresponding patterns. As the modifier is a clause, the distance between head

⁸ http://taln.upf.edu/pages/msr2018-ws/SRST.html#evaluation
⁹ We could of course consider further approximate matches differing by, say, 2, 3 or 5 tokens. But we refrain from this as it would increase the number of false positives.
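The exact/approximate matching of Section 5.1.2 can be sketched on Example (3); for simplicity, gold and predicted triples are assumed here to be aligned one-to-one and both oriented as (dependent, head, distance), a simplification of the paper's setup.

```python
# Sketch of the head/dependent positioning metric on Example (3).

def positioning_scores(gold, pred):
    """Count exact matches (same order, same distance to the head)
    and approximate matches (same order, distance off by at most one
    token; exact matches therefore also count as approximate)."""
    exact = approx = 0
    for (gd, gh, gdist), (pd, ph, pdist) in zip(gold, pred):
        if (gd, gh) == (pd, ph):  # head/dependent order preserved
            if gdist == pdist:
                exact += 1
            if abs(gdist - pdist) <= 1:
                approx += 1
    return exact, approx

g = [("the", "yogi", 1), ("the", "asana", 2)]  # (3c) G-triples
p = [("the", "yogi", 1), ("the", "asana", 1)]  # (3d) P-triples
print(positioning_scores(g, p))  # (1, 2)
```

The second gold triple has distance 2 but the predicted distance is 1, so it counts only as an approximate match, reproducing "Exact match: 1; Approximate match: 2" from the example.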

     det            nsubj          obj            amod           acl            all deprels
     +1     +2      +1     +2      +1     +2      +1     +2      +1     +2      +1     +2
ar   0.36   0.37    0.45   0.52    0.35   0.48    0.52   0.59    0.32   0.41    0.38   0.47
∆    +0.036 +0.047  -0.036 -0.067  -0.069 -0.113  -0.088 -0.106  -0.063 -0.061  -0.044 -0.071
cs   0.86   0.9     0.49   0.63    0.51   0.64    0.83   0.87    0.47   0.62    0.63   0.74
∆    -0.077 -0.071  -0.041 -0.054  -0.048 -0.053  -0.134 -0.126  -0.071 -0.073  -0.068 -0.074
en   0.76   0.85    0.71   0.85    0.76   0.84    0.71   0.76    0.53   0.65    0.63   0.74
∆    -0.10  -0.086  -0.002 -0.053  -0.074 -0.064  -0.15  -0.138  -0.058 -0.104  -0.06  -0.08
es   0.73   0.83    0.55   0.71    0.54   0.70    0.43   0.49    0.39   0.58    0.55   0.68
∆    -0.02  -0.074  -0.012 -0.073  +0.004 -0.057  +0.059 +0.036  -0.056 -0.12   -0.038 -0.088
fi   0.71   0.81    0.64   0.75    0.46   0.60    0.70   0.76    0.50   0.61    0.51   0.65
∆    -0.241 -0.244  -0.194 -0.189  -0.118 -0.137  -0.303 -0.296  -0.261 -0.312  -0.154 -0.16
fr   0.76   0.86    0.60   0.78    0.60   0.75    0.46   0.51    0.51   0.68    0.58   0.71
∆    -0.016 -0.235  +0.014 -0.037  +0.031 -0.029  +0.117 +0.104  -0.1   0.17    -0.038 -0.093
it   0.73   0.82    0.59   0.70    0.58   0.73    0.40   0.46    0.52   0.65    0.56   0.69
∆    -0.022 -0.058  -0.021 -0.07   -0.044 -0.101  +0.058 +0.017  -0.054 -0.072  -0.04  -0.088
nl   0.71   0.79    0.46   0.56    0.38   0.49    0.74   0.77    0.41   0.53    0.49   0.60
∆    -0.121 -0.115  -0.068 -0.07   -0.068 -0.087  -0.278 -0.29   -0.22  -0.277  -0.118 -0.13
pt   0.74   0.80    0.56   0.72    0.57   0.73    0.42   0.44    0.48   0.63    0.54   0.67
∆    -0.032 -0.051  -0.072 -0.114  -0.041 -0.094  +0.034 +0.043  -0.122 -0.204  -0.068 -0.103
ru   NA     NA      0.65   0.79    0.65   0.74    0.79   0.85    0.45   0.67    0.71   0.80
∆    —      —       -0.033 -0.057  -0.037 -0.033  -0.095 -0.1    +0.022 -0.027  -0.05  -0.059

Table 2: Proportion of correct head/dependent positioning for the five selected dependency relations (det, nsubj, obj, amod, acl) and overall performance across all dependency relations. +1: exact match; +2: approximate match, i.e., head and dependent are in the correct order but there is a one-token difference between gold and prediction. NA: no such dependency relation found in the treebank. ∆: difference in performance between the baseline and our delexicalised model (negative values indicate that the delexicalised model performs better; cf. Section 5.1.2).

              ar          cs          en          es          fi          fr          it          nl          pt          ru
MR Accuracy   91.05       98.89       98.3        99.07       92.66       96.77       96.85       87.6        98.80       98.23
WO            34.9±0.2    57.97±0.06  59.1±0.36   52.33±0.31  43.1±0.53   50.0±0.0    53.17±0.5   47.03±0.59  51.77±0.32  64.73±0.23
WO+MR (S−c)   28.6±0.26   56.1±0.1    54.3±0.3    51.4±0.3    38.63±0.59  44.97±0.25  47.3±0.6    42.3±0.53   50.7±0.26   60.93±0.23
∆             -6.3        -1.87       -4.8        -0.93       -4.47       -5.03       -5.87       -4.73       -1.07       -3.8

Table 3: Morphological Realisation Results. MR Accuracy: accuracy of the MR module. WO: BLEU scores on lemmas. WO+MR: BLEU scores on inflected tokens.

(the nominal being modified) and dependent (the verb of the clause modifier) can be long, which again is likely to impede learning.

Irregular Order. For cases where the head/dependent order is irregular, the scores are lower. For instance, in Dutch the object may occur either before (46.9% of the cases in the test data) or after the verb, depending on whether it occurs in a subordinate or a main clause. Relatedly, the OBJ exact match score is the lowest (0.38) for this language. Similarly, in Romance languages, where the adjective (AMOD relation) can be either pre-posed (head-final construction, HF) or post-posed (head-initial construction, HI), exact match scores are lower for this relation than for the others. For instance, the Portuguese test data contains 71% HF and 29% HI occurrences of the AMOD relation and, correspondingly, the scores for that relation are much lower than for the DET, NSUBJ and OBJ relations for that language. A similar pattern can be observed for Spanish, French and Italian.

More detailed statistics, including other relations and performance with respect to head-directionality, can be found in the supplementary material.

Non-delexicalised Baseline. We also compare our delexicalised model with the non-delexicalised baseline: ∆ in Table 2 shows the difference in performance between the two models. Overall, the scores favour the delexicalised approach (negative delta in the all deprels column for all languages), supporting the results given by the automatic metric. However, for some dependency relations, the lexicalised baseline shows the usefulness of word information, for example when predicting AMOD relations for Romance languages (positive delta for French, Italian, Spanish, and Portuguese). Indeed, pre-posed adjectives in those languages constitute a limited lexical group.

5.2 Morphological Realisation

Table 3 shows the results for the WO+MR model. The top line (MR Accuracy) indicates the accuracy of the MR model on the SR'18 test data, which is computed by comparing its output with gold word forms. As the table shows, the accuracy is very high overall, ranging from 87.6 to 99.07, with 9 of the 10 languages having an accuracy above 90. This confirms the high accuracy of the model when performing morphological inflection out of context.

The third line (WO+MR (S−c)) shows the BLEU scores for our WO+MR model, i.e., when the MR model is applied to the output of the WO model. Here we use an oracle setting which ignores contractions. That is, we compare the WO+MR output not with the final sentence but with the sentence before contraction applies (the ability to handle contractions is investigated in the next section).

As the table shows, the delta in BLEU scores between the model with (WO+MR) and without (WO) morphological realisation mirrors the accuracy of the morphological realisation model: as the accuracy of the morphological inflection model decreases, the delta increases. For instance, for Arabic, the MR accuracy is among the lowest (91.05) and, correspondingly, the decrease in BLEU score when going from word ordering to word ordering with morphological realisation is the largest (-6.3).

5.3 Contraction Generation

To assess the degree to which contractions are used, we compute BLEU-4 between the gold sequence of word forms from the UD treebanks and the reference sentence (Table 4, Line S−c/S+c).

                   ar         cs          en          es          fi          fr          it          nl          pt          ru
S−c/S+c            47.1       96.1        90.9        98.4        98.5        70.8        65.1        99.5        66.3        96.9
WO+MR (S−c)        28.6±0.26  56.1±0.1    54.3±0.3    51.4±0.3    38.63±0.59  44.97±0.25  47.3±0.6    42.3±0.53   50.7±0.26   60.93±0.23
WO+MR (S+c)        15.8±0.1   54.83±0.21  53.3±0.53   51.03±0.31  38.37±0.55  36.23±0.42  31.5±0.4    42.27±0.55  34.0±0.62   60.43±0.15
∆                  -12.8      -1.27       -1.0        -0.37       -0.26       -8.74       -15.8       -0.03       -16.7       -0.5
WO+MR+Creg (S+c)   —          —           —           —           —           41.8±0.26   40.0±0.66   —           46.13±0.29  —
WO+MR+Cs2s (S+c)   —          —           —           —           —           40.33±0.36  39.93±0.17  —           44.57±0.64  —

Table 4: Contraction Generation Results (BLEU scores). S−c/S+c: a sentence without contractions vs. a reference sentence including contractions; S−c: BLEU with respect to sentences before contractions; S+c: BLEU with respect to the reference sentence. The scores were computed on detokenised sequences.

As the table shows, this BLEU score is very low for some languages (Arabic, French, Italian, Portuguese), indicating a high level of contractions. These differences are reflected in the results of our WO+MR model: the higher the level of contractions, the larger the delta between the BLEU score on the reference sentence without contractions (WO+MR, S−c) and the reference sentence with contractions (WO+MR, S+c).

This shows the limits of out-of-context morphological realisation. While the model is good at producing a word form given its lemma and a set of morpho-syntactic features, the lack of contextual information means that contractions cannot be handled.

Adding a contraction module permits improving results for those languages where contraction is frequent (Table 4, Lines WO+MR+Creg, WO+MR+Cs2s). Gains range from +5 points for French to +12 for Portuguese when comparing to WO+MR. We achieved better results with the contraction module based on regular expressions (Creg) than with the neural module (Cs2s). In a relatively simple task such as contraction generation, rule-based methods are more reliable and, overall, preferable due to their robustness and ease of repair compared to neural models, which may, for instance, hallucinate incorrect content.

5.4 Global Evaluation

Finally, we compare our approach with the best results obtained by the SR'18 participants and with OSU's results (King and White, 2018) using BLEU-4, DIST and NIST scores. OSU results are treated separately, since some of their scores were published after the shared task had ended.

Table 5 shows the results. They are mixed. Our model yields the best results for Czech (BLEU: +1.63), Finnish (BLEU: +0.87), Dutch (BLEU: +9.99) and Russian (BLEU: +2.53). However, it underperforms on Arabic (BLEU: -9.8), English

        ar          cs          en          es          fi          fr          it          nl          pt          ru
BLEU
SR'18   16.2        25.05       55.29       49.47       23.26       52.03       44.46       32.28       30.82       34.34
OSU     25.6        53.2        66.30       65.30       37.5        38.2        42.1        25.5        47.6        57.9
Ours    15.8±0.1    54.83±0.21  53.3±0.53   51.03±0.31  38.37±0.55  41.8±0.26   40.0±0.66   42.27±0.55  46.13±0.29  60.43±0.15
DIST
SR'18   44.37       36.48       79.29       51.73       41.21       55.54       58.61       57.81       60.7        34.56
OSU     46.7        58.1        70.2        61.5        58.7        53.7        59.7        57.8        66.0        59.9
Ours    27.63±0.06  63.53±0.15  62.77±0.15  61.33±0.15  51.83±0.55  55.23±0.72  53.73±0.21  54.13±0.15  57.0±0.4    71.23±0.12
NIST
SR'18   7.15        10.74       10.86       11.12       9.36        9.85        9.11        8.64        7.55        13.06
OSU     7.15        13.5        12.0        12.7        9.56        8.00        8.70        7.33        9.13        14.2
Ours    6.04±0.02   13.6±0.05   10.95±0.09  11.63±0.02  10.41±0.03  8.93±0.08   8.99±0.01   9.51±0.03   9.44±0.04   13.96±0.03

Table 5: BLEU, DIST and NIST scores on the SR'18 test data (shallow track). SR'18 denotes the official best results of the shared task; these lines do not include the OSU scores, since those are given in the line below. We also excluded the ADAPT and NILC scores, as they were obtained using data augmentation. OSU is the submission of King and White (2018).

(BLEU: -13), Spanish (BLEU: -14.27), French (BLEU: -10.23), Italian (BLEU: -4.46) and Portuguese (BLEU: -1.47). Based on the evaluation of each of our modules, these results can be explained as follows.

The languages for which our model outperforms the state of the art are languages for which the WO model performs best, the accuracy of the morphological realiser is high and the level of contractions is low. For those languages, improving the accuracy of the word ordering model would further improve results.

For four of the languages where the model underperforms (namely, Arabic, French, Portuguese and Italian), the level of contraction is high. This indicates that improvements can be gained by improving the handling of contractions, e.g., by learning a joint model that would take into account both morphological inflection and contraction.

6 Conclusion

While surface realisation is a key component of NLG applications, most work in this domain has focused on the development of language-specific models. By providing multilingual training and test sets, the SR'18 shared task opens up the possibility to investigate how language-specific properties such as word order and morphological variation impact performance.

In this paper, we presented a modular approach to surface realisation and applied it to the ten languages of the SR'18 shallow track.

For word ordering, we proposed a simple approach where the data is delexicalised, the input tree is linearised using depth-first search, and the mapping between input tree and output lemma sequence is learned using a factored sequence-to-sequence model. Experimental results show that full delexicalisation markedly improves performance. Linguistically, this confirms the intuition that the mapping between shallow dependency structure and word order can be learned independently of the specific words involved.

We further carried out a detailed evaluation of how our word ordering model performs on the ten languages of the SR'18 shallow track. While differences in annotation consistency, in the number of dependency relations¹⁰ and in the frequency counts for each dependency relation in each dataset make it difficult to conclude anything from the differences in overall scores between languages, the evaluation of head/dependent word ordering constraints highlighted the fact that long-distance relations, such as ACL, and irregular word ordering constraints (e.g., the position of the verb in Dutch main and subordinate clauses) negatively impact results.

For morphological realisation and contractions, we showed that applying morphological realisation out of context, as is done by most of the SR'18 participating systems¹¹, yields poor results for those languages (Portuguese, French, Arabic, Italian) where contractions are frequent. We explored two ways of handling contractions (a neural sequence-to-sequence model and a rule-based model) and showed that adding contraction han-

¹⁰ The number of distinct dependency relations present in the treebanks ranges between 29 (ar, es) and 44 (en).
¹¹ The only exception is Castro Ferreira et al. (2018), who train an SMT model on pairs of lemmatised/non-lemmatised sentences.
While surface realisation is a key component of For morphological realisation and contractions, NLG applications, most work in this domain has we showed that applying morphological realisa- focused on the development of language specific tion out of context, as is done by most of the models. By providing multi-lingual training and SR’18 participating systems11, yields poor results test set, the SR’18 shared task opens up the possi- for those languages (Portuguese, French, Arabic, bility to investigate how language specific proper- Italian) where contractions are frequent. We ex- ties such as word order and morphological varia- plored two ways of handling contractions (a neu- tion impact performance. ral sequence-to-sequence model and a rule-based In this paper, we presented a modular approach model) and showed that adding contraction han- to surface realisation and applied it to the ten lan- guages of the SR’18 shallow track. 10The number of distinct dependency relations present in For word ordering, we proposed a simple ap- the treebank ranges between 29 (ar, es) and 44 (en). 11The only exception is Castro Ferreira et al.(2018) who proach where the data is delexicalised, the input train an SMT model on pairs of lemmatised/non-lemmatised tree is linearised using depth-first search and the sentences.

3093 dling strongly improves performance (from +5.57 Thiago Castro Ferreira, Sander Wubben, and Emiel to 12.13 increase in BLEU score for the rule- Krahmer. 2018. Surface realization shared task 2018 based model depending on the language). More (sr18): The tilburg university approach. In Proceed- ings of the First Workshop on Multilingual Surface generally, our work on contractions points to the Realisation, pages 35–38. Association for Computa- need for SR models to better take into account tional Linguistics. the fine-grained structure of words. For instance, le → l’ Mingje Chen, Gerasimos Lampouras, and Andreas in French, the article is elided ( ) when Vlachos. 2018. Sheffield at e2e: structured predic- the following word starts with a vowel. In fu- tion approaches to end-to-end language generation. ture work, we plan to explore the development of a Technical report, E2E Challenge System Descrip- joint model that simultaneously handles morpho- tions. logical realisation and word ordering while using Ryan Cotterell, Christo Kirov, John Sylak-Glassman, finer grained word representations, such as fast- David Yarowsky, Jason Eisner, and Mans Hulden. Text embeddings (Bojanowski et al., 2017) or byte 2016. The SIGMORPHON 2016 shared Task— pair encoding (BPE; Gage, 1994; Sennrich et al., Morphological reinflection. In Proceedings of the 14th SIGMORPHON Workshop on Computational 2016). Research in Phonetics, Phonology, and Morphol- ogy, pages 10–22, Berlin, Germany. Association for Computational Linguistics. References Markus Dreyer, Jason Smith, and Jason Eisner. 2008. Roee Aharoni and Yoav Goldberg. 2017. Morphologi- Latent-variable modeling of string transductions cal inflection generation with hard monotonic atten- with finite-state methods. In Proceedings of the tion. 
In Proceedings of the 55th Annual Meeting of 2008 Conference on Empirical Methods in Natural the Association for Computational Linguistics (Vol- Language Processing, pages 1080–1089, Honolulu, ume 1: Long Papers), pages 2004–2015, Vancouver, Hawaii. Association for Computational Linguistics. Canada. Association for Computational Linguistics. Greg Durrett and John DeNero. 2013. Supervised Andrei Alexandrescu and Katrin Kirchhoff. 2006. Fac- learning of complete morphological paradigms. In tored neural language models. In Proceedings of Proceedings of the 2013 Conference of the North the Language Technology Conference of the American Chapter of the Association for Computa- NAACL, Companion Volume: Short Papers, pages tional Linguistics: Human Language Technologies, 1–4. Association for Computational Linguistics. pages 1185–1195, Atlanta, Georgia. Association for Computational Linguistics. R Harald Baayen, Richard Piepenbrock, and Leon Gu- likers. 1993. The CELEX lexical (CD- Ondrejˇ Dusekˇ and Filip Jurcicek. 2015. Training ROM). a natural language generator from unaligned data. In Proceedings of the 53rd Annual Meeting of the Valerio Basile and Alessandro Mazzei. 2018. The Association for Computational Linguistics and the dipinfo-unito system for srst 2018. In Proceedings 7th International Joint Conference on Natural Lan- of the First Workshop on Multilingual Surface Reali- guage Processing (Volume 1: Long Papers), pages sation, pages 65–71. Association for Computational 451–461. Association for Computational Linguis- Linguistics. tics.

Anja Belz, Mike White, Dominic Espinosa, Eric Kow, Henry Elder and Chris Hokamp. 2018. Generating Deirdre Hogan, and Amanda Stent. 2011. The first high-quality surface realizations using data augmen- surface realisation shared task: Overview and eval- tation and factored sequence models. In Proceed- uation results. In Proceedings of the 13th European ings of the First Workshop on Multilingual Surface Workshop on Natural Language Generation, pages Realisation, pages 49–53. Association for Computa- 217–226. Association for Computational Linguis- tional Linguistics. tics. Philip Gage. 1994. A new algorithm for data compres- Bernd Bohnet, Leo Wanner, Simon Mille, and Alicia sion. C Users J., 12(2):23–38. Burga. 2010. Broad coverage multilingual deep sen- tence generation with a stochastic multi-level real- David King and Michael White. 2018. The osu real- izer. In Proceedings of the 23rd International Con- izer for srst ’18: Neural sequence-to-sequence in- ference on Computational Linguistics, pages 98– flection and incremental locality-based linearization. 106. Association for Computational Linguistics. In Proceedings of the First Workshop on Multilin- gual Surface Realisation, pages 39–48. Association Piotr Bojanowski, Edouard Grave, Armand Joulin, and for Computational Linguistics. Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Associa- Guillaume Klein, Yoon Kim, Yuntian Deng, Jean tion for Computational Linguistics, 5:135–146. Senellart, and Alexander Rush. 2017. Opennmt:

3094 Open-source toolkit for neural machine translation. Gonzales´ Saavedra, Matias Grioni, Normunds In Proceedings of ACL 2017, System Demonstra- Gruz¯ ¯ıtis, Bruno Guillaume, Nizar Habash, Jan tions, pages 67–72. Association for Computational Hajic,ˇ Linh Ha` My,˜ Dag Haug, Barbora Hladka,´ Linguistics. Petter Hohle, Radu Ion, Elena Irimia, Anders Jo- hannsen, Fredrik Jørgensen, Huner¨ Kas¸ıkara, Hi- Kexin Liao, Logan Lebanoff, and Fei Liu. 2018. Ab- roshi Kanayama, Jenna Kanerva, Natalia Kot- stract meaning representation for multi-document syba, Simon Krek, Veronika Laippala, Phng summarization. arXiv preprint arXiv:1806.05655. Leˆ H`ong,ˆ Alessandro Lenci, Nikola Ljubesiˇ c,´ Olga Lyashevskaya, Teresa Lynn, Aibek Makazhanov, Thang Luong, Hieu Pham, and Christopher D. Man- Christopher Manning, Cat˘ alina˘ Mar˘ anduc,˘ David ning. 2015. Effective approaches to attention-based Marecek,ˇ Hector´ Mart´ınez Alonso, Andre´ Mar- neural machine translation. In Proceedings of the tins, Jan Masek,ˇ Yuji Matsumoto, Ryan McDon- 2015 Conference on Empirical Methods in Natural ald, Anna Missila,¨ Verginica Mititelu, Yusuke Language Processing, pages 1412–1421. Associa- Miyao, Simonetta Montemagni, Amir More, Shun- tion for Computational Linguistics. suke Mori, Bohdan Moskalevskyi, Kadri Muis- chnek, Nina Mustafina, Kaili Mu¨urisep,¨ Lng Andreas Madsack, Johanna Heininger, Nyamsuren Nguy˜enˆ Thi, Huy`enˆ Nguy˜enˆ Thi Minh, Vitaly Davaasambuu, Vitaliia Voronik, Michael Kaufl,¨ and . . Nikolaev, Hanna Nurmi, Stina Ojala, Petya Osen- Robert Weißgraeber. 2018. Ax semantics’ submis- ova, Lilja Øvrelid, Elena Pascual, Marco Passarotti, sion to the surface realization shared task 2018. In Cenel-Augusto Perez, Guy Perrier, Slav Petrov, Proceedings of the First Workshop on Multilingual Jussi Piitulainen, Barbara Plank, Martin Popel, Surface Realisation, pages 54–57. Association for Lauma Pretkalnin¸a, Prokopis Prokopidis, Tiina Puo- Computational Linguistics. 
lakainen, Sampo Pyysalo, Alexandre Rademaker, Diego Marcheggiani and Laura Perez-Beltrachini. Loganathan Ramasamy, Livy Real, Laura Rituma, 2018. Deep graph convolutional encoders for struc- Rudolf Rosa, Shadi Saleh, Manuela Sanguinetti, tured data to text generation. In Proceedings of Baiba Saul¯ıte, Sebastian Schuster, Djame´ Seddah, the 11th International Conference on Natural Lan- Wolfgang Seeker, Mojgan Seraji, Lena Shakurova, guage Generation, pages 1–9, Tilburg University, Mo Shen, Dmitry Sichinava, Natalia Silveira, Maria The Netherlands. Association for Computational Simi, Radu Simionescu, Katalin Simko,´ Maria´ ˇ Linguistics. Simkova,´ Kiril Simov, Aaron Smith, Alane Suhr, Umut Sulubacak, Zsolt Szant´ o,´ Dima Taji, Takaaki Andre´ Martins, Noah Smith, and Eric Xing. 2009. Tanaka, Reut Tsarfaty, Francis Tyers, Sumire Ue- Concise integer linear programming formulations matsu, Larraitz Uria, Gertjan van Noord, Viktor for dependency parsing. In Proceedings of the Joint Varga, Veronika Vincze, Jonathan North Washing- Conference of the 47th Annual Meeting of the ACL ton, Zdenekˇ Zabokrtskˇ y,´ Amir Zeldes, Daniel Ze- and the 4th International Joint Conference on Natu- man, and Hanzhi Zhu. 2017. Universal dependen- ral Language Processing of the AFNLP, pages 342– cies 2.0. LINDAT/CLARIN digital library at the In- 350, Suntec, Singapore. Association for Computa- stitute of Formal and Applied Linguistics (UFAL),´ tional Linguistics. Faculty of Mathematics and Physics, Charles Uni- versity. Simon Mille, Anja Belz, Bernd Bohnet, Yvette Gra- ham, Emily Pitler, and Leo Wanner. 2018. The first Kishore Papineni, Salim Roukos, Todd Ward, and Wei- multilingual surface realisation shared task (sr’18): Jing Zhu. 2002. Bleu: a method for automatic eval- Overview and evaluation results. In Proceedings of uation of machine translation. In Proceedings of the the First Workshop on Multilingual Surface Reali- 40th Annual Meeting of the Association for Compu- sation, pages 1–12. 
Association for Computational tational Linguistics. Linguistics. Ratish Puduppully, Yue Zhang, and Manish Shrivas- Joakim Nivre, Zeljkoˇ Agic,´ Lars Ahrenberg, Maria Je- tava. 2017. Transition-based deep input lineariza- sus Aranzabe, Masayuki Asahara, Aitziber Atutxa, tion. In Proceedings of the 15th Conference of the Miguel Ballesteros, John Bauer, Kepa Ben- European Chapter of the Association for Computa- goetxea, Riyaz Ahmad Bhat, Eckhard Bick, Cristina tional Linguistics: Volume 1, Long Papers, pages Bosco, Gosse Bouma, Sam Bowman, Marie Can- 643–654, Valencia, Spain. Association for Compu- dito, Guls¸en¨ Cebiroglu˘ Eryigit,˘ Giuseppe G. A. tational Linguistics. Celano, Fabricio Chalub, Jinho Choi, C¸agrı˘ Yevgeniy Puzikov and Iryna Gurevych. 2018. Binlin: C¸ oltekin,¨ Miriam Connor, Elizabeth Davidson, A simple method of dependency tree linearization. Marie-Catherine de Marneffe, Valeria de Paiva, In Proceedings of the First Workshop on Multilin- Arantza Diaz de Ilarraza, Kaja Dobrovoljc, Tim- gual Surface Realisation, pages 13–28. Association othy Dozat, Kira Droganova, Puneet Dwivedi, for Computational Linguistics. Marhaba Eli, Tomazˇ Erjavec, Richard´ Farkas, Jen- nifer Foster, Claudia´ Freitas, Katar´ına Gajdosovˇ a,´ Abigail See, Peter J. Liu, and Christopher D. Manning. Daniel Galbraith, Marcos Garcia, Filip Ginter, Iakes 2017. Get to the point: Summarization with pointer- Goenaga, Koldo Gojenola, Memduh Gokırmak,¨ generator networks. In Proceedings of the 55th An- Yoav Goldberg, Xavier Gomez´ Guinovart, Berta nual Meeting of the Association for Computational

3095 Linguistics (Volume 1: Long Papers), pages 1073– 1083. Association for Computational Linguistics. Rico Sennrich, Barry Haddow, and Alexandra Birch. 2016. Neural machine translation of rare words with subword units. In Proceedings of the 54th An- nual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1715– 1725, Berlin, Germany. Association for Computa- tional Linguistics. Shreyansh Singh, Ayush Sharma, Avi Chawla, and A.K. Singh. 2018. Iit (bhu) varanasi at msr-srst 2018: A language model based approach for nat- ural language generation. In Proceedings of the First Workshop on Multilingual Surface Realisation, pages 29–34. Association for Computational Lin- guistics. Marco Antonio Sobrevilla Cabezudo and Thiago Pardo. 2018. Nilc-swornemo at the surface real- ization shared task: Exploring syntax-based word ordering using neural models. In Proceedings of the First Workshop on Multilingual Surface Reali- sation, pages 58–64. Association for Computational Linguistics.

Linfeng Song, Yue Zhang, Kai Song, and Qun Liu. 2014. Joint morphological generation and syntactic linearization. In Twenty-Eighth AAAI Conference on Artificial Intelligence.

Sho Takase, Jun Suzuki, Naoaki Okazaki, Tsutomu Hirao, and Masaaki Nagata. 2016. Neural headline generation on abstract meaning representation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1054–1059.

Tsung-Hsien Wen, Milica Gasic, Nikola Mrkšić, Pei-Hao Su, David Vandyke, and Steve Young. 2015. Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1711–1721. Association for Computational Linguistics.
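As an illustration of the word-ordering input described in the conclusion, the delexicalised depth-first linearisation of a shallow dependency tree can be sketched as follows. This is a minimal sketch, not the authors' implementation: the Node structure, the POS/deprel token format and the bracketing scheme are illustrative assumptions (the SR'18 data is CoNLL-U with richer morphological features).

```python
# Sketch: delexicalised depth-first linearisation of a shallow
# dependency tree, the kind of sequence fed to a word-ordering model.
# Node fields and the "POS_deprel" token format are assumptions.
from dataclasses import dataclass, field

@dataclass
class Node:
    lemma: str                  # kept aside for later re-lexicalisation
    upos: str                   # universal POS tag
    deprel: str                 # dependency relation to the head
    children: list = field(default_factory=list)

def linearise(node):
    """Depth-first traversal emitting delexicalised tokens:
    each token is a POS/deprel pair; brackets delimit subtrees."""
    tokens = [f"{node.upos}_{node.deprel}"]
    for child in node.children:
        tokens.append("(")
        tokens.extend(linearise(child))
        tokens.append(")")
    return tokens

# "John sees Mary" as an unordered shallow tree rooted in the verb.
tree = Node("see", "VERB", "root",
            [Node("John", "PROPN", "nsubj"),
             Node("Mary", "PROPN", "obj")])

print(" ".join(linearise(tree)))
# VERB_root ( PROPN_nsubj ) ( PROPN_obj )
```

Because the tokens carry only POS tags and dependency relations, the sequence-to-sequence ordering model never sees the lemmas themselves, which is the intuition behind full delexicalisation.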

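The rule-based contraction handling discussed in the conclusion (e.g., French elision le → l' before a vowel-initial word) can be sketched along these lines. The rule set below is a toy assumption for illustration only: the elidable-word list and vowel inventory are simplified, and the paper's rule-based module covers further language-specific contractions (e.g., Portuguese clitic attachment).

```python
# Sketch: rule-based French elision as a post-processing contraction
# step. ELIDABLE and VOWELS are simplified assumptions; treating every
# h-initial word as vowel-initial ignores aspirated h (e.g. "le héros").
ELIDABLE = {"le", "la", "de", "que", "ne", "se", "je", "me", "te"}
VOWELS = "aeiouàâéèêëîïôûh"

def contract(tokens):
    """Merge an elidable word with a following vowel-initial word:
    ["le", "arbre"] -> ["l'arbre"]."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tok.lower() in ELIDABLE and nxt[:1].lower() in VOWELS:
            out.append(tok[:-1] + "'" + nxt)   # drop final vowel, attach
            i += 2
        else:
            out.append(tok)
            i += 1
    return out

print(" ".join(contract(["le", "arbre", "de", "la", "maison"])))
# l'arbre de la maison
```

Even this toy version shows why applying morphological realisation word by word, out of context, fails on such languages: the surface form of the article depends on the word that follows it.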