
GPT-too: A Language-Model-First Approach for AMR-to-Text Generation

Manuel Mager1∗  Ramón Fernandez Astudillo2  Tahira Naseem2  Md Arafat Sultan2  Young-Suk Lee2  Radu Florian2  Salim Roukos2
1 Institute for Natural Language Processing, University of Stuttgart, Germany
2 IBM Research AI, Yorktown Heights, NY 10598, USA
[email protected]  {ramon.astudillo, [email protected]}  {tnaseem, [email protected]}

∗ This research was done during an internship at IBM Research AI.

Abstract

Abstract Meaning Representations (AMRs) are broad-coverage sentence-level semantic graphs. Existing approaches to generating text from AMR have focused on training sequence-to-sequence or graph-to-sequence models on AMR annotated data only. In this paper, we propose an alternative approach that combines a strong pre-trained language model with cycle consistency-based re-scoring. Despite the simplicity of the approach, our experimental results show these models outperform all previous techniques on the English LDC2017T10 dataset, including the recent use of transformer architectures. In addition to the standard evaluation metrics, we provide human evaluation experiments that further substantiate the strength of our approach.

1 Introduction

Abstract Meaning Representation (AMR) (Banarescu et al., 2013) is a rooted, directed, acyclic graph with labeled edges (relations) and nodes (concepts) expressing "who is doing what to whom". AMR-to-text generates sentences representing the semantics underlying an AMR graph.

Initial works in AMR-to-text used transducers (Flanigan et al., 2016), phrase-based machine translation (Pourdamghani et al., 2016) and neural sequence-to-sequence (seq2seq) models with linearized graphs (Konstas et al., 2017). Cao and Clark (2019) leverage constituency parsing for generation. Beck et al. (2018) improve upon prior RNN graph encoding (Song et al., 2018) with Levi Graph Transformations. Damonte and Cohen (2019) compare multiple representations and find graph encoders to be the best. Guo et al. (2019) use RNN graph encoders with dense graph convolutional encoding. Ribeiro et al. (2019) use RNN encoders with dual graph representations. Transformer-based seq2seq (Vaswani et al., 2017) was first applied to AMR-to-text by Sinh and Le Minh (2019). Zhu et al. (2019) greatly improve over the prior state of the art by modifying self-attention to account for AMR graph structure. The use of transformers has also been explored recently by Wang et al. (2020), who propose a multi-head graph attention mechanism.

Pre-trained transformer representations (Radford et al., 2018; Devlin et al., 2019; Radford et al., 2019) use transfer learning to yield powerful language models that considerably outperform the prior art. They have also shown great success when fine-tuned to particular text generation tasks (See et al., 2019; Zhang et al., 2019; Keskar et al., 2019). Given their success, it would be desirable to apply pre-trained transformer models to a graph-to-text task like AMR-to-text, but the need for graph encoding precludes that option in principle. Feeding the network a sequential representation of the graph, such as a topological sort, loses some of the graph's representational power. Complex graph annotations such as AMR also contain many special symbols and constructs that depart from natural language and may not be interpretable by a pre-trained language model.

In this paper we explore the possibility of directly fine-tuning a pre-trained transformer language model on a sequential representation of AMR graphs, despite the expected difficulties listed above. For this we re-purpose a GPT-2 language model (Radford et al., 2019) to yield an AMR-to-text system. We show that it is surprisingly easy to fine-tune GPT-2 to learn an AMR graph-to-text mapping that outperforms the previous state of the art on automatic evaluation metrics. Since a single AMR graph corresponds to multiple sentences with the same meaning, we also provide human evaluation and semantic similarity metric results (Zhang et al., 2020), which are less dependent on reference text. Human evaluation and semantic similarity results highlight the positive impact of a strong language model strategy. Finally, we also introduce a simple re-scoring technique based on cycle consistency that further improves performance.

2 Fine-tuning GPT-2 for conditional language generation

To fine-tune a generative model (GPT-2; Radford et al., 2019) for conditional text generation, prior works fine-tune the language model to predict target text starting from the additional source text given as context. In our experiments, we found it beneficial to fine-tune on the joint distribution of AMR and text instead, i.e. to also reconstruct the source. Given a tokenized sentence w_1 ... w_N and the sequential AMR representation a_1 ... a_M, we maximize the joint probability

  p_{\text{GPT-2}}(w, a) = \prod_{j=1}^{N} p_{\text{GPT-2}}(w_j \mid w_{1:j-1}, a_{1:M}) \cdot \prod_{i=1}^{M} p_{\text{GPT-2}}(a_i \mid a_{1:i-1})

A special separator token is added to mark the end of the sequential AMR representation. Special AMR symbols that should not be interpreted literally are assigned tokens from the GPT-2 unused token list. In addition, we observed that freezing the input embeddings during fine-tuning had a positive impact on performance.

At test time, we provide the AMR as context, as in conventional conditional text generation:

  \hat{w}_j = \arg\max_{w_j} \{ p_{\text{GPT-2}}(w_j \mid w_{1:j-1}, a_{1:M}) \}
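The joint objective above amounts to ordinary language-model training on the concatenation of the linearized AMR, a separator token, and the target sentence. The sketch below illustrates this setup with the Hugging Face transformers library; the separator string "<AMR>", the short list of arc labels, the example AMR/sentence pair, the hyperparameters, and the choice of which embedding matrix to freeze are illustrative assumptions rather than the authors' released code (linked in Section 4), and a reasonably recent transformers version is assumed.

```python
# Minimal sketch of joint AMR+text fine-tuning of GPT-2 (see the objective above).
# Assumed, not from the paper: the "<AMR>" separator string, the tiny label list,
# the example pair, and all hyperparameters.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")   # GPT-2 small
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Separator marking the end of the sequential AMR, plus (in the full setup)
# all arc labels seen in training and the :root symbol.
tokenizer.add_tokens(["<AMR>", ":root", ":ARG0", ":ARG1", ":manner"])
model.resize_token_embeddings(len(tokenizer))

# The paper reports freezing the input embeddings during fine-tuning
# (in GPT-2 this matrix is tied to the output projection).
model.transformer.wte.weight.requires_grad = False

def make_example(amr_sequence: str, sentence: str) -> torch.Tensor:
    """Concatenate linearized AMR, separator and sentence into one sequence."""
    text = amr_sequence + " <AMR> " + sentence + tokenizer.eos_token
    return tokenizer(text, return_tensors="pt").input_ids

# One illustrative gradient step: the LM loss over the whole sequence
# factorizes exactly as the joint p(a) * p(w | a) above.
input_ids = make_example(
    "recommend-01 :ARG1 advocate-01 :ARG1 it :manner vigorous",
    "It is recommended to advocate it vigorously.",   # illustrative reference
)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-5
)
model.train()
loss = model(input_ids, labels=input_ids).loss
loss.backward()
optimizer.step()

# At test time the AMR plus separator is given as context and the model is
# decoded from there, which (greedily) implements the arg max above.
model.eval()
context = tokenizer(
    "recommend-01 :ARG1 advocate-01 :ARG1 it :manner vigorous <AMR>",
    return_tensors="pt",
).input_ids
generated = model.generate(
    context,
    max_length=context.shape[1] + 40,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(generated[0][context.shape[1]:]))
```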
3 Re-scoring via Cycle Consistency

The general idea of cycle consistency is to assess the quality of a system's output based on how well an external 'reverse' system can reconstruct the input from it. In previous works, cycle-consistency based losses have been used as part of the training objective in machine translation (He et al., 2016) and speech recognition (Hori et al., 2019). Cycle consistency has also been used for filtering synthetic training data for question answering (Alberti et al., 2019). Here we propose the use of a cycle consistency measure to re-score the system outputs.

In particular, we take the top k sentences generated by our system from each gold AMR graph and parse them using an off-the-shelf parser to obtain a second AMR graph. We then re-score each sentence using the standard AMR parsing metric Smatch (Cai and Knight, 2013) by comparing the gold and parsed AMRs.
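Concretely, the re-scoring step only needs an AMR parser and a Smatch scorer. The sketch below spells out the loop under two assumptions: parse_amr stands in for the off-the-shelf parser (the paper uses an implementation of Naseem et al., 2019) and smatch_f1 stands in for the standard Smatch scorer; both names are placeholders rather than real APIs. In the paper's setup the candidate sentences come from beam search with a beam size of 15.

```python
# Minimal sketch of cycle-consistency re-scoring. `parse_amr` and `smatch_f1`
# are placeholder callables standing in for an off-the-shelf AMR parser and
# the standard Smatch metric; they are not real library functions.
from typing import Callable, List, Tuple

def rescore_candidates(
    gold_amr: str,
    candidates: List[str],
    parse_amr: Callable[[str], str],
    smatch_f1: Callable[[str, str], float],
) -> List[Tuple[float, str]]:
    """Re-rank the top-k sentences generated for one gold AMR graph.

    Each candidate sentence is parsed back into an AMR graph (the 'reverse'
    system) and scored with Smatch against the gold graph; a higher score
    means the sentence better preserves the semantics of the gold AMR.
    """
    scored = []
    for sentence in candidates:
        parsed_amr = parse_amr(sentence)         # reverse direction: text -> AMR
        score = smatch_f1(gold_amr, parsed_amr)  # cycle-consistency measure
        scored.append((score, sentence))
    # Best-scoring candidate first.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```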
4 Experimental setup

Following previous work on AMR-to-text, we use the standard LDC2017T10 AMR corpus for evaluation of the proposed model. This corpus contains 36,521 training instances of AMR graphs in PENMAN notation and the corresponding texts. It also includes 1,368 and 1,371 development and test instances, respectively. We tokenize each input text using the JAMR toolkit (Flanigan et al., 2014). The concatenation of an AMR graph and the corresponding text is split into words, special symbols and sub-word units using the GPT-2 tokenizer. We add all arc labels seen in the training set and the root node :root to the vocabulary of the GPT-2 model, but we freeze the embedding layer for training. We use the Hugging Face implementation (Wolf et al., 2019) of GPT-2 small (GPT-2S), medium (GPT-2M) and large (GPT-2L). Fine-tuning converges after 6 epochs, which takes just a few hours on a V100 GPU.¹ For cycle-consistency re-scoring we use an implementation of Naseem et al. (2019) in PyTorch. For re-scoring experiments, we use a beam size of 15.

¹ Code for this paper is available at: https://github.com/IBM/GPT-too-AMR2text

AMR input representation. We test three variants of AMR representation. First, a depth-first search (DFS) through the graph following Konstas et al. (2017), where the input sequence is the path followed in the graph. Second, to see whether GPT-2 is in fact learning from the graph structure, we remove all the edges from the DFS, keeping only the concept nodes. This has the effect of removing the relation information between concepts, such as subject/object relations. As a third option, we use the PENMAN representation without any modification. The three input representations are illustrated below:

Nodes    recommend advocate-01 it vigorous
DFS      recommend :ARG1 advocate-01 :ARG1 it :manner vigorous
Penman   (r / recommend-01 :ARG1 (a / advocate-01 :ARG1 (i / it) :manner (v / vigorous)))

Model         Input                BLEU    chrF++
GPT-2S Rec.   Only nodes AMR        9.45   41.59
GPT-2S Rec.   Lin. AMR w/o edges   11.35   43.25
GPT-2S Rec.   Lin. AMR w/ edges    20.14   53.12
GPT-2S Rec.   Penman AMR           22.37   53.92
GPT-2M Rec.   Lin. AMR w/ edges    22.86   55.04
GPT-2M Rec.   Penman AMR           27.99   61.26

Table 1: Results on the LDC2017T10 development set using GPT-2 S(mall) and M(edium) with Rec(onstruction) loss (see §2) for different AMR representations (see §4).

Decoding. For generation, we experiment with greedy decoding, beam search, and nucleus sampling (Holtzman et al., 2019). For beam search, we explore beam sizes of 5, 10 and 15. As the system in some cases produces repetitive output at the end of the text, we additionally perform a post-processing step to remove these occurrences.

Approach             Decoding   BLEU    chrF++
GPT-2M Conditional   Greedy     25.73   57.2
GPT-2M Rec.
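To make the three input representations above concrete, the following sketch derives the nodes-only and DFS-with-edges linearizations from the PENMAN string of the running example. It assumes the third-party penman Python library (not mentioned in the paper) and that penman.decode keeps triples in the depth-first order in which they appear in the string; the helper name linearize is also an illustrative choice. Unlike the paper's illustration, which shows the root simply as "recommend", the sketch keeps full concept names such as recommend-01.

```python
# Illustrative derivation of the nodes-only and DFS linearizations from
# PENMAN notation, assuming the third-party `penman` library.
import penman

def linearize(penman_str: str):
    """Return (nodes-only, DFS-with-edges, raw PENMAN) input variants."""
    graph = penman.decode(penman_str)
    variables = graph.variables()

    nodes, dfs = [], []
    # Assumption: graph.triples preserves the order of the PENMAN string,
    # i.e. the depth-first traversal order of the graph.
    for source, role, target in graph.triples:
        if role == ":instance":          # concept node
            nodes.append(target)
            dfs.append(target)
        else:                            # relation (edge or attribute)
            dfs.append(role)
            if target not in variables:  # constants carry no concept of their own
                dfs.append(str(target))
    return " ".join(nodes), " ".join(dfs), penman_str

amr = "(r / recommend-01 :ARG1 (a / advocate-01 :ARG1 (i / it) :manner (v / vigorous)))"
nodes_only, dfs_with_edges, raw_penman = linearize(amr)
print(nodes_only)      # recommend-01 advocate-01 it vigorous
print(dfs_with_edges)  # recommend-01 :ARG1 advocate-01 :ARG1 it :manner vigorous
```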