Deep Graph Convolutional Encoders for Structured Data to Text Generation
Diego Marcheggiani (1,2)   Laura Perez-Beltrachini (1)
1: ILCC, University of Edinburgh   2: ILLC, University of Amsterdam
[email protected]   [email protected]

Structured Data-to-Text Generation

[Figure: two example inputs with their reference texts. Left, a WebNLG knowledge-base graph (nodes: Above the Veil, Aenir, Castle, Into Battle, The Violet Keystone, Australians; edge labels: precededBy, followedBy, country) paired with "Above the Veil is an Australian novel and the sequel to Aenir and Castle. It was followed by Into the Battle and The Violet Keystone." Right, an SR11Deep semantic dependency graph (predicates agree and purchase; roles A0, A1, AINV, AM-TMP) paired with "the Giant agreed last month to purchase the carrier."]

WebNLG [Gardent et al., 2017]   SR11Deep [Belz et al., 2011]

Sequential Encoder-Decoder Models

[Figure: the WebNLG graph above is fed, as a linearised sequence, to a sequential encoder-decoder model that generates the reference text token by token.]

Directly Encoding Input Structure?

Sequential encoders require a separate input linearisation step.
After training they may learn some "structure" representation.
However, the explicit structure of the input is NOT directly exploited.

Graph Convolutional Network (GCN) encoder:
✓ Input encoding guided by the graph structure
✓ Explicit encoding of long-distance dependencies given by the graph
✓ Requires less data to learn them
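Concretely, here is a minimal sketch of what such a structured input looks like when kept as a graph rather than flattened into a string, using the WebNLG example above. This is an illustrative representation, not the authors' released code; all variable names are assumptions.

```python
# Hypothetical in-memory form of the "Above the Veil" WebNLG input:
# entities become nodes; each triple keeps its label and direction.
triples = [
    ("Above the Veil", "precededBy", "Aenir"),
    ("Above the Veil", "precededBy", "Castle"),
    ("Above the Veil", "followedBy", "Into Battle"),
    ("Above the Veil", "followedBy", "The Violet Keystone"),
    ("Above the Veil", "country", "Australians"),
]

nodes = []
for s, _, o in triples:
    for entity in (s, o):
        if entity not in nodes:
            nodes.append(entity)
node_id = {n: i for i, n in enumerate(nodes)}

# Directed, labelled edges, plus a self-loop so every node sees itself.
edges = [(node_id[s], node_id[o], rel) for s, rel, o in triples]
edges += [(i, i, "self") for i in range(len(nodes))]
print(len(nodes), "nodes,", len(edges), "edges")  # 6 nodes, 11 edges
```

This graph form is exactly what the GCN encoder described next consumes directly, with no order imposed on the triples.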
Directed-Labelled Graph Convolutional Networks

Message passing [Kipf and Welling, 2016], extended with edge directions, labels, and importance gates [Marcheggiani and Titov, 2017]:

    h'_v = \rho\Big( \sum_{u \in N(v)} g_{u,v} \big( W_{dir(u,v)} h_u + b_{lab(u,v)} \big) \Big)

where N(v) are the neighbours of v (including v itself through a self-loop edge), W_{dir(u,v)} is a weight matrix chosen by edge direction (incoming, outgoing, self), b_{lab(u,v)} is a bias chosen by edge label, g_{u,v} is a scalar gate weighting the importance of each edge, and \rho is a nonlinearity. (A PyTorch sketch of this layer follows the example outputs below.)

Single-Layer GCN Encoder

[Figure: an embedding layer for the input nodes "agree, giant, purchase, carrier, the", topped by one GCN layer; each labelled edge (A0, A1, AINV) applies its own weight matrix W in both directions.]

Stacked Layers and Skip-Connections

[Figure: three GCN layers stacked on the embedding layer for the same input, with skip-connections between layers; k stacked layers let information flow between nodes up to k edges apart.]

GCN Encoder-Decoder with Attention

[Figure: the GCN encoder states feed an LSTM decoder through an attention mechanism [Luong et al., 2015], generating "giant agreed to ..." token by token.] (A sketch of this attention is given below.)

Reification on Knowledge Base (KB) Graphs [Baader 2003]

[Figure: the edge-labelled graph "Above the Veil --precededBy--> Aenir", "Above the Veil --precededBy--> Castle" is reified: each precededBy edge becomes its own relation node, connected to its arguments by generic subj and obj edges.]

Representing KB relations as entities (nodes):
- enables attention over them
- reduces the number of KB relations to be modelled as network parameters
(A sketch of this transformation is also given below.)

Experimental Setup

✓ Encoders comparison   ✓ Existing-systems comparison

WebNLG [Gardent et al., 2017]
- GCN (4-layer encoder with residual connections, 1-layer decoder, 256 dim)
- LSTM (1-layer encoder, 1-layer decoder, 256 dim) + linearisation
- GCN + pre-trained embeddings and copy mechanism (GCN++)

SR11Deep [Belz et al., 2011]
- GCN (7-layer encoder with dense connections, 1-layer decoder, 500 dim)
- LSTM (1-layer encoder, 1-layer decoder, 500 dim) + linearisation
- GCN also encoding the morphological features present in the input (GCN+feat)

* Model selection done on the development set.

GCN Performance on WebNLG

[Bar chart: average BLEU-4 over 3 runs. Encoders comparison: GCN edges out the LSTM baseline and GCN++ improves further. Systems comparison: GCN++ against the best challenge entry and the neural systems PKUWriter, Melbourne and ADAPT. Full numbers in the evaluation table at the end of this document.]

* PKUWriter, Melbourne and ADAPT are neural systems that participated in the WebNLG challenge.

GCN Performance on SR11Deep

[Bar chart: average BLEU-4. Encoders comparison: GCN (0.65) and GCN+feat (0.67) clearly outperform the LSTM baseline (0.38). Systems comparison: the non-neural systems STUMBA-D and TBDIL reach 0.79 and 0.81. Full numbers in the evaluation table at the end of this document.]

* STUMBA-D and TBDIL are non-neural systems built as pipelines of classifiers.

Example Outputs: WebNLG

Input graph:
(William Anders dateOfRetirement 1969-09-01)
(William Anders was a crew member of Apollo 8)
(Apollo 8 commander Frank Borman)
(Apollo 8 backup pilot Buzz Aldrin)

(LSTM)  William Anders was a crew member of the OPERATOR operated Apollo 8 and retired on September 1st 1969 .
(GCN)   William Anders was a crew member of OPERATOR 's Apollo 8 alongside backup pilot Buzz Aldrin and backup pilot Buzz Aldrin .
(GCN++) william anders , who retired on the 1st of september 1969 , was a crew member on apollo 8 along with commander frank borman and backup pilot buzz aldrin .
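As promised above, a minimal PyTorch sketch of the directed-labelled GCN layer, implementing h'_v = ρ(Σ_{u∈N(v)} g_{u,v}(W_{dir(u,v)} h_u + b_{lab(u,v)})). This is not the released graph-2-text code; class and argument names are assumptions, and it is a close but not exact reimplementation of Marcheggiani and Titov (2017).

```python
import torch
import torch.nn as nn

class GraphConvLayer(nn.Module):
    """One directed-labelled GCN layer with scalar edge gates."""

    def __init__(self, dim, num_labels):
        super().__init__()
        # One weight matrix per edge direction (incoming, outgoing, self-loop).
        self.w_dir = nn.ModuleDict(
            {d: nn.Linear(dim, dim, bias=False) for d in ("in", "out", "self")}
        )
        # One bias vector per edge label.
        self.b_lab = nn.Embedding(num_labels, dim)
        # Gate parameters: direction-specific weights, label-specific biases.
        self.gate_w = nn.ModuleDict(
            {d: nn.Linear(dim, 1) for d in ("in", "out", "self")}
        )
        self.gate_b = nn.Embedding(num_labels, 1)

    def forward(self, h, edges):
        # h: (num_nodes, dim); edges: (u, v, label_id, direction) tuples.
        # Each original edge s -> o should appear twice, as (s, o, lab, "in")
        # and (o, s, lab, "out"), plus one ("self") edge per node.
        out = torch.zeros_like(h)
        for u, v, lab, d in edges:
            msg = self.w_dir[d](h[u]) + self.b_lab.weight[lab]
            g = torch.sigmoid(self.gate_w[d](h[u]) + self.gate_b.weight[lab])
            out = out.index_add(0, torch.tensor([v]), (g * msg).unsqueeze(0))
        return torch.relu(out)
```

Stacking k such layers lets information travel across up to k edges; the residual ("res") setting of the ablation study in the backup slides corresponds to wrapping each call as h = h + layer(h, edges).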
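Next, a minimal sketch of Luong-style global attention over the GCN node states, as in the encoder-decoder figure. The "general" scoring variant is assumed here, and the shapes and names are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class LuongAttention(nn.Module):
    """Global attention over encoder states [Luong et al., 2015]."""

    def __init__(self, dim):
        super().__init__()
        self.w = nn.Linear(dim, dim, bias=False)   # "general" score
        self.out = nn.Linear(2 * dim, dim)

    def forward(self, dec_h, enc_h):
        # dec_h: (batch, dim) decoder state; enc_h: (batch, nodes, dim).
        scores = torch.bmm(enc_h, self.w(dec_h).unsqueeze(2))  # (b, n, 1)
        alpha = torch.softmax(scores, dim=1)                   # edge weights
        context = (alpha * enc_h).sum(dim=1)                   # (b, dim)
        # Attentional hidden state: tanh(W_c [context; dec_h]).
        return torch.tanh(self.out(torch.cat([context, dec_h], dim=1)))
```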
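Finally, a small sketch of the reification step on KB graphs: every labelled edge (subject, relation, object) is replaced by a fresh relation node tied to its arguments with generic subj/obj edges. The function name and the edge orientation chosen here are assumptions.

```python
def reify(triples):
    """Rewrite (subject, relation, object) triples as a graph whose only
    edge labels are 'subj' and 'obj'; relations become nodes (attended over
    like entities) instead of per-relation network parameters."""
    nodes, edges = [], []
    entity_id = {}
    for s, rel, o in triples:
        for entity in (s, o):
            if entity not in entity_id:
                entity_id[entity] = len(nodes)
                nodes.append(entity)
        rel_id = len(nodes)          # one fresh node per relation *instance*
        nodes.append(rel)
        edges.append((rel_id, entity_id[s], "subj"))
        edges.append((rel_id, entity_id[o], "obj"))
    return nodes, edges

nodes, edges = reify([
    ("Above the Veil", "precededBy", "Aenir"),
    ("Above the Veil", "precededBy", "Castle"),
])
# nodes: ['Above the Veil', 'Aenir', 'precededBy', 'Castle', 'precededBy']
# each 'precededBy' node is linked to its subject and object.
```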
Example Outputs: SR11Deep

Reference: The economy 's temperature will be taken from several vantage points this week , with readings on trade , output , housing and inflation .

(LSTM) the economy 's accords will be taken from several phases this week , housing and inflation readings on trade , housing and inflation .
(GCN)  the economy 's temperatures will be taken from several vantage points this week , with reading on trades output , housing and inflation .

Concluding Remarks

- A GCN-based generation architecture that directly encodes the explicit structure of the input.
- GCN-based models outperform a sequential baseline on automatic evaluation, and improve on over- and under-generation cases.
- The relational inductive bias of the GCN encoder produces more informative representations of the input [Battaglia et al., 2018].

Future Work

Other input graph representations:
• Abstract Meaning Representations (AMR) [Banarescu et al., 2013]
• Scoped semantic representations [Van Noord et al., 2018]
• Scene graphs [Schuster et al., 2015]
Multi-lingual training of GCN layers with universal dependencies [Mille et al., 2017]

Code (PyTorch) + Data: github.com/diegma/graph-2-text

Thank you! Questions?

Backup: Separate Input Linearisation Step

A sequential encoder needs the input graph flattened into a single sequence, and many triple orders are possible, e.g.:

Above the Veil followedBy Into Battle | Above the Veil followedBy The Violet Keystone | ...
Above the Veil country Australians | Above the Veil followedBy Into Battle | ...
Above the Veil precededBy Aenir | Above the Veil precededBy Castle | ...

(A code sketch of this step closes the document.)

Backup: Automatic Evaluation

Test results, WebNLG task:

Encoder     BLEU        METEOR    TER
LSTM        .526±.010   .38±.00   .43±.01
GCN         .535±.004   .39±.00   .44±.02
GCN++       .559±.017   .39±.01   .41±.01
ADAPT       .606        .44       .37
MELBOURNE   .545        .41       .40
PKUWRITER   .512        .37       .45

Test results, SR11Deep task:

Encoder     BLEU        METEOR    TER
LSTM        .377±.007   .65±.00   .44±.01
GCN         .647±.005   .77±.00   .24±.01
GCN+feat    .666±.027   .76±.01   .25±.01

Backup: Ablation Study

BLEU and model size for the LSTM baseline and GCN encoders of increasing depth (1L-7L), with no skip-connections (none), residual connections (res), or dense connections (den):

            BLEU                                  SIZE
Model       none        res         den           none  res   den
LSTM        .543±.003   -           -             4.3   -     -
GCN 1L      .537±.006   -           -             4.3   -     -
GCN 2L      .545±.016   .553±.005   .552±.013     4.5   4.5   4.7
GCN 3L      .548±.012   .560±.013   .557±.001     4.7   4.7   5.2
GCN 4L      .537±.005   .569±.003   .558±.005     4.9   4.9   6.0
GCN 5L      .516±.022   .561±.016   .559±.003     5.1   5.1   7.0
GCN 6L      .508±.022   .561±.007   .558±.018     5.3   5.3   8.2
GCN 7L      .492±.024   .546±.023   .564±.012     5.5   5.5   9.6
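To close, a minimal sketch of the linearisation step from the backup slide above, needed only by the sequential baseline. The separator and the (arbitrary) triple order are assumptions; it is exactly this order choice that the GCN encoder sidesteps by reading the graph directly.

```python
def linearise(triples):
    """Flatten a set of triples into one token sequence for an LSTM encoder."""
    return " | ".join(f"{s} {rel} {o}" for s, rel, o in triples)

print(linearise([
    ("Above the Veil", "followedBy", "Into Battle"),
    ("Above the Veil", "precededBy", "Aenir"),
]))
# -> Above the Veil followedBy Into Battle | Above the Veil precededBy Aenir
```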