
Deep Graph Convolutional Encoders for Structured Data to Text Generation

Diego Marcheggiani¹,²   Laura Perez-Beltrachini¹

¹ILCC, University of Edinburgh   ²ILLC, University of Amsterdam   [email protected]   [email protected]

Structured Data-to-Text Generation

[Figure: two example inputs with their target texts.
WebNLG [Gardent et al., 2017]: a KB graph about the novel Above the Veil (relations country, precededBy, followedBy over Australians, Aenir, Castle, Into Battle, The Violet Keystone); target: "Above the Veil is an Australian novel and the sequel to Aenir and Castle. It was followed by Into the Battle and The Violet Keystone."
SR11Deep [Belz et al., 2011]: a deep syntactic/semantic graph over the predicates agree and purchase (role edges A0, A1, AINV, AM-TMP); target (partially recovered): "... giant agreed last month to purchase the carrier."]

Sequential Encoder-Decoder Models

[Figure: the Above the Veil KB graph fed to a sequential encoder-decoder, which generates: "Above the Veil is an Australian novel and the sequel to Aenir and Castle. It was followed by Into the Battle and The Violet Keystone."]

Directly Encoding Input Structure?

• Sequential encoders require a separate input linearisation step
• After training they will learn a "structure" representation
• However, the explicit structure of the input is NOT directly exploited

⇒ Graph Convolutional Network (GCN) encoder
✓ Input encoding guided by the graph structure
✓ Explicit encoding of long-distance dependencies given by the graph
✓ Requires less data to learn them

Directed-Labelled Graph Convolutional Networks

• Message passing [Kipf and Welling, 2016]
• Edge directions, labels and importance [Marcheggiani and Titov, 2017]

[Figure: a node v receiving gated, labelled, directed messages from its neighbours.]

$$h'_v = \rho\Big(\sum_{u \in \mathcal{N}(v)} g_{u,v}\,\big(W_{\mathrm{dir}(u,v)}\, h_u + b_{\mathrm{lab}(u,v)}\big)\Big)$$
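A minimal sketch of one such layer, assuming PyTorch (the implementation language of the released code) but not taken from graph-2-text itself; the edge-list input format, the ModuleDict keys and the exact gate parametrisation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DirectedLabelledGCNLayer(nn.Module):
    """Hypothetical directed-labelled GCN layer with edge gates (a sketch)."""
    def __init__(self, dim, num_labels):
        super().__init__()
        # One weight matrix per edge direction: incoming, outgoing, self-loop.
        self.W = nn.ModuleDict({d: nn.Linear(dim, dim, bias=False)
                                for d in ("in", "out", "self")})
        # One bias vector per edge label: b_lab.
        self.b = nn.Embedding(num_labels, dim)
        # Scalar edge gate g_{u,v}, here computed from the source state and the label.
        self.gate_w = nn.ModuleDict({d: nn.Linear(dim, 1) for d in ("in", "out", "self")})
        self.gate_b = nn.Embedding(num_labels, 1)

    def forward(self, h, edges):
        # h: (num_nodes, dim) node states from the layer below (or the embedding layer).
        # edges: iterable of (u, v, label_id, direction), direction in {"in","out","self"};
        #        self-loops are included as explicit edges.
        messages, targets = [], []
        for u, v, lab, d in edges:
            lab = torch.tensor(lab)
            msg = self.W[d](h[u]) + self.b(lab)                            # W_dir(u,v) h_u + b_lab(u,v)
            gate = torch.sigmoid(self.gate_w[d](h[u]) + self.gate_b(lab))  # g_{u,v}
            messages.append(gate * msg)
            targets.append(v)
        # Sum the gated messages arriving at each node v, then apply rho = ReLU.
        agg = torch.zeros_like(h).index_add(0, torch.tensor(targets), torch.stack(messages))
        return torch.relu(agg)
```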

Single-Layer GCN Encoder

[Figure: a single GCN layer on top of the embedding layer for the input nodes agree, giant, purchase, carrier, the; each labelled edge (A0, A1, AINV) applies a direction-specific weight matrix (W for one direction, W' for the other).]

Stacked Layers and Skip-Connections

[Figure: three stacked GCN layers (layer 1, layer 2, layer 3) on top of the embedding layer for the same input (agree, giant, purchase, carrier, the; edges A0, A1, AINV), with skip-connections between layers.]
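A sketch of stacking several such layers with residual skip-connections (as in the WebNLG encoder); the dense variant used for SR11Deep, which concatenates the outputs of all previous layers, is not shown. It reuses the hypothetical DirectedLabelledGCNLayer from the sketch above.

```python
import torch.nn as nn

class StackedGCNEncoder(nn.Module):
    """Hypothetical stack of directed-labelled GCN layers with residual connections."""
    def __init__(self, dim, num_labels, num_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            [DirectedLabelledGCNLayer(dim, num_labels) for _ in range(num_layers)])

    def forward(self, h, edges):
        # h: (num_nodes, dim) output of the embedding layer.
        for layer in self.layers:
            h = h + layer(h, edges)   # residual (skip) connection around each GCN layer
        return h
```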

GCN Encoder-Decoder with Attention

[Figure: the GCN encoder over the input graph (agree, giant, purchase, carrier, the; edges A0, A1, AINV) feeds an LSTM decoder with an attention mechanism [Luong et al., 2015], which generates "giant agreed to ...".]
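A sketch of the "general" scoring variant of Luong et al. (2015) attention, applied over the final GCN node representations at each decoding step; the LSTM decoder, input embeddings and output softmax around it are omitted, and all names are illustrative.

```python
import torch
import torch.nn as nn

class LuongAttention(nn.Module):
    """Hypothetical global attention (general scoring) over GCN node states."""
    def __init__(self, dim):
        super().__init__()
        self.W_a = nn.Linear(dim, dim, bias=False)      # scoring: s_t^T W_a h_u
        self.W_c = nn.Linear(2 * dim, dim, bias=False)  # combines context and decoder state

    def forward(self, dec_state, enc_nodes):
        # dec_state: (dim,) decoder hidden state s_t at the current step.
        # enc_nodes: (num_nodes, dim) final GCN representations of the input nodes.
        scores = enc_nodes @ self.W_a(dec_state)   # (num_nodes,) alignment scores
        alpha = torch.softmax(scores, dim=0)       # attention weights over input nodes
        context = alpha @ enc_nodes                # (dim,) attended summary of the graph
        return torch.tanh(self.W_c(torch.cat([context, dec_state])))  # attentional state
```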

Reification on Knowledge Base (KB) Graphs [Baader 2003]

[Figure: the chain Above the Veil --precededBy--> Aenir --precededBy--> Castle is reified: each precededBy relation becomes its own node, linked to its arguments by subj and obj edges.]

Representing KB relations as entities:
• enables attention over them
• reduces the number of KB relations to be modelled as network parameters
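A sketch of the reification step on plain (subject, relation, object) triples; the tuple format and the direction of the subj/obj edges are illustrative assumptions.

```python
def reify(triples):
    """Turn KB triples into a node list plus subj/obj-labelled edges (a sketch)."""
    nodes, edges, index = [], [], {}

    def node(name):
        # Entities are shared across triples; each relation mention gets its own node.
        if name not in index:
            index[name] = len(nodes)
            nodes.append(name)
        return index[name]

    for subj, rel, obj in triples:
        r = len(nodes)              # fresh node for this relation instance
        nodes.append(rel)
        edges.append((r, node(subj), "subj"))   # relation node --subj--> subject entity
        edges.append((r, node(obj), "obj"))     # relation node --obj--> object entity
    return nodes, edges

# Example from the slide:
# reify([("Above the Veil", "precededBy", "Aenir"), ("Aenir", "precededBy", "Castle")])
# nodes -> ['precededBy', 'Above the Veil', 'Aenir', 'precededBy', 'Castle']
```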

Experimental Setup

✓ Encoders comparison   ✓ Existing systems comparison

WebNLG [Gardent et al., 2017]
• GCN (4-layer encoder with residual connections, 1-layer decoder, 256 dim)
• LSTM (1-layer encoder, 1-layer decoder, 256 dim) + linearisation
• GCN + pre-trained embeddings and copy mechanism (GCN++)

SR11Deep [Belz et al., 2011]
• GCN (7-layer encoder with dense connections, 1-layer decoder, 500 dim)
• LSTM (1-layer encoder, 1-layer decoder, 500 dim) + linearisation
• GCN encoding the morphological features present in the input (GCNmorph)

∗ Model selection done on the development set

GCN performance on WebNLG

[Bar chart: average BLEU-4 (‡ averaged over 3 runs) on WebNLG. Encoders comparison: GCN vs. LSTM; systems comparison: GCN++ vs. the best challenge systems PKUWriter, Melbourne and ADAPT.]

∗ PKUWRITER, MELBOURNE and ADAPT: neural systems that participated in the WebNLG challenge

GCN performance on SR11Deep

[Bar chart: average BLEU-4 on SR11Deep. Encoders comparison: GCN and GCNmorph vs. LSTM; systems comparison vs. STUMBA-D and TBDIL.]

∗ STUMBA-D and TBDIL: non-neural systems based on a pipeline of classifiers

Example Outputs WebNLG

Input graph: (William Anders dateOfRetirement 1969-09-01) (William Anders was a crew member of Apollo 8) (Apollo 8 commander Frank Borman) (Apollo 8 backup pilot Buzz Aldrin)

(LSTM) William Anders was a crew member of the OPERATOR operated Apollo 8 and retired on September 1st 1969 .

(GCN) William Anders was a crew member of OPERATOR ’ s Apollo 8 alongside backup pilot Buzz Aldrin and backup pilot Buzz Aldrin.

(GCN++) william anders , who retired on the 1st of september 1969 , was a crew member on apollo 8 along with commander and backup pilot buzz aldrin.

Example Outputs SR11Deep

Reference: The economy ’s temperature will be taken from several vantage points this week , with readings on trade , output , housing and inflation .

(LSTM) the economy ’s accords will be taken from several phases this week , housing and inflation readings on trade , housing and inflation .

(GCN) the economy ’s temperatures will be taken from several vantage points this week , with reading on trades output , housing and inflation .

Concluding Remarks

• GCN-based generation architecture that directly encodes the explicit structure of the input
• GCN-based models outperform a sequential baseline on automatic evaluation
  • improve on over- and under-generation cases
• The relational inductive bias of the GCN encoder produces more informative representations of the input [Battaglia et al., 2018]

Future work

• Other input graph representations:
  • Abstract Meaning Representations (AMR) [Banarescu et al., 2013]
  • Scoped semantic representations [Van Noord et al., 2018]
  • Scene graphs [Schuster et al., 2015]
• Multi-lingual training of GCN layers with universal dependencies [Mille et al., 2017]

Code (PyTorch) + Data: github.com/diegma/graph-2-text

Thank you! Questions?

Separate Input Linearisation Step

[Figure: KB graph for Above the Veil with edges country → Australians, precededBy → Aenir and Castle, followedBy → Into Battle and The Violet Keystone.]

Possible linearisations (triples separated by "|"):
Above the Veil followedBy Into Battle | Above the Veil followedBy The Violet Keystone | ...
Above the Veil country Australians | Above the Veil followedBy Into Battle | ...
Above the Veil precededBy Aenir | Above the Veil precededBy Castle | ...
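A sketch of such a linearisation step, assuming triples are simply flattened and joined; the separator and the triple format are illustrative, and the alternative orders above are exactly why no single linearisation is canonical.

```python
def linearise(triples, separator="|"):
    """Flatten (subject, relation, object) triples into one token sequence (a sketch)."""
    # Each triple becomes "subject relation object"; triples are joined by the separator.
    return f" {separator} ".join(" ".join(t) for t in triples)

# linearise([("Above the Veil", "precededBy", "Aenir"),
#            ("Above the Veil", "country", "Australians")])
# -> 'Above the Veil precededBy Aenir | Above the Veil country Australians'
```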

Automatic Evaluation

Test results on the WebNLG task:

Encoder      BLEU        METEOR    TER
LSTM         .526±.010   .38±.00   .43±.01
GCN          .535±.004   .39±.00   .44±.02
GCN++        .559±.017   .39±.01   .41±.01
ADAPT        .606        .44       .37
MELBOURNE    .545        .41       .40
PKUWRITER    .512        .37       .45

Test results on the SR11Deep task:

Encoder      BLEU        METEOR    TER
LSTM         .377±.007   .65±.00   .44±.01
GCN          .647±.005   .77±.00   .24±.01
GCN+feat     .666±.027   .76±.01   .25±.01

Ablation Study

BLEU and model SIZE by number of GCN layers and skip-connection type (none / res = residual / den = dense):

Model     BLEU (none)   BLEU (res)   BLEU (den)   SIZE (none)   SIZE (res)   SIZE (den)
LSTM      .543±.003     -            -            4.3           -            -
GCN 1L    .537±.006     -            -            4.3           -            -
GCN 2L    .545±.016     .553±.005    .552±.013    4.5           4.5          4.7
GCN 3L    .548±.012     .560±.013    .557±.001    4.7           4.7          5.2
GCN 4L    .537±.005     .569±.003    .558±.005    4.9           4.9          6.0
GCN 5L    .516±.022     .561±.016    .559±.003    5.1           5.1          7.0
GCN 6L    .508±.022     .561±.007    .558±.018    5.3           5.3          8.2
GCN 7L    .492±.024     .546±.023    .564±.012    5.5           5.5          9.6
