
Learning to Select, Track, and Generate for Data-to-Text

Hayate Iso†∗  Yui Uehara‡  Tatsuya Ishigaki\‡  Hiroshi Noji‡  Eiji Aramaki†‡
Ichiro Kobayashi[‡  Yusuke Miyao]‡  Naoaki Okazaki\‡  Hiroya Takamura\‡
†Nara Institute of Science and Technology  ‡Artificial Intelligence Research Center, AIST
\Tokyo Institute of Technology  [Ochanomizu University  ]The University of Tokyo
{iso.hayate.id3,aramaki}@is.naist.jp  [email protected]
{yui.uehara,ishigaki.t,hiroshi.noji,takamura.hiroya}@aist.go.jp  [email protected]  [email protected]
∗ Work was done during an internship at the Artificial Intelligence Research Center, AIST.

Abstract

We propose a data-to-text generation model with two modules, one for tracking and the other for text generation. Our tracking module selects and keeps track of salient information and memorizes which records have been mentioned. Our generation module generates a summary conditioned on the state of the tracking module. Our model can be seen as simulating the human-like writing process of gradually selecting information, by determining intermediate variables while writing the summary. In addition, we also explore the effectiveness of writer information for generation. Experimental results show that our model outperforms existing models in all evaluation metrics even without writer information. Incorporating writer information further improves the performance, contributing to content planning and surface realization.

1 Introduction

Advances in sensor and data storage technologies have rapidly increased the amount of data produced in various fields such as weather, finance, and sports. In order to address the information overload caused by this massive data, data-to-text generation technology, which expresses the contents of data in natural language, becomes more important (Barzilay and Lapata, 2005). Recently, neural methods have been able to generate high-quality short summaries, especially from small pieces of data (Liu et al., 2018).

Despite this success, it remains challenging to generate a high-quality long summary from data (Wiseman et al., 2017). One reason for the difficulty is that the input data is too large for a naive model to find its salient part, i.e., to determine which part of the data should be mentioned. In addition, the salient part moves as the summary explains the data. For example, when generating a summary of a game (Table 1 (b)) from its box score (Table 1 (a)), the input contains numerous data records about the game, e.g., that a particular player scored 18 points. Existing models often refer to the same data record multiple times (Puduppully et al., 2019). The models may also mention an incorrect data record, e.g., attributing 19 added points to the wrong player when the summary should mention LaMarcus Aldridge, who scored 19 points. Thus, we need a model that finds salient parts, tracks transitions of salient parts, and expresses information faithful to the input.

In this paper, we propose a novel data-to-text generation model with two modules, one for saliency tracking and another for text generation. The tracking module keeps track of saliency in the input data: when the module detects a saliency transition, it selects a new data record [1] and updates the state of the tracking module. The text generation module generates a document conditioned on the current tracking state. Our model can be seen as imitating the human-like writing process of gradually selecting and tracking the data while generating the summary. In addition, we note some writer-specific patterns and characteristics: how data records are selected to be mentioned, and how data records are expressed as text, e.g., the order of data records and the word usage. We also incorporate writer information into our model.

The experimental results demonstrate that, even without writer information, our model achieves the best performance among the previous models in all evaluation metrics: 94.38% precision of relation generation, 42.40% F1 score of content selection, 19.38% normalized Damerau-Levenshtein Distance (DLD) of content ordering, and a BLEU score of 16.15. We also confirm that adding writer information further improves the performance.

[1] We use 'data record' and 'relation' interchangeably.

2 Related Work

2.1 Data-to-Text Generation

Data-to-text generation is the task of generating descriptions from structured or non-structured data, including sports commentary (Tanaka-Ishii et al., 1998; Chen and Mooney, 2008; Taniguchi et al., 2019), weather forecasts (Liang et al., 2009; Mei et al., 2016), biographical text from Wikipedia infoboxes (Lebret et al., 2016; Sha et al., 2018; Liu et al., 2018), and market comments from stock prices (Murakami et al., 2017; Aoki et al., 2018).

Neural generation methods have become the mainstream approach for data-to-text generation. The encoder-decoder framework (Cho et al., 2014; Sutskever et al., 2014) with attention (Bahdanau et al., 2015; Luong et al., 2015) and copy mechanisms (Gu et al., 2016; Gulcehre et al., 2016) has been successfully applied to data-to-text tasks. However, neural generation methods sometimes yield fluent but inadequate descriptions (Tu et al., 2017). In data-to-text generation, descriptions inconsistent with the input data are problematic.

Recently, Wiseman et al. (2017) introduced the ROTOWIRE dataset, which contains multi-sentence summaries of basketball games with box scores (Table 1). This dataset requires the selection of a salient subset of data records for generating descriptions. They also proposed automatic evaluation metrics for measuring the informativeness of generated summaries.

Puduppully et al. (2019) proposed a two-stage method that first predicts the sequence of data records to be mentioned and then generates a summary conditioned on the predicted sequence. Their idea is similar to ours in that both consider a sequence of data records as content planning. However, our proposal differs from theirs in that ours uses a recurrent neural network for saliency tracking, and in that our decoder dynamically chooses a data record to be mentioned without fixing a sequence of data records.

2.2 Memory modules

Memory networks can be used to maintain and update representations of salient information (Weston et al., 2015; Sukhbaatar et al., 2015; Graves et al., 2016). Such modules are often used in natural language understanding to keep track of entity states (Kobayashi et al., 2016; Hoang et al., 2018; Bosselut et al., 2018).

Recently, entity tracking has also become popular for generating coherent text (Kiddon et al., 2016; Ji et al., 2017; Yang et al., 2017; Clark et al., 2018). Kiddon et al. (2016) proposed a neural checklist model that updates predefined item states. Ji et al. (2017) proposed an entity representation for the language model; updating the entity tracking state when an entity is introduced, their method selects the salient entity state.

Our model extends this entity tracking module to the data-to-text generation task. The entity tracking module selects the salient entity and the appropriate attribute at each time step, updates their states, and generates coherent summaries from the selected data records.

3 Data

Through careful examination, we found that in the original ROTOWIRE dataset some NBA games have two documents, one of which is sometimes in the training data while the other is in the test or validation data. Such documents are similar to each other, though not identical. To make this dataset more reliable as an experimental dataset, we created a new version.

We ran the script provided by Wiseman et al. (2017) for crawling the ROTOWIRE website for NBA game summaries. The script collected approximately 78% of the documents in the original dataset; the remaining documents had disappeared. We also collected the box scores associated with the collected documents. We observed that some of the box scores had been modified compared with the original ROTOWIRE dataset.

The collected dataset contains 3,752 instances (i.e., pairs of a document and box scores). However, the four shortest documents were not summaries; they were, for example, announcements about the postponement of a match. We thus deleted these 4 instances and were left with 3,748 instances. We followed the dataset split of Wiseman et al. (2017) to split our dataset into training, development, and test data. We found 14 instances that did not have corresponding instances in the original data, and randomly assigned 9, 2, and 3 of them to the training, development, and test data, respectively. Finally, the sizes of our training, development, and test datasets are 2,714, 534, and 500, respectively. On average, each summary has 384 tokens and 644 data records. Each match has only one summary in our dataset, as far as we checked. We also collected the writer of each document. Our dataset contains 32 different writers; the most prolific writer wrote 607 documents, while some writers wrote fewer than ten documents. On average, each writer wrote 117 documents. We call our new dataset ROTOWIRE-MODIFIED. [2]

[2] For information about the dataset, please follow this link: https://github.com/aistairc/rotowire-modified
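The collection and split procedure above is mostly bookkeeping. The following is a minimal sketch of that bookkeeping in Python; the field names (`doc_id`, `summary`), the length threshold, and the `original_split` lookup are our own assumptions for illustration, not the authors' released script.

```python
import random

def split_rotowire_modified(instances, original_split, seed=0):
    """Filter and split collected (document, box-score) instances.

    instances: list of dicts with hypothetical keys "doc_id" and "summary".
    original_split: dict mapping doc_id -> "train" | "dev" | "test" for
    instances that also exist in the original ROTOWIRE split.
    """
    # Drop documents that are not game summaries (the four shortest ones in
    # the paper, e.g., postponement announcements); the threshold is illustrative.
    instances = [x for x in instances if len(x["summary"].split()) > 50]

    splits = {"train": [], "dev": [], "test": []}
    leftovers = []
    for x in instances:
        if x["doc_id"] in original_split:
            splits[original_split[x["doc_id"]]].append(x)
        else:
            leftovers.append(x)  # the 14 instances without an original counterpart

    # Randomly assign the leftovers (9 / 2 / 3 in the paper).
    random.Random(seed).shuffle(leftovers)
    splits["train"] += leftovers[:9]
    splits["dev"] += leftovers[9:11]
    splits["test"] += leftovers[11:14]
    return splits
```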

(a) Box score. The top contingency table shows the number of wins and losses and a summary of each game:

TEAM   | H/V | WIN | LOSS | PTS | REB | AST | FG PCT | FG3 PCT | ...
KNICKS | H   | 16  | 19   | 104 | 46  | 26  | 45     | 46      | ...
BUCKS  | V   | 18  | 16   | 105 | 42  | 20  | 47     | 32      | ...

The bottom table shows the statistics of each player, such as points scored (PLAYER's PTS) and total rebounds (PLAYER's REB), with columns PLAYER | H/V | PTS | REB | AST | BLK | STL | MIN | CITY | ... for the players of both teams.

(b) NBA basketball game summary. Each summary describes the victory or defeat in the game and the highlights of valuable players:

"The Bucks defeated the Knicks, 105-104, at Madison Square Garden on Wednesday. The Knicks (16-19) checked in to Wednesday's contest looking to snap a five-game losing streak and heading into the fourth quarter, they looked like they were well on their way to that goal. ... Antetokounmpo led the Bucks with 27 points, 13 rebounds, four assists, a steal and three blocks, his second consecutive double-double. Greg Monroe actually checked in as the second-leading scorer and did so in his customary bench role, posting 18 points, along with nine boards, four assists and three steals. Jabari Parker contributed 15 points, four rebounds, three assists and a steal. Malcolm Brogdon went for 12 points, eight assists and six rebounds. Mirza Teletovic was productive in a reserve role as well, generating 13 points. ... Courtney Lee checked in with 11 points, three assists, two rebounds, a steal and a block. ... The Bucks and Knicks face off once again in the second game of the home-and-home series, with the meeting taking place Friday night in Milwaukee."

Table 1: Example of input and output data: the task uses the box score (a) as input and the summary document of the game (b) as output. Extracted entities are shown in bold face. Extracted values are shown in green.
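To make the input format concrete, a box score like Table 1 (a) can be flattened into (entity, attribute, value) records, which is the form the model consumes in Section 4. The following is a sketch using a few values visible in Table 1; the tuple layout follows the paper's notation, while the container and lookup are our own illustration.

```python
# Each data record r = (e, a, x[e, a]): entity, attribute, value.
records = [
    ("KNICKS", "PTS", "104"),
    ("KNICKS", "WIN", "16"),
    ("KNICKS", "LOSS", "19"),
    ("BUCKS", "PTS", "105"),
    ("MALCOLM BROGDON", "PLAYER_PTS", "12"),
    ("MALCOLM BROGDON", "PLAYER_AST", "8"),
]

# x[e, a] lookup used throughout Section 4.
x = {(e, a): v for e, a, v in records}
print(x[("BUCKS", "PTS")])  # -> "105"
```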

t   | 199    | 200    | 201         | 202        | 203    | 204 | 205        | 206      | 207 | 208        | 209
Y_t | Jabari | Parker | contributed | 15         | points | ,   | four       | rebounds | ,   | three      | assists
Z_t | 1      | 1      | 0           | 1          | 0      | 0   | 1          | 0        | 0   | 1          | 0
E_t | JABARI PARKER | JABARI PARKER | - | JABARI PARKER | - | - | JABARI PARKER | - | - | JABARI PARKER | -
A_t | FIRST_NAME | LAST_NAME | -    | PLAYER_PTS | -      | -   | PLAYER_REB | -        | -   | PLAYER_AST | -
N_t | -      | -      | -           | 0          | -      | -   | 1          | -        | -   | 1          | -

Table 2: Running example of our model's generation process. At every time step t, the model predicts each random variable. The model first determines whether to refer to the data records (Z_t = 1) or not (Z_t = 0). If Z_t = 1, the model selects the entity E_t, its attribute A_t, and the binary variable N_t if needed. For example, at t = 202 the model predicts Z_202 = 1 and then selects the entity JABARI PARKER and its attribute PLAYER_PTS. Given these values, the model outputs the token 15 from the selected data record.

4 Saliency-Aware Text Generation

At the core of our model is a neural language model with a memory state h^LM that generates a summary y_{1:T} = (y_1, ..., y_T) given a set of data records x. Our model has another memory state h^ENT, which is used to remember the data records that have been referred to. h^ENT is also used to update h^LM, meaning that the referred data records affect the text generation.

Our model decides whether to refer to x, which data record r ∈ x is to be mentioned, and how to express a number. The selected data record is used to update h^ENT. Formally, we use the following four variables:

1. Z_t: a binary variable that determines whether the model refers to the input x at time step t (Z_t = 1).
2. E_t: at each time step t, this variable indicates the salient entity (e.g., HAWKS, LEBRON JAMES).
3. A_t: at each time step t, this variable indicates the salient attribute to be mentioned (e.g., PTS).
4. N_t: if the attribute A_t of the salient entity E_t is a numeric attribute, this variable determines whether its value in the data records should be output in Arabic numerals (e.g., 50) or in English words (e.g., five).

To keep track of the salient entity, our model predicts these random variables at each time step t through its summary generation process.
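For intuition, supervision of the kind shown in Table 2 could in principle be produced by naively matching record values against the summary tokens; the paper instead uses the IE system of Wiseman et al. (2017) because such matching is error-prone (Section 5.1). The sketch below is that naive matcher, with toy records restricted to the Table 2 span; the function and field layout are our own illustration.

```python
def naive_annotate(tokens, records, number_words=None):
    """Toy lexical-matching annotator producing (z_t, e_t, a_t, n_t) per token."""
    number_words = number_words or {"one": "1", "two": "2", "three": "3",
                                    "four": "4", "five": "5"}
    out = []
    for tok in tokens:
        digit = number_words.get(tok.lower(), tok)
        hit = next(((e, a) for e, a, v in records if v == tok or v == digit), None)
        if hit is None:
            out.append((0, None, None, None))                             # Z_t = 0
        else:
            e, a = hit
            n = (0 if tok == digit else 1) if digit.isdigit() else None   # N_t as in Table 2
            out.append((1, e, a, n))                                      # Z_t = 1
    return out

records = [("JABARI PARKER", "FIRST_NAME", "Jabari"),
           ("JABARI PARKER", "LAST_NAME", "Parker"),
           ("JABARI PARKER", "PLAYER_PTS", "15"),
           ("JABARI PARKER", "PLAYER_REB", "4"),
           ("JABARI PARKER", "PLAYER_AST", "3")]
tokens = "Jabari Parker contributed 15 points , four rebounds , three assists".split()
print(naive_annotate(tokens, records))  # reproduces the Z/E/A/N rows of Table 2
```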

A running example of our model is shown in Table 2, and the full algorithm is described in Appendix A. In the following subsections, we explain how to initialize the model, predict these random variables, and generate a summary. Due to space limitations, bias vectors are omitted from the equations.

Before explaining our method, we describe our notation. Let E and A denote the sets of entities and attributes, respectively. Each record r ∈ x consists of an entity e ∈ E, an attribute a ∈ A, and its value x[e, a], and is therefore represented as r = (e, a, x[e, a]). For example, the box score in Table 1 contains a record r whose attribute is a = PTS and whose value is x[e, a] = 20 for some player entity e.

4.1 Initialization

Let r denote the embedding of data record r ∈ x. Let ē denote the embedding of entity e. Note that ē depends on the set of data records, i.e., it depends on the game. We also use e for the static embedding of entity e, which, on the other hand, does not depend on the game.

Given the embeddings of entity e, attribute a, and its value v, we use a concatenation layer to combine the information from these vectors and produce the embedding of each data record (e, a, v), denoted as r_{e,a,v}:

    r_{e,a,v} = tanh( W^R (e ⊕ a ⊕ v) ),    (1)

where ⊕ indicates the concatenation of vectors, and W^R denotes a weight matrix. [3]

We obtain ē in the set of data records x by summing all the data-record embeddings transformed by an attribute-specific matrix:

    ē = tanh( Σ_{a∈A} W^A_a r_{e,a,x[e,a]} ),    (2)

where W^A_a is a weight matrix for attribute a. Since ē depends on the game as above, ē is supposed to represent how entity e played in the game.

[3] We also concatenate an embedding vector that represents whether the entity is on the home or away team.
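A minimal numpy sketch of Equations (1) and (2) follows; the dimensions are toy values and the randomly initialized matrices stand in for learned parameters.

```python
import numpy as np

d = 8                                  # toy embedding size
rng = np.random.default_rng(0)
W_R = rng.normal(size=(d, 3 * d))      # Eq. (1): combines e ⊕ a ⊕ v
W_A = {a: rng.normal(size=(d, d)) for a in ["PTS", "REB", "AST"]}  # Eq. (2), per attribute

def record_embedding(e, a, v):
    """r_{e,a,v} = tanh(W^R (e ⊕ a ⊕ v))  -- Eq. (1)."""
    return np.tanh(W_R @ np.concatenate([e, a, v]))

def entity_embedding(e, attrs):
    """ē = tanh(Σ_a W^A_a r_{e,a,x[e,a]})  -- Eq. (2); depends on this game's records."""
    total = sum(W_A[a] @ record_embedding(e, a_vec, v_vec)
                for a, a_vec, v_vec in attrs)
    return np.tanh(total)

# toy vectors for one player entity and its attribute/value embeddings
e_vec = rng.normal(size=d)
attrs = [(a, rng.normal(size=d), rng.normal(size=d)) for a in ["PTS", "REB", "AST"]]
e_bar = entity_embedding(e_vec, attrs)
```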
To initialize the hidden state of each module, we use the embedding of the start-of-document token (<SOD>) for h^LM and the average of the entity embeddings ē for h^ENT.

4.2 Saliency transition

Generally, the saliency of a text changes during text generation. In our work, we suppose that the saliency is represented by the entity and its attribute being talked about. We therefore propose a model that refers to a data record at each time step and transitions to another record as the text proceeds.

To determine whether to transition to another data record at time t, the model calculates the following probability:

    p(Z_t = 1 | h^LM_{t-1}, h^ENT_{t-1}) = σ( W_z (h^LM_{t-1} ⊕ h^ENT_{t-1}) ),    (3)

where σ(·) is the sigmoid function. If p(Z_t = 1 | h^LM_{t-1}, h^ENT_{t-1}) is high, the model transitions to another data record.

When the model decides to transition to another data record, it then determines which entity and attribute to refer to, and generates the next word (Section 4.3). On the other hand, if the model decides not to transition, it generates the next word without updating the tracking state, h^ENT_t = h^ENT_{t-1} (Section 4.4).

4.3 Selection and tracking

When the model refers to a new data record (Z_t = 1), it selects an entity and its attribute. It also tracks the saliency by putting the information about the selected entity and attribute into the memory vector h^ENT. The model selects the subject entity and updates the memory state when the subject entity changes.

Specifically, the model first calculates the probability of selecting an entity:

    p(E_t = e | h^LM_{t-1}, h^ENT_{t-1}) ∝ exp( h^ENT_s W^OLD h^LM_{t-1} )   if e ∈ E_{t-1},
                                           exp( ē W^NEW h^LM_{t-1} )         otherwise,    (4)

where E_{t-1} is the set of entities that have already been referred to by time step t, and s = max{s' : s' ≤ t-1, e = e_{s'}} indicates the time step when this entity was last mentioned. The model selects the most probable entity as the next salient entity and updates the set of entities that have appeared (E_t = E_{t-1} ∪ {e_t}).

If the salient entity changes (e_t ≠ e_{t-1}), the model updates the hidden state of the tracking model h^ENT with a recurrent neural network with gated recurrent units (GRU; Chung et al., 2014):

    h^ENT'_t = h^ENT_{t-1}                          if e_t = e_{t-1},
    h^ENT'_t = GRU^E( ē, h^ENT_{t-1} )              else if e_t ∉ E_{t-1},
    h^ENT'_t = GRU^E( W^S h^ENT_s, h^ENT_{t-1} )    otherwise.    (5)

Note that if the selected entity at time step t, e_t, is identical to the previously selected entity e_{t-1}, the hidden state of the tracking model is not updated. If the selected entity e_t is new (e_t ∉ E_{t-1}), the hidden state of the tracking model is updated with the embedding ē of entity e_t as input. In contrast, if entity e_t has already appeared in the past (e_t ∈ E_{t-1}) but is not identical to the previous one (e_t ≠ e_{t-1}), we use h^ENT_s (i.e., the memory state when this entity last appeared) to fully exploit the local history of this entity.
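The transition and tracking logic of Equations (3)-(5) can be sketched as follows. This is a hedged illustration with toy dimensions: `gru_step` is a stand-in for the learned GRU cell and the weight matrices are random placeholders, not the trained model.

```python
import numpy as np

d = 8
rng = np.random.default_rng(1)
W_z = rng.normal(size=(2 * d,))        # Eq. (3)
W_old = rng.normal(size=(d, d))        # Eq. (4), previously mentioned entities
W_new = rng.normal(size=(d, d))        # Eq. (4), new entities
W_s = rng.normal(size=(d, d))          # Eq. (5), re-selecting an old entity

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def gru_step(x, h):
    # Stand-in for a learned GRU cell; a real model would use trained gates.
    return np.tanh(x + h)

def transition_prob(h_lm, h_ent):
    """Eq. (3): probability of referring to a new data record at step t."""
    return sigmoid(W_z @ np.concatenate([h_lm, h_ent]))

def select_entity(h_lm, candidates, mentioned, memories):
    """Eq. (4): score entities; mentioned ones use their last tracking state."""
    scores = {}
    for name, e_bar in candidates.items():
        if name in mentioned:
            scores[name] = memories[name] @ W_old @ h_lm
        else:
            scores[name] = e_bar @ W_new @ h_lm
    return max(scores, key=scores.get)

def update_tracking(h_ent, e_t, e_prev, mentioned, candidates, memories):
    """Eq. (5): keep, re-initialize from ē, or restore the entity's last state."""
    if e_t == e_prev:
        return h_ent
    if e_t not in mentioned:
        return gru_step(candidates[e_t], h_ent)
    return gru_step(W_s @ memories[e_t], h_ent)

# toy usage
cands = {"JABARI PARKER": rng.normal(size=d), "GREG MONROE": rng.normal(size=d)}
h_lm, h_ent = rng.normal(size=d), rng.normal(size=d)
if transition_prob(h_lm, h_ent) >= 0.5:
    e_t = select_entity(h_lm, cands, mentioned=set(), memories={})
    h_ent = update_tracking(h_ent, e_t, e_prev=None, mentioned=set(),
                            candidates=cands, memories={})
```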

Given the updated hidden state of the tracking model h^ENT'_t, we next select the attribute of the salient entity with the following probability:

    p(A_t = a | e_t, h^LM_{t-1}, h^ENT'_t) ∝ exp( r_{e_t,a,x[e_t,a]} W^ATTR (h^LM_{t-1} ⊕ h^ENT'_t) ).    (6)

After selecting a_t, i.e., the most probable attribute of the salient entity, the tracking model updates the memory state h^ENT_t with the embedding of the data record r_{e_t,a_t,x[e_t,a_t]} introduced in Section 4.1:

    h^ENT_t = GRU^A( r_{e_t,a_t,x[e_t,a_t]}, h^ENT'_t ).    (7)

4.4 Summary generation

Given the two hidden states, one for the language model h^LM_{t-1} and the other for the tracking model h^ENT_t, the model generates the next word y_t. We also incorporate a copy mechanism that copies the value of the salient data record x[e_t, a_t].

If the model refers to a new data record (Z_t = 1), it directly copies the value of the data record x[e_t, a_t]. However, the values of numerical attributes can be expressed in at least two different manners: Arabic numerals (e.g., 14) and English words (e.g., fourteen). We decide which one to use with the following probability:

    p(N_t = 1 | h^LM_{t-1}, h^ENT_t) = σ( W^N (h^LM_{t-1} ⊕ h^ENT_t) ),    (8)

where W^N is a weight matrix. The model then updates the hidden states of the language model:

    h'_t = tanh( W^H (h^LM_{t-1} ⊕ h^ENT_t) ),    (9)

where W^H is a weight matrix.

If the salient data record is the same as the previous one (Z_t = 0), the model predicts the next word y_t via a probability over words conditioned on the context vector h'_t:

    p(Y_t | h'_t) = softmax( W^Y h'_t ).    (10)

Subsequently, the hidden state of the language model h^LM is updated:

    h^LM_t = LSTM( y_t ⊕ h'_t, h^LM_{t-1} ),    (11)

where y_t is the embedding of the word generated at time step t. [4]

[4] In our initial experiments, we observed a word repetition problem when the tracking model was not updated while generating a sentence. To avoid this problem, we also refresh the tracking state with a special trainable vector v_REFRESH after our model generates a period: h^ENT_t = GRU^A(v_REFRESH, h^ENT_t).

4.5 Incorporating writer information

We also incorporate information about the writer of the summary into our model. Specifically, instead of using Equation (9), we concatenate the embedding w of a writer to h^LM_{t-1} ⊕ h^ENT_t to construct the context vector h'_t:

    h'_t = tanh( W'^H (h^LM_{t-1} ⊕ h^ENT_t ⊕ w) ),    (12)

where W'^H is a new weight matrix. Since this new context vector h'_t is used for calculating the probability over words in Equation (10), the writer information directly affects word generation, which is regarded as surface realization in terms of traditional text generation. Simultaneously, the context vector h'_t enhanced with the writer information is used to obtain h^LM_t, which is the hidden state of the language model and is further used to select the salient entity and attribute, as mentioned in Sections 4.2 and 4.3. Therefore, in our model, the writer information affects both surface realization and content planning.

4.6 Learning objective

We apply fully supervised training that maximizes the following log-likelihood:

    log p(Y_{1:T}, Z_{1:T}, E_{1:T}, A_{1:T}, N_{1:T} | x)
      = Σ_{t=1}^{T} log p(Z_t = z_t | h^LM_{t-1}, h^ENT_{t-1})
      + Σ_{t: z_t=1} log p(E_t = e_t | h^LM_{t-1}, h^ENT_{t-1})
      + Σ_{t: z_t=1} log p(A_t = a_t | e_t, h^LM_{t-1}, h^ENT'_t)
      + Σ_{t: z_t=1, a_t is a numeric attribute} log p(N_t = n_t | h^LM_{t-1}, h^ENT_t)
      + Σ_{t: z_t=0} log p(Y_t = y_t | h'_t).
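Putting Sections 4.4 and 4.5 together, the word-generation side (Equations (8)-(12)) reduces to a few vector operations. The sketch below uses toy dimensions, random placeholder weights, and a tanh layer standing in for the learned LSTM cell; it is an illustration, not the authors' implementation.

```python
import numpy as np

d, vocab = 8, 1000
rng = np.random.default_rng(2)
W_N = rng.normal(size=(2 * d,))          # Eq. (8): word form vs. Arabic numeral
W_H = rng.normal(size=(d, 2 * d))        # Eq. (9): context vector
W_Hw = rng.normal(size=(d, 3 * d))       # Eq. (12): context vector with writer embedding
W_Y = rng.normal(size=(vocab, d))        # Eq. (10): output projection
W_L = rng.normal(size=(d, 3 * d))        # stand-in for the LSTM of Eq. (11)

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
softmax = lambda v: np.exp(v - v.max()) / np.exp(v - v.max()).sum()
WORDS = {"1": "one", "2": "two", "3": "three", "4": "four", "5": "five"}

def generation_step(h_lm, h_ent, y_emb, value=None, writer=None):
    """One decoding step. If `value` is given (Z_t = 1), the value of the selected
    record is copied, spelled out when N_t = 1 as in Table 2; otherwise a word id
    is drawn from the vocabulary softmax (Eq. (10))."""
    if writer is None:
        h_ctx = np.tanh(W_H @ np.concatenate([h_lm, h_ent]))            # Eq. (9)
    else:
        h_ctx = np.tanh(W_Hw @ np.concatenate([h_lm, h_ent, writer]))   # Eq. (12)

    if value is not None:
        word_form = sigmoid(W_N @ np.concatenate([h_lm, h_ent])) >= 0.5  # Eq. (8)
        y_t = WORDS.get(value, value) if word_form else value
    else:
        y_t = int(np.argmax(softmax(W_Y @ h_ctx)))                       # Eq. (10)

    # Eq. (11): h^LM_t = LSTM(y_t ⊕ h'_t, h^LM_{t-1}); y_emb is the generated word's embedding.
    h_lm_next = np.tanh(W_L @ np.concatenate([y_emb, h_ctx, h_lm]))
    return y_t, h_lm_next

# toy call: copy the value "4" of the selected record, in digits or word form
y, h = generation_step(rng.normal(size=d), rng.normal(size=d), rng.normal(size=d), value="4")
```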

Method                   | RG #  | RG P% | CS P% | CS R% | CS F1% | CO DLD% | BLEU
GOLD                     | 27.36 | 93.42 | 100.  | 100.  | 100.   | 100.    | 100.
TEMPLATES                | 54.63 | 100.  | 31.01 | 58.85 | 40.61  | 17.50   | 8.43
Wiseman et al. (2017)    | 22.93 | 60.14 | 24.24 | 31.20 | 27.29  | 14.70   | 14.73
Puduppully et al. (2019) | 33.06 | 83.17 | 33.06 | 43.59 | 37.60  | 16.97   | 13.96
PROPOSED                 | 39.05 | 94.43 | 35.77 | 52.05 | 42.40  | 19.38   | 16.15

Table 3: Experimental results. Each metric evaluates whether important information (CS) is described accurately (RG) and in the correct order (CO).

5 Experiments

5.1 Experimental settings

We used ROTOWIRE-MODIFIED, explained in Section 3, as the dataset for our experiments. The training, development, and test data contained 2,714, 534, and 500 games, respectively.

Since we take a supervised training approach, we need annotations of the random variables (i.e., Z_t, E_t, A_t, and N_t) in the training data, as shown in Table 2. Instead of simple lexical matching with r ∈ x, which is prone to annotation errors, we use the information extraction system provided by Wiseman et al. (2017). Although this system is trained on noisy rule-based annotations, we conjecture that it is more robust to errors because it is trained to minimize a marginalized loss function for ambiguous relations. All training details are described in Appendix B.

5.2 Models to be compared

We compare our model [5] against two baseline models. One is the model of Wiseman et al. (2017), which generates a summary with an attention-based encoder-decoder model. The other baseline is the model proposed by Puduppully et al. (2019), which first predicts the sequence of data records and then generates a summary conditioned on the predicted sequence. Wiseman et al. (2017)'s model refers to all data records at every time step, while Puduppully et al. (2019)'s model refers to the subset of data records predicted in its first stage. Unlike these models, our model uses one memory vector h^ENT_t that tracks the history of the data records during generation. We retrained the baselines on our new dataset. We also present the performance of the GOLD and TEMPLATES summaries. The GOLD summary is exactly identical to the reference summary, and each TEMPLATES summary is generated in the same manner as in Wiseman et al. (2017).

[5] Our code is available from https://github.com/aistairc/sports-reporter

In the latter half of our experiments, we examine the effect of adding information about writers. In addition to our model enhanced with writer information, we also add writer information to the model by Puduppully et al. (2019). Their method consists of two stages corresponding to content planning and surface realization. Therefore, by incorporating writer information into each of the two stages, we can clearly see which part of the model the writer information contributes to. For the Puduppully et al. (2019) model, we attach the writer information in the following three ways:

1. concatenating the writer embedding w with the input vector of the LSTM in the content planning decoder (stage 1);
2. concatenating the writer embedding w with the input vector of the LSTM in the text generator (stage 2);
3. using both 1 and 2 above.

For more details about each decoding stage, readers can refer to Puduppully et al. (2019).

5.3 Evaluation metrics

As evaluation metrics, we use the BLEU score (Papineni et al., 2002) and the extractive metrics proposed by Wiseman et al. (2017), i.e., relation generation (RG), content selection (CS), and content ordering (CO). The extractive metrics measure how well the relations extracted from the generated summary match the correct relations [6]:

- RG: the ratio of correct relations among all extracted relations, where correct relations are relations found in the input data records x. The average number of extracted relations is also reported.
- CS: precision and recall of the relations extracted from the generated summary against those extracted from the reference summary.
- CO: edit distance, measured with normalized Damerau-Levenshtein Distance (DLD), between the sequences of relations extracted from the generated and reference summaries.

[6] The model for extracting relation tuples was trained on tuples made from entities (e.g., team name, city name, player name) and attribute values (e.g., "Lakers", "92") extracted from the summaries, and the corresponding attributes (e.g., "TEAM NAME", "PTS") found in the box or line score. The precision and recall of this extraction model are 93.4% and 75.0%, respectively, on the test data.

6 Results and Discussions

We first focus on the quality of the tracking model and the entity representations in Sections 6.1 to 6.4, where we use the model without writer information. We examine the effect of writer information in Section 6.5.

6.1 Saliency tracking-based model

As shown in Table 3, our model outperforms all baselines across all evaluation metrics. [7] One noticeable result is that our model achieves slightly higher RG precision than the gold summary. Owing to the extractive nature of the evaluation, the relation-generation precision of a generated summary can exceed the gold summary's performance; in fact, the template model achieves 100% precision of relation generation. The other is that only our model exceeds the template model in the F1 score of content selection, and it obtains the highest performance in content ordering. This implies that the tracking model encourages the selection of salient input records in the correct order.

[7] The scores of Puduppully et al. (2019)'s model dropped significantly from what they reported, especially on the BLEU metric. We speculate this is mainly due to the reduced amount of our training data (Section 3). That is, their model might be more data-hungry than other models.

6.2 Qualitative analysis of entity embeddings

Our model has the entity embedding ē, which depends on the box score of each game, in addition to the static entity embedding e. We now analyze the difference between these two types of embeddings.

We present two-dimensional visualizations of both embeddings produced using PCA (Pearson, 1901). As shown in Figure 1, which visualizes the static entity embeddings e, the top-ranked players are closely located.

[Figure 1: Illustration of the static entity embeddings e. Players with colored letters are listed in the top-100 player ranking for the 2016-17 NBA season at https://www.washingtonpost.com/graphics/sports/nba-top-100-players-2016/. Only LeBron James is in red; the other players in the top 100 are in blue. Top-ranked players have similar representations of e.]

We also present visualizations of the dynamic entity embeddings ē in Figure 2. Although we did not carry out feature engineering specific to the NBA (e.g., whether a player scored double digits or not) [8] to represent the dynamic entity embedding ē, the embeddings of the players who performed well in a given game have similar representations. In addition, the embedding of the same player changes depending on the box score of each game. For instance, LeBron James recorded a double-double in the game on April 22, 2016. For this game, his embedding is located close to the embedding of another player who also scored a double-double. However, he did not participate in the game on December 26, 2016. His embedding for this game became closer to those of other players who also did not participate.

[8] In the NBA, a player who accumulates a double-digit score in one of five categories (points, rebounds, assists, steals, and blocked shots) in a game is regarded as a good player. If a player reaches double digits in two of those five categories, it is referred to as a double-double.

[Figure 2: Illustration of the dynamic entity embeddings ē. Both panels are for Cavaliers vs. Pistons games on different dates (left: April 22, 2016; right: December 26, 2016). LeBron James is in red letters. Entities with orange symbols appeared only in the reference summary, those with blue symbols only in the generated summary, and those with green symbols in both; the others are in red. Different marker shapes distinguish players who scored in double digits, players who recorded a double-double, players who did not participate in the game, and the remaining players.]

6.3 Duplicate ratios of extracted relations

As Puduppully et al. (2019) pointed out, a generated summary may mention the same relation multiple times. Such duplicated relations are not favorable in terms of the brevity of the text.
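Counting such duplicates is straightforward once relations have been extracted from a summary. The sketch below illustrates the bookkeeping only; the extraction step itself (the IE system of Wiseman et al. (2017)) is assumed to have produced the tuples, and this is not the authors' evaluation code.

```python
from collections import Counter

def duplicate_ratio(extracted_per_summary):
    """Fraction of summaries that mention the same (entity, attribute, value)
    relation more than once, as reported in Figure 3.

    extracted_per_summary: list of lists of relation tuples, one list per summary.
    """
    duplicated = 0
    for relations in extracted_per_summary:
        counts = Counter(relations)
        if any(c > 1 for c in counts.values()):
            duplicated += 1
    return duplicated / max(len(extracted_per_summary), 1)

# toy usage: the first summary repeats a relation, the second does not
summaries = [
    [("DERRICK ROSE", "PLAYER_PTS", "15"), ("DERRICK ROSE", "PLAYER_PTS", "15")],
    [("GIANNIS ANTETOKOUNMPO", "PLAYER_PTS", "27")],
]
print(duplicate_ratio(summaries))  # -> 0.5
```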

[Figure 3: Ratios of generated summaries with duplicate mentions of relations. Each label represents the number of duplicated relations within each document. Wiseman et al. (2017)'s model exhibited 36.0% duplication and Puduppully et al. (2019)'s model 15.8%, whereas our model exhibited only 4.2%.]

Figure 3 shows the ratios of generated summaries with duplicate mentions of relations in the development data. While the models by Wiseman et al. (2017) and Puduppully et al. (2019) showed duplicate ratios of 36.0% and 15.8%, respectively, our model exhibited 4.2%. This suggests that our model dramatically suppresses the generation of redundant relations. We speculate that the tracking model successfully memorizes which input records have been selected in h^ENT_s.

6.4 Qualitative analysis of output examples

Table 5 shows examples generated from validation inputs with Puduppully et al. (2019)'s model and with our model. Whereas both generations seem fluent, the summary of Puduppully et al. (2019)'s model includes erroneous relations (colored in orange in the original table).

Specifically, the description of DERRICK ROSE's relations, "15 points, four assists, three rebounds and one steal in 33 minutes.", is also used for other entities (e.g., JOHN HENSON and WILLY HERNANGOMEZ). This is because Puduppully et al. (2019)'s model has no tracking module, unlike our model, which mitigates redundant references and therefore rarely produces erroneous relations.

However, when complicated expressions such as parallel structures are used, our model also generates erroneous relations, as illustrated by the underlined sentences describing two players who scored the same number of points. For example, "11-point efforts" is correct for COURTNEY LEE but not for DERRICK ROSE. As future work, it is necessary to develop a method that can handle such complicated relations.

6.5 Use of writer information

We first look at the results of an extension of Puduppully et al. (2019)'s model with writer information w in Table 4. By adding w to content planning (stage 1), the method obtained improvements in CS (37.60 to 47.25), CO (16.97 to 22.16), and BLEU score (13.96 to 18.18). By adding w to the component for surface realization (stage 2), the method obtained an improvement in BLEU score (13.96 to 17.81), while the effects on the other metrics were not significant. By adding w to both stages, the method scored the highest BLEU, while the other metrics were not very different from those obtained by adding w to stage 1. This result suggests that writer information contributes to both content planning and surface realization when it is properly used, and that improvements in content planning lead to much better performance in surface realization.

Method                   | RG #  | RG P% | CS P% | CS R% | CS F1% | CO DLD% | BLEU
Puduppully et al. (2019) | 33.06 | 83.17 | 33.06 | 43.59 | 37.60  | 16.97   | 13.96
  + w in stage 1         | 28.43 | 84.75 | 45.00 | 49.73 | 47.25  | 22.16   | 18.18
  + w in stage 2         | 35.06 | 80.51 | 31.10 | 45.28 | 36.87  | 16.38   | 17.81
  + w in stage 1 & 2     | 28.00 | 82.27 | 44.37 | 48.71 | 46.44  | 22.41   | 18.90
PROPOSED                 | 39.05 | 94.38 | 35.77 | 52.05 | 42.40  | 19.38   | 16.15
  + w                    | 30.25 | 92.00 | 50.75 | 59.03 | 54.58  | 25.75   | 20.84

Table 4: Effects of writer information. w indicates that writer embeddings are used. Numbers in bold in the original table are the largest among the variants of each method.

(a) Puduppully et al. (2019): "The Milwaukee Bucks defeated the New York Knicks, 105-104, at Madison Square Garden on Wednesday evening. The Bucks (18-16) have been one of the hottest teams in the league, having won five of their last six games ... Giannis Antetokounmpo led the way for Milwaukee, as he tallied 27 points, 13 rebounds, four assists, three blocked shots and one steal, in 39 minutes. Jabari Parker added 15 points, four rebounds, three assists, one steal and one block, and 6-of-8 from long range. ... John Henson was the only other player to score in double digits for the Knicks, with 15 points, four assists, three rebounds and one steal, in 33 minutes. The Bucks were led by Derrick Rose, who tallied 15 points, four assists, three rebounds and one steal in 33 minutes. Willy Hernangomez started in place of Porzingis and finished with 15 points, four assists, three rebounds and one steal in 33 minutes. ... The Bucks next head to Philadelphia to take on the Sixers on Friday night, while the Knicks remain home to face the Clippers on Wednesday."

(b) Our model: "The Milwaukee Bucks defeated the New York Knicks, 105-104, at Madison Square Garden on Saturday. ... The Knicks were led by Giannis Antetokounmpo, who tallied a game-high 27 points, to go along with 13 rebounds, four assists, three blocks and a steal. ... Jabari Parker was right behind him with 15 points, four rebounds, three assists and a block. Greg Monroe was next with a bench-leading 18 points, along with nine rebounds, four assists and three steals. Brogdon posted 12 points, eight assists, six rebounds and a steal. Derrick Rose and Courtney Lee were next with a pair of 11-point efforts. Rose also supplied four assists and three rebounds, while Lee complemented his scoring with three assists, a rebound and a steal. John Henson and Mirza Teletovic were next with a pair of two-point efforts. Teletovic also registered 13 points, and he added a rebound and an assist. ... The Cavs remain in last place in the Eastern Conference's Atlantic Division. They now head home to face the ... on Saturday night."

Table 5: Example summaries generated with Puduppully et al.(2019)’s model (left) and our model (right). Names in bold face are salient entities. Blue numbers are correct relations derived from input data records but are not observed in reference summary. Orange numbers are incorrect relations. Green numbers are correct relations mentioned in reference summary.

Our model showed improvements in most metrics and achieved the best performance by incorporating the writer information w. As discussed in Section 4.5, w is supposed to affect both content planning and surface realization. Our experimental results are consistent with this discussion.

7 Conclusion

In this research, we proposed a new data-to-text model that produces a summary while tracking the salient information, imitating a human writing process. As a result, our model outperformed the existing models in all evaluation measures. We also explored the effects of incorporating writer information into data-to-text models. With writer information, our model successfully generated the highest-quality summaries, with a BLEU score of 20.84.

Acknowledgments

We would like to thank the anonymous reviewers for their helpful suggestions. This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO), JST PRESTO (Grant Number JPMJPR1655), and the AIST-Tokyo Tech Real World Big-Data Computation Open Innovation Laboratory (RWBC-OIL).

References

Tatsuya Aoki, Akira Miyazawa, Tatsuya Ishigaki, Keiichi Goshima, Kasumi Aoki, Ichiro Kobayashi, Hiroya Takamura, and Yusuke Miyao. 2018. Generating market comments referring to external resources. In Proceedings of the 11th International Conference on Natural Language Generation, pages 135-139.
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In Proceedings of the Third International Conference on Learning Representations.
Regina Barzilay and Mirella Lapata. 2005. Collective content selection for concept-to-text generation. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 331-338.
Antoine Bosselut, Omer Levy, Ari Holtzman, Corin Ennis, Dieter Fox, and Yejin Choi. 2018. Simulating action dynamics with neural process networks. In Proceedings of the Sixth International Conference on Learning Representations.
David L. Chen and Raymond J. Mooney. 2008. Learning to sportscast: A test of grounded language acquisition. In Proceedings of the 25th International Conference on Machine Learning, pages 128-135.
Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1724-1734.
Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
Elizabeth Clark, Yangfeng Ji, and Noah A. Smith. 2018. Neural text generation in stories using entity representations as context. In Proceedings of the 16th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2250-2260.
Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249-256.
Alex Graves, Greg Wayne, Malcolm Reynolds, Tim Harley, Ivo Danihelka, Agnieszka Grabska-Barwinska, Sergio Gomez Colmenarejo, Edward Grefenstette, Tiago Ramalho, John Agapiou, et al. 2016. Hybrid computing using a neural network with dynamic external memory. Nature, 538(7626):471.
Jiatao Gu, Zhengdong Lu, Hang Li, and Victor O.K. Li. 2016. Incorporating copying mechanism in sequence-to-sequence learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, volume 1, pages 1631-1640.
Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. 2016. Pointing the unknown words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 140-149.
Luong Hoang, Sam Wiseman, and Alexander Rush. 2018. Entity tracking improves cloze-style reading comprehension. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 1049-1055.
Yangfeng Ji, Chenhao Tan, Sebastian Martschat, Yejin Choi, and Noah A. Smith. 2017. Dynamic entity representations in neural language models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1830-1839.
Chloé Kiddon, Luke Zettlemoyer, and Yejin Choi. 2016. Globally coherent text generation with neural checklist models. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 329-339.
Sosuke Kobayashi, Ran Tian, Naoaki Okazaki, and Kentaro Inui. 2016. Dynamic entity representation with max-pooling improves machine reading. In Proceedings of the 15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 850-855.
Rémi Lebret, David Grangier, and Michael Auli. 2016. Neural text generation from structured data with application to the biography domain. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 1203-1213.
Percy Liang, Michael I. Jordan, and Dan Klein. 2009. Learning semantic correspondences with less supervision. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 91-99.
Tianyu Liu, Kexiang Wang, Lei Sha, Baobao Chang, and Zhifang Sui. 2018. Table-to-text generation by structure-aware seq2seq learning. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence.
Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412-1421.
Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. 2016. What to talk about and how? Selective generation using LSTMs with coarse-to-fine alignment. In Proceedings of the 15th Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 720-730.
Soichiro Murakami, Akihiko Watanabe, Akira Miyazawa, Keiichi Goshima, Toshihiko Yanase, Hiroya Takamura, and Yusuke Miyao. 2017. Learning to generate market comments from stock prices. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1374-1384.
Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, et al. 2017. DyNet: The dynamic neural network toolkit. arXiv preprint arXiv:1701.03980.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311-318.
Karl Pearson. 1901. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11):559-572.
Ratish Puduppully, Li Dong, and Mirella Lapata. 2019. Data-to-text generation with content selection and planning. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.
Sashank J. Reddi, Satyen Kale, and Sanjiv Kumar. 2018. On the convergence of Adam and beyond. In Proceedings of the Sixth International Conference on Learning Representations.
Lei Sha, Lili Mou, Tianyu Liu, Pascal Poupart, Sujian Li, Baobao Chang, and Zhifang Sui. 2018. Order-planning neural text generation from structured data. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence.
Sainbayar Sukhbaatar, Jason Weston, Rob Fergus, et al. 2015. End-to-end memory networks. In Advances in Neural Information Processing Systems, pages 2440-2448.
Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems, pages 3104-3112.
Kumiko Tanaka-Ishii, Koiti Hasida, and Itsuki Noda. 1998. Reactive content selection in the generation of real-time soccer commentary. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, pages 1282-1288.
Yasufumi Taniguchi, Yukun Feng, Hiroya Takamura, and Manabu Okumura. 2019. Generating live soccer-match commentary from play data. In Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence.
Zhaopeng Tu, Yang Liu, Zhengdong Lu, Xiaohua Liu, and Hang Li. 2017. Context gates for neural machine translation. Transactions of the Association for Computational Linguistics, 5:87-99.
Jason Weston, Sumit Chopra, and Antoine Bordes. 2015. Memory networks. In Proceedings of the Third International Conference on Learning Representations.
Sam Wiseman, Stuart Shieber, and Alexander Rush. 2017. Challenges in data-to-document generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2253-2263.
Zichao Yang, Phil Blunsom, Chris Dyer, and Wang Ling. 2017. Reference-aware language models. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1850-1859.

A Algorithm

The generation process of our model is shown in Algorithm 1. For a concise description, we omit the conditions in the probability notation. <SOD> and <EOD> represent the start of the document and the end of the document, respectively.

Algorithm 1: Generation process
Input: data records x; annotations Z_{1:T}, E_{1:T}, A_{1:T}, N_{1:T}
  initialize {r_{e,a,v}}_{r∈x}, {ē}_{e∈E}, h^LM_0, h^ENT_0
  t ← 0
  e_t, y_t ← NONE, <SOD>
  while y_t ≠ <EOD> do
    t ← t + 1
    if p(Z_t = 1) ≥ 0.5 then
      e_t ← argmax p(E_t = e_t)                        /* select the entity */
      if e_t ∉ E_{t-1} then                            /* e_t is a new entity */
        h^ENT'_t ← GRU^E(ē_t, h^ENT_{t-1})
        E_t ← E_{t-1} ∪ {e_t}
      else if e_t ≠ e_{t-1} then                       /* e_t was observed before, but differs from the previous one */
        h^ENT'_t ← GRU^E(W^S h^ENT_s, h^ENT_{t-1}), where s = max{s' : s' ≤ t-1, e = e_{s'}}
      else
        h^ENT'_t ← h^ENT_{t-1}
      a_t ← argmax p(A_t = a_t)                        /* select an attribute of entity e_t */
      h^ENT_t ← GRU^A(r_{e_t,a_t,x[e_t,a_t]}, h^ENT'_t)
      if a_t is a numeric attribute and p(N_t = 1) ≥ 0.5 then
        y_t ← numeral of x[e_t, a_t]
      else
        y_t ← x[e_t, a_t]
      h'_t ← tanh(W^H(h^LM_{t-1} ⊕ h^ENT_t))
      h^LM_t ← LSTM(y_t ⊕ h'_t, h^LM_{t-1})
    else
      e_t, a_t, h^ENT_t ← e_{t-1}, a_{t-1}, h^ENT_{t-1}
      h'_t ← tanh(W^H(h^LM_{t-1} ⊕ h^ENT_t))
      y_t ← argmax p(Y_t)
      h^LM_t ← LSTM(y_t ⊕ h'_t, h^LM_{t-1})
    if y_t is "." then
      h^ENT_t ← GRU^A(v_REFRESH, h^ENT_t)
  return y_{1:t-1}

B Experimental settings

We set the dimensions of the embeddings to 128 and those of the RNN hidden states to 512; all parameters are initialized with Xavier initialization (Glorot and Bengio, 2010). We set the maximum number of epochs to 30 and choose the model with the highest BLEU score on the development data. The initial learning rate is 2e-3, and AMSGrad is used for automatically adjusting the learning rate (Reddi et al., 2018). Our implementation uses DyNet (Neubig et al., 2017).

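For reference, the initialization described in Appendix B (Xavier/Glorot uniform initialization, 128-dimensional embeddings, 512-dimensional RNN states, 30 epochs, initial learning rate 2e-3 with AMSGrad) can be written down as follows. This is a generic numpy sketch under those stated settings, not the authors' DyNet code.

```python
import numpy as np

EMB_DIM, HID_DIM = 128, 512
MAX_EPOCHS, INIT_LR = 30, 2e-3   # AMSGrad adjusts the rate during training

def xavier_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
    """Glorot and Bengio (2010): U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=(fan_out, fan_in))

# e.g., the weight of Eq. (1), which maps a concatenated (e ⊕ a ⊕ v) vector
W_R = xavier_uniform(3 * EMB_DIM, EMB_DIM)
```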