Table-to-Text Generation with Effective Hierarchical Encoder on Three Dimensions (Row, Column and Time)

Heng Gong, Xiaocheng Feng, Bing Qin∗, Ting Liu
Harbin Institute of Technology, China
{hgong, xcfeng, qinb, tliu}@ir.hit.edu.cn

arXiv:1909.02304v1 [cs.CL] 5 Sep 2019
∗ Email corresponding.

Abstract

Although Seq2Seq models for table-to-text generation have achieved remarkable progress, modeling the table representation in one dimension is inadequate. This is because (1) a table consists of multiple rows and columns, which means that encoding a table should not depend only on a one-dimensional sequence or set of records, and (2) most tables are time series data (e.g. NBA game data, stock market data), which means that the description of the current table may be affected by its historical data. To address these problems, we not only model each table cell considering other records in the same row, but also enrich the table's representation by modeling each table cell in the context of other cells in the same column and of its historical (time dimension) data. In addition, we develop a table cell fusion gate that combines the representations from the row, column and time dimensions into one dense vector according to the saliency of each dimension's representation. We evaluated our methods on ROTOWIRE, a benchmark dataset of NBA basketball games. Both automatic and human evaluation results demonstrate the effectiveness of our model, which improves BLEU by 2.66 over the strong baseline and outperforms the state-of-the-art model.

Tables:

Team      POINTS  WINS  LOSSES  …
Wizards   88      31    18      …
Hornets   92      21    27      …

Player                  PTS  AST  REB  …
Wizards
[…]                     11   1    3    …
Nene                    8    1    7    …
Bradley Beal            18   1    11   …
John Wall               16   10   1    …
…                       …    …    …    …
[…]                     13   1    5    …
Hornets
Michael Kidd-Gilchrist  13   3    13   …
Al Jefferson            18   1    12   …
Gerald Henderson        17   5    2    …
Brian Roberts           18   3    1    …
…                       …    …    …    …
Gary Neal               12   1    0    …

Baseline result (CC): The […] ( 21 - 27 ) defeated the […] ( 31 - 18 ) 92 - 88 on Wednesday … The Hornets were led by the duo of John Wall and Bradley Beal . Wall went 4 - for - 14 from the field and 1 - for - 4 from the three - line to score a game - high of 16 point … Gerald Henderson had a solid showing as well , finishing with 17 points ( 6 - 13 FG , 1 - 2 3Pt , 4 - 4 FT ) and five assists . It was his second double - double in a row …

Gold: The Charlotte Hornets ( 21 - 27 ) defeated the Washington Wizards ( 31 - 18 ) 92 - 88 on Monday … The Hornets were led by […] in this game , who went 9 - for - 19 from the floor to score 18 points ... It was the second time in the last three games he 's posted a double - double , while the two steals matched a season - high for the … Beal has turned it on over his last two games , combining for 44 points and 14 rebounds ... This double - double marked the second in a row for Wall , who 's combined for 44 points and 22 assists over his last two games …

Figure 1: Generated example on ROTOWIRE using Conditional Copy (CC) as the baseline (Wiseman et al., 2017). Text that accurately reflects records in the table is in red, and text that contradicts the records is in blue.

1 Introduction

Table-to-text generation is an important and challenging task in natural language processing, which aims to produce a summary of a numerical table (Reiter and Dale, 2000; Gkatzia, 2016). The related methods can be empirically divided into two categories: pipeline models and end-to-end models. The former consist of content selection, document planning and realisation, and were mainly used in early industrial applications such as weather forecasting and medical monitoring. The latter generate text directly from the table through a standard neural encoder-decoder framework to avoid error propagation, and have achieved remarkable progress. In this paper, we focus on how to improve the performance of neural methods on table-to-text generation.

Recently, ROTOWIRE, which provides tables of NBA players' and teams' statistics with a descriptive summary, has drawn increasing attention from the academic community. Figure 1 shows an example of parts of a game's statistics and its corresponding computer-generated summary. We can see that the tables have a formal structure including table row headers, table column headers and table cells. "Al Jefferson" is a table row header that represents a player, "PTS" is a table column header indicating that the column contains the players' scores, and "18" is the value of the table cell; that is, Al Jefferson scored 18 points. Several related models have been proposed. They typically encode the table's records separately or as a long sequence and generate a long descriptive summary with a standard Seq2Seq decoder with some modifications. Wiseman et al. (2017) explored two types of copy mechanism and found that the conditional copy model (Gulcehre et al., 2016) performs better. Puduppully et al. (2019) enhanced content selection ability by explicitly selecting and planning relevant records. Li and Wan (2018) improved the precision of describing data records in the generated texts by first generating a template and then filling in the slots via a copy mechanism. Nie et al. (2018) utilized results from pre-executed operations to improve the fidelity of generated texts.

However, we claim that encoding tables as sets of records or as a long sequence is not suitable, for two reasons. (1) The table consists of multiple players and different types of information, as shown in Figure 1. The earlier encoding approaches only considered the table as sets of records or as a one-dimensional sequence, which loses the information of the other (column) dimension. (2) The table cells are time-series data which change over time. That is to say, historical data can sometimes help the model select content. Moreover, when a human writes a basketball report, he will not only focus on the players' outstanding performance in the current match, but also summarize the players' performance in recent matches. Let us take Figure 1 again. Not only do the gold texts mention Al Jefferson's great performance in this match, they also state that "It was the second time in the last three games he's posted a double-double". The gold texts also summarize John Wall's "double-double" performance in a similar way. Summarizing a player's performance in recent matches requires modeling a table cell with respect to its historical data (time dimension), which is absent in the baseline model. Although the baseline model Conditional Copy (CC) tries to produce such a summary for Gerald Henderson, it clearly produces a wrong statement, since he did not get a "double-double" in this match.

To address the aforementioned problems, we present a hierarchical encoder that simultaneously models row, column and time dimension information. In detail, our model is divided into three layers. The first layer learns the representation of the table cell. Specifically, we employ three self-attention models to obtain three representations of the table cell in its row, column and time dimensions. Then, in the second layer, we design a record fusion gate to identify the more important representations among those three dimensions and combine them into a dense vector. In the third layer, we use mean pooling to merge the previously obtained table cell representations in the same row into the representation of the table's row. Then, we use self-attention with a content selection gate (Puduppully et al., 2019) to filter out unimportant rows' information. To the best of our knowledge, this is the first work on neural table-to-text generation that models column and time dimension information. We conducted experiments on ROTOWIRE. Results show that our model outperforms existing systems, improving the baseline's BLEU from 14.19 to 16.85 (+18.75%), P% of relation generation (RG) from 74.80 to 91.46 (+22.27%), F1% of content selection (CS) from 32.49 to 41.21 (+26.84%) and content ordering (CO) from 15.42 to 20.86 (+35.28%) on the test set. It also exceeds the state-of-the-art model in terms of those metrics.

2 Preliminaries

2.1 Notations

The inputs to the model are tables $S = \{s_1, s_2, s_3\}$. $s_1$, $s_2$ and $s_3$ contain records about the players' performance in the home team, the players' performance in the visiting team and the teams' overall performance, respectively. We regard each cell in the table as a record. Each record $r$ consists of four types of information: a value $r.v$ (e.g. 18), an entity $r.e$ (e.g. Al Jefferson), a type $r.c$ (e.g. POINTS) and a feature $r.f$ (e.g. visiting) which indicates whether a player or a team competes on its home court or not. Each player or team takes one row in the table and each column contains one type of record such as points, assists, etc. Also, the tables contain the date when the match happened, and we let $k$ denote the date of the record. We also create timelines for records. The details of timeline construction are described in Section 2.2. For simplicity, we omit the table id $l$ and the record date $k$ in the following sections and let $r_{i,j}$ denote the record in the $i$-th row and $j$-th column of the table. We assume the records come from the same table and that $k$ is the date of the mentioned record. Given this information, the model is expected to generate text $y = (y_1, ..., y_t, ..., y_T)$ describing these tables. $T$ denotes the length of the text.
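The record notation of Section 2.1 can be illustrated with a small data structure. This is only a sketch for orientation, not the authors' implementation: the class name `Record`, its field names and the helper `to_rows` are hypothetical, and the example values are taken from the Al Jefferson row in Figure 1 together with the example feature "visiting" and the date string that appears in Figure 2.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One table cell, following the notation r.v, r.e, r.c, r.f and date k."""
    value: str    # r.v, e.g. "18"
    entity: str   # r.e, e.g. "Al Jefferson"
    rtype: str    # r.c, e.g. "PTS"
    feature: str  # r.f, e.g. "visiting" (home court or not)
    date: str     # k, the date of the match (format is an assumption)


def to_rows(records):
    """Group records so that each entity is one row and each type one column."""
    rows = defaultdict(dict)
    for r in records:
        rows[r.entity][r.rtype] = r
    return dict(rows)


# Illustrative records for one player (values from Figure 1).
records = [
    Record("18", "Al Jefferson", "PTS", "visiting", "2015_02_02"),
    Record("1", "Al Jefferson", "AST", "visiting", "2015_02_02"),
    Record("12", "Al Jefferson", "REB", "visiting", "2015_02_02"),
]
rows = to_rows(records)
```

Here `rows["Al Jefferson"]["PTS"]` plays the role of a single record $r_{i,j}$: the entity selects the row and the type selects the column.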

[Figure 2 diagram. Recoverable panel labels: Hierarchical Encoding (Row Dimension Self Attention, Column Dimension Self Attention, Time Dimension Self Attention with Date Position Embedding over the Timeline, Record Fusion Gate, Mean Pooling, Table Cell Representation, Table Row Representation, Self Attention with Content Selection Gate) and Decoding (Dual Attention: Row Attention and Cell Attention). The running example is the Hornets player table with Al Jefferson's records PTS_18, AST_1 and REB_12, and a PTS timeline with the entries PTS_17, PTS_14, PTS_18 and the dates 2015_01_28, 2015_01_31, 2015_02_02.]

Figure 2: The architecture of our proposed model. Panel labels: Layer 1 (record encoders), Layer 2, Layer 3 (row-level encoder).

2.2 Record Timeline Construction

In this paper, we construct timelines $tl = \{tl_{e,c}\}_{e=1,c=1}^{E,C}$ for records. $E$ denotes the number of distinct record entities and $C$ denotes the number of record types. For each timeline $tl_{e,c}$, we first extract records with the same entity $e$ and […]

[…] of three tables and normalize the attention weight $\alpha_{t,i',j'}$ across every record in every table. Then it combines the context vector with the decoder's hidden state $d_t$ and forms a new attentional hidden state $\tilde{d}_t$, which is used to generate words from the vocabulary, P_gen(y_t | y […]
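The timeline idea of Section 2.2 can be sketched as follows. Since the description above is cut off, this is a guess at the overall shape rather than the authors' procedure: grouping by (entity, type) follows the text, while sorting by date and the helper names `build_timelines` and `history_window` are assumptions. The window size of 3 is the value reported in Section 4.2, the window holds only records dated strictly before the current one, and the pairing of dates with point values in the example is illustrative.

```python
from collections import defaultdict


def build_timelines(records):
    """Group (entity, type, date, value) tuples into one timeline per (entity, type).

    Assumption: each timeline is ordered chronologically; dates written as
    YYYY_MM_DD strings sort correctly as plain strings.
    """
    timelines = defaultdict(list)
    for entity, rtype, date, value in records:
        timelines[(entity, rtype)].append((date, value))
    for key in timelines:
        timelines[key].sort()
    return dict(timelines)


def history_window(timeline, date, size=3):
    """Return up to `size` most recent entries dated strictly before `date`."""
    earlier = [(d, v) for d, v in timeline if d < date]
    return earlier[-size:]


# Illustrative data: the date/value pairing is assumed, not taken from the paper.
records = [
    ("Al Jefferson", "PTS", "2015_02_02", 18),
    ("Al Jefferson", "PTS", "2015_01_28", 17),
    ("Al Jefferson", "PTS", "2015_01_31", 14),
    ("Al Jefferson", "AST", "2015_02_02", 1),
]
timelines = build_timelines(records)
window = history_window(timelines[("Al Jefferson", "PTS")], "2015_02_02")
```

With this layout, the time dimension encoder only needs the window returned for the current record's date.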

$$c^{row}_{i,j} = \sum_{j',\, j' \neq j} \alpha^{row}_{i,j,j'}\, r_{i,j'} \qquad (2)$$

Then, we combine the record's representation with $c^{row}_{i,j}$ and obtain the row dimension record representation $r^{row}_{i,j} = \tanh(W_f [r_{i,j}; c^{row}_{i,j}])$. $W_f$ is a trainable parameter.

3.1.2 Column Dimension Encoder

Each input table consists of multiple rows and columns. Each column in the table covers one type of information such as points. Only a few of the rows may have high points or high values of another type of information and thus become the important ones. For example, in the "Column Dimension" part of Figure 2, "Al Jefferson" is more important than "Gary Neal" because the former has more impressive points. Therefore, when encoding a record, it is helpful to compare it with other records in the same column in order to understand the performance level re[…]

[…] of rows and columns, the history information is sequential. Therefore, we introduce a trainable position embedding $emb_{pos}(k')$ and add it to the record's representation to obtain a new record representation $rp_{k'}$. It denotes the representation of a record with the same entity and type as $r_{i,j}$ but with the date $k'$ before $k$ in the corresponding history window. We use $r^{time}_{i,j}$ to denote the history representation of the record in the $i$-th row and $j$-th column. The history dimension context vector is then obtained by attending to the history records in the window. Please note that we use a 1-layer MLP as the score function here and that $\alpha^{time}_{k,k'}$ is normalized within the history window. We obtain the time dimension representation $r^{time}_{i,j}$ in the same way as for the row dimension.

$$\alpha^{time}_{k,k'} \propto \exp(score(rp_k, rp_{k'})) \qquad (4)$$

$$c^{time}_{i,j} = \sum_{k' < k} \alpha^{time}_{k,k'}\, rp_{k'} \qquad (5)$$
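As a rough illustration of Equation (2) and the row dimension representation, the sketch below computes the context vector of one record from the other records in its row and then applies the $\tanh(W_f[\cdot;\cdot])$ combination. It is not the authors' code: the definition of the row attention weights is not recoverable from the text here, so a plain dot-product score is assumed, and the function names and toy vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def row_context(row, j):
    """Eq. (2): weighted sum over the other records j' != j in the same row.

    Assumption: attention weights come from a dot-product score.
    """
    others = [r for jp, r in enumerate(row) if jp != j]
    alphas = softmax([dot(row[j], r) for r in others])
    dim = len(row[j])
    return [sum(a * r[d] for a, r in zip(alphas, others)) for d in range(dim)]


def row_representation(row, j, W_f):
    """tanh(W_f [r_ij ; c_ij]) with W_f given as a list of weight rows."""
    concat = row[j] + row_context(row, j)  # list concatenation [r; c]
    return [math.tanh(dot(w, concat)) for w in W_f]


# Toy row with three 2-dimensional record vectors and a 2x4 matrix W_f.
row = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_f = [[0.5, 0.0, 0.5, 0.0], [0.0, 0.5, 0.0, 0.5]]
c = row_context(row, 0)
rep = row_representation(row, 0, W_f)
```

The record itself is excluded from its own context, matching the $j' \neq j$ constraint in Equation (2).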

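Equations (4) and (5) can be sketched in the same style. Again this is a hedged illustration under stated assumptions: the text only says that the score is a 1-layer MLP, so the specific form $v^\top \tanh(W[q; h])$, the indexing of the position embedding by position within the window, and all names and toy numbers below are hypothetical.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def mlp_score(q, h, W, v):
    """Assumed 1-layer MLP score: v . tanh(W [q; h])."""
    x = q + h  # list concatenation [q; h]
    hidden = [math.tanh(sum(w * xi for w, xi in zip(w_row, x))) for w_row in W]
    return sum(vi * hi for vi, hi in zip(v, hidden))


def time_context(current, history, pos_emb, W, v):
    """Eqs. (4)-(5): attend from the current record to its history window.

    `history` is ordered oldest first; `pos_emb` needs len(history) + 1 rows
    (assumption: one embedding per position in the window, the last one for
    the current record).
    """
    rp_hist = [add(h, pos_emb[p]) for p, h in enumerate(history)]
    rp_k = add(current, pos_emb[len(history)])
    alphas = softmax([mlp_score(rp_k, rp, W, v) for rp in rp_hist])
    dim = len(current)
    ctx = [sum(a * rp[d] for a, rp in zip(alphas, rp_hist)) for d in range(dim)]
    return alphas, ctx


# Toy example with 2-dimensional records and a window of two earlier dates.
current = [0.5, 0.1]
history = [[0.4, 0.2], [0.1, 0.3]]
pos_emb = [[0.0, 0.1], [0.1, 0.0], [0.1, 0.1]]
W = [[0.3, -0.2, 0.1, 0.4], [0.2, 0.1, -0.3, 0.5]]
v = [1.0, -1.0]
alphas, ctx = time_context(current, history, pos_emb, W, v)
```

Because the weights are normalized only over the history window, the context vector is a convex combination of the position-augmented history records.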
Table 1: Automatic evaluation results. Results were obtained using Puduppully et al. (2019)'s updated models.

[…] texts. CS (Content Selection) measures the model's ability on content selection. CO (Content Ordering) measures the model's ability on ordering the chosen records in texts. We refer the readers to Wiseman et al. (2017)'s paper for more details.

4.2 Implementation Details

Following the configurations in Puduppully et al. (2019), we set the word embedding size and the LSTM decoder hidden size to 600. The decoder has 2 layers. Input feeding (Luong et al., 2015) was also used for the decoder. We applied dropout at a rate of 0.3. For training, we used the Adagrad (Duchi et al., 2010) optimizer with a learning rate of 0.15, truncated BPTT (length 100), a batch size of 5 and a learning rate decay of 0.97. For inference, we set the beam size to 5. We also set the history window size to 3, chosen from {3, 5, 7} based on the results. Code of our model can be found at https://github.com/ernestgong/data2text-three-dimensions/.

4.3 Results

4.3.1 Automatic Evaluation

Table 1 displays the automatic evaluation results on both the development and test sets. We chose the Conditional Copy (CC) model as our baseline, which is the best model in Wiseman et al. (2017). We included the reported scores with the updated IE model by Puduppully et al. (2019) and our implementation's result on CC in this paper. Also, we compared our models with other existing works on this dataset, including OpATT (Nie et al., 2018) and Neural Content Planning with conditional copy (NCP+CC) (Puduppully et al., 2019). In addition, we implemented three other hierarchical encoders that encode the tables' row dimension information at both the record level and the row level, to compare with the hierarchical structure of the encoder in our model. The decoder was equipped with dual attention (Cohan et al., 2018). The one with the LSTM cell is similar to the one in Cohan et al. (2018), with 1 layer chosen from {1, 2, 3}. The one with the CNN cell (Gehring et al., 2017) has kernel width 3 chosen from {3, 5} and 10 layers chosen from {5, 10, 15, 20}. The one with the transformer-style encoder (MHSA) (Vaswani et al., 2017) has 8 heads chosen from {8, 10} and 5 layers chosen from {2, 3, 4, 5, 6}. The heads and layers mentioned above apply to both the record-level encoder and the row-level encoder. The self-attention (SA) cell we used, as described in Section 3, achieved better overall performance in terms of F1% of CS, CO and BLEU among the hierarchical encoders. We also implemented a template system identical to the one used in Wiseman et al. (2017), which outputs eight sentences: an introductory sentence (the two teams' points and who won), the statistics of the six top players (ranked by their points) and a conclusion sentence. We refer the readers to Wiseman et al. (2017)'s paper for more detailed information on the templates. The gold reference's result is also included in Table 1.

Overall, our model performs better than the other neural models on both the development and test sets in terms of RG's P%, the F1% score of CS, CO and BLEU, indicating our model's clear improvement in generating high-fidelity, informative and fluent texts. Also, our model with three dimension representations outperforms the hierarchical encoders with only the row dimension representation on the development set. This indicates that the column and time dimension representations are important in representing the tables. Compared to the reported baseline result in Wiseman et al. (2017), we achieved an improvement of 22.27% in terms of RG, 26.84% in terms of CS F1%, 35.28% in terms of CO and 18.75% in terms of BLEU on the test set. Unsurprisingly, the template system achieves the best RG P% and CS R% due to the included domain knowledge. Also, its high RG # and low CS P% indicate that the template includes a vast amount of information, much of which is deemed redundant. In addition, its low CO and low BLEU indicate that the rigid structure of the template produces texts that are not as adaptive to the given tables and not as natural as those produced by neural models.

Also, we conducted an ablation study on our model to evaluate each component's contribution on the development set. Based on the results, the absence of the row-level encoder hurts our model's performance across all metrics, especially the content selection ability. Row, column and time dimension information are all important to the modeling of tables, because removing any of them results in a performance drop. Also, the position embedding is critical when modeling time dimension information according to the results. In addition, the record fusion gate plays an important role, because BLEU, CO, RG P% and CS P% drop significantly after removing it from the full model. The results show that each component in the model contributes to the overall performance.

In addition, we compare our model with the delayed copy model (DEL) (Li and Wan, 2018) along with the gold text, the template system (TEM), conditional copy (CC) (Wiseman et al., 2017) and NCP+CC (NCP) (Puduppully et al., 2019). Li and Wan (2018)'s model first generates a template and then fills in the slots with a delayed copy mechanism. Since its result in Li and Wan (2018)'s paper was evaluated with the IE model trained by Wiseman et al. (2017) and the "relexicalization" of Li and Wan (2018), we adopted the corresponding IE model and re-implemented "relexicalization" as suggested by Li and Wan (2018) for a fair comparison. Please note that CC's evaluation result via our re-implemented "relexicalization" is comparable to the result reported in Li and Wan (2018). We applied them to the models other than DEL, as shown in Table 2, and report DEL's result from Li and Wan (2018)'s paper. Table 2 shows that our model outperforms Li and Wan (2018)'s model significantly across all automatic evaluation metrics.

Model   RG P%   RG #    CS P%    CS R%    CO DLD%   BLEU
Gold    96.01   17.17   100.00   100.00   100.00    100.00
TEM     99.97   54.14   23.88    72.63    11.90     8.33
CC      75.26   16.37   32.63    39.62    15.34     14.03
DEL∗    84.86   19.31   30.81    38.79    16.34     16.19
NCP     87.99   24.50   35.97    55.85    16.98     16.22
Ours    92.51   22.73   38.52    52.98    19.95     16.69

Table 2: Automatic evaluation results on the test set. Results were obtained using Wiseman et al. (2017)'s trained extractive evaluation models with relexicalization (Li and Wan, 2018). ∗ We include the delayed copy (DEL) result reported in the paper (Li and Wan, 2018) for comparison.

4.3.2 Human Evaluation

In this section, we hired three graduates who had passed an intermediate English test (College English Test Band 6) and were familiar with NBA games to perform the human evaluation.

First, in order to check whether history information is important, we sampled 100 summaries from the training set and asked the raters to manually check whether each summary contained expressions that need to be inferred from history information. It turns out that 56.7% of the sampled summaries need history information.

Following the human evaluation settings in Puduppully et al. (2019), we conducted the following human evaluation experiments at the same scale. The second experiment assesses whether the improvement on the relation generation metric reported in the automatic evaluation is supported by human evaluation. We compared our full model with gold texts, the template-based system, CC (Wiseman et al., 2017) and NCP+CC (NCP) (Puduppully et al., 2019). We randomly sampled 30 examples from the test set. Then, we randomly sampled 4 sentences from each model's output for each example. We provided the raters of those sampled sentences with the corresponding NBA game statistics. They were asked to count the number of supporting and contradicting facts in each sentence. Each sentence is rated independently. We report the average number of supporting facts (#Sup) and contradicting facts (#Cont) in Table 3. Unsurprisingly, the template-based system includes the most supporting facts and the fewest contradicting facts in its texts, because the template consists of a large number of facts and all of those facts are extracted from the table. Also, our model produces fewer contradicting facts than the other two neural models. Although our model produces fewer supporting facts than NCP and CC, it still includes enough supporting facts (slightly more than the gold texts). Also, compared to NCP+CC (NCP)'s tendency to include a vast amount of information that contains redundant information, our model's ability to select and accurately convey information is better. All other results (Gold, CC, NCP and ours) are significantly different from the template-based system's results in terms of the number of supporting facts according to one-way ANOVA with post-hoc Tukey HSD tests. All significant differences reported in this paper have p < 0.05. Our model is also significantly different from the NCP model. As for the average number of contradicting facts, our model is significantly different from the other two neural models. Surprisingly, gold texts were found to contain contradicting facts. We checked the raters' results and found that gold texts occasionally include a wrong field-goal or three-point percentage or a wrong point difference between the winner and the defeated team. We can treat the average number of contradicting facts of the gold texts as a lower bound.

Model   #Sup   #Cont   #Gram    #Coher   #Conc
Gold    3.48   0.19    16.67    24.22    25.78
Temp    7.83   0.00    11.56    -16.67   21.11
CC      3.91   1.23    -11.33   -7.78    -28.00
NCP     5.15   0.82    -17.33   -5.33    -17.11
Ours    3.63   0.44    0.44     5.56     -1.78

Table 3: Human evaluation results.

In the third experiment, following Puduppully et al. (2019), we asked raters to evaluate those models in terms of grammaticality (is it more fluent and grammatical?), coherence (is it easier to read or does it follow a more natural ordering of facts?) and conciseness (does it avoid redundant information and repetitions?). We adopted the same 30 examples from above and arranged every 5-tuple of summaries into 10 pairs. Then, we asked the raters to choose which system performs best in each pair. Scores are computed as the difference between the percentage of times the model is chosen as the best and the percentage of times the model is chosen as the worst. Gold texts are significantly better than the others across all three metrics. Also, our model performs significantly better than the other two neural models (CC, NCP) in all three metrics. The template-based system generates significantly more grammatical and concise but significantly less coherent results compared to all three neural models. This is because the rigid structure of the texts ensures correct grammaticality and no repetition in the template-based system's output. However, since the templates are stilted and lack variability compared to the others, they were deemed less coherent than the others by the raters.

4.3.3 Qualitative Example

Our model: The Charlotte Hornets ( 21 - 27 ) defeated the Washington Wizards ( 31 - 18 ) 92 - 88 on Monday … The Hornets were led by Al Jefferson , who recorded a double - double of his own with 18 points ( 9 - 19 FG , 0 - 2 FT ) and 12 rebounds . It was his second double - double over his last three games … The only other Wizard to reach double - digit points was Kris Humphries , who came off the bench for 13 points ( 4 - 8 FG , 5 - 6 FT ) and five rebounds in 26 minutes …

Figure 3: A generation example of our model based on the same tables as in Figure 1. Text that accurately reflects the players' (Al Jefferson and Kris Humphries) performance is in red.

Figure 3 shows an example generated by our model. It has several nice properties. It accurately selects the important player "Al Jefferson" from the tables, who is neglected by the baseline model; this requires the model to understand the performance difference of one type of data (column) between rows (players). It also correctly summarizes the performance of "Al Jefferson" in this match as a "double-double", which requires the ability to capture dependencies between different columns (different types of record) in the same row (player). In addition, it models the history performance of "Al Jefferson" and correctly states that "It was his second double-double over his last three games", which is also mentioned in a similar way in the gold texts included in Figure 1.

5 Related Work

In recent years, neural data-to-text systems have made remarkable progress on generating texts directly from data. Mei et al. (2016) propose an encoder-aligner-decoder model to generate weather forecasts, while Jain et al. (2018) propose a mixed hierarchical attention. Sha et al. (2018) propose a hybrid content- and linkage-based attention mechanism to model the order of content. Liu et al. (2018) propose to integrate field information into the table representation and enhance the decoder with dual attention. Bao et al. (2018) develop a table-aware encoder-decoder model. Wiseman et al. (2017) introduced a document-scale data-to-text dataset, consisting of long texts with more redundant records, which requires the model to select important information to generate. We describe recent works on this dataset in Section 1. Also, some studies in abstractive text summarization encode long texts in a hierarchical manner. Cohan et al. (2018) use a hierarchical encoder to encode the input, paired with a discourse-aware decoder. Ling and Rush (2017) encode documents hierarchically and propose coarse-to-fine attention for the decoder. Recently, Liu et al. (2019) propose a hierarchical encoder for data-to-text generation which uses LSTM as its cell. Murakami et al. (2017) propose to model stock market time-series data and generate comments. As for incorporating historical background in generation, Robin (1994) proposed to first build a draft with the essential new facts and then incorporate background facts when revising the draft, based on functional unification grammars. Different from that, we encode the historical (time dimension) information in the neural data-to-text model in an end-to-end fashion. Existing works on data-to-text generation neglect the joint representation of the tables' row, column and time dimension information. In this paper, we propose an effective hierarchical encoder which models information from the row, column and time dimensions simultaneously.

6 Conclusion

In this work, we present an effective hierarchical encoder for table-to-text generation that learns table representations from the row, column and time dimensions. In detail, our model consists of three layers, which learn the records' representations in three dimensions, combine those representations according to their saliency and obtain row-level representations based on the records' representations. Then, during decoding, it selects important table rows before attending to records. Experiments are conducted on ROTOWIRE, a benchmark dataset of NBA games. Both automatic and human evaluation results show that our model achieves new state-of-the-art performance.

Acknowledgements

We would like to thank the anonymous reviewers for their helpful comments. We'd also like to thank Xinwei Geng, Yibo Sun, Zhengpeng Xiang and Yuyu Chen for their valuable input. This work was supported by the National Key R&D Program of China via grant 2018YFB1005103 and the National Natural Science Foundation of China (NSFC) via grants 61632011 and 61772156.

References

Junwei Bao, Duyu Tang, Nan Duan, Zhao Yan, Yuanhua Lv, Ming Zhou, and Tiejun Zhao. 2018. Table-to-text: Describing table region with natural language. In The Thirty-Second AAAI Conference on Artificial Intelligence, pages 5020–5027. Association for the Advancement of Artificial Intelligence.

Arman Cohan, Franck Dernoncourt, Doo Soon Kim, Trung Bui, Seokhwan Kim, Walter Chang, and Nazli Goharian. 2018. A discourse-aware attention model for abstractive summarization of long documents. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 615–621. ACL.

John C. Duchi, Elad Hazan, and Yoram Singer. 2010. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159.

Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann Dauphin. 2017. Convolutional sequence to sequence learning. In Proceedings of the 34th International Conference on Machine Learning, pages 1243–1252. JMLR.

Dimitra Gkatzia. 2016. Content selection in data-to-text systems: A survey.

Caglar Gulcehre, Sungjin Ahn, Ramesh Nallapati, Bowen Zhou, and Yoshua Bengio. 2016. Pointing the unknown words. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, pages 140–149. ACL.

Parag Jain, Anirban Laha, Karthik Sankaranarayanan, Preksha Nema, Mitesh M. Khapra, and Shreyas Shetty. 2018. A mixed hierarchical attention based encoder-decoder approach for standard table summarization. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 622–627. ACL.

Liunian Li and Xiaojun Wan. 2018. Point precisely: Towards ensuring the precision of data in generated texts using delayed copy mechanism. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1044–1055. ACL.

Jeffrey Ling and Alexander Rush. 2017. Coarse-to-fine attention models for document summarization. In Proceedings of the Workshop on New Frontiers in Summarization, pages 33–42. ACL.

Tianyu Liu, Fuli Luo, Qiaolin Xia, Shuming Ma, Baobao Chang, and Zhifang Sui. 2019. Hierarchical encoder with auxiliary supervision for neural table-to-text generation: Learning better representation for tables. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 6786–6793. Association for the Advancement of Artificial Intelligence.

Tianyu Liu, Kexiang Wang, Lei Sha, Baobao Chang, and Zhifang Sui. 2018. Table-to-text generation by structure-aware seq2seq learning. In The Thirty-Second AAAI Conference on Artificial Intelligence, pages 4881–4888. Association for the Advancement of Artificial Intelligence.

Yang P. Liu and Mirella Lapata. 2018. Learning structured text representations. Transactions of the Association for Computational Linguistics, 6:63–75.

Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 1412–1421. ACL.

Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. 2016. What to talk about and how? Selective generation using LSTMs with coarse-to-fine alignment. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 720–730. ACL.

Soichiro Murakami, Akihiko Watanabe, Akira Miyazawa, Keiichi Goshima, Toshihiko Yanase, Hiroya Takamura, and Yusuke Miyao. 2017. Learning to generate market comments from stock prices. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, pages 1374–1384. ACL.

Feng Nie, Jinpeng Wang, Jin-Ge Yao, Rong Pan, and Chin-Yew Lin. 2018. Operation-guided neural networks for high fidelity data-to-text generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3879–3889. ACL.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318. ACL.

Ratish Puduppully, Li Dong, and Mirella Lapata. 2019. Data-to-text generation with content selection and planning. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 6908–6915. Association for the Advancement of Artificial Intelligence.

Ehud Reiter and Robert Dale. 2000. Building natural language generation systems. Cambridge University Press.

Jacques Robin. 1994. Revision-based generation of natural language summaries providing historical background: corpus-based analysis, design, implementation and evaluation. Ph.D. thesis.

Lei Sha, Lili Mou, Tianyu Liu, Pascal Poupart, Sujian Li, Baobao Chang, and Zhifang Sui. 2018. Order-planning neural text generation from structured data. In The Thirty-Second AAAI Conference on Artificial Intelligence, pages 5414–5421. Association for the Advancement of Artificial Intelligence.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008. Curran Associates, Inc.

Sam Wiseman, Stuart Shieber, and Alexander Rush. 2017. Challenges in data-to-document generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2253–2263. ACL.