Towards Comprehensive Description Generation from Factual Attribute-value Tables

Tianyu Liu1, Fuli Luo1, Pengcheng Yang1, Wei Wu1, Baobao Chang1,2 and Zhifang Sui1,2
1MOE Key Lab of Computational Linguistics, School of EECS, Peking University
2Peng Cheng Laboratory, Shenzhen, China
{tianyu0421, luofuli, yang pc, wu.wei, chbb, szf}@pku.edu.cn

Abstract

Comprehensive descriptions for factual attribute-value tables, which should be accurate, informative and loyal, can be very helpful for end users to understand structured data in this form. However, previous neural generators might suffer from key attribute missing, less informative and groundless information problems, which impede the generation of high-quality comprehensive descriptions for tables. To relieve these problems, we first propose a force attention (FA) method to encourage the generator to pay more attention to the uncovered attributes to avoid potential key attribute missing. Furthermore, we propose reinforcement learning for information richness to generate more informative as well as more loyal descriptions for tables. In our experiments, we utilize the widely used WIKIBIO dataset as a benchmark. Additionally, we create WB-filter based on WIKIBIO to test our model in simulated user-oriented scenarios, in which the generated descriptions should accord with particular user interests. Experimental results show that our model outperforms the state-of-the-art baselines on both automatic and human evaluation.

Attribute      Value
Birthplace     Utah, America
Position       forward (soccer player)

Comprehensive:      A Utah soccer player who plays as forward
Missing Key Attri.: A soccer player who plays as forward
Groundless info:    A Utah forward in the national team
Less Informative:   An American forward

Table 1: An example for comprehensive generation. Suppose we only have the two attribute-value tuples above; the underlined content is groundless information not mentioned in the source tables.

1 Introduction

Generating descriptions for factual attribute-value tables has attracted wide interest among NLP researchers, especially in a neural end-to-end fashion (e.g. Lebret et al. (2016); Liu et al. (2018); Sha et al. (2018); Bao et al. (2018); Puduppully et al. (2018); Li and Wan (2018); Nema et al. (2018)), as shown in Fig 1a. For broader potential applications in this field, we also simulate user-oriented generation, whose goal is to provide comprehensive generation for the selected attributes according to particular user interests, as in Fig 1b.

However, we find that previous models might miss key information and generate less informative and groundless content in their descriptions of the source tables. For example, in Table 1, the 'missing key attribute' case does not mention where the player comes from (birthplace), while the 'less informative' one chooses American rather than Utah. The case with groundless information contains 'in the national team', which is not mentioned in the source attributes. Although the 'key points missing' problem exists in many text-to-text and data-to-text datasets, for large-scale structured tables with vast heterogeneous attributes such as Wikipedia infoboxes, the 'key attribute missing' and 'less informative' problems might be even more challenging. The key attributes, like the 'position' of a basketball player or the 'political party' of a senator, are very likely to be unique features of particular tables, which usually appear much less frequently and are seldom mentioned compared with common attributes like 'Name' and 'Birthdate'. The 'groundless information' issue, which is also known as the 'hallucination' problem, remains a long-standing problem in NLG.

In this paper, we show that our model can generate more accurate and informative descriptions with less groundless content for tables. Firstly, we design a force-attention (FA) method to encourage the decoder to pay more attention to the uncovered attributes, by both stepwise and global constraints, to avoid potential key attribute missing.

In addition, we define the 'information richness' measurement of the generated descriptions with respect to the source tables. Based on that, we use reinforcement learning to encourage the generator to cover infrequent and rarely mentioned attributes as well as to generate more informative descriptions with less groundless content.

We test our models in two settings:

1) For neural table-to-text generation like Fig 1a, we test our model on WIKIBIO (Lebret et al., 2016), a crawled dataset from Wikipedia with paired infoboxes and associated descriptions. It is a widely used benchmark dataset for description generation for factual attribute-value tables and also a quite meaningful testbed for real-world scenarios with vast and heterogeneous attributes.

2) To test our model in the user-oriented setting, we filter WIKIBIO to form WB-filter. In this setting, we suppose all attributes in the source tables of WB-filter are selected by users and should be covered in the corresponding descriptions. We try to make sure the gold descriptions in WB-filter cover all the attributes of the source tables in this condition. Details are in Sec 4.

In Sec 3, we present our methods for generating comprehensive table descriptions (Table 1). Then we demonstrate how and why we create WB-filter (Sec 4.1) as well as evaluations (Sec 4.2), experimental configurations (Sec 4.3 and 4.4), case studies and visualizations (Sec 4.5) and error analysis (Sec 4.6).

Figure 1: The end-to-end (a) and user-oriented (b) table-to-text generation for an infobox (left) in WIKIBIO. The example infobox contains Name: Dillon Sheppard; Birthdate: 27 Feb 1979; Birthplace: Durban, South Africa; Current Club: Bidvest Wits; Number: 29; Height: 1.80 m (5 ft 11 in); Position: Left-winger. In (a), a table encoder and a description decoder generate the description end-to-end; in (b), the user selects the attributes Name, Current Club and Position, and the system generates "Name played as a Position in Current Club".

2 Background: Table-to-Description

2.1 Table Encoder

Given a structured table like Fig 1 (left), we model the attribute-value tuples in the table as a sequence of words with related attribute names. After serializing all the words in the 'Value' columns, for the i-th word x_i^{a_k} in the table whose attribute is a_k (the k-th attribute), we use the attribute name a_k and the word's position in that tuple to locate the word (Lebret et al., 2016). Specifically, we utilize a triple z_i^{a_k} = {a_k, p_{i+}^{a_k}, p_{i-}^{a_k}} to represent the structure information for word x_i^{a_k}, in which p_{i+}^{a_k} and p_{i-}^{a_k} are the positions of x_i^{a_k} counted from the beginning and end of a_k, respectively. For example, for the 'Birthplace' attribute in Fig 1 (left), we can use the triples {birthplace, 1, 4} and {birthplace, 4, 1} to represent the structure information of the words 'Durban' and 'Africa'. We concatenate the word x_t and its structure representation z_t at the t-th time step and feed them into an LSTM (Hochreiter and Schmidhuber, 1997) unit to encode the table: h_t = LSTM([x_t; z_t], h_{t-1}) is the t-th hidden state among the encoder states H = {h_t}_{t=1}^{T}. In the following sections, we might omit the superscript of x_i^{a_k} if it is not necessary.
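To make the serialization concrete, the sketch below is our own illustration (not the authors' released code) of how the (word, attribute, forward position, backward position) triples can be built from a toy infobox; the helper name and the dictionary layout are assumptions.

```python
from typing import List, Tuple

def serialize_table(table: dict) -> List[Tuple[str, str, int, int]]:
    """Flatten an attribute-value table into (word, attribute, p_plus, p_minus) tuples.

    p_plus counts the word's position from the beginning of the value,
    p_minus counts it from the end, as in Lebret et al. (2016).
    """
    triples = []
    for attribute, value in table.items():
        words = value.split()
        n = len(words)
        for i, word in enumerate(words, start=1):
            triples.append((word, attribute, i, n - i + 1))
    return triples

infobox = {"name": "Dillon Sheppard", "birthplace": "Durban , South Africa"}
for word, attr, p_plus, p_minus in serialize_table(infobox):
    print(word, attr, p_plus, p_minus)
# 'Durban' yields (birthplace, 1, 4) and 'Africa' yields (birthplace, 4, 1),
# matching the triples given in the running text above
```

Each word embedding would then be concatenated with embeddings of its attribute name and the two positions before being fed to the LSTM encoder.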

2.2 Description Decoder

For the generated description y*, the generated token y*_t at the t-th time step is predicted based on all the previously generated tokens y*_{<t}, the decoder hidden state s_t and a context vector c_t obtained by attending over the encoder states:

c_t = sum_{i=1}^{T} alpha_t^i h_i,   alpha_t^i = exp(g(s_t, h_i)) / sum_{j=1}^{T} exp(g(s_t, h_j))   (1)

where g(s_t, h_i) is a relevance score between s_t and h_i. We use the Bahdanau-style attention mechanism (Bahdanau et al., 2014) to calculate g(s_t, h_i):

g(s_t, h_i) = tanh(W_p h_i + W_q s_t + b)   (2)

W_s, W_t, W_p, W_q are learnable parameters.
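As a reference point, the snippet below is a minimal numpy sketch of Eqs 1-2 (our illustration, not the authors' code). Note that Eq 2 as printed returns a vector, so the sketch adds a projection vector v to reduce it to a scalar score; that projection, the shapes and the variable names are our assumptions.

```python
import numpy as np

def bahdanau_attention(H, s_t, W_p, W_q, b, v):
    """Additive attention: score_i = v . tanh(W_p h_i + W_q s_t + b), then softmax and context."""
    # H: (T, d_h) encoder states, s_t: (d_s,) decoder state
    scores = np.tanh(H @ W_p.T + s_t @ W_q.T + b) @ v   # (T,) relevance scores, Eq 2
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                 # attention weights over table words
    c_t = alpha @ H                                      # context vector c_t, Eq 1
    return alpha, c_t
```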
3 Comprehensive Table Description

The problems listed in Table 1 not only prevent the generator from producing comprehensive descriptions for selected entries in the tables (Fig 1b), but also prevent it from producing informative, accurate and loyal table descriptions (Fig 1a). So we propose two methods, force-attention (FA) and richness-oriented reinforcement learning, to produce accurate, informative and loyal descriptions.

3.1 Force-Attention Module

For the 'missing key attributes' problem (Table 1), we find that the generator usually focuses on particular attributes while the other attributes have relatively low attention values over the entire decoding procedure. So the force attention method is proposed to guide the decoder to pay more attention to the previously uncovered attributes with low attention values, to avoid potential key attribute missing. Note that the FA method focuses on attribute-level coverage rather than word-level coverage (Tu et al., 2016), as our goal is to reduce the 'missing key attributes' phenomenon instead of building a rigid word-by-word alignment between tables and descriptions.

Stepwise Forcing Attention: We define the attribute-level attention beta_t^{a_k} = avg_{x_i in a_k}(alpha_t^i) at the t-th step for attribute a_k as the average of the word-level attention values of the words in that attribute. The word-level coverage is defined as the sum of the attention vectors before the t-th step, theta_t^i = theta_{t-1}^i + alpha_t^i (Tu et al., 2016). In a similar way, we define the attribute-level coverage gamma_t^{a_k} = gamma_{t-1}^{a_k} + beta_t^{a_k} as the overall attention for attribute a_k before the t-th time step. The average word-level and attribute-level coverage are theta-bar_t^i = theta_t^i / t and gamma-bar_t^{a_k} = gamma_t^{a_k} / t, respectively.

Then we propose stepwise attention forcing, which explicitly guides the decoder to pay more attention to the uncovered attributes by calculating a new context vector c~_t = pi * c_t + (1 - pi) * v_t, compensating for the attributes ignored in the previous time steps. pi is a learnable vector. v_t is a compensation vector for the low-coverage attributes:

v_t = sum_{i=1}^{T} (max(zeta_t) - zeta_t^i) h_i,   zeta_t^i = min(theta-bar_t^i, gamma-bar_t^{a_k})   (3)

zeta_t is the modified average word-level coverage, with the average attribute-level coverage as an upper bound to avoid excessive compensation. Fig 2 shows a running example. The motivation is that we want the decoder to pay enough attention to all the attributes in the whole decoding process, which prevents missing key attributes because of the low attention values on them. Thus we compensate for the previously uncovered attributes (like 'currentclub' and 'position' in Fig 2) by v_t at the t-th time step.

Figure 2: Stepwise forcing attention at the 14-th step for the filtered version of the original infobox of Fig 1 in the WB-filter dataset (the decoded prefix is "Dillon Sheppard born 27 february 1979, Durban South Africa is a" and the next word is 'left-winger'). The panels show the average word-level coverage, the average attribute-level coverage and the compensation values. The uncovered attributes like 'currentclub' and 'position' (marked in orange and green) get high attention compensation (rightmost). Note that the word 'Sheppard' does not get any compensation (rightmost) because it has already received high attention in the previous steps.

Global Forcing Attention: Inspired by the soft-attention constraint of Xu et al. (2015), which encourages the generator to pay equal attention to every part of the image while generating image captions, we propose global forcing attention to avoid insufficient or excessive attention on certain attributes by adding the following loss to the primary seq2seq loss:

L_FA = lambda * sum_{k=1}^{K} [gamma-bar_{-1}^{a_k} - 1/K]^2   (4)

where K is the number of attributes in the table, and lambda is a hyper-parameter which is set to 0.3 based on evaluations on the validation data. gamma-bar_{-1}^{a_k} is the average attribute-level coverage for attribute a_k at the last time step.
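The following is a minimal numpy sketch of the stepwise compensation in Eq 3, the modified context vector, and the global loss in Eq 4, written from the formulas above; it is our own illustration, and the variable names, shapes and data structures are assumptions rather than the released implementation.

```python
import numpy as np

def force_attention_step(H, theta_bar, gamma_bar, attr_of_word, c_t, pi):
    """One stepwise forcing-attention update (Eq 3 and the blended context).

    H:            (T, d) encoder states
    theta_bar:    (T,) average word-level coverage up to step t
    gamma_bar:    dict attribute -> average attribute-level coverage up to step t
    attr_of_word: list of length T mapping each table word to its attribute
    c_t:          (d,) ordinary attention context
    pi:           gate in [0, 1] (a learnable vector in the paper)
    """
    # zeta caps word-level coverage by its attribute-level coverage to avoid over-compensation
    zeta = np.array([min(theta_bar[i], gamma_bar[attr_of_word[i]]) for i in range(len(H))])
    # words whose attributes received little attention so far get larger compensation weights
    v_t = ((zeta.max() - zeta)[:, None] * H).sum(axis=0)
    # blend the ordinary context with the compensation vector
    return pi * c_t + (1.0 - pi) * v_t

def global_forcing_attention_loss(gamma_bar_last, lam=0.3):
    """Global constraint (Eq 4): push the final average attribute coverage toward uniform 1/K."""
    K = len(gamma_bar_last)
    return lam * sum((g - 1.0 / K) ** 2 for g in gamma_bar_last.values())
```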

3.2 Reinforced Richness-oriented Learning

We also propose a reinforcement learning framework which encourages the generator to cover rare and seldom mentioned words and attributes in the table. The experiments and case studies show its effectiveness in dealing with the 'groundless information' and 'less informative' problems in Table 1.

3.2.1 Information Richness

The information richness (Eq 5) is the multiplication of the attribute-level and word-level richness of the descriptions with respect to the source tables.

Attribute-level Information Richness: Different tables that describe different objects are always featured by the unique attributes in the table. For example, a sportsman often has attributes like 'position' and 'debutyear'. The information in the unique attributes is harder to capture than that in common attributes like 'name' and 'birthdate', as the latter are very frequent in the training set. We define the information richness of an attribute a_k as f(a_k) = [freq(a_k)]^{-1} by calculating its frequency in the training set.

Word-level Information Richness: The unique words in the tables are more likely to be informative, such as a specific location, name or book. To calculate the word-level information richness, we firstly lemmatize all the words in the tables and filter the words with a stop-words list, which includes prepositions, symbols, numbers, etc. Then we randomly sample 5 synonyms of each word from WordNet (Miller, 1995). Finally, we calculate the word-level richness w(x_i^{a_k}) of the i-th word in attribute a_k by averaging the tf-idf values of x_i^{a_k} and its synonyms in the training set.

For a generated description y*, we lemmatize all the words in y* to get y~*. Then we calculate the information richness based on the related source table with T words and the gold description y, respectively:

Rich(y*) = sum_{i=1}^{T} [f(a_k) * w(x_i^{a_k}) * 1{x~_i^{a_k} in y~*}] / sum_{i=1}^{T} [f(a_k) * w(x_i^{a_k})]   (5)

in which x~_i^{a_k} represents any word among x_i^{a_k} and its synonyms in the table. The information richness measures the ratio of the information in the table that is covered by the description.
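A small sketch of Eq 5, assuming the attribute frequencies, tf-idf scores and synonym sets have been precomputed; this is our own illustration rather than the released implementation, and lemmatization and stop-word filtering are omitted for brevity.

```python
def information_richness(table_words, generated_tokens, attr_freq, tfidf, synonyms):
    """Rich(y*) from Eq 5.

    table_words:      list of (word, attribute) pairs from the serialized table
    generated_tokens: set of (lemmatized) tokens in the generated description
    attr_freq:        dict attribute -> frequency in the training set
    tfidf:            dict word -> averaged tf-idf of the word and its synonyms
    synonyms:         dict word -> set of synonyms sampled from WordNet
    """
    covered, total = 0.0, 0.0
    for word, attr in table_words:
        weight = (1.0 / attr_freq[attr]) * tfidf[word]   # f(a_k) * w(x_i)
        total += weight
        # the table word counts as covered if it or any of its synonyms is generated
        if generated_tokens & ({word} | synonyms.get(word, set())):
            covered += weight
    return covered / total if total > 0 else 0.0
```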
3.2.2 Reinforcement Learning

Reward Function: Different from previous models, which only measure how well the generated sentences match the target sentences, we design a mixed reward R_mix which contains both the BLEU-4 score and the information richness of the generated descriptions with respect to the source tables:

R_mix = lambda * R_info + (1 - lambda) * R_BLEU   (6)

lambda is set to 0.4 and 0.6 for WIKIBIO and WB-filter, respectively, based on evaluations on the validation data. Fig 6 shows how we choose lambda.

Training Algorithm: We use the REINFORCE algorithm (Williams, 1992) to learn an agent that maximizes the reward function R_mix. The training loss of sequence generation is defined as the negative expected reward:

L_RL = -E_{y^s ~ P_phi} [r(y^s) * log(P_phi(y^s))]   (7)

where P_phi(y^s) is the agent's policy, i.e. the word distribution of the description decoder (Eq 1), and r(.) is the reward function defined in Eq 6. In the implementation, y^s = {y_1^s, y_2^s, ..., y_{|Y|}^s} is a sequence sampled from P_phi by Monte-Carlo sampling. The policy gradients for Eq 7 can be calculated as:

grad_phi L_RL = lambda * grad_phi R_info + (1 - lambda) * grad_phi R_BLEU   (8)

We use the self-critical sequence training method (Rennie et al., 2017; Paulus et al., 2017) to reduce the variance of the gradients by subtracting a baseline reward from the mixed reward in Eq 6:

grad_phi R_BLEU ~= -[B(y^s, y) - B(y^g, y)] * grad_phi log(P_phi(y^s))   (9)

where B(a, b) is the BLEU score of sequence a compared with sequence b, and y^g is a sequence generated with greedy search. To calculate the information richness reward R_info for the lemmatized sampled sequence y^s, we use the information richness (Eq 5) of the related lemmatized gold description y with respect to the source table as the baseline reward:

grad_phi R_info ~= -[Rich(y^s) - Rich(y)] * grad_phi log(P_phi(y^s))   (10)

For more technical details, we refer the interested readers to Williams (1992); Ranzato et al. (2015); Rennie et al. (2017).
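The sketch below illustrates how the self-critical baselines in Eqs 9-10 can be combined into a single policy-gradient loss. It is a simplified illustration under our own assumptions (it reuses the information_richness helper sketched above and treats the reward as a constant with respect to the parameters), not the authors' code.

```python
import torch
from nltk.translate.bleu_score import sentence_bleu

def self_critical_loss(log_probs_sample, sample_tokens, greedy_tokens, reference_tokens,
                       table_words, attr_freq, tfidf, synonyms, lam=0.4):
    """Mixed self-critical policy-gradient loss for Eqs 6, 9 and 10.

    log_probs_sample: 1-D tensor with log P_phi(y_t^s) for each sampled token
    sample_tokens / greedy_tokens: token lists from sampling / greedy decoding
    reference_tokens: gold description tokens
    lam: weight of the richness reward (0.4 for WIKIBIO in the paper)
    """
    # BLEU reward with the greedy rollout as baseline (Eq 9)
    r_bleu = sentence_bleu([reference_tokens], sample_tokens) \
             - sentence_bleu([reference_tokens], greedy_tokens)
    # richness reward with the gold description's richness as baseline (Eq 10)
    r_info = information_richness(table_words, set(sample_tokens), attr_freq, tfidf, synonyms) \
             - information_richness(table_words, set(reference_tokens), attr_freq, tfidf, synonyms)
    advantage = lam * r_info + (1.0 - lam) * r_bleu
    # REINFORCE: scale the sampled sequence's log-likelihood by the (constant) advantage
    return -advantage * log_probs_sample.sum()
```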

4 Experiments

4.1 Datasets

We use two datasets to test our model in the context of end-to-end table description generation and of comprehensive generation for selected attributes in the user-oriented scenario.

For end-to-end description generation, we use the WIKIBIO dataset (Lebret et al., 2016) as the benchmark, which contains 728,321 articles from English Wikipedia (Sep 2015) and uses the first sentence of each article as the description.

To test our model in the user-oriented scenario, we filtered the WIKIBIO dataset to form a new dataset, WB-filter. To simulate user interests, we first select the top 100 most frequent attributes in WIKIBIO (we choose highly frequent attributes to ensure enough training instances in WB-filter for data-driven methods). After that, we manually filter irrelevant attributes (like 'caption', 'website' or 'signature') and merge identical attributes (like 'article title' and 'name') to avoid repetition. Then we leave out all the remaining attributes in the tables and filter out the instances in WIKIBIO whose descriptions cannot cover the selected attributes. To achieve this, we firstly lemmatize all the tokens in the infoboxes as well as those in the related gold biographies and filter them by a stop-words list, then we randomly retrieve 5 synonyms for every word in the infoboxes from WordNet. Finally, we make sure the gold biographies cover at least one word (or its synonym) for every attribute-value tuple among the chosen attributes and filter out the unqualified instances in WIKIBIO.

The 'frequency-coverage' figure in Fig 3 shows that 1) the filtering ensures that the WB-filter dataset achieves 100% Hit-1 coverage; 2) the WIKIBIO dataset suffers from both 'low frequency' and 'low coverage' problems, which means some key attributes in the tables are seldom mentioned by the descriptions. The cause of the 'low coverage' problem is the loose alignment between structured data and related descriptions. The two datasets are divided into training (80%), testing (10%) and validation (10%) sets.

Dataset              WIKIBIO    WB-filter
# instances          728321     88287
# Tokens per Bio     26.1       30.2
# Tokens per Table   53.1       20.8
# Attri. per Table   19.7       6.3
# Word overlap       9.5        12.1

Figure 3: The 'coverage-frequency' figure (left, each point represents an attribute) shows that many attributes have very low coverage and low frequency in the WIKIBIO dataset. Due to our filtering, the attributes in WB-filter have 100% Hit-1 coverage (Sec 4.2) and more overlapping words with the original tables, as shown in the data statistics (right, reproduced above).

4.2 Evaluation Metrics

Automatic Metrics: Following previous work (Lebret et al., 2016; Sha et al., 2018; Liu et al., 2018), we use BLEU-4 (Papineni et al., 2002) and ROUGE-4 (F measure) (Lin, 2004) for automatic evaluation. Furthermore, to evaluate how well the generated biographies cover the key points in the infoboxes, we also use information richness (Eq 5) as one of our automatic evaluation metrics. 'Hit at least 1 word' for an attribute means that a biography has at least one overlapping word with the words (or their synonyms) in that attribute, which are lemmatized and filtered by a stop-words list in the same way we build WB-filter in Sec 4.1. 'Hit-1 coverage' for an attribute is the ratio of the instances involving that attribute whose biographies 'hit at least 1 word' in that attribute.
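For clarity, here is a small sketch of the 'Hit at least 1 word' test and of Hit-1 coverage as described above; it is our own illustration, assumes lemmatization and stop-word filtering have already been applied, and the instance dictionary layout is hypothetical.

```python
def hits_attribute(bio_tokens, attribute_value_tokens, synonyms):
    """'Hit at least 1 word': does the biography mention any value word or one of its synonyms?"""
    bio = set(bio_tokens)
    for word in attribute_value_tokens:
        if bio & ({word} | synonyms.get(word, set())):
            return True
    return False

def hit1_coverage(instances, attribute, synonyms):
    """Ratio of instances containing the attribute whose biography hits at least one of its words."""
    relevant = [inst for inst in instances if attribute in inst["table"]]
    if not relevant:
        return 0.0
    hits = sum(hits_attribute(inst["bio_tokens"], inst["table"][attribute], synonyms)
               for inst in relevant)
    return hits / len(relevant)
```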
Human Evaluation: Since automatic evaluations like BLEU may not be reliable for NLG systems (Callison-Burch et al., 2006; Reiter and Belz, 2009; Reiter, 2018), we also use human evaluation, which involves generation fluency, coverage (how much of the given information in the infobox is mentioned in the related biography) and correctness (how much false or irrelevant information is mentioned in the biography). We firstly sampled 300 generated biographies from the generators for human evaluation. After that, we hired 3 third-party crowd-workers who are equipped with sufficient background knowledge to rank the given biographies. We present the generated descriptions to the annotators in a randomized order and ask them to be objective and not to guess which system a particular generated case is from. Two biographies may have the same ranking if it is hard to decide which one is better. The Pearson correlations of inter-annotator agreement are 0.76 and 0.71 (Table 3) on WIKIBIO and WB-filter, respectively.

4.3 Experimental Details

Following previous work (Liu et al., 2018), for WIKIBIO we select the most frequent 20,000 words and 1480 attributes in the training set as the word and attribute vocabulary. We tune the hyper-parameters based on the model performance on the validation set. The dimensions of word embedding, attribute embedding, position embedding and hidden unit are 500, 50, 600 and 10, respectively. The batch size, learning rate and optimizer for both datasets are 32, 5e-4 and Adam (Kingma and Ba, 2014), respectively. We use Xavier initialization (Glorot and Bengio, 2010) for all the parameters in our model. The global constraint of force-attention (Eq 4) is applied after 4 and 1.5 epochs of training for the WIKIBIO and WB-filter datasets, respectively, to avoid hurting the primary loss. Before the richness-oriented reinforced training, the neural generator is pre-trained for 8 and 4 epochs on the WIKIBIO and WB-filter datasets (with or without the force-attention module), respectively. We replace UNK tokens with the most relevant token in the source table according to the attention matrix (Jean et al., 2015).

(a) Automatic evaluation on WIKIBIO
Models                   BLEU    ROUGE
KN                       2.21    0.38
Template KN              19.80   10.70
NLM                      4.17    1.48
Table NLM                34.70   25.80
Order-planning           43.91   37.15
Struct-aware             44.89   41.21
Word-level Coverage*     43.44   39.84
Attri-level Coverage*    42.87   38.95
Seq2seq                  43.51   39.61
+ Force-Attention        44.46   40.58
+ Richness RL†           45.47   41.54

(b) Automatic evaluation on WB-filter
Models                   BLEU    ROUGE
Struct-aware*            40.81   36.52
Word-level Coverage*     38.85   35.11
Attri-level Coverage*    38.34   34.92
Seq2seq                  39.17   35.39
+ Force Attention        41.21   36.71
+ Richness RL†           42.03   37.55

Table 2: BLEU and ROUGE scores on the WIKIBIO and WB-filter datasets. The baselines with * are based on our implementation, while the others are reported by their authors. Models with † are trained using the RL criterion specified in Sec 3.2.2, while the remaining models are trained using maximum likelihood estimation (MLE).

4.4 Baselines

KN & Template KN: A template-based Kneser-Ney (KN) language model (Heafield et al., 2013). The extracted template for Table 1 is "name_1 name_2 ( born birthdate_1 ...". During inference, the decoder is constrained to emit words from the vocabulary or the special tokens in the tables.

Table NLM: Lebret et al. (2016) proposed a neural language model, Table NLM, taking the attribute information into consideration.

Order-planning: Sha et al. (2018) proposed a link matrix to model the order of the attribute-value tuples while generating biographies.

Struct-aware: Liu et al. (2018) proposed a structure-aware model using a modified LSTM unit and a specific attention mechanism to incorporate the attribute information.

Word & Attribute level Coverage: We also implement the implicit coverage method (Tu et al., 2016) for comparison. For word-level coverage, we replace Eq 2 with g(s_t, h_i) = tanh(W_p h_i + W_q s_t + W_m theta_t + b). For attribute-level coverage, we replace Eq 2 with g(s_t, h_i) = tanh(W_p h_i + W_q s_t + W_m gamma_t + b). theta_t and gamma_t are the word-level and attribute-level coverage defined in Sec 3.1.

4.5 Analysis of Experimental Results

Automatic evaluations are shown in Table 2 for WIKIBIO and WB-filter. The proposed force-attention module achieves 1.11/0.98 and 2.04/1.32 BLEU/ROUGE increases on the WIKIBIO and WB-filter datasets, respectively. Although the proposed force attention method does not outperform the 'struct-aware' method in terms of BLEU and ROUGE on the WIKIBIO dataset, we show its advantages in the user-oriented scenario as well as its ability to cover the key attributes, as shown in Tables 4 and 5. The richness-oriented reinforced module further enhances the model performance, helping our model outperform the state-of-the-art system (Liu et al., 2018) by about 0.79 BLEU and 0.58 ROUGE. Note that the BLEU and ROUGE scores are lower on the WB-filter dataset because, firstly, WIKIBIO has a much larger training set; secondly, the gold biographies might contain information beyond the tables. Although this phenomenon also occurs in WIKIBIO, the filtering of WB-filter magnifies this issue.

(a) Human evaluation on WIKIBIO
Models        Fluency   Coverage   Correctness
Seq2seq       1.87      1.99       1.95
Struct-aware  1.61      1.80       1.71
Our best      1.54      1.46       1.61

(b) Human evaluation on WB-filter
Models        Fluency   Coverage   Correctness
Seq2seq       2.02      1.88       1.93
Struct-aware  1.58      1.52       1.65
Our best      1.54      1.39       1.54

Table 3: Average ranking (lower is better) of 3 systems. We calculate the Pearson correlation to show the inter-annotator agreement.

Figure 4: The average attribute-level (green) and word-level (red) coverage of the models with or without the force-attention module for an infobox in WB-filter (higher values are darker) at the last decoding step. The vanilla seq2seq model ignores the 'birthplace' and 'position' attributes because of the low coverage on them, while the FA module attracts enough attention to them while decoding. Outputs shown in the figure:
seq2seq: Dillon Sheppard (born 27 february 1979) is a soccer who plays for Bidvest Wits.
S2S+cover: Dillon Sheppard (born 27 february 1979) is a soccer who plays for Bidvest Wits.
Sha et al. 2017: Dillon Sheppard (born 27 february 1979) is a soccer who plays for Bidvest Wits.
Liu et al. 2017 (struct-aware): Dillon Sheppard (born 27 february 1979) is a South African soccer who plays for Bidvest Wits.
seq2seq+FA: Dillon Sheppard (born 27 february 1979, Durban South Africa) is a left-winger in Bidvest Wits.
Ours: Dillon Sheppard (born 27 february 1979, Durban South Africa) is a footballer who plays as left-winger for Bidvest Wits.

(a) Ablation studies on WIKIBIO
Models                       BLEU    Rich
1 seq2seq                    43.51   28.21
2 + Stepwise (only)          43.69   30.01
3 + Global loss (only)       44.21   31.65
4 + Stepwise + Global loss   44.46   32.90
5 + Richness RL (only)       45.23   35.84
6 + All                      45.47   37.64

(b) Ablation studies on WB-filter
Models                       BLEU    Rich
1 seq2seq                    39.17   56.30
2 + Stepwise (only)          39.59   59.29
3 + Global loss (only)       40.83   61.12
4 + Stepwise + Global loss   41.21   62.81
5 + Richness RL (only)       41.66   63.89
6 + All                      42.03   64.41

Table 4: The ablation studies for our model. Models 2-4 are from the force-attention method. 'Rich' is the 'information richness' defined in Eq 5.

Figure 5: Hit-1 coverage (Sec 4.2) for attributes on the test sets of WIKIBIO and WB-filter. For better visualization, we first select the attributes whose frequencies are larger than 0.1%, then rank the Hit-1 coverage of these attributes (214 attributes in WIKIBIO; 26 attributes in WB-filter) in descending order.

Human evaluations in Table 3 show that our model achieves better generation coverage and correctness than all the baselines. Table 4 shows the ablation studies of our model.

As demonstrated in Table 5, we select an infobox from WIKIBIO and WB-filter respectively for case studies. Observing the generated description in WIKIBIO, we find that 1) compared with the vanilla seq2seq model, our force-attention module can cover the information in the 'Notableworks' attribute; 2) the richness-oriented module further helps our model to cover the 'Alma mater' and 'Notableworks' attributes, as they are infrequent attributes (more informative) in the dataset. Additionally, due to the rareness of the word 'kiev', our model is able to cover the related information. Similarly, the generated description for WB-filter covers the information from 'Organization' and 'Birthplace' with the help of the proposed model.

Fig 4 shows the effectiveness of the force-attention module: the decoder is guided to pay more attention to the uncovered attributes ('birthplace' and 'position') while decoding. Fig 5 shows that both proposed modules can boost the attribute-level coverage on the two datasets. Fig 6 (left) explains why our model can also improve end-to-end table description generation: attributes like 'position', 'battles' and 'political party' are key information for describing the infoboxes of sportsmen, soldiers and politicians. Fig 6 (right) shows the effects of lambda in Eq 6.

4.6 Error Analysis

Although the proposed models achieve competitive performance, we also observe some failure cases.

Name: Ivan Ohienko Metropolitan Ilarion ; Birthdate: 2 January 1882 ; Birthplace: Brusilov, Kiev governorate, Russian empire ; Deathdate: 29 March 1972 ; Deathplace: Winnipeg, Canada ; Occupation: cleric, historian, ethnographer, and scholar, writer, and translator ; Language: Ukrainian ; Nationality: Ukrainian ; Alma mater: Kiev university ; Notableworks: translation of the bible into ukrainian ; Article title: Ilarion Ohienko
Seq2seq: Ivan Ohienko Metropolitan ( January 2 , 1882 – March 29 , 1972 ) was a Ukrainian cleric , historian , ethnographer , writer , linguist , writer and scolar .
+Force-Attention: Ivan Ohienko Metropolitan Ilarion ( 2 January 1882 in Brusilov – 29 march 1972 in Winnipeg ) was a Ukrainian linguist , ethnographer , and scholar , best known for his translation of the bible into ukrainian .
+Richness-oriented RL: Ivan Ohienko Metropolitan Ilarion ( 2 January 1882 , Krusilov , Kiev governorate – 29 march 1972 , Winnipeg ) was a Ukrainian cleric , historian , ethnographer , and scholar of Kiev university , best known for his translation of the bible into ukrainian .

Name: Rajendra Singh ; Birthdate: 06 August 1959 ; Birthplace: Daula, Bagpat District, Uttar Pradesh ; Nationality: Indian ; Organization: Tarun Bharat Sangh ; Occupation: water conservationist ; Alma mater: Allahabad University
Seq2seq: Rajendra Singh is an Indian water conservationist.
+Force-Attention: Rajendra Singh (born 6 August 1959) is an Indian conservationist and a senior fellow of the Tarun Bharat Sangh.
+Richness-oriented RL: Rajendra Singh (born 6 august 1959, Uttar Pradesh) is an Indian water conservationist and a member of the Tarun Bharat Sangh.

Table 5: The generated cases in WIKIBIO (above) and WB-filter (below) datasets. The underlined texts, which are the key information of the source tables, are ignored by seq2seq model.

To sum up, the main failure mode is irrelevant information in the generated descriptions with respect to the source tables. For example, a biography about a football player might contain 'in the national football league' although the related infobox does not mention this piece of information, as similar expressions exist in many instances of the training set. Although our model can largely relieve this problem, as shown in the human evaluation (Table 3), it is still a general problem in NLG. As for the ability to cover important information in the tables, although our model covers much more comprehensive information than the previous models (Tables 2 and 3), some implicitly expressed (like whether a person is retired or not) or rarely covered (like 'spouse' or 'high school') attributes in the source tables might still be ignored in the descriptions generated by our model. Furthermore, those pieces of information which need some form of inference across several attributes (like a time span) may not be well represented by our model.

Figure 6: Hit-1 coverage (Sec 4.2) for some key attributes (left) on the test set of WIKIBIO, showing that our model helps to cover some key attributes while describing the tables. The right panel shows how we choose lambda in Eq 6 (R_mix = lambda * R_info + (1 - lambda) * R_BLEU) for the 'Seq2seq + RL' model on the validation set of WIKIBIO.

5 Related Work

Data-to-text is a language generation task that produces text for structured data, and table-to-text generation belongs to this family (Reiter and Dale, 2000). Many previous works (Barzilay and Lapata, 2005, 2006; Liang et al., 2009) treated the task as a pipelined system, which viewed content selection and surface realization as two separate tasks. Duboue and McKeown (2002) proposed a clustering approach in the biography domain by scoring the semantic relevance of the text and the paired knowledge base. In a similar vein, Barzilay and Lapata (2005) modeled the dependencies between American football records and identified the bits of information to be verbalized. Liang et al. (2009) and Angeli et al. (2010) extended the work of Barzilay and Lapata (2005) to the soccer and weather domains by learning the alignment between data and text using hidden variable models. Androutsopoulos et al. (2013) and Duma and Klein (2013) focused on generating descriptive language for ontologies and RDF triples. Most recent works utilize neural networks for data-to-text generation (Mahapatra et al., 2016; Wiseman et al., 2017; Laha et al., 2018; Kaffee et al., 2018; Freitag and Roy, 2018; Qader et al., 2018; Dou et al., 2018; Yeh et al., 2018; Jhamtani et al., 2018; Jain et al., 2018; Liu et al., 2017b, 2019; Peng et al., 2019; Dušek et al., 2019).

Some closely related work also focused on table-to-text generation. Mei et al. (2016) proposed an encoder-aligner-decoder framework for generating weather broadcasts. Hachey et al. (2017) used a table-text and text-table auto-encoder framework for table-to-text generation. Nema et al. (2018) proposed gated orthogonalization to avoid repetitions. Wiseman et al. (2018) used a neural semi-HMM to generate template-like descriptions for structured data. Our work somewhat shares similar goals with Kiddon et al. (2016); Tu et al. (2016); Liu et al. (2017a); Gong et al. (2018) in the sense that they emphasize easily ignored (usually less frequent) features or bits of information in the training procedure by smoothing or regularization. The greatest difference between our work and theirs is that our method is tailored for covering the key information embedded in the attributes (entries) of key-value tables rather than single words or labels. Although the deficient scores of Tu et al. (2016) in Table 2 demonstrate that word-level coverage-oriented methods may not be well suited to structured tables, we assume other word-level constraints may transfer to structured tables without losing efficiency. We leave the identification of potentially applicable word-level constraints to future work.

This paper focused on generating one-sentence biographies for infoboxes, like many previous works (Lebret et al., 2016; Hachey et al., 2017; Liu et al., 2018; Bao et al., 2018; Nema et al., 2018; Puduppully et al., 2018; Cao et al., 2018). Perez-Beltrachini and Lapata (2018) used the first paragraph of the Wikipedia pages as the gold biographies, aiming at generating longer biographies. We tried the same setting and unfortunately found that most generated biographies contain too much groundless information compared with the source infoboxes. This is because the related gold biographies taken from the first paragraph contain too much groundless information beyond the source infoboxes.

6 Conclusion and Future Work

We set up 3 goals for comprehensive description generation for attribute-value factual tables: accurate, informative and loyal. To achieve these goals, we propose the force-attention method, which encourages the generator to pay more attention to previously uncovered attributes to avoid potential key attribute missing. Richness-oriented reinforcement learning is proposed to cover more informative content in the source tables, which helps the generator produce informative and accurate descriptions. The experiments on the WIKIBIO and WB-filter datasets show the merits of our model. In the future, we will explore the representation of implicit information in the table, like whether a person is retired or not, or how long a sportsman's career is given starting and ending years, by including some inference strategies.

Acknowledgments

We would like to thank the anonymous reviewers for their valuable suggestions. This work is supported by the National Science Foundation of China under Grant No. 61876004 and No. 61772040. The corresponding authors of this paper are Baobao Chang and Zhifang Sui.

References

Ion Androutsopoulos, Gerasimos Lampouras, and Dimitrios Galanis. 2013. Generating natural language descriptions from owl ontologies: the naturalowl system. Journal of Artificial Intelligence Research, 48:671–715.

Gabor Angeli, Percy Liang, and Dan Klein. 2010. A simple domain-independent probabilistic approach to generation. In EMNLP 2010, pages 502–512.

Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural machine translation by jointly learning to align and translate. CoRR, abs/1409.0473.

Jun-Wei Bao, Duyu Tang, Nan Duan, Yuanhua Lv, Ming Zhou, and Tiejun Zhao. 2018. Table-to-text: Describing table region with natural language. In AAAI 2018, pages 5020–5027.

Regina Barzilay and Mirella Lapata. 2005. Collective content selection for concept-to-text generation. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 331–338. Association for Computational Linguistics.

Regina Barzilay and Mirella Lapata. 2006. Aggregation via set partitioning for natural language generation. In NAACL, pages 359–366. Association for Computational Linguistics.

Chris Callison-Burch, Miles Osborne, and Philipp Koehn. 2006. Re-evaluation the role of bleu in machine translation research. In EACL 2006.

Juan Cao, Junpeng Gong, and Pengzhou Zhang. 2018. Open-domain table-to-text generation based on seq2seq. In Proceedings of the 2018 International Conference on Algorithms, Computing and Artificial Intelligence, page 72. ACM.

Longxu Dou, Guanghui Qin, Jinpeng Wang, Jin-Ge Yao, and Chin-Yew Lin. 2018. Data2text studio: Automated text generation from structured data. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 13–18.

Pablo A Duboue and Kathleen R McKeown. 2002. Content planner construction via evolutionary algorithms and a corpus-based fitness function. In Proceedings of INLG 2002, pages 89–96.

Daniel Duma and Ewan Klein. 2013. Generating natural language from linked data: Unsupervised template extraction. In Proceedings of the 10th International Conference on Computational Semantics (IWCS 2013), Long Papers, pages 83–94.

Ondřej Dušek, Jekaterina Novikova, and Verena Rieser. 2019. Evaluating the state-of-the-art of end-to-end natural language generation: The e2e nlg challenge. arXiv preprint arXiv:1901.07931.

Markus Freitag and Scott Roy. 2018. Unsupervised natural language generation with denoising autoencoders. arXiv preprint arXiv:1804.07899.

Xavier Glorot and Yoshua Bengio. 2010. Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 249–256.

Chengyue Gong, Xu Tan, Di He, and Tao Qin. 2018. Sentence-wise smooth regularization for sequence to sequence learning. arXiv preprint arXiv:1812.04784.

Ben Hachey, Will Radford, and Andrew Chisholm. 2017. Learning to generate one-sentence biographies from wikidata. In EACL 2017, Volume 1: Long Papers, pages 633–642.

Kenneth Heafield, Ivan Pouzyrevsky, Jonathan H Clark, and Philipp Koehn. 2013. Scalable modified kneser-ney language model estimation. In ACL (2), pages 690–696.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Parag Jain, Anirban Laha, Karthik Sankaranarayanan, Preksha Nema, Mitesh M Khapra, and Shreyas Shetty. 2018. A mixed hierarchical attention based encoder-decoder approach for standard table summarization. arXiv preprint arXiv:1804.07790.

Sébastien Jean, KyungHyun Cho, Roland Memisevic, and Yoshua Bengio. 2015. On using very large target vocabulary for neural machine translation. In ACL-IJCNLP 2015, Volume 1: Long Papers, pages 1–10.

Harsh Jhamtani, Varun Gangal, Eduard Hovy, Graham Neubig, and Taylor Berg-Kirkpatrick. 2018. Learning to generate move-by-move commentary for chess games from large-scale social forum data. In ACL 2018, Volume 1: Long Papers, pages 1661–1671.

Lucie-Aimée Kaffee, Hady ElSahar, Pavlos Vougiouklis, Christophe Gravier, Frédérique Laforest, Jonathon S. Hare, and Elena Simperl. 2018. Learning to generate wikipedia summaries for underserved languages from wikidata. In NAACL-HLT 2018, Volume 2 (Short Papers), pages 640–645.

Chloé Kiddon, Luke Zettlemoyer, and Yejin Choi. 2016. Globally coherent text generation with neural checklist models. In EMNLP 2016, pages 329–339.

Diederik P Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

Anirban Laha, Parag Jain, Abhijit Mishra, and Karthik Sankaranarayanan. 2018. Scalable micro-planned generation of discourse from structured data. CoRR, abs/1810.02889.

Rémi Lebret, David Grangier, and Michael Auli. 2016. Neural text generation from structured data with application to the biography domain. In EMNLP 2016, pages 1203–1213.

Liunian Li and Xiaojun Wan. 2018. Point precisely: Towards ensuring the precision of data in generated texts using delayed copy mechanism. In Proceedings of the 27th International Conference on Computational Linguistics, pages 1044–1055.

Percy Liang, Michael I Jordan, and Dan Klein. 2009. Learning semantic correspondences with less supervision. In ACL-IJCNLP 2009, pages 91–99. Association for Computational Linguistics.

Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. Text Summarization Branches Out.

Tianyu Liu, Fuli Luo, Qiaolin Xia, Shuming Ma, Baobao Chang, and Zhifang Sui. 2019. Hierarchical encoder with auxiliary supervision for neural table-to-text generation: Learning better representation for tables. In Proceedings of AAAI.

Tianyu Liu, Kexiang Wang, Baobao Chang, and Zhifang Sui. 2017a. A soft-label method for noise-tolerant distantly supervised relation extraction. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1790–1795.

Tianyu Liu, Kexiang Wang, Lei Sha, Baobao Chang, and Zhifang Sui. 2018. Table-to-text generation by structure-aware seq2seq learning. In AAAI 2018, pages 4881–4888.

Tianyu Liu, Bingzhen Wei, Baobao Chang, and Zhifang Sui. 2017b. Large-scale simple question generation by template-based seq2seq learning. In NLPCC 2017, pages 75–87.

Joy Mahapatra, Sudip Kumar Naskar, and Sivaji Bandyopadhyay. 2016. Statistical natural language generation from tabular non-textual data. In Proceedings of the 9th International Natural Language Generation conference, pages 143–152.

Hongyuan Mei, Mohit Bansal, and Matthew R. Walter. 2016. What to talk about and how? selective generation using lstms with coarse-to-fine alignment. In NAACL HLT 2016, pages 720–730.

George A Miller. 1995. Wordnet: a lexical database for english. Communications of the ACM, 38(11):39–41.

Preksha Nema, Shreyas Shetty, Parag Jain, Anirban Laha, Karthik Sankaranarayanan, and Mitesh M Khapra. 2018. Generating descriptions from structured data using a bifocal attention mechanism and gated orthogonalization. In NAACL-HLT 2018, Volume 1 (Long Papers), pages 1539–1550.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In ACL 2002, pages 311–318.

Romain Paulus, Caiming Xiong, and Richard Socher. 2017. A deep reinforced model for abstractive summarization. arXiv preprint arXiv:1705.04304.

Hao Peng, Ankur P. Parikh, Manaal Faruqui, Bhuwan Dhingra, and Dipanjan Das. 2019. Text generation with exemplar-based adaptive decoding. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.

Laura Perez-Beltrachini and Mirella Lapata. 2018. Bootstrapping generators from noisy data. In NAACL-HLT 2018, Volume 1 (Long Papers), pages 1516–1527.

Ratish Puduppully, Li Dong, and Mirella Lapata. 2018. Data-to-text generation with content selection and planning. arXiv preprint arXiv:1809.00582.

Raheel Qader, Khoder Jneid, François Portet, and Cyril Labbé. 2018. Generation of company descriptions using concept-to-text and text-to-text deep models: dataset collection and systems evaluation. In Proceedings of the 11th International Conference on Natural Language Generation, pages 254–263.

Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2015. Sequence level training with recurrent neural networks. CoRR, abs/1511.06732.

Ehud Reiter. 2018. A structured review of the validity of bleu. Computational Linguistics, 44(3):393–401.

Ehud Reiter and Anja Belz. 2009. An investigation into the validity of some metrics for automatically evaluating natural language generation systems. Computational Linguistics, 35(4):529–558.

Ehud Reiter and Robert Dale. 2000. Building natural language generation systems. Cambridge University Press.

Steven J Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, and Vaibhava Goel. 2017. Self-critical sequence training for image captioning. In CVPR, volume 1, page 3.

Lei Sha, Lili Mou, Tianyu Liu, Pascal Poupart, Sujian Li, Baobao Chang, and Zhifang Sui. 2018. Order-planning neural text generation from structured data. In AAAI 2018, pages 5414–5421.

Zhaopeng Tu, Zhengdong Lu, Yang Liu, Xiaohua Liu, and Hang Li. 2016. Modeling coverage for neural machine translation. In ACL 2016, Volume 1: Long Papers.

Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. In Reinforcement Learning, pages 5–32. Springer.

Sam Wiseman, Stuart M. Shieber, and Alexander M. Rush. 2017. Challenges in data-to-document generation. In EMNLP 2017, pages 2253–2263.

Sam Wiseman, Stuart M. Shieber, and Alexander M. Rush. 2018. Learning neural templates for text generation. In EMNLP 2018, pages 3174–3187.

Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pages 2048–2057.

Shyh-Horng Yeh, Hen-Hsen Huang, and Hsin-Hsi Chen. 2018. Precise description generation for knowledge base entities with local pointer network. In 2018 IEEE/WIC/ACM International Conference on Web Intelligence (WI), pages 214–221. IEEE.
