A Deep Architecture for Automatic News Comment Generation
Read, Attend and Comment: A Deep Architecture for Automatic News Comment Generation

Ze Yang†, Can Xu‡, Wei Wu‡, Zhoujun Li†∗
†State Key Lab of Software Development Environment, Beihang University, Beijing, China
‡Microsoft Corporation, Beijing, China
{tobey, [email protected]}  {wuwei, [email protected]}

Abstract

Automatic news comment generation is a new testbed for techniques of natural language generation. In this paper, we propose a “read-attend-comment” procedure for news comment generation and formalize the procedure with a reading network and a generation network. The reading network comprehends a news article and distills some important points from it, then the generation network creates a comment by attending to the extracted discrete points and the news title. We optimize the model in an end-to-end manner by maximizing a variational lower bound of the true objective using the back-propagation algorithm. Experimental results on two datasets indicate that our model can significantly outperform existing methods in terms of both automatic evaluation and human judgment.

Title: FIFA rankings: France number one, Croatia and England soar, Germany and Argentina plummet
Body (truncated): World Cup glory has propelled France to the top of FIFA’s latest world rankings, with the impact of Russia 2018 felt for better or worse among a number of football’s heavyweight nations. These are the first set of rankings released under FIFA’s new formula that “relies on adding/subtracting points won or lost for a game to/from the previous point totals rather than averaging game points over a given time period”. FIFA world rankings: 1. France 2. Belgium 3. Brazil 4. Croatia 5. Uruguay 6. England 7. Portugal 8. Switzerland 9. Spain 10. Denmark
Comment A: If it’s heavily based on the 2018 WC, hence England leaping up the rankings, how are Brazil at 3?
Comment B: England above Spain, Portugal and Germany. Interesting.

Table 1: A news example from Yahoo! News
1 Introduction

In this work, we study the problem of automatic news comment generation, which is a less explored task in the literature of natural language generation (Gatt and Krahmer, 2018). We are aware that numerous uses of these techniques can pose ethical issues and that best practices will be necessary for guiding applications. In particular, we note that people expect comments on news to be made by people. Thus, there is a risk that people and organizations could use these techniques at scale to feign comments coming from people for purposes of political manipulation or persuasion. In our intended target use, we explicitly disclose the generated comments on news as being formulated automatically by an entertaining and engaging chatbot (Shum et al., 2018). Also, we understand that the behaviors of deployed systems may need to be monitored and guided with methods, including post-processing techniques (van Aken et al., 2018). While there are risks with this kind of AI research, we believe that developing and demonstrating such techniques is important for understanding valuable and potentially troubling applications of the technology.

Existing work on news comment generation includes preliminary studies, where a comment is generated either from the title of a news article only (Zheng et al., 2018; Qin et al., 2018) or by feeding the entire article (title plus body) to a basic sequence-to-sequence (s2s) model with an attention mechanism (Qin et al., 2018). News titles are short and succinct, so using only titles may lose a great deal of useful information for comment generation. On the other hand, a news article and a comment are not a pair of parallel texts: the article is much longer than the comment and contains much information that is irrelevant to it. Thus, directly applying the s2s model, which has proven effective in machine translation (Bahdanau et al., 2015), to news comment generation is unsuitable and may bring a lot of noise into generation. Both approaches oversimplify the problem of news comment generation and are far from how people behave on news websites. In practice, people read a news article, draw attention to some points in the article, and then present their comments along with the points they are interested in. Table 1 illustrates news commenting with an example from Yahoo! News.¹ The article is about the new FIFA ranking, and we pick two comments among the many to explain how people behave in the comment section. First, both commenters have gone through the entire article, as their comments are built upon details in the body. Second, the article gives many details about the new ranking, but both commenters comment on only a few points. Third, the two commenters pay attention to different places in the article: the first notices that the ranking is based on the result of the new World Cup and feels curious about the position of Brazil, while the second just feels excited about the new position of England. The example indicates a “read-attend-comment” behavior of humans and sheds light on how to construct a model.

We propose a reading network and a generation network that generate a comment from the entire news article. The reading network simulates how people digest a news article, and acts as an encoder of the article. The generation network then simulates how people comment on the article after reading it, and acts as a decoder of the comment. Specifically, from bottom to top, the reading network consists of a representation layer, a fusion layer, and a prediction layer. The first layer represents the title of the news with a recurrent neural network with gated recurrent units (RNN-GRUs) (Cho et al., 2014) and represents the body of the news through self-attention, which can model long-term dependency among words. The second layer forms a representation of the entire news article by fusing the information of the title into the representation of the body with an attention mechanism and a gate mechanism: the attention mechanism selects useful information in the title, and the gate mechanism further controls how much of that information flows into the representation of the article. Finally, the third layer is built on top of the previous two layers and employs a multi-label classifier and a pointer network (Vinyals et al., 2015) to predict a set of salient spans (e.g., words, phrases, and sentences) from the article. With the reading network, our model comprehends the news article and boils it down to some key points (i.e., the salient spans). The generation network is an RNN language model that generates a comment word by word through an attention mechanism (Bahdanau et al., 2015) on the selected spans and the news title. In training, since salient spans are not explicitly available, we treat them as a latent variable and jointly learn the two networks from article–comment pairs by optimizing a lower bound of the true objective through a Monte Carlo sampling method. Thus, training errors in comment prediction can be back-propagated to span selection and used to supervise news reading comprehension.

We conduct experiments on two large-scale datasets. One is a Chinese dataset published recently in (Qin et al., 2018), and the other is an English dataset built by crawling news articles and comments from Yahoo! News. Evaluation results on the two datasets indicate that our model can significantly outperform existing methods on both automatic metrics and human judgment.

Our contributions are three-fold: (1) a proposal of the “read-attend-comment” procedure for news comment generation with a reading network and a generation network; (2) joint optimization of the two networks with an end-to-end learning approach; and (3) empirical verification of the effectiveness of the proposed model on two datasets.

∗ Corresponding Author
¹ https://www.yahoo.com/news/fifa-rankings-france-number-one-112047790.html

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 5077–5089, Hong Kong, China, November 3–7, 2019. © 2019 Association for Computational Linguistics.
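The attention-plus-gate fusion in the second layer can be sketched as follows. This is a minimal illustrative implementation in plain Python, not the paper's actual architecture: it assumes single fixed-size vectors, a scalar sigmoid gate, and an invented gate parameterization (`gate_weight`); the names `fuse`, `softmax`, and `dot` are our own.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def fuse(body_vec, title_vecs, gate_weight):
    """Fuse title information into a body token representation.

    1. Attention: score each title vector against the body vector and take
       the softmax-weighted sum (selecting useful title information).
    2. Gate: a scalar sigmoid gate decides how much of the attended title
       context flows into the fused representation (illustrative form).
    """
    scores = [dot(body_vec, t) for t in title_vecs]
    weights = softmax(scores)
    attended = [sum(w * t[i] for w, t in zip(weights, title_vecs))
                for i in range(len(body_vec))]
    g = 1.0 / (1.0 + math.exp(-gate_weight * dot(body_vec, attended)))
    return [g * a + (1.0 - g) * b for a, b in zip(attended, body_vec)]
```

A real model would compute a vector-valued gate from learned projections of the body and attended-title representations; the scalar form above only shows the information-flow idea.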
2 Related Work

News comment generation is a sub-task of natural language generation (NLG). Among various NLG tasks, the task studied in this paper is most related to summarization (Rush et al., 2015; Nallapati et al., 2016; See et al., 2017) and product review generation (Tang et al., 2016; Dong et al., 2017). However, there is a stark difference between news comment generation and the other two tasks: the input of our task is an unstructured document, while the input of product review generation is structured attributes of a product; and the output of our task is a comment, which often extends the content of the input with additional information, while the output of summarization is a condensed version of the input that contains the main information from the original. Very recently, some studies on news comment generation have emerged. For example, Zheng et al. (2018) propose a gated attention neural network model to generate news comments from news titles.
The model is further improved by a generative adversarial net. Qin et al. (2018) publish a dataset with results of

… P(S | T, B) · P(C | S, T), where S = (s_1, …, s_w) refers to a set of spans in B, P(S | T, B) represents the reading network, and P(C | S, T) refers to the generation network.
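The excerpt cuts off before the full derivation, but the factorization above, together with the "variational lower bound … Monte Carlo" training described in the abstract, suggests the standard Jensen-style bound. The following is a hedged reconstruction under that assumption, not the paper's exact equations:

```latex
% Latent-span factorization (spans S marginalized out):
P(C \mid T, B) \;=\; \sum_{S} P(S \mid T, B)\, P(C \mid S, T)

% Jensen's inequality gives a lower bound on the log-likelihood,
% estimated with K Monte Carlo samples of spans from the reading network:
\log P(C \mid T, B)
\;\ge\; \mathbb{E}_{S \sim P(S \mid T, B)}\!\left[\log P(C \mid S, T)\right]
\;\approx\; \frac{1}{K} \sum_{k=1}^{K} \log P\!\left(C \mid S^{(k)}, T\right),
\qquad S^{(k)} \sim P(S \mid T, B).
```

Maximizing this bound lets the gradient of the comment likelihood flow back into span selection, matching the paper's claim that comment-prediction errors supervise reading comprehension.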