Twitter Event Summarization by Exploiting Semantic Terms and Graph Network

PRELIMINARY PREPRINT VERSION: DO NOT CITE The AAAI Digital Library will contain the published version some time after the conference. Twitter Event Summarization by Exploiting Semantic Terms and Graph Network Quanzhi Li, Qiong Zhang Alibaba Group, US Bellevue, WA 98004, USA {[email protected], [email protected]} Abstract choose only one to three tweets as event summary. Other Twitter is a fast communication channel for gathering and applications may want to select more tweets as summary. spreading breaking news, and it generates a large volume of To find the best tweets to represent an event, we first need tweets for most events. Automatically creating a summary for to understand the event characteristics. An event is usually an event is necessary and important. In this study, we explored defined by the 4Ws questions: who, what, where and when two extractive approaches for summarizing events on Twitter. (Mohd 2007). An event tweet usually contains terms corre- The first one exploits the semantic types of event related sponding to these aspects, and these terms can be classified terms, and ranks the tweets based on the score computed from into different semantic classes/types, such as entity names these semantic terms. The second one utilizes a graph convo- (who), location (where), hashtag (topic & what), temporal lutional network built from a tweet relation graph to generate expression (when), verb & noun (what), and mention (who). tweet hidden features for tweet salience estimation. And the Different semantic types have different degrees of im- most salient tweets are selected as the summary of the event. portance for describing an event, and this will have implica- Our experiments show that these two approaches outperform tions on both event detection and event summarization algo- the compared methods. rithms. Previous studies on event summarization from social media have not explicitly exploited this type of information. They do not classify the terms into different semantic clas- Introduction! ses; they just treat them equally important in their algo- rithms, e.g. in Hybrid TF-IDF (Inouye and Kalita 2016) and Social media services, such as Twitter, generate a large vol- sumBasic (Vanderwende et al. 2007). In the first approach, ume of content (tweets) for most events. This study aims at we take the semantic type of a term into consideration when exploring new summarization methods for real-time events determining its weight. We hypothesize that classifying detected from Twitter. Text summarization methods can be tweet terms into their corresponding semantic classes, as- classified into two categories. Extractive methods select a signing different weights to them, and then integrating them subset of words, phrases, or sentences from the original doc- together will improve the event summarization perfor- ument to generate a summary. In contrast, abstractive ap- mance. Therefore, in the first approach, we split the term proaches generate a summary by using words and phrases space into groups of terms, or semantic classes. that may not appear in the source text. Our approaches be- In the second approach, we consider the tweet event long to the extractive method category. We choose one or summarization task as a special multi-document summari- more tweets from a event cluster to represent the event. Usu- zation problem, and utilize graph-based neural networks. ally, only one or two tweets are selected. One reason of se- The tweets selected as the summary should be both repre- lecting only a couple of tweets is that many applications pre- sentative and informative. To be representative in an event fer a very short summary, due to their limited UI space or cluster, a tweet needs to convey similar information with other reasons. The proposed approaches have been adapted other tweets in this cluster. We use a tweet relation graph to in three production systems, which need to monitor real- measure how representative a tweet is. In this graph, each time events on different topics. They prefer selecting only node is a tweet, and the edge weight measures how similar one to three tweets per event as summary, in order to have two nodes are. To reflect the informative aspect, we calcu- more events displayed on one screen. In this study, our ex- late a score for each node in the relation graph, using the periment also shows that human annotators usually also semantic terms the tweet contains. A tweet having more Copyright © 2021, Association for the Advancement of Artificial Intelli- gence (www.aaai.org). All rights reserved. semantic terms usually will be more informative. To better Ling 2016; Cheng and Lapata 2016; Nallapati et al. 2017; utilize the tweet relation information and the representa- See et al. 2017; Cao et al. 2017). Chang et al. (2013; 2016) tional power of neural network, we apply graph convolu- regard Twitter summarization as a supervised classification tional network (GCN) (Kipf and Welling 2017; Yasunaga et task through mining rich social features, such as user influ- al. 2017) on the tweet relation graph. A salience score is es- ence. Wang and Ling (2016) employ encoder-decoder timated for each tweet based on the hidden features gener- RNNs to generate short abstractive summaries for opinions. ated by GCN, and the most salient ones are chosen as the Cheng and Lapata (2016) train an extractive summarization summary. We evaluated the two approaches using 1,000 approach using attention-based encoder-decoder RNNs to events detected from Twitter. sequentially label summary-worth sentences. See et al. (2017) augment the standard attention-based encoder-decoder RNNs using pointer generator network. SUM- Related Studies MARUNNER (Nallapati et al. 2017) is an abstractive summarization model using an RNN-based encoder. Narayan et There are many studies on text summarization, but only a al. (2018) proposed a reinforcement learning-based system few of them focus on social media. One of the most popular trained by globally optimizing the ROUGE score. The extractive summarization methods is the centroid-based ap- model from Zhou et al. (2018) extracts sentences from a proach (Becker et al. 2011; Radev et al. 2004). Becker et al. document by jointly learning to score and select sentences. (2011) studied three centroid-based methods, and their ex- LATENT (Zhang et al., 2018) uses a latent model to directly periment showed that the Centroid method outperformed the maximize the likelihood of human summaries given se- other two, Degree and LexRank. LexRank (Erkan and lected sentences. Liu et al. (2019) use structured attention to Radev 2004) is a graph-based approach, which creates an induce a multi-root dependency tree representation of the adjacency matrix for text units and computes the stationary document while predicting the output summary. Most of distribution considering it as a Markov chain. TextRank these models focus on either abstractive method or single (Mihalcea and Tarau 2004) is another graph-based one, us- document. Applying these sequence-to-sequence ap- ing PageRank to find the highly ranked sentences. proaches to the multi-document summarization task has not sumBasic is a simple but effective method, which mainly been successful. He and Duan (2018) utilize the Twitter net- depends on word frequency (Vanderwende et al. 2007). It work structure to explore whether social relations can help chooses the tweet that has the highest sum of word proba- Twitter summarization. Wang and Zhang (2017) build a bility, which is computed from word frequency. To handle joint model to filter, cluster, and summarize the tweets for the special situation of tweet event, Inouye and Kalita (2016) new events. Their summarization model uses a multi-layer redefine TF-IDF in terms of a hybrid document. In their Hy- perceptron network to estimate a probability score for each brid TF-IDF approach, the TF component of the TF-IDF for- tweet and then rank them. The two approaches we presented mula uses the entire collection of tweets, while the IDF com- in this paper are different from previous studies. ponent treats each tweet as a separate document. Experi- ments from (Inouye and Kalita 2016; Alsaedi et al. 2016) show that Hybrid TF-IDF and sumBasic outperform other Semantic Class Based Approach approaches. Alsaedi et al. (2016) augmented the TF-IDF and Centroid methods by considering the time window of an In the semantic class-based approach, we have two methods event. This method is mainly used for retrospective events. to rank the tweets. We first describe each semantic class, and Mishra and Berberich (2017) propose a novel divergence- then present the two methods, Semantic-A and Semantic-B, based framework that selects excerpts from an initial set of for selecting the best tweets as the summary of an event. pseudo-relevant documents. Their algorithm requires an ex- Given a tweet, the first method (i.e., Semantic-A) integrates plicit user input, and it mainly works for long text. This is different semantic terms together to compute a tweet score, different from our case, which is to summarize tweets with- and the second method (i.e., Semantic-B) uses learning to out using any user query. rank algorithm to decide if a tweet is the top candidate. We The task B of TREC 2018 real-time summarization track call a term belonging to one of the semantic classes de- (Sequiera et al, 2018) asks the participating systems to out- scribed below as “semantic term”. Previous study (Li et al, put a list of tweets ordered by relevance to users’ interests 2017) has used semantic terms to detect event clusters from every day during the evaluation period. This task is different Tweet data streaming. from our case, which asks for the general summary of an event, not specific to a user’s interest.

Load more