Two Sides to Every Story: Subjective Event Summarization of Sports Events using Twitter

David Corney, Carlos Martin, Ayse Göker
IDEAS Research Institute, School of Computing & Digital Media
Robert Gordon University, Aberdeen AB10 7QB
[email protected], [email protected], [email protected]

Copyright © by the paper's authors. Copying permitted only for private and academic purposes. In: S. Papadopoulos, P. Cesar, D. A. Shamma, A. Kelliher, R. Jain (eds.): Proceedings of the SoMuS ICMR 2014 Workshop, Glasgow, Scotland, 01-04-2014, published at http://ceur-ws.org

Abstract

Ask two people to describe an event they have both experienced, and you will usually hear two very different accounts. Witnesses bring their own preconceptions and biases, which makes objective story-telling all but impossible. Despite this, recent work on algorithmic topic detection, event summarization and content generation often has a stated aim of objectively answering the question, "What just happened?" Here, in contrast, we ask "How did people respond to what just happened?" We describe some initial studies of sports fans' discussions of football matches through online social networks. During major sporting events, spectators send many messages through social networks like Twitter. These messages can be analysed to detect events, such as goals, and to provide summaries of sports events. Our aim is to produce a subjective summary of events as seen by the fans. We describe simple rules to estimate which team each tweeter supports and so divide the tweets between the two teams. We then use a topic detection algorithm to discover the main topics discussed by each set of fans. Finally, we compare these to live mainstream media reports of the event and select the most relevant topic at each moment. In this way, we produce a subjective summary of the match in near-real-time from the point of view of each set of fans.

1 Introduction

Document summarization consists of substantially reducing the length of a text (such as a document or a collection of documents) while retaining the main ideas [13]. Automatic summarization systems are therefore designed to extract the most important aspects of documents in order to produce a more compact representation. Multi-document summarization presents particular challenges due to redundancy of information across documents. This is especially true when summarizing from social media, as many messages are repeated multiple times (e.g. as retweets), leading to great redundancy. Moreover, additional features may modify the importance of each message, such as counts of 'likes' and 'favourites', making this task more challenging.

Objectivity and fairness are usually seen as virtues, and the aim of most summarization systems is to generate objective summaries without introducing bias towards any particular viewpoint. Journalists describing events, be they sports, politics or anything else, also claim to be neutral, fair and objective [4]. But while journalists may strive for objectivity, there is a continuing debate about whether that is possible or even entirely desirable [3]. As journalists become experts, they inevitably form their own opinions, which in turn shape their story-telling.

Rather than entering the debate about objectivity in journalism, our exploratory work here focusses on people who make no claims to be objective, namely sports fans. And rather than trying to impose objectivity on them, or to obtain the appearance of objectivity by aggregating or processing their messages, we instead aim to summarize their subjective opinions. In this way, we can tell the same story from two (or more) perspectives simultaneously, giving us a richer and more rounded depiction of events.

2 Related work

Although automatic summarization usually aims to produce objective summaries, some work has also been carried out to identify the range of opinions or sentiments expressed, for example to summarize responses to a consumer product or government policy [6]. This works by finding significant sentences in a document and then estimating the sentiment expressed; the document is then summarized by separately presenting the positive and negative sentences that have been extracted. In contrast, our work here is driven by a stream of messages, making time a critical factor. Rather than identify important messages and then estimate their sentiment, we first identify groups of users likely to express similar sentiment and then identify their important messages.

Evaluating summarization is non-trivial, as there are many ways to summarize text that still convey the main points. ROUGE is an automated evaluation tool [8], which assumes that a good generated "candidate" summary will contain many of the same words as a given "reference" summary (typically human-authored). This is useful in many document understanding tasks but is inappropriate for our work here. Our candidate summaries will use different words from any objective reference summary, such as a mainstream media account, precisely because they are subjective and reflect a particular point of view. In this initial study, we are only analysing a limited set of summarizations, so we use a human intrinsic summarization evaluation approach. Specifically, two of the authors compared each generated summary with the corresponding mainstream media comment and judged whether the same information was being conveyed, even if the details of the vocabulary and sentiment were different.

Twitter has been used to detect and predict events as diverse as earthquakes [14] and elections [17] with varying degrees of success, and is increasingly being used by journalists (including sports journalists) to track breaking news [10, 16]. Here, we consider online social media messages discussing football matches. Association Football ("soccer") is the world's most popular sport, and during matches between major teams a large number of tweets are typically published. Given that the volume of tweets generated around major events often passes several million, recent work has included attempts to summarize tweet collections automatically [15].

Kobu et al. [7] describe a recent attempt at summarizing football matches by detecting bursts in activity on Twitter and then identifying "good reporters": people who provide detailed, authoritative accounts of events. They measure this by finding messages that share words and phrases with other simultaneous messages (to show they are on-topic) and that are also longer (suggesting they contain useful information). They identify users who send several such messages early within each burst and use their messages as the basis for their match summarization system.

Nichols et al. [11] describe an approach to produce a "journalistic summary of an event" using tweets. They search for spikes in activity to identify important moments; they remove spam and off-topic tweets using various heuristics, such as removing replies, and also ignore hashtags and stop words; and they find the longest repeated phrases across multiple tweets, with a positive weight for words that appear in many tweets. Finally, they pick out whole sentences to ensure readability and reduce noise, and display the top N sentences that do not share any significant words (i.e. ignoring stopwords). They evaluate their system by measuring recall and precision against mainstream media accounts of three international football matches. They found all goals, red cards, disallowed goals and the end of each game, but missed some other events such as yellow cards, kick-offs and half-times. They used ROUGE [8] to compare their summaries with published accounts, and also performed a human evaluation for readability and meaning.

One similar study that also used Twitter to detect events during football matches is by Van Oorschot et al. [18]. They consider five fixed classes of event (goals, own-goals, red cards, yellow cards and substitutions) and evaluate their system by comparing the predicted classifications to the official match data. They also classify individual tweeters as fans of one team or another by counting the number of mentions of each team over several matches, similar to our approach. Their aim is to recreate a "gold standard" of official data summarizing each match.

In our work here, we categorise users into groups based on which team they appear to support (Section 3). This is related to community detection, an area that has led to much useful work in the analysis of online social networks [12]. For our purposes, a simple analysis of the frequency of different hashtags used in tweets is sufficient to confidently identify team support; however, if subjective event summarization were applied to other areas, it could be coupled with more sophisticated community detection methods.

3 Methods

We first attempt to identify the team that each Twitter user supports (if any). For each user, we count the total number of times they mention each team across all their tweets. Manual inspection suggests that fans tend to use their team's standard abbreviation (e.g. CFC or MCFC) far more often than any other team's, irrespective of sentiment.

For the 2013 final, 100 separate comments were made. In both cases, this amounts to 4000-5000 words in total. From these, we manually selected the most significant events, including goals, bookings (for player disciplinary offences), and near-misses. We ignored other comments such as quotes from former players, general comments about the state of the match and so on. For 2012, we chose 25 events and 29 for 2013. Each event is defined by its time of occurrence; we used all tweets starting from that moment and ending four minutes later.
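The mention-counting heuristic described above can be sketched as follows. This is an illustrative reconstruction rather than the authors' code: the team names, the `min_ratio` threshold, and the function name are all assumptions (the paper only requires that one abbreviation appear "far more often" than the other, without giving a cut-off).

```python
from collections import Counter

# Illustrative abbreviations for one hypothetical fixture; the paper's
# examples CFC and MCFC suggest Chelsea vs. Manchester City.
TEAM_TAGS = {"chelsea": {"cfc"}, "man_city": {"mcfc"}}

def classify_supporter(tweets, team_tags=TEAM_TAGS, min_ratio=2.0):
    """Guess which team a user supports from their tweet history.

    Counts occurrences of each team's standard abbreviation across all
    of a user's tweets and returns the most-mentioned team, or None if
    no team clearly dominates (min_ratio is an assumed threshold).
    """
    counts = Counter()
    for tweet in tweets:
        # Treat '#cfc' and 'cfc' alike; a real tokenizer would do more.
        tokens = tweet.lower().replace("#", " ").split()
        for team, tags in team_tags.items():
            counts[team] += sum(tokens.count(tag) for tag in tags)
    if not counts:
        return None
    (best, n1), *rest = counts.most_common()
    n2 = rest[0][1] if rest else 0
    if n1 == 0 or (n2 > 0 and n1 / n2 < min_ratio):
        return None  # no clear favourite
    return best

# Example: a user who tweets #CFC far more than #MCFC.
user_tweets = ["#CFC come on!", "Great save #CFC", "Nervy now #CFC #MCFC"]
print(classify_supporter(user_tweets))  # -> chelsea
```

The ratio test simply operationalizes "far more often": a user who mentions both teams roughly equally is left unclassified rather than assigned to a side.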