Finding Spoiler Bias in Tweets by Zero-Shot Learning and Knowledge Distilling from Neural Text Simplification
Avi Bleiweiss
BShalem Research
Sunnyvale, CA, USA
[email protected]

Proceedings of the First Workshop on Language Technology for Equality, Diversity and Inclusion, pages 51–60, April 19, 2021. ©2020 Association for Computational Linguistics

Abstract

Automatic detection of critical plot information in reviews of media items poses unique challenges to both social computing and computational linguistics. In this paper we propose to cast the problem of discovering spoiler bias in online discourse as a text simplification task. We conjecture that for an item-user pair, the simpler the user review we learn from an item summary, the higher its likelihood to present a spoiler. Our neural model incorporates the advanced transformer network to rank the severity of a spoiler in user tweets. We constructed a sustainable, high-quality movie dataset scraped from unsolicited review tweets and paired with a title summary and meta-data extracted from a movie-specific domain. To a large extent, our quantitative and qualitative results weigh in on the performance impact of named entity presence in plot summaries. Pretrained on a split-and-rephrase corpus with knowledge distilled from English Wikipedia and fine-tuned on our movie dataset, our neural model is shown to outperform both a language modeler and monolingual translation baselines.

1 Introduction

People who expose themselves to the process of satisfying curiosity expect to enhance the pleasure derived from obtaining new knowledge (Loewenstein, 1994; Litman, 2005). Conversely, induced revelatory information about the plot of a motion picture, TV program, video game, or book can spoil the viewer's sense of surprise and suspense, and thus greatly diminish the enjoyment in consuming the media. As social media has thrived into a medium for self-expression, live tweets, opinion dumps, or even hashtags tend to proliferate within minutes of the media reaching the public, and hence the risk of uncovering a spoiler widens rapidly. Spoilers on review websites may inevitably contain undesired information and disclose critical plot twists that evoke far less interest to users who consult online reviews first, and later engage with the media itself.

Social media platforms have placed elaborate policies to better guard viewers from spoilers. On the producer side, some sites adopted a strict convention of issuing a spoiler alert to be announced in the subject of the post (Boyd-Graber et al., 2013). Recently, Twitter started to offer provisions for the tweet consumer to manually mute references to specific keywords and hashtags and stop displaying tweets containing them (Golbeck, 2012). But despite these intricate mechanisms that aim to preemptively ward off unwanted spoilers, the proposed solutions lack timely traction with consumers and may not scale, as spoilers remain a first-class problem in online discourse. Rather than solicit spoiler annotations from users, this work motivates automatic detection of spoiler bias in review tweets, for which spoiler annotations are unavailable or scarce.

Surprisingly, for the past decade spoiler detection drew only little notice and remained a relatively understudied subject. Earlier work used machine learning techniques that incorporated human-curated features in supervised settings: a Latent Dirichlet Allocation (LDA) based model combined a simple bag-of-words (BOW) with linguistic cues to perform spoiler detection in review commentary (Guo and Ramakrishnan, 2010); baseline n-gram features augmented with binary meta-data extracted from the review dataset dramatically improved the performance of spoiler detection in text (Boyd-Graber et al., 2013); and while frequent verbs and named entities play a critical role in identifying spoilers, adding objectivity and main sentence tense improved classification accuracy by about twelve percentage points (Jeon et al., 2013; Iwai et al., 2014).

More recently, researchers started to apply deep learning methods to detect spoiler sentences in review corpora. The study by Chang et al. (2018) proposes a model architecture that consists of a convolutional neural network (CNN) based genre encoder and a sentence encoder that uses a bidirectional gated recurrent unit (bi-GRU) (Cho et al., 2014). A genre-aware attention layer aids in learning spoiler relations that tend to vary by the class of the item reviewed. Their neural model was shown to outperform spoiler detection of machine-learning baselines that use engineered features. Using a book review dataset with sentence-level spoiler tags, Wan et al. (2019) followed a similar architecture to that explored by Chang et al. (2018) and introduced SpoilerNet, which comprises word and sentence encoders, each realizing a bi-GRU network. We found their error analysis interesting in motivating the rendition of spoiler detection as a ranking task rather than a conventional binary classification. Although only marginally related, noteworthy is an end-to-end similarity neural network with a variance attention mechanism (Yang et al., 2019), proposed to address real-time spoiled content in time-sync comments issued during live video viewing. In this scenario, suppressing the impact of frequently occurring noisy comments remains an outstanding challenge.

In our approach, we propose to cast the task of spoiler detection in online discourse as ranking the quality of sentence simplification from an item description to a multitude of user reviews. We used the transformer neural architecture, which dispenses entirely with recurrence, to learn the mapping from a compound item summarization to the simpler tweets. The contributions of this work are summarized as follows:

• A high-quality and sustainable movie review dataset we constructed to study automatic spoiler detection in social media microblogs. The data was scraped from unsolicited Twitter posts and paired with a title caption and meta-data we extracted from the rich Internet Movie Database (IMDb).¹ We envision the dataset to facilitate future related research.

• We propose a highly effective transformer model for ranking spoiler bias in tweets. Our novelty lies in applying a preceding text simplification stage that consults an external paraphrase knowledge-base to aid our downstream NLP task. Motivated by our zero-shot learning results (Table 1), we conjectured that the more simplified tweet, predicted from the movie caption, is likely to make up a spoiler.

• Through the exhaustive experiments we conducted, we provide both qualitative analysis and quantitative evaluation of our system. The results show that our method accomplished performance that outperforms strong baselines.

¹ www.imdb.com

✓ i literally tried everything the force is strong with daisy ridley theriseofskywalker (p=0.08)
✗ nfcs full podcast breakdown of star wars episode ix the rise of skywalker starwars (p=0.18)
✗ new theriseofskywalker behindthescenes images show the kijimi set see more locations here (p=0.27)
✓ weekend box office report for janfebbadboysforlifemoviedolittlegret (p=0.12)
✗ in my latest for the officialsite i examine some of the themes from theriseofskywalker and how they reinforce some (p=0.23)

Table 1: Zero-shot classification of unlabeled tweets about the movie The Rise of Skywalker (✓ annotates spoiler-free and ✗ spoiler-biased posts), along with the text simplification ranking of spoiler bias p ∈ [0, 1].

2 Related Work

Text simplification is an emerging NLP discipline with the goal of automatically reducing the diction complexity of a sentence while preserving its original semantics. Inspired by the success of neural machine translation (NMT) models (Sutskever et al., 2014; Cho et al., 2014), sentence simplification has been the subject of several neural architectural efforts in recent years. Zhang and Lapata (2017) addressed the simplification task with a long short-term memory (LSTM) (Hochreiter and Schmidhuber, 1997) based encoder-decoder network, and employed a reinforcement learning framework to inject prior knowledge and reward simpler outputs. In their work, Vu et al. (2018) propose to combine Neural Semantic Encoders (NSE) (Munkhdalai and Yu, 2017), a novel class of memory-augmented neural networks that offer a variable-sized encoding memory, with a trailing LSTM-based decoder architecture. Their automatic evaluation suggests that by allowing access to the entire input sequence, NSE present an effective solution to simplifying highly complex sentences. More recently, Zhao et al. (2018) incorporated a hand-crafted knowledge base of simplification rules into the self-attention transformer network (Vaswani et al., 2017). This model accurately selects simplified words and shows empiri-

[Figure: review counts per movie title (ABeautifulDay through WizardOfOz), review span distribution, and review counts by genre (Action through Documentary) for the 23 movies in our dataset.]
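The ranking conjecture above — the more readily a tweet can be produced as a simplification of a title summary, the more likely it is a spoiler — can be sketched in a few lines. The sketch below is not the paper's model: it substitutes a trivial unigram-overlap likelihood for the pretrained simplification transformer, scores each tweet by a length-normalized negative log-likelihood conditioned on the summary, and squashes the score into p ∈ [0, 1] with a sigmoid. The function names, the overlap model, and the example sentences are all illustrative assumptions.

```python
import math

def length_normalized_nll(summary_tokens, tweet_tokens, smoothing=1e-3):
    """Stand-in for a seq2seq simplification model: per-token negative
    log-likelihood of the tweet conditioned on the summary, crudely
    approximated by unigram overlap with the summary vocabulary."""
    vocab = set(summary_tokens)
    probs = [1.0 - smoothing if t in vocab else smoothing for t in tweet_tokens]
    return -sum(math.log(p) for p in probs) / max(len(tweet_tokens), 1)

def spoiler_bias(summary, tweets):
    """Map each tweet's NLL to a pseudo-probability p in [0, 1]: a lower
    NLL (the tweet is an 'easier' simplification of the summary) yields a
    higher spoiler-bias score, following the paper's conjecture."""
    nlls = [length_normalized_nll(summary.lower().split(), t.lower().split())
            for t in tweets]
    return [1.0 / (1.0 + math.exp(nll)) for nll in nlls]

# Hypothetical summary and tweets, purely to exercise the interface.
summary = "rey confronts palpatine in the final battle"
tweets = ["palpatine confronts rey in the battle",
          "podcast breakdown of episode nine"]
scores = spoiler_bias(summary, tweets)
```

With a real simplification model, the stand-in likelihood would be replaced by the decoder's conditional log-probability of the tweet given the summary; the length normalization and sigmoid squashing carry over unchanged.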