Learning Personas from Dialogue with Attentive Memory Networks
Eric Chu*          Prashanth Vijayaraghavan*          Deb Roy
MIT Media Lab      MIT Media Lab                      MIT Media Lab
[email protected]    [email protected]                    [email protected]

* The first two authors contributed equally to this work.

arXiv:1810.08717v1 [cs.CL] 19 Oct 2018

Abstract

The ability to infer persona from dialogue can have applications in areas ranging from computational narrative analysis to personalized dialogue generation. We introduce neural models to learn persona embeddings in a supervised character trope classification task. The models encode dialogue snippets from IMDB into representations that can capture the various categories of film characters. The best-performing models use a multi-level attention mechanism over a set of utterances. We also utilize prior knowledge in the form of textual descriptions of the different tropes. We apply the learned embeddings to find similar characters across different movies, and cluster movies according to the distribution of the embeddings. The use of short conversational text as input, and the ability to learn from prior knowledge using memory, suggest these methods could be applied to other domains.

1 Introduction

Individual personality plays a deep and pervasive role in shaping social life. Research indicates that it can relate to the professional and personal relationships we develop (Barrick and Mount, 1993; Shaver and Brennan, 1992), the technological interfaces we prefer (Nass and Lee, 2000), the behavior we exhibit on social media networks (Selfhout et al., 2010), and the political stances we take (Jost et al., 2009).

With increasing advances in human-machine dialogue systems, and widespread use of social media in which people express themselves via short text messages, there is growing interest in systems that can understand different personality types. Automated personality analysis based on short text could open up a range of potential applications, such as dialogue agents that sense personality in order to generate more interesting and varied conversations.

We define persona as a person's social role, which can be categorized according to their conversations, beliefs, and actions. To learn personas, we start with the character tropes data provided in the CMU Movie Summary Corpus by Bamman et al. (2014). It consists of 72 manually identified, commonly occurring character archetypes and examples of each. In the character trope classification task, we predict the character trope from a batch of dialogue snippets (a toy illustration of this framing follows the contribution list below).

In their original work, Bamman et al. (2014) use Wikipedia plot summaries to learn latent variable models that provide a clustering from words to topics and topics to personas; their persona clusterings were then evaluated by measuring similarity to the ground-truth character trope clusters. We asked the question: could personas also be inferred through dialogue? Because we use quotes as a primary input and not plot summaries, we believe our model is extensible to areas such as dialogue generation and conversational analysis.

Our contributions are:

1. Data collection of IMDB quotes and character trope descriptions for characters from the CMU Movie Summary Corpus.
2. Models that greatly outperform the baseline model in the character trope classification task. Our experiments show the importance of multi-level attention over words in a dialogue, and over a set of dialogue snippets.
3. We also examine how prior knowledge in the form of textual descriptions of the persona categories may be used. We find that a 'Knowledge-Store' memory initialized with descriptions of the tropes is particularly useful. This ability may allow these models to be used more flexibly in new domains and with different persona categories.
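To make the task framing above concrete, here is a toy bag-of-words baseline: given a batch of a character's dialogue snippets, predict one of the 72 tropes. This is purely illustrative; the training data is invented, and this is not the paper's model, which is the attention- and memory-based architecture introduced in Section 5.

```python
# Toy bag-of-words baseline for trope classification from a batch of quotes.
# The snippets and labels below are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# One training example per character: sampled snippets joined into a single
# document, labeled with the character's trope.
train_texts = [
    "Get me the board. I want every last dollar on that table.",
    "I hung up my guns a long time ago, kid. Those days are done.",
]
train_labels = ["corrupt_corporate_executive", "retired_outlaw"]

vectorizer = TfidfVectorizer()
classifier = LogisticRegression(max_iter=1000)
classifier.fit(vectorizer.fit_transform(train_texts), train_labels)

# Classify a new batch of snippets by joining them the same way.
batch = ["Nobody walks away from my money.", "Fire the whole division."]
print(classifier.predict(vectorizer.transform([" ".join(batch)])))
```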
Character Trope               Character       Movie
Corrupt corporate executive   Les Grossman    Tropic Thunder
Retired outlaw                Butch Cassidy   Butch Cassidy and the Sundance Kid
Lovable rogue                 Wolverine       X-Men

Table 1: Example tropes and characters

2 Related Work

Prior to data-driven approaches, personalities were largely measured by asking people questions and assigning traits according to some fixed set of dimensions, such as the Big Five traits of openness, conscientiousness, extraversion, agreeableness, and neuroticism (Tupes and Christal, 1992). Computational approaches have since advanced to infer these personalities based on observable behaviors such as the actions people take and the language they use (Golbeck et al., 2011).

Our work builds on recent advances in neural networks that have been used for natural language processing tasks such as reading comprehension (Sukhbaatar et al., 2015) and dialogue modeling and generation (Vinyals and Le, 2015; Li et al., 2016; Shang et al., 2015). This includes the growing literature on attention mechanisms and memory networks (Bahdanau et al., 2014; Sukhbaatar et al., 2015; Kumar et al., 2016).

The ability to infer and model personality has applications in storytelling agents, dialogue systems, and psychometric analysis. In particular, personality-infused agents can help "chit-chat" bots avoid repetitive and uninteresting utterances (Walker et al., 1997; Mairesse and Walker, 2007; Li et al., 2016; Zhang et al., 2018). The more recent neural models do so by conditioning on a 'persona' embedding; our model could help produce those embeddings.

Finally, in the field of literary analysis, graphical models have been proposed for learning character personas in novels (Flekova and Gurevych, 2015; Srivastava et al., 2016), folktales (Valls-Vargas et al., 2014), and movies (Bamman et al., 2014). However, these models often use more structured inputs than dialogue to learn personas.

3 Datasets

Characters in movies can often be categorized into archetypal roles and personalities. To understand the relationship between dialogue and personas, we utilized three different datasets for our models: (a) the Movie Character Trope dataset, (b) the IMDB Dialogue Dataset, and (c) the Character Trope Description Dataset. We collected the IMDB Dialogue and Trope Description datasets, and these datasets are made publicly available.[1]

[1] https://pralav.github.io/emnlp_personas/

3.1 Character Tropes Dataset

The CMU Movie Summary dataset provides tropes commonly occurring in stories and media (Bamman et al., 2014). There are a total of 72 tropes, which span 433 characters and 384 movies. Each trope contains between 1 and 25 characters, with a median of 6 characters per trope. Tropes and canonical examples are shown in Table 1.

3.2 IMDB Dialogue Snippet Dataset

To obtain the utterances spoken by the characters, we crawled the IMDB Quotes page for each movie. Because the quotes are submitted by IMDB users, not every utterance a character speaks may be available, but many quotes can typically be found for most characters, especially the well-known characters in the Character Tropes dataset. The distribution of quotes per trope is displayed in Figure 1. Our models were trained on 13,874 quotes and validated and tested on sets of 1,734 quotes each.

[Figure 1: Number of IMDB dialogue snippets per trope]

We refer to each IMDB quote as a (contextualized) dialogue snippet, as each quote can contain several lines between multiple characters, as well as italicized text giving context to what might be happening when the quote took place. Figure 2 shows a typical dialogue snippet.
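The description above fully characterizes what a dialogue snippet contains: the character's own lines, other characters' lines, and italicized context. A minimal record type capturing that description might look as follows; the schema and field names are our own sketch, not one the paper prescribes. The Butch Cassidy line is a real quote from the film, while the other turn and the context line are invented for the example.

```python
# Sketch of one "contextualized dialogue snippet" as described in Section 3.2.
from dataclasses import dataclass, field
from typing import List

@dataclass
class DialogueSnippet:
    """One IMDB quote: possibly multi-turn, with stage-direction context."""
    character: str                # character of interest
    trope: str                    # gold label from the Character Tropes dataset
    character_lines: List[str] = field(default_factory=list)
    other_lines: List[str] = field(default_factory=list)  # other speakers' turns
    context: List[str] = field(default_factory=list)      # italicized context text

snippet = DialogueSnippet(
    character="Butch Cassidy",
    trope="retired_outlaw",
    character_lines=["Boy, I got vision, and the rest of the world wears bifocals."],
    other_lines=["What's your idea this time?"],                  # invented
    context=["[Butch and Sundance ride up to look at the bank]"],  # invented
)
```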
70.3% of the quotes are multi-turn exchanges, with a mean of 3.34 turns per multi-turn exchange. While the character's own lines alone can be highly indicative of the trope, our models show that accounting for the context and the other characters' lines improves performance. The context, for instance, can give clues to typical scenes and actions that are associated with certain tropes, while the other characters' lines give further detail into [...]

[...] $E = [w_{E_1}, w_{E_2}, \ldots, w_{E_T}]$ is the contextual information and $O = [w_{O_1}, w_{O_2}, \ldots, w_{O_T}]$ denotes the other characters' lines. We define all three components of $S$ to have a fixed sequence length $T$ and pad when necessary. Let $N_S$ be the total number of dialogue snippets for a trope. We sample a set of $N_{diag}$ (where $N_{diag} \ll N_S$) snippets from the $N_S$ snippets related to the trope as inputs to our model.

5 Attentive Memory Network

The Attentive Memory Network consists of two major components: (a) Attentive Encoders, and (b) a Knowledge-Store [...]
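Although the excerpt cuts off mid-sentence, the pieces above are enough to sketch the data flow: pad each snippet component to length $T$, sample $N_{diag}$ of the $N_S$ snippets for a trope, then apply attention first over the words of each snippet and then over the resulting snippet vectors (the multi-level attention named in contribution 2). The sketch below uses random placeholder embeddings and untrained attention queries; all sizes are illustrative, and the actual encoders are defined in the part of the paper not included here.

```python
# Schematic sketch: fixed-length padding, snippet sampling, and two-level
# (word-level, then snippet-level) dot-product attention. Embeddings and
# queries are random placeholders standing in for learned parameters.
import numpy as np

T, N_DIAG, D = 10, 4, 8          # illustrative sizes, not from the paper
rng = np.random.default_rng(0)

def pad(tokens, length=T, pad_token="<pad>"):
    """Pad or truncate a token list to exactly `length` tokens."""
    return (tokens + [pad_token] * length)[:length]

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(rows, query):
    """Dot-product attention: weighted sum of row vectors."""
    return softmax(rows @ query) @ rows

# Sample N_diag snippets for a trope; each snippet here is just its C component.
all_snippets = [f"line {i} of dialogue".split() for i in range(20)]   # N_S = 20
sampled = [pad(all_snippets[i]) for i in rng.choice(20, N_DIAG, replace=False)]

embed = lambda tok: rng.normal(size=D)              # placeholder word embedding
q_word, q_snip = rng.normal(size=D), rng.normal(size=D)  # learned in practice

# Level 1: word-level attention gives one vector per snippet.
snippet_vecs = np.stack(
    [attend(np.stack([embed(t) for t in s]), q_word) for s in sampled])
# Level 2: attention over the set of snippet vectors gives the representation.
persona = attend(snippet_vecs, q_snip)
print(persona.shape)   # -> (8,)
```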