Data-Driven Models for Personality Recognition and Generation


Learning to Adapt in Dialogue Systems: Data-driven Models for Personality Recognition and Generation

François Mairesse

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Department of Computer Science
University of Sheffield, United Kingdom
February 2008

Abstract

Dialogue systems are artefacts that converse with human users in order to achieve some task. Each step of the dialogue requires understanding the user's input, deciding on what to reply, and generating an output utterance. Although there are many ways to express any given content, most dialogue systems do not take linguistic variation into account in both the understanding and generation phases, i.e. the user's linguistic style is typically ignored, and the style conveyed by the system is chosen once for all interactions at development time. We believe that modelling linguistic variation can greatly improve the interaction in dialogue systems, such as in intelligent tutoring systems, video games, or information retrieval systems, which all require specific linguistic styles. Previous work has shown that linguistic style affects many aspects of users' perceptions, even when the dialogue is task-oriented. Moreover, users attribute a consistent personality to machines, even when exposed to a limited set of cues, thus dialogue systems manifest personality whether designed into the system or not. Over the past few years, psychologists have identified the main dimensions of individual differences in human behaviour: the Big Five personality traits. We hypothesise that the Big Five provide a useful computational framework for modelling important aspects of linguistic variation. This thesis first explores the possibility of recognising the user's personality using data-driven models trained on essays and conversational data.
We then test whether it is possible to generate language varying consistently along each personality dimension in the information presentation domain. We present PERSONAGE: a language generator modelling findings from psychological studies to project various personality traits. We use PERSONAGE to compare various generation paradigms: (1) rule-based generation, (2) overgenerate and select and (3) generation using parameter estimation models, a novel approach that learns to produce recognisable variation along meaningful stylistic dimensions without the computational cost incurred by overgeneration techniques. We also present the first human evaluation of a data-driven generation method that projects multiple stylistic dimensions simultaneously and on a continuous scale.

Acknowledgements

Here I would like to express my sincere gratitude to the many people who have contributed to making this thesis an enjoyable and rewarding experience.

First of all, I would like to thank Lyn Walker for her enthusiastic supervision, and for making me discover the field of computational linguistics, as well as research methods in general. This thesis could not have existed without her. I am also grateful to Roger Moore for our fruitful conversations, as well as to Rob Gaizauskas for his valuable advice. On a different continent, I would like to thank Matthias Mehl, James Pennebaker and Cindy Chung for their collaboration, and for introducing me to the world of psychology research.

On a more personal note, I would like to thank my family (Annick, Pierre, Cecile and Jeremie), who I should have seen much more often, as well as Matt Gibson, Sarah Creer and Helen Cross for their humour, kindness and friendship throughout these three years, and the Cobden View staff for making Tuesday the quiz day. Many thanks to John Allan, Joe Polifroni, Jonathan Laidler and Nathaniel King for their conversations, lunches and the occasional pint.
I am also grateful to the departmental staff and support for making the department work, and especially to Gillian Callaghan for her northern humour. I also had the pleasure of living with many people, with whom I hope I have shared a bit more than a house, including Rachel Fairlamb, Jonathan Chambers, Catherine Quigley, Dean Young, Jonathon Rudkin, Dave Robinson, Hector Marin Reyes, Bouka Maamra, Yannis Balaskas, Sotiris Saravelos, Tudor Grecu and Daniele Musumeci.

While Sheffield has been my new home for the past three years, I must also thank the people who made me want to go back to Belgium (and not forget about strong fizzy beers): François Delfosse, Simon Dubois, Geoffroy Tassenoy, Guillaume De Bo, Benoit Roland, Lionel Lawson, Ludovic Lecocq, Fabrice Quinet, Quentin Vroye, Rebecca Deraeck, Nathalie Hargot, Maxime Melchior, Benjamin Cogels, Gerard Paquet and many others.

Lastly, and most importantly, I would like to thank Veronique Lefebvre for her constant affection and smile.

Table of Contents

1 Introduction
  1.1 Modelling the user's linguistic variation
  1.2 Generating linguistic variation to the user
  1.3 Dimensions of linguistic variation
    1.3.1 Definitions of linguistic style
    1.3.2 Factors affecting linguistic style
      1.3.2.1 Formality
      1.3.2.2 Politeness
      1.3.2.3 Dialects and sociolects
      1.3.2.4 Personality
  1.4 Motivation for personality-based dialogue modelling
    1.4.1 Recognising the user's personality
    1.4.2 Controlling the system's personality
  1.5 Research hypotheses
    1.5.1 Boundaries
  1.6 Contributions and organisation of the thesis
2 Background
  2.1 Elements of personality psychology
    2.1.1 The main dimensions of personality
    2.1.2 Biological causes
  2.2 Language and personality
    2.2.1 Markers of extraversion
    2.2.2 Markers of other Big Five traits
  2.3 User modelling in dialogue
    2.3.1 Individual preferences
    2.3.2 Expertise
    2.3.3 Personality
  2.4 Modelling individual differences in natural language generation
    2.4.1 Early work: ELIZA and PARRY
    2.4.2 The standard NLG architecture
    2.4.3 Template and rule-based stylistic generation
      2.4.3.1 Pragmatic effects
      2.4.3.2 Linguistic style
      2.4.3.3 Politeness
      2.4.3.4 Personality and embodied conversational agents
    2.4.4 Data-driven stylistic generation
      2.4.4.1 Overgenerate and select methods
      2.4.4.2 Direct control of the generation process
  2.5 Summary
I Recognising the User's Personality in Dialogue
3 Personality Recognition from Linguistic Cues
  3.1 Adapting to the user's personality
  3.2 Experimental method
    3.2.1 Sources of language and personality
    3.2.2 Features
      3.2.2.1 Content and syntax
      3.2.2.2 Utterance type
      3.2.2.3 Prosody
    3.2.3 Correlational analysis
    3.2.4 Statistical models
  3.3 Classification results
    3.3.1 Essays corpus
    3.3.2 EAR corpus
    3.3.3 Qualitative analysis
  3.4 Regression results
    3.4.1 Essays corpus
    3.4.2 EAR corpus
    3.4.3 Qualitative analysis
  3.5 Ranking results
    3.5.1 Essays corpus
    3.5.2 EAR corpus
    3.5.3 Qualitative analysis
  3.6 Discrete personality modelling in related work
  3.7 Discussion and summary
II Generating a Recognisable System Personality
4 From Personality Markers to Generation Decisions
  4.1 Personality marker studies
    4.1.1 Sources of language
    4.1.2 Personality assessment methods
  4.2 NLG parameter mapping
  4.3 Extraversion
  4.4 Emotional stability
  4.5 Agreeableness
  4.6 Conscientiousness
  4.7 Openness to experience
  4.8 Summary
5 Implementing Personality Markers in a Generator
  5.1 Framework overview
  5.2 Projecting personality in a specific domain
  5.3 Input structure
  5.4 PERSONAGE's architecture
  5.5 Implementation of generation decisions
    5.5.1 Content planning
    5.5.2 Syntactic template selection
    5.5.3 Aggregation
    5.5.4 Pragmatic marker insertion
    5.5.5 Lexical choice
    5.5.6 Surface realisation
  5.6 Summary
6 Psychologically Informed Rule-based Generation
  6.1 Methodology
  6.2 Human evaluation
  6.3 Results
  6.4 Summary
7 Stochastic Generation Capabilities
  7.1 Generation coverage and quality
    7.1.1 Ratings distribution
      7.1.1.1 Comparison with the rule-based approach
    7.1.2 Inter-rater agreement
    7.1.3 Naturalness
  7.2 Feature analysis
    7.2.1 Generation decisions
    7.2.2 Content-analysis features
    7.2.3 N-gram features
  7.3 Discussion and summary
8 Generation of Personality through Overgeneration
  8.1 Methodology
  8.2 Statistical models
  8.3 Results with in-domain models
    8.3.1 Modelling error
      8.3.1.1 Discussion
      8.3.1.2 Modelling error distribution
    8.3.2 Sampling error
    8.3.3 Psychologically informed selection models
  8.4 Results with out-of-domain models
    8.4.1 Out-of-domain model accuracy
    8.4.2 Domain adaptation
  8.5 Summary
9 Generation of Personality through Parameter Estimation
  9.1 Methodology
    9.1.1 Pre-processing steps
    9.1.2 Statistical learning algorithms
    9.1.3 Qualitative model analysis
    9.1.4 Model selection
    9.1.5 Generation phase
  9.2 Large-scale evaluation
    9.2.1 Evaluation method
    9.2.2 Evaluation results
    9.2.3 Comparison with rule-based generation
    9.2.4 Perception of fine-grained variation
