Put Your Title Here
Total Page:16
File Type:pdf, Size:1020Kb
Affect Detection from Open-Ended Improvisational Text
Li Zhang, John A. Barnden, Robert J. Hendley, and Alan M. Wallington School of Computer Science, University of Birmingham Birmingham, B15 2TT, UK [email protected]
Abstract
We report progress on adding affect-detection to a program for virtual dramatic improvisation, mon- itored by a human director. We have developed an affect-detection module to control an automated virtual actor and to contribute to the automation of directorial functions. The work also involves ba- sic research into how affect is conveyed through metaphor. The relevance of the project to the sym - posium is mainly in the application of AI to the creation of emotionally believable synthetic agents for interactive narrative environments, and in the study of the language used in improvisatory story- construction.
1 Introduction bubbles typed by the actor operating the character. One director and up to five actors are involved in an The benefits of using improvised drama or role-play e-drama session. A graphical interface shows the in education, training, conflict resolution and coun- characters and virtual stage on the director’s and selling are widely recognised, and researchers have each actor’s terminal. Actors can choose the clothes recently explored the possibility of e-drama, in and bodily appearance for their own characters. So which virtual characters (avatars) on computer dis- far, the characters’ visual forms have been static car- plays interact under the control of human users (e.g. toon figures, with real-life photographic images Machado et al., 2000). This paper focuses on our ex- used as backdrops. However, we are now bringing perience1 in adding types of intelligent processing to in animated gesturing avatars and 3D computer-gen- an existing e-drama system, edrama, created by one erated settings using technology from another indus- of our industrial partners (Hi8us) and used in trial partner (BT), although these matters are outside schools for creative writing and teaching in various the scope of the present paper. subjects. The intelligent processing is focussed on Actors and a human director all work at separate the detection and assessment of emotional and other terminals, through software clients connecting with “affective” aspects of language utterances created the server. The clients communicate by XML stream freely by users. messages via the server, over the Internet using Although the symposium is not focused on im- standard browsers. The terminals are generally at a provisatory drama systems as such, an important de- site or sites remote from the server site, and may be sirable aspect of various types of believable synthet- remote from each other. ic agent, in games or other environments, is the abil- The actors are given a loose scenario around ity to be sensitive to the affect apparently expressed which to improvise, but are at liberty to be creative. in natural-language expressions uttered by other par- (There are no canned speeches and no firmly re- ticipants, in order to be able to respond in an affect- quired plot elements.) For example, one scenario we ively appropriate way. have used is a school-bullying one involving a In the original edrama system mentioned above, schoolgirl Lisa, who is being bullied by her class- “actors” (human users) control virtual characters on mate Mayid. Lisa is a shy child and afraid of Mayid. a virtual stage, with “speeches” displayed as text Other characters are Lisa’s mother, a friend, and a schoolteacher. Actors are expected to improvise in- 1 The project is supported by grant RES-328-25-0009 under the teresting interchanges within these parameters. It is ESRC/EPSRC/DTI “PACCIT” programme, and its metaphor as- expected that normally the Mayid character will ex- pects also by EPSRC grant EP/C538943/1. We thank our partners press hostility to Lisa and that she will express fear, —Hi8us Midlands Ltd, Maverick Television Ltd and BT—and colleagues W.H. Edmondson, S.R. Glasbey, M.G. Lee and Z. but actors can be creative, so that for example the Wen. Mayid actor might play him as repenting of his bul- such as hostility; and value judgments of goodness, lying. importance, etc. The human director has a number of roles. S/he The name EmEliza, while appropriate at an ini- must constantly monitor the unfolding drama and tial stage of its development because of the pro- the actors’ interactions, or lack of them, to intervene gram’s Eliza-like qualities, is now outmoded as we if they are not keeping to the general spirit of the have begun to include more sophisticated syntactic scenario. For example, a director may intervene and semantic processing, as sketched below. when the emotions expressed or discussed are not as Although merely detecting affect is limited com- expected (or are not leading consistently in a new pared to extracting the full meaning of characters’ interesting direction). The director may also inter- utterances, we have found that in many cases this is vene if, for example, one character is not getting in- sufficient for the purposes of stimulating the impro- volved, or is dominating the improvisation. visation. Also, even limited types of affect detection Intervention can take a number of forms. The can be useful. We do not purport to be able to make director can send suggestions to actors. However, EmEliza detect all types of affect under all ways af- another important means of intervention is for the fect can be expressed or implied, or to do it with a director to introduce and control a ‘bit-part’ charac- high degree of reliability. The spirit of the project is ter. This character may not have a major role in the to see how far we can get with practical processing drama, but might, for example, try to interact with a techniques, while at the same time investigating the- character who is not participating much in the drama oretically the nature of, and potential computational or who is being ignored by the other characters. Al- ways of dealing with, forms of affective expression ternatively, it might make comments intended to that are too difficult to handle in a usable implemen- ‘stir up’ the emotions of those involved, or, by inter- ted system. vening, diffuse an inappropriate exchange develop- Much research has been done on creating affect- ing between two characters. ive virtual characters in interactive systems. Indeed, Clearly, all this imposes a heavy burden on the Picard’s work (2000) makes great contributions to director. Playing the role of the bit-part character building affective virtual characters. Also, emotion makes it difficult to fully monitor the improvisation theories, particularly that of Ortony, et al. (1988) and send appropriate messages to actors. The diffi- (OCC), have been used widely therein. Prendinger culty is particularly acute if the directors are and Ishizuka (2001) used the OCC model in part to novices, such as teachers trying to use e-drama with- reason about emotions and to produce believable in their lessons. emotional expression. Wiltschko’s eDrama Front One major research aim is accordingly to auto- Desk (2003) is designed as an online emotional nat- mate some directorial functions, either to take some ural language dialogue simulator with a virtual re- of the burden away from a human director or to ception interface for pedagogical purposes. Mehdi et provide a fully automated and therefore necessarily al. (2004) combined the widely accepted five-factor very restricted director. With a fully-automated dir- model of personality (McCrae et al., 1992), mood ector, even if highly restricted in what it can do, and OCC in their approach for the generation of little or no human supervision might be required for emotional behaviour for a fireman training applica- at least minimally adequate improvisations and so tion. Gratch and Marsella (2004) presented an integ- these directorial functions could make a useful addi- rated model of appraisal and coping, in order to tion to role-playing games. reason about emotions and to provide emotional re- However, our main current work is on assisting a sponses, facial expressions and potential social intel- human director by providing fully-automated con- ligence for virtual agents. Egges et al. (2003) have trol of a bit-part character, although we are also provided virtual characters with conversational emo- working on automating limited types of direct- tional responsiveness. Elliott et al. (1997) demon- or-to-actor message-sending to allow the human dir- strated tutoring systems that reason about users’ ector to concentrate on the more difficult aspects of emotions. There is much other work in a similar the task. For this reason, we have created a simple vein. automated actor, EmEliza, which is under the con- There has been only a limited amount of work trol of an affect-detection module. The module tries directly comparable to our own, especially given our to identify affect in characters’ speeches, allowing concentration on improvisation and open-ended lan- the EmEliza character to make responses that it is guage. There has been relevant work on general lin- hoped will stimulate the improvisation. Within af- guistic clues that could be used in practice for affect fect we include: basic and complex emotions such as detection (e.g. Craggs & Wood (2004)), whilst anger and embarrassment respectively; meta-emo- Façade (Mateas, 2002) included shallow natural lan- tions such as desiring to overcome anxiety; moods guage processing for characters’ open-ended utter- ances, but the detection of major emotions, rudeness and value judgements is not mentioned. Zhe and bels, and intensity. Affect labels plus intensity are Boucouvalas (2002) demonstrated an emotion ex- used when strong text clues signalling affect are de- traction module embedded in an Internet chatting tected, while the evaluation dimension plus intensity environment. It uses a part-of-speech tagger and a is used for weak text clues. Moreover, our analysis syntactic chunker to detect the emotional words and reported here is based on the transcripts of previous to analyse emotion intensity for the first person (e.g. e-drama sessions. Since even a person’s interpreta- ‘I’ or ‘we’). Unfortunately the emotion detection fo- tions of affect can be very unreliable, our approach cuses only on emotional adjectives, and does not ad- combines various weak relevant affect indicators dress deep issues such as figurative expression of into a stronger and more reliable source of informa- emotion. Also, the concentration purely on first-per- tion for affect detection. Now we summarize our af- son emotions is narrow. fect detection based on multiple streams of informa- Our work is distinctive in several respects. Our tion. interest is not just in (a) the first-person, positive ex- pression of affect: the affective states or attitudes 2.1 Pre-processing Modules that a virtual character X implies that it itself has (or had or will have, etc.), but also in (b) affect that the The language in the speeches created in e-drama character X implies it lacks, (c) affect that X implies sessions severely challenges existing language-ana- that other characters have or lack, and (d) questions, lysis tools if accurate semantic information is sought commands, injunctions, etc. concerning affect. We even for the purposes of restricted affect-detection. aim also for the software to cope partially with the The language includes misspellings, ungrammatical- important case of communication of affect via meta- ity, abbreviations (often as in text messaging), slang, phor (Fussell & Moss, 1998; Kövecses, 1998), and use of upper case and special punctuation (such as to push forward the theoretical study of such lan- repeated exclamation marks) for affective emphasis, guage, as part of our research on metaphor generally repetition of letters or words also for affective em- (see, e.g., Barnden et al. (2004)). phasis, and open-ended interjective and onomato- Our project does not involve using or developing poeic elements such as “hm”, “ow” and “grrrr”. In deep, scientific models of how emotional states, etc., the examples we have studied, which so far involve function in cognition. Instead, the deep questions in- teenage children improvising around topics such as vestigated are on linguistic matters such as the meta- school bullying, the genre is similar to Internet chat. phorical expression of affect and how ordinary To deal with the misspellings, abbreviations, let- people understand and talk about affect in ordinary ter repetitions, interjections and onomatopoeia, sev- life. What is of prime importance is their common- eral types of pre-processing occur before actual de- sense views of how affect works, irrespective of tection of affect. how scientifically accurate those views are. Meta- A lookup table has been created containing ab- phor is strongly involved in such views. breviations for Internet chat rooms and abbrevi- It should also be appreciated that this paper does ations that we have found by analyzing previous e- not address the emotional, etc. states of the actors drama sessions (e.g. ‘im (I am)’ and ‘c u (see you)’). (or director, or any audience). Our focus is on the Multiword phrases and original words, which trans- affect that the actors make their characters express late the abbreviations, are listed correspondingly in or mention. While an actor may work him/herself up the lookup table. The abbreviation module can into, or be put into, a state similar to or affected by handle most of the abbreviation cases in users’ in- those in his/her own characters’ speeches or those of put. Especially we also deal with abbreviations such other characters, such interesting effects, which go as numbers embedded within words (e.g., “l8r” for to the heart of the dramatic experience, are beyond “later”) using the lookup table. Unfortunately cer- the scope of this paper, and so is the possibility of tain abbreviations can be ambiguous. E.g., ‘2’ may using information one might be able to get about stand for ‘to’, ‘too’ or ‘two’ (although the last is rare actors’ own affective states as a hint about the af- in our genre), as exemplified by “I’m 2 hungry 2 fective states of their characters or vice-versa. walk”. A lookup table on its own cannot solve such context-sensitive ambiguity. In order to solve this problem, part-of-speech information has been as- 2 Our Current Affect Detection signed to the surrounding words using the lexicon from Brill’s tagger (1994). Then, simple context- Various characterizations of emotion are used in sensitive strategies are used to find the appropriate emotion theories. The OCC model uses emotion la- words for the ambiguous items. These simple bels (anger, etc.) and intensity, while Watson and strategies may lead to errors in some special cases, Tellegen (1985) use positivity and negativity of af- but we have evaluated them using examples from fect as the major dimensions. Currently, we use an previous e-drama transcripts and obtained an 85.7% evaluation dimension (negative-positive), affect la- accuracy rate, which is adequate. We are also con- sidering dealing with abbreviations, etc. in a more produced from earlier edrama improvisations based general way by including them as special lexical on a school bullying scenario, using school children items in the lexicon of the robust parser we are us- aged from 8 to 12. We have also worked on another, ing (see below). distinctly different scenario – Crohn’s disease, based Letter repetition comes in two flavours. One is on a programme from one of our industrial partners, repetition added to ordinary words (e.g. ‘yessss’, Maverick Television Ltd. One interesting feature in ‘seeeee’) and the other is repetition that expands in- this scenario is meta-emotion (emotion about emo- terjections or onomatopoeic elements (e.g. tion) and cognition about emotion, because of the ‘grrrrrrrrr’, ‘agggghhhhh’). The iconic use of word need for people to cope with emotions about their length here (i.e., written word length corresponding illnesses. The rule sets created for one scenario have roughly to imagined sound length) normally implies a useful degree of applicability to other scenarios, strong affective states in the characters’ input. Use- though there will be a few changes in the related fully, adding letters does not change the pronunci- knowledge database according to EmEliza’s differ- ation a great deal. We therefore handle added letter ent roles in specific scenarios. repetitions by means of the Metaphone spelling-cor- A rule-based Java framework called Jess (2004) rection algorithm (2005), whose working strategy is is used to implement the pattern/template-matching based on pronunciations, together with a small dic- rules in EmEliza. When Mayid says “Lisa, you tionary that we created, containing base forms of Pizza Face! You smell”, EmEliza detects that he is various interjections and onomatopoeic elements to- insulting Lisa. Patterns such as ‘you smell’ have gether with some ordinary words that are often sub- been used for rule implementation. The rules con- ject to letter-repetition in e-drama sessions. We also jecture the character’s emotions, evaluation dimen- aim to develop a more general detector of onomato- sion (negative or positive), politeness (rude or po- poeic elements that does not rely on particular base lite) and what response EmEliza should make. forms. We stress that added letter-repetition is not Multiple exclamation marks, capitalisation of simply eliminated: its occurrence is recorded, to aid whole words and added letter repetition (Werry, affect-detection. 1996) are frequently employed to express affective Finally, the Levenshtein distance algorithm emphasis in e-drama sessions. If emotion and ex- (2005) with a contemporary English dictionary deals clamation marks or capitalisation are detected in a with spelling mistakes in users’ input. character’s utterance, then the emotion intensity is Having described the necessary preprocessing, deemed to be comparatively high (and emotion is we now turn to the core detection of affect in users’ suggested even in the absence of other indicators). input. In an initial stage of our work, affect detection A reasonably good indicator that an inner state is was based purely on textual pattern-matching rules being described is the use of ‘I’ (see also Craggs & that looked for simple grammatical patterns or tem- Wood (2004)), especially in combination with the plates partially involving lists of specific alternative present or future tense. In the school-bullying scen- words. This continues to be a core aspect of our sys- ario, when ‘I’ is followed by a future-tense verb the tem but we have now added robust parsing and affective state ‘threatening’ is normally being ex- some semantic analysis. First we describe the pat- pressed; and the utterance is usually the shortened tern-matching. version of an implied conditional, e.g., “I’ll scream [if you stay here]”. When ‘I’ is followed by a 2.2 Pattern Matching present-tense verb, other emotional states tend to be expressed, e.g. “I want my mum” (fear) and “I hate In the textual pattern-matching, particular keywords, you” (dislike). Further analysis of first-person, phrases and fragmented sentences are found, but present-tense cases is provided in section 2.4. also certain partial sentence structures are extracted. This procedure possesses the robustness and flexib- 2.3 Processing of Imperatives ility to accept many ungrammatical fragmented sen- tences and to deal with the varied positions of One useful pointer to affect, particularly to strong sought-after phraseology in speeches. However, it emotions and/or rude attitudes, is the use of imperat- lacks other types of generality and can be fooled ive mood, especially when used without softeners when the phrases are suitably embedded as subcom- such as ‘please’ or ‘would you’. There are special, ponents of other grammatical structures. For ex- common imperative phrases we deal with explicitly, ample, if the input is “I doubt she’s really angry”, such as “shut up” and “mind your own business”. rules looking for anger in a simple way will fail to They usually indicate strong negative emotions. But provide the expected results. Below we indicate our the phenomenon is more general. path beyond these limitations. Detecting imperatives accurately in general is by The transcripts analysed to inspire our initial itself an example of the non-trivial problems we knowledge base and pattern-matching rules were face. To go beyond the limitations of the text match- ing we have done, we have also used syntactic out- this speaker recently. If there is, then the input is puts from the Rasp parser (Briscoe & Carroll, 2002) conjectured to be declarative. and semantic information in the form of the semant- There is another type of sentence: ‘don’t you + ic profiles for the 1,000 most frequently used Eng- base form of verb’ that we have started to address. lish words (Heise, 1965) to deal with certain types Though such sentences are often interrogative, they of imperatives. This helps us to deal with at least can be negative versions of imperatives with a ‘you’ some of the difficulties. subject (e.g. “Don’t you call me a dog”). Normally The Rasp parser recognises some types of imper- Rasp regards them as interrogatives. Thus, further atives directly. Unfortunately, the grammar of the analysis has been implemented for such a sentence 2002 version of the Rasp parser that we have used structure to change the sentence type to imperative. does not deal properly with certain imperatives Although currently this has limited effect, as we (John Carroll, p.c), which means that examples like only infer a (negative) affective quality when the “you shut up”, “Dave bring me the menu”, “Matt verb is “dare”, we plan to add semantic processing don’t be so blunt” and “please leave me alone”, are in an attempt to glean affect more generally from not recognized as imperatives, but as normal declar- “Don’t you …” imperatives. ative sentences. Therefore, further analysis is In general, the imperative-mood detection is one needed to detect imperatives, by additional pro- useful tool for extracting potential affective flavour cessing applied to the possibly-incorrect syntactic from users’ input. trees produced by Rasp. Aside from imperatives, we have also worked on If Rasp outputs a subject, ‘you’, followed by cer- implementing simple types of semantic extraction of tain verbs (e.g. ‘shut’, ‘calm’, etc) or certain verb affect using affect dictionaries and electronic phrases (e.g. ‘get lost’, ‘go away’ etc), the sentence thesauri, such as WordNet (2005). The way we are type will be changed to imperative. (Note: in “you currently using WordNet is briefly as follows. get out” the “you” could be a vocative rather than the subject of “get”, especially as punctuation such 2.4 Using WordNet as commas is often omitted in our genre; however these cases are not worth distinguishing and we as- As we mentioned earlier, use of the first-person with sume that the “you” is a subject.) If a softener a present-tense verb tends to express an affective ‘please’ is followed by the base forms of a verb, state in the speaker. We have used the Rasp parser then the input is taken to be imperative. If a singular to detect such a sentence. First of all, such user’s in- proper noun is followed by a base form of the verb, put is sent to the pattern-matching rules in order to then this sentence is taken to be an imperative as obtain the speaker’s current affective state and well (e.g. “Dave get lost”). However, when a subject EmEliza’s response to the user. If there is no rule is followed by a verb for which there is no differ- fired (i.e. we don’t obtain any information of the ence at all between the base form and the past tense speaker’s affective state and EmEliza’s response form, then ambiguity arises between imperative and from the pattern-matching rules), the subsequent declarative (e.g. “Lisa hit me”). further processing is applied. Then we use WordNet An important special case of this ambiguity is to track down the synonyms of the verb (possibly when the object of the verb is ‘me’. To solve the from different synsets) in the verb phrase of the in- ambiguity, we have adopted the evaluation value of put sentence, in order to allow a higher degree of the verb from Heise’s compilation of semantic dif- generality than would be achieved just with the use ferential profiles (1965). In these profiles, Heise lis- of our pattern-matching rules. In order to find the ted values of evaluation, activation, potency, dis- closest synonyms to the verb in different synsets, tance from neutrality, etc. for the 1,000 most fre- Heise’s (1965) semantic profiles of the 1,000 most quently used English words. In the evaluation di- frequently used English words have again been em- mension, positive values imply goodness. Because ployed, especially to find the evaluation values of normally people tend to use ‘a negative verb + me’ every synonym in different synsets and the original to complain about an unfair fact, if the evaluation verb. We currently transform the graded positive value is negative for such a verb, then this sentence and negative evaluation values in Heise’s dictionary is probably not imperative but declarative (e.g into binary ‘positive’ and ‘negative’ only. Then if “Mayid hurt me”). Otherwise, other factors imply- any synonym has the same evaluation (‘positive’ or ing imperative are checked in this sentence, such as ‘negative’) as that of the original verb, then it will exclamation marks and capitalizations. If these be selected as a member of the set of closest syn- factors occur, then the input is probably an imperat- onyms. Then, we use one closest synonym to re- ive. Otherwise, the conversation logs are checked to place the original verb in the user’s input. This see if there is any question sentence directed toward newly built sentence will be sent to the pat- tern-matching rules in order to obtain the user’s af- fective state and EmEliza’s response. Such pro- cessing (using a closest synonym to replace the ori- stay in character and said pointless things, while in ginal verb and sending the newly built sentence to another session one student, who played a main the pattern-matching rules) continues until we ob- character, believed that the EmEliza character was tain the speaker’s affective state and appropriate re- the only one that stuck to scenario related topics. sponse. The directors reported that, even when a main char- acter was silent and the director did not intervene 2.5 Responding Regimes very much, the EmEliza character led the improvisa- tion on the right track by raising new topics other EmEliza normally responds to, on average, every characters were concerned about. Nth speech by another character in the e-drama ses- sion, where N is a changeable parameter (currently set to 3). However, it also responds when EmEliza’s 3 Affect via Metaphor character’s name is mentioned, and makes no re- The direct metaphorical description of emotional sponse if it cannot detect anything useful in the ut- states is common and has been extensively studied terance it is responding to. The one-in-N average is (Fussell & Moss, 1998). Examples are “He nearly achieved by sampling a random variable every time exploded” and “Joy ran through me”. We say that another character says something. As a future devel- such descriptions are “direct” because they are dir- opment we plan to have N dynamically adjustable ectly about emotional states, even though in many according to how confident EmEliza is about what it cases no emotional state is named. But affect is of- has discerned in the utterance at hand. ten conveyed more indirectly via metaphor, as in EmEliza sometimes makes a random response “His room is a cess-pit”, where affect associated from several stored response candidates that are with a source item (cess-pit) gets carried over to the suitable for the affective quality it has discerned in corresponding target item (the room). an utterance it is responding to. In addition, EmEliza In our research on metaphor (see, e.g., Barnden sometimes reflects back, in modified form, part of et al., 2004; Barnden, forthcoming) we are con- the user’s input string as a sub-component of its re- cerned with metaphor in general and are in particu- sponse. Notice here that, because the pre-processing lar interested in both of the types of affective meta- module reported in section 2.1 expands abbrevi- phor in the previous paragraph. We are bringing this ations and corrects misspellings, it helps to obscure metaphor research to bear upon the e-drama applica- the fact that part of EmEliza’s response is only a re- tion, and using the application as a useful source of flection. For example: theoretical inspiration. Our intended approach to metaphor handling in