Creating Interactive Virtual Humans: Some Assembly Required
University of Pennsylvania ScholarlyCommons
Departmental Papers (CIS), Department of Computer & Information Science
July 2002

Creating Interactive Virtual Humans: Some Assembly Required

Jonathan Gratch, USC Institute for Creative Technologies
Jeff Rickel, USC Information Sciences Institute
Elisabeth André, University of Augsburg
Justine Cassell, MIT Media Lab
Eric Petajan, Face2Face Animation

See next page for additional authors. Follow this and additional works at: https://repository.upenn.edu/cis_papers

Recommended Citation
Jonathan Gratch, Jeff Rickel, Elisabeth André, Justine Cassell, Eric Petajan, and Norman I. Badler, "Creating Interactive Virtual Humans: Some Assembly Required", July 2002.

Copyright © 2002 IEEE. Reprinted from IEEE Intelligent Systems, Volume 17, Issue 4, July-August 2002, pages 54-63. Publisher URL: http://ieeexplore.ieee.org/xpl/tocresult.jsp?isNumber=22033&puNumber=5254

This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the University of Pennsylvania's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

This paper is posted at ScholarlyCommons. https://repository.upenn.edu/cis_papers/17 For more information, please contact [email protected].

Abstract
Discusses some of the key issues that must be addressed in creating virtual humans, or androids. As a first step, we overview the issues and available tools in three key areas of virtual human research: face-to-face conversation, emotions and personality, and human figure animation.
Assembling a virtual human is still a daunting task, but the building blocks are getting bigger and better every day.

Workshop Report
Creating Interactive Virtual Humans: Some Assembly Required

Jonathan Gratch, USC Institute for Creative Technologies
Jeff Rickel, USC Information Sciences Institute
Elisabeth André, University of Augsburg
Justine Cassell, MIT Media Lab
Eric Petajan, Face2Face Animation
Norman Badler, University of Pennsylvania

1094-7167/02/$17.00 © 2002 IEEE. IEEE INTELLIGENT SYSTEMS

Science fiction has long imagined a future populated with artificial humans: human-looking devices with human-like intelligence. Although Asimov’s benevolent robots and the Terminator movies’ terrible war machines are still a distant fantasy, researchers across a wide range of disciplines are beginning to work together toward a more modest goal: building virtual humans. These software entities look and act like people and can engage in conversation and collaborative tasks, but they live in simulated environments. With the untidy problems of sensing and acting in the physical world thus dispensed, the focus of virtual human research is on capturing the richness and dynamics of human behavior.

The potential applications of this technology are considerable. History students could visit ancient Greece and debate Aristotle. Patients with social phobias could rehearse threatening social situations in the safety of a virtual environment. Social psychologists could study theories of communication by systematically modifying a virtual human’s verbal and nonverbal behavior. A variety of applications are already in progress, including education and training [1], therapy [2], marketing [3, 4], and entertainment [5, 6].

Building a virtual human is a multidisciplinary effort, joining traditional artificial intelligence problems with a range of issues from computer graphics to social science. Virtual humans must act and react in their simulated environment, drawing on the disciplines of automated reasoning and planning. To hold a conversation, they must exploit the full gamut of natural language processing research, from speech recognition and natural language understanding to natural language generation and speech synthesis. Providing human bodies that can be controlled in real time delves into computer graphics and animation. And because an agent looks like a human, people expect it to behave like one as well and will be disturbed by, or misinterpret, discrepancies from human norms. Thus, virtual human research must draw heavily on psychology and communication theory to appropriately convey nonverbal behavior, emotion, and personality.

This broad range of requirements poses a serious problem. Researchers working on particular aspects of virtual humans cannot explore their component in the context of a complete virtual human unless they can understand results across this array of disciplines and assemble the vast range of software tools (for example, speech recognizers, planners, and animation systems) required to construct one. Moreover, these tools were rarely designed to interoperate and, worse, were often designed with different purposes in mind. For example, most computer graphics research has focused on high fidelity offline image rendering that does not support the fine-grained interactive control that a virtual human must have over its body.

In the spring of 2002, about 30 international researchers from across disciplines convened at the University of Southern California to begin to bridge this gap in knowledge and tools (see www.ict.usc.edu/~vhumans). Our ultimate goal is a modular architecture and interface standards that will allow researchers in this area to reuse each other’s work. This goal can only be achieved through a close multidisciplinary collaboration. Towards this end, the workshop gathered a collection of experts representing the range of required research areas, including

• Human figure animation
• Facial animation
• Perception
• Cognitive modeling
• Emotions and personality
• Natural language processing
• Speech recognition and synthesis
• Nonverbal communication
• Distributed simulation
• Computer games

Here we discuss some of the key issues that must be addressed in creating virtual humans. As a first step, we overview the issues and available tools in three key areas of virtual human research: face-to-face conversation, emotions and personality, and human figure animation.

Face-to-face conversation

Human face-to-face conversation involves both language and nonverbal behavior. The behaviors during conversation don’t just function in parallel, but interdependently. The meaning of a word informs the interpretation of a gesture, and vice versa. The time scales of these behaviors, however, are different: a quick look at the other person to check that they are listening lasts for less time than it takes to pronounce a single word, while a hand gesture that indicates what the word “caulk” means might last

Regarding the goals achieved by the different modalities, in natural conversation speakers tend to produce a gesture with respect to their propositional goals (to advance the conversation content), such as making the first two fingers look like legs walking when saying “it took 15 minutes to get here,” and speakers tend to use eye movement with respect to interactional goals (to ease the conversation process), such as looking toward the other person when giving up the turn [7]. To realistically generate all the different verbal and nonverbal behaviors, then, computational architectures for virtual humans must control both the propositional and interactional structures. In addition, because some of these goals can be equally well met by one modality or the other, the architecture must

ities. The architecture should be flexible enough to track these different threads of communication in ways appropriate to each thread. Different threads have different response-time requirements; some, such as feedback and interruption, occur on a sub-second time scale. The architecture should reflect this by allowing different processes to concentrate on activities at different time scales.
• Understanding and synthesis of propositional and interactional information. Dealing with propositional information (the communication content) requires building a model of the user’s needs and knowledge. The architecture must include a static domain knowledge base and a dynamic discourse knowledge base. Presenting propositional information requires a planning module for presenting