University of Pennsylvania ScholarlyCommons
Center for Human Modeling and Simulation Department of Computer & Information Science
October 1998
Gesticulation Behaviors for Virtual Humans
Liwei Zhao University of Pennsylvania
Norman I. Badler University of Pennsylvania, [email protected]
Follow this and additional works at: https://repository.upenn.edu/hms
Recommended Citation Zhao, L., & Badler, N. I. (1998). Gesticulation Behaviors for Virtual Humans. Retrieved from https://repository.upenn.edu/hms/21
Copyright 1998 IEEE. Reprinted from Sixth Pacific Conference on Computer Graphics and Applications, 1998, pages 161-168. Publisher URL: http://dx.doi.org/10.1109/PCCGA.1998.732100
This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the University of Pennsylvania's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.
This paper is posted at ScholarlyCommons. https://repository.upenn.edu/hms/21 For more information, please contact [email protected].
Gesticulation Behaviors for Virtual Humans
Liwei Zhao and Norman I. Badler
Center for Human Modeling and Simulation
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104-6389 USA
[email protected], [email protected]
1-215-898-5862 phone; 1-215-573-7453 fax
Abstract

Gesture and speech are two very important behaviors for virtual humans. They are not isolated from each other but generally employed simultaneously in the service of the same intention. An underlying PaT-Net parallel finite-state machine may be used to coordinate them both. Gesture selection is not arbitrary. Typical movements correlated with specific textual elements are used to select and produce gesticulation online. This enhances the expressiveness of speaking virtual humans.

Keywords: Virtual Human, Agent, Avatar, Gesture, Posture, PaT-Nets

1 Introduction

The past few years have seen several research efforts on human gestures, e.g. [1, 2, 5, 8, 27, 18, 16, 13]. Many of these projects have focused on interpreting human gestures for interactive control. Creating appropriate gestures in a virtual human has not been as well studied, because the range of gestures performed during speech output is much larger than the symbolic selection set used for discrete inputs. For example, in [8] four gesture types are distinguished:

- Iconics represent some concrete feature of the accompanying speech, such as an object's shape.
- Metaphorics represent an abstract feature concurrently spoken about.
- Deictics indicate a point in space, and may refer to persons, places and other spatializable discourse entities.
- Beats are small formless waves of the hand that occur with heavily emphasized words, occasions of turning over the floor to another speaker, and other kinds of special linguistic work.

While Cassell's system implemented instances of each type of gesture, the most prevalent were iconics linked to mentions of specific objects, metaphorics linked to specific actions, and beats linked to speech intonation. Following Cassell's lead, new problems in gesture generation were exposed:

1. Coarticulation: Generating a smooth transition from one gesture to the next without returning to a specific rest pose.
2. Expression: Modifying the performance of a gesture to reflect the agent's manner or personality.
3. Spatialization: Integrating a deictic gesture into the surrounding context.
4. Selection: Generating a metaphoric that might be associated with an abstract concept.

Problem 1, coarticulation, has been addressed by a number of computer graphics researchers [8, 12, 28, 26], although the issue has other aspects, such as preparatory actions, which remain unsolved. Problem 2, expression, is being investigated at a number of places [6, 33, 10]. In this paper, we investigate problems 3 and 4. Of these two, spatialization is easier, since the desired gesture is combined or composited with inverse kinematics to point or align the gesturing body part with the spatial referent. Selection entails determining gestures that people would likely interpret and accept as "natural" and "representative." These concepts are orthogonal: a naturally performed motion-captured gesture might not be appropriate to the speech text, while a synthesized, less natural arm motion might nevertheless be representative of the expressed concepts.

The selection problem itself splits into two: one is the creation of the gestural motion and the other is the mapping from the textual content to the gesture. For example, to create a character waving hello during a greeting, one has to create the waving motion as well as know when to invoke it upon encountering a greeting context. In this work we assume that the motions themselves are generated by inverse kinematics, motion capture, or otherwise pre-created (e.g. key pose sequences). Our contribution lies in proposing a representative mapping from concepts to gestures such that they are selected based on stylized rhetorical speaking.

To select and spatialize various gestures correlating speech and language, we use an underlying coordination scheme called PaT-Nets [3]. The virtual human animation is implemented as an extension to Jack (a software product from Transom Technologies, Inc.). The inputs (see below) to the system are in the form of speech texts with embedded commands, most of which are related to gestures. The gestures are controlled by PaT-Nets to coincide with the utterance of the speech. While the embedded commands in our examples are manually inserted for now, the idea is to detect the presence of the corresponding concepts in the raw text stream and automatically insert the deictics and metaphorics based solely on the words used. A sample input text with embedded commands (\gest, \point, \hand and \head mark gestures; idx{...} names a site in the environment):

    \gest warning welcome. Hello,
    \head front Currently, I can support the following basic arm gestures.
    Now let me introduce you to some simple objects I know:
    \point idx{table.table.corner} this is a table
    \point idx{door.door.panel} this is a door
    \point idx{chair1.chair.red} this is a red chair
    \point idx{chair0.chair.yellow} this is a yellow chair
    \head slant right Let me show you the basic arm gestures
    \gest arm reject arm reject gesture
    \gest arm unlikely arm unlikely gesture
    \gest arm not arm not gesture
    \gest arm improbable arm improbable gesture
    \gest arm doubtful arm doubtful gesture
    \gest arm probable arm probable gesture
    \gest arm tis arm it is gesture
    \gest arm certain arm certain gesture
    \gest arm obvious arm obvious gesture
    \gest arm enchanting arm enchanting gesture
    \gest arm absolute arm absolute gesture
    Next, let me show you some hand gestures:
    \hand convulsive{plane0.plane.stand1} convulsive hand gesture
    \hand expanded{plane0.plane.stand1} expanded hand gesture
    \hand exasperation{plane0.plane.stand1} exasperation hand gesture
    \hand authority{plane0.plane.stand1} authority hand gesture
    \hand relaxed{plane0.plane.stand1} relaxed hand gesture
    \hand exaltation{plane0.plane.stand1} exaltation hand gesture
    \hand conflict{plane0.plane.stand1} conflict hand gesture
    \hand prostration{plane0.plane.stand1} prostration hand gesture
    \head slant left
    Finally, I can support the following basic general gestures:
    \gest reject reject gesture
    \gest givetake give and take gesture
    \gest warning warning gesture
    good bye

2 Gesticulation

An agent or avatar may have a wide variety of movement behaviors, but we focus our attention on gestures and speech. Kendon [20] offers a distinction between autonomous gestures (gestures performed without accompanying speech) and gesticulation (gesture performed concurrently with phonological utterance). Gestures and speech are closely associated; they are generally employed simultaneously in the service of the same intention. Well-coordinated gestures and speech enhance the expressiveness and believability of speaking virtual humans. In this paper, we restrict our investigation to gesticulation.

2.1 Gestures

The study of gestures in dance and oratory may date back to the beginning of the seventeenth century [7]. More recently, semioticists from the fields of anthropology, neurophysiology, neuropsychology and psycholinguistics (Freedman [17]; Wiener, Devoe, Rubinow and Geller [34]; McNeill and Levy [24]) have been interested in the study of gestures. The Lexis dictionary (1977) gives the most general definition of gesture: "movements of body parts, particularly the arms, the hands or the head, conveying, or not conveying, meaning."

While gestures are the "little" movements that are confined to a part or parts of the body, just considered in isolation they have very limited contribution to make to non-verbal communication. Emblems and manual languages, such as American Sign Language, are exceptions because the communication is fully borne by movements. Gestures are rarely performed outside a communicative context and only occasionally transmit any depth of emotion or information, since, as soon as there is any complicated meaning, the gestures can only be "read" in relation to the whole expressive movement of the body [14, 9, 22].

Most of the current research in gestures is related to computer vision, human-computer interaction, and pattern recognition, where the gestures are mainly studied in isolation [27, 18, 16, 13]. However, gestures used by an agent or avatar in a virtual environment are quite different. First, a gesture is a process, not a fixed posture. For example, when someone waves a hand, it is not the final position of the hand which is the proper object of study, but the process by which it got there: the actual process of movement. Secondly, it is almost always accompanied by other gestures or communicative channels.

In the following we study arm, hand, and head gestures. Above all, we recognize that gesticulation has its limitations. The interpretation might be both culturally oriented and individually biased. Personality and social context may constrict or amplify the motions.
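The embedded-command input format shown above can be handled by a small scanner. The sketch below is illustrative only: the command names follow the sample text, but the tokenization (single-token arguments, so multi-word names like "arm reject" are simplified) and the function name are our own assumptions, not the actual system interface.

```python
import re

# Embedded commands look like "\gest warning" or "\point idx{site}".
CMD = re.compile(r"\\(gest|point|hand|head)\s+(\S+)\s*")

def parse_line(line):
    """Return (events, spoken): the gesture commands found in the line,
    and the line with the commands stripped out for the TTS system."""
    events = [(m.group(1), m.group(2)) for m in CMD.finditer(line)]
    spoken = CMD.sub("", line).strip()
    return events, spoken
```

On a line from the sample text, `parse_line(r"\point idx{table.table.corner} this is a table")` yields one `("point", "idx{table.table.corner}")` event plus the words to speak, which is the split an automatic insertion pass would need to perform in reverse.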
But in general we seek to set a baseline of gesticulatory behavior which can then be parameterized and modified by other means.

2.1.1 Arm Gestures

Human arms serve at least two basic separate functions [1]: they allow an agent/avatar to change the local environment through dextrous movements, by reaching for and grasping objects [19, 15]; and they serve social interaction functions, by augmenting the speech channel with communicative emblems, gestures and beats [8].

A well-performed arm gesture, accompanied by proper hand gestures, plays an important role in integrating some deictic gestures into the surrounding context (the spatialization problem) and reflecting the agent's manner or personality to some extent (the expression problem). For example, in [30] it was noted that arm gestures with different inclinations indicate different degrees of affirmation: from 0 (straight down) to 45 degrees indicates neutral, timid, cold; from 45 to 90 degrees, expansive and warm; and from 90 to 180 degrees, enthusiastic (see Figure 1). We implemented this series of stereotypical arm gestures as a representative metaphorical mapping from affirmation concepts to gestures, such that they can be correlated with the degree of affirmation in a speech.

Figure 1: Arm gestures with different inclinations indicate different degrees of affirmation (taken from [30])

2.1.2 Hand Gestures

The hand is the most fluent and articulate part of the body, capable of expressing almost infinite meanings. Hand gesture languages have been invented by communicative needs and by the deaf communities of various cultures. The classic gesture language of the Hindu dance contains about 57,000 cataloged hand positions, each having the specific value of a word or an explicit and distinct meaning [29]. It is virtually impossible to implement all these hand gestures. In this paper we investigate the selection problem, so we focus on the hand gestures which can easily generate a metaphoric associated with an abstract concept. In the system, the virtual human agent attempts to use hand gestures that are more selective and much more closely coordinated with what is being said in words. For example, when attempting to offer a definition of a word such as "write," the agent may pantomime the writing action while vocalizing the verbal definition [8].

Delsarte [30] provided a small set of stereotypical hand gestures correlated with grasping, indicating, pointing, and reaching (illustrated in Figure 2). We implemented all these hand gestures; they can be performed by either the left or the right hand, with preference for the right hand under default circumstances. To avoid crossing the arm over the body and to keep the body posture open, the hand nearer to the target object is always used. In addition, every hand gesture is coordinated with head and eye orientation, arm gestures, and vocalization, all of which are employed simultaneously in the service of interpreting an abstract concept.

Figure 2: Grasping, indicating and reaching hand gestures (taken from [30])
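The two selection rules above, the affirmation-to-inclination mapping noted in [30] and the nearer-hand rule, can be sketched as follows. The function names and the discretization into three bands are illustrative assumptions, not the actual Jack implementation.

```python
def affirmation_band(degree):
    """Map an arm inclination (degrees measured from straight down) to
    the degree of affirmation it conveys, per Delsarte [30]."""
    if 0 <= degree <= 45:
        return "neutral/timid/cold"
    if degree <= 90:
        return "expansive/warm"
    if degree <= 180:
        return "enthusiastic"
    raise ValueError("inclination must lie in [0, 180] degrees")

def choose_hand(agent_x, target_x):
    """Nearer-hand rule: gesture with the hand on the target's side, so
    the arm never crosses the body and the posture stays open."""
    return "right" if target_x >= agent_x else "left"
```

A hand gesture toward an object to the agent's right would thus always be assigned to the right hand, keeping the chest open to the audience.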
2.1.3 Head Gestures

The head can be a very effective gesturing tool. The face is one of the most important parts in computer animation. It can be divided into three zones: (1) the forehead and eyes; (2) the nose and upper cheek; (3) the mouth, jaw, and lower cheeks [22]. The eyes in turn have three components: the eyeballs, the eyelids, and the eyebrows. In [23], 405 combinations of these components alone are listed. When these are combined with expressions of the mouth and the attitudes or positions of the head, the possible combinations are almost beyond computing. Again, we focus our attention on those head gestures that are related to the spatialization and selection problems.

Unlike arm gestures and hand gestures, head gestures are employed more selectively. For example, a 10-year-old gestures elaborately using arms or hands while he is talking, as if, as Freedman puts it [17], "he surrounds himself with a visual, perceptual and imagistic aspect of his message." Head gestures, on the other hand, are used very selectively, usually only in relation to specific words, with which they are highly coordinated.

Delsarte [30] gave 9 positions or attitudes of head gestures combined with eyes, as shown in Figure 3, which we think can act as a representative set of head gestures that help express abstract concepts gesturally.

Figure 3: Head gestures combined with eyes (taken from [30])

2.2 Postures

Postures are highly correlated with speech. We usually use postures as interpretative tools to understand speech, and we don't allow ourselves to be influenced by words which may be quite at variance with what is being "said" in the silent postures. In our gesticulation system, to avoid having the words discounted, a virtual human agent usually adopts a neutral posture (standing up straight with both feet slightly apart and firmly planted on the floor) and should adopt an orientation and eye gaze facing the audience.

Postures are also highly correlated with gestures. Within a sequence of movements a small gesture, such as waving or smiling, may be very significant, but it is also significant as a part of the whole body. Gestures need postures as a background [22, 9, 14]. On the other hand, postures almost always have gestures going on around them. Together, gesturing and posturing make up the process of movement. Postural semantics has received very little systematic attention in virtual human research. Lamb and Watson [22] note that posture is an individual characteristic, highly influenced by the conventions of the society. DeWall et al. (1992) provide methods for improving the sitting postures of CAD/CAM workers. Ankrum (1997) reports on the interrelationships between gaze angle and neck posture. Tsukasa Noma (1997) uses posture as a visual aid to presentation. But in none of these efforts is the interdependence between gestures and postures addressed. In our gesticulation system we implemented two of the postures given by Delsarte; both are related to standing and may be either merged with, or segregated from, various gestures.
2.3 Locomotion

In order to expressively interpret an abstract concept, an agent or avatar might interact with an object which visually corresponds to the concept being interpreted. The interaction includes detecting, orienting to, locating, reaching, and pointing to a visual object. It can be argued, though, that these interactions can be distinguished according to the spatial field in which they occur. In fact, these interactions can occur either in the immediate surroundings, in which reaching, indicating or grasping is achieved without locomotion, or in the visual field outside of direct reaching and grasping.

Therefore, to interact with a target object, an agent or avatar must determine if she is within a suitable distance of the target. Otherwise, she must first walk to an action-dependent position and orientation (pre-action) before the initiation of the specified action. After completing the action, she must decide if she needs to walk to the next action-dependent position and orientation (post-action). Also, she must keep in mind an explicit list of objects to be avoided during the locomotion process. Such decision-making and walking are coordinated by PaT-Nets.
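The pre-action decision described above can be sketched as follows. The reach threshold, the 2D floor coordinates, and the function name are illustrative assumptions rather than the actual Jack locomotion interface, and obstacle avoidance is omitted.

```python
import math

def pre_action(agent_pos, target_pos, reach=0.7):
    """Decide whether the agent can act in place, or must first walk to
    an action-dependent position near the target (pre-action).
    Positions are (x, y) floor coordinates; `reach` is an assumed
    maximum reaching distance in meters."""
    dist = math.dist(agent_pos, target_pos)
    if dist <= reach:
        return ("act", agent_pos)   # target already within direct reach
    # Otherwise walk to a point exactly at reaching distance,
    # approaching along the straight line toward the target.
    t = (dist - reach) / dist
    goal = (agent_pos[0] + t * (target_pos[0] - agent_pos[0]),
            agent_pos[1] + t * (target_pos[1] - agent_pos[1]))
    return ("walk", goal)
```

The same test would be applied again after the action completes (post-action) to decide whether locomotion toward the next target is needed.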
3 The Underlying Coordination Model

3.1 Coordination via PaT-Nets

Using traditional animation techniques, human behavior is defined as a set of linear sequences which are determined in advance. During motion transitions, a motion generator has to monitor the whole transition from the current motion to the next one [3]. This gives the animator great control over the look and feel of the animation. Anyone who goes to the movies can see marvelous synthetic characters such as aliens, Martians, etc. However, all these characters are typically created for one scene or one movie and are not meant to be re-used [1, 26]. Should the same techniques be used for virtual humans, it would greatly limit their autonomy, individuality, and therefore believability.

Some researchers have attempted to get around this problem by breaking the animation down into smaller linear sequences and then switching between them contingent upon user input, so the main concern is dealing with the transitions between these sequences. The simplest approach is to ignore the transition and simply jump from one motion to the next. This works in situations where fast transitions are expected, but appears jerky and unnatural when applied to virtual humans. Another approach is to have each motion begin and end in the same standard posture, thus eliminating the instantaneous jump. While this approach offers smooth continuous motion, beginning and ending each motion in the same still posture is very unnatural: each time, the body needs to return to a "neutral" generic intermediate posture before the next motion can begin. Moreover, the transitions between motions need to be defined for every pair of motions in advance. NYU's Improv Project [26] proposed a technique called motion blending to automatically generate smooth transitions between isolated motions, without jarring discontinuities or the need to return to a "neutral" pose. But the motion generator still needs to assign joint angles to the whole body. In sophisticated scenarios where agents and avatars are engaged in complex behaviors and interactions, this becomes ineffective.

Using PaT-Nets, groups of body parts are assigned to individual nets: WalkNet, ArmNet, HandNet, FaceNet, SeeNet and SpeakNet. All these nets are organized in a hierarchical way; the structure of the nets is shown in Figure 4. This makes the interaction between agents/avatars [8] and the synchronization of movements relatively easy, because the action generator (ParserNet) is not involved in directly assigning joint angles to the whole body: instead it sends messages designating individual nets to do the job, hence its main function is coordination. For example, to move a hand, ParserNet does not need to directly assign joint angles. All it needs to do is send a message to the GestureNet, which in turn sends a message to the HandNet. Then the HandNet moves the joints depending on the timing and joint angles in the message. This coordination can be applied to the game of "Hide and Seek" [4], two-person animated conversation [8], simulated emergency medical care [11], and a TV presenter or weatherman [25].

Figure 4: PaT-Nets for gesticulation behaviors (the hierarchy includes STParser, SpeakNet, GestureNet, WalkNet, SitNet, ArmNet (R/L), HandNet (R/L), HeadNet, SeeNet and FaceNet)

3.2 PaT-Nets

PaT-Nets (Parallel Transition Networks) are finite state machines that can execute motions effectively in parallel. The original PaT-Nets were implemented in Lisp by Welton Becket [3]. In order to maximize real-time animation control, Tsukasa Noma re-implemented the PaT-Nets in C++, with further modifications made by Sonu Chopra. Each class of PaT-Nets is defined as a derived class of the base class LWNet, which stands for LightWeight PaT-Nets. They have the following properties:

- Two or more PaT-Nets can be simultaneously active.
- Two or more nodes can be simultaneously active in a PaT-Net. This enables us to represent simple parallel execution of actions in a single PaT-Net.
- PaT-Nets can call for actions and make state transitions either conditionally or probabilistically.
- All active PaT-Nets are maintained on a list called the LWNetList. This list is scanned every clock tick.
- Jack commands can be invoked within PaT-Nets to manipulate any Jack data structure.

Figure 5: A PaT-Nets example: walking, speaking and pointing
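A minimal sketch of these properties (several nets on a shared list, each scanned once per clock tick, nodes firing actions and making conditional transitions) might look as follows. The class and function names are our own illustrative choices, not the C++ LWNet interface, and probabilistic transitions are omitted for brevity.

```python
class PaTNet:
    """A tiny parallel transition network: each node has an action and a
    list of (condition, next_node) transitions; several nodes may be
    active at once."""
    def __init__(self, nodes, start):
        self.nodes = nodes      # name -> (action, [(condition, next_name)])
        self.active = {start}

    def tick(self, state):
        next_active = set()
        for name in self.active:
            action, transitions = self.nodes[name]
            action(state)                      # fire the node's action
            for cond, nxt in transitions:
                if cond(state):                # conditional transition
                    next_active.add(nxt)
                    break
            else:
                next_active.add(name)          # stay until a condition holds
        self.active = next_active

def run(lwnet_list, state, ticks):
    """All active nets live on one list, scanned every clock tick."""
    for _ in range(ticks):
        for net in lwnet_list:
            net.tick(state)
```

A two-node net that walks until enough steps are taken and then speaks demonstrates the conditional-transition mechanism; in the real system the actions would be messages to nets such as WalkNet or SpeakNet rather than list appends.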
Currently PaT-Nets support 9 different node types: Normal, Call, PAL, Join, Indy, Kldp, Monitor, Exit and Halt. A Normal node is used to execute an action and a Call node is used to call a function. The action/call is preceded by a pre-action and succeeded by a post-action. Transition to one of a set of post-actions depends on the action's boolean function or the pointer returned by the call function. All the nodes spawned from a PAL node are done in parallel. Join/Indy/Kldp nodes link the spawned nodes; the differences are that the Join node waits for all spawned nodes to be finished before moving on to the next node; the Indy node moves to the next as soon as the first spawned node is done and leaves the remaining spawned nodes untouched; and Kldp is similar but kills the remaining spawned nodes. The Join/Indy/Kldp nodes make synchronization possible. The Monitor node checks the monitor condition every clock tick and activates the monitor action whenever the condition evaluates true. The Halt node simply terminates the current PaT-Net node, while the Exit node removes the current PaT-Net from the active LWNetList. For example, in the movements shown in Figure 5, the Walk node is first executed. Then the PAL node spawns a Speak node and a series of sequential actions defined by a Gesture node, a Normal node, and a PointAt node. The Speak node runs simultaneously with the sequential actions. Basically this is walking, followed by speech and a pointing gesture in parallel.

4 Results

We implemented the gesticulation system on an SGI Onyx/RealityEngine. In the current implementation, PaT-Nets are extended to contain twelve different nets that can run simultaneously. The motion generator (ParserNet) contains 66 nodes to synchronize different movements; it can now support up to 2 postures, 3 head gestures, 12 arm gestures and 12 hand gestures. During the animation, the virtual human agent walks around the room and points out some interesting objects such as the table, the door, the red chair, the yellow chair, etc. We do not yet deal with automatically recognizing the objects in the virtual environment; instead, as a pre-processing step, we associate sites in the coordinate system with the objects. Then he walks to the front of the scene and demonstrates some arm gestures and hand gestures (Figures 6 and 7).

Animations are generated in real time (30 frames per second). For voice output, we use an Entropic Research Laboratory TrueTalk(TM) Text-To-Speech system [32] running on an SGI Indigo2. The gesture movements are controlled by PaT-Nets to coincide with the utterance of the speech.

5 Conclusions

We discussed a virtual human gesticulation system where typical gestures correlated with speech are used to select and produce gesticulation in real time. We also investigated the spatialization and selection problems and proposed a representative mapping from concepts to gestures such that they are selected based on stylized rhetorical speaking. An underlying coordination mechanism called PaT-Nets is employed to select and spatialize various gestures associated with speech and language.

In our current implementation, there is still much work to do in the near future:

- Add more nodes to FaceNet to improve the facial expression and mimic the mouth movements more precisely during speech.
- Add more gestures/movements which are necessary in a dialogue structure, and environment- and object-sensitive interaction.
- Transport all gestures/movements to JackMOO [31] to expand the scope and range of human actions that an avatar must portray in a web-based virtual environment.
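The PAL/Join combination from the Figure 5 example (a Walk node, then a Speak node running in parallel with a Gesture, Normal and PointAt sequence, met by a Join) can be sketched with ordinary threads. This scheduling is an illustrative stand-in for the PaT-Net node machinery, not the actual implementation.

```python
import threading

def pal_join(parallel_branches):
    """PAL semantics: spawn every branch in parallel; Join semantics:
    wait for all spawned branches to finish before moving on."""
    threads = [threading.Thread(target=b) for b in parallel_branches]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

log = []
lock = threading.Lock()
def record(name):
    with lock:
        log.append(name)

# The Walk node executes first.
record("walk")
# PAL node: Speak runs concurrently with the Gesture -> Normal -> PointAt
# sequence; the Join node waits for both branches to complete.
pal_join([
    lambda: record("speak"),
    lambda: [record(n) for n in ("gesture", "normal", "pointat")],
])
record("done")   # reached only after the Join
```

An Indy node would instead return as soon as the first branch finished, and Kldp would additionally cancel the others, which is why the three node types together make synchronization possible.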
6 Acknowledgments
The authors would like to thank Sonu Chopra, Rama Bindiganavale and Pei-Hwa Ho for discussions and technical support. This research is partially supported by the U.S. Air Force through Delivery Orders 8 and 17 on F41624-97-D-5002; the Office of Naval Research through Univ. of Houston K-5-55043/3916-1552793, DURIP N0001497-1-0396, and AASERTs N00014-97-1-0603 and N0014-97-1-0605; Army Research Lab HRED DAAL01-97-M-0198; DARPA SB-MDA-97-2951001 through the Franklin Institute; NSF IRI95-04372; NASA NRA NAG 5-3990; National Institute of Standards and Technology 60 NANB6D0149 and 60 NANB7D0058; SERI (Korea); and JustSystem (Japan).
Figure 6: Examples: arm gestures by Jack

Figure 7: Examples: hand gestures by Jack

References

[1] N. Badler. Real-time virtual humans. Pacific Graphics, 1997.
[2] N. Badler. Virtual humans for animation, ergonomics, and simulation. IEEE Wkshp. on Non-Rigid and Articulated Motion, Puerto Rico, June 1997.
[3] N. Badler, C. Phillips, and B. Webber. Simulating Humans: Computer Graphics, Animation, and Control. Oxford University Press, New York, 1993.
[4] N. Badler, B. Webber, W. Becket, C. Geib, M. Moore, C. Pelachaud, B. Reich and M. Stone. Planning for animation. In N. Magnenat-Thalmann and D. Thalmann, editors, Computer Animation. Prentice-Hall, 1996.
[5] R. Boulic, P. Becheiraz, L. Emering, and D. Thalmann. Integration of motion control techniques for virtual human and avatar real-time animation. Proc. VRST '97, pp. 111-118, ACM Press, 1997.
[6] A. Bruderlin and L. Williams. Motion signal processing. Proc. SIGGRAPH 1995, pp. 97-104.
[7] J. Bulwer. Chirologia: Or the Natural Language of the Hand, and Chironomia: Or the Manual Art of Rhetoric. Southern Illinois University Press, Carbondale, IL, 1994.
[8] J. Cassell, C. Pelachaud, N. Badler, M. Steedman, B. Achorn, W. Becket, B. Douville, S. Prevost, and M. Stone. Animated conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. In Computer Graphics, Annual Conf. Series, pp. 413-420. ACM, 1994.
[9] M. Cranach and I. Vine. Expressive Movement and Non-verbal Communication. Academic Press, London, 1975.
[10] D. Chi. Animating expressivity through effort elements. PhD dissertation, in progress, University of Pennsylvania, 1998.
[11] D. Chi, B. Webber, J. Clarke, and N. Badler. Casualty modeling for real-time medical training. Presence 5(4):359-366, 1995.
[12] M. Cohen and D. Massaro. Modeling coarticulation in synthetic visual speech. In M. Thalmann and D. Thalmann, eds., Models and Techniques in Computer Animation. Springer-Verlag, Tokyo, 1993.
[13] Y. Cui, D. Swets and J. Weng. Learning-based hand sign recognition using SHOSLIF-M. Wkshp. on Integration of Gesture in Language and Speech, 1996.
[14] M. Davis. Understanding Body Movement: An Annotated Bibliography. Arno Press, New York, 1972.
[15] B. Douville, L. Levison, and N. Badler. Task level object grasping for simulated agents. Presence 5(4), pp. 416-430, 1996.
[16] R. Foulds and A. Moynahan. Computer recognition of the gestures of people with disabilities. Wkshp. on Integration of Gesture in Language and Speech 1996, Wilmington, DE, USA.
[17] N. Freedman. Hands, words and mind: on the structuralization of body movements during discourse and the capacity for verbal representation. In Communicative Structures and Psychic Structures: A Psychoanalytic Approach, pp. 110-132. Plenum Press, New York and London, 1977.
[18] W. Freedman and C. Weissman. Television control by hand gestures. IEEE Intl. Wkshp. on Automatic Face and Gesture Recognition, Zurich, June 1995.
[19] J. Gourret, N. Magnenat-Thalmann, and D. Thalmann. Simulation of object and human skin deformations in a grasping task. ACM Computer Graphics 23(3), pp. 21-30, 1989.
[20] A. Kendon. Current issues in the study of gesture. In The Biological Foundations of Gestures: Motor and Semiotic Aspects, pp. 23-47. Lawrence Erlbaum Associates, Hillsdale, NJ, 1986.
[21] A. Kendon. Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key, ed., The Relationship of Verbal and Nonverbal Communication. Mouton Publishers, The Hague, 1980.
[22] W. Lamb and E. Watson. Body Code: The Meaning in Movement. Routledge & Kegan Paul Ltd., pp. 85-99, 1979.
[23] S. Mackaye. Harmonic Gymnastics and Pantomimic Expression. Edited by Marion Lowell. M. Witmark & Sons, New York, 1963.
[24] D. McNeill. The Conceptual Basis of Language. Lawrence Erlbaum Associates, Hillsdale, NJ, 1996.
[25] T. Noma and N. Badler. A virtual human presenter. In IJCAI '97 Workshop on Animated Interface Agents, Nagoya, Japan, 1997.
[26] K. Perlin and A. Goldberg. Improv: A system for scripting interactive actors in virtual worlds. Proc. SIGGRAPH 1996, pp. 205-216.
[27] F. Quek. Toward a vision-based hand gesture interface. Proceedings of the Virtual Reality System Technology Conference, pp. 17-29, August 23-26, 1994, Singapore.
[28] C. Rose, B. Guenter, B. Bodenheimer and M. Cohen. Efficient generation of motion transitions using spacetime constraints. In ACM Computer Graphics, Annual Conf. Series, pp. 147-154, 1996.
[29] H. Russell. The Gesture Language of the Hindu Dance. B. Blom, New York, 1964 (c1941).
[30] T. Shawn. Every Little Movement: A Book About Delsarte. M. Witmark & Sons, 1954.
[31] T. J. Smith, J. Shi, J. Granieri, and N. Badler. JackMOO: A web-based system for virtual human simulation. WebSim Conference, 1998.
[32] TrueTalk Programmer's Manual. Entropic Research Laboratory, 1995.
[33] M. Unuma, K. Anjyo, and R. Takeuchi. Fourier principles for emotion-based human figure animation. Proc. SIGGRAPH 1995, pp. 91-96.
[34] M. Wiener, S. Devoe, S. Rubinow and J. Geller. Nonverbal behavior and nonverbal communication. Psychological Review, pp. 185-210.