University of Pennsylvania ScholarlyCommons

Center for Human Modeling and Simulation Department of Computer & Information Science

October 1998

Gesticulation Behaviors for Virtual Humans

Liwei Zhao University of Pennsylvania

Norman I. Badler University of Pennsylvania, [email protected]

Follow this and additional works at: https://repository.upenn.edu/hms

Recommended Citation
Zhao, L., & Badler, N. I. (1998). Gesticulation Behaviors for Virtual Humans. Retrieved from https://repository.upenn.edu/hms/21

Copyright 1998 IEEE. Reprinted from Sixth Pacific Conference on Computer Graphics and Applications, 1998, pages 161-168. Publisher URL: http://dx.doi.org/10.1109/PCCGA.1998.732100

This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of the University of Pennsylvania's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.

This paper is posted at ScholarlyCommons. https://repository.upenn.edu/hms/21 For more information, please contact [email protected].

Abstract Gesture and speech are two very important behaviors for virtual humans. They are not isolated from each other but are generally employed simultaneously in the service of the same intention. An underlying PaT-Net parallel finite-state machine may be used to coordinate them both. Gesture selection is not arbitrary. Typical movements correlated with specific textual elements are used to select and produce gesticulation online. This enhances the expressiveness of speaking virtual humans.

Keywords virtual human, agent, avatar, gesture, posture, PaT-Nets


This conference paper is available at ScholarlyCommons: https://repository.upenn.edu/hms/21

Gesticulation Behaviors for Virtual Humans

Liwei Zhao and Norman I. Badler

Center for Human Modeling and Simulation

Department of Computer and Information Science

University of Pennsylvania

Philadelphia, PA 19104-6389 USA

[email protected], [email protected]

1-215-898-5862 phone; 1-215-573-7453 fax

Abstract

Gesture and speech are two very important behaviors for virtual humans. They are not isolated from each other but are generally employed simultaneously in the service of the same intention. An underlying PaT-Net parallel finite-state machine may be used to coordinate them both. Gesture selection is not arbitrary. Typical movements correlated with specific textual elements are used to select and produce gesticulation online. This enhances the expressiveness of speaking virtual humans.

Keywords: Virtual Human, Agent, Avatar, Gesture, Posture, PaT-Nets

1 Introduction

The past few years have seen several research efforts on human gestures (e.g., [1, 2, 5, 8, 27, 18, 16, 13]). Many of these projects have focused on interpreting human gestures for interactive control. Creating appropriate gestures in a virtual human has not been as well studied, because the range of gestures performed during speech output is much larger than the symbolic selection set used for discrete inputs. For example, in [8] four gesture types are distinguished:

- Iconics represent some concrete feature of the accompanying speech, such as an object's shape.

- Metaphorics represent an abstract feature concurrently spoken about.

- Deictics indicate a point in space, and may refer to persons, places, and other spatializable discourse entities.

- Beats are small formless waves of the hand that occur with heavily emphasized words, occasions of turning over the floor to another speaker, and other kinds of special linguistic work.

While Cassell's system implemented instances of each type of gesture, the most prevalent were iconics linked to mentions of specific objects, metaphorics linked to specific actions, and beats linked to speech intonation.

Following Cassell's lead, new problems in gesture generation were exposed:

1. Coarticulation: Generating a smooth transition from one gesture to the next without returning to a specific rest pose.

2. Expression: Modifying the performance of a gesture to reflect the agent's manner or personality.

3. Spatialization: Integrating a deictic gesture into the surrounding context.

4. Selection: Generating a metaphoric that might be associated with an abstract concept.

Problem 1, coarticulation, has been addressed by a number of computer graphics researchers [8, 12, 28, 26], although the issue has other aspects, such as preparatory actions, which remain unsolved. Problem 2, expression, is being investigated at a number of places [6, 33, 10]. In this paper, we investigate problems 3 and 4. Of these two, spatialization is easier, since the desired gesture is combined or composited with inverse kinematics to point or align the gesturing body part with the spatial referent. Selection entails determining gestures that people would likely interpret and accept as "natural" and "representative." These concepts are orthogonal: a naturally performed motion-captured gesture might not be appropriate to the speech text, while a synthesized, less natural arm motion might nevertheless be representative of the expressed concepts.

The selection problem itself splits into two parts: one is the creation of the gestural motion, and the other is the mapping from the textual content to the gesture. For example, to create a character waving hello during a greeting, one has to create the waving motion as well as know when to invoke it upon encountering a greeting context. In this work we assume that the motions themselves are generated by inverse kinematics, motion capture, or otherwise pre-created (e.g., key pose sequences). Our contribution lies in proposing a representative mapping from concepts to gestures such that they are selected based on stylized rhetorical speaking.

To select and spatialize various gestures correlating speech and language, we use an underlying coordination scheme called PaT-Nets [3]. The virtual human animation is implemented as an extension to Jack (a software product from Transom Technologies, Inc.). The inputs to the system (see the example script in Section 2.1) are in the form of speech texts with embedded commands, most of which are related to gestures. The gestures are controlled by PaT-Nets to coincide with the utterance of the speech. While the embedded commands in our examples are manually inserted for now, the idea is to detect the presence of the corresponding concepts in the raw text stream and automatically insert the deictics and metaphorics based solely on the words used.

2 Gesticulation

An agent or avatar may have a wide variety of movement behaviors, but we focus our attention on gestures and speech. Kendon [20] offers a distinction between autonomous gestures (gestures performed without accompanying speech) and gesticulation (gesture performed concurrently with phonological utterance). Gestures and speech are closely associated: they are generally employed simultaneously in the service of the same intention, and well-coordinated gestures and speech enhance the expressiveness and believability of speaking virtual humans. In this paper, we restrict our investigation to gesticulation.

2.1 Gestures

The study of gestures in dance and oratory may date back to the beginning of the seventeenth century [7]. More recently, semioticists from the fields of anthropology, neurophysiology, neuropsychology and psycholinguistics (Freedman [17]; Wiener, Devoe, Rubinow and Geller [34]; McNeill and Levy [24]) have been interested in the study of gestures. The Lexis dictionary (1977) gives the most general definition of gesture — "movements of body parts, particularly the arms, the hands, or the head, conveying, or not conveying, meaning."

While gestures are the "little" movements that are confined to a part or parts of the body, considered in isolation they have a very limited contribution to make to non-verbal communication. Emblems and manual languages, such as American Sign Language, are exceptions, because there the communication is fully borne by movements. Gestures are rarely performed outside a communicative context and only occasionally transmit any depth of emotion or information, since, as soon as there is any complicated meaning, the gestures can only be "read" in relation to the whole expressive movement of the body [14, 9, 22].

Most of the current research in gestures is related to computer vision, human-computer interaction, and pattern recognition, where the gestures are mainly studied in isolation [27, 18, 16, 13]. However, gestures used by an agent or avatar in a virtual environment are quite different. First, a gesture is a process, not a fixed posture. For example, when someone waves a hand, it is not the final position of the hand which is the proper object of study, but the process by which it got there — the actual process of movement. Secondly, a gesture is almost always accompanied by other gestures or communicative channels.

In the following we study arm, hand, and head gestures. Above all, we recognize that gesticulation has its limitations. The interpretation might be both culturally oriented and individually biased. Personality and social context may constrict or amplify the motions. But in general we seek to set a baseline of gesticulatory behavior which can then be parameterized and modified by other means.

The example input script below drives the demonstration described in Section 4. Commands such as \gest, \point, \hand and \head are the embedded gesture commands; the remaining text is spoken:

    \gest warning welcome. Hello,
    \head front Currently, I can support the following basic arm gestures.
    Now let me introduce you to some simple objects I know:
    \point idx{table.table.corner} this is a table
    \point idx{door.door.panel} this is a door
    \point idx{chair1.chair.red} this is a red chair
    \point idx{chair0.chair.yellow} this is a yellow chair
    \head slant right Let me show you the basic arm gestures
    \gest arm reject arm reject gesture
    \gest arm unlikely arm unlikely gesture
    \gest arm not arm not gesture
    \gest arm improbable arm improbable gesture
    \gest arm doubtful arm doubtful gesture
    \gest arm probable arm probable gesture
    \gest arm tis arm it is gesture
    \gest arm certain arm certain gesture
    \gest arm obvious arm obvious gesture
    \gest arm enchanting arm enchanting gesture
    \gest arm absolute arm absolute gesture
    Next, let me show you some hand gestures:
    \hand convulsive{plane0.plane.stand1} convulsive hand gesture
    \hand expanded{plane0.plane.stand1} expanded hand gesture
    \hand exasperation{plane0.plane.stand1} exasperation hand gesture
    \hand authority{plane0.plane.stand1} authority hand gesture
    \hand relaxed{plane0.plane.stand1} relaxed hand gesture
    \hand exaltation{plane0.plane.stand1} exaltation hand gesture
    \hand conflict{plane0.plane.stand1} conflict hand gesture
    \hand prostration{plane0.plane.stand1} prostration hand gesture
    \head slant left Finally, I can support the following basic general gestures:
    \gest reject reject gesture
    \gest givetake give and take gesture
    \gest warning warning gesture
    good bye
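As a sketch of the automatic-insertion idea mentioned in the introduction, the fragment below scans raw text for trigger words and splices in embedded commands of the kind shown in the script above. The trigger table and the concept-to-command mapping are illustrative assumptions, not the system's actual vocabulary:

    // Sketch of automatic gesture-command insertion (hypothetical trigger
    // words; in the paper the commands are inserted manually).
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>

    int main() {
        // Map trigger concepts found in raw text to embedded gesture commands.
        std::map<std::string, std::string> triggers = {
            {"certainly", "\\gest arm certain"},  // metaphoric: affirmation
            {"never",     "\\gest arm reject"},   // metaphoric: rejection
            {"this",      "\\point"},             // deictic: needs a referent site
        };

        std::string raw = "this is a table and it will certainly hold";
        std::istringstream words(raw);
        std::string word, annotated;
        while (words >> word) {
            auto it = triggers.find(word);
            if (it != triggers.end())
                annotated += it->second + " ";    // splice command before the word
            annotated += word + " ";
        }
        std::cout << annotated << "\n";
        return 0;
    }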

2.1.1 Arm Gestures

Human arms serve at least two basic separate functions [1]: they allow an agent/avatar to change the local environment through dextrous movements, by reaching for and grasping objects [19, 15]; and they serve social interaction functions, by augmenting the speech channel with communicative emblems, gestures and beats [8].

A well-performed arm gesture, accompanied by proper hand gestures, plays an important role in integrating some deictic gestures into the surrounding context (the spatialization problem) and in reflecting the agent's manner or personality to some extent (the expression problem). For example, in [30] it was noted that arm gestures with different inclinations indicate different degrees of affirmation: from 0 (straight down) to 45 degrees indicates neutral, timid, cold; from 45 to 90 degrees, expansive and warm; and from 90 to 180 degrees, enthusiastic (see Figure 1). We implemented this series of stereotypical arm gestures as a representative metaphorical mapping from affirmation concepts to gestures, such that they can be correlated with the degree of affirmation in a speech.

Figure 1: Arm gestures with different inclinations indicate different degrees of affirmation (taken from [30])
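A minimal sketch of how such a mapping could be parameterized, assuming a scalar degree of affirmation in [0, 1] and a linear spread over the bands quoted above; the interpolation scheme is our illustration, not the paper's implementation:

    // Map a degree of affirmation in [0, 1] to an arm elevation angle,
    // following the Delsarte bands: 0-45 neutral/timid/cold,
    // 45-90 expansive/warm, 90-180 enthusiastic.
    #include <iostream>

    double armElevationDegrees(double affirmation) {  // 0 = cold .. 1 = enthusiastic
        if (affirmation < 0.0) affirmation = 0.0;
        if (affirmation > 1.0) affirmation = 1.0;
        return affirmation * 180.0;                   // linear spread over 0..180
    }

    const char* qualityLabel(double angleDeg) {
        if (angleDeg < 45.0) return "neutral/timid/cold";
        if (angleDeg < 90.0) return "expansive/warm";
        return "enthusiastic";
    }

    int main() {
        for (double a : {0.1, 0.4, 0.9}) {
            double angle = armElevationDegrees(a);
            std::cout << "affirmation " << a << " -> " << angle
                      << " deg (" << qualityLabel(angle) << ")\n";
        }
    }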

2.1.2 Hand Gestures

The hand is the most fluent and articulate part of the body, capable of expressing almost infinite meanings. Hand gesture languages have been invented by communicative needs and by the deaf communities of various cultures. The classic gesture language of the Hindu dance contains about 57,000 cataloged hand positions, each having the specific value of a word or an explicit and distinct meaning [29]. It is virtually impossible to implement all these hand gestures. In this paper we investigate the selection problem, so we focus on the hand gestures which can easily generate a metaphoric associated with an abstract concept. In the system, the virtual human agent attempts to use hand gestures that are more selective and much more closely coordinated with what is being said in words. For example, when attempting to offer a definition of a word such as "write," the agent may pantomime the writing action while vocalizing the verbal definition [8].

Delsarte [30] provided a small set of stereotypical hand gestures correlated with grasping, indicating, pointing, and reaching (illustrated in Figure 2). We implemented all these hand gestures, and they can be performed either by the left or by the right hand, with preference for the right hand under default circumstances. To avoid crossing the arm over the body and to keep the body posture open, the hand nearer to the target object is always used. In addition, every hand gesture is coordinated with head and eye orientation, arm gestures, and vocalization, all of which are employed simultaneously in the service of interpreting an abstract concept.

Figure 2: Grasping, indicating and reaching hand gestures (taken from [30])
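The nearer-hand rule described above can be stated compactly; the sketch below is an illustration with assumed types and coordinates, not the Jack API:

    // Prefer the right hand, but use whichever hand is nearer to the target
    // so the arm never crosses the body and the posture stays open.
    #include <cmath>
    #include <iostream>

    struct Vec3 { double x, y, z; };
    enum class Hand { Left, Right };

    Hand chooseHand(const Vec3& target, const Vec3& leftShoulder,
                    const Vec3& rightShoulder) {
        auto dist = [](const Vec3& a, const Vec3& b) {
            return std::sqrt((a.x-b.x)*(a.x-b.x) + (a.y-b.y)*(a.y-b.y)
                             + (a.z-b.z)*(a.z-b.z));
        };
        double dl = dist(target, leftShoulder);
        double dr = dist(target, rightShoulder);
        return (dl < dr) ? Hand::Left : Hand::Right;  // ties default to right
    }

    int main() {
        Vec3 table{-0.6, 1.0, 0.5}, ls{-0.2, 1.4, 0.0}, rs{0.2, 1.4, 0.0};
        std::cout << (chooseHand(table, ls, rs) == Hand::Left ? "left" : "right")
                  << " hand points at the table\n";
    }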

2.1.3 Head Gestures

The head can be a very effective gesturing tool. The face is one of the most important parts in computer animation. It can be divided into three zones: (1) the forehead and eyes; (2) the nose and upper cheek; (3) the mouth, jaw, and lower cheeks [22]. The eyes in turn have three components — the eyeballs, the eyelids, and the eyebrows. In [23], 405 combinations of these components alone are listed. When these are combined with expressions of the mouth and the attitudes or positions of the head, the possible combinations are almost beyond computing. Again, we focus our attention on those head gestures that are related to the spatialization and selection problems.

Unlike arm gestures and hand gestures, head gestures are employed more selectively. For example, a 10-year-old gestures elaborately using arms or hands while he is talking, as if, as Freedman puts it [17], "he surrounds himself with a visual, perceptual and imagistic aspect of his message." Head gestures, on the other hand, are used very selectively, usually only in relation to specific words, with which the head gestures are highly coordinated.

Delsarte [30] gave 9 positions (or attitudes) of head gestures combined with eyes, as shown in Figure 3, which we think can serve as a set of representative head gestures that help express abstract concepts gesturally.

Figure 3: Head gestures combined with eyes (taken from [30])

2.2 Postures

Postures are highly correlated with speech. We usually use postures as interpretative tools to understand speech, and we don't allow ourselves to be influenced by words which may be quite at variance with what is being "said" by the silent postures. In our gesticulation system, to avoid having the words discounted, a virtual human agent usually adopts a neutral posture — standing up straight with both feet slightly apart and firmly planted on the floor — and should adopt an orientation and eye gaze facing the audience.

Postures are also highly correlated with gestures. Within a sequence of movements a small gesture, such as waving or smiling, may be very significant, but it is also significant as a part of the whole body. Gestures need postures as a background [22, 9, 14]. On the other hand, postures almost always have gestures going on around them. Together, gesturing and posturing make up the process of movement. Postural semantics has received very little systematic attention in virtual human research. Lamb and Watson [22] note that posture is an individual characteristic, and is highly influenced by the conventions of the society. DeWall et al. (1992) provide methods for improving the sitting postures of CAD/CAM workers. Ankrum (1997) reports the interrelationships between gaze angle and neck posture. Tsukasa Noma (1997) uses posture as a visual aid to presentation. But in none of these efforts is the interdependence between gestures and postures addressed.

In our gesticulation system we implemented two of the postures given by Delsarte: both are related to standing and may be either merged with, or segregated from, various gestures.

2.3 Locomotion

In order to expressively interpret an abstract concept, an agent or avatar might interact with an object which visually corresponds to the concept being interpreted. The interaction includes detecting, orienting to, locating, reaching, and pointing to a visual object. It can be argued, though, that these interactions can be distinguished according to the spatial field in which they occur. In fact, these interactions can occur either in the immediate surroundings, in which reaching, indicating or grasping is achieved without locomotion, or in the visual field outside of direct reaching and grasping.

Therefore, to interact with a target object, an agent or avatar must determine if she is within a suitable distance of the target. Otherwise, she must first walk to an action-dependent position and orientation (pre-action) before the initiation of the specified action. After completing the action, she must decide if she needs to walk to the next action-dependent position and orientation (post-action). Also, she must keep in mind an explicit list of objects to be avoided during the locomotion process. Such decision-making and walking are coordinated by PaT-Nets.
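A sketch of that pre-action decision, with assumed names and an assumed reach radius (obstacle avoidance and the PaT-Net coordination itself are omitted):

    // Interact directly if the target is within reach; otherwise walk to an
    // action-dependent stand-off position first. Values are illustrative.
    #include <cmath>
    #include <iostream>

    struct Vec2 { double x, z; };

    double dist(const Vec2& a, const Vec2& b) {
        return std::hypot(a.x - b.x, a.z - b.z);
    }

    int main() {
        const double reach = 0.7;            // assumed arm-reach radius in meters
        Vec2 agent{0.0, 0.0}, target{2.5, 1.0};

        double d = dist(agent, target);
        if (d <= reach) {
            std::cout << "reach/point/grasp in place\n";
        } else {
            // Walk to a point just inside reach, facing the target,
            // then perform the action.
            Vec2 standoff{ target.x - (target.x - agent.x) / d * reach * 0.9,
                           target.z - (target.z - agent.z) / d * reach * 0.9 };
            std::cout << "walk to (" << standoff.x << ", " << standoff.z
                      << "), orient to target, then act\n";
        }
    }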

3 The Underlying Coordination Model

3.1 Coordination via PaT-Nets

Using traditional animation techniques, human behavior is defined as a set of linear sequences which are determined in advance. During motion transitions, a motion generator has to monitor the whole transition from the current motion to the next one [3]. This gives the animator great control over the look and feel of the animation. Anyone who goes to the movies can see marvelous synthetic characters such as aliens, Martians, etc. However, all these characters are typically created for one scene or one movie and are not meant to be re-used [1, 26]. Should the same techniques be used in virtual humans, they would greatly limit their autonomy, individuality, and therefore believability.

Some researchers have attempted to get around this problem by breaking the animation down into smaller linear sequences and then switching between them contingent upon user input. The main concern then is dealing with the transitions between these sequences. The simplest approach is to ignore the transition and simply jump from one motion to the next. This works in situations where fast transitions are expected, but appears jerky and unnatural when applied to virtual humans. Another approach is to have each motion begin and end in the same standard posture, thus eliminating the instantaneous jump. While this approach offers smooth continuous motion, beginning and ending each motion in the same still posture is very unnatural: each time, the body needs to return to a "neutral" generic intermediate posture before the next motion can begin. Moreover, the transitions need to be defined for every pair of motions in advance. In NYU's Improv Project [26], a technique called motion blending was proposed to automatically generate smooth transitions between isolated motions without jarring discontinuities or the need to return to a "neutral" pose. But the motion generator still needs to assign joint angles to the whole body. In sophisticated scenarios where agents and avatars are engaging in complex behaviors and interactions, this becomes ineffective.

Using PaT-Nets, groups of body parts are assigned to individual nets: WalkNet, ArmNet, HandNet, FaceNet, SeeNet and SpeakNet. All these nets are organized in a hierarchical way; the structure of the nets is shown in Figure 4. This makes the interaction between agents/avatars [8] and the synchronization of movements relatively easy, because the action generator ParserNet is not involved in directly assigning joint angles to the whole body: instead it sends messages designating individual nets to do the job, hence its main function is coordination. For example, to move a hand, ParserNet does not need to directly assign joint angles. All it needs to do is send a message to the GestureNet, which in turn sends a message to the HandNet. The HandNet then moves the joints depending on the timing and joint angles in the message. This coordination can be applied to the game of "Hide and Seek" [4], two-person animated conversation [8], simulated emergency medical care [11], and a TV presenter or weatherman [25].

Figure 4: PaT-Nets for gesticulation behaviors (ParserNet at the root; below it SpeakNet and GestureNet; then WalkNet, SitNet, ArmNet (R/L), HandNet (R/L), HeadNet, SeeNet and FaceNet)
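The delegation style can be illustrated with a small message-passing sketch; the classes and message fields below are assumptions for illustration, not the actual net interfaces:

    // The parser net never assigns joint angles itself; it sends a message
    // down the hierarchy (ParserNet -> GestureNet -> HandNet) and the leaf
    // net moves the joints.
    #include <iostream>
    #include <string>

    struct GestureMsg { std::string gesture; double startTime, duration; };

    class HandNet {
    public:
        void handle(const GestureMsg& m) {
            // Only here are joint angles actually computed and applied.
            std::cout << "HandNet: playing '" << m.gesture << "' at t="
                      << m.startTime << " for " << m.duration << "s\n";
        }
    };

    class GestureNet {
    public:
        explicit GestureNet(HandNet& h) : hand_(h) {}
        void handle(const GestureMsg& m) { hand_.handle(m); }  // delegate downward
    private:
        HandNet& hand_;
    };

    class ParserNet {
    public:
        explicit ParserNet(GestureNet& g) : gesture_(g) {}
        void onCommand(const std::string& cmd, double t) {
            gesture_.handle(GestureMsg{cmd, t, 1.0});           // coordinate only
        }
    private:
        GestureNet& gesture_;
    };

    int main() {
        HandNet hand;
        GestureNet gesture(hand);
        ParserNet parser(gesture);
        parser.onCommand("relaxed", 2.0);   // e.g. from "\hand relaxed{...}"
    }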

3.2 PaT-Nets

PaT-Nets (Parallel Transition Networks) are finite state machines that can execute motions effectively in parallel. The original PaT-Nets were implemented in Lisp by Welton Becket [3]. In order to maximize real-time animation control, Tsukasa Noma re-implemented the PaT-Nets in C++, with further modifications made by Sonu Chopra. Each class of PaT-Nets is defined as a derived class of the base class LWNet, which stands for LightWeight PaT-Nets. They have the following properties:

- Two or more PaT-Nets can be simultaneously active.

- Two or more nodes can be simultaneously active in a PaT-Net. This enables us to represent simple parallel execution of actions in a single PaT-Net.

- PaT-Nets can call for actions and make state transitions either conditionally or probabilistically.

- All active PaT-Nets are maintained on a list called the LWNetList. This list is scanned every clock tick.

- Jack commands can be invoked within PaT-Nets to manipulate any Jack data structure.

Currently PaT-Nets support 9 different node types: Normal, Call, PAL, Join, Indy, Kldp, Monitor, Exit and Halt. A Normal node is used to execute an action, and a Call node is used to call a function. The action/call is preceded by a pre-action and succeeded by a post-action; transition to one of a set of post-actions depends on the action's boolean function or on the pointer returned by the call function. All the nodes spawned from a PAL node are executed in parallel. Join/Indy/Kldp nodes link the spawned nodes; the differences are that the Join node waits for all spawned nodes to finish before moving on to the next node, the Indy node moves on as soon as the first spawned node is done and leaves the remaining spawned nodes untouched, and the Kldp node is similar but kills the remaining spawned nodes. The Join/Indy/Kldp nodes make synchronization possible. The Monitor node checks its monitor condition every clock tick and activates the monitor action whenever the condition evaluates to true. The Halt node simply terminates the current PaT-Net node, while the Exit node removes the current PaT-Net from the active LWNetList.

For example, in the movements shown in Figure 5, the Walk node is executed first. Then the PAL node spawns a Speak node and a series of sequential actions defined by a Gesture node, a Normal node, and a PointAt node. The Speak node runs simultaneously with the sequential actions. Basically this is walking, followed by speech and a pointing gesture in parallel.

Figure 5: A PaT-Nets example: walking, speaking and pointing (a Walk node leads to a PAL node, which spawns a Speak node in parallel with a Gesture node, a Normal node and a PointAt node; the branches rejoin at a Join node)
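The parallel-tick mechanism behind these properties can be sketched as a toy scheduler that steps every net on an active list once per clock tick — the LWNetList idea in miniature. Node types, pre/post-actions and probabilistic transitions are omitted, and all names are illustrative:

    #include <iostream>
    #include <memory>
    #include <string>
    #include <vector>

    class Net {                           // stands in for the LWNet base class
    public:
        Net(std::string name, int steps) : name_(std::move(name)), left_(steps) {}
        bool tick() {                     // one clock tick; false when finished
            if (left_ <= 0) return false;
            std::cout << name_ << " step " << left_-- << "\n";
            return left_ > 0;
        }
    private:
        std::string name_;
        int left_;
    };

    int main() {
        // The active-net list (cf. LWNetList), scanned once per clock tick.
        std::vector<std::unique_ptr<Net>> active;
        active.push_back(std::make_unique<Net>("SpeakNet", 3));  // runs in parallel
        active.push_back(std::make_unique<Net>("ArmNet",   2));  // with the others
        active.push_back(std::make_unique<Net>("HeadNet",  1));

        while (!active.empty()) {         // a Join-like barrier: wait for all
            for (std::size_t i = 0; i < active.size(); ) {
                if (active[i]->tick()) ++i;
                else active.erase(active.begin() + i);           // cf. the Exit node
            }
        }
        std::cout << "all nets done; move to the next node\n";
    }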

4 Results

We implemented the gesticulation system on an SGI Onyx/RealityEngine. In the current implementation, PaT-Nets are extended to contain twelve different nets that can run simultaneously. The motion generator ParserNet contains 66 nodes to synchronize the different movements: it can now support up to 2 postures, 3 head gestures, 12 arm gestures and 12 hand gestures. During the animation, the virtual human agent walks around the room and points out some interesting objects such as the table, the door, the red chair, the yellow chair, etc. We do not yet deal with automatically recognizing the objects in the virtual environment; instead, as a pre-processing step, we associate sites in the coordinate system with the objects. The agent then walks to the front of the scene and demonstrates some arm gestures and hand gestures (Figures 6 and 7).

Animations are generated in real time (30 frames per second). For voice output, we use an Entropic Research Laboratory TrueTalk (TM) Text-To-Speech system [32] running on an SGI Indigo2. The gesture movements are controlled by PaT-Nets to coincide with the utterance of the speech.

Figure 6: Examples: arm gestures by Jack

Figure 7: Examples: hand gestures by Jack

5 Conclusions

We discussed a virtual human gesticulation system in which typical gestures correlated with speech are used to select and produce gesticulation in real time. We also investigated the spatialization and selection problems and proposed a representative mapping from concepts to gestures such that they are selected based on stylized rhetorical speaking. An underlying coordination mechanism called PaT-Nets is employed to select and spatialize various gestures associated with speech and language.

In our current implementation, there is still much work to do in the near future:

- Add more nodes to FaceNet to improve the facial expression and to mimic the mouth movements more precisely during speech.

- Add more gestures/movements which are necessary in a dialogue structure, and environment- and object-sensitive interaction.

- Transport all gestures/movements to JackMOO [31] to expand the scope and range of human actions that an avatar must portray in a web-based virtual environment.

6 Acknowledgments

The authors would like to thank Sonu Chopra, Rama Bindiganavale and Pei-Hwa Ho for discussions and technical support. This research is partially supported by the U.S. Air Force through Delivery Orders 8 and 17 on F41624-97-D-5002; Office of Naval Research through Univ. of Houston K-5-55043/3916-1552793, DURIP N0001497-1-0396, and AASERTs N00014-97-1-0603 and N0014-97-1-0605; Army Research Lab HRED DAAL01-97-M-0198; DARPA SB-MDA-97-2951001 through the Franklin Institute; NSF IRI95-04372; NASA NRA NAG 5-3990; National Institute of Standards and Technology 60 NANB6D0149 and 60 NANB7D0058; SERI Korea; and JustSystem.

References

[1] N. Badler. Real-time virtual humans. Pacific Graphics, 1997.

[2] N. Badler. Virtual humans for animation, ergonomics, and simulation. IEEE Workshop on Non-Rigid and Articulated Motion, Puerto Rico, June 1997.

[3] N. Badler, C. Phillips, and B. Webber. Simulating Humans: Computer Graphics, Animation, and Control. Oxford University Press, New York, 1993.

[4] N. Badler, B. Webber, W. Becket, C. Geib, M. Moore, C. Pelachaud, B. Reich and M. Stone. Planning for animation. In N. Magnenat-Thalmann and D. Thalmann, editors, Computer Animation. Prentice-Hall, 1996.

[5] R. Boulic, P. Becheiraz, L. Emering, and D. Thalmann. Integration of motion control techniques for virtual human and avatar real-time animation. Proc. VRST '97, pp. 111-118, ACM Press, 1997.

[6] A. Bruderlin and L. Williams. Motion signal processing. Proc. SIGGRAPH 1995, pp. 97-104.

[7] J. Bulwer. Chirologia: Or the Natural Language of the Hand and Chironomia: Or the Manual Art of Rhetoric. Southern Illinois University Press, Carbondale, IL, 1994.

[8] J. Cassell, C. Pelachaud, N. Badler, M. Steedman, B. Achorn, W. Becket, B. Douville, S. Prevost, and M. Stone. Animated conversation: Rule-based generation of facial expression, gesture and spoken intonation for multiple conversational agents. In Computer Graphics, Annual Conf. Series, pp. 413-420. ACM, 1994.

[9] M. Cranach and I. Vine. Expressive Movement and Non-verbal Communication. Academic Press, London, 1975.

[10] D. Chi. Animating expressivity through effort elements. PhD dissertation, in progress, University of Pennsylvania, 1998.

[11] D. Chi, B. Webber, J. Clarke, and N. Badler. Casualty modeling for real-time medical training. Presence 5(4):359-366, 1995.

[12] M. Cohen and D. Massaro. Modeling coarticulation in synthetic visual speech. In N. Magnenat-Thalmann and D. Thalmann, eds., Models and Techniques in Computer Animation, Tokyo, 1993. Springer-Verlag.

[13] Y. Cui, D. Swets and J. Weng. Learning-based hand sign recognition using SHOSLIF-M. Workshop on Integration of Gesture in Language and Speech, 1996.

[14] M. Davis. Understanding Body Movement: An Annotated Bibliography. Arno Press, New York, 1972.

[15] B. Douville, L. Levison, and N. Badler. Task level object grasping for simulated agents. Presence 5(4), pp. 416-430, 1996.

[16] R. Foulds and A. Moynahan. Computer recognition of the gestures of people with disabilities. Workshop on Integration of Gesture in Language and Speech, 1996, Wilmington, DE, USA.

[17] N. Freedman. Hands, words and mind: on the structuralization of body movements during discourse and the capacity for verbal representation. In Communicative Structures and Psychic Structures: A Psychoanalytic Approach, pp. 110-132. Plenum Press, New York and London, 1977.

[18] W. Freedman and C. Weissman. Television control by hand gestures. IEEE Intl. Workshop on Automatic Face and Gesture Recognition, Zurich, June 1995.

[19] J. Gourret, N. Magnenat-Thalmann, and D. Thalmann. Simulation of object and human skin deformations in a grasping task. ACM Computer Graphics 23(3), pp. 21-30, 1989.

[20] A. Kendon. Current issues in the study of gesture. In The Biological Foundations of Gestures: Motor and Semiotic Aspects, pp. 23-47. Lawrence Erlbaum Associates, Hillsdale, NJ, 1986.

[21] A. Kendon. Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key (Ed.), The Relationship of Verbal and Nonverbal Communication. Mouton Publishers, The Hague, 1980.

[22] W. Lamb and E. Watson. Body Code: The Meaning in Movement. Routledge & Kegan Paul Ltd., pp. 85-99, 1979.

[23] S. Mackaye. Harmonic Gymnastics and Pantomimic Expression. Edited by Marion Lowell. M. Witmark & Sons, New York, 1963.

[24] D. McNeill. The Conceptual Basis of Language. Lawrence Erlbaum Associates, Hillsdale, NJ, 1996.

[25] T. Noma and N. Badler. A virtual human presenter. In IJCAI '97 Workshop on Animated Interface Agents, Nagoya, Japan, 1997.

[26] K. Perlin and A. Goldberg. Improv: A system for scripting interactive actors in virtual worlds. Proc. SIGGRAPH 1996, pp. 205-216.

[27] F. Quek. Toward a vision-based hand gesture interface. Proceedings of the Virtual Reality System Technology Conference, pp. 17-29, August 23-26, 1994, Singapore.

[28] C. Rose, B. Guenter, B. Bodenheimer and M. Cohen. Efficient generation of motion transitions using spacetime constraints. In ACM Computer Graphics, Annual Conf. Series, pp. 147-154, 1996.

[29] H. Russell. The Gesture Language of the Hindu Dance. B. Blom, New York, 1964 (c1941).

[30] T. Shawn. Every Little Movement: A Book About Delsarte. M. Witmark & Sons, 1954.

[31] T. J. Smith, J. Shi, J. Granieri, and N. Badler. JackMOO: A web-based system for virtual human simulation. WebSim Conference, 1998.

[32] TrueTalk Programmer's Manual. Entropic Research Laboratory, 1995.

[33] M. Unuma, K. Anjyo, and R. Takeuchi. Fourier principles for emotion-based human figure animation. Proc. SIGGRAPH 1995, pp. 91-96.

[34] M. Wiener, S. Devoe, S. Rubinow and J. Geller. Nonverbal behavior and nonverbal communication. Psychological Review, pp. 185-210, 1972.