The Uncanny Valley: Effect of Realism on the Impression of Artificial Human Faces

Jun’ichiro Seyama*
Department of Psychology, Faculty of Letters, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

Ruth S. Nagayama
Department of Humanities and Social Sciences, Shizuoka Eiwa Gakuin University

Abstract

Roboticists believe that people will have an unpleasant impression of a humanoid robot that has an almost, but not perfectly, realistic human appearance. This is called the uncanny valley, and is not limited to robots, but is also applicable to any type of human-like object, such as dolls, masks, facial caricatures, avatars in virtual reality, and characters in movies. The present study investigated the uncanny valley by measuring observers’ impressions of facial images whose degree of realism was manipulated by morphing between artificial and real human faces. Facial images yielded the most unpleasant impressions when they were highly realistic, supporting the hypothesis of the uncanny valley. However, the uncanny valley was confirmed only when morphed faces had abnormal features such as bizarre eyes. These results suggest that to have an almost perfectly realistic human appearance is a necessary but not a sufficient condition for the uncanny valley. The uncanny valley emerges only when there is also an abnormal feature.

1 Introduction

Roboticists have attempted to construct humanoid robots whose physical appearance is indistinguishable from real humans (e.g., Kobayashi, Ichikawa, Senda, & Shiba, 2003; Minato, MacDorman, et al., 2004; Minato, Shimada, Ishiguro, & Itakura, 2004). However, Mori (1970) warned that robots should not be made too similar to real humans because such robots can fall into the “uncanny valley,” where too high a degree of human realism evokes an unpleasant impression in the viewer (see also Norman, 2004; Reichardt, 1978). To summarize his informal observations and predictions of how a robot’s degree of realism in physical appearance (or humanlikeness) can affect a human observer’s impression of the robot, Mori introduced a hypothetical graph of the impression of pleasantness as a function of the degree of realism (Figure 1). Although the degree of realism was defined as the robot’s physical similarity to real humans, the impression of pleasantness for the ordinate of the graph was not clearly defined. One definition consistent with Mori’s conjecture is that the ordinate numerically represents degrees of any pleasant impressions (e.g., attractive, pretty, and fascinating) in the positive range and any unpleasant impressions (e.g., unattractive, ugly, and uncanny) in the negative range.

Presence, Vol. 16, No. 4, August 2007, 337–351 © 2007 by the Massachusetts Institute of Technology *Correspondence to [email protected].


In Mori’s graph, a positive impression of pleasantness of a robot increases with an increasing degree of realism. For example, Honda ASIMO (Sakagami et al., 2002) may be more attractive than industrial robots. Mori claimed, however, that human observers have exceptionally unpleasant impressions of robots that have an almost, but not perfectly, realistic human appearance. This effect is shown in Mori’s graph as a negative peak at a relatively high level of realism (Figure 1). Mori called this negative peak the uncanny valley, by analogizing the shape of his graph to a mountain. Along the abscissa of Mori’s graph, not only robots but also various kinds of humanlike artificial objects (e.g., dolls and prosthetic hands) were sorted in subjective order of degree of realism. Thus, Mori’s hypothesis is not limited to robots but is also applicable to any type of artificial humanlike object.

Figure 1. Mori’s (1970) uncanny valley. Human observers’ impression of artificial objects is plotted hypothetically as a function of the degree of realism (the artificial objects’ similarity to real humans). Mori assumed that the level of impression would drop suddenly at a relatively higher degree of realism, and called this dip the uncanny valley. Regions A and B are realism ranges related to discussion in Experiment 2. Modified after Mori (1970).

The physical appearance of robots that are supposed to communicate, cooperate, and coexist with humans should be designed with due consideration of the emotional and psychological impact on human observers. The same holds true for the physical appearances of agents in virtual reality and characters in computer graphics movies. Mori’s hypothesis has been adopted as a guideline for designing the physical appearance of robots (Cañamero & Fredslund, 2001; DiSalvo, Gemperle, Forlizzi, & Kiesler, 2002; Fong, Nourbakhsh, & Dautenhahn, 2003; Hara, 2004; Hinds, Roberts, & Jones, 2004; Minato, MacDorman, et al., 2004; Minato, Shimada, et al., 2004; Woods, Dautenhahn, & Schulz, 2004) and agents in virtual reality (Aylett, 2004; Fabri, Moor, & Hobbs, 2004; Wages, Grünvogel, & Grützmacher, 2004). According to Mori’s hypothesis, designers should seek a moderate level of realism (e.g., within the range B in Figure 1) for the physical appearance of robots and virtual reality agents in order to avoid falling into the uncanny valley.

However, the validity of the uncanny valley has not been confirmed with psychological evidence. Thus it is uncertain whether the uncanny valley actually emerges at certain realism levels. In the present study we measured observers’ ratings of pleasantness of facial images with varying degrees of realism. The degree of realism was manipulated by morphing between images of artificial and real human faces: the degree of realism was represented as a morphing percentage (% of real human). The artificial face images used in this study were photographs of dolls and computer graphics images of human models. Participants in each experiment rated their impressions of the pleasantness of the morphed images on a five-point scale. The rated scores were plotted against the degree of realism in order to empirically validate Mori’s hypothetical graph.

In hypothesizing about the emergence of the uncanny valley, Mori assumed that the impression of pleasantness is zero (i.e., neutral impression) when the realism of robots is extremely low (e.g., industrial robots) and highest for a perfectly realistic human appearance (Figure 1). In other words, he assumed that impressions of pleasantness increase monotonically with an increasing degree of realism, except for the uncanny valley. However, this assumption may not be valid. In fact, some robots, dolls, and human characters in computer graphics films seem highly pleasant although very unrealistic, while others are unpleasant. Similarly, humans vary in impressions of pleasantness although all are realistic. Thus, the realism-pleasantness graph might not necessarily show a monotonic increasing trend. However, in the present study, we focus only on whether the negative peak, that is, the uncanny valley, actually emerges in an empirically obtained realism-pleasantness graph.
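The question of whether a negative peak emerges in an empirically obtained realism-pleasantness graph can be made concrete with a small sketch. The function below is an illustrative assumption rather than the authors' analysis, which relied on repeated-measures ANOVAs and multiple comparisons: it only reports the interior realism level whose mean score is the lowest overall and lies below both endpoints. The function name and the score values are hypothetical placeholders, not data from the study.

```python
from typing import Optional, Sequence


def find_negative_peak(realism: Sequence[float],
                       scores: Sequence[float]) -> Optional[float]:
    """Return the realism level of an interior minimum that lies below
    both endpoints of the curve, or None if there is no such dip."""
    if len(realism) != len(scores) or len(scores) < 3:
        raise ValueError("need at least three matched points")
    # Index of the lowest mean score across all realism levels.
    i_min = min(range(len(scores)), key=lambda i: scores[i])
    # A valley must lie strictly inside the tested range ...
    if i_min == 0 or i_min == len(scores) - 1:
        return None
    # ... and be strictly lower than both unmorphed endpoints.
    if scores[i_min] < scores[0] and scores[i_min] < scores[-1]:
        return realism[i_min]
    return None


# Placeholder curves (NOT data from the study), on the -2..+2 rating scale.
levels = [0, 25, 50, 75, 100]          # % of real human
valley_like = [0.1, 0.3, -0.2, -0.9, 0.6]
monotonic = [-0.2, 0.0, 0.2, 0.4, 0.6]

print(find_negative_peak(levels, valley_like))
print(find_negative_peak(levels, monotonic))
```

A descriptive check like this says nothing about statistical reliability; it only names the pattern that the experiments look for.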

2 Method

2.1 Participants

The experiments were web-based. Each participant accessed a web page using a web browser. They were first guided to pages where they read instructions written in Japanese, and then they decided whether or not to participate in this study. Although most of the participants were students at the University of Tokyo, Shizuoka Eiwa Gakuin University, or Tokyo National University of Fine Arts and Music, some of them were identified only by their self-reported age and gender.

2.2 Stimuli

The stimuli were frames of image sequences in which an artificial face was gradually morphed into a real face. Examples of the stimuli are shown in Figures 2, 3, 5, 7, and 9. To define the correspondence between the two source images (i.e., artificial and real facial images), landmarks on each face were manually chosen. For the stimuli used in Experiments 1–3, the number of the landmarks was 9 for each eye, 8 for each eyebrow, 9 for the nose, 14 for the mouth, and 26 for facial contour. For the stimuli used in Experiment 4, the number of the landmarks varied from 222 to 325 depending on the details of the image content. Although the internal facial features and the face lines in the lower halves of the faces corresponded as precisely as possible, the outer contours of the upper halves of the faces (i.e., hair shapes) and the hairlines corresponded only roughly, because they were drastically different. Morphing software transformed positions and/or pixel values between corresponding points of the artificial and real faces. Morphing ratios, which controlled the magnitude of the morphing, were defined as the percentages of real human. Thus, an image with a morphing ratio of 0% would correspond to an unmorphed image of an artificial face, and an image with a morphing ratio of 100% would be a perfectly realistic human face image. Details of the morphing procedure differed among the experiments, which will be described later. We obtained written consent to use the photographs of the human faces for academic purposes.

Each face was presented on a uniform square background with a height and width of 256 pixels. The size and orientation of each face were normalized so that the eyes were aligned along a horizontal line, and the distance between the left and right pupils was 60 pixels. The actual size of the stimuli measured in visual angles is not known, because it depended on the viewing condition of each participant when he/she accessed the web page for the experiments.

2.3 Procedure

Each participant executed JavaScript programs in a web browser. In each trial a stimulus image and five buttons were presented in the web browser window (Figure 2). Each of the five buttons showed a Japanese phrase corresponding to “extremely unpleasant,” “unpleasant,” “difficult to decide (uncertain),” “pleasant,” or “extremely pleasant.” These buttons represented a five-point scale ranging from −2 (extremely unpleasant) to +2 (extremely pleasant). Participants were instructed to interpret the word pleasant as representing any emotionally positive adjective such as attractive, pretty, natural, healthy, intimate, or elegant, and the word unpleasant as representing any emotionally negative adjective such as unattractive, fearful, ugly, abnormal, sick, or inelegant. Each participant rated the pleasantness of the presented image by clicking the corresponding button. After the button was clicked, the stimulus image was replaced with a uniform black field for 1 s, and then the next trial started. The images for each experiment were presented in random order. Participants performed two practice trials at the beginning of each experiment. At the end of each experiment, participants sent the data, as well as their age and gender, to the authors. At this stage, participants were able to decide whether or not to send the data. The number of participants who did not send the data is not known.

Figure 2. A screenshot of a browser window during Experiment 4.

2.4 Data analysis

The pleasantness scores for each experiment were averaged across participants and submitted to repeated-measures ANOVAs. When necessary, Bonferroni’s multiple comparisons were performed. Statistical tests for the other analyses will be specified as necessary.

3 Results and Discussion

3.1 Experiment 1

Let us hypothesize that an almost, but not perfectly, realistic human appearance possesses an exceptional perceptual significance and that for some reason the human visual system generates an unpleasant impression for such a “special” degree of realism. This hypothesis predicts that the uncanny valley would always emerge at some point when the degree of realism is increased approaching the most realistic level.

To test this, we showed the participants images from three types of morphing sequences, in each of which an artificial face was gradually morphed into a real human face (Figure 3). The morphing ratio varied from 0% to 100% in increments of 10%. Forty-nine participants (mean age 26.5 years, 20 female) rated the pleasantness of the face images. The percentage of real human (i.e., the degree of realism) was significantly related to the pleasantness score for all three morphing sequences (Fs(10, 480) ≥ 5.96, ps < .001). However, the empirically obtained realism-pleasantness graphs did not show negative peaks (Figure 4). In one morphing sequence (Figure 3a), the face of a doll (Doll A, Pongratz Puppen) was morphed into a 19-year-old Japanese female’s face (Human A). This morphing sequence produced a positive rather than a negative peak at 80% real human (Figure 4, open circles). The second morphing sequence, in which the face of another doll (Doll B, BP053-1, CITITOY) was morphed into a one-year-old Japanese female’s face (Human G), yielded the highest score at 60% real human without producing a negative peak (Figure 4, filled squares). The images based on Doll B are not shown in Figure 3 due to copyright considerations. For the third morphing sequence (Figure 3b), in which a computer graphics (CG) image of an adult female face (CG A, Poser2 model bundled in Poser4, Curious Labs Inc.) was morphed into a 21-year-old Japanese female’s face (Human B), the pleasantness score increased monotonically with an increasing degree of realism (Figure 4, open triangles).

Figure 3. Examples of stimuli used in Experiment 1. (a) Morphing sequence from Doll A to Human A, (b) Morphing sequence from CG A to Human B, and (c) Morphing sequence from CG B to Human A, with morphing percentages of 0, 30, 50, 70, and 100.

Thirty-seven participants (mean age 24.9 years, 15 female) rated the pleasantness scores for another morphing sequence in which CG B (Aiko 3.0, DAZ Productions, Inc., Figure 3c) was morphed into Human A. To improve the precision of the data analysis, the step size of the morphing ratio was halved (i.e., 5%), and sixty-eight landmarks were used for the eyes to achieve smoother morphing. In spite of these improvements, however, there was no clear indication of a negative peak corresponding to the uncanny valley (Figure 4, open diamonds). Although the pleasantness scores were significantly influenced by the percentage of real human (F(20, 720) = 2.48, p < .001), a multiple comparison revealed that a significant difference was obtained only between the scores for 0% real human and 70% real human (p < .05).

Figure 4. Pleasantness scores averaged across participants (Experiment 1) for the Doll A–Human A sequence (circles), for the Doll B–Human G sequence (squares), for the CG A–Human B sequence (triangles), and for CG B–Human A sequence (diamonds). Error bars are 95% confidence intervals.

Although the morphing sequences produced different tendencies, none of the four types of morphing sequences showed that an almost perfectly realistic human appearance was a sufficient condition for the uncanny valley to emerge.

3.2 Experiment 2

In Mori’s (1970) hypothetical graph (Figure 1), robots were supposed to have no resemblance to real humans at the lowest level of realism (e.g., industrial robots equipped with only manipulators). On the other hand, the stimuli used in Experiment 1 had reasonably realistic human appearances even at 0% real human. The morphing ratio of 0% did not imply that the face had no resemblance to real humans; it only indicated that the image was the same as the original image of an artificial face. Thus, one may argue that Experiment 1 failed to detect the uncanny valley because we tested only a limited range of realism, denoted as A in Figure 1. Furthermore, the images were somewhat unrealistic even at the morphing ratio of 100%, because they were low-resolution images presented on computer displays. So it is possible that Experiment 1 instead may only have tested the range of realism denoted as B in Figure 1, where the uncanny valley was not involved. In Experiment 2, we tested whether the failure to detect the uncanny valley in Experiment 1 was due to an inappropriate range of realism in the stimuli.

Figure 5. Examples of stimuli used in Experiment 2. (a) Morphing sequence from Doll A to Human A, where the eyes were morphed first and then the head. Morphing ratios were 0–0, 60–0, 100–0, 100–60, and 100–100 (eyes %–head %). (b) Sequence where the head was morphed first.

Forty-five participants (mean age 23.6 years, 22 female) observed images from morphing sequences where the eyes and head (i.e., facial regions other than the eyes) were asynchronously morphed. In the eyes first sequence (Figure 5a), only the eyes of Doll A were morphed into those of Human A while the head was unchanged, resulting in realistic human eyes in an artificial head. Next, the artificial doll head was morphed into that of Human A, resulting in a wholly realistic human face. Participants gave the lowest pleasantness score when the eyes were 100% real human and the head was 0% real human. This morphed image had higher realism than the unmorphed image of Doll A because of its real human eyes, and lower realism than the unmorphed image of Human A because of its artificial head. In the head first sequence (Figure 5b), the head was morphed first and then the eyes. Participants gave the lowest pleasantness score when the head was 100% real human and the eyes were 0% real human. This face also had higher realism than the unmorphed image of Doll A because of its real human head, and lower realism than the unmorphed image of Human A because of its artificial eyes. Thus, in these two types of morphing sequences, the negative peaks were found where the degrees of realism were between those of Doll A and Human A. As shown in Figure 6, the percentage of real human significantly influenced the pleasantness score (Fs(10, 440) ≥ 28.0, ps < .001), and the lowest scores were significantly lower than those for the unmorphed images of Doll A and Human A (ps < .001).

Figure 6. Pleasantness scores averaged across participants (Experiment 2) for the eyes first sequence (squares), and for the head first sequence (triangles). Error bars are 95% confidence intervals.

The negative peaks found in Experiment 2 give empirical evidence for the emergence of the uncanny valley within a range comparable to the range tested in Experiment 1. Therefore, the failure to detect the uncanny valley in Experiment 1 is not attributable to the limited range of realism.

The uncanny valley emerged when the eyes and the head showed the largest mismatch in the degree of realism. Such mismatched realism was not presented in Experiment 1, where the facial features were morphed synchronously and the uncanny valley did not emerge. This suggests that mismatched realism may be a necessary condition for the uncanny valley’s emergence. However, we suspect that abnormalities in the stimuli, rather than the mismatched realism per se, were the direct cause of the emergence of the uncanny valley. Doll A’s face is abnormal if it is viewed as a real human face, in the sense that its features remarkably deviate from a real human. Nevertheless, the pleasantness scores for the unmorphed image of Doll A were not significantly lower than zero (one-tailed t tests, Experiment 1, t(48) = .60, p > .05; Experiment 2, t(44) = .17, p > .05). This suggests that the deviation may have been viewed as an artistic representation rather than abnormality, although Doll A’s potential attractiveness for children (Zeit, 1992) may have been underestimated by adult participants. However, the faces at the bottom of the uncanny valley may have been judged abnormal due to the mismatched realism between the eyes and the head. If the eyes were judged based on the head’s realism, Human A’s eyes in Doll A’s head may have been judged abnormal as a doll’s eyes (Figure 5a) and Doll A’s eyes in Human A’s head may have been judged abnormal as real human eyes (Figure 5b). Such judgments about the eyes, with reference to the head, are consistent with past findings that visual features of the head can influence the perceptual processing of the eyes (Hietanen, 1999; Kontsevich & Tyler, 2004; Langton, 2000; Seyama & Nagayama, 2002, 2005). The abnormalities induced by the mismatched realism may have produced unpleasant impressions, which in turn produced the uncanny valley.

The same reasoning holds true for the judgments of the abnormality of the head based on the realism of the eyes. However, it is still uncertain whether the perceptual processing of the head is influenced by the appearance of the eyes.

3.3 Experiment 3

If the uncanny valley reflects unpleasant impressions of abnormalities, then any type of abnormality besides mismatched realism should also produce the uncanny valley. Among various factors that can make faces bizarre (see, e.g., Murray, Rhodes, & Schuchinsky, 2003), we tested the effect of abnormal eye size using two morphing sequences in Experiment 3 (Figure 7).

Figure 7. Examples of stimuli used in Experiment 3. (a) Morphing sequence from Doll A to Human A with a manipulation of eye size (Doll A, Doll A with eyes scaled to 150%, 50% morph between Doll A and Human A both with 150% eyes, Human A with 150% eyes, and Human A). (b) Morphing sequence from CG B to Human B.
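The stimulus manipulations for Experiment 3 combine two geometric operations: enlarging the eyes to 150% and morphing an artificial face toward a human face by a given percentage. The sketch below illustrates both on landmark coordinates only. It rests on assumptions that the text does not confirm: that scaling is applied about the centroid of the eye landmarks, and that morphing is a linear interpolation between corresponding landmarks. The function names and the coordinates are hypothetical, and pixel-value blending is omitted.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def scale_about_centroid(points: List[Point], factor: float) -> List[Point]:
    """Scale landmark points about their centroid (factor 1.5 means 150%)."""
    if not points:
        raise ValueError("points must not be empty")
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(cx + factor * (x - cx), cy + factor * (y - cy)) for x, y in points]


def morph_points(artificial: List[Point], human: List[Point],
                 percent_human: float) -> List[Point]:
    """Linearly interpolate corresponding landmarks.

    percent_human = 0 returns the artificial face, 100 the human face.
    """
    if len(artificial) != len(human):
        raise ValueError("landmark lists must correspond one to one")
    t = percent_human / 100.0
    return [(ax + t * (hx - ax), ay + t * (hy - ay))
            for (ax, ay), (hx, hy) in zip(artificial, human)]


# Hypothetical eye landmarks in pixels (not taken from the study).
doll_eye = [(90.0, 100.0), (110.0, 100.0), (100.0, 95.0), (100.0, 105.0)]
human_eye = [(92.0, 102.0), (108.0, 102.0), (100.0, 99.0), (100.0, 105.0)]

# Enlarge both eyes to 150%, then take the 50% morph between them.
big_doll_eye = scale_about_centroid(doll_eye, 1.5)
big_human_eye = scale_about_centroid(human_eye, 1.5)
half_morph = morph_points(big_doll_eye, big_human_eye, 50)

print(big_doll_eye)
print(half_morph)
```

Scaling before morphing mirrors the order of the images listed in the Figure 7 caption, where both source faces carry 150% eyes before the 50% morph is taken.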

Forty participants (mean age 22.5 years, 33 female) rated images from the Doll A–Human A sequence (Figure 7a), and thirty-nine participants (mean age 24.2 years, 19 female) rated images from the CG B–Human B sequence (Figure 7b).

In each sequence, the eye size of an artificial face first increased from the original size (100%) up to 150%; however, the artificial faces did not produce unpleasant impressions (i.e., negative scores) even when their eyes were scaled to 150% (Figure 8, left plot area). The eye size did not significantly influence the pleasantness score for Doll A (F(5, 195) = 1.49, p > .05). Although the eye size significantly influenced the pleasantness score for CG B (F(5, 190) = 7.91, p < .001), the score for 150% eye size was not significantly different from that for 100% eye size (p > .05).

After the eye size was scaled to 150%, each artificial face was morphed into a real human face with 150% eyes. Pleasantness scores for the faces with 150% eyes decreased with increasing degree of realism (Figure 8, middle plot area). The Doll A–Human A sequence produced the lowest score for 100% real human face with enlarged eyes. Although the CG B–Human B sequence produced the lowest score for 80% real human, the scores for 60–100% were not significantly different from one another (ps > .05).

Finally, the enlarged eyes of the real human face were reduced to their original size. The pleasantness scores increased with decreasing eye size (Figure 8, right plot area). For each image sequence, the eye size significantly influenced the pleasantness score (F(5, 195) = 49.73 for Human A; F(5, 190) = 51.33 for Human B; ps < .001), and the score for 150% eye size was significantly different from that for 100% eye size (ps < .001). As a result, the uncanny valley emerged around the real human faces with 150% eyes. For the faces with varying degree of realism, the eyes scaled to 150% and the head were morphed synchronously in each morphing sequence. Thus, the abnormalities induced by the mismatched realism do not account for the results.

Figure 8. Pleasantness scores averaged across participants (Experiment 3). Upper abscissa: eye size scaling factor. Lower abscissa: degree of realism. Circles: Doll A to Human A. Triangles: CG B to Human B. Error bars are 95% confidence intervals.

The results suggest that an interaction between the abnormal eye size and realism produced negative peaks that are comparable to Mori’s (1970) uncanny valley only for the faces with higher degrees of realism. It should be pointed out, however, that in each morphing sequence the degree of realism covaried with other facial characteristics such as gender, age, and expression. Thus, it is possible that the factor that interacted with the abnormality in Experiment 3 was a covarying factor rather than the realism.

To test this possibility, we presented the face stimuli with 100% eye size to 25 naive raters (undergraduates at Tokyo National University of Fine Arts and Music and Rikkyo University), and asked them to judge the realism, gender, facial expression, and age of each face. For the realism judgment, raters compared two faces presented side by side, and all raters consistently judged that the human faces were more realistic than the artificial faces (ps < .0001, binomial tests). This suggests that the realism in the two morphing sequences increased from left to right along the abscissa of Figure 8 (middle plot area).

The raters’ judgments about gender and facial expression were not consistent between the two morphing sequences. In the gender judgment, 18 of the 25 raters judged Doll A as male, and all raters judged Human A as female (ps < .05, binomial tests). For CG B and Human B, all raters judged both faces as female (p < .0001). For the facial expression judgment, the raters chose the most suitable expression for each face among happy, sad, surprised, angry, disgusted, fearful (six basic emotions; Ekman & Friesen, 1975) and neutral expressions. The overall patterns of the expression choice were significantly different between Doll A and Human A (p < .0001, Fisher’s exact test), but were not significantly different between CG B and Human B (p = .79). Therefore, neither gender nor facial expressions consistently explain the results for the two morphing sequences.

For the judgments about age, the raters consistently judged that the human faces were older than the artificial faces. The raters observed two faces presented side by side, and chose the older one. All raters judged that Human A was older than Doll A, and 22 of the 25 raters judged that Human B was older than CG B (ps < .001, binomial tests). The raters also estimated the age in years for each face. The mean estimated ages for Human A (24.0 years, SD = 1.7) and Doll A (5.6 years, SD = 2.2) were significantly different (t(24) = 35.0, p < .001). The mean estimated ages for Human B (24.5 years, SD = 4.1) and CG B (20.6 years, SD = 4.7) were also significantly different (t(24) = 4.15, p < .001).

Thus, one may argue that what produced the uncanny valley in Figure 8 was the age rather than the realism interacting with abnormal eye size. It should also be pointed out that, even if the realism played a major role in Experiment 3, there still remains a possibility that the realism interacted with confounding factors other than abnormal eye size. The possible influences of confounding factors were investigated further in Experiment 4.

3.4 Experiment 4

To minimize the influence of confounding factors, we used images of artificial faces that were as similar as possible to real human faces. The artificial face images were produced by mapping the textures of artificial faces onto real human faces (Figure 9). In other words, shapes of artificial face images were warped into those of real human faces without changing their textures. As is shown later, most raters judged these stimuli to be artificial faces in spite of the realistic humanlike shapes. In producing the stimuli, four human face images (2 male and 2 female Japanese, age 21 to 22 years; Humans C, D, E, and F) were paired with four doll face images (Dolls C, D, E, and F). Doll C was a classic Japanese doll photographed by the authors, and Dolls D–F were images of masks obtained from a commercially available photo collection (Mask by Corel, SKU:CPH480). Based on these four Doll–Human pairs, stimuli were produced in a manner similar to Experiment 3. For simplicity, the eye size was scaled only to 100%, 125%, and 150%, and the doll faces with 150% eye size were morphed into the human faces with 150% eye size in only five steps (0, 25, 50, 75, and 100% real human). In addition, morph images between the artificial and real human faces with 100% eye size were also produced to measure the pleasantness scores for normal eye size. Forty-seven participants (mean age 20.6 years, 26 female, 20 male, one did not report his/her gender) rated their impressions of the pleasantness of these stimuli.

Figure 9. Examples of stimuli used in Experiment 4. From left to right columns, dolls with 100% eye size, dolls with 150% eye size, 50% morphs between doll and human faces with 150% eye size, humans with 150% eye size, and humans with 100% eye size.

Figure 10 shows the mean pleasantness scores in the same manner as Figure 8. As shown in the left plot area, the effect of eye size was not consistent across the four dolls. The eye size significantly influenced the pleasantness score only for Dolls C and E (Fs(2, 92) > 5.88, ps < .01). In contrast, the eye size influenced the pleasantness score in a consistent manner across the four human faces (Figure 10, right plot area). For the human faces, the pleasantness score decreased with increasing eye size (Fs(2, 92) > 27.8, ps < .001), and the eyes scaled to 150% produced the most unpleasant impression.

Figure 10. Pleasantness scores for faces with eyes scaled to 150% (Experiment 4). Upper abscissa: eye size scaling factor. Lower abscissa: degree of realism. Squares: Doll C to Human C. Diamonds: Doll D to Human D. Triangles: Doll E to Human E. Circles: Doll F to Human F. Error bars are 95% confidence intervals.

Figure 11. Pleasantness scores for faces with 100% eye size (Experiment 4). Squares: Doll C to Human C. Diamonds: Doll D to Human D. Triangles: Doll E to Human E. Circles: Doll F to Human F. Error bars are 95% confidence intervals.

Although the results of Experiment 4 did demonstrate how the impression of faces with a 150% eye size varied as a function of the morphing percentage (Figure 10, middle plot area), the influence of various confounding factors must be removed from these results to isolate the effects of abnormal eye size and realism. Figure 11 shows the pleasantness scores for faces with 100% eye size. These scores may also reflect the influence of confounding factors. Thus, by subtracting the scores in Figure 11 from those for 150% eye size presented in Figure 10 (middle plot area), the influence of confounding factors can be eliminated (calibrated scores). For all four morphing sequences (solid lines in Figure 12), the calibrated scores decreased with increasing morphing percentage (Fs(4, 184) > 20.71, ps < .001).
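The calibration described above is a per-level subtraction: at each morphing percentage, the score for 100% eye size is subtracted from the score for 150% eye size. The sketch below shows that arithmetic together with a check of the reported degrees of freedom, assuming a one-way repeated-measures design with k levels and n participants, for which the degrees of freedom are (k − 1, (k − 1)(n − 1)). The function names are hypothetical and the score values are placeholders, not data from the study.

```python
from typing import Dict, Tuple


def calibrated_scores(scores_150: Dict[int, float],
                      scores_100: Dict[int, float]) -> Dict[int, float]:
    """Subtract the 100%-eye-size score from the 150%-eye-size score
    at each morphing percentage."""
    if scores_150.keys() != scores_100.keys():
        raise ValueError("both conditions need the same morphing levels")
    return {p: scores_150[p] - scores_100[p] for p in sorted(scores_150)}


def rm_anova_df(n_participants: int, n_levels: int) -> Tuple[int, int]:
    """Degrees of freedom for a one-way repeated-measures ANOVA."""
    return (n_levels - 1, (n_levels - 1) * (n_participants - 1))


# Placeholder mean scores (NOT data from the study), keyed by % real human.
eyes_150 = {0: 0.25, 25: 0.0, 50: -0.5, 75: -1.0, 100: -1.25}
eyes_100 = {0: 0.25, 25: 0.25, 50: 0.5, 75: 0.5, 100: 0.75}

print(calibrated_scores(eyes_150, eyes_100))
# 47 participants and 5 morphing levels, as in Experiment 4.
print(rm_anova_df(47, 5))
```

With 47 participants and five morphing levels this gives (4, 184), which agrees with the Fs(4, 184) reported for the calibrated scores.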

Figure 12. Calibrated scores (Experiment 4) obtained by subtracting the scores for 100% eye size from those for 150% eye size. Filled squares: Doll C to Human C. Diamonds: Doll D to Human D. Triangles: Doll E to Human E. Open circles: Doll F to Human F. Error bars are 95% confidence intervals. Filled circles and open squares connected with broken lines represent the scores for Doll A–Human A sequence and CG B–Human B sequence, respectively.

Figure 12 also shows the calibrated scores for the Doll A–Human A and CG B–Human B sequences (broken lines). For the Doll A–Human A sequence, the calibrated scores were obtained by subtracting the results of Experiment 1, scores for 100% eye size (Figure 4), from those of Experiment 3, scores for 150% eye size (middle plot area in Figure 8). For the CG B–Human B sequence, the calibrated scores were obtained based only on the results of Experiment 3. In Experiment 3, the pleasantness scores for 100% eye size were not measured at intermediate morphing percentages. Thus, the calibrated scores for this morphing sequence were obtained only at 0 and 100% real human.

As can be seen in Figure 12, the calibrated scores for the artificial faces (0% real human) were close to zero, indicating that the 100% eye size and the 150% eye size yielded similar pleasantness scores. This suggests that the scaling of the eyes from 100% to 150% had only a weak influence on the impression created by artificial faces. On the other hand, for the human faces (100% real human), all six morphing sequences produced the lowest calibrated scores, which were lower than zero. This suggests that the abnormally scaled eyes decreased pleasantness scores the most when the faces were the most realistic.

Although the interpretation of the ordinate of Figure 12 (i.e., the calibrated score) is not confounded, there still remains the possibility noted earlier, that the abscissa may represent variation in a confounding factor other than realism. To test this possibility, 45 naive raters (undergraduates at Tokyo National University of Fine Arts and Music) judged the gender, expression, age, and realism of the stimuli in the same manner as Experiment 3. All raters judged the gender correctly for Humans D (male), E (female), and F (male), and only three were incorrect for Human C (female) (ps < .0001, binomial tests). The gender judgment for Human D was not significantly different from that for Doll D, which 43 raters judged as male (p = .49, Fisher's exact test). Thus, gender does not account for the results for the Doll D–Human D sequence. On the other hand, the genders of the other doll faces were judged less consistently, and the male/female judgment ratios were significantly different between the doll and human faces (ps < .01, Fisher's exact tests). The Doll C–Human C sequence and the Doll E–Human E sequence showed increasing femininity, and the Doll F–Human F sequence showed increasing masculinity. In spite of these opposite variations in gender, all doll–human sequences yielded the same trends for impressions of pleasantness (Figure 12). Therefore, the factor of gender does not explain the results.

For the facial expression judgment, most raters chose either a happy or neutral expression for each face, and the overall patterns of the expression choices were not significantly different between each of the paired doll and human faces (ps > .39, Fisher's exact test) except between Doll C and Human C (p < .05). As noted earlier, the facial expression judgments in Experiment 3 were significantly different between Doll A and Human A, but not significantly different between CG B and Human B. Thus, among the six morphing sequences tested in Experiments 3 and 4, the factor of facial expression does not explain the results for the four morphing sequences. For Doll C, 30 of the 45 raters chose the neutral expression, but this ratio was reduced to 16/45 for Human C, suggesting that the human face appeared to be more expressive than the artificial face.
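The exact tests used for the gender judgments in this section can be reproduced from the counts given in the text. The sketch below is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the binomial test assumes a chance level of 0.5, and both tests use the common "sum of outcomes no more probable than the observed one" definition of a two-sided p value, which is an assumption about how the reported values were computed.

```python
from math import comb


def binomial_two_sided(k: int, n: int) -> float:
    """Exact two-sided binomial test of k successes in n trials against chance (p = 0.5)."""
    observed = comb(n, k)
    # Under p = 0.5 each outcome i has probability comb(n, i) / 2**n,
    # so outcomes can be ranked by their integer weights.
    extreme = sum(comb(n, i) for i in range(n + 1) if comb(n, i) <= observed)
    return extreme / 2 ** n


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Exact two-sided Fisher test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c

    def weight(x: int) -> int:
        # Unnormalized hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x)

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(lowest, highest + 1) if weight(x) <= observed)
    return extreme / comb(row1 + row2, col1)


# Human C: 42 of 45 raters judged the gender correctly.
p_human_c = binomial_two_sided(42, 45)

# Human D: 45 male / 0 female judgments; Doll D: 43 male / 2 female judgments.
p_doll_d_vs_human_d = fisher_two_sided(45, 0, 43, 2)

print(f"binomial p = {p_human_c:.2e}")
print(f"Fisher p   = {p_doll_d_vs_human_d:.2f}")
```

With these counts the Fisher test gives p = .49 for Doll D versus Human D, matching the value reported above, and the binomial p value for Human C falls far below .0001.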

In contrast, the neutral expression was chosen by only 5 of the 25 raters for Doll A, but by 21 of the 25 raters for Human A, suggesting that the artificial face was more expressive than the human face. Because of this inconsistency between the two morphing sequences, it is difficult to explain the results shown in Figure 12 based on the facial expression.

For the age judgment, the raters judged that Human C was older than Doll C (p < .001, binomial test). For the other doll–human pairs, they judged that the dolls were older than the humans (ps < .001). Thus, among the six morphing sequences tested in Experiments 3 and 4, human faces were perceived to be older than artificial faces in three morphing sequences, but the opposite was true in the other three morphing sequences. Despite these inconsistent age judgments, the six morphing sequences yielded similar results for the calibrated scores as shown in Figure 12. Therefore, the factor of age does not solely explain the results.

In contrast to the inconsistent judgments of gender, age, and facial expression, the raters consistently judged that the human faces were more realistic than the artificial faces in all six morphing sequences tested in Experiments 3 and 4 (ps < .0001, binomial tests). Thus, among the four factors considered here, only realism can consistently explain the results as the factor that interacted with the abnormal eye size. Therefore, we may interpret the abscissa of Figure 12 as representing the degree of realism.

4 General Discussion

It is assumed that the human visual system involves sophisticated mechanisms for processing facial information (e.g., Haxby, Hoffman, & Gobbini, 2000). Such mechanisms seem to be broadly tuned to faces with various degrees of realism. Real human faces, artificial faces of dolls and robots, computer generated facial images, schematic line drawings of faces, and even simple face-like patterns consisting of simple geometric shapes (e.g., Robert & Robert, 2000; Turati, 2004) are all accepted as "faces." Past studies showed that facial images with different degrees of realism often yielded comparable experimental results (e.g., Driver et al., 1999; Friesen & Kingstone, 1998; Wilson, Loffler, & Wilkinson, 2002; Yin, 1969). Nevertheless, in daily life people rarely confuse artificial faces with real human faces; people do not ask a mannequin in a store window for directions to a train station. This suggests that the visual system has sensitivity to the degree of realism of faces.

The present study investigated an effect of the degree of realism on the impression of pleasantness of artificial human faces, and in particular investigated the uncanny valley hypothesis proposed by Mori (1970). The results of our experiments showed that the uncanny valley actually emerged as Mori (1970) had predicted. However, our results also showed that the uncanny valley emerged only when the face images involved abnormal features. Thus, to fully understand the nature of the uncanny valley, we need to consider the effects of both the realism and the abnormality of artificial human appearance. For example, if human observers have unpleasant impressions of the faces of avatars in virtual reality or robots, the unpleasantness should not be attributed solely to the degree of realism. The physical appearance of such avatars and robots probably involves certain abnormal visual features. Thus, improving the degree of realism of such avatars and robots without removing the abnormal features may simply lead to an exaggeration of the human observers' unpleasant impressions of the artificial faces.

Although the degree of abnormality investigated in Experiments 3 and 4 (i.e., scaling of the eyes to 150%) was identical for artificial and real faces, its impact was greater for faces with higher realism. Participants may have judged that the eyes scaled to 150% were too large for real human eyes, but such eyes were acceptable as artificial human eyes. This implies that the judgment criterion for real faces was different from that for artificial faces. The human visual system may have knowledge about how eye size varies among humans (i.e., data of the statistical distribution of the size of real human eyes) from past experience, and such knowledge can serve as a judgment criterion of abnormality. If the eye size on a face deviates from the center of the statistical distribution of normal eye sizes, such a face may be judged abnormal. On the other hand, knowledge about how eye size varies among artificial faces may constitute another judgment criterion of abnormality. In fact, artificial faces can have arbitrary eye sizes depending on the designers' intentions, and past experience in observing such artificial eyes may constitute statistical knowledge that is different from that about real human faces. It should be pointed out that the participants' tolerance of the huge eyes on the artificial faces might have reflected their cultural background. Since most of the participants were Japanese, their judgment criterion may have been formed through their experience in watching Japanese-style movies, computer games, and comics (manga), in which huge eye size designs have been frequently employed.

Alternatively, the uncanny valley may indicate that perceptual sensitivity to facial features was higher for real faces than for artificial faces, and the higher sensitivity for real faces produced unpleasant impressions of abnormality while the lower sensitivity for artificial faces did not. Sensitivity to facial features is known to be better for familiar faces than for unfamiliar faces (Walker & Tanaka, 2003). Thus, the results of Experiments 3 and 4 may reflect the fact that participants were more familiar with real faces than with artificial faces.

Although we tested only static images, Mori (1970) noted that robots' motion would also influence the uncanny valley. If the judgment criterion of an abnormality in motion is different for real and artificial human appearances (Hodgins, O'Brien, & Tumblin, 1998), then abnormality in motion would produce the uncanny valley depending on the degree of realism.

Further studies are necessary to unveil other aspects of the effect of realism on facial perception and cognition. Such studies will provide clues to further understanding human responses to artificial human-like objects (e.g., Arita, Hiraki, Kanda, & Ishiguro, 2005; Breazeal, 2003; Garau, Slater, Pertaub, & Razzaque, 2005; Hinds et al., 2004; Minato, MacDorman, et al., 2004; Minato, Shimada, et al., 2004; Shinozawa, Naya, Yamato, & Kogure, 2005). One of our unanswered questions is how the human visual system extracts information about realism from the visual features of face images. The lack of this knowledge made it difficult to effectively manipulate only the degree of realism in our stimuli. Since we simply morphed artificial faces into real human faces, various confounding factors covaried with the realism. Although we showed that the realism explains the results of the present study better than the confounding factors of gender, age, and facial expression, there still remains a possibility that an interaction between abnormality and an untested confounding factor better explains the results. The participants' task in the present study can be interpreted as a judgment of facial attractiveness (or unattractiveness). Researchers have shown that facial attractiveness is influenced by various factors, such as averageness, youthfulness, symmetry, hairstyle, and skin smoothness (see Rhodes & Zebrowitz, 2002 for review). The influences of such factors on the results of the present study are still unclear.

The distinction between realism and abnormality is not so straightforward. We have operationally defined the degree of realism as the morphing percentage; that is, the similarity of a morphed image to the photograph of a real human face that was used as a source image for the morphing sequence. In the actual definition of realism employed by the human visual system, the similarity may be measured between an observed (artificial) face and a certain standard face. The average face of real humans (e.g., Rhodes et al., 2001; Rhodes & Zebrowitz, 2002) may serve as the standard face, since unrealistic artificial faces are supposed to have visual features that deviate considerably from those of the average face of real humans. It should be noted, however, that the degree of abnormality (or normality) may also be defined based on similarity to the average (or normal) face, since abnormal faces are supposed to have deviant visual features. In spite of the similarity between realism and abnormality, the results of the present study suggest that the human visual system processes realism and abnormality as separate perceptual dimensions. In other words, the human visual system may define realism and abnormality based on different visual features.
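The account discussed earlier in this section, in which abnormality is judged against separate statistical knowledge for real and artificial faces, can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the `Criterion` class, the distribution parameters, and the threshold are hypothetical and are not estimates from the present experiments. It shows only how one and the same 150% eye size could exceed a criterion for real faces while staying within a broader criterion for artificial faces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """Assumed knowledge of how eye size is distributed within one face category."""
    mean: float  # typical eye size, in percent of the unscaled size
    sd: float    # spread of eye sizes encountered in past experience

    def deviation(self, eye_size: float) -> float:
        """Distance from the center of the distribution, in standard-deviation units."""
        return abs(eye_size - self.mean) / self.sd


# Hypothetical parameters, chosen only to illustrate the argument.
REAL_FACES = Criterion(mean=100.0, sd=10.0)        # narrow range of real human eye sizes
ARTIFICIAL_FACES = Criterion(mean=120.0, sd=30.0)  # designers use a much wider range
THRESHOLD = 2.0                                    # hypothetical cutoff for "abnormal"


def judged_abnormal(eye_size: float, criterion: Criterion, threshold: float = THRESHOLD) -> bool:
    """Return True when the eye size lies farther from the center than the cutoff allows."""
    return criterion.deviation(eye_size) > threshold


for label, criterion in (("real", REAL_FACES), ("artificial", ARTIFICIAL_FACES)):
    print(label, judged_abnormal(150.0, criterion))
```

Under these assumed parameters the 150% eye size lies five standard deviations from the center for real faces but only one for artificial faces, so only the real-face criterion flags it as abnormal.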

Acknowledgments

We thank S. Akamatsu for assistance in producing the morphed facial images; M. Nomura, K. Hasegawa, I. Ito, K. Taki, and K. Enokida for assistance in recruiting participants; A. Tanaka and D. Norman for comments; and E. Pongratz for permission to use the photo of Doll A. The preparation of this manuscript was supported by MEXT.KAKENHI (17730423).

References

Arita, A., Hiraki, K., Kanda, T., & Ishiguro, H. (2005). Can we talk to robots? Ten-month-old infants expected interactive humanoid robots to be talked to by persons. Cognition, 95(3), B49–B57.

Aylett, R. S. (2004). Agents and affect: Why embodied agents need affective systems. Lecture Notes in Computer Science, 3025, 496–504.

Breazeal, C. (2003). Emotion and sociable humanoid robots. International Journal of Human-Computer Studies, 59(1–2), 119–155.

Cañamero, L., & Fredslund, J. (2001). I show you how I like you: Can you read it in my face? IEEE Transactions on Systems, Man, and Cybernetics Part A: Systems and Humans, 31(5), 454–459.

DiSalvo, C. F., Gemperle, F., Forlizzi, J., & Kiesler, S. (2002). All robots are not created equal: The design and perception of humanoid robot heads. Proceedings of the DIS Conference, 321–326.

Driver, J., Davis, G., Ricciardelli, P., Kidd, P., Maxwell, E., & Baron-Cohen, S. (1999). Gaze perception triggers reflexive visuospatial orienting. Visual Cognition, 6, 509–540.

Ekman, P., & Friesen, W. V. (1975). Unmasking the face: A guide to recognizing emotions from facial clues. Englewood Cliffs, NJ: Prentice-Hall.

Fabri, M., Moor, D., & Hobbs, D. (2004). Mediating the expression of emotion in educational collaborative virtual environments: An experimental study. Virtual Reality, 7(2), 66–81.

Fong, T., Nourbakhsh, I., & Dautenhahn, K. (2003). A survey of socially interactive robots. Robotics and Autonomous Systems, 42(3–4), 143–166.

Friesen, C. K., & Kingstone, A. (1998). The eyes have it! Reflexive orienting is triggered by nonpredictive gaze. Psychonomic Bulletin & Review, 5(3), 490–495.

Garau, M., Slater, M., Pertaub, D.-P., & Razzaque, S. (2005). The responses of people to virtual humans in an immersive virtual environment. Presence: Teleoperators and Virtual Environments, 14(1), 104–116.

Hara, F. (2004). Artificial emotion of face robot through learning in communicative interactions with humans. Proceedings of the 2004 IEEE Workshop on Robot and Human Interactive Communication, 20–22.

Haxby, J. V., Hoffman, E. A., & Gobbini, M. I. (2000). The distributed human neural system for face perception. Trends in Cognitive Sciences, 4(6), 223–233.

Hietanen, J. K. (1999). Does your gaze direction and head orientation shift my visual attention? NeuroReport, 10(16), 3443–3447.

Hinds, P. J., Roberts, T. L., & Jones, H. (2004). Whose job is it anyway? A study of human-robot interaction in a collaborative task. Human-Computer Interaction, 19(1–2), 151–181.

Hodgins, J. K., O'Brien, J. F., & Tumblin, J. (1998). Perception of human motion with different geometric models. IEEE Transactions on Visualization and Computer Graphics, 4(4), 307–316.

Kobayashi, H., Ichikawa, Y., Senda, M., & Shiba, T. (2003). Realization of realistic and rich facial expressions by face robot. Proceedings of the 2003 IEEE International Conference on Intelligent Robots and Systems, 1123–1128.

Kontsevich, L. L., & Tyler, C. W. (2004). What makes Mona Lisa smile? Vision Research, 44(13), 1493–1498.

Langton, S. R. H. (2000). The mutual influence of gaze and head orientation in the analysis of social attention direction. Quarterly Journal of Experimental Psychology, 53A(3), 825–845.

Minato, T., MacDorman, K. F., Shimada, M., Itakura, S., Lee, K., & Ishiguro, H. (2004). Evaluating humanlikeness by comparing responses elicited by an android and a person. Proceedings of the 2nd International Workshop on Man-Machine Symbiotic Systems, 373–383.

Minato, T., Shimada, M., Ishiguro, H., & Itakura, S. (2004). Development of an android robot for studying human-robot interaction. Lecture Notes in Computer Science, 3029, 424–434.

Mori, M. (1970). Bukimi no tani [The uncanny valley]. Energy, 7(4), 33–35.

Murray, J. E., Rhodes, G., & Schuchinsky, M. (2003). When is a face not a face? The effects of misorientation on mechanisms of face perception. In M. A. Peterson & G. Rhodes (Eds.), Perception of faces, objects, and scenes: Analytic and holistic processes (pp. 75–91). New York: Oxford University Press.

Norman, D. A. (2004). Emotional design: Why we love (or hate) everyday things. New York: Basic Books.

Reichardt, J. (1978). Robots: Fact, fiction, and prediction. Harmondsworth, Middlesex: Penguin Books Ltd.

Rhodes, G., Yoshikawa, S., Clark, A., Lee, K., McKay, R., & Akamatsu, S. (2001). Attractiveness of facial averageness and symmetry in non-Western cultures: In search of biologically based standards of beauty. Perception, 30(5), 611–625.

Rhodes, G., & Zebrowitz, L. A. (Eds.). (2002). Facial attractiveness: Evolutionary cognitive and social perspectives. Westport, CT: Ablex.

Robert, F., & Robert, J. (2000). Faces. San Francisco: Chronicle Books.

Sakagami, Y., Watanabe, R., Aoyama, C., Matsunaga, S., Higaki, N., & Fujimura, K. (2002). The intelligent ASIMO: System overview and integration. Proceedings of the 2002 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2478–2483.

Seyama, J., & Nagayama, R. S. (2002). Perceived eye size is larger in happy faces than in surprised faces. Perception, 31(8), 1153–1155.

Seyama, J., & Nagayama, R. S. (2005). The effect of torso direction on the judgment of eye direction. Visual Cognition, 12(1), 103–116.

Shinozawa, K., Naya, F., Yamato, J., & Kogure, K. (2005). Differences in effect of robot and screen agent recommendations on human decision-making. International Journal of Human-Computer Studies, 62(2), 267–279.

Turati, C. (2004). Why faces are not special to newborns: An alternative account of the face preference. Current Directions in Psychological Science, 13(1), 5–8.

Wages, R., Grünvogel, S. M., & Grützmacher, B. (2004). How realistic is realism? Considerations on the aesthetics of computer games. Lecture Notes in Computer Science, 3166, 216–225.

Walker, P. M., & Tanaka, J. W. (2003). An encoding advantage for own-race versus other-race faces. Perception, 32(9), 1117–1125.

Wilson, H. R., Loffler, G., & Wilkinson, F. (2002). Synthetic faces, face cubes, and the geometry of face space. Vision Research, 42(27), 2909–2923.

Woods, S., Dautenhahn, K., & Schultz, J. (2004). The design space of robots: Investigating children's views. Proceedings of the 2004 IEEE International Workshop on Robot and Human Interactive Communication, 20–22.

Yin, R. K. (1969). Looking at upside-down faces. Journal of Experimental Psychology, 81(1), 141–145.

Zeit, U. (1992). Künstler machen Puppen für Kinder: von Marion Kaulitz bis Elisabeth Pongratz. Duisburg, Germany: Verlag Puppen und Spielzeug.