Perception, 2006, volume 35, pages 1089-1105

DOI:10.1068/p5547

Information processing during face recognition: The effects of familiarity, inversion, and morphing on scanning fixations

Jason J S Barton¶§‡, Nathan Radcliffe#, Mariya V Cherkasova, Jay Edelman^, James M Intriligator
Department of Neurology (§ and Department of Ophthalmology), Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA 02215, USA (‡ also Department of Bioengineering, Boston University, Boston, MA 02215, USA); # Faculty of Medicine, Temple University, Philadelphia, PA 19122, USA; ^ Department of Psychology, Harvard University, Cambridge, MA 02138, USA
Received 26 April 2005, in revised form 6 October 2005; published online 9 June 2006

Abstract. Where we make ocular fixations when viewing an object likely reflects interactions between `external' object properties and internal `top-down' factors, as our perceptual system tests hypotheses and attempts to make decisions about our environment. These scanning fixation patterns can tell us how and where the brain gathers information critical to specific tasks. We determined the effects of the internal factors of expertise, experience, and ambiguity on scanning during a face-recognition task, in eight subjects. To assess the effects of expertise, we compared upright with inverted faces, since it is hypothesized that inverted faces do not access an orientation-dependent face-expert processor. To assess the effects of experience, we compared famous with novel faces, as famous faces would have stronger internal representations than anonymous ones. Ambiguity in matching seen and remembered faces was manipulated with morphed faces. We measured three classes of variables: (i) total scanning time and fixations; (ii) the spatial distribution of scanning; and (iii) the sequence of scanning, using first-order Markov matrices for local scan structure and string editing for global scan structure. We found that, with inverted faces, subjects redistributed fixations to the mouth and lower face, and their local and global scan structure became more random. With novel or morphed faces, they scanned the eyes and upper face more. Local scan structure was not affected by familiarity, but global scan structure was least random (most stereotyped) for novel upright faces. We conclude that expertise (upright faces) leads to less lower-face scanning and more predictable global patterns of information gathering. Experience (famous faces) leads to less upper-face scanning and more idiosyncratic global scan structures, suggesting a superseding influence of facial memories. With morphed faces, subjects return to the upper face to resolve ambiguity, implying a greater importance of this region in face recognition.

1 Introduction
The viewing of scenes is usually accompanied by a scanning series of fixations linked by saccades. These fixations are not randomly scattered but targeted to specific regions: thus, what is seen determines where we look. On the other hand, saccades shift both the locus of attention and the great resolving power of the fovea to new parts of the scene, generating new perceptual data. Hence, it is also true that where we look determines what we see. This interplay between sensory processing and eye movements renders vision not just a passive experience but also an active exploration of the world.
Scanning is thus both a manifestation of, and a mechanism for, information processing. Its relation to perception has long been studied. One early concept, the `scanpath', proposed that perception is built much like a string of beads (Noton and Stark 1971a, 1971c). Each fixation generates a small local view of the image tagged by the location of the fixation, and the overall percept is formed when these regional bits of perceptual data are assembled into a larger whole with the aid of the addresses of their tags. However, this concept does not account for certain aspects of normal scanning, such as the

¶ Address for correspondence: Jason J S Barton, Neuro-ophthalmology Section D, VGH Eye Care Center, 2550 Willow Street, Vancouver, BC V5Z 3N9, Canada; e-mail: [email protected]

great variability between different subjects viewing the same pictures (Walker-Smith et al 1977) or the fact that some subjects accurately perceive objects with a single fixation (Groner et al 1984). This latter point indicates that scanpaths are not always necessary for perception. While it has been suggested that in such cases internal `covert' shifts of attention might be substitutes for overt shifts of fixation (Noton and Stark 1971b, 1971c), this is unlikely when recognition occurs with very short viewing times (Groner et al 1984). Recent considerations of scanning have shifted the emphasis from its role in forming a percept to its role in making judgments about the image (Deco and Schurmann 2000; Rybak et al 1998). These stress the view that the goal of perception is to obtain specific knowledge about the world. Hence scanning acquires data for a goal-directed process aimed at some decision threshold. If so, analyzing the locations scanned can reveal the type of data critical to a perceptual decision. Furthermore, since scanning is a sequential process, it can also inform us about the temporal organization of such information processing. In a goal-driven process, aims are determined as much by the observer as by the stimulus. [This explains why scanpaths vary with both the stimulus being viewed and the observer doing the viewing (Noton and Stark 1971c).] Several factors within the observer can affect scanning. Prior experience affects the ease and rapidity with which a decision is reached. Thus, novel items require more fixations in more regions and generate different scanning sequences than familiar items (Althoff and Cohen 1999). The observer's motives influence what kind of information processing occurs. 
This can reflect pre-existing tendencies, contrasting art experts with non-professional viewers, for example (Zangemeister et al 1995), the contingencies of ongoing tasks, or the instructed goals in an experiment, such as making judgments about expression versus familiarity in faces (Althoff and Cohen 1999).
In this study we analyzed scanning to reveal specific aspects of information processing during face recognition. Several previous investigators have studied how normal subjects scan faces (Althoff and Cohen 1999; Groner et al 1984; Luria and Strauss 1978; Rizzo et al 1987; Walker-Smith et al 1977). We sought to advance upon these earlier data by using controlled manipulations of perceptual difficulty. By doing so, we aimed to reveal the effects of three internal (`top-down') observer factors: expertise, experience, and ambiguity in the information supporting the perceptual decision.
First, to study expertise we contrasted upright with inverted faces. Faces are more difficult to recognize upside down (Yin 1970), likely because inversion impairs the perception of facial structure (Barton et al 2001, 2003; Kemp et al 1990; Leder and Bruce 1998; Malcolm et al 2004; Searcy and Bartlett 1996; Sergent 1984). While some have suggested that inversion simply makes face encoding noisier (Valentine 1988), others have argued for a `dual-mode' hypothesis (Bartlett and Searcy 1993; Farah et al 1998). This proposes that, in addition to more generic object processing, there is a second expert mechanism used to process faces, which requires practice to become expert. Since almost all the faces we see are upright, this expert mechanism develops an orientation-selectivity, so that only the generic route processes inverted faces. We hypothesized that differences in processing modes would also be reflected in differences in scanning.
The goal of this manipulation was to discover the signature scanning effects of an expert face-processing mechanism, in the contrast between upright and inverted stimuli.
The second and third factors we examined, experience and ambiguity, are related. In face identification, the goal of viewing is to match the face currently seen to memories of previously encountered faces. The ease of this matching process is affected by both experience and ambiguity. First, past experience with famous faces leads to stronger internal representations of these faces in the observer's memory, making the match between perceived and remembered faces easier to accomplish, compared to only recently encountered (novel) faces. The contrast between famous and novel faces can reveal the effect of these internal representations. Thus, we contrasted famous with novel faces, as done in two other studies (Althoff and Cohen 1999; Rizzo et al 1987). Second, the match between the currently seen face and facial memories can be affected by ambiguity in the currently seen face, as with morphed faces. The ambiguity of morphed faces introduces more uncertainty into the process of reaching a decision as the observer tries to match the face to a memory. Thus, both novelty (through experience) and morphing (through ambiguity) can reveal the effects of a degraded `sample-to-memory' match on identity processing. We hypothesized that this degradation would alter scanning patterns. Furthermore, if the dual-mode hypothesis is correct, the effects of degraded identification matching should be distinct from the effects of inversion.

2 Methods
2.1 Subjects
We studied eight normal subjects, three female and five male, with a mean age of 31.5 years (range 23 to 41 years). None had met the individuals whose faces constituted the novel set. All subjects gave informed consent and protocols were approved by the institutional review board of our hospital.
2.2 Images
We created a set of eight morph series, where each series represented a gradual physical transformation from one face to a second. Four series were transitions between two male faces, and four were transitions between two female faces. In each gender set, two series used pairs of famous faces and two used pairs of novel faces. For famous faces, we selected eight images with a high-resolution frontal view from the internet. We rejected images with excess jewelry, make-up, facial hair, open mouths, or glasses. Subjects rated these faces for familiarity on a 10-point scale, with 0 being not familiar at all and 10 being extremely familiar. Averaging across subjects, the mean familiarity rating for all famous faces was 7.4 (SD 1.2), ranging from 5.6 for Matthew Broderick to 8.8 for John Travolta. For novel faces, we used a digital camera to take pictures of eight anonymous individuals, ensuring that their images matched the above criteria for famous faces. Using Photoshop 5.5 (Adobe, San Jose, CA), we replaced the backgrounds with a black surround, removed earrings, and cropped the images to eliminate the external hair contour and neck. Each image was set to a height of 330 pixels, and, because face shapes varied, widths ranged from 299 pixels to 392 pixels. Morph pairs were matched for hair color, facial shape, image size, image luminance, sex, and fame. The celebrity pairs were Demi Moore/Julia Roberts, Michelle Pfeiffer/Gwyneth Paltrow, John Travolta/Paul Newman, Matthew Broderick/Kevin Costner.
For each morph series the two base (unmorphed) images were loaded into Morph Studio 1.0, where 120 points were matched between each image on analogous facial regions (ie center of , corner of mouth). We created a morph series of 21 images, with each successive image representing a 5% change in position, hue, and contrast between the two base images (designated images 1 and 21). From each series we used the two base images (images 1 and 21, representing mixtures of {0 : 100}) and five morphed images from near the center of each series (images 9, 10, 11, 12, and 13, representing mixtures of {40 : 60}, {45 : 55}, and {50 : 50}, figure 1).
In a separate preliminary series of psychophysical experiments on categorical perception of these facial images, we had asked eight subjects to state the base face that each morphed image most closely resembled. Though there were small variations between different morph series, the average morph where subjects were equally likely to respond that an image belonged to either base face was close to the {50 : 50} morph, for both upright and inverted faces (figure 2). Thus, on average, the {50 : 50} morph was also the image of maximum ambiguity as to identity.

Figure 1. Example of a morph series. Upright and inverted images of morphs between Demi Moore (left, DM) and Julia Roberts (right, JR). From left to right the {DM : JR} morph mixtures are {100 : 0}, {60 : 40}, {55 : 45}, {50 : 50}, {45 : 55}, {40 : 60}, {0 : 100}. The two middle rows are plots of scanning density averaged across all subjects, showing where in the face subjects spent the most time fixating, the top belonging to the upright faces and the bottom to the inverted faces. In the upright faces there is a preference for the eyes, particularly the face's right eye (left side of image). In inverted faces there is an increase in the scanning of the lower face (nose, mouth, and chin). There is more scanning for the morphed images in general. However, note the paradoxical reduction in scanning of the {50 : 50} morph face. A color version of this figure can be viewed on the Perception website at http://www.perceptionweb.com/misc/p5547/.

2.3 Protocol
We recorded eye movements with a magnetic-search-coil technique, using a scleral contact lens and a 3-foot field coil (Crist Instruments, Bethesda, MD). The subjects' heads were secured in a chin-rest. The lens was placed in the left eye. We calibrated the system by having subjects fixate nine targets in a square grid spanning 50 deg. Twelve data points were collected at each of nine targets, and a regression method was used to find the best linear fit. Eye position was digitized at 500 samples s⁻¹.
Displays were generated by a Power Macintosh 9600/233 with programs written in C++ on the Vision Shell programming platform (www.kagi.com/visionshell) and back-projected onto the tangent screen with an Eiki LC-7000U projector. Distance from the subject's cornea to the screen was 80.5 cm.
On the screen, each facial image spanned a height of 67 deg with widths ranging from 61 to 82 deg. We divided our faces into two experimental sets, one male and one female. Before recordings, we showed the subjects four panels, each containing an array of eight faces, all female or all male. Two of these arrays showed the images used for the morphed series, while the other two showed alternate pictures of the same individuals. The faces had the names of the individuals printed below them, and subjects were asked to try to remember the identity of the faces. They were allowed to study these panels as long as they wished. They were told that in the experiment they would be seeing a sequence of faces that had been altered, and that they would be asked to identify to whom each face belonged.

[Figure 2 plot: percentage of `face 2' answers (0 to 100%) against percentage of `face 2' in the morph (0 to 100%), for upright and inverted orientations.]
Figure 2. Face classification data. A different group of subjects was shown images from the same morph pairs and asked to classify the faces as either base face 1 or base face 2. The percentage of answers that were base face 2 is plotted as a function of the percentage of base face 2 that is in the image, for both upright and inverted faces (averaging famous and non-famous data together). The horizontal dotted line indicates the 50% point, at which subjects are as likely to say that the image belongs to base face 1 as to base face 2. Note that this corresponds closely to the {50 : 50} morph (50% base face 2 in image). Hence, on average, the {50 : 50} morph is the most ambiguous image regarding identity.

We presented our faces in two blocks, one female and one male. The order of male/female blocks was counterbalanced across the eight subjects. Within each block, in half the trials the faces were upright and in the other half inverted. In each block, morph series (eg Demi Moore/Julia Roberts), morph image (eg 40Demi : 60Julia), and orientation were randomly ordered. Each face image was seen just twice, once in each orientation. Thus each subject saw a total of 112 faces (7 morph levels, 4 same-sex face pairs, 2 orientations, 2 different-sex blocks).
A trial began with a fixation screen with a small red spot at center. Fixation within 3° of this zone triggered presentation of a stimulus face, along with a red spot 30° below screen center. Subjects were asked to look at this second red spot when they had reached a decision, then state whether this face was famous or novel. On their report, the collection of eye movement data stopped and the single face was replaced with an array of the eight alternate upright facial images (ie the faces not used as base images in the morphed series; these alternates were used to emphasize the identity recognition element of the task, rather than pattern matching).
Each of the eight faces had a number below it and the subject indicated the number of the individual whose face they thought they had just seen. When they had done so, the panel vanished, the fixation screen for the next trial appeared, and collection of eye movement data resumed. After every ten trials a re-calibration screen appeared, with five fixation spots, one at center and one at each of the four corners of the region where the faces were displayed. These were numbered, and subjects were asked to look at each spot in order, and then to return to screen center.
2.4 Data analysis
Using Matlab (Mathworks, Natick, MA), we identified saccades as eye movements with velocities exceeding 47° s⁻¹. The onset of a saccade was the point at which the velocity of the eye first exceeded 31° s⁻¹, and the end of a saccade was the point where velocity fell below this baseline. Fixations were defined as the intervals between the end of one saccade and the beginning of the next saccade. We characterized each fixation by its horizontal position, vertical position, duration, and its time of onset relative to the start of the trial.
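The velocity criteria above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' Matlab code: the function names, the two-point velocity estimate, and the use of the mean position within each fixation are assumptions made here, and only the 500 samples s⁻¹ rate and the two velocity thresholds come from the text.

```python
import math

def detect_saccades(x, y, rate_hz=500.0, peak_thresh=47.0, base_thresh=31.0):
    """Return (onset, end) sample indices of saccades in a gaze trace.

    x, y: eye position in degrees, one value per sample.
    A saccade is taken to be a run of samples whose speed stays above
    base_thresh and whose peak speed exceeds peak_thresh (both deg/s).
    """
    dt = 1.0 / rate_hz
    # Two-point difference (an assumption); the first sample has no predecessor.
    speed = [0.0] + [
        math.hypot(x[i] - x[i - 1], y[i] - y[i - 1]) / dt
        for i in range(1, len(x))
    ]
    saccades, i, n = [], 0, len(speed)
    while i < n:
        if speed[i] <= base_thresh:
            i += 1
            continue
        onset, peak = i, 0.0
        while i < n and speed[i] > base_thresh:
            peak = max(peak, speed[i])
            i += 1
        if peak > peak_thresh:
            # i is the first sample back at baseline (or the end of the trace).
            saccades.append((onset, i))
    return saccades

def fixations_between(saccades, x, y, rate_hz=500.0):
    """Fixations are the intervals between one saccade's end and the next onset."""
    ms = 1000.0 / rate_hz
    fixations = []
    for (_, end), (next_onset, _) in zip(saccades, saccades[1:]):
        n = next_onset - end
        fixations.append({
            "x": sum(x[end:next_onset]) / n,   # mean horizontal position (deg)
            "y": sum(y[end:next_onset]) / n,   # mean vertical position (deg)
            "onset_ms": end * ms,              # relative to the first sample
            "duration_ms": n * ms,
        })
    return fixations
```

Each returned fixation carries the four descriptors named in the text: horizontal position, vertical position, duration, and onset time.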

We recorded all fixations from the beginning to the end of each trial. Our plan was to include all fixations that contributed to the formation of a perceptual decision. The initial fixation, evident as a long fixation of several seconds at screen center, was discarded, and the first fixation after this long fixation was taken as the onset of the scanning sequence. The end of the scanning sequence was the point at which subjects fixated the red dot at 30° below screen center. Not all trials were terminated by such a fixation, however, as subjects often forgot to fixate this peripheral spot when they had made a familiarity decision. In these trials, a single long fixation of more than 1500 ms duration was often present near the end of the string of eye movement data, usually at an eccentric position in relation to the image, and this was taken to indicate the end of the subject's scanning. In the absence of either a fixation at the red spot or a long near-terminal fixation, we included all fixations occurring to the end of data collection for a trial.
Our next step was to adjust our data for any baseline drift or gain change that might have occurred during recording, since absolute eye position was critical to our results. From the recurring calibration trials we obtained the average eye position for fixations near each of the five spots. Offset shift was calculated as the mean difference in x and y positions from the expected value for each point. Gain shift was calculated as the mean difference in distance of each of the four corner fixations from the fixations at screen center. We derived the offset and gain correction required by averaging these calculations from the two calibration trials sandwiching a set of ten face trials, and applied these to all ten trials.
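One way to picture the drift adjustment is the sketch below. It rests on assumptions that go beyond the text: gain is expressed here as a ratio of measured to expected corner-to-center distance (the text describes a difference in distance without giving the formula), the correction is applied as offset removal followed by rescaling about screen center, and all function names are hypothetical.

```python
import math

def calibration_shift(measured, expected):
    """Estimate (offset_x, offset_y, gain) from one five-spot calibration trial.

    measured, expected: lists of (x, y) in degrees, center spot first,
    followed by the four corner spots.
    """
    n = len(measured)
    off_x = sum(m[0] - e[0] for m, e in zip(measured, expected)) / n
    off_y = sum(m[1] - e[1] for m, e in zip(measured, expected)) / n

    def mean_corner_distance(points):
        cx, cy = points[0]
        corners = points[1:]
        return sum(math.hypot(px - cx, py - cy) for px, py in corners) / len(corners)

    # Assumption: gain as a ratio of mean corner-to-center distances.
    gain = mean_corner_distance(measured) / mean_corner_distance(expected)
    return off_x, off_y, gain

def correct_fixation(x, y, before, after, center=(0.0, 0.0)):
    """Correct one fixation using the calibrations before and after its block."""
    off_x = (before[0] + after[0]) / 2.0
    off_y = (before[1] + after[1]) / 2.0
    gain = (before[2] + after[2]) / 2.0
    # Remove the offset, then undo the gain change about screen center.
    return ((x - off_x - center[0]) / gain + center[0],
            (y - off_y - center[1]) / gain + center[1])
```

Averaging the two sandwiching calibrations treats drift as roughly constant across the ten intervening trials, which mirrors the procedure described above.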
We then collated the corrected data from all subjects for each trial, displayed these data on an x-y plot of position upon the facial image of that trial, and identified all fixations that fell upon the face.
2.5 Total scanning
For each subject we calculated the total number of fixations and total duration of those fixations used to scan each face. We performed an ANOVA using JMP 4.0 (SAS Institute, www.JMPdiscovery.com), with fame (novel versus famous), orientation (upright versus inverted), and morphing (base versus all morphed images) as the main variables. Planned comparisons were made between base and morphed images for each of the four different `fame × orientation' categories.
2.6 Spatial distribution of scanning
We divided each face into eight different feature regions: brow, left eye, right eye, nose, mouth, chin, right cheek, left cheek (figure 3) (Groner et al 1984). Approximate criteria were used uniformly to draw the boundaries between regions for different faces. Fixations were classified according to their region, and we then performed an ANOVA with fame, orientation, morphing, and face part as the main variables.
2.7 Sequence of scanning
To analyze the sequence of fixations between different facial regions, we performed two separate analyses. Both were directed at determining the variability of scanning sequences.
In the first analysis we examined the local predictability of scanning using Markov first-order matrices (Gbadamosi and Zangemeister 2001; Rizzo et al 1987). We created a variable called `pairs', where the first digit represented the facial region where the prior fixation had been located, and the second digit the facial region of the current fixation. With eight different regions, this creates 64 possible pairs. These can be formatted as an 8 × 8 first-order Markov matrix, in which the rows represent the first fixation and the columns represent the second fixation.
The asymmetric lambda (aL) is a measure of the predictability or non-random nature of the pairs within this matrix.
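As an illustration of the first-order analysis, the sketch below builds the 8 × 8 matrix of fixation pairs and computes an asymmetric lambda from it. It assumes that the statistic is Goodman and Kruskal's lambda and that the direction of prediction is from the prior region (row) to the current region (column); neither detail is stated in the text, and the function names are hypothetical.

```python
def transition_matrix(regions, n_regions=8):
    """Count fixation pairs: rows are the prior region, columns the current one.

    regions: sequence of region codes 1..n_regions in order of fixation.
    """
    counts = [[0] * n_regions for _ in range(n_regions)]
    for prior, current in zip(regions, regions[1:]):
        counts[prior - 1][current - 1] += 1
    return counts

def asymmetric_lambda(counts):
    """Goodman-Kruskal lambda for predicting the column from the row.

    Returns the proportional reduction in prediction errors gained by
    knowing the prior region: 0 means no gain, 1 means perfect prediction.
    """
    total = sum(map(sum, counts))
    col_totals = [sum(col) for col in zip(*counts)]
    best_guess = max(col_totals)          # hits when always guessing the modal column
    if total == best_guess:
        return float("nan")               # undefined: no errors left to reduce
    row_best = sum(max(row) for row in counts)  # hits when guessing per row
    return (row_best - best_guess) / (total - best_guess)
```

For example, a strictly alternating sequence between two regions gives a lambda of 1, whereas a matrix whose rows are identical gives 0.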

Figure 3. Face part analysis. Michelle Pfeiffer's face is shown segmented into the eight different facial regions: 1, brow; 2, right eye; 3, left eye; 4, nose; 5, mouth; 6, chin; 7, right cheek; 8, left cheek. Fixations are shown as black disks, with disk diameter proportional to duration. A color version of this figure can be viewed on the Perception website at http://www.perceptionweb.com/misc/p5547/.

We calculated eight aL values for each subject, grouping data with common orientation (upright versus inverted), fame (novel versus famous), and morphing (base versus morph).
One particular sequence pair of interest is the `regionally repetitive pair'. These are pairs where the two consecutive fixations fall within the same feature region of the eight we defined above. These may represent information gathering on a different scale, possibly directed at regional component analysis rather than perception of the whole structure. Whether these regionally repetitive fixation pairs should be included in analyses of scan sequencing is contentious. Others have used `compressed' Markov matrices that omit regionally repetitive pairs (Gbadamosi and Zangemeister 2001). We, too, performed a second analysis using compressed versions of the Markov matrices, which we felt better assessed the inter-regional flow of information processing. This yielded compressed asymmetric lambda values (aLc). However, we also performed a separate analysis of regionally repetitive pairs, by computing the frequency of such pairs in the eight different conditions varying in orientation, morphing, and fame. Asymmetric lambda (aL), compressed asymmetric lambda (aLc), and the frequency of regionally repetitive pairs were subjected to repeated-measures ANOVA, with fame, orientation, and morphing as the main variables.
In the second analysis we examined the global predictability of scanning, that is, how much the whole scanning sequence for one face resembled that of another by the same subject. Recent scanning studies (Brandt and Stark 1997; Gbadamosi and Zangemeister 2001) have borrowed techniques from DNA sequence analysis to attempt to characterize this. The strategy is to quantify how much change is required to transform one sequence into another. One may make insertions, deletions, or substitutions, with the latter counting twice, since a substitution is the equivalent of a combined insertion and deletion. Here, too, we used a compressed variant that eliminated consecutive repetitions of a single area (Gbadamosi and Zangemeister 2001). The number of changes is divided by the average number of items (fixations) in the two sequences being compared, to give the number of changes per fixation. This is called the `editing cost'. The higher the editing cost, the more dissimilar the two sequences. The editing costs were then subjected to an ANOVA with repeated measures, with fame, orientation, and morphing as the main variables.
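The editing cost can be sketched with a standard dynamic-programming edit distance in which insertions and deletions cost 1 and substitutions cost 2. The sketch assumes that the normalizing sequence lengths are those of the compressed sequences, which the text does not specify; the function names are hypothetical.

```python
def compress(regions):
    """Drop consecutive repetitions of the same region code."""
    out = []
    for region in regions:
        if not out or out[-1] != region:
            out.append(region)
    return out

def editing_cost(seq_a, seq_b):
    """Changes per fixation needed to turn one compressed sequence into the other."""
    a, b = compress(seq_a), compress(seq_b)
    # prev[j] holds the cost of transforming the first i-1 items of a
    # into the first j items of b.
    prev = list(range(len(b) + 1))
    for i, region_a in enumerate(a, 1):
        curr = [i]
        for j, region_b in enumerate(b, 1):
            substitute = prev[j - 1] + (0 if region_a == region_b else 2)
            delete = prev[j] + 1
            insert = curr[j - 1] + 1
            curr.append(min(substitute, delete, insert))
        prev = curr
    mean_length = (len(a) + len(b)) / 2.0
    # Assumption: normalize by the mean length of the compressed sequences.
    return prev[-1] / mean_length if mean_length else 0.0
```

With these costs, two sequences that share no regions in common order reach the maximum cost of 2 changes per fixation, and identical compressed sequences cost 0.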

3 Results
3.1 Total scanning
ANOVA for number of fixations showed significant main effects of fame (F = 90.2, p < 0.0001) and orientation (F = 6.35, p < 0.012). Subjects scanned novel faces more than famous ones, and inverted faces more than upright ones (figure 4). These main effects were even greater for duration (fame F = 63.1, p < 0.0001; orientation F = 12.9, p < 0.0003). There was also a significant interaction between fame and morphing (F = 3.8, p < 0.05). Linear contrasts showed that there was a trend for the morphs of famous faces to be scanned more than the original versions (p < 0.07), but not for those of novel faces. Although the three-way interaction was not significant, figure 4 suggests that most of the effect of morphing on scanning duration for famous faces stemmed from the upright famous faces (difference between original and morphed data, p < 0.027), as there was no significant difference between original and morphed stimuli for inverted famous faces or for either upright or inverted novel faces.

[Figure 4 plots: (a) number of fixations, (b) scanning duration (ms), and (c) duration per fixation (ms), each by orientation (upright, inverted) for novel and famous faces.]
Figure 4. Total scanning: (a) number of fixations, (b) cumulative duration of all fixations, and (c) mean duration per fixation. Solid bars are for base faces, striped bars for all morphed faces. Novel faces are scanned more than famous ones under all circumstances, and inverted faces more than upright ones. Morphing increases scanning only for upright famous faces. Most of the increase in cumulative scanning duration is due to increased number of fixations, as the mean duration of each fixation does not differ much between the conditions in (c).

The effects on duration of scanning were mainly due to more fixations rather than longer fixations: a separate analysis on the mean (rather than cumulative) duration of fixations showed a significant effect only for orientation (F = 6.11, p < 0.02), where fixations for inverted faces lasted 12 ms longer on average than fixations on upright faces, and no significant main effects or interactions involving fame or morphing (figure 4).
3.2 Spatial distribution of scanning
Analysis of the number of fixations showed a significant main effect of face part (F = 25.2, p < 0.0001). Subjects fixated most upon the eyes and nose, followed by the mouth, brow, chin, and cheeks. There were significant interactions of face part with fame (F = 2.57, p < 0.013) and orientation (F = 29.42, p < 0.0001), and a trend to a significant three-way interaction between face part, fame, and orientation (F = 1.97, p < 0.055). Results for duration were similar, except that the only significant interaction was between face part and orientation (F = 25.9, p < 0.0001).
Linear contrasts for the different face parts revealed the origins of these interactions. Contrasts of the two orientations showed that with inverted faces subjects made more fixations to the mouth and chin and fewer to the brow (figure 5). This shift to the mouth region is evident in the scanning densities in figure 1. Also, there was a reversal of a large preference for the right eye of the depicted person in upright images to a preference for the left eye of the depicted person in inverted images. Thus subjects fixated whichever eye was on the left side of space. A similar preference for the left side of space can be seen for the cheeks in figure 5.

[Figure 5 plots: number of fixations and duration (ms) by part of face (brow, right and left eyes, nose, mouth, chin, right and left cheeks), for upright and inverted orientations.]
Figure 5. Spatial distribution of scanning: effect of orientation. Inversion increases scanning in the mouth and chin, and there is a reversal of the emphasis on the images' right eyes to the left. A similar, smaller left-right reversal can be seen for the cheeks.

We performed a secondary analysis of spatial distribution to eliminate issues of right/left asymmetry. We divided the face parts into an upper (brow and eyes) and a lower (nose, mouth, chin, cheeks) half. An ANOVA for duration showed a significant interaction between orientation and face half (F = 17.4, p < 0.0001), with the same interaction approaching significance for number of fixations (F = 3.54, p < 0.06). There was a predominance of scanning on the upper face in upright faces, and a predominance of scanning on the lower face in inverted faces, for both novel and famous faces (figure 6).
Contrasting novel and famous faces, there were significant differences for scanning of the eyes and nose regions. Subjects made more fixations to these regions in novel faces than in famous faces (figure 7). Figure 6 also shows that the increased scanning of novel faces was disproportionately allocated to the upper face.
None of the analyses of spatial distributions showed significant interactions with morphing when all trials were included. However, recall that the total scanning analysis above showed that the condition of most interest is famous upright faces, the only condition where morphed faces had significantly more scanning than base faces. How is this increased scanning distributed spatially? Planned comparisons showed more scanning of the upper face in morphed upright famous faces than in base upright faces (p < 0.003 for number of fixations, p < 0.021 for duration) but no difference for the lower face (figure 8). This morph effect resembles the fame effect in figure 7: increases in scanning of the eyes and nose with little change elsewhere.

[Figure 6 plots: number of fixations and duration (ms) for upper and lower parts of the face, by orientation (upright, inverted) for novel and famous faces.]
Figure 6. Spatial distribution of scanning: effect of orientation. To eliminate left/right issues, we simply grouped brow and eyes into the upper face; and nose, mouth, cheeks, and chin into the lower face. The normal emphasis of scanning on the upper face in upright faces is reversed in inverted faces, with more time spent scanning the lower face.

[Figure 7 plots: number of fixations and duration (ms) by part of face (brow, right and left eyes, nose, mouth, chin, right and left cheeks), for famous and novel faces.]
Figure 7. Spatial distribution of scanning: effect of familiarity. Scanning is increased in the region of the eyes and nose with novel faces.

Figure 8. Spatial distribution of scanning: effect of morphing. Data for number of fixations only are shown, contrasting upright novel faces (a) with upright famous faces (b). The increase in scanning of upright famous faces caused by morphing in figure 4 can be seen to derive from more scanning of the eyes and nose. This resembles the effect of novel faces seen in figure 7. [Plot: number of fixations by part of face, for base and morph images.]

3.3 Sequence of scanning
In the first Markov analysis, including regionally repetitive pairs, ANOVA showed no significant effects or interactions on asymmetric lambda (aL). Notably, there was no significant main effect or interactions related to fame. Planned linear contrasts for the upright base condition also did not show a significant difference between famous and novel faces. In fact, there was a slight trend for famous faces to have a higher aL than novel faces (mean aL = 0.18 for famous, 0.12 for novel).

The compressed Markov analysis (omitting regionally repetitive pairs of fixations) also showed a lack of effect of fame (figure 9). In this case, however, there was a significant main effect of orientation, with inverted faces having a lower aLc, and therefore more random local scanning structure.

For regionally repetitive pairs, the findings from familiar and novel faces were again very similar (figure 10). However, there was a significant interaction between morphing and orientation (F17, = 8.97, p < 0.005). Linear contrasts showed that morphing was associated with a significantly higher frequency of repetitive pairs in upright faces (p < 0.015), while the difference for inverted faces (which was actually a trend in the opposite direction) was not significant.

We found different results with the string analysis of editing costs, which indexes the global similarity between two sequences. There was a significant main effect of fame (F17, = 17.56, p < 0.0001), with lower editing costs for novel faces than famous ones, and a significant main effect of orientation (F17, = 12.8, p < 0.0004), with lower editing costs for upright faces than inverted ones. There was also a significant interaction between fame and orientation (F17, = 4.14, p < 0.043). This was because the effect of fame was seen on scanpaths when faces were upright but not when they were inverted (figure 11). Although the figure suggests a trend to increasing editing costs with degree of morph dissimilarity from the unmorphed face, particularly for famous faces, there was no significant main effect or interaction involving morphing.

Figure 9. Scanning sequence: the asymmetric lambda (aLc) from the compressed first-order Markov matrix analysis. This is taken to represent local scan structure. aLc is calculated for each individual, averaging over all faces in that category, and then averaged over the group of observers. Famous faces have the highest aLc values, and hence the least random pairing of fixations. Inversion, anonymity, and morphing tend to create more random local scanning patterns. [Plot: compressed asymmetric lambda (aLc) by orientation (upright, inverted), for base and morph images of novel and famous faces.]

Figure 10. Frequency of regionally repetitive pairs: the proportion of total number of fixation pairs that were pairs with both fixations in the same facial region. Orientation and fame had little effect on this variable. However, morphing increased repetitive pairs in upright faces but not in inverted faces. [Plot: frequency of locally repetitive pairs by orientation (upright, inverted), for base and morph images of novel and famous faces.]
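The local scan-structure measures reported above can be illustrated with a short sketch. It assumes that asymmetric lambda is the Goodman-Kruskal lambda for predicting the next fixated region from the current one, computed on a first-order transition table; that reading, the region labels, and all function names are assumptions, since the exact computation is defined in the methods rather than here.

```python
from collections import Counter


def transition_counts(regions, compressed=False):
    """Count first-order transitions (current region -> next region).

    With compressed=True, regionally repetitive pairs (both fixations in
    the same region) are omitted, as in the compressed analysis.
    """
    pairs = zip(regions, regions[1:])
    if compressed:
        pairs = ((a, b) for a, b in pairs if a != b)
    return Counter(pairs)


def asymmetric_lambda(counts):
    """Goodman-Kruskal lambda for predicting the next region from the current one.

    0 means knowing the current region does not reduce prediction errors;
    1 means the next region is fully determined by the current region.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    row_max = {}            # largest cell in each row (current region)
    col_totals = Counter()  # marginal totals for each next region
    for (current, nxt), n in counts.items():
        row_max[current] = max(row_max.get(current, 0), n)
        col_totals[nxt] += n
    best_column = max(col_totals.values())
    if total == best_column:
        return 0.0  # lambda is undefined here; treated as 0 by convention
    return (sum(row_max.values()) - best_column) / (total - best_column)


def repetitive_pair_frequency(regions):
    """Proportion of consecutive fixation pairs that stay in one region."""
    pairs = list(zip(regions, regions[1:]))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up scan sequence, for illustration only.
    scan = ["eye_L", "eye_R", "eye_L", "eye_R", "nose", "mouth", "nose", "mouth"]
    print(asymmetric_lambda(transition_counts(scan)))
    print(asymmetric_lambda(transition_counts(scan, compressed=True)))
    print(repetitive_pair_frequency(scan))
```

Under this reading, a lower lambda for inverted faces means that the current region is a poorer predictor of the next one, which is what "more random local scanning structure" refers to.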

Figure 11. Scanning sequence: string-editing costs. For each subject, the similarity of scanning of each image in a morph series is compared to the scanning of the base faces at either end of the series. High editing costs imply greater dissimilarity of the global scan structure of the two sequences being compared. Scan sequences of novel upright faces have the most global resemblance to each other. In contrast, scanning sequences for famous faces are more dissimilar from each other. Inversion causes scan sequences to differ more also. There is a suggestion that morphing gradually introduces more dissimilarity into scan structure when viewing famous faces. [Plot: editing cost (changes per fixation) against comparison image in morph (%), for novel and famous faces, upright and inverted.]
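The string-editing comparison behind figure 11 can be sketched as an edit distance between two sequences of fixated regions. The unit costs for insertion, deletion, and substitution, and the normalization by the longer sequence, are assumptions chosen for illustration; the figure reports "changes per fixation" but the exact cost scheme and normalization are not restated in this section.

```python
def edit_distance(seq_a, seq_b):
    """Minimum number of insertions, deletions, and substitutions
    (each with unit cost) needed to turn seq_a into seq_b."""
    previous = list(range(len(seq_b) + 1))
    for i, item_a in enumerate(seq_a, start=1):
        current = [i]
        for j, item_b in enumerate(seq_b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete item_a
                current[j - 1] + 1,                    # insert item_b
                previous[j - 1] + (item_a != item_b),  # substitute or match
            ))
        previous = current
    return previous[-1]


def editing_cost_per_fixation(seq_a, seq_b):
    """Edit distance divided by the length of the longer sequence.

    This normalization is a hypothetical choice for the example.
    """
    longest = max(len(seq_a), len(seq_b))
    return edit_distance(seq_a, seq_b) / longest if longest else 0.0


if __name__ == "__main__":
    # Made-up scan sequences, for illustration only.
    base_scan = ["eye_L", "eye_R", "nose", "mouth"]
    morph_scan = ["eye_L", "nose", "mouth"]
    print(editing_cost_per_fixation(base_scan, morph_scan))
```

A low cost means the two scans visit the same regions in much the same order, so the measure captures global sequence similarity rather than pairwise transitions.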

4 Discussion
Subjects scanned upright famous base faces the most rapidly, in keeping with the intuition that reaching a decision threshold for identity is easiest with these stimuli. Information gathering for all other faces was less effective, reflected in a need to accumulate more data and hence more fixations and longer scanning durations to reach a decision. Morphing disrupts the process in famous faces by making the data acquired at each fixation ambiguous, and hence of less value in resolving competing hypotheses about identity. Novel faces do not contain ambiguous information, but the strength of the facial memories to which the perceptual data are matched is more limited than that of famous faces. This, too, resulted in the need for more data and more scanning, as others have also found (Althoff and Cohen 1999). While morphed and novel upright faces still have access to efficient face-processing mechanisms, inverted faces do not.

The gross result of inversion was similar, though: a need to prolong scanning to reach a decision (with the exception that, unlike the other manipulations, inversion also slightly increased the duration of individual fixations, consistent with greater difficulty in acquiring data with this orientation). Beyond these similarities on total measures, the effects of these manipulations diverged significantly when we examined spatial distribution and dynamic structure in detail.

4.1 Regional distribution of scanning
The data with upright faces reproduced known patterns of salience across facial stimuli. A variety of methods have shown that the upper face attracts more attention than the lower face, with the eyes particularly emphasized (Shepherd et al 1981). These include some scanning studies (Althoff and Cohen 1999; Walker-Smith et al 1977). Whether this pattern is specifically related to the task of identifying a face is less clear from earlier studies, though. One might argue that the eyes automatically attract attention because of their importance in expressing social signals such as direction of gaze and emotion, even if the subjects happen to be engaged in a task of identification.

Our data contribute to this issue by comparing faces that differ in the ease of recognition. We use morphed famous faces and novel faces to ask where subjects scan when the stimulus is either ambiguous or lacking a strong internal representation. Our results show that it is indeed the eyes and upper face that attract even more emphasis under these circumstances. Hence, subjects do appear to derive much of their information about individual facial structure from the region of the eyes.

Disabling expert face processing by inverting faces led to a very different result. Here increased scanning resulted from more fixations in the mouth and lower face, not the upper face. Why should this be?
It is possible that this simply represents a bias to search the upper space of any stimulus, though we are unaware of specific data on this point. A more face-specific explanation may lie in recent perceptual studies we have performed on the inversion effect (Barton et al 2001, 2003; Malcolm et al 2004). These showed that changes in the internal spatial structure of the lower face are difficult to discern in inverted faces, especially with brief viewing times, whereas those in the highly salient ocular region are easily seen. We concluded that this indicated a loss of an efficient mechanism for processing spatial structure over the entire face rapidly. Without such a process, structural information can only be extracted slowly and partially, beginning with highly salient regions. We propose that the redirection of fixations to the mouth and lower face in inverted faces is an adaptation to the disabling of that efficient mechanism for upright faces. When this occurs, fixations must be deployed to each region specifically to extract data about spatial structure, whereas with upright faces these data are readily obtained without overt inspection of the less-salient lower face.

Another point of interest is the right/left asymmetry in scanning that we found. Previous studies have documented a similar emphasis of fixations on the part of the face in the left hemifield (Althoff and Cohen 1999; Butler et al 2005; Gilbert and Bakan 1973; Mertens et al 1993), although in some of these studies this asymmetry depended upon the task (judgments of gender or emotion) or the familiarity of the face. Because this effect was found for both original and mirror-reflected faces, one study concluded that the bias reflected 'internal factors' of the observer (Mertens et al 1993). Our results extend this finding by showing that the same hemifield bias exists for inverted faces, so that the bias is a hemifield rather than a hemiface effect.
A left hemifield bias has also been documented in chimeric and tachistoscopic studies of face processing, though again this may depend upon factors such as processing strategy (analytic versus holistic; Rhodes 1985), spatial frequency content (Sergent 1986), and task (Schyns et al 2002).

In general, this left-hemifield advantage has been interpreted as a consequence of a right hemispheric specialization for face processing, which is borne out by fMRI experiments showing predominant activation of the right fusiform gyrus by faces (Kanwisher et al 1997), particularly with whole faces rather than face parts (Rossion et al 2000), and patient data showing that prosopagnosia from unilateral lesions is far more likely with right rather than left occipitotemporal damage (Barton 2003).

4.2 Temporal dynamics of scanning
The sequential organization of scanning provides information on how data are accumulated towards a perceptual decision. Some earlier work with Markov matrices suggested that famous faces were associated with more random local scanning sequences than novel faces (Althoff and Cohen 1999). This randomness was attributed to the influence of idiosyncrasies in the facial memories on scanning patterns with famous faces. Without such memories, novel faces are scanned by a more generic ocular motor sequence. However, Rizzo et al (1987) did not find any effect of familiarity in normal subjects; similarly, we found no difference between famous and novel faces or between morphed and base faces. Thus recognition difficulty did not seem to alter moment-to-moment transitions in fixations. Rather, the chief finding was that these local transitions became more random when faces were turned upside-down.

Some of these discrepancies may be methodological. Rizzo et al (1987) used four quadrants for regions and did not appear to engage the subjects in a specific identification task. Althoff and Cohen (1999) based their Markov matrix analysis upon 'viewer-determined regions of interest', meaning areas of space with a high density of fixations. It is unclear what relation these regions of interest have to feature-defined facial regions, and how this difference might affect the results.
In our study, string-editing analysis of global scan structure provided better evidence of a familiarity effect. Novel faces had lower editing costs and therefore a more stereotyped global scan structure than famous faces. Interestingly, this finding agrees with the matrix-based conclusions of the earlier study of the fame effect (Althoff and Cohen 1999). The effects of morphing confirmed this as well. When the scanning of a famous face was compared with the scans for morphs containing gradually lesser degrees of that face, there was a clear trend for global scan structure to become more dissimilar. This morph effect was not seen in upright novel faces.

These two findings suggest that the process of data accumulation with famous faces is guided by unique facial memories that redirect scanning to specific regions during hypothesis testing, generating idiosyncratic scan patterns that differ for different faces. Lack of a strong facial memory is associated with a more generic scanning sequence. This generic sequence, however, is the product of a learned efficient mechanism for face processing, since it is disrupted by inversion, which disables the expert mechanism that processes facial structure (Barton et al 2001, 2003; Leder and Bruce 2000).

Repeated scanning within a single-feature region is also an interesting dynamic phenomenon. It is not clear what generates repeated regional scanning. One possibility is that it is an attempt to resolve regional feature ambiguity in the data generated during the first fixation of the pair. If so, repeated scanning may be related to local feature-based processing, as opposed to the generation of the global face percept. Unlike many other aspects of our data, the frequency of regionally repetitive scanning pairs was similar for upright and inverted faces. Furthermore, this parameter did not differ between famous and novel faces.
Hence this might suggest that local feature processing is minimally affected by either inversion or the strength of internal facial memories. However, the fact that morphing generated more repeated local scanning in upright faces but not inverted faces implies that regional ambiguity in feature structure may have been more apparent in upright faces than in inverted faces.

This finding would agree with perceptual studies showing that the perception of feature properties is not immune to the inversion effect (Barton et al 2001; Endo 1986; Leder and Bruce 2000; Malcolm et al 2004; Riesenhuber et al 2004; Searcy and Bartlett 1996; Yovel and Kanwisher 2004). It may also be an ocular motor parallel to reports that the context of a face improves the detection of feature changes in upright but not inverted faces (Rhodes et al 1993; Tanaka and Sengco 1997).

4.3 Differential effects of expertise versus experience
The contrast between upright and inverted faces revealed significant internal expertise factors in the generation of scan patterns. Since the stimuli were identical apart from orientation, differences in scanning were not due to low-level stimulus attributes. Rather, the 'dual-mode' hypothesis (Searcy and Bartlett 1996) postulates that the main difference lies in the access to an orientation-dependent expert mechanism for rapid face individuation, as opposed to more generic object processing. Our results showed that the expert mechanism with upright faces is reflected in several effects on scanning: it requires less scanning to reach a decision threshold and the face is processed with less need for direct attention to the less-salient lower half of the face. Dynamically, local transitions between regions and global scan structure are less randomly distributed with the expert mechanism. Global scanning of novel faces revealed a more stereotyped structure to information gathering that does not depend upon the properties of the viewed face. In addition, regional feature ambiguities may be more easily detected by this expert mechanism, given the increase in regionally repetitive pairs of fixations for upright morphed faces.

The contrast between famous and novel faces revealed the influence of a different internal factor, that of the strength of pre-existing representations of the stimulus within the subject's memories.
Hence it showed the effects of experience rather than expertise. The results showed that experience also leads to more rapid decisions, but that this is due to a decreased need to scan the highly salient eyes and upper face. Presumably this reflects a strong internal representation of these parts of the face, with easier resolution of hypothesis-testing in this region. The key influence of fame on scanning dynamics is to make the global sequence for each face more idiosyncratic, suggesting that a generic face-expert sequence has been superseded by a process of hypothesis-testing guided by internal knowledge of elements specific to that particular face.

The effects of experience on scanning distribution were corroborated by the effects of ambiguity, as induced by morphed stimuli. With morphed famous faces, it was the upper face that the scanning process sampled for more perceptual data. Repeated sampling of the region with the strongest representation in internal memories is likely the most effective strategy for resolving ambiguity in the process of making a perceptual decision about identity.

In summary, our data showed that the main effects of expertise and experience differ, though both reduce the amount of scanning. Expertise results in more efficient whole-object processing with less need to scan the less-salient lower regions of the face, a less random, more organized global scan pattern, and more regionally repetitive transitions with morphed stimuli, which we interpret as enhanced detection of regional ambiguities. Experience results in a decreased need to stress the upper facial half, indicating more rapid decision-making in this region, and more variable global scan patterns, which likely reflect an influence of the idiosyncratic structures specific to each facial memory.

Acknowledgments. We thank Aaron Seitz and Chris Pack for assistance with data analysis.
This work was supported by NIMH grant 1R01 MH069898, a Canada Research Chair, and a Michael Smith Foundation for Health Research Senior Scholarship. It was presented in part at the annual meeting of the North American Neuro-ophthalmology Society, February 2003.

References
Althoff R, Cohen N, 1999 "Eye-movement-based memory effect: a reprocessing effect in face perception" Journal of Experimental Psychology: Learning, Memory, and Cognition 25 997–1010
Bartlett J, Searcy J, 1993 "Inversion and configuration of faces" Cognitive Psychology 25 281–316
Barton J, 2003 "Disorders of face perception and recognition" Neurological Clinics 21 521–548
Barton J, Deepak S, Malik N, 2003 "Attending to faces: change detection, familiarization, and inversion effects" Perception 32 15–28
Barton J, Keenan J, Bass T, 2001 "Discrimination of spatial relations and features in faces: effects of inversion and viewing duration" British Journal of Psychology 92 527–549
Brandt S, Stark L, 1997 "Spontaneous eye movements during visual imagery reflect the content of the visual scene" Journal of Cognitive Neuroscience 9 27–38
Butler S, Gilchrist I, Burt D, Perrett D, Jones E, Harvey M, 2005 "Are the perceptual biases found in chimeric face processing reflected in eye-movement patterns?" Neuropsychologia 43 52–59
Deco G, Schurmann B, 2000 "A neuro-cognitive visual system for object recognition based on testing of interactive attentional top–down hypotheses" Perception 29 1249–1264
Endo M, 1986 "Perception of upside-down faces: an analysis from the viewpoint of cue saliency", in Aspects of Face Processing Eds H Ellis, M Jeeves, F Newcombe, A Young (Dordrecht: Kluwer) pp 53–58
Farah M, Wilson K, Drain M, Tanaka J, 1998 "What is 'special' about face perception?" Psychological Review 105 482–498
Gbadamosi J, Zangemeister W H, 2001 "Visual imagery in hemianopic patients" Journal of Cognitive Neuroscience 13 855–866
Gilbert C, Bakan P, 1973 "Visual asymmetry in perception of faces" Neuropsychologia 11 355–362
Groner R, Walder F, Groner M, 1984 "Looking at faces: local and global aspects of scanpaths", in Theoretical and Applied Aspects of Eye Movement Research Eds A Gale, F Johnson (Amsterdam: Elsevier) pp 523–533
Kanwisher N, McDermott J, Chun M, 1997 "The fusiform face area: a module in human extrastriate cortex specialized for face perception" Journal of Neuroscience 17 4302–4311
Kemp R, McManus C, Pigott T, 1990 "Sensitivity to the displacement of facial features in negative and inverted images" Perception 19 531–543
Leder H, Bruce V, 1998 "Local and relational aspects of face distinctiveness" Quarterly Journal of Experimental Psychology A 51 449–473
Leder H, Bruce V, 2000 "When inverted faces are recognized: the role of configural information in face recognition" Quarterly Journal of Experimental Psychology A 53 513–536
Luria S, Strauss M, 1978 "Comparison of eye movements over faces in photographic positives and negatives" Perception 7 349–358
Malcolm G, Leung C, Barton J, 2004 "Regional variation in the inversion effect for faces: different patterns for feature shape, spatial relations, and external contour" Perception 33 1221–1231
Mertens I, Siegmund H, Grusser O J, 1993 "Gaze motor asymmetries in the perception of faces during a memory task" Neuropsychologia 31 989–998
Noton D, Stark L, 1971a "Eye movements and visual perception" Scientific American 224(..) 34–43
Noton D, Stark L, 1971b "Scanpaths in eye movements during pattern perception" Science 171 308–311
Noton D, Stark L, 1971c "Scanpaths in saccadic eye movements while viewing and recognizing patterns" Vision Research 11 929–942
Rhodes G, 1985 "Lateralized processes in face recognition" British Journal of Psychology 76 249–271
Rhodes G, Brake S, Atkinson A, 1993 "What's lost in inverted faces?" Cognition 47 25–57
Riesenhuber M, Jarudi I, Gilad S, Sinha P, 2004 "Face processing in humans is compatible with a simple shape-based model of vision" Proceedings of the Royal Society of London, Series B, Supplement ...
Rizzo M, Hurtig R, Damasio A, 1987 "The role of scanpaths in facial recognition and learning" Annals of Neurology 22 41–45
Rossion B, Dricot L, Devolder A, Bodart J M, Crommelinck M, De Gelder B, Zoontjes R, 2000 "Hemispheric asymmetries for whole-based and part-based face processing in the human fusiform gyrus" Journal of Cognitive Neuroscience 12 793–802
Rybak I, Gusakova V, Golovan A, Podladchikova L, Shevtsova N, 1998 "A model of attention-guided visual perception and recognition" Vision Research 38 2387–2400
Schyns P, Bonnar L, Gosselin F, 2002 "Show me the features! Understanding recognition from the use of visual information" Psychological Science 13 402–409
Searcy J, Bartlett J, 1996 "Inversion and processing of component and spatial-relational information in faces" Journal of Experimental Psychology 22 904–915
Sergent J, 1984 "An investigation into component and configurational processes underlying face perception" British Journal of Psychology 75 221–242
Sergent J, 1986 "Methodological constraints on neuropsychological studies of face perception in normals", in The Neuropsychology of Face Perception and Facial Expression Ed. R Bruyer (Hillsdale: Lawrence Erlbaum Associates) pp 91–124
Shepherd J, Davies G, Ellis H, 1981 "Studies of cue saliency", in Perceiving and Remembering Faces Eds G Davies, H Ellis, J Shepherd (London: Academic Press) pp 105–131
Tanaka J, Sengco J, 1997 "Features and their configuration in face recognition" Memory and Cognition 25 583–592
Valentine T, 1988 "Upside-down faces: a review of the effect of inversion upon face recognition" British Journal of Psychology 79 471–491
Walker-Smith G, Gale A, Findlay J, 1977 "Eye movement strategies involved in face perception" Perception 6 313–326
Yin R, 1970 "Face recognition: a dissociable ability?" Neuropsychologia 23 395–402
Yovel G, Kanwisher N, 2004 "Face perception: domain specific, not process specific" Neuron 44 889–898
Zangemeister W, Sherman K, Stark L, 1995 "Evidence for a global scanpath strategy in viewing abstract compared with realistic images" Neuropsychologia 33 1009–1025

© 2006 a Pion publication ISSN 0301-0066 (print) ISSN 1468-4233 (electronic)

www.perceptionweb.com

Conditions of use. This article may be downloaded from the Perception website for personal research by members of subscribing organisations. Authors are entitled to distribute their own article (in printed form or by e-mail) to up to 50 people. This PDF may not be placed on any website (or other online distribution system) without permission of the publisher.