
Journal of Vision (2012) 12(2):9, 1–16 http://www.journalofvision.org/content/12/2/9

Head and eye gaze dynamics during visual attention shifts in complex environments

Anup Doshi, Laboratory for Intelligent and Safe Automobiles, University of California, San Diego, La Jolla, CA, USA

Mohan M. Trivedi, Laboratory for Intelligent and Safe Automobiles, University of California, San Diego, La Jolla, CA, USA

The dynamics of overt visual attention shifts evoke certain patterns of responses in eye and head movements. In this work, we detail novel findings regarding the interaction of eye gaze and head pose under various attention-switching conditions in complex environments and safety critical tasks such as driving. In particular, we find that sudden, bottom-up visual cues in the periphery evoke a different pattern of eye–head movement latencies than top-down, task-oriented attention shifts. In laboratory vehicle simulator experiments, a unique and significant (p < 0.05) pattern of preparatory head motions, prior to the gaze saccade, emerges in the top-down case. This finding is validated in qualitative analysis of naturalistic real-world driving data. These results demonstrate that measurements of eye–head dynamics are useful data for detecting driver distractions, as well as for classifying attentive states in time and safety critical tasks.

Keywords: attention, active perception, top-down and bottom-up attention shifts, exogenous vs. endogenous cues, driver assistance systems

Citation: Doshi, A., & Trivedi, M. M. (2012). Head and eye gaze dynamics during visual attention shifts in complex environments. Journal of Vision, 12(2):9, 1–16, http://www.journalofvision.org/content/12/2/9, doi:10.1167/12.2.9. Received February 27, 2010; published February 9, 2012. ISSN 1534-7362.

Introduction

Analysis and understanding of human behavior, particularly of head and eye gaze behavior, has been a subject of interest for many years (Dodge, 1921) in cognitive psychology and neurophysiology; yet a full understanding of the causes, dynamics, and control mechanisms of head and eye movements is still a subject of active research (Freedman, 2008). More recently, researchers in the computer vision and artificial intelligence community have increasingly sought to incorporate information gleaned from patterns of human behaviors into intelligent human–machine interfaces (HMIs; Pentland, 2007; Trivedi, Gandhi, & McCall, 2007). The confluence of these two research paradigms allows for a deeper understanding of fundamental human cognition and behavior, as well as of how to interpret and modify that behavior, if necessary. Active perception of human operators by "intelligent environments" can allow assistance systems to improve performance and even help avoid dangerous circumstances.

Every year, traffic accidents result in over one million fatalities worldwide (Peden et al., 2004), including around 40,000 in the U.S. alone. Of those, an estimated 26,000 are due to some form of driver inattention (Angell et al., 2006; S. E. Lee, Olsen, & Wierwille, 2004). Recent advances in active vision and machine intelligence have resulted in the incorporation of camera-based driver analysis into driver assistance systems (Trivedi & Cheng, 2007; Trivedi et al., 2007), to predict future behaviors or inattention and thereby counteract poor driving behavior. By detecting patterns of body language in critical situations, such as the time prior to a lane change, these systems are able to predict the context of the situation.

In other interactive environments, such as intelligent command-and-control centers or intelligent meeting rooms, systems monitoring the participants or operators could provide assistance based on the subjects' body language. This may help reduce distractions and improve performance of whatever task is being performed. Landry, Sheridan, and Yufik (2001) discovered that certain patterns, or "gestalts," of aircraft on a radar screen drew the attention of air traffic controllers due to their location, though they were not relevant to the task. An air traffic control training manual from the FAA (Cardosi, 1999) states that "even in low-workload conditions, distractions can clobber short-term or working memory." An assistance system could mitigate such dangerous situations by detecting the context of the attention shift and providing warnings when the attention shift is not task related.

Intelligent environments that must assist in time and safety critical situations would clearly benefit from knowledge of the human's cognitive state. Such information could allow the system to detect, for example, whether the driver is distracted or actively engaged in a relevant task. Information about salient properties of the environment (Itti & Koch, 2000) could help in understanding attention; however, such systems are not good at predicting the actual focus of attention (Ballard & Hayhoe, 2009), nor do they give insight into the cognitive state of the driver.

Studies into the dynamics of head and eye gaze behavior hint that attention is linked to the types of movements of head and eye. A number of studies have shown that when presented with a stimulus in the field of view, the head tends to lag behind the eye during the gaze shift (Bartz, 1966; Goossens & Opstal, 1997). Others have found that early head motions, with respect to the saccade, are associated with target predictability, location, propensity to move the head, and timing (Freedman, 2008; Fuller, 1992; Khan, Blohm, McPeek, & Lefèvre, 2009; Morasso, Sandini, Tagliasco, & Zaccaria, 1977; Ron, Berthoz, & Gur, 1993; Zangemeister & Stark, 1982). These studies occurred in very controlled, arguably unnatural environments; more recent studies of "natural" tasks involving grasping, tapping, or sitting in a lecture (Doshi & Trivedi, 2009a; Herst, Epelboim, & Steinman, 2001; Mennie, Hayhoe, & Sullivan, 2007; Pelz, Hayhoe, & Loeber, 2001) have suggested that task-oriented gaze shifts may be associated with early head motions. These studies suggest the hypothesis that the interactive dynamics of head and eye movements may encode information about the type of attention shift, and thus the cognitive state, of the human subject.

These preliminary investigations led to two related questions in more complex environments such as a driving scenario:

1. Is it possible to observe differences in eye–head dynamics during different styles of attention shifts?
2. Is it possible to extract and use those cues to help identify or learn information about the subject's cognitive state?

In the following sections, we detail novel findings regarding the interaction of eye gaze and head pose dynamics under various attention-switching conditions in more complex environments and while engaged in safety critical tasks such as driving. In particular, we find that sudden, bottom-up visual cues in the periphery evoke a different pattern of eye–head yaw movement latencies than top-down, task-oriented attention shifts. In laboratory vehicle simulator experiments, a unique and significant (p < 0.05) pattern of preparatory head motions, prior to the gaze saccade, emerges in the top-down case. In contrast to the findings of Land (1992), we also observe the early head motion patterns during task-oriented behavior in real naturalistic driving scenarios.

We ultimately aim to understand whether one can detect whether the human subject had prior knowledge of the scene or the task upon which they are focusing. That contextual knowledge may be useful to real-time interactive assistance systems. Specifically, it could provide semantic hints such as whether there is a distraction or unplanned stimulus in a scene, or what kinds of goals the human subjects are pursuing. This finding is validated in qualitative analysis of naturalistic real-world driving data. These results show that measurements of eye–head dynamics are useful data for detecting driver distractions, as well as for classifying human attentive states in time and safety critical environments.

Related research

Human behavior analysis and prediction

A great amount of recent research has addressed the detection of human behaviors. Many of these examples use patterns of human behavior to learn about the scene or to predict future behaviors (Oliver, Rosario, & Pentland, 2000; Pentland & Liu, 1999; Ryoo & Aggarwal, 2007). In considering time and safety critical situations such as driving scenarios, we present an overview of research related to behavior analysis and prediction in vehicles.

Driver behavior and intent prediction

Recent research has incorporated sensors looking inside the vehicle to observe driver behavior and infer intent (Trivedi & Cheng, 2007; Trivedi et al., 2007). Bayesian learning has been used to interpret various cues and predict maneuvers, including lane changes, intersection turns, and brake assistance (Cheng & Trivedi, 2006; McCall & Trivedi, 2007; McCall, Wipf, Trivedi, & Rao, 2007). More recently, head motion has been shown to be a more useful cue than eye gaze for discerning lane change intentions (Doshi & Trivedi, 2009b).

The assumption made in all of these systems has been that head motion or eye gaze is a proxy for visual attention. In other words, the system measures head motion on the premise that the driver is likely paying attention to whatever they are looking at, with the assumption that attention is associated with gaze.

Gaze behavior and visual search

A gaze shift may or may not be associated with a particular goal. The broad question of "why people look where they look" is a subject of research for cognitive psychologists, neuroscientists, and computer vision researchers alike.

A significant amount of research in psychology has examined whether such visual searches are guided by goals (such as the goal of changing lanes) or by external stimuli (Itti & Koch, 2000; Jovancevic, Sullivan, & Hayhoe, 2006; Navalpakkam & Itti, 2005; Peters & Itti, 2006; Rothkopf, Ballard, & Hayhoe, 2007). These stimuli may include visual distractions, which could pop up in a scene and thereby attract the attention of the observer. For the most part, any visual search is presumed to be guided by some combination of goal- and stimulus-driven approaches, depending on the situation (Yantis, 1998).

Itti et al. (Itti & Koch, 2000; Navalpakkam & Itti, 2005; Peters & Itti, 2006) and others (Mahadevan & Vasconcelos, 2010; Zhang, Tong, Marks, Shan, & Cottrell, 2008) have made inroads in developing saliency maps that model and predict the focus of attention on a visual scene. Initial models were based on "bottom-up" cues, such as edge and color features. However, it was found that "top-down" cues may be more influential; in other words, the context of the scene and the goals of the human subject are crucial to determining where they look. Jovancevic et al. (2006) determined that even in the presence of potentially dangerous distractions, gaze in complex environments is tightly coupled with the task. Several other works have similarly concluded that in natural environments, saliency does not account for gaze; rather, task and context determine gaze behavior (Henderson, Brockmole, Castelhano, & Mack, 2007; Land, 1992; Rothkopf et al., 2007; Zelinsky, Zhang, Yu, Chen, & Samaras, 2005).

It is clear that in certain critical environments, distractions play an important role in attracting attention. Carmi and Itti (2006) show that dynamic visual cues play a causal role in attracting visual attention. In fact, perceptual decisions after a visual search are driven not only by visual information at the point of eye fixation but also by attended information in the visual periphery (Eckstein, Caspi, Beutter, & Pham, 2004). In certain cases, these stimuli may affect primary task performance; Landry et al. (2001) found that certain unrelated gestalt motion patterns on radar screens drew the attention of air traffic controllers away from the task at hand. In the driving context, there are many well-known cognitive and visual distractions that can draw the driver's attention (Angell et al., 2006; Doshi, Cheng, & Trivedi, 2009). Recarte and Nunes (2000) measured the number of glances to the mirror during a lane change, noting that visual distractions decrease the glance durations by 70–85%. This result is well aligned with more recent results indicating the limitations of drivers' multitasking abilities (Levy & Pashler, 2008; Levy, Pashler, & Boer, 2006). Moreover, some suggest that visual distractions may even increase the likelihood of "change blindness," a phenomenon whereby a subject may look in a certain area and not see or comprehend the objects in front of them (Durlach, 2004; Y. C. Lee, Lee, & Boyle, 2007). In these cases, it would be useful to know whether a gaze shift is attributable more to the primary goal or context (e.g., maintaining the vehicle's heading in the lane) or to a secondary and potentially irrelevant visual stimulus (e.g., a flashing billboard).

Several studies have used eye gaze or head pose to detect the attention of the subject (Batista, 2005) or to estimate user state or gestures (Asteriadis, Tzouveli, Karpouzis, & Kollias, 2009; Matsumoto, Ogasawara, & Zelinsky, 2000). In this study, we instead use the temporal relationship of eye gaze and head pose to determine the attentional state of the subject and proceed to use that information as contextual input to event detection and criticality assessment systems. In the following section, we describe the framework of eye gaze interactivity analysis, and in the later sections, we describe some supporting experiments.

Attention shifts

Attention shifts can be of two kinds or some combination of the two (Yantis, 1998). Top-down attention shifts occur when the observer has a particular task in mind. This task may necessitate a visual search of a potentially predetermined location or a search of likely relevant locations. An example may be as simple as shifting one's attention from a television to a newspaper, after having turned the television off. There may also be learned tasks, such as the search for oncoming cars when crossing a road, as a driver in a vehicle or as a pedestrian at a crosswalk.

Bottom-up attention shifts are caused by interesting stimuli in the environment. These stimuli may include distractions or salient regions of a scene. For example, flashing police lights on the highway may draw a driver's attention, or an audience member may be drawn to an instant chat message popping up on the screen during a technical presentation.

Generally speaking, Pashler, Johnston, and Ruthruff (2001) observe that "whereas exogenous attention control is characterized as stimulus driven, endogenous control is typically characterized as cognitively driven." Others have encoded such notions into computational models of attention (Mozer & Sitton, 1998).

In many cases, an object in the scene may easily be a distraction in one instance and part of the primary task at hand at another time. For example, in a classroom, the person standing in front of the blackboard may be the teacher during a lesson, to whom the student should be paying attention. On the other hand, the person in front of the blackboard may just be someone walking by, who is distracting the student from the task at hand. By classifying the student's interactive behaviors, we may be able to set the context for the attention shift and understand the state of the subject, whether cognitively driven (top-down) or stimulus driven (bottom-up).

Head and eye gaze during attention shifts

Zangemeister and Stark (1982) performed a controlled study of eye–head interactions and posited various conditions of different styles of eye–head movements. In their paper, they found several styles of movements, depicted in Figure 1. Among the most pertinent are those labeled "Type III," which include early or anticipatory head movement with respect to the gaze shift. They theorized that this behavior is associated with a repetitive, predetermined, or premeditated attentional shift, as is the case for any goal-directed attentional shift.

Figure 1. Examples of various interactions of head and eye movements, with type labels from Zangemeister and Stark (1982). Note that in certain cases eye gaze tends to move first, whereas in others the head tends to move first.

Morasso et al. (1977) examined control strategies in the eye–head system and observed that "The head response [to a visual target], which for random stimuli lags slightly behind the eye, anticipates instead for periodical movements [of the target]." The implication is once again that for trained or predetermined gaze shifts, the head movement anticipates the gaze.

Indeed, eye shifts can occur much faster than head movements, and thus near-instantaneous gaze changes can be made with eye shifts alone. Shifts of larger amplitudes would necessitate a head movement to accommodate the limited field of view of the eyes. Most likely, when a large visual attentional shift is about to occur and the observer has prior knowledge of the impending shift, these studies imply that there may be some amount of preparatory head motion (Pelz et al., 2001).

Freedman (2008) and Fuller (1992) each included thorough reviews of eye–head coordination and verified the same results seen above. Several variables, including initial eye, head, and target positions, along with predictability, seem to affect the latencies of head movements (Ron et al., 1993). A few studies have touched on the possibility that attention and task can influence the dynamics of eye–head movements (Corneil & Munoz, 1999; Herst et al., 2001; Khan et al., 2009; Mennie, Hayhoe, Sullivan, & Walthew, 2003). These studies are well controlled in laboratory environments but limited in their generalizability to natural environments and more complicated tasks.

In this study, we venture to examine directly the effects of goal- vs. stimulus-driven attentional shifts on eye–head coordination. Further, by classifying the type of shift, we are able to propose a novel model to determine the cognitive state of the subject, which may prove useful to assistive human–machine interfaces such as driver assistance systems.

In the following sections, we show that by extracting the yaw dynamics of eye gaze and head pose, it may be possible to identify those gaze shifts that are associated with premeditated or task-oriented attentional shifts, driven by "endogenous" cues. In each case, we find that a majority of endogenous, task-related shifts occur with an anticipatory Type III gaze shift. Based on these results and the studies listed above, we might further hypothesize that a Type III gaze shift could imply a task-related shift, and that Type I or II shifts are more likely to occur in conjunction with stimulus-related gaze shifts, associated with "exogenous" cues.
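This hypothesis suggests a simple operational rule. The following minimal sketch (Python; the function and labels are illustrative and not part of the original study's tooling) shows how a single gaze shift might be labeled from the relative onset times of the head and eye movements:

```python
def label_gaze_shift(head_onset_ms, eye_onset_ms):
    """Label one gaze shift by the head-eye onset latency: a head that
    leads the eye suggests an anticipatory Type III (task-related)
    shift; a head that lags suggests a Type I/II (stimulus-related) one."""
    latency_ms = head_onset_ms - eye_onset_ms
    if latency_ms < 0:
        return "Type III (anticipatory head movement)"
    return "Type I/II (head lags the eye)"
```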

Methods

Participants

Ten volunteers, 9 males and 1 female, consented to participate in the experiment. The participants were mostly in their 20s, with two in their 30s, one over 40, and the female subject near the median age of 25. Every subject had a valid driver's license, with driving experience ranging from novice to decades behind the wheel.

Apparatus

The experiment was conducted in a driving simulator, with several additional features to facilitate the various conditions of the experiment. The simulator was configured as shown in Figure 2. A 52-inch screen was placed 3 ft in front of the subject, with a Logitech steering wheel mounted at a comfortable position near the driver. An additional secondary screen was placed to the left of the main monitor, in the peripheral vision of the subject.

Figure 2. Experimental setup of the LISA-S test bed. The test bed includes a PC-based driving simulator, with graphics shown on a 52-inch monitor, an audio system, and a steering wheel controller. The PC also controls a secondary monitor, located at an angle of 55° with respect to the driver, in a similar location to where the side-view mirror would be. The test bed includes a head and eye gaze tracking system, as well as a vision-based upper body tracking system (not used in this experiment). All the data from the gaze and body trackers are recorded synchronously with steering data and other parameters from the driving simulator.

The main monitor was configured to show a PC-based, interactive, open source "racing" simulator, TORCS (TORCS, The Open Racing Car Simulator, 2010). The software was modified in several ways to make it a more appropriate driving simulator. The test track was chosen to be a two-lane highway/main road running through a city. Several turns on the driving track necessitated a significant slowdown to maneuver through, keeping the driver actively engaged in the driving task. Additionally, the maximum speed of the "vehicle" was limited to appropriate highway speeds, in order to discourage excessive speeding.

In this experiment, the track contained no other vehicles, to limit the complexity of interactions. The ego-vehicle and road parameters, such as friction and vehicle weight, were fixed to approach real-world conditions. However, they were also constrained in order to facilitate an easy learning process, as some subjects had never used driving simulators before. The tire friction was increased to prevent sliding, and the limit on maximum speed also allowed drivers to quickly adjust to the simulator.

A stereo-camera-based, non-intrusive, commercial eye gaze tracker, faceLAB by Seeing Machines (Seeing Machines faceLAB Head and Eye Tracker, 2004), was set up just in front of and below the main monitor. It was placed so as not to obstruct the field of view of the driver. The system required calibration for each subject, which was performed prior to any data collection. Once calibrated, it output a real-time estimate (updated at 60 Hz) of gaze location (pitch and yaw) and head rotation (pitch, yaw, and roll). These quantities were calculated and transmitted concurrently in "Minimum Latency" mode to the PC running the driving simulator, along with a "validity" signal indicating confidence in the tracking (a lack of which would be caused by blinks and occlusions). The gaze and head data, along with all the driving parameters such as distance traveled and lateral position, were automatically timestamped and logged to disk every 10 ms, without any filtering or smoothing operations. The secondary monitor showed various text messages as described below, depending on the condition. This display was controlled synchronously by the same PC that ran the simulator.
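For concreteness, the sketch below shows one plausible in-memory layout for such a synchronized 10-ms log, along with a loader. The field names and the CSV format are hypothetical illustrations, not the actual faceLAB or TORCS log schema:

```python
import numpy as np

# Hypothetical layout of one logged sample (10-ms period); field names
# are illustrative, not the actual faceLAB/TORCS schema.
sample_dtype = np.dtype([
    ("t_ms",       np.int64),    # timestamp, milliseconds
    ("eye_yaw",    np.float64),  # gaze yaw, degrees
    ("eye_pitch",  np.float64),  # gaze pitch, degrees
    ("head_yaw",   np.float64),  # head rotation, degrees
    ("head_pitch", np.float64),
    ("head_roll",  np.float64),
    ("valid",      np.bool_),    # tracker confidence (False on blinks/occlusions)
    ("lat_pos",    np.float64),  # lateral lane position from the simulator
])

def load_log(path):
    """Read a raw, unfiltered log exported as CSV, one row per sample."""
    return np.genfromtxt(path, dtype=sample_dtype, delimiter=",")
```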
Design

Each subject was run in two conditions, endogenous and exogenous. In each condition, the subject had a primary task of maintaining their lane heading and, as part of a secondary task, was cued in a different way to check the secondary monitor to find out whether to change lanes. These cues are demonstrated in an illustrative example in Figure 3.

The "endogenous" condition (Condition 1) of the experiment was intended to stimulate "goal-oriented" attention switching, such as the planned visual search of a driver checking mirrors prior to a lane change. Large overhead signs appeared at several constant, predefined locations around the track. The subject was instructed that, upon noticing these signs coming up in the distance, the subject should glance over at the secondary monitor. This monitor would be displaying a message, "Left Lane" or "Right Lane," indicating in which lane the driver should be. The subject was told to glance over to the message and move to the corresponding lane by the time the overhead sign passed by. The duration of the cue appearance varied from 5 to 20 s. In this manner, the subject was allowed enough time to plan and initiate the attention switch by herself; we label this the "endogenous" cue condition (Condition 1).

Figure 3. Illustrative (staged) example of the experimental paradigm. In each cuing condition, we measure the differences in eye–head interactions during attention shifts to a secondary monitor, which the driver is required to check for instructions. In the "endogenous" condition, the driver is presented with a cue in the primary monitor and allowed to make a goal-oriented or preplanned attention shift. In the stimulus-oriented "exogenous" cuing condition, the secondary monitor displays a sudden change in color, drawing the driver's attention to the target.

The second condition was designed to evoke an unplanned, "stimulus-oriented" attention-switching response, as if the driver were suddenly distracted. In this condition, the driver was also told to maintain the lane as best as possible. The cue to change lanes would come from the secondary monitor, whose entire screen would change color suddenly, at a set of predefined times unknown to the subject. This color change occurred in concert with a potential change of the text message, once again to either "Left Lane" or "Right Lane." Upon noticing the change, the driver was tasked with maneuvering to the appropriate lane as soon as it was safe to do so. The subject was told that the colors were random, not correlated with the text, and would occur at random times. Thus, the subject's attention switch was hypothetically initiated by an external cue in the peripheral field of view; we label this the "exogenous" cue condition (Condition 2).

The "exogenous" cue would ideally have been a stimulus not associated with the task of driving. However, it would have been difficult to collect enough data in which the subject freely decided to attend to a potentially irrelevant stimulus. By associating the stimulus-style cue with the secondary task of lane selection, it became possible to gather consistent data about how the subject would respond to stimulus-oriented cues such as sudden flashes, motion, or other unplanned distractions. This could then be directly compared to driver behaviors under the "endogenous" condition described above.

Each condition consisted of 10 min of driving, corresponding to 12 to 15 lane changes per condition. The order of the conditions was presented randomly, and to ensure comfort, participants were offered breaks between conditions.

Procedure

Participants were told that the experiment pertained to naturalistic lane-keeping behavior. After calibrating each subject on the gaze tracker, the subject was given 5–10 min to acclimatize herself to the simulator. She was queried once she felt comfortable with the simulator, and her comfort level was verified by subsequently asking her to keep the vehicle in a single lane for at least 60 s.

For the remainder of the experiment, the subject was tasked primarily with maintaining her current lane to the best of her ability. This kept the subject actively engaged in the driving process throughout the experiment. As part of a secondary task, the subject was instructed to respond to the cues in each condition and, if necessary, change lanes when safe to do so. This required a glance to the secondary screen, or "side mirror," in order to decide which lane to move into. Though this was not entirely naturalistic, it was reminiscent of glancing at the mirror prior to lane changes to scan for obstacles, and participants had little difficulty following the instructions.

Analysis

The automatically logged data sets from each experiment were then processed to analyze the dynamics of head and eye movements leading up to the attention shift. For each condition, the appearance time of the cue was determined, and the subsequent gaze shift was taken as an example of a shift of interest. The secondary monitor was fixed at an angle of approximately 55° from the subject, so only gaze shifts that resulted in glances of that magnitude were considered.

In 34% of the cases in the entire experiment, cues appeared that were not followed by a tracked gaze shift of sufficient magnitude. This was caused either by a lack of response from the subject or by a lack of tracking confidence from the head and eye tracker. Occasionally, the tracker would lose an accurate representation of the subject's head or eye movements, which would be reflected in the "validity" signal output by the tracker. Whenever there was no valid gaze shift of sufficient magnitude following the appearance of a cue, the example was discarded from further analysis.

Fewer than 5% of the examples in the "exogenous" condition were discarded due to the unforeseen effects of high cognitive load. Occasionally, a color change would appear while the driver was actively engaged in a sharp turn, either causing the driver to lose control of the vehicle or to shift their glance slightly without paying much attention to the color change. Where these effects were observed, the examples were discarded, as in these cases drivers behaved inconsistently, most likely due to task overload. Such effects could be the topic of further investigations, but examples were too few to discuss in detail in this study.

Each subject j had approximately 10 examples of each condition after the pruning step described above. For the remaining examples in both conditions, the time of the first gaze saccade to the secondary monitor was found in an iterative manner. This was done to avoid any false positives in an automatic saccade detection procedure, given the somewhat noisy nature of the gaze data.

In the first step, the example was manually annotated with the approximate time of the initiation of the gaze saccade, during the first gaze shift of approximately 55° in yaw (i.e., to the target monitor). Subsequently, the point of maximum yaw acceleration in a local 50-ms window $W$ around the annotated point $T_L$ was found. This was calculated using a 5-tap derivative-of-Gaussian filter to temporally smooth and find the second derivative of the gaze rotation signal. The maximum point of the second derivative was fixed as the location of the gaze saccade:

$$T^{S}_{ij} = \operatorname{argmax}_{t \in W} \ddot{e}_Y[t], \tag{1}$$

where $e_Y[t]$ represents the yaw position of the eye at time $t$. An example of this detection procedure can be seen in Figure 4; we found that the local point of maximum eye yaw acceleration was a consistent labeling method for the following analysis.

Figure 4. Sample data showing the procedure for detection of the start of the eye saccade. A local area around a manually marked point is searched for the maximum eye yaw acceleration. This point is labeled as the start of the saccade, $T^{S}_{ij}$. The dynamics of the head yaw, i.e., position and motion, at this point $T^{S}_{ij}$ are extracted for further analysis.

Given the time of the gaze saccade, $T^{S}_{ij}$, two features based on head dynamics were calculated for each example $i$ of subject $j$. The first was simply the value of the head yaw at the point of the saccade:

$$P_{ij} = h_Y[T^{S}_{ij}], \tag{2}$$

where $h_Y[t]$ represents the yaw position of the head at time $t$.

The second feature was the average yaw velocity in the 100 ms prior to the saccade:

$$M_{ij} = \frac{1}{10} \sum_{t = T^{S}_{ij} - 100\,\mathrm{ms}}^{T^{S}_{ij}} \dot{h}_Y[t]. \tag{3}$$

$P_{ij}$ and $M_{ij}$ thus represent the yaw position and motion features of each example $i$ of subject $j$. These features were chosen to capture whether the head was moving in a preparatory motion prior to the saccade, and by how much. Based on the features above, several additional statistics were calculated:

$$\tilde{P}^{1}_{j} = \operatorname{median}_i \left( P_{ij} \mid \text{Condition 1} \right), \tag{4}$$

$$\tilde{P}^{2}_{j} = \operatorname{median}_i \left( P_{ij} \mid \text{Condition 2} \right), \tag{5}$$

$$\tilde{M}^{1}_{j} = \operatorname{median}_i \left( M_{ij} \mid \text{Condition 1} \right), \tag{6}$$

$$\tilde{M}^{2}_{j} = \operatorname{median}_i \left( M_{ij} \mid \text{Condition 2} \right). \tag{7}$$

Results and discussion

Each trial was characterized by one main independent variable: the condition, either endogenous or exogenous. Two related variables were measured, namely, the head yaw $P_{ij}$ and head motion $M_{ij}$ at the point of the gaze saccade. The order of presentation was also randomized, but there is no interaction between order and condition (p > 0.5), so we collapse across the order groups.

A main effect of the condition is found both in yaw (mean($P^{1}_{ij}$) = 6.68°, mean($P^{2}_{ij}$) = 1.54°, F(1,9) = 9.44, p < 0.05) and in motion (mean($M^{1}_{ij}$) = 41.36°/s, mean($M^{2}_{ij}$) = 22.12°/s, F(1,9) = 11.23, p < 0.05). Figure 5 shows the distribution of all the data at the point of the saccade, clearly demonstrating that the yaw and motion both tend to be greater at the point of saccade in the endogenous condition.

Figure 6 shows the average head yaw over all the examples, time aligned to the saccade location. The head yaw in Condition 1, shown in red, demonstrates a clear increase in head motion as much as 500 ms prior to the saccade, with the greatest motion occurring around 200 ms prior. Head yaw in Condition 2 begins increasing in earnest only 50–100 ms prior to the saccade.

Prior studies (Freedman, 2008; Pelz et al., 2001; Zangemeister & Stark, 1982) have noticed early head motions up to 200 ms in advance of the saccade. The results of these experiments are thus in mild agreement, although it is apparent that in preparation for task-oriented gaze shifts in the driving situation, the head movement may begin as much as 500 ms in advance of the saccade.

To discount the effect of each subject's outliers, the subject-wise medians of these data, $\tilde{P}^{1,2}_{j}$ and $\tilde{M}^{1,2}_{j}$, were also calculated for further analysis. Given the limited data set size, the two pairs of statistics were then analyzed using the Wilcoxon Signed-Rank test, a nonparametric alternative to the t test. This analysis showed significant differences in both cases: $\tilde{P}^{1}_{j}$ = 6.2° > $\tilde{P}^{2}_{j}$ = 1.1° (Z = 2.80, p = 0.0051 < 0.01) and $\tilde{M}^{1}_{j}$ = 45.3°/s > $\tilde{M}^{2}_{j}$ = 13.9°/s (Z = 2.50, p = 0.0125 < 0.02). Figure 7 shows the differences in these median statistics, $\tilde{P}_{j}$ and $\tilde{M}_{j}$.
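As an illustration of how Equations 2 through 7 and the accompanying test might be computed, consider the following sketch (hypothetical names; assumes the 10-ms logging interval described earlier):

```python
import numpy as np
from scipy.stats import wilcoxon

FS = 100  # samples per second, i.e., one sample per 10 ms

def head_features(head_yaw, t_sacc):
    """Equations 2 and 3: head yaw position P at the saccade sample
    t_sacc, and mean head yaw velocity M over the preceding 100 ms
    (10 samples at 100 Hz)."""
    P = float(head_yaw[t_sacc])
    vel = np.gradient(head_yaw) * FS               # degrees per second
    M = float(np.mean(vel[t_sacc - 10:t_sacc]))
    return P, M

def subject_medians(values_by_subject):
    """Equations 4-7: one median per subject, computed separately for
    each condition's list of per-example feature values."""
    return {j: float(np.median(v)) for j, v in values_by_subject.items()}

def paired_wilcoxon(medians_cond1, medians_cond2):
    """Paired nonparametric comparison of the per-subject medians
    across the two conditions; returns (statistic, p-value)."""
    return wilcoxon(medians_cond1, medians_cond2)
```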

Figure 5. Overall distribution of head yaw position and head yaw motion at the time of the eye gaze saccade for each condition, including all examples of all subjects.

Figure 6. Average head yaw prior to the eye gaze saccade under each condition of the experiment, aligned to the position of the saccade. Dotted lines show the variance of the overall data. In Condition 1, a clear pattern of early head movement, beginning 0.5 s prior to the actual gaze shift, emerges. This early head movement is much less evident in Condition 2.

Figure 7. Distribution of subject-wise median head yaw positions and head yaw motions at the time of the eye saccade. Error bars represent the standard error of the median.

The endogenous cuing condition, corresponding to a "task-oriented" attention shift, demonstrates a clear pattern of greater and earlier head motion just prior to the saccade.

Figure 8 demonstrates the relative timings of the gaze saccade after the appearance of the cue. The exogenous cue tends to attract attention very quickly, and most gaze shifts are made within 500 ms of the cue. This is in clear contrast to the endogenous cue, which, as expected, results in the subject shifting gaze in a preplanned manner. This separation shows that the experimental conditions elicited the two different styles of gaze shifts appropriately.

Figure 8. Distribution of saccade timings, after the onset of the first cue. The delay in Condition 1 is as expected, as subjects take time to detect the cue and plan the saccade. Condition 2 follows the pattern of unplanned saccades, mostly occurring around 500 ms after cue onset.

In order to characterize the time course of the head movements with respect to the gaze shift, we can measure the timing of the first "significant" head motion. In Figure 9, the histograms of the first head motion (where head motion goes above a fixed, manually selected threshold of 17°/s) are shown for each condition. This is found by searching in both directions from the point of the gaze saccade, to determine where the head motion first exceeds the threshold. The endogenous cuing condition can be observed to elicit a greater portion of early head motions.

Figure 9. Distribution of first significant head motions (over a fixed threshold) relative to the gaze saccade. The histograms represent the actual measurements, and the solid lines represent a fitted Gaussian. The endogenous, "goal-oriented" condition shows a marked difference, with a majority of head motions occurring prior to the saccade.
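The bidirectional threshold search just described can be made concrete with a small sketch (hypothetical names; the 17°/s threshold is the one used above):

```python
import numpy as np

def first_significant_head_motion(head_vel, t_sacc, thresh=17.0):
    """Search outward from the saccade sample t_sacc for the nearest
    sample where |head yaw velocity| exceeds the threshold (deg/s).
    Returns a signed offset in samples: negative values mean the head
    started moving before the gaze saccade."""
    n = len(head_vel)
    for offset in range(n):
        for t in (t_sacc - offset, t_sacc + offset):
            if 0 <= t < n and abs(head_vel[t]) >= thresh:
                return t - t_sacc
    return None  # head never crossed the threshold in this example
```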

Kinematics of eye and head movements

Here, we measure a number of other variables that could have influenced the onset of early head motion due to the kinematics of the eye–head motion.

Prior studies in controlled conditions have determined that the amplitude of the gaze shift has a significant impact on the amount of head movement prior to the saccade (Freedman, 2008), as larger movements tended to correlate with early head motions. Another critical variable was found to be the predictability of the target location, which also positively influenced the early movement of the head. However, in this study, we have fixed the target location, and the amplitude as determined by initial position is generally constant as well. To verify this, we compared the starting yaw positions in Figure 10 and found no reliable differences in starting position (EyeInitialYaw: F(1,9) = 2.84, p = 0.09; HeadInitialYaw: F(1,9) = 0.95, p = 0.33). The subject is always aware of the location of the target, in both the endogenous and exogenous cuing cases. In spite of the constant target and shift amplitude, we still find variations in the amount of early head movement, correlating with the type of cuing condition, whether driven endogenously by top-down motor commands or exogenously by bottom-up stimulus responses.

Figure 10. Starting yaw position for eye and head motion under each condition. Error bars represent the standard error of the mean.

Bizzi, Kalil, and Morasso (1972) also report, in controlled environments, that predictive movement conditions involve lower peak velocities and shorter movement durations than triggered conditions. To analyze these effects, we measured the peak velocities and saccade durations for both eye and head movements in Figures 11 and 12. In contrast to the earlier studies, the peak eye velocities in the endogenous condition actually trend toward being significantly greater than in the exogenous condition (F(1,9) = 5.16, p = 0.02). In all other cases, there were no significant differences (PeakHeadVelocity: F(1,9) = 0.14, p = 0.71; EyeMovementDuration: F(1,9) = 1.24, p = 0.26; HeadMovementDuration: F(1,9) = 1.10, p = 0.30). We are thus not able to observe slower peak velocities or movement durations in the endogenous cuing case.

Figure 11. Maximum yaw rates for eye and head motion under each condition. Error bars represent the standard error of the mean.

Figure 12. Duration of motion from initial movement until target. Note that, in the case of eye motion, this is a superset of the saccade duration. Error bars represent the standard error of the mean.

Finally, during predictive movements, it has been reported that the head contribution may be larger than during triggered movements (Freedman, 2008). If the head contribution were larger in predictive movements, we would observe a greater maximum in the head yaw in the endogenous case. As demonstrated in Figure 13, we find no reliable differences between the maximum yaw positions of either the head or the eye between the two conditions (EyeMaxYaw: F(1,9) = 2.12, p = 0.15; HeadMaxYaw: F(1,9) = 0.15, p = 0.70).

Figure 13. Maximum yaw position for eye and head motion under each condition. Error bars represent the standard error of the mean.

Some of these results have been repeated in other controlled environments (Fuller, 1992; Khan et al., 2009; Zangemeister & Stark, 1982). In this more naturalistic study, we found almost no variations between the conditions. This implies that under more natural conditions, along with the more complex task of driving, variables such as initial gaze position and gaze shift duration seem uncorrelated with either style of gaze shift. This could further imply that under these conditions, such variables are irrelevant to the onset of early head motion, in contrast to the conclusions of earlier studies, such as those reviewed in Freedman (2008). However, more experiments should be done in these cases to verify these claims.

Analysis of misclassifications

To analyze the misclassifications associated with each of the test statistics (either $P_{ij}$ or $M_{ij}$), we show confusion matrices in Tables 1 and 2. Results approach 66% detection rates using the head yaw position, with fewer misclassifications associated with predictions of exogenous, stimulus-oriented behavior. Given this information, it may be possible to generate a classifier to determine in real time whether an individual example is more similar to Condition 1 or Condition 2. The statistical basis for such a classifier is provided by the ANOVA results in the previous section, giving credence to the possibility of automatically classifying the examples. Such a classifier could then directly be used to improve advanced human–machine interfaces in task-oriented environments; we leave the analysis of such a classifier to future work.

Number of examples    Actual G    Actual S
Predicted G              63          41
Predicted S              25          87

Table 1. Confusion matrix for detecting endogenous, goal-oriented gaze shifts (G) versus exogenous, stimulus-oriented shifts (S) in the simulator experiment, using the head yaw position criterion. Correct classification rate is 69%.
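The classification rates quoted for Tables 1 and 2 follow directly from the confusion matrices; a minimal sketch of that computation (function and label names are illustrative):

```python
import numpy as np

def confusion_matrix(y_actual, y_predicted, labels=("G", "S")):
    """2x2 confusion matrix with rows = predicted and columns = actual,
    matching the layout of Tables 1 and 2, plus the correct rate."""
    idx = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((2, 2), dtype=int)
    for a, p in zip(y_actual, y_predicted):
        cm[idx[p], idx[a]] += 1
    return cm, np.trace(cm) / cm.sum()

# Sanity check against Table 1: (63 + 87) correct of 216 examples = 69%.
```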

Number of examples    Actual G    Actual S
Predicted G              65          39
Predicted S              35          77

Table 2. Confusion matrix for detecting endogenous, goal-oriented gaze shifts (G) versus exogenous, stimulus-oriented shifts (S) in the simulator experiment, using the head yaw motion criterion. Correct classification rate is 66%.

The results in these prior sections serve to validate the hypothesis that humans exhibit different behaviors when switching attention in various contexts. Of particular importance is the attention switch in time and safety critical situations such as driving, where distractions pose significant dangers. Knowledge of a planned attention shift could give hints as to whether the driver is actively engaged with a task. In the next section, we discuss an application of this eye–head dynamics knowledge in real-world task-oriented situations, in order to design classifiers to detect drivers' intentions.

Investigations into naturalistic driving

McCall et al. (2007) demonstrated the ability to detect a driver's intent to change lanes up to 3 s ahead of time, by analyzing driver head motion patterns. Doshi and Trivedi (2009b) extended this study and found that head motion was in fact an earlier predictor of lane change intentions than eye gaze. However, the reasons for this interesting finding were not clear.

We propose that the visual search that occurs prior to lane changes, and potentially in other similar common driving maneuvers, is initiated by a top-down process in the driver's mind. The driver has a goal in mind and, thus, is trained to initiate a search in particular locations, such as the mirrors and over the shoulder, for obstacles. Here, we present a deeper analysis of real driving data to support this hypothesis, that the visual search prior to lane changes is a Type III search. The ability to detect this type of behavior is crucial in being able to identify the context of the situation and then to assess its criticality or determine whether objects around the vehicle are of interest.

Naturalistic driving data collection

For this research, data were collected in a driving experiment with an intelligent vehicle test bed outfitted with a number of sensors detecting the environment, vehicle dynamics, and driver behavior. These data are drawn from the same data as were used in the lane change intent work by McCall et al. (2007). A camera-based lane position detector and CAN-Bus interface provided most of the data related to the vehicle and surrounding environment.

The main driver-focused sensor was a rectilinear color camera mounted above the center console facing toward the driver, providing 640 × 480 resolution video at 30 frames per second. To calculate head motion, optical flow vectors were compiled and averaged in several windows over the driver's face (detected with the Viola and Jones, 2001, face detector). This method was found to be stable and robust across different harsh driving conditions and various drivers. Other methods could be used for these purposes (Murphy-Chutorian, Doshi, & Trivedi, 2007; Murphy-Chutorian & Trivedi, 2009). Various automatic eye gaze detectors exist (e.g., Wu & Trivedi, 2007); however, to ensure accuracy and reliability, eye gaze was labeled into one of 9 categories using a manual reduction technique similar to several recent NHTSA studies on workload and lane changes (Angell et al., 2006; S. E. Lee et al., 2004; details on this procedure can be found in Doshi & Trivedi, 2009b). While the technique did not capture the subtle variations in eye movements, it was useful for capturing the timing of the eye saccades to the mirrors and side windows.

The data set was collected from a naturalistic ethnographic driving experiment in which the subjects were not told that the objective was related to lane change situations. Eight drivers of varying age, sex, and experience drove for several hours each on a predetermined route. A total of 151 lane changes were found in highway situations with minimal traffic; 753 negative samples were collected, corresponding to highway "lane-keeping" situations.

Analysis

These examples were used with the features described above to train a classifier to predict, 3 s in advance of a lane change, whether the driver would intend to change lanes. Another classifier was trained for 2 s ahead of the lane change.
In similar studies, an extension of SVM, namely Relevance Vector Machines, was found to be a good classifier (see McCall et al., 2007, and Tipping & Faul, 2003, for more details) and was thus used here as well. In a comparative study, Doshi and Trivedi (2009b) found that such a classifier based on head motion has significantly more predictive power than one based on eye gaze 3 s ahead of the lane change, but not 2 s ahead. By looking at the outputs of each classifier, we can get a sense of the performance of eye gaze and head pose over time. Specifically, the RVM classifier outputs a class membership likelihood, ranging from −1 (for negative examples) to 1 (for positive examples); thus, the more positive the value, the more confident the classifier is in its prediction of a true intention. Figure 14 shows the processing framework for this scenario.
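As a rough illustration of this pipeline, the sketch below substitutes an RBF-kernel support vector machine for the Relevance Vector Machine, since the latter is not built into scikit-learn; squashing the scores into [−1, 1] is our assumption to mirror the RVM's output range, not the actual implementation used in this study:

```python
import numpy as np
from sklearn.svm import SVC

def train_intent_classifier(X_train, y_train):
    """Stand-in for the RVM: an RBF-kernel SVM also yields a signed
    confidence score per example. X holds head-motion (or eye-gaze)
    features sampled 3 s or 2 s before each event; y is +1 for lane
    changes and -1 for lane keeping."""
    clf = SVC(kernel="rbf", gamma="scale")
    clf.fit(X_train, y_train)
    return clf

def intent_confidences(clf, X):
    scores = clf.decision_function(X)
    # Squash to [-1, 1] so 0 corresponds to chance, mirroring the
    # class membership likelihood range of the RVM output (assumption).
    return np.tanh(scores)
```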

Figure 14. Flowchart of the proposed approach for evaluating naturalistic driving data during lane change-associated visual search.

By looking at the average over all positive examples of these "Intent Prediction Confidences" (IPC), shown in Table 3, we can tell that the eye-gaze-based classifier is hardly better than chance 3 s before the lane change but improves significantly in the 2-s case. On the other hand, the head-motion-based classifier works very well even in the 3-s case. The results indicate that drivers engage in an earlier preparatory head motion before shifting their gaze to check the mirrors or blind spot.

Seconds before lane change           3 s        2 s
Eye gaze classifier (IPC_eye)       0.0027     0.4691
Head pose classifier (IPC_head)     0.4411     0.6639
ANOVA: IPC_head > IPC_eye           p < 0.01   p > 0.05

Table 3. Average intent prediction confidences (IPC) for each type of classifier, where a value of 0 represents chance.

ANOVA significance tests comparing the populations of Intent Prediction Confidences demonstrate quantitatively that the preparatory head motion prior to the eye gaze shift is a significant trend 3 s prior to the lane change: The head pose classifier is significantly more predictive than the eye gaze classifier (F(1,7) = 24.4, p < 0.01). We can conclude that early head motions begin to indicate a driver's intentions prior to the actual gaze saccade. This implies that the detection of a goal-oriented gaze shift may be quite useful in future advanced driver assistance systems, by helping to determine the context of the drive and whether the driver is indeed paying attention to task-relevant objects.
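The IPC entries of Table 3 are averages of the classifier confidence over the true lane-change examples; a one-function sketch (hypothetical names):

```python
import numpy as np

def mean_ipc(confidences, labels):
    """Average Intent Prediction Confidence over the true lane-change
    (positive) examples; 0 corresponds to chance, as in Table 3."""
    conf = np.asarray(confidences, dtype=float)
    lab = np.asarray(labels)
    return float(conf[lab == 1].mean())
```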

Concluding remarks

We have demonstrated how the dynamics of overt visual attention shifts evoke certain patterns of responses in eye and head movements, in particular the interaction of eye gaze and head pose dynamics under various attention-switching conditions. Sudden, bottom-up visual cues in the periphery evoke a different pattern of eye–head yaw dynamics than top-down, task-oriented attention shifts. In laboratory vehicle simulator experiments, a unique and significant (p < 0.05) pattern of preparatory head motions, prior to the gaze saccade, emerges in the top-down case.

This finding is validated in qualitative analysis of naturalistic real-world driving data. In examining the time course of visual searches prior to lane changes, the same significant pattern of early head motions appeared, indicating a "task-oriented" attention shift. Though it would be dangerous to collect data regarding stimulus-oriented attention shifts in real driving, the simulator experiments presented here seem to demonstrate that sudden stimuli attract eye motions as early as, or earlier than, head motions.

One important question arises regarding the nature of the stimulus presented in this experiment, specifically whether the "exogenous" cuing condition actually corresponds to a distraction-driven attention shift. Pashler et al. (2001) note that "abrupt onset" cues may not be inherently attractive in a bottom-up sense but are only attractive if those cues also happen to be useful for whatever top-down tasks the user may be engaged in. This "cognitive penetration" of bottom-up cuing could imply that the "exogenous" cuing case presented in these experiments may not correspond to a real bottom-up, reflexive "distraction." However, it could be argued that in real driving scenarios, users are primed to detect abrupt onset stimuli because they may be relevant to the safety of the driving situation. In this context, any abrupt onset stimulus that is unrelated to the primary task at hand (safe driving) can be considered a "distraction," and the results of this study show that the visible dynamics of attention shifts differ in this case. More work could be done to characterize the nature of distractions in complex environments, as they relate to the top-down goals of the subject, and then to include different forms of distractions in these experiments.

The results presented here indicate that measurements of eye–head dynamics may prove to be useful data for classifying human attentive states in time and safety critical environments. By detecting a "predictive" or early head movement prior to a gaze shift, an intelligent assistance system could improve its estimate of whether the subject is actively engaged in the task at hand. In potentially dangerous situations such as driving and flying, active perception of humans thus has a large role to play in improving comfort and saving lives.

Acknowledgments

This research was supported by a National Science Foundation Grant and a Doctoral Dissertation Grant from the University of California Transportation Center (U.S. Department of Transportation Center of Excellence). The authors would like to thank Professor Hal Pashler and Professor Michael Mozer for their valuable comments and insights. The authors also wish to thank their colleagues from the CVRR Laboratory for their assistance, especially during experimental data collection.

Commercial relationships: none.
Corresponding author: Anup Doshi.
Email: [email protected].
Address: University of California, San Diego, 9500 Gilman Dr, MC 0434, La Jolla, CA 92093-0434, USA.

References

Angell, L., Auflick, J., Austria, P. A., Kochhar, D., Tijerina, L., Biever, W., et al. (2006). Driver workload metrics task 2 final report (Report DOT HS 810635). Washington, DC: NHTSA, U.S. Department of Transportation.

Asteriadis, S., Tzouveli, P., Karpouzis, K., & Kollias, S. (2009). Estimation of behavioral user state based on eye gaze and head pose: Application in an e-learning environment. Multimedia Tools and Applications, 41, 469–493.

Ballard, D. H., & Hayhoe, M. M. (2009). Modelling the role of task in the control of gaze. Visual Cognition, 17, 1185–1204.

Bartz, A. E. (1966). Eye and head movement in peripheral vision: Nature of compensatory eye movements. Science, 152, 1644–1645.

Batista, J. P. (2005). A real-time driver visual attention monitoring system. In J. Marques, N. Pérez de la Blanca, & P. Pina (Eds.), Iberian Conference on Pattern Recognition and Image Analysis (vol. 3522, pp. 200–208). Heidelberg: Springer Berlin.

Bizzi, E., Kalil, R. E., & Morasso, P. (1972). Two modes of active eye–head coordination in monkeys. Brain Research, 40, 45–48.

Cardosi, K. M. (1999). Human factors for air traffic control specialists: A user's manual for your brain (Report DOT/FAA/AR-99/39). Washington, DC: Federal Aviation Administration, U.S. Department of Transportation.

Carmi, R., & Itti, L. (2006). Visual causes versus correlates of attentional selection in dynamic scenes. Vision Research, 46, 4333–4345.

Cheng, S. Y., & Trivedi, M. M. (2006). Turn intent analysis using body-pose for intelligent driver assistance. IEEE Pervasive Computing, Special Issue on Intelligent Transportation Systems, 5, 28–37.

Corneil, B. D., & Munoz, D. P. (1999). Human eye–head gaze shifts in a distractor task: II. Reduced threshold for initiation of early head movements. Journal of Neurophysiology, 86, 1406–1421.

Dodge, R. (1921). The latent time of compensatory eye movements. Journal of Experimental Psychology, 4, 247–269.

Doshi, A., Cheng, S. Y., & Trivedi, M. M. (2009). A novel, active heads-up display for driver assistance. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 39, 85–93.

Doshi, A., & Trivedi, M. M. (2009a). Head and gaze dynamics in visual attention and context learning. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (pp. 77–84). Miami, FL, USA.

Doshi, A., & Trivedi, M. M. (2009b). On the roles of eye gaze and head dynamics in predicting driver's intent to change lanes. IEEE Transactions on Intelligent Transportation Systems, 10, 453–462.

Durlach, P. J. (2004). Change blindness and its implications for complex monitoring and control systems design and operator training. Human–Computer Interaction, 19, 423–451.

Eckstein, M. P., Caspi, A., Beutter, B. R., & Pham, B. T. (2004). The decoupling of attention and eye movements during multiple fixation search [Abstract]. Journal of Vision, 4(8):165, 165a, http://www.journalofvision.org/content/4/8/165, doi:10.1167/4.8.165.

Vision, 4(8):165, 165a, http://www.journalofvision. concurrent task does not insulate braking responses org/content/4/8/165, doi:10.1167/4.8.165. from dual-task slowing. Applied Cognitive Psychology, Freedman, E. G. (2008). Coordination of the eyes and 22, 507–525. head during visual orienting. Experimental Brain Levy, J., Pashler, H., & Boer, E. (2006). Central interfer- Research, 190, 369–387. ence in driving: Is there any stopping the psychological Fuller, J. H. (1992). Comparison of head movement refractory period? Psychological Science, 17, 228–235. strategies among mammals. In A. Berthoz, W. Graf, Mahadevan, V., & Vasconcelos, N. (2010). Spatiotemporal & P. P. Vidal (Eds.), Head– sensory motor saliency in dynamic scenes. IEEE Transactions on system (pp. 101–114). New York: Oxford Press. Pattern Analysis and Machine Intelligence, 32, Goossens, H. H. L. M., & Opstal, A. J. V. (1997). Human 171–177. eye–head coordination in two dimensions under Matsumoto, Y., Ogasawara, T., & Zelinsky, A. (2000). different sensorimotor conditions. Experimental Brain Behavior recognition based on head pose and gaze Research, 114, 542–560. direction measurement. Proceedings of the IEEE/RSJ Henderson, J. M., Brockmole, J. R., Castelhano, M. S., & International Conference on Intelligent Robots and Mack, M. (2007). Visual saliency does not account Systems, 2000, 3, 2127–2132. for eye movements during visual search in real- McCall, J. C., & Trivedi, M. M. (2007). Driver behavior world scenes. In R. P. V. Gompel, M. H. Fischer, and situation aware brake assistance for intelligent W. S. Murray, & R. L. Hill (Eds.), Eye movements vehicles. Proceedings of the IEEE, Special Issue on (pp. 537–562). Oxford, UK: Elsevier. Available from Advanced Automobile Technology, 95, 374–387. http://www.sciencedirect.com/science/article/B87JD- McCall, J. C., Wipf, D., Trivedi, M. M., & Rao, B. (2007). 4PFGDST-14/2/71b923d388d1e0a0ef1f6a6aeb98d6a8. Lane change intent analysis using robust operators Herst, A. N., Epelboim, J., & Steinman, R. M. (2001). and sparse Bayesian learning. IEEE Transactions on Temporal coordination of the human head and eye Intelligent Transportation Systems, 8, 431–440. during a natural sequential tapping task. Vision Mennie, N. R., Hayhoe, M. M., & Sullivan, B. T. (2007). Research, 41, 3307–3319. Look-ahead fixations: Anticipatory eye movements in Itti, L., & Koch, C. (2000). A saliency-based search natural tasks. Experimental Brain Research, 179, mechanism for overt and covert shifts of visual 427–442. attention. Vision Research, 40, 1489–1506. Mennie, N. R., Hayhoe, M. M., Sullivan, B. T., & Jovancevic, J., Sullivan, B., & Hayhoe, M. (2006). Walthew, C. (2003). Look ahead fixations and Control of attention and gaze in complex environments. visuo-motor planning [Abstract]. Journal of Vision, Journal of Vision, 6(12):9, 1431–1450, http://www. 3(9):123, 123a, http://www.journalofvision.org/content/ journalofvision.org/content/6/12/9, doi:10.1167/6.12.9. 3/9/123, doi:10.1167/3.9.123. [PubMed][Article] Morasso, P., Sandini, G., Tagliasco, V., & Zaccaria, R. Khan, A. Z., Blohm, G., McPeek, R. M., & Lefvre, P. (1977). Control strategies in the eye–head coordina- (2009). Differential influence of attention on gaze and tion system. IEEE Transactions on Systems, Man and head movements. Journal of Neurophysiology, 101, Cybernetics, 7, 639–651. 198–206. Mozer, M. C., & Sitton, M. (1998). Computational Land, M. F. (1992). Predictable eye–head coordination modeling of spatial attention. In H. 
Murphy-Chutorian, E., Doshi, A., & Trivedi, M. M. (2007). Head pose estimation for driver assistance systems: A robust algorithm and experimental evaluation. IEEE Intelligent Transportation Systems Conference, 709–714.

Murphy-Chutorian, E., & Trivedi, M. M. (2009). Head pose estimation in computer vision: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 607–626.

Navalpakkam, V., & Itti, L. (2005). Modeling the influence of task on attention. Vision Research, 45, 205–231.
Oliver, N., Rosario, B., & Pentland, A. (2000). A Bayesian computer vision system for modeling human interactions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 831–843.

Pashler, H., Johnston, J. C., & Ruthruff, E. (2001). Attention and performance. Annual Review of Psychology, 52, 629–651.

Peden, M., Scurfield, R., Sleet, D., Mohan, D., Hyder, A. A., Jarawan, E., & Mathers, C. (Eds.) (2004). World report on road traffic injury prevention. Geneva, Switzerland: World Health Organization.

Pelz, J., Hayhoe, M., & Loeber, R. (2001). The coordination of eye, head, and hand movements in a natural task. Experimental Brain Research, 139, 266–277.

Pentland, A. (2007). Social signal processing [Exploratory DSP]. IEEE Signal Processing Magazine, 24, 108–111.

Pentland, A., & Liu, A. (1999). Modeling and prediction of human behavior. Neural Computation, 11, 229–242.

Peters, R. J., & Itti, L. (2006). Beyond bottom-up: Incorporating task-dependent influences into a computational model of spatial attention. IEEE Conference on Computer Vision and Pattern Recognition, 1–8.

Recarte, M. A., & Nunes, L. M. (2000). Effects of verbal and spatial-imagery tasks on eye fixations while driving. Journal of Experimental Psychology: Applied, 6, 31–43.

Ron, S., Berthoz, A., & Gur, S. (1993). Saccade–vestibuloocular reflex cooperation and eye–head uncoupling during orientation to flashed target. The Journal of Physiology, 464, 595–611.

Rothkopf, C. A., Ballard, D., & Hayhoe, M. (2007). Task and context determine where you look. Journal of Vision, 7(14):16, 1–20, http://www.journalofvision.org/content/7/14/16, doi:10.1167/7.14.16. [PubMed][Article]

Ryoo, M. S., & Aggarwal, J. K. (2007). Hierarchical recognition of human activities interacting with objects. IEEE Conference on Computer Vision and Pattern Recognition, 1–8.

Seeing Machines (2004). faceLAB head and eye tracker [Commercial product]. Available from http://www.seeingmachines.com/product/facelab/.

Tipping, M. E., & Faul, A. C. (2003). Fast marginal likelihood maximisation for sparse Bayesian models. In C. M. Bishop & B. J. Frey (Eds.), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics (pp. 1–13). Key West, FL.

TORCS, The Open Racing Car Simulator (2010). Open-source software. Available from http://torcs.sourceforge.net/.

Trivedi, M. M., & Cheng, S. Y. (2007). Holistic sensing and active displays for intelligent driver support systems. IEEE Computer, Special Issue on Human-Centered Computing, 40, 60–68.

Trivedi, M. M., Gandhi, T., & McCall, J. C. (2007). Looking in and looking out of a vehicle: Computer vision-based enhanced vehicle safety. IEEE Transactions on Intelligent Transportation Systems, 8, 108–120.

Viola, P., & Jones, M. J. (2001). Rapid object detection using a boosted cascade of simple features. IEEE Conference on Computer Vision and Pattern Recognition, 1, 511–518.

Wu, J., & Trivedi, M. M. (2007). Simultaneous eye tracking and blink detection with interactive particle filters. EURASIP Journal on Advances in Signal Processing, 2008, 114:1–114:17.

Yantis, S. (1998). Control of visual attention. In H. Pashler (Ed.), Attention (pp. 13–74). Hove, UK: Psychology Press.

Zangemeister, W. H., & Stark, L. (1982). Types of gaze movement: Variable interactions of eye and head movements. Experimental Neurology, 77, 563–577.

Zelinsky, G. J., Zhang, W., Yu, B., Chen, X., & Samaras, D. (2005). The role of top-down and bottom-up processes in guiding eye movements during visual search. In Neural Information Processing Systems Conference (vol. 18, pp. 1569–1576). Cambridge, MA: MIT Press.

Zhang, L., Tong, M. H., Marks, T. K., Shan, H., & Cottrell, G. W. (2008). SUN: A Bayesian framework for saliency using natural statistics. Journal of Vision, 8(7):32, 1–20, http://www.journalofvision.org/content/8/7/32, doi:10.1167/8.7.32. [PubMed][Article]