
Chapter 30 Vision for Driver Assistance: Looking at People in a Vehicle

Cuong Tran and Mohan Manubhai Trivedi

Abstract An important real-life application domain of techniques for looking at people is the development of Intelligent Driver Assistance Systems (IDASs). By analyzing information from both looking in and looking out of the vehicle, such systems can actively prevent vehicular accidents and improve driver safety as well as the driving experience. Towards such goals, systems that look at people in a vehicle (i.e. driver and passengers) to understand their intent, behavior, and state are needed. This is a challenging task which typically requires high reliability, accuracy, and efficiency. Challenges also come from the dynamic background and varying lighting conditions in driving scenes. However, looking at people in a vehicle also has its own characteristics which can be exploited to simplify the problem, such as people typically sitting in fixed positions and their activities being highly related to the driving context. In this chapter, we give a concise overview of related research studies to see how their approaches were developed to fit the specific requirements and characteristics of looking at people in a vehicle. From a historical point of view, we first discuss studies looking at the head, eyes, and facial landmarks, and then studies looking at the body, hands, and feet. Despite much active research and many published papers, developing accurate, reliable, and efficient approaches for looking at people in real-world driving scenarios is still an open problem. To this end, we also discuss some remaining issues for future development in the area.

30.1 Introduction and Motivation

Automobiles were at the core of transforming the lives of individuals and nations during the 20th century. However, despite their many benefits, motor vehicles pose a considerable safety risk. A study by the World Health Organization reports that over 1.2 million fatalities and over 20 million serious injuries occur worldwide each year [28].

C. Tran · M.M. Trivedi
Laboratory for Intelligent and Safe Automobiles (LISA), University of California at San Diego, San Diego, CA 92037, USA
e-mail: [email protected]
M.M. Trivedi
e-mail: [email protected]

T.B. Moeslund et al. (eds.), Visual Analysis of Humans, DOI 10.1007/978-0-85729-997-0_30, © Springer-Verlag London Limited 2011

Fig. 30.1 Looking-in and Looking-out of a vehicle [36]

Most roadway accidents are caused by driver error. A 2006 study sponsored by the US Department of Transportation's National Highway Traffic Safety Administration concluded that driver inattention contributes to nearly 80 percent of crashes and 65 percent of near crashes. Therefore, embedded computing systems are increasingly used in today's vehicles to make them safer as well as more reliable, comfortable, and enjoyable to drive.

In vehicle-based safety systems, it is more desirable to prevent an accident (active safety) than to reduce the severity of injuries (passive safety). However, active-safety systems also pose more difficult and challenging problems. To be effective, such technologies must be human-centric and work in a "holistic" manner [34, 36]. As illustrated in Fig. 30.1, information from looking inside the vehicle (i.e. at the driver and passengers), looking outside at the environment (e.g. at roads and other cars), as well as from vehicle sensors (e.g. measuring steering angle and speed) needs to be taken into account.

In this chapter, we focus on the task of looking at people inside a vehicle (i.e. driver and passengers) to understand their intent, behavior, and states. This task is inherently challenging due to the dynamic driving scene background and varying lighting conditions. Moreover, it also demands high reliability, accuracy, and efficiency (e.g. real-time performance for safety related applications). Obviously, the fundamental computer vision techniques for looking at people, which were covered in previous chapters, are the foundation for techniques looking at people inside a vehicle. However, human activity in a vehicle also has its own characteristics, which should be exploited to improve system performance, such as people typically sitting in fixed positions and their activities being highly related to the driving context (e.g. most driver foot movements are related to pedal press activity).

In the following sections, we provide a concise overview of several selected research studies, focusing on how computer vision techniques are developed to fit the requirements and characteristics of systems looking at people in a vehicle. We start in Sect. 30.2 with a discussion of some criteria for categorizing existing approaches, such as their objective (e.g. to monitor driver fatigue or to analyze driver intent) or the cueing information which is used (e.g. looking at head, eyes, or feet). Initially, research studies in this area focused more on cues related to the driver head, like head pose, eye gaze, and facial landmarks, which are needed to determine driver attention and fatigue state [3, 13, 14, 18, 20, 26, 30, 38]. Some selected approaches of this kind are covered in Sect. 30.3. More recently, besides these traditional cues, other parts of the body, like hand movement, foot movement, or the whole upper body posture, have also been shown to be important for understanding people's intent and behavior in a vehicle [6, 7, 19, 31, 33, 35]. We discuss some selected approaches in this category in Sect. 30.4. Despite much active research, developing accurate, reliable, and efficient approaches for looking in a vehicle, as well as combining them with looking-out information for holistic human-centered Intelligent Driver Assistance Systems (IDAS), are still open problems.
Section 30.5 discusses some open issues for future development in the area, and Sect. 30.6 offers concluding remarks.

30.2 Overview of Selected Studies

There are several ways to categorize related studies in the area, depending on the specific purpose. Figure 30.2 shows the basic steps of a common computer vision system looking at people. We see that approaches may use different types of input (e.g. monocular camera, stereo camera, camera with active illuminators), extract different types of intermediate features, and aim to analyze different types of driver behavior or state. Besides these functional criteria, we can also categorize the approaches based on the fundamental techniques underlying their implementation at each step. With the goal of providing an overview of several selected research studies, we put them into a summary table (Table 30.1) with the following important elements associated with these approaches.
• Objective: What is the final goal of that study (e.g. to monitor driver fatigue, detect driver distraction, or to recognize driver turn intent)?
• Sensor input: Which type of sensor input is used (e.g. monocular, stereo, or thermal camera)?

Fig. 30.2 Basic components of a system looking at people in a vehicle

Table 30.1 Overview of selected studies for looking at people in a vehicle. For each study: objective; sensor input; monitored body parts; methodology and experimental evaluation.

• Grace et al. '98 [14]: Drowsiness detection for truck drivers; two PERCLOS [14] cameras; eyes; use illuminated eye detection and PERCLOS measurement. In-vehicle experiment.
• Smith et al. '03 [30]: Determination of driver visual attention; monocular; head, eyes, and face; use appearance-based head and face feature tracking, model driver visual attention with FSMs. In-vehicle experiment.
• Ishikawa et al. '04 [18]: Driver gaze tracking; monocular; eyes; use an active appearance model to track the whole face, then detect the iris with template matching and estimate eye gaze. In-vehicle and indoor experiments.
• Ji et al. '04 [20]: Driver fatigue monitoring and prediction; two cameras with active infrared illuminators; head, eyes, facial landmarks; combine illumination-based and appearance-based techniques for eye detection, fuse different information from head pose and eyes in a probabilistic fatigue model. Simulation experiment.
• Trivedi et al. '04 [35]: Occupant posture analysis; stereo and thermal cameras; body posture; use head tracking to infer sitting posture. In-vehicle experiment.
• Fletcher et al. '05 [13]: Driver awareness monitoring; commercial eye tracker; eye gaze; develop a road sign recognition algorithm and use epipolar geometry to correlate eye gaze with the road scene for awareness monitoring. In-vehicle experiment.
• Veeraraghavan et al. '05 [37]: Unsafe driver activity detection; monocular; face and hands; use motion of skin regions and a Bayesian classifier to detect some unsafe activities (e.g. drinking, using a cellphone). Simulation experiment.
• Bergasa et al. '06 [3]: Driver vigilance monitoring; one camera with illuminator; head and eyes; use PERCLOS, nodding frequency, and blink frequency in a fuzzy inference system to compute the vigilance level. In-vehicle experiment.
• Cheng and Trivedi '06 [6]: Turn intent analysis; multi-modal sensors and marker-based; head and hands; use sparse Bayesian learning to classify turn intent and evaluate with different feature vector combinations. In-vehicle experiment.
• Cheng et al. '06 [5]: Driver hand grasp and turn; color and thermal cameras; head and hands; use optical flow head tracking and an HMM-based activity classifier. In-vehicle experiment.
• Ito and Kanade '08 [19]: Prediction of 9 driver operations; monocular; body; track 6 marker points on shoulders, elbows, and wrists, use discriminant analysis to learn Gaussian operation models and then a Bayesian classifier. Simulation experiment.
• Doshi and Trivedi '09 [11]: Driver lane change intent analysis; monocular; head, eyes; use a Relevance Vector Machine for lane change prediction (optical flow based head motion, manually labeled eye gaze). In-vehicle and simulation experiments.
• Tran and Trivedi '09 [31]: Driver distraction monitoring; 3 cameras; head, hands; combine tracked head pose and hand position using a rule-based approach. In-vehicle experiment.
• Murphy-Chutorian and Trivedi '10 [26]: Real-time 3D head pose tracking; monocular; head; hybrid method combining static pose estimation with an appearance-based particle filter 3D head tracking algorithm. In-vehicle experiment.
• Wu and Trivedi '10 [38]: Eye gaze tracking and blink recognition; monocular; eyes; use two interactive particle filters to simultaneously track eyes and detect blinks. In-vehicle and lab experiments.
• Cheng and Trivedi '10 [7]: Driver and passenger hand determination; monocular camera with illuminator; hands; use a HOG feature descriptor and an SVM classifier. In-vehicle experiment.

• Monitored body parts: Which type of cueing feature is extracted (e.g. information about head pose, eye gaze, body posture, or foot movement)?
• Methodology and algorithm: The underlying techniques that were used.
• Experiment and evaluation: How was the proposed approach evaluated? Was it evaluated in a real-world driving scenario or in an indoor simulation?

In the next sections, we review several selected methods, focusing on how computer vision techniques were developed to fit the requirements and characteristics of systems looking at people in a vehicle. Based on the type of cueing information, we discuss those approaches in two main categories: approaches looking at the driver head, face, and facial landmarks (Sect. 30.3) and approaches looking at the driver body, hands, and feet (Sect. 30.4).

30.3 Looking at Driver Head, Face, and Facial Landmarks

Initial research studies looking at the driver focused more on cues related to the driver head, like head pose, eye gaze, and facial landmarks. These kinds of cueing features were shown to be important in determining driver attention and cognitive state (e.g. fatigue) [3, 13–15, 20]. Some example studies in this category are approaches for monitoring and prediction of driver fatigue, driver head pose tracking for monitoring driver awareness, and eye tracking and blink recognition.

30.3.1 Monitoring and Prediction of Driver Fatigue

The National Highway Traffic Safety Administration (NHTSA) [27] has reported drowsy driving as an important cause of fatal on-road crashes and injuries in the U.S. Therefore, developing systems that actively monitor a driver's level of vigilance and alert the driver to any insecure driving conditions is desirable for accident prevention. Different approaches have been used to tackle the problem, such as assessing the vigilance capacity of an operator before the work is performed [9], assessing the driver state using sensors mounted on the driver to measure heart rate or brain activity [39], or using information from vehicle-embedded sensors (e.g. steering wheel movements, acceleration and braking profiles) [2]. Computer vision techniques looking at the driver provide another, non-intrusive approach to the problem. Research studies have shown that measures such as PERCLOS, introduced by Grace et al. [14], are highly correlated with fatigue state and can be used to monitor driver fatigue. Other head and face related features, like eye blink frequency, eye movement, nodding frequency, and facial expression, have also been used for driver fatigue and vigilance analysis [3, 20].

We will take a look at a representative approach proposed by Ji et al. [20] for real-time monitoring and prediction of driver fatigue. In order to achieve the robustness required for in-vehicle applications, different cues including eyelid movement, gaze movement, head movement, and facial expression were extracted and fused in a Bayesian network for human fatigue modeling and prediction. Two CCD cameras with active infrared illuminators were used. For eye detection and tracking, the bright pupil technique was combined with an appearance-based technique using an SVM classifier to improve robustness. The eye detection and tracking results were then also utilized in their algorithm for tracking head pose with a Kalman filter and in using Gabor features to track facial landmarks around the mouth and eye regions.

A validation of the eye detection and tracking part as well as of the extracted fatigue parameters and score was provided and showed good results (e.g.

a 0.05% false-alarm rate and a 4.2% misdetection rate). However, it seems that the proposed approach was only evaluated with data from an indoor environment. Therefore, how this approach works in real-world driving scenarios, with their additional challenges, is still an open question.
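To make the PERCLOS idea concrete, below is a minimal sketch, not the implementation of [14] or [20], of turning a per-frame eye-closure signal into a PERCLOS value over a sliding window; the function names, the 80% closure threshold, the 30 fps one-minute window, and the drowsiness threshold are illustrative assumptions.

```python
from collections import deque

def perclos(closure_history, closed_threshold=0.8):
    """Fraction of recent frames in which the eye was more than
    `closed_threshold` (e.g. 80%) closed.  `closure_history` holds
    per-frame eye-closure ratios in [0, 1] (0 = fully open)."""
    if not closure_history:
        return 0.0
    closed = sum(1 for c in closure_history if c >= closed_threshold)
    return closed / len(closure_history)

# Sliding one-minute window at an assumed 30 fps.
window = deque(maxlen=30 * 60)

def update(eye_closure_ratio, drowsy_threshold=0.15):
    """Push the latest per-frame closure ratio (however it is measured,
    e.g. from eyelid opening distance) and flag drowsiness when PERCLOS
    exceeds an illustrative threshold."""
    window.append(eye_closure_ratio)
    p = perclos(window)
    return p, p > drowsy_threshold
```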

30.3.2 Eye Localization, Tracking and Blink Pattern Recognition

Focusing on the task of robustly extracting visual cue information, a former member of our team, Wu et al., proposed an appearance-based approach using monocular camera input for eye tracking and blink pattern recognition [38]. For better accuracy and robustness, a binary tree is used to model the statistical structure of the object's feature space. This is a kind of global-to-local representation in which each subtree explains more detailed information than its parent tree (useful to represent objects with high-order substructures, like eye images). After the eyes are automatically located, a particle filter-based approach is used to simultaneously track the eyes and detect blinks. Two interacting particle filters are used, one for the open-eye and one for the closed-eye appearance. The posterior probabilities learned by the particle filters are used to determine which particle filter gives the correct tracks. That particle filter is then labeled as the primary one and used to reinitialize the other particle filter. Both the blink detection rate and the eye tracking accuracy were evaluated and showed good results in various scenarios, including indoor and in-vehicle data sequences as well as the FRGC (Face Recognition Grand Challenge) benchmark data for evaluation of tracking accuracy.

Also focusing on a robust eye gaze tracking system, Ishikawa et al. [18] proposed to track the whole face with an active appearance model (AAM) for more reliable extraction of eye regions and head pose. Based on the extracted eye regions, a template matching method is used to detect the iris, which is then used for eye gaze estimation. This approach was evaluated and showed promising results with a few subjects for both indoor and in-vehicle video sequences.
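The interplay of the two filters can be pictured with a rough sketch; this is not the algorithm of [38], and the appearance-likelihood functions, the 2D state, the noise level, and the reinitialization rule below are simplifying assumptions.

```python
import numpy as np

def step_filter(particles, weights, appearance_likelihood, motion_noise=2.0):
    """One generic particle-filter step over 2D eye positions:
    diffuse, reweight by an appearance likelihood, resample."""
    particles = particles + np.random.normal(0.0, motion_noise, particles.shape)
    weights = weights * np.array([appearance_likelihood(p) for p in particles])
    evidence = weights.sum()   # proxy for how well this model explains the frame
    if evidence > 0:
        weights = weights / evidence
        idx = np.random.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
    weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights, evidence

def track_frame(open_state, closed_state, open_lik, closed_lik):
    """Run the open-eye and closed-eye filters; the one with higher evidence
    is primary, provides the eye estimate and the blink label, and is used
    to reinitialize the weaker filter."""
    open_p, open_w, open_e = step_filter(*open_state, open_lik)
    closed_p, closed_w, closed_e = step_filter(*closed_state, closed_lik)
    blinking = closed_e > open_e
    primary = (closed_p, closed_w) if blinking else (open_p, open_w)
    estimate = primary[0].mean(axis=0)
    # Reseed the weaker filter's particles from the primary one.
    reseeded = (primary[0].copy(), primary[1].copy())
    open_state, closed_state = (reseeded, primary) if blinking else (primary, reseeded)
    return estimate, blinking, open_state, closed_state
```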

30.3.3 Tracking Driver Head Pose

Head pose information is also a strong indicator of a driver's field of view and current focus of attention, and it is typically less noisy than eye gaze. Driver head-motion estimation has also been used along with video-based lane detection and vehicle CAN-bus (Controller Area Network) data to predict the driver's intent to change lanes in advance of the actual movement of the vehicle [22]. Related work in head pose estimation can be roughly categorized into static head pose estimation methods, which estimate head pose directly from the current still image, tracking methods, which recover the global pose change of the head from the observed movement between video frames, and hybrid methods. A detailed survey of head pose estimation and tracking approaches can be found in [25]. Up to now, computational head pose estimation remains a challenging vision problem, and there are no solutions that are both inexpensive and widely available.

In [26], a former member of our team, Murphy-Chutorian et al., proposed an integrated approach using monocular camera input for real-time driver head pose tracking in 3D. In order to overcome the difficulties inherent in the varying lighting conditions of a moving car, a static head pose estimator using support vector regressors (SVRs) was combined with an appearance-based particle filter for 3D head model tracking in an augmented virtual environment. For the initial head pose estimation with SVRs, the Local Gradient Orientation (LGO) histogram, which is robust to minor deviations in region alignment and lighting, was used. The LGO histogram of a scale-normalized facial region is a 3D histogram of size M × N × O in which the first two dimensions correspond to the vertical and horizontal positions in the image and the third to the gradient orientation. Based on the initial head pose estimate, an appearance-based particle filter in an augmented reality environment, which is a virtual environment that mimics the view space of a real camera, is used to track the driver head in 3D. Using an initial estimate of the head position and orientation, the system generates a texture-mapped 3D model of the head from the most recent video image and places it into the environment. A particle filter approach is then used to match the view from each subsequent video frame. Though this operation is computationally expensive, it was highly optimized for graphics processing units (GPUs) in the proposed implementation to achieve real-time performance (tracking the head at ∼30 frames per second). Evaluation of this approach showed good results in real-world driving situations with drivers of varying ages, races, and sexes, spanning daytime and nighttime conditions.
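The LGO descriptor can be pictured with a minimal NumPy sketch; the cell and bin counts, the unsigned-orientation choice, and the normalization below are illustrative assumptions rather than the exact descriptor of [26].

```python
import numpy as np

def lgo_histogram(patch, m=4, n=4, o=8):
    """Spatial gradient-orientation histogram of a scale-normalized
    grayscale facial patch: m x n spatial cells, o orientation bins."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation in [0, pi)

    h, w = patch.shape
    hist = np.zeros((m, n, o))
    for r in range(h):
        for c in range(w):
            i = min(r * m // h, m - 1)                  # vertical cell
            j = min(c * n // w, n - 1)                  # horizontal cell
            k = min(int(ang[r, c] / np.pi * o), o - 1)  # orientation bin
            hist[i, j, k] += mag[r, c]                  # magnitude-weighted vote

    hist /= (np.linalg.norm(hist) + 1e-9)            # normalize for lighting robustness
    return hist.ravel()                              # feature vector fed to the pose regressors
```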

30.4 Looking at Driver Body, Hands, and Feet

Besides cues from the head, eyes, and facial features, information from other parts of the driver's body, like hand movement, foot movement, or the whole upper body posture, also provides important information. Recently, there have been more research studies making use of such cues for a better understanding of driver intent and behavior [6, 7, 31, 33].

30.4.1 Looking at Hands

Looking at driver hands is important since the hands are a key factor in controlling the vehicle. However, they have not been studied much in the area of looking inside a vehicle. In [6], a sparse Bayesian classifier taking into account both hand position and head pose was developed for lane change intent prediction. Hand position was also used in a system assisting the driver in "keeping hands on the wheel and eyes on the road" [31]. A rule-based approach with state machines was used to combine hand position and head pose in monitoring driver distraction (Fig. 30.3).

Fig. 30.3 System for "keeping hands on the wheel and eyes on the road" [31]

In [7], a former member of our team, Cheng et al., proposed a novel real-time computer vision system that robustly discriminates which of the front-row seat occupants is accessing the infotainment controls. The knowledge of who the user is (that is, driver, passenger, or no one) can alleviate driver distraction and maximize the passenger infotainment experience (e.g. the infotainment system should only provide its fancy options, which can be distracting, to the passenger and not the driver). The algorithm uses a modified histogram-of-oriented-gradients (HOG) feature descriptor to represent the image area over the infotainment controls, and an SVM with median filtering over time to classify each image into one of the three classes with a ∼96% average correct classification rate. This rate was achieved over a wide range of illumination conditions, human subjects, and times of day.
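A rough sketch of this kind of HOG-plus-SVM pipeline, built from off-the-shelf scikit-image and scikit-learn components rather than the modified descriptor of [7], is shown below; the class labels, HOG parameters, region-of-interest crop, and smoothing window are assumptions.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from collections import deque, Counter

CLASSES = ["no_one", "driver", "passenger"]   # illustrative labels

def describe(frame, roi=(slice(200, 328), slice(260, 388))):
    """HOG descriptor of the (assumed) image region over the infotainment controls."""
    patch = frame[roi]                         # grayscale crop, e.g. 128x128
    return hog(patch, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

def train(frames, labels):
    """Fit an SVM on HOG descriptors of labeled training frames."""
    X = np.array([describe(f) for f in frames])
    clf = SVC(kernel="rbf", C=10.0, gamma="scale")
    clf.fit(X, labels)
    return clf

class UserDeterminer:
    """Per-frame SVM prediction smoothed over time by a majority
    (median-style) filter over the last few labels."""
    def __init__(self, clf, window=15):
        self.clf = clf
        self.recent = deque(maxlen=window)

    def update(self, frame):
        label = self.clf.predict([describe(frame)])[0]
        self.recent.append(label)
        return Counter(self.recent).most_common(1)[0][0]
```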

30.4.2 Modeling and Prediction of Driver Foot Behavior

Besides the hands, the driver's feet also have an important role in controlling the vehicle. In addition to information from embedded pedal sensors, the visual foot movement before and after a pedal press can provide valuable information for a better semantic understanding of driver behavior, state, and style. It can also be used to gain a time advantage in predicting a pedal press before it actually happens, which is very important for providing proper assistance to the driver in time-critical (e.g. safety related) situations. However, there have been very few research studies analyzing driver foot information. Mulder et al. introduced a haptic gas pedal feedback system for car-following [23] in which the gas pedal position was used to improve the system performance. A former member of our team, McCall et al. [22], developed a brake assistance system which took into account both the driver's intent to brake (from pedal positions and the camera-based foot position) and the need to brake given the current situation.

Recently, our team has examined an approach for driver foot behavior analysis using monocular foot camera input. The underlying idea is motivated by the fact that driver foot movement is highly related to pedal press activity. After tracking the foot movement with an optical flow based tracking method, a 7-state hidden Markov model (HMM) for describing foot behavior was specifically designed for driving scenarios (Fig. 30.4). The elements of this driver foot behavior HMM are as follows.

Fig. 30.4 Foot behavior HMM state model with 7 states

• Hidden states: There are 7 states {s1, s2, s3, s4, s5, s6, s7}, namely Neutral, BrkEngage, AccEngage, TowardsBrk, TowardsAcc, ReleaseBrk, and ReleaseAcc. The state at time t is denoted by the random variable qt.
• Observation: The observation at time t is denoted by the random variable Ot, which has 6 components, Ot = {px, py, vx, vy, B, A}, where {px, py, vx, vy} are the current estimated position and velocity of the driver's foot, and {B, A} are obtained from vehicle CAN information and indicate whether the brake and accelerator are currently engaged or not.
• Observation probability distributions: The HMM assumes a Gaussian output probability distribution P(Ot | qt = si) = N(μi, σi).
• Transition matrix: A = {aij} is a 7 × 7 state transition matrix where aij is the probability of making a transition from state si to sj, i.e. aij = P(qt+1 = sj | qt = si).
• Initial state distribution: A uniform distribution over the initial states is assumed.

Utilizing reliable information from the vehicle CAN data, an automatic data labeling procedure was developed for training and evaluating the HMM. The HMM parameters Λ, including the Gaussian observation probability distributions and the transition matrix, are learned using the Baum–Welch algorithm. The meaning of these estimated foot behavior states also connects directly to the prediction of actual pedal presses (i.e. when the foot is in the state TowardsBrk or TowardsAcc, we can predict a corresponding brake or acceleration press in the near future). This approach was evaluated with data from a real-world driving testbed (Fig. 30.5).

Fig. 30.5 Vehicle testbed configuration for foot analysis experiment

Fig. 30.6 Tracked trajectories of a brake (red) and an acceleration (blue). The labeled points show the outputs of the HMM based foot behavior analysis

An experimental data collection paradigm was designed to approximate stop-and-go traffic, in which the driver accelerates or brakes depending on whether the stop or go cue is shown. Figure 30.6 visualizes the outputs of the approach for a brake and an acceleration example. Over all 15 experimental runs with 128 trials (a stop or go cue shown) per run, a major part (∼75%) of the pedal presses could be predicted with ∼95% accuracy at 133 ms prior to the actual pedal press. Regarding the misapplication cases (i.e. subjects were cued to hit a specific pedal but instead applied the wrong pedal), all of them were predicted correctly, ∼200 ms on average before the actual press, which is actually earlier than for general pedal press prediction. This indicates the potential of the proposed approach for predicting and mitigating pedal errors, which is a problem of recent interest to the automotive safety community [16].
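A rough sketch of such a foot-behavior HMM is given below, using the general-purpose hmmlearn library rather than the implementation described above; the number of EM iterations, the covariance structure, and the mapping from learned states to pedals are assumptions.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Observation per frame: [px, py, vx, vy, B, A]
# (foot position/velocity from optical-flow tracking, brake/accelerator
# engagement flags from the vehicle CAN bus).
N_STATES = 7  # Neutral, BrkEngage, AccEngage, TowardsBrk, TowardsAcc, ReleaseBrk, ReleaseAcc

def train_foot_hmm(sequences):
    """Fit a 7-state Gaussian HMM with Baum-Welch (EM).
    `sequences` is a list of (T_i x 6) observation arrays."""
    X = np.concatenate(sequences)
    lengths = [len(s) for s in sequences]
    model = GaussianHMM(n_components=N_STATES, covariance_type="diag", n_iter=50)
    model.fit(X, lengths)
    return model

def predict_pedal_press(model, recent_obs, towards_brk, towards_acc):
    """Decode the most likely current state; being in a 'Towards*' state is
    read as a prediction of an imminent brake/accelerator press.
    `towards_brk` / `towards_acc` are the state indices identified (e.g. by
    inspecting the learned means) as the corresponding transition states."""
    state = model.predict(recent_obs)[-1]
    if state == towards_brk:
        return "brake press expected"
    if state == towards_acc:
        return "accelerator press expected"
    return None
```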

30.4.3 Analyzing Driver Posture for Driver Assistance

Whole body posture is another cue that should be explored more in looking at people inside a vehicle. Figure 30.7 shows some possible ranges of driver posture movement which may be connected to driver state and intention. For example, leaning backward might indicate a relaxed position, while leaning forward indicates concentration. Drivers may also change posture in preparation for some specific tasks, such as moving the head forward to prepare for a better visual check before a lane change. In [19], Ito and Kanade used six marker points on the shoulders, elbows, and wrists to predict nine driver operations toward different destinations, including navigation, A/C, left vent, right vent, gear box, console box, passenger seat, glove compartment, and rear-view mirror. Their approach was evaluated with different subjects in driving simulation with high prediction accuracy (90%) and a low false positive rate (1.4%). This approach, however, requires putting markers on the driver. In [8], a markerless approach by Datta et al. for tracking systems of articulated planes was also applied to track 2D driver body pose on the same simulation data. Though this approach automates the tracking part, it still requires a manual initialization of the tracking model.

Besides looking at the driver, looking at occupant posture is also important. In [35], our team investigated the basic feasibility of using stereo and thermal long-wavelength infrared video for occupant position and posture analysis, which is a key requirement in designing "smart airbag" systems. In this investigation, our suggestion was to use head tracking information, which is easier to obtain, instead of more detailed occupant posture analysis, for robust "smart airbag" deployment.

Fig. 30.7 Illustration of some possible ranges of driver posture movement during driving

Fig. 30.8 Elbow joint prediction. (Left) Elbow candidates are generated at each frame. (Right) Over a temporal segment, the sequence of elbow joints that minimizes the total joint displacement is selected. By adding two pseudo nodes s and t with zero-weighted edges, this can be represented as a shortest path problem

However, for potential applications that go beyond the purpose of "smart airbags", such as driver attentiveness analysis and human-machine interfaces inside the car, we need to look at the more detailed body posture of the driver and occupants. Our team has developed a computational approach for upper body tracking using the 3D movement of the extremities (head and hands) [32]. This approach tracks a 3D skeletal upper body model which is determined by a set of upper body joint and end point positions. To achieve robustness and real-time performance, the approach first tracks the 3D movements of the extremities, namely the head and hands. Then, using human upper body configuration constraints, the movements of the extremities are used to predict the whole 3D upper body motion, including the inner joints. Since the head and hand regions are typically well defined and undergo less occlusion, tracking them is more reliable and enables more robust upper body pose determination. Moreover, by breaking the high-dimensional search for upper body pose into two steps, the complexity is reduced considerably. The downside is that we need to deal with the ambiguity in the inverse kinematics of the upper body, i.e. there can be various upper body poses corresponding to the same head and hand positions. However, this issue is reduced in driving scenarios, since the driver typically sits in a fixed position. To deal with the remaining ambiguity, a "temporal inverse kinematics" based on the observed dynamics of the extremities was used instead of inverse kinematics constraints at each single frame.

Figure 30.8 briefly describes this idea with a numerical method to predict elbow joint sequences. Since the lengths of the upper arm and lower arm are fixed, the possible elbow joint positions for a known shoulder joint position S and hand position H lie on a circle. At each frame, this circle of possible elbow joints is determined and then quantized into several elbow candidates based on a distance threshold between candidates (Fig. 30.8(left)). Over a whole temporal segment, the selected sequence is the one that minimizes the total elbow joint displacement. As shown in Fig. 30.8(right), this selection can be represented as a shortest path problem. Due to the layered structure of the constructed graph, a dynamic programming technique can be used to solve this shortest path problem in time linear in the number of frames n, i.e. O(n) (a sketch of this selection step is given at the end of this subsection).

Fig. 30.9 Superimposed results of 3D driver body pose tracking using extremities movement

This approach was validated and showed good results with various subjects in both indoor and in-vehicle environments. Figure 30.9 shows some example results of the 3D driver body pose tracking superimposed on input images for visual evaluation.
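A minimal sketch of the dynamic-programming candidate selection described above follows; candidate generation on the elbow circle is omitted, and the plain Euclidean displacement cost is a simplifying assumption rather than necessarily the exact cost of [32].

```python
import numpy as np

def select_elbow_sequence(candidate_frames):
    """Pick one elbow candidate per frame so that the total frame-to-frame
    elbow displacement is minimized.  `candidate_frames` is a list of
    (k_t x 3) arrays of candidate 3D elbow positions, one array per frame.
    Dynamic programming over the layered candidate graph (the shortest-path
    formulation of Fig. 30.8) runs in time linear in the number of frames."""
    n = len(candidate_frames)
    costs = [np.zeros(len(candidate_frames[0]))]  # best cost of a path ending at each candidate
    back = []                                     # back-pointers for path recovery
    for t in range(1, n):
        prev, curr = candidate_frames[t - 1], candidate_frames[t]
        # Pairwise displacements between consecutive candidate layers.
        d = np.linalg.norm(curr[:, None, :] - prev[None, :, :], axis=2)
        total = d + costs[-1][None, :]            # add best cost of reaching each previous candidate
        back.append(total.argmin(axis=1))
        costs.append(total.min(axis=1))
    # Backtrack from the cheapest final candidate.
    path = [int(costs[-1].argmin())]
    for t in range(n - 2, -1, -1):
        path.append(int(back[t][path[-1]]))
    path.reverse()
    return [candidate_frames[t][i] for t, i in enumerate(path)]
```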

30.5 Open Issues for Future Research

Several related research studies have shown promising results. However, the development of accurate, reliable, and efficient approaches to looking at people in a vehicle for real-world driver assistance systems is still in its infancy. In this section, we discuss some of the main issues that we think should be addressed for future development in the area.
• Coordination between real-world and simulation testbeds: Simulation environments have the advantage of more flexibility in configuring sensors and designing experimental tasks for deeper analysis, which might be difficult and unsafe to implement in real-world driving. However, the ultimate goal is to develop systems that work in real vehicles, and there are always gaps between a simulation environment and the real world. Therefore, in general, coordination between real-world driving and simulation environments is useful and should be considered in the development process.
• Looking at the driver body at multiple levels: To achieve robustness and accuracy, a potential trend is to combine cues at multiple body levels, since the human body is a homogeneous and harmonious whole and behavior and states are generally expressed at different body levels simultaneously. However, cueing information from different body parts has different characteristics and typically requires different approaches to extract. Therefore, how to develop efficient systems looking at the driver body at multiple levels is still an open question.
• Investigating the role of features extracted from different body parts: Depending on the behavior and/or cognitive state of interest, features from some body parts may be useful, while others may not be or may even be distracting factors. Moreover,

for efficiency, only useful feature cues should be extracted. In [11], Doshi et al. from our team carried out a comparative study on the roles of head pose and eye gaze for driver lane change intent analysis. The results indicated that head pose, which is typically less noisy and easier to track than eye gaze, is actually the better feature for lane change intent prediction. In general, a systematic way to carry out similar investigations for different feature cues and analysis tasks is desirable.
• Combining looking-in and looking-out: Some research studies have combined the outputs of looking-in and looking-out analysis for different assistance systems such as driver intent analysis [7, 10, 21], intelligent brake assistance [22], traffic sign awareness [13], and driver distraction [17]. In [29], Pugeault and Bowden showed that information from a looking-out camera can be used to predict some driver actions, including steering left or right and pressing the accelerator, brake, or clutch. This implies that the contextual information from looking out is also important to looking-in analysis of driver behavior and states. In general, both looking-in and looking-out information will be needed in developing efficient human-centered driver assistance systems [34, 36].
• Interacting with the driver when needed: Generally, IDASs need the ability to provide feedback to the user when needed (e.g. to alert the driver in critical situations). However, this feedback must be introduced carefully to ensure that it does not confuse or distract the driver, thereby undermining its intended purpose. Interdisciplinary efforts are needed to investigate the effect of different feedback mechanisms, including visual, audio, and/or haptic feedback [1].
• Learning individual driver models vs. generic driver models: It has been noted that individual drivers may act and respond in different ways under various conditions [4, 12, 24]. Therefore, it might be difficult to learn generic driver models that work well for all drivers. In order to achieve better performance, adapting the assistance systems to individual drivers based on their style and preferences is needed. Murphey et al. [24] used the pedal press profile for classification of driver styles (i.e. calm, normal, and aggressive) and showed the correlation between these styles and fuel consumption. In [12], our team also studied some measures of driving style and their correlation with the predictability and responsiveness of the driver. The results indicated that "aggressive" drivers are more predictable than "non-aggressive" drivers, while "non-aggressive" drivers are more receptive to feedback from driver assistance systems.

30.6 Conclusion

Looking at people in a vehicle to understand their behavior and state is an important area which plays a significant role in developing human-centered Intelligent Driver Assistance Systems. The task is challenging due to the high demands on reliability and efficiency as well as the inherent computer vision difficulties of dynamic backgrounds and varying lighting conditions. In this chapter, we provided a concise overview of several selected research studies looking at different body parts, ranging from the coarse body to the more detailed levels of feet, hands, head, eyes, and facial landmarks. To overcome the inherent challenges and achieve the required performance, some high-level directions learned from those studies are as follows.
• Design techniques which are specific to in-vehicle applications, utilizing characteristics such as the driver typically sitting in a fixed position or driver foot movement being highly related to pedal press actions.
• Integrate cueing information from different body parts.
• Consider the trade-offs between cues that can be extracted more reliably and cues that seem useful but are hard to extract.
• Make use of both dynamic information (body motion) and static information (body appearance).
• Make use of different input modalities (e.g. color cameras and thermal infrared cameras).
Despite many active studies, more research effort is still needed to bring these high-level ideas into the development of accurate, reliable, and efficient approaches for looking at people in a vehicle and to actually improve the lives of drivers around the world.

30.6.1 Further Reading

Interested readers may consult the following references for a broad overview of research topic trends and research groups in the area of intelligent transportation systems.
• Li, L., Li, X., Cheng, C., Chen, C., Ke, G., Zeng, D., Scherer, W.T.: Research collaboration and ITS topic evolution: 10 years at T-ITS. IEEE Trans. Intell. Transp. Syst. (June 2010)
• Li, L., Li, X., Li, Z., Zeng, D., Scherer, W.T.: A bibliographic analysis of the IEEE transactions on intelligent transportation systems literature. IEEE Trans. Intell. Transp. Syst. (October 2010)

Acknowledgements We thank the sponsorship of the U.C. Discovery Program and the National Science Foundation as well as industry sponsors including Nissan, the Volkswagen Electronic Research Laboratory, and Mercedes. We also thank former and current colleagues from our Laboratory for Intelligent and Safe Automobiles (LISA) for their cooperation, assistance, and contributions: Dr. Kohsia Huang, Dr. Joel McCall, Dr. Tarak Gandhi, Dr. Sangho Park, Dr. Shinko Cheng, Dr. Steve Krotosky, Dr. Junwen Wu, Dr. Erik Murphy-Chutorian, Dr. Brendan Morris, Dr. Anup Doshi, Mr. Sayanan Sivaraman, Mr. Ashish Tawari, and Mr. Ofer Achler.

References

1. Adell, E., Várhelyi, A.: Development of HMI components for a driver assistance system for safe speed and safe distance. In: The 13th World Congress and Exhibition on Intelligent Transport Systems and Services, ExCel London, United Kingdom (2006)
2. Artaud, P., Planque, S., Lavergne, C., Cara, H., de Lepine, P., Tarriere, C., Gueguen, B.: An on-board system for detecting lapses of alertness in car driving. In: The 14th Int. Conf. Enhanced Safety of Vehicles (1994)
3. Bergasa, L.M., Nuevo, J., Sotelo, M.A., Barea, R., Lopez, M.E.: Real-time system for monitoring driver vigilance. IEEE Trans. Intell. Transp. Syst. 7(1), 63–77 (2006)
4. Burnham, G.O., Seo, J., Bekey, G.A.: Identification of human drivers models in car following. IEEE Trans. Autom. Control 19(6), 911–915 (1974)
5. Cheng, S.Y., Park, S., Trivedi, M.M.: Multiperspective and multimodal video arrays for 3d body tracking and activity analysis. Comput. Vis. Image Underst. (Special Issue on Advances in Vision Algorithms and Systems Beyond the Visible Spectrum) 106(2–3), 245–257 (2007)
6. Cheng, S.Y., Trivedi, M.M.: Turn-intent analysis using body pose for intelligent driver assistance. IEEE Pervasive Comput. 5(4), 28–37 (2006)
7. Cheng, S.Y., Trivedi, M.M.: Vision-based infotainment user determination by hand recognition for driver assistance. IEEE Trans. Intell. Transp. Syst. 11(3), 759–764 (2010)
8. Datta, A., Sheikh, Y., Kanade, T.: Linear motion estimation for systems of articulated planes. In: IEEE Conference on Computer Vision and Pattern Recognition (2008)
9. Dinges, D., Mallis, M.: Managing fatigue by drowsiness detection: Can technological promises be realized? In: Hartley, L. (ed.) Managing Fatigue in Transportation. Elsevier, Oxford (1998)
10. Doshi, A., Trivedi, M.M.: Investigating the relationships between gaze patterns, dynamic vehicle surround analysis, and driver intentions. In: IEEE Intelligent Vehicles Symposium (2009)
11. Doshi, A., Trivedi, M.M.: On the roles of eye gaze and head pose in predicting driver's intent to change lanes. IEEE Trans. Intell. Transp. Syst. 10(3), 453–462 (2009)
12. Doshi, A., Trivedi, M.M.: Examining the impact of driving style on the predictability and responsiveness of the driver: Real-world and simulator analysis. In: IEEE Intelligent Vehicles Symposium (2010)
13. Fletcher, L., Loy, G., Barnes, N., Zelinsky, A.: Correlating driver gaze with the road scene for driver assistance systems. Robot. Auton. Syst. 52(1), 71–84 (2005)
14. Grace, R., Byrne, V.E., Bierman, D.M., Legrand, J.M., Davis, R.K., Staszewski, J.J., Carnahan, B.: A drowsy driver detection system for heavy vehicles. In: Digital Avionics Systems Conference, Proceedings, The 17th DASC, The AIAA/IEEE/SAE (1998)
15. Hammoud, R., Wilhelm, A., Malawey, P., Witt, G.: Efficient realtime algorithms for eye state and head pose tracking in advanced driver support systems. In: IEEE Conference on Computer Vision and Pattern Recognition (2005)
16. Healey, J.R., Carty, S.S.: Driver error found in some Toyota acceleration cases. In: USA Today (2010)
17. Huang, K.S., Trivedi, M.M., Gandhi, T.: Driver's view and vehicle surround estimation using omnidirectional video stream. In: IEEE Intelligent Vehicles Symposium (2003)
18. Ishikawa, T., Baker, S., Matthews, I., Kanade, T.: Passive driver gaze tracking with active appearance models. In: The 11th World Congress on Intelligent Transportation Systems (2004)
19. Ito, T., Kanade, T.: Predicting driver operations inside vehicles. In: IEEE International Conference on Automatic Face and Gesture Recognition (2008)
20. Ji, Q., Zhu, Z., Lan, P.: Real time non-intrusive monitoring and prediction of driver fatigue. IEEE Trans. Veh. Technol. 53(4), 1052–1068 (2004)
21. McCall, J., Wipf, D., Trivedi, M.M., Rao, B.: Lane change intent analysis using robust operators and sparse Bayesian learning. IEEE Trans. Intell. Transp. Syst. 8(3), 431–440 (2007)
22. McCall, J.C., Trivedi, M.M.: Driver behavior and situation aware brake assistance for intelligent vehicles. Proc. IEEE 95(2), 374–387 (2007)
23. Mulder, M., Pauwelussen, J.J.A., van Paassen, M.M., Mulder, M., Abbink, D.A.: Active deceleration support in car following. IEEE Trans. Syst. Man Cybern., Part A, Syst. Hum. 40(6), 1271–1284 (2010)
24. Murphey, Y.L., Milton, R., Kiliaris, L.: Driver's style classification using jerk analysis. In: IEEE Workshop on Computational Intelligence in Vehicles and Vehicular Systems (2009)
25. Murphy-Chutorian, E., Trivedi, M.M.: Head pose estimation in computer vision: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 31(4), 607–626 (2009)
26. Murphy-Chutorian, E., Trivedi, M.M.: Head pose estimation and augmented reality tracking: An integrated system and evaluation for monitoring driver awareness. IEEE Trans. Intell. Transp. Syst. 11(2), 300–311 (2010)
27. NHTSA: Traffic safety facts 2006 – a compilation of motor vehicle crash data from the fatality analysis reporting system and the general estimates system. Nat. Center Stat. Anal., US Dept. Transp., Washington, DC (2006)
28. Peden, M., Scurfield, R., Sleet, D., Mohan, D., Hyder, A.A., Jarawan, E., Mathers, C.: World report on road traffic injury prevention: Summary. World Health Organization, Geneva, Switzerland (2004)
29. Pugeault, N., Bowden, R.: Learning pre-attentive driving behaviour from holistic visual features. In: The 11th European Conference on Computer Vision (2010)
30. Smith, P., Shah, M., Lobo, N.V.: Determining driver visual attention with one camera. IEEE Trans. Intell. Transp. Syst. 4(4), 205–218 (2003)
31. Tran, C., Trivedi, M.M.: Driver assistance for keeping hands on the wheel and eyes on the road. In: IEEE International Conference on Vehicular Electronics and Safety (2009)
32. Tran, C., Trivedi, M.M.: Introducing 'XMOB': Extremity movement observation framework for upper body pose tracking in 3d. In: IEEE International Symposium on Multimedia (2009)
33. Tran, C., Trivedi, M.M.: Towards a vision-based system exploring 3d driver posture dynamics for driver assistance: Issues and possibilities. In: IEEE Intelligent Vehicles Symposium (2010)
34. Trivedi, M.M., Cheng, S.Y.: Holistic sensing and active displays for intelligent driver support systems. IEEE Comput. 40(5), 60–68 (2007)
35. Trivedi, M.M., Cheng, S.Y., Childers, E., Krotosky, S.: Occupant posture analysis with stereo and thermal infrared video: Algorithms and experimental evaluation. IEEE Trans. Veh. Technol. (Special Issue on In-Vehicle Vision Systems) 53(6), 1698–1712 (2004)
36. Trivedi, M.M., Gandhi, T., McCall, J.: Looking-in and looking-out of a vehicle: Computer-vision-based enhanced vehicle safety. IEEE Trans. Intell. Transp. Syst. 8(1), 108–120 (2007)
37. Veeraraghavan, H., Atev, S., Bird, N., Schrater, P., Papanikolopoulos, N.: Driver activity monitoring through supervised and unsupervised learning. In: IEEE Conference on Intelligent Transportation Systems (2005)
38. Wu, J., Trivedi, M.M.: An eye localization, tracking and blink pattern recognition system: Algorithm and evaluation. ACM Trans. Multimedia Comput. Commun. Appl. 6(2) (2010)
39. Yammamoto, K., Higuchi, S.: Development of a drowsiness warning system. J. Soc. Automot. Eng. Jpn. (1992)