Face and Gesture Recognition

Facial-Expression Analysis for Predicting Unsafe Driving Behavior

A system for tracking driver facial features aims to enhance the predictive accuracy of driver-assistance systems. The authors identify key facial features at varying pre-accident intervals and use them to predict minor and major accidents.

Maria E. Jabon, Jeremy N. Bailenson, Emmanuel Pontikakis, Leila Takayama, and Clifford Nass

Every year in the US alone, more than 42,000 Americans die as a result of 6.8 million automobile accidents.1 Consequently, driver-safety technology is an active research area in both industry and academia. Pervasive computing environments, with integrated sensors and networking, can provide an ideal platform for developing such technology. Taking effective countermeasures to enhance safe vehicle operation requires merging diverse sources of information.

As a first step, an active driver safety system (a system designed to prevent accidents) must monitor vehicle state and surroundings (see the "Related Work in Active Driver Safety Systems" sidebar). However, to fully transform the vehicle into a smart environment,2 the driver must also be monitored. Human factors researchers have long studied the driver's role in causing and preventing accidents and have found that the driver's physical and emotional state, including fatigue3 and stress levels,4 plays a role in a significant number of traffic accidents. Thus, many researchers have begun developing active driver-safety systems that monitor the driver as well as the vehicle.

We propose an active driver-safety framework that captures both vehicle dynamics and the driver's face. We then merge the two levels of data to produce an accident prediction and investigate the framework's performance. Our study differs from previous work in active driver safety in four ways. First, we use a bottom-up approach, analyzing the movement of a comprehensive set of 22 raw facial features, rather than simple metafeatures such as eye gaze or head orientation. Second, we evaluate a range of time and frequency domain statistics to determine the most valuable statistics for driving accident prediction. Third, we predict major and minor accidents directly, not intermediate driver states such as fatigue. Finally, we explore the use of the face and car outputs at varying pre-accident intervals, uncovering important temporal trends in predictive accuracy for each feature subset.

Experimental Testbed

We recruited 49 undergraduate students to drive a 40-minute simulated course in a STISIM driving simulator. We set up the simulator, developed by Systems Technology,5 to run on a single PC and project the simulated image of the roadway onto a white wall in the lab (see Figure 1). During the driving course, we projected the sounds of the car, the road, and events happening around the car into the simulator room via four PC speakers.


Related Work in Active Driver Safety Systems

Much work has been done in the area of holistic vehicle sensing for active driver safety, including systems for monitoring the vehicle environment, vehicle state, and, more recently, driver state. Systems developed to monitor the vehicle environment include pedestrian and obstacle detectors, lane-guidance systems, rear-bumper proximity sensors, blind-spot car detectors, automatic windshield wipers, and surround imaging systems for parking assistance.1–6 Systems for monitoring the vehicle state include tracking vehicle location via GPS and using accelerometers and other sensors to monitor driving speed, steering wheel angle, braking, and acceleration.7,8 Systems for monitoring driver state include frameworks that gauge driver fatigue, drowsiness, or stress levels.9–14

In this article, we extend previous work by using computer vision algorithms to directly map specific facial features to unsafe driving behavior. We use a comprehensive set of raw facial-feature points that are absent in previous work, including points around the nose. Furthermore, we don't infer any specific mental states such as fatigue, but rather implement a more empirical approach that uses machine learning algorithms to find and use the facial features that are most correlated with accidents. In addition, we identify important temporal trends in predictive accuracy for each feature subset, revealing how to best use the face to improve the predictive accuracy of classifiers up to four seconds before a driving accident.

References

1. L. Li et al., "IVS 05: New Developments and Research Trends for Intelligent Vehicles," IEEE Intelligent Systems, vol. 20, no. 4, 2005, pp. 10–14.
2. T. Gandhi and M. Trivedi, "Vehicle Surround Capture: Survey of Techniques and a Novel Omni Video-Based Approach for Dynamic Panoramic Surround Maps," IEEE Trans. Intelligent Transportation Systems, vol. 7, no. 3, 2006, pp. 293–308.
3. J. McCall and M. Trivedi, "Video-Based Lane Estimation and Tracking for Driver Assistance: Survey, System, and Evaluation," IEEE Trans. Intelligent Transportation Systems, vol. 7, no. 1, 2006, pp. 20–37.
4. M. Bertozzi, A. Broggi, and A. Lasagni, "Infrared Stereo Vision-Based Pedestrian Detection," Proc. IEEE Intelligent Vehicles Symp., IEEE Press, 2005, pp. 24–29.
5. Y. Suzuki, T. Fujii, and M. Tanimoto, "Parking Assistance Using Multi-Camera Infrastructure," Proc. IEEE Intelligent Vehicles Symp., IEEE Press, 2005, pp. 106–110.
6. H. Kurihata, T. Takahashi, and I. Ide, "Rainy Weather Recognition from In-Vehicle Camera Images for Driver Assistance," Proc. IEEE Intelligent Vehicles Symp., IEEE Press, 2005, pp. 205–210.
7. J. Waldo, "Embedded Computing and Formula One Racing," IEEE Pervasive Computing, vol. 4, no. 3, 2005, pp. 18–21.
8. M. Bertozzi et al., Knowledge-Based Intelligent Information and Engineering Systems, B. Appolloi et al., eds., Springer, 2007, pp. 704–711.
9. M. Trivedi, T. Gandhi, and J. McCall, "Looking-In and Looking-Out of a Vehicle: Computer-Vision-Based Enhanced Vehicle Safety," IEEE Trans. Intelligent Transportation Systems, vol. 8, no. 1, 2007, pp. 108–120.
10. A. Williamson and T. Chamberlain, Review of On-Road Driver Fatigue Monitoring Devices, NSW Injury Risk Management Research Centre, Univ. of New South Wales, 2005.
11. J.A. Healy and R.W. Picard, "Detecting Stress During Real-World Driving Tasks Using Physiological Sensors," IEEE Trans. Intelligent Transportation Systems, vol. 6, no. 2, 2005, pp. 156–166.
12. A. Doshi and M.M. Trivedi, "On the Roles of Eye Gaze and Head Dynamics in Predicting Driver's Intent to Change Lanes," IEEE Trans. Intelligent Transportation Systems, vol. 10, no. 3, 2009, pp. 453–462.
13. L. Fletcher, L. Petersson, and A. Zelinsky, "Road Scene Monotony Detection in a Fatigue Management Driver Assistance System," Proc. IEEE Intelligent Vehicles Symp., IEEE Press, 2005, pp. 484–489.
14. Q. Ji, Z. Zhu, and P. Lan, "Real-Time Nonintrusive Monitoring and Prediction of Driver Fatigue," IEEE Trans. Vehicular Technology, vol. 53, no. 4, 2004, pp. 1052–1068.

The course simulated driving in a suburban environment with conditions varying from light to intense traffic. We included such challenging situations as busy intersections, unsafe drivers, construction zones, sharp turns, and jaywalking pedestrians in an effort to increase the drive's complexity (see Table 1). We used a virtual driving simulator instead of real cars in order to safely collect a large sample size of accidents to use in our analyses. This let us generate separate models for major accidents (for example, hitting pedestrians, other vehicles, or off-road objects) and minor accidents (such as unwarranted lane changes, driving off the road, or running a stoplight).

During the experimental sessions, we recorded participants' faces with two Logitech Web cameras at a rate of 15 frames per second. We compressed the videos to AVI format in real time using DirectX and DivX technology. Although many technologies can capture facial movements, we opted for image-based capture because it does not require special markers or user intervention. Thus, our system is less intrusive, increasing transparency. We also recorded the simulator's output during the driving sessions, which was a text-based log file listing road conditions, steering wheel angle, lane tracking information, car speed, longitudinal acceleration (feet/second²), braking information, and number and type of accidents.
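The article doesn't specify the log file's exact layout, so the loading sketch below is purely illustrative. The tab-separated format, the column names, and the SimSample fields are all our assumptions, not the STISIM format.

```python
import csv
from dataclasses import dataclass

@dataclass
class SimSample:
    """One row of the simulator's text log (field names are hypothetical)."""
    time_s: float        # elapsed session time, seconds
    speed: float         # car speed
    steer_angle: float   # steering wheel angle
    lane_offset: float   # lane-tracking information
    long_accel: float    # longitudinal acceleration (ft/s^2)
    braking: float       # braking input
    accident: str        # "", "minor", or "major"

def load_sim_log(path):
    """Parse the log, assuming one tab-separated row per sample."""
    with open(path, newline="") as f:
        rows = csv.DictReader(f, delimiter="\t")
        return [SimSample(float(r["time"]), float(r["speed"]),
                          float(r["steer"]), float(r["lane"]),
                          float(r["accel"]), float(r["brake"]),
                          r["accident"])
                for r in rows]
```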


Figure 1. A study participant being monitored in the STISIM driving simulator. The simulator runs on a single PC and projects the image of the roadway onto a white wall in the lab.

Analysis Procedure

Using the collected videos and driving simulator data, we constructed a dataset to build our computational models. Figure 2 summarizes the data-analysis phases.

Facial-Feature Extraction

The first step in constructing our datasets included extracting key facial features and head movements from the videos we collected. For this processing we used the Neven Vision library.6 For each video frame, the Neven Vision library automatically detects, without any preset markers worn on the user's face, the x and y coordinates of 22 points on the face, eye- and mouth-openness levels, and head movements (for example, yaw, pitch, and roll). It does this at a rate of 30 frames per second. Figure 3 is a screenshot of the Neven face-tracking software.

Data Synchronization

In the next phase of our analysis, we synchronized the video outputs with the driving simulator outputs so we knew when accidents occurred within the video and could extract the pre-accident facial geometry and vehicle information. Our goal was to determine the optimal way to combine the facial movements and vehicle outputs to predict when accidents would occur. Thus, we sampled several pre-accident time intervals beginning between one and four seconds before the accident and ranging from one to 10 seconds long. For each interval, we extracted the data preceding every major and minor accident in our dataset, along with a random number of nonaccident intervals to use in our analyses.

Time Series Statistics Calculation

After data synchronization, we computed a series of time-domain statistics on the coordinates in each interval to use as inputs to our classifiers. We calculated averages, velocities, maximums, minimums, standard deviations, and ranges for each of the Neven outputs and for the vehicle outputs such as speed, wheel angle, throttle, and braking. For some important facial characteristics, such as eye- and mouth-openness levels, we also created five-bin histograms from 0 to 100 percent to capture their distribution over the time interval.
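To make the interval extraction and time-domain statistics concrete, here is a minimal sketch in Python rather than the authors' original pipeline. It assumes each facial coordinate or vehicle output is a NumPy array sampled at a known rate; the function names, and the interpretation of "velocity" as a first difference, are our assumptions.

```python
import numpy as np

def pre_accident_window(signal, fs, accident_t, lead_s, length_s):
    """Slice the length_s-second window that ends lead_s seconds
    before an accident occurring at accident_t (seconds)."""
    end = int(round((accident_t - lead_s) * fs))
    start = end - int(round(length_s * fs))
    return signal[max(start, 0):end]

def time_domain_stats(x, fs):
    """Average, velocity, extrema, standard deviation, and range,
    mirroring the statistics listed in the article."""
    v = np.diff(x) * fs  # first difference per second ("velocity")
    return {
        "average": float(np.mean(x)),
        "velocity": float(np.mean(v)),
        "maximum": float(np.max(x)),
        "minimum": float(np.min(x)),
        "std": float(np.std(x)),
        "range": float(np.ptp(x)),
    }

def openness_histogram(x):
    """Five-bin histogram over 0 to 100 percent for openness signals."""
    counts, _ = np.histogram(x, bins=5, range=(0.0, 100.0))
    return counts / len(x)
```

For example, a three-second lead with a seven-second window at the tracker's 30 fps rate would be pre_accident_window(x, 30, t, 3, 7).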

Table 1. Simulation parameters.

Miles   Environment  Speed limit (mph)  Intersections  Traffic   Challenges
0–1     Suburban     35                 6              Heavy     Many pedestrians
2–6     Highway      65                 0              Moderate  Narrow roads, tight curves
7–9     Suburban     35                 15             Heavy     Many pedestrians
10–11   Highway      55                 1              Light     Construction zone, obstacles
12–18   Suburban     35                 24             Heavy     Many pedestrians
19–20   Rural        35                 0              Light     Dirt roads, obstacles
21–22   Rural        35                 1              Moderate  Narrow roads, tight curves
23–32   Urban        55                 6              Moderate  Tight curves


Figure 2. Data-analysis procedure. Facial features were extracted from video recordings and synchronized with simulator output logs. Time and frequency domain statistics were extracted from the resulting datasets; top features were selected and then used to predict major and minor accidents at varying pre-accident time intervals.

Frequency Domain Statistics Calculation

We next calculated frequency domain statistics on each facial coordinate and each car output. We used the Matlab Wavelet toolbox to perform the discrete wavelet transform, in particular the Daubechies wavelet family with orders one, two, and four. For each order, we performed a level-three decomposition of the input signal and collected statistics over the detail coefficients of each level, including averages, ranges, histograms, and variances. We calculated these additional statistics because facial signals are dynamic, and we expected that their micromomentary movements could leak information about the internal state of the person making the expression.
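The authors performed this step with the Matlab Wavelet toolbox; the sketch below reproduces the idea with PyWavelets instead. The feature naming and the five-bin histogram over detail coefficients are our assumptions about how the statistics were organized.

```python
import numpy as np
import pywt

def wavelet_stats(signal, orders=(1, 2, 4), level=3):
    """Level-three Daubechies decompositions; statistics over the
    detail coefficients of each level (averages, ranges, variances,
    and histograms), as described in the article."""
    feats = {}
    for order in orders:
        coeffs = pywt.wavedec(np.asarray(signal, dtype=float),
                              f"db{order}", level=level)
        # coeffs = [cA3, cD3, cD2, cD1]; skip the approximation cA3
        for lvl, detail in zip((3, 2, 1), coeffs[1:]):
            key = f"db{order}_d{lvl}"
            feats[f"{key}_mean"] = float(np.mean(detail))
            feats[f"{key}_range"] = float(np.ptp(detail))
            feats[f"{key}_var"] = float(np.var(detail))
            hist, _ = np.histogram(detail, bins=5)
            for i, c in enumerate(hist):
                feats[f"{key}_hist{i}"] = int(c)
    return feats
```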

Figure 3. Neven Vision tracking points on a subject's face. We tracked 22 points around the eyes, nose, mouth, and eyebrows at a rate of 30 frames per second.

Final Dataset Creation

Finally, for each interval length, we combined all the pre-accident and nonaccident intervals for each of the 49 participants into one large dataset. To improve the reliability of the face measurements, we discarded intervals where the average face-tracking confidence (that is, the measure of how confident the face-tracking software was in its measurement) was lower than 60 percent. Our resulting dataset contained a total of 179 minor accident instances, 131 major accident instances, and 627 nonaccident instances.

Chi-Square Feature Extraction

Initially, our datasets consisted of 7,402 facial features and 1,162 vehicle features for each pre-accident and nonaccident vector. To identify which facial features were the most important indicators of unsafe driving behavior and to more quickly train our algorithms, we performed a chi-square feature selection for each dataset. Tables 2 and 3 list the 20 most predictive car features and the 20 most predictive facial features for our major and minor accident predictions.

The facial features most predictive of major and minor accidents differed greatly. Most of the top major accident features were movements of points around the mouth and eyes, whereas the top minor accident features centered around the nose. This adds new information to previous works, which used features only around the eyes and mouth to predict dangerous driving states.3,4 Less of a difference existed between the top major accident car features and top minor accident car features; in both cases, the steering wheel angle and steering inputs made up more than 75 percent of the top 20 features.

We also found that the most useful facial-feature statistics varied across accident type. Wavelets proved to be the most useful statistics for major accident prediction, whereas simple minimums and maximums were more informative for minor accident prediction.

Classification and Results

We experimented with numerous state-of-the-art classifiers to predict driving accidents, including Bayesian nets, decision tables, decision trees, support vector machines (SVMs), regressions, and LogitBoost simple decision stump classifiers.7 We evaluated our classifiers according to Cohen's kappa, which corrects for the degree of agreement between a classifier's predictions and reality by considering the proportion of predictions that might occur by chance.8 This measure has been shown to be more robust than simple measures such as hit rate or overall accuracy. Values range from zero to one, with a score of zero implying completely random classification and a one implying perfect classification. Generally, kappa scores greater than 0.2 are considered statistically significant.9
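As a rough illustration of the chi-square selection step described above, here is a scikit-learn sketch (not the authors' code). Since the chi2 scorer requires nonnegative inputs, features are min-max scaled first; k=20 matches the tables that follow.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.preprocessing import MinMaxScaler

def top_k_chi2(X, y, feature_names, k=20):
    """Rank features by chi-square score against the accident labels.
    chi2 requires nonnegative inputs, so features are min-max scaled."""
    X_scaled = MinMaxScaler().fit_transform(X)
    selector = SelectKBest(chi2, k=k).fit(X_scaled, y)
    order = np.argsort(selector.scores_)[::-1][:k]
    return [(feature_names[i], float(selector.scores_[i])) for i in order]
```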


Table 2. Most predictive 20 facial and vehicle features for major accident prediction.

Face
Feature                 Statistic   Value
Lower lip center X      Wavelet 62  33.3
Upper lip center Y      Wavelet 62  26.6
Right lower lip X       Wavelet 62  25.8
Right pupil Y           Wavelet 9   25.3
Right pupil Y           Wavelet 39  25.3
Right upper lip X       Wavelet 62  23.2
Left eye aspect ratio   Wavelet 88  23.0
Left inner eyebrow Y    Wavelet 55  20.9
Lower lip center Y      Wavelet 51  20.2
Right inner eyebrow Y   Wavelet 55  20.2
Left nostril Y          Wavelet 5   18.4
Left mouth corner X     Wavelet 72  17.7
Right eye aspect ratio  Wavelet 78  17.1
Face X                  Wavelet 42  16.6
Left eye aspect ratio   Wavelet 53  15.8
Head euler X            Wavelet 89  15.3
Right pupil Y           Wavelet 84  15.1
Left lower lip Y        Wavelet 51  15.1
Face scale              Wavelet 55  15.1
Right lower lip Y       Wavelet 80  14.8

Car
Feature               Statistic   Value
Velocity              Average     161.0
Velocity              Minimum     154.7
Velocity              Maximum     124.5
Steering wheel angle  Wavelet 40  88.8
Braking input         Wavelet 65  86.9
Steering input        Wavelet 71  83.8
Steering wheel angle  Wavelet 3   82.7
Steering wheel angle  Velocity    78.9
Steering wheel angle  Wavelet 43  78.1
Steering wheel angle  Wavelet 72  76.2
Steering wheel angle  Wavelet 35  76.1
Steering wheel angle  Wavelet 41  76.1
Steering input        Wavelet 44  74.2
Steering wheel angle  Wavelet 2   74.1
Braking input         Wavelet 42  73.9
Steering input        Wavelet 40  73.5
Steering wheel angle  Wavelet 42  73.3
Steering input        Wavelet 37  72.5
Steering wheel angle  Wavelet 11  72.5
Steering wheel angle  Wavelet 37  73.2

Predicting Minor Accidents

We first attempted to predict only minor accidents (that is, centerline crossings, tickets, and road-edge excursions) to determine the face's role in minor accident predictions. Our minor accident datasets consisted of 806 instances: 179 minor accident instances and 627 nonaccident instances. We trained five classifiers: an SVM classifier with a polykernel, a LogitBoost classifier with the weak classifier of a simple decision stump, a multilayer perceptron neural net, a decision table, and a logistic regression. We built these classifiers using the publicly available Waikato Environment for Knowledge Analysis (WEKA) tool10 and validated our models using a tenfold cross validation. The LogitBoost classifier provided the highest kappa statistic of the five classifiers. Figure 4 shows the results of the LogitBoost classifications across various pre-accident intervals ranging from one to four seconds pre-accident using one to 10 seconds of data.

An interesting trend appeared when analyzing the various pre-accident intervals used in minor accident prediction. Whereas the car features proved more useful in predicting accidents close to the accident time (one to two seconds before), facial features proved more predictive longer before the accident (three to four seconds before). In fact, in all the intervals we analyzed, classifiers using the facial features outperformed classifiers using the car features at four seconds prior to the accident. Furthermore, at four seconds pre-accident, these classifiers outperformed the classifiers using all the features in four of the 10 intervals. This suggests that facial features, especially those around the nose, could significantly improve driver safety systems' accuracy a longer time before an accident, allowing drivers more time to react and prevent the accident.
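The study trained its classifiers in WEKA; the sketch below approximates the same protocol (several classifiers, tenfold cross validation, Cohen's kappa) in scikit-learn. WEKA's LogitBoost and decision table have no direct scikit-learn equivalents, so AdaBoost over decision stumps stands in for the former and the latter is omitted.

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Classifier lineup approximating the article's five (no decision table).
classifiers = {
    "svm_poly": SVC(kernel="poly"),
    # Stand-in for WEKA's LogitBoost over simple decision stumps
    # (the `estimator` keyword requires scikit-learn >= 1.2).
    "boosted_stumps": AdaBoostClassifier(
        estimator=DecisionTreeClassifier(max_depth=1), n_estimators=200),
    "mlp": MLPClassifier(max_iter=1000),
    "logistic": LogisticRegression(max_iter=1000),
}

def kappa_by_classifier(X, y):
    """Tenfold cross-validated Cohen's kappa for each classifier."""
    return {name: cohen_kappa_score(y, cross_val_predict(clf, X, y, cv=10))
            for name, clf in classifiers.items()}
```

Because cross_val_predict returns out-of-fold predictions, the kappa here is a tenfold cross-validated estimate, matching the article's validation scheme.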


Table 3. Most predictive 20 facial and vehicle features for minor accident prediction.

Face
Feature                  Statistic   Value
Right outer eye corner Y Minimum     41.6
Mouth aspect ratio       Wavelet 57  33.6
Left outer eye corner Y  Maximum     32.6
Left pupil Y             Average     31.5
Left outer eye corner Y  Average     28.0
Left pupil Y             Maximum     28.0
Left inner eyebrow X     Minimum     27.1
Left pupil X             Minimum     26.4
Nose tip Y               Minimum     26.2
Right nostril X          Maximum     25.9
Nose root X              Minimum     25.3
Nose tip X               Maximum     25.3
Right nostril X          Average     25.0
Face X                   Velocity    24.7
Left eyebrow center X    Minimum     23.9
Left outer eye corner X  Minimum     23.5
Left inner eye corner X  Wavelet 28  23.3
Right nostril Y          Minimum     23.1
Left inner eyebrow X     Average     23.0
Left inner eye corner X  Minimum     22.4

Car
Feature               Statistic   Value
Steering wheel angle  Maximum     143.9
Steering wheel angle  Velocity    136.7
Steering input        Velocity    128.4
Steering wheel angle  Wavelet 3   122.0
Steering input        Maximum     119.7
Steering input        Wavelet 3   110.3
Steering wheel angle  Variance    102.2
Steering wheel angle  Wavelet 4   98.4
Steering wheel angle  Wavelet 69  98.3
Steering wheel angle  Wavelet 70  96.1
Steering wheel angle  Average     96.0
Steering wheel angle  Wavelet 2   93.5
Steering wheel angle  Wavelet 72  93.4
Steering input        Variance    93.4
Steering input        Wavelet 4   89.9
Steering wheel angle  Wavelet 34  89.4
Steering wheel angle  Wavelet 44  89.4
Steering input        Range       87.8
Steering wheel angle  Wavelet 42  87.3
Steering wheel angle  Wavelet 41  85.4
Steering wheel angle  Wavelet 12  85.2

To further analyze the temporal trends in our classifiers' predictive accuracy, we plotted receiver operating characteristic (ROC) curves depicting true versus false positives for the classifiers using all features, only car features, and only facial features, at one to four seconds before accidents occurred (see Figure 5). We used seven seconds of data for each plot given that our highest accuracies occurred in this range. The ROC curves for the face become almost equal to the curves using all the data by four seconds prior to the minor accidents. This implies that by four seconds prior to the accident the signal that provides the bulk of predictive power comes from the face. In all cases, the ROC curves using all the features provide the best overall tradeoff between true and false positives. From this we conclude that the face provides a signal that isn't in the vehicular features and that this signal occurs before the signal provided by the car features.

Predicting Major Accidents

We next attempted to predict major accidents (that is, hitting objects or pedestrians) to determine the face's role in major accident predictions. Our major accident datasets consisted of 758 instances: 131 major accident instances and 627 nonaccident instances. We again trained five classifiers: an SVM classifier with a polykernel, a LogitBoost classifier with the weak classifier of a simple decision stump, a multilayer perceptron neural net, a simple decision table, and a logistic regression. As with minor accidents, we built all classifiers using WEKA and validated the models using a tenfold cross validation. Again, the LogitBoost classifier gave the highest kappa statistics.
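A sketch of the ROC analysis described above, again in scikit-learn terms rather than the authors' tooling. It assumes each classifier exposes predict_proba and that feature_sets maps the "all", "car", and "face" subsets to their design matrices; both names are ours.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve
from sklearn.model_selection import cross_val_predict

def plot_feature_set_rocs(clf, feature_sets, y, title):
    """Overlay cross-validated ROC curves for the all/car/face
    feature subsets. `clf` must expose predict_proba
    (for example, SVC(probability=True))."""
    for name, X in feature_sets.items():
        probs = cross_val_predict(clf, X, y, cv=10,
                                  method="predict_proba")[:, 1]
        fpr, tpr, _ = roc_curve(y, probs)
        plt.plot(fpr, tpr, label=name)
    plt.xlabel("False positives")
    plt.ylabel("True positives")
    plt.title(title)
    plt.legend()
    plt.show()

# e.g., plot_feature_set_rocs(clf, {"all": X_all, "car": X_car,
#                                   "face": X_face}, y, "3 s pre-accident")
```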



Figure 4. Minor accident classifier performance. We used face, car, and all inputs to predict minor accidents one to four seconds before accidents occurred using statistics calculated over a range of intervals of data: (a) one, (b) two, (c) three, (d) four, (e) five, (f) six, (g) seven, (h) eight, (i) nine, and (j) 10 seconds. Kappa values greater than 0.2 are considered statistically significant.


Figure 6 presents the results of the LogitBoost classifiers across pre-accident intervals ranging from one to four seconds pre-accident and using one to 10 seconds of data. Unlike in the case of minor accidents, where we observed a marked increase in the predictive power of facial features as the time before the accident increased, the performances remain relatively stable across time for major accidents. However, as with the minor accident classifiers, the classifiers that use facial features in combination with the car features consistently provided a higher classification kappa than the classifiers that used either feature set alone. This suggests that the vehicle features might be a better signal than facial features in major accidents, but that facial features can still improve the classifiers' overall performance.

Predicting Minor and Major Accidents

As a final step, we predicted minor and major accidents together by combining all major, minor, and nonaccident instances into one large dataset and creating classifiers on this comprehensive dataset. We trained the same five classifiers and validated our models using a tenfold cross validation. Once again the LogitBoost classifier provided the highest kappa statistic of the five sampled classifiers.

Figure 7 presents the results of the LogitBoost classifications across pre-accident intervals ranging from one to four seconds pre-accident and using one to 10 seconds of data. As in the case of both minor and major accident predictions, the classifiers using all the features performed best across sampled intervals. In addition, the performance of the classifiers using the facial features alone tended to remain steady even out to four seconds pre-accident, whereas the performance of the classifiers using only the car features tended to fall off.

Figure 5. Receiver operating characteristic (ROC) plots of classifiers. We used car features, facial features, and all features to predict minor accidents one to four seconds before accidents occur: (a) one, (b) two, (c) three, and (d) four seconds.

For a closer view of these trends, we plotted ROC curves depicting true versus false positives for the classifiers using all features, only car features, and only facial features, at one to four seconds before accidents occurred. We did this for each class (major and minor) in isolation. This let us examine whether the facial signals were more predictive of major or minor accidents across the pre-accident time intervals. We used seven seconds of data for each plot given that our highest accuracies occurred using seven seconds of data (see Figures 8 and 9).

When viewing the ROC curves for the minor accident predictions, it becomes apparent that the face accounts for most of the predictive accuracy of the classifiers after three seconds prior to the accidents; the classifiers using only the facial features perform essentially the same as the classifiers using all the features in combination. However, the predictive accuracy for major accidents appears to come primarily from the vehicle features. This confirms the results we saw in our binary classifiers, where the facial features proved more helpful in predicting minor accidents than major accidents. Overall, this suggests that important signals for accident prediction exist in drivers' faces up to four seconds prior to accidents and that these signals are strongest for minor accidents.
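For the combined task, the class-by-class ROC curves correspond to one-vs-rest curves in a three-class problem (nonaccident, minor, major). A minimal sketch, with our own label encoding assumed:

```python
import numpy as np
from sklearn.metrics import roc_curve
from sklearn.model_selection import cross_val_predict
from sklearn.preprocessing import label_binarize

def per_class_roc(clf, X, y, cv=10):
    """One-vs-rest ROC curves for each class in the combined
    minor/major/nonaccident prediction task."""
    probs = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")
    classes = np.unique(y)  # matches scikit-learn's probability column order
    y_bin = label_binarize(y, classes=classes)
    return {cls: roc_curve(y_bin[:, i], probs[:, i])[:2]
            for i, cls in enumerate(classes)}
```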



Figure 6. Major accident classifier performance. We used face, car, and all inputs to predict major accidents one to four seconds before accidents occur using statistics calculated over a range of intervals of data: (a) one, (b) two, (c) three, (d) four, (e) five, (f) six, (g) seven, (h) eight, (i) nine, and (j) 10 seconds. Kappa values over 0.2 are considered statistically significant.


Figure 7. Minor and major accident classifier performance. We used face, car, and all inputs to predict major and minor accidents one to four seconds before accidents occur using statistics calculated over a range of intervals of data: (a) one, (b) two, (c) three, (d) four, (e) five, (f) six, (g) seven, (h) eight, (i) nine, and (j) 10 seconds. Kappa values over 0.2 are considered statistically significant.


Although our study proves encouraging for the prospect of using facial features to aid in driver accident prediction, it does have several limitations. First, we did not run the study on the road, because the study design was deemed too hazardous to run in the physical world until we had run the study in a simulator. Thus, our study ignores important parameters, such as vehicle motion and surround effects, that affect a driver's perception of and reaction to situations. This limits the generalizability of the study's findings, and we must do further work to determine the impact of these parameters.

Second, although this study thoroughly investigated ways of sensing impending accidents on the road, it didn't investigate exactly what a pervasive system could do to prevent an accident. The system could notify the driver, take action on its own, or do some combination of the two. These possible actions are open for future work.

Third, because we didn't implement our system in real time, we couldn't analyze the classifiers' false-alarm rate over time. However, given that Neven Vision can process streaming facial video at a rate of 30 frames per second, and that our feature calculations and classifier predictions can be made in under 500 milliseconds using Matlab on a standard 2.5-GHz processor, we're confident that our system can be run in real time.

Finally, as with any statistical model, these results are limited to the specific features included in the original models and to these particular datasets. If we had sensed other aspects of the driver (for example, heart rate) or any other part of the driver-environment system, we might have generated very different models for predicting impending driver accidents. Thus, future work would benefit from including a wider range of sensor data to improve the accuracy of such driver safety support systems. Generating more datasets from other populations of participants and other driving contexts would also improve the study's generalizability.
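The real-time argument above rests on a latency budget: tracking at 30 frames per second plus sub-500-ms feature extraction and prediction. A hypothetical harness for checking such a budget (the function arguments are ours, not the authors' interface):

```python
import time

def within_latency_budget(extract_features, classifier, frames,
                          budget_s=0.5):
    """Time one window's feature extraction plus prediction against
    the 500 ms budget the article reports for its Matlab pipeline."""
    start = time.perf_counter()
    features = extract_features(frames)         # time/frequency statistics
    label = classifier.predict([features])[0]   # single-window prediction
    elapsed = time.perf_counter() - start
    return label, elapsed, elapsed <= budget_s
```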

Acknowledgments

We thank Dan Merget for software development; Rabindra Ratan for helping to set up the driving simulator; Alexia Nielsen, Karmen Morales, Julia Saladino, and Anna Ho for running this experiment; and David Jabon, Grace Ahn, Doug Blumeyer, Steffi Paepcke, Jeff Wear, Kathryn Segovia, and Jesse Fox for helpful comments on earlier drafts of this article. This work was funded in part by US National Science Foundation grant SES-0835601 and a grant from Omron Corporation.

Figure 8. Minor accident ROC curves. We used car, face, and all features at pre-accident time intervals: (a) one, (b) two, (c) three, and (d) four seconds before the minor accident to predict the incident.

Figure 9. Major accident ROC curves. We used car, face, and all features at pre-accident time intervals: (a) one, (b) two, (c) three, and (d) four seconds before the major accident to predict the incident.

References

1. U.S. Dept. of Transportation, "Traffic Safety Facts 2006: A Compilation of Motor Vehicle Crash Data from the Fatality Analysis Reporting System and the General Estimates System," tech. report DOT HS 810 818, Nat'l Highway Traffic Safety Administration, 2006.
2. M. Satyanarayanan, "Pervasive Computing: Vision and Challenges," IEEE Personal Comm., vol. 8, no. 4, 2001, pp. 10–17.
3. Z. Zhu, Q. Ji, and P. Lan, "Real-Time Nonintrusive Driver Fatigue Monitoring," IEEE Trans. Vehicular Technology, vol. 53, no. 4, 2004, pp. 1052–1068.
4. J.A. Healy and R.W. Picard, "Detecting Stress During Real-World Driving Tasks Using Physiological Sensors," IEEE Trans. Intelligent Transportation Systems, vol. 6, no. 2, 2005, pp. 156–166.
5. R.W. Allen et al., "A Low Cost PC Based Driving Simulator for Prototyping and Hardware-in-the-Loop Applications," Proc. SAE Int'l Congress & Exposition, SAE Int'l, 1998, paper no. 98-0222.

12 PERVASIVE computing www.computer.org/pervasive


6. H. Neven, Neven Vision [computer software], Neven Vision, Inc., 2003.
7. J. Friedman, T. Hastie, and R. Tibshirani, "Additive Logistic Regression: A Statistical View of Boosting," Annals of Statistics, vol. 28, no. 2, 2000, pp. 337–407.
8. A. Ben-David, "What's Wrong with Hit Ratio?," IEEE Intelligent Systems, vol. 21, no. 6, 2006, pp. 68–70.

9. J. Landis and G. Koch, "The Measurement of Observer Agreement for Categorical Data," Biometrics, vol. 33, no. 1, 1977, pp. 159–174.
10. I.H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann, 2005.

The Authors

Maria Jabon is a software developer at LinkedIn. Her research interests include machine learning and Web interfaces for visualizing high-dimensional datasets. Jabon has an MS in electrical engineering from Stanford University. Contact her at [email protected].

Jeremy Bailenson is the founding director of the Virtual Human Interaction Lab, coauthor of the book Infinite Reality: Avatars, New Worlds, Eternal Life, and the Dawn of the Virtual Revolution, and an associate professor in the Department of Communication at Stanford University. His research interests include digital human representation, especially in the context of immersive virtual reality. Bailenson has a PhD in cognitive psychology from Northwestern University. Contact him at [email protected].

Emmanuel (Manos) Pontikakis is a quantitative analyst at Citadel Investment Group. His research interests include privacy-preserving data mining, machine learning, human-computer interaction, and quantitative finance. Pontikakis has an MS in computer science from Stanford University. Contact him at [email protected].

Leila Takayama is a research scientist at Willow Garage, studying human-robot interaction. Her research interests include interactions with and through personal robots, focusing on issues of embodied cognition and perceived agency. Takayama has a PhD from Stanford University's Communication between Humans and Interactive Media (CHIMe) lab, with a minor in psychology. She is a member of the Association for Psychological Science, Cognitive Science Society, and ACM Special Interest Group on Computer-Human Interaction. Contact her at [email protected].

Clifford Nass is the Thomas M. Storke Professor at Stanford University, founder and director of the Communication between Humans and Interactive Media (CHIMe) Lab, and codirector of the Kozmetsky Global Collaboratory and Auto-X at Stanford University. His research focuses on (laboratory and field) experimental studies of social-psychological aspects of human-interactive media interaction, and his research interests include automobile interfaces, mobile interfaces, human-robot interaction, cognitive and social effects of multitasking, adaptive and personalized systems, and voice interfaces. Nass has a PhD in sociology from Princeton University. Contact him at [email protected].
