DESIGN AND IMPLEMENTATION OF DRIVER DROWSINESS DETECTION SYSTEM by Aleksandar Colic

A Dissertation Submitted to the Faculty of The College of Engineering & Computer Science in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Florida Atlantic University
Boca Raton, FL
December 2014

Copyright 2014 by Aleksandar Colic


ACKNOWLEDGEMENTS

I would like to express my sincere gratitude to my advisor, Dr. Oge Marques, for his support, guidance and encouragement throughout my graduate studies. I also wish to thank my committee, Dr. Borko Furht, Dr. Robert B. Cooper and Dr. Shihong Huang, for their invaluable suggestions. I am deeply thankful to Jean Mangiaracina from Graduate Programs for her immeasurable help on this long journey. And last but not least, many thanks go to my family for always believing in me, and to my friends and colleagues for always being there for me.

ABSTRACT

Author: Aleksandar Colic
Title: Design and Implementation of Driver Drowsiness Detection System
Institution: Florida Atlantic University
Dissertation Advisor: Dr. Oge Marques
Degree: Doctor of Philosophy
Year: 2014

There is a substantial amount of evidence that suggests that driver drowsiness plays a significant role in road accidents. Alarming recent statistics are raising the interest in equipping vehicles with driver drowsiness detection systems. This dissertation describes the design and implementation of a driver drowsiness detection system that is based on the analysis of visual input consisting of the driver's face and eyes. The resulting system combines off-the-shelf software components for face detection, human skin color detection and eye state classification in a novel way. It follows a behavioral methodology by performing non-invasive monitoring of external cues describing a driver's level of drowsiness. We look at this complex problem from a systems engineering point of view in order to go from a proof-of-concept prototype to a stable software framework. Our system utilizes two detection and analysis methods: (i) face detection with eye region extrapolation and (ii) eye state classification. Additionally, we use two confirmation processes – one based on custom skin color detection, the other based on nod detection – to make the system more robust and resilient while not sacrificing speed significantly. The system was designed to be dynamic and adaptable to conform to the current conditions and hardware capabilities.

DESIGN AND IMPLEMENTATION OF DRIVER DROWSINESS DETECTION SYSTEM

List of Tables

List of Figures

1 Introduction
  1.1 Motivation
  1.2 Problem Statement
  1.3 Contributions
  1.4 Organization

2 Background and Context
  2.1 Fundamental Concepts and Terminology
    2.1.1 What is drowsiness?
    2.1.2 What causes drowsiness?
    2.1.3 What can we do about it?
  2.2 Drowsiness Detection and Measurement Methods
    2.2.1 Subjective Methods
    2.2.2 Physiological Methods
    2.2.3 Vehicle-Based Methods
    2.2.4 Behavioral Methods
    2.2.5 Hybrid Methods
  2.3 Commercial Solutions
    2.3.1 Car Manufacturers

    2.3.2 Independent Products

3 Technologies, Algorithms and Research Aspects
  3.1 Imaging Sensors
    3.1.1 Visible light cameras
    3.1.2 Near infrared (NIR) cameras
  3.2 Feature detection and extraction
  3.3 Machine Learning Classifiers
  3.4 Challenges and practical aspects
    3.4.1 Data collection
    3.4.2 Performance requirements

4 Related Work
  4.1 Head pose estimation
  4.2 Yawning
  4.3 Eye state estimation

5 First Prototype
  5.1 System Initialization – Preparation
    5.1.1 Skin Color Feature Analysis
    5.1.2 Eye Model Analysis
    5.1.3 Head Position Analysis
  5.2 Regular Stage - Eye Tracking with Eye-State Analysis
  5.3 Warning Stage - Nod Analysis
    5.3.1 Head Position Monitoring
  5.4 Alert Stage
  5.5 Preliminary Experiments
    5.5.1 Camera rotation test
    5.5.2 Head rotation test

    5.5.3 "Real-World" Test
    5.5.4 Open vs. Closed Eyes test

6 Android Implementation
  6.1 Initialization Stage
    6.1.1 Algorithm outlook
    6.1.2 Record sequence
    6.1.3 Head position localization
    6.1.4 Skin color extraction and analysis
    6.1.5 Manual confirmation with eye region extrapolation
    6.1.6 Building eye model using SVM
  6.2 Monitoring stage
    6.2.1 Algorithm outlook
    6.2.2 Direction and speed estimation
    6.2.3 Tracking area extrapolation
    6.2.4 Eye tracking
    6.2.5 Skin color confirmation
    6.2.6 Eye state analysis
  6.3 Warning stage
    6.3.1 Algorithm outlook
    6.3.2 Closed eyes monitoring
    6.3.3 Nod detection
    6.3.4 Distraction analysis
  6.4 Alert stage
  6.5 Experiments and results
    6.5.1 Synchronization Test
    6.5.2 Face detection speed comparison of original and custom detection area
    6.5.3 Eye region extrapolation limitations

    6.5.4 Tracking stage in proactive mode - speed test
    6.5.5 Tracking stage in retroactive mode - speed test

7 Concluding Remarks

Bibliography

LIST OF TABLES

2.1 Typical administration of MSLT test
2.2 Stanford Sleepiness Scale
2.3 Karolinska Sleepiness Scale verbal cues

LIST OF FIGURES

1.1 Proportion of road traffic deaths by age range

2.1 Latency to sleep at 2-hour intervals across the 24-hour day
2.2 The driver monitoring system uses six IR sensors (visible immediately in front of the instrument panel). Source: Wikimedia Commons.
2.3 Lexus driver monitoring system. Source: Wikimedia Commons.

3.1 Typical face detection steps based on Haar-like features
3.2 Local Binary Pattern example
3.3 Examples of eigenfaces
3.4 Example of horizontal projection
3.5 Concept of a Support Vector Machine. Source: Wikimedia Commons.
3.6 National Advanced Driving Simulator

5.1 The four stages of our Drowsiness Detection System
5.2 Eye detection algorithm: limitations due to horizontal angle change
5.3 Eye detection algorithm: limitations due to vertical angle change
5.4 Successful initial face and eyes detection
5.5 Chroma-based skin detection comparison
5.6 Nod stages: eyes above upper threshold
5.7 Nodding detection method and its stages
5.8 Camera rotating: angle change limitations
5.9 Head rotating: angle change limitations
5.10 Real-World test results

5.11 Support Vector Machine result examples: (a) & (b) correct; (c), (d) & (e) incorrect

6.1 Top: four stages of our Driver Drowsiness Detection System. Bottom: outline of the initialization stage
6.2 Bounding box of a detected object is defined with two sets of coordinates. En face detection of a driver's head
6.3 Captured face regions contain artifacts that can skew the results which can be minimized by masking most commonly affected areas
6.4 Head rotation angle example
6.5 Top: four stages of our Driver Drowsiness Detection System. Bottom: outline of the proactive monitoring stage
6.6 Top: four stages of our Driver Drowsiness Detection System. Bottom: outline of the retroactive monitoring stage
6.7 Eight potential directions of head movement
6.8 Extrapolated search area depends on the speed and direction of the driver's head: area within the light blue rectangle is extrapolated
6.9 Eye region extrapolation examples
6.10 Extracted information from an image: face and eye region bounding boxes and their relation
6.11 Extracted eye region from the face detection done by FaceDetector class: dark orange is provided information, light blue is extrapolated information
6.12 Skin confirmation areas of interest
6.13 Top: four stages of our Driver Drowsiness Detection System. Bottom: outline of the warning stage
6.14 Typical outlook of a cropped face region for various movement speeds
6.15 Success rate for different resolutions
6.16 Size comparison example: Original captured image with the cropped detection area
6.17 Speed comparison: Average times for face detection depending on the resolution and detection area size
6.18 Detected face regions vary drastically in size causing the estimated eye regions to vary drastically in size and location

6.19 Average time per frame (c) is calculated by adding up the time from all of the major steps (a) with the time to complete the supporting processes (b)
6.20 Average time per frame (c) is calculated by adding up the time from all of the major steps (a) with the time to complete the supporting processes (b)

CHAPTER 1 INTRODUCTION

1.1 MOTIVATION

Shocking statistics revealed by the World Health Organization (WHO) in a 2009 report [1] showed that more than 1.2 million people die on roads around the world every year. Moreover, an additional 20 to 50 million individuals suffer non-fatal injuries. Such astonishing numbers have triggered community action. For example, the United Nations (UN) General Assembly declared the decade between 2011 and 2020 the Decade of Action for Road Safety. A recently published follow-up report by the WHO [2] showed that even though some progress has been made, the shocking figure of 1.24 million deaths caused by road accidents per year remains essentially the same. An analysis of the number of deaths by age range (Figure 1.1) shows that almost 40% are young people below the age of 30.

A report from the National Highway Traffic Safety Administration (NHTSA) from 1994 [3] provided statistics on how many accidents are caused by drowsy driving. An average annual total of 6.3 million police-reported crashes occurred during the period between 1989 and 1993 in the United States. Of these, approximately 100,000 crashes per year (i.e., 1.6% of 6.3 million) were identified with drowsiness in the corresponding Police Crash Reports (PCR). Additionally, many other accident reports referred to "Drift-Out-Of-Lane" crashes, which might be related to drowsiness as well. Approximately 71,000 of all drowsy-related crashes involved non-fatal injuries, whereas 1,357 drowsy-related fatal crashes resulted in 1,544 fatalities (3.6% of all fatal crashes), as reported by the Fatality Analysis Reporting System (FARS).

Figure 1.1: Proportion of road traffic deaths by age range.

Nevertheless, many run-off-roadway crashes are not reported or cannot be verified by police, suggesting that the problem is much larger than previously estimated. A significant number of surveys, studies and reports suggest that drowsiness is one of the biggest causes of road accidents. The US National Sleep Foundation (NSF) reported that 54% of adult drivers have driven a vehicle while feeling drowsy and 28% of them actually fell asleep at the wheel [3]. Powell et al. [4] concluded that sleepiness can impair driving performance as much as or more than alcohol. A more recent report [5] from the American Automobile Association (AAA) estimates that one out of every six (16.5%) deadly traffic accidents, and one out of eight (12.5%) crashes requiring hospitalization of car drivers or passengers, is due to drowsy driving. In summary, there is a substantial amount of evidence that suggests that drowsiness is one of the major factors in road accidents.

1.2 PROBLEM STATEMENT

Currently available driver drowsiness detection systems usually fall into two categories: (i) very expensive systems, limited to specific high-end car models; and (ii) affordable solutions that lack robustness. Our work is focused on implementing a drowsiness detection system that tries to bridge the gap between them by balancing affordability and availability with functionality. An analysis of current and previous work in the field of drowsiness detection emphasizes the difficulty and complexity of the problem due to three essential challenges that need to be tackled, namely: reliability, accuracy and speed. The aim of our approach is to overcome these challenges by building a mobile, real-time, dynamic, adaptive system that leverages, whenever possible, readily available computer vision tools.

1.3 CONTRIBUTIONS

The main contributions of our work are:

• Design and implementation of a MATLAB-based prototype system capable of analyzing recorded sequences.

• Implementation of an Android-based system capable of working in proactive and retroactive modes of monitoring the driver's behavior.

• In-depth analysis of all aspects of the design and implementation of a drowsiness detection system from a systems engineering point of view.

• Theoretical and practical analysis of the capabilities of currently available software tools and their integration into our system architecture.

• Design and implementation of a novel two-threshold nod detection process.

• Design and implementation of a novel custom skin color extrapolation and confirmation process.

• Design and implementation of a novel eye region extrapolation method based on driver's behavioral history in correlation with known face position in a given frame.

• Introduction of a complex interactive tool which allows the system to extrapolate the needed custom, user-specific information in order to increase overall performance.

• Design and implementation of a novel face detection search area extrapolation based on the speed and direction of the driver's head movement across the previous frames.

1.4 ORGANIZATION

The remainder of this dissertation is organized as follows.

• Chapter 2 introduces fundamental concepts and terminology related to drowsiness, which are needed to more deeply understand the complexity of a detection and measurement process. The second part of the chapter explains various detection and measurement methods, while the final part is dedicated to currently available commercial solutions. The material for this chapter has previously appeared in our recently published book on the topic [6].

• Chapter 3 describes existing technologies and algorithms as well as practical research aspects involved in the design and implementation of a driver drowsiness detection system from a systems engineering point of view. The material for this chapter also has previously appeared in our recently published book on the topic [6].

• Chapter 4 summarizes our literature review and highlights some of the related research work in this field.

• Chapter 5 describes our first (MATLAB-based) prototype system and associated experiments and results. A shorter version of this chapter appeared as a conference paper at SIGMAP 2014 [7].

• Chapter 6 presents our Android-based system implementation.

• Finally, Chapter 7 contains a summary of the dissertation with conclusions and suggestions for future work.

CHAPTER 2 BACKGROUND AND CONTEXT

2.1 FUNDAMENTAL CONCEPTS AND TERMINOLOGY

2.1.1 What is drowsiness?

Drowsiness, also referred to as sleepiness, can be defined as "the need to fall asleep". This process is a result of the normal human biological rhythm, which consists of sleep-wake cycles. The sleep-wake cycle is governed by both homeostatic and circadian factors. Homeostasis relates to the neurobiological need to sleep; the longer the period of wakefulness, the more pressure builds for sleep and the more difficult it is to resist [8]. The circadian pacemaker is an internal body clock that completes a cycle approximately every 24 hours. Homeostatic factors govern circadian factors to regulate the timing of sleepiness and wakefulness.

These processes create a predictable pattern of two sleepiness peaks, which commonly occur about 12 hours after the mid-sleep period (during the afternoon for most people who sleep at night) and before the next consolidated sleep period (most commonly at night, before bedtime) [9] (Figure 2.1). It is also worth noting that the sleep-wake cycle is intrinsic and inevitable, not a pattern to which people voluntarily adhere or can decide to ignore. Despite the tendency of society today to give sleep less priority than other activities, sleepiness and performance impairment are neurobiological responses of the human brain to sleep deprivation.

Sleep and wakefulness are influenced by the light/dark cycle, which in humans most often means wakefulness during daylight and sleep during darkness. People whose sleep is out of phase with this cycle, such as night workers, air crews, and travelers who cross several time zones, can experience sleep loss and sleep disruption that reduce alertness [10, 11]. In medical terms, sleep can be divided into three stages: awake, the non-rapid eye movement stage (NREM) and the rapid eye movement stage (REM). The sleepy (drowsy) interval – i.e., the transition from awake to asleep – occurs during the NREM stage [12].

Figure 2.1: Latency to sleep at 2-hour intervals across the 24-hour day.

2.1.2 What causes drowsiness?

Although alcohol and some medications can independently induce sleepiness, the primary causes of sleepiness and drowsy driving in people without sleep disorders are sleep restriction, sleep fragmentation and circadian factors.

Sleep Restriction or Loss

Short duration of sleep appears to have the greatest negative effects on alertness [13]. Although the need for sleep varies among individuals, sleeping 8 hours per 24-hour period is common, and 7 to 9 hours is needed to optimize performance. Experimental evidence shows that sleeping less than 4 consolidated hours per night impairs performance on vigilance tasks [14]. Acute sleep loss, even the loss of one night of sleep, results in extreme sleepiness. The effects of sleep loss are cumulative [15]. Regularly losing 1 to 2 hours of sleep a night can create a "sleep debt" and lead to chronic sleepiness over time. The only way to reduce sleep debt is to get some sleep. Both external and internal factors can lead to a restriction in the time available for sleep. External factors include work hours, job and family responsibilities, and school bus or school opening times. Internal or personal factors sometimes are involuntary, such as a medication effect that interrupts sleep. Often, however, reasons for sleep restriction represent a lifestyle choice, such as the decision to sleep less in order to have more time to work, study, socialize, or engage in other activities.

Sleep Fragmentation

Sleep is an active process, and adequate time in bed does not mean that adequate sleep has been obtained. Sleep disruption and fragmentation cause inadequate sleep and can negatively affect functioning [16]. Similar to sleep restriction, sleep fragmentation can have internal and external causes. The primary internal cause is illness, including untreated sleep disorders. Externally, disturbances such as noise, children, activity and lights, a restless spouse, or job-related duties (e.g., workers who are on call) can interrupt and reduce the quality and quantity of sleep. Studies of commercial vehicle drivers present similar findings. For example, the National Transportation Safety Board (NTSB) [17] concluded that the critical factors in predicting crashes related to sleepiness were: the duration of the most recent sleep period, the amount of sleep in the previous 24 hours, and fragmented sleep patterns.

Circadian Factors

As noted earlier, the circadian pacemaker regularly produces feelings of sleepiness during the afternoon and evening, even among people who are not sleep deprived [16]. Shift work also can disturb sleep by interfering with circadian sleep patterns.

2.1.3 What can we do about it?

In this chapter, we have presented the problem of driver drowsiness and its correlation with car accidents worldwide. The gravity of the problem, and the fact that it is related to a natural physiological need to combat the fatigue of the human system, suggest that it cannot be eliminated altogether. Instead, it needs to be measured and detected in time to prevent more serious consequences. In the remainder of this chapter, we will describe several different methods for measuring and detecting driver drowsiness, survey existing solutions, provide guidance on how they can be implemented, and discuss the associated technical challenges.

2.2 DROWSINESS DETECTION AND MEASUREMENT METHODS

There are several different ways to detect and measure driver drowsiness (or sleepiness). They are normally grouped into five categories: subjective, physiological, vehicle-based, behavioral, and hybrid. This section provides a brief survey of driver drowsiness detection methods in each of these categories.

2.2.1 Subjective Methods

Sleepiness can be explained as a physiological need to combat the fatigue of the human system. The more the system is fatigued (i.e., sleep deprived), the stronger the need for sleep, which suggests that sleepiness can have different levels. Scientific organizations such as the Laboratory for Sleep [18], the Division of Sleep Disorders [18] and the Association of Professional Sleep Societies [19], to name a few, have been creating various descriptive scales of sleepiness levels.

Current subjective tools used for the assessment of sleepiness are based on questionnaires and electro-physiological measures of sleep. Their purpose is to provide insight into how to more successfully predict which factors might lead to accidents and to provide means for other method groups to focus on detecting and preventing some key factors associated with driver drowsiness. This way of measuring is known as Subjective Measuring, since testing subjects were asked to describe their level of sleepiness, which is clearly a subjective assessment of their perception of drowsiness. Some of the best-known subjective tests of sleepiness are:

• Epworth Sleepiness Scale (ESS) [20]: an eight-item, self-report measure that quantifies individuals' sleepiness by their tendency to fall asleep in static, non-stressful situations: reading, watching television, sitting in a car at a traffic light. For each of the eight situations, subjects rate their likelihood of dozing off or falling asleep on a scale from 0 (no chance) to 3 (high chance), so the total score ranges between 0 and 24 points. Subjects scoring less than 10 are considered awake or slightly sleepy, while 15 points or more indicates severe sleepiness. (A minimal scoring sketch appears at the end of this subsection.)

• Multiple Sleep Latency Test (MSLT) [19]: a test based on the presumption that people who fall asleep faster are more sleep deprived. The MSLT measures the tendency to fall asleep in a standardized sleep-promoting situation during four or five 20-minute nap opportunities that are spaced 2 hours apart throughout the day and in which the individual is instructed to try to fall asleep (Table 2.1). Since brain wave sleep-staging criteria are very well established, the intervals of time it takes the subjects to fall asleep can be easily measured. If a subject needs less than five minutes to fall asleep, he or she is considered pathologically sleepy, whereas taking more than ten minutes is considered normal.

Table 2.1: Typical administration of MSLT test

Step   Time       Subject's Chores
1      5 AM       Lights on
2      7 AM       Subject awake
3      7:30 AM    Measuring brain activity
4      8 AM       20-minute nap
5      9 AM       Subject awake – measuring brain activity
6      10 AM      20-minute nap
7      12:30 PM   Subject eats lunch – measuring brain activity
8      1 PM       20-minute nap
9      3:30 PM    Subject leaves

• Maintenance of Wakefulness Test (MWT) [18]: a test in which individuals are instructed to try to remain awake. Their attempt is monitored over a period of twenty minutes. If a subject can stay awake over that period of time, he or she is considered awake and capable of operating a vehicle. But if a subject falls asleep within the first fifteen minutes, he or she can be considered too sleep deprived to drive.

• Stanford Sleepiness Scale (SSS) [21]: an instrument that contains seven statements through which people rate their current level of alertness (e.g., 1 = "feeling...wide awake" to 7 = "...sleep onset soon..."). The scale correlates with standard performance measures, is sensitive to sleep loss and can be administered repeatedly throughout a 24-hour period. Typically, subjects are asked to rate their alertness level every two hours throughout the day by choosing a single number associated with a specific alertness description (Table 2.2).

Table 2.2: Stanford Sleepiness Scale

Value   Description
1       Feeling active, vital, alert or wide awake
2       Functioning at high levels, but not at peak; able to concentrate
3       Relaxed, awake but not fully alert; responsive
4       A little foggy
5       Foggy, beginning to lose track; having difficulty staying awake
6       Sleepy, woozy, fighting sleep; prefer to lie down
7       Cannot stay awake, sleep onset appears imminent

• The Karolinska Sleepiness Scale (KSS) [22]: contains questions that guide subjects to provide, to the best of their ability, a self-report of the quality of their sleep. Laboratory tests and field studies suggest that the measurements gathered by this method adequately cover and relate to various sleep and lack-of-sleep scenarios. This is the most commonly used drowsiness scale – a nine-point scale that has verbal anchors for each step (Table 2.3).

• Visual Analogue Scale (VAS) [23]: asks subjects to rate their "sleepiness" using a scale spread along a 100 mm wide line. The anchors for the sleep deprivation state range from "just about asleep" (left end) to "as wide awake as I can be" (right end). Subjects place a mark on the line expressing how sleepy they feel they are. The sleepiness level is measured by the distance in millimeters from one end of the scale to the mark placed on the line. The VAS is convenient since it can be rapidly administered as well as easily repeated.

Table 2.3: Karolinska Sleepiness Scale verbal cues.

Level   Sleepiness level
1       Extremely alert
2       Very alert
3       Alert
4       Rather alert
5       Neither alert nor sleepy
6       Some signs of sleepiness
7       Sleepy but no difficulty staying awake
8       Sleepy with some effort to keep alert
9       Extremely sleepy, fighting sleep

The MSLT and MWT were developed for neurophysiological assessment and are sensitive to both acute and chronic sleep loss. These types of tests cannot be administered and monitored without special training and under special conditions, and can only be performed on healthy subjects [19, 18]. Moreover, the practicality of these tests for assessing crashes is very small, but some portions of them, such as slightly modified "nap tests" in combination with questionnaires, have been used for that purpose [24]. Other subjective measurement methods worth mentioning are: the Sleep-Wake Activity Inventory [25], the Pittsburgh Sleep Quality Index [26] and the Sleep Disorders Questionnaire [27]. None of them is quite practical for the assessment of crash situations, but their ability to monitor individuals over extended periods of time, combined with the subjects' self-reporting, provides valuable information towards a better understanding of sleep deprivation and its manifestation in humans.

Subjective measurement results gathered from all these tests greatly depend on the quality of the questions asked as well as proper interpretation and understanding of those questions by the subject. Due to the age and social diversity of subjects, it might not be possible to formulate a questionnaire that accommodates every potential problem. Moreover, the subjects' perspective plays a huge role in the quality of the acquired data. Lastly, it is worth stating that it is very difficult to acquire subjective drowsiness feedback from a driver in a real-world driving situation; all the measurements are usually done in a simulated environment.
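To make the ESS scoring rule concrete, the following minimal Python sketch (an illustration, not part of the system described in this dissertation) sums the eight item ratings and maps the total to the categories mentioned above; the intermediate "moderate" label is an assumption, since the scale description only names the two extreme categories.

    def ess_score(item_ratings):
        # Each of the eight ESS items is rated 0 (no chance of dozing) to 3 (high chance).
        assert len(item_ratings) == 8 and all(0 <= r <= 3 for r in item_ratings)
        total = sum(item_ratings)  # possible range: 0-24
        if total < 10:
            category = "awake or slightly sleepy"
        elif total >= 15:
            category = "severe sleepiness"
        else:
            category = "moderate sleepiness"  # intermediate band; label is an assumption
        return total, category

    # Example: a respondent reporting a moderate chance of dozing in most situations.
    print(ess_score([2, 1, 2, 1, 2, 1, 2, 1]))  # (12, 'moderate sleepiness')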

2.2.2 Physiological Methods

Physiological methods offer an objective, precise way to measure sleepiness. They are based upon the fact that physiological signals start to change in earlier stages of drowsiness [28, 29], which could allow a potential driver drowsiness detection system a little bit of extra time to alert a drowsy driver in a timely manner and thereby prevent many road accidents. The idea of being able to detect drowsiness at an early stage with very few false positives has motivated many researchers to experiment with various electro-physiological signals of the human body, such as electrocardiogram (ECG), electroencephalogram (EEG), and electrooculogram (EOG). They are briefly defined and explained below.

• Electrocardiogram (ECG) records the electrical activity of the human heart. This system can very precisely tell which state the human body is in by detecting minute changes in the behavior of the heart, such as an increase or decrease of heart rate [30, 31]. The variability of the heart rate can be described using the Heart Rate Variability (HRV) measure [30, 31], in which the low (LF) and high (HF) frequencies of the heartbeat are described. HRV is a measure of the beat-to-beat (R-R interval) changes in the heart rate. When a subject is awake, the heart rate activity is much closer to the HF band. The ECG can clearly show that when a subject starts going into a drowsy state, the heart rate starts slowing down and heading towards the LF band. (A sketch of the LF/HF computation appears after this list.)

• Electroencephalogram (EEG) records the electrical activity of the human brain. It is the most reliable and most commonly used signal that can precisely describe a human's alertness level [28, 20, 32, 33, 34]. The EEG signal is highly complex and has various frequency bands. The frequency bands that can be measured to determine if a subject is drowsy are: the delta band, which corresponds to sleep activity; the theta band, which is related to drowsiness; and the beta band, which corresponds to alertness. A decrease in the power changes in the alpha frequency band and an increase in the theta frequency band indicate drowsiness. The frequencies measured using this method are very prone to errors and require very specific conditions to be measured properly. Moreover, in order to measure them, sensing devices would have to make physical contact with the subject. Clearly, in a real-world driving scenario, having electrodes attached to the driver's head, beyond the huge inconvenience, would hinder the driver's capabilities and potentially increase the chances of an accident happening.

• Electrooculogram (EOG) records the electrical potential difference between the cornea and the retina of the human eye. It has been shown that this difference determines the behavior of the eye, which can be used to monitor the driver's alertness level [35, 30, 36]. This method is highly invasive since it requires direct contact with the subject, usually in the following manner: a disposable electrode is placed on the outer corner of each eye and a third electrode at the center of the forehead for reference [37]. The associated methodology is relatively simple: if a slower eye movement is detected, compared to the regular eye movement of a subject in the awake stage, the conclusion is that the subject is becoming drowsy. Though this type of measurement is very precise and leads to very small detection errors, it is not the most practical for real-world, real-time implementation due to its invasiveness and the complexity of the apparatus needed for the measurement.
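The LF/HF trend described for the ECG can be illustrated with a short, hedged Python sketch: the 4 Hz resampling rate and the 0.04-0.15 Hz / 0.15-0.40 Hz band limits are conventional HRV choices, not parameters taken from this dissertation.

    import numpy as np
    from scipy.signal import welch

    def lf_hf_ratio(rr_intervals_s, fs=4.0):
        # Resample the irregular R-R interval series onto an even time grid (tachogram).
        t = np.cumsum(rr_intervals_s)
        t_even = np.arange(t[0], t[-1], 1.0 / fs)
        rr_even = np.interp(t_even, t, rr_intervals_s)
        # Power spectral density of the tachogram.
        f, pxx = welch(rr_even, fs=fs, nperseg=min(256, len(rr_even)))
        lf = np.trapz(pxx[(f >= 0.04) & (f < 0.15)], f[(f >= 0.04) & (f < 0.15)])
        hf = np.trapz(pxx[(f >= 0.15) & (f < 0.40)], f[(f >= 0.15) & (f < 0.40)])
        return lf / hf  # a drift toward LF dominance is the drowsiness cue described above

    # Synthetic example: roughly 60 bpm with a slow modulation inside the LF band.
    rr = 1.0 + 0.05 * np.sin(2 * np.pi * 0.1 * np.arange(300.0))
    print(lf_hf_ratio(rr))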

The reliability and accuracy of driver drowsiness detection using physiological signals are very high compared to other methods. However, the intrusive nature of measuring physiological signals remains an issue that prevents their use in real-world scenarios. Due to the technological progress in recent years, it is possible that some of the problems caused by these methods will be overcome in the future. Examples include: the use of wireless devices to measure physiological signals in a less intrusive manner by placing the electrodes on the body and obtaining signals using wireless technologies like Zigbee or Bluetooth; or by placing electrodes on the steering wheel [38, 39]; or placing electrodes on the driver's seat [40]. The obtained signals can be processed and monitored in various ways, such as using smart phone devices [41, 42]. Obtaining these signals in a non-intrusive way certainly contributes towards their real-world applicability. But the question of whether this way of collecting data may lead to increased measurement errors has not been answered conclusively yet. Recently, a few experiments have been conducted to validate the potential use of less-intrusive or non-intrusive systems and inspect the implications of this trade-off [40, 38].

2.2.3 Vehicle-Based Methods

Our understanding of drowsy-driving crashes is often based on subjective evidence, such as police crash reports and drivers' self-reports following the event [43, 44]. Evidence gathered from the reports suggests that the typical driver and vehicle behavior during these events usually exhibits characteristics such as:

• Higher speed with little or no braking. Fall-asleep crashes are likely to have serious consequences. The mortality rate associated with drowsy-driving crashes is high, probably due to the combination of higher speeds and delayed reaction time [45].

• A vehicle leaves the roadway. An analysis of police crash reports in North Carolina showed that the majority of the non-alcohol, drowsy-driving crashes were single-vehicle roadway departures [44]. It is very common for a sleep-impaired driver to lose concentration and stray off the road. The NHTSA General Estimates System data reflects the same trend but also suggests that sleepiness may play a role in rear-end crashes and head-on crashes as well [46].

• The crash occurs on a high-speed road. In comparison with other types of crashes, drowsy-driving crashes more often take place on highways and major roadways with speed limits of 55 miles per hour and higher [46]. It seems that monotonous driving on such roads can cause lapses in concentration in sleep-deprived drivers, thus increasing the chance of an accident.

• The driver does not attempt to avoid crashing. NHTSA data shows that sleepy drivers are less likely than alert drivers to take corrective action before a crash [46]. Reports also suggest that evidence of a corrective maneuver, such as skid marks or brake lights, is usually absent in a fall-asleep crash.

• The driver is alone in the vehicle. In a New York State survey of lifetime incidents, 82% of drowsy-driving crashes involved a single occupant [43].

All of the characteristics noted above suggest that a vehicle driven by a drowsy driver exhibits specific driving patterns that can be measured and used for detection of a potential drowsy-driving situation.

The two most commonly used vehicle-based measures for driver drowsiness detection are the steering wheel movement (SWM) and the standard deviation of lane position (SDLP).

Steering Wheel Movement (SWM). These methods rely on measuring the steering wheel angle using an angle sensor mounted on the steering column, which allows for detection of even the slightest steering wheel position changes [47, 48, 49]. When the driver is drowsy, the number of micro-corrections on the steering wheel is lower than in normal driving conditions [47, 50]. A potential problem with this approach is the high number of false positives. SWM-based systems can function reliably only in particular environments and are too dependent on the geometric characteristics of the road and, to a lesser extent, on the kinetic characteristics of the vehicle [51].
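As a rough illustration of the micro-correction idea behind SWM (not the algorithm of any particular commercial system), the Python sketch below counts small steering reversals per minute from a sampled steering-angle signal; the 0.5-degree amplitude threshold and the reversal-based definition of a micro-correction are hypothetical choices made only for this example.

    import numpy as np

    def micro_corrections_per_minute(angles_deg, fs_hz, min_amplitude_deg=0.5):
        # A micro-correction is counted here as a reversal in steering direction
        # whose preceding movement exceeds a (hypothetical) minimum amplitude.
        angles = np.asarray(angles_deg, dtype=float)
        d = np.diff(angles)
        reversal_idx = np.where(np.sign(d[:-1]) * np.sign(d[1:]) < 0)[0]
        count = int(np.sum(np.abs(d[reversal_idx]) >= min_amplitude_deg))
        minutes = len(angles) / fs_hz / 60.0
        return count / minutes

    # A drowsy driver would be expected to show a lower rate than their own awake baseline.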

Standard Deviation of Lane Position (SDLP). Leaving a designated lane and crossing into a lane of opposing traffic or going off the road are typical behaviors of a car driven by a driver who has fallen asleep. The core idea behind SDLP is to monitor the car's relative position within its lane with an externally mounted camera. Specialized software is used to analyze the data acquired by the camera and compute the car's position relative to the middle of the lane [52, 53]. The limitations of SDLP-based systems are mostly tied to their dependence on external factors such as road markings, weather, and lighting conditions.
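A minimal sketch of the SDLP computation itself, assuming the lane-tracking camera already yields a lateral-offset signal in meters (the 60-second window is illustrative, not a value taken from this work):

    import numpy as np

    def sdlp(lateral_offsets_m, fs_hz, window_s=60.0):
        # Sample standard deviation of the car's lateral offset over the most recent window.
        n = int(window_s * fs_hz)
        window = np.asarray(lateral_offsets_m[-n:], dtype=float)
        return float(np.std(window, ddof=1))

    # Synthetic example: 60 s of lane-position samples at 10 Hz.
    offsets = 0.1 * np.random.randn(600)
    print(sdlp(offsets, fs_hz=10.0))  # larger values indicate more lane weaving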

2.2.4 Behavioral Methods

The methods mentioned thus far were deemed either unreliable or very intrusive for real-world applications, thus leading towards exploiting a different type of methodology, based upon non-invasive observation of a driver's external state. These methods are based on detecting specific behavioral clues exhibited by a driver while in a drowsy state. A typical focus is on facial expressions that might exhibit characteristics such as: rapid, constant blinking, nodding or swinging of the head, or frequent yawning. These are all tell-tale signs that a person might be sleep deprived and/or feeling drowsy. Typically, systems based on this methodology use a video camera for image acquisition and rely on a combination of computer vision and machine learning techniques to detect events of interest, measure them, and make a decision on whether the driver may be drowsy or not. If the sequence of captured images and measured parameters (e.g., pattern of nodding or time lapsed in "closed eye state") suggest that the driver is drowsy, an action – such as sounding an audible alarm – might be warranted.

• Head or eye position. When a driver is drowsy, some of the muscles in the body begin to relax, leading to nodding. This nodding behavior is what researchers are trying to detect. Research exploiting this feature has started just recently [54, 55]. Detecting head or eye position is a complex computer vision problem which might require stereoscopic vision or 3D vision cameras.

• Yawning. Frequent yawning is a behavioral feature that tells that the body is fatigued or falling into a more relaxed state, leading towards sleepiness. Detecting yawning can serve as a preemptive measure to alert the driver. It should be noted, however, that yawning does not always occur before the driver goes into a drowsy state. Therefore it cannot be used as a stand-alone feature; it needs to be backed up with additional indicators of sleepiness [56, 57].

• Eye state. Detecting the state of the eyes has been the main focus of research for determining if a driver is drowsy or not. In particular, the frequency of blinking has been observed [58, 59, 60, 61, 62, 63]. The term PERCLOS (PERcentage of eyelid CLOSure over the pupil over time) has been devised to provide a meaningful way to correlate drowsiness with the frequency of blinking. This measurement has been found to be a reliable measure to predict drowsiness. (A minimal PERCLOS computation is sketched after this list.)

At any given time, the eye can roughly be categorized into one of three states: wide open, partially open, or closed. The last two can be used as indicators that a driver is experiencing sleepiness. If the eyes stay in these two states for a prolonged period of time, it can be concluded that the driver is experiencing abnormal behavior. An eye-state detection system must be able to reliably detect and distinguish these different states of the eyes. Various algorithms with various approaches for extracting and filtering important features of the eyes [64, 65, 66, 67, 68] have been used throughout the years. Typically, the feature extraction process is followed by training and use of machine learning algorithms of various capabilities, strengths and weaknesses [69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82].

• Multiple Facial Actions. Some researchers have used multiple facial features, including the state and position of the eyebrows, lip and jaw dropping, combined with eye blinking [83].
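To make the PERCLOS idea from the eye-state item above concrete, here is a minimal Python sketch under the common "P80" reading of the measure; the 80% closure threshold and the fixed window of frames are conventional assumptions, not parameters quoted in the text.

    def perclos(eye_closure_fractions, threshold=0.8):
        # Fraction of frames in the window whose eyelid closure is at least the threshold.
        closed = sum(1 for c in eye_closure_fractions if c >= threshold)
        return closed / len(eye_closure_fractions)

    # With a binary open/closed eye-state classifier, pass 0.0 for open frames and 1.0 for closed ones.
    window = [0.0] * 90 + [1.0] * 10
    print(perclos(window))  # 0.1, i.e., eyes closed in 10% of the frames in this window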

Behavioral methods are considered cost effective and non-invasive, but lead to significant technical challenges. In addition to the challenges associated with the underlying computer vision, machine learning and image processing algorithms, the resulting systems are required to perform in real-time and to exhibit robustness when faced with bumpy roads, lighting changes, dirty lenses, improperly mounted cameras, and many other real-world less-than-ideal driving situations.

2.2.5 Hybrid Methods

All of the previously mentioned methods have strengths and weaknesses. Vehicle-based measurements depend on specific driving conditions (such as weather, lighting, etc.) and can be used on specific roads only (with clearly marked signs and lanes). Moreover, they may lead to a large number of false positives, which would lead to a loss of confidence in the method. Behavioral measures, on the other hand, may show huge variation in the results depending on the associated lighting conditions. Physiological measures are reliable and accurate but their intrusive nature is still a challenge, which may be mitigated should non-invasive physiological sensors become feasible in the near future [42, 84].

Several recent research studies have attempted to develop driver drowsiness detection systems as a fusion of different methods. One study which combined behavioral methodology and vehicle-based methodology showed that the reliability and accuracy of the created hybrid method was significantly higher than those using a single methodology approach [85]. Another study, which included subjective measures in combination with behavioral and physiological measures, showed a significantly higher success rate than any individual method alone [86]. These early results suggest that a combination of the three types of methods – behavioral, physiological and vehicle-based – is a promising avenue worth pursuing in the development of real-world, vehicle-mounted driver drowsiness detection solutions.

2.3 COMMERCIAL SOLUTIONS

The automobile industry has spent a significant amount of resources in recent years to develop new features aimed at driver drowsiness detection. Moreover, independent companies have also recognized that this market might grow and become profitable and have developed products whose goals are comparable, but which work independently of the vehicle's brand or model. This section provides a representative sample of those efforts.

2.3.1 Car Manufacturers

The ability to offer some type of ‘driver assist’ system as an added value to their vehicle lineup has motivated many automobile manufacturers to offer built-in solutions that are capable of detecting signs of driver drowsiness and warning the driver accordingly. These are some examples (sorted alphabetically by car manufacturer).

• Ford: The American car manufacturer introduced their "Driver Alert System" in 2012 [87]. The system uses a forward-looking camera to monitor the vehicle's position in the lane and estimate the driver's alertness level based upon driver behavior, i.e., the ability to stay within the lane's limits. If the alertness level is perceived to be lower than a certain threshold, a light audible and visual warning – "Rest suggested" – appears in the car's instrument cluster; if it falls even further, a more severe alert – "Rest now" – (with red light and chime sound) is displayed until it is explicitly dismissed by the driver. The system also offers the possibility to check the current estimated level of alertness at any given time. This system falls under the "vehicle-based" category of driver drowsiness detection methods.

Figure 2.2: The Lexus driver monitoring system uses six IR sensors (visible immediately in front of the instrument panel). Source: Wikimedia Commons.

• Lexus & Toyota: The Japanese automaker has offered, in selected vehicles, a "Driver Monitoring System", which first became available in 2006 [88]. It uses a CCD (charge-coupled device) camera, mounted on top of the steering column cover, to monitor driver attentiveness using eye tracking and head motion detection techniques. Six built-in near-infrared LED sensors enable the system to work accurately both day and night (Figure 2.2). During start-up, the system automatically estimates the position of the driver's facial features and measures the width and center line of the face. This information is used as a reference to monitor the movement of the driver's head when looking from side to side (Figure 2.3).

The solution works in conjunction with Lexus' "Advanced Obstacle Detection System" as follows. If the driver turns his head away from the road ahead while the vehicle is moving and an obstacle is detected in front of the vehicle, the system activates a pre-crash warning light and buzzer. If the situation persists, the brakes are briefly applied to alert the driver. And if this still fails to elicit action from the driver, the system engages emergency braking preparation and front seatbelt pre-tensioning.

Figure 2.3: Lexus driver monitoring system. Source: Wikimedia Commons.

The combination of different monitoring devices places this system in the "hybrid" category (according to the categories described in Section 2.2).

The "Driver Monitoring System" was modified in 2008 by Toyota to include eyelid detection, which is used to determine the state of the driver's eyes [89]. This increases the overall robustness of the system. If the eyelids start to get droopy, an alarm will sound, and, again, the system will jump in and attempt to decelerate the car automatically. Toyota expects to be installing this in cars in the next couple of years.

• Mercedes-Benz. Back in 2009, Mercedes-Benz introduced a system called "Attention Assist" [90]. At the heart of this system is a highly sensitive sensor which allows extremely precise monitoring of the steering wheel movements and the steering speed. Based on these data, the system calculates an individual behavioral pattern during the first few minutes of every trip. This pattern is then continuously compared with the current steering behavior and the current driving situation. This process allows the system to detect typical indicators of drowsiness and warn the driver. The system becomes active at speeds between 80 and 180 km/h, because it has been shown that, while driving for an extended period of time at these speeds, the risk of drowsiness is much greater than in a typical city drive.

• Volvo. Volvo was among the first to introduce a drowsiness detection system [53], combining two safety features: "Driver Alert Control" and "Lane Departure Warning". "Driver Alert Control" monitors the car's movements and assesses whether the vehicle is being driven in a controlled or uncontrolled way. From a technical point of view the system is straightforward and consists of: (a) a camera, which is installed between the windshield and the interior rear-view mirror and continuously measures the distance between the car and the road lane markings; (b) sensors, which register the car's movements; (c) a control unit, which stores the information and calculates whether the driver risks losing control of the vehicle. The second system, "Lane Departure Warning System", helps prevent single-vehicle road departure accidents as well as head-on collisions due to temporary distraction. This system has limitations, since it depends highly on the number and quality of visible road markings, good lighting conditions, and the absence of fog, snow or any other extreme weather conditions.

• Volkswagen. The "Driver Fatigue Detection" system, by Volkswagen, automatically analyzes the driving characteristics and – if they indicate possible fatigue – recommends that the driver take a break [91]. The system continually evaluates steering wheel movements along with other signals in the vehicle on motorways and other roads at speeds in excess of 65 km/h, and calculates a fatigue estimate. If fatigue is detected, the driver is warned by information in the multi-function display and an acoustic signal. The warning is repeated after 15 minutes if the driver has not taken a break.

2.3.2 Independent Products

In addition to driver alert technologies developed by auto manufacturers, similar systems are available to the owners of older vehicles through the aftermarket. Some aftermarket driver alert systems include:

• EyeTracker: This system was created by the Fraunhofer Institute for Digital Media Technology in Germany [92]. It consists of at least two cameras because stereoscopic vision is an essential part of the detection method. It is presented as a small modular system completely independent of the car being used. The employed tracking strategy allows for determining the spatial position of the eyes and easy calculation of the direction of the driver's gaze.

• Anti Sleep Pilot: This is the only system to date that combines subjective methods with vehicle-based methods in one system [93]. It consists of a small device that is easily mountable on the car's dashboard. It requires the driver to complete a questionnaire prior to the trip, which creates the driver's personal risk profile and prior fatigue status. Additionally, it monitors driving through various built-in sensors. The alertness level is constantly monitored by an alertness-maintaining test that prompts the driver to respond to audiovisual cues from the device.

• Seeing Machines DSS: This is a product of Seeing Machines Ltd., specifically modified for use in vehicle systems. It uses a small, dashboard-mounted camera for eye behavior tracking [94].

• Takata SafeTrak: This system by the Takata corporation consists of a small video camera, which provides input to sophisticated machine vision software in order to monitor the road ahead and warn drivers if they unintentionally leave their lane or if their driving pattern starts to indicate erratic behavior [95].

• Nap Zapper: This is an inexpensive and simple device, mounted over the driver's ear (it is the size of a typical external hearing aid) [96]. At its core lies an electronic position sensor that detects when the driver's head nods forward and sounds an audio alarm. This device might prove useful in certain situations such as long-distance driving on monotonous, straight roads.

CHAPTER 3 TECHNOLOGIES, ALGORITHMS AND RESEARCH ASPECTS

This chapter presents an overview of the research aspects associated with the development of driver drowsiness detection (DDD) solutions. It summarizes relevant technologies, popular algorithms, and design challenges associated with such systems. It focuses particularly on vehicle-mounted solutions that perform noninvasive monitoring of the driver's head and face for behavioral signs of potential drowsiness, such as nodding, yawning, or blinking. Typically, systems based on this methodology use a video camera for image acquisition and rely on a combination of computer vision and machine learning techniques to detect events of interest, measure them, and make a decision on whether the driver may be drowsy or not. If the sequence of captured images and measured parameters (e.g., pattern of nodding or time lapsed in "closed eye state") suggest that the driver is drowsy, an action – such as sounding an alarm – might be warranted.

DDD systems based on visual input are specialized computer vision solutions, which capture successive video frames, process each frame, and make decisions based on the analysis of the processed information. After capturing each frame using an imaging sensor (Section 3.1), one or more feature detection and extraction algorithms (Section 3.2) are applied to the pixel data. Their goal is to detect the presence and location of critical portions of the image (e.g., head, face, and eyes), measure their properties, and encode the results into numerical representations, which can then be used as input by a machine learning classifier (Section 3.3) that makes decisions such as "drowsy or not-drowsy" based on the analyzed data.

In the remainder of the chapter we discuss selected imaging sensors, feature extraction algorithms, machine learning classifiers, and conclude by looking at challenges and constraints associated with this field of research and development.
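The capture-extract-classify loop just described can be summarized in a short schematic sketch (Python, using OpenCV only for frame capture). The detector, feature extractor and classifier passed in are placeholders standing in for the components discussed in Sections 3.1 to 3.3; this is an outline of the processing loop, not the dissertation's actual implementation.

    import cv2

    def monitoring_loop(detect_face, extract_eye_features, classify_eye_state):
        cap = cv2.VideoCapture(0)  # imaging sensor (Section 3.1)
        try:
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                face = detect_face(frame)  # feature detection and extraction (Section 3.2)
                if face is None:
                    continue  # e.g., the driver's face was not found in this frame
                features = extract_eye_features(frame, face)
                state = classify_eye_state(features)  # machine learning classifier (Section 3.3)
                if state == "closed":
                    pass  # accumulate evidence over time; alert if drowsiness is confirmed
        finally:
            cap.release()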

3.1 IMAGING SENSORS

The imaging sensors used in most DDD systems fall into one of these two categories, depending on the range of the electromagnetic spectrum in which they operate: (i) visible light ("conventional") cameras; or (ii) near infrared (NIR) cameras. The former can provide excellent resolution at relatively low cost, but depend on appropriate lighting conditions to operate satisfactorily. The latter can be used – often in addition to conventional cameras – to handle nighttime and other poor lighting situations.

3.1.1 Visible light cameras

The two most popular technologies for imaging sensors used in visible light cameras are CCD (Charge-Coupled Device) and CMOS (Complementary Metal-Oxide Semiconductor). Both sensor types perform essentially the same function, namely the conversion of light into electrical signals which can be further encoded, stored, processed, transmitted, and analyzed by specialized algorithms.

In a CCD sensor, every pixel's charge is transferred through a very limited number of output nodes (often just one) to be converted to voltage, buffered, and sent off-chip as an analog signal. An analog-to-digital converter turns each pixel's value into a digital value. CCDs use a special manufacturing process to create the ability to transport charge across the chip without distortion. This process leads to very high-quality sensors in terms of fidelity and light sensitivity. In a CMOS sensor, each pixel has its own charge-to-voltage conversion, and the sensor often also includes amplifiers, noise-correction, and digitization circuits, so that the chip outputs digital bits. These other functions increase the design complexity and reduce the area available for light capture. With each pixel doing its own conversion, uniformity is lower, but the readout is also massively parallel, allowing high total bandwidth for high speed.

Both of these technologies have their advantages and disadvantages. The CCD technology is considered to have matured over time, but CMOS is slowly catching up. CCD sensors create high-quality, low-noise and high-resolution images. CMOS sensors are usually more susceptible to noise and their light sensitivity is lower than that of CCDs. In terms of power consumption, CMOS technology requires substantially less power. CMOS is also cheaper to fabricate because CMOS chips can be made on just about any standard silicon product line, which makes them fairly inexpensive.

Driver drowsiness detection systems need cameras that can produce high-quality images with high resolution and low noise levels, which is so far the domain of CCD cameras; on the other hand, the chosen imaging sensor should be cheap and battery efficient, which tilts the scale toward CMOS. The final decision will be a tradeoff among the pros and cons of each candidate technology.

3.1.2 Near infrared (NIR) cameras

One of the most common limitations of computer vision systems, in general, is their inability to perform consistently well across a wide range of operating conditions, e.g., when the lighting conditions are significantly different than the ones for which the system was designed and tested. In the case of vehicle-mounted solutions that rely on visual input, the ability to tolerate large variations in light intensity (from bright sunlight to nighttime driving on unlit roads) presents a formidable challenge. The solution for ensuring operability in low lighting conditions usually includes using a NIR camera as a sensor.

The term "near infrared" refers to a small portion of the much larger region called infrared (IR), located between the visible and microwave portions of the electromagnetic spectrum.

NIR makes up the part of IR closest in wavelength to visible light and occupies the wavelengths between about 700 nanometers and 1500 nanometers (0.7 µm – 1.5 µm). NIR is not to be confused with thermal infrared, which is on the opposite end of the infrared spectrum (wavelengths in the 8 µm – 15 µm range) and measures radiant (emitted) heat. NIR cameras are available with either CCD or CMOS sensors, and they can provide monochrome or color images at their output.

DDD systems that use NIR cameras usually employ an additional source of NIR radiation, such as LEDs designed to emit radiation in the NIR frequency band, to illuminate the object of interest and, in that way, amplify the input values for the camera. It is common to have the LEDs focus on the driver's eyes, because of the pupils' noticeable ability to reflect infrared, which leads to a quick and effective way to locate the position of the eyes [59, 70, 97, 75].

3.2 FEATURE DETECTION AND EXTRACTION

Generally speaking, any system designed to monitor the state of an object of interest over time must be capable of detecting that object, determining what state the object is in, and tracking its movements. In computer vision systems, the detection and tracking steps are based on extracting useful features from the pixel data and building models that represent the object of interest. In the specific case of DDD systems whose goal is to determine the drowsiness state of a driver by observing the driver’s facial features, the focus is on the head, face, and eyes. Some of the most widely used facial features and corresponding feature extraction methods are described next. Light Intensity Differentiation. The grayscale representation of a face can be described as a collection of dark and bright areas, where the region of the eyes is usually much darker than the region of the nose and cheeks, and the eyes are darker than the bridge of the nose. If we take each of these regions and describe them as simple rectangles with bright or dark values inside, we are describing them in terms

of Haar-like features [98]. Put simply, the eyes and cheeks can be represented as two vertically adjacent rectangles, with the darker one on top of the brighter one; alternatively, three horizontally adjacent rectangles can be used, two dark ones for the eyes and one bright one for the bridge of the nose in the middle. The face can thus be represented by different combinations of dark and light rectangles. Algorithms using Haar-like features scan the image looking for the best matches of such rectangular patterns. The most famous face detection algorithm in this category is the Viola-Jones algorithm [98], which exploits the observation that a face (or any object, for that matter) can be represented as a collection of dark and bright regions, as explained above. Haar-like features represent relationships between two different adjacent regions. Since there are many different regions on a human face and every region has a different relationship with each of its neighbors, the number of possible combinations is potentially huge. For a detection system to be efficient and fast, it needs to rely on a subset of combinations, focused on relationships between adjacent rectangular areas, that is still large enough to describe the object of interest. To determine which Haar-like features are significant and necessary to describe the object, a machine learning algorithm called Adaptive Boosting (or simply, AdaBoost) is used. The AdaBoost algorithm eliminates all small, irrelevant features and keeps only the necessary set. It has been shown that as few as two Haar-like features are enough to describe the human face. The first feature measures the difference in intensity between the region of the eyes and a region across the upper cheeks, reflecting the fact that the eye region is often darker than that of the upper cheeks. The second feature compares the intensities in the eye regions to the intensity across the bridge of the nose; again, the eye regions are often darker than the nose region. These two features were shown to be the most consistent across a large number of images. So, if an area of interest is found to contain regions with such relationships, there is a good possibility

that the area of interest contains a human face (Figure 3.1).

Figure 3.1: Typical face detection steps based on Haar-like features: (a) Haar-like feature; (b) applied on the candidate; (c) face detected.
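To make the preceding description concrete, the following is a minimal sketch of cascade-based face and eye detection using OpenCV's bundled, pretrained Haar cascades. It illustrates the general Viola-Jones approach rather than any particular implementation from the cited works, and the input file name is a placeholder.

```python
import cv2

# Pretrained Haar cascades shipped with OpenCV (generic, not driver-specific models).
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

frame = cv2.imread("driver.jpg")                      # hypothetical input frame
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Scan the image at multiple scales for rectangular Haar-like feature patterns.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    roi = gray[y:y + h, x:x + w]                      # restrict the eye search to the face
    eyes = eye_cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(frame, (x + ex, y + ey),
                      (x + ex + ew, y + ey + eh), (0, 255, 0), 2)

cv2.imwrite("detected.jpg", frame)
```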

In the driver drowsiness detection literature, many solutions have adopted this feature extraction algorithm in their systems [69, 73, 75, 67, 82]. Skin Color. Another popular method used for face detection purposes is the detection of regions of the image whose color properties are within the range of color tones associated with human skin [64]. Some researchers in the field of driver drowsiness detection systems have exploited the knowledge that the Red and Green components of skin color follow a planar Gaussian distribution, which can be used as a search parameter for finding a face [74], while others use the YCbCr color model to focus more easily on the color of the face while eliminating the intensity component [79]. Texture. Texture gives us information about the spatial arrangement of colors or intensities in an image or a selected region of an image. A human face can be seen as a composition of micro texture patterns. Eye regions usually contain more fine-grained texture than, for example, the cheeks. Every part of a human face can be described by its unique texture qualities. The most commonly used way of quantifying texture in this domain is by expressing it in terms of Local Binary Pattern (LBP) features. A face can be divided into subsections, each with its own unique texture quality, and algorithms can search for similar spatial relationships within the image to locate the face or eyes [99, 64]. Local Binary Pattern (LBP) is a simple yet very efficient texture operator which labels the pixels of an image by thresholding the neighborhood of each pixel and considering the result as a binary number. Perhaps the most important property of the LBP operator in real-world applications is its robustness to monotonic grayscale changes caused, for example, by illumination variations. Another important property is its computational simplicity. The LBP operator is a powerful means for quantitative texture description of an object of interest. It is based on the observation that two-dimensional surface textures can be described by two complementary measures: local spatial patterns and grayscale contrast. The basic idea is to form labels for the image pixels by thresholding the 3×3 neighborhood of each pixel with the center value and considering the result as a binary number: take a pixel as the center and threshold its neighbors against it; if the intensity of a neighbor is greater than or equal to that of the center pixel, denote it with 1, and with 0 otherwise. We end up with a binary number for each pixel, for example 11001111. With 8 surrounding pixels there are 2^8 (i.e., 256) possible combinations, which are called LBP codes. Usually the area of interest is divided into groups of nonoverlapping 3×3 (pixel) neighborhoods. LBP codes have the ability to capture very fine-grained details in an area of interest and produced results that could compete with state-of-the-art benchmarks for texture classification when the method was first applied to face recognition [99]. Soon after the operator was published, it was noted that a fixed neighborhood fails to encode details differing in scale; as a result, the operator was extended to use a variable neighborhood. A face can be described as a composition of micro-patterns. Every micro-pattern

has its own unique combination of textures which can be well described by LBP. Therefore, a face can be divided into subsections from which the LBP extraction can be performed (Figure 3.2). LBP histograms can then serve as input vectors to a classifier; a minimal sketch of computing basic LBP codes and their histogram is given after Figure 3.2.

Figure 3.2: Local Binary Pattern example: (a) raw image; (b) LBP histogram; (c) LBP image.
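The following is a minimal sketch of the basic 3×3 LBP operator and its histogram, written in plain NumPy for illustration. It follows the standard neighbor-versus-center thresholding described above and is not tied to any particular cited implementation.

```python
import numpy as np

def lbp_codes(gray):
    """Basic 3x3 LBP: threshold each pixel's 8 neighbors against the center pixel."""
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]                              # center pixels
    # The 8 neighbors in a fixed clockwise order, each contributing one bit.
    neighbors = [g[:-2, :-2], g[:-2, 1:-1], g[:-2, 2:],
                 g[1:-1, 2:], g[2:, 2:],    g[2:, 1:-1],
                 g[2:, :-2],  g[1:-1, :-2]]
    codes = np.zeros_like(c)
    for bit, n in enumerate(neighbors):
        codes |= (n >= c).astype(np.int32) << bit
    return codes                                   # values in 0..255 (the 256 LBP codes)

def lbp_histogram(gray, bins=256):
    """Normalized LBP histogram, usable as a feature vector for a classifier."""
    codes = lbp_codes(gray)
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / max(hist.sum(), 1)
```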

Eigenfaces. If we take any image of a human face and distort it so that the image becomes highly noisy, the distorted image will not look completely random; in spite of the differences between any two distorted images of any face, there are some patterns which occur in both. Such patterns could be the presence of certain objects (eyes, nose, mouth) as well as the relative distances between these objects. These characteristic features are called eigenfaces in the facial recognition domain (Figure 3.3).

Figure 3.3: Examples of eigenfaces.

In general, we can treat any object, from a topological point of view, as a sum of valleys and peaks and their relationships. If we take a face, for example, the eyes can be considered valleys compared to the forehead, nose and cheeks. A commonly used way of filtering images to emphasize their topological structure is the 2D Gabor function, which can enhance edge contours as well as valley and ridge contours of the image. A Gabor filter is a linear filter used for edge detection. The frequency and orientation representations of Gabor filters are similar to those of the human

visual system, and they have been found to be particularly appropriate for texture representation and discrimination. In the spatial domain, a 2D Gabor filter is a Gaussian kernel function modulated by a sinusoidal plane wave. Gabor filters are self-similar, which means that all filters can be generated from one mother wavelet by dilation and rotation. Infrared (IR) Sensitivity. Among the features that are unique to the eyes is the fact that the eye’s pupil reflects almost all of the incoming IR radiation, while the rest of the human body absorbs it. This phenomenon can be easily detected and exploited for face/eye detection purposes in DDD systems equipped with the appropriate sensors [59, 70, 62]. In such cases, the reflection of IR light from the pupil produces a nicely shaped circle, which can be detected using the Circular Hough Transform [64]. Horizontal and Vertical Projection. The summation of grayscale pixel values in every column/row of an image is called the vertical/horizontal projection. Comparing the summed values to each other can reveal local minima and maxima in an image (Figure 3.4). Every object usually contains a specific set of local minima and maxima

Figure 3.4: Example of horizontal projection.

that can be used as characteristics to describe that object, a property that has been exploited in various research studies in the literature [70, 81] (a minimal sketch of computing such projections is given at the end of this section). Scale-invariant feature transform (SIFT). SIFT is a computer vision algorithm to detect and describe local features in images. The SIFT feature descriptor is invariant to uniform scaling and orientation, and partially invariant to affine distortion and illumination changes. Once it has been computed, an image is transformed into a large collection of feature vectors, each of which is invariant to image translation, scaling, and rotation, partially invariant to illumination changes, and robust to local geometric distortion. These features share similar properties with neurons in the inferior temporal cortex that are used for object recognition in primate vision. Key locations are defined as maxima and minima of the result of a difference-of-Gaussians (DoG) function applied in scale space to a series of smoothed and re-sampled images. Low-contrast candidate points and edge response points along an edge are discarded.

Dominant orientations are assigned to localized key points. These steps ensure that the key points are more stable for matching and recognition. SIFT descriptors, robust to local affine distortion, are then obtained by considering pixels around a radius of the key location, blurring, and re-sampling local image orientation planes. Every state of the object will produce a unique set of SIFT features, which can be used to distinguish between states [64].
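As noted earlier in this section, horizontal and vertical projections are simple to compute; the following is a minimal, generic NumPy sketch. The idea of using the minimum of the horizontal projection as an eye-row candidate is only an illustrative example, not a prescribed method from the cited works.

```python
import numpy as np

def projections(gray):
    """Horizontal and vertical grayscale projections of an image region.

    horizontal[i] is the sum of row i; vertical[j] is the sum of column j.
    Local minima of the horizontal projection of a face region, for example,
    tend to line up with the (darker) eye row.
    """
    gray = gray.astype(np.float64)
    horizontal = gray.sum(axis=1)
    vertical = gray.sum(axis=0)
    return horizontal, vertical

# Hypothetical use: the row with the smallest summed intensity inside a detected
# face box is a rough candidate for the vertical position of the eyes.
# eye_row = int(np.argmin(projections(face_gray)[0]))
```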

3.3 MACHINE LEARNING CLASSIFIERS

Once the contents of the significant portions of an image have been detected, extracted, and encoded, the resulting representation is used as input for machine learning techniques capable of distinguishing among two or more classes. In the context of DDD systems, the problem fundamentally boils down to a two-class classification problem, namely to tell whether the driver may be drowsy or not. The most widely used classifiers in DDD systems are neural networks, AdaBoost, and support vector machines (SVM). Neural Networks. A neural network is a mathematical model inspired by biological neural networks. It consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation. In most cases a neural network is an adaptive system, changing its structure during a learning phase. Neural networks are used for modeling complex relationships between inputs and outputs or to find patterns in data. This machine learning method is widely used in the object detection world because it provides: (a) generalization (small distortions can be handled easily); (b) expandability (learning a different set of objects will require hardly any change to the structure of the program); (c) the ability to represent multiple samples (a class of objects can easily be represented by multiple samples under multiple conditions); and (d) efficiency (once trained, the network determines in one single step to what class the object belongs). The downside of

this method is that it requires a large and diverse set of training examples, as well as substantial processing and storage resources. Small recognition systems, though, should benefit from all the advantages of using a neural network as a classifier [78]. Adaptive Boosting. AdaBoost is an adaptive machine learning algorithm, in the sense that subsequently built classifiers are tweaked in favor of those instances misclassified by previous classifiers. AdaBoost generates and calls a new weak classifier in each of a series of rounds. For each call, a distribution of weights is updated to indicate the importance of the examples in the dataset for the classification. On each round, the weights of incorrectly classified examples are increased and the weights of correctly classified examples are decreased, so the new classifier focuses on the examples which have so far eluded correct classification. Even though AdaBoost can be sensitive to noisy data, its efficiency is what has drawn many researchers towards using it [69, 77]. SVM. Support Vector Machines are based on the concept of decision planes that define decision boundaries (Figure 3.5). A decision plane is one that separates a set of objects having different class memberships. The basic SVM takes a set of input data and predicts, for each given input, which of two possible classes forms the output, making it a non-probabilistic binary linear classifier. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on. If the sets of objects can be separated into their respective groups by a line, the SVM is linear. Most classification tasks, however, are not that simple, and often more complex structures are needed in order to make an optimal separation. Using different sets of mathematical functions, called

kernels, an SVM can rearrange the input data so that the gap between different classes is as wide as possible and a separation line can clearly be drawn between the classes. SVM is a very powerful and easy-to-understand tool, which explains its popularity [65, 82].

Figure 3.5: Concept of a Support Vector Machine. Source: Wikimedia Commons.
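The following toy sketch shows the two-class setup described above with both an AdaBoost and an SVM classifier, using scikit-learn. The synthetic feature vectors and labels are placeholders standing in for whatever eye or face features a DDD system would actually extract.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_alert = rng.normal(loc=0.0, scale=1.0, size=(100, 2))    # e.g. "not drowsy" features
X_drowsy = rng.normal(loc=3.0, scale=1.0, size=(100, 2))   # e.g. "drowsy" features
X = np.vstack([X_alert, X_drowsy])
y = np.array([0] * 100 + [1] * 100)

ada = AdaBoostClassifier(n_estimators=50).fit(X, y)  # sequence of re-weighted weak classifiers
svm = SVC(kernel="rbf").fit(X, y)                    # maximum-margin decision surface

print(ada.predict([[0.2, -0.1]]), svm.predict([[3.1, 2.8]]))
```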

3.4 CHALLENGES AND PRACTICAL ASPECTS

DDD systems are complex pieces of engineering, which should perform reliably under a broad range of practical scenarios. In this section we summarize some of the challenges and practical issues involved in the design and development of successful DDD systems.

3.4.1 Data collection

Due to the difficulty in collecting proper (electro-physiological, behavioral, etc.) driver drowsiness data in a real-world environment, researchers have resorted to safe and controlled simulated environments to carry out their experiments. The main advantages of using simulators include: experimental control, efficiency, low cost, safety, and ease of data collection [100, 101]. Driving simulators are increasingly used for training drivers all over the world. Research has shown that driving simulators are excellent practical and effective educational tools for imparting safe driving techniques to all drivers. There are various types of driving simulators in use today, e.g., train simulators, bus simulators, car simulators, truck simulators, etc. The most complex, such as the National Advanced Driving Simulator (Figure 3.6), have a full-sized vehicle body, with six-axis movement and 360-degree visual displays.

Figure 3.6: National Advanced Driving Simulator.

On the other end of the range, there are simple desktop simulators such as the York Driving Simulator, which are often implemented using a computer monitor for the visual display and video game-type steering wheel and pedal input devices. These low-cost simulators are used readily in the evaluation of basic and clinically oriented scientific questions [102, 103, 104, 105]. Some research teams use automated vehicles to recreate simulator studies on a test track, enabling a more direct comparison between the simulator study and the real world [106]. One important limitation of driving simulators is that drivers do not perceive any risk: the awareness of being immersed in a simulated environment might lead to behavior different from that on an actual road [107]. However, the consensus among researchers is that driving simulators can create

driving environments that are sufficiently similar to those of road experiments.

3.4.2 Performance requirements

A successful DDD system must be fast, robust, reliable, nonintrusive, and proactive. The ultimate goal of such systems is to work in the real world, which means that the system should reach conclusive decisions about the state of the driver (and the need to issue an alert) in a timely manner. Failure to respond within a reasonable amount of time (a few seconds) may turn out to be catastrophic and defeat the original purpose of having these systems in the first place. Real-time performance is desired, but it might come at the cost of increased computing power, which would also increase battery consumption and the cost of the overall solution accordingly. A DDD system has to be robust and should perform under various conditions, including severe weather, large variations in overall lighting, bumpy roads (and their impact on the quality of the acquired video frames), and noise, to mention but a few. Once again, there will be a point beyond which a system will ultimately stop working properly. The challenge is to push this point as far away as possible from normal operating conditions, without sacrificing any other major aspect of the solution. A DDD system must be reliable, since ultimately there are human lives at stake. It belongs to the category of systems for which the cost of a false positive is significantly lower than the cost of a false negative. In other words, it is better to (occasionally) issue an alert when none was necessary (a false positive) than to miss a truly serious situation that might lead to an accident (a false negative). A DDD system should be nonintrusive. Its setup and components should be used in a way that does not disturb normal driving. The driver’s undivided attention has to be on the road and the situation ahead of the car. The driver should not be distracted by audio or visual stimuli coming from the system. Moreover, the hardware portion of the system should be small and discreet and properly placed, so

as not to occlude part of the driver’s view. Physical interactions between the system and the driver should also be kept to a minimum, basically providing a quick setup and calibration in the initialization phase, a friendly way to dismiss false alarms (if any), and very little else. Finally, a DDD system has to be proactive. It has to be capable of attracting the driver’s attention when necessary. Usually, a DDD system will use audio/visual cues to communicate warning/alert messages to the driver, reporting suspicion of drowsy behavior and trying to prevent a potentially dangerous situation. Care must be taken to avoid causing sudden erratic behavior or startling the driver with a loud alarm, for example. Moreover, fallback provisions (e.g., applying the vehicle’s brakes) might be implemented if it becomes clear that the driver is not responding to the system’s warnings.

CHAPTER 4

RELATED WORK

There have been many recent efforts in the field of driver drowsiness detection using behavioral methods. As explained in Chapter 2, such methods are usually focused on monitoring the driver’s facial expressions and recognizing patterns that might describe the driver’s internal state. Visual clues that can be monitored for determining the driver’s sleep deprivation level include: sudden variations in head pose (presumably associated with nodding), frequent yawning, rapid blinking, or prolonged eye closure. A driver’s facial expressions and head pose can be monitored using camera sensors and an array of image processing and computer vision algorithms, e.g., face detection, head pose estimation, and feature tracking. In this chapter, we highlight prominent examples of recent work, classified into three categories: head pose estimation, yawning, and eye state estimation.

4.1 HEAD POSE ESTIMATION

A driver’s head pose can provide significant amounts of information regarding the driver’s internal state, level of attention, and potential sleep deprivation. The work in [55] proposes that monitoring the driver’s head pose and orientation can give enough clues to predict the driver’s intent. In order to determine the driver’s head pose, the driver’s face has to be detected first. This is a typical primary step in any behavioral methodology based on monitoring a subject’s face. The Viola-Jones face detection algorithm [98] has become a reference upon which other face detection methods can be built. An attempt at improving this algorithm is made

by training three classifiers (focused on horizontal rotation of the driver’s head) in order to successfully detect front-facing, left-facing, and right-facing faces [108]. The head’s pose and orientation are determined by analyzing isophote features of the driver’s head. The isophote properties can capture important information about the face, such as direction, orientation, and curvature. Two histograms are obtained from the isophote features: a histogram of direction and a histogram of curvature. The histograms’ bin counts serve as the input data to a K-Nearest Neighbor (KNN) classifier [109] that determines where the driver’s head is facing, relative to the camera. The first system performing in real time (i.e., at 30 frames per second) [54] was a three-part system that can detect the driver’s head, provide an initial head pose estimate, and continuously track the head’s position and orientation. Similarly to the previously described work, face detection is the first step of the system’s architecture. Three AdaBoost cascades are created to encompass left-facing, front-facing, and right-facing faces in relation to the recording camera, since these are the most commonly occurring poses of a driver’s head. Once the face is successfully detected, facial region features are expressed as a localized gradient orientation (LGO) histogram [110], which has been shown to be invariant to image scale and rotation as well as robust to variations in illumination and noise. The LGO histogram is used as input to a Support Vector Regression classifier [111], which provides the driver’s current head pose. One of the most interesting aspects of this work is that it leans on the use of augmented reality, by using a virtual environment that simulates the view space of the real camera [112].

4.2 YAWNING

The capacity to estimate whether a driver is yawning and to infer, based on the frequency of yawning, whether he or she may be too drowsy to drive constitutes a

challenging research problem. The ability to detect a yawning state from input captured by a camera requires detecting features and learning states based on the relative position and state of the mouth and eyes. In recent years, several researchers have been working on the topic; two of those efforts are briefly summarized below. The work in [57] proposed a fully automatic system, capable of multiple feature detections (among which is yawning) and of combating partial occlusions (e.g., of an eye or the mouth). In order to detect the driver’s head, the authors resorted to a two-part approach that first detects the lips using specific lip color predicates in combination with mouth corner detection, followed by eye detection based on locating a pair of non-skin-color regions above the previously detected lip region. Once those three anchors have been established, the head position and dimensions can be calculated. In order to determine the driver’s attention, eye gaze is used as the main cue. It is calculated geometrically by using two parallel eye gaze rays and their relationship to the dashboard. To strengthen the attention level obtained from reconstructing the gaze direction of a driver, additional aspects are tracked, e.g., yawning, which is determined by monitoring lip corner direction changes as well as lip corner color changes. The proposed approach consists of eight main steps, as follows [57]: (1) automatically initialize lips and eyes using color predicates and connected components; (2) track lip corners using the dark line between the lips and the color predicate, even through large mouth movements like yawning; (3) track eyes using affine motion and color predicates; (4) construct a bounding box of the head; (5) determine rotation using distances between eye and lip feature points and the sides of the face; (6) determine eye blinking and eye closing using the number and intensity of pixels in the eye region; (7) reconstruct the 3-D gaze; and (8) determine the driver’s visual attention level using all acquired information. More recently, [56] focuses more specifically on yawning analysis. In order to detect the mouth, a Haar-like-feature-based classifier is trained. Once the mouth

is successfully detected, its state is determined by using a Support Vector Machine classifier [113] which is trained to distinguish between two different sets of data: the open mouth state (yawning) and the closed mouth state. The authors trained the SVM with about 20 yawning images and more than 1000 regular images, along with a few videos from which 10 yawning images and 10 regular images were added to the training set. The videos are then used as testing sequences. A correct detection rate of over 80% is achieved.

4.3 EYE STATE ESTIMATION

The most commonly used visual clue for determining a driver’s drowsiness level is the state of the eyes. Several recent approaches to this problem are summarized next. The authors of [75] decided to explore the advantages of an infrared camera in order to monitor the driver’s eyes. Some of the main advantages of infrared sensors are the ability to operate during nighttime and their lower sensitivity to light changes. The Viola-Jones algorithm is used once again to initially detect the driver’s face and eyes. If the eyes are successfully located, the eye regions are extracted for further processing. The algorithm can detect eye corners in the given eye regions and – with given geometrical and statistical restrictions – the movement of the eyelids can be detected and used as a measure of eye closure. The frequency of blinking is used as a measurement of how fatigued the driver is. The algorithm can track the eyes at higher speeds after the initial eye detection and is gender independent as well as resilient to low levels of lighting. The work in [82] also determines the driver’s drowsiness by monitoring the state of the driver’s eyes. The detection process is divided into three stages: (i) initial face detection based on classification of Haar-like features using the AdaBoost method; (ii) eye detection by SVM classification of eye candidates acquired by applying a radial-

symmetry transformation to the detected face from the previous stage; and (iii) extraction of local binary pattern (LBP) features from the left-eye candidate, which can be used to train an SVM classifier to distinguish between the open and closed eye states. The LBP of an eye can be thought of as a simplified texture representation of it. Both closed- and open-eye LBPs are distinguishable by two-state classifiers such as the SVM. The work in [79] proposes that combining several basic image processing algorithms can increase the performance of an eye state detection system and bring it closer to real-time performance. Moreover, two different methodologies are proposed for daytime and nighttime detection. In order to detect the driver’s face during daytime, a combination of skin color matching with vertical projection is performed. This color matching does not work at night; therefore, vertical projection is performed on the cropped face region acquired after adjusting the image intensity to obtain a uniform background. For locating the eyes, horizontal projection is applied to the previously detected face. The eye region is then converted to a binary image in order to emphasize the edges of the eyes. The converted image is processed through a complexity function, which provides an estimate of its complexity level: an image containing open eyes has more complex contours than another image containing closed eyes. The difference in complexity level can be used to distinguish between those two states. In a follow-up work [73], a few modifications to the original methodology are introduced. To increase the success rate of face detection, an optimized Haar-like feature approach is used, due to its better detection rate and ability to reduce the number of false positives. An AdaBoost classifier is fed with both the results of applying a Canny edge detector to the image and the original image, resulting in increased face detection performance. An additional change that consequently provides better results in the eye detection stage is that the detected face image is modified to eliminate

the unwanted area around the head that can skew the results of the horizontal projection. Additionally, the smoothing part of the horizontal projection curve is eliminated, thereby increasing the speed of the algorithm. A complexity function is adopted to compensate for environmental changes. Combined, all the optimization changes have increased the speed of the overall system to approach real-time performance. Recent work [68] addresses the eye state detection problem by extracting discriminative features of the eye with unique intensity spatial correlation, such as the color correlogram [114], and using a reliable machine learning classification method (e.g., AdaBoost) to distinguish between open and closed eye states. Several papers recognize the complexity of building the whole driver drowsiness detection system and instead focus only on significant modules of the system. One example of such an effort is a recent paper [66], which strictly focuses on eye state detection and identification while assuming that other modules, such as face and eye detection, are given. The authors propose building an eye model based on the Embedded Hidden Markov Model (EHMM) using color frequency features of images containing closed and open eyes, extracted by applying the two-dimensional Discrete Cosine Transform (2-D DCT) to them. The low-frequency coefficients are used to generate EHMM observation vectors that are then used to train the EHMM model of an eye. The approach in [65] follows the most commonly used pipeline of face and eye detection, feature extraction, and eye state analysis based on different feature sets for different eye states. In this paper the authors propose the computation of least-correlated local binary patterns (LBP), which are used to create highly discriminative image features that can be used for robust eye state recognition. An additional novelty of the proposed method is the use of independent component analysis (ICA) in order to derive statistically independent and low-dimensional feature vectors. The feature vectors are used as classification data for an SVM classifier, which provides information about the current eye state. This information is used to determine the frequency

of blinking, which is then used to determine the state of the driver. Recent work in [64] proposes a sophisticated system that monitors and measures the frequency of blinking as well as the duration of eye closure. In order to detect the eyes, a skin color segmentation algorithm combined with a facial feature segmentation algorithm is used. For skin color segmentation, a neural network is trained using an RGB skin color histogram. Different facial regions, such as the eyes, cheeks, nose, mouth, etc., can be considered different texture regions, which can be used to subsegment the given skin model regions of interest. Each segment is additionally filtered to create SURF features that are used to estimate each class’s probability density function (PDF). The segmented eye region is filtered with a Circular Hough Transform in order to locate iris candidates, i.e., the location of the driver’s eyes. To track the eyes over an extended period of time, Kalman filtering is used.

CHAPTER 5

FIRST PROTOTYPE

This chapter describes the requirements, constraints, basic architecture, and selected algorithms associated with the first prototype of our driver drowsiness detection system. It consists of four stages (Figure 5.1):

1. System Initialization – Preparation

2. Regular Stage – Eye Tracking with Eye-State Analysis

3. Warning Stage – Nod Analysis

4. Alert Stage

Figure 5.1: The four stages of our Drowsiness Detection System.

Our approach follows a behavioral methodology by performing a non-invasive monitoring of external cues describing a driver’s level of drowsiness. We look at this complex problem from a systems engineering point of view: how to go from a proof-of-concept prototype to a stable software framework that can provide a solid basis for future research in this field.

5.1 SYSTEM INITIALIZATION – PREPARATION

The initialization stage consists of analyzing the environment and optimizing some parameters for best performance. The image capture device (in our case, an Android-based smartphone) should be positioned so as to enable an appropriate distance and a proper viewing angle between the driver and the camera. Moreover, the lighting conditions must be adequate (above a minimum threshold) and any potential occlusions of the camera view by the vehicle’s internal components should be avoided. Our system extracts user-specific features during the initialization stage. Those features include: skin color, head position, and eye features. Once these features have been extracted, the process of localization of the key components (head and eyes) takes place. We use the Viola-Jones face/eye detection algorithm [98], due to its speed and simplicity. The algorithm performs well if the user is facing the camera, but the performance deteriorates quickly as the user’s gaze moves further away from the camera. Initial tests have shown that the algorithm performs within satisfactory boundaries both in terms of quality and speed. Typical limitations are shown in Figures 5.2 and 5.3.

Figure 5.2: Eye detection algorithm: limitations due to horizontal angle change: (a) head rotating to the right – eyes still located properly; (b) head rotating to the right – eyes lost in the following frame.

Figure 5.3: Eye detection algorithm: limitations due to vertical angle change: (a) head tilting forward – eyes still located properly; (b) head tilting forward – eyes lost in the following frame.

Tests have shown that the face detection rate is high as long as the driver’s gaze does not deviate more than 30 degrees relative to the camera. Conveniently, this allows the phone to be positioned around the dashboard area of the car, which is the most commonly used place for mounting a phone. This position gives us a clear line of sight to the driver without the steering wheel occluding the view. After proper positioning of the camera phone, the main detection algorithm will locate the driver’s face and subsequently the driver’s eyes. Figure 5.4 shows a typical successful initial face and eyes detection.

Figure 5.4: Successful initial face and eyes detection.

In certain circumstances, when the phone position is off or the lighting conditions are significantly brighter or darker than normal, the system might not be able to initially locate the face and eyes of the driver. In those cases, additional help from the driver

might be requested to manually indicate the locations of the key components. The driver will be presented with a very simple user interface where he/she points to the location of his/her face by simply touching the area of the screen where the face is located. He/she might also be required to keep the eyes closed and subsequently open for a certain duration of time in order to allow the system to learn how to distinguish between those two states, a process that is explained in more detail in Section 5.1.2. Once all the required information has been acquired, the extraction of key features begins. This process takes a few seconds, during which key features are extracted from each camera frame individually. Since each frame is expected to be slightly different from the previous frame, our algorithm ensures that the built model will only include the unique qualities that were constant and consistent throughout all the used frames; all of the differences should be discarded as noise. Building this model helps us know what to specifically search for in future frames, thereby significantly increasing the speed and robustness of the system.

5.1.1 Skin Color Feature Analysis

The idea behind extracting skin color features of the driver’s face is to create a user-specific skin model that can be used as an alternative or complementary face/eye detection and tracking method. According to [115], the chroma components for human skin color using a YCbCr color model fall into the following ranges:

• Cr Range: 133 - 173

• Cb Range: 77 - 127

This range is purposefully broad and general in order to encompass as many skin varieties as possible. However, such a broad range does not compensate for illumination variation and also introduces a large number of false positives. Figure 5.5a clearly shows that – using only these ranges – even though the face is successfully located, a large number of incorrect detections (e.g., in the t-shirt area) appear in the result as well. Our system overcomes this limitation by extracting the chroma values from the (previously detected) face pixels and producing a more accurate, user-centric range of chroma values. This process is computationally inexpensive and can significantly reduce the number of false positives, as shown in Figure 5.5b.

Figure 5.5: Chroma-based skin detection comparison: (a) generalized skin color chroma range; (b) user-specific skin color chroma range.

5.1.2 Eye Model Analysis

Our system should be capable of reliably distinguishing between open and closed eyes. For that purpose, a Support Vector Machine classifier [113] was chosen. During the training stage, the classifier is fed with examples consisting of grayscale images of equal size that were cropped from frames used in the initialization stage, so as to contain only the (open or closed) eyes portion of the frame.
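A minimal sketch of this training setup is shown below, using scikit-learn's SVM rather than the library used in the prototype; the crop size, file names, and label convention are placeholders.

```python
import cv2
import numpy as np
from sklearn.svm import SVC

EYE_SIZE = (48, 24)   # (width, height) assumed for illustration

def eye_vector(path):
    """Load an eye crop, convert to grayscale, resize, and flatten to a feature vector."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    img = cv2.resize(img, EYE_SIZE)
    return img.flatten().astype(np.float32) / 255.0

open_paths = ["open_000.png", "open_001.png"]          # placeholder training files
closed_paths = ["closed_000.png", "closed_001.png"]

X = np.array([eye_vector(p) for p in open_paths + closed_paths])
y = np.array([1] * len(open_paths) + [0] * len(closed_paths))   # 1 = open, 0 = closed

eye_model = SVC(kernel="linear").fit(X, y)
state = eye_model.predict([eye_vector("current_eye.png")])[0]
print("eyes open" if state == 1 else "eyes closed")
```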

5.1.3 Head Position Analysis

Our prototype uses two main factors to determine if the driver is in a state that may require issuing a warning: (i) the duration of the driver’s “closed eyes” state; and (ii) the detection of a characteristic type of nodding while the eyes are closed. We are

interested in the type of nodding associated with dozing, which typically consists of a sudden vertical head drop followed by a slower recovery back up (see Section 5.3.1). During the initialization stage, when the system interacts briefly with the driver for the sake of initial calibration, an analysis of the relative location of the driver’s head within the frame allows the calculation of all the necessary parameters for correct functioning of the nod tracking method. Those parameters are used to create two thresholds, and the nod tracking method is based on the relative position of the driver’s head compared to those two thresholds. In a rested state, the driver keeps his/her head in a fairly stationary position, above the upper threshold. The lower threshold can be determined by exploiting physical properties of the human body: the flexibility of the human neck and the radius of motion of the human head are known, so it can be determined how far the head can physically lean forward while nodding. The lower threshold can thus be statistically determined as a value beneath which we can claim, with a high degree of certainty, that the head has nodded.

5.2 REGULAR STAGE - EYE TRACKING WITH EYE-STATE ANALYSIS

At the end of the initialization stage, our system has successfully created the skin color model, the eye state model, and the head position model for the current driver. It is now ready to start actively tracking the driver’s eyes and monitoring the driver’s drowsiness level. If the eyes are properly located in the initialization stage, tracking eye position through subsequent frames is a relatively straightforward task. In the current prototype, the tracking area will be dynamically defined based on the speed and direction of eye movement in previous frames. Basically, we apply the detection method based on the Viola-Jones algorithm to a candidate area where the eyes are expected to be found in subsequent frames. Since this corresponds to a small region of the overall

frame, tracking is significantly faster compared to initial detection. Additionally, in cases where the eyes are occluded in the subsequent frame, e.g., when the driver’s head tilts forward during nodding, we use a skin color-based localization method to confirm that the face is still where it is expected to be, even though the eyes are occluded. Once the eyes are successfully tracked in a given frame, the pixels containing only the eyes’ region are cropped, converted to grayscale, and fed to the SVM classifier, which will determine the current state of the eyes: open or closed.
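The following sketch illustrates the idea of restricting the Viola-Jones search to a dynamically predicted region; the constant-velocity prediction and fixed margin used here are simplifying assumptions, not the prototype's exact rule.

```python
import cv2

eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_eye.xml")

def track_eyes(gray, prev_box, velocity, margin=20):
    """Search for the eyes only inside an expanded box around the predicted position."""
    x, y, w, h = prev_box
    dx, dy = velocity                                  # displacement observed in prior frames
    x0 = max(x + dx - margin, 0)
    y0 = max(y + dy - margin, 0)
    roi = gray[y0:y + dy + h + margin, x0:x + dx + w + margin]
    hits = eye_cascade.detectMultiScale(roi, scaleFactor=1.1, minNeighbors=5)
    # Map any hits back to full-frame coordinates.
    return [(x0 + ex, y0 + ey, ew, eh) for (ex, ey, ew, eh) in hits]
```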

5.3 WARNING STAGE - NOD ANALYSIS

The precondition for reaching the warning stage is that either the driver keeps his or her eyes closed for a prolonged amount of time or that a specific head movement that can be classified as nodding has started. In our prototype, every time closed eyes are detected at a given point in time, timers are activated to determine the duration of that state. Based on those timers, the system tries to determine whether the driver is blinking or something else is happening. If the timer exceeds the duration of time considered safe to keep the eyes closed while driving, the system switches to the warning stage. Once in the warning stage, the eyes continue to be tracked in the same manner as in the regular stage. The biggest difference is that the eye location and eye state are monitored more closely in order to detect whether the prolonged state of closed eyes continues or the driver starts (or continues) to nod. If the behavior persists, it is time to switch to the final stage and alert the driver of the potential danger. If, on the other hand, the eyes open at any time while the system is in the warning stage, that is used as a signal to return the system to the regular tracking stage and reset all the counters.

56 5.3.1 Head Position Monitoring

A sudden, sharp drop of the driver’s head followed by a slow recovery back to the normal position might indicate that the driver is nodding and could be showing signs of sleepiness. In general, the position of a rested driver’s head is usually above the upper threshold that the system extrapolated in the initialization stage and does not change significantly over time (Figure 5.6).

Figure 5.6: Nod stages: eyes above upper threshold.

When the eyes start moving vertically down and their position crosses the upper threshold, the system considers that to be a potential beginning of the nodding sequence (Figure 5.7a). If the head continues to drop vertically, reflected in a continuous downward movement of the eyes, eventually the lower threshold will be crossed (Figure 5.7b). In the case of a true nodding occurrence, crossing the upper and lower thresholds happens fairly quickly, while recovering back up is usually a much slower process (Figures 5.7c and 5.7d). A minimal sketch of this two-threshold check is given after Figure 5.7.

Figure 5.7: Nodding detection method and its stages: (a) head tilting down – eyes beneath the upper threshold; (b) head tilting down – eyes beneath the lower threshold; (c) head recovering up – eyes above the lower threshold; (d) head recovering up – eyes above the upper threshold: nod is complete.
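As referenced above, the following is a minimal sketch of the two-threshold nod check. The state names, timing constants, and frame rate are illustrative assumptions, since the thresholds and timing are described here only qualitatively.

```python
class NodDetector:
    """Tracks the vertical eye position against calibrated upper/lower thresholds."""

    def __init__(self, upper, lower, fps=30, max_drop_seconds=1.0, min_recover_seconds=1.5):
        self.upper = upper                      # resting-position threshold (pixel row)
        self.lower = lower                      # "head fully dropped" threshold (pixel row)
        self.max_drop = int(max_drop_seconds * fps)      # a true nod drops quickly
        self.min_recover = int(min_recover_seconds * fps)  # ...and recovers slowly
        self.state, self.counter = "RESTING", 0

    def update(self, eye_y):
        """Feed the eye row of one frame; return True when a nod completes."""
        self.counter += 1
        if self.state == "RESTING" and eye_y > self.upper:       # image y grows downward
            self.state, self.counter = "DROPPING", 0
        elif self.state == "DROPPING":
            if eye_y > self.lower and self.counter <= self.max_drop:
                self.state, self.counter = "RECOVERING", 0       # fast drop observed
            elif eye_y <= self.upper:
                self.state = "RESTING"                           # head came back: no nod
        elif self.state == "RECOVERING" and eye_y <= self.upper:
            self.state = "RESTING"
            return self.counter >= self.min_recover              # slow recovery => nod
        return False
```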

5.4 ALERT STAGE

If the eyes are kept closed for exceedingly long periods of time and/or if the system detects that the driver has started nodding, it is time to alert the driver that his safety is in danger. This is accomplished in the current prototype by combining a high-pitch audio alert with a bright, colorful, visual warning on the smartphone’s screen.

5.5 PRELIMINARY EXPERIMENTS

In order to fulfill all of the challenging prerequisites, we have to test the limitations of our chosen off-the-shelf components; some of our preliminary tests were devised for that purpose. We started by setting up our work environment in MATLAB. For the purposes of testing, we simplified our dynamic solution to its core: a linear pipeline that can perform basic face/eye detection as well as the simple classification task of determining eye state. The goal of the devised set of experiments is to test this basic performance in various ways. From the very beginning we decided to use the Viola-Jones algorithm, a Haar-like-feature-based face/eye detection algorithm available in MATLAB [98]. It is known for being a stable and computationally non-intensive algorithm. A Haar-like feature can be described as a set of two or more adjacent rectangular regions organized in a specific way in a given detection window. The algorithm is computationally non-intensive because its core operation is a simple summation of the pixel intensities in given regions, followed by the calculation of differences between them. That difference is then used to categorize subsections of an image as face/eyes or not. Also, for differentiating between open and closed eyes, we decided to go with a proven two-class Support Vector Machine (SVM) classifier [113]. SVM is a non-probabilistic binary linear classifier: it takes a set of input data and predicts, for each given input, which of two possible classes forms the output. This fits our purposes, since our basic problem can be defined as a two-class problem: open eyes vs. closed eyes.

5.5.1 Camera rotation test

The aim of this test is to find out how movement of the camera and changes in the viewing angle relative to the driver influence the performance of the face/eye detection algorithm, as well as the consistency of its detection. The experiment was set up in the following manner: the driver is sitting inside the car, looking towards the road; the camera, approximately 50 centimeters away from the driver’s head, makes a semi-circle around the driver’s head, keeping the same distance throughout; the camera starts pointed at the driver’s left profile and ends pointing at the driver’s right profile. The driver’s head stays stationary; only the camera moves around it while keeping a constant distance from it. The video used for testing contains 310 frames of the driver’s head from various angles. Figure 5.8 shows the process and the results. We can deduce that the detection algorithm performs consistently. If we define an angle of 0 degrees to be when the camera is directly facing the driver, we can conclude that the range in which it reliably detected the driver’s eyes falls approximately within a 25-degree viewing angle in both directions from the driver. This result is very encouraging

since it means that we can conveniently position the system’s camera on the car’s dashboard (which usually has an available slot for attaching devices such as phones), within the driver’s reach.

Figure 5.8: Camera rotating: angle change limitations: (a) test setup; (b) angle range results; (c) consistency results.

5.5.2 Head rotation test

In this test, the camera stays directly facing the driver while the driver rotates his head to the right, returns to the starting position, and then rotates the head to the left and back to center again. We are again looking to see what the breaking angle of our head/eye detection algorithm is and how consistent it is across frames. Figure 5.9 shows the test setup and results. The difference in detection angles is much smaller than in the previous test, only around 5 degrees. It seems that the algorithm is

consistently and reliably detecting the driver’s eyes as long as the driver is facing towards the camera and his gaze does not deviate more than 35 degrees in either direction from the front-facing position. Even though the difference in detection angles is much smaller, there is an obvious false positive produced when the driver’s head turns almost 90 degrees to the left: although one eye is visible, the other eye is completely occluded, and the background led the algorithm to a wrong conclusion.

Figure 5.9: Head rotating: angle change limitations: (a) test setup; (b) angle range results; (c) consistency results.

5.5.3 “Real-World” Test

As a pure test of the consistency of the detection algorithm, we recorded an approximately 900-frame video sequence of a driver behaving naturally behind the wheel of the car. He moves around, adjusts his seat and mirror, and turns to face the passengers. We want to see whether there are false positives, in what volume, and how reliable our basic detection algorithm really is. Figure 5.10 shows quite good consistency: there are no sudden changes except for one spike with a duration of one frame, and this spike is actually a true positive.

Figure 5.10: Real-World test results

5.5.4 Open vs. Closed Eyes test

To test the behavior of our chosen classification algorithm, we decided to go with the simple solution of using grayscale, cropped image regions containing the eyes and feeding that set of pixels as input to the SVM classification algorithm. A special video containing 571 frames of a driver sitting in the driver’s position is used; the driver moves minimally. Every frame is manually marked as containing open or closed eyes. The open and closed eye samples are roughly equal in number: 304 contain open eyes and 267 contain closed eyes. 70% of randomly selected samples were used to train the SVM model, which was then tested on the remaining 30% of the samples. Using the SVM implementation available in MATLAB, our preliminary results showed a success rate of 96%. We acknowledge that the sample size is very small and that we only had samples relating to one subject, so instead of focusing on the high accuracy rate we concluded that building a system in which the SVM eye model is user-specific is an encouraging idea. It can significantly simplify our system as well as provide a dose of robustness and reliability. Such an eye model can always be upgraded,

thus increasing its quality, through frequent use of the system. Some of the classification results are shown in Figure 5.11.

Figure 5.11: Support Vector Machine result examples: (a) closed and (b) open – correct; (c) called closed, (d) called open, and (e) called closed – incorrect.

CHAPTER 6

ANDROID IMPLEMENTATION

This chapter describes selected design and implementation aspects of our Android-based prototype for driver drowsiness detection (DDD).

6.1 INITIALIZATION STAGE

The initialization stage, as first described in Section 5.1, plays a key role in the overall performance of the complete system. It serves the purpose of properly extrapolating and preparing the necessary custom, user-specific features without which the system cannot operate. Those features are: skin color information, head position, and a sufficiently large sample of open and closed eyes of the current user, needed to build a suitable SVM eye model [113]. This is the stage where interaction with the user is the heaviest, and yet it must be kept seamless, simple and non-intrusive. The initialization steps where the user’s assistance and responses are needed have to be as user-friendly as possible. The following subsections are organized to describe the general flow of the initialization stage; the extrapolation and analysis techniques for head position, skin color and eyes; as well as how the data is recorded and organized and how the user’s interaction platform is implemented.

6.1.1 Algorithm outlook

It is safe to assume that the person installing this application will be its recurring user. The user’s data has to be saved in order to be reused at the next appropriate time. Information like the custom skin color analysis and the existing SVM eye model can certainly be reused. Even head position information can be reused, under the assumption that the position of the camera and the typical seating arrangement of a driver usually do not change significantly over time; however, head position information is easy to obtain, so it is prudent to re-acquire it at every startup of the system. All of the user’s data are saved and organized in a database that is easy to manipulate and update. We have built a database management class within Android using SQLite, which is known for being stable and simple and for allowing fast querying.
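The following sketch illustrates the profile database idea using Python's built-in sqlite3 module for brevity; the actual prototype uses Android's SQLite API, and the column set shown is an assumed, illustrative schema rather than the system's exact one.

```python
import sqlite3

conn = sqlite3.connect("ddd_profiles.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS profiles (
        id            INTEGER PRIMARY KEY AUTOINCREMENT,
        first_name    TEXT,
        last_name     TEXT,
        cr_lo         INTEGER,  -- user-specific skin chroma range (illustrative columns)
        cr_hi         INTEGER,
        cb_lo         INTEGER,
        cb_hi         INTEGER,
        eye_model     BLOB,     -- serialized SVM eye model
        head_upper_y  INTEGER,  -- nod thresholds from the last calibration
        head_lower_y  INTEGER
    )""")

# Reuse an existing profile at startup, or create a new one if none matches.
row = conn.execute(
    "SELECT id, cr_lo, cr_hi, cb_lo, cb_hi FROM profiles "
    "WHERE first_name=? AND last_name=?", ("Jane", "Doe")).fetchone()
if row is None:
    conn.execute("INSERT INTO profiles (first_name, last_name) VALUES (?, ?)",
                 ("Jane", "Doe"))
    conn.commit()
```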

Figure 6.1: Top: four stages of our Driver Drowsiness Detection System. Bottom: outline of the initialization stage.

At every startup of the system, the user chooses either to create a new profile or to reuse an existing one from the list. If an existing profile is chosen, the user will be prompted with an offer to update it or not. If a user does not want to update the profile, the system proceeds toward the next stage, namely the tracking and monitoring of head position using preexisting parameters. In case updating of a profile is needed or creation of a new profile is selected, these three major steps

are followed: (i) collection of basic user information such as first and last name; (ii) recording of open and closed eye sequences and head movement sequences; and (iii) data extrapolation and analysis from the recorded sequences. Figure 6.1 (bottom) shows a flowchart of the initialization stage. Depending on the software capabilities of the currently used Android device, it is possible that not all of the system’s capabilities will be supported. Specifically, it is highly probable that the built-in Android face detection algorithm does not contain information about the location of the eyes within the face, or that the captured face is blurred or a significant part of it is missing; thus an additional step is needed: manual confirmation, with eye region extrapolation if necessary. All of these steps are described next.

6.1.2 Record sequence

The recording sequence is a sensitive and crucial portion of the system. It is sensitive because user interaction is required and, more specifically, the user will be asked to stay still for short periods of time while keeping his eyes closed or open. The biggest challenge is how to interact seamlessly with a user while his eyes are closed. Successful recording of a sequence of images in which the user holds his eyes open and of a sequence in which the eyes are closed can greatly simplify the subsequent processes of confirmation and extrapolation of features. A third recording sequence serves the purpose of detecting the limitations of the Android face detection process by asking the driver to turn his head to the left and right, away from the camera, as well as to simulate a nod.

Open eye segment recording

At the beginning, the user is welcomed with a message describing the procedure and explaining its purpose. The message explains that two recording sequences need to be taken: one in which the user keeps his eyes open and one in which his eyes should be closed. When the user is ready to proceed, he can tap

anywhere on the screen to continue. First, the open eye sequence is recorded. The process takes 5–6 seconds, enough time to capture 300 images in which the face was successfully detected. The user is encouraged to stay still and not blink. A 5-second countdown message precedes the recording to give the user time to settle down and get ready; this message is dismissed when the user is ready to proceed with the recording. The five-second counter starts counting down immediately and, as it counts down, the display shows: 5, 4, READY, DON’T BLINK, GO!. Once the counter reaches zero, the GO! message disappears from the display and the user can see a preview of what the front-facing camera is recording, along with information about the successful acquisition of images. Once the number of recorded images reaches 300, a message of successful completion is displayed, followed by information about what comes next.

Closed eye segment recording

The system is now ready to record the sequence of images containing faces with closed eyes. The text of the messages is similar to the preparation message for the previous recording. It describes the expected recording duration of roughly 5-6 seconds and states that the eyes should be closed and the head still. The user is warned that the countdown will last 5 seconds and that, along with the displayed countdown messages, there will be sound signaling the countdown process as well. When the user is ready to proceed, a tap on the screen erases the message and the counter starts. The messages displayed are: 5, 4, READY, DON'T PEEK, GO!. They are accompanied by a sequence of 3 short beeps followed by one prolonged beep with a duration of 2 seconds, emphasizing that the countdown is almost done and recording will commence. When recording begins, the display contains the preview of what the front-facing camera is capturing and information about the number of acquired images. The recording process is accompanied by appropriate beeps whose duration and timing depend upon the number of acquired images. First, a short beep of 1 second in duration is played when 100 images are obtained. The second milestone is signaled with a 2-second beep after obtaining 200 images. When all 300 images are acquired, the longest beep of 3 seconds marks the end of this recording process.

Face detection limitation recording

The third recording segment consists of three 7-second recordings that capture: (i) the driver turning his head 90 degrees to the right; (ii) the driver turning his head 90 degrees to the left; and (iii) the driver simulating a nod. At the very beginning, the display holds an explanatory message about what is to be expected in the following recording segment. When ready to proceed, the driver taps the screen and the first recording sequence is about to begin. A message is displayed specifying the duration of the recording, which is 7 seconds, and the motion that the driver is supposed to perform. When the driver is ready to proceed, he taps the screen, the message is erased and a 5-second preparation counter starts. During these 5 seconds the following preparation information is displayed: 5, 4, READY, TURN RIGHT, GO!. When the recording sequence starts, the driver slowly rotates his head to the right and, once he cannot go further, returns back to the starting position. When the 7 seconds of recording have expired, a message is shown. The message explains that the following recording sequence will also last 7 seconds and that the driver is supposed to rotate his head to the left and back in that time. Tapping the screen starts the 5-second countdown sequence: 5, 4, READY, TURN LEFT, GO!. Recording starts and the driver, in a steady manner, rotates his head to the left and back to the center. Once the recording is complete it is time to proceed to the last recording sequence: nod simulation. A message is displayed with instructions about how the nod is supposed to be performed and stating that it is also going to be a 7-second recording sequence. The driver is supposed to drop his head vertically down until his gaze is completely off the road ahead and focused on his lap, and then recover back up. Before recording starts, the 5-second countdown sequence is introduced: 5, 4, READY, NOD, GO!. The system collects all preview frames for all three recording sequences and their corresponding face detection information for each frame. This is needed to properly analyze the sequences and extract useful information about the limits of face detection during these important behaviors. Of course, the user's help is needed to properly extrapolate the needed information. The process is described in 6.1.5.

Recording process

The Android API offers easy access to the key hardware component, the camera, via a software driver of the same name. This driver provides various features and supports very useful capabilities. The aforementioned acquisition of images consists of: obtaining raw pixel data from an image taken by the front-facing camera in which a face was detected; obtaining the position and area size of the detected face within the image; and finally, using the obtained position and area size, extracting from the raw pixel data only the bytes representing the luma and chroma components that correspond to the area where the face is. Conveniently, the Android Camera driver can help automate and speed up the process of obtaining the necessary information. Since API level 1, the Camera driver provides the raw pixel data of the current image taken and displayed on the screen to anyone who subscribes to receive it by implementing the Camera.PreviewCallback interface and the function that serves as a receiver of the data. As just described, obtaining the raw pixel data of a preview frame is a relatively easy and simple process. The challenge is to save the frames containing a face and discard the ones that do not. Starting in Android API level 14, the Camera driver has built-in face detection capabilities. In a similar fashion as with preview frames, information about the detected face is provided to anyone who subscribes to receive it. The system implements the Camera.FaceDetectionListener callback interface that provides this functionality. Potential synchronization issues might arise here. The system has to be certain that the preview frame and the detected face correspond to each other. Proper operation is achieved by manipulating a synchronization flag. We determined experimentally that the chronological order of data arrival is: detected face information followed by the preview frame. So when the information regarding a detected face arrives (with detection confidence above 50%), the flag is raised, signaling that the next incoming preview frame should be saved. The acquired information gets properly indexed and stored and the flag is lowered. The system is then ready to acquire another data pair (detected face, corresponding preview image). Once all 300 pairs are detected and stored, the recording is considered complete. The recording process for the remaining sequences, head rotation and nod simulation, differs: synchronization as described is still needed, but without the confidence condition for successful face detection. All of the frames within the given 7-second interval are saved along with any acquired face detection information.
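
A minimal sketch of this pairing logic is shown below, assuming the deprecated android.hardware.Camera API used by our system; the class name, the storePair helper and the surrounding details are illustrative, while the synchronization flag, the 50% confidence condition and the 300-pair target follow the description above.

import android.hardware.Camera;

@SuppressWarnings("deprecation") // the original system uses the android.hardware.Camera API
public class PairedRecorder implements Camera.PreviewCallback, Camera.FaceDetectionListener {
    private volatile boolean saveNextFrame = false;   // synchronization flag
    private Camera.Face pendingFace;
    private int pairsStored = 0;
    private static final int TARGET_PAIRS = 300;

    @Override
    public void onFaceDetection(Camera.Face[] faces, Camera camera) {
        // Face information arrives first; raise the flag only for confident detections.
        if (faces.length > 0 && faces[0].score > 50) {
            pendingFace = faces[0];
            saveNextFrame = true;
        }
    }

    @Override
    public void onPreviewFrame(byte[] data, Camera camera) {
        // The next preview frame after a confident detection is paired with that face.
        if (saveNextFrame && pairsStored < TARGET_PAIRS) {
            storePair(pendingFace, data);   // indexing and storage omitted
            pairsStored++;
            saveNextFrame = false;
        }
    }

    private void storePair(Camera.Face face, byte[] nv21Frame) {
        // Placeholder: persist the face rectangle and the raw NV21 bytes for later analysis.
    }
}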

Default image formatting

The default image capturing format on the Android platform is NV21 (YCrCb 4:2:0). The raw data representing the captured preview image is stored into a byte array formatted in the following way: assuming that the image has a pixel width of X and a height of Y, the first X x Y bytes in the array represent the luma (Y) component of the image, followed by X x (Y/2) bytes of interleaved chroma (Cr/Cb) pairs. The information about a detected face contains the coordinates of the face location within the given image. Coordinates fall within the bounds of (-1000, -1000), representing the top-left of the camera field of view, and (1000, 1000), representing the bottom-right of the field of view. The reason for this is to accommodate all of the various display sizes and display resolutions supported on mobile devices. The given coordinates need to be scaled to fit within the bounds where the top-left is (0, 0) and the bottom-right is (width, height) of the image. On the luma region of the face, the system performs the eye region extrapolation, while on the corresponding chroma region of the face it extrapolates the user-specific skin color information. The area of interest containing the face is extracted while the rest of the image is discarded. Freeing up the unused memory reduces memory overhead.
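
The coordinate scaling can be sketched as a simple linear mapping from the (-1000, -1000)-(1000, 1000) camera space to pixel coordinates; the class and method names are illustrative, and rotation or mirroring of the preview is deliberately ignored here.

import android.graphics.Rect;

public final class FaceCoords {
    // Map one coordinate from [-1000, 1000] to [0, size].
    private static int scale(int value, int size) {
        return (value + 1000) * size / 2000;
    }

    // Convert the face rectangle reported by the camera into pixel coordinates of a
    // preview frame that is width x height pixels.
    public static Rect toPixelRect(Rect faceRect, int width, int height) {
        return new Rect(
                scale(faceRect.left, width),
                scale(faceRect.top, height),
                scale(faceRect.right, width),
                scale(faceRect.bottom, height));
    }
}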

6.1.3 Head position localization

Head position localization is a very important process because it is used to determine the positions of the two thresholds used in the nod analysis. The necessary parameter to achieve this is the location of the driver's eyes. The position of the eyes within the face region is a well-known parameter, assuming that the face is in the same horizontal plane as the camera and that the camera captured an en face image of it, Figure 6.2. This is very often not the case due to the nature of the setup and environment. Some Android devices might support the eye detection feature. If this is not the case, the eye region has to be located manually with the user's help. All of the head and eye coordinates collected (either manually or automatically) are used to determine the average values for the face and eye regions and their most common location within the preview image. Each region can be fully described with its top-left and bottom-right coordinate values, as shown in Figure 6.2. The two coordinate pairs are stored in an array and provided to the localization algorithm. Extrapolation of the face and eye region bounds, followed by the calculation of the threshold positions, is described next.

Figure 6.2: Bounding box of a detected object is defined with two sets of coordinates. En face detection of a driver's head.

First, the process of eye region and upper threshold localization is described. Four variables are created: xeLeft, xeRight, yeUp and yeBottom. They will hold the extrapolated coordinates of the eye region. They are initialized with the two coordinate pairs of the first array entry: (xeLeft, yeUp) = (xeMin[0], yeMin[0]) and (xeRight, yeBottom) = (xeMax[0], yeMax[0]). Once initialized, the algorithm iterates through the whole array searching for the smallest values of xeMin[i] and yeMin[i] and assigning them to xeLeft and yeUp respectively. It searches for the biggest values of xeMax[i] and yeMax[i] and assigns them to xeRight and yeBottom respectively. At the end of the iteration xeLeft holds the leftmost occurring coordinate, yeUp holds the topmost occurring coordinate, while xeRight and yeBottom hold the rightmost and bottommost coordinates respectively. The acquired bottommost coordinate yeBottom is used as the upper threshold location. To determine the position of the lower threshold, information about the upper threshold and the face region is needed. The face region calculation process is fairly similar to the eye region extrapolation process. Four variables that will hold the coordinates of the face region are created: xfLeft, xfRight, yfUp and yfBottom. The two coordinate pairs of the first entry in the face array are used for initialization: (xfLeft, yfUp) = (xfMin[0], yfMin[0]) and (xfRight, yfBottom) = (xfMax[0], yfMax[0]). The algorithm then iterates through the whole remaining array in search of the smallest values of xfMin[i] and yfMin[i] and the biggest values of xfMax[i] and yfMax[i]. Once the iteration is over, the coordinate pair (xfLeft, yfUp) holds the smallest found values and (xfRight, yfBottom) holds the biggest values. The distance between the two thresholds is one third of the height of the face region.
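
The bound extrapolation loop can be sketched as follows; the class and array names are illustrative, while the min/max search and the threshold rule (upper threshold at the bottom of the eye region, lower threshold one third of the face height below it) follow the description above.

public final class RegionBounds {
    public int left, top, right, bottom;

    // Find the leftmost/topmost and rightmost/bottommost coordinates over all recorded regions.
    public static RegionBounds extrapolate(int[] xMin, int[] yMin, int[] xMax, int[] yMax) {
        RegionBounds b = new RegionBounds();
        b.left = xMin[0];
        b.top = yMin[0];
        b.right = xMax[0];
        b.bottom = yMax[0];
        for (int i = 1; i < xMin.length; i++) {
            if (xMin[i] < b.left) b.left = xMin[i];
            if (yMin[i] < b.top) b.top = yMin[i];
            if (xMax[i] > b.right) b.right = xMax[i];
            if (yMax[i] > b.bottom) b.bottom = yMax[i];
        }
        return b;
    }
}

// Illustrative usage:
// RegionBounds eye = RegionBounds.extrapolate(xeMin, yeMin, xeMax, yeMax);
// RegionBounds face = RegionBounds.extrapolate(xfMin, yfMin, xfMax, yfMax);
// int upperThreshold = eye.bottom;
// int lowerThreshold = upperThreshold + (face.bottom - face.top) / 3;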

6.1.4 Skin color extraction and analysis

Skin color across the globe varies dramatically in both color and tone. It would be virtually impossible to implement a system able to encompass all of the variations. A more practical approach is to adapt the system to the current user's specifics. In order to achieve that, the system incorporates an extrapolation process in the initialization stage. The two key parameters extrapolated from the current user are the custom red and blue chroma pixel ranges and the user-specific color distribution model. The generalized skin color chroma range values are (133 - 173) for the red chroma component and (77 - 127) for the blue chroma component. The analysis of the driver's skin color is expected to produce a custom range within the aforementioned margins, for example: a red chroma range of (136 - 169) and a blue chroma range of (81 - 123). This user-specific range allows us to eliminate many artifacts that can skew the results of the system, as explained in more detail in 5.1.1. The user-specific skin color distribution model is the histogram of the red and blue chroma values. This histogram varies from user to user and is sensitive to different lighting and other environmental conditions. That is why it is necessary to retrieve this information from the current user at the beginning of the monitoring session.

Figure 6.3: Captured face regions contain artifacts that can skew the results; their impact can be minimized by masking the most commonly affected areas. (a) Typical artifacts captured. (b) Minimizing the impact by adding a frame.

Despite best efforts, the detected face image will contain some artifacts, including captured background, Figure 6.3a. These areas are located mostly along the edges of the captured image. To eliminate their impact on the extraction process, the areas along the edges of the image are masked out and the main focus is placed on the center of the image, Figure 6.3b. The excluded areas are approximately 10% in width, starting from all four edges of the image. This leaves the extraction process an area of (80% of image width) x (80% of image height) to work with. All chroma pixel pairs within the area are analyzed and used to create the histogram representing the user-specific skin color distribution model as well as the custom chroma range.
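
A minimal sketch of this extraction is given below, assuming the face region is handed over as an interleaved Cr/Cb (NV21-style) chroma plane; the class and field names are illustrative, while the 10% border mask and the generalized ranges (Cr 133-173, Cb 77-127) follow the text above.

public final class SkinModel {
    public final int[] crHist = new int[256];
    public final int[] cbHist = new int[256];
    public int crMin = 255, crMax = 0, cbMin = 255, cbMax = 0;

    public void analyze(byte[] chroma, int width, int height) {
        int left = (width / 10) & ~1;              // keep Cr/Cb pair alignment
        int right = width - width / 10;
        int top = height / 10, bottom = height - height / 10;
        for (int row = top; row < bottom; row++) {
            for (int col = left; col + 1 < right; col += 2) {   // Cr,Cb pairs are interleaved
                int cr = chroma[row * width + col] & 0xFF;
                int cb = chroma[row * width + col + 1] & 0xFF;
                // Only pixels inside the generalized skin range contribute to the custom model.
                if (cr >= 133 && cr <= 173 && cb >= 77 && cb <= 127) {
                    crHist[cr]++;
                    cbHist[cb]++;
                    if (cr < crMin) crMin = cr;
                    if (cr > crMax) crMax = cr;
                    if (cb < cbMin) cbMin = cb;
                    if (cb > cbMax) cbMax = cb;
                }
            }
        }
    }
}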

6.1.5 Manual confirmation with eye region extrapolation

To ensure that all of the gathered information regarding the captured faces and eyes of the driver is valid, an additional step of manual control is introduced. It is also possible that the current Android device does not provide the system with the needed information regarding the driver's eyes, so that information needs to be obtained manually. Below is the description of the process that, with the driver's help, confirms, modifies or acquires the following information: face detection, eye detection, the eye status (open or closed), position and size of the face region, position and size of the eye region, and face detection limitations. The process begins with a message explaining its purpose and the following steps. The user is shown a series of detected face images with the eye region highlighted, if available. He is asked to confirm a good cropping or dismiss an unusable image. The message describes the process of placing a bounding box over the eye region, saving the changes and then proceeding to the next image. The user begins by tapping the screen, the message is erased and the first image is displayed. Displaying the images first requires adjusting the given byte arrays representing captured faces to an appropriate image format supported by Android. Android's Canvas class allows drawing of simple shapes and objects directly on the display. It is used to draw the rectangle representing the bounding box of the eye region, overlaid on the displayed image. The initial position of the rectangle is based on the existing eye region information or on the system's best estimate. At the bottom of the display is the button for confirming or changing the stated state of the eyes. Also, at the bottom right corner of the display, the index of the displayed image and the total number of images to process are shown. The user interacts with the screen and makes changes by: tapping the button to change the eye status; swiping over the screen from bottom to top to discard the current image; swiping over the screen from right to left to save the changes and proceed to the next image in line; swiping in a rectangular motion to create the bounding box; tapping and holding to drag and drop the existing bounding box; and pinching the screen to resize the bounding box. The user's input is monitored constantly and commands are interpreted on the fly. The user can draw a new rectangle over the eyes or drag and drop the provided rectangle. Every finger movement over the screen is captured, the new input data is rendered into the appropriate command and the changes are displayed on the Canvas.

The algorithm for resizing the bounding box is: (i) if the user wants to expand or shrink the bounding box in only one direction, he has to hold one finger stationary while moving the other finger in the desired direction of change; (ii) if the expansion or shrinking has to be done on both sides of the bounding box, both fingers should be moved away from one another when expanding or towards each other when contracting. If the user is satisfied with the result, he slides his finger over the screen from right to left and the updated information is stored. The top-left and bottom-right coordinates of the eye region bounding box are extracted and saved. The provided coordinates are expressed in screen terms and are converted into the appropriate relative coordinate system by offsetting from the top-left coordinate of the face. The user is also given the option to discard the currently displayed image if it is deemed unusable. This is achieved by sliding a finger over the screen in an upward motion. If that happens, all of the information about that image is deleted and the appropriate arrays are updated. The total count of available images is updated along with the indexing system. There is a high probability that two consecutively captured face regions are very similar and that the eye region locations within them are very close. Because of that, to help the user speed up the whole extrapolation and confirmation process, the bounding box from the previous image is reused and overlaid on the current image. This applies if the information about the new eye region is nonexistent and the top-left coordinates of the face regions in the two consecutive frames are very close. There is a high chance that the user will have to make only small changes, if any, to the newly displayed image in order to proceed to the next one. This increases the overall user-friendliness of the application. When all of the images are processed, a message regarding successful completion and an explanation of the following steps is shown. The following segment requests the user's help in order to determine and confirm the limitations of the face detection process. Three segments were recorded that need to be analyzed by the user: (i) head rotation to the left, (ii) head rotation to the right and (iii) nod simulation. The user helps the system determine the head rotation and tilting angle up to which the system successfully performs face detection. Each recorded sequence can be broken down into two parts in which the face detection process is successful, separated by a part in which it is unsuccessful. Successful detection is expected at the beginning of the recording, since it starts with the driver facing the camera and beginning to move his head in a certain direction. So, the first x consecutive frames starting from the beginning of the recorded sequence should all contain a successful face detection. The first frame in which the face is not detected is marked as the breaking point. A continuous segment of y frames, for which the face could not be detected by the system, starts here. This is the segment where the driver moves his head further away from the breaking point in the designated direction until he reaches the maximum point of movement and returns back towards the breaking point. The first successful face detection should happen once the driver's head crosses the breaking point moving towards the starting position, which is facing the camera. So the last w frames in the recording sequence represent the segment where the system is once again able to successfully detect the driver's face. The driver is shown the three images that were marked as breaking point images. He is asked to estimate the rotation and tilting angle of the head. The angle of the front-facing image is considered 0 degrees while the maximum turning point of the head is considered to be 90 degrees (Figure 6.4). The user is also asked to confirm the movement direction taken in the image. When satisfied with the results, the user continues to the next one by sliding over the display from right to left. The system saves the following input information: (i) the three frames marked as breaking point frames with the corresponding breaking angles and (ii) the three frames that were recorded right before the breaking point was detected. The latter frames also hold the information about their successful face detection.

Figure 6.4: Head rotation angle example. (a) Driver facing the camera: 0 degrees. (b) Head rotated to the left: 90 degrees.

When all of the images are processed, a message regarding successful completion is displayed and the system proceeds to build the user-specific eye model, extrapolate the skin color information and calculate the lower and upper thresholds.

6.1.6 Building eye model using SVM

As previously mentioned in 3.3, a Support Vector Machine (SVM) is a non-probabilistic binary linear classifier. This means that the SVM takes a set of input data and assigns each individual input to one of two possible classes based on the model used. A good model is a prerequisite for future successful classifications and training it with a solid sample size is a must. The SVM-related code is organized in the SVMManager class, which is based on the Java implementation of libSVM. The code was extensively altered to be able to operate on Android devices. The system provides the SVM manager with two groups of data: (i) an array of bytes representing the gray-scale images of the extracted eye regions; (ii) an array of complementary information regarding each eye region: width, height and eye state. Before training the SVM to produce the eye state model, the training data has to be organized and arranged in a specific way. The organized training data consists of arrays of bytes (gray-scale eye regions). To each array a single integer value is assigned that represents the eye state in that image. The eye state can be represented with a single integer by assigning the value 1 to the open eye state and -1 to the closed eye state. One additional constraint is that the training data arrays have to be equal in length. The system determines what the length of the arrays should be and adjusts them accordingly. Each eye region can be described as a two-dimensional area (width w x height h). These two parameters vary only slightly between the captured eye regions due to the relatively constant distance between the Android device and the driver. Four variables are created and initialized to: hMin = 1000; hMax = 0; wMin = 1000; wMax = 0. They will hold the minimum and maximum values of the width and the height. The algorithm iterates through the whole list of available eye regions, searches for the smallest and biggest occurrences of height and width and assigns them to the appropriate variables. Once the iteration is over, the extreme values are used to find the average dimensions. The newly determined averages are used to resize all of the eye regions. The system splits the input data into a training segment and a testing segment. Five sixths of the pairs are used for training while the remaining one sixth is organized into the testing set. The creation of the sets is randomized. When ready, the system feeds the training set into the SVM multiple times, alternating the available SVM kernels each time, thus producing multiple instances of the eye model. The testing set is then fed into the SVM multiple times along with the newly created eye models in order to determine the best eye model to be used by the system: the one with the highest hit ratio.
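
The kernel-selection loop can be sketched with the libSVM Java classes as follows. Data preparation (eye regions already resized to a common length, converted to svm_node vectors and labeled +1/-1) is assumed to be done elsewhere, and the parameter values used here are illustrative defaults rather than the tuned values of our SVMManager.

import libsvm.*;

public final class EyeModelTrainer {
    public static svm_model trainBest(svm_problem train, svm_problem test) {
        int[] kernels = {svm_parameter.LINEAR, svm_parameter.POLY,
                         svm_parameter.RBF, svm_parameter.SIGMOID};
        svm_model best = null;
        double bestHitRatio = -1;
        for (int kernel : kernels) {
            svm_parameter p = new svm_parameter();
            p.svm_type = svm_parameter.C_SVC;
            p.kernel_type = kernel;
            p.C = 1;
            p.gamma = 1.0 / train.x[0].length;   // simple default
            p.degree = 3;
            p.cache_size = 40;                    // MB, kept small for a mobile device
            p.eps = 1e-3;
            svm_model model = svm.svm_train(train, p);

            // Evaluate the model on the held-out testing set.
            int hits = 0;
            for (int i = 0; i < test.l; i++) {
                if (svm.svm_predict(model, test.x[i]) == test.y[i]) hits++;
            }
            double hitRatio = hits / (double) test.l;
            if (hitRatio > bestHitRatio) {
                bestHitRatio = hitRatio;
                best = model;
            }
        }
        return best;   // the model with the highest hit ratio is kept
    }
}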

6.2 MONITORING STAGE

6.2.1 Algorithm outlook

Overall, the system can take either a proactive or a retroactive monitoring approach. This means that it can provide an estimate of the driver's state either from the information received from the current frame and the previous state, or from a recorded sequence of frames. Both approaches are described next.

Proactive approach

The system continuously monitors the driver and provides an estimate of his state at every possible point in time. The estimate is calculated based on the information from the current preview image, the pre-extrapolated user-specific data and the state from the previous frame. This approach's performance largely depends on the device's capabilities and the system's setup, since it is a compute-intensive approach. Analysis of the currently selected image is time consuming and causes the system to skip many frames in between the ones it is capable of capturing and analyzing. Some of the steps in both approaches are the same; only their organization is different. The proactive approach algorithm outline is provided in Figure 6.5.

Figure 6.5: Top: four stages of our Driver Drowsiness Detection System. Bottom: outline of the proactive monitoring stage.

Retroactive approach

In this approach the system captures a segment of continuous frames and then analyzes the whole segment in order to provide an estimate of the driver's state. So, this approach produces one estimate per recorded segment, not per preview frame.

Figure 6.6: Top: four stages of our Driver Drowsiness Detection System. Bottom: outline of the retroactive monitoring stage.

This approach "lags behind" the current point in time and the provided estimate for the current time is based on the short-term history of the driver's behavior. In order to successfully capture continuous sequences of frames, the built-in face detection process had to be sacrificed. Because of that, each frame has to be additionally examined for information regarding the driver's face, which is performed after the recording of the sequence. For that reason the retroactive approach has a different algorithmic outline, Figure 6.6.

6.2.2 Direction and speed estimation

In order to build a system that is as fast, robust and dynamic as possible, it makes sense to use every parameter at our disposal. When face detection is not performed by Android, as is the case in the retroactive monitoring mode, an alternate detection method is used. It is described in detail in 6.2.4. This method performs face detection on a given image and its speed largely depends on the size of the image. For clarity, we will refer to this image as the search area. We have decided to limit the search area in which the face and the eyes are being tracked. For that we need to extrapolate the speed and the direction of movement of the driver's face. If we observe the image as a two-dimensional plane where the (0, 0) coordinates are at the top-left corner, the head movement through this plane can roughly be assigned to one of eight directions: top, bottom, left, right, top-left, top-right, bottom-left and bottom-right, Figure 6.7. The head may also be stationary.

Figure 6.7: Eight potential directions of head movement.

All the information needed to calculate the speed and the direction consists of two top-left coordinates of the face region's bounding boxes: one from the current frame and one from the previous frame. Assuming that the horizontal coordinate is labeled x and the vertical coordinate y, we name the two top-left coordinates (xPrev, yPrev) for the previous frame and (xCurr, yCurr) for the current frame. The distances traveled horizontally and vertically are calculated and stored into the xDiff and yDiff variables respectively. Due to the constant changes in our adaptive system it is possible that the two frames used are not two consecutively captured frames; thus, in order to calculate the speed, it is necessary to know the actual number of frames skipped between the two captured frames and incorporate that parameter into the speed calculation. Knowing the speed, we move on to determining the direction of movement. Precisely determining the angle of movement from the previous frame to the current one is not necessary. We just need a rough estimate of the most dominant direction of movement, which falls into one of the nine possible categories previously mentioned. In order to calculate the dominant direction we divide the absolute values of the horizontal and vertical distances (abs(xDiff) / abs(yDiff)). The potential results are: 0 for vertical movement; 1 for diagonal movement; 2 for horizontal movement. If 0 is received, the predominant direction is vertical or potentially stationary. The actual vertical direction depends on the sign of the yDiff variable: if it is negative the dominant movement is upwards; if it is positive the movement is downwards. If yDiff is equal to zero the head is stationary. If the value 1 is received, there has been significant movement in both the horizontal and vertical directions. The four potential directions are: (i) top-left, (ii) top-right, (iii) bottom-left and (iv) bottom-right. The chosen direction depends on the signs of both xDiff and yDiff. Top-left means that both variables are negative, while both positive means that the direction is bottom-right. A positive horizontal difference in combination with a negative vertical difference produces the top-right direction. The only remaining combination, a negative horizontal difference with a positive vertical difference, produces the bottom-left direction. If the calculated number is 2, horizontal movement is predominant. The actual direction, left or right, depends on the sign of xDiff: if xDiff is negative the direction is left; if it is positive the direction is to the right.
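
A compact sketch of this estimation is shown below; the class, enum and method names are illustrative, while the integer-division outcome (0/1/2) and the nine direction categories follow the text above.

public final class HeadMotion {
    public enum Direction { STATIONARY, UP, DOWN, LEFT, RIGHT,
                            TOP_LEFT, TOP_RIGHT, BOTTOM_LEFT, BOTTOM_RIGHT }

    // Distance traveled per frame, normalized by the number of frames actually elapsed.
    public static double speed(int xPrev, int yPrev, int xCurr, int yCurr, int framesSkipped) {
        int xDiff = xCurr - xPrev;
        int yDiff = yCurr - yPrev;
        return Math.hypot(xDiff, yDiff) / Math.max(1, framesSkipped + 1);
    }

    public static Direction direction(int xPrev, int yPrev, int xCurr, int yCurr) {
        int xDiff = xCurr - xPrev;
        int yDiff = yCurr - yPrev;
        if (xDiff == 0 && yDiff == 0) return Direction.STATIONARY;
        int ratio = (yDiff == 0) ? 2 : Math.abs(xDiff) / Math.abs(yDiff);
        if (ratio == 0) {                       // vertical movement dominates
            return yDiff < 0 ? Direction.UP : Direction.DOWN;
        } else if (ratio == 1) {                // diagonal movement
            if (xDiff < 0) return yDiff < 0 ? Direction.TOP_LEFT : Direction.BOTTOM_LEFT;
            return yDiff < 0 ? Direction.TOP_RIGHT : Direction.BOTTOM_RIGHT;
        } else {                                // horizontal movement dominates
            return xDiff < 0 ? Direction.LEFT : Direction.RIGHT;
        }
    }
}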

6.2.3 Tracking area extrapolation

The face and eye detection process is the most compute-intensive one in the retroactive monitoring approach. The speed of the detection process depends vastly on the size of the search area. If the search area can be minimized, the speed increase can be significant. Keeping a history of the locations where the face and the eyes were detected throughout the preview frames provides a pattern that can be used to predict, with a good degree of probability, where the next occurrence might happen. The size of the search area depends upon how fast the head is moving as well as its direction of movement. As described in 6.2.2, there are eight possible directions plus a stationary state. The provided direction determines the direction in which the search area expands more than in the others. The speed decides how big the expansion of the search area will be, as shown in Figure 6.8b. If the provided direction is marked as stationary, the direction cannot be predicted, so the search area is expanded equally in all directions around the starting point, which is the top-left coordinate of the face's bounding box (Figure 6.8a).

Figure 6.8: Extrapolated search area depends on the speed and direction of the driver's head: the area within the light blue rectangle is extrapolated. (a) Head is stationary with little movement. (b) Head rapidly moves to the left.

The directional expansion of the search area is done in steps. The length of a step is calculated as one fourth of the previously detected face's width or height. If the speed is deemed low, the expansion of the search area is one step long, meaning one fourth of the width in the left and right directions and one fourth of the height in the up and down directions. The number of expansion steps taken increases with the speed. If the direction is other than stationary, the expansion in that specific direction happens in steps of a different length: the step length is doubled, while in the other three directions the step length is halved. Once determined, the search area is extracted from the captured preview image and provided to the face/eye detection algorithm.
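
A rough sketch of this expansion is given below, reusing the Direction enum from the direction-estimation sketch above. The quarter-width/quarter-height step, the doubling in the dominant direction and the halving elsewhere follow the text; the speed-to-step-count mapping and all names are illustrative assumptions.

import android.graphics.Rect;

public final class SearchArea {
    public static Rect expand(Rect face, HeadMotion.Direction dir, double speed,
                              int imageWidth, int imageHeight) {
        int stepX = face.width() / 4;
        int stepY = face.height() / 4;
        int steps = speed < 5 ? 1 : (speed < 15 ? 2 : 3);   // assumed speed thresholds

        int left = stepX, right = stepX, up = stepY, down = stepY;
        if (dir != HeadMotion.Direction.STATIONARY) {
            // Halve the step everywhere, then double it toward the movement direction.
            left = right = stepX / 2;
            up = down = stepY / 2;
            if (dir == HeadMotion.Direction.LEFT || dir == HeadMotion.Direction.TOP_LEFT
                    || dir == HeadMotion.Direction.BOTTOM_LEFT) left = stepX * 2;
            if (dir == HeadMotion.Direction.RIGHT || dir == HeadMotion.Direction.TOP_RIGHT
                    || dir == HeadMotion.Direction.BOTTOM_RIGHT) right = stepX * 2;
            if (dir == HeadMotion.Direction.UP || dir == HeadMotion.Direction.TOP_LEFT
                    || dir == HeadMotion.Direction.TOP_RIGHT) up = stepY * 2;
            if (dir == HeadMotion.Direction.DOWN || dir == HeadMotion.Direction.BOTTOM_LEFT
                    || dir == HeadMotion.Direction.BOTTOM_RIGHT) down = stepY * 2;
        }
        Rect area = new Rect(face.left - left * steps, face.top - up * steps,
                             face.right + right * steps, face.bottom + down * steps);
        // Clamp the extrapolated area to the preview image bounds.
        area.intersect(0, 0, imageWidth, imageHeight);
        return area;
    }
}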

6.2.4 Eye tracking

Tracking just the eyes is enough for our system to achieve its goals. Unfortunately, we have to account for the varying capabilities of the different Android devices that will be running our program. Depending on the specific device, a built-in eye detection feature might be supported. However, this feature is not supported on many devices, especially older ones. In order for our system to be operational in many different situations, three different eye tracking approaches were developed. Eye information is essential; the tracking processes differ in the way that information is obtained. It can be acquired in one of the following three ways: (i) automatically provided by the built-in Camera class when it supports the eye detection feature; (ii) extrapolated from the face region when the eye detection feature is not supported by the Camera class; (iii) detected by Android's FaceDetector class, which supports eye detection. Our system chooses which approach to take based on the system's setup and the chosen monitoring mode. The three approaches are described next.

Eye detection feature supported

If the Android device in use has a Camera class that supports the eye detection feature, it provides our system with everything we need in a very fast fashion. This is the case when the system is in the proactive mode. Both pairs of coordinates defining the bounding boxes for the face and the eye region are available. The advantage of having this information provided to us is that the tracking process gets simplified and streamlined, as shown in Figure 6.5. Many steps are skipped and the system proceeds directly to checking the eye state and monitoring the position of the eyes relative to the two thresholds. This approach yields the biggest speed gains because it utilizes Android's capabilities to the maximum and the overhead is kept to a minimum.

Eye region extrapolation from the face region

The Camera class is guaranteed to support the face detection feature on every Android device with camera capabilities. The eye detection feature is an optional one. If eye detection is not supported, the eye region has to be extrapolated from the face region. The relative position and size of the eye region within the face can be roughly estimated, but a rough estimate is not good enough: the exact location and size have to be determined or closely approximated. An example of a good estimate is shown in Figure 6.9a. A good approximation is necessary in order for the tracking processes to be successful. For example, the eye region cannot contain foreign artifacts because that would potentially skew the results of the eye state analysis, Figures 6.9b and 6.9c. If the eye region suddenly includes eyebrows or the tip of the nose instead of solely the eyes, the system will not be able to determine with a high degree of certainty whether the eyes are open or closed. Also, a variable size of the eye region prevents the system from precisely determining the relation between the position of the eyes and the two thresholds.

Figure 6.9: Eye region extrapolation examples. (a) Good eye region estimation. (b) Eye region containing eyebrows. (c) Eye region containing a portion of the background.

The initialization stage collected 600 images of the driver's face along with additional information about each detected face: width, height, bounding box coordinates, eye region width and height, and eye region bounding box coordinates in relation to the face (Figure 6.10). The detected face region does not change significantly over time, and neither does the extrapolated eye region. Based on this we can make a few assumptions:

• Eye region size scales proportionally in relation to the size of the face region.

• Distance between the top-left corner of the face and the top-left corner of the eye region scales proportionally with the resizing of the face region, and also depends on the actual face pose.

• Driver's behavior falls into a predictable pattern that can be extrapolated from the provided history and reused.

Figure 6.10: Extracted information from an image: face and eye region bounding boxes and their relation.

Provided with the currently detected location and size of the face, the algorithm iterates through the list of previously detected faces in search of the one whose coordinates and size most closely match the current data. The information from that closest instance is used to best estimate the size and location of the current eye region. If the sizes of the current face area and the previous face area are very similar, it is to be expected that the size of the current eye region will be close to the size of the previous eye region. The driver's behavioral pattern is mostly repetitive and predictable. Based on that, we can assume that if at two different moments in time the camera has detected the driver's face at the same location, those two faces will share many similarities. The eye gaze will probably be the same, making the relative positions of the eye regions nearly the same as well. That is why finding matching or similar face region coordinates is important. This approach cannot provide the exact position and size of the eye region, but it can offer a close estimate.
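
The history lookup can be sketched as follows; the class and field names, the matching score and the proportional scaling of the stored eye region are illustrative assumptions meant only to make the idea concrete.

import android.graphics.Rect;
import java.util.List;

public final class EyeRegionEstimator {
    public static class FaceSample {
        public Rect face;       // face bounding box in preview coordinates
        public Rect eyes;       // eye region relative to the face's top-left corner
    }

    public static Rect estimateEyes(Rect currentFace, List<FaceSample> history) {
        FaceSample best = null;
        int bestScore = Integer.MAX_VALUE;
        for (FaceSample s : history) {
            // Distance of the top-left corners plus the difference in size.
            int score = Math.abs(s.face.left - currentFace.left)
                      + Math.abs(s.face.top - currentFace.top)
                      + Math.abs(s.face.width() - currentFace.width())
                      + Math.abs(s.face.height() - currentFace.height());
            if (score < bestScore) {
                bestScore = score;
                best = s;
            }
        }
        if (best == null) return null;
        // Scale the stored eye region proportionally to the current face size.
        float sx = currentFace.width() / (float) best.face.width();
        float sy = currentFace.height() / (float) best.face.height();
        return new Rect(
                currentFace.left + Math.round(best.eyes.left * sx),
                currentFace.top + Math.round(best.eyes.top * sy),
                currentFace.left + Math.round(best.eyes.right * sx),
                currentFace.top + Math.round(best.eyes.bottom * sy));
    }
}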

Eye detection with FaceDetector class

The FaceDetector class has been available since Android API level 1. Its purpose is to detect a face and eyes within a given image. Even though it provides us with the needed information about the eye region, tracking based on this class while in the proactive mode introduces a big overhead because the byte array representing the captured preview frame has to be converted into the bitmap image format supported by this class. Since the conversion is necessary, the system tries to speed up the detection by minimizing the search area for the detection process, as described in 6.2.3. This detection process provides the system with useful information about the detected face and eyes in the form of: (i) the top-left coordinates of the face region, (ii) the distance between the eyes and (iii) the coordinates of the mid-point between the eyes (shown with dark orange lines in Figure 6.11). Our system then calculates the eye region coordinates in accordance with the rest of the processes of the system (light blue lines in Figure 6.11). The needed coordinates are the top-left and bottom-right of the detected eye region. Once obtained, the monitoring stage can proceed with the next steps of eye state analysis.

Figure 6.11: Extracted eye region from the face detection done by FaceDetector class: dark orange is provided information, light blue is extrapolated information.
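
A minimal sketch of this extrapolation with android.media.FaceDetector is shown below; the box proportions derived from the eye distance are illustrative assumptions, not the exact ratios used by our system.

import android.graphics.Bitmap;
import android.graphics.PointF;
import android.graphics.Rect;
import android.media.FaceDetector;

public final class FaceDetectorTracker {
    // The frame bitmap must be in RGB_565 format for FaceDetector to accept it.
    public static Rect detectEyeRegion(Bitmap frame) {
        FaceDetector detector = new FaceDetector(frame.getWidth(), frame.getHeight(), 1);
        FaceDetector.Face[] faces = new FaceDetector.Face[1];
        if (detector.findFaces(frame, faces) == 0 || faces[0] == null) return null;

        PointF mid = new PointF();
        faces[0].getMidPoint(mid);                 // mid-point between the eyes
        float d = faces[0].eyesDistance();         // distance between the eyes

        // Assumed proportions: the eye region is about 1.6 x the eye distance wide
        // and 0.5 x the eye distance tall, centered on the mid-point.
        return new Rect(Math.round(mid.x - 0.8f * d), Math.round(mid.y - 0.25f * d),
                        Math.round(mid.x + 0.8f * d), Math.round(mid.y + 0.25f * d));
    }
}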

6.2.5 Skin color confirmation

In situations where face detection performance falls under a certain confidence level, it is prudent to run a confirmation process based on the custom skin color information. This process analyzes the image using the custom skin color model and chroma range and either confirms or dismisses the existence of a face within it. The three confirmation methods that can be used are explained in the next sections, Figure 6.12.

Figure 6.12: Skin confirmation areas of interest. (a) Full skin color confirmation: the complete area is checked. (b) Quick skin color confirmation: pixels on the yellow lines are checked. (c) Very quick skin color confirmation: pixels within the yellow rectangles are checked.

Full skin color confirmation

This method performs the most exhaustive check on a given image. Due to the dynamics of the driving experience, the driver is not expected to keep his head still constantly. Because of this, the detected area that contains the face might also contain captured portions of the background, typically located at the edges of the area, Figure 6.3a. To minimize the impact of these unwanted parts on the confirmation process, the area of interest is padded, Figure 6.3b. The padding process is explained in 6.1.4. The remaining area is the area of our focus, Figure 6.12a. This area has a width of X chroma pixels and a height of Y chroma pixels. Each row of the area contains X/2 pairs of red and blue chroma components. The algorithm analyzes one row of chroma pixels at a time. Every red and blue chroma pixel in the current row is tested to see whether it falls within the custom chroma skin range obtained in the initialization stage. If both the red and blue chroma pixels fall within the custom range, the hit counter for the current row is incremented by one. Once the algorithm has checked all of the chroma pixels in the current row, the hit counter reflects how many of them satisfy the custom range conditions necessary to be counted as skin pixels. The algorithm repeats itself Y times until it has checked all of the rows in the area of interest. Once completed, the algorithm has produced Y values ranging from 0 to 100, representing the skin pixel count per row expressed as a percentage. If the average percentage per row is above 50%, we can conclude with some certainty that the given image indeed contains a face.
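
The full confirmation can be sketched as a row-by-row scan over the padded chroma area; the class and parameter names are illustrative, while the 10% edge mask, the per-row hit counting and the 50% average decision follow the description above.

public final class SkinConfirmation {
    public static boolean confirmFull(byte[] chroma, int width, int height,
                                      int crMin, int crMax, int cbMin, int cbMax) {
        int left = (width / 10) & ~1;                 // keep Cr/Cb pair alignment
        int right = width - width / 10;
        int top = height / 10, bottom = height - height / 10;
        double percentSum = 0;
        int rows = 0;
        for (int row = top; row < bottom; row++) {
            int hits = 0, pairs = 0;
            for (int col = left; col + 1 < right; col += 2) {
                int cr = chroma[row * width + col] & 0xFF;
                int cb = chroma[row * width + col + 1] & 0xFF;
                if (cr >= crMin && cr <= crMax && cb >= cbMin && cb <= cbMax) hits++;
                pairs++;
            }
            percentSum += 100.0 * hits / pairs;       // skin percentage for this row
            rows++;
        }
        // Confirm the face when the average per-row skin percentage exceeds 50%.
        return rows > 0 && (percentSum / rows) > 50.0;
    }
}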

Quick skin color confirmation

As its name suggests, this algorithm is designed to perform faster than the full skin color confirmation method. It is meant to be used in the tracking and warning stages to provide a quick response with satisfactory results, so that the system can make good decisions in as close to real time as possible. For the reasons mentioned in the full skin color confirmation method, the given image is padded as well. The difference is that in the full skin color confirmation method the whole area of interest is checked against the custom skin range, while in the quick method the check is performed only on selected rows and columns within the area of interest. Depending on the size of the area, the algorithm chooses to test R columns and rows, where R can be 4, 5 or 6.

So, the number of row and column combinations tested per image is: (i) 4x4 = 16; (ii) 5x5 = 25 or (iii) 6x6 = 36. The area of interest is minimized to a grid of R x R columns and rows, Figure 6.12b. The chosen rows and columns are equidistant from each other and evenly spaced across the area. The algorithm iterates R x R times and in each iteration it checks all of the red and blue chroma pairs against the custom skin color range. Every successful detection of a custom skin pixel increments a hit counter set up in a similar manner as in the full skin color confirmation method. At the end of each iteration, the accumulated number of detected skin pixels is converted to a rough percentage. If a given image contains a human face, the appearance of skin color pixels should be very frequent and consistent in both the vertical and horizontal direction.

Very quick skin color confirmation

This method is designed with maximum speed in mind. There are situations when speed is of utmost importance and a trade-off between quality and speed has to be made. Sometimes the hardware capabilities cannot properly support the performance of the previously described methods. Typically, the way to increase the speed is to shrink the area of interest even further. The given image is padded in the same way as in the previously described methods. Over the remaining area of interest, the grid of equidistant, evenly spread horizontal and vertical lines is applied, as in the quick skin color confirmation method. The areas of interest are minimized to the intersecting points of those horizontal and vertical lines. They are small 3 x 3 areas centered at each intersecting point, Figure 6.12c. As in the quick skin color confirmation method, we can have 4, 5 or 6 vertical and horizontal lines. This gives us 16, 25 or 36 3 x 3 areas of interest respectively. All 9 red and blue chroma pixel pairs per area of interest are checked to see if they are actually skin color pixels, i.e., whether they fall within the provided custom skin chroma ranges. Each area then provides a number of successful hits expressed as a percentage. If all areas of interest turn out to have high percentage values, we can claim with a high degree of certainty that the provided image does contain a face.

6.2.6 Eye state analysis

Regardless of which tracking method is used, the eye state analysis process has to be provided with the coordinates of the current eye region and the raw pixel data of that eye region. There are two points of interest in this process: (i) the status of the eyes, open or closed; and (ii) their position relative to the upper threshold. This is the step in the monitoring stage where it is determined whether the driver's state is considered normal or potentially sleepy and distracted. The most time-consuming part of this process is the conversion and adaptation of the eye region for the SVM classification process. As previously described in 3.3, the SVM accepts input data organized only as a list of 1-D byte arrays. To each array a single value is assigned that can be either 1, -1 or 0. The value zero indicates that the state of the eyes in the provided region is unknown and needs to be determined by the SVM classification process based on the provided eye model. All of the input data has to be adjusted to be of the same size. So before the system feeds the current region into the SVM for testing, it has to be converted to gray-scale and resized to fit the predetermined size that was calculated in the initialization stage (6.1.6). The SVM classification produces an eye state outcome for the current eye region. If this state is "eyes closed" and this is the first occurrence of this specific state, the system starts a timer in order to monitor the duration of this state throughout future frames. If the "eyes closed" state persists for longer than 2 seconds, the alertness of the system is raised and the system enters the warning stage. Alertness is raised because the driver has not been monitoring the road for a prolonged period of time. If the eyes get classified as open before the timer reaches 2 seconds, the timer is reset and stopped and it is assumed everything is normal. Not only the time but also the position of the eyes relative to the upper threshold plays a role in determining the state of the driver. If the eyes are classified as closed and the eye position falls beneath the upper threshold, the system treats this situation as potentially dangerous and proceeds to the warning stage regardless of the timer status. This particular scenario is deemed a potential beginning of the nodding process. One additional situation can happen due to the dynamic nature of the system: the eyes are continuously classified as open but their position is off with respect to the upper threshold. If this is the case, the system adopts this new position of the eyes as a new reference point, places the upper threshold beneath that new reference point and recalculates the position of the lower threshold from the current face height information.
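
The timing portion of this logic can be sketched as a small helper; class and parameter names are illustrative, while the 2-second limit, the reset on an open classification and the immediate escalation when closed eyes drop below the upper threshold follow the text above.

public final class EyeStateMonitor {
    private long closedSince = -1;                 // -1 means the eyes are currently open
    private static final long CLOSED_LIMIT_MS = 2000;

    // Returns true if the system should enter the warning stage.
    // eyeTop is the top edge of the eye region; larger values mean lower in the image.
    public boolean update(boolean eyesClosed, int eyeTop, int upperThreshold, long nowMs) {
        if (!eyesClosed) {
            closedSince = -1;                      // reset on any open classification
            return false;
        }
        if (closedSince < 0) closedSince = nowMs;  // first closed frame starts the timer
        if (eyeTop > upperThreshold) return true;  // closed eyes below the upper threshold: possible nod
        return nowMs - closedSince >= CLOSED_LIMIT_MS;
    }
}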

6.3 WARNING STAGE

6.3.1 Algorithm outlook

If the system enters the warning stage, that means the driver's behavior is considered suspicious and needs to be monitored more closely. In order to reach this stage the system needs to detect: (i) a prolonged period of the eyes being closed, (ii) that the driver is not monitoring the road or (iii) the beginning of a nodding process. In this stage many of the steps from the monitoring stage are reused with slight alterations and added time-constraining conditions, along with the introduction of the nod detection process, Figure 6.13. The most time-consuming process here is still the eye tracking with eye region extrapolation and adaptation for the eye state analysis. If the face detection process performs with a certainty of less than 50%, the skin color based confirmation process is activated to perform an additional check of whether the face is actually detected. Only the quick confirmation method or the very quick confirmation method can be used, because of their speed advantages. The system selects automatically between the two based on the following condition: if the face detection certainty is below 50% but above 25%, the very quick method is used; if the certainty falls below 25%, the quick method is used. Once the system obtains the current eye region, it is ready to proceed with the analysis of its state as well as the nod analysis or the distraction analysis.

Figure 6.13: Top: four stages of our Driver Drowsiness Detection System. Bottom: outline of the warning stage.

6.3.2 Closed eyes monitoring

The eye state is still the crucial component in determining the driver's level of alertness. If the warning stage was reached with the information that the eyes have been closed for a period of 2 seconds, it is necessary to see whether they stay in that state for longer. If they keep being classified as closed for another 2 seconds, it is concluded that the driver needs to be warned about the potentially dangerous situation and the system moves forward to the alerting stage, where the driver receives an audio/visual alert as described in 6.4. If at any time the eyes are classified as open before the 2 seconds have passed, the system demotes the level of alertness to normal and returns to the monitoring stage. The timer is reset back to zero and will be started again at the next occurrence of closed eyes.

6.3.3 Nod detection

Holding the eyes above the upper threshold ensures that the driver is monitoring the road, unless his eyes are closed while being above it. If that happens for a prolonged period of time the system responds accordingly, as described in 6.2.6. But if the driver closes his eyes and his head starts to drop vertically down, a potential beginning of nodding is discovered. The nodding process can be broken into 5 stages in relation to the two thresholds being used. Completion of all five stages in the predetermined order is counted as a nod. Any other combination is discarded. This process monitors the position of the upper horizontal edge of the bounding box surrounding the eye region and its distance and position relative to the thresholds. Only when this line crosses one of the thresholds does the nodding stage change. Horizontal movement is ignored as long as the predominant movement is in the vertical direction. The state of the eyes is still a very important factor in the warning stage. When the eye region is in the second nodding stage, the eyes have to be classified as closed for the system to count the current behavior as a potential drowsy nod sequence. If they are opened while the eyes are moving down but are still in between the upper and the lower threshold, the system aborts the nodding process and dismisses the behavior as not dangerous. The area in between the two thresholds is considered by the system as the area from which the driver can still observe the road, and is thus classified as the average danger zone. The region below the lower threshold, however, is considered the most dangerous area. It is classified as dangerous because it is very likely that the driver cannot monitor the road ahead from there. Because of that it is also important to know how long the driver's head stays in this position. For the purpose of monitoring that, the timer tStage3 is initialized and started. If the eyes are located here, that means they are in stage 3 of the nodding process. While in this stage the eye state is not important, since the driver is certainly not observing the road, which is just as dangerous as feeling drowsy. If the driver stays in this position for longer than 2 seconds the system immediately goes to the alert stage to notify the driver. If a true drowsy nod is being observed, the driver will stay under the lower threshold very briefly and will recover back up very rapidly. Any prolonged stay under the lower threshold is considered abnormal. It is also expected that open eyes will start being detected successfully from this stage onwards. Stages 4 and 5 are recovery stages, where the eye region recovers its position above the upper threshold (stage 5) by rapidly passing through the area between the two thresholds (stage 4). Completion of a nod leads to the alert stage, since a nod is a clear sign that the driver is experiencing sleepiness. Once the nodding process starts, the state of the eyes is only really important in the first two stages. In all other stages the eye state is not as relevant and its analysis can be skipped, which yields speed benefits.
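
The five stages can be sketched as a small state machine; the class name, the simplified abort and timing handling, and the coordinate convention (larger y values are lower in the image) are assumptions made only to illustrate the stage transitions described above.

public final class NodDetector {
    private int stage = 1;   // 1: above upper, 2: dropping between thresholds with closed eyes,
                             // 3: below lower, 4: recovering between thresholds, 5: back above upper

    // Returns true when a complete nod has been detected. eyeTop is the upper edge of the eye region.
    public boolean update(int eyeTop, int upper, int lower, boolean eyesClosed) {
        switch (stage) {
            case 1:                                            // observing the road
                if (eyeTop > upper && eyesClosed) stage = 2;   // head starts dropping with closed eyes
                break;
            case 2:
                if (!eyesClosed) stage = 1;                    // eyes opened between thresholds: abort
                else if (eyeTop > lower) stage = 3;            // dropped below the lower threshold
                break;
            case 3:
                if (eyeTop <= lower) stage = 4;                // recovering upward
                break;
            case 4:
                if (eyeTop <= upper) stage = 5;                // back above the upper threshold
                break;
        }
        if (stage == 5) {
            stage = 1;
            return true;                                       // nod completed
        }
        return false;
    }
}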

6.3.4 Distraction analysis

Situations can occur when the driver's behavior deviates from the expected due to the unpredictability of the driving experience. The driver might turn around or turn to look through the windows, for example. Due to the face detection algorithm's limitations, there are situations in which the exact location of the face and the state of the eyes are unknown. The face detection algorithm performs well within a certain range. That range conveniently encompasses the driver's viewing range of the road ahead. The assumption here is that if the driver's face cannot be detected it is outside of this range, and thus it is very likely that the driver is not focused on the road and driving. This situation is deemed potentially dangerous and if it lasts for a prolonged period of time the system has to take action. If this situation is detected in the tracking stage and it lasts for 2 seconds, the system goes into the warning stage. The same timer, tDistr, used in the previous stage continues to measure the duration of the unsuccessful face detection over time. If it lasts for an additional 2-second period, the system considers the situation dangerous enough to raise the level of alertness, switching to the alert stage and "sounding the alarm".

6.4 ALERT STAGE

This stage is the simplest of all four but serves the important purpose of notifying the driver of a potentially dangerous state he might be in. This stage can be entered from the warning stage when the system is in the proactive mode, or directly from the monitoring stage if the system is in the retroactive mode. The system enters this stage if it has detected: (i) a prolonged period of time during which the driver kept his eyes closed, (ii) the driver starting to nod or (iii) the driver not monitoring the road ahead. Regardless of the reason for entering this stage, the behavior of the system while in it is the same. As soon as the system enters this stage, an alert message is shown on the display accompanied by a prolonged, repetitive beep sound. The message warns the driver of the potential danger and suggests taking a break from driving for at least 20 minutes. The audio/visual cues are used to attract the driver's attention: (i) bright colors are chosen for the background of the alert message and (ii) the chosen sound is of medium loudness, so as not to scare the driver and cause more harm than good. To make sure that the system achieves the goal of attracting the driver's attention, both the message and the sound are continuously played until the driver dismisses the warning by swiping over the display of the phone in any direction. Also, prolonged ignoring by the driver causes the sound to become louder and the screen brighter as time goes by. Once the driver dismisses the alert message, the system resets the alertness level to normal and returns to the tracking stage where it continues the monitoring process.

6.5 EXPERIMENTS AND RESULTS

6.5.1 Synchronization Test

Android OS provides means of obtaining information about current preview frame and information about captured face withing that frame. The issue is that they can be acquired only individually, in two separate functions: onPrevieFrame and OnFaceDetection. When the system is in a proactive mode it obtains the information “on the fly” which can cause potential synchronization problem. The system is set up in following way: Every time the OnFaceDetection function is triggered we save the currently available frame provided in the onPrevieFrame function. We capture 100 preview frames in this fashion per each run. Android device is mounted in front of the subject and it records him leaning to the left and right in different speeds that can be classified as:(i) fast, (ii) slow and (iii) minimal. To test how the resolution of the camera recording influences the performance, test is repeated for the following 4 resolutions: (i) 1080p, (ii) 720p, (iii) 480p, (iv) 320p. The face region is extrapolated from each corresponding frame and is manually evaluated. If the cropped area contained the face or most of the face it is classified as successful detection, otherwise it is classified as unsuccessful. Typical outcomes of cropping the supposed face region are shown in Figure 6.14. It can be concluded that the system can experience significant synchronization issues if the movement of the driver’s head is considered to be fast. The test run on 99 (a) Fast (b) Fast (c) Slow (d) Minimal

Figure 6.14: Typical appearance of the cropped face region for various movement speeds: (a) fast, (b) fast, (c) slow, (d) minimal.

The test run at the various resolutions showed that there is practically no difference in behavior as the resolution changes (Figure 6.15).

Figure 6.15: Success rate for different resolutions.

6.5.2 Face detection speed comparison of original and custom detection area

In the retroactive mode, the system relies on the FaceDetector Android class to perform face and eye detection on the preview frames. This class analyzes a given image in search of potential candidates representing a face. Minimizing the search area can yield speed gains, and this test was devised to show the potential gains that can be achieved. One hundred consecutive images of a driver are recorded and the custom detection areas are cropped from them. A typical size comparison between the original image and the cropped area is given in Figure 6.16.

Figure 6.16: Size comparison example: Original captured image with the cropped detection area.

The custom area size depends on the speed and direction of the head movement in the previous frame. If the movement was slow, the area is smaller than it would be if the movement was rapid. The resolution of the image also plays a very significant role in the speed of the face detection process. The average times for face detection are shown in Figure 6.17. As the resolution drops, the face detection time decreases drastically. It is also clear that the introduction of the custom detection area provides a significant speed increase of approximately 10 times.

Figure 6.17: Speed comparison: average times for face detection depending on the resolution and detection area size.
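A minimal sketch of how such a speed-dependent detection area could be derived is shown below; the margin factors, the fallback behavior, and the names are assumptions for illustration, not the exact values used by the system.

```java
import android.graphics.Rect;

// A minimal sketch: expand the previous face rectangle into a search area whose
// size grows with the observed head speed, biased toward the direction of motion.
public final class SearchAreaEstimator {

    public static Rect estimate(Rect prevFace, float dxPerFrame, float dyPerFrame,
                                int frameWidth, int frameHeight) {
        // Base margin of half a face size; widen it when the movement was fast.
        int marginX = (int) (prevFace.width() * 0.5f + Math.abs(dxPerFrame) * 2f);
        int marginY = (int) (prevFace.height() * 0.5f + Math.abs(dyPerFrame) * 2f);

        Rect search = new Rect(prevFace);
        search.inset(-marginX, -marginY);                   // expand symmetrically first
        search.offset((int) dxPerFrame, (int) dyPerFrame);  // then bias toward the motion

        // Clamp to the frame so the crop stays valid; fall back to the full frame otherwise.
        if (!search.intersect(0, 0, frameWidth, frameHeight)) {
            search.set(0, 0, frameWidth, frameHeight);
        }
        return search;
    }
}
```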

6.5.3 Eye region extrapolation limitations

Due to the nature of the face detection process provided by the Android OS, the size of the detected face region varies depending on the speed of head movement as well as the detection confidence level. The example in Figure 6.18 shows three consecutive frames captured by the system with the face region extrapolated from the preview frames. Since the eye region is not detected by Android itself, it has to be estimated by the system. Estimation results for the corresponding face regions are also shown in Figure 6.18. The estimated eye regions vary significantly in size and location, which introduces a significant amount of artifacts into the data fed into the SVM eye state classifier, thus rendering the results unusable.

Figure 6.18: Detected face regions vary drastically in size, causing the estimated eye regions to vary drastically in size and location. The panels show big, medium, and small detected face regions and the corresponding extrapolated eye regions.
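For illustration, the sketch below shows one way eye regions could be extrapolated from a detected face rectangle when the device does not report eye positions itself; the proportions and names are assumptions, not the dissertation’s exact parameters.

```java
import android.graphics.Rect;

// A minimal sketch of estimating left/right eye regions as fixed fractions of a
// detected face rectangle. The fractions below are illustrative assumptions.
public final class EyeRegionEstimator {

    /** Returns {leftEye, rightEye} as sub-rectangles of the detected face region. */
    public static Rect[] estimate(Rect face) {
        int w = face.width();
        int h = face.height();

        // Eyes typically sit in the upper half of the face box; each region here
        // spans roughly a third of the face width and a fifth of its height.
        int eyeTop = face.top + (int) (0.30f * h);
        int eyeBottom = eyeTop + (int) (0.20f * h);

        Rect leftEye = new Rect(face.left + (int) (0.15f * w), eyeTop,
                                face.left + (int) (0.45f * w), eyeBottom);
        Rect rightEye = new Rect(face.left + (int) (0.55f * w), eyeTop,
                                 face.left + (int) (0.85f * w), eyeBottom);
        return new Rect[] { leftEye, rightEye };
    }
}
```

Because the size and position of the face box fluctuate from frame to frame, such fixed-fraction estimates fluctuate with it, which is exactly the source of the artifacts described above.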

6.5.4 Tracking stage in proactive mode - speed test

When the system is in the proactive mode it provides the estimation of the driver’s state on the fly, so speed is a very important factor for good performance. Figure 6.19a shows the average times for all of the major steps of the tracking stage at different resolutions. Figure 6.19b, on the other hand, shows the average times for all of the supporting processes needed in the tracking stage. The total average time per frame is calculated by adding up the values from those two figures (Figure 6.19c). From the figures it can be concluded that at the 320p resolution the system is capable of monitoring the driver in “real time”, since it can process approximately 30 frames per second. As the resolution increases, the speed of the system decreases significantly, especially at the 1080p resolution.

Figure 6.19: Average time per frame (c) is calculated by adding up the time from all of the major tracking steps (a) and the time to complete the supporting processes (b).
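The per-frame accounting described above (major steps plus supporting processes, summed into a total time and an achievable frame rate) can be sketched as follows; the class and step names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal sketch of accumulating per-step times over a run: each major step and
// supporting process is timed separately, their sums give the average total time
// per frame, and 1000 / total gives the achievable frame rate.
public class FrameTimer {
    private final Map<String, Long> totals = new LinkedHashMap<String, Long>();
    private int frames = 0;

    public void record(String step, long elapsedMs) {
        Long prev = totals.get(step);
        totals.put(step, prev == null ? elapsedMs : prev + elapsedMs);
    }

    public void frameDone() {
        frames++;
    }

    public double averageTotalMsPerFrame() {
        long sum = 0;
        for (long t : totals.values()) sum += t;
        return frames == 0 ? 0 : (double) sum / frames;
    }

    public double achievableFps() {
        double avg = averageTotalMsPerFrame();
        return avg == 0 ? 0 : 1000.0 / avg;   // e.g. ~33 ms per frame -> ~30 fps ("real time")
    }
}
```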

6.5.5 Tracking stage in retroactive mode - speed test

The goal of the system running in the retroactive mode is not as focused on speed as in the proactive mode. However, the system still needs to be capable of providing an estimate of the driver’s state in a reasonable amount of time and at reasonable intervals. The breakdown of the average times of the steps of the tracking stage in this mode is given in Figure 6.20. On average, the supporting processes and the overhead introduced in this stage take twice as long as the core algorithms of this stage. The average time to process one frame is significantly higher than in the proactive mode, mostly due to the face detection process used and the overhead introduced to format the image properly for that process. Even the measurement done for the smallest resolution of 320p shows that the performance is three times slower than the real-time performance of 30 frames per second. For the highest resolution of 1080p, the system is capable of processing roughly one frame per second.

Figure 6.20: Average time per frame (c) is calculated by adding up the time from all of the major tracking steps (a) and the time to complete the supporting processes (b).

CHAPTER 7
CONCLUDING REMARKS

This dissertation described two prototypes of Driver Drowsiness Detection (DDD) systems: one prototype is MATLAB-based and the other is Android-based. Important objectives introduced at the beginning of our design process included the reliability, accuracy and speed of the system; in other words, the overall goal of the system is to meet those constraints as well as possible.

Our preliminary MATLAB-based prototype followed a proof-of-concept approach that focused on setting up the working framework for the Android-based implementation. It was used to analyze the weaknesses and strengths of the individual components used in the system and to detect associated bottlenecks and overheads. Our first prototype showed us that individual components perform well within a certain range of conditions and gave us a quantitative measure of such ranges. We experimentally determined that a Haar-like-based face detection approach performs well in a common driving setup where a camera is positioned on the dashboard of a car and the driver monitors the road ahead. For those situations when the system cannot detect the driver’s face, due to the known limitations of this method, we introduced a novel user-specific skin confirmation method. Complementing the face detection process with our skin confirmation process provides the system with an additional safety step. Our experiments on the first prototype indicated good but unreliable performance of the SVM classifier used to distinguish between two potential eye states: open and closed. This classification method is our main process for determining the overall state of the driver. In order to improve the reliability of this stage of the system we added a novel two-threshold nodding detection process. While building our first prototype we showed that a detection system can be constructed from scratch by combining available off-the-shelf software tools and novel algorithms.

The Android-based prototype was built on the framework from the MATLAB version. The main change was the replacement of the Haar-like face detection module with Android’s built-in face detection capabilities, which turned out to be more reliable and faster. The only remaining problem is that eye detection is an optional feature whose availability depends on the Android device being used. Since this is an essential piece of data for our system, we provided a novel mechanism of eye region extrapolation based on the behavioral history of the driver along with the information regarding the currently detected face. An additional significant improvement compared to the first prototype was the introduction of a complex interactive tool which allows the system to extrapolate user-specific information that is used to boost its overall performance. Our decision at the beginning of our work to make the system user-centered proved to be a prudent one, since this approach allows for more robust and accurate predictions. Our first prototype did not focus on “real world” applicability, while one of the aims of the second prototype was precisely that. The Android-based system can run in one of two modes: proactive and retroactive. In proactive mode, the state of the driver is determined based on the information extracted from the currently captured image and the state determined for the previous image; the prediction is made on the fly. The retroactive mode provides a prediction by analyzing recorded past sequences.
The retroactive mode more closely resembles the approach of the first prototype. The main addition is the implementation of a novel tracking area determination method. This method analyzes the driver’s movement, more specifically the direction and speed of the head motion, to extrapolate the search area in which to perform the face detection. By introducing this method we have significantly reduced the bottleneck created by the face detection process.

Learning from the experience of creating the first prototype, we decided to provide an additional method of determining the driver’s state besides the eye state analysis and the nod detection. This method focuses on determining how distracted the driver is and how closely he pays attention to the road ahead, thus introducing another complementary determination method.

Though the overall framework is stable, there is room for additional improvements and more exhaustive testing of the system and its components. For example, some of the system’s components are suitable for parallelization and porting onto CUDA-capable Android devices, which are increasingly available. Suitable algorithms are skin color extrapolation and confirmation, and search area extrapolation. If carefully implemented, the speed benefits can be significant. Some of the suggestions for future work are related to increasing the overall user-friendliness of the system. One feature that could be added is face recognition in the initialization stage instead of manual selection by the user. Another is the capability of determining whether the initialization stage is necessary for the current session by recognizing the user and analyzing the current driving conditions. A very useful addition would be playback capability in the recording sessions, allowing the user to restart the recording process if unsatisfied. In order to improve the overall reliability of the system, long-term monitoring should be implemented: the system should keep track of nod counts, prolonged periods of closed eyes, and similar indicators. The strongest and most urgent suggestion for future work is the experimentation with and implementation of a different method of eye state analysis to replace the SVM classification method, because it proved to be the weakest link in our system.

119