Acoustic Sleepiness Detection: Framework and Validation of a Speech-Adapted Pattern Recognition Approach


Behavior Research Methods, 2009, 41 (3), 795-804. doi:10.3758/BRM.41.3.795

Jarek Krajewski, University of Wuppertal, Wuppertal, Germany; Anton Batliner, University of Erlangen-Nürnberg, Erlangen, Germany; and Martin Golz, University of Applied Sciences Schmalkalden, Schmalkalden, Germany

This article describes a general framework for detecting sleepiness states on the basis of prosody-, articulation-, and speech-quality-related speech characteristics. The advantages of this automatic real-time approach are that obtaining speech data is nonobtrusive and free from sensor application and calibration efforts. Different types of acoustic features derived from speech, speaker, and emotion recognition were employed (frame-level-based speech features). Combining these features with high-level contour descriptors, which capture the temporal information of frame-level descriptor contours, results in 45,088 features per speech sample. In general, the measurement process follows the speech-adapted steps of pattern recognition: (1) recording speech, (2) preprocessing, (3) feature computation (using perceptual and signal-processing-related features such as fundamental frequency, intensity, pause patterns, formants, and cepstral coefficients), (4) dimensionality reduction, (5) classification, and (6) evaluation. After a correlation-filter-based feature subset selection was employed on the feature space in order to find the most relevant features, different classification models were trained. The best model—namely, the support vector machine—achieved 86.1% classification accuracy in predicting sleepiness in a sleep deprivation study (two-class problem; N = 12; 01.00–08.00 a.m.).
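As a rough illustration of steps (1)-(3) and the contour-descriptor idea, the toy sketch below computes two frame-level descriptor (FLD) contours, log-energy and zero-crossing rate, from a synthetic signal and summarizes each with a small set of functionals, yielding a fixed-length vector of size (number of contours) x (number of functionals). The descriptor choice and all names are illustrative assumptions, not the authors' actual feature set.

```python
# Toy sketch: frame a signal, compute frame-level descriptor (FLD) contours,
# then aggregate each contour's temporal information with functionals.
# Descriptors and names are illustrative, not the paper's feature set.
import numpy as np

def frame_signal(x, size=400, hop=160):
    """Slice a 1-D signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
    n = 1 + (len(x) - size) // hop
    return np.stack([x[i * hop:i * hop + size] for i in range(n)])

def fld_contours(x):
    """Two simple frame-level descriptor contours."""
    frames = frame_signal(x)
    log_energy = np.log((frames ** 2).sum(axis=1) + 1e-12)
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    return {"log_energy": log_energy, "zcr": zcr}

# Functionals collapse each contour into scalar features.
FUNCTIONALS = {
    "mean": np.mean,
    "std": np.std,
    "min": np.min,
    "max": np.max,
    "range": np.ptp,
    "slope": lambda c: np.polyfit(np.arange(len(c)), c, 1)[0],
}

def feature_vector(x):
    return {f"{cname}_{fname}": float(func(c))
            for cname, c in fld_contours(x).items()
            for fname, func in FUNCTIONALS.items()}

t = np.arange(16000) / 16000.0                        # 1 s at 16 kHz
toy_utterance = np.sin(2 * np.pi * 120 * t) * np.exp(-3 * t)
fv = feature_vector(toy_utterance)
# Feature count = (number of contours) x (number of functionals)
assert len(fv) == 2 * len(FUNCTIONALS)
```

Scaling the same scheme to many FLDs (fundamental frequency, intensity, formants, MFCCs, and so on) and dozens of functionals is what drives the feature count into the tens of thousands.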
Sleepiness impairs a series of cognitive abilities, such as early perceptual (visual sensitivity; Tassi, Pellerin, Moessinger, Eschenlauer, & Muzet, 2000), central (central slowing hypothesis; Bratzke, Rolke, Ulrich, & Peters, 2007), and late motor-processing (psychomotor slowing; Dinges & Kribbs, 1991) steps. Furthermore, the decrements in the speed and accuracy of various task performances can be explained by changes in working memory, executive function, supervisory control (Jennings, Monk, & van der Molen, 2003; Nilsson et al., 2005), spatial orientation, situational awareness (see Harwood, Barnett, & Wickens, 1988), mathematical processing, motor task abilities (e.g., manual dexterity, grip strength, tapping, fine motor control; Durmer & Dinges, 2005; Rogers, Dorrian, & Dinges, 2003; Wesensten, Belenky, Thorne, Kautz, & Balkin, 2004), and divergent-thinking capacity (Horne, 1988; Linde & Bergström, 1992).

Due to these impairments, sleepiness is a factor in a variety of incidents and accidents in road traffic (e.g., Flatley, Reyner, & Horne, 2004; Horberry, Hutchins, & Tong, 2008; Read, 2006) and work (e.g., safety-sensitive fields such as chemical factories, nuclear power stations, and air traffic control; Melamed & Oksenberg, 2002; Wright & McGown, 2001) contexts. Accordingly, 21% of the reported incidents mentioned in the Aviation Safety Reporting System (including those involving pilots and air traffic controllers) were related to fatigue. Thus, the prediction of impending critical sleepiness and the warning of traffic employees against it play an important role in preventing accidents and the resulting human and financial costs.

In addition to the commonly accepted fact of sleep-induced cognitive impairments, previous research lends some support for the mood-disturbing effects of sleepiness (see Engle-Friedman et al., 2003). Drawing from these findings, we assume that, in analogy to the sleepiness-induced decrease of performance within the transportation sector, the performance of communication-centered services will also suffer from sleepiness-related impairments.

In addition to sleepiness-induced disturbances in human-to-human communication, human–computer interaction (HCI) could also benefit from the detection of and automatic countermeasures to sleepiness. Knowing the speaker's sleepiness state can contribute to the naturalness of HCI. If the user shows unusual fatigue states, giving feedback about this fact would make the communication more empathic and human-like. This enhanced naturalism might improve the acceptance of these systems. Furthermore, it may result in better comprehensibility if the system output is adapted to the user's actual fatigue-impaired attentional and cognitive resources.

Hence, many efforts have been reported in the literature to measure fatigue states (Sommer, Chen, Golz, Trutschel, & Mandic, 2005). These systems have focused mainly on (1) saccadic eye movement (Zils, Sprenger, Heide, Born, & Gais, 2005), instability of pupil size (Wilhelm et al., 2001), and eye blinking (Ingre, Åkerstedt, Peters, Anund, & Kecklund, 2006; Schleicher, Galley, Briest, & Galley, 2008); (2) EEG data (Davidson, Jones, & Peiris, 2007; Golz, Sommer, Holzbrecher, & Schnupp, 2007); and (3) behavioral expression data (gross body movement, head movement, mannerisms, and facial expression; Vöhringer-Kuhnt, Baumgarten, Karrer, & Briest, 2004) in order to characterize the sleepiness state. Apart from these promising advances in analyzing eye movement and behavioral expression data, there has recently been renewed interest in vocal expression and speech analysis. This interest is promoted mainly by the progress in speech science and the growing presence of speech in voice-guided HCI. Using voice communication as an indicator of sleepiness would have the following advantages: Obtaining speech data is nonobtrusive, free from sensor application and calibration efforts, robust against extreme environmental conditions (humidity, temperature, and vibrations), and "hands- and eyes-free"; most importantly, speech data are omnipresent in many daily life situations.

Little empirical research has been done to examine the effect of sleepiness states on acoustic voice characteristics. Most studies have analyzed only single features (Harrison & Horne, 1997; Whitmore & Fisher, 1996) or small feature sets containing only perceptual acoustic features, whereas signal-processing-based speech and speaker recognition features (e.g., mel frequency cepstral coefficients [MFCCs]; see Table 1) have received little attention (Greeley et al., 2007; Nwe, Li, & Dong, 2006). Building an automatic sleepiness detection engine reaching sufficient precision still remains undone. Thus, the aim of this study is to apply a state-of-the-art speech emotion recognition engine (Batliner et al., 2006; Vlasenko, Schuller, Wendemuth, & Rigoll, 2007) to the detection of critical sleepiness states. Attention is drawn particularly to the computation of a 45,088-feature set using frame-level descriptors (FLDs) and their temporal-information-aggregating functionals.

Sleepiness and Speech Changes

Sleepiness-related cognitive-physiological changes—such as decreased muscle tension or reduced body temperature—can indirectly influence voice characteristics according to the following stages of speech production (O'Shaughnessy, 2000).

1. Cognitive speech planning: reduced cognitive processing speed (central slowing hypothesis; Bratzke et al., 2007) → impaired speech planning (Levelt, Roelofs, & Meyer, 1999) and impaired neuromuscular motor coordination processes (psychomotor slowing; Dinges & Kribbs, 1991) → impaired fine motor control and slowed articulator movement → slackened articulation and slowed speech.

2. Respiration: decreased muscle tension → flat and slow respiration → reduced subglottal pressure → lower fundamental frequency, intensity, articulatory precision, and rate of articulation.

3. Phonation: decreased muscle tension → increased vocal fold elasticity and decreased vocal fold tension; decreased body temperature → changed viscoelasticity of the vocal folds → shift in the spectral energy distribution; breathy and lax voice → nonraised larynx → decreased resonance frequency (formant) positions and broadened formant bandwidths.

4. Articulation/resonance: decreased muscle tension → unconstricted pharynx and softening of vocal tract walls → energy loss of the speech signal → broader formant bandwidths; postural changes → lowered upper body and lowered head → changed vocal tract shape → changed formant positions; increased salivation → energy loss; decreased body temperature → reduced heat conduction, changed friction between vocal tract walls and air, changed laminar flows, jet streams, and turbulences → energy loss → shift in the spectral energy distribution, increase in formant frequencies, especially in the lower formants.

5. Radiation: decreased orofacial movement, facial expression, and lip spreading (relaxed open mouth display; Kienast & Sendlmeier, 2000; Tartter, 1980) → lengthening of the vocal tract → lower first and second formant positions; reduction of articulatory effort → smaller opening degree → slackened articulation → decreased first formant; oropharyngeal relaxation → lowered velum → coupling of the nasal cavity → increased nasality → broadened Formant 1 bandwidth and smaller Formant 1 amplitude.

These changes—summarized in the cognitive-physiological mediator model of sleepiness-induced speech changes (Krajewski, 2008)—are based on educated guesses. In spite of the partially vague model predictions

Correspondence: J. Krajewski, [email protected]. © 2009 The Psychonomic Society, Inc.
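With a feature space of this size, the correlation-filter-based subset selection mentioned in the abstract becomes the natural step before classifier training. The sketch below is a hedged toy version on synthetic data (function and variable names are my own, not the study's): each feature is ranked by the absolute Pearson correlation of its values with the binary sleepiness label, and only the top-k features are kept.

```python
# Toy sketch of correlation-filter feature subset selection: rank features by
# |Pearson r| with the binary sleepiness label and keep the top k columns.
# Data and names are synthetic/illustrative, not the study's.
import numpy as np

def correlation_filter(X, y, k):
    """Indices of the k columns of X with the largest |Pearson r| against y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / (
        np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12)
    return np.argsort(-np.abs(r))[:k]

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100).astype(float)   # 0 = alert, 1 = sleepy
X = rng.normal(size=(100, 20))                   # 20 candidate features
X[:, 3] += 2.0 * y                               # plant one informative feature
selected = correlation_filter(X, y, k=5)
assert 3 in selected                             # the planted feature survives
```

A classifier (the paper's best model was a support vector machine) would then be trained on the reduced matrix `X[:, selected]` rather than on the full feature space.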