
contributed articles

DOI:10.1145/2818990

Fusing information from multiple biometric traits enhances security in mobile devices.

BY MIKHAIL I. GOFMAN, SINJINI MITRA, TSU-HSIANG KEVIN CHENG, AND NICHOLAS T. SMITH

Multimodal Biometrics for Enhanced Mobile Device Security

MILLIONS OF MOBILE devices are stolen every year, along with associated credit card numbers and other secure and personal information stored therein. Over the years, criminals have learned to crack passwords and fabricate biometric traits and have conquered practically every kind of user-authentication mechanism designed to stop them from accessing device data. Stronger mobile authentication mechanisms are clearly needed. Here, we show how multimodal biometrics promises untapped potential for protecting consumer mobile devices from unauthorized access: an authentication approach based on multiple physical and behavioral traits like face and voice. Although multimodal biometrics are deployed in homeland security, military, and law-enforcement applications,15,18 they are not yet widely integrated into consumer mobile devices. This can be attributed to implementation challenges and concern that consumers may find the approach inconvenient.

key insights

• Multimodal biometrics, or identifying people based on multiple physical and behavioral traits, is the next logical step toward more secure and robust biometrics-based authentication in mobile devices.

• The face-and-voice-based biometric system covered here, as implemented on a Samsung Galaxy S5 phone, achieves greater authentication accuracy in uncontrolled conditions, even with poorly lit face images and voice samples, than single-modality face and voice systems.

• Multimodal biometrics on mobile devices can be made user friendly for everyday consumers.

58 COMMUNICATIONS OF THE ACM | APRIL 2016 | VOL. 59 | NO. 4

IMAGE BY ANDRIJ BORYS ASSOCIATES/SHUTTERSTOCK

We also show multimodal biometrics can be integrated with mobile devices in a user-friendly manner and significantly improve their security. In 2015, we thus implemented a multimodal biometric system called Proteus at California State University, Fullerton, based on face and voice, on a Samsung Galaxy S5 phone, integrating new multimodal biometric authentication algorithms optimized for consumer-level mobile devices and an interface that allows users to readily record multiple biometric traits. Our experiments confirm it achieves considerably greater authentication accuracy than systems based solely on face or voice alone. The next step is to integrate other biometrics (such as fingerprint and iris scans) into the system. We hope our experience encourages researchers and mobile-device manufacturers to pursue the same line of innovation.

Biometrics
Biometrics-based authentication establishes identity based on physical and behavioral characteristics (such as face and voice), relieving users from having to create and remember secure passwords. At the same time, it challenges attackers to fabricate human traits, a task that, though possible, is difficult in practice.21 These advantages continue to spur the adoption of biometrics-based authentication in smartphones and tablet computers.

Despite the arguable success of biometric authentication in mobile devices, several critical issues remain, including, for example, techniques for defeating the iPhone TouchID and Samsung Galaxy S5 fingerprint-recognition systems.2,26 Further, consumers continue to complain that modern mobile biometric systems lack robustness and often fail to recognize authorized users.4 To see how multimodal biometrics can help address these issues, we first examine their underlying causes.

The Mobile World
One major problem of biometric authentication in mobile devices is sample quality. A good-quality biometric sample—whether a photograph of a face, a voice recording, or a fingerprint scan—is critical for accurate identification.


For example, a low-resolution photograph of a face or a noisy voice recording can lead a biometric algorithm to incorrectly identify an impostor as a legitimate user, or "false acceptance." Likewise, it can cause the algorithm to declare a legitimate user an impostor, or "false rejection." Capturing high-quality samples in mobile devices is especially difficult for two main reasons. First, mobile users capture biometric samples in a variety of environmental conditions; factors influencing these conditions include insufficient lighting, different poses, varying camera angles, and background noise. Second, biometric sensors in consumer mobile devices often trade sample quality for portability and lower cost; for example, the dimensions of an Apple iPhone's TouchID fingerprint scanner prohibit it from capturing the entire finger, making it easier to circumvent.4

Another challenge is training the biometric system to recognize the device user. The training process is based on extracting discriminative features from a set of user-supplied biometric samples. Increasing the number and variability of training samples increases identification accuracy. In practice, however, most consumers likely train their systems with few samples of limited variability for reasons of convenience. Multimodal biometrics is the key to addressing these challenges.

Promise of Multimodal Biometrics
Due to the presence of multiple pieces of highly independent identifying information (such as face and voice), multimodal systems can address the security and robustness challenges confronting today's mobile unimodal systems13,18 that identify people based on a single biometric characteristic. Moreover, deploying multimodal biometrics on existing mobile devices is practical; many of them already support face, voice, and fingerprint recognition. What is needed is a robust, user-friendly approach for consolidating these technologies. Multimodal biometrics in consumer mobile devices deliver multiple benefits.

Increased security. Attackers can defeat unimodal biometric systems by spoofing the single biometric modality used by the system. Establishing identity based on multiple modalities challenges attackers to simultaneously spoof multiple independent human traits—a significantly tougher challenge.21

More robust mobile authentication. When using multiple biometrics, one biometric modality can compensate for variations and quality deficiencies in the others; for example, Proteus assesses face-image and voice-recording quality and lets the highest-quality sample have greater impact on the identification decision.

Likewise, multimodal biometrics can simplify the device-training process. Rather than provide many training samples from one modality (as they often must do in unimodal systems), users can provide fewer samples from multiple modalities. This identifying information can be consolidated to ensure sufficient training data for reliable identification.

A market ripe with opportunities. Despite the recent popularity of biometric authentication in consumer mobile devices, multimodal biometrics have had limited penetration in the mobile consumer market.1,15 This can be attributed to the concern users could find it inconvenient to record multiple biometrics. Multimodal systems can also be more difficult to design and implement than unimodal systems. However, as we explain, these problems are solvable. Companies like Apple and Samsung have invested significantly in integrating biometric sensors (such as cameras and fingerprint readers) into their products. They can thus deploy multimodal biometrics without substantially increasing their production costs. In return, they profit from enhanced device sales due to increased security and robustness. In the following sections we discuss how to achieve such profitable security.

Fusing Face and Voice Biometrics
To illustrate the benefits of multimodal biometrics in consumer mobile devices, we implemented Proteus based on face and voice biometrics, choosing these modalities because most mobile devices have the cameras and microphones needed for capturing them. Here, we provide an overview of face- and voice-recognition techniques, followed by an exploration of the approaches we used to reconcile them.

Face and voice recognition. We used the face-recognition technique known as FisherFaces3 in Proteus, as it works well in situations where images are captured under varying conditions, as expected in the case of face images obtained through mobile devices.

Figure 1. Schematic diagram illustrating the Proteus quality-based score-level fusion scheme.

[Figure 1 depicts the pipeline: luminosity, sharpness, and contrast of the face image yield face-quality score Q1, and the SNR of the denoised voice signal yields voice-quality score Q2; match-score generation and normalization produce S1 and S2; the weight-assignment module derives weights w1 and w2 from the quality scores and training averages t1 and t2; and the decision rule grants access when the weighted sum S1*w1 + S2*w2 meets the minimum-accept match threshold T.]
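The flow in Figure 1 can be sketched end-to-end in a few lines. This is a minimal illustration, not the authors' implementation: the quality-metric definitions (mean intensity for luminosity, mean gradient magnitude for sharpness, intensity spread for contrast), the metric ranges, and the percent-proximity formula are assumptions standing in for the approaches cited later in the text, and the match scores are taken as already normalized.

```python
import numpy as np

def minmax(v, lo, hi):
    """Min-max normalize a raw metric into [0, 1], clipping outliers."""
    return float(np.clip((v - lo) / (hi - lo), 0.0, 1.0))

def face_quality(gray):
    """Average of normalized luminosity, sharpness, and contrast.
    `gray` is a 2-D uint8 face image; the metric ranges are illustrative."""
    g = gray.astype(np.float64)
    luminosity = minmax(g.mean(), 0.0, 255.0)        # average pixel intensity
    gy, gx = np.gradient(g)
    sharpness = minmax(np.hypot(gx, gy).mean(), 0.0, 64.0)  # edge strength
    contrast = minmax(g.std(), 0.0, 128.0)           # intensity spread
    return (luminosity + sharpness + contrast) / 3.0

def proximity(q, t):
    """Percent proximity of test quality q to training average t (1.0 = identical)."""
    return float(np.clip(1.0 - abs(q - t) / max(t, 1e-9), 0.0, 1.0))

def decide(s1, s2, q1, q2, t1, t2, threshold):
    """Weighted-sum rule: wi = pi / (p1 + p2); accept iff S1*w1 + S2*w2 >= T."""
    p1, p2 = proximity(q1, t1), proximity(q2, t2)
    w1, w2 = p1 / (p1 + p2), p2 / (p1 + p2)
    return s1 * w1 + s2 * w2 >= threshold, (w1, w2)

# Face quality close to the training average, voice quality far from it:
# the face match score dominates the decision.
q1 = face_quality(np.full((64, 64), 128, dtype=np.uint8))  # flat toy image
accepted, (w1, w2) = decide(s1=0.9, s2=0.2, q1=q1, q2=0.30,
                            t1=q1, t2=0.75, threshold=0.5)
print(accepted, round(w1, 2))
```

The weight for the voice modality drops because its test quality (0.30) is far from its training average (0.75), so the higher-quality face sample carries the decision.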

FisherFaces uses pixel intensities in the face images as identifying features. In the future, we plan to explore other face-recognition techniques, including Gabor wavelets6 and Histograms of Oriented Gradients (HOG).5

We used two approaches for voice recognition: Hidden Markov Models (HMMs) based on Mel-Frequency Cepstral Coefficients (MFCCs) as voice features,10 the basis of our score-level fusion scheme; and Linear Discriminant Analysis (LDA),14 the basis for our feature-level fusion scheme. Both approaches recognize a user's voice independent of the phrases spoken.

Assessing face and voice sample quality. Assessing biometric sample quality is important for ensuring the accuracy of any biometric-based authentication system, particularly for mobile devices, as discussed earlier. Proteus thus assesses facial image quality based on luminosity, sharpness, and contrast, while voice-recording quality is based on signal-to-noise ratio (SNR). These classic quality metrics are well documented in the biometrics research literature.1,17,24 We plan to explore other promising metrics, including face orientation, in the future.

Proteus computes the average luminosity, sharpness, and contrast of a face image based on the intensity of the constituent pixels using approaches described in Nasrollahi and Moeslund.17 It then normalizes each quality measure using the min-max normalization method to lie between [0, 1], finally computing their average to obtain a single quality score for the face image. One interesting problem here is determining the impact each quality metric has on the final face-quality score; for example, if the face image is too dark, then poor luminosity would have the greatest impact, as the absence of light would be the most significant impediment to recognition. Likewise, in a well-lit image distorted due to motion blur, sharpness would have the greatest impact.

SNR is defined as the ratio of the voice signal level to the level of background noise signals. To obtain a voice-quality score, Proteus adapts the probabilistic approach described in Vondrasek and Pollak25 to estimate the voice and noise signals, then normalizes the SNR value to the [0, 1] range using min-max normalization.

Multimodal biometric fusion. In multimodal biometric systems, information from different modalities can be consolidated, or fused, at the following levels:21

Feature. Either the data or the feature sets originating from multiple sensors and/or sources are fused;

Match score. The match scores generated from multiple trait-matching algorithms pertaining to the different biometric modalities are combined; and

Decision. The final decisions of multiple matching algorithms are consolidated into a single decision through techniques like majority voting.

Biometric researchers believe integrating information at earlier stages of processing (such as at the feature level) is more effective than having integration take place at a later stage (such as at the score level).20

Multimodal Mobile Biometrics Framework
Proteus fuses face and voice biometrics at either the score or the feature level. Since decision-level fusion typically produces only limited improvement,21 we did not pursue it when developing Proteus.

Proteus does its training and testing with videos of people holding a phone camera in front of their faces while speaking a certain phrase. From each video, the face is detected through the Viola-Jones algorithm24 and the system extracts the soundtrack. The system de-noises all sound frames to remove frequencies outside the human voice range (85Hz–255Hz) and drops frames without voice activity. It then uses the results as inputs into our fusion schemes.
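The audio side of this preprocessing can be sketched as follows; face detection itself would use an off-the-shelf Viola-Jones cascade (for example, OpenCV's CascadeClassifier), so only the audio path is shown. The frame size, sample rate, and energy-based voice-activity test are illustrative assumptions; the article does not specify the exact de-noising or activity-detection method.

```python
import numpy as np

RATE = 8000   # sample rate (Hz); illustrative
FRAME = 160   # 20 ms frames at 8 kHz

def band_limit(frame, rate=RATE, lo=85.0, hi=255.0):
    """Zero all FFT bins outside the 85Hz-255Hz voice band."""
    spec = np.fft.rfft(frame)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / rate)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(frame))

def voiced_frames(signal, energy_floor=1e-4):
    """Split into frames, band-limit each, and keep only frames with enough
    in-band energy (a simple energy-based stand-in for voice-activity detection)."""
    frames = [signal[i:i + FRAME] for i in range(0, len(signal) - FRAME + 1, FRAME)]
    kept = []
    for f in frames:
        g = band_limit(f)
        if np.mean(g ** 2) > energy_floor:
            kept.append(g)
    return kept

t = np.arange(FRAME) / RATE
tone_in_band = np.sin(2 * np.pi * 150 * t)    # 150 Hz: inside the voice band
tone_out_band = np.sin(2 * np.pi * 2000 * t)  # 2 kHz: outside, filtered away
signal = np.concatenate([tone_in_band, tone_out_band])
print(len(voiced_frames(signal)))  # only the in-band frame survives
```

The frame containing only out-of-band energy is dropped entirely, mirroring the article's removal of frames without voice activity.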


Score-level fusion scheme. Figure 1 outlines our score-level fusion approach, integrating face and voice biometrics. The contribution of each modality's match score toward the final decision concerning a user's authenticity is determined by the respective sample quality. Proteus works as outlined in the following paragraphs.

Let t1 and t2, respectively, denote the average face- and voice-quality scores of the training samples from the user of the device. Next, from a test-video sequence, Proteus computes the quality scores Q1 and Q2 of the two biometrics, respectively. These four parameters are then passed to the system's weight-assignment module, which computes weights w1 and w2 for the face and voice modalities, respectively. Each wi is calculated as wi = pi / (p1 + p2), where p1 and p2 are the percent proximities of Q1 to t1 and Q2 to t2, respectively. The system requests users train mostly through good-quality samples, as discussed later, so close proximity of the testing-sample quality to that of the training samples is a sign of a good-quality testing image. Greater weight is thus assigned to the modality with the higher-quality sample, ensuring effective integration of quality in the system's final authentication process.

The system then computes and normalizes matching scores S1 and S2 from the respective face- and voice-recognition algorithms applied to the test images through z-score normalization. We chose this particular method because it is a commonly used normalization method, easy to implement, and highly efficient.11 However, we wish to experiment with more robust methods (such as the tanh and sigmoid functions) in the future. The system then computes the overall match score for the fusion scheme using the weighted-sum rule as M = S1w1 + S2w2. If M ≥ T (T is the pre-selected threshold), the system will accept the user as authentic; otherwise, it declares the user to be an impostor.

Discussion. The scheme's effectiveness is expected to be greatest when t1 = Q1 and t2 = Q2. However, the system must exercise caution here to ensure significant representation of both modalities in the fusion process; for example, if Q2 differs greatly from t2 while Q1 is close to t1, the authentication process is dominated by the face modality, thus reducing the process to an almost unimodal scheme based on the face biometric. A mandated benchmark is thus required for each quality score to ensure the fusion-based authentication procedure does not grant access for a user if the benchmark for each score is not met. Without such benchmarks, the whole authentication procedure could be exposed to the risk of potential fraudulent activity, including deliberate attempts to alter the quality score of a specific biometric modality. The system must thus ensure the weight of each modality does not fall below a certain threshold so the multimodal scheme remains viable.

In 2014, researchers at IBM proposed a score-level fusion scheme based on face, voice, and signature biometrics for iPhones and iPads.1 Their implementation considered only the quality of voice recordings, not face images, and is distinctly different from our approach, which incorporates the quality of both modalities. Further, because their goal was secure sign-in into a remote server, they outsourced the majority of computational tasks to the target server; Proteus performs all computations directly on the mobile device itself. To get its algorithm to scale to the constrained resources of the device, Proteus had to be able to shrink the size of face images to prevent the algorithm from exhausting the available device memory. Finally, Aronowitz et al.1 used multiple facial features (such as HOG and LBP) that, though arguably more robust than FisherFaces, can be prohibitively slow when executed locally on a mobile device; we plan to investigate using multiple facial features in the future.

Feature-level fusion scheme. Most multimodal feature-level fusion schemes assume the modalities to be fused are compatible (such as in Kisku et al.12 and in Ross and Govindarajan20); that is, the features of the modalities are computed in a similar fashion, based on, say, distance. Fusing face and voice modalities at the feature level is challenging because these two biometrics are incompatible: face features are pixel intensities and voice features are MFCCs. Another challenge for feature-level fusion is the curse of dimensionality arising when the fused feature vectors become excessively large. We addressed both challenges through the LDA approach. In addition, we observed LDA required less training data than neural networks and HMMs, with which we have experimented.
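One way to reconcile the incompatible feature types just discussed is to standardize each modality before concatenating, then compare fused vectors by distance. The sketch below makes several assumptions: the dimensions, training statistics, and acceptance threshold are illustrative, and the LDA selection that Proteus then applies to the fused vector is omitted for brevity.

```python
import numpy as np

def zscore(x, mean, std):
    """Standardize one modality's features so units become comparable."""
    return (x - mean) / np.where(std == 0.0, 1.0, std)

def voice_features(mfcc_matrix):
    """Collapse a frames-by-coefficients MFCC matrix to its column means."""
    return mfcc_matrix.mean(axis=0)

def fuse(face, voice, stats):
    """Concatenate z-scored face and voice features into one N+M vector."""
    f = zscore(face, stats["f_mean"], stats["f_std"])
    v = zscore(voice, stats["v_mean"], stats["v_std"])
    return np.concatenate([f, v])

def authenticate(enrolled, probe, threshold):
    """Accept when the Euclidean distance between fused vectors is small."""
    return bool(np.linalg.norm(enrolled - probe) <= threshold)

rng = np.random.default_rng(0)
face = rng.random(8)                 # N = 8 toy face features
mfccs = rng.random((30, 4))          # 30 frames x 4 MFCC coefficients (M = 4)
voice = voice_features(mfccs)
stats = {"f_mean": 0.5, "f_std": 0.29,   # assumed training-set statistics
         "v_mean": 0.5, "v_std": 0.18}
enrolled = fuse(face, voice, stats)
probe = fuse(face + 0.01, voice - 0.01, stats)   # near-identical test sample
print(len(enrolled), authenticate(enrolled, probe, threshold=0.5))
```

A probe close to the enrolled sample passes the distance test; a distant one does not.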


Figure 2. Linear discriminant analysis-based feature-level fusion.

[Figure 2 depicts the pipeline: Principal Component Analysis (PCA) extracts face features from the face image; the denoised voice signal yields MFCC voice features; both feature sets are normalized and fused through LDA; and access is granted if the fusion score meets the minimum-accept match threshold T (if score ≥ T, grant; else deny).]
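The LDA reduction shown in Figure 2 can be illustrated with a two-class Fisher discriminant separating genuine from impostor samples. This is a deliberate simplification: the scheme described in this article uses LDA for multi-class feature selection over the fused vectors, and all data below is synthetic.

```python
import numpy as np

def fisher_direction(genuine, impostor):
    """Two-class Fisher LDA: the direction maximizing between-class
    separation relative to within-class scatter."""
    mu_g, mu_i = genuine.mean(axis=0), impostor.mean(axis=0)
    sw = np.cov(genuine, rowvar=False) + np.cov(impostor, rowvar=False)
    sw += 1e-6 * np.eye(sw.shape[0])        # regularize for invertibility
    w = np.linalg.solve(sw, mu_g - mu_i)
    return w / np.linalg.norm(w)

rng = np.random.default_rng(2)
genuine = rng.normal([2.0, 0.0], 0.3, size=(50, 2))   # toy fused features
impostor = rng.normal([0.0, 0.0], 0.3, size=(50, 2))
w = fisher_direction(genuine, impostor)
# Projections of the two classes separate cleanly along w.
print(round((genuine @ w).mean() - (impostor @ w).mean(), 1))
```

Projecting fused vectors onto such discriminant directions is what lets a distance threshold in the reduced space distinguish the legitimate user from impostors.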

The process (see Figure 2) works like this:

Phase 1 (face feature extraction). The Proteus algorithm applies Principal Component Analysis (PCA) to the face feature set to perform feature selection;

Phase 2 (voice feature extraction). It extracts a set of MFCCs from each preprocessed audio frame and represents them in matrix form, where each row is used for each frame and each column for each MFCC index. To reduce the dimensionality of the MFCC matrix, it uses the column means of the matrix as its voice feature vector;

Phase 3 (fusion of face and voice features). Since the algorithm measures face and voice features using different units, it standardizes them individually through the z-score normalization method, as in score-level fusion. The algorithm then concatenates these normalized features to form one big feature vector. If there are N face features and M voice features, it will have a total of N + M features in the concatenated, or fused, set. The algorithm then uses LDA to perform feature selection from the fused feature set. This helps address the curse-of-dimensionality problem by removing irrelevant features from the combined set; and

Phase 4 (authentication). The algorithm uses Euclidean distance to determine the degree of similarity between the fused feature sets from the training data and each test sample. If the distance value is less than or equal to a predetermined threshold, it accepts the test subject as a legitimate user. Otherwise, the subject is declared an impostor.

Implementation
We implemented our quality-based score-level and feature-level fusion approaches on a randomly selected Samsung Galaxy S5 phone. User friendliness and execution speed were our guiding principles.

User interface. Our first priority when designing the interface was to ensure users could seamlessly capture face and voice biometrics simultaneously. We thus adopted a solution that asks users to record a short video of their faces while speaking a simple phrase. The prototype of our graphical user interface (GUI) (see Figure 3) gives users real-time feedback on the quality metrics of their face and voice, guiding them to capture the best-quality samples possible; for example, if the luminosity in the video differs significantly from the average luminosity of images in the training database, the user may get a prompt saying, "Suggestion: Increase lighting."

Figure 3. The GUI used to interact with Proteus.

In addition to being user friendly, the video also facilitates integration of other security features (such as liveness checking7) and correlation of lip movement with speech.8

To ensure fast authentication, the Proteus face- and voice-feature extraction algorithms are executed in parallel on different processor cores; the Galaxy S5 has four cores. Proteus also uses similar parallel-programming techniques to help ensure the GUI's responsiveness.

Security of biometric data. The greatest risk from storing biometric data on a mobile device (Proteus stores data from multiple biometrics) is the possibility of attackers stealing it and using it to impersonate a legitimate user. It is thus imperative that Proteus stores and processes the biometric data securely.

The current implementation stores only MFCCs and PCA coefficients in the device persistent memory, not raw biometric data, from which deriving useful biometric data is nontrivial.16 Proteus can enhance security significantly by using cancelable biometric templates19 and encrypting, storing, and processing biometric data in Trusted Execution Environment tamper-proof hardware highly isolated from the rest of the device software and hardware; the Galaxy S5 uses this approach to protect fingerprint data.22


Storing and processing biometric data on the mobile device itself, rather than offloading these tasks to a remote server, eliminates the challenge of securely transmitting the biometric data and authentication decisions across potentially insecure networks. In addition, this approach alleviates consumers' concern regarding the security, privacy, and misuse of their biometric data in transit to and on remote systems. Mechanisms enabling secure storage and processing of biometric data must therefore be in place.

Performance Evaluation
We compared Proteus recognition accuracy to that of unimodal systems based on face and voice biometrics. We measured accuracy using the standard equal error rate (EER) metric, the value where the false acceptance rate (FAR) and the false rejection rate (FRR) are equal.

Database. For our experiments, we created CSUF-SG5, a homegrown multimodal database of face and voice samples collected from California State University, Fullerton, students, employees, and individuals from outside the university using the Galaxy S5 (hence the name). To incorporate various types and levels of variations and distortions in the samples, we collected them in a variety of real-world settings. Given such a diverse database of multimodal biometrics is unavailable, we plan to make our own publicly available. The database today includes video recordings of 54 people of different genders and ethnicities holding a phone camera in front of their faces while speaking a certain simple phrase.

The faces in these videos show the following types of variations:

Five expressions. Neutral, happy, sad, angry, and scared;

Three poses. Frontal and sideways (left and right); and

Two illumination conditions. Uniform and partial shadows.

The voice samples show different levels of background noise, from car traffic to music to people chatter, coupled with distortions in the voice itself (such as raspiness). We used 20 different popular phrases, including "Roses are red," "Football," and "13."

Results. In our experiments, we trained the Proteus face, voice, and fusion algorithms using videos from half of the subjects in our database (27 subjects out of a total of 54), while we considered all subjects for testing. We collected most of the training videos in controlled conditions with good lighting and low background-noise levels and with the camera held directly in front of the subject's face. For these subjects, we also added a few face and voice samples from videos of less-than-ideal quality (to simulate the limited variation of training samples a typical consumer would be expected to provide) to increase the algorithm's chances of correctly identifying the user in similar conditions. Overall, we used three face frames and five voice recordings per subject (extracted from video) as training samples. We performed the testing through a randomly selected face-and-voice sample from a subject we selected randomly from among the 54 subjects in the database, leaving out the training samples. Overall, we created and used 480 training and test-set combinations and averaged their EERs and testing times. We undertook this statistical cross-validation approach to assess and validate the effectiveness of our proposed approaches based on the available database of 54 subjects.

Quality-based score-level fusion. Table 1 lists the average EERs and testing times from the unimodal and multimodal schemes. We explain the high EER of our HMM voice-recognition algorithm by the complex noise signals in many of our samples, including traffic, people chatter, and music, that were difficult to detect and eliminate. Our quality-based score-level fusion scheme detected low SNR levels and compensated by adjusting weights in favor of the face images, which were of substantially better quality. By adjusting weights in favor of face images, the face biometric thus had a greater impact than the voice biometric on the final decision of whether or not a user is legitimate. For the scenario contrasting with that in Table 1, where the voice samples were of relatively better quality than the face samples, the EERs were 21.25% and 20.83% for unimodal voice and score-level fusion, respectively.

Table 1. EER results from score-level fusion.

Modality             EER       Testing Time (sec.)
Face                 27.17%    0.065
Voice                41.44%    0.045
Score-level fusion   25.70%    0.108

These results are promising, as they show the quality of the different modalities can vary depending on the circumstances in which mobile users might find themselves. They also show Proteus adapts to different conditions by scaling the quality weights appropriately. With further refinements (such as more robust normalization techniques), the multimodal method can yield even better accuracy.

Feature-level fusion. Table 2 outlines our performance results from the feature-level fusion scheme, showing feature-level fusion yielded significantly greater accuracy in authentication compared to the unimodal schemes.

Table 2. EER results from feature-level fusion.

Modality              EER       Testing Time (sec.)
Face                  4.29%     0.13
Voice                 34.72%    1.42
Feature-level fusion  2.14%     1.57
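The EER figures reported in the tables above can be computed from sets of genuine and impostor match scores as the operating point where FAR and FRR cross. A minimal sketch with synthetic scores (the score distributions are invented for illustration):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Scan candidate thresholds for the point where FAR (impostors
    accepted) and FRR (genuine users rejected) are closest; return the EER."""
    best = (1.0, 0.0)  # (|FAR - FRR| gap, EER estimate)
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # impostor scores passing threshold
        frr = np.mean(genuine < t)     # genuine scores failing threshold
        gap = abs(far - frr)
        if gap < best[0]:
            best = (gap, (far + frr) / 2.0)
    return best[1]

rng = np.random.default_rng(1)
genuine = rng.normal(0.7, 0.1, 1000)   # toy match scores for true users
impostor = rng.normal(0.4, 0.1, 1000)  # toy match scores for impostors
print(round(equal_error_rate(genuine, impostor), 3))
```

With these two overlapping score distributions the EER lands near 7%; better-separated distributions, like those the fusion schemes produce, drive it toward zero.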

Our experiments clearly reflect the potential of multimodal biometrics to enhance the accuracy of current unimodal biometrics-based authentication on mobile devices; moreover, judging by how quickly the system is able to identify a legitimate user, the Proteus approach is scalable to consumer mobile devices. This is the first attempt at implementing two types of fusion schemes on a modern consumer mobile device while tackling the practical issues of user friendliness. It is also just the beginning. We are working on improving the performance and efficiency of both fusion schemes, and the road ahead promises endless opportunity.

Conclusion
Multimodal biometrics is the next logical step in biometric authentication for consumer-level mobile devices. The challenge remains in making multimodal biometrics usable for consumers of mainstream mobile devices, but little work has sought to add multimodal biometrics to them. Our work is the first step in that direction.

Imagine a mobile device you can unlock through combinations of face, voice, fingerprints, ears, irises, and more. It reads all these biometrics in one step, similar to the iPhone's TouchID fingerprint system. This user-friendly interface utilizes underlying robust fusion logic based on biometric sample quality, maximizing the device's chance of correctly identifying its owner. Dirty fingers, poorly illuminated or loud settings, and damage to biometric sensors are no longer showstoppers; if one biometric fails, others function as backups. Hackers must now gain access to the many modalities required to unlock the device; because these are biometric modalities, they are possessed only by the legitimate owner of the device.

Mobile hardware does not yet support more sophisticated combinations of biometrics; for example, mainstream consumer mobile devices lack sensors capable of reliably acquiring iris and similar biometrics in a consumer-friendly manner. We are thus working on designing and building a device with efficient, user-friendly, inexpensive software and hardware to support such combinations. We plan to integrate new biometrics into our current fusion schemes, develop new, more robust fusion schemes, and design user interfaces allowing the seamless, simultaneous capture of multiple biometrics. Combining a user-friendly interface with robust multimodal fusion algorithms may well mark a new era in consumer mobile device authentication.

References
1. Aronowitz, H., Min, L., Toledo-Ronen, O., Harary, S., Geva, A., Ben-David, S., Rendel, A., Hoory, R., Ratha, N., Pankanti, S., and Nahamoo, D. Multimodal biometrics for mobile authentication. In Proceedings of the 2014 IEEE International Joint Conference on Biometrics (Clearwater, FL, Sept. 29–Oct. 2). IEEE Computer Society Press, 2014, 1–8.
2. Avila, C.S., Casanova, J.G., Ballesteros, F., Garcia, L.R.T., Gomez, M.F.A., and Sierra, D.S. State of the Art of Mobile Biometrics, Liveness and Non-Coercion Detection. Personalized Centralized Authentication System Project, Jan. 31, 2014; https://www.pcas-project.eu/images/Deliverables/PCAS-D3.1.pdf
3. Belhumeur, P.N., Hespanha, J.P., and Kriegman, D. Eigenfaces vs. FisherFaces: Recognition using class-specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence 19, 7 (July 1997), 711–720.
4. Bonnington, C. The trouble with Apple's Touch ID fingerprint reader. Wired (Dec. 3, 2013); http://www.wired.com/gadgetlab/2013/12/touch-id-issues-and-fixes/
5. Dalal, N. and Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (San Diego, CA, June 20–25). IEEE Computer Society Press, 2005, 886–893.
6. Daugman, J.G. Two-dimensional spectral analysis of cortical receptive field profiles. Vision Research 20, 10 (Dec. 1980), 847–856.
7. Devine, R. Face Unlock in Jelly Bean gets a 'liveness check.' AndroidCentral (June 29, 2012); http://www.androidcentral.com/face-unlock-jelly-bean-gets-liveness-check
8. Duchnowski, P., Hunke, M., Busching, D., Meier, U., and Waibel, A. Toward movement-invariant automatic lip-reading and speech recognition. In Proceedings of the 1995 International Conference on Acoustics, Speech, and Signal Processing (Detroit, MI, May 9–12).
13. Kuncheva, L.I., Whitaker, C.J., Shipp, C.A., and Duin, R.P.W. Is independence good for combining classifiers? In Proceedings of the 15th International Conference on Pattern Recognition (Barcelona, Spain, Sept. 3–7). IEEE Computer Society Press, 2000, 168–171.
14. Lee, C. Automatic recognition of animal vocalizations using averaged MFCC and linear discriminant analysis. Pattern Recognition Letters 27, 2 (Jan. 2006), 93–101.
15. M2SYS Technology. SecuredPass AFIS/ABIS Immigration and Border Control System; http://www.m2sys.com/automated-fingerprint-identification-system-afis-border-control-and-border-protection/
16. Milner, B. and Xu, S. Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model. In Proceedings of the INTERSPEECH Conference (Denver, CO, Sept. 16–20). International Speech Communication Association, Baixas, France, 2002.
17. Nasrollahi, K. and Moeslund, T.B. Face-quality assessment system in video sequences. In Proceedings of the Workshop on Biometrics and Identity Management (Roskilde, Denmark, May 7–9). Springer, 2008, 10–18.
18. Parala, A. UAE airports get multimodal security. FindBiometrics Global Identity Management (Mar. 13, 2015); http://findbiometrics.com/uae-airports-get-multimodal-security-23132/
19. Rathgeb, C. and Uhl, A. A survey on biometric cryptosystems and cancelable biometrics. EURASIP Journal on Information Security (Dec. 2011), 1–25.
20. Ross, A. and Govindarajan, R. Feature-level fusion of hand and face biometrics. In Proceedings of the Conference on Biometric Technology for Human Identification (Orlando, FL). International Society for Optics and Photonics, Bellingham, WA, 2005, 196–204.
21. Ross, A. and Jain, A. Multimodal biometrics: An overview. In Proceedings of the 12th European Signal Processing Conference (Sept. 6–10). IEEE Computer Society Press, 2004, 1221–1224.
22. Sacco, A. Fingerprint faceoff: Apple TouchID vs. Samsung Finger Scanner. Chief Information Officer (July 16, 2014); http://www.cio.com/article/2454883/consumer-technology/fingerprint-faceoffapple-touch-id-vs-samsung-finger-scanner.html
23. Tapellini, D.S. Smartphone thefts rose to 3.1 million last year. Consumer Reports finds industry solution falls short, while legislative efforts to curb theft continue. Consumer Reports (May 28, 2014); http://www.consumerreports.org/cro/news/2014/04/smart-phone-thefts-rose-to-3-1-million-last-year/index.htm
24. Viola, P. and Jones, M. Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Kauai, HI, Dec. 8–14). IEEE Computer Society Press, 2001.
25. Vondrasek, M. and Pollak, P. Methods for speech SNR estimation: Evaluation tool and analysis of VAD dependency. Radioengineering 14, 1 (Apr. 2005), 6–11.
26. Zorabedian, J. Samsung Galaxy S5 fingerprint reader hacked—It's the iPhone 5S all over again! Naked Security (Apr. 17, 2014); https://nakedsecurity.sophos.com/2014/04/17/samsung-galaxy-s5-fingerprint-hacked-iphone-5s-all-over-again/

Mikhail I. Gofman ([email protected]) is an assistant professor in the Department of Computer Science at California State University, Fullerton, and director of its Center for Cybersecurity.

Sinjini Mitra ([email protected]) is an assistant professor of information systems and decision sciences at California State University, Fullerton.
IEEE Computer Society Press, 1995, 109–112. Tsu-Hsiang Kevin Cheng ([email protected]) device also uses cancelable biomet- 9. Hansen, J.H.L. Analysis and compensation of speech is a Ph.D. student at Binghamton University, Binghamton, under stress and noise for environmental robustness ric templates, strong encryption, and NY, and was at California State University, Fullerton, in speech recognition. Speech Communication 20, 1 while doing the research reported in this article. the Trusted Execution Environment (Nov. 1996), 151–173. for securely storing and processing 10. Hsu, D., Kakade, S.M., and Zhang, T. A spectral algorithm for learning hidden Markov models. Journal Nicholas T. Smith ([email protected]) is all biometric data. of Computer and System Sciences 78, 5 (Sept. 2012), a software engineer in the advanced information The Proteus multimodal biomet- 1460–1480. technology department of the Boeing Company, 11. Jain, A.K., Nandakumar, K., and Ross, A. Score Huntington Beach, CA. rics scheme leverages the existing normalization in multimodal biometric systems. Pattern Recognition 38, 12 (Dec. 2005), 2270–2285. capabilities of mobile device hard- 12. Kisku, D.R., Gupta, P., and Sing, J.K. Feature-level ware (such as video recording), but fusion of biometrics cues: Human identification with Doddingtons Caricature. Security Technology (2009), mobile hardware and software are 157–164. Copyright held by authors. not equipped to handle more so- 13. Kuncheva, L.I., Whitaker, C.J., Shipp, C.A., and Duin, Publication rights licensed to ACM. $15.00
