ICIP 2016 COMPETITION ON MOBILE OCULAR BIOMETRIC RECOGNITION

Ajita Rattani Reza Derakhshani Sashi K. Saripalle† Vikas Gottemukkula†

University of Missouri-Kansas City, USA    †EyeVerify Inc., USA

ABSTRACT

With the unprecedented mobile technology revolution, a number of ocular biometric based personal recognition schemes have been proposed for mobile use cases. The aim of this competition is to evaluate and compare the performance of mobile ocular biometric recognition schemes in visible light on a large scale database (VISOB Dataset ICIP2016 Challenge Version) using standard evaluation methods. Four different teams from universities across the world participated in this competition, submitting five algorithms altogether. The submitted algorithms applied different texture analysis in a learning or a non-learning based framework for ocular recognition. The best results were obtained by a team from the Norwegian Biometrics Laboratory (NTNU, Norway), achieving an Equal Error Rate of 0.06% over a quarantined test set.

Index Terms— Mobile Biometrics, Ocular Biometrics, VISOB Dataset ICIP2016 Challenge Version, Visible Spectrum, Eye Image Classification

1. INTRODUCTION

With increasing functionality and services accessible via mobile phones, the industry has turned its focus to the integration of biometric technologies in mobile phones as a convenient method of verifying the identity of a person accessing mobile services. The use of biometric techniques on mobile devices has been referred to as mobile biometrics [1, 2, 3], which encompasses the sensors that acquire biometric signals and the software algorithms for their verification.¹

According to an Acuity Market Intelligence forecast², mobile biometric revenue is expected to surpass 33 billion dollars by 2020, not just for unlocking the device but also to approve payments and as a part of multi-factor authentication services.
Consequently, recent research has focused on developing biometric recognition schemes tailored for the mobile environment.

In this context, mobile ocular biometrics has gained increased attention from the research community [4]. It comprises scanning regions in and around the eye, i.e., the iris, the conjunctival and episcleral vasculature³ [5], and the periocular region [6], for personal recognition. Textural descriptors (such as LBP, LQP and BSIF) and image keypoint and patch descriptors (such as SIFT and SURF) have mostly been used, either in a learning or a non-learning based framework, for identity verification in mobile ocular biometrics [6, 7, 3]. However, the state of the art in mobile ocular biometric recognition is nascent. As such, many of the earlier mobile ocular biometric recognition algorithms did not have acceptable error rates, especially when tested under challenging mobile use cases. Further, very few mobile ocular databases, such as MICHE [1] and VSSIRIS [2], have been publicly available for research and development. Moreover, the relatively low number of subjects in the aforesaid datasets limits the statistical power of the ensuing calculations.

Thus, to facilitate advancement of research in the field of mobile ocular biometrics in the visible wavelength:

• We collected a large scale publicly available Visible Light Mobile Ocular Biometric dataset (VISOB Dataset ICIP2016 Challenge Version) [8] comprising eye images captured from 550 subjects using the front facing (selfie) cameras of three different mobile devices, namely Oppo N1 (13 MP, autofocus), Samsung Galaxy Note 4 (3.7 MP, fixed focus) and iPhone 5s (1.2 MP, fixed focus). This dataset presents possible intra-class variations due to the nature of mobile front facing cameras and everyday mobile biometric use cases, such as out-of-focus images, occlusions due to prescription glasses, different illumination conditions, gaze deviations, eye-makeup (i.e., eye liner and mascara), specular reflections, and motion blur.

• Further, we conducted an international competition on the VISOB Dataset ICIP2016 Challenge Version for large scale evaluation of mobile ocular recognition algorithms by different research groups from around the world. The competition evaluated the performance of submitted algorithms over a quarantined portion of the dataset that was not available to the participants.

¹The terms recognition and verification have been used interchangeably.
²http://www.acuity-mi.com/GBMR Report.php
³These conjunctival and episcleral vascular patterns seen on the white of the eye, or sclera, have sometimes been mistakenly ascribed to the sclera itself, which is avascular.

This competition, besides benchmarking the performance of submissions over the VISOB Dataset ICIP2016 Challenge Version, fosters independent validation of the algorithms and future research and development by the academic community.

Four universities and an industry participant submitted five algorithms to this competition. The participants include the Norwegian Biometrics Laboratory, Norwegian University of Science and Technology (NTNU), Norway; Australian National University (ANU), Australia; Indian Institute of Information Technology Guwahati (IIITG), India and IBM Research India; and an anonymous team (anonymized per the participant's request).

This paper is organized as follows: In section 2, we briefly review the database and evaluation protocol used for the competition. Section 3 briefly describes all the participating algorithms. We discuss the consolidated results in section 4. Conclusions are drawn in section 5.

Fig. 1. Sample eye images from the VISOB Dataset ICIP2016 Challenge Version [8] containing variations such as (a) light and (b) dark irides, (c) reflection, and (d) imaging artifact.

Table 1. Characteristics of the enrollment and validation sets of Visit 1 and Visit 2 of the VISOB Dataset ICIP2016 Challenge Version, used by the participants and the organizers, respectively.

           Mobile Device   Enrollment Set   Validation Set
                           (# of images)    (# of images)
  Visit 1  iPhone          14077            13208
           Oppo            21976            21349
           Samsung         12197            12240
  Visit 2  iPhone          12222            11740
           Oppo            10438            9857
           Samsung         9284             9548

2. DATABASE AND PROTOCOL

2.1. VISOB Dataset ICIP2016 Challenge Version

The Visible Light Mobile Ocular Biometric (VISOB) Dataset ICIP2016 Challenge Version [8] is a publicly available database consisting of eye images from 550 healthy adult volunteers, acquired using three different mobile devices, i.e., iPhone 5s, Samsung Note 4 and Oppo N1. The iPhone was set to capture bursts of still images at 720p resolution, while the Samsung and Oppo devices captured bursts of still images at 1080p resolution using binning. Volunteers' data were collected during two visits (Visit 1 and Visit 2), 2 to 4 weeks apart. At each visit, volunteers were asked to take selfie-like captures using the front facing cameras of the aforementioned three mobile devices in two different sessions (Session 1 and Session 2) that were about 10 to 15 minutes apart. The volunteers used the mobile phones naturally, holding the devices 8 to 12 inches from their faces. For each session, a number of images were captured under three lighting conditions: regular office light, dim light (office lights off but dim ambient lighting still present), and natural daylight settings (next to large sunlit windows). The collected database was preprocessed to crop and retain only the eye regions, of size 240 × 160 pixels, using a Viola-Jones based eye detector. Figure 1 shows sample eye images from the VISOB Dataset ICIP2016 Challenge Version [8] exhibiting variations such as light and dark irides, reflection, make-up and imaging artifacts.

2.2. Protocol

The Visit 1 subset of the dataset, containing the corresponding Session 1 and Session 2 (550 subjects with about 12 samples per subject), was made available to the participants. Participants were instructed to use Session 1 for training (enrollment) and Session 2 for validation of their algorithms. We used the data belonging to Visit 2 (captured at least 2 weeks after the Visit 1 collection) from about 290 subjects, with 12 samples per subject, for the evaluation of the executables submitted by the participants. In our evaluation, Session 1 of Visit 2 was used for enrollment and its Session 2 was used for performance evaluation. Table 1 shows the total number of images in the VISOB Dataset ICIP2016 Challenge Version subsets used by the participants (Visit 1) and by the organizers (Visit 2) for enrollment and evaluation of the submitted algorithms.

The performance evaluation was done using a standard biometric evaluation metric, i.e., the Equal Error Rate (EER), which is the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR).
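For reference, the EER can be estimated from sets of genuine (same-subject) and impostor (different-subject) comparison scores by sweeping a decision threshold and locating the point where FAR and FRR cross. The following minimal NumPy sketch illustrates that calculation; the synthetic score arrays and the convention that higher scores indicate better matches are illustrative assumptions, not part of the competition's evaluation software.

```python
import numpy as np

def compute_eer(genuine_scores, impostor_scores):
    """Estimate the Equal Error Rate (EER) from match scores.

    Assumes higher scores indicate a better match. Returns the EER
    (as a fraction) and the threshold at which FAR and FRR are closest.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)

    # Candidate thresholds: every observed score.
    thresholds = np.sort(np.concatenate([genuine, impostor]))

    # FRR: fraction of genuine scores rejected (below threshold).
    # FAR: fraction of impostor scores accepted (at or above threshold).
    frr = np.array([(genuine < t).mean() for t in thresholds])
    far = np.array([(impostor >= t).mean() for t in thresholds])

    # EER is where the two error curves cross.
    idx = np.argmin(np.abs(far - frr))
    eer = (far[idx] + frr[idx]) / 2.0
    return eer, thresholds[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    genuine = rng.normal(0.8, 0.1, 1000)   # synthetic genuine scores
    impostor = rng.normal(0.4, 0.1, 1000)  # synthetic impostor scores
    eer, thr = compute_eer(genuine, impostor)
    print(f"EER = {eer * 100:.2f}% at threshold {thr:.3f}")
```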

Next, we briefly discuss the algorithms submitted by the participants.

3. SUMMARY OF PARTICIPANTS' ALGORITHMS

3.1. Norwegian Biometrics Laboratory, NTNU, Norway

The Norwegian Biometrics Laboratory submitted two different algorithms, henceforth referred to as NTNU-1 and NTNU-2, as follows:

1. NTNU-1 [9]: a scheme for periocular recognition based on deep neural networks trained using regularized stacked autoencoders [9]. Feature extraction was done using Maximum Response (MR) based texture features [10], obtained by computing the response to an MR filter bank comprising 38 filters. These 38 filters include Gaussian, Laplacian of Gaussian, and edge and oriented filters at six different orientations (a rough illustration of MR-style filtering is sketched after this list). A deep network was formed by coupling all four encoders along with the softmax layer. Similarity scores were generated by extracting MR based texture features from the test samples and classifying them using the trained deep neural network.

2. NTNU-2 [11]: the proposed scheme employs deep learning models for periocular recognition [11]. Deep Sparse Filtering [2] was applied to learn filters that offer the convenience of unsupervised learning while not trying to model any distributions explicitly. Further, two layers of deep sparse filters were adopted, where layer 1 was optimized using the l1 norm and layer 2 using the l2 norm. The responses from each of the 256 filters in layer 2 formed a large feature vector. In order to avoid the computational overhead, the feature vector was further processed using a histogram of each of the pooled filter responses to form a final feature vector of size 8192. These features were computed for the enrollment and verification images, and classified using the l2 norm as discussed in [12].
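As a rough illustration of the Maximum Response idea used by NTNU-1 (and not the authors' implementation), the sketch below filters an eye crop with a few oriented edge kernels per scale, keeps only the maximum response across orientations, and appends isotropic Gaussian and Laplacian-of-Gaussian responses. The kernel shapes, the three scales, and the five output maps are simplifying assumptions; the actual bank described above comprises 38 filters.

```python
import numpy as np
from scipy import ndimage

def oriented_edge_filter(size, sigma, theta):
    """First-derivative-of-Gaussian (edge) kernel at orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates so the derivative is taken along direction theta.
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    g = np.exp(-(xr ** 2 + (3 * yr) ** 2) / (2 * sigma ** 2))  # elongated Gaussian
    kernel = -xr * g                                           # derivative along xr
    return kernel - kernel.mean()                              # zero-mean kernel

def max_response_features(image, sigmas=(1, 2, 4), n_orientations=6):
    """Toy maximum-response texture features.

    For each scale, the image is filtered with oriented edge kernels and
    only the maximum response over orientations is kept (the idea behind
    MR filter banks); isotropic Gaussian and Laplacian-of-Gaussian
    responses are appended. Returns one response map per retained filter.
    """
    image = image.astype(float)
    thetas = np.linspace(0, np.pi, n_orientations, endpoint=False)
    responses = []
    for sigma in sigmas:
        oriented = [ndimage.convolve(image, oriented_edge_filter(6 * sigma + 1, sigma, t))
                    for t in thetas]
        responses.append(np.max(np.abs(oriented), axis=0))     # max over orientations
    responses.append(ndimage.gaussian_filter(image, sigma=2))   # isotropic Gaussian
    responses.append(ndimage.gaussian_laplace(image, sigma=2))  # Laplacian of Gaussian
    return np.stack(responses)

if __name__ == "__main__":
    eye = np.random.rand(160, 240)   # stand-in for a cropped 240 x 160 eye image
    feats = max_response_features(eye)
    print(feats.shape)               # (5, 160, 240): 3 MR maps + 2 isotropic maps
```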
3.2. Australian National University (ANU), Australia

A scheme for ocular recognition was developed based on dense SIFT descriptors. Further, a mechanism based on the Improved Fisher Vector (FV) technique was employed to transform the local descriptors into fixed-size vectors, followed by mapping into a common space using a dedicated vocabulary and dimensionality reduction by Principal Component Analysis (PCA). The dimensionality-reduced FVs were used to train a fully connected feed forward neural network with three layers. The final classification was performed using a Nearest Neighbour classifier based on the features extracted by the feed-forward neural network.

3.3. Indian Institute of Information Technology Guwahati (IIITG), India

A supervised two-stage learning based solution to ocular biometrics was proposed [13]. In the first stage, a Multinomial Naive Bayes classifier was trained on clusters of vectors extracted from local eye regions using Speeded-Up Robust Features (SURF) [14]. To further improve the performance, a pyramid-up topology was introduced for the second phase, where only the top k% of the feature pairs were used for matching. The value of k was set to 5 based on empirical evidence. The dense SIFT algorithm and the Nearest Neighbor rule were used at the second stage for classification. The RANSAC algorithm was utilized to eliminate outliers.

3.4. Anonymous Participant

The proposed algorithm is based on the weighted combination of four local descriptors for encoding local image textures, namely: Multiscale Local Binary Patterns (MLBP) [15], Phase Histogram of Oriented Gradients (PHOG) [16], Scale Invariant Feature Transform (SIFT) [17], and Speeded Up Robust Features (SURF) [14]. A chi-square based distance metric was used to compute a distance score between the histograms extracted from each pair of eye images using these four descriptors. The match scores corresponding to the four texture descriptors were combined via a weighted sum rule.

Table 2. Textural features utilized by the participants' algorithms in a learning or a non-learning framework.

  Participant     Textural Features        Learning based classifier     Non-learning based classifier
  NTNU-1 [9]      MR filters               Deep neural network           -
  NTNU-2 [11]     Deep sparse filters      Least square regression       -
  ANU             Dense SIFT               Feed forward neural network   -
  IIITG [13]      SURF                     Multinomial naive Bayes       -
  Anonymous       SIFT, SURF, MLBP, PHOG   -                             Chi-square distance
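The matching step of the anonymous submission (Section 3.4) reduces to chi-square distances between per-descriptor histograms, fused by a weighted sum. The sketch below shows that step only, under the assumption that the MLBP, PHOG, SIFT and SURF histograms have already been extracted; the histogram dimensionality and the equal fusion weights are placeholders rather than the participant's actual settings.

```python
import numpy as np

def chi_square_distance(h1, h2, eps=1e-10):
    """Chi-square distance between two (non-negative) histograms."""
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def fused_match_score(enroll_hists, probe_hists, weights):
    """Weighted-sum fusion of per-descriptor chi-square distances.

    enroll_hists / probe_hists: dicts mapping a descriptor name (e.g.
    'MLBP', 'PHOG', 'SIFT', 'SURF') to its histogram for one eye image.
    weights: dict of fusion weights for the same descriptor names.
    Distances are negated so that a larger score means a better match.
    """
    score = 0.0
    for name, w in weights.items():
        d = chi_square_distance(enroll_hists[name], probe_hists[name])
        score += w * (-d)
    return score

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    descriptors = ["MLBP", "PHOG", "SIFT", "SURF"]
    # Random stand-ins for descriptor histograms of two eye images.
    enroll = {d: rng.random(64) for d in descriptors}
    probe = {d: rng.random(64) for d in descriptors}
    weights = {d: 0.25 for d in descriptors}   # equal weights, illustrative only
    print(f"fused score: {fused_match_score(enroll, probe, weights):.4f}")
```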

4. DISCUSSION

Table 2 shows that the five submitted algorithms are all based on textural representations of the eye images. It can be seen that the majority of the submitted algorithms employed strong learning based techniques such as deep neural networks. This is in contrast to one of the teams (anonymized on request) that used the chi-square distance between histograms extracted from pairs of eye images for the classification.

Figure 2 shows the bar graph of the participants' algorithms ranked on the basis of the consolidated Equal Error Rate (EER). It can be seen that NTNU-1 [9] outperforms all other algorithms by obtaining an overall EER of 0.06% averaged over each mobile device and lighting condition. Table 3 shows the EER of the submitted algorithms per mobile device and lighting condition (daylight, office, and dim light). No specific trend was observed in the performance of the algorithms for a specific device and lighting condition.

Table 4 shows the performance of all the algorithms when the enrollment set consisted of images acquired in the office lighting and the validation set consisted of images acquired in the daylight (referred to as O-Day in Table 4) or dim light conditions (referred to as O-Dim in Table 4). Comparisons were made with the case when both the enrollment and validation sets consisted of images acquired under the normal office lighting condition (referred to as O-O). It can be seen that the performance of all the algorithms degraded, but NTNU-1 [9] showed the least degradation across lighting conditions. This illustrates the robustness of the MR texture features and deep neural nets used by NTNU-1 [9], despite lighting variations in the enrollment and validation sets. NTNU-2 [11], ANU, IIITG [13] and the Anonymous algorithms mostly showed high performance degradation when the enrollment set was acquired in office light and the validation set was acquired in dim light (O-Dim), as opposed to the case when the validation set was acquired in daylight (O-Day). On the contrary, NTNU-1 [9] performed slightly better under O-Dim than under O-Day with the Oppo and Samsung devices.

Table 3. Equal error rate (EER) of the submitted algorithms per mobile device and lighting condition (daylight, office, and dim light).

  EER [%]
                            Daylight                      Office                        Dim light
  Ranked Participants       iPhone   Oppo   Samsung       iPhone   Oppo   Samsung       iPhone   Oppo   Samsung
  NTNU-1 [9]                 0.06    0.10    0.07          0.06    0.04    0.05          0.06    0.07    0.07
  NTNU-2 [11]                0.40    0.43    0.33          0.48    0.63    0.49          0.45    0.16    0.16
  ANU                        7.67    7.91    8.42         10.36   16.01    9.10          8.44    9.02   11.89
  IIITG [13]                18.98   18.12   15.98         19.29   19.79   18.65         17.54   19.49   23.25
  Anonymous                 38.09   38.29   62.23         35.26   31.69   72.84         31.06   34.00   67.20

Table 4. Equal error rate (EER) of the submitted algorithms across lighting conditions and different mobile devices. The enrollment set consists of images acquired under normal office light, and the validation set contains images from the daylight or dim light environments (these conditions are referred to as O-Day and O-Dim, respectively). A comparative analysis was made with the case when both the enrollment and validation sets contained images acquired under normal office light (referred to as O-O), per mobile device.

  EER [%]
                            iPhone                        Oppo                          Samsung
  Ranked Participants       O-O    O-Day   O-Dim          O-O    O-Day   O-Dim          O-O    O-Day   O-Dim
  NTNU-1 [9]                0.06    0.13    0.20          0.04    0.10    0.09          0.05    0.13    0.10
  NTNU-2 [11]               0.48    1.82    1.45          0.63    1.90    3.34          0.49    2.50    4.25
  ANU                      10.36   11.03   16.64         16.01   14.75   18.24          9.10   13.69   19.57
  IIITG [13]               19.29   32.93   45.34         19.79   38.24   42.59         18.65   34.29   40.21
  Anonymous                35.26   28.67   42.29         31.69   31.21   37.17         27.73   24.33   50.74
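To make the O-O, O-Day and O-Dim cells of Table 4 concrete, the sketch below shows one way to pair office-light enrollment templates with probes from another lighting condition and split the resulting comparison scores into genuine and impostor sets, which can then be passed to an EER routine such as the one sketched in Section 2.2. The template dictionaries, the cosine similarity and the subject identifiers are illustrative assumptions, not the organizers' actual scoring harness.

```python
import numpy as np
from itertools import product

def genuine_impostor_scores(enroll_templates, probe_templates, match):
    """Split cross-condition comparison scores into genuine and impostor sets.

    enroll_templates / probe_templates: dicts mapping a subject id to a list
    of feature vectors captured under one lighting condition (e.g. office
    light for enrollment and daylight for probes in the O-Day cell).
    match: callable returning a similarity score for a pair of templates.
    """
    genuine, impostor = [], []
    for (sid_e, e_list), (sid_p, p_list) in product(enroll_templates.items(),
                                                    probe_templates.items()):
        for e, p in product(e_list, p_list):
            (genuine if sid_e == sid_p else impostor).append(match(e, p))
    return genuine, impostor

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    subjects = ["s001", "s002", "s003"]
    # Toy office-light enrollment templates and daylight probe templates.
    office = {s: [rng.random(16) for _ in range(2)] for s in subjects}
    daylight = {s: [rng.random(16) for _ in range(2)] for s in subjects}
    cosine = lambda a, b: float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    gen, imp = genuine_impostor_scores(office, daylight, cosine)
    print(len(gen), "genuine and", len(imp), "impostor comparisons")
```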

Finally, we compared the performance of the submitted algorithms as reported by the participants on Visit 1 of the VISOB Dataset ICIP2016 Challenge Version with that obtained by us (the organizers) on Visit 2. Our results on Visit 2 were close to those reported by the participants on Visit 1 for all the learning-based algorithms (i.e., NTNU-1 [9], NTNU-2 [11], ANU and IIITG [13]), further validating the reported accuracy levels. However, the anonymous non-learning based submission showed large variations in performance between Visit 1 and Visit 2 of the VISOB Dataset ICIP2016 Challenge Version.

Fig. 2. Bar graph showing the ranking of the submitted algorithms on the basis of the consolidated EER. NTNU-1 [9] outperforms all the other algorithms by obtaining an EER of 0.06% averaged over each device and lighting condition.

5. CONCLUSION

To the best of our knowledge, this is the first large-scale competition to compare visible light mobile ocular and periocular biometric recognition schemes on a publicly available database. From the results, we clearly see that texture based analysis, when coupled with appropriate learning-based classifiers, can achieve significantly high verification accuracy not only for different mobile devices but also across varying lighting conditions in the acquisition set-up. A possible future investigation would be to perform a cross-device performance evaluation of the algorithms. The database could be further expanded to include other mobile devices, uncontrolled outdoor conditions, and spoof attacks.

Acknowledgement: The authors would like to thank EyeVerify⁴ for sponsoring the VISOB Dataset ICIP2016 Challenge Version collection at the University of Missouri-Kansas City.

⁴www.eyeverify.com

6. REFERENCES

[1] M. D. Marsico, M. Nappi, D. Riccio, and H. Wechsler, "Mobile iris challenge evaluation (MICHE)-I, biometric iris dataset and protocols," Pattern Recognition Letters, vol. 57, pp. 17–23, 2015.

[2] K. B. Raja, R. Raghavendra, V. K. Vemuri, and C. Busch, "Smartphone based visible iris recognition using deep sparse filtering," Pattern Recognition Letters, 2014.

[3] V. Gottemukkula, S. K. Saripalle, P. Tankasala, and R. Derakhshani, "Methods for using visible ocular vasculature for mobile biometrics," IET Biometrics, vol. 5, pp. 3–12, 2016.

[4] I. Nigam, M. Vatsa, and R. Singh, "Ocular biometrics: A survey of modalities and fusion approaches," Information Fusion, vol. 26, pp. 1–35, 2015.

[5] R. Derakhshani, A. Ross, and S. Crihalmeanu, "A new biometric modality based on conjunctival vasculature," in Proc. of Artificial Neural Networks in Engineering, St. Louis, MO, 2006.

[6] U. Park, A. Ross, and A. K. Jain, "Periocular biometrics in the visible spectrum: A feasibility study," in IEEE 3rd Intl Conf. on Biometrics: Theory, Applications, and Systems, Sept 2009, pp. 1–6.

[7] A. Das, U. Pal, M. A. Ballester, and M. Blumenstein, "A new efficient and adaptive sclera recognition system," in IEEE Symp. on Computational Intelligence in Biometrics and Identity Management (CIBIM), Dec 2014, pp. 1–8.

[8] "VISOB Dataset ICIP2016 Challenge Version," http://r.web.umkc.edu/rattania/VISOB/index.html.

[9] R. Raghavendra and C. Busch, "Learning deeply coupled autoencoders for smartphone based robust periocular verification," in IEEE Intl Conf. on Image Processing (ICIP) 2016, Challenge Session on Mobile Ocular Biometric Recognition, 2016.

[10] M. Varma and A. Zisserman, "A statistical approach to texture classification from single images," Intl Journal of Computer Vision, vol. 62, no. 1-2, pp. 61–81, 2005.

[11] K. B. Raja, R. Raghavendra, and C. Busch, "Collaborative representation of deep sparse filtered features for robust verification of smartphone periocular images," in IEEE Intl Conf. on Image Processing (ICIP) 2016, Challenge Session on Mobile Ocular Biometric Recognition, 2016.

[12] L. Zhang, M. Yang, and X. Feng, "Sparse representation or collaborative representation: Which helps face recognition?," in IEEE Intl Conf. on Computer Vision (ICCV), 2011, pp. 471–478.

[13] K. Ahuja, A. Bose, S. Nagar, K. Dey, and F. Barbhuiya, "ISURE: User authentication in mobile devices using ocular biometrics in visible spectrum," in IEEE Intl Conf. on Image Processing (ICIP) 2016, Challenge Session on Mobile Ocular Biometric Recognition, 2016.

[14] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, "SURF: Speeded up robust features," Computer Vision and Image Understanding (CVIU), vol. 110, no. 3, pp. 346–359, 2008.

[15] T. Ojala, M. Pietikäinen, and T. Mäenpää, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 24, no. 7, pp. 971–987, 2002.

[16] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proc. of Intl. Conf. on Computer Vision & Pattern Recognition, June 2005, vol. 2, pp. 886–893.

[17] D. G. Lowe, "Object recognition from local scale-invariant features," in Proc. of 7th Intl Conf. on Computer Vision (ICCV'99), Corfu, Greece, 1999, pp. 1150–1157.