INCREASED USE OF AVAILABLE IMAGE DATA DECREASES ERRORS IN IRIS BIOMETRICS

A Dissertation

Submitted to the Graduate School of the University of Notre Dame in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

by

Karen P. Hollingsworth

Kevin W. Bowyer, Co-Director

Patrick J. Flynn, Co-Director

Graduate Program in Computer Science and Engineering

Notre Dame, Indiana

July 2010

© Copyright by Karen P. Hollingsworth 2010
All Rights Reserved

INCREASED USE OF AVAILABLE IMAGE DATA DECREASES ERRORS IN IRIS BIOMETRICS

Abstract

by

Karen P. Hollingsworth

Iris biometrics is used in a number of different applications, such as frequent flyer programs, identification of prisoners, and border control. However, governments interested in using iris biometrics have still found difficulties using it on large populations. Further improvements are required in order to enable this technology to be used in more settings. In this dissertation, we describe three methods of reducing error rates for iris biometrics. We define and employ a metric called the fragile bit distance, which uses the locations of less stable bits in an iris template to improve performance. We also investigate signal fusion of multiple frames in an iris video to achieve better recognition performance than is possible using single still images. Third, we present a study of what features are useful for identification in the periocular region. Periocular biometrics is still an emerging field of research, but we anticipate that fusing periocular information with iris information will result in a more robust biometric system.

A final contribution of this work is a study of how iris biometrics performs on twins. Our experiments confirm prior claims that iris biometrics is capable of differentiating between twins. However, we additionally show that there is texture information in the iris that is not encoded by traditional iris biometrics systems. Our experiments suggest that human examination of pairs of iris images for forensic purposes may be feasible. Our results also suggest that development of different approaches to automated iris image analysis may be useful.

CONTENTS

FIGURES

TABLES

ACKNOWLEDGMENTS

CHAPTER 1: INTRODUCTION

CHAPTER 2: BACKGROUND
  2.1 Performance of Biometric Systems
    2.1.1 Verification
    2.1.2 Identification
  2.2 Eye Anatomy
  2.3 Early Research in Iris Biometrics
  2.4 Recent Research in Iris Biometrics
    2.4.1 Image Acquisition, Restoration, and Quality Assessment
      2.4.1.1 Image Acquisition
      2.4.1.2 Image Restoration
      2.4.1.3 Image Quality
    2.4.2 Image Compression
    2.4.3 Segmentation
      2.4.3.1 Active Contours
      2.4.3.2 Alternatives to Active Contours
      2.4.3.3 Eyelid and Eyelash Detection
      2.4.3.4 Segmenting Iris Images with Non-frontal Gaze
    2.4.4 Feature Extraction
    2.4.5 Improvements in Matching
    2.4.6 Searching Large Biometrics Databases
    2.4.7 Applications
      2.4.7.1 Cryptographic Applications
      2.4.7.2 Identity Cards in the U.K.
    2.4.8 Evaluation
    2.4.9 Performance under Varying Conditions
    2.4.10 Multibiometrics

CHAPTER 3: FRAGILE BIT COINCIDENCE
  3.1 Motivation
  3.2 Related Work
    3.2.1 Research on Fusing Hamming Distance with Added Information
    3.2.2 Research on Fragile Bits
  3.3 Data
  3.4 Fragile Bit Distance (FBD)
  3.5 Score Distributions for Hamming Distance and Fragile Bit Distance
  3.6 Fusing Fragile Bit Distance with Hamming Distance
  3.7 Tests of Statistical Significance
  3.8 Effect of Modifying the Fragile Bit Masking Threshold
  3.9 Discussion

CHAPTER 4: AVERAGE IMAGES
  4.1 Motivation
  4.2 Related Work
    4.2.1 Video
    4.2.2 Still Images
  4.3 Data
  4.4 Average Images and Templates
    4.4.1 Selecting Frames and Preprocessing
    4.4.2 Signal Fusion
    4.4.3 Creating an Iris Code Template
  4.5 Comparison of Median and Mean for Signal Fusion
  4.6 How Many Frames Should be Fused in an Average Image?
  4.7 How Much Masking Should be Used in an Average Image?
  4.8 Comparison to Other Methods
    4.8.1 Comparison to Previous Multi-gallery Methods
    4.8.2 Comparison to Previous Log-Likelihood Method
    4.8.3 Comparing to Large Multi-Gallery, Multi-Probe Methods
    4.8.4 Computation Time
  4.9 Discussion

CHAPTER 5: IRIS BIOMETRICS ON TWINS
  5.1 Motivation
  5.2 Related Work
  5.3 Data
    5.3.1 Frame Selection
    5.3.2 Segmentation
  5.4 Biometric Performance on Twins' Irises
  5.5 Similarities in Twins' Irises Detected by Humans
    5.5.1 Experimental Setup
    5.5.2 Results
      5.5.2.1 Can Humans Identify Twins from Iris Texture Alone?
      5.5.2.2 Can Humans Identify Twins from Periocular Information Alone?
      5.5.2.3 Did Humans Score Higher on Queries where They Felt More Certain?
      5.5.2.4 Is It Easier to Identify Twin Pairs Using Iris Data or Periocular Data?
      5.5.2.5 Did Subjects Score Better on the Second Half of the Iris Test than the First Half?
      5.5.2.6 Did Subjects Score Better on the Second Half of the Periocular Test than the First Half?
      5.5.2.7 Which Image Pairs Were Most Frequently Classified Correctly, and Which Pairs Were Most Frequently Classified Incorrectly?
      5.5.2.8 Is It More Difficult to Label Twins as Twins than It Is to Label Unrelated People as Unrelated?
  5.6 Discussion

CHAPTER 6: PERIOCULAR BIOMETRICS
  6.1 Motivation
  6.2 Related Work
  6.3 Data
  6.4 Experimental Method
  6.5 Results
    6.5.1 How Well Can Humans Determine whether Two Periocular Images Are from the Same Person or Not?
    6.5.2 Did Humans Score Higher when They Felt More Certain?
    6.5.3 Did Testers Do Better on the Second Half of the Test than the First Half?
    6.5.4 Which Features Are Correlated with Correct Responses?
    6.5.5 Which Features Are Correlated with Incorrect Responses?
    6.5.6 What Additional Information Did Testers Provide?
    6.5.7 Which Pairs Were Most Frequently Classified Correctly, and Which Pairs Were Most Frequently Classified Incorrectly?
  6.6 Discussion

CHAPTER 7: CONCLUSIONS

BIBLIOGRAPHY

FIGURES

2.1 In a biometric system, the number of false accepts and the number of false rejects are related to the chosen decision criteria (Figure modeled after [27]).
2.2 Image 05495d15 from Notre Dame Dataset. Elements seen in a typical iris image are labeled here.
2.3 Commercial iris cameras use near-infrared illumination so that the illumination is unintrusive to humans, and so that the texture of heavily pigmented irises can be imaged more effectively. This graph shows the spectrum of wavelengths emitted by the LEDs on an LG 2200 iris camera. This camera uses wavelengths primarily between 700 and 900 nanometers. The spectral characteristics were captured using spectrophotometric equipment made available by Prof. Douglas Hall of the University of Notre Dame.
2.4 Melanin pigment absorbs much of visible light, but reflects more of the longer wavelengths of light (Picture reprinted from [23], data from [71]).
2.5 Major steps in iris biometrics processing. (Picture reprinted from [16] with permission from Elsevier.)
2.6 Kang and Park [68] and He et al. [48] use information about camera optics and position of the subject to estimate a point spread function and restore blurry images to in-focus images. Above is an example of (a) a blurry iris image and (b) an in-focus image of the same subject.
2.7 Belcher and Du's quality measure [7] combines information about occlusion, dilation, and texture. Above is an example of (a) a heavily occluded iris image, and (b) a less occluded image of the same subject.
2.8 As iris biometrics is used for larger and more varied applications, it will have to deal with irises with various different conditions. This image shows an unusual iris (Subject 05931) with filaments of tissue extending into the pupil.
2.9 MBGC data included near-infrared iris videos captured with a Sarnoff Iris on the Move portal, shown above. Video of a subject is captured as a user walks through the portal. This type of acquisition is less constrained than traditional iris cameras; however, the quality of the iris images acquired is poorer. It is possible to acquire both face and iris information using this type of portal. (Picture reprinted from [16] with permission from Elsevier.)
3.1 Example images from our data set. These images were captured using an LG4000 iris camera.
3.2 These are the fragile bit patterns (imaginary part) corresponding to the images in Figure 3.1. Black pixels are bits masked for fragility. We use 4800-bit iris codes and mask 25% of the bits (or 1200 bits) for fragility. Some of the bits are masked for occlusion, and so slightly less than 1200 bits are masked for fragility.
3.3 These are comparisons of fragile bit patterns, each obtained by ANDing two fragile bit masks together. For example, Figure 3.3(a) is the comparison mask obtained by combining Figures 3.2(a) and 3.2(b). Black pixels show where the two masks agreed. Blue pixels show where they disagreed. White pixels were unmasked for both iris codes. There is more agreement in same-subject comparisons than there is when comparing masks of different subjects.
3.4 Images in our data set were captured using this LG4000 iris camera [76].
3.5 The LG4000 iris camera captures images of both eyes at the same time.
3.6 Score distributions for fragile bit distance.
3.7 Score distributions for Hamming distance.
3.8 Joint score distributions for Hamming distance and FBD. Genuine scores are shown in blue. Impostor scores are shown in red.
3.9 A zoomed-in view of the joint score distributions for Hamming distance and FBD. Genuine scores are shown in blue. Impostor scores are shown in red. Each point represents at least 0.003% of the comparisons.
3.10 We fused FBD and HD using the expression α × HD + (1 − α) × FBD. We found that an α value of 0.6 yielded the lowest equal error rate.
3.11 Fusing Hamming distance with FBD performs better than using Hamming distance alone. Fusing by multiplying and fusing by weighted averaging yield similar results.

3.12 We considered the effect of masking only 5% or 10% of the bits in the iris code for fragility. Using these values, we compared the performance of (1) Hamming distance (HD) with performance of (2) fusing HD and FBD with a weighted average (0.6HD + 0.4FBD). At these low levels of fragile bit masking, the difference between HD and the fusion is small. The ROC curves for the two methods overlap.
3.13 We considered the effect of masking 15% or 20% of the bits in the iris code for fragility. Again, we compared the performance of (1) Hamming distance (HD) with performance of (2) fusing HD and FBD with a weighted average. At these levels of fragile bit masking, the fusion clearly does better than HD alone.
3.14 We considered the effect of masking 25% or 30% of the bits in the iris code for fragility. At these levels of fragile bit masking, the fusion shows an even greater performance benefit over HD alone than there was at lower levels of fragile bit masking.

4.1 The Iridian LG EOU 2200 camera used in acquiring iris video sequences.
4.2 The frames shown in (a) and (c) were selected by our frame-selection algorithm because the frames were in focus; however, these frames do not include much valid iris data. In our automated experiments presented in this paper we kept frames like (a) and (c) so that we could show how our software performed without any manual quality checking. In our semi-automated experiments we manually replaced frames like (a) and (c) with better frames from the same video like (b) and (d). We expect that in the future, we may be able to develop an algorithm to detect blinks and off-angle images so that such frames could be automatically rejected.
4.3 Our automated experiments contain a few incorrect segmentations like the one shown in (a). In our semi-automated experiments we manually replaced incorrect segmentations to obtain results like that shown in (b).
4.4 Our automated software did not correctly detect the eyelid in all frames. In our semi-automated experiments we manually replaced incorrect eyelid detections to obtain results like that shown in (b).
4.5 From the ten original images on the top, we created the average image shown on the bottom.
4.6 Using a mean fusion rule for fusing iris images produces better iris recognition performance than using a median fusion rule. Graph (a) shows this result using automated segmentation. Graph (b) shows the same result using the manually corrected segmentations.
4.7 Fusing ten frames together yields better recognition performance than fusing four, six, or eight frames.
4.8 Too much masking decreases the degrees of freedom in the non-match distribution, causing an increased false accept rate. (This graph shows the trend from the automatically segmented images. The manually corrected segmentation produces the same trend.)
4.9 The amount of masking used to create average images affects performance. When using the manually corrected segmentation, we can use a smaller masking level (masking level = 60%). With the automated segmentation, a higher masking level (masking level = 80%) mitigates the impact of missed eyelid detections.
4.10 The proposed signal-fusion method has better performance than using a multi-gallery approach with either an "average" or "minimum" score-fusion rule.
4.11 Signal fusion and log-likelihood score fusion methods perform comparably. The log-likelihood method performs better at operating points with a large false accept rate. The proposed signal-fusion method has better performance at operating points with a small false accept rate.
4.12 The MGMP-minimum achieves the best recognition performance of all of the methods considered in this paper. However, the signal fusion performs well, while taking only 1/Nth of the storage and 1/N² of the matching time.
4.13 Even though a large multi-gallery, multi-probe experiment achieves better recognition performance, it comes at a cost of much slower execution time. The proposed signal fusion method is the fastest method presented in this paper, and it achieves better recognition performance than previously published multi-gallery methods.
5.1 Images of the left eyes of two identical twins. Notice the similarities in overall iris texture, and also the similarities in the appearance of the periocular region.
5.2 Images of irises from identical twins. We segmented the images so that our testers would only see the iris, and therefore they could not use periocular features to help them decide whether two irises were from twins.
5.3 Images of irises from unrelated people.
5.4 A histogram of Hamming distance scores between twins looks similar to a histogram of Hamming distance scores between non-twins.
5.5 We wanted to know whether humans could identify twins based on periocular information. We created images where the iris was blacked out so that our testers would be forced to use periocular features to make a judgment. This is an example pair of images. These images are from identical twins.
5.6 All 28 testers correctly classified this pair of images as being from identical twins.
5.7 All 28 testers correctly classified this pair of images as being from identical twins.
5.8 All 28 testers correctly classified this pair of images as being from unrelated people.
5.9 All 28 testers correctly classified this pair of images as being from unrelated people.
5.10 Twenty-five of 28 people incorrectly guessed that these images were from unrelated people. In fact these irises are from identical twins. The difference in dilation makes this pair particularly difficult to classify correctly.
5.11 Twenty-four of 28 people incorrectly guessed that these images were from twins, when in fact, these irises are from unrelated people. The smoothness of the texture makes this pair difficult to classify correctly.
6.1 Eyelashes were considered the most helpful feature for making decisions about identity. The tear duct and shape of the eye were also very helpful.
6.2 We compared the rankings for the features from correct responses (Fig. 6.1) with the rankings from incorrect responses. The shape of the eye and the outer corner of the eye were both used more frequently on incorrect responses than on correct responses. This result suggests that those two features would be less helpful than other features such as eyelashes.
6.3 All 25 testers correctly classified these two images as being from the same person.
6.4 All 25 testers correctly classified these two images as being from different people.
6.5 Eleven of 25 people incorrectly guessed that these images were from different people, when in fact, these eyes are from the same person. This pair is challenging because one eye is much more open than the other.
6.6 Eleven of 25 people incorrectly guessed that these images were from the same person, when in fact, they are from two different people.

TABLES

3.1 AVERAGE FBD FOR GENUINE AND IMPOSTOR COMPARISONS
3.2 FUSING FBD WITH HAMMING DISTANCE
3.3 IS 0.6HD + 0.4FBD BETTER THAN HD ALONE?
3.4 IS HD × FBD BETTER THAN HD ALONE?
3.5 IS αHD + (1 − α)FBD STATISTICALLY SIGNIFICANTLY DIFFERENT FROM 0.6HD + 0.4FBD?
4.1 SIGNAL-FUSION COMPARED TO PREVIOUS METHODS
4.2 SIGNAL-FUSION COMPARED TO LOG-LIKELIHOOD SCORE FUSION
4.3 SIGNAL-FUSION COMPARED TO MULTI-GALLERY, MULTI-PROBE SCORE FUSION
4.4 PROCESSING TIMES FOR DIFFERENT METHODS
5.1 DEMOGRAPHIC INFORMATION OF SUBJECTS
6.1 PERIOCULAR RESEARCH

ACKNOWLEDGMENTS

This dissertation would have been an impossible task without the assistance of my dedicated and supportive advisors. Dr. Bowyer and Dr. Flynn have spent countless hours in teaching, guiding, proofreading my papers, and encouraging me in my research.

I also thank my husband, Nathaniel, for listening to my daily reports of my progress, for staying up late with me when I have had deadlines to meet, and for being my number one fan.

I am also grateful for the generous support of our sponsors: the National Science Foundation under grant CNS01-30839, the Federal Bureau of Investigation, the Central Intelligence Agency, the Intelligence Advanced Research Projects Activity, the Biometrics Task Force, and the Technical Support Working Group through US Army contract W91CRB-08-C-0093. The opinions, findings, and conclusions or recommendations expressed here are my own and do not necessarily reflect the views of these sponsors.

CHAPTER 1

INTRODUCTION

According to the International Organization for Standardization, biometrics is the “automated recognition of individuals based on their behavioral and biological characteristics” [61]. Examples of biometric characteristics include fingerprints, face, voice, and iris. A number of different commercial and governmental groups use biometrics. The largest current commercial user of biometrics is Walt Disney World [46]. Disney World takes fingerprints of guests as they enter the park, and keeps a record of the fingerprints associated with each ticket, to ensure that multiday passes are not resold [46]. One government agency that uses biometrics is the U.S. Department of Homeland Security. This department employs biometrics in its US-VISIT program [116]. At ports of entry, the US-VISIT program takes photographs and digital fingerprints of international travelers holding non-U.S. passports or visas. These biometric characteristics are (1) compared to a watchlist of known or suspected terrorists and criminals, (2) compared to a database of previous US-VISIT users to ensure that a person does not enter the U.S. using two different identities, and (3) compared to the images of the person who first obtained the visa or passport to make sure that the document belongs to the person presenting it and not to an impostor [115].

The applications listed above employ fingerprint and face recognition. Iris recognition has also been successfully deployed in some settings.

Between 2002 and 2005, the UNHCR (United Nations High Commissioner for Refugees) used iris recognition in a repatriation program for Afghan refugees. The UN provided cash assistance to the refugees, but each applicant was required to provide an iris image so that the UN could detect anyone who was trying to seek assistance more than once [58]. In another type of application, jails use iris recognition to identify prisoners. Repeat offenders may try to give false information about their identity to avoid detection by other law enforcement agencies. By using iris recognition, the officers can determine if the person has been in the jail before [59]. Additionally, the use of iris biometrics ensures that inmates do not impersonate other prisoners to get released early [62]. In some airports in the U.K., Germany, the United States, and Canada, travelers enrolled in a frequent flyer program can have their irises scanned to bypass lines at immigration control or security checkpoints [24, 49].

Irises are purported to be as unique as fingerprints and as stable over time. However, iris biometrics systems have not been used for as many years as fingerprints, nor have they been tested in as many different settings. A 2005 test conducted by the United Kingdom passport service (UKPS) tried to enroll ten thousand people into their database, but only ninety percent of able-bodied users and sixty percent of disabled users succeeded in providing an iris image that passed the system's quality checks [42]. Some of these failures could be remedied with better trained operators and iris cameras that easily adjusted to wheelchair height. Other failures might require improvements in the iris imaging, feature extraction, and recognition technology. Some current research in iris biometrics aims to extend the performance of iris biometrics to less constrained image acquisition environments, and to broader groups of people.

Current research also aims to make iris recognition possible on larger databases. The U.S. Federal Bureau of Investigation plans to spend one billion dollars in the next ten years to create a database of biometric characteristics that includes fingerprints, palm prints, and eye scans [2]. As iris databases increase in size, iris biometrics algorithms must improve in accuracy and matching speed. Large iris biometrics applications require smaller error rates in order for the technology to be used on such large scales.

This dissertation presents my research in decreasing error rates in iris recognition algorithms and improving the applicability of iris biometrics to broader applications. In Chapter 2, I give a survey of iris biometrics research. In Chapter 3, I present a method of improving performance by looking at the coincidence of fragile bits in two iris codes. Chapter 4 discusses how to get improved performance by using videos instead of still images of irises; I extract multiple frames from video and then average intensity values from different frames to get an improved iris image.

In Chapter 5, I present an experiment showing that iris biometrics can distinguish between identical twins. I also show that traditional biometrics systems only encode part of the texture information apparent in iris images, and that this additional information could possibly be used in forensic applications to show genetic relationships between different eye images. Chapter 5 does not focus on the error rates in the system, but instead focuses on applying iris recognition to broader applications.

Iris biometrics could be used in even more applications if we could capture images of the iris from a farther distance or from less-cooperative subjects.

One possible strategy could be to capture an image that included portions of the face in addition to the iris. Potentially, information from the periocular region could be combined with iris information to create a system more robust and more accurate than iris biometrics alone. In Chapter 6, I investigate what features in the periocular region could be most helpful for identification by asking human subjects to identify people based on periocular information. Chapter 7 provides concluding remarks and suggestions for future research.

CHAPTER 2

BACKGROUND

This chapter provides background information on iris biometrics and a review of related research. First, it introduces basic terminology used in evaluating the performance of biometric systems. Second, it provides an explanation of some parts of the eye. Third, a typical iris recognition algorithm is explained. Finally, some recent research in iris biometrics is highlighted. This chapter includes content that has been published previously in my master's thesis [50]. In addition, some content from this chapter is reprinted, with permission, from one of my prior papers published in Computer Vision and Image Understanding [16] (© 2008, Elsevier).

2.1 Performance of Biometric Systems

Biometrics can be used in at least two different types of applications: verification scenarios and identification scenarios. The next two subsections describe these scenarios.

2.1.1 Verification

In a verification scenario, a person claims a particular identity and the biometric system is used to verify or reject the claim. Verification is done by matching a biometric sample acquired at the time of the claim against the sample previously enrolled for the claimed identity. If the two samples match well enough, the identity claim is verified, and if the two samples do not match well enough, the claim is rejected. Thus there are four possible outcomes. A true accept (TA) occurs when the system accepts, or verifies, an identity claim, and the claim is true. A false accept (FA) occurs when the system accepts an identity claim, but the claim is not true. A true reject (TR) occurs when the system rejects an identity claim and the claim is false. A false reject (FR) occurs when the system rejects an identity claim, but the claim is true. The two types of errors that can be made are a false accept and a false reject.

The number of false accepts and the number of false rejects are dependent on the decision criteria for the system. In a biometrics system, comparisons between two samples are assigned a score related to the difference between the two samples. Figure 2.1 depicts notional genuine and impostor score distributions and related quantities. The distribution of scores for genuine comparisons is imperfectly separated from the distribution of scores for impostor comparisons. The system must decide on a decision threshold such that all scores below the threshold will be deemed genuine. Impostor comparisons with scores below this threshold are false accepts; genuine comparisons with scores above the threshold are false rejects.

Performance for the system across a range of decision thresholds can be summarized in a receiver operating characteristic (ROC) curve. Each point on the ROC curve represents one possible decision threshold. The curve plots the true accept rate on the Y axis and the false accept rate on the X axis, or, alternatively, the false reject rate on the Y axis and the false accept rate on the X axis. The true accept rate is the number of true accepts divided by the total number of true claims:

[Plot: probability density versus Hamming distance for notional genuine and impostor score distributions, with the decision threshold, false accepts, and false rejects marked.]

Figure 2.1: In a biometric system, the number of false accepts and the number of false rejects are related to the chosen decision criteria (Figure modeled after [27]).

\[ TAR = \frac{TA}{TA + FR}. \tag{2.1} \]

The false accept rate is the number of false accepts divided by the total number of false claims:

\[ FAR = \frac{FA}{FA + TR}. \tag{2.2} \]

The false reject rate is

\[ FRR = 1 - TAR = \frac{FR}{TA + FR}. \tag{2.3} \]

The equal-error rate (EER) is a single number often quoted from the ROC curve. The EER is where the false accept rate equals the false reject rate.
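To make these rate definitions concrete, here is a minimal Python/NumPy sketch (not drawn from any system described in this dissertation; the function names and the example score values are hypothetical) that computes the quantities in Equations 2.1-2.3 for a single decision threshold and sweeps thresholds to approximate the equal-error rate. A score below the threshold is treated as an accept, matching the convention in Figure 2.1.

```python
import numpy as np

def verification_rates(genuine, impostor, threshold):
    """TAR, FAR, and FRR for one decision threshold (Equations 2.1-2.3).
    Scores are distances, so a comparison below the threshold is an accept."""
    genuine = np.asarray(genuine)
    impostor = np.asarray(impostor)
    ta = np.sum(genuine < threshold)    # true accepts
    fr = np.sum(genuine >= threshold)   # false rejects
    fa = np.sum(impostor < threshold)   # false accepts
    tr = np.sum(impostor >= threshold)  # true rejects
    tar = ta / (ta + fr)
    far = fa / (fa + tr)
    frr = 1.0 - tar
    return tar, far, frr

def approximate_eer(genuine, impostor, thresholds=np.linspace(0.0, 0.6, 601)):
    """Sweep thresholds and return the one where FAR and FRR are closest."""
    gaps = [abs(verification_rates(genuine, impostor, t)[1]
                - verification_rates(genuine, impostor, t)[2])
            for t in thresholds]
    best = thresholds[int(np.argmin(gaps))]
    _, far, frr = verification_rates(genuine, impostor, best)
    return best, (far + frr) / 2.0

# Hypothetical example: genuine scores clustered near 0.2, impostors near 0.45.
rng = np.random.default_rng(0)
genuine_scores = rng.normal(0.20, 0.04, 1000)
impostor_scores = rng.normal(0.45, 0.02, 1000)
print(verification_rates(genuine_scores, impostor_scores, 0.32))
print(approximate_eer(genuine_scores, impostor_scores))
```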

2.1.2 Identification

In an identification scenario, a biometric sample is acquired without any associated identity claim. The closed-set identification task is to identify the unknown sample as matching one of a set of previously enrolled known samples. The open-set identification task is to either identify the unknown sample or to determine that the unknown sample does not match any of the known samples. The set of enrolled samples is often called a gallery, and the unknown sample is often called a probe. The probe is matched against all of the entries in the gallery, and the closest match, assuming it is close enough, is used to identify the unknown sample. Similar to the verification scenario, there are four possible outcomes. A true positive occurs when the system says that an unknown sample matches a particular person in the gallery and the match is correct. A false positive occurs when the system says that an unknown sample matches a particular person in the gallery and the match is not correct. A true negative occurs when the system says that the sample does not match any of the entries in the gallery, and the sample in fact does not. A false negative occurs when the system says that the sample does not match any of the entries in the gallery, but the sample in fact does belong to someone in the gallery. Performance in an identification scenario is often summarized in a cumulative match characteristic (CMC) curve. The CMC curve plots the percent of probes correctly recognized on the Y axis and the cumulative rank considered as a correct match on the X axis. For a cumulative rank of 2, if the correct match occurs for either the first-ranked or the second-ranked entry in the gallery, then it is considered as correct recognition, and so on. The rank-one-recognition rate is a single number often quoted from the CMC curve.
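The closed-set identification measures above can be illustrated with a short sketch as well. The following Python/NumPy fragment (illustrative only; the names are my own) builds a cumulative match characteristic from a matrix of probe-to-gallery distances: each probe's gallery entries are sorted from best to worst match, and the probe counts as recognized at rank k if its true identity appears among the k closest entries. The rank-one recognition rate is the first point on the resulting curve.

```python
import numpy as np

def cmc_curve(distances, probe_labels, gallery_labels, max_rank=10):
    """Cumulative match characteristic for closed-set identification.

    distances: (num_probes, num_gallery) matrix of comparison scores, where a
    smaller value means a closer match. Closed-set: every probe identity is
    assumed to be enrolled in the gallery."""
    distances = np.asarray(distances)
    gallery_labels = np.asarray(gallery_labels)
    ranks = []
    for i, probe_label in enumerate(probe_labels):
        order = np.argsort(distances[i])          # gallery sorted best-first
        sorted_labels = gallery_labels[order]
        # 1-based rank of the first correct gallery entry for this probe
        rank = int(np.argmax(sorted_labels == probe_label)) + 1
        ranks.append(rank)
    ranks = np.asarray(ranks)
    # fraction of probes recognized at rank <= k, for k = 1 .. max_rank
    return [float(np.mean(ranks <= k)) for k in range(1, max_rank + 1)]

# Hypothetical usage:
# cmc = cmc_curve(distance_matrix, probe_ids, gallery_ids)
# rank_one_recognition_rate = cmc[0]
```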

2.2 Eye Anatomy

Many different physical characteristics can be used in a biometrics system.

This work focuses on iris biometrics. The iris is the “colored ring of tissue around the pupil through which light...enters the interior of the eye.” [89] The iris’s function is to control the amount of light entering the eye. Two muscles in the iris, the dilator and the sphincter muscles, control the size of the pupil, and therefore, the amount of light passing through the pupil. Figure 2.2 shows an example image acquired by an LG 2200 commercial iris biometrics system at the University of Notre Dame. The sclera, a white region of connective tissue and blood vessels, surrounds the iris. A clear covering called the cornea covers the iris and the pupil. The pupil region generally appears darker than the iris. However, the pupil may have specular highlights, and cataracts can lighten the pupil. The iris typically has a rich pattern of furrows, ridges, and pigment spots. The surface of the iris

is composed of two regions, the central pupillary zone and the outer ciliary zone. The collarette is the border between these two regions.

[Figure 2.2 labels: eyelashes, eyelid, sclera, pupil, pupillary boundary, limbic boundary, specular highlight, pupillary zone, collarette, ciliary zone.]

Figure 2.2: Image 05495d15 from Notre Dame Dataset. Elements seen in a typical iris image are labeled here.

The minute details of the iris texture are believed to be determined randomly in utero. They are also believed to be different between persons and between the left and right eye of the same person [31]. The color of the iris can change as the amount of pigment in the iris increases during childhood. Some research asserts that the texture is relatively constant [28], but other research has detected lower match scores between images taken multiple years apart [3].

2.3 Early Research in Iris Biometrics

The idea of using the iris as a biometric is over 100 years old [10]. However, the idea of automating iris recognition is more recent. In 1987, Flom and Safir obtained a patent for an unimplemented conceptual design of an automated iris biometrics system [41]. Johnston [64] published a report in 1992 on an investigation of the feasibility of iris biometrics conducted at Los Alamos National Laboratory after the issuance of Flom and Safir's patent. Iris images were acquired for 650 persons, and acquired again after a 15-month interval. The pattern of an individual iris was observed to be unchanged over the 15 months. The complexity of an iris image, including specular highlights and reflections, was noted. The report concluded that iris biometrics held potential for both verification and identification scenarios, but no experimental results were presented.

The most important work to date in iris biometrics is that of Daugman. Daugman's 1994 patent [26] and early publications (e.g., [25]) described an operational iris recognition system in some detail. Iris biometrics as a field has developed with the concepts in Daugman's approach becoming a standard reference model. Also, due to the Flom and Safir patent and the Daugman patent being held for some time by the same company, nearly all existing commercial iris biometric technology is based on Daugman's work.

Daugman's patent stated that “the system acquires through a video camera a digitized image of an eye of the human to be identified.” A 2004 paper [28] said that image acquisition should use near-infrared illumination so that the illumination could be controlled, yet remain unintrusive to humans (Figure 2.3). Near-infrared illumination also helps reveal the detailed structure of heavily pigmented (dark) irises. Melanin pigment absorbs much of visible light, but reflects more of the longer wavelengths of light [23] (Figure 2.4).

Systems built on Daugman's concepts require subjects to cooperatively position their eye within the camera's field of view. The system assesses the focus of the image in real time by looking at the power in the middle and upper frequency bands of the 2-D Fourier spectrum. The algorithm seeks to maximize this spectral power by adjusting the focus of the system, or giving the subject audio feedback to adjust their position in front of the camera. More detail on the focusing procedure is given in the appendix of [28].

Given an image of the eye, the next step is to find the part of the image that corresponds to the iris. Daugman's early work approximated the pupillary and limbic boundaries of the eye as circles. Thus, a boundary could be described with three parameters: the radius $r$, and the coordinates of the center of the circle, $x_0$ and $y_0$. He proposed an integro-differential operator for detecting the iris boundary by searching the parameter space. His operator is

\[ \max_{(r,\,x_0,\,y_0)} \left| G_\sigma(r) * \frac{\partial}{\partial r} \oint_{r,\,x_0,\,y_0} \frac{I(x,y)}{2\pi r}\, ds \right| \tag{2.4} \]

where $G_\sigma(r)$ is a smoothing function and $I(x, y)$ is the image of the eye. All early research in iris segmentation assumed that the iris had a circular boundary. However, often the pupillary and limbic boundaries are not perfectly circular. Recently, Daugman has studied alternative segmentation techniques to better model the iris boundaries [29]. Even when the inner and outer boundaries of the iris are found, some of the iris still may be occluded by eyelids or eyelashes.
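As a rough illustration of how a discretized version of this operator can be searched, the sketch below (Python with NumPy and SciPy; the names, sampling density, and smoothing width are my own choices, not Daugman's implementation) evaluates the circular contour integral as a mean of sampled pixel values, differentiates it with respect to the radius, smooths the result, and keeps the circle with the strongest response.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def circular_mean(image, x0, y0, r, num_samples=64):
    """Mean intensity along a circle of radius r centered at (x0, y0):
    a discrete stand-in for the normalized contour integral in Eq. 2.4."""
    theta = np.linspace(0.0, 2.0 * np.pi, num_samples, endpoint=False)
    xs = np.clip(np.round(x0 + r * np.cos(theta)).astype(int), 0, image.shape[1] - 1)
    ys = np.clip(np.round(y0 + r * np.sin(theta)).astype(int), 0, image.shape[0] - 1)
    return float(image[ys, xs].mean())

def find_circular_boundary(image, candidate_centers, candidate_radii, sigma=2.0):
    """For each candidate center, compute the circular mean intensity as a
    function of radius, differentiate it with respect to r, smooth it with a
    Gaussian, and keep the (x0, y0, r) with the strongest absolute response."""
    best_params, best_response = None, -np.inf
    radii = np.asarray(candidate_radii, dtype=float)
    for (x0, y0) in candidate_centers:
        means = np.array([circular_mean(image, x0, y0, r) for r in radii])
        response = np.abs(gaussian_filter1d(np.gradient(means, radii), sigma))
        idx = int(np.argmax(response))
        if response[idx] > best_response:
            best_params, best_response = (x0, y0, radii[idx]), response[idx]
    return best_params   # (x0, y0, r) of the strongest circular boundary
```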

[Plot: spectral response of the LG 2200 illuminant; spectrophotometer response versus wavelength (nm), 650-1050 nm.]

Figure 2.3: Commercial iris cameras use near-infrared illumination so that the illumination is unintrusive to humans, and so that the texture of heavily pig- mented irises can be imaged more effectively. This graph shows the spectrum of wavelengths emitted by the LEDs on an LG 2200 iris camera. This camera uses wavelengths primarily between 700 and 900 nanometers. The spectral character- istics were captured using spectrophotometric equipment made available by Prof. Douglas Hall of the University of Notre Dame.

Figure 2.4: Melanin pigment absorbs much of visible light, but reflects more of the longer wavelengths of light (Picture reprinted from [23], data from [71]).

After isolating the iris region, the next step is to describe the features of the iris in a way that facilitates comparison of irises. The first difficulty lies in the fact that not all images of an iris are the same size. The distance from the camera affects the size of the iris in the image. Also, changes in illumination can cause the iris to dilate or contract. These problems were addressed by mapping the extracted iris region into a normalized coordinate system. To accomplish this normalization, every location on the iris image was defined by two coordinates, (i) an angle θ between 0 and 360 degrees, and (ii) a radial coordinate r that ranges between 0 and 1 regardless of the overall size of the image. This normalization assumes that the iris compresses or stretches linearly in the radial direction when the pupil dilates or contracts, respectively. A paper by Wyatt [122] explained that this assumption is a good approximation, but it does not perfectly match the actual deformation of an iris under dilation or constriction.
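A minimal sketch of this "rubber-sheet" normalization, assuming circular pupillary and limbic boundaries and nearest-pixel sampling (Python/NumPy; the names and resolutions are placeholders, not values used in this dissertation):

```python
import numpy as np

def unwrap_iris(image, pupil, limbic, radial_res=64, angular_res=512):
    """Rubber-sheet normalization under the assumption of circular boundaries.

    pupil, limbic: (x0, y0, radius) circles from segmentation. Each output row
    corresponds to a normalized radius r in [0, 1] (0 = pupillary boundary,
    1 = limbic boundary); each column corresponds to an angle."""
    xp, yp, rp = pupil
    xl, yl, rl = limbic
    r = np.linspace(0.0, 1.0, radial_res)[:, None]                      # rows
    theta = np.linspace(0.0, 2.0 * np.pi, angular_res, endpoint=False)[None, :]
    # boundary points at each angle
    x_inner, y_inner = xp + rp * np.cos(theta), yp + rp * np.sin(theta)
    x_outer, y_outer = xl + rl * np.cos(theta), yl + rl * np.sin(theta)
    # linear stretch between the two boundaries, nearest-pixel sampling
    xs = np.round((1.0 - r) * x_inner + r * x_outer).astype(int)
    ys = np.round((1.0 - r) * y_inner + r * y_outer).astype(int)
    xs = np.clip(xs, 0, image.shape[1] - 1)
    ys = np.clip(ys, 0, image.shape[0] - 1)
    return image[ys, xs]          # radial_res x angular_res normalized iris
```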

The normalized iris image can be displayed as a rectangular image, with the radial coordinate on the vertical axis, and the angular coordinate on the horizontal axis. The left side of the normalized image marks 0 degrees on the iris image, and the right side marks 360 degrees. The division between 0 and 360 degrees is somewhat arbitrary, because a simple tilt of the head can affect the angular coordinate. Daugman accounts for this rotation later, in the matching technique. Directly comparing the pixel intensity of two different iris images could be prone to error because of differences in lighting between two different images. Daugman uses convolution with 2-dimensional Gabor filters to extract the texture from the normalized iris image. In his system, the filters are “multiplied by the raw image pixel data and integrated over their domain of support to generate coefficients which describe, extract, and encode image texture information.” [26] After the texture in the image is analyzed and represented, it is matched against the stored representation of other irises. If iris recognition were to be implemented on a large scale, the comparison between two images would have

to be very fast. Thus, Daugman chose to quantize each filter's phase response into a pair of bits in the texture representation. Each complex coefficient was transformed into a two-bit code: the first bit was equal to 1 if the real part of the coefficient was positive, and the second bit was equal to 1 if the imaginary part of the coefficient was positive. Thus after analyzing the texture of the image using the Gabor filters, the information from the iris image was summarized in a 256 byte (2048 bit) binary code. The resulting binary “iris codes” can be compared efficiently using bitwise operations.1 Daugman uses a metric called the fractional Hamming distance, which measures the fraction of bits for which two iris codes disagree.2 A low fractional

Hamming distance implies strong similarity of the iris codes. If parts of the irises are occluded, the fractional Hamming distance is the fraction of bits that disagree in the areas that are not occluded on either image. To account for rotation, comparison between a pair of images involves computing the fractional Hamming distance for several different orientations that correspond to circular permutations of the code in the angular coordinate. The minimum computed fractional Hamming distance is assumed to correspond to the correct alignment of the two images.

An iris biometrics system following Daugman's general approach could be described in four basic steps: (1) acquisition, (2) segmentation, (3) texture analysis, and (4) matching. These basic modules are depicted in Figure 2.5. The goal of image acquisition is to acquire an image that has sufficient quality to support reliable biometrics processing. The goal of segmentation is to isolate the region that represents the iris. The goal of texture analysis is to derive a representation of the iris texture that can be used to match two irises. The goal of matching is to evaluate the similarity of two iris representations. The distinctive essence of Daugman's approach lies in conceiving the representation of the iris texture to be a binary code obtained by quantizing the phase response of a texture filter.

2The Hamming distance is the number of bits that disagree. The fractional Hamming distance is the fraction of bits that disagree. Since fractional Hamming distance is used so frequently, many papers simply mention “Hamming distance” when referring to the fractional Hamming distance. I also follow this trend in subsequent sections of this work.

16 reliable biometrics processing. The goal of segmentation is to isolate the region that represents the iris. The goal of texture analysis is to derive a representation of the iris texture that can be used to match two irises. The goal of matching is to evaluate the similarity of two iris representations. The distinctive essence of Daugman’s approach lies in conceiving the representation of the iris texture to be a binary code obtained by quantizing the phase response of a texture filter.

This representation has several inherent advantages. Among these are the speed of matching through the fractional Hamming distance, easy handling of rotation of the iris, and an interpretation of the matching as the result of a statistical test of independence [25].
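The matching step can be illustrated with a short sketch of the masked, rotation-compensated fractional Hamming distance. This is a generic Python/NumPy illustration rather than Daugman's code; the array layout and the number of circular shifts are assumptions.

```python
import numpy as np

def fractional_hamming_distance(code1, mask1, code2, mask2, max_shift=8):
    """Masked fractional Hamming distance between two binary iris codes.

    Codes and masks are 2-D boolean arrays whose columns correspond to the
    angular coordinate; mask bits are True where the iris was visible.
    Columns of the second code are circularly shifted to compensate for head
    tilt, and the minimum distance over all shifts is returned."""
    best = 1.0
    for shift in range(-max_shift, max_shift + 1):
        shifted_code = np.roll(code2, shift, axis=1)
        shifted_mask = np.roll(mask2, shift, axis=1)
        valid = mask1 & shifted_mask           # bits unoccluded in both codes
        num_valid = int(valid.sum())
        if num_valid == 0:
            continue
        disagreements = np.logical_xor(code1, shifted_code) & valid
        best = min(best, disagreements.sum() / num_valid)
    return best
```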

Figure 2.5: Major steps in iris biometrics processing. (Picture reprinted from [16] with permission from Elsevier.)

Wildes [120] described an iris biometrics system developed at Sarnoff Labs that uses a very different technical approach from that of Daugman. Whereas

Daugman’s system acquired the image using an LED-based point light source, the Wildes system used a diffuse light source. When localizing the iris boundary, Daugman’s approach looked for a maximum in an integro-differential operator that responds to circular boundary. By contrast, Wildes’ approach involved computing a binary edge map followed by a Hough transform to detect circles. The Hough transform considers a set of edge points and finds the circle that best fits the most edge points. When matching two irises, Daugman’s approach computed the fractional Hamming distance between iris codes, whereas Wildes’ method applied a Laplacian of Gaussian filter at multiple scales to produce a template and computed correlation as a similarity measure. Wildes’ work [120] demonstrated that multiple distinct technical approaches exist for each of the main modules of an iris biometrics system.

2.4 Recent Research in Iris Biometrics

A comprehensive review of iris biometrics research up to the year 2007 is given in [16]. However, there has been a large amount of research published in the

field since that time. A search for IEEE papers with the word “iris” in the title during the time period of 2008-2010 yields 290 papers3, and a similar search in Compendex yields about 700. In order to keep up with the latest state-of-the-art in iris biometrics, I summarized here many of the recent iris biometrics studies. I focused first on searching for IEEE journal articles on iris biometrics before looking at conference papers. In reading this somewhat lengthy review, readers may read

3Search run May 26, 2010

the sections most pertinent to their field of study, or read the start of each section then read as much or as little of the details as they desire. This section describes papers in the general field of iris biometrics. In later chapters, I mention some additional papers more directly related to my research (see sections 3.2, 4.2, 5.2, and 6.2).

2.4.1 Image Acquisition, Restoration, and Quality Assessment

2.4.1.1 Image Acquisition

Two recently published papers discussed image acquisition. He et al. [47] talked about how to make an iris camera at a cheaper cost than the commercially available cameras. Boyce et al. [17] discussed how iris recognition performed at different wavelengths of light. The next two paragraphs give details of these papers.

Iris biometrics research requires high-quality imaging. Commercial iris cameras are expensive. He et al. [47] designed their own iris camera that would be cheaper than commercial alternatives while still acquiring high-quality images. They decided to use a CCD camera because CCD cameras produce images of superior quality to those produced by CMOS cameras. They bought a CCD sensor with a resolution of 0.48 million pixels, and added an optical glass lens, custom-designed by an optical manufacturer. The lens had a fixed focus at 250 mm. They added NIR-pass filters that transmit wavelengths between 700 and 900 nm. The illumination unit consisted of NIR LEDs of 800 nm wavelength, which they arranged in such a way as to try to minimize specular reflections on the iris.

An LCD screen provided feedback for users. The screen displayed an image of the captured scene inside a view square, and users were asked to position their

iris inside the square. Additionally, the screen reported information on position and focus of each frame. Finally, the camera could be manually angled to capture images from people between 1.5 and 1.8 meters tall.

Most iris biometrics is done using near infrared light, but there has been some research into the performance of iris biometrics using images taken with visible light. Boyce et al. [17] conducted experiments with multiple wavelengths of light.

To acquire multispectral information, they used a Redlake MS3100 multispectral camera which contains three CCDs and three band-pass prisms behind the lens to simultaneously capture four different wavelength bands. In this way, they acquired data from blue, green, red, and near-IR wavelengths. They captured 5 samples each from 24 subjects with varying colors of irises. They tested the effect of adaptive histogram equalization on the iris images, and reported recognition performance on an ROC curve. They reported performance from 8 different trials: the original IR, red, green, and blue channels, and the histogram-equalized IR, red, green, and blue channels. The blue channel showed improved performance after histogram equalization. The histogram equalization did not substantially affect the other channels. Next they tried matching iris images across multiple channels. They compared IR vs R, IR vs G, IR vs B, R vs G, R vs B, and G vs B. They found that cross-channel iris recognition worked better when the difference in wavelengths was smaller (an unsurprising, but previously unresearched idea). For example, recognition performance when comparing IR probe images to a gallery of blue wavelength images was poor, but performance when comparing IR to red was good. Next, they clustered pixels by their RGB value and their L*a*b* value.

They showed example images where the skin and eyelash pixels were assigned to different clusters than the iris pixels because of the different colors between the

iris and surrounding areas. They concluded that clustering pixels by color could potentially help in segmentation. Their final experiment involved score-level fusion of match scores from multiple spectral channels. The performance for the blue channel was improved by fusion with other channels. A fusion of IR, R, and G gave the highest Genuine Accept Rate at a FAR of 0.1%.

2.4.1.2 Image Restoration

Two papers discussed restoring a blurry iris image by estimating the proper point spread function (PSF). The ability to restore a blurry iris image to focus by estimating the PSF could increase the usable depth of field of an iris camera without requiring extra hardware. A third paper evaluated the depth of field possible when using wavefront-coded imaging.

Kang and Park [68] aimed to restore blurry probe images in real time. In an initial offline step, they used information about the camera optics to determine an equation for estimating parameters of the PSF. Those parameters would be a function of the blurriness of the captured image. During online operation, they first had to estimate the actual focus of the captured image. They deinterlaced the image, found an approximate segmentation using a circular edge detector, and removed eyelashes by detecting windows of the image with high standard deviation. Then they could run a focus assessment on the iris region only. Once they knew the image focus, they used their equation to get the PSF and restore the image to its proper focus. Using focus-restored probe images instead of blurry probe images decreased their equal error rate (EER) from 0.49% to 0.37%. They consequently increased their camera's operable depth of field from 22 mm to 50 mm.

He et al. [48] estimated the user distance from the camera in order to get the proper PSF for image restoration. First, they measured the distance between two specular highlights on the iris. Using this information, plus knowledge about the positions of the two infrared LEDs, they could get the user's distance from the camera without using special hardware like a distance sensor. The knowledge of the distance from the camera was used in estimating the PSF. Like Kang and Park [68], they use the constrained least squares fit for restoration.

Boddeti and Kumar [12] investigated the use of wavefront-coded imagery on iris recognition. This topic had been discussed in the literature before, but Boddeti and Kumar used a larger data set and presented experiments evaluating how different parts of the recognition pipeline (e.g. segmentation, feature extraction) are affected by wavefront coding. They proposed using unrestored image outputs from the wavefront-coded camera directly, and tested this idea using two different recognition algorithms. The authors concluded that wavefront coding could help increase the depth of field of an iris recognition system by a factor of four, and that the recognition performance on unrestored images was only slightly worse than the performance on restored images. Figure 2.6 shows examples of blurry and in-focus iris images.
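Neither paper's PSF estimation step is reproduced here, but once a PSF estimate is available, constrained least squares restoration itself is a standard frequency-domain operation. The following Python/NumPy sketch is my own simplified illustration (the regularization weight gamma is an arbitrary placeholder, not a value from these papers) of a Tikhonov-style constrained least squares filter with a Laplacian smoothness constraint.

```python
import numpy as np

def pad_to_shape(kernel, shape):
    """Zero-pad a small kernel to the image shape and center it at the
    origin so its FFT aligns with the image FFT."""
    padded = np.zeros(shape)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    return np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))

def cls_restore(blurred, psf, gamma=0.01):
    """Constrained least squares deblurring with a known point spread
    function, using the Laplacian as the smoothness constraint."""
    laplacian = np.array([[0.0, -1.0, 0.0],
                          [-1.0, 4.0, -1.0],
                          [0.0, -1.0, 0.0]])
    H = np.fft.fft2(pad_to_shape(psf, blurred.shape))
    C = np.fft.fft2(pad_to_shape(laplacian, blurred.shape))
    G = np.fft.fft2(blurred)
    F_hat = np.conj(H) * G / (np.abs(H) ** 2 + gamma * np.abs(C) ** 2)
    return np.real(np.fft.ifft2(F_hat))
```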

2.4.1.3 Image Quality

A recent trend in iris image quality research is to combine a number of individual quality factors to create an overall quality score. Belcher and Du [7, 8] combine percent occlusion, percent dilation, and “feature information”. An example of an occluded image is shown in Figure 2.7. To compute “feature information”, they calculate the relative entropy of the iris texture when compared with a uniform distribution [8]. To fuse the three types of information into a single quality score, Belcher and Du first computed an exponential function of occlusion and an exponential function of dilation. The final quality score was the product of the three measures.

(a) Image 04336d692 (b) Image 04336d695

Figure 2.6: Kang and Park [68] and He et al. [48] use information about camera optics and position of the subject to estimate a point spread function and restore blurry images to in-focus images. Above is an example of (a) a blurry iris image and (b) an in-focus image of the same subject.

(a) Image 04202d1064 (b) Image 04202d1069

Figure 2.7: Belcher and Du's quality measure [7] combines information about occlusion, dilation, and texture. Above is an example of (a) a heavily occluded iris image, and (b) a less occluded image of the same subject.

Kalka et al. [67] also use percent occlusion in their quality metric, but the other quality factors that they consider are different. In addition to occlusion, they consider defocus, motion blur, gaze deviation, amount of specular reflection on the iris, lighting variation on the iris, and total pixel count on the iris. They measure defocus using the 8x8 convolution kernel proposed by Daugman to assess high-frequency content in the image; however, they test for focus only in the bottom half of the iris region. To estimate motion blur, they first find the dominant angle of blur using a directional filter in the Fourier space; then they estimate the magnitude of the blur in that direction. They use the circularity of the pupil as a measure of gaze direction; for a range of pitch and yaw angles, they use a projective transformation to rotate the off-angle image into a frontal view image, then test the circularity of the pupil in the transformed image using Daugman's integro-differential operator. They keep the pitch and yaw angles that maximize the operator. To combine the individual quality factors, Kalka et al. use Dempster-Shafer theory [108] with Murphy's Rule [87]. In evaluating various data sets, Kalka et al. found that the ICE data had more defocused images, the WVU data had more lighting variation, and the CASIA data had more occlusion than the other sets.

Schmid and Nicolo [106] suggest a method of analyzing the quality of an entire database. The authors compare the capacity of a recognition system to the capacity of a communication channel. Recognition channel capacity can be thought of as the maximum number of classes that can be successfully recognized. This capacity can also be used as a measure of overall quality of data in a database.

The authors evaluate the empirical recognition capacity of biometrics systems that use PCA and ICA. They apply their method to four iris databases and two face databases. They find that the BATH iris database has a relatively high sample signal-to-noise ratio, followed by CASIA-III, then ICE 2005. WVU had the lowest

signal-to-noise ratio.

Another way of considering quality is to evaluate the quality of typical images given by individual users. In many biometrics systems, a few users tend to be responsible for a disproportionate amount of errors in the system. This phenomenon was first noted by Doddington et al. [34]. Users for which the system performed well were labeled sheep. Goats was the label for users who were difficult to recognize, and thus responsible for a large number of false rejects in the system. Lambs were users who were particularly easy to imitate, and thus responsible for a large number of false accepts. Wolves were users who were particularly successful at imitating others, and therefore they, like lambs, were responsible for a large number of false accepts.

A drawback to this original classification system is that it does not describe relationships between a user's genuine and impostor scores. Wolf-like and lamb-like behavior is evaluated by looking at impostor scores only. Goat-like behavior is evaluated by looking at genuine comparisons only. Another attribute of this original classification system is that the animals are not distinct. Users who exhibit lamb-like behavior often exhibit wolf-like behavior as well. Yager and Dunstone [123] define four new user types based on both the impostor and genuine scores for a user. Doves are the best users in a biometric system, matching well against themselves, and poorly against others. Chameleons match well against themselves, and against others. They rarely cause false rejects, but are likely to cause false accepts. Phantoms match poorly against themselves, and against others. They are likely to cause false rejects. Worms match poorly against themselves and well against others. Therefore, they cause false rejects when they try to authenticate as themselves, and they cause false accepts when they try to authenticate as others.

Yager and Dunstone [123] tested for the existence or absence of these four animals in a number of different biometric databases, using a number of biometric algorithms. Each of the animal types was present in some of the experiments and absent in others. The authors note that “The reasons that a particular animal group exist are complex and varied. They depend on a number of factors, including enrollment procedures, feature extraction and matching algorithms, data quality, and intrinsic properties of the user population” [123]. Their analysis also leads the authors to assert that people are rarely “inherently hard to match”. Instead, they suggest that matching errors are more likely due to enrollment issues and algorithmic weaknesses rather than intrinsic properties of the users.
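A toy version of this user-level analysis can be written directly from the definitions, although Yager and Dunstone's published methodology is more involved than this. The sketch below (Python/NumPy, names hypothetical) assigns each user to one of the four groups by comparing that user's average genuine and average impostor Hamming distances with the population medians; a low genuine average means the user matches well against themself, and a low impostor average means the user matches well against others.

```python
import numpy as np

def classify_users(mean_genuine_hd, mean_impostor_hd):
    """Partition users into doves, chameleons, phantoms, and worms based on
    their average genuine and impostor Hamming distances (lower = closer
    match), split at the population medians."""
    g = np.asarray(mean_genuine_hd)
    i = np.asarray(mean_impostor_hd)
    g_med, i_med = np.median(g), np.median(i)
    labels = np.empty(len(g), dtype=object)
    labels[(g < g_med) & (i >= i_med)] = "dove"       # good self, poor others
    labels[(g < g_med) & (i < i_med)] = "chameleon"   # good self, good others
    labels[(g >= g_med) & (i >= i_med)] = "phantom"   # poor self, poor others
    labels[(g >= g_med) & (i < i_med)] = "worm"       # poor self, good others
    return labels
```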

2.4.2 Image Compression

There are advantages to storing iris images instead of iris templates: raw images would improve interoperability between systems and also provide a record for any investigations of failures in the system. Unfortunately, images can occupy a significant amount of storage. Therefore, a number of papers have investigated the impact of compression on iris recognition, and they seem to concur that some JPEG2000 compression can be applied without significant impact on performance. Rakshit and Monro [99] proposed to store unwrapped iris images rather than the original iris images. They recommended using an unwrapped image size of

80 by 512, although they found that they could subsample down to 32 by 342 and still maintain “acceptable system performance”. They found that JPEG2000 compression at 0.5 bpp (bits per pixel) actually improved performance because the compression removed noise. Error curves were “acceptable” at rates down to

0.3 bpp, but performance degraded rapidly at lower rates. Daugman and Downing [32] chose to compress and store the original images rather than the unwrapped images. Their argument stated that “polar mappings depend strongly upon the choice of origin of coordinates, which may be prone to error, uncertainty, or inconsistency” [32]. They used the NIST ICE 1 data set for their experiments. To reduce the size of the image, they first automatically detected the iris location and cropped the 640 by 480 image down to 320 by 320. Next they detected the eyelashes, and replaced the eyelashes and eyelids with a uniform gray region. This region-of-interest isolation typically resulted in a two-fold reduction in file size. Finally they used JPEG2000 compression with a compression factor of 50. These methods reduce file size by a factor of 150, while only changing 2 to 3% of the bits in the iris code. ROC curves in the paper showed trade-offs between compression factor and recognition performance.
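The Python sketch below illustrates the flavor of this crop-and-isolate pipeline; the window size, gray fill value, and 50:1 rate are illustrative assumptions rather than the exact settings of Daugman and Downing, and it assumes a Pillow build with JPEG2000 (OpenJPEG) support.

import numpy as np
from PIL import Image  # Pillow with JPEG2000 support is assumed

def crop_and_isolate(img, cx, cy, eye_mask, size=320, fill=128):
    """Crop a 640x480 grayscale image to a size-by-size window centered on the
    detected iris (cx, cy), and paint non-iris pixels (eyelids/eyelashes) a
    uniform gray. eye_mask is a boolean array, True where iris texture is visible."""
    half = size // 2
    y0, x0 = max(cy - half, 0), max(cx - half, 0)
    crop = img[y0:y0 + size, x0:x0 + size].copy()
    mask = eye_mask[y0:y0 + size, x0:x0 + size]
    crop[~mask] = fill                      # region-of-interest isolation
    return crop

# Hypothetical inputs: a grayscale image and a visibility mask from segmentation.
img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
mask = np.ones_like(img, dtype=bool)
roi = crop_and_isolate(img, cx=320, cy=240, eye_mask=mask)

# JPEG2000 at roughly a 50:1 rate, using Pillow's "rates" quality mode.
Image.fromarray(roi).save("iris_roi.jp2", quality_mode="rates", quality_layers=[50])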

2.4.3 Segmentation

Segmentation continues to be an active area of research. Correct segmentation is a prerequisite to high biometric performance. A recent trend is to use active contours to find the iris and/or eyelids, and to use ellipses rather than circles to model the pupillary and limbic boundaries. However, active contours is not the only recently proposed method.

2.4.3.1 Active Contours

A paper by Daugman in 2007 [29] explained his use of active contours for

fitting the iris boundaries. First, he calculated the image gradient in the radial direction. He detected occlusions by eyelids and modeled those with separate

splines. Then a discrete Fourier series approximation was fit to the image gradient data. In any active contour method, there is a trade-off between how closely the contour fits the data versus the desired constraints on the final shape of the contour. Daugman modeled the pupil boundary with weaker constraints than the iris boundary, because he found that the pupil boundary tended to have stronger gradient data.

Vatsa et al. [118] improved the speed of active contour segmentation by using a two-level hierarchical approach. First, they found an approximate initial pupil boundary. The boundary was modeled as an ellipse with five parameters. The parameters were varied in a search for a boundary with maximum intensity change.

For each possible parameter combination, the algorithm randomly selected 40 points on the elliptical boundary and calculated total intensity change across the boundary. Once the pupil boundary was found, the algorithm searched for the iris boundary in a similar manner, this time selecting 120 points on the boundary for computing intensity change. The approximate iris boundaries were refined using an active contour approach. The active contour was initialized to the approximate pupil boundary and allowed to vary in a narrow band of +/- 5 pixels. In refining the limbic boundary, the contour was allowed to vary in a band of +/- 10 pixels.
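As a rough illustration of this coarse boundary search, the sketch below scores a candidate elliptical boundary by the total intensity change across randomly sampled boundary points; the parameter grid, sampling details, and offsets are assumptions for illustration and are not Vatsa et al.’s implementation.

import numpy as np

def boundary_change(img, cx, cy, a, b, theta, n_points=40, delta=2.0):
    """Score an elliptical boundary by the total intensity change across it,
    sampled at n_points randomly chosen boundary points."""
    h, w = img.shape
    phi = np.random.uniform(0.0, 2 * np.pi, n_points)
    # Parametric ellipse rotated by theta, centered at the origin.
    ex = a * np.cos(phi) * np.cos(theta) - b * np.sin(phi) * np.sin(theta)
    ey = a * np.cos(phi) * np.sin(theta) + b * np.sin(phi) * np.cos(theta)
    r = np.hypot(ex, ey)
    ux, uy = ex / r, ey / r                 # approximate outward directions
    total = 0.0
    for dx, dy, nx, ny in zip(ex, ey, ux, uy):
        xo = int(round(cx + dx + delta * nx)); yo = int(round(cy + dy + delta * ny))
        xi = int(round(cx + dx - delta * nx)); yi = int(round(cy + dy - delta * ny))
        if 0 <= xo < w and 0 <= yo < h and 0 <= xi < w and 0 <= yi < h:
            total += float(img[yo, xo]) - float(img[yi, xi])
    return total

# Toy usage: search a small grid of ellipse parameters for the best boundary.
img = np.random.randint(0, 256, (480, 640)).astype(float)
best = max(((boundary_change(img, 320, 240, a, b, 0.0), a, b)
            for a in range(30, 60, 5) for b in range(30, 60, 5)))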

2.4.3.2 Alternatives to Active Contours

Ryan et al. [104] presented an alternative fitting algorithm, called the Starburst method, for segmenting the iris. They preprocessed the image using a smoothing filter and a gradient detection filter. Then, they needed to find a pupil location as a starting point for the algorithm. To do so, they set the darkest 5% of the image to black, and all other pixels to white. Then they created a Chamfer image: the

darkest pixel in the Chamfer image is the pixel farthest from any white pixel in a thresholded image. They used the darkest point of the Chamfer image as a starting point. Next, they computed the gradient of the image along rays pointing radially away from the start point. The two highest gradient locations were assumed to be points on the pupillary and limbic boundaries. The detected points were used to fit several ellipses using randomly selected subsets of points. An average of the best ellipses was reported as the final boundary. The eyelids were detected using active contours. Pundlik et al. [97] presented another alternative segmentation algorithm that used graphs. Their algorithm was a labeling routine instead of a fitting routine like active contours or the Starburst method. Their first goal was to assign a label - either “eyelash” or “non-eyelash” - to each pixel. After removing specular reflections, they used the gradient covariance matrix to find intensity variation in different directions for each pixel. Then they created a probability map, P, that assigned the probability of each pixel having high texture in its neighborhood.

The “energy” corresponding to a particular labeling of the images was written as a function of a smoothness term and a data term. The data term was based on a texture probability map. They treated the image as a graph where pixels were nodes and neighboring pixels were joined with edges, then they used a minimum graph cuts algorithm to find a labeling that minimized the energy function. The second goal was to assign each pixel one of four labels: eyelash, pupil, iris, or background. They used a method similar to the initial eyelash segmentation; however, this time they used an alpha-beta swap graph-cut algorithm. Finally, they refined their labels using a geometric algorithm to approximate the iris with an ellipse.
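The starting-point step of the Starburst method described earlier in this subsection can be sketched as follows; this version substitutes a Euclidean distance transform for a true Chamfer image, and the 5% threshold is taken from the description above.

import numpy as np
from scipy.ndimage import distance_transform_edt

def starburst_start_point(img):
    """Threshold the darkest 5% of pixels, then return the point deepest inside
    that dark region (the point farthest from any non-dark pixel)."""
    thresh = np.percentile(img, 5)          # darkest 5% of the image
    dark = img <= thresh                    # True inside the dark, pupil-like region
    dist = distance_transform_edt(dark)     # distance to nearest non-dark pixel
    y, x = np.unravel_index(np.argmax(dist), dist.shape)
    return x, y

# Toy usage with a synthetic dark disk standing in for the pupil.
yy, xx = np.mgrid[0:480, 0:640]
img = np.where((xx - 320) ** 2 + (yy - 240) ** 2 < 60 ** 2, 10, 200).astype(float)
print(starburst_start_point(img))   # expected near (320, 240)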

2.4.3.3 Eyelid and Eyelash Detection

Some papers discussed eyelid detection without making it the primary focus of the paper. For instance, the main focus of Ryan’s paper [104] was the Star- burst segmentation routine, but they additionally used active contours to find the eyelids. In Kang and Park’s image restoration paper [68] described earlier in section 2.4.1.2, they identified eyelids by finding windows of the image with high standard deviation. Pundlik et al. [97] (described in section 2.4.3.2) used a graph cuts algorithm to label pixels as eyelash or non-eyelash. Daugman [29] performed a statistical test to see whether the distribution of the pixels in an iris region was multimodal; for multimodal distributions, he statistically selected an appropriate threshold, and masked all pixels darker than the chosen threshold. Some of these methods ([104]) were fitting methods, and others ([68],[97],[29]) were labeling methods. A 2009 paper by Li and Savvides [77] was focused entirely on occlusion detec- tion. They performed occlusion detection using a probabilistic method: Figueiredo- Jain Gaussian Mixture Models. All occlusion-detection was performed on un- wrapped iris images. They used a single image for training, then assigned each pixel in each test image a label of “occluded” or “unoccluded”. They compared their method to a rule-based segmentation method and a Fisher-Linear Discrim- inant Analysis based method. Additionally, they tried the proposed Gaussian Mixture Models (GMM) using a number of different combinations of feature sets.

They measured occlusion-detection accuracy by comparing their masks to man- ually created masks. They also evaluated their method by creating ROC curves of the recognition results using each segmentation algorithm. They found that no matter which feature set they used, the proposed GMM method outperformed

the FLDA and rule-based segmentation methods. The feature set resulting in the most accurate eyelid masks was a set using intensity of each pixel, and the mean and standard deviation of the pixel intensities in a 3x3 window. The feature set resulting in the best recognition performance used response intensity after the image was filtered by a Gaussian filter, response intensity after filtering by a Gabor filter, and the response intensity after filtering by first-order and second-order Haar wavelets. All feature sets also included the (x, y) coordinate of pixel location. Traditionally, occluded regions are masked. However, features near the edges of the occluded regions are also affected because the tail of the Gabor filter overflows onto the occluded regions. Munemoto et al. [86] stated that “it is important to not only exclude the noise region, but also estimate the true texture patterns behind these occlusions. Even though masks are used for comparison of iris features, the features around masks are still affected by noise. This is because the response of filters near the boundary of the mask is affected by the noisy pixels.” Munemoto et al. used the Criminisi image-filling algorithm to estimate the texture behind the occlusions. This algorithm iteratively filled 9x9 patches of the occluded region with 9x9 patches from unoccluded regions. It estimated textures at the boundary of the region first, selecting 9x9 source patches from the unoccluded iris that closely matched the iris texture near the boundary of the area to be filled.

2.4.3.4 Segmenting Iris Images with Non-frontal Gaze

Schuckers et al. [107] tried two different approaches to handle “off-angle” irises. In both approaches, they sought to transform an off-angle image into an equivalent frontal image. The first method sought to determine how far an image deviated

from frontal by trying multiple values of pitch and yaw. For each (pitch, yaw) pair, they used bilinear interpolation to transform the image. They found the values of pitch and yaw that resulted in the maximum circularity of the detected pupil. For encoding and matching irises, they used independent component analysis. The second method modeled the relationship between actual 3-D iris points and 2-D projected points. Once that relationship was obtained, the 2-D off-angle image could be transformed into a frontal view image. Biorthogonal wavelets were used for encoding and matching. Schuckers et al. found that results using their two methods were “significantly improved over the iris recognition techniques which do not perform any correction for angle.” The first method showed “good performance for small angle deviations from training to testing, for example, training with 15 degrees and testing with 0 or 30 degrees. However, there was relatively poor performance when training using 0 degrees and testing using 30 degree images. The probable cause of this is the use of the simple projective transform for large angle deviations.” They concluded that the second method was better.

However, it was unclear why they did not use the same encoding and matching step for both methods. Daugman’s 2007 paper [29] also discussed transforming off-angle iris images to frontal view. He described the shape of the pupil in the image using parametric equations. Using trigonometry and Fourier series expansions of these equations, he estimated the direction of gaze. Then he applied an affine transformation to the off-angle image to obtain an image of the eye apparently looking at the camera.

2.4.4 Feature Extraction

The survey paper by Bowyer et al. [16] listed an enormous number of papers which tried alternative methods of feature extraction. Fewer recent papers have focused on this topic. Miyazawa et al. [85] suggested a correlation-based technique,

Bodade and Talbar [11] recommended a Complex Wavelet Transform, and Belcher and Du [9] demonstrated a region-based SIFT approach. The motivation behind Miyazawa’s proposed method [85] was that Daugman- like, feature-based iris recognition algorithms required many parameters. They claimed that their proposed algorithm was easier to train. For each comparison using the proposed method, they took two images and selected a region that was valid (unoccluded) in both images. They took the discrete Fourier Transform of both valid regions, then applied a Phase Only Correlation function (POC).

The POC function involved a difference between the phase components from both images. They used band-limited POC to avoid information from high-frequency noise. The proposed algorithm required only two parameters: one parameter represents the effective horizontal bandwidth for recognition, and the other parameter represents the effective vertical bandwidth. They achieved better results using Phase Only Correlation than using Masek’s 1D log-Gabor algorithm. Bodade and Talbar [11] suggested using a 2D Dual Tree Complex Wavelet Transform (CWT) and a 2D Dual Tree Rotated CWT because these transforms (1) provided features in more directions than a Discrete Wavelet Transform (DWT), (2) provided shift invariance, and (3) were more computationally economical than Gabor filters. The scale-invariant feature transform (SIFT) is a method that has been used for object recognition in computer vision. A typical SIFT algorithm

has not worked well for iris recognition because many iris structures look similar between different eyes. To counter this difficulty, Belcher and Du [9] proposed a region-based SIFT approach. Belcher and Du cited multiple advantages of using SIFT; the method “does not require highly accurate segmentation, transformation to polar coordinates, or affine transformation”. Their method divided the iris area into three regions: left, right, and bottom. Each region was subdivided into subregions, each containing a potential feature point. After eliminating unstable points, the dominant orientation and feature point description was found using the SIFT approach. When comparing two images, they only compared a feature from a given subregion in the first image with the corresponding subregion in the second image, or with the eight nearest subregions in the second image.

2.4.5 Improvements in Matching

One recent trend in iris biometrics is that of selecting the most reliable features for matching, and masking less reliable features. One example of this idea is proposed by Ring and Bowyer [101]; they suggest removing local texture distortions from a comparison by disregarding local windows of a match comparison with high fractional Hamming distance. Another example of this idea is fragile bit masking [53] which disregards parts of the iris code that would be less reliable due to the coarse quantization of a complex filter response. In Daugman’s traditional algorithm, a texture filter is applied to an iris image, and the complex filter responses are quantized to two bits. The first bit is a 1 if the real part of the number is positive, and 0 otherwise; similarly, the second bit is a 1 if the imaginary part of the number is positive, and 0 otherwise. Algorithms that follow this pattern produce templates in which not all bits have equal value [53].

Specifically, complex filter responses near the axes of the complex plane produce unstable bits in the iris code: a small amount of noise in the iris image can shift that filter response from one quadrant to the adjacent quadrant, causing the corresponding bit in the iris code to flip. This type of bit is defined as “fragile”; that is, there is a substantial probability of it ending up a 0 for some images of the iris and a 1 for other images of the same iris.

Hollingsworth et al. [53] suggested identifying and masking fragile bits using the following strategy. If the complex coefficient had a real part very close to 0, they masked the corresponding real bit in the iris code. If the complex coefficient had an imaginary part very close to 0, they masked the corresponding imaginary bit. Hollingsworth et al. masked bits corresponding to the 25% of complex numbers closest to the axes of the complex plane. Barzegar et al. [6] applied this approach to the CASIAv3 data set and found that using a threshold of 35% worked better than 20% or 30% on that data set. Hollingsworth’s approach [53] predicts which bits in an iris code are fragile by looking at the complex filter response, and masking responses with small real parts or small imaginary parts. In contrast, Dozier et al. [37, 38] create a fragile bit mask for a subject from training sets of ten images from that subject. Therefore, Hollingsworth’s method detects axis-fragile bits, while Dozier’s method detects trained fragile bits. These research papers are summarized in more detail in Chapter 3.
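A minimal sketch of this two-bit quantization and axis-based fragility test is given below; the filter responses, array shape, and 25% threshold are illustrative, and this is not the actual IrisBEE code.

import numpy as np

def encode_and_flag_fragile(responses, fragile_fraction=0.25):
    """Quantize complex filter responses to two bits per response and flag the
    fragile_fraction of responses closest to each axis as fragile."""
    real_bits = (responses.real > 0).astype(np.uint8)
    imag_bits = (responses.imag > 0).astype(np.uint8)
    # A real bit is fragile when |Re| is small; an imaginary bit when |Im| is small.
    real_thresh = np.quantile(np.abs(responses.real), fragile_fraction)
    imag_thresh = np.quantile(np.abs(responses.imag), fragile_fraction)
    real_fragile = np.abs(responses.real) <= real_thresh
    imag_fragile = np.abs(responses.imag) <= imag_thresh
    return real_bits, imag_bits, real_fragile, imag_fragile

# Toy usage on random complex filter responses.
resp = np.random.randn(240, 10) + 1j * np.random.randn(240, 10)
rb, ib, rf, imf = encode_and_flag_fragile(resp)
print(rf.mean(), imf.mean())   # roughly 0.25 of each bit plane is flagged fragile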

2.4.6 Searching Large Biometrics Databases

One challenge with implementing large-scale biometrics applications is that searching large databases can be prohibitively time consuming. Therefore, some

researchers are interested in methods of indexing, to reduce the amount of the database that must be searched to add a new entry or search for a match. Partitioning methods work well in 2-D and 3-D, but do not work well with iris codes because iris codes are traditionally binary vectors thousands of bits long (e.g. 2048 bits). Partitioning methods suffer from the curse of dimensionality in a 2048-D binary lattice. Clustering methods cannot be applied to this problem because iris codes are almost uniformly distributed on the lattice. A couple of algorithms [22, 43] assume that if two records are similar, then there is a high probability that a segment within the records will match exactly. Unfortunately, these methods often must still search a large portion of the database to be effective.

One possible solution is proposed by Hao et al. [45]. They propose a “multiple collision principle”. They require three segments to match exactly before taking the time to retrieve from disk and compare the entire iris code or record. They call their algorithm a “beacon guided search” (BGS). They report a 300-times speedup over an exhaustive search of 632,500 iris codes, with only a slight drop in performance. The FRR for the exhaustive search was 0.32%; the FRR for the BGS was 0.64%.
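The toy sketch below illustrates the flavor of such beacon-style indexing: iris codes are indexed by the exact value of each fixed-width segment, and only records that collide on several segments are retrieved for a full comparison. The segment width, collision count, and data structures are illustrative assumptions and do not reproduce Hao et al.’s system.

from collections import defaultdict

SEGMENT_BITS = 16          # illustrative segment width
MIN_COLLISIONS = 3         # "multiple collision principle"

def segments(code: str):
    """Yield (segment_index, segment_value) pairs for a bit-string iris code."""
    for i in range(0, len(code), SEGMENT_BITS):
        yield i // SEGMENT_BITS, code[i:i + SEGMENT_BITS]

def build_index(database):
    index = defaultdict(set)                 # (segment_index, value) -> record ids
    for rec_id, code in database.items():
        for key in segments(code):
            index[key].add(rec_id)
    return index

def candidates(index, probe_code, min_collisions=MIN_COLLISIONS):
    hits = defaultdict(int)
    for key in segments(probe_code):
        for rec_id in index.get(key, ()):
            hits[rec_id] += 1
    # Only these candidates would be retrieved from disk for a full comparison.
    return [rec_id for rec_id, n in hits.items() if n >= min_collisions]

# Toy usage with two short codes.
db = {"a": "0" * 64, "b": "0" * 48 + "1" * 16}
idx = build_index(db)
print(candidates(idx, "0" * 64))   # "a" collides on 4 segments, "b" on 3; both returned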

2.4.7 Applications

2.4.7.1 Cryptographic Applications

Private cryptographic keys are typically protected with passwords. However, passwords can be lost or guessed. To eliminate the possibility of people losing their passwords, a system could protect private keys with biometric data, or use biometric data to generate the private key. This strategy could also eliminate the possibility of an impostor guessing the password to get unauthorized access.

However, it does not eliminate the possibility of an impostor stealing biometric information and spoofing the system.

An extensive review of the literature in cryptography and biometrics is beyond the scope of this work, but we mention a few of the well-known papers here. In 1998, Davida et al. [33] evaluated a number of secure biometric identification scenarios. For one scenario, they proposed that a user could authenticate in the following manner. A user’s biometric template would be captured multiple times, and the multiple vectors would be put through a majority decoding algorithm. The biometric would be corrected further using check digits and error correction. The signature, Sig(Hash(name, attributes, T ∥ C)), is then verified, where Sig(x) denotes the authorization officer’s signature of x, Hash() is a partial information hiding hash function, T is the corrected biometric, and C are the check digits for the biometric. In 1999, Juels and Wattenberg [66] proposed a technique which they called a fuzzy commitment scheme. This technique aimed to recover the original biometric template. If b is a biometric template, and c is a randomly chosen codeword, enrollment consists of storing z = c ⊕ b. During verification, the system obtains a new biometric template b′. The system computes z ⊕ b′ = c ⊕ (b ⊕ b′), and then tries to use error correcting codes to correct [c ⊕ (b ⊕ b′)], thus recovering c.

If the Hamming distance between b and b′ is small, the system recovers c, and consequently can recover b as well. The fuzzy vault biometric cryptosystem was proposed by Juels and Sudan [65] in 2002. In a fuzzy vault, the private key is used to generate a polynomial, pos- sibly by using the key as coefficients of a polynomial. The components of the biometric template are used as x-axis coordinates to generate coordinate pairs of

genuine points. Additional false points, called chaff points, are randomly generated. Genuine points and chaff points are stored in a vault. During decryption, the valid user presents his biometric data to determine which points in the vault are genuine points. The private key can be retrieved by fitting a polynomial to the genuine points. Dodis et al. [35, 36] formalized the notions of a secure sketch and a fuzzy extractor. A fuzzy extractor “extracts a uniformly random string R from its input w in a noise-tolerant way” [35]. Noise-tolerance means that if the input changes slightly, but is still close to w, the string R can still be reproduced exactly. A secure sketch also allows for precise reproduction of the original input, but does not address uniformity. The above papers discuss cryptography combined with any biometric template. Some papers apply cryptographic ideas specifically for iris biometric applications. Hao et al. [44] developed a fuzzy commitment scheme for iris templates. They used Hadamard and Reed-Solomon error correcting codes to produce a 140-bit cryptographic key from iris biometric data and a tamper-resistant token, such as a smart card. Bringer et al. [18, 19] explained how to estimate the theoretical performance limit of a secure sketch for binary biometric data. They proposed a practical fuzzy commitment scheme for iris biometric templates and tested their technique on two publicly available iris data sets. Lee et al. [75] described a way to build a fuzzy vault using iris biometrics. Despite the above mentioned research in fuzzy cryptography, the combination of biometrics and cryptography is not yet a solved problem. In a recent conference article, Ballard et al. [5] discuss security requirements for biometric key generation, and demonstrate how three published application schemes fail to

meet important requirements. Another paper by Simoens et al. [109] revealed weaknesses in the theoretical constructions themselves. They studied two main properties of biometric template protection – indistinguishability and irreversibility – and found that “some sketches based on linear codes, such as the fuzzy commitment scheme of Juels and Wattenberg [66], cannot be securely reused when considering biometric privacy” [109]. Thus, there is still room for more research in this area, and a need for more rigorous security analysis.
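To make the fuzzy commitment construction of Juels and Wattenberg concrete, the toy sketch below protects a short key using a simple repetition code standing in for a real error-correcting code; all sizes and the choice of code are illustrative.

import numpy as np

def repetition_encode(key_bits):
    return np.repeat(key_bits, 3)                 # codeword c

def repetition_decode(noisy):
    # Majority vote over each group of three bits.
    return (noisy.reshape(-1, 3).sum(axis=1) >= 2).astype(np.uint8)

rng = np.random.default_rng(0)
key = rng.integers(0, 2, 16, dtype=np.uint8)      # secret to protect
c = repetition_encode(key)                        # codeword
b = rng.integers(0, 2, c.size, dtype=np.uint8)    # enrollment template

z = c ^ b                                         # stored commitment: z = c XOR b

# Verification: a new template b' close to b (a few bits flipped by noise).
b_prime = b.copy()
b_prime[rng.choice(b.size, 3, replace=False)] ^= 1
recovered = repetition_decode(z ^ b_prime)        # decode c XOR (b XOR b')
print(np.array_equal(recovered, key))             # True if the noise was correctable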

2.4.7.2 Identity Cards in the U.K.

The United Kingdom has considered using iris biometrics in a national identity card system [119]. In November 2004, the government introduced the Identity Cards Bill. The Identity Card Bill scheme had a number of elements: a centralized database called the National Identity Register (NIR), a number assigned to each U.K. citizen and resident over the age of 16, individual biometrics stored in both the NIR and the card, and the legal obligation for citizens to produce the card in order to obtain some public services. In January 2005, a group of people at the London School of Economics and

Political Science (LSE) started a project, the LSE Identity Project, to examine potential impacts and benefits of the Identity Cards Bill [119]. These people were concerned that the politicians had not considered all the challenges and risks associated with such a system. The LSE Identity Project main report, released in June

2005, highlighted some of their concerns. One concern was the proposal for the centralized database (the NIR). Other European nations with identity card sys- tems used federated schemes rather than a centralized database (e.g. Germany) or avoided using a single national identification number to obtain government ser-

vices (e.g. France, Hungary, Germany, Czech Republic, and Austria). The LSE also noted that the UK government showed incredible faith in biometrics, despite the fact that most existing biometric studies had very controlled experimental setups (e.g. controlled lighting in frequent traveller programs) and limited database sizes. Despite these objections, the Identity Card Bill passed in March 2006. However, the implementation of the bill was delayed, and there was continued consultation about the collection of biometrics. By 2008, iris biometrics were dropped from the plans [119], and in 2010 the U.K. Identity Card project was suspended.

2.4.8 Evaluation

Newton and Phillips [88] presented a summary of three independent state-of-the-art iris biometric evaluations: the Independent Testing of Iris Recognition Technology (ITIRT) conducted by the International Biometric Group (IBG), the Iris Recognition Study 2006 (IRIS06) conducted by Authenti-Corp (AC), and the Iris Challenge Evaluation (ICE 2006) conducted by the National Institute of Standards and Technology (NIST).

ICE2006 compared three algorithms. ITIRT and IRIS06 compared sensors. The evaluations used between 240 (ICE2006) and 458 (ITIRT) subjects. To summarize the three evaluations, Newton and Phillips compared the FNMRs at an FMR of 1 in 1000 (0.1%). All three evaluations obtained error rates of similar magnitude. The similarity in results may be partly due to the fact that all but one of the algorithms used in the evaluations were based on work by Daugman.

2.4.9 Performance under Varying Conditions

As iris biometrics is used for larger and more varied applications, it is essential to test the limits of the technology under a variety of conditions (Figure 2.8). Rakshit and Monro [100] tested the performance of iris biometrics for three pa- tients who underwent cataract surgery. A cataract is a clouding of the lens in the eye. More than 200,000 cataract procedures are performed every year in the U.K. In cataract surgery, the cloudy lens is replaced with a thinner implant, causing the iris plane to shift away from the cornea. The result is increased magnification of the iris by the cornea. Rakshit and Monro took pictures of the iris before and after three patients had cataract surgery. They noticed an increased number of specular reflections in the pupil after surgery but they found no visible change in the iris structure, and they obtained an equal error rate of zero when comparing pre- and post-operative images. They concluded that cataract surgery was not a degrading factor. This result is the opposite of the result obtained by Roizenblatt et al. [102], which could possibly be attributed to Rakshit having such a small data set. Rakshit and Monro [100] also examined eleven patients whose eyes were di- lated with eyedrops. They found that after instilling eyedrops, they had six failures out of 45 images matched, and many of the dilated eyes had non-circular pupils. Hollingsworth et al. [55] also investigated the effect of dilation. They achieved images with a range of dilation by darkening the lights in the room. In their experiments, they found that comparisons between two dilated eyes followed a distribution with a mean fractional Hamming distance of 0.06 higher than the mean of the distribution for non-dilated eyes. The means of both the match and the non-match distributions are expected to fall between 0 and 0.5. Therefore,

a shift of 0.06 is nontrivial, amounting to twelve percent of this range. Furthermore, the difference in dilation between an enrollment image and an image to be recognized had a marked effect on the comparison. Comparisons between images with widely different degrees of dilation followed a distribution with a mean about 0.08 higher than the mean of the distribution for images with similar degrees of dilation. The Multiple Biometric Grand Challenge (MBGC) is designed to test biometric performance under less controlled conditions than have previously been used for biometrics. More information about the data released with this challenge is given in Chapter 4.

Figure 2.8: As iris biometrics is used for larger and more varied applications, it will have to deal with irises with a variety of conditions. This image shows an unusual iris (Subject 05931) with filaments of tissue extending into the pupil.

2.4.10 Multibiometrics

According to Kittler and Poh, “the term Multi Biometrics refers to the design of personal identity verification or recognition systems that base their decision on the opinions of more than one biometric expert” [70]. Recent research has highlighted the benefits of using multiple biometric modalities. Benefits include increased population coverage, more user choice, improved reliability, increased resilience to spoofing, and improved authentication performance. Kittler and Poh [70] showed that multi-modal biometrics can provide improved performance compared to individual component experts. They used five off-the- shelf conventional technologies and measured the FRR and FAR on each. They then used weighted averaging to fuse the scores from the five experts. A weighted fusion of all five experts had an order of magnitude lower error rates than any single expert. Adding a quality measure to multibiometrics is also beneficial. In an experiment involving the fusion of six face systems and one speech system, “using quality measures [reduced] the verification error ... over the baseline fusion classifier (without quality measure), by as much as 40%.” In creating a multi- biometric system, there is a trade-off between improved accuracy and increased computation. This is an optimization problem: find the subset of candidate bio- metric experts that maximizes performance and minimizes cost. Furthermore, the solution to the optimization problem should be robust to the population mis- match between the development and target data sets. Kittler and Poh found that cross-validation method of evaluation is particularly sensitive to the mismatch in data sets, and that the Chernoff bound is a better alternative. A book chapter by Jain et al. [63] also gives an in-depth discussion of multi- biometrics. Jain et al. divided multibiometric systems into six categories: multi-

sensor, multi-algorithm, multi-instance, multi-sample, multimodal, and hybrid. Information can be fused at sensor-level, feature-level, score-level, rank-level, and decision-level. Figure 2.9 shows an example acquisition setup that could be used for multibiometrics. This “Iris on the Move” portal captures video of a person’s face as the subject walks through the portal. The video frames have sufficient resolution to enable iris recognition. Using a combination of face recognition and iris recognition on these videos would be an example of multi-algorithm biometrics.

Figure 2.9: MBGC data included near infrared iris videos captured with a Sarnoff Iris on the Move portal, shown above. Video of a subject is captured as a user walks through the portal. This type of acquisition is less constrained than traditional iris cameras; however, the quality of the iris images acquired is poorer. It is possible to acquire both face and iris information using this type of portal. (Picture reprinted from [16] with permission from Elsevier.)

CHAPTER 3

FRAGILE BIT COINCIDENCE

As mentioned in section 2.4.5, not all bits in an iris code have equal value. The observation that some bits in the iris code are less consistent than others was first made by Bolle et al. [13]. We define an iris code bit as “fragile” when there is a substantial probability of it ending up a 0 for some images of the iris and a 1 for other images of the same iris. My previous research [53] has shown that iris recognition performance can be improved by masking these fragile bits. Rather than ignoring fragile bits completely, we considered what beneficial information could be obtained from fragile bits. In this chapter, we present evidence that the locations of fragile bits tend to be consistent across different iris codes of the same eye, and that this information can be used to improve iris biometrics performance. Portions of this chapter have been reprinted, with permission, from the Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems [56]

(© 2009, IEEE).

3.1 Motivation

When using fragile bit masking [53], we mask a significant amount of information because it is not “stable”. Rather than completely ignoring all of that fragile bit information, we would like to find a way to make some beneficial use

of those bits. We know that the values (zero/one) of those bits are not stable. However, the physical locations of those bits should be stable and might be used to improve our iris recognition performance. We call the physical locations of fragile bits a fragile bit pattern. Figure 3.1 shows some iris images and Figure 3.2 shows the corresponding fragile bit patterns. Figure 3.2(a) and Figure 3.2(b) both show subject number 2463, and Figure 3.2(c) and Figure 3.2(d) both show subject 4261. The fragile bit patterns in Figure 3.2(a) and Figure 3.2(b) are more similar to each other than the fragile bit patterns in Figure 3.2(a) and Figure 3.2(c). To compute the Hamming distance between two iris codes, we must first combine (AND) the masks of the two iris codes. Figure 3.3 shows the fragility masks obtained by ANDing pairs of fragility masks together. For example, Figure 3.3(a) is the comparison mask obtained by combining Figure 3.2(a) and 3.2(b). Figure 3.3(a) and 3.3(b) both show masks obtained when computing the Hamming distance for a match comparison (same subject). Figure 3.3(c) and 3.3(d) show masks for nonmatch comparisons. The fragile bit patterns for the match comparisons coincide more closely than the fragile bit patterns for the nonmatch comparisons. By looking at how well two fragile bit patterns align, we can make a prediction on whether those two irises are from the same subject or from different subjects. We can fuse that information with the Hamming distance information and get an improved prediction over using the Hamming distance alone. The rest of this chapter is organized as follows. In section 3.2 we talk about related research. Section 3.3 describes the data sets used for our experiments in this chapter. Section 3.4 defines a new metric, the fragile bit distance (FBD), which quantifies the difference between two fragile bit patterns. In section 3.5, we

(a) 02463d1910   (b) 02463d1912   (c) 04261d1032   (d) 04261d1034

Figure 3.1: Example images from our data set. These images were captured using an LG4000 iris camera.

Fragility Masks

(a) 02463d1910 fragility mask: 1116 masked bits

(b) 02463d1912 fragility mask: 1128 masked bits

(c) 04261d1032 fragility mask: 1098 masked bits

(d) 04261d1034 fragility mask: 1118 masked bits

Figure 3.2: These are the fragile bit patterns (imaginary part) corresponding to the images in Figure 3.1. Black pixels are bits masked for fragility. We use 4800-bit iris codes and mask 25% of the bits (or 1200 bits) for fragility. Some of the bits are masked for occlusion, and so slightly less than 1200 bits are masked for fragility.

Comparisons Between Pairs of Masks

(a) 2463 match comparison: 1706 masked bits

(b) 4261 match comparison: 1738 masked bits

(c) Nonmatch comparison: 1957 masked bits

(d) Nonmatch comparison: 1978 masked bits

Figure 3.3: These are comparisons of fragile bit patterns, each obtained by ANDing two fragile bit masks together. For example, Figure 3.3(a) is the comparison mask obtained by combining Figure 3.2(a) and 3.2(b). Black pixels show where the two masks agreed. Blue pixels show where they disagreed. White pixels were unmasked for both iris codes. There is more agreement in same-subject comparisons than there is when comparing masks of different subjects.

present graphs of the distributions of FBD and Hamming distance. Section 3.6 discusses methods of fusing Hamming distance with FBD. In section 3.7 we show that the proposed method results in a statistically significant improvement over using Hamming distance alone. Finally, section 3.8 presents experiments showing the effect of changing the amount of fragile bit masking used.

3.2 Related Work

In the previous chapter, we surveyed some of the many papers presenting research in iris biometrics. Here, we focus on the small subset of research that investigates fusing Hamming distance with other information.

3.2.1 Research on Fusing Hamming Distance with Added Information

A small subset of iris biometrics research investigates combining Hamming distance with other information. A work by Sun et al. [112] aims to characterize global iris features using the following feature extraction method. First, they introduce a local binary pattern operator (LBP) to characterize the iris texture in each block of the iris image. The image block information is combined to construct a global graph. Finally, the similarity between two iris images is measured using a graph matching scheme. They fuse the LBP score with Hamming distance using the sum rule. They report that using Hamming distance alone yields an equal error rate (EER) of 0.70%, but the score-fusion of Hamming distance with their LBP method yields an EER of 0.37%. As an alternative to the sum rule, Sun et al. [112] state that the LBP score could be combined with Hamming distance using cascaded classifiers. Since their LBP method is slower than computing the Hamming distance, they suggest

calculating the Hamming distance first. If the Hamming distance is below some low threshold, the comparison is classified as a match. If the Hamming distance is above some high threshold, the comparison is classified as a nonmatch. If the Hamming distance is between those two thresholds, the second classifier (using LBP) should make the decision. Vatsa et al. [117] characterize iris texture using Euler numbers. They use a Vector Difference Matching algorithm to compare Euler codes from two irises. Vatsa et al. combine Hamming distance and Euler score using a cascaded classifier. Zhang et al. [124] use log Gabor filters to extract 32 global features characterizing iris texture. To compare the global features from two iris images, they use a weighted Euclidean distance (WED) between feature vectors. Zhang et al. use cascaded classifiers to combine the global WED with a Hamming distance score. However, unlike Sun et al. [112] and Vatsa et al. [117], they propose using their global classifier first, and then using Hamming distance. In their experiments, using Hamming distance alone gave a false accept rate (FAR) of 8.1% when the false reject rate (FRR) was 6.1%. The fusion of WED and Hamming distance gave FAR = 0.3%, FRR = 1.9%. Park and Lee [90] generate one feature vector using the binarized directional subband outputs at various scales. To compare two binary feature vectors, they use Hamming distance. A second feature vector is computed as the blockwise normalized directional energy values. Energy feature vectors are compared using Euclidean distance. To combine scores from these two feature vectors, Park and Lee use a weighted average of the two scores. Using the binary feature vectors alone gives an EER of 5.45%; the energy vectors yield an EER of 3.80%; when the two scores are combined, the EER drops to 2.60%.

All of the above mentioned papers combine Hamming distance scores with some other scores at the matching score level to improve iris recognition. Sun et al. [112] combine scores by summing. Three of the papers [112, 117, 124] use cascaded classifiers. Park and Lee [90] use a weighted average. Our work is similar to these papers in that we also consider combining two match scores to improve performance. We differ from these other works in that we are the first to use a score based on the location of fragile bits in two iris codes.
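As an illustration of the cascaded-classifier fusion used in several of the papers above, the sketch below decides directly from the Hamming distance when it is clearly low or high and defers borderline cases to a second, slower score; the thresholds are hypothetical, not values from the cited work.

def cascaded_decision(hd, second_stage_score, second_stage_threshold,
                      low=0.25, high=0.40):
    """Decide cheaply from the Hamming distance when possible; otherwise fall
    back on the second classifier for the borderline region."""
    if hd <= low:
        return "match"
    if hd >= high:
        return "nonmatch"
    # Borderline region: invoke the second (slower) classifier.
    return "match" if second_stage_score <= second_stage_threshold else "nonmatch"

# Toy usage: a borderline Hamming distance defers to the second classifier.
print(cascaded_decision(0.32, second_stage_score=0.1, second_stage_threshold=0.2))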

3.2.2 Research on Fragile Bits

Research on fragile bits is a more recent trend in iris biometrics literature. One of our previous papers [53] presented evidence that not all bits in the iris code are of equal consistency. We investigated the effect of different filters on bit fragility.

We used 1D log-Gabor filters and multiple sizes of a 2D Gabor filter, and found that the fragile bit phenomenon was apparent with each filter tested. The largest filter tended to yield fewer fragile bits than the smaller filters. We investigated possible causes of inconsistencies and concluded that the inconsistencies are due largely to the coarse quantization of the filter response. We performed an exper- iment comparing (1) no masking of fragile bits (baseline) with (2) masking bits corresponding to complex filter responses near the axes of the complex plane. We masked fragile bits corresponding to the 25% of filter responses closest to the axes. Using a data set of 1226 images from 24 subjects, we found that fragile bit masking improved the separation between the match and nonmatch score distributions. Other researchers have also begun to investigate the effects of masking fragile bits. Barzegar et al. [6] investigated fragile bit masking using different thresh- olds. They compared (1) no fragile bit masking to (2) fragile bit masking with

thresholds of 20%, 30% and 35%. They found that using a threshold of 35% for masking produced the lowest error rates on the CASIA-IrisV3 data set. Our own initial investigations have shown that the optimal fragility threshold may depend partly on the quality of the iris images being used; therefore, we feel that further investigation into the proper fragility threshold would be worthwhile. Dozier et al. [37] also tried masking inconsistent bits and found an improvement in performance. However, they used a different method than Hollingsworth et al. [53] and Barzegar et al. [6]. Hollingsworth et al. [53] and Barzegar et al. [6] approximated fragile bit masking by masking filter responses near the axes of the complex plane. In contrast, Dozier et al. used a training set of ten images per subject to find consistency values for each bit in the iris code. Then for that subject, they only kept bits that were 90% or 100% consistent in their training set, and masked all other bits. In addition, they also considered only those bits that had at least 70% coverage in their training set; that is, if a bit was occluded by eyelids or eyelashes in four or more of the training images, they masked that bit. Dozier et al. tested their method on six subjects from the ICE data set. In a similar paper, Dozier et al. [38] again showed the benefit of masking inconsistent bits. In this work, they used a genetic algorithm to create an iris code and corresponding mask for each subject. Once again, they used ten training images per subject in generating their fragile bit masks. Each of the above mentioned papers showed the benefit of masking fragile bits, but in every case, they simply discarded all information from the fragile bits. None of them considered employing the locations of fragile bits as an extra feature to fuse with Hamming distance. The only paper that showed a benefit from using the locations of fragile bits

was our conference paper [56]. That paper introduced the idea of comparing fragile-bit-locations between two irises, and tested our idea on a data set of 9784 images. Here we present experiments on a data set more than twice the size of our prior set. We have further analyzed the distribution of fragile bit distance, added statistical tests evaluating our proposed method, and investigated the effect of varying the amount of fragile bit masking used.

3.3 Data

We acquired a data set of 19,891 iris images taken with an LG4000 iris camera [76] at the University of Notre Dame. Some example images are shown in

Figure 3.1 and the camera is shown in Figures 3.4 and 3.5. The images are 640 pixels by 480 pixels. All images in this set were acquired between January 2008 and May 2009. A total of 686 different people attended acquisition sessions, so there are 1372 different eyes in the data set. Each subject attended between one and eighteen acquisition sessions. At each session, we usually acquired three left eye images and three right eye images. The minimum number of images for any one subject is four (two of each iris), and the maximum is 108 (54 of each iris). For our experiments, we used the most current version of our in-house iris biometric software. This software is based on NIST’s IrisBEE software. It uses one-dimensional log-Gabor filters for extracting the iris texture from the segmented and unwrapped iris image. One modification that we made to the IrisBEE software is that our software now uses active contours for segmentation. Additionally, we added fragile bit masking to the software; we use a default fragile bit masking threshold of 25% [53]. In section 3.8, we investigate the effects of changing this threshold. A third modification involves the size of the iris code.

Figure 3.4: Images in our data set were captured using this LG4000 iris camera [76].

Figure 3.5: The LG4000 iris camera captures images of both eyes at the same time.

We took the default 240 by 20 normalized iris image, and averaged neighboring rows to create a smaller image to use when generating the iris code. We averaged pixel values from rows one and two to produce the first output row, from rows three and four to produce the second output row, and so forth, so that the final normalized iris image was reduced to a 240 by 10 image. Let L(x, y) be the pixel intensity at position (x, y) in the 240 by 20 normalized image. Let S(x, y) be the pixel intensity at position (x, y) in the smaller image. The computation used to create the smaller image was

S(x, y) = [L(2x, y) + L(2x − 1, y)] / 2.     (3.1)

This row-averaging resulted in a smaller iris code, with no loss in performance [92]. From each pixel in the normalized image, we get two bits in the iris code, so the

final iris code size is 240 by 10 by 2, or 4800 bits.
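A minimal sketch of this row-averaging step (equation 3.1), assuming the normalized image is stored as a rows-by-columns array:

import numpy as np

def average_rows(normalized):
    """Average each pair of neighboring rows of the 20-row normalized iris
    image to produce a 10-row image."""
    assert normalized.shape[0] % 2 == 0
    return 0.5 * (normalized[0::2, :] + normalized[1::2, :])

# Toy usage: a 20 x 240 normalized image (rows x columns) becomes 10 x 240.
norm = np.random.rand(20, 240)
small = average_rows(norm)
print(small.shape)   # (10, 240)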

3.4 Fragile Bit Distance (FBD)

Figure 3.3 provides some indication of what we should expect when comparing two fragile bit patterns. In a genuine comparison, the locations of the fragile bits coincide. In an impostor comparison the locations of the fragile bits do not. When we compare two iris codes, we mask any bit that is fragile in either of the two fragile bit patterns. Therefore, we expect more bits to be masked for fragility in impostor comparisons than in genuine comparisons. We can theoretically predict how many bits will be unmasked in an impostor comparison. In this analysis, we make the assumption that the fragility of bits is independent of position and that each position is independent of all other posi- tions. Consider the iris code for a single, unoccluded image. We mask 25% of bits

for fragility, and leave 75% of bits unmasked. Now consider a comparison of two unoccluded images from different subjects. We expect (75%)(75%) = 56.25% of the bits to be unmasked, and 43.75% of the bits to be masked. Another way to analyze how many bits will be masked is to consider the number of coincident bits. If we mask 25% of the bits in each of the two irises in an impostor comparison, we expect (25%)(25%) = 6.25% of the bits to be coincident fragile bits. About 25% − 6.25% = 18.75% of the bits in the first iris code will be marked as fragile and not line up with any fragile bits from the second iris code. The total number of masked bits for the comparison will be the coincident fragile bits, plus the bits masked in the first iris code only, plus the bits masked in the second iris code only. Therefore we expect that 6.25% + 18.75% + 18.75% = 43.75% of the unoccluded bits will be masked in an impostor comparison. In contrast, a genuine comparison will have fewer masked bits. In two identical images, the fragile bits will line up exactly and the comparison will have 75% unmasked bits and 25% masked bits. However, two different images of the same iris are not identical because of differences in lighting, dilation, distance to the camera, focus, or occlusion. Therefore, on average, more than 25% of the bits will be masked in a genuine comparison. We define a metric called the fragile bit distance (FBD) to measure how well two fragile bit patterns align. In order to compute fragile bit distance, we need to store occluded bits and fragile bits separately. Therefore, each iris template will consist of three matrices: an iris code i, an occlusion mask m, and a fragility mask f. Unmasked bits are represented with ones and masked bits are represented with zeros. Specifically, unoccluded bits and consistent bits are marked as ones, while occluded and fragile bits are zeros. We do not want FBD to be affected by

occlusion, so we consider only unoccluded bits when computing the FBD. Take two iris templates, template A and template B. The FBD is computed as follows:

FBD = ‖ mA ∩ mB ∩ ¬(fA ∩ fB) ‖ / ‖ mA ∩ mB ‖     (3.2)

where ∩ represents the AND operator and ¬ represents the NOT operator. The norm (‖·‖) of a matrix tallies the number of ones in the matrix.

In the above equation, ¬(fA ∩ fB) is a matrix storing all bits masked for fragility. mA ∩ mB is a matrix marking all bits unoccluded by eyelashes and eyelids. The FBD expresses the fraction of unoccluded bits masked for fragility in the comparison. This metric is large for impostor comparisons, and small for genuine comparisons. Our theory predicts that we will have an average FBD of 0.4375 for impostor comparisons, and an average FBD of somewhere between 0.25 and 0.4375 for genuine comparisons. We tested these predictions on our data set of 19,891 images. The average FBD values for genuine and impostor comparisons are reported in Table 3.1, with standard deviations reported in parentheses. The average impostor FBD was within one standard deviation of the theoretical prediction. Also, the average genuine FBD was less than the average impostor FBD.
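Equation (3.2) can be sketched in a few lines of Python; the mask layout and array shapes below are illustrative assumptions:

import numpy as np

def fragile_bit_distance(m_a, f_a, m_b, f_b):
    """m_* are occlusion masks and f_* are fragility masks (1 = usable bit,
    0 = occluded / fragile), all boolean arrays with the iris code shape."""
    unoccluded = m_a & m_b                      # bits visible in both templates
    fragile_either = ~(f_a & f_b)               # fragile in at least one template
    return np.count_nonzero(unoccluded & fragile_either) / np.count_nonzero(unoccluded)

# Toy usage: random masks with roughly 25% of bits flagged fragile in each code.
rng = np.random.default_rng(1)
shape = (240, 10, 2)
m_a = m_b = np.ones(shape, dtype=bool)          # no occlusion in this toy example
f_a = rng.random(shape) > 0.25
f_b = rng.random(shape) > 0.25
print(fragile_bit_distance(m_a, f_a, m_b, f_b))  # near 0.4375 for unrelated codes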

3.5 Score Distributions for Hamming Distance and Fragile Bit Distance

We graphed the genuine and impostor score distributions for fragile bit distance (FBD) from all possible comparisons in our 19,891-image data set. Figure 3.6 shows the result. In comparison, Figure 3.7 shows the genuine and impostor score distributions for Hamming distance. There is more separation between the

TABLE 3.1

AVERAGE FBD FOR GENUINE AND IMPOSTOR COMPARISONS

                     Avg. Genuine FBD            Avg. Impostor FBD
Theoretical value    between 0.25 & 0.4375       0.4375
LG4000 images        0.4047 (std dev = 0.0149)   0.4397 (std dev = 0.0097)

genuine and the impostor score distributions for Hamming distance than there is for FBD. The FBD genuine score distribution looks more bell-shaped than the Hamming distance genuine score distribution. Figure 3.8 shows the joint distribution of FBD and Hamming distance. Figure 3.9 shows the same joint distribution, zoomed in on the area of the graph where the genuine and impostor score distributions meet. Each blue point in these figures represents at least 0.003% of the 247,872 match comparisons in our experiment, and each red point represents at least 0.003% of the 197,229,390 nonmatch comparisons. Selecting a single threshold of Hamming distance (e.g. HD = 0.35) would separate genuine and impostor comparisons better than any threshold we might choose for FBD. Using FBD, we achieve an equal error rate of 6.34 × 10⁻² on this data set. Using Hamming distance, the equal error rate is 8.70 × 10⁻³.

3.6 Fusing Fragile Bit Distance with Hamming Distance

Even though the FBD is not as powerful a metric as the Hamming distance, we can combine the features to create a better classifier than Hamming distance alone. To combine Hamming distance and FBD, we first tried a weighted average technique, using the same approach as [90].


Figure 3.6: Score distributions for fragile bit distance.


Figure 3.7: Score distributions for Hamming distance.


Figure 3.8: Joint score distributions for Hamming distance and FBD. Genuine scores are shown in blue. Impostor scores are shown in red.


Figure 3.9: A zoomed-in view of the joint score distributions for Hamming distance and FBD. Genuine scores are shown in blue. Impostor scores are shown in red. Each point represents at least 0.003% of the comparisons.

We combined the two scores using the equation,

ScoreW = α × HD + (1 − α) × FBD. (3.3)

We varied the parameter α in steps of 0.1 from 0 to 1, and calculated the equal error rate for each run. Figure 3.10 shows how the equal error rate changes as α varies. The lowest equal error rate was 8.02 × 10⁻³, which was obtained using an α value of 0.6. The benefit of using a weighted average can be seen visually in Figure 3.9. This figure shows the joint distribution of Hamming distance and FBD scores. The vertical line marked “HD=constant” shows how using Hamming distance would separate the genuine and impostor scores. The diagonal line marked “0.6HD + 0.4FBD=constant” shows that a better separation between genuine and impostor scores is achieved using the weighted average.

Multiplication can be used as an alternative method of score fusion:

ScoreM = HD × FBD. (3.4)

When using multiplication, the equal error rate was 7.99 × 10⁻³. Fusing by multiplication and fusing by weighted average yielded similar results. An ROC curve showing the results of these tests is shown in Figure 3.11, and Table 3.2 shows summary statistics of these experiments including the equal error rate (EER) and the false reject rate at an operating point of FAR=0.001 (FRR at FAR=0.001).

Based on the values in Table 3.2, we see that both methods of fusing Hamming distance with FBD performed better than using Hamming distance alone. By incorporating FBD, we improved the accuracy of our iris matcher.
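The two fusion rules can be sketched as follows; the example scores are made up for illustration:

import numpy as np

def fused_scores(hd, fbd, alpha=0.6):
    """Weighted-average fusion (equation 3.3) and product fusion (equation 3.4).
    hd and fbd are arrays of comparison scores; smaller fused scores indicate
    more likely genuine comparisons."""
    weighted = alpha * hd + (1.0 - alpha) * fbd
    product = hd * fbd
    return weighted, product

# Toy usage with a genuine-looking and an impostor-looking comparison.
hd = np.array([0.20, 0.45])
fbd = np.array([0.40, 0.44])
print(fused_scores(hd, fbd))   # the genuine comparison gets the smaller fused scores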


Figure 3.10: We fused FBD and HD using the expression, α×HD+(1−α)×FBD. We found that an α value of 0.6 yielded the lowest equal error rate.


Figure 3.11: Fusing Hamming distance with FBD performs better than using Hamming distance alone. Fusing by multiplying and fusing by weighted averaging yield similar results.

TABLE 3.2

FUSING FBD WITH HAMMING DISTANCE

Method            EER           FRR at FAR=0.001
HD (baseline)     8.70 × 10⁻³   1.40 × 10⁻²
0.6HD + 0.4FBD    8.02 × 10⁻³   1.25 × 10⁻²
HD × FBD          7.99 × 10⁻³   1.23 × 10⁻²

One caveat with using FBD is that in order to compute FBD, we have to store the fragility mask separately from the occlusion mask. Therefore, our iris template is 50% larger than it would be if we did not use FBD.

3.7 Tests of Statistical Significance

The proposed fusion between Hamming distance and FBD works better than the baseline test of Hamming distance alone. We performed a statistical test to determine whether this difference was statistically significant. The null hypothesis for this test is that there is no difference between the baseline Hamming distance method and the proposed fusion of Hamming distance and FBD. The alternative is that there is a significant difference. To test for statistical significance, we randomly divided the subjects into ten different test sets. For each test set, we measured the performance of using Hamming distance alone, and of using fusion of Hamming distance and FBD. Then we used a paired t-test to see whether the proposed method obtained a statistically significant improvement. The results are given in Table 3.3 for weighted average fusion. Table 3.4 shows the results for fusion using multiplication.

TABLE 3.3

IS 0.6HD + 0.4FBD BETTER THAN HD ALONE?

Method                       Avg. EER      Avg. FRR at FAR=0.001
HD (baseline)                8.68 × 10−3   1.51 × 10−2
0.6HD + 0.4FBD (proposed)    8.08 × 10−3   1.33 × 10−2
p-value                      3.68 × 10−3   1.45 × 10−3

TABLE 3.4

IS HD × FBD BETTER THAN HD ALONE?

Method                Avg. EER      Avg. FRR at FAR=0.001
HD (baseline)         8.68 × 10−3   1.51 × 10−2
HD × FBD (proposed)   8.00 × 10−3   1.33 × 10−2
p-value               6.43 × 10−3   1.90 × 10−3

Table 3.4 shows the results for fusion using multiplication. The t-test showed statistically significant improvement of the proposed method over the baseline for both EER and false reject rate at a false accept rate of 0.1% (FRR at FAR=0.001). Rerunning the same experiment using different random test sets gave similar results.

Recall that when we performed fusion using a weighted average, we used the equation,

ScoreW = α × HD + (1 − α) × FBD (3.5)

and we found that a weight value of α = 0.6 worked best. We performed a statistical test to determine whether the results obtained with this value of α were significantly different from those obtained with other values. For this test, we again divided the subjects randomly into ten different test sets.

We varied the parameter α in steps of 0.1 from 0 to 1. For a given value of α, we computed the equal error rate for each of the test sets, then found the average equal error rate for this value of α across all test sets. The results are shown in Table 3.5. Next, we performed a paired t-test to determine whether the given value of α produced significantly different results than using α = 0.6. The p-values for these tests are also shown in Table 3.5. We found that at a significance level of p = 0.05, values of α between 0.4 and 0.7 were not significantly different from α = 0.6. However, other values of α were significantly different.

3.8 Effect of Modifying the Fragile Bit Masking Threshold

Recall that fragile bit masking ignores the bits corresponding to complex filter responses close to the axes of the complex plane (see section 2.4.5). In the experiments presented up to this point, we masked 25% of bits in each iris code for fragility. We chose this threshold because that was the threshold used in previous work [53]. In this paper, we wanted to study how changing the threshold would affect our results. We ran experiments varying the threshold used for fragile bit masking. First we ran one test with 0% fragile bit masking. We ran an all-vs-all test (comparing all images to all other images in the data set) and computed the performance using Hamming distance alone.

TABLE 3.5

IS αHD + (1 − α)FBD STATISTICALLY SIGNIFICANTLY DIFFERENT FROM 0.6HD + 0.4FBD?

α      Avg. EER      p-value       Yes/No?
0      6.28 × 10−2   2.81 × 10−8   Yes
0.1    2.09 × 10−2   1.75 × 10−6   Yes
0.2    1.16 × 10−2   1.79 × 10−4   Yes
0.3    8.98 × 10−3   1.24 × 10−2   Yes
0.4    8.21 × 10−3   4.29 × 10−1   No
0.5    8.08 × 10−3   9.62 × 10−1   No
0.6    8.08 × 10−3   --            --
0.7    8.19 × 10−3   1.18 × 10−1   No
0.8    8.37 × 10−3   2.56 × 10−2   Yes
0.9    8.56 × 10−3   5.25 × 10−3   Yes
1.0    8.68 × 10−3   3.68 × 10−3   Yes

The equal error rate for that test was 8.26 × 10−3. Next, we varied the threshold from 5% to 30% in increments of 5%. At each threshold, we ran three all-vs-all tests. The first test used Hamming distance alone. The second test used a weighted average of Hamming distance and fragile bit distance: 0.6HD + 0.4FBD. The third test used the multiplication of Hamming distance and fragile bit distance: HD × FBD. At low levels of fragile bit masking, the Hamming distance test and the weighted average test gave very similar results. The ROC curves for those tests are shown in Figure 3.12. At thresholds of 15%, 20%, 25%, and 30%, the difference between Hamming distance and the weighted average was larger (see Figures 3.13 and 3.14).
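A hedged sketch of how a fragile bit masking threshold could be applied, assuming the complex filter responses behind each pair of iris-code bits are available; whether the fraction is computed jointly over the real- and imaginary-part bits, as here, or separately is an implementation detail not spelled out in this summary:

```python
import numpy as np

def fragility_masks(coeffs, mask_fraction):
    """Flag the weakest mask_fraction of iris-code bits as fragile.
    Each complex coefficient yields two bits, one from the sign of its real
    part and one from the sign of its imaginary part; a bit is fragile when
    the corresponding component is close to zero (close to a complex-plane axis)."""
    mags = np.concatenate([np.abs(coeffs.real), np.abs(coeffs.imag)])
    cutoff = np.quantile(mags, mask_fraction)   # e.g. 0.25 masks the weakest 25% of bits
    return np.abs(coeffs.real) <= cutoff, np.abs(coeffs.imag) <= cutoff

# Placeholder filter responses; real ones come from the log-Gabor filtering step.
coeffs = np.random.randn(2048) + 1j * np.random.randn(2048)
real_fragile, imag_fragile = fragility_masks(coeffs, 0.25)
print((real_fragile.mean() + imag_fragile.mean()) / 2)   # roughly 0.25
```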

The best performance using Hamming distance alone was achieved using 5% fragile bit masking; at this threshold, the equal error rate was 8.15 × 10−3. The best performance using the weighted average of Hamming distance and FBD was achieved using a 25% fragile bit threshold; the equal error rate on this test was 8.02 × 10−3. The best performance for the multiplication of Hamming distance and fragile bit distance was 7.99 × 10−3, and this was achieved using a 25% fragile bit masking threshold. We observe that the fusion of Hamming distance and fragile bit distance has greater benefit when a higher level of fragile bit masking is used. We only tested fragile bit masking thresholds up to 30% on our data set because our experiments indicate that, for our data and software, increasing the fragile bit masking further would not improve performance. On the other hand, other researchers have found that fragile bit masking of 35% worked best on the CASIA version 3 data set [6].

We postulate that any system that uses a fragile bit masking level of 15% or higher could benefit from using fragile bit distance in addition to Hamming distance.

[Figure 3.12 plot: verification performance (true accept rate versus false accept rate) for HD and for fusion at 5% and 10% fragile bit masking.]

Figure 3.12: We considered the effect of masking only 5% or 10% of the bits in the iris code for fragility. Using these values, we compared the performance of (1) Hamming distance (HD) with the performance of (2) fusing HD and FBD with a weighted average (0.6HD + 0.4FBD). At these low levels of fragile bit masking, the difference between HD and the fusion is small. The ROC curves for the two methods overlap.

[Figure 3.13 plot: verification performance (true accept rate versus false accept rate) for HD and for fusion at 15% and 20% fragile bit masking.]

Figure 3.13: We considered the effect of masking 15% or 20% of the bits in the iris code for fragility. Again, we compared the performance of (1) Hamming distance (HD) with performance of (2) fusing HD and FBD with a weighted average. At these levels of fragile bit masking, the fusion clearly does better than HD alone.

[Figure 3.14 plot: verification performance (true accept rate versus false accept rate) for HD and for fusion at 25% and 30% fragile bit masking.]

Figure 3.14: We considered the effect of masking 25% or 30% of the bits in the iris code for fragility. At these levels of fragile bit masking, the fusion shows an even greater performance benefit over HD alone than there was at lower levels of fragile bit masking.

3.9 Discussion

In this chapter, we defined a new metric, the fragile bit distance (FBD), which measures how two fragile bit masks differ. Low FBDs are associated with genuine comparisons between two iris codes. High FBDs are associated with impostor comparisons.
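As a reminder of the metric, a minimal sketch, assuming, per the definition in Section 3.4, that FBD is the fraction of commonly unoccluded bit positions at which the two fragility masks disagree (consult that section for the exact normalization):

```python
import numpy as np

def fragile_bit_distance(fragile_a, fragile_b, occluded_a, occluded_b):
    """Fraction of commonly unoccluded bit positions where one iris code marks
    the bit fragile and the other does not. All inputs are boolean arrays."""
    valid = ~occluded_a & ~occluded_b            # bits usable in both codes
    disagree = np.logical_xor(fragile_a, fragile_b) & valid
    return disagree.sum() / max(int(valid.sum()), 1)
```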

Fusion of FBD and Hamming distance is a better classifier than using Hamming distance alone. Fusion can be done either by using a weighted average of FBD and Hamming distance, or by multiplying. The multiplication of FBD and Hamming distance reduces the EER of our iris recognition system by eight percent, from 8.70 × 10−3 to 7.99 × 10−3, a statistically significant improvement. Fusing FBD and Hamming distance has a greater benefit when higher levels of fragile bit masking are used. At low levels of fragile bit masking, fusion had similar results to using Hamming distance alone on our data. When using fragile bit masking thresholds of 15% or greater, fusion had superior performance.

CHAPTER 4

AVERAGE IMAGES

The previous chapter focused on reducing error rates in experiments involving still images. In this chapter, we consider how to reduce error rates when we have an entire video clip available for both probe and gallery. Portions of this chapter have been reprinted, with permission, from the Proc. Int. Conf. on Biometrics [54] (© 2009, Springer Berlin/Heidelberg) and from IEEE Transactions on Information Forensics and Security [57] (© 2009, IEEE).

4.1 Motivation

Zhou and Chellappa [126] reported that using video can improve face recognition performance. We postulated that employing similar techniques for iris recognition could also yield improved performance. There is some prior research in iris recognition that uses multiple still images; for example, [39, 40, 72, 80, 105], but the research using video for iris recognition is still in its infancy. There are drawbacks to using single still images. One problem with single still images is that they usually have a moderate amount of noise. Specular highlights and eyelash occlusion reduce the amount of iris texture information present in a single still image. With a video clip of an iris, however, a specular highlight in one frame may not be present in the next. Additionally, the amount of eyelash

occlusion is not constant throughout all frames. It is possible to obtain a better image by using multiple frames from a video to create a single, clean iris image.

A second difficulty with still images is that lighting differences can cause an increased Hamming distance score in a comparison between two stills. By combining information from multiple frames of a video, we can reduce variations caused by changes in lighting.

Zhou and Chellappa suggested averaging to integrate texture information across multiple video frames to improve face recognition performance. By combining multiple images, noise is smoothed away, and relevant texture is maintained. In this chapter, we present a method of averaging frames from an iris video. Our experiments demonstrate that our signal-level fusion of multiple frames in an iris video can improve iris recognition performance. We perform image fusion of iris images at the pixel level. Our experiments show that the traditional segmentation and unwrapping of the iris can be used as a satisfactory method of image registration. We compare two methods of pixel fusion: using the mean and using the median. There have been a number of papers discussing score-level fusion for iris recognition, but there has not been any work done with signal-level fusion for iris recognition. Since we are the first to propose the use of signal-level fusion for iris recognition, we show that this type of fusion can perform comparably to score-level fusion. We focus on reimplementing multiple score-level fusion techniques to show that signal-level fusion can achieve at least as good recognition rates as score-level fusion. Our experiments show that our method achieves superior recognition rates to some score-level fusion techniques suggested in the literature. Additionally, our signal-fusion method has a faster computation time for matching

than the score-level fusion methods.

4.2 Related Work

4.2.1 Video

Video has been used effectively to improve face recognition. A recent book chapter by Zhou and Chellappa [126] surveyed a number of methods to employ video in face biometrics. In contrast, there is very little research using video in iris biometrics. In an effort to encourage research in iris biometrics using unconstrained video, the U.S. government organized the Multiple Biometric Grand Challenge [95]. The data provided with this challenge included two types of near infrared iris videos: (1) iris videos captured using an LG 2200 camera, and (2) videos containing iris and face information captured using a Sarnoff Iris on the

Move portal [82]. There has been a small amount of work published using the MBGC data. First, some preliminary results were presented at a workshop [93]. In addition, two conference papers using MBGC iris videos were published in the most recent

International Conference on Biometrics. The first paper was our initial version of this research [54]. The second paper by Lee et al. [74] presented methods to detect eyes in the MBGC portal videos and measure the quality of the extracted eye images. They compared portal iris videos to still images. At a false accept rate of 0.80%, they achieved a false reject rate of 43.90%. A recent journal paper by Zhou et al. [127] also presented some results on the MBGC iris video data. Zhou et al. suggested making some additions to the traditional iris system in order to select the best frames from video. First they checked each frame for interlacing, blink, and blur. They used interpolation to

correct interlaced frames, and they discarded blurry frames and frames without an eye. Selected frames were segmented in a traditional manner and then assigned a confidence score relating to the quality of the segmentation. They further evaluated quality by looking at the variation in iris texture, the amount of occlusion, and the amount of dilation. They divided the iris videos into five groups based on quality score, and showed that a higher quality score correlated with lower equal error rate.1 Our work differs from Lee's [74] and Zhou's [127] in that we use videos for both gallery and probe sets. Also, we compare the use of stills and the use of videos directly, while they do not. In addition, their papers focus on selecting the best frame from a video to use for subsequent processing. In contrast, the main focus of this work is how to combine information from multiple frames using signal-level fusion.

4.2.2 Still Images

Some iris biometric research has used multiple still images, but all such research uses score-level fusion, not signal-level fusion. The information from multiple images has not been combined to produce a better image. Instead, these experiments typically employ multiple enrollment images of a subject, and combine matching results across multiple comparisons.

Du [39] showed that using three enrollment images instead of one increased their rank-one recognition rate from 98.5% to 99.8%. The paper reported, “We randomly choose three images [of] each eye from the database to enroll and used the rest [of the] images to test. We did [this] multiple times and the average identification [accuracy] rate is 99.8%.

1Lee et al. [74] and Zhou et al. [127] both investigate quality of video frames. A number of papers have investigated quality of still images, including Vatsa et al. [118], Belcher and Du [8], and Proença and Alexandre [96].

If two images are randomly selected to enroll, ... the average identification accuracy rate is 99.5%. If one image is randomly selected to enroll ... the average identification accuracy is 98.5%.” In another paper [40], Du et al. used four enrollment images instead of three. Ma et al. [80] also used three templates of a given iris in their enrollment database, and took the average of three scores as the final matching score. Krichen et al. [72] performed a similar experiment, but used the minimum match score instead of the average. Schmid et al. [105] presented two methods for fusing Hamming distance scores. They computed the average Hamming distance and also a log-likelihood ratio. They found that in many cases, the log-likelihood ratio outperformed the average Hamming distance. In all of these cases, information from multiple images was not combined until after two stills were compared and a score for the comparison was obtained. Thus, these researchers used score-level fusion. Another method of using multiple iris images is to use them to train a classifier. Liu et al. [78] used multiple iris images for a linear discriminant analysis algorithm.

Roy and Bhattacharya [103] used six images of each iris class to train a support vector machine. Even in training these classifiers, each still image was treated as an individual entity, rather than being combined with other still images to produce an improved image.

4.3 Data

We used the Multiple Biometric Grand Challenge (MBGC) version 2 iris video data [95] in our experiments. The videos in this data set were acquired using an Iridian LG EOU 2200 camera (Figure 4.1). To collect iris videos using the LG2200 camera, the analog NTSC video signal from the camera was digitized

using a Daystar XLR8 USB digitizer and the resulting videos were stored in a high bit rate (nearly lossless) compressed MP4 format.

The MBGCv2 data contains 986 iris videos collected during the spring of 2008. However, three of the videos in the data set contain less than ten frames. We dropped those three videos from our experiments and used the remaining 983 videos. The data includes videos of both left and right eyes for each subject; we treated each individual eye as a separate “subject” in our experiments. There are a total of 268 different eyes in these videos. We selected the first video from each subject to include in the gallery set and put the remaining 715 videos in our probe set. For each subject, there were between one and seven iris videos in the data set. Any two videos from the same subject were acquired between one week and three months apart. The MBGC data is the only set of iris videos publicly available.

4.4 Average Images and Templates

4.4.1 Selecting Frames and Preprocessing

Once each iris video was acquired, we wanted to create a single average image that combined iris texture from multiple frames. The first challenge was to select focused frames from the iris video. The auto-focus on the LG 2200 camera continually adjusts the focus in attempts to find the best view of the iris. Some frames have good focus, while others suffer from severe blurring due to subject motion or illumination change. We used a technique described by Daugman with a filter proposed by Kang to select in-focus images. As described by Daugman in [28], a filter can be applied to an image as a fast focus measure, typically in the Fourier domain. By exploiting

Figure 4.1: The Iridian LG EOU 2200 camera used in acquiring iris video sequences.

Parseval's Theorem, we were instead able to apply the filter within the image domain, squaring the response at each pixel. We summed the responses over the entire image, applying the filter to non-overlapping pixels within the image and then averaged the response over the number of pixels the kernel was applied to. The kernel described by Kang and Park [68] was applied to each frame, and the ten with the highest scores were extracted from the video for use in the image averaging experiments. The raw video frames were not pre-processed like the still images that the Iridian software saved. We do not know what preprocessing is done by the Iridian system, although it appears that the system does contrast enhancement and possibly some deblurring. Differences between the stills and the video frames are likely due to differences in the digitizers used to save the signals. We used the Matlab imadjust function [83] to enhance the contrast in each frame. This function scales intensities linearly such that 1% of pixel values saturate at black (0), and 1% of pixel values saturate at white (255).
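A rough sketch of this preprocessing, assuming grayscale frames as NumPy arrays; the 8×8 kernel below is a simple zero-sum stand-in, not the exact kernel of Kang and Park [68], and the contrast stretch mimics imadjust's default percentile behaviour:

```python
import numpy as np

# Simple zero-sum stand-in for the 8x8 high-pass kernel of Kang and Park [68].
KERNEL = np.full((8, 8), -1.0 / 48.0)
KERNEL[2:6, 2:6] = 1.0 / 16.0

def focus_score(img, kernel=KERNEL):
    """Spatial-domain focus measure: apply the kernel to non-overlapping blocks,
    square each response, and average over the number of blocks."""
    img = img.astype(np.float64)
    k = kernel.shape[0]
    h, w = (img.shape[0] // k) * k, (img.shape[1] // k) * k
    blocks = img[:h, :w].reshape(h // k, k, w // k, k)
    responses = np.einsum('iajb,ab->ij', blocks, kernel)   # one response per block
    return float(np.mean(responses ** 2))

def stretch_contrast(img, low_pct=1, high_pct=99):
    """Linear contrast stretch: roughly 1% of pixels saturate at 0 and 1% at 255,
    analogous to Matlab's imadjust defaults."""
    lo, hi = np.percentile(img, [low_pct, high_pct])
    out = (img.astype(np.float64) - lo) / max(hi - lo, 1e-6) * 255.0
    return np.clip(out, 0, 255).astype(np.uint8)

# The highest-scoring frames per video would be kept and contrast-stretched.
```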

Our next step was to segment each frame. Our segmentation software uses a Canny edge detector and a Hough transform to find the iris boundaries. The boundaries are modeled as two non-concentric circles. A description of the segmentation algorithm is given in [79]. Our segmentation algorithm is designed to work for frontal iris images acquired from cooperative subjects. A possible area of future work would be to obtain a segmentation algorithm that could work on off-angle irises and test our image-averaging technique on that type of iris image. Our segmentation and eyelid detection algorithms are not as finely tuned as commercial iris recognition software. To make up for this limitation, we ran two types of experiments for this paper. The first type of experiments uses the

data obtained from the completely automated frame selection, segmentation, and eyelid detection algorithms. We also ran a second set of experiments that included manual steps in the preprocessing. We manually checked all 9830 frames selected by our frame-selection algorithm. A few of the frames did not contain valid iris information; for example, some frames showed blinks. We also found some off-angle iris frames. We replaced these frames with other frames from the same video (Figure 4.2). In total, we replaced 86 (0.9%) of the 9830 frames. Next we manually checked all of the segmentation results and replaced 153 (1.6%) incorrect segmentations (Figure 4.3). We corrected the eyelid detection in an additional 1765 (18%) frames (Figure 4.4).

4.4.2 Signal Fusion

For each video, we now had ten frames selected and segmented. We wanted to create an average image consisting only of iris texture. In order to align the irises in the ten frames, we transformed the raw pixel coordinates of the iris area in each frame into normalized polar coordinates. In polar coordinates, the radius r ranged from zero (adjacent to the pupillary boundary) to one (adjacent to the limbic boundary). The angle θ ranged from 0 to 2π. This yielded an “unwrapped” iris image for each video frame selected. In order to combine the ten unwrapped iris images, we wanted to make sure they were aligned correctly with each other. Rotation around the optical axis induces a horizontal shift in the unwrapped iris texture. We tried three methods of alignment. First, we identified the shift value that maximized the correlation between the pixel values. Second, we tried computing the iris codes and selecting the alignment that produced the smallest Hamming distance. Third, we tried the naive assumption that people would not actively tilt their heads while the iris video was being captured, and thus that no shifts were needed.
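A minimal sketch of the unwrapping step, assuming the segmentation returns the pupillary and limbic boundaries as circles given by centre and radius (function and variable names are illustrative):

```python
import numpy as np

def unwrap_iris(img, pupil, limbus, n_r=64, n_theta=360):
    """Map the iris annulus to a rectangular (r, theta) image.
    pupil, limbus: (cx, cy, radius) circles from segmentation.
    r = 0 lies on the pupillary boundary, r = 1 on the limbic boundary."""
    px, py, pr = pupil
    lx, ly, lr = limbus
    thetas = np.linspace(0, 2 * np.pi, n_theta, endpoint=False)
    rs = np.linspace(0, 1, n_r)
    out = np.zeros((n_r, n_theta), dtype=img.dtype)
    for i, r in enumerate(rs):
        # Interpolate between the two (possibly non-concentric) boundary circles.
        xs = (1 - r) * (px + pr * np.cos(thetas)) + r * (lx + lr * np.cos(thetas))
        ys = (1 - r) * (py + pr * np.sin(thetas)) + r * (ly + lr * np.sin(thetas))
        out[i] = img[np.clip(ys.round().astype(int), 0, img.shape[0] - 1),
                     np.clip(xs.round().astype(int), 0, img.shape[1] - 1)]
    return out
```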



Figure 4.2: The frames shown in (a) and (c) were selected by our frame-selection algorithm because the frames were in focus; however, these frames do not include much valid iris data. In our automated experiments presented in this paper we kept frames like (a) and (c) so that we could show how our software performed without any manual quality checking. In our semi-automated experiments we manually replaced frames like (a) and (c) with better frames from the same video like (b) and (d). We expect that in the future, we may be able to develop an algorithm to detect blinks and off-angle images so that such frames could be automatically rejected.


Figure 4.3: Our automated experiments contain a few incorrect segmentations like the one shown in (a). In our semi-automated experiments we manually replaced incorrect segmentations to obtain results like that shown in (b).


Figure 4.4: Our automated software did not correctly detect the eyelid in all frames. In our semi-automated experiments we manually replaced incorrect eyelid detections to obtain results like that shown in (b).

The first two approaches did not produce any better recognition results than the naive approach. This is because the images used in our experiments are frontal iris images from cooperative users. A different method of alignment would be necessary for iris videos with more eye movement. Since the naive approach worked well for our data, we used it in our subsequent experiments. Parts of the unwrapped images contained occlusion by eyelids and eyelashes. We masked eyelid regions in our image. Then we computed an average unwrapped image from unmasked iris data in the ten original images, using the following algorithm. For each (r, θ) position, we find how many of the corresponding pixels in the ten unwrapped images are unmasked. If a pixel is occluded in nine or ten of the images, we mask it in the average image. Otherwise, an average pixel value is based on unmasked pixel values of the corresponding frames. Therefore, the new pixel value could be an average of between two and ten pixel intensities, depending on mask values. Section 4.5 will give more details on averaging the pixel values. Using this method, we obtained 268 average images from the gallery videos. We similarly obtained 715 average images from the probe videos. An example average image is shown in Figure 4.5. On the top of the figure are the ten original images, and on the bottom is the average image fused from the original signals.
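A minimal sketch of this fusion rule, assuming the ten unwrapped frames and their occlusion masks are stacked into arrays (names are illustrative); the min_unmasked parameter is 2 here, matching the rule above, and Section 4.7 later generalizes it as a masking level:

```python
import numpy as np

def fuse_frames(frames, masks, min_unmasked=2):
    """Signal-level fusion of unwrapped iris frames.
    frames: (N, H, W) array of unwrapped intensities.
    masks:  (N, H, W) boolean array, True where a pixel is occluded.
    A pixel in the average image stays unmasked only if at least
    min_unmasked of the N corresponding pixels are unmasked."""
    valid = ~masks
    count = valid.sum(axis=0)
    total = np.where(valid, frames.astype(np.float64), 0.0).sum(axis=0)
    avg = np.zeros(frames.shape[1:], dtype=np.uint8)
    ok = count >= min_unmasked
    # Mean of the unmasked pixel values, rounded to the nearest 8-bit integer.
    avg[ok] = np.rint(total[ok] / count[ok]).astype(np.uint8)
    out_mask = ~ok                      # masked wherever too few frames contribute
    return avg, out_mask
```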

4.4.3 Creating an Iris Code Template

Our software uses one-dimensional log-Gabor filters to create the iris code template. The log-Gabor filter is convolved with rows of the image, and the corresponding complex coefficients are quantized to create a binary code. Each

Figure 4.5: From the ten original images on the top, we created the average image shown on the bottom.

complex coefficient corresponds to two bits of the binary iris code – either “11”, “01”, “00”, or “10” – depending on whether the complex coefficient is in quadrant

I, II, III, or IV of the complex plane. Complex coefficients near the axes of the complex plane do not produce stable bits in the iris code, because a small amount of noise can shift a coefficient from one quadrant to the next. We use fragile-bit masking [52, 53] to mask out complex coefficients near the axes, and therefore improve recognition performance.
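A small sketch of the quantization step (assuming the complex log-Gabor coefficients are available as a NumPy array); the fragile bit masking sketched in Chapter 3 would then flag coefficients whose real or imaginary part is near zero:

```python
import numpy as np

def quantize(coeffs):
    """Turn complex log-Gabor coefficients into iris-code bits.
    Quadrants I..IV of the complex plane map to the bit pairs
    11, 01, 00, 10 via the signs of the real and imaginary parts."""
    real_bits = (coeffs.real > 0).astype(np.uint8)
    imag_bits = (coeffs.imag > 0).astype(np.uint8)
    return real_bits, imag_bits

# Example: a coefficient in quadrant II (Re < 0, Im > 0) yields the bit pair (0, 1).
print(quantize(np.array([-0.3 + 0.7j])))
```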

4.5 Comparison of Median and Mean for Signal Fusion

Using the basic strategy described in Sections 4.4.2 and 4.4.3, we needed to determine the best method of averaging pixels. Recall that each (r, θ) position in the new average image is the average of corresponding, unoccluded pixels in the ten original unwrapped iris images. We considered two ideas: using the median to combine the pixel values, or using the mean.2

To determine which of these two methods was most appropriate for iris recognition, we compared all images in our probe set to all images in our gallery and graphed a detection error tradeoff (DET) curve [81]. Figure 4.6 shows the result. It is clear from the graphs that using the mean for creating the average images produces better recognition performance than using the median. The median is a useful statistic for removing outliers. However, it is possible that many of the extreme outliers in these iris images have already been removed by eyelid detection. Furthermore, since we are averaging only a small number of pixels (ten or fewer), the median statistic may be less useful than if we had more available data. While the median statistic uses information from only one or two

2To compute the mean, we first summed original pixel values, then divided by the number of pixels, then rounded to the nearest unsigned 8-bit integer.

[Figure 4.6 plots: DET curves (false reject rate versus false accept rate) for median and mean signal fusion; (a) automated segmentation, (b) manually corrected segmentation.]

Figure 4.6: Using a mean fusion rule for fusing iris images produces better iris recognition performance than using a median fusion rule. Graph (a) shows this result using automated segmentation. Graph (b) shows the same result using the manually corrected segmentations.

pixels, the mean statistic involves information from all available pixels. Therefore, in this context, the mean is a better averaging rule than the median.

4.6 How Many Frames Should be Fused in an Average Image?

As described in subsection 4.4.2, we fuse ten frames together to create an average image. However, ten frames may not be the optimal number of frames to use. Fusing more frames can give a better average. On the other hand, we add the best focused frames first, so as we increase the number of frames, we are fusing poorer quality data. To investigate this trade-off, we ran an experiment varying the number of frames to use in the fusion.

Recall that from each video, we had frames selected, segmented, and unwrapped into normalized polar coordinates. For this experiment, rather than using all ten selected frames to create an average image, we selected the four frames having the highest focus scores and we created an average image. In this manner, we collected a gallery set of four-frame average images, and a probe set of four-frame average images. We compared all gallery images to all probe images and graphed the corresponding DET curve (red dash-dot line, Figure 4.7). We repeated this procedure, this time using six of our selected frames to create each average image. The set of six frames from each video was a superset of the set of four frames. We created a gallery set of six-frame average images, and a probe set of six-frame average images, tried all comparisons, and graphed the DET curve on the same axes as the four-frame curve (green solid line, Figure 4.7). We repeated the same procedure three more times, using eight, nine, and ten frames.

All DET curves are shown together in Figure 4.7. With the automated segmentation, each increase in the number of frames fused

[Figure 4.7 plots: DET curves for average images fused from 4, 6, 8, 9, and 10 frames; (a) automated segmentation, (b) manually corrected segmentation.]

Figure 4.7: Fusing ten frames together yields better recognition performance than fusing four, six, or eight frames.

yielded an increase in performance. With the manually corrected segmentation, this trend holds for four, six, and eight frames. However, the DET curves for eight, nine, and ten frames all overlap, suggesting that we have approached the limit of the benefit that can be gained by adding frames. In a previous paper [54], we used six frames instead of ten, but in that paper, we had a different data set and different frame selection algorithm. The data set in our previous paper was a pre-release version of the MBGCv2 videos. 617 of those videos were included in MBGCv2 and we also had an additional 444 iris videos captured during the same semester that were not included in MBGCv2. In our previous paper [54], we chose to use the same frames as were selected by the Iridian-driven software that came with the camera. The software saved frames in sets of three, where one of the three frames was captured while the top camera LED was lit, one frame was captured while the right LED was lit, and one frame was captured while the left LED was lit. Therefore, that technique guaranteed some lighting differences between the frames selected. Our current frame selection technique does not enforce such a requirement, so the ten frames selected using our current method may have fewer variations between them. With fewer variations between the frames, it makes sense that we could average more frames before losing any important texture in the iris.

We imagine that the optimal number of frames to fuse in creating an average image depends both on the data set and on the frame selection algorithm. For this paper, we decided to use ten frames in creating our average images. Using ten frames gave the best performance using the automated segmentation. The choice between using eight, nine, or ten frames for the manually corrected segmentation was not as clear, but ten frames still gave the best equal error rate, and gave

reasonable performance across the whole DET curve.

4.7 How Much Masking Should be Used in an Average Image?

We initially allowed a pixel to be unmasked in the average image if at least two corresponding pixels from the ten frames were unmasked. However, we suspected that a different masking rule could improve performance. We could require that all unmasked pixels in an average image be an average of ten unmasked pixel values from the ten frames (instead of an average of at least two pixels). This requirement could result in average images with not much available unmasked data. If any one frame had a large amount of occlusion, the average image would be heavily masked. On the other hand, we could use any unmasked pixel values from the frames in creating the average image, so that an average pixel value could be an average of between one and ten pixel intensities from the frames, depending on mask values in the frames.

We defined a parameter, the masking level, to specify how much masking is done in an average image. A masking level of 100% means that we only have unmasked pixels in the average image if all ten of the corresponding pixels from our ten frames were unmasked. A masking level of 10% means that the new pixel value could be an average of between one and ten pixel intensities, depending on mask values. A masking level of 50% means that we require at least half of the corresponding pixels to be unmasked before we compute an average and create an unmasked pixel in the average image. At this level, the new pixel value could be an average of between five and ten pixel intensities, depending on mask values.

When we mask too much, we do not have as much iris data in our images from which to make appropriate decisions. With less iris data, and consequently

fewer unmasked bits in a comparison, we get fewer degrees of freedom in the nonmatch distribution. To illustrate this phenomenon, we graphed the nonmatch distribution for a range of masking levels (Figure 4.8). As the masking level increased, the histogram of nonmatch scores got wider, causing an increased false accept rate. In contrast, when we mask too little, we lose the power gained from combining data from a number of different images. The result would be like using too few gallery images in a multi-gallery biometrics experiment. The optimal masking level depends partly on the quality of the segmentation. We created DET curves showing the verification performance as we varied the masking level used in creating the average images (Figure 4.9). With our automated segmentation, a higher masking parameter is better to mitigate the impact of segmentation errors. With the manually corrected segmentations, the quality of the segmentation is good enough for us to use a smaller masking parameter and thus avoid as large an increase in false accept rate. For our current data set and segmentation, we chose to use a masking level of 80% for the automated segmentation experiments, and a masking level of 60% when using the manually corrected segmentation.

4.8 Comparison to Other Methods

We now present experiments comparing our method to previous methods. We compare our signal-fusion method to the multi-gallery score-fusion methods described by Ma [80] and Krichen [72]. Then we compare signal-fusion to Schmid's log-likelihood method [105]. Our last experiment compares signal-fusion to a new multi-gallery, multi-probe score-fusion method.

[Figure 4.8 plot: nonmatch Hamming distance distributions for masking levels of 20%, 40%, 60%, 80%, and 100%, with a zoomed-in view of the right tail (automated segmentation).]

Figure 4.8: Too much masking decreases the degrees of freedom in the nonmatch distribution, causing an increased false accept rate. (This graph shows the trend from the automatically segmented images. The manually corrected segmentation produces the same trend.)

4.8.1 Comparison to Previous Multi-gallery Methods

In biometrics, it has been found that enrolling multiple images improves performance [14, 21, 94]. Iris recognition is no exception. Many researchers [72, 80, 105] enroll multiple images, obtain multiple Hamming distance scores, and then fuse the scores together to make a decision. However, the different researchers have chosen different ways to combine the information from multiple Hamming distance scores.

97 Performance of Signal Fusion using Different Masking Levels −1 10 masking 100% masking 80% masking 60% masking 40% masking 20%

−2 False Reject Rate 10

Automated segmentation

−4 −3 −2 −1 10 10 10 10 (a) False Accept Rate

Performance of Signal Fusion using Different Masking Levels −1 10 masking 100% masking 80% masking 60% masking 40% masking 20%

−2 False Reject Rate 10

Manually corrected segmentation

−4 −3 −2 −1 10 10 10 10 (b) False Accept Rate

Figure 4.9: The amount of masking used to create average images affects performance. When using the manually corrected segmentation, we can use a smaller masking level (masking level = 60%). With the automated segmentation, a higher masking level (masking level = 80%) mitigates the impact of missed eyelid detections.

particular subject. However, they took the minimum of all N different Hamming distance scores. We call this type of experiment an N-to-1-minimum comparison.

In our signal-fusion method, we take N frames from a gallery video and do signal-level fusion, averaging the images together to create one single average image. We then take N frames from a probe video and average them together to create a single average image. Thus, we can call our proposed method a signal fusion-1-to-1 comparison. One automatic advantage of the signal fusion method is that storing a single, average-image iris code takes only a fraction of the space of the score-fusion methods. Instead of storing N gallery templates per subject, the proposed method only requires storing one gallery template per subject. In order to compare our method to previous methods, we have implemented the N-to-1-average and N-to-1-minimum methods. For our experiments, we let N = 10. For each of these methods, we used the same data sets. Figure 4.10 shows the detection error tradeoff curves for these experiments and Table 4.1 shows the corresponding statistics for the manually corrected segmentation. As an additional baseline, we also show results for a single-gallery, single-probe experiment (No Fusion). The DET curve shows that the proposed signal fusion method has the lowest false accept and false reject rates of all methods shown here.
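The two score-fusion baselines reduce to simple aggregations of the N Hamming distances; a minimal sketch (the scores shown are placeholders for illustration):

```python
import numpy as np

def n_to_1_average(hd_scores):
    """Ma et al. [80]: fuse the N probe-vs-gallery Hamming distances by averaging."""
    return float(np.mean(hd_scores))

def n_to_1_minimum(hd_scores):
    """Krichen et al. [72]: fuse the N Hamming distances by taking the minimum."""
    return float(np.min(hd_scores))

# Example with N = 10 placeholder scores for one probe against one subject's gallery.
scores = np.array([0.31, 0.29, 0.33, 0.28, 0.30, 0.32, 0.27, 0.34, 0.29, 0.31])
print(n_to_1_average(scores), n_to_1_minimum(scores))
```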

We conclude that on our data set, the signal-fusion method generally performs better than the previously proposed N-to-1-average or N-to-1-minimum methods.

In addition, the signal fusion takes 1/Nth of the storage and 1/Nth of the matching time.

[Figure 4.10 plots: DET curves for no fusion, N-to-1-average score fusion, N-to-1-minimum score fusion, and signal fusion; (a) automated segmentation, (b) manually corrected segmentation.]

Figure 4.10: The proposed signal-fusion method has better performance than using a multi-gallery approach with either an “average” or “minimum” score-fusion rule.

TABLE 4.1

SIGNAL-FUSION COMPARED TO PREVIOUS METHODS

Method                     d′     EER           FRR at FAR=0.001
no fusion                  4.62   1.56 × 10−2   3.32 × 10−2
score fusion: N-to-1 avg   5.02   8.62 × 10−3   1.90 × 10−2
score fusion: N-to-1 min   5.49   7.55 × 10−3   1.36 × 10−2
signal fusion: 1-to-1      6.06   6.99 × 10−3   1.10 × 10−2

4.8.2 Comparison to Previous Log-Likelihood Method

Schmid et al. [105] enrolled N gallery images of a particular subject and also took N images of a probe subject. The N gallery images and N probe images were paired in an arbitrary fashion and compared. Thus they obtained N different Hamming distance scores. They combined the N different Hamming scores using the log-likelihood ratio. We give a brief summary of the log-likelihood method here. A more detailed description can be found in [105]. Let X1, X2, ..., XN be a sequence of N iriscodes representing a single subject in the gallery. Let Y1, Y2, ..., YN be a sequence of N iriscodes representing a single subject as a probe. Let d = [d1, d2, ..., dN] be a vector of N Hamming distances formed from these two iriscode sequences. The impostor hypothesis H0 states that the vector d is Gaussian distributed with a common unknown mean m0 for all entries, and unknown covariance matrix C0.

The genuine hypothesis H1 states that the vector d is Gaussian distributed with a common unknown mean m1 and unknown covariance matrix C1. Denote by p(d|Hi)

the conditional probability density function for the vector d under hypothesis Hi. The log-likelihood ratio test statistic is

lN = (1/N)log[p(d|H1)/p(d|H0)]. (4.1)

The statistic lN can be computed as a function of m0, m1, C0, C1, d, and N. The values m0, m1, C0, and C1 are obtained using training data, and a vector of Hamming distances d is obtained using testing data. Fractional Hamming distance scores are bounded between zero and one, but log-likelihood test statistics have a wider range. In our experiments we obtained scores between −1.99 and 44.60. Low scores are from impostor comparisons and high scores are from genuine comparisons. The log-likelihood method requires both training and testing data, so we split our gallery and our probe each in half. We used the first half of the gallery videos

(gallery-set-A) and the first half of the probe videos (probe-set-A) for training and obtained a set of maximum-likelihood parameters. Next we compared the second half of the gallery videos (gallery-set-B) and the second half of the probe videos (probe-set-B); applying the maximum-likelihood parameters to the resulting Hamming distance vectors gave us log-likelihood scores from the test data B.
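A hedged sketch of Equation 4.1 under the two Gaussian hypotheses, using scipy's multivariate normal density; the parameter values below are placeholders standing in for the maximum-likelihood estimates obtained from the training split:

```python
import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood_score(d, m0, C0, m1, C1):
    """Equation 4.1: l_N = (1/N) * log[ p(d|H1) / p(d|H0) ],
    with d Gaussian under both hypotheses.
    d: vector of N Hamming distances; (m0, C0) impostor parameters,
    (m1, C1) genuine parameters, estimated from training data."""
    n = len(d)
    log_p1 = multivariate_normal(mean=m1, cov=C1).logpdf(d)
    log_p0 = multivariate_normal(mean=m0, cov=C0).logpdf(d)
    return (log_p1 - log_p0) / n

# Placeholder parameters for N = 10 (the real ones come from the training split).
N = 10
m0, C0 = np.full(N, 0.45), 0.001 * np.eye(N)   # impostor hypothesis H0
m1, C1 = np.full(N, 0.25), 0.003 * np.eye(N)   # genuine hypothesis H1
d = np.full(N, 0.27)                            # a vector of test Hamming distances
print(log_likelihood_score(d, m0, C0, m1, C1))  # high scores indicate genuine
```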

Of course, it would be better to have as many scores as possible from our data, so we repeated the experiment, this time using set B to train the maximum-likelihood parameters and set A to test. We obtained log-likelihood scores from test data A. We combined all log-likelihood scores and created a DET curve representing the performance of the log-likelihood method. Table 4.2 gives statistics comparing the log-likelihood method with the signal fusion method for the manually corrected segmentations, and Figure 4.11 shows the DET curves for the comparison.

TABLE 4.2

SIGNAL-FUSION COMPARED TO LOG-LIKELIHOOD SCORE FUSION

Method           d′     EER           FRR@FAR=0.001
log-likelihood   3.90   2.65 × 10−3   9.20 × 10−3
signal fusion    6.06   6.99 × 10−3   1.10 × 10−2

The log-likelihood method has a lower equal error rate, but the signal fusion method performs better at smaller false accept rates. In addition, the signal fusion takes 1/Nth of the storage and 1/Nth of the matching time.

4.8.3 Comparing to Large Multi-Gallery, Multi-Probe Methods

The previous subsections compared our signal-fusion method to previously-published methods. Each of those score-fusion methods fused N Hamming distance scores to create the final score. We also wished to consider the situation where for a single comparison, there are N gallery images and N probe images available, and all N² possible Hamming distance scores are computed and fused. We would expect that the fusion of N² scores would perform better than the fusion of N scores. Although this multi-gallery, multi-probe fusion is a simple extension of the methods listed in subsection 4.8.1, we do not know of any published work that uses this idea for iris recognition. We tested two ideas: we took the average of all N² scores, and also the minimum of all N² scores. We call these two methods (1) the multi-gallery, multi-probe, average method (MGMP-average) and (2) the multi-gallery, multi-probe, minimum method (MGMP-minimum).

[Figure 4.11 plots: DET curves for the log-likelihood method and signal fusion; (a) automated segmentation, (b) manually corrected segmentation.]

Figure 4.11: Signal fusion and log-likelihood score fusion methods perform comparably. The log-likelihood method performs better at operating points with a large false accept rate. The proposed signal-fusion method has better performance at operating points with a small false accept rate.

The MGMP-average method produces impostor Hamming distance distributions with small standard deviations. Using the “minimum” rule for score-fusion produces smaller Hamming distances than the “average” rule. However, both the genuine and impostor distributions are affected. Based on the DET curves (Figure 4.12), we found that for these two multi-gallery, multi-probe methods, the “minimum” score-fusion rule works better than the “average” rule for this data set. We compared the MGMP methods to the signal fusion method. The signal-fusion method presented in this subsection is unchanged from the previous subsection, but we are presenting the results again, for comparison purposes. Statistics for the signal fusion and the MGMP methods are shown in Table 4.3. The error rates for signal fusion in Table 4.1 and Table 4.3 are the same because we are running the same algorithm on the same data set. Based on the equal error rate and false reject rate, we conclude that the multi-gallery, multi-probe minimum method that we present in this section achieves the best recognition performance of all of the methods considered in this paper. However, the signal-fusion performs well, while taking only 1/Nth of the storage and 1/N² of the matching time.

4.8.4 Computation Time

In this subsection, we compare the different methods presented in this paper in terms of processing time. We have three types of methods to compare: (1) the multi-gallery, multi-probe approaches (both MGMP-average and MGMP-minimum) which require N² iris code comparisons before fusing values together

[Figure 4.12 plots: DET curves for MGMP-average, MGMP-minimum, and signal fusion; (a) automated segmentation, (b) manually corrected segmentation.]

Figure 4.12: The MGMP-minimum achieves the best recognition performance of all of the methods considered in this paper. However, the signal-fusion performs well, while taking only 1/Nth of the storage and 1/N² of the matching time.

TABLE 4.3

SIGNAL-FUSION COMPARED TO MULTI-GALLERY, MULTI-PROBE SCORE FUSION

Method          d′     EER           FRR@FAR=0.001
MGMP-average    5.32   5.47 × 10−3   1.17 × 10−2
MGMP-minimum    6.51   1.60 × 10−3   3.08 × 10−3
signal fusion   6.06   6.99 × 10−3   1.10 × 10−2

to create a single score; (2) the multi-gallery approaches (Ma and Krichen) which compare N gallery iris codes to one probe before fusing scores together; and (3) the signal-fusion approach which first fuses images together, and then has a single iris code comparison. For this analysis, we first define the following variables. Let P be the preprocessing time for each image, I be the iris code creation time, and C be the time required for the XOR comparison of two iris codes. Let N be the number of images of a subject in a single gallery entry for the multi-gallery methods. Let A be the time required to average N images together (to perform signal-fusion). Finally, suppose we have an application such as in the United Arab Emirates where each person entering the country has his or her iris compared to a watchlist of one million people [30]. For this application, let W be the number of people on the watchlist. Expressions for the computation times for all three methods are given in terms of these variables in Table 4.4.

The multi-gallery, multi-probe methods must do preprocessing and iris code

creation for N images to create one gallery entry. Thus, the gallery preprocessing time for one gallery subject is NP + NI. They also preprocess and create N iris codes for a probe subject, so the probe preprocessing time is also NP + NI. To compare a single probe entry to a single gallery entry takes CN² time because there are N² comparisons to be done. To compare a probe to the entire watchlist takes WCN² time. Similar logic can be used to find expressions for the time taken for the other two methods. All such expressions are presented in Table 4.4.

TABLE 4.4

PROCESSING TIMES FOR DIFFERENT METHODS

Method          Gallery Preprocessing   Probe Preprocessing    Comparison to Watchlist   Total Time
MGMP            NP + NI = 4.46 s        NP + NI = 4.46 s       WCN² = 1000 s             1008.9 s
Multi-gallery   NP + NI = 4.46 s        P + I = 0.446 s        WCN = 100 s               104.9 s
Signal-fusion   NP + A + I = 3.55 s     NP + A + I = 3.55 s    WC = 10 s                 17.09 s

From Daugman's work [28], we can see that typical preprocessing time for an image is 344 ms. He also notes that iris code creation takes 102 ms and an XOR comparison of two iris codes takes 10 µs. Throughout this paper, we have used ten images for all multi-gallery experiments. The time to compute an average image from ten preprocessed images is 5 ms. Lastly, we know that the United Arab Emirates watchlist contains one million people. By substituting these numbers in for our variables, we found the processing time for all of our three types of

methods. These numeric values are also presented in Table 4.4. A graph of the total computation time for these methods over a number of different sizes of watchlist is shown in Figure 4.13. From this analysis it is clear that, although a multi-gallery, multi-probe method may have some performance improvements over the signal fusion method, it comes at a high computational cost.
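The totals in Table 4.4 follow directly from the stated constants; a small check (constants taken from the text above):

```python
# Constants from the text: P = 0.344 s, I = 0.102 s, C = 1e-5 s, A = 0.005 s,
# N = 10 frames per video, W = 1e6 people on the watchlist.
P, I, C, A, N, W = 0.344, 0.102, 1e-5, 0.005, 10, 1_000_000

mgmp         = (N*P + N*I) + (N*P + N*I) + W*C*N**2   # gallery + probe prep, N^2 comparisons per entry
multigallery = (N*P + N*I) + (P + I)     + W*C*N      # N gallery codes, one probe code
signalfusion = (N*P + A + I) * 2         + W*C        # fuse N frames on each side, one comparison

print(f"MGMP:          {mgmp:.2f} s")          # ~1008.92 s
print(f"Multi-gallery: {multigallery:.2f} s")  # ~104.91 s
print(f"Signal fusion: {signalfusion:.2f} s")  # ~17.09 s
```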

[Figure 4.13 plot: time in seconds to compare one probe to a watchlist versus watchlist size (10 to 10⁶ people) for multi-gallery, multi-probe score fusion, N-to-1 score fusion, and signal fusion.]

Figure 4.13: Even though a large multi-gallery, multi-probe experiment achieves better recognition performance, it comes at a cost of much slower execution time. The proposed signal fusion method is the fastest method presented in this paper, and it achieves better recognition performance than previously published multi-gallery methods.

4.9 Discussion

We performed fusion of multiple biometric samples at the signal level. Our signal fusion approach utilizes information from multiple frames in a video. We were the first to publish research that used video to improve iris recognition performance [54]. Our experiments show that using average images created from ten frames of an iris video performs very well for iris recognition. Average images perform better than (1) experiments with single stills and (2) experiments with ten gallery images compared to single stills. Our proposed multi-gallery, multi-probe minimum method achieves slightly better recognition performance than our proposed signal-fusion method. However, the matching time and memory requirements are lowest for the signal-fusion method, and the signal-fusion method still performs better than previously published multi-gallery methods.

CHAPTER 5

IRIS BIOMETRICS ON TWINS

Prior research has shown that the textural detail of the iris is sufficiently distinctive to distinguish identical twin siblings. However, no research has addressed the question of whether twins' irises are sufficiently similar in some sense to correctly determine that two irises are from twins. We conducted a human classification study in which participants were asked to label pairs of iris images as “twins” or “unrelated”. This study shows that there is information in the iris appearance that is not captured by iris codes. This information could potentially be used for forensic applications to show genetic relationships between different eye images. Portions of this chapter are reprinted, with permission, from the

Proc. IEEE Computer Vision and Pattern Recognition Biometrics Workshop [51] (© 2010, IEEE).

5.1 Motivation

Iris biometrics systems exploit textural details on the iris that have been shown to be independent even between irises of genetically identical individuals. Therefore, automated iris biometrics systems can distinguish between identical twins. Companies selling biometrics products appropriately focus on the differences in iris texture without considering similarities in iris appearance. L-1 Identity Solutions, a company that licenses the algorithms developed by Dr. John Daugman,

advertises that “No two irises are alike. There is no detailed correlation between the iris patterns of even identical twins, or the right and left eye of an individual” [73]. Unfortunately, laypersons incorrectly infer from such statements that the textures in images of irises of genetically identical humans have no similarities. Wikipedia reports, “Even genetically identical individuals have completely independent iris textures” [60] (emphasis added). The computer vision and biometrics research communities have not done any previous research investigating the similarities between genetically identical irises. There is information in iris appearance that is not captured by iris codes. Experiments described in this chapter show that untrained human observers can detect the similarities in genetically related irises. This fact suggests a couple of implications. First, the degree of similarity between twins' irises is an important privacy concern. Managers of an iris biometrics database may assume that because an iris image is not labeled with a name, the image does not need to be encrypted or highly protected. However, if a hacker can determine genetic relation simply from iris images, database managers should use caution to protect the images as much as possible. Second, if it is possible for humans to determine genetic relation from a pair of iris images, it may be possible to design a computerized texture analysis system that could detect the traits that humans are identifying. We envision a computerized system to predict whether or not two iris images represent genetically related persons. When considering templates created by iris biometric systems, we agree with other researchers that identical twins' templates are no more similar than those of unrelated persons'. But, when we do an analogous experiment with human observers we get a very different result. This raises the possibility that a different

kind of texture analysis algorithm could answer questions that current iris biometric texture analysis cannot. Based on our human observer results, it is reasonable to look for an automated texture analysis that could say whether or not iris images come from identical twins. The remainder of this chapter is organized as follows. Section 5.2 summarizes related research including iris biometric experiments on twins, and experiments to see what additional information (e.g., gender, race) can be determined from iris texture. Section 5.3 describes our data acquisition and image segmentation. Section 5.4 corroborates other researchers' claims that iris biometrics algorithms generate templates that encode differences in the texture of twins. Section 5.5 explains our new experiments to test how much similarity humans can detect between identical twins' irises. Section 5.6 provides a summary and conclusion.

5.2 Related Work

A small number of iris papers have reported on experiments involving twins. Daugman reports that “about 1% of all persons in the general population have an identical twin” [28]. Flom and Safir, who held the first patent discussing the concept of iris recognition (1987), asserted that twins' irises are different: “Not only are the irises of the eyes of identical twins different, but the iris of each eye of any person is different from that of his other eye” [41]. However, Flom and Safir did not have an iris biometrics implementation, so their claim is based on ophthalmologic observations rather than biometric experimentation. Daugman's seminal paper on iris recognition from 1993 reported that iris texture develops randomly. He stated, “a property the iris shares with fingerprints is the random morphogenesis of its minutiae. Because there is no genetic penetrance in the expression of this

organ beyond its anatomical form, physiology, color and general appearance, the iris texture itself is stochastic or possibly chaotic. Since its detailed morphogenesis depends on initial conditions in the embryonic mesoderm from which it develops, the phenotypic expression even of two irises with the same genetic genotype (as in identical twins, or the pair possessed by one individual) have uncorrelated minutiae” [25]. In 1997, Wildes et al. reported on iris biometrics experiments that contained twin data. Their experiments used data from 60 different irises from 40 people, but it is not clear how many of those people were twins. They report, “Of note is the fact that this sample included identical twins. ... There were no observed false positives or false negatives in the evaluation of this corpus of data. In this case, statistical analysis was eschewed owing to the small sample size. At a qualitative level, however, the data for authentics and imposters were well separated” [120].

monozygotic twins also yielded a result (mean HD = 0.507) expected for unrelated eyes. It appears that the phenotypic random patterns visible in the human iris are almost entirely epigenetic” [28]. A recent paper by Sun et al. [111] evaluated performance of iris, fingerprint, and face biometric systems on a set of twins data. Their data set contained 51 pairs of identical twins and 15 pairs of non-identical twins, for a total of 66 twin families. They generated biometric scores from twin comparisons and from unrelated-person comparisons, and found that “the identical twin impostor distribution is very similar to the general impostor distribution. However, the peaks that are present in the identical twin impostor distribution tail may indicate that the irises of identical twins have some correlation” [111]. Any difference between twins and the general population was very small. They graphed ROC curves showing performance of a twins experiment compared to an unrelated-impostor experiment and concluded that “there is no significant difference in the performance of the biometric system for the identical twin data and for the general data, which means the iris biometric system can distinguish identical twins as much as it can distinguish any two different persons who are not identical twins” [111]. The above quotations demonstrate that at least four different groups of researchers – Flom and Safir [41], Daugman [28], Wildes et al. [120], and Sun et al. [111] – have found twins’ irises to be distinct according to current iris recognition algorithms. None of these researchers investigated the similarities in twins’ iris texture. No prior work has considered similarities in iris texture between genetically related people [16]. However, there has been work investigating whether gender or race can be predicted from iris texture. Thomas et al. [113] used decision trees

to classify irises as being from male or female subjects. They used two different types of features. First, they used geometric features such as the distance between the detected iris center and pupil center, and the difference in iris area and pupil area. Second, they used texture-based features such as the mean and standard deviation of filter responses along rows of an “unwrapped” iris image. Using these features, they achieved close to 80% accurate gender prediction. Qiu et al. [98] used an Adaboost algorithm to predict whether irises came from Asian or non-Asian subjects. They achieved an 85.95% correct prediction rate. These two papers do not directly relate to identifying twins from iris texture. However, they show that it is possible to predict information about a subject based on iris texture alone.

5.3 Data

To obtain data from a large number of twins, we attended the Twins Days festival in Twinsburg, Ohio in August 2009 [114]. Twins Days is the largest annual gathering of twins in the world, and therefore a logical place to gather biometric data from twins. Video data of irises was collected using an LG 2200 EOU camera attached to a Phillips DVDR3576H digital video recorder. The analog signal from the camera was captured, digitized, and stored using a high bit rate (effectively lossless) compressed MP4 format. No DNA testing was required for the twins to participate in the biometric video acquisitions. However, the twins were asked to report whether they were identical or fraternal twins. Of the 98 twin pairs that came to our research booth,

84 said that they were identical twins, 9 reported that they were fraternal twins, and 5 reported that they did not know.

From the collection of videos of self-reported identical twins, we discarded videos of subjects wearing glasses, hard contacts, or patterned contacts [4]. We also discarded videos where the light from the sun had saturated the sensor, resulting in poor-contrast video. Videos from the remaining 76 pairs of identical twins (152 people) were used in our experiments. The largest iris biometrics experiment previously published contained 51 pairs of identical twins [111]. Therefore, our data set is about fifty percent larger than the data set in the largest previous study on twin iris biometrics. Our experiment uses pairs of images from twins and pairs from unrelated people. We captured enough data during the Twins Days festival to add an additional 44 people to our experiment, for a total of 196 people, or 392 distinct eyes. Some subjects participated on two days, so we have 450 total videos that we used for the experiments in this chapter.

5.3.1 Frame Selection

Each video of iris data contained frames of varying quality. We used a computer program to help us select which frames to use in our experiments. First, to avoid using unusually dark frames, our software automatically rejected all frames with an average intensity less than a threshold of 115. Second, our software used a Fourier transform to detect and reject frames with high-frequency noise. From the remaining frames, our software selected the ten most in-focus frames in each video. From 450 videos, we selected 4500 frames. Unfortunately, one video had several imaging artifacts, so we only had 4494 usable frames. We used these frames in a small experiment to show that current iris biometrics algorithms use detailed texture information that is capable of distinguishing between twins (Section 5.4).
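The selection logic above can be summarized in a short sketch. This is a minimal illustration assuming each video has been decoded into greyscale NumPy frames; the intensity threshold of 115 and the ten-frames-per-video rule come from the text, while the spectral measures below are generic stand-ins for the Fourier-based noise and focus checks our software used, and all function names are ours.

import numpy as np

INTENSITY_THRESHOLD = 115   # reject unusually dark frames (value from the text)
FRAMES_PER_VIDEO = 10       # keep the ten most in-focus frames per video

def band_energy(frame, low_frac, high_frac):
    """Fraction of spectral energy whose normalized radial frequency lies in
    [low_frac, high_frac); a stand-in for the Fourier-based measures in the text."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(frame.astype(float)))) ** 2
    h, w = frame.shape
    yy, xx = np.ogrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2) / (0.5 * min(h, w))
    band = (radius >= low_frac) & (radius < high_frac)
    return spectrum[band].sum() / spectrum.sum()

def select_frames(frames, noise_cutoff=0.05):
    """Drop dark and noisy frames, then keep the sharpest remaining frames."""
    candidates = []
    for frame in frames:
        if frame.mean() < INTENSITY_THRESHOLD:
            continue                                   # unusually dark
        if band_energy(frame, 0.8, 1.0) > noise_cutoff:
            continue                                   # high-frequency noise
        focus = band_energy(frame, 0.1, 0.5)           # mid-band energy tracks sharpness
        candidates.append((focus, frame))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [frame for _, frame in candidates[:FRAMES_PER_VIDEO]]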

Figure 5.1: Images of the left eyes of two identical twins. Notice the similarities in overall iris texture, and also the similarities in the appearance of the periocular region.

We examined all 4494 frames, and hand-selected one frame from each eye to use in our queries to our human testers (Section 5.5). We chose frames that had the least eyelid occlusion obscuring the iris. We also favored irises centered in the frame. An example pair of images used in our human-tester experiment is shown in Figure 5.1. For one video, the ten automatically-selected frames were not frontal images, so we hand-picked a frame from the original video that presented a clear frontal iris image.

5.3.2 Segmentation

For our experiments, we needed to accurately locate the iris in each image. We

first used our automatic segmentation software, which uses active contours to find the inner and outer iris boundaries. Since our automatic segmentation does not always segment the image correctly, we hand-checked all of the segmentations. If our software had made an error in finding the inner or outer iris boundary, we manually marked the center and a point on the boundary to identify the correct center and radius of an appropriate circle. If the software had made an error in finding the eyelid, we marked four points along the boundary to define three line

segments approximating the eyelid contour.

Figure 5.2: Images of irises from identical twins. We segmented the images so that our testers would only see the iris, and therefore they could not use periocular features to help them decide whether two irises were from twins.

Our primary goal in presenting iris images to humans was to answer the question, “Can humans determine whether two irises are from identical twins?” We did not want eyelashes, eyelids, or other features around the iris to appear in our images, because those features might influence our testers’ responses. For all segmented iris images, we set all pixels outside the iris region to black. We colored the pupil and the eyelid black as well. An example of a pair of twins’ images with our hand-segmentation is shown in Figure 5.2. We hand-marked the eyelid in both 5.2(a) and 5.2(b). Figure 5.3(a) shows an iris where we used the original active contour segmentation and did not correct the eyelid.
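A minimal sketch of the manual correction and masking steps described above, assuming the hand-marked points are available as (x, y) pixel coordinates. The circle defined by a center and one boundary point, the three-segment eyelid approximation from four points, and the blacking-out of non-iris pixels follow the description in the text; the function names, the restriction to a single upper-eyelid boundary, and other details are simplifications of ours.

import numpy as np

def circle_from_clicks(center, boundary_point):
    """Circle (cx, cy, r) from a hand-marked center and one point on the boundary."""
    r = float(np.hypot(boundary_point[0] - center[0], boundary_point[1] - center[1]))
    return center[0], center[1], r

def eyelid_boundary(points, width):
    """Four hand-marked points define three line segments approximating the
    upper eyelid; return the boundary row for every image column."""
    pts = sorted(points)                         # sort by column (x) coordinate
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    return np.interp(np.arange(width), xs, ys)

def mask_to_iris(image, pupil, iris, upper_lid):
    """Black out every pixel outside the visible iris: pupil, eyelid region,
    and background become zero. `pupil` and `iris` are (cx, cy, r) circles."""
    h, w = image.shape
    yy, xx = np.mgrid[:h, :w]
    in_iris = np.hypot(xx - iris[0], yy - iris[1]) <= iris[2]
    in_pupil = np.hypot(xx - pupil[0], yy - pupil[1]) <= pupil[2]
    below_lid = yy >= upper_lid[np.newaxis, :]
    out = np.zeros_like(image)
    keep = in_iris & ~in_pupil & below_lid
    out[keep] = image[keep]
    return out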

5.4 Biometric Performance on Twins’ Irises

Our data set of iris images from 76 pairs of identical twins gives us an opportunity to verify previous claims that identical twins’ irises are different. We used


our iris biometrics software to generate templates or “iris codes” from 4494 iris images. Next, we compared each template with every other template, in an “all-vs-all” experiment. We computed the fractional Hamming distance between each pair of iris codes, then normalized the scores based on the number of unmasked bits used in each comparison, using the score normalization technique proposed by Daugman [29]. Figure 5.4 shows the distributions of normalized Hamming distances from our experiment. We assumed that the system could know whether an image was a left eye or a right eye, so all scores included in these distributions are either comparisons of a left eye vs. a left eye, or comparisons of a right eye vs. a right eye. The blue histogram shows authentic comparisons. This histogram contains scores from comparisons where an iris image is compared to other images of the same iris. The black histogram shows impostor comparisons of twins. In other words, the histogram contains scores from comparisons where a person’s eye was compared to an eye from that person’s twin. The red histogram shows impostor comparisons from non-twins. That is, the histogram contains scores from comparisons where a person’s eye was compared to an eye from an unrelated person.

Figure 5.3: Images of irises from unrelated people.

Figure 5.4: A histogram of Hamming distance scores between twins looks similar to a histogram of Hamming distance scores between non-twins. (The plot shows the authentic, twin impostor, and non-twin impostor score distributions as normalized Hamming distance versus fraction of comparisons.)

The histograms do not show perfect separation between the authentic and the impostor histograms, but it is clear that the twin impostor histogram is quite similar to the non-twin impostor histogram. This result agrees with others’ claims that iris biometrics systems can differentiate between twins [28, 41, 111, 120]. A larger data set might make the twin impostor and the non-twin impostor histograms match more closely.
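A minimal sketch of the comparison score used in this experiment, assuming binary iris codes and occlusion masks stored as Boolean NumPy arrays. The rescaling by the number of unmasked bits follows Daugman’s normalization [29]; the nominal comparison size of 911 bits is an assumption taken from his published description, and the rotation search and other details of the real matcher are omitted.

import numpy as np

def normalized_hamming_distance(code1, mask1, code2, mask2, reference_bits=911):
    """Fractional Hamming distance over mutually unmasked bits, rescaled for
    the number of bits actually compared (after Daugman's normalization)."""
    valid = mask1 & mask2                 # bits usable in both templates
    n = valid.sum()
    if n == 0:
        return 0.5                        # no usable bits: no evidence either way
    raw = np.logical_xor(code1, code2)[valid].mean()
    return 0.5 - (0.5 - raw) * np.sqrt(n / reference_bits)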

5.5 Similarities in Twins’ Irises Detected by Humans

5.5.1 Experimental Setup

Our data set contained a total of 196 people, enough for 98 query pairs without using a subject in more than one query. We decided to have 49 queries where the images were twins, and 49 queries where the images in the pair were from unrelated people. To create our query pairs, we randomly selected 49 pairs of twins from the list of identical twin pairs. These twins were used in twin queries.

The remaining twins and the other subjects were used to create unrelated-person query pairs. These subjects were paired randomly, and then each pair was checked to ensure that the two subjects in each of these pairs were not twins. An example of an unrelated pair of irises is shown in Figure 5.3. In the end, we had 98 pairs of images to present to our testers, with exactly 49 of those pairs containing matching identical twins and 49 containing unrelated persons. No subject appeared more than once in all the iris image pairs, and therefore a total of 196 people were represented. The demographic information for our subjects is shown in Table 5.1.

In addition to our original question about whether humans could pick out twins’ irises, we also decided to ask our testers whether they could pick out twins using the features in the eye image that were not part of the iris. From the same 196 subjects, we again randomly selected 49 pairs of identical twins and constructed 49 twin pairs of periocular images (Figure 5.5). We randomly paired the remaining subjects to create unrelated-person periocular queries. No subject appeared more than once in all the periocular image pairs. There is not a one-to-one correspondence between the iris pairs and the periocular pairs used in our experiment. We selected the iris pairs at random, and then

TABLE 5.1

DEMOGRAPHIC INFORMATION OF SUBJECTS

Total number of subjects: 196
Number of self-reported identical twins: 152
Additional subjects imaged on the same day: 44
Number of males: 47
Number of females: 149
White: 171
Black or African-American: 24
Hispanic: 1
Age 18-20: 17
Age 21-30: 60
Age 31-40: 21
Age 41-50: 32
Age 51-60: 31
Age 61-70: 26
Age 71-80: 9

re-ran our script to select the periocular pairs at random. Since we had a limited amount of data, there are some queries where a twin’s right iris appeared in the iris portion of the experiment, and later the twin’s right periocular region appeared in the periocular portion of the experiment. In other instances we presented twins’ right eyes in the iris portion, and used left eyes in the periocular portion. In other cases, a twin’s eye appeared in a twin pair in the iris section (paired with his twin, of course), and then appeared in an unrelated-person query in the periocular portion of the experiment (paired with a different, randomly-selected but unrelated subject). We used this strategy to maximize the randomization in the selected pairs. This strategy does prohibit some analysis that we could have performed if we had used a one-to-one correspondence of images in the iris and periocular portions of the experiment.

Figure 5.5: We wanted to know whether humans could identify twins based on periocular information. We created images where the iris was blacked out so that our testers would be forced to use periocular features to make a judgment. This is an example pair of images. These images are from identical twins.

We utilized a graphical user interface to present image pairs to our testers. This software displayed instructions and 12 example image pairs to familiarize users with the task. The examples included three pairs of iris images from twins, three pairs of periocular images from twins, three pairs of iris images from unrelated

people, and three pairs of periocular images from unrelated people. Next, the software presented the 98 iris image pairs. The 98 pairs were presented in a different random order each time the program was run. Each image pair was displayed for three seconds. After each pair, the program asked “Were these images from identical twins?” Five responses were possible: (1) Certain these images were from identical twins, (2) Likely they were from identical twins, (3)

Can’t tell, (4) Likely they were NOT from identical twins, or (5) Certain they were NOT from identical twins. After the user responded, the software revealed whether the user was correct or not. Then the user could click a button to continue to the next image pair. Once all the iris image pairs were presented, the 98 periocular image pairs were presented. These pairs were also presented in a different random order each time the program was run. We chose to present all of the iris image pairs first, and all of the periocular image pairs second, so that our testers would not be confused by switching between types of questions. Our primary goal was to answer the question, “can humans determine whether two irises are from identical twins?” We did not want the presence of the periocular image pairs to affect our iris experiment. However, we were also interested in some secondary questions: “can humans determine whether two eye images are from identical twins by looking at the periocular region alone?” and “is the iris or the periocular region more useful for identifying twins?” Presenting the periocular image pairs gave us the opportunity to study these questions. Our experiment is not a perfect way to test the difference between the power of the iris and the periocular region, because testers might perform better on the later queries than on earlier ones as they became more familiar with the test format. Nevertheless, our experiment still provides a suggestion of which is

better. We solicited volunteers to participate in our experiment, and twenty-eight people signed up. Volunteers were offered ten dollars to participate, and an additional ten dollars if they could correctly categorize 80 percent or more of the image pairs.

5.5.2 Results

5.5.2.1 Can Humans Identify Twins from Iris Texture Alone?

To find an overall accuracy score, we counted the number of times the tester was “likely” or “certain” of the correct response; that is, we made no distinction based on the tester’s confidence level, only on whether they believed a pair to be twins when they were twins, or believed a pair to be unrelated when they were unrelated. We divided the number of correct responses by 98 (the total number of iris queries) to yield an accuracy score. The average percent correct on the iris portion of the experiment was 81.3% (standard deviation 5.2%). The minimum score was 68.4%, and the maximum score was 89.8%. We used a t-test to evaluate the null hypothesis that humans did not perform differently than random guessing. The resulting p-value was less than 10^-4. Thus, we have statistically significant evidence that our testers were doing better than random.
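A minimal sketch of this significance test, assuming each tester’s fraction of correct responses on the iris queries is stored in a list; the numbers shown are illustrative, not our actual per-tester scores.

from scipy import stats

# Hypothetical per-tester accuracies on the 98 iris queries (fractions correct).
iris_accuracy = [0.81, 0.88, 0.77, 0.84, 0.79, 0.86, 0.68, 0.90]

# Null hypothesis: testers perform no better than random guessing (accuracy 0.5).
# SciPy reports a two-sided p-value; halve it for the one-sided alternative.
t_stat, p_value = stats.ttest_1samp(iris_accuracy, popmean=0.5)
print(f"t = {t_stat:.2f}, p = {p_value:.2g}")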

5.5.2.2 Can Humans Identify Twins from Periocular Information Alone?

We might expect that our testers would be more familiar with the test format and perform better on the periocular queries than the iris queries. In fact, the reverse was true. The average percent correct on the periocular queries was 76.5% (standard deviation 5.1%). This result suggests that for our data, the iris has better information for identifying twins than does the periocular region. See

section 5.5.2.4 for more discussion of this idea. The minimum score on the periocular portion was 63.3% and the maximum score was 86.7%. A t-test showed that this result is statistically better than random guessing (p-value < 10^-4).

5.5.2.3 Did Humans Score Higher on Queries where They Felt More Certain?

As mentioned above in section 5.5.1, our testers had the option to mark (1) Certain these images were from identical twins, (2) Likely they were from identical twins, (3) Can’t tell, (4) Likely they were NOT from identical twins, or (5) Certain they were NOT from identical twins. Some testers were more “certain” than others. One tester responded “certain” for 64 of the 98 iris queries and 57 of the 98 periocular queries. At the other extreme, one tester responded “certain” for only one of the iris queries and none of the periocular queries. The average number of “certain” responses on the iris portion of the test was 29.2 out of 98 (standard deviation 17.1). The average number of “certain” responses on the periocular portion of the test was 25.6 out of 98 (standard deviation 17.9). Out of the queries that testers were “certain” about, the average percent correct on the iris portion was 92.1% (standard deviation 18.4%). On the periocular portion, the average percent correct, excluding the three subjects who were never certain, was 93.4% (standard deviation 5.8%). Therefore, the testers obviously scored better on the subset of the queries where they felt “certain” of their answer.

5.5.2.4 Is It Easier to Identify Twin Pairs Using Iris Data or Periocular Data?

The majority of testers, 20 out of 28, performed better on the iris portion of the experiment. One tester scored the same on both portions, and seven testers

performed better on the periocular portion. We found the difference between the iris accuracy score and the periocular accuracy score for each tester. The average difference was 4.9% (standard deviation 6.2%). The minimum difference was -4.1%, meaning that one subject scored about 4% better on the periocular queries compared to the iris queries. The maximum difference was 17.4%, meaning that one subject scored over 17% better on the iris portion. We used a paired t-test to test the null hypothesis that the scores on the iris portion and the scores on the periocular portion came from distributions with equal means. The p-value for the test was 0.0003. Thus, there is a statistically significant difference between the scores on the two portions. This result suggests that for our data, the iris appearance was more valuable than the periocular appearance for identifying twin pairs.
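A brief sketch of the paired comparison above, again with illustrative numbers rather than our actual data; each tester contributes one iris accuracy and one periocular accuracy, and the pairing is by tester.

from scipy import stats

# Hypothetical paired per-tester accuracies (same tester at the same index).
iris_scores       = [0.84, 0.80, 0.86, 0.78, 0.88, 0.75, 0.83, 0.81]
periocular_scores = [0.78, 0.77, 0.80, 0.79, 0.81, 0.76, 0.74, 0.77]

# Paired t-test: null hypothesis is that the mean per-tester difference is zero.
t_stat, p_value = stats.ttest_rel(iris_scores, periocular_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.2g}")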

5.5.2.5 Did Subjects Score Better on the Second Half of the Iris Test than the First Half?

We computed the difference between the accuracy on the second half of the iris queries and the first half of the iris queries. That is, we found the accuracy of the second 49 questions and subtracted the accuracy of the first 49 questions. We found this difference for each tester, then computed the average difference across all 28 testers. The average difference was 1.2% (standard deviation 7.4%). The minimum difference was -12.2% and the maximum difference was 18.4%. Thirteen of the 28 subjects performed better on the second half of the iris queries; eleven did worse, and four stayed the same. Since the average difference is positive, this might suggest that there was some learning as the test progressed. However, the average difference is small compared to the standard deviation. A one-tailed t-test shows that the difference is not

statistically significant (p-value 0.2064).

5.5.2.6 Did Subjects Score Better on the Second Half of the Periocular Test than the First Half?

We wanted to ascertain whether the testers learned during the periocular portion of the exam. To answer this question, we computed the difference between the accuracy on the second half of the periocular queries and the first half of the periocular queries. The average difference was -0.1% (standard deviation 9.5%). Fourteen subjects performed better on the second half of the periocular queries, thirteen performed worse, and one subject’s performance did not change. It seems that any ideas the testers might have “learned” in the first 49 periocular queries did not help in the second 49 periocular queries.

The lack of improvement between the first and second halves of the periocular test does not necessarily imply that humans cannot learn from viewing additional periocular images. It may be that humans need to see a larger number of examples to learn the most effective features for discrimination. A number of our images showed twin eyes with similar mascara or eyeliner. However, there were also twin pairs where only one of the two twins had eye make-up. A tester viewing the twin pairs with matching make-up might falsely assume that all twins in the data set had similar make-up. Thus there may be instances of false learning which were not corrected in the first 49 image pairs and resulted in incorrect responses later.

Figure 5.6: All 28 testers correctly classified this pair of images as being from identical twins.

Figure 5.7: All 28 testers correctly classified this pair of images as being from identical twins.

5.5.2.7 Which Image Pairs Were Most Frequently Classified Correctly, and Which Pairs Were Most Frequently Classified Incorrectly?

One pair of twins’ irises was classified correctly by all 28 testers. This pair is shown in Figure 5.6. Six pairs of twins’ periocular images were classified correctly by all 28 testers. An example is shown in Figure 5.7. There were ten pairs of unrelated iris images that were classified correctly by all 28 testers. An example is shown in Figure 5.8. There was also one pair of unrelated periocular images that was classified correctly by all testers. This pair is shown in Figure 5.9.

Figure 5.8: All 28 testers correctly classified this pair of images as being from unrelated people.

Figure 5.9: All 28 testers correctly classified this pair of images as being from unrelated people.

Figure 5.10: Twenty-five of 28 people incorrectly guessed that these images were from unrelated people. In fact these irises are from identical twins. The difference in dilation makes this pair particularly difficult to classify correctly.

Figure 5.10 shows the image pair that was most frequently classified incorrectly. Twenty-five of the 28 subjects incorrectly guessed that these images were from unrelated people. One of the challenges with this pair of images is the significant difference in pupil radius. Of all unrelated-person pairs, the one most frequently misclassified is shown in Figure 5.11. The challenge with this pair is that both of the irises have fairly uniform texture.

5.5.2.8 Is It More Difficult to Label Twins as Twins than It Is to Label Unrelated People as Unrelated?

As mentioned in section 5.5.2.7, only one pair of twins’ irises was classified correctly by all 28 testers, yet ten pairs of unrelated iris images were classified correctly by all 28 testers. This finding prompts a question – is it harder to label twins as twins than it is to label unrelated people as unrelated? When we consider the scores over all iris images, this seems to be true. Forty-nine queries presented pairs of twins’ irises. The average percent correct on those queries

was 79.4% (standard deviation 8.2%). A different forty-nine queries presented pairs of unrelated irises. The average percent correct on those queries was 83.3% (standard deviation 6.5%). We used a paired t-test to evaluate the null hypothesis that the scores on the twin queries and the scores on the unrelated queries came from distributions of equal mean. The resulting p-value was 0.059, which is not a strongly significant result, but it does provide some evidence of a difference. It seems easier to label unrelated irises as unrelated.

We considered the same question for the periocular images. Forty-nine queries presented pairs of twins’ periocular regions. The average percent correct on those queries was 75.9% (standard deviation 6.8%). The average percent correct on the unrelated periocular regions was 75.2% (standard deviation 8.5%). This difference was not significant (p-value 0.382).

Figure 5.11: Twenty-four of 28 people incorrectly guessed that these images were from twins, when in fact, these irises are from unrelated people. The smoothness of the texture makes this pair difficult to classify correctly.

5.6 Discussion

We have found that when presented with unlabeled twin and non-twin image pairs in equal numbers, humans can classify pairs of twins with 81% accuracy using only the appearance of the iris. Furthermore, humans can classify pairs of twins with 76% accuracy using only the appearance of the periocular region. Our testers achieved these results using only a three-second display of each image pair. For the subset of the data where our testers felt more certain, the accuracy was even better: 92% on the iris portion and 93% on the periocular portion. For our data, the iris appearance was more valuable than the periocular appearance for identifying twin pairs. The pair of twin iris images most frequently misclassified had a noticeable difference in pupil size between the two images; this suggests that it is likely easier to identify twins’ irises when the irises have similar degrees of dilation. There is a small amount of evidence that it is easier to label an unrelated iris pair as “unrelated” than it is to label a twin pair as “twins”, at least for our data. The majority of testers scored better on the second half of the iris test than the first half, but the improvement was not statistically significant. Similarly, there was no statistically significant evidence of learning on the periocular portion of the test. Our testers clearly performed well above random guessing. Therefore, we can conclude that there are similarities in twin iris texture that untrained human testers can detect. We anticipate that humans can surpass the performance reported here if given a longer time to study the images and if given the entire eye image rather than the iris or periocular region alone. This suggests that human examination of pairs of iris images for forensic purposes may be feasible.

Our results also suggest that development of different approaches to automated iris image analysis may be useful.

CHAPTER 6

PERIOCULAR BIOMETRICS

The previous chapters investigated additional information that we could use in the iris region. Iris biometrics systems typically disregard all information outside the iris region when making decisions about identity. This chapter investigates what additional information we can gain from the periocular region.

6.1 Motivation

The periocular region is the part of the face immediately surrounding the eye. While the face and the iris have both been studied extensively as biometric characteristics, the use of the periocular region for a biometric system is an emerging field of research. Periocular biometrics could potentially be combined with iris biometrics to obtain a more robust system than iris biometrics alone. If an iris biometrics system captured an image where the iris was of poor quality, the region surrounding the eye might still be used to confirm or refute an identity. A further argument for researching periocular biometrics is that current iris biometric systems already capture images containing some periocular information, yet when making recognition decisions, they ignore all pixel information outside the iris region. The periocular area of the image may contain useful information that could improve recognition performance, if we could identify and extract useful features in that region.

A few papers [1, 84, 91, 121] have presented algorithms for periocular recognition, but their approaches have relied on general computer vision techniques rather than methods specific to this biometric characteristic. One way to begin designing algorithms specific to this region of the face is to examine how humans make recognition decisions using the periocular region. Other computational vision problems have benefitted from a good understanding of the human visual system. In a recent book chapter, O’Toole [20] says, “Collaborative interactions between computational and psychological approaches to face recognition have offered numerous insights into the kinds of face representations capable of supporting the many tasks humans accomplish with faces” [20].

Sinha et al. [110] describe numerous basic findings from the study of human face recognition that have direct implications for the design of computational systems. Their report says “The only system that [works] well in the face of [challenges like sensor noise, viewing distance, and illumination] is the human visual system. It makes eminent sense, therefore, to attempt to understand the strategies this biological system employs, as a first step towards eventually translating them into machine-based algorithms” [110]. In this study, we investigated which features humans found useful for making decisions about identity based on periocular information. We presented pairs of periocular images to testers and asked them to determine whether the two images were from the same person or from different people. We also asked them to describe what features in the images were helpful to them in making their decisions. We found that the features that humans found most helpful were not the features that the current periocular biometrics work uses. Based on this study, we anticipate that explicit modeling and description of eyelids, eyelashes, and tear

ducts could yield more recognition power than the current periocular biometrics algorithms published in the literature.

The rest of this chapter is organized as follows. Section 6.2 summarizes the previous work in periocular biometrics. Section 6.3 describes how we selected and pre-processed eye images for our experiment. Our experimental method is outlined in Section 6.4. Section 6.5 presents our analysis. Finally, Section 6.6 presents a summary of our findings and a discussion of the implications of our experiment.

6.2 Related Work

As mentioned above, face recognition and iris recognition have both been researched extensively [16, 125]. In contrast, the field of periocular biometrics is in its infancy, and only a few authors have published in the area. A pioneering paper by Park et al. [91] presented a feasibility study for the use of periocular biometrics. The authors implemented two methods for analyzing the periocular region. In their “global method”, they used the location of the iris as an anchor point. They defined a grid around the iris and computed gradient orientation histograms and local binary patterns for each point in the grid. They quantized both the gradient orientation and the local binary patterns (LBP) into eight distinct values to build an eight-bin histogram, and then used Euclidean distance to evaluate a match. Their “local method” involved detecting key points using a SIFT matcher. They collected a database of 899 high-resolution visible-light face images from 30 subjects. A face matcher gave 100% rank-one recognition for these images, and the matcher that used only the periocular region gave 77%. Another paper by Miller et al. also used LBP to analyze the periocular

region [84]. They used visible-light face images from the Face Recognition Grand Challenge (FRGC) data and the Facial Recognition Technology (FERET) data.

The periocular region was extracted from the face images using the provided eye center coordinates. Miller et al. extracted the LBP histogram from each block in the image and used City Block distance to compare the information from two images. They achieved 89.76% rank-one recognition on the FRGC data, and 74.07% on the FERET data. Adams et al. [1] also used LBP to analyze periocular regions from the FRGC and FERET data, but they trained a genetic algorithm to select the subset of features that would be best for recognition. The use of the genetic algorithm increased accuracy from 89.76% to 92.16% on the FRGC data. On the FERET dataset, the accuracy increased from 74.04% to 85.06%. While Park et al., Miller et al., and Adams et al. all used datasets of visible-light images, Woodard et al. [121] performed experiments using near-infrared (NIR) light images from the Multi-Biometric Grand Challenge (MBGC) portal data. The MBGC data shows NIR images of faces, using sufficiently high resolution that the iris could theoretically be used for iris recognition. However, the portal data is a challenging data set for iris analysis because the images are acquired while a subject is in motion, and several feet away from the camera.

Therefore, the authors proposed to analyze both the iris and the periocular region, and fuse information from the two biometric modalities. From each face, they cropped a 601x601 image of the periocular region. Their total data set contained 86 subjects’ right eyes and 88 subjects’ left eyes. Using this data, the authors analyzed the iris texture using a traditional Daugman-like algorithm [28], and they analyzed the periocular texture using LBP. The periocular identification

performed better than the iris identification, and the fusion of the two modalities performed best.

One difference between our work and the papers mentioned above is the target data type (Table 6.1). Those papers all used periocular regions cropped from face data. Our work uses near infrared images of a small periocular region, from the type of image we get from iris cameras. The anticipated application is to use periocular information to assist in iris recognition when iris quality is poor. Another difference is the development strategy. The papers mentioned above used gradient orientation histograms, local binary patterns, and SIFT features. These authors have followed a strategy of applying common computer vision techniques to analyze images. We attempt to approach periocular recognition from a different angle. We aim to investigate the features that humans find most useful for recognition in near infrared images of the periocular region.

6.3 Data

In selecting our data, we considered using eye images taken from two different cameras: an LG2200 and an LG4000 iris camera. The LG2200 is an older model, and the images taken with this camera sometimes have undesirable interlacing or lighting artifacts [15]. On the other hand, in our data sets, the LG4000 images seemed to show less periocular data around the eyes. Since our purpose was to investigate features in the periocular region, we chose to use the LG2200 images so that the view of the periocular region would be larger. We hand-selected a subset of images, choosing images in good focus, with minimal interlacing and shadow artifacts. We also favored images that included both the inner and outer corners

TABLE 6.1

PERIOCULAR RESEARCH

Paper: Park [91]. Data: 899 visible-light face images, 30 subjects. Algorithm: gradient orientation histograms, local binary patterns, Euclidean distance, SIFT matcher. Features: eye region with width 6*iris-radius and height 4*iris-radius.

Paper: Miller [84]. Data: FRGC data and FERET data (visible-light face images), 464 subjects. Algorithm: local binary patterns, city block distance. Features: skin.

Paper: Adams [1]. Data: same as Miller et al. Algorithm: local binary patterns, genetic algorithm. Features: skin.

Paper: Woodard [121]. Data: MBGC data (near-infrared face images), 88 subjects. Algorithm: local binary patterns, result fused with iris matching results. Features: skin.

Paper: This work. Data: near-infrared iris images from LG 2200 camera, 120 subjects. Algorithm: human analysis. Features: eyelashes, tear duct, eyelids, and shape of eye.

of the eye. We selected images from 120 different subjects. We had 60 male subjects and

60 female subjects. 108 of them were Caucasian and 12 were Asian. For 40 of the subjects, we selected two images of an eye and saved the images as a “match” pair. In each case, the two images selected were acquired at least a week apart. For the remaining subjects, we selected one image of an eye, paired it with an image from another subject, and saved it as a “nonmatch” pair. Thus, the queries that we would present to our testers involved 40 match pairs, and 40 nonmatch pairs. All queries were either both left eyes, or both right eyes. Our objective was to examine how humans analyzed the periocular region.

Consequently, we did not want the iris to be visible during our tests. To locate the iris in each image, we used our automatic segmentation software, which uses active contours to find the iris boundaries. Next, we hand-checked all of the segmentations. If our software had made an error in finding the inner or outer iris boundary, we manually marked the center and a point on the boundary to identify the correct center and radius of an appropriate circle. If the software had made an error in finding the eyelid, we marked four points along the boundary to define three line segments approximating the eyelid contour. For all of the images, we set the pixels inside the iris/pupil region to black.

6.4 Experimental Method

In order to determine which features in the periocular region were most helpful to the human visual system, we designed an experiment to present pairs of eye images to volunteers and ask for detailed responses. We designed a graphical user interface to display our images. At the beginning of a session, the computer

displayed two example pairs of eye images to the user. The first pair showed two images of a subject’s eye, taken on different days. The second pair showed eye images from two different subjects. Next, the GUI displayed the test queries. In each query, we displayed a pair of images and asked the user to respond whether he or she thought the two images were from the same person or from different people. In addition, he could note his level of confidence in his response – whether he was “certain” of his response, or only thought that his response was “likely” the correct answer. The user was further asked to rate a number of features depending on whether each feature was “very helpful”, “helpful”, or “not helpful” for determining identity. The features listed were “eye shape”, “tear duct”¹, “outer corner”, “eyelashes”, “skin”, “eyebrow”, “eyelid”, and “other”. If a user marked that some “other” feature was helpful, he was asked to enter what feature(s) he was referring to. A final text box on the screen asked the user to describe any other additional information that he used while examining the eye images. Users did not have any time limit for examining the images. After the user had classified the pair of images as “same person” or “different people” and rated all features, then he could click “Next” to proceed. At that point the user was told whether he had correctly classified the pair of images. Then, the next query was displayed. All users viewed the same eighty pairs of images, although they were presented in a different random order for each user. We solicited volunteers to participate in our experiment, and 25 people signed up to serve as testers. Most testers responded to all of the queries in about 35 minutes. The fastest tester took about 25 minutes, and the slowest took about an hour and 40 minutes. They were offered ten dollars for

¹We used the term “tear duct” informally in this instance to refer to the region near the inner corner of the eye. A more appropriate term might be “medial canthus” but we did not expect the volunteers in our experiment to know this term.

participation and twenty dollars if they classified at least 95% of pairs correctly.

6.5 Results

6.5.1 How Well Can Humans Determine whether Two Periocular Images Are from the Same Person or Not?

To find an overall accuracy score, we counted the number of times the tester was “likely” or “certain” of the correct response; that is, we made no distinction based on the tester’s confidence level, only on whether they believed a pair to be from the same person, or believed a pair to be from different people. We divided the number of correct responses by 80 (the total number of queries) to yield an accuracy score. The average tester classified about 74 out of 80 pairs correctly, which is about 92% (standard deviation 4.6%). The minimum score was 65 out of 80 (81.25%) and the maximum score was 79 out of 80 (98.75%).

6.5.2 Did Humans Score Higher when They Felt More Certain?

As mentioned above, testers had the option to mark whether they were “certain” of their response or whether their response was merely “likely” to be correct. Some testers were more “certain” than others. One responded “certain” for 70 of the 80 queries. On the other hand, one tester did not answer “certain” for any queries. Discounting the tester who was never certain, the average score on the questions where testers were certain was 97% (standard deviation 5.2%). The average score when testers were less certain was 84% (standard deviation 11%). Therefore, testers obviously did better on the subset of the queries where they felt

“certain” of their answer.

6.5.3 Did Testers Do Better on the Second Half of the Test than the First Half?

The average score on the first forty queries for each tester was 92.2%. The average score on the second forty queries was 92.0%. Therefore, there is no evidence of learning between the first half of the test and the second.

6.5.4 Which Features Are Correlated with Correct Responses?

The primary goal of our experiment was to determine which features in the periocular region were most helpful to the human visual system when making recognition decisions. Specifically, we are interested in features present in near-infrared images of the type that can be obtained by a typical iris camera. To best answer our question, we only used responses from cases where the tester correctly determined whether the image pair was from the same person. From these responses, we counted the number of times each feature was “very helpful” to the tester, “helpful”, or “not helpful”. A bar chart of these counts is given in Figure 6.1.

The features in this figure are sorted by the number of times each feature was regarded as “very helpful”. According to these results, the most helpful feature was eyelashes, although tear duct and eye shape were also very helpful. The ranking from most helpful to least helpful was (1) eyelashes, (2) tear duct, (3) eye shape, (4) eyelid, (5) eyebrow, (6) outer corner, (7) skin, and (8) other.
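A minimal sketch of this tally and ranking, assuming the raw responses have been collected as (feature, rating, answered-correctly) records; the data shown are placeholders, not our actual responses.

from collections import Counter

# Hypothetical response records: (feature, rating, answered_correctly).
responses = [
    ("eyelashes", "very helpful", True),
    ("tear duct", "helpful", True),
    ("eye shape", "very helpful", False),
    # ... one record per feature rating, per query, per tester ...
]

# Tally ratings over correctly answered queries only, as described in the text.
counts = Counter((feat, rating) for feat, rating, correct in responses if correct)

# Rank features by how often they were rated "very helpful".
features = {feat for feat, _, _ in responses}
ranking = sorted(features, key=lambda f: counts[(f, "very helpful")], reverse=True)
print(ranking)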

Other researchers have found eyebrows to be more useful than eyes in identifying famous people [110], so the fact that eyebrows were ranked fifth out of eight is perhaps misleading. The reason eyebrows received such a low ranking in our experiment is that none of the images showed a complete eyebrow. In about forty queries, the two images both showed some part of the eyebrow, but in the other forty queries, the eyebrow was outside the image field-of-view in at least one of the images in the pair.

Figure 6.1: Rated helpfulness of features from correct responses. Eyelashes were considered the most helpful feature for making decisions about identity. The tear duct and shape of the eye were also very helpful.

On images with a larger field of view, eyebrows could be significantly more valuable. We suggest that iris sensors with a larger field of view would be more useful when attempting to combine iris and periocular biometric information. The low ranking for “outer corner” (sixth out of eight) did not surprise us, because in our own observation of a number of eye images, the outer corner does not often provide much unique detail for distinguishing one eye from another.

There were three queries where the outer corner of the eye was not visible in the image. Skin ranked seventh out of eight in our experiment, followed only by “other”. Part of the reason for the low rank of this feature is that the images were all near-infrared images. Therefore, testers could not use skin color to make their decisions. This result might not be as striking if we had used a data set containing a greater diversity of ethnicities. However, we have noticed that variations in lighting can make light skin appear dark in a near-infrared image, suggesting that overall

intensity in the skin region may have greater intra-class variation than inter-class variation in these types of images.

6.5.5 Which Features Are Correlated with Incorrect Responses?

In addition to considering which features were marked most helpful for correct responses, we also looked at how features were rated when testers responded incorrectly. For all the incorrectly answered queries, we counted the number of times each feature was “very helpful”, “helpful”, or “not helpful”. A bar chart of these counts is given in Figure 6.2. We might expect to have a similar rank ordering for the features in the incorrect queries as we had for the correct queries, simply because if certain features are working well for identification, a tester would tend to continue to use the same features. Therefore, rather than focusing on the overall rank order of the features, we considered how the feature rankings differed between the correct responses and the incorrect responses. The ranking from most helpful feature to least helpful feature for the incorrect queries was (1) eye shape, (2) tear duct, (3) eyelashes, (4) outer corner, (5) eyebrow, (6) eyelid, (7) skin, and (8) other. Notice that “eye shape” changed from rank three to rank one. Also “outer corner” changed from rank six to rank four. This result implies that eye shape and outer corner are features that are less valuable for correct identification. On the other hand, “eyelashes” and “eyelid” both changed rank in the opposite direction, implying that those features are more valuable for correct identification.

6.5.6 What Additional Information Did Testers Provide?

In addition to the specific features that testers were asked to rate, testers were also asked to describe other factors they considered in making their decisions.


Figure 6.2: We compared the rankings for the features from correct responses (Fig. 6.1) with the rankings from incorrect responses. The shape of the eye and the outer corner of the eye were both used more frequently on incorrect responses than on correct responses. This result suggests that those two features would be less helpful than other features such as eyelashes.

Testers were prompted to “explain what features in the image were most useful to you in making your decision”, and enter their response in a text box.

Testers found a number of different traits of eyelashes valuable. They considered the density of eyelashes (or number of eyelashes), eyelash direction, length, and intensity (light vs. dark). Clusters of eyelashes, or single eyelashes pointing in an unusual direction were helpful, too. Contacts were helpful as a “soft biometric”. That is, the presence of a contact lens in both images could be used as supporting evidence that the two images were of the same eye. However, no testers relied on contacts as a deciding factor. One of the eighty queries showed two images of the same eye where one image showed a contact lens, and the other did not. Make-up was listed both as “very helpful” for some queries, and as “misleading” for other queries. One of the eighty queries showed a match pair where only one of the images displayed make-up.

Figure 6.3: All 25 testers correctly classified these two images as being from the same person.

6.5.7 Which Pairs Were Most Frequently Classified Correctly, and Which Pairs Were Most Frequently Classified Incorrectly?

There were 21 match pairs that were classified correctly by all testers. One example of a pair that was classified correctly by all testers is shown in Figure 6.3. There were 12 nonmatch pairs classified correctly by all testers. An example is shown in Figure 6.4.

Figure 6.5 shows the match pair most frequently classified incorrectly. Eleven of the 25 testers mistakenly thought that these two images were from different people. This pair is challenging because the eye is wide open in one of the images, but not in the other. Figure 6.6 shows the nonmatch pair most frequently classified incorrectly. This pair was also misclassified by 11 testers, although the set of 11 testers who responded incorrectly for the pair in Figure 6.6 was different from the set of testers who responded incorrectly for Figure 6.5.

Figure 6.4: All 25 testers correctly classified these two images as being from different people.

Figure 6.5: Eleven of 25 people incorrectly guessed that these images were from different people, when in fact, these eyes are from the same person. This pair is challenging because one eye is much more open than the other.

Figure 6.6: Eleven of 25 people incorrectly guessed that these images were from the same person, when in fact, they are from two different people.

6.6 Discussion

We have found that when presented with unlabeled pairs of periocular images, half from the same person and half from different people, humans can classify the pairs as “same person” or “different people” with an accuracy of about 92%. When expressing confident judgement, the accuracy is about 97%. We compared scores on the first half of the test to the second half of the test and found no evidence of learning as the test progressed. In making their decisions, testers reported that eyelashes, tear ducts, shape of the eye, and eyelids were most helpful. However, eye shape was used in a large number of incorrect responses. Both eye shape and the outer corner of the eye were used a higher proportion of the time for incorrect responses than they were for correct responses, thus those two features might not be as useful for recognition. Eyelashes were helpful in a number of ways. Testers used eyelash intensity, length, direction, and density. They also looked for groups of eyelashes that clustered together, and for single eyelashes separated from the others. The presence of contacts was used as a soft biometric. Eye make-up was helpful in

some image pairs, and distracting in others. Changes in lighting were challenging, as were large differences in eye occlusion.

Our analysis suggests some specific ways to design powerful periocular biometrics systems. We expect that a biometrics system that explicitly detects eyelids, eyelashes, the tear duct, and the entire shape of the eye could be more powerful than some of the skin analysis methods presented previously.

The most helpful feature in our study was eyelashes. In order to analyze the eyelashes, we would first detect the eyelids. Eyelids can be detected using edge detection and Hough transforms [69, 120], a parabolic “integrodifferential operator” [28], or active contours [104]. The research into eyelid detection has primarily been aimed at detecting and disregarding the eyelids during iris recognition, but we suggest detecting and describing eyelids and eyelashes to aid in identification. Feature vectors describing eyelashes could include measures for the density of eyelashes along the eyelid, the uniformity of direction of the eyelashes, and the curvature and length of the eyelashes. We could also use metrics comparing the upper and lower lashes. The second most helpful feature in our study was the tear duct region. Once we have detected the eyelids, we could extend those curves to locate the tear duct region. This region should more formally be referred to as the medial canthus. A canthus is the angle or corner on each side of the eye, where the upper and lower lids meet. The medial canthus is the inner corner of the eye, or the corner closest to the nose. Two structures are often visible in the medial canthus, the lacrimal caruncle and the plica semilunaris [89]. These two features typically have lower contrast than eyelashes and iris. Therefore, they would be harder for a computer vision algorithm to identify, but if they were detectable, the sizes and shapes of

these structures would be possible features. Detecting the medial canthus itself would be easier than detecting the caruncle and plica semilunaris, because the algorithm could follow the curves of the upper and lower eyelids until they meet at the canthus. Once detected, we could measure the angle formed by the upper and lower eyelids and analyze how the canthus meets the eyelids. In Asians, the epicanthal fold may cover part of the medial canthus [89] so that there is a smooth line from the upper eyelid to the inner corner of the eye (e.g. Figure 6.3). The epicanthal fold is present in fetuses of all races, but in Caucasians it has usually disappeared by the time of birth [89]. Therefore, Caucasian eyes are more likely to have a distinct cusp where the medial canthus and upper eyelid meet (e.g.

Figure 6.5). The shape of the eye has potential to be helpful, but the term “eye shape” is ambiguous, which might explain the seemingly contradictory results we obtained about the helpfulness of this particular feature. To describe the shape of the eye, we could analyze the curvature of the eyelids. We could also detect the presence or absence of the superior palpebral furrow – the crease in the upper eyelid – and measure its curvature if present. Previous periocular research has focused on texture and key points in the area around the eye. The majority of prior work [1, 84, 121] masked an elliptical region in the middle of the periocular region “to eliminate the effect of textures in the iris and the surrounding sclera area” [84]. This mask effectively occludes a large portion of the eyelashes and tear duct region, thus hiding the features that we find are most valuable. Park et al. [91] do not mask the eye, but they also do not do any explicit feature modeling beyond detecting the iris. These promising prior works have all shown recognition rates at or above 77%. However, we suggest that

there is potential for greater recognition power by considering additional features.
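A rough sketch of the kind of explicit feature vector this discussion suggests, assuming the eyelid curves, eyelash segments, and medial canthus have already been detected by some earlier stage; every field name, the weighting, and the distance function are illustrative choices of ours, not a published algorithm.

from dataclasses import dataclass
import numpy as np

@dataclass
class PeriocularFeatures:
    """Illustrative descriptor built from explicitly detected structures."""
    lash_density_upper: float     # lashes per unit eyelid length, upper lid
    lash_density_lower: float     # lashes per unit eyelid length, lower lid
    lash_mean_length: float
    lash_direction_spread: float  # spread of eyelash angles
    canthus_angle: float          # angle (radians) where upper and lower lids meet
    upper_lid_curvature: float    # mean curvature of the fitted upper-eyelid curve
    furrow_present: bool          # superior palpebral furrow detected or not

def feature_distance(a: PeriocularFeatures, b: PeriocularFeatures) -> float:
    """Simple distance between two descriptors (the extra unit cost for a
    mismatched furrow flag is an arbitrary choice)."""
    vec_a = np.array([a.lash_density_upper, a.lash_density_lower, a.lash_mean_length,
                      a.lash_direction_spread, a.canthus_angle, a.upper_lid_curvature])
    vec_b = np.array([b.lash_density_upper, b.lash_density_lower, b.lash_mean_length,
                      b.lash_direction_spread, b.canthus_angle, b.upper_lid_curvature])
    return float(np.linalg.norm(vec_a - vec_b)) + (0.0 if a.furrow_present == b.furrow_present else 1.0)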

CHAPTER 7

CONCLUSIONS

In this dissertation, we presented methods of reducing error rates and increasing the applicability of eye biometrics. Our work was the first to propose the fragile bit distance metric. We introduced this metric and proposed fusing Hamming distance with fragile bit distance. This optimization reduced the equal error rate by eight percent on a data set of 19,891 iris images. Our second optimization fused frames from video to create average images. We tested this method on a data set of 983 iris videos. Using average images from video reduced the equal error rate by 8.6 × 10^-3 when compared with using single frames. In comparing the proposed average images method to a multi-gallery method, we reduced the equal error rate by 5.6 × 10^-4 while using only one-tenth the matching time required for the multi-gallery method. To increase the applicability of eye biometrics, we investigated what additional information was present in eye images that current algorithms do not detect. By looking at iris data from identical twins, we showed that there is genetically-related information present in iris texture that existing iris biometrics algorithms do not capture. Our work is the first work to experimentally document that people can reliably distinguish images of twins’ irises from images of unrelated persons’ irises. Finally, using images from the periocular region, we showed that the features most useful to humans in that region are not the features that current systems use for

As future work, we hope to develop automated algorithms that can detect and describe the eyelashes, eyelids, and tear duct in order to identify people based on that periocular information.

BIBLIOGRAPHY

1. Joshua Adams, Damon L. Woodard, Gerry Dozier, Philip Miller, Kelvin Bryant, and George Glenn. Genetic-based type II feature extraction for periocular biometric recognition: Less is more. Proc. Int. Conf. on Pattern Recognition, 2010. to appear.

2. Kelli Arena and Carol Cratty. FBI wants palm prints, eye scans, tattoo mapping. CNN.com, Feb 2008. http://www.cnn.com/2008/TECH/02/04/fbi.biometrics/, accessed July 2009.

3. Sarah Baker, Kevin W. Bowyer, and Patrick J. Flynn. Empirical evidence for correct iris match score degradation with increased time-lapse between gallery and probe matches. Proc. Int. Conf. on Biometrics (ICB2009), pages 1170–1179, 2009.

4. Sarah Baker, Amanda Hentz, Kevin W. Bowyer, and Patrick J. Flynn. Contact lenses: Handle with care for iris recognition. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–8, Sept 2009.

5. Lucas Ballard, Seny Kamara, and Michael K. Reiter. The practical subtleties of biometric key generation. 17th USENIX Security Symposium, pages 61–74, 2008.

6. Nakissa Barzegar and M. Shahram Moin. A new user dependent iris recognition system based on an area preserving pointwise level set segmentation approach. EURASIP Journal on Advances in Signal Processing, pages 1–13, 2009.

7. Craig Belcher and Yingzi Du. Feature information based quality measure for iris recognition. Proc. IEEE International Conference on Systems, Man, and Cybernetics, pages 3339–3345, Oct 2007.

8. Craig Belcher and Yingzi Du. A selective feature information approach for iris image-quality measure. IEEE Transactions on Information Forensics and Security, 3(3):572–577, Sept 2008.

9. Craig Belcher and Yingzi Du. Region-based SIFT approach to iris recognition. Optics and Lasers in Engineering, 47:139–147, 2009.

10. A. Bertillon. La couleur de l’iris. Revue scientifique, 36(3):65–73, 1885.

11. Rajesh M. Bodade and Sanjay N. Talbar. Shift invariant iris feature extraction using rotated complex wavelet and complex wavelet for iris recognition system. Proc. 2009 Seventh International Conference on Advances in Pattern Recognition, pages 449–452, 2009.

12. Vishnu Naresh Boddeti and B.V.K. Vijaya Kumar. Extended-depth-of-field iris recognition using unrestored wavefront-coded imagery. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 40(3):495–508, May 2010.

13. Ruud M. Bolle, Sharath Pankanti, Jonathon H. Connell, and Nalini Ratha. Iris individuality: A partial iris model. Proc. Int. Conf. on Pattern Recognition, pages II: 927–930, 2004.

14. Kevin W. Bowyer, Kyong I. Chang, Ping Yan, Patrick J. Flynn, Earnie Hansley, and Sudeep Sarkar. Multi-modal biometrics: an overview. Proc. Second Workshop on Multi-Modal User Authentication, pages 1–8, May 2006. Toulouse, France.

15. Kevin W. Bowyer and Patrick J. Flynn. The ND-IRIS-0405 iris image dataset. Technical report, University of Notre Dame, 2009. http://www.nd.edu/~cvrl/papers/ND-IRIS-0405.pdf.

16. Kevin W. Bowyer, Karen P. Hollingsworth, and Patrick J. Flynn. Image understanding for iris biometrics: A survey. Computer Vision and Image Understanding, 110(2):281–307, 2008.

17. Christopher Boyce, Arun Ross, Matthew Monaco, Lawrence Hornak, and Xin Li. Multispectral iris analysis: A preliminary study. Proc. IEEE Computer Vision and Pattern Recognition Workshops, pages 1–9, Jun 2006.

18. J. Bringer, H. Chabanne, G. Cohen, B. Kindarji, and G. Zémor. Optimal iris fuzzy sketches. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2007.

19. Julien Bringer, Hervé Chabanne, Gerard Cohen, Bruno Kindarji, and Gilles Zémor. Theoretical and practical boundaries of binary secure sketches. IEEE Transactions on Information Forensics and Security, 3(4):673–683, 2008.

20. A. Calder and G. Rhodes, editors. Handbook of Face Perception, chapter Cognitive and Computational Approaches to Face Perception by O’Toole. Oxford University Press, 2010. in press.

21. Kyong I. Chang, Kevin W. Bowyer, and Patrick J. Flynn. An evaluation of multi-modal 2D+3D face biometrics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4):619–624, Apr 2005.

22. Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing scheme based on p-stable distributions. Proceedings of the Twentieth Annual Symposium on Computational Geometry, pages 253–262, 2004.

23. John Daugman. Absorption spectrum of melanin. http://www.cl.cam.ac.uk/~jgd1000/melanin.html, accessed July 2009.

24. John Daugman. Introduction to iris recognition. http://www.cl.cam.ac.uk/~jgd1000/iris recognition.html, accessed Jun 2010.

25. John Daugman. High confidence visual recognition of persons by a test of statistical independence. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(11):1148–1161, Nov 1993.

26. John Daugman. Biometric personal identification system based on iris anal- ysis. U.S. Patent No. 5,291,560, Mar 1994.

27. John Daugman. Biometric decision landscapes. Technical Report UCAM-CL-TR-482, University of Cambridge Computer Laboratory, 2000.

28. John Daugman. How iris recognition works. IEEE Transactions on Circuits and Systems for Video Technology, 14(1):21–30, 2004.

29. John Daugman. New methods in iris recognition. IEEE Transactions on Systems, Man and Cybernetics - B, 37(5):1167–1175, Oct 2007.

30. John Daugman. United Arab Emirates deployment of iris recognition. http://www.cl.cam.ac.uk/~jgd1000/deployments.html, accessed Jan 2009.

31. John Daugman and Cathryn Downing. Epigenetic randomness, complexity and singularity of human iris patterns. Proceedings of the Royal Society of London - B, 268:1737–1740, 2001.

32. John Daugman and Cathryn Downing. Effect of severe image compression on iris recognition performance. IEEE Transactions on Information Forensics and Security, 3(1):52–61, March 2008.

33. George Davida, Yair Frankel, and Brian Matt. On enabling secure applications through off-line biometric identification. IEEE Symposium on Security and Privacy, pages 148–157, 1998.

34. G. Doddington, W. Liggett, A. Martin, M. Przybocki, and D. Reynolds. Sheep, goats, lambs, and wolves: A statistical analysis of speaker performance in the NIST 1998 speaker recognition evaluation. 5th International Conference on Spoken Language Processing, pages 1–4, 1998. Sydney, Australia.

35. Yevgeniy Dodis, Rafail Ostrovsky, Leonid Reyzin, and Adam Smith. Fuzzy extractors: How to generate strong keys from biometrics and other noisy data. SIAM Journal of Computing, 38(1):97–139, 2008.

36. Yevgeniy Dodis, Leonid Reyzin, and Adam Smith. Advances in Cryptology - EUROCRYPT, chapter 13: Fuzzy Extractors: How to Generate Strong Keys from Biometrics and Other Noisy Data, pages 523–540. Springer Berlin/Heidelberg, 2004.

37. Gerry Dozier, David Bell, Leon Barnes, and Kelvin Bryant. Refining iris templates via weighted bit consistency. Proc. Midwest Artificial Intelligence and Cognitive Science (MAICS) Conference, pages 1–5, Apr 2009. Fort Wayne, Indiana.

38. Gerry Dozier, Kurt Frederiksen, Robert Meeks, Marios Savvides, Kelvin Bryant, Darlene Hopes, and Taihei Munemoto. Minimizing the number of bits needed for iris recognition via bit inconsistency and grit. Proc. IEEE Workshop on Computational Intelligence in Biometrics: Theory, Algorithms, and Applications, pages 30–37, Apr 2009.

39. Yingzi Du. Using 2D log-Gabor spatial filters for iris recognition. Proc. SPIE 6202: Biometric Technology for Human Identification III, pages 62020:F1–F8, 2006.

40. Yingzi Du, Robert W. Ives, Delores M. Etter, and Thad B. Welch. Use of one-dimensional iris signatures to rank iris pattern similarities. Optical Engineering, 45(3):037201-1–037201-10, 2006.

41. Leonard Flom and Aran Safir. Iris recognition system. U.S. Patent 4,641,349, 1987.

42. Karen Gomm. Passport agency: ‘Iris recognition needs work’. ZDNet UK, Oct 2005. http://news.zdnet.co.uk/emergingtech/0,1000000183,39232694,00.htm, accessed July 2009.

43. Jaap Haitsma and Ton Kalker. A highly robust audio fingerprinting system with an efficient search strategy. Journal of New Music Research, 32(2):211–221, June 2003.

44. Feng Hao, Ross Anderson, and John Daugman. Combining crypto with biometrics effectively. IEEE Transactions on Computers, 55(9):1081–1088, Sept 2006.

45. Feng Hao, John Daugman, and Piotr Zielinski. A fast search algorithm for a large fuzzy database. IEEE Transactions on Information Forensics and Security, 3(2):203–212, June 2008.

46. Karen Harmel. Walt Disney World: The government’s tomorrowland? News21, Sept 2006. http://news21project.org/story/2006/09/01/walt disney world the governments, accessed July 2009.

47. Xiaofu He, Jingqi Yan, Guangyu Chen, and Pengfei Shi. Contactless autofeedback iris capture design. IEEE Transactions on Instrumentation and Measurement, 57(7):1369–1375, Jul 2008.

48. Zhaofeng He, Zhenan Sun, Tieniu Tan, and Xianchao Qiu. Enhanced usability of iris recognition via efficient user interface and iris image restoration. Proc. 15th IEEE Int. Conf. on Image Processing (ICIP2008), pages 261–264, Oct 2008.

49. Sean Henahan. The eyes have it. Access Excellence, Jun 2002. http://www.accessexcellence.org/WN/SU/irisscan.php, accessed July 2009.

50. Karen Hollingsworth. Sources of error in iris biometrics. Master’s thesis, University of Notre Dame, 2008.

51. Karen Hollingsworth, Kevin W. Bowyer, and Patrick J. Flynn. Similarity of iris texture between identical twins. Computer Vision and Pattern Recognition Biometrics Workshop, pages 1–8, June 2010.

52. Karen P. Hollingsworth, Kevin W. Bowyer, and Patrick J. Flynn. All iris code bits are not created equal. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2007.

53. Karen P. Hollingsworth, Kevin W. Bowyer, and Patrick J. Flynn. The best bits in an iris code. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(6):964–973, Jun 2009.

54. Karen P. Hollingsworth, Kevin W. Bowyer, and Patrick J. Flynn. Image averaging for improved iris recognition. Proc. Int. Conf. on Biometrics (ICB2009), pages 1112–1121, 2009.

55. Karen P. Hollingsworth, Kevin W. Bowyer, and Patrick J. Flynn. Pupil dilation degrades iris biometric performance. Computer Vision and Image Understanding, 113(1):150–157, 2009.

56. Karen P. Hollingsworth, Kevin W. Bowyer, and Patrick J. Flynn. Using fragile bit coincidence to improve iris recognition. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2009.

57. Karen P. Hollingsworth, Tanya Peters, Kevin W. Bowyer, and Patrick J. Flynn. Iris recognition using signal-level fusion of frames from video. IEEE Transactions on Information Forensics and Security, 4(4):837–848, 2009.

58. Iris testing of returning Afghans passes 200,000 mark. UNHCR The UN Refugee Agency, Oct 2003. http://www.unhcr.org/cgi-bin/texis/vtx/search?docid=3f86b4784, accessed July 2009.

59. Iris recognition for inmates. Tarrant County website, Jul 2004. http://www.tarrantcounty.com/esheriff/cwp/view.asp?a=792&q=437580, accessed June 2010.

60. Iris recognition. http://en.wikipedia.org/wiki/Iris_recognition, accessed March 2010.

61. ISO SC37 Harmonized Biometric Vocabulary (Standing Document 2 Version 12). Technical report, International Standards Organization, Sept 2009.

62. Jail using new iris scanning system. KSBW.com, Jan 2006. http://www.ksbw.com/news/6403339/detail.html, accessed July 2009.

63. Anil K. Jain, Patrick Flynn, and Arun A. Ross. Handbook of Biometrics, chapter 14: Introduction to Multibiometrics by Ross, Nandakumar, and Jain, pages 271–292. Springer, 2008.

64. R. Johnston. Can iris patterns be used to identify people? Los Alamos National Laboratory, Chemical and Laser Sciences Division Annual Report LA-12331-PR, Jun 1992. pages 81-86.

65. Ari Juels and Madhu Sudan. A fuzzy vault scheme. Designs, Codes, and Cryptography, 38(2):237–257, 2006.

66. Ari Juels and Martin Wattenberg. A fuzzy commitment scheme. Proceedings of the ACM Conference on Computer Communications Security, pages 28– 36, 1999.

67. Nathan D. Kalka, Jinyu Zuo, Natalia A. Schmid, and Bojan Cukic. Estimating and fusing quality factors for iris biometrics images. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 40(3):509–524, May 2010.

68. Byung Jun Kang and Kang Ryoung Park. Real-time image restoration for iris recognition systems. IEEE Transactions on Systems, Man, and Cybernetics - Part B, 37(6):1555–1566, Dec 2007.

69. Byung Jun Kang and Kang Ryoung Park. A robust eyelash detection based on iris focus assessment. Pattern Recognition Letters, 28(13):1630–1639, October 2007.

70. Josef Kittler and Norman Poh. Multibiometrics for identity authentication: Issues, benefits, and challenges. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2008.

71. N. Kollias. The spectroscopy of human melanin pigmentation. Melanin: Its Role in Human Photoprotection, pages 31–38. Valdenmar Publishing Co., 1995.

72. Emine Krichen, Lorène Allano, Sonia Garcia-Salicetti, and Bernadette Dorizzi. Specific texture analysis for iris recognition. Proc. Int. Conf. on Audio- and Video-Based Biometric Person Authentication (AVBPA 2005), pages 23–30, 2005.

73. L-1 Identity Solutions, understanding iris recognition. http://www.l1id.com/pages/383-science-behind-the-technology, accessed March 2010.

74. Yooyoung Lee, P. Jonathan Phillips, and Ross J. Michaels. An automated video-based system for iris recognition. Proc. Int. Conf. on Biometrics (ICB2009), pages 1160–1169, 2009.

75. Youn Joo Lee, Kang Ryoung Park, Sung Joo Lee, Kwanghyuk Bae, and Jaihie Kim. A new method for generating an invariant iris private key based on the fuzzy vault system. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 38(5), October 2008.

76. LG IrisAccess 4000. http://www.lgiris.com/ps/products/irisaccess4000.htm, accessed Apr 2009.

77. Yung-hui Li and Marios Savvides. Fast and robust probabilistic inference of iris mask. Proceedings of SPIE, vol. 7306, page 730621, May 2009.

78. Chengqiang Liu and Mei Xie. Iris recognition based on DLDA. Proc. Int. Conf. on Pattern Recognition, pages 489–492, Aug 2006.

79. Xiaomei Liu, Kevin W. Bowyer, and Patrick J. Flynn. Experiments with an improved iris segmentation algorithm. Proc. Fourth IEEE Workshop on Automatic Identification Technologies, pages 118–123, Oct 2005.

80. Li Ma, Tieniu Tan, Yunhong Wang, and Dexin Zhang. Efficient iris recognition by characterizing key local variations. IEEE Transactions on Image Processing, 13(6):739–750, Jun 2004.

81. A. Martin, G. Doddington, T. Kamm, M. Ordowski, and M. Przybocki. The DET curve in assessment of detection task performance. Proc. 5th European Conference on Speech Communication and Technology, pages 1895–1898, 1997.

82. J. R. Matey, O. Naroditsky, K. Hanna, R. Kolczynski, D. LoIacono, S. Mangru, M. Tinker, T. Zappia, and W. Y. Zhao. Iris on the Move™: Acquisition of images for iris recognition in less constrained environments. Proceedings of the IEEE, 94(11):1936–1946, 2006.

83. The MathWorks™. Image processing toolbox documentation. http://www.mathworks.com/access/helpdesk/help/toolbox/images/index.html, accessed June 2009.

84. Phillip Miller, Allen Rawls, Shrinivas Pundlik, and Damon Woodard. Personal identification using periocular skin texture. Proc. ACM 25th Symposium on Applied Computing (SAC2010), pages 1496–1500, 2010.

85. Kazuyuki Miyazawa, Koichi Ito, Takafumi Aoki, and Koji Kobayashi. An effective approach for iris recognition using phase-based image matching. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10):1741–1756, Oct 2008.

86. Taihei Munemoto, Yung-hui Li, and Marios Savvides. “Hallucinating irises” - dealing with partial and occluded iris regions. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2008.

87. R. Murphy. Dempster-shafer theory for sensor fusion in autonomous mobile robots. IEEE Trans. Robot. Autom., 14(2):197–206, 1998.

88. Elaine M. Newton and P. Jonathon Phillips. Meta-analysis of third-party evaluations of iris recognition. IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, 39(1):4–11, Jan 2009.

89. Clyde Oyster. The Human Eye Structure and Function. Sinauer Associates, 1999.

90. Chul-Hyun Park and Joon-Jae Lee. Extracting and combining multimodal directional iris features. Int. Conf. on Biometrics (Springer LNCS 3832), pages 389–396, Jan 2006.

91. Unsang Park, Arun Ross, and Anil K. Jain. Periocular biometrics in the visible spectrum: A feasibility study. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2009.

92. Tanya Peters. Effects of segmentation routine and acquisition environment on iris recognition. Master’s thesis, University of Notre Dame, 2009.

93. P. Jonathon Phillips. MBGC presentations and publications. http://face.nist.gov/mbgc/mbgc presentations.htm, Dec 2008.

94. P. Jonathon Phillips, Patrick J. Flynn, Todd Scruggs, Kevin W. Bowyer, and William Worek. Preliminary Face Recognition Grand Challenge results. Proc. Int. Conf. on Automatic Face and Gesture Recognition (FG 2006), pages 15–24, Apr 2006.

95. P. Jonathon Phillips, Todd Scruggs, Patrick J. Flynn, Kevin W. Bowyer, Ross Beveridge, Geoff Givens, Bruce Draper, and Alice O’Toole. Overview of the multiple biometric grand challenge. Proc. Int. Conf. on Biometrics (ICB2009), pages 705–714, 2009.

96. Hugo Proença and Luís Alexandre. Toward noncooperative iris recognition: A classification approach using multiple signatures. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(4):607–612, Apr 2007.

97. Shrinivas J. Pundlik, Damon L. Woodard, and Stanley T. Birchfield. Non-ideal iris segmentation using graph cuts. Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR2008), pages 1–6, June 2008.

98. Xianchao Qiu, Zhenan Sun, and Tieniu Tan. Global texture analysis of iris images for ethnic classification. Springer LNCS 3832: Int. Conf. on Biometrics, pages 411–418, Jan 2006.

99. Soumyadip Rakshit and Donald M. Monro. An evaluation of image sampling and compression for human iris recognition. IEEE Transactions on Information Forensics and Security, 2(3):605–612, Sept 2007.

100. Soumyadip Rakshit and Donald M. Monro. Medical conditions: Effect on iris recognition. Proc. IEEE 9th Workshop on Multimedia Signal Processing (MMSP), pages 357–360, Oct 2007.

101. Sarah Ring and Kevin W. Bowyer. Detection of iris texture distortions by analyzing iris code matching results. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–6, Sept 2008.

102. Roberto Roizenblatt, Paulo Schor, Fabio Dante, Jaime Roizenblatt, and Rubens Belfort Jr. Iris recognition as a biometric method after cataract surgery. Biomedical Engineering Online, 3(1):2–7, Jan 2004.

103. Kaushik Roy and Prabir Bhattacharya. Iris recognition with support vector machines. Proc. Int. Conf. on Biometrics, pages 486–492, Jan 2006.

104. Wayne J. Ryan, Damon L. Woodard, Andrew T. Duchowski, and Stan T. Birchfield. Adapting starburst for elliptical iris segmentation. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–7, Sept 2008.

105. Natalia A. Schmid, Manasi V. Ketkar, Harshinder Singh, and Bojan Cukic. Performance analysis of iris-based identification system at the matching score level. IEEE Transactions on Information Forensics and Security, 1(2):154–168, Jun 2006.

106. Natalia A. Schmid and Francesco Nicolò. On empirical recognition capacity of biometric systems under global PCA and ICA encoding. IEEE Transactions on Information Forensics and Security, 3(3):512–528, June 2008.

107. Stephanie A. C. Schuckers, Natalia A. Schmid, Aditya Abhyankar, Vivekanand Dorairaj, Christopher K. Boyce, and Lawrence A. Hornak. On techniques for angle compensation in nonideal iris recognition. IEEE Transactions on Systems, Man, and Cybernetics - Part B, 37(5):1176–1190, Oct 2007.

108. Glenn Shafer. A Mathematical Theory of Evidence. Princeton, N.J.: Princeton University Press, 1976.

109. Koen Simoens, Pim Tuyls, and Bart Preneel. Privacy weaknesses in biometric sketches. IEEE Symposium on Security and Privacy, pages 188–203, 2009.

110. Pawan Sinha, Benjamin Balas, Yuri Ostrovsky, and Richard Russell. Face recognition by humans: Nineteen results all computer vision researchers should know about. Proceedings of the IEEE, 94(11):1948–1962, Nov 2006.

111. Zhenan Sun, Alessandra A. Paulino, Jianjiang Feng, Zhenhua Chai, Tieniu Tan, and Anil K. Jain. A study of multibiometric traits of identical twins. SPIE, pages 1–12, March 2010.

112. Zhenan Sun, Tieniu Tan, and Xianchao Qiu. Graph matching iris image blocks with local binary pattern. Proc. Int. Conf. on Biometrics (Springer LNCS 3832), pages 366–372, Jan 2006.

113. Vince Thomas, Nitesh Chawla, Kevin Bowyer, and Patrick Flynn. Learning to predict gender from iris images. Proc. IEEE Int. Conf. on Biometrics: Theory, Applications, and Systems, pages 1–5, Sept 2007.

114. Twins Days Festival official website. http://www.twinsdays.org/, accessed March 2010.

115. U.S. Department of Homeland Security. US-VISIT biometric identification services. http://www.dhs.gov/xprevprot/programs/gc 1208531081211.shtm, accessed July 2009.

116. U.S. Department of Homeland Security. US-VISIT traveler information. http://www.dhs.gov/xtrvlsec/programs/content multi image 0006.shtm, accessed July 2009.

117. Mayank Vatsa, Richa Singh, and Afzel Noore. Reducing the false rejection rate of iris recognition using textural and topological features. Int. Journal of Signal Processing, 2(2):66–72, 2005.

118. Mayank Vatsa, Richa Singh, and Afzel Noore. Improving iris recognition performance using segmentation, quality enhancement, match score fusion, and indexing. IEEE Transactions on Systems, Man, and Cybernetics - Part B, 38(4):1021–1035, Aug 2008.

119. Edgar A. Whitley and Ian R. Hosein. Doing the politics of technological decision making: Due process and the debate about identity cards in the U.K. European Journal of Information Systems, 17:668–677, 2008.

120. Richard P. Wildes. Iris recognition: An emerging biometric technology. Proceedings of the IEEE, 85(9):1348–1363, Sept 1997.

121. Damon L. Woodard, Shrinivas Pundlik, Philip Miller, Raghavender Jillela, and Arun Ross. On the fusion of periocular and iris biometrics in non-ideal imagery. Proc. Int. Conf. on Pattern Recognition, 2010. to appear.

122. Harry Wyatt. A minimum wear-and-tear meshwork for the iris. Vision Research, 40:2167–2176, 2000.

123. N. Yager and T. Dunstone. The biometric menagerie. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(2):220–230, 2010.

124. Peng-Fei Zhang, De-Sheng Li, and Qi Wang. A novel iris recognition method based on feature fusion. Proc. Int. Conf. on Machine Learning and Cybernetics, pages 3661–3665, 2004.

125. W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld. Face recognition: A literature survey. ACM Computing Surveys, 35(4):399–458, 2003.

126. Wenyi Zhao and Rama Chellappa, editors. Face Processing: Advanced Modeling and Methods, chapter 17: Beyond one still image: Face recognition from multiple still images or a video sequence by S.K. Zhou and R. Chellappa, pages 547–567. Elsevier, 2006.

127. Zhi Zhou, Yingzi Du, and Craig Belcher. Transforming traditional iris recognition systems to work in nonideal situations. IEEE Transactions on Industrial Electronics, 56(8):3203–3213, Aug 2009.

This document was prepared & typeset with LaTeX 2ε, and formatted with the nddiss2ε classfile (v3.0 [2005/07/27]) provided by Sameer Vijay.
