Improving Eye-Gaze Tracking Accuracy Through Personalized Calibration of a User’s Aspherical Corneal Model

by

Isabella Taba

B.Sc., Simon Fraser University, 2008

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

in

The Faculty of Graduate Studies

(Electrical and Computer Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

January 2012

© Isabella Taba 2012

Abstract

The eyes present us with a window through which we view the world and gather information. Eye-gaze tracking systems are the means by which a user’s point of gaze (POG) can be measured and recorded. Despite active research in gaze tracking systems and major advances in this field, calibration remains one of the primary challenges in the development of eye tracking systems. In order to facilitate gaze measurement and tracking, eye-gaze trackers utilize simplifications in modeling the eye. These simplifications include using a spherical corneal model and using population averages for eye parameters in place of individual measurements, but these simplifications contribute to system errors and impose inaccuracies on the process of point of gaze estimation. This research introduces a new one-time per-user calibration method for gaze estimation systems. The purpose of the calibration method developed in this thesis is to estimate the individual eye parameters of each user based on an aspherical corneal model. Replacing average measurements with individual measurements promises to improve the accuracy and reliability of the system. The approach presented in this thesis estimates eye parameters by statistical modeling through least-squares curve fitting. Compared to a current approach, referred to here as the Hennessey calibration method, this approach offers significant advantages, including improved, individualized calibration. Through analysis and comparison of this new calibration method with the Hennessey calibration method, the research data presented in this thesis show an improvement in gaze estimation accuracy of approximately 27%. The average accuracy for the Hennessey calibration method is about 1.5 cm on an LCD screen at a distance of 60 cm, while the new system, as tested on eight different subjects, achieved an average accuracy of 1.1 cm. A statistical analysis (T-test) of the comparative accuracy of the new calibration method versus the Hennessey calibration method demonstrated that the new system represents a statistically significant improvement.

Table of Contents

Abstract ...... ii

Table of Contents ...... iii

List of Tables ...... vi

List of Figures ...... viii

Acknowledgements ...... xii

Dedication ...... xiii

1 Introduction ...... 1
   1.1 Overview ...... 1
   1.2 Research Goal ...... 2
   1.3 Contributions of This Thesis ...... 3

2 The Eye and Eye Tracking Background ...... 5
   2.1 Eye Detection Methods ...... 5
   2.2 Review of Gaze Estimation Methods ...... 7
      2.2.1 Electrooculography (EOG) ...... 8
      2.2.2 Scleral Contact Lens ...... 9
      2.2.3 Recent Eye-Tracking Systems ...... 9
   2.3 Basic Optics of the Eye ...... 19
      2.3.1 The Cornea ...... 21
      2.3.2 Axes and Angle of the Eye ...... 25
      2.3.3 Light and the Eye ...... 28
      2.3.4 Ocular Biometry ...... 28

3 Hennessey’s POG Tracking Method Used Here ...... 30
   3.1 Background ...... 30
      3.1.1 Eye Model ...... 31
      3.1.2 Corneal Model ...... 33


   3.2 Coordinate Systems ...... 37
      3.2.1 Camera Coordinate System (CCS) ...... 38
      3.2.2 Light Source Model ...... 39
   3.3 Eye, Camera, and Light Sources in One System ...... 40
      3.3.1 One Camera and One Light Source ...... 40
      3.3.2 One Camera and Two Light Sources ...... 41
      3.3.3 Two Cameras and One or More Light Sources ...... 41
   3.4 The Method ...... 43
      3.4.1 Feature Extraction ...... 43
      3.4.2 Pattern Matching ...... 44
      3.4.3 Center of Corneal Curvature Estimation ...... 45
      3.4.4 Pupil Center Estimation ...... 48
      3.4.5 Optical Axis Estimation ...... 50
      3.4.6 Calibration Phase ...... 50
      3.4.7 Summary of Point of Gaze (POG) Estimation ...... 53

4 Proposed New Calibration Method ...... 54
   4.1 Introduction ...... 54
   4.2 Calibration Method ...... 55
      4.2.1 System Calibration ...... 57
      4.2.2 User Calibration ...... 59
   4.3 Calibration Method Based on Personalized Eye Parameters and Aspherical Corneal Surface ...... 61
      4.3.1 Calibration Phase I ...... 63
      4.3.2 Calibration Phase II ...... 65

5 Experimental Method and Results ...... 71
   5.1 Introduction ...... 71
   5.2 Experimental Methods ...... 72
      5.2.1 System Design ...... 72
      5.2.2 Accuracy Metrics ...... 73
      5.2.3 Methods ...... 73
   5.3 Results ...... 76
      5.3.1 Radius of the Corneal Curvature for Different Users ...... 81
   5.4 Statistical Analysis ...... 84

6 Discussion and Conclusions ...... 90
   6.1 Summary of Contribution ...... 90
   6.2 Discussion ...... 90
   6.3 Conclusions and Future Work ...... 92


6.3.1 Future Work ...... 93

References ...... 95

Appendices

A Appendix ...... 101
   A.1 Purkinje Images ...... 101
   A.2 Optics of a Keratometer ...... 101
   A.3 Keratoscopy ...... 104

B Appendix ...... 105
   B.1 Stereo Camera ...... 105

List of Tables

2.1 Path of light ray using Gullstrand’s eye model based on a spherical eye and corneal model [1] [2] ...... 13

3.1 Different conic curves that are defined by Baker’s equation according to their p-values ...... 35

5.1 During the calibration phase, a grid of 3x3 points was shown to each subject. The table shows these calibration points in the world coordinate system in centimeters. ...... 76
5.2 During the trial phase, a different grid of 3x3 points was shown to each subject. The table shows these points in centimeters. ...... 77
5.3 Summary of the average error in POG estimation. These data were collected by Hennessey [3]. In total, 7 test subjects were tested on the system. The average error for each subject using the new proposed calibration method and the old calibration method (i.e., Hennessey’s method [4]) was estimated. ...... 79
5.4 A new set of data was collected during this experiment. In total, 8 test subjects were tested on the system. The average gaze error for each subject using the new proposed calibration method and the old calibration method was estimated. ...... 79
5.5 Variation of the corneal radius between different test subjects. The radius of corneal curvature is estimated at each calibration point for each test subject (the new data set). ...... 82
5.6 Estimated values of rp (i.e., the distance between the center of corneal curvature and the pupil center) for each test subject. ...... 84
5.7 Calculated mean, standard deviation, and variance for the two data groups collected by Hennessey [3]. The data were calibrated using the old calibration method (group 1) and the new calibration method (group 2). The two groups were compared using the T-test statistical analysis. ...... 86


5.8 T-test statistical analysis for the data collected by Hennessey [3]. ...... 86
5.9 Calculated mean, standard deviation, and variance for the newly recorded data groups used in the T-test statistical analysis ...... 87
5.10 T-test statistical analysis for the newly collected data set ...... 89

A.1 Location of the Purkinje images for the Gullstrand schematic eye ...... 101

List of Figures

1.1 Overview of a remote eye-gaze tracking system based on 3D modeling. The optical axis of the eye is a vector from the center of corneal curvature to the center of the pupil. The visual axis of the eye is a vector from the fovea to the point of regard on the screen. ...... 3
1.2 The new calibration method provides personalized eye parameters (α and β: angular offsets between the visual and optical axes; Rc: radius of the corneal curvature; and Rp: the distance between the center of corneal curvature and the pupil center) as inputs for Hennessey’s gaze tracking system. ...... 4

2.1 The appearance of the eye as projected into the camera image may vary as the user changes the viewing angle or moves the head. ...... 5
2.2 Image of EyeSeeCam, a head-mounted VOG [5] ...... 11
2.3 The cornea is not spherical. The radius of the corneal curvature is shortest at the apex of the cornea. ...... 13
2.4 The left image is a dark pupil image and the right image is a bright pupil image. As shown, in the dark pupil image the light reflections from the surface of the cornea (glints) are visible. ...... 14
2.5 The P-CR vector is a vector from the corneal reflection to the center of the pupil. Through 2-D mapping in the calibration phase, the P-CR vector is related to the POG on the screen [3] ...... 15
2.6 Human eye cross-sectional view ...... 20
2.7 The cornea is modeled as an ellipsoid whose outer limit corresponds to the limbus. The approximate eccentricity and the radius of curvature at the apex vary between individuals. Rc is the radius of corneal curvature, where Rc = √(Rx² + Ry²). ...... 23
2.8 The sagittal depth is the distance from the flat (bottom) plane at a given diameter to the apex of the cornea. ...... 24
2.9 Shape factor for an aspheric surface, modified from [6]. ...... 25


2.10 Family of conic sections with different shape factor (P) values, modified from [6] ...... 26

3.1 Nodal points of the eye: N1 = first nodal point, N2 = second nodal point. Nodal points are points on the optical axis of the eye. If a ray is directed toward one of them, it is refracted by the eye’s lens in a direction toward the other, as if the ray came from the other nodal point. The nodal axis is a line from the point of regard to the first nodal point ...... 32
3.2 Cross section of the eyeball showing the center of the corneal curvature along with the camera’s nodal points (Ci, Cj) and the positions of the light sources, modified from [7] ...... 36
3.3 The camera and world coordinate systems. A ray traced from multiple light sources to the surface of the cornea reflects back to the camera sensor. ...... 37
3.4 Perspective projection in a pinhole camera ...... 39
3.5 The high-level eye-gaze tracking system block diagram. In the calibration phase, the angular offset between the optical axis and the visual axis is estimated. ...... 43
3.6 Valid corneal reflections consist of four corneal reflections (glints) undistorted on the boundary between the cornea and the sclera. ...... 45
3.7 Glint coordinate system for light source qi; the origin of the coordinate system is defined as the camera optical center. ...... 46
3.8 The pupil center is estimated through back projection and ray tracing. A point on the pupil perimeter image is denoted by ki, and the corresponding vector from the image point to the center of the camera O is Ki. The vector Ki′ is a refracted vector from point ui to the actual point on the perimeter of the pupil, ui′. As shown, the distance between the center of the pupil Pc and the center of corneal curvature C is rd. The pupil radius (distance from Pc to ui′) is denoted by rp. ...... 49
3.9 The eye model used to estimate the optical axis of the eye (OA). The center of corneal curvature is located at C and the center of the pupil at Pc. The optical axis of the eye (OA) is a vector from C to Pc. ...... 51

4.1 The high-level overview of the calibration steps. As shown, system calibration will be covered in section 4.2.1 and user calibration in section 4.2.2 ...... 56


4.2 The high-level calibration process in the eye-gaze tracking system block diagram. One-time calibration data acquisition is done by instructing the user to gaze at different target points across the screen. α and β are the angular offsets between the visual and the optical axes of the eye; Rc and Rp are the radius of the corneal curvature and the distance between the pupil center and the center of corneal curvature, respectively. ...... 62
4.3 A high-level overview of calibration phase I. The main objective in this phase is the initial estimation of the angular offsets (α and β) between the visual and the optical axes of the eye. ...... 63
4.4 The high-level system overview for calibration phase II ...... 68
4.5 Calibration phase II ...... 70

5.1 The gaze tracking system. The camera is located under the screen. The off-axis light sources are placed at known distances with respect to the corner of the screen. The on-axis light source is placed around the camera lens. ...... 74
5.2 Conversion of the pixel error to the angular error. The actual POG is the location of the target point on the screen, the estimated POG is the estimated point of gaze, ∆X is the error in the x direction, ∆Y is the error in the y direction, and θ is the angular error. ...... 75
5.3 The estimated average gaze error converges to a single value during the calibration phase (using the new data of 8 subjects). ...... 77
5.4 The estimated angular offsets between the visual and optical axes (α and β) converge during the iteration process (using the new data of 8 subjects). ...... 78
5.5 Estimated average error across the screen at each test point. ...... 80
5.6 Comparison of the gaze estimation error over all test points, applying the new system and the old system to the new data set. The new system is based on an aspherical model of the cornea with personalized eye parameters. The old system is based on a spherical model of the cornea with population averages for the eye parameters. ...... 81
5.7 The average, maximum, and minimum values of the radius of corneal curvature estimated for each user (the new data set). ...... 83
5.8 The estimated distance between the pupil center and the corneal center of curvature (Rp) for each user. ...... 85


5.9 Comparison of the estimated error in point of gaze estimation between the new method and the old method (i.e., Hennessey’s method [3]) for the new set of collected data. ...... 87
5.10 Comparison of the estimated error in point of gaze estimation between the new method and the old method (i.e., Hennessey’s method [3]) for the old set of collected data. ...... 88

A.1 Light rays striking the surface of the eye produce four reflections, called Purkinje images. ...... 102
A.2 Schematic of the image formation principle used in keratometry ...... 103

B.1 A Placido disk is used to study the corneal surface. The reflection of the image from the cornea causes the concentric lines to deviate where there is irregularity of the corneal surface. ...... 105
B.2 A stereo camera with camera field of view θ. The overlap between the cameras’ fields of view is the binocular field of view. ...... 106
B.3 Bumblebee 2 accuracy variation at short distances from the camera. As shown in the graph, at closer range to the camera the accuracy of the system drops sharply. This graph was provided by Point Grey Research. ...... 107
B.4 Overview of the stereo camera field of view. In order to get an accurate image of the cornea, the user needs to sit closer to the camera. This forces the user to fall outside the region where the cameras’ fields of view overlap. ...... 108

Acknowledgements

It is a great pleasure to thank the many people who made this thesis possible:

I would like to express my gratitude to Dr. Peter Lawrence for his supervision and continuous help and support during the different stages of preparing this thesis. His guidance helped me explore this topic to a far greater breadth than I thought possible.

I would also like to thank Dr. Craig Hennessey for his supervision and help during the course of this thesis.

I wish to thank all my colleagues and friends in the RCL lab for their help and support.

Dedication

This thesis is dedicated to my husband, Kevin, who taught me there is no substitution for hard work, to my dad, who taught me to live for today, forget yesterday, and not count on tomorrow, to my mom who showed me to see the world through the window of the soul and to my brothers and sisters.

Chapter 1

Introduction

1.1 Overview

If the “eyes are the windows to the soul” (William Shakespeare), then one of the logical steps to understanding human behavior and motivation should involve the study of eye gaze tracking. When someone’s eye movements are tracked, the path along which attention is deployed can be determined. Human vision is one of the most complex systems of the human body and is important to human survival. We move our eyes to shift our attention from one portion of the visible field to another. In doing so, we obtain higher resolution in whatever direction we direct the central point of our gaze. If we track someone’s eye movements, we can follow the path of their attention. Eye gaze tracking could facilitate human-machine interfaces by minimizing the need for complex and relatively slow motor movements of the hand and arm [8]. Eye movements can be used as an input command. The simplest solution is to use the eye-gaze tracker directly as a mouse. Ware and Mikaelian [9] observed that eye movement input is faster than other current inputs. They showed that when eye tracking devices are used by the disabled, a selection rate of almost 60 targets per minute can be obtained. Typically the user fixates the gaze for approximately 0.5 seconds for a target to be selected. Eye gaze trackers are a necessary selection tool for people with disabilities for whom eye movements may be the only body movement over which they have control. Analysis of eye movements has been used in a variety of applications such as the study of mood and attention [10], social signal processing [11], driving research, pilot training [12], 3D ocular ultrasound using gaze tracking [13], and marketing and advertising research [14]. Marketing and commercial uses could provide information on what sort of packaging may attract more attention or what aspects of commercials or marketing strategies are successful. Studies in eye detection and tracking mainly focus on two areas: eye localization in the image and gaze estimation. There are three aspects to eye gaze detection: the first is detection of the existence of the eye in the image,

the second is accurate estimation of eye positions in the image, and the third, for video images, is tracking the detected eye from frame to frame [15]. Eye gaze trackers (EGTs) are devices that are used to estimate the direction of a person’s gaze. Eye gaze tracking systems use a variety of eye movement recording techniques that involve the measurement of distinguishable features of the eyes under rotation or translation. These features include the shape of the pupil, the position of the limbus, and corneal reflections of closely placed infrared light sources. The pupil or limbus border, the position of the pupil center, corneal glint positions, and other characteristics of the eye are detected using image processing algorithms and are used to calculate the three-dimensional rotation angles of the eye [8]. The most important challenge facing eye trackers is calibration of the eye tracker for different individuals with different eye characteristics. Most high-accuracy gaze tracking systems require a calibration phase before the system is used. The calibration phase is based on asking the user to fixate on specific points on the screen for a specific period of time. Even though much research has been done to eliminate the calibration phase in gaze tracking systems, eye gaze tracking systems with a calibration phase still offer better accuracy [16]. Figure 1.1 shows an overview of eye gaze tracking based on a camera, infrared light sources, and 3D modeling.

1.2 Research Goal

The goal of this thesis is to introduce a novel one-time per-user calibration method to improve the accuracy and robustness of the existing system by Hennessey et al. [4][3]. The term robustness here refers to the system’s ability to adapt to different individual eye orientations and shapes as well as different user head positions with respect to the screen. Bohme et al. [17] showed in their simulator that the accuracy of the system decreases as the user’s gaze moves toward the corners of the screen. The accuracy of current gaze tracking systems is limited by the number of approximations used in calculating the user’s point of gaze. Two of the most common approximations used in these systems are the spherical corneal model and the use of population averages for the eye parameters. The goal of this research project is to test the hypothesis that using an aspherical corneal model instead of a spherical corneal model, and personalized eye parameters instead of population averages, will improve the gaze tracking accuracy over the method introduced by Hennessey [3].


Figure 1.1: Overview of a remote eye-gaze tracking system based on 3D modeling. The optical axis of the eye is a vector from the center of corneal curvature to the center of the pupil. The visual axis of the eye is a vector from the fovea to the point of regard on the screen.

1.3 Contributions of This Thesis

Different eye models have been developed to simplify the complex optics of the eye by averaging over the population. One such eye model was developed by Gullstrand [18]. Gullstrand used population averages for the different eye parameters and approximated the cornea as a sphere. The Gullstrand eye model is the most commonly used eye model in gaze trackers. Although simplified eye models are generally useful for understanding and working with the optical principles of the eye in gaze estimation, these simplifications contribute to inaccuracies in gaze estimation for eye gaze trackers. The cornea flattens as one moves from its apex toward its edge; therefore, the variation in corneal curvature is more pronounced when the user gazes toward the corners of the screen. The deviation of the assumed corneal shape and eye parameters from the actual values for each user is one of the major contributors to system error and inaccuracy in point of gaze estimation. The precision of remote eye-gaze tracking systems in estimating the point of gaze is one of the main factors determining their usability and applications.


Figure 1.2: The new calibration method provides personalized eye parameters (α and β: angular offsets between the visual and optical axes; Rc: radius of the corneal curvature; and Rp: the distance between the center of corneal curvature and the pupil center) as inputs for Hennessey’s gaze tracking system.

To overcome the above issues, this thesis presents a novel one-time user calibration based on an aspherical corneal model and personalized eye parameters. The contributions of this thesis are as follows:

• Design and implementation of a novel one-time calibration method based on an aspherical corneal model and personalized eye parameters

• The new calibration method has the advantages of simplicity and easy integration into any gaze tracking system that has a calibration phase

• Improvement in the overall accuracy of the gaze tracking systems by about 27%

• The best improvement in system accuracy was achieved around the edges of the screen

Figure 1.2 shows that, in this thesis, Hennessey’s gaze tracking system [3] uses the eye parameters estimated by the new calibration method developed here to estimate and track the user’s point of gaze (POG). The new calibration method may open the door to new applications that require these improvements. Improving the accuracy of the system at the corners of the screen provides a more stable point of gaze estimation, with similar accuracy across the screen, as well as a system more robust to the user’s head movements.

Chapter 2

The Eye and Eye Tracking Background

2.1 Eye Detection Methods

In eye detection it is important to account for the appearance of the eye, as it may change when the user views from different angles. Even small changes in viewing angle can cause significant changes in the appearance of the eye. However, as the eye rotates, the size and shape of the cornea remain relatively constant. Figure 2.1 illustrates different viewing angles versus the different eye shapes seen by the camera. As the viewing angle between the camera and the eye changes, the vertical and horizontal extents of the eye seen by the camera vary. The horizontal and vertical lengths are shown in Figure 2.1 as dotted green lines.

Eye detection methods can be divided into shape-based detection, appearance-based detection, hybrid, and other methods [15]. In shape-based methods, the detector is constructed either from local point features of the eye and face or from their contours. These methods use a prior model of the eye shape and its surrounding structures [15]. Ivins et al. [19] described a deformable model of the human iris to measure three-dimensional eye movements and the torsion associated with horizontal and vertical eye rotation.

Figure 2.1: The appearance of the eye as projected into the camera image may vary as the user changes the viewing angle or moves the head.

Hansen et al. [20] describe a method that uses active iris contour modeling based on image statistics, avoiding explicit feature detection. The appearance of the eye is as important as its shape. Appearance-based methods employ an image template: they detect and track the eye based on its photometric appearance, characterized by a color distribution or filter responses [20]. The appearance-based approach relies on template matching, constructing an image patch model and performing eye detection through model matching. Hybrid models use a combination of different eye models within one system, combining feature, shape, and appearance methods. The shape-based model is used to locate the eye, while eye features can be modeled using templates from the appearance-based method. The methods mentioned above suffer from the complexity of their calculations and feature extraction, as well as lower accuracy under head movement or changes in system parameters. Other methods are based on using active infrared light sources. Most active-light implementations use near-infrared light sources with a wavelength around 780–880 nm. If the light source is placed close to the axis of the camera (on-axis light source), the captured image shows a bright pupil, since most of the light incident on the eye is reflected back to the camera (bright pupil image). If the light source is placed off the axis of the camera (off-axis light source), the image shows a dark pupil. Methods using IR light are not particular to any eye model category. Eye models based on active IR illumination may use the difference between the dark and bright pupil images, obtained by switching between the on-axis and off-axis light sources; a sketch of this differencing step follows below. Hansen [15] pointed out that the major advantages of the image-difference methods are their robustness to background light changes, their simplicity, and their efficiency. Larger and faster head movements can, however, cause larger differences between the dark and bright pupil images. One method introduced to compensate for this effect is by Sugioka et al. [21], who proposed a system that allows large head movements by using a rotational camera system. An ultrasonic distance meter was used to measure the distance between the camera rotation center and the eye. They showed almost no distortion between the target and gaze position under both small and

large head movement conditions.
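To make the differencing step concrete, the following is a minimal sketch of dark/bright pupil segmentation using OpenCV. The function name, the threshold value, and the assumption that the two frames are registered single-channel images are illustrative choices of mine, not details from the systems cited above.

```python
import cv2
import numpy as np

def segment_pupil(bright: np.ndarray, dark: np.ndarray, thresh: int = 40):
    """Locate the pupil from a bright/dark pupil image pair.

    Assumes both frames are registered, single-channel uint8 images:
    `bright` taken with the on-axis IR source, `dark` with the off-axis one.
    Returns the pupil centroid (x, y) in image coordinates, or None.
    """
    # The pupil is the dominant region that brightens under on-axis light;
    # cv2.subtract saturates at zero, discarding regions that got darker.
    diff = cv2.subtract(bright, dark)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    # Keep the largest connected blob (label 0 is the background).
    n, _, stats, centroids = cv2.connectedComponentsWithStats(mask)
    if n < 2:
        return None
    largest = 1 + int(np.argmax(stats[1:, cv2.CC_STAT_AREA]))
    return tuple(centroids[largest])
```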

2.2 Review of Gaze Estimation Methods

Eye gaze trackers (EGTs) are devices that can estimate the direction of gaze in different individuals. Early EGTs were developed for scientific exploration in controlled laboratories. Eye gaze tracking systems have also been suggested as human-machine interfaces for individuals with high spinal cord injuries or neural disorders who cannot use standard user interface devices such as a mouse or a keyboard [22]. These systems obtain eye coordinates and convert them into mouse-pointer coordinates. A variety of disciplines use eye tracking systems, such as laser eye surgery, vehicle simulators, training simulators, advertising, package design, and automotive engineering. Eye gaze tracking could facilitate human-machine interfaces by minimizing the need for complex and relatively slow motor movements. Eye gaze tracking could be instrumental in aiding people with disabilities, in the rehabilitation of patients after traumatic accidents, and in motor vehicle accident avoidance. More commercial uses could provide information on what sort of packaging may attract more attention or what aspects of commercials or marketing strategies are successful. Duchowski [23] describes eye gaze tracking applications in two categories: diagnostic and interactive. Diagnostic applications use eye trackers to collect eye data as quantitative evidence of the user’s visual and attentional processes. In this case the data can be collected in a short experiment and analyzed later. Interactive applications use gaze data from the eye tracker to interact with the user by tracking and observing eye movements. Most interactive applications require real-time eye data processing to send feedback to the user while data collection is in progress. Frey et al. [24] developed a personalized computer connected to an eye gaze tracker (Erica); a disabled user can operate this system by looking at menu options displayed at different locations on the computer monitor to invoke commands. This system uses infrared light sources and a camera that tracks the reflection off the eye surface. The system determines where the user is looking by tracking and analyzing the image of the light reflected from the user’s eye. Jacob et al. developed a system that uses eye movements as an input by computing stable fixation points to point at and select objects on the screen. In this system various eye-movement computer interaction techniques were investigated, such as Eye-Controlled Scrolling


Text, to scroll down or up while a user reads text on the screen, or menu commands, using eye movement to browse and select from a pull-down menu [25]. Tomono et al. [26] introduced a real-time imaging system composed of 3 CCD cameras and 2 near-infrared (IR) light sources. The light sources have different wavelengths, and one of them is polarized. The polarized light source is placed near the camera optical axis to generate the bright pupil image, and the other light source is placed slightly off the axis to generate the dark pupil image. One of the CCD cameras (CCD3) is sensitive only to the polarized light source at a specific wavelength, and therefore it outputs only the bright pupil image. Another CCD camera (CCD1) has a polarizing filter to receive and collect only the diffuse light components. CCD1 and CCD2 are sensitive to the second (unpolarized) light source wavelength. The pupil is segmented by differencing and thresholding the images obtained from CCD3 and CCD2. The corneal reflection used for gaze estimation is obtained by using the images from CCD2 and CCD1. Eye-gaze trackers (EGTs) measure eye movement and eye position in space. Interest in the ability to measure eye movements and monitor a point of interest goes back more than 100 years [23]. There are two types of eye movement monitoring: one method measures the position of the eye relative to the head, and the other measures the orientation of the eye in space, or the point of regard. Eye movement measurement falls under one of the following categories: electrooculography (EOG), scleral contact lens, photo-oculography, or video-oculography. Video-oculography (VOG) is the most popular method of eye movement measurement. In these systems the pupil and corneal reflection, or the iris and corneal reflection, are combined. Non-invasive eye trackers rely on measurements of visible features of the eye, such as the pupil, the iris-sclera boundary, and the corneal reflections of closely positioned light sources. Today the most widely applied eye movement technique, primarily used for point of regard measurements, is based on corneal reflection. The first objective eye movement measurements based on corneal reflection were recorded in 1901 [8].

2.2.1 Electrooculography (EOG)

Electrooculography (EOG) was the eye movement measurement technique most widely used 40 years ago. It relies on electric potential differences of the skin, measured using electrodes placed around the eyes [8]. The technique is based on measuring the resting potential of the retina.


A potential difference of up to 1 mV between the cornea and the retina, with the cornea as the positive side, normally exists in the eye and is used as the basis of EOG [27]. The system is based on placing pairs of electrodes either above and below the eye or to the left and right of the eye. If the eye moves from the center toward one of the electrodes, that electrode sees the positive side of the retina and the opposite electrode sees the negative side. A potential difference therefore arises between the two electrodes and, assuming the resting potential is constant, this potential difference is a measure of the eye position. The recorded potentials are in the range of 15–200 µV, with a nominal sensitivity of 20 µV/deg of eye movement [8]. Since this technique measures eye movement relative to the head position, it is not suitable for point of gaze (POG) measurements unless the head position is also measured by other means.
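As a worked illustration of the figures above, the nominal 20 µV/deg sensitivity turns a differential electrode reading into an approximate rotation angle. This toy conversion is my own sketch, not an EOG system's actual calibration; in practice the sensitivity and resting potential drift, so each session must be re-calibrated against known targets.

```python
# Toy conversion from a differential EOG reading to a rotation angle,
# using the nominal 20 uV/deg sensitivity quoted above (signal range
# roughly 15-200 uV). Assumed figures, not a device calibration.
SENSITIVITY_UV_PER_DEG = 20.0

def eog_angle_deg(potential_uv: float, baseline_uv: float = 0.0) -> float:
    """Horizontal eye rotation (degrees) implied by an electrode-pair reading."""
    return (potential_uv - baseline_uv) / SENSITIVITY_UV_PER_DEG

# A 100 uV swing corresponds to roughly 5 degrees of eye rotation.
assert abs(eog_angle_deg(100.0) - 5.0) < 1e-9
```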

2.2.2 Scleral Contact Lens

One of the most accurate eye movement measurement methods is attaching a mechanical or optical sensor to a lens that sits on the cornea or sclera. A scleral lens is a large lens that rests on the sclera. Early recordings using contact lenses involved a plaster of Paris ring attached directly to the cornea and connected through mechanical linkages to a recording pen. Various mechanical or optical devices have been placed on such lenses; reflecting phosphors, line diagrams, and wire coils are the most popular. The wire coil measures eye movements through an electromagnetic field. Robinson [28] used a small coil embedded into a contact lens worn by the subject. The lens fits tightly over the sclera with slight suction to avoid drift during fast eye movements. Using an alternating magnetic field, the eye position is recorded from the voltage induced in the coil. Although this method is intrusive, it has very high accuracy (reported at about 0.08°).

2.2.3 Recent Eye-Tracking Systems

Most modern eye gaze-tracking (EGT) systems are based on tracking eye movement by digital processing of video images of the eyes. Video-based systems require high-resolution images of the eye to track and estimate the point of gaze (POG). This approach can use a wide variety of analysis techniques that involve the measurement of distinguishable features of the eyes under rotation or translation, such as the shape of the pupil, the position of the limbus, and the corneal reflections of nearby infrared light sources.


The position of the pupil, the corneal glint positions, and other characteristics of the eye are detected using image processing algorithms, which are used to calculate the three-dimensional rotation angles of the eye [8]. EGTs can utilize inexpensive cameras and image processing hardware to compute the point of gaze (POG) in real time. The limbus and pupil are the most common features used for tracking. The limbus is the boundary between the sclera and the iris. Because of the contrast between these two features, it can easily be tracked horizontally; however, because the eyelids cover part of the iris, limbus tracking techniques have low vertical accuracy. The pupil is harder to track because of the lower contrast between the iris and the pupil, but it yields better accuracy, since it is not covered by the eyelids and can be identified when illuminated by an infrared light source on the camera axis (on-axis light source), producing a ‘red-eye’ effect. To enhance the contrast between the eye features, many eye trackers use infrared (IR) light sources, because IR is not visible and does not distract the user or cause discomfort during tracking. Infrared-based gaze trackers are either head mounted or remote.

Head Mounted EGTs

In a head-mounted gaze tracking system, the system is mounted on the head of the user. The objective of the point of gaze estimation methodology is to calculate the intersection of the gaze vector with the observed screen. Head-mounted gaze tracking systems are the preferred choice for estimating the gaze vector when high accuracy and abrupt, free head movements are needed. In order to estimate the point of gaze (POG) with a head-mounted eye tracking system, the angular rotation of the eyes relative to the head, the position of the head relative to the scene, and the location of the observed screen are needed. The angular position of the eye relative to the head can be estimated by the head-mounted eye tracker. A single light source is used to generate a corneal glint, and a camera measures the eye motion using the glint reflected from the corneal surface. The 3D position of the head relative to the screen is measured by position sensors placed on the head-mounted eye tracker. The position of the screen is defined with respect to a fixed coordinate system. Using a Euclidean transformation between the coordinate systems fixed at the center of the eye and at the head, the intersection of the gaze vector with the screen is calculated [27], as sketched below. An example of a head-mounted VOG system is the EyeSeeCam, shown in Figure 2.2.
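Once the gaze ray and the screen plane are expressed in a common world frame, the final intersection step is a standard ray-plane computation. The function below is a minimal illustration under that assumption, with names of my own choosing, not code from the systems cited here.

```python
import numpy as np

def gaze_screen_intersection(eye_pos, gaze_dir, screen_point, screen_normal):
    """Intersect a gaze ray with the screen plane (all vectors in one
    world frame, i.e. after the head-to-world Euclidean transform).

    eye_pos:       3D position of the eye.
    gaze_dir:      unit gaze direction vector.
    screen_point:  any known point on the screen plane.
    screen_normal: unit normal of the screen plane.
    Returns the 3D point of gaze, or None if no forward intersection exists.
    """
    eye_pos = np.asarray(eye_pos, dtype=float)
    gaze_dir = np.asarray(gaze_dir, dtype=float)
    screen_normal = np.asarray(screen_normal, dtype=float)
    denom = screen_normal @ gaze_dir
    if abs(denom) < 1e-9:        # gaze ray parallel to the screen plane
        return None
    t = screen_normal @ (np.asarray(screen_point, dtype=float) - eye_pos) / denom
    if t < 0:                    # screen lies behind the eye
        return None
    return eye_pos + t * gaze_dir
```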


Figure 2.2: Image of EyeSeeCam, a head-mounted VOG [5]

Most head-mounted eye tracking systems include a scene camera that continuously monitors and records the observed scene based on the head position. Features extracted from the video of the scene are used to superimpose the eye position on the images of the scene, allowing real-time viewing of the subject’s point of gaze [29]. The first requirement for the eye tracker is to obtain inputs from the scene so it can calculate the point of gaze (POG) on the displayed scene viewed by the user. These inputs could be one or more detected eye features, such as the pupil perimeter, the corneal reflections, or the corneal-sclera boundary. The second requirement is proper interpretation of the point of gaze: based on the application and usage of the eye tracker, the host system needs to read and interpret the input information from the gaze tracker. One main issue with this type of system is the user’s discomfort in carrying the device for a period of time. Other issues are the user’s restricted mobility, due to the fact that the device is mounted on the user’s head, and the long setup time required [27].

Remote EGTs

Remote gaze tracking systems reduce user discomfort and setup complexity. Remote EGTs are easier to transport, install, and use.

Remote eye gaze tracking systems are composed of a CPU for data acquisition, a monitor or screen used to locate the focus of the subject’s eyes, an imaging camera, and infrared (IR) light sources. Video-based systems require high-resolution images of the eye to accurately estimate the point of gaze. Different algorithms and methods with accuracies better than 1° have been developed. Camera-based remote eye-gaze trackers rely on properties and features of the eyes that can be detected and tracked by a camera. One commonly used method in camera-based trackers uses infrared (IR) light sources positioned at known distances from the camera(s) to generate multiple glints on the cornea. The system is based on using the glint reflections generated on the surface of the cornea, detection of the pupil position in space, and a calibration process to estimate the user’s point of gaze. The estimated error reported in recent systems is about 1°. The cornea plays a major role in human optics. Eye gaze tracking systems use the surface of the cornea as a reflective surface: the glint positions generated by the light sources are used to estimate the center of corneal curvature. The eye model used in these systems is usually based on the Gullstrand model of the eye [2]. This model assumes a spherical reflective surface, i.e., a spherical cornea with radius of corneal curvature denoted rc. rc is approximated by the population average of the radius of curvature of the anterior corneal surface, which is about 7.8 mm [8]. However, in reality the cornea is not spherical; the geometry of the cornea is complicated. Looking at a vertical cross section of the cornea, as shown in Figure 2.3, the radius of curvature is shortest at the apex of the cornea; moving toward the edge of the corneal surface, the radius of curvature lengthens and the surface flattens just before meeting the sclera. In corneal keratometry, the cornea is modeled as a conicoid, mostly as an ellipsoid. In current gaze tracking systems, the distance between the center of corneal curvature and the pupil center of the eye, called rp, is also estimated from the population average using the Gullstrand eye model. Different eye models describe the optical characteristics of the human eye at different levels of complexity. Gullstrand’s eye model is one of the popular eye models used to demonstrate properties of eye gaze tracking systems. Table 2.1 shows the properties of the boundary surfaces in the light path from the cornea to the retina [1]. Two types of approaches are commonly used in camera-based gaze tracking systems for estimating the point of gaze: the pupil-corneal reflection (P-CR) method and the model-based method.


Figure 2.3: The cornea is not spherical. The radius of the corneal curvature is shortest at the apex of the cornea.

Table 2.1: Path of light ray using Gullstrand’s eye model based on a spherical eye and corneal model [1] [2]

Surface                          Position (mm)   Radius of Curvature (mm)   Refraction Index
Anterior surface of cornea       0               7.7                        1.376
Posterior surface of cornea      0.5             6.8                        1.336
Anterior surface of lens         3.2             5.33                       1.386
Anterior surface of core lens    3.87            2.655                      1.406
Posterior surface of core lens   6.528           -2.655                     1.385
Posterior surface of lens        7.2             -5.33                      1.336
Retina                           23.89           -11.5


Figure 2.4: The left image is a dark pupil image and the right image is a bright pupil image. As shown, in the dark pupil image the light reflections from the surface of the cornea (glints) are visible.

Pupil-Corneal Reflection Vector Method (P-CR) The P-CR method uses a single corneal reflection. The approximation used in this technique is that the corneal surface is a perfect spherical mirror. When the head is kept fixed, the glint remains stationary as the corneal surface rotates. The difference between the glint and the pupil center is used to estimate the gaze direction. A mapping from the pupil-glint difference vector to the screen is conducted [15].

Bright Pupil Effect: The bright pupil effect is used for pupil detection and tracking. The IR light sources illuminate the user’s eye and generate two kinds of pupil images, bright and dark, as shown in Figure 2.4. The difference between these two types of images depends on the location of the illumination source with respect to the optical axis of the camera. The bright pupil image is produced when the light source is placed coaxially with the camera’s optical path: the eye acts as a retro-reflector as the light reflects off the retina, creating the bright pupil effect, similar to the so-called red-eye effect in photography. If the light source is off the optical axis of the camera, the pupil appears dark, because the retro-reflection from the retina is directed away from the camera. In some P-CR methods the basic assumption is that the mapping from image features to gaze coordinates in 2D has a specific parametric form, such as a polynomial. This method uses a specific number of calibration points to compute the coefficients that minimize the mapping error, and the mapping can be evaluated in real time [30]. During the calibration phase the user is asked to gaze at known points on the screen. The P-CR vector V is then mapped to the user’s point of gaze (POG, U = (ux, uy)). The mapping is done through a second-order polynomial, shown in equations 2.1 and 2.2. The coefficients ai and bi are calculated during the calibration phase. The P-CR vector V = (vx, vy) is a vector from the on-axis corneal reflection to the center of the pupil in the recorded bright pupil image.


Figure 2.5: The P-CR vector is a vector from the corneal reflection to the center of the pupil. Through 2-D mapping in the calibration phase, the P-CR vector is related to the POG on the screen [3].

ux = a0 + a1·vx + a2·vy + a3·vx·vy + a4·vx² + a5·vy²    (2.1)

uy = b0 + b1·vx + b2·vy + b3·vx·vy + b4·vx² + b5·vy²    (2.2)

As shown in equations 2.1 and 2.2, a simple mapping is used to correlate the 2-D point of gaze on the screen (the POG vector) with the 2-D image vector from the corneal reflection to the center of the pupil. Figure 2.5 illustrates this mapping [3].
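Because equations 2.1 and 2.2 are linear in the unknown coefficients ai and bi, the calibration reduces to two ordinary least-squares problems. The following sketch is my own illustration of this standard fit, not code from [3] or [30]; it recovers the coefficients from N ≥ 6 calibration fixations using numpy.

```python
import numpy as np

def fit_pcr_mapping(v: np.ndarray, u: np.ndarray):
    """Fit the second-order P-CR mapping of equations 2.1 and 2.2.

    v: (N, 2) P-CR vectors (vx, vy) recorded while the user fixates targets.
    u: (N, 2) known on-screen target coordinates (ux, uy).
    Returns coefficient vectors (a, b), each of length 6; N >= 6 required.
    """
    vx, vy = v[:, 0], v[:, 1]
    # Design matrix [1, vx, vy, vx*vy, vx^2, vy^2]: the model is linear
    # in the unknown coefficients, so ordinary least squares suffices.
    A = np.column_stack([np.ones_like(vx), vx, vy, vx * vy, vx**2, vy**2])
    a, *_ = np.linalg.lstsq(A, u[:, 0], rcond=None)
    b, *_ = np.linalg.lstsq(A, u[:, 1], rcond=None)
    return a, b

def pcr_to_pog(a, b, vx: float, vy: float):
    """Evaluate equations 2.1 and 2.2 for a new P-CR vector."""
    feats = np.array([1.0, vx, vy, vx * vy, vx**2, vy**2])
    return float(feats @ a), float(feats @ b)
```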

Cherif et al. [31] proposed a P-CR vector method for eye gaze position measurement using infrared light sources and a single camera to measure the horizontal and vertical eye movements. The spatial mapping of the gaze position over the displayed image is done through a polynomial transformation of higher order. It was shown that as the polynomial order increases, the mean square error calculated for each object decreases.

Morimoto et al. [1] used a single CCD camera and two infrared (IR) light sources of a single wavelength, placed symmetrically around the camera’s optical axis, to generate concentric corneal reflections. They also used a single second-order polynomial for the x and y directions separately for mapping the glint-pupil difference vector to the point of regard. The glint-pupil difference vector is a vector from the center of the pupil to a glint on the surface of the cornea. The main constraint in this method is that the calibration mapping accuracy decays as the head moves away from its original position. Jacob [32] proposed a P-CR method to estimate the user’s point of gaze and used eye movement as a selection input signal. Instead of using physical input devices to perform the generic tasks of human-computer dialogue, the user’s point of gaze was used to choose one object among several displayed on the screen. If the center of rotation of the eye stays stationary, meaning the user’s head does not move, there is a one-to-one relation between the P-CR vector and the point of gaze on the screen. However, as the head moves, the P-CR vector no longer provides enough information to estimate the point of gaze using equations 2.1 and 2.2. For this reason, gaze tracking systems based on the P-CR model tend to lose accuracy under head movements and displacements. To overcome this issue, Hennessey et al. [33] proposed an improvement to the traditional P-CR vector technique: a system that tracks the corneal reflection pattern and improves the reliability of POG estimation by detecting the loss and distortion of corneal reflections when head and eye movements cause the reflections to move off the surface of the cornea. He claims the system accuracy improved by a factor of 2.8 over the traditional method at near distances of the user from the screen and by a factor of 1.8 at far distances, with average errors ranging from 2.5 cm down to 1.3 cm. Even though the system Hennessey [33] proposed offers a great improvement in the accuracy of the P-CR method, the main challenges of the P-CR method remain its limitations under head movement and displacement by the user, and its higher POG estimation error compared to model-based systems.

Model-Based Gaze Tracking Method

The principle used in model-based systems to estimate the point of gaze (POG) is based on 3D models of the camera and the eye. A pinhole camera model is used, and the eye is modeled using the Gullstrand schematic model [2]. One of the basic terms often used in eye gaze tracking systems is the line of gaze, or optical axis of the eye (OA), defined as the line connecting the center of the pupil and the center of corneal curvature.

The line of sight (LOS), or visual axis (VA) of the eye, is defined as the line that connects the point of regard to the fovea. The parameters used for geometric modeling of the eye can be divided into three groups: extrinsic, fixed eye-intrinsic, and variable [34]. The extrinsic parameters model the 3D eye position, the center of the eyeball, and the optical axis. The fixed eye-intrinsic parameters include the radius of corneal curvature rc, the angles between the visual and optical axes (horizontal offset α and vertical offset β), refraction parameters (e.g., the corneal refraction index nc), the iris radius, and the distance between the pupil center and the corneal center of curvature, rp. Parameters such as the radius of corneal curvature, the corneal refraction index nc, the distance between the corneal center of curvature and the pupil center, and the angle between the visual and optical axes are subject specific. The variable parameters change the shape of the eye model, such as the pupil radius. Ohno et al. [35] introduced a model-based approach using a single calibrated camera and a single light source (FreeGaze). The novelty of their system lies in using only two calibration points during the calibration phase. FreeGaze detects the gaze position in two steps: first the pupil and Purkinje images are detected in the captured image, then the gaze position is estimated using an eyeball model. The pupil is detected using bright and dark pupil images. After the pupil is detected, ellipse fitting is performed to estimate the perimeter of the pupil. The Purkinje image is detected by searching for the glint nearest to the pupil. The eyeball model has two personalized parameters: rc (the radius of corneal curvature) and rp (the distance between the pupil and the center of corneal curvature). When light falls on the curved surface of the cornea, some of it is reflected back in a narrow ray; the first Purkinje image, or corneal reflection, is often referred to as the glint. The position vector c of the center of corneal curvature is calculated in the camera coordinate system by equation 2.3:

c = u + rc · u/‖u‖    (2.3)

where u is the position of the Purkinje image, ‖u‖ is the norm of u, and rc is the radius of corneal curvature.
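In code, equation 2.3 is a one-liner. The sketch below assumes u is the 3D-reconstructed glint position in the camera frame (in metres) and uses the 7.8 mm population-average radius quoted earlier as a default.

```python
import numpy as np

def corneal_center(u, r_c: float = 7.8e-3) -> np.ndarray:
    """Equation 2.3: center of corneal curvature in the camera frame.

    u:   3D position of the Purkinje image (glint) in camera coordinates (m).
    r_c: radius of corneal curvature; 7.8 mm is the population average
         quoted earlier, which this thesis later replaces per user.
    """
    u = np.asarray(u, dtype=float)
    # The corneal center lies r_c further along the camera-to-glint ray.
    return u + r_c * u / np.linalg.norm(u)
```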

FreeGaze achieved an accuracy of better than approximately 1°; however, the system proposed by Ohno et al. [35] had a relatively small field of view (4x4 cm at 60 cm depth), and it used population averages for the radius of corneal curvature rc and the distance between the pupil center and the corneal center of curvature rp, as well as a constant corneal refraction index nc, for all users.

Shih and Liu [36] proposed a method to estimate the 3D point of gaze using a stereo camera and 3-D computer vision techniques. They introduced a system using the simplified eye model proposed by Le Grand [37], based on a spherical corneal model. Their system consisted of two cameras and two light sources located at known positions. The geometric relation and physical positions of the cameras and light sources were calibrated and known. They also proposed a method to estimate the angular offset between the calculated optical axis (OA) and the user’s visual axis (VA) to estimate the point of gaze. The angular offset between the OA and VA was assumed to be constant for each user, and the center of the eyeball and the corneal center of curvature were assumed to coincide. The intersection of the VA with the screen provided the user’s point of gaze. The accuracy of their system was claimed to be under 1°.

Hennessey et al. [4] developed a system with a single high-resolution camera and multiple light sources. Multiple glints on the cornea were used to allow free head movement within the camera’s field of view. This system estimated the POG based on 3D models of the camera and the eye. In this system, the radius of corneal curvature rc, the distance between the corneal center of curvature and the pupil center rp, and the refractive index of the cornea nc were used as constants (population averages). The accuracy of his system was estimated at about 1°.

The radius of corneal curvature is shortest in the central portion and progressively increases as one moves toward the corners of the eye. This is called peripheral flattening. Since the cornea is an aspherical surface and its radius is not constant, there is no unique center of curvature. Several factors contribute to the errors and inaccuracies of current 3D modeling methods, some of which are as follows:

• The fovea is modeled as a point in these systems; however, the actual fovea is a small region on the retina (about 0.4 mm, or 1°, in diameter) [15].

• The angle between the line of gaze and the visual axis of the eye may vary from one fixation point to the next because of the size of the fovea; however, it is usually modeled as a constant value [15].

• A spherical model of the cornea is an acceptable assumption only for the small central part of the cornea; it is not sufficient toward the periphery [15], since the actual geometrical shape of the cornea is better modeled as a conicoid such as an ellipsoid.

• The distance between the corneal center of curvature and the pupil center is assumed to be the same for all individuals; however, it varies between users.

Nagamatsu et al. [7] proposed a novel gaze estimation model based on an aspherical model of the cornea. They modeled the cornea as a surface of revolution about the optical axis of the eye. In this system they estimated several personalized eye parameters: the radius of corneal curvature rc, the distance between the center of corneal curvature and the center of the pupil rp, and the angular offset between the visual and optical axes. The accuracy of their system, compared to their earlier system based on a spherical corneal model, was improved, especially at the corners of the screen. In the next chapter more details of remote, non-contact gaze estimation methods will be reviewed.

2.3 Basic Optics of the Eye

The eye is an organic structure that is susceptible to environmental influences during its development and is changed by age and disease. Yet it is still an optical instrument that can be understood in terms of the physical sciences that deal with optics and light. The human eye consists mainly of six regions: the cornea, aqueous humor, iris, lens, vitreous humor, and sclera. Figure 2.6 shows a cross-sectional view of the human eye. The optical media of the eye are the cornea, the aqueous humor, the crystalline lens, and the vitreous humor. The aqueous humor is the fluid between the cornea and the crystalline lens. The iris of the eye is located anterior to the crystalline lens and is responsible for eye color as well as for varying the diameter of the pupil, which ranges between 2 and 8 mm [2]. Light energy coming from an object into the eye is transduced into neuronal energy by a layer of photoreceptor cells in the retina. The retina is the light-sensitive layer and is the region where light energy is transformed into neural signals. The retina consists of a thin layer of neural cells that lines the back of the eye and is attached to the optic nerve. The retina is placed at the rear interior surface of the eye.


Figure 2.6: Human eye cross-sectional view

The retina contains receptors sensitive to light (photoreceptors) that constitute the first step of vision. The retina is often described as consisting of two regions: periphery and central. The periphery is designated for detecting gross forms of motion, whereas the central area is used for visual acuity. In terms of area, the periphery makes up most of the retina and consists mostly of rods. The central retina is rich in cones and occupies a small portion of the entire retina [38]. Photoreceptors can be thought of as transducers that convert light energy to electrical impulses (neural signals). The ability of the retina to detect light and detail in projected images varies across its surface. This is due to differences across the retina in the types of photoreceptors and their densities. The photosensitive cells of the retina are rods and cones. Rods are more sensitive and are used most in low-illumination conditions; rods are associated with the mechanism of twilight vision. The fovea, as a part of the retina, is the most capable of processing fine detail. The diameter of the fovea centralis is about 0.4 mm, or about 1.3°. The middle part of the fovea centralis is called the foveola (about 0.2 mm in diameter). The foveola has the highest visual acuity in the eye [2]. The fovea centralis is slightly displaced from the intersection of the optical axis of the eye and the retina. This displacement varies between individuals. The fovea holds 100,000 cones, and the macula, or

the central retina, which is 5000 µm in diameter, contains 650,000 cones. One degree of visual angle corresponds to approximately 300 µm on the human retina. The diameter of the eye from the anterior surface of the cornea to the retina varies from one individual to another, but it is usually around 24 mm. Optically, the eye consists of a series of refractive surfaces defined by transitions between air, fluid, and solid tissue. The cornea is the most important element in the optics of the eye. The anterior surface of the cornea is convex and its posterior surface is concave. The large change of refractive index between air and the anterior cornea gives the anterior corneal surface a large positive power, which gives the cornea a net converging power. This characteristic makes the cornea the most powerful optical element in the eye. The distance between the apex of the cornea and the center of rotation of the eye is approximately 14.5 mm. The radius of the cornea is approximated through population averages as about 7.98 mm. The center of rotation of the eye and the center of curvature of the cornea do not coincide; therefore, the angle at which a stationary source of light is reflected in the cornea changes during a movement of the eye, so the corneal reflex moves when the eye moves.

2.3.1 The Cornea

The cornea is the anterior surface of the eye and is the major refractive element of the human eye. The cornea contributes about two-thirds of the total optical power. This is mainly due to the great difference between the refractive index of air and the average refractive index of the cornea (nc = 1.377). The central corneal radius of curvature is normally measured by Keratometry. The range of corneal radii in normal eyes is estimated to be about 6.7 to 9.4 mm, with a mean value of 7.8 mm. The cornea has two primary functions: refracting and transmitting light. The factors that affect the amount of corneal refraction are as follows:

• The curvature of the anterior corneal surface

• The change in refractive index from air to cornea (actually the tear film)

• Corneal thickness

• The curvature of the posterior corneal surface


• The change in refractive index from the cornea to the aqueous humor

Due to the nature of the cornea and its role in the visual system, the cornea must remain transparent to refract light properly. The presence of even the tiniest blood vessels can interfere with this process.

Asphericity of the Corneal Surface

A sphere is a single surface with a single, constant radius from a single center of curvature. Any section through the center of curvature of a spherical surface is always a circle. An aspherical surface is simply a surface that does not fit this description and is not spherical. The radius of an aspherical surface is constantly changing; therefore, there is no single radius of curvature. The radius of curvature of a curve at a point is a measure of the radius of the circular arc at that point. In most corneas the radius of curvature is shortest in the central portion and progressively increases toward the periphery. This is called peripheral flattening. Corneas have very complex shapes, and there is no single mathematical shape that approximates them exactly. Different studies about the shape and form of the cornea have been performed [39][40][41][42][43]. Among those studies, the corneal shape was fitted by a conicoid equation along different corneal meridians. A corneal meridian is a line that bisects the cornea through its apex. The conicoid that best fits the cornea is an ellipse, with the asphericity of the cornea not varying between corneal meridians, although the radius of curvature of the cornea does vary between meridians. The optical axis of the cornea is aligned with the major axis of the ellipse. An ellipse profile follows the corneal shape closely: the central portion of an ellipse is more curved, and it flattens toward the periphery. Szczesna et al. [44] describe the cornea as an elongated ellipsoid without rotational symmetry. Figure 2.7 shows the corneal profile and the coordinate system used to define the corneal geometry as an ellipsoid.

z(x, y) = \frac{R_x}{1 - \varepsilon_x^2}\left(1 - \sqrt{1 - (1 - \varepsilon_x^2)\left(\frac{x^2}{R_x^2} + \frac{y^2}{R_x R_y}\right)}\right) \quad (2.4)

The large axis of the ellipsoid coincides with the axis of the cornea and lies along the z-axis. Rx and Ry are the average values of the central radius of curvature in the (xz) and (yz) planes, and εx is the eccentricity in the (xz) plane.


Figure 2.7: The cornea is modeled as an ellipsoid whose outer limit corresponds to the limbus. The approximate eccentricity and the radius of curvature at the apex vary between individuals. Rc is the radius of corneal curvature, where R_c = \sqrt{R_x^2 + R_y^2}.


Figure 2.8: The sagittal depth is the distance from the flat (bottom) plane at a given diameter to the apex of the cornea.

Eccentricity is a parameter used with every conic curve. It is a measure of how much a conic curve deviates from a circular shape. Studies have shown that the average radius of corneal curvature at the apex of the cornea is about 7.98 mm, and the value of eccentricity in healthy eyes is about 0.51. In order to consider rotationally symmetric corneas, the central radii of corneal curvature Rx and Ry are assumed equal, which simplifies the equation in the (xz) plane. In another paper [6] the cornea was modeled as a conic section that is defined by a shape factor P.

P = \frac{2 r_c x - y^2}{x^2} \quad (2.5)

where rc is the radius of curvature at the apex, y is the length of the semi-meridian from the x-axis to the surface of the conic section, and x is the sagittal depth. The sagittal depth of the cornea is the measurement from the flat plane at a given diameter to the highest point of the concave surface (the cornea), as shown in Figure 2.8. Now, for consistency with Figure 2.7 and later equations, if we place the conic section on the z-axis so that it is symmetric with respect to the z-axis (see Figure 2.9) and change equation 2.5 by substituting z for x, equation 2.5 becomes equation 2.6.

y^2 = 2 r_c z - P z^2 \quad (2.6)

In equation 2.6, y is the distance from any point on the surface to the z-axis, z is the distance from the vertex plane to the position of y, rc is the central radius of curvature, and P is the shape factor. The shape factor is defined by


Figure 2.9: Shape factor for an aspheric surface, modified from [6].

P = 1 − e^2, where e is the eccentricity of the conic shape. Figure 2.10 displays the family of curves defined by the shape factor P.

Since the cornea is an aspheric surface and its radius is not constant, there is no unique center of curvature. Most normal corneas can be approximated as an ellipse that flattens from the vertex to the periphery. The centers of curvature for points away from the corneal vertex do not remain on the axis of symmetry of the cornea. The center of curvature lying on the axis of the cornea is the same only for points on the corneal surface that are equidistant from the axis of symmetry [6].
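To make equation 2.6 concrete, the following sketch computes the sagittal depth z of a conic corneal profile at a given semi-meridian distance y. This is an illustrative implementation only; the default values rc = 7.98 mm and P = 0.74 (from P = 1 − e^2 with e ≈ 0.51) are the population averages quoted above, not measured values.

    import math

    def sagittal_depth(y, rc=7.98, p=0.74):
        """Depth z of a conic corneal profile at semi-meridian distance y,
        from y^2 = 2*rc*z - p*z^2 (equation 2.6), taking the root nearest
        the apex. Default rc and p are illustrative population averages."""
        if p == 0.0:                       # parabola: y^2 = 2*rc*z
            return y * y / (2.0 * rc)
        disc = rc * rc - p * y * y         # discriminant of p*z^2 - 2*rc*z + y^2 = 0
        if disc < 0.0:
            raise ValueError("point lies beyond the conic surface")
        return (rc - math.sqrt(disc)) / p  # smaller root = anterior surface

For example, sagittal_depth(4.0) evaluates to about 1.05 mm of depth at a 4 mm semi-meridian, consistent with a peripherally flattening ellipse.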

2.3.2 Axes and Angle of the Eye

The human eye is not a centered optical system; the optical axes of the cornea and lens are not coincident. However, we can define an approximation of the optical axis of the eye as the straight line joining the center of corneal curvature to the center of the pupil: a best-fit straight line that passes closest to the centers of curvature. The optical axis of the eye is also considered the symmetry axis of the individual's eye. Since the eye is not a precision optical system, it is useful to define axes in addition to the optical axis and also to


Figure 2.10: Family of conic sections with different shape factor (P) values, modified from [6]

define various angles formed at their intersections. The line of sight (LOS) or visual axis (VA) is defined as the line that connects the object of regard to the fovea. When we look at an object, the eye orients itself in such a way that the observed object is projected onto the fovea. The basic imaging properties such as image size, location, and orientation are determined by the locations of the cardinal points. The cardinal points are the focal points and the nodal points. Nodal points are located on the optical axis of the lens. The front and rear nodal points have the property that if a ray is aimed at one of them, it will be refracted by the lens in such a way that it appears to have come from the other nodal point at the same angle with respect to the optical axis. The line that connects the fovea to the object we are looking at, crossing the nodal points of the eye, is approximated as the visual axis of the eye. This assumes the visual axis goes from the fovea through the center of the corneal sphere. The fovea is slightly displaced from the back pole of the eyeball, which is the reason the visual axis and the optical axis of the eye do not overlap. It is estimated that the angular offset between the optical and visual axes of the eye is about 5° ± 1° horizontally in the nasal direction and 2°-3° in the vertical direction. It should be noted that there is considerable variation between individuals. In eye-gaze tracking systems these angular offsets between the axes play an important role in system calibration.

Eye movements

All main natural eye movements are used to reposition the visual axis of the eye on the fovea. Even when a user looks at a stationary object, the eye moves during perception. In general the eye moves with six degrees of freedom: three translations within the eye's socket and three rotations. Saccades are rapid eye movements that are used to reposition the visual axis of the eye on the fovea as the point of gaze moves. Saccadic movements are both voluntary and reflexive. They range in duration from 10 ms to 100 ms, which is a short duration. Duchowski [8] describes saccadic movements as stereotyped and ballistic. The term stereotyped refers to the observation that some movement patterns can be executed repeatedly. The term ballistic refers to the fact that saccade destinations are pre-programmed: the saccadic movement to the next fixation position has already been calculated and cannot be altered. One reason for this is that saccade execution is quite fast, leaving insufficient time for visual feedback to

guide the eye to its final position. Microsaccades are small saccadic movements that occur during fixations. Microsaccades are made due to the motion sensitivity of the visual system's single-cell physiology. Microsaccades are random eye-movement signals that vary over 1 to 2 minutes of arc in amplitude. Fixations occur when the gaze rests for a minimum amount of time on a small predefined area. Fixations are eye movements that stabilize the retina over a stationary object of interest. Fixations last between 200 ms and 600 ms, and the image formed on the retina changes constantly because the microsaccades of the eyes are performed involuntarily. The small eye movements during fixation are required to continuously refresh the neural sensors of the eye. Drift is an irregular and relatively slow movement of the axis of the eye while the user gazes at an object and the fixation point for each eye remains on the fovea. Microsaccades increase when the duration of fixation on a particular point exceeds a certain length of time (0.3-0.5 s) or when, because of drift, the image of the point of fixation moves too far from the center of the fovea [38].

2.3.3 Light and the Eye

When light strikes a refracting surface, a small portion of the incident light is reflected. The images of a light source reflected from the surfaces of the eye are known as Purkinje images. Purkinje images have a number of clinical and research uses, such as measurement of the curvature of the refracting surfaces of the eye (Keratometry and Photokeratometry) and location of the axes of the eye. Purkinje images are designated according to the order of the surface from which the external light source is reflected. In gaze tracking systems based on IR corneal reflection, we are only concerned with Purkinje image I, which is formed by reflection from the anterior surface of the cornea. More details about the Purkinje images are covered in the appendix.

2.3.4 Ocular Biometry

Unique characteristics of the eye's anatomy, such as the reflective surface of the cornea, the transparent convex lens, and the flexible pupil, enable us to measure the optical components of the eye. The common methods used to measure corneal curvature, as well as to assess the corneal contour and profile, are Keratometry and Keratoscopy. Keratometry is a technique used to measure the radius of curvature of

the anterior surface of the cornea. Keratometry has a variety of clinical uses, such as fitting contact lenses, and it can also be used to measure corneal astigmatism. The basic components of a keratometer are (a) an object to be reflected from the cornea, (b) a lens system to give the examiner a magnified view of the reflected image, (c) a system to keep the reflected image in focus, and (d) a system to measure image size. In Keratometry one common assumption is that the cornea is spherical, and based on that assumption the radius of corneal curvature is calculated. Keratoscopy is a method that is mostly used to assess the curvature and topography of the anterior surface of the cornea. As mentioned, Keratometry only measures the radius of corneal curvature assuming the cornea is a sphere; Keratoscopy, however, can evaluate almost the entire cornea as well as its asphericity. The clinical uses of Keratoscopy include the fitting of contact lenses as well as monitoring changes of the anterior corneal surface due to injury or anterior segment surgery.

Chapter 3

Hennessey’s POG Tracking Method Used Here

3.1 Background

Gaze trackers are devices used in many applications. One popular application of gaze tracking systems is as a computer interface for individuals with severe motor disabilities. Adjouadi et al. [45] described a real-time human-computer interface system that allows individuals with severe motor disabilities to interact with computers using only eye movements. The user can perform web browsing or execute simple PC commands. Most gaze tracking systems using the P-CR vector method require users to keep their head still or maintain limited motion. In the P-CR method the calibration mapping deteriorates as the head moves away from its original position, and the error is also not uniform across the screen [1]. Jacob et al. [25] tried to solve this issue by giving the user an option to perform manual re-calibration by moving a cursor to the area that needs re-calibration and clicking on it while looking at the cursor. However, the accuracy of the system still degrades as the user moves his/her head or if the system parameters change after the calibration phase. These limitations make P-CR based systems unsatisfactory for interactive applications. To overcome the head movement limitations and complex calibration of such gaze tracking systems, 3D gaze tracking models were developed. The ability to estimate a point of gaze in 3D is becoming important as three-dimensional displays are being introduced to the market to replace standard 2D displays. Kwon et al. [46] proposed a system to estimate the 3D POG in a virtual 3D environment. This method uses a P-CR vector method along with the displacement of the binocular pupil centers to estimate the depth of the gaze in a 3D environment. The limitation of this method is the requirement of a fixed head position.
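The calibration mapping that the P-CR systems discussed above rely on can be sketched as a second-order polynomial fit from pupil-corneal-reflection vectors to screen coordinates. This is a generic illustration of the P-CR approach, not the model-based method used in this thesis; the function names and the choice of polynomial terms are ours.

    import numpy as np

    def fit_pcr_mapping(pcr, screen):
        """Fit a second-order polynomial mapping from P-CR vectors to
        screen coordinates. pcr: (N, 2) calibration P-CR vectors;
        screen: (N, 2) target positions. Returns a coefficient matrix C
        such that screen ~= design(pcr) @ C."""
        x, y = pcr[:, 0], pcr[:, 1]
        design = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
        coeffs, *_ = np.linalg.lstsq(design, screen, rcond=None)
        return coeffs

    def apply_pcr_mapping(coeffs, v):
        """Map a single P-CR vector v = (x, y) to a screen coordinate."""
        x, y = v
        return np.array([1.0, x, y, x * y, x**2, y**2]) @ coeffs

Because the fitted coefficients are tied to the head position at calibration time, any subsequent head movement invalidates the mapping, which is exactly the limitation noted above.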

In 3D gaze tracking systems, the two features commonly used to compute the point of gaze are the pupil and corneal reflections, or glints. The front surface of the cornea is not a perfect optical surface; however, it can be approximated as a spherical convex mirror. The positions of corneal reflections, commonly seen as highlights in the eye, are a function of eye position and orientation in 3D space. The brightest reflection comes from the front surface of the cornea [27]. For this reason, using the first Purkinje image provides a better and more robust glint position and subsequent tracking. The pupil can be distinguished from the surrounding iris by its difference in reflection. The pupil appears much darker than its surroundings if it is illuminated by light sources that are not parallel to the optical axis of the eye. Using light sources parallel to the axis of the camera, a bright pupil image is created. In this case the pupil looks brighter because most of the light enters the eye along the optical axis and is therefore reflected back from the retina. The pupil circumference is extracted using bright and dark pupil imaging. In this thesis, Hennessey's gaze tracking algorithm [3], based on a single camera and multiple light sources, was used to estimate the user's point of gaze (POG). The next section presents mathematical models for a 3D eye-gaze tracking system. In these models, the gaze estimation method is based on using the pupil and one or more corneal reflections extracted from images captured by a camera.

3.1.1 Eye Model

One of the main components of gaze-tracking systems is eye detection. In eye detection it is important to identify a model of the eye in which, for simplicity, the large variability in the appearance and dynamics of the eye is not taken into account. A relatively small change in the angle of view can cause changes in eye appearance. Even though there is much research in eye detection and tracking, eye detection is still a very challenging task due to many issues, such as occlusion of the eyes by the eyelids, different shapes of eyes between different ethnic groups, the degree of openness of the eye, and the variety of eye sizes [15]. The eye is a complex optical structure. When light hits the surface of the eye, it is refracted several times as it travels through the eye before it hits the retina. The refraction occurs at each optical interface, where there is a change in the index of refraction from one medium to another. For example, light travels through the air and hits the front surface of the cornea (air-cornea interface), the back surface of the cornea (cornea-aqueous humor interface), and the front and back surfaces of the lens (aqueous humor-lens


Figure 3.1: Nodal points of the eye, N1 = first nodal point, N2 = second nodal point. Nodal points are points on the optical axis of the eye. If a ray is directed toward one of them, it is refracted by the eye's lens in a direction toward the other one, as if the ray came from the other nodal point. The nodal axis is a line from the point of regard to the first nodal point.

interface and lens-vitreous humor interface). The cornea is one of the most important features of the eye. It is a powerful refractive surface that accounts for 60%-75% of the refractive power of the eye [8]. The anterior surface of the cornea is convex and the posterior surface is concave; therefore, the inner surface of the cornea has a shorter radius. This property of the cornea redirects the light that hits the surface of the eye into a converging beam toward the crystalline lens. The eye lens has the ability to change its shape and create a focused image on the retina. Before the light reaches the crystalline lens, it has to pass through the pupil. The pupil acts as a diaphragm that controls the amount of light that reaches the retina. The human eye is not a centered optical system, and thus the optical axes of the cornea and lens are not coincident. This also means that the centers of their curvature do not fall on one straight line [2]. The eye has two nodal points. The nodal points are points on the optical axis of the eye such that if a ray is directed toward one of them, the ray will emerge on the other side of the optical axis parallel to its entering direction, appearing to come from the other nodal point (Figure 3.1). In the eye's optical system, if a ray is directed toward the first nodal point (N1) at an angle θ with respect to the optical axis, it will hit the retina coming from the second nodal point (N2) at an angle equal to θ with the optical axis of the eye. The nodal points N1 and N2 are located near the posterior surface of the crystalline lens. The distance between N1 and N2

in the Gullstrand model is about 0.25 mm [2]. Since the two nodal points of the eye are fairly close, a single nodal point is shown to simplify eye modeling. The angle α is the angle formed at the first nodal point by the intersection of the optical axis of the eye with a line from the point of regard to the first nodal point (the nodal axis).

3.1.2 Corneal Model

The cornea has a very complex shape, so it is important to fit the cornea with the nearest mathematical conicoid. Most corneas are better fitted by an ellipse than by a circle, with the optical axis of the cornea aligned with the major axis of the ellipse. This orientation means that the ellipse is more curved in the central portion and less curved in the periphery. Figure 2.8 illustrates this. The simplest model of the corneal surface is the spherical model, which has been used as a general model in gaze estimation techniques by many authors. Young et al. [27] described the cornea approximately as a spherical section, which can be treated as a convex mirror. An object reflected from this surface forms a virtual image behind the surface of the cornea.

Spherical Corneal Model

The simplest model of the corneal surface is the approximation of the cornea as a sphere with radius rc and a center located at c (the center of curvature of the cornea). Different eye models have been developed to simplify the complex optics of the eye. One of the most important eye models was developed by Gullstrand [18]. Gullstrand used population averages for different eye parameters and modeled the eye as a sphere. The radius of corneal curvature was approximated using the Gullstrand schematic to be 7.98 mm, and the radius of the eye was approximated to be 12 mm [2]. The center of corneal curvature is displaced approximately 3-5 mm relative to the center of rotation of the eye [38]. The center of rotation of the eye and the center of curvature of the cornea do not coincide; therefore, the angle at which a stationary light source is reflected back from the corneal surface varies as the eye moves or rotates. For each glint on the corneal surface, a coordinate system is defined at that point to estimate the corneal center of curvature. This coordinate system is called the glint coordinate system and is explained in section 3.4.3. Using the law of reflection on a spherical surface, a light ray hits the corneal

surface at the same angle as it reflects back from the cornea with respect to the normal to the corneal surface. The normal to the spherical corneal surface passes through the center of curvature of the cornea. For simplicity in calculations, the center of the eye and the center of corneal curvature are assumed to be at a single point in this model.

Aspherical Corneal Model

The cornea is the anterior optical medium of the human eye. The anterior surface of the cornea is convex and the posterior surface is concave. The cornea plays an important role in focusing images on the retina: almost two-thirds of the refractive power of the eye is produced by the cornea [2]. Most current model-based approaches estimate gaze direction by assuming spherical eyeball and corneal surfaces. This assumption simplifies gaze estimation; however, the spherical model is not suitable for modeling the boundary area of the cornea. The inaccuracy in gaze estimation increases as the user looks away from the optical axis of the camera toward the corners of the screen, or as the glints on the corneal surface move to the boundary, non-spherical region of the cornea [15]. Different eye models have been developed to simplify the complex optics of the eye and the variation of its characteristics between individuals. Burek et al. [47] introduced different mathematical models of the general corneal surface, incorporating the characteristics of the cornea into different mathematical corneal models. A reasonable assumption is that the corneal surface conforms to a moderately simple geometrical surface such as a sphere; however, this is an inadequate model since it does not consider the asphericity of the cornea. In the ellipsoidal model, the surface of the cornea is modeled as a section of an ellipsoid in which one of its axes is parallel to the optical axis of the eye. Baker's equation generates a family of conic sections, as described in equation 3.1. The curve is symmetric about the positive z-axis. If we model the cornea as a conic curve, the positive z-axis is coincident with the optical axis of the eye.

y^2 = 2 r_c z - p z^2 \quad (3.1)

r_c = \frac{b^2}{a}, \qquad p = \frac{b^2}{a^2}


P value      Conic curve
P > 1        ellipse with minor axis along z-axis
P = 1        circle
0 < P < 1    ellipse with major axis along z-axis
P = 0        parabola
P < 0        hyperbola

Table 3.1: Different conic curves defined by Baker's equation according to their p-values
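Table 3.1 translates directly into a small classification routine; a minimal sketch (function name ours):

    def classify_conic(p):
        """Classify a conic curve from Baker's equation by its p-value
        (Table 3.1)."""
        if p > 1:
            return "ellipse with minor axis along z-axis"
        if p == 1:
            return "circle"
        if p > 0:
            return "ellipse with major axis along z-axis"
        if p == 0:
            return "parabola"
        return "hyperbola"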

In equation 3.1, rc is the radius of corneal curvature at the apex, and a and b are the semi-major and semi-minor axes of the conic curve, respectively. The p-value determines the shape of the curve: all conics with the same p-value have the same shape, irrespective of their size. Table 3.1 shows the full range of conics defined by Baker's equation according to their p-values. Nagamatsu et al. [7] proposed a novel method based on an aspherical model of the cornea that requires two cameras. In this system two cameras (C1 and C2) and a light source attached to each camera (L1 and L2) are used. The corneal curvature rc, the distance between the center of corneal curvature and the pupil rp, and the offset between the optical and visual axes are personalized through the calibration process. For the purpose of obtaining an estimate of rc during the calibration, a spherical model of the eye was assumed, since the measurements were made in the vicinity of the optical axis of the eye. As shown in Figure 3.2, the light sources L1 and L2 are coaxial with the cameras C1 and C2. The center of corneal curvature A is estimated through the intersection of the two lines described in equations 3.2 and 3.3.

X_1 = C_1 + t(C_1 - P_1') \quad (3.2)
X_2 = C_2 + t(C_2 - P_2') \quad (3.3)

Through the calibration process, the distance between the pupil center (B) and the corneal center of curvature (A), K = ||B − A||, as well as the angular offset between the visual and optical axes, are estimated. The novelty of their model lies in estimating the visual axis of the user after calibration by searching for the center of corneal curvature; the search ends when K = ||B − A||. A ray from the light source L1 is reflected back from the corneal surface to the nodal point of the camera C1 and creates a glint P1' on the image plane of the camera. Similarly, a ray from light source L2 creates a glint P2 on the


Figure 3.2: Cross section of the eyeball showing the center of corneal curvature along with the cameras' nodal points (Ci, Cj) and the positions of the light sources, modified from [7]

surface of the cornea and a glint P2' on the image plane of the camera C2. A ray from L1 is reflected at a point P12 on the corneal surface such that it goes through the camera center C2 and creates a point P12' on the camera C2 image plane. Similarly, a ray from L2 creates a point P21 on the corneal surface, and P21' on the camera C1 image plane. In order to estimate the radius of corneal curvature, the reflection points are assumed equal: P12 = P21.

The radius of corneal curvature rc is determined as rc = ||Pi − A||. In this system, by using the aspherical model of the cornea and by personalizing the distance between the pupil center and the corneal center of curvature as well as the radius of corneal curvature, the accuracy of Nagamatsu's system is improved compared to their earlier model, which used a spherical corneal model with population averages for the radius of corneal curvature and the distance between the pupil center and the corneal center.


Figure 3.3: The camera and world coordinate systems. A ray traced from multiple light sources to the surface of the cornea reflects back to the camera sensor.

3.2 Coordinate Systems

In order to develop an algorithm to estimate the angular offset between the visual and optical axes, different coordinate systems are defined. The coordinate systems used here are right-handed Cartesian coordinate systems. The first coordinate system is the world coordinate system (WCS), with its origin at the lower-left corner of the display. The Xw axis points to the right in the horizontal direction, the Yw axis points up in the vertical direction, and the Zw axis points out perpendicular to the display. Figure 3.3 illustrates the coordinate systems with respect to each other. The other coordinate systems used in this thesis are the camera coordinate system and the image coordinate

system, both of which are covered in section 3.2.1.

3.2.1 Camera Coordinate System (CCS)

The camera coordinate system (CCS) is a right-handed 3D Cartesian coordinate system. The camera is placed below and at the center of the display. The Xcam axis of the camera is parallel to the Xw axis of the world coordinate system (WCS). Ycam makes an angle θ with the Yw axis of the world coordinate system. The Zcam axis is perpendicular to the plane of the image sensor and is at an angle θ to the Zw axis of the world coordinate system. In order to convert between the camera and world coordinate systems, a transformation matrix consisting of a translation and a rotation is used. The details of this transformation are covered in section 4.2.1, camera calibration.
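As a sketch of this transformation, assume the only rotation is the camera tilt θ about the shared X axis; the sign convention of the rotation and the translation vector t_wc are placeholders to be determined by the actual camera calibration of section 4.2.1.

    import numpy as np

    def camera_to_world(p_cam, theta_deg, t_wc):
        """Transform a point from the camera coordinate system (CCS) to
        the world coordinate system (WCS). theta_deg is the camera tilt
        about the shared X axis and t_wc is the camera origin expressed
        in world coordinates (both from geometric calibration)."""
        th = np.radians(theta_deg)
        # Rotation about X: Xcam stays parallel to Xw; Ycam/Zcam tilt by theta.
        R = np.array([[1.0, 0.0, 0.0],
                      [0.0, np.cos(th), -np.sin(th)],
                      [0.0, np.sin(th), np.cos(th)]])
        return R @ np.asarray(p_cam, float) + np.asarray(t_wc, float)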

Camera Lens Model

The camera lens model used here is based on the pinhole camera model described by Forsyth [48]. A pinhole camera is a simplified version of modern cameras, in which the imaging surface of the camera is a rectangle. The pinhole camera can be described as a box with a small hole in one side and a sensor plane on the other side. The camera has two parameters: one is the horizontal and vertical pixel resolution, and the other is the focal length. The pixel resolution is the number of distinct pixels in each dimension divided by the sensor length. The focal length is defined as the distance between the pinhole and the image plane. Image acquisition and analysis are done through perspective projection of the relevant eye feature points onto the image plane.
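A minimal sketch of this perspective projection follows, with placeholder intrinsics; the sign flip on the vertical coordinate reflects the image coordinate convention described in the next subsection.

    def project_pinhole(p_cam, focal_mm, pixels_per_mm, principal_point):
        """Perspective projection of a 3D point in camera coordinates onto
        the image plane of a pinhole camera. All parameter values stand in
        for the calibrated camera intrinsics."""
        x, y, z = p_cam
        u = principal_point[0] + pixels_per_mm * focal_mm * x / z
        v = principal_point[1] - pixels_per_mm * focal_mm * y / z  # Yimg opposes Ycam
        return (u, v)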

Image Coordinate System (ICS)

The image coordinate system (ICS) is a 2D coordinate system. ICS coordinates are defined in Ximg and Yimg and are measured in pixels. The origin of the image coordinate system is placed at the upper-left corner of the image sensor. Ximg is parallel to the camera axis Xcam, while Yimg points in the negative direction of Ycam. Using the definition of the pinhole camera model, the image of the eye captured by the camera is projected from the 3D coordinate system onto the camera image plane (2D coordinates). Figure 3.4 illustrates this projection. In the camera, the nodal point is defined as a point on the optical axis of


Figure 3.4: Perspective projection in a pinhole camera

the camera where all lines that join object points with their respective image points intersect. The nodal point of the camera is located at the center of the pinhole. The nodal point of the camera is also known as the center of projection or the camera center. The principal point, or image center, refers to the point at the intersection of the optical axis (Zcam) and the camera image plane.

3.2.2 Light Source Model

The light sources that illuminate the eye are modeled as point light sources that radiate in all directions. The light sources are infrared (IR) light-emitting diode (LED) sources. Each light source consists of an array of LEDs that is approximated as a point light source located at the center of the array. Since the light sources are modeled as point light sources, their orientation and position are defined with respect to the world coordinate system.


3.3 Eye, Camera, Light Sources, in One System

Having developed models of the eye, camera, and light source coordinate systems, a single mathematical model is generated based on the world coordinate system. In this system, by defining the gaze direction vector in the world coordinate system and integrating it with information about the objects in the scene, the point of gaze (POG) is computed as the intersection of the gaze direction vector with the screen. A mathematical model was developed to estimate the center of corneal curvature using the laws of reflection and refraction. These estimations are derived first for the simplest case, in which the surface of the cornea is modeled as a spherical surface, and are later generalized to a non-spherical corneal model with personalized eye parameters. With a known corneal curvature it is possible to estimate the corneal center of curvature using two light sources and a single camera [4]. However, when eye-specific parameters are not known, it has been argued that two light sources and two cameras are needed [36]. Shih et al. [49] proved that by using population averages for all eye parameters, the point of gaze can be estimated with a single camera and a single light source. However, this method limits the user's head movement; they argue the need for additional constraints, such as additional cameras, to compensate for head pose changes in single-glint systems. The following sections describe these methods and the formal relations of point features, such as the center of the pupil and the center of corneal curvature, under varying geometry and numbers of cameras.

3.3.1 One Camera and One Light Source

This configuration was quite common in early gaze tracking systems due to its simplicity. In this method the incident ray, the reflected ray, and the normal at the point of reflection are in the same plane. A regression-based method is mostly employed for a single camera and a single light source. Ohno et al. [35] use a 3D model-based approach. Their system used a single camera and a single light source, or single glint. Population averages were used for the radius of corneal curvature, the corneal index of refraction, and the distance between the pupil center and the center of corneal curvature to estimate the optical axis. Ohno argued that in order to increase the robustness of the system, personal calibration is needed. They achieved a system accuracy of under 1° of visual angle. The limitation of their method is a relatively small field of view (4 x 4 cm at 60 cm depth). The common assumption in single-glint systems is that the pupil-glint vector

remains relatively constant as the head or the eye moves. The location of the glint obviously changes as the head moves; however, the change in glint location is less obvious when shifting the gaze direction. The eye rotates around the eyeball center, not around the corneal center of curvature. This means changing the gaze direction moves the cornea in space and, as a result, changes the glint location on the surface of the cornea [15].

3.3.2 One Camera and Two Light Sources

The performance of model-based POG estimation was improved by using multiple corneal reflections, which improves the reliability and robustness of the system. Shih et al. [49] proved that in order to have head-position-invariant gaze estimation, at least two light sources are needed. Their results explain that an additional camera or an additional light source is needed to compensate for head pose changes. Shih also showed that the corneal center of curvature and gaze direction can be estimated in a fully calibrated system setup using two or more light sources. Their systems used population averages for the eye parameters. In order to estimate the corneal center of curvature in EGTs using one camera and two light sources, a fully calibrated system is required (e.g., known positions of the light sources and camera). Guestrin et al. [50] used a single camera and two light sources. They showed that by using a spherical corneal model, the corneal center of curvature and the gaze direction can be estimated in a fully calibrated system with a known corneal curvature. Their system only allows for small head movements; however, they argued that it could handle greater head movement using a higher-resolution camera. Hennessey et al. [4] implemented a system based on a single camera and multiple light sources. By using two of the glints reflected off the cornea, the center of corneal curvature is estimated and, in turn, the gaze direction is estimated. The advantages of this method are free head movement as well as a larger camera field of view, while maintaining high system accuracy.

3.3.3 Two Cameras and One or More Light Sources

One of the major issues with fixed single-camera systems is the dilemma of trading off head movement against high-resolution eye images. In most EGTs a large field of view is needed to allow for free head movement.


However, in order to capture a high-resolution image of the eye, which provides accurate and reliable gaze estimation, a limited field of view is needed. One of the solutions proposed for this problem is using multiple cameras to allow for 3D eye modeling. Yoo and Chung [51] described a method based on using multiple light sources and two cameras. Four infrared LEDs are attached one to each corner of the screen. A mapping function is created from the mapping of the generated glints on the surface of the cornea to the user's point of gaze. A fifth LED is placed at the center of one of the camera lenses to monitor large head movements. In this method, a simplified eye model assuming a spherical cornea with a population average for the radius of corneal curvature is used. The advantage of this method is that it relies only on the availability and calibration of the light sources (i.e., no camera calibration). When the user looks at the screen, the pupil center ideally falls between the four glint areas, and by utilizing the property of perspective projection and exploiting the cross-ratio of four points, the center of corneal curvature is estimated. The reported average error for their system was 0.98° in the x-coordinate and 0.82° in the y-coordinate. The problems with this method are the use of multiple cameras, which requires multiple camera calibrations, and the system's complexity in calculation and gaze estimation. Beymer and Flickner [34] proposed a system that models the 3D anatomy of the user's eye and estimates the visual axis through four cameras. This model includes the corneal sphere, the pupil, and the angular offset between the visual and optical axes of the eye. Two wide-angle stereo cameras are used for face detection and two narrow-field cameras are used for eye tracking. The two narrow-field-of-view cameras capture high-resolution images of the eye for gaze tracking; however, due to the narrow field of view, quick head motions may not be captured by these cameras. Therefore, pan and tilt of the cameras are controlled by rotating mirrors and galvanometer motors to orient the narrow-field-of-view cameras. They reported an average accuracy of 0.6° in the gaze direction for a user sitting about 60 cm away from the screen. Noureddin et al. [52] argue for a two-camera solution in which the fixed wide-angle camera uses a rotating mirror to reorient the narrow-angle camera in head-movement scenarios. It was shown that the rotating mirror speeds up the acquisition process compared to a pan-tilt setup. Although a multiple-camera solution provides a more robust approach to POG estimation with respect to head movement and provides higher-resolution eye images, using multiple cameras adds complexity to the system setup and calibration (stereo calibration), as well as complex geometrical modeling and computation. These systems also face the


Figure 3.5: The high-level eye-gaze tracking system block diagram. In the calibration phase, the angular offset between the optical axis and the visual axis is estimated.

usual problems of stereo cameras, such as point matching, occlusion, and more data processing [15].

3.4 The Method

The system configuration used in the research developed by Hennessey [3] is based on a single camera and a multiple-light-source configuration, as explained in section 3.3.2. A high-level overview of the system is outlined in Figure 3.5. A single camera with an infrared low-pass filter is used to record images of the user's face, the right and left eyes, and the glints on the surface of the cornea of each eye. The identified images are labeled as right-eye or left-eye images.

3.4.1 Feature Extraction

In a real-time 3D gaze estimation algorithm, accurate estimation of the center of the pupil and extraction of the locations of the corneal reflections from images acquired by the camera must be executed rapidly and accurately. The IR light sources used in the system produce two types of images: bright pupil images and dark pupil images. As mentioned in the previous section, the bright pupil image is created by using a ring of on-axis light sources whose center coincides with the camera's optical axis. A set of off-axis light sources is used to generate a dark pupil image as well as multiple corneal reflections [53]. Feature extraction begins with pupil contour extraction.


Pupil contour extraction is obtained through a pupil detection process. The pupil is a difficult feature to detect because of the low contrast between the pupil and the iris boundary. To enhance the contrast between the pupil and the iris, many gaze tracking systems use an IR light source. The pupil can be detected from a single thresholding of the difference between the dark and bright pupil images, and the pupil image can be tracked during small head movements through the overlap of the pupil between images. Morimoto [53] argues that for accurate pupil tracking, the pupil must be tracked using the grey-level images, either the dark or the bright pupil image, rather than the thresholded difference image. The reason is that the center of the blob detected from the thresholded difference image corresponds to the center of the overlapping region between two consecutive frames, and the overlapping region might be significantly different from the true center of the pupil in either frame.
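A rough sketch of the difference-image detection step using OpenCV is shown below. The threshold value is illustrative and would need tuning per setup, and this simplified version omits the grey-level refinement that Morimoto recommends for tracking.

    import cv2

    def detect_pupil(bright, dark, thresh=40):
        """Rough pupil localization from a bright/dark image pair:
        threshold the difference image, then fit an ellipse to the
        largest blob. Returns ((cx, cy), axes, angle) or None."""
        diff = cv2.absdiff(bright, dark)          # pupil region stands out
        _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        if not contours:
            return None
        blob = max(contours, key=cv2.contourArea)
        if len(blob) < 5:                         # fitEllipse needs >= 5 points
            return None
        return cv2.fitEllipse(blob)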

3.4.2 Pattern Matching

Corneal reflections (glints) on the surface of the cornea are due to the off-axis light sources. The first step in corneal glint extraction from the eye image is pattern matching. As shown in Figure 3.6, valid corneal reflections are not distorted by the boundary between the cornea and the sclera. In 3D (model-based) gaze estimation, two glints are needed for triangulation and estimation of the center of corneal curvature. As the user's eyes move across the screen, distortion or loss of corneal reflections occurs when a reflection falls in the boundary region between the cornea and the sclera or falls off the corneal region. The sclera has a rougher surface than the cornea and also a different radius of curvature. These factors contribute to the loss and distortion of the glints. In order to separate non-distorted glints on the surface of the eye from distorted glints, a reference reflection pattern is created based on the positional orientation of the light sources relative to each other [54]. This method is based on inter-point distances; a sketch of the idea is given after Figure 3.6. The algorithm developed by Hennessey compensates for translation and distortion of the corneal reflections. A valid pattern of four corneal reflections is shown in Figure 3.6. An image is tagged as a valid image if it correctly matches the reference reflection pattern and the corresponding IR light sources. The reference pattern is created by recording a valid pattern of glints on the surface of the cornea for each eye, and the light source corresponding to each generated glint is selected manually. According to Hennessey [54], the two glints


Figure 3.6: Valid corneal reflections consist of four corneal reflections (glints) undistorted by the boundary between the cornea and the sclera.

that are located closest to the pupil are selected as valid glints, since there is less likelihood of distortion due to the boundary between the cornea and the sclera.
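The following is a brute-force sketch of inter-point-distance matching between detected glint candidates and a recorded reference pattern. It illustrates only the translation-invariant idea; Hennessey's actual algorithm [54] additionally compensates for distortion of the reflections.

    import numpy as np
    from itertools import permutations

    def match_glints(candidates, reference):
        """Match detected glint candidates to a reference pattern by
        comparing inter-point distances (invariant to translation).
        Returns the best candidate index assignment and its residual."""
        ref = np.asarray(reference, float)
        d_ref = np.linalg.norm(ref[:, None] - ref[None, :], axis=-1)
        best, best_cost = None, np.inf
        for perm in permutations(range(len(candidates)), len(ref)):
            pts = np.asarray([candidates[i] for i in perm], float)
            d_pts = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
            cost = np.abs(d_ref - d_pts).sum()
            if cost < best_cost:
                best, best_cost = perm, cost
        return best, best_cost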

3.4.3 Center of Corneal Curvature Estimation

The model-based method for POG estimation is designed to compensate for head motion without a very complex system configuration and algorithm. When light hits the corneal surface of the eye, part of the light is refracted and travels through the eye, and the other part is reflected by the surface of the cornea. The light reflected by the exterior surface of the cornea forms the Purkinje image. The method described here is based on Shih and Liu's proposed method for estimating the center of corneal curvature using a single camera in 3D space [36]. This method was later extended by Hennessey et al. [4] to a single camera and multiple light sources. In Shih's proposed method, the following assumptions are made:

• The corneal surface has a constant curvature 1/ρ (the cornea is assumed to be a spherical surface with radius ρ)

• The cornea and the aqueous humor of the eye are modeled as a convex spherical lens with a uniform refraction index

Figure 3.7 displays the orientation of the glint on the surface of the cornea with respect to the point light source and the camera. A ray traced from the light source (qi) hits the surface of the cornea (gi) and creates a glint image on the camera image plane (g'image). Let o denote the optical


Figure 3.7: Glint coordinate system for light source qi; the origin of the coordinate system is defined at the camera optical center.

center of the camera. The center of corneal curvature in 3D space, ci, can be estimated given a glint g'i and a light source qi. Shih and Liu noted that {qi, g'image, ci, o, gi} are in the same plane. Therefore, a glint coordinate system can be defined for each glint and its corresponding light source, as shown in Figure 3.7. The 2D glint coordinate system based on Shih's model and Hennessey's modifications is defined as follows:

• The origin of the auxiliary coordinate system is defined at the optical center of the camera o.

• The x-axis (X'i) is defined by the ray oqi

• The y-axis (Y'i) is orthonormal to X'i and Z'i

• The z-axis (Z'i) is selected in such a way that the z-x plane contains the vector (G'image) that connects the camera optical center (o) to the glint image on the camera image plane (g'image)

The geometry in Figure 3.7 shows a glint coordinate system defined for light source qi and glint g'i. In this model, the cornea is assumed to be a sphere with constant curvature 1/ρ. The center of corneal curvature is c'i. The

camera parameters and the positions of the point light sources need to be calibrated and measured. The glint (gi) on the surface of the cornea due to the first Purkinje image is observed on the camera CCD sensor, denoted g'i. By the law of reflection in basic geometric optics, the incident angle equals the reflection angle. Referring to Figure 3.7, \angle M g_i q_i = \angle M g_i o = \frac{\pi - \alpha - \beta}{2}. The value of α can easily be determined by knowing the vector o g'image through camera calibration and the vector o qi through the infrared light source calibration. For each glint gi on the corneal surface, it is possible to define the center of corneal curvature c'i in the auxiliary coordinate system as follows:

\begin{bmatrix} c'_{ix} \\ c'_{iy} \\ c'_{iz} \end{bmatrix} =
\begin{bmatrix} g'_{ix} - r_c \sin\left(\frac{\alpha_i - \beta_i}{2}\right) \\ 0 \\ g'_{ix} \tan\alpha_i - r_c \cos\left(\frac{\alpha_i - \beta_i}{2}\right) \end{bmatrix} \quad (3.4)

In the above equation, rc is the radius of corneal curvature, g'ix is the glint vector projection along the X'i axis, and c'i is the center of corneal curvature. Referring to Figure 3.7, α and β for each glint and light source can be estimated by [4]:

\alpha_i = \cos^{-1}\left[\frac{-G'_i \cdot Q'_i}{\lVert -G'_i \rVert \cdot \lVert Q'_i \rVert}\right] \quad (3.5)

\beta_i = \tan^{-1}\left[\frac{g'_{ix} \tan(\alpha_i)}{L'_i - g'_{ix}}\right] \quad (3.6)

In the glint coordinate system, a rotation matrix Ri is defined. This matrix is used to transform points between the auxiliary coordinate system and the world coordinate system. For each glint, the center of corneal curvature is estimated in 3D space; we use the rotation matrix Ri to transform the estimated c'i back to the world coordinate system using equation 3.7.

c_i = R_i^{-1} c'_i \quad (3.7)

The constraint that the centers of corneal curvature calculated for each glint on the surface of the cornea must coincide in the world coordinate system results in equation 3.8 for two light sources and two glints imaged on the surface of the cornea [4].

c1 = c2 (3.8)
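The per-glint computation of equations 3.4-3.6 can be sketched as follows. The inputs (glint image vector, light source vector, distance L'i and projection g'ix) are assumed to be already expressed in the glint coordinate system; the transformation back to world coordinates (equation 3.7) and the enforcement of the constraint c1 = c2 (equation 3.8) are omitted.

    import numpy as np

    def corneal_center_glint_cs(G_img, Q, L, g_x, rc=7.98):
        """Estimate the center of corneal curvature in one glint
        coordinate system (equations 3.4-3.6). G_img: vector from camera
        center to glint image; Q: vector from camera center to light
        source; L: distance to the light source along X'; g_x: glint
        projection on X'. rc defaults to the population-average radius."""
        G_img = np.asarray(G_img, float)
        Q = np.asarray(Q, float)
        alpha = np.arccos(np.dot(-G_img, Q) /
                          (np.linalg.norm(G_img) * np.linalg.norm(Q)))
        beta = np.arctan2(g_x * np.tan(alpha), L - g_x)
        half = (alpha - beta) / 2.0
        # Equation 3.4: center of corneal curvature in the glint frame.
        return np.array([g_x - rc * np.sin(half),
                         0.0,
                         g_x * np.tan(alpha) - rc * np.cos(half)])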


3.4.4 Pupil Center Estimation

The pupil of the human eye is located at the center of the iris and allows light to enter the retina. The pupil is observed as a black hole because most of the light that enters it is absorbed by the surrounding tissue inside the eye. The image of the pupil as seen from outside the eye does not correspond to the actual location and size of the pupil, because of the magnification effect of the cornea on the pupil image. The image of the pupil captured by the camera is a virtual image due to refraction by the convex lens of the eye, which is composed of the cornea and the aqueous humor. In order to estimate the optical axis of the eye, the real pupil center pc is needed. The image of the pupil as seen on the camera image plane is not the real pupil image, due to the refraction of light rays at the surface of the cornea and in the aqueous humor. The center of the pupil can be estimated by averaging two opposing points on the perimeter of the real pupil. Referring to Figure 3.8, a ray is defined by the 3D parametric equation of a line, where ti is a scalar variable, ui is a point on the pupil perimeter, ki is a point on the surface of the camera sensor, and Ki is a vector from point ki to the optical center of the lens O:

u_i = k_i + t_i K_i \quad (3.9)

One of the constraints in modeling the pupil and cornea is that the pupil perimeter points should lie on the surface of the cornea with radius rc and center c, as expressed in equation 3.10 [3].

(u_{ix} - c_x)^2 + (u_{iy} - c_y)^2 + (u_{iz} - c_z)^2 = r_c^2 \quad (3.10)

Equations 3.9 and 3.10 provide a set of three equations with three unknowns, given that the center of corneal curvature c and the radius of corneal curvature rc are known, which leads to an explicit solution for ui. In order to compensate for the refraction effects of the cornea and aqueous humor on the pupil image, Snell's law of refraction is used. The vector Ki is refracted into the eye using the indices of refraction of air and the aqueous humor as well as an equivalent rotation angle. The refracted vector K'i is then traced to the real pupil perimeter using equation 3.11.

u'_i = u_i + w_i K'_i \quad (3.11)


Figure 3.8: The pupil center is estimated through back-projection and ray tracing. A point on the pupil perimeter image is denoted by ki, and the corresponding vector from the image point to the center of the camera O is Ki. The vector K'i is the refracted vector from point ui to the actual point on the perimeter of the pupil, u'i. As shown, the distance between the center of the pupil Pc and the center of corneal curvature C is rd. The pupil radius (distance from Pc to u'i) is denoted by rp.


A further constraint is added: the distance between the pupil perimeter and the center of corneal curvature is constant, as defined by equation 3.12.

\lVert u'_i - C \rVert = r_{ps} \quad (3.12)

where r_{ps} can easily be calculated through equation 3.13:

r_{ps} = \sqrt{r_d^2 + r_p^2} \quad (3.13)

In the above equation, rd is the distance between the center of the pupil and the center of corneal curvature, and rp is the distance between the pupil center and a point on the pupil perimeter (u'i).
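The explicit solution for a pupil perimeter point (equations 3.9 and 3.10) is a standard ray-sphere intersection; a minimal sketch is given below. The subsequent refraction step of equation 3.11 is omitted.

    import numpy as np

    def ray_sphere_intersection(k, K, c, rc):
        """Find the point u = k + t*K on the sphere of radius rc centered
        at c (the corneal surface), per equations 3.9-3.10. Returns the
        nearer intersection, or None if the ray misses the sphere."""
        k, K, c = (np.asarray(v, float) for v in (k, K, c))
        d = k - c
        a = K @ K
        b = 2.0 * (K @ d)
        cc = d @ d - rc * rc
        disc = b * b - 4.0 * a * cc
        if disc < 0.0:
            return None
        t = (-b - np.sqrt(disc)) / (2.0 * a)   # nearer root: front surface
        return k + t * K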

3.4.5 Optical Axis Estimation

Having discussed the determination of the coordinates of the center of corneal curvature and the center of the pupil, the next step is to reconstruct the optical axis of the eye in 3D space. The optical axis (OA) is defined as the vector that connects the center of corneal curvature (C) to the center of the true pupil (Pc).

OA = Pc − C (3.14)

As shown in Figure 3.9, the optical axis of the eye is a vector from the center of corneal curvature to the center of the pupil. In this model the cornea is assumed to be spherical, and eye parameters such as the radius of corneal curvature, the distance between the pupil center and the corneal center of curvature, and the index of refraction of the aqueous humor are taken from population averages.

3.4.6 Calibration Phase

A number of simplifications applied in 3D modeling of the eye based on the spherical corneal model contribute to POG inaccuracy and gaze estimation error. Such simplifications include using a pinhole camera model to approximate the real camera, using a simplified eye model with a spherical cornea of constant radius of curvature across the corneal surface, and using population averages for the eye parameters. One of the limitations of current gaze trackers is the complex and time-consuming calibration phase needed to compensate for these approximations. Advanced gaze tracking systems try to eliminate the time-consuming and complex calibration phase either by estimating gaze direction with no calibration phase


Figure 3.9: The eye model used to estimate the optical axis of the eye (OA). The center of corneal curvature is located at C and the center of the pupil is at Pc. The optical axis of the eye (OA) is a vector from C to Pc.

or by estimating gaze with a simple, one-time calibration phase. Shih and Liu [49] introduced a novel calibration-free model based on 3D estimation of the optical axis of the eye using multiple cameras and light sources. Their system is based on the spherical corneal model and uses population averages for the eye parameters. The accuracy of their system is claimed to be about 1.1°. The drawbacks of this approach are the complexity of the system, with multiple cameras and light sources, and a higher error rate compared to methods using a single camera, multiple light sources, and a calibration phase. Model et al. [55] introduced a novel gaze tracking system that does not require a calibration phase based on active participation of the user. The center of corneal curvature as well as the optical axis of each eye is estimated using a pair of stereo cameras. The angular offsets (α and β) between the visual and optical axes of the eye are estimated using the Automated Angles Estimation (AAE) algorithm. The AAE algorithm relies on the assumption that the visual axes of the left and right eyes intersect at the point of gaze on the screen. The unknown angles αr, αl and βr, βl can be estimated by minimizing the distance between the intersections of the left and right visual axes with the screen surface. In other words, the algorithm looks for the minimum distance between the left-eye and right-eye points of gaze (POGs). The developed algorithm is based on a right-handed Cartesian coordinate system with its origin at the center of the display. In their proposed

method, the corneal surface is modeled as a sphere and the light sources are modeled as point sources.

F(\alpha_r, \beta_r, \alpha_l, \beta_l) = \sum_i \lVert pog_i^r(\alpha_r, \beta_r) - pog_i^l(\alpha_l, \beta_l) \rVert \quad (3.15)

The function in equation 3.15 is non-linear, and numerical optimization is needed to solve for αr, αl and βr, βl. In Model's method a number of approximations were used, such as a simplified eye model with population averages for the eye's parameters and a linear approximation of the function in equation 3.15 using the first three terms of its Taylor series. Even though their system benefits from estimating POGs without calibration, the benefit of not having the calibration phase is diminished by higher POG estimation error and by introducing more approximation and simplification into the actual modeling of the eye. One important aspect of personalizing eye-gaze tracker parameters for each user is a proper calibration phase. In our system we use a one-time calibration phase in which the user is asked to look at nine points that appear consecutively across the screen. The advantage of having a calibration phase is that personalized system parameters are introduced to the gaze tracking system for each user, which improves the system accuracy. According to Hansen [15], all gaze estimation methods require a calibration process. Hansen divided the calibration process into the following steps:

1. Camera Calibration to determine the intrinsic parameters of the camera.

2. Geometric Calibration to determine the relative locations and orientation of the parts of the system during setup, such as the camera(s), light(s), and the monitor or screen.

3. Personal Calibration to estimate the user's eye parameters, such as the corneal curvature and the angular offset between the visual and optical axes.

4. Gaze-Mapping Calibration to determine the parameters of the eye-gaze mapping functions.

The details of the calibration process are described in the next chapter.


3.4.7 Summary of Point of Gaze (POG) Estimation

In the 3D, or model-based, gaze estimation method, the user's point of gaze (POG) is computed from 3D models of the eye and of the system. One of the main reasons for developing this model is to allow the user free head motion while maintaining relatively high accuracy in estimating POGs. The following provides an overview of the steps needed to calculate a user's point of gaze:

1. The image locations of the glints on the surface of the cornea and of the pupil perimeter points, used in steps 2 and 3, are found using image processing techniques.

2. Using two glints on the image of the corneal surface, rci is estimated, and the 3D center of the cornea (c) is calculated using the triangulation method described by Hennessey [4].

3. The center of the pupil is estimated using the cornea center (c), the estimated rp, and the located perimeter points of the pupil image.

4. The personalized radii of corneal curvature rci and rp, and the offset between the OA and VA vectors (i.e., between the optical and visual axes), are estimated by the one-time per-user calibration phase.

5. The POG is calculated by intersecting the visual axis (the VA vector) with the monitor plane.

As mentioned previously, we use a single camera and a multiple-glint model to estimate the user's point of gaze. The point of gaze is defined as the point of intersection of the visual axis (VA) with the screen at which the user is looking. The screen is modeled as a plane, constructed by measuring three corners of the screen and creating the equation of the plane passing through these points. A line that hits this plane at the point of gaze is defined by equation 3.16:

P = c + s \cdot VA   (3.16)

In the above equation P is a 3D point (px, py, pz), and with the additional unknown scalar s there are four unknowns. Adding the constraint that the point of gaze (POG) must lie on the plane defined by the monitor the user is looking at, and given that the center of corneal curvature (c) is known, equation 3.16 can be solved explicitly for the vector P, the user's point of gaze. As explained in the previous sections, the center of corneal curvature (c) and the center of the pupil (Pc) are required to compute the optical axis of the eye, OA.
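As a concrete illustration, the following MATLAB sketch (with made-up corner and eye positions, not the thesis's measured setup) solves equation 3.16 by substituting the line into the plane constraint n·(P − p0) = 0:

    % Minimal sketch of the POG computation in equation 3.16.  The screen
    % plane is built from three measured corners; the scalar s follows from
    % the plane constraint dot(n, P - p0) = 0 with P = c + s*VA.
    p0 = [0 0 0]; p1 = [38 0 0]; p2 = [0 30 0];   % screen corners (cm)
    n  = cross(p1 - p0, p2 - p0);                 % plane normal
    c  = [19 12 60];                              % corneal center (illustrative)
    VA = ([20 14 0] - c) / norm([20 14 0] - c);   % unit visual-axis direction
    s  = dot(n, p0 - c) / dot(n, VA);             % distance along the axis
    POG = c + s * VA                              % 3D point of gaze on the screen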

Chapter 4

Proposed New Calibration Method

4.1 Introduction

The human eye is a complicated optical system with personalized eye parameters such as the radius of corneal curvature, the refractive index of the aqueous humor, and the distance between the pupil center and the corneal center of curvature. Different eye models describe the optical characteristics of the human eye at various levels of complexity [1].

A calibration procedure is required to compute a mapping between the eye orientation, the eye parameters that differ between individuals, and the measurements taken by the system. There are two types of calibration: system calibration and user calibration. System calibration consists of light source calibration, camera calibration, and screen calibration. User calibration requires active participation of the user. From the user's point of view, the calibration complexity of gaze-tracking systems has improved significantly; for example, users no longer need to use chin rests during the calibration phase. In other words, the complexity of the calibration process has been shifted from the user to the application developers. In modern gaze-tracking systems, a typical calibration process presents a set of visual targets on the screen that the user must look at while the corresponding measurements are taken. From these measurements, a calibration function and parameters are computed.

In some current gaze-tracking systems, to simplify the eye's complex optical system, a simplified version of the Gullstrand eye model [56] is used, with population averages compiled by Gullstrand substituted for the eye parameters of interest. An obvious advantage of using such a model is a less complex eye gaze tracking system with faster acquisition and processing time. The drawback of these systems, however, is lower accuracy for eyes whose parameters differ from the averages, as well as an increase in gaze estimation error as the user's gaze moves around the screen, especially toward its corners.

One of the reasons for the decrease in POG estimation accuracy as the user's gaze moves to the corners of the screen is that the light reflections, or glints, on the surface of the cornea move to the boundary between the cornea and the sclera. As mentioned earlier, the cornea is a protective transparent membrane that covers the iris, while the sclera is opaque and fibrous. The sclera is opaque due to the irregularity of its collagen fibers, whereas the cornea has an almost uniform thickness and a parallel arrangement of corneal collagen [57]. Given the differences in texture and optical properties between the cornea and the sclera, as well as the different radii of curvature at the corneal boundary, the reflections near the boundary become distorted as the corneal surface flattens out, creating increased error. The purpose of the calibration method developed in this thesis is to calculate and estimate the individual eye parameters, such as the radius of corneal curvature, the distance between the corneal center of curvature and the pupil center, and the angular offset between the visual and optical axes of the user's eye, to improve the accuracy and reliability of the system.

4.2 Calibration Method

As discussed in previous sections, a number of simplifications used in the techniques for estimating the center of corneal curvature and the center of the pupil contribute to a relatively low system precision in gaze estimation. Such simplifications include the spherical cornea model, which assumes a constant corneal curvature across the corneal surface, and the use of population averages for the radius of corneal curvature and for the distance between the corneal center of curvature and the pupil center.

We propose a new one-time per-user calibration method to estimate a geometric model of the corneal surface, as well as to personalize the eye parameters, which are later used in the POG estimation. In this calibration process, a single camera is used to record images. The user is asked to look at nine different points appearing across the screen consecutively and then press the space bar to set a flag in the data stream. Pressing the space bar indicates to the system that the user is looking at that specific point; that point is later used as a reference point, and the flagged data are processed as acquired data. Figure 4.1 shows a high-level overview of the calibration steps. General calibration of a gaze-tracking system based on a single camera and multiple light sources involves the following steps:


Figure 4.1: The high-level overview of calibration steps. As shown, system calibration is covered in section 4.2.1 and user calibration is covered in section 4.2.2.

System Calibration (common method used by all EGT systems):

a) Camera Calibration: calibration of the camera so it can provide an accurate and focused image of the eyes.

b) Light Sources Calibration: estimation of the positions of the IR LEDs, used to estimate the corneal center of curvature. Each light source consists of a compact group of IR LEDs; however, for simplification it is treated as a point light source.

c) Camera Position and Orientation Calibration: measurement of the camera position and orientation in the system with respect to the light sources and the world coordinate system.

d) Screen Calibration: estimation of the geometric properties of the screen.

User Calibration (uses the novel method proposed in this thesis):

a) User Calibration to Estimate the Angular Offset Between OA and VA: determining the angular offset between the visual axis (VA) and the optical axis (OA) of the user's eye. As mentioned, the visual axis of the eye may lie up to 5° from the optical axis, depending on the location of the fovea.

b) User Calibration to Estimate Eye Parameters: estimation of the eye parameters, assuming an aspherical corneal surface.

These two steps require active participation of the user in a one-time per-user calibration process. The remaining calibration steps need to be performed only once and are user independent.


4.2.1 System Calibration

The system calibration does not involve active user participation and is required only during the system setup. The initial calibration covers the camera and the light sources, as well as resolving the geometric properties of the screen or monitor at which the user gazes.

Camera and Light Source Calibration

Camera calibration is done to estimate the camera's intrinsic and extrinsic parameters. The camera's intrinsic parameters are the focal length of the camera, the camera field of view, the radial lens distortion, the principal point (i.e., the intersection of the camera optical axis and the camera image plane), and the effective size and resolution of the camera CCD sensor. The intrinsic parameters of the camera were found using the Camera Calibration Toolbox in MATLAB.

The camera's extrinsic parameters are the translational and rotational parameters of the transformation between the world and camera coordinates; they are estimated by measuring the camera's position and orientation relative to the world coordinate system. This step was done using a ruler and measuring tape.

The position of the IR light sources can be estimated in two ways. One method is to manually measure the position of an IR light source Qi = (xled, yled, zled) with respect to the world coordinate center (the lower left corner of the screen) and then translate it into the camera coordinate center. To determine the relationship between the camera and the screen, the angular rotation of the camera's optical axis with respect to the normal to the screen must first be determined. A rotation matrix that rotates points in the world coordinate system into the camera coordinate system is given in equation 4.1. The camera center is also offset from the world coordinate center; the translation from the world coordinate center to the camera coordinate center is defined in equation 4.2, and the combined transformation in equation 4.3. Qcam is the 3D coordinate of the light source Qi in the camera coordinate system, camorg represents the position of the camera (xc, yc, zc) in the world coordinate frame, Rcam rotates the world coordinates to the camera coordinates, θcam-angle is the angle between the camera optical axis and the normal to the screen, and Tcam transforms positions in world coordinates into the camera coordinate frame.


R_{cam} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos(\theta_{cam-angle}) & \sin(\theta_{cam-angle}) & 0 \\ 0 & -\sin(\theta_{cam-angle}) & \cos(\theta_{cam-angle}) & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}   (4.1)

cam_{org} = \begin{bmatrix} 1 & 0 & 0 & x \\ 0 & 1 & 0 & y \\ 0 & 0 & 1 & z \\ 0 & 0 & 0 & 1 \end{bmatrix}   (4.2)

Tcam = Rcam ∗ camorg (4.3)

Qcam = Tcam ∗ Qi (4.4)
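The following MATLAB sketch applies equations 4.1-4.4 in homogeneous coordinates; the tilt angle and translation offsets are illustrative values, not the thesis's measured setup:

    % Sketch of equations 4.1-4.4: transform an IR LED position from world
    % to camera coordinates.  The angle and offsets below are illustrative.
    theta  = deg2rad(10);                               % camera tilt vs. screen normal
    Rcam   = [1 0 0 0; 0 cos(theta) sin(theta) 0; ...
              0 -sin(theta) cos(theta) 0; 0 0 0 1];     % equation 4.1
    camorg = [1 0 0 -19; 0 1 0 5; 0 0 1 -4; 0 0 0 1];   % equation 4.2 (example x, y, z)
    Tcam   = Rcam * camorg;                             % equation 4.3
    Qi     = [0; 30; 0; 1];                             % LED in world coordinates (cm)
    Qcam   = Tcam * Qi                                  % equation 4.4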

Although not used here, a second method uses the camera itself to determine the 3D coordinates of the IR LEDs. Since the light sources are not located in the field of view of the camera, the camera cannot observe them directly. A planar mirror with reference markers on its surface can be placed in front of the camera; the camera can then observe the reference markers as well as the IR LEDs reflected beside them. The 3D positions of the markers can be used to reconstruct the reflecting surface of the mirror and, from the LEDs' reflections, the actual 3D positions of the IR LEDs [36].

Estimation of Geometric Properties of the Screen

The geometric properties of the screen comprise two parameters. The first is the calibration of the position and orientation of the screen with respect to the camera. In order to estimate the intersection of the user's visual axis (VA) with the screen, the 3D positions of three corners of the monitor are measured with respect to the world coordinate origin; from these, a 3D plane representing the screen is constructed. The second is the transformation of a 2D point in window coordinates (in pixels) on the screen surface into the corresponding point in the world coordinate system in centimeters. A linear transformation from 2D screen coordinates into 2D world coordinates is performed using the resolution of the monitor in pixels and the geometric dimensions of the monitor. Equations 4.5 and 4.6 express this relationship.


X_{cm} = X_{pixel} \cdot \frac{x_{width}(cm)}{X_{Width}(pixel)}   (4.5)

Y_{cm} = Y_{pixel} \cdot \frac{y_{width}(cm)}{Y_{Width}(pixel)}   (4.6)

The above equations transform a point on the screen in pixel coordinates (Xpixel, Ypixel) into a point in the world coordinate system (Xcm, Ycm). In these equations xwidth and ywidth are the screen's physical dimensions, 38 cm and 30 cm, and XWidth and YWidth are the screen resolutions in pixels, 1280 and 1024.
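In MATLAB, equations 4.5 and 4.6 reduce to a one-line mapping; the sketch below uses the screen dimensions given above:

    % Sketch of equations 4.5-4.6: screen pixels to world centimeters for
    % the 38 cm x 30 cm, 1280 x 1024 screen used in this work.
    pix2cm = @(p) p .* [38/1280, 30/1024];
    center = pix2cm([640 512])     % screen center -> [19.0 15.0] cm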

4.2.2 User Calibration

After the system calibration is done, the next step is user calibration. A set of visual targets that the user must look at, while the corresponding measurements are taken, is shown on the screen. In total, nine points are shown consecutively across the screen; the user is asked to gaze at each point and press the space bar on the keyboard to move to the next point. The calibration process is automated, and the system detects when it switches to the next point on the screen. As mentioned before, a number of simplifications and approximations are employed in modeling the eye and the camera; these factors contribute to point of gaze inaccuracies.

The user calibration is done once per user; however, the calibration process follows two main steps. One step is the eye model parameter estimation based on an aspherical model of the cornea. The second step is the estimation of the angular offset between the visual and optical axes of the eye.

Eye Parameter Estimation Based on an Aspherical Corneal Model

Eye parameter estimation is the first process in the user calibration. In this process, an aspherical model of the cornea is assumed. The cornea is normally approximated as a uniformly spherical surface with two parameters selected from population averages (rc, rp), where rc is the radius of corneal curvature and rp is the distance between the corneal center of curvature and the pupil center. The use of population averages for the eye parameters is one of the major contributors to inaccuracies in point of gaze estimation. Bohme et al. [17] present a simulation framework based on an optical model of the eye.


Bohme investigated how the inaccuracy of the eye tracker is influenced by the eye model parameters, showing how a deviation of the population-average corneal curvature radius rc from its actual value affects gaze estimation accuracy. To show this effect, Bohme used different radii of corneal curvature for a specific user with a known radius of corneal curvature, varying the corneal curvature radius within ±25% of the rc value assumed in the simulator. According to Bohme, the gaze estimation error caused by varying the corneal curvature radius rc reaches almost two degrees for a ±25% eye parameter change.

Nagamatsu et al. [7] proposed a novel gaze estimation method based on an aspherical cornea model, in which the cornea is modeled as a surface of revolution about the optical axis of the eye. Their proposed system uses two cameras, each with an attached light source. The main focus of their system is to estimate user-dependent eye parameters during the calibration phase. In order to confirm the effectiveness and improvement of their method, they compared its results with those of their earlier method based on the spherical corneal model [58].

Estimation of Angular Offset Between Visual and Optical Axes

Estimation of the angle between the 3D visual axis and the optical axis is based on a method proposed by Hennessey et al. [54]. The raw gaze estimates contain residual errors on the order of several degrees, due to the offset between the visual and optical axes of the eye, as well as the use of population averages for eye parameters (e.g., the radius of corneal curvature, or the distance between the corneal center of curvature and the pupil center). After the gaze-tracking system has been calibrated, the optical axis of the eye can be estimated. However, as noted earlier, the optical axis of a user's eye usually differs from the visual axis, and the user's point of gaze cannot be determined unless this relationship is known. The average magnitude of the angle between the optical axis and the visual axis is 5° [59]. This angle has both a horizontal (nasal) component α and a vertical component β, which exhibit interpersonal variation.

The optical axis (OAi) and visual axis (VAi) of the eye at each calibration point are normalized and converted to spherical coordinates. The conversion from the Cartesian coordinate system to a spherical coordinate system


is shown in equation 4.7, where θi is the angle a vector makes with the x-axis and φi is the angle a vector makes with the z-axis.

\frac{OA_i(\phi_i, \theta_i)}{\| OA_i(\phi_i, \theta_i) \|} = \begin{bmatrix} \sin(\phi_i)\cos(\theta_i) \\ \sin(\phi_i)\sin(\theta_i) \\ \cos(\phi_i) \end{bmatrix}   (4.7)

The angular offset corrections βi and αi can be determined using the parametric equation 4.8. The parameters Pi, Ci, t, φi, and θi are known, which makes this equation solvable for αi and βi. In this equation Pi is the calibration point and Ci is the estimated center of corneal curvature.

P_i = C_i + t \cdot \begin{bmatrix} \sin(\phi_i + \beta_i)\cos(\theta_i + \alpha_i) \\ \sin(\phi_i + \beta_i)\sin(\theta_i + \alpha_i) \\ \cos(\phi_i + \beta_i) \end{bmatrix}   (4.8)

It should be noted here that Hennessey et al. [33] introduced a novel model to compensate for the pin-hole camera model used to approximate the real camera and lens, and to compensate for the angular offset between the visual and optical axes. The one-time calibration method used by Hennessey is based on weighted offsets: a weight is applied to the angular offset (α, β) between the visual and optical axes of the eye depending on the position of the user's fixation on the screen. The calibration targets mi are the points on which a user fixates his/her gaze during the calibration, and the calibration consists of computing the error ni in the estimated point of gaze for each calibration target. Recognizing that there is only one angle between OA and VA, resolved into two components (α and β), and not one angle per calibration point as in Hennessey's method, a new approach for estimating that angle was developed in this thesis.
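Since equation 4.8 adds (αi, βi) to the spherical angles of the optical axis, the per-point offsets follow directly from the angle differences between the visual and optical axes. A minimal MATLAB sketch, with illustrative axis vectors (this mirrors equations 4.7 and 4.8 but is not the thesis code):

    % Sketch: angular offsets (alpha_i, beta_i) from the spherical angles of
    % the optical and visual axes; the vectors below are illustrative.
    toSph = @(v) [acosd(v(3)/norm(v)), atan2d(v(2), v(1))];  % [phi theta], degrees
    OA = [0.02 0.035 -0.999];               % estimated optical axis (example)
    VA = [20 14 0] - [19 12 60];            % target minus corneal center (example)
    sOA = toSph(OA); sVA = toSph(VA);
    beta_i  = sVA(1) - sOA(1)               % vertical (phi) correction, degrees
    alpha_i = sVA(2) - sOA(2)               % horizontal (theta) correction, degrees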

4.3 Calibration Method Based on Personalized Eye Parameters and Aspherical Corneal Surface

We propose a new one-time per-user calibration method to estimate a geometric model of the corneal surface, as well as personalized eye parameters, which are later used in POG estimation. A high-level overview of the proposed system calibration is outlined in Figure 4.2. The first step in processing raw data from the eye tracker is the calibration process. The eye tracker calibration procedure produces a uniform mapping to be applied to the whole screen.


Figure 4.2: The high-level calibration process in eye-gaze tracking system block diagram. One-time calibration data acquisition is done by instructing the user to gaze at different target points across the screen. The α and β are angular offsets between the visual and the optical axes of the eye, Rc and Rp are the radius of the corneal curvature and distance between the pupil center and the center of corneal curvature, respectively.

The calibration procedure involves presenting the user with a set of visual targets consecutively on the screen; the user is asked to fixate his/her gaze on each point and move on to the next point by pressing the space bar on the keyboard. The number of calibration points on the screen is chosen so as to cover the whole screen.

Image acquisition is the first step in remote gaze estimation. In this step, the user's eye orientations in 3D space are tracked and recorded through a video processing technique. The recorded images are sent to the next stage, the image processing stage, in which image features are extracted from the recorded images. Eye feature extraction follows image acquisition; in it, the locations of the corneal reflections, or glints, as well as the pupil perimeter points, are extracted. The pupil perimeter is segmented from the difference between the bright and the dark pupil images.


Figure 4.3: A high-level overview of the calibration phase I. The main objective in this phase is the initial estimation of the angular offsets (α and β) between the visual and the optical axes of the eye.

In order to extract the glint positions from the eye images, the surface of the cornea is treated as a convex mirror, where light beams hitting the exterior surface of the cornea are reflected back. The corneal reflections are found by searching the dark pupil image for high-intensity pixels located in close proximity to the detected pupil. A novel algorithm proposed by Hennessey et al. [54] is based on corneal reflection matching: point pattern matching is used to match a reference pattern of known valid corneal reflections with the remaining visible corneal reflections. In this model, to minimize the loss or distortion of the glints on the surface of the cornea due to eye rotation or glints falling on the boundary between the cornea and the sclera, the two corneal reflections closest to the detected pupil perimeter are selected.

4.3.1 Calibration Phase I

The first step in our proposed calibration process is calibration phase I, whose main objective is an initial estimation of the angular offset between the visual and optical axes. A high-level overview of the process is shown in Figure 4.3.

When a user looks at a target point, although the surrounding world seems fixed, the head and eyes are continuously in motion. A fixation occurs


typically when the gaze rests for some minimum amount of time on a small area. Fixations typically remain within 1° of visual angle and last from 200 ms to 600 ms [60]. During a single fixation, when a user looks at a single object, the eyes are not exactly fixed; small jittery motions occur. These involuntary small eye movements are called microsaccades; they further limit the practical accuracy of eye tracking and develop further during prolonged visual fixation. Microsaccades are miniature versions of saccades, the fast movements of the eye. Saccades serve as a mechanism for rapid eye movement and fixation; they most frequently shift the visual angle by 1° to 40° and last 30 ms to 120 ms, with a typical delay of 100 ms to 200 ms between saccades [25]. Because these small jittery motions are involuntary, the observer is mostly unaware of them.

To stabilize the user's fixation on an object, we used a low-pass averaging filter based on a moving averaging window to filter the raw eye feature data and eliminate high-frequency jitter. The low-pass averaging filter takes a predefined number of raw data samples during the fixation period, calculates the mean value of the samples, and estimates the new fixation position. The low-pass filter was chosen to compute the average of the points within the filter window; the filter therefore smooths the data and reduces the noise generated by the microsaccadic movements of the eye during fixation.

After the glints and pupil perimeter are extracted from the recorded image, a low-pass averaging window is applied to the raw data to obtain a better estimate of the location of the user's fixation point. The corneal and pupil centers are estimated based on the methods described in sections 3.4.3 and 3.4.4. The pupil center is estimated by averaging at least two opposing points on the real pupil perimeter. As discussed previously, the pupil image seen by the camera is a refracted image of the pupil, due to the corneal surface and the aqueous humor inside the eye. Using Snell's law of refraction and the indices of refraction of air and aqueous humor, the real pupil perimeter is calculated (equation 3.11). Using corneal reflection pattern matching, the two glints located closest to the center of the real cornea are selected. The estimation of the corneal center of curvature is discussed in section 3.4.3; in calibration phase I, in order to estimate the center of corneal curvature, the radius of the cornea is assumed from the population average.

The optical axis (OA) of the eye is by definition a vector joining the center of corneal curvature to the center of the pupil. The optical axis of the eye does not necessarily hit the fovea on the back of the eye (the retina). The fovea is a region that concentrates most of the color-sensitive cells and is
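The fixation smoothing can be sketched in MATLAB as a rectangular moving-average window over the raw 2D samples; the synthetic data below are illustrative, and the window size of 10 matches the filter used in the prototype of section 5.2.1:

    % Sketch of the low-pass averaging filter over raw eye-feature samples.
    raw = cumsum(randn(100, 2)) + 0.3 * randn(100, 2);  % synthetic jittery gaze
    smoothed = filter(ones(10,1)/10, 1, raw);           % 10-sample moving average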


responsible for the perception of fine details. The visual axis of the eye is a line from the fovea to the object of regard. In the calibration phase, the visual axis of the eye is assumed to be a line from the calibration target point on the screen to the center of corneal curvature:

VA = T_{calib-point} - C_c   (4.9)

In the above equation, VA is the visual axis of the eye, Tcalib-point is the calibration point on the screen, and Cc is the estimated corneal center of curvature, calculated using equation 3.4.

The angular offset between the visual and optical axes is defined in the vertical and horizontal directions. For each of the N calibration points on the screen, the visual and optical axes of the eye are estimated. These axes are normalized and transformed into the spherical coordinate system. Using equation 4.8, the horizontal angular offset (αi) and vertical angular offset (βi) are calculated for each calibration point. As mentioned previously, the fovea is the region of the retina with the highest sensitivity and capability to process fine details; the diameter of the fovea centralis is 0.4 mm (i.e., about 1.3°). To estimate a more accurate angular offset, the mean values of the estimated α1..N and β1..N are calculated (equations 4.10 and 4.11).

\bar{\alpha} = \frac{1}{N} \sum_{i=1}^{N} \alpha_i   (4.10)

\bar{\beta} = \frac{1}{N} \sum_{i=1}^{N} \beta_i   (4.11)

The angular offsets estimated in calibration phase I serve as initial values for estimating the actual eye parameters.

4.3.2 Calibration Phase II

Overview

The main objective of our proposed new calibration method is to estimate the user's eye parameters based on a one-time per-user calibration. As discussed previously, some current gaze-tracking systems (e.g., [4], [53], [61]) use population averages for the eye parameters to facilitate the calibration process. One alternative approach is experimental analysis of the eye, measuring the eye's exact parameters. For this purpose, a method using a stereo camera was investigated in the early stage of this research. The stereo camera was placed under the screen with a light source around the


screen. The camera is modeled as a pinhole camera. Through the calibration process, the positions of the light sources and the intrinsic and extrinsic camera parameters are measured. The main objective in using a stereo camera is to measure the 3D geometry of the corneal surface by projecting a Placido disk onto the surface of the cornea. Although using a stereo camera and a Placido disk to reconstruct the corneal surface seems logical and practical, this method suffers from several limitations, as follows:

Eye feature detection: Given the size and orientation of the eye compared to other facial features, a high-resolution stereo camera is needed, as well as a better filtering process to remove unwanted noise from the captured images. This results in a more complex imaging system.

Small angle of view vs. larger effective focal length: As mentioned earlier, in order to measure the deflection of the concentric rings on the surface of the cornea for 3D modeling of the cornea, a high-resolution image of the eye as well as a higher magnification of the eye's image are needed. The camera's angle of view is proportional to the reciprocal of the effective focal length of the camera (equation 4.12); therefore, increasing the magnification of the image sacrifices the advantage of a wider field of view and larger movement of the user's head in depth.

\alpha = 2 \arctan\left(\frac{d}{2f}\right)   (4.12)

In the above equation, d represents the size of the camera CCD sensor in the direction measured, and f is the effective focal length of the camera.
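For illustration, equation 4.12 evaluated with assumed numbers (a small sensor behind a long lens; these values are examples, not the prototype's optics):

    % Sketch of equation 4.12: angle of view from sensor size and focal length.
    d = 4.8;  f = 12;                  % sensor width and focal length (mm), assumed
    alpha = 2 * atand(d / (2 * f))     % ~22.6 degrees of horizontal view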

The image of the user's eye falls out of the binocular region of the cameras: The cornea is described as a prolate (flattening) ellipsoid. To measure the geometry of the cornea, the change in the reflected mires compared to the projected mires needs to be studied. This can be limited by partial closure of the eye due to the eyelids, or by involuntary movements of the eye during fixation. To measure more than the central surface of the cornea, it is necessary to use a Placido disk large enough to cover the whole exterior surface of the eye, together with a shorter working distance between the user and the cameras. The main issue is that a shorter distance to the cameras may cause the eye's image to fall out of the binocular field of view. More details are given in the appendix.


Even though measuring and reconstructing the corneal surface seems an ideal approach to obtaining each individual's eye parameters, it requires a substantial amount of work and equipment, as well as a precise system setup and system calibration. The details of this method and experiment are explained in Appendix A.

The alternative approach, which we propose as the phase II calibration algorithm below, is to estimate the eye's parameters by statistical modeling through least squares curve fitting. This approach allows an accurate, fast calibration and a low-cost implementation compared to the other approach. Given that the actual geometry of the cornea is not spherical, the calibration method presented here is based on estimating a different radius of corneal curvature as the user moves through the fixation points. The system also estimates a unique personalized rp (i.e., the distance between the center of the cornea and the center of the pupil) for the subject. In the end, a simplified geometric model of the corneal surface, as well as the eye parameters, are calculated in the calibration phase. By applying these personalized eye parameters, the improvement in system accuracy is evaluated experimentally.

Calibration Phase II Algorithm

A high-level overview of the proposed calibration system is outlined in Figure 4.4. The main objective of this system is to estimate the user's eye parameters by a least squares error fitting model. The angular offset initialization is done through calibration phase I: preliminary values for the angular offsets between the visual and optical axes of the eyes are estimated to serve as initial guesses for generating successive approximations of the eye's parameters. RppopAvg and RcpopAvg are the population averages for the distance between the center of corneal curvature and the center of the pupil, and for the radius of corneal curvature, respectively. The iteration terminates when the average error falls below ε, where ε is set to obtain the lowest possible error and the fastest acquisition time; ε was found through mathematical optimization. The average error is defined as the mean value of the error across the N calibration points.

error_{avg} = \frac{1}{N} \sum_{i=1}^{N} \| RealPOG_i - EstimatedPOG_i \|   (4.13)

The variable range of the radius of corneal curvature in normal eyes is about 6.7 mm to 9.4 mm, with a mean value of 7.8 mm [42]. Eysteinsson et al.


Figure 4.4: The high-level system overview for calibration phase II

[62] established a population profile of the radius of corneal curvature. They found the standard deviation (σ) of rcornea to be about 0.6 mm, corresponding to 8% of the corneal curvature mean. If the corneal curvature variation is assumed to follow a normal distribution, then about 99% of the adult population has a corneal curvature within 5.9 mm to 9.8 mm, or ±25% of the corneal curvature population average (3 standard deviations from the mean, 3σ). For this reason, the value of the corneal curvature during the iteration phase was varied within ±25% of the population average.

The variable range of the distance between the corneal center of curvature


and the pupil center rp in normal eyes is between 3.0 mm and 5.0 mm, with a mean value of 4.2 mm [63]. The same rule described for the variation of the radius of corneal curvature applies to rp: its variation is assumed to follow a normal distribution, and during the iteration phase it was varied within ±25% of the population average.

User eye parameter estimation is the final step in calibration phase II. The aspheric nature of the cornea requires estimating different radii of corneal curvature across the corneal surface. Kiely et al. [42] discussed the variation in the radius of corneal curvature and asphericity between individuals. In that work, the cornea was divided into four meridians passing through the center of the pupil. It was shown that the flattest radius of curvature was along the horizontal meridian (nasal direction) of the eye, and the steepest along the vertical meridian. It was also argued that the radii of curvature vary between meridians, whereas the asphericity of the cornea does not. Following Kiely's finding on the variation of the corneal radius across the corneal surface, the calibration was designed to estimate different radii of curvature across the corneal surface as the user looks at different target points.

Different numbers of target points, and consequently different numbers of estimated radii of corneal curvature, were evaluated to optimize the system performance and accuracy. The calibration process was tested with 5, 9, and 25 points. With 5 calibration points, the accuracy of the system was observed to be worse than with 9 or 25 calibration points, while between 9 and 25 calibration points there was no statistically significant difference. Therefore, the 9-point calibration procedure was selected for the subsequent procedures, to maximize accuracy while minimizing the time required for the calibration process. The 9-point calibration procedure yields 9 different radii of corneal curvature, a single distance between the center of corneal curvature and the center of the pupil, and the angular offsets α and β between the user's optical and visual axes. The best-fitting eye parameters, with the lowest generated error and greatest accuracy, are then used as personalized eye parameters for real-time gaze tracking.

The stepwise algorithm for calibration phase II is shown in Figure 4.5. As discussed earlier, the radius of corneal curvature rc is varied between 6.0 mm and 10.0 mm, which is almost the range of ±25% of its population average (7.98 mm). Likewise, the distance between the pupil center and the corneal center of curvature is varied between 3.2 mm and 5.25 mm, which


Calibration Phase II

Input:
    Array of fixation points (fixpts[1..9])
    Actual POG
    Glint positions at each target point
    Pupil perimeter points at each target point
    Initial estimated values for α and β

Procedure:
while (averageError = Σ ||estimatedPOG − actualPOG|| / (number of fixation points) ≥ 0.2)   (equation 4.13)
    for rp_j varying between 0.32 cm and 0.52 cm   // distance between cornea center and pupil center
        for rc_i varying between 0.6 cm and 1.0 cm   // radius of the cornea
            for each calibration point fixpts[1] to fixpts[9]   // estimate rc, rp
                Estimate the center of corneal curvature (equation 3.4)
                Estimate the center of the pupil (equation 3.12)
                Estimate OA, the line of gaze or optical axis (equation 3.14)
                Correct the estimated OA using the average angular offset from phase I
                Find the POG: p = c + t·OA
                // estimate a minimum error at each fixation point and corresponding cornea radius
                error(rc_i) = ||estimatedPOG − actualPOG||
                Find the minimum of the least-squares error over rc_i
            end for   // end of loop through calibration points
        end for
        Estimate a new center of the cornea using the estimated cornea radius at each fixation point
        Estimate a new center of the pupil
        Estimate OA
        Find the POG: p = c + t·OA (equation 3.16)
        Estimate the angular offsets between VA and OA at each fixation point (αi, βi) (equations 4.7 and 4.8)
        Calculate the mean values of α1..N and β1..N (equations 4.10 and 4.11)
        error(rp_j) = minimum least-squares error ||estimatedPOG − actualPOG||
    end for
    Estimate the center of corneal curvature using the best cornea radii (lowest least-squares error)
    Estimate the center of the pupil
    Estimate OA
    Correct OA using the angular offset for the calculated minimum error
    Find the POG: p = c + t·OA
end while

Figure 4.5: The Calibration Phase II

again is within the range of ±25% of the population average for this distance (0.42 cm).
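The nested sweep of Figure 4.5 can be sketched as a self-contained MATLAB toy. The error surface below is synthetic, standing in for the full 3D model of equations 3.4, 3.12, and 3.16 that scores each candidate pair in the real system:

    % Toy sketch of the phase-II parameter sweep over +/-25% of the
    % population averages; errFun is a synthetic error surface, NOT the
    % thesis's 3D eye model.
    rcTrue = 0.87; rpTrue = 0.39;                     % "true" values (cm)
    errFun = @(rc, rp) hypot(3*(rc - rcTrue), 5*(rp - rpTrue)) + 0.05;
    best = struct('err', inf, 'rc', NaN, 'rp', NaN);
    for rp = 0.32:0.01:0.52                           % pupil-to-cornea distance
        for rc = 0.60:0.01:1.00                       % radius of corneal curvature
            e = errFun(rc, rp);                       % mean POG error (toy, cm)
            if e < best.err
                best = struct('err', e, 'rc', rc, 'rp', rp);
            end
        end
    end
    best                                              % minimum near rcTrue, rpTrue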

Chapter 5

Experimental Method and Results

5.1 Introduction

In this chapter, experimental results obtained with a prototype system are presented. The purpose of this experiment was to evaluate the system accuracy under the new calibration method. To facilitate this comparison, the data collected from each user was processed by both the old system and the new system. As discussed in previous chapters, the old system (i.e., Hennessey's method [4][3]) was based on a spherical model of the cornea with population averages for the eye's parameters; the new method discussed in this thesis is based on an aspherical cornea model with personalized eye parameters. An obvious question is how much the deviation of population averages from a specific user's actual eye parameter values affects the gaze estimation accuracy, and how the new calibration method compensates for this.

The key goal of 3D point of gaze estimation is to allow the user free head motion. In this experiment, the user's head position was not fixed; the user was allowed to move his/her head freely with respect to the camera. In the first section of this experiment, the study was conducted on a set of data collected by Hennessey [3]. The 7 subjects for this study were selected with ages ranging from 24 to 31 years. In the experimental procedure, each test subject started with 9 calibration points; then a new 9-point data collection procedure was carried out. The collected data was processed through the old system proposed by Hennessey [3] as well as the new system proposed in this thesis, and the results were compared.

An evaluation of the eye-gaze tracker performance under the new calibration technique was performed to determine the effectiveness of the proposed method. The system consisted of a single camera, four light sources on the screen, and one light source on the camera axis.


Firstly, the new calibration system was applied to the data collected by Hennessey [3] from 7 subjects to evaluate the potential of the new calibration method. Then a second set of 8 subjects was recruited in this work to confirm the previous experimental findings independently. Because the experimental equipment had been somewhat reconfigured between the two sets of experiments (the light sources and the camera position and orientation were slightly perturbed, and no further system calibration step was carried out), it was decided to treat the two sets of data as independent experiments.

The performance of the system was evaluated with adults in the age range of 24 to 30 years. The subjects, of different ethnicities (Caucasian, Middle Eastern, and Asian), were five males and three females. The user's point of gaze was evaluated using 3D eye modeling. Two different calibration methods were applied to the collected data: the first was the old calibration method of Hennessey et al., and the second was the new calibration process discussed in this thesis. For each method, the user's POG was evaluated, an average error was computed, and the results were compared.

5.2 Experimental Methods

5.2.1 System Design

The prototype system uses a single monochrome DragonFly Express camera from Point Grey Research, with a resolution of 640 x 480 pixels at a frame rate of 200 Hz. The on-axis light source consists of eight IR LEDs coaxial with the camera optical axis and is used to generate a bright pupil image. The off-axis light sources, used to generate a dark pupil image, consist of rings of seven closely spaced IR LEDs. The selection and placement of the light sources were chosen experimentally so that at least two valid reflections from the surface of the cornea were available at all orientations of the eye as the user's gaze moves across the screen. A microcontroller was used to synchronize the camera shutter with the alternating on-axis and off-axis light sources. The computer screen was a 17" LCD with a resolution of 1280 x 1024 and physical dimensions of 38 cm in width and 30 cm in height.

The data acquisition process was performed in C++ code, and the analysis of the recorded data was performed in the MATLAB environment. The camera lens was calibrated with the MATLAB Camera Calibration Toolbox, while the physical locations of the camera and light sources with respect to

the screen were measured manually. In order to reduce and smooth out noise from the system, the ambient lighting, and the inherently jittery eye motions (as addressed in section 4.3.1), a rectangular low-pass filter (moving averaging window) with a filter sample size of 10 was used. The eye gaze tracker system setup is shown in Figure 5.1.

5.2.2 Accuracy Metrics

The accuracy of eye-gaze trackers is usually reported in degrees of visual angle; however, the actual calculated error is measured in pixels. The error is measured from the target point, or actual POG, to the estimated POG. Accuracy expressed as a visual angle is preferred because it is independent of the screen size and resolution. In this thesis, accuracy is estimated in pixels and reported in centimeters (cm), and later converted to degrees of error for comparison. In order to convert from pixel error to centimeters, the resolution and geometry of the screen are required. The geometry of the conversion from screen pixels to degrees of visual angle is shown in Figure 5.2. To convert the calculated error from screen pixels to centimeters, the conversions shown in equations 5.1 and 5.2 were used.

\Delta X_{cm} = \Delta X_{pixel} \cdot \frac{38\ cm}{1280}   (5.1)

\Delta Y_{cm} = \Delta Y_{pixel} \cdot \frac{30\ cm}{1024}   (5.2)

Equation 5.3 is used to estimate the error in degrees. The angular error is calculated as the angle between two vectors: one from the estimated center of corneal curvature to the target point, POG_{real}, and the other from the center of corneal curvature to the estimated point of gaze, POG_{estimated}.

\Delta\theta = \frac{180}{\pi} \cdot \arccos\left( \frac{POG_{real}}{\|POG_{real}\|} \cdot \frac{POG_{estimated}}{\|POG_{estimated}\|} \right)   (5.3)
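The accuracy metrics can be sketched in MATLAB as follows; the pixel errors and corneal-center position are illustrative, and the vectors in equation 5.3 are taken from the corneal center, as in the text:

    % Sketch of equations 5.1-5.3 with illustrative values.
    dXY_cm = [25 18] .* [38/1280, 30/1024]            % pixel error -> cm (5.1, 5.2)
    c  = [19 12 60];                                  % corneal center (example)
    gr = [20 14 0] - c;                               % vector to the real POG
    ge = [20.7 14.6 0] - c;                           % vector to the estimated POG
    dtheta = acosd(dot(gr, ge) / (norm(gr)*norm(ge))) % angular error, degrees (5.3)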

5.2.3 Methods

In the first part of this experiment, the study was conducted on the old data collected by Hennessey [3] from 7 subjects. In order to achieve more robust results, a set of new data was also collected.


Figure 5.1: In this figure, the gaze tracking system is shown. The camera is located under the screen. The off-axis light sources are placed at known distances with respect to the corner of the screen. The on-axis light source is placed around the camera lens.


Figure 5.2: Conversion of the pixel error to the angular error. Actual POG is the location of the target point on the screen, estimated POG is the estimated point of gaze, ∆X is the error in the x direction, ∆Y is the error in the y direction, and θ is the angular error.

The 8 experimental subjects were adults, 5 males and 3 females, with ages ranging from 24 to 32 years. The results of the new and old calibration methods were then compared for both sets of data. The proposed calibration algorithm was tested on the results and compared to the calibration method of Hennessey [4]. The experiment was designed to provide a comprehensive comparison of:

System robustness to variation in user eye parameters: The old system was tested with users of different ages, genders, and races, and its accuracy was evaluated.

Effect of the new calibration method on the accuracy of the estimated POG: The difference between the average accuracy of the evaluated POG under the new calibration system and under the old calibration system of Hennessey [4] was investigated.


Calibration Point    X (cm)    Y (cm)
        1             1.98      0
        2            20.87      1.96
        3            37.7       1.96
        4             1.98     12.8
        5            16.86     12.8
        6            35.7      16.8
        7             0        27.6
        8            16.86     27.6
        9            35.7      29.9

Table 5.1: During the calibration phase, a grid of 3x3 points was shown to each subject. The above table shows these calibration points in the world coordinate system in centimeters.

The experiment for each user was done in two steps, the first of which was the calibration phase. The calibration phase was itself divided into two steps. The first was a familiarization step: the subject was asked to sit in front of the camera and look at different locations across the screen. The purpose of this step was to allow the user to become comfortable with the system and the tracking mechanism. The second was the calibration step, in which the subject was asked to look at a set of 9 points displayed consecutively on the screen; the subject was asked to fixate his/her gaze on each point and press the space bar to move to the next point. The calibration target points on the screen, converted into centimeters, are shown in Table 5.1.

After completion of the calibration step, a trial step was set up. A new set of target points, at different locations than the calibration points, was displayed; the user was asked to press the space bar once his/her gaze was fixed on the target point. A grid of 3x3 points was displayed for the user; Table 5.2 shows these points in centimeters.

5.3 Results

During the calibration phase, the user's eye parameters were estimated by an iterative process whose purpose was to find the lowest possible error for the best-fit eye parameters. It was observed that the error converges to a single value during the iteration process. Figure 5.3 shows the estimated point of gaze error converging to a single value during the iteration process in the calibration phase.


Test Point    X (cm)    Y (cm)
    1          0         0
    2         18.8       0
    3         37.7       0
    4          0        14.8
    5         18.8     14.8
    6         37.7     14.8
    7          0        29.6
    8         18.8     29.6
    9         37.7     29.6

Table 5.2: During the trial phase, a different grid of 3x3 points was shown to each subject. The above table shows these points in centimeters.

Figure 5.3: Estimated average gaze error converges to a single value during the calibration phase (using the new data of 8 subjects).

Figure 5.4 demonstrates that during the iteration process the angular offsets between the visual and optical axes of the user’s eye also converge to a single value.

Table 5.3 displays the estimated accuracy error for both methods; the error is reported in centimeters. The average calculated error in the new method and the old method for each test subject is summarized in Table 5.4. The accuracy was measured as the Euclidean distance between the estimated point of gaze and the actual test point.


Figure 5.4: Estimated angular offsets between the visual and optical axes (α and β) converge during the iteration process (using the new data of 8 subjects).

The average error across the 8 users, using a single eye, was found to be 1.1 cm for the new method and 1.5 cm for the old method; that is, the new system proposed in this thesis improves the accuracy of the gaze estimation by about 27% compared to the old system.

It may be noted that the estimated average error for the old data set collected by Hennessey [3] shows better results than the newly collected data set. The reason for this discrepancy is that the system calibration was modified from the original setting, due to a camera calibration problem, and the light sources were also displaced slightly from the original setting. Since the main purpose of this experiment was to demonstrate that the user calibration method proposed in this thesis improves the system accuracy compared to the old method, the system was not calibrated again after the system parameters changed.

Figure 5.5 shows the estimated average error at each test point across the screen. As shown in Figure 5.5, the improvement in system accuracy and estimated error provided by the new calibration method proposed in this thesis is more pronounced in the corners of the screen than in the middle of the screen.


Estimated Average Gaze Error
User       New System (cm)   Old System (cm)
user 1         0.710             0.773
user 2         0.530             0.593
user 3         0.672             0.952
user 4         0.719             1.022
user 5         0.571             0.671
user 6         0.658             0.724
user 7         0.523             0.672
Average        0.626             0.765

Table 5.3: The above table summarizes the average error in POG estimation. These data were collected by Hennessey [3]. In total, 7 test subjects were tested on the system. An average error for each subject was estimated using the new proposed calibration method and the old calibration method (i.e., Hennessey's method [4]).

Estimated Average Gaze Error
User       New System (cm)   Old System (cm)
user 1         1.375             1.383
user 2         1.516             1.672
user 3         1.149             1.593
user 4         0.982             1.070
user 5         0.765             1.430
user 6         1.486             1.962
user 7         0.971             1.701
user 8         0.870             1.481
Average        1.130             1.536

Table 5.4: A new set of data was collected during this experiment. In total, 8 test subjects were tested on the system. An average gaze error for each subject was estimated using the new proposed calibration method and the old calibration method.


Figure 5.5: Estimated average error across the screen at each test point.

In order to calculate the average error at each test point, the Euclidean error (the square root of the sum of the squared errors in the x and y directions) was computed for each user, and the average over the 8 subjects was calculated (equation 5.4).

TotalError_{pt_j} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{\Delta X_i^2 + \Delta Y_i^2}   (5.4)

In the above equation, TotalError_{pt_j} is the estimated error at test point j, ∆X and ∆Y are the estimated errors in the x and y directions, and N is the number of subjects used in the experiment.
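As a small MATLAB illustration of equation 5.4 (the per-subject errors below are made-up numbers, not measured data):

    % Sketch of equation 5.4: mean Euclidean error at one test point over N subjects.
    dX = [ 0.4 -0.3  0.2  0.5 -0.1  0.3 -0.2  0.1];   % x errors (cm), illustrative
    dY = [ 0.2  0.4 -0.3  0.1  0.2 -0.4  0.3 -0.1];   % y errors (cm), illustrative
    totalErr = mean(hypot(dX, dY))                    % average error at this point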


Figure 5.6: Comparison of the gaze estimation error over all test points by applying the new system and the old system to the new data set. The new system uses an aspherical model of the cornea with personalized eye parameters; the old system used a spherical model of the cornea with population averages for the eye parameters.

To illustrate the accuracy improvement of the eye gaze tracker achieved by the calibration method proposed in this thesis, the average error for each user is plotted. Figure 5.6 shows the average estimated error for users 1 to 8 using the old calibration method and the new calibration method.

5.3.1 Radius of the Corneal Curvature for Different Users

Hennessey's method uses population averages for two parameters of the eye: the radius of corneal curvature (rc) and the distance between the corneal center of curvature and the pupil center (rp). In this thesis, the variation in the radius of corneal curvature between individuals was investigated. Some current gaze-tracking systems use a population average of 7.98 mm for the corneal curvature radius, based on the assumption that the cornea is a sphere.


Cornea Curvature Radii (cm)
Test point   user 1   user 2   user 3   user 4   user 5   user 6   user 7   user 8
    1        0.905    0.76     0.882    0.65     0.825    0.801    0.645    0.67
    2        0.805    0.725    0.817    0.64     0.765    0.725    0.69     0.67
    3        0.875    0.73     0.892    0.64     0.8      0.89     0.675    0.74
    4        0.77     0.735    0.651    0.68     0.82     0.85     0.655    0.65
    5        0.805    0.715    0.797    0.64     0.75     0.775    0.67     0.61
    6        0.885    0.75     0.856    0.64     0.75     0.79     0.685    0.75
    7        0.67     0.78     0.86     0.64     0.765    0.835    0.64     0.55
    8        0.905    0.64     0.657    0.89     0.69     0.64     1.04     0.65
    9        0.75     0.65     0.961    0.66     0.64     0.66     0.655    0.77

Table 5.5: Variation of the radius of corneal curvature between test subjects. The radius of corneal curvature was estimated at each calibration point for each test subject (the new data set).

Referring to the estimated results for the two corneal models (spherical and aspherical), and comparing the average estimated error of the new calibration method with that of Hennessey's method, one can say that using an aspherical corneal model with personalized eye parameters improves the accuracy of the user's point of gaze by about 27% on the new data. The statistical analysis and comparison of these two methods in section 5.4 validates this improvement. Table 5.5 shows the variation of the radius of corneal curvature between individuals (the final numerical values reported by the search algorithm at each calibration point). One can conclude that using population averages for the radius of corneal curvature and the other eye parameters is not necessarily a valid assumption. Figure 5.7 shows the calculated average, maximum, and minimum values of the radius of corneal curvature for each test subject.

Estimation of Distance Between the Pupil and Center of Corneal Curvature

The second eye parameter estimated through the calibration phase is the distance between the center of the pupil and the center of corneal curvature, rp. Some current gaze-tracking systems use a population average of 4.2 mm for rp. Table 5.6 shows the estimated rp for each test subject, and Figure 5.8 illustrates the estimated distance between the pupil center and the corneal center of curvature.


Figure 5.7: The average, maximum, and minimum values of the radius of corneal curvature estimated for each user (the new data set).


User      rp (cm)
user 1     0.395
user 2     0.415
user 3     0.405
user 4     0.305
user 5     0.435
user 6     0.300
user 7     0.325
user 8     0.350

Table 5.6: Estimated values for rp (i.e. distance between the center of the corneal curvature and pupil center) for each test subject.

5.4 Statistical Analysis

One of the statistical analysis methods applied in this thesis is the T-test. The T-test was used to investigate the significance of the improvement in the accuracy of the gaze tracker when using the new calibration method. The T-test is the most common statistical procedure for hypothesis testing. There are several types of T-tests; in this thesis, the two-sample, or independent, T-test was used. The research hypothesis is that using personalized eye parameters based on an aspherical cornea model, with the new calibration method introduced in this thesis, improves the accuracy (reduces the error) of the eye-gaze tracking system compared to the old calibration method used by Hennessey et al. [3].

The p-value reported with the T-test represents the probability of the error involved in accepting the null hypothesis that there is no significant difference between the two error results obtained using the two calibration methods. Here we report the standard probability, which is the two-tailed test probability. By convention, if there is less than a 5% chance (Pvalue < 0.05) of no significant difference between the new and old calibration methods, the null hypothesis is rejected, and it can then be said that there is a significant difference between the two calibration methods.

The T-test analysis was initially performed on the data collected by Hennessey [3]. Table 5.7 illustrates the data distribution for each test group.
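In MATLAB, the comparison can be sketched with ttest2 from the Statistics Toolbox, here fed the per-user average errors of Table 5.4; note the thesis describes an independent-samples test, whereas a paired test on the same users would use ttest(newErr, oldErr):

    % Sketch: two-sample T-test on the per-user average gaze errors (Table 5.4).
    newErr = [1.375 1.516 1.149 0.982 0.765 1.486 0.971 0.870];  % new method (cm)
    oldErr = [1.383 1.672 1.593 1.070 1.430 1.962 1.701 1.481];  % old method (cm)
    [h, p] = ttest2(newErr, oldErr)   % h = 1 rejects the null at the 5% level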


Figure 5.8: The estimated distance between the pupil center and the corneal center of curvature (rp) for each user.

The average error in POG estimation was calculated for the new method and for Hennessey's method [4] for each test subject. The T-test result on the data collected by Hennessey is presented in Table 5.8. The estimated average error of each test subject's point of gaze was calculated for the new calibration method discussed in this thesis and for the old calibration method used by Hennessey et al. [3]. Table 5.9 illustrates the data distribution for each test group used in the T-test. The data were collected from 8 individuals; each datum is the calculated average error of one user's point of gaze estimation. One data set was obtained using the proposed new calibration method, and the other using the old calibration method [3]. Figure 5.9 depicts the estimated average error, with maximum and minimum values, for the new and old [3] calibration methods in box-and-whisker plots, while Figure 5.10 displays the estimated average error of the two calibration methods for the data collected by Hennessey [3]. Table 5.10 presents the statistical analysis and the T-test results on the data collected from the 8 different users. The estimated average error of each user's point of gaze was calculated for the new calibration method discussed in this thesis and the old calibration method used by


Hennessey's Experimental Data        New Calibration Method   Old Calibration Method
Mean of Gaze Error                   0.6261                   0.7725
Standard Deviation of Gaze Error     0.085                    0.158
Variance of Gaze Error               0.007                    0.025
Number of Observations               7                        7

Table 5.7: Calculated mean, standard deviation, and variance for the two data groups collected by Hennessey [3], calibrated using the old calibration method (group 1) and the new calibration method (group 2). The two groups were compared using the T-test.

P-value and statistical significance
    The two-tailed P-value                        0.0098

Confidence interval
    Difference between mean values                0.1464
    95% confidence interval of mean difference    0.0502 to 0.243

Intermediate values used in calculations
    T-value                                       3.72
    Degrees of freedom                            6
    Standard error of difference                  0.039

Table 5.8: T-test statistical analysis for the data collected by Hennessey [3].


New Experimental Data                New Calibration Method   Old Calibration Method
Mean of Gaze Error                   1.139                    1.537
Standard Deviation of Gaze Error     0.289                    0.263
Variance of Gaze Error               0.0834                   0.0694
Number of Observations               8                        8

Table 5.9: Calculated mean, standard deviation, and variance for the newly recorded data groups used in the T-test statistical analysis.

Figure 5.9: Comparison of the estimated error in point of gaze estimation between the new method and the old method (i.e., Hennessey's method [3]) for the new set of collected data.


Figure 5.10: Comparison of the estimated error in point of gaze estimation between the new method and the old method (i.e., Hennessey's method [3]) for the old set of collected data.


P-value and statistical significance
    The two-tailed P-value                        0.0049

Confidence interval
    Difference between mean values                −0.3975575
    95% confidence interval of mean difference    −0.6300395 to −0.1650755

Intermediate values used in calculations
    T-value                                       −4.0436
    Degrees of freedom                            7
    Standard error of difference                  0.098

Table 5.10: T-test statistical analysis for the newly collected data set.

Hennessey et al. [3]. As mentioned previously, in the T-test the P-value is used to indicate whether the null hypothesis holds or is rejected; the null hypothesis is rejected if the P-value is less than 0.05. The statistical analysis yielded P = 0.0049, a difference that by conventional criteria is considered very statistically significant. As a result, the research hypothesis is accepted: the new calibration method introduced in this thesis, based on the aspherical corneal model and personalized eye parameters, yields a statistically significant improvement in the accuracy of the user's point of gaze estimation compared to the old calibration method based on the spherical cornea model and population averages for the eye's parameters.

Chapter 6

Discussion and Conclusions

6.1 Summary of Contribution

An improvement in the accuracy of the estimated point of gaze was successfully achieved using the new calibration methodology developed in this thesis. The proposed calibration method is based on an aspherical corneal model rather than a spherical one, and on personalized eye parameters rather than population averages, for each user in POG estimation. In Chapters 2 and 3, the remote gaze-tracking system for 3D point of gaze estimation was described in detail. Villanueva et al. [16] pointed out that regardless of how many cameras or light sources are used in a gaze-tracking system, the calibration step is necessary for system accuracy. A detailed mathematical model was discussed for a video-based gaze estimation system that uses dark and bright pupil images and multiple corneal reflections extracted from images captured by the video camera. The algorithm for the new calibration method is based on an aspherical corneal model and personalized eye parameters. In Chapter 3, the system calibration was discussed in detail. The calibration phase is mainly used for estimating the user's eye parameters; the proposed method estimates them by minimizing the least-squares gaze error, as sketched below. The advantages of this least-squares calibration are its simplicity, fast acquisition time, and avoidance of complex calculations and modeling. Chapter 4 describes how the system was tested on different users, and the results were compared between the new calibration method and the old calibration method [3] based on the spherical cornea shape.
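A minimal sketch of this least-squares calibration loop is given below, assuming NumPy and SciPy. The function predict_pog is a hypothetical stand-in for the thesis's ray-traced forward model (glints and pupil to optical axis to visual axis to screen intersection); here it is a toy linear map so the example runs end to end. The starting values are typical population averages reported in the literature (rc of roughly 0.78 cm, rp of roughly 0.42 cm, angular offsets of a few degrees), which the fit then personalizes.

    # Sketch only: personalize eye parameters by minimizing the squared
    # on-screen gaze error over the known calibration targets.
    import numpy as np
    from scipy.optimize import least_squares

    def predict_pog(params, feat):
        # Hypothetical forward model. The real system ray-traces the
        # glint/pupil geometry through the corneal model; this toy
        # linear map only illustrates the fitting interface.
        return feat @ params  # feat: (2, 4) -> predicted (x, y) POG

    def residuals(params, features, targets):
        pred = np.array([predict_pog(params, f) for f in features])
        return (pred - targets).ravel()  # one (x, y) residual per target

    rng = np.random.default_rng(0)
    features = rng.normal(size=(9, 2, 4))        # 9 calibration points
    person = np.array([0.86, 0.39, 4.6, 1.2])    # synthetic "true" user values
    targets = np.array([f @ person for f in features])

    x0 = np.array([0.78, 0.42, 5.0, 1.5])        # population-average start
    fit = least_squares(residuals, x0, args=(features, targets))
    print(fit.x)  # personalized [rc, rp, alpha, beta] estimate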

6.2 Discussion

Model-based systems are designed to function remotely, without the subject physically contacting the system; in this way, the user is able to move

his/her head freely. The main objectives in model-based remote gaze-tracking systems are free head movement and system accuracy. Analysis of the full range of system configurations shows that a single camera with multiple light sources is the simplest configuration that can be used to estimate the point of gaze in 3D space [61]. The system presented in this research project uses only the pupil, the corneal reflections, and a one-time per-user calibration process to estimate the point of gaze. Bohme et al. [17] established that system accuracy drops as the point of gaze moves toward the corners of the screen. The reason is that the glints on the corneal surface move toward the boundary between the sclera and the cornea; the cornea flattens as one moves from its apex toward its periphery, so the variation in corneal curvature is more pronounced there, and using population averages for the curvature of the cornea amplifies the error in gaze estimation. During the calibration phase, the user is asked to fixate his/her gaze on known targets displayed consecutively on the screen. Using the data acquired from the calibration, the user's eye parameters (e.g., the radii of corneal curvature, the distance between the pupil center and the center of corneal curvature, and the angular offsets between the visual and optical axes) are estimated. The visual axis of the eye and the point of gaze can be estimated after completing the one-time per-user calibration phase. The advantages of the proposed calibration method are simplicity, fast acquisition, and accuracy. One other approach that was investigated in this thesis to estimate the user's eye parameters was also discussed in Chapter 3. Different methods are used to study and estimate the corneal geometry and eye parameters. Corneal topography is the most common method for measuring corneal geometry: a set of concentric illuminating rings is projected onto the eye at close distance, and the distortion of the rings is used to analyze and estimate the corneal surface and curvature [64]. The asphericity and curvature of the cornea are also measured using a videokeratoscope [40]; the details of keratoscopy are discussed in Appendix A. A Placido disk is another method used to study the geometry of the cornea: the deviation in the orientation of the rings reflected from the corneal surface relative to the projected rings is used to estimate and calculate the eye's parameters. A stereo camera with a Placido ring was investigated to track and measure eye parameters, but this method was not successful due to the limitations discussed in Section 4.3.2. Chapter 4 compares the new calibration methodology to the old calibration

method. An experiment was performed on the model-based gaze tracker using the new and old calibration methods for different subjects. The averages of the estimated error for both calibration systems are shown in Table 5.3. Comparison of our calibration system with the old calibration system developed by Hennessey et al. [3] reveals improved accuracy with our system: using only one eye to estimate the point of gaze, the proposed new calibration method improved the system accuracy by 35% compared to the old system, with an estimated average error of about 1.4 cm for the old calibration method versus 0.87 cm for the new system. We also compared our calibration system with the model-based method developed by Shih et al. [36], who reported an accuracy of 1° using a stereo camera and a spherical cornea model with personalized corneal curvature; the estimated error using our new calibration process was lower (0.8°–0.9°). The other advantages of the system proposed in this thesis compared to Shih's system are the use of a single camera, simplicity of calculations, greater robustness to head movements, and recognition of the variation between different users' eye characteristics. The T-test was performed to investigate whether there is a statistically significant difference between the new and old calibration methods. For 8 different subjects, the calculated average errors of the two methods were compared in a T-test. The values obtained were P = 0.0049 for the newly collected data and P = 0.0098 for the data collected by Hennessey [3]. It was shown that there is a statistically significant difference in the point of gaze error between the two calibration methods.

6.3 Conclusions and Future Work

An eye tracker is an input device used as a means of human-computer interaction. Remote eye-gaze tracking requires the ability to handle variation between different subjects' eye characteristics while giving users the flexibility to move their heads. System calibration and accuracy remain among the main challenges that eye-gaze trackers face. A new calibration method was successfully introduced in this thesis that fits an individual's eye parameters within a reasonable range of anatomical variation. This method delivers significant improvements for an eye tracker that permits user head movements, and the system is also able to accurately detect variations between different users' eye characteristics. One advantage of the new calibration method is that it can be easily integrated into existing eye-gaze tracking systems.


Compared to the previous calibration method, the new calibration method yields a 27% improvement. Bohme et al. [17] showed in their simulator that the accuracy of the system decreases as the user's gaze moves toward the corners of the screen. One reason for this effect is that the cornea flattens from its apex toward its periphery, so the variation in corneal curvature is more pronounced for gaze points near the corners of the screen; a standard way to express this flattening analytically is given below. As shown, the new calibration method is more accurate in point of gaze estimation at the corners of the screen than Hennessey's method [3], and this improvement in the point of gaze error was found to be statistically significant over the old method [3].
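For reference, this peripheral flattening can be written in the closed form commonly used in the corneal-shape literature (e.g., [40, 42]): modeling the cornea as a conicoid with apical radius R and asphericity Q, the sag z at radial distance ρ from the apex is

\[ z(\rho) = \frac{\rho^{2}}{R\left(1 + \sqrt{1 - (1+Q)\,\rho^{2}/R^{2}}\right)} \]

For Q = 0 this reduces to a sphere of radius R, while Q < 0 (population means reported in the literature are roughly −0.2 to −0.3) gives a prolate surface whose local radius of curvature grows away from the apex, which is exactly the flattening that penalizes a fixed spherical radius near the screen corners.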

6.3.1 Future Work

Using an aspherical corneal model with personalized eye parameters could be developed further by investigating more sophisticated personal calibration. Corneal geometry and topology could be estimated by using more calibration targets on the screen and fixing the user's head during the calibration phase; in this way, the system would be better able to determine the actual position and orientation of the user's eye during the calibration and tracking phases. Furthermore, with more calibration points, a geometrical model of the corneal surface could be derived. The overall system accuracy may be improved by developing a more advanced eye parameter estimation algorithm that more directly compensates for possible sources of error (e.g., partial eyelid closure, blinks, or large head movements during the calibration phase). As discussed in previous chapters, Gullstrand's eye model [18] uses population averages for the eye's parameters. Three eye parameters of the simplified eye model used by gaze-tracking systems to estimate the user's POG are the radius of corneal curvature (rc), the distance between the pupil center and the corneal center of curvature (rp), and the index of refraction of the aqueous humor (nc). In this thesis, the effects of the variation of rc and rp between test subjects on gaze-tracking accuracy were investigated. The effect of the variation of nc from the population average between different individuals could also be studied, and a method of personalizing nc during the calibration phase could be investigated. In this thesis, in order to estimate the radius of corneal curvature, the two glints closest to the pupil were selected, based on the assumption that the center of corneal curvature is the same for both glints (Equation 3.8). The validity of this assumption was not tested in this work; one way to address it in future work is to use two cameras and a single light source to


estimate the radius of corneal curvature (rc) for that glint. Finally, the calibration method proposed in this thesis uses only one eye at a time and applies personalized eye parameters to calculate the point of gaze (POG). The system accuracy in POG estimation could be improved by using both eyes and averaging the POG estimated for each eye to calculate the user's point of gaze.

References

[1] Carlos Hitoshi Morimoto and Marcio R. M. Mimica. Eye gaze tracking techniques for interactive applications. Computer Vision and Image Understanding, 98:4–24, 2005.

[2] David A. Goss and Roger W. West. Introduction to the Optics of the Eye. Butterworth-Heinemann, 2001.

[3] C. Hennessey, B. Noureddin, and P. Lawrence. Fixation precision in high-speed noncontact eye-gaze tracking. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 38(2):289–298, April 2008.

[4] Craig Hennessey, Borna Noureddin, and Peter Lawrence. A single camera eye-gaze tracking system with free head motion. In Proceedings of the Symposium on Eye Tracking Research & Applications (ETRA), pages 87–94, 2006.

[5] Michael Platz, James K. Y. Ong, and Thomas Haslwanter. Optimizing video-oculography systems by simulating the effect of slippage artifacts. In Proceedings of the 21st European Modeling and Simulation Symposium (EMSS 2009), 2009.

[6] Sami G. El Hage and Norman E. Leach. Tangential or sagittal dioptric plots: is there a difference? International Contact Lens Clinic, 26(2):39–45, 1999.

[7] Takashi Nagamatsu, Yukina Iwamoto, Junzo Kamahara, Naoki Tanaka, and Michiya Yamamoto. Gaze estimation method based on an aspherical model of the cornea. Pages 255–258, 2010.

[8] Andrew Duchowski. Eye Tracking Methodology: Theory and Practice. Springer, London, 2007.

[9] Colin Ware and Harutune H. Mikaelian. An evaluation of an eye tracker as a device for computer input. Pages 183–188, 1987.


[10] R. S. Allison, M. Eizenman, and B. S. K. Cheung. Combined head and eye tracking system for dynamic testing of the vestibular system. IEEE Transactions on Biomedical Engineering, 43(11):1073–1082, November 1996.

[11] Stefano Arca, Giuseppe Boccignone, Paola Campadelli, Elena Casiraghi, Raffaella Lanzarotti, Giuseppe Lipori, Gabriele Lombardi, and Alessandro Rozza. From patterns to behaviors: Research activities at LAIV. LAIV, Dipartimento di Scienze dell'Informazione.

[12] P. A. Wetzel, G. Krueger-Anderson, C. Poprik, and P. Bascom. An eye tracking system for analysis of pilots' scan paths. United States Air Force Armstrong Laboratory, Tech. Rep., April 1997.

[13] Narges Afsham, Mohammad Najafi, Purang Abolmaesumi, and Robert Rohling. 3D ocular ultrasound using gaze tracking on the contralateral eye: A feasibility study. In Gabor Fichtinger, Anne Martel, and Terry Peters, editors, Medical Image Computing and Computer-Assisted Intervention (MICCAI 2011), volume 6891 of Lecture Notes in Computer Science, pages 65–72. Springer Berlin/Heidelberg, 2011.

[14] G. Lohse. Consumer eye movement patterns on Yellow Pages advertising. Journal of Advertising, 26:61–73, 1997.

[15] D. W. Hansen and Qiang Ji. In the eye of the beholder: A survey of models for eyes and gaze. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(3):478–500, March 2010.

[16] A. Villanueva and R. Cabeza. A novel gaze estimation system with one calibration point. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 38(4):1123–1138, August 2008.

[17] M. Böhme, M. Dorr, and M. Graw. A software framework for simulating eye trackers. 2008.

[18] Allvar Gullstrand. Procedure of the rays in the eye; imagery laws of the first order. In H. von Helmholtz and J. P. C. Southall, editors, Helmholtz's Treatise on Physiological Optics, volume 1, pages 301–358. The Optical Society of America, 1924.

[19] James P. Ivins and John Porrill. A deformable model of the human iris for measuring small three-dimensional eye movements. Machine Vision and Applications, 11:42–51, 1998.


[20] Dan Witzner Hansen and Arthur E. C. Pece. Eye tracking in the wild. Computer Vision and Image Understanding, 98:155–181, April 2005.

[21] A. Sugioka, Y. Ebisawa, and M. Ohtani. Noncontact video-based eye-gaze detection method allowing large head displacements. In Proceedings of the 18th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2:526–528, 1996.

[22] T. E. Hutchinson, K. P. White, W. N. Martin, K. C. Reichert, and L. A. Frey. Human-computer interaction using eye-gaze input. IEEE Transactions on Systems, Man and Cybernetics, 19(6):1527–1534, 1989.

[23] A. T. Duchowski. A breadth-first survey of eye-tracking applications. Behavior Research Methods, Instruments, & Computers, 34(4):455–470, November 2002.

[24] L. A. Frey, K. P. White, Jr., and T. E. Hutchinson. Eye-gaze word processing. IEEE Transactions on Systems, Man and Cybernetics, 20(4):944–950, July/August 1990.

[25] Robert J. K. Jacob. Eye movement-based human-computer interaction techniques: Toward non-command interfaces. Pages 151–190, 1993.

[26] Akira Tomono, Muneo Iida, and Yukio Kobayashi. A TV camera system which extracts feature points for noncontact eye-movement detection. In Proceedings of SPIE: Optics, Illumination, and Image Sensing for Machine Vision IV, 1194:2–12, 1989.

[27] Laurence Young and David Sheena. Survey of eye movement recording methods. Behavior Research Methods, 7:397–429, 1975.

[28] David A. Robinson. A method of measuring eye movement using a scleral search coil in a magnetic field. IEEE Transactions on Bio-medical Electronics, 10:137–145, October 1963.

[29] L. H. Yu and M. Eizenman. A new methodology for determining point-of-gaze in head-mounted eye tracking systems. IEEE Transactions on Biomedical Engineering, 51(10):1765–1773, October 2004.

[30] Dave M. Stampe. Heuristic filtering and reliable calibration methods for video-based pupil-tracking systems. Behavior Research Methods, Instruments, & Computers, 25(2):137–142, 1993.


[31] Z. R. Cherif, A. Nait-Ali, J. F. Motsch, and M. O. Krebs. An adaptive calibration of an infrared light device used for gaze tracking. 2:1029–1033, 2002.

[32] Robert J. K. Jacob. The use of eye movements in human-computer interaction techniques: What you look at is what you get. ACM Transactions on Information Systems, 9:152–169, 1991.

[33] C. Hennessey and P. Lawrence. Noncontact binocular eye-gaze tracking for point-of-gaze estimation in three dimensions. IEEE Transactions on Biomedical Engineering, 56(3):790–799, March 2009.

[34] David Beymer and Myron Flickner. Eye gaze tracking using an active stereo head. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2:451, 2003.

[35] Takehiko Ohno, Naoki Mukawa, and Atsushi Yoshikawa. FreeGaze: a gaze tracking system for everyday gaze interaction. Pages 125–132, 2002.

[36] Sheng-Wen Shih and Jin Liu. A novel approach to 3-D gaze tracking using stereo cameras. IEEE Transactions on Systems, Man, and Cybernetics, 34:234–245, 2004.

[37] Yves Le Grand. Light, Colour and Vision. Chapman and Hall, London, second English edition, 1968.

[38] Alfred L. Yarbus. Eye Movements and Vision. Plenum Press, New York, 1967.

[39] William J. Benjamin and William M. Rosenblum. Radii of curvature and sagittal depths of conic sections. International Contact Lens Clinic, 19(3-4):76–83, 1992.

[40] William A. Douthwaite. The asphericity, curvature and tilt of the human cornea measured using a videokeratoscope. Ophthalmic and Physiological Optics, 23:141–150, 2003.

[41] Y. Mejía-Barbosa and D. Malacara-Hernández. A review of methods for measuring corneal topography. Optometry and Vision Science, 78(4):240–253, 2001.


[42] Patricia M. Kiely, George Smith, and Leo G. Carney. Meridional variations of corneal shape. American Journal of Optometry and Physiological Optics, 61:619–626, 1984.

[43] Marco A. Rosales, Montserrat Juárez-Aubry, Estela López-Olazagasti, Jorge Ibarra, and Eduardo Tepichín. Anterior corneal profile with variable asphericity. Applied Optics, 48(35):6594–6599, December 2009.

[44] D. H. Szczesna and H. T. Kasprzak. The modelling of the influence of a corneal geometry on the pupil image of the human eye. Optik - International Journal for Light and Electron Optics, 117(7):341–347, 2006.

[45] Malek Adjouadi, Anaelis Sesin, Melvin Ayala, and Mercedes Cabrerizo. Remote eye gaze tracking system as a computer interface for persons with severe motor disability. 3118:628–628, 2004.

[46] Yong-Moo Kwon and Kyeong-Won Jeon. Gaze computer interaction on stereo display. 2006.

[47] Henry Burek and W. A. Douthwaite. Mathematical models of the general corneal surface. Ophthalmic and Physiological Optics, 13(1):68–72, 1993.

[48] David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach. Prentice Hall, 2003.

[49] Sheng-Wen Shih, Yu-Te Wu, and Jin Liu. A calibration-free gaze tracking technique. In Proceedings of the International Conference on Pattern Recognition, 4:201–204, 2000.

[50] E. D. Guestrin and M. Eizenman. General theory of remote gaze estimation using the pupil center and corneal reflections. IEEE Transactions on Biomedical Engineering, 2006.

[51] Dong Hyun Yoo and Myung Jin Chung. A novel non-intrusive eye gaze estimation using cross-ratio under large head motion. Computer Vision and Image Understanding, 98:25–51, April 2005.

[52] B. Noureddin, P. D. Lawrence, and C. F. Man. A non-contact device for tracking gaze in a human computer interface. Computer Vision and Image Understanding, 98:52–82, April 2005.

[53] Carlos Hitoshi Morimoto, David Koons, Arnon Amir, and Myron Flickner. Pupil detection and tracking using multiple light sources. Image and Vision Computing, 18:331–335, 2000.

[54] C. A. Hennessey and P. D. Lawrence. Improving the accuracy and reliability of remote system-calibration-free eye-gaze tracking. IEEE Transactions on Biomedical Engineering, 56(7):1891–1900, July 2009.

[55] D. Model and M. Eizenman. User-calibration-free remote gaze estimation system. 2010.

[56] R. S. Longhurst. Geometrical and Physical Optics. 1973.

[57] The sclera. http://en.wikipedia.org/wiki/Sclera.

[58] T. Nagamatsu, J. Kamahara, and N. Tanaka. 3D gaze tracking with easy calibration using stereo cameras for robot and human communication. Pages 59–64, August 2008.

[59] Roger H. S. Carpenter. Movements of the Eyes. Pion Ltd, second edition, 1988.

[60] Robert J. K. Jacob. Eye tracking in advanced interface design. In Virtual Environments and Advanced Interface Design, pages 258–304. Oxford University Press, New York, NY, USA, 1995.

[61] E. Guestrin, M. Eizenman, J. Kang, and E. Eizenman. Analysis of subject-dependent point-of-gaze estimation bias in the cross-ratios method. ACM, pages 237–244, 2008.

[62] Thor Eysteinsson, Fridbert Jonasson, Hiroshi Sasaki, Arsaell Arnarsson, Thordur Sverrisson, Kazuyuki Sasaki, Einar Stefánsson, and the Reykjavik Eye Study Group. Central corneal thickness, radius of the corneal curvature and intraocular pressure in normal subjects using non-contact techniques: Reykjavik Eye Study. Acta Ophthalmologica Scandinavica, 80(1):11–15, 2002.

[63] Alastair G. Gale. A note on the remote oculometer technique for recording eye movements. Vision Research, 22(1):201–202, 1982.

[64] Stephen D. Klyce. Computer-assisted corneal topography: high-resolution graphic presentation and analysis of keratoscopy. 1983.

[65] Mark A. Halstead, Brian A. Barsky, Stanley A. Klein, and Robert B. Mandell. Reconstructing curved surfaces from specular reflection patterns using spline surface fitting of normals. In Proceedings of the 23rd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH '96), 30:335–342, 1996.

Appendix A

Appendix

A.1 Purkinje Images

When light hits a refracting surface, no matter how transparent or smooth the surface is, a portion of the light reflects back from the surface. The images of a light source reflected from the refracting surfaces of the eye are known as Purkinje images [2]. Purkinje images are numbered I, II, III, and IV, depending on the order, from the front of the eye to its rear, of the surface at which the reflection happens. Most gaze trackers use Purkinje image I as a reference point. Purkinje image I is formed by reflection from the anterior surface of the cornea, while Purkinje image II is formed by reflection from the posterior surface of the cornea; Purkinje images III and IV are formed by the anterior and posterior surfaces of the crystalline lens, respectively, as shown in Figure A.1 [2]. The locations of the Purkinje images for the Gullstrand schematic eye are given in Table A.1.

A.2 Optics of a Keratometer

Keratometry is based on the principle that the cornea acts as a convex mirror: when light from an object is reflected at the corneal surface, the size of the reflected image is a function of the radius of curvature of the cornea (the reflecting surface) [2]. This relationship can be determined by finding the

Purkinje Image   Gullstrand Location (mm)
I                 3.84
II                3.75
III              10.54
IV                4.16

Table A.1: Locations of the Purkinje images for the Gullstrand schematic eye.


Figure A.1: Light rays striking the surface of the eye produce four reflections, called Purkinje images.

magnification m, defined as the ratio of the image size to the object size (h'/h), and by using Newton's equation, which states that the magnification of a reflected image is equal to the focal length of the reflecting surface divided by the distance from the object to the focal point:

\[ m = \frac{h'}{h} = \frac{f}{x} \qquad \text{(A.1)} \]

Figure A.2 displays these distances. It should be noted that in keratometry the distance between the object and the anterior surface of the cornea is quite long relative to the focal length of the anterior corneal surface; therefore, the virtual image formed by reflection from the anterior corneal surface is very close to the focal point F of the corneal surface. As a result, d, the distance between the object and the image formed by reflection from the corneal surface, is very close to x. Substituting d for x, we get approximately:

\[ \frac{h'}{h} \approx \frac{f}{d} \qquad \text{(A.2)} \]

Using the fact from basic mirror optics that the focal length of a mirror is equal to half its radius of curvature, the approximation becomes:



Figure A.2: Schematic of the image-formation principle used in keratometry, showing the object height h, image height h', focal point F, center of curvature C, focal length f, radius r, and distances d and x.


\[ \frac{h'}{h} \approx \frac{r}{2d} \qquad \text{(A.3)} \]

therefore,

\[ r = 2d\,\frac{h'}{h} = 2dm \qquad \text{(A.4)} \]

This equation is known as the approximate keratometer equation. If the operator keeps the image reflected from the patient's cornea in focus, the distance d is constant. In most keratometers available today, the object size h is constant, and the radius of curvature r can be calculated by measuring the image size h'.
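As a small worked example of Equation (A.4), with illustrative (not measured) values: a mire of size h = 60 mm, a fixed object-to-image distance d = 75 mm, and a measured image size h' = 3 mm give m = 0.05 and hence r = 2dm = 7.5 mm, a typical anterior corneal radius. The same computation in Python:

    # Approximate keratometer equation, r = 2*d*m, with illustrative values.
    h = 60.0      # object (mire) size, mm
    h_img = 3.0   # reflected image size h', mm
    d = 75.0      # object-to-image distance, mm (held fixed in the instrument)

    m = h_img / h   # magnification
    r = 2 * d * m   # corneal radius of curvature, mm
    print(f"r = {r:.2f} mm")  # -> r = 7.50 mm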

A.3 Keratoscopy

Keratoscopy is a method mostly used to assess the curvature and topography of the anterior surface of the cornea. As mentioned in the previous section, keratometry only measures the radius of corneal curvature under the assumption that the cornea is a sphere; keratoscopy, however, can evaluate almost the entire cornea as well as its asphericity. The clinical uses of keratoscopy include the fitting of contact lenses and the monitoring of changes in the anterior corneal surface due to injury or anterior segment surgery. In keratoscopy, a pattern of alternating black and white concentric circles, often referred to as a Placido disk after the nineteenth-century Portuguese oculist, is shone onto the anterior surface of the cornea. A viewing system observes and records the image reflected from the corneal surface. The viewing system can be as simple as a hole in the Placido disk through which the examiner views the reflected image; such a system provides only a qualitative assessment of corneal topography. Corneal astigmatism can be observed through distortion of the image reflected from the corneal surface, but the examiner can only detect severe corneal distortion or a large amount of corneal astigmatism. A more sophisticated system that provides a more quantitative assessment of corneal topography is the photo-keratoscope, which uses film or a CCD chip to capture images for subsequent measurement. Keratoscopy is based on the principle that the size of the reflected image depends on the radius of curvature of the reflecting surface: if the circles on the Placido disk are used as objects, then the farther a given point on the reflected image is from the center of the entire reflected image, the greater the radius of curvature at the position where the reflection occurred.

Appendix B

Appendix

B.1 Stereo Camera

As mentioned before, a Placido disk was used in order to measure corneal curvature with a stereo camera. A Placido disk is a circular or rounded flat plate, as shown in Figure B.1.

In corneal topography, a three-dimensional map of the cornea is defined for examination, diagnosis, and treatment. Placido disk topography is based on the reflection principle: a Placido disk is projected onto the cornea, and the images of the disk reflected off the corneal surface are captured. The corneal surface can then be studied by interpreting the reflected pattern of concentric rings as a contour map of surface elevation; the reflected mires appear closer together on steeper parts of the cornea. The algorithm proposed by Halstead et al. [65] was used to reconstruct the corneal surface. It starts with an initial model of the corneal surface, and ray tracing is performed from the image to the model and from the model to the rings, following the law of reflection. Figure B.3 also shows the accuracy variation of the Bumblebee2 stereo camera from Point Grey Research that

Figure B.1: A Placido disk is used to study the corneal surface. The reflection of the image from the cornea causes the concentric lines to deviate where there is irregularity of the corneal surface.


Figure B.2: A stereo camera with camera field of view θ; the overlap between the two cameras' fields of view is the binocular field of view.

was used in this experiment. As can be seen from this graph, the accuracy of the system improves as the distance to the camera increases. This fact contributes to the limitation of using cameras to estimate the geometry of the cornea and the eye's parameters. The stereo camera used for measuring the eye's features had the following specifications:

• Bumblebee 2 stereo camera

• Focal length: 4 mm

• Vertical angle of view: 51.9°

• Horizontal angle of view: 68.8°

• HFOV: 70°

These specifications applied to the binocular system that was used. We encountered issues such as low accuracy in depth detection for close objects and an inability to detect objects with little or no texture. A stereo camera that performs stereo matching both vertically and horizontally would be better able to measure distance and depth; since the Bumblebee has only horizontally offset cameras, it is sensitive mainly to vertical features in the scene. In addition, since the baseline of the Bumblebee is 12 cm, the camera performs better at farther distances


Figure B.3: Bumblebee 2 accuracy variation at short distances from the camera. As shown in the graph, at closer range to the camera the accuracy of the system drops sharply. (Graph provided by Point Grey Research.)


Figure B.4: Overview of the stereo camera field of view. In order to get an accurate image of the cornea, the user needs to sit close to the camera, which forces the user to fall outside the region where the two cameras' fields of view overlap.

rather than at closer range. The distance between the centers of the pupils of the two eyes is called the pupillary distance (PD); the typical pupillary distance for adults is about 54–68 mm [2]. The eyes are small features compared to the background of the image. In addition, since the eyes are close to the camera relative to the features in the background, we should expect occlusion in the image: some parts of the image may not be visible to both cameras. The stereo algorithm will call these points invalid and reject them, which is another reason for poor stereo matching and disparity computation on the eye features. A further issue is the narrow field of view (70°) of each camera. Because the eyes are small compared to the other features in the image, and the change in corneal curvature between individuals is very small (in the millimeter or micrometer range), the user needs to be close to the stereo camera; this forces the user to fall outside of the region where the two cameras' fields of view overlap, as shown in Figure B.4.

The stereo matching algorithm takes a small fixed window in one image (P1) and shifts it across the second image (P2) by integer increments along the epipolar line, generating an array of correlation scores. To validate this array, the roles of the images are reversed: a similar fixed window is taken in image P2 and shifted along the epipolar line in image P1. The correlation is thus performed twice, and a match is considered valid if and only if the same depth is measured at the corresponding points in both passes. The issue observed with this algorithm in eye-feature tracking and corneal depth measurement was that, with the eyes close to the camera, some portion of the eye features or face could be occluded, or could occupy fewer pixels in one camera than in the other for the same eye; such regions are considered invalid and are rejected in the validation test. A sketch of this validation scheme is given below.
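A minimal sketch of this left-right consistency check, assuming rectified grayscale images as NumPy arrays; it is illustrative only, not the Bumblebee pipeline, and the window and search parameters are arbitrary. For rectified pairs, a left-image pixel at column x matches the right image at x − d, so the reverse pass searches in the opposite direction.

    import numpy as np

    def best_disparity(ref, tgt, y, x, half, max_disp, sign):
        # SAD-minimizing disparity for the window at (y, x) in `ref`,
        # searching `tgt` at column x + sign*d along the same epipolar row.
        win = ref[y - half:y + half + 1, x - half:x + half + 1].astype(np.int32)
        best_d, best_cost = 0, np.inf
        for d in range(max_disp + 1):
            xc = x + sign * d
            if xc - half < 0 or xc + half + 1 > tgt.shape[1]:
                break  # candidate window would leave the image
            cand = tgt[y - half:y + half + 1,
                       xc - half:xc + half + 1].astype(np.int32)
            cost = int(np.abs(win - cand).sum())
            if cost < best_cost:
                best_d, best_cost = d, cost
        return best_d

    def lr_consistent_disparity(left, right, y, x, half=3, max_disp=64, tol=1):
        # Disparity at interior pixel (y, x), or None when the left-right
        # check rejects it (e.g., in occluded regions, as described above).
        d_lr = best_disparity(left, right, y, x, half, max_disp, sign=-1)
        # Reverse the roles of the images and match back.
        d_rl = best_disparity(right, left, y, x - d_lr, half, max_disp, sign=+1)
        return d_lr if abs(d_lr - d_rl) <= tol else None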
