Charles University in Prague

Faculty of Mathematics and Physics

BACHELOR THESIS

Jozef Jirásek

Sledování očních pohybů ve virtuální realitě

Eye Tracking in Virtual Reality

Department of Software and Computer Science Education

Supervisor of the bachelor thesis: Mgr. Cyril Brom, Ph.D.

Study programme: Informatics

Specialization: Programming

Prague 2011

I would like to thank my advisor Mgr. Cyril Brom, Ph.D. for providing guidance in my work, and for allowing me to utilize an office at the Department of Software and Computer Science Education to develop and test the software. I would also like to thank Ing. Vratislav Fabián, Ing. Marcela Fejtová, and Ing. Jiří Moučka from Medicton Group Ltd. for their assistance in solving technical problems with the I4Control device. Further thanks go to Mgr. Iveta Fajnerová and Mgr. Kamil Vlček, Ph.D. from the Institute of Physiology of the Academy of Sciences of the Czech Republic for providing me with details about spatial navigation experiments and for help with testing my software during the experiments. Last, I would like to thank Mgr. Cyril Brom, Ph.D. and RNDr. Galina Jirásková, CSc. for proofreading my work.

I declare that I carried out this bachelor thesis independently, and only with the cited sources, literature, and other professional sources.

I understand that my work relates to the rights and obligations under the Act No. 121/2000 Coll., the Copyright Act, as amended, in particular the fact that the Charles University in Prague has the right to conclude a license agreement on the use of this work as a school work pursuant to Section 60 paragraph 1 of the Copyright Act.

In Prague date ......

Názov práce: Sledování očních pohybů ve virtuální realitě

Autor: Jozef Jir´asek

Katedra: Kabinet software a výuky informatiky

Vedúci bakalárskej práce: Mgr. Cyril Brom, Ph.D.

Abstrakt:

V tejto práci prezentujeme aplikáciu pre sledovanie a ukladanie dát o pohybe ľudského oka pri sledovaní obrazovky počítača. Používame dve komerčne dostupné zariadenia: I4Control od firmy Medicton Group s.r.o. a TrackIR 4 od firmy NaturalPoint. Náš software komunikuje so systémom SpaNav používaným na výskum priestorovej orientácie. Tiež poskytujeme možnosť náš software jednoducho rozšíriť a použiť pri vývoji iných aplikácií, ktoré sledujú očné pohyby.

Kľúčové slová:

Sledovanie pohľadu, priestorová navigácia, interakcia človeka s počítačom.

Title: Eye Tracking in Virtual Reality

Author: Jozef Jirásek

Department: Department of Software and Computer Science Education

Supervisor: Mgr. Cyril Brom, Ph.D.

Abstract:

In this work we present an application for observing and recording data about movements of a human eye when looking at a computer screen. We use two commercially available devices: I4Control by Medicton Group Ltd. and TrackIR 4 by NaturalPoint. We build a software package which interfaces with the SpaNav system for cognitive research. We also provide an extensible framework for creating other eye tracking applications.

Keywords:

Eye tracking, head tracking, spatial navigation, human-computer interaction.

Contents

Introduction

1 Eye Tracking Overview

2 Our Work

3 Technologies Used
3.1 SpaNav
3.2 Eyelink II
3.3 I4Control
3.4 TrackIR 4

4 Design Considerations
4.1 Design goals
4.2 Software architecture overview

5 Implementation
5.1 Communication
5.2 Tracking
5.2.1 Head tracking and the center point
5.2.2 Eye tracking and the delta function
5.3 Calibration
5.3.1 Calibrating the head tracker
5.3.2 Calibrating the eye tracker
5.4 Data recording
5.5 EyeTracking clients
5.5.1 CalibrationWorker
5.5.2 TestWorker
5.5.3 UTWorker
5.6 User Interface

6 Testing

7 Conclusions

Bibliography

Introduction

Eye tracking is the technology of observing the movements of a human eye. This is usually achieved by recording the eye using video or infrared cameras and analyzing the recorded data. Eye tracking has applications in cognitive studies, assistive technologies, design, advertisement, entertainment, and other areas. Basic information about eye tracking can be found in [Duchowski(2007)].

In this work, we build upon the thesis of Ivana Šupalová [Šupalová(2009)]. In her work, she built a virtual environment SpaNav for running spatial navigation experiments. SpaNav presents a person with a simple virtual reality scenario and observes and records their eye movements as they try to solve this scenario. Virtual reality is often used in such experiments, as a virtual scenario is usually less expensive to create, maintain, and modify than a real physical environment. Virtual reality has its limits, but with current technology it is usable for the purpose of such an experiment.

The SpaNav software works with the device Eyelink II developed by SR Research Ltd. [SR Research Ltd.(2010)] This is a professional apparatus which comes with hardware and software performing all stages of the eye tracking process. Unfortunately, the high cost of this device is prohibitive to small research teams. Therefore, we have been tasked with developing a solution which would perform eye tracking measurements using less expensive commercially available hardware and software.

We use the eye tracker I4Control by Medicton [Medicton group(2008)], which can gather eye movement data. Since the subject's head can move, we also need to collect data about the head's movement. We use the head tracker TrackIR 4 by NaturalPoint [NaturalPoint(2008)]. Our application combines data from these two devices and uses them to calculate the position on the computer screen where the observed subject is looking. This information is then sent to the SpaNav system, which is used to perform the actual experiments.

In addition to interfacing with SpaNav, we provide an extensible framework for applications which use eye tracking data. Programmers can easily extend our software to develop their own applications and use our eye tracking measurements.

1. Eye Tracking Overview

Eye tracking is the process of measuring movements of the human eye. There are two main subtasks involved. First, we need to determine the gaze direction of the subject. This is the vector in three-dimensional space in the direction the subject is looking, relative to the position and orientation of the subject's head. The second step is computing the intersection of this vector with the observed scene; this intersection is called the gaze point.

Methods of determining the gaze direction can be divided into two groups: intrusive and non-intrusive. An example of an intrusive method is attaching special contact lenses to the subject's eye and tracking movement of these lenses [Huey(1908)]. Another intrusive method, called electrooculography, uses electrodes placed around the eye to measure eye muscle movement [Bulling et al.(2011)]. Non-intrusive methods use a video camera (typically in the infrared spectrum) to capture movements of the eye. Images from this camera are then analyzed to determine the gaze direction. In general, non-intrusive methods are cheaper and easier to set up and use, but their results are less accurate.

Both styles of tracking can be further subdivided into online and offline methods. Offline methods only gather data during the experiment and analyze it later. Online methods analyze data and compute the gaze point while the measurement is being performed. Due to computational complexity, online methods are usually limited to lower frame rates and accuracy than offline methods. However, online methods give valuable feedback to both the experimenter and the subject in real time, such as whether the eye tracking device is working properly.

Eye tracking has many applications in different areas of science and industry. Psychologists study the way we look at images or scenes when asked to perform various tasks [Yarbus(1967)]. Advertisers and web page designers are interested in knowing which parts of their product attract the most attention [Goldberg et al.(2002)]. Movements of the eye can also be used in assistive technologies to allow people who can not use their hands to control a computer or another device [ICT Results(2009)]. Recently, as eye tracking systems are becoming more affordable, eye tracking has also found its way into the entertainment and video gaming industry [Sundstedt(2010)].

In the remainder of this work, we will focus on eye tracking for physiological experiments, in particular experiments concerned with spatial navigation. This has several important implications. First, the measurement has to be sufficiently accurate. In an experiment with SpaNav, we want to know which object the subject is looking at during their decision process. The objects' size usually varies between 100 and 300 pixels on the computer screen, which translates to roughly 4 to 10 degrees of required accuracy when computing the gaze vector.

Another important fact is that we do not receive any immediate feedback about the accuracy of the measurement. For instance, in assistive technologies, when the subject controls movement of the mouse cursor using eye movement, we can immediately see when the measurements are inaccurate: the cursor does not point to the area the subject is looking at. In our applications, we have no such feedback during an experiment.

2. Our Work

In this work we present a software package EyeTracking which performs eye tracking using commercially available hardware. This software was developed in collaboration with the Institute of Physiology of the Academy of Sciences of the Czech Republic. One of the research areas of the Institute is spatial navigation. This research can help, for instance, in predicting Alzheimer's disease in test subjects. The experiments are performed in a virtual environment created on a personal computer. These experiments also include measuring eye movement data. The software package used to create the environment and conduct experiments is called SpaNav and has been developed by Ivana Šupalová in her master's thesis [Šupalová(2009)].

The SpaNav software works with the device Eyelink II developed by SR Research Ltd. [SR Research Ltd.(2010)] This is a professional apparatus which comes with hardware and software performing all stages of the eye tracking process. Unfortunately, the high cost of this device is prohibitive to small research teams. Therefore, we have been tasked with developing a solution which would perform eye tracking measurements using less expensive commercially available hardware and software.

Figure 2.1: The Eyelink II device. Image from [SR Research Ltd.(2010)].

We use the I4Control system by Medicton Group Ltd. [Medicton group(2008)] to measure eye movements. However, this is not enough to calculate the gaze point. I4Control only returns the position of the center of the pupil relative to the eye tracker camera, which is mounted on the subject's head. If the subject moves their head to face another point, we would not notice the difference using only an eye tracker. This means that we need another device to measure head movements. We use TrackIR 4 by NaturalPoint [NaturalPoint(2008)] to measure head position and orientation. Our software collects data from both of these devices, analyses them, and computes the location on the computer screen the subject is looking at. This data is then sent to SpaNav for the purpose of the experiment.

Figure 2.2: A demonstration of the hardware used. Photo by Iveta Fajnerová for the purpose of this work.

However, the software is not specifically designed to be used in conjunction with SpaNav. It is intended to provide a general framework for any application which needs to track eye movements. Any programmer can take our eye tracking code and build their own application around it using these measurements. If one so desires, it should not even be difficult to adapt the software to work with a different eye tracker or head tracker, as long as they support functions similar to those of the devices used in this work. We implement the link to SpaNav and a user-friendly GUI to calibrate and control the eye tracking to showcase the capabilities of this framework.

3. Technologies Used

First we describe the hardware and software we use in this work. This allows us to define the problems our software will have to solve.

3.1 SpaNav

SpaNav is a tool for conducting spatial navigation experiments in virtual reality. It has been developed and maintained by Ivana Šupalová [Šupalová(2009)]. Structurally, SpaNav consists of three parts.

The first part is the virtual reality simulation itself, which is built as a modification of the Unreal Tournament 2004 game engine. This simulation displays the testing environment to the subject and allows the subject to move and perform actions in this environment as the experiment requires.

The second part is an administrative application which allows the experimenter to set up and control the experiment. This application loads the description of each experiment from a configuration file and creates the requested environment in the simulation. It also allows the experimenter to observe movement and actions of the test subject while performing the experiment.

The third part is the Eyelink client, which collects data from the Eyelink II eye tracking device [SR Research Ltd.(2010)]. Eyelink II provides 2D coordinates of the gaze point on the computer screen. The Eyelink client sends these coordinates to the simulation, which translates them into 3D coordinates in the virtual environment. These data are then stored by SpaNav in a text file for later analysis.

Both the administrative application and the Eyelink client communicate with the Unreal Tournament engine via text messages sent over the TCP/IP protocol. This allows each of the three components to run on a separate computer and communicate over a local network. In fact, the setup described in Šupalová's work uses this arrangement.

3.2 Eyelink II

Eyelink II is the eye tracker used by SpaNav. It has been developed by SR Research Ltd. [SR Research Ltd.(2010)] Although we are not using this device in our work, we briefly describe its function and its connection with SpaNav.

The eye tracker uses two cameras mounted on the subject's head to record movement of both eyes. Another camera on the device observes the scene the subject is looking at. Reflective markers are placed in the corners of a computer screen. Using measurements of eye rotation and the position of the markers relative to the subject's head, the Eyelink II software can directly calculate the 2D coordinates of the gaze point on the computer screen.

The Eyelink II software requires a dedicated machine running a specialized operating system. No other applications can run on this machine at the same time as Eyelink II. This is one of the major disadvantages of this setup and one we aim to overcome in our work. Eyelink II makes the 2D coordinates of the gaze point on the screen available to SpaNav via a library written in the C language.

Figure 3.1: The Eyelink II device. Image from [SR Research Ltd.(2010)].

3.3 I4Control

I4Control is an eye tracking device designed and developed by Medicton Group Ltd. [Medicton group(2008)] It is primarily intended to assist handicapped people in controlling a computer. Therefore it is not as easy to use in eye tracking experiments. I4Control uses only one camera, which records images of the right eye. This means that it can not account for movement of the subject's head. This is not a problem during its intended use, as the assistive applications (see [Fejtová – Fabián(2009)]) only use relative eye movements as input, that is, whether the user is looking up, down, or to the sides. For our purposes we require absolute coordinates of the gaze point. Therefore we need another device to measure the position and rotation of the subject's head.

I4Control only produces images of one eye and calculates the position of the center of the pupil in those images. The included software does not do any other calculations. Once again, this is enough for I4Control applications, but we need to compute the gaze vector in our software.

Medicton Group Ltd. has provided us with software that can extract data from the I4Control device. This is the I4Control server. We start this server on the testing machine and connect to it using the TCP/IP protocol. The I4Control server provides data about the position of the pupil in the camera image every 50 ms. In addition to this online tracking, it also records raw camera images to the hard drive roughly every 12 ms. Medicton Group also provides software to perform offline analysis of these images and calculate the eye position.

Figure 3.2: The I4Control device. Image from http://www.i4control.eu/images/stories/frame.jpg.

Finally, we use a custom-built head mount for the eye tracker. The available method of attaching the camera to a pair of glasses (see Figure 3.2) proved to be too unreliable for accurate tracking, as the glasses frequently moved on the subject's head, leading to inaccurate results from the eye tracker. The new mount eliminates this problem, as it is fixed on the subject's head. It also provides a convenient platform on which we mount the TrackIR 4 tracking clip (see Fig. 3.4).

3.4 TrackIR 4

TrackIR 4 is a commercial head tracking device manufactured by NaturalPoint [NaturalPoint(2008)]. It consists of a video camera mounted on the user's monitor and a tracking clip attached to their head. The tracking clip carries three reflective markers. By observing these markers, the TrackIR software calculates the position and orientation of the user's head. TrackIR provides six degrees of freedom: three spatial axes of head movement and three angles of rotation (pitch, yaw, and roll).

TrackIR is targeted at video game players, who can use this device to control their virtual character by their own head movements. NaturalPoint also provides a developer API which programmers can use to create their own applications. The API provides methods for activating cameras, recording images, and calculating the user's head position and orientation. This API is implemented as Microsoft COM components. Bindings for C and C++ are available.

Figure 3.3: The TrackIR 4 tracking clip and camera. Images from [NaturalPoint(2008)].

Figure 3.4: The head mount used in this work. 1 - TrackIR 4 tracking clip. 2 - I4Control camera. Photo by Iveta Fajnerová for the purpose of this work.

4. Design Considerations

In this chapter we outline the design goals of this work, as well as the high-level decisions we make in order to accomplish these goals as closely as possible.

4.1 Design goals

We set several goals for this project. We aim to create a complete eye tracking package. This means we need to gather data from the eye tracker and from the head tracker, analyze this data, and use it to compute the point on the computer screen that the subject is looking at. This software should seamlessly integrate with SpaNav. Ideally, we should not need to make any changes in SpaNav itself.

However, we also want to make the software easily extensible and modifiable. Users should be able to use our code to create their own eye tracking applications. We would like programmers writing these applications to not need to know about any implementation details or details of the hardware used. Similarly, it should be possible to replace the hardware with a different model and preserve the functionality of existing applications. Unfortunately, as we explain further in this work, this is not possible to achieve to its full extent.

Last, the software should be efficient. One of the issues of SpaNav is that it requires three dedicated machines to run. We want our software to be fast enough to analyze data from both devices at a decent frame rate, while still leaving enough computing resources to run the client application – SpaNav. The I4Control server is able to analyze data from the eye tracking camera at 25 frames per second. Our software is able to work at the same frame rate.

On the other hand, the software should also be robust enough to be distributed between several machines. In most eye tracking experiments, it is beneficial or even required for the experimenter to have direct control over the experiment. Therefore we would like to be able to run the tracking and show the controls on the experimenter's computer, and allow the client application to run on a second machine.

12 4.2 Software architecture overview

First, let us examine the high-level architecture of SpaNav (see Section 3.1 and [Šupalová(2009)]).

Figure 4.1: High-level architecture of SpaNav.

The administrative application and the Eyelink client are two separate programs. It is therefore easy for us to replace the latter with our own software which will interface with both our tracking devices and with the Unreal Tournament 2004 engine.

Figure 4.2: High-level architecture of our software.

In order to allow other users to build their own software based on our eye tracking methods, we further split our tracking module into two parts. The first is the EyeTracking server, which will communicate with our tracking devices and provide data about the user's gaze point to the client. The second part is the client, which consumes this data and uses it according to the application. In our case, it simply forwards the data to SpaNav.

Figure 4.3: High-level architecture of the EyeTracking server and client.

Anyone designing their own software using our eye tracking methods only needs to create their own client application. As long as they can communicate with our server, they will have full access to the eye tracking data. Ideally, the client should not need to know anything about the implementation of the server and vice versa. Unfortunately, we can not achieve this perfectly. In addition to the coordinates of the gaze point, our client also needs access to some of the raw data coming from the tracking devices. Therefore we rely on the devices to provide certain functions and measurements.

However, a bigger problem is calibrating the devices and the tracking algorithm. The calibration is necessarily a device-specific function, and the tracking algorithm runs on the server. This would imply that calibration should run on the server end. However, a client needs to be able to start this calibration. Some clients would also like to have fine control over it – if only to make it visually uniform with the rest of their application. The calibration procedure itself needs access to eye tracking data, so it also technically acts as a client. We have therefore decided to make the calibration a client module. This unfortunately means having to expose some of the mechanics of the server to the client, in particular the tracking algorithm. We discuss this problem in more detail in section 5.5.1.

5. Implementation

5.1 Communication

Communication with both input devices is straightforward. The I4Control eye tracker communicates using binary messages over the TCP/IP protocol. This means that the I4Control device can be installed on a different machine than our software. Our program acts as the client in this communication. TrackIR 4 uses a set of Microsoft COM interfaces. The communication is easier to implement, as from the programmer’s side it only requires calling library methods. On the other hand, the TrackIR 4 device must run on the same machine as our code, unless a middleware layer is created which can relay its messages over a network. This was not a requirement, so we do not implement this layer. We poll both devices at regular intervals, gather the measurements, and store them into simple data structures. These are then passed to the tracking algorithm to compute the gaze point. Communication with both devices and work associated with setting up the connections is encapsulated in the Eyetracker and Headtracker classes.
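To illustrate the intended structure of this polling loop, a minimal sketch follows. The sample structures and method names used here (EyeSample, HeadSample, poll, update, runTrackingLoop) are illustrative assumptions for this text, not the actual interfaces of the Eyetracker, Headtracker, and Tracking classes.

#include <chrono>
#include <thread>

// Sketch only: struct layouts and method names are illustrative assumptions.
struct EyeSample  { double px = 0, py = 0; bool valid = false; };        // pupil center, camera pixels
struct HeadSample { double x = 0, y = 0, z = 0,                          // position in device units
                    pitch = 0, yaw = 0, roll = 0; bool valid = false; }; // orientation in degrees

struct Eyetracker  { EyeSample  poll() { return {}; } };  // would read one binary TCP message
struct Headtracker { HeadSample poll() { return {}; } };  // would query the COM interface

struct Tracking {
    void update(const EyeSample&, const HeadSample&) { /* compute center point and gaze point */ }
};

void runTrackingLoop(Eyetracker& eye, Headtracker& head, Tracking& tracking)
{
    const auto frame = std::chrono::milliseconds(40);    // 25 frames per second
    for (;;) {
        const auto started = std::chrono::steady_clock::now();
        EyeSample  e = eye.poll();                        // latest pupil position
        HeadSample h = head.poll();                       // latest head pose
        if (e.valid && h.valid)
            tracking.update(e, h);                        // feed the tracking algorithm
        std::this_thread::sleep_until(started + frame);
    }
}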

5.2 Tracking

The Tracking class is the core of our system. It is responsible for aggregating data from the input devices and calculating the coordinates of the user's gaze point on the screen. This problem is not very well covered in the literature. Moreover, the devices we use present a unique challenge.

Since the head tracker camera is mounted on the computer monitor and not on the user's head, we do not have implicit data about the location of the screen's edges from the point of view of the user. Other eye tracking systems, such as Eyelink II, use a reverse setup: the camera is attached to the eye tracking piece, and it observes the positions of the screen edges. The gaze direction is then computed relative to the position of the screen, and computing its intersection with the screen in this setup is relatively trivial.

Our system does not have this advantage. The TrackIR 4 head tracker only supplies the position and rotation of the user's head relative to the camera. Most importantly, we do not have any implicit data about the size or even the position of the computer screen itself. The eye tracker poses another difficulty: the I4Control software does not calculate the spatial angle of the rotation of the eye. It only provides coordinates of the center of the pupil as viewed from the eye tracker camera. The camera's image plane is also not parallel to the computer screen, and since the camera can be adjusted to fit each user, this plane is not in a fixed position relative to the user's head either (see Figure 3.4).

In the most general sense, this means we need to compute a mapping from an eight-dimensional space (six coordinates of head position and orientation, two coordinates of eye position) to the two-dimensional space of the computer screen. This would be extremely difficult to compute and calibrate accurately, so we need to simplify the problem.

5.2.1 Head tracking and the center point

Analyzing head tracking data from TrackIR 4 is simple, because the data have a usable geometrical interpretation. The head tracker measures the position of the head (or rather, of the tracking clip) in three-dimensional coordinates relative to the camera. The head orientation is given as three spatial angles, pitch, yaw, and roll, in degrees.

Let us for a moment disregard eye movements and assume that the subject is always looking straight ahead. The subject can only use movements of their head to examine the scene. Now the gaze direction is equivalent to the direction the subject's head is oriented towards. Computing this direction and its intersection with the computer screen from the data we are given is only a matter of simple trigonometry. We will call the intersection point the center point. Once again, this would be the gaze point if the subject were looking straight ahead in the direction their head is facing.
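As an illustration, the center point could be computed along the following lines. This is a minimal sketch under simplifying assumptions – the head position has already been converted to screen pixels (see section 5.3.1), the screen lies in the plane z = 0 of that coordinate system, and zero yaw and pitch mean the head faces the screen squarely – and not the actual implementation.

#include <cmath>

struct Point2D { double x, y; };

// Sketch under simplifying assumptions (see the text above); not the actual code.
Point2D centerPoint(double headX, double headY, double headZ,   // head position, in pixels
                    double yawDeg, double pitchDeg)             // head orientation, in degrees
{
    const double toRad = 3.14159265358979323846 / 180.0;
    // The head-facing ray meets the screen at the head's own offset plus the
    // lateral shift produced by turning the head at distance headZ from the screen.
    const double sx = headX + headZ * std::tan(yawDeg   * toRad);
    const double sy = headY + headZ * std::tan(pitchDeg * toRad);
    return { sx, sy };
}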

16 5.2.2 Eye tracking and the delta function

Now we need to account for eye movement. In theory, another geometrical solution could be possible. However, the I4Control device only outputs coordinates of the center of the pupil on the camera image, in pixels. There are several problems here. First, the camera is placed at a slightly odd angle in relation to the subject's eye, head, and the computer screen (see Figure 3.4). Moreover, this position can be adjusted so that the camera fits each individual subject. Therefore we have no control over and no way to deduce the exact location of the camera plane. Also, the eyeball is spherical and the camera only produces a planar image. This, coupled with the fact that the camera is observing the eye from a side, means that the measurement will be highly non-linear and susceptible to errors when the subject looks too far to a side.

It would therefore be difficult, if not impossible, to derive an explicit formula to compute the gaze point under these conditions. We make a simplification which allows us to estimate it with a reasonable degree of accuracy. We assume the existence of a delta function. This function takes the measurement of the I4Control camera as input, and outputs the vector of difference between the center point and the actual gaze point, in screen coordinates. Given the position of the center point and the value of this delta function we can easily compute the gaze point.

We need to mention here that even the existence of this function is an assumption we make. In reality, the difference between the center point and the gaze point is not only affected by eye movement, but also by the position of the camera on the subject's head, movement of the head, and the position of the center point on the screen. However, the dependence on these factors is small compared to the dependence on eye movement. Ignoring all the other factors makes the delta function a mapping from two-dimensional space to two-dimensional space. This greatly reduces the complexity compared to the full eight-dimensional problem.

Once again, we are not able to derive an explicit formula for the delta function. Our solution is to take several samples of this function during calibration and to interpolate between these samples. An important decision we need to make is the number and positioning of these samples. This is a tradeoff between accuracy and duration of the calibration. The more samples we take, the more accurate the interpolation will be. However, taking too many samples means that the calibration procedure takes too much time for the purpose of an experiment. We have settled on taking nine samples in a 3x3 grid. This is the same setup as used by Eyelink II.

The last problem concerns the interpolation itself. For every position of the eye obtained from the camera we need to choose several of the sample points between which to interpolate. Simply selecting the three points closest to the measurement is a poor choice. Measured points outside the 3x3 grid of samples would lead to selecting three samples on the edge of this grid. These samples would more than likely have one of their coordinates equal or very similar, and therefore it would be very difficult to interpolate this coordinate precisely.

We choose a different approach: we divide the plane of possible input measurements into eight angles. Each angle is defined by the sample in the middle of the grid and two adjacent samples on the edges. For a given input measurement we then find the angle which it lies in, and use the two samples defining this angle and the central sample to interpolate (see Figure 5.1). This provides suitable precision for measurements both within and outside of the grid of samples.
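A sketch of how this selection and interpolation could look follows. The text above fixes the idea (central sample plus two adjacent edge samples); the concrete data layout, sample ordering, and barycentric weighting below are assumptions made purely for illustration.

#include <array>
#include <cmath>

struct Vec2 { double x, y; };

struct DeltaSample {
    Vec2 eye;    // pupil position measured during calibration (camera pixels)
    Vec2 delta;  // sampled value of the delta function (screen pixels)
};

// Assumption: samples[0] is the central sample, samples[1..8] the surrounding
// samples of the 3x3 grid in counter-clockwise order.
Vec2 interpolateDelta(const std::array<DeltaSample, 9>& s, Vec2 eye)
{
    const double twoPi = 2.0 * 3.14159265358979323846;
    const Vec2 c = s[0].eye;
    auto angleOf = [&](Vec2 p) { return std::atan2(p.y - c.y, p.x - c.x); };
    const double m = angleOf(eye);

    // Find the angle (pair of adjacent edge samples) that contains the
    // measurement, as seen from the central sample.
    int k = 0;
    for (int i = 0; i < 8; ++i) {
        const double a0 = angleOf(s[1 + i].eye);
        const double a1 = angleOf(s[1 + (i + 1) % 8].eye);
        const double span = std::fmod(a1 - a0 + twoPi, twoPi);
        const double off  = std::fmod(m  - a0 + twoPi, twoPi);
        if (off <= span) { k = i; break; }
    }
    const DeltaSample& a = s[1 + k];
    const DeltaSample& b = s[1 + (k + 1) % 8];

    // Express the measurement as c + u*(a - c) + v*(b - c) and apply the same
    // weights to the sampled deltas; this also extrapolates outside the grid.
    const double ax = a.eye.x - c.x, ay = a.eye.y - c.y;
    const double bx = b.eye.x - c.x, by = b.eye.y - c.y;
    const double det = ax * by - ay * bx;
    const double u = ((eye.x - c.x) * by - (eye.y - c.y) * bx) / det;
    const double v = (ax * (eye.y - c.y) - ay * (eye.x - c.x)) / det;

    return { s[0].delta.x + u * (a.delta.x - s[0].delta.x) + v * (b.delta.x - s[0].delta.x),
             s[0].delta.y + u * (a.delta.y - s[0].delta.y) + v * (b.delta.y - s[0].delta.y) };
}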

Figure 5.1: An example of samples taken from the eye tracker camera during calibration. These samples correspond to values of the delta function between [-720,-400] and [720,400]. The samples divide the plane into eight angles. If a measurement is taken at the point denoted by an X, the three highlighted samples will be used for interpolation.

18 5.3 Calibration

In order to give the measurement units from both devices a real-world interpretation, we need to perform a calibration procedure. The calibration also collects samples for interpolating the delta function. Calibration needs to be done before every experiment, as each person's head is different, and different people will position the head mount slightly differently. The mount can also slip on the head due to involuntary muscle movements.

The calibration needs to be precise enough to ensure accurate results during an experiment, but it can not take a long time, as that would prolong the entire testing process. Also note that any error during calibration will invalidate the parameters given to the tracking procedure, and therefore make the entire experiment invalid. This means that we want to be able to warn about any possible errors as soon as possible, preferably before the experiment starts.

5.3.1 Calibrating the head tracker

The TrackIR 4 API returns the position of the head in device units and the orientation in degrees. In order to calculate the center point, we need to convert the measurements from device units to pixels of the screen. TrackIR 4 uses the same scale in all three dimensions, and we assume that the screen pixels are square. This is a reasonable assumption, as nearly all commonly used monitors have this property. Therefore the conversion from device coordinates to screen coordinates is a simple linear function.

We ask the subject to move their head so that they are facing a target point on the screen and looking straight ahead. We take measurements from TrackIR 4 for two points: one in the left part of the screen and one in the right part. Knowing these measurements and the positions of these points on the screen, we can easily find the linear conversion function. Since the orientation of the head is returned in degrees, this part of the measurement does not need to be calibrated.
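For illustration, the linear conversion can be derived from the two measurements as in the following sketch; the function and type names are ours, not those of the actual implementation.

// Sketch: deriving the linear device-unit-to-pixel conversion from two
// calibration targets. Names are illustrative, not the actual code.
struct LinearMap {
    double scale, offset;                                    // pixels = scale * device + offset
    double apply(double deviceValue) const { return scale * deviceValue + offset; }
};

// deviceLeft/deviceRight: head positions reported by TrackIR 4 while the subject
// faced the left and right target; pixelLeft/pixelRight: the targets' known
// horizontal screen coordinates. The same scale applies to all three axes.
LinearMap calibrateHeadAxis(double deviceLeft, double deviceRight,
                            double pixelLeft,  double pixelRight)
{
    const double scale  = (pixelRight - pixelLeft) / (deviceRight - deviceLeft);
    const double offset = pixelLeft - scale * deviceLeft;
    return { scale, offset };
}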

5.3.2 Calibrating the eye tracker

Calibrating the eye tracker means taking samples of the delta function. To do this we display markings on the screen and ask the subject to look at these markings. We can place these markings at a known distance from the measured center point, so we know the expected results of the delta function. By taking measurements from the eye tracker when the subject is looking at the markings, we also know the coordinates of the eye on the camera image corresponding to these results. Note that since we want to sample the delta function, the markings move with the center point as the subject moves and turns their head.

We take nine samples of the delta function, one exactly at the center point, and the rest in a 3x3 grid around it. The size of this grid is chosen as 75% of the size of the computer screen. This covers the coordinate space well, and it is still possible for the subject to see all of the markings while keeping the center point fixated in the center of the screen.

Since the central sample is used for interpolation every time (see 5.2.2), it is important that it is measured as accurately as possible. Therefore we measure the position of the eye when the subject is looking directly at the center point twice, and average these measurements. We could do this for each of the samples to get a more accurate calibration; however, it would also take a proportionally larger amount of time.
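A short sketch of how the nine markings could be laid out around the current center point follows; the function name, the Target type, and the exact grid construction are illustrative assumptions, not the actual calibration code.

#include <array>

struct Target { double x, y; };

// Sketch: the nine calibration markings placed around the current center point.
// The grid spans 75% of the screen; names and layout details are assumptions.
std::array<Target, 9> calibrationTargets(Target center, double screenWidth, double screenHeight)
{
    const double stepX = 0.75 * screenWidth  / 2.0;   // offset of the outer columns
    const double stepY = 0.75 * screenHeight / 2.0;   // offset of the outer rows
    std::array<Target, 9> targets{};
    int i = 0;
    for (int row = -1; row <= 1; ++row)
        for (int col = -1; col <= 1; ++col)
            targets[i++] = { center.x + col * stepX, center.y + row * stepY };
    return targets;   // targets[4] is the central marking, measured twice and averaged
}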

5.4 Data recording

Our software allows recording of two types of data. The first type is recorded by our software directly. In every frame in which the tracking is active and calibrated, we save the raw data from both the eye tracker and the head tracker, the computed center point, and the computed gaze point. This is virtually all the information the EyeTracking server provides. This recording is saved at the same frequency our software runs at: 25 frames per second.

The second type of data is recorded by the I4Control camera. In addition to providing the coordinates of the center of the pupil, this camera is also able to save a stream of raw images of the eye into a file. This recording can run at up to 80 frames per second. However, these images are not analyzed and therefore do not contain any information about the position of the pupil. To extract this information, a utility provided by Medicton Group Ltd. must be used to analyze the recorded data after an experiment. Such high-frequency data is useful for several physiological measurements, for example the detection of saccades – sudden, quick movements of the eye.

This method has two drawbacks. First, since the data analysis is run after the experiment, we have no way of pairing the eye tracking data with measurements from the head tracker and computing the gaze point at this high frequency. Second, this method stores raw images at full resolution. This means that the recorded data take up large amounts of space, up to a gigabyte for five minutes of recording. We offer the possibility to turn recording of either kind of data on or off via application settings, as the user desires.

In addition to this, SpaNav also records data of its own. This recorded data contains not only the 2D coordinates of the gaze point, but also the 3D coordinates of the object or point in the virtual reality scenario the subject is looking at. Our application has no access to the virtual reality, so we do not work with this data.
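As an illustration of the first kind of recording, one frame could be appended to the output file roughly as follows; the column layout, including the timestamp column, is an assumption made for this sketch, not the actual file format.

#include <fstream>

// Sketch: the values saved for one frame. Only the set of recorded values
// follows the text above; the column order is an assumption.
struct FrameRecord {
    double rawEyeX, rawEyeY;                       // pupil center, camera pixels
    double headX, headY, headZ, pitch, yaw, roll;  // raw head position and orientation
    double centerX, centerY;                       // computed center point, screen pixels
    double gazeX, gazeY;                           // computed gaze point, screen pixels
};

void appendFrame(std::ofstream& out, long timestampMs, const FrameRecord& f)
{
    out << timestampMs << '\t'
        << f.rawEyeX << '\t' << f.rawEyeY << '\t'
        << f.headX   << '\t' << f.headY   << '\t' << f.headZ << '\t'
        << f.pitch   << '\t' << f.yaw     << '\t' << f.roll  << '\t'
        << f.centerX << '\t' << f.centerY << '\t'
        << f.gazeX   << '\t' << f.gazeY   << '\n';
}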

5.5 EyeTracking clients

Section 4.2 describes the high-level architecture of our software. The system is split into two parts: the server, which communicates with the tracking devices and calculates the gaze point, and the client, which then uses this data according to the application. We implement the communication between the server and the client using the Strategy design pattern.

The server is implemented in the Tracking class. Clients are represented by the Worker abstract class. To create an EyeTracking client, one should derive from this class and implement the abstract methods start, tick, and stop. To activate the client, call the setActiveWorker method of the Tracking class instance and pass an instance of the subclass of Worker as a parameter. The Worker's start method is called, allowing the client to perform startup tasks, such as initializing local variables. The instance of Tracking which is using the client is also passed to the Worker at startup, and the client is expected to keep a reference to it.

After a client is started, the server will collect data from the eye tracker and head tracker at regular intervals, and call the Worker's tick method. The client can access the measured data by calling one of the get* methods on the Tracking instance. The most important of these are getScreenPosition, which returns the calculated gaze point in screen coordinates, and getRawEyePosition and getRawHeadPosition, which return unmodified data measured from the eye tracker and head tracker respectively. A full listing of these methods is available in the programmer's guide.

When the client is finished, either the Worker itself or the surrounding code running it should call the setActiveWorker method on the Tracking instance again. The stop method of the running client's Worker is called, allowing the client to perform cleanup. If another Worker is passed to setActiveWorker, the new worker starts acting as the client. If NULL is passed, the server stops and waits for a new client. A sequence diagram of this communication is shown in Figure 5.2.
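The following minimal client illustrates this interface. The names Worker, Tracking, start, tick, stop, setActiveWorker, and getScreenPosition follow the description above; the exact signatures, the Point type, and the LoggingWorker class are assumptions made for illustration only.

#include <iostream>

struct Point { int x = 0, y = 0; };

class Tracking;   // the EyeTracking server (see section 5.2)

// The client interface described above; the exact signatures are assumptions.
class Worker {
public:
    virtual ~Worker() {}
    virtual void start(Tracking* tracking) = 0;   // keep the Tracking reference
    virtual void tick() = 0;                      // called once per measured frame
    virtual void stop() = 0;                      // perform cleanup
};

// A stub of the server side, reduced to the two methods used below.
class Tracking {
public:
    Point getScreenPosition() const { return gaze; }
    void  setActiveWorker(Worker* /*next*/) { /* stop the old worker, start the new one */ }
private:
    Point gaze;
};

// A trivial client that prints every computed gaze point.
class LoggingWorker : public Worker {
public:
    void start(Tracking* t) override { tracking = t; }
    void tick() override {
        const Point p = tracking->getScreenPosition();
        std::cout << p.x << " " << p.y << "\n";
    }
    void stop() override {}
private:
    Tracking* tracking = nullptr;
};

Activating such a client would then amount to a call like tracking.setActiveWorker(&loggingWorker), and passing NULL again stops it, as described above.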

Figure 5.2: Sequence diagram of the communication between EyeTracking client and server.

We implement three clients represented by three subclasses of Worker: the CalibrationWorker, the TestWorker, and the UTWorker.

5.5.1 CalibrationWorker

In order to provide feedback to the user, the calibration procedure needs access to the data from the tracking devices while the calibration is being performed. There are several possible approaches to providing this data.

The calibration could be implemented as a part of the EyeTracking server. While this solution would likely be the easiest to implement, it would introduce very tight coupling between the calibration and the tracking algorithm. It would also lead to duplication of the server/client communication mechanisms.

We could implement the calibration as a client module, as a subclass of the Worker base class. This would provide the programmer with much more flexibility, as changing the calibration procedure would not affect the server tracking mechanism. However, the server would need to expose some of its internal data to the clients so that the calibration can change parameters of the tracking algorithm.

Another possible solution is to create an entirely new method of connecting the calibration and the server together. This method would provide the most flexibility and hide all the private data of the server from the clients. However, it would also be difficult to implement and it would require another specialized algorithm which would not be reused anywhere else.

We choose to implement the second solution. The CalibrationWorker class is responsible for running the entire calibration procedure and for setting the parameters of the tracking algorithm if it completes successfully.

5.5.2 TestWorker

The TestWorker class provides a simple test of precision of the tracking algorithm. This test can be run after a calibration to verify that the data being computed are accurate. The test displays a stationary point on the screen. The subject is asked to look at this point and press a key on the keyboard. The gaze position measured by the server at the moment of the key press is recorded. This is then repeated multiple times for varying locations of the testing point. If the calibration was successful, the measured gaze positions should correspond to the positions of the target points. At the end of the test a graphic is displayed, quickly showing the error in measurement to the experimenter. For more details about the testing, see chapter 6.

5.5.3 UTWorker

This worker is responsible for communicating with the SpaNav environment. It is the most straightforward one: it simply retrieves data from the server and sends them to the Unreal Tournament engine using the TCP/IP protocol. In addition to that, it also optionally records the raw measurements from both devices as well as the measured gaze point to a text file for later analysis.

5.6 User Interface

The last part of our work is a simple graphical user interface (GUI), which allows the experimenter to connect to both devices, run the calibration and the precision test, and run the experiment with SpaNav. The GUI also reports the status of both devices (disconnected/connected/not receiving data/OK), both in text and graphically. This alerts the experimenter immediately if one of the devices is not getting a clear reading, for example because of an object obstructing the field of view of the camera. The GUI also allows convenient control of the experiment setup. The three most important settings are:

• Whether the I4Control server should be started on the machine the program is running on, or whether it is already running on a remote machine.

• The IP address and port on which the Unreal Tournament 2004 environment is listening.

• The path where the program should store recorded data.

See the User Guide for a description of all the functions of the GUI.
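Purely as an illustration, these settings could be stored in the eyetracking.ini file mentioned in chapter 6 in a form like the following; all key names and values shown here are hypothetical, not the actual format used by the application.

; Hypothetical sketch of eyetracking.ini - key names and values are illustrative only.
[I4Control]
StartLocalServer = 1            ; 0 = connect to a server already running remotely
ServerAddress    = 127.0.0.1

[SpaNav]
UnrealAddress    = 192.168.0.2  ; machine running Unreal Tournament 2004
UnrealPort       = 3000

[Recording]
OutputPath       = C:\EyeTracking\data

[Test]
GridSize         = 3            ; 3x3 or 4x4 grid of target points
Repetitions      = 3            ; how many times each target is shown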

Figure 5.3: The user interface of the EyeTracking application (left) and the GUI for specifying program settings (right).

6. Testing

We implement a simple testing procedure which we use to verify the accuracy of our tracking method. The only goal of this test is to verify the final measurement, that is, the gaze point. The test is straightforward: we show a number of target points on the screen to the subject, and we ask the subject to look at these points and fixate on them. We then record the data measured by the EyeTracking application at that time. The measured data is stored in a text file along with the positions of the target points. It is then easy to create a chart showing the targets and the measured gaze points (see below). Our aim is to minimize the difference between the measured data and the real position of the target point. For the purpose of experiments with SpaNav, this difference should not be greater than about 100 pixels.

Our precision test uses a rectangular grid of target points uniformly spaced across the entire screen. The points are displayed one by one in a random order. The subject is asked to focus on the displayed point for several seconds. This is done to make sure that all the devices are receiving stable data; in particular, the coordinates reported by the I4Control eye tracker sometimes "lag" behind reality by as much as one second. Then the subject presses a key on the keyboard, the current measurement is recorded, and a new target point is displayed.

Each target point is displayed several times during the test. This allows us to get more data and also to eliminate incorrect measurements. For example, if the subject blinks at the same moment as the measurement is taken, the eye tracking camera will not be able to measure the position of the pupil correctly. The number of target points used, as well as the number of times each of the points is displayed, is configurable by editing the eyetracking.ini file. In our experiments we use either a 3x3 or a 4x4 uniform grid of points.

We have performed multiple tests with two subjects at the Faculty of Mathematics and Physics of Charles University, and with five subjects at the Institute of Physiology of the Academy of Sciences of the Czech Republic. Below we show several interesting test results: first an example of a good result, then three tests demonstrating common problems.

Figure 6.1: A test after an accurate calibration. All the measured points are within 100 pixels of the targets; most are within 50 pixels.

Figure 6.2: During this test the subject was asked to look at the target points using only head movements, and to only look straight ahead. We can see that the accuracy drops off significantly at the edges of the screen. This is evidence of the dependence of the delta function on head orientation.

27 Figure 6.3: Here we can see a similar problem to the previous test. The distortion is pronounced beyond usable limits towards the left edge of the screen. This is very likely also partially caused by an incorrectly measured calibration sample.

Figure 6.4: The I4Control camera had serious problems detecting this subject’s eye. This resulted in problems during calibration when the eye was not detected when the subject was looking at some of the sample points. The result is a very distorted measurement.

7. Conclusions

We have developed a system performing eye tracking measurements using the I4Control eye tracker and the TrackIR 4 head tracker. Our EyeTracking application is a proof of concept showing that eye tracking is possible using commercially available hardware. While we can achieve the desired accuracy in good conditions, there are still open problems to solve.

The I4Control camera is very sensitive to lighting conditions. Best results are achieved in an environment with uniform lighting. Windows, light fixtures, and other bright objects reflecting in the user's eye have a strong negative impact on the measurements. A related problem occurs with subjects wearing glasses. Glasses reflect the light emitted by the TrackIR 4 camera, which confuses the eye tracker.

Another problem appears when the subject moves and turns their head excessively. Our assumptions about the delta function in 5.2.2 only hold when the head is facing a point close to the center of the screen. Even during normal use this sometimes results in barely acceptable accuracy for some people. We are currently working on improving the accuracy of our software in this case.

Despite these problems, we have been able to perform the measurements to a reasonable degree of accuracy. We use commercially available hardware and software, which makes our solution especially well suited for small research teams. We also provide an extensible framework for other applications that use eye tracking data. Our software and our eye tracking measurements can be easily extended by programmers developing their own applications.

Bibliography

[Bulling et al.(2011)] BULLING, A. et al. Eye Movement Analysis for Activity Recognition Using Electrooculography. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2011, 33, s. 741–753. ISSN 0162-8828. doi: http://doi.ieeecomputersociety.org/10.1109/TPAMI.2010.86.

[Duchowski(2007)] DUCHOWSKI, A. T. Eye Tracking Methodology: Theory and Practice. Secaucus, NJ, USA : Springer-Verlag New York, Inc., 2007. ISBN 1-846-28608-5.

[Fejtová – Fabián(2009)] FEJTOVÁ, M. – FABIÁN, V. Systém I4Control – nové možnosti pro snadnější ovládání PC [New possibilities for easier PC control] [online]. 2009. Available in Czech only. Available from: http://www.i4control.eu/DokumentyKeStazeni/2009Inspo.pdf.

[Goldberg et al.(2002)] GOLDBERG, J. H. et al. Eye tracking in web search tasks: design implications. In Proceedings of the 2002 symposium on Eye tracking research & applications, ETRA ’02, s. 51–58, New York, NY, USA, 2002. ACM. doi: http://doi.acm.org/10.1145/507072.507082. Available from: http://doi.acm.org/10.1145/507072.507082. ISBN 1-58113-467-3.

[Huey(1908)] HUEY, E. The psychology and pedagogy of reading, with a review of the history of reading and writing and of methods, texts, and hygiene in reading. The Macmillan company, 1908. Available from: http://books.google.com/books?id=p2dEAAAAIAAJ. ISBN 9780872076969.

[ICT Results(2009)] ICT Results. Eye-tracking Software Opens Online Worlds To People With Disabilities [online]. ScienceDaily, 2009. [cit. May 5, 2011]. Available from: http://www.sciencedaily.com/releases/2009/06/090630075449.htm.

[Medicton group(2008)] Medicton group. I4Control product page [online]. 2008. [cit. May 5, 2011]. Available from: http://www.i4control.eu/index.php?lang=english.

[NaturalPoint(2008)] NaturalPoint. TrackIR 4 product page [online]. 2008. [cit. May 5, 2011]. Available from: http://www.naturalpoint.com/trackir/02-products/product-TrackIR-4-PRO.html

[SR Research Ltd.(2010)] SR Research Ltd. SR Research website [online]. 2010. [cit. May 5, 2011]. Available from: http://www.sr-research.com/EL_II.html.

[Sundstedt(2010)] SUNDSTEDT, V. Gazing at games: using eye tracking to control virtual characters. In ACM SIGGRAPH 2010 Courses, SIGGRAPH ’10, s. 5:1–5:160, New York, NY, USA, 2010. ACM. Available from: http://doi.acm.org/10.1145/1837101.1837106. ISBN 978-1-4503-0395-8.

[Šupalová(2009)] ŠUPALOVÁ, I. Orientace v prostoru zkoumaná ve virtuální realitě [Adaptation of spatial navigation tests to virtual reality]. Master's thesis, Charles University in Prague, 2009. Available in Czech only.

[Yarbus(1967)] YARBUS, A. L. Eye movements and vision. 1967.
