
Faculty of Technology and Society
Computer Science

Bachelor’s thesis 15 Credits, undergraduate level

Eye Movement Analysis for Activity Recognition in Everyday Situations

Analys av Ögonrörelser för Aktivitetsigenkänning i Vardagliga Situationer

Anton Gustafsson

Degree: Bachelor, 180 Credits
Supervisor: Shahram Jalaliniya
Field of study: Computer Science
Examiner: Fahed Alkhabbas
Program: System Developer
Date of final seminar: 2018-05-30

Abstract

The increasing number of smart devices in our everyday environment has created new problems within human-computer interaction, such as how we humans are supposed to interact with these devices efficiently and with ease. Context-aware systems could be a possible candidate for solving this problem. If a system could automatically detect people's activities and intentions, it could act accordingly without any explicit input from the user. Eyes have previously been shown to be a rich source of information about a person's cognitive state and current activity. Because of this, eyes could be a viable input modality for extracting such information. In this thesis, we examine the possibility of detecting human activity by using a low-cost, home-built monocular eye tracker. An experiment was conducted where participants performed everyday activities in a kitchen to collect data. After conducting the experiment, the data was annotated, preprocessed and classified using multilayer perceptron and random forest classifiers. Even though the collected data set was small, the results showed a recognition rate of between 30% and 40% depending on the classifier used. This confirms previous work showing that activity recognition using eye movement data is possible, but that achieving high accuracy is challenging.

Keywords: activity recognition, artificial intelligence, data annotation, eye tracking, eye movement analysis, gaze tracking, human activity recognition, intention recognition,

Sammanfattning

Den ständigt ökande mängden av smarta enheter i vår vardag har lett till nya problem inom HCI såsom hur vi människor ska interagera med dessa enheter på ett effektivt och enkelt sätt. Än så länge har kontextuellt medvetna system visat sig kunna vara ett möjligt sätt att lösa detta problem. Om ett system hade kunnat automatiskt detektera personers aktiviteter och avsikter, kunde det agera utan någon explicit inmatning från användaren. Ögon har tidigare visat sig avslöja mycket information om en persons kognitiva tillstånd och skulle kunna vara en möjlig modalitet för att extrahera aktivitetsinformation ifrån. I denna avhandling har vi undersökt möjligheten att detektera aktiviteter genom att använda en billig, hemmabyggd ögonspårningsapparat. Ett experiment utfördes där deltagarna genomförde aktiviteter i ett kök för att samla in data om deras ögonrörelser. Efter att experimentet var färdigt annoterades, förbehandlades och klassificerades datan med hjälp av en multilayer perceptron- och en random forest-klassificerare. Trots att mängden data var relativt liten visade resultaten att igenkänningsgraden var mellan 30-40% beroende på vilken klassificerare som användes. Detta bekräftar tidigare forskning att aktivitetsigenkänning genom att analysera ögonrörelser är möjligt. Dock visar det även att det fortfarande är svårt att uppnå en hög igenkänningsgrad.

Glossary

Gaze To look at something steadily and with intent

HAR Human activity recognition, sometimes referred to as activity recognition (AR), is the process of interpreting and detecting what a person is doing.

HCI Human-computer interaction. The field of research about interfaces between humans and computers.

IoT Internet of Things, the concept of having many small connected computers inside common objects.

ML Machine learning, a collection of algorithms that learn from data in order to classify it.

MLP Multilayer perceptron, a type of machine learning algorithm implemented as a neural network.

Modality A single independent channel of input to a computer system.

RF Random forest, a type of machine learning algorithm implemented as an ensemble of decision trees.

Smart space An environment that is interwoven with sensors, devices and computers that are connected.

Contents

1 Introduction
  1.1 Background
  1.2 Research Questions
  1.3 Hypotheses
  1.4 Structure

2 Related work
  2.1 Systematic Literature Review
  2.2 Search Strategy
  2.3 Summary of Related Work
  2.4 Common Uses for Activity Recognition
  2.5 The Traditional Methods of Activity Recognition
  2.6 Eyes for Activity Recognition

3 Methodology
  3.1 Experiment
  3.2 Method
    3.2.1 Experiment Setup
    3.2.2 Apparatus
    3.2.3 Software Stack
    3.2.4 Tasks
    3.2.5 Participants
    3.2.6 Procedure
  3.3 Data Processing
    3.3.1 Features and Attributes
    3.3.2 Tasks
    3.3.3 Time Window
    3.3.4 Sampling
    3.3.5 Classifiers
    3.3.6 Metrics
    3.3.7 Pilot Experiment

4 Results

5 Discussion and Analysis
  5.1 Results Compared to Related Work

6 Conclusion
  6.1 Future Work

References

A Appendix
  A.1 Systematic Literature Study Search Keywords
  A.2 Systematic Literature Study Inclusion and Exclusion Criteria
  A.3 Applied Inclusion and Exclusion Criteria to the Search Results

1 Introduction

This section gives an introduction to this thesis. It includes the background, research questions, hypotheses and the structure of the rest of the thesis.

1.1 Background

The vision of smart spaces by Mark Weiser [1] is quickly becoming a reality with the emergence of IoT. It is estimated that we will have 50 billion connected devices by the year 2020 [2]. This future will bring us many benefits, some of which we can already see evidence of today. Systems that control the lights in our homes [3] by automatically turning them on and off when needed reduce the environmental impact and lead to more comfortable living. Smart home alarms [4] can increase fire safety and reduce the risk of break-ins. Samsung, one of the largest brands of home electronics in the world [5], has stated that all their products will be smart by 2020 [6].

This increase in devices around us creates new problems and questions, such as how we are going to interact with all these devices. Having a separate interface for each system or device, either physical or as part of some other device, would quickly become overwhelming. It could possibly make these systems more cumbersome to use than if we were without them. One solution to this problem could be to create context-aware systems. These systems are able to recognize and infer different situations in their environment and thereby do not require any explicit input from a user in order to react. Despite the massive amount of previous work in the area of context-aware systems, context recognition is still very challenging. Many sensors must work in conjunction, collecting large amounts of different data points and processing them, in order to recognize the current situation. If this problem can be solved, context-aware systems could play an essential part in realizing the vision of smart spaces and the adoption of the internet of things [7].

We have already seen applications for activity recognition using sensors in smartphones and other wearable sensors. There are applications that recognize sport activities [8], keep track of your health [9] and your movements in everyday life [10, 11]. However, inertial sensors such as accelerometers and gyroscopes are limited in the activities they are able to detect, mainly physical ones. Multiple sensors can be deployed to increase accuracy [12] and the number of possible activities to detect, but this is obtrusive and not particularly user friendly, especially if a person has to wear them on their body. Another common approach is borrowing techniques from the field of computer vision. Researchers have managed to achieve high accuracy when detecting common activities [13, 14] in videos. Aside from the privacy issues associated with using videos for activity recognition, this approach, just as with inertial sensors, can mostly detect physical activities.

Eyes have great potential to reveal information about non-physical states, such as high cognitive load [15, 16, 17], and activities such as reading [18, 19, 20, 21, 22, 23], and can even be used for recognizing emotions [24, 25]. But eye trackers have their own limitations: head-mounted eye trackers are still obtrusive and sensitive to body motion, while remote eye trackers such as the Tobii eye tracker [26] will not always have a clear line of sight to the eyes and are limited to a relatively short detection distance. However, as eye trackers continue to evolve and become more accurate, smaller and cheaper, it is likely that it will be possible to integrate eye trackers in smart glasses to make them completely unobtrusive in the future. We have already seen evidence of this in technologies such as [27, 28] and Tobii eye trackers. Today, eye trackers are a lot better than just ten years ago, and eye tracking technology is used not only in research projects [29, 30] but also in commercial applications such as driver assisting technologies [31, 32] and smartphone applications [33]. It is also one of the primary input modalities for disabled people [34, 35].

Almost all of the existing research regarding activity recognition with eye trackers has been conducted in laboratory environments where the subjects are stationary in front of a screen. That is, the research is about tracking eyes and detecting activity when the subject is looking at regular two-dimensional (2D) screens. Only a small number of researchers have examined activity recognition in three-dimensional (3D) environments situated in real life [30, 36, 37]. Although the previous research showed promising results, there is a clear gap in the current state of this research: only a small number of tasks, hardware platforms and software tools have previously been examined. Eye tracking in 3D needs to be further investigated in order to enable new applications for the future. Some of these applications could, for example, be different kinds of health monitoring applications. Eye-based activity recognition could be used for ensuring that dementia patients have performed necessary activities such as taking their medicine or turning off the stove after using it. Another possible use case for eye-based activity recognition is the future of human-computer interaction in smart spaces. If a system could predict a user's activity without any explicit input, the smart environment could adjust its configuration automatically to better support the user's activity and get us one step closer to the envisioned smart environments of Mark Weiser.

1.2 Research Questions

To investigate if eye movement data can reveal the activities of a user in a 3D environment, the following research questions need to be answered:

1. How accurately can everyday activities in natural environments be recognized by analyzing eye movements collected from a mobile eye tracker?

2. What type of everyday activities can be recognized more accurately by analyzing eye movements collected from a mobile eye tracker?

These questions do not specify any particular hardware, software, environment or group of people. This is mainly due to the novelty of eye tracking and activity recognition in 3D. We chose a more general approach because these other considerations can be future studies of their own.

1.3 Hypotheses

Eyes have been shown to reveal a lot of useful information in the field of activity recognition on 2D screens. In this thesis, we investigate the possibility of detecting activities also in 3D environments. 3D environments are more challenging because of the variation of physical activities and the noise that the real world introduces. Moreover, the accuracy of gaze tracking is lower in 3D environments due to the parallax error. The source of the parallax error is the fact that the gaze tracker is calibrated for a certain distance, but in 3D the gaze needs to be detected at different depths. The main goal of this research is to investigate the possibility of using eye movements for activity recognition in everyday situations such as kitchens, in order to determine the accuracy of activity recognition in 3D environments.

1.4 Structure

This thesis is organized into several sections. Section two introduces a systematic literature review (SLR) and a summary of work related to this thesis. Section three explains the methodology that was used as well as its implementation. Section four presents our results after performing the experiment. Section five presents our discussion of the results. Finally, in section six, we present our conclusions and identify some aspects that can be investigated in future research.

2 Related work

The following section briefly explains the process and purpose of a systematic literature review. It then presents a summary of all the previous work related to this thesis that we could find using our search method.

2.1 Systematic Literature Review

A systematic literature review is a process of identifying and evaluating all available research relevant to a research question, topic area or phenomenon of interest [38, 39]. The process includes the steps of gathering and filtering large amounts of literature in a procedural and systematic manner. It should be well documented and structured to show that the author has been thorough and to make the review reproducible. This ensures that the author is not accused of plagiarism or repeats something unnecessarily, and that the author is knowledgeable enough about the field to be able to contribute to it. It is also used to give the author and reader an overview of the current state of the chosen topic. The systematic review should have a protocol that specifies the research questions and the topic that is the starting point. It also needs a search strategy: a well-documented procedure for how the search process is conducted. Usually, this strategy has inclusion and exclusion criteria that are followed during the review. The review should also specify the information that is to be obtained from each study. All these steps [40, 41] are usually performed iteratively and are thus repeated and altered as many times as needed. The author usually discovers new keywords, phrases and databases while conducting the review, and so the process needs to be changed to encompass these discoveries. As we had limited resources when conducting this study, we could not conduct a full SLR. Therefore our work can be considered a semi-structured literature review (SSLR). An SSLR does not have an official definition other than being a less formal version of an SLR. In our case, we had to restrict the number of search result hits to a certain limit. The main cause of this change was the sheer number of hits that were generated, as described in the next section.

2.2 Search Strategy

The databases chosen for finding the literature for the SSLR were ACM, IEEE Xplore and Scopus. These are well known and recognized databases to use when doing research within computer science. Another database, Google Scholar, was mainly utilized when snowballing techniques were used to find where references were cited, and as a general tool when additional references were needed. To find relevant literature, the research question and the research field were divided into keywords that would be used in the search process (see appendix A.1), mainly activity recognition and eye tracking. Some keywords resulted in too many or irrelevant hits and were therefore either modified or discarded completely. As the research question has evolved over time, general keywords were chosen at the start and later became more specific as the research questions and thesis evolved. To limit the number of hits while still getting a good overview, constraints were defined (see appendix A.2). They were picked as described by [41] and by experimenting and comparing the number of hits in different databases. A maximum limit of 600 hits after applying the second criterion was set because of time and resource constraints for this thesis. If a search resulted in more than 600 hits, the keywords were altered so that the number of hits would fall below our maximum limit. It was mainly Scopus that came close to this limit; the other databases resulted in about 200-300 hits. This restriction is the reason why we call this review an SSLR: in a full SLR, all references would have been included in the final study.

Once the keywords were selected, the search process could commence. The first step was to use the keywords and apply the defined constraints to the references that were found. This meant searching the databases for abstracts containing the chosen keywords. Searching abstracts gave relevant literature while keeping the number of hits low. Another restriction was that only articles and journals were used, as these are recognized as good sources, which further reduced the number of hits. By reading the title and abstract of the resulting literature, it was determined whether the literature should be saved or not. If references were deemed relevant, they were saved with the help of the reference management system Zotero, which allows for organizing, exporting and backing up references. If a duplicate was found, it was discarded so that only the first occurrence of that reference was kept. After this, the conclusions of the previously saved literature were read and the references were either discarded as not relevant or kept. This procedure was then repeated for the full text of each reference as well. The literature that was left after this stage was the final literature deemed important and relevant for the SSLR and the research questions. Once this step was finished, one iteration of backwards snowballing [42] was conducted to find relevant literature cited by the included literature, in order to find more recognized works within the field. Scopus always resulted in a large number of hits, so an additional constraint of limiting the search to the Computer Science subject area was applied to reduce the number of hits further. A total of 54 references was the result of conducting the SSLR using this strategy.

2.3 Summary of Related Work

Activity recognition started to emerge as a field in the late '90s [43]. Since then, great progress has been made in recognizing activities using different types of sensors, either on their own or in combination. We can split the approaches into wearable sensors – something you have attached to your body – and remote sensors – sensors placed in the surrounding environment. The main challenges in activity recognition are the following [11, 43]:

1. Recognition Accuracy – Different approaches can detect different activities at various levels of granularity. This means that some sensors are good at recognizing only specific activities, which has led many researchers to try using a fusion of sensors to achieve better accuracy at the cost of increased complexity. Also, the sheer variance in how people perform different activities makes it difficult to develop a general system that works for all people in every situation.

2. Obtrusiveness – Most wearable apparatuses are obtrusive and have limited resources in terms of battery life and computational power. External sensors mitigate this problem but are limited to the space they are located in and can easily be obstructed by objects in front of them.

3. Privacy – Both wearable and external sensors can be very invasive. Effort has been made to develop less privacy-invasive technologies, but they are limited in what activities they are able to detect.

4. Flexibility – Systems have issues adapting to new environments and people. Most systems need a training phase to be able to recognize a specific activity, which hurts their ability to detect new and unfamiliar situations that the system has never seen before.

2.4 Common Uses for Activity Recognition

Today, many people track their sports activities with the help of their smartphones, counting steps taken or recording running performance and location data to improve their fitness and track their progress [8]. There are also numerous efforts put into developing better health care applications, mainly for elderly people still living at home [9, 44, 45], by using activity recognition for detecting, for example, if a person falls to the ground [13, 46]. Somewhat related to this, activity recognition can also be used when analyzing video surveillance footage. More areas than ever are under video surveillance these days, but we lack the workers to look through all this footage. In most cases, the footage captured is not viewed until a crime has already happened. This is where activity recognition comes in. If it were possible to have a computer analyze the vast amount of footage recorded, it could highlight certain activities such as crimes or abnormal behaviour. It could also categorize large amounts of video to make it searchable [47], thereby removing the need for a human operator to review all the footage, which can be time consuming. However, so far the systems proposed are still error prone and not efficient enough to be able to distinguish more than the basic types of physical activities.

2.5 The Traditional Methods of Activity Recognition

One of the most well-known approaches to recognizing activities today is video based, a huge topic in the field of computer vision. Can computers, just by analyzing images, detect different activities by identifying objects and their behaviour? Some of the most well-known approaches within video-based activity recognition involve using RGB and depth cameras [48], sometimes in combination. A popular device is the Microsoft Kinect [49]. These cameras go under the name RGB-D cameras, and they can provide synchronized colour and depth information at high frame rates. The approach is rather simple: video is annotated and fed into machine learning algorithms to find patterns that represent activities. The authors of [50] used an RGB-D camera and a machine learning algorithm to determine different tasks in a kitchen. They applied object recognition, both on the hands and on items in the kitchen, to be able to detect them and relate the objects to each other. This way, they could infer what the user was currently doing. However, their research was limited in the number of objects and actions and was mostly about detecting fine-grained tasks while following a recipe. The classification was also done offline, limiting the usability of the proof of concept they developed. Even though the field of computer vision has evolved a lot in recent years thanks to advances in machine learning and AI, most applications remain brittle and unreliable. Camera-based systems are limited in the activities they are able to detect, mainly physical ones such as sitting, walking and standing [13, 14]. Another problem is that they are very invasive and obtrusive, even though techniques such as silhouette extraction [46, 51, 52], which hides some details from the video, have been proposed. This has led researchers to start mounting cameras on the body instead. Some have done experiments using wrist-mounted cameras [53], though this is obtrusive as it is not very convenient to have a wrist-mounted camera on you all day. However, considering that cameras become smaller and better every day, it might be a feasible option in the future. Other wearable cameras attached to the head [54] have the same problems regarding obtrusiveness and privacy concerns, and they are also subject to head and body motion that can cause problems with the video being captured.

A less obtrusive approach suggested is using smartwatches. These wrist-mounted devices have become more commonly used and have shown to be a useful tool for detecting activities [55]. The use of the built-in accelerometers and gyroscopes in these devices, as well as in smartphones [56, 57, 58, 59] and dedicated devices [9], has spawned another category of activity recognition systems using inertial sensors. These inertial sensors have become ubiquitous today through the adoption of smartphones, and they offer a viable alternative to video-based systems. However, just as with camera-based options, these are also limited in the activities they can detect – mainly physical ones. Though some researchers have found that combining various sensors, such as cameras and inertial sensors, or multiple inertial sensors, can lead to more accurate recognition [10, 60, 61], these approaches tend to be more complex than using just one sensor. Multiple sensors have to be deployed and maintained, and aggregating the data is a difficult task. Another common approach is using radio-frequency identification (RFID) tags and sensors, which are cheap and easy to deploy but somewhat limited in what they can detect – mainly locations [12, 62]. Many small sensors can be deployed for detecting various states such as on, off, opened, closed etc. The authors of [12] examined this, and their results showed that the average accuracy was low and that some activities are easier to detect than others. A limiting factor is the huge number of activities that need to be labelled and detected. Also, filling up a house with these sensors might not be feasible due to the sheer amount of them, but maybe it will be possible in the future smart home. The authors stated that more training data than the two weeks they had could have improved the accuracy a fair amount.

Entirely different sensors have been suggested, such as microphones [63, 64] for sound recording. These techniques are quite complex but could improve the accuracy of activity recognition when combined with simpler sensors such as gyroscopes and cameras. Another novel approach included sensors that measured the size of the participant's lungs to determine the breathing pattern and infer activities from it [65]. The researchers managed to achieve good accuracy for some activities such as walking, talking and smoking, but had more difficulty with other tasks such as eating. Another approach is detecting radio frequency interference [66], which can act like a camera but provides stronger privacy protection for the users. However, all these novel methods need to be explored further before any conclusions about their usability and accuracy levels can be drawn.

2.6 Eyes for Activity Recognition

Recognizing activities and mental states with the help of our eyes is a well-studied field. Eyes have been shown to reveal a lot of information about non-physical states and activities. The activities recognized are usually very specific and limited to common everyday tasks such as reading different material [18, 19, 20, 21, 22, 23], looking at videos and images [67, 68], office activities [69] and socializing with people [30]. Many experiments include some or all of these in comparison [30, 36, 70, 71, 72] to detect different activities by just tracking the subject's eyes using varying technologies and scenarios. High pupil dilation has been linked to high cognitive load [15, 16, 17] and has been used to detect drivers' attention when driving a vehicle [32] and to detect when a desired object is found during a search task [73, 74]. Emotions can also be detected by looking at pupil dilation [24, 25]. This can be used in various applications such as research, education, advertisement and games. Almost all of this research has been conducted in laboratory environments where the subjects are stationary in front of a screen. That is, the research is about tracking eyes and detecting activity when the subject is looking at 2D environments.

Few researchers have used natural environments, but there are some exceptions. In [30] the researchers examined activity recognition using gaze data in real life. Using a video-based eye tracker, more than a total of 80 hours of everyday activities for ten participants was recorded. The data was annotated, analyzed and classified using various classifiers. The best recognition performance was achieved for mentally challenging workloads such as reading (74.8%), focused work (70%) and computer work (64.2%). The achieved mean accuracy was well over 50%, which suggests that there is a lot of information to be collected from just the eyes. The authors concluded that the more data points they had for each activity, the more accurate the recognition rate became. They noted that eye movements can differ between participants depending on the activity, making it necessary to optimize the recognition system for each participant. However, the activities they recognized were broad and the participants in the experiment did not have any instructions on how to perform them. Somewhat related to this is [36]. They used a smaller set of activity classes and a smaller data set, but concluded that more features equal better accuracy, suggesting that more data points are an important part of activity recognition with eye gaze data. In contrast to this, [37] used much more fine-grained activity classes, such as "take jam" and "open milk", in a kitchen environment. Their focus was on detecting the actual objects in the scene and correlating those objects to the participant's gaze. They managed to find evidence that gaze correlates with an action, suggesting that this approach is viable for activity recognition. However, this approach requires a lot of preparation in terms of mapping objects before they can be detected and might not be a feasible option for use in everyday life.

Intention recognition aims to detect what activity the user wants to do next and is similar to activity recognition. To our knowledge, all intention recognition research has been conducted in 2D environments [75, 76, 77, 78, 79] and no one has considered examining 3D environments. The authors of [79] used 2D images annotated with areas of interest to try to predict the participants' intentions. However, these intentions were not very fine grained but rather general, such as detecting whether the user intends to navigate or consume information. Not only is this research somewhat limited in the classes detected, but using only 2D images is not applicable to the real world.
In conclusion, great effort has been put into the field of activity recognition using inertial sensors and also eye trackers. Good accuracy can be achieved, but most of this research has been conducted in controlled lab environments with subjects looking at 2D screens. This makes it difficult to generalize these findings to the real world and to apply them to various applications.

3 Methodology

The following section motivates why conducting an experiment was chosen as the main research method and why other methods were disregarded. It also presents the experiment setup and procedure as well as how the collected data was processed.

3.1 Experiment

To put our hypothesis to the test, conducting an experiment was considered a valid method. The reason for choosing to perform an experiment is that it allows us to test a hypothesis in a controlled environment, producing cleaner data that can be analyzed and from which conclusions can be drawn [80, 81]. Experiments are often used when we need data from before and after a test, so that we can compare the data and see if we can find any patterns, or to examine the cause and effect of doing something. For this thesis, however, the purpose of the experiment is to collect data in a controlled environment. Conducting experiments is a well-recognized research method, commonly used for investigating novel approaches and techniques in order to explore the potentials and limitations of such technologies in a very controlled environment before studying the application in more realistic scenarios. Even if experiments can produce artificial results due to their controlled nature, being able to freeze the variables is important to avoid noise in the end results. There are many parameters in eye tracking that can introduce noise in the tracking procedure, such as insufficient calibration, misaligned cameras and movement of the eye tracker. These problems can lead to decreased accuracy when detecting and recording eye movement data. This is another reason for collecting data in a controlled setting, as this allows us to control some of these variables.

Other research methods such as case studies, surveys, design and creation, action research and ethnography were not deemed applicable to the research question, or not feasible given the time and resource constraints for this thesis. A case study would require hardware and software that would be unobtrusive, durable and easy to use for participants over a longer period of time. Even if this method had been chosen, it would generate massive amounts of data that would not be possible to analyze given the time limit for this research. The data generated by this experiment was recorded as an overt observation, in the sense that the participants were aware of the observation. This is because the head-mounted eye tracker is impossible to ignore during use. However, if a remote eye tracker had been available, it could have been hidden and used for covert observation instead. A few questions were also asked of the participants in order to get statistics on some of their preferences and attributes. We mainly used a quantitative method [81] to collect eye movement data from several participants in order to find patterns. A qualitative method would not be useful here, as the research questions are quantitative in the way they are written and qualitative data is not possible to use when classifying data with machine learning. In order to create a pipeline for evaluating the equipment, the data annotation and the analysis, a small pilot experiment was first conducted to prepare for a bigger one with more participants and a possibly altered process. The pilot experiment would also allow us to prepare the data analysis pipeline before the main experiment.

3.2 Method

3.2.1 Experiment Setup

A kitchen environment was chosen as a suitable controlled environment for recognizing activities. Kitchens are familiar to most people and provide an environment where many different tasks are performed, such as talking, eating, cooking, cleaning and reading. All tasks contain sub-tasks; for example, cooking different foods can contain many different steps. This way, the granularity could be altered in the annotation process if needed. Initially we wanted to use a real kitchen, but we were unable to obtain one at the university. A kitchen environment (see figure 1) was therefore set up in a room at Malmö University in order to simulate a real kitchen. This environment contained regular kitchenware such as bowls, knives, a peeler, plates, cups and more. This limited us to tasks that did not require a sink or heating. This indoor environment also allowed us to keep a constant level of light and temperature in the room. These variables were kept constant to reduce noise during the experiment; different light levels could, for example, affect pupil dilation.

Figure 1: Kitchen environment where the experiment was conducted

3.2.2 Apparatus

A head-mounted eye tracker was built using two generic web cameras fitted on an aluminum frame that consists of small adjustable pipes (see figure 2). This is a home-made monocular eye tracker in the sense that only one eye, the right one, is tracked. The eye camera has a resolution of 640x480 pixels and was fitted with a piece of camera film on top of it as well as two diodes. The film acts as a filter so that the camera can detect infrared light. The reflection of the diode on the eye, known as glint, is used as a reference point by the eye tracking software. To capture the front view of the person, a camera with a resolution of 1280x720 was attached to the middle of the frame facing forwards. This camera's purpose was mainly to make it possible to distinguish what the participant was actually doing during the experiment. The eye tracker was then connected via USB cables to a laptop that was put in a backpack worn by the participant to enable mobility.

Figure 2: Eye tracker hardware. a) Front-mounted camera capturing the scene. b) Camera capturing the eye movements. c) Strap to ensure that the headset could be fitted on different people's heads.

3.2.3 Software Stack

The laptop was running a modified version of the open source eye tracking software Pupil [82], which is developed in the programming languages Python and C++. Pupil is an eye tracking platform that offers several different applications, and we used two of them, Capture and Player. Capture contains the functionality for calibrating, tracking and recording the eyes using cameras. It has built-in tools for adjusting variables in the eye tracking algorithm to achieve the best possible tracking. UVC [83] is a standard for transferring video data via USB which our web cameras did not support; the modification we wrote for Pupil allowed regular non-UVC-compliant cameras to be used, which the software normally does not support. Player (see figure 3) does playback of video recorded with Capture. Player has functionality for applying different visual elements as overlays on the video, such as the location of the gaze and the pupil detection confidence. It also has tools for automatically detecting blinks, fixations and other features, as well as an annotation tool. All this data, including gaze and pupil positions, can then be exported to comma-separated value (CSV) files for further processing and analysis. As our camera operated at 30 Hz, one second of data resulted in 30 rows of eye movement data, with columns including pupil position, tracking confidence, pupil diameter and angle position. This data would then be used for further processing and analysis.
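As a concrete illustration of the export format described above, the sketch below loads such an export with pandas and checks the approximate sampling rate. The file name and column names are assumptions for illustration, based on the description in this section rather than on the exact Pupil export schema.

```python
# Minimal sketch: load an exported eye movement CSV and inspect the sampling rate.
# "pupil_positions.csv" and the column names are illustrative assumptions.
import pandas as pd

df = pd.read_csv("pupil_positions.csv")   # roughly 30 rows per second of recording

duration_s = df["timestamp"].max() - df["timestamp"].min()
print(f"{len(df)} rows, approx. {len(df) / duration_s:.1f} Hz")
print(df[["timestamp", "confidence", "diameter"]].head())
```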

Figure 3: The Pupil Player software visualizing eye movement data while a participant is performing a task in the kitchen

3.2.4 Tasks

Common kitchen tasks were chosen, since the participants have probably done them several times before. The tasks are part of preparing a dinner for two people and cleaning up afterwards. We chose simple and common tasks, as new or unfamiliar tasks might give completely different eye patterns. The tasks the participants performed were a mix of general and detailed ones, so that it would be possible to differentiate between them and answer our research questions. Directions were given to perform each task twice, preparing a dinner for two people, in order to record at least two repetitions of each task. The tasks are presented in table 1. A note with the tasks that the participant should perform was placed in the kitchen. This allowed us to also capture eye movements when the participant was reading. The note was comparable to a recipe, with step-by-step instructions for each of the tasks.

Table 1: Tasks and their description

Nbr  Task            Description
t0   Read recipe     Read the note of instructions in order to determine the next task to perform
t1   Make tea        Make two cups of tea by using teabags, cups and a kettle
t2   Peel potatoes   Peel two potatoes using a peeler
t3   Slice potatoes  Slice the previously peeled potatoes into thin chips
t4   Peel carrots    Peel two carrots using a peeler
t5   Tear carrots    Tear the previously peeled carrots into small bits
t6   Whisk water     Whisk a bowl of water for a whole minute
t7   Place food      Put the sliced potatoes, shredded carrots and some whisked water onto two plates
t8   Serve on table  Put the plates together with the tea and cutlery on the table
t9   Clear table     Take the plates together with the tea and cutlery back to the kitchen
t10  Throw food      Throw the leftover food into the garbage
t11  Clean dishes    Clean the plate by wiping it down with a cloth
t12  Clean table     Clean the table with a cloth and detergent spray

3.2.5 Participants

For the experiment, ten people were recruited among friends and students around the university, with a mean age of 27.3 years. None of the participants had any prior experience with eye tracking, and they all estimated that they perform 15-60 minutes of kitchen tasks every day. Participant 2 (P2) worked as a cook and therefore spent around eight hours a day in a kitchen. None of the participants wore prescription lenses or glasses during the experiment, but three of the participants acknowledged using glasses or lenses sometimes in everyday life. In addition to these ten participants, two people's data had to be discarded due to inaccurate tracking, caused either by features around their eyes or by misalignment of the eye tracker. We did not place any further restrictions on the participants, as we wanted a diverse set of ordinary people to perform the experiment.

3.2.6 Procedure

Each participant was first asked about their age, an estimate of how many minutes they spend doing tasks in a kitchen per day, and whether they normally wear prescription glasses or lenses. After this, the participant was given a brief introduction to the hardware, the kitchen environment and the tasks they should perform. The participant then got help mounting the eye tracker, and the parameters of the eye tracking algorithm were adjusted for proper detection of the pupil. A calibration procedure, required to ensure proper eye tracking, followed. According to [82], an accuracy between 1-2 degrees is enough for achieving correct gaze tracking during recording. The calibration is conducted by looking at graphical elements shown in each of the four corners of the laptop screen. After the calibration, accuracy was measured with the built-in accuracy test. If the accuracy test showed more than two degrees of discrepancy, new calibrations were performed until the accuracy fell below the two-degree limit. If the accuracy was still not below two degrees after several calibration attempts, the subject was excluded from the experiment. The participant was then told to try to act naturally and to ignore the eye tracker as much as possible. A last check was done to make sure that the cameras were properly aligned and calibrated, and participants were asked if they were comfortable. If the answer was yes, recording started as they performed the tasks, one at a time in sequence. We left the room as the participants started performing their tasks, as we believed our presence might annoy the participant and affect the results. After the tasks were complete, recording stopped and the video was transferred to an external hard drive for backup.

3.3 Data Processing

Collecting data is just the first part of this experiment. A plan for annotating and analyzing the data is equally important to enable a smooth and efficient process. The first step was to annotate the data, namely to mark where each task began and ended. The annotation process also removes noise from the data, such as when the participant is walking around or picking up objects in a way that is not related to the task we wanted to detect. With Pupil Player, eye movement data including pupil positions, fixations, blinks and pupil dilation was exported to CSV files. By studying the captured footage, an extra column was added manually to the CSV files in order to indicate which pupil positions belonged to which task. These files were then used in our pipeline developed in Python, mainly using the machine learning framework scikit-learn [84] and the data set balancing library imbalanced-learn [85]. Using Python, the data in these files was normalized, aggregated and used for calculating new attributes. A script was created in Python for iterating over and creating new CSV files containing these new features with various combinations of time windows, tasks and attributes.
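The manual annotation step could be sketched roughly as below: task start and end times are read off the scene video and used to add a task label column to the exported rows. The file, column and time-span values are illustrative assumptions, not the exact format used.

```python
# Rough sketch of the annotation step: label each exported row with the task
# performed at that time. Time spans and names are illustrative assumptions.
import pandas as pd

df = pd.read_csv("pupil_positions.csv")      # exported eye movement rows
task_spans = [                               # (task, start_s, end_s) read off the scene video
    ("t0", 0.0, 12.5),
    ("t1", 12.5, 95.0),
    ("t2", 95.0, 160.0),
]

df["task"] = None
for task, start, end in task_spans:
    in_span = (df["timestamp"] >= start) & (df["timestamp"] < end)
    df.loc[in_span, "task"] = task

df = df.dropna(subset=["task"])              # drop rows outside any task (noise)
df.to_csv("annotated.csv", index=False)
```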

3.3.1 Features and Attributes

The features we chose to extract are the most recognized features within the field of eye tracking, namely fixations, blinks and saccades [86], but also pupil dilation. Fixations occur when a person is looking at the same point for 200-400 ms [87]; therefore we can define a fixation as when the eyes have not moved within this time range. Blinks can be detected by identifying when the pupil confidence drops below a certain threshold for a short moment; in our case we used a threshold of 50% and a length of 200 ms. Pupil dilation was provided in the pupil positions exported with Capture, so we used it directly for calculating its various features. Saccades occur when the eyes move quickly between two or more points [74]. These were calculated using a velocity-based algorithm implemented according to [86], by calculating the distance and velocity between eye positions. These features were then used to calculate attributes (see table 2) that could be used with a classifier. All features except dispersion for blinks, saccades and pupil dilation increased the accuracy, so these attributes were disregarded.
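The blink and saccade rules described here could be sketched roughly as follows. The column names and the velocity threshold for saccade detection are assumptions chosen for illustration; the thesis follows [86] for the actual algorithm.

```python
# Rough sketch of the blink and saccade definitions above.
# Column names and the velocity threshold are illustrative assumptions.
import numpy as np

def detect_blinks(timestamps, confidence, conf_thresh=0.5, min_len_s=0.2):
    """Return (start, end) pairs where pupil confidence stays below the threshold."""
    blinks, start = [], None
    for t, c in zip(timestamps, confidence):
        if c < conf_thresh and start is None:
            start = t
        elif c >= conf_thresh and start is not None:
            if t - start >= min_len_s:
                blinks.append((start, t))
            start = None
    return blinks

def saccade_mask(timestamps, gaze_x, gaze_y, velocity_thresh=1.0):
    """Velocity-based saccade detection: flag samples whose gaze velocity exceeds a threshold."""
    dt = np.clip(np.diff(timestamps), 1e-6, None)
    velocity = np.hypot(np.diff(gaze_x), np.diff(gaze_y)) / dt
    return velocity > velocity_thresh
```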

3.3.2 Tasks

Using different sets of tasks, we also obtained different results. By aggregating t9 with t8 we were able to increase our accuracy by a small amount. This is most likely due to the fact that these two tasks are very similar, which made it difficult for our classifiers to differentiate between them. Removing and combining other tasks either decreased the accuracy or did not have any noteworthy impact on the result.

Table 2: Attributes calculated

Fixation        Blink           Saccade         Pupil dilation
Quantity        Quantity        Quantity        Min size
Min duration    Min duration    Min duration    Max size
Max duration    Max duration    Max duration    Mean size
Mean duration   Mean duration   Mean duration   Dispersion

3.3.3 Time Window

A time window defines how long a period of data each set of attributes belongs to. For example, a ten-minute video with a time window of three seconds would result in 200 sections of the video, i.e. 200 rows of the new attributes. A larger time window results in less data to classify, which could make it difficult to achieve good accuracy. This is why we resorted to using a sliding window, that is, a window that overlaps with the previous window. The final time window we settled on was five seconds long with 80% overlap, as this setting gave us better results.
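A five second window with 80% overlap corresponds to stepping the window forward by one second at a time. A minimal sketch of this segmentation, assuming an annotated export with a timestamp column, is shown below.

```python
# Sketch of the sliding time window: 5 s windows with 80% overlap (a 1 s step).
import pandas as pd

def sliding_windows(df, window_s=5.0, overlap=0.8):
    step = window_s * (1.0 - overlap)                  # 1 s step for a 5 s window
    t, end = df["timestamp"].min(), df["timestamp"].max()
    while t + window_s <= end:
        yield df[(df["timestamp"] >= t) & (df["timestamp"] < t + window_s)]
        t += step

df = pd.read_csv("annotated.csv")                      # task-labelled rows from section 3.3
print(sum(1 for _ in sliding_windows(df)), "windows")
```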

3.3.4 Sampling

The data also tended to be unbalanced, meaning that there was a much larger number of rows of data for some tasks than for others. To mitigate this problem, it is possible to apply sampling [88] to the data. Oversampling is the process of increasing the number of rows for the minority classes. So if we had 100 rows of t1 and 500 rows of t2, the rows of t1 would be copied or in some other manner increased to match t2. Undersampling is the opposite: it reduces the dominant class by removing rows in various ways. These two approaches were both examined to determine which gave the best accuracy. For our data, oversampling worked better than undersampling, so this was the option we settled on.
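A minimal sketch of the oversampling step with imbalanced-learn's RandomOverSampler is shown below; the attribute matrix and labels are synthetic stand-ins for the per-window data described above.

```python
# Sketch: balance the per-window attribute rows by random oversampling.
from collections import Counter

import numpy as np
from imblearn.over_sampling import RandomOverSampler

X = np.random.rand(120, 16)                  # stand-in for per-window attributes
y = np.array(["t1"] * 100 + ["t2"] * 20)     # unbalanced task labels

ros = RandomOverSampler(random_state=0)
X_bal, y_bal = ros.fit_resample(X, y)
print("before:", Counter(y), "after:", Counter(y_bal))   # t2 duplicated up to 100 rows
```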

3.3.5 Classifiers

Once the attributes and time windows were defined, the resulting files could easily be used within scikit-learn, where various classifiers are available. We settled on two commonly used classifiers, multilayer perceptron (MLP) and random forest (RF). MLP is a commonly used classifier for computer vision and various problems such as image recognition [89], where the input and output are numbers that span over a wide range. Random forests are a collection of decision trees [88] and are very efficient and easy to use, as they do not have many variables to tune. However, as with most problems that can be solved using machine learning, there is no silver bullet, no classifier that can solve all problems. We had to approach this in a greedy manner in order to find classifiers that gave us reasonable results. For splitting the data into training and testing sets, we used k-fold cross-validation, as it is a good approach when using limited data sets [88]. K-fold cross-validation works by dividing the data set into k subsets and, in turn, training the model on all subsets except one, which is held out for testing. The number of folds we settled on was five, as this yielded good accuracy while still being fast compared to using more folds.
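A minimal sketch of this setup with scikit-learn is shown below; the attribute matrix and labels are random stand-ins, and the hyperparameters are library defaults rather than the exact settings used in the thesis.

```python
# Sketch: compare MLP and RF with 5-fold cross-validation on per-window attributes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X = np.random.rand(500, 16)                              # stand-in attribute rows
y = np.random.choice([f"t{i}" for i in range(12)], 500)  # stand-in task labels

classifiers = {
    "MLP": make_pipeline(StandardScaler(), MLPClassifier(max_iter=1000, random_state=0)),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=5)            # 5-fold cross-validation
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```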

3.3.6 Metrics

Just as there is no perfect classifier, there is no single metric that can measure everything. Therefore three types of metrics (see figures 4-7) were used: the confusion matrix, accuracy and f-score [90]. The confusion matrix [88] depicts what the classifier predicted a certain instance to be and what it actually is. True positives (TP) are the instances that the classifier predicted as positive and that actually were positive. True negatives (TN) are the same as TP but for negatives. False positives (FP) are when the classifier predicted an instance to be true but it was actually false. False negatives (FN) are the opposite. Accuracy is the percentage of all instances that were classified correctly. An unbalanced data set could yield high accuracy for some tasks and thus make the model appear good at recognizing a specific data set, resulting in misleading accuracy. This is why we also calculate the f-score for each participant. This metric can be more relevant for imbalanced data sets, as it shows how precise the classifier is. It is the harmonic mean of precision and recall. Precision measures how many of the predicted positive instances were actually positive. Recall denotes how many of the actual positive instances our model captures by classifying them as positive.

$$\text{accuracy} = \frac{TP + TN}{\text{all instances}} \qquad\qquad \text{precision} = \frac{TP}{TP + FP}$$

Figure 4: Accuracy. Figure 5: Precision.

$$\text{recall} = \frac{TP}{TP + FN} \qquad\qquad \text{f-score} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

Figure 6: Recall. Figure 7: F-score.
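These metrics can be computed directly from true and predicted task labels, for example with scikit-learn as sketched below (using macro averaging for the f-score, since there are more than two classes; the labels are illustrative).

```python
# Sketch: compute the metrics of section 3.3.6 from true and predicted task labels.
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score

y_true = ["t1", "t1", "t2", "t3", "t2", "t3"]
y_pred = ["t1", "t2", "t2", "t3", "t2", "t1"]

print(confusion_matrix(y_true, y_pred))                       # rows: actual, columns: predicted
print("accuracy:", accuracy_score(y_true, y_pred))
print("f-score:", f1_score(y_true, y_pred, average="macro"))  # macro average over task classes
```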

3.3.7 Pilot Experiment

A pilot experiment was conducted in order to evaluate our hardware, software and tasks. Some important feedback and considerations were brought into the main experiment. Three participants were recruited to perform five tasks while wearing the eye tracker. For the first participant, the pupil recognition algorithm in the Pupil software had trouble detecting the pupil due to the participant's choice of makeup. This made it very difficult to calibrate and track the gaze correctly, because the software could not distinguish the darker color of the makeup from the pupil. This discovery led us to ask participants in the main experiment not to wear makeup. For the same participant, the eye tracker was too big, and so it tended to be unstable and almost fall off, leading to this participant having to hold the glasses during the session and thus to more instability in the tracking. This led us to add a strap to the back of the glasses, allowing for a much better fit. The preliminary data generated in this pilot allowed us to get an understanding of how the data behaved and how we should proceed to analyze it. With this data, we could develop the machine learning pipeline that we then used for the data generated from the main experiment.

4 Results

For the main experiment, a total of 118.9 minutes of eye movement data was recorded from the start of the first task until the end of the last task for ten participants. The mean session duration was 11.9 minutes per participant, covering all 13 tasks. Figure 8 presents the accuracy of the multilayer perceptron and random forest classifiers using a sliding time window of five seconds with an overlap of 80%. All attributes except dispersion for blinks, saccades and pupil dilation were used, and t9 was merged into t8.

Figure 8: Bar diagram showing the achieved accuracy for each participant. The classifiers used were MLP and RF, using all features and a sliding time window of five seconds.

The mean accuracy across all participants using MLP was 31.6%, while RF resulted in 44.1%. However, these results do not show how the individual classes were labeled; the classifiers could have classified some tasks well and others not at all. Therefore, we also present the f-score, as seen in figure 9, which is a better metric for imbalanced data.

18 Figure 9: Bar diagram showing the achieved f-score for each participant

To get an even more detailed view, we present the results for recognizing each task across all participants with confusion matrices for MLP and RF. This is shown in figures 10 and 11.

Figure 10: Confusion matrix showing correct and incorrect classifications when data from all participants was aggregated using MLP classifier

19 Figure 11: Confusion matrix showing correct and incorrect classifications when data from all participants was aggregated using RF classifier

As seen in figures 10 and 11, MLP has a more even distribution of correct and incorrect classifications. RF tended to classify more instances into a smaller number of categories, mainly t2 and t3. However, RF still had a higher f-score, as shown in figure 9.

5 Discussion and Analysis

In this thesis, the possibility of detecting activities in a kitchen environment has been examined. In this section we present our discoveries and considerations regarding the results. Due to the novelty of this approach, it is difficult to compare the results directly with previous research. The real world is incredibly noisy; variables such as the environment, items, people, activities, hardware, feature calculation and so on will affect the results. Because of this, even comparing activity recognition research in 2D is difficult. Even so, we will try to identify some similarities between our results and previous works in order to see if we can find any correlations or dissimilarities between them. We will also try to answer our previously stated research questions.

By analyzing and classifying our data set, we managed to achieve an accuracy of around 30-40% depending on the classifier. This answers our first research question regarding how accurately activities can be detected using a mobile eye tracker. We also managed to answer our second research question on which activities can be detected more accurately: we discovered that more fine-grained tasks are easier to detect than more general tasks.

The home-built eye tracker was not the most robust solution. It was difficult to mount, unstable, and could move if the participant touched it or tried to adjust it during usage. This could interfere with the tracking capabilities of Pupil and introduce noise into the data set. Also, having a camera right in front of your eye is not the most comfortable situation. This, in combination with a simulated kitchen environment, could contribute to artificial results: the participants could not possibly ignore the eye tracker itself when performing the tasks, and the environment was not a real kitchen. Even so, the low cost and low performance of our hardware shows that reasonable activity recognition is possible even with such limited technology.

The overall recognition rate varied somewhat between the participants. This could be due to insufficient tracking from the eye tracker, the small data sample, or incorrect data annotation or processing. But it could also indicate that the real world is simply too noisy for the classifier to be able to distinguish between the data. Using different features and classifiers showed quite different accuracy rates. RF generally showed better results than MLP in both accuracy and f-score, indicating that it is better at handling imbalanced data. Both classifiers detected some tasks more easily than others, leading to increased accuracy for some while others were not detected at all. This further shows that the choice and configuration of classifiers is very important for achieving good results.

The accuracy and f-score for all participants were similar. This could indicate that different people performing the same tasks produce similar eye movements. P1 showed the lowest accuracy and P3 the highest. However, we could not find any specific patterns in their data as to why this was the case. It could be any of the previously mentioned variables, or something about these participants' eye movements that we did not discover. One of the participants, P2, worked as a cook, so we anticipated that her results would be somewhat different from the rest of the participants, as she estimated that she performs at least eight hours of kitchen tasks every day. Her experience as a cook likely contributed to her performing all tasks the fastest.
However, this did not seem to show in the results, as her accuracy was similar to that of the other participants.

When we compare the confusion matrices, we can see that MLP handled the imbalanced data slightly better than RF. Generally, fine-grained tasks such as peeling and slicing potatoes, peeling and tearing carrots, throwing food and wiping dishes yielded a better recognition rate than the others. These results indicate that some activities, namely the more fine-grained ones, may be easier to detect than others. These tasks most probably contained less noise and more prominent eye movements, as they were performed relatively consistently by all participants. This can be contrasted with making tea, which was done in different ways. Some participants started this task by placing two cups, putting in teabags and pouring water over them. Other participants started by placing one cup, pouring water in, putting in a teabag and then doing the same with the next cup. This could make it difficult to classify this data correctly, especially when it is aggregated over all participants. Using more fine-grained tasks, such as only looking at the movement of the teabag going up and down in the cup, might well have yielded better recognition. Giving clearer and stricter instructions to the participants could have resulted in more consistent data for this task, but it could also have led to participants merely following instructions rather than thinking and deciding what to do next, making the experiment more artificial.
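As a rough illustration of the classifier comparison discussed above, the sketch below shows how an RF and an MLP classifier could be evaluated side by side with scikit-learn, which we used for classification. The feature matrix X and the activity labels y are placeholders for the attributes extracted from the eye movement data, not our actual data set.

# Minimal sketch of an RF vs. MLP comparison; X and y are placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix

rng = np.random.RandomState(0)
X = rng.rand(200, 10)            # placeholder attribute vectors
y = rng.randint(0, 5, size=200)  # placeholder activity labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

classifiers = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "MLP": MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=0),
}

for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    y_pred = clf.predict(X_test)
    print(name, "accuracy:", accuracy_score(y_test, y_pred),
          "macro f1:", f1_score(y_test, y_pred, average="macro"))
    # Rows are true activities, columns predicted ones; the off-diagonal
    # counts show which activities are confused with each other.
    print(confusion_matrix(y_test, y_pred))

The macro-averaged f-score weights all activities equally, which is why it is more informative than plain accuracy on an imbalanced data set such as ours.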

5.1 Results Compared to Related Work

Previous work by [30] showed somewhat similar results to ours. They achieved a higher accuracy, but the authors used the official Pupil hardware, which performs better than ours. They also had 80 hours of data, which strengthens our belief that a richer data set would most likely increase our accuracy. The authors looked both at general tasks such as traveling, which can encompass many sub-activities, and at more fine-grained tasks such as reading. Reading generally achieved a high accuracy of 74-75%, while traveling only reached 14-43%. This is in line with both our findings and those of [36]: a stationary, repeatable activity with low variance is easier to detect than a broad activity. However, we were not able to detect the reading task very well. This could be due to differences in the data set, the classifier we used, or the algorithm used for calculating saccades from the eye movement data. In our case, reading was only performed in very brief moments between the tasks. If we had defined a specific task in which the participant read continuously without interruption for a longer period, we might have detected the reading task better.

We used attributes similar to those of [30] but different classifiers. They used a Bag-of-Words (BOW) model, an approach commonly used within computer vision. It operates by encoding local areas of an image and grouping these encodings together for classification. Perhaps using this approach on our data set could have yielded different results. Similarly, [36] also used BOW in order to detect a smaller number of general and physical activities such as watching, writing, reading and walking. However, their data set was smaller and covered more general tasks. They tried a vision-based approach, a motion-based approach and a combination of the two, and achieved a high accuracy when combining them, indicating that a fusion-based approach seems to be a good option. Their research also indicates that BOW could be examined further, as they achieved good results with it. However, as their tasks were more general, it is not feasible to compare them directly with our study. Moreover, the fact that they performed their experiment on only a single person makes it difficult to draw any major conclusions from their results.
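As a generic illustration of the BOW idea, and not the exact pipeline of [30] or [36], local descriptors can be clustered into a codebook with k-means, after which each sample is represented as a histogram of codeword occurrences that can be fed to any classifier. The descriptor data below is randomly generated and purely hypothetical.

# Generic bag-of-words encoding sketch: cluster local descriptors into a
# codebook, then describe each sample as a histogram over the codewords.
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(pooled_descriptors, n_words=50, seed=0):
    """Cluster the pooled local descriptors into n_words 'visual words'."""
    return KMeans(n_clusters=n_words, random_state=seed).fit(pooled_descriptors)

def bow_histogram(descriptors, codebook):
    """Encode one sample (a set of descriptors) as a normalized word histogram."""
    words = codebook.predict(descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Hypothetical data: 20 samples, each with 100 local descriptors of dimension 8.
rng = np.random.RandomState(0)
samples = [rng.rand(100, 8) for _ in range(20)]

codebook = build_codebook(np.vstack(samples), n_words=50)
features = np.array([bow_histogram(s, codebook) for s in samples])
print(features.shape)  # (20, 50): one fixed-length histogram per sample

The resulting fixed-length histograms could then be classified in the same way as any other attribute vectors.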

The research we could find that was most similar to ours is [37]. Just like us, they examined the possibility of detecting kitchen activities with an eye tracker. However, their approach focused on computer vision, mapping and detecting certain objects such as the participant's hands. Gaze was merely used to determine what the participant was looking at, and eye features such as saccades or fixations were thus not used in the classification. This makes their research not directly comparable with ours, but it could be interesting to combine their approach with data from eye movement analysis. The authors achieved high recognition rates for tasks such as "take carrot" but extremely low ones for "take milk". This suggests that detecting the small differences between fine-grained tasks is difficult, the same problem that we encountered.

To summarize, our results show that a decent accuracy of around 30-40% can be achieved in spite of our small test sample and low-performance hardware. Recognizing activities in 3D environments is possible, but further research is needed before anything can be generalized from our findings.

6 Conclusion

We have explored the possibility of recognizing everyday activities using eye movement data. There is a clear gap in previous research regarding this topic; few researchers have examined 3D environments when detecting activities with eye movement data. Using a home-built eye tracker, we conducted an experiment in which ten participants performed everyday activities in a kitchen environment in order to generate data. This data was then annotated, preprocessed and fed to classifiers in order to map attributes extracted from eye features to specific activities.

With our results, we managed to answer our research questions and hypotheses regarding what level of accuracy can be achieved and which activities are easier to detect. Our results showed that a moderate accuracy of 30-40% can be achieved for fine-grained tasks but that more general tasks tend to be difficult to detect.

In conclusion, our findings suggest that detecting activities in a 3D environment is possible, but that further research, above all with more data, is needed in order to generalize from our results. Our findings indicate that activity recognition in 3D environments remains difficult and complex. Many variables need to be examined further in detail in order to isolate which of them improve the results and which do not.

6.1 Future Work

For future work on this subject, there are a number of topics that can be explored further.

Firstly, having more varied tasks and environments, more participants and more repetitions of each task would probably have yielded quite different results compared to ours. We suggest using real environments; letting participants use the kitchen they use daily would give less artificial results. Outdoor environments are also an interesting area to explore, as they are even noisier than indoor ones.

Secondly, we suggest using higher-performance eye trackers, both head-mounted and remote, in order to compare the differences between them. Our hardware platform was obtrusive and not very reliable for data collection; better hardware would mitigate these problems.

Thirdly, the importance of each step in the machine learning pipeline should not be understated. More complex features could be extracted, such as smooth pursuits (smooth eye movements that follow a moving target), and features could be combined, for example a flag indicating that a blink occurred within a saccade. Such features could in turn be used to calculate more advanced attributes such as saccades per second or the range of a saccade (see the sketch below); these attributes could also have been preprocessed differently to reduce noise more efficiently. We also found it difficult to choose which classifiers to use. There are numerous classifiers with different settings that can be tuned for the specific problem at hand, and a study of which classifier works best for which type of eye movement data set would be a valuable contribution to this field of research.

Fourthly, combining eye tracking in 3D environments with object recognition or other approaches is an area of high interest. Previous research using a fusion of sensors, albeit in 2D, has been shown to yield good results. Therefore, using a fusion of sensors in 3D environments would be an interesting area of research.
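To make the feature ideas above concrete, the hypothetical sketch below computes a saccades-per-second attribute and a blink-within-saccade flag. It assumes that saccade and blink events have already been detected and are given as (start, end) timestamps in seconds; both the helper functions and the example events are illustrative only.

# Hypothetical sketch of the attribute ideas above; saccade and blink events
# are assumed to be (start, end) timestamps in seconds from earlier detection.
def saccades_per_second(saccades, window_start, window_end):
    """Number of saccades starting inside the window, divided by its length."""
    duration = window_end - window_start
    count = sum(1 for start, _ in saccades if window_start <= start < window_end)
    return count / duration if duration > 0 else 0.0

def blink_within_saccade_flags(saccades, blinks):
    """For each saccade, flag whether any blink overlaps it in time."""
    return [any(b_start < s_end and b_end > s_start for b_start, b_end in blinks)
            for s_start, s_end in saccades]

# Made-up example events.
saccades = [(0.10, 0.14), (0.80, 0.85), (1.30, 1.36)]
blinks = [(0.82, 0.95)]
print(saccades_per_second(saccades, 0.0, 2.0))        # 1.5
print(blink_within_saccade_flags(saccades, blinks))   # [False, True, False]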

References

[1] M. Weiser. “The Computer for the 21st Century”. In: Scientific American 265.3 (1991), pp. 94–105.
[2] D. Evans. “The Internet of Things: How the Next Evolution of the Internet is Changing Everything”. In: Cisco Internet Business Solutions Group (IBSG) 1 (2011), pp. 1–11. doi: 10.1109/IEEESTD.2007.373646.
[3] Phillips. url: https://www2.meethue.com/en-us (visited on 03/23/2018).
[4] Z-Wave. url: http://www.z-wave.com/ (visited on 03/23/2018).
[5] Fortune Global 500 List 2017: See Who Made It. url: http://fortune.com/global500/list/ (visited on 03/23/2018).
[6] Samsung. Samsung Delivers Vision for Open and Intelligent IoT Experiences to Simplify Everyday Life. url: https://news.samsung.com/uk/samsung-delivers-vision-for-open-and-intelligent-iot-experiences-to-simplify-everyday-life (visited on 03/23/2018).
[7] O. B. Sezer, E. Dogdu, and A. M. Ozbayoglu. “Context-Aware Computing, Learning, and Big Data in Internet of Things: A Survey”. In: IEEE Internet of Things Journal 5.1 (Feb. 2018), pp. 1–27. doi: 10.1109/JIOT.2017.2773600.
[8] Runtastic. Runtastic: Running, Cycling & Fitness GPS Tracker. url: https://www.runtastic.com/ (visited on 04/10/2018).
[9] T. Toutountzi et al. “EyeOn: An activity recognition system using MYO armband”. In: vol. 29-June-2016. 2016. doi: 10.1145/2910674.2910687.
[10] L. Bao and S. S. Intille. “Activity Recognition from User-Annotated Acceleration Data”. en. In: Pervasive Computing. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, Apr. 2004, pp. 1–17. doi: 10.1007/978-3-540-24646-6_1.
[11] A. Bulling, U. Blanke, and B. Schiele. “A Tutorial on Human Activity Recognition Using Body-worn Inertial Sensors”. In: ACM Comput. Surv. 46.3 (Jan. 2014), 33:1–33:33. doi: 10.1145/2499621.
[12] E. M. Tapia, S. S. Intille, and K. Larson. “Activity Recognition in the Home Using Simple and Ubiquitous Sensors”. en. In: Pervasive Computing. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, Apr. 2004, pp. 158–175. doi: 10.1007/978-3-540-24646-6_10.
[13] O. T. C. Chen et al. “Activity recognition using a panoramic camera for homecare”. In: 2017 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS). Aug. 2017, pp. 1–6. doi: 10.1109/AVSS.2017.8078546.
[14] K. Zhan, F. Ramos, and S. Faux. “Activity recognition from a wearable camera”. In: 2012 12th International Conference on Control Automation Robotics Vision (ICARCV). Dec. 2012, pp. 365–370. doi: 10.1109/ICARCV.2012.6485186.

[15] A. Barreto, Y. Gao, and M. Adjouadi. “Pupil Diameter Measurements: Untapped Potential to Enhance Computer Interaction for Eye Tracker Users?” In: Proceedings of the 10th International ACM SIGACCESS Conference on Computers and Accessibility. Assets ’08. New York, NY, USA: ACM, 2008, pp. 269–270. doi: 10.1145/1414471.1414532.
[16] S. Rafiqi et al. “PupilWare: Towards Pervasive Cognitive Load Measurement Using Commodity Devices”. In: Proceedings of the 8th ACM International Conference on PErvasive Technologies Related to Assistive Environments. PETRA ’15. New York, NY, USA: ACM, 2015, 42:1–42:8. doi: 10.1145/2769493.2769506.
[17] C. M. Privitera et al. “The pupil dilation response to visual detection”. In: Human Vision and Electronic Imaging XIII. Vol. 6806. International Society for Optics and Photonics, Feb. 2008, 68060T. doi: 10.1117/12.772844.
[18] C. S. Campbell and P. P. Maglio. “A Robust Algorithm for Reading Detection”. In: Proceedings of the 2001 Workshop on Perceptive User Interfaces. PUI ’01. New York, NY, USA: ACM, 2001, pp. 1–7. doi: 10.1145/971478.971503.
[19] Y.-C. Jian and H.-W. Ko. “Influences of text difficulty and reading ability on learning illustrated science texts for children: An eye movement study”. In: Computers and Education 113 (2017), pp. 263–279. doi: 10.1016/j.compedu.2017.06.002.
[20] A. Bulling et al. “Robust Recognition of Reading Activity in Transit Using Wearable Electrooculography”. en. In: Pervasive Computing. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, May 2008, pp. 19–37. doi: 10.1007/978-3-540-79576-6_2.
[21] K. Kunze et al. “I Know What You Are Reading: Recognition of Document Types Using Mobile Eye Tracking”. In: Proceedings of the 2013 International Symposium on Wearable Computers. ISWC ’13. New York, NY, USA: ACM, 2013, pp. 113–116. doi: 10.1145/2493988.2494354.
[22] A. Bulling, J. A. Ward, and H. Gellersen. “Multimodal Recognition of Reading Activity in Transit Using Body-worn Sensors”. In: ACM Trans. Appl. Percept. 9.1 (Mar. 2012), 2:1–2:21. doi: 10.1145/2134203.2134205.
[23] F. T. Keat, S. Ranganath, and Y. V. Venkatesh. “Eye gaze based reading detection”. In: Conference on Convergent Technologies for Asia-Pacific Region TENCON 2003. Vol. 2. Oct. 2003, 825–828 Vol. 2. doi: 10.1109/TENCON.2003.1273294.
[24] T. Partala and V. Surakka. “Pupil size variation as an indication of affective processing”. In: International Journal of Human-Computer Studies. Applications of Affective Computing in Human-Computer Interaction 59.1 (July 2003), pp. 185–198. doi: 10.1016/S1071-5819(03)00017-X.
[25] A. Babiker, I. Faye, and A. Malik. “Pupillary behavior in positive and negative emotions”. In: 2013 IEEE International Conference on Signal and Image Processing Applications. Oct. 2013, pp. 379–383. doi: 10.1109/ICSIPA.2013.6708037.
[26] Tobii. url: https://www.tobii.com (visited on 03/22/2018).

[27] S. Ishimaru et al. “In the Blink of an Eye: Combining Head Motion and Eye Blink Frequency for Activity Recognition with Google Glass”. In: Proceedings of the 5th Augmented Human International Conference. AH ’14. New York, NY, USA: ACM, 2014, 15:1–15:4. doi: 10.1145/2582051.2582066.
[28] S. Jalaliniya, D. Mardanbegi, and T. Pederson. “MAGIC Pointing for Eyewear Computers”. In: Proceedings of the 2015 ACM International Symposium on Wearable Computers. ISWC ’15. New York, NY, USA: ACM, 2015, pp. 155–158. doi: 10.1145/2802083.2802094.
[29] P. Majaranta and A. Bulling. “Eye Tracking and Eye-Based Human–Computer Interaction”. en. In: Advances in Physiological Computing. Ed. by S. H. Fairclough and K. Gilleade. London: Springer London, 2014, pp. 39–65. doi: 10.1007/978-1-4471-6392-3_3.
[30] J. Steil and A. Bulling. “Discovery of Everyday Human Activities from Long-term Visual Behaviour Using Topic Models”. In: Proceedings of the 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing. UbiComp ’15. New York, NY, USA: ACM, 2015, pp. 75–85. doi: 10.1145/2750858.2807520.
[31] F. You et al. “Using Eye-Tracking to Help Design HUD-Based Safety Indicators for Lane Changes”. In: Proceedings of the 9th International Conference on Automotive User Interfaces and Interactive Vehicular Applications Adjunct. AutomotiveUI ’17. New York, NY, USA: ACM, 2017, pp. 217–221. doi: 10.1145/3131726.3131757.
[32] O. Palinko et al. “Estimating Cognitive Load Using Remote Eye Tracking in a Driving Simulator”. In: Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications. ETRA ’10. New York, NY, USA: ACM, 2010, pp. 141–144. doi: 10.1145/1743666.1743701.
[33] X. Zhang, H. Kulkarni, and M. R. Morris. “Smartphone-Based Gaze Gesture Communication for People with Motor Disabilities”. In: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. CHI ’17. New York, NY, USA: ACM, 2017, pp. 2878–2889. doi: 10.1145/3025453.3025790.
[34] C. Lankford. “Effective Eye-gaze Input into Windows”. In: Proceedings of the 2000 Symposium on Eye Tracking Research & Applications. ETRA ’00. New York, NY, USA: ACM, 2000, pp. 23–27. doi: 10.1145/355017.355021.
[35] U. Sambrekar and D. Ramdasi. “Human computer interaction for disabled using eye motion tracking”. In: 2015 International Conference on Information Processing (ICIP). Dec. 2015, pp. 745–750. doi: 10.1109/INFOP.2015.7489481.
[36] Y. Shiga et al. “Daily Activity Recognition Combining Gaze Motion and Visual Features”. In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication. UbiComp ’14 Adjunct. New York, NY, USA: ACM, 2014, pp. 1103–1111. doi: 10.1145/2638728.2641691.
[37] A. Fathi, Y. Li, and J. M. Rehg. “Learning to Recognize Daily Actions Using Gaze”. en. In: Computer Vision – ECCV 2012. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, Oct. 2012, pp. 314–327. doi: 10.1007/978-3-642-33718-5_23.

[38] B. Kitchenham. “Procedure for undertaking systematic reviews”. In: Computer Science Department, Keele University (TRISE-0401) and National ICT Australia Ltd (0400011T.1), Joint Technical Report (2004).
[39] M. Kuhrmann, D. M. Fernández, and M. Daneva. “On the Pragmatic Design of Literature Studies in Software Engineering: An Experience-based Guideline”. en. In: (Dec. 2016). arXiv: 1612.03583.
[40] D. Budgen and P. Brereton. “Performing Systematic Literature Reviews in Software Engineering”. In: Proceedings of the 28th International Conference on Software Engineering. ICSE ’06. New York, NY, USA: ACM, 2006, pp. 1051–1052. doi: 10.1145/1134285.1134500.
[41] A. M. O’Brien and C. Mc Guckin. The Systematic Literature Review Method: Trials and Tribulations of Electronic Database Searching at Doctoral Level. 1 Oliver’s Yard, 55 City Road, London EC1Y 1SP United Kingdom: SAGE Publications, Ltd., 2016. doi: 10.4135/978144627305015595381.
[42] C. Wohlin. “Guidelines for Snowballing in Systematic Literature Studies and a Replication in Software Engineering”. In: Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering. EASE ’14. New York, NY, USA: ACM, 2014, 38:1–38:10. doi: 10.1145/2601248.2601268.
[43] O. D. Lara and M. A. Labrador. “A Survey on Human Activity Recognition using Wearable Sensors”. In: IEEE Communications Surveys Tutorials 15.3 (2013), pp. 1192–1209. doi: 10.1109/SURV.2012.110112.00192.
[44] T. Maekawa et al. “Activity Recognition with Hand-worn Magnetic Sensors”. In: Personal Ubiquitous Comput. 17.6 (Aug. 2013), pp. 1085–1094. doi: 10.1007/s00779-012-0556-8.
[45] K.-T. Song and W.-J. Chen. “Human activity recognition using a mobile camera”. In: 2011 8th International Conference on Ubiquitous Robots and Ambient Intelligence (URAI). Nov. 2011, pp. 3–8. doi: 10.1109/URAI.2011.6145923.
[46] Z. Zhou et al. “Video-based activity monitoring for indoor environments”. In: 2009 IEEE International Symposium on Circuits and Systems. May 2009, pp. 1449–1452. doi: 10.1109/ISCAS.2009.5118039.
[47] A. K. S. Kushwaha, M. Kolekar, and A. Khare. “Vision Based Method for Object Classification and Multiple Human Activity Recognition in Video Surveillance System”. In: Proceedings of the CUBE International Information Technology Conference. CUBE ’12. New York, NY, USA: ACM, 2012, pp. 47–52. doi: 10.1145/2381716.2381727.
[48] O. C. Ann and L. B. Theng. “Human activity recognition: A review”. In: 2014 IEEE International Conference on Control System, Computing and Engineering (ICCSCE 2014). Nov. 2014, pp. 389–393. doi: 10.1109/ICCSCE.2014.7072750.
[49] Z. Zhang. “Microsoft Kinect Sensor and Its Effect”. In: IEEE MultiMedia 19.2 (Feb. 2012), pp. 4–10. doi: 10.1109/MMUL.2012.24.

[50] J. Lei, X. Ren, and D. Fox. “Fine-grained Kitchen Activity Recognition Using RGB-D”. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing. UbiComp ’12. New York, NY, USA: ACM, 2012, pp. 208–211. doi: 10.1145/2370216.2370248.
[51] Z. Zhou et al. “Activity Analysis, Summarization, and Visualization for Indoor Human Activity Monitoring”. In: IEEE Transactions on Circuits and Systems for Video Technology 18.11 (Nov. 2008), pp. 1489–1498. doi: 10.1109/TCSVT.2008.2005612.
[52] L. Wang et al. “Silhouette analysis-based gait recognition for human identification”. In: IEEE Transactions on Pattern Analysis and Machine Intelligence 25.12 (Dec. 2003), pp. 1505–1518. doi: 10.1109/TPAMI.2003.1251144.
[53] K. Ohnishi et al. “Recognizing Activities of Daily Living with a Wrist-Mounted Camera”. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June 2016, pp. 3103–3111. doi: 10.1109/CVPR.2016.338.
[54] K. Zhan, V. Guizilini, and F. Ramos. “Dense motion segmentation for first-person activity recognition”. In: 2014 13th International Conference on Control Automation Robotics Vision (ICARCV). Dec. 2014, pp. 123–128. doi: 10.1109/ICARCV.2014.7064291.
[55] F. Shahmohammadi et al. “Smartwatch Based Activity Recognition Using Active Learning”. In: Proceedings of the Second IEEE/ACM International Conference on Connected Health: Applications, Systems and Engineering Technologies. CHASE ’17. Piscataway, NJ, USA: IEEE Press, 2017, pp. 321–329. doi: 10.1109/CHASE.2017.115.
[56] S. A. Hoseini-Tabatabaei, A. Gluhak, and R. Tafazolli. “A Survey on Smartphone-based Systems for Opportunistic User Context Recognition”. In: ACM Comput. Surv. 45.3 (July 2013), 27:1–27:51. doi: 10.1145/2480741.2480744.
[57] D. Bobkov et al. “Activity recognition on handheld devices for pedestrian indoor navigation”. In: 2015 International Conference on Indoor Positioning and Indoor Navigation (IPIN). Oct. 2015, pp. 1–10. doi: 10.1109/IPIN.2015.7346945.
[58] S. Dernbach et al. “Simple and Complex Activity Recognition through Smart Phones”. In: 2012 Eighth International Conference on Intelligent Environments. June 2012, pp. 214–221. doi: 10.1109/IE.2012.39.
[59] J. R. Kwapisz, G. M. Weiss, and S. A. Moore. “Activity Recognition Using Cell Phone Accelerometers”. In: SIGKDD Explor. Newsl. 12.2 (Mar. 2011), pp. 74–82. doi: 10.1145/1964897.1964918.
[60] C. Zhu and W. Sheng. “Realtime human daily activity recognition through fusion of motion and location data”. In: The 2010 IEEE International Conference on Information and Automation. June 2010, pp. 846–851. doi: 10.1109/ICINFA.2010.5512451.
[61] V. Radu et al. “Towards Multimodal Deep Learning for Activity Recognition on Mobile Devices”. In: Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct. UbiComp ’16. New York, NY, USA: ACM, 2016, pp. 185–188. doi: 10.1145/2968219.2971461.

[62] M. Buettner et al. “Recognizing Daily Activities with RFID-based Sensors”. In: Proceedings of the 11th International Conference on Ubiquitous Computing. UbiComp ’09. New York, NY, USA: ACM, 2009, pp. 51–60. doi: 10.1145/1620545.1620553.
[63] E. Garcia-Ceja, C. E. Galván-Tejada, and R. Brena. “Multi-view stacking for activity recognition with sound and accelerometer data”. In: Information Fusion 40 (Mar. 2018), pp. 45–56. doi: 10.1016/j.inffus.2017.06.004.
[64] C. Wojek, K. Nickel, and R. Stiefelhagen. “Activity Recognition and Room-Level Tracking in an Office Environment”. In: 2006 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems. Sept. 2006, pp. 25–30. doi: 10.1109/MFI.2006.265608.
[65] R. I. Ramos-Garcia, S. Tiffany, and E. Sazonov. “Using respiratory signals for the recognition of human activities”. In: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Aug. 2016. doi: 10.1109/EMBC.2016.7590668.
[66] B. Wei et al. “Radio-based Device-free Activity Recognition with Radio Frequency Interference”. In: Proceedings of the 14th International Conference on Information Processing in Sensor Networks. IPSN ’15. New York, NY, USA: ACM, 2015, pp. 154–165. doi: 10.1145/2737095.2737117.
[67] W. J. Ryan et al. “Match-moving for Area-based Analysis of Eye Movements in Natural Tasks”. In: Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications. ETRA ’10. New York, NY, USA: ACM, 2010, pp. 235–242. doi: 10.1145/1743666.1743722.
[68] D. Akkil and P. Isokoski. “Gaze Augmentation in Egocentric Video Improves Awareness of Intention”. In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. CHI ’16. New York, NY, USA: ACM, 2016, pp. 1573–1584. doi: 10.1145/2858036.2858127.
[69] A. Bulling, D. Roggen, and G. Troester. “What’s in the Eyes for Context-Awareness?” In: IEEE Pervasive Computing 10.2 (Apr. 2011), pp. 48–57. doi: 10.1109/MPRV.2010.49.
[70] A. Bulling, C. Weichel, and H. Gellersen. “EyeContext: Recognition of High-level Contextual Cues from Human Visual Behaviour”. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. CHI ’13. New York, NY, USA: ACM, 2013, pp. 305–308. doi: 10.1145/2470654.2470697.
[71] A. Bulling et al. “Eye movement analysis for activity recognition”. In: 2009, pp. 41–50. doi: 10.1145/1620545.1620552.
[72] C. Meng and X. Zhao. “Webcam-Based Eye Movement Analysis Using CNN”. In: IEEE Access 5 (2017), pp. 19581–19587. doi: 10.1109/ACCESS.2017.2754299.
[73] Y. Morita, H. Takano, and K. Nakamura. “Characteristics of Pupil Diameter Variation during Paying Attention to a Specific Image”. In: 2015 IEEE International Conference on Systems, Man, and Cybernetics. Oct. 2015, pp. 2328–2332. doi: 10.1109/SMC.2015.407.

[74] S. P. Liversedge and J. M. Findlay. “Saccadic eye movements and cognition”. In: Trends in Cognitive Sciences 4.1 (Jan. 2000), pp. 6–14. doi: 10.1016/S1364-6613(99)01418-7.
[75] B. Hwang et al. “Probing of human implicit intent based on eye movement and pupillary analysis for augmented cognition”. In: International Journal of Imaging Systems and Technology 23.2 (2013), pp. 114–126. doi: 10.1002/ima.22046.
[76] H. Park et al. “Using eye movement data to infer human behavioral intentions”. In: Computers in Human Behavior 63 (2016), pp. 796–804. doi: 10.1016/j.chb.2016.06.016.
[77] Y.-M. Jang et al. “Recognition of Human’s Implicit Intention Based on an Eyeball Movement Pattern Analysis”. en. In: Neural Information Processing. Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, Nov. 2011, pp. 138–145. doi: 10.1007/978-3-642-24955-6_17.
[78] D.-G. Lee, K.-H. Lee, and S.-Y. Lee. “Implicit shopping intention recognition with eye tracking data and response time”. In: 2015, pp. 295–298. doi: 10.1145/2814940.2815001.
[79] Y.-M. Jang et al. “Human intention recognition based on eyeball movement pattern and pupil size variation”. In: Neurocomputing 128 (2014), pp. 421–432. doi: 10.1016/j.neucom.2013.08.008.
[80] C. Louis and M. Lawrence. Research Methods in Education. 7 edition. London; New York: Routledge, Apr. 2011. Chap. 16.
[81] O. Briony. Researching Information Systems and Computing. English. 1 edition. London; Thousand Oaks, Calif: SAGE Publications Ltd, Nov. 2005.
[82] M. Kassner, W. Patera, and A. Bulling. “Pupil: An Open Source Platform for Pervasive Eye Tracking and Mobile Gaze-based Interaction”. In: Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication. UbiComp ’14 Adjunct. New York, NY, USA: ACM, 2014, pp. 1151–1160. doi: 10.1145/2638728.2641695.
[83] Microsoft. USB Video Class Driver Overview. url: https://docs.microsoft.com/en-us/windows-hardware/drivers/stream/usb-video-class-driver-overview (visited on 04/01/2018).
[84] F. Pedregosa et al. “Scikit-learn: Machine Learning in Python”. In: Journal of Machine Learning Research 12 (2011), pp. 2825–2830.
[85] G. Lemaitre, F. Nogueira, and C. K. Aridas. “Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning”. In: CoRR abs/1609.06570 (2016). arXiv: 1609.06570.
[86] P. Kiefer, I. Giannopoulos, and M. Raubal. “Using eye movements to recognize activities on cartographic maps”. en. In: ACM Press, 2013, pp. 488–491. doi: 10.1145/2525314.2525467.
[87] D. Salvucci and J. Goldberg. “Identifying fixations and saccades in eye-tracking protocols”. In: Proceedings of the Eye Tracking Research and Applications Symposium. Jan. 2000, pp. 71–78. doi: 10.1145/355017.355028.

[88] I. H. Witten, E. Frank, and M. A. Hall. Data mining: practical machine learning tools and techniques. en. 3rd ed. Morgan Kaufmann series in data management systems. OCLC: ocn262433473. Burlington, MA: Morgan Kaufmann, 2011.
[89] M. M. Graham et al. “CHAPTER 52 - Parametric Image Generation with Neural Networks”. In: Quantitative Functional Brain Imaging with Positron Emission Tomography. Academic Press, 1998, pp. 347–352. doi: 10.1016/B978-012161340-2/50054-8.
[90] K. M. Ting. “Precision and Recall”. In: Encyclopedia of Machine Learning and Data Mining. Ed. by C. Sammut and G. I. Webb. Boston, MA: Springer US, 2017, pp. 990–991. doi: 10.1007/978-1-4899-7687-1_659.

A Appendix

A.1 Systematic Literature Study Search Keywords

Table 1: Keywords that were used when searching in the databases

eye and gaze tracking
activity and intention recognition
“gaze tracking”
eye* & intention
eye & track* & recognition
eye* & “activity recognition”
“eye movement analysis”
gaze & “activity recognition”
gaze & intention
inertial & “activity recognition”
camera & “activity recognition”

A.2 Systematic Literature Study Inclusion and Exclusion Criteria

Table 2: The criteria that the literature is filtered through

Nbr | Inclusion/Exclusion | Description | Motivation
1 | Relevant databases, keywords | Only use specified databases, keywords in abstract | These are the sources that are relevant in the computer science field.
2 | Publication form | Only use proceedings, journals and articles | These formats are common and well regarded within computer science.
3 | Relevant abstracts and title | Only read title and abstracts | To determine if the literature is relevant or not.
4 | Relevant conclusion and summary | Only read conclusion and discussion | To further determine if the literature is relevant or not.
5 | Relevant full text content | Read full text | To even further determine if the literature is relevant or not.
6 | Backwards snowballing on chosen literature | Use one iteration of backwards snowballing on chosen literature | To get even more relevant references that are used in previous research and well used within the field. This criterion assumes that the previous criteria are already met.

A.3 Applied Inclusion and Exclusion Criteria to the Search Results

Table 3: Summary of hits after applying criteria

Criteria | Items left
1 | 6708
2 | 3784
3 | 89
4 | 66
5 | 36
6 | 54
