
The Mental Image Revealed by Gaze Tracking

Xi Wang (TU Berlin), Andreas Ley (TU Berlin), Sebastian Koch (TU Berlin), David Lindlbauer (TU Berlin / ETH Zurich), James Hays (Georgia Institute of Technology), Kenneth Holmqvist (Universität Regensburg), Marc Alexa (TU Berlin)

[Figure 1, pipeline overview: stimuli database; user equipped with eye tracker; record fixation data; correlate stimuli with gaze]

Figure 1: We use eye movements of a person recalling an image while looking at nothing (a white wall in this case) to retrieve a set of potentially matching images from a database.

ABSTRACT
Humans involuntarily move their eyes when retrieving an image from memory. This motion is often similar to actually observing the image. We suggest exploiting this behavior as a new modality in human computer interaction, using the motion of the eyes as a descriptor of the image. Interaction requires the user's eyes to be tracked, but no voluntary physical activity. We perform a controlled experiment and develop matching techniques using machine learning to investigate if images can be discriminated based on the gaze patterns recorded while users merely recall an image. Our results indicate that image retrieval is possible with an accuracy significantly above chance. We also show that these results generalize to images not used during training of the classifier and extend to uncontrolled settings in a realistic scenario.

CCS CONCEPTS
• Human-centered computing → Interactive systems and tools; Interaction techniques; Interaction paradigms.

KEYWORDS
gaze pattern, mental imagery, eye tracking

ACM Reference Format:
Xi Wang, Andreas Ley, Sebastian Koch, David Lindlbauer, James Hays, Kenneth Holmqvist, and Marc Alexa. 2019. The Mental Image Revealed by Gaze Tracking. In CHI Conference on Human Factors in Computing Systems Proceedings (CHI 2019), May 4–9, 2019, Glasgow, Scotland UK. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3290605.3300839

1 INTRODUCTION
Imagine that you are thinking about your vacation photos from last summer. After 5 seconds a photo appears in front of you, from that vacation, very similar to the moment you just recalled.

We believe such a seemingly magical task is possible through the use of eye tracking. It has been observed for over a century that humans move their eyes while thinking about images [21, 44, 49], more precisely, while recalling images from memory. While this perhaps surprising effect has long been thought of as an odd, superfluous expenditure of energy, it has been suggested in the 1960's that it is connected to the active construction and organization of the image [18, 46]. This hypothesis has since been rigorously and repeatedly proven [3, 23–25, 36, 37, 51, 56, 60]. It is now clear that eye movements during the recall of an image serve as a spatial index. This means gaze during recall is connected to the location of features in the image.
This connection, in turn, suggests that gaze might contain enough information to retrieve a (known) image a user merely thinks about.

On the other hand, gaze data is coarse. It has to be expected that gaze data for different images may be very similar. To make matters even more difficult, it is known that eye movements during imagery are not reinstatements of those made during perception [24, 56]. Moreover, there is a large variability of eye movements during mental imagery across users. A common effect is that eye movements during imagery are of smaller magnitude compared to those during perception [3, 15, 22, 23]. We discuss more details in Section 2 and limitations imposed by these characteristics in Section 7.

In order to gauge the feasibility of image retrieval based on gaze data during mental imagery we set up a controlled lab experiment (see Section 3)¹. Participants view images while eye tracking is employed. Directly after an image is shown, the screen turns to gray and participants are asked to internally recall the image while keeping their eyes open. We record the eye motions during both image perception and recall using a video-based eye tracker. Based on the collected data we evaluate different retrieval scenarios. In all cases we essentially ask: how well can images be computationally discriminated from other images based on only gaze data? The scenarios differ based on what gaze data is used as a query and what gaze data is used for matching. We consider different combinations which can be generally divided into two scenarios: in Scenario 1, we follow the idea of exploring the available information contained in the data; in Scenario 2, we test a realistic setting for applications, which allows for the possibility of extension to new users.

¹ Gaze data and source code can be found on the project page http://cybertron.cg.tu-berlin.de/xiwang/mental_imagery/retrieval.html

We develop two types of retrieval algorithms for these scenarios.
Restricting ourselves to using spatial histograms of the data, we consider an extended version of the earth mover's distance (EMD) and, at least for the scenarios that provide enough data, we also use deep neural nets. In general, we find that retrieval is feasible, albeit the data from looking at nothing is challenging, and the resulting performance varies significantly across scenarios and observers.

Based on the promising results in a lab setting we make a first step towards a real application (see Section 6): we sent several participants with a mobile eye tracker to a staged 'museum' exhibiting paintings. After their tour, we ask them to recall some of the images while looking at a blank whiteboard. We find that the median rank in a classification approach is small, showing that the approach is promising for practical use.

Despite the encouraging results, our proposed new modality still faces challenges that require further investigation. We discuss them in detail in Section 7 together with possible future applications.

2 BACKGROUND & RELATED WORK

Eye movements during imagery
Early studies [44, 49] reported a large amount of eye movement activity during visual imagery, which suggested a tight link between eye movements and mental images. Today, a large body of research has shown that spontaneous eye movements frequently occur when scenes are recalled from memory and that such gaze patterns closely overlap with the spatial layout of the recalled information [3, 23, 36]. It has been suggested that eye movements to blank spaces, or the processes driving them, function as "spatial indices" [1, 51] that may assist in arranging the relevant parts of a visualized scene [3, 18, 36, 46], and it has been argued that such looks at nothing can act as facilitatory cues during memory retrieval [14, 50].

But the characteristics of eye movements during recall pose four challenges for our idea of constructing an eye-movement based mechanism for image retrieval. Firstly, although imagery eye movements are functional, the eye movements during imagery are not reinstatements of those made during perception [24, 56]. Secondly, it is very possible that covert attentional movements occur during imagery that may account for the functional role of eye movements to nothing [55]. Thirdly, some people prefer to close their eyes during recall, an effect so far not fully understood [41, 62], which makes eye-movement recordings with video-based eye trackers impractical. Fourth, several studies have reported a large variability in the extent of the imagery eye movements of participants. Typically, some participants scale down the size of the area covered by imagery eye movements compared to the area covered by the original image [15, 23].

Gaze interaction
Despite the well-known 'Midas Touch' problem [20], gaze still offers an attractive option as input, since it is comparatively fast and accurate. Especially in hands-free interaction systems, gaze data provides an additional input modality. As an input method, gaze is used to indicate users' intention [19, 35], and to type words [40] and passwords [4]. Other types of applications are based on characteristics of eye movements captured by dwell time [45] or smooth pursuit [13]. For a more comprehensive review of eye-based interaction, we refer readers to work by Majaranta and Bulling [39].

Most similar to our endeavor are attempts at learning something about the image or the observer by simply processing the gaze in a free viewing task. Fixations have been used to select images from a collection to indicate the desired attributes of a search target [54]; or gaze data is exploited to identify fine-grained differences between object classes [30]. Compared to previous work, where users are mostly required to consciously move their eyes, the proposed interaction system relies more on unconscious eye movements.

Figure 2: (a) Experimental apparatus. (b) Trial sequence.

Brain-computer interfaces
The idea of using brain activity for direct communication with the computer is intriguing. Early attempts based on the electroencephalogram (EEG) go back to the 1970s [61], and first consolidations of various research efforts [64] argue that reading the mind will be difficult and that the focus should be on individuals training to steer the EEG for providing certain commands. Today EEG is used in applications such as authentication [42] and studies about brain responses [16].
Although the bandwidth of communication using EEG still seems to be very limited, the trend appears to be combining continuous EEG data with other modalities [38, 66].

Likewise, there have been attempts at reading a person's thoughts using functional magnetic resonance imaging (fMRI) [17, 28]. The task of reconstructing mental images has generated much interest and shows promising results [11, 32, 47, 48, 58]. These results are similar to what we want to achieve: guess the image one is recalling from memory.

Many EEG-based applications are restricted to binary classification. Participants are instructed to imagine moving their left or right hand [2], which needs to be repeated many times to give a sufficient signal-to-noise ratio [43]. Yet it opens up a possibility of communication for patients with aggravating conditions [38, 66]. Participants in studies using fMRI are required to keep still so that the acquired images can be aligned. Such restriction of movements during fMRI scanning poses another challenge (besides costs) in terms of practical usage. Our work is based on tracking the gaze patterns during mental imagery, which is more feasible for practical purposes.

Photo management and retrieval
Managing photos and image collections can be difficult for users (cf. [34]), oftentimes requiring context-based organization and sorting. Especially the retrieval of individual images can pose a significant challenge and requires advanced interfaces and interactions (cf. e.g. [33, 59]). Sketches are probably the oldest way to communicate mental images. Unlike sketches, gaze patterns during image recall are significantly distorted and incomplete compared to the visual appearance of an object. The information in a sketch is much richer and more coherent than the data we have to deal with. Nevertheless, sketches have been shown to work well for retrieving images [12, 53, 65]. In contrast to sketching, we explore how users can retrieve an image without explicit interaction, which can be beneficial in situations where sketching is not available (e.g. because the hands are occupied).

3 EXPERIMENT
The two key components of our method are encoding an image while viewing it and then recalling it in front of a homogeneous background. The goal of this experiment is to collect a large dataset of pairs of eye movements during viewing and recall. The experimental design (timing, conditions) follows similar experiments in previous imagery studies [25, 36].

Setup
Apparatus. As visual stimuli, 100 images are randomly chosen from the MIT data set [27]. The selected dataset contains both indoor and outdoor scenes. Images were presented on a calibrated 24-inch display with a resolution of 1920 × 1200 pixels at a distance of 0.7m. Gaze was tracked with an EyeLink 1000 in remote mode and standard 9-point calibration, using a chin and forehead rest to stabilize participants. The experiments were conducted in a quiet, dim and distraction-free room.

Participants. We recruited 30 participants (mean age = 26 years, SD = 4 years, 9 females). They had normal or corrected to normal visual acuity and no (known) color deficiencies. 10 participants wore contact lenses. No glasses were allowed due to concerns about eye tracking accuracy. Two observers failed to calibrate the eye tracker with the required accuracy, which left us with a dataset of 28 observers viewing and recalling 100 images. Importantly, all participants were naive with respect to the purpose of the experiment. Consent was given before the experiment and participants were compensated for their time.

Protocol. Each individual trial involved a 500ms fixation dot, followed by an image presented for 5000ms, a 1000ms noise mask, and finally a 5000ms gray image for recall (Figure 2b). Participants were instructed to inspect the image when it is shown and then think about the image when the empty gray display is shown. All instructions for encoding and recall were given during an initial round of 10 trials for each observer.
No further instructions regarding the task were given after this training.

The 100 images are shuffled in a random order for each observer and then divided into 5 blocks. At the onset and after each block of 20 trials, the eye tracker was calibrated. Calibration was accepted if the accuracy in the following validation was below 0.5°. Each participant thus viewed five blocks of 20 trials and a total of 100 images and 100 subsequent recalls of the same images.

After finishing all five blocks, observers were asked to look at another 10 images, 5 of which were among the 100 they had seen, and determine whether they had seen them earlier in the experiment. Except for one observer who made one mistake in the memory test, all participants could correctly distinguish viewed images from new ones, which suggests that the image contents were still in the observers' memory.

Figure 3: Eye movement characteristics. (a) Fixation durations (left: encoding, right: recall) plotted as a function of starting time. The black curves indicate the mean durations and the center 50 percent intervals are depicted in blue. (b) Spatial distribution of fixations averaged over all observers, plotted in screen coordinates (left: encoding, right: recall). Each bounding box of fixations in a single trial is visualized in light gray and the average is marked by the black box with a standard deviation shaded in blue.

Analysis
Eye movements during visual imagery have characteristics different from eye movements during perception. As shown in Figure 3a, fixation durations on mental images (452.2 ± 308.0ms) are longer (t(6.7e4) = −57.29, p < .001, Welch's t-test) than fixations on real images (278.0 ± 73.4ms). Consequently, there are fewer fixations during recall (16 ± 2.8 in encoding vs. 11 ± 3.6 in recall, t(5.6e3) = 62.77, p < .001, Welch's t-test), possibly because memory retrieval requires additional processing time compared to perception [23, 25].

A comparison of the spatial distribution of fixations confirmed that observers tend to shrink the overall extent covered by their eye movements while thinking about an image, as shown in Fig. 3b. This is in line with previous studies [15, 22, 23]. Hence, most fixations during imagery/recall are inaccurate relative to the location of objects in the actual image, while in viewing/encoding, fixations are virtually always perfectly aligned with intended image features. This discrepancy implies that there is no obvious one-to-one correspondence between absolute fixation locations during encoding and recall.
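For readers who want to reproduce this kind of comparison, the sketch below runs the duration comparison with Welch's t-test; the file names and data handling are our own illustration, not part of the released code.

```python
# Sketch: Welch's t-test on fixation durations (encoding vs. recall), as in the
# analysis above. The input files are hypothetical; durations are in milliseconds.
import numpy as np
from scipy import stats

encoding = np.loadtxt("fixation_durations_encoding_ms.txt")  # hypothetical file
recall = np.loadtxt("fixation_durations_recall_ms.txt")      # hypothetical file

t, p = stats.ttest_ind(recall, encoding, equal_var=False)    # Welch's t-test
print(f"encoding: {encoding.mean():.1f} +/- {encoding.std():.1f} ms")
print(f"recall:   {recall.mean():.1f} +/- {recall.std():.1f} ms")
print(f"t = {t:.2f}, p = {p:.3g}")
```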
4 RETRIEVAL
Retrieval is based on two steps: first the raw eye tracking data is turned into a more compact representation; then, we try to assign a query representation to one of the 100 image classes. For the second part, we consider different approaches, either measuring the distance between pairs of data representations using an appropriate distance metric, or using deep neural nets. Depending on what gaze data is used for query and matching, we have different retrieval scenarios. In particular, we consider the following combinations:

(1) Within the same condition: the query gaze data and the gaze data used for matching are both taken from the observers while (a) looking at the images or (b) looking at nothing. In these situations we always consider matching across different observers. This means that when querying with data from a specific observer, his/her data is not included for matching.
(2) Across conditions: the query data is taken from an observer looking at nothing and it is matched against gaze data from looking at the images. Here we differentiate between matching the query data (a) against the gaze data from all other observers or (b) against the gaze data from the same observer.

Scenario 1 is meant to establish an idea about the available information contained in the data. Scenario 2, we believe, is a realistic setting for applications, where we want to use gaze data from imagining an image for retrieval while collecting viewing data from other users.

Data representation
We have experimented with a number of different representations and were unable to identify a representation that clearly outperforms the others in terms of retrieval performance. For reasons of clear exposition we use spatial histograms, as they consistently performed well, are simple to explain, and are a natural input for neural nets.

A spatial histogram is defined on a regular lattice with k = m × n cells. We denote the centers of the grid cells as {c_i}, 0 ≤ i < k. To each cell we associate a weight w_i. The weights are computed based on the number of raw gaze samples (returned by the eye tracker) that fall into each cell, i.e., that are closer to c_i than to any other center c_j, j ≠ i. Because of the grid structure, we may consider the set of weights w as intensities of a discrete image. This image is sometimes called a heat map in the eye tracking literature.

We are aware that it is common to first extract fixations from the raw gaze data before further processing. We have opted not to do this, as (1) fixations are not defined in a way that could be universally accepted, (2) the spatial histogram is dominated by the fixations anyway, and (3) saccade data may or may not be useful in classification. In an initial test using MultiMatch [10], a tool developed in the eye tracking community for comparing gaze sequences based on fixation data, we found that the retrieval tasks we are interested in performed only at chance level. It has been observed that the sequence information could be useful [30], but we could not reproduce this observation. This makes sense, as it has been observed that the order of where people look during imagery is not identical to the order during looking at the image [22].
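As an illustration of this representation, the sketch below computes such a spatial histogram directly from raw gaze samples on an m × n grid; the grid size and screen dimensions follow the experiment described above, but the functions themselves are our reconstruction, not the authors' released code.

```python
# Sketch: spatial histogram ("heat map") of raw gaze samples on an m x n grid.
# Cell weights count the samples falling into each cell and are normalized so
# that ||w||_1 = 1. Minimal reconstruction, not the authors' released code.
import numpy as np

def spatial_histogram(gaze_xy, screen=(1920, 1200), grid=(16, 16)):
    """gaze_xy: (N, 2) array of raw gaze positions in screen pixels."""
    m, n = grid
    w, _, _ = np.histogram2d(gaze_xy[:, 0], gaze_xy[:, 1], bins=[m, n],
                             range=[[0, screen[0]], [0, screen[1]]])
    w = w.astype(float).ravel()          # weight vector w with k = m * n entries
    total = w.sum()
    return w / total if total > 0 else w

def cell_centers(screen=(1920, 1200), grid=(16, 16)):
    """Centers c_i of the grid cells, matching the histogram layout above."""
    m, n = grid
    xs = (np.arange(m) + 0.5) * screen[0] / m
    ys = (np.arange(n) + 0.5) * screen[1] / n
    return np.stack(np.meshgrid(xs, ys, indexing="ij"), axis=-1).reshape(-1, 2)
```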
Distance measure
The nature of the data suggests that simple measures of distance are unable to capture the desired correspondences. Indeed, we found that simple Euclidean distances on the weight vectors of spatial histograms, i.e. d(w, w′) = ∥w − w′∥, are not promising. The reasons for this are that the fixations in one gaze sequence are commonly a subset of the fixations in the other sequence. Moreover, the data for mental images is spatially distorted and contains elements that are unrelated to the original stimulus.

Research on fixations and image saliency has investigated various ways to measure the agreement (or disagreement) of real fixations with computational predictions [26]. Bylinskii et al. [5] observe that the Earth Mover's Distance (EMD) [52] tolerates some positional inaccuracy. More generally, EMD is well known to be robust to deformation and outliers for comparing images.

In our context the weights w and w′ are representative of the gaze data. The flows F = {f_ij} describe how much of the weight w_i is matched to w′_j. Based on the idea that w′ is potentially a subset of w, we scale the weight vectors so that

  1 = ∥w∥₁ ≥ ∥w′∥₁ = σ.  (1)

The flow is constrained to relate all of the weights in w′, but not more than w_i, to i:

  Σ_i f_ij = w′_j,  Σ_j f_ij ≤ w_i.  (2)

This implies Σ_ij f_ij = ∥w′∥₁. Among the flows satisfying these constraints (i.e., the flows that completely match w′ to a part of w) one wants to minimize the cost of the flow. Therefore we need to specify how far apart the elements i and j are by a distance metric {d_ij}. The resulting minimization is:

  arg min_F Σ_i Σ_j f_ij d_ij.  (3)

In many cases it is natural to use the Euclidean or squared Euclidean distance between the cell centers c_i and c_j to define d_ij. On the other hand, we cannot be sure that the space in which the recall sequences 'live' is aligned with other such spaces, or with the fixed reference frame of the images. In line with this, learning has been used to optimize the metric {d_ij} for EMD [9, 63]. Our idea in this context is to restrict the potential mapping to be affine. This means we define the distances to be

  d_ij = ∥c_i − T c_j∥²₂  (4)

where T is an arbitrary affine transformation. We optimize for the flows F and the transformation T in alternating fashion. With fixed transformation T this is the standard EMD problem and can be efficiently computed for the size of data we have to deal with. To optimize T we consider the matching pairs of viewing and recall sequences. Based on the given flows, computing an affine transformation for the squared distances is a linear problem. This procedure typically converges to a local minimum close to the starting point [7], which is desirable in our application, as we expect the transformation to be close to identity. In any case, the minimization indirectly defines the resulting distance as

  d(w, w′) = (1/σ) Σ_i Σ_j f_ij d_ij.  (5)

Distance-based retrieval
Based on a distance measure d between spatial histograms represented by their weight vectors w, w′, we can perform retrieval, assuming w′ represents the query data.

The simplest case is scenario 2b, where retrieval is restricted to a single observer. So we have 100 spatial histograms w_i representing the gaze sequences while looking at the images, and a single histogram w′ representing a recall sequence as a query. Computing the 100 distances d(w_i, w′) allows ranking the images.

In the other three scenarios, query data from a single observer is matched to the gaze data from other observers. In these scenarios we base our approach on leave-one-out cross validation, meaning we always use all the data from the remaining 27 observers for matching. Let w_iᵏ be the spatial histogram for image i provided by observer k. For each image we compute the smallest distance of the query w′ to each image across all observers:

  d_i(w′) = min_k d(w_iᵏ, w′).  (6)

Then the ranking is based on d_i(w′). Note that the first rank in this case is the same as using the nearest neighbor in the space of spatial histograms with the distance measure defined above.
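To make the partial matching concrete, the following sketch solves the constrained flow problem of Eqs. (1)–(3) and evaluates Eq. (5) with a generic linear-programming solver; the affine refinement of Eq. (4) is omitted, and the implementation is our own reconstruction under these stated assumptions, not the authors' code.

```python
# Sketch: partial EMD between two spatial histograms (Eqs. 1-3 and 5), without
# the affine refinement of Eq. 4. Our reconstruction using a generic LP solver;
# for a 16 x 16 grid the dense constraint matrices get large, so a smaller grid
# or a dedicated (sparse) EMD solver is advisable in practice.
import numpy as np
from scipy.optimize import linprog

def partial_emd(w, w_prime, centers, sigma=0.5):
    """w, w_prime: weight vectors; centers: (k, 2) cell centers c_i."""
    k = len(w)
    w = np.asarray(w, float) / np.sum(w)                        # ||w||_1 = 1
    wp = sigma * np.asarray(w_prime, float) / np.sum(w_prime)   # ||w'||_1 = sigma
    d = np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    c = d.ravel()                           # cost of each flow variable f_ij
    A_eq = np.zeros((k, k * k))             # sum_i f_ij = w'_j  (all of w' matched)
    A_ub = np.zeros((k, k * k))             # sum_j f_ij <= w_i  (at most w_i used)
    for i in range(k):
        A_eq[i, i::k] = 1.0                 # here the row index plays the role of j
        A_ub[i, i * k:(i + 1) * k] = 1.0
    res = linprog(c, A_ub=A_ub, b_ub=w, A_eq=A_eq, b_eq=wp,
                  bounds=(0, None), method="highs")
    return res.fun / sigma                  # Eq. (5)
```

Ranking a recall query then amounts to evaluating this distance against the 100 viewing histograms and, for the cross-observer scenarios, taking the minimum over observers as in Eq. (6).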
Convolutional Neural Networks
The data we have collected encompasses 2700 gaze data histograms; we felt this number justifies trying to learn a classifier using the currently popular Convolutional Neural Networks (CNN). We design the architectures to have few parameters to reduce overfitting on the data, which is still rather small compared to the typical number of parameters in CNNs.

Network layout. The basic setup for the CNN is similar to the ones used for image classification and is visualized in Figure 4a: each convolution filters the previous layer with learned 3 × 3 filter kernels and produces a new image with as many channels as filter kernels are used. As is common, we combine each convolution with Batch Normalization (BN) and ReLU non-linearity. After two blocks of convolution, we perform Max Pooling to reduce the spatial size. After two blocks of max pooling, the spatial size is down to 3 × 3 elements, at which point we flatten and employ regular fully connected layers. The first of these two fully connected layers again uses BN and ReLU. The last one, however, directly feeds into a softmax layer to produce a probability distribution over the 100 image classes. To improve generalization, we employ dropout layers throughout the network, with a dropout probability of 30% in the convolutional layers and 20% in the fully connected layer.

Application to scenario 1. The network can be directly used to perform retrieval within the same condition (scenarios 1a and 1b), as the query data is of the same type as the data used for generating the network. In this case we train using Cross Entropy Loss for 50 epochs using the Adam parameter update scheme and a batch size of 100. Classification accuracy is tested, as above, using leave-one-out cross validation.

Extension for scenario 2. For working with histograms coming from different processes, it seems better to learn independent encodings. Instead of mapping the histograms directly to their respective image index, and thus casting image retrieval as a classification task, we rather perform image retrieval by learning proper encodings and then comparing them based on distance. In other words, we train an embedding of the histograms from the two conditions into a low dimensional descriptor space such that matching pairs are close together, and non-matching pairs are far apart.

The network architecture (see Figure 4a) is similar to the classification architecture detailed above. The differences are the lack of BN layers, as they could not be used in this training setup, the removal of the softmax output, and the reduction from 100 outputs to just 16. We simultaneously train two instances of this architecture, one for each condition. Both map histograms to 16 dimensional descriptors.

The Triplet Loss [57], employed during training, forces the networks to produce descriptors which, for matching pairs, are more similar (based on Euclidean distance) than for non-matching pairs. We used two triplet losses in tandem with four descriptor computations to balance the training (see Figure 4b).

A ranking for a query histogram can be computed by generating its representation in the low dimensional descriptor space and then using Euclidean distance. The procedure for selecting the best match is identical to the one explained above for the distances computed based on EMD.
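The descriptor-learning setup can be sketched as follows in PyTorch; the exact channel counts, spatial sizes and margin are only partly specified in the text and Figure 4, so the numbers below are illustrative assumptions rather than the authors' implementation.

```python
# Sketch of the two descriptor networks and the triplet training step (PyTorch).
# Channel counts, spatial sizes and the margin are illustrative assumptions;
# this is not the authors' released code.
import torch
import torch.nn as nn

class HistogramEncoder(nn.Module):
    """Maps a 1 x 16 x 16 spatial histogram to a 16-dimensional descriptor
    (no batch normalization, no softmax, as described for scenario 2)."""
    def __init__(self, out_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.Dropout(0.3),
            nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.Dropout(0.3),
            nn.MaxPool2d(2),                                  # 16x16 -> 8x8
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(), nn.Dropout(0.3),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(), nn.Dropout(0.3),
            nn.MaxPool2d(2),                                  # 8x8 -> 4x4
            nn.Flatten(),
            nn.Linear(16 * 4 * 4, 256), nn.ReLU(), nn.Dropout(0.2),
            nn.Linear(256, out_dim),
        )
    def forward(self, x):
        return self.net(x)

view_net, recall_net = HistogramEncoder(), HistogramEncoder()
triplet = nn.TripletMarginLoss(margin=1.0)
optimizer = torch.optim.Adam(list(view_net.parameters()) + list(recall_net.parameters()))

def training_step(view_hist, recall_hist, other_view_hist, other_recall_hist):
    """Two triplet losses in tandem: descriptors of matching view/recall pairs
    are pulled together, descriptors of a different image are pushed away."""
    dv, dr = view_net(view_hist), recall_net(recall_hist)
    dv_other, dr_other = view_net(other_view_hist), recall_net(other_recall_hist)
    loss = triplet(dv, dr, dr_other) + triplet(dr, dv, dv_other)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```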
5 RESULTS
Using the two retrieval methods we now present results for the different scenarios. They are described based on the rank of the queried image among all images. Based on the ranks we form a curve that describes the percentage of the 100 different queries (one for each image) that are matched if we consider the first x ranks (x ≤ 100). This is essentially a Receiver Operating Characteristic (ROC) curve. The leave-one-out cross validation generates one curve for each observer, so 28 curves in total. We consider the mean curve as the overall retrieval result and also provide the 50% interval in the visualizations (see Figures 5, 6, and 7).

For a good retrieval algorithm, the ROC curve climbs up steeply at the beginning and quickly reaches the highest point. This would mean that for most queries the correct image match is among the top ranked retrieval results. The overall retrieval accuracy is then measured by the standard area under the curve (AUC). If retrieval was based on chance, the expected rank for retrieval would be 50 among the 100 images. The chance of being ranked first would be 1%, the chance to be among the first 5 would be 5%, and so on. This means the ROC curve for chance retrieval would be the bisector line with AUC = 0.5.
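The rank-based ROC curve and AUC described here reduce to a few lines of code; the sketch below is our own illustration of that computation.

```python
# Sketch: ROC-style curve and AUC from per-query ranks. ranks[q] is the rank
# (1..n_images) of the correct image for query q. Our own illustration.
import numpy as np

def rank_roc_auc(ranks, n_images=100):
    ranks = np.asarray(ranks)
    xs = np.arange(1, n_images + 1)
    curve = np.array([(ranks <= x).mean() for x in xs])  # fraction matched in top x
    return xs, curve, curve.mean()                       # mean approximates the AUC

# Chance-level ranks give an AUC close to 0.5:
rng = np.random.default_rng(0)
_, _, auc = rank_roc_auc(rng.integers(1, 101, size=100))
print(f"AUC = {auc:.2f}")
```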

Figure 4: (a) CNN architectures employed for histogram based matching. Numbers around layers indicate width and height, as well as the number of channels (written below). (b) Descriptor learning setup with two triplet losses. For two different images, encoding and recall histograms were fed through their respective networks (truncated pyramids) to produce 16-dimensional descriptors. Two triplet losses were used to force matching descriptors to be closer to each other (based on Euclidean distance) than non-matching descriptors.

Both methods depend on the size of the histogram, and the distance measure provides the additional parameter σ. We also checked how enabling the affine transformation influences the results. In general, the results in terms of the AUC do not vary much with these parameters and we make the following recommendations:

• Choosing k too small slightly degrades the results and we suggest using histograms of size k ≥ 16 × 16.
• Allowing a partial match improves the result, i.e. choosing σ = 1 is suboptimal. We have settled for σ = 0.5.
• Allowing a global affine transformation shows improvements in retrieval rate for some of the scenarios. This means learning a global transformation matrix could potentially adjust the deformed recall sequences.

Figure 5: Retrieval performance using CNN (left) and EMD (right). The 50 percent center intervals of all ROC curves over all observers are depicted in light colors, viewing based retrieval in blue and recall based retrieval in orange.

Scenario 1
The tentatively easiest case is retrieving an image based on the gaze data while looking at the image, matched against similarly obtained gaze data. Indeed, we find that the AUC for the distance measure is 96.3% (Fig. 5, U=9.73e3, n1=n2=100, p<.001 two-tailed, Mann-Whitney rank test). In particular, the top ranked image is correct in 52.2% of the cases, and the correct match is among the top 3 in 72.4%. These results are largely independent of the choice of parameters and the use of the affine transformation.

The CNN performs only slightly better with AUC = 97.5% (Fig. 5, U=9.86e3, n1=n2=100, p<.001 two-tailed, Mann-Whitney rank test), and achieved top-1 and top-3 ranks of 61.3% and 79.1%.

This demonstrates that visual images can be easily discriminated based on eye movements from observers exposed to those images, at least for a database of 100 images.

If recall gaze data is matched against recall gaze data, the results are significantly worse. Using matching based on the distance measure we find AUC = 65.1% (U=7.08e3, n1=n2=100, p<.001 two-tailed, Mann-Whitney rank test). The top-1 and top-3 ranks are 3.4% and 8.2% respectively. For the CNN we get AUC = 69.8% (Fig. 5, U=7.08e3, n1=n2=100, p<.001 two-tailed, Mann-Whitney rank test). In 5.9% of the cases the system could correctly identify the image, and the top-3 rate dropped to 13.8% (chance would have been 1% and 3%). Notably, the variance of all ROC curves of retrieval based on recall gaze data is higher (SD = 7.13%, compared to SD = 2.42% for retrieval based on viewing gaze data).

Scenario 2
We perform cross-condition matching in scenario 2 by matching a recall query to viewing gaze data. When matching is performed against all other observers' gaze data, the performance drops to AUC = 68.4%, as shown in Figure 6 on the left (U=6.937e3, n1=n2=100, p<10⁻⁵ two-tailed, Mann-Whitney rank test), with a top-1 of 5.8% and a top-3 of 12.8%.
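The Mann-Whitney rank tests reported in these results compare the obtained ranks against chance-level performance; the sketch below shows one plausible way to set up such a test, where the construction of the chance sample (uniform ranks) is our assumption, as the paper only reports U, the sample sizes and p.

```python
# Sketch: Mann-Whitney U test of obtained retrieval ranks against chance-level
# ranks (uniform over 1..100). The construction of the chance sample is our
# assumption; the paper reports only U, n1 = n2 = 100 and p.
import numpy as np
from scipy.stats import mannwhitneyu

obtained_ranks = np.load("ranks_scenario2_all_observers.npy")   # hypothetical file
chance_ranks = np.random.default_rng(0).integers(1, 101, size=len(obtained_ranks))

u, p = mannwhitneyu(obtained_ranks, chance_ranks, alternative="two-sided")
print(f"U = {u:.4g}, p = {p:.3g}")
```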

Figure 6: Retrieval performance in scenario 2. The left plot shows the matching result against viewing data from all other observers and the right plot shows matching against data from the same observer. The 50 percent center intervals of all ROC curves over all observers are depicted in light colors.

When matching is against the gaze data from the same observer, we achieve an AUC of 73.7% (U=7.47e3, n1=n2=100, p<.001 two-tailed, Mann-Whitney rank test) based on EMD distances (right plot in Figure 6). Top-1 and top-3 matches are 10.0% and 19.6% respectively.

Surprisingly, in this scenario the performance using the CNN is not better than using EMD. Matching against data from all other observers gives AUC = 68.1% (U=6.91e3, n1=n2=100, p<.001 two-tailed, Mann-Whitney rank test). Similar to the results based on the EMD distance, the retrieval performance improves for matching against the data of the same observer: AUC = 73.4% (U=7.44e3, n1=n2=100, p<.001 two-tailed, Mann-Whitney rank test). We suspect this is due to the difficulty of the data in general and, in particular, due to the well known problem of CNNs when dealing with global affine transformations. If there was a way to compensate for such global transformations in CNNs, the results might be better. It would be interesting to continue with this idea in future work.

Figure 7: Retrieval performance in scenario 2 using CNN. The left plot shows the matching result against viewing data from all other observers and the right plot shows matching against data from the same observer. The 50 percent center intervals of all ROC curves over all observers are depicted in light colors.

Discussion
Identifying an image based on gaze data recorded while looking at the image works well, based on a variety of different approaches. It is clear, however, that this result is highly dependent on the set of images being used. If the 100 images evoked similar gaze patterns, they could not be discriminated based on gaze data. Our results indicate that gaze on the images we chose is different enough to allow for computational image retrieval. This is important, because if discrimination had been difficult based on gaze data from viewing, it would have been unlikely that gaze sequences during recall contained enough information for any task.

The results for retrieval based on gaze data from recall, using data from recall to match, indicate how severely distorted the recall data is. The performance significantly decreases and indicates that the task we believe is most important in applications, namely matching gaze data from recall against gaze data from viewing, may be very difficult.

The two scenarios we consider for matching recall data against viewing data show quite different results. Matching the recall data of an observer against their own viewing data works much better than matching against the viewing data from other observers. This indicates that viewing is idiosyncratic. The fact that observers agree more with themselves than with others is also consistent with the findings that fixations during recall are reenactments [1, 51].

To our knowledge, this is the first quantitative assessment of the amount of information in the recall gaze that can be used to identify images.

6 REAL-WORLD APPLICATION
We have established that the gaze pattern while only recalling an image can be used to identify the image using standard techniques from vision and learning. The data, however, was collected under artificial conditions. We are interested in performing a similar experiment, albeit this time under more realistic conditions.

A useful application could be retrieving one or more of the images seen during a museum visit. To explore if this is possible, we hung 20 posters in a seminar room (see Figure 8a), simulating a museum². The images had slightly varying sizes at around 0.6m × 1.0m, in both portrait and landscape orientations. Their centroids were at a height of 1.6m. For the recall phase, an (empty) whiteboard was used.

We recruited 5 participants (mean age = 32, 1 female) for the museum visit; only 1 had participated in the earlier experiment. Each of them was outfitted with a Pupil mobile eye tracker with a reported accuracy of 0.6 degrees [31], equipped with two eye cameras at 120Hz and one front-facing camera at 30Hz. The eye tracker was calibrated on a screen prior to viewing the images. No chin rest was used during the calibration, but participants were asked to keep their head still. As in the controlled experiment, we used a 9-point calibration and a display (1920 × 1200 pixels) placed at 0.7m distance.
² We initially planned to do this in cooperation with any of the large museums close by, but legal and administrative issues connected to the video camera in the eye tracker caused more complications than we felt the merely symbolic character was worth.

Figure 8: (a) The setup in the 'museum' experiment. Participants view 20 images wearing an eye tracker. Later, they were asked to think about the images while looking at the (blank) whiteboard. Gaze patterns from this recall are used for identifying the image. (b) Example of the video streams (scene and eye cameras). Green dots indicate fixations. Red lines represent the trajectory to the previously viewed position.

This also approximates the viewing distance between the visitor and the museum items. After calibration, the positions of the pupil center in the eye cameras are mapped into the front-facing camera, yielding the resulting gaze positions. As long as the eye tracker stays fixed on the head, the calibrated mapping is valid. Participants were asked to inform us if they noticed any displacement of the eye tracker.

After calibration the cameras were continuously recording. Participants were asked to view all the images, in no particular order or timeframe (but without skipping or ignoring any). Furthermore, participants were asked to obey markers on the floor indicating a minimal distance to the images (similar to the rope in a museum). After viewing the images, participants were asked to recall images they liked, in any order, and as many as they wanted to recall. Each recall sequence started with instructions by the experimenter and ended with participants signaling completion.

We manually partitioned the resulting eye tracking data based on the video stream from the front-facing camera. 20 viewing sequences were extracted for each participant, and Figure 8b shows several frames of the data. The viewing duration varied greatly among participants, from a few seconds to more than a minute per image. All participants recalled at least 5 images, with 10 being the highest number. Viewing and recall sequences are represented as variable-length sequences of gaze positions mapped into the front-facing camera. In total, each dataset from one participant contains the gaze positions of 20 viewing sequences and five or more recall sequences.

For the analysis it turned out to be relevant what part of the viewing and recall sequences we would use. We considered the first 5, 10, 15, 20 seconds of the viewing sequence and the first 5, 8, 10, 12 seconds of the recall sequence. We represented the resulting raw data using k-means clustering with k = 10 and computed distances using EMD. The EMD based distance measure is able to compensate for a global rigid transformation, which seems very important for this setting. The resulting ranks are above chance in all cases, yet different parameter settings worked best for different participants. The results did improve with optimization for a transformation. There was no significant advantage in including rotation or scaling, but translation was important.
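The clustering step used here can be sketched as follows with scikit-learn; apart from k = 10, the function and its parameters are our own illustration, and the translation optimization described above is not included.

```python
# Sketch: summarizing a variable-length gaze sequence from the mobile tracker
# with k-means (k = 10), yielding a weighted point set for the EMD comparison.
# Reconstruction with scikit-learn; the translation optimization is not included.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_signature(gaze_xy, k=10, seed=0):
    """gaze_xy: (N, 2) gaze positions in the front-facing camera frame."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(gaze_xy)
    weights = np.bincount(km.labels_, minlength=k).astype(float)
    return km.cluster_centers_, weights / weights.sum()
```

The resulting cluster centers and weights can be plugged into the same partial-EMD computation sketched in Section 4, with the cluster centers taking the role of the grid cell centers.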
The best median ranks for 5 recalled images were, in order from best to worst, 2, 2, 3, 4, 5 (without translation we found 2, 3, 3, 4, 7). So if the durations can be adapted to each participant, the results are surprisingly good. Applying any combination of durations to all 5 participants in the same way leads to worse overall results. Optimization for translation pays off in this situation as well, as the results become more stable against the choice of parameters: for a wide variety of combinations we find median ranks of 4 and 5. All the reported median ranks have to be seen in the context of the median of a purely random retrieval being 10.

None of our participants reported big movements during the experimental session and no secondary calibration was conducted. The camera rate of the eye tracker also seems to be irrelevant in our setting as long as meaningful eye movements are recorded. There is no special requirement for a high-speed eye camera since saccades and other types of micro eye movements are not included in the analysis.

7 LIMITATION AND FURTHER WORK
By collecting a large eye movement dataset in the looking-at-nothing paradigm, we made a first attempt to computationally estimate the content of mental images from eye movements. Our results demonstrate that gaze patterns from observers looking at the photographic images contain a unique signature that can be used to accurately discriminate the corresponding photo. More excitingly, using gaze patterns of observers just thinking about an image, i.e. imagery/recall eye movements, achieved a reasonably good retrieval accuracy. While the proposed method introduces a novel interaction modality, a number of challenges and limitations have to be addressed to explore its full potential, discussed below.

Scalability
Retrieval depends on image content. Clearly, the more photos are added to the database, the more likely it is that several different photos will give rise to the same eye movement behavior, which inevitably affects performance. This is because image representations are essentially downsampled to gaze patterns, and similar representations lead to increased ambiguity. This would pose a limit to how many photos can be used with this retrieval method. In similar studies performed on fMRI data, Chadwick et al. [6] showed that brain activity patterns can be used to distinguish between imagining only three film events, and in [8] fMRI signals were used to reconstruct face images but only 30 images were used.

Moreover, it is very likely that if all participants would look around in the full extent of the monitor when imagining the photo, the retrieval performance of the system would increase significantly. Maybe it would be possible to instruct participants to make more extensive eye movements than they would normally do, in order to help the computer find the right image?
It is not unreasonable to expect a small effort on the part of the participant, which in many brain-computer interface applications is the norm.

Temporal stability
Our recall data were recorded immediately after image viewing. A longer delay might reduce the discrimination performance as the memory deteriorates. In future work, it would be interesting to explore the influence of memory decay and its effect on image retrieval from long-term memory.

Alternative sensing modalities
Our proposed technique relies on sensing the motion of the eyes over time. To gather this data, we used a video-based eye tracker, which requires users' eyes to be open, even when they look at a neutral surface during recall. In contrast, other sensing techniques, e.g. electrooculography (EOG), allow sensing eye movements when users' eyes are closed. This would also allow users to create a neutral background by simply closing their eyes, which might even increase the vividness of mental imagery [41, 62]. However, the lack of reference with closed eyes might introduce different types of distortions. We believe that exploring alternative sensing modalities such as EOG as a replacement or additional data source will allow our approach to become more viable for everyday interactions. We aim to explore additional sensing modalities as well as their deployment in less controlled environments in the future.

Towards real-world applications
Our current work focuses on the evaluation of the proposed new interaction model along with the development of computational tools. We used image retrieval as an indicator for the success of conveying the user's intention. From a practical point of view, it would be interesting to compare our method to existing techniques such as manual selection or a speech-based interface.

How to accurately track eye movements using mobile eye trackers poses another challenge. In our museum visiting experiment, we did notice that the calibration accuracy gradually got worse. As the shifts of the eye tracker over time are rigid translations, our comparison methods should be able to compensate for them. Addressing such limitations relies on further improvements in mobile eye tracking.

With the development of wearable devices, we believe tracking the motion of the eyes will become a natural by-product. Combination with other interaction modalities, such as the possibility offered by recent work on speech interfaces [29], offers a rich source of information. With additional sources of information, we believe that our interface would provide an improved interaction between users and software agents.

8 CONCLUSIONS
In this paper we present a new modality based on the involuntary activity of the ocular muscles during recall of a previously observed image. We have developed computational methods for matching eye movement data among different observers and different conditions. A controlled experiment together with an experiment in a museum-like setting have demonstrated the feasibility of the new modality.

Overall, this study provides evidence that eye-movement based image retrieval is computationally feasible. We have shown reasonable performance with naive participants, and we have good reason to believe that instructed participants who make a small effort to move their eyes more during imagery/recall can achieve very high accuracy levels for photo databases of 100 images or more.

REFERENCES
[1] Gerry TM Altmann. 2004. Language-mediated eye movements in the absence of a visual world: The 'blank screen paradigm'. Cognition 93, 2 (2004), B79–B87.
[2] Benjamin Blankertz, Ryota Tomioka, Steven Lemm, Motoaki Kawanabe, and Klaus-Robert Müller. 2008. Optimizing Spatial Filters for Robust EEG Single-Trial Analysis. IEEE Signal Processing Magazine 25, 1 (2008), 41–56. https://doi.org/10.1109/MSP.2008.4408441
[3] Stephan A Brandt and Lawrence W Stark. 1997. Spontaneous eye movements during visual imagery reflect the content of the visual scene. Journal of Cognitive Neuroscience 9, 1 (1997), 27–38. https://doi.org/10.1162/jocn.1997.9.1.27
[4] Andreas Bulling, Florian Alt, and Albrecht Schmidt. 2012. Increasing the Security of Gaze-based Cued-recall Graphical Passwords Using Saliency Masks. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12). ACM, New York, NY, USA, 3011–3020. https://doi.org/10.1145/2207676.2208712
[19] Rob Jacob and Sophie Stellmach. 2016. What you look at is what you get: gaze-based user interfaces. interactions 23, 5 (2016), 62–65.
[20] Robert JK Jacob. 1990. What you look at is what you get: eye movement-based interaction techniques. In Proceedings of the SIGCHI conference on Human factors in computing systems. ACM, 11–18.
[21] Edmund Jacobson. 1932. Electrophysiology of mental activities. The American Journal of Psychology 44, 4 (1932), 677–694.
[22] Roger Johansson, Jana Holsanova, Richard Dewhurst, and Kenneth Holmqvist. 2012. Eye movements during scene recollection have a functional role, but they are not reinstatements of those produced during encoding.
Journal of Experimental Psychology: Human Perception [5] Zoya Bylinskii, Tilke Judd, Aude Oliva, Antonio Torralba, and Frédo and Performance 38, 5 (2012), 1289–1314. Durand. 2016. What do different evaluation metrics tell us about [23] Roger Johansson, Jana Holsanova, and Kenneth Holmqvist. 2006. saliency models? arXiv preprint arXiv:1604.03605 (2016). Pictures and Spoken Descriptions Elicit Similar Eye Movements [6] Martin J. Chadwick, Demis Hassabis, Nikolaus Weiskopf, and During Mental Imagery, Both in Light and in Complete Darkness. Eleanor A. Maguire. 2010. Decoding Individual 30, 6 (2006), 1053–1079. https://doi.org/10.1207/ Traces in the Human Hippocampus. Current Biology 20, 6 (2010), 544 – s15516709cog0000_86 547. https://doi.org/10.1016/j.cub.2010.01.053 [24] Roger Johansson, Jana Holsanova, and Kenneth Holmqvist. 2011. The [7] Scott Cohen and Leonidas Guibas. 1999. The Earth Mover’s Distance dispersion of eye movements during visual imagery is related to in- under transformation sets. In Proceedings of the Seventh IEEE Inter- dividual differences in spatial imagery ability. In Proceedings of the national Conference on Computer Vision, Vol. 2. 1076–1083. https: Cognitive Science Society, Vol. 33. //doi.org/10.1109/ICCV.1999.790393 [25] Roger Johansson and Mikael Johansson. 2014. Look Here, Eye Move- [8] Alan S Cowen, Marvin M Chun, and Brice A Kuhl. 2014. Neural ments Play a Functional Role in Memory Retrieval. Psychological Sci- portraits of perception: reconstructing face images from evoked brain ence 25, 1 (2014), 236–242. https://doi.org/10.1177/0956797613498260 activity. Neuroimage 94 (2014), 12–22. [26] Tilke Judd, Frédo Durand, and Antonio Torralba. 2012. A Benchmark [9] Marco Cuturi and David Avis. 2014. Ground Metric Learning. J. Mach. of Computational Models of Saliency to Predict Human Fixations. In Learn. Res. 15, 1 (Jan. 2014), 533–564. http://dl.acm.org/citation.cfm? MIT Technical Report. id=2627435.2627452 [27] Tilke Judd, Krista Ehinger, Frédo Durand, and Antonio Torralba. 2009. [10] Richard Dewhurst, Marcus Nyström, Halszka Jarodzka, Tom Foulsham, Learning to Predict Where Humans Look. In IEEE International Con- Roger Johansson, and Kenneth Holmqvist. 2012. It depends on how ference on Computer Vision (ICCV). 2106–2113. https://doi.org/10.1109/ you look at it: Scanpath comparison in multiple dimensions with ICCV.2009.5459462 MultiMatch, a vector-based approach. Behavior research methods 44, 4 [28] Yukiyasu Kamitani and Frank Tong. 2005. Decoding the visual and (2012), 1079–1100. subjective contents of the . Nature neuroscience 8, 5 (2005), [11] Changde Du, Changying Du, and Huiguang He. 2017. Sharing deep 679. generative representation for perceived image reconstruction from [29] Arnav Kapur, Shreyas Kapur, and Pattie Maes. 2018. AlterEgo: A human brain activity. In Neural Networks (IJCNN), 2017 International Personalized Wearable Silent Speech Interface. In 23rd International Joint Conference on. IEEE, 1049–1056. Conference on Intelligent User Interfaces (IUI ’18). ACM, New York, NY, [12] Matthias Eitz, Kristian Hildebrand, Tamy Boubekeur, and Marc Alexa. USA, 43–53. https://doi.org/10.1145/3172944.3172977 2011. Sketch-Based Image Retrieval: Benchmark and Bag-of-Features [30] Nour Karessli, Zeynep Akata, Bernt Schiele, and Andreas Bulling. 2017. Descriptors. IEEE Transactions on Visualization and Computer Graphics Gaze Embeddings for Zero-Shot Image Classification. In 2017 IEEE 17, 11 (Nov 2011), 1624–1636. 
https://doi.org/10.1109/TVCG.2010.266 Conference on Computer Vision and Pattern Recognition (CVPR). 6412– [13] Augusto Esteves, Eduardo Velloso, Andreas Bulling, and Hans 6421. https://doi.org/10.1109/CVPR.2017.679 Gellersen. 2015. Orbits: Gaze interaction for smart watches using [31] Moritz Kassner, William Patera, and Andreas Bulling. 2014. Pupil: smooth pursuit eye movements. In Proceedings of the 28th Annual ACM an open source platform for pervasive eye tracking and mobile gaze- Symposium on User Interface Software & Technology. ACM, 457–466. based interaction. In Proceedings of the 2014 ACM International Joint [14] Fernanda Ferreira, Jens Apel, and John M Henderson. 2008. Taking Conference on Pervasive and Ubiquitous Computing Adjunct Publication a new look at looking at nothing. Trends in cognitive sciences 12, 11 - UbiComp ’14 Adjunct. ACM Press, New York, New York, USA, 1151– (2008), 405–410. 1160. https://doi.org/10.1145/2638728.2641695 [15] Joystone Gbadamosi and Wolfgang H Zangemeister. 2001. Visual [32] Kendrick N. Kay, Thomas Naselaris, Ryan J. Prenger, and Jack L. Gallant. imagery in hemianopic patients. Journal of 13, 2008. Identifying natural images from human brain activity. Nature 7 (2001), 855–866. 452 (2008), 352–355. Issue 7185. https://doi.org/10.1038/nature06713 [16] Christiane Glatz, Stas S Krupenia, Heinrich H Bülthoff, and Lewis L [33] Seon Joo Kim, Hongwei Ng, Stefan Winkler, Peng Song, and Chi-Wing Chuang. 2018. Use the Right Sound for the Right Job: Verbal Commands Fu. 2012. Brush-and-drag: A Multi-touch Interface for Photo Triaging. and Auditory Icons for a Task-Management System Favor Different In Proceedings of the 14th International Conference on Human-computer Information Processes in the Brain. In Proceedings of the 2018 CHI Interaction with Mobile Devices and Services (MobileHCI ’12). ACM, New Conference on Human Factors in Computing Systems. ACM, 472. York, NY, USA, 59–68. https://doi.org/10.1145/2371574.2371584 [17] Stephenie A Harrison and Frank Tong. 2009. Decoding reveals the [34] David Kirk, Abigail Sellen, Carsten Rother, and Ken Wood. 2006. Un- contents of visual in early visual areas. Nature 458, derstanding Photowork. In Proceedings of the SIGCHI Conference on 7238 (2009), 632. https://doi.org/10.1038/nature07832 Human Factors in Computing Systems (CHI ’06). ACM, New York, NY, [18] Donald O Hebb. 1968. Concerning imagery. Psychological review 75, 6 USA, 761–770. https://doi.org/10.1145/1124772.1124885 (1968), 466. [35] Mikko Kytö, Barrett Ens, Thammathip Piumsomboon, Gun A Lee, and Mark Billinghurst. 2018. Pinpointing: Precise Head-and Eye-Based Target Selection for Augmented . In Proceedings of the 2018 CHI [52] Yossi Rubner, Carlo Tomasi, and Leonidas J. Guibas. 2000. The Earth Conference on Human Factors in Computing Systems. ACM, 81. Mover’s Distance as a Metric for Image Retrieval. International Journal [36] Bruno Laeng, Ilona M. Bloem, Stefania D’Ascenzo, and Luca Tommasi. of Computer Vision 40, 2 (2000), 99–121. https://doi.org/10.1023/A: 2014. Scrutinizing visual images: The role of gaze in mental imagery 1026543900054 and memory. Cognition 131, 2 (2014), 263 – 283. https://doi.org/10. [53] Patsorn Sangkloy, Nathan Burnell, Cusuh Ham, and James Hays. 2016. 1016/j.cognition.2014.01.003 The Sketchy Database: Learning to Retrieve Badly Drawn Bunnies. [37] Bruno Laeng and Dinu-Stefan Teodorescu. 2002. Eye scanpaths dur- ACM Trans. Graph. 35, 4, Article 119 (July 2016), 12 pages. 
https: ing visual imagery reenact those of perception of the same visual //doi.org/10.1145/2897824.2925954 scene. Cognitive Science 26, 2 (2002), 207–231. https://doi.org/10.1207/ [54] Hosnieh Sattar, Andreas Bulling, and Mario Fritz. 2017. Predicting the s15516709cog2602_3 Category and Attributes of Visual Search Targets Using Deep Gaze [38] Brent J Lance, Scott E Kerick, Anthony J Ries, Kelvin S Oie, and Kaleb Pooling. In Proceedings of the IEEE Conference on Computer Vision and McDowell. 2012. Brain-Computer Interface Technologies in the Com- Pattern Recognition. 2740–2748. ing Decades. Proc. IEEE 100, Special Centennial Issue (May 2012), [55] Agnes Scholz, Anja Klichowicz, and Josef F Krems. 2017. Covert shifts 1585–1599. https://doi.org/10.1109/JPROC.2012.2184830 of attention can account for the functional role of “eye movements to [39] Päivi Majaranta and Andreas Bulling. 2014. Eye tracking and eye-based nothing”. Memory & Cognition (2017), 1–14. human–computer interaction. In Advances in physiological computing. [56] Agnes Scholz, Katja Mehlhorn, and Josef F Krems. 2016. Listen up, Springer, 39–65. eye movements play a role in verbal memory retrieval. Psychological [40] Päivi Majaranta and Kari-Jouko Räihä. 2002. Twenty Years of Eye Typ- research 80, 1 (2016), 149–158. ing: Systems and Design Issues. In Proceedings of the 2002 Symposium [57] Florian Schroff, Dmitry Kalenichenko, and James Philbin. 2015. on Eye Tracking Research & Applications (ETRA ’02). ACM, New York, FaceNet: A unified embedding for face recognition and clustering. NY, USA, 15–22. https://doi.org/10.1145/507072.507076 In IEEE Conference on Computer Vision and Pattern Recognition, CVPR [41] Serena Mastroberardino and Annelies Vredeveldt. 2014. Eye-closure 2015, Boston, MA, USA, June 7-12, 2015. 815–823. increases children’s memory accuracy for visual material. Frontiers in [58] Guohua Shen, Tomoyasu Horikawa, Kei Majima, and Yukiyasu Kami- Psychology 5 (2014), 241. https://doi.org/10.3389/fpsyg.2014.00241 tani. 2017. Deep image reconstruction from human brain activity. [42] Nick Merrill and John Chuang. 2018. From Scanning to Read- bioRxiv (2017). https://doi.org/10.1101/240317 ing : Talking to Engineers about Brain-Computer Interface. In [59] Ben Shneiderman, Benjamin B. Bederson, and Steven M. Drucker. Proceedings of the 2018 CHI Conference on Human Factors in Computing 2006. Find That Photo!: Interface Strategies to Annotate, Browse, and Systems. ACM, 323. Share. Commun. ACM 49, 4 (April 2006), 69–71. https://doi.org/10. [43] José del R. Millán, Rüdiger Rupp, Gernot Müller-Putz, Roderick Murray- 1145/1121949.1121985 Smith, Claudio Giugliemma, Michael Tangermann, Carmen Vidaurre, [60] Michael J Spivey and Joy J Geng. 2001. Oculomotor mechanisms Febo Cincotti, Andrea Kubler, Robert Leeb, Christa Neuper, Klaus activated by imagery and memory: Eye movements to absent objects. Müller, and Donatella Mattia. 2010. Combining Brain-Computer In- Psychological research 65, 4 (2001), 235–241. terfaces and Assistive Technologies: State-of-the-Art and Challenges. [61] Jacques J Vidal. 1973. Toward direct brain-computer communication. Frontiers in Neuroscience 4 (2010), 161. https://doi.org/10.3389/fnins. Annual review of Biophysics and Bioengineering 2, 1 (1973), 157–180. 2010.00161 [62] Annelies Vredeveldt, Colin G. Tredoux, Kate Kempen, and Alicia Nortje. [44] Charles S Moore. 1903. Control of the memory image. The Psychological 2015. 
Eye Remember What Happened: Eye-Closure Improves Recall Review: Monograph Supplements 4, 1 (1903), 277–306. of Events but not Face Recognition. Applied 29, [45] Martez E Mott, Shane Williams, Jacob O Wobbrock, and Mered- 2 (2015), 169–180. https://doi.org/10.1002/acp.3092 ith Ringel Morris. 2017. Improving dwell-based gaze typing with [63] Fan Wang and Leonidas J. Guibas. 2012. Supervised Earth Mover’s dynamic, cascading dwell times. In Proceedings of the 2017 CHI Confer- Distance Learning and Its Computer Vision Applications. In Pro- ence on Human Factors in Computing Systems. ACM, 2558–2570. ceedings of the 12th European Conference on Computer Vision - Vol- [46] Ulric Neisser. 1967. Cognitive psychology. New York: Appleton- ume Part I (ECCV’12). Springer-Verlag, Berlin, Heidelberg, 442–455. Century-Crofts. https://doi.org/10.1007/978-3-642-33718-5_32 [47] Dan Nemrodov, Matthias Niemeier, Ashutosh Patel, and Adrian [64] Jonathan R Wolpaw, Niels Birbaumer, Dennis J McFarland, Gert Nestor. 2018. The Neural Dynamics of Facial Identity Processing: Pfurtscheller, and Theresa M Vaughan. 2002. Brain-computer inter- insights from EEG-Based Pattern Analysis and Image Reconstruc- faces for communication and control. Clinical Neurophysiology 113, 6 tion. eNeuro (2018). https://doi.org/10.1523/ENEURO.0358-17.2018 (2002), 767 – 791. https://doi.org/10.1016/S1388-2457(02)00057-3 arXiv:http://www.eneuro.org/content/early/2018/01/29/ENEURO.0358- [65] Qian Yu, Feng Liu, Yi-Zhe SonG, Tao Xiang, Timothy Hospedales, 17.2018.full.pdf and Chen Change Loy. 2016. Sketch Me That Shoe. In 2016 IEEE [48] Shinji Nishimoto, An T. Vu, Thomas Naselaris, Yuval Benjamini, Bin Conference on Computer Vision and Pattern Recognition (CVPR). 799– Yu, and Jack L. Gallant. 2011. Reconstructing Visual Experiences from 807. https://doi.org/10.1109/CVPR.2016.93 Brain Activity Evoked by Natural Movies. Current Biology 21, 19 (2011), [66] Thorsten O. Zander and Christian Kothe. 2011. Towards passive brain- 1641 – 1646. https://doi.org/10.1016/j.cub.2011.08.031 computer interfaces: applying brain-computer interface technology to [49] Cheves West Perky. 1910. An experimental study of . The human-machine systems in general. Journal of Neural Engineering 8, American Journal of Psychology 21, 3 (1910), 422–452. 2 (2011), 025005. http://stacks.iop.org/1741-2552/8/i=2/a=025005 [50] Daniel C Richardson, Gerry TM Altmann, Michael J Spivey, and Mer- rit A Hoover. 2009. Much ado about eye movements to nothing: a response to Ferreira et al.: Taking a new look at looking at nothing. Trends in Cognitive Sciences 13, 6 (2009), 235–236. [51] Daniel C Richardson and Michael J Spivey. 2000. Representation, space and Hollywood Squares: Looking at things that aren’t there anymore. Cognition 76, 3 (2000), 269–295.