Eye Tracking Data in Multimedia Containers for Instantaneous Visualizations

Julius Schöning*, Patrick Faion, Gunther Heidemann, and Ulf Krumnack
Institute of Cognitive Science, Osnabrück University, Germany
*e-mail: {juschoening, pfaion, gheidema, krumnack}@uos.de

© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Figure 1: Instantaneous visualization of gaze data, shown as yellow dots: (a) visualization of gaze data on a video frame in the VLC player; (b) as with ordinary subtitles, one can easily switch between the gaze data of different subjects; (c) exemplary gaze data on another video sequence.

ABSTRACT

Nowadays, the amount of gaze data recorded from subjects watching video sequences increases daily. Unfortunately, these eye tracking data are stored in separate files in custom-made data formats, which reduces accessibility even for experts and makes the data effectively inaccessible for non-experts. Consequently, we still lack interfaces for many common use cases, such as visualization, streaming, data analysis, high-level understanding, and semantic web integration of eye tracking data. To overcome these shortcomings, we want to promote the use of existing multimedia container formats to establish a standardized method of incorporating content videos with eye tracking metadata. This will facilitate instantaneous visualization in standard multimedia players, streaming via the Internet, and easy usage without conversion. Using our prototype software, we embed gaze data from eye tracking studies and the corresponding video into a single multimedia container, which can be visualized by any media player. Based on this prototype implementation, we discuss the benefit of our approach as a possible standard for storing eye tracking metadata, including the corresponding video.

Index Terms: H.2.4 [Information Systems]: Systems - Multimedia databases; I.2.10 [Computing Methodologies]: Vision and Scene Understanding - Representations, data structures, and transforms

1 INTRODUCTION

Gaze data belonging to video files is still commonly stored separately, next to the video file, in custom file formats. The data structures within these files are mostly customized, sometimes unique, and they come in a diversity of formats, e.g. plain text, XML, Matlab mat format, or even binary. As a consequence, one needs special tools for accessing and visualizing this data, which makes its use almost impossible for a general audience.

Thus, why not encapsulate gaze data alongside the associated video material in a joint container? This has become common practice for storing text plus metadata, e.g., in the PDF container. Nowadays, video containers like the open container format ogg [20], MPEG-4 [7], or the Matroška container format [9] (MKV) encapsulate video and metadata like subtitles, audio comments, and interactive features. These formats can be distributed as a single file, played by standard multimedia players, and streamed via the Internet. In this manner, the accessibility of gaze data will increase substantially if they are encapsulated in a common container format, cf. our demonstration video¹.

Thus, we argue that video eye tracking data sets should be stored in a multimedia container carrying the corresponding video, the gaze trajectories of multiple subjects, and other video-related data. We present a software tool and a multimedia container format that allow gaze data of several subjects to be combined with the video material in a single multimedia container. Current multimedia containers already support a variety of video data formats, and our approach adds instantaneous visualization of gaze points in standard media players or in slightly modified versions of them. Our long-term aim is to establish a standard format that facilitates application in various fields, ranging from annotation for computer vision learning algorithms over highlighting objects in movies for visually impaired people to creating auditory displays for blind people or video analysts. Such a standard format can also boost accessibility and shareability of eye tracking data as well as their combination with other metadata.

Our contribution focuses on the instantaneous visualization of eye tracking data, but we also try not to neglect other kinds of metadata.
The paper starts with an extensive review of available data formats for video annotation, metadata description, timed data formats, and multimedia containers. Based on a discussion of suitable data representations for scientific metadata, we convert existing eye tracking data sets [13, 8, 4] with their metadata (gaze data of subjects and annotations of objects) into multimedia containers with our software. These containers bundle all information and provide an instantaneous visualization in standard video players. We then compare the advantages and drawbacks of our approach and summarize the possible impact a standardized multimedia container will provide.

¹Demonstration video, software including all tools, and converted eye tracking data sets can be downloaded at https://ikw.uos.de/%7Ecv/publications/ETVIS16

2 DATA FORMATS

Today's video eye tracking data sets [17, cf. Chap. 2.2] are stored in a diversity of formats, e.g. plain text, XML, Matlab mat format, or even binary. But hardly any of these formats provides a method to support streaming of video data together with the metadata. This is somewhat disappointing, considering the fact that many streamable formats exist in the domains of DVD, Blu-ray, and video compression. Some of these formats might be capable of carrying eye tracking metadata next to the video so as to fit the necessary requirements, give an easy-to-use visualization within a standard multimedia player, and still provide the opportunity of expert visualization in specific tools.

2.1 Metadata Formats

The content of video material is difficult to access for machines, and eye tracking data may provide valuable hints for automatic processing. However, the lack of standardization makes it hard to combine or exchange such eye tracking data, or to provide search across different repositories.

Standards for metadata abound and have become popular with the rise of the semantic web. The RDF standard [15] provides a general format to make statements about resources, i.e., virtually everything that can be given a unique name. It comes with well-defined formal semantics and allows a distributed representation. However, it provides only a very limited predefined vocabulary, requiring applications to extend the language for specific domains by specifying schemes. By now, several such schemes exist, but to our knowledge, no standard for the description of video material has evolved. Videos feature a temporal and spatial structure, distinguishing them from most other types of data and requiring a special metadata framework.

The Continuous Media Markup Language (CMML) [12] is a format for annotating time-continuous data with different types of information. It was developed to improve the integration of multimedia content into the world wide web, providing markup facilities for timed objects in a way similar to what HTML provides for text documents. With CMML, temporal intervals (clips) can be marked and described using a predefined vocabulary, allowing textual descriptions, hyperlinks, images (e.g., a key frame), and not further specified metadata in the form of attribute-value pairs. While CMML is able to address the temporal structure of videos, it provides no specific means for referring to pixels, regions, or space-time volumes.

Probably the most advanced metadata framework for multimedia content is the Multimedia Content Description Interface defined in the ISO/IEC 15938 standard [6], which has been developed by the Moving Picture Experts Group and is also known as MPEG-7. It specifies a set of tools to describe various types of multimedia information. However, MPEG-7 has been criticized for its lack of a formal semantics, which causes ambiguity, leads to interoperability problems, and hinders widespread application [10, 1].

MPEG-7 provides means to describe multimedia content at different degrees of abstraction. It is designed as an extensible format, providing its own "Description Definition Language" (DDL, basically an extended form of XML Schema). Its basic structure defines some vocabulary, which can be used for different aims: GRID LAYOUT, TIME SERIES, MULTIPLE VIEW, SPATIAL 2D COORDINATES, and TEMPORAL INTERPOLATION. Especially the TEMPORAL INTERPOLATION is quite interesting for object labeling. It allows temporal interpolation using connected polynomials, of which linear interpolation is a special case. On this basis, the SPATIOTEMPORAL LOCATOR describes spatio-temporal regions of an object of interest in a video sequence.
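To make the interpolation idea concrete: positions are stored only at a few key points in time, and all intermediate positions are reconstructed from them. The following Python sketch is purely conceptual (the key-point values are made up, and nothing here reflects MPEG-7 syntax); it shows the linear special case:

    # Conceptual sketch of temporal interpolation (not MPEG-7 syntax):
    # positions are stored only at key points (time, (x, y)), and
    # intermediate positions are reconstructed piecewise-linearly.
    from bisect import bisect_right

    KEY_POINTS = [(0.0, (100.0, 120.0)), (0.5, (180.0, 130.0)), (1.2, (175.0, 300.0))]

    def position_at(t, keys=KEY_POINTS):
        """Linearly interpolate the (x, y) position at time t."""
        times = [k[0] for k in keys]
        if t <= times[0]:
            return keys[0][1]
        if t >= times[-1]:
            return keys[-1][1]
        i = bisect_right(times, t)        # keys[i-1][0] <= t < keys[i][0]
        (t0, (x0, y0)), (t1, (x1, y1)) = keys[i - 1], keys[i]
        a = (t - t0) / (t1 - t0)          # interpolation weight in [0, 1)
        return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))

    print(position_at(0.25))              # (140.0, 125.0)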
2.2 Timed text

Timed text [16] refers to time-stamped text documents that allow relating sections of the text to certain time intervals. Typical applications of timed text are the subtitling of movies and captioning for the hearing impaired or for people lacking audio devices. The simplest form of timed text consists of units providing a time interval and a text to be displayed during that interval. This is basically what is provided by the popular SubRip format (SRT).
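For illustration, the following Python sketch (file name and captions are placeholders) writes two such units in SRT form; each cue consists of a running index, a time interval, and the text displayed during that interval:

    # Minimal sketch: write two SRT cues, each a running index, a time
    # interval "HH:MM:SS,mmm --> HH:MM:SS,mmm", and the displayed text.
    def srt_time(seconds):
        ms = round(seconds * 1000)
        h, ms = divmod(ms, 3600_000)
        m, ms = divmod(ms, 60_000)
        s, ms = divmod(ms, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    def srt_cue(index, start, end, text):
        return f"{index}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n"

    with open("example.srt", "w", encoding="utf-8") as f:
        f.write("\n".join([srt_cue(1, 0.0, 2.5, "First caption"),
                           srt_cue(2, 2.5, 4.0, "Second caption")]))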
2.3 Subtitle and Caption Formats

The Universal Subtitle Format (USF) was an open specification [11] aiming at providing a modern format for encoding subtitles. It tries to be comprehensive by providing features from different existing subtitle formats. An XML-based representation was chosen to gain flexibility, human readability, portability, Unicode support, a hierarchical structure, and easier management of entries. The subtitles are intended to be rendered by the multimedia player, allowing the display style (color, size, position, etc.) to be adapted to the needs of the viewer. However, there are also tools to generate pre-rendered subtitles, e.g., in the VobSub format, for players that do not support USF.

Nowadays, the formerly open source USF project [11] has become private, so the community does not have any influence on its development. The latest version, 1.1, has parts which are still under development. Consequently, some parts, e.g. the draw commands, are incomplete. In addition to the visualization of data on top of the video, USF provides a comment tag which would allow storing additional information, not for display but for exchange.

Sub Station Alpha (SSA) is a file format for video subtitles which was introduced with the subtitle editor Sub Station Alpha. It has been widely used by fansubbers, and support has been implemented in many multimedia players. The extended version V4.00+ [14], also known as Advanced SSA (ASS), includes simple drawing commands supporting straight lines, 3rd-degree Bézier curves, and 3rd-degree uniform B-splines, which is probably sufficient to roughly mark objects in a scene. Today's video players usually support the ASS drawing commands, but in older players the drawing commands are not implemented.
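To illustrate how such drawing commands can serve gaze visualization, the following Python sketch formats a single gaze sample as an ASS event lasting one frame. The style name "Gaze", the 25 fps assumption, and the square marker (a dot would be approximated with Bézier segments) are illustrative choices, not part of any specification:

    # Sketch: render one gaze sample as an ASS event lasting a single
    # frame. {\pos} places the event, {\p1}...{\p0} encloses drawing
    # commands ("m" = move, "l" = line), and \1c sets the fill color
    # (BGR order, here yellow). Style "Gaze" and 25 fps are assumptions.
    def ass_time(seconds):
        cs = round(seconds * 100)         # ASS timestamps use centiseconds
        h, cs = divmod(cs, 360_000)
        m, cs = divmod(cs, 6_000)
        s, cs = divmod(cs, 100)
        return f"{h}:{m:02d}:{s:02d}.{cs:02d}"

    def gaze_event(t, x, y, fps=25.0):
        start, end = ass_time(t), ass_time(t + 1.0 / fps)
        square = "m -6 -6 l 6 -6 6 6 -6 6"   # 12 x 12 px marker
        return (f"Dialogue: 0,{start},{end},Gaze,,0,0,0,,"
                f"{{\\pos({x:.0f},{y:.0f})\\1c&H00FFFF&\\p1}}{square}{{\\p0}}")

    print(gaze_event(1.0, 320, 240))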

2.4 Multimedia container

Multimedia objects consist of multiple parallel tracks: usually a video track, one or more audio tracks, and some optional subtitle tracks. These tracks are often combined into a single container for storage, distribution, or broadcasting. In contrast to classical data archives, multimedia containers have to account for the temporal nature of their payload in order to support seeking and synchronized playback of the relevant tracks, as shown in Fig. 2. To embed data into a container, a packager needs some basic understanding of the data format, at least enough to understand its temporal structure. Further, certain aspects of the data or the container format may prevent a direct encapsulation. In brief: not every payload is suited for every container.

Common multimedia containers are the VOB and EVO formats used for DVDs, which are based on the MPEG-PS standard. The more modern MP4 format, specified in MPEG-4 (ISO/IEC 14496, Part 14) [7], was established to hold video, audio, and timed text data. Though MP4 cannot handle arbitrary video, audio, and timed text formats, it does conform to the formats introduced in the rest of the standard.

The free ogg container format was originally put forward by the non-profit Xiph.Org foundation [20] for streaming encoded audio files and is nowadays supported by many portable devices. With the introduction of the Theora and Dirac video formats, ogg has also become a popular format for streaming multimedia content on the web. The open Matroška container format [9] (MKV) aims at flexibility to allow an easy inclusion of different types of payload. It serves as the basis for the WebM format, which is being pushed forward to establish a standard for multimedia content on the web.

Beyond subtitling, the inclusion of timed metadata into multimedia containers seems to be discussed only sporadically. The embedding of CMML descriptions in an ogg container is one such approach [19], but it seems quite limited due to the restricted data format, which provides no direct support for spatial features like region markup. Probably the most maturely specified approach is the embedding of MPEG-7 [6] data into MP4 [7] containers. Up to now, we know of no software that is able to do this embedding.

Figure 2: General data structure of a multimedia container: the header and the metadata which have no temporal dependencies (static payload) are stored before the temporal payload, i.e., the video, audio, subtitle, and metadata tracks (such as eye tracking data), which are interleaved along the time axis. For streaming such a video, only the non-temporal data have to be transmitted before playing; the temporal data are transmitted during playback.

3 SUITABLE DATA FORMAT

What kind of data format would be best suited to store and share eye tracking data of several subjects? Formats for metadata, as seen in Section 2, abound and keep growing, fostered by the advance of the semantic web and other digital technologies. While general formats like RDF [15] are well established, they are not geared towards the description of video material and hence miss the required vocabulary. We suggest three approaches to reach a standard format for video eye tracking data: i) extension of a general metadata formalism like RDF with a video and eye tracking vocabulary, ii) use of a well-defined standard for video metadata like MPEG-7, and iii) an ad hoc approach that utilizes existing technologies, e.g., for subtitles or captions, to store eye tracking data.

The first approach, i.e. the development of a specialized eye tracking format on the basis of a well-established metadata framework like RDF, has the obvious advantage that it can build on the vast number of software libraries and tools that support storage, management, exchange, and certain types of reasoning over such data. The main drawback is that these formats lack any form of support for the video domain and do not even provide basic spatial or temporal concepts. The development of a specialized vocabulary to describe eye tracking scenarios would mean a huge effort, and it would have to include the implementation of tools for visualization. Desirable features, like streamability, do not seem to fit well with the originally static nature of these data formats. Furthermore, the specificity of the field of eye tracking would make wide support by common multimedia players unlikely.

Using a specialized, well-established standard for video metadata like MPEG-7 is the second possible approach. MPEG-7 has a well-defined and established vocabulary for various annotations but lacks a vocabulary for eye tracking data. Nevertheless, MPEG-7 supports the description of points or regions of interest in space and time and hence seems well suited to store the essential information of eye tracking data. Unfortunately, no standard media player (like VLC, MediaPlayer, or Windows Media Player) seems to currently support the visualization of MPEG-7 video annotations. Hence, one would have to extend these players to visualize embedded eye tracking data; fortunately, one can build on existing MPEG-7 libraries [2]. We think that when implementing such multimedia player extensions, one should aim at a generic solution that can also be used to visualize other MPEG-7 annotations, as this would foster development, distribution, and support.
Even though the first two approaches seem better suited in the long run, they do not seem realizable with foreseeable effort on a short time scale. Hence, in the remainder of this paper, we focus on the third approach, which allows for the quick development of a prototype to demonstrate the idea and gain experience in its application. The idea is to adopt existing technologies, like subtitles, captions, audio tracks, and online links [5], that are already supported by existing media players. Although these technologies are not tailored towards the storage and presentation of eye tracking data, some important features can be realized based on these formats. This kind of "hijacking" of formats has the benefit that they are widely supported by current multimedia players. Even though there seems to be no general drawing framework, some subtitle formats include drawing commands that allow highlighting regions in a scene. These commands can also be used to visualize eye tracking data. Using a player's standard methods for displaying subtitles and switching between different languages, one can then display eye tracking data and switch between the data of different subjects.

When putting annotations into a hijacked format, one has to be careful that no information is lost. Additionally, one should bear in mind that the original format was designed for some other purpose, so it may not support desired features, e.g., the simultaneous display of gaze points of multiple subjects. We chose USF for three main reasons: First, its specification considers possible methods for drawing shapes, a prerequisite for instantaneous visualization with a normal multimedia player. Second, it allows for storing additional data, a necessity for carrying all eye tracking data so that expert visualization with specialized tools is possible from the same single file. Finally, and most importantly, USF is, like the preferable MPEG-7, an XML-based format and is thereby capable of holding complex data structures. However, although basic USF is supported by some existing media players, the drawing commands are not implemented. We provide an extension for the VLC media player that implements some drawing commands. In addition, we provide a converter to the ASS format, which, including its drawing commands, is widely supported due to the free ASS library, thereby allowing out-of-the-box visualization of eye tracking data that works with many current multimedia players. However, its plain text format is too restricted to hold all desirable information. Both approaches will be discussed in more detail in the next section.

4 PROTOTYPES

Following the previous discussion, to hijack, or more precisely, to modify an existing subtitle format for incorporating eye tracking data (cf. Section 3), two kinds of multimedia container prototypes were implemented. The first one is based on USF and losslessly encapsulates the complete eye tracking metadata for visualization in a modified version of the VLC media player.
The second one is based c 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. on ASS and only able to carry selected metadata, but this data can the eye tracking data set without that player. Therefore, we provide be visualized by most current media players. a second prototype based on ASS, a subtitle format with drawing commands, that is widely supported due to the free ASS library. 4.1 Metadata as USF In contrast to USF, the ASS subtitle format can not carry all desired In order to use USF for encapsulating eye tracking data, we an- metadata as it is not capable of representing complex data structures alyzed which features of USF are available in the latest releases and it does not account for non-visualizable content. of common multimedia players. One of the most common me- From our USF files with eye tracking data of all subjects, created dia players is the VLC media player. The current version 3.0.0 for the first prototype, ASS files are generated using XSLT (Extensi- already supports a variety of USF attributes, which are text, image, ble Stylesheet Language Transformations) with a simple translation karaoke and comment. The latest USF specification introduces an stylesheet. After the conversion, a MKV container is created in- additional attribute shape that is still marked as under development, cluding the video and one ASS track for each subject. The resulting although this specification is already quite old. Since eye tracking container makes metadata accessible for a broad audience, as the data is commonly visualized with simple geometric shapes, like cir- ASS visualization can be displayed by many unmodified players. cles and ellipses, the use of the shape attributes for instantaneous Listing 1: Section of the USF specification [11], * marked attributes gaze visualization of subjects seems to be quite appropriate. are added to the specification and implemented in our altered VLC Since the exact specification of the shape attribute is, as men- player. tioned, not complete, we particularized it with respect to rectangles, ... polygons, and points, as illustrated in Listing1. These simple geo- @-Type (0..1) commonly used for bounding box object of interest annotations, +-text (0..N) whereas polygons provide a more specific, but complex way of de- +-image (0..N) scribing the contour of an object. +-karaoke (0..N) The visualization of USF data is handled by VLC in a codec +-shape (0..N)* module. This codec module receives streams of the subtitle data +-polygon (0..N)* for the current frame from the demuxer of VLC. We extended this @-posy (1) * on to the actual renderer of VLC. Since the thread will be called for @-height (1)* visualization of the USF attribute shape is shown in Fig.1. @-diameter(1)* visualization of geometric object annotations. This proves our con- cept that the incorporation of metadata into USF is possible. Fur- +-comment (0..N) ther, using MKV as container format implies possible usage for streaming, since content and metadata are integrated temporally. ... 
Opening the container in a standard version of VLC without the additional adjustments will not conflict with normal video playback, but it will not visualize the incorporated annotations.

In the course of this project, an open source software tool was developed which converts eye tracking data files of several subjects to USF files and encapsulates them together with the original video in a single MKV file. Using this software, we converted complete eye tracking data sets [13, 8, 4] to test our approach and to highlight its potential¹.

The base USF module of VLC leads to different objects being coded as different subtitle tracks. As a result, only the eye tracking data of the currently selected subject is visualized. As seen in Fig. 1(b), the user is thus able to enable, switch, and disable the visualization of the subjects during playback, using the same interface metaphors as for changing subtitles. However, the general subtitle system in VLC only allows one active subtitle track at a time, which significantly limits the range of possibilities for out-of-the-box analysis. Often it is important to visualize gaze-object relations, which then becomes very tedious, if not infeasible.

4.2 Metadata as ASS

Since the USF-based prototype requires a modified version of the VLC media player, a broad audience is still excluded from watching the eye tracking data set without that player. Therefore, we provide a second prototype based on ASS, a subtitle format with drawing commands that is widely supported due to the free ASS library. In contrast to USF, the ASS subtitle format cannot carry all desired metadata, as it is not capable of representing complex data structures and does not account for non-visualizable content.

From our USF files with the eye tracking data of all subjects, created for the first prototype, ASS files are generated using XSLT (Extensible Stylesheet Language Transformations) with a simple translation stylesheet. After the conversion, an MKV container is created including the video and one ASS track for each subject. The resulting container makes the metadata accessible to a broad audience, as the ASS visualization can be displayed by many unmodified players.
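The final muxing step can be sketched with standard tooling. The Python fragment below (file names are placeholders; it requires the mkvmerge tool from the MKVToolNix suite) bundles the stimulus video with one ASS track per subject and names each track so that players list the subjects in their subtitle menu:

    # Sketch: mux the stimulus video plus one ASS file per subject into
    # a single MKV, naming each subtitle track after its subject. File
    # names are placeholders; mkvmerge (MKVToolNix) must be installed.
    import subprocess

    video = "stimulus.mp4"
    subjects = ["subject01.ass", "subject02.ass", "subject03.ass"]

    cmd = ["mkvmerge", "-o", "eyetracking_dataset.mkv", video]
    for ass in subjects:
        # --track-name applies to track 0 of the following input file
        cmd += ["--track-name", f"0:{ass.removesuffix('.ass')}", ass]

    subprocess.run(cmd, check=True)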

5 BENEFIT OF EYE TRACKING DATA IN MULTIMEDIA CONTAINERS FOR VISUALIZATION AND RESEARCH

A widespread use of multimedia containers for storing metadata will not only make metadata accessible to a broader audience but also boost the use of eye tracking data in science, e.g. for bio-inspired machine learning tasks in computer vision. Instead of downloading gigabytes of data to get an impression of what is available, one can simply watch the data set online. Moreover, using a common metadata standard will end the tedious process whereby researchers have to adjust their software for every different eye tracking data set. Further, machine learning techniques which depend on large-scale data sets can then be applied to subjects' gaze trajectories, e.g., to learn object tracking in video sequences. Additionally, consumers may profit from features like the highlighting of objects everyone was looking at, and hearing-impaired persons could get a computer-generated auditory description of the scene based on these gazes.

6 CONCLUSION

The importance of eye tracking data in general, and especially for video analysis, will increase significantly if they are provided in multimedia containers as we suggest in this work. These containers can be interpreted by common video players in the same way as today's subtitles. In our opinion, the research community should seek to establish a standard based on MPEG-7 for recording, exchanging, and visualizing eye tracking data. Due to its proper specification [6], it provides a platform for consistent implementations in all media players. A further advantage is that MPEG-7 can be encapsulated into an MP4 [7] container, so that both the video and its related annotations are stored as one streamable file. Unfortunately, we recognize a lack of MPEG-7 standard libraries for media players. Therefore, we presented an ad-hoc prototype allowing us to promote the idea of embedding eye tracking data in multimedia containers. Our approach (mis)uses the USF subtitle format to encode eye tracking data, allowing us to visualize them in a patched version of the popular VLC media player. We also provide a converter to generate an ASS version of the data, which can be played by many other media players¹. In the future, we plan to expand this approach to support the embedding of almost every kind of scientific metadata, so that annotations of objects, inter-object relations, conditions of objects, etc. can be stored in a single file. We thereby strive for a wider distribution and easier accessibility of such data.

ACKNOWLEDGEMENTS

This work was funded by the German Research Foundation (DFG) as part of the Priority Program "Scalable Visual Analytics" (SPP 1335).

REFERENCES

[1] R. Arndt, S. Staab, R. Troncy, and L. Hardman. Adding formal semantics to MPEG-7. Arbeitsberichte des Fachbereichs Informatik 04/2007, Universität Koblenz-Landau, 2007.
[2] W. Bailer, H. Fürntratt, P. Schallauer, G. Thallinger, and W. Haas. A C++ library for handling MPEG-7 descriptions. In Proceedings of the 19th ACM International Conference on Multimedia (MM '11), pages 731–734, 2011.
[3] J. E. Bresenham. Algorithm for computer control of a digital plotter. IBM Systems Journal, 4(1):25–30, 1965.
[4] A. Coutrot and N. Guyader. How saliency, faces, and sound influence gaze in dynamic social scenes. Journal of Vision, 14(8):5, Jul 2014.
[5] G. Bertellini and J. Reich. DVD supplements: A commentary on commentaries. Cinema Journal, 49(3):103–105, 2010.
[6] ISO/IEC. Information technology - Multimedia content description interface - Part 3: Visual (ISO/IEC 15938-3:2001), 2001.
[7] ISO/IEC. Information technology - Coding of audio-visual objects - Part 14: MP4 file format (ISO/IEC 14496-14:2003), 2003.
[8] K. Kurzhals, C. F. Bopp, J. Bässler, F. Ebinger, and D. Weiskopf. Benchmark data for evaluating visualization and analysis techniques for eye tracking for video stimuli. In Proceedings of the Workshop on Beyond Time and Errors: Novel Evaluation Methods for Visualization (BELIV), pages 54–60, 2014.
[9] Matroska.org. Matroska Media Container. https://www.matroska.org/, Feb. 2016.
[10] F. Mokhtarian and M. Bober. Curvature Scale Space Representation: Theory, Applications, and MPEG-7 Standardization, volume 25 of Computational Imaging and Vision. Springer, 2003.
[11] C. Paris, L. Vialle, and U. Hammer. TitleVision - USF specs. http://register.titlevision.dk/files/usf-specs-html.zip, Feb. 2016.
[12] S. Pfeiffer, C. D. Parker, and A. Pang. The Continuous Media Markup Language (CMML), Version 2.1. IETF Internet-Draft, https://www.ietf.org/archive/id/draft-pfeiffer-cmml-03.txt, March 2006.
[13] N. Riche, M. Mancas, D. Culibrk, V. Crnojevic, B. Gosselin, and T. Dutoit. Dynamic saliency models and human attention: A comparative study on videos. In Computer Vision - ACCV 2012, pages 586–598, 2013.
[14] Sub Station Alpha v4.00+ script format. moodub.free.fr/video/ass-specs.doc, Feb. 2016.
[15] W3C. RDF - Semantic Web standards. https://www.w3.org/rdf/, Feb. 2016.
[16] W3C. Timed Text Working Group. http://www.w3.org/audiovideo/tt/, Feb. 2016.
[17] S. Winkler and S. Ramanathan. Overview of eye tracking datasets. In Fourth International Workshop on Quality of Multimedia Experience (QoMEX 2012), Melbourne, Australia, July 5-7, 2012, pages 212–217, 2012.
[18] C. Wylie, G. Romney, D. Evans, and A. Erdahl. Half-tone perspective drawings by computer. In Proceedings of the Fall Joint Computer Conference (AFIPS '67), pages 49–58, New York, NY, USA, 1967. ACM.
[19] Xiph.org. CMML mapping into Ogg. https://wiki.xiph.org/index.php/cmml, Feb. 2016.
[20] Xiph.org. Ogg. https://xiph.org/ogg/, Feb. 2016.