RichReview: Blending Ink, Speech, and Gesture to Support Collaborative Document Review

Dongwook Yoon (1, 2), Nicholas Chen (1), François Guimbretière (2), Abigail Sellen (1)
[email protected] [email protected] [email protected] [email protected]
1 Microsoft Research, 21 Station Road, Cambridge CB1 2FB, UK
2 Cornell University, Information Science, Ithaca, NY 14850

ABSTRACT
This paper introduces a novel document annotation system that aims to enable the kinds of rich communication that usually only occur in face-to-face meetings. Our system, RichReview, lets users create annotations on top of digital documents using three main modalities: freeform inking, voice for narration, and deictic gestures in support of voice. RichReview uses novel visual representations and time-synchronization between modalities to simplify annotation access and navigation. Moreover, RichReview's versatile support for multi-modal annotations enables users to mix and interweave different modalities in threaded conversations. A formative evaluation demonstrates early promise for the system, finding support for voice, pointing, and the combination of both to be especially valuable. In addition, initial findings point to the ways in which both content and social context affect modality choice.

Author Keywords
Annotation; multi-modal input; voice; speech; pointing gesture; pen interaction; collaborative authoring; asynchronous communication.

ACM Classification Keywords
H.5.2. Information interfaces and presentation (e.g., HCI): User Interfaces.

Figure 1. RichReview running on a tablet. Hovering the pen over the screen leaves traces of gesture (blue blob on top). Inking can be done in expansion space (middle). Voice recording is shown as a waveform (bottom).

INTRODUCTION
The production of documents in which collaborators iteratively review and exchange feedback is fundamental to many workplace and academic practices. Effective communication of edits, questions, and comments is central to the evolution of a document [27] as well as playing a role in maintaining group dynamics [2].

Working face-to-face (F2F) has many advantages for these kinds of collaborative processes. F2F collaborators enjoy a shared context in which they can verbally explain details and gesture over documents with each other, often taking notes as they do so. These different ways of communicating are interleaved seamlessly and often support each other. It is therefore no surprise that many of the most important discussions about documents occur in face-to-face meetings [8,20].

One challenge of F2F meetings (and related techniques like video conferencing) is that they constrain collaborators to be co-present (temporally), which may not always be possible or desirable. In response, people often collaborate asynchronously over communication channels such as ink markup, textual annotations, and email. These techniques, compared to F2F interactions, undermine the production of an implicit shared context and largely restrict communications to a single modality.

In this paper, we describe RichReview, a document annotation system that brings some of the richness and expressivity of F2F discussion to asynchronous collaboration scenarios. RichReview allows collaborators to quickly produce and consume annotations consisting of voice, ink, and pointing gestures on top of ordinary PDF documents.

On the creation side, RichReview introduces a unified, fast, and minimally intrusive set of interactions for creating multi-modal annotations on text. The three different modalities that are supported can be freely interleaved and combined. RichReview additionally leverages automatic speech recognition (ASR) to segment audio recordings at the word level, simplifying the process of trimming or cleaning up audio. Moreover, RichReview employs contemporary techniques such as TextTearing [26] to alleviate issues with limited writing space on digital documents.

On the consumption side, RichReview's interface streamlines navigation and access to the contents of these rich annotations. All visual representations of RichReview annotations are recorded with time-stamps to support quick navigation. For instance, voice annotations are rendered as waveforms annotated with an ASR-generated transcript. This provides an interface for easily browsing and accessing any point in the audio stream. RichReview also leverages the time-synchronization between the voice, ink, and gesture streams by letting users use one modality to index into another.

We conducted a formative study investigating how people use (and do not use) RichReview features when discussing documents. RichReview's support for ink, voice, and pointing gestures was widely used. Users also took advantage of the freedom and flexibility that RichReview afforded by structuring and responding to annotations in a variety of ways. Based on our results, we discuss design implications for future implementations, including enhancing time indexing and making the system more practical at scale.

RELATED WORK
RichReview has its roots in the Wang Freestyle system [10], which pioneered the use of a combination of speech and ink to annotate a document. It also builds on research showing that combining modalities allows people to communicate more efficiently and with more depth [19]. However, RichReview's support for annotation production goes beyond existing work by capturing pointing gestures in addition to speech and ink. RichReview also provides improved support for consuming annotations in the form of new visualizations and interactions. The contribution of this work, therefore, is to build on the work of others, as we outline below, while broadening the flexibility and expressiveness with which annotations can be made without added complexity for the user.

Ink
Freeform ink annotations are pervasive and used extensively for document work because they are fast to create, can be interleaved with the reading process [20], and are highly flexible in the information they represent [11]. As a result, several annotation systems in the literature have employed ink as a primary modality. The collaborative editor MATE [7] supported the use of ink both for low-level editing commands and as a general medium for communication. Similarly, the XLibris reading device, which supported pen input, was used to explore various use scenarios of collaborative ink annotations [12]. Unlike these previous systems, ink is not the exclusive annotating modality of RichReview. Rather, it is used in conjunction with gestural and speech-based annotations.

Gestures
People often use gestures in a deictic role (i.e., pointing to areas of interest) to streamline discussion around a document [1]. Gestures help people establish a common understanding and offer a shortcut to verbose verbal descriptions [4]. As such, they are an integral complement to spoken language [13]. Boom Chameleon [22] is one of the first systems to explore pointing gestures, in the form of its Flashlight tool, which allowed users to refer to regions of interest in 3D environments. RichReview employs a Spotlight tool to achieve similar functionality in textual documents. A key difference between Spotlight and Flashlight is that Spotlight traces can be used as an index to rapidly jump into the middle of an annotation.

Speech
Speech has been shown to be a uniquely strong medium for identifying high-level problems with a document. Because speaking is faster than writing or typing, it is an efficient way to convey complex concepts. Chalfonte and Kraut have also shown that spoken annotation's expressiveness and richness make it more suitable than written annotation for describing structural or semantic issues [3,8]. Furthermore, Neuwirth et al. found that speaking, when compared with writing, generates more detailed explanations and nuance that can lead to better perceptions of comments at the receiving end [16].

Despite these advantages, speech-based annotation is rarely employed. Ethnographic studies of writing [18] make no mention of the audio commenting features available in word processing packages. One reason may be that, as Grudin has noted, speech is slower and more difficult to access than text, which can undermine its use in collaborative applications [6].

One way voice annotation systems have dealt with this accessibility problem is by using ink strokes as navigational indices into an audio stream [21,24,25]. This approach is also used in commercial applications such as the LiveScribe Pulse SmartPen and Microsoft OneNote. Another strategy is to use automatic speech recognition to produce a textual transcript that enables faster browsing and access to the underlying audio [23]. Given the diversity of annotation strategies, RichReview includes both techniques to maximize navigational flexibility: either ink or text can be used to browse and navigate through speech annotations. This combined approach to accessing speech content is similar to that used in the NoteVideo [14] online lecture browser. It bears noting that all annotation elements, regardless of whether they are voice, ink strokes, or Spotlight traces, can participate in cross-indexing operations.


Figure 2. A mixture of static ink annotations along with the playback control for a multi-modal annotation.

RICHREVIEW DESIGN PHILOSOPHY
Based on our analysis of previous work, it was apparent that a successful, fully functional, multi-modal annotation tool had yet to be developed. However, it was equally apparent that such a system needed to be sensitive to the way that people create and use annotations. With that in mind, we proposed the following goals:

• Limiting System Complexity. Introducing multiple annotation modalities runs the risk of bringing in additional complexity and overhead, which could affect all annotation activities. Therefore a significant part of the design of our system was focused on ensuring that annotations are created and consumed in ways that are lightweight and fluid. Satisfying this design goal argued against locking the user into interaction modes or adding extra interaction steps. RichReview employs a simple and consistent set of interactions for creating any kind of annotation.

• Versatility and Choice. The literature provides many examples showing that the optimal modality for communication varies with its content and purpose. For example, inking is popular for lightweight copyediting, while a combination of voice and pointing can be useful for describing structural issues in a piece of writing. For this reason, a second design goal is to allow users to employ a flexible mix of annotation modalities.

• Balancing Emphasis on Production and Consumption. The success of groupware is contingent on the balance of benefits to different stakeholders [6]. Thus an annotation system that supports collaborative tasks must focus as much on improving the ability of recipients to skim, access, and revisit annotations as it does on supporting their creation in the first place. Given that non-textual content can be difficult to access and skim, we place an emphasis on techniques that assist users in consuming rich annotations.

RichReview was expressly designed for use with tablet devices, since tablets are a preferred form factor for active reading activities [15]. Moreover, current tablet devices contain the necessary hardware to capture the three input modalities we are interested in.

Figure 3. Recording and visualizing speech: (a) pigtail gesture to begin and anchor recording; (b) waveform during replay; (c) waveform with word overlays; (d) transcription with varying opacities based on recognition confidence.

CREATING ANNOTATIONS IN RICHREVIEW

Ink Annotations
RichReview retains the paper metaphor, in which inking can be performed at any time without entering a special mode. When there is insufficient space for writing, TextTearing interactions [26] can be used to create additional writing space between lines of text. Users execute this by drawing a horizontal line at the approximate place, followed by a pigtail in the vertical direction. All annotations (including the multi-modal ones described later) are anchored to the nearest line of text, graphic, or expansion region on the page (Figure 2). When the positions of these elements shift in response to re-layout, the anchored annotations move with them. Annotations created in this way can also be collapsed so that the original layout of the document is preserved.
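To make this anchoring behavior concrete, the sketch below shows one plausible way to implement it: each annotation stores the identity of its nearest text line plus a vertical offset, and its on-screen position is recomputed after every re-layout. All type and function names here are hypothetical illustrations, not RichReview's actual code.

```cpp
#include <cmath>
#include <limits>
#include <vector>

struct TextLine {
    int id;   // stable identifier that survives re-layout
    float y;  // current vertical position on the page
};

struct Annotation {
    int anchorLineId;  // nearest line at creation time
    float offsetY;     // vertical offset from that line's position
};

// Anchor a new annotation to the text line nearest the pen position.
Annotation Anchor(float penY, const std::vector<TextLine>& lines) {
    const TextLine* nearest = nullptr;
    float best = std::numeric_limits<float>::max();
    for (const auto& line : lines) {
        float d = std::fabs(line.y - penY);
        if (d < best) { best = d; nearest = &line; }
    }
    if (!nearest) return Annotation{-1, penY};  // empty page: keep raw y
    return Annotation{nearest->id, penY - nearest->y};
}

// After a re-layout (e.g., a TextTearing expansion shifts lines down),
// recompute the annotation's position so it moves with its anchor.
float ResolveY(const Annotation& a, const std::vector<TextLine>& lines) {
    for (const auto& line : lines)
        if (line.id == a.anchorLineId) return line.y + a.offsetY;
    return a.offsetY;  // anchor not found; fall back to stored offset
}
```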

Figure 5. The voice editing interface presented with the waveform. The editing symbol in red deletes a word from the voice recording.

Figure 4. A list of recordings structures a voice annotation by topic (footage from the real-user data, P7).

Multi-Modal Annotations
RichReview's multi-modal annotation recordings capture voice in conjunction with ink and pointing gestures. RichReview requires users to explicitly start and stop recordings to dispel the privacy concerns associated with an always-on system. A recording session is started with an underline followed by a pigtail that extends in the horizontal direction (Figure 3(a)). This gesture was selected for its similarity to the TextTearing gesture, in order to reinforce the idea that annotation activities commence with an underline followed by a pigtail. The location of the underline specifies the anchor point of the annotation and creates a small playback control icon in the margin at the same vertical position on the page. The icon doubles as a marking menu containing commands for working with the annotation.

Capturing Voice
When the recording session begins, a small amount of extra space is inserted between the lines of text where the initial underline gesture was drawn. Inside this space, a waveform representation of the captured audio grows from left to right. Upon reaching the end of the line, the space expands slightly downward and the waveform continues into the new space.

RichReview also includes features to help users add structure to their speech annotations. Similar to the Audio Notebook [21], users can structure their voice annotations by creating time-indexed ink notes that the recipient can later use to jump to important parts of the annotation. Also, performing an annotation creation gesture while a recording session is active ends the active recording and immediately starts a new one. This is useful when the annotation moves to a different topic; the interaction saves the user from the interruption of stopping the recording and creating a new one. Performing the annotation creation gesture over an existing annotation appends a new recording to the existing one (Figure 4).

Capturing Pointing Gesture
Pointing at a location in a document is a fast and lightweight way of supporting discussion with reference to specific parts of a document [1]. RichReview provides the Spotlight interaction to reproduce this capability. With the Spotlight interaction, hovering the pen over the page while recording creates a circular translucent region at the pen's position (Figure 1). When recording ends, translucent trails of where the Spotlight has been are shown on the document.

Another way that a user can communicate the location of a region of interest is through the creator's viewpoint (i.e., what the creator was looking at during the recording). Similar cues are used in F2F collaboration by observing a collaborator's gaze. To convey this information to recipients, RichReview records viewpoint adjustment operations, such as panning and pinch-to-zoom gestures, for later playback.

Audio Post-Processing
When a recording is complete, the captured audio is passed to an automatic speech recognizer running in the background. When the transcription is complete, the words in the transcript are shown over the portions of the waveform corresponding to when they were spoken (Figure 3(c)). Displaying words over the existing waveform maintains visual continuity with the unadorned waveform representation. However, when improved readability of the transcript is desirable, users can display the transcription on its own (Figure 3(d)) by selecting an option from the marking menu.

The transcribed audio can be trimmed or tidied up in the audio editing tool (Figure 5). In the editing tool, crossing through words or portions of the waveform or transcript grays out those sections and removes them from the recording. Crossing through a deleted section reverses the deletion. Edits made with the tool are automatically snapped to word boundaries so that the result does not slice the audio in the middle of a word.
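The word-boundary snapping described above can be sketched as follows, assuming the recognizer supplies per-word start and end times on the recording's timebase. The structures and names are illustrative assumptions, not the system's real editing code.

```cpp
#include <utility>
#include <vector>

struct Word {
    double start;  // seconds into the recording
    double end;
};

// Expand the crossed-out interval [t0, t1] to cover every word it
// touches, so both cut points land on word boundaries (or silence).
std::pair<double, double> SnapToWords(double t0, double t1,
                                      const std::vector<Word>& words) {
    double lo = t0, hi = t1;
    for (const auto& w : words) {
        bool overlaps = w.end > t0 && w.start < t1;
        if (overlaps) {
            if (w.start < lo) lo = w.start;
            if (w.end > hi) hi = w.end;
        }
    }
    return {lo, hi};
}
```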

Figure 6. Recording list and media control. A list of annotations, sorted in order of creation, runs across the top. Buttons are used to collapse, expand, edit, stop, and play annotations, respectively.

Figure 7. Spotlight trails along with dynamic ink. Recorded strokes are dynamically replayed as playback advances; grayed-out strokes will come in the future. The speech annotation here is structured by writing keywords while speaking.

CONSUMING ANNOTATIONS IN RICHREVIEW
The ink, gesture, and audio (and transcript) within a recording share the same timebase. RichReview leverages this time synchronization to provide a rich rendering of the way the original annotation was created and lets users quickly jump to specific parts of the annotation stream.

Basic Playback
Basic access to annotation recordings is through the play icon at the annotation anchor or the media control at the bottom of the screen (Figure 6). During playback, ink is rendered in a grayed-out form if playback has not yet reached the point where it was created (Figure 7), and is then drawn with a colored stroke afterwards. Spotlight traces are rendered as an animated, translucent circle.
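A minimal sketch of this playback-time rendering rule appears below: strokes whose creation timestamps are ahead of the playhead are drawn grayed out, and the rest in the author's color. The types and the stubbed draw call are stand-ins; the actual system renders through DirectX, as described in the Implementation section.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

struct Color { std::uint8_t r, g, b, a; };

struct Stroke {
    double createdAt;           // seconds on the recording's timebase
    std::vector<float> points;  // flattened x,y pairs
    Color authorColor;          // per-user color (see Responding section)
};

// Stub: a real implementation would submit the polyline to the renderer.
void DrawStroke(const Stroke& s, Color c) {
    std::printf("stroke t=%.2f color=(%d,%d,%d)\n", s.createdAt, c.r, c.g, c.b);
}

// Strokes ahead of the playhead stay visible but grayed out, so they
// can still be tapped to jump forward in the recording.
void RenderStrokes(const std::vector<Stroke>& strokes, double playhead) {
    const Color gray{180, 180, 180, 255};
    for (const auto& s : strokes)
        DrawStroke(s, s.createdAt <= playhead ? s.authorColor : gray);
}
```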

Enhanced Navigation of Rich Annotations
Although the basic playback controls are sufficient for linear consumption of annotation content, they can be inadequate for random access. For example, users may wish to skim through annotations or visit a specific part of an annotation. RichReview offers several features that support these more complex navigation tasks.

Cross-Modal Indexing
For example, users can tap on a point in the waveform or transcript to skip to the corresponding point in the annotation recording. The waveform can be useful for finding gaps in the audio, which often delimit sections within an annotation stream. For finer-grained navigation, the transcript (Figure 3(d)) can be used to visit parts of an annotation based on words of interest. The need for random access to audio is critical in light of the fact that speech-to-text technology can still be quite error-prone; generally it is not possible to use the transcript on its own to consume speech content. Therefore, it is imperative that users have a way to quickly jump to and listen to the actual audio (a sketch of this tap-to-seek mapping follows the Thumbnail View description below).

Ink strokes and Spotlight trails can similarly be used to index into an annotation. One important design decision we made was to show the entirety of the ink and Spotlight traces at all times, so that they can be promptly accessed when needed. On the one hand, this choice does not preserve the exact appearance of the page during annotation creation. However, we believed that giving users the ability to skip forward into an annotation outweighed this concern. We distinguish between strokes that have been made and those that have yet to appear by rendering them in different colors.

We also explored other ways of leveraging the links between different modalities that turned out to be less useful. For example, in early prototypes, we highlighted portions of the waveform that corresponded to times when inking or the Spotlight was active. We found that these highlights were not very useful because they provided few cues about the specific objects on the page to which they referred.

Viewpoint Control Options
Users can control whether their viewpoint is locked to match the creator's viewpoint during playback. When locked, the recipient sees what the creator saw. Otherwise, the extent of the creator's viewport is shown as a box.

Thumbnail View
Finally, RichReview provides a high-level overview of all annotations using a space-filling thumbnail (SFT) [5] layout. Pinching with more than four fingers zooms the document view out to the SFT view. The overview gives a sense of the busiest pages and paragraphs, and annotations can be quickly accessed by tapping on one of the page thumbnails.
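The following sketch illustrates the cross-modal indexing described above under simple assumptions: a tap on the waveform is mapped linearly from screen position to time, while a tap on a transcript word seeks to that word's start time on the shared timebase. All structures and names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

struct TimedWord {
    std::string text;
    double start;  // seconds on the recording's shared timebase
};

struct WaveformView {
    double duration;  // total length of the recording in seconds
    float x, width;   // on-screen horizontal extent of the waveform
};

// Tap on the waveform: map the x-position linearly to a playback time.
double SeekFromWaveformTap(const WaveformView& wf, float tapX) {
    float t = (tapX - wf.x) / wf.width;  // 0..1 along the strip
    if (t < 0) t = 0;
    if (t > 1) t = 1;
    return t * wf.duration;
}

// Tap on a transcript word: jump straight to when it was spoken.
double SeekFromWordTap(const std::vector<TimedWord>& words, std::size_t i) {
    return i < words.size() ? words[i].start : 0.0;
}
```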

Figure 9. Circling on the waveform to designate a part of the voice recording (footage from the real-user data, P10).

RESPONDING TO ANNOTATIONS
Given the iterative nature of collaborative writing tasks, RichReview provides collaborative annotation features that allow users to respond to existing annotations made by peers. These features help "close the loop" when people collaborate on a document. In RichReview, annotation entities such as waveforms, ink, playback controls, and Spotlight traces are color-coded by user identity. However, tracking and showing user identity is only a small part of providing multi-user support.

RichReview differs from other collaborative annotation systems in that it is possible to respond to an annotation using a different modality. The way RichReview enables this is to treat ink and audio annotations in the same way as the underlying body text of the document. There are two benefits of this design decision.

First, mark-up operations that can be applied to the original document can also be applied to annotation entities. For example, ink can be used to circle portions of a waveform (Figure 9), and the Spotlight can be used to bring attention to a part of a transcript (Figure 8).

Figure 8. The red user inserted a voice annotation in the middle of the blue user's existing voice transcript (footage from the real-user data, P7). The red user's Spotlight is anchored on the transcript.

Second, treating annotations like the underlying text provides multi-modal support for discussions between collaborators conducted through threaded comments. For instance, users can create an expansion space in the middle of an audio waveform and insert a comment using ink (Figure 9). Inserting a voice comment under an existing inking space is also possible (Figure 10, top).

In some cases, interleaving annotations with the body text can break the flow of reading, because annotations can have a large visual footprint. In these situations, RichReview allows users to collapse or expand annotations. An option in the marking menu accessed through the play icon allows this to be done on a per-annotation basis. Collapse-all and expand-all buttons in the bottom toolbar can also be used to switch back and forth between the original document layout and the fluid layout.

These multi-user features, which allow users to converse and engage in discussion through annotations, are illustrative of how our initial design goals of interactional consistency and flexibility pervade the entirety of our system.

IMPLEMENTATION
RichReview is a C++ Windows Store app. We use the MuPDF library to extract the location of graphical elements for anchoring annotations, as well as to rasterize the PDF document itself. The graphics are rendered using a mix of DirectX (Direct2D) and XAML UI elements. Speech recognition is performed using the built-in speech services in Windows. We primarily tested RichReview on a Lenovo ThinkPad Tablet 2, which has a 10.1-inch display, an inductive pen digitizer, and 5-point multi-touch. However, we have also run RichReview successfully on a range of other pen-enabled tablet devices. We generally prefer to use a Bluetooth headset supporting 16 kHz wideband audio for audio capture, since writing can introduce undesirable rattling noises when recording with the tablet's built-in microphone.

EVALUATION
We wished to determine whether users could successfully employ the features of RichReview to make comments on a document. Moreover, we wanted to investigate how these features were actually used. Therefore, we conducted a qualitative, formative user study using our prototype system.

Study Design
To prompt realistic feedback from participants, we designed a task based on a representative classroom situation. We asked the participants to assume that they were working as a teaching assistant (TA) for an introductory, undergraduate writing class, commenting on a student's essay assignment. Operating under the assumption that the social relationship between collaborators may influence annotation behavior, we told half of our participants that their annotations would be shared with the instructor of the course. We told the other half that their comments would be shared with a peer grader. Note that the point of splitting our participants into two groups was to expand our coverage of possible usage scenarios rather than to carry out a controlled comparison or test a hypothesis.

Procedure
Our study consisted of a practice session that helped participants familiarize themselves with RichReview interactions, followed by an open-ended session with full-fledged tasks. During the practice session, we introduced each feature of the system to each participant by demonstrating a specific use case, and then letting them try out the features first hand. For example, the Spotlight feature was introduced in the context of referring to a location in the document while recording. Next, we gave each participant a set of practice tasks to carry out. We ended the practice session by having participants use the audio editing functionality: participants were asked to pick the most problematic recording amongst the ones they had created and edit it so that it could be better understood. The practice session took less than 40 minutes.

The open-ended annotation session consisted of two parts. Participants started with the production part, in which they gave constructive feedback on an essay for 15 minutes. They were asked to create at least six comments on the essay concerning any kind of writing issue. Then participants performed the consumption part, which looked at how rich annotations are consumed and discussed. We asked participants to listen to a set of pre-made annotations and then respond with constructive feedback for 20 minutes. In both tasks, participants were not required to use any specific interaction techniques. We concluded the evaluation with a 10-minute semi-structured interview session. In total, the study tasks required approximately 90 minutes to complete.

Materials
The materials used in the study were sample essays responding to the "Analyze an Issue" portion of the Graduate Record Examination used in the United States for entrance to graduate school. These essays tend to be around 500 to 650 words long (approximately one page) and of moderate writing quality. We picked two different essays, counterbalanced across participants to rule out text dependencies. Each participant used the same essay across the production and consumption tasks, in order to save reading time.

The pre-made annotations in the consumption task covered various discussion topics and consisted of diverse modality combinations. They were based on real annotation data captured in earlier pilot tests of our system.

Participants
We recruited 12 participants from student mailing lists at a large university. The average age of our participants was 21.3 years. All but one participant were native English speakers, and all had experience with collaborative writing tools. The most frequent discussion channel for their writing tasks was e-mail (5.33 hours/week), followed by F2F meetings (3.67 hours/week). The participants received $15 for taking part in the study.

Figure 10. New annotations can be inserted under an existing expansion space or in the middle of an existing waveform (footage from the real-user data, P2); here the red user has replied to the blue user's existing voice comment.

RESULTS
Broadly speaking, the results of our study demonstrated that participants could successfully employ RichReview to communicate complex ideas about a document. Voice and Spotlight introduced additional expressiveness and efficiency on top of the communication capability of legacy collaboration tools based on textual means. Moreover, cross-modal commentaries and indexing features helped users achieve fluid modality combination and lightweight annotation access.

Experience of Using RichReview
Users compared the overall experience of using RichReview to having a collaborator virtually present. P3, when talking about the Spotlight feature, remarked, "It (Spotlight) was like I was talking to someone in person when I point to an area." Further evidence of this was the fact that, in annotations, participants often used the pronoun "you," reinforcing the sense that they were talking through the computer rather than talking to the computer.

Annotation Production

Ink Annotations
Direct inking without additional forms of recording was the most widely used form of annotation. All of the participants used ink for simple mark-up such as circling, underlining, question marks, brackets, proofreading symbols, connecting lines, and personal notes that they wanted to revisit. This highlights the importance of static inking for lightweight interaction.

Voice Annotations
Voice recording was used by all participants when they wanted to make a comment that was longer or more detailed. Participants praised voice's speed and expressiveness. As P4 said, "Now I can hear someone's voice and understand completely what they're trying to say versus just seeing their note and trying to interpret." All participants except P1 and P8 used voice in conjunction with writing and the Spotlight; these two users used voice on its own.

Participants structured their voice annotations in many different ways. One way was to use ink as visual guidance for the verbal description. For example, participants first made underlines on the body text or wrote down key points in white space while reading, and then used voice or Spotlight to refer to these points while recording. Another way was to write keywords during the recording session to allow their hypothetical recipient to navigate to a certain topic in the recording by tapping on a corresponding keyword. Additionally, P3, P5, P8, P9, and P11 used the feature whereby additional annotations could be appended onto an existing one; they used this feature when they had multiple points to make about a single paragraph.

Another interesting observation was that participants tended to use voice when they disagreed with an idea and ink when they agreed with it. This was because, when they disagreed, they would use voice to provide a detailed explanation for their disagreement (P6, P7, P10, and P11). This suggests that support for voice annotations could be the best method for achieving group maintenance goals [2].

Spotlight Annotations
Participants frequently used the Spotlight when speaking. The Spotlight feature was used to refer not just to the underlying text, but also to other people's ink marks and to parts of waveforms or transcripts. Annotations about paragraph structure or logical inconsistencies were often accompanied by the spatial cues that the Spotlight conveyed. Overall, the feature was seen to be a powerful deictic tool. As P3 said, "I liked that feature (Spotlight) a lot, because it could direct somebody while recording to the specific spot that they are talking about."

Participants did raise some implementation issues, however. P1, P9, and P10 reported that the Spotlight was sometimes recorded inadvertently when the pen hovered over the screen for other reasons. P1 complained that the Spotlight trail was too thick to point to a specific line of text or word. In future iterations of the system, these issues could be addressed by filtering out spurious hovering gestures and by changing the blob size.

Socially-Driven Modality Choices
Besides annotation content and purpose, we found that social factors affected which communication modalities participants felt comfortable employing. For example, P2 and P8 regarded simple scribbles, such as circling or checkmarks, as an impolite or casual form of annotation, choosing instead to leave voice comments. Conversely, P1 and P8 thought that writing a complete message was more polite than voice; they claimed written comments were easy to understand and that voice is a "lazy form" (P8). While we cannot draw any conclusions on this basis, it does raise some interesting research questions for future research on annotation and target users.

Editing Audio
Most participants found the voice editing interface easy to use and efficient, especially for removing long pauses or utterances such as "um." This suggests that automatic detection and trimming of pauses might be useful. However, considering that some users depend on long pauses as a navigation cue for time indexing operations, removing them might also be problematic. Ultimately, the long-term usefulness of these features would need to be assessed in real practice.

Annotation Use and Organization: Annotation Positioning
Most annotations were placed immediately under the relevant text. If an annotation was about a sentence or keyword, participants would position it under a line in the middle of a paragraph. However, P1 and P8 were reluctant to break the paragraph structure, instead placing the recording below the paragraph and making a reference to the targets using inking or the Spotlight. Some annotations do not have an obvious anchor point, such as those about multiple paragraphs or global writing issues. In these cases, participants usually placed the recordings below the end of the right column, making them hard to distinguish from those relating to the last paragraph. This observation suggests that a distinct space to anchor meta-commentary [27], possibly close to the bottom of the page, might be useful.

Consuming Annotations
Navigation via the visual representation of the audio was actively used to jump into or revisit a voice annotation. Participants were able to use the waveform effectively as a navigation cue when there were salient features they could focus on, such as stretches of silence. Some participants also used the words in the transcript as a way to navigate (P7, P9, and P12). However, most participants preferred using the waveform over the transcription because of the detrimental effect of transcription errors. For instance, P11 recounted one instance where the phrase "kind of this" was recognized as "Kennedy." Although P11 was aware it was a transcription error, the participant found it very hard to ignore.

Participants found Spotlight trails useful for getting a sense of what an annotation was about. However, participants did not use ink or Spotlight trails to index into annotations. We believe there were two reasons for this: first, Spotlight traces became too cluttered; second, users were not familiar enough with this style of annotation to know how the ink or Spotlight element was structured in relation to the audio.

Creating Responses and Discussion Threads
The ability to create rich annotations about existing ones was well adopted by all participants. In general, most responses tended to be placed immediately below the annotation to which they responded. For instance, P2 inserted a voice annotation in the middle of an existing audio stream (Figure 10). P10 made a written reply below a part of an existing spoken annotation, making use of the cross-modal commentary features (Figure 9). Participants used cross-modal mark-up features to refer to parts of spoken or written annotations. For instance, P7 used the Spotlight to point to the visual representation of the audio (Figure 8), and P10 marked it up with ink (Figure 9).

DISCUSSION AND FUTURE WORK
Taken together, the results of our study show that supporting multiple modalities within an annotation tool can indeed enable creative and expressive ways for users to comment on and discuss documents with others. Our participants sometimes drew on one modality at a time, and at other times intertwined their use of ink, speech, and gesture. Indeed, the majority of our participants ended up using a mixture of modalities for a variety of purposes. Further, the reasons participants gave for employing specific modalities, such as speech being a better mode for conveying explanations when there was disagreement, were in line with what previous work has identified as specific strengths of non-textual comments and annotations.

Efficacy of RichReview
Of course, determining whether RichReview is in fact useful rather than simply usable will depend on its deployment in a naturalistic setting, assessed alongside other collaborative tools. In order to prepare the RichReview system for deployment, a number of issues must be addressed. First, ASR accuracy needs to be improved. One possible solution is a tiered approach that uses a more sophisticated recognizer in the cloud, or even crowd-sourced transcription if extra accuracy is desired. Second, the visual clutter from Spotlight traces could be resolved by employing a filtering feature that renders only the relevant part of the data. Finally, tighter integration of RichReview with authoring environments (e.g., LaTeX, Microsoft Word) to support the full write-revise-review workflow is critical.

Additional Application Domains
Looking further, we believe that the ideas behind RichReview can be extended to other application domains. For example, making the system practical for larger groups would be a useful extension. One possible application of the RichReview system at scale is to support discussion in large distributed courses such as MOOCs. There are challenges to tackle in doing this, however. For instance, the current presentation of the Spotlight turned out to be cluttered when there were many annotations on a page. Another issue was that participants were concerned about the fluid document layout, worrying that introducing too many gaps for annotations would adversely affect the readability of the main body text.

RichReview can also be used to discuss other types of documents. For instance, engineering documents like electronic schematics or architectural drawings, or source code, are often created through collaborative processes. These types of documents could similarly benefit from the expressiveness that the RichReview system brings. The general obstacle that needs to be resolved when extending RichReview to these domains is making it compatible with their unique discussion requirements and document structures.

CONCLUSION
We have shown a system for rich annotation that offers flexibility, versatility, and expressivity, not only in the creation of annotations, but also in their consumption. The core innovation is our design decision to support commentaries and indexing across different annotation modalities. The fact that early users felt that they could communicate through and around the document using this tool, even achieving a sense of remote presence, shows promise. At the same time, the design, implementation, and evaluation of the system so far have raised a number of interesting research questions for further work, including improving navigation of dynamic annotations, making the system scalable, and expanding application areas. The possibilities opened up through multi-modal annotation systems are undoubtedly both diverse and compelling.

ACKNOWLEDGEMENTS
Yoon gratefully acknowledges support from the Kwanjeong Educational Foundation. This work was supported in part by a gift from FXPAL. We would also like to thank Robert Corish and Bill Buxton for their design input.

REFERENCES
1. Bickmore, T., Pfeifer, L., and Yin, L. The role of gesture in document explanation by embodied conversational agents. International Journal of Semantic Computing (2008).
2. Birnholtz, J., Steinhardt, S., and Pavese, A. Write here, write now! Proc. CHI '13, ACM Press (2013), 961.
3. Chalfonte, B.L., Fish, R.S., and Kraut, R.E. Expressive richness. Proc. CHI '91, ACM Press (1991), 21–26.
4. Clark, H. and Brennan, S. Grounding in communication. Perspectives on Socially Shared Cognition (1991).
5. Cockburn, A., Gutwin, C., and Alexander, J. Faster document navigation with space-filling thumbnails. Proc. CHI '06 (2006).
6. Grudin, J. Why CSCW applications fail: problems in the design and evaluation of organizational interfaces. Proc. CSCW '88, ACM Press (1988), 85–93.
7. Hardock, G., Kurtenbach, G., and Buxton, W. A marking based interface for collaborative writing. Proc. UIST '93, ACM Press (1993), 259–266.
8. Kraut, R., Galegher, J., Fish, R., and Chalfonte, B. Task requirements and media choice in collaborative writing. Human-Computer Interaction 7, 4 (1992), 375–407.
9. Kraut, R.E., Gergle, D., and Fussell, S.R. The use of visual information in shared visual spaces. Proc. CSCW '02, ACM Press (2002), 31.
10. Levine, S. and Ehrlich, S. The Freestyle System. Human-Machine Interactive Systems (1991).
11. Marshall, C.C. and Brush, A.J.B. Exploring the relationship between personal and public annotations. (2004), 349–357.
12. Marshall, C.C., Price, M.N., Golovchinsky, G., and Schilit, B.N. Collaborating over portable reading appliances. Personal Technologies 3, 1–2 (1999), 43.
13. McNeill, D. Language and Gesture. 2000.
14. Monserrat, T.-J.K.P., Zhao, S., McGee, K., and Pandey, A.V. NoteVideo. Proc. CHI '13, ACM Press (2013), 1139.
15. Morris, M.R., Brush, A.J.B., and Meyers, B.R. Reading revisited: Evaluating the usability of digital display surfaces for active reading tasks. Horizontal Interactive Human-Computer Systems (2007), 79–86.
16. Neuwirth, C.M., Chandhok, R., Charney, D., Wojahn, P., and Kim, L. Distributed collaborative writing. Proc. CHI '94, ACM Press (1994), 51–57.
17. Nicholson, R.T. Usage patterns in an integrated voice and data communications system. ACM Transactions on Information Systems 3, 3 (1985), 307–314.
18. Noël, S. and Robert, J.-M. Empirical study on collaborative writing: What do co-authors do, use, and like? Computer Supported Cooperative Work (CSCW) 13, 1 (2004), 63–89.
19. Oviatt, S. Ten myths of multimodal interaction. Communications of the ACM 42, 11 (1999), 74–81.
20. Sellen, A. and Harper, R. The Myth of the Paperless Office. MIT Press.
21. Stifelman, L., Arons, B., and Schmandt, C. The Audio Notebook: paper and pen interaction with structured speech. Proc. CHI '01 (2001), 182–189.
22. Tsang, M. and Fitzmaurice, G. Boom Chameleon: simultaneous capture of 3D viewpoint, voice and gesture annotations on a spatially-aware display. Proc. UIST '02 (2002), 111–120.
23. Whittaker, S., Hirschberg, J., Amento, B., et al. SCANMail. Proc. CHI '02, ACM Press (2002), 275.
24. Whittaker, S., Hyland, P., and Wiley, M. FILOCHAT. Proc. CHI '94, ACM Press (1994), 271.
25. Wilcox, L.D., Schilit, B.N., and Sawhney, N. Dynomite. Proc. CHI '97, ACM Press (1997), 186–193.
26. Yoon, D., Chen, N., and Guimbretière, F. TextTearing. Proc. UIST '13, ACM Press (2013), 107–112.
27. Zheng, Q., Booth, K., and McGrenere, J. Co-authoring with structured annotations. Proc. CHI '06, ACM Press (2006), 131.