Voicenotes: an Application for a Voice-Controlled Hand-Held Computer by Lisa Joy Stifelman
Total Page:16
File Type:pdf, Size:1020Kb
VoiceNotes: An Application for a Voice-Controlled Hand-Held Computer by Lisa Joy Stifelman B.S., Engineering Psychology Tufts University Medford, Massachusetts 1988 SUBMITTED TO THE MEDIA ARTS AND SCIENCES SECTION, SCHOOL OF ARCHITECTURE AND PLANNING, IN PARTIAL FULFILLMENT OF THE REQUIREMENTS OF THE DEGREE OF MASTER OF SCIENCE AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY JUNE 1992 @Massachusetts Institute of Technology 1992 All Rights Reserved Signature of the Author 7/ / ' Media Arts and Sciences Section May 8, 1992 Certified by Christopher M. Schmandt Principal Research Scientist rl MIT Media Laboratory Accepted by Stephen A. Benton Chairperson Department Committee on Graduate Students jottifi MASSACHUSETTS INSTITUTE OF TECHNOLOGY AUG 0 6 1992 UBRARIES VoiceNotes: An Application for a Voice-Controlled Hand-Held Computer by Lisa Joy Stifelman SUBMITTED TO THE MEDIA ARTS AND SCIENCES SECTION, SCHOOL OF ARCHITECTURE AND PLANNING, ON MAY 8,1992 IN PARTIAL FULFILLMENT OF THE REQUIREMENTS OF THE DEGREE OF MASTER OF SCIENCE AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY Abstract The ultimate computer has often been described as one you can always carry around with you, or wear like the "Dick Tracy watch," yet an appropriate interface to such a computer has not been designed. This thesis explores a user interface for a hand-held computer that employs speech input and output as the primary means of interaction instead of the traditional screen and keyboard. VoiceNotes is an application for creating and managing lists of recorded segments of the user's speech and was developed with the goal of exploring the design of voice interfaces to hand-held computers. Issues of input, navigation, management, and retrieval of structured audio data are addressed. Thesis Supervisor: Christopher M. Schmandt Title: Principal Research Scientist This work has been sponsored by Apple Computer, Inc. Thesis Committee Thesis Advisor Christopher M. Schmandt Principal Research Scientist MIT Media Laboratory Reader Eric A. Hulteen Human Interface Group/ATG Apple Computer, Inc. Reader Victor W. Zue Principal Research Scientist MIT Department of Electrical Engineering and Computer Science Table of Contents 1. Introduction .............................................................................................................................. 9 1.1 W hy a Voice Interface?................................................... ... ................................ 9 1.2 Research Challenges ............................................................................................ 10 1.2.1 Utility of Stored Voice as a Data Type for a Hand-held Computer......... 10 1.2.2 Role of Voice in a User Interface to a Hand-held Computer ................ 11 1.3 W hy Buttons?......................................................................................................... 11 1.4 Overview of the Docum ent .................................................................................. 12 1.5 D ocum ent Conventions.......................................................................................... 12 2. Background ............................................................................................................................. 13 2.1 Personal Recording Technology .......................................................................... 13 2.1.1 Apple's Augm ented Tape Recorder..................................................... 13 2.1.2 Voice M ail ........................................................................................... 14 2.2 V oice-only Interfaces ............................................................................................ 14 2.2.1 H yperphone .......................................................................................... 14 2.2.2 H yperspeech.......................................................................................... 15 2.3 V oice H and-held Com puters................................................................................ 16 2.3.1 Braille 'n Speak..................................................................................... 16 2.3.2 Pencorder............................................................................................... 18 2.4 N otetaking Program s............................................................................................ 19 2.4.1 NotePad................................................................................................. 19 2.4.2 ThoughtPattern..................................................................................... 19 2.4.3 D ayM aker ............................................................................................ 21 2.5 Sum m ary ............................................................................................................. 22 3. VoiceN otes Conceptual Design ......................................................................................... 23 3.1 V oiceNotes List Structure and Terminology ...................................................... 23 3.2 V oiceNotes: A Dem onstration............................................................................. 24 3.3 D escription of H and-held Prototype 1 .................................................................. 27 3.4 Taxonom y of V oice Notes .................................................................................... 30 3.4.1 Random ly Ordered Lists of V oice Notes ............................................. 30 3.4.2 Linear Lists of V oice Notes ................................................................... 31 3.4.3 Voice Notes Captured On-the-fly ......................................................... 32 4. VoiceN otes Interface Design .............................................................................................. 34 4.1 D esign Criteria...................................................................................................... 34 4.2 D esign of VoiceNotes Term inology ..................................................................... 35 4.2.1 N otes and N am es................................................................................... 35 4.2.2 The Notepad List.................................................................................. 36 4.3 Phase 1: M oded vs. Modeless ............................................................................... 36 4.3.1 VoiceNotes- Moded Design ................................................................ 37 4.3.2 VoiceNotes- Modeless D esign ........................................................... 41 4.3.3 Comparison of Moded and Modeless Designs ..................................... 42 4.4 Phase 2: M odeless List M anagement Interface.................................................... 44 4.4.1 Placem ent at Startup ............................................................................. 44 4.4.2 Recording Nam es.................................................................................. 45 4.4.3 Recording Notes.................................................................................. 46 4.4.4 N avigation............................................................................................ 46 4.4.5 Deleting/Undeleting ............................................................................. 48 4.4.6 M oving ................................................................................................. 49 4.5 Feedback ............................................................................................................... 51 4.5.1 D ialogue Design Criteria...................................................................... 51 4.5.2 Distinguishing the System's Voice from the User's Voice .................. 52 4.5.3 Recorded vs. Synthetic Speech ............................................................ 53 4.5.4 Feedback Levels.................................................................................. 53 4.5.5 Distinct Feedback.................................................................................. 55 5. Voice Input .................................................................................................. 57 5.1 Speech Recognition Technology........................................................................... 58 5.1.1 Isolated vs. Continuous Word Recognition .......................................... 58 5.1.2 Speaker Dependent vs. Speaker Independent Recognition.................. 58 5.1.3 Selecting a Recognizer for VoiceNotes .............................................. 58 5.2 VoiceNotes Vocabulary ...................................................................................... 59 5.2.1 Command W ords .................................................................................. 59 5.2.2 User-defined Name Words................................................................... 61 5.2.3 Targeting Style Commands................................................................... 61 5.3 Voice-only Training .............................................................................................. 63 5.4 Speech Recognition Issues .................................................................................... 64 5.4.1 Vocabulary Subsetting.......................................................................... 64 5.4.2 Recording vs. Recognizing .................................................................. 65 5.4.3 Output vs. Input .................................................................................... 66 5.4.4