User Concepts for In-Car Speech Dialogue Systems and Their Integration Into a Multimodal Human-Machine Interface
Total Page:16
File Type:pdf, Size:1020Kb
User Concepts for In-Car Speech Dialogue Systems and their Integration into a Multimodal Human-Machine Interface Von der Philosophisch-Historischen Fakultät der Universität Stuttgart zur Erlangung der Würde eines Doktors der Philosophie (Dr. phil.) genehmigte Abhandlung Vorgelegt von Sandra Mann aus Aalen Hauptberichter: Prof. Dr. Grzegorz Dogil Mitberichter: Apl. Prof. Dr. Bernd Möbius Tag der mündlichen Prüfung: 02.02.2010 Institut für Maschinelle Sprachverarbeitung der Universität Stuttgart 2010 3 Acknowledgements This dissertation developed during my work at Daimler AG Group Research and Advanced Engineering in Ulm, formerly DaimlerChrysler AG. Graduation was accomplished at the University of Stuttgart, Institute for Natural Language Processing (IMS) at the chair of Experimental Phonetics. I would like to thank Prof. Dr. Grzegorz Dogil from the IMS of the University of Stuttgart for supervising this thesis and supporting me in scientific matters. At this point I would also like to thank Apl. Prof. Dr. Bernd Möbius for being secondary supervisor. I also wish to particularly thank my mentors at Daimler AG, Dr. Ute Ehrlich and Dr. Susanne Kronenberg, for valuable advice on speech dialogue systems. They were always available to discuss matters concerning my research and gave constructive comments. Besides I would like to thank Paul Heisterkamp whose long-time experience in the field of human-machine interaction was very valuable to me. Special thanks go to all colleagues from the speech dialogue team, the recognition as well as the acoustics team. I very much acknowledge the good atmosphere as well as fruitful discussions, advice and criticism contributing to this thesis. I would especially like to mention Dr. André Berton, Dr. Fritz Class, Thomas Jersak, Dr. Dirk Olszewski, Marcel Dausend, Dr. Harald Hüning, Dr. Alfred Kaltenmeier and Alexandros Philopoulos. In this context I would like to add the Institute of Software Engineering and Compiler Construction from Ulm University, in particular Prof. Dr. Helmuth Partsch, Ulrike Seiter, Dr. Alexander Raschke and Carolin Hürster. Furthermore, I wish to thank the students having been involved into this thesis: Andreas Eberhardt, Tobias Staudenmaier and Steffen Rhinow. I also owe special appreciation to my parents, Gert and Elisabeth Mann, who enabled this work through their upbringing by constant encouragement and support. Above all, I want to thank my heavenly father who accompanied this work from start to finish. 4 5 Contents Acknowledgements........................................................................................................................ 3 Contents.......................................................................................................................................... 5 Abbreviations................................................................................................................................. 9 1 Introduction ........................................................................................................................... 13 1.1 Motivation ....................................................................................................................... 15 1.2 Goal ................................................................................................................................. 20 1.3 Outline ............................................................................................................................. 20 2 Multimodal dialogue systems ............................................................................................... 23 2.1 Speech in human-computer interaction ........................................................................... 24 2.2 Modality and multimodal interface ................................................................................. 25 2.3 Multimodal in-car dialogue systems................................................................................ 28 2.3.1 The control and display concept of the Mercedes-Benz S-Class.......................... 30 2.3.2 User study of the Linguatronic speech interface................................................... 33 2.4 The aspect of in-car human-computer interaction........................................................... 34 2.5 Investigations on driver distraction ................................................................................. 39 3 Communication...................................................................................................................... 47 3.1 Dialogue........................................................................................................................... 47 3.2 Discourse analysis........................................................................................................... 48 3.2.1 Conversational principles...................................................................................... 49 3.2.2 Speech acts............................................................................................................ 51 3.2.3 Presuppositions ..................................................................................................... 58 3.2.4 Deixis .................................................................................................................... 59 4 Human-computer communication with in-car speech dialogue systems ......................... 63 4.1 Architecture and functions of a speech dialogue system................................................. 63 4.2 Constraints of speech dialogue systems .......................................................................... 71 4.3 Collecting usability and speech data ............................................................................... 73 4.4 Designing the interface for usable machines................................................................... 76 4.4.1 Reusable dialogue components............................................................................. 76 6 4.4.2 Prompts ................................................................................................................. 81 4.4.3 Vocabulary ............................................................................................................ 83 4.4.4 Feedback from the system..................................................................................... 88 4.4.5 Help....................................................................................................................... 90 4.4.6 Spelling ................................................................................................................. 92 4.4.7 Voice enrolments .................................................................................................. 93 4.4.8 Barge-in................................................................................................................. 97 4.4.9 Error-recovery....................................................................................................... 98 4.4.10 Initiating speech dialogue ................................................................................... 101 4.4.11 Short cuts............................................................................................................. 103 4.4.12 Initiative .............................................................................................................. 104 4.4.13 Combining spoken and manual interaction......................................................... 105 5 Accessing large databases using in-car speech dialogue systems.................................... 109 5.1 State-of-the-art of in-car audio applications.................................................................. 111 5.1.1 Constraints........................................................................................................... 111 5.1.2 User needs........................................................................................................... 113 5.2 Interaction concepts for searching audio data ............................................................... 113 5.2.1 Category-based and category-free search ........................................................... 114 5.2.2 Fault-tolerant word-based search ........................................................................ 115 5.3 General requirements for the user interface .................................................................. 118 5.4 Prototype architecture.................................................................................................... 118 5.5 Verifying generating rules............................................................................................. 121 5.5.1 The data elicitation method................................................................................. 122 5.5.2 Results................................................................................................................. 126 5.6 Combined-category input.............................................................................................. 133 5.7 Application-independent approach................................................................................ 138 6 Conclusions........................................................................................................................... 147 6.1 Summary........................................................................................................................ 147 6.2 Future work...................................................................................................................