Windows Vista Media Center Controlled by Speech
UNIVERSIDADE DE LISBOA
Faculdade de Ciências
Departamento de Informática

WINDOWS VISTA MEDIA CENTER CONTROLLED BY SPEECH

Mário Miguel Lucas Pires Vaz Henriques

Project supervised by Prof. Dr. Carlos Teixeira and co-supervised by Prof. Dr. Miguel Sales Dias

Master's in Informatics Engineering (Mestrado em Engenharia Informática)
2007

Acknowledgments

I would like to thank:

Prof. Doutor Miguel Sales Dias, for his guidance and supervision and for welcoming me into the MLDC team;
Prof. Doutor Carlos Teixeira, for his guidance and supervision during my project;
Eng. Pedro Silva, for his amazing friendship, constant support and help;
Eng. António Calado, for his constant help and support;
Sandra da Costa Neto, for all her support and help in carrying out the usability tests;
All my other colleagues at MLDC;
All the Microsoft colleagues who participated in the usability tests.

Contents

Index of Figures
Acronyms
1. Introduction
1.1 Introduction
1.2 Automatic Speech Recognition
1.3 Microsoft Windows Vista Media Center
1.4 Host institution
1.5 Thesis problems identification
1.6 Solution approaches
1.6.1 Automatic language detection
1.6.2 Acoustic Echo Cancellation
1.7 Main thesis goals
1.8 Chapter Organization
1.9 Conclusion
2. User and System Requirements
2.1 Introduction
2.2 Overall user requirements
2.3 The Usage Scenario
2.4 Detailed system requirements
2.4.1 Hardware Requirements
2.4.2 Application requirements
2.5 Conclusion
3. System Architecture and development
3.1 Technologies used
3.1.1 Vista Media Center SDK
3.1.2 Vista Media Center Add-In
3.1.3 Media Player SDK and Media Database
3.2 The Hardware Description
3.3 The Media Center Add-in architecture
3.3.1 Speech Recognition and Speech Synthesis
3.3.2 Microphone Array
3.3.3 Dynamic grammar
3.4 Developed work
3.4.1 Multiple recognition engines problem
3.4.2 EVA – Easy Voice Automation
3.5 Conclusion
4. Usability Evaluation
4.1 Introduction
4.2 The interfaces: Remote Control vs Speech
4.2.1 Usability evaluation methodology
4.2.2 Usability experiment
4.2.3 Evaluation results and analysis
4.2.4 Problems in the system
4.2.5 Subject comments
4.2.6 Time analysis
4.2.7 Questionnaire and observed results
4.3 Hoolie VS Speech Macros (Media Center Add-In)
4.3.1 Other usability evaluation
4.3.2 Usability evaluation
4.3.3 Users Comments
4.3.4 Other issues found
4.4 Conclusion
5. Conclusion & Future Work
6. References
7. Annexes

Index of Figures

Figure 1 – Media Center logo
Figure 2 – MLDC logo
Figure 3 – Media Center music menu
Figure 4 – Media Center simple User Interface
Figure 5 – MC remote control
Figure 6 – The usage scenario
Figure 7 – Media Center box
Figure 8 – High definition TV
Figure 9 – Linear four Microphone Array
Figure 10 – Wireless headset microphone
Figure 11 – Standard Desktop Microphone
Figure 12 – The Media Center Add-In architecture
Figure 13 – Dynamic Grammar Process Diagram
Figure 14 – Speech interface
Figure 15 – System help
Figure 16 – Subjects performing their usability tests
Figure 17 – Media Center Main Menu
Figure 18 – Command cannot be performed
Figure 19 – Options available in the Graphical User Interface
Figure 20 – Options unavailable in the Graphical User Interface

Acronyms

AEC – Acoustic Echo Cancellation
API – Application Programming Interface
ASR – Automatic Speech Recognition
EMEA – Europe, Middle-East and Africa
ENG – English
EVA – Easy Voice Automation
HCI – Human-Computer Interface
HMM – Hidden Markov Model
LAN – Local Area Network
MC – Microsoft Media Center
MLDC – Microsoft Language Development Center
PTG – European Portuguese
SAPI – Microsoft Speech API
SDK – Software Development Kit
SR – Speech Recognition
TTS – Text-to-Speech
WMP – Windows Media Player

1. Introduction

1.1 Introduction

Automatic Speech Recognition (ASR) and synthesis technologies are already present in the daily life of an increasing number of people. Due to the potential of speech as a natural Human-Computer Interface (HCI) modality, a large number of applications are adopting