Using X+V to Construct a Non-Proprietary Speech Browser for a Public-Domain Speechweb

University of Windsor Scholarship at UWindsor Electronic Theses and Dissertations Theses, Dissertations, and Major Papers 2006 Using X+V to construct a non-proprietary speech browser for a public-domain SpeechWeb Xiaoli Ma University of Windsor Follow this and additional works at: https://scholar.uwindsor.ca/etd Recommended Citation Ma, Xiaoli, "Using X+V to construct a non-proprietary speech browser for a public-domain SpeechWeb" (2006). Electronic Theses and Dissertations. 4488. https://scholar.uwindsor.ca/etd/4488 This online database contains the full-text of PhD dissertations and Masters’ theses of University of Windsor students from 1954 forward. These documents are made available for personal study and research purposes only, in accordance with the Canadian Copyright Act and the Creative Commons license—CC BY-NC-ND (Attribution, Non-Commercial, No Derivative Works). Under this license, works must always be attributed to the copyright holder (original author), cannot be used for any commercial purposes, and may not be altered. Any other use would require the permission of the copyright holder. Students may inquire about withdrawing their dissertation and/or thesis from this database. For additional inquiries, please contact the repository administrator via email ([email protected]) or by telephone at 519-253-3000ext. 3208. Using X+V to Construct a Non-Proprietary Speech Browser for a Public-Domain SpeechWeb By Xiaoli Ma A Thesis Submitted to the Faculty of Graduate Studies and Research through the School of Computer Science in Partial Fulfillment of the Requirements for the Degree of Master of Science at the University of Windsor Windsor, Ontario, Canada 2006 © 2006 Xiaoli Ma Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Library and Bibliotheque et Archives Canada Archives Canada Published Heritage Direction du Branch Patrimoine de I'edition 395 Wellington Street 395, rue Wellington Ottawa ON K1A 0N4 Ottawa ON K1A 0N4 C anada C a na da Your file Votre reference ISBN: 978-0-494-17031-1 Our file Notre reference ISBN: 978-0-494-17031-1 NOTICE: AVIS: The author has granted a non L'auteur a accorde une licence non exclusive exclusive license allowing Library permettant a la Bibliotheque et Archives and Archives Canada to reproduce, Canada de reproduire, publier, archiver, publish, archive, preserve, conserve, sauvegarder, conserver, transmettre au public communicate to the public by par telecommunication ou par I'lnternet, preter, telecommunication or on the Internet, distribuer et vendre des theses partout dans loan, distribute and sell theses le monde, a des fins commerciales ou autres, worldwide, for commercial or non sur support microforme, papier, electronique commercial purposes, in microform, et/ou autres formats. paper, electronic and/or any other formats. The author retains copyright L'auteur conserve la propriete du droit d'auteur ownership and moral rights in et des droits moraux qui protege cette these. this thesis. Neither the thesis Ni la these ni des extraits substantiels de nor substantial extracts from it celle-ci ne doivent etre imprimes ou autrement may be printed or otherwise reproduits sans son autorisation. reproduced without the author's permission. In compliance with the Canadian Conformement a la loi canadienne Privacy Act some supporting sur la protection de la vie privee, forms may have been removed quelques formulaires secondaires from this thesis. ont ete enleves de cette these. While these forms may be included Bien que ces formulaires in the document page count, aient inclus dans la pagination, their removal does not represent il n'y aura aucun contenu manquant. any loss of content from the thesis. i * i Canada Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. ABSTRACT A SpeechWeb is a collection of hyperlinked speech applications that are distributed over the Internet. Users access the speech applications through remote browsers, which accept human-voice-input and return synthesized-voice-output. In previous research, a new architecture (LRRP) has been proposed, which is ideally suited for building a Public-Domain SpeechWeb. However, a non-proprietary speech browser is needed for this architecture. In this thesis, we have solved several limitations of X+V, a programming language for developing Multimodal applications, and we have used X+V to build a viable Public-Domain SpeechWeb browser. Our browser has the following properties: real-time human-machine speech interaction; ease of installation and use; acceptable speech-recognition accuracy in a suitable environment; no cost, non-proprietary, ease of distribution; use of common communication protocol - CGI; ease of creation of speech applications; possibility to deploy on mobile devices. [Keywords: SpeechWeb, Public-Domain, LRRP architecture, speech recognition, SpeechWeb browser, X+V, XHTML+Voice, Opera, Multimodal, distributed system, mobile speech application.] Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Acknowledgements I would like to thank my advisor Dr. Richard A. Frost for helping me to find this thesis topic, giving me invaluable guidance, encouragement, and generous help. I would also like to thank Dr. Kai Hildebrandt, Dr. Dan Wu and Dr. Jianguo Lu for reading my thesis report and giving me their valuable comments and suggestions. And a special thanks to my parents for their love and support. iv Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. Table of Contents Abstract iii Acknowledgements iv List of Tables ix List of Figures x Chapter 1: Introduction 1 1.1 A Public-Domain SpeechWeb 1 1.2 The LRRP Architecture 1 1.3 Shortcomings of VXML for a SpeechWebBrowser 2 1.4 X+V 3 1.5 Thesis Statement 3 1.6 Why proof of the thesis is important? 3 1.7 Why the thesis is not obvious? 4 1.8 How the thesis is proven by demonstration 4 Chapter 2: Requirements For a Public-Domain SpeechWeb Browser 6 2.1 Overview 6 2.2 Requirements for a Public-Domain SpeechWeb Browser 6 Chapter 3: The LRRP Architecture 9 3.1 Overview 9 3.2 Existing architectures 9 V Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 3.2.1 Speech Interfaces to Conventional HTML Web Pages 9 3.2.2 Networks of Hyperlinked VXML Pages 10 3.2.3 Telephone Access to Remote Speech Applications 11 3.3 The LRRP architecture 12 3.4 Advantages of LRRP architecture 13 Chapter 4: Problems in the Previous VXML SpeechWeb Browser 15 4.1 Overview 15 4.2 The VXML-Java SpeechWeb Browser structure. 16 4.3 Shortcomings of using VXML to build a SpeechWeb Browser 17 Chapter 5: A Novel Use of X+V to Create a Browser 19 5.1 Introduction to X+V 19 5.2 Limitations of X+V 20 5.3 New solution to some of the X+V limitations 21 5.3.1 Limitation 1: Some important VXML elements are missing, such as the ‘goto’ element 21 5.3.2 Limitation 2: No iterated dialog-control, or recursive methods 30 5.3.3 Limitation 3: Difficulties in using JavaScript, XHTML, and VXML objects together 32 5.4 Conclusion of X+V usability 40 Chapter 6: Investigation of X+V for a Browser 41 6.1 Overview 41 vi Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 6.2 Single-Page Design 41 6.2.1 Components and workflow of Single-Page Design SpeechWeb Browser 42 6.2.2 Problems in Single-Page Design 44 6.2.3 Single-Page Design conclusion 44 6.3 Multiple-Page Design 44 6.3.1 Components of Multiple-Page Design SpeechWeb Browser 45 6.3.2 Multiple-Page Design structure 47 6.3.3 Comparison of a SpeechWeb Browser building as a single and multiple X+V pages 48 6.3.4 Multiple-Page Design conclusion 50 Chapter 7: The New SpeechWeb Browser 51 7.1 Overview 51 7.2 End user interface 51 7.3 A sample session user input/computer response 54 Chapter 8: Analysis of the New SpeechWeb Browser 57 8.1 Overview 57 8.2 Clarity in design 57 8.3 Ease of installation and use for end users 58 8.4 Ease of distribution 58 8.5 Use of common communication protocol 59 8.6 Ease of creation and deployment for SpeechWeb applications 59 v ii Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 8.7 Capabilities 60 8.8 Efficiency and speed 60 8.8.1 Network communication cost 61 8.8.2 Speech-recognition accuracy 61 8.8.3 Time complexity 62 8.8.4 Speed 62 Chapter 9: Use, Implementation and Documentation 65 9.1 New SpeechWeb Browser user manual 65 9.2 Manual for creating SpeechWeb Applications 65 9.3 New SpeechWeb Browser Website 65 9.4 Program code 65 Chapter 10: Conclusions and Future Work 66 10.1 What has been achieved? 66 10.2 Suggestion for future work 67 Bibliography 68 Appendix I, SpeechWeb Browser User Manual 70 Appendix II, Developer Manual for Creating SpeechWeb Applications 79 Appendix III, New SpeechWeb Browser Website 90 Appendix IV, Program Code 91 VitaAuctoris 164 viii Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. List of Tables Table 1: Comparison of Single-Page design with Multiple-Page Design 49 Table 2: SpeechWeb Browser Speed Experiment Result 64 ix Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. List of Figures Figure 1: The LRRP SpeechWeb Architecture 13 Figure 2: The structure of the Multiple-page Design 47 Figure 3: GUI of the SpeechWeb Browser Menu Page. 52 Figure 4: “Judy” is loaded 53 Figure 5: A Sample Conversation with Judy. 54 Figure 6: A Sample Conversation Session with SpeechWeb Applications 55 X Reproduced with permission of the copyright owner.

Using X+V to Construct a Non-Proprietary Speech Browser for a Public-Domain Speechweb

Genesys Voice Platform 7.6 Voicexml 2.1 Reference Manual

Bibliography of Erik Wilde

Voice and Multimodal Technology for the Mobile Worker Kondratova, Irina

An Architecture of Mobile Web 2.0 Context-Aware Applications in Ubiquitous Web

Dialogic Powermedia XMS Voicexml Reference Guide

Interactive Topographic Web Mapping Using Scalable Vector Graphics

Data Submission

Session 2: Markup Language Technologies

Voice and Multimodal Technology for the Mobile Worker

W3C TAG Perspectives

Xml Based Framework for Contact Center Applications

Voicexml (VXML) 2.0