Voice Extensible Markup Language (VoiceXML) Version 2.0

W3C Working Draft 24 April 2002

This Version:
    http://www.w3.org/TR/2002/WD-voicexml20-20020424/
Latest Version:
    http://www.w3.org/TR/voicexml20
Previous Version:
    http://www.w3.org/TR/2001/WD-voicexml20-20011023/
Editors:
    Scott McGlashan, PipeBeach (Editor-in-Chief)
    Dan Burnett, Nuance Communications
    Peter Danielsen, Lucent
    Jim Ferrans, Motorola
    Andrew Hunt, SpeechWorks International
    Gerald Karam, AT&T
    Dave Ladd, Dynamicsoft
    Bruce Lucas, IBM
    Brad Porter, Tellme Networks
    Ken Rehor, Nuance Communications
    Steph Tryphonas, Tellme Networks

Copyright ©2002 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.

Abstract

This document specifies VoiceXML, the Voice Extensible Markup Language. VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. Its major goal is to bring the advantages of web-based development and content delivery to interactive voice response applications.

Status of this Document

This is a W3C Last Call Working Draft for review by W3C Members and other interested parties. Last call means that the Working Group believes that this specification is ready and therefore wishes this to be the last call for comments. If the feedback is positive, the Working Group plans to submit it for consideration as a W3C Candidate Recommendation. Comments can be sent until 24 May 2002. To find the latest version of this working draft, please follow the "Latest version" link above, or visit the list of W3C Technical Reports. It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress".

This specification describes markup for representing audio dialogs, and forms part of the proposals for the W3C Speech Interface Framework. This document has been produced as part of the W3C Voice Browser Activity, following the procedures set out for the W3C Process. The authors of this document are members of the Voice Browser Working Group (W3C Members only). This document is for public review, and comments and discussion are welcomed on the public mailing list www-voice@w3.org. To subscribe, send an email to www-voice-request@w3.org with the word subscribe in the subject line (include the word unsubscribe if you want to unsubscribe). The archive for the list is accessible online.

The proposed XML-based media types used in this specification have been submitted to the IETF for registration. Please note that during the registration process, the proposed media types may be modified or removed.

The Memorandum of Understanding between the W3C and the VoiceXML Forum has paved the way for the publication of this working draft, with the VoiceXML Forum committing to abandoning trademark applications involving the name "VoiceXML".

This document seeks Member and public comment on both the technical design and the patent licensing issues arising out of the disclosure and licensing statements that have been made. Our decision to publish this working draft does not imply that all questions of patent licensing have been resolved or clarified. They must be resolved, or work on this document in W3C will stop. As things stand at the time of publication of this specification, implementations conforming to this specification may require royalty-bearing licenses for essential IPR. Further information can be found in the patent disclosures page. The patent policy for W3C as a whole is under wide discussion. A set of commitments by all participants in the Voice Browser Activity to royalty-free licensing is a possibility for the future, but has NOT been made at the time of publication.

Conventions of this Document

In this document, the key words "must", "must not", "required", "shall", "shall not", "should", "should not", "recommended", "may", and "optional" are to be interpreted as described in [RFC2119] and indicate requirement levels for compliant VoiceXML implementations.

Table of Contents

Abbreviated Contents

● 1. Overview
● 2. Dialog Constructs
● 3. User Input
● 4. System Output
● 5. Control flow and scripting
● 6. Environment and Resources
● 7. Appendices

Full Contents

● 1. Overview
  ❍ 1.1 Introduction
  ❍ 1.2 Background
    ■ 1.2.1 Architectural Model
    ■ 1.2.2 Goals of VoiceXML
    ■ 1.2.3 Scope of VoiceXML
    ■ 1.2.4 Principles of Design
    ■ 1.2.5 Implementation Platform Requirements
  ❍ 1.3 Concepts
    ■ 1.3.1 Dialogs and Subdialogs
    ■ 1.3.2 Sessions
    ■ 1.3.3 Applications
    ■ 1.3.4 Grammars
    ■ 1.3.5 Events
    ■ 1.3.6 Links
  ❍ 1.4 VoiceXML Elements
  ❍ 1.5 Document Structure and Execution
    ■ 1.5.1 Execution within one Document
    ■ 1.5.2 Executing a Multi-Document Application
    ■ 1.5.3 Subdialogs
    ■ 1.5.4 Final Processing
● 2. Dialog Constructs
  ❍ 2.1 Forms
    ■ 2.1.1 Form Interpretation
    ■ 2.1.2 Form Items
    ■ 2.1.3 Form Item Variables and Conditions
    ■ 2.1.4 Directed Forms
    ■ 2.1.5 Mixed Initiative Forms
    ■ 2.1.6 Form Interpretation Algorithm
  ❍ 2.2 Menus
  ❍ 2.3 Form Items
    ■ 2.3.1 FIELD
    ■ 2.3.2 BLOCK
    ■ 2.3.3 INITIAL
    ■ 2.3.4 SUBDIALOG
    ■ 2.3.5 OBJECT
    ■ 2.3.6 RECORD
    ■ 2.3.7 TRANSFER
  ❍ 2.4 Filled
  ❍ 2.5 Links
● 3. User Input
  ❍ 3.1 Grammars
    ■ 3.1.1 Speech Grammars
    ■ 3.1.2 DTMF Grammars
    ■ 3.1.3 Scope of Grammars
    ■ 3.1.4 Activation of Grammars
    ■ 3.1.5 Semantic Interpretation of Input
    ■ 3.1.6 Mapping Semantic Interpretation Results to VoiceXML forms
● 4. System Output
  ❍ 4.1 Prompt
    ■ 4.1.1 Speech Markup
    ■ 4.1.2 Basic Prompts
    ■ 4.1.3 Audio Prompting
    ■ 4.1.4 <value> Element
    ■ 4.1.5 Barge-in
    ■ 4.1.6 Prompt Selection
    ■ 4.1.7 Timeout
    ■ 4.1.8 Prompt Queueing and Input Collection
● 5. Control flow and scripting
  ❍ 5.1 Variables and Expressions
    ■ 5.1.1 Declaring Variables
    ■ 5.1.2 Variable Scopes
    ■ 5.1.3 Referencing Variables
    ■ 5.1.4 Standard Session Variables
    ■ 5.1.5 Standard Application Variables
  ❍ 5.2 Event Handling
    ■ 5.2.1 Throw
    ■ 5.2.2 Catch
    ■ 5.2.3 Shorthand Notation
    ■ 5.2.4 Catch Element Selection
    ■ 5.2.5 Default Catch Elements
    ■ 5.2.6 Event Types
  ❍ 5.3 Executable Content
    ■ 5.3.1 VAR
    ■ 5.3.2 ASSIGN
    ■ 5.3.3 CLEAR
    ■ 5.3.4 IF, ELSEIF, ELSE
    ■ 5.3.5 PROMPT
    ■ 5.3.6 REPROMPT
    ■ 5.3.7 GOTO
    ■ 5.3.8 SUBMIT
    ■ 5.3.9 EXIT
    ■ 5.3.10 RETURN
    ■ 5.3.11 DISCONNECT
    ■ 5.3.12 SCRIPT
    ■ 5.3.13 LOG
● 6. Environment and Resources
  ❍ 6.1 Resource Fetching
    ■ 6.1.1 Fetching
    ■ 6.1.2 Caching
    ■ 6.1.3 Prefetching
    ■ 6.1.4 Protocols
  ❍ 6.2 Metadata Information
    ■ 6.2.1 META
    ■ 6.2.2 METADATA
  ❍ 6.3 Property
    ■ 6.3.1 Platform-Specific Properties
    ■ 6.3.2 Generic Speech Recognizer Properties
    ■ 6.3.3 Generic DTMF Recognizer Properties
    ■ 6.3.4 Prompt and Collect Properties
    ■ 6.3.5 Fetching Properties
    ■ 6.3.6 Miscellaneous Properties
  ❍ 6.4 Param
  ❍ 6.5 Time Designations
● 7. Appendices
  ❍ Appendix A. Glossary of Terms
  ❍ Appendix B. VoiceXML Document Type Definition
  ❍ Appendix C. Form Interpretation Algorithm
  ❍ Appendix D. Timing Properties
  ❍ Appendix E. Audio File Formats
  ❍ Appendix F. Conformance
  ❍ Appendix G. Internationalization
  ❍ Appendix H. Accessibility
  ❍ Appendix I. Privacy
  ❍ Appendix J. Changes from VoiceXML 1.0
  ❍ Appendix K. Reusability
  ❍ Appendix L. Acknowledgements
  ❍ Appendix M. References
  ❍ Appendix N. Media Type and File Suffix
  ❍ Appendix O. Schema
  ❍ Appendix P. Builtin Grammar Types

1. Overview

This document defines VoiceXML, the Voice Extensible Markup Language. Its background, basic concepts and use are presented in Section 1. The dialog constructs of form, menu and link, and the mechanism (Form Interpretation Algorithm) by which they are interpreted are then introduced in Section 2. User input using DTMF and speech grammars is covered in Section 3, while Section 4 covers system output using speech synthesis and recorded audio. Mechanisms for manipulating dialog control flow, including variables, events, and executable elements, are explained in Section 5. Environment features such as parameters and properties as well as resource handling are specified in Section 6. The appendices provide additional information including the VoiceXML Schema, a detailed specification of the Form Interpretation Algorithm and timing, audio file formats, and statements relating to conformance, internationalization, accessibility and privacy. Developers familiar with VoiceXML 1.0 are particularly directed to Appendix J, Changes from VoiceXML 1.0, which summarizes how VoiceXML 2.0 differs from VoiceXML 1.0.

1.1 Introduction

VoiceXML is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony, and mixed-initiative conversations. Its major goal is to bring the advantages of web-based development and content delivery to interactive voice response applications.

Here are two short examples of VoiceXML. The first is the venerable "Hello World":

  <?xml version="1.0" encoding="UTF-8"?>
  <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
    <form>
      <block>Hello World!</block>
    </form>
  </vxml>
The top-level element is <vxml>, which is mainly a container for dialogs. There are two types of dialogs: forms and menus. Forms present information and gather input; menus offer choices of what to do next. This example has a single form, which contains a block that synthesizes and presents "Hello World!" to the user. Since the form does not specify a successor dialog, the conversation ends.

Our second example asks the user for a choice of drink and then submits it to a server script:

  <?xml version="1.0" encoding="UTF-8"?>
  <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
    <form>
      <field name="drink">
        <prompt>Would you like coffee, tea, milk, or nothing?</prompt>
        <grammar src="drink.grxml" type="application/srgs+xml"/>
      </field>
      <block>
        <submit next="http://www.drink.example.com/drink2.asp"/>
      </block>
    </form>
  </vxml>
A <field> is an input field. The user must provide a value for the field before proceeding to the next element in the form. A sample interaction is:

C (computer): Would you like coffee, tea, milk, or nothing?
H (human): Orange juice.
C: I did not understand what you said. (a platform-specific default message.)
C: Would you like coffee, tea, milk, or nothing?
H: Tea
C: (continues in document drink2.asp)

1.2 Background

This section contains a high-level architectural model, whose terminology is then used to describe the goals of VoiceXML, its scope, its design principles, and the requirements it places on the systems that support it.

1.2.1 Architectural Model

The architectural model assumed by this document has the following components:

Figure 1: Architectural Model

A document server (e.g. a Web server) processes requests from a client application, the VoiceXML Interpreter, through the VoiceXML interpreter context. The server produces VoiceXML documents in reply, which are processed by the VoiceXML Interpreter. The VoiceXML interpreter context may monitor user inputs in parallel with the VoiceXML interpreter. For example, one VoiceXML interpreter context may always listen for a special escape phrase that takes the user to a high-level personal assistant, and another may listen for escape phrases that alter user preferences like volume or text-to-speech characteristics.

The implementation platform is controlled by the VoiceXML interpreter context and by the VoiceXML interpreter. For instance, in an interactive voice response application, the VoiceXML interpreter context may be responsible for detecting an incoming call, acquiring the initial VoiceXML document, and answering the call, while the VoiceXML interpreter conducts the dialog after answer. The implementation platform generates events in response to user actions (e.g. spoken or character input received, disconnect) and system events (e.g. timer expiration). Some of these events are acted upon by the VoiceXML interpreter itself, as specified by the VoiceXML document, while others are acted upon by the VoiceXML interpreter context.

1.2.2 Goals of VoiceXML

VoiceXML's main goal is to bring the full power of web development and content delivery to voice response applications, and to free the authors of such applications from low-level programming and resource management. It enables integration of voice services with data services using the familiar client-server paradigm. A voice service is viewed as a sequence of interaction dialogs between a user and an implementation platform. The dialogs are provided by document servers, which may be external to the implementation platform. Document servers maintain overall service logic, perform database and legacy system operations, and produce dialogs. A VoiceXML document specifies each interaction dialog to be conducted by a VoiceXML interpreter. User input affects dialog interpretation and is collected into requests submitted to a document server. The document server replies with another VoiceXML document to continue the user's session with other dialogs.

VoiceXML is a markup language that:

● Minimizes client/server interactions by specifying multiple interactions per document.

● Shields application authors from low-level and platform-specific details.

● Separates user interaction code (in VoiceXML) from service logic (CGI scripts).

● Promotes service portability across implementation platforms. VoiceXML is a common language for content providers, tool providers, and platform providers.

● Is easy to use for simple interactions, and yet provides language features to support complex dialogs.

While VoiceXML strives to accommodate the requirements of a majority of voice response services, services with stringent requirements may best be served by dedicated applications that employ a finer level of control.

1.2.3 Scope of VoiceXML

The language describes the human-machine interaction provided by voice response systems, which includes:

● Output of synthesized speech (text-to-speech).

● Output of audio files.

● Recognition of spoken input.

● Recognition of DTMF input.

● Recording of spoken input.

● Control of dialog flow.

● Telephony features such as call transfer and disconnect.

The language provides means for collecting character and/or spoken input, assigning the input results to document-defined request variables, and making decisions that affect the interpretation of documents written in the language. A document may be linked to other documents through Uniform Resource Identifiers (URIs).

1.2.4 Principles of Design

VoiceXML is an XML application [XML].

1. The language promotes portability of services through abstraction of platform resources.

2. The language accommodates platform diversity in supported audio file formats, speech grammar formats, and URI schemes. While producers of platforms may support various grammar formats, the language requires a common grammar format, namely the XML Form of the W3C Speech Recognition Grammar Specification [SRGS], to facilitate interoperability. Similarly, while various audio formats for playback and recording may be supported, the audio formats described in Appendix E must be supported.

3. The language supports ease of authoring for common types of interactions.

4. The language has well-defined semantics that preserve the author's intent regarding the behavior of interactions with the user. Client heuristics are not required to determine document element interpretation.

5. The language recognizes semantic interpretations from grammars and makes this information available to the application.


6. The language has a control flow mechanism.

7. The language enables a separation of service logic from interaction behavior.

8. It is not intended for intensive computation, database operations, or legacy system operations. These are assumed to be handled by resources outside the document interpreter, e.g. a document server.

9. General service logic, state management, dialog generation, and dialog sequencing are assumed to reside outside the document interpreter.

10. The language provides ways to link documents using URIs, and also to submit data to server scripts using URIs.

11. VoiceXML provides ways to identify exactly which data to submit to the server, and which HTTP method (get or post) to use in the submittal.

12. The language does not require document authors to explicitly allocate and deallocate dialog resources, or deal with concurrency. Resource allocation and concurrent threads of control are to be handled by the implementation platform.
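
As an illustration of principles 10 and 11, a <submit> element names both the data to send and the HTTP method to use. The following fragment is a minimal sketch; the URI and variable names are illustrative, not part of the specification:

  <submit next="http://www.example.com/servlet/order" method="post" namelist="drink quantity"/>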

1.2.5 Implementation Platform Requirements

This section outlines the requirements on the hardware/software platforms that will support a VoiceXML interpreter.

Document acquisition. The interpreter context is expected to acquire documents for the VoiceXML interpreter to act on. The "http" URI protocol must be supported. In some cases, the document request is generated by the interpretation of a VoiceXML document, while other requests are generated by the interpreter context in response to events outside the scope of the language, for example an incoming phone call. When issuing document requests via http, the interpreter context identifies itself using the "User-Agent" header variable with the value "<name>/<version>", for example, "acme-browser/1.2".

Audio output. An implementation platform must support audio output using audio files and text-to-speech (TTS). The platform must be able to freely sequence TTS and audio output. If an audio output resource is not available, an error.noresource event must be thrown. Audio files are referred to by a URI. The language specifies a required set of audio file formats which must be supported (see Appendix E); additional audio file formats may also be supported.

Audio input. An implementation platform is required to detect and report character and/or spoken input simultaneously and to control input detection interval duration with a timer whose length is specified by a VoiceXML document. If an audio input resource is not available, an error.noresource event must be thrown.

● It must report characters (for example, DTMF) entered by a user. Platforms must support the XML form of DTMF grammars described in the W3C Speech Recognition Grammar Specification [SRGS]. They should also support the Augmented BNF (ABNF) form of DTMF grammars described in the W3C Speech Recognition Grammar Specification [SRGS].

● It must be able to receive speech recognition grammar data dynamically. It must be able to use speech grammar data in the XML Form of the W3C Speech Recognition Grammar Specification [SRGS] (see the sketch after this list). It should be able to receive speech recognition grammar data in the ABNF form of the W3C Speech Recognition Grammar Specification [SRGS], and may support other formats such as the JSpeech Grammar Format [JSGF] or proprietary formats. Some VoiceXML elements contain speech grammar data; others refer to speech grammar data through a URI. The speech recognizer must be able to accommodate dynamic update of the spoken input for which it is listening through either method of speech grammar data specification.

● It must be able to record audio received from the user. The implementation platform must be able to make the recording available to a request variable. The language specifies a required set of recorded audio file formats which must be supported (see Appendix E); additional formats may also be supported.

Transfer. The platform should be able to support making a third party connection through a communications network, such as the telephone.
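
For concreteness, the following is a minimal sketch of speech grammar data in the XML Form of [SRGS], matching the drink choices from the earlier example. The rule name and contents are illustrative only:

  <?xml version="1.0" encoding="UTF-8"?>
  <grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0"
           mode="voice" root="drink" xml:lang="en-US">
    <!-- Accepts exactly one of the four drink choices. -->
    <rule id="drink" scope="public">
      <one-of>
        <item>coffee</item>
        <item>tea</item>
        <item>milk</item>
        <item>nothing</item>
      </one-of>
    </rule>
  </grammar>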

1.3 Concepts

A VoiceXML document (or a set of documents called an application) forms a conversational finite state machine. The user is always in one conversational state, or dialog, at a time. Each dialog determines the next dialog to transition to. Transitions are specified using URIs, which define the next document and dialog to use. If a URI does not refer to a document, the current document is assumed. If it does not refer to a dialog, the first dialog in the document is assumed. Execution is terminated when a dialog does not specify a successor, or if it has an element that explicitly exits the conversation.
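
For example, a dialog can name its successor with a <goto> whose URI selects a document and, after "#", a dialog within it. This fragment is a sketch; the document and dialog names are illustrative:

  <form id="say_goodbye">
    <block>
      Goodbye!
      <!-- Transition to the dialog named "main" in another document. -->
      <goto next="http://www.example.com/next.vxml#main"/>
    </block>
  </form>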

1.3.1 Dialogs and Subdialogs

There are two kinds of dialogs: forms and menus. Forms define an interaction that collects values for a set of field item variables. Each field may specify a grammar that defines the allowable inputs for that field. If a form-level grammar is present, it can be used to fill several fields from one utterance. A menu presents the user with a choice of options and then transitions to another dialog based on that choice.

A subdialog is like a function call, in that it provides a mechanism for invoking a new interaction, and returning to the original form. Variable instances, grammars, and state information are saved and are available upon returning to the calling document. Subdialogs can be used, for example, to create a confirmation sequence that may require a database query; to create a set of components that may be shared among documents in a single application; or to create a reusable library of dialogs shared among many applications.
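
A subdialog invocation might look like the following sketch; the file name, dialog name, and variable names are illustrative, and the called dialog would return control with a <return> element:

  <form id="order">
    <!-- Invoke a confirmation dialog defined in another document. -->
    <subdialog name="confirm" src="confirm.vxml#get_confirmation">
      <filled>
        <!-- confirm.answer is a variable returned by the subdialog. -->
        <if cond="confirm.answer == 'yes'">
          <submit next="place_order.asp"/>
        </if>
      </filled>
    </subdialog>
  </form>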

1.3.2 Sessions

A session begins when the user starts to interact with a VoiceXML interpreter context, continues as documents are loaded and processed, and ends when requested by the user, a document, or the interpreter context.

1.3.3 Applications

An application is a set of documents sharing the same application root document. Whenever the user interacts with a document in an application, its application root document is also loaded. The application root document remains loaded while the user is transitioning between other documents in the same application, and it is unloaded when the user transitions to a document that is not in the application. While it is loaded, the application root document's variables are available to the other documents as application variables, and its grammars can also be set to remain active for the duration of the application. Figure 2 shows the transition of documents (D) in an application that share a common application root document (root).

Figure 2: Transitioning between documents in an application.

1.3.4 Grammars

Each dialog has one or more speech and/or DTMF grammars associated with it. In machine directed applications, each dialog's grammars are active only when the user is in that dialog. In mixed initiative applications, where the user and the machine alternate in determining what to do next, some of the dialogs are flagged to make their grammars active (i.e., listened for) even when the user is in another dialog in the same document, or on another loaded document in the same application. In this situation, if the user says something matching another dialog's active grammars, execution transitions to that other dialog, with the user's utterance treated as if it were said in that dialog. Mixed initiative adds flexibility and power to voice applications.
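
A form can be flagged in this way with its scope attribute; in the following sketch (the form name and grammar file are illustrative), the form's grammar remains active throughout the document, so a matching utterance in any other dialog of the document transitions here:

  <form id="weather" scope="document">
    <!-- Active even while the user is in other dialogs of this document. -->
    <grammar src="weather.grxml" type="application/srgs+xml"/>
    <block>
      <prompt>Welcome to the weather service.</prompt>
    </block>
  </form>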

1.3.5 Events

VoiceXML provides a form-filling mechanism for handling "normal" user input. In addition, VoiceXML defines a mechanism for handling events not covered by the form mechanism. Events are thrown by the platform under a variety of circumstances, such as when the user does not respond, doesn't respond intelligibly, requests help, etc. The interpreter also throws events if it finds a semantic error in a VoiceXML document. Events are caught by catch elements or their syntactic shorthand. Each element in which an event can occur may specify catch elements. Catch elements are also inherited from enclosing elements "as if by copy". In this way, common event handling behavior can be specified at any level, and it applies to all lower levels.
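
For instance, a field might handle the noinput and nomatch events using their shorthand catch elements; this is a sketch, with illustrative prompts and field name:

  <field name="city">
    <prompt>Which city?</prompt>
    <grammar src="city.grxml" type="application/srgs+xml"/>
    <!-- Shorthand catch for an unintelligible response. -->
    <nomatch>
      <prompt>Sorry, I did not understand. Please say the name of a city.</prompt>
    </nomatch>
    <!-- Shorthand catch for no response. -->
    <noinput>
      <prompt>Please say the name of a city.</prompt>
    </noinput>
  </field>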

1.3.6 Links

A link supports mixed initiative. It specifies a grammar that is active whenever the user is in the scope of the link. If user input matches the link's grammar, control transfers to the link's destination URI. A link can be used to throw an event or go to a destination URI.
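
As a sketch (the phrase and target URI are illustrative), a document-level link could let the user say "operator" at any point to transfer to a different dialog:

  <link next="http://www.example.com/operator.vxml">
    <grammar mode="voice" version="1.0" root="root"
             xmlns="http://www.w3.org/2001/06/grammar">
      <rule id="root" scope="public">operator</rule>
    </grammar>
  </link>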

1.4 VoiceXML Elements

Element      Purpose                                                            Section
<assign>     Assign a variable a value                                          5.3.2
<form>       A dialog for presenting information and collecting data            2.1
<goto>       Go to another dialog in the same or different document             5.3.7
<grammar>    Specify a speech recognition or DTMF grammar                       3.1
<help>       Catch a help event                                                 5.2.3
<if>         Simple conditional logic                                           5.3.4
<initial>    Declares initial logic upon entry into a (mixed-initiative) form   2.3.3
<link>       Specify a transition common to all dialogs in the link's scope     2.5
<log>        Generate a debug message                                           5.3.13
<menu>       A dialog for choosing amongst alternative destinations             2.2
<meta>       Define a metadata item as a name/value pair                        6.2.1
<metadata>   Define metadata information using a metadata schema                6.2.2
<noinput>    Catch a noinput event                                              5.2.3
<nomatch>    Catch a nomatch event                                              5.2.3
<object>     Interact with a custom extension                                   2.3.5