Vocalocity

VoiceXML Implementation Reference Guide

Version 2.4.1

Vocalocity VoiceXML Implementation Reference Guide, Version 2.4.1

Copyright © 2003–2005. Vocalocity, Inc. All rights reserved. An unpublished work under US Copyright Laws.

Published June 2005

This document is protected by copyright. No part of this document may be used or reproduced in any form by any means without prior written authorization of Vocalocity, Inc. (“Vocalocity”) and its licensors, if any. This document contains information that may be protected by one or more US patents, foreign patents, or pending applications. This document is subject to the terms of the Vocalocity Evaluation Agreement and/or the Vocalocity Master Software License Agreement.

THIS DOCUMENT IS PROVIDED “AS IS” AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

THIS DOCUMENT MAY CONTAIN TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS, AND VOCALOCITY MAKES NO REPRESENTATION OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE INFORMATION CONTAINED IN THIS DOCUMENT. CHANGES MAY BE ADDED PERIODICALLY TO THE INFORMATION CONTAINED HEREIN; THESE CHANGES WILL BE INCORPORATED IN NEW EDITIONS, VERSIONS OR RELEASES OF THIS DOCUMENT. VOCALOCITY MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR PROGRAM(S) DESCRIBED IN THIS DOCUMENT AT ANY TIME.

Vocalocity, the Vocalocity logo, and combinations thereof are trademarks of Vocalocity, Inc. in the United States and other countries. Other product names and brands used in this document are for identification purposes only, and are the trademarks and/or property of their respective owners. This notice does not evidence any actual or intended publication of this document.

For more information, contact us at [email protected].

Vocalocity, Inc.
730 Peachtree Street
Suite 1100
Atlanta, GA 30308 USA
+1.404.487.1200
http://www.vocalocity.com

Contents

Preface: About This Guide
  Introduction
  Intended Audience
  Version Information
  Using Documentation
  Contents of This Guide
  Related Documentation
  Conventions
  Contacting Vocalocity Technical Support

Chapter 1: Introduction
  About Voice Browsers and VoiceXML
  Voice Browsers
  Supported Specifications
  Implementation of the VoiceXML Specifications
  ASR Vendor Support of SRGS
  ASR Vendor Support of SISR

Chapter 2: VoiceXML Element Summary
  VoiceXML Summary
  SSML Summary
  SRGS Summary
  Detailed Implementation Notes

Chapter 3: Standard Types and Defaults
  Introduction
  Setting Vocalocity Browser Properties
  Setting Properties
  Setting Java System Properties
  Vocalocity Session Variables
  Vocalocity Property Defaults
  Generic Speech Recognition Properties
  Generic DTMF Recognition Properties
  Prompt and Collection Properties
  Fetching Properties
  Miscellaneous Properties
  Custom Browser Properties
  Specifying the ASR or TTS Engine to Use
  ASR Engines
  TTS Engines
  Defaults Used when No Engine Is Specified
  Audio and Initial Page Fetching
  MIME Type Mapping
  Overriding a MIME Type
  SAX Parsers
  ECMAScript
  Accessing the log4j Logger
  Accessing Web Services
  Implementation Notes
  Bargein
  Default Encoding
  DTMF-Only Applications
  Infinite Loop Detection
  Strict Content Type Processing
  Time Unit Designations
  Using a File-Based URL in Applications

Chapter 4: SpeechWorks OSR Notes
  Introduction
  Application Name Used for OSR Logging
  SpeechWorks Recognizer Properties
  Endpointer Tuning
  Licensing Modes

Vocalocity Voice Browser VoiceXML Implementation Reference

Preface

About This Guide

The Vocalocity Voice Browser 2.4.1 fully conforms to the VoiceXML 2.0 and 2.1 specifications, and supports the Speech Recognition Grammar Specification (SRGS) and other related open standards. This reference guide provides additional detail on how the Vocalocity Voice Browser implements the standards.

The topics discussed in this guide include:

- Voice browsers and VoiceXML
- Supported specifications
- Implementation of VoiceXML elements
- Vocalocity defaults

Introduction

The Vocalocity VoiceXML Implementation Reference Guide describes how the Vocalocity Voice Browser implements VoiceXML as described in:

- The W3C Recommendation 16 March 2004, Voice Extensible Markup Language (VoiceXML) 2.0
- The W3C Working Draft 28 July 2004, Voice Extensible Markup Language (VoiceXML) 2.1

This guide is not a programming guide. It clarifies how Vocalocity has implemented the standards where the requirements were ambiguous or where Vocalocity has chosen to implement them in a slightly different manner.

This guide is intended to provide an explanation of VoiceXML support in the Vocalocity Voice Browser. Use this guide along with the W3C VoiceXML specifications when developing applications.

Intended Audience

This guide should be used by:

- Application developers who are creating VoiceXML applications for the Vocalocity Voice Browser
- Technical personnel who are responsible for troubleshooting deployed applications

Version Information

The information in this guide is accurate for Version 2.4.1 of the Vocalocity Voice Browser.

It discusses the Vocalocity Voice Browser’s implementation of VoiceXML 2.0 and VoiceXML 2.1.


Using Documentation

This section outlines the structure of the VoiceXML Implementation Reference Guide and explains other guides in the documentation set and their intended audiences.

Contents of This Guide

This guide consists of four chapters. The following table describes each chapter.

Chapter or Appendix | Description
Preface | Introduces the structure of this guide, and explains how information is presented.
Chapter 1, Introduction | Provides some background on voice browsers, voice standards, and the Vocalocity VoiceXML Interpreter. Lists the specifications to which the VoiceXML Interpreter conforms.
Chapter 2, VoiceXML Element Summary | For each VoiceXML element, provides additional detail for how the Vocalocity Voice Browser implements the VoiceXML standard.
Chapter 3, Standard Types and Defaults | Lists the standard event types, session variables, and application variables in the Vocalocity Voice Browser VoiceXML implementation, and how to configure them.
Chapter 4, SpeechWorks OSR Notes | Contains implementation suggestions or usage notes for using SpeechWorks OSR with the Vocalocity Voice Browser.


Related Documentation

There are several different guides to help you understand, implement and run the Vocalocity Voice Browser. The documentation set consists of the following guides.

Guide | Description | Intended Audiences
Vocalocity Voice Browser Installation Guide | Contains hardware and software requirements for the Vocalocity Voice Browser, describes deployment options, and contains procedures for installing and configuring Vocalocity software, third-party software, and hardware. Note: Operations information is included in the Control Center User’s Guide. | Anyone planning an implementation or installing Vocalocity Voice Browser, Voice Browser components, and Vocalocity tools
Vocalocity App Center User’s Guide | Describes how to build and deploy Vocalocity Voice Browser solutions using Vocalocity App Center and the Vocalocity Voice Browser | Voice application developers who are creating and publishing VoiceXML applications for their own use or for their customers
Vocalocity Control Center User’s Guide | Describes how to monitor Vocalocity Voice Browser solutions | Operations personnel performing ongoing maintenance of Vocalocity Voice Browser solutions
Vocalocity Info Center User’s Guide | Describes how to gather call information and use that information to support Vocalocity Voice Browser solutions | Support personnel responsible for troubleshooting and supporting voice applications
VoiceXML Implementation Reference Guide | Describes how the Vocalocity Voice Browser implements VoiceXML 2.0 and 2.1. This guide should be used along with the W3C VoiceXML specifications when developing applications. | Application developers who are creating VoiceXML applications for the Vocalocity Voice Browser; technical personnel who are responsible for troubleshooting deployed applications


Conventions

The following table describes the typographical conventions used in this guide.

Convention | Meaning
Monospace | Indicates text that should be entered exactly as shown (including punctuation) or examples of code. Here is an example of a command line: # mkdir /somedir
Bold Type | Indicates a path or the name of a program, process, procedure, routine, script, or table, such as ASSIGN
Italic Type | Indicates a variable entry, such as , or a term being defined for the first time


Contacting Vocalocity Technical Support

There are many ways to contact Vocalocity Customer Support.

Contact us... | At...
On the Web | http://support.vocalocity.com The Vocalocity support website is available 24 hours. The website has integrated issue tracking functionality that allows customers to enter and track defects and enhancements to our software.
Via email | [email protected] Your email goes directly to our technical support staff.
By telephone | +1 404.487.1200
By mail | Our corporate offices are located at: Vocalocity, Inc., 730 Peachtree Street, Suite 1100, Atlanta, GA 30308 USA

1 Introduction

The Vocalocity Voice Browser is a voice browser, a packaged solution that integrates all the components necessary for a voice application system. The Vocalocity Voice Browser includes a VoiceXML Interpreter that reads and plays VoiceXML applications. This chapter provides some background on voice browsers, voice standards, and the Vocalocity VoiceXML Interpreter.

This chapter contains the following topics:

- About Voice Browsers and VoiceXML
- Implementation of the VoiceXML Specifications
- ASR Vendor Support of SRGS

About Voice Browsers and VoiceXML

Vocalocity is an active member of the W3C Voice Browser Working Group. The Working Group has defined a suite of markup languages covering dialog, speech synthesis, speech recognition, call control and other aspects of interactive voice response applications.

Vocalocity is one of the W3C Editors of VoiceXML 2.0, VoiceXML 2.1, CCXML 1.0 and SSML. Additionally, Vocalocity is a Board Member of the VoiceXML Forum and our Chief Architect, Ken Rehor, serves as the organization's Vice Chair.

For more information about the:

- W3C Voice Browser Working Group, go to www.w3c.org/Voice
- VoiceXML Forum, go to www.voicexml.org

Specifications such as the Speech Synthesis Markup Language (SSML), Speech Recognition Grammar Specification (SRGS), and Call Control XML (CCXML) are core technologies for describing speech synthesis (text-to-speech), recognition grammars (automatic speech recognition), and call control constructs, respectively.

VoiceXML, or Voice eXtensible Markup Language, is a dialog markup language that leverages the other specifications for creating dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key (touch tone) input, recording of spoken input, telephony, and mixed initiative conversations.

VoiceXML is the HTML of the voice web, the open standard markup language for voice applications. Where HTML assumes a graphical web browser with display, keyboard, and mouse, VoiceXML assumes a voice browser with audio output (recorded messages and TTS synthesis), audio input (ASR), and keypad input (DTMF).
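As an illustration of the three channels described above (audio output, speech input, and keypad input), here is a minimal sketch of a VoiceXML 2.0 document. The form id, field name, and prompt wording are hypothetical and are not taken from a Vocalocity sample. The sketch relies on the built-in boolean field type from the VoiceXML 2.0 specification; because support for built-in grammars depends on the ASR vendor, treat it as an assumption to verify on your platform.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="greeting">
    <!-- Audio output: rendered with TTS unless recorded audio is supplied. -->
    <block>
      <prompt>Welcome to the example service.</prompt>
    </block>
    <!-- Audio and keypad input: the built-in boolean type accepts a spoken
         yes or no, and by default DTMF 1 for yes and 2 for no. -->
    <field name="wantsNews" type="boolean">
      <prompt>Would you like to hear the news? Say yes or no, or press 1 for yes or 2 for no.</prompt>
      <filled>
        <if cond="wantsNews">
          <prompt>Here is the news.</prompt>
        <else/>
          <prompt>Goodbye.</prompt>
        </if>
      </filled>
    </field>
  </form>
</vxml>
```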

Voice Browsers

A voice browser is a collection of software that works together to integrate and manage telephony, automatic speech recognition (ASR), text-to-speech (TTS), DTMF (touchtone), third-party or custom services, media, and other resources required to run VoiceXML applications.

The Vocalocity Voice Browser is a packaged solution that integrates all the components necessary for a voice application system. The Vocalocity Voice Browser includes a VoiceXML Interpreter that enables it to execute voice applications written in VoiceXML.


The Vocalocity VoiceXML Interpreter conforms to the VoiceXML 2.0 and VoiceXML 2.1 specifications and related specifications.

Supported Specifications

The Vocalocity Voice Browser supports the following W3C specifications.

Standard | Description | Specification
VoiceXML 2.0 | Voice Extensible Markup Language. Markup language used to create dialogs, that is, voice applications. The Vocalocity Voice Browser includes a VoiceXML interpreter that can render VoiceXML applications. | W3C Recommendation 16 March 2004, www.w3.org/TR/voicexml20/
VoiceXML 2.1 | Voice Extensible Markup Language | W3C Working Draft 28 July 2004, www.w3.org/TR/voicexml21/
SSML 1.0 | Speech Synthesis Markup Language. SSML tags are used for TTS capabilities. They are noted in the Implementation Notes in the following table. | W3C Recommendation 7 September 2004, www.w3.org/TR/speech-synthesis/
SRGS 1.0 | Speech Recognition Grammar Specification. SRGS tags are used for ASR capabilities, for example, to specify a grammar. | W3C Recommendation 16 March 2004, www.w3.org/TR/speech-grammar/
SISR 1.0 | Semantic Interpretation for Speech Recognition. The SRGS <tag> element provides a placeholder for instructions to a semantic processor. | W3C Working Draft 8 November 2004, www.w3.org/TR/semantic-interpretation/


Implementation of the VoiceXML Specifications

The Vocalocity VoiceXML Interpreter conforms to all required elements in the VoiceXML 2.0 and 2.1 specifications. However, there are elements where the implementation of attributes has been left to the interpreter. This guide describes how Vocalocity has implemented the standard. It should be used along with the W3C VoiceXML specifications when developing applications.

- Support of built-in VoiceXML grammars is dependent on the ASR vendor implementation. For a list of vendors and supported SRGS versions, see ASR Vendor Support of SRGS on page 1-5.
- Support of semantic interpretation is dependent on the ASR vendor. For a list of vendors and supported SISR versions, see ASR Vendor Support of SISR on page 1-5.
- Support for SSML is dependent on the TTS vendor.


ASR Vendor Support of SRGS

The supported version of SRGS also depends on the speech recognition vendor that your implementation uses.

ASR Vendor | Supported Specification
ScanSoft SpeechWorks OSR 3.0 | SRGS 1.0, W3C Proposed Recommendation 18 December 2003, http://www.w3.org/TR/2003/PR-speech-grammar-20031218/
ScanSoft SpeechWorks OSR 2.0 | SRGS 1.0, W3C Candidate Recommendation 26 June 2002, http://www.w3.org/TR/2002/CR-speech-grammar-20020626/
LumenVox SRE 5.5 | SRGS 1.0, W3C Recommendation 16 March 2004, http://www.w3.org/TR/speech-grammar/
Nuance Speech Recognition System 8.0.0 | SRGS 1.0, W3C Working Draft 20 August 2001, www.w3.org/TR/2001/WD-speech-grammar-20010820/

ASR Vendor Support of SISR

The version of the SISR specification supported also depends on the speech recognition vendor your implementation uses.

ASR Vendor | Supported Specification
ScanSoft SpeechWorks OSR 3.0 | SISR 1.0, W3C Working Draft 8 November 2004, http://www.w3.org/TR/semantic-interpretation/
ScanSoft SpeechWorks OSR 2.0 | SISR 1.0
LumenVox SRE 5.5 | SISR 1.0
Nuance Speech Recognition System 8.0.0 | SISR 1.0

2 VoiceXML Element Summary

This chapter explains how the Vocalocity Voice Browser implements the VoiceXML 2.0 and 2.1 standards. It identifies areas of clarification in cases where the specifications had ambiguous requirements or where the implementation has been left to the vendor.

This chapter contains the following topics:

- VoiceXML Summary
- SSML Summary
- SRGS Summary
- Detailed Implementation Notes

VoiceXML Summary

The following table is a summary of the current VoiceXML elements supported in this release of the Vocalocity Voice Browser.

Element | Purpose | Implementation Notes
<assign> | Assign a variable a value | Implemented as defined in VoiceXML 2.0.
<block> | A container of (non-interactive) executable code | Implemented as defined in VoiceXML 2.0.
<break> | Control the pausing or other prosodic boundaries between words | Implemented as defined in VoiceXML 2.0.
<catch> | Catch an event | Implemented as defined in VoiceXML 2.0.
<choice> | Define a menu item or specify a speech or DTMF grammar | Up to 100 <choice> tags are supported for each menu. Exactly one of “next”, “expr”, “event” or “eventexpr” must be specified; otherwise, an error.badfetch event is thrown. Exactly one of “message” or “messageexpr” may be specified; otherwise, an error.badfetch event is thrown.
<clear> | Clear one or more form item variables | Implemented as defined in VoiceXML 2.0.
<data> | Allows a VoiceXML application to fetch XML data from a document server without transitioning to a new VoiceXML document | New in VoiceXML 2.1. A Java system property, vocalos.vxml.data.access_control.allow, can be set to configure the default behavior if the returned XML content does not contain the access-control XML processing instruction. See <data> Element on page 2-10.
<disconnect> | Disconnect a session | The namelist values are passed to the TEP implementation for further processing. Upon processing of <disconnect>, the prompt queue is flushed before the hangup command is sent to the TEP.
<else> | Used in <if> elements | Implemented as defined in VoiceXML 2.0.
<elseif> | Used in <if> elements | Implemented as defined in VoiceXML 2.0.
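The attribute rule for menu choices in the table above can be sketched as follows. This is a minimal, hypothetical menu: the dialog ids, the app.operator event name, and the prompt wording are illustrative assumptions and do not come from a Vocalocity sample. Each choice sets exactly one of next, expr, event, or eventexpr.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical application event, handled at document scope. -->
  <catch event="app.operator">
    <prompt>An operator was requested.</prompt>
    <exit/>
  </catch>

  <menu id="main">
    <prompt>Say sales, support, or operator.</prompt>
    <!-- Each choice sets exactly one of next, expr, event, or eventexpr. -->
    <choice next="#sales">sales</choice>
    <choice expr="'#support'">support</choice>
    <choice event="app.operator" message="caller asked for an operator">operator</choice>
    <!-- A choice such as the following sets both next and event, so the
         rule in the table implies an error.badfetch event:
         <choice next="#sales" event="app.operator">sales</choice> -->
  </menu>

  <form id="sales">
    <block><prompt>You reached sales.</prompt></block>
  </form>

  <form id="support">
    <block><prompt>You reached support.</prompt></block>
  </form>
</vxml>
```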


Element | Purpose | Implementation Notes
<enumerate> | Shorthand for enumerating the choices in a menu. An automatically generated description of the choices available to the user. | Implemented as defined in VoiceXML 2.0. <enumerate> specifies a template that is applied to each choice in the order they appear in the menu. If it is used with no content, the VoiceXML interpreter uses the following template: Enter “dtmf-digit” for “text”
<error> | Catch an error event. An abbreviation for the <catch event="error"> element. | Implemented as defined in VoiceXML 2.0.
<exit> | Exit a session and return control to the interpreter context, which determines what to do next | The namelist values are passed to the TEP implementation for further processing. Upon processing of <exit>, the prompt queue is flushed before the hangup command is sent to the TEP.
<field> | Declares an input field in a form: an input to be gathered from the user | Implemented as defined in VoiceXML 2.0.
<filled> | An action executed when fields are filled | Implemented as defined in VoiceXML 2.0.
<foreach> | Allows a VoiceXML application to iterate through an ECMAScript array and to execute the content contained within the element for each item in the array | New in VoiceXML 2.1. Implemented as defined in VoiceXML 2.1.
<form> | A dialog for presenting information and collecting data | Implemented as defined in VoiceXML 2.0.
<goto> | Go to another dialog in the same or different document | Implemented as defined in VoiceXML 2.0.
<grammar> | Specify a speech recognition or DTMF grammar | Implemented as defined in VoiceXML 2.0.
<help> | Catch a help event. An abbreviation for the <catch event="help"> element. | Implemented as defined in VoiceXML 2.0. Grammars generated for all system level prompts such as “help” and “exit” are generated in US English only.
<if> | Simple conditional logic | Implemented as defined in VoiceXML 2.0.
<initial> | Declares initial logic upon entry into a (mixed initiative) form | Implemented as defined in VoiceXML 2.0.
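To make the enumerate template in the table above concrete, here is a minimal sketch of a menu that supplies its own template instead of relying on the default. The menu and form ids and the wording are hypothetical. The _prompt and _dtmf variables and the dtmf attribute on the menu are defined by the VoiceXML 2.0 specification; the choices here carry no explicit dtmf attribute, so the specification assigns them the implicit keys 1 and 2.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <menu id="departments" dtmf="true">
    <prompt>
      Main menu.
      <!-- The template is applied once per choice, in document order. -->
      <enumerate>
        For <value expr="_prompt"/>, press <value expr="_dtmf"/>.
      </enumerate>
    </prompt>
    <choice next="#billing">billing</choice>
    <choice next="#repairs">repairs</choice>
  </menu>

  <form id="billing">
    <block><prompt>You reached billing.</prompt></block>
  </form>

  <form id="repairs">
    <block><prompt>You reached repairs.</prompt></block>
  </form>
</vxml>
```

With this template, the prompt would be expected to read along the lines of "For billing, press 1. For repairs, press 2."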


Element | Purpose | Implementation Notes
<link> | Specify a transition common to all dialogs in the link's scope | Implemented as defined in VoiceXML 2.0.
<log> | Generate a debug message | Implemented as defined in VoiceXML 2.0. An application can log the content of the log tag to a Log4J category. For more information, see <log> Element on page 2-11. For SpeechWorks OSR, any log tag starting with SWI will be logged to the SPWK log.
<menu> | A dialog for choosing amongst alternative destinations | Implemented as defined in VoiceXML 2.0.
<meta> | Define a metadata item as a name/value pair | The following meta properties are supported: Expires (http-equiv), Pragma (http-equiv), and Cache-Control (http-equiv). For more information, see <meta> Element on page 2-11.
<metadata> | Define metadata information using a metadata schema | This element is supported, but not used by the Vocalocity Voice Browser.
<noinput> | Catch a noinput event. An abbreviation for the <catch event="noinput"> element. | Implemented as defined in VoiceXML 2.0.
<nomatch> | Catch a nomatch event. An abbreviation for the <catch event="nomatch"> element. | Implemented as defined in VoiceXML 2.0.
<object> | Interact with a custom extension | Use the Vocalocity Object API to register a custom object implementation. If you use this tag without registering the object, an error.object.notsupported event will be thrown.
<param> | Parameter in <object> or <subdialog> | Implemented as defined in VoiceXML 2.0.
<prompt> | Queue speech synthesis and audio output to the user | The attribute bargeintype is not supported. Recognition-based bargein is not currently supported. Regardless of the value of the bargeintype attribute, the browser always uses energy-based bargein. For information about bargein support, see Bargein on page 3-14.
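The supported meta properties listed in the table above are written with the http-equiv attribute, as in the following sketch. The content values shown are ordinary HTTP header values chosen for illustration. How the Vocalocity Voice Browser interprets each value is covered by the detailed notes the table refers to and is not assumed here.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Illustrative values only; each meta element names one supported property. -->
  <meta http-equiv="Expires" content="0"/>
  <meta http-equiv="Pragma" content="no-cache"/>
  <meta http-equiv="Cache-Control" content="no-cache"/>

  <form id="main">
    <block><prompt>This page asks not to be cached.</prompt></block>
  </form>
</vxml>
```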


Element | Purpose | Implementation Notes
<property> | Control implementation platform settings (such as recognition process, timeouts, caching policy, etc.) | The Vocalocity Voice Browser sets default properties, if not specified by <property>. For more information, see Vocalocity Property Defaults on page 3-4.
<record> | Record an audio sample | The timeout attribute is ignored (the VoiceXML interpreter will not throw a noinput event if the timeout is exceeded before recording begins). Use the finalsilence attribute to set the interval of silence that indicates the end of the speech to record.
<reprompt> | Play a field prompt when a field is re-visited after an event | Implemented as defined in VoiceXML 2.0.
<return> | End execution of a subdialog and return control and data to the calling dialog | Implemented as defined in VoiceXML 2.0.
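The note on recording in the table above suggests ending a recording with finalsilence rather than relying on a timeout. A minimal sketch follows, using only attributes defined for the record element in the VoiceXML 2.0 specification. The form id, variable name, and time values are hypothetical choices, not recommended settings.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="leave_message">
    <!-- finalsilence: trailing silence that ends the recording.
         maxtime: upper bound on the recording length.
         dtmfterm: any key press also ends the recording. -->
    <record name="msg" beep="true" maxtime="30s" finalsilence="3s" dtmfterm="true">
      <prompt>Please record your message after the beep. Press any key when you are finished.</prompt>
      <filled>
        <log>Recording length in milliseconds: <value expr="msg$.duration"/></log>
        <prompt>Your recording was <audio expr="msg"/>.</prompt>
      </filled>
    </record>
  </form>
</vxml>
```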