Beyond the Form Interpretation Algorithm

Beyond the Form Interpretation Algorithm

Beyond the Form Interpretation Algorithm Towards flexible dialog management for conversational voice and multi-modal applications Rahul P. Akolkar IBM Watson Research Center P.O. Box 704 Yorktown Heights, N.Y. 10598 Tel.: 1 914 784 6174 [email protected] ABSTRACT 1. INTRODUCTION The dialog or voice user interface choices available to ap- The creation of voice user interfaces for conversational appli- plication developers in the present version of VoiceXML are cations provides a number of challenges, such as the devel- largely limited to the capabilities of the Form Interpretation opment of sophisticated grammars and semantics, language Algorithm (FIA) combined with dynamic server-side genera- models for natural language processing and dynamic gener- tion of VoiceXML. This position paper discusses several im- ation of the voice user interface. Beyond server-side process- provements aimed at providing flexible dialog management ing for generating dynamic VoiceXML, the present version of in VoiceXML. The notion of recursive transition controller- VoiceXML provides limited capabilities to describe dialogs s at varying dialog granularity enables the presentation of for conversational applications. The Form Interpretaion Al- flexible interfaces for conversational applications. As an ex- gorithm (FIA), specified as Appendix C of the VoiceXML ample, the use of State Chart XML (SCXML) to declarative- 2.0 specification [5] provides the default sequence of process- ly describe such transition controllers is illustrated. These ing fields within a form, with some level of control via guard improvements are demonstrated to be backwards compati- conditions and flow control statements. ble with the latest version of VoiceXML (v2.1). Finally, the paper introduces the use of DOM events to manipulate the In order to provide dialog flexibility beyond what the FIA of- voice user interface, enabling compound namespace usecases fers, a recursive Model-View-Controller (MVC) pattern may where voice may be used with other modalities in the same be employed. The basis of the resulting dialog managemen- document. Many of these improvements are under consid- t pattern is the existence of transition controllers at vary- eration for the next version of VoiceXML (v3.0). ing levels of granularity. In particular, such transition con- trollers may exist at all the levels of granularity or scope, viz. Categories and Subject Descriptors application, session, document and form. These transition I.7.2 [Document and Text Processing]: Document Prepa- controllers may be authored using a number of languages ration—Markup languages, Standards; H.5.2 [Information or notations. The subsequent section illustrates this using Interfaces and Presentation (e.g., HCI)]: User Inter- State Chart XML (abbreviated as SCXML) [3], though tran- faces—Interaction styles (e.g., commands, menus, forms, di- sition controllers may use other notations as well. rect manipulation); H.5.3 [Information Interfaces and Presentation (e.g., HCI)]: Group and Organization In- 2. FLEXIBLE DIALOG MANAGEMENT terfaces—Web-based Interaction Being a general purpose controller notation, SCXML is suit- able for describing the controllers for managing the dialog in General Terms VoiceXML 3.0 applications. Such SCXML controllers may Languages, Standardization be placed at application, session, document and form levels. This is illustrated in this section using examples of resulting compound documents containing VoiceXML 3.0 (abbreviat- Keywords ed as V3) and SCXML namespaced elements. VoiceXML, SCXML, web application, speech application, voice user interfaces For the examples in this section, we use a travel itinerary modification scenario. The first order of business is to re- trieve an itinerary, which may be identified using either a record locator or traveler’s lastname and other information. The use of an SCXML-based form level transition controller World Wide Web Consortium - First Workshop on Con- is illustrated using two flavors, viz. a system-driven and a versational Applications user-driven approach. Both use similar set of fields in the Use Cases and Requirements for New Models of Human form but different dialog management styles. While in this Language to Support Mobile Conversational Systems 18-19 June 2010 simple example, the voice user interface (VUI) may appear Somerset, NJ, US similar, but has distinct system vs. user driven flavors. The communication between the controller and the V3 fields 2.2 User-driven Dialog is discussed in detail in Section 5. <v3:form> <scxml:scxml initial="choose"> 2.1 System-driven Dialog <v3:form> <scxml:state id="choose"> <scxml:scxml initial="choose"> <scxml:onentry> <scxml:send target="#choice" <scxml:state id="choose"> type="DOM" event="DOMActivate"/> <scxml:onentry> </scxml:onentry> <scxml:send target="#choice" <scxml:transition event="filled.choice" type="DOM" event="DOMActivate"/> cond="choice == ’locator’" </scxml:onentry> target="record"/> <scxml:transition event="filled.choice" <scxml:transition event="filled.choice" cond="choice" target="record"/> cond="choice == ’lastname’" <scxml:transition event="filled.choice" target="name"/> cond="!choice" target="name"/> </scxml:state> </scxml:state> <scxml:state id="record"> <scxml:state id="record"> <!-- Same contents as system-driven dialog --> <scxml:onentry> </scxml:state> <scxml:send target="#locator" type="DOM" event="DOMActivate"/> <scxml:state id="name"> </scxml:onentry> <!-- Same contents as system-driven dialog --> <!-- Retrieve record, transition to app menu --> </scxml:state> </scxml:state> <!-- Remaining dialog control logic omitted --> <scxml:state id="name"> </scxml:scxml> <scxml:onentry> <scxml:send target="#lastname" <v3:field id="choice"> type="DOM" event="DOMActivate"/> <v3:grammar src="choice.grxml" </scxml:onentry> type="application/srgs+xml"/> <!-- Collect other information needed to get <v3:prompt> record, then retrieve record, go to menu --> Welcome. How would you like to </scxml:state> look up your itinerary? <v3:prompt> <!-- Remaining dialog control logic omitted --> </v3:field> </scxml:scxml> <v3:field id="locator"> <v3:field id="choice"> <!-- Same contents as system-driven dialog --> <v3:grammar src="boolean.grxml" </v3:field> type="application/srgs+xml"/> <v3:prompt> <v3:field id="lastname"> Welcome. Do you have the record locator <!-- Same contents as system-driven dialog --> for your itinerary? </v3:field> <v3:prompt> <!-- Implicit filled.choice event on filled --> <!-- Other form items, such as the subsequent </v3:field> application menu omitted --> </v3:form> <v3:field id="locator"> <v3:grammar src="locator.grxml" The transition controller captures the complete model of the type="application/srgs+xml"/> correspondng dialog rather than a more distributed defini- <v3:prompt>What is the record locator?<v3:prompt> tion using guards and flow control statements on presenta- </v3:field> tion elements, and thereby, as illustrated in the examples above, allows for making localized changes to the controller <v3:field id="lastname"> logic to manipulate the voice user interface. <v3:grammar src="lastname.grxml" type="application/srgs+xml"/> In the user driven dialog above, a more sophisticated gram- <v3:prompt>Please say your last name.<v3:prompt> mar (choice.grxml for the choice field rather than the sim- </v3:field> pler boolean.grxml used in the system driven dialog) is used along with the corresponding new semantic interpretation <!-- Other form items, such as the subsequent in the guard conditions of the transitions which decide the application menu omitted --> next fields in the form, to illustrate a different dialog using </v3:form> essentially the same set of fields within the form. To ensure backwards compatibility with VoiceXML 2.0 and Further, similar transition controllers may be authored at 2.1 applications, a graceful degradation approach is suggest- the application and session levels to provide flexible dialog ed that causes behavior to fallback to the FIA in the absence management across many levels of user interface granularity. of an explicit transition controller. 5. DOM EVENTS AS THE GLUE 3. GRACEFUL DEGRADATION The glue that enables the interaction between the transition controller namespace (such as SCXML) and the VoiceXML The absence of an explicit transition controller (such as the 3.0 presentation elements is DOM events [6]. The function- <scxml:scxml> child element) a <v3:form> should revert be- ality provided by the voice user interface presentation ele- havior to be identical to a <v2:form> element where the V2 ments such as <v3:field> is exposed via DOM interfaces, Form Interpretation Algorithm would be in charge. In the and the elemental function may be invoked by dispatching presence of a <scxml:scxml> child, the FIA should be sup- a DOMActivate event with the element as the target. pressed and the more expressive SCXML controller used, which then allows application developers to design the for- As seen in the examples in Section 2, this results in the m VUI in a flexible manner. In other words, the following following pattern to invoke the functionality of a VoiceXML <v3:form> below: field when a particular state in the transition controller state machine is entered. <v3:form> <scxml:state id="state-id"> <!-- No transition controller child --> <scxml:onentry> <!-- Various form items etc. --> <scxml:send target="#field-id" type="DOM" event="DOMActivate"/> </v3:form> </scxml:onentry> should behave as would: <!-- Rest of state content, such as transitions --> <v2:form> </scxml:state>

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    4 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us