W3C Multimodal Interaction Working Group. Multimodal Interaction.

Ingmar Kliche, January 30, 2012

W3C Multimodal Interaction Working Group. Multimodal interaction – scope definition.

- Alternate or combined input and output modalities.
- Allow users to dynamically select the most appropriate mode of interaction.
- Context dependency (e.g. battery status, location, calendar, ...).
- Adaptive to user preferences.
- Adaptive to device properties.

- Example used throughout this talk: composite input
  - „I want to fly from … to …“ while tapping on a map
  - „Show me restaurants in … area“ while drawing a circle on a map

Deutsche Telekom Laboratories

W3C Multimodal Interaction Working Group. Overview.

- Launched 2002
- 24 member organizations (Deutsche Telekom, France Telecom, Microsoft, Loquendo, Genesys, Voxeo, Openstream, ...), 42 members
- Mission: „... to extend the Web to allow users to dynamically select the most appropriate mode of interaction for their current needs ...“
- Deliverables:
  - Multimodal Architecture and Interfaces specification
  - EMMA
  - InkML
  - EmotionML
- http://www.w3.org/2002/mmi

W3C Multimodal Interaction Working Group. Multimodal Architecture and Interfaces.

- Initial proposal: XHTML+Voice (X+V)
  - http://www.w3.org/TR/xhtml+voice/
  - Problem: tight coupling of HTML and VoiceXML (using XML Events), no support for other modalities

- MMI working group developed a “Multimodal Interaction Framework” (2003)
- “Multimodal Architecture and Interfaces” specification (currently in CR)
  - http://www.w3.org/TR/mmi-arch
  - Loosely coupled architecture
  - Allows for co-resident or distributed implementations
  - Leverages existing W3C standards (HTML, SVG, …)

W3C Multimodal Architecture. Multimodal Interaction Framework.

- W3C Multimodal Interaction Framework (published 2003)

W3C Multimodal Interaction Working Group. „Multimodal Architecture and Interfaces“ specification.

- Runtime Framework provides the basic infrastructure and controls communication among the constituents.
- Interaction Manager (IM) coordinates Modality Components (MCs) by life-cycle events and contains the shared data (context).
- Event-based communication between IM and MCs.
- http://www.w3.org/TR/mmi-arch

[Diagram: Runtime Framework containing Interaction Manager, Data Component and Delivery Context Component; Modality Components 1…N attached via the Modality Component API]

W3C Multimodal Interaction Working Group. „Multimodal Architecture and Interfaces“.

- MMI life-cycle events:
  - newContextRequest/Response
  - prepareRequest/Response
  - startRequest/Response
  - cancelRequest/Response
  - pauseRequest/Response
  - resumeRequest/Response
  - statusRequest/Response
  - doneNotification
  - extensionNotification

[Diagram: Interaction Manager exchanging life-cycle events with Modality Components 1…N across the Modality Component API]

- XML schema for MMI life-cycle events
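As a rough illustration of what such a life-cycle event looks like on the wire — the element names follow the Multimodal Architecture and Interfaces specification, while the component IDs and the dialog URL are invented for this sketch:

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <!-- The Interaction Manager asks a voice modality component
       to start a dialog within an existing interaction context. -->
  <mmi:startRequest mmi:source="IM-1" mmi:target="voiceMC-1"
                    mmi:context="ctx-1" mmi:requestID="req-42">
    <mmi:contentURL mmi:href="dialog.vxml"/>
  </mmi:startRequest>
</mmi:mmi>
```

The modality component would answer with a matching startResponse carrying the same context and requestID, which is how the IM correlates requests and responses.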

Representing user input – EMMA. W3C Multimodal Interaction Working Group. EMMA 1.0.

- Goal: annotation/representation of user input
- Example: user utterance “flights from Boston to Denver”

- Interpretation 1: origin Boston, destination Denver
- Interpretation 2 (recognition alternative): origin Austin, destination Denver
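The markup stripped from the slide presumably resembled the n-best example of the EMMA 1.0 specification; the confidence values below are illustrative:

```xml
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <!-- Two competing recognition hypotheses for the same utterance -->
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75"
        emma:tokens="flights from boston to denver">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68"
        emma:tokens="flights from austin to denver">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>
```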

Fusion (integration) of combined input. Integrating multimodal input („Show me …“).

[Diagram: fusion pipeline — Recognition → Interpretation → Integration. Speech is recognized against a grammar with semantic interpretation and emitted as EMMA; ink input is interpreted and emitted as EMMA; the integration manager fuses both EMMA results and hands the combined EMMA to the interaction manager]

[EMMA fragment (speech): command “show” with an unresolved deictic point]

Integrating multimodal input („Show me …“).


[EMMA fragment (ink): point at coordinates 17 54]

Integrating multimodal input („Show me …“).

- Speech input (EMMA): command “show” with an open point slot
- Stylus input (EMMA): point 17 54

Integrating multimodal input („Show me …“).

- Speech input (EMMA): command “show” with an open point slot
- Stylus input (EMMA): point 17 54
- Integration result (EMMA): show, point 17 54
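A sketch of the three EMMA documents involved in this fusion step. Element names such as `command` and `point` are application-specific payload and chosen here for illustration; only the `emma:*` vocabulary comes from the EMMA 1.0 specification:

```xml
<!-- Speech input: "show" plus an unresolved deictic reference -->
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="speech1" emma:medium="acoustic" emma:mode="voice">
    <command>show</command>
    <point/>   <!-- slot to be filled by the ink input -->
  </emma:interpretation>
</emma:emma>

<!-- Ink input: the point tapped/drawn on the map -->
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="ink1" emma:medium="tactile" emma:mode="ink">
    <point>17 54</point>
  </emma:interpretation>
</emma:emma>

<!-- Result of integration: the fused interpretation -->
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="combined1">
    <command>show</command>
    <point>17 54</point>
  </emma:interpretation>
</emma:emma>
```

The integration manager unifies the two inputs by filling the empty point slot of the speech interpretation with the coordinates from the ink interpretation.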

Integrating multimodal input („Show me …“).

- JavaScript library to demonstrate fusion of multimodal input (aka „integration“)

W3C Multimodal Interaction Working Group. InkML 1.0.

- Representation of ink traces
- InkML might be carried within EMMA

<ink xmlns="http://www.w3.org/2003/InkML">
  <trace>10 0, 9 14, 8 28, 7 42, 6 56, 6 70, 8 84, 8 98, 8 112, 9 126, 10 140, 13 154, 14 168, 17 182, 18 188, 23 174, 30 160, 38 147, 49 135, 58 124, 72 121, 77 135, 80 149, 82 163, 84 177, 87 191, 93 205</trace>
  <trace>130 155, 144 159, 158 160, 170 154, 179 143, 179 129, 166 125, 152 128, 140 136, 131 149, 126 163, 124 177, 128 190, 137 200, 150 208, 163 210, 178 208, 192 201, 205 192, 214 180</trace>
  <trace>227 50, 226 64, 225 78, 227 92, 228 106, 228 120, 229 134, 230 148, 234 162, 235 176, 238 190, 241 204</trace>
  <trace>282 45, 281 59, 284 73, 285 87, 287 101, 288 115, 290 129, 291 143, 294 157, 294 171, 294 185, 296 199, 300 213</trace>
  <trace>366 130, 359 143, 354 157, 349 171, 352 185, 359 197, 371 204, 385 205, 398 202, 408 191, 413 177, 413 163, 405 150, 392 143, 378 141, 365 150</trace>
</ink>

W3C Multimodal Interaction Working Group. EmotionML 1.0.

- Annotation of material involving emotionality, such as annotation of videos, of speech recordings, of faces, of texts, etc.
- Automatic recognition of emotions from sensors, including physiological sensors and speech recordings.
- Generation of emotion-related system responses.
- http://www.w3.org/TR/emotionml/

- Example: annotation of emotion, enclosed in an EMMA document
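A sketch of such an annotation, assuming the "big6" category vocabulary from the EmotionML drafts; the emotion category and confidence value are invented for illustration:

```xml
<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.w3.org/2009/10/emotionml">
  <!-- An emotion recognized from a speech recording,
       wrapped in an EMMA interpretation -->
  <emma:interpretation emma:medium="acoustic" emma:mode="voice">
    <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
      <category name="anger" confidence="0.8"/>
    </emotion>
  </emma:interpretation>
</emma:emma>
```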

Related work within W3C. W3C Multimodal Working Group. Voice Modality Component (VoiceXML).

W3C Voice Browser Working Group
- Existing standards:
  - VoiceXML 2.1
  - CCXML 1.0
  - SRGS 1.0 and SISR 1.0
  - PLS 1.0 and SSML 1.0

- Work in progress:
  - SCXML 1.0
  - VoiceXML 3.0
    - External eventing
    - New features (e.g. speaker verification)
    - Fine-grained control of the FIA (Form Interpretation Algorithm) by embedding SCXML in VoiceXML or VoiceXML in SCXML

* Note: VoiceXML 3 syntax not yet finalized, example shows principle.

W3C Multimodal Interaction Working Group. VoiceXML 2.0/2.1 – example.

Menu with the choices Sports, Weather, and Stargazer astrophysics news; prompt: “Welcome home. Say one of: sports; weather; Stargazer astrophysics news.”; re-prompt: “Please say one of: …”

Sample dialog:
C(omputer): Welcome home. Say one of: sports; weather; Stargazer astrophysics news.
H(uman): Astrology.
C: I did not understand what you said. (a platform-specific default message.)
C: Welcome home. Say one of: sports; weather; Stargazer astrophysics news.
H: Sports.
C: (proceeds to start.vxml)
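The underlying document can be reconstructed along the lines of the menu example in the VoiceXML 2.0 specification; the target URLs below are illustrative:

```xml
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <menu>
    <!-- <enumerate/> speaks the list of choices automatically -->
    <prompt>Welcome home. Say one of: <enumerate/></prompt>
    <choice next="http://www.sports.example.com/vxml/start.vxml">sports</choice>
    <choice next="http://www.weather.example.com/intro.vxml">weather</choice>
    <choice next="http://www.stargazer.example.com/voice/astronews.vxml">Stargazer astrophysics news</choice>
    <!-- Re-prompt when the caller stays silent -->
    <noinput>Please say one of: <enumerate/></noinput>
  </menu>
</vxml>
```

An unrecognized answer such as “Astrology” triggers the platform's default nomatch handling, after which the menu prompt is played again.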

W3C Multimodal Architecture. State Chart XML (SCXML).

“State Chart XML (SCXML): State Machine Notation for Control Abstraction” (working draft status)

...
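A minimal SCXML sketch of such a state machine; the state and event names here are invented for illustration:

```xml
<scxml xmlns="http://www.w3.org/2005/07/scxml"
       version="1.0" initial="idle">
  <state id="idle">
    <!-- wait for a start event, e.g. from the Interaction Manager -->
    <transition event="start" target="listening"/>
  </state>
  <state id="listening">
    <transition event="input.received" target="done"/>
    <transition event="cancel" target="idle"/>
  </state>
  <final id="done"/>
</scxml>
```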

W3C Multimodal Working Group. Voice Modality Component (VoiceXML 3.0).

Prompt: “Welcome. How would you like to look up your itinerary?”

* Note: VoiceXML 3 syntax not yet finalized, example shows principle.

W3C Multimodal Architecture. SCXML Editor (Eclipse Plugin).

- Eclipse plugin, developed as a Diploma thesis by Dennis Biber.

W3C Multimodal Architecture. SCXML implementations.

- Apache Commons SCXML
  - Java implementation
  - http://commons.apache.org/scxml/

- Various voice platform providers, such as Voxeo (Prophecy voice platform), already offer early SCXML support

- JavaScript interpreters
  - SCXML-js: http://blog.echo-flow.com/tag/scxml-js/
  - Synergy SCXML lab: http://www.ling.gu.se/~lager/Labs/SCXML-Lab/

- C# library
  - http://www.nordover.de

W3C Multimodal Architecture. SCXML used for a gesture control system.

- Gesture control system for “Entertain”, based on Microsoft Kinect, developed as a Diploma thesis by Alexander Lübeck.

[Diagram: gesture recognition (using the NITE library) → SCXML interpreter (using the Nordover library) → HTTP interface]

W3C HTML Speech API (Google proposal).

DCCI and Delivery Context Ontology.

- Delivery Context Client Interface (DCCI)
  - Provides web applications with access to a hierarchy of dynamic properties representing device capabilities, configurations, user preferences and environmental conditions.
  - http://www.w3.org/TR/DPF/

- Delivery Context Ontology
  - Provides a formal model of the characteristics of the environment (using OWL)

W3C Device API Working Group. Useful APIs for multimodal applications.

- Sensor API
  - Generic API to access device sensors (from HTML)
  - http://dev.w3.org/2009/dap/system-info/Sensors.html

- Battery Status API
  - Retrieve information about the battery status of a (mobile) device (from HTML)
  - http://www.w3.org/TR/battery-status/

- Vibration API
  - Provides access to the vibration mechanism of the hosting device (from HTML)
  - http://www.w3.org/TR/vibration/

W3C WebApps Working Group. Useful APIs for multimodal applications.

- Geolocation API
  - Get the current geographic location (longitude, latitude, altitude) from HTML
  - Independent of the location provider (GPS, WiFi, Cell-ID, ...)
  - http://www.w3.org/TR/geolocation-API/

- Orientation API
  - Get the current device orientation (e.g. tilt) from HTML
  - http://www.w3.org/TR/orientation-event/
  - iPhone example: „Move the ball“ http://ad.ag/wjmtgt
