W3C Multimodal Interaction Working Group. Multimodal Interaction.
Ingmar Kliche, January 30, 2012

W3C Multimodal Interaction Working Group. Multimodal interaction – scope definition.
Alternate or combined input and output modalities.
Allow users to dynamically select the most appropriate mode of interaction.
Context dependency (e.g. battery status, location, calendar, ...).
Adaptive to user preferences.
Adaptive to device properties.
Example used throughout this talk: composite input, e.g. "I want to fly from ..."
Deutsche Telekom Laboratories

W3C Multimodal Interaction Working Group. Overview.
Launched 2002. 24 member organizations (Deutsche Telekom, France Telecom, Microsoft, Loquendo, Genesys, Voxeo, Openstream, ...), 42 members.
Mission: "... to extend the Web to allow users to dynamically select the most appropriate mode of interaction for their current needs ..."
Deliverables: Multimodal Architecture and Interfaces specification, EMMA, InkML, EmotionML.
http://www.w3.org/2002/mmi
W3C Multimodal Interaction Working Group. Multimodal Architecture and Interfaces.
Initial proposal: XHTML+Voice (X+V), http://www.w3.org/TR/xhtml+voice/
Problem: tight coupling of HTML and VoiceXML (using XML Events), no support for other modalities.
The MMI Working Group developed a "Multimodal Interaction Framework" (2003) and the "Multimodal Architecture and Interfaces" specification (currently in Candidate Recommendation), http://www.w3.org/TR/mmi-arch:
Loosely coupled architecture.
Allows for co-resident or distributed implementations.
Leverages existing W3C standards (HTML, SVG, ...).
W3C Multimodal Architecture. Multimodal Interaction Framework.
W3C Multimodal Interaction Framework (published 2003)
W3C Multimodal Interaction Working Group. "Multimodal Architecture and Interfaces" specification.
The Runtime Framework provides the basic infrastructure and controls communication among the constituents. The Interaction Manager (IM) coordinates modality components (MCs) by life-cycle events and contains the shared data (context). Communication between the IM and the MCs is event-based, via the Modality Component API.
(Diagram: Runtime Framework containing the Delivery Context Component, the Interaction Manager and the Data Component; Modality Component 1 ... Modality Component N attached through the Modality Component API.)
http://www.w3.org/TR/mmi-arch
W3C Multimodal Interaction Working Group. "Multimodal Architecture and Interfaces".
MMI lifecycle events:
newContextRequest/Response
prepareRequest/Response
startRequest/Response
cancelRequest/Response
pauseRequest/Response
resumeRequest/Response
statusRequest/Response
doneNotification
extensionNotification

An XML schema for the MMI lifecycle events is defined.
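As an illustration, a startRequest lifecycle event (here with placeholder URIs and IDs) might look roughly like this:

```xml
<mmi:mmi xmlns:mmi="http://www.w3.org/2008/04/mmi-arch" version="1.0">
  <!-- the Interaction Manager asks a modality component to start processing -->
  <mmi:startRequest source="IM-URI" target="MC-URI"
                    context="context-1" requestID="request-1">
    <mmi:data>
      <!-- optional payload, e.g. markup for the modality component to run -->
    </mmi:data>
  </mmi:startRequest>
</mmi:mmi>
```

The modality component would answer with a matching startResponse carrying the same context and requestID.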
Representing user input – EMMA. W3C Multimodal Interaction Working Group. EMMA 1.0.
Goal: annotation/representation of user input.
Example: user utterance "flights from Boston to Denver".
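In EMMA, that utterance could be represented roughly as follows (the slot names and confidence value are illustrative, not prescribed by the spec):

```xml
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1"
                       emma:medium="acoustic" emma:mode="voice"
                       emma:confidence="0.8"
                       emma:tokens="flights from boston to denver">
    <!-- application-specific semantic result -->
    <origin>Boston</origin>
    <destination>Denver</destination>
  </emma:interpretation>
</emma:emma>
```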
Fusion (integration) of combined input. Integrating multimodal input. "Show me ..."
(Diagram: processing pipeline Recognition → Interpretation → Integration. Speech input is recognized against a grammar with semantic interpretation and produces an EMMA speech interpretation; ink input produces an EMMA ink interpretation; the integration manager fuses both EMMA documents and hands the result to the interaction manager.)
Integrating multimodal input. "Show me ..."
(Diagram: Speech Input and Stylus Input are combined in the Integration step.)
Integrating multimodal input. "Show me ..."
JavaScript library to demonstrate fusion of multimodal input (aka "integration").
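The library itself is not reproduced here; the following is a minimal, self-contained sketch of the underlying idea (all names are hypothetical, not the actual library's API): a partially filled speech interpretation is merged with a pointing interpretation when the two inputs occurred close enough together in time.

```javascript
// Minimal sketch of multimodal fusion ("integration"): fill unresolved
// slots of a speech interpretation from a pointing/ink interpretation,
// but only if the two inputs are close enough in time.
function integrate(speech, pointing, maxGapMs) {
  var gap = Math.abs(speech.timestamp - pointing.timestamp);
  if (gap > maxGapMs) return null; // inputs too far apart: no fusion
  var result = {};
  for (var slot in speech.slots) {
    // a deictic placeholder ("this", "here") is resolved by the pointing input
    result[slot] = speech.slots[slot] === "?" ? pointing.value : speech.slots[slot];
  }
  return result;
}

// "Show me flights to <this city>" + a stylus tap on Denver
var fused = integrate(
  { timestamp: 1000, slots: { action: "showFlights", destination: "?" } },
  { timestamp: 1400, value: "Denver" },
  2000
);
// fused: { action: "showFlights", destination: "Denver" }
```

A real integration manager would of course consume EMMA documents and handle confidence scores; the time-window check above is the essential fusion heuristic.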
W3C Multimodal Interaction Working Group. InkML 1.0.
Representation of ink traces (pen/stylus input).
InkML may be carried within an EMMA document.
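A minimal InkML document representing two pen traces (the coordinates are invented for illustration):

```xml
<ink xmlns="http://www.w3.org/2003/InkML">
  <!-- each trace is a sequence of sampled pen points, here "x y" pairs -->
  <trace>10 0, 9 14, 8 28, 7 42, 6 56</trace>
  <trace>30 0, 28 15, 27 30, 25 45</trace>
</ink>
```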
W3C Multimodal Interaction Working Group. EmotionML 1.0.
Annotation of material involving emotionality, such as videos, speech recordings, faces, texts, etc.
Automatic recognition of emotions from sensors, including physiological sensors and speech recordings.
Generation of emotion-related system responses.
http://www.w3.org/TR/emotionml/
Example: Annotation of emotion, enclosed in an EMMA document
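Such a combination might look roughly like this (category name, confidence and vocabulary are illustrative):

```xml
<emma:emma version="1.0" xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="emo1" emma:medium="acoustic" emma:mode="voice">
    <!-- EmotionML annotation of the recognized emotional state -->
    <emotion xmlns="http://www.w3.org/2009/10/emotionml"
             category-set="http://www.w3.org/TR/emotion-voc/xml#big6">
      <category name="anger" confidence="0.8"/>
    </emotion>
  </emma:interpretation>
</emma:emma>
```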
Related work within W3C. W3C Multimodal Interaction Working Group. Voice Modality Component (VoiceXML).
W3C Voice Browser Working Group.
Existing standards: VoiceXML 2.1, CCXML 1.0, SRGS 1.0 and SISR 1.0, PLS 1.0 and SSML 1.0.
Work in progress: SCXML 1.0, VoiceXML 3.0 (external eventing, new features such as speaker verification, fine-grained control of the FIA by embedding SCXML in VoiceXML or VoiceXML in SCXML).
* Note: VoiceXML 3 syntax not yet finalized, example shows principle.
W3C Multimodal Interaction Working Group. VoiceXML 2.0/2.1 – example.
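The original slide's example is not preserved in this text; a typical VoiceXML 2.1 form-filling dialog (grammar file name and prompts invented) looks like this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <form id="flight">
    <field name="city">
      <prompt>Which city do you want to fly to?</prompt>
      <!-- SRGS grammar constraining the speech recognizer -->
      <grammar src="cities.grxml" type="application/srgs+xml"/>
      <filled>
        <prompt>Flying to <value expr="city"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```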
W3C Multimodal Architecture. State Chart XML (SCXML).
"State Chart XML (SCXML): State Machine Notation for Control Abstraction" (Working Draft status).
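A minimal SCXML state machine (state and event names invented) showing the basic notation:

```xml
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="idle">
  <state id="idle">
    <!-- an external event moves the machine to the next state -->
    <transition event="user.start" target="listening"/>
  </state>
  <state id="listening">
    <transition event="recognition.done" target="idle"/>
  </state>
</scxml>
```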
W3C Multimodal Interaction Working Group. Voice Modality Component (VoiceXML 3.0).
* Note: VoiceXML 3 syntax not yet finalized, example shows principle.
W3C Multimodal Architecture. SCXML Editor (Eclipse Plugin).
Eclipse plugin, developed as Diploma thesis by Dennis Biber.
W3C Multimodal Architecture. SCXML implementations.
Apache Commons SCXML Java implementation http://commons.apache.org/scxml/
Various voice platform providers, such as Voxeo (Prophecy voice platform), already have early SCXML support.
JavaScript interpreter SCXML-js: http://blog.echo-flow.com/tag/scxml-js/
Synergy SCXML lab: http://www.ling.gu.se/~lager/Labs/SCXML-Lab/
C# library http://www.nordover.de
W3C Multimodal Architecture. SCXML used for a gesture control system.
Gesture control system for “Entertain” based on Microsoft Kinect, developed as Diploma thesis by Alexander Lübeck.
(Diagram: gesture recognition library (NITE) → SCXML interpreter (using the Nordover library) → HTTP interface.)
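A hypothetical fragment of how such a gesture dialog could be modeled in SCXML (event names and target URL are invented for illustration, not taken from the actual system):

```xml
<scxml xmlns="http://www.w3.org/2005/07/scxml" version="1.0" initial="disengaged">
  <state id="disengaged">
    <!-- a wave gesture from the recognizer engages gesture control -->
    <transition event="gesture.wave" target="engaged"/>
  </state>
  <state id="engaged">
    <transition event="gesture.swipeLeft" target="engaged">
      <!-- notify the TV application, e.g. over the HTTP interface -->
      <send event="channel.next" target="http://localhost/entertain"/>
    </transition>
    <transition event="gesture.timeout" target="disengaged"/>
  </state>
</scxml>
```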
W3C HTML Speech API (Google proposal).
DCCI and Delivery Context Ontology.
Delivery Context Client Interface (DCCI): provides web applications access to a hierarchy of dynamic properties representing device capabilities, configurations, user preferences and environmental conditions. http://www.w3.org/TR/DPF/
Delivery Context Ontology provides a formal model of the characteristics of the environment (using OWL)
W3C Device APIs Working Group. Useful APIs for multimodal applications.
Sensor API Generic API to access device sensors (from HTML) http://dev.w3.org/2009/dap/system-info/Sensors.html
Battery status API Retrieve information about the battery status of a (mobile) device (from HTML) http://www.w3.org/TR/battery-status/
Vibration API Provide access to vibration mechanism of the hosting device (from HTML) http://www.w3.org/TR/vibration/
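From JavaScript, the Vibration API boils down to a single call; a small sketch (the pattern helper is a hypothetical convenience, only navigator.vibrate() itself is from the spec):

```javascript
// Build an [on, off, on, off, ...] millisecond pattern for navigator.vibrate().
// This helper is illustrative; only navigator.vibrate() is standard.
function makePattern(pulses, onMs, offMs) {
  var pattern = [];
  for (var i = 0; i < pulses; i++) {
    pattern.push(onMs);
    if (i < pulses - 1) pattern.push(offMs); // no trailing pause needed
  }
  return pattern;
}

// Guarded call: vibration is only available in supporting (mobile) browsers.
if (typeof navigator !== "undefined" && navigator.vibrate) {
  navigator.vibrate(makePattern(3, 200, 100)); // three 200 ms pulses
}
```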
Deutsche Telekom Laboratories 29 W3C WebApps Working Group. Useful APIs for multimodal applications.
Geolocation API Get current geo location (longitude, latitude, altitude) from HTML Independent from location provider (GPS, WiFi, Cell-Id, ...) http://www.w3.org/TR/geolocation-API/
Orientation API Get current device orientation (e.g. tilt) from HTML http://www.w3.org/TR/orientation-event/ iPhone example: „Move the ball“ http://ad.ag/wjmtgt
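The Geolocation API above is available to plain JavaScript in the browser; a minimal sketch (the formatting helper is hypothetical, getCurrentPosition() is from the spec):

```javascript
// Format a Geolocation position object for display (illustrative helper).
function formatPosition(pos) {
  var c = pos.coords;
  return "lat " + c.latitude.toFixed(4) + ", lon " + c.longitude.toFixed(4);
}

// Guarded call: only runs in a browser that supports geolocation.
if (typeof navigator !== "undefined" && navigator.geolocation) {
  navigator.geolocation.getCurrentPosition(
    function (pos) { console.log(formatPosition(pos)); },            // success
    function (err) { console.log("no position: " + err.message); }  // error
  );
}
```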