W3C Multimodal Interaction Working Group
Multimodal Interaction
Ingmar Kliche, January 30, 2012
Life is for sharing.
Deutsche Telekom Laboratories

Multimodal interaction – scope definition
- Alternate or combined input and output modalities.
- Allow users to dynamically select the most appropriate mode of interaction.
- Context dependency (e.g. battery status, location, calendar, ...).
- Adaptive to user preferences.
- Adaptive to device properties.
- Example used throughout this talk: composite input
  - "I want to fly from <here> to <there>" while tapping on a map
  - "Show me restaurants in <this> area" while drawing a circle on a map

W3C Multimodal Interaction Working Group – Overview
- Launched 2002.
- 24 member organizations (Deutsche Telekom, France Telecom, Microsoft, Loquendo, Genesys, Voxeo, Openstream, ...), 42 members.
- Mission: "... to extend the Web to allow users to dynamically select the most appropriate mode of interaction for their current needs ..."
- Deliverables:
  - Multimodal Architecture and Interfaces specification
  - EMMA
  - InkML
  - EmotionML
- http://www.w3.org/2002/mmi

Multimodal Architecture and Interfaces
- Initial proposal: XHTML+Voice (X+V), http://www.w3.org/TR/xhtml+voice/
  - Problem: tight coupling of HTML and VoiceXML (using XML Events), no support for other modalities.
- The MMI working group developed a "Multimodal Interaction Framework" (published 2003).
- "Multimodal Architecture and Interfaces" specification (currently a Candidate Recommendation), http://www.w3.org/TR/mmi-arch
  - Loosely coupled architecture.
  - Allows for co-resident or distributed implementations.
  - Leverages existing W3C standards (HTML, SVG, ...).
"Multimodal Architecture and Interfaces" specification
- The Runtime Framework provides the basic infrastructure and controls communication among the constituents.
- The Interaction Manager (IM) coordinates the Modality Components (MCs) by means of life-cycle events.
- The Data Component contains the shared data (context).
- Communication between the IM and the MCs is event-based, via the Modality Component API.
- http://www.w3.org/TR/mmi-arch
[Diagram: Runtime Framework containing the Delivery Context, Interaction Manager and Data Component; Modality Components 1..N attach via the Modality Component API.]

"Multimodal Architecture and Interfaces" – MMI lifecycle events
- newContextRequest/Response
- startRequest/Response
- cancelRequest/Response
- prepareRequest/Response
- statusRequest/Response
- pauseRequest/Response
- resumeRequest/Response
- doneNotification
- extensionNotification
An XML schema is defined for the MMI lifecycle events.
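The lifecycle events are exchanged as plain XML messages between the Interaction Manager and the Modality Components. The sketch below builds a startRequest event with Python's ElementTree; the namespace URI and the attribute names (Source, Target, Context, RequestID) are assumptions recalled from the MMI Architecture specification and should be checked against http://www.w3.org/TR/mmi-arch.

```python
# Sketch of an MMI startRequest lifecycle event, as an Interaction Manager
# might send it to one Modality Component. Namespace URI and attribute
# names are assumptions; verify against the MMI Architecture spec.
import xml.etree.ElementTree as ET

MMI_NS = "http://www.w3.org/2008/04/mmi-arch"  # assumed namespace URI
ET.register_namespace("mmi", MMI_NS)

def start_request(source, target, context, request_id):
    """Build a StartRequest event addressed to one Modality Component."""
    root = ET.Element("{%s}mmi" % MMI_NS, {"version": "1.0"})
    ET.SubElement(root, "{%s}StartRequest" % MMI_NS, {
        "Source": source,         # URI of the Interaction Manager
        "Target": target,         # URI of the Modality Component
        "Context": context,       # shared context identifier
        "RequestID": request_id,  # correlates request and response
    })
    return ET.tostring(root, encoding="unicode")

msg = start_request("urn:im", "urn:voice-mc", "ctx-1", "req-42")
print(msg)
```

The Modality Component would answer with a matching startResponse carrying the same RequestID, which is how requests and responses are correlated.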
Representing user input – EMMA
EMMA 1.0
Goal: annotation/representation of user input.
Example: user utterance "flights from Boston to Denver"

<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2003/04/emma
        http://www.w3.org/TR/2009/REC-emma-20090210/emma.xsd"
    xmlns="http://www.example.com/example">
  <emma:one-of id="r1" emma:start="1087995961542" emma:end="1087995963542"
      emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75"
        emma:tokens="flights from boston to denver">
      <origin>Boston</origin>
      <destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68"
        emma:tokens="flights from austin to denver">
      <origin>Austin</origin>
      <destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>

Fusion (integration) of combined input

Integrating multimodal input: "Show me <this>"
Pipeline: Recognition -> Interpretation -> Integration. Speech is recognized against a grammar and given a semantic interpretation; ink is interpreted separately. Both produce EMMA documents, which the integration manager fuses and passes on to the interaction manager.

Speech interpretation (the <location> slot is marked emma:hook="ink", to be filled from the ink input):

<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="speech">
    <command>show</command>
    <location emma:hook="ink">
      <type>point</type>
    </location>
  </emma:interpretation>
</emma:emma>

Ink interpretation:

<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="ink">
    <location>
      <type>point</type>
      <x>17</x>
      <y>54</y>
    </location>
  </emma:interpretation>
</emma:emma>
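A consumer of the "flights from Boston to Denver" N-best list typically picks the interpretation with the highest emma:confidence. A minimal Python sketch, using only the namespaces and element names from the example above (the selection logic itself is illustrative, not mandated by EMMA):

```python
# Pick the most confident interpretation from an EMMA <emma:one-of> list.
import xml.etree.ElementTree as ET

EMMA_NS = "http://www.w3.org/2003/04/emma"
APP_NS = "http://www.example.com/example"  # application namespace from the example

emma_doc = """<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.example.com/example">
  <emma:one-of id="r1" emma:medium="acoustic" emma:mode="voice">
    <emma:interpretation id="int1" emma:confidence="0.75">
      <origin>Boston</origin><destination>Denver</destination>
    </emma:interpretation>
    <emma:interpretation id="int2" emma:confidence="0.68">
      <origin>Austin</origin><destination>Denver</destination>
    </emma:interpretation>
  </emma:one-of>
</emma:emma>"""

def best_interpretation(doc):
    """Return the <emma:interpretation> with the highest emma:confidence."""
    root = ET.fromstring(doc)
    interps = root.findall(".//{%s}interpretation" % EMMA_NS)
    return max(interps, key=lambda i: float(i.get("{%s}confidence" % EMMA_NS)))

best = best_interpretation(emma_doc)
print(best.find("{%s}origin" % APP_NS).text)  # Boston
```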
Integrating multimodal input: "Show me <this>"

Speech input:

<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="speech">
    <command>show</command>
    <location emma:hook="ink">
      <type>point</type>
    </location>
  </emma:interpretation>
</emma:emma>

Stylus input:

<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="ink">
    <location>
      <type>point</type>
      <x>17</x>
      <y>54</y>
    </location>
  </emma:interpretation>
</emma:emma>

Integration result:

<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma">
  <emma:interpretation id="int1" emma:mode="multimodal">
    <command>show</command>
    <location>
      <type>point</type>
      <x>17</x>
      <y>54</y>
    </location>
  </emma:interpretation>
</emma:emma>

A JavaScript library demonstrates this fusion of multimodal input (aka "integration").
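The integration step can be sketched in a few lines of Python: each element in the speech interpretation marked emma:hook="ink" is replaced by the element with the same tag from the ink interpretation, and the result is relabeled emma:mode="multimodal". This merge strategy is an illustration of the idea, not a normative algorithm from the EMMA specification.

```python
# Sketch of fusing a speech interpretation (with an emma:hook="ink" slot)
# and an ink interpretation into one multimodal EMMA interpretation.
import copy
import xml.etree.ElementTree as ET

EMMA = "http://www.w3.org/2003/04/emma"

speech = ET.fromstring(
    '<emma:interpretation xmlns:emma="%s" id="int1" emma:mode="speech">'
    '<command>show</command>'
    '<location emma:hook="ink"><type>point</type></location>'
    '</emma:interpretation>' % EMMA)

ink = ET.fromstring(
    '<emma:interpretation xmlns:emma="%s" id="int1" emma:mode="ink">'
    '<location><type>point</type><x>17</x><y>54</y></location>'
    '</emma:interpretation>' % EMMA)

def integrate(speech, ink):
    """Fill every emma:hook="ink" slot from the ink interpretation."""
    fused = copy.deepcopy(speech)
    fused.set("{%s}mode" % EMMA, "multimodal")
    for i, child in enumerate(list(fused)):
        if child.get("{%s}hook" % EMMA) == "ink":
            filler = ink.find(child.tag)  # ink element with the same tag
            if filler is not None:
                fused.remove(child)
                fused.insert(i, copy.deepcopy(filler))
    return fused

fused = integrate(speech, ink)
print(fused.find("location/x").text, fused.find("location/y").text)  # 17 54
```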
W3C Multimodal Interaction Working Group
InkML 1.0
Representation of ink traces. InkML may be carried within EMMA.

<ink xmlns="http://www.w3.org/2003/InkML">
  <trace>
    10 0, 9 14, 8 28, 7 42, 6 56, 6 70, 8 84, 8 98, 8 112, 9 126, 10 140,
    13 154, 14 168, 17 182, 18 188, 23 174, 30 160, 38 147, 49 135, 58 124,
    72 121, 77 135, 80 149, 82 163, 84 177, 87 191, 93 205
  </trace>
  <trace>
    130 155, 144 159, 158 160, 170 154, 179 143, 179 129, 166 125, 152 128,
    140 136, 131 149, 126 163, 124 177, 128 190, 137 200, 150 208, 163 210,
    178 208, 192 201, 205 192, 214 180
  </trace>
  <trace>
    227 50, 226 64, 225 78, 227 92, 228 106, 228 120, 229 134, 230 148,
    234 162, 235 176, 238 190, 241 204
  </trace>
  <trace>
    282 45, 281 59, 284 73, 285 87, 287 101, 288 115, 290 129, 291 143,
    294 157, 294 171, 294 185, 296 199, 300 213
  </trace>
  <trace>
    366 130, 359 143, 354 157, 349 171, 352 185, 359 197, 371 204, 385 205,
    398 202, 408 191, 413 177, 413 163, 405 150, 392 143, 378 141, 365 150
  </trace>
</ink>

EmotionML 1.0
Use cases:
- Annotation of material involving emotionality, such as annotation of videos, speech recordings, faces, texts, etc.
- Automatic recognition of emotions from sensors, including physiological sensors and speech recordings.
- Generation of emotion-related system responses.
http://www.w3.org/TR/emotionml/

Example: annotation of emotion, enclosed in an EMMA document

<emma:emma version="1.0"
    xmlns:emma="http://www.w3.org/2003/04/emma"
    xmlns="http://www.w3.org/2009/10/emotionml">
  <emma:interpretation emma:start="12457990" emma:end="12457995"
      emma:mode="voice" emma:verbal="false">
    <emotion category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
      <category name="bored" value="0.1" confidence="0.1"/>
    </emotion>
  </emma:interpretation>
</emma:emma>
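The whitespace-and-comma syntax of the InkML <trace> element shown above is straightforward to parse. A small Python sketch, which handles only the plain integer-coordinate form used in the example (InkML also allows richer channel definitions not covered here):

```python
# Parse InkML <trace> elements into lists of (x, y) integer points.
import xml.etree.ElementTree as ET

INKML = "http://www.w3.org/2003/InkML"

ink_doc = """<ink xmlns="http://www.w3.org/2003/InkML">
  <trace> 10 0, 9 14, 8 28 </trace>
  <trace> 130 155, 144 159 </trace>
</ink>"""

def parse_traces(doc):
    """Return one list of (x, y) tuples per <trace> element."""
    root = ET.fromstring(doc)
    traces = []
    for trace in root.findall("{%s}trace" % INKML):
        points = []
        for sample in trace.text.split(","):  # samples are comma-separated
            x, y = sample.split()             # each sample is "x y"
            points.append((int(x), int(y)))
        traces.append(points)
    return traces

traces = parse_traces(ink_doc)
print(len(traces), traces[0][0])  # 2 (10, 0)
```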
Related work within W3C – Voice Modality Component (VoiceXML)
W3C Voice Browser Working Group
Existing standards:
- VoiceXML 2.1
- CCXML 1.0
- SRGS 1.0 and SISR 1.0
- PLS 1.0 and SSML 1.0
Work in progress:
- SCXML 1.0
- VoiceXML 3.0
  - External eventing
  - New features (e.g. speaker verification)
  - Fine-grained control of the FIA (Form Interpretation Algorithm) by embedding SCXML in VoiceXML or VoiceXML in SCXML
Note: VoiceXML 3 syntax is not yet finalized; the example shows the principle.

VoiceXML 2.0/2.1 – example

Sample dialog:
C(omputer): Welcome home. Say one of: sports; weather; Stargazer astrophysics news.
H(uman): Astrology.
C: I did not understand what you said. (a platform-specific default message.)
C: Welcome home. Say one of: sports; weather; Stargazer

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0"
    xmlns="http://www.w3.org/2001/vxml">
  <menu>
    <prompt>
      Welcome home. Say one of: <enumerate/>
    </prompt>
    <choice next="start.vxml"> Sports </choice>
    <choice next="intro.vxml"> Weather </choice>
    <choice next="astronews.vxml"> Stargazer astrophysics