D5.1 ANALYSIS: Multi-User, Multimodal & Context Aware Value Added Services

Deliverable 5.1 Project Title Next-Generation Hybrid Broadcast Broadband Project Acronym HBB-NEXT Call Identifier FP7-ICT-2011-7 Starting Date 01.10.2011 End Date 31.03.2014 Contract no. 287848 Deliverable no. 5.1 Deliverable Name ANALYSIS: Multi-User, Multimodal & Context Aware Value Added Services Work package 5 Nature Report Dissemination Public Authors Oskar van Deventer (TNO), Mark Gülbahar (IRT), Sebastian Schumann, Radovan Kadlic (ST), Gregor Rozinaj, Ivan Minarik (STUBA), Joost de Wit (TNO), Christian Überall, Christian Köbel (THM), Contributors Jennifer Müller (RBB), Jozef Bán, Marián Beniak, Matej Féder, Juraj Kačur, Anna Kondelová, Luboš Omelina, Miloš Oravec, Jarmila Pavlovičová, Ján Tóth, Martin Turi Nagy, Miloslav Valčo, Mário Varga, Matúš Vasek (STUBA) Due Date 30.03.2012 Actual Delivery Date 12.04.2012 HBB-NEXT I D5.1 ANALYSIS: Multi-User, Multimodal & Context Aware Value Added Services Table of Contents 1. General introduction ....................................................................................................... 3 2. Multimodal interface for user/group-aware personalisation in a multi-user environment . 6 2.1. Outline ........................................................................................................................... 6 2.2. Problem statement ......................................................................................................... 6 2.3. Gesture recognition ........................................................................................................ 7 2.3.1. Gesture taxonomies .................................................................................................................. 8 2.3.2. Feature extraction methods ..................................................................................................... 9 2.3.2.1. Methods employing data gloves ............................................................................................................. 9 2.3.2.2. Vision based methods ............................................................................................................................. 9 2.3.2.3. Hidden Markov Models ........................................................................................................................... 9 2.3.2.4. Particle Filtering .................................................................................................................................... 10 2.3.2.5. Condensation algorithm ........................................................................................................................ 11 2.3.2.6. Finite State Machine Approach ............................................................................................................. 11 2.3.2.7. Soft Computing and Connectionist Approach ....................................................................................... 12 2.4. Face recognition ............................................................................................................ 12 2.4.1. Theoretical background .......................................................................................................... 13 2.4.2. Team Expertise ........................................................................................................................ 18 2.4.2.1. Skin Colour Analysis............................................................................................................................... 18 2.4.2.2. Gabor Wavelet approach ...................................................................................................................... 18 2.4.2.3. ASM and SVM utilization ....................................................................................................................... 19 2.5. Multi-speaker identification .......................................................................................... 19 2.5.1. Theoretical background .......................................................................................................... 20 2.5.2. Team Expertise background.................................................................................................... 29 2.6. Speech recognition ........................................................................................................ 29 2.6.1. Theoretical background .......................................................................................................... 30 2.6.2. Speech feature extraction methods for speech recognition .................................................. 31 2.6.2.1. HMM overview ...................................................................................................................................... 33 2.6.2.2. HMM variations and modifications ....................................................................................................... 35 2.6.3. Team Expertise background.................................................................................................... 35 2.7. Speech synthesis ........................................................................................................... 36 2.7.1. Available solutions .................................................................................................................. 37 2.7.1.1. Formant Speech Synthesis .................................................................................................................... 38 2.7.1.2. Articulatory Speech Synthesis ............................................................................................................... 38 2.7.1.3. HMM-based Speech Synthesis .............................................................................................................. 39 2.7.1.4. Concatenative Speech Synthesis ........................................................................................................... 39 2.7.1.5. MBROLA ................................................................................................................................................ 40 2.7.2. Speech synthesis consortium background .............................................................................. 40 2.8. APIs for multimodal interfaces ...................................................................................... 44 2.8.1. Gesture recognition projects, overview ................................................................................. 44 2.8.2. Gesture Recognition Projects, Microsoft Kinect in-depth description ................................... 46 2.8.2.1. Projects .................................................................................................................................................. 49 2.8.2.2. Standards .............................................................................................................................................. 50 2.9. Gaps analysis ................................................................................................................ 57 2.9.1. Gesture recognition ................................................................................................................ 57 2.9.2. Face detection ......................................................................................................................... 58 2.9.2.1. ASM, AAM: Gaps Analysis ..................................................................................................................... 59 2.9.3. Multi speaker recognition ....................................................................................................... 60 HBB-NEXT Consortium 2012 Page 1 HBB-NEXT I D5.1 ANALYSIS: Multi-User, Multimodal & Context Aware Value Added Services 2.9.4. Speech recognition ................................................................................................................. 61 2.9.5. Speech synthesis ..................................................................................................................... 63 3. Context-aware and multi-user content recommendation ............................................... 66 3.1. Outline ......................................................................................................................... 66 3.2. Problem statement ....................................................................................................... 66 3.3. Filtering types ............................................................................................................... 70 3.3.1. Content-based filtering ........................................................................................................... 70 3.3.2. Collaborative filtering ............................................................................................................. 72 3.4. User profiles ................................................................................................................. 75 3.4.1. Implicit and explicit feedback ................................................................................................. 76 3.4.2. User profiles in TNO’s Personal Recommendation Engine Framework .................................. 76 3.5. Presentation of the recommendations ........................................................................... 78 3.6. Evaluation/validation of the recommendation ............................................................... 79 3.6.1. Accuracy .................................................................................................................................

D5.1 ANALYSIS: Multi-User, Multimodal & Context Aware Value Added Services

Rečové Interaktívne Komunikačné Systémy

Commercial Tools in Speech Synthesis Technology

A Framework for Intelligent Voice-Enabled E-Education Systems

Speechtek 2008 Final Program

A Tooi to Support Speech and Non-Speech Audio Feedback Generation in Audio Interfaces

Speech Recognition and Synthesis

Voicexml (VXML) 2.0

IBM Websphere Voice Systems Solutions

Really Useful

Design, Implementation and Evaluation of a Voice Controlled Information Platform Applied in Ships Inspection

W3C Looks to Improve Speech Recognition Technology for Web Transactions 10 December 2005

AN EXTENSIBLE TRANSCODER for HTML to VOICEXML CONVERSION APPROVED by SUPERVISORY COMMITTEE: Supervisor