
Automatic Speech Recognition for the Hearing Impaired in an Augmented Reality Application

Anja Virkkunen

School of Science

Thesis submitted for examination for the degree of Master of Science in Technology.
Espoo 19.11.2018

Supervisor: Prof. Mikko Kurimo
Advisors: D.Sc. Kalle Palomäki, M.Sc. Juri Lukkarila

Copyright © 2018 Anja Virkkunen


Aalto University, P.O. BOX 11000, 00076 AALTO, www.aalto.fi
Abstract of the master's thesis

Author: Anja Virkkunen
Title: Automatic Speech Recognition for the Hearing Impaired in an Augmented Reality Application
Degree programme: Computer, Communication and Information Sciences
Major: Machine Learning, Data Science and Artificial Intelligence
Code of major: SCI3044
Supervisor: Prof. Mikko Kurimo
Advisors: D.Sc. Kalle Palomäki, M.Sc. Juri Lukkarila
Date: 19.11.2018
Number of pages: 90+11
Language: English

Abstract

People with hearing loss experience considerable difficulties in participating in and understanding spoken communication, which has negative effects on many aspects of their lives. In many proposed solutions to the problem, the deaf or hard of hearing person has to take their attention away from the speaker. As a consequence, the hearing impaired miss, for instance, the gestures and expressions of the speaker. This thesis studied the use of augmented reality and automatic speech recognition technologies in an assistive mobile application for the hearing impaired. The application uses mobile augmented reality with video-based augmentations. Automatic speech recognition is done using modern neural network models. In the implementation, automatic speech recogniser transcriptions were placed in speech bubbles on top of an augmented reality view of the conversation partner. This minimised the distance between the speaker and the transcriptions, which helps the hearing impaired follow the conversation. To validate the usefulness of the approach, user tests were organised with hearing impaired participants. The results show that the deaf and hard of hearing found the augmented reality view and the application helpful for following conversations. The improvements most requested by the test users were support for visual separation and identification of speakers in group conversations and higher speech recognition accuracy.

Keywords: augmented reality, speech recognition, mobile development, user testing, hearing impairment, assistive technology


Aalto University, P.O. BOX 11000, 00076 AALTO, www.aalto.fi
Abstract of the master's thesis (in Finnish)

Author: Anja Virkkunen
Title: Automaattinen puheentunnistus kuuroille ja huonokuuloisille lisätyn todellisuuden sovelluksessa
Degree programme: Computer, Communication and Information Sciences
Major: Machine Learning and Data Mining
Code of major: SCI3044
Supervisor: Prof. Mikko Kurimo
Advisors: D.Sc. Kalle Palomäki, M.Sc. Juri Lukkarila
Date: 19.11.2018
Number of pages: 90+11
Language: English

Abstract (in Finnish)

Hard of hearing and deaf people have considerable difficulties in participating in and understanding conversations, which lowers their quality of life in many ways. In a large part of the solutions offered to this problem, the deaf and hard of hearing have to shift their attention away from the speaker. As a result, the hearing impaired person does not see, for example, the speaker's gestures and expressions. This thesis studied the use of augmented reality and automatic speech recognition in an assistive application intended for the hard of hearing and deaf. The application uses video- and mobile-based augmented reality.
Speech recognition makes use of modern neural network models. In the implementation, the automatic speech recognition results were placed in speech bubbles close to the face of the speaker shown in the video image. This way, a deaf or hard of hearing user could easily follow both the speaker and the speech recognition results. The usefulness of the application was evaluated by organising user tests with the deaf and hard of hearing. Based on the results, the hard of hearing and deaf felt that augmented reality and the application helped them follow conversations. The improvements most requested by the test users were the visual separation of different speakers' speech recognition results from one another and better speech recognition accuracy.

Keywords: augmented reality, speech recognition, mobile development, user testing, hearing impairment, assistive technology


Preface

First, I would like to thank my supervisor, Prof. Mikko Kurimo, for his feedback, patience and understanding during the whole process. I would like to thank my advisors, D.Sc. Kalle Palomäki and M.Sc. Juri Lukkarila, for letting me work on this project and for all they have done to help me with this work. I am extremely lucky to have had two people to guide me and give me new perspectives. I am grateful to all the deaf and hard of hearing who participated in the user tests and gave valuable feedback, and to the people from Kuuloliitto ry for their help in recruiting participants for the tests. I would like to thank Katri Leino, Juho Leinonen, Peter Smit, Tuomas Kaseva, Zhicun Xu, Mittul Singh, Aku Rouhe and Reima Karhila from the Speech Recognition group for general help, ideas, conversations and good company. Additionally, I would like to thank Tarmo Simonen and Aleksi Öyry from Aalto University for providing me with all the necessary equipment. I am thankful to the Speech Recognition research group of the Department of Signal Processing and Acoustics at the Aalto University School of Electrical Engineering and to the Academy of Finland for supporting and enabling this thesis as part of the project Conversation Assistant for the Hearing Impaired. Last but not least, I would like to express my gratitude to Jarkko and my family for believing in me, cheering me on and supporting me in the moments of despair until the very end.

Otaniemi, 19.11.2018
Anja Virkkunen


Contents

Abstract
Abstract (in Finnish)
Preface
Contents
Symbols and abbreviations
1 Introduction
   1.1 Augmented reality
   1.2 Conversation assistant
   1.3 Research goals
   1.4 Thesis structure
2 Background
   2.1 Augmented reality
      2.1.1 Definition
      2.1.2 Human factors
      2.1.3 Technology
      2.1.4 Past and present AR systems
      2.1.5 Challenges for AR usage and adoption
   2.2 Automatic speech recognition
      2.2.1 The structure of an ASR system
      2.2.2 Conversational speech
   2.3 Hearing impairment
      2.3.1 Hearing loss types
      2.3.2 Diagnosis and treatment
      2.3.3 Societal effects
   2.4 Visual dispersion
   2.5 Previous work
3 Augmented Reality Conversation Assistant
   3.1 Description
   3.2 System structure
   3.3 Software
      3.3.1 iOS application
      3.3.2 ASR model and server
4 User testing
   4.1 Objectives
   4.2 Test plan
      4.2.1 Introduction
      4.2.2 Getting to know the application
      4.2.3 Section 1: Word explanation
      4.2.4 Section 2: Conversation
      4.2.5 Conclusion
   4.3 Questionnaire
   4.4 Arrangements
      4.4.1 Test environment setup
      4.4.2 Participants
5 Results
   5.1 Word explanation and conversation tasks
   5.2 Final ratings
   5.3 Written feedback
   5.4 Discussion on results
6 Conclusions
References
A Questionnaire form
B Questionnaire answers


Symbols and abbreviations

Abbreviations
AR     Augmented reality
ASR    Automatic speech recognition
AV     Augmented virtuality
AVSR   Audiovisual speech recognition
CE     Central executive
FOV    Field-of-view
GMM    Gaussian Mixture Model
GPS    Global positioning system
HMD    Head-mounted display
HMM    Hidden Markov Model
IT     Information technology
LSTM   Long short-term memory
MFCC   Mel frequency cepstral coefficient
MR     Mixed reality
MVC    Model-view-controller
OST    Optical see-through
RSD    Retinal scanning display
TDNN   Time-delay neural network
TLS    Transport Layer Security
TTS    Text-to-Speech
UI     User interface
VR     Virtual reality
VST    Video see-through
WM     Working memory


1 Introduction

Hearing loss in all of its forms can significantly limit access to auditory information and the ability to communicate. Participating in conversations and social life becomes a struggle, especially in noisy environments, causing exhaustion, withdrawal and a poorer quality of life [1, 2, 3]. Working life and education are affected as well, with the hearing impaired often quitting school and working life earlier than their hearing peers [4, 5, 6, 7]. Different studies estimate that 10 to 20 percent of the population have hearing loss [8, 9], which makes it a considerable societal issue in addition to complicating individual lives. Moreover, hearing loss has the highest prevalence in the elderly population, so the number of hearing impaired is expected to grow as the population ages.