
Automatic Speech Recognition for the Hearing Impaired in an Augmented Reality Application

Anja Virkkunen

School of Science

Thesis submitted for examination for the degree of Master of Science in Technology.
Espoo 19.11.2018

Supervisor: Prof. Mikko Kurimo
Advisors: D.Sc. Kalle Palomäki, M.Sc. Juri Lukkarila

Copyright © 2018 Anja Virkkunen


Aalto University, P.O. BOX 11000, 00076 AALTO, www.aalto.fi
Abstract of the master's thesis

Author: Anja Virkkunen
Title: Automatic Speech Recognition for the Hearing Impaired in an Augmented Reality Application
Degree programme: Computer, Communication and Information Sciences
Major: Machine Learning, Data Science and Artificial Intelligence
Code of major: SCI3044
Supervisor: Prof. Mikko Kurimo
Advisors: D.Sc. Kalle Palomäki, M.Sc. Juri Lukkarila
Date: 19.11.2018
Number of pages: 90+11
Language: English

Abstract

People with hearing loss experience considerable difficulties in participating in and understanding spoken communication, which has negative effects on many aspects of their lives. In many proposed solutions to the problem, the deaf or hard of hearing person has to take their attention away from the speaker. As a consequence, the hearing impaired miss, for instance, the gestures and expressions of the speaker. This thesis studied the use of augmented reality and automatic speech recognition technologies in an assistive mobile application for the hearing impaired. The application uses mobile augmented reality with video-based augmentations. Automatic speech recognition is done using modern neural network models. In the implementation, automatic speech recogniser transcriptions were placed in speech bubbles on top of an augmented reality view of the conversation partner. This minimised the distance between the speaker and the transcriptions, which helps the hearing impaired follow the conversation. To validate the usefulness of the approach, user tests were organised with hearing impaired participants. The results show that the deaf and hard of hearing found the augmented reality view and the application helpful for following conversations. The improvements most requested by the test users were support for visual separation and identification of speakers in group conversations and higher speech recognition accuracy.

Keywords: augmented reality, speech recognition, mobile development, user testing, hearing impairment, assistive technology


Aalto University, P.O. BOX 11000, 00076 AALTO, www.aalto.fi
Abstract of the master's thesis (in Finnish)

Author: Anja Virkkunen
Title: Automaattinen puheentunnistus kuuroille ja huonokuuloisille lisätyn todellisuuden sovelluksessa
Degree programme: Computer, Communication and Information Sciences
Major: Machine Learning and Data Mining
Code of major: SCI3044
Supervisor: Prof. Mikko Kurimo
Advisors: D.Sc. Kalle Palomäki, M.Sc. Juri Lukkarila
Date: 19.11.2018
Number of pages: 90+11
Language: English

Abstract (in Finnish)

Hard of hearing and deaf people have considerable difficulties in participating in and understanding conversations, which lowers their quality of life in many ways. In a large part of the solutions offered to this problem, the deaf and hard of hearing have to shift their attention away from the speaker. As a result, the hearing impaired person does not see, for example, the speaker's gestures and expressions. This thesis studied the use of augmented reality and automatic speech recognition in an assistive application intended for the hard of hearing and deaf. The application uses video- and mobile-based augmented reality.
Speech recognition makes use of modern neural network models. In the implementation, the automatic speech recognition results were placed in speech bubbles close to the face of the speaker shown in the video image. This way, a deaf or hard of hearing user could easily follow both the speaker and the speech recognition results. The usefulness of the application was evaluated by organising user tests with the deaf and hard of hearing. Based on the results, the hard of hearing and deaf felt that augmented reality and the application helped them follow conversations. The improvements most requested by the test users were the visual separation of different speakers' speech recognition results from one another and better speech recognition accuracy.

Keywords: augmented reality, speech recognition, mobile development, user testing, hearing impairment, assistive technology


Preface

First, I would like to thank my supervisor, Prof. Mikko Kurimo, for his feedback, patience and understanding during the whole process. I would like to thank my advisors, D.Sc. Kalle Palomäki and M.Sc. Juri Lukkarila, for letting me work on this project and for all they have done to help me with this work. I am extremely lucky to have had two people to guide me and give me new perspectives. I am grateful to all the deaf and hard of hearing who participated in the user tests and gave valuable feedback, and to the people from Kuuloliitto ry for their help in recruiting participants for the tests. I would like to thank Katri Leino, Juho Leinonen, Peter Smit, Tuomas Kaseva, Zhicun Xu, Mittul Singh, Aku Rouhe and Reima Karhila from the Speech Recognition group for general help, ideas, conversations and good company. Additionally, I would like to thank Tarmo Simonen and Aleksi Öyry from Aalto University for providing me with all the necessary equipment. I am thankful to the Speech Recognition research group of the Department of Signal Processing and Acoustics at the Aalto University School of Electrical Engineering and to the Academy of Finland for supporting and enabling this thesis as part of the project Conversation Assistant for the Hearing Impaired. Last but not least, I would like to express my gratitude to Jarkko and my family for believing in me, cheering me on and supporting me in the moments of despair until the very end.

Otaniemi, 19.11.2018
Anja Virkkunen


Contents

Abstract
Abstract (in Finnish)
Preface
Contents
Symbols and abbreviations
1 Introduction
   1.1 Augmented reality
   1.2 Conversation assistant
   1.3 Research goals
   1.4 Thesis structure
2 Background
   2.1 Augmented reality
      2.1.1 Definition
      2.1.2 Human factors
      2.1.3 Technology
      2.1.4 Past and present AR systems
      2.1.5 Challenges for AR usage and adoption
   2.2 Automatic speech recognition
      2.2.1 The structure of an ASR system
      2.2.2 Conversational speech
   2.3 Hearing impairment
      2.3.1 Hearing loss types
      2.3.2 Diagnosis and treatment
      2.3.3 Societal effects
   2.4 Visual dispersion
   2.5 Previous work
3 Augmented Reality Conversation Assistant
   3.1 Description
   3.2 System structure
   3.3 Software
      3.3.1 iOS application
      3.3.2 ASR model and server
4 User testing
   4.1 Objectives
   4.2 Test plan
      4.2.1 Introduction
      4.2.2 Getting to know the application
      4.2.3 Section 1: Word explanation
      4.2.4 Section 2: Conversation
      4.2.5 Conclusion
   4.3 Questionnaire
   4.4 Arrangements
      4.4.1 Test environment setup
      4.4.2 Participants
5 Results
   5.1 Word explanation and conversation tasks
   5.2 Final ratings
   5.3 Written feedback
   5.4 Discussion on results
6 Conclusions
References
A Questionnaire form
B Questionnaire answers


Symbols and abbreviations

Abbreviations
AR     Augmented reality
ASR    Automatic speech recognition
AV     Augmented virtuality
AVSR   Audiovisual speech recognition
CE     Central executive
FOV    Field-of-view
GMM    Gaussian Mixture Model
GPS    Global positioning system
HMD    Head-mounted display
HMM    Hidden Markov Model
IT     Information technology
LSTM   Long short-term memory
MFCC   Mel frequency cepstral coefficient
MR     Mixed reality
MVC    Model-view-controller
OST    Optical see-through
RSD    Retinal scanning display
TDNN   Time-delay neural network
TLS    Transport Layer Security
TTS    Text-to-Speech
UI     User interface
VR     Virtual reality
VST    Video see-through
WM     Working memory


1 Introduction

Hearing loss in all of its forms can significantly limit access to auditory information and the ability to communicate. Participating in conversations and social life becomes a struggle, especially in noisy environments, causing exhaustion, withdrawal and a poorer quality of life [1, 2, 3]. Working life and education are affected as well, with the hearing impaired often quitting school and working life earlier than their hearing peers [4, 5, 6, 7]. Different studies estimate that 10 to 20 percent of the population have hearing loss [8, 9], which makes it a considerable societal issue in addition to complicating individual lives. Moreover, hearing loss has the highest prevalence in the elderly population, so the number of hearing impaired is expected to grow as the population ages.