
Ivo Filipe Pinho dos Anjos

Master of Science

Serious mobile game with sibilant exercises for speech therapy

Dissertation submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering

Adviser: Prof. Dr. Sofia Cavaco, Assistant Professor, Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa
Co-adviser: Prof. Dr. João Magalhães, Assistant Professor, Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa

Examination Committee
Chairperson: Prof. Dr. Fernando Birra
Rapporteur: Prof. Dr. Sérgio Paulo
Member: Prof. Dr. Sofia Cavaco

November, 2017

Serious mobile game with sibilant consonant exercises for speech therapy

Copyright © Ivo Filipe Pinho dos Anjos, Faculty of Sciences and Technology, NOVA University of Lisbon. The Faculty of Sciences and Technology and the NOVA University of Lisbon have the right, perpetual and without geographical boundaries, to file and publish this dissertation through printed copies reproduced on paper or on digital form, or by any other means known or that may be invented, and to disseminate through scientific repositories and admit its copying and distribution for non-commercial, educational or research purposes, as long as credit is given to the author and editor.

This document was created using the (pdf)LaTeX processor, based on the "unlthesis" template [1], developed at the Dep. Informática of FCT-NOVA [2]. [1] https://github.com/joaomlourenco/unlthesis [2] http://www.di.fct.unl.pt

Acknowledgements

This work was supported by the Portuguese Foundation for Science and Technology under the projects BioVisualSpeech (CMUP-ERI/TIC/0033/2014) and NOVA-LINCS (PEest/UID/CEC/04516/2013). I would like to thank Prof. Dr. Sofia Cavaco for all the counselling and guidance during this last year, which allowed me to produce the work presented in this dissertation and in an accepted scientific paper, for which I am very grateful. I would also like to thank the SLPs Diana Lança and Catarina Duarte for their availability and feedback, which helped me find the main problem to address in this dissertation. I would also like to thank all the 3rd and 4th year SLP students from Escola Superior de Saúde do Alcoitão who collaborated in the data collection task. Many thanks also to Inês Jorge for the graphic design of the game scenarios. I would also like to thank the schools from Agrupamento de Escolas de Almeida Garrett, and all the children who participated in the recordings. Lastly, I would like to thank my family and friends who have helped me a lot during my academic years.


Abstract

The distortion of sibilant sounds is a common type of speech sound disorder (SSD) in European Portuguese (EP) speaking children. Speech and language pathologists (SLPs) frequently use the isolated sibilants exercise to assess and treat this type of speech error. While technological solutions like serious games can help SLPs to motivate children to do the exercises repeatedly, there is a lack of such games for this specific exercise. Another important aspect is that, given the usual small number of therapy sessions per week, children are not improving at their maximum rate, which is only achieved with more intensive therapy.

We propose a serious game for mobile platforms that allows children to practice their isolated sibilants exercises at home to correct sibilant distortions. This will allow children to practice their exercises more frequently, which can lead to faster improvements. We have designed four different scenarios, one for each EP sibilant consonant. The game, which uses an automatic speech recognition (ASR) system to classify the child's sibilant productions, is controlled by the child's voice in real time and gives immediate visual feedback to the child about her sibilant productions. We also used some relevant cues to help the child remember which sound to produce.

In order to keep the computation on the mobile platform as simple as possible, the game has a client-server architecture, in which the external server runs the ASR system. We used the raw Mel frequency cepstral coefficients as features, tested different classifiers, like linear and quadratic discriminant analysis and support vector machines, and compared multiple options for selecting the training and test sets. We were able to achieve very good results, with test accuracy scores above 91% using support vector machines.

Keywords: Sound Analysis, Machine Learning, Supervised Learning, Interactive Environment, Parameterization, Speech Therapy, Articulation Disorders


Resumo

A distorção das sibilantes é um tipo de distúrbio de fala comum em crianças cuja língua materna é o Português Europeu. Os terapeutas da fala e da linguagem frequentemente utilizam o exercício das sibilantes isoladas para avaliar e tratar este tipo de problemas. Apesar de existirem soluções tecnológicas como jogos sérios para ajudar os terapeutas a motivar as crianças para praticarem os exercícios repetidamente, não existem jogos para este exercício específico. Outro aspeto importante é que dado o reduzido número de sessões de terapia por semana, as crianças não estão a melhorar ao seu ritmo máximo, pois isso só é alcançável com recurso a uma terapia mais intensiva. Propomos um jogo sério para plataformas móveis que permite às crianças praticar os seus exercícios das sibilantes isoladas em casa para corrigir as distorções. Isto vai permitir que as crianças pratiquem os exercícios mais frequentemente, o que pode levar a melhorias mais rápidas. Desenhámos quatro cenários, um para cada sibilante do Português Europeu. O jogo, que utiliza um sistema automático de reconhecimento de fala para classificar as produções da criança, é controlado pela voz da criança em tempo real e dá feedback visual imediato à criança sobre a sua produção de som. Também utilizámos algumas pistas, de modo a ajudar a criança a lembrar-se de qual o som a produzir. Para manter a complexidade na plataforma móvel o mais simples possível, o jogo utiliza uma arquitetura de cliente-servidor, sendo que o servidor corre o sistema automático de reconhecimento de fala. Utilizámos Mel frequency cepstral coefficients como features, e testámos vários classificadores, como linear e quadratic discriminant analysis e support vector machines, e comparámos várias opções para selecionar os conjuntos de treino e de teste. Conseguimos atingir resultados muito bons com resultados de precisão em teste acima de 91% utilizando support vector machines.

Palavras-chave: Análise de Som, Aprendizagem Automática, Aprendizagem Supervisionada, Ambiente Interativo, Parametrização, Terapia da Fala, Distúrbios de Articulação


Contents

List of Figures

List of Tables

Listings

1 Introduction
  1.1 Introduction
  1.2 Objectives
  1.3 Proposed Solution

2 Background and Related Work
  2.1 Speech Therapy
    2.1.1 The Process of voice
    2.1.2 Language and Speech
    2.1.3 Speech Disorders
    2.1.4 Speech Therapy Areas - Sibilant Sounds and Minimal Pairs
    2.1.5 Interviews
  2.2 Sound Features Extraction & Machine Learning
    2.2.1 Mel-Frequency Cepstral Coefficients (MFCC)
    2.2.2 Linear Discriminant Analysis
    2.2.3 Linear and Quadratic Discriminant Classifier
    2.2.4 Cross Validation
    2.2.5 Support Vector Machines (SVM)
  2.3 State-of-the-art Tools
    2.3.1 Isolated Games
    2.3.2 Complex Systems
    2.3.3 Tools Comparison

3 Game and Architecture
  3.1 System platform and game engine
  3.2 Mobile game
    3.2.1 Game goal and scenarios
    3.2.2 Visual cues
    3.2.3 Visual Feedback
    3.2.4 Parameterization of the game
  3.3 System architecture
  3.4 Implementation details - modularity and extensibility
    3.4.1 Mobile Game
    3.4.2 Server
    3.4.3 The need for an ASR system

4 Automatic Speech Recognition System
  4.1 Sound data
  4.2 Automatic recognition of isolated sibilants
    4.2.1 The classification algorithm
    4.2.2 Feature vectors
    4.2.3 Different options for the training and test sets
    4.2.4 Model training methodology
  4.3 Classification Results
    4.3.1 Comparing the classifiers using the Naive split
    4.3.2 Comparing the different training and test sets
    4.3.3 Naive split results
    4.3.4 K% children test set results
    4.3.5 One Child Out experiment results
  4.4 Discussion

5 Feedback from the SLPs and children
  5.1 Feedback from children
  5.2 Feedback from SLPs

6 Conclusion and future work

Bibliography

A Accepted scientific paper

List of Figures

1.1 Platform of BioVisualSpeech project
1.2 Child playing the proposed game comfortably at home

2.1 Diagram with the names of the main places of articulation [42]
2.2 Interface of Articulation Station
2.3 Interface of Articulation Test Center
2.4 Interface of Falar a Brincar
2.5 Interface of sPeAK-MAN
2.6 Interface of Flappy Voice
2.7 Interface of Vithea
2.8 European systems comparison

3.1 Forecast of tablet user numbers in Portugal from 2014 to 2021
3.2 The four game scenarios. (a) The scenario for the [S] consonant. (b) The scenario for the [Z] consonant. (c) The scenario for the [s] consonant. (d) The scenario for the [z] consonant.
3.3 Example of the sinusoidal movement of the bumblebee character
3.4 The two developed main cycles. (a) The game cycle ends when the child produces an incorrect sound production. (b) The game cycle waits for another correct production in order to give positive feedback to the child.
3.5 The end game messages. (a) The message the child receives when she reaches the goal of the game. (b) The message the child receives when she stops producing the correct sibilant sound.
3.6 Client-server architecture

4.1 The setup used for the recordings
4.2 The MFCCs matrix with 13 coefficients, obtained for a [z] sound
4.3 Accuracy test scores of the multiclass SVM classifier with RBF kernel, using different numbers of MFCCs
4.4 Accuracy test scores of the three multiclass classifiers
4.5 Accuracy test scores of the multiclass SVM, and four single-phoneme SVMs (both with the RBF kernel) for all four classes
4.6 Accuracy test scores of the four single-phoneme SVMs, while using different training and test sets
4.7 The progression of the MFCC 4 for sound [S] over time, for two different children
4.8 The box plot of the MFCC values for each coefficient, for sound [S], for two different children
4.9 Comparison between the accuracy scores of the validation and test sets, while using different test set sizes

5.1 Child playing the proposed game during the European Researchers Night 2017

List of Tables

4.1 Number of children who performed the recordings
4.2 Number of samples for each sound
4.3 Validation and test scores of the multiple runs of the One Child Out experiment, removing the child that was in the test set at the end of each run
4.4 Number of false negatives and false positives of the multiple runs of the One Child Out experiment, removing the child that was in the test set at the end of each run

Listings

3.1 Pseudo code of the main game cycle
3.2 Pseudo code of the main method in the microphone class
3.3 Pseudo code of the server

1 Introduction

1.1 Introduction

Speech is one of the most important aspects of our life. While children are still learning how to speak and learning the language, regular mistakes are made. These mistakes tend to gradually disappear as children grow up. However, there may be cases in which the mistakes continue even when the child grows older. In these cases the child may have a speech and/or language disorder. While there are many types of speech and language disorders, here we focus on speech sound disorders (SSD). An SSD occurs when a child produces speech sounds incorrectly, after an age at which these mistakes were not supposed to happen [3, 36].

When a child has an SSD, she should be observed by a speech language pathologist (SLP), who can assess the type and severity of the disorder. As part of the treatment to correct the speech errors, SLPs use specific types of speech production exercises that must be performed multiple times in each speech therapy session.

The SLP has to find ways to make the repetition of the exercises enjoyable, in order to keep the child motivated. Most times this is accomplished by transforming the exercises into some sort of game that the child can play. This would be relatively simple with a few repetitions, but given the extensive number of repetitions required to achieve speech improvements, keeping the child motivated may be a hard task. To further motivate the child, many SLPs also use some kind of reward system; for instance, the child may be rewarded with a game during the last minutes of the session if he performs well during the session. Most times this last game also includes some sort of therapy exercise.

Whereas traditionally children attend speech therapy sessions once a week, more intensive therapy, which can consist of having more weekly sessions, has been proven to lead to faster improvements [8, 9, 23]. In particular, when the intensive training is done at home (home training) it can increase the overall exercise practicing time substantially. This is particularly useful when children attend speech therapy sessions only once a week and, as a consequence, do not repeat the speech exercises as often as desirable. With home training children can practice the exercises recommended by their SLP whenever they have free time.

These two concepts of intensive training (with more frequent sessions per week) and home training have become more popular in recent years, and some studies have already shown that this type of training is very beneficial [9, 12, 20]. These studies show considerable improvements when children have more than the regular weekly session. In the cases where children cannot attend more than one session per week, home training has proved to be a good alternative for those extra sessions. The major benefit of intensive training and home training is a faster improvement rate, which can also lead to more motivation. When children are motivated they tend to overcome their problems more easily, so this kind of cycle is beneficial.

In spite of all the benefits of home training, it may not be straightforward to implement when the right tools are not available. The problem is: how can children practice the speech exercises at home? Who is going to verify that they are doing the exercises correctly? The first obvious choice would be the parents, but, for multiple reasons including lack of time, this is not always possible. This leaves room for a combined clinical and technological solution to emerge.

1.2 Objectives

This dissertation is part of the BioVisualSpeech (BVS) project, which is a partnership between Faculdade de Ciências e Tecnologia da Universidade Nova de Lisboa, Carnegie Mellon University, Escola Superior de Saúde de Alcoitão, INESC-ID, Centro Hospitalar de Lisboa and Voice Interaction, and is funded by Fundação para a Ciência e Tecnologia. The project aims to research mechanisms that provide bio-feedback in speech therapy through the use of serious games. It proposes a platform (figure 1.1) with two main goals. The first is to keep children motivated to do the speech therapy exercises; for this, the platform includes a gamified interactive environment with several serious games, a reward system that motivates children to keep exercising, and also some tools to help the speech therapists, such as a mirror and the ability to perform audio-visual recordings and annotations of the session. The second goal is to provide the speech therapist with a tool set to plan the course of the ongoing therapy. For this, the platform includes a system for post-session analysis, where the audio-visual recordings can be examined and future sessions can be planned. More recently the project added another area of interest: home practice. Home practice is used a lot by speech therapists, but there are not many tools that provide the necessary feedback on the exercises.


Figure 1.1: Platform of BioVisualSpeech project.

Given the importance of the area of home practice and its recent addition to the project, we decided to focus on this particular area, combined with the gamification part, to make our solution more attractive to children. The BioVisualSpeech project is based on the European Portuguese (EP) language, which is one language in particular that is not addressed by most of the systems already available. It was also proposed to us that we work with the sibilant sounds, since there is no system available that allows children to practice these sounds for EP.

Sibilant sounds are a specific group of consonants, and happen when the air flows through a very narrow channel in the direction of the teeth [19]. This channel can be created in different places, and this creates different types of sibilant sounds. There are two types of sibilant sounds in EP: alveolar and palato-alveolar. In each of these sibilant types the vocal folds can be used or not, resulting in a voiced or a voiceless sibilant, respectively. In EP there are four sibilant sounds: [s], [z], [S], and [Z]. Both [s] and [z] are alveolar sibilants, [s] being unvoiced and [z] voiced. In the case of [S] and [Z], they are palato-alveolar sibilants, and are respectively unvoiced and voiced.

These sibilant sounds also appear in minimal pair words. A minimal pair is a pair of words where only a single sound varies. An example of a minimal pair is the words "zip" and "sip", where only the first sound varies, [z] and [s]. Given their similarities, the words in a minimal pair are very easy to confuse and it is common to use one instead of the other. This is a very common SSD, especially in children. In our example this is even more problematic because the sound that varies between the two words is an alveolar sibilant, meaning that both sounds are produced in the same location, so the only difference in the production of these two words is the use of the vocal folds when producing the first sound. What makes these sounds very interesting for a dissertation is that there is not much research regarding the differences between the different sibilant sounds, and even less comparing the voiced and unvoiced sounds.

Regarding the home practice area, the main goal is to allow the children to perform the exercises more times, which may lead to a faster improvement. The platform we

chose also has an impact on this, so the best option is to use a mobile platform, since this does not create any restriction on using the system other than an internet connection, which allows the children to perform the exercises nearly anywhere.

1.3 Proposed Solution

While many native Portuguese children need to attend speech therapy to correct SSDs related to the production of distorted sibilant sounds, there is a lack of software systems to assist with the training of these sounds. As a contribution to fill this gap, we propose a serious game for mobile platforms for intensive training of the EP sibilant consonants for the correction of distortion errors. The target age group of this game is children from five to nine years old, because usually at these ages the regular phonological development exchange errors have already disappeared.

Figure 1.2 illustrates our goal: a child playing our proposed game comfortably at home, without parent or SLP supervision. Our solution allows the child to practice the speech therapy exercises without leaving the comfort of his home, while still providing all the benefits of intensive training, such as a faster improvement. We believe that the extra motivation from playing a mobile game, instead of practicing the regular speech therapy exercises, combined with the faster improvements, will greatly help the child stay motivated throughout the whole speech therapy process. We also tried to make our game more innovative by allowing the child to control the characters only with their voice. This type of novelty will also help keep the child more engaged with the game.

Figure 1.2: Child playing the proposed game comfortably at home.


The game incorporates the isolated sibilants exercise, which is an exercise frequently used by SLPs during the therapy sessions. The isolated sibilants exercise consists of producing a specific sibilant sound for some duration of time. The main goal of this exercise is to teach the child how to distinguish between the different sibilant sounds, and also how to produce each one of them. We are going to focus on the EP sibilant sounds, which are [z] as in zebra, [s] as in snake, [S] as the sh sound in sheep, and [Z] as the s sound in Asia.

The game goal is to move a character from a starting position to a specific target. In order to control the character, children do not use a keyboard or any other regular input method; instead they use only their voice. Each character responds to a different sibilant sound, and to make the character move, the child has to say that specific sibilant correctly. The game includes four scenarios, one for each of the addressed sibilant sounds. To create the scenarios we took into account the target age group of five to nine year old children. Each of the scenarios has a different character that is related to the sibilant sound the child must produce. This gives the child a visual cue to help her remember the sound she must produce in each scenario. We also chose targets that relate to each game character, to be more appealing to the child. For instance, for the [s] sound, we have a snake (serpente in EP) that is moving towards a log. The game runs on mobile platforms like tablets, iPads and smartphones, which children are usually keen on using. Thus, we are not only providing means for home training of the sibilants, but also motivating children to do the exercises often. The game's main character is controlled in real time by the child's voice and its behaviour gives visual feedback to the child on his speech productions.

While our game can also be used during the speech therapy sessions, our main goal is to provide a game that can be used for home training and does not require the supervision of parents. The main difficulty of home training is how to verify the child's sound productions. Many systems do not have any type of automatic verification, which means that the child needs the parents' or the SLP's supervision to validate the exercises. Our proposal includes an automatic speech recognition (ASR) system to identify whether the child is performing the exercise correctly, and therefore decrease the need for an adult to monitor the child. This ASR system was trained with the speech productions of 90 children.

In order to keep the complex computation of the automatic speech recognition system out of the mobile platform, the game is implemented with a client-server architecture. The client is the mobile game application and the server runs our ASR system, which classifies the child's speech productions. The client records the sound that the child produces in segments and sends them to the server, to check if the sound is being correctly produced. When the server receives a new sound segment, it extracts its audio features and then uses our ASR system to check whether the segment corresponds to a correct production of the respective sibilant sound. The server then sends the answer back to the client, so that it can provide visual feedback to the user.
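To make this exchange concrete, the sketch below shows one round trip written with plain Python sockets. It is only an illustration of the architecture just described: the message framing, the port number, and the classify() placeholder are assumptions for this sketch and not the game's actual implementation (the real pseudo code is given in Listings 3.1-3.3).

    # Minimal sketch of one client-server round trip (illustrative only).
    import socket

    def send_segment(segment_bytes, host="localhost", port=9000):
        """Client side: send one recorded audio segment, read back the verdict."""
        with socket.create_connection((host, port)) as sock:
            sock.sendall(len(segment_bytes).to_bytes(4, "big") + segment_bytes)
            return sock.recv(1) == b"\x01"   # b"\x01" = correct production

    def serve(classify, host="0.0.0.0", port=9000):
        """Server side: receive a segment, classify it, reply with one byte."""
        with socket.create_server((host, port)) as srv:
            while True:
                conn, _ = srv.accept()
                with conn:
                    size = int.from_bytes(conn.recv(4), "big")
                    data = b""
                    while len(data) < size:
                        data += conn.recv(size - len(data))
                    # classify() stands in for the feature extraction + classifier of chapter 4
                    conn.sendall(b"\x01" if classify(data) else b"\x00")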


Summary of contributions:

• Research on the sibilant consonant sounds, particularly for the EP language.

• A base client-server architecture and game developed with three simple main classes, which can be used in further iterations of the project, since they are completely modular and independent of our problem.

• Testing of different machine learning algorithms and also the use of some more sophisticated training set selections, such as the One Child Out approach.

Part of this work was presented in an accepted scientific paper, whose content and further information can be found in Appendix A.

The organization of this dissertation is as follows:

Chapter 2 (Background and Related Work) - the beginning of this chapter discusses some concepts of speech therapy, like voice, language, and speech. Next, a brief description of the multiple speech disorders is presented, together with a more detailed explanation of sibilant consonant sounds and minimal pairs. This helps to understand the foundations of speech therapy. Some interviews are also described; they were very useful in finding the most interesting problems and also provided some validation for this solution. Afterwards we discuss some techniques for sound feature extraction and the machine learning algorithms most used for speech recognition. The chapter ends with a section on state-of-the-art tools, which gives a better idea of the already proposed systems and their solutions.

Chapter 3 (Game and Architecture) - this chapter gives a complete description of all our decisions regarding the choice of system platform, game engine, and system architecture. It also contains a very detailed explanation of the mobile game itself and all its characteristics.

Chapter 4 (Automatic Speech Recognition System) - here we present how we gathered our sound data and what was needed to make it usable by our system. We also describe how we translated the sound files into our feature vectors, and then present the results of the multiple algorithms tested and our attempts to improve the classification scores.

Chapter 5 (Feedback from the SLPs and children) - this chapter presents the feedback of some SLPs that were interviewed, some of them before the game was fully developed and one after, as well as feedback from children.

Chapter 6 (Conclusion and future work) - here we present some final conclusions and some ideas for future work.

2 Background and Related Work

This chapter presents an introduction to some concepts of speech therapy, sound processing, and machine learning, and also describes the systems that have been developed within the scope of speech therapy. Below is a more detailed description of each section:

Section 2.1 - explains the basic concepts of voice, language, and speech. A description of the different types of speech disorders is also presented, with more details on sibilant sounds and minimal pairs. To end the section, the main topics of the interviews with the SLPs are presented.

Section 2.2 - describes some techniques of sound feature extraction, cross validation, and also some very important machine learning algorithms used in automatic speech recognition systems.

Section 2.3 - presents the many systems that were already proposed in this area, and also a comparison between two European Portuguese language systems.

2.1 Speech Therapy

The goal of this section is to explain some concepts of speech therapy that are very important for a better understanding of this dissertation. We start by explaining some basic ideas, like how voice is produced and the differences between speech and language. With this basic knowledge acquired, we can then describe the different types of speech disorders, focusing on the ones that are more important to this dissertation. Next we have a more detailed description of our two main concepts, sibilant sounds and minimal pairs; here we explain most of the choices of this dissertation regarding the speech therapy area. To end this section we present the main topics of the interviews with the SLPs. These interviews helped us to better realize the SLPs' problems and come up with this solution.

2.1.1 The Process of voice

The process of voice production can be described in three main steps [41]:

1. With a coordinated action of the diaphragm, abdominal and chest muscles, and rib cage, the air is moved from the lungs to the vocal folds.

2. The vibration of the vocal folds is a sequence of vibratory cycles:

a) A column of air pressure moves from the bottom to the top of the vocal folds.
b) First this column starts by opening the bottom of the vocal folds.
c) As the column continues to move upwards, it opens the top of the vocal folds.
d) This fast-moving air column creates a low pressure behind it, which causes the bottom of the vocal folds to close, followed by the closure of the top of the vocal folds.
e) The closure of the vocal folds cuts off the air column and releases a pulse of air.
f) Then a new cycle repeats the same process.

This sequence of vibrations produces rapid pulses of air that create a "voiced sound".

3. The vocal tract, which is composed of both resonators and articulators, like the nose, pharynx and mouth, is what allows the "voiced sound" to be modified and amplified to produce voice as we know it.

2.1.2 Language and Speech

Before talking about language or speech disorders, and the multiple speech therapy areas, it is necessary to understand the differences between language and speech. First of all, these are two very distinct concepts [2]. Language can be defined as a set of socially shared rules, like what the words mean, how to make new words by adding prefixes or suffixes, how to combine words to create sentences, and which word combinations are best in specific situations.

Speech can be defined as the verbal means of communicating, and can be divided into three main groups: articulation, voice, and fluency. Articulation refers to how speech sounds are made; for example, a child must learn how to produce certain sounds in order to say the words that contain them (e.g. the sound "s" to produce the word "snake"). So basically words are a combination of separate sounds, and each of those must be learned in order to produce those words. Voice is the use of the vocal folds and breathing to produce sound; the voice can be abused either through overuse or misuse, and this can lead to hoarseness or loss of voice. A common example of this can be seen in singers after a long

tour and in teachers at the end of a school year. Fluency is the rhythm of speech: the ability of a person to speak without hesitations or stuttering.

2.1.3 Speech Disorders

There are multiple types of speech disorders [1]:

• Childhood Apraxia of Speech - a motor speech disorder, in which children have problems saying sounds, syllables, and words, because their brain cannot completely coordinate the body parts involved, such as the lips, jaw, and tongue.

• Dysarthria - also a motor speech disorder, in which children cannot correctly move their muscles, but in this case due to lesions in the nervous system. Although similar to Apraxia, Dysarthria is a movement problem, not a planning problem like Apraxia.

• Orofacial Myofunctional Disorders - the tongue moves forward in an exaggerated way during speech and/or swallowing.

• Speech Sound Disorders - a speech sound disorder occurs when children make mistakes when saying a word, after a certain age. It can be of two kinds, articulatory or phonological. In articulatory disorders sounds can be removed, substituted, added or changed; this normally happens when a child cannot reproduce a sound that composes a word, and substitutes that sound with one that is easier to produce. Phonological disorders can be characterized by patterns of sound errors, for instance when sounds produced in the back of the mouth are substituted by those made in the front of the mouth.

• Stuttering - is a disorder that is characterized by disruptions in the production of speech sounds. This can include repetitions of words or parts of words and also prolongations of speech sounds.

• Voice - voice disorders are problems with the voice that can range from a hoarse voice to the inability to produce any kind of sound. They happen because of voice abuse or colds, allergies, bronchitis or anything else that irritates the vocal folds.

The focus of this dissertation is on speech sound disorders and also on voice disorders. These correspond, respectively, to the areas of articulation and voice.

2.1.4 Speech Therapy Areas - Sibilant Sounds and Minimal Pairs

The speech therapy areas, and the corresponding disorders, can be separated into four distinct categories: articulation, voice, fluency, and language, which were explained above. The main focus of this dissertation is on articulation and also on voice, since the sounds used are sibilant consonants, and those are very useful in both areas.


Sibilant consonants are a special group within the consonants. A consonant is a speech sound that occurs when the air flow is limited by a complete or partial closure of the vocal tract. Sibilant sounds are a specific group of consonants, and happen when the air flows through a very narrow channel in the direction of the teeth [19]. This channel can be created in different places, and this creates different types of sibilant sounds.

These kinds of sounds are very useful in the area of voice, for instance to help reduce the Hard Glottal Attack [44], which is the aggressive closure of the glottis when producing any sound. This can lead to many changes in the sound characteristics of the voice and even to the development of nodules in the vocal folds. The most common exercises involve producing a single sibilant consonant without that initial hard attack. After the patient can correctly perform this exercise, the next step is to add a vowel after the sibilant (e.g. "za" or "so") or to change from one sibilant sound to another. Many other exercises exist and must be adapted to each particular patient case.

These sibilant sounds also have a very interesting feature that comes from how they are produced. The narrow channel that must be created in order to vocalize a sibilant sound can be positioned in different places, and this creates different types of sibilant sounds. There are two types of sibilant sounds in EP: alveolar and palato-alveolar; these regions can be seen in figure 2.1. In each of these sibilant types the vocal folds can be used or not, resulting in a voiced or a voiceless sibilant, respectively. An example of this is the sound [z] (e.g. z in "zip"), which is voiced because it uses the vocal folds, and the sound [s] (e.g. s in "sip"), which does not use the vocal folds and as a result is a voiceless sibilant. These two sounds are alveolar sibilants, so they are produced by creating a narrow channel in the same location (in this case with the tongue tip against the teeth ridge), and the only difference between them is the use of the vocal folds. Palato-alveolar sibilants are produced when the tongue tip is slightly retracted from the teeth ridge, and produce the sound [S] when the vocal folds are not used, and the sound [Z] when they are. There are four different sibilant consonant sounds in EP: [z] as in zebra, [s] as in snake, [S] as the sh sound in sheep, and [Z] as the s sound in Asia. [z] and [s] are both alveolar sibilants, while [S] and [Z] are palato-alveolar sibilants. Both [z] and [Z] are voiced sibilants, and [s] and [S] are voiceless sibilants.

If we look closely at the two words given above as an example, "zip" and "sip", we realize that between these two words only a single sound varies: [z] and [s]. This is the definition of a minimal pair: a pair of words where only a single sound varies. Given their similarities, minimal pairs are very easy to confuse and it is common to use one sound instead of another, which is a very common SSD, especially in children. These words are even more difficult to distinguish when they are not only minimal pairs, but the sound that differs between the two words is a sibilant sound, like in our example "zip" and "sip". In this case the only difference when producing the two words is the use of the vocal folds when producing the initial sound.


Figure 2.1: Diagram with the names of the main places of articulation [42].

When compared to a regular minimal pair (which is already difficult to explain to children), this particular case is considerably harder to explain, because most children do not even know what a vocal fold is, so it is not simply a case of asking the children to use the vocal folds or not. Another thing that makes these sibilant sounds very important is that they appear in many minimal pair words (e.g. "sink" and "zinc"); even though the sounds [Z] and [S] do not appear in any usual minimal pair words in English, they are common in EP. The most common sibilant mistakes committed by children are distortion errors. These can consist of (1) exchanging the voiced and voiceless sounds, for example exchanging a [s], which is voiceless, for a [z], which is produced in the same location of the vocal tract but is voiced, or (2) exchanging the place of production in the vocal tract, which results in producing another sibilant [33]. These types of mistakes are a problem for the area of articulation, and may lead to many words being replaced by others that are very similar except for one single sound, which can lead to many misinterpretations and even sentences that do not make any sense.

As explained by the SLPs we contacted, when a child has an SSD that influences his production of the sibilant consonants, the SLP usually starts by observing how the child reacts to the hearing and production of the isolated sibilants. The process starts by trying to understand if the child can distinguish the different sibilants when hearing them as isolated sounds. If he can, the next step is to practice the isolated sibilants exercise, which consists of producing a specific sibilant sound for some duration of time. The main goal of this exercise is to teach the child how to distinguish between the different sibilant sounds, and also how to produce each one of them. Once the child is able to say the sibilant consonants correctly, the SLP then starts asking for multiple productions of the different sibilants, always alternating both the

place of production in the vocal tract and the use of the vocal folds, in order to try to understand if the child can always produce the correct sibilant or if sometimes the child exchanges one sibilant for another. This process is done with the isolated sibilants exercise and is the basis for detecting what the child's problem is. Only after the problem is correctly identified and the child can say the isolated sibilants correctly can they move on to more complex exercises, like using the sibilant sounds inside words and in multiple positions within a word.

Given the importance to both the areas of voice and articulation, and the many problems that can arise either from the abuse of the voice or from the distortion of the sibilant sounds, we decided to focus on these sounds and their minimal pairs. According to the SLPs that we talked with, there is no software that allows children to practice these sounds, and given their importance, as explained above, a solution for this problem is even more necessary. Our solution is to incorporate the isolated sibilants exercise in a mobile game that motivates children to exercise more often, which could lead to a faster improvement and more motivation. Given the focus on sounds, the game characters are controlled using only the child's voice and not any of the usual input methods like keyboard, mouse or touch; when the child performs the exercise correctly, the character moves towards its goal. All these game concepts and methodology are discussed in detail in chapter 3.

2.1.5 Interviews

In the beginning of this dissertation we had to talk with different speech and language therapists to understand their needs and difficulties, and to learn a bit about the process of speech therapy and their sessions. This began with meetings with the project partners from Escola Superior de Saúde do Alcoitão (ESSA) and also some meetings with other SLPs from private clinics. In these meetings, games for the sibilant sounds were always mentioned. The main idea of these games is that the child has to produce a sibilant sound while the SLP provides some visual feedback to show whether the exercise is being done correctly or not. One example, for the sound [s], was the SLP imitating a snake moving; another, for the sound [z], was a bee flying towards a flower.

We were told that sibilant sounds are very useful for both voice and articulation disorders. For voice disorders they are used a lot, since they help to "break" nodules in the vocal folds and, given how they are produced, they do not strain the vocal folds and the surrounding muscles. This is a very important characteristic for the area of voice, since most of these patients have a hoarse voice and sometimes cannot even produce any sound.

For the area of articulation the sibilants are also interesting because many children tend to confuse the minimal pairs of sibilants. In order to help these children, SLPs usually start by trying to understand if the children can distinguish, by hearing, the

minimal pair sounds. If they can, the next step is to make them produce the sounds in isolation, one after the other, until they understand the differences in the production of both sounds. Afterwards the same principle is applied, but now with these sounds in different positions within words: initial, medial, and final. This has to be done always alternating between the minimal pair sounds, to make sure that the child is actually understanding the differences and not only repeating the same sound over and over.

Another very good idea to help children differentiate the sounds of minimal pairs is to use sounds of nature or animals that are very familiar to them and assign them to a specific sibilant; this helps them remember which sound they must produce. Also, to help with this association, some SLPs use specific animal or nature sounds. For instance, when asking for the sibilant sound [z], some SLPs use the word "Zangão", since it starts with the sibilant sound that was asked for. This is particularly useful when children still have problems distinguishing between the minimal pair sounds. This idea can be applied to all sibilant sounds.

All the SLPs that were interviewed use some sort of homework, since it provides faster improvement. The kind of homework that they use always requires parent supervision, which is sometimes a problem since not all parents have that much time available. Some SLPs also use some sort of self-evaluation method to try to solve this problem, but this does not work with all children.

When asked about mobile applications, they all agreed that there is a big lack of applications for European Portuguese, particularly for these types of exercises with the sibilant sounds. Even though there are some applications developed for European Portuguese, most of them only have resources, like images and sentences, and do not have any type of automatic speech recognition. These applications are not very useful to SLPs, because in this case they are always dependent on the parents' availability. Also, all the SLPs already have lots of these resources in non-digital format.

2.2 Sound Features Extraction & Machine Learning

In this section some important concepts about the extraction of sound features and machine learning that are relevant to our work are discussed. This discussion is based mainly on three automatic speech recognition systems for European Portuguese (EP) that were proposed in the context of computational systems for speech therapy. The first is a robust phoneme recognizer for EP vowels, proposed for articulation disorders in the context of the VisualSpeech project [18]. The second is a robust scoring model for serious games with voice exercises with EP vowels, namely for the sustained and pitch variation exercises [14]. Finally, the Interactive Game for the Training of Portuguese Vowels was also proposed in the context of a serious game, and developed a speech recognition system for the five Portuguese vowels [10]. All these systems use sound feature extraction techniques and different machine learning algorithms. The full

understanding of these concepts can be very beneficial when having to choose the best way to develop the machine learning part of our system.

2.2.1 Mel-Frequency Cepstral Coefficients (MFCC)

MFCC extraction is probably the most used technique in speech recognition for obtaining sound features. The goal of this extraction is to remove everything that is not important for identifying the linguistic content, like for example background noise. When people produce sound, they are modifying the shape of their vocal tract, and this determines the sound that is produced. This sound can be represented by a short-time power spectrum, which is essentially a distribution of the energy as a function of frequency. The main objective of MFCCs is to represent this short-time power spectrum. MFCCs were used in all the systems mentioned above to extract the sound features: in VisualSpeech the numbers of coefficients that got the best scores were 10 and 12, in the Interactive Game for the Training of Portuguese Vowels 16 coefficients were used, and in the serious games with voice exercises with EP vowels 13 coefficients were used.
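As an illustration only, and not the exact pipeline of any of the systems above, MFCCs can be extracted with an off-the-shelf library such as librosa; the file name and the choice of 13 coefficients below are arbitrary placeholders:

    # Sketch: extracting 13 MFCCs from a recording (illustrative parameters).
    import librosa

    signal, sr = librosa.load("recording.wav", sr=None)      # keep the original sample rate
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)  # one 13-value vector per frame
    print(mfcc.shape)                                        # (13, number_of_frames)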

2.2.2 Linear Discriminant Analysis

While Carvalho used 16 MFCCs in her Interactive Game for the Training of Portuguese Vowels [10], she noted that, depending on the sounds they were trying to separate, some of the coefficients are redundant and not useful for producing an efficient machine learning algorithm. So they used Linear Discriminant Analysis (LDA) to select only the four features that give the most relevant information. Multiple algorithms can be used to select these features, but the one that provided the best results in their system was LDA. LDA is a supervised method that tries to separate different classes by identifying the attributes that account for the most variance between classes.
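A minimal sketch of this reduction with scikit-learn, assuming a matrix X of 16 MFCCs per sample and labels y for five vowel classes (with five classes, LDA yields at most four discriminant directions); the random data is purely illustrative and not taken from [10]:

    # Sketch: projecting 16 MFCC features onto the 4 most discriminative LDA directions.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 16))            # placeholder for 16 MFCCs per sample
    y = rng.integers(0, 5, size=500)          # placeholder labels for 5 vowel classes

    lda = LinearDiscriminantAnalysis(n_components=4)   # at most n_classes - 1 components
    X_reduced = lda.fit_transform(X, y)                # shape: (500, 4)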

2.2.3 Linear and Quadratic Discriminant Classifier

Carvalho used both a linear and a quadratic discriminant classifier in the Interactive Game for the Training of Portuguese Vowels. The two classifiers produced very similar results. Both of these algorithms are based on Bayes' rule, which gives the probability of an event based on some information that may be related to that event. Using this rule, these classifiers choose the most likely class given a set of attributes. The class-conditional probability in both of these classifiers is modelled using a multivariate Gaussian distribution. The main difference between the Linear and Quadratic classifiers is that the Linear classifier assumes that the Gaussians of all classes share the same covariance matrix, while the Quadratic classifier makes no assumptions about the covariance matrices. This means that the decision surfaces of the Linear and Quadratic classifiers are, respectively, linear and quadratic.
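In standard textbook form (not taken from [10]), each class k is modelled by a Gaussian with mean \mu_k, covariance \Sigma_k and prior \pi_k, and a sample x is assigned to the class that maximizes the discriminant

    \delta_k(x) = -\tfrac{1}{2}\log\lvert\Sigma_k\rvert - \tfrac{1}{2}(x - \mu_k)^{\top}\Sigma_k^{-1}(x - \mu_k) + \log\pi_k ,
    \qquad \hat{y} = \arg\max_k \delta_k(x).

When all classes share one covariance matrix (\Sigma_k = \Sigma), the term x^{\top}\Sigma^{-1}x is the same for every class and cancels out of the comparison, so \delta_k becomes linear in x; with class-specific \Sigma_k it stays quadratic, which is exactly the linear versus quadratic decision surface distinction described above.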


2.2.4 Cross Validation

The basic goal of cross validation is to give a more accurate error estimate for a given prediction model. This is achieved by partitioning the regular training set into K folds, using all except one of them to train the model, and then validating the model on the fold that was left out. This process is repeated for all K folds, and in the end the error estimates are averaged. This basic and yet widely used procedure is called K-fold cross-validation. It can be very useful, especially for small data sets, because this way more data is used to train the model, compared to a regular split into training, validation, and test sets. Another benefit is that this kind of procedure uses all of the data to perform the learning of the model; thus, unlike the regular split, the results are less dependent on the random choice of the training and validation sets.
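A minimal scikit-learn sketch of 5-fold cross validation; the random features, labels, and the choice of classifier are placeholders for whichever model is being evaluated:

    # Sketch: estimating accuracy with 5-fold cross validation.
    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 13))        # placeholder feature vectors (e.g. 13 MFCCs)
    y = rng.integers(0, 4, size=200)      # placeholder labels (e.g. the 4 sibilants)

    scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)  # one score per fold
    print(scores.mean(), scores.std())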

Diogo et al. proposed a double cross-validation experiment. They start with a one child out experiment that includes a 5-fold cross validation in each test. The one child out experiment consists of n tests, where n is the number of children that participated in the recordings. In each test, the recordings from child i are used as the test set, and all the others are considered the learning set. A regular 5-fold cross validation is then applied to this learning set. This means that the learning set is divided into five folds, and these are used to perform the cross validation. So in the end there are 5 classification models for each of the n tests. The final step of each test is to select the classification model with the highest accuracy on the validation fold. This selection is also performed one last time over all the n classification models, in order to select the best model. This last selection is itself a sort of cross validation, which is why the procedure is called a double cross validation.
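The outer loop of such an experiment can be expressed with a grouped splitter, so that no child contributes samples to both the learning and the test set. The sketch below is a simplification (it refits on the whole learning set instead of keeping the best of the five inner models), and the data, labels, and SVM classifier are placeholders rather than the procedure of Diogo et al.:

    # Sketch: one-child-out outer loop with an inner 5-fold cross validation.
    import numpy as np
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 13))            # placeholder feature vectors
    y = rng.integers(0, 2, size=300)          # placeholder labels
    child = rng.integers(0, 10, size=300)     # which child produced each sample

    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=child):
        inner = cross_val_score(SVC(kernel="rbf"), X[train_idx], y[train_idx], cv=5)
        model = SVC(kernel="rbf").fit(X[train_idx], y[train_idx])
        print("held-out child", child[test_idx][0],
              "inner mean", round(inner.mean(), 2),
              "test", round(model.score(X[test_idx], y[test_idx]), 2))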

2.2.5 Support Vector Machines (SVM)

In the serious games with voice exercises with EP vowels, the classifier used was an SVM with a Gaussian radial-basis function kernel. The original SVM algorithm performs a linear classification between two classes. It does this by constructing a hyperplane that maximizes the margin between the two classes. But this leaves us with the problem that not all classes are linearly separable. A kernel trick was later introduced to address this problem: instead of calculating every dot product as in the linear SVM, a non-linear kernel function is used. This allows the maximum-margin hyperplane to be fitted in a higher-dimensional space, which leads to a non-linear separation in the original feature space. The Gaussian radial-basis function is one of the multiple kernels that can be used for this kernel trick.
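In scikit-learn terms, the kernelised classifier amounts to the sketch below; the toy data has a deliberately circular class boundary, which a linear SVM could not separate but the RBF kernel can, and the C and gamma values would need tuning on real data:

    # Sketch: a two-class SVM with a Gaussian radial-basis-function (RBF) kernel.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.5).astype(int)   # circular, non-linear boundary

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")          # kernel trick: implicit mapping
    clf.fit(X, y)
    print(clf.score(X, y))                                 # training accuracy on toy data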


2.3 State-of-the-art Tools

2.3.1 Isolated Games

In recent years multiple systems have been developed to help keep children motivated while doing their exercises during and after the therapy sessions. These systems can be divided into two main classes: systems with and without sound analysis. In the following sections we discuss examples of systems of both types.

2.3.1.1 No Sound Analysis

There are many applications for speech therapy without sound analysis. While it may be useful to have a great variety of applications to help SLPs diversify the exercises that they present to the children and to keep them motivated, the lack of sound analysis limits these systems to the context of the session or to practicing at home with the parents' supervision.

Articulation

One of the big names in this kind of application development is LittleBeeSpeech [25], the company behind the group of apps named Articulation. They have multiple apps in English and also in Spanish, with many kinds of exercises and games.

One of the apps is Articulation Station [5], whose main goal is to practice the multiple sounds of the English or Spanish language, with the help of different games and exercises (figure 2.2). They have games like matching, cards with images, sentences with specific words containing phonemes in initial, medial or final position, stories and also questions.

Figure 2.2: Interface of Articulation Station

They also have Articulation Test Center [6], which is specifically designed to help speech therapists or parents assess the articulation and speech production of children. This app has two tests that can be performed (figure 2.3): one is the Screener, where the whole test

is based on the age of the child, meaning that both the sounds and images are age appropriate; the other is the Full Test, which allows full customisation of the test, meaning that everything can be tested, from specific phonemes to vowels.

Figure 2.3: Interface of Articulation Test Center

Articula

Articula [4] is known as the first European Portuguese (EP) application on the market, but although it was supposedly launched sometime around 2014, there is not much information about it, and it is not available on the market. From what was gathered, the exercises in this application were basically the same that speech therapists already have in non-digital format, and maybe this is why the application did not gain much traction.

Falar a Brincar

Falar a Brincar [15] is another EP application (figure 2.4), but this one was launched more recently, in 2015. This app is a compilation of different speech therapy exercises, like counting the number of syllables, identification, and so on. It is designed to be used with supervision, since in most exercises the person using it has to indicate whether the exercise was correctly executed or not.

Figure 2.4: Interface of Falar a Brincar

2.3.1.2 With Sound Analysis

All the above systems lack the ability to inform the user about his/her performance, which means that the children can only use the application when they are being supervised either by their parents or by the speech therapists. This minimizes the overall time that the children could use to practise their exercises. In this section, we will see some systems that use sound analysis to circumvent this problem.

sPeAK-MAN

sPeAK-MAN is a game based on the original Pac-Man (figure 2.5), where the core mechanics involve the vocalisation of certain words generated from a pool commonly used in clinical speech therapy sessions [38]. The game has three difficulty levels, and


the main objective is to say the names of the ghosts in order to scare them. To capture the words pronounced they used the Microsoft Kinect sensor, and to recognize the speech the Microsoft Speech SDK was used. Some problems were detected, such as a slight delay in the speech recognition system and some trouble detecting words spoken with different accents.

Figure 2.5: Interface of sPeAK-MAN

Star

The Star system was created to help treat children with articulation problems [17]. The game is set in a spaceship and the goal is to teach aliens selected words and sentences by spoken example. The game has increasing difficulty, so in the beginning the children must produce simple consonant-vowel syllables and then progress to words and phrases. The words are chosen in a way that lets the children better understand the small differences between minimal pairs. The speech recognition system used was a discrete Hidden Markov Model.

Interactive Game for the Training of Portuguese Vowels


As discussed above in section 2.2, the Interactive Game for the Training of Portuguese Vowels was developed around the five Portuguese vowels [10]. In this game the children must control a car using only the five vowels, each corresponding to a specific action: turn right or left, accelerate or decelerate, and stop the car. To develop the speech recognition system, they tested multiple machine learning algorithms and different features. The algorithms with the best results were the Linear Discriminant Classifier, the Quadratic Discriminant Classifier, and the Nearest-Neighbor Classifier, but Nearest-Neighbor had to be dropped from the final system because they needed an algorithm that could be implemented in a real-time system, and this one is not the most appropriate. In terms of features, their choice was 16 MFCCs, mapped to a 4-dimensional subspace using Linear Discriminant Analysis. They also tested adding the pitch to these four features, and had good results in simulation but not as good in the final real-time system.

Talker Talker is a web platform that was created to help children with speech disorders and also people who have suffered a brain stroke [37]. The part of the system more oriented to children can be separated into three parts: the games and activities where children can practice their exercises; the online browser, which allows recording and comparing the recording with a correct reproduction of the word/sentence; and the virtual mirror, which allows the children to compare their performance in real time with the correct recording.

Speech Adventure This game was developed to help children with cleft palate or lip, as even after surgery it is hard to understand what they are saying [34]. Since this is a serious problem and it takes a lot of time to overcome it, regular therapy alone is not enough, as children become easily tired of always doing the same exercises. So Rubin and Kurniawan developed a game for iPhone and iPad, where the children must pronounce the sentence written on the screen three times to pop the three balloons that are preventing the main character from crossing a bridge. OpenEars, a system based on the CMU Sphinx project, was used as the speech recognition engine.

Flappy Voice Flappy Voice [24] was created to help children with apraxia. It is a game based on Zombie Bird, which is a Flappy Bird clone. The objective is to keep the bird flying as long as possible, in between obstacles and without hitting them (figure 2.6). The main difference to Zombie Bird is that in this game children do not control the bird with a touch on the screen, but rather with the duration and amplitude of their voice. The game was developed in English, but it actually responds to any sound regardless of the language used, so it does not perform any specific sound analysis; it just reacts to sound. The game has two difficulty levels: free and assisted. In free mode, if the player hits an object the game ends, while in assisted mode a soft barrier is used on both sides of the obstacles, giving the bird more space to fly in between them. The game can also be parameterized with

specific quantities and heights for the obstacles, and also the vertical space in between them.

Figure 2.6: Interface of Flappy Voice

2.3.2 Complex Systems

Some systems were designed specifically to help patients train at home and to help the therapist choose the most appropriate exercises for the patient according to their current performance. This means that these systems have a way to automatically evaluate the patient's performance.

The core of this kind of system is always the same, and can be divided into three main components:

• The platform, where patients can do their exercises, either during a therapy session or at home, since these systems do not require any kind of supervision.

• The server that is running the speech-analysis engine and doing all the complex work, from storing every piece of data in the database to processing sound signals.

• Most of these systems also offer another platform, this one for therapists to monitor their patients' progress and assign them new exercises.

The need for such tools is not recent: for instance, the IBM Speech Viewer, which is one of the first systems of this kind to appear, had its second version released in 1992 [22][21], and the third in 1997 [13]. This third version already included most of the common features that can now be seen in this kind of system, like visual feedback of speech attributes, a full range of speech exercises, game-like exercises, and clinical management functions.


In the case of newer tools, some original features are starting to appear, like for instance in OLP [31] and TERAPERS [11], where the user can visualize the correct articulation of the sounds and words. Other systems, like ARTUR - the ARticulation TUtoR [7], even offer a virtual speech therapist to guide the patient during the exercises.

Another system that uses the layout described above, and with good experimental results, is the Remote Therapy Tool for Childhood Apraxia of Speech [32]. This system uses mobile applications in the form of games on the patient side to record the sound samples, and then sends the recordings to a server to be analysed and scored. On the other side, the therapist can review each individual recording and the automated scores, and adapt the training program if needed, all through a web interface.

Vithea [43] is another example of one of these systems, but this one was developed to help people with aphasia and targets the European Portuguese language. It has a virtual therapist that guides the patient through the different exercises (figure 2.7) and validates them using automatic speech recognition. This platform also allows the speech therapists to add more exercises and even new images with sound samples. For the speech recognition engine they integrated AUDIMUS [27], a system that uses Hidden Markov Models combined with a Multilayer Perceptron.

Figure 2.7: Interface of Vithea

2.3.3 Tools Comparison

Figure 2.8 shows a comparison between the two EP language systems. By analysing this figure we can easily realize that both of these systems lack some features that are important for the SLPs.

For instance VITHEA, given its focus on the aphasia disorder, which normally does not occur in children, does not have any kind of game designed for children, and it also does not work on mobile platforms. Given the focus on aphasia, there are no specific games for the sibilant sounds either.


Figure 2.8: European Portuguese language systems comparison.

So although this is a very good platform, it does not focus on the same problems that this dissertation is trying to solve.

The Interactive Game for the Training of Portuguese Vowels does not give much relevance to the speech therapy subject, even though it is a game that could be used for that purpose. The game can also be played by children, since it is relatively simple, but since it is a car racing game it is probably not equally attractive to both genders. Its focus is on the five Portuguese vowels and not on sibilant sounds, and, like VITHEA, it does not work on a mobile platform.

These are the only two systems, from all of the systems analysed above, that were developed for the EP language and have some sort of speech recognition system. This by itself already shows that there is a lack of systems for the EP language. Also, considering all the different types of exercises that the SLPs use, when we analyse these two systems it is simple to realize that they only cover a small percentage of those exercises. Another thing to notice is that neither of these systems offers any type of exercise for the sibilant sounds, so in the case of sibilant sounds SLPs do not have any alternative to their regular exercises. We can also observe that there has not been much focus on mobile platforms for this type of system. Nowadays this is becoming more important for the SLPs, since most children can only have one therapy session per week, and that does not give them enough time to practice their exercises. The only viable alternative is to use some sort of homework, but this has to be done with supervision from their parents, who sometimes do not have that time available. So, to sum up, what is needed is: a system that works on mobile platforms; that has some kind of speech recognition system, so it does not require any type of supervision; and that offers exercises for the sibilant sounds, since there are no alternatives to train these sounds.

Chapter 3

Game and Architecture

In this chapter we will discuss all the details of the game platform and architecture. Our proposal is a serious mobile game for the intensive training of the sibilant consonants, regarding the correction of distortion errors. Our solution must allow children to practice their exercises nearly anywhere, provided they have an internet connection, and still give them the necessary feedback on whether they are producing the correct sound or not. The aimed age group of this game is children from five to nine years old, because at these ages the regular exchange errors from phonological development have usually already disappeared.

The game incorporates the isolated sibilants exercise, which is an exercise frequently used by SLPs during the therapy sessions. The exercise consists of producing a specific sibilant sound for some duration of time. The main goal of the exercise is to teach the child how to produce each one of the different sibilant sounds, and also how to distinguish between them. The goal of our proposed solution was not to create a new speech therapy exercise, but rather to adapt one exercise that SLPs already use into a mobile game that children would find more attractive. While our main purpose is to offer a solution that can be used at home, the game can also be used at the therapy session, and it gives children the opportunity of having more intensive training even when it is difficult for them to attend more than one therapy session per week.

3.1 System platform and game engine

One of the goals of our solution is to give children the possibility to practice their speech therapy exercises for sibilant consonants either in the session or at home. So, the first thing that we must consider is what kind of platform to use. The possible solutions are using laptops and desktop computers, or using a more mobile format like tablets or smartphones. After some research we concluded that the tablet market in Portugal is still expanding (figure 3.1), yet slowly, but this is mainly because almost half of the population already has a tablet and does not need a new one1. This means that if the game is developed for tablets, it has the possibility of reaching a very large user base. Also, some of the SLPs that we contacted already use some sort of tablet applications, some of them even as homework, which means that some of the children are already used to this mobile format. Being developed for a mobile platform also gives other benefits, such as allowing the children to play comfortably anywhere at their house, and even outside, provided that they have an internet connection (as will be explained in section 3.3, the game needs access to the internet).

Figure 3.1: Forecast of number of tablet users in Portugal from 2014 to 2021 (in million users) [39].

The next challenge was to choose the platform, that is, to decide if the game was going to be developed for Android, iOS, or both. Even though Android has a larger market share [28], iOS still has a significant number of users. Thus our decision was to develop the game for both platforms. Instead of developing the game twice, once for each operating system, we decided to use Unity, because this allows the game to be developed only once while creating executable files for each platform.

3.2 Mobile game

As briefly explained above, the proposed game uses the isolated sibilants exercise. In order to play the game the child has to say one of the four sibilant consonants addressed by the game: [z], [s], [S], or [Z], for some duration of time.

1In 2014 the share of monthly active tablet users was 42.38%, and by 2021 this value is projected to reach 52.07% of the total population [40].


3.2.1 Game goal and scenarios

The game goal is to move a character from a starting position to a specific target. In order to control the character, children do not use a keyboard or any other regular input method; instead they use only their voice. Each character responds to a different sibilant sound, and to make the character move, the child has to say that specific sibilant correctly. The game includes four scenarios, one for each of the addressed sibilant sounds (figure 3.2). These scenarios were created with the help of a visual artist and with images from Freepik [16]. To create the scenarios we took into account the aimed age group of five to nine year old children. Each of the scenarios has a different character, which is related to the sibilant sound the child must produce. We also chose targets that relate to each game character, to be more appealing to the child.



Figure 3.2: The four game scenarios. (a) The scenario for the [S] consonant. (b) The scenario for the [Z] consonant. (c) The scenario for the [s] consonant. (d) The scenario for the [z] consonant.

The scenario in figure 3.2.d is used to train the [z] sibilant. The main character, a bumblebee, moves towards the beehive, which is the game target, while the child is saying the [z] sibilant correctly. The characters or goal of each scenario are always related

to a word in EP that starts with the specific sibilant used in the scenario. For instance, a bumblebee was chosen for the [z] sibilant scenario because the EP word for bumblebee starts with a [z] sibilant (zangão). The scenario used to train the [s] sibilant has a snake that moves towards the log while the child is saying the [s] sibilant correctly (figure 3.2.c), and the scenario used for the [Z] sound has a ladybug that flies towards a flower while the child is saying the [Z] sibilant correctly (figure 3.2.b). The EP word for snake starts with the [s] sibilant (serpente) and the EP word for ladybug starts with the [Z] sibilant (joaninha). Finally, there is a scenario for the [S] sibilant (figure 3.2.a). In this case the main character, a boy, has to run away from the rain until the end of the street, which is the game target. The rain comes from a gray rainy cloud that follows the boy while he moves. The boy is able to run away from the rain while the child says the [S] sound correctly. The EP word for rain starts with the [S] sibilant (chuva).

3.2.2 Visual cues

The idea of using a character whose name resembles the EP sound that the child must produce was given to us by one of the SLPs that we talked to. She uses a lot of different exercises and activities, and she always tries to find ways for the children to remember the exercise and the corresponding sounds. This is very useful, not only because she does not need to explain the exercise every time, but also because it helps the children to remember the sounds from one session to another. This is particularly useful when she is trying to see if children already know how to produce the sound without having heard a previous example. We decided to use the same characters that she uses in this activity (e.g. zangão for the [z] sound), since this is a simple and effective way to remind the children of the sound they must produce. Considering that our game is designed to be used at home, and not only during the therapy sessions, it is important that we do not rely on the SLP explanations to help children remember the exercise and the sound that they should produce. So, this type of visual cue is particularly useful in our use case, since this way children do not need the help of their SLP or an adult to remember the sound.

Apart from using characters related to the addressed phoneme, an interesting idea was to relate the movement of the character with the use of the vocal folds. This idea came from the flying movement of the bumblebee and the fact that the [z] sibilant is a voiced one. We did not want a very complex motion, in order not to distract the child, so the movement was simplified to a sinusoidal wave (as seen in figure 3.3), which can be modified by changing its amplitude and frequency. We applied the same concept to the movement of the ladybug, since the [Z] sound is also voiced. We decided to use a simple straight line movement for the remaining sounds, [s] and [S], for the snake and the boy running away from the rain respectively, because these are voiceless sibilants. This is achieved by using the same movement code, but with a zero amplitude parameter. This idea was shown to an SLP, who considered this cue very helpful, particularly because explaining to a child

that she must use the vocal folds is not easy, because most children in our age group do not even know what the vocal folds are.

Figure 3.3: Example of the sinusoidal movement of the bumblebee character.
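To make the relation between this sinusoidal cue and the scenario parameters more concrete, the sketch below (in Python, purely for illustration, since the game itself is built in Unity) shows how the character position could be computed from the elapsed time of a correct production. The function and parameter names are hypothetical, and setting the amplitude to zero yields the straight-line movement used for the voiceless sibilants.

import math

def character_position(t, start, end, travel_time, amplitude, frequency):
    # Position of the character t seconds after the start of a correct production.
    # start and end are (x, y) points; amplitude and frequency control the
    # sinusoidal cue used for the voiced sibilants ([z] and [Z]).
    progress = min(t / travel_time, 1.0)            # fraction of the path covered
    x = start[0] + progress * (end[0] - start[0])   # linear interpolation towards the target
    y = start[1] + progress * (end[1] - start[1])
    y += amplitude * math.sin(2 * math.pi * frequency * t)  # zero amplitude = straight line
    return x, y

# The bumblebee ([z]) oscillates while it flies; the snake ([s]) moves in a straight line.
bee_pos = character_position(1.5, (0, 0), (10, 0), travel_time=6, amplitude=0.5, frequency=1.0)
snake_pos = character_position(1.5, (0, 0), (10, 0), travel_time=6, amplitude=0.0, frequency=0.0)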

3.2.3 Visual Feedback

Visual feedback is very important to let the child know if she is producing the sound correctly or not. We decided to use the character's movements to give visual feedback to the child about her sound productions. The feedback is positive when the production is correct, in which case the character moves towards the goal. The character keeps moving while the child is producing the sibilant consonant correctly. If the production is not correct, the character stops moving to give negative feedback to the child. In the case of negative feedback, there are two available modes: one in which the exercise has to be repeated from the beginning, and another in which the character waits for a correct production to continue moving towards the target. This type of visual feedback is very intuitive for children and also a good way to motivate them to try to say the sibilant consonants correctly. The two different game modes can be seen in figure 3.4. In the game mode in figure 3.4.a, when the ASR system reports incorrectly classified productions, the mobile game gives negative feedback to the child, which ends the game, and the child then has the possibility to try again. In the game mode in figure 3.4.b, when the game gives negative feedback to the child, the character just stops moving, and the game keeps waiting for another correct production to give positive feedback to the child. When the child produces the correct sibilant sound and makes the character reach its goal, we show a congratulations message and give the child the opportunity to keep playing, as can be seen in figure 3.5.a. When we are using the game mode shown in figure 3.4.a, if the child is in the middle of a correct production and then starts producing an incorrect sound, or simply stops producing any sound, we show a message trying to motivate her to try again, and give her that possibility, as can be seen in




Figure 3.4: The two developed main cycles. (a) The game cycle ends when the child produces an incorrect sound production. (b) The game cycle waits for another correct production in order to give positive feedback to the child.

figure 3.5.b. When the game mode in figure 3.4.b is used, we never show the try again message, and always wait for the child to complete the exercise. At any time, we allow children the possibility to exit the exercise by pressing the home button in the top right corner.


Figure 3.5: The end game messages. (a) The message the child receives when she reaches the goal of the game. (b) The message the child receives when she stops producing the correct sibilant sound.

3.2.4 Parameterization of the game

The distance separating the initial position of the main character (starting point) from the target (end point) is related to the speech duration needed to make the character move

until it reaches the target. The isolated sibilants exercise can be done with shorter or longer productions of the sibilants by varying the distance between the starting point and the end point. This possibility is very useful for SLPs because every child is different and the exercise needs to be adapted to her current state or problem. While developing games for speech therapy, we must be very careful not to restrict the options of the SLPs. For instance, in this case, if the time that a child has to produce the sibilant sound was always the same, this would not suit every child in the same way. So, the best alternative is to give SLPs as many options and adjustable parameters as possible, so that they have all the needed flexibility to better adjust the exercise to the problems of each child. As a response to these observations, there are five main parameters that can be adjusted in all four scenarios:

• the starting and end points,

• the time the character takes to move from the starting to the end point,

• the amplitude and frequency of the movement for the voiced consonants.

3.3 System architecture

This game was designed to be used at home, so it cannot depend on the SLP to provide the necessary feedback to the child on whether the exercise is being correctly executed. This task could be given to the child's parents, but many times they are not available, or do not have the knowledge to assess if the exercise is correct or not. So our solution was to use an automatic speech recognition system to provide feedback to the child on whether the exercise is executed correctly. As it was already mentioned, the game will run on a mobile platform; however, an ASR system is very demanding, particularly for a mobile platform, and it could cause delays in the regular execution of the game. This type of delay is even more difficult to predict given that we could be dealing with multiple devices with a wide variety of computational power. The solution is to use a client-server architecture (as seen in figure 3.6), where the game only has to record the sound samples and send them to the server, and the server classifies the samples and responds whether these samples were correct sound productions. This type of architecture gives us great flexibility when choosing what to use for the ASR engine, and also allows changing the algorithm at any time without having to change the mobile application.

Client The main job of the client application is to send the sound samples the child is pro- ducing to the server. This is done by recording segments of sound until these reach a determined length. When a segment reaches its desired length the client sends it to the server. The client then has to wait for the server to classify these samples as correct or

incorrect reproductions. When the client receives the answer from the server, it can then provide visual feedback to the child. This feedback can be either positive or negative. In the case of positive feedback, the character starts or continues moving towards the goal; if the feedback continues to be positive, this cycle continues until the character reaches its goal. If the feedback is negative, the character either stops moving and the child has to repeat the exercise, or, in the other game mode, the character simply waits for another correct production to start moving again. Given this type of connectivity with the server, the client always needs to have a stable internet connection.

Server The server is responsible for classifying each sound segment that is sent by the client. To do this, it first extracts the features from the sound samples, and then uses the previously trained ASR system to classify them as correct or incorrect (section 4.2 discusses the features and algorithms used to classify the child productions). Then the server sends this answer back to the client and waits for the next speech segment. In order to train the ASR engine, child speech productions of the sibilant sounds were used (more details in section 4.1).

Figure 3.6: Client-server architecture. The client is the mobile application, that records segments of sound, and then sends them to the server. The server has to extract the sound features from each segment and uses an ASR system to classify them. Afterwards the server sends the response back to the client, which can now provide the feedback to the child.

3.4 Implementation details - modularity and extensibility

In order to develop our proposed solution, we have three main components: the mobile game, the server, and the ASR system. We decided that the best way to achieve our goal was to develop each one of them in a way that it could be completely independent from the others, as this would allow us to make changes in one component without affecting the remaining ones. In this section we are going to present some of the development choices that allowed this to happen.


3.4.1 Mobile Game

One of the most important things in our mobile game was the need to easily add more scenarios, either for the same sounds or for new ones; it should also allow building more complex scenarios. Our client-server architecture helps to split the mobile game from our ASR system, and this allowed us to build a simple main game cycle that is completely independent from both the scenario and the sound. To be able to get this type of extensibility, all scenarios share three main classes, which are responsible for recording the microphone input, for the connection to the server, and for creating the movement of the character. The main game cycle of each of the scenarios is processed as follows: the microphone class is always recording the input sound; when this recording reaches a certain length it is sent to the server by using the helper class that manages the connections to the server; and then the response from the server is used to compute the current state of the game: proceed (if the child is producing the correct sound) or stop (if the child is not producing the sibilant sound correctly). If the child is producing the correct sound, the movement class is used to produce the animation that moves the character towards the target, and the main cycle continues until the character reaches the target. If the child does not produce the correct sound, the game finishes and the main cycle stops its execution.

In listing 3.1 we have some pseudo code explaining this game cycle. We have our three main classes: mic for the microphone class, http for the class that connects to the server, and mov for the class responsible for the movement of the character. We start by checking if our audio segment already has the predetermined size in order to be sent to the server, in line 3. If it has, we fetch it from the mic class, in line 4, and then use the http class to send it to the server, in line 5. Line 8 represents the beginning of the game, where we are still waiting for a correct production of the sound to occur in order to start the movement of the character. In line 11, we check if we got a correct production, and if we have, we update the position of the character using the movement class, in line 12, and check if we are already at the target location, in line 14. If we are in the target location we then display a congratulations message to the child and end the game. When we do not get a correct production we enter the last else, which displays a try again message and ends the game. This code represents the cycle present in figure 3.4.a. The only change needed to this code in order to produce the game cycle in figure 3.4.b is the removal of the last else, since in this way we never display the try again message, and simply continue waiting for a correct production to move the character again.

Listing 3.1: Pseudo code of the main game cycle.

1  baseGameCycle() {
2
3    if (segmentHasEnoughLength) {
4      audio = mic.getAudioByteArray();
5      http.sendAudioToServer(audio);
6    }
7
8    if (charStopped && !correctProduction) {
9      // waiting for a correct production to start the movement
10   }
11   else if (correctProduction) {
12     charPos = mov.getCharNewPosition();
13
14     if (charPos == target) {
15       displayCongratsMessage();
16       endGame();
17     }
18   }
19   else {
20     displayTryAgainMessage();
21     endGame();
22   }
23 }

With this base main cycle and with the help of the three main classes, it is very simple to add other scenarios for these sounds, and even for new sounds. More complex scenarios can also be built using this base, as proven by our scenario for the [S] sound, which has two moving characters (the boy and the rain) and uses the same logic for the base main game cycle, also relying on our three main helper classes, with some additional code to control the movement of the rain.

3.4.1.1 Real time sound capture

The microphone class is responsible for the real time sound capture, which combined with the remaining classes allows us to provide visual feedback to the child. Without this class, we would not be able to get the audio data necessary to check the sound productions of the children, which would make this game completely dependent on the SLP decisions. We first need to add an audio source object to the main character, and then we can use that object to get the sound that is being captured by the microphone. In listing 3.2, we have the main method of the microphone class that is used to get a byte array of audio samples. We first need to calculate the size of the array that we will need to allocate. For that we get the position where the current recording is, in line 2, and subtract the last position. We can then initialize a float array that is going to be loaded with the samples, in line 6. In the end we just need to convert our float array to a byte array in order to send it with our http class.

Listing 3.2: Pseudo code of the main method in the microphone class.

1  getAudioByteArray() {
2    int pos = Microphone.GetPosition(micName);
3    int diff = pos - lastPos;
4
5    float[] samples = new float[diff];
6    audioSource.clip.GetData(samples, lastPos);
7    lastPos = pos;
8
9    return floatToByteArray(samples);
10 }

3.4.2 Server

As said above, our mobile game can be adapted to have multiple scenarios for our sounds, and even for new sounds. What allows this type of flexibility is the way our server was designed. We do not train our ASR system in the server. Instead we test and train our ASR system independently from the server, and when we get a good model for a specific sound, we serialize that classifier object into a file. We then place that file in a folder in our server. When the server initializes, it first loads every classifier from a file into an object, and then we simply have one method for each of the classifiers. So when a new game scenario is added, a new serialized classifier must be added, and also a new method that uses that classifier to predict the sound samples.

In listing 3.3 we present some pseudo code of the server. In line 2 we can see a classifier being loaded into an object, and when that classifier is loaded, we start the server in line 5. We also have a generic method for each classifier, presented in line 8. From the form that is sent to the server, we extract the audio rate and also the audio block, which comes in the form of a byte array. We then convert that byte array into a float array, in line 12. We then extract the MFCC matrix from those audio samples, and use our previously loaded classifier to predict the samples. Our result is a simple array of ones and zeros, depending on whether each sample was classified as correct or not. We then convert that array into three integer values: the total number of samples, and the number of correctly and incorrectly classified samples. We then send those three values as a JSON response to the mobile game.

Listing 3.3: Pseudo code of the server.

1  if __name__ == '__main__':
2      classifier_XX = load_classifier_XX()
3      # load the remaining classifiers in the same way
4
5      app.run()  # start server
6
7  @app.route('/classify_sound_XX', methods=['POST'])
8  def classifier_sound_XX():
9      rate = getRateFromForm()
10     audio = getAudioBlockFromForm()
11
12     data = convertFromByteToFloatArray(audio)
13     mfcc_feat = getMfccMatrix(data, rate, coefs=13)
14     result = classifier_XX.predict(mfcc_feat)
15
16     total_samples = len(result)
17     wrong_samples = len(result[result == 0])
18     correct_samples = len(result[result == 1])
19
20     return json(total=total_samples, C=correct_samples, W=wrong_samples)
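The pseudo code above can be turned into a small runnable server with little extra machinery. The sketch below is one possible concrete version for a single sibilant, assuming Flask, numpy, python_speech_features for the MFCC extraction, and a scikit-learn classifier serialized with joblib; the endpoint, form field and file names are placeholders and not necessarily the ones used in the actual implementation.

import numpy as np
from flask import Flask, request, jsonify
from joblib import load
from python_speech_features import mfcc

app = Flask(__name__)
classifier_s = load("classifier_s.joblib")   # previously trained and serialized model for [s]

@app.route("/classify_sound_s", methods=["POST"])
def classify_sound_s():
    rate = int(request.form["rate"])                        # sampling rate sent by the game
    audio_bytes = request.files["audio"].read()             # raw audio segment (byte array)
    samples = np.frombuffer(audio_bytes, dtype=np.float32)  # assumes 32-bit float encoding

    # One 13-coefficient MFCC vector per analysis frame; each frame is classified separately.
    mfcc_feat = mfcc(samples, samplerate=rate, numcep=13)
    result = classifier_s.predict(mfcc_feat)                # 1 = correct production, 0 = incorrect

    return jsonify(total=int(len(result)),
                   C=int(np.sum(result == 1)),
                   W=int(np.sum(result == 0)))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)   # load any remaining classifiers before this call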

3.4.3 The need for an ASR system

As explained earlier, our mobile game is designed to be used at home and without the need for the supervision of an adult. So, in order to be able to provide visual feedback to the child, we had to come up with a solution to know if the child is producing the sibilant sound correctly. We cannot simply use the sound spectrogram or other simple characteristics of the sound to determine if the sound production is correct. So our decision was to use an ASR system, which will be explained in detail in chapter 4. This ASR system is responsible for training all the classifiers that are then used in our server to predict the sound samples.

Chapter 4

Automatic Speech Recognition System

As discussed above, in order to provide feedback to the child, we use an ASR system to classify the child's sound productions. In this chapter we will discuss every aspect of our ASR system, from the gathering and treatment of the sound data, to the algorithms and corresponding scores, and some alternative classification solutions that were used to improve our accuracy test scores.

4.1 Sound data

To train the ASR system, we needed sound samples of children performing the isolated sibilant exercise with the different sibilant consonant sounds. To accomplish this, recordings were made in three schools in the Lisbon area. There were 90 children with ages between 4 and 11 participating in the recordings (table 4.1). There were a bit more girls than boys, and this difference is mainly in the children between five and seven years old. Between eight and eleven years old the number of recordings is well balanced between genders. We had two children that fall outside of our age range, one of four years old and one of eleven years old. These children were in the same class room as the others, so we did not refuse to record them and instead took the opportunity to increase our data set.

The recordings were made using a dedicated microphone and DAT equipment (the setup we used can be seen in figure 4.1). While we tried to make the recordings in a quiet room, there was noise coming from the outside, especially during recess. We used an acoustic foam around the microphone, which allowed us to isolate a bit more the productions of the child and have a better sound quality. However, this does not remove all noise: in the recordings we can still hear, for instance, doors closing, chairs moving and some recess noise, which is good in order to produce an algorithm that is more robust to outside noises.

Table 4.1: Number of children who performed the recordings.

Figure 4.1: The setup used for the recordings. On the left there is one of the DAT devices used, and on the right the microphone, with an acoustic foam around it.

During the recordings there was only one child in the room, and the SLP that gave her the instructions. Most of the times there was also one person from the BVS project present. Most of the recordings were made by an SLP (holding a master's in speech and language therapy), but others were also made by graduate students in speech and language therapy. The SLP (or a graduate student) would first explain to the child what the exercise was, and the sounds that she would have to produce. Then she would start the recording and ask the child to produce each sound by giving an example. First the SLP would ask for the four sibilants in the short version of the sound, and then she would ask for the long version, all in the same recording. So we collected eight different samples from each child, which correspond to short and long repetitions of the sibilant sounds [S], [Z], [s], and [z]. Table 4.2 shows the number of samples from each sound.


Table 4.2: Table with the number of samples for each sound.

For each child, a single file was recorded with all her sound productions. This means that each file includes both the sibilant sounds produced by the child and also every indication from the SLP. In order to be able to use this data, every file had to be split into separate samples, one for each of the eight child productions that it contains. This task could be done by hand, using software like Audacity: we would have to listen to each file, search for every child speech production, and then split it into the multiple productions. This would take a considerable amount of time, and it is very prone to human errors. Our solution to automate this process was to create an algorithm that measures the energy in the beginning of the file (to measure the sound level of the no-speech signal) and then identifies the peaks of energy, which have a high probability of being regions containing speech. For every peak of energy that we find we create a new file, and the file ends when we reach again the energy of the no-speech signal. In this way the files are automatically split into smaller files that contain single productions of sound. These single productions of sound can be either from the SLP or the child, and sometimes also noises. So we still need to listen to each of these smaller files to identify which ones are from the child, to label them, and to delete the remaining files (SLP speech, or noises). When all the files were split we obtained a total of around 2600 smaller files, from which 842 were the child sibilant productions we are interested in.
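The sketch below illustrates this energy-based splitting idea, assuming scipy to read the recordings; the window length, the threshold factor and the assumption that the first second of each file contains no speech are illustrative choices, not the exact values that were used.

import numpy as np
from scipy.io import wavfile

def split_on_energy(path, win=0.05, factor=4.0):
    # Split one long recording into candidate speech segments. The background level
    # is estimated from the first second of the file (assumed to contain no speech);
    # a segment starts when the short-term energy rises above factor * background
    # and ends when it falls back below that level.
    rate, data = wavfile.read(path)
    data = data.astype(np.float64)
    hop = int(win * rate)
    frames = np.array_split(data, max(1, len(data) // hop))
    energy = np.array([np.mean(frame ** 2) for frame in frames])
    background = np.mean(energy[: int(1.0 / win)])   # energy of the no-speech signal
    threshold = factor * background

    segments, start = [], None
    for i, e in enumerate(energy):
        if e > threshold and start is None:
            start = i * hop                      # energy peak: a new segment begins
        elif e <= threshold and start is not None:
            segments.append((start, i * hop))    # back to the background level: segment ends
            start = None
    if start is not None:
        segments.append((start, len(data)))
    return rate, [data[a:b] for a, b in segments]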

4.2 Automatic recognition of isolated sibilant consonants

Our ASR system can be divided into three main components: the classification algorithm, the feature extraction, and the selection of the training and test sets. All of these topics will be explained in greater detail below. In the end, we will also explain our training and testing methodology.

4.2.1 The classification algorithm

One of the goals of our game is to provide visual feedback to the child through the movement of the main character, which is only achievable with the use of our ASR system. In order for this feedback to be useful to the child, the game must react almost instantly to the child's voice. Given the inherent communication delay from the client-server

architecture, we have to consider carefully the classification algorithm in order to keep the time it takes to classify a group of samples as low as possible. This leads to the exclusion of some kinds of algorithms, like for instance lazy ones (e.g. KNN), as those usually take more time to classify samples, since they do not have a previous training phase to learn the model; instead they delay the learning until a request is made to the system, and since the results are always based on a local feature space, this process happens at every request. The game's time restrictions also require that some algorithms that are more complex and take more time to compute an answer must be avoided. Given the game's time restriction, we selected the following classifiers for this study: multiclass support vector machines (SVM) with radial basis function (RBF) kernel, linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA). An important characteristic of these classifiers is that all three of them support multiclass classification. Since we are trying to distinguish between the four different sibilant sounds, this is a good place to start, because it is a simple and effective way to do it. Another advantage of these classifiers is that, since they all share this characteristic (multiclass classification), it is simple to compare the three algorithms and the tested features. After our first experiments with these classifiers, we decided to try another approach, which was using a single-phoneme SVM. In this case we have one SVM trained for each of the four sibilant sounds, which means that each SVM is trained to recognize only one phoneme. In section 4.3 we will compare these four classifiers in greater detail.

4.2.2 Feature vectors

In order to train our ASR system, we need to extract features from our sound samples. We decided to use the raw Mel frequency cepstral coefficients (MFCC) as features, since these are commonly used in ASR systems. As an illustration, figure 4.2 shows the MFCC matrix obtained for a [z] sound. Our feature vectors consist of columns of the MFCC matrix. We used MFCCs as features to train all three classifiers, and obtained very good results with the combination of these features and classifiers.

Figure 4.2: The MFCCs matrix with 13 coefficients, obtained for a [z] sound.
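As an illustration of this feature extraction step, the sketch below computes a 13-coefficient MFCC matrix for one (placeholder) sound file using the python_speech_features package and treats the coefficients of each analysis frame as one labeled feature vector; this is an assumed implementation, shown only to make the representation concrete.

import numpy as np
from scipy.io import wavfile
from python_speech_features import mfcc

rate, signal = wavfile.read("child_042_z_long.wav")   # placeholder file name

# One row per analysis frame, 13 coefficients per frame. Each row corresponds to
# one column of the matrix shown in figure 4.2 and is used as one feature vector.
mfcc_matrix = mfcc(signal, samplerate=rate, numcep=13)

feature_vectors = mfcc_matrix                   # shape: (number of frames, 13)
labels = np.full(len(feature_vectors), "z")     # every frame of this file is a [z] production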


While historically the number of MFCCs used is thirteen, there are also studies that use different numbers of coefficients. In particular, Carvalho et al. had good results using different numbers of coefficients to classify isolated phonemes (the EP vowels) [10]. To start testing our different algorithms and solutions, the first thing that needed to be decided was the number of MFCCs to use, as this would allow us to get comparable results between algorithms. We started by training the SVM classifier with different numbers of MFCCs, ranging from 5 to 25 coefficients, as seen in figure 4.3. When using fewer than 9 MFCCs, the accuracy test score is really low, but it rapidly increases with the use of more coefficients. This means that when we use only a small number of coefficients, our classifier does not have enough distinct data to correctly separate the four different sibilant sounds, but when using more data (more coefficients), the four sounds can be more easily separated. The increase in score starts to slow down at around 13 MFCCs, and above that the differences between scores can easily be attributed to the margin of error of the training of each classifier. So we decided to use 13 coefficients, since our results did not show any particular improvement when using more than 13 coefficients. This result is also confirmed by most of the literature, which usually uses around 12 to 13 MFCCs [26, 29, 30, 35].

Figure 4.3: Accuracy test scores of the multiclass SVM classifier with RBF kernel, using different numbers of MFCCs.

4.2.3 Different options for the training and test sets

One thing that heavily influences both the validation and test scores is the way the training and test data are chosen. To address this problem, we present multiple options for the training and test sets. These options range from a simple naive split to create both subsets of data, to more sophisticated algorithms that try to reduce the possible bias in our results created by our data sets.


Considering the limitations of the SVM when dealing with large data sets (the time the SVM takes to train the model), we never use the complete MFCC matrices from the sound samples of every child. Instead we only use a subset of that data set, and then we further divide it into the training and test sets. This first data set is created in the following way: let us consider a number of samples that we would like to have, for instance 20000. We know that we have the recordings from 90 children, and each of them produced recordings for the four different sounds. But we only want the correct recordings to train our algorithm, so let us assume that of those 90 children, only 70 produced all sounds correctly. Now we just need to calculate how many feature vectors we must take from each sound of each child in order to have approximately the total number of samples that we want, which in this case is 72 feature vectors for each sound of each child, leaving us with a total of 20160 samples. We decided to consider only the children that had produced all the sounds correctly in order to have a similar number of feature vectors for each sound. Below we present our three options to split the data into the training and test sets (a code sketch illustrating the three options follows the list):

• The naive split In the naive split, we use the technique explained above to get the first subset of data. Then we use a random split to get both the training and test sets, using a 30% test size. This split is stratified, meaning it maintains the same ratio between classes in the two data sets. Using this naive split, it is very likely that samples from the same child, and even from the same sound file, are present in both the training and test sets.

• K% children test set The goal of this experiment is to simply divide the children into two groups, the training and the test group. We basically use the technique explained above to get the first subset of data, and then we consider the number of children that are in this subset and select a given percentage of them as the test set, with the remaining children as the training set. For instance, using the example above, where the subset contained 70 children, we would randomly take 21 of them (in this case 30%) as the test set, and the remaining 49 children would form the training set. Using this split there are not going to be any child productions in both groups, as children are either in one group or the other. The children are selected randomly for each group, so we may have considerable fluctuations in the scores in different runs of this split.

• One Child Out experiment Our last option is a double cross-validation, which we called our One Child Out experiment, and that was already described in section 2.2.4. The goal of this experiment is to see how well our model will behave with completely new data. This happens because the training set consists of n − 1 children, being n the total number

of children, and the remaining child is used as the test set. This process continues until all children have been used as the test set, and the corresponding models for all the training sets have been created. In the end we will have n models, which have all been trained using a slightly different training set, and the test set of each of them always contained unseen sound samples from a different child.
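The sketch below illustrates the three options with scikit-learn, using random placeholder data in place of the real MFCC feature vectors and child identifiers.

import numpy as np
from sklearn.model_selection import train_test_split, GroupShuffleSplit, LeaveOneGroupOut

# X: feature vectors (one row per MFCC frame), y: sibilant labels,
# children: the id of the child each frame came from (all placeholders).
X = np.random.rand(2500, 13)
y = np.random.choice(["s", "z", "S", "Z"], size=2500)
children = np.random.randint(0, 70, size=2500)

# 1) Naive split: stratified random 70/30 split; frames from the same child
#    (and even from the same recording) can end up in both sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y)

# 2) K% children test set: whole children are assigned to one set or the other.
gss = GroupShuffleSplit(n_splits=1, test_size=0.3)
train_idx, test_idx = next(gss.split(X, y, groups=children))

# 3) One Child Out: each child in turn is held out as the test set.
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=children):
    pass   # one model is trained per held-out child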

4.2.4 Model training methodology

In order to train our models and achieve comparable results, we had to develop a training methodology. The base idea is to define a group of rules that we must follow for every model that we train. These rules can be split into the main concepts that are further explained below: select a subset of all the features, split the data into the training and test sets, train the model and tune the parameters (in the case of the SVM) using the validation set, and finally test the model against the test set. Each classifier was trained in the following way:

• Once we computed the MFCC matrices from all sounds, we built the feature vectors by selecting columns from these matrices. These vectors were labeled with the corresponding sibilant. We used one class for each sibilant, so we have 4 different classes. Given the limitations of the SVM training complexity, we did not use the whole matrices. Instead we used the different options explained in section 4.2.3 to select both the training and test sets.

• We used a stratified 5-fold cross validation within our training set.

• To choose the best parameters (C and gamma) of the RBF kernel of the SVM classifier, we used a grid search to test different ranges of values for both parameters (a minimal sketch of this step is given after this list). LDA and QDA do not need any parameter fine tuning.

• In the end we test our best model against the test set, to get a better estimate of how our model will perform with new data.
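A minimal sketch of this methodology with scikit-learn is shown below; the data is a random placeholder and the C and gamma ranges are illustrative, not the ones actually searched.

import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

# Placeholder data; in practice these are the labeled MFCC feature vectors.
X = np.random.rand(2500, 13)
y = np.random.choice(["s", "z", "S", "Z"], size=2500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y)

param_grid = {"C": [0.1, 1, 10, 100, 1000],
              "gamma": [1e-4, 1e-3, 1e-2, 1e-1, 1]}

grid = GridSearchCV(SVC(kernel="rbf"), param_grid,
                    cv=StratifiedKFold(n_splits=5),   # stratified 5-fold cross validation
                    scoring="accuracy")
grid.fit(X_tr, y_tr)

print("best parameters:", grid.best_params_)
print("cross-validation score:", grid.best_score_)
print("accuracy test score:", grid.score(X_te, y_te))   # final check on the held-out test set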

4.3 Classification Results

In this section we are going to present all the classification results that were produced using the multiple ideas explained in the previous sections. We start by comparing our three multiclass classifiers, and then comparing the best one with our single-phoneme SVM, using our Naive split to produce both the training and test sets. Afterwards we pick our best classifier and test it using our different options to produce the training and test sets. At the end of this chapter we present some final conclusions of all our experiments.


4.3.1 Comparing the classifiers using the Naive split

As explained above, we start by comparing the accuracy test scores of our three multiclass classifiers (figure 4.4). As can be observed in the figure, LDA was the classifier with the lowest accuracy test score, a result that can be attributed to one big limitation of this classifier: it can only learn linear boundaries. QDA is around 3% more accurate than LDA, which can be easily explained by QDA not being restricted to a linear space to separate all four classes. Given this difference in scores between LDA and QDA, we can assume that our data set is not linearly separable. Our multiclass SVM was the classifier with the highest accuracy test score, giving us a test score almost 8% higher than QDA, which means that it will make fewer errors in the classification of the four different sounds.

Figure 4.4: Accuracy test scores of the three multiclass classifiers.

The classifier that gave us the best results was the multiclass SVM with the RBF kernel, with a considerable margin over the results of the other two classifiers, so we decided to focus on this classifier to try to achieve even better results. One problem of multiclass classifiers is that they have to adjust in order to give the best average score over all the classes that they are trying to separate. Sometimes this results in not providing the best fit to one particular class, because the overall average score would be lower. So, in order to fix this problem, we replaced the multiclass SVM with four SVMs, one for each class. The idea was to create one SVM classifier, also with the RBF kernel, to classify each of the four different sibilant classes. This solution allows us to fine tune each one of the four classifiers, which here we call single-phoneme SVMs, in order to improve our classification scores. To accomplish this single-phoneme SVM solution, for each classifier the labels of the data were changed to true or false depending on the sound that was being trained. For instance, when we trained the classifier for class [s], all samples from sibilant [s]

were labeled true, and the remaining samples were labeled false. This makes our single-phoneme SVM a binary classifier: a new sound sample for one specific sibilant either is, or is not, a correct production of that sibilant sound. Doing this for each sound let us achieve our goal of one classifier for each sound. Each single-phoneme SVM was trained with the same method that was described in section 4.2.4; the only thing that changed was the different labeling of the data to accommodate our new solution. Our four single-phoneme SVMs were then fine tuned to try to achieve the best accuracy scores for each of the four classes. Figure 4.5 shows the results obtained with this approach. Our first solution of using a multiclass SVM has the same score for all four sounds, since the classifier is always the same, and the score is an average over all the correct labels for all classes. Our solution of using single-phoneme SVMs, basically four different SVMs, one for each sound, brought us better results in the classification of all four sounds, by a margin of around 8%. In addition, it can be observed that these are quite high accuracy test scores (above 91%).

Figure 4.5: Accuracy test scores of the multiclass SVM, and four single-phoneme SVMs (both with the RBF kernel) for all four classes.
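The relabeling that turns the multiclass problem into four single-phoneme classifiers can be sketched as follows; the data is a placeholder and the C and gamma values are only illustrative, since each classifier is tuned independently with the methodology of section 4.2.4.

import numpy as np
from sklearn.svm import SVC

# Placeholder feature vectors and multiclass labels.
X = np.random.rand(2000, 13)
y = np.random.choice(["s", "z", "S", "Z"], size=2000)

single_phoneme_svms = {}
for sibilant in ["s", "z", "S", "Z"]:
    y_binary = (y == sibilant)                  # true for the target sibilant, false otherwise
    clf = SVC(kernel="rbf", C=10, gamma=0.01)   # each classifier is tuned independently
    clf.fit(X, y_binary)
    single_phoneme_svms[sibilant] = clf

# At game time only the classifier of the scenario's sibilant is consulted.
is_correct_s = single_phoneme_svms["s"].predict(X[:5])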

4.3.2 Comparing the different training and test sets

In the previous section we reached the conclusion that our single-phoneme SVM was the classifier that produced the highest accuracy test scores. So, in this section we are going to use it to compare our different options for the training and test sets (these options are explained in detail in section 4.2.3). Given the time it takes to train the SVM, we had to reduce the number of samples used to 2500, instead of the previously used 20000, especially because of the One Child Out experiment, since it has to train a new SVM model for each iteration of the algorithm. Figure 4.6 shows the results of these experiments using 2500 samples.


Figure 4.6: Accuracy test scores of the four single-phoneme SVMs, while using different training and test sets.

4.3.3 Naive split results

The results obtained here are a bit lower than some of the results presented previously; this is due to the change in the number of samples used to train the algorithm. We could not use the scores obtained before, since it would not be a fair comparison between all options, as our first results were obtained with more data samples. But even using considerably fewer samples than before, this option still gave us very good scores, always above 91%.

4.3.4 K% children test set results

This solution showed us that our scores from the Naive split may be a bit biased, yet it still provided us with very good results, the lowest being 87% and the highest 91%. This 3-4% gap between the two options can be attributed to two things. The first is that the Naive split sometimes has samples from the same child in both the training and test sets, and considering that there is not much variation during the production of these sounds by each child, as will be further explained below, this could be leading to a bit of a biased scenario, since we mostly have the "same" data in both data sets, which will obviously lead to higher classification scores. In figure 4.7, we present the evolution of the fourth MFCC over time, in a segment of the sound clips of a production of the [S] sound by two different children. With these two segments we can see two interesting characteristics of these sounds. The first is that even though there is some variation in the values of this particular MFCC, the values always fall into a certain range. For instance for child 001, the values vary between around 35 and 55, regardless of the time period. The second characteristic is that if we compare both figures, even though they both share this initial characteristic, the range

for child 344 is a bit different: in this case the values hover between around 20 and 40. This is observed in different MFCCs, as seen in figure 4.8, and the ranges almost always vary between children.

Figure 4.7: The progression of the MFCC 4 for sound [S] over time, for two different children.
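A simple way to inspect this behaviour is to compute, for each child and recording, the range of a given coefficient over time; the sketch below does this with python_speech_features, using placeholder file names.

import numpy as np
from scipy.io import wavfile
from python_speech_features import mfcc

def mfcc_range(path, coefficient=3):
    # Range of one MFCC (0-based index, so 3 is the fourth coefficient) over one recording.
    rate, signal = wavfile.read(path)
    feats = mfcc(signal, samplerate=rate, numcep=13)
    values = feats[:, coefficient]
    return values.min(), values.max()

# Within one child the values stay in a narrow band, but the band differs between children.
print("child 001, [S]:", mfcc_range("child_001_S_long.wav"))
print("child 344, [S]:", mfcc_range("child_344_S_long.wav"))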

This leads us to conclude that there is not much variation during the production of these sounds by a particular child, but the same cannot be said if we compare the productions of different children, as in this case they usually have different value ranges. The first characteristic helped us to conclude that we had a bit of a biased scenario with our first option, since different samples of the same sound from the same child are quite similar, and we are using them in both the training and test sets. The second characteristic helps to explain why we had a lower score with the K% children test set split, since it allows the outliers to be sufficiently different from the remaining samples to not be correctly classified. The second reason is that since we are using a 30% test size, it is possible that we are not getting enough distinct data to correctly identify all sound samples, and it is also very likely


Figure 4.8: The box plot of the MFCC values for each coefficient, for sound [S], for two different children.

that we have most of the outlier children in the test set. Figure 4.9 shows a scenario where this is very likely to have occurred. We can see that the validation score decreases a bit when we decrease the test size; this is due to having more children in the training set, meaning a higher probability of having outlier situations in this data set, which leads to a decrease in the validation score. When this occurs, the test score usually increases; in this case we see an increase of over 4% when comparing the 30% and 10% test sizes. The reason for this considerable rise of the test score is the opposite of the one for the validation score: when we have a smaller test size, we have a lower probability of having outlier samples in our test set, which leads to a higher test score. Another thing to notice is that if we compare the


Another thing to notice is that, if we compare the 30% test size model against the 10% one, the latter most likely results in a model that is more robust to less-than-perfect sound productions. We can derive this conclusion by comparison with the other two models' validation scores, which are both higher but show a considerable gap between the validation and test scores, an indication that those models may be a bit overfitted. Also, the model with a test size of only 10% has more children in the training set, so it should be better prepared to deal with completely new sound samples.
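A sketch of how the K% children test set experiment with different test sizes could be set up, assuming scikit-learn. GroupShuffleSplit keeps all samples of a child on one side of the split, which is the property this experiment relies on; the variable names and classifier settings are illustrative, not the exact parameters used in the thesis.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.svm import SVC

def evaluate_split(X, y, child_ids, test_size):
    """Grouped train/validation/test split by child, returning (validation, test) accuracy."""
    X, y, child_ids = map(np.asarray, (X, y, child_ids))

    # Children placed in the test set never appear in the training set.
    outer = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=0)
    train_idx, test_idx = next(outer.split(X, y, groups=child_ids))

    # A grouped validation split carved out of the training children.
    inner = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
    tr, val = next(inner.split(X[train_idx], y[train_idx],
                               groups=child_ids[train_idx]))

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")
    clf.fit(X[train_idx][tr], y[train_idx][tr])
    return (clf.score(X[train_idx][val], y[train_idx][val]),
            clf.score(X[test_idx], y[test_idx]))

# Compare a 30% and a 10% children test set, as in figure 4.9:
# for ts in (0.3, 0.1):
#     val_acc, test_acc = evaluate_split(X, y, child_ids, ts)
#     print(f"test_size={ts}: validation={val_acc:.2%}, test={test_acc:.2%}")
```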

Figure 4.9: Comparison between the accuracy scores of the validation and test sets, while using different test set sizes.

4.3.5 One Child Out experiment results

The results of this experiment were not as good as expected. The validation score is more or less similar to the results that we previously had, but the test score is always considerably lower than what we obtained with the other methods. After some testing, we found some possible causes for these lower results. We started by listening to the files of each child that produced the lower test scores. What we found is that these particular recordings are lower in volume, have more background noise than other recordings, contain sounds produced by the child that are more strident than normal, or show other variations that fall outside the norm of most of our sound productions. This happens because the One Child Out experiment tries to maximize the validation score. The validation score is always highest when the recordings of one of these outlier children are not in the training set, which, as a consequence, means that the samples from this child are used as the test set. Since the samples in our test set are then just a group of outliers, this leads to a considerably lower test score. This happens because, even though we have a lot of samples, we only have around 90 children, and out of those only a few are these outlier cases. If these few outliers are not present in our training set, our algorithm has no clue how to deal with them, which means that when these cases end up in our test set they lead to these lower scores.


In order to try to validate this theory, we decided to run the One Child Out experiment a few times and, at each run, if the test score was still much lower than our validation score, we removed the child that was in the test set from our data set. In tables 4.3 and 4.4 we can see the results obtained in the multiple runs of the experiment performed for the training of the sound [S].
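The iterative procedure described above can be sketched as follows, assuming scikit-learn's LeaveOneGroupOut to implement the One Child Out splits; the 15% gap threshold, the inner validation split, and the helper names are illustrative choices, not values fixed by the thesis.

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, GroupShuffleSplit
from sklearn.svm import SVC

def run_one_child_out(X, y, child_ids):
    """One model per held-out child; return (validation, test, child id) of the best model."""
    best = None
    for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=child_ids):
        # Carve a grouped validation set out of the remaining children.
        inner = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
        tr, val = next(inner.split(X[train_idx], y[train_idx],
                                   groups=child_ids[train_idx]))
        clf = SVC(kernel="rbf", gamma="scale").fit(X[train_idx][tr], y[train_idx][tr])
        scores = (clf.score(X[train_idx][val], y[train_idx][val]),
                  clf.score(X[test_idx], y[test_idx]),
                  child_ids[test_idx][0])
        if best is None or scores[0] > best[0]:
            best = scores
    return best

def remove_outlier_children(X, y, child_ids, gap=0.15, max_removals=10):
    """Repeat One Child Out, dropping the held-out child while the val/test gap stays large."""
    X, y, child_ids = map(np.asarray, (X, y, child_ids))
    removed = []
    for _ in range(max_removals):
        val, test, child = run_one_child_out(X, y, child_ids)
        if val - test <= gap:        # gap closed: remaining children are not outliers
            break
        removed.append(child)        # treat this child's recordings as outliers
        keep = child_ids != child
        X, y, child_ids = X[keep], y[keep], child_ids[keep]
    return removed, val, test
```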

Table 4.3: Validation and test scores of the multiple runs of the One Child Out experiment, removing the child that was in the test set at the end of each run.

In table 4.3 we can see the results of the best model found and also the average of all models (remember that we are producing one model per child). This difference in scores, mainly between the test score of the best model and the average test score, was what gave us a clue that something else was happening. When we compare these results with all the others obtained before, this is the only case with such a discrepancy between scores. One thing we can observe in this table is that all scores slowly rise over the multiple iterations of our experiment. This is an expected result, since the child removed at each iteration had sound samples with more background noise, or with less-than-perfect sound productions, which harm either the test score or the validation score, depending on the data set in which they are included. After running the algorithm 5 times and deleting the recordings of 5 children, at the sixth run we got the expected results: a validation score of around 90% and a test score of 95%. What happened was that, since we had removed all of the outlier situations from our data set, our validation score improved, and when testing this model against the test child, the samples now present in the test set were perfect examples of the sound production, so our model got very good results, even better than with our other solutions, and using only 2500 samples as opposed to the usual 20000. In table 4.4 we can see the confusion matrix counts that our best model produced. This gives an easy way to understand the main problem with the classification of the test child, which, combined with listening to the recordings of each test child, allowed us to reach some good conclusions. When we test our model against the test child, we are testing it against samples from all four sounds, so we expect different numbers of true positives and true negatives.


Table 4.4: Number of false negatives and false positives of the multiple runs of the One Child Out experiment, removing the child that was in the test set at the end of each run.

For instance, when the total number of samples is 36, we have 9 sound samples of each sound, so we expect 9 true positives and the remaining 27 samples as true negatives, since they belong to the other 3 sounds that this model was not trained to recognize. The same happens when we have 40 sound samples in total: we expect 10 true positives and 30 true negatives. A false negative means that a sample that was supposed to be positive was classified as negative, which means that either our model is a bit overfitted, or the child's production of the [S] sound was not a good representation of it, at least by the standard of the samples used in the training set. When listening to cases where this happened, we found that either the recordings are too low in volume, have more background noise than desired, or the child simply produces the sound a bit differently. The opposite occurs with the false positives, which were supposed to be classified as negative samples. In this case the child produces one of the other sounds in a way that is too similar to the [S] sound, which makes our model consider it a correct production of that sound.
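The expected counts described above can be checked directly from the predictions of one single-phoneme classifier on a held-out child. A minimal sketch, assuming scikit-learn's confusion_matrix; the variable and function names are illustrative.

```python
from sklearn.metrics import confusion_matrix

# y_true: 1 for the child's samples of the target sibilant, 0 for her other sounds.
# y_pred: output of the corresponding single-phoneme classifier on the same samples.
def summarize_test_child(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    total = tn + fp + fn + tp
    # With 36 samples (9 per sound) we expect 9 true positives and 27 true negatives.
    print(f"{total} samples: TP={tp} (expected {total // 4}), "
          f"TN={tn} (expected {3 * total // 4}), FP={fp}, FN={fn}")
    return fp, fn
```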

It is possible that some of these sound productions should not have been labelled as correct in the first place, in which case these children would not have been present in the data set. This is something that we must try to validate with some SLPs; if they agree that these productions are not correct, then this method would produce very good results without having to remove any children. Another thing to consider is that these scores were obtained using only 2500 sound samples, and they are already higher than the ones obtained with our first option, which used 20000 samples.

For now, these results are not a good representation of our model because we removed all the outliers, which means that we are in a perfect scenario for both the training and test sets, so it is to be expected that we would get good results. The other problem is that these particular results are also not a good representation of how our model will perform with completely new data, which is not always going to be "perfect".


4.4 Discussion

In this chapter we presented the results of our different classifiers and tested multiple options for creating the training and test sets. Regarding the initial classifiers, our multiclass SVM with the RBF kernel always provided the best accuracy scores, followed by QDA and then LDA. In the next iteration we decided to use what we called a single-phoneme SVM, which is essentially one SVM trained to classify only one sound, meaning that we must train four different SVMs in order to recognize our four sibilant sounds. This single-phoneme SVM approach allowed us to fine tune each of the four SVMs to better classify its sound. This brought considerable improvements over our multiclass SVM and provided accuracy test scores above 91% for every sound. These results were obtained using our Naive split, which has the problem of possibly containing samples from the same child in both the training and test sets, which could be influencing our results. In order to test this, we presented two other options for selecting the training and test sets. In the K% children test set, the children are divided into two initial groups: one group is used as the training set and the other as the test set. With this method the samples from one child are only part of one of the two groups; they never appear in both. This method gave us results slightly worse than the Naive split, by only around 3 to 4% when using 30% of the children in the test set, but these results are not biased, since we do not have samples from the same child in the training and test sets. We also tried reducing the test set size to around 10%, in order to train our model with more distinct data, and we were able to almost reach the same results as in the Naive experiment. Our final option was the One Child Out experiment, which brought us results that were not what we expected. The validation score coincided with what was seen in our previous experiments, but the test scores were considerably lower, most of the time more than 15% below the validation scores. When trying to understand what was causing these results, we decided to check the recordings of the child that was being used as the test set. When listening to these recordings, we realized that they were either lower in volume, had more background noise than the rest of the recordings, contained sound productions that were more strident than normal, or showed other variations that fall outside the norm of most of our sound productions. This was causing them to be incorrectly classified, since these samples were initially labelled as correct. We then decided to remove this child from the data set and run our algorithm again. We got similar scores, and when listening to the recordings of this new child, we found that the same thing had occurred: these recordings were also outliers when compared to the remaining data. When all the outliers were removed from the data set, this algorithm brought us our best results so far, with an accuracy test score of 95% for the [S] sound. These results may be somewhat optimistic, since we removed all the outliers that we had in our data set, but if these samples are indeed incorrect, then these scores are not biased and represent the kind of accuracy that we could expect when using a larger data set.

This is a good estimate of the kind of scores that we would get with a larger data set while using a more regular split (like, for example, our K% children test set), because with this option we are using n − 1 children as the training set, n being the number of children in the initial data set. This means that we are using the largest possible amount of distinct data, and then testing our model against the test set. If we had a larger data set, when using our K% children test set option we would see results similar to those obtained here, since there would be enough distinct data in the training set to capture all the small variations in the productions of these sounds, which would then lead to higher accuracy test scores when classifying the corresponding test set. In the end we decided to create two different models, each one with four classifiers, one for each sibilant sound. We used the parameters that got us the best results with the Naive split, and also with the K% children test set. We trained both models with these parameters, using the complete data set, and then serialized both models to files, in order to place them in our server (as explained in section 3.4.2, the models are then deserialized and loaded into the server when it is initialized). We did some testing with both models and did not find any major differences in classification.
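Training the final single-phoneme classifiers on the complete data set and serializing them for the server, as described above, could look like the sketch below. joblib is a common choice for persisting scikit-learn models, but the thesis does not state which serialization library was used, so treat the file format, names, and parameter values as assumptions.

```python
import joblib
from sklearn.svm import SVC

SIBILANTS = ["S", "Z", "s", "z"]

def train_and_serialize(features, labels, params, model_dir="models"):
    """Train one SVM per sibilant on the full data set and write it to disk."""
    for sound in SIBILANTS:
        # labels[sound] is 1 where the sample is the target sibilant, 0 otherwise;
        # params[sound] holds the best hyper-parameters found for that sound.
        clf = SVC(kernel="rbf", **params[sound]).fit(features, labels[sound])
        joblib.dump(clf, f"{model_dir}/svm_{sound}.joblib")

# On the server side, the models are deserialized once at start-up:
# classifiers = {s: joblib.load(f"models/svm_{s}.joblib") for s in SIBILANTS}
```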


Chapter 5

Feedback from the SLPs and children

5.1 Feedback from children

This work was presented at the European Researchers Night (ERN) 2017, at the National Museum of Natural History and Science in Lisbon. This is an event open to the general public that runs simultaneously in several cities in Europe. The goal of ERN is to show the citizens the importance and the impact of science in the development of society, and also to demystify the image of researchers as someone distant and inaccessible in their everyday life. During the event researchers demonstrate and explain their work to the public through activities.

At the ERN 2017 we had the opportunity to show our work to both children and their parents. We had the game set up to allow children to try it, and at the same time explain to parents the importance of these types of games for children that are attending speech therapy (figure 5.1). We used a laptop, a screen and an external microphone.

Around fifty children tried the game. We were able to collect some opinions and to see the children's reactions to the characters and the way of controlling the game. Not all the children who tried the game were attending speech therapy, but our goal here was not to understand if the ASR was correctly classifying all the speech productions or if it contributes to faster speech improvements. We aimed to assess if the graphics, the game characters and the type of interaction appealed to children. While the game was designed for children from five to nine years old, we had children from all ages trying the game, from some very young children to some twelve year old children. All children trying the game at ERN liked the scenarios and characters regardless of their age. They also reacted very well to the type of interaction with the game, that is, controlling the characters only with their voice.


Figure 5.1: Child playing the proposed game during the European Researchers Night 2017.

5.2 Feedback from SLPs

During the development of our solution we were always in contact with multiple SLPs. This allowed us to have impartial feedback about our work, and also to confirm whether our solution was adequate to what they needed. This feedback was very important in order to create a solution that would be useful for both the SLPs and the children. For the SLPs, it provides a simple way of giving children homework to practice the sibilant sounds. Since we did not create a new exercise, but instead adapted one that the SLPs already use, the transition is very simple for both the SLPs and the children: the SLP does not need to explain a new game to each child, and the children are already quite familiar with the exercise, they just need to adapt to the new platform. For children, our solution gives them the opportunity of experiencing the benefits of intensive training, even when they can only attend one speech therapy session per week. This occurs because, with our solution, they can practice their exercises at home in order to increase their weekly training time, which leads to faster improvements and also more motivation, which in turn helps children improve even further. These faster improvements and the extra motivation will have a positive impact on the children's lives.

In an initial phase, we created a simple prototype of the game that was used to demonstrate our proposed scenarios to some SLPs and explain our concept and the main mechanism of the game. We collected feedback from four SLPs that work with children: one with four years of experience, two with eight, and one with thirteen years of experience. All four SLPs agreed that the scenarios were adequate for the aimed age group, and that the game would provide a good way of training the sibilant sounds. Another interesting aspect is that all of them already use some sort of homework, but not on a mobile platform, given the lack of this type of systems for EP. Also, all of them noticed a faster improvement when children did homework, which can probably be attributed to the extra weekly training time.

When asked whether they would rather use our system in their sessions or at home, the results were a bit mixed. One SLP only wanted to use the system during the sessions. The remaining ones were interested in trying it a few times during their sessions first, in order to understand how children react to the system and how the system performs. But they all agreed that if the system is robust enough, and if children react well to it, they would use it as homework. In our opinion, this type of feedback is expected: given the lack of this type of systems for EP, SLPs are not yet completely confident in them. They first need to experiment with the system to understand that using an ASR system combined with the game's visual feedback is a good alternative that reduces the need for adult assistance to give feedback to the child.

After our game was completely developed, we got feedback from one SLP with four years of experience with children. We explained the game concept to her and also performed some demonstrations. The SLP considered that our system is a good way for children to practice the sibilant sounds, and that the scenarios are very interesting for children within the aimed age group. Like some of the SLPs we talked to previously, she would first use our system during the sessions to make sure that everything works as expected, in order not to mislead the children. But she added that once she verified that the game works correctly, she would use it as home training. She already uses other types of homework with her patients, with very good results. Home training with the proposed game would have the advantage of increasing the motivation of the child and reducing the need for adult supervision. The idea of using the movement of the character as a visual cue for the use of the vocal folds was something that she considered very important, since she already tries to incorporate those types of cues when performing the exercises with children. As an additional visual cue, she suggested incorporating the same idea into the background of the scenarios: using more straight lines in the scenarios where the vocal folds are not used, and the opposite where they are.


Chapter 6

Conclusion and future work

Here we proposed a serious game for mobile platforms for correction of sibilant distortion in EP. The game is designed for home training and uses the isolated sibilants exercise. As identified by the project team, there is a lack of systems that focus on the EP sibilants, and even those that do, do not have exercises to practice the isolated sibilants exercise. Another very important aspect is that children should not be restricted to only practice their exercises during their therapy sessions or when their parents have time to supervise them. Instead, children should be able to do these exercises at home, even when parents do not have time, since this would allow them to perform their exercises more frequently, which can lead to faster improvements, thus motivating them even further. Our solution solves both these problems by offering a mobile game with which children can practice their EP sibilant sounds, with the only restriction of having an internet connection. This allows them to perform their exercises nearly anywhere.

With the help of an ASR system that classifies the child speech productions, the game is controlled only by the child's voice. In order to achieve the game goal, the child has to perform the isolated sibilants exercise correctly. In addition, the game gives immediate visual feedback to the child about his speech. This is very important because this way the child knows in real time if he is performing the sound production correctly or not, which allows him to adjust the exercise if needed.

The game is implemented with a client-server architecture. All the complex computation of the ASR system is done on the server, which allows the game to be played even on low end devices, thus allowing more children to benefit from our solution. Our ASR system got very good results, with accuracy test scores of above 91% when using SVMs and MFCCs with our naive split.

We also present other training approaches. Our second option of separating the children in the training and test sets (the K% children test set) gave us some clues that we probably have some outlier children recordings. This is indicated by the considerable variance in the training and test scores, especially when varying the test size. This possibility was further confirmed by our One Child Out experiment, which gave really bad results when we first tested it. Further analysis of the results led us to believe that these low classification test scores could only be explained by some outliers in our recordings. We then decided to find the children in the test set that gave these low test scores and remove them from the data set, repeating this until there was a significant change in the scores. After just a couple of iterations the results were even higher than what we had achieved with our first solution. After listening to the productions of the children that were removed from the data set, we confirmed our initial idea: these children had indeed some sound productions that are a bit different from those of the remaining children.

This particular characteristic of the algorithm made us realize that it can also be used to detect outliers in new data. For instance, if we get new children's sound recordings that are already labelled, we can run our algorithm and see if this difference in scores occurs. If it does, it means that we probably have some outlier recordings that were either incorrectly labelled or are productions of the sounds that are significantly different from the ones we have. Looking at the number of false negatives and false positives, we know exactly which recordings gave us these scores, so it is easy to identify them and listen to them one by one. This opens up different possibilities, like, for instance, using some of our previous algorithms to classify and label new sound samples, instead of listening to all of them and labelling them manually. We can then use our One Child Out technique to check if we have any outliers in our new data set, that is, check if any of the recordings were incorrectly labelled by our algorithm. This type of approach would provide us with a fast way to classify new sound recordings and, at the same time, give us enough confidence in our labelling.

Given that we have one classifier for each of the four sounds, we can also use our system to perform some simple assessment and diagnosis of the sounds the child has more difficulty producing, or of whether the problem is between the voiced and unvoiced sounds, or some sort of articulation problem. For instance, let us assume the child is producing the sound [S]: we take that recording and classify it using our [S] classifier, and the classifier considers that production incorrect. We then take the same recording and classify it using our [Z] classifier; the expected outcome is an incorrect production of the sound. But if our classifier considers this sample correct, it means that the child probably confuses the voiced and unvoiced sounds. In a similar manner, if instead of using our [Z] classifier we used the [s] classifier, and it considered the sound production correct, it would mean that the child has some difficulty with the articulation of the sound, since instead of producing a [S] sound he is actually producing a [s], which is also unvoiced but produced in a different location. This concept could be applied to all four sounds to try to give the SLP a quick idea of the main difficulties of the child.
Even if the child produces all four sounds correctly, we have other metrics, like, for instance, the average of correctly classified samples, which could be used to see which sounds are harder for the child to produce.
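The diagnostic idea described above, namely cross-checking one recording against several single-phoneme classifiers, can be expressed in a few lines. This is a sketch of the reasoning only, assuming the per-sound classifiers already exist; the function name is hypothetical.

```python
def diagnose_S_production(features, classifiers):
    """Guess why an intended [S] production was rejected by the [S] classifier.

    `classifiers` maps each sibilant ("S", "Z", "s", "z") to its trained
    single-phoneme model; `features` are the MFCCs of one recording.
    """
    if classifiers["S"].predict([features])[0] == 1:
        return "correct [S] production"
    if classifiers["Z"].predict([features])[0] == 1:
        # Same place of articulation, but voiced: likely a voicing confusion.
        return "possible confusion between voiced and voiceless sounds"
    if classifiers["s"].predict([features])[0] == 1:
        # Voiceless but alveolar instead of palato-alveolar: articulation issue.
        return "possible articulation (place of production) difficulty"
    return "production not recognized as any trained sibilant"
```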

In the future we need to work on a way to validate our game solution and our ASR system with real users. The best way would be to test our game during some therapy sessions, since this would allow us to use the game with children that really need it, and to see if they understand the base game and the scenarios, and find it motivating. This way of testing would also allow us to have feedback from the SLPs to confirm whether our ASR system is performing as expected. Since we would be testing it with children that are in speech therapy, this would be a good way to understand if our model works correctly for different children, with different problems, and in different phases of their progress. After this test is complete, another interesting way to validate our solution is to give it as homework to some children, selected by the SLPs according to their needs, and see if after some weeks they show any improvements.

As future work we will explore more machine learning algorithms to improve our scores, such as artificial neural networks and multiple deep learning techniques. However, we must keep in mind that we cannot develop a computationally expensive system because of our response time restrictions. Thus the next goal is to balance more complex algorithms to try to improve our ASR system performance, while at the same time maintaining the very low response time that we currently have.


Bibliography

[1] American Speech-Language-Hearing Association (ASHA) - Child Speech and Language. url: http://www.asha.org/public/speech/disorders/ChildSandL.htm (visited on 01/19/2016).
[2] American Speech-Language-Hearing Association (ASHA) - What is Language? What is Speech? url: http://www.asha.org/public/speech/development/language_speech/ (visited on 01/19/2016).
[3] American Speech-Language-Hearing Association (ASHA) - Speech Sound Disorders: Articulation and Phonological Processes. url: http://www.asha.org/public/speech/disorders/SpeechSoundDisorders/ (visited on 07/28/2017).
[4] Articula by Dwitmee. url: http://www.dwitmee.com/ (visited on 01/19/2016).
[5] Articulation Station. url: http://littlebeespeech.com/articulation_station.php (visited on 01/15/2016).
[6] Articulation Test Center. url: http://littlebeespeech.com/articulation_test_center.php (visited on 01/15/2016).
[7] ARTUR - the ARticulation TUtoR. url: http://www.speech.kth.se/multimodal/ARTUR/ (visited on 01/16/2016).
[8] J. Barratt, P. Littlejohns, and J. Thompson. “Trial of intensive compared with weekly speech therapy in preschool children.” In: Archives of Disease in Childhood 67.1 (1992), pp. 106–108.
[9] S. K. Bhogal, R. Teasell, and M. Speechley. “Intensity of aphasia therapy, impact on recovery”. In: Stroke 34.4 (2003), pp. 987–993.
[10] M. Carvalho. “Interactive game for the training of portuguese vowels”. MA thesis. Faculdade de Engenharia da Universidade do Porto, 2008.
[11] M. Danubianu, S.-G. Pentiuc, O. Andrei, S. Marian, N. Ioan, U. Doina, and M. Schipor. “TERAPERS - intelligent solution for personalized therapy of speech disorders”. In: (2009).
[12] G. Denes, C. Perazzolo, A. Piani, and F. Piccione. “Intensive versus regular speech therapy in global aphasia: a controlled study”. In: Aphasiology 10.4 (1996), pp. 385–394.


[13] F. Destombes. “The development and application of the IBM speech viewer”. In: Interactive Learning Technology for the Deaf (1993), pp. 187–196.
[14] M. Diogo, M. Eskenazi, J. Magalhães, and S. Cavaco. “Robust scoring of voice exercises in computer-based speech therapy systems”. In: Signal Processing Conference (EUSIPCO), 2016 24th European. IEEE. 2016, pp. 393–397.
[15] Falar a Brincar. url: https://falarabrincar.wordpress.com/ (visited on 01/16/2016).
[16] Freepik. url: http://www.freepik.com/ (visited on 07/28/2017).
[17] S. Ganapathy, S. Thomas, and H. Hermansky. “Comparison of modulation features for phoneme recognition”. In: Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE. 2010, pp. 5038–5041.
[18] A. Grossinho, I. Guimaraes, J. Magalhaes, and S. Cavaco. “Robust phoneme recognition for a speech therapy environment”. In: Serious Games and Applications for Health (SeGAH), 2016 IEEE International Conference on. IEEE. 2016, pp. 1–7.
[19] I. Guimarães. A Ciência e a Arte da Voz Humana. ESSA - Escola Superior de Saúde do Alcoitão, 2007.
[20] P. K. Hall, L. S. Jordan, and D. A. Robin. Developmental apraxia of speech: Theory and clinical practice. Pro Ed, 1993, p. 200.
[21] IBM Speech Viewer 2. url: http://www.speech.cs.cmu.edu/comp.speech/Section1/Aids/speechviewer.html (visited on 01/18/2016).
[22] Info World Magazine. Page: 28. url: https://books.google.pt/books?id=KT0EAAAAMBAJ&lpg=PA28&ots=_zmjCW8rBT&dq=ibm%20speech%20viewer%20release%20date&hl=pt-PT&pg=PA28#v=onepage&q&f=false (visited on 01/16/2016).
[23] S. Kreimer. “Intensive Speech and Language Therapy Found to Benefit Patients with Chronic Aphasia After Stroke”. In: Neurology Today 17.12 (2017), pp. 12–13.
[24] T. Lan, S. Aryal, B. Ahmed, K. Ballard, and R. Gutierrez-Osuna. “Flappy voice: an interactive game for childhood apraxia of speech therapy”. In: Proceedings of the first ACM SIGCHI annual symposium on Computer-human interaction in play. ACM. 2014, pp. 429–430.
[25] Little Bee Speech. url: http://littlebeespeech.com/ (visited on 01/15/2016).
[26] P. Matejka, P. Schwarz, et al. “Analysis of feature extraction and channel compensation in a GMM speaker recognition system”. In: IEEE Transactions on Audio, Speech, and Language Processing 15.7 (2007), pp. 1979–1986.
[27] H. Meinedo, D. Caseiro, J. Neto, and I. Trancoso. “AUDIMUS.media: a Broadcast News speech recognition system for the European Portuguese language”. In: International Workshop on Computational Processing of the Portuguese Language. Springer. 2003, pp. 9–17.


[28] Mobile Operating System Market Share in Portugal, 2016 to 2017. url: http://gs.statcounter.com/os-market-share/mobile/portugal/#yearly-2016-2017-bar (visited on 07/25/2017).
[29] A. V. Nefian, L. Liang, X. Pi, L. Xiaoxiang, C. Mao, and K. Murphy. “A coupled HMM for audio-visual speech recognition”. In: Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on. Vol. 2. IEEE. 2002, pp. II-2013.
[30] T. L. Nwe, S. W. Foo, and L. C. De Silva. “Speech emotion recognition using hidden Markov models”. In: Speech Communication 41.4 (2003), pp. 603–623.
[31] Ortho-Logo-Paedia - OLP. url: http://www.xanthi.ilsp.gr/olp/default.htm (visited on 01/16/2016).
[32] A. Parnandi, V. Karappa, T. Lan, M. Shahin, J. McKechnie, K. Ballard, B. Ahmed, and R. Gutierrez-Osuna. “Development of a Remote Therapy Tool for Childhood Apraxia of Speech”. In: ACM Transactions on Accessible Computing (TACCESS) 7.3 (2015), p. 10.
[33] J. Preston and M. L. Edwards. “Phonological awareness and types of sound errors in preschoolers with speech sound disorders”. In: Journal of Speech, Language, and Hearing Research 53.1 (2010), pp. 44–60.
[34] Z. Rubin and S. Kurniawan. “Speech adventure: using speech recognition for cleft speech therapy”. In: Proceedings of the 6th International Conference on PErvasive Technologies Related to Assistive Environments. ACM. 2013, p. 35.
[35] S. Sharma, D. Ellis, S. Kajarekar, P. Jain, and H. Hermansky. “Feature extraction using non-linear transformation for robust speech recognition on the Aurora database”. In: Acoustics, Speech, and Signal Processing, 2000. ICASSP'00. Proceedings. 2000 IEEE International Conference on. Vol. 2. IEEE. 2000, pp. II1117–II1120.
[36] L. D. Shriberg, R. Paul, and P. Flipsen. “Childhood speech sound disorders: From postbehaviorism to the postgenomic era”. In: Speech sound disorders in children (2009), pp. 1–33.
[37] Talker. url: http://speech-trainer.com/children-speech-therapy (visited on 01/15/2016).
[38] C. T. Tan, A. Johnston, K. Ballard, S. Ferguson, and D. Perera-Schulz. “sPeAK-MAN: towards popular gameplay for speech therapy”. In: Proceedings of The 9th Australasian Conference on Interactive Entertainment: Matters of Life and Death. ACM. 2013, p. 28.
[39] The Statistics Portal - Forecast of tablet user numbers in Portugal from 2014 to 2021 (in million users). url: https://www.statista.com/statistics/566416/predicted-number-of-tablet-users-portugal/ (visited on 02/02/2016).


[40] The Statistics Portal - Forecast of the tablet user penetration rate in Portugal from 2014 to 2021. url: https://www.statista.com/statistics/568594/predicted-tablet-user-penetration-rate-in-portugal/ (visited on 02/02/2016).
[41] The Voice Foundation - The Process of Voice. url: http://voicefoundation.org/health-science/voice-disorders/anatomy-physiology-of-voice-production/understanding-voice-production/ (visited on 01/19/2016).
[42] UCL Psychology and Language Sciences - Consonants. url: http://www.phon.ucl.ac.uk/courses/spsci/iss/week6.php (visited on 02/02/2016).
[43] VITHEA - Virtual Therapist for Aphasia treatment. url: https://vithea.l2f.inesc-id.pt/wiki/index.php/Main_Page (visited on 01/16/2016).
[44] VOX CURA - Voice Care Specialists - 10 Common Problems of Singers. url: http://www.voxcura.com/more-at-vox-cura/10-common-problems-of-singers/ (visited on 01/20/2016).

Appendix A

Accepted scientific paper

In this appendix we present the accepted paper submission to ACE 2017 (14th International Conference on Advances in Computer Entertainment Technology), held in London, UK.

A serious mobile game with visual feedback for training sibilant consonants

Ivo Anjos1, Margarida Grilo2, Mariana Ascensão2, Isabel Guimarães2, João Magalhães1, and Sofia Cavaco1

1 NOVA LINCS, Department of Computer Science, Faculdade de Ciências e Tecnologia, Universidade NOVA de Lisboa, 2829-516 Caparica, Portugal
2 Escola Superior de Saúde do Alcoitão, Rua Conde Barão, Alcoitão, 2649-506 Alcabideche, Portugal
[email protected], {margarida.grilo, mascensao, iguimaraes}@essa.pt, {jm.magalhaes, scavaco}@fct.unl.pt

Abstract. The distortion of sibilant sounds is a common type of speech sound disorder (SSD) in Portuguese speaking children. Speech and language pathologists (SLP) frequently use the isolated sibilants exercise to assess and treat this type of speech errors. While technological solutions like serious games can help SLPs to motivate the children on doing the exercises repeatedly, there is a lack of such games for this specific exercise. Another important aspect is that given the usual small number of therapy sessions per week, children are not improving at their maximum rate, which is only achieved by more intensive therapy. We propose a serious game for mobile platforms that allows children to practice their isolated sibilants exercises at home to correct sibilant distortions. This will allow children to practice their exercises more frequently, which can lead to faster improvements. The game, which uses an automatic speech recognition (ASR) system to classify the child sibilant productions, is controlled by the child's voice in real time and gives immediate visual feedback to the child about her sibilant productions. In order to keep the computation on the mobile platform as simple as possible, the game has a client-server architecture, in which the external server runs the ASR system. We trained it using raw Mel frequency cepstral coefficients, and we achieved very good results with an accuracy test score of above 91% using support vector machines.

1 Introduction

Speech is one of the most important aspects of our life. While regular mistakes are made by children when they are still learning how to speak and learning the language, these mistakes tend to disappear gradually as children grow up. However there are cases in which the mistakes continue even when the child grows older. In these cases the child may have a speech and/or language disorder. While there are many types of speech and language disorders, here we focus on speech sound disorders (SSD). A SSD occurs when a child produces speech sounds incorrectly, after an age at which these mistakes were not supposed to happen [1, 25]. When a child has a SSD, she should be observed by an SLP, who can assess the type and severity of the disorder, and then she should be clinically followed by the SLP to treat the disorder. As part of the treatment to correct the speech errors, SLPs use specific types of speech production exercises that must be performed multiple times in each speech therapy session. The SLP has to find ways to make the repetition of the exercises enjoyable, in order to keep the child motivated. Most times this is accomplished by transforming the exercises into some sort of game that the child can play. This would be relatively simple with a few repetitions, but given the extensive number of repetitions required to achieve speech improvements, keeping the child motivated may be a hard task. To further motivate the child, many SLPs also use some kind of reward system. For instance, the child may be rewarded with a game for the last five minutes of the session if he performs well during the session. Most times this last game also includes some sort of therapy exercise. Whereas traditionally children attend speech therapy sessions once a week, more intensive therapy has been proven to lead to faster improvements [4, 5, 14]. In particular, when the intensive training is done at home (home training) it can increase the overall exercise practicing time substantially. This is particularly useful when children attend speech therapy sessions only once a week, and as a consequence they do not repeat the speech exercises as often as desirable. With home training children can practice the exercises recommended by their SLP whenever they have free time. These two concepts of intensive training (with more frequent sessions per week) and home training are becoming more popular in recent years, and some studies have already proven that this type of training is very beneficial [7, 12, 5]. These studies show considerable improvements when children have more than the regular weekly session. In the cases where children cannot attend more than one session per week, home training has been proved as a good alternative for those extra sessions. The major benefit from intensive training and home training is a faster improvement rate, which can also lead to more motivation. When children are motivated they tend to overcome their problems more easily, so this kind of cycle is beneficial. In spite of all the benefits of home training, this may not be straightforward to implement when the right tools are not available. The problem is how can children practice the speech exercises at home? Who is going to verify if they are doing the exercises correctly? Well, the first obvious choice would be the parents, but, for multiple reasons including lack of time, this is not always possible. This leaves room for a combined clinical and technological solution to emerge.
Fig. 1. Child playing the proposed game comfortably at home.

Some software systems to assist children in training their exercises have already been proposed for several disorders. Nonetheless, some systems like Articulation Station and Falar a Brincar do not have any type of ASR system [2, 8], which leads to the problems that we described above: children always depend on another person other than themselves to train their exercises. This can lead to lower weekly training times, which will lead to slower improvements. There are other systems that have an ASR system or use some sort of sound analysis, like sPeAK-MAN, Star, and ARTUR - the ARticulation TUtoR, that have the goal of helping children with articulation problems [27, 10, 3], Talker [26], Speech Adventure, that was developed for children with cleft palate or lip [23], and Flappy Voice and the Remote Therapy Tool for Childhood Apraxia, which focus on apraxia [15, 21]. All of these systems rely on game-like exercises to keep children motivated and engaged while training, and are developed for the English language. While its main focus is not on speech therapy training, the Interactive Game for the Training of Portuguese Vowels uses an ASR system to distinguish the EP vowels [6]. Vithea is another system developed for EP [30]. It aims to help people with aphasia. This system has a virtual therapist that guides the patient through their exercises and uses an ASR system to validate the exercises. While many native Portuguese children need to attend speech therapy to correct SSDs related to the production of distorted sibilant sounds, there is a lack of software systems to assist with the training of these sounds. As a contribution to fill this gap, we propose a serious mobile game that focuses on the EP sibilant consonants: the serious mobile game for the sibilant consonants. While our game can also be used during the speech therapy sessions, our main goal is to provide a game that can be used for home training and does not require the supervision of parents (figure 1). Thus, our game uses an ASR system to identify if the child is performing the exercise correctly and therefore decrease the need of an adult to monitor the child. The game incorporates the isolated sibilants exercise, which is an exercise used by SLPs to teach children to correctly produce the sibilant consonants. It runs on mobile platforms like tablets, iPads and smartphones, which children usually are keen on using. Thus, we are not only providing means for home training the sibilants, but also motivating children to do the exercises often. The game's main character is controlled in real time by the child's voice and its behaviour gives visual feedback to the child on his speech productions. The game is implemented with a client-server architecture, where the client is the mobile game application and the server has an ASR system that classifies the child speech productions. The Serious Game for the Sustained Vowel Exercise (SVG) [16], previously proposed by some of our team members, and which, like the present work, is also part of the BioVisualSpeech project, uses sound analysis to control the movement of the main character. While at first glance the SVG may look similar to the present work, there are some fundamental differences. The SVG is developed for the sustained vowel exercise, which consists of saying a vowel for a few seconds with approximately constant intensity.
In its current version, that game does not use an ASR system and it does not distinguish the vowel that is being produced. It uses sound analysis to determine the intensity of the speech production and the time, independently of the produced phoneme. The proposed game, on the other hand, is developed for the isolated sibilants exercise. It distinguishes the produced sibilant with the help of an ASR system. On the other hand, the intensity variation of the production is not important for the proposed game. Finally, the SVG is a game to be used during the speech therapy sessions, whereas our game can be used both during the speech therapy sessions and at home. In the remaining sections we explain the sibilants exercise (section 2), we discuss the main characteristics of the proposed game (section 3) and its ASR system (section 5), and the feedback we received from both SLPs and children (section 6).

2 Sibilant consonants and the isolated sibilants exercise

Sibilant sounds are a specific group of consonants, and happen when the air flows through a very narrow channel in the direction of the teeth [11]. This channel can be created in different places, and this creates different types of sibilant sounds. There are two types of sibilant sounds in EP: alveolar and palato-alveolar; these regions can be seen in figure 2. In each of these sibilant types, the vocal folds can be used or not, resulting in a voiced or a voiceless sibilant, respectively. An example of this is the sound [z] (e.g. z in “zip”), which is voiced because it uses the vocal folds, and the sound [s] (e.g. s in “sip”), which does not use the vocal folds and as a result is a voiceless sibilant. These two sounds are alveolar sibilants, so they are produced by creating a narrow channel in the same location, and the only difference between them is the use of the vocal folds.

Fig. 2. Diagram with the names of the main places of articulation [13].

There are four different sibilant consonant sounds in EP: [z] as in zebra, [s] as in snake, [S] as the sh sound in sheep, and [Z] as the s sound in Asia. [z] and [s] are both alveolar sibilants, while [S] and [Z] are palato-alveolar sibilants. Both [z] and [Z] are voiced sibilants, and [s] and [S] are voiceless sibilants. In the words given above as an example, “zip” and “sip”, there is also one interesting characteristic: between these two words only a single sound varies, [z] and [s]. This creates what is called a minimal pair, which is a pair of words where only a single sound varies. Given their similarities, the words in a minimal pair are very easy to confuse and it is common to use one instead of the other. This is a very common SSD, especially in children. It is even more problematic when the minimal pair consists of two sibilants like in our example (the sounds [z] and [s] from “zip” and “sip”), because here the only difference between the two sounds is the use of the vocal folds. Another reason for the choice of the four sibilant sounds named above is that they also appear in many minimal pair words (e.g. “sink” and “zinc”). Even though the sounds [Z] and [S] do not appear in any usual minimal pair words in English, there are some in EP (e.g. “jacto”, “chato”). The most usual sibilant mistakes committed by children are distortion errors. These can consist of (1) exchanging the voiced and voiceless sounds, for example exchanging a [s] that is voiceless for a [z] that is produced in the same location of the vocal tract but is voiced, or (2) exchanging the place of production in the vocal tract, which results in producing another sibilant [22]. When a child has a SSD that influences his production of the sibilant consonants, the SLP usually starts by observing how the child reacts to the hearing and production of the isolated sibilants. The process starts by trying to understand if the child can distinguish the different sibilants when hearing them as isolated sounds. If he can, the next step is to practice the isolated sibilants exercise, which consists of producing a specific sibilant sound for some duration of time. The main goal of this exercise is to teach the child how to distinguish between the different sibilant sounds, and also how to produce each one of them.
Once the child is able to say the sibilant consonants correctly, the SLP then starts asking for multiple productions of the different sibilants, always alternating both the place of production in the vocal tract and the use of the vocal folds, in order to try to understand if the child can always produce the correct sibilant or if sometimes the child switches from one sibilant to another. This process is done with the isolated sibilants exercise and is the basis to detect what the child's problem is. Only after the problem is correctly identified and the child can say the isolated sibilants correctly can they move on to more complex exercises, like using the sibilant sounds inside words and in multiple positions within a word. As it will be discussed in detail in subsequent sections, our solution is to incorporate the isolated sibilants exercise in a computer game that motivates the child to exercise often. An important characteristic of the game is that children do not control the main character with a keyboard or other usual input method; instead the character is controlled only by the child's voice: the child has to do the isolated sibilants exercise correctly in order to make the character move towards a target.

3 Game and architecture

We propose a serious game for mobile platforms for intensive training of the sibilant consonants for correction of distortion errors. The aimed age group of this game is children from five to nine years old, because usually at these ages the regular phonological development exchange errors have already disappeared. The game incorporates the isolated sibilants exercise, which is an exercise frequently used by SLPs during the therapy sessions. While the game can be used in the therapy session, our main purpose is to offer a solution that can be used at home to give children the opportunity of having more intensive training, even when it may be difficult for them to attend more than one therapy session per week.

3.1 System platform and game engine

Our proposal, as already mentioned, is to create a game that children can use to practice the therapy exercises for the sibilant consonants either in the session or at home. Thus, the first decision that had to be made was whether the game was going to be developed for a mobile format, like a tablet or a smartphone, or for laptops and desktop computers. After some research we concluded that the tablet market in Portugal is still expanding (figure 3). In 2014 the share of monthly active tablet users was 42.38% and by 2021 this value is projected to reach 52.07% of the total population [29]. The slower predicted expansion for future years is mainly because almost half of the population already has a tablet and does not need a new one. So this is a good indication that, if the game is developed for tablets, it can reach a very large audience.

Fig. 3. Forecast of number of tablet users in Portugal from 2014 to 2021 (in million users) [28].

Moreover, being developed for a mobile platform, the game allows the child to play comfortably anywhere in the house, and even outside provided there is an internet connection (as it will be explained in section 3.2, the game will need access to the internet). The next challenge was to choose the platform, that is, to decide if the game was going to be developed for Android, iOS or both. Even though Android has a larger market share [18], iOS still has a significant amount of users. Thus our decision was to develop the game for both platforms. Instead of developing the game twice, once for each operating system, we decided to use Unity because this allows the game to be developed only once, while creating executable files for each platform.

3.2 System architecture

Since this game is designed to be used at home, it cannot depend on the SLP to decide if the exercise is being correctly executed, nor on the parents, since many times they are not available. So the best option is to use automatic speech recognition to provide feedback on whether the exercise is executed correctly. The game will run on mobile platforms, but using an ASR system that executes the speech recognition on the mobile platform can be very demanding, which could cause delays in the regular execution of the game. So the best alternative is to use a client-server architecture, where the game sends the sound samples to the server, and the server sends back information on whether the productions were correctly produced (figure 4). This kind of architecture also gives us more flexibility when choosing what to use for the ASR engine, and also allows changing the algorithm at any time without having to change the mobile application.

Fig. 4. Client-server architecture. The client is the mobile application, which captures segments of sound and then sends them to the server. The server extracts the sound features from each segment and uses an ASR system to classify them. Afterwards the server sends the response back to the client, which can then provide the feedback to the child.

Client The client application has to send the sound that the child is producing to the server. In order to do this, the application captures segments of sound of a determined length. When a segment reaches its desired length the client sends it to the server. The client then waits for the server to respond whether the sound production in the segment is correct or not. When the client receives the answer it provides visual feedback to the child through the character's movement. In case of positive feedback this cycle continues until the character reaches its goal. Given this type of connectivity with the server, the client always needs a stable internet connection.

Server When the server receives a new segment of sound it first has to extract its features, and then uses them in the previously trained ASR system in order to get a response on whether the speech segment is correct or not (section 5 discusses the features and algorithms used to classify the child productions). The server then sends this answer back to the client and waits for the next speech segment. To train this ASR system, we used child speech productions of the sibilant sounds (more details in section 4).
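A minimal sketch of the server's classify-and-respond loop, written here with Flask for concreteness; the paper does not specify the web framework, the endpoint name, or the audio payload format, so all of those are assumptions, and the per-segment MFCC aggregation is a simplification of the feature pipeline described in chapter 4.

```python
import io

import joblib
import librosa
from flask import Flask, jsonify, request

app = Flask(__name__)
# Previously trained single-phoneme classifiers, one per sibilant (hypothetical file names).
classifiers = {s: joblib.load(f"models/svm_{s}.joblib") for s in ["S", "Z", "s", "z"]}

@app.route("/classify/<sound>", methods=["POST"])
def classify(sound):
    # The client posts a short audio segment; assumed here to be a WAV payload.
    signal, sr = librosa.load(io.BytesIO(request.data), sr=None)
    # Summarize the segment's MFCCs into one feature vector (a simplification).
    mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).mean(axis=1)
    correct = int(classifiers[sound].predict([mfccs])[0]) == 1
    return jsonify({"correct": correct})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```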

3.3 Mobile game

As briefly explained above, the proposed game uses the isolated sibilants exercise. In order to play the game the child has to say one of the four sibilant consonants addressed by the game: [z], [s], [S], or [Z]. The game includes four scenarios, one for each of the addressed sibilant sounds (figure 5). These scenarios were created with the help of a visual artist and with images from Freepik [9]. To create the scenarios we took into account the aimed age group of five to nine year old children. Each of the scenarios has a different character, and the game goal is to move it to a specific location. The characters or game goal are related to the addressed sibilant sound, which is very helpful to provide visual cues to the child.


Fig. 5. The four game scenarios. (a) The scenario for the [z] consonant. (b) The scenario for the [s] consonant. (c) The scenario for the [S] consonant. (d) The scenario for the [Z] consonant.

Visual feedback and visual cues In order to control the character, children do not use a keyboard or any other regular input method; instead they use only their voice. Each character responds to a different sibilant sound. To make the character move, the child has to say that specific sibilant correctly. The character's movements give visual feedback to the child about his sound productions. The feedback is positive when the production is correct, in which case the character moves towards the goal. The character keeps moving while the child is producing the sibilant consonant correctly. If the production is not correct, the character stops moving to give negative feedback to the child. In the case of negative feedback, there are two available modes: one in which the exercise has to be repeated from the beginning and another in which the character waits for a correct production to continue moving towards the target. This type of visual feedback is very intuitive for children and also a good way to motivate them to try to say the sibilant consonants correctly. The scenario in figure 5.a is used to train the [z] sibilant. The main character, a bumblebee, moves towards the beehive, which is the game target, while the child is saying the [z] sibilant correctly. The characters or goal of each scenario are always related to a word in EP that starts with the specific sibilant used in the scenario. For instance, a bumblebee was chosen for the [z] sibilant scenario because the EP word for bumblebee starts with a [z] sibilant (zangão). The scenario used to train the [s] sibilant has a snake that moves towards the log while the child is saying the [s] sibilant correctly (figure 5.b), and the scenario used for the [Z] sound has a ladybug that flies towards a flower while the child is saying the [Z] sibilant correctly (figure 5.d). The EP word for snake starts with the [s] sibilant (serpente) and the EP word for ladybug starts with the [Z] sibilant (joaninha). Finally, there is a scenario for the [S] sibilant (figure 5.c). In this case the main character, a boy, has to run away from the rain until the end of the street, which is the game target. The rain comes from a gray rainy cloud that follows the boy while he moves. The boy is able to run away from the rain while the child says the [S] sound correctly. The EP word for rain starts with the [S] sibilant (chuva). Instead of relying only on the SLP explanations, it is important that the game provides visual cues to help the children remember the exercise and the sound that they should produce. An interesting idea was to relate the type of movement of the character with the use of the vocal folds. This idea came from the flying movement of the bumblebee and the fact that the [z] sibilant is a voiced one. We did not want a very complex motion in order not to distract the child, so the movement was simplified to a sinusoidal wave, which can be modified by changing its amplitude and frequency. We applied the same concept to the movement of the ladybug, since the [Z] sound is also voiced. We decided to use a simple straight line movement for the remaining sounds, [s] and [S], for the snake and the boy running away from the rain respectively, because these are voiceless sibilants. This is achieved by using the same code, but with a zero amplitude parameter.

Parameterization of the game

The distance separating the initial position of the main character (starting point) from the target (end point) is related to the speech duration needed to make the character move until it reaches the target. The isolated sibilants exercise can thus be done with shorter or longer productions of the sibilants by varying the distance between the starting point and the end point. This possibility is very useful for SLPs because every child is different and the exercise needs to be adapted to the child's current state or problem.

While developing games for speech therapy, we must be very careful not to restrict the options of the SLPs. For instance, in this case, if the time that a child has to produce the sibilant sound were always the same, this would not suit every child in the same way. So, the best alternative is to give SLPs many options and adjustable parameters, so that they have all the flexibility needed to adjust the exercise to the problems of each child. As a response to these observations, there are several parameters that can be adjusted in all four scenarios (a possible configuration is sketched after this list):

– the starting and end points,
– the time the character takes to move from the starting point to the end point,
– the amplitude and frequency of the movement for the voiced consonants.
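The following Python sketch shows how such a per-scenario configuration could look; the class and field names are hypothetical and only mirror the parameter list above, they are not the game's actual configuration format.

from dataclasses import dataclass

@dataclass
class ScenarioConfig:
    start_point: tuple        # initial position of the main character
    end_point: tuple          # position of the game target
    duration: float           # seconds of correct production needed to reach the target
    amplitude: float = 0.0    # oscillation amplitude (0 for the voiceless sibilants)
    frequency: float = 1.0    # oscillation frequency (ignored when amplitude is 0)

# A longer exercise for the [z] scenario (bumblebee) and a shorter one for [s] (snake).
long_z = ScenarioConfig(start_point=(0, 100), end_point=(900, 100), duration=8.0, amplitude=30, frequency=2)
short_s = ScenarioConfig(start_point=(0, 100), end_point=(400, 100), duration=3.0)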

Implementation details - modularity and extensibility

Another very important characteristic of these scenarios is that, given the client-server system architecture, each one can be individually built and animated, since they are all completely independent from each other. To accomplish this, they all share three main classes, which are responsible for recording the microphone input, managing the connection to the server, and creating the movement of the character.

The main game cycle of each scenario is processed as follows. The microphone class is always recording the input sound; when this recording reaches a certain length, it is sent to the server through the helper class that manages the connections to the server, and the response from the server is then used to compute the current state of the game: proceed (if the child is producing the correct sound) or stop (if the child is not producing the sibilant sound correctly). If the child is producing the correct sound, the movement class is used to produce the animation that moves the character towards the target, and the main cycle continues until the character reaches the target. If the child does not produce the correct sound, the game finishes and the main cycle stops its execution. A simplified version of this cycle is sketched below.

With this base main cycle and with the help of the three main classes, it is very simple to add other scenarios for these sounds, and even for new sounds. In the latter case, in order to have the game characters moving as a response to the added type of sounds, the game's ASR system would need to be retrained to recognize the new class of sounds. The retrained ASR system would be updated in the server, but the base code would not need any changes. More complex scenarios can also be built on this base, as proven by our scenario for the [S] sound, which has two moving characters (the boy and the rain) and uses the same logic for the base main game cycle, also relying on our three main helper classes, with some additional code to control the movement of the rain.
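A minimal sketch of this cycle is given below, assuming three hypothetical helper objects (recorder, server connection, and movement) whose names and methods are our own; it is not the actual game code.

CHUNK_SECONDS = 0.5  # assumed length of each audio block sent to the server

def game_cycle(recorder, server, movement):
    """Move the character while the server keeps confirming correct productions."""
    recorder.start()                            # microphone class records continuously
    while not movement.reached_target():
        chunk = recorder.read(CHUNK_SECONDS)    # latest block of recorded audio
        if server.classify(chunk):              # connection class queries the ASR server
            movement.advance(CHUNK_SECONDS)     # correct sound: animate towards the target
        else:
            # incorrect sound: stop the cycle (the alternative "wait" mode
            # described earlier would keep looping until a correct production)
            break
    recorder.stop()
    return movement.reached_target()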

4 Sound data

In order to train the ASR system, we collected data in three schools in the Lisbon area. There were 90 children participating in the recordings, with ages between 4 and 11 (table 1.a). The recordings were made using a dedicated microphone and DAT equipment (the setup used can be seen in figure 6). While we tried to make the recordings in a quiet room, there was noise coming from the outside, especially during recess. We collected eight different samples from each child, which correspond to short and long repetitions of the sibilant sounds [S], [Z], [s], and [z]. Table 1.b shows the number of samples from each sound. There was always an SLP participating in the recordings, who gave all the necessary indications to the children.

Fig. 6. The setup used for the recordings. On the left there is one of the DAT devices used, and on the right the microphone, with acoustic foam around it.


Table 1. Recorded data. (a) Number of children who performed the recordings. (b) Number of samples from each sound.

The sound productions from each child were recorded in a single file. This means that every file includes the sibilant sounds and also every indication from the SLP. In order to be able to use this data, every file had to be split into separate samples, one for each of the eight child productions that it contains. To automate this process, we created an algorithm that measures the energy at the beginning of the file (to estimate the sound level of the no-speech signal) and then identifies the peaks of energy, which have a high probability of being regions containing speech. In this way the files are automatically split into smaller files that contain single productions of sound (a sketch of this procedure is shown below). Afterwards we still needed to listen to each of these samples, to identify which files contained the child sibilant productions, and delete the remaining files (SLP speech, or noises). When all the files were split, we obtained a total of around 2600 smaller files, of which 842 were the child sibilant productions we are interested in.
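The following Python sketch illustrates this energy-based splitting procedure, assuming NumPy and a mono waveform already loaded into memory; the thresholds, frame length and function name are illustrative assumptions, not the exact values we used.

import numpy as np

def split_on_energy(signal, sr, frame_ms=25, noise_s=1.0, factor=4.0, min_len_s=0.3):
    """Split a recording session into candidate speech segments.

    The noise floor is estimated from the first `noise_s` seconds (assumed to
    contain no speech); frames whose energy exceeds `factor` times that floor
    are kept and merged into contiguous segments.
    """
    frame = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame
    energy = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2) for i in range(n_frames)])
    noise_floor = energy[: int(noise_s * 1000 / frame_ms)].mean()
    active = energy > factor * noise_floor

    segments, start = [], None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i                                        # a peak of energy begins
        elif not is_active and start is not None:
            if (i - start) * frame / sr >= min_len_s:
                segments.append((start * frame, i * frame))  # sample indices of one production
            start = None
    return segments  # each (begin, end) pair is later saved as its own smaller file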

5 Automatic recognition of isolated sibilant consonants

An important characteristic of our game is the visual feedback given to the child through the movement of the main character. In order to provide useful visual feedback, the game must react almost instantly to the child's voice. Given the inherent communication delay of the client-server architecture, we have to choose the classification algorithm carefully, in order to reduce as much as possible the time it takes to classify a group of samples. This excludes certain kinds of algorithms, such as lazy learners, which usually take more time to classify samples: they do not have a previous training phase to learn a model, but instead delay the learning until a request is made to the system, and since their results are always based on a local feature space, this process happens at every request. The game's time restrictions also require avoiding algorithms that are more complex and take more time to compute an answer.

Given the game's time restrictions, we selected the following classifiers for this study: support vector machines (SVM) with a radial basis function (RBF) kernel, linear discriminant analysis (LDA), and quadratic discriminant analysis (QDA). An important characteristic of these classifiers is that all three of them support multiclass classification. Given that we are trying to distinguish between the four different sibilant sounds, this is a simple and effective way to do it. Since all three classifiers share this characteristic, it also gave us an easy way of comparing the algorithms and features being tested.
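For reference, the three classifiers compared in this study can be instantiated as below; the use of scikit-learn here is only an illustrative assumption, all three models handle the four-class problem directly.

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis
from sklearn.svm import SVC

classifiers = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "SVM (RBF)": SVC(kernel="rbf"),   # multiclass classification handled internally
}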

5.1 Feature vectors

We use the raw Mel frequency cepstral coefficients (MFCC) as features for all three classifiers. As an illustration, figure 7 shows the MFCC matrix obtained for a [z] sound. Our feature vectors consist of columns of the MFCC matrix. MFCCs are commonly used in ASR systems and, as will be seen below, we obtained very good results with the combination of these features and classifiers.

Fig. 7. The MFCC matrix with 13 coefficients, obtained for a [z] sound.

While historically the number of MFCCs used is thirteen, there are also studies that use different numbers of coefficients. In particular, Carvalho et al. had good results using different numbers of coefficients to classify isolated phonemes (the EP vowels) [6]. We trained the multiclass SVM classifier with different numbers of MFCCs, ranging from 5 to 25 coefficients, as seen in figure 8. The accuracy test score is very low when using only 5 MFCCs, but it rapidly increases with the use of more coefficients. This basically means that with a small number of coefficients our machine learning algorithm does not have enough distinct data to correctly separate the four different sibilant sounds, but with more data (more coefficients) the four sounds can be more easily separated. The increase in score starts to slow down at around 13 MFCCs, and above that the differences between scores can be easily attributed to the margin of error of the training of each classifier. So we decided to use 13 coefficients, since our results did not show any particular improvement when using more than 13 coefficients. This result is also in line with most of the literature, which usually uses around 12 to 13 MFCCs [19, 24, 17, 20]. A sketch of this feature extraction is given below.

Fig. 8. Accuracy test scores of the multiclass SVM classifier with RBF kernel, using different numbers of MFCCs.
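As an illustration of the feature extraction, the sketch below computes a 13-coefficient MFCC matrix and returns its columns as feature vectors; it assumes librosa and a hypothetical file name, and is not necessarily the implementation we used.

import librosa

def mfcc_feature_vectors(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)                       # keep the original sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # matrix of shape (n_mfcc, n_frames)
    return mfcc.T                                             # one 13-dimensional vector per frame (column)

vectors = mfcc_feature_vectors("child_042_z_long.wav")        # hypothetical file name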

5.2 Classification results

We trained the LDA, QDA and multiclass SVM with the MFCC feature vectors and the data described in section 4. As stated previously, these are all multiclass classifiers, which means that we can easily train them on the same data set and get comparable results. The three classifiers were trained in the following way:

– Once we computed the MFCC matrices for all sounds, we built the feature vectors by selecting columns from these matrices. Given the complexity of SVM training, we did not use the whole matrices. Instead, we selected around 20000 columns by choosing 66 random samples of each sound for each child. These vectors were labeled with the corresponding sibilant. We use one class for each sibilant, so we have 4 different classes.
– We then used stratified sampling on this set of data to obtain the training and test sets, using a test set size of 30%.
– We also used stratified 5-fold cross validation within our training set.

Fig. 9. Accuracy test scores of the three multiclass classifiers.

Figure 9 shows the accuracy test scores of these three multiclass classifiers. As can be observed in the figure, QDA performs better than LDA, which is expected since QDA is not limited to a linear space when trying to separate all four classes, in contrast with LDA, which can only learn linear boundaries. Even though QDA gives us considerably better scores than LDA, our multiclass SVM classifier has a test score almost 8% higher than QDA, meaning it is around 8% more accurate and produces fewer errors in the classification of the four different sounds.

Since the multiclass SVM with the RBF kernel gave the best results by a considerable margin, we focused on this classifier to try to achieve even better results. The multiclass SVM can lose accuracy on some sounds, due to having to adjust to all four classes at the same time. So we replaced the multiclass SVM with four SVMs, one for each class. The idea was to create one SVM classifier, also with the RBF kernel, to classify each of the four different sibilant classes. This way we could fine tune each one of the four classifiers, which here we call single-sibilant SVMs, in order to improve our classification scores. Each single-sibilant SVM was trained with the same method described above; the only change was that, for each classifier, the labels of the data were changed to true or false depending on the sound being trained. For instance, when we trained the classifier for class [s], all samples from sibilant [s] were labeled true, and the remaining samples were labeled false. Doing this for each sound let us achieve our goal of one classifier per sound. The parameters of each classifier were then fine tuned to try to achieve the best accuracy scores for each of the four classes. A sketch of this training protocol is given below.

Fig. 10. Accuracy test scores of the multiclass SVM, and four single-sibilant SVMs (both with the RBF kernel) for all four classes.

Figure 10 shows the results obtained with this approach. The multiclass SVM has the same score for all sounds, since it is always the same classifier and the score is an average of all the correct labels for all classes. The four single-sibilant SVMs approach brought us better results in the classification of all four sounds, by a margin of around 8%. In addition, it can be observed that these are quite high accuracy test scores (above 91%).
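The sketch below illustrates this training protocol for one single-sibilant SVM (binary labels, stratified 70/30 split, and stratified 5-fold cross validation for parameter tuning); scikit-learn, the parameter grid and the function name are assumptions for illustration only.

import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.svm import SVC

def train_single_sibilant_svm(X, y, sibilant):
    """X: MFCC column vectors; y: sibilant labels ('s', 'z', 'S', 'Z')."""
    y_bin = (np.asarray(y) == sibilant)                       # true only for the target sibilant
    X_tr, X_te, y_tr, y_te = train_test_split(X, y_bin, test_size=0.3, stratify=y_bin, random_state=0)

    # Fine tune C and gamma with stratified 5-fold cross validation on the training set.
    grid = GridSearchCV(SVC(kernel="rbf"),
                        param_grid={"C": [1, 10, 100], "gamma": ["scale", 0.01, 0.1]},
                        cv=StratifiedKFold(n_splits=5))
    grid.fit(X_tr, y_tr)
    return grid.best_estimator_, grid.score(X_te, y_te)       # accuracy on the held-out 30%

# One classifier per sibilant class:
# single_sibilant_svms = {s: train_single_sibilant_svm(X, y, s) for s in ["s", "z", "S", "Z"]}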

6 Feedback from potential users

As future work we will validate our game solution and our ASR system with real users. In a first phase we are planning to validate it during speech therapy sessions, since this will allow SLPs to confirm that our ASR system is performing as expected. After this test is complete, another interesting way to validate our solution is to assign it as homework to some children, selected by the SLPs according to their needs, and see if using the game at home contributes towards faster improvements.

In the meantime we were able to collect feedback from both SLPs and children. While this was not a formal validation, their feedback, which is discussed below, was important to assess the potential of the game.

6.1 Feedback from SLPs

We were always in contact with multiple SLPs to have impartial feedback about our work. In an initial phase, we created a simple prototype of the game that was used to demonstrate our proposed scenarios to some SLPs and explain the mechanics of the game. We collected feedback from four SLPs who work with children: one with four years of experience, two with eight, and one with thirteen years of experience. All four SLPs agreed that the scenarios were adequate for the target age group, and that the game would provide a good way of training the sibilant sounds. Another interesting aspect is that all of them already use some sort of homework, but not on a mobile platform, given the lack of this type of system for EP. Also, all of them noticed faster improvement when children did homework, which can probably be attributed to the extra weekly training time.

When asked if they would rather use our system in their sessions or at home, the results were a bit mixed. There was one SLP who only wanted to use the system during the sessions. The remaining ones were interested in trying it a few times during their sessions first, in order to understand how children react to the system, and also how the system performs. But they all agreed that if the system is robust enough, and if children react well to it, they would use it as homework. In our opinion, this type of feedback is expected: given the lack of this type of system for EP, SLPs are not completely confident in them. They first need to experiment with the system to understand that using an ASR system combined with the game's visual feedback is a good alternative that reduces the need for adult assistance to give feedback to the child.

After our game was completely developed, we got feedback from one SLP with four years of experience with children. We explained the game concept to her and also performed some demonstrations. The SLP considered that our system is a good way for children to practice the sibilant sounds, and that the scenarios are very interesting for children within the target age group. Like some of the SLPs we talked to previously, she would first use our system during the sessions to make sure that everything is working as expected, in order not to mislead the children. But she added that once she verified that the game works correctly, she would use it as home training. She already uses other types of homework with her patients, with very good results. Home training with the proposed game would have the advantage of increasing the motivation of the child and reducing the need for adult supervision. The idea of using the movement of the character as a visual cue for the use of the vocal folds was something that she considered very important, since she already tries to incorporate those types of cues when performing the exercises with children. As an additional visual cue, she suggested incorporating the same idea into the background of the scenarios: basically, using more straight lines in the scenarios where the vocal folds are not used, and the opposite when the vocal folds are used.

6.2 Feedback from children

This work was presented at the European Researchers Night (ERN) 2017, at the National Museum of Natural History and Science in Lisbon. This is an event open to the general public that runs simultaneously in several cities in Europe. The goal of ERN is to show citizens the importance and the impact of science in their everyday life and in the development of society, and also to demystify the image of researchers as someone distant and inaccessible. During the event, researchers demonstrate and explain their work to the public through activities.

At the ERN 2017 we had the opportunity to show our work to both children and parents. We had the game set up to allow children to try it, and at the same time we explained to parents the importance of these types of games for children who are attending speech therapy (figure 11). We used a laptop, a screen and an external microphone.

Fig. 11. Child playing the proposed game during the European Researchers Night 2017.

Around fifty children tried the game. We were able to collect some opinions and to see the children's reactions to the characters and to the way of controlling the game. Not all children who tried the game were attending speech therapy, but our goal here was not to understand if the game was classifying the speech productions correctly or if it contributes to faster speech improvements. We only aimed to assess whether the graphics, characters and type of interaction with the game appealed to the children. We had children of all ages trying the game, from some very young children to some twelve year old children. While the game was designed for children from five to nine years old, all children trying the game at ERN liked the scenarios and characters regardless of their age. They also reacted very well to the type of interaction with the game, that is, controlling the characters only with their voice.

7 Conclusion and future work

Here we proposed a serious mobile game for the sibilant consonants: a serious game for mobile platforms for the correction of sibilant sound distortions in EP. The game is designed for home training and uses the isolated sibilants exercise. As identified by our team, there is a lack of systems that focus on EP, and even those that do exist do not include the isolated sibilants exercise. Another very important aspect is that children should not be restricted to practicing their exercises only during their therapy sessions or when their parents have time to supervise them. Instead, children should be able to do these exercises at home, even when their parents do not have time, since this would allow them to perform the exercises more frequently, which can lead to faster improvements, thus motivating them even further. Our solution addresses both these problems by offering a mobile game with which children can practice their EP sibilant sounds.

The game is implemented with a client-server architecture in which all the complex computation is done on the server. In this way the game can be played even on low end devices, with the only restriction of having an internet connection. This allows more children to benefit from our solution and perform their exercises nearly anywhere. With the help of an ASR system that classifies the child's speech productions, the game is controlled only by the child's voice. In order to achieve the game goal, the child has to perform the isolated sibilants exercise correctly. In addition, the game gives immediate visual feedback to the child about his speech.

Our ASR system achieved very good results, with accuracy test scores above 91% when using SVMs and MFCCs. As future work we will explore more machine learning algorithms to improve these scores, such as hidden Markov models and artificial neural networks. However, we must keep in mind that we cannot develop a computationally expensive system because of our response time restrictions. Thus the next goal is to balance more complex algorithms against these restrictions, trying to improve the performance of our ASR system while maintaining the very low response time that we currently have.

While a formal validation is planned as future work, we were able to gather feedback from both SLPs and children. The feedback from children confirmed that children of the target age group liked the characters and scenarios, and that they also reacted well to the way of controlling the character with their voices. The SLPs that we contacted showed interest in our solution, and consider it a good way for children to train the sibilant sounds at home. They all consider that this type of visual feedback is a good way to let children know whether they are producing the sounds correctly, and that it will probably be more motivating than the regular exercises they do in their speech therapy sessions. Nonetheless, almost all of them would first like to try the game in their sessions, to check if our ASR system is accurate and if children react well to the game.

Acknowledgments

This work was supported by the Portuguese Foundation for Science and Technology under projects BioVisualSpeech (CMUP-ERI/TIC/0033/2014) and NOVA-LINCS (PEest/UID/CEC/04516/2013). We thank the SLPs Diana Lança and Catarina Duarte for their availability and feedback. We also thank all the 3rd and 4th year SLP students from Escola Superior de Saúde do Alcoitão who collaborated in the data collection task. Many thanks also to Inês Jorge for the graphic design of the game scenarios. Finally, we would like to thank the schools from Agrupamento de Escolas de Almeida Garrett, and all the children who participated in the recordings.

References

[1] American Speech-Language-Hearing Association (ASHA) - Speech Sound Disorders: Articulation and Phonological Processes. url: http://www.asha.org/public/speech/disorders/SpeechSoundDisorders/ (visited on 07/28/2017).
[2] Articulation Station. url: http://littlebeespeech.com/articulation_station.php (visited on 01/15/2016).
[3] ARTUR - the ARticulation TUtoR. url: http://www.speech.kth.se/multimodal/ARTUR/ (visited on 01/16/2016).
[4] Jean Barratt, Peter Littlejohns, and Julie Thompson. “Trial of intensive compared with weekly speech therapy in preschool children.” In: Archives of Disease in Childhood 67.1 (1992), pp. 106–108.
[5] Sanjit K Bhogal, Robert Teasell, and Mark Speechley. “Intensity of aphasia therapy, impact on recovery”. In: Stroke 34.4 (2003), pp. 987–993.
[6] Mara Inês Pires Carvalho et al. “Interactive game for the training of portuguese vowels”. In: (2012).
[7] Gianfranco Denes et al. “Intensive versus regular speech therapy in global aphasia: a controlled study”. In: Aphasiology 10.4 (1996), pp. 385–394.
[8] Falar a Brincar. url: https://falarabrincar.wordpress.com/ (visited on 01/16/2016).
[9] Freepik. url: http://www.freepik.com/ (visited on 07/28/2017).
[10] Sriram Ganapathy, Samuel Thomas, and Hynek Hermansky. “Comparison of modulation features for phoneme recognition”. In: Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on. IEEE. 2010, pp. 5038–5041.
[11] Isabel Guimarães. Ciência e Arte da Voz Humana. Escola Superior de Saúde de Alcoitão, 2007.
[12] Penelope K Hall, Linda S Jordan, and Donald A Robin. Developmental apraxia of speech: Theory and clinical practice. Pro Ed, 1993, p. 200.
[13] Image Quiz: Vocal Tract. url: http://www.imagequiz.co.uk/img?img_id=ag5zfmltYWdlbGVhcm5lcnIQCxIHUXVpenplcxifotErDA (visited on 07/30/2017).
[14] Susan Kreimer. “Intensive Speech and Language Therapy Found to Benefit Patients with Chronic Aphasia After Stroke”. In: Neurology Today 17.12 (2017), pp. 12–13.
[15] Tian Lan et al. “Flappy voice: an interactive game for childhood apraxia of speech therapy”. In: Proceedings of the first ACM SIGCHI annual symposium on Computer-human interaction in play. ACM. 2014, pp. 429–430.
[16] Marta Lopes, João Magalhães, and Sofia Cavaco. “A voice-controlled serious game for the sustained vowel exercise”. In: Proceedings of the 13th International Conference on Advances in Computer Entertainment Technology. ACM. 2016, p. 32.
[17] Pavel Matejka, Petr Schwarz, et al. “Analysis of feature extraction and channel compensation in a GMM speaker recognition system”. In: IEEE Transactions on Audio, Speech, and Language Processing 15.7 (2007), pp. 1979–1986.
[18] Mobile Operating System Market Share in Portugal, 2016 to 2017. url: http://gs.statcounter.com/os-market-share/mobile/portugal/#yearly-2016-2017-bar (visited on 07/25/2017).
[19] Ara V Nefian et al. “A coupled HMM for audio-visual speech recognition”. In: Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on. Vol. 2. IEEE. 2002, pp. II–2013.
[20] Tin Lay Nwe, Say Wei Foo, and Liyanage C De Silva. “Speech emotion recognition using hidden Markov models”. In: Speech Communication 41.4 (2003), pp. 603–623.
[21] Avinash Parnandi et al. “Development of a Remote Therapy Tool for Childhood Apraxia of Speech”. In: ACM Transactions on Accessible Computing (TACCESS) 7.3 (2015), p. 10.
[22] Jonathan Preston and Mary Louise Edwards. “Phonological awareness and types of sound errors in preschoolers with speech sound disorders”. In: Journal of Speech, Language, and Hearing Research 53.1 (2010), pp. 44–60.
[23] Zak Rubin and Sri Kurniawan. “Speech adventure: using speech recognition for cleft speech therapy”. In: Proceedings of the 6th International Conference on PErvasive Technologies Related to Assistive Environments. ACM. 2013, p. 35.
[24] Sangita Sharma et al. “Feature extraction using non-linear transformation for robust speech recognition on the Aurora database”. In: Acoustics, Speech, and Signal Processing, 2000. ICASSP'00. Proceedings. 2000 IEEE International Conference on. Vol. 2. IEEE. 2000, pp. II1117–II1120.
[25] Lawrence D Shriberg, R Paul, and P Flipsen. “Childhood speech sound disorders: From postbehaviorism to the postgenomic era”. In: Speech sound disorders in children (2009), pp. 1–33.
[26] Talker. url: http://speech-trainer.com/children-speech-therapy (visited on 01/15/2016).
[27] Chek Tien Tan et al. “sPeAK-MAN: towards popular gameplay for speech therapy”. In: Proceedings of The 9th Australasian Conference on Interactive Entertainment: Matters of Life and Death. ACM. 2013, p. 28.
[28] The Statistics Portal - Forecast of tablet user numbers in Portugal from 2014 to 2021 (in million users). url: https://www.statista.com/statistics/566416/predicted-number-of-tablet-users-portugal/ (visited on 02/02/2016).
[29] The Statistics Portal - Forecast of the tablet user penetration rate in Portugal from 2014 to 2021. url: https://www.statista.com/statistics/568594/predicted-tablet-user-penetration-rate-in-portugal/ (visited on 02/02/2016).
[30] VITHEA - Virtual Therapist for Aphasia treatment. url: https://vithea.l2f.inesc-id.pt/wiki/index.php/Main_Page (visited on 01/16/2016).