Exploring Phone Recognition in Pre-verbal and Dysarthric Speech

Syed Sameer Arshad

A thesis submitted in partial fulfillment of the requirements for the degree of

Master of Science

University of Washington

2019

Committee:

Gina-Anne Levow

Gašper Beguš

Program Authorized to Offer Degree:

Department of Linguistics

©Copyright 2019

Syed Sameer Arshad

University of Washington

Abstract

Exploring Phone Recognition in Pre-verbal and Dysarthric Speech

Chair of the Supervisory Committee:

Dr. Gina-Anne Levow

Department of Linguistics

In this study, we perform phone recognition on speech utterances made by two groups of people: adults who have speech articulation disorders and young children who are learning to speak. We explore how these utterances compare against those of adult English speakers who do not have speech disorders, training and testing several HMM-based phone-recognizers across various datasets. Experiments were carried out with the HTK Toolkit using data from three publicly available datasets: the TIMIT corpus, the TalkBank CHILDES database and the Torgo corpus. Several discoveries were made towards identifying best practices for phone recognition on the two subject groups, involving the use of optimized Vocal Tract Length Normalization (VTLN) configurations, phone-set reconfiguration criteria, specific configurations of extracted MFCC speech data and specific arrangements of HMM states and Gaussian mixture models.

Preface

The work in this thesis is inspired by my life experiences in raising my nephew, Syed Taabish Ahmad. He was born in May 2000 and was diagnosed with non-verbal autism as well as apraxia-of-speech. His speech articulation has been severely impacted as a result, leaving his speech production as sequences of babbles. He is receptively bilingual in English and Urdu, and communicates via Augmentative and Alternative Communication (AAC) apps on his iPad. He undergoes speech therapy, behavior therapy and special-needs education throughout the week. He has come a long way in life and he gets better at communicating every day. As a linguist and an engineer, I have observed his babbles and read extensively about speech disorders and the way they are treated by speech-language pathologists. Through discussions with my nephew's speech-language pathologist and reading texts on the topic, I identified an under-served aspect of the speech-recognition and speech-therapy research literature that I feel I can contribute to, using the wisdom from my lived experiences as a caregiver for my nephew and as a volunteer worker for the social organizations that he participates in. My hope is that the work in this thesis inspires other researchers to take this kind of research further, in order to create new methods of speech therapy that can help people like my nephew, who is transitioning from childhood to adulthood with a severe speech-articulation disorder as well as a neuro-cognitive disorder that impacts his social communication.

I would like to thank Professors Gina Levow, Emily Bender, Fei Xia, Richard Wright, Ellen Kaisse, Toshiyuki Ogihara and Gašper Beguš for all their help.

Table of Contents

Preface ...... 4
List of Tables ...... 8
List of Figures ...... 11
1. Introduction ...... 12
   1.1 Motivation for this study ...... 13
   1.2 The work of speech-language pathologists ...... 13
   1.3 Phonetic Inventories: Easier said than done ...... 16
   1.4 Using speech-recognition technology in speech-language pathology ...... 18
2. Related Work ...... 21
   2.1 Related studies on child speech production ...... 21
      2.1.1 Related studies on babbling ...... 22
      2.1.2 Speech-recognition for children ...... 24
   2.2 Related studies on dysarthria ...... 25
      2.2.1 Speech-recognition for dysarthric speech ...... 26
   2.3 Studies on phone recognition ...... 27
      2.3.1 Phone recognition with the TIMIT corpus ...... 28
      2.3.2 Other phone recognition approaches ...... 35
   2.4 How our study is different ...... 36
3. Research Questions ...... 37
4. Choice of Tools ...... 40
   4.1 HTK ...... 40
   4.2 Python and Bash ...... 40
   4.3 SoX ...... 41
5. Methodology and Approach ...... 42
6. Gathering and Preparing Data ...... 46
   6.1 The TIMIT Dataset ...... 46
   6.2 The Torgo Dataset ...... 48
   6.3 The Talkbank Dataset ...... 49
      6.3.1 Source Corpora for our TalkBank dataset ...... 49
      6.3.2 Analysis of our Talkbank Dataset ...... 51
      6.3.3 Phone-level statistics on the Talkbank dataset ...... 57
      6.3.4 Processing transcriptions into HTK Label files ...... 62
      6.3.5 Managing the Talkbank dataset phone count ...... 62
   6.4 Converting from WAV to MFCC format ...... 66
   6.5 Splitting: Training sets vs. Testing sets ...... 67
   6.6 Preparing the TalkBank data to be recognized by a TIMIT-trained model ...... 68
      6.6.1 Handling TIMIT diphthongs ...... 69
   6.7 Vocal Tract Length Normalization ...... 70
   6.8 Phone-recognition grammar and pronunciation dictionary ...... 71
7. Experimental variables ...... 73
   7.1 Choice of training data ...... 73
   7.2 Choice of testing data ...... 73
   7.3 Age At Recording ...... 74
   7.4 Language Environment ...... 75
   7.5 Cepstrum ...... 76
   7.6 HMM States ...... 76
   7.7 Gaussian Configuration ...... 77
   7.8 VTLN Warp Factor ...... 78
8. Experiment Planning ...... 79
   8.1 Phase 1: Train on Talkbank, Test on Talkbank ...... 79
   8.2 Phase 2: Train on TIMIT, Test on TIMIT ...... 79
   8.3 Phase 3: Train on TIMIT, Test on Talkbank ...... 80
   8.4 Phase 4: Train on TIMIT, Test on Torgo ...... 80
   8.5 Phase 5: Train on Timit, Test on Talkbank data for optimal VTLN ...... 81
9. Results ...... 82
   9.1 Phase 1: Training on Talkbank, Testing on Talkbank ...... 83
   9.2 Phase 2: Training on TIMIT, Testing on TIMIT ...... 85
   9.3 Phase 3: Training on Timit, Testing on Talkbank ...... 88
   9.4 Phase 4: Training on Timit, Testing on Torgo ...... 92
   9.5 Phase 5: Exploring optimal VTLN settings for Model 28 ...... 92
10. Analysis of Results ...... 114
11. Threats to Validity and Opportunities for Further Study ...... 120
12. Conclusion ...... 124
References ...... 125
Appendices ...... 135
   Appendix A: MFCC Configuration ...... 135
   Appendix B: HTK Commands Used ...... 136
      HCopy ...... 136
      HCompV ...... 136
      HVite ...... 136
      HInit ...... 136
      HRest ...... 136
      HERest ...... 137
      HResults ...... 137
   Appendix C: HMM Prototype ...... 137
   Appendix D: HTK Grammars ...... 138
   Appendix E: Pronunciation dictionaries ...... 139

List of Tables

Table  Caption  Page

1a  Stop closures in the TIMIT phone set.  30
1b  Stop releases in the TIMIT phone set.  30
1c  Fricatives in the TIMIT phone set.  30
1d  Semivowels and glides in the TIMIT phone set.  30
1e  Nasals in the TIMIT phone set.  31
1f  Vowels in the TIMIT phone set.  31
1g  Affricates in the TIMIT phone set.  32
1h  Other symbols in the TIMIT phone set.  32
2  Replacement criteria for phone-folding used in Lee & Hon, 1989.  33
3a  The sequence of actions for Pipeline A.  43
3b  The sequence of actions for Pipeline B.  44
4  The conversion and folding mechanism that was used in our study to relabel the transcriptions for the TIMIT and Torgo datasets.  47
5a  The recording approaches that were used by the researchers that created each of the Talkbank corpora we made use of in our study.  51
5b  A summary of what we know about all the data we extracted from CHILDES, which we are calling the TalkBank dataset for the purposes of this study.  53
5c  Further details discovered about our TalkBank dataset using scripting tools to analyze it.  53
5d  Age and gender information on every participant whose speech data we use in our Talkbank dataset.  53-54
5e  Amounts of speech contributed to the Talkbank dataset by each source corpus.  55
5f  Inferences made about our Talkbank dataset from our analysis.  55-57
6  The 25 most popular IPA symbols (and monophones) in our Talkbank dataset.  58
7  The 25 most common diphones in our Talkbank dataset.  59
8  The 25 most common triphones in our Talkbank dataset.  60
9  The 55 most common transcriptions in our Talkbank dataset.  61
10  The 11 different ways that the unrounded near-low front vowel [æ] presents itself in the Davis corpus, with each one counting as a unique phone.  63
11  The "integrated" phone set for our Talkbank dataset containing 83 base phones.  64
12  The 28 rejected phones from the integrated phone set, with rejection reasons.  65
13  The filtered phone set for our Talkbank dataset, containing 57 symbols.  66
14  The replacement criteria for "timiting" phones that existed in the TalkBank dataset but not in the TIMIT dataset, so that fair-enough comparisons can be made with the results of a TIMIT-trained phone-recognizer working on TalkBank data.  69
15  Criteria for reconciling diphthongs between datasets.  70
16  The different MFCC configurations used in our experiments.  76
17  The results from our Phase 1 experiments.  84
18  The results from our Phase 2 experiments.  86
19  Correctness values by phone, for the results of Experiment 28, which gave us the best results for TIMIT-trained phone recognition against the TIMIT testing dataset.  87
20  The results for the first set of our Phase 3 experiments, where we use Model 22 and Model 28 of our phone-recognizer to test against our Talkbank testing dataset as well as our entire Talkbank dataset.  88
21a  The results for the second set of our Phase 3 experiments, where we explore the performance of Model 28 by language environment, for our Talkbank testing dataset and our entire Talkbank dataset.  89
21b  Results for a Model 28 style HMM, trained and tested on Talkbank, via Pipeline A.  90
21c  Results of revised experiments where proper downsampling was performed on the speech samples coming from the English and French environments.  91
22  The results for our Phase 4 experiments, where we explore the performance of Model 28 across our entire Torgo dataset, as well as its "Patients" and "Controls" subsets.  92
23  The results for the first set of experiments we performed for Phase 5. We explored 9 values for VTLN Warp Factor and 17 values for Age At Recording for the entire Talkbank dataset, containing data from all three language environments, running Model 28 on all scenarios that were applicable.  93-97
24  The number of human-transcribed phones clustered by Age At Recording and Language Environment.  100
25  The results for the second set of experiments we performed for Phase 5. We explored 9 values for VTLN Warp Factor and 17 values for Age At Recording for the English-environment subset of the entire Talkbank dataset, running Model 28 on all scenarios that were applicable.  102-104
26  The results for the third set of experiments we performed for Phase 5. We explored 9 values for VTLN Warp Factor and 17 values for Age At Recording for the French-environment subset of the entire Talkbank dataset, running Model 28 on all scenarios that were applicable.  104-106
27  The results for the fourth set of experiments we performed for Phase 5. We explored 9 values for VTLN Warp Factor and 17 values for Age At Recording for the German-environment subset of the entire Talkbank dataset, running Model 28 on all scenarios that were applicable.  106-110

List of Figures

Figure  Caption  Page

1  From Selby et al., 2012, a table showing vowels that have a 75% chance of occurrence among 4 English-environment children, across five age milestones during the toddler years.  23
2  Visualization of results for the first set of our Phase 5 experiments, showing how correctness changes across values for VTLN Warp Factor and the lower bound of "Age At Recording", defined as "Last Age Milestone", across all language environments.  99
3  Bar graph showing how phones in the Talkbank dataset cluster by language environment and the lower bound of Age At Recording, defined as "Last Age Milestone".  101
4  Visualization of results for the second set of our Phase 5 experiments, in the English language environment, showing how correctness changes across values for VTLN Warp Factor and the lower bound of Age At Recording, defined as "Last Age Milestone".  111
5  Visualization of results for the third set of our Phase 5 experiments, in the French language environment, showing how correctness changes across values for VTLN Warp Factor and the lower bound of Age At Recording, defined as "Last Age Milestone".  112
6  Visualization of results for the fourth set of our Phase 5 experiments, in the German language environment, showing how correctness changes across values for VTLN Warp Factor and the lower bound of Age At Recording, defined as "Last Age Milestone".  113

1. Introduction

Human speech is a complex feat involving the coordination of multiple systems of articulation and acoustics to accomplish very specific vocalization tasks in the context of a linguistic environment. Many of us take this process for granted and are only confronted by the complexity of this achievement when something goes wrong somewhere.

This study explores the use of computers in recognizing the speech utterances of people who have dysarthria and also the speech utterances of young pre-verbal children. This is an understudied aspect of speech-recognition research, which usually caters to commercial applications involving the speech of able-bodied adults. Children are not expected to call customer service and navigate a spoken-dialog system, and adults and children with speech-articulation disorders are not the target market for voice-controlled intelligent assistants. Moreover, even when children without speech-articulation disorders are targeted as users for any of these systems, they are expected to be fully verbal children beyond the age of 7. Therefore, we have found very little literature on the subject of speech recognition for these speakers. Before embarking on the approach of this study, it is important to provide some motivation for the concept, which will shed light on why the approaches described in this paper were taken.

1.1 Motivation for this study

Speech articulation disorders such as dysarthria and apraxia-of-speech remind us that many things have to go right for human speech to happen. Speech-language pathology is the field that focuses on studying these disorders and on providing services that help patients improve their speech and language production skills within their social environments. The field is relatively new but has an extensive library of publications and research, as well as a professional association, the American Speech-Language-Hearing Association (ASHA).

1.2 The work of speech-language pathologists

When talking about how speech therapists work with their patients, we need to keep in mind how painstaking this kind of work can be. An SLP therapy session usually lasts one hour; an assessment can run as long as three hours, with interactions navigated carefully around the ongoing communication challenges faced by the patient (Paul & Norbury, 2012).

We know from the speech-language pathology literature (Paul & Norbury, 2012) that studying the developmental pathway of children without speech disorders can give us many clues on how to treat people who have speech disorders. Research (Paul & Norbury, 2012) has shown that the sequence of achievement milestones that speech-disorder patients pass through is broadly similar to the developmental pathway of young children who are learning to speak. This suggests how important it can be for a speech-language pathologist to know how both scenarios play out, in order to plan speech-therapy sessions accordingly.

When a patient presents with speech sound disorders of any kind, part of the initial assessment involves acquiring a phonetic inventory. A phonetic inventory is the list of phones that the patient can produce, which the SLP has verified by hearing them a certain number of times (sometimes just once) in a specific sample of speech. The speech sample is collected by either recording or observing about 10 to 15 minutes of speech events from the patient. These speech events could be units of speech elicited by the clinician, or units of connected speech that occur in a conversation with the clinician. Any elicitation or guided conversation conducted for this purpose is done with the goal of getting the patient to produce sounds across the phoneme inventory of the language of instruction, which is usually English in the United States. However, ASHA currently recommends being sensitive to patients who come from bilingual and multicultural families (Paul & Norbury, 2012), and there is some documented research effort to take bilingualism into account when acquiring a phonetic inventory (Paul & Norbury, 2012). Some clinicians in the United States go further and acquire a "phonemic" inventory, which accounts for whether or not the patient has acquired the expected allophone distributions for the phonemes in their dialect of English.

Getting this representative list of the phones that the patient can produce is an efficient starting point for teaching new phones. The list can also be used to help patients learn to imitate the phones they can already produce, especially when someone produces those phones in their presence in an elicited repetition task. We know such tasks are central to early-stage speech therapy activities (Paul & Norbury, 2012).

However, for some patients who have neurocognitive disorders like non-verbal autism, preparing a phonetic inventory can be challenging. Sometimes a clinician can neither elicit any speech from the patient nor engage the patient in conversation. Attempts to do so may be met with smiles and nods, but no elicited speech. If there is a speech event, it may not be imitating anything the clinician said and may not be intelligible either; it could be a couple of seemingly unrelated interspersed syllables, a squeal or even just laughter. Ideally, an SLP clinician would have to look up the known phonemes of all languages in the patient's language environment, even languages the clinician doesn't know. This would be important, especially in English-speaking countries, so as not to come up with a phonetic inventory that is English-centric. Some phones do not exist in English, and they should be looked out for (and looked after) because the goal is to align the patient with their language environment.

An example of a typical patient would be a verbal child who is semi-intelligible, or a talkative child who has trouble articulating rhotic sounds. However, a patient who has non-verbal autism and/or childhood apraxia of speech cannot be expected to produce, upon elicitation, all the phones they are capable of producing within the 15 minutes of contiguous time that an SLP would have allotted to acquiring their speech sample. This means that any attempt by an SLP to acquire a phonetic inventory in the regular manner would lead to a very incomplete list; it is impractical to proceed the way a clinician would for a typical patient.

Once a phonetic inventory is acquired, other diagnostic criteria are checked before the SLP makes a diagnosis of whether or not a speech articulation disorder exists. If a language impairment also exists, a therapy plan is made to work on acquiring proficiency with various language forms while also acquiring phones that are missing from the patient's repertoire (Paul & Norbury, 2012).

1.3 Phonetic Inventories: Easier said than done.

There is a lot of useful data for producing a phonetic inventory in a patient's natural day-to-day spoken utterances. Sometimes the patient's caregivers are interviewed about which phones they have heard in the patient's past speech, but this is not verifiable first-hand information, and it is prone to error because caregivers are not trained to hear and judge phonetic transcriptions. When a reliable phonetic inventory cannot be acquired in an elicited clinical setting, some clinicians may opt to spend several hours with the patient in different naturalistic situations and social environments, such as the home, the classroom or a hobby activity, monitoring the patient's speech utterances. When the clinician cannot spend this kind of time with the patient, the caregivers are sometimes asked to record the patient's speech over a prolonged period of time, across different activities, and provide those recordings to the clinician. Recordings like these require the clinician to spend a lot of time analyzing and transcribing the patient's speech; a minute of recorded speech can take several minutes to analyze and/or transcribe. Most clinicians will avoid trying to tally every instance of every phone and instead will check for a phone occurring once or twice to confirm that the patient can produce it. Most clinicians stop transcribing after hearing around thirty minutes of recorded speech. Even though this gives better data than trying to work with 15 minutes of mostly silence from a recorded therapy session, it is still quite a burden.

Sometimes a clinician will outsource this work to an assistant. In spite of this, even thirty minutes of a patient's recorded speech arranged by a caregiver can be unrepresentative of the articulated sounds the patient is capable of; there is no way to predict which possible articulations will happen at any given time of day. A caregiver could be asked to record hours and hours of patient speech, but that would add a further burden on the clinician (or their assistants) to transcribe it. The cost of this burden gets added to the charges for the clinician's services, which either has to be approved by an insurance provider or paid for out-of-pocket by the patient's family. So a compromise needs to be struck between the quality of the data and the expense incurred to acquire it. We can see here that even the basic initial task of collecting a phonetic inventory for children with non-verbal autism is fraught with obstacles that cannot be practically overcome. This is a contributing factor towards why SLP clinicians generally do not recommend speech-production therapy for patients with non-verbal autism, opting instead to focus on language therapy that does not involve the vocal tract, relying on sign language, the Picture Exchange Communication System (PECS) (Bondy & Frost, 1994) and speech-producing AAC devices to take its place.

1.4 Using speech-recognition technology in speech-language pathology

With the availability of mature, usable speech recognition toolkits such as HTK and Kaldi, more opportunities to apply speech recognition to novel problems have emerged. However, most speech recognition systems rely on a language model built from known word-occurrence patterns for the target language, and an acoustic model built from an extensive phonetically transcribed audio dataset that captures how words actually sound. These models are typically combined into a Hidden Markov Model (HMM) lattice to perform phone-by-phone and/or word-by-word recognition of utterances that are assumed to be intelligible to a human listener in the language environment being considered. Sometimes, other machine learning approaches such as neural networks are applied on top to improve accuracy and exploit domain-specific patterns in the data.
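In equation form, such a recognizer searches for the symbol sequence (words or phones) W that best explains the observed acoustics O, with the acoustic model supplying P(O | W) and the language model supplying P(W); this is the standard decoding objective rather than anything specific to the systems discussed here:

```latex
\[
\hat{W} \;=\; \arg\max_{W} \; P(O \mid W)\, P(W)
\]
```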

A literature review on the intersection between speech recognition and speech-language pathology (SLP) yields only a few instances where speech recognition was used in the realm of SLP. Many of these ideas involve apps that computationally compare the speech samples of word utterances provided by patients to recorded samples of exemplar speech from speakers who don't have disorders. A feedback interaction is provided to the patient to let them know how close their utterance was to the defined ideal for intelligibility training. This is done as an at-home activity that a clinician would ask a patient to carry out independently between sessions. For patients with non-verbal autism and other neurocognitive disorders that can impact speech production, a comparison-based approach like this is not something that can be used to build a phonetic inventory, due to the aforementioned struggle to elicit structured responses from the patient. One would require long durations of naturalistic speech recorded (with caregiver consent) via a wearable audio recording device, such as the numerous models that are affordably available via e-commerce platforms. This would be paired with a tool that provides a phonetic transcription of a prolonged duration of recorded speech from the patient, if the transcription has a high enough accuracy to be reliable.

Spending three or more hours on manually transcribing thirty minutes of babble speech is a difficult thing to get an insurance provider to approve payments for, when three other patients could receive one-hour therapy sessions for the same amount of billable time. But an affordably-priced automated transcription system that can do the same task in minutes without human supervision can definitely free up the clinician (and their assistants) for other tasks. A tool like this can be provided to speech-language pathologists as a cloud-based service, allowing many hours of patient recordings to be uploaded by caregivers and quickly processed by modern cloud-based computer infrastructure to extract phonetic transcriptions, which can be analyzed to produce a phonetic inventory for the patient in a format that can be consumed for analysis by the speech-language pathologist. This allows a better-informed speech-production therapy plan to be prepared. The entire process can be repeated after some amount of therapy has been administered, to see if target phones are being produced spontaneously by the patient in unsupervised settings.

This will have to be a purpose-built piece of technology that is different from regular consumer-level speech recognition systems, which are designed to transcribe and detect entire words and sentences. The patients being discussed here are not uttering sentences or even intelligible words. But they are uttering phones that can be detected.

2. Related Work

Due to the multi-disciplinary nature of this work, we reference literature from multiple fields, including language acquisition, speech-language pathology, speech recognition, and phonetics.

2.1 Related studies on child speech production

The acquisition of speech production has classically been studied under two theoretical approaches: one based on "competency" and one based on "performance" (Vihman, 1996). Studies focusing on competency (see the review in Goldsmith, 1990) involve principles from generative phonology, distinctive features, Universal Grammar and innate phonological knowledge. Studies focusing on performance (Kent, 1984; Lindblom et al., 1993) involve acoustic phonetics, the biological aspects of motor articulation for speech, and the functional and perceptual aspects of speech production. In the context of an automatic speech recognition project like ours, a competence-based phonological approach is difficult to pursue because it involves attempting to measure what is going on at the phonological level. Focusing on physically measurable phonetic information is a more tractable way to explore this space computationally, particularly when the speech involves young children who are still learning language. One study contrasting the two approaches (Davis et al., 2002) concluded that phonological approaches "do not adequately establish the proposed mental entities in infants of this age, and are nonexplanatory in the sense of not considering the causes of the structures and constraints that they posit". In the context of speech-language pathology, we want to follow approaches that yield insights about what is going right and what is going wrong, and a performance-based phonetic approach provides these insights.

2.1.1 Related studies on babbling

Studying the babbling of children has proven useful for predicting future speech and language disorders (Oller et al., 1999). Extensive investigations of infant speech production (Oller, 1980) and the organization of babbling (Davis & MacNeilage, 1994) have yielded insights into the various acoustic and articulatory shapes and forms that canonical child babbles can take in early life. Moreover, from observing the surprise responses of infants to new sound sequences, we now know that even 8-month-old infants can infer the segmentation of words from fluent English speech after just 2 minutes of exposure (Saffran et al., 1996). There have also been longitudinal studies on the vocalizations of young children in English environments, such as Selby et al., 2000, that document the timeline of phones emerging into use. In a quantitative study (Davis & MacNeilage, 1990), Davis & MacNeilage reported that high front vowels are favored in the environment of alveolar consonants, hinting at consonant-vowel interdependence in the 14-to-20-month age range. Figure 1 below (taken from Selby et al., 2000) shows which English vowels emerge in the 15-to-36-month range, across the 4 young children participating in the study.

Figure 1: From Selby et al., 2012, a table showing vowels that have a 75% chance of occurrence among 4 English-environment children, across five age milestones during the toddler years.

One of the most definitive studies on this topic is Davis et al. (2002), which collected extensive data on child babbling that we also use in this thesis. This study analyzed recorded data from children between the ages of 9 months and 25 months. Their findings, which are oriented towards children in an English-language environment, are quite revealing. 58.7% of the utterances were monosyllables while 34.1% were disyllables. The most dominant consonant places-of-articulation were labials (41.3%), coronals (36.1%) and dorsals (9.94%), with the remaining sounds mostly being glottal stops and other sounds. The most dominant manners-of-articulation were oral stops (55.3%), nasals (21.6%) and glides (8.5%), with the rest (14.4%) being liquids, fricatives, affricates and glottal stops as well as the [h] sound. Of these, glottal stops and [h] accounted for 6.1% and fricatives, affricates and liquids accounted for 8.3%. When it came to vowel height, the vowels were similarly distributed across the high, mid and low dimensions with little variability across the children. When it came to vowel frontness, central vowels were the most common (37.6%), followed by front vowels (36.3%) and then back vowels (23.5%). Similar studies have emerged to shed light on the speech development of children living in other language environments, such as Persian (Fotuhi et al., 2016) and Brazilian Portuguese (Teixeira & Davis, 2002). Follow-up work has focused on children in bilingual environments, such as a study involving fraternal twins growing up in an English-Serbian language environment (Zlatić et al., 1997).

2.1.2 Speech-recognition for children

Throughout the 1990s and early 2000s, the body of work on speech recognition for children was small. We first see the research community publishing about it in 1996 (Russell et al., 1996) and noticing that it is something to focus on specifically due to higher error rates (Wilpon & Jacobsen, 1996). The first dedicated papers on the topic emerged in 1997 (Potamianos et al., 1997), with further investigations into robust speech recognition for children emerging in the early 2000s (Potamianos & Narayanan, 2003). These papers note the distinctly higher levels of speech-recognition errors that occur on the speech of children under 18, using models trained on adult speech, and observe that these errors decrease as a child approaches young adulthood. They also discuss the vowel-space shape differences between adults and children, along with differences in vocal-tract length and variability in acoustic features.

Almost none of these papers focus on babbles and pre-verbal speech from children who are learning to talk, mostly because the commercial applications of speech recognition for this age group are not completely convincing to researchers. There has been recent research on "child-directed speech" and on creating computational models of how young children could learn to speak. In 2016, Ma et al. built a phone-embedding model (Ma et al., 2016) to learn word segmentation and phonological structures from a phonetically transcribed CHILDES dataset of child-directed speech from mothers. Such models give us interesting clues on how children could actually be learning to speak the language of their caretakers, but they may not be useful to speech-language pathologists, who are working at a different level than people making computational models of child language acquisition.

2.2 Related studies on dysarthria

Dysarthria is defined by ASHA as a speech disorder that occurs due to muscle weakness in the face, lips, tongue, throat and in the muscles for breathing (Paul & Norbury, 2012). There are several types of dysarthria, classified by the impacted area of the vocal tract and the specific neurological disturbance that causes the impact. Sometimes, multiple components in the respiratory, laryngeal and supralaryngeal articulatory subsystems can be impacted (Kent et al., 1999), which can happen in conditions like Parkinson's disease, amyotrophic lateral sclerosis and stroke (Kent et al., 1999). The symptoms of this condition can vary across speakers, but they include "strained phonation, imprecise placement of the articulators, incomplete consonant closure resulting in sonorant implementation of many stops and fricatives, and reduced voice onset time distinctions between voiced and unvoiced stops" (Kent et al., 1999).

There have been important works of research focusing on the acoustic qualities of dysarthric speech since the 1980s. In a 1989 study, Kent et al. (1989) identified 19 acoustic-phonetic contrasts likely to be affected by dysarthric impairment in ways that would influence speech intelligibility. The paper goes on to systematically provide a structure for intelligibility testing for dysarthric speech. Follow-up research emerged in the 1990s with important papers outlining seminal studies in acoustic analysis of speech in the context of detecting speech disorders (Kent, 1990; Kent & Read, 1992) and further methods of acoustic study for dysarthria in particular (Kent et al., 1999).

Research continued into the 2000s with a focus on specific types of dysarthria, such as spastic dysarthria (Roy et al., 2001). At this point, this psychobiology and phonetics research was able to inform the speech-pathology field about how to treat the various forms of dysarthria. For example, in Roy et al. (2001), the researchers discuss identifying the "loci of intelligibility deficit", the "features of deviant speech whose improvement would lead to the greatest gains in treatment" and the expected changes that lead to an improvement over a 30-month recovery period. As this landscape was mapped out, several studies on the topic emerged within the speech-pathology field (Finch et al., 2018). A recent review of the speech-pathology literature for non-progressive forms of dysarthria can be found in Finch et al. (2018).

2.2.1 Speech-recognition for dysarthric speech

Early work during the 1980s on this topic recognized the distinctly lower recognition accuracy for dysarthric speech (Rodman et al., 1985) and proposed several remedies that did not take the articulatory behavior of dysarthria into account (Rudzicz, 2012). Some of the most compelling research emerged in the 2000s and 2010s, where novel speaker-adaptation methods and signal-processing adjustments were employed to improve the accuracy of speech recognition for dysarthric speech. Hosom et al. (2003) attempted to use Gaussian-mixture mapping techniques to transform the audio mathematically into a more intelligible form. Kain et al. (2007) proposed concatenating original unvoiced segments with synthesized voiced segments to improve intelligibility. Tolba & Torgoman (2009) improved intelligibility by running transformations on the first and second formants in dysarthric speech. Rudzicz (2013) proposed a sequence of reverse transformations on the speech data, derived from mathematical models of the dysarthria-specific articulations themselves, while Sharma & Hasegawa-Johnson (2013) used a novel speaker-adaptation algorithm in conjunction with the maximum a posteriori (MAP) algorithm to bring an improvement in intelligibility for dysarthric speech. Around this time, the TORGO experiments at the University of Toronto (Rudzicz et al., 2012) focused on creating a publicly available dataset of dysarthric speech (which we make use of in this thesis) with phonetic transcriptions and phone-aligned timing information, as well as articulatory information captured through various sensors placed on the mouth and face. These novel approaches have led to exciting breakthroughs in reducing error rates while also contributing insights to the speech-language pathology domain (Finch et al., 2018).

2.3 Studies on phone recognition

Phone recognition, as a speech recognition problem, is quite different from word recognition. Both involve using large amounts of transcribed utterances to produce models of uttered speech that can be used to detect forms. However, words have the added complexity of having alternate pronunciations. They also benefit from extra predictive power via n-gram techniques that exploit information on how common a word is, and how common sequences of words are, in the language in question. Word recognition can also be constrained to specific domains based on the nature of the problem being solved. For example, one can create a special-purpose digit-recognition system that specializes in recognizing the ten English digit words, knowing that a speaker has been asked to limit their utterances to digit words. This kind of domain-specific constraint can rarely be arranged for phone recognition scenarios.

Furthermore, phone recognition for people with disordered speech becomes a more difficult problem because most of the datasets available for preparing language models and acoustic models come from people who do not have speech disorders. For patients with speech disorders and for young pre-verbal children, it is important to correctly identify what was produced in the context of what was intended to be produced. However, even intention is sometimes hard to determine, especially when we are trying to do phone recognition for very young children who are naturally producing canonical babbles. In all these situations, language models do not help us much, and all we can do is prepare an acoustic model that attempts to accurately transcribe what was said. Preparing such an acoustic model requires an extensive phonetically transcribed speech dataset that is also phone-aligned.

2.3.1 Phone recognition with the TIMIT corpus

Many researchers in the academic and corporate worlds have built their own such datasets, which are not publicly available. Among the publicly available datasets, the most widely cited and most widely known is the TIMIT dataset (Garofolo et al., 1993). The TIMIT dataset provides speech data from 630 speakers across 8 dialect regions of American English, each of whom reads 10 sentences, providing 6300 utterances. The prompts for these sentences consist of 2 dialect sentences (SA), 450 phonetically compact sentences (SX) and 1890 phonetically diverse sentences (SI). Everything is fully transcribed at the word level and the phone level, with time alignments provided for both words and phones. This dataset took years of effort to enrich to this level of detail and has become a de-facto standard for benchmarking speech recognition experiments and tasks. Very few other publicly available datasets exist with this level of detail and richness. The phones of the TIMIT dataset are shown in Tables 1a to 1h below, alongside their International Phonetic Alphabet equivalents.

TIMIT phone   IPA equivalent   Notes
bcl           [b̚]              Just the closure portion
dcl           [d̚]
gcl           [ɡ̚]
pcl           [p̚]
tcl           [t̚]
kcl           [k̚]
dx            [ɾ]              Tap
q             [ʔ]              Glottal stop
Table 1a: Stop closures in the TIMIT phone set.

TIMIT phone   IPA equivalent   Notes
b             [b]              Just the release portion
d             [d]
g             [g]
p             [p]
t             [t]
k             [k]
Table 1b: Stop releases in the TIMIT phone set.

TIMIT phone   IPA equivalent
s             [s]
sh            [ʃ]
z             [z]
zh            [ʒ]
f             [f]
th            [θ]
v             [v]
dh            [ð]
Table 1c: Fricatives in the TIMIT phone set.

TIMIT phone   IPA equivalent
l             [l]
r             [ɹ]
w             [w]
y             [j]
hh            [h]
hv            [ɦ]
el            [l]
Table 1d: Semivowels and glides in the TIMIT phone set.

TIMIT phone   IPA equivalent   Notes
m             [m]
n             [n]
ng            [ŋ]
em            [m]              Syllabic
en            [n]              Syllabic
eng           [ŋ]              Syllabic
nx            [ɾ]              Tap
Table 1e: Nasals in the TIMIT phone set.

TIMIT phone   IPA equivalent   Notes
iy            [i]
ih            [ɪ]
eh            [ɛ]
ey            [eɪ]             Diphthong
ae            [æ]
aa            [ɑ]
aw            [aʊ]             Diphthong
ay            [aɪ]             Diphthong
ah            [ʌ]
ao            [ɔ]
oy            [ɔɪ]             Diphthong
ow            [oʊ]             Diphthong
uh            [ʊ]
uw            [u]
ux            [ʉ]
er            [ɝ]              Rhotic and syllabic
ax            [ə]
ix            [ɨ]
axr           [ɚ] or [ɹ]       Rhotic and syllabic
ax-h          [ə]              Devoiced
Table 1f: Vowels in the TIMIT phone set.

TIMIT phone   IPA equivalent
jh            [dʒ]
ch            [tʃ]
Table 1g: Affricates in the TIMIT phone set.

TIMIT phone   IPA equivalent   Notes
pau           N/A              Pause
epi           N/A              Epenthetic silence
h#            N/A              Begin/End marker
Table 1h: Other symbols in the TIMIT phone set.

Much of the speech recognition work that has been done with the TIMIT corpus involves the use of Hidden Markov Models (HMMs). Sometimes they are used alone (Lee & Hon, 1989), sometimes in a hybrid configuration with artificial neural networks (ANNs) (Rose & Momayyez, 2007; Scanlon et al., 2007; Siniscalchi et al., 2007), and sometimes in a hybrid configuration with Conditional Random Fields (CRFs) (Morris & Fosler-Lussier, 2008).

If we are going to explore phone recognition on the most widely known and most widely used phonetically transcribed and time-aligned speech recognition dataset, it is important to discuss the history of phone recognition with this dataset in particular, something that was the subject of a review paper by Lopes & Perdigão (2011). In 1989, Lee and Hon (Lee & Hon, 1989) conducted phone recognition experiments on the newly released TIMIT dataset using discrete HMMs, linear prediction cepstral coefficients as features and a bigram language model, with phones from the TIMIT dataset being modelled as 1450 right-context diphones. The 61 TIMIT phones (shown in Tables 1a to 1h) were folded into each other to form 39 phone classes that would be the recognition targets. Table 2 below shows the mapping of how this folding was carried out. It is relevant to share this table here because the technique used in our study took inspiration from this material.

Before                                        After
aa, ao                                        aa
ah, ax, ax-h                                  ah
er, axr                                       er
hh, hv                                        hh
ih, ix                                        ih
l, el                                         l
m, em                                         m
n, en, nx                                     n
ng, eng                                       ng
sh, zh                                        sh
uw, ux                                        uw
pcl, tcl, kcl, bcl, dcl, gcl, h#, pau, epi    sil
q                                             - (removed)
Table 2: Replacement criteria for phone-folding used in Lee & Hon, 1989.
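A minimal sketch of this folding, written as a Python mapping that mirrors Table 2, is shown below; it is illustrative rather than the exact relabeling script used in our pipeline (our own conversion criteria are given in Table 4).

```python
# Sketch of the Lee & Hon (1989) phone folding summarized in Table 2.
# Assumes a transcription is a list of TIMIT phone labels (strings).

FOLD_MAP = {
    "ao": "aa",
    "ax": "ah", "ax-h": "ah",
    "axr": "er",
    "hv": "hh",
    "ix": "ih",
    "el": "l",
    "em": "m",
    "en": "n", "nx": "n",
    "eng": "ng",
    "zh": "sh",
    "ux": "uw",
    # Closures, silences and markers all fold to a single "sil" symbol.
    "pcl": "sil", "tcl": "sil", "kcl": "sil", "bcl": "sil",
    "dcl": "sil", "gcl": "sil", "h#": "sil", "pau": "sil", "epi": "sil",
}
DROPPED = {"q"}  # the glottal stop is removed entirely


def fold_phones(phones):
    """Map a 61-phone TIMIT transcription onto the 39 phone classes."""
    return [FOLD_MAP.get(p, p) for p in phones if p not in DROPPED]


print(fold_phones(["h#", "sh", "ix", "hv", "eh", "dcl", "jh", "ih", "dcl", "d", "h#"]))
# -> ['sil', 'sh', 'ih', 'hh', 'eh', 'sil', 'jh', 'ih', 'sil', 'd', 'sil']
```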

With this adjustment, 23 TIMIT phones are removed from the transcriptions while the "sil" symbol representing silence is introduced. After training their TIMIT phone recognizer against the resulting 39 identification targets, Lee and Hon achieved a correctness¹ rate of 73.80% and an accuracy² rate of 66.08% using 160 utterances from one test set. In 1992, Young revisited this work, performing phone recognition experiments using a state-tying approach with triphone models (Young, 1992), while switching to a different feature-extraction method involving standard Mel-frequency cepstral coefficients (MFCCs), their log energy and their first-order regression coefficients (deltas). The best results from this experiment were 73.7% correctness and 59.9% accuracy on 160 sentences randomly picked from the TIMIT test set. In the literature review carried out for our study, we found that for a purely HMM-based phone recognition system, these were the best results so far; anything better was obtained either without an HMM or through hybrid methods combining an HMM with another machine learning mechanism, such as neural networks or conditional random fields. These results from HMM-only experiments have set a benchmark, and we will use them to assess our own work in this study.

There have been several other attempts to improve upon phone recognition on the TIMIT dataset. In their review paper, Lopes & Perdigão (2011) provide extensive information on various attempts to use non-HMM and HMM-hybrid approaches after Lee & Hon (1989) that report better results. We refer the reader to that paper for descriptions of these experiments and their results, which give a good history of the research on phone recognition with the TIMIT dataset from 1989 to 2011. More recently, the experiments done by Lohrenz et al. (2018) involved the use of convolutional neural networks (CNNs) and HMMs to achieve a correctness rate of 83.09%, without the use of a language model to provide context, for the same 39 phones as identification targets. There have also been experiments that do not feature HMMs at all, such as the work from Abdel-Rahman et al. (2011), where deep belief networks (DBNs) alone are able to obtain a 77% accuracy in phone recognition on the TIMIT test set.

¹ Correctness: the percentage of reference phones that were correctly identified.
² Accuracy: the percentage of reference phones correctly identified, further penalized for insertion errors; the exact formulas are given below.
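Both metrics are the standard ones computed by HTK's HResults tool: with N the number of reference phone labels, D the number of deletions, S the number of substitutions and I the number of insertions,

```latex
\[
\text{Correctness} \;=\; \frac{N - D - S}{N} \times 100\%,
\qquad
\text{Accuracy} \;=\; \frac{N - D - S - I}{N} \times 100\%.
\]
```

Accuracy can therefore be much lower than correctness (and even negative) when a recognizer inserts many spurious phones, which is worth keeping in mind when interpreting the results reported later in this thesis.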

2.3.2 Other phone recognition approaches

Due to the extensive popularity of the TIMIT corpus, most phone recognition studies in English focus on it for the purpose of benchmarking and comparisons. To find studies that do not involve the TIMIT corpus, one needs to look at phone recognition work done on other languages. Recent work on Italian phone recognition (Cosi, 2016) employed deep neural networks via the Kaldi toolkit (Povey et al., 2011) on a 29-phone set for Italian speech data from the ArtiPhon corpus (Badino, 2016). A lot of the work has focused on cross-lingual experiments and multilingual phone models. For example, Lamel & Gauvain (1993) trained phone recognition HMMs on speech data from the WSJ-CSR corpus (Paul & Baker, 1992), which is in English, and the BREF corpus (Lamel et al., 1991), which is in French. Their results suggested that phone recognition trained and tested on French leads to better results (23.6% error with the BREF corpus) than a similar approach on English (30.1% error with the WSJ-CSR corpus), though this might simply be due to the BREF dataset having transcriptions and recordings of better quality. A more extensive study involving multilingual phone models appears from Köhler (2001), where corpora from six languages (American English, French, German, Italian, Portuguese, Spanish) drawn from the SpeechDat database (Hoge et al., 1997) and the Macrophone database (Taussig & Bernstein, 1994) are used to build 232 language-dependent, context-independent phone models, which are then clustered and organized in three different ways to see which approach yields the best results using an HMM-based phone-recognizer.

2.4 How our study is different

Our study attempts to explore how similar the phone recognition results for child speech are to the phone recognition results for disordered speech, drawing inspiration from the speech-language pathology literature (Paul & Norbury, 2012), which suggests that the therapy pathway for people who have disordered speech has parallels to the speech-acquisition pathway of very young children. We want to determine whether phone recognition performance is similar for both sets of subjects when the same phone recognition model is applied to both scenarios. We limit our scope to dysarthria as the single speech disorder we focus on, and we purposely focus on adults with dysarthria rather than children with dysarthria. We will attempt to make comparisons with other language environments where we can acquire suitable datasets within our budget. We will also explore best practices for adapting the child speech data in a manner that allows a fair comparison with adult dysarthric speech.

3. Research Questions

We can see from the history of phone recognition experiments above that the current state of the art (Lohrenz et al., 2018) still has a long way to go before a phone recognition system for the English language can be considered reliable enough for clinical work (the best-case correctness is 83.09%). Moreover, this body of research mostly focuses on English language environments, and the voices in the TIMIT corpus are all from adults who do not have speech disorders.

Because this is a Master's thesis, we did not set out to create a system that can beat the phone recognition results of state-of-the-art systems, in multiple languages, for child speech or for disordered speech; such a task is out of scope in the present work. Instead, we focus on using the TIMIT corpus and a purpose-built speech recognition system to inform us on how "recognizable" the speech of young children and the speech of people with speech disorders really can be, using the TIMIT corpus as a starting point for our training data.

In our search for datasets containing the transcribed speech of young children as well as the transcribed speech of people with speech disorders, we were unable to find a publicly available dataset that was as voice-diverse as the 630 speakers of the TIMIT corpus.

What we did find was the CHILDES database (MacWhinney, 2000), which provides access to child speech data from various child language acquisition studies published in various journals over the years. Some of these provide phonetic transcription at the phone level, but none of them provide time-aligned phonetic transcription. We sifted through many of these datasets and identified four that we could use in our study as representations of canonical babbling in childhood, from various children at different points of early life. We go into detail on these datasets later in this paper.

We also found the TORGO corpus (Rudzicz et al., 2012), prepared by the University of Toronto for the purposes of a study of the speech articulation of people who have dysarthria. A portion of the phonetically transcribed recordings of elicited utterances from these patients with dysarthria was made available to the public by the TORGO research team via their team website. This material proved useful as a representation of disordered speech for the purposes of our study.

We were able to use our own University of Washington membership in the Linguistic Data Consortium (LDC) to get access to TIMIT corpus materials for our study.

In order to limit the scope of this study to the size of a Master's thesis, we opted not to introduce non-HMM-based speech recognition techniques in this work. We decided to focus our research questions and experiments on a purely HMM-based speech recognition system, in order to get a ballpark figure on how the different sets of data can be used to predict for each other and for themselves. We do not intend to replicate the techniques of state-of-the-art approaches; therefore, we avoided the use of neural networks and conditional random fields in this study.

With the TIMIT corpus, the child language acquisition recordings from CHILDES, the material from the TORGO corpus, and our knowledge of HMM-based speech recognition techniques, we are able to formulate the following research questions:

1. How well can a TIMIT-trained phone recognizer recognize phones produced by very young children who are in the canonical babbling stage of development?

2. How well can a TIMIT-trained phone recognizer recognize phones produced by people with dysarthria?

3. Are some phones less recognizable than others? If so, what is the implication of this for questions 1 and 2?

4. How well would a phone-recognizer trained on child utterances recognize child utterances it has not seen before?

4. Choice of Tools

We made use of several software tools to reach our goals.

4.1 HTK

We used HTK (Young et al., 2006) to create our HMM-based phone-recognizer systems, employing Baum-Welch expectation maximization for training and the Viterbi algorithm for recognition. The exact settings used for each experiment are provided in later sections of this paper. We also used HTK's HCopy tool to convert .wav files to their MFCC counterparts, and the HResults tool to evaluate the phonetic transcriptions produced by our phone-recognizer systems, calculating correctness and accuracy and producing confusion matrices for every phone under consideration.
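As an illustration of that conversion step, the sketch below writes a minimal HTK front-end configuration and converts one hypothetical .wav file to HTK's MFCC feature format; the parameter values shown are placeholders rather than the settings we actually used (those are reproduced in Appendix A).

```python
import subprocess

# Illustrative HTK front-end configuration. The parameter names are standard
# HTK configuration variables; the values below are placeholders, not the
# settings used in our experiments (see Appendix A for those).
HCOPY_CONFIG = """\
SOURCEFORMAT = WAV
TARGETKIND   = MFCC_0_D_A
TARGETRATE   = 100000.0
WINDOWSIZE   = 250000.0
USEHAMMING   = T
PREEMCOEF    = 0.97
NUMCHANS     = 26
NUMCEPS      = 12
"""

with open("hcopy.conf", "w") as f:
    f.write(HCOPY_CONFIG)

# Convert one hypothetical utterance from WAV to MFCC features.
subprocess.run(["HCopy", "-C", "hcopy.conf", "utt0001.wav", "utt0001.mfc"],
               check=True)
```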

4.2 Python and Bash

Numerous scripts were written in Python (https://www.python.org/) and Bash (https://www.gnu.org/software/bash/) to prepare the data for processing in HTK and to move intermediate forms of the data to the right places between various runs of HTK tools. This allowed us to create an end-to-end pipeline that streamlined our procedures and automated much of the work.

4.3 SoX

We utilized the audio-processing capabilities of SoX (http://sox.sourceforge.net/), invoked from our Bash and Python scripts, to slice relevant spans of audio and organize the creation of utterance-sample .wav files from larger recordings.
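For example, slicing a single utterance out of a longer session recording can be done with SoX's trim effect; the file names and timestamps below are hypothetical.

```python
import subprocess

def extract_span(source_wav, target_wav, start_sec, duration_sec):
    """Slice one utterance span out of a longer recording using SoX's trim effect."""
    subprocess.run(
        ["sox", source_wav, target_wav, "trim", str(start_sec), str(duration_sec)],
        check=True,
    )

# Hypothetical example: extract a 1.4-second utterance starting at 12.5 seconds.
extract_span("session_recording.wav", "utterance_0001.wav", 12.5, 1.4)
```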

5. Methodology and Approach

With the three datasets available to us, we attempted several different experiments to explore the recognizability of phones uttered by speakers who have dysarthria and by speakers who are very young. We did this by designing phone-recognizers in HTK with various configurations, parameters and training data, and testing them on various arrangements of our testing data.

There were two pipelines of work that we employed to create our phone-recognizers:

The first pipeline was set up to handle training data that did not have time-aligned transcription at the phone level, such as our Talkbank dataset; we will call this Pipeline A. The second pipeline was set up to handle data that did have time-aligned transcription at the phone level, such as our TIMIT dataset; we will call this Pipeline B. Pipeline B is expected to yield better results due to the existence of time-aligned phonetic transcription, which allows for more precise extraction of phone information for training the model. These pipelines are closely tied to the workflow of the HTK software. After consulting The HTK Book (Young et al., 2006) extensively, we arrived at a sequence of best practices for each pipeline, which we have summarized in Tables 3a and 3b.

There are three key differences between Pipeline A and Pipeline B:

• HInit is employed in Pipeline B and not in Pipeline A.

• HRest is employed in Pipeline B and not in Pipeline A.

• HERest is run ten times in Pipeline B and only three times in Pipeline A.

Pipeline A

Step 1. Decide on an MFCC configuration.

Step 2. Convert training data and testing data to this MFCC configuration.

Step 3. Decide on an HMM configuration.

Step 4. Construct a prototype HMM accordingly, fill it with dummy values.

Step 5. Run HTK’s HCompV tool with this prototype HMM, against the training data, creating a prototype that contains the global means and variances of the training data.

Step 6. Form an HMM list file, consisting of copies of the resulting prototype from Step 5, as many copies as there are phones you wish to recognize.

Step 7. Run HTK’s HERest tool on the result of Step 6, against the training data, to re-estimate the values for each phone. Do this for 3 iterations in sequence. The resulting HMM list is now your phone recognition model.

Step 8. Use this model with HTK's HVite tool, which runs the Viterbi algorithm to make phone recognition predictions against the test data. The result is an output transcription, attempting a phonetic transcription for all the utterances in the test data.

Step 9. After performing any necessary post-processing on this output transcription, feed it to HTK’s HResults tool, which compares the output transcription with the human-transcriptions in specified HTK Label files, making calculations for correctness, accuracy, hits, insertions, deletions and substitutions, generating a phone confusion matrix as well. Table 3a: The sequence of actions for Pipeline A.
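As a rough command-level illustration of Pipeline A, the Bash sketch below shows the kind of HTK invocations involved; the file names (config.mfcc, proto, phonelist, train.scp, labels/, phoneloop.lat, dict) are placeholders, and the exact options we used are documented in Appendix B.

    # Step 5: compute global means and variances from the training data (flat start)
    HCompV -C config.mfcc -f 0.01 -m -S train.scp -M hmm0 proto
    # Step 6 (not shown): copy the updated prototype once per phone to form hmm0/hmmdefs
    # Step 7: three embedded re-estimation passes with HERest
    for i in 1 2 3; do
        HERest -C config.mfcc -S train.scp -L labels/ \
               -H hmm$((i-1))/hmmdefs -M hmm$i phonelist
    done
    # Step 8: Viterbi decoding of the test set with the phone-loop grammar
    HVite -C config.mfcc -H hmm3/hmmdefs -S test.scp -i recout.mlf \
          -w phoneloop.lat dict phonelist
    # Step 9: score against the reference labels and print a phone confusion matrix
    HResults -p -L labels/ phonelist recout.mlf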

Pipeline B

Steps 1 to 5. Same as Pipeline A

Step 6. Run HTK's HInit tool against the training data, for every phone we wish to recognize. HInit uses the phone-level time-alignment information in the training set to set suitable initial parameter values for each phone's HMM. The resulting set of phone HMMs is now properly initialized.

Step 7. Run HTK’s HRest tool on each of the initialized phone HMMs, against the training data, which will use the phone-level time-alignment information in the training set to re-estimate the values further. Set the iteration limit to 100, which forces the HRest tool to keep iterating for re-estimations for each phone’s HMM until numerical values converge, or until 100 iterations occur, whichever happens first.

Step 8. Form an HMM list file by aggregating all the phone HMMs re-estimated at the end of Step 7.

Step 9. Run HTK’s HERest tool on the result of Step 8, against the training data, to re-estimate the values for each phone. Do this for 10 iterations in sequence. The resulting HMM list is now your phone recognition model.

Step 10. Same as Step 8 in Pipeline A

Step 11. Same as Step 9 in Pipeline A Table 3b: The sequence of actions for Pipeline B.
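The additional per-phone initialization steps can be sketched in Bash as follows; as before, the file names and directory layout are placeholders rather than our exact setup, and the precise options appear in Appendix B.

    # Steps 6 and 7: initialize and then re-estimate each phone HMM individually,
    # using the time-aligned phone labels from the training set
    while read phone; do
        HInit -C config.mfcc -S train.scp -L labels/ -l "$phone" \
              -o "$phone" -M hmm_init proto
        HRest -C config.mfcc -S train.scp -L labels/ -l "$phone" \
              -i 100 -M hmm_rest "hmm_init/$phone"
    done < phonelist
    # Steps 8 and 9: pool the per-phone HMMs into one list and run ten HERest passes,
    # then decode and score exactly as in Pipeline A (HVite, then HResults)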

To address our research questions, it is important to recognize that we can make two groups of phone-recognizers: one trained on TIMIT data and one trained on Talkbank data. The Torgo dataset has too few speakers and too few utterances to provide enough speaker and utterance diversity to form a reliable model for phone recognition. Pipeline A will be required to prepare any models trained on the Talkbank training dataset, while Pipeline B will be required to prepare any models trained on the TIMIT training data. We already expect anything prepared with Pipeline B to perform better, but it is a valuable exercise to quantify the advantage we gain from a time-aligned, phonetically-transcribed dataset. Therefore, our experiment plan will proceed in the following phases:

• Phase 1: Use Pipeline A to train the best phone recognition model we can with our Talkbank training dataset, testing against our Talkbank testing dataset.

• Phase 2: Use Pipeline B to train the best phone recognition model we can with the TIMIT training dataset, testing against the TIMIT test dataset.

• Phase 3: Use the best phone recognition model we got out of Phase 2 to test against the Talkbank testing dataset and any variants of this dataset as needed.

• Phase 4: Use the best phone recognition model we got out of Phase 2 to test against the TORGO dataset.

• Phase 5: Use the best phone recognition model we got out of Phase 2 to explore optimal parameters for age-sensitive phone recognition for the young children in the Talkbank dataset.

6. Gathering and Preparing Data

The three sets of data that we focused on for this study will be referred to as the “TIMIT dataset”, the

“TORGO dataset” and the “TalkBank dataset”.

6.1 The TIMIT Dataset

As mentioned earlier, the University of Washington's membership in the Linguistic Data Consortium allows us to access the TIMIT corpus of 6300 utterances involving 630 voices uttering 10 sentences each.

All of the utterances are provided in the form of a .wav file, with 16-bit signed integer encoding for each sample, at a sample rate of 16000 samples per second. We used the default training-testing split provided by the corpus. The training set contains 4620 utterances while the testing set contains 1680 utterances. We proceeded to use HTK (Young et al., 2006) to convert these .wav files into their MFCC equivalents. The exact settings used to generate these MFCC data points are provided in Appendix A, and the various configurations used are described in detail in later sections. We decided to relabel all the TIMIT phones as their IPA equivalents. The reason for this is that the TalkBank dataset (which we will describe later) consisted of child speech that was transcribed in

IPA. We wanted to make the IPA phone set the default phone set for our study, because IPA symbols are standard in the linguistics community as well as in the speech-language pathology community. Along with this change in the TIMIT transcription labelling, a few changes were made to the original set of 61 phones, which were folded into each other using a technique similar to Lee & Hon 1989, demonstrated in

Table 2 above, with a few adjustments, which you can see in Table 4. The only places we diverged from the Lee & Hon method were in not converting "ao" to "aa" and not converting "zh" to "sh". We also had to add a few explicit mappings to account for some TIMIT phones being labelled with capital letters, and for some curiously presented TIMIT phones we found in the transcriptions, such as "riy" and "rtcl".

Before → After

aa → ɑ | d → d | f → f | KCL → sil | Q → sil | V → v
AA → ɑ | D → d | F → f | l → l | r → ɹ | w → w
ae → æ | DCL → sil | G → g | L → l | R → ɹ | W → w
AE → æ | dcl → sil | g → g | m → m | riy → i | y → j
ah → ə | dh → ð | GCL → sil | M → m | rtcl → sil | Y → j
AH → ə | DH → ð | gcl → sil | n → n | s → s | z → z
ao → ɔ | dx → ɾ | h# → sil | N → n | S → s | Z → z
AO → ɔ | DX → ɾ | hh → h | ng → ŋ | sh → ʃ | zh → ʒ
aw → a_ʊ | eh → ɛ | HH → h | NG → ŋ | SH → ʃ | ZH → ʒ
AW → a_ʊ | EH → ɛ | hv → h | NOI → sil | si → sil
ax → ə | el → l | HV → h | noi → sil | sil → sil
AX → ə | EL → l | ih → ɪ | nx → n | t → t
ax-h → ə | em → m | IH → ɪ | NX → n | T → t
AX-H → ə | EM → m | ix → ɪ | ow → o_ʊ | tcl → sil
axr → ɝ | en → n | IX → ɪ | OW → o_ʊ | TCL → sil
AXR → ɝ | EN → n | iy → i | oy → ɔ_ɪ | th → θ
ay → a_ɪ | eng → ŋ | IY → i | OY → ɔ_ɪ | TH → θ
AY → a_ɪ | ENG → ŋ | j → ʤ | p → p | uh → ʊ
b → b | epi → sil | J → ʤ | P → p | UH → ʊ
B → b | EPI → sil | jh → ʤ | pau → sil | uw → u
BCL → sil | er → ɝ | JH → ʤ | PAU → sil | UW → u
bcl → sil | ER → ɝ | k → k | PCL → sil | ux → u
ch → ʧ | ey → e_ɪ | K → k | pcl → sil | UX → u
CH → ʧ | EY → e_ɪ | kcl → sil | q → sil | v → v

Table 4: The conversion and folding mechanism that was used in our study to relabel the transcriptions for the TIMIT and Torgo datasets.

The end result of this is that we have 41 phones in our dataset instead of 39 phones. We still have all the 39 phones from Lee & Hon 1989, but we also have ɔ ("ao") and ʒ ("zh"). We kept these two because there are enough occurrences of them in the Talkbank dataset, and because we believe these sounds were originally folded in due to the rules of English phonology, which is not necessarily what we are prioritizing for the purposes of early child language and dysarthric speech.

6.2 The Torgo Dataset

We were able to download the publicly available version of the Torgo Dataset (Rudzicz et al., 2012) from the Torgo project website (http://www.cs.toronto.edu/~complingweb/data/TORGO/torgo.html).

The entire dataset is transcribed in the TIMIT style. All adjustments we made to the TIMIT dataset were also made to the Torgo dataset. There were two microphones used in the recordings, an array microphone and a head-mounted microphone. We ignored the recordings from the array microphone and focused on those from the head-mounted microphone, an electret microphone that recorded audio at 16 kHz. The paper (Rudzicz et al., 2012) provides no further information on this microphone.

There are a total of eight participants with dysarthria in this dataset: seven with dysarthria caused by cerebral palsy (4 male, 3 female) and 1 male participant with dysarthria caused by amyotrophic lateral sclerosis (ALS). The participants span a wide range of intelligibility as assessed by the Torgo team, live in the Toronto area and were between the ages of 16 and 50 at the time of recording.

6.3 The Talkbank Dataset

Using recommendations from our thesis committee chairperson, we visited the TalkBank Project for speech data (https://talkbank.org) and immediately recognized the material in the CHILDES database (MacWhinney 2000) as highly relevant to our work, since it contains corpora of child speech. Many of these corpora did not have phone-level transcriptions, opting instead to provide orthographic transcriptions of intelligible words produced by children who were older than our target age range for pre-verbal children, which is age 3 and younger.

Further searching led us to PhonBank (Rose & MacWhinney 2014), which had datasets that provided phonetic transcriptions. Not every dataset we found there was useful to us. Many of them contained verbal speech from older children that definitely was not babbling, which pushed us to look for earlier ages. Some of the datasets that contained phonetically-transcribed babbles of younger children did not have wav-linked audio or span indications, which meant that we couldn't map an IPA transcription to a span of time in a wav file. One dataset had to be removed from consideration because the linkages were to mp3 files, which we know lead to about 10% to 15% greater matching error when used in speech-recognition training (Pollack & Behunek, 2011).

6.3.1 Source Corpora for our TalkBank dataset

On the PhonBank section of CHILDES in the TalkBank database, we selected four datasets that satisfied our needs. These are:

1. The Davis Corpus (Davis & MacNeilage 1995, Davis et al., 2002). It contains 702 sound files, 602

transcription files and 429 GB of data, generated from 21 children in an English language

environment. Weblink: https://phonbank.talkbank.org/access/Eng-NA/Davis.html

2. The Kern French Corpus (Kern et al., 2009, Kern & Davis 2009). It contains 133 sound files, 130

transcription files and 69 GB of data, generated from 4 children in a French language

environment. Weblink: https://phonbank.talkbank.org/access/French/KernFrench.html

3. The Stuttgart Corpus (Lintfert 2009). It contains 138 sound files, 138 transcription files and 3.6

GB of data, generated from 8 children in a German language environment.

Weblink: https://phonbank.talkbank.org/access/German/Stuttgart.html

4. The TAKI Corpus (Lintfert 2009). It contains 65 sound files, 65 transcription files and 1.8 GB of

data, generated from 5 children in a German language environment, who were also part of the

Stuttgart Corpus. Weblink: https://phonbank.talkbank.org/access/German/TAKI.html

These datasets contain timing information for phonetic transcriptions provided via the CLAN transcription software (MacWhinney 2000), but none of these transcriptions provide phone-aligned timing information. The best that is provided via the CLAN transcription system is the marking of 3-second spans of speech, each with a corresponding IPA transcription of what was uttered in that span. The transcription files were CHA files, containing IPA transcriptions of child and caregiver utterances, prepared using the CLAN transcription software, following the CHAT transcription method described by TalkBank on their website. The sound files in each of these datasets were WAV files

containing several minutes of recorded speech from children, stored with 16-bit signed integer coding at either a 48 kHz or a 44.1 kHz sample rate. A single wav file from any of these datasets can generate multiple transcribed utterances. Our definition of "utterance" was based on how individual spans of speech were divided up in the CHAT transcription files, encapsulating a 2 to 3 second batch of time where the researcher identified an utterance from a babbling child and transcribed it as a single event.

The following table shows information on how the recordings were made for each corpus we use:

Corpus | Recording Approach

Davis | Sessions were audiotaped once weekly in the normal home environment, using an Audio-Technika ATW1031 remote microphone clipped at the shoulder.

KernFrench | Sessions were audiotaped for an hour every two weeks, from 8 months of age to 25 months of age, in the infants' homes, as parents followed normal activities with their children. No microphone information provided.

Stuttgart / TAKI | Sessions were made in the children's homes as they played with their parents. Recordings were done with a Sony DAT TCD-D100 and a high-quality wireless microphone NADY LT-4 (Lavalier) E-701 (600 Ohm). Recorded data was down-sampled to 16 kHz.

Table 5a: The recording approaches that were used by the researchers who created each of the Talkbank corpora we made use of in our study.

The tables below show information on every child whose speech is a part of our dataset.

6.3.2 Analysis of our Talkbank Dataset

The next step was writing a Python script for reconciling which WAV files did not have corresponding

CHA files, as well as which CHA files did not have corresponding WAV files. Such files were discarded so that we only kept CHA-WAV pairs that were verified to go together. There were 900 of these files across

all four corpora. The next step was to prepare a Python script that could open each of these 900 CHA files and parse through them, extracting key information about the child and each individual child-utterance transcription, ignoring utterances from caregivers and researchers. For each of these child utterances, we extracted the start and end time of the span of time in the WAV file that corresponds to the CHA file where the utterance was found. It is important to note that not all of the 900 reconciled CHA-WAV pairs even contained child speech: 180 of these files contained caregiver utterances only and did not contribute anything to our dataset of child utterances. The remaining 720 WAV files yielded 145089 utterances from 29 children, each utterance having a length of 2 to 3 seconds, and the transcription for each one was verified as existing. Some transcriptions contained IPA markers for primary and secondary syllable stress; since we are excluding the analysis of syllable stress from our speech-recognition goals, we scrubbed these markers out from the transcriptions. We then wrote a slicing script using Python and the SoX library that sequentially opened all 720 WAV files and copied out each slice of time that corresponded to an utterance found in the transcription, storing them in separate WAV files. We obtained 145089 individual WAV files this way, each containing one transcribed sample that we are interested in. This is useful because the 720 original WAV files were quite large and memory intensive, containing long stretches of silence and non-child-speech sounds that we could do without. This left us holding on to only 57 GB of

WAV file data instead of 503 GB of WAV file data. We then wrote an analysis script to analyze the

145089 transcriptions to provide various statistics and analytics, to explore how viable this material is to

our goals. Table 5a shows what we knew going into this analysis while Tables 5b, 5c, 5d and 5e show further information on this material and Table 5f shows our inferences about this dataset from analysis.
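To make the reconciliation and slicing steps concrete, here is a simplified Python sketch of this kind of script; the "*CHI:" tier prefix, the millisecond time-stamp pattern and the file layout are simplifying assumptions for illustration, not a verbatim copy of our actual code.

    import pathlib
    import re
    import subprocess

    # CHAT media bullets carry "start_end" times in milliseconds (assumed pattern)
    BULLET = re.compile(r"\x15(\d+)_(\d+)\x15")

    def slice_child_utterances(cha_path, wav_path, out_dir):
        """Cut one small .wav file per time-stamped child utterance in a CHA transcript."""
        out_dir = pathlib.Path(out_dir)
        out_dir.mkdir(parents=True, exist_ok=True)
        stem = pathlib.Path(wav_path).stem
        count = 0
        for line in open(cha_path, encoding="utf-8"):
            if not line.startswith("*CHI:"):      # keep child utterances, skip caregivers/researchers
                continue
            match = BULLET.search(line)
            if match is None:                     # no time span recorded for this utterance
                continue
            start_ms, end_ms = int(match.group(1)), int(match.group(2))
            out_wav = out_dir / f"{stem}_{count:05d}.wav"
            # SoX copies just the utterance span out of the long session recording
            subprocess.run(["sox", str(wav_path), str(out_wav), "trim",
                            str(start_ms / 1000.0), str((end_ms - start_ms) / 1000.0)],
                           check=True)
            count += 1
        return count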

Total Number of Transcribed .wav Files 720

Total Number of Sample Spans 145089

Total Number of Genders 2 (15 female, 14 male)

Total Number of Children 29

Total CHILDES Corpora Sourced 4

Total Number of Language Environments 3 (English, French, German) Table 5b: A summary of what we know about all the data we extracted from CHILDES, which we are calling the TalkBank dataset for the purposes of this study.

Total Duration of Utterance Spans (in milliseconds) 346694397

Average Duration per Utterance Span (in milliseconds) 2389.52916486

Total Number of Unique Transcriptions 45176

Total Number of Unique IPA phones Found 405

Total Number of Unique Diphones Found 6047

Total Number of Unique Triphones Found 34283

Total Number of Unique Child Ages 482

Total Number of Unique Recording Dates 534 Table 5c: Further details discovered about our TalkBank dataset using scripting tools to analyze.

Name Age Range (years;months.days) Corpus Sex

Baptiste 0;07.19 – 2;00.12 KernFrench M

Ben 0;11.21 – 2;04.02 Davis M

BW 1;02.20 – 2;11.14 Stuttgart M

Cameron 0;07.11 – 2;11.24 Davis F

Charlotte 0;10.12 – 2;11.22 Davis F

ED 2;03.10 – 4;03. Stuttgart, TAKI F

EL 0;05.04 - 2;02.08 Stuttgart, TAKI F

Emma 0;08.16 – 2;01.08 KernFrench F

Esteban 0;07.16 – 2;00.25 KernFrench M

FZ 1;00.21 – 2;06.13 Stuttgart F

Georgia 0;08.25 – 2;11.05 Davis F

Hannah 0;11.14 – 2;04.24 Davis F

HH 0;05.23 – 3;00.19 Stuttgart F

Jules 0;09.14 – 2;00.29 KernFrench M

Kaeley 1;00.24 – 2;01.23 Davis F

LL 3;08.03 – 7;07.06 Stuttgart, TAKI F

Martin 1;05.19 – 2;02.09 Davis M

Micah 0;08.01 – 1;06.19 Davis M

Nate 0;10.07 – 2;09.07 Davis M

NB 0;07.17 – 2;08.12 Stuttgart M

Nick 0;10.20 – 3;01.02 Davis M

Paxton 0;08.02 – 2;00.02 Davis M

Rachel 0;08.04 – 1;10.02 Davis F

Rebecca 1;01.24 – 1;07.17 Davis F

RL 3;01. - 4;06.13 Stuttgart, TAKI M

Rowan 0;10.23 – 2;10.19 Davis M

Sadie 0;07.11 – 1;07.03 Davis F

Sam 0;10.07 – 2;01.03 Davis M

Willie 0;07.29 – 1;04.13 Davis M Table 5d: Age and gender information on every participant whose speech data we use in our Talkbank dataset.

Language Environment Corpus Milliseconds of speech Number of utterances

English Davis 220,171,487 ms 75544

French Kern French 115,708,342 ms 49364

German Stuttgart 6,150,759 ms 10232

TAKI 4,663,809 ms 9949 Table 5e: Amounts of speech contributed to the Talkbank dataset by each source corpus.

Inference | Facts that Support the Inference

Inference: The timing information of the phonetic transcriptions is not phone-aligned, for any corpus.
Supporting facts: Each utterance has only a single time-stamp marking the beginning of the transcriber-defined utterance event.

Inference: Utterances are 2 or 3 seconds long, on average.
Supporting facts: Average duration for utterances is around 2389.5 ms.

Inference: Many utterances share the same phonetic transcription.
Supporting facts: 45176 unique transcriptions were found among 145089 utterances.

Inference: Many of the transcriptions contain IPA marks that involve combinations and modifications (such as nasalization, lengthening, devoicing, and so on). A specific example is shown in Table 10 below.
Supporting facts: 405 unique IPA marks were found (excluding the 2 syllable stress marks we scrubbed out earlier) across 3 languages that share several phones in common.

Inference: Some phones tend to have an affinity for each other, while avoiding other phones. The children already prefer not to connect some phones with others.
Supporting facts: With 405 unique IPA marks in the dataset, we can raise this number to the power of 2, giving us 164025 possibilities for diphones, but only 6,047 diphones were found in the dataset. 405 raised to the power of 3 gives 66,430,125 possibilities for triphones, but only 34,283 triphones were found in the dataset.

Inference: Most of the children are from an English language environment, but the non-English environments are still significantly represented.
Supporting facts: The following data was brought into our dataset: utterances from 17 out of 21 English-environment children in the Davis corpus; utterances from 4 out of 4 French-environment children in the Kern French corpus; utterances from 8 out of 8 German-environment children in the Stuttgart corpus; and utterances from 4 out of 5 German-environment children in the TAKI corpus, all four of whom were also in the Stuttgart corpus. Also see Table 5d.

Inference: The children were assigned binary genders and there is an even split between them.
Supporting facts: 15 of the 29 children across the corpora were described as female while 14 of them were described as male. None of the children were described as non-binary gender or transgender. Out of the 145089 samples identified, 74114 are from female voices while 70975 are from male voices, a 51:49 female-male ratio by number of samples. When we consider total duration of samples, there are 181,816,466 milliseconds of female voice utterances and 164,878,931 milliseconds of male voice utterances, giving a 52:48 female-male ratio by total sample duration.

Inference: We are able to have samples of older and younger vocal tracts for the same child in most cases, which increases the variation on vocal tract sources even further.
Supporting facts: There are 482 unique "ages" in the dataset, tracked by exact days alive, across 29 children.

Inference: Most of the recordings come from pre-kindergarten children.
Supporting facts: Most of the dataset contains speech from children between 1 and 3 years of age.

Inference: We have 7 years and 2 months of spread across the ages of the children that were recorded.
Supporting facts: The earliest age of a recorded child in the dataset is 5 months and 4 days; the oldest age of a recorded child in the dataset is 7 years and 7 months.

Inference: The recordings come from a 33 year span of time.
Supporting facts: The dates of the recordings vary for each child and each corpus, based on when various studies were taking place. The earliest recording is from Nov 1984 and the latest recording is from May 2017, with a majority falling in the early 2000s decade. Hopefully these provide variability in the audio quality of the recordings for our training data, since different recording technologies will have been in place during these various decades.

Inference: Some children provide more utterances than others, by number.
Supporting facts: The per-child sample frequency ranges from a high of 13225 to a low of 801. The average is 5,003 and the median is 2,597.

Inference: Some children provide more utterances than others, by duration.
Supporting facts: The highest is 52,410,356 milliseconds from one child (14.5 hours). The lowest is 896,249 milliseconds from another child (15 minutes).

Table 5f: Inferences made about our Talkbank dataset from our analysis.

6.3.3 Phone-level statistics on the Talkbank dataset

We ran further analysis scripts to figure out what the most common phones in the dataset were. Table 6 shows the 25 most common phones in the dataset among all 29 children across all four corpora, sorted in descending order of frequency of occurrence. No phone-folding was applied yet. The transcriptions did not provide tie-bars to indicate diphthongs, so we assume that diphthongs are present in the transcriptions even though they are not explicitly marked. These are merely the 25 most common symbols used in the transcriptions out of the 405 that were employed.

We can see that the most common sounds that these young children are producing are vowels and frontal consonants, transcribed with no modifiers or combining symbols except for the "long vowel" symbol. This information agrees with findings from the research literature on childhood language acquisition (Paul & Norbury, 2012).

Rank Phone Frequency Rank Phone Frequency

1 a 42723 14 aː 10907

2 d 30347 15 p 10782

3 b 24583 16 w 10464

4 ɪ 22986 17 ʊ 10343

5 m 22494 18 h 10307

6 t 21027 19 k 10277

7 n 20105 20 u 10183

8 ʌ 17937 21 æ 9790

9 ɛ 14869 22 l 9414

10 o 14710 23 e 9359

11 i 14573 24 j 9064

12 ə 14284 25 g 9035

13 s 12396

Table 6: The 25 most popular IPA symbols (and monophones) in our Talkbank dataset.

Table 7 below shows the 25 most common diphones in the dataset. This information was obtained by considering bigram pairs of phones that appear in sequence at any point in the transcriptions and counting every instance of them individually.

Rank Diphone Frequency Rank Diphone Frequency

1 aɪ 6541 14 ɪn 2186

2 ba 4577 15 ta 2118

3 ma 4529 16 at 2067

4 da 4344 17 ɪt 2057

5 dɪ 3176 18 la 2053

6 aʊ 3171 19 jɛ 1944

7 am 2821 20 ab 1909

8 dæ 2642 21 ɪd 1893

9 bʌ 2480 22 di 1859

10 ɪs 2334 23 ap 1814

11 dɛ 2261 24 ha 1769

12 no 2243 25 maː 1681

13 pa 2220

Table 7: The 25 most common diphones in our Talkbank dataset.

Table 8 below shows the 25 most common triphones in the dataset. This information was obtained by considering trigrams of phones that appear in sequence at any point in the transcriptions and counting every instance of them individually.

Rank Triphone Frequency Rank Triphone Frequency

1 mam 2169 14 ædæ 631

2 aɪn 1239 15 beb 596

3 dæd 1176 16 dad 586

4 bʌb 1003 17 ʌbʌ 570

5 bab 978 18 aɪs 554

6 dɪd 920 19 mʌm 536

7 baɪ 860 20 daʊ 528

8 pap 779 21 ɪdɪ 526

9 ama 682 22 maɪ 499

10 aba 668 23 maːm 473

11 tat 667 24 dɛd 462

12 haɪ 649 25 ami 437

13 dɪs 641

Table 8: The 25 most common triphones in our Talkbank dataset.

Table 9 below shows the most common transcriptions in the dataset, giving us an idea of what kinds of data our automatic phone-recognizer will be asked to transcribe. From Tables 7, 8 and 9, we see that the most common triphones also end up being the ingredients for the most well-known pre-verbal child babbles across most languages, such as the [mama]-like and [dada]-like parent-referents that various languages have landed upon, most likely inspired by child babbles in the first place. Due to a large percentage of the data coming from an English environment, we can understand the higher prevalence of triphones that are used to produce English greetings, such as [haɪ] and [baɪ], and typical pronouncements that young children are known to make as early words (Paul & Norbury, 2012),

such as [maɪ], [dɪd] and [beb]. We also see from the most common transcriptions that these pre-verbal children often utter only single syllables or even single vowels. All of these observations are in accord with prevailing theories on child language acquisition (Paul & Norbury, 2012).

Rank Transcription Freq. Rank Transcription Freq. Rank Transcription Freq.

1 mː 2190 19 ɔː 493 37 i 356

2 m 2109 20 bʌ 481 38 nɔ 352

3 no 1661 21 ɛ 481 39 dæ 348

4 œ 1404 22 uː 477 40 bʊ 335

5 aː 1341 23 la 474 41 dɪ 333

6 a 1299 24 u 467 42 wi 323

7 œː 1288 25 ɪ 465 43 aɪn 322

8 ʌ 1184 26 oː 451 44 deːɐ 321

9 o 924 27 dʌ 417 45 dɛs 320

10 ɪt 815 28 ʊ 384 46 eː 315

11 da 680 29 jæ 380 47 ma 311

12 jɛ 664 30 ɛː 379 48 do 301

13 æ 629 31 e 366 49 mama 295

14 ɛ 576 32 mami 365 50 aɪ 294

15 ba 571 33 ɔ 363 51 go 293

16 nɔː 568 34 haɪ 362 52 təkə 292

17 dɪs 541 35 daː 361 53 diː 285

18 ɪç 509 36 jaː 360 54 mamaː 281

55 wʌn 277

Table 9: The 55 most common transcriptions in our Talkbank dataset.

6.3.4 Processing transcriptions into HTK Label files

The next step was to define HTK .lab label files for each utterance .wav file that we had produced. We wrote a Python script that utilized the per-utterance IPA transcription information that we had already extracted for each utterance, to create text files encoded in UTF-8 named after each .wav file we had, but giving them a .lab extension. The contents for each file were the IPA transcriptions that we had on file for the corresponding .wav file, organized in the one-phone-per-line style that is expected from HTK. The

“sil” symbol was padded at the beginning and end of each utterance. Extra care was taken to preserve combination symbols and modifier symbols so as not to clobber the original IPA transcriptions.
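A small Python sketch of this step is shown below; the function name and directory layout are illustrative rather than our exact script.

    import pathlib

    def write_lab_file(wav_path, phones, lab_dir):
        """Write an HTK .lab label file (one phone per line) for one utterance .wav file.

        `phones` is the utterance's list of IPA symbols, already split and with the
        stress marks removed; the sequence is padded with "sil" at both ends.
        """
        lab_path = pathlib.Path(lab_dir) / (pathlib.Path(wav_path).stem + ".lab")
        with open(lab_path, "w", encoding="utf-8") as f:
            for symbol in ["sil"] + list(phones) + ["sil"]:
                f.write(symbol + "\n")

    # e.g. write_lab_file("davis_ben_00017.wav", ["m", "a", "m", "a"], "labels/")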

6.3.5 Managing the Talkbank dataset phone count

The 405 phones detected across all four corpora include combination symbols and modification symbols from the IPA symbol set. Many of these modifiers are subtle and can probably be folded into other phone labels. We noticed that the Davis corpus featured a very narrow phonetic transcription style, while the other three corpora have a broad transcription style, causing a large portion of the phone symbols in our dataset to come from the Davis corpus. Table 10 below shows an example of how the unrounded near-low front vowel [æ] presents itself in the Davis corpus in 11 different ways. Situations like these give us good clues for phone-folding procedures to reduce the complexity of the phone recognition task. Other clues come from the kinds of phones that were selected in the transcriptions for the Kern French, Stuttgart and TAKI corpora, which suggest at least bringing the Davis corpus transcriptions down to the broader scope of the other three corpora. We know we must not try using allophone information to fold phones into each other, because we are working with pre-verbal child speech from three different cultures, and these children will not all have aligned to the phonemic expectations of their languages.

Phone | Unicode Description of IPA Symbols | Phonetic description of IPA symbols

[æ] | LATIN SMALL LETTER AE | Unrounded near-low front vowel

[æʲ] | LATIN SMALL LETTER AE + MODIFIER LETTER SMALL J | Palatalized unrounded near-low front vowel

[æʷ] | LATIN SMALL LETTER AE + MODIFIER LETTER SMALL W | Labialized unrounded near-low front vowel

[æː] | LATIN SMALL LETTER AE + MODIFIER LETTER TRIANGULAR COLON | Long unrounded near-low front vowel

[æːʲ] | LATIN SMALL LETTER AE + MODIFIER LETTER TRIANGULAR COLON + MODIFIER LETTER SMALL J | Palatalized long unrounded near-low front vowel

[æ̃] | LATIN SMALL LETTER AE + COMBINING TILDE | Nasalized unrounded near-low front vowel

[æ̃ː] | LATIN SMALL LETTER AE + COMBINING TILDE + MODIFIER LETTER TRIANGULAR COLON | Nasalized long unrounded near-low front vowel

[æ̥] | LATIN SMALL LETTER AE + COMBINING RING BELOW | Voiceless unrounded near-low front vowel

[æ̥̃] | LATIN SMALL LETTER AE + COMBINING TILDE + COMBINING RING BELOW | Voiceless nasalized unrounded near-low front vowel

[æ̰] | LATIN SMALL LETTER AE + COMBINING TILDE BELOW | Laryngealized unrounded near-low front vowel

[æ̰ː] | LATIN SMALL LETTER AE + COMBINING TILDE BELOW + MODIFIER LETTER TRIANGULAR COLON | Long laryngealized unrounded near-low front vowel

Table 10: The 11 different ways that the unrounded near-low front vowel [æ] presents itself in the Davis corpus, with each one counting as a unique phone.

Further analysis of the frequency of each phone in the dataset yielded a useful observation. Out of 405 phones, 296 occur fewer than 50 times, and many of these are the ones with subtle modifiers and combination symbols (an example of which we see in Table 10). These are prime candidates for folding into the base form. We therefore decided to take the bold step of folding all modified and combination forms of a phone into its base form. This means that all the phones in Table 10 would be considered under a single exemplar form of [æ]. We repeated this step for all modified and combination forms until our phone count reduced from 405 to 83 base forms. We decided to call this the "integrated" phone set, to imply that all modified and combination phones had been integrated into their base forms. Table 11 shows the IPA symbols of these 83 "integrated" phones, and a small sketch of this folding step is given after the table:

a m y ɑ ɪ ɻ ʒ

b n z ɔ ɫ ɽ ʔ

c o æ ɖ ɬ ɾ ʘ

d p ç ə ɭ ʀ ʙ

e q ð ɚ ɮ ʁ ʤ

f r ø ɛ ɰ ʃ ʦ

g s ħ ɟ ɱ ʉ ʧ

h t ŋ ɣ ɲ ʊ β

i u œ ɤ ɳ ʋ θ

j v ǁ ɥ ɵ ʌ χ

k w ǂ ɦ ɸ ʍ ⱱ

l x ɐ ɨ ɹ ʏ Table 11: The “integrated” phone set for our Talkbank dataset containing 83 base phones.
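The sketch below shows one way to perform this kind of base-form folding in Python using Unicode character categories; it is an approximation of our folding step under the assumption that every modifier of interest is encoded as a combining mark, a modifier letter or a modifier symbol.

    import unicodedata

    def integrate(phone: str) -> str:
        """Fold a transcribed phone into its base form by dropping IPA modifiers.

        Combining diacritics (e.g. nasalization, devoicing) and modifier letters
        (e.g. length marks, superscript j/w) are removed; only base letters remain.
        """
        kept = [ch for ch in phone
                if unicodedata.category(ch) not in ("Mn", "Lm", "Sk")]
        return "".join(kept)

    # e.g. integrate("æ̃ː") == "æ" and integrate("æʲ") == "æ"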

We decided to look at the frequency of occurrence of each integrated phone to see if any of them occurred fewer than 50 times, with 50 being the threshold we chose for having enough examples of a phone for statistical machine learning to be reliably performed, even after some examples are set aside for a testing set. It turns out that ten phones were hapax legomena in the first place, and removing them from consideration was prudent. Another 16 phones fell under our 50-count threshold in their frequency of incidence. Finally, two phones that occurred more than 50 times were discovered to be highly skewed towards male samples with far fewer female samples, which would not be a representative sample set to provide for training. Table 12 below summarizes which phones were rejected for which reasons.

Rejected integrated phones Reason for rejection

ⱱ , ǁ , ɖ , ɟ , ɤ , ɭ , ɱ , ɦ ,ɰ , ɻ Hapax legomena: only one incidence of each of these phones in the entire dataset

ʋ , ɵ , ɽ , ʘ , ħ , ǂ , ɦ , ɮ , ɰ , ɻ , q , ɨ , ɫ , ɬ , ɲ , ɳ , ɾ , ʍ , ʦ Too rare: Fewer than 50 incidences of each of these phones in the entire dataset.

χ Not gender representative: 69 male samples but only 1 female sample

ɥ Not gender representative: 111 male samples but only 6 female samples Table 12: The 28 rejected phones from the integrated phone set, with rejection reasons.

Removing these 28 phones from the 83 phones in the integrated phone set leaves us with 55 phones remaining. We knew that we needed a symbol to represent silence and we also needed a symbol to represent all of these removed phones, since we didn’t want to mislabel the actual concept of silence in

our dataset by applying it to these rejected phones, which are not acoustically examples of silence. So we added two symbols: "sil" for silence and "other" for any of these rejected phones. This gave us 57 symbols in total, which we defined as the "filtered" phone set. Table 13 shows which phones these are.

sil h p y ɐ ɸ ʏ

a i r z ɑ ɹ ʒ

b j s æ ɔ ʀ ʔ

c k t ç ə ʁ ʙ

d l u ð ɚ ʃ ʤ

e m v ø ɛ ʉ ʧ

f n w ŋ ɣ ʊ β

g o x œ ɪ ʌ θ

other Table 13: The filtered phone set for our Talkbank dataset, containing 57 symbols.

6.4 Converting from WAV to MFCC format

Every .wav file we worked with from the TIMIT, Torgo and Talkbank datasets was subjected to mel-frequency cepstral coefficient (MFCC) extraction using the HCopy tool in HTK. The configuration information that was used to perform this work is provided in Appendix A of this thesis.

The result of this operation was a single .mfcc file being created for each .wav file for every

configuration we cared about, with each of these .mfcc files containing MFCC coefficients in the desired configuration. It is with these .mfcc files that we create the HTK script files that define our various training and testing datasets for our experiments; we go into more detail on this later. During this process, it was discovered that 3,361 of our Talkbank wav files were too short to be converted to any MFCC format meaningfully, due to too few samples existing. Checking their timespans in the .cha files they came from shows that some mistake must have occurred when the original transcribers defined their timespans with CLAN. Some of them even had timespans too short for the Hamming window to create a meaningful first vector frame. They were identified, dropped from the dataset, and noted down as such for future reference. This caused us to exclude 2.37% of our original Talkbank dataset.
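For orientation, a hypothetical HCopy configuration of this kind is sketched below, together with the conversion command; the parameter values shown are common defaults rather than our exact settings, which are listed in Appendix A.

    # Hypothetical HCopy configuration file (config.mfcc)
    SOURCEFORMAT = WAV          # read RIFF .wav input
    TARGETKIND   = MFCC_D_A     # 12 MFCCs plus deltas and delta-deltas
    TARGETRATE   = 100000.0     # 10 ms frame shift, in units of 100 ns
    WINDOWSIZE   = 250000.0     # 25 ms analysis window
    USEHAMMING   = T            # apply a Hamming window to each frame
    PREEMCOEF    = 0.97         # pre-emphasis coefficient
    NUMCHANS     = 26           # mel filterbank channels
    NUMCEPS      = 12           # base cepstral coefficients

    # Each line of wav2mfcc.scp pairs a source .wav file with a target .mfcc file
    HCopy -C config.mfcc -S wav2mfcc.scp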

6.5 Splitting: Training sets vs. Testing sets

For the TIMIT dataset, we used the default split between training and testing for our work. For the Torgo dataset, we did not use this data to train any models due to the small number of speakers available. All of this dataset was used for testing. For the Talkbank dataset, we followed a very simple procedure to split the dataset into a training set and a testing set. To get a 75%-25% split, we used a script to randomly assign 113,000 of our Talkbank utterances to be considered our training set while the remaining 28727 utterances were considered our testing set. A few regex searches were done to verify that all 57 identification targets were well-represented in the test set and this was confirmed. The exact dataset manifest information can be made available to readers upon request.
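A minimal Python sketch of this splitting step, assuming a list of utterance identifiers and a fixed training-set size, could look like the following; the names used here are illustrative:

    import random

    def split_manifest(utterance_ids, n_train, seed=0):
        """Randomly split utterance IDs into a training list and a testing list.

        A fixed training count is used (we assigned 113,000 utterances to training);
        whatever remains becomes the test set.
        """
        rng = random.Random(seed)
        shuffled = list(utterance_ids)
        rng.shuffle(shuffled)
        return shuffled[:n_train], shuffled[n_train:]

    # e.g. train_ids, test_ids = split_manifest(all_talkbank_ids, n_train=113000)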

6.6 Preparing the TalkBank data to be recognized by a TIMIT-trained model

As mentioned earlier, some phones in the TIMIT phone set were folded into other phones, removing their individual existence from the revised TIMIT phone set. Table 4 shows how we folded our own TIMIT phones, with inspiration from Lee & Hon 1989. However, some of the TIMIT phones that were assimilated still exist in the TalkBank dataset. Moreover, there are also phones in the TalkBank dataset that are not among the original 61-phone TIMIT phoneset. It became necessary to explicitly handle these situations in our Talkbank dataset, in a process we called "timiting" the dataset. We did this by performing the explicit phone replacements shown in Table 14. This way, when we run a TIMIT-trained phone-recognizer on the TalkBank test data (or even on the entire TalkBank dataset), no unknown phones show up and we are able to make fair-enough comparisons between the generated transcription and the human-created transcription. The selections made in Table

14 either reflect what was done in our TIMIT phone set folding criteria or reflect our search for a closest

TIMIT phone-set substitute that shares phonetic features with the phone that is being replaced.

Before After Before After

c ʧ ʉ u

r ɹ ʌ ə

x k ʏ ɪ

y i ʔ sil

ɐ æ ʙ b

ɚ ɝ β b

ɣ g ø e

ɸ f ç k

ʀ g œ ɛ

ʁ g other sil

Table 14: The replacement criteria for “timiting” phones that existed in the TalkBank dataset but not in the TIMIT dataset, so that fair-enough comparisons can be made with the result of a TIMIT-trained phone-recognizer working on TalkBank data.

6.6.1 Handling TIMIT diphthongs

The only phones in our folded IPA-converted TIMIT phoneset that don't exist in our "timited", integrated, filtered TalkBank phoneset are the five TIMIT diphthongs, shown below in Table 15. We left the ability to detect these diphthongs intact in all our TIMIT-trained models. However, we added a step to "de-diphthongize" any output transcriptions coming from these models whenever comparisons with human-transcribed TalkBank transcriptions were happening, since diphthong transcription was not consistently provided by the human transcriptions in the TalkBank corpora we are using, so we removed any diphthong notations we came across. Table 15 explains how we simply split every TIMIT diphthong so that it is transcribed as its base phone sounds in sequence; a small sketch of this post-processing follows the table.

Before After

a_ʊ a ʊ

a_ɪ a ɪ

e_ɪ e ɪ

o_ʊ o ʊ

ɔ_ɪ ɔ ɪ Table 15: Criteria for reconciling diphthongs between datasets.
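The Python sketch below illustrates both of these post-processing steps; the dictionaries show only a handful of the Table 14 and Table 15 entries, and the function names are placeholders rather than our actual code.

    # A few of the Table 14 replacements and the Table 15 diphthong splits
    TIMIT_MAP = {"c": "ʧ", "r": "ɹ", "x": "k", "ʔ": "sil", "other": "sil"}  # ...and so on
    DIPHTHONGS = {"a_ʊ": ["a", "ʊ"], "a_ɪ": ["a", "ɪ"], "e_ɪ": ["e", "ɪ"],
                  "o_ʊ": ["o", "ʊ"], "ɔ_ɪ": ["ɔ", "ɪ"]}

    def timit_phones(talkbank_phones):
        """Replace TalkBank-only phones with their closest TIMIT-set equivalents ("timiting")."""
        return [TIMIT_MAP.get(p, p) for p in talkbank_phones]

    def dediphthongize(model_output):
        """Split any diphthong emitted by a TIMIT-trained model into its base phones in sequence."""
        flattened = []
        for p in model_output:
            flattened.extend(DIPHTHONGS.get(p, [p]))
        return flattened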

6.7 Vocal Tract Length Normalization

Any TIMIT-trained phone-recognizer would be biased towards the adult voices the TIMIT corpus is based upon. Because there are 630 of them with equal amounts of contributed utterances, no single adult voice gains a bias, but the concept of adulthood in the voice does, particularly when it comes to the implicit assumption of how long a speaker's vocal tract is. If we only present adult voices in the test set for a TIMIT-trained phone-recognizer, this is not a problem, and this is indeed the situation when we have TIMIT and TORGO data as our test set. However, whenever TalkBank data is our test set for a TIMIT-trained phone-recognizer, we need to be mindful that vocal tract length normalization will need to be conducted, whereby the test data is processed to warp the frequency content of the speech to mimic the output from a longer vocal tract. HTK provides a mechanism for accomplishing this through its configuration file settings (Young et al., 2006), which we have demonstrated the use of in Appendix A below. The "VTLN Warp Factor" thus becomes a value that strongly influences how accurate a transcription we can obtain for the child utterances we are working with. Previous work in this context has been done by Azizi et al. (2012), who explored ideal HTK VTLN Warp Factor values for Persian-speaking children around the age of 8. They discovered that an ideal value for 8-year-old children is 0.83, since the recordings processed with that warp factor had the lowest error rates in phone recognition. We go into extensive detail on our experimentation with this in a later section of this thesis.
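A hypothetical configuration fragment for such a VTLN re-extraction is shown below; the parameter names follow The HTK Book, while the warp factor and frequency cutoffs are illustrative placeholders rather than our tuned values.

    # Hypothetical VTLN additions to an HCopy/HParm configuration
    TARGETKIND  = MFCC_D_A
    WARPFREQ    = 0.83          # VTLN warp factor (1.0 means no warping)
    WARPLCUTOFF = 300.0         # lower boundary of the warped frequency region, in Hz
    WARPUCUTOFF = 6000.0        # upper boundary of the warped frequency region, in Hz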

6.8 Phone-recognition grammar and pronunciation dictionary

Because HTK is a full-service speech-recognition package designed to recognize words in sentences, we still need to define a “grammar” and a “pronunciation dictionary” for our project. In HTK, a pronunciation dictionary is defined in a textfile containing one line for every word that is expected to be recognized. Following each word is a single space, after which the pronunciation sequence for the word is provided, using the phones from the phoneset being used for the phone recognition task. Multiple entries are allowed for each word to take care of alternate pronunciations. Since this is a phone recognition project and not a word-recognition project, our pronunciation dictionary is trivial. It would just be each phone in our phoneset taking the role of “word” and then the pronunciation sequence is just the phone itself. The end result just becomes a textfile containing each phone placed twice on its own

line, delimited by a single space. Two such dictionaries will be required, one for our folded TIMIT phone set and one for our integrated, filtered Talkbank phone set. Both of these are provided in Appendix E.

Our experiment descriptions imply that our Talkbank pronunciation dictionary will be used in Pipeline A and our TIMIT pronunciation dictionary will be used in Pipeline B.

To define an HTK grammar, one needs to write a text file containing the rules of the grammar for the task at hand, defining where words may be found in relation to other words, in a special Backus-Naur Form (BNF) notation that defines a context-free grammar to use for sentence formation. This text file is then processed into a machine-readable lattice using the HParse tool.

Because we are doing phone recognition, the BNF notation of our grammar merely specifies that every utterance starts with silence, followed by any number of phones from our phone set (including silence), and then ends with silence. As with the pronunciation dictionaries, we require two grammars for our experiments: one to define sequencing with our folded TIMIT phoneset and one to define sequencing with our integrated, filtered Talkbank phoneset. These grammar descriptions are shown in Appendix F.
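To make this concrete, here is a hypothetical phone-loop grammar in HParse notation, alongside the shape of the matching dictionary; the phone list is truncated, and the file names are placeholders (our actual grammars and dictionaries appear in Appendices E and F).

    # Phone-loop grammar file (gram.txt); "< >" means one-or-more repetitions in HParse notation
    $phone = sil | ɑ | æ | ə | ɛ | ɪ | i | b | d | g | k | m | n | s | t ;   # ...remaining phones
    ( sil < $phone > sil )

    # Compile the grammar into the lattice that HVite reads via its -w option
    HParse gram.txt phoneloop.lat

    # The matching dictionary simply maps each "word" (a phone) to itself, one entry per line:
    #   ɑ ɑ
    #   æ æ
    #   b b
    #   ...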

7. Experimental Variables

We now have everything we need to conduct our experiments. We can see from the descriptions of

Pipelines A and B that a few variables need to have their values selected before we conduct any experiment. There are some HTK settings that we left constant throughout our experiments and these can be investigated in Appendix B, which shows examples of the HTK commands we used to run our experiments.

7.1 Choice of training data

We have identified two main sources of training data, the TalkBank dataset and the TIMIT dataset. We have described above how each of these has had its own training subset identified. So the only options we have for this variable are:

• TimitTrain: The training subset of the TIMIT dataset.

• TalkbankTrain: The training subset of our TalkBank dataset.

• TalkbankAll: For situations where we utilize our entire Talkbank dataset for training.

7.2 Choice of testing data

We have identified three main sources of testing data: The TIMIT dataset, our TalkBank dataset and the

TORGO dataset.

For the TORGO dataset, we would assume that the entire dataset can be used for testing purposes.

Within this dataset, there are two kinds of people producing the speech utterances: patients and controls.

We have ignored the controls from our TORGO testing dataset, since we are more interested in the speech of the dysarthria patients. This is also prudent because the publicly available TORGO dataset only contains control speech from male control subjects.

7.3 Age At Recording

For the TalkBank dataset, we are able to subdivide the test subset further by age and by language environment. This becomes relevant when it comes to exploring and setting VTLN Warp Factor values with respect to age. It also becomes relevant when thinking about how the influence of a common language environment between the training set and the testing set can lead to different accuracy outcomes for our phone-recognizer. Knowing that we had children across several years of age, we decided to set age as a sub-variable that selects specific testing subsets. Our age variable presents as an age span called "Age At Recording", defining six-month buckets of age from 0 to 8.5 years. We therefore have 17 values for this variable, shown in this array:

[0.0 – 0.5, 0.5 – 1.0, 1.0 – 1.5, 1.5 – 2.0, 2.0 – 2.5, 2.5 – 3.0,

3.0 – 3.5, 3.5 – 4.0, 4.0 – 4.5, 4.5 – 5.0, 5.0 – 5.5, 5.5 – 6.0,

6.0 – 6.5, 6.5 – 7.0, 7.0 – 7.5, 7.5 – 8.0, 8.0 – 8.5]

This value gives us information on the age of the child in years at the time of the recording but collects age-similar clusters together by spans of six months. The way we have set this variable up is that the

lower bound is “inclusive” while the upper bound is “exclusive”. An utterance recorded from a child who was 7 months old would therefore have an “Age At Recording” value of “0.5 – 1.0”. An utterance recorded from a child who was 3 years and 9 months old would therefore have an “Age At Recording” value of “3.5 – 4.0”. A child who is exactly 3 years old would have an “Age At Recording” value of “3.0 –

3.5”. A child who is exactly 3 years and 6 months old would have an “Age At Recording” value of “3.5 –

4.0”. Across the dataset, we see examples of every child being recorded at different ages in life. Some values for “Age At Recording” have no recordings while others have a cluster of recordings. Further information on this variable is provided in the VTLN experiment descriptions to follow.
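The bucketing rule can be sketched in Python as follows; the day-count arithmetic is an approximation for illustration rather than the exact computation in our scripts.

    def age_at_recording_bucket(age_in_days):
        """Map an age in days to its six-month "Age At Recording" bucket label.

        Lower bounds are inclusive and upper bounds exclusive; a six-month step is
        approximated as half of 365.25 days.
        """
        half_year = 365.25 / 2.0
        index = int(age_in_days // half_year)   # 0 for [0.0, 0.5), 1 for [0.5, 1.0), ...
        return f"{index * 0.5:.1f} – {(index + 1) * 0.5:.1f}"

    # e.g. a child of 3 years and 9 months (about 1370 days) falls in the "3.5 – 4.0" bucket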

7.4 Language Environment

Regarding the language-based splitting of the TalkBank testing subset, we have only three values for this subvariable of “language environment”:

[ “eng”, “fra”, “deu”]

Where “eng” represents an English language environment, “fra” represents a French language environment and “deu” represents a German language environment.

For the TIMIT dataset, we decided not to subdivide its test set whenever we used TIMIT test data.

7.5 Cepstrum

This variable focuses on the exact arrangement of mel-frequency cepstral coefficients (MFCCs) that will be used in the training data, testing data and HMM configurations. We decided to use the HTK codes for various MFCC configurations as our variable values, for a variable that we are calling "Cepstrum". The different values that this variable takes in our experiments are shown in Table 16 below, with explanations. We based our MFCC configuration files on the specification shown in Appendix A, where there are 12 base MFCC coefficients, which is the default in HTK.

Cepstrum Code | Meaning in HTK | Number of coefficients

MFCC | The 12 base MFCC coefficients. | 12

MFCC_D | The 12 base MFCC coefficients, plus their first derivatives (deltas). | 24

MFCC_D_A | The 12 base MFCC coefficients, plus their first and second derivatives (deltas and delta-deltas). | 36

MFCC_D_A_T | The 12 base MFCC coefficients, plus their first, second and third derivatives (deltas, delta-deltas and delta-delta-deltas). | 48

MFCC_E_D_A | The 12 base MFCC coefficients, the log energy coefficient, plus their first and second derivatives. | 39

MFCC_E_D_A_T | The 12 base MFCC coefficients, the log energy coefficient, plus their first, second and third derivatives. | 52

Table 16: The different MFCC configurations used in our experiments.

7.6 HMM States

The number of states in an HMM gives us some control over how distinct states within a phone can be modelled, particularly when diphthongs and coarticulations are occurring. In the research literature we have seen (Lopes & Perdigão, 2011), 5 states, with the middle three of them emitting, is the norm. We followed this approach but also experimented with HMMs that have 7 states, with the middle five of them emitting. Therefore, we define an "HMM States" experimental variable and give it two values:

[5, 7]

7.7 Gaussian Configuration

When creating HMMs in HTK, one can define the number of Gaussian functions involved to model every state in an HMM, while also managing the weight distributions between these Gaussians. We decided to make sure that all states share the same Gaussian function arrangements. This allows us to specify a specific Gaussian arrangement for an HMM that applies to all the states in the HMM. When it came to multiple Gaussian HMMs, we experimented with equally weighted and non-equally weighted Gaussians, to see if that would influence our results. When defining our “Gaussian configuration” variable, we present the number of Gaussian functions used as the identifier of every value, plus an optional specifier in parentheses describing non-equal weighting arrangements, using weight numbers that add up to 1. If this suffix is not present, it implies that the Gaussians were all equally weighted. In the course of our experiments, we end up having the following values for this variable, shown in the array below:

[1, 2, 2 (0.75, 0.25), 3 (0.5, 0.25x2), 4, 8, 16, 32]
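To illustrate what a non-equally weighted arrangement looks like inside an HTK HMM definition file, here is a hypothetical fragment for a single emitting state with two Gaussians weighted 0.75 and 0.25; the vector size and the flat mean/variance values are placeholder numbers, not trained parameters.

    <STATE> 2
      <NUMMIXES> 2
      <MIXTURE> 1 0.7500
        <MEAN> 12
          0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
        <VARIANCE> 12
          1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
      <MIXTURE> 2 0.2500
        <MEAN> 12
          0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
        <VARIANCE> 12
          1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0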

7.8 VTLN Warp Factor

Earlier we discussed the importance of using Vocal Tract Length Normalization when adjusting child speech data for phone recognition by an adult-biased TIMIT-trained recognizer. This involves re-extracting MFCC features from the child speech .wav files with a VTLN Warp Factor set to something less than 1.0, according to The HTK Book (Young et al., 2006). Azizi et al. (2012) demonstrated that for 8-year-olds, a VTLN Warp Factor of about 0.83 gives the lowest error rate. Knowing that the children in our Talkbank dataset are all under 8 years of age, we can make a strong guess that their vocal tracts will be even shorter than that of a typical 8-year-old. This means we should explore ideal values below 0.83, which will probably need to be further adjusted across years of age. We also know that too low a value will cause over-warping, negatively impacting the performance of the phone-recognizer. We performed further experimentation to figure out ideal values for each age group, described later in this thesis. The array below shows the different VTLN Warp Factor values we used in our experiments. Note that a factor of 1.0 means that no VTLN procedure was performed on the data:

[0.55, 0.58, 0.63, 0.65, 0.68, 0.72, 0.78, 0.83, 1.0]

8. Experiment Planning

Combining our methodology described in Section 5 with our experimental variables described in Section

7, we now have a clear plan for how we will perform our experiments in each phase.

8.1 Phase 1: Train on Talkbank, Test on Talkbank

Create phone-recognizers via Pipeline A using the Talkbank training data, building out HMMs using all combinations of our defined experimental values for the Cepstrum, Gaussian Configuration and HMM States experimental variables. Test these against the Talkbank test set. This approach allows us to compare the results for all these variables to pick out the best possible combination. We would compare this with what we find in Phase 2.

8.2 Phase 2: Train on TIMIT, Test on TIMIT

Create phone-recognizers via Pipeline B using the TIMIT training data, building out HMMs using all combinations of our defined experimental values for the Cepstrum, Gaussian Configuration and HMM States experimental variables. Test these against the TIMIT test set. Compare the best-performing recognizer here in Phase 2 with the best-performing one we made in Phase 1. This might involve making a few extra recognizers that follow the Phase 1 approach. We expect the TIMIT-trained recognizer to perform better on its own data, since the TIMIT dataset has richer transcription timing information.

8.3 Phase 3: Train on TIMIT, Test on Talkbank

Use the best TIMIT-trained recognizer created in Phase 2 to test across the Talkbank dataset. We do this to see how far we can get with adult-created data and to explore how performance changes across various subsets of the Talkbank data. We would want to see if English-environment children's speech is better recognized, since the TIMIT corpus is of English speech. We would also want to see if results from testing on the Talkbank test sets match the results from testing on the entire Talkbank dataset (training and test material). This would tell us if the Talkbank test sets we have extracted are indeed representative of the entire dataset. We would want to repeat this across languages to see if the language-specific test sets are also good representatives of the entire language-specific material in our dataset.

8.4 Phase 4: Train on TIMIT, Test on Torgo

Use the best TIMIT-trained recognizer created in Phase 2 to test on the Torgo dataset. We do this to see how similar the performance of this recognizer is with respect to what happened in Phase 3. If we get similar performance numbers, it tells us that the speech of young children shares characteristics with the speech of dysarthria patients when it comes to recognizing the phones that are produced by both groups, while ignoring the entire concept of language-modeling and task-achievement. We work with what was produced and not with what was intended to be produced. Due to the different vocal-tract lengths of the two groups, we expect results to be quite dissimilar at this phase, which necessitates Phase 5.

8.5 Phase 5: Train on TIMIT, Test on Talkbank data for optimal VTLN

For every value in our VTLN Warp Factor variable, regenerate the MFCC data for all the Talkbank sound files. Then use the best phone-recognizer we made in Phase 2 to repeat our Phase 3 experiments, but this time, repeat the experiments for every combination of VTLN Warp Factor, Language Environment and

Age At Recording. These multiples lead to a very large number of experiments that need to be done, which can be handled by judicious use of automation scripts. The idea here is to figure out what optimal value of VTLN Warp Factor works for each age group and to see how any of these concepts change across Language Environment values.

9. Results

We performed several experiments with the values of our experimental variables done in different combinations, following either Pipeline A or Pipeline B for our work. Further details on the commands used to perform the experiments are shown in Appendix C.

When it comes to defining the performance of the phone-recognizers we created, we define the following concepts for this thesis, in line with the values produced by the HTK HResults tool (Young et al., 2006):

Correctness: This is the percentage of phones in the test dataset that were correctly recognized (that is, H / N × 100%), which is in line with the "%corr" definition used by HTK.

Hits (H): This is the number of phones that were correctly recognized.

Insertions (I): This is the number of phones in a generated transcription for an utterance that were assessed as erroneously inserted, when compared to the human transcription.

Deletions (D): This is the number of phones in a generated transcription for an utterance that were assessed as erroneously deleted, when compared to the human transcription.

Substitutions (S): This is the number of phones in a generated transcription for an utterance that were assessed as erroneously substituted for another phone, when compared to the human transcription.

Number of phones (N): This is the total number of phones in the human-transcribed test set that correspond to material for which a transcription was attempted. This number can change across experiments, as certain utterances fail to be processed due to runtime errors that emerge due to bad samples and bad conversions to MFCC features.

Accuracy: This is formally defined by HTK to be:

Accuracy = ( (Number of phones – insertions – deletions – substitutions) / Number of phones ) × 100%, which, since N = H + D + S, is equivalent to ( (H – I) / N ) × 100%.

9.1 Phase 1: Training on Talkbank, Testing on Talkbank

For Phase 1, we focused on trying to get the best Talkbank-trained HMM that we could make, using our

Talkbank testing dataset to assess performance and our Talkbank training dataset to learn phone models.

We used Pipeline A to perform the experiments with various arrangements of Cepstrum and Gaussian

Configuration. Every experiment ensures that both the training and the testing datasets had their MFCC features extracted in accordance with the desired Cepstrum configuration, and each experiment creates a different model based on the configurations specified. It is important to note that we used a grammar that automatically inserts a "sil" symbol at the beginning and end of an utterance on its own. Since we set up the entire dataset to have utterances padded with the "sil" symbol in this way, it is important to remove these symbols before we run HResults to tally the results. The "sil" padding would make up roughly 30% of the entire transcription set, and getting these symbols automatically correct by default would inflate our correctness and accuracy values, leading to an outcome that does not really reflect our goal of phone recognition via a machine learning process. So the "sil" padding was removed from both the generated transcriptions and the human transcriptions to get a fair calculation of correctness and accuracy. Table 17 below gives the results for our work on Phase 1; all experiments shown here used five HMM states.

Experiment   Cepstrum   Gaussian and HMM Configuration   Correctness   Accuracy   H   N   I   D   S

1 MFCC 2 [5 states] 10.47 -82.58 10233 97702 90920 15753 71716

2 MFCC 2 (0.75,0.25) [5 states] 10.46 -82.70 10224 97702 91023 15704 71774

3 MFCC 4 [5 states] 10.48 -82.59 10238 97702 90934 15750 71714

4 MFCC 1 [5 states] 10.47 -82.58 10231 97702 90909 15753 71718

5 MFCC 3 (0.5,0.25 x2) [5 states] 10.48 -82.59 10237 97702 90928 15752 71713

6 MFCC_D 2 [5 states] 14.11 -149.38 13789 97702 159735 9965 73948

7 MFCC_D 2 (0.75,0.25) [5 states] 14.02 -149.06 13700 97702 159337 10043 73959

8 MFCC_D 4 [5 states] 14.12 -149.31 13794 97702 159675 9968 73940

9 MFCC_D 1 [5 states] 14.12 -149.39 13795 97702 159752 9969 73938

10 MFCC_D 3 (0.5,0.25 x2) [5 states] 14.12 -149.32 13791 97702 159681 9966 73945

11 MFCC_D_A 2 [5 states] 14.29 -164.06 13957 97695 174238 9359 74379

12 MFCC_D_A 2 (0.75,0.25) [5 states] 14.24 -163.48 13910 97698 173630 9467 74321

13 MFCC_D_A 4 [5 states] 14.29 -164.06 13964 97695 174241 9358 74373

14 MFCC_D_A 1 [5 states] 14.29 -164.04 13963 97695 174225 9359 74373

15 MFCC_D_A 3 (0.5,0.25 x2) [5 states] 14.29 -164.06 13962 97695 174239 9359 74374

16 MFCC_D_A_T 2 [5 states] 14.87 -189.69 14519 97644 199736 8593 74532

17 MFCC_D_A_T 2 (0.75,0.25) [5 states] 14.87 -188.42 14526 97678 198571 8740 74412

18 MFCC_D_A_T 4 [5 states] 14.87 -189.73 14518 97644 199778 8595 74531

19 MFCC_D_A_T 1 [5 states] 14.86 -189.69 14513 97644 199734 8591 74540

20 MFCC_D_A_T 3 (0.5,0.25 x2) [5 states] 14.87 -189.70 14519 97644 199749 8595 74530

Table 17: The results from our Phase 1 experiments, where 5-state HMMs with various Cepstrum and Gaussian configurations were prepared using corresponding MFCC-extractions of the Talkbank training data and tested using corresponding MFCC-extractions of the Talkbank testing data.

9.2 Phase 2: Training on TIMIT, Testing on TIMIT

In this phase of our work, we try to make the best TIMIT-based phone-recognizer we can using HMMs alone, keeping things simple by following Pipeline B. We decided not to use advanced techniques such as tied-state phones and MAP speaker adaptation (Young et al., 2012), opting instead to get as close as possible to the benchmark for HMM-only speech recognition on the TIMIT corpus, which we discussed earlier: a correctness of 73.80% and an accuracy of 66.08%, documented in Lee & Hon 1989. They obtained this result using tied-state phones, their folding mechanism for their 39-phone phoneset, an MFCC configuration that we can describe in our notation as MFCC_E_D_A, and a 5-state HMM with 16 Gaussians. Table 18 shows our Phase 2 experiments, where we attempt to get the best possible result, in the ballpark of Lee & Hon's best result, using our 41-phone TIMIT phoneset described earlier instead of the 39 phones used by Lee & Hon, while imitating their other settings.
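The exact extraction commands are listed in Appendix C; as an illustration only, the sketch below generates the kind of HCopy configuration that the MFCC_E_D_A label implies and runs the conversion. The parameter values are typical HTK-book settings rather than a record of our exact ones, and the file names are hypothetical.

    import subprocess

    # Illustrative HCopy configuration for MFCC_E_D_A extraction (12 cepstra plus energy,
    # with delta and acceleration coefficients appended), 25 ms windows every 10 ms.
    config_lines = [
        "SOURCEFORMAT = WAV",
        "TARGETKIND   = MFCC_E_D_A",
        "TARGETRATE   = 100000.0",
        "WINDOWSIZE   = 250000.0",
        "USEHAMMING   = T",
        "PREEMCOEF    = 0.97",
        "NUMCHANS     = 26",
        "CEPLIFTER    = 22",
        "NUMCEPS      = 12",
    ]
    with open("mfcc_e_d_a.conf", "w") as f:
        f.write("\n".join(config_lines) + "\n")

    # codetrain.scp (hypothetical) lists "input.wav output.mfc" pairs, one pair per line.
    subprocess.run(["HCopy", "-T", "1", "-C", "mfcc_e_d_a.conf", "-S", "codetrain.scp"], check=True)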

The phone-recognizer we built in Experiment 28 comes closest to the results that Lee & Hon obtained in Lee & Hon 1989. Its correctness is lower, but only by 1.77%; its accuracy is lower, but only by 3.81%. We decided to commit to this model as the best TIMIT-trained phone-recognizer we could come up with, and we will call it "Model 28" from here on. We also identified the phone-recognizer we created in Experiment 22 as a simplified form of our TIMIT-based phone-recognizer that would have the fastest processing speed, since it has the fewest Gaussian components and HMM states, sacrificing correctness and accuracy in the process. We decided to name this recognizer "Model 22" and use it for future comparison work against Model 28.

Experiment   Cepstrum   Gaussian and HMM Configuration   Correctness   Accuracy   H   N   I   D   S

21 MFCC_E_D_A_T 1 [5 states] 55.20% 40.75% 35407 64145 9268 4632 24106

22 MFCC_E_D_A 1 [5 states] 55.57% 42.31% 35646 64145 8507 4680 23819

23 MFCC_E_D_A 2 [5 states] 60.94% 48.33% 39091 64145 8087 4017 21037

24 MFCC_E_D_A 4 [5 states] 64.06% 52.46% 41093 64145 7441 3795 19257

25 MFCC_E_D_A 6 [5 states] 69.43% 59.84% 44535 64145 6148 4230 15380

26 MFCC_E_D_A 16 [5 states] 71.85% 62.98% 46089 64145 5693 4080 13976

27 MFCC_E_D_A 32 [5 states] 71.85% 62.98% 46089 64145 5693 4080 13976

28 MFCC_E_D_A 16 [7 states] 72.03 62.27 46202 64145 6260 3998 13945

Table 18: The results from our Phase 2 experiments, where we try to create a phone-recognizer trained on the TIMIT training dataset that performs phone recognition on the TIMIT testing set at the same level as what was documented in Lee & Hon 1989.

Using the results we already have for Experiment 28, we can explore the confusion-matrix to get a phone-by-phone correctness value for Model 28. This information is valuable to us when it comes to figuring out which phones are better recognized by Model 28 and which phones are not. Table 19 below shows us this information.

TIMIT Phone   % Correct      TIMIT Phone   % Correct

b 75.0 ɹ 75.7

d 69.2 w 85.8

g 78.2 j 84.1

p 74.8 h 84.3

t 66.9 sil 87.4

k 83.8 i 82.4

ɾ 91.4 ɪ 58.7

ʤ 74.5 ɛ 57.4

ʧ 78.1 e_ɪ 79.4

s 84.8 æ 75.3

ʃ 89.0 ɑ 65.4

z 73.3 a_ʊ 67.3

ʒ 55.1 a_ɪ 79.9

f 87.8 ɔ 72.0

θ 63.1 ɔ_ɪ 82.6

v 74.2 o_ʊ 67.9

ð 71.3 ʊ 38.2

m 82.9 u 66.2

n 71.2 ɝ 77.6

ŋ 85.9 ə 57.6

l 75.4

Table 19: Correctness values by phone, for the results of Experiment 28, which gave us the best results for TIMIT-trained phone recognition against the TIMIT testing dataset.
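The per-phone values in Table 19 come from the confusion matrix that HResults can print (its -p option). As a minimal illustration, assuming the matrix has already been parsed into a nested dictionary keyed by reference phone and recognized phone, per-phone correctness can be computed as in the sketch below; the counts shown are invented for the example.

    def per_phone_correctness(confusions):
        # confusions[reference_phone][recognized_phone] = count of test tokens.
        results = {}
        for ref_phone, row in confusions.items():
            total = sum(row.values())
            if total:
                results[ref_phone] = 100.0 * row.get(ref_phone, 0) / total
        return results

    example = {
        "s":  {"s": 848, "z": 90, "sh": 62},      # invented counts, for illustration only
        "sh": {"sh": 890, "s": 70, "ch": 40},
    }
    for phone, pct in sorted(per_phone_correctness(example).items()):
        print(f"{phone}: {pct:.1f}% correct")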

9.3 Phase 3: Training on TIMIT, Testing on Talkbank

With Model 28 as our best candidate for phone recognition and Model 22 as our simplified candidate, we proceeded to point both of them at the Talkbank testing dataset, as well as at the entire Talkbank dataset, in order to see whether they could do better than what we came up with in our Phase 1 experiments. This involved repeating our Phase 1 experiments, but this time using only the MFCC_E_D_A cepstrum configuration to match the operating parameters of Model 22 and Model 28. Table 20 shows our results.

Experiment | Model | Testing Dataset | Cepstrum | Gaussian and HMM Config. | % Correct | Acc. | H | N | I | D | S

29 | 22 | Talkbank Test | MFCC_E_D_A | 1 [5 states] | 14.09 | -378.80 | 13782 | 97839 | 384394 | 2289 | 81768
30 | 22 | Talkbank All | MFCC_E_D_A | 1 [5 states] | 14.14 | -382.33 | 67924 | 480283 | 1904173 | 11681 | 400678
31 | 28 | Talkbank Test | MFCC_E_D_A | 16 [7 states] | 30.48 | -484.79 | 29820 | 97839 | 504138 | 2126 | 65893
32 | 28 | Talkbank All | MFCC_E_D_A | 16 [7 states] | 30.45 | -488.06 | 146224 | 480283 | 2490284 | 10678 | 323381

Table 20: The results for the first set of our Phase 3 experiments, where we use Model 22 and Model 28 of our phone-recognizer to test against our Talkbank testing dataset as well as our entire Talkbank dataset.

At this point in our work, we abandoned further testing with Model 22, given how large the performance gap between it and Model 28 turned out to be. All further work for Phase 3 and Phase 4 was conducted with Model 28. We realized that we should subdivide our Talkbank dataset by language environment in order to explore the influence of Model 28's English bias on its performance across the three language environments in which we have recorded utterances of young children. We divided the Talkbank testing dataset into three divisions organized by language environment, which we called "TalkbankEngTest", "TalkbankFraTest" and "TalkbankDeuTest". We also divided the entire Talkbank dataset into three similarly-organized divisions that we called "TalkbankEngAll", "TalkbankFraAll" and "TalkbankDeuAll". We then repeated our experiment, using Model 28 to run phone recognition on all six of these language-clustered sets. Table 21a shows our results, which clearly show that a significant English bias is present: children from the English language environment enjoy a higher correctness rate in phone recognition when Model 28 is applied to their utterances.

Experiment | Model | Testing Dataset | Cepstrum | Gaussian and HMM Config. | % Correct | Acc. | H | N | I | D | S

33 | 28 | TalkbankEngTest | MFCC_E_D_A | 16 [7 states] | 36.22 | -543.76 | 19340 | 53389 | 309649 | 724 | 33325
34 | 28 | TalkbankEngAll | MFCC_E_D_A | 16 [7 states] | 36.27 | -548.14 | 94963 | 261791 | 1529955 | 3641 | 163187
35 | 28 | TalkbankFraTest | MFCC_E_D_A | 16 [7 states] | 23.10 | -615.94 | 6504 | 28152 | 179903 | 304 | 21344
36 | 28 | TalkbankFraAll | MFCC_E_D_A | 16 [7 states] | 23.10 | -619.79 | 32002 | 138521 | 890540 | 1466 | 105053
37 | 28 | TalkbankDeuTest | MFCC_E_D_A | 16 [7 states] | 24.40 | -65.10 | 3976 | 16298 | 14586 | 1098 | 11224
38 | 28 | TalkbankDeuAll | MFCC_E_D_A | 16 [7 states] | 24.08 | -63.19 | 19259 | 79971 | 69789 | 5571 | 55141

Table 21a: The results for the second set of our Phase 3 experiments, where we explore the performance of Model 28 by language environment, for our Talkbank testing dataset and our entire Talkbank dataset.

In order to compare the efficacy of our different approaches, we decided to revisit the Phase 1 experiments using the MFCC_E_D_A configuration and a 7-state, 16-Gaussian model, neither of which was used in that phase. The table below shows the result of this reference experiment.

Experiment | Training Dataset | Testing Dataset | Cepstrum | Gaussian and HMM Config. | % Correct | Acc. | H | N | I | D | S

39 | Talkbank Train | Talkbank Test | MFCC_E_D_A | 16 [7 states] | 12.87 | -92.64 | 12388 | 101569 | | 1428 | 69596

Table 21b: Results for a Model 28 style HMM, trained and tested on Talkbank, via Pipeline A.

In Experiment 39, we train a model on the Talkbank training set and test it on the Talkbank testing set, using Cepstrum, Gaussian and HMM parameters similar to Model 28, but using Pipeline A for the workflow. We can see that the results are quite disappointing. Simply having the same Cepstrum, Gaussian and HMM parameters as Model 28 is not enough to boost performance. This gives us a deeper insight into how the model initialization and training activities of Pipeline B allow us to best utilize the Cepstrum, Gaussian and HMM parameters that make Model 28 effective.

We noticed, in Table 21a, that the German language environment data has a markedly lower rate of insertion errors compared to the data from the English and French language environments. Upon investigation, we discovered an oversight on our part: we had left the English-environment and French-environment Talkbank data at the higher sample rate of 44k instead of downsampling it to 16k to match the German-environment Talkbank data as well as the data from the TIMIT corpus and the Torgo corpus. In order to put the results for Experiments 31 to 38 on a more equal footing across the language environments, we decided to repeat these experiments with all utterances forcibly downsampled to 16k using SoX. We denote these re-done experiments with the same number plus the suffix "-R" to indicate the revised version of the experiment. Our results are shown below in Table 21c:

Experiment | Model | Testing Dataset | Cepstrum | Gaussian and HMM Config. | % Correct | Acc. | H | N | I | D | S

31-R | 28 | Talkbank Test | MFCC_E_D_A | 16 [7 states] | 34.81% | -625.92 | 34054 | 97839 | 646446 | 2051 | 61734
32-R | 28 | Talkbank All | MFCC_E_D_A | 16 [7 states] | 34.66% | -630.29 | 166443 | 480283 | 3193595 | 10204 | 303636
33-R | 28 | TalkbankEngTest | MFCC_E_D_A | 16 [7 states] | 38.67% | -695.53 | 20646 | 53389 | 391985 | 628 | 32115
34-R | 28 | TalkbankEngAll | MFCC_E_D_A | 16 [7 states] | 38.47% | -699.58 | 100704 | 261791 | 1932135 | 3085 | 158002
35-R | 28 | TalkbankFraTest | MFCC_E_D_A | 16 [7 states] | 33.51% | -818.53 | 9434 | 28152 | 239866 | 319 | 18399
36-R | 28 | TalkbankFraAll | MFCC_E_D_A | 16 [7 states] | 33.57% | -826.67 | 46504 | 138521 | 1191610 | 1508 | 90509
37-R | 28 | TalkbankDeuTest | MFCC_E_D_A | 16 [7 states] | 24.38% | -65.17 | 3974 | 16298 | 14595 | 1104 | 11220
38-R | 28 | TalkbankDeuAll | MFCC_E_D_A | 16 [7 states] | 24.05% | -63.29 | 19235 | 79971 | 69850 | 5611 | 55125

Table 21c: Results of revised experiments where forced downsampling was performed on all speech samples.

The results here are surprising. Downsampling did not lead to a drop in insertion errors as we expected; instead it led to an increase in insertion errors in all cases. It also led the correctness levels for English to rise slightly and for French to rise dramatically, with the German correctness levels staying mostly the same. This result is peculiar, and we stop here with the downsampled experiments due to the unexpected nature of these results.
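The forced downsampling itself is a one-line SoX operation per file; a small wrapper along the lines of the sketch below (with hypothetical directory names) is enough to convert an entire corpus to a 16k sample rate.

    import pathlib
    import subprocess

    src_root = pathlib.Path("talkbank_wav_44k")      # hypothetical input directory
    dst_root = pathlib.Path("talkbank_wav_16k")      # hypothetical output directory

    for wav in sorted(src_root.rglob("*.wav")):
        out = dst_root / wav.relative_to(src_root)
        out.parent.mkdir(parents=True, exist_ok=True)
        # Equivalent to "sox in.wav -r 16000 out.wav": resample the audio to 16 kHz.
        subprocess.run(["sox", str(wav), "-r", "16000", str(out)], check=True)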

9.4 Phase 4: Training on TIMIT, Testing on Torgo

We had already divided all the feasible data in our Torgo dataset into two subsets: the "Patients" subset and the "Controls" subset. With Model 28 as our best TIMIT-trained candidate, we proceeded to point it at both subsets as well as at the entire dataset. Table 22 shows the results.

Experiment | Model | Testing Dataset | Cepstrum | Gaussian and HMM Config. | % Correct | Acc. | H | N | I | D | S

40 | 28 | Torgo All | MFCC_E_D_A | 16 [7 states] | 42.91 | -15.37 | 26932 | 62767 | 36578 | 3519 | 32316
41 | 28 | Torgo Patients | MFCC_E_D_A | 16 [7 states] | 48.28 | -37.16 | 13790 | 28562 | 24403 | 1030 | 13742
42 | 28 | Torgo Controls | MFCC_E_D_A | 16 [7 states] | 38.42 | 2.83 | 13142 | 34205 | 12175 | 2489 | 18574

Table 22: The results for our Phase 4 experiments, where we explore the performance of Model 28 across our entire Torgo dataset, as well as its "Patients" and "Controls" subsets.

9.5 Phase 5: Exploring optimal VTLN settings for Model 28

For this phase of our work, we chose VTLN as the key method for accomplishing age-sensitive phone recognition. We do not know what optimal "VTLN Warp Factor" value is appropriate for each of the 17 age buckets we identified for our "Age At Recording" variable. We also want to be mindful of the language environments of these children, which influence the results because each language environment has a different age cluster of children. So, for the first set of Phase 5 experiments, we identified 9 values of "VTLN Warp Factor" to test with. Please see Appendix A for the HTK configuration information that shows where we plug in the VTLN Warp Factor value for each case. The idea was to re-extract all our Talkbank MFCC files with the MFCC_E_D_A configuration such that the extracted MFCC features are warped accordingly with HTK's VTLN feature (Young et al., 2006).
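The full configuration is reproduced in Appendix A. As an illustration of where the warp factor enters, HTK's frequency warping is controlled through feature-extraction configuration parameters, and a sketch of generating one configuration file per tested warp factor might look like the following; the cut-off frequencies are placeholders rather than our exact values, and the file names are hypothetical.

    # Sketch: one HCopy/HParm configuration per VTLN Warp Factor value.
    warp_factors = [0.55, 0.58, 0.63, 0.65, 0.68, 0.72, 0.78, 0.83]   # distinct values appearing in Table 23

    base_lines = [
        "TARGETKIND   = MFCC_E_D_A",
        "TARGETRATE   = 100000.0",
        "WINDOWSIZE   = 250000.0",
        "NUMCHANS     = 26",
        "NUMCEPS      = 12",
        "WARPLCUTOFF  = 300.0",     # placeholder lower cut-off of the warping region
        "WARPUCUTOFF  = 7000.0",    # placeholder upper cut-off of the warping region
    ]

    for warp in warp_factors:
        lines = base_lines + [f"WARPFREQ     = {warp}"]
        with open(f"vtln_{warp:.2f}.conf", "w") as f:
            f.write("\n".join(lines) + "\n")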

With 17 values for "Age At Recording" and 9 values for "VTLN Warp Factor", we ended up conducting 153 experiments to cover all tuple pairings of these values. At this stage of our work, we had become so familiar with this process that we were able to write scripts to completely automate the preparation and execution of these experiments, running Model 28 against our entire Talkbank dataset. The results of these experiments are shown below in Table 23. The entries marked N/A are situations where no children existed with an "Age At Recording" in that range. There are 4 out of 17 values of Age At Recording for which no children exist: the buckets starting at 0.5, 5.0, 7.0 and 8.0 years. The "VTLN Warp Factor" and "Age At Recording" tuples for these situations are still counted as experiments for the purpose of referencing; they are simply experiments with no applicable results.
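The automation scripts themselves are not reproduced here, but the overall experiment loop is simple; the sketch below shows its shape, with the helper commands and paths being hypothetical stand-ins for the actual Appendix C tooling.

    import itertools
    import subprocess

    age_buckets = [(i * 0.5, i * 0.5 + 0.5) for i in range(17)]        # 0.0-0.5 up to 8.0-8.5 years
    warp_factors = [0.55, 0.58, 0.63, 0.65, 0.68, 0.72, 0.78, 0.83]    # distinct values appearing in Table 23

    for (lo, hi), warp in itertools.product(age_buckets, warp_factors):
        scp = f"lists/talkbank_age_{lo:.1f}_{hi:.1f}.scp"              # utterances in this age bucket (hypothetical)
        # Re-extract VTLN-warped MFCC_E_D_A features for the bucket, then decode with Model 28 and score.
        subprocess.run(["python", "warp_extract.py", scp, f"--warp={warp}"], check=True)
        subprocess.run(["bash", "score_model28.sh", scp, f"results/age{lo:.1f}_warp{warp:.2f}.txt"], check=True)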

Note: The English and French environment data we used here was not downsampled. We used the original data available for all three language environments, focusing on relative correctness as our guide to finding the best VTLN value.

Note: Table 23 has 153 entries and goes on for several pages.

Experiment   VTLN Factor   Age At Recording   % Correct   Accuracy   H   N   I   D   S

43 - 51 All 0.5 – 1.0 N/A N/A N/A N/A N/A N/A N/A

52 0.55 0.0 – 0.5 31.44 -641.57 19652 62514 420726 1074 41788

53 0.58 0.0 – 0.5 32.69 -626.73 20436 62514 412230 1067 41011

54 0.63 0.0 – 0.5 34.81 -619.01 21759 62514 408727 1080 39675

55 0.63 0.0 – 0.5 34.81 -619.01 21759 62514 408727 1080 39675

56 0.65 0.0 – 0.5 35.41 -616.1 22134 62514 407281 1038 39342

57 0.68 0.0 – 0.5 36.17 -612.03 22609 62514 405213 926 38979

58 0.72 0.0 – 0.5 36.43 -607.03 22774 62514 402255 1043 38697

59 0.78 0.0 – 0.5 35.99 -601.59 22499 62514 398580 1200 38815

60 0.83 0.0 – 0.5 35.43 -592.79 22148 62514 392725 1152 39214

61 0.55 1.0 – 1.5 40.6 -933.04 3043 7495 72974 39 4413

62 0.58 1.0 – 1.5 42.68 -911.59 3199 7495 71523 39 4257

63 0.63 1.0 – 1.5 45.2 -898.63 3388 7495 70740 61 4046

64 0.63 1.0 – 1.5 45.2 -898.63 3388 7495 70740 61 4046

65 0.65 1.0 – 1.5 45.99 -896.78 3447 7495 70661 73 3975

66 0.68 1.0 – 1.5 46.66 -888.38 3497 7495 70081 63 3935

67 0.72 1.0 – 1.5 47.79 -877.51 3582 7495 69351 77 3836

68 0.78 1.0 – 1.5 47.79 -857.13 3582 7495 67824 72 3841

69 0.83 1.0 – 1.5 47.15 -842.66 3534 7495 66691 98 3863

70 0.55 1.5 – 2.0 31.6 -752.52 92710 293355 2300253 2916 197729

71 0.58 1.5 – 2.0 34.3 -748.34 100610 293355 2295917 2953 189792

72 0.63 1.5 – 2.0 37.27 -742.96 109330 293355 2288852 3063 180962

73 0.63 1.5 – 2.0 37.27 -742.96 109330 293355 2288852 3063 180962

74 0.65 1.5 – 2.0 38.19 -740.24 112038 293355 2283556 3091 178226

75 0.68 1.5 – 2.0 38.86 -736.91 113995 293355 2275743 3172 176188

76 0.72 1.5 – 2.0 39.42 -733.54 115651 293355 2267515 3149 174555

77 0.78 1.5 – 2.0 39.96 -729.02 117212 293355 2255839 3317 172826

78 0.83 1.5 – 2.0 39.26 -725.31 115159 293355 2242901 3710 174486

79 0.55 2.0 – 2.5 33.08 -874.86 1885 5699 51743 47 3767

80 0.58 2.0 – 2.5 35.48 -862.68 2022 5699 51186 60 3617

81 0.63 2.0 – 2.5 38.23 -848.13 2179 5699 50514 60 3460

82 0.63 2.0 – 2.5 38.23 -848.13 2179 5699 50514 60 3460

83 0.65 2.0 – 2.5 39.39 -842.48 2245 5699 50258 67 3387

84 0.68 2.0 – 2.5 40.22 -835.37 2292 5699 49900 71 3336

85 0.72 2.0 – 2.5 41.01 -828.57 2337 5699 49557 75 3287

86 0.78 2.0 – 2.5 41.29 -817.49 2353 5699 48942 79 3267

87 0.83 2.0 – 2.5 40.29 -821.57 2296 5699 49117 84 3319

88 0.55 2.5 – 3.0 30.77 -575.15 21369 69450 420811 1519 46562

89 0.58 2.5 – 3.0 33.18 -567.17 23044 69450 416947 1465 44941

90 0.63 2.5 – 3.0 36.76 -560.59 25529 69450 414862 1439 42482

91 0.63 2.5 – 3.0 36.76 -560.59 25529 69450 414862 1439 42482

92 0.65 2.5 – 3.0 37.88 -558.46 26310 69450 414163 1452 41688

93 0.68 2.5 – 3.0 39.28 -554.02 27280 69450 412045 1507 40663

94 0.72 2.5 – 3.0 40.27 -549.33 27968 69450 409475 1507 39975

95 0.78 2.5 – 3.0 40.52 -546.37 28139 69450 407593 1526 39785

96 0.83 2.5 – 3.0 39.95 -544.72 27742 69450 406051 1559 40149

97 0.55 3.0 – 3.5 21.39 -345.79 404 1889 6936 76 1409

98 0.58 3.0 – 3.5 24.83 -346.85 469 1889 7021 52 1368

99 0.63 3.0 – 3.5 28.69 -347.86 542 1889 7113 57 1290

100 0.63 3.0 – 3.5 28.69 -347.86 542 1889 7113 57 1290

101 0.65 3.0 – 3.5 30.65 -343.99 579 1889 7077 49 1261

102 0.68 3.0 – 3.5 33.19 -346.85 627 1889 7179 52 1210

103 0.72 3.0 – 3.5 35.42 -350.03 669 1889 7281 55 1165

104 0.78 3.0 – 3.5 36.79 -353.41 695 1889 7371 66 1128

105 0.83 3.0 – 3.5 36.16 -352.41 683 1889 7340 60 1146

106 0.55 3.5 – 4.0 14.61 -53.66 2829 19357 13215 1498 15030

107 0.58 3.5 – 4.0 18.21 -56.75 3525 19357 14511 1414 14418

108 0.63 3.5 – 4.0 22.72 -54.26 4397 19357 14901 1398 13562

109 0.63 3.5 – 4.0 22.72 -54.26 4397 19357 14901 1398 13562

110 0.65 3.5 – 4.0 24.09 -53.05 4664 19357 14933 1409 13284

111 0.68 3.5 – 4.0 25.64 -52.67 4963 19357 15158 1364 13030

112 0.72 3.5 – 4.0 27.35 -51.92 5295 19357 15346 1420 12642

113 0.78 3.5 – 4.0 27.86 -50.7 5392 19357 15206 1466 12499

114 0.83 3.5 – 4.0 27.08 -51.84 5241 19357 15276 1459 12657

115 0.55 4.0 – 4.5 17.43 -70.9 499 2863 2529 153 2211

116 0.58 4.0 – 4.5 21.34 -69.23 611 2863 2593 159 2093

117 0.63 4.0 – 4.5 27.31 -63.95 782 2863 2613 159 1922

118 0.63 4.0 – 4.5 27.31 -63.95 782 2863 2613 159 1922

119 0.65 4.0 – 4.5 28.08 -65.28 804 2863 2673 162 1897

120 0.68 4.0 – 4.5 30.28 -64.69 867 2863 2719 172 1824

121 0.72 4.0 – 4.5 30.98 -64.06 887 2863 2721 175 1801

122 0.78 4.0 – 4.5 29.93 -63.57 857 2863 2677 195 1811

123 0.83 4.0 – 4.5 28.36 -65.77 812 2863 2695 184 1867

124 0.55 4.5 – 5.0 12.92 -47.26 1399 10829 6517 900 8530

125 0.58 4.5 – 5.0 16.08 -53.17 1741 10829 7499 753 8335

126 0.63 4.5 – 5.0 21.23 -52.46 2299 10829 7980 730 7800

127 0.63 4.5 – 5.0 21.23 -52.46 2299 10829 7980 730 7800

128 0.65 4.5 – 5.0 22.63 -52.29 2451 10829 8113 760 7618

129 0.68 4.5 – 5.0 24.28 -51.71 2629 10829 8229 777 7423

130 0.72 4.5 – 5.0 25.39 -49.93 2749 10829 8156 756 7324

131 0.78 4.5 – 5.0 27.1 -48.81 2935 10829 8221 806 7088

132 0.83 4.5 – 5.0 27.11 -48.73 2936 10829 8213 826 7067

133 - 141 All 5.0 – 5.5 N/A N/A N/A N/A N/A N/A N/A

142 0.55 5.5 – 6.0 13.73 -58.81 187 1362 988 105 1070

143 0.58 5.5 – 6.0 16.89 -60.65 230 1362 1056 88 1044

144 0.63 5.5 – 6.0 20.85 -63.44 284 1362 1148 76 1002

145 0.63 5.5 – 6.0 20.85 -63.44 284 1362 1148 76 1002

146 0.65 5.5 – 6.0 22.03 -65.93 300 1362 1198 82 980

147 0.68 5.5 – 6.0 22.91 -65.57 312 1362 1205 72 978

148 0.72 5.5 – 6.0 25.92 -67.18 353 1362 1268 78 931

149 0.78 5.5 – 6.0 25.62 -67.47 349 1362 1268 82 931

150 0.83 5.5 – 6.0 24.08 -67.62 328 1362 1249 88 946

151 0.55 6.0 – 6.5 11.24 -46.93 75 667 388 51 541

152 0.58 6.0 – 6.5 16.94 -41.68 113 667 391 39 515

153 0.63 6.0 – 6.5 20.69 -38.68 138 667 396 44 485

154 0.63 6.0 – 6.5 20.69 -38.68 138 667 396 44 485

155 0.65 6.0 – 6.5 22.34 -38.08 149 667 403 54 464

156 0.68 6.0 – 6.5 25.19 -38.83 168 667 427 51 448

157 0.72 6.0 – 6.5 28.79 -39.13 192 667 453 45 430

158 0.78 6.0 – 6.5 29.99 -38.53 200 667 457 39 428

159 0.83 6.0 – 6.5 28.79 -40.48 192 667 462 35 440

160 0.55 6.5 – 7.0 12.42 -46.16 259 2086 1222 126 1701

161 0.58 6.5 – 7.0 16.2 -47.08 338 2086 1320 123 1625

162 0.63 6.5 – 7.0 20.57 -45.64 429 2086 1381 133 1524

163 0.63 6.5 – 7.0 20.57 -45.64 429 2086 1381 133 1524

164 0.65 6.5 – 7.0 23.25 -43.05 485 2086 1383 118 1483

165 0.68 6.5 – 7.0 25.84 -42.67 539 2086 1429 116 1431

166 0.72 6.5 – 7.0 28.04 -42.14 585 2086 1464 120 1381

167 0.78 6.5 – 7.0 30.58 -39.17 638 2086 1455 125 1323

168 0.83 6.5 – 7.0 30.2 -42.62 630 2086 1519 136 1320

169 - 177 All 7.0 – 7.5 N/A N/A N/A N/A N/A N/A N/A

178 0.55 7.5 – 8.0 14.57 -54.36 396 2717 1873 169 2152

179 0.58 7.5 – 8.0 18.15 -52.48 493 2717 1919 177 2047

180 0.63 7.5 – 8.0 23.22 -48.14 631 2717 1939 171 1915

181 0.63 7.5 – 8.0 23.22 -48.14 631 2717 1939 171 1915

182 0.65 7.5 – 8.0 25.32 -45.56 688 2717 1926 171 1858

183 0.68 7.5 – 8.0 28.63 -43.8 778 2717 1968 182 1757

184 0.72 7.5 – 8.0 30.36 -44.09 825 2717 2023 162 1730

185 0.78 7.5 – 8.0 31.54 -41.37 857 2717 1981 166 1694

186 0.83 7.5 – 8.0 33.05 -40.96 898 2717 2011 152 1667

187 - 195 All 8.0 – 8.5 N/A N/A N/A N/A N/A N/A N/A

Table 23: The results for the first set of experiments we performed for Phase 5. We explored 9 values for VTLN Warp Factor and 17 values for Age At Recording for the entire Talkbank dataset, containing data from all three language environments, running Model 28 on all scenarios that were applicable.

This massive table is difficult to absorb visually. See Figure 2 below for a useful visualization of the 117 correctness values that were applicable.

Our next exploration for Phase 5 was repeating the 153 experiments shown in Table 23, but now isolating the results for each language environment along with each age cluster. Table 24 shows how many phones exist for each language environment, clustered by age.

Figure 2: Visualization of results for the first set of our Phase 5 experiments, showing how correctness changes across values for VTLN Warp Factor and the lower bound of "Age At Recording", defined as "Last Age Milestone", across all language environments.

Age At Recording | N (deu) | N (eng) | N (fra)
0.0 – 0.5 | 4169 | 40701 | 17644
0.5 – 1.0 | 0 | 0 | 0
1.0 – 1.5 | 0 | 5414 | 2081
1.5 – 2.0 | 15823 | 174300 | 103232
2.0 – 2.5 | 0 | 1905 | 3794
2.5 – 3.0 | 18942 | 38738 | 11770
3.0 – 3.5 | 1156 | 733 | 0
3.5 – 4.0 | 19357 | 0 | 0
4.0 – 4.5 | 2863 | 0 | 0
4.5 – 5.0 | 10829 | 0 | 0
5.0 – 5.5 | 0 | 0 | 0
5.5 – 6.0 | 1362 | 0 | 0
6.0 – 6.5 | 667 | 0 | 0
6.5 – 7.0 | 2086 | 0 | 0
7.0 – 7.5 | 0 | 0 | 0
7.5 – 8.0 | 2717 | 0 | 0
8.0 – 8.5 | 0 | 0 | 0

Table 24: The number of human-transcribed phones clustered by Age At Recording and Language Environment.

Figure 3 shows a visualization of the information in Table 24.

Figure 3: Bar graph showing how phones in the Talkbank dataset cluster by language-environment and the lower-bound of Age At Recording, defined here as “Last Age Milestone”.

Tables 25, 26 and 27 show the results of the Phase 5 experiments that we ran for the English, French and German language environments respectively, using their respective data subsets from the entire Talkbank dataset. These tables take up several of the following pages.

Experiment   VTLN Factor   Age At Recording   Correctness   Accuracy   H   N   I   D   S

196 0.55 0.0 – 0.5 35.1 -516.16 14287 40701 224370 491 25923

197 0.58 0.0 – 0.5 36.7 -506.69 14938 40701 221167 519 25244

198 0.63 0.0 – 0.5 38.66 -502.03 15733 40701 220065 533 24435

199 0.63 0.0 – 0.5 38.66 -502.03 15733 40701 220065 533 24435

200 0.65 0.0 – 0.5 39.32 -501.14 16005 40701 219976 517 24179

201 0.68 0.0 – 0.5 39.74 -501.34 16176 40701 220228 488 24037

202 0.72 0.0 – 0.5 40.56 -499.56 16509 40701 219834 539 23653

203 0.78 0.0 – 0.5 39.72 -495.86 16165 40701 217985 609 23927

204 0.83 0.0 – 0.5 38.88 -487.69 15826 40701 214320 581 24294

205 – 213 All 0.5 – 1.0 N/A N/A N/A N/A N/A N/A N/A

214 0.55 1.0 – 1.5 42.43 -805.19 2297 5414 45890 25 3092

215 0.58 1.0 – 1.5 44.38 -788.23 2403 5414 45078 29 2982

216 0.63 1.0 – 1.5 47.12 -783.58 2551 5414 44974 50 2813

217 0.63 1.0 – 1.5 47.12 -783.58 2551 5414 44974 50 2813

218 0.65 1.0 – 1.5 48.08 -784.37 2603 5414 45069 52 2759

219 0.68 1.0 – 1.5 48.78 -777.28 2641 5414 44723 41 2732

220 0.72 1.0 – 1.5 50.65 -768.43 2742 5414 44345 48 2624

221 0.78 1.0 – 1.5 50.37 -745.33 2727 5414 43079 58 2629

222 0.83 1.0 – 1.5 49.32 -728.98 2670 5414 42137 85 2659

223 0.55 1.5 – 2.0 35.65 -765.23 62144 174300 1395939 1473 110683

224 0.58 1.5 – 2.0 38.57 -760 67230 174300 1391905 1450 105620

225 0.63 1.5 – 2.0 41.6 -755.6 72504 174300 1389508 1497 100299

226 0.63 1.5 – 2.0 41.6 -755.6 72504 174300 1389508 1497 100299

227 0.65 1.5 – 2.0 42.64 -752.98 74322 174300 1386758 1456 98522

228 0.68 1.5 – 2.0 43.15 -750.66 75206 174300 1383614 1515 97579

229 0.72 1.5 – 2.0 43.37 -748.99 75586 174300 1381077 1448 97266

230 0.78 1.5 – 2.0 43.49 -744.45 75801 174300 1373384 1582 96917

231 0.83 1.5 – 2.0 42.1 -739.51 73376 174300 1362344 1964 98960

232 0.55 2.0 – 2.5 46.93 -987.24 894 1905 19701 14 997

233 0.58 2.0 – 2.5 49.5 -964.46 943 1905 19316 16 946

234 0.63 2.0 – 2.5 52.49 -945.3 1000 1905 19008 15 890

235 0.63 2.0 – 2.5 52.49 -945.3 1000 1905 19008 15 890

236 0.65 2.0 – 2.5 53.81 -933.86 1025 1905 18815 18 862

237 0.68 2.0 – 2.5 54.38 -921.47 1036 1905 18590 12 857

238 0.72 2.0 – 2.5 54.8 -915.49 1044 1905 18484 18 843

239 0.78 2.0 – 2.5 53.33 -906.88 1016 1905 18292 17 872

240 0.83 2.0 – 2.5 50.55 -913.28 963 1905 18361 19 923

241 0.55 2.5 – 3.0 40.83 -780.49 15818 38738 318164 347 22573

242 0.58 2.5 – 3.0 42.83 -767.62 16590 38738 313950 333 21815

243 0.63 2.5 – 3.0 45.91 -757.95 17786 38738 311402 300 20652

244 0.63 2.5 – 3.0 45.91 -757.95 17786 38738 311402 300 20652

245 0.65 2.5 – 3.0 46.76 -755.75 18115 38738 310877 292 20331

246 0.68 2.5 – 3.0 47.64 -749.81 18455 38738 308916 319 19964

247 0.72 2.5 – 3.0 47.86 -743.12 18539 38738 306410 363 19836

248 0.78 2.5 – 3.0 47.13 -739.61 18257 38738 304768 403 20078

249 0.83 2.5 – 3.0 45.89 -736.8 17776 38738 303197 398 20564

250 0.55 3.0 – 3.5 36.15 -806.96 265 733 6180 10 458

251 0.58 3.0 – 3.5 39.43 -802.32 289 733 6170 5 439

252 0.63 3.0 – 3.5 42.16 -800.68 309 733 6178 5 419

253 0.63 3.0 – 3.5 42.16 -800.68 309 733 6178 5 419

254 0.65 3.0 – 3.5 43.66 -799.18 320 733 6178 1 412

255 0.68 3.0 – 3.5 45.16 -807.64 331 733 6251 1 401

256 0.72 3.0 – 3.5 46.11 -815.55 338 733 6316 3 392

257 0.78 3.0 – 3.5 46.38 -832.61 340 733 6443 4 389

258 0.83 3.0 – 3.5 44.88 -836.7 329 733 6462 3 401

259 – 267 All 3.5 – 4.0 N/A N/A N/A N/A N/A N/A N/A

268 – 276 All 4.0 – 4.5 N/A N/A N/A N/A N/A N/A N/A

277 – 285 All 4.5 – 5.0 N/A N/A N/A N/A N/A N/A N/A

286 – 294 All 5.0 – 5.5 N/A N/A N/A N/A N/A N/A N/A

295 – 303 All 5.5 – 6.0 N/A N/A N/A N/A N/A N/A N/A

304 – 312 All 6.0 – 6.5 N/A N/A N/A N/A N/A N/A N/A

313 – 321 All 6.5 – 7.0 N/A N/A N/A N/A N/A N/A N/A

322 – 330 All 7.0 – 7.5 N/A N/A N/A N/A N/A N/A N/A

331 – 339 All 7.5 – 8.0 N/A N/A N/A N/A N/A N/A N/A

340 – 348 All 8.0 – 8.5 N/A N/A N/A N/A N/A N/A N/A

Table 25: The results for the second set of experiments we performed for Phase 5. We explored 9 values for VTLN Warp Factor and 17 values for Age At Recording for the English-environment subset of the entire Talkbank dataset, running Model 28 on all scenarios that were applicable.

Experiment   VTLN Factor   Age At Recording   Correctness   Accuracy   H   N   I   D   S

349 0.55 0.0 – 0.5 25.35 -1060.5 4472 17644 191586 111 13061

350 0.58 0.0 – 0.5 25.61 -1029.1 4519 17644 186093 113 13012

351 0.63 0.0 – 0.5 28.11 -1013.28 4960 17644 183743 133 12551

352 0.63 0.0 – 0.5 28.11 -1013.28 4960 17644 183743 133 12551

353 0.65 0.0 – 0.5 28.64 -1005.06 5053 17644 182386 145 12446

354 0.68 0.0 – 0.5 30.21 -989.81 5330 17644 179972 160 12154

355 0.72 0.0 – 0.5 29.21 -976.65 5153 17644 177473 153 12338

356 0.78 0.0 – 0.5 29.98 -967.96 5289 17644 176076 159 12196

357 0.83 0.0 – 0.5 30.16 -955.62 5321 17644 173930 157 12166

358 – 366 All 0.5 – 1.0 N/A N/A N/A N/A N/A N/A N/A

367 0.55 1.0 – 1.5 35.85 -1265.64 746 2081 27084 14 1321

368 0.58 1.0 – 1.5 38.25 -1232.53 796 2081 26445 10 1275

369 0.63 1.0 – 1.5 40.22 -1197.93 837 2081 25766 11 1233

370 0.63 1.0 – 1.5 40.22 -1197.93 837 2081 25766 11 1233

371 0.65 1.0 – 1.5 40.56 -1189.24 844 2081 25592 21 1216

372 0.68 1.0 – 1.5 41.13 -1177.41 856 2081 25358 22 1203

373 0.72 1.0 – 1.5 40.37 -1161.27 840 2081 25006 29 1212

374 0.78 1.0 – 1.5 41.09 -1148.01 855 2081 24745 14 1212

375 0.83 1.0 – 1.5 41.52 -1138.4 864 2081 24554 13 1204

376 0.55 1.5 – 2.0 26.73 -835.67 27593 103232 890271 783 74856

377 0.58 1.5 – 2.0 28.88 -831.94 29818 103232 888647 828 72586

378 0.63 1.5 – 2.0 31.58 -823.95 32603 103232 883180 897 69732

379 0.63 1.5 – 2.0 31.58 -823.95 32603 103232 883180 897 69732

380 0.65 1.5 – 2.0 32.31 -820.7 33357 103232 880581 946 68929

381 0.68 1.5 – 2.0 33.18 -815.27 34249 103232 875867 931 68052

382 0.72 1.5 – 2.0 34.27 -808.56 35377 103232 870070 1007 66848

383 0.78 1.5 – 2.0 35.44 -803.26 36590 103232 865816 981 65661

384 0.83 1.5 – 2.0 35.82 -800.97 36979 103232 863841 1030 65223

385 0.55 2.0 – 2.5 26.12 -818.42 991 3794 32042 33 2770

386 0.58 2.0 – 2.5 28.44 -811.57 1079 3794 31870 44 2671

387 0.63 2.0 – 2.5 31.08 -799.34 1179 3794 31506 45 2570

388 0.63 2.0 – 2.5 31.08 -799.34 1179 3794 31506 45 2570

389 0.65 2.0 – 2.5 32.16 -796.6 1220 3794 31443 49 2525

390 0.68 2.0 – 2.5 33.1 -792.15 1256 3794 31310 59 2479

391 0.72 2.0 – 2.5 34.08 -784.92 1293 3794 31073 57 2444

392 0.78 2.0 – 2.5 35.24 -772.61 1337 3794 30650 62 2395

393 0.83 2.0 – 2.5 35.13 -775.51 1333 3794 30756 65 2396

394 0.55 2.5 – 3.0 22.83 -722.51 2687 11770 87727 70 9013

395 0.58 2.5 – 3.0 25.06 -714.6 2950 11770 87059 91 8729

396 0.63 2.5 – 3.0 28.37 -711.1 3339 11770 87035 76 8355

397 0.63 2.5 – 3.0 28.37 -711.1 3339 11770 87035 76 8355

398 0.65 2.5 – 3.0 29.36 -707.73 3456 11770 86756 84 8230

399 0.68 2.5 – 3.0 31.37 -703.47 3692 11770 86490 106 7972

400 0.72 2.5 – 3.0 32.99 -699.96 3883 11770 86268 115 7772

401 0.78 2.5 – 3.0 35.19 -697.35 4142 11770 86220 124 7504

402 0.83 2.5 – 3.0 35.74 -695.96 4207 11770 86122 127 7436

403 – 411 All 3.0 – 3.5 N/A N/A N/A N/A N/A N/A N/A

412 – 420 All 3.5 – 4.0 N/A N/A N/A N/A N/A N/A N/A

421 – 429 All 4.0 – 4.5 N/A N/A N/A N/A N/A N/A N/A

430 – 438 All 4.5 – 5.0 N/A N/A N/A N/A N/A N/A N/A

439 – 447 All 5.0 – 5.5 N/A N/A N/A N/A N/A N/A N/A

448 – 456 All 5.5 – 6.0 N/A N/A N/A N/A N/A N/A N/A

457 – 465 All 6.0 – 6.5 N/A N/A N/A N/A N/A N/A N/A

466 – 474 All 6.5 – 7.0 N/A N/A N/A N/A N/A N/A N/A

475 – 483 All 7.0 – 7.5 N/A N/A N/A N/A N/A N/A N/A

484 – 492 All 7.5 – 8.0 N/A N/A N/A N/A N/A N/A N/A

493 – 501 All 8.0 – 8.5 N/A N/A N/A N/A N/A N/A N/A

Table 26: The results for the third set of experiments we performed for Phase 5. We explored 9 values for VTLN Warp Factor and 17 values for Age At Recording for the French-environment subset of the entire Talkbank dataset, running Model 28 on all scenarios that were applicable.

Experiment   VTLN Factor   Age At Recording   Correctness   Accuracy   H   N   I   D   S

502 0.55 0.0 – 0.5 21.42 -93.00 893 4169 4770 472 2804

503 0.58 0.0 – 0.5 23.48 -95.73 979 4169 4970 435 2755

504 0.63 0.0 – 0.5 25.57 -92.42 1066 4169 4919 414 2689

505 0.63 0.0 – 0.5 25.57 -92.42 1066 4169 4919 414 2689

506 0.65 0.0 – 0.5 25.81 -92.18 1076 4169 4919 376 2717

507 0.68 0.0 – 0.5 26.46 -93.79 1103 4169 5013 278 2788

508 0.72 0.0 – 0.5 26.67 -92.01 1112 4169 4948 351 2706

509 0.78 0.0 – 0.5 25.07 -83.33 1045 4169 4519 432 2692

510 0.83 0.0 – 0.5 24.01 -83.33 1001 4169 4475 414 2754

511 – 519 All 0.5 – 1.0 N/A N/A N/A N/A N/A N/A N/A

520 – 528 All 1.0 – 1.5 N/A N/A N/A N/A N/A N/A N/A

529 0.55 1.5 – 2.0 18.79 -69.96 2973 15823 14043 660 12190

530 0.58 1.5 – 2.0 22.51 -74.59 3562 15823 15365 675 11586

531 0.63 1.5 – 2.0 26.69 -75.47 4223 15823 16164 669 10931

532 0.63 1.5 – 2.0 26.69 -75.47 4223 15823 16164 669 10931

533 0.65 1.5 – 2.0 27.55 -74.94 4359 15823 16217 689 10775

534 0.68 1.5 – 2.0 28.69 -74.08 4540 15823 16262 726 10557

535 0.72 1.5 – 2.0 29.63 -73.82 4688 15823 16368 694 10441

536 0.78 1.5 – 2.0 30.47 -74.69 4821 15823 16639 754 10248

537 0.83 1.5 – 2.0 30.36 -75.28 4804 15823 16716 716 10303

538 – 546 All 2.0 – 2.5 N/A N/A N/A N/A N/A N/A N/A

547 0.55 2.5 – 3.0 15.12 -63.65 2864 18942 14920 1102 14976

548 0.58 2.5 – 3.0 18.5 -65.64 3504 18942 15938 1041 14397

549 0.63 2.5 – 3.0 23.25 -63.46 4404 18942 16425 1063 13475

550 0.63 2.5 – 3.0 23.25 -63.46 4404 18942 16425 1063 13475

551 0.65 2.5 – 3.0 25.02 -62.25 4739 18942 16530 1076 13127

552 0.68 2.5 – 3.0 27.1 -60.74 5133 18942 16639 1082 12727

553 0.72 2.5 – 3.0 29.28 -59.4 5546 18942 16797 1029 12367

554 0.78 2.5 – 3.0 30.3 -57.36 5740 18942 16605 999 12203

555 0.83 2.5 – 3.0 30.4 -57.93 5759 18942 16732 1034 12149

556 0.55 3.0 – 3.5 12.02 -53.37 139 1156 756 66 951

557 0.58 3.0 – 3.5 15.57 -58.04 180 1156 851 47 929

558 0.63 3.0 – 3.5 20.16 -60.73 233 1156 935 52 871

559 0.63 3.0 – 3.5 20.16 -60.73 233 1156 935 52 871

560 0.65 3.0 – 3.5 22.4 -55.36 259 1156 899 48 849

561 0.68 3.0 – 3.5 25.61 -54.67 296 1156 928 51 809

562 0.72 3.0 – 3.5 28.63 -54.84 331 1156 965 52 773

563 0.78 3.0 – 3.5 30.71 -49.57 355 1156 928 62 739

564 0.83 3.0 – 3.5 30.62 -45.33 354 1156 878 57 745

565 0.55 3.5 – 4.0 14.61 -53.66 2829 19357 13215 1498 15030

566 0.58 3.5 – 4.0 18.21 -56.75 3525 19357 14511 1414 14418

567 0.63 3.5 – 4.0 22.72 -54.26 4397 19357 14901 1398 13562

568 0.63 3.5 – 4.0 22.72 -54.26 4397 19357 14901 1398 13562

569 0.65 3.5 – 4.0 24.09 -53.05 4664 19357 14933 1409 13284

570 0.68 3.5 – 4.0 25.64 -52.67 4963 19357 15158 1364 13030

571 0.72 3.5 – 4.0 27.35 -51.92 5295 19357 15346 1420 12642

572 0.78 3.5 – 4.0 27.86 -50.7 5392 19357 15206 1466 12499

573 0.83 3.5 – 4.0 27.08 -51.84 5241 19357 15276 1459 12657

574 0.55 4.0 – 4.5 17.43 -70.9 499 2863 2529 153 2211

575 0.58 4.0 – 4.5 21.34 -69.23 611 2863 2593 159 2093

576 0.63 4.0 – 4.5 27.31 -63.95 782 2863 2613 159 1922

577 0.63 4.0 – 4.5 27.31 -63.95 782 2863 2613 159 1922

578 0.65 4.0 – 4.5 28.08 -65.28 804 2863 2673 162 1897

579 0.68 4.0 – 4.5 30.28 -64.69 867 2863 2719 172 1824

580 0.72 4.0 – 4.5 30.98 -64.06 887 2863 2721 175 1801

581 0.78 4.0 – 4.5 29.93 -63.57 857 2863 2677 195 1811

582 0.83 4.0 – 4.5 28.36 -65.77 812 2863 2695 184 1867

583 0.55 4.5 – 5.0 12.92 -47.26 1399 10829 6517 900 8530

584 0.58 4.5 – 5.0 16.08 -53.17 1741 10829 7499 753 8335

585 0.63 4.5 – 5.0 21.23 -52.46 2299 10829 7980 730 7800

586 0.63 4.5 – 5.0 21.23 -52.46 2299 10829 7980 730 7800

587 0.65 4.5 – 5.0 22.63 -52.29 2451 10829 8113 760 7618

588 0.68 4.5 – 5.0 24.28 -51.71 2629 10829 8229 777 7423

589 0.72 4.5 – 5.0 25.39 -49.93 2749 10829 8156 756 7324

590 0.78 4.5 – 5.0 27.1 -48.81 2935 10829 8221 806 7088

591 0.83 4.5 – 5.0 27.11 -48.73 2936 10829 8213 826 7067

592 – 600 All 5.0 – 5.5 N/A N/A N/A N/A N/A N/A N/A

601 0.55 5.5 – 6.0 13.73 -58.81 187 1362 988 105 1070

602 0.58 5.5 – 6.0 16.89 -60.65 230 1362 1056 88 1044

603 0.63 5.5 – 6.0 20.85 -63.44 284 1362 1148 76 1002

604 0.63 5.5 – 6.0 20.85 -63.44 284 1362 1148 76 1002

605 0.65 5.5 – 6.0 22.03 -65.93 300 1362 1198 82 980

606 0.68 5.5 – 6.0 22.91 -65.57 312 1362 1205 72 978

607 0.72 5.5 – 6.0 25.92 -67.18 353 1362 1268 78 931

608 0.78 5.5 – 6.0 25.62 -67.47 349 1362 1268 82 931

609 0.83 5.5 – 6.0 24.08 -67.62 328 1362 1249 88 946

610 0.55 6.0 – 6.5 11.24 -46.93 75 667 388 51 541

611 0.58 6.0 – 6.5 16.94 -41.68 113 667 391 39 515

612 0.63 6.0 – 6.5 20.69 -38.68 138 667 396 44 485

613 0.63 6.0 – 6.5 20.69 -38.68 138 667 396 44 485

614 0.65 6.0 – 6.5 22.34 -38.08 149 667 403 54 464

615 0.68 6.0 – 6.5 25.19 -38.83 168 667 427 51 448

616 0.72 6.0 – 6.5 28.79 -39.13 192 667 453 45 430

617 0.78 6.0 – 6.5 29.99 -38.53 200 667 457 39 428

618 0.83 6.0 – 6.5 28.79 -40.48 192 667 462 35 440

619 0.55 6.5 – 7.0 12.42 -46.16 259 2086 1222 126 1701

620 0.58 6.5 – 7.0 16.2 -47.08 338 2086 1320 123 1625

621 0.63 6.5 – 7.0 20.57 -45.64 429 2086 1381 133 1524

622 0.63 6.5 – 7.0 20.57 -45.64 429 2086 1381 133 1524

623 0.65 6.5 – 7.0 23.25 -43.05 485 2086 1383 118 1483

624 0.68 6.5 – 7.0 25.84 -42.67 539 2086 1429 116 1431

625 0.72 6.5 – 7.0 28.04 -42.14 585 2086 1464 120 1381

626 0.78 6.5 – 7.0 30.58 -39.17 638 2086 1455 125 1323

627 0.83 6.5 – 7.0 30.2 -42.62 630 2086 1519 136 1320

628 – 636 All 7.0 – 7.5 N/A N/A N/A N/A N/A N/A N/A

637 0.55 7.5 – 8.0 14.57 -54.36 396 2717 1873 169 2152

638 0.58 7.5 – 8.0 18.15 -52.48 493 2717 1919 177 2047

639 0.63 7.5 – 8.0 23.22 -48.14 631 2717 1939 171 1915

640 0.63 7.5 – 8.0 23.22 -48.14 631 2717 1939 171 1915

641 0.65 7.5 – 8.0 25.32 -45.56 688 2717 1926 171 1858

642 0.68 7.5 – 8.0 28.63 -43.8 778 2717 1968 182 1757

643 0.72 7.5 – 8.0 30.36 -44.09 825 2717 2023 162 1730

644 0.78 7.5 – 8.0 31.54 -41.37 857 2717 1981 166 1694

645 0.83 7.5 – 8.0 33.05 -40.96 898 2717 2011 152 1667

646 – 654 All 8.0 – 8.5 N/A N/A N/A N/A N/A N/A N/A

Table 27: The results for the fourth set of experiments we performed for Phase 5. We explored 9 values for VTLN Warp Factor and 17 values for Age At Recording for the German-environment subset of the entire Talkbank dataset, running Model 28 on all scenarios that were applicable.

The material in Tables 25, 26 and 27 is difficult to digest, so we created visualizations of this data in Figures 4, 5 and 6 below to give us more insight into what the data is telling us.

Figure 4: Visualization of results for the second set of our Phase 5 experiments, in the English language environment, showing how correctness changes across values for VTLN Warp Factor and the lower bound of Age At Recording, defined here as "Last Age Milestone".

Figure 5: Visualization of results for the third set of our Phase 5 experiments, for the French environment, showing how correctness changes across values for VTLN Warp Factor and the lower bound of Age At Recording, defined here as "Last Age Milestone".

Figure 6: Visualization of results for the fourth set of our Phase 5 experiments, for the German environment, showing how correctness changes across values for VTLN Warp Factor and the lower bound of Age At Recording, defined here as "Last Age Milestone".

We can see that there are indeed optimal values of VTLN Factor for each age group, within each language-environment grouping.

10. Analysis of Results

In Phase 1, we attempted to use phonetic transcription data from our Talkbank dataset that was not time-aligned at the phone level to create a phone-recognizer, without using the HInit and HRest tools in HTK to properly initialize the model. The results were not impressive, with the correctness value failing to even cross the 15% mark. We should expect this kind of result because of the low quality of the training data. This kind of outcome motivates us to find a way to time-align the Talkbank dataset at the phone level. This could be done manually, by trained human transcribers, which would involve a lot of funding, coordination and management. It could also be done using a forced aligner powered by a model that is more powerful than Model 28. This is a task that we decided was out of scope for our project; we discuss it in the future research section of this thesis.

In Phase 2, we attempted to replicate the results of Lee & Hon 1989, creating a TIMIT-trained phone-recognizer that could perform on par with their model, using an HMM-only approach. We believe we came very close to accomplishing this, without the use of tied-state phones and with two extra phones in our phone set. The model that helped us accomplish this, Model 28, became the key tool for the next phases of our project.

We also explored the recognizability of our 41-phone TIMIT-oriented phoneset against the TIMIT testing data. We discovered that a few phones, such as [ʊ], [ɛ], [ə], [ɪ] and [ʒ], have less than 60% correctness. This gives us a clue that extra work will be required, perhaps through hybrid methods, to improve the recognizability of these phones. Many of these phones seem to have been confused with other phones that share many articulation properties, which gives us a clue about what kind of disambiguation effort needs to happen, perhaps as a correction mechanism in a second pass. We discuss this further in the next section on future research.

In Phase 3, we explored how Model 28 performed on the Talkbank dataset. We immediately see a significant improvement in performance compared to what we saw in Phase 1, crossing the 30% mark for correctness. However, when we explored the performance on a per-language-environment basis, we discovered that the utterances from English-environment children had a higher correctness value than utterances from the French-environment and German-environment children. This showed us that a significant English bias is active, enough to account for a noticeable difference. This makes sense intuitively because the TIMIT corpus is in the English language; the phones and the phonology of its utterances are English-aligned. Children align quite early with the phonology of their language environment (Paul & Norbury, 2012), and perhaps the numbers we are seeing here are a reflection of this phenomenon.

Another interesting result we noticed in our Phase 3 work is that the correctness and accuracy values for the Talkbank testing dataset were very similar to those for the entire Talkbank dataset. This similarity held even when we clustered the data by language environment: for each language environment, the subset of the data belonging to the testing dataset had similar correctness and accuracy to the entire dataset for that language environment. This result gives us confidence in how representative the testing dataset we created really is. The test set truly does contain data that represents the entire dataset well, while being only a quarter of its size, which makes it a good choice for faster experiments in the future.

We also noticed in these experiments the much higher substitution and insertion rates for the English and French data, which were not downsampled to a 16k rate the way the German data was. This probably explains why the German data has less substitution and insertion happening with a TIMIT-trained phone-recognizer, which was trained on TIMIT data that was also sampled at 16k. We tried to make a fairer comparison by redoing some of our experiments with downsampled English and French data. The results of these revised experiments were unexpected and puzzling, with insertion errors remaining high and correctness getting a boost as well. We suspect that something about the downsampling process removed confounding factors that were affecting some predictions. We also suspect that some of the insertion errors might be a result of microphone quality differences, since many of the English- and French-environment recordings are older than the German ones by several years.

Another peculiar result is how adding more derivatives to the MFCC configuration leads to a higher insertion rate. We believe this happens because the higher sample rates are able to capture more nuanced changes in the higher-order derivatives, convincing a phone-recognizer trained on lower-sample-rate data that a brand-new phone must be identified when this need not actually be the case.

In Phase 4, we explored how Model 28 performed on the Torgo dataset, which contains speech utterances from patients who have dysarthria as well as from control subjects. The data involving the control subjects is not directly relevant to us, but we did see a correctness of 48.28% for phone recognition on the Torgo patient subset with Model 28. This is significantly lower than what we had for the TIMIT testing dataset, but it is significantly higher than the best English-environment result we got from Phase 3. The acoustic impact of dysarthria is probably the influencing factor here, leading to variation in the acoustic productions of the phones and impacting the results. If more data from more patients were available, we could create a Torgo-trained phone-recognizer to revisit this experiment, giving us more information on creating disability-aware phone recognition systems. Since the Torgo patients were all adults, VTLN did not need to be carried out for these experiments. However, the Phase 3 results are from children with shorter vocal tracts, and this discrepancy will impact the Phase 3 results negatively. A fairer comparison between the Torgo patients and the Talkbank children would use the results from Phase 5, which take VTLN into consideration.

In Phase 5, we explored optimal values for the VTLN Warp Factor that is applied to the Talkbank data to prepare it for phone recognition with Model 28. We contextualized this exploration with respect to the Age At Recording experimental variable and the Language Environment experimental variable. This created multipliers that necessitated a high experiment count, which we achieved with the use of carefully constructed automation scripts. We ended up considering 612 scenarios (17 age milestones × 9 warp factors × 4 language environment arrangements [all, eng, fra, deu]). Some of these scenarios had zero children, and therefore no results. Figure 3 shows us that the English-environment children and the French-environment children have a primary cluster around the 1.5 years age milestone, with secondary clusters in nearby milestones as well. The German-environment children are more spread out across the age milestones, with most of their clusters being smaller in proportion and occurring between the 1.5 years and 4.5 years age milestones.

We know from Azizi et al., 2012 that a warp factor of 0.83 is a good value to use for HTK-based phone recognition for children who are 8 years of age. We knew from the HTK Book (Young et al., 2006) that lower VTLN Warp Factors denote shorter vocal tracts, and we know intuitively from phonetics first principles that younger children have shorter vocal tracts. We also know that VTLN Warp Factors that are too low can lead to an increase in errors. Therefore, we expect to see some kind of parabolic curve when we plot correctness against VTLN Warp Factor for each language environment and age grouping. Figures 2, 4, 5 and 6 show us visually where the ideal VTLN Warp Factor for each grouping lies. We can see that for children around the age of 1.5, a Warp Factor of 0.72 or 0.78 (or something in between) will usually give the best results, and this remains the case for a few more years of age. But as the children approach age 8, higher Warp Factors around the 0.83 mark end up bringing the best results, which is in line with the findings of Azizi et al., 2012.

The best result we get in Phase 5 is in Experiment 238, where we used Model 28 to perform phone recognition on the English-environment Talkbank utterances of children whose Age At Recording was 2.0 to 2.5 years, using a VTLN Warp Factor of 0.72. Of the 1905 phones that needed to be recognized, 54.8% were correctly recognized. This result is comparable to the correctness value of 48.28% obtained in Phase 4, when we used Model 28 to perform phone recognition on the Torgo patients. This comparison gives us a picture of how TIMIT-biased Model 28 really is: it goes from a peak performance of 72.03% on its own TIMIT data, down to a peak performance of 54.8% on a subset of a subset of the Talkbank data, down to a peak performance of 48.28% on the Torgo patients data. This motivates us to explore speaker adaptation methods to ensure that Model 28 is not being overfitted to the voices in the TIMIT corpus. We did not explore speaker adaptation methods beyond VTLN in this project; we talk about this further in the further studies section below.

11. Threats to Validity and Opportunities for Further Study

The most obvious threat to validity is the fact that the Talkbank dataset is not time-aligned at the phonetic level, which means that we are not working with the best kind of data for machine learning of phone recognition. We considered using forced alignment to acquire time-aligned phone-level transcriptions, but this requires a reliable HMM model to begin with. The only one we had was Model 28, which would not produce phonetic transcription symbols for all 405 Talkbank phones, due to our folding mechanism. It would also try to match symbols that aren't in the Talkbank dataset, namely the five diphthongs as well as [ɾ] and [ɝ]. Moreover, the correctness and accuracy values of Model 28 did not seem high enough to attempt a forced-alignment procedure on the entire dataset. This activity would be better suited to a model with higher correctness and accuracy, such as those that use hybrid methods with neural networks or CRFs.
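For reference, and without having attempted it in this project, the kind of forced alignment discussed above could in principle be driven by HVite in alignment mode; the sketch below shows the general shape of such a call, with hypothetical file names, and it would still require an acoustic model considerably stronger than Model 28 for the alignments to be trustworthy.

    import subprocess

    # Sketch only: align existing (un-timed) phone transcriptions against the audio so that
    # the output MLF contains time boundaries for each phone.
    subprocess.run([
        "HVite", "-a", "-m",                 # alignment mode; include model/phone boundaries in the output
        "-C", "mfcc_e_d_a.conf",             # feature configuration (hypothetical file name)
        "-H", "model/hmmdefs",               # acoustic model definitions (hypothetical)
        "-I", "talkbank_ref.mlf",            # existing phone-level transcriptions without timing
        "-S", "talkbank_all.scp",            # list of feature files to align
        "-i", "talkbank_aligned.mlf",        # output: time-aligned labels
        "talkbank_dict", "talkbank_phones",  # pronunciation dictionary and phone list (hypothetical)
    ], check=True)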

Another consequence of the Talkbank data not being time-aligned at the phonetic level is that any recognizer attempting to make predictions is likely to over-insert phones into the transcriptions where they don't belong, perhaps even repeating a correct recognition multiple times. One way to remedy this is to create a phone-segmentation system that identifies explicit time boundaries for each phone. This is a possible future task for enriching this dataset and making a more robust phone-recognizer that attempts phone segmentation before the actual phone recognition.

Another threat to validity lies in the granularity and range of the VTLN Warp Factors we used in our study. With more experiments using more fine-grained values, we would probably discover more about how to set optimal VTLN Warp Factor values by age.

For Phase 5, we should have made time to re-do all the experiments with the speech data from the Davis and KernFrench corpora downsampled to 16k, in order to make a fairer comparison with the Stuttgart and TAKI corpora, which were already downsampled to 16k when we acquired them. We ran out of time to include these experiments.

The numbers of children in the English, French and German subsets of the Talkbank dataset were 17, 4 and 8 respectively. These are not large enough samples to make statistically meaningful observations involving vocal tract length. Each child may have followed an individual growth pathway, leading to different vocal tract length milestones being reached at different times. It isn't wise to make decisions about an ideal VTLN Warp Factor by age when so much variability exists in such a small dataset.

We never created a Torgo-trained phone-recognizer. This was because we didn’t believe we had enough data from enough patients to make this viable. However, for future research, we should probably create this Torgo-trained phone-recognizer anyway and see how well it performs on recognizing phones produced by the Torgo patients, as well as other patients from other studies that may have speech and transcription data that is publicly available.

From a phone-recognizer commercialization point of view, the accuracy numbers for Model 28 are still not high enough for phone recognition with this model to be trusted as a basis for clinical decisions. We limited ourselves to creating models that were HMM-only, without the use of hybrid methods, tied-state phones, MAP speaker adaptation or MLLR speaker adaptation. Seeing that some of the low-correctness "problematic" phones in Experiment 28 were confused with phones that have similar articulation features opens the door to researching a two-pass system: binary classifiers trained using Support Vector Machines (SVMs), or other well-known machine-learning algorithms for classification tasks, would kick in after the first HMM-based pass is complete, revisiting phone transcriptions involving the problematic phones and re-deciding each of them against the phone it is most often confused with. This would be expected to boost the correctness numbers for these problematic phones.
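As a minimal sketch of the proposed second pass, a binary classifier for one confusable pair could be trained with scikit-learn's SVC along the following lines; the feature vectors and labels below are random placeholders standing in for acoustic features taken from segments the first HMM pass labelled with either phone.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 39))       # placeholder 39-dimensional feature vectors (e.g. MFCC_E_D_A)
    y_train = rng.integers(0, 2, size=200)     # placeholder labels: 0 for one phone of the pair, 1 for the other

    # Binary SVM dedicated to a single confusable phone pair, e.g. [ɪ] versus [i].
    clf = SVC(kernel="rbf", C=1.0)
    clf.fit(X_train, y_train)

    # In the proposed two-pass system, segments that the HMM pass labelled with either
    # phone of the pair would be re-decided here.
    X_new = rng.normal(size=(5, 39))
    print(clf.predict(X_new))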

These advanced approaches are definitely avenues for further study, where better models than Model 28 can be prepared and brought in to explore how phone recognition can be carried out more reliably for the children in our Talkbank dataset and the dysarthria patients in the Torgo dataset.

We do not have information on the race, culture, socioeconomic background, family size or dialect environment of the children or patients in our dataset. This is important information to keep in mind when we think about which phones get attention and how rich the learning environment is for the speaker when it comes to absorbing language stimuli. It is important to gather this information so we can study how things change across these variables and be appropriately sensitive in our inferences and findings.

There is currently no publicly available dataset of non-English speech utterances that has been prepared to the same level of richness as the TIMIT corpus. If French and German editions of the TIMIT corpus existed and were publicly available to us, we could run experiments that avoid the impact of the English bias we discovered in Phase 3. Furthermore, we should not focus on only these three languages when it comes to making phone-recognizers. Further effort needs to be made to create similar speech corpora with phonetic transcriptions that are time-aligned at the phonetic level, for many languages, not just the languages associated with first-world countries. We also need to consider creating models that are specialized for specific multilingual environments, trained on data from more than one language environment. An exciting prospect would be choosing particular multilingual phone recognition models that are trained on data from the same monolingual and/or bilingual language environments that a child or patient has been exposed to. That level of customization and culturally-sensitive model selection could improve the resulting performance of the phone recognition task.

Knowing that the current state-of-the-art performance, as of December 2018, is still in the low-to-mid 80s in terms of correctness percentage, we feel that future milestones in phone recognition research can also open new research pathways from the work we have done here.

12. Conclusion

We conclude that the speech of pre-verbal children and the speech of patients with dysarthria are indeed quite recognizable and analyzable in a phone recognition context. The use of HMM-only phone-recognizers is not the most modern approach, and some level of hybridization with other machine-learning paradigms will be required to make a system that is commercially viable for the purposes required by speech-language pathologists. Further effort would be required to introduce multiple speaker adaptation methods to ensure optimal performance. We also believe that more datasets should be collected from non-English language environments to create more diverse models rooted in more languages. In the complex, globalized world we live in today, where people migrate to new language environments and become multilingual, even pre-verbal children and people with speech disorders end up being exposed to multilingual language environments. Having culturally-sensitive phone recognition systems customized and re-trained from multilingual corpora on a per-patient basis is an exciting direction in which the commercialization of this technology could lead to very valuable improvements in the services that speech-language pathologists provide to their patients.

References

Azizi, S., Towhidkhah, F., and Almasganj, F., "Study of VTLN method to recognize common speech

disorders in speech therapy of Persian children," 2012 19th Iranian Conference of Biomedical

Engineering (ICBME), Tehran, 2012, pp. 246-249. doi: 10.1109/ICBME.2012.6519690

Badino, L. (2016) “The ArtiPhon Task at Evalita 2016.” In: Proceedings of Third Italian Conference on

Computational Linguistics (CLiC-it 2016) Fifth Evaluation Campaign of Natural Language

Processing and Speech Tools for Italian 1749, 0-5

Bondy, A. S., & Frost, L. A. (1994). The Picture Exchange Communication System. Focus on Autistic

Behavior, 9(3), 1–19. https://doi.org/10.1177/108835769400900301

Cosi, P. (2016) Phone Recognition Experiments on ArtiPhon with KALDI. In: Proceedings of Third Italian

Conference on Computational Linguistics (CLiC-it 2016) Fifth Evaluation Campaign of Natural

Language Processing and Speech Tools for Italian 1749, 0-5

Davis, B. L. & MacNeilage, P. F. (1990). Acquisition of correct vowel production: A quantitative case

study. J Speech Lang Hear Res. 1990;33:16–27.

Davis, Barbara L. & Peter F. MacNeilage (1995). The articulatory basis of babbling. Journal of Speech and

Hearing Research, 38, 1199-1211.

Davis, Barbara L., Peter F. MacNeilage & Christine L. Matyear (2002). Acquisition of serial complexity in

speech production: A Comparison of Phonetic and Phonological Approaches to First Word

Production. Phonetica, 59, 75-107

Finch, E.; Rumbach, A. F.; Park, S. (2018) Speech pathology management of non-progressive dysarthria:

a systematic review of the literature, Disability and Rehabilitation, 2018 Oct 4:1-11 DOI:

10.1080/09638288.2018.1497714

Fotuhi, M. & Yadegari, F. & Teymouri, R. (2016). Vowels Development in Babbling of typically

developing 6-to-12-month old Persian-learning Infants. Logopedics, phoniatrics, vocology. 42. 1-

8. 10.1080/14015439.2016.1221446.

Garofolo, J. S.; Lamel, L. F.; Fisher, W. M.; Fiscus, J. G.; Pallett, D. S.; Dahlgren, N. L., Zue, V. (1993)

TIMIT Acoustic-Phonetic Continuous Speech Corpus LDC93S1. Web Download. Philadelphia:

Linguistic Data Consortium.

Goldsmith, J.A. (1990) Autosegmental and metrical phonology. Blackwell, Oxford.

Hoge, H.; Tropf, H. S.; Winski, R.; van den Heuvel, H.; Haeb-Umbach, R.; Choukri, K. (1997) European

speech databases for telephone applications, 1997 IEEE International Conference on Acoustics,

Speech, and Signal Processing, Munich, pp. 1771-1774 vol.3.

Hosom, J.-P., Kain, A.B., Mishra, T., van Santen, J.P.H., Fried-Oken, M., Staehely, J., (2003).

Intelligibility of modifications to dysarthric speech. In: Proceedings of the IEEE International

Conference on Acoustics, Speech, and Signal Processing (ICASSP ’03), vol. 1, pp. 924–927.

Kain, A.B., Hosom, J.-P., Niu, X., van Santen, J.P., Fried-Oken, M., Staehely, J., 2007 September.

Improving the intelligibility of dysarthric speech. Speech Communication 49 (9), 743–759.

Kent, R.D. (1984), Psychobiology of speech development: co-emergence of language and a movement

system. Am. J. Physiol. 246: 888–894.

Kent, R. D., Weismer, G., Kent, J. F., & Rosenbek, J. C. (1989). Toward phonetic intelligibility testing

in dysarthria. Journal of Speech and Hearing Disorders, 54, 482-499.

Kent, R. D. (1990). The acoustic and physiologic characteristics of neurologically impaired speech

movements. In W. J. Hardcastle & A. Marchal (Eds.), Speech production and speech modeling.

Dordrecht, The Netherlands: Kluwer Academic Publishers.

Kent, R. D., & Read, W. C. (1992). The acoustic analysis of speech. San Diego, CA: Singular Publishing

Group.

Kent, R. D., Weismer, G., Kent, J. F., Vorperian, H. K., & Duffy, J. R. (1999). Acoustic studies of

dysarthric speech: Methods, progress and potential. Journal of Communication Disorders, 32, 141-

186.

Kern, Sophie, Barbara L. Davis, & Inge Zink (2009). From babbling to first words in four languages:

Common trends, cross language and individual differences. In Francesco d’Errico & Jean-Marie

Hombert (eds) Becoming eloquent: Advances in the Emergence of language, human cognition and

modern culture. John Benjamins Publishing Company.

Kern, Sophie & Barbara L. Davis (2009). Emergent complexity in early vocal acquisition: Cross-linguistic

comparisons of canonical babbling. In F. Pellegrino, E. Marsico, I. Chitoran & C. Coupé (eds),

Approaches to phonological complexity. Berlin, Mouton de Gruyter.

Köhler, J. (2001) Multilingual phone models for vocabulary-independent speech recognition tasks.

Speech Communication, Vol. 35, Issues 1-2, Pages 21-30

Lamel, L. F.; Gauvain, J. (1993) Cross-lingual experiments with phone recognition. 1993 IEEE

International Conference on Acoustics, Speech, and Signal Processing, Minneapolis, MN, USA,

pp. 507-510 vol.2.

Lamel, L. F.; Gauvain, J.; Eskenazi, M. (1991). BREF, a Large Vocabulary Spoken Corpus for French.

EUROSPEECH 1991

Lee, K. & Hon, H. (1989). Speaker-independent phone recognition using hidden Markov models. IEEE

Transactions on Acoustics, Speech, and Signal Processing, vol.37 (11), November 1989, pp. 1642-

1648, ISSN: 0096-3518.

Lindblom, B., Krull, D., Stark, J. (1993) Phonetic systems and phonological development; in Boysson-

Bardies, de Schonen, Jusczyk, MacNeilage, Morton, Developmental neurocognition: speech and

face processing in the first year of life. pp. 399–409 (Kluwer, Dordrecht 1993).

Lintfert, Britta. (2009). Phonetic and Phonological Development of Stress in German. Universität Stuttgart

Ph.D. Dissertation.

Lopes, C., & Perdigão, F. (2011). Phoneme Recognition on the TIMIT Database, Speech Technologies, Ivo

Ipsic, IntechOpen, DOI: 10.5772/17600. Available from:

https://www.intechopen.com/books/speech-technologies/phoneme-recognition-on-the-timit-database

Lohrenz, Timo; Li, Wei & Fingscheidt, Tim. (2018). A New TIMIT Benchmark for Context-Independent

Phone Recognition Using Turbo Fusion. Workshop On Spoken Language Technology, Athens,

2018. DOI:10.13140/RG.2.2.36766.18243.

Ma, J.; Coltekin, C.; Hinrichs, E. (2016) Learning Phone Embeddings for Word Segmentation of Child-

Directed Speech. Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language

Learning, Berlin, Germany, pages 53–63.

MacWhinney, B. (2000). The CHILDES Project: Tools for analyzing talk. Third Edition. Mahwah, NJ:

Lawrence Erlbaum Associates.

Mohamed, A.; Sainath, T. N.; Dahl, G.; Ramabhadran B.; Hinton, G. E.; Picheny, M. A. (2011) Deep Belief

Networks Using Discriminative Features for Phone Recognition. 2011 IEEE International

Conference on Acoustics, Speech and Signal Processing (ICASSP). 5060-063.

Morris, J. & Fosler-Lussier, E. (2008). Conditional Random Fields for Integrating Local Discriminative

Classifiers. IEEE Transactions on Audio, Speech, and Language Processing, 16:3, March 2008, pp

617-628, ISSN 1558-7916.

Oller D. K. (1980) The emergence of the sounds of speech in infancy. Child Phonol.;1:93–112.

Oller D. K.; Eilers R. E.; Neal A.R.; Schwartz H.K. (1999) Precursors to speech in infancy: the prediction

of speech and language disorders. J Commun Disord. 1999;32:223–245.

Paul, D. B., Baker, J. M. (1992) The design for the Wall Street Journal-based CSR corpus. HLT '91

Proceedings of the workshop on Speech and Natural Language, Pages 357-362

Paul, R. & Norbury, C. F., (2012) Language Disorders from Infancy through Adolescence: Listening, Speaking,

Reading, Writing, and Communicating. Fourth Edition. Elsevier Mosby.

Pollak, P., and Behunek, M., (2011) "Accuracy of MP3 speech recognition under real-word conditions:

Experimental study," Proceedings of the International Conference on Signal Processing and

Multimedia Applications, Seville 2011, pp. 1-6.

Potamianos, A.; Narayanan, S.; Lee S. (1997), Automatic speech recognition for children, in Proc.

Eurospeech, vol. 5, Rhodes, Greece, Sept. 1997, pp. 2371–2374.

Potamianos, A. & Narayanan, S. (2003), Robust recognition of children's speech, in IEEE Transactions on

Speech and Audio Processing, vol. 11, no. 6, pp. 603-616, Nov. 2003.

Povey, D.; Ghoshal, A.; Boulianne, G.; Burget, L; Glembek, O.; Goel, N.; Hannemann, M.; Motlíček, P.;

Qian, Y.; Schwarz, P.; Silovský, J.; Stemmer, G.; Veselý, K. (2011). The Kaldi speech recognition

toolkit. IEEE 2011 Workshop on Automatic Speech Recognition and Understanding.

Rose, R. & Momayyez, P. (2007). Integration Of Multiple Feature Sets For Reducing Ambiguity In

ASR. Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing,

2007 (ICASSP), Hawaii, April 2007.

Rose, Y., & MacWhinney, B. (2014). The PhonBank Project: Data and software-assisted methods for the

study of phonology and phonological development. In J. Durand, U. Gut & G. Kristoffersen

(Eds.), The Oxford handbook of corpus phonology (pp. 380-401). Oxford, UK: Oxford University

Press.

Rodman, R., Moody, T., Price, J., (1985). Speech recognizer performance with dysarthric speakers: a

comparison of two training procedures. Speech Technology 1, 65–71

Roy, N.; Leeper, H.A.; Blomgren, M.; Cameron, R. M. (2001) A description of phonetic, acoustic, and

physiological changes associated with improved intelligibility in a speaker with spastic

dysarthria. American Journal of Speech-Language Pathology, 10(3), 274–290.

Rudzicz, F., Namasivayam, A.K. & Wolff, T. (2012) The TORGO database of acoustic and articulatory

speech from speakers with dysarthria. Lang Resources & Evaluation 46: 523. https://doi.org/10.1007/s10579-011-9145-0.

Rudzicz, F. (2013) Adjusting dysarthric speech signals to be more intelligible, Computer Speech &

Language, Volume 27, Issue 6, Pages 1163-1177.

Russell, M.; Brown, B.; Skilling, A.; Series, R., Wallace, J.; Bonham, B.; Barker, P. (1996), Applications of

automatic speech recognition to speech and language development in young children, in Proc.

ICSLP, Philadelphia, PA, Oct. 1996

Saffran, J. R., Aslin, R. N., Newport, E. L. (1996) Statistical Learning by 8-Month-Old Infants. Science.

274(5294):1926-8.

Scanlon, P.; Ellis, D. & Reilly, R. (2007). Using Broad Phonetic Group Experts for Improved Speech

Recognition. IEEE Transactions on Audio, Speech and Language Processing, vol.15 (3) , pp 803-

812, March 2007, ISSN 1558-7916.

Selby, J. C.; Robb, M. P.; Gilbert, H. R. (2000) Normal vowel articulations between 15 and 36 months of

age, Clinical Linguistics & Phonetics, 14:4, 255-265

Sharma, H. V.; Hasegawa-Johnson, M. (2013) Acoustic model adaptation using in-domain background

models for dysarthric speech recognition, Computer Speech & Language, Volume 27, Issue 6,

Pages 1147-1162,

Siniscalchi, S. M.; Schwarz, P. & Lee, C.-H.; (2007). High-accuracy phone recognition by combining high-

performance lattice generation and knowledge based rescoring. Proceedings of IEEE

International Conference on Acoustics, Speech and Signal Processing, 2007 (ICASSP), Hawaii,

April 2007.

Taussig, K., & Bernstein, J. (1994) MACROPHONE: An American English Telephone Speech Corpus. HLT.

Tolba, H.; Torgoman, A. S. E., (2009) Towards the improvement of automatic recognition of dysarthric

speech. In: International Conference on Computer Science and Information Technology, IEEE

Computer Society, Los Alamitos, CA, USA, pp. 277–281.

Teixeira, E. R. & Davis, B. L. (2002) Early Sound Patterns in the Speech of Two Brazilian Portuguese

Speakers. Language and Speech, 45 (2) 179-204

Young, S. J. (1992). The general use of tying in phone-based HMM speech recognizers. Proceedings of

IEEE International Conference on Acoustics, Speech and Signal Processing, 1992 (ICASSP), USA.

Vihman, M.M. (1996) Phonological development: the origins of language in the infant. Blackwell, Oxford.

Wilpon, J. G. & Jacobsen, C. N. (1996) A study of speech recognition for children and the elderly, IEEE

International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings, Atlanta,

GA, USA, 1996, pp. 349-352 vol. 1.

Young, S. J., Kershaw, D., Odell, J., Ollason, D., Valtchev, V., Woodland, P. (2006). The HTK Book

Version 3.4. Cambridge, UK: Cambridge University Engineering Department.

Zlatić, L.; MacNeilage, P. F.; Matyear, C. L.; Davis, B. L. (1997) Babbling of twins in a bilingual

environment. Applied Psycholinguistics, Vol. 18, Issue 4, pp. 453-469

Appendices

Appendix A: MFCC Configuration

Below are the HTK configuration settings we used when extracting MFCC features from each of the speech-utterance .wav files in our study.

SOURCEFORMAT =

TARGETKIND =

TARGETRATE = 100000.0

SAVECOMPRESSED = F

SAVEWITHCRC = F

WINDOWSIZE = 250000.0

USEHAMMING = T

PREEMCOEF = 0.97

NUMCHANS = 26

CEPLIFTER = 22

NUMCEPS = 12

LOFREQ = 0

HIFREQ = 8000.0

WARPLCUTOFF = 300.0

WARPUCUTOFF = 5500.0

WARPFREQ =

When it came to the SOURCEFORMAT value, we set it to “WAV” for all Talkbank and Torgo data and we set it to “NIST” for all TIMIT data.

When it came to the TARGETKIND and WARPFREQ values, these were our main experimentation variables; please see the experiment descriptions for the values used in each experiment. A filled-in example configuration is sketched below.
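For illustration, a fully filled-in configuration for a hypothetical TIMIT run using plain 12-coefficient MFCCs and no vocal-tract length warping might set the three open values as follows; these particular TARGETKIND and WARPFREQ values are examples only and do not correspond to any specific experiment reported above.

SOURCEFORMAT = NIST
TARGETKIND = MFCC
WARPFREQ = 1.0

When VTLN is in use, WARPFREQ is instead set to a per-speaker warping factor (typically between 0.8 and 1.2), and TARGETKIND can carry qualifiers such as _E, _D, _A or _0 to add energy, delta, acceleration or C0 coefficients.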

Appendix B: HTK Commands Used

Here are examples of the various HTK commands that we used to accomplish our goals; angle-bracketed items stand in for the relevant configuration, script, label, model, dictionary, network and output files in each step. This information is provided to allow our results to be reproduced, and it may also be useful to readers learning HTK.

HCopy

HCopy -T 1 -C <config> -S <copy.scp>

HCompV

HCompV -T 2 -f 0.01 -m -S <train.scp> -L <labeldir> -M <hmmdir> -o <protoname> <proto>

HVite

HVite -A -D -T 2 -o SWT -b sil -C <config> -H <hmmdefs> -t 250.0 150.0 1000.0 -y rec -i <recout.mlf> -w <wdnet> -S <test.scp> <dict> <phonelist>

HInit

HInit -i 100 -I <labels.mlf> -S <train.scp> -H <macros> -C <config> -T 1 -M <hmmdir> -l <phone> <proto>

HRest

HRest -i 100 -l <phone> -H <macros> -I <labels.mlf> -M <hmmdir> -S <train.scp> <hmmfile>

HERest

HERest -T 2 -C <config> -I <labels.mlf> -t 250.0 150.0 1000.0 -S <train.scp> -H <macros> -H <hmmdefs> -M <hmmdir> <phonelist>

HResults

HResults -a UTTERANCE -b PHONE -p -L <labeldir> <phonelist> <recout.mlf> > <results.txt>
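To show how these commands fit together, the sketch below strings them into one training-and-recognition run as a Bash script. All file and directory names (config, copy.scp, train.scp, test.scp, labels.mlf, phonelist, dict, wdnet, recout.mlf, hmm0, hmm1) are hypothetical placeholders, and the flat-start route through HCompV is shown as one possible initialization; this is an outline of the general HTK workflow rather than the exact scripts used in our experiments.

#!/bin/bash
# 1. Convert each .wav file to MFCC features (copy.scp lists "source target" pairs).
HCopy -T 1 -C config -S copy.scp

# 2. Flat start: estimate a global mean and variance for the prototype HMM.
#    The resulting hmm0/proto is then cloned once per phone to form hmm0/hmmdefs;
#    alternatively, HInit and HRest can initialize each phone model individually.
HCompV -T 2 -f 0.01 -m -S train.scp -M hmm0 -o proto proto

# 3. Embedded re-estimation over the whole training set (run several passes).
HERest -T 2 -C config -I labels.mlf -t 250.0 150.0 1000.0 -S train.scp \
       -H hmm0/macros -H hmm0/hmmdefs -M hmm1 phonelist

# 4. Phone recognition on the test set with the compiled word network and dictionary.
HVite -A -D -T 2 -o SWT -b sil -C config -H hmm1/hmmdefs \
      -t 250.0 150.0 1000.0 -y rec -i recout.mlf -w wdnet -S test.scp dict phonelist

# 5. Score the recognizer output against the reference labels.
HResults -a UTTERANCE -b PHONE -p -L labels phonelist recout.mlf > results.txt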

Appendix C: HMM Prototype

The following is an example of the HMM prototype from which we started every experiment. It defines an HMM with 5 states, 3 of them emitting, connected in a left-to-right sequence, with each emitting state modelled by a single Gaussian over 12 MFCC features. All other prototypes used in our experiments were modified from this form: VecSize changes with the number of coefficients, and the mean and variance vectors change length accordingly, padded with dummy 0.0 and 1.0 values ready for training. For seven-state HMMs, the NumStates entry changes, two more State entries are provided and the TransP matrix becomes 7x7; an illustration of that transition matrix follows the prototype below.

~o <VecSize> 12 <MFCC>
<BeginHMM>
<NumStates> 5
<State> 2
<Mean> 12
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 12
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 3
<Mean> 12
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 12
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<State> 4
<Mean> 12
0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
<Variance> 12
1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
<TransP> 5
0.0 1.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0
<EndHMM>
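To illustrate the seven-state variant described above, one plausible 7x7 transition matrix extends the same left-to-right pattern to five emitting states; the 0.6/0.4 and 0.7/0.3 probabilities are simply carried over from the five-state prototype as starting values and are not necessarily the exact figures used in our seven-state experiments.

<TransP> 7
0.0 1.0 0.0 0.0 0.0 0.0 0.0
0.0 0.6 0.4 0.0 0.0 0.0 0.0
0.0 0.0 0.6 0.4 0.0 0.0 0.0
0.0 0.0 0.0 0.6 0.4 0.0 0.0
0.0 0.0 0.0 0.0 0.6 0.4 0.0
0.0 0.0 0.0 0.0 0.0 0.7 0.3
0.0 0.0 0.0 0.0 0.0 0.0 0.0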

Appendix D: HTK Grammars

Grammar for performing phone recognition in Pipeline A, with training on the Talkbank dataset.

$phone = a | b | c | d | e | f | g | h | i | j | k | l | m | n | o | p | r | s | t | u | v | w | x | y | z | æ | ç | ð | ø | ŋ | œ | ɐ | ɑ | ɔ | ə | ɚ | ɛ | ɣ | ɪ | ɸ | ɹ | ʀ | ʁ | ʃ | ʉ | ʊ | ʌ | ʏ | ʒ | ʔ | ʙ | ʤ | ʧ | β | θ | other; ( sil <$phone> sil )

Grammar for performing phone recognition in Pipeline B, with training on the TIMIT dataset.

$phone = b | d | g | p | t | k | ɾ | ʤ | ʧ | s | ʃ | z | ʒ | f | θ | v | ð | m | n | ŋ | l | ɹ | w | j | h | sil | i | ɪ | ɛ | e_ɪ | æ | ɑ | a_ʊ | a_ɪ | ɔ | ɔ_ɪ | o_ʊ | ʊ | u | ɝ | ə; ( <$phone> )
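A grammar in this form is not read directly by HVite; it is first compiled into an HTK word network with HParse, and the resulting network file is what HVite receives through its -w option. A minimal example, with hypothetical file names:

HParse grammarA.txt wdnetA
HParse grammarB.txt wdnetB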

Appendix E: Pronunciation dictionaries

Both pronunciation dictionaries are identity mappings: every entry is a line of the form "phone phone", mapping a phone label to itself.

For Talkbank data in Pipeline A, the dictionary contains one such entry for each of the following labels:

sil, other, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, r, s, t, u, v, w, x, y, z, æ, ç, ð, ø, ŋ, œ, ɐ, ɑ, ɔ, ə, ɚ, ɛ, ɣ, ɪ, ɸ, ɹ, ʀ, ʁ, ʃ, ʉ, ʊ, ʌ, ʏ, ʒ, ʔ, ʙ, ʤ, ʧ, β, θ

For TIMIT data in Pipeline B, the dictionary contains one such entry for each of the following labels:

b, d, g, p, t, k, ɾ, ʤ, ʧ, s, ʃ, z, ʒ, f, θ, v, ð, m, n, ŋ, l, ɹ, w, j, h, sil, i, ɪ, ɛ, e_ɪ, æ, ɑ, a_ʊ, a_ɪ, ɔ, ɔ_ɪ, o_ʊ, ʊ, u, ɝ, ə
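Because both dictionaries simply map every phone label to itself, each one can be regenerated from its phone list with a one-line shell command; phonelistA.txt and dictA below are hypothetical file names, with one label per line in the input file.

# Emit a "phone phone" dictionary entry for each label in the phone list.
awk '{ print $1, $1 }' phonelistA.txt > dictA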
