Lithuanian University of Health Sciences Medical Academy
LITHUANIAN UNIVERSITY OF HEALTH SCIENCES
MEDICAL ACADEMY

Evaldas Padervinskis

THE VALUE OF AUTOMATIC VOICE CATEGORIZATION SYSTEMS BASED ON ACOUSTIC VOICE PARAMETERS AND QUESTIONNAIRE DATA IN THE SCREENING OF VOICE DISORDERS

Doctoral Dissertation
Biomedical Sciences, Medicine (06B)

Kaunas, 2016

The dissertation was prepared at the Lithuanian University of Health Sciences, Medical Academy, Department of Otorhinolaryngology, during the period 2011–2015.

Scientific Supervisor
Prof. Dr. Habil. Virgilijus Ulozas (Lithuanian University of Health Sciences, Medical Academy, Biomedical Sciences, Medicine – 06B).

The dissertation is defended at the Medical Research Council of the Lithuanian University of Health Sciences, Medical Academy:

Chairman
Prof. Dr. Habil. Limas Kupcinskas (Lithuanian University of Health Sciences, Medical Academy, Biomedical Sciences, Medicine – 06B).

Members:
Prof. Dr. Habil. Daiva Rastenyte (Lithuanian University of Health Sciences, Medical Academy, Biomedical Sciences, Medicine – 06B);
Prof. Dr. Dalia Zaliuniene (Lithuanian University of Health Sciences, Medical Academy, Biomedical Sciences, Medicine – 06B);
Prof. Dr. Vaidotas Marozas (Kaunas University of Technology, Technological Sciences, Electrical and Electronics Engineering – 01T);
Prof. Dr. Habil. Kazimierz Niemczyk (Medical University of Warsaw, Biomedical Sciences, Medicine – 06B).

The dissertation will be defended at the open session of the Medical Research Council of the Lithuanian University of Health Sciences on June 16th, 2016, at 2 p.m. in auditorium 204 of the Faculty of Pharmacy of the Lithuanian University of Health Sciences.
Address: Sukileliu 13, LT-50009 Kaunas, Lithuania.
I dedicate this book to my parents, Ilona and Edmundas Padervinskis.
Thank you.

"Life is divided into three periods: what was, what is, and what will be. What we are doing now is brief, what we will do is uncertain, what we have done is certain."
Seneca

CONTENTS

ABBREVIATIONS
INTRODUCTION
1. THE AIM AND OBJECTIVES OF THE STUDY
2. ORIGINALITY OF THE STUDY
3. SCIENTIFIC LITERATURE REVIEW
3.1. Voice and computer
3.2. Acoustic voice analysis
3.3. Microphone and acoustic voice analysis
3.4. Questionnaires and voice analysis
3.5. Classifiers and voice analysis
3.6. Acoustic voice analysis and smartphones
4. METHODS
4.1. Ethics
4.2. Study design
4.3. Voice recordings
4.4. Acoustical analysis
4.5. Questionnaire data
4.6. Statistical evaluation, classifiers
5. RESULTS
5.1. Study No I (Analysis of oral and throat microphones using discriminant analysis) data
5.2. Study No II (Analysis of oral and throat microphones using Random Forest classifier) data
5.3. Study No III (Analysis of oral and smartphone microphones data using Random Forest classifier) data
5.4. Study No IV (Testing of VoiceTest software) data
6. DISCUSSION
7. CONCLUSION
REFERENCES
LIST OF PUBLICATIONS
PUBLICATIONS
SUMMARY IN LITHUANIAN (SANTRAUKA)
CURRICULUM VITAE
ACKNOWLEDGEMENTS (PADĖKA)

ABBREVIATIONS

RF – Random forest
SVM – Support vector machine
GERD – Gastro-oesophageal reflux disease
F0 – Fundamental frequency
SNR – Signal-to-noise ratio
HNR – Harmonic-to-noise ratio
NNE – Normalized noise energy
CCR – Correct classification rate
EER – Equal error rate
GFI – Glottal functional index
VQOL – Voice-disordered quality of life
VHI – Voice handicap index
GFI-LT – Glottal functional index, Lithuanian version
LVQ – Learning vector quantization
GMM – Gaussian mixture model
HMM – Hidden Markov model
LDA – Linear discriminant analysis
k-NN – k-nearest neighbours
MLP – Multi-layer perceptron
CC – Cepstral coefficient
BEC – Band energy coefficient
RASTA-PLP – Relative spectral transform perceptual linear prediction
LPC – Linear predictive coding
GNE – Glottal-to-noise excitation
VLS – Video laryngostroboscopy
SD – Standard deviation
CART – Classification and regression tree
t-SNE – t-distributed stochastic neighbour embedding algorithm
DET – Detection error trade-off curve
ROC – Receiver operating characteristic curve
AUC – Area under the curve
OOB – Out-of-bag data classification accuracy (%)

INTRODUCTION

Over the past 200 000 years, humans have used the lungs, larynx, tongue, and lips to produce and modify the highly intricate arrays of voice for verbal communication and emotional expression [1]. The vocal folds have evolved to be a key organ in the creation of the human voice. The vibrations of the vocal folds are the origin of the primary voice signal. The process of voice production is called phonation, and it is the preliminary stage of speech production [2]. So what is a normal, healthy voice? In 1956, Jahnson et al. suggested that a healthy voice is one of pleasant quality and colour that reflects the speaker's age and sex, has normal loudness, and allows adequate variation in loudness and tone [3].
Voice is the main and easiest instrument of communication between people, and it is one of the most challenging means of transmitting information from a person to a computer. When we speak, we convey direct, verbal information to other people; however, we also convey indirect, non-verbal information about ourselves, such as psychological and emotional status, personality, identity, and aesthetic orientation [4]. Every day the voice is influenced by internal and external factors. External factors include dust, dry or very humid air, temperature, background noise, and incorrect body posture when speaking, among others [5]. Internal factors include viral infections, GERD, laryngeal pathologies, neurological diseases, and hormone fluctuations [6–8]. If we made a search throughout the scientific