Artificial Speech for Intensive Care Unit (ICU) Patients and Laryngectomees
Total Page:16
File Type:pdf, Size:1020Kb
Artificial Speech for Intensive Care Unit (ICU) Patients and Laryngecton1ees by Marilyn Adeline Mei l-im B .E. (Hons 1) A thesis presented for the degree of Doctor of Philosophy in the Department of Electrical and Computer Engineering University of Canterbury Private Bag 4800, Christchurch, New Zealand 31 July 2005 Abstract A method and prototype device to provide artificial speech for intensive care unit (leU) patients and laryngectomees is presented. The method assists these patients to produce natural sounding speech by "mouthing the words". A review of the current communication techniques for these patients is presented. The limitations of these techniques suggests that there is a need for a device that produces natural sounding speech (pitch variation and glottal sound source that resembles the actual glottal pulse generated by the vibrating vocal folds) and a device that is user friendly. As vocal folds only vibrate during vowel production, only vowel sounds are considered. Since pitch variation plays a major role in the naturalness of a person's voice, a number of alternative (automatic) pitch control tecbniques were explored. A unique pitch control technique utilising the changes in jaw height when a person "mouth the words" is presented. The electroglottographic (EGG) signal is used as the glottal sound source signal for this research as the properties of the EGG signal offers a number of advantages compared with other glottal sound source measurement techniques. A new glottal source model known as the twin-bar model, based on EGG measurements from normal volunteers, is also introduced. This model changes the shape of the glottal pulse based on a single parameter: pitch. Perceptual testing of the simulated voice using the twin-bar glottal model and two other well-known models on volunteers showed that the twin-bar model produces more natural sounding voice than the other two models. A new artificial speech system combining the automatic pitch control technique Gaw height) and the glottal sound source (twin-bar model) was constructed. It also includes a number of extra functions that would further improves the speech produced with this system. Existing technology on a laptop (e.g. serial port communication, bluetooth transceivers and USB port) is utilised for the construction of the prototype, with the laptop as the signal processing unit. The prototype was tested on a normal subject. 17 JAN 2U06 Acknowledgements r am very grateful to my supervisor, Associate Professor Philip Bones and my associate supervi sor, Dr. Emily Lin (Department of Communications Disorders, University of Canterbury), for their guidance, encouragement, useful and critical comments, and the constant proof-reading through out the course of my studies. r would also like to thank my two other associate supervisors from Christchurch Hospital, Dr. Geoffrey Shaw (rCU specialist) and Mr. Richard Dove (biomedical en gineer from the Medical Physics and Bioengineering Department) for their support, enthusiasm and great ideas. A special thanks to Dr. Margaret Maclagan (Communications disorders department, University of Canterbury) and Mr. Darren Ayling (speech and language therapist, Christchurch Hospital) for their help and support with the basics of speech and language, at the beginning of my research. r am indebted to the technical staff, especially Mr. Philipp Hof, Mr. Dudley Berry and Mr. Dermot Sallis at Electrical and Computer Engineering Department who are always ready to lend a hand and for their practical suggestions. r acknowledge the financial support from the University of Canterbury, the Royal Society of New Zealand, the Royal Society of New Zealand (Canterbury Branch) and the Canterbury Medical Re search Foundation. r am especially grateful to my friends and colleagues for their encouragement and for just being there to listen, at times when my thesis seems to have stalled. r thank God for guiding me through the ups and downs of this project and for giving me the passion to do a project that r hope will help the less fortunate individuals who cannot speak. Last, but not least, r would like to thank my wonderful family, especially Christopher, for helping me make this project a reality. r thank them also for their unconditional love and support, for their suggestions and even proof-reading. I could not have done it without them. Contents 1 Introduction 1 1.1 Motivation. • • ~ • 5 • 1 1.2 Current practice in ICU 2 1.3 Project objectives 2 1.4 Thesis overview . 3 1.5 Journal articles and conference presentations. 3 2 Speech sounds and characteristics 5 2.1 Mechanism of human speech production . 5 2.2 Speech sounds and articulations 6 2.3 Source-filter model 9 2.4 Speech analysis . 11 2.5 Visualising Speech and Spectral Characteristics of Speech 12 2.6 Sound sources and characteristics 13 2.6.1 Voiced sound source .. 13 2.6.2 Unvoiced sound source. 15 2.6.3 Voice types . 16 2.6.4 Fundamental frequency range and pitch 16 2.6.5 Prosodic features, jitter and shimmer. l7 2.7 Vocal folds movement measurement techniques 18 3 Communication techniques for speech impaired individuals 21 I II Contents 3.1 Speech impainnent in tracheostomised and ventilator dependent patients. 22 3.2 Speech impainnent in laryngectomees ............... 23 3.3 Standard communication techniques for speech impaired individuals 23 3.3.1 Non-vocal treatment: unaided communication techniques. 23 3.3.2 Non-vocal treatment: aided communication techniques 24 3.3.3 Vocal treatment .................... 26 3.3.4 Alternative augmented c0111111unication (AAC) devices 32 3.4 Proposal for improved speech production for tracheostomised individuals and laryngectomees . 33 4 EGG analysis techniqnes 35 4.1 1breshold method . 36 4.2 LF-Model method. 37 4.3 Computerised Speech Lab (CSL) and Speech Station2 38 4.4 Modified EGG analysis technique 38 4.4.1 Signal segmentation 41 4.4.2 Drift removal . 41 4.4.3 Signal differentiation and average frequency calculation 42 4.4.4 Average wavefonn calculation 46 4.4.5 Parameters extraction ..... 48 5 Glottal Pulse Model and Vocal Tract Model 51 5.1 The analogy between natural and artificial speech 51 5.2 Existing glottal pulse models 51 5.2.1 Two-pole model 52 5.2.2 Rosenberg model 53 5.2.3 LF-model . 54 5.2.4 Physical models 57 5.3 The new glottal pulse model: the twin-bar model 60 Contents III 5.3.1 Description of the model 61 5.3.2 Analysis ...... 62 5.3.3 Choice of parameters . 64 5.3.4 Comparison to earlier models 66 5.4 Vocal tract modelling . 66 5.4.1 Acoustic tube model 66 5.4.2 The Kelly-Lochbaum structure. 68 5.4.3 Half-sample delay Kelly-Lochbaum structure 70 5.4.4 Z-transform of the lossless tube model . 72 5.4.5 Lip radiation . 74 5.4.6 Choice of vocal tract model for voice synthesis 74 5.5 Summary ....................... 75 6 Alternative pitch control methods for the voiceless 77 6.1 Pitch control methods considered. 78 6.1.1 Thumb pressure sensor . 78 6.1.2 Eyebrow movement .. 78 6.1.3 Laryngeal muscle movement . 79 6.1.4 Laryngeal height . 79 6.1.5 Random pitch variation . 79 6.1.6 Jaw movement . 80 6.2 Preliminary study on vocal folds and jaw movement in voiced sounds 80 6.2.1 Method..... 81 6.2.2 Instrumentation. 82 6.2.3 Procedure... 84 6.2.4 Data Analysis . 84 6.2.5 Statistical Analysis 88 6.2.6 Results . 89 IV Contents 6.2.7 Discussion . 93 6.2.8 Conclusions. 94 6.3 Jaw movement detector for prototype development 95 7 Contribntion of glottal wave shape to natural sounding voiced speech 97 7.1 Waveform shape experiment 98 7.1.1 Setup ... 98 7.1.2 Procedure. 100 7.1.3 Waveform analysis 101 7.1.4 Results and discussion 102 7.1.5 Conclusions . 111 7.2 Voice synthesis 112 7.3 Perceptual tests 114 7.3.1 Setup 114 7.3.2 Procedure. 117 7.3.3 Statistical analysis 118 7.3.4 Results and discussions. 119 7.3.5 Conclusion . ~ . 122 8 Hardware development for MyVoice, the artificial voice device 123 8.1 MyVoicel: the first prototype 123 8.1.1 Design layout . 123 8.1.2 Prototype testing 126 8.2 MyVoice2: the second prototype 127 8.2.1 The design layout. 128 8.2.2 Speech enhancement 132 8.3 Use of MyVoice2 .... 132 8.3.1 Prototype testing 139 Contents v 9 Conclusion and suggestions for future research 141 9.1 Conclusion ......... 141 9.2 Suggestions for future research 142 9.2.1 Experiments and glottal model. 142 9.2.2 Hardware development . 144 References 157 VI Chapter 1 Introduction Speech is an aspect of humanity that sets us apart from all other creatures of the earth. Our ability to communicate allows us to share our ideas, emotions and needs with one another. In fact, speech comes so naturally to us that it is often taken for granted until something goes wrong with it. The speech production mechanism involves the interlinking of a variety of complex systems, includ ing the neurological, phonatory, respiratory, and articulatory systems. When one of these complex physiologies is defective, the ability to control speech production becomes impaired. The level of speech impairment may range from something minor like stuttering and slurred speech to the severe case where an individual becomes incapable o~ producing any speech or even voice. This thesis looks at the effect of airway obstruction in the intensive care unit (ICU) and laryngectomised patients on speech production and a new method of restoring speech for these patients. 1.1 Motivation A number of patients who are being treated in ICU due to serious medical illnesses or injury have tracheostomy tubes introduced into them (the insertion of a breathing tube into the trachea) and have to rely on a mechanical ventilator (a machine that pumps air into the lungs) to a'lsist them with their breathing.