Silent Speech Interfaces for Speech Restoration: A Review

Jose A. Gonzalez-Lopez, Alejandro Gomez-Alanis, Juan M. Martín-Doñas, José L. Pérez-Córdoba, and Angel M. Gomez

These authors are with the Department of Signal Processing, Telematics and Communications, University of Granada, 18071 Granada, Spain (email: fjoseangl,agomezalanis,mdjuamart,jlpc,[email protected]).

arXiv:2009.02110v3 [eess.AS] 27 Sep 2020

Abstract—This review summarises the status of silent speech interface (SSI) research. SSIs rely on non-acoustic biosignals generated by the human body during speech production to enable communication whenever normal verbal communication is not possible or not desirable. In this review, we focus on the first case and present latest SSI research aimed at providing new alternative and augmentative communication methods for persons with severe speech disorders. SSIs can employ a variety of biosignals to enable silent communication, such as electrophysiological recordings of neural activity, electromyographic (EMG) recordings of vocal tract movements or the direct tracking of articulator movements using imaging techniques. Depending on the disorder, some sensing techniques may be better suited than others to capture speech-related information. For instance, EMG and imaging techniques are well suited for laryngectomised patients, whose vocal tract remains almost intact but are unable to speak after the removal of the vocal folds, but fail for severely paralysed individuals. From the biosignals, SSIs decode the intended message, using automatic speech recognition or speech synthesis algorithms. Despite considerable advances in recent years, most present-day SSIs have only been validated in laboratory settings for healthy users. Thus, as discussed in this paper, a number of challenges remain to be addressed in future research before SSIs can be promoted to real-world applications. If these issues can be addressed successfully, future SSIs will improve the lives of persons with severe speech impairments by restoring their communication capabilities.

Index Terms—Silent speech interface, speech restoration, automatic speech recognition, speech synthesis, deep neural networks, brain computer interfaces, speech and language disorders, voice disorders, electroencephalography, electromyography, electromagnetic articulography.

I. INTRODUCTION

Speech is the most convenient and natural form of human communication. Unfortunately, normal speech communication is not always possible. For example, persons who suffer traumatic injuries, laryngeal cancer or neurodegenerative disorders may lose the ability to speak. The prevalence of this type of disability is significant, as evidenced by several studies. For instance, in [1], the authors conclude that approximately 0.4% of the European population have a speech impediment, while a survey conducted in 2011 [2] concluded that 0.5% of persons in Europe presented ‘difficulties’ with communication. The American Speech-Language-Hearing Association (ASHA) reports that nearly 40 million U.S. citizens have communication disorders, costing the U.S. approximately $154-186 billion annually [3]. The World Health Organization (WHO), in its World Report on Disability [4] derived from a survey conducted in 70 countries, concluded that 3.6% of the population had severe to extreme difficulty with participation in the community, a condition which includes communication impairment as a specific case.

Speech and language impairments have a profound impact on the lives of people who suffer them, leading them to struggle with daily communication routines. Besides, many service and health-care providers are not trained to interact with speech-disabled persons, and feel uncomfortable or ineffective in communicating with them, which aggravates the stigmatisation of this population [4]. As a result, people with speech impairments often develop feelings of personal isolation and social withdrawal, which can lead to clinical depression [5]–[11]. Furthermore, some of these persons also develop feelings of loss of identity after losing their voice [12]. Communication impairment can also have important economic consequences if they lead to occupational disability.

In the absence of clinical procedures for repairing the damage originating speech impediments, various methods can be used to restore communication. One such is assistive technology. The U.S. National Institute on Deafness and Other Communication Disorders (NIDCD) defines this as any device that helps a person with hearing loss or a voice, speech or language disorder to communicate [13]. For the specific case of communication disorders, devices used to supplement or replace speech are known as augmentative and alternative communication (AAC) devices. AAC devices are diverse and can range from simple paper and pencil resources to picture boards or text-to-speech (TTS) software. From an economic standpoint, the worldwide market for AAC devices is expected to grow at an annual rate of 8.0% during the next five years, from $225.8 million in 2019 to $307.7 million in 2025 [14]. AAC users include individuals with a variety of conditions, whether congenital (e.g., cerebral palsy, intellectual disability) or acquired (e.g., laryngectomy, neurodegenerative disease or traumatic brain injury) [15], [16].

In recent years, a promising new AAC approach has emerged: silent speech interfaces (SSIs) [17]–[19]. SSIs are assistive devices to restore oral communication by decoding speech from non-acoustic biosignals generated during speech production. A well-known form of silent speech communication is lip reading. A variety of sensing modalities have been investigated to capture speech-related biosignals, such as vocal tract imaging [20]–[22], electromagnetic articulography (magnetic tracing of the speech articulator movements) [23]–[27], surface electromyography (sEMG) [28]–[31], which captures electrical activity driving the facial muscles using surface electrodes, and electroencephalography (EEG) [32]–[34], which captures neural activity in anatomical regions of the brain involved in speech production. The latter approach, involving the use of brain activity recordings, is also known as a brain computer interface (BCI) [35]–[37]. Since SSIs enable speech communication without relying on the acoustic signal, they offer a fundamentally new means of restoring communication capabilities to persons with speech impairments. Apart from clinical uses, other potential applications of this technology include providing privacy, enabling telephone conversations to be held without being overheard by bystanders and enhancing normal spoken communication in noisy environments [17], [38]. These applications are possible because biosignals are largely insensitive to environmental noise and are independent of the acoustic speech signal (i.e., these biosignals can be captured even when no vocalisation is performed).

SSIs have attracted increasing attention in recent years, as evidenced by the special sessions organised on this topic at related conferences [39]–[41] and by special issues of journals [17], [42]. These events and publications supplement the existing literature in the related research field of BCIs [35], [36], [43]–[48]. In this review, we present an overview of recent advances in the rapidly evolving field of SSIs with special emphasis on a particular clinical application: communication restoration for speech-disabled individuals. The remainder of this paper is structured as follows. Section II first summarises the speech and voice disorders that may affect spoken human communication, describing their causes and effects, and examines methods currently used to supplement and/or restore communication. Section III then formally introduces SSIs and details the two main approaches employed in decoding speech from biosignals. The sensing modalities that have been […]

[…] or the electrical activity driving the facial muscles will be better suited for people with dysarthria, who have difficulties moving and coordinating the lips and tongue, than using sensors for articulator motion capture (e.g., video cameras). The information about the applicable sensor technologies for each disorder is also shown in Table I.

In the rest of this section, we provide an overview of the different types of speech, language and voice disorders, discuss their causes and describe methods and devices currently available to help speech-impaired people communicate, including previous investigations in which SSIs have been used to restore communication.

A. Aphasia

Aphasia is a disorder that affects the comprehension and formulation of language and is caused by damage to the areas of the brain involved in language [49]. People with aphasia have difficulties with understanding, speaking, reading or writing, but their intelligence is normally unaffected. For instance, aphasic patients struggle in retrieving the words they want to say, a condition known as anomia. The opposite mental process, i.e., the transformation of messages heard or read into an internal message, is also affected in aphasia. Aphasia affects not only spoken but also written communication (reading/writing) and visual language (e.g., sign languages) [49]. The major causes of aphasia are stroke, head injury, cerebral tumours or neurodegenerative disorders such as Alzheimer’s disease (AD) [50]. Among these causes, strokes alone account for most new cases of aphasia [51]. Elderly people are especially liable to develop aphasia because the