Direct Generation of Speech from Electromyographic Signals: EMG-to-Speech

Total Pages: 16

File Type: PDF, Size: 1020 KB

EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals

Dissertation approved by the KIT Department of Informatics of the Karlsruhe Institute of Technology (KIT) for the degree of Doktor der Ingenieurwissenschaften, submitted by Matthias Janke from Offenburg. Date of the oral examination: 22 April 2016. First reviewer: Prof. Dr.-Ing. Tanja Schultz. Second reviewer: Prof. Alan W Black, PhD.

Summary

For several years, alternative speech communication techniques have been examined that are based solely on the articulatory muscle signals instead of the acoustic speech signal. Since these approaches also work with completely silently articulated speech, several advantages arise: the signal is not corrupted by background noise, bystanders are not disturbed, and people who have lost their voice, e.g. due to an accident or a disease of the larynx, can be assisted.

The general objective of this work is the design, implementation, improvement, and evaluation of a system that uses surface electromyographic (EMG) signals and directly synthesizes an audible speech output: EMG-to-speech. The electrical potentials of the articulatory muscles are recorded by small electrodes on the surface of the face and neck. An analysis of these signals allows inferences about the movements of the articulatory apparatus and, in turn, about the spoken speech itself.

One approach for creating an acoustic signal from the EMG signal is to use techniques from automatic speech recognition: a textual output is produced, which in turn is further processed by a text-to-speech synthesis component. However, this approach inherits the difficulties of the speech recognition part, such as the restriction to a given vocabulary or recognition errors of the system. This thesis therefore investigates the possibility of converting the recorded EMG signal directly into a speech signal, without being bound to a limited vocabulary or other limitations of a speech recognition component. Different approaches for the conversion are pursued; real-time capable systems are implemented, evaluated, and compared.

For training a statistical transformation model, the EMG signals and the acoustic speech are captured simultaneously and relevant characteristics are extracted as features. The acoustic speech data is only required as a reference for the training, so the actual application of the transformation can take place using solely the EMG data. The feature mapping is accomplished by a model that estimates the relationship between muscle activity patterns and speech sound components. From the speech components, the final audible voice signal is synthesized. This synthesis is based on a source-filter model of speech: the fundamental frequency (source) is overlaid with the spectral information (Mel cepstrum), which reflects the vocal tract (filter), to generate the final speech signal.

To ensure a natural voice output, the use of the fundamental frequency for prosody generation is of great importance. To bridge the gap between normal speech (with fundamental frequency) and silent speech (no speech signal at all), whispered speech recordings are investigated as an intermediate step. In whispered speech no fundamental frequency exists, and accordingly the generation of prosody is possible, but difficult.
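The source-filter synthesis step described above can be illustrated with a minimal sketch. The thesis parameterizes the vocal tract as Mel cepstra and uses a vocoder for waveform generation; the toy version below instead assumes plain per-frame magnitude spectra and a simple pulse/noise excitation, so all function names, frame sizes, and parameter values are illustrative only.

```python
import numpy as np

def synthesize_frames(f0, envelopes, fs=16000, frame_len=400):
    """Toy source-filter synthesis: excite each frame with a pulse train
    (voiced) or noise (unvoiced) and shape it with a spectral envelope.

    f0        : per-frame fundamental frequency in Hz (0 marks unvoiced frames)
    envelopes : per-frame vocal-tract magnitude spectra,
                shape (n_frames, frame_len // 2 + 1)
    """
    out = []
    for pitch, env in zip(f0, envelopes):
        if pitch > 0:                              # voiced: impulses every 1/f0 seconds
            excitation = np.zeros(frame_len)
            excitation[::max(1, int(fs / pitch))] = 1.0
        else:                                      # unvoiced: white-noise source
            excitation = 0.1 * np.random.randn(frame_len)
        shaped = np.fft.rfft(excitation) * env     # apply the vocal-tract "filter"
        out.append(np.fft.irfft(shaped, n=frame_len))
    return np.concatenate(out)                     # crude concatenation, no overlap-add

# e.g. half a second of a 120 Hz tone through flat (all-pass) envelopes
audio = synthesize_frames(20 * [120], np.ones((20, 201)))
```

In the EMG-to-speech setting, both the fundamental frequency and the spectral envelopes would be predicted from EMG features rather than extracted from an acoustic recording.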
This thesis examines and evaluates the following three mapping methods for feature conversion:

1. Gaussian Mapping: A statistical method that trains a Gaussian mixture model on the joint EMG and speech features in order to estimate a mapping from EMG to speech (a minimal code sketch of this mapping follows this summary). Originally, this technique stems from the voice conversion domain: the voice of a source speaker is transformed so that it sounds like the voice of a target speaker.

2. Unit Selection: The recorded EMG and speech data are segmented into paired units and stored in a database. For the conversion of an EMG signal, the database is searched for the best-matching units and the resulting speech segments are concatenated to build the final acoustic output.

3. Deep Neural Networks: Similar to the Gaussian mapping, a statistical relation between the feature vectors is trained, using deep neural networks. These networks can also contain recurrences to capture additional temporal context.

This thesis examines the proposed approaches for a suitable direct EMG-to-speech transformation, followed by an implementation and optimization of each approach. To analyze the efficiency of these methods, a data corpus was recorded, containing normally spoken as well as silently articulated speech together with the corresponding EMG recordings. While previous parallel EMG and speech recordings were mostly limited to 50 utterances, the recordings performed in the scope of this thesis include sessions with up to 2 hours of parallel EMG and speech data. Based on this data, two of the approaches are applied for the first time to EMG-to-speech transformation. Comparing the obtained results to related EMG-to-speech work shows a relative improvement of 29% with our best-performing mapping approach.
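The Gaussian mapping idea, estimating speech features as the conditional expectation under a GMM trained on joint EMG/speech feature vectors, can be sketched as follows. This is a generic minimum mean-square-error GMM mapping, not the exact configuration of the thesis: frame alignment of the two streams, dynamic features, and global-variance or trajectory refinements are omitted, and all dimensions and names are assumptions.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def train_joint_gmm(emg_feats, speech_feats, n_components=32):
    """Fit a GMM on stacked, frame-aligned [EMG | speech] feature vectors."""
    joint = np.hstack([emg_feats, speech_feats])
    return GaussianMixture(n_components=n_components, covariance_type="full").fit(joint)

def gmm_map(gmm, x, dim_x):
    """MMSE mapping E[speech | EMG = x] under the joint GMM."""
    means, covs, weights = gmm.means_, gmm.covariances_, gmm.weights_
    # posterior p(component | x) using only the EMG marginal of each component
    post = np.array([w * multivariate_normal.pdf(x, m[:dim_x], c[:dim_x, :dim_x])
                     for w, m, c in zip(weights, means, covs)])
    post /= post.sum()
    y = np.zeros(means.shape[1] - dim_x)
    for p, m, c in zip(post, means, covs):
        reg = c[dim_x:, :dim_x] @ np.linalg.inv(c[:dim_x, :dim_x])
        y += p * (m[dim_x:] + reg @ (x - m[:dim_x]))   # per-component linear regression
    return y

# demo with random, frame-aligned feature matrices (200 frames, 30-dim EMG, 25-dim speech)
emg, speech = np.random.randn(200, 30), np.random.randn(200, 25)
gmm = train_joint_gmm(emg, speech, n_components=4)
predicted_frame = gmm_map(gmm, emg[0], dim_x=30)
```

At conversion time, gmm_map is applied frame by frame to the EMG features alone; the acoustic features are only needed during training.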
Zusammenfassung

For several years, alternative forms of speech communication have been investigated that rely solely on the muscle signals recorded during speaking instead of on the acoustic speech signal. Since this approach also works when articulation is completely silent, several advantages arise: the useful signal is not degraded by background noise, bystanders are not disturbed, and people who have lost their voice, for example through an accident or illness, can be supported.

The goal of this work is the development, evaluation, and improvement of a system that synthesizes electromyographic (EMG) signals of the articulatory muscle activity directly into an audible speech signal: EMG-to-speech. To this end, the electrical potentials of the articulation muscles are recorded by surface electrodes attached to the face. An analysis of these signals allows inferences about the movements of the articulatory apparatus and, in turn, about the spoken speech itself.

One possible approach for generating acoustic speech from the EMG signal is to use techniques from automatic speech recognition. A textual output is produced, which is then coupled with a text-based synthesis to make the spoken content audible. However, this procedure is tied to the limitations of the speech recognition system, such as the restricted vocabulary and recognition errors. This thesis addresses the possibility of converting the recorded EMG signal directly into a speech signal without being bound to a limited vocabulary or other constraints. For this purpose, different mapping approaches are pursued, systems are implemented to be real-time capable wherever possible, and the systems are subsequently evaluated and compared with one another.

To train a statistical transformation model, the EMG signals and the simultaneously produced acoustic speech are first captured and relevant features are extracted. The acoustic speech data is required only as a reference for the training, so that the actual application of the transformation can take place exclusively on the EMG data. The feature transformation is achieved by building a model that captures the relationship between muscle activity patterns and the sound components of speech. Three alternative mapping methods are investigated for this purpose. From the speech components, the audible speech signal can in turn be synthesized. The basis for this is a source-filter model: the fundamental frequency is overlaid with the spectral information (Mel cepstrum), which reflects the vocal tract, to generate the final speech signal.

To ensure a natural speech output, the use of the fundamental frequency for prosody generation is of great importance. Since the speech characteristics of normal speech (contains a fundamental frequency) and silent speech (no speech signal at all) differ considerably, whispered speech recordings are considered as an intermediate step. In whispered speech no fundamental frequency is present, so prosody generation is necessary, but in contrast to silent speech an acoustic speech signal is still available.

In this dissertation, the following three mapping methods for feature transformation were developed, implemented, examined in detail, and evaluated:

1. Gaussian Mapping: A statistical method that trains a Gaussian mixture model on the basis of the EMG and speech features in order to estimate a mapping from EMG to speech. This technique was originally used in the area of voice conversion: the voice of a source speaker is transformed so that it sounds like the voice of a target speaker.

2. Unit Selection: The recorded EMG and speech data are segmented into paired units and stored in a database. For the conversion of the EMG signal, the database is searched for the best-matching segments and the resulting speech sections are concatenated.

3. Deep Neural Networks: Similar to the Gaussian mapping, a mapping between the feature vectors is trained. Here, a multi-layer neural network is used, which can additionally contain recurrences in order to also capture temporal context (see the sketch below).
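The third method, a (possibly recurrent) deep neural network that regresses speech features from EMG features, can be sketched in a few lines. The layer sizes, feature dimensions, and the use of an LSTM are illustrative assumptions and not the architecture of the thesis.

```python
import torch
import torch.nn as nn

class EMGToSpeechNet(nn.Module):
    """Illustrative recurrent network: sequences of EMG feature frames
    are mapped to sequences of spectral (e.g. mel-cepstral) frames."""
    def __init__(self, emg_dim=160, mcep_dim=25, hidden=256):
        super().__init__()
        self.rnn = nn.LSTM(emg_dim, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, mcep_dim)

    def forward(self, x):            # x: (batch, frames, emg_dim)
        h, _ = self.rnn(x)
        return self.out(h)           # (batch, frames, mcep_dim)

# minimal training loop on dummy data (real use: aligned EMG and speech features)
model = EMGToSpeechNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
emg, mcep = torch.randn(8, 100, 160), torch.randn(8, 100, 25)
for _ in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(emg), mcep)
    loss.backward()
    optimizer.step()
```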
Recommended publications
  • Talking Without Talking
Deepak Balwani, Honey Brijwani, Karishma Daswani, Somyata Rastogi (Students, Department of Electronics and Telecommunication, V. E. S. Institute of Technology, Mumbai). Int. Journal of Engineering Research and Applications, www.ijera.com, ISSN 2248-9622, Vol. 4, Issue 4 (Version 9), April 2014, pp. 51-56.

ABSTRACT: The Silent Sound technology is a completely new technology which can prove to be a solution for those who have lost their voice but wish to speak over the phone. It can be used as part of a communications system operating in silence-required or high-background-noise environments. This article outlines the history associated with the technology, followed by presenting the two most preferred techniques, viz. electromyography and ultrasound SSI. The concluding section compares and contrasts these two techniques and puts forth the future prospects of this technology.

Keywords: Articulators, Electromyography, Ultrasound SSI, Vocoder, Linear Predictive Coding

I. INTRODUCTION: Each one of us at some point or the other in our lives must have faced a situation of talking aloud on the cell phone in the midst of the disturbance while travelling in trains or buses or in a movie theatre. One of the technologies that can eliminate this problem is the 'Silent Sound' technology. 'Silent Sound' technology is a technique that …

Electromyography involves monitoring tiny muscular movements that occur when we speak and converting them into electrical pulses that can then be turned into speech, without a sound being uttered. Ultrasound imagery is a non-invasive and clinically safe procedure which makes possible the real-time visualization of the tongue by using an ultrasound transducer.
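The electromyography route described above starts from raw muscle potentials, which are typically band-limited, rectified, and framed before any mapping to speech. The following toy preprocessing sketch uses assumed sampling rate, band limits, and frame sizes; it is not taken from the article.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def emg_frame_features(emg, fs=600, frame_ms=25, shift_ms=10):
    """Toy EMG front end: band-pass filter, then per-frame rectified mean
    amplitude and zero-crossing rate. All parameters are illustrative."""
    b, a = butter(4, [20, 134], btype="band", fs=fs)
    filtered = filtfilt(b, a, emg)
    frame, shift = int(fs * frame_ms / 1000), int(fs * shift_ms / 1000)
    feats = []
    for start in range(0, len(filtered) - frame + 1, shift):
        w = filtered[start:start + frame]
        feats.append([np.mean(np.abs(w)),                 # rectified mean amplitude
                      np.mean(np.diff(np.sign(w)) != 0)]) # zero-crossing rate
    return np.array(feats)

features = emg_frame_features(np.random.randn(6000))      # ~10 s of dummy EMG
```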
  • Preliminary Test of a Real-Time, Interactive Silent Speech Interface Based on Electromagnetic Articulograph
Preliminary Test of a Real-Time, Interactive Silent Speech Interface Based on Electromagnetic Articulograph. Jun Wang (Dept. of Bioengineering and Callier Center for Communication Disorders, University of Texas at Dallas), Ashok Samal (Dept. of Computer Science & Engineering, University of Nebraska-Lincoln), Jordan R. Green (Dept. of Communication Sciences & Disorders, MGH Institute of Health Professions). [email protected], [email protected], [email protected]

Abstract: A silent speech interface (SSI) maps articulatory movement data to speech output. Although still in experimental stages, silent speech interfaces hold significant potential for facilitating oral communication in persons after laryngectomy or with other severe voice impairments. Despite the recent efforts on silent speech recognition algorithm development using offline data analysis, online tests of SSIs have rarely been conducted. In this paper, we present a preliminary, online test of a real-time, interactive SSI based on electromagnetic motion tracking. The SSI played back synthesized speech sounds in response to the user's tongue and lip movements.

… treatment options for these individuals including (1) esophageal speech, which involves oscillation of the esophagus and is difficult to learn; (2) tracheo-esophageal speech, in which a voice prosthesis is placed in a tracheo-esophageal puncture; and (3) electrolarynx, an external device held on the neck during articulation, which produces a robotic voice quality (Liu and Ng, 2007). Perhaps the greatest disadvantage of these approaches is that they produce abnormal sounding speech with a fundamental frequency that is low and limited in range. The abnormal voice quality output severely affects the social life of people after laryngectomy (Liu and Ng, 2007). In addition, the tracheo-esophageal option requires an additional surgery, which is not suitable for everyone.
  • Silent Speech Interfaces B
Silent Speech Interfaces. B. Denby, T. Schultz, K. Honda, Thomas Hueber, J.M. Gilbert, J.S. Brumberg. Speech Communication, Elsevier: North-Holland, 2010, 52 (4), pp. 270. doi:10.1016/j.specom.2009.08.002. HAL Id: hal-00616227, https://hal.archives-ouvertes.fr/hal-00616227, submitted on 20 Aug 2011. Accepted manuscript details: PII S0167-6393(09)00130-7, reference SPECOM 1827, to appear in Speech Communication; received 12 April 2009, accepted 20 August 2009.
  • Silent Sound Technology: a Solution to Noisy Communication
International Journal of Engineering Trends and Technology (IJETT), Volume 9, Number 14, March 2014. Silent Sound Technology: A Solution to Noisy Communication. Priya Jethani (M.E. Student, Padmashree Dr. V.B. Kolte College of Engineering, Malkapur, Maharashtra, India), Bharat Choudhari (Assistant Professor, Padmashree Dr. V.B. Kolte College of Engineering, Malkapur, Maharashtra, India).

Abstract: When we are in a movie theatre, bus, or train, there is a lot of noise around us and we cannot speak properly on a mobile phone. In future this problem can be eliminated with the help of Silent Sound technology. It is a technology that helps you to transmit information without using your vocal cords. This technology notices every lip movement and transforms it into a computer-generated sound that can be transmitted over a phone, so the person on the other end of the phone receives the information as audio. The device was developed at the Karlsruhe Institute of Technology (KIT). It uses electromyography, monitoring tiny muscular movements that occur when we speak and converting them into electrical pulses that can then be turned into speech, without a sound uttered. When demonstrated, it seems to detect every lip movement, internally converts the electrical pulses into sound signals, and sends them while neglecting all other surrounding noise.

… levels. Most people can also understand a few words which are unspoken, by lip-reading. The idea of interpreting silent speech electronically or with a computer has been around for a long time, and was popularized in the 1968 Stanley Kubrick science-fiction film "2001 – A Space Odyssey" [7]. A major focal point was the DARPA Advanced Speech Encoding Program (ASE) of the early 2000's, which funded research on low bit rate speech synthesis "with acceptable intelligibility, quality, and aural speaker recognizability in …
  • Generic Modalities for Silent Interaction (Pedro Miguel André Coelho)
University of Aveiro, Department of Electronics, Telecommunications and Informatics, 2019. Pedro Miguel André Coelho. Generic Modalities for Silent Interaction (Modalidades Genéricas para Interação Silenciosa).

Dissertation presented to the University of Aveiro in fulfillment of the requirements for the degree of Master in Computer and Telematics Engineering, carried out under the scientific supervision of Doutor Samuel de Sousa Silva, Researcher at the Institute of Electronics and Informatics Engineering of Aveiro, and Doutor António Joaquim da Silva Teixeira, Associate Professor with Habilitation at the Department of Electronics, Telecommunications and Informatics of the University of Aveiro.

The jury: president, Doutora Beatriz Sousa Santos, Associate Professor with Habilitation at the University of Aveiro; examiners committee, Doutor José Casimiro Pereira, Adjunct Professor at the Polytechnic Institute of Tomar, and Doutor Samuel de Sousa Silva (supervisor), Researcher at the University of Aveiro.

Acknowledgements: First of all, I want to thank my supervisors, Doutor Samuel Silva and Professor Doutor António Teixeira, for the trust they placed in me, for all their dedication, and for their unconditional motivation. I equally thank Doutor Nuno Almeida for the encouragement and support in preparing this dissertation. To my friends and colleagues, for all the encouragement and friendship. To Mariana, for her patience, understanding, and help, which contributed to reaching the end of this journey. Finally, my deep and heartfelt thanks to my family, for all their strength and for always being present throughout my academic life.
  • Articulatory-Based English Consonant Synthesis in 2-D Digital Waveguide Mesh
Articulatory-Based English Consonant Synthesis in 2-D Digital Waveguide Mesh. Anocha Rugchatjaroen. PhD thesis, Department of Electronics, University of York, June 2014.

Abstract: In articulatory speech synthesis, the 3-D shape of a vocal tract for a particular speech sound has typically been established, for example, by magnetic resonance imaging (MRI), and this is used to model the acoustic output from the tract using numerical methods that operate in either one, two or three dimensions. The dimensionality strongly affects the overall computation complexity, which has a direct bearing on the quality of the synthesized speech output. The digital waveguide mesh (DWM) is a numerical method commonly used in room acoustic modelling. A smaller space such as a vocal tract, which is about 5 cm wide and 16.5-18 cm long in adults, can also be modelled using DWM in one, two and three dimensions. The latter requires a very dense mesh demanding massive computational resources; these requirements are lessened by using a lower dimensionality (two rather than three) and/or a less dense mesh. The computational cost of 2-D digital waveguide modelling makes it a practical technique for real-time synthesis on an average PC at full (20 kHz) audio bandwidth. This research makes use of a 2-D mesh with the advantage of the availability and flexibility of existing boundary modelling and the raised-cosine impedance control to study the possibilities of using it for English consonant synthesis. The research was organized under the phonetic 'manner' classification of English consonants as: semi-vowel, nasal, fricative, plosive and affricate.
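The 2-D digital waveguide mesh update at the heart of this kind of simulation is compact enough to sketch directly. The rectilinear interior-node update below (each node averages its four neighbours and subtracts its own value one step earlier) is the standard textbook form; the duct geometry, excitation, and crude rigid-wall handling are illustrative assumptions and much simpler than the boundary and impedance modelling discussed in the thesis.

```python
import numpy as np

def dwm_2d_step(p_prev, p_curr, walls):
    """One time step of a rectilinear 2-D digital waveguide mesh:
    p[n+1] = 0.5 * (sum of the 4 neighbours at step n) - p[n-1],
    with pressures simply clamped to zero at wall nodes."""
    neigh = (np.roll(p_curr, 1, axis=0) + np.roll(p_curr, -1, axis=0) +
             np.roll(p_curr, 1, axis=1) + np.roll(p_curr, -1, axis=1))
    p_next = 0.5 * neigh - p_prev
    p_next[walls] = 0.0
    return p_next

# toy straight "vocal tract": a rectangular duct, impulse-excited at one end
walls = np.zeros((12, 92), dtype=bool)
walls[0, :] = walls[-1, :] = walls[:, 0] = walls[:, -1] = True

p_prev = np.zeros(walls.shape)
p_curr = np.zeros(walls.shape)
p_curr[6, 1] = 1.0                          # impulse near the "glottis" end
pressure_at_lips = []
for _ in range(2000):
    p_prev, p_curr = p_curr, dwm_2d_step(p_prev, p_curr, walls)
    pressure_at_lips.append(p_curr[6, -2])  # sample near the "lip" end
```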
  • Text-To-Speech Synthesis As an Alternative Communication Means
Biomed Pap Med Fac Univ Palacky Olomouc Czech Repub. 2021 Jun; 165(2):192-197. Text-to-speech synthesis as an alternative communication means after total laryngectomy. Barbora Repova, Michal Zabrodsky, Jan Plzak, David Kalfert, Jindrich Matousek, Jan Betka.

Aims: Total laryngectomy still plays an essential part in the treatment of laryngeal cancer, and loss of voice is the most feared consequence of the surgery. Commonly used rehabilitation methods include esophageal voice, electrolarynx, and implantation of a voice prosthesis. In this paper we focus on a new perspective of vocal rehabilitation utilizing alternative and augmentative communication (AAC) methods.

Methods and Patients: 61 consecutive patients treated by means of total laryngectomy with or without voice prosthesis implantation were included in the study. All were offered voice banking and personalized speech synthesis (PSS). They had to voluntarily express their willingness to participate and to prove the ability to use modern electronic communication devices.

Results: Of 30 patients fulfilling the study criteria, only 18 completed voice recording sufficient for voice reconstruction and synthesis. Eventually, only 7 patients started to use this AAC technology during the early postoperative period. The frequency and total usage time of the device gradually decreased. Currently, only 6 patients are active users of the technology.

Conclusion: The influence of communication with the surrounding world on the quality of life of patients after total laryngectomy is unquestionable. The possibility of using the spoken word with the patient's personalized voice is an indisputable advantage. Such a form of voice rehabilitation should be offered to all patients who are deemed eligible.
  • The IAL News
The IAL News, The International Association of Laryngectomees, Vol. 60 No. 4, November 2015. Viet Nam Veteran Assists Vietnamese Laryngectomees. By Larry Hammer, M.C.D.

Photo caption: Left to right, Larry Hammer, Speech Language Pathologist; Dr. Huynh An, Chief ENT; laryngectomee patient and wife; Ms. Dang Thi Thu Tien, therapist. After having made a presentation on the communication of laryngectomees following surgery and demonstrating the functional use of the electrolarynx, I turn the evaluation and training over to the physician and therapist to evaluate and train the laryngectomee. I then serve as a "coach".

The Viet Nam Laryngectomee Humanitarian Project (VNLHP) made its most recent trip to Viet Nam in April 2015. Forty-three (43) electrolarynx devices (ELD) were provided to indigent laryngectomees in Ha Noi, Da Nang City and Hue City. Actively practicing as a Speech Language Pathologist, I felt most personally rewarded by being involved with the communication rehabilitation of laryngectomees. They are a unique and special group of people who have professionally motivated and inspired me for 36 years.

In 1975 I completed my undergraduate studies at the University of Nebraska – Lincoln. It was the year of the end of the Viet Nam War. I had served in the U.S. Marine Corps during that war, during 1968 and 1969. That was when I committed to someday return to Viet Nam to use my skills as a Speech Language Pathologist to serve the Vietnamese people. I saw a significant number of people who had suffered injuries that left them with a variety of disorders related to … (continued on page 6)
  • Ultrasound-Based Silent Speech Interface Using Deep Learning
Budapest University of Technology and Economics, Faculty of Electrical Engineering and Informatics, Dept. of Telecommunications and Media Informatics. Ultrasound-based Silent Speech Interface using Deep Learning. BSc Thesis. Author: Eloi Moliner Juanpere. Supervisor: dr. Tamás Gábor Csapó. May 18, 2018.

Contents: Abstract; Resum; Introduction; 1 Input data (1.1 Ultrasound imaging, 1.2 Tongue images); 2 Target data (2.1 Mel-Generalized Cepstral coefficients, 2.2 Line Spectral Pair, 2.3 Vocoder); 3 Deep Learning Architectures (3.1 Deep Feed-forward Neural Network, 3.2 Convolutional Neural Network, 3.3 CNN-LSTM Recurrent Neural Network, 3.3.1 Bidirectional CNN-LSTM Recurrent Neural Network, 3.4 Denoising Convolutional Autoencoder); 4 Hyperparameter optimization (4.1 Deep Feed-forward Neural Network, 4.2 Convolutional Neural Network, 4.3 CNN-LSTM Recurrent Neural Network, 4.3.1 Bidirectional CNN-LSTM Recurrent Neural Network, 4.4 Denoising Convolutional Autoencoder, 4.5 Optimization using the denoised images); 5 Results (5.1 Objective measurements, 5.2 Listening subjective tests); 6 Conclusions; Acknowledgements; List of Figures; List of Tables.

Student declaration: I, Eloi Moliner Juanpere, the undersigned, hereby declare that the present BSc thesis work has been prepared by myself and without any unauthorized help or assistance. Only the specified sources (references, tools, etc.) were used. All parts taken from other sources word by word, or after rephrasing but with identical meaning, were unambiguously identified with explicit reference to the sources utilized.
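The thesis maps ultrasound tongue images to vocoder parameters (Mel-Generalized Cepstral coefficients) with deep networks such as CNNs and CNN-LSTMs. A minimal convolutional regressor in that spirit is sketched below; the image size, channel counts, and MGC order are assumptions, not the thesis configuration.

```python
import torch
import torch.nn as nn

class UltrasoundToMGC(nn.Module):
    """Illustrative CNN mapping one ultrasound tongue image to one frame of
    Mel-Generalized Cepstral (MGC) coefficients; sizes are assumptions only."""
    def __init__(self, mgc_order=24):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.regressor = nn.Linear(32 * 4 * 4, mgc_order + 1)

    def forward(self, x):                  # x: (batch, 1, height, width)
        h = self.features(x).flatten(1)
        return self.regressor(h)           # (batch, mgc_order + 1)

# dummy forward pass on a batch of 64 x 128 pixel images
net = UltrasoundToMGC()
mgc = net(torch.randn(8, 1, 64, 128))      # -> shape (8, 25)
```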
  • Across-Speaker Articulatory Normalization for Speaker-Independent Silent Speech Recognition
Wang, Jun; Samal, Ashok; and Green, Jordan, "Across-speaker Articulatory Normalization for Speaker-independent Silent Speech Recognition" (2014). CSE Conference and Workshop Papers, Paper 234, DigitalCommons@University of Nebraska - Lincoln. http://digitalcommons.unl.edu/cseconfwork/234

INTERSPEECH 2014. Across-speaker Articulatory Normalization for Speaker-independent Silent Speech Recognition. Jun Wang (1, 2), Ashok Samal (3), Jordan R. Green (4). (1) Department of Bioengineering and (2) Callier Center for Communication Disorders, University of Texas at Dallas, Dallas, Texas, USA; (3) Department of Computer Science & Engineering, University of Nebraska-Lincoln, Lincoln, Nebraska, USA
  • A History of Vocoder Research at Lincoln Laboratory
B. Gold. A History of Vocoder Research at Lincoln Laboratory. During the 1950s, Lincoln Laboratory conducted a study of the detection of pitch in speech. This work led to the design of voice coders (vocoders), devices that reduce the bandwidth needed to convey speech. The bandwidth reduction has two benefits: it lowers the cost of transmission and reception of speech, and it increases the potential privacy. This article reviews the past 30 years of vocoder research with an emphasis on the work done at Lincoln Laboratory.

"If I could determine what there is in the very rapidly changing complex speech wave that corresponds to the simple motions of the lips and tongue, I could then analyze speech for these quantities. I would have a set of speech defining signals that could be handled as low-frequency telegraph currents, with resulting advantages of secrecy, and more telephone channels in the same frequency space, as well as a basic understanding of the carrier nature of speech by which the lip reader interprets speech from simple motions." - Homer Dudley, 1935

… basically a transmitter of waveforms. In fact, telephone technology to date has been almost totally oriented toward transmission, while the subject of speech modeling has remained of peripheral practical interest, in spite of years of research by engineers and scientists from Lincoln Laboratory, Bell Laboratories, and other research centers.

Historical Background: More than 200 years ago, the Hungarian inventor W. von Kempelen was the first to build a mechanical talking contrivance that could … Modern methods of speech processing really began in the United States with the development of two devices: the voice operated demonstrator (voder) [4] and the channel voice coder (vocoder) [5]. Both devices were pioneered by Homer Dudley, whose philosophy of speech processing is quoted at the beginning of this article.
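Dudley's channel vocoder reduces speech to slowly varying band energies plus an excitation description, which is what makes the bandwidth reduction possible. A toy analysis stage in that spirit might look like the sketch below; the band layout, frame rate, and sampling rate are assumed values, and the article itself contains no code.

```python
import numpy as np
from scipy.signal import butter, sosfilt

def channel_vocoder_analysis(speech, fs=8000, n_bands=10, frame_ms=20):
    """Toy channel-vocoder analysis: split speech into frequency bands and
    keep only one slowly varying energy value per band and frame."""
    edges = np.linspace(100, 3400, n_bands + 1)           # band edges in Hz
    hop = int(fs * frame_ms / 1000)
    envelopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfilt(sos, speech)
        band = band[: (len(band) // hop) * hop]           # trim to whole frames
        envelopes.append(np.abs(band).reshape(-1, hop).mean(axis=1))
    return np.array(envelopes)                            # (n_bands, n_frames)

# ~1 s of noise as a stand-in input; real use would pass recorded speech samples
params = channel_vocoder_analysis(np.random.randn(8000))
```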
  • Sessions, Workshops, and Technical Initiatives
TUESDAY MORNING, 19 MAY 2015, BALLROOM 3, 8:00 A.M. TO 12:00 NOON. Session 2aAAa: Architectural Acoustics, Noise, Speech Communication, Psychological and Physiological Acoustics, and Signal Processing in Acoustics: Sixty-Fifth Anniversary of Noise and Health: Session in Honor of Karl Kryter I. David M. Sykes, Cochair (Architectural Acoustics, Rensselaer Polytechnic Institute, 31 Baker Farm Rd., Lincoln, MA 01773); Michael J. Epstein, Cochair (Northeastern University, 360 Huntington Ave., 226FR, Boston, MA 02115); William J. Cavanaugh, Cochair (Cavanaugh Tocci Assoc. Inc., 3 Merifield Ln., Natick, MA 01760-5520). Chair's Introduction, 8:00. Invited Papers. 8:05, 2aAAa1. Auditory and non-auditory effects of noise on health: An ICBEN perspective. Mathias Basner (Psychiatry, Univ. of Pennsylvania Perelman School of Medicine, 1019 Blockley Hall, 423 Guardian Dr., Philadelphia, PA 19104-6021, [email protected]). Noise is pervasive in everyday life and induces both auditory and non-auditory health effects. Noise-induced hearing loss remains the most common occupational disease in the United States, but is also increasingly caused by social noise exposure (e.g., through music players). Simultaneously, evidence on the public health impact of non-auditory effects of environmental noise exposure is growing. According to the World Health Organization (WHO), ca. 1.6 million healthy life years (DALYs) are lost annually in the western member states of the European Union due to exposure to environmental noise. The majority (>90%) of these effects can be attributed to noise-induced sleep disturbance and community annoyance, but noise may also interfere with communication and lead to cognitive impairment of children. Epidemiological studies increasingly support an association of long-term noise exposure with the incidence of hypertension and cardiovascular disease.