An Investigation Into Improving Speech Intelligibility Using Binaural Signal Processing

Total Page:16

File Type:pdf, Size:1020Kb

An Investigation Into Improving Speech Intelligibility Using Binaural Signal Processing An investigation into improving speech intelligibility using binaural signal processing Gary Anthony Spittle PhD Electronics July 2009 1 Abstract This thesis sets out to improve the intelligibility of a target speech sound source when presented with simultaneous masking sounds. A summary of the human hearing system and methods for spatialising sounds is provided as background to the problem. A detailed review of relevant research in auditory masking, auditory continuity and speech intelligibility is discussed. Angular separation and sound object differentiation through amplitude modification are used to enhance a target speech sound. A novel method is developed for achieving this using only the binaural signals received at the ears of a listener. This new approach is termed an auditory lens. Psychoacoustic evaluation of the auditory lens processing has shown comparable intelligibility scores to direct spatialisation techniques which require prior knowledge of sound source spectral content and direction. The success of the auditory lens has led to a number of potential further research projects that will take the processing system closer to a real wearable product. 2 Table of contents Abstract ..................................................................................................................2 Table of contents ....................................................................................................3 List of Figures ........................................................................................................6 List of Tables ........................................................................................................16 Acknowledgements ..............................................................................................18 Declaration ...........................................................................................................19 Chapter 1 Introduction ......................................................................................20 1.1 Background of the research ..................................................................20 1.2 Objective of the research ......................................................................22 1.3 Statement of hypothesis ........................................................................23 1.4 Thesis structure .....................................................................................24 Chapter 2 Human hearing .................................................................................26 2.1 Directions of sound ..............................................................................26 2.2 Ear Physiology .....................................................................................27 2.2.1 The peripheral auditory system ........................................................27 2.2.2 The internal auditory system ............................................................28 2.3 Masking ................................................................................................29 2.4 Binaural hearing ...................................................................................32 2.4.1 Sound localisation ............................................................................32 2.4.2 Directional cues ................................................................................32 2.4.2.1 Interaural time difference .........................................................33 2.4.2.2 Interaural envelope difference ..................................................38 2.4.2.3 Interaural intensity difference...................................................39 2.4.2.4 Spectral cues .............................................................................42 2.4.3 Distance cues ....................................................................................45 2.4.3.1 Overall signal level ...................................................................45 2.4.3.2 Changes in interaural intensity differences ..............................46 2.4.3.3 Spectral attenuation ..................................................................46 2.4.3.4 Direct-to-reverberant ratio ........................................................47 2.4.4 Binaural advantage ...........................................................................47 2.5 Summary ..............................................................................................48 Chapter 3 Sound spatialisation ..........................................................................50 3.1 Sound field reconstruction methods .....................................................50 3.1.1 Stereo ................................................................................................50 3.1.2 Surround sound systems ...................................................................52 3.1.3 Ambisonics .......................................................................................53 3.1.4 Wave-field synthesis ........................................................................53 3.2 Binaural synthesis .................................................................................54 3.2.1 Head-related transfer functions ........................................................55 3.2.2 HRTF measurement .........................................................................56 3.2.3 Personalised and generic HRTFs .....................................................62 3.2.4 Sound spatialisation using HRTFS ...................................................64 3.2.4.1 Convolution ..............................................................................64 3.2.4.2 Binaural synthesis of a single sound source .............................66 3.2.4.3 Binaural synthesis of multiple sound sources ..........................68 3 3.3 Summary ..............................................................................................69 Chapter 4 Auditory masking and the factors that influence it ..........................71 4.1 Auditory grouping ................................................................................71 4.1.1 Auditory Scene Analysis ..................................................................72 4.2 Masking and critical bands ...................................................................78 4.3 The influence of spatial separation between sound sources .................79 4.4 The influence of sound duration ...........................................................84 4.5 The influence of spectral content of sounds .........................................86 4.6 Sound source signal content .................................................................87 4.6.1 Phoneme recognition with a noise masker .......................................87 4.6.2 Speech recognition with a noise masker ..........................................87 4.6.3 The articulation rate of speech .........................................................88 4.6.4 Similarity in target and masker speech content ................................90 4.7 Summary ..............................................................................................94 Chapter 5 The cause and effects of auditory continuity ....................................97 5.1 Homophonic auditory continuity ..........................................................97 5.1.1 The influence of signal level differences..........................................98 5.1.2 The influence of signal duration .....................................................100 5.2 Heterophonic auditory continuity .......................................................102 5.2.1 The influence of spectral content and frequency separation ..........104 5.2.2 The influence of spatial separation .................................................109 5.2.3 Contralateral induction ...................................................................114 5.3 Phonemic restoration ..........................................................................115 5.3.1 The spectral content of inducer signals ..........................................116 5.3.2 The relevance of target speech content ..........................................120 5.3.3 The influence of spatial separation of auditory objects ..................121 5.3.4 Methods for improving phonemic restoration ................................123 5.4 Summary ............................................................................................125 Chapter 6 Speech intelligibility with multiple sound sources .........................128 6.1 Blind Source Separation .....................................................................129 6.1.1 Independent Component Analysis ..................................................131 6.1.2 Beamforming ..................................................................................132 6.1.3 Spectral processing .........................................................................134 6.1.4 Summary ........................................................................................134 6.2 The influence of spectral content .......................................................135 6.2.1 Auditory glimpsing – the peek theory in practice ..........................143 6.3 The influence of spatial separation .....................................................146 6.3.1 The number, type and location of interferers .................................156 6.4 Summary ............................................................................................161 Chapter 7 Binaural
Recommended publications
  • Afternoon Sessions
    THURSDAY AFTERNOON, 6 JUNE 2013 513ABC, 1:00 P.M. TO 5:20 P.M. Session 4pAAa Architectural Acoustics: Room Acoustics Computer Simulation II Diemer de Vries, Cochair RWTH Aachen Univ., Inst. fuer Technische Akustik, Aachen D-52056, Germany Lauri Savioja, Cochair Dept. of Media Technol., Aalto Univ., P.O. Box 15500, Aalto FI-00076, Finland Invited Papers 1:00 4pAAa1. The differences and though the equivalence in the detection methods of particle, ray, and beam tracing. Uwe M. Ste- phenson (HafenCity Univ. Hamburg, Hebebrandstr. 1, Hamburg 22297, Germany, [email protected]) Within the numerical methods used in room acoustics, the geometrical and energetic methods of sound particle, ray and beam tracing are often confused. This rather tutorial paper does not treat the tracing algorithms but rather aims to explain the differences in the physi- cal models and the corresponding detection and evaluation methods. While ray tracing needs spherical detectors as receivers to count rays, the particle model is based on a weighting of the energies of the particles with their inner crossing distances to compute the local sound energy densities. For beam tracing, receiver points are sufficient. In its core, this paper shows the convergence of the evaluated intensities computed from immitted sound particle energies to those predicted by the well-known 1/R<+>2<+>-distance law for the free field—as applied with the mirror image source method and beam tracing as its efficient implementation. Finally, the geometrical methods are classified depending on their efficiency with higher orders of reflection and their extensibility by scattering and diffraction. 1:20 4pAAa2.
    [Show full text]
  • UCSF Audiology Amplification Update XI
    Division of Audiology, Department of Otolaryngology University of California, San Francisco School of Medicine presents UCSF Audiology Amplification Update XI November 1– 2, 2013 Holiday Inn, Fisherman’s Wharf San Francisco, California Course Chair Robert W. Sweetow, PhD University of California, San Francisco University of California, San Francisco School of Medicine Acknowledgement of Commercial Support This CME activity was supported in part by educational grants from the following: Oticon, Inc Starkey Hearing Technologies Exhibitors Audiology Systems Inc. – an Otometrics partner CaptionCall Cochlear Americas Elite Hearing Network Health Care Instruments (HCI)/Audiometrics Fuel Medical Lyric by Phonak MED EL Neurotone Oticon Medical Oticon USA Phonak ReSound Siemens Hearing Instruments Starkey Hearing Technologies Unitron Widex Table of Contents Educational Objectives .......................................................................................................5 Accreditation .......................................................................................................................5 General Information ............................................................................................................6 Linguistic Competency Information .................................................................................. 7 Course Faculty ....................................................................................................................9 Disclosures .......................................................................................................................
    [Show full text]
  • The Effect of Workplace Noise Exposure on Reaction Time Hollis T
    James Madison University JMU Scholarly Commons Dissertations The Graduate School Spring 2017 The effect of workplace noise exposure on reaction time Hollis T. Leidy James Madison University Follow this and additional works at: https://commons.lib.jmu.edu/diss201019 Part of the Speech Pathology and Audiology Commons Recommended Citation Leidy, Hollis T., "The effect of workplace noise exposure on reaction time" (2017). Dissertations. 152. https://commons.lib.jmu.edu/diss201019/152 This Dissertation is brought to you for free and open access by the The Graduate School at JMU Scholarly Commons. It has been accepted for inclusion in Dissertations by an authorized administrator of JMU Scholarly Commons. For more information, please contact [email protected]. The Effect of Workplace Noise Exposure on Reaction Time Hollis Taylor Leidy A dissertation submitted to the Graduate Faculty of JAMES MADISON UNIVERSITY In Partial Fulfillment of the Requirements for the degree of Doctor of Audiology Communication Sciences and Disorders May 2017 FACULTY COMMITTEE: Committee Chair: Ayasakanta Rout, Ph.D. Committee Members/ Readers: Brenda Ryals, Ph.D. Christopher Clinard, Ph.D. ACKNOWLEDGEMENTS I would like to thank my advisor and mentor, Dr. Ayasakanta Rout, for his support and guidance throughout my dissertation project and through my graduate school experience. His friendship has taken me down the path to my dream job in a profession I will always love. I would also like to thank my committee members, Drs. Christopher Clinard and Brenda Ryals, for their support throughout this process and my entire graduate school career. I could not imagine a different graduate school experience; I will carry this experience with me for the rest of my life.
    [Show full text]
  • Sound Synthesis with Auditory Distortion Products.” in Amacher, M
    Gary S. Kendall,∗ Christopher Haworth,† Sound Synthesis with and Rodrigo F. Cadiz´ ∗∗ ∗Artillerigatan 40 Auditory Distortion 114 45 Stockholm, Sweden [email protected] Products †Faculty of Music Oxford University St. Aldate’s, Oxford, OX1 1 DB, United Kingdom [email protected] ∗∗Center for Research in Audio Technologies, Music Institute Pontificia Universidad Catolica´ de Chile Av. Jaime Guzman´ E. 3300 Providencia, Santiago, Chile 7511261 [email protected] Abstract: This article describes methods of sound synthesis based on auditory distortion products, often called combination tones. In 1856, Helmholtz was the first to identify sum and difference tones as products of auditory distortion. Today this phenomenon is well studied in the context of otoacoustic emissions, and the “distortion” is understood as a product of what is termed the cochlear amplifier. These tones have had a rich history in the music of improvisers and drone artists. Until now, the use of distortion tones in technological music has largely been rudimentary and dependent on very high amplitudes in order for the distortion products to be heard by audiences. Discussed here are synthesis methods to render these tones more easily audible and lend them the dynamic properties of traditional acoustic sound, thus making auditory distortion a practical domain for sound synthesis. An adaptation of single-sideband synthesis is particularly effective for capturing the dynamic properties of audio inputs in real time. Also presented is an analytic solution for matching up to four harmonics of a target spectrum. Most interestingly, the spatial imagery produced by these techniques is very distinctive, and over loudspeakers the normal assumptions of spatial hearing do not apply.
    [Show full text]
  • August 15 – 19, 2018
    IHCON 2018 International Hearing Aid Research Conference August 15 – 19, 2018 Granlibakken Conference Center Tahoe City, California IHCON 2018 2 August 15-19, 2018 IHCON 2018 Sponsors ● -- ● -- ● National Institute on Deafness and Other Communication Disorders The Hearing Industry Research Consortium Massachusetts Eye and Ear Eaton-Peabody Laboratories IHCON 2018 3 August 15-19, 2018 IHCON 2018 ● -- ● -- ● Technical Chair Birger Kollmeier Technical Co-Chairs Virginia Best ● Tom Francart Organizational Co-Chairs Sunil Puria ● Sigfrid Soli Steering Committee Sunil Puria, Organizational Co-Chair Sigfrid D. Soli, Organizational Co-Chair Harvard Medical School and Massachusetts Eye and Ear The University of British Columbia, Canada ● -- ● -- ● ● -- ● -- ● Birger Kollmeier, Technical Chair (2018) Virginia Best, Technical Co-Chair (2018) University of Oldenburg, Germany Boston University, USA ● -- ● -- ● ● -- ● -- ● Tom Francart, Technical Co-Chair (2018) Peggy Nelson, Technical Chair (2016) KU Leuven, Belgium University of Minnesota, USA ● -- ● -- ● ● -- ● -- ● Torsten Dau, Technical Co-Chair (2016) Kathryn Arehart, Technical Co-Chair (2014) Technical University of Denmark, Denmark University of Colorado, USA ● -- ● -- ● ● -- ● -- ● Susan Scollie, Technical Co-Chair (2014) Brian Moore, Technical Chair (2008) University of Western Ontario, Canada University of Cambridge, United Kingdom ● -- ● -- ● ● -- ● -- ● Judy Dubno, Technical Co-Chair (2008) Brent Edwards, Technical Co-Chair (2006) Medical University of South Carolina, USA National Acoustic
    [Show full text]
  • The Effect of Auditory Fatigue on Reaction Time in Normal Hearing Listeners Beth I
    James Madison University JMU Scholarly Commons Dissertations The Graduate School Spring 2015 The effect of auditory fatigue on reaction time in normal hearing listeners Beth I. Hulvey James Madison University Follow this and additional works at: https://commons.lib.jmu.edu/diss201019 Part of the Speech Pathology and Audiology Commons Recommended Citation Hulvey, Beth I., "The effect of auditory fatigue on reaction time in normal hearing listeners" (2015). Dissertations. 26. https://commons.lib.jmu.edu/diss201019/26 This Dissertation is brought to you for free and open access by the The Graduate School at JMU Scholarly Commons. It has been accepted for inclusion in Dissertations by an authorized administrator of JMU Scholarly Commons. For more information, please contact [email protected]. The Effect of Auditory Fatigue on Reaction Time in Normal Hearing Listeners Beth I. Hulvey A dissertation submitted to the Graduate Faculty of JAMES MADISON UNIVERSITY In Partial Fulfillment of the Requirements for the degree of Doctor of Audiology Communication Sciences and Disorders May 2015 ACKNOWLEDGMENTS First and foremost, I would like to thank my advisor, Dr. Ayasakanta Rout, for his constant collaboration and patience throughout data collection and the writing process. I would also like to acknowledge my committee members, Dr. Brenda Ryals and Dr. Christopher Clinard, for all their wisdom and input into this project. In addition, I would like to thank Nick Arra and Ali Bove for their help in recruiting subjects for this study. Without the help of the aforementioned individuals, this project would not have been possible. ii TABLE OF CONTENTS Acknowledgments .............................................................................................................. ii Table of Contents ..............................................................................................................
    [Show full text]
  • (12) Patent Application Publication (10) Pub. No.: US 2009/0028356 A1 Ambrose Et Al
    US 20090028356A1 (19) United States (12) Patent Application Publication (10) Pub. No.: US 2009/0028356 A1 Ambrose et al. (43) Pub. Date: Jan. 29, 2009 (54) DIAPHONIC ACOUSTIC TRANSDUCTION Publication Classification COUPLER AND EAR BUD (51) Int. Cl. (75) Inventors: Stephen D. Ambrose, Longmont, GIOK II/6 (2006.01) CO (US); Samuel P. Gido, Hadley, H04R L/10 (2006.01) MA (US); Jimmy W. Mays, Knoxville, TN (US); Roland (52) U.S. Cl. ........................................ 381/71.6; 381/380 Weidisch, Schonebeck (DE); Robert B. Schulein, Schaumburg, IL (US) (57) ABSTRACT Correspondence Address: The disclosed methods and devices incorporate a novel THE LAW OFFICE OF EDWARD LBSHOP expandable bubble portion which provides superior fidelity to 1240 S. OLD FORGE CT. a listener while minimizing listener fatigue. The expandable PALATINE, IL 60067 (US) bubble portion may be expanded through the transmission of low frequency audio signals or the pumping of a gas to the (73) Assignee: ASIUSTECHNOLOGIES, LLC, expandable bubble portion. In addition, embodiments of the Beaverton, OR (US) acoustic device may be adapted to consistently and comfort (21) Appl. No.: 12/178,236 ably fit to any ear, providing for a variable, impedance match ing acoustic Seal to both the tympanic membrane and the (22) Filed: Jul. 23, 2008 audio transducer, respectively, while isolating the Sound-vi bration chamber within the driven bubble. This reduces the Related U.S. Application Data effect of gross audio transducer vibration excursions on the (60) Provisional application No. 60/951,420, filed on Jul. tympanic membrane and transmits the audio content in a 23, 2007, provisional application No.
    [Show full text]
  • Minimal and Mild Hearing Loss in School-Aged Children
    THE SIGNIFICANT IMPACTS OF A “TINY” HEARING LOSS Minimal and Mild Hearing Loss in School-Aged Children Maine Educational Center for the Deaf and Hard of Hearing © MECDHH 2017 THE SIGNIFICANT IMPACTS OF A “TINY” HEARING LOSS Presenters: Katherine Duncan, AuD Donna Casavant, MED, CAS [email protected] [email protected] © MECDHH 2017 OBJECTIVES • Define a minimal hearing loss. • Identify potential causes of minimal hearing loss. • Describe some educational, communication, and social impacts of minimal hearing loss. © MECDHH 2017 WHAT’S IN A WORD? © MECDHH 2017 WHAT IS MINIMAL HEARING LOSS? © MECDHH 2017 UNDERSTANDING AN AUDIOGRAM Increasing Pitch Low High Increasing Loudness Soft Loud Right Ear Inaccessible Left Ear Accessible Speech Sounds © MECDHH 2017 MINIMAL = SLIGHT + MILD • 16 - 40 dB HL range • At least one ear © MECDHH 2017 SLIGHT HEARING LOSS • 16-25 dB HL range • At least one ear • Individual thresholds or pure tone averages © MECDHH 2017 MILD HEARING LOSS • 26 - 40 dB HL range • At least one ear • Individual thresholds or pure tone averages © MECDHH 2017 IDENTIFICATION • Universal Newborn Hearing Screening • School-based screenings • Audiological follow-up © MECDHH 2017 WHY ARE MINIMAL HEARING LEVELS MISSED IN UNIVERSAL NEWBORN SCREENING PROCESS? • Limitations of audiometric equipment • No screening / lost to follow up © MECDHH 2017 LATE IDENTIFICATION • Progressive © MECDHH 2017 LATE IDENTIFICATION • Late onset © MECDHH 2017 TYPES OF MINIMAL HEARING LOSS Conductive Sensorineural • Outer ear • Inner ear * Canal
    [Show full text]