Bio-Inspired Broadband Sonar: Methods for Acoustical Analysis of Bat Echolocation and Computational Modeling of Biosonar Signal Processing
By Jason E. Gaudette
M.S., University of Rhode Island, May 2005
B.S., Worcester Polytechnic Institute, May 2003
Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Center for Biomedical Engineering at Brown University
Providence, Rhode Island
May 2014

© Copyright 2014 by Jason E. Gaudette

This dissertation by Jason E. Gaudette is accepted in its present form by the Center for Biomedical Engineering as satisfying the dissertation requirement for the degree of Doctor of Philosophy.
Date James A. Simmons, Advisor
Recommended to the Graduate Council
Date Elie L. Bienenstock, Reader
Date Rodney J. Clifton, Reader
Date Diane Hoffman-Kim, Reader
Date Sherief Reda, Reader
Date John R. Buck, External Reader
Approved by the Graduate Council
Date Peter M. Weber, Dean of the Graduate School
Curriculum Vitae
Jason E. Gaudette was born on October 9th, 1980 to Edward and Mary Gaudette and raised, with his younger sister Renee, in Raynham, Massachusetts. Graduating from Bridgewater-Raynham High School in 1999, he continued on to Worcester Polytechnic Institute to pursue a degree in Electrical Engineering. While an undergraduate, Jason studied abroad on three occasions: in Madrid, Spain; San Juan, Puerto Rico; and Limerick, Ireland. He received his Bachelor of Science in 2003 with distinction, a concentration in Computer Engineering, and a minor in International Studies. Immediately following graduation, Jason began his career at the Naval Undersea Warfare Center in Newport, RI as an Electrical Engineer. He enrolled in the graduate program at the University of Rhode Island in the Fall of 2003 and graduated in 2005 with a Master of Science in Electrical Engineering. Soon thereafter, Jason married his wife Elena; their two children, Lucas and Alexander, were born in 2006 and 2008. In the Fall of 2008 Jason enrolled in the Biomedical Engineering program at Brown University. Working with his advisor, Prof. James A. Simmons, Jason has been part of a highly interdisciplinary team of researchers studying bat echolocation. As an active member of this laboratory, Jason has co-authored several peer-reviewed journal articles, conference proceedings and abstracts, invited presentations, numerous research proposals, and a technical patent.
Jason E. Gaudette
[email protected]
Naval Undersea Warfare Center
1176 Howell Street
Newport, RI 02841
Professional Experience
Naval Undersea Warfare Center, Newport, RI (2003 – present)
Electrical Engineer and Research Scientist
• Lead engineer for electronics design and acoustic signal processing on various sonar programs, including acoustic countermeasure devices and forward-looking active sonar systems
• Principal investigator for bio-inspired broadband sonar research
• Experienced with design of low-noise acoustic transducer interface electronics, acoustic signal processing and analysis, and embedded systems development

Analog Devices, Inc., Limerick, Ireland (Fall 2002)
Precision Digital to Analog Converters
• Developed electronics and software for two customer evaluation board designs
• Completed WPI Senior design team project (MQP) in 10 weeks abroad

Analog Devices, Inc., Wilmington, MA (Summer 2002)
High-Speed Networking (HSN) Engineering Intern
• Developed and tested an integrated circuit communication interface using Agilent VEE and the I2C protocol
• Characterized high-speed transceiver electronics for a laser diode driver IC
Education
Brown University, Providence, RI (May 2014, exp.)
Ph.D., Biomedical Engineering
Advised by Dr. James A. Simmons

University of Rhode Island, Kingston, RI (May 2005)
M.S., Electrical Engineering

Worcester Polytechnic Institute, Worcester, MA (May 2003)
B.S., Electrical Engineering with Distinction
Concentration in Computer Engineering
Minor in International Studies
Awards and Honors
1. Full Member, Sigma Xi, Scientific Research Society, Brown University Chapter (2014).
2. J. E. Gaudette, L. N. Kloepper, M. Warnecke and J. A. Simmons, “Arrayzilla Lives! Visualizing the dynamic beam pattern of an echolocating bat,” 1st place video entry in the Gallery of Acoustics displayed at the 164th Meeting of the Acoustical Society of America, Kansas City, MO (October 2012).
3. “Special Achievement Award for Excellence in the Area of Basic and Applied Research,” Swampworks Lightweight Torpedo Project Team, Naval Undersea Warfare Center, Newport, RI (2007).
4. “Special Achievement Award for Excellence in the Area of Basic and Applied Research,” Biorobotic Research Team, Naval Undersea Warfare Center, Newport, RI (2006).
5. Member, Eta Kappa Nu, Electrical Engineering Honor Society, Gamma Delta Chapter at Worcester Polytechnic Institute, Worcester, MA (2003).
Grants and Fellowships
1. 2014–2016, ONR Research Grant, Code 341 Bio-Inspired Autonomous Systems Program, (J. E. Gaudette, Principal Investigator), $275K, “Computational modeling and experimental evaluation of a bio-inspired broadband sonar system.”
2. 2014–2016, NUWC Division Newport FY14 Independent Applied Research (IAR) Award, (J. E. Gaudette, Principal Investigator), $300K, “Bio-inspired broadband sonar system for high-resolution acoustic imaging applications.”
3. 2014–2016, NUWC Division Newport FY14 In-House Laboratory Independent Research (ILIR) Award, (J. DiCecco, P. I.; J. E. Gaudette, Associate Investigator), $300K, “Novel reconfigurable neuromorphic computing architectures for neural information processing.”
4. 2011–2013, NUWC Division Newport FY11-FY13 In-House Laboratory Independent Research (ILIR) Award, (J. E. Gaudette, Principal Investigator), $300K, “Bio-inspired broadband sonar receiver for clutter reduction: Computational modeling and system evaluation.”
5. 2010, NUWC Division Newport Academic Fellowship Award, (J. E. Gaudette, Principal Investigator), one-year sabbatical leave to Simmons’ Laboratory, Brown University, Providence, RI.
6. 2009, NUWC Division Newport FY09 Virtual In-House Laboratory Independent Research (V-ILIR) Award, (J. E. Gaudette, Principal Investigator), $85K, “Bio-inspired broadband sonar receiver for clutter reduction.”
Peer-Reviewed Journal Articles
1. J. E. Gaudette, L. N. Kloepper and J. A. Simmons, “Modeling of bio-inspired broadband sonar for high-resolution angular imaging,” J. Acoust. Soc. Am. (in prep.).
2. L. N. Kloepper, J. E. Gaudette, J. R. Buck, and J. A. Simmons, “Influence of mouth opening and gape angle on the transmitted signals of big brown bats (Eptesicus fuscus),” J. Acoust. Soc. Am. (in prep.).
3. L. N. Kloepper and J. E. Gaudette, “Exploring the dynamics of mammalian vocal-motor processes with emerging advanced technologies,” J. PostDoc. Res. (in review).
4. J. E. Gaudette, L. N. Kloepper, M. Warnecke and J. A. Simmons, “High resolution acoustic measurement system and beam pattern reconstruction method for bat echolocation emissions,” J. Acoust. Soc. Am., 135 (1), 513–520 (2014). doi: [10.1121/1.4829661]
5. J. DiCecco, J. E. Gaudette and J. A. Simmons, “Multi-component separation and analysis of bat echolocation calls,” J. Acoust. Soc. Am., 133 (1), 538–546 (2013). doi: [10.1121/1.4768877]
6. J. A. Simmons and J. E. Gaudette, “Biosonar echo processing by frequency-modulated bats,” IET Radar Sonar Navig., 6 (6), 556–565 (2012). doi: [10.1049/iet-rsn.2012.0009]
Conference Papers and Abstracts Presented
1. J. E. Gaudette† and J. A. Simmons, “Encoding phase information is critical for high resolution spatial imaging in biosonar,” in J. Acoust. Soc. Am., Providence, RI, May 2014.
2. J. E. Gaudette† and J. A. Simmons, “Modeling of bio-inspired broadband sonar for high-resolution angular imaging,” in J. Acoust. Soc. Am., San Francisco, CA, December 2013, p. 4052. doi: [10.1121/1.4830787]
3. L. N. Kloepper†, J. A. Simmons, J. E. Gaudette, R. Himmelwright and D. Robitzski, “Timing patterns of strobe groups for echolocating big brown bats performing a target detection task,” in J. Acoust. Soc. Am., San Francisco, CA, December 2013, p. 4119. doi: [10.1121/1.4831129]
4. J. E. Gaudette, L. N. Kloepper† and J. A. Simmons, “Object selection by head aim and acoustic gaze in the big brown bat,” in J. Acoust. Soc. Am., 133 (5), Montreal, Quebec, June 2013, p. 3406. doi: [10.1121/1.4805938]
5. J. A. Simmons, J. E. Gaudette and L. N. Kloepper†, “Object selection by head aim and acoustic gaze in the big brown bat,” in Proc. Meetings on Acoustics, Vol. 19 (010036), June 2013. doi: [10.1121/1.4800651]
6. J. E. Gaudette, L. N. Kloepper† and J. A. Simmons, “Large reconfigurable microphone array for transmit beam measurements of echolocating bats,” in J. Acoust. Soc. Am., 131 (4), Hong Kong, China, May 2012, p. 3361. doi: [10.1121/1.4708666]
7. J. E. Gaudette† and J. DiCecco, “Bio-inspired broadband sonar and multi-component time-frequency analysis,” presented at the Maritime Systems and Technology (MAST) Americas Conference, Washington, DC, 14 November 2011.
8. J. E. Gaudette†, J. M. Knowles, J. R. Barchi, and J. A. Simmons, “Computational model of a bio-inspired broadband receiver for sonar clutter reduction,” in J. Acoust. Soc. Am., 129 (4), Seattle, WA, 25 May 2011, p. 2507. doi: [10.1121/1.3588282]
9. J. M. Knowles†, J. E. Gaudette, J. R. Barchi and J. A. Simmons, “Reconstructing echolocation behavior using time difference of arrival localization and a distributed microphone array as a virtual Telemike,” in J. Acoust. Soc. Am., 129 (4), Seattle, WA, 23-27 May 2011, p. 2574. doi: [10.1121/1.3588496]
10. J. DiCecco† and J. E. Gaudette†, “Analysis of Active Sonar Waveform Design by Echolocating Mammals,” presented at the NATO Undersea Research Centre (NURC) Maritime Rapid Environmental Assessment (MREA10) Conference, Lerici, Italy, 13 October 2010.
11. J. E. Gaudette† and J. A. Simmons, “Modeling of precise onset spike timing for echolocation in the big brown bat, Eptesicus fuscus,” in J. Acoust. Soc. Am., 127 (3), Baltimore, MD, April 2010, p. 1861. doi: [10.1121/1.3384433]
12. J. R. Barchi†, J. E. Gaudette, J. M. Knowles and J. A. Simmons, “Bioacoustic and behavioral correlates of spatial memory in echolocating bats,” in J. Acoust. Soc. Am., 127 (3), Baltimore, MD, April 2010, p. 2030. doi: [10.1121/1.3385329]

† presented
Invited Lectures
1. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Exploiting biological solutions to simplify acoustic imaging,” Keynote Speaker for Winter Meeting of the Acoustical Society of America, Narragansett Chapter, 24 February 2014, Middletown, RI.
2. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Computational modeling and system evaluation,” NUWC Newport – Naval Research Laboratory (NRL) Joint Lecture Series, 18 June 2013, Washington, DC.
3. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Computational modeling and system evaluation,” NUWC Science and Technology ILIR Seminar Series, 2013, Newport, RI.
4. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Exploiting biological solutions to simplify acoustic imaging,” virtual teleconference presentation, ONR N-STAR Lecture Series, 3 April 2013. NUWC Division Newport, RI; Office of Naval Research, Arlington, VA; NSWC Panama City, FL.
5. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar for clutter reduction,” presentation at the UMASS Dartmouth – NUWC Newport Joint Technical Seminar Series, Dartmouth, MA, 2 November 2012.
6. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Computational modeling and system evaluation,” NUWC Science and Technology ILIR Seminar Series, 10 February 2012, Newport, RI.
7. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Computational modeling and system evaluation,” Brown University Biomedical Engineering Graduate Seminar Lecture, 7 February 2012, Providence, RI.
8. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar receiver for clutter reduction: Computational modeling and system evaluation,” Brown University Biomedical Engineering Graduate Seminar Lecture, 18 April 2011, Providence, RI.
9. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar receiver for clutter reduction: Computational modeling and system evaluation,” NUWC Science and Technology ILIR Seminar Series, 30 March 2011, Newport, RI.
Poster Sessions
1. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar for micro-aperture imaging,” poster presented at the FY2013 In-House Laboratory Independent Research (ILIR) Annual Program Review, 29 October 2013, Newport, RI.
2. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar,” poster presented at the N-STAR symposium, June 2012, Arlington, VA.
3. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Computational modeling and system evaluation,” poster presented at the FY2012 In-House Laboratory Independent Research (ILIR) Annual Program Review, October 2012, Newport, RI.
4. J. M. Knowles†, J. A. Simmons, J. M. Barchi, J. E. Gaudette, S. S. Horowitz and A. M. Simmons, “Cochlear processing in biosonar: Modeling sound transduction and the cochlear microphonic in echolocating bats,” poster presented at the Society for Neuroscience, 477.D.02, November 2011, Washington, DC.
5. J. E. Gaudette† and J. A. Simmons, “Bio-inspired broadband sonar: Computational modeling and system evaluation,” poster presented at the FY2011 In-House Laboratory Independent Research (ILIR) Annual Program Review, October 2011, Newport, RI.
6. J. E. Gaudette† and J. A. Simmons, “Sonar clutter reduction using bio-inspired broadband template matching,” poster presented at the FY2009 In-House Laboratory Independent Research (ILIR) Annual Program Review, October 2009, Newport, RI.
Teaching Experience
1. BN065, Biology of Hearing, guest lecturer; designed and delivered lecture notes with computer examples to approx. 80-100 students, “Fourier transform and spectral analysis related to acoustics and the auditory system,” 1 February 2012, Brown University, Providence, RI.
2. Sheridan Teaching Certificate: Level I Seminar Program, May 2010, Sheridan Center for Teaching and Learning, Brown University, Providence, RI.
3. BN065, Biology of Hearing, guest lecturer; presented two consecutive seminars to approx. 100-120 students, “Computational modeling of the auditory system,” 10 and 12 March 2010, Brown University, Providence, RI.
Preface and Acknowledgments
From the commencement of my graduate studies, my intention was to focus on something unique and interesting. I think most people would agree that researching bat sonar is exactly that. So much has been learned through this experience, both professionally and personally. Ultimately, the most important lesson is that time is truly our most valuable and limited resource, and it must be spent wisely.

I would first like to thank my wonderful wife, Elena. You have kept me going through the many times of uncertainty and frustration, reviewed my endless supply of presentations and manuscript revisions, and supported me in all of my endeavors. This was certainly a long journey and I could not have done it without your devotion.

To my parents, I would like to say that this is all your fault. You encouraged me to learn, and taught me the value of education, but forgot to tell me when to stop. Nevertheless, I will always appreciate everything you have done for me. I can only hope that I am able to instill the same set of values into my children.

Among the many other people who deserve acknowledgment for this dissertation are my family, close friends, and many of my teachers and colleagues at Brown, URI, and NUWC. My personal drive stems from all of these relationships and I would be remiss to overlook this fact. There are far too many people to thank individually, but I would regret not mentioning at least a few. My earliest interests in bio-inspired engineering stemmed from working closely with Alberico Menozzi, Henry Leinhos, David Beal, and Promode Bandyopadhyay at NUWC, and it was this initial exposure to biorobotics that has had a lasting impact. It was also by great fortune that I met John DiCecco, as his ideas on non-linear time-frequency analysis are what shaped the early parts of this dissertation. From the bat lab, I feel honored to have worked closely with
Jeff Knowles, an outstanding academic with whom I’ve shared many a philosophical felafel and who also launched my sailing career; Michaela Warnecke, who quickly transformed into the German I ask for answers to everything; Alyssa Wheeler, who made me appreciate the sheer difficulty of lab work; and Laura Kloepper, who taught me how to write good. Among the many people to review various drafts of my dissertation, I would also like to thank Andrea Simmons, David Segala, Robin Murray, and Jennifer Wardell for their many helpful comments and suggestions.

I would like to thank all of the members of my thesis committee for their comments, suggestions, criticisms, and overall guidance of my research. Shaping such broad objectives into substantial research requires the highly interdisciplinary expertise afforded by this group. I sincerely appreciate the considerable commitment toward this effort. I am particularly indebted to Prof. John Buck who, as an external advisor from UMass Dartmouth, asked the difficult questions that helped me to improve the overall quality of this research.

I owe a great deal of thanks to my advisor, Prof. James Simmons, who deserves most of the credit for inspiring the research in this dissertation. Jim’s devotion, creative ideas, cheerfulness, and infinite patience are just a few of the reasons that keep me passionate about this work.

Finally, all of my graduate courses and research to date have been funded through internal investments by the Naval Undersea Warfare Center in Newport, Rhode Island. I am pleased and extremely grateful to the Chief Technology Office as well as my management and colleagues for committing to employees’ professional and educational goals. Without this continued support, none of this could have been achieved.
Dedication
To my inquisitive children, Lucas and Alexander.
Table of Contents
Table of Contents xiv
List of Figures xvii
List of Symbols xx
1 Introduction 1
1.1 Motivation ...... 1
1.2 Significance ...... 5
1.2.1 Time-Frequency Analysis and the Auditory System ...... 5
1.2.2 Dynamic Behavior and Adaptation in Echolocation ...... 6
1.2.3 Toward the Design of a Bio-Inspired Broadband Sonar System ...... 7
1.3 Dissertation Objectives and Overview ...... 9
References ...... 10
2 Background 13
2.1 Acoustic Information Sensing and Processing by Mammals ...... 13
2.1.1 The Mammalian Auditory System ...... 14
2.1.2 Neural Information Processing by the Auditory System ...... 16
2.1.3 Auditory Cues for Passive Localization in Biological Systems ...... 17
2.1.4 Specializations for High-Resolution Active Acoustic Imaging ...... 19
2.2 Acoustic Imaging in Technological Systems ...... 24
2.2.1 Conventional Array Signal Processing ...... 24
2.2.2 Beam Patterns and Angular Resolution ...... 26
2.3 Model-Based Approach to Bio-Inspired Acoustic Imaging ...... 30
2.3.1 Auditory Modeling Insights and Oversights with Filter Banks ...... 31
2.3.2 Signal Processing Models for High-Resolution Range Estimates ...... 32
2.3.3 Models for Angular Target Localization and Acoustic Imaging ...... 36
2.3.4 Mathematical Models of Echolocation Performance ...... 37
2.3.5 Hardware Prototypes as Exploratory Models ...... 38
References ...... 39
3 Multi-Component Separation and Analysis of Bat Echolocation Calls 53
3.1 Introduction ...... 54
3.2 Data Collection ...... 57
3.3 Methods ...... 58
3.3.1 Separation of Harmonic Components ...... 58
3.3.1.1 Fractional Fourier Transform ...... 59
3.3.1.2 Rough Approximation of Instantaneous Frequency ...... 60
3.3.1.3 Zero-Phase Component Filtering ...... 62
3.3.2 Monocomponent Decomposition ...... 63
3.3.2.1 Empirical Mode Decomposition ...... 63
3.3.2.2 Hilbert Spectral Analysis ...... 65
3.3.3 Waveform Synthesis and Ground Truth ...... 66
3.4 Results ...... 67
3.4.1 Telemike Data Series ...... 67
3.4.2 Synthesized Multi-Component FM Analysis ...... 68
3.5 Discussion ...... 69
3.6 Acknowledgments ...... 72
A Multi-Component Frequency-Modulated Waveforms ...... 72
B Hilbert Spectral Analysis of Modulated Waveforms ...... 73
References ...... 75
4 High Resolution Acoustic Measurement System and Beam Pattern Reconstruction Method for Bat Echolocation Emissions 79
4.1 Introduction ...... 80
4.2 Data Collection ...... 82
4.3 Methods ...... 85
4.3.1 Beam Pattern Reconstruction ...... 85
4.3.2 Microphone and System Calibration ...... 88
4.4 Results ...... 90
4.4.1 Example Beam Pattern of a Circular Electrostatic Projector ...... 90
4.4.2 Example Beam Pattern of the Big Brown Bat, Eptesicus fuscus ...... 93
4.5 Discussion ...... 94
4.6 Acknowledgments ...... 98
References ...... 98
5 Modeling Bio-Inspired Broadband Sonar for High-Resolution Angular Imaging 101
5.1 Introduction ...... 102
5.2 Modeling Broadband Acoustic Information ...... 102
5.2.1 Environmental Acoustics ...... 103
5.2.1.1 The Transformation of Broadband Information in the Physical Environment ...... 103
5.2.1.2 Application of Broadband Transmission Loss to the Active Sonar Equation ...... 105
5.2.2 Transducer Directivity Patterns ...... 109
5.2.2.1 Broadband Spectral Information in Conventional Transducers ...... 109
5.2.2.2 Bio-Acoustic Baffle Structures and Implications for Modeling ...... 111
5.2.3 Reflective Scatterer Structure and Composition ...... 114
5.2.4 The Broadband Echo Spectrum in the Range-Azimuth Plane ...... 116
5.3 Extraction of Broadband Spatial Information from Echoes ...... 117
5.3.1 Quantifying the Angular Resolution Limit ...... 117
5.3.2 Broadband Acoustic Focusing with a Single Piston Transducer ...... 120
5.3.3 Broadband Acoustic Focusing with a Bio-Inspired Array ...... 121
5.3.4 Mutual Interference and the Diffraction Patterns of Scatterers ...... 123
5.4 Performance Comparison with Conventional Acoustic Imaging ...... 125
5.4.1 Processing Broadband Signals with Suboptimal Element Spacing ...... 126
5.4.2 Coherent Summation of Broadband Signals ...... 129
5.4.3 Limitations to Conventional Beamforming Comparisons ...... 131
5.5 Discussion ...... 132
5.6 Acknowledgments ...... 135
A Applying Biosonar Modeling to Underwater Acoustic Imaging ...... 135
References ...... 137
6 Discussion, Applications, Future Directions, and Concluding Remarks 143
6.1 Discussion ...... 143
6.2 Applications ...... 146
6.2.1 Multi-Component Signals and Time-Frequency Analysis ...... 146
6.2.2 Beam Pattern Measurement Instrumentation and Techniques ...... 147
6.3 Future Directions ...... 148
6.3.1 Time-Frequency Analysis of Bio-Acoustic Signals ...... 148
6.3.2 Acoustic Measurement and Visualization of the Multi-Dimensional Sound Field ...... 149
6.3.3 Bio-Inspired Broadband Sonar for Micro-Aperture Imaging ...... 150
6.4 Concluding Remarks ...... 151
References ...... 153
A Modeling of Precise Onset Spike Timing for Echolocation 154
A.1 Motivation for a Biophysical Model ...... 154
A.1.1 Coincidence Detection and Population Coding in the Auditory System ...... 156
A.2 Methods ...... 158
A.2.1 Peripheral System ...... 159
A.2.1.1 Outer and Middle Ear ...... 159
A.2.1.2 Cochlea and Basilar Membrane ...... 159
A.2.1.3 Meddis Auditory Peripheral Model ...... 160
A.2.1.4 Spike Refractory Equations ...... 161
A.2.2 Cochlear Nucleus ...... 162
A.2.2.1 Leaky IaF Model ...... 162
A.3 Results ...... 164
A.3.1 Auditory Stimuli ...... 164
A.3.1.1 Meddis Auditory Peripheral Model ...... 165
A.3.1.2 IaF Neurons ...... 165
A.3.1.3 Integration with BiSCAT ...... 166
A.4 Discussion ...... 166
References ...... 170
List of Figures
1.1 Close-up photograph of the big brown bat, Eptesicus fuscus and time-frequency diagram (spectrogram) for an example E. fuscus echolocation call ...... 2
1.2 The measured transmit and receive acoustic directivity, or beam patterns, of E. fuscus are plotted across the azimuth plane ...... 3
2.1 The mammalian auditory system mapped from the cochlea to the cortex ...... 15
2.2 Beam patterns in air from a line array of N = 10 omni-directional elements that are spaced at d = 1.72 cm ...... 27
2.3 Active underwater sonar data collected from the site of a shipwreck in Narragansett Bay, Rhode Island ...... 29
2.4 The magnitude, phase, and group delay response for a gammatone filter bank ...... 33
2.5 Block diagram of the Spectrogram Correlation and Transformation (SCAT) receiver model ...... 34
3.1 Four different time-frequency distributions of an FM echolocation call from E. fuscus ...... 55
3.2 Rotation-fraction domain of the E. fuscus signal from the FrFT ...... 61
3.3 Overview of harmonic component separation using a least-squares cubic approximation of instantaneous frequency, fi(t) ...... 63
3.4 Results of the empirical mode decomposition on the separated second harmonic, FM2, from E. fuscus ...... 64
3.5 Hilbert spectral analysis results showing instantaneous amplitude, ai(t), and frequency, fi(t), for each harmonic component of the E. fuscus call ...... 66
3.6 Multi-component analysis performed on call sequences from radiotelemetry recordings of E. fuscus and three Asian bat species ...... 67
3.7 Multi-component analysis results from the telemike data series plotted separately for FM1 and FM2 ...... 69
3.8 Standard time-frequency representations and multi-component analysis results for synthetic signals ...... 70
4.1 Photograph of fully constructed microphone array and close-up view of a microphone preamplifier circuit board showing the integrated MEMS microphone unit ...... 84
4.2 Flow chart describing the signal processing steps to reconstruct each beam ...... 86
xvii 4.3 Diagram showing microphone sensor positions mapped to spherical co- ordinates with the sound source positioned at the origin ...... 87 4.4 Aspect view and contour plot of the reconstructed transmit beam pat- tern of a 2 cm diameter transducer at its resonant frequency of 60 kHz ...... 91 4.5 Theoretical beam pattern of a piston transducer with 2 cm diameter inair...... 92 4.6 Aspect view and 6 dB contour plot of the reconstructed beam patterns for a single E. fuscus transmit pulse ...... 94
5.1 The total absorption effect in air and the three individual components that dominate in different frequency regions ...... 106
5.2 Absorption vs. frequency at 50% relative humidity plotted for temperatures between 0°C and 40°C in steps of 5° ...... 107
5.3 Combined transmission loss components due to both spherical spreading and absorption ...... 107
5.4 Relative echo strength vs. distance at different frequencies for an ideal 0 dB point reflector ...... 108
5.5 Theoretical directivity pattern for a piston transducer in air with a fixed circular aperture of 0.94 cm ...... 111
5.6 Example beam pattern data measured from an obliquely truncated horn ...... 113
5.7 The target strength of individual fish at dorsal aspect versus length ...... 115
5.8 Relative echo intensity as a function of range, azimuth, and frequency ...... 118
5.9 The region of focus after applying the L1 spectral distance around 4.5 m at 0° azimuth (a) and 25° off-axis ...... 120
5.10 A bio-inspired broadband sonar array utilizing only three circular piston-like elements ...... 122
5.11 The region of focus after applying the L1 spectral distance around 4.5 m at 0° azimuth for a single transmitter and a pair of identical receive transducers ...... 122
5.12 The time difference of arrival between two receiving transducers when separated by 1.4 cm in air ...... 123
5.13 The region of focus after combining binaural spectrogram correlation and TDOA estimates ...... 124
5.14 The beam patterns of an array with N = 10 omni-directional elements spaced at d = 1.4 cm in air ...... 128
5.15 The beam patterns of an array with N = 2 omni-directional elements spaced apart by d = 1.4 cm in air ...... 128
5.16 Summed beam patterns for a simple array of N = 2 elements spaced apart by d = 1.4 cm in air ...... 130
5.17 Absorption coefficient in water vs. frequency at various temperatures between -5°C and 35°C, depth of 0 m, salinity of 35 ppt, and acidity of 8.0 pH ...... 137
A.1 Action potentials recorded from a rat when presented with a low frequency sinusoidal stimulus ...... 157
A.2 Proposed neural network architecture of the auditory population coding ...... 158
A.3 Block diagram of the Meddis IHC model ...... 160
A.4 Time series and spectrogram of a synthetic linear FM and 2 pairs of echoes ...... 164
A.5 Magnitude and phase plot of 4 channels in a gammatone filterbank between 25 kHz and 100 kHz ...... 165
A.6 Example gammatone filterbank output using the signal as shown above and generated at 4 arbitrary frequencies ...... 166
A.7 Internal states of the Meddis model (k, q, c, & w) in response to a synthesized acoustic stimulus ...... 167
A.8 Pspike and resulting spike train for 40 LSR auditory nerve fibers ...... 167
A.9 Membrane potential and spikes with 4 integrate-and-fire neurons ...... 168
A.10 Integrate-and-fire neurons (M=4) with random, but overlapping synaptic input (N=100) ...... 169
A.11 Layout of each of three tabbed panels in the BiSCAT GUI ...... 172
List of Symbols
This dissertation spans many fields, including acoustics, biology, and engineering. Where noted in the descriptions below, the application of symbols is context specific. Acoust: acoustics and acoustic modeling; Anat: anatomy; ASP: array signal processing; Model: auditory modeling and linear filter theory; TFA: time-frequency analysis.
Abbreviations
AC Anat: auditory cortex ...... 4
AN Anat: auditory nerve ...... 14
ARMA Model: auto-regressive moving-average ...... 86
ATR automatic target recognition ...... 116
AVCN Anat: anteroventral cochlear nucleus ...... 15
BM Anat: basilar membrane ...... 15
CN Anat: cochlear nucleus ...... 14
CRLB Cramer-Rao lower bound ...... 37
DCN Anat: dorsal cochlear nucleus ...... 15
DRNL Model: dual-resonance non-linear ...... 31
EMD TFA: empirical mode decomposition ...... 64
FFT TFA: fast Fourier transform ...... 88
FM frequency modulated ...... 80
FPGA field programmable gate array ...... 7
FrFT TFA: fractional Fourier transform ...... 5
FT TFA: Fourier transform ...... 5
HPBW ASP: half-power beam width ...... 28
HRTF Acoust: head-related transfer function ...... 18
IC Anat: inferior colliculus ...... 4
IHC Anat: inner hair cells ...... 16
IID Acoust: interaural intensity difference ...... 18
IIR Model: infinite impulse response ...... 31
IMF TFA: intrinsic mode function ...... 64
ITD Acoust: interaural time difference ...... 18
JAMF TFA: joint acoustic and modulation frequency ...... 6
LSO Anat: lateral superior olive ...... 15
LTI Model: linear time-invariant ...... 6
MA Model: moving average ...... 88
MEMS micro electro-mechanical systems ...... 83
MRA Acoust: main response axis ...... 3
MSO Anat: medial superior olive ...... 15
NLL Anat: nucleus of the lateral lemniscus ...... 15
NTB Anat: nucleus of the trapezoidal body ...... 15
OHC Anat: outer hair cells ...... 15
PVCN Anat: posteroventral cochlear nucleus ...... 15
RCF Model: rectify, compress, and filter ...... 34
RWT TFA: Radon-Wigner transform ...... 59
SCAT Model: spectrogram correlation and transformation ...... 34
SOC Anat: superior olivary complex ...... 14
SPL Acoust: sound pressure level ...... 90
STFT TFA: short-time Fourier transform ...... 5
TDOA time difference of arrival ...... 85
TFR TFA: time-frequency representation ...... 56
VLSI very-large scale integrated ...... 38
VRDR Model: variable resolution and detection receiver ...... 36
WVD TFA: Wigner-Ville distribution ...... 5
Variables
α Acoust: frequency dependent acoustic absorption coefficient ...... 88
α TFA: normalized fractional angle of rotation ...... 59
β angle of truncation for an acoustic horn ...... 112
λ Acoust: wavelength in the medium ...... 18
φ TFA: angle of fractional rotation in radians ...... 59
φ(f) Model: phase response of a filter ...... 33
φ0 TFA: initial phase of a modulated signal ...... 66
φi(t) TFA: instantaneous phase law ...... 62 ρ Acoust: atmospheric pressure ...... 104 ψ ASP: steered angle of an array ...... 25 xˇ(t) TFA: original analytic signal, demodulated ...... 62 yˇ(t) TFA: isolated analytic component, demodulated ...... 62
df (θ) ASP: array steering vector, 1 × N ...... 26 x˜(t) TFA: original analytic signal, unmodulated ...... 60 y˜(t) TFA: isolated analytic component, unmodulated ...... 62 ai(t) TFA: instantaneous amplitude ...... 65 D Acoust: depth in water, m ...... 136 d ASP: distance between sensors ...... 18 d Acoust: acoustic propagation distance ...... 88 d0 Acoust: reference distance of a sound source ...... 88 f frequency ...... 104 fi(t) TFA: instantaneous frequency ...... 60 fs sampling rate of a discrete-time signal ...... 65 hr Acoust: relative humidity ...... 104 N ASP: number of elements in an array ...... 25 pH Acoust: acidity, pH ...... 136
xxii S Acoust: salinity, ppt ...... 136 T Acoust: temperature ...... 104 TL Acoust: transmission loss ...... 88 u TFA: fractional dimension between time and frequency ...... 59 W ASP: aperture shading matrix, diagonal N × N ...... 26 x ASP: array data vector, 1 × N ...... 26 Y (f, ψ) ASP: frequency domain array response ...... 26
Chapter 1
Introduction
The biosonar system of echolocating bats, dolphins, and whales represents the most advanced acoustic imaging solution known to exist. The sophistication of biosonar lies not in its complexity, but in the real-time performance achieved by a minimalistic set of hardware: a few acoustic baffles1 and a compact network of neural circuitry. The primary focus of this dissertation is on improving our understanding of how animals perceive images of objects from packets of acoustic echoes. The motivation behind this research is presented first, followed by its significance in the context of the current state of the art. The last section states the research objectives and provides an overview of the remaining dissertation chapters.
1.1 Motivation
Echolocation is a complex active sensory system in which animals forage and navigate in their environment primarily using emitted acoustic signals. By producing intense, ultrasonic signals and receiving their returning echoes, echolocating animals can identify, discriminate, and track prey, often in highly cluttered environments. Bats and toothed whales (Microchiroptera and Odontoceti) are two distinctly different suborders of mammals that convergently evolved echolocation, and both have been intensely investigated to understand the mechanisms that may translate to man-made sonar and radar systems [1]. The big brown bat, Eptesicus fuscus, is an ideal model organism for investigating echolocation. These bats produce short broadband signals with ultrasonic frequencies between 20 and 100 kHz and with a bandwidth-to-center frequency ratio greater than unity (Fig. 1.1b). The signals are downward FM sweeps with three harmonically related components spanning several octaves. The duration and the repetition rate of the signals depend on the distance of nearby objects, with both decreasing as the bat approaches targets [2]. Based on the intensity of emitted sounds, transmission losses, and the strength of acoustic reflections from insect prey, big brown bats can detect prey at distances up to 20 m [3].
1 Acoustic baffles refer to any physical boundary layers or structures in close proximity to the sound transmission source or receiving sensors. Acoustic baffles serve to block or guide sound waves propagating in a particular direction. In biosonar, acoustic baffles refer specifically to a bat’s mouth or nose for transmission and its ears for reception. The baffles of underwater marine mammals consist of the melon for sound emission and the mandibles for sound reception. In general, the head may be included when it has a significant impact on propagating sound waves.
[Figure 1.1 image: (A) photograph of E. fuscus; (B) spectrogram showing harmonics FM1, FM2, and FM3, with axes Frequency (kHz) vs. Time (ms) and amplitude in dB]
Figure 1.1. (a) A close-up photograph of the big brown bat, Eptesicus fuscus, is shown to highlight the complex set of acoustic baffles – its ears and mouth. The spatial beam or directivity patterns are determined by the geometry of these baffles, which transform the magnitude and phase of sound waves propagating into the inner ears or out from the larynx. (b) The time-frequency diagram (spectrogram) is shown for an example E. fuscus echolocation call along with the corresponding time series (top) and spectral density (side) of the same call. This bat species emits broadband signals that consist of harmonically related components spanning several octaves. The ratio of the bandwidth to center frequency provides an indirect measure of how much a directivity pattern will change naturally over the entire operating frequency range. In the case of E. fuscus, this ratio is greater than unity, but quantities less than 0.2 are common for most man-made active sonar systems.
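The broadband, multi-harmonic structure described above can be made concrete with a short synthetic example. The sketch below (Python with NumPy; all sweep parameters and amplitudes are illustrative assumptions, not measured call values) generates a downward FM fundamental with two overtones spanning roughly 20 to 100 kHz and estimates the bandwidth-to-center-frequency ratio from the magnitude spectrum:

```python
import numpy as np

fs = 500_000                      # sampling rate (Hz), well above the 100 kHz content
dur = 0.003                       # 3 ms call duration (illustrative)
t = np.arange(int(fs * dur)) / fs

# Downward linear sweep for the fundamental (FM1); values chosen so the three
# harmonics together cover roughly 20-100 kHz, as in the text.
f_start, f_end = 33_000.0, 20_000.0
f_inst = f_start + (f_end - f_start) * t / dur          # instantaneous frequency
phase = 2 * np.pi * np.cumsum(f_inst) / fs
call = np.zeros_like(t)
for k, amp in [(1, 1.0), (2, 0.5), (3, 0.25)]:          # FM1, FM2, FM3
    call += amp * np.sin(k * phase)
call *= np.hanning(len(t))                              # smooth onset/offset

# Occupied band: frequencies whose magnitude exceeds -26 dB re the spectral peak
spec = np.abs(np.fft.rfft(call))
freqs = np.fft.rfftfreq(len(call), 1 / fs)
band = freqs[spec > 0.05 * spec.max()]
f_lo, f_hi = band.min(), band.max()
bw, fc = f_hi - f_lo, 0.5 * (f_hi + f_lo)
print(f"occupied band {f_lo/1e3:.0f}-{f_hi/1e3:.0f} kHz, BW/fc = {bw/fc:.2f}")
```

As with the measured calls, the ratio comes out greater than unity, which is what distinguishes these signals from the narrowband waveforms (ratios below about 0.2) of most man-made active sonars.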
The echolocation signals of big brown bats are produced in the larynx and transmitted through the mouth. The center of the directed energy, or main response axis (MRA), points straight ahead at zero degrees across all frequencies. The angular width of the energy can be reasonably approximated by a 4.7 mm radius piston transducer (Fig. 1.2a) [4]. The returning echoes are received bilaterally through each ear. The receiver MRA shifts from off-axis at low frequencies toward on-axis at high frequencies due to the shape of the ears, which can be approximated as obliquely truncated horns (Fig. 1.2b-c) [5, 6]. A common characteristic of biosonar is that these beams are broad in angle, even at high frequencies. Despite having broad beams, these bats are able to achieve angular acuity of 1.5◦ and 3◦ in azimuth and elevation, respectively (Fig. 1.2d) [7, 8].
[Figure 1.2 plots: beam patterns in four panels: A (Hartley & Suthers, 1989), B and C (Aytekin et al., 2004), D (Simmons et al., 1983)]
Figure 1.2. The measured transmit and receive acoustic directivity, or beam patterns, of E. fuscus are plotted across the azimuth plane at the specific frequencies of 25 (red), 40 (green), 60 (blue), and 80 kHz (yellow). (a) The transmit beam is emitted through the bat’s mouth. The main response axis (MRA) is straight forward at 0◦ across all frequencies and can be reasonably approximated by a 4.7 mm radius piston transducer [4]. (b and c) The sound reception pattern as measured bilaterally through each ear [5]. Notice that the MRA shifts from off-axis at low frequencies toward on-axis at high frequencies, which is a characteristic of the shape of the ears and can be approximated as an obliquely truncated horn [6]. Due to the limited acoustic aperture, the beam patterns are very broad in angle, even as they become narrower at high frequencies. (d) Despite having very broad beam widths, the angular acuity as measured by a behavioral discrimination task is 1.5◦ in azimuth [7] and 3.0◦ in elevation [8]. This is surprising, because man-made imaging sonar systems generally depend upon narrow transmit and/or receive beams, which require a much larger acoustic aperture (physical or synthetic) for the same frequencies considered here.
The fundamental question is how bats can achieve such fine degrees of acuity
with broad beamwidths? A conventional sonar system operating in air over the same frequency range as the big brown bat would require an array length, or aperture, of approximately 1.1 m to achieve 1.5◦ angular resolution in azimuth. Furthermore, element-to-element spacing of 1.7 mm would need to be maintained to avoid ambiguous localization [9], which demands approximately 640 array elements in total. This array design becomes completely intractable if the requirement of 3.0◦ is simultaneously imposed for elevation. Remarkably, the big brown bat requires only two ears spaced 1.4 cm apart (Fig. 1.1a) – a reduction in array aperture of about 80 times and at least two orders of magnitude fewer sensors.

Behavioral and neurophysiological evidence shows that bats perform spatial imaging by exploiting three pieces of salient information: 1) the absolute time delay between an emitted pulse and incident echoes, 2) the relative time delay of echoes between the ears, and 3) the broadband spectral patterns encoded internally by the bat’s complex acoustic baffles and externally by the environment and reflective scatterers. Acoustic imaging in azimuth requires fusing this information together, whereas imaging in elevation is achieved with only the spectral information available to each ear. More specifically, it is known that the spatial imaging process relies upon precise neural timing of echoes arriving at each ear [10, 11] and neural decoding of the frequency-dependent spectral patterns introduced by the unique structure of the bats’ ears [8, 12].

Biosonar research, and indeed neuroscience in general, has advanced prodigiously in a relatively short period of time. Nevertheless, this field is still in its infancy compared to the direction it is heading. Numerous mysteries remain about the underlying mechanisms of animal echolocation and about how the biological solution can be exploited to improve man-made technologies.
Ultimately, the persistence of researchers in this field will be rewarded by a higher level of understanding of acoustic information processing in the mammalian brain. Although mimicking biosonar may not be an optimal solution for all aspects of engineered acoustic sensing and imaging, there are
a multitude of important applications where biosonar has the potential to change the way future generations of acoustic imaging systems are conceptualized and designed.
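The array sizing argument above can be checked with simple back-of-the-envelope arithmetic. In the sketch below (Python), the 1.1 m aperture is taken from the text as given, since the exact value depends on which angular-resolution criterion is applied; the half-wavelength element spacing and the resulting element count then follow directly:

```python
import math

c = 343.0              # speed of sound in air (m/s), assuming roughly 20 degrees C
f_max = 100e3          # upper end of the big brown bat's band (Hz)

lam = c / f_max                      # shortest wavelength in the band
spacing = lam / 2                    # half-wavelength spacing avoids grating lobes [9]
aperture = 1.1                       # aperture quoted in the text (m)
n_elements = math.ceil(aperture / spacing) + 1   # elements in a filled line array
print(f"wavelength {lam*1e3:.2f} mm, spacing {spacing*1e3:.2f} mm, "
      f"elements {n_elements}")
```

This reproduces the roughly 1.7 mm spacing and the element count on the order of 640 quoted above, against which the bat's two-sensor solution can be compared.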
1.2 Significance
1.2.1 Time-Frequency Analysis and the Auditory System
Time-frequency analysis, at the most basic level, is the extraction or interpretation of information from a signal that varies in time. It has traditionally been understood as a decomposition into individual sine waves of different frequencies and amplitudes, i.e., the Fourier transform and its time-varying counterpart, the short-time Fourier transform (STFT). Considerable effort has been spent on understanding the relationship between time and frequency, or between time and other domains (e.g., scale). Today, we have alternative developments such as the quadratic representations (Wigner-Ville distribution (WVD), Altes distribution, etc.) [13, 14], the scalogram, the fractional Fourier transform (FrFT) [15], the reassignment method [16], wavelets, and synchrosqueezing [17]. Most of this work has been directed toward the creation of tools for humans and machines to better understand, analyze, and visualize complex time-based signals, especially the propagating waves that are abundant in the real physical world: acoustic, electromagnetic (including radar and light), seismic, and so on.

In the field of bio-acoustics, time-frequency analysis is an essential tool for researchers to understand and interpret the sounds emitted by animals; however, intercepting and recording the sounds of live animals is only part of the problem. We currently have a great number of mathematical and computational models of the auditory system. These include models of the cochlea at the molecular level, mechanical micro-models of the elastic basilar membrane, stochastic models of the auditory-to-neural transduction, and linear time-invariant (LTI) filter bank models. There also exist a great number of models that seek to interpret sound mathematically
using alternative transforms (e.g., spectro-temporal modulations [18], joint acoustic and modulation frequency (JAMF) [19]) or higher-order statistics [20]. Despite all of these models, the basic relationship that links pitch, timbre, and loudness to time-frequency analysis eludes us, because these characteristics are psychologically and physiologically induced effects, not physical manifestations of sound. Even so, these effects are unambiguously understood and agreed upon by all humans when we listen to the difference between a note played on the piano and the same note played on the guitar. The relationship between time and frequency within the auditory system is at the core of understanding the intricate nuances of music, speech, communication, and biosonar.
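For readers less familiar with the STFT mentioned above, the following minimal sketch (Python with NumPy; the chirp parameters are illustrative, and the hand-rolled framing stands in for library STFT routines) computes a spectrogram of a downward chirp and recovers its frequency-versus-time law from the ridge of the time-frequency surface:

```python
import numpy as np

fs = 250_000
dur = 0.003
t = np.arange(int(fs * dur)) / fs
# Linear downward chirp, 100 kHz -> 40 kHz (an illustrative stand-in for a call)
f0, f1 = 100e3, 40e3
x = np.sin(2 * np.pi * (f0 * t + (f1 - f0) / (2 * dur) * t ** 2))

# Minimal short-time Fourier transform: windowed frames, hop, magnitude of rfft.
# The window length trades time resolution against frequency resolution.
nwin, hop = 128, 32
win = np.hanning(nwin)
frames = np.array([x[i:i + nwin] * win for i in range(0, len(x) - nwin, hop)])
S = np.abs(np.fft.rfft(frames, axis=1))          # frames x frequency bins
freqs = np.fft.rfftfreq(nwin, 1 / fs)

# The spectrogram ridge approximates the signal's instantaneous frequency law
ridge = freqs[np.argmax(S, axis=1)]
print(f"ridge: {ridge[0]/1e3:.0f} kHz near onset -> {ridge[-1]/1e3:.0f} kHz near the end")
```

The ridge descends from near the starting frequency toward the ending frequency, illustrating how the STFT exposes a frequency law that a plain Fourier transform would collapse into a single broadband spectrum.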
1.2.2 Dynamic Behavior and Adaptation in Echolocation
Echolocating animals exhibit a great deal of adaptability with their sonar systems. This dynamic control is seen in time-frequency pulse design [21], as well as in the spatial directivity of the emitted signals [22]. Even at the reception of acoustic echoes, echolocating animals can rapidly change their receiver directivity patterns by mechanical adjustments to the acoustic baffles [23]. The ultimate example of biosonar adaptation lies within the neural computations of the brain. Short-term plasticity in the auditory system is responsible for adapting to environmental uncertainties and maintaining highly precise internal spatial representations [24, 25]. Neural adaptation is the reason echolocation has been so successful across the many different species of echolocating bats, dolphins, and whales. Without this adaptation, animals would be ill-equipped to handle new challenges found in the natural world. We are now at the forefront of exploring dynamic behavior in echolocation and have only recently begun to realize the extent to which it is used [26, 27, 28, 29].

Understanding the nature of this dynamic behavior in echolocation requires new and creative approaches to experimental design. For example, past approaches to measuring beam patterns have been hampered by the assumption that transmit and
receive beams remain constant from pulse to pulse. This choice was partly a consequence of limitations in measurement technology, but also because these assumptions are highly convenient. Advances in sensing and computing are enabling the creation of new tools and methods for studying behavioral dynamics that were never before possible. In particular, field programmable gate arrays (FPGA) are being used to rapidly build customized digital hardware of increasing complexity. One important application for FPGAs is acoustic measurement systems that demand a large number of data acquisition channels. Data collection must be performed in parallel to maintain synchronous sampling, and without these new technologies, the options are prohibitively complex or expensive to implement. The data volume requirements that accompany this new capability are also expanding, which implies the use of high-throughput, high-density transceivers and storage devices. One difficulty is that as sensing and measurement become easier, data dimensionality increases and new visualization techniques are needed. Fortunately, computing power and data processing have kept pace with sensing developments. Amongst the vast bio-diversity of echolocating mammals, there remain countless discoveries to be made about dynamic behavior and physiological adaptations. As researchers, we must acknowledge that our assumptions may be questionable and find new, intelligent ways of correcting and validating our hypotheses.
1.2.3 Toward the Design of a Bio-Inspired Broadband Sonar System
The implications of developing a bio-inspired broadband sonar system are profound and far-reaching. Biosonar is not merely a theoretical development; it is a proven high-resolution acoustic imaging system that is functional and robust. The exceptional performance and adaptability of animal echolocators in the midst of dense clutter is what draws engineers and scientists to marvel at the system's simplicity. Section 2.2 describes how conventional beamforming is done and shows a clear example that this acoustic imaging approach is in wide use today. Advanced sonar systems are considered advanced because they employ some way to improve acoustic imaging performance beyond the fundamental limitations imposed by the wavelength-to-aperture ratio, λ/L. Performance gains always come with tradeoffs, such as extra processing or bold assumptions that limit widespread application. Resolution improvements of 2 to 5 times are immediately championed as a success, but biosonar has shown that it is possible to achieve the same angular resolution with orders of magnitude less hardware complexity.

Besides achieving higher resolution with fewer sensors, biosonar is superior to conventional sonar systems in numerous other respects. The versatility and adaptability already mentioned are traits that man-made systems severely lack. Echolocating bats use strobe groups to avoid pulse-echo ambiguity and increase pulse-repetition rates when more information is needed. Dynamic usage of echolocation beams is not new, but the way in which bats, dolphins, and whales direct their beams off-axis is. Animals are clearly capable of sonar self-calibration as a superior form of matched-field processing. Any sonar system that can mimic biosonar in these respects would be capable of functioning in a broader range of environments and situations, such as dense foliage in air or cluttered harbors in shallow water. Biomimetic sonar systems will ultimately bring advanced sensing and imaging capabilities to smaller autonomous systems and wearable augmented sensing devices for humans.

In the very near future, a slew of new processing methods will be developed while attempting to replicate the neural information processing of the auditory system. Alongside these developments comes the general advancement of auditory neuroscience. The ability to truly understand and replicate the neural dynamics and architectures at various stages of the auditory system will bring new brain-machine interfaces for the hearing impaired.
Advances in technology for speech recognition and synthesis are already showing promise for many commercial applications, such as automated call routing, portable phone and GPS devices, and instant language translation. With the advent of such technological advancements, humans
are not far from the creation of fully autonomous systems and machines that hear, interpret, and produce sound in exactly the same manner as animals.
1.3 Dissertation Objectives and Overview
The research objectives of this dissertation are to 1) improve our understanding of acoustic imaging in biosonar from an engineering perspective, and 2) apply this insight toward the development of a compact bio-inspired broadband sonar system. Chapter 2 presents the background information necessary for the rest of the dissertation. Chapters 3 and 4 comprise recently published methods for bio-acoustic analysis. In particular, Chapter 3 addresses the need for new time-frequency analysis methods to study multi-harmonic waveforms, such as bat echolocation signals. This new approach enables bioacousticians to perform multi-component signal analysis with improved resolution and accuracy. The robust method enables automatic extraction of useful information from a large ensemble of transmitted signals. Chapter 4 describes the design and construction of an apparatus for capturing the beam patterns of bats' consecutive transmit pulses with high fidelity. Also described is a method for processing the acoustic signals to reconstruct the beam patterns for visualization and further analysis. Such a system is unprecedented and will elucidate the dynamics of bats' beam patterns during controlled echolocation experiments. Chapter 5 outlines a numerical model of the physical acoustics to understand the rich set of information available in broadband bio-acoustic echoes. The modeling approach is unique because it is the first study of its kind to look in detail at how broadband signals are transformed in the frequency domain from sound emission to reception of echoes. This chapter also shows a simple method for first quantifying the achievable resolution and then analyzing the sensitivity of that resolution to changing environmental parameters. A significant development here is the demonstration that high resolution can be achieved using only a few transducers without any complex
acoustic baffles. Chapter 6 presents a discussion on applications, future directions, and concluding remarks. Finally, Appendix A describes a biophysical model of the bat's auditory peripheral system and demonstrates a simple example of event-based neuronal coincidence detection.
References
[1] W. Au and J. Simmons, “Echolocation in dolphins and bats”, Phys. Today 60, 40–45 (2007).
[2] A. Surlykke and C. F. Moss, “Echolocation behavior of big brown bats, Eptesicus fuscus, in the field and the laboratory”, J. Acoust. Soc. Am. 108, 2419–2429 (2000).
[3] A. Surlykke, P. E. Nachtigall, R. R. Fay, and A. N. Popper, eds., Biosonar, volume 51 of Springer Handbook of Auditory Research (Springer, New York) (2014).
[4] D. Hartley and R. Suthers, “The sound emission pattern of the echolocating bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 85, 1348–1351 (1989).
[5] M. Aytekin, E. Grassi, M. Sahota, and C. Moss, “The bat head-related transfer function reveals binaural cues for sound localization in azimuth and elevation”, J. Acoust. Soc. Am. 116, 3594–3605 (2004).
[6] N. H. Fletcher and S. Thwaites, “Obliquely truncated simple horns: Idealized models for vertebrate pinnae”, Acustica 65, 194–204 (1988).
[7] J. A. Simmons, S. A. Kick, B. D. Lawrence, C. Hale, C. Bard, and B. Escudie, “Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 153, 321–330 (1983).
[8] J. Wotton, T. Haresign, M. Ferragamo, and J. Simmons, “Sound source elevation and external ear cues influence the discrimination of spectral notches by the big brown bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 100, 1764–1776 (1996).
[9] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques (Prentice Hall PTR, Upper Saddle River, NJ) (1993).
[10] C. Moss and J. Simmons, “Acoustic image representation of a point target in the bat Eptesicus fuscus: Evidence for sensitivity to echo phase in bat sonar”, J. Acoust. Soc. Am. 93, 1553–1562 (1993).
[11] J. A. Simmons and J. E. Gaudette, “Biosonar echo processing by frequency-modulated bats”, IET Radar Sonar Navig. 6, 556–565 (2012).
[12] J. Wotton and J. Simmons, “Spectral cues and perception of the vertical position of targets by the big brown bat, Eptesicus fuscus”, J. Acoust. Soc. Am. 107, 1034–1041 (2000).
[13] A. Papandreou, F. Hlawatsch, and G. Boudreaux-Bartels, “The hyperbolic class of quadratic time-frequency representations. I. Constant-Q warping, the hyperbolic paradigm, properties, and members”, IEEE Trans. Signal Process. 41, 3425–3444 (1993).
[14] A. Papandreou-Suppappola and L. T. Antonelli, “Use of quadratic time-frequency representations to analyze cetacean mammal sounds”, Technical Report 11,284, Naval Undersea Warfare Center, Newport, RI (2001).
[15] H. M. Ozaktas, M. A. Kutay, and D. Mendlovic, “Introduction to the fractional Fourier transform and its applications”, Adv. Imag. Elect. Phys. 106, 239–291 (1999).
[16] F. Auger and P. Flandrin, “Improving the readability of time-frequency and time-scale representations by the reassignment method”, IEEE Trans. Signal Process. 43, 1068–1089 (1995).
[17] F. Auger, P. Flandrin, L. Qiang, S. McLaughlin, S. Meignen, T. Oberlin, and H.-T. Wu, “Time-frequency reassignment and synchrosqueezing: An overview”, IEEE Signal Process. Mag. 30, 32–41 (2013).
[18] T.-S. Chi and C.-C. Hsu, “Multiband analysis and synthesis of spectro-temporal modulations of Fourier spectrogram”, J. Acoust. Soc. Am. 129, EL190–EL196 (2011).
[19] L. Atlas and S. A. Shamma, “Joint acoustic and modulation frequency”, EURASIP Journal on Applied Signal Processing 2003, 668–675 (2003).
[20] S. Bourennane and A. Bendjama, “Locating wide band acoustic sources using higher order statistics”, Applied Acoustics 63, 235–251 (2002).
[21] S. Hiryu, M. E. Bates, J. A. Simmons, and H. Riquimaroux, “FM echolocating bats shift frequencies to avoid broadcast-echo ambiguity in clutter”, Proc. Natl. Acad. Sci. U.S.A. 107, 7048–7053 (2010).
[22] N. Matsuta, S. Hiryu, E. Fujioka, Y. Yamada, H. Riquimaroux, and Y. Watanabe, “Adaptive beam-width control of echolocation sounds by CF-FM bats, Rhinolophus ferrumequinum nippon, during prey-capture flight”, J. Exp. Biol. 216, 1210–1218 (2013).
[23] L. Gao, S. Balakrishnan, W. He, Z. Yan, and R. Müller, “Ear deformations give bats a physical mechanism for fast adaptation of ultrasonic beam patterns”, Phys. Rev. Lett. 107, 214301 (2011).
[24] B. J. Fischer, L. J. Steinberg, B. Fontaine, R. Brette, and J. L. Peña, “Effect of instantaneous frequency glides on interaural time difference processing by auditory coincidence detectors”, Proc. Natl. Acad. Sci. U.S.A. 108, 18138–18143 (2011).
[25] R. Rao and T. Sejnowski, “Spike-timing-dependent Hebbian plasticity as temporal difference learning”, Neural Comput. 13, 2221–2237 (2001).
[26] L. N. Kloepper, P. E. Nachtigall, M. J. Donahue, and M. Breese, “Active echolocation beam focusing in the false killer whale, Pseudorca crassidens”, J. Exp. Biol. 215, 1306–1312 (2012).
[27] P. H. S. Jen, “Adaptive mechanisms underlying the bat biosonar behavior”, Front. Biol. 5, 128–155 (2010).
[28] M. Aytekin, B. Mao, and C. F. Moss, “Spatial perception and adaptive sonar behavior”, J. Acoust. Soc. Am. 128, 3788–3798 (2010).
[29] P. W. Moore, L. A. Dankiewicz, and D. S. Houser, “Beamwidth control and angular target detection in an echolocating bottlenose dolphin (Tursiops truncatus)”, J. Acoust. Soc. Am. 124, 3324–3332 (2008).
Chapter 2
Background
This chapter introduces background material relevant to the research found in the next several chapters of the dissertation. The first section reviews the general mammalian auditory system, auditory cues for passive sound source localization, and specializations that enable bats to perform high-resolution active acoustic imaging. Following Section 2.1, Section 2.2 describes the current technological means of acoustic imaging and contrasts conventional array signal processing with the biosonar solution. The last background topic, in Section 2.3, introduces the model-based approach to understanding and replicating biosonar and discusses recent progress in this area.
2.1 Acoustic Information Sensing and Processing by Mammals
Acoustic waves are produced and sensed by nearly all motile animals. Sound provides a fundamental means of communication, detection and classification of predator and prey, localization of sound sources, and orientation relative to the immediate environment. Most animals rely upon sound for survival, but a select few have developed a refined sense of hearing. Nocturnal birds such as the barn owl excel at passive localization for capturing prey at night [1]. A specialized group of mammals (e.g., microchiropteran bats and odontocetes) have evolved to use acoustic waves as their primary active sense in the absence of visual information in the electromagnetic spectrum [2]. These echolocating mammals have developed an extreme acuity and agility
with which their external world is precisely reconstructed from the stream of echoes received; however, the exact physical and neuronal mechanisms responsible for this precision are not well understood, nor are they matched by any existing technological system. The following sections provide a brief overview of the mammalian auditory system, acoustic neural information processing, sound source localization by mammals, and the specializations required for echolocation.
2.1.1 The Mammalian Auditory System
The mammalian auditory system utilizes a complex set of sensory organs at its periphery – the external ear, ossicular chain, and cochlea – that are tightly integrated with neural circuitry in the cochlear nucleus (CN) by way of the auditory nerve (AN) fibers, as illustrated in Figure 2.1 [3]. Originating in the CN there are multiple ascending, as well as descending, pathways throughout the auditory system [4, 5]. While many of these pathways are monaural, there are several neural stages where specific nuclei in the midbrain receive bilateral input and integrate the information between the ipsilateral and contralateral auditory circuitry (e.g., the superior olivary complex (SOC) and inferior colliculus (IC)). The entire auditory system from the cochleae up through the auditory cortex (AC) has a tonotopic organization, where neural nuclei at specific regions of the brain appear to be spatially organized by frequency selectivity.

The location where acoustic-to-neural transduction occurs is within the inner hair cells (IHC) of the cochlea [9, 10]. The primary information required to localize sound sources lies in the onset response of IHCs tuned to different frequencies [11]. AN fibers mark the onset of sound with a time delay (i.e., first spike latency) that is related non-linearly to the acceleration of the acoustic pressure waves [12, 13, 14, 15, 16]. Subsequent neural spikes encode other features of the sound, such as duration, intensity, and relation to other frequency channels. Beginning with the AN fibers, all acoustic information is carried by neural spikes throughout the complex of neural pathways mirrored on either side of the brain. Neural spikes are essentially point
[Figure 2.1 diagram: bilateral ascending pathway from each Cochlea through the AN, CN (DCN, AVCN, PVCN), SOC (LSO, MSO), NTB (LNTB, MNTB), and NLL (DNLL, INLL, VNLLc, VNLLm) to the IC, MGB, and AC, mirrored across the midline; ILD and ITD processing marked at the LSO and MSO, with efferent AGC feedback from the DCN to the OHC]
Figure 2.1. The mammalian auditory system mapped from the cochlea to the cortex. Monaural and binaural projections from one cochlea are shown. Auditory input from the right cochlea has been omitted for clarity, but all pathways are mirrored across the brain’s midline. Excitatory and inhibitory synaptic connections are marked by triangles and bars, respectively. (a) Acoustic-to-neural transduction begins with the inner hair cells (IHC) of the cochlea. (b) The auditory nerve (AN) fibers respond to the neurotransmitter chemicals released by the IHC in response to sound pressure waves. (c) The cochlear nucleus (CN) receives all ipsilateral AN inputs in three subregions: the dorsal, anteroventral, and posteroventral cochlear nucleus (DCN, AVCN, PVCN). (d) The DCN projects efferent connections to the outer hair cells (OHC) in the cochlea, which are thought to provide a mechanism for automatic gain control by amplifying the mechanical vibrations in the cochlea’s basilar membrane (BM). Numerous specializations have been identified in echolocating bats, including a significantly hypertrophied IC and a peculiarly organized VNLLc [6, 7, 8].
processes, where the probability of a neuron firing a spike is proportional to the group activity level of attached synapses in the network. Acoustic events are encoded by the stochastic response of neural populations tuned to different amplitude ranges. To date, the relationship between morphological connectivity and physiological functions of the mammalian auditory system is not completely understood [17].
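The first-spike latency code described above, in which AN fibers mark sound onset with a level-dependent delay, can be caricatured with a toy model. The sketch below (Python with NumPy) is purely illustrative: the sigmoidal level-to-latency mapping and all of its constants are assumptions chosen for demonstration, not a fitted auditory-nerve model. It shows only the qualitative property that matters here, namely that stronger onsets evoke earlier first spikes:

```python
import numpy as np

rng = np.random.default_rng(0)

def first_spike_latency(level_db, base_ms=2.0, span_ms=8.0, jitter_ms=0.1):
    """Toy latency code: louder onsets evoke earlier first spikes.
    The functional form and constants are illustrative assumptions."""
    drive = 1 / (1 + np.exp(-(level_db - 40) / 10))   # saturating level drive
    latency = base_ms + span_ms * (1 - drive)         # stronger drive -> shorter delay
    return latency + rng.normal(0, jitter_ms)         # trial-to-trial jitter

quiet = np.mean([first_spike_latency(30) for _ in range(200)])
loud = np.mean([first_spike_latency(80) for _ in range(200)])
print(f"mean first-spike latency: {quiet:.2f} ms at 30 dB vs {loud:.2f} ms at 80 dB")
```

Averaged over trials, the 80 dB onset produces markedly earlier first spikes than the 30 dB onset, which is the monotonic level-to-timing relationship that downstream coincidence circuits can exploit.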
2.1.2 Neural Information Processing by the Auditory System
At the peripheral stage of the mammalian cochlea, acoustic information arrives rapidly compared to the time scale of a single neural spike [12]. To encode this information, AN fibers that innervate the cochlea must remain highly sensitive to acoustic stimuli, but this also increases spontaneous spiking (i.e., noise) [13]. To compensate, AN fibers are overrepresented at each narrow frequency band along the cochlea’s basilar membrane (BM). The frequency selective regions along the BM contain many redundant IHCs, and every IHC has many redundant AN fibers synapsed to it. As the BM is deflected in response to an acoustic wave, IHCs release bursts of neurotransmitter, and AN fibers take up this neurotransmitter to respond with a spike sent into the CN [18]. The simultaneous coincidence of neural spikes from many redundant AN fibers is the reason the auditory system is able to encode precisely timed acoustic information. Coincidence detection is therefore a critical responsibility of the CN, and it is performed through the population response of a large number of AN fibers – essentially averaging out the noise of spontaneous responses [19]. The CN is the gateway of acoustic information into the brain, because this is where all AN fibers innervate. If precision of spike timing is important anywhere in the brain, it is here in the CN, because once this precisely timed acoustic information is lost it cannot be recovered through any amount of data processing [20]. The CN contains an assortment of cell types, many of which are not fully understood [21, 22, 23]. Above the CN, a large portion of the neural complex in the auditory brainstem is used in the feedback necessary for motor control and does not contribute directly
to sound source localization; for example, reflexes controlling head aim or automatic gain control of the OHC in the cochlea and muscles [24]. There is a class of general models of neural information processing based on registering the timing of spikes across different neurons (i.e. coincidence detection cells) [25]. These models are usually put forth as generalized networks of cortical information processing using the timing of individual spikes across cells rather than conventional spike-rate codes [26]. The relevance of these models to auditory processing in the brainstem is that specific spike timing models have been proposed for the perception of sound pitch [15, 27, 14, 12, 28], for sound localization using interaural timing cues [29, 1], and for determination of target range from echo delay in bats [30, 31, 32]. Many attempts have been made at understanding and quantifying the information content in neural spikes, particularly with respect to precise timing [33, 34, 35, 36]. Neural spikes must carry all information about peripheral stimuli throughout the brain, and the brain must be able to interpret this information without any supplementary guidance [37]. Synfire chains, for example, are models where spike timing plays a crucial role in self-constructing complex binding networks and compositionality [25]. Polychronization has also surfaced as a neural information processing mechanism that relies upon understanding the neuronal dynamics [38, 39]. Effectively, all spike timing models can be reduced to having coincidence detecting neurons at a higher level looking downward to detect the simultaneity of spikes along multiple inputs. For sound localization, even at the level of the AC, "spatial acoustic information is represented by relative timings of pyramidal cell output" [40].
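The noise-averaging role of redundant AN fibers described above can be sketched numerically: if each fiber reports an acoustic event with some spike-time jitter, pooling a population of fibers reduces the timing error roughly as 1/√N. This is an illustrative sketch only; the fiber count, jitter magnitude, and Gaussian noise model are assumptions for demonstration, not physiological values.

```python
import numpy as np

rng = np.random.default_rng(42)

t_event = 0.010       # true acoustic event time (s), illustrative
sigma = 100e-6        # single-fiber spike-time jitter (s), assumed
n_fibers = 100        # redundant AN fibers converging on the CN, assumed
n_trials = 2000

# Each trial: every fiber fires once, jittered around the true event time.
spikes = t_event + sigma * rng.standard_normal((n_trials, n_fibers))

# A coincidence-detecting population estimate: average the spike times.
pooled = spikes.mean(axis=1)

single_err = sigma            # timing error of a single fiber
pooled_err = pooled.std()     # timing error of the population estimate

print(f"single fiber : {single_err * 1e6:.1f} us")
print(f"population   : {pooled_err * 1e6:.1f} us")  # ~ sigma / sqrt(100)
```

With 100 redundant fibers, the population's timing error shrinks by about a factor of ten relative to any single noisy fiber, which is the statistical intuition behind coincidence detection in the CN.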
2.1.3 Auditory Cues for Passive Localization in Biological Systems
Traditionally, the mammalian auditory system has been understood as having two primary methods for localizing sound sources: Interaural time difference (ITD) and interaural intensity difference (IID). Recent work has shed light on a third critical
piece of information, which is the angle-dependent spectra of broadband sounds, also known as the head-related transfer function (HRTF) [17]. ITD is the relative time delay for a propagating sound wave to reach both ears. This delay is used by mammals to localize a sound's point of origin. In perceptual tasks, human listeners are typically presented with sounds from an array of loudspeakers or a stereo headset and are asked to localize the source [41, 42, 43]. Based on early psycho-acoustic results from tonal stimuli, ITD was historically only considered useful for frequencies with a wavelength greater than the distance between ears. The reason ITD works in these experiments is that the neural response to continuous tones can phase lock on each period of the wave and encode location based on the relatively small time difference between ears [29, 19]. Since the refractory period for neural spikes exceeds the time period for frequencies above approximately 1 kHz, ITD is generally considered useful for low-frequency sound source localization in the horizontal plane, or azimuth [44]. These ITD experimental results are not valid for sounds that occur naturally, especially for echolocation signals. The primary reason is that acoustic signals in nature are not continuous pure tones, but are instead short transient waveforms. For example, the broadband clicks produced by echolocating dolphins and the short frequency modulated pulses of bats consist of frequencies well above the phase locking threshold, yet ITD is a crucial auditory cue for these animals. Such short transient signals contain very few cycles within a particular frequency band, and there are not enough wave periods to phase-lock. Instead of phase locking, the auditory system encodes the onset response to these transient events with extremely high timing precision: approximately 100 µs [3] in a general mammalian model, which is 10 times less than the width of a single neural spike [15, 45, 46].
These acoustic signals arrive relatively sparsely in time, leaving sufficient margin for auditory neurons to recover from their refractory period before the next sound event.

IID is the acoustic intensity difference between each ear and has been attributed
as a major auditory cue for high-frequency sound localization in azimuth. For humans, the head acts as an acoustic baffle, masking contralateral sound sources such that the two ears receive different amplitude levels. In other mammals commonly studied (e.g. cats and guinea pigs), the ears are positioned more dorsal and rostral than primates, so the head does not play as large of a role. Nevertheless, the structure of the external ear, or pinna, in many of these mammals can be reasonably approximated as obliquely truncated horns [47, 48]. These horns provide spatial directivity, which means that the amplitude of a sound wave changes depending upon the angle of incidence. Therefore, IID is manifested in these animals by the shape and orientation of the external ears that form acoustic receiving baffles. One notable problem with the basic concept of IID is that it does not encode sufficient information to localize sound sources in elevation. Most acoustic signals in nature are inherently broadband or at least contain some degree of harmonic structure and span multiple frequencies. When a signal arrives at the ears, each acoustic baffle modifies the sound by encoding unique spectral characteristics for any given angle. Therefore, to localize sounds in elevation, the full spectrum of a received sound is compared with the a priori spatial intensity patterns of the ears, which is the HRTF [49]. The HRTF is a complicated function of frequency and angle, but this complexity is necessary to encode a unique spectrum for any particular direction, either monaurally or binaurally. One important piece that is missing from the truncated horn model is the tragus, which encodes notches specifically used for vertical localization [50, 51]. The full spectral characteristics of the HRTF are not only useful for localization in elevation, but also azimuth and range.
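The scale of the ITD cue discussed above is easy to estimate with the common first-order plane-wave model, ITD(θ) = d sin(θ)/c, where d is the ear separation and c the speed of sound. The sketch below is illustrative only; the head dimensions are assumed round numbers, not measured values.

```python
import math

c = 343.0  # speed of sound in air (m/s)

def itd(ear_sep_m: float, angle_deg: float) -> float:
    """First-order plane-wave ITD for a source at the given azimuth."""
    return ear_sep_m * math.sin(math.radians(angle_deg)) / c

# Human-scale head (~20 cm ear separation, assumed): max ITD near 600 us
print(f"human, 90 deg: {itd(0.20, 90) * 1e6:.0f} us")

# Bat-scale head (~1.4 cm ear separation, assumed): max ITD only ~40 us,
# which is why onset timing precision matters so much for small echolocators.
print(f"bat,   90 deg: {itd(0.014, 90) * 1e6:.0f} us")
```

The comparison makes the point of this subsection concrete: for a small-headed echolocator the entire ITD range is a few tens of microseconds, well below the duration of a single neural spike.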
2.1.4 Specializations for High-Resolution Active Acoustic Imaging
The passive localization cues as described above are commonly exploited by many species [52, 28]. The active perception systems of echolocating bats, dolphins, and whales have improved upon passive hearing mechanisms by broadcasting high frequency acoustic sounds into the environment, whose echoes can then be accurately localized. In this sense, acoustic echoes are just sound sources originating from many different reflecting objects. Thus, echolocation enables precise control over the acoustic localization process and results in high-resolution spatial images from the continuous flow of information [2]. From the same basic mammalian auditory system, echolocators have evolved to fit the specific needs prescribed by individual echolocation strategies [53]. The types of specializations extend from the physical acoustic baffles of sound reception and transmission [47, 54], to the specific waveforms used for echolocation [55], and even throughout the brain at the various neural complexes [6]. These biological specializations can be thought of as an iterative process of design optimization. The biosonar optimization criteria are not just maximizing performance (e.g. acoustic field-of-view, spatial resolution, signal-to-noise ratio); an equally important criterion for animals is minimizing the energy required to achieve "good-enough" performance. As a result, evolution has produced significant biodiversity in echolocating mammals while still maintaining the minimalist approach to acoustic design. The sound production mechanisms are one of the most important developments for echolocation. Marine mammals such as dolphins and toothed whales produce sound through a highly unique structure in the melon of their head [56, 57]. The intense sounds are produced pneumatically by forcing air through a set of phonic lips, recapturing the air held in sacs, and repeating the process. The broadband echolocation signals are best described as short transient "clicks" that are typically on the order of 10 to 100 µs in duration. The sound pressure waves are guided by bone and tissue through lipids, or acoustic fats, in the melon, where they are then propagated outward into the water [58, 57, 59].
Bats have evolved their echolocation strategies to fit a particular foraging environment [47]. The result is an extremely diverse set of acoustic baffle structures and echolocation waveforms. For example, to augment their vision Egyptian fruit bats (Rousettus aegyptiacus) echolocate using
broadband transient "clicks" of their tongue [60]. Other bats (mostly from the suborder Microchiroptera) emit a variety of frequency modulated signals using the larynx through either the oral or nasal cavities. The noseleaf structures of nasally emitting bats are notoriously complex and prominent [61, 62]. The types of echolocation waveforms may be classified as frequency modulated (FM), constant frequency (CF), or both (CF-FM) [63]. CF waveforms are useful for bats detecting Doppler shifts from moving prey in an open environment [6]. FM waveforms provide excellent range resolution and are better suited for operating in densely cluttered environments, but are Doppler invariant [64]. The echolocation signals produced by both bats and dolphins are usually stereotyped such that a particular species can be identified by the characteristics of its time-frequency signature. The reception of acoustic waves by echolocating mammals is hyper-sensitive [65]. Although the sounds emitted for echolocation are generally high intensity, the reflected signals that return to the ears are many orders of magnitude lower. The dissipation and absorption of acoustic energy enforces an upper limit on the useful range of animal echolocation. To compensate, echolocators have evolved auditory systems with high sensitivity and large dynamic range. Many of these specializations exist within the brain, such as an overrepresentation of AN fibers in the cochlea, hypertrophied auditory nuclei (e.g. IC, CN, and LL) [6], and extreme timing precision at the early neural processing stages [7]. Other specializations appear obvious, such as acoustic baffles and directivity patterns that are well matched to the emitted sounds [47]. Perhaps not so obvious is the mechanism by which underwater marine mammals receive acoustic echoes. Although the topic was historically controversial [66, 67, 68, 69, 70, 71], dolphins and toothed whales receive sounds bilaterally at the mandible.
The hollow bone structures form an acoustic waveguide for sound pressure waves to travel within acoustic fats and to each inner ear [58, 57]. There are certainly many other neurological and anatomical specializations for echolocation that have yet to be discovered.
The role of vision in echolocating animals depends upon the species. Some mammals (i.e. Megachiroptera and Delphinids) rely a great deal on vision for guidance, foraging, and other routine behaviors. However, animals that must function in the complete absence of light use their auditory system as the primary sensory modality. In these animals vision can still aid the senses to some degree, but the environment is actively probed and perceived through sound. A fundamental question is, what do these animals "see" in terms of acoustic images and how does it differ from vision? Spatial resolution provides a direct measure of the three-dimensional image quality perceived by echolocating animals. In this context, resolution is the minimum spacing between two distinct acoustic echoes that can be unambiguously differentiated [72]. Spatial resolution is typically characterized by three separate, but related quantities: angle, range, and range-rate (i.e. Doppler) [73]. Angular resolution can be further separated by azimuth and elevation. Echolocating mammals such as the big brown bat (Eptesicus fuscus) and the bottlenose dolphin (Tursiops truncatus) are well-known for their high-resolution sonar systems, especially in range [2]. Although high-resolution is a subjective term, in the context of biosonar it refers to the ability of an echolocating bat, dolphin, or whale to perceive spatial images with greater detail than a man-made sonar given the same set of signals and acoustic apparatus. One aspect of echolocation that has been studied extensively is the extreme range resolution of bats [30, 74, 75, 76, 77, 78, 79, 80] and cetaceans [59, 81, 82]. When two or more acoustic waves overlap in time, they constructively and destructively interfere to produce spectral interference patterns. The big brown bat (E. fuscus) exploits these patterns of interference to deconvolve the echoes and produce a "hyper-resolution" image in range.
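The two-glint interference effect just described is straightforward to reproduce: the spectrum of two overlapping copies of a broadband pulse separated by Δt carries notches at odd multiples of 1/(2Δt). The pulse shape, sample rate, and glint spacing below are arbitrary illustrative choices, not values from the behavioral experiments.

```python
import numpy as np

fs = 1_000_000                      # sample rate (Hz), arbitrary
dt = 50e-6                          # two-glint separation in time
t = np.arange(4096) / fs

# Band-limited broadband "click" (sinc pulse, ~100 kHz bandwidth)
click = np.sinc((t - 1e-3) * 100_000)

# Echo from two closely spaced reflectors: the copies overlap in time
echo = click + np.roll(click, int(dt * fs))

spec = np.abs(np.fft.rfft(echo))
freqs = np.fft.rfftfreq(len(t), 1 / fs)

# Notches fall at (2k+1)/(2*dt) = 10, 30, 50 ... kHz;
# peaks fall at k/dt = 20, 40 ... kHz.
i_notch = np.argmin(np.abs(freqs - 10_000))
i_peak = np.argmin(np.abs(freqs - 20_000))
print(spec[i_peak] > 10 * spec[i_notch])  # True: deep notch at 10 kHz
```

The notch spacing encodes the glint separation Δt, which is the spectral cue a broadband receiver can exploit even when the two echoes are inseparable in the time domain.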
These broadband spectral patterns have been shown to persist throughout the auditory system in this species [83, 84, 85] and appear to contribute reliable information to the bat’s acoustic imaging process. Angular localization, in general, has been studied behaviorally [86, 87, 88, 89,
90], analytically [91], and computationally [58, 92, 93, 94]; however, angular performance in the presence of multiple closely-spaced targets (i.e. angular resolution, as defined above) has not been a primary focus. Nevertheless, a few experiments do exist where angular resolution was directly or indirectly measured in E. fuscus [95, 96] and T. truncatus [86]. Behavioral evidence has shown that E. fuscus primarily utilizes the spectral notches introduced by its HRTF to encode elevation information [51, 87, 88, 89]. In addition, recent work has shown that off-axis echoes of echolocation signals can be completely rejected even when overlapping in time [97, 98]; an echolocation version of the cocktail-party problem. Decades of behavioral studies have been performed on bats, dolphins, and whales to provide additional clues about the resolution limits of echolocation. Unlike bats, however, echolocation research in marine mammals is restricted to behavioral tasks and infrequent necropsies from strandings. Furthermore, the costs associated with marine mammal research are much greater, because of the substantial investment in acoustic facilities, the larger physical size of the animals and all their supporting equipment and food, and the difficulties of testing in an aquatic environment. For these reasons, significantly more is known about echolocation in bats, specifically the neurophysiology and morphology of the auditory system. Regardless of the type of echolocation waveforms used by bats (i.e. CF or FM), a common signal characteristic is the presence of multiple harmonics. Multi-harmonic waveforms have the advantage of increasing the natural bandwidth of a signal to one or more octaves, significantly improving performance in range [72]. The relative phase coherence between harmonics in an echo is also important for angular imaging [99].
Furthermore, given that broadband spectral information is the only known mechanism by which bats can localize echoes in elevation, it seems unlikely that they would have evolved to emit a narrowband CF pulse having only a single spectral component, exactly the type of waveform that pervades man-made sonar.
2.2 Acoustic Imaging in Technological Systems
The technological development of acoustic imaging was borne out of necessity. In seawater, the electromagnetic radiation spectrum is significantly attenuated by the density of the medium [100, 101], which means that neither the visible light spectrum nor radio waves are useful beyond very short distances. This fact is particularly troublesome in naval applications, where information is critical to situational awareness for large ships, submarines, and unmanned undersea vehicles. The problem is addressed by using acoustic waves since they propagate quickly over long distances, exhibit strong reflections, and pass relatively uninhibited in the dense medium [102]. The invention of piezoelectric materials enabled the design of acoustic transducers to convert electrical signals into sound pressure waves and vice versa. Early devices were fairly basic and consisted of a single source and receiver that permitted echo ranging in the open ocean [103]. With the coupling of multiple piezoelectric sensors came the advent of array signal processing and the ability to produce cross-range images of objects from sound waves [102, 104]. Apart from its undersea origins, acoustic imaging has found uses in a wide range of applications such as biomedical diagnostics, geophysical tomography, and devices for the visually impaired.
2.2.1 Conventional Array Signal Processing
Array signal processing is the method used to produce images from an array of discrete acoustic elements. The critical piece of information to localize sound sources is the relative time delay of acoustic waves as they propagate across the entire array. With knowledge of the array geometry and the speed of sound propagation, pressure waveforms at each transducer element can be delayed in time and summed to correspond with any incident direction (defined as the steered angle). This concept is known as a delay and sum beamformer and represents the most basic idea in acoustic imaging.
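The delay-and-sum idea can be sketched directly in the time domain. The sketch below uses integer-sample delays for simplicity (a practical implementation would interpolate or work in the frequency domain); the array geometry, pulse shape, and angles are arbitrary illustrative choices.

```python
import numpy as np

fs = 1_000_000            # sample rate (Hz)
c = 343.0                 # speed of sound in air (m/s)
N = 10                    # number of elements
d = 0.01                  # element spacing (m)
theta = np.deg2rad(30.0)  # true arrival angle from broadside

t = np.arange(2048) / fs
pulse = lambda t0: np.sinc((t - 5e-4 - t0) * 40_000)  # band-limited pulse

# Plane-wave arrival: each element sees the pulse at a different delay.
tau = np.arange(N) * d * np.sin(theta) / c
x = np.stack([pulse(tj) for tj in tau])

def delay_and_sum(x, steer_rad):
    """Undo the per-element delays for the steered angle, then average."""
    tau_s = np.arange(N) * d * np.sin(steer_rad) / c
    shifts = np.round(tau_s * fs).astype(int)
    return sum(np.roll(x[j], -shifts[j]) for j in range(N)) / N

on_peak = np.max(np.abs(delay_and_sum(x, theta)))             # aligned
off_peak = np.max(np.abs(delay_and_sum(x, np.deg2rad(-60))))  # misaligned
print(on_peak > 3 * off_peak)  # True: matched steering wins
```

When the steered angle matches the arrival angle, the per-element pulses align and add coherently; at a mismatched angle they smear out in time and the summed response collapses.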
When an acoustic wave arrives from a direction matching the steered angle, the correlated signals combine additively and the beamformer produces the strongest response. An acoustic wave arriving from some different angle will not align properly and the beamformer produces a weakened response due to lack of correlation. Noise, which can be acoustic, thermal, or electronic, will not produce a strong response unless it is correlated in time between elements. For example, in the presence of uncorrelated ambient noise, a correlated signal across N array elements will have an
improved signal-to-noise ratio (i.e. array gain) of 10 log₁₀ N, one important advantage of using an array [105, p. 306]. In practice, a beamformer is almost always implemented in the frequency domain [106, 107], since discrete-time delays would require high-order interpolation or fractional delay filters [108]. The response of an N element array at frequency, f, for the steered angle, θ, is computed as
Y(f, θ) = Σ_{j=1}^{N} d_j(f, θ) w_j X_j(f)    (2.1)
where d_j(f, θ) is the frequency- and angle-dependent delay of the j-th element, w_j is the aperture shading coefficient applied to element j, and X_j(f) is the frequency-domain data of the j-th element [109, Ch. 4]. In the frequency domain, d_j(f, θ) is a phase shift that is equivalent to the time delay relative to some fixed point on the array, given f and θ:
d_j(f, θ) = e^{−ikΔ_j}.    (2.2)
Here, k = 2π/λ is the acoustic wavenumber and Δ_j is the distance from element j to a fixed reference point along the projected direction θ, i.e. Δ_j = δ⃗_j · ζ⃗ for distance vector δ⃗_j and unit vector ζ⃗ = e^{iθ}. In matrix form, Equation 2.1 simplifies to
Y(f, θ) = d_f^T(θ) W x_f    (2.3)
where d_f(θ) is the N × 1 steering vector of complex phase delays, W is a diagonal N × N aperture shading matrix, and x_f is the N × 1 complex data vector (T denotes the transpose), all corresponding to frequency, f [110].
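Equations 2.1 through 2.3 translate almost line-for-line into code. The sketch below assumes a uniform line array with the broadside convention Δ_j = j d sin θ and a unit-amplitude plane-wave source; this geometry and the chosen frequency are illustrative assumptions, not the dissertation's exact configuration.

```python
import numpy as np

N = 10                      # elements
f = 60_000.0                # frequency (Hz)
c = 343.0                   # speed of sound in air (m/s)
lam = c / f
d = lam / 2                 # proper half-wavelength spacing
k = 2 * np.pi / lam         # acoustic wavenumber
pos = np.arange(N) * d      # element positions along the line

def steering(theta_rad):
    """Eq. 2.2: complex phase delays d_j = exp(-i k Delta_j)."""
    return np.exp(-1j * k * pos * np.sin(theta_rad))

W = np.eye(N)               # uniform aperture shading

def beamform(x_f, theta_rad):
    """Eq. 2.3: Y(f, theta) = d_f^T(theta) W x_f."""
    return steering(theta_rad) @ W @ x_f

# Simulated element data for a unit plane wave arriving from psi = 20 deg
psi = np.deg2rad(20.0)
x_f = np.exp(+1j * k * pos * np.sin(psi))

thetas = np.deg2rad(np.linspace(-90, 90, 721))
response = np.abs([beamform(x_f, th) for th in thetas])

best = np.degrees(thetas[np.argmax(response)])
print(f"peak response at {best:.1f} deg")  # peaks at 20.0 deg with |Y| = N
```

At the matched angle the phase delays cancel exactly and the N unit-magnitude terms add coherently to |Y| = N, which is the frequency-domain statement of the array gain discussed above.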
2.2.2 Beam Patterns and Angular Resolution
A commonly used method to describe an array's imaging performance is through the directivity, or beam pattern. The beam pattern of an array is simply the beamformer's angular response to an ideal unity-power acoustic source located in the direction ψ. This can be computed by replacing the complex data vector, x_f, in Equation 2.3 with the complex steering vector, d_f(ψ):
D(f, θ) = d_f^T(θ) W d_f(ψ).    (2.4)
For a line array steered to ψ = 0°, all elements of d_f(ψ) are equal to 1 and we are left with the array's natural response, D(f, θ) = d_f^T(θ) W. Figure 2.2 illustrates the beam pattern of an N = 10 element uniformly-spaced line array steered to 0° and 45° at two different frequencies. With proper element spacing, d ≤ λ/2, the beam pattern response is approximately D(f, θ) = sinc((L/λ) cos θ), for an array aperture length L = d(N − 1).

A phase-delay beamformer is equivalent to applying a Fourier transform in the spatial domain. As such, the discrete elements suffer from spatial aliasing in exactly the same way as a signal sampled in the time domain. The presence of grating lobes is simply an aliasing artifact introduced by designing an array with improper element spacing (d > λ/2). The consequence is ambiguity regarding the angle from which a sound wave originated. There is also a direct corollary between the window function used for spectral analysis and the array aperture shading function used in array signal processing. Selecting the aperture shading weights is a tradeoff between mainlobe resolution and sidelobe reduction [111, Ch. 10].

[Figure 2.2 panels: beam response magnitude (dB) and linear amplitude versus bearing angle θ (deg.) for ψ = 0° and ψ = 45°, with N = 10, d = 1.72 cm, plotted at 10 kHz and 60 kHz.]

Figure 2.2. Beam patterns are the angular response of an array due to the presence of an ideal acoustic source located in the steered direction. They are traditionally plotted on a log-magnitude scale and phase is ignored, but in reality the response exhibits a 180° phase reversal when the amplitude response becomes negative. Shown here are example beam patterns in air from a line array of N = 10 omni-directional elements that are spaced at d = 1.72 cm. Steer angles are plotted for 0° (a - log, b - linear) and 45° (c - log, d - linear). No aperture shading function is applied to this example, so W is the identity matrix. Each plot shows two different frequencies, 10 kHz (blue) and 60 kHz (green), which correspond to proper element spacing of λ/2 and undersampled spacing of 3λ, respectively. The width of the main lobe is one measure of angular resolution. Although the 60 kHz pattern has better resolution, the elements are not spaced properly and the result becomes ambiguous due to grating lobes. Regardless of frequency, the main and sidelobe responses are wider at angles off to the side. This is due to the effective array aperture decreasing with the cosine of the angle, θ.

The angular resolution of a uniformly spaced line array can be defined as the minimum angular spacing between two point sources of equal strength, whereby both can be simultaneously resolved [109, p. 142]. This limit occurs at the half-power beam width, β, of the beam pattern's mainlobe and can be approximated through series expansion [110] as
β(ψ) ≈ sin⁻¹(cos ψ − γ_win λ/L) − sin⁻¹(cos ψ + γ_win λ/L)    (2.5)

where γ_win is an aperture shading constant (e.g. γ_win = 0.402 for uniform weighting; γ_win = 0.484 for 26-dB Chebyshev weighting), and L and λ are the array aperture length and wavelength, as defined previously. As seen in Figure 2.2, β is dependent upon the steer angle, ψ. The maximum achievable resolution for a line array occurs when ψ = 0°:

β_3dB ≈ 2 sin⁻¹(γ_win λ/L).    (2.6)
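The beamwidth approximation in Equation 2.6 can be checked against the numerically computed pattern of a uniform line array. The sketch below uses a broadside-steered array with a sin θ bearing convention; the frequency and element count are illustrative choices.

```python
import numpy as np

N = 10
lam = 343.0 / 60_000.0          # wavelength at 60 kHz in air (m)
d = lam / 2                     # half-wavelength element spacing
L = d * (N - 1)                 # aperture length, as defined in the text
gamma_win = 0.402               # shading constant for uniform weighting

# Eq. 2.6: analytic half-power beamwidth at broadside
beta_analytic = 2 * np.arcsin(gamma_win * lam / L)

# Numeric half-power width of the same array's beam pattern
theta = np.linspace(-0.5, 0.5, 20_001)                 # bearing (rad)
k = 2 * np.pi / lam
phases = np.exp(1j * k * d * np.outer(np.arange(N), np.sin(theta)))
power = (np.abs(phases.sum(axis=0)) / N) ** 2
mainlobe = theta[power >= 0.5 * power.max()]
beta_numeric = mainlobe.max() - mainlobe.min()

print(f"analytic Eq. 2.6: {np.degrees(beta_analytic):.2f} deg")
print(f"numeric pattern : {np.degrees(beta_numeric):.2f} deg")
```

The two values agree to within a few percent for this 10-element half-wavelength array, which is about the accuracy one should expect from a series-expansion approximation.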
These equations show that the resolution of an array is critically dependent upon the ratio λ/L. Increasing the aperture of the array, L, improves the resolution by reducing the width of the main lobe. Alternatively, resolution can be improved by increasing the operating frequency, thereby reducing λ. It is clear that improving resolution requires adding more elements, finding ways to increase the effective aperture, or handling insufficient element spacing in some other way.

Under conventional beamforming, acoustic imaging is achieved by iterating the beamformer through multiple overlapping angles, θ, and repeating over subsequent time windows.¹ The magnitude of each complex result from Equation 2.3 is plotted at the corresponding range and angle to produce an image of the spatially distributed acoustic energy. Figure 2.3 shows an example of high-resolution acoustic imaging in the range-azimuth plane produced by beamforming underwater sonar data from a shipwreck. Images from consecutive transmit-receive cycles on a moving vehicle can be stitched together to map a much larger area. This concept of acoustic imaging with conventional beamforming is readily extended to a second angular dimension (e.g. azimuth and elevation).

¹Time, t, corresponds directly to range, r, in an active sonar system. The translation is r = tc/2, where c is the speed of sound in the medium and the factor of two accounts for the two-way propagation path.

Figure 2.3. The concept of acoustic imaging in the range-azimuth plane is demonstrated using active underwater sonar data collected from the site of a shipwreck in Narragansett Bay, Rhode Island. The sonar array (SeaBat 7130 prototype, Teledyne-Reson, Denmark) is a forward-looking 635 kHz line array with N = 256 elements spaced at λ/2 (d = 1.1 mm, L = 0.3 m). The active transmit waveform is a 17 ms, 30 kHz linear FM pulse (4.7% bandwidth-to-center-frequency ratio). This image was produced from a single transmit-receive cycle (66 ms) using a phase-delay beamformer and has 0.48° angular resolution at θ = 0°. Brightness in the image corresponds to the beamformer's magnitude response when steered at a particular range and azimuth. The brightest locations are specular reflections alongside the ship's hull and the darker red areas consist mostly of returns from the sea floor. Two faint rings of energy can be seen around 17 and 21 m, which are caused by the most intense ship reflections being present in the sidelobes when steered to other angles. The large, well-defined dark region behind the ship is an acoustic shadow created from the occlusion of acoustic energy by the ship. Note that the beams are only steered to ±60° due to limited transmit beam coverage and widening receive beams. Data were collected and processed by the Naval Undersea Warfare Center, Newport, RI.

There is an impressive amount of literature on the various theories, methods, and implementations that improve upon classical array signal processing as described above. Some noteworthy techniques are sub-optimally spaced arrays (e.g. sparse, co-prime, Costas) [112, 113, 114, 115], synthetic aperture sonar (SAS) [116, 117, 118], and monopulse direction finding [119, 120]. Other methods, such as split aperture processing [105, p. 329] and Vernier interferometry [121, 122], are based on the narrowband phase comparison between widely spaced elements. A variety of high-resolution techniques have been applied successfully, but performance degrades when their many assumptions break down (e.g. minimum variance and adaptive beamforming [105, 123, 124], eigenvector and multiple-signal classification (MUSIC) [102, 105, 125, 126], and matched field processing [127, 128]). There have also
been some interesting departures from the traditional line-array concepts; in particular, blazed arrays [129, 130] and vector sensing² [133, 134]. This is by no means an exhaustive list of existing high-resolution angular techniques in array signal processing. A full review of this field lies beyond the scope of this section, but we can generalize many of these methods with respect to their intended goals and the information they use for acoustic imaging. Array signal processing traditionally uses the signal correlation and time delay between elements to localize sound sources and perform acoustic imaging. Many of the advanced techniques mentioned above serve to improve array resolution beyond the aperture constraints in Equations 2.5 and 2.6. They often achieve these performance gains at great cost by increasing the effective aperture, synthesizing more elements, or taking advantage of destructive interference of grating lobes and sidelobes. By contrast, biosonar uses very broad beam patterns and exploits the additional information contained in broadband, multi-harmonic signals. This enables bio-inspired broadband sonar to achieve high-resolution acoustic imaging with extremely small apertures and a minimal number of sensors. In this manner, biosonar represents a significant departure from the conventional approach that is in common use today.
2.3 Model-Based Approach to Bio-Inspired Acoustic Imaging
The model-based approach is a generic term used to describe numerical solutions to a variety of signal processing problems [135]. Models that include additional information about a physical process and its dynamics should, in theory, improve overall performance. These models usually consist of linearized systems, such as linear and adaptive filters [136, 137]; state-space estimators, e.g. Kalman filtering and its many adaptive and non-linear variants [138, 139]; statistical processors, like Markov chains and support vector machines [140, 141]; or neural networks, including classical firing-rate based and dynamical spiking neural models [142, 39]. The model-based approach may be used for creating new technological systems, or to better understand an existing physical system. In the context of biosonar, we are interested in using the model-based approach for both purposes: gaining insight about animal echolocation and applying this toward the development of new, innovative acoustic imaging systems.

²Most piezoelectric sensors are simple pressure-field measurement devices, while vector sensors measure both magnitude and direction from the particle velocity component of the acoustic wave. Since mammalian ears do not have a means of measuring particle velocity [131, 132], this additional information is explicitly omitted from further consideration.
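As a concrete instance of the model classes listed above, a least-mean-squares (LMS) adaptive filter, one of the simplest linear adaptive models, can identify an unknown linear system from input and output data alone. The 3-tap system, step size, and noise-free setup below are arbitrary illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

h_true = np.array([0.5, -0.3, 0.2])      # the "unknown" system (assumed)
x = rng.standard_normal(5000)            # white input signal
d = np.convolve(x, h_true)[: len(x)]     # observed (desired) output

# LMS adaptation: nudge the weights along the instantaneous error gradient
w = np.zeros(3)
mu = 0.01                                # step size
for n in range(3, len(x)):
    u = x[n : n - 3 : -1]                # most recent 3 input samples
    e = d[n] - w @ u                     # prediction error
    w = w + mu * e * u

print(np.round(w, 3))                    # converges toward h_true
```

The filter "learns" the system purely from data, which is the essential appeal of the model-based approach: prior structure (here, linearity and filter order) plus adaptation replaces explicit knowledge of the underlying process.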
2.3.1 Auditory Modeling Insights and Oversights with Filter Banks
Auditory modeling has embraced the idea of using parallel banks of linear filters to mimic the frequency selectivity of the cochlea's mechanical response. To construct these filter banks, hearing researchers began mimicking the physiological and psychological findings from various auditory studies in humans, cats, and guinea pigs. Early attempts to capture the critical bandwidth and asymmetrical roll-off characteristics of hearing used low-order band-pass Roex and Gammatone filters [49], which have an infinite impulse response (IIR). These filter designs are purely linear and time-invariant models of the cochlear mechanics. As neurophysiology provided new insight about the active non-linear feedback processes of the OHCs, more complicated filter shapes emerged, such as the Gammachirp and Dual-Resonance Non-Linear (DRNL) filters [143, 144]. These filter types expanded upon the existing models by including time-variant compression that is based upon the amplitude of the acoustic stimuli [17]. Filter banks have become ubiquitous in many aspects of auditory research, from human audition to bat echolocation, and they remain a highly valuable tool for learning about how the auditory system encodes acoustic information. Using filter bank models, the benefit of decades of linear systems theory can be applied. There are unfortunately some drawbacks to this tool as well. One problem with using a filter bank model of the cochlea is the phase response. Great care has been taken to capture the exact amplitude response of these band-pass
auditory filters, yet little or no attention has been paid to the phase response of the filter and, perhaps most importantly, the implications for group delay. Figure 2.4 shows the frequency response of an auditory filter bank for the ultrasonic range of frequencies between 20 kHz and 100 kHz, those relevant to biosonar hearing in bats and cetaceans. Filters are usually spaced on a logarithmic frequency axis to reflect the distribution of neurons in the auditory system [4, 145]. The magnitude response matches fairly closely what has been found in other mammals at reasonable sound intensities. The phase response varies predictably near the poles and zeros of each band-pass filter, such that the phase response changes most rapidly within the pass-band. The group delay of a filter is simply the negative derivative of the phase response and can be understood as the literal time delay of a signal passing through the filter. Signals passed through the filter bank will be amplified or attenuated based on the magnitude response, but delayed in time according to the group delay. If the group delay varies over frequency, signals with any bandwidth will become dispersive in time. This artifact becomes important when modeling the auditory system's response to complex acoustic signals. In many cases, using an auditory filter bank is an appropriate model of the cochlea; however, accounting for phase is especially important when modeling a broadband system like the bat's that can process information down to the microsecond [74, 146] or even nanosecond scale [147, 148].
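The group-delay artifact described above can be measured directly. The sketch below uses SciPy's built-in IIR gammatone design (`scipy.signal.gammatone`) for a single ultrasonic channel; the sample rate and center frequency are illustrative choices, not parameters from the dissertation's filter bank.

```python
import numpy as np
from scipy import signal

fs = 500_000          # sample rate (Hz), comfortably above 2 x 100 kHz
fc = 40_000           # one filter channel's center frequency (Hz)

# 4th-order IIR gammatone filter centered at fc
b, a = signal.gammatone(fc, "iir", fs=fs)

# Group delay (in samples) across frequency, converted to microseconds
freqs = np.linspace(20_000, 60_000, 401)
w, gd_samples = signal.group_delay((b, a), w=freqs, fs=fs)
gd_us = gd_samples / fs * 1e6

i_fc = np.argmin(np.abs(freqs - fc))
print(f"group delay at fc  : {gd_us[i_fc]:.1f} us")
print(f"spread across band : {gd_us.max() - gd_us.min():.1f} us")
# A non-zero spread means a broadband signal is dispersed in time,
# and the delay itself is large compared with a bat's 0-40 us ITD range.
```

Even a single channel delays its pass-band energy by tens to hundreds of microseconds, and the delay varies with frequency, which is exactly the dispersion concern raised in the text.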
2.3.2 Signal Processing Models for High-Resolution Range Estimates
Some of the earliest computational modeling work related to bat echolocation was developed to explain the results of hyper-resolution experiments on range discrimination [74, 146, 147]. These behavioral experiments were highly controversial [75] because they showed that bats were clearly achieving timing resolution well beyond what was thought possible (at the time) by neural coding in the auditory system [150]. Many questions about the neural mechanisms remain unanswered, even decades later.

Figure 2.4. The gammatone filter bank is an example of an auditory cochlear model that is commonly used in hearing and echolocation research. Each band-pass filter represents the vibratory motion at a single physical point along the basilar membrane (BM) of the cochlea. This location is where numerous afferent AN fibers synapse with each local cluster of IHCs and translates BM displacement to neural spikes. (a) The magnitude response shows the logarithmic spacing of a gammatone filter bank designed from 20 to 100 kHz. The bandwidth-to-center-frequency ratio is normally kept constant to match the widening of the auditory critical bands at higher frequencies. This consistent ratio also ensures a constant overlap between filter channels. Only 11 channels are shown here for illustration, but practical models of bat echolocation require at least 80 channels per ear [149]. (b) The phase response, φ(f), varies significantly in the pass-band of each filter. (c) Group delay, which is the negative derivative of phase (−dφ/df), is a commonly overlooked artifact of using a linear filter model. The consequence of non-constant group delay is that broadband signals become dispersive within and between channels – that is, they are delayed in time by different amounts depending on frequency. This effect can have unknown consequences for auditory modeling, especially since the interaural time delay for a bat (0 to 40 µs) is one to two orders of magnitude lower than the group delay for a gammatone filter bank. Color is used to separate overlapping lines and corresponds to the center frequency of each filter channel.

Nevertheless, signal processing models were developed to understand how animals might be achieving hyper-acuity in the range dimension. The Spectrogram Correlation and Transformation (SCAT) receiver [78, 80] is a biosonar model that mimics the echolocating bat's hyper-resolution of a closely spaced pair of point scatterers.
SCAT was the first known computational model that attempted to mimic bat echolocation based upon experimental evidence of neural information processing. SCAT has served as the basis for many later models of bat echolocation, and therefore warrants a short description of how it functions. Figure 2.5 shows a block diagram of the monaural model, which includes a constant-Q filter bank to separate the time series auditory input into multiple narrowband channels and convert the waveforms into neural spikes. Following the cochlear filter bank
are two distinct spectrogram functions (correlation and transformation) that operate in parallel across all frequency channels.
Figure 2.5. Block diagram of the Spectrogram Correlation and Transformation (SCAT) receiver model. Time series data enters the model through the cochlear filter bank, which consists of 2nd-order Butterworth band-pass filters (hyperbolically spaced) followed by half-wave rectification, non-linear compression, and low-pass filtering (RCF) for each frequency channel. Neural spikes are produced at the output of each frequency channel to mimic information encoded by the auditory nerve fibers. The spectrogram correlation block produces a response with coarse echo resolution for detection. Once an echo is detected, the spectrogram transformation block is triggered to split this echo into multiple high-resolution echoes by a process of spectral deconvolution. The result is a hyper-resolution receiver that exceeds the resolution of a conventional cross-correlation receiver.
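The per-channel rectify-compress-filter (RCF) stage described in the caption can be sketched as follows. This is a simplified stand-in rather than the published SCAT implementation: a one-pole low-pass replaces the original smoothing filter, and the compression exponent and spike threshold are arbitrary illustrative values.

```python
import numpy as np

def rcf_channel(x, fs, cutoff=3e3, compress=0.4, thresh=0.3):
    """Rectify-compress-filter for one frequency channel, then emit spike
    sample indices at upward threshold crossings of the smoothed envelope."""
    r = np.maximum(x, 0.0)                 # half-wave rectification
    r = r ** compress                      # static non-linear compression
    a = np.exp(-2 * np.pi * cutoff / fs)   # one-pole low-pass coefficient
    env = np.empty_like(r)
    acc = 0.0
    for i, v in enumerate(r):
        acc = (1 - a) * v + a * acc        # recursive smoothing
        env[i] = acc
    up = (env[1:] >= thresh) & (env[:-1] < thresh)
    return np.nonzero(up)[0] + 1

fs = 250e3
t = np.arange(0, 4e-3, 1 / fs)
tone = np.sin(2 * np.pi * 40e3 * t) * (t > 1e-3)   # 40 kHz burst starting at 1 ms
spikes = rcf_channel(tone, fs)
print(spikes[0] / fs)   # first spike shortly after the 1 ms tone onset
```

Note that the spike registers tens of microseconds after the acoustic onset because the smoothing filter must charge up; this latency is itself level-dependent, which is one reason onset timing in such models must be interpreted carefully.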
The spectrogram correlation block takes the narrowband spike events and performs the neural equivalent of a parallel cross-correlation in time. When an echolocation pulse is emitted, it triggers a broadband onset response across all frequencies. Any echoes received will also produce a broadband onset response at the appropriate time delay. The coincidence of spikes across multiple channels indicates the reception of one or more target echoes. Because of the inherent frequency sweep of the FM signals, some narrowband frequency channels will spike earlier than others. This apparent incoherence (or time separation) across channels will match the incoherence between the outgoing pulse and any received echoes, thereby eliminating the need to de-chirp received signals. Although the detection of a single pulse-echo pair is sufficient to estimate target range, the SCAT receiver goes further to deconvolve the spectral information into hyper-resolution images. Closely spaced point targets will produce acoustic echoes that overlap in the time-frequency plane. When this occurs, deterministic interference patterns arise in the form of spectral notches. Each pair of echoes separated in time
by ∆T produces a first notch at f0 = 1/(2∆T) and subsequent notches at intervals of 1/∆T, i.e. fj = fj−1 + 1/∆T for j = 1, 2, 3, ... For signals with bandwidth between 20 and 100 kHz, these spectral notches occur for ∆T > 5 µs (a path-length difference of 1.7 mm) until the echoes no longer overlap in the time-frequency plane. Unlike a traditional cross-correlation receiver, the spectrogram transformation block uses this additional spectral information to produce fine delay estimates.

In the original SCAT model, the spectrogram transformation block is implemented as a "voting mechanism" with a set of cosine basis functions. Each frequency channel contains its own unique basis function with a period inversely proportional to the center frequency of the filter. The amplitude of a basis function is scaled by the received echo level in each frequency channel. Despite its simplicity and lack of biological relevance, the summation across all channels produces impulses at the correct locations of two overlapping echoes. As pointed out by Peremans and Hallam [151], the SCAT model incorrectly estimates the times of two echoes having different amplitudes and produces artificial phantom echoes. Even with these shortcomings, the SCAT model remains one of several models to date that can replicate bats' hyper-resolution images of two-point targets.

A recent review by Park and Allen [152] has likened the spectrogram transformation process to a pattern recognition problem, where notches are actively detected and matched to corresponding time delays. This is in contrast to the original model, which detects spectral energy and simply ignores the contributions from channels containing spectral notches. The cosine basis functions in the spectrogram transformation block produce many oscillatory peaks that can be incorrectly classified as point targets. Park and Allen proposed a method to suppress these unwanted peaks by predicting their locations and canceling them out. The goal of this process is comparable to the way interference cross-terms in a Wigner-Ville time-frequency distribution are smoothed [153].
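The two-echo interference pattern can be reproduced directly from synthetic signals. In the sketch below (all parameters arbitrary), a copy of a downward FM sweep delayed by ∆T = 10 µs is added to the original; the magnitude spectrum of the sum then shows its first notch near 1/(2∆T) = 50 kHz, exactly the cue the spectrogram transformation block exploits.

```python
import numpy as np

fs = 1e6                                   # 1 MHz sample rate
dT = 10e-6                                 # echo separation: first notch at 50 kHz
t = np.arange(0, 2e-3, 1 / fs)
f0, B = 100e3, 80e3                        # downward sweep, 100 kHz to 20 kHz
x = np.cos(2 * np.pi * (f0 * t - B / (2 * t[-1]) * t**2))

# two closely spaced point scatterers: echo plus a delayed copy of itself
d = int(round(dT * fs))
y = np.zeros(len(x) + d)
y[:len(x)] += x
y[d:] += x

Y = np.abs(np.fft.rfft(y))
f = np.fft.rfftfreq(len(y), 1 / fs)
band = (f > 30e3) & (f < 90e3)             # search inside the sweep band
f_notch = f[band][np.argmin(Y[band])]
print(f_notch)                             # falls near 1/(2*dT) = 50 kHz
```

The notch frequency recovers ∆T without ever resolving the two echoes in the time domain, which is the essence of spectral deconvolution for hyper-resolution ranging.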
Just as in Wigner-Ville smoothing, we sacrifice some resolution for reduced cross-term interference. Since SCAT was first published, other models have emerged that take on the
idea of spectral deconvolution for hyper-resolution range estimates. For example, Sanderson and Neretti used auditory filter bank models to address the question of the biological relevance of the SCAT model [76, 77, 154]. By modifying the low-pass smoothing parameters at the RCF stage, they found that despite the low temporal resolution of higher cortical areas in the auditory system, there is indeed sufficient information across the time-frequency representation to register the interference patterns of two or more closely spaced echoes. Matsuo has applied Gaussian chirplet filter banks [155] to the two-point resolution problem without relying upon an acoustic-to-neural transduction component [156, 157, 158]. More recently, Sharma and Buck proposed the variable resolution detection receiver (VRDR), which does not require filter banks [159, 160]. The VRDR model approaches the ideal impulse resolution of an inverse filter while maintaining a stable filter that can adapt to noise levels using a tuning parameter. Many of these modeling developments have focused on the problem of achieving greater range resolution based on the hyper-resolution exemplified by echolocating bats. An equally intriguing problem is how echolocating animals are able to achieve hyper-acuity in angle.
2.3.3 Models for Angular Target Localization and Acoustic Imaging
A binaural version of SCAT, named Artificial SCAT, was created to reconstruct two-dimensional images of simple objects in the range-azimuth plane [79]. The superior range resolution allowed two separate SCAT processes to localize in azimuth by comparing ITD. Echoes from wires and spheres were recorded using a pair of microphones and a loudspeaker. The stereo time series recordings were presented to the SCAT processing model one channel at a time, and triangulation with intersecting ellipses generated the two-dimensional images from each time series signal. Although implementation details were not published, some of the range-azimuth imaging results were made available [80]. Other binaural sonar models that explicitly use ITD for angular imaging have appeared in the literature [158, 161, 162]. These models
take advantage of the large bandwidth that yields improved range resolution, but additional spectral information is useful for improving azimuthal performance and is absolutely necessary for localization in elevation. Only recently have models begun to include spectral cues in the source localization process, including azimuth and elevation [11, 163, 93], but many of these models abandon the filter bank approach in favor of more traditional signal processing tools.
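The intersecting-ellipse triangulation used by Artificial SCAT can be illustrated with a toy geometry (the positions and the 0.3 m microphone baseline below are invented for this sketch, not taken from the original system). Each microphone's round-trip delay constrains the reflector to an ellipse with foci at the loudspeaker and that microphone; a grid search finds the point that satisfies both delays.

```python
import numpy as np

c = 343.0                                        # speed of sound, m/s
spk = np.array([0.0, 0.0])                       # loudspeaker at the origin
micL, micR = np.array([-0.15, 0.0]), np.array([0.15, 0.0])
target = np.array([0.3, 0.5])                    # true reflector position (m)

def roundtrip(mic):
    """Speaker -> target -> microphone propagation delay in seconds."""
    return (np.linalg.norm(target - spk) + np.linalg.norm(target - mic)) / c

tL, tR = roundtrip(micL), roundtrip(micR)

# each measured delay defines an ellipse; search for the point lying on both
xs = np.linspace(-1.0, 1.0, 401)
ys = np.linspace(0.01, 1.0, 200)
X, Y = np.meshgrid(xs, ys)
dS = np.hypot(X - spk[0], Y - spk[1])
eL = dS + np.hypot(X - micL[0], Y - micL[1]) - c * tL
eR = dS + np.hypot(X - micR[0], Y - micR[1]) - c * tR
iy, ix = np.unravel_index(np.argmin(eL**2 + eR**2), X.shape)
print(xs[ix], ys[iy])                            # estimate near the true (0.3, 0.5)
```

Because the two ellipses are nearly parallel when the baseline is small relative to range, the intersection is well conditioned in range but poorly conditioned in azimuth, which is exactly why precise ITD (and hence precise delay) estimation matters for this scheme.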
2.3.4 Mathematical Models of Echolocation Performance
Taking a systems-of-systems approach to biosonar modeling, without concerning ourselves with the complexities of the brain, can prove useful. Several interesting mathematical models have been published that aim to explain the echolocation performance of animals. In one of the earliest (and possibly most illuminating) mathematical studies of a binaural sonar system, Altes calculated the Cramer-Rao lower bound (CRLB) for azimuth and elevation and derived the maximum likelihood estimator based on these results [91]. This analytical model found that azimuth localization accuracy is not only a function of ITD and SNR, but also of the gradient (i.e., sensitivity) of the magnitude and phase of broadband beam patterns versus angle. Because this work was ahead of its time, it did not include a numerical analysis with measured biosonar beam patterns, which have since become available. Although the spectral effects of both transmit and receive beam patterns were considered, none of the frequency-dependent effects in signal propagation were included. This particular study was limited to the accuracy of angular localization rather than resolution, which is required for acoustic imaging in densely cluttered environments. Altes does briefly comment on the subject of resolution: "Accurate unambiguous azimuth resolution can be obtained with only two transducers, even if the beam patterns of the transducers are very broad. It is only necessary to utilize a wide-band signal with an autocorrelation width that is narrow relative to the distance between transducers."
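As a back-of-the-envelope companion to Altes's analysis (not his derivation, which also incorporates beam-pattern gradients), the textbook Woodward approximation for delay accuracy, σ_τ ≈ 1/(2πβ√SNR), can be propagated through the simple ITD model τ(θ) = (a/c) sin θ. The receiver separation a, RMS bandwidth β, and SNR below are assumed values chosen only to show the scale of the result.

```python
import numpy as np

c = 343.0                 # speed of sound (m/s)
a = 0.014                 # assumed receiver separation, ~14 mm
beta = 40e3               # assumed RMS bandwidth of a broadband FM call (Hz)
snr = 10 ** (20.0 / 10)   # 20 dB SNR

# Woodward approximation for the achievable delay-estimation accuracy
sigma_tau = 1.0 / (2 * np.pi * beta * np.sqrt(snr))     # seconds

# map delay accuracy to azimuth accuracy through the ITD sensitivity d(tau)/d(theta)
theta = np.radians(30.0)
sens = (a / c) * np.cos(theta)                          # seconds per radian
sigma_theta_deg = np.degrees(sigma_tau / sens)
print(sigma_tau * 1e6, sigma_theta_deg)                 # sub-microsecond, sub-degree
```

The calculation makes Altes's point tangible: with broadband signals, even a tiny two-receiver baseline supports sub-degree azimuth accuracy, because the bandwidth drives σ_τ well below the ITD range.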
With advances in computed tomography and computational power, finite-element methods were pioneered to estimate the complex spectral properties of HRTFs [164]. With these new techniques, high-resolution HRTF models of bats' pinnae and noseleaves can be quickly assembled into libraries [47]. The HRTF libraries can be used for high-fidelity acoustic simulations, or for quantifying the spectral information via the CRLB [165] or information theory [166, 167]. The information theoretic approach has also been used to evaluate the performance of bio-inspired processing with conventional transducers [168].
2.3.5 Hardware Prototypes as Exploratory Models
As stated previously, modeling can lead to many insights into a problem if done properly. Unfortunately, models may also mask the true phenomenon of interest. In this vein, taking real acoustic measurements and constructing biomimetic prototype systems are necessary steps to test and verify models in the real world. Hardware prototypes are also the first step toward creating autonomous biomimetic sensors that can operate in real-time3.

Over the past 15 years, biomimetic sonar models have appeared on integrated circuits [169, 170, 171]. All-digital field-programmable gate arrays (FPGAs) are appealing for the real-time implementation of auditory filter banks because of the sheer number of parallel computations required [172]. Unfortunately, neural information processing on digital hardware is computationally expensive and makes inefficient use of resources. This is the primary reason that very-large-scale integrated (VLSI) analog circuits have appeared for various bio-inspired computations (e.g., echo ranging with delay lines [173, 174, 175], azimuthal localization using IID cues [176, 177, 178, 179], binaural comparison of spectral cues [180], and spike-based neural information processing [181, 182]).
3Real-time has many interpretations that depend on the context. For a biosonar signal processor, real- time should be defined as having sufficient data throughput such that a bottleneck is never reached and latency that allows adequate response time to real-world events.
Various bio-inspired robotic sonar systems have been developed, which can be grouped by the basic set of information used for localization. Kuc used ITD with a simple pair of circular-aperture receive transducers to localize and classify objects in realistic environments [183, 184]. Although only ITD was used for localization, the transducers were oriented off-axis so that a comparison between the broadband time-based signals could be used to perform classification. Schillebeeckx and Peremans have applied Bayesian probabilistic techniques [185] and maximum likelihood estimation (MLE) [186] to the localization problem from binaural HRTFs. Using the spectrum of an emitted sound in a different manner, Guarato et al. showed that estimating source orientation is possible [187]. Combining the concept of sparse arrays and bio-inspired processing, Steckel and Peremans used bandwidth to average out grating lobes over multiple frequency octaves [188, 189, 190]. A model and hardware processor was also created for simultaneous localization and mapping for the guidance and control of a robot [191]. Each hardware prototype has individual merit, but together they demonstrate the clear advantages of biosonar acoustic imaging.
References
[1] W. Gerstner, R. Kempter, J. Van Hemmen, and H. Wagner, "A neuronal learning rule for sub-millisecond temporal coding", Nature 383, 76–78 (1996).
[2] W. Au and J. Simmons, "Echolocation in dolphins and bats", Phys. Today 60, 40–45 (2007).
[3] C. J. Sumner, R. Meddis, and I. M. Winter, "The role of auditory nerve innervation and dendritic filtering in shaping onset responses in the ventral cochlear nucleus", Brain Res. 1247, 221–234 (2009).
[4] E. Covey and J. H. Casseday, "The lower brainstem auditory pathways", in Hearing by Bats, 235–295 (Springer, New York, NY) (1995).
[5] N. Suga, E. Gao, Y. Zhang, and X. Ma, "The corticofugal system for hearing: Recent progress", Proc. Natl. Acad. Sci. U.S.A. 97, 11807–11814 (2000).
[6] E. Covey, "Neurobiological specializations in echolocating bats", Anat. Rec. Part A 287, 1103–1116 (2005).
[7] E. Covey and J. Casseday, "Timing in the auditory system of the bat", Annu. Rev. Physiol. 61, 457–476 (1999).
[8] J. Casseday, "The monaural nuclei of the lateral lemniscus in an echolocating bat: Parallel pathways for analyzing temporal features of sound", J. Neurosci. 11, 3456–3470 (1991).
[9] R. Meddis, "Simulation of auditory-neural transduction: Further studies", J. Acoust. Soc. Am. 83, 1056–1063 (1988).
[10] R. Meddis, "Simulation of mechanical to neural transduction in the auditory receptor", J. Acoust. Soc. Am. 79, 702–711 (1986).
[11] B. Fontaine and H. Peremans, "Bat echolocation processing using first-spike latency coding", Neural Networks 22, 1372–1382 (2009).
[12] P. Heil, H. Neubauer, M. Brown, and D. Irvine, "Towards a unifying basis of auditory thresholds: Distributions of the first-spike latencies of auditory-nerve fibers", Hearing Res. 238, 25–38 (2008).
[13] P. Heil, H. Neubauer, D. Irvine, and M. Brown, "Spontaneous activity of auditory-nerve fibers: Insights into stochastic processes at ribbon synapses", J. Neurosci. 27, 8457–8474 (2007).
[14] P. Heil, "First-spike latency of auditory neurons revisited", Curr. Opin. Neurobiol. 14, 461–467 (2004).
[15] R. Meddis, "Auditory-nerve first-spike latency and auditory absolute threshold: A computer model", J. Acoust. Soc. Am. 119, 406–417 (2006).
[16] P. Heil and D. Irvine, "First-spike timing of auditory-nerve fibers and comparison with auditory cortex", J. Neurophysiol. 78, 2438–2454 (1997).
[17] "Computational Models of the Auditory System", Springer, New York (2010).
[18] A. R. Moller, Hearing: Anatomy, Physiology, and Disorders of the Auditory System, 2nd edition (Academic Press, Burlington, MA) (2006).
[19] N. S. Harper and D. McAlpine, "Optimal neural population coding of an auditory spatial cue", Nature 430, 682–686 (2004).
[20] T. Cover and J. Thomas, Elements of Information Theory, Wiley Series in Telecommunications and Signal Processing, 2nd edition (Wiley-Interscience, Hoboken, NJ) (2006).
[21] D. Oertel, "The role of timing in the brain stem auditory nuclei of vertebrates", Annu. Rev. Physiol. 61, 497–519 (1999).
[22] D. Oertel and E. Young, "What's a cerebellar circuit doing in the auditory system?", Trends Neurosci. 27, 104–110 (2004).
[23] D. Oertel, S. Wright, X. Cao, and M. Ferragamo, "The multiple functions of T stellate/multipolar/chopper cells in the ventral cochlear nucleus", Hearing Res. 276, 61–69 (2011).
[24] P. H. S. Jen, "Adaptive mechanisms underlying the bat biosonar behavior", Front. Biol. 5, 128–155 (2010).
[25] M. Abeles, G. Hayon, and D. Lehmann, "Modeling compositionality by dynamic binding of synfire chains", J. Comput. Neurosci. 17, 179–201 (2004).
[26] P. Dayan and L. Abbott, Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems (MIT Press, Cambridge, MA) (2001).
[27] J. C. R. Licklider, "A duplex theory of pitch perception", Experientia 7, 128–134 (1951).
[28] S. Shamma, "On the role of space and time in auditory processing", Trends Cogn. Sci. 5, 340–348 (2001).
[29] P. Joris, P. Smith, and T. Yin, "Coincidence detection in the auditory system: 50 years after Jeffress", Neuron 21, 1235–1238 (1998).
[30] S. Dear and N. Suga, "Delay-tuned neurons in the midbrain of the big brown bat", J. Neurophysiol. 73, 1084–1100 (1995).
[31] J. F. Olsen and N. Suga, "Combination-sensitive neurons in the medial geniculate body of the mustached bat: encoding of target range information", J. Neurophysiol. 65, 1275–1296 (1991).
[32] J. A. Simmons and J. E. Gaudette, "Biosonar echo processing by frequency-modulated bats", IET Radar Sonar Navig. 6, 556–565 (2012).
[33] N. Tishby, F. Pereira, and W. Bialek, "The information bottleneck method", arXiv preprint, 1–16 (2000).
[34] L. Buesing and W. Maass, "A spiking neuron as information bottleneck", Neural Comput. 22, 1961–1992 (2010).
[35] D. Johnson, "Information theory and neural information processing", IEEE Trans. Inf. Theory 56, 653–666 (2010).
[36] T. Lu and X. Wang, "Information content of auditory cortical responses to time-varying acoustic stimuli", J. Neurophysiol. 91, 301 (2004).
[37] W. Bialek, F. Rieke, R. R. de Ruyter van Steveninck, and D. Warland, "Reading a neural code", Science 252, 1854–1857 (1991).
[38] E. M. Izhikevich, "Polychronization: Computation with spikes", Neural Comput. 18, 245–282 (2006).
[39] E. M. Izhikevich, Dynamical Systems in Neuroscience: The Geometry of Excitability and Bursting (MIT Press, Cambridge, MA) (2007).
[40] P. Chadderton, J. P. Agapiou, D. McAlpine, and T. W. Margrie, "The synaptic representation of sound source location in auditory cortex", J. Neurosci. 29, 14127–14135 (2009).
[41] F. L. Wightman and D. J. Kistler, "Monaural sound localization revisited", J. Acoust. Soc. Am. 101, 1050–1063 (1997).
[42] R. A. Butler and R. A. Humanski, "Localization of sound in the vertical plane with and without high-frequency spectral cues", Percept. Psychophys. 51, 182–186 (1992).
[43] R. A. Butler, R. A. Humanski, and A. D. Musicant, "Binaural and monaural localization of sound in two-dimensional space", Perception 19, 241–256 (1990).
[44] H. Neubauer and P. Heil, "A physiological model for the stimulus dependence of first-spike latency of auditory-nerve fibers", Brain Res. 1220, 208–223 (2008).
[45] B. J. Fischer, L. J. Steinberg, B. Fontaine, R. Brette, and J. L. Peña, "Effect of instantaneous frequency glides on interaural time difference processing by auditory coincidence detectors", Proc. Natl. Acad. Sci. U.S.A. 108, 18138–18143 (2011).
[46] A. Brand, O. Behrend, T. Marquardt, D. McAlpine, and B. Grothe, "Precise inhibition is essential for microsecond interaural time difference coding", Nature 417, 543–547 (2002).
[47] J. Ma and R. Müller, "A method for characterizing the biodiversity in bat pinnae as a basis for engineering analysis", Bioinspiration Biomimetics 6, 026008 (2011).
[48] N. H. Fletcher and S. Thwaites, "Obliquely truncated simple horns: Idealized models for vertebrate pinnae", Acustica 65, 194–204 (1988).
[49] E. Lopez-Poveda, "Spectral processing by the peripheral auditory system: Facts and models", Int. Rev. Neurobiol. 70, 7–48 (2005).
[50] R. Müller, "A numerical study of the role of the tragus in the big brown bat", J. Acoust. Soc. Am. 116, 3701–3712 (2004).
[51] M. Aytekin, E. Grassi, M. Sahota, and C. Moss, "The bat head-related transfer function reveals binaural cues for sound localization in azimuth and elevation", J. Acoust. Soc. Am. 116, 3594–3605 (2004).
[52] J. A. Simmons and A. Megela Simmons, "Bats and frogs and animals in between: Evidence for a common central timing mechanism to extract periodicity pitch", J. Comp. Physiol. A 197, 585–594 (2010).
[53] D. Griffin, Listening in the Dark: The Acoustic Orientation of Bats and Men (Cornell University Press, London) (1958).
[54] N. Veselka, D. D. McErlain, D. W. Holdsworth, J. L. Eger, R. K. Chhem, M. J. Mason, K. L. Brain, P. A. Faure, and M. B. Fenton, "A bony connection signals laryngeal echolocation in bats", Nature 463, 939–942 (2010).
[55] G. Neuweiler, The Biology of Bats (Oxford University Press, New York, NY) (2000).
[56] T. W. Cranford, M. Amundin, and K. S. Norris, "Functional morphology and homology in the odontocete nasal complex: Implications for sound generation", J. Morphol. 228, 223–285 (1996).
[57] J. L. Aroyan, "Three-dimensional numerical simulation of biosonar signal emission and reception in the common dolphin", Ph.D. thesis, University of California at Santa Cruz, Santa Cruz, CA (1996).
[58] T. W. Cranford, P. Krysl, and J. A. Hildebrand, "Acoustic pathways revealed: Simulated sound transmission and reception in Cuvier's beaked whale (Ziphius cavirostris)", Bioinspiration Biomimetics 3, 016001 (2008).
[59] W. E. Evans, "Echolocation by marine delphinids and one species of fresh-water dolphin", J. Acoust. Soc. Am. 54, 191–199 (1973).
[60] Y. Yovel, B. Falk, C. F. Moss, and N. Ulanovsky, "Optimal localization by pointing off axis", Science 327, 701–704 (2010).
[61] Q. Zhuang and R. Müller, "Noseleaf furrows in a horseshoe bat act as resonance cavities shaping the biosonar beam", Phys. Rev. Lett. 97, 218701 (2006).
[62] D. Vanderelst, F. De Mey, H. Peremans, I. Geipel, E. Kalko, and U. Firzlaff, "What noseleaves do for FM bats depends on their degree of sensorial specialization", PLoS ONE 5, e11893 (2010).
[63] A. Surlykke and C. F. Moss, "Echolocation behavior of big brown bats, Eptesicus fuscus, in the field and the laboratory", J. Acoust. Soc. Am. 108, 2419–2429 (2000).
[64] R. Altes and E. Titlebaum, "Bat signals as optimally Doppler tolerant waveforms", J. Acoust. Soc. Am. 48, 1014–1020 (1970).
[65] R. Altes, "Ubiquity of hyperacuity", J. Acoust. Soc. Am. 85, 943–952 (1989).
[66] F. C. Fraser and P. E. Purves, "Hearing in cetaceans", Bulletin of the British Museum (Natural History) (1954).
[67] F. C. Fraser and P. E. Purves, "Hearing in cetaceans: Evolution of the accessory air sacs and the structure and function of the outer and middle ear in recent cetaceans", Bulletin of the British Museum (Natural History) (1960).
[68] K. S. Norris, "Some problems of echolocation in cetaceans", in Marine Bioacoustics, edited by W. N. Tavolga, 316–336 (Pergamon Press, New York, NY) (1964).
[69] K. S. Norris, "The evolution of acoustic mechanisms in odontocete cetaceans", in Evolution and Environment, edited by E. T. Drake, 297–324 (Yale University Press, New Haven, CT) (1968).
[70] K. S. Norris, "The echolocation of marine mammals", in The Biology of Marine Mammals, edited by H. T. Anderson, 391–423 (Academic Press, New York, NY) (1969).
[71] R. L. Brill, M. L. Sevenich, T. J. Sullivan, J. D. Sustman, and R. E. Witt, "Behavioral evidence for hearing through the lower jaw by an echolocating dolphin (Tursiops truncatus)", Marine Mammal Science 4, 223–230 (1988).
[72] A. Rihaczek, Principles of High-Resolution Radar (Artech House, Norwood, MA) (1996).
[73] M. I. Skolnik, Introduction to Radar Systems, 3rd edition (McGraw-Hill, Boston, MA) (2001).
[74] J. A. Simmons, "The resolution of target range by echolocating bats", J. Acoust. Soc. Am. 54, 157–173 (1973).
[75] D. Menne and H. Hackbarth, "Accuracy of distance measurement in the bat Eptesicus fuscus: Theoretical aspects and computer simulations", J. Acoust. Soc. Am. 79, 386–397 (1986).
[76] M. I. Sanderson, N. Neretti, N. Intrator, and J. A. Simmons, "Evaluation of an auditory model for echo delay accuracy in wideband biosonar", J. Acoust. Soc. Am. 114, 1648–1659 (2003).
[77] N. Neretti, M. Sanderson, N. Intrator, and J. Simmons, "Time-frequency model for echo-delay resolution in wideband biosonar", J. Acoust. Soc. Am. 113, 2137–2147 (2003).
[78] P. Saillant, J. Simmons, S. Dear, and T. McMullen, "A computational model of echo processing and acoustic imaging in frequency-modulated echolocating bats: The spectrogram correlation and transformation receiver", J. Acoust. Soc. Am. 94, 2691–2712 (1993).
[79] J. Simmons, P. Saillant, and S. Boatright, "Biologically inspired SCAT sonar receiver for 2-D imaging", J. Acoust. Soc. Am. 102, 3153 (1997).
[80] P. A. Saillant, "Neural Computations for Biosonar Imaging in the Big Brown Bat", Ph.D. thesis, Brown University, Providence, RI (1995).
[81] L. N. Kloepper, P. E. Nachtigall, M. J. Donahue, and M. Breese, "Active echolocation beam focusing in the false killer whale, Pseudorca crassidens", J. Exp. Biol. 215, 1306–1312 (2012).
[82] L. N. Kloepper, P. E. Nachtigall, C. Quintos, and S. A. Vlachos, "Single-lobed frequency-dependent beam shape in an echolocating false killer whale (Pseudorca crassidens)", J. Acoust. Soc. Am. 131, 577–581 (2012).
[83] J. Simmons, C. Moss, and M. Ferragamo, "Convergence of temporal and spectral information into acoustic images of complex sonar targets perceived by the echolocating bat, Eptesicus fuscus", J. Comp. Physiol. A 166, 449–470 (1990).
[84] M. Sanderson and J. Simmons, "Neural responses to overlapping FM sounds in the inferior colliculus of echolocating bats", J. Neurophysiol. 83, 1840–1855 (2000).
[85] M. Sanderson and J. Simmons, "Selectivity for echo spectral interference and delay in the auditory cortex of the big brown bat Eptesicus fuscus", J. Neurophysiol. 87, 2823–2834 (2002).
[86] B. K. Branstetter, S. J. Mevissen, L. M. Herman, A. Pack, and S. P. Roberts, "Horizontal angular discrimination by an echolocating bottlenose dolphin Tursiops truncatus", Bioacoustics 14, 15–34 (2003).
[87] J. Wotton and J. Simmons, "Spectral cues and perception of the vertical position of targets by the big brown bat, Eptesicus fuscus", J. Acoust. Soc. Am. 107, 1034–1041 (2000).
[88] J. Wotton, T. Haresign, M. Ferragamo, and J. Simmons, "Sound source elevation and external ear cues influence the discrimination of spectral notches by the big brown bat, Eptesicus fuscus", J. Acoust. Soc. Am. 100, 1764–1776 (1996).
[89] Z. M. Fuzessery, "Monaural and binaural spectral cues created by the external ears of the pallid bat", Hearing Res. 95, 1–17 (1996).
[90] W. M. Masters, A. J. Moffat, and J. A. Simmons, "Sonar tracking of horizontally moving targets by the big brown bat Eptesicus fuscus", Science 228, 1331–1333 (1985).
[91] R. Altes, "Angle estimation and binaural processing in animal echolocation", J. Acoust. Soc. Am. 63, 155–173 (1978).
[92] R. Müller, "Numerical analysis of biosonar beamforming mechanisms and strategies in bats", J. Acoust. Soc. Am. 128, 1414–1425 (2010).
[93] J. Reijniers and H. Peremans, "Biomimetic sonar system performing spectrum-based localization", IEEE Trans. Robot. 23, 1151–1159 (2007).
[94] B. K. Branstetter and E. Mercado, III, "Sound localization by cetaceans", International Journal of Comparative Psychology 19, 26–61 (2006).
[95] S. Sümer, A. Denzinger, and H.-U. Schnitzler, "Spatial unmasking in the echolocating big brown bat, Eptesicus fuscus", J. Comp. Physiol. A 195, 463–472 (2009).
[96] J. A. Simmons, S. A. Kick, B. D. Lawrence, C. Hale, C. Bard, and B. Escudie, "Acuity of horizontal angle discrimination by the echolocating bat, Eptesicus fuscus", J. Comp. Physiol. A 153, 321–330 (1983).
[97] M. E. Bates, S. A. Stamper, and J. A. Simmons, "Jamming avoidance response of big brown bats in target detection", J. Exp. Biol. 211, 106–113 (2008).
[98] M. Warnecke, M. E. Bates, V. Flores, and J. A. Simmons, "Spatial release from simultaneous echo masking in bat sonar", J. Acoust. Soc. Am. 135, 1–9 (2014).
[99] M. E. Bates, J. A. Simmons, and T. V. Zorikov, "Bats use echo harmonic structure to distinguish their targets from background clutter", Science 333, 627–630 (2011).
[100] R. M. Pope and E. S. Fry, "Absorption spectrum (380–700 nm) of pure water. II. Integrating cavity measurements", Applied Optics 36, 8710–8723 (1997).
[101] G. E. Becker and S. H. Autler, "Water vapor absorption of electromagnetic radiation in the centimeter wave-length range", Physical Review 70, 300–307 (1946).
[102] X. Lurton, An Introduction to Underwater Acoustics: Principles and Applications (Springer, New York) (2002).
[103] H. S. Maxim, A New System for Preventing Collisions at Sea (Cassell and Company, London) (1912).
[104] R. Urick, Principles of Underwater Sound, 3rd edition (Peninsula Publishing, Los Altos, CA) (1983).
[105] W. Burdic, Underwater Acoustic System Analysis, 2nd edition (Peninsula Publishing, Los Altos, CA) (2003).
[106] B. Maranda, "Efficient digital beamforming in the frequency domain", J. Acoust. Soc. Am. 86, 1813–1819 (1989).
[107] M. Bono, B. Shapo, P. McCarty, and R. Bethel, "Subband energy detection in passive array processing", Technical Report ADA405484, Univ. of Texas at Austin, Applied Research Labs., Austin, TX (2000).
[108] V. Valimaki and T. Laakso, "Principles of fractional delay filters", in Proc. IEEE ICASSP '00, 3870–3873 (2000).
[109] D. H. Johnson and D. E. Dudgeon, Array Signal Processing: Concepts and Techniques (Prentice Hall PTR, Upper Saddle River, NJ) (1993).
[110] D. Abraham, "Short Course on Array Signal Processing for Sonar", in 166th Meeting of the Acoustical Society of America (San Francisco, CA) (2013).
[111] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Pro- cessing, 2nd edition (Prentice Hall PTR, Englewood Cliffs, NJ) (1999). [112] P. P. Vaidyanathan and P. Pal, “Sparse coprime sensing with multidimensional lattice arrays”, Digital Signal Processing Workshop IEEE 425–430 (2011). [113] M. J. Hinich, “Processing spatially aliased arrays”, J. Acoust. Soc. Am. 64, 792–794 (1978).
46 [114] K. Drakakis, “A review of Costas arrays”, J. Appl. Math. 2006, 1–32 (2006). [115] J. Costas, “A study of a class of detection waveforms having nearly ideal range— Doppler ambiguity properties”, Proc. IEEE 72, 996–1009 (1984). [116] M. P. Hayes and P. T. Gough, “Synthetic aperture sonar: A review of current status”, IEEE J. Ocean. Eng. 34, 207–224 (2009). [117] A. Bellettini and M. A. Pinto, “Theoretical accuracy of synthetic aperture sonar micronavigation using a displaced phase-center antenna”, IEEE J. Ocean. Eng. 27, 780–789 (2002). [118] M. Pinto, “Use of frequency and transmitter location diversities for ambiguity suppression in synthetic aperture sonar systems”, in OCEANS ’97. MTS/IEEE Proc., 363–368 (1997). [119] K. F. Nieman, K. A. Perrine, T. L. Henderson, K. H. Lent, T. J. Brudner, and B. L. Evans, “Wideband monopulse spatial filtering for large receiver arrays for reverberant underwater communication channels”, in Proc. IEEE OCEANS 2010 MTE, 1–8 (IEEE) (2010). [120] E. Mosca, “Angle estimation in amplitude comparison monopulse systems”, IEEE Trans. Aerosp. Electron. Syst. AES-5, 205–212 (1969). [121] G. Llort-Pujol, C. Sintes, and D. Gueriot, “Analysis of Vernier interferometers for sonar bathymetry”, in Proc. IEEE OCEANS ’08, 1–5 (IEEE) (2008). [122] G. Llort-Pujol, C. Sintes, and X. Lurton, “A new approach for fast and high- resolution interferometric bathymetry”, in Proc IEEE OCEANS ’06, 1–7 (2006). [123] R. G. Lorenz and S. P. Boyd, “Robust minimum variance beamforming”, IEEE Trans. Signal Process. 53, 1684–1696 (2005). [124] J. Capon, “High-resolution frequency-wavenumber spectrum analysis”, Proc. IEEE 57, 1408–1418 (1969). [125] J. W. Odendaal, E. Barnard, and C. W. I. Pistorius, “Two-dimensional super- resolution radar imaging using the MUSIC algorithm”, IEEE Trans. Antennas Propagat. 42, 1386–1391 (1994). [126] R. Schmidt, “Multiple emitter location and signal parameter estimation”, IEEE Trans. Antennas Propagat. 34, 276–280 (1986). 
[127] A. Baggeroer and W. Kuperman, “An overview of matched field methods in ocean acoustics”, IEEE J. Ocean. Eng. 18, 401–424 (1993). [128] A. Baggeroer, W. Kuperman, and H. Schmidt, “Matched field processing: Source localization in correlated noise as an optimum parameter estimation problem”, J. Acoust. Soc. Am. 83, 571–587 (1988). [129] R. L. Thompson, J. Seawall, and T. Josserand, “Two dimensional and three dimensional imaging results using blazed arrays”, in Proc. IEEE OCEANS ’01, 985–988 (2001).
47 [130] R. L. Thompson and W. J. Zehner, “Frequency-steered acoustic beam forming system and process”, US Patent Office 5,923,617 (1999). [131] M. Hiipakka, T. Kinnari, and V. Pulkki, “Estimating head-related transfer functions of human subjects from pressure–velocity measurements”, J. Acoust. Soc. Am. 131, 4051–4061 (2012). [132] V. A. Gordienko, V. I. Il’ichev, and L. N. Zakharov, Vector-phase methods in acoustics (George Washington University, Seattle, WA) (1989). [133] D. M. Donskoy and B. A. Cray, “Acoustic particle velocity horns”, J. Acoust. Soc. Am. 131, 3883 (2012). [134] A. Nehorai and E. Paldi, “Acoustic vector-sensor array processing”, IEEE Trans. Signal Process. 42, 2481–2491 (1994). [135] J. V. Candy, Model-Based Signal Processing (John Wiley & Sons, Hoboken, NJ) (2005). [136] L. B. Jackson, Digital Filters and Signal Processing with MATLAB Exercises, 3rd edition (Klewer Academic Publishers, Norwell, MA) (1995). [137] S. S. Haykin, Adaptive Filter Theory, 5th edition (Prentice Hall, Upper Saddle River, NJ) (2013). [138] D. Simon, Optimal State Estimation, Kalman, H Infinity, and Nonlinear Ap- proaches (John Wiley & Sons, Hoboken, NJ) (2006). [139] R. Van der Merwe and E. Wan, “The square-root unscented Kalman filter for state and parameter-estimation”, in IEEE ICASSP ’01 Proc., 3461–3464 vol.6 (2001). [140] D. Gamerman and H. F. Lopes, Markov Chain Monte Carlo, Stochastic Sim- ulation for Bayesian Inference, Second Edition, 2nd edition (CRC Press, Boca Raton, FL) (2006). [141] I. Steinwart and A. Christmann, Support Vector Machines (Springer, New York) (2008). [142] S. S. Haykin, Neural Networks and Learning Machines (Prentice Hall, Upper Saddle River, NJ) (2009). [143] T. Irino and R. Patterson, “A time-domain, level-dependent auditory filter: The gammachirp”, J. Acoust. Soc. Am. 101, 412–419 (1997). [144] C. Sumner, L. O’Mard, E. Lopez-Poveda, and R. Meddis, “A nonlinear filter- bank model of the guinea-pig cochlear nerve: Rate responses”, J. 
Acoust. Soc. Am. 113, 3264–3274 (2003). [145] E. Covey and J. H. Casseday, “Connectional basis for frequency representation in the nuclei of the lateral lemniscus of the bat Eptesicus fuscus”, J. Neurosci. (1986).
48 [146] J. A. Simmons, M. B. Fenton, and M. J. O’Farrel, “Echolocation and pursuit of prey by bats”, Science 203, 16–21 (1979). [147] Ferragamo, M. Sanderson, and J. Simmons, “Phase sensitivity of auditory brain- stem responses in echolocating big brown bats”, J. Acoust. Soc. Am. 112, 2288 (2002). [148] J. A. Simmons, M. Ferragamo, C. F. Moss, S. B. Stevenson, and R. A. Altes, “Discrimination of jittered sonar echoes by the echolocating bat, Eptesicus fus- cus: The shape of target images in echolocation”, J. Comp. Physiol. A 167, 589–616 (1990). [149] R. Roverud, “Complex sound analysis in the lesser bulldog bat: Evidence for a mechanism for processing frequency elements of frequency modulated signals over restricted time intervals”, J. Comp. Physiol. A 174, 559–565 (1994). [150] M. Ferragamo, T. Haresign, and J. Simmons, “Frequency tuning, latencies, and responses to frequency-modulated sweeps in the inferior colliculus of the echolocating bat, Eptesicus fuscus”, J. Comp. Physiol. A 182, 65–79 (1997). [151] H. Peremans and J. Hallam, “The spectrogram correlation and transformation receiver, revisited”, J. Acoust. Soc. Am. 104, 1101–1110 (1998). [152] M. Park and R. Allen, “Pattern-matching analysis of fine echo delays by the spectrogram correlation and transformation receiver”, J. Acoust. Soc. Am. 128, 1490–1500 (2010). [153] W. Martin and P. Flandrin, “Wigner-Ville spectral analysis of nonstationary processes”, IEEE Trans. Acoust., Speech, Signal Process. 33, 1461–1470 (1985). [154] M. I. Sanderson, “The representation of temporal and spectral information cor- responding to target range in the auditory system of the big brown bat”, Ph.D. thesis, Brown University, Providence, RI (2002). [155] S. Mann and S. S. Haykin, “The chirplet transform: physical considerations”, IEEE Trans. Signal Process. 43, 2745–2761 (1995). [156] I. Matsuo, K. Kunugiyama, and M. 
Yano, “An echolocation model for range dis- crimination of multiple closely spaced objects: Transformation of spectrogram into the reflected intensity distribution”, J. Acoust. Soc. Am. 115, 920–928 (2004). [157] I. Matsuo and M. Yano, “An echolocation model for the restoration of an acous- tic image from a single-emission echo”, J. Acoust. Soc. Am. 116, 3782–3788 (2004). [158] I. Matsuo, J. Tani, and M. Yano, “A model of echolocation of multiple targets in 3D space from a single emission”, J. Acoust. Soc. Am. 110, 607–624 (2001). [159] N. S. Sharma, J. R. Buck, and J. A. Simmons, “Trading detection for resolution in active sonar receivers”, J. Acoust. Soc. Am. 130, 1272 (2011).
49 [160] N. S. Sharma and J. Buck, “A generalized linear filter approach for sonar re- ceivers”, in IEEE DSP/SPE 2009, 507–512 (2009). [161] I. Matsuo, “Localization and tracking of moving objects in two-dimensional space by echolocation”, J. Acoust. Soc. Am. 133, 1151–1157 (2013). [162] S. E. Forsythe, H. A. Leinhos, and P. R. Bandyopadhyay, “Dolphin-inspired combined maneuvering and pinging for short-distance echolocation”, J. Acoust. Soc. Am. 124, EL255–EL261 (2008). [163] L. Wiegrebe, “An autocorrelation model of bat sonar”, Biol. Cybern. 98, 587– 595 (2008). [164] R. M¨ullerand J. C. T. Hallam, “Knowledge mining for biomimetic smart an- tenna shapes”, Rob. Autom. Syst. 50, 131–145 (2005). [165] R. M¨uller,H. Lu, and J. Buck, “Sound-diffracting flap in the ear of a bat generates spatial information”, Phys. Rev. Lett. 100, 108701 (2008). [166] D. Vanderelst, J. Reijniers, J. Steckel, and H. Peremans, “Information gener- ated by the moving pinnae of Rhinolophus rouxi: Tuning of the morphology at different harmonics”, PLoS ONE 6, e20627 (2011). [167] J. Reijniers, D. Vanderelst, and H. Peremans, “Morphology-induced information transfer in bat sonar”, Phys. Rev. Lett. 105, 148701 (2010). [168] D. Vanderelst, J. Reijniers, F. Schillebeeckx, and H. Peremans, “Evaluat- ing three-dimensional localisation information generated by bio-inspired in-air sonar”, IET Radar Sonar Navig. 6, 516–525 (2012). [169] T. Horiuchi, “A systems view of a neuromorphic VLSI echolocation system”, IEEE ISCAS 2008 (2007). [170] T. Horiuchi, “Seeing in the dark: Neuromorphic VLSI modeling of bat echolo- cation”, IEEE Signal Process. Mag. 22, 134–139 (2005). [171] G. Cauwenberghs, R. Edwards, Y. Deng, R. Genov, and D. Lemonds, “Neuro- morphic processor for real-time biosonar object detection”, IEEE ICASSP ’02 Proc. 4, 3984–3987 (2001). [172] C. Clarke and L. Qiang, “Bat on an FPGA: A biomimetic implementation of a highly parallel signal processing system”, in Proc. 
IEEE ACSSC ’04, 456–460 (2004). [173] T. Horiuchi, “A spike-latency model for sonar-based navigation in obstacle fields”, IEEE Trans. Circuits Syst. I, Reg. Papers 56, 2393–2401 (2009). [174] T. Horiuchi, “A neural model for sonar-based navigation in obstacle fields”, IEEE ISCAS 2008 605–608 (2006). [175] M. Cheely and T. Horiuchi, “A VLSI model of range-tuned neurons in the bat echolocation system”, IEEE ISCAS 2003 4, 872–875 (2003).
50 [176] T. Horiuchi, “A neuromorphic VLSI model of bat interaural level difference pro- cessing for azimuthal echolocation”, IEEE Trans. Circuits Syst. I, Reg. Papers 54, 74–88 (2007). [177] T. Horiuchi, “A VLSI model of the bat dorsal nucleus of the lateral lemniscus for azimuthal echolocation”, IEEE ISCAS 2005 5, 4217–4220 (2005). [178] R. Z. Shi and T. K. Horiuchi, “A VLSI model of the bat lateral superior olive for azimuthal echolocation”, in IEEE ISCAS ’04, 900–903 (2004). [179] T. Horiuchi, “Spike-based VLSI modeling of the ILD system in the echolocating bat”, Neural Networks (2001). [180] T. Horiuchi, “Binaural spectral cues for ultrasonic localization”, IEEE ISCAS 2008 2110–2113 (2008). [181] H. Abdalla and T. K. Horiuchi, “Spike-based acoustic signal processing chips for detection and localization”, in 2008 IEEE Biomedical Circuits and Systems Conference, 225–228 (IEEE) (2008). [182] T. Horiuchi, “An ultrasonic filterbank with spiking neurons”, IEEE ISCAS 2008 (2005). [183] R. Kuc, “Biomimetic sonar and neuromorphic processing eliminate reverbera- tion artifacts”, IEEE Sensors J. 7, 361–369 (2007). [184] R. Kuc, “Biomimetic sonar locates and recognizes objects”, J. Ocean. Eng., IEEE 22, 616–624 (1997). [185] F. Schillebeeckx, J. Reijniers, and H. Peremans, “Probabilistic spectrum based azimuth estimation with a binaural robotic bat head”, in 2008 Fourth Inter- national Conference on Autonomic and Autonomous Systems (ICAS), 142–147 (IEEE) (2008). [186] F. Schillebeeckx and H. Peremans, “Biomimetic sonar: 3D-localization of mul- tiple reflectors”, in IEEE/RSJ International Conference on Intelligent Robots and Systems, 3079–3084 (2010). [187] F. Guarato, L. Jakobsen, D. Vanderelst, A. Surlykke, and J. Hallam, “A method for estimating the orientation of a directional sound source from source direc- tivity and multi-microphone recordings: Principles and application”, J. Acoust. Soc. Am. 129, 1046–1058 (2011). [188] J. Steckel, A. Boen, and H. 
Peremans, “Broadband 3-D sonar system using a sparse array for indoor navigation”, IEEE Trans. Robot. 29, 161–171 (2013). [189] J. Steckel and H. Peremans, “A novel biomimetic sonarhead using beamform- ing technology to mimic bat echolocation”, IEEE Tran. Ultrason., Ferroelectr., Freq. Control 59, 1369–1377 (2012). [190] J. Steckel, F. Schillebeeckx, and H. Peremans, “Biomimetic sonar, outer ears versus arrays”, in Sensors, 2011 IEEE, 821–824 (2011).
51 [191] J. Steckel and H. Peremans, “BatSLAM: Simultaneous Localization and Map- ping Using Biomimetic Sonar”, PLoS ONE 8, e54076 (2013).
Chapter 3
Multi-Component Separation and Analysis of Bat Echolocation Calls
Abstract
The vast majority of animal vocalizations contain multiple FM components with varying amounts of non-linear modulation and harmonic instability. This is especially true of biosonar sounds, where precise time-frequency templates are essential for neural information processing of echoes. Understanding the dynamic waveform design by bats and other echolocating animals may help to improve the efficacy of man-made sonar through biomimetic design. Bats are known to adapt their call structure based on the echolocation task, proximity to nearby objects, and density of acoustic clutter. To interpret the significance of these changes, a method was developed for component separation and analysis of biosonar waveforms. Techniques for imaging in the time-frequency plane are typically limited due to the uncertainty principle and interference cross-terms. This problem is addressed by extending the use of the fractional Fourier transform to isolate each non-linear component for separate analysis. Once separated, Empirical Mode Decomposition (EMD) can be used to further examine each component. The Hilbert transform may then successfully extract detailed time-frequency information from each isolated component. This multi-component analysis method is applied to the sonar signals of four species of bats recorded in-flight by radiotelemetry, along with a comparison of other common time-frequency representations.

The contents of this chapter were published in the Journal of the Acoustical Society of America. 2013 January; 133(1):538–546. [DOI: 10.1121/1.4768877].
3.1 Introduction
The active sonar call of the big brown bat (Eptesicus fuscus) contains multiple non-linear FM components that are harmonically related [1]. The scale-invariant properties of this species' echolocation signals [2, 3] imply that cross-correlation between the signal and the echo returns is insensitive to in-flight Doppler shifts. Furthermore, the call of E. fuscus is a multi-component signal that naturally increases the effective bandwidth and consequently improves range resolution. Despite the advantages for active sonar pulse design, these non-linear and multi-component characteristics make it difficult to precisely localize energy in the time-frequency plane. Animal vocalizations are typically described using conventional spectrograms, which have intrinsically low time-frequency resolution. Alternative representations may better capture the information that animals actually use, particularly since bats manifest greater time-frequency acuity. Small details in the call signal structure may appear subtle and unimportant, but could actually lead to statistically significant observations of the animals' behavior. An example of nearly indistinct, yet intentional, adaptive pulse design by E. fuscus is described in Hiryu et al. [4]. Using the spectrogram, they found that bats shifted echolocation frequencies by several kHz (< 4-8% of total bandwidth) to avoid pulse-echo ambiguity in dense clutter. Most interesting is the fact that the temporal cross-correlations between the pulse-echo pairs are nearly identical, which strongly suggests that these bats do not simply use conventional matched filtering for echo processing. Many different time-frequency representations (TFR) are used to process multi-component, linear, quadratic, and higher-order FM signals. If the signal is stationary, the Fourier Transform (FT) is an effective tool for analyzing the frequency content.
Figure 3.1. Four different time-frequency distributions of an FM echolocation call from E. fuscus. (a) The spectrogram shows that this species of bat produces at least two prominent harmonic components (labeled FM1, FM2, etc.), which is a common characteristic among many echolocating bats. (b) The Wigner-Ville distribution (WVD) provides very good resolution, but interference cross-terms incorrectly place energy within and between components. (c) Cross-terms are effectively removed at the cost of resolution in the Smoothed Pseudo WVD. (d) The reassignment method [7] (computed on c) is a highly effective technique for improving the readability of any TFR. Reassignment works by remapping the energy distributed in a TFR onto its center-of-gravity; however, it cannot show details that are unresolved in the base representation. All plots are shown on a normalized decibel scale.
However, the FT provides little insight into the nature of signals from nonstationary or nonlinear systems. For instance, quadratic phase (linear FM) signals are poorly represented by the FT because it is a transform from time to frequency, i.e., not a joint distribution in time and frequency. A common way around this issue is to take the FT of short moving windows of the signal in time, thus providing frequency information as a function of time. This leads us to the short-time Fourier transform and its squared modulus, the spectrogram. The difficulty with this approach is that the window must be short enough in time to provide good time resolution, yet narrow enough in bandwidth (i.e., long in time) to provide good frequency resolution. These simultaneous conflicting objectives lead to leakage of the spectral energy and a generally smeared appearance in the time-frequency plane. Use of the spectrogram has become ubiquitous due to its fast computation, simple interpretation, and widespread software integration; however, it is very difficult to resolve fine details from the spectrogram alone, especially if attempting to automate the process. Fig. 3.1 illustrates the spectrogram of an example echolocation call by E. fuscus alongside other TFRs, including the Wigner-Ville distribution (WVD) [5], the smoothed pseudo-WVD [6], and the reassignment method [7]. Many different methods have been used to visualize biosonar signals beyond the common TFRs. These include time-scale analysis [8], the Fractional Fourier Transform (FrFT) [9, 10, 11], wavelets [12], and the minimum variance estimator [13]. In these methods only a small number of signals were analyzed to show the processing technique. For practical applications, it is important to consider how well a method can automatically extract waveform parameters from a large set of data. Recently, a host of TFR tools based on the idea of polynomial phase signal models has appeared [14, 15, 16, 17]. They generally rely on adaptations of the ambiguity function, including multiple products, lagged versions, higher orders, or some combination thereof [18, 19, 20]. It is not surprising that this approach has received a great deal of attention, as the ambiguity function is itself the characteristic function of the WVD [21]. In other words, the WVD and ambiguity function form a Fourier pair [22]. A significant reason for not adopting these more mathematically rigorous parametric models is that they are often defended with the caveat that the amplitude be constant or slowly varying in time.
This condition cannot be guaranteed for biosonar signals, which contain unique amplitude modulations that change with each emitted pulse. Unfortunately, there is no single time-frequency technique that is optimized for all situations. While meaningful insight can be gleaned using parametric models or
appropriate TFRs for a specific signal, the notion of using an adaptive or empirical decomposition is attractive due to the complexity and nonlinearity of the bat's sound production system. Imaging techniques that improve time-frequency fidelity would more easily identify small differences in call structure (as found in Hiryu et al.). Resolving these differences is critical, however, for understanding how these changes are actually perceived by the bat. This paper extends the use of the FrFT and applies several techniques to separate and analyze nonlinear harmonic components in biosonar signals. The methodology should be easily extrapolated to other highly variable, multi-component signals, such as calls of other bat species, marine mammal calls and whistles, insect communication, and voiced speech.
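The window-length tradeoff described above can be made concrete with a few lines of code. The sketch below is a simple illustration using a synthetic linear FM sweep; the parameter values are arbitrary choices and are not taken from the recorded calls. It computes spectrograms of the same signal with a short and a long window and prints the resulting bin spacings:

```python
import numpy as np
from scipy.signal import spectrogram

fs = 250_000                      # sampling rate (Hz), matching the 250 kHz recordings
t = np.arange(0, 0.003, 1 / fs)   # 3 ms signal, roughly the duration of one call
x = np.cos(2 * np.pi * (80e3 * t - 1e7 * t**2))  # synthetic linear FM, 80 -> 20 kHz

for nperseg in (64, 256):         # short window vs. long window
    f, tt, S = spectrogram(x, fs=fs, nperseg=nperseg, noverlap=nperseg // 2)
    print(f"window={nperseg:4d}: freq bin = {f[1] - f[0]:7.1f} Hz, "
          f"time step = {(tt[1] - tt[0]) * 1e6:6.1f} us")
```

The longer window shrinks the frequency-bin spacing by a factor of four but widens the time step by the same factor: resolution gained on one axis is paid for on the other, which is exactly the conflict the spectrogram cannot escape.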
3.2 Data Collection
The algorithm was developed and refined using a single E. fuscus call recorded at high signal-to-noise ratio (Fig. 3.1). An ultrasonic free-field microphone (Series 4139, Brüel & Kjær) was placed directly in front of the bat on a stationary platform at approximately 20 cm. A recording was made while the bat performed a 2-choice discrimination test. The echolocation signal was recorded with a digital audio recorder (ISC-16, R.C. Electronics) at a 250 kHz sampling rate [23]. Typical of this species of bat, the signal is non-linearly modulated, with two principal harmonics, FM1 and FM2, along with a partial 3rd harmonic, FM3. To evaluate the utility of our method, we analyzed a body of existing data. This consisted of biosonar sounds recorded from four species of bats using a radio microphone (“Telemike”) carried by the flying bat [4, 24, 25, 26]. The Telemike includes an electret condenser microphone (FG Series, Knowles Acoustics, IL, USA) positioned above the bat's head and attached to a miniature radio transmitter used to record the sounds without the acoustic artifacts that normally occur when a moving
bat is recorded by a stationary microphone. The data set included calls from E. fuscus, the eastern bent-winged bat (Miniopterus fuliginosus), the Japanese house bat (Pipistrellus abramus), and the greater horseshoe bat (Rhinolophus ferrumequinum). For each species the time series contained multiple biosonar signals recorded while the animal was navigating through a flight room used for testing their responses to clutter. The flights and recordings were conducted in the laboratories of Hiroshi Riquimaroux and Shizuko Hiryu at Doshisha University (Kyotanabe, Japan) or at Brown University. During recording, the signals were digitally sampled at either 384 kHz or 192 kHz [4, 24].
3.3 Methods
The multi-component analysis presented here is a two-part process: separation of harmonic components followed by mono-component decomposition. Component separation includes a new use of the FrFT to find a rough approximation of the instantaneous frequency, f_i(t), time-varying demodulation centered about f_i(t), and a zero-phase filtering technique that will not affect the phase or group delay of the signal component. Mono-component decomposition consists of applying analysis techniques such as Empirical Mode Decomposition (EMD) and Hilbert spectral analysis. The resulting decomposition produces highly resolvable images of each component in the time-frequency plane. The reader is referred to the Appendices for an overview of our definitions of a multi-component waveform and how Hilbert spectral analysis can be used to extract this information.
3.3.1 Separation of Harmonic Components
Component separation may be performed in a variety of ways; however, the following demonstrates a robust approach that combines the use of the Fractional Fourier Transform, demodulation, and zero-phase filtering. The FrFT provides an easy way
to approximate a component's instantaneous frequency. We apply a time-varying bandpass filter along this estimate to isolate the component. Subtracting the result from the original signal allows the process to be repeated until all components have been iteratively separated.
3.3.1.1 Fractional Fourier Transform
The fractional Fourier transform (FrFT) and Radon-Wigner transform (RWT) are both fractional rotations of a signal from the time domain to the frequency domain in the time-frequency plane. The FrFT can be defined in its more familiar integral form [27] as
\[
\mathrm{FrFT}(\phi, u) = \frac{e^{-i\left(\frac{\pi}{4}-\frac{\phi}{2}\right)}}{\sqrt{2\pi\sin(\phi)}}\int x(t)\,e^{\,i\frac{t^{2}+u^{2}}{2}\cot(\phi)}\,e^{-i\frac{tu}{\sin(\phi)}}\,dt \tag{3.1}
\]
The parameter φ is the angle of rotation in radians and u is the fractional dimension between time and frequency. Letting φ = απ/2, a rotation of α = 0 is simply the time series itself and a rotation of α = 1 is a traditional FT; any non-integer rotation produces a fractional FT. This can be accomplished easily by forming the Fourier unitary matrix, raising it to an arbitrary power, α, and then applying the resulting matrix to the signal. Repeatedly applying the FT to a signal is equivalent to raising this matrix to an integer power. For example, raising the matrix to the powers 0, 1, 2, and 3 results in the original time series, the FT, the time-reversed series, and the FT of the time-reversed signal, respectively.

The RWT is the Radon transform of the WVD. Geometrically, the RWT is a tomographic transform that combines a rotation of the WVD with a projection onto a one-dimensional axis at some angle of rotation φ. Like the WVD, the RWT results in a 2D distribution. Unlike the WVD, the RWT provides intensity information not as a function of time and frequency, but rather as a function of frequency and angle of rotation of the WVD. As a result, the relationship between the RWT and the FrFT follows
\[
RWT(\phi, u) = \left|\mathrm{FrFT}(\phi, u)\right|^{2} \tag{3.2}
\]
That is, the RWT is equivalent to the squared modulus of the FrFT [28, 29, 30]. It should be noted that, like the conventional FT, the FrFT is a linear operator. The WVD, and therefore the RWT, are both bilinear operators on the signal. As a result, the FrFT is a TFR which does not produce the cross-term interference associated with bilinear TFRs. Because the RWT is a projection onto a one-dimensional axis through a line integral at angle α, the two-dimensional, bilinear (quadratic) representation loses the cross-term interference during the projection, thus preserving the relationship between the RWT and the FrFT [29].
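The matrix-power construction mentioned above can be prototyped in a few lines. The sketch below is an illustrative stand-in, not the discrete FrFT implementation of [31]: it raises the unitary DFT matrix to a fractional power with SciPy's `fractional_matrix_power` and checks two sanity properties (two half rotations compose into one full FT, and α = 2 gives time reversal). Branch choices in the matrix logarithm mean this simple construction is only one of several possible discrete fractional Fourier operators.

```python
import numpy as np
from scipy.linalg import dft, fractional_matrix_power

N = 32
F = dft(N, scale='sqrtn')                  # unitary DFT matrix
F_half = fractional_matrix_power(F, 0.5)   # fractional power: a "half" Fourier transform

x = np.cos(2 * np.pi * 3 * np.arange(N) / N)   # arbitrary test signal

# Two half rotations compose into one full Fourier transform.
assert np.allclose(F_half @ (F_half @ x), F @ x, atol=1e-7)

# An exponent of 2 corresponds to time reversal, x[n] -> x[(-n) mod N].
F_two = fractional_matrix_power(F, 2.0)
assert np.allclose(F_two @ x, np.roll(x[::-1], 1), atol=1e-7)
```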
3.3.1.2 Rough Approximation of Instantaneous Frequency
This method uses a discrete implementation of the Fractional Fourier Transform (FrFT) [31] to compute the RWT of the analytic signal, x̃(t). Fig. 3.2 shows the signal from Fig. 3.1 in the rotation-fraction domain. Each column in the image is formed by computing the RWT of x̃(t) for a specific angle of rotation, α. Computing the RWT at more angles leads to better α resolution, and zero-padding or interpolating the signal will increase resolution in u. Every (α, u) pair corresponds to a specific line in the time-frequency plane. For a linear FM signal, f_i(t) = f_0 + kt can be precisely estimated by finding its peak in the rotation-fraction plane and solving for the constants f_0 and k as
Figure 3.2. Rotation-fraction domain of the E. fuscus signal. The FrFT is computed on the analytic time series signal at incremental rotation values, α. The squared modulus, |FrFT|², produces the vertical slices of the rotation-fraction domain. Each (α, u) point in the image corresponds to a unique line cutting across the time-frequency plane. Once the global peak on the surface is found, points along the local ridge (inset) represent lines passing through subsections of the nonlinear component in the time-frequency plane. A polynomial curve is fit to the intersection points of adjacent lines, which results in a rough estimate of f_i(t) for one component.
\[
k = -\frac{f_s^{2}}{T}\,\cot\!\left(\frac{\alpha\pi}{2}\right) \tag{3.3}
\]
\[
f_c = f_s\left(u - \frac{1}{2}\right)\csc\!\left(\frac{\alpha\pi}{2}\right) \tag{3.4}
\]
\[
f_0 = f_c - \frac{kT}{2} \tag{3.5}
\]
where f_s is the sampling frequency, T is the period of the signal, and f_c is the frequency at the midpoint of the line [32]. Since the bat's signal consists of nonlinear FM components, there is no single peak, but a continuous ridge where multiple (α, u) pairs correspond to lines that pass through subsections of a component. We make use of this fact by normalizing the RWT to the highest peak, detecting local points along the ridge above a threshold, then finding the intersection points of the lines from adjacent (α, u) pairs. This
generates points in the time-frequency plane along the most prominent component. The end points can be extended by projecting out from the first and last intersection points. Fitting a polynomial or spline curve to these points provides a rough approximation to f_i(t) for one component without a priori information on any FM parameters.
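The intersect-and-fit step can be sketched numerically. In the example below the RWT ridge detection itself is bypassed: the lines that adjacent (α, u) pairs would produce are generated as tangents to a known quadratic f_i(t), so the quality of the intersection-point fit can be checked against ground truth. The signal model, units, and polynomial degree are illustrative choices only.

```python
import numpy as np

# Ground-truth quadratic IF law (kHz vs. ms) standing in for the unknown component;
# in practice the RWT ridge supplies one (slope, intercept) line per (alpha, u) pair.
fi = np.poly1d([2.0, -40.0, 90.0])      # f(t) = 2 t^2 - 40 t + 90
dfi = fi.deriv()

t_touch = np.linspace(0.2, 2.8, 12)     # points where ridge lines touch the component
k = dfi(t_touch)                        # line slopes
f0 = fi(t_touch) - k * t_touch          # line intercepts

# Intersection of each pair of adjacent lines: k[i] t + f0[i] = k[i+1] t + f0[i+1]
t_x = (f0[1:] - f0[:-1]) / (k[:-1] - k[1:])
f_x = k[:-1] * t_x + f0[:-1]

fit = np.poly1d(np.polyfit(t_x, f_x, 3))   # least-squares cubic through the points

t_eval = np.linspace(0.3, 2.7, 50)
err = np.max(np.abs(fit(t_eval) - fi(t_eval)))
print(f"max |fit - true| = {err:.4f} kHz")
```

For tangent lines, each intersection point sits slightly below a convex f_i(t) (by the curvature times half the squared spacing of the tangency points), so the fit error shrinks quadratically as ridge points are sampled more densely.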
3.3.1.3 Zero-Phase Component Filtering
A time-varying bandpass filter is effectively applied to the analytic signal along the
instantaneous frequency approximation. This is achieved by first integrating f_i(t) to find the phase law, φ_i(t), as in Eq. (3.13), and demodulating the signal as
\[
\check{x}(t) = \tilde{x}(t)\,e^{-j\phi_i(t)} \tag{3.6}
\]
The demodulated complex signal, x̌(t), is then lowpass filtered to remove unwanted harmonics and reverberation. The filter bandwidth can be adjusted depending on the accuracy of the initial f_i(t) estimate. Note that a zero-phase forward-backward filter is required to minimize phase distortions and avoid introducing group delay:
\[
\check{Y}(e^{j\omega T}) = H(e^{-j\omega T})\,H(e^{j\omega T})\,\check{X}(e^{j\omega T}) \tag{3.7}
\]
The signal is then remodulated using the negative of the phase law:
\[
\tilde{y}(t) = \check{y}(t)\,e^{\,j\phi_i(t)} \tag{3.8}
\]
Each step is shown in Fig. 3.3 for the 2nd component, FM2. The process of rough approximation and zero-phase filtering is repeated for subsequent components (i.e., FM1 and FM3) once the isolated component, ỹ(t), is subtracted from the analytic signal, x̃(t). After each harmonic component has been effectively isolated, this opens the door for a variety of different processing options.
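Equations (3.6)-(3.8) translate almost line for line into code. The sketch below applies the procedure to a synthetic two-harmonic FM signal rather than a recorded call; the sweep, filter order, and 5 kHz cutoff are arbitrary illustrative choices, and SciPy's `filtfilt` supplies the zero-phase forward-backward filtering:

```python
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

fs = 250_000                              # sampling rate (Hz)
t = np.arange(0, 0.003, 1 / fs)           # 3 ms signal
f1 = 50e3 - 1e7 * t                       # FM1 instantaneous frequency: 50 -> 20 kHz
ph1 = 2 * np.pi * np.cumsum(f1) / fs      # FM1 phase (cumulative-sum integration)
x = np.cos(ph1) + 0.5 * np.cos(2 * ph1)   # FM1 plus a half-amplitude second harmonic

xa = hilbert(x)                           # analytic signal

fi2 = 2 * f1                              # rough IF estimate for FM2
phi = 2 * np.pi * np.cumsum(fi2) / fs     # phase law: integral of the IF estimate
xd = xa * np.exp(-1j * phi)               # Eq. (3.6): demodulate FM2 down to ~0 Hz
b, a = butter(4, 5e3 / (fs / 2))          # lowpass, 5 kHz cutoff
yd = filtfilt(b, a, xd.real) + 1j * filtfilt(b, a, xd.imag)   # Eq. (3.7): zero-phase
y = yd * np.exp(1j * phi)                 # Eq. (3.8): remodulate

env = np.abs(y)                           # recovered FM2 envelope (~0.5 away from edges)
```

Subtracting y from xa and repeating the procedure with an FM1 estimate would peel off the next component, mirroring the iterative scheme described above.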
Figure 3.3. Overview of FM2 component separation using a least-squares cubic approximation of f_i(t). Negative frequencies are shown to accommodate the frequency warping caused by demodulation. (a) The analytic signal, x̃(t), with the approximate f_i(t) curve for FM2. (b) FM2 is now clearly separable by frequency after demodulation to 0 Hz (x̌(t)). (c) A zero-phase lowpass filter is applied to remove other components (y̌(t)). (d) FM2 is modulated back using the negative phase law, resulting in ỹ(t). Through the process of component separation, the resulting component is free from non-overlapping echoes, reverberation, and background noise.
3.3.2 Monocomponent Decomposition
3.3.2.1 Empirical Mode Decomposition
Empirical mode decomposition (EMD) is a useful technique for analyzing nonlinear FM signals due to its robustness in handling nonstationary, nonlinear data. The EMD separates a time-series signal into multiple decompositions known as intrinsic mode functions (IMFs). An IMF is defined only if (1) the number of extrema and the number of zero-crossings are equal or at most differ by one, and (2) the mean of the envelope of the maxima and the envelope of the minima is zero at all points. This works due to the tacit relationship between zero-crossings and the frequency spectrum of a signal [33]. IMFs have properties conducive to signal processing, namely that they are linear and have well-behaved Hilbert transforms. Additionally, the EMD forms a basis which is complete, approximately orthogonal, local, and adaptive. The orthogonal property of the IMFs ensures that the energy associated with the distribution is positive, a critical designation for a time-frequency representation.
Figure 3.4. Shown here are results of the empirical mode decomposition on the separated second harmonic, FM2, from E. fuscus (Sec. 3.3.1.3). Since the EMD works strictly in the time domain, interpolation beyond the Nyquist rate is necessary to achieve good performance. FM2 was interpolated by a factor of 8 before EMD to avoid aliasing artifacts. Spectrograms for IMFs 1 through 5 (a-e) illustrate how energy is distributed amongst the IMFs. High-frequency noise is contained largely in IMFs 1 and 2 (a and b). IMFs 3 and 4 (c and d) contain the strongest parts of the signal, with a weaker part found in IMF 5 (e). Residual low-frequency energy is found in IMFs 6 through 13 (combined in f). IMFs 3-5 may be summed and passed on to later processing stages. Since the decomposition forms a complete basis, summation across all IMFs will result in the original signal. The color scale depth is set to 30 dB on all plots.
The result of the EMD is similar to that of passing the signal of interest through a filter bank [34]. The key differences are that the filtering is neither stationary nor restricted to separation in the time-frequency plane. In this regard, the IMF that results from the decomposition carries the same time-varying frequency modulation as the original signal, with much of the non-coherent energy (noise) and riding waves (DC to very low frequency) suppressed. Spectrograms of the IMFs generated from FM2 are shown in Fig. 3.4.
3.3.2.2 Hilbert Spectral Analysis
Computing instantaneous frequency and amplitude from the mono-component signals provides very useful information that cannot be easily found by other methods. In the discrete-time implementation, ai(t) is a straightforward absolute-value calculation on the complex analytic signal. Finding fi(t) involves numerical differentiation and therefore requires some approximation. Calculation of fi(t) for a filtered analytic component, ỹ(t), can be accomplished directly in discrete time by
f_i[k] = \frac{f_s}{4\pi} \angle\left( \tilde{y}[k+1]\, \tilde{y}^{*}[k-1] \right)    (3.9)
for k = 2, 3, \dots, N-1, where k is the discrete-time sample number, N is the total number of sample points, and f_s is the sampling rate [35]. This is immediately recognized as the central finite difference of the phase [36].
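In NumPy, Eq. (3.9) is a one-liner over the interior samples; the function name `inst_frequency` below is illustrative. The 4π in the denominator accounts for the two-sample spacing of the central difference:

```python
import numpy as np

def inst_frequency(y, fs):
    """Instantaneous frequency of a complex analytic signal y via the
    central finite difference of its phase. Endpoints are left as NaN
    because the two-sided difference is undefined there."""
    f = np.full(len(y), np.nan)
    # angle(y[k+1] * conj(y[k-1])) is the wrapped phase difference
    # phase[k+1] - phase[k-1], spanning two samples of 1/fs.
    f[1:-1] = (fs / (4 * np.pi)) * np.angle(y[2:] * np.conj(y[:-2]))
    return f
```

The estimate is exact for a constant tone and accurate to second order in the sample period for smooth FM sweeps, provided the frequency stays below fs/4 so the two-sample phase difference does not wrap.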
The resulting fi(t) and ai(t) functions (Fig. 3.5a-b) may optionally be smoothed to compensate for low signal-to-noise ratio using the least-squares Savitzky-Golay filter [37]. If applied, care should be taken to avoid over-smoothing by using a short filter length and a sufficiently high polynomial order. The components are then combined to form a precise, high-resolution TFR (Fig. 3.5c).
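The smoothing step maps directly onto `scipy.signal.savgol_filter`. In the sketch below the sweep parameters, noise level, window length, and polynomial order are illustrative choices, not the values used for the E. fuscus data:

```python
import numpy as np
from scipy.signal import savgol_filter

fs = 250e3
t = np.arange(750) / fs                    # a 3 ms frequency track
fi_true = 80e3 - 10e6 * t                  # hypothetical downward FM sweep
rng = np.random.default_rng(0)
fi_noisy = fi_true + rng.normal(0.0, 500.0, t.size)

# Short window and moderate order to avoid over-smoothing the sweep.
fi_smooth = savgol_filter(fi_noisy, window_length=31, polyorder=3)
```

Because the filter fits a local polynomial, a linear or gently curved fi(t) passes through nearly unchanged while the additive noise is attenuated.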
Figure 3.5. Hilbert spectral analysis results showing ai(t) and fi(t) for each harmonic component of the E. fuscus call (a-b). Each component has its own fi(t) and ai(t) function. (a) and (b) are combined to form the time-frequency representation shown in (c). The instantaneous amplitude is plotted on a decibel scale in (b) and is shown as intensity in (c). Line thickness has been increased in all plots to improve visibility.
3.3.3 Waveform Synthesis and Ground Truth
An important aspect of these mono-component decomposition techniques is that all of the original signal information is retained. This implies that recorded biosonar signals can be decomposed, modified in some way, and finally synthesized into a noise-free replica of the recorded waveform for detailed acoustic simulations or computational models of auditory neural processing. This step is also useful for a ground-truth check, performed by subtracting the synthesized signal from the original. When the initial phase φ0 (see Sec. B) is properly adjusted, results show negligible error in the time-frequency plane, with only the broadband noise and non-interfering echoes removed from the signal.
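A minimal sketch of the resynthesis step, assuming the estimated ai(t) and fi(t) are available on a uniform time grid; the function name `synthesize` and its arguments are hypothetical:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

def synthesize(ai, fi, fs, phi0=0.0):
    """Rebuild an analytic mono-component from its instantaneous
    amplitude and frequency: integrate fi(t) to obtain the phase law
    (Eq. 3.13), then apply Eq. (3.12). phi0 sets the absolute phase,
    which must be adjusted before subtracting from the original."""
    phase = 2.0 * np.pi * cumulative_trapezoid(fi, dx=1.0 / fs, initial=0.0)
    return ai * np.exp(1j * (phase + phi0))
```

A ground-truth residual is then `original - synthesize(ai, fi, fs, phi0).real`, which should contain only the broadband noise and non-interfering echoes.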
3.4 Results
3.4.1 Telemike Data Series
Echolocation signals from E. fuscus and three East Asian bat species were processed to show the method's flexibility and ease of use. Data from various Telemike experiments were used in all four cases [4, 25, 26]. First, the biosonar calls were separated using a simple energy detector and then individually run through multi-component analysis. Spectrograms of the full time series are shown side-by-side with the analysis results for each bat in Figure 3.6a-d.
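An energy detector of this kind can be as simple as thresholding short-time energy against the loudest window. A sketch follows; the name `detect_calls` and both parameter defaults are assumptions for illustration, not the experiment's settings:

```python
import numpy as np

def detect_calls(x, fs, win_ms=0.5, thresh_db=-30.0):
    """Flag fixed-length windows whose energy is within thresh_db of
    the loudest window; runs of flagged windows mark candidate calls."""
    win = int(win_ms * 1e-3 * fs)
    n = len(x) // win
    energy = np.array([np.sum(x[i * win:(i + 1) * win] ** 2) for i in range(n)])
    energy_db = 10.0 * np.log10(energy / energy.max() + 1e-12)
    return energy_db > thresh_db
```

Each flagged region can then be excised and passed individually to the multi-component analysis.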
Figure 3.6. Multi-component analysis was performed on call sequences from radiotelemetry recordings of E. fuscus and three Asian bat species. The spectrograms for the entire time series are shown for E. fuscus (a), P. abramus (b), M. fuliginosus (c), and R. ferrumequinum (d). The analysis results for each call are aligned and overlaid in the time-frequency plane (e-h). The color scales are the same across each row. Pairs of pulses, known as strobe groups, can be identified by short inter-pulse timing in the cases of E. fuscus, P. abramus, and R. ferrumequinum. Both P. abramus and M. fuliginosus emit mono-component non-linear FM waveforms. Although their calls are nearly identical in time-frequency structure (f and g), only P. abramus is known to emit strobe groups. R. ferrumequinum uses relatively long constant-frequency tones with short FM tails at the beginning and end of each call. The color depth was extended to -50 dB for R. ferrumequinum (d and h) to show the first harmonic, which is approximately 20 dB weaker than the second in this species. The E. fuscus data set was collected by Hiryu et al. [4] and the remaining data sets were collected by Riquimaroux et al. and Hiryu et al. [25, 26].
The Telemike data from E. fuscus (Fig. 3.6a) contains 13 echolocation signals emitted as the bat entered a densely cluttered array of chains. This data set is the same as the example shown in Hiryu et al. [4]. Figs. 3.6b and 3.6c show spectrograms of the Telemike data from the Japanese house bat (Pipistrellus abramus) and the eastern bent-winged bat (Miniopterus fuliginosus). Fig. 3.6d shows eight calls emitted by the greater horseshoe bat (Rhinolophus ferrumequinum). Figure 3.7 shows the results from E. fuscus in more detail. The pulse-to-pulse time intervals were used to identify strobe groups, which are closely spaced pairs of calls with short time intervals [1, 4]. The figure shows the strobe groups identified with brackets. It is worth noting that FM1 is stronger than FM2 by approximately 8 dB due to the off-axis microphone placement of the Telemike. In this data set, the first four pulses were emitted early in the clutter field where pulse-echo ambiguity was present. The last four pulses were emitted after pulse-echo ambiguity subsided. Hiryu et al. found that when pulse-echo ambiguity was strong, the bats shifted the tail-end frequency for each strobe group pair. This behavior was absent when pulse-echo ambiguity was not present. The results from our method confirm this occurred in the example data set, and the effect is significantly more pronounced than when looking at the spectrogram alone.
3.4.2 Synthesized Multi-Component FM Analysis
To demonstrate how the proposed technique can adapt to small time-frequency perturbations, a multi-component linear FM waveform is generated with a small sinusoidal modulation. The combined FM signal can be defined using
\phi(t) = f_0 t + \frac{\mu_0}{2} t^2 + \frac{B}{4\pi f_m} \sin(2\pi f_m t)    (3.10)
where f_0 is the initial frequency, \mu_0 is the linear sweep rate, B is the amplitude of the sinusoidal modulation (in Hz), and f_m is the modulation frequency. This phase law is used directly in Eq. (3.12) to construct the discrete-time noiseless components, which are then added together.
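Generating one such noiseless component is straightforward; the parameter values below are illustrative, not the values used to produce Fig. 3.8. Note that φ(t) in Eq. (3.10) carries units of cycles, so the complex exponential includes an explicit 2π:

```python
import numpy as np

fs = 500e3
t = np.arange(int(3e-3 * fs)) / fs       # 3 ms of samples
f0, mu0 = 30e3, 10e6                     # start frequency (Hz), sweep rate (Hz/s)
B, fm = 5e3, 2e3                         # ripple amplitude (Hz), modulation rate (Hz)

# Phase law of Eq. (3.10): linear FM plus a small sinusoidal ripple.
phi = f0 * t + 0.5 * mu0 * t**2 + (B / (4 * np.pi * fm)) * np.sin(2 * np.pi * fm * t)
x = np.exp(2j * np.pi * phi)             # one noiseless component, per Eq. (3.12)

# Instantaneous frequency implied by this phase law (derivative of phi):
fi = f0 + mu0 * t + 0.5 * B * np.cos(2 * np.pi * fm * t)
```

With B = 5 kHz the ripple in fi is ±2.5 kHz, matching the scale of the perturbation discussed here.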
Figure 3.7. E. fuscus was previously found to use slight frequency shifts to avoid pulse-echo ambiguity. Multi-component analysis results are plotted separately for FM2 (a) and FM1 (b). The time duration of each pulse component matches the scale bar, but the inter-pulse interval is artificially compressed. This was done to show the fine detail in each call, which cannot be easily seen in the overlaid plot (Fig. 3.6e). The results reveal a clear distinction between the lowest frequencies in each harmonic component for strobe groups 1 and 2. As noted in Hiryu et al., this separation becomes insignificant when pulse-echo ambiguity is no longer a problem. This is shown circled in strobe groups 4 and 5.
Fig. 3.8a shows the desired fi(t) functions used to synthesize a multi-component sinusoidal FM riding on a linear FM. The sinusoidal riding wave varies by ±2.5 kHz, but neither the Wigner-Ville distribution nor the reassignment method (Fig. 3.8b-c) can resolve these variations. The proposed component separation and Hilbert spectral analysis faithfully reproduce the original fi(t) curves (Fig. 3.8d).
3.5 Discussion
Many decompositions, including Hilbert spectral analysis and EMD, do not perform well on multi-component signals. In fact, unless the multi-component signal is first decomposed into the corresponding mono-component signals, the concepts of fi(t) and
Figure 3.8. (a) The original fi(t) functions used to synthesize two linear-plus-sinusoidal FM components, (b) Wigner-Ville distribution, (c) smoothed pseudo-WVD, and (d) results after separation of components with the proposed method. Despite its superior resolution, the WVD is perfectly localized only for phase laws up to second order, such as a linear FM or a constant tone. This synthetic FM signal demonstrates that methods we consider "high fidelity" may not resolve small but significant features in natural signals such as biosonar calls. For cases where the signal generation mechanism is unknown or not well understood, it is best not to assume any TFR is optimal.
ai(t) lose physical meaning [38, 39, 40, 41]. How does one define the instantaneous frequency of a signal that has overlapping frequency components at a single point in time? Therefore, such signals must first be separated into mono-components and analyzed individually. Using this technique, signal parameter estimation is restricted neither to the coarse resolution of a spectrogram nor to the interference cross-terms that plague other high-resolution methods.

We have presented a technique for isolating and processing individual components of the call from E. fuscus based on the fractional Fourier transform, time-varying demodulation, EMD, and Hilbert spectral analysis. The method can be applied to any frequency-modulated multi-component signal provided a rough estimate of the
instantaneous phase is achievable and the components are separable in the time-frequency plane. Algorithm parameters can be freely adjusted to allow for an automated algorithm with various types of signals. Ultimately, we arrive at a TFR that is highly localized in both time and frequency.

The EMD has important insights to offer in the realm of biological sonar. It was asserted [13] that the EMD technique is not generally efficient for estimating fi(t) of bat calls. We do not feel that is accurate. When recording an E. fuscus echolocation signal along the main response axis, the dominant signal energy typically transitions from the first to the second harmonic. This was offered as a reason to avoid the EMD, since the decomposition tracks the strongest energy in the signal. We have shown that a simple technique for isolating and separating the components effectively relieves this problem. Second, the EMD is not solely designed to break a multi-component signal into mono-components. The property of most importance is its similarity to a time-varying constant-Q filter bank. In this way, the EMD is more similar to the Minimum Variance Estimator (MVE) technique, which Kopsinis et al. endorse. This is due to the strong relationship between zero-crossings and spectral content [33].

Since its inception, the EMD has provided insights into a great many systems characterized by nonlinear and nonstationary signals. However, the problems with the EMD have been well documented [13, 34, 42]. The lack of mathematical rigor and formal definition surrounding the EMD is often identified as a source of criticism. If the EMD is applied carefully and the results scrutinized, this concern can be effectively mitigated by applying known techniques to serve as a model for comparison.

Recent advances have been made with empirical-based methods. The normalized Hilbert transform, the normalized amplitude Hilbert transform, and their relationship to the signal quadrature help to mitigate some of the restrictions imposed by Bedrosian and Nuttall [43, 44, 45]. In certain instances, the error between the approximated Hilbert transform and the quadrature can produce spectral artifacts in the Hilbert spectral analysis. In other instances, the EMD can highlight the issue of undersampling.

In conclusion, higher-resolution time-frequency techniques are necessary for understanding biosonar. This paper describes one possible solution to the problem of multi-component time-frequency analysis. Further developments in empirical decomposition techniques will enable new ways of evaluating non-linear processes.
3.6 Acknowledgments
This work was funded through internal investments by the Naval Undersea Warfare Center, Division Newport, RI and ONR grant N00014-09-1-0691. The authors wish to thank Hiroshi Riquimaroux and Shizuko Hiryu for providing time series data from recordings using the Telemike recording system, Ivars Kirsteins and Lee Estes for dis- cussions on the Fractional Fourier Transform, and Laura Kloepper and Andrea Sim- mons for editorial suggestions. Figures showing the WVD, smoothed pseudo-WVD, and reassignment method for comparison were produced using the Time-Frequency Toolbox for MATLAB [46].
A Multi-Component Frequency-Modulated Waveforms
Many bat echolocation signals consist of components (usually harmonics) with a varying degree of amplitude and phase modulation. The multi-component version of the big brown bat’s echolocation call is a summation of each individual FM waveform, or
s(t) = \sum_{n=1}^{N} \tilde{x}_n(t)    (3.11)
for N independent harmonically related components, \tilde{x}_n(t). Given this assumption, each component has a time-dependent amplitude and frequency; more precisely, each is an instantaneous function of time. A signal component can be defined in its analytic form as
\tilde{x}(t) = a_i(t) e^{j\phi_i(t) + j\phi_0}    (3.12)
where a_i(t) is the instantaneous amplitude, \phi_i(t) is the instantaneous phase modulation (or phase law), and \phi_0 is the initial phase of the complex exponential. The phase law is related to the instantaneous frequency, f_i(t), by
\phi_i(t) = 2\pi \int_0^{t} f_i(\tau)\, d\tau    (3.13)

In this manner, we assume that the bat's multi-component FM waveforms can be completely described by defining a_i(t), f_i(t), and \phi_0 for each harmonic component.
B Hilbert Spectral Analysis of Modulated Waveforms
We present the formulation below in continuous time for the purpose of familiarity. Assume for computational purposes that the signal of interest, x[n], is obtained by sufficiently sampling a band-limited signal x(t) such that x[n] = x(nT), where T = 1/f_s is the sampling interval chosen to avoid aliasing. If a real mono-component signal, x(t), fits the criteria for a modulated waveform, then we can extract the parameters of interest directly from estimates of f_i(t) and a_i(t). This requires first converting the original mono-component signal into its complex analytic form using the Hilbert transform, H, and is achieved with
\tilde{x}(t) = x(t) + j\hat{x}(t)    (3.14)
where x(t) is the purely real signal under consideration and \hat{x}(t) is its Hilbert transform, H\{x(t)\}, which forms the imaginary part. This is calculated as follows
\hat{x}(t) = \int_{-\infty}^{\infty} x(\tau)\, h(t - \tau)\, d\tau    (3.15)
with h(t) = 1/(\pi t). The integral is evaluated as a Cauchy principal value; however, it should be noted that many simple approximations exist for a discrete-time implementation. The resulting analytic signal consists only of positive spectral components in the frequency domain. This signal representation is convenient since it provides the information needed to fully describe a mono-component modulated signal. Once in this form, an estimate of the instantaneous amplitude is given by
a_i(t) = |\tilde{x}(t)| = \sqrt{\mathrm{Re}\{\tilde{x}\}^2 + \mathrm{Im}\{\tilde{x}\}^2}    (3.16)
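In discrete time, Eqs. (3.14)-(3.16) collapse into two library calls: `scipy.signal.hilbert` returns the complex analytic signal directly, and its magnitude is a_i(t). The test tone and envelope below are illustrative:

```python
import numpy as np
from scipy.signal import hilbert

fs = 100e3
t = np.arange(2000) / fs
env = np.hanning(t.size)                 # a known instantaneous amplitude
x = env * np.cos(2 * np.pi * 10e3 * t)   # real mono-component signal

xa = hilbert(x)                          # complex analytic signal, Eq. (3.14)
ai = np.abs(xa)                          # instantaneous amplitude, Eq. (3.16)
```

Away from the record edges, the recovered a_i(t) tracks the known envelope closely, since the envelope varies slowly relative to the 10 kHz carrier (the Bedrosian condition).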