
Auditory-Based Representations and Feature Extraction Techniques for Sonar Processing

CS-05-12 October 2005

Robert Mill and Guy Brown

Speech and Hearing Research Group, Department of Computer Science, University of Sheffield

Abstract

Passive sonar classification involves identifying underwater sources by the sounds they make. A human sonar operator performs the task of classification both by listening to the sound on headphones and by looking for features in a series of 'rolling' spectrograms. The construction of long sonar arrays consisting of many receivers allows the coverage of several square kilometres in many narrow, directional beams. Narrowband analysis of the signal within one beam demands considerable concentration on the part of the sonar operator, and only a handful of the hundreds of beams can be monitored effectively at a single time. As a consequence, there is an increased requirement for the automatic classification of signals arriving at the array. Extracting tonal features from the signal, a key stage of the classification process, must be achieved against a broadband noise background contributed by the ocean and vessel engines. This report discusses potential solutions to the problem of tonal detection in noise, with particular reference to models of the human ear, which have been shown to provide a robust encoding of frequency components (e.g. speech) in the presence of additive noise.

The classification of sonar signals is complicated further by the presence of multiple sources within individual beams. As these signals exhibit considerable overlap in the frequency and time domains, some mechanism is required to assign features in the time-frequency plane to distinct sources. Recent research into computational auditory scene analysis has led to the development of models that simulate human hearing and emphasise the role of the ears and brain in the separation of sounds into streams. The report reviews these models and investigates their possible application to the problem of concurrent sound separation for sonar processors.


Contents

1 Introduction
  1.1 Composition of Sonar Signals
    1.1.1 Vessel Acoustic Signatures
    1.1.2 Sonar Analysis
  1.2 Anatomy and Function of the Human Ear
    1.2.1 The Outer Ear
    1.2.2 The Middle Ear
    1.2.3 The Cochlea and Basilar Membrane
    1.2.4 Hair Cell Transduction
    1.2.5 The Auditory Nerve
  1.3 Perceiving Sound
    1.3.1 Masking and the Power Spectrum Model
    1.3.2 Pitch
    1.3.3 Modulation
  1.4 Auditory Scene Analysis
  1.5 Chapter Summary

2 Auditory Modelling
  2.1 Modelling the Auditory Periphery
    2.1.1 The Outer and Middle Ear Filter
    2.1.2 Basilar Membrane Motion
    2.1.3 Hair Cell Transduction
  2.2 Computational Auditory Scene Analysis
  2.3 Auditory Modelling in Sonar
  2.4 Summary

3 Time-Frequency Representations and the EIH
  3.1 Signal Processing Solutions
    3.1.1 Short-time Fourier Transform
    3.1.2 Wigner Distribution
    3.1.3 Wavelet Transform
  3.2 Ensemble Interval Histogram
    3.2.1 Model
    3.2.2 Properties
    3.2.3 Analysis of Vowels
    3.2.4 Analysis of Sonar
    3.2.5 Using Entropy and Variance
  3.3 Summary and Discussion


4 Feature Extraction
  4.1 Lateral Inhibition
    4.1.1 Shamma's Lateral Inhibition Model
    4.1.2 Modelling Lateral Inhibition in MATLAB
    4.1.3 Discussion
  4.2 Peak Detection and Tracking
    4.2.1 Time-frequency Filtering
    4.2.2 Peak Detection
    4.2.3 Peak Tracking
  4.3 Modulation Spectrum
    4.3.1 Computing the Modulation Spectrum
    4.3.2 Suitability for Sonar
  4.4 Modulation
    4.4.1 Phase-tracking using the STFT
    4.4.2 Measuring Fluctuations
    4.4.3 The Effect of Noise
    4.4.4 Non-linear Filtering

5 Conclusions and Future Work
  5.1 Future Work

Chapter 1

Introduction

The undersea acoustic environment comprises a rich mixture of sounds, both man-made and natural in origin. Examples include vessel engines, sonar pings, shoreside industry, snapping shrimp, whale vocalisations and rain. The energy in electromagnetic waves (including visible light) is absorbed rapidly by sea water, so sound waves, which can propagate over many kilometres, remain the principal carrier of information about the environment. In its simplest incarnation, sonar classification is the procedure of listening to and identifying these underwater sounds, and is an essential military tool for determining whether a seaborne target is hostile or friendly, natural or unnatural.

Modern sonar analysis is performed by a human expert who listens to the sound in a single directional beam and makes a judgement as to what can be heard. In conjunction with this aural analysis, spectrograms of the sound within each beam are presented on visual displays. The manufacture of longer sonar arrays has led to a commensurate increase in the number of beams to which an operator must attend. In order to reduce this load, there have been numerous attempts to perform the classification of sonar signals using a machine. However, such attempts have been frustrated by the presence of interfering sources within a beam, for example a second vessel or biological sounds.

The difficulty of isolating individual sounds from a mixture has been encountered in other technology areas, a notable example being automatic speech recognition (ASR) systems, whose performance degrades in the presence of multiple talkers or interference from the environment. Human beings, on the other hand, are able to decipher and attend to individual sources within a mixture of sounds as a matter of course, e.g. the voice of a speaker in a crowd. In recent years, computational models of hearing have emerged which aim to explain and emulate this listening process. Improved ASR, intelligent hearing aids and automatic music transcription have all been cited as technologies that could benefit from such an auditory approach.

This report presents automatic sonar classification as a listening activity and considers how recent advances in computational hearing may assist a human sonar operator in managing the increasing quantity of data from the array. Following a literature survey, methods of signal extraction from noisy data using models of the ear are examined. Later sections discuss the possibility of source separation and tonal grouping by exploiting correlated changes in signal properties, such as amplitude and phase.


1.1 Composition of Sonar Signals

Sonar (sound navigation and ranging) systems detect and locate underwater objects by measurement of reflected or radiated sound waves and may be categorised as either active or passive systems [30]. Active sonar systems transmit a brief pulse or 'ping' and await the return of an echo, for example, against the hull of a vessel; the delay and direction of the echo reveal the distance and bearing of the target, respectively. Active sonar is considered unsuitable for many military applications as the transmission of a ping can easily reveal the location of the sonar platform to hostile targets. In addition, the two-way propagation loss incurred by echo-ranging restricts the radius over which active systems can operate effectively. Passive sonar systems use an array of hydrophones to receive sound radiated by the target itself, for example, the noise from the engine and propeller of a vessel. Analysis of the received signal allows a target to be classified according to features of its time-varying spectrum, an advantage not afforded by an active system. Work conducted in this project is based on the passive sonar model; active sonar is not considered further.

1.1.1 Vessel Acoustic Signatures

Burdic [5] defines the acoustic signature of a vessel as follows:

The target acoustic signature is characterized by the radiated acoustic spectrum level at a reference distance of 1m from the effective acoustic center of the target.

For practical purposes, the content of the idealised spectrum at one metre is not available and must be inferred from measurements made at the hydrophone array using a spherical spreading law. The acoustic path between the source and receiver can appreciably modify the spectrum even at a short distance (less than two hundred metres).

Vessel acoustic signatures consist of a series of discrete lines or tonals, which may or may not be harmonically related, immersed in a continuous, broadband noise spectrum. The tonal components appear in the range 0-2kHz and arise chiefly as a consequence of the periodic motion of the machinery and propellers, along with any hull resonances that these actuate. The relative intensities and frequencies of the tonals, which provide salient features for target classification, are catalogued by the military and are often highly classified. The broadband component can be ascribed to hydrodynamic noise and cavitation (tiny bubbles which form at the propeller) and obscures the discrete lines with increasing frequency, such that a crossover point can be identified above which the tonal components can no longer be discerned [30]. The crossover point for a merchant ship lies between 100Hz and 500Hz. As the ship's speed increases, the contribution from the broadband sources becomes dominant and the crossover point moves lower.

In addition to the stationary spectrum, transient events contribute to the received signal. These may arise from the target (e.g. a wrench being dropped, chains clanking), or from other interfering sources, such as objects colliding with a hydrophone or biological sounds (e.g. cetacea, snapping shrimp). Figure 1.1 illustrates some of these features.


Figure 1.1: A sonar spectrogram showing (i) a series of tonal components (vertical lines), (ii) a transient click (horizontal line), and (iii) low-frequency amplitude-modulated noise above 500Hz.

Throughout this document, spectrograms for sonar will be presented in a waterfall format, with frequency on the abscissa and time displayed down the ordinate. Spectrograms for speech will follow the convention of having time on the abscissa and frequency on the ordinate.
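To make these display conventions concrete, the sketch below synthesises a toy signal of the kind described in this section, a few unrelated tonal components buried in broadband noise, and plots it as a waterfall spectrogram with frequency across the page and time running downwards. The sample rate, tonal frequencies and noise level are invented for illustration only and do not correspond to any real vessel signature.

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.signal import spectrogram

    fs = 4000                       # sample rate (Hz); illustrative value
    t = np.arange(0, 60, 1 / fs)    # sixty seconds of signal

    # A few unrelated tonals below 1 kHz plus broadband noise (all values arbitrary).
    tonals = sum(np.sin(2 * np.pi * f * t) for f in (60.0, 155.0, 310.0, 740.0))
    noise = 2.0 * np.random.randn(t.size)
    x = tonals + noise

    # Long analysis windows give the narrowband resolution of a LOFAR-style display.
    f, tt, S = spectrogram(x, fs=fs, nperseg=4096, noverlap=2048)

    # Waterfall convention: frequency on the abscissa, time increasing down the ordinate.
    plt.pcolormesh(f, tt, 10 * np.log10(S.T + 1e-12), shading='auto')
    plt.gca().invert_yaxis()
    plt.xlabel('Frequency (Hz)')
    plt.ylabel('Time (s)')
    plt.title('Waterfall spectrogram of a synthetic tonal-plus-noise signal')
    plt.show()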

1.1.2 Sonar Analysis

Pressure waves arriving at the sonar platform are first transduced into electrical signals by an array of hydrophones. Introducing artificial phase delay between these signals at different frequencies permits a certain degree of directivity, dependent upon the hydrophone spacing and array length. This overall process is referred to as beamforming. The sound received at the array is presented to a human sonar operator via a combination of audition and narrowband and broadband visual displays. The broadband display shows the energy at a bearing (on the abscissa) and time (on the ordinate) by mapping each cell to a colour or greyscale value, and so reveals the motion of contacts in relation to the platform. There are two types of narrowband display: LOFAR (low-frequency analysis and recording) and DEMON (demodulation of noise). The LOFARgram comprises a column of waterfall spectrograms, each of which corresponds to the signal received in a beam, and allows the operator to classify vessels and determine changes in Doppler, the shift in pitch which results from a vessel moving in its own sound field. The DEMON display shows the modulation components present in the envelope of the broadband signal and reveals the number of propellers and blades and their rate of rotation. As well as using visual displays, the sonar operator can listen to a selected beam and make decisions based on recognised sounds; in practice, the visual and auditory evidence complement each other.
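The beamforming step described above can be pictured with a minimal delay-and-sum sketch for a uniform line array. The array geometry, sample rate and steering angle below are hypothetical, and applying the delays in the frequency domain is just one of several common implementations.

    import numpy as np

    def delay_and_sum(signals, fs, spacing, angle_deg, c=1500.0):
        """Steer a uniform line array of hydrophones towards angle_deg.

        signals: array of shape (n_hydrophones, n_samples)
        spacing: inter-hydrophone spacing in metres
        c: nominal speed of sound in sea water (m/s)
        The per-sensor delays are applied as frequency-dependent phase
        shifts, as mentioned in the text.
        """
        n_h, n_s = signals.shape
        # Arrival-time differences for a plane wave from angle_deg (broadside = 0).
        delays = np.arange(n_h) * spacing * np.sin(np.radians(angle_deg)) / c
        freqs = np.fft.rfftfreq(n_s, d=1.0 / fs)
        spectra = np.fft.rfft(signals, axis=1)
        phase = np.exp(-2j * np.pi * freqs[None, :] * delays[:, None])
        return np.fft.irfft((spectra * phase).sum(axis=0) / n_h, n=n_s)

For example, delay_and_sum(hydrophone_data, fs=8000, spacing=1.5, angle_deg=20) would steer a 1.5m-spaced array twenty degrees off broadside; all of these values are purely illustrative.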

1.2 Anatomy and Function of the Human Ear

The ear is the sense organ for hearing and is responsible for converting sound in the environment into nerve activity which can be interpreted by the brain. This section provides a brief overview of the structures of the ear, which will be referred to in later sections; a full treatment of the physiology of the ear can be found in Pickles [17].


1.2.1 The Outer Ear

The outer ear consists of the pinna (the visible structure on the side of the head) and the meatus or auditory canal, a tunnel-like cavity leading to the tympanic membrane or eardrum. The outer ear serves a threefold purpose: first, to redirect sound waves from the environment into the head; second, to increase the sound pressure at the tympanic membrane; and third, to assist in the localisation of sound sources about the head. The pressure gain at the eardrum can be attributed to the resonances of the meatus, together with the bowl-shaped, inner cavity of the pinna (the concha), which have the overall effect of broadly boosting frequencies around 2.5kHz by 15-20dB. A second, lesser peak appears at 5.5kHz, for which the concha is solely responsible.

1.2.2 The Middle Ear

The middle ear consists of the tympanic membrane and three small bones called the malleus ('hammer'), incus ('anvil') and stapes ('stirrup'), collectively referred to as the ossicles. The footplate of the stapes is connected to the oval window of the cochlea. The middle ear is required to match the difference in acoustic impedance between the air in the meatus and the fluid in the cochlea, as allowing sound waves to propagate directly across the boundary would result in most of the energy being reflected. This impedance matching can be appreciated by considering that the area of the tympanic membrane is far greater than that of the oval window, so conducting forces from the larger to the smaller area results in a pressure increase. The mechanical levering action of the ossicles themselves has also been shown to contribute to the impedance match to a small extent. The middle ear, like the outer ear, has a transfer function associated with it, which has a smooth band-pass characteristic and peaks at about 1kHz.

1.2.3 The Cochlea and Basilar Membrane

The cochlea is a coiled structure that is divided along its length into three fluid-filled compartments: the scala vestibuli, scala media and scala tympani. The boundaries between the respective scalae are Reissner's membrane and the basilar membrane (BM). The membranous oval window, which projects onto the scala vestibuli, is displaced by the motion of the stapes and, as a result, generates a wave which propagates through the fluid in the scala vestibuli and scala tympani and finally terminates at the round window. The motion of the fluid in the two chambers induces a wave in the basilar membrane. The response of the BM to a sinusoidal stimulus is a wave at the same frequency; however, the displacement is maximal at a single place (corresponding to the characteristic frequency) owing to the varying mechanical properties of the BM along its length (it is narrower and stiffer at the basal end). In this way, the BM performs the initial stage of spectral decomposition of a stimulus.

1.2.4 Hair Cell Transduction

The physical motion of the basilar membrane is encoded into neural activity in a process known as hair-cell transduction.

The basilar membrane runs in parallel with the tectorial membrane; in between are located the inner hair cells (IHC) and outer hair cells, separated by the tunnel of Corti and various nerve fibres, which together comprise the structure called the organ of Corti. The outer hair cells receive signals from efferent nerves and have a motor function, and are thought to form part of an active system of cochlear retuning. The IHCs are of primary interest to hearing as they transmit signals to the auditory nerve via an afferent pathway. There are approximately 3500 inner hair cells, each with 40 stereocilia (hairs), which line the narrow passage between the organ of Corti and the tectorial membrane.

The motion of the basilar membrane generates a shearing action with the tectorial membrane and so displaces the stereocilia. The deflection of the stereocilia opens transduction channels, causing a flow of potassium ions into the cell body which, if sufficiently sustained, will depolarise the cell and produce an action potential. The net effect is a pattern of spiking activity along the row of IHCs, related in a nonlinear fashion to the motion of the BM, which is communicated to the auditory nerve and eventually forms the substrate of information available to the brain.

1.2.5 The Auditory Nerve

The preceding sections have described the series of transformations that a signal undergoes from arrival at the outer ear through to the spike encoding at the inner hair cells. The auditory nerve (AN), which consists of approximately 30,000 nerve cells, is the final path of transmission between the cochlea and the central nervous system. Understanding of the auditory nerve has developed largely through the study of the spiking patterns evoked in individual cells in response to, and in the absence of, a stimulus. Moore identifies three special properties of AN cells: i) the firing of the cell in the absence of a stimulus, or the spontaneous firing rate; ii) the preferential response of a cell to a certain frequency (frequency selectivity); and iii) the tendency of a cell to respond at a particular phase of the driving stimulus, a phenomenon known as phase-locking.

The spontaneous firing rate of a cell is correlated with the size of its synapse and varies from cell to cell. A high spontaneous firing rate tends to correspond to a low threshold (the stimulus level required to elicit an elevated response), so the auditory nerve contains cells of varying sensitivity to level. Plotting the threshold of an individual cell to stimuli at different frequencies yields a tuning curve, which shows a particularly low threshold at a single frequency: the characteristic frequency (CF) of that cell. It should be noted that the tuning curve and CF of a nerve cell are also a function of stimulus intensity, which is a somewhat complicating factor arising from a combination of BM motion and the saturation of the cell. The cells in the auditory nerve are ordered by their CF and each appears to be associated with a single place on the BM. This tonotopic organisation ensures that an ordered encoding of the BM's motion is preserved along the auditory nerve.

Phase-locking in a nerve cell in response to a sinusoidal stimulus is demonstrated by taking a histogram of spike events in terms of time after the start of the cycle (a period histogram) and noting that the shape resembles a half-wave rectified version of the stimulating waveform. The half-wave rectification occurs as a consequence of the hair cells being depolarised in a single direction.


Phase-locking is seen to occur across a number of fibres with centre frequencies close to that of the stimulus; for periodic sounds (e.g. a complex tone) groups of cells have been observed to phase-lock to the period frequency.

1.3 Perceiving Sound

This section aims to provide an overview of three facets of hearing, namely masking, pitch and the perception of modulation. An understanding of these will i) inform further discussion of auditory scene analysis in the following section; and ii) assist in deriving computational models of audition in Chapter 2. Two other aspects of hearing, loudness and space, have been omitted. A detailed account of the psychology of hearing is presented in Moore [16].

1.3.1 Masking and the Power Spectrum Model

It is part of everyday experience that when two sounds are presented simultaneously, one sound has the potential to be masked by the other. Masking can be quantified by measuring the threshold of audibility of a sound, that is, the level required for the sound to be heard, in the presence of a masker. Masking can be effectively demonstrated using a variety of stimuli and maskers, ranging from simple sounds, such as a tone or a band of noise, to complex sounds such as speech and music.

Energetic masking only occurs when two sounds are competing within the same frequency region or critical bandwidth (CB). The procedure for determining the critical bandwidth at a certain frequency involves centering a narrow band of noise on a tone at that frequency and increasing the bandwidth of the noise. Eventually, widening the noise band no longer affects the threshold of the tone, because the excess noise falls outside the CB. Note that the critical band refers to a conceptual, 'rectangular' band; when relating non-rectangular filter shapes to the CB, it is customary to refer to the equivalent rectangular bandwidth (ERB).

The convention of describing the frequency selectivity of the ear at a particular frequency using a filter is known as the power spectrum model, in which case the filter is referred to more specifically as an auditory filter. The shape of the auditory filter has been derived by Patterson using a notched-noise method, which is described in Moore and proceeds along the same lines as the CB experiment. These auditory filters have a smooth, triangular shape and their bandwidths increase with frequency.

1.3.2 Pitch

Pitch is the perceptual quality of a sound which allows it to be ordered on a scale of low to high or on a musical scale, and generally relates to its periodicity. For example, a complex tone is pitched at its fundamental frequency, and repeating a short burst of noise will elicit a pitch percept at the repetition rate. Theories as to how pitch is encoded in the auditory nerve may be principally divided into two categories: coding by place and coding by timing.

The coding of pitch by place is achieved by measuring the extent of vibration along the basilar membrane.

As discussed in section 1.2.3, the BM resonates at certain locations along its length in accordance with the frequency spectrum of a stimulus, and so the brain may infer the pitch from the vibrating place(s). However, the place theory cannot adequately explain the difference limen for frequency (DLF), or smallest perceptible difference, achieved by a human listener: about 1Hz for a 500Hz tone. For this reason, there must be an additional mechanism involved.

The coding of pitch by time (the temporal theory) contends that pitch is inferred from the frequency of vibration at points along the basilar membrane, as encoded by the phase-locked spiking of the auditory nerve cells. Averaging across fibres may be sufficient to account for the DLF at lower frequencies. However, phase-locking is not achieved above 5kHz, so encoding by place might be responsible for discrimination at higher frequencies.

1.3.3 Modulation

When the amplitude or frequency of a sinusoid, or carrier, is varied with time, it is said to be amplitude-modulated (AM) or frequency-modulated (FM), respectively. The expression for an AM tone $x(t)$ is derived by multiplying the expression for a sinusoid by a factor that varies the amplitude with time:

$$x(t) = [1 + m\sin(2\pi f_m t)]\,\sin(2\pi f_c t)$$

in which $t$ denotes time (s), $f_c$ the carrier frequency (Hz), $f_m$ the modulation frequency (Hz) and $m$ the modulation index, which describes the extent of the modulation. In the frequency domain, amplitude modulation manifests itself as sidebands, which appear at $f_m$ Hz either side of the carrier. How an AM tone is perceived differs depending on the choice of $f_m$ and $m$. If $f_m$ is low, such that the sidebands are only separated from the carrier by a small distance, then a listener can detect the relative phases of the components and perceives the modulation itself, i.e. the fluctuation in loudness. As the modulation frequency increases, the sidebands become further removed from the carrier so that each sinusoid is resolved by a separate auditory filter, at which point three pitches (corresponding to the carrier and the two sidebands) can be discerned.

The expression for a frequency-modulated tone $x(t)$ is obtained by adding a term to the argument of a sine wave:

$$x(t) = \sin(2\pi f_c t + \beta\sin(2\pi f_m t))$$

Here, the modulation frequency is given by $f_m$ and the modulation index by $\beta$. (As the same terminology is used for AM as for FM, it is important to clarify which form of modulation is under discussion.) Frequency modulation generates numerous, equally-spaced sidebands in the frequency domain, which again appear either side of the carrier, and whose relative amplitudes depend on $\beta$. The perception of an FM tone follows a similar rule to that of an AM tone. For low-frequency FM, the listener hears a tone varying in frequency; for high-frequency FM, the ear resolves the individual sidebands.
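The two expressions above translate directly into a few lines of code. The sketch below generates one second of an AM tone and an FM tone; the sample rate, carrier, modulation frequency and indices are arbitrary illustration values.

    import numpy as np

    fs = 16000                       # sample rate (Hz); illustrative
    t = np.arange(0, 1.0, 1 / fs)    # one second of samples

    fc, fm = 1000.0, 4.0             # carrier and modulation frequencies (Hz)
    m, beta = 0.5, 2.0               # AM and FM modulation indices

    # Amplitude modulation: multiply the carrier by a slowly varying factor.
    am_tone = (1 + m * np.sin(2 * np.pi * fm * t)) * np.sin(2 * np.pi * fc * t)

    # Frequency modulation: add a time-varying term to the carrier's argument.
    fm_tone = np.sin(2 * np.pi * fc * t + beta * np.sin(2 * np.pi * fm * t))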


1.4 Auditory Scene Analysis

The physiological processes of the ear transform the physical properties of a signal arriving at the ear into sensory components, leading a listener to form a description of a sound in terms of perceptual quantities such as pitch and loudness, as opposed to frequency and level. However, when listening to a complex signal such as speech or music, we hear whole 'objects' rather than components. When following a violin solo, for instance, a listener is not (in general) attending to properties of the signal, nor even their perceptual correlates; instead, when asked what she hears, she will reply, "a violin". The ability to group sensory components into objects extends to mixtures containing multiple sources, e.g. an instrument in an orchestra or an individual speaker within a crowd, so the question remains: how does the brain achieve the integration of sensory components so as to form coherent, perceptual wholes?

In an attempt to address this question, Bregman has formulated an account of the perceptual organisation of sound in his influential book Auditory Scene Analysis: The Perceptual Organisation of Sound [2], in which he has adopted the terms source and stream to draw the distinction between a sound produced in the environment, e.g. by the violin, and the mental experience of a sound, e.g. the sound perceived as "the violin". Auditory scene analysis (ASA) proceeds from the principle that a number of sources contribute their own sound to a mixture at a particular time, each sound consisting of a number of components, and that by exploiting certain commonalities, these components may be regrouped to form perceptual streams.

Two strategies for the grouping of elements may be identified: top-down or schema-driven grouping cues, and bottom-up or primitive grouping cues. Top-down cues make use of prior knowledge to combine elements in an auditory scene. Bottom-up cues exploit regularities within the signal that suggest elements have originated from the same source. For instance, natural vibration frequently gives rise to sounds with harmonic spectra (e.g. the vocal tract, a piano note), so frequency components with a common fundamental are perceived as a single entity. Another apparent heuristic for grouping elements is their onset and offset, which allows activity at different frequencies to be associated according to coincident start and end times. Experimental studies reveal a number of primitive cues, which may be more rigorously categorised as cues of proximity, good continuation and common fate.

Proximity

Proximity cues facilitate the grouping of elements which are close together in frequency. For example, alternating a tone between two frequencies will leave a different impression on the listener depending on whether the tones are close or remote in frequency, in which case they will form one or two streams, respectively (see Figure 1.2).

Figure 1.2: Fusion of an alternating tone; panel A: close in frequency, fused; panel B: distanced in frequency, segregated.

Good Continuity

Good continuity describes the tendency for a sound which varies smoothly in frequency and time to be perceived as a whole, a pure tone and a noise burst being the extremes of each.



For instance, a sinusoid varying in frequency in a smooth manner will invariably be interpreted as a continual event, whereas a sound which abruptly changes frequency will not (assuming no other cues are present). The good continuity cue is sufficiently powerful to restore part of a tone missing during a brief interruption by noise, a phenomenon referred to as auditory induction (Figure 1.3). It should be noted that the tone is not perceived to continue if the level of the noise is insufficient for the auditory system to 'conclude' that it has been masked.

Figure 1.3: Auditory induction; left: tone is broken, gap is perceptible; right: noise is played in the gap, tone is induced.

Common Fate

Finally, two separate components in a mixture are said to exhibit common fate if they vary in the same way over time in some respect. Pitch contours, for example, which arise when the fundamental frequency of a harmonic complex fluctuates, support the grouping of the individual partials in addition to the evidence from harmonicity. Common changes in amplitude and frequency modulation have also been shown to play a weaker role in the fusion of individual components. Likewise, onset and offset are considered a form of common fate, as starting or ending together can promote the perceptual fusion of two sounds (see Figure 1.4).

Figure 1.4: Fusion of two transient bursts; panel A: close in time, fused; panel B: distanced in time, segregated.


1.5 Chapter Summary

This chapter was intended to broadly introduce the reader to three subject areas: sonar, the ear, and hearing in terms of auditory scene analysis. The next chapter continues by presenting a computational model of the auditory periphery and providing a literature survey of computational auditory scene analysis. The chapter concludes with a review of instances where an auditory model has been applied to sonar. Chapter 3 is an account of a specific auditory model called the ensemble interval histogram (EIH); signal processing methods such as the short-time Fourier transform are also outlined for comparison.

By this stage, a number of auditory representations will have been described. Chapter 4 is concerned with highlighting features in those representations which may reveal organisation within the signal. The discussion here falls naturally into two parts: lateral inhibition and peak tracking, which is an analysis of a signal in terms of its frequency components; and the modulation spectrum and phase tracking, which is an analysis of a signal in terms of its modulated components. Chapter 5 draws together the separate models in the report and concludes with a list of questions to motivate future research.

Chapter 2

Auditory Modelling

The preceding chapter provided an introduction to audition from two perspectives, namely, the physiology of the ear and the psychology of hearing. This chapter examines previous attempts to find a computational analogue for these: a simulation of the auditory periphery is presented as a model of the ear, then various systems for computational auditory scene analysis are introduced as models of hearing. The chapter concludes with a survey of auditory models and CASA systems used in sonar applications.

2.1 Modelling the Auditory Periphery

Models of the auditory periphery attempt to capture the initial stages of processing in the auditory pathway, specifically, the filtering properties of the outer and middle ear, the motion of the basilar membrane, and the transduction of basilar membrane motion to neural activity by the inner hair cells.

2.1.1 The Outer and Middle Ear Filter

For a moderate sound intensity, the combined resonances of the outer and middle ear can be modelled by a linear transfer function, which pre-emphasises frequencies in the 2-4kHz region. In practice, this can be implemented in the time domain by initially passing the signal through a high-pass filter, such as

$$y(t) = x(t) - 0.95\,x(t-1) \qquad (2.1)$$

where $x(t)$ and $y(t)$ are the respective input and output signals. Alternatively, the transfer function may be applied in the frequency domain by adjusting the gain at the output of each auditory filter to match the shape of its magnitude response. It should be noted that these resonances appear to be appropriate for the efficient transmission of speech-like signals; in the case of sonar, it may be advisable to omit this stage altogether.
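Equation (2.1) amounts to a one-line pre-emphasis filter. The sketch below is a minimal implementation; the 0.95 coefficient follows the equation above and is a commonly used value rather than a prescription of this report.

    import numpy as np
    from scipy.signal import lfilter

    def outer_middle_ear(x, coeff=0.95):
        """First-difference pre-emphasis approximating the outer/middle ear
        boost of higher frequencies: y(t) = x(t) - coeff * x(t - 1)."""
        return lfilter([1.0, -coeff], [1.0], np.asarray(x, dtype=float))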

2.1.2 Basilar Membrane Motion

Arguably, the most important processes of the auditory periphery are the filtering mechanisms of the basilar membrane.


Typically, these are realised computationally by filtering the signal with a bank of model auditory filters or cochlear filters, whose parameters are chosen to match psychoacoustic data, although some alternative approaches use the Fourier or wavelet transform.

Gammatone Filter

The particular model auditory filter employed in this investigation is the gammatone filter, proposed by de Boer and de Jongh [7], which has a bell-shaped magnitude response when plotted on linear axes. The frequency-domain properties of the filter (the centre frequency and bandwidth) are specified by its impulse response in the time domain,

$$g(t) = t^{\,n-1} \exp(-2\pi b t)\,\cos(2\pi f_c t + \phi)\,u(t) \qquad (2.2)$$

where $g(t)$ is the filter output at time $t$ (s), $n$ is the filter order, $f_c$ is the centre frequency (Hz), $b$ relates to the bandwidth and $\phi$ is a phase term. The factor $u(t)$ is the Heaviside step function ($u(t) = 1$ for $t \geq 0$; $u(t) = 0$ otherwise).

Implementation

The design of a gammatone filter can be informed by three observations. The first is that the gammatone filter's magnitude response is symmetric, which allows the transfer function to be implemented in two parts: a frequency shift and a low-pass filter. The algorithm first frequency-shifts the input signal from $f_c$ down to d.c. by multiplication with a complex exponential, then a low-pass filter is applied to provide the contribution of the envelope, that is, the gammatone shape. Finally, the output signal is frequency-shifted back to the centre frequency.

The second observation pertains to the phase response of the gammatone filter. Linear filters, including the gammatone, are generally associated with both a magnitude and a phase response. If the phase response is nonlinear with respect to frequency, the Fourier components become misaligned or phase-distorted. The output of the gammatone filterbank can be phase-compensated by aligning the peaks of the impulse responses, which is achieved by appropriately delaying the envelope and the phase of the tone. The details of this procedure are described in [3].

The third design aspect relates to the derivation of a discrete transfer function for the gammatone filter, given that it is specified in terms of an analogue impulse response (2.2). Cooke [6] proposes the use of an impulse-invariant transform, which proceeds by sampling the continuous gammatone impulse response and taking the Z-transform. By correlating the observed and ideal output, Cooke has demonstrated the superiority of the impulse-invariant transform over the standard bilinear transform, with respect to both magnitude and phase.
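Since the frequency-shift and impulse-invariant implementations are only summarised above, the sketch below takes the simpler route of sampling equation (2.2) directly and filtering by convolution. The bandwidth relation b = 1.019 ERB(fc), and the ERB formula itself, are the commonly quoted Glasberg and Moore values and are assumptions here rather than values taken from this report.

    import numpy as np

    def erb(fc):
        """Equivalent rectangular bandwidth (Hz) at centre frequency fc (Hz),
        using the Glasberg and Moore formula (assumed)."""
        return 24.7 * (4.37 * fc / 1000.0 + 1.0)

    def gammatone_ir(fc, fs, n=4, duration=0.05, phase=0.0):
        """Sampled impulse response of equation (2.2)."""
        t = np.arange(0, duration, 1.0 / fs)
        b = 1.019 * erb(fc)            # bandwidth parameter (common assumption)
        g = t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t + phase)
        return g / np.max(np.abs(g))   # crude normalisation

    def gammatone_filter(x, fc, fs):
        """Filter a signal by direct convolution with the sampled impulse response."""
        return np.convolve(np.asarray(x, dtype=float), gammatone_ir(fc, fs), mode='same')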


Gammatone Filterbank

A gammatone filterbank is an array of gammatone filters whose centres are distributed over the frequency axis according to their bandwidth; the bandwidth, in turn, is a quasi-logarithmic function of frequency. The result is a series of filters with overlapping passbands whose bandwidth and spacing increase at higher frequencies. Figure 2.1 shows the magnitude response of the filters comprising a gammatone filterbank in the frequency domain.


Figure 2.1: The magnitude response of ten ERB-spaced gammatone filters.
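The quasi-logarithmic spacing can be reproduced by placing centre frequencies uniformly on an ERB-rate scale, as sketched below. The ERB-rate expression is again the standard Glasberg and Moore form, and the frequency limits and channel count are illustrative assumptions rather than the exact values used for Figure 2.1.

    import numpy as np

    def hz_to_erb_rate(f):
        """ERB-rate value corresponding to frequency f in Hz (assumed formula)."""
        return 21.4 * np.log10(4.37 * f / 1000.0 + 1.0)

    def erb_rate_to_hz(e):
        return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

    def erb_centre_freqs(f_lo, f_hi, n_channels):
        """Centre frequencies spaced uniformly on the ERB-rate scale."""
        e = np.linspace(hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi), n_channels)
        return erb_rate_to_hz(e)

    # Example: ten channels between 100 Hz and 3 kHz (limits chosen for illustration).
    cfs = erb_centre_freqs(100.0, 3000.0, 10)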

2.1.3 Hair Cell Transduction

The hair cell transduction model of the auditory periphery generally receives as input the simulated basilar membrane motion (e.g. from a gammatone filter) and returns either a series of spike times or simply the average firing rate (spikes per second) or spike probability. The latter two choices are something of a design compromise, as it is well recognised that an average-rate representation does not account for all the information present in the auditory nerve. Nevertheless, models based on the average firing rate or probability have successfully reproduced phenomena associated with inner hair cell transduction, most notably spontaneous firing, saturation and adaptation (described later in this section), but also compression and phase-locking.

Meddis' Hair Cell

One notable hair cell model is that of Meddis [14], which uses differential equations to describe the transfer of transmitter substance between four interior regions of the hair cell: the factory, free transmitter pool, cleft and a reprocessing store (Figure 2.2). The physical significance of the equations can be interpreted as follows. Production begins at a factory, which is constantly releasing fluid into the free transmitter pool q(t) (the rate of production asymptotically approaches a limit, however). From here, a fraction of the fluid k(t), which is related to the instantaneous amplitude of the signal, is released into the cleft. The amount of fluid in the cleft at a given time, c(t), governs the probability of a spike being generated. Some of the transmitter in the cleft is lost (in proportion to l), but some is recycled via the reprocessing store (in proportion to r).

Figure 2.2: Flow diagram and governing equations for the movement of transmitter chemical between IHC regions. Redrawn from Meddis (1986, Model B, fig. 10). [14]

The four stages of firing probability coincide with the absence, onset, duration and release of a stimulus, and can be explained within the context of the Meddis model. Prior to a stimulus, a hair cell generates a small number of spikes owing to a leak from the transmitter pool into the cleft, which gives rise to spontaneous firing. When a stimulus is initially applied, the substance in the transmitter pool 'floods' into, or saturates, the cleft, causing a sharp rise in spike probability. Shortly afterwards, the probability drops as the fluid in the transmitter pool is only replenished at the rate the factory can manufacture it. This change to a steady state is termed adaptation. Finally, when the stimulus is released, the spike probability drops below the spontaneous firing rate (another form of adaptation), as the free transmitter pool is depleted. Eventually, the factory restores the cell to its resting state.
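A discrete-time sketch of these reservoir dynamics is given below. The update equations follow the structure of the model as described above, but the parameter values are placeholders chosen for illustration and are not Meddis's published constants.

    import numpy as np

    def meddis_like_hair_cell(bm, fs, A=5.0, B=300.0, g=2000.0, y=5.0,
                              l=2500.0, r=6500.0, x=66.0, M=1.0, h=50000.0):
        """Spike probability per sample from simulated BM motion `bm` (one channel).

        q: free transmitter pool, c: cleft contents, w: reprocessing store.
        All parameter values are illustrative placeholders.
        """
        bm = np.asarray(bm, dtype=float)
        dt = 1.0 / fs
        q, c, w = M, 0.0, 0.0            # approximate resting state
        prob = np.zeros_like(bm)
        for i, s in enumerate(bm):
            # Permeability: fraction of the pool released, driven by the stimulus.
            k = g * (s + A) / (s + A + B) if (s + A) > 0 else 0.0
            dq = (y * (M - q) + x * w - k * q) * dt   # factory refills the pool
            dc = (k * q - l * c - r * c) * dt         # release, loss and recycling
            dw = (r * c - x * w) * dt                 # reprocessing store
            q, c, w = q + dq, c + dc, w + dw
            prob[i] = h * c * dt                      # spike probability this sample
        return prob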

Other Approaches

There have been other attempts to model hair cell function by modelling the depletion and replenishment of transmitter fluid between one or more reservoirs. Besides these, there are a number of signal-processing alternatives. Seneff [21] uses a discontinuous function as a half-wave rectifier before applying a leaky integrator and low-pass filter to mimic adaptation effects. Ghitza [9] uses a level-crossing detector which implicitly achieves half-wave rectification and logarithmic compression (see Chapter 3).

2.2 Computational Auditory Scene Analysis

Auditory scene analysis describes the role of the brain in segregating a mixture of sounds into streams, which are likely to correspond to different sources in the environment. ASA aids a listener in many aspects of everyday life, for example, in the separation of speech from a background of noise (including other speakers).

Computational auditory scene analysis (CASA), by comparison, is the application of computer algorithms to accomplish the segregation of a mixture of sounds using similar means to a human listener. A CASA system is typically implemented in two stages. First, a model of the auditory periphery converts a signal to an auditory representation, from which individual components are identified. A second stage then reintegrates the components into streams on the basis of auditory grouping principles, such as proximity, good continuation and common fate.

The CASA model presented by Cooke [6] aims to separate the acoustic sources in a mixture and is optimised, in certain aspects, towards the separation of speech signals from intrusive sounds. At the earliest stage, a gammatone filterbank decomposes the signal into a series of narrowband channels and the instantaneous frequency in each channel is estimated. Owing to the overlap in auditory filters, harmonics and formants in the signal each have the potential to drive a number of neighbouring channels, so that blocks of channels or place groups respond at the same instantaneous frequency. As place groups persist through time, they become synchrony strands: individual objects within the auditory representation with quantitative properties, e.g. the number of channels covered, the average amplitude over those channels, variation in frequency, and so forth. These properties, among others, provide the evidence for regrouping the synchrony strands to form streams. Cooke also describes an approach for resynthesising a signal from the synchrony strands, permitting an audible assessment of each stream.

A similar approach to CASA has been investigated by Brown [3], who has developed a model to separate sounds with particular attention to harmonicity and related changes in pitch. The auditory periphery stage closely follows that presented in section 2.1. Rather than using synchrony strands, Brown's model computes autocorrelation and cross-correlation maps to identify periodicities within and across frequency channels. In addition to these, frequency transition maps trace the motion of spectral dominances in the time-frequency plane, motivated by the discovery of modulation-sensitive neurons in the auditory nuclei. The coherent information obtained from the correlation, frequency-transition and onset-offset maps is used to create auditory objects, which are subsequently grouped according to the grouping principles laid down by Bregman.

Mellinger [15] has developed a data-driven CASA system for the separation of the instruments within a musical mixture, as opposed to speech. A musical signal clearly contains a rich variety of grouping cues: each note is associated with an onset and offset; pitched instruments produce a harmonic series; and rhythm and metre provide a temporal context, to name a few. The segregation of instruments within a musical piece is a formidable task, however, considering that most music is intentionally written so that harmonic series and onsets coincide, i.e. instruments typically play notes of the same pitch (or at 3rd, 5th or octave intervals) at the same time. The early stage of the model extracts a number of features from the signal in order to form auditory events, which are later grouped to form streams. First, a model of the auditory periphery converts the input signal into a cochleagram, which encodes the neural firing rate at a given frequency and time.
Using this representation, the derivative of a Gaussian, or some suitable variant, is convolved with each channel to highlight peaks in the firing rate for each frequency.


Additional measures are described to prevent onsets occurring when partials vary in frequency across channels; offsets are detected using the same kernel, inverted in time. Frequency transition maps are obtained using an array of two-dimensional time-frequency filters, each of which responds to a particular change in frequency. Partials are initially grouped if their onsets coincide (small differences are tolerated) and this grouping is subsequently reinforced or weakened over time according to correlations in frequency change. This means, for example, that two partials can commence at the same time and be fused, but shortly afterwards be separated owing to unrelated frequency changes. Conversely, partials which start at separate times are initially segregated and can later be grouped together. This ability of the model to dynamically group and ungroup partials midstream models a psychological phenomenon known as hysteresis: the tendency for listeners to reinterpret an auditory scene on the basis of changing evidence.

The three CASA frameworks discussed thus far all have the common trait that they are data-driven, that is, they group primitive elements within the signal which exhibit some correlated properties, such as common onset and frequency and amplitude variation. Ellis [8] has presented an alternative approach, prediction-driven CASA, which makes use of prior knowledge in the segregation process. The system makes moment-to-moment predictions of what sound is about to follow based on an internal probability model; routine signals will roughly follow this path of predictions, whereas a sudden deviation from the expected sound (a surprise) will force a reorganisation of the internal state. Ellis' prediction-driven architecture is a specific example of a blackboard architecture [12], which comprises four stages. The first of these is an auditory front-end, which consists of an onset map and a correlogram-based periodicity map, which are typical of the data-driven systems described earlier. The internal representation of a signal is formed from core representational elements, which are three generic categories of sound chosen for their distinct perceptual effect: transients, wefts (pitched signal), and noise clouds. The third stage is a prediction-reconciliation engine, which is responsible for formulating predictions on the basis of the internal state of the system and then reconciling any differences between these predictions and the observed input that follows. This is accomplished via a 'two-way' inference engine, in which hypotheses are formulated on the basis of evidence and hypotheses, in turn, explain other evidence. The fourth stage is broadly defined as high-level abstractions and is an extensible set of rules to further constrain the inference engine, according to prior knowledge or data from other modalities.

Unoki et al. [28] have described a method for computational auditory scene analysis to segregate a signal from a noise background. The separation is presented as an ill-posed inverse problem, the sources being two unknown quantities, and the observed signal being their sum. The problem can then be solved by the application of constraints derived from auditory principles. The initial frequency analysis is performed by means of the discrete wavelet transform, using the gammatone as a mother wavelet.

The output of each filterbank channel $k$, with centre frequency $\omega_k$, can be expressed in terms of functions of instantaneous amplitude $A_k(t)$ and phase $\theta_k(t)$:

$$X_k(t) = A_k(t)\,\cos(\omega_k t + \theta_k(t)) \qquad (2.3)$$

If it is known that there are two sources present, the observed signal at each filter $k$ may be written as the sum of two signals, indexed $i$, each associated with a magnitude $S_{ik}(t)$ and phase $\phi_{ik}(t)$:

$$X_k(t) = \sum_{i=1,2} S_{ik}(t)\,\cos(\omega_k t + \phi_{ik}(t)) \qquad (2.4)$$

Clearly, it is not possible to directly return to the constituent signals from the observed sum alone, as there are an infinite number of solutions. Instead, the problem is constrained using four of Bregman's principles for auditory grouping: onset and offset, gradualness of change, harmonicity and common fate. Gradualness of change is enforced by assuming that, over a short time window, both amplitude and phase are smooth functions and can be represented by a low-order polynomial. Onsets and offsets are detected by the presence of coincident peaks in the channel envelopes, subject to some tolerance parameter. Whether to group two channels by common fate is decided on the basis of the correlation of their normalised envelopes.
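The quantities in equation (2.3) can be illustrated with the short sketch below, which extracts an instantaneous amplitude and phase per channel via the analytic signal and scores two channels by the correlation of their normalised envelopes. The Hilbert-transform route and the correlation score are generic stand-ins for illustration and are not necessarily the procedures used by Unoki et al.

    import numpy as np
    from scipy.signal import hilbert

    def amplitude_and_phase(channel, fc, fs):
        """Instantaneous amplitude A_k(t) and phase theta_k(t) of one filterbank
        channel, measured relative to its centre frequency fc (Hz)."""
        analytic = hilbert(np.asarray(channel, dtype=float))
        a = np.abs(analytic)
        t = np.arange(a.size) / fs
        theta = np.unwrap(np.angle(analytic)) - 2 * np.pi * fc * t
        return a, theta

    def common_fate_score(env_a, env_b):
        """Correlation of two normalised channel envelopes, used here as a
        simple proxy for the common-fate grouping decision."""
        ea = (env_a - env_a.mean()) / (env_a.std() + 1e-12)
        eb = (env_b - env_b.mean()) / (env_b.std() + 1e-12)
        return float(np.mean(ea * eb))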

2.3 Auditory Modelling in Sonar

In recent years, some researchers have examined the possibility of applying auditory scene analysis techniques to sonar signals. This type of work can be approached from two perspectives. The modeller may be interested in capturing the listening process of a human sonar operator who is aurally attending to the signal, a procedure which suggests confining the system to work with features that are audibly appreciable to the operator. (Recall that operators rely on visual presentations of the signal in addition to listening.) Alternatively, the study of auditory scene analysis may influence the design of signal-processing algorithms, for example, to facilitate the grouping of signal components which exhibit related changes. The latter approach is stated somewhat more flexibly and permits a system to exploit characteristics of the signal which are imperceptible to humans.

There have been few instances of auditory-motivated sonar systems reported in the literature. Bregman's book, Auditory Scene Analysis, was first published in 1990 and, unsurprisingly, subsequent CASA research has primarily produced systems designed for speech or musical signals, as these are more frequently the object of attention for ordinary listeners. Development has also been motivated by prospects for improved technology in areas such as automatic speech recognition and music transcription. Researchers in auditory scene analysis have only recently turned their attention to sonar.

Teolis and Shamma [25] have presented a system for the classification of transient events which, while not concerned with auditory scene analysis (e.g. streaming) per se, is relevant to this study insofar as it investigated the merits of using an auditory-motivated front end. The model first converted the input signal into the auditory representation, after which classification was performed by a feed-forward neural network. The representation was obtained by taking the wavelet transform of the signal, a process akin to a filterbank, in an effort to model cochlear filtering. This was followed by a partial differentiation with respect to both the time and filter index (the spatial axis).


After this, a non-linear filter was employed to preserve only the extrema in each channel and set all other values to zero. The output signals were then half-wave rectified and smoothed over time to yield the final representation. The study compared the auditory representation against a conventional power spectrum when used as input to the neural network, where a quantitative measure of performance was derived from the receiver operating characteristic (ROC) curve. The auditory representation consistently showed superior performance for a number of signal-to-noise ratios and frequency resolutions.

Another system for the processing of transient events is the Hopkins Electronic Ear (HEEAR) [18], which is implemented in analogue VLSI. Accordingly, the cochlear filters take the form of analogue bandpass filters and the hair-cell transduction is approximated using a rapid adaptation circuit and a clipped, half-wave rectification. A feature vector is formed from the (smoothed and decimated) output of each channel and then classified using a template-based method. Recognising the difficulty of obtaining sonar transients in controlled conditions, the dataset used in the initial evaluation of the model was obtained by striking objects in the laboratory. The classification of 221 transient events gave rise to 16 confusions between similar classes (e.g. claps and finger snaps).

A study conducted at the University of Sheffield [4] investigated the feasibility of event separation for sonar signals within the framework of the CASA architectures previously developed. In order to track the motion of multiple harmonics over time, the sonar signal was decomposed into synchrony strands: the auditory representation underlying Cooke's CASA system. Results were mixed: in severe noise conditions, poor estimates of instantaneous frequency gave rise to many short strands, and transient events were not captured; for cleaner recordings, harmonic content was represented well. The study proceeded to examine the possibility of detecting transient events within the signal and then resynthesising a 'transient-only' stream. This was achieved by first detecting onsets, corresponding to a peak in the instantaneous amplitude across a contiguous block of filters. Having detected the peaks, the minima either side of each envelope peak were located and the intervening signal was isolated as a transient. A final stage integrated the short transient signals into a continuous recording, after adjusting the signal envelopes to prevent sharp discontinuities.

The next stage of the study concentrated on signal processing methods to decompose the signal into tonal, transient and noise components, such that the sum of the three would constitute the original signal. Similar procedures have already been investigated using noise, sinusoids and transients as a representation of a speech signal [13, 29]. The procedure for extracting the three signals is described below and illustrated in Figure 2.3. An overlap-add analysis was initially employed to divide the signal into short, windowed analysis frames, then the fast Fourier transform (FFT) of every frame was taken, resulting in a series of spectral estimates. With the signal in this form, the first step was to designate each bin as tonal or not tonal, which was accomplished using a peak-picking algorithm similar to the MPEG-1 criteria.
Once it had been decided which bins contained tonals, the overlap-add procedure was used to resynthesise the tonal signal from these bins alone; the remainder of the bins were resynthesised to give a residue of noise and transients. To separate the transients from the noise, the time-domain residual was transformed using the discrete cosine transform (DCT), the real half of the Fourier transform, to a frequency-domain representation, where spikes in the time domain manifest themselves as cosine components. These cosine components were transformed by a further Fourier transform, creating peaks which could be detected and removed in the same manner as the tonals, using the peak-picking procedure described above. A final resynthesis of the peaks (including the appropriate inverse transforms) created the transient stream; the remaining signal was labelled as noise. Preliminary experiments were performed aiming to classify (and visualise) transient events by entering them into a multidimensional space, in which the axes corresponded to pre-selected spectral features. This procedure was rigorously carried out by Tucker using perceptually-motivated features and is discussed later in this section.

Figure 2.3: Algorithm flow diagram for the tonals, noise and transients model.
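The tonal-bin designation step can be pictured with the simple routine below, which marks a bin as tonal if it is a local maximum standing some margin above its neighbours. The margin and neighbourhood size are arbitrary choices, and the test is considerably cruder than the MPEG-1 tonality criteria referred to above.

    import numpy as np

    def tonal_bins(frame_spectrum, margin_db=7.0, neighbourhood=3):
        """Return a boolean mask marking FFT bins judged to be tonal.

        frame_spectrum: magnitude spectrum of one windowed analysis frame.
        A bin is tonal if it exceeds the bins `neighbourhood` either side
        by at least `margin_db` dB (illustrative criteria only).
        """
        mag_db = 20 * np.log10(np.abs(frame_spectrum) + 1e-12)
        mask = np.zeros(mag_db.size, dtype=bool)
        for i in range(neighbourhood, mag_db.size - neighbourhood):
            local = np.r_[mag_db[i - neighbourhood:i], mag_db[i + 1:i + neighbourhood + 1]]
            if mag_db[i] >= local.max() + margin_db:
                mask[i] = True
        return mask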

Tucker [27] was the first to explore the benefits of using an auditory model in the analysis of a reasonably large set of real sonar recordings and was chiefly concerned with audible aspects of the signal. The first part of the study was a psychophysical experiment to examine the ability of a listener to infer the properties of an object (e.g. material, size and shape) by listening to the sound generated when the object was struck, both in air and underwater. Submerged and in-air recordings were made for a number of struck objects, for which listeners were asked to identify the size, shape and material. Estimates of shape and absolute size were poor, but the ratio in size between two objects was determined more accurately. When asked to assess the material of an object, wood and plastic were frequently confused, but metallic sounds were distinguishable.

The second stage of the study investigated the perceived quality or timbre of sonar transient events, such as knocks, clicks and chains. Tucker used a multi-dimensional scaling (MDS) technique to determine a perceptually-motivated feature set which people use when classifying transients. Listeners were presented with pairs of recordings and asked to rank their similarity on a scale. The scores were averaged over a number of trials and placed into a similarity matrix. Subsequently, each recording instance was assigned a point in a three-dimensional space. The positions of these points were iteratively updated until the distances between them corresponded in an inverse fashion to the similarity matrix, so that 'clusters' of points represented sounds of a similar timbre. It should be noted that the distance between two points was determined according to the INDSCAL metric (as opposed to the Euclidean),


and weights the axes in relation to individual subjects. The final step was to search for acoustic properties of the signal which were highly correlated with the dimensions of the multi-dimensional space. Results for sonar transients indicated that the three dimensions correlated well with spectral flux, the frequency of the lowest-frequency peak and the temporal centroid.

In addition to transient events, sonar signals contain a rhythmic pulsation, which can be attributed to the revolution and configuration of a ship’s propeller; accordingly, an investigation into the temporal structure of sonar recordings was undertaken. The rhythm of the sonar signal was assessed using the rhythmogram [26]—a time-domain procedure which smooths the energy in the signal at a number of scales, highlighting slow and rapid pulses. The overall rhythmic behaviour was summarised by obtaining an inter-onset interval histogram (IIH) at each scale and pooling all the IIHs into a single feature vector. The resultant feature vector was rather long and redundant, so a number of methods for reducing the vector to a few salient values are described.

Kirsteins et al. [11] have produced a CASA-based model for the fusion of related signal components within underwater recordings, which exploits correlated micromodulations in instantaneous frequency to group channels. In particular, the system is capable of identifying the harmonic tracks within recordings of killer and humpback whale vocalisations. However, it is questionable whether listeners routinely group signal components on the basis of amplitude modulation, and frequency modulation is not generally considered to be a strong grouping cue2. Arguably, the model would benefit from taking into account more compelling grouping principles such as onset and harmonicity.

2.4 Summary

The majority of CASA research to date has concentrated on speech and music rather than sonar signals, which differ greatly in nature. Speech and music are designed with a listener in mind, both in terms of the acoustic properties of the signal—its frequency and dynamic range—and the effective communication of an idea, verbally or artistically. By contrast, the underwater sounds produced by marine vessels are an incidental by-product and are not intended to communicate information. Nevertheless, a vessel acoustic signature has a few audible properties which allow it to be described: a rhythmic pulsation, transient events, the shape of the noise spectrum and perhaps a weak sensation of pitch evoked by tonal components. Aspects of the signal that a human cannot hear must be interpreted visually.

Tucker’s model is restricted to aspects of the signal which are directly audible, namely, rhythm and transients. Similarly, Teolis and Shamma’s model is concerned only with transient events. Auditory models in sonar have tended to neglect tonal components, which are not a striking feature in the recordings because they are masked by noise and occur at low frequencies—although still well within an audible range. This is surprising, considering that conventional CASA literature contains a wealth of techniques relating to the tracking and grouping of frequency components in speech. The following chapters examine

2Although frequency modulation (FM) is not usually cited as a grouping cue per se, the ear is by no means deaf to FM. FM has an impact on the timbre of a sound and promotes grouping when applied as an extension of harmonicity.

auditory methods for the identification and organisation of tonal components within a sonar signal.


Chapter 3

Time-Frequency Representations and the EIH

The previous chapters have described how the structures of the cochlea—the basilar membrane and inner hair cells—transduce a signal into a neural-spectral representation. If a system is intended to perform the task of listening, then a process is required to emulate the signal-transforming action of the ear. This chapter opens with an account of three signal processing techniques, which may be employed to model the signal in the auditory nerve to a first-order approximation as a time-varying spectrum. Following this, a particular auditory model, the ensemble interval histogram, is presented as an alternative to the conventional spectrogram.

3.1 Signal Processing Solutions

3.1.1 Short-time Fourier Transform

The most popular choice of time-frequency representation is the short-time Fourier transform (STFT), which expresses the spectrum of a signal at a given time from the Fourier transform estimated over a short window $w(\tau)$ either side. For a signal $x(t)$, the (magnitude) STFT [20] is formally defined as:

$$|X(t,\omega)| = \left|\,\int_{-\infty}^{\infty} w(\tau)\, x(t+\tau)\, e^{-j\omega\tau}\, d\tau\,\right| \qquad (3.1)$$

The Fourier transform assumes that a signal is periodic, i.e. that it consists of the windowed signal repeated infinitely, so the window is typically tapered at each end (e.g. a Gaussian, raised cosine or Hamming window) to prevent sharp discontinuities occurring at the boundaries. The length of the window has implications for time and frequency resolution: a short window smooths the signal in the frequency domain; a long window smooths the signal in the time domain. As far as is possible, the window is chosen to give adequate resolution in both domains. What is considered adequate depends on the task in hand and the scale at which information is present in the signal. For speech,


the window needs to simultaneously capture transient bursts in the time domain, spectral shape and pitch in the frequency domain, and pitch contours in both; typically, a window length of 5ms–20ms is suitable. The detection of low-frequency tonals in sonar requires a narrowband analysis.
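For illustration, the following MATLAB fragment computes a magnitude spectrogram directly from (3.1); the test signal, window length and frame shift are illustrative values rather than settings used elsewhere in this report.

    % Minimal STFT magnitude spectrogram after (3.1); parameters are illustrative.
    fs  = 8000;  t = (0:fs-1)'/fs;
    x   = sin(2*pi*440*t) + 0.5*randn(fs,1);    % hypothetical test signal: tone in noise
    N   = 256;  hop = 128;                      % window length and frame shift (samples)
    w   = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1)); % Hamming window
    nF  = floor((length(x)-N)/hop) + 1;
    S   = zeros(N/2+1, nF);
    for m = 1:nF
        i0  = (m-1)*hop;
        seg = x(i0+1:i0+N) .* w;                % windowed frame
        X   = fft(seg);
        S(:,m) = abs(X(1:N/2+1));               % magnitude of the positive frequencies
    end
    % imagesc(20*log10(S + eps)) displays the log-magnitude spectrogram.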

3.1.2 Wigner Distribution

The Wigner distribution (WD) [20] is another joint time-frequency function, which is designed to address the resolution trade-off inherent in the STFT. For a complex signal $x(t)$ (where $x^{*}$ denotes its complex conjugate), the WD at time $t$ and radian frequency $\omega$ is defined as follows:

$$W(t,\omega) = \int_{-\infty}^{\infty} x(t+\tau/2)\, x^{*}(t-\tau/2)\, e^{-j\omega\tau}\, d\tau \qquad (3.2)$$

The Wigner distribution is able to precisely represent some analytically-defined monocomponent signals, such as exponentials, Dirac pulses and frequency sweeps, in both time and frequency. In this case, the WD is the same as the STFT with the windowing effect (i.e. averaging) removed. (In fact, convolving the WD of the signal with the WD of the window in two dimensions yields the STFT spectrogram.) For certain signals, however, the WD suffers from artefacts arising from cross-terms in the multiplication, to which the STFT is immune. Nevertheless, the Wigner distribution has been applied successfully in both speech and sonar [1].

3.1.3 Wavelet Transform

Within the last two decades, the wavelet transform (WT) has become widely regarded as an alternative to the STFT. Rather than trying to remove uncertainty in time and frequency altogether, the WT emphasises each scale in separate portions of the representation: good time resolution is obtained at high frequencies; good frequency resolution is obtained at low frequencies. Initially, a mother wavelet or analysing wavelet, which often resembles a windowed sinusoid, is used to filter the signal. This mother wavelet is then progressively scaled and dilated by powers of two, to produce output at further scales. The continuous wavelet transform (CWT) is defined in terms of the mother wavelet $\psi$, at time $t$ and scale $a$:

$$X_{WT}(t,a) = \frac{1}{\sqrt{|a|}} \int x(\tau)\, \psi\!\left(\frac{\tau - t}{a}\right) d\tau \qquad (3.3)$$

The WT has several desirable properties. First, a wavelet has the ability to localise features in the time-frequency plane owing to its finite length, as opposed to a Fourier transform, which uses sinusoids of infinite duration. Second, the exponential scaling of the wavelets carves up the time-frequency plane so that frequency resolution is varied in a similar manner to the ear (see Figure 3.1). For this reason, the WT has been adopted by several workers in the auditory modelling community as an approximation of the auditory periphery and exploited in a number of sonar systems [19, 25].
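As a minimal sketch of (3.3), the fragment below evaluates the CWT at a handful of dyadic scales by direct convolution; the real Morlet-style mother wavelet, the scale range and the test signal are illustrative assumptions and are not taken from the systems cited above.

    % Direct evaluation of the CWT (3.3) at dyadic scales (illustrative choices).
    x      = randn(4096,1);                    % hypothetical input signal
    scales = 2.^(1:6);                         % dyadic scales, in samples
    C      = zeros(length(scales), length(x));
    for s = 1:length(scales)
        a    = scales(s);
        u    = (-4*a:4*a)'/a;                  % support of the dilated wavelet
        psi  = cos(5*u) .* exp(-u.^2/2);       % real Morlet-style mother wavelet
        kern = psi(end:-1:1)/sqrt(a);          % correlation as convolution with the reversed kernel
        C(s,:) = conv(x, kern, 'same')';       % wavelet coefficients at this scale
    end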


Figure 3.1: The division of the time-frequency plane into cells by the STFT (left) and wavelet transform (right).

3.2 Ensemble Interval Histogram

This section describes the ensemble interval histogram (EIH) as an auditory-motivated method of spectral analysis. Here the frequency content of the signal is estimated from the spiking behaviour of simulated auditory-nerve fibres, producing a frequency-domain representation similar to a Fourier magnitude spectrum. A study conducted by Ghitza [9] compared the performance of a spoken digit recogniser for a variety of signal-to-noise ratios using features extracted from both the Fourier and EIH spectrum. The performance of the EIH-based system degrades less rapidly as the signal-to-noise ratio decreases, indicating the superior ability of the EIH to preserve harmonic structure in the presence of Gaussian noise. The ability of the EIH to suppress noise makes it a candidate for the analysis of vessel acoustic signatures, considering that the tonal components—which may reveal the identity of a target—are often obscured by a background of broadband noise sources. The remainder of this section assesses the suitability of the EIH as a front-end to a sonar classifier.

3.2.1 Model

The ensemble interval histogram is generated by applying three transformations to the input signal. The first two of these correspond, in an abstract fashion, to the motion of the basilar membrane and the transduction of this motion into spiking activity by the inner hair cells. The third transformation is more speculative, and pertains to the analysis of frequency in the auditory nerve. This section specifically describes the algorithm proposed by Ghitza; the overall model is illustrated schematically in Figure 3.2.

The initial stage of the model consists of a bank of bandpass filters to simulate the vibration of the basilar membrane, each filter output corresponding to the motion at a given point. Specifically, the filter bank comprises eighty-five overlapping cochlear filters, which are spaced logarithmically between 200Hz and 3200Hz to suit the frequency range of speech signals. Consistent with the power spectrum model presented in section 1.3.1, the bandwidths of the filters become wider with increasing frequency. Consequently, individual harmonics are resolved by narrow filters at low frequencies, whilst at higher frequencies, a number of harmonics may interact under the passband of a single filter. Temporal resolution varies in the opposite sense: sudden onsets register quickly at high-frequency filters; at lower frequencies, filters take a while to respond and


Figure 3.2: Schematic illustration of EIH adapted from [9].

produce a smoother output.

The next stage of the model assumes a population of inner hair cells for each point along the basilar membrane. To implement this, a multi-level crossing detector is assigned to each filter to transform the output from a sampled signal into a series of spike events. Each positive-going level crossing represents a cell being depolarised, and the distribution of levels is chosen to reflect the variability of inner hair cell thresholds. Ghitza assigns seven level crossings to each channel according to a number of Gaussian distributions whose means are distributed logarithmically over the positive half of the signal, which accounts for both dynamic compression and natural variability. It should be emphasised that only positive, positive-going crossings generate a spike, as depolarisation of hair cells only occurs in a single direction.

The final stage of the model is a fine-grained frequency analysis of each spike train: 595 in number, assuming eighty-five filters and seven level crossings. For a narrow band dominated by a near-sinusoidal stimulus, spikes will occur at regular intervals corresponding to the period of the signal and so convey frequency-related information. For example, a 200Hz sinusoid captured under a filter will produce a spike every 5ms. An interval histogram is formed by taking the reciprocal of the intervals to estimate frequency and pooling them over a short time frame into a histogram. To continue the previous example, the 5ms intervals will be converted to units of frequency, i.e. 200Hz, and appear as a spike in the histogram. The ensemble interval histogram is then obtained simply by summing all the histograms together. Ghitza’s histogram consists of one hundred bins linearly-spaced over the range 0Hz–3200Hz and uses the twenty most recent intervals in each spike train. Some implications of this policy are discussed in the next section.
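The fragment below sketches the level-crossing and interval-histogram stages for a single filter channel; the gammatone filtering stage is omitted, and the level placement, bin edges and test signal are illustrative rather than Ghitza’s exact settings.

    % Interval histogram for one filter channel (minimal sketch).
    fs     = 50e3;  t = (0:fs-1)'/fs;
    y      = sin(2*pi*200*t) + 0.1*randn(fs,1);     % hypothetical channel output: 200Hz tone in noise
    levels = 0.1:0.15:1.0;                          % seven positive thresholds (illustrative)
    edges  = linspace(0, 3200, 101);                % one hundred bins over 0Hz-3200Hz
    h      = zeros(1, 100);
    for L = levels
        up = find(y(1:end-1) < L & y(2:end) >= L);  % positive-going crossings of level L
        iv = diff(up)/fs;                           % inter-crossing intervals (seconds)
        iv = iv(max(1,end-19):end);                 % keep the twenty most recent intervals
        f  = 1./iv;                                 % convert intervals to frequency estimates
        for k = 1:length(f)
            b = find(f(k) >= edges(1:end-1) & f(k) < edges(2:end), 1);
            if ~isempty(b), h(b) = h(b) + 1; end    % accumulate the interval histogram
        end
    end
    % Summing h over all channels (and frames) gives the ensemble interval histogram.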

3.2.2 Properties

The ensemble interval histogram representation has some properties which distinguish it from a conventional spectrum. This section introduces three general properties, which relate to frequency resolution, noise robustness and time


Figure 3.3: Two sinusoids with frequencies of 20Hz and 24Hz beating against each other for one second. Notice the resulting 4Hz period, which would be encoded by a high-threshold level-crossing detector.


Figure 3.4: left plot: unresolved harmonics at 2200Hz, 2300Hz, 2400Hz and 2500Hz, causing a 100Hz ‘fundamental’ spike; right plot: resolved harmonics at 100Hz, 200Hz, 300Hz and 400Hz.

resolution; discussion of their implications for sonar is postponed to section 3.2.4.

The frequency-dependent resolution of the EIH can be attributed principally to the filterbank stage, in which the bandwidth and separation of the filters increase with frequency, causing harmonic components to be encoded differently at each end of the spectrum. This is best understood in terms of the analysis of a harmonic series. For example, for a series with a 100Hz fundamental, the first few harmonics are captured under narrow filters and so appear in the EIH as distinct spikes at 100Hz, 200Hz and so on. High-frequency filters have bandwidths wider than the fundamental and can therefore contain multiple harmonics, which cannot be individually resolved. Instead, the partials appear in the EIH as a mass of high-frequency energy. As a secondary effect, the interaction between two partials gives rise to a beating in the envelope of the filter output at their frequency difference, which is picked up by the level-crossing detectors and is encoded as a low-frequency spike in the EIH. Figure 3.3 demonstrates how the EIH encodes the frequency difference between unresolved partials and Figure 3.4 shows actual EIH output for select groups of partials.

The suppression of noise within the EIH is achieved in two ways. The first of these is the overlap in the passbands of the cochlear filters. When a frequency component has sufficient amplitude, it can dominate the output of a few filters with centre frequencies close to the stimulus, each of which then


contributes to a peak in the ensemble interval histogram. Noise suppression is assisted further by the formation of an interval histogram. A conventional spectrogram divides energy into frequency bins and each bin communicates only the magnitude (and phase) of its content—there is no way of determining to what extent the bin contains tonal or noise energy. By contrast, the content of an interval histogram reflects the nature of the stimulus within a single band: a tonal gives rise to regular intervals, contributing to a single bin of the histogram; noise produces varied intervals and so gets ‘spread’ over the histogram.

Figure 3.5: Temporal response of the EIH. The time of analysis is indicated by a dashed line, the bars indicate the time over which the histogram is formed in each channel (only past values are used).

The time resolution of the EIH varies with frequency, the best resolution being achieved at high frequencies1. This can be attributed in part to the filterbank configuration, whose high-frequency filters are associated with better temporal resolution. The principal factor, however, is the choice of a constant number of intervals per histogram. The reciprocal relationship between frequency and interval duration implies that a fixed number of low-frequency intervals will span a longer time than the same number of intervals at a higher frequency. For example, 20 intervals at 10Hz will cover 2 seconds but 20 intervals at 100Hz will cover only 0.2 seconds. In this sense, the time-frequency trade-off of the EIH may be likened to a wavelet transform: spectral and temporal features are well-defined in the low- and high-frequency portions of the spectrum, respectively. It is of course possible to even out the time resolution by appropriately scaling the histogram ranges, but taking fewer intervals at low-frequency channels would incur a loss in frequency resolution.

3.2.3 Analysis of Vowels

A model has been developed in MATLAB to generate an EIH-based spectrogram, which takes the form of an image showing the energy in the EIH as it changes over time. It is therefore a time-frequency representation derived from the EIH, just as a conventional spectrogram is derived from the Fourier transform. Before progressing to sonar signals, a preliminary investigation compared the two types of spectrogram for some artificial vowel sounds, both clean and mixed with Gaussian noise. The vowel sounds examined were those used in Summerfield and Assmann’s double-vowel experiment [24]: each has a duration of 200ms and consists of a harmonic complex, shaped by a filter to create formant peaks.

1assuming a high sample rate—see section 3.2.5.



Figure 3.6: Spectrograms for vowel sound /ER/. Top-left: clean EIH; top-right: noisy EIH; bottom-left: clean FFT; bottom-right: noisy FFT.

The parameters of the EIH were chosen to closely match those of Ghitza’s model, although random variability in the level crossings was omitted for the sake of economy. The filtering stage was accomplished by a gammatone filterbank, whose centres and bandwidths were chosen according to the equivalent rectangular bandwidth. The original vowel sounds had a sample rate of 10kHz but were upsampled to 50kHz for the EIH, which requires a finer time resolution for level crossing estimates.

This section examines the EIH and FFT spectrogram for a single vowel sound, /ER/, with a fundamental frequency of 100Hz and formants at 450Hz, 1250Hz and 2650Hz. In the clean EIH-spectrogram, the first formant is resolved by the narrow, low-frequency filters into five constituent partials—the fundamental and the four lowest harmonics. The next two formants appear in the EIH as thick bands, as the filters at higher frequencies are too wide to capture individual partials. The EIH was recalculated for the same signal mixed with additive Gaussian noise at 0dB (with respect to RMS) to produce Figure 3.6(b). The harmonics of the first formant remain clearly visible and the second formant (1250Hz) is still discernible but somewhat intermittent. The third, weaker formant (2650Hz) is lost completely, owing to the poor frequency resolution of the wide filters. FFT-based spectrograms of the vowel sounds were generated with a frame-length and frame-shift designed to give comparable time-frequency resolution to the EIH. The results for the clean signal are depicted in Figure 3.6(c), and the noisy signal in Figure 3.6(d). In the noise-free case, the individual harmonics within the signal are all visible, appearing as horizontal stripes; the darker/redder patches indicate formant regions. In the noisy spectrogram, the low-frequency harmonics are still visible above the noise but the second and third formants are obscured.



Figure 3.7: Spectrogram (top) and EIH (bottom) for four seconds of sonar.

3.2.4 Analysis of Sonar

A vessel acoustic signature has already been presented as a combination of narrowband and broadband components, the former providing the most useful features for classification. Application of the ensemble interval histogram to a sonar signal may highlight tonal structure buried within noisy portions of the spectrum, much in the same way as the higher formants of the vowel signal were preserved. Figure 3.7 shows both a conventional and an EIH-based spectrogram for a four-second clip of a vessel recording. In the first instance, the EIH parameters were chosen to be identical to the vowel experiment (85 filters and 7 level crossings), the only notable exception being that the filters were spaced up to 1kHz to provide coverage of a relevant frequency range. The EIH was generated every 50ms, resulting in eighty frames over the four seconds.

A number of harmonically-related tonals are apparent in the FFT-based spectrogram, especially at frequencies lower than around 500Hz, where the crossover point occurs. The spectrum is dominated by noise above this frequency, although some tonal components are vaguely evident. The corresponding EIH-based spectrogram poorly represents the content of the signal: on a broadband scale, the spectral energy appears to agree with the FFT, but the discrete lines are no longer visible and artefacts are present at higher frequencies. The following sections discuss the cause of these problems and suggest some adaptations to the EIH algorithm to ensure a more faithful encoding of the signal.

Filterbank Configuration

The initial stage of processing within the EIH is a gammatone filterbank (or some similar implementation), which decomposes the signal into narrow bands.


Figure 3.8: Proposed redistribution of gammatone filters for a sonar application.

So far, the logarithmic spacing of the filters has been emphasised as a key feature of the model. However, given that we are interested in identifying tonal components across a range of frequencies, it is reasonable to question the justification for this spacing, which results in poor frequency resolution in the upper half of the spectrum—especially considering that most of the noise is concentrated in this region. By contrast, a bank of narrow, linearly-distributed filters would offer uniform frequency coverage, as illustrated in Figure 3.8. Arguments for the logarithmic spacing originated from a concern to model auditory function, but there remain valid arguments for a linear spacing. First, auditory filters are spaced linearly at frequencies under 1kHz anyway and only start spreading out at higher frequencies. Second, it may be contended that the purpose of wide high-frequency filters is to aid the perception of pitch via the interaction of unresolved harmonics, rather than to resolve individual partials. Third, it is possible that poor frequency resolution is a deficiency—the best the ear can achieve given the mechanical properties of the basilar membrane and the phase-locking capacity of nerve cells. For these reasons, a linear spacing of filters has been adopted. It should be noted, however, that the overlap between filters has been retained.

Level-crossing Thresholds

The contrast of spectral components within the EIH depends crucially upon the choice of level-crossing amplitudes. If the level-crossing thresholds are too sensitive, then energy will appear in the EIH wherever there is noise. On the other hand, if the thresholds are set too high, then genuine components may not be represented at all, which appears to be the case in Figure 3.7. The default method for choosing level crossings so far has been to find the maximum absolute peak over all channels and designate this as the highest threshold, as shown in Figure 3.9(a). However, taking the maximum value as an indicator of the dynamic range is inappropriate, as a tonal or transient event with a high amplitude can result in genuine features failing to register. Calculating the mean energy over all the channels is an improved strategy, but a strong tonal or transient still has the potential to offset the mean, causing the same problem, only less severe—see Figure 3.9(b).

The methods described so far have operated on the assumption that level crossings should be uniform for all channels. This assumption is invalid for sonar spectra, in which tonal levels vary, particularly between noisy and noise-free regions of the spectrum. Selecting thresholds based on the maximum (or mean) energy for each channel independently of the others is a consideration that may be readily dismissed, as tonal features will no longer be distinguishable (see Figure 3.9(c)). Instead, the method adopted is to remove the trend in the spectrum by choosing levels according to a polynomial fit through the mean channel output, counteracting the effects of the broadband component.


The order of the polynomial dictates the smoothness of the trend: a linear or quadratic function appears to work well. Figure 3.9(d) shows the thresholds for a linear fit through the spectral energy. The inclusion of these two modifications—spacing the gammatone filters linearly and adjusting the thresholds using a polynomial fit—results in far greater detail in the EIH spectrum for a sonar signal. The modified algorithm was used to produce the EIH spectrogram in Figure 3.10.
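A sketch of this thresholding scheme is given below; the envelope matrix, the polynomial order and the scaling of the seven levels are assumptions made for illustration.

    % Level-crossing thresholds from a polynomial fit through the mean channel output.
    E      = abs(randn(85, 5000));       % hypothetical filterbank envelopes (channels x samples)
    m      = mean(E, 2);                 % mean output of each channel
    c      = (1:length(m))';             % channel index
    p      = polyfit(c, m, 1);           % linear (or quadratic) trend across the spectrum
    trend  = polyval(p, c);              % fitted broadband level per channel
    scales = 0.5:0.25:2.0;               % seven multiples of the trend (illustrative)
    levels = trend * scales;             % (channels x 7) matrix of level-crossing thresholds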

3.2.5 Using Entropy and Variance

Previous studies have shown that the average firing rate of cells within the auditory nerve cannot solely account for all the audible properties of a signal and that it is likely that further information is encoded by temporal discharge patterns. The EIH model converts a signal into spiking activity using level-crossing detectors; from here, a mechanism is required to abstract useful properties from the spike trains. Ghitza’s model forms a representation directly from the inter-spike intervals, but it is also conceivable that order within the spikes conveys salient information about the quality of the stimulus exciting the model cell, e.g., tonal or noise. Two measures of order are proposed here: entropy and variance, both of which are properties of probability distributions. To find the entropy of a spike train, a histogram is formed for each channel over a short time window, this histogram is used to estimate a probability distribution $p_i$, and then the entropy for each channel is obtained from the distribution:

$$H(p) = -\sum_{i} p_i \log_2 p_i \qquad (3.4)$$

By dividing the spike train into frames and computing the entropy in each channel, it is possible to express the randomness of a signal as a function of frequency and time. Similarly, the variance of the spike intervals encodes the type of stimulus: clean tonal components are associated with a low interval variance. The variance $\sigma^2$ can be calculated directly from a sample of intervals $x_i$ of size $N$ and mean $\bar{x}$ using:

$$\sigma^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 \qquad (3.5)$$
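The following fragment computes both measures, (3.4) and (3.5), for the intervals of a single channel over one frame; the spike times and the histogram binning are assumptions made for illustration.

    % Entropy (3.4) and variance (3.5) of the inter-spike intervals in one channel.
    spikes = cumsum(0.005 + 0.0002*randn(40,1));       % hypothetical spike times (s), roughly 200Hz
    iv     = diff(spikes);                             % inter-spike intervals
    edges  = linspace(min(iv), max(iv), 11);           % ten bins over the observed range
    n      = histc(iv, edges);
    n(end-1) = n(end-1) + n(end);  n = n(1:end-1);     % fold the final edge into the last bin
    p      = n / sum(n);                               % estimated probability distribution
    p      = p(p > 0);
    H      = -sum(p .* log2(p));                       % entropy of the interval distribution
    v      = sum((iv - mean(iv)).^2) / (numel(iv)-1);  % sample variance of the intervals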

Figure 3.9: Assigning levels based on: (a) maximum energy; (b) mean energy; (c) individual energy; (d) linear fit.



Figure 3.10: Modified EIH for four seconds of sonar. Confer Figure 3.7.


Figure 3.11: Channel energy (left) and entropy (right) for a 200Hz tone.


Figure 3.12: Channel energy (top), entropy (middle) and variance (bottom) for a sonar signal. Summary plots in the frequency domain are shown to the right of each.


Figure 3.13: A crossing time derived from a linear fit (dotted line) between two samples of a sine wave (solid line).

The result of plotting entropy in the time-frequency plane is shown in Figure 3.11 for a 200Hz tone in noise (at 10dB SNR wrt. RMS), alongside the log envelope in each channel. The entropy encodes the presence of the tonal by the order it creates in the spike pattern; the log envelope displays energy but does not specifically distinguish a tonal from noise. Figure 3.12(a) shows the spectrogram of a sonar recording alongside the channel entropy 3.12(b) and variance 3.12(c). It should be noted that tonals are associated with a low interval entropy and variance and so are manifested as sharp ‘troughs’ in the summary plots. (However, the colour map is reversed in the images for consistency with the spectrogram.) Three immediate observations may be made regarding the output: first, for all three means of detection, tonals are represented in the lower 500Hz portion of the spectrum; second, variance appears to affect individual filters, whereas a high-energy tonal (such as 150Hz) affects entropy in many bands; and third, artefacts have appeared in the upper region of the entropy plot (i.e. evidence of tonal components where there are none). These artefacts can be attributed to poor level-crossing estimates; the next section suggests ways to remedy this problem.

Improving Level-crossing Estimation

Frequency estimates are formed from the reciprocal of the interval time, so it is clear that accurate crossing times must be obtained to avoid severe errors in the frequency measurement. For example, 1ms measured as 0.99ms will change the frequency estimate from 1000Hz to 1010Hz, which is a significant error. In general, level-crossing times are difficult to obtain with this degree of precision due to the sampled nature of the signal. A straightforward implementation detects when two points fall either side of a crossing threshold, then uses a linear fit between the points to estimate the crossing. However, in the regions of a sinusoid where the curvature is greatest (i.e. the peaks and troughs), a linear fit can result in considerable error, as depicted in Figure 3.13. High-frequency channels are particularly vulnerable to this type of error, as there are fewer samples per signal period. The entropy at high frequencies in Figure 3.12(b) is sensitive to the channel frequency: whenever the CF is misaligned with the sample rate, a form of ‘beating’ occurs and introduces artificial variance into the intervals, which is interpreted as randomness; however, when the samples are aligned with the CF period, this variance is absent. In order to counteract this effect, it is necessary to better estimate the crossing times. This can be achieved by fitting a polynomial through a number of samples (e.g. a cubic spline interpolation), or alternatively, the entire signal can be upsampled.
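A sketch of the linear estimate is shown below; the sample values, level and sample rate are hypothetical.

    % Linear interpolation of a level-crossing time between two samples (cf. Figure 3.13).
    fs = 50e3;  L = 0.5;
    x1 = 0.42;  x2 = 0.61;                     % hypothetical samples straddling the level L
    n1 = 1000;                                 % index of the sample below the level
    tc = (n1 + (L - x1)/(x2 - x1)) / fs;       % estimated crossing time (seconds)
    % Fitting a cubic through several samples, or upsampling the whole signal,
    % reduces the error where the waveform curvature is greatest.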



Figure 3.14: Channel entropy for an upsampled sonar signal.

Figure 3.14 shows the channel entropy for the same signal, upsampled by a factor of three. The high-frequency spikes have disappeared but the channel CF still appears to have a slowly-varying influence on the entropy.

3.3 Summary and Discussion

This chapter has outlined the key properties of the EIH as a time-frequency representation and presented some sample output for a sonar signal. One of the properties that may be considered undesirable in a sonar application is the encoding of envelope modulation as a low-frequency component, which is explained in section 3.2.2. This means, for instance, that two high-frequency partials separated by 100Hz will give rise to an additional 100Hz component in the EIH. The alternative design proposed for sonar signals uses narrow filters at high frequencies and will therefore cause fewer interactions. The potential remains, however, for two components to interact within individual filter channels. For this reason in particular, it is not advisable to directly replace FFT-based spectrograms with the EIH. Instead, the EIH should be treated as a separate form of representation, which characterises both frequency and pitch effects.

The entropy- and variance-based methods have shown promise in providing an alternative perspective on the signal, by quantifying noise rather than energy, although some work remains to be done in refining the algorithm to avoid artefacts. It should also be noted that the entropy-based EIH does not produce tonals whose heights correspond to the magnitude in an FFT, so this approach cannot be directly applied to obtain spectra for classification. Nevertheless, there is an argument for adopting these methods to detect the location of tonals; the amplitude can subsequently be determined from the envelope in those channels.


Chapter 4

Feature Extraction

This chapter discusses a number of ways in which features within a sonar signal may be extracted. The signal representations presented in the previous chapter all produced a decomposition of the signal energy (or variance) in the time-frequency plane; our attempts to model audition have only extended as far as the auditory periphery. Four high-level analyses of a signal are now described: the first and second of these, lateral inhibition and peak detection, discuss the enhancement and tracking of frequency components through time. The third section presents the modulation spectrum, which allows a signal to be characterised in terms of spectral and amplitude modulations. The fourth section is concerned with the phase of frequency components; specifically, whether common fluctuations in phase can be used to group components by source.

4.1 Lateral Inhibition

It has long been recognised that cells within the auditory and visual apparatus are assembled in such a way that activity in one region of cells tends to inhibit the response of adjacent regions, a phenomenon known as lateral inhibition. Visual scenes are encoded across the retina and optical nerve, so lateral inhibition serves to accentuate edges and suppress areas of uniform intensity. This is effectively illustrated by the Hermann grid in Figure 4.1. Similarly, auditory stimuli are encoded in a tonotopic, frequency-ordered fashion (it is instructive to consider the response over a cross-section of the auditory nerve as a ‘snapshot’ of the spectrum), so lateral inhibition sharpens spectral edges and weakens contiguous regions of activity, e.g. broadband spectral features.

Figure 4.1: The ‘Hermann grid’ optical illusion—note the illusory grey patches at the intersections of the white bands.


The effective detection and measurement of tonal components within a sonar signal has already been emphasised as a key factor in the performance of machine classifiers. In modern sonar systems, tonal components are enhanced using a strategy quite similar to lateral inhibition called spectral normalisation [30]. Spectral normalisation is accomplished by estimating how much energy in an FFT bin arises from broadband components, typically by averaging the energy in bins a short distance either side, and simply removing this contribution by subtraction. The remainder of this section progresses toward a computer model of lateral inhibition which uses short-term association to highlight tonal components within a signal.

4.1.1 Shamma’s Lateral Inhibition Model

Shamma has developed a lateral inhibition network (LIN) [22] to simulate the response of a group of individual artificial neurons, which are linked by weighted inhibitory connections. Shamma has formulated two topologies for the LIN: recurrent and non-recurrent.

The recurrent LIN consists of two layers. The first layer serves merely as an input buffer (in the same way as in a multilayer perceptron). The second

layer consists of units whose input-output relation is given by the differential equation in (4.1), where $x(t)$ and $y(t)$ respectively stand for the input and output at time $t$. This function causes the unit to charge and discharge slowly, like a capacitor, as shown below.

$$\tau \frac{dy(t)}{dt} + y(t) = x(t) \qquad (4.1)$$

There are two sets of connections between the units: $V(i,j)$, which connect unit $i$ in the first layer to unit $j$ in the second layer, and $W(i,j)$, which connect units $i$ and $j$ in the second layer to each other. The governing equation for an output unit $y_i$ is expressed mathematically as (4.2) and diagrammatically as Figure 4.2.

$$\tau \frac{dy_i(t)}{dt} + y_i(t) = \sum_{j} V(i,j)\, x_j(t) - \sum_{j} W(i,j)\, y_j(t) \qquad (4.2)$$

The non-recurrent LIN is formulated almost in the same manner as the recurrent: the first layer is a buffer, and the second layer consists of units with the same activation function. The key difference is that there is no inhibitory feedback between the output units; instead, the output of each unit is calculated independently of every other (4.3) before the inhibitory weights are applied over the layer to give a third output layer $z_i$ (4.4). The non-recurrent LIN topology is illustrated in Figure 4.3.

Figure 4.2: Recurrent LIN.

Figure 4.3: Non-recurrent LIN.

$$\tau \frac{dy_i(t)}{dt} + y_i(t) = \sum_{j} V(i,j)\, x_j(t) \qquad (4.3)$$

$$z_i(t) = y_i(t) - \sum_{j} W(i,j)\, y_j(t) \qquad (4.4)$$

So far, the layers in both LINs have been described in terms of discrete arrays of units. Rather than considering individual units $x_i$, $y_i$ and $z_i$, in which $i$ takes integer values, it is useful to form each layer from a continuum of units $x(s)$, $y(s)$ and $z(s)$ as a function of the continuous variable $s$. If the connectivity of the units is symmetric and homogeneous then the interaction across the weights can be interpreted as a spatial convolution ($*$). Taken in the limit, the equation for the recurrent LIN becomes (4.5):

$$\tau \frac{\partial y(s,t)}{\partial t} + y(s,t) = v(s) * x(s,t) - w(s) * y(s,t) \qquad (4.5)$$

and the equations for the non-recurrent LIN become (4.6) and (4.7):

$$\tau \frac{\partial y(s,t)}{\partial t} + y(s,t) = v(s) * x(s,t) \qquad (4.6)$$

$$z(s,t) = y(s,t) - w(s) * y(s,t) \qquad (4.7)$$



Figure 4.4: LIN (linear) magnitude plots. Top-left: recurrent, no smoothing; top-right: recurrent, low-pass filtered; bottom-left: non-recurrent, no smoothing; bottom-right: non-recurrent, low-pass filtered.

Now the equations simply take the form of two filters, operating with respect to the spatial and temporal axes ($s$ and $t$, respectively). The overall response of the LIN can therefore be determined by the application of the Laplace transform to time—eliminating the differential term—and the Fourier transform to space. The result is a spatio-temporal transfer function for the recurrent (4.8) and non-recurrent (4.9) LIN, in terms of $p$ and $k$, which are the complex variables of the Laplace (time) and Fourier (space) transforms, respectively.

$$H_{rec}(p,k) = \frac{V(k)}{1 + \tau p + W(k)} \qquad (4.8)$$

$$H_{non}(p,k) = \frac{V(k)\left[\,1 - W(k)\,\right]}{1 + \tau p} \qquad (4.9)$$

If a signal is slow-varying then the temporal integration can be disregarded by setting $\tau p = 0$. This leaves a function purely in terms of $k$, i.e. a spatial transfer function. Both of these formulations describe a high-pass filter, as shown in the left-hand plots of Figure 4.4. The weights between the input and output layers, $v(s)$, act as a low-pass filter, averaging over neighbouring input units. The effect of including these weights is shown in the right-hand plots of Figure 4.4, where high frequencies have been attenuated somewhat.
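The following fragment evaluates the two spatial transfer functions numerically for a slow-varying input; the inhibitory kernel is an illustrative choice rather than one taken from Shamma’s papers.

    % Spatial transfer functions (4.8) and (4.9) for a slow-varying input (p = 0).
    nPts = 256;
    w    = zeros(1, nPts);  w([2 3 end-1 end]) = 0.25;  % inhibitory surround: two channels either side
    W    = fft(w);                                      % spatial Fourier transform W(k)
    Hrec = 1 ./ (1 + W);                                % recurrent LIN, (4.8) with p = 0 and V(k) = 1
    Hnon = 1 - W;                                       % non-recurrent LIN, (4.9) with p = 0 and V(k) = 1
    % plot(abs(Hrec(1:nPts/2))) and plot(abs(Hnon(1:nPts/2))) reproduce the high-pass
    % shapes in the left-hand plots of Figure 4.4; replacing V(k) = 1 with the transform
    % of an averaging kernel v(s) gives the smoothed versions on the right.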

4.1.2 Modelling Lateral Inhibition in MATLAB

Frame-based High-pass Cepstral Filter

A form of lateral inhibition can be applied to sonar signals in MATLAB by dividing the signal into frames, taking the Fourier transform of each frame, and passing each spectral slice through a high-pass filter, as both the recurrent and

non-recurrent LINs described in the previous section have a similar high-pass characteristic; the filter used here is chosen to have zero gain at DC and represents an inhibitory lobe extending over two channels either side (4.10)—see Figure 4.5.

Negative values which appear in the filtered spectrogram are set to zero.

$$H(\omega) = 1 - \tfrac{1}{4}\left(e^{-2j\omega} + e^{-j\omega} + e^{j\omega} + e^{2j\omega}\right) \qquad (4.10)$$


Figure 4.5: Magnitude response of $H$.

The results of this procedure, shown in Figure 4.6, are not particularly impressive; however, two transients, which appear at 2 and 6 seconds as solid, horizontal lines, are diffused to some extent by the lateral inhibition.
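A sketch of the frame-based procedure is given below; the test signal, frame length and overlap are illustrative, and the kernel is the impulse-response form of (4.10).

    % Frame-based lateral inhibition: high-pass filter each spectral slice, then rectify.
    x   = randn(1e5, 1);                           % hypothetical sonar signal
    N   = 1024;  hop = 512;
    w   = 0.54 - 0.46*cos(2*pi*(0:N-1)'/(N-1));    % Hamming window
    h   = [-0.25; -0.25; 1; -0.25; -0.25];         % inhibitory lobe two channels either side (4.10)
    nF  = floor((length(x)-N)/hop) + 1;
    LI  = zeros(N/2+1, nF);
    for m = 1:nF
        X  = abs(fft(x((m-1)*hop+1:(m-1)*hop+N) .* w));
        s  = conv(X(1:N/2+1), h, 'same');          % apply the spatial high-pass filter
        LI(:,m) = max(s, 0);                       % set negative values to zero
    end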

Lateral Inhibition and Linear Association

Broadband transients can obscure a number of tonal tracks, especially at high frequencies. In the second MATLAB model, the relationship between tonal components is tracked over time, so that when a tonal is briefly obscured, it can be restored from surrounding evidence. The first stage of the model is a gammatone filterbank, from whose channels the instantaneous amplitude is extracted as a changing spectral estimate. The spatial filter in (4.10) is then convolved with the envelope to sharpen the spectral profile.

The next stage involves using the observed spectrum to form a weight matrix: a square matrix whose elements reflect the correlation between each channel in the output. Such a matrix can be obtained by multiplying a column

vector containing the magnitude spectrum by its transpose:

$$\mathbf{A} = \mathbf{y}\,\mathbf{y}^{T}$$

In practice, the weight matrix $\mathbf{A}$ is continually refreshed by adding these matrices at each point in time; the rate at which observations are absorbed into the weight matrix is determined by the coefficient $\alpha$. At the same time, existing correlations within the matrix decay exponentially, according to a decay coefficient $\beta$.

$$\mathbf{A}(t+1) = \mathbf{A}(t) + \alpha\, \mathbf{y}(t)\,\mathbf{y}(t)^{T} - \beta\, \mathbf{A}(t)$$

The weight matrix is employed moment by moment to alter the content of the spectrum according to learned correlations. The observed spectrum at any instant is pre-multiplied by the latest weight matrix to give the output $\mathbf{z}(t)$:

$$\mathbf{z}(t) = \mathbf{A}(t)\,\mathbf{y}(t)$$



Figure 4.6: Lateral inhibition for a sonar signal accomplished using a high-pass spatial filter.

The overall purpose of the model is to gradually form correlations from instantaneous spectral estimates. When a broadband transient obscures one or more tonals, the energy from the unaffected tonals is ‘passed’ back through the weight matrix, allowing partial restoration. This is in fact a form of smoothing, the temporal extent of which is dictated by $\alpha$ and $\beta$. This effect is vaguely reminiscent of the auditory continuity illusion described in section 1.4, in which a tone is perceived to continue through a brief interruption. Two differences are worth mentioning, however: first, the continuity effect is only observed when the noise has sufficient energy to support the conclusion that the tonal has continued through it; second, the continuity effect does not arise from tonals reinforcing each other—the effect can be brought about by a single tone. An illustration of the effect of lateral inhibition and linear association is shown in Figure 4.7: tonal components are generally more visible; transient events are less prominent.

It has been noted that this method only performs well under a limited set of circumstances, in particular, that the noise elements (transients and background) have flat or smooth spectra. If the noise has a smooth spectrum, then the lateral inhibition stage removes the noise effectively; if, however, there are peaks in the noise spectrum, which is often the case, then lateral inhibition causes them to be accentuated. These artefacts then contribute to the weight matrix and so remain in the output until they are removed by the exponential decay. This specific problem highlights a more general deficiency of this approach: the model sometimes produces features in the output which are not present in the input. The auditory continuity effect, on the other hand, does not suffer from this shortcoming because features are not induced where there is no energy to evidence their presence.
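The update loop can be sketched as follows; the envelope matrix and the values of $\alpha$ and $\beta$ are assumptions made for illustration, not those used in the experiments above.

    % Lateral inhibition with linear association: weight-matrix update and output.
    E     = abs(randn(64, 500));                % hypothetical sharpened envelopes (channels x frames)
    alpha = 0.05;  beta = 0.01;                 % absorption and decay coefficients
    A     = zeros(64, 64);
    Z     = zeros(size(E));
    for t = 1:size(E, 2)
        y      = E(:, t);                       % observed spectrum at this instant
        Z(:,t) = A * y;                         % spectrum filtered through the learned correlations
        A      = A + alpha*(y*y') - beta*A;     % absorb new correlations and decay old ones
    end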



Figure 4.7: Lateral inhibition network with linear association.

4.1.3 Discussion

Lateral inhibition can be summarised as a process that sharpens the discontinuities in an input pattern by reinforcing differences in intensity. This behaviour is defined locally: every active unit—a cell or an artificial neuron—suppresses the response of its neighbours so that groups of active units are mutually inhibitory. The response is maximal at the boundaries between active and inactive regions: here units are both excited by a stimulus and uninhibited from one or more sides. Different formulations of a lateral inhibition circuit give rise to a variety of high-pass spatial filters.

This section has discussed the possibility of using lateral inhibition in a sonar algorithm for the enhancement of spectral features such as tonals. One practical drawback to using lateral inhibition is the potential for a strong tonal to exert such a powerful inhibition over adjacent frequency bins that any low-energy tonals in its locality are filtered entirely. One solution to this problem could be a multi-staged lateral inhibition: a coarse first pass finds the tonals with high energy and records their positions and magnitudes; these tonals are then subtracted to leave a residual spectrum, which is subject to further passes. Another solution could be a multipath model, which uses lateral inhibition at some stages and not others. For example, a high-pass spectral filter could be used to ‘sketch’ the frequency domain, allowing fundamental frequencies to be identified and so on. This evidence could then be passed to a separate component of the model to assist a fine-grained spectral analysis.


4.2 Peak Detection and Tracking

The task of peak detection and tracking is of great relevance to both auditory modelling and sonar technology. CASA models rely on peaks in the short-time magnitude spectra, or a similar time-frequency representation, to trace the motion of speech formants or musical notes, for example. Similarly, a passive narrowband sonar analysis examines the amplitudes and frequencies of spectral peaks to identify a vessel. Isolating genuine peaks in noisy spectra can be problematic, especially in the presence of a continuous noise spectrum, where weak tonals are almost indistinguishable from the noise floor.

4.2.1 Time-frequency Filtering

The preceding section has already alluded to two filter-based approaches to peak enhancement. The first of these is the application of a lateral inhibition filter to sharpen the spectrum by removing the smooth broadband component. However, the procedure does not discriminate between signal and noise, so we are no closer to telling genuine and spurious peaks apart; moreover, lateral inhibition can remove low-energy tonals altogether if they fall in the shadow of a high-energy tonal.

A second approach to peak enhancement is smoothing the spectrum over time with a low-pass filter. This process averages1 noise but reinforces tonal components, so that the result is a smooth noise spectrum with tonals superimposed on top. Smoothing the signal is generally undesirable and comes at the expense of temporal resolution: frequency transitions are poorly delineated, information from amplitude modulation is lost, and noisy features are extended over a longer period.

4.2.2 Peak Detection

Peak Detection using a Threshold

Two techniques for peak detection are outlined in this report: thresholding and differentiation, both of which are conceptually straightforward. A thresholding method entails examining every point in the spectrum and labelling it as a peak if its log magnitude exceeds the local mean energy by some threshold parameter. The specific approach described here is based on the MPEG layer one standard for audio compression and has previously been investigated in a

sonar context by Brown et al. [4]. A power ratio $R(k)$ between the energy in bin $X(k)$ and the average energy of a set $S$ of surrounding bins can be obtained from (4.11):

$$R(k) = 10\log_{10} X(k) - 10\log_{10}\!\left(\frac{1}{|S|}\sum_{j \in S} X(k+j)\right) \qquad (4.11)$$

A bin $k$ is then designated as a peak whenever $R(k) > T$, where $T$ represents a threshold in decibels. The set of bins $S$ either side of $k$ is chosen to reflect critical bandwidth (see section 1.3.1), so that local energy estimates are taken over a broader frequency range at higher frequencies. Table 4.1 lists the set of critical band estimates suggested in [4].

1Note that here ‘averaging’ does not imply that the noise estimate will approach zero. Because it is the magnitude that is under consideration, it will approach the mean magnitude.



S = {-2, +2}                        for the lowest range of k
S = {-3, -2, +2, +3}                for the second range of k
S = {-6, ..., -2, +2, ..., +6}      for the third range of k
S = {-12, ..., -2, +2, ..., +12}    for the highest range of k

Table 4.1: Set of relative bin indices S to be used for various k.
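A sketch of the thresholding procedure of (4.11) is given below for a single frame; the spectrum is synthetic and, for brevity, only the lowest-band neighbour set from Table 4.1 is applied to every bin.

    % Peak detection by threshold, after (4.11).
    X  = abs(randn(1, 512)).^2;                 % hypothetical power spectrum for one frame
    T  = 3;                                     % threshold in decibels
    S  = [-2 2];                                % relative bin indices (lowest band of Table 4.1)
    pk = false(size(X));
    for k = 1 - min(S) : length(X) - max(S)
        local = mean(X(k + S));                 % average energy of the surrounding bins
        R     = 10*log10(X(k)) - 10*log10(local);
        pk(k) = R > T;                          % a peak exceeds the local mean by T decibels
    end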


Figure 4.8: Peak detection by threshold. Upper panel: log-magnitude spectro- gram; centre panel: peaks found using a threshold of 3dB; lower panel: peaks found using a threshold of 5dB.

Figure 4.8 shows the result of this procedure for thresholds at 3dB and 5dB. The 3dB threshold creates more noisy peaks, whereas the 5dB threshold fails to capture tonals in the upper 500Hz portion of the spectrum.

Peak Detection using Differentiation

The second approach uses differentiation to find peaks in the spectrum. Every peak in a function (in this case, the spectrum) coincides with a sign change in its derivative, from positive to negative; conversely, zero-crossings from negative to positive indicate a trough. For a spectrum consisting of discrete bins $X(k)$, such as a vector in MATLAB, this differentiation can be achieved by convolving $X(k)$ with the vector $[1, -1]$. Identifying the negative-going changes of sign along the resulting vector returns the locations of the peaks.
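In MATLAB this amounts to a few lines; the spectrum below is synthetic.

    % Peak detection by differentiation.
    X  = abs(randn(1, 512));                    % hypothetical magnitude spectrum
    d  = conv(X, [1 -1], 'same');               % first difference of the spectrum
    pk = find(d(1:end-1) > 0 & d(2:end) <= 0);  % sign changes from positive to negative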



Figure 4.9: Peak detection by convolution. Upper panel: log-magnitude spectrogram; centre panel: peaks found using $\sigma$ = 3Hz; lower panel: peaks found using $\sigma$ = 10Hz.

In its current form, this procedure finds all the peaks in the spectrum, including many short peaks in noisy regions. In order to smooth out these noisy regions prior to differentiation, the spectrum can first be convolved with a low-pass filter $g(k)$ to give a smoothed spectrum $s(k)$.

$$s(k) = g(k) * X(k) \qquad (4.12)$$

This low-pass filter removes some smaller peaks; the width of $g(k)$ determines the extent of the smoothing. Because the smoothing and differentiation actions are both expressed as linear filters, they can be combined into a single filter $h(k)$ by a convolution, so that $h(k) = d(k) * g(k)$, where $d(k)$ is the differencing filter introduced above. If $g$ is a Gaussian filter, then its derivative has the following continuous formulation:

$$g'(x, \sigma) = \frac{-x}{\sigma^{3}\sqrt{2\pi}}\, \exp\!\left(-\frac{x^{2}}{2\sigma^{2}}\right) \qquad (4.13)$$

Here $x$ is the continuous counterpart to the discrete spatial index $k$, and $\sigma$ is a space constant that determines the width of the Gaussian, such that a large value for $\sigma$ eliminates more peaks. Figure 4.9 shows the result of applying this peak detection algorithm to one minute of a sonar recording. A Gaussian with a fairly narrow standard deviation2 of 3Hz finds a series of tonal components, which appear as vertical lines, in addition to many noisy peaks, which create

2Scale $\sigma$ by $2\sqrt{2\ln 2}$ to convert to the width between half-power points.



Figure 4.10: Kernels’ (linear) impulse responses. Left: spatial filter; centre: temporal filter; and right: combined 2D filter.

a ‘grainy’ effect. Increasing the width of the Gaussian to 10Hz filters out a large proportion of these noisy peaks; however, the wide filter also results in poorer spatial definition of the tonals, and the tonal component at 350Hz is no longer represented, presumably having merged with the 360Hz tonal.
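The following sketch builds the Gaussian-derivative kernel of (4.13) and applies it to a synthetic spectrum; the width is expressed in bins on the assumption of 1Hz per bin.

    % Smoothing and differentiation combined in a Gaussian-derivative kernel (4.13).
    X     = abs(randn(1, 1024));                            % hypothetical magnitude spectrum
    sigma = 3;                                              % Gaussian width (bins, ~Hz here)
    u     = -ceil(4*sigma):ceil(4*sigma);
    g1    = (-u ./ (sigma^3*sqrt(2*pi))) .* exp(-u.^2/(2*sigma^2));
    d     = conv(X, g1, 'same');                            % derivative of the smoothed spectrum
    pk    = find(d(1:end-1) > 0 & d(2:end) <= 0);           % peaks of the smoothed spectrum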

Introducing Temporal Integration

Thus far, it has been demonstrated that peaks in the spectrum can be found by convolution with a Gaussian derivative and that the dilation of the Gaussian dictates the scale at which the peaks are detected. This filter has been applied only to the spatial dimension; that is to say, there are no temporal interactions. Using a two-dimensional kernel allows the filter to simultaneously reveal peaks in the spectrum and low-pass filter the signal in time. The temporal filter chosen here is another Gaussian function (4.14), whose width $\tau$ relates to the length of the averaging window.

$$g(t, \tau) = \frac{1}{\tau\sqrt{2\pi}}\, \exp\!\left(-\frac{t^{2}}{2\tau^{2}}\right) \qquad (4.14)$$

Filtering along the time axis has the effect of smoothing out some noisy peaks; however, where peaks persist across a few consecutive frames, short strands are formed. This implies a similar trade-off to the spatial convolution: shortening the window allows more noise to pass; lengthening the window increases the likelihood of tonal-like artefacts. The combined kernel is obtained by the two-dimensional convolution of the Gaussian derivative along the frequency axis and the Gaussian along the time axis. Figure 4.10 shows the spatial, temporal and 2D filter impulse responses for $\sigma$ = 3Hz and $\tau$ = 2s. Figure 4.11 shows the output for the same piece of sonar signal using this kernel. The effect of the filtering is immediately evident: the grainy texture has agglomerated into short strands and additional tonals are now visible at higher frequencies, appearing as steady vertical strands.
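A sketch of the combined kernel is given below; the widths are expressed in bins and frames on assumed resolutions of 1Hz per bin and 0.5s per frame, which are illustrative rather than the settings used for Figure 4.11.

    % 2D kernel: Gaussian derivative along frequency, Gaussian (4.14) along time.
    sigma = 3;  tau = 4;                       % ~3Hz and ~2s under the assumed resolutions
    u   = -ceil(4*sigma):ceil(4*sigma);
    v   = -ceil(4*tau):ceil(4*tau);
    gf  = (-u ./ (sigma^3*sqrt(2*pi))) .* exp(-u.^2/(2*sigma^2));   % spatial (frequency) filter
    gt  = (1 / (tau*sqrt(2*pi))) * exp(-v.^2/(2*tau^2));            % temporal filter
    K   = gt' * gf;                            % combined 2D kernel (time x frequency)
    S   = abs(randn(120, 1024));               % hypothetical spectrogram (frames x bins)
    D   = conv2(S, K, 'same');                 % smoothed spectral derivative over time and frequency
    P   = D(:,1:end-1) > 0 & D(:,2:end) <= 0;  % peak map: sign changes along the frequency axis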

Time Domain Peak Detection

Techniques for detecting peaks in the spectrum can be just as easily applied in the time domain to detect transients. By way of extension to the work in this section, the peak detection filter is convolved along the spectral axis to



Figure 4.11: Peaks found using $\sigma$ = 3Hz and $\tau$ = 2s.


Figure 4.12: Detection of spectral peaks (blue) and impulsive events (red).

find tonals and then separately along the time axis to find transients. The same algorithm can perform both tasks; in the transient case, the time-frequency matrix is simply transposed. Figure 4.12 plots the result of both filters on the same axes: the blue, vertical lines are tonals; the red, horizontal lines are transients. The 2D filter for tonal detection had widths of 10Hz and 1s for frequency and time, respectively; the 2D filter for transient detection had widths of 3s and 3Hz for time and frequency, respectively. It is worth mentioning that the rhythmogram [26] uses convolution with the derivative of a Gaussian to extract the rhythm from a time domain signal. The procedure differs in that peaks are detected in a short-time, windowed estimate of the root-mean-squared energy taken over the entire signal, and the analysis is undertaken at a number of scales (i.e. Gaussians of various widths are employed).

4.2.3 Peak Tracking

Having presented methods for isolating peaks within a spectrum, a mechanism is required for tracking the peaks through time. This mechanism serves two purposes. The first of these is to aid noise removal by checking whether a peak persists throughout a sufficient number of frames; if it does not, it is rejected. Peaks which exhibit no continuity in time and frequency (i.e. speckles) are probably the result of noise. The second purpose is to convert time-varying peaks into objects. Processing a signal into a collection of objects imposes a structure upon the signal that provides the starting point for a host of powerful analysis techniques, which form the basis of several CASA architectures.

A simple continuity constraint criterion removes a peak if there are no other peaks within a certain time-frequency context, which is the surrounding region in time and frequency, parameterised by a time extent Δt and a frequency extent Δf [4]. It should be noted that this approach does not track tonals; it simply enforces a rule that all peaks should have the potential to form part of a track.


Figure 4.13: Tracking peaks in the time-frequency plane. In keeping with the other plots in this section, time runs down the y-axis and frequency is on the x-axis.

The concept of a time-frequency context is analogous to the auditory grouping principle of proximity (see section 1.4). The left plot of Figure 4.13 illustrates a continuity constraint: peaks are retained if there is another peak within the time-frequency context; the empty circle indicates a peak that will be deleted. Cooke's CASA model [6] adopts a trajectory-based method for peak tracking, which uses the derivative of a strand (a collection of peaks already joined together) to inform the search for the peak in the next frame. If a new peak cannot be found, the strand is terminated. The right plot of Figure 4.13 shows how the trajectory approach uses the recent derivative of a strand to search the time frame.
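A minimal MATLAB sketch of the continuity constraint is given below. It assumes a logical matrix peaks produced by a detector such as the one above, with hypothetical context half-widths nT frames and nF bins standing in for Δt and Δf.

```matlab
% Continuity constraint: discard peaks with no neighbour in the surrounding
% time-frequency context. Assumed input: peaks (logical matrix, rows = frames,
% columns = frequency bins); nT and nF are the context half-widths.
nT = 2;  nF = 1;
box        = ones(2*nT + 1, 2*nF + 1);
neighbours = conv2(double(peaks), box, 'same') - double(peaks);  % exclude the peak itself
kept       = peaks & (neighbours > 0);                           % isolated speckles removed
```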

4.3 Modulation Spectrum

The modulation spectrum is an expression of a signal purely in terms of its modulation components. Such an expression is a candidate for a high-level auditory representation, following the discovery of cells in the auditory nerve that exhibit fine-tuning to particular modulated stimuli. The application of the modulation spectrum may also arise in a sonar context: the rotation of a vessel's propeller and blades results in a low-frequency modulation in the signal envelope. Conventional DEMON analysis presently exploits this amplitude modulation to determine the blade-rate and shaft configuration of a vessel, although such an analysis operates on the envelope of the entire (low-pass filtered) signal. The modulation spectrum, by contrast, identifies amplitude and temporal modulation within narrow bands. Features extracted from this representation may be useful for classification; alternatively, if envelope modulation interferes with other algorithms presented in this document, the signal can be resynthesised from the modulation spectrum with these effects removed. The particular modulation spectrum under discussion here is that of Singh et al. [23], which decomposes the energy in a signal along two dimensions: temporal modulation and spectral modulation. Each point in this two-dimensional space corresponds to a ripple component and the signal itself is the weighted sum of all the ripple components. Temporal modulation is variation in the envelope


Figure 4.14: Ripple components and the modulation spectrum. The outer plots show the ripple components' envelopes in the time-frequency plane; a cross-section of each indicates the direction of modulation. Each of these is associated with a location in the modulation spectrum (the central plot). Adapted from [23].

in the time-frequency plane along the time axis and is associated with vertical ripple components: amplitude-modulated noise is temporal modulation in all channels. Spectral modulation is variation in the envelope in the time-frequency plane along the frequency axis and is associated with horizontal ripple components: a harmonic complex is therefore a spectral modulation component. (Spectral modulation should not be confused with frequency modulation: the former refers to energy varying with frequency; the latter refers to frequency varying with time.) Diagonal ripple components correspond to upsweeps and downsweeps, in which energy varies with both time and frequency. Figure 4.14 shows some ripple components and their mapping to the modulation spectrum.

The modulation spectrum is a complex plane; for a complete description of a general signal, the phase of the ripple components must be known in addition to the magnitude. However, for visual comprehensibility, the modulation spectrum displays only the magnitude or log magnitude. The modulation spectrum can be divided into four quadrants, i.e. the positive and negative halves of the temporal and spectral axes, but the lower two are a reflection of the upper, so only the top half needs to be plotted. The units of temporal modulation are Hertz and the units of spectral modulation are 1/Hz, 1/kHz or 1/octave.

4.3.1 Computing the Modulation Spectrum

The modulation spectrum is computed in three stages. The first stage is a bank of bandpass filters of equal width, evenly distributed along the frequency axis. A suitable configuration would be the gammatone filterbank described in section 3.2.4. Next, the envelope of each filter output is obtained and the cross-correlation function (CCF) is calculated for the envelope in every pair of


channels (including each channel with itself, i.e. the autocorrelation), culminating in N(N + 1)/2 CCFs for N channels. The second stage collapses the CCFs into a single autocorrelation matrix, the rows of which are formed from the average of the CCFs for channels of equal frequency separation. For example, the first row is the average of the CCFs for channels with no separation (i.e. the autocorrelations), the second row is the average of the CCFs for one channel separation, and so forth. The autocorrelation matrix encodes temporal modulation using the cross-correlation: the increasing lag causes an envelope with periodic AM to repeatedly align with itself, resulting in a vertical grating effect. Spectral modulation is encoded similarly. For a harmonic complex with no AM, the channels with a frequency difference equal to multiples of the fundamental frequency will align for all lags because the harmonics have a constant, non-zero envelope. This produces a horizontal grating effect. The third and final stage is a two-dimensional Fourier transform of the autocorrelation matrix, which is first multiplied by a two-dimensional tapered window to align its edges. The 2D-FFT summarises the grating effect over different directions and frequencies and hence confines modulations to separate regions of the modulation spectrum.
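The three stages lend themselves to a compact MATLAB sketch. The envelope matrix env, the lag extent maxlag and the use of the Signal Processing Toolbox function xcorr are assumptions made for illustration; the report itself does not prescribe an implementation.

```matlab
% Modulation spectrum in three stages. Assumed input: env (N-by-L matrix of
% channel envelopes from a uniform filterbank); maxlag sets the CCF extent.
[N, L] = size(env);
maxlag = 512;

% Stages 1-2: cross-correlate every pair of channel envelopes and average the
% CCFs of equal channel separation into the rows of an autocorrelation matrix.
A = zeros(N, 2*maxlag + 1);
for i = 1:N
    for k = i:N
        c   = xcorr(env(i, :), env(k, :), maxlag);
        sep = k - i + 1;                      % row 1 holds the autocorrelations
        A(sep, :) = A(sep, :) + c(:).';
    end
end
for sep = 1:N
    A(sep, :) = A(sep, :) / (N - sep + 1);    % number of pairs at this separation
end

% Stage 3: taper the edges and take the two-dimensional Fourier transform.
hwin = @(M) 0.5 - 0.5*cos(2*pi*(0:M-1)'/(M-1));
W    = hwin(N) * hwin(2*maxlag + 1)';         % 2D tapered window (outer product)
MS   = abs(fft2(A .* W));                     % magnitude of the modulation spectrum
```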

4.3.2 Suitability for Sonar

The modulation spectrum described above does not appear to be an appropriate representation for a sonar signal. First, the procedure is a computationally expensive one; even a modest number of channels involves the calculation of a large number of cross-correlation functions, e.g. 50 filters requires 1275 CCFs. Second, the modulation spectrum is better suited to natural sounds such as speech, birdsong or other vocalisations, which consist of smooth transitions of harmonic complexes over a reasonable frequency range. The vessel recordings provided by QinetiQ contain tonals which are almost static in frequency and amplitude-modulated only at very low frequencies. This low-frequency AM can be detected economically by applying a short-time Fourier transform to the envelope of each filter output.
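As a sketch of that cheaper alternative, the fragment below takes the Fourier transform of one windowed section of a channel envelope. The channel signal x, its sample rate fs, the 10-second frame and the use of the Signal Processing Toolbox's hilbert are illustrative assumptions.

```matlab
% Low-frequency AM detection from one channel's envelope. Assumed inputs:
% x (one bandpass-filtered channel, column vector), fs (sample rate in Hz).
x     = x(:);
env   = abs(hilbert(x));                         % envelope via the analytic signal
env   = env - mean(env);                         % discard the DC component
nwin  = round(10 * fs);                          % 10-second analysis frame
taper = 0.5 - 0.5*cos(2*pi*(0:nwin-1)'/(nwin-1));
E     = abs(fft(env(1:nwin) .* taper));
fmod  = (0:nwin-1)' * fs / nwin;                 % modulation-frequency axis (Hz)
% Peaks in E at a few Hertz reveal the blade-rate AM discussed above.
```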

4.4 Phase Modulation

It is routine for a sonar operator to view the tonals present within a vessel acoustic signature via the output of a narrowband display and make a judgement as to their origin. At a glance, it is not always evident how these tonals ought to be grouped: they are not guaranteed to be related harmonically and may only exhibit minute frequency modulations. Furthermore, there is the potential for interactions between multiple components of the same frequency. For instance, a signal containing a 50Hz and a 60Hz harmonic series will have coincident components at 300Hz, 600Hz etc., whose magnitude and phase are contributed to by both series.

This section investigates the automatic grouping of tonals according to common changes in phase and outlines the possibility of separating overlapping components. Three causes of related phase variation may be cursorily identified. First, tonals may be phase-modulated according to the sound source


itself. For example, an electrical buzz may naturally vary in frequency according to the generator or battery; similarly, machinery hum may vary with speed. Second, relative motion of a source with respect to the sonar array (including surface bobbing) gives rise to Doppler effects, which impress a modulation upon the signal. Third, characteristics of the signal path (reflections and refractions) may modify the phase content of a signal in a consistent way, allowing tonals to be associated by the signal channel. Work to date has focused on establishing the presence of correlated phase changes within the available sonar recordings.

4.4.1 Phase-tracking using the STFT

The short-time Fourier transform has already been introduced in section 3.1 as a method of time-frequency analysis, which involves simply taking the FFT repeatedly for short sections of the signal. The Fourier transform yields a complex spectrum, so a conventional spectrogram displays the magnitude or log-power. Here, our primary interest lies in the short-time phase of the spectrum, which is obtained by taking the angle rather than the magnitude.

The task of grouping tonals involves firstly ascertaining which frequency bins of the spectrogram contain tonal components. For this study, tonals have been manually identified in the spectrogram, although techniques for tracking peaks in the time-frequency plane have already been described in section 4.2.2 and could be applied as a preceding stage. Once the frequencies of the tonals have been isolated, there remains the question of the length of the analysis window. A time-limited measurement of phase in the time-frequency plane is not as straightforward as magnitude. This is easily illustrated by considering the effect on a 100Hz sinusoid. A one-second frame will contain 100 peaks and troughs, so that the following frame (assuming they are placed end-to-end) will have the same phase. A frame shortened to 0.5s will span 50 periods and a 0.25s frame will span 25 periods; in both cases, alignment will still be preserved between frames. For a frame length of 0.125s, however, only 12.5 periods will be captured, so the frame will terminate halfway through a period and the next frame will effectively begin in anti-phase. In order to counteract this artefact, either the system must adjust the phase to account for the sliding window, or a window length must be chosen which always corresponds to a natural number of periods. As the signal components inspected in the following sections are all multiples of 50Hz or 60Hz, the latter approach was adopted and a window length of 0.5s was used.

Assuming for now this careful choice of frame length, it is possible to track the phase of each tonal bin and be certain that an unmodulated tonal at the centre frequency (CF) will show the same phase at each step. If a tonal is at a frequency slightly higher than the bin CF, then an advance in phase will be observed at each frame; correspondingly, a frequency slightly lower than the CF will cause a lag in phase. A modulated tonal is a combination of these two, as it moves above and below the CF, causing the phase to modulate. These four scenarios are depicted in Figure 4.15.

Because the phase is measured around a circle, each full period is accompanied by a jump, so it is necessary to unwrap the phase. Moreover, in order to make effective comparisons of the phase variations, the unwrapped phase in each bin is normalised by the centre frequency. The rationale for doing


Figure 4.15: Schematic illustration of phase modulation. Note the phase of each period within the analysis window.

this is best understood in physical terms: if a superposition of waves is compressed and expanded, and hence modulated to a certain extent, then the effect upon the phase of a low-frequency sinusoid (with a long wavelength) will be less marked than the effect upon a high-frequency sinusoid (with a short wavelength). Thus normalisation evens out the phase modulation for all frequencies.

Figure 4.16 shows the phase of seven harmonic components in a sonar signal. The phase of all the components lags a small amount with each frame, indicating that the fundamental frequency is actually slightly less than 50Hz. A closer examination of the phase tracks also reveals curvature that varies in a correlated fashion, most noticeably a change in frequency at about 30 seconds. It may be noted that there is a constant phase difference between the tracks; this is not an issue, as the primary concern lies in how the phase varies, i.e., its derivative.
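A minimal MATLAB sketch of this phase-tracking procedure is given below. The signal x, its sample rate fs and the tonal frequency ftone are illustrative assumptions; the 0.5s end-to-end frames follow the choice described above.

```matlab
% Track the short-time phase of one tonal bin using end-to-end 0.5 s frames.
% Assumed inputs: x (signal, column vector), fs (sample rate), ftone (tonal
% frequency in Hz, a multiple of 50 or 60 so a frame holds whole periods).
x    = x(:);
nwin = round(0.5 * fs);
nfrm = floor(length(x) / nwin);
bin  = round(ftone * nwin / fs) + 1;         % FFT bin nearest the tonal
cf   = (bin - 1) * fs / nwin;                % centre frequency of that bin

phi = zeros(nfrm, 1);
for m = 1:nfrm
    X      = fft(x((m-1)*nwin + (1:nwin)));
    phi(m) = angle(X(bin));                  % short-time phase of the tonal bin
end
phi = unwrap(phi) / cf;                      % unwrap and normalise wrt 1 Hz
```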

4.4.2 Measuring Fluctuations

It has been noted that the linear trend in the phase of a component is indicative of its frequency in relation to the centre frequency of the FFT bin. For this reason, the fact that the phase tracks share a linear trend merely emphasises their harmonicity; it does not indicate whether they fluctuate in a related manner. Removing the linear trend cancels the contribution of the frequency difference with the channel (and the constant phase term) and retains only how the tonals vary about their average frequency. The choice of window length in the STFT is less crucial now, as a window that does not fit an exact number of periods produces a linear slope in the phase, which we are now going to remove. That said, it is best to avoid a situation where a window terminates halfway through a period, because the phase jump between frames then approaches π, which cannot be reliably unwrapped. Measures to counteract this problem are discussed in section 4.4.4.

The removal of the linear component in each track can be achieved by



Figure 4.16: Modulation of tonal components for a 100-second sonar recording. The upper panel is a spectrogram revealing a 50Hz harmonic series; the middle panel plots the phase of each component as it changes with time (normalised wrt. 1Hz); the lower panel plots the same phase, with the linear trend subtracted.

explicitly finding the trend and then performing a subtraction. Alternatively, the derivative of the phase track can be found and then the mean subtracted (the mean derivative being the average slope); re-integrating then reproduces the original track with the trend removed. The latter technique suits an adaptive variant of the algorithm, as a local mean can be subtracted3 from the derivative, although this may result in slow-changing features being 'smoothed out'. The result of removing the linear trend from the phase tracks is shown in the lower panel of Figure 4.16. The tracks corresponding to 50Hz, 150Hz, 250Hz and 350Hz fluctuate almost precisely in unison; 100Hz appears to trace the same shape and may be simply affected by noise; 200Hz and 300Hz show no correlation4.
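The derivative route to detrending takes only a few lines in MATLAB; the sketch below assumes phi is an unwrapped, normalised phase track held as a column vector.

```matlab
% Detrend a phase track via its derivative. Assumed input: phi (unwrapped,
% normalised phase track, column vector).
d      = diff(phi);            % frame-to-frame phase derivative
d      = d - mean(d);          % subtract the average slope (the linear trend)
phi_dt = [0; cumsum(d)];       % re-integrate to recover the detrended track
% An adaptive variant subtracts a local (moving) mean instead of the global
% one, which acts as a low-pass filter on the derivative.
```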

4.4.3 The Effect of Noise

Upon examining the STFT phase derivative of a number of tonals, as in the previous section, it is clear that some tracks are related and that some tracks are not. However, for some tonals, such as the phase track corresponding to 100Hz in the lower panel of Figure 4.16, it is unclear whether the track is following an independent course, or actually belongs with the others and has simply been displaced by noise. In order to answer this question, a means of determining the impact of noise upon the phase is required.

Any given observation of phase $\phi_o(t)$, whether it is from a spectrogram pixel or the Hilbert transform, we shall assume has arisen from the interaction of two complex components: the signal with phase $\phi_s(t)$, and some noise with phase $\phi_n(t)$. Magnitudes for the observation, signal and noise are also known to be $a_o(t)$, $a_s(t)$ and $a_n(t)$, respectively, allowing the sum of the components to be expressed as (4.15).

$a_o(t)\,e^{j\phi_o(t)} = a_s(t)\,e^{j\phi_s(t)} + a_n(t)\,e^{j\phi_n(t)}$  (4.15)

The question that this section intends to answer is: if the probability distribution for the noise and the signal-to-noise ratio (i.e. the ratio of the magnitudes $a_s$ and $E[a_n]$) are both known, what is the mean departure in phase $E[\,|\phi_o - \phi_s|\,]$?

From this point on, complex values will be expressed by their real and imaginary parts, as opposed to polar form. For our purposes, the real and imaginary parts of the noise are governed independently by identical Gaussian distributions whose variance is unity. The mean magnitude of a complex number drawn from this distribution can be obtained by multiplying the magnitude function with the probability distribution and integrating over the complex plane:

$E[\,|z_n|\,] = \displaystyle\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \sqrt{x^2+y^2}\;\frac{1}{2\pi}\exp\!\left(-\frac{x^2+y^2}{2}\right)dx\,dy = \sqrt{\pi/2}$  (4.16)

where $z_n = x + jy$ is the complex noise signal and $x$ and $y$ index the real and imaginary axes, respectively.

3 This is equivalent to a low-pass filter.
4 In the absence of any ground truth for these signals, the accuracy of these results cannot be wholly confirmed. However, a recent presentation at the QinetiQ Winfrith site contained plots showing similar phase tracks. The procedure can also be tested on artificial signals.



Figure 4.17: Mean change in phase for different SNRs. At lower SNRs (noisy) the expected change in phase approaches π/2. At higher SNRs (clean) the expected change in phase approaches zero.

This integration can be performed by observing that the distribution is radially symmetric about the origin, and so integrating a single 'slice' of the Gaussian along the positive real axis and rotating the plane figure around the origin to form a volume by multiplying5 by 2π.

A similar expression can be formulated to find the average absolute angle of the noise component, which is π/2 if the angles returned by the four-quadrant arctangent are in the range [−π, π]:

$E[\,|\arg(z_n)|\,] = \displaystyle\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \bigl|\arg(x+jy)\bigr|\;\frac{1}{2\pi}\exp\!\left(-\frac{x^2+y^2}{2}\right)dx\,dy = \frac{\pi}{2}$  (4.17)

Now we are in a position to say how far, on average, the phase of the noise signal will deviate from zero radians. It is a small step then to introduce a signal component as a real value6, $b$, and re-evaluate the integral. Note that because the average magnitude of the noise is $\sqrt{\pi/2}$, the signal magnitude must be scaled by this value. So, for a signal-to-noise ratio $R$, given in decibels, the expected departure from phase $E[\,|\phi_o - \phi_s|\,]$, in radians, is given by (4.19).

$b = \sqrt{\pi/2}\cdot 10^{R/20}$  (4.18)

$E[\,|\phi_o - \phi_s|\,] = \displaystyle\int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} \bigl|\arg\bigl((x+b)+jy\bigr)\bigr|\;\frac{1}{2\pi}\exp\!\left(-\frac{x^2+y^2}{2}\right)dx\,dy$  (4.19)

By numerically evaluating (4.19), the expected error in phase has been obtained for a variety of SNRs and plotted in Figure 4.17. This working has assumed that the noise has a Gaussian distribution, but the same approach can be used for other distributions, by appropriately altering the integral.
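A curve of the same form as Figure 4.17 can be obtained by evaluating (4.19) on a grid, as in the MATLAB sketch below; the grid extent, step size and SNR range are illustrative choices rather than values taken from the report.

```matlab
% Numerical evaluation of (4.19) over a range of SNRs. The grid of (x, y)
% covers the real and imaginary parts of the unit-variance complex noise.
snr_db = -10:10;
step   = 0.02;
[x, y] = meshgrid(-8:step:8, -8:step:8);
p      = exp(-(x.^2 + y.^2)/2) / (2*pi);            % complex Gaussian pdf
err    = zeros(size(snr_db));
for k = 1:length(snr_db)
    b      = sqrt(pi/2) * 10^(snr_db(k)/20);        % signal amplitude, eqn (4.18)
    err(k) = sum(sum(abs(atan2(y, x + b)) .* p)) * step^2;
end
plot(snr_db, err);
xlabel('SNR, dB'); ylabel('Expected phase error, radians');
```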

4.4.4 Non-linear Filtering

Up to this point we have only considered the effect of broad spectrum noise on the phase track; another problem which has been persistently encountered

5 The area centroid of the spectral slice is one.
6 The reference signal could have any phase, as the probability distribution of the phase of a noise signal is uniform. A purely real (or imaginary) component is simpler to work into the equation.



Figure 4.18: Illustration of how phase artefacts appear. A: clean, wrapped phase track; B: clean, unwrapped phase track; C: clean, detrended phase track; D: wrapped phase track with glitch at 50s; E: noisy unwrapped phase track; F: noisy detrended phase track.

when examining the phase tracks is glitches. Owing to the cumulative effect of unwrapping the phase and the estimation and removal of a linear trend, an error in just one measurement of the phase can misalign an entire track. This sort of occurrence is illustrated in Figure 4.18. Panel A plots the phase of a sinusoid with a small amount of noise. The unwrapped phase is a noisy linear function (B); subtracting the linear trend gives a zero-mean noise residual (C). The problem arises when glitches at a few isolated points cause the phase to jump to the opposite side of the unit circle, so that the unwrapped phase no longer follows a smooth trend. In the figure, this scenario is depicted in the bottom three plots (D–F); the jump occurs at 50 seconds.

Figure 4.19(A) shows the same problem for two tracks obtained from a sonar recording over 100 seconds. The two tracks correspond to tonals at 360Hz and 420Hz and form part of the same harmonic complex. It is evident that the fluctuations in phase about the linear trend would be the same for both tonals, were it not for the discontinuities at 48s and 87s in the 360Hz and 420Hz tracks, respectively. (To visualise the effect of removing the glitches, imagine the sharp, vertical jumps 'shrinking' so that the ends are joined together.) Clearly, if an algorithm is going to compare tonals for common fluctuations in phase, then a filter is required to eliminate these artefacts prior to making the comparison.

Phase jumps can be removed by modifying the derivative of the unwrapped phase, in which discontinuities appear as sharp upward or downward spikes. For example, the unwrapped phase in Figure 4.18(E) has a constant derivative at all times except the glitch, where the derivative is a negative spike. Hence, to remove the discontinuities implies taking the derivative, deleting positive and negative spikes with a large magnitude, and reintegrating. The filter used to remove the spikes has to be chosen carefully. An averaging filter (e.g. a Gaussian or mean kernel) is unsuitable, as spikes are smoothed into the surrounding region, which we want as far as possible to remain unaffected.

A more appropriate strategy for removing spikes would be a median filter, which replaces each value in the derivative with the median of the surrounding points. Like the mean filter, this procedure is also characterised by a



Figure 4.19: Artefacts in a sonar phase track. Top: unmodified phase track; middle: phase track with a median-filtered derivative; bottom: phase track with a threshold-filtered derivative.

smoothing effect, with the added difference that large or small outlying values (i.e. spikes) do not bias the median. The result of median-smoothing the derivative is shown in Figure 4.19(B); a window corresponding to 5 seconds (2.5 seconds either side) was used. Although the process has removed the sharp jumps, it has also detrimentally altered the shape of the phase track; similar results are obtained for shorter and longer window sizes. The poor performance of the median filter stems from its application to the derivative. Tiny differences in the slope accumulate over time to create broad trends; the application of a median filter upsets these differences to such an extent that the output no longer resembles the input.

The requirement that the phase derivative remain unaffected to the greatest possible degree motivated the search for another non-linear filter. A particularly successful filter was formulated, which uses a hard threshold to detect discontinuities. Positive and negative spikes in the derivative are flagged for removal if they exceed the global variance by some factor. Once detected, each spike is replaced with an estimate formed by averaging the values in a split window a short distance either side. The threshold is chosen to capture only the severest spikes, in order to minimise the effect upon other regions of the derivative. This threshold was used to produce the phase tracks in Figure 4.19(C), which show substantially better agreement.
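The threshold/split-window filter can be sketched in MATLAB as below. The spread estimate (the standard deviation of the derivative is used here in place of the report's variance-based threshold), the factor kappa, and the gap and half-width of the split window are illustrative choices.

```matlab
% Threshold/split-window repair of glitches in an unwrapped phase track.
% Assumed input: phi (unwrapped phase, column vector). kappa scales the
% detection threshold; g and h are the split-window gap and half-width.
d     = diff(phi);
kappa = 5;  g = 2;  h = 5;
bad   = find(abs(d - mean(d)) > kappa * std(d));   % flag only the severest spikes
for i = bad'
    lo  = max(1, i-g-h) : (i-g-1);                 % samples before the spike
    hi  = (i+g+1) : min(length(d), i+g+h);         % samples after the spike
    idx = [lo, hi];
    if ~isempty(idx)
        d(i) = mean(d(idx));                       % split-window estimate
    end
end
phi_clean = phi(1) + [0; cumsum(d)];               % re-integrate the repaired derivative
```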

Chapter 5

Conclusions and Future Work

The incorporation of auditory models into sonar algorithms is still very much an open area for research. To date, most auditory-motivated sonar models have concentrated on the processing of transient events, which is perhaps unsurprising in view of the following: i) owing to their energy, sharp onset and brief duration, transient events are perceptually outstanding; ii) the interruption from transients has an adverse effect upon tonal-based classifiers, fuelling research as to how to minimise their impact; iii) transient datasets are more readily available; and iv) while much effort has been poured into minimising tonal emissions, occasional 'clanks', 'knocks' and 'pops' are unavoidable, so that accurate transient classification is a tactical advantage.

Aspects of a vessel acoustic signature which have received less attention from the auditory modelling community are tonals, amplitude modulation and rhythm1 (although rhythm has been used by Tucker to provide a context for transient events [27]). However, it is primarily the frequencies and amplitudes of tonal components which provide the most reliable features for a classifier. The precise measurement of a tonal set is impeded by the presence of broadband noise, transient noise and other tonals, which cause indiscernibility, intermittence and interference. Spectral structures familiar to a human listener (e.g. a vowel sound) are often obscured by everyday sounds of comparable quality: running water, a door slamming and musical notes correspond to these three sources of degradation, respectively. Despite this, our ability to listen is not compromised until the noise conditions are quite severe. The analysis of frequency structure is central to all the non-sonar CASA models reviewed in this report and similar techniques may be employed in the task of detecting and organising tonal components within a sonar signal. The remainder of this chapter outlines areas for future research, which span the problems of tonal detection, tracking and grouping from an auditory perspective.

5.1 Future Work

This section presents four questions that are prompted by the discussion in this document. The first two questions relate to the low-level function of the

1 There is some degree of overlap between AM and rhythm: low-frequency AM may be perceived as a fast rhythm.


ear and its relevance to sonar: temporal processing and lateral inhibition. The last two questions are motivated by hearing, specifically, the perceptual effects associated with amplitude modulation and a possible role for computational auditory scene analysis in the separation of sonar sounds.

Does temporal processing offer any advantages over a traditional spectral analysis when applied to narrowband sonar algorithms?

Narrowband sonar analysis uses methods based on the Fourier transform, such as a spectrogram, to assess the tonal structure within a vessel acoustic signature. Tonal detection (by a human viewer or a machine) proceeds according to how much energy is present in the spectrum at certain frequencies, that is, a spectral analysis. Inevitably, noise sources contribute energy to the spectrum in an uneven fashion, leading to the indiscernibility of tonal spikes. In other words, with the addition of noise, it becomes increasingly difficult to say whether the energy in a discrete frequency bin should be ascribed to a tonal or to noise.

Studies of human audition have revealed that the signal transforms of the ear incorporate a spectral analysis that is accomplished by measuring the extent to which the basilar membrane vibrates along its length (place encoding). However, the temporal fine structure of the vibration at a single place, as transduced by the inner hair cells, also serves to enhance the frequency content of a signal by temporal encoding. It is this secondary stage of temporal processing which sonar systems presently lack. Accordingly, a study is required to examine how a sonar algorithm benefits from a temporal analysis of the signal. In particular, temporal processing might: i) allow for greater precision in the measurement of frequency components; ii) improve robustness against noise; iii) provide a smooth encoding of frequency transitions; and iv) provide a means of associating components in remote frequency regions by features of their fine structure.

Is lateral inhibition preferable to spectral normalisation?

Narrowband analysis is usually followed by a spectral normalisation stage, which subtracts from each bin an estimate of the local noise energy, obtained by averaging the energy under a split window centered on the bin. (A split window is used to prevent a tonal resolved across one or more bins from subtracting energy from itself.) This procedure highlights regions of contrast and so assists the sonar operator in visually distinguishing tonal features within a spectrogram. Assuming the same split window is used for all bins, spectral normalisation can be interpreted and implemented as a high-pass spatial filter; a brief sketch is given at the end of this subsection.

Lateral inhibition is behaviour observed in collections of nerve cells, which achieves a similar effect to spectral normalisation: the response of a cell is reduced by the activity of neighbouring cells. Lateral inhibition is active in both the eyes and ears, implying that the sharp features of visual and auditory sensations are enhanced prior to any processing by the brain. The similarity between spectral normalisation and lateral inhibition prompts an investigation into the advantages of one method over the other when used in a sonar application.

One problem inherent in spectral normalisation is the mutual inhibition of two tonals, that is, the possibility that one tonal (or both) will be interpreted as


Figure 5.1: Co-modulation masking release. A: the bandwidth of the modulated noise is confined to a single auditory filter so the tonal is undetectable; B: the bandwidth of the modulated noise extends over a number of auditory filters so the tonal is detectable.

noise by the other and subtracted, resulting in an energy loss. It must be established whether lateral inhibition suffers the same drawback, and if not, how normalisation schemes may be improved as a result. Finally, several workers cited by Shamma [22] have noted that introducing instability into the recurrent LIN (see Chapter 4), in conjunction with a non-linear activation function at the output units, brings about a number of short-term memory effects (hysteresis). Further work in lateral inhibition should explore the possibility of exploiting these within a sonar context, with a view to aiding tonal completion.
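As a point of comparison for such a study, the split-window spectral normalisation described above can be written as a short linear filter in MATLAB; the gap and averaging widths below are illustrative rather than values taken from any particular sonar system.

```matlab
% Split-window spectral normalisation of one spectrum, implemented as a
% high-pass spatial filter. Assumed input: P (row vector of bin energies).
g = 3;   h = 16;                                          % gap and averaging half-widths (bins)
k = [ones(1, h), zeros(1, 2*g + 1), ones(1, h)] / (2*h);  % split-window averaging kernel
noise = conv(P, k, 'same');                               % local noise estimate for every bin
Pn    = P - noise;                                        % contrast-enhanced spectrum
```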

Can the amplitude modulation of vessel noise aid the detection of tonals?

DEMON analysis is concerned with extracting useful information from the amplitude modulation impressed upon the envelope of a broadband noise signal. Specifically, when a harmonic series is present in the frequency spectrum of the envelope, the fundamental corresponds to the blade rate and the amplitudes of the harmonics relate to the number and configuration of the blades. Hence the noise component of a vessel signature, whilst a nuisance for tonal-based classification, can be considered an important source of features in itself. However, if tonals are the key concern, accounting for the modulated character of the noise signal will be helpful in cancelling it.

Remarkably, the human auditory system is able to cancel a broadband, amplitude-modulated noise signal and expose tonals which would otherwise be masked by unmodulated noise. This can be demonstrated by centering a narrow band of amplitude-modulated noise on a tonal and increasing its bandwidth. When the noise falls entirely under the same auditory filter as the tonal, the threshold for tonal detection is high. As the bandwidth of the noise is increased, other auditory filters capture the modulation and the threshold for detection is lowered. This phenomenon is referred to as co-modulation masking release (CMR) [10], as the coherent modulation of a noise signal across a sufficiently wide block of auditory filters 'releases' a stimulus from masking (see Figure 5.1). Evidently, CMR is of direct relevance to sonar processing, owing to its ability to unearth tonals immersed in modulated noise. Consequently, a portion of the remaining time will be dedicated to applying models of CMR to sonar signals.


How might a sonar system be said to listen? Is it possible to segregate underwater sounds to improve the performance of a human or machine classifier?

The final research area outlined in this report is inspired by conventional CASA modelling and considers the application of high-level organisational principles to group auditory objects into streams. Such an advance would give rise to a host of useful technologies, which were outlined in the opening chapter of this document: the segregation of concurrent underwater sources prior to classification would no doubt lead to an increase in a classifier's performance; alternatively, the colour-coding of features on a narrowband display according to a common source would allow a sonar operator to deal in objects, not raw data.

The data-driven CASA architecture envisaged for this type of application would generally isolate features in the time-frequency plane which exhibit continuity (e.g. static and moving frequency components and transients) and then group them according to commonalities in amplitude and frequency. The formulation of a CASA model for a sonar task, particularly one that performs tonal grouping, will necessarily differ from the CASA models that have gone before. The tonals in a vessel signature are not overtly modulated as those of speech and music signals are, so in order to group them together, an algorithm may have to partially resort to sub-audible regularities, such as changes in the phase of frequency components. Nevertheless, there remain numerous audible cues that may be incorporated without compromise: common onset and offset, harmonicity, common pitch variation and common AM are some examples.

Bibliography

[1] B. Boashash and P. O'Shea. A methodology for detection and classification of some underwater acoustic signals using time-frequency analysis techniques. IEEE Trans. on Acoustics, Speech, and Signal Processing, 38(11):1829–1841, 1990.

[2] A.S. Bregman. Auditory Scene Analysis: The Perceptual Organisation of Sound. The MIT Press, London, 1990.

[3] G.J. Brown. Computational Auditory Scene Analysis: A Representational Approach. PhD thesis, University of Sheffield, September 1992.

[4] G.J. Brown and S.N. Wrigley. Feasibility study into the application of computational auditory scene analysis techniques to sonar signals. Technical report, University of Sheffield, Department of Computer Science, May 2000.

[5] W.S. Burdic. Underwater Acoustic System Analysis. Prentice-Hall, Inc., Englewood Cliffs, NJ 07632, 1984.

[6] M.P. Cooke. Modelling Auditory Processing and Organisation. PhD thesis, University of Sheffield, May 1991.

[7] E. de Boer and H.R. de Jongh. On cochlear encoding: potentialities and limitations of the reverse-correlation technique. J. Acoust. Soc. Am., 63(1):115–135, 1978.

[8] D.P.W. Ellis. Prediction-driven computational auditory scene analysis. PhD thesis, Massachusetts Institute of Technology, June 1996.

[9] O. Ghitza. Temporal non-place information in the auditory-nerve firing patterns as a front-end for speech recognition in a noisy environment. J. Phonetics, 16:109–123, 1988.

[10] M.P. Haggard, J.W. Hall, and J.H. Grose. Comodulation masking release as a function of bandwidth and test frequency. J. Acoust. Soc. Am., 88(1):113–118, 1990.

[11] I.P. Kirsteins, S.K. Mehta, and J. Fay. Separation and fusion of overlapping underwater sound streams. In Proceedings of EUSIPCO 2000, volume 2, pages 1109–1113, 2000.


[12] V.R. Lesser, S.H. Nawab, and F.I. Klassner. IPUS: an architecture for the integrated processing and understanding of signals. Artificial Intelligence, 77:129–171, 1995.

[13] R.J. McAulay and T.F. Quatieri. Speech analysis/synthesis based on a sinusoidal representation. IEEE Trans. on Acoustics, Speech, and Signal Processing, ASSP-34(4):744–754, 1986.

[14] R. Meddis. Simulation of auditory-neural transduction: Further studies. J. Acoust. Soc. Am., 83(3):1056–1063, 1988.

[15] D.K. Mellinger. Event Formation and Separation in Musical Sound. PhD thesis, Stanford University, December 1991.

[16] B.C.J. Moore. An Introduction to the Psychology of Hearing. Academic Press, London, fifth edition, 2003.

[17] J.O. Pickles. An Introduction to the Physiology of Hearing. Academic Press, London, second edition, 1988.

[18] F. Pineda, K. Ryals, D. Steigerwald, and P.M. Furth. Acoustic transient processing using the Hopkins electronic ear. In Proceedings of WCNN 95, volume 1, pages 136–141, July 1995.

[19] K.J. Powell, T. Sapatinas, T.C. Bailey, and W.J. Krzanowski. Application of wavelets to the pre-processing of underwater sounds. Statistics and Computing, 5:265–273, 1995.

[20] M.D. Riley. Speech Time-Frequency Representations. Kluwer Academic Publishers, 1989.

[21] S. Seneff. A joint synchrony/mean-rate model of auditory speech processing. J. Phonetics, 16:55–76, 1988.

[22] S.A. Shamma. Speech processing in the auditory system II: Lateral inhibition and the central processing of speech evoked activity in the auditory nerve. J. Acoust. Soc. Am., 78(5):1622–1632, 1985.

[23] N.C. Singh and F.E. Theunissen. Modulation spectra of natural sounds and ethological theories of auditory processing. J. Acoust. Soc. Am., 114(6):3394–3411, 2003.

[24] Q. Summerfield and P.F. Assmann. Perception of concurrent vowels: Effects of harmonic misalignment and pitch-period asynchrony. J. Acoust. Soc. Am., 89(3):1364–1377, 1991.

[25] A. Teolis and S. Shamma. Classification of transient signals via auditory representations. Technical Report TR 91-99, University of Maryland, Systems Research Center, 1991.

[26] N.P. McAngus Todd and G.J. Brown. Visualization of rhythm, time and metre. Artificial Intelligence Review, 10:253–273, 1996.

[27] S.A. Tucker. An ecological approach to the classification of transient underwater acoustic events: Perceptual experiments and auditory models. PhD thesis, University of Sheffield, November 2003.


[28] M. Unoki and M. Akagi. A method of signal extraction from noisy signal based on auditory scene analysis. Speech Communication, 27:261–279, 1999.

[29] T. Verma and T. Meng. Sinusoidal modeling using frame-based perceptually weighted matching pursuits. In Proc. Int. Conf. Acoustics, Speech and Signal Processing (ICASSP) 1999, 1999.

[30] A.D. Waite. Sonar for Practising Engineers. Thomson Marconi Sonar Limited, Dolphin House, Ashurst Drive, Bird Hall Lane, Cheadle Heath, Stockport, Cheshire SK3 0XB, 1998.
