Acoustic Detection of Elephant Presence in Noisy Environments
Matthias Zeppelzauer, Vienna University of Technology, Interactive Media Systems Group, Vienna, Austria
Angela S. Stöger, University of Vienna, Department of Cognitive Biology, Vienna, Austria
Christian Breiteneder, Vienna University of Technology, Interactive Media Systems Group, Vienna, Austria

ABSTRACT
The automated acoustic detection of elephants is an important factor in alleviating the human-elephant conflict in Asia and Africa. In this paper, we present a method for the automated detection of elephant presence and evaluate it on a large dataset of wildlife recordings. We introduce a novel technique for signal enhancement to improve the robustness of the detector in noisy situations. Experiments show that the proposed detector outperforms existing methods and that signal enhancement strongly improves robustness to environmental noise sources. The proposed method is a first step towards an automated detection system for elephant presence.

Categories and Subject Descriptors
[H. Information Systems]: H.3 Information Storage and Retrieval; H.3.3 Information Search and Retrieval

General Terms
Algorithms, Experimentation

Keywords
Audio retrieval, sound detection, feature extraction, sound enhancement

1. INTRODUCTION
The human-elephant conflict is a serious conservation problem in Africa and Asia. Because both elephant and human populations are growing, the habitat available to elephants is becoming increasingly restricted. Lacking habitat, elephants enter new territory, which often coincides with agricultural areas or human villages. The consequence is involuntary confrontations between people and elephants, which claim the lives of many animals and humans every year [10]. Various efforts have been undertaken to alleviate this conflict, such as the construction of electric fences; however, fencing is impractical for covering larger areas. What is required are early warning systems that monitor the travel routes of elephants and alert people in time to avoid involuntary confrontations.

Elephants communicate with each other using low-frequency sounds that travel distances of several kilometers. The most common elephant call is the rumble, which extends into the infrasound band. The rumble is a harmonic sound with a fundamental frequency in the range of 15-35 Hz and a duration between 0.5 and 5 s [12]. Figure 1 shows a typical rumble with a high signal-to-noise ratio (SNR).

The acoustic detection of elephants by their calls is currently the most promising approach towards an early warning system that is able to detect the presence of elephants over large distances.

Figure 1: A typical elephant rumble (spectrogram; frequency 0-250 Hz, time 0-12 s).

Attempts towards the acoustic detection (and localization) of elephants exist in the literature [11, 4]. However, the large variety of noise sources present in the wild impedes automated analysis methods. As a result, no system exists so far that is ready to operate in the field. To date, research on the acoustic analysis of elephant calls has addressed highly selective tasks, such as the identification of elephants by their calls [3] and the analysis of particular call types, e.g. rumble types [15]. The automated detection of elephant calls, which is the basis for the above-mentioned tasks, has rarely been investigated.
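The rumble properties given in the introduction (a harmonic sound with a fundamental frequency of 15-35 Hz and a duration of 0.5-5 s [12]), together with the notion of SNR used for Figure 1, can be made concrete with a small synthetic example. The Python sketch below is purely illustrative and is not the authors' detector or data. The function names (`synth_rumble`, `add_noise`, `snr_db`, `dominant_frequency`), the 1 kHz sample rate, the five harmonics with 1/k amplitudes, the Hann envelope and the white Gaussian noise model are all hypothetical choices made for simplicity.

```python
import numpy as np

# Illustrative assumptions (not taken from the paper): 1 kHz sample rate,
# five harmonics with 1/k amplitudes, a Hann envelope, white Gaussian noise.
FS = 1000  # sample rate in Hz, well above the 15-35 Hz rumble fundamental


def synth_rumble(f0: float, duration: float, n_harmonics: int = 5, fs: int = FS) -> np.ndarray:
    """Hypothetical rumble-like signal: partials at k*f0 with amplitude 1/k."""
    if not 15.0 <= f0 <= 35.0:
        raise ValueError("f0 outside the 15-35 Hz range reported for rumbles")
    if not 0.5 <= duration <= 5.0:
        raise ValueError("duration outside the 0.5-5 s range reported for rumbles")
    t = np.arange(int(round(duration * fs))) / fs
    k = np.arange(1, n_harmonics + 1)[:, None]    # harmonic numbers as a column vector
    partials = np.sin(2 * np.pi * k * f0 * t) / k  # shape (n_harmonics, n_samples)
    envelope = np.hanning(t.size)                  # smooth onset and decay
    return envelope * partials.sum(axis=0)


def snr_db(signal: np.ndarray, noise: np.ndarray) -> float:
    """Signal-to-noise ratio in dB, computed from mean power."""
    return float(10 * np.log10(np.mean(signal**2) / np.mean(noise**2)))


def add_noise(signal: np.ndarray, target_snr_db: float, rng: np.random.Generator):
    """Add white Gaussian noise scaled so that snr_db(signal, noise) equals target_snr_db."""
    noise = rng.standard_normal(signal.size)
    target_noise_power = np.mean(signal**2) / 10 ** (target_snr_db / 10)
    noise *= np.sqrt(target_noise_power / np.mean(noise**2))
    return signal + noise, noise


def dominant_frequency(x: np.ndarray, fs: int = FS) -> float:
    """Frequency (Hz) of the largest bin of the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1 / fs)
    return float(freqs[np.argmax(spectrum)])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rumble = synth_rumble(f0=20.0, duration=2.0)
    noisy, noise = add_noise(rumble, target_snr_db=-5.0, rng=rng)
    print(f"clean peak: {dominant_frequency(rumble):.1f} Hz")
    print(f"SNR of mixture: {snr_db(rumble, noise):.1f} dB")
    print(f"noisy peak: {dominant_frequency(noisy):.1f} Hz")
```

Sweeping `target_snr_db` downward produces progressively noisier test signals in a controlled way. Note that white noise is spread evenly across the spectrum, whereas field noise such as wind and engines concentrates in the low-frequency band where rumbles reside.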
MAED 2013, Barcelona, Spain. Copyright 2013 ACM 978-1-4503-2401-4/13/10 ...$15.00. http://dx.doi.org/10.1145/2509896.2509900.

There are two major challenges in the detection of elephants in wildlife recordings. The first challenge is the large variety of uncontrollable noise sources. Noise originates, for example, from wind, rain, cars and airplanes, which particularly pollute the low-frequency band where elephant calls reside. Additionally, human speech and sounds from other animals disturb automated detection. The second challenge is the sparsity and irregularity of elephant calls, which makes it difficult to predict when a call will occur.

The contribution of this paper is a robust method for the detection of elephant presence. For this purpose, we employ an audio representation that models psychoacoustic properties of the elephant's hearing system. We introduce a novel method for enhancing signal quality to improve the noise robustness of the representation. The detector is evaluated on a large dataset of wildlife recordings to simulate a real-life scenario.

The paper is structured as follows. In Section 2 we review related approaches to elephant detection. Section 3 describes the acoustic detector and the proposed method for sound enhancement. The experimental setup and the evaluation of the proposed approach are presented in Section 4. Finally, we conclude our work in Section 5.

Figure 2: Rumbles with different interfering noise sources: (a) harmonic noise, (b) broadband noise, (c) short rumbles, (d) missing harmonics (spectrograms; frequency 0-250 Hz, time 0-12 s).

2. RELATED WORK
Sound detection has a long history [9]. There are two general approaches to sound detection: template-based methods and feature-based methods. Template-based methods successively match a given sound example (template) against a (longer) sound recording in order to find occurrences of the template in the recording. A straightforward approach is the matched filter method, where two spectrograms are directly matched to each other. The method is optimal for finding occurrences of the template itself in the recording, but suboptimal when signals that are merely similar to the template must be found or when complex noise sources are present [13]. [9] propose the spectrogram correlation technique, which employs more abstract templates to make the matched filter approach more robust. The templates represent the coarse spectro-temporal energy distribution of the searched-for sound and increase the tolerance of the matching process. The spectrogram correlation technique has been applied to elephant call detection in [13]. However, the results are reported to be suboptimal. One reason is that elephant calls vary significantly in duration, and spectrogram correlation cannot model variations in duration. Figure 2 shows the large variation in the duration of rumbles (from 0.5 s to 2.5 s). A more promising template-based method for call detection was recently introduced in [7]. The authors perform semi-supervised learning to select the sound snippet (template) that best discriminates between the positive and negative sound samples in a provided training set. For sound detection, the spectrograms of the template and of the recording are compared using a distance measure that builds upon MPEG compression [2] and allows for a certain tolerance in time and frequency. The approach has not been applied to elephant calls so far. We evaluate it on elephant call detection and compare it with the proposed approach in Section 4.

The second class of approaches to sound detection comprises feature-based techniques.

[…] pitch analysis. The authors report good performance as long as the harmonic structure of the rumbles is not buried in background noise and at least three harmonics can be clearly distinguished. In practice, we observe that the harmonic structure of rumbles is often masked by noise introduced by wind and other low-frequency disturbers such as cars and airplanes. [13] report that engine noises lead to false positive detections if they have stronger harmonics than the rumbles. Figure 2(a) shows a rumble in the presence of narrow-band noise introduced by a car engine. The engine sound has a harmonic at 70 Hz, which is particularly misleading for detectors that rely on harmonic structure.

There are, however, additional factors that corrupt the harmonic structure. Figure 2(b) shows a rumble overlaid with broadband noise, where the harmonic structure is hardly visible. The harmonic structure of short rumbles (see Figures 2(a) and 2(c)) is less salient than that of rumbles with a longer duration (see Figure 1). Additionally, the number of harmonics decreases with the distance of the caller from the microphone. Figure 2(d) shows two distant rumbles where the higher harmonics are completely missing, which impedes pitch detection, as reported by [13].

Based on these observations, [14] proposes formant analysis for the detection of elephant rumbles. The formants are derived from the peaks of the transfer function of the all-pole filter obtained by linear predictive coding (LPC). The basic assumption of the approach is that the first and second formants are stationary during a rumble. This assumption does not hold in general, as illustrated in Figure 3. Figure 3(a) shows the formant tracks of a rumble, which