Acoustic Classification of Australian Anurans Using Syllable Features
Total Page:16
File Type:pdf, Size:1020Kb
This may be the author’s version of a work that was submitted/accepted for publication in the following source: Xie, Jie, Towsey, Michael, Truskinger, Anthony, Eichinski, Phil, Zhang, Jinglan,& Roe, Paul (2015) Acoustic classification of Australian anurans using syllable features. In Tan, H P & Palaniswami, M S (Eds.) Proceedings of the 2015 IEEE Tenth International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP 2015). Institute of Electrical and Electronics Engineers Inc., United States of America, pp. 1-6. This file was downloaded from: https://eprints.qut.edu.au/89673/ c Consult author(s) regarding copyright matters This work is covered by copyright. Unless the document is being made available under a Creative Commons Licence, you must assume that re-use is limited to personal use and that permission from the copyright owner must be obtained for all other uses. If the docu- ment is available under a Creative Commons License (or other specified license) then refer to the Licence for details of permitted re-use. It is a condition of access that users recog- nise and abide by the legal requirements associated with these rights. If you believe that this work infringes copyright please provide details by email to [email protected] Notice: Please note that this document may not be the Version of Record (i.e. published version) of the work. Author manuscript versions (as Sub- mitted for peer review or as Accepted for publication after peer review) can be identified by an absence of publisher branding and/or typeset appear- ance. If there is any doubt, please refer to the published source. https://doi.org/10.1109/ISSNIP.2015.7106924 Acoustic classification of Australian anurans using syllable features Authors ABSTRACT—Acoustic classification of anurans (frogs) has spectral peak tracks. A set of spectral features were derived received increasing attention for its promising application in by time-varying analysis of the recorded bird vocalizations biological and environment studies. In this study, a novel for classification. Tyagi et al. [8] proposed the spectral feature extraction method for frog call classification is ensemble average voice to do bird recognition. Then, presented based on the analysis of spectrograms. The frog dynamic time warping was combined to improve the calls are first automatically segmented into syllables. Then, spectral peak tracks are extracted to separate desired signal recognition accuracy. Lee et al. [9] introduced a recognition (frog calls) from background noise. The spectral peak tracks method based on the analysis of spectrogram to detect each are used to extract various syllable features, including: syllable. Mel-frequency cepstral coefficients features syllable duration, dominant frequency, oscillation rate, (MFCCs) of each frame were defined as features, and linear frequency modulation, and energy modulation. Finally, a k- discriminant analysis was used for classifying 30 kinds of nearest neighbor classifier is used for classifying frog calls frog calls and 19 kinds of cricket calls. based on the results of principal component analysis. The experiment results show that syllable features can achieve an Most prior work often reports high accuracy rates for average classification accuracy of 90.5% which outperforms recognition and classification. However, most features used Mel-frequency cepstral coefficients features (79.0%). in the prior work are based on only either on only frequency domain or time domain information. However, a Keywords—audio classification; syllable feature; principal combination of the two will be able to discriminate between component analysis; k nearest neighbour; spectral peak track a wider variety of species that may share similar I. INTRODUCTION characteristics in either time or frequency information but not both. This research presents a novel feature extraction Acoustic sensor networks are a well-established and method for frog call classification which includes both time widely deployed method of collecting acoustic data for and frequency domain information. monitoring animals [1]. The traditional field survey methods that require ecologists to physically visit sites for collecting After segmenting input frog calls into syllables, the bio-diversity data are both time-consuming and costly. spectral peak track (SPT) algorithm is applied for locating Comparatively, sensors can record acoustic data the frog call frequency boundary. Then, the syllable features automatically, objectively, and continuously for long are extracted from the SPT results. Principal component durations. However, analyzing the large amount of collected analysis (PCA) is applied to decorrelate the syllable features data manually is very time-consuming. Developing semi- and to reduce the dimensionality. Finally, a k-NN classifier automatic or automatic methods for classifying collected is used to classify the frog calls. The proposed syllable acoustic data by sensors is thus in high demand and has features achieve higher classification accuracy (90.5%) than attracted a lot of research [2-7]. MFCCs (79.0%). Prior call classification research typically adopts the The rest of this paper is organized as follows: In section following structure : (1) pre-processing, (2) segmentation, II, we describe the method for frog call classification, which (3) feature extraction, (4) classification [2]. Taylor et al. includes data set acquisition, syllable segmentation, feature proposed a system for identifying 22 frog species recorded extraction, PCA and classification. Section III reports in northern Australia based on peak values (intensity of experiment results. Section IV presents conclusion and spectrogram) [3]. Huang et al. [4] extracted the spectral future work. centroid, signal bandwidth and threshold crossing rate and used these features with k nearest neighbor (k-NN) and II. METHOD support vector machine (SVM) classifiers to classify frog Our frog call classification method consists of five steps: calls. Dayou et al. [5] developed a method based on entropy data set acquisition, syllable segmentation, feature to recognize frog calls. Shannon entropy, Renyi entropy and extraction, PCA and classification (Fig.1). Detailed Tsallis entropy were trialed as inputs to a k-NN classifier for information of each step is shown in following sections. recognition. A multi-stage average spectrum was proposed A. Data set acquisition by Chen et al [6]. Syllable length was first used for the pre- classification. Then the multi-stage average spectrum was In this study, 16 frog species which are widespread in extracted for the classification. Chen et al. [7] described the Queensland, Australia are selected for experiments (Table semi-automatic bird call classification method based on I). All the recordings are obtained from David Stewart [10], and has a sample rate of 44.1 kHz. All recordings were all Syllable segmentation Feature extraction PCA Fig.2. Segmantaion result marked with red line Classification Frog species C. Feature extraction Five features are extracted from each syllable for frog call classification. They are syllable duration, dominant Fig. 1. Flowchart of frog call classification system frequency, oscillation rate, frequency modulation, and mixed-down to mono. 50% dataset was used as training energy modulation. MFCCs are used as baseline for data, and the rest for testing. comparison. B. Syllable segmentation x Extraction of syllable features One syllable is a continuous anuran vocalization emitted from an individual, which is one elementary acoustic unit Syllable features are extracted from spectral peak tracks for classification. In this study, audio data is automatically (SPTs), which in turn, isolate the desired signal within the segmented into a set of syllables using the method proposed syllable. The SPT method has been used for bird calls in by Haሷrmaሷ [11] which is described as follows: previous research [7]. Here, it is adapted for analyzing frog calls. The SPT method works by matching peaks in the Step 1: Compute the spectrogram (Fig.4) of audio data spectrogram from one time frame to the next to produce a using a short-time Fourier transform (STFT) (Hamming window, ). We size = 512 samples, overlap = 25% TABLE I. SUMMARY OF THE FROG SCIENTIFC NAME ,COMMON NAME AND denote the spectrogram as a matrix S(f, t) ,where ݂ CORRESPONDING CODE .represents the frequency index and ݐ is the time index No. Scientific name Total Common name Code Step 2: Smooth the original spectrogram using Gaussian syllable filter (5×5) to remove small gaps within syllables. This step 1 Assa darlingtoni 36 Pouched frog ADI is the only deviation from the original technique by Haሷrmaሷ. 2 Crinia parinsignifera 40 Eastern Sign-bearing CPA Frog Step 3: Find f and t that |S(f,t)| |S(f, t)| for every ௧ pair of (f, t), and set the position of the ݊ syllable to be t. 3 Litoria caerulea 72 White’s tree frog LCA Step 4: Compute the amplitude of the first frame A(0) = 4 Litoria chloris 26 Red-eyed tree frog LCS 20݈݃ଵ|(f,t)| decibel (dB). If A(0) <A(0) െߚ, stop 5 Litoria latopalmata 169 Broad-palmed frog LLA the segmentation process, where ߚ is the stopping criteria and its default value is 18 dB. If stopped, it means that the 6 Litoria nasuta 60 Striped rocket frog LNA amplitude of the n௧ syllable is too small and hence no more 7 Litoria revelata 151 Whirring Tree Frog LEA syllables need to be extracted. 8 Litoria rubella 37 Desert tree frog LRA Step 5: Start from t, trace the maximal peak of |S(f, t)| for 9 Litoria verreauxii 28 Verreauxii’s tree frog LVV = (t<t until A(ݐ െݐ) =A(0) െߚ,where A(ݐ െݐ 10 Litoria tyleri 117 Tyler's tree frog LTI |(20logଵ|ܵ(݂, ݐ)|. Next, trace the maximal peak of |S(f, t for t>t until A(ݐെݐ) =A(0) െߚ ,where A(ݐെ 11 Limnodynastes 14 spotted grass frog LTS tasmaniensis ݐ) = 20logଵ|ܵ(݂, ݐ)| . Hence, the starting and stopping ௧ time of the n syllable are determined as ݐ െݐ௦ and 12 Limnodynastes 44 Northern banjo frog LTE ݐ + ݐ.