Monitoring and Mining Sounds in Visual Space

Yuan Hao, Bilson Campana, Eamonn Keogh
University of California, Riverside
{yhao, bcampana, eamonn}@cs.ucr.edu

ABSTRACT

Monitoring animals by the sounds they produce is an important and challenging task, whether the application is outdoors in a natural habitat, or in the controlled environment of a laboratory setting. In the former case the density and diversity of sounds can act as a measure of biodiversity. In the latter case, researchers often create control and treatment groups of animals, expose them to different interventions, and test for different outcomes. One possible manifestation of different outcomes may be changes in the bioacoustics of the animals.

With such a plethora of important applications, there have been significant efforts to build bioacoustic classification tools. However, we argue that most current tools are severely limited. They often require the careful tuning of many parameters (and thus huge amounts of training data), they are too computationally expensive for deployment in resource-limited sensors, they are specialized for a very small group of species, or they are simply not accurate enough to be useful.

In this work we introduce a novel bioacoustic recognition/classification framework that mitigates or solves all of the above problems. We propose to classify animal sounds in the visual space, by treating the texture of their spectrograms as an acoustic fingerprint and using a recently introduced parameter-free texture measure as a distance measure. We further show that by searching for the most representative acoustic fingerprint we can significantly outperform other techniques in terms of speed and accuracy.

Keywords
Classification, Spectrogram, Texture, Bioacoustics

1. INTRODUCTION

Monitoring animals by the sounds they produce is an important and challenging task, whether the application is outdoors in a natural habitat [4], or in the controlled environment of a laboratory setting. In the former case the density and variety of animal sounds can act as a measure of biodiversity and of the health of the environment. Algorithms are needed here not only because they are (in the long term) cheaper than human observers, but also because in at least some cases algorithms can be more accurate than even the most skilled and motivated observers [21].

Note that monitoring of animal sounds in the wild opens up a host of interesting problems in sensor placement, wireless networks, resource-limited computation [7], etc. For simplicity, we gloss over such considerations, referring the interested reader to [4] and the references therein. In this work we assume all such problems have been addressed, and only the recognition/classification steps remain to be solved.

In addition to field work, researchers working in laboratory settings frequently create control and treatment groups of animals, expose them to different interventions, and test for different outcomes. One possible manifestation of different outcomes may be changes in the bioacoustics of the animals. To obtain statistically significant results, researchers may have to monitor and hand-annotate the sounds of hundreds of animals for days or weeks, a formidable task that is typically outsourced to students [23].

There are also several important commercial applications of acoustic animal detection. For example, the US imports tens of billions of dollars worth of timber each year. It has been estimated that the inadvertent introduction of the Asian Longhorn Beetle (Anoplophora glabripennis) with a shipment of lumber could cost the US lumber industry tens of billions of dollars [22]. It has been noted that different beetle species have subtly distinctive chewing sounds, and ultra-sensitive sensors that can detect these sounds are being produced [17]. As a very recent survey of acoustic insect detection noted, "The need for nondestructive, rapid, and inexpensive means of detecting hidden insect infestations is not likely to diminish in the near future" [22].

With such a plethora of important applications, there have been significant efforts to build bioacoustic classification tools [4]. However, we argue that current tools are severely limited. They often require the careful tuning of many parameters (as many as eighteen [8]) and thus huge amounts of training data, they are too computationally expensive for use with resource-limited sensors that will be deployed in the field [7], they are specialized for a very small group of species, or they are simply not accurate enough to be useful.

In this work we introduce a novel bioacoustic recognition/classification framework that mitigates or solves all of the above problems. We propose to classify animal sounds in the visual space, by treating the texture of their spectrograms as an acoustic "fingerprint" and using a recently introduced parameter-free texture measure as a distance measure. We further show that by searching for the smallest representative acoustic fingerprint (inspired by the shapelet concept in the time series domain [28]) in the training set, we can significantly outperform other techniques in terms of both speed and accuracy.

2. RELATED WORK / BACKGROUND

2.1 A Brief Review of Spectrograms

As hinted at above, we intend to do recognition/classification in the visual space, by examining the spectrogram of the animal sounds. As shown in Figure 1, a spectrogram is a time-varying spectral representation that shows how the spectral density of a signal varies with time.

[Figure 1 here: a spectrogram; y-axis Frequency (kHz), 0 to 3; x-axis spanning forty seconds; species label Common Virtuoso Katydid (Amblycorypha longinicta)]
Figure 1: A spectrogram of the call of an insect. Note the highly repetitious nature of the call. In this case, capturing just two "bursts" may be sufficient to recognize the insect.

There is a huge amount of literature leveraging off manual inspection of such spectrograms; see [12] and the references therein for some examples. However, as we shall see, algorithmic analysis of spectrograms remains an open problem, and an area of active research. Beyond the problems that plague attempts to define a distance measure in any domain, including invariance to offset, scaling, uniform scaling, non-uniform warping, etc., spectrograms almost always have significant noise artifacts, even when obtained in tightly controlled conditions in a laboratory setting.

One avenue of research is to "clean" the spectrograms using various techniques [2], and then apply shape similarity measures to the cleaned shape primitives. Some types of specialized cleaning may be possible; for example, removal of the commonly encountered 60Hz noise¹. However, algorithms to robustly clean general spectrograms seem likely to elude us for the foreseeable future.

As we shall see in Section 3, our solution to this problem is to avoid any type of data cleaning or explicit feature extraction, and use the raw spectrogram directly.

2.2 General Animal Sound Classification

The literature on the classification of animal sounds is vast; we refer the interested reader to [20][1] for useful surveys. At the highest level, most research efforts advocate extracting sets of features from the data, and using these features as inputs to standard classification algorithms such as a decision tree, a Bayesian classifier or a neural network. As a concrete representative example, consider [24], which introduces a system to recognize Orthoptera (the insect order that includes grasshoppers, crickets, katydids² and locusts). This method requires that we extract multiple features from the signal, including distance-between-consecutive-pulses, pulse-length, frequency-contour-of-pulses, energy-contour-of-pulses, time-encoded-signal-of-pulses, etc. However, robustly extracting these features from noisy field recordings is non-trivial, and while these features seem to be defined for many Orthoptera, it is not clear that they generalize to other insects, much less to other animals. Moreover, a significant number of parameters need to be set, both for the feature extraction algorithms and for the classification algorithm.

For more complex animal sounds (essentially all non-insect animals), once again features are extracted from the raw data; however, because the temporal transitions between features are themselves a kind of meta-feature, techniques such as Hidden Markov Models are typically used to model these transitions [20][5][1]. This basic idea has been applied with varying degrees of success to birds [14], frogs and mammals [5]. One major limitation of Hidden Markov Model-based systems is that they require careful tuning of their many parameters. This in turn requires a huge amount of labeled training data, which may be difficult to obtain for some species.

Many other approaches have been attempted in the last decade. For example, in a series of papers, Dietrich et al. introduce several classification methods for insect sounds, some of which require up to eighteen parameters, and which were trained on a dataset containing just 108 exemplars [8].
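Since all subsequent analysis operates on spectrogram images rather than raw audio, a minimal sketch of how such an image can be computed may be useful. This is an illustration only, not the paper's implementation; the window length, hop size, and the synthetic two-tone test signal are our own assumptions:

```python
import numpy as np

def spectrogram(audio, window_len=256, hop=128):
    """Short-time Fourier magnitude: rows = frequency bins, cols = time frames."""
    window = np.hanning(window_len)
    n_frames = 1 + (len(audio) - window_len) // hop
    frames = np.stack([audio[i * hop : i * hop + window_len] * window
                       for i in range(n_frames)])
    # Magnitude of the one-sided FFT of each windowed frame, on a log scale
    # so that quiet harmonics remain visible, as in Figure 1.
    mag = np.abs(np.fft.rfft(frames, axis=1)).T
    return 20 * np.log10(mag + 1e-12)

# A synthetic two-tone "call" sampled at 8 kHz: one second of 1 kHz,
# followed by one second of 2 kHz.
sr = 8000
t = np.linspace(0, 1, sr, endpoint=False)
audio = np.concatenate([np.sin(2 * np.pi * 1000 * t),
                        np.sin(2 * np.pi * 2000 * t)])
S = spectrogram(audio)
```

Each column of S is the log-magnitude spectrum of one window; plotting S as an image yields a picture in the style of Figure 1, with the dominant frequency visibly jumping from the 1 kHz bin to the 2 kHz bin halfway through.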
It is important to note that our results are completely automatic. Numerous papers report high accuracies for the classification of animal sounds, but upon careful reading it appears (or is explicitly admitted) that human effort was required to extract the right data to give to the classifier. Many authors do not seem to fully appreciate that "extracting the right data" is at least as difficult as the classification step.

For example, a recent paper on the acoustic classification of Australian anurans (frogs and toads) claims a technique that is "capable to identify the species of the frogs with an average accuracy of 98%" [10]. This technique requires extracting features from syllables, and the authors note, "Once the syllables have been properly segmented, a set of features can be calculated to represent each syllable" (our emphasis). However, the authors later make it clear that the segmentation is done by careful human intervention.

¹ American domestic electricity is at 60Hz (most of the rest of the world is at 50Hz), and inadequate filtering in power transformers often allows some 60Hz signal to bleed into the sound recording.
² In British English, katydids are known as bush-crickets.
In contrast, we do not make the unrealistic assumption that all the data has been perfectly segmented. We do require sound files that are labeled with the species name, but nothing else. For example, most of the sound files we consider contain human voiceover annotations such as "June 23rd, South Carolina, Stagmomantis carolina, temperature is ...", and many contain spurious additional sounds such as distant bird calls, aircraft, or the researcher tinkering with equipment. The raw unedited sound file is the input to our algorithm; there is no need for costly and subjective human editing.

2.3 Sound Classification in Visual Space

A handful of other researchers have suggested using the visual space to classify sounds (see [19][18]). However, this work has mostly looked at the relatively simple tasks of recognizing musical instruments or musical genres [29], etc. More recent work has considered addressing problems in bioacoustics in the visual space. In [19] the authors consider the problem of recognizing whale songs using spectrograms. The classification of an observed acoustic signal is determined by the maximum cross-correlation coefficient between its spectrogram and a specified template spectrogram [19]. However, this method is rather complicated and indirect: a "correlation kernel" is extracted from the spectrogram, the image is divided into sections which are piecewise constant, and a cross-correlation is computed from some subsets of these sections and thresholded to obtain a detection event. Moreover, at least ten parameters must be set, and it is not clear how best to set them, other than using a brute-force search through the parameter space. This would require a huge amount of labeled training data. In [18] the authors propose similar ideas for bird calls.

However, beyond the surfeit of parameters to be tuned, these methods have a weakness that we feel severely limits their applicability. Both these efforts (and most others we are aware of) use correlation as the fundamental tool to gauge similarity. By careful normalization, correlation can be made invariant to shifts of pitch and amplitude. However, because of its intrinsically linear nature, correlation cannot be made invariant to global or local differences in time (in a very slightly different context, these are called uniform scaling and time warping, respectively [9]). There is significant evidence that virtually all real biological signals have such distortions, and unless this is explicitly addressed in the representation or classification algorithm, we are doomed to poor accuracy. As we shall show empirically in the experimental section below, our proposed method is largely invariant to uniform scaling and time warping.

2.4 A Review of the Campana-Keogh (CK) Distance Measure

The CK distance measure is a recently introduced measure of texture similarity [6]. Virtually all other approaches in the vast literature on texture similarity work by explicitly extracting features from the images and computing the distance between suitably represented feature vectors. Many possibilities for features have been proposed, including several variants of wavelets, Fourier transforms, Gabor filters, etc. [3]. However, one drawback of such methods is that they all require the setting of many parameters. For example, at a minimum, Gabor filters require the setting of scale, orientation, and filter mask size parameters. This has led many researchers to bemoan the fact that "the values of (Gabor filter parameters) may significantly affect the outcome of the classification procedures..." [3].

In contrast, the CK distance measure does not require any parameters, and does not require the user to create features of any kind. Instead, the CK measure works in the spirit of Li and Vitanyi's idea that two objects can be considered similar if information garnered from one can help compress the other [15][13]. The theoretical implications of this idea have been heavily explored over the last eight years, and numerous applications for discrete data (DNA, natural languages) have emerged. The CK measure expands the purview of compression-based similarity measurement to real-valued images by exploiting the compression technique used by MPEG video encoding [6]. In essence, MPEG attempts to compress a short video clip by taking the first frame as a template and encoding only the differences of subsequent frames. Thus, if we create a trivial "video" consisting of just the two images we wish to compare, we would expect the video file size to be small if the two images are similar, and large if they are not. Assuming x and y are two equally-sized images, Table 1 shows the code to achieve this.
Table 1: The CK Distance Measure

function dist = CK(x, y)
  dist = ((mpegSize(x,y) + mpegSize(y,x)) / (mpegSize(x,x) + mpegSize(y,y))) - 1;

It is worth explicitly stating that this is not pseudocode, but the entire actual Matlab code needed to calculate the CK measure.

The CK measure has been shown to be very effective on images as diverse as moths, nematodes, wood grains, tire tracks, etc. [6]. However, this is the first work to consider its utility on spectrograms.
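The mpegSize() function in Table 1 denotes the size of an MPEG-1 encoding of the two-frame "video". To illustrate the underlying compression-distance principle without a video codec, the following sketch substitutes zlib for MPEG; it mirrors the structure of the Matlab one-liner, but it is a stand-in for, not a reimplementation of, the CK measure:

```python
import hashlib
import zlib

def compressed_size(a: bytes, b: bytes) -> int:
    """Bytes needed to compress b appended after a (a acts as the 'template')."""
    return len(zlib.compress(a + b, 9))

def ck_style_distance(x: bytes, y: bytes) -> float:
    """Same structure as Table 1, with zlib standing in for mpegSize():
    small when each object helps compress the other, larger otherwise."""
    return ((compressed_size(x, y) + compressed_size(y, x)) /
            (compressed_size(x, x) + compressed_size(y, y))) - 1.0

x = b"rrbbcxcfbb" * 80                     # a repetitive "texture"
y = b"bbcxcfbbrr" * 80                     # the same motif, rotated
z = b"".join(hashlib.sha256(bytes([i])).digest()
             for i in range(25))           # 800 incompressible bytes
```

As with the CK measure, identical inputs give a distance of exactly zero, and the similar pair (x, y) scores a smaller distance than the dissimilar pair (x, z), because the shared motif lets the compressor reuse information across the concatenation.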
2.5 Notation

In this section we define the necessary notation for our sound fingerprint finding algorithm. We begin by defining the data type of interest, a sound sequence:

Definition 1: A sound sequence is a continuous sequence S = (S1, S2, ..., St) of t real-valued data points, where St is the most recent value. The data points are typically generated in temporal order and spaced at uniform time intervals.

As with other researchers [19][18], we are interested in the representation of the sound sequence in the visual space, which is called the spectrogram.

Definition 2: A spectrogram is a visual spectral representation of an acoustic signal that shows the relationship between spectral density and the corresponding time.

A more detailed discussion of spectrograms is beyond the scope of this paper, so we refer the reader to [1] and the references therein.

We are typically interested in the local properties of the sound sequence rather than the global properties, because the entire sound sequence may be contaminated with extraneous sounds (human voice annotations, passing aircraft, etc.). Moreover, as we shall see, our ultimate aim is to find the smallest possible sound snippet to represent a species. A local subsection of a spectrogram can be extracted with a sliding window:

Definition 3: A sliding window (W) contains the latest w data points (St-w+1, St-w+2, ..., St) of the sound sequence S.

Within a sliding window, a local subsection of the sound sequence we are interested in is termed a subsequence.

Definition 4: A subsequence (s) of length m of a sound sequence S = (S1, S2, ..., St) is a time series si,m = (Si, Si+1, ..., Si+m-1), where 1 ≤ i ≤ t-m+1.

Since our algorithm attempts to find the prototype of a sound sequence S, ultimately a local subsequence s must be located with a distance comparison between S and s, which may be of vastly different lengths. Recall that the CK distance is only defined for two images of the same size.

Definition 5: The distance d between a subsequence s and a longer sound sequence S is the minimum distance between s and all possible subsequences of S that are the same length as s.

Our algorithm needs some evaluation mechanism for splitting datasets into two groups (the target class, denoted P, and everything else, denoted U). We use the classic machine learning idea of information gain to evaluate candidate splitting rules. To allow discussion of information gain, we must first review entropy:

Definition 6: The entropy of a given sound sequence dataset D is E(D) = -p(X)log(p(X)) - p(Y)log(p(Y)), where X and Y are the two classes in D, p(X) is the proportion of objects in class X, and p(Y) is the proportion of objects in class Y.

The information gain of a given splitting strategy is just the difference in entropy before and after splitting. More formally:

Definition 7: The information gain of a partitioning of dataset D into D1 and D2 is Gain = E(D) - E'(D), where E(D) and E'(D) are the entropy before and after the partitioning, respectively, with E'(D) = f(D1)E(D1) + f(D2)E(D2), where f(D1) is the fraction of objects in D1 and f(D2) is the fraction of objects in D2.

As noted above, we wish to find a sound fingerprint such that most or all of the objects in P have a subsequence that is similar to the fingerprint, whereas most of the sound sequences in U do not. To find such a fingerprint from all possible candidates, we compute the distance between each candidate and every subsequence of the same size in the dataset, and use this information to sort the objects on a number line, as shown in Figure 2. Given such a linear ordering, we can define the best splitting point for a given sound fingerprint:

Definition 8: Given an annotated (by one of two classes, P and U) linear ordering of the objects in D, there exist at most³ |D|-1 distinct splitting points which divide the number line into two distinct sets. The splitting point which produces the largest information gain is denoted as the best splitting point.

In Figure 2 we illustrate the best splitting point with a bold/yellow vertical line. We are finally in a position to define the sound fingerprint using the above definitions:

Definition 9: The sound fingerprint for a species is the subsequence from P, together with its corresponding best splitting point, which produces the largest information gain when measured against the universe set U.

Note that we may expect ties, which must be broken by some policy. We defer a discussion of tie-breaking policies to later in this section.

³ Note that there can be duplicate values in the ordering.
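Definitions 6-8 can be made concrete with a short sketch. We assume base-2 logarithms, which reproduces the worked numbers in the Figure 2 discussion; the function names are ours, not the paper's:

```python
import math

def entropy(n_p, n_u):
    """Definition 6 with base-2 logarithms; an empty class contributes zero."""
    total = n_p + n_u
    return -sum(n / total * math.log2(n / total) for n in (n_p, n_u) if n)

def best_split_gain(order):
    """Definitions 7 and 8: try every one of the |D|-1 split points on the
    annotated order line and return the largest information gain."""
    n_p, n_u = order.count('P'), order.count('U')
    e_before = entropy(n_p, n_u)
    best, left_p, left_u = 0.0, 0, 0
    for label in order[:-1]:                 # split after each object in turn
        left_p += label == 'P'
        left_u += label == 'U'
        f_left = (left_p + left_u) / len(order)
        e_after = (f_left * entropy(left_p, left_u) +
                   (1 - f_left) * entropy(n_p - left_p, n_u - left_u))
        best = max(best, e_before - e_after)
    return best

# A nine-object order line in the style of Figure 2: four P objects to the
# left of the best split, then one P interleaved with the four U objects.
order = ['P', 'P', 'P', 'P', 'U', 'P', 'U', 'U', 'U']
```

For this ordering, entropy(5, 4) is about 0.991 and the best split yields a gain of about 0.590, matching the worked example given with Figure 2.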
[Figure 2 here: a candidate subsequence (boxed region) in spectrogram 1, and an order line from 0 to 1 with the split point (threshold) marked among objects C, B, D, A]
Figure 2: A candidate sound fingerprint, the boxed region in spectrogram 1, is evaluated by finding its nearest neighbor subsequence within both P and the four representatives of U, and then sorting all objects on a number line.

We can concretely illustrate the definition of a sound fingerprint using the example shown in Figure 2. Note that there are a total of nine objects, five from P and four from U. This gives us an entropy for the unsorted data of:

    [-(5/9)log(5/9) - (4/9)log(4/9)] = 0.991

If we use the split point shown by the yellow/bold vertical bar in Figure 2, then four objects from P are the only four objects on the left side of the split point. Of the five objects to the right of the split point, we have four objects from U and just one from P. This gives us an entropy of:

    (4/9)[-(4/4)log(4/4)] + (5/9)[-(4/5)log(4/5) - (1/5)log(1/5)] = 0.401

Thus, we have an information gain of 0.590 = 0.991 - 0.401.

Note that our algorithm will calculate the information gain many times as it searches through the candidate space, and ties are very likely. Thus, we must define a tie-breaking policy. Here we have several options. The intuition is that we want to produce the maximum separation ("margin") between the two classes. We could measure this margin by the absolute distance between the rightmost positive and the leftmost universe distances. However, this measure would be very brittle to a single mislabeled example. To be more robust to this possibility (which frequently occurs in our data), we define the margin as the absolute distance between the medians of the two classes. Figure 3 illustrates this idea.

[Figure 3 here: two order lines, each from 0 to 1]
Figure 3: Two order lines that have the same information gain. Our tie-breaking policy reflects the intuition that the top line achieves less separation than the bottom line.

Even though these two fingerprints have the same information gain of 0.590 as the example shown in Figure 2, the bottom one is preferable, because it achieves a larger margin between P and U.

Before moving on, we preempt a possible question from the reader: why optimize the information gain, rather than just optimizing the tie-breaking function itself? The answer is twofold. First, just optimizing the tie-breaking function allows pathological solutions that do not generalize well. More critically, as we shall see later, optimizing the information gain will allow admissible pruning techniques that can make our algorithm two orders of magnitude faster.

3. SOUND FINGERPRINTS

As the dendrogram we will later show in Figure 5 hints at, the CK measure can be very accurate in matching (carefully extracted) examples of animal sounds. However, our task at hand is much more difficult than this. We do not have carefully extracted prototypes for each class, and we do not have a classification problem where every sound must correspond to some animal we have previously observed. Rather, for each species we have a collection of sound files which contain within them one or more occurrences of a sound produced by the target. We do not know how long the animal call is, or how many occurrences of it appear in each file. Moreover, since most of the recordings are made in the wild, we must live with the possibility that some of the sound files are "contaminated" with other sounds. For example, a twenty-second recording of a frog we examined also contains a few seconds of Strigiform (owl) calls and several cricket chirps.

In addition, as we later use our sound fingerprints to monitor audio streams, we must generally expect that the vast majority of sounds are not created by any of the target species, and thus we have a large amount of data that could produce false positives.

3.1 The Intuition of Sound Fingerprints

We begin by expanding on the intuition behind sound fingerprints. For ease of exposition we will give examples using discrete text strings, but the reader will appreciate that we are really interested in streams of real-valued sounds. Assume we are given a set of three observations that correspond to a particular species, let us say Maua affinis (a cicada from South West Asia):

    Ma = {rrbbcxcfbb, rrbbfcxc, rrbbrrbbcxcbcxcf}

We are also given access to a universe of sounds that are known not to contain examples of Maua affinis:

    ¬Ma = {rfcbc, crrbbrcb, rcbbxc, rbcxrf, .., rcc}

In practice, the universe set may be so large that we will just examine a small fraction of it, perhaps just sounds that are likely to be encountered and could be confused for the target insect. Our task is to monitor an audio stream (or examine a large offline archive) and flag any occurrences of the insect of interest.

Clearly it would be quite naive to examine the data for exact occurrences of the three positive examples we have been shown, even under a suitably flexible measure such as edit distance. Our positively labeled data is only guaranteed to have one or more samples of the insect call, and it may have additional sections of sounds from other animals or anthropogenic sounds before and/or after it.

Instead, we can examine the strings for shorter substrings that seem diagnostic of the insect. The first candidate template that appears promising is T1 = rrbb, which appears in every Ma insect example. However, this substring also appears in the second example in ¬Ma, in crrbbrcb, and thus this pattern is not unique to Maua affinis.
We could try to specialize the substring by making it longer; if we use T2 = rrbbc, it does not appear in ¬Ma, removing that false positive. However, rrbbc appears in only two of the three examples in Ma, so using it would incur a risk of false negatives. As it happens, the substring template T3 = cxc does appear in all examples in Ma at least once, and never in ¬Ma, and is thus the best candidate for a prototypical template for the class.

As the reader may appreciate, the problem at hand is significantly more difficult than this toy example. First, because we are dealing with real-valued data, we cannot do simple tests for equality; rather, we must also learn an accept/reject threshold for the template. Moreover, we generally cannot be sure that every example in the positive class really has one true high-quality example call from the target animal. Some examples could be mislabeled, of very low quality, or simply atypical of the species for some reason. Furthermore, we cannot be completely sure that U does not contain any example from P. Finally, because strings are discrete, we only have to test all possible substrings of length one, then of length two, etc., up to the length of the shortest string in the target class. However, in the real-valued domain in which we must work, the search space is immensely larger. We may have recordings that are minutes in length, sampled at 44,100Hz.

Thus far we have considered this problem abstractly: is it really the case that small amounts of spurious sounds can dwarf the similarity of related sounds? To see this, we took six pairs of recordings of various Orthoptera and visually determined and extracted one-second similar regions. The group average hierarchical clustering of the twelve snippets is shown in Figure 4. The results are very disappointing, given that only one pair of sounds is correctly grouped together, in spite of the fact that human observers can do much better.

[Figure 4 here: dendrogram with leaf order 3, 4, 2, 1, 8, 10, 11, 5, 12, 9, 6, 7; scale bar labeled One Second]
Figure 4: A clustering of six pairs of one-second recordings of various katydids and crickets using the CK texture measure. Only one species pair, {3,4}, is correctly grouped. Ideally the pairs {1,2}, {5,6}, {7,8}, {9,10} and {11,12} should also be grouped together.

We believe this result is exactly analogous to the situation elucidated above with strings. Just as rrbbcxcfbb must be stripped of its spurious prefix and suffix to reveal cxc, the pattern that is actually indicative of the class, so too must we crop the irrelevant left and right edges of the spectrograms. For the moment, let us do this by hand. As the resulting images may be of different lengths, we have to slightly redefine the distance measure. To compute the distance between two images of different lengths, we slide the shorter one along the longer one (cf. Definition 5) and report the minimal distance. Figure 5 shows the resulting clustering.

[Figure 5 here: dendrogram with leaves 1-6 grouped under Tettigonioidea and 11, 12, 7, 8, 9, 10 under Grylloidea; scale bar labeled One Second]
Figure 5: A clustering of the same data used in Figure 4, after trimming irrelevant prefix and suffix data. All pairs are correctly grouped, and at a higher level the dendrogram separates katydids and crickets.

The trimming of spurious data produces a dramatic improvement. However, it required careful human inspection. In the next section we will show our novel algorithm, which can do this automatically.

3.2 Formal Problem Statement and Assumptions

Informally, we wish to find a snippet of sound that is most representative of a species, on the assumption that we can use this snippet as a template to recognize future occurrences of that species.
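The discrete intuition of Section 3.1 can be sketched directly: test progressively longer substrings of a positive example until one separates Ma from the universe. Because the text elides part of ¬Ma (".."), the sketch below uses only the listed examples; find_fingerprint is an illustrative name, not the paper's code:

```python
def find_fingerprint(positives, universe):
    """Smallest substring occurring in every positive string and in no
    universe string: a discrete analogue of the sound fingerprint."""
    shortest = min(positives, key=len)
    for length in range(1, len(shortest) + 1):       # shortest templates first
        for start in range(len(shortest) - length + 1):
            candidate = shortest[start:start + length]
            if (all(candidate in p for p in positives)
                    and not any(candidate in u for u in universe)):
                return candidate
    return None

ma = ["rrbbcxcfbb", "rrbbfcxc", "rrbbrrbbcxcbcxcf"]
not_ma = ["rfcbc", "crrbbrcb", "rcbbxc", "rbcxrf", "rcc"]  # listed examples only
print(find_fingerprint(ma, not_ma))   # prints cxc, the T3 template of Section 3.1
```

As the surrounding text notes, the real-valued version of this search is vastly harder: candidates are spectrogram windows, "occurs in" becomes a learned distance threshold, and the label sets may be noisy.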
Since we cannot know the exact nature of the future data we must monitor, we will create a dataset which contains representatives of U, the non-target species sounds. Given this heterogeneous dataset U, and a dataset P which contains only examples from the "positive" species class, our task reduces to finding a subsequence of one of the objects in P which is close to at least one subsequence in each element of P, but far from all subsequences in every element of U. Recall that Figure 2 shows a visual intuition of this.

This definition requires searching over a large space of possibilities. How large a space? Suppose the dataset P contains a total of k sound sequences. Users have the option to define the minimum and maximum lengths (Lmin, Lmax) of the sound fingerprint candidates. If they decline to do so, we default to Lmax = infinity and Lmin = 16, given that 16 by 16 pixels is the smallest frame size for which MPEG-1 video is defined [6]. Assume for the moment that the following relationship holds:

    Lmax ≤ min(Mi)

That is to say, the longest sound fingerprint is no longer than the shortest object in P, where Mi is the length of Si from P, 1 ≤ i ≤ k. The total number of sound fingerprint candidates of all possible lengths is then:

    Σ_{l = Lmin}^{Lmax}  Σ_{Si ∈ P}  (Mi − l + 1),

where l is a fixed candidate length. It may appear that we must test every integer pixel size from Lmin to Lmax; however, we know that the "block size" of MPEG-1 [6], and hence of the CK measure, is eight-by-eight pixels, and pixels remaining after tiling the image with eight-by-eight blocks are essentially ignored. Thus, there is no point in testing image sizes that are not multiples of eight, and the above expression can be modified to:

    Σ_{i = 1}^{((Lmax − Lmin)/8) + 1}  Σ_{Si ∈ P}  (Mi − l + 1),  with l = Lmin + 8(i − 1).

While this observation means we can reduce the search space by a factor of eight, there is still a huge search space that will require careful optimization to allow exploration in reasonable time.
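As a sanity check, the expressions above can be evaluated numerically. The widths used below (98 spectrogram frames per two-second file) are our own assumption; they are the widths consistent with the 830 candidates and 1,377,800 CK distance computations quoted for the running example in the next subsection:

```python
def num_candidates(widths, lmin, lmax, step=8):
    """Candidate count: for each spectrogram of width M in P and each
    candidate length l (multiples of `step`), there are M - l + 1 windows."""
    return sum(m - l + 1
               for m in widths
               for l in range(lmin, lmax + 1, step))

def num_ck_calls(widths_p, widths_d, lmin, lmax, step=8):
    """Each candidate of length l is slid across every same-length window
    of every spectrogram in D = P ∪ U."""
    return sum((m - l + 1) * sum(w - l + 1 for w in widths_d)
               for m in widths_p
               for l in range(lmin, lmax + 1, step))

# Running example: ten two-second files in P, ten more in U; Lmin = Lmax = 16.
# A width of 98 frames per file is an assumption, not a figure from the paper.
P_widths = [98] * 10
D_widths = [98] * 20
print(num_candidates(P_widths, 16, 16))       # 830
print(num_ck_calls(P_widths, D_widths, 16, 16))  # 1377800
```

Note how quickly the second figure grows: it is quadratic in the spectrogram widths, which motivates the pruning techniques developed in Section 3.4.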
14 Return bsfFingerprint For concreteness, let us consider the following small In the subroutine CheckCandidates(), shown in Table dataset, which we will also use as a running example to 3, we compute the order line L according to the explain our search algorithms in the following sections. distance from the sound sequence to the candidate We created a small dataset with P containing ten two- computed in minCKdist() procedure, which is shown in second sound files from Atlanticus dorsalis (Gray Table 4. In essence, this is the procedure illustrated in shieldback), and U containing ten two-second sound Figure 2. Given L, we can find the optimal split point files from other random insects. If we just consider (definition 8) in lines 10 to 15 by calculating all fingerprints of length 16 (i.e. Lmin= Lmax = 16), then possible splitting points and recording the best. even in this tiny dataset there are 830 candidate While the splitting point can be any point on the fingerprints to be tested, requiring 1,377,800 calls to positive real number line, we note that the information the CK distance function. gain cannot change in the region between any two 3.3 A Brute-Force Algorithm adjacent points. Thus, we can exploit this fact to For ease of exposition, we begin by describing the produce a finite set of possible split positions. In brute force algorithm for finding the sound fingerprint particular, we need only test |D|-1 locations. for a given species and later consider some techniques In the subroutine CheckCandidates() this is achieved to speed this algorithm up. by only checking the mean value (the “halfway point”) The brute force algorithm is described in Table 2. We of each pair of adjacent points in the distance ordering are given a dataset D, in which each sound sequence is as the possible positions for the split point. 
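The enumeration performed by SoundFP_Discovery() (Table 2) is easy to sketch. The following is an illustrative Python sketch rather than our actual implementation: spectrograms are stood in for by plain Python lists, the `check` callback abstracts CheckCandidates(), and candidate lengths advance in steps of eight, reflecting the MPEG-1 block-size observation above.

```python
def sound_fp_discovery(P, U, l_min, l_max, check):
    """Brute-force fingerprint search (sketch of Table 2).
    P, U     : lists of "spectrograms" (here, plain sequences)
    check    : callback standing in for CheckCandidates(); returns a gain
    l_max    : None means "no upper limit" (bounded by each sequence)."""
    bsf_gain, bsf_fingerprint = 0.0, None
    for S in P:                                   # every spectrogram in P
        top = len(S) if l_max is None else min(l_max, len(S))
        for l in range(l_min, top + 1, 8):        # lengths in steps of 8
            for k in range(len(S) - l + 1):       # every start position
                candidate = S[k:k + l]
                gain = check(candidate, P, U)
                if gain > bsf_gain:               # track best-so-far
                    bsf_gain, bsf_fingerprint = gain, candidate
    return bsf_fingerprint, bsf_gain
```

For instance, if each two-second spectrogram were 98 columns wide, ten sequences searched at Lmin = Lmax = 16 would generate 10 × 83 = 830 candidates, matching the count in the running example above.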
In CheckCandidates(), we call the subroutine minCKdist() to find the best matching subsequence for the candidate under consideration.

Table 3: Check the Utility of a Single Candidate
CheckCandidates(D or Dist, candidate Sk,l)
Require: A dataset D of spectrograms (or a distance ordering Dist); sound fingerprint candidate Sk,l
Ensure: Information gain gain
1   L ← ∅
2   If first input is D
3     For j ← 1 to |D| do        {compute the distance of every spectrogram to the candidate sound fingerprint Sk,l}
4       dist ← minCKdist(Dj, Sk,l)
5       insert Dj into L by the key dist
6     EndFor
7   Else
8     dist ← Dist
9   EndIf
10  I(D) ← information of D before the split, computed by Definition 7
11  For split ← 1 to |D| − 1 do
12    Count N1, N2 for both partitions
13    I′(D).split ← information after the split, computed by Definition 7
14    gain(D) ← max(I(D) − I′(D).split)
15  EndFor
16  Return gain(D)

We do this for every spectrogram in D, including the one from which the candidate was culled. This explains why in each order line at least one subsequence is at zero (cf. Figure 2 and Figure 7). In minCKdist() (Table 4), we use the CK measure [6] as the distance measurement between a candidate fingerprint and a generally much longer spectrogram.

Table 4: Compute Minimum Subsequence CK Distance
minCKdist(Dj, candidate Sk,l)
Require: A sound sequence's spectrogram Dj; sound fingerprint candidate Sk,l
Ensure: Return the minimum distance computed by CK
1   minDist ← Infinity
2   For i ← 1 to |Dj| − |Sk,l| + 1 do      {every start position}
3     CKdist ← CK(Dj,i, Sk,l)
4     If CKdist < minDist
5       minDist ← CKdist
6     EndIf
7   EndFor
8   Return minDist

In Figure 6 we show a trace of the brute-force algorithm on the Atlanticus dorsalis problem.

Figure 6: A trace of the value of the bsf_Gain variable during brute-force search on the Atlanticus dorsalis dataset. Only sound fingerprints of length 16 are considered here for simplicity

Note that the search continues even after an information gain of one is achieved, in order to break ties. The 1,377,800 calls to the CK function dominate the overall cost of the search algorithm (99% of the CPU time is spent on them) and require approximately 8 hours. This is not an unreasonable amount of time, considering the several days of effort needed for an entomologist to collect the data in the field. However, this is a tiny dataset, and we wish to examine datasets that are orders of magnitude larger. Thus, in the next section we consider speedup techniques.

3.4 Admissible Entropy Pruning
The most expensive computation in the brute-force search algorithm is obtaining the distances between the candidates and their nearest matching subsequences in each of the objects in the dataset. The information gain computations (including the tie-breaking computations) are inconsequential in comparison. Therefore, our intuition in speeding up the brute-force algorithm is to eliminate as many distance computations as possible.

Recall that in our algorithm, we have to obtain the annotated linear ordering of all the candidates in P. As we are incrementally doing this, we may notice that a particular candidate looks very unpromising. Perhaps when we measure the distance from the current candidate to the first object in U we find that it is a small number (recall that we want the distances to P to be small and to U large), and when we measure the distance to the next object in U we again find it to be small. Must we continue testing this unpromising candidate? Fortunately, the answer may be "no": under some circumstances we can admissibly prune unpromising fingerprints without having to check all the objects in the universe U.

The key observation is that we can cheaply compute an upper bound on the information gain of the current partially computed linear ordering at any time. If the upper bound we obtain is less than the best-so-far information gain (i.e., the bsf_Gain of Table 2), we can simply eliminate the remaining distance computations in U and prune this particular fingerprint candidate from consideration.

To illustrate this pruning policy, we consider a concrete example. Suppose that during a search the best-so-far information gain is currently 0.590, and we are incrementally beginning to compute the sound fingerprint shown in Figure 2.
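Order lines and their optimal splits are simple to compute. The sketch below illustrates the split search of CheckCandidates() (lines 10 to 15 of Table 3); it is a minimal illustration with names of our own choosing, assuming base-2 entropy and the two labels 'P' and 'U':

```python
import math

def entropy(p_count, u_count):
    """Base-2 entropy of a bag with p_count objects from P, u_count from U."""
    n = p_count + u_count
    return -sum((c / n) * math.log2(c / n) for c in (p_count, u_count) if c)

def best_split_gain(order_line):
    """order_line: (distance, label) pairs, label 'P' or 'U'.
    Returns (gain, split_value) for the best of the |D|-1 halfway splits."""
    order_line = sorted(order_line)
    labels = [lab for _, lab in order_line]
    total_p, total_u = labels.count('P'), labels.count('U')
    h_before = entropy(total_p, total_u)
    n = len(labels)
    best_gain, best_split = 0.0, None
    left_p = left_u = 0
    for i in range(n - 1):                        # the |D|-1 split locations
        if labels[i] == 'P':
            left_p += 1
        else:
            left_u += 1
        left, right = i + 1, n - i - 1
        h_after = (left / n) * entropy(left_p, left_u) \
                + (right / n) * entropy(total_p - left_p, total_u - left_u)
        if h_before - h_after > best_gain:
            best_gain = h_before - h_after
            # split halfway between adjacent distances, as in the text
            best_split = (order_line[i][0] + order_line[i + 1][0]) / 2
    return best_gain, best_split
```

Only the halfway points between adjacent distances are examined, since the information gain cannot change between two adjacent points on the order line.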

Assume that the partially computed linear ordering is shown in Figure 7: we have computed the distances to all five objects in P, and to just the first two objects in U.

Figure 7: The order line of all the objects in P and just the first two objects in U

Is it possible that this candidate will yield a score better than our best-so-far? It is easy to see that the most optimistic case (i.e., the upper bound) occurs if all of the remaining objects in U map to the far right, as we illustrate in Figure 8.

Figure 8: The logically best possible order line based on the distances that have been calculated in Figure 7. The best split point is shown by the yellow/heavy line

Note that of the three objects on the left side of the split point, all three are from P. Of the six objects on the right side, two are from P and four are from U.
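This optimistic completion can be computed mechanically. The sketch below (illustrative names of our own choosing; base-2 entropy assumed) appends every unmeasured object in U to the far right of the partial order line and returns the best achievable split gain, i.e., the admissible upper bound:

```python
import math

def entropy(p, u):
    """Base-2 entropy of a bag with p objects from P and u from U."""
    n = p + u
    return -sum((c / n) * math.log2(c / n) for c in (p, u) if c)

def gain_upper_bound(partial_line, remaining_u):
    """partial_line: (distance, label) pairs computed so far, label 'P'/'U'.
    remaining_u: number of objects in U not yet measured.
    Optimistically place every unmeasured U beyond the rightmost distance
    (the best case for the candidate), then return the best split gain."""
    far = max(d for d, _ in partial_line) + 1.0
    line = sorted(partial_line) + [(far + i, 'U') for i in range(remaining_u)]
    labels = [lab for _, lab in line]
    tp, tu = labels.count('P'), labels.count('U')
    h_before = entropy(tp, tu)
    n, lp, lu, best = len(labels), 0, 0, 0.0
    for i in range(n - 1):                      # every split location
        if labels[i] == 'P':
            lp += 1
        else:
            lu += 1
        h_after = ((i + 1) / n) * entropy(lp, lu) + \
                  ((n - i - 1) / n) * entropy(tp - lp, tu - lu)
        best = max(best, h_before - h_after)
    return best
```

For the configuration just described (five objects from P, two measured and two unmeasured objects from U), the bound comes out to about 0.379, below the best-so-far of 0.590, so the candidate can be abandoned without further distance computations.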
Given this, the weighted entropy after the split of the hypothetical order line shown in Figure 8 is:

(3/9)[-(3/3)log(3/3)] + (6/9)[-(4/6)log(4/6) - (2/6)log(2/6)] = 0.612    (logs base 2)

Since the entropy of the unsplit order line (five objects from P, four from U) is 0.991, the best possible information gain we could obtain from the example shown in Figure 7 is 0.991 − 0.612 = 0.379, which is lower than the best-so-far information gain of 0.590. In this case, we do not have to consider the ordering of the remaining objects in U. In this toy example we have only pruned two invocations of the CheckCandidates() subroutine shown in Table 3. However, as we shall see, this simple idea can prune more than 95% of the calculations for more realistic problems.

The formal algorithm for admissible entropy pruning is shown in Table 5. After the very first sound fingerprint candidate check, for all the remaining candidates we can simply insert EntropyUBPrune() at line 4 of Table 3, and eliminate the remaining CK distance and information gain computations if the current candidate satisfies the pruning condition discussed in this section. EntropyUBPrune() takes the best-so-far information gain, the current distance ordering from classes P and U, and the remaining objects in U, and returns the fraction of the distance measurements actually computed, so that we can see how much elimination we achieved.

Table 5: Entropy Upper Bound Pruning
EntropyUBPrune(Um, currentDist, Sk,l, bsf_Gain)
Require: A sound sequence's spectrogram Um; the current distance ordering; sound fingerprint candidate Sk,l; the best-so-far information gain
Ensure: Return the fraction of distance computations in U
1   fraction ← 0
2   counter ← 0
3   rightmostDist ← largest distance value in currentDist + 1
4   bestDist ← add rightmostDist for Um to currentDist
5   gain ← CheckCandidates(bestDist, Sk,l)
6   If gain > bsf_Gain
7     Return False and increment counter
8   Else
9     Return True
10  EndIf
11  Return fraction ← counter/|U|, gain

We can get a hint as to the utility of this optimization by revisiting the Atlanticus dorsalis problem we considered above. Figure 9 shows the difference entropy pruning makes on this problem.

Figure 9: A trace of the bsf_Gain variable during brute-force and entropy-pruning search on the Atlanticus dorsalis dataset

Note that not only does the algorithm terminate earlier (with the exact same answer), but it also converges faster, a useful property if we wish to consider the algorithm in an anytime framework [27].

3.5 Euclidean Distance Ordering Heuristic
In both the brute-force algorithm and the entropy-based pruning extension introduced in the last section, we generate and test candidates from left to right and top to bottom, based on the given lexical order of the objects' labels (i.e., the file names used by the entomologist). There are clearly other possible orders we could use to search, and it is equally clear that for entropy-based pruning some orders are better than others. In particular, if we find a candidate with a relatively high information gain early in the search, our pruning strategy can prune much more effectively.

However, this idea appears to open a "chicken and egg" paradox: how can we know the best order until we have finished the search? Clearly, we cannot. However, we do not need to find the optimal ordering; we just need to encounter a relatively good candidate relatively early in the search. Table 6 outlines our idea to achieve this. We simply run the entire brute-force search using the Euclidean distance as a proxy for the CK distance, and sort the candidates based on the information gain achieved under the Euclidean distance. Concretely, we can insert EuclideanOrder() between lines 4 and 5 of Table 2 to obtain a better order in which to check the candidates.

Table 6: Euclidean Distance Measure Order Pruning
EuclideanOrder(D, minLen, maxLen)
Require: A dataset D (P and U) of sound sequences' spectrograms; user-defined minimum/maximum lengths of the sound fingerprint
Ensure: Return the new order of candidates
1   Replace the CK measure with the Euclidean distance measure
2   newGain ← CheckCandidates(D, Sk,l) for each candidate Sk,l
3   newOrder ← sort the candidates by decreasing newGain
4   Return newOrder

Running this preprocessing step adds some overhead; however, it is inconsequential because the Euclidean distance is at least two orders of magnitude faster than the CK distance calculation. For this idea to work well, the Euclidean distance must be a good proxy for the CK distance. To see if this is the case, we randomly extracted 1,225 pairs of insect sounds and measured the distance between them under both measures, using the two values to plot points in a 2D scatter plot, as shown in Figure 10. The results suggest that the Euclidean distance is a very good surrogate for the CK distance. To measure the effect of this reordering heuristic, we revisited our running example shown in Figure 6/Figure 9.
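The preprocessing pass of Table 6 amounts to scoring every candidate with the cheap surrogate and sorting. The sketch below is illustrative only: `gain_fn` stands in for the information gain computation of CheckCandidates(), and plain Euclidean distance replaces the CK measure, as in line 1 of Table 6.

```python
import math

def euclidean_dist(a, b):
    """Euclidean distance between two equal-length patches."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_subseq_dist(seq, cand):
    """Analogue of minCKdist() (Table 4), using the cheap surrogate:
    distance of the best matching window of seq to the candidate."""
    L = len(cand)
    return min(euclidean_dist(seq[i:i + L], cand)
               for i in range(len(seq) - L + 1))

def euclidean_order(candidates, D_P, D_U, gain_fn):
    """Sketch of Table 6: score every candidate with the Euclidean
    surrogate, then return the candidates in decreasing order of
    surrogate gain, so the expensive CK-based search meets promising
    candidates early. gain_fn(dists_P, dists_U) abstracts the
    information gain computation."""
    scored = [(gain_fn([min_subseq_dist(s, c) for s in D_P],
                       [min_subseq_dist(s, c) for s in D_U]), c)
              for c in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [c for _, c in scored]
```

A candidate that matches P closely and U poorly under the surrogate is moved to the front of the queue; the CK-based search itself is unchanged, so the final answer is identical.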

Figure 10: The relationship between Euclidean and CK distance for 1,225 pairs of spectrograms

The effect of the Euclidean distance reordering heuristic is shown in Figure 11.

Figure 11: A trace of the value of the bsf_Gain variable during brute-force, entropy-pruning, and reordering-optimized search on the Atlanticus dorsalis dataset

As we can see, our heuristic has two positive effects. First, the absolute time to finish (with the identical answer as a brute-force search) has significantly decreased. Second, we converge on a high-quality solution faster; this is a significant advantage if we wanted to cast the search problem as an anytime algorithm [27]. As impressive as the speedup results are, as we shall show in the next section, they are pessimistic due to the small size of our toy problem.

4. EXPERIMENTAL EVALUATION
We have created a supporting webpage [11] which contains all the code and data used in this work. Moreover, the webpage contains additional experiments, along with videos and sound files that allow the interested reader to get a better appreciation of the scale and complexity of the data we are working with.

4.1 CK as a Tool for Taxonomy
We begin by noting that, beyond the utility of our ideas for monitoring wildlife, the CK measure may be useful as a taxonomic tool. Consider the insect shown in Figure 12. As noted in a National Geographic article, "the sand field cricket (Gryllus firmus) and the southeastern field cricket (Gryllus rubens) look nearly identical and inhabit the same geographical areas" [25]. Thus, even if handling a living specimen, most entomologists could not tell them apart without resorting to DNA analysis.

We suspected that we might be able to tell them apart by sound4. While we do not have enough data to do forceful and statistically significant experiments, we can do two tentative tests. As shown in Figure 12, we projected twenty-four examples from the two species into two-dimensional space using multidimensional scaling, and we also clustered eight random examples, four from each class.

Figure 12: top left) An insect found in Florida: is it a G. rubens or G. firmus? top right) Projecting one-second snippets of songs from both insects into 2D space suggests they are almost linearly separable, a possibility reflected by their clusterability (bottom)

The results suggest that these congeneric5 species are almost linearly separable in two-dimensional space (they are linearly separable in three-dimensional space).

4.2 Insect Classification
There are currently no benchmark problems for insect classification. Existing datasets are either too small to make robust claims about accuracy, or were created by authors unwilling to share their data. To redress this, we created and placed into the public domain a large classification dataset [11].
The data consists of twenty species of insects, eight of which are Gryllidae (crickets) and twelve of which are Tettigoniidae (katydids)6. Thus, we can treat the problem as either a twenty-class species-level problem or a two-class genus-level problem. For each class we have ten training and ten testing examples. It is important to note that we assembled these datasets before attempting classification, explicitly to avoid cherry-picking. Note that because of convergent evolution, mimicry, and the significant amounts of noise in the data (which were collected in the field), we should not expect perfect accuracy here. Moreover, this group of insects requires some very subtle distinctions to be made; for example, Neoconocephalus bivocatus, Neoconocephalus retusus, and Neoconocephalus maxillosus are obviously in the same genus, and are visually indistinguishable, at least to our untrained eyes. Likewise, we have multiple representatives from both the Belocephalus and Atlanticus genera.

4 In brief, it is well known that the acoustic behavior of insects is important in insect speciation, especially for sympatric speciation, where new species evolve from a single ancestral species while inhabiting the same geographic region [26].
5 Species belonging to the same genus are congeneric.
6 A full description of the data is at [11].

We learned twenty sound fingerprints using the algorithm in Section 3. We then predicted each testing exemplar's class label by sliding each fingerprint across the exemplar and recording the fingerprint that produced the minimum value as the exemplar's nearest neighbor.
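The prediction rule just described is a one-nearest-neighbor rule over fingerprints. A sketch, with hypothetical names of our own choosing, where `min_dist` abstracts the minCKdist() subroutine of Table 4:

```python
def classify(exemplar, fingerprints, min_dist):
    """Predict the label of a test exemplar: slide every learned
    fingerprint across the exemplar's spectrogram and return the label
    of the fingerprint whose best-matching window is closest.
    fingerprints : dict mapping class label -> fingerprint
    min_dist     : min_dist(exemplar, fp) gives the minimum
                   sliding-window distance (stand-in for minCKdist)."""
    return min(fingerprints,
               key=lambda label: min_dist(exemplar, fingerprints[label]))
```

Note that only the minimum distance matters here; the per-species detection thresholds of the monitoring experiments are deliberately ignored in this classification scheme.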

The classification accuracies are shown in Table 7.

Table 7: Insect Classification Accuracy
              species-level problem       genus-level problem
              default rate  fingerprint   default rate  fingerprint
10 species    0.10          0.70          0.70          0.93
20 species    0.05          0.44          0.60          0.77

The results are generally impressive. For example, in the ten-class species-level problem the default accuracy rate is only 10%, but we can achieve 70%. It is worth recalling the following when considering these results.
 The testing data does not consist of carefully extracted single utterances of an insect's call. Rather, it consists of one- or two-minute sound files known to contain at least one call, together with human voice annotations and miscellaneous environmental sounds that can confuse the classification algorithm.
 As noted above, our dataset has multiple congeneric species that, at least to our eyes and ears, look and sound identical. This is an intrinsically hard problem.
 The reader can be assured that the results are not due to overfitting, because we did not fit any parameters in this experiment7. These are "black box" results.
 We can do a little better by weighting the nearest-neighbor information with the threshold information (which we ignore in the above). Since this does introduce a (weighting) parameter to be tuned, in the interest of brevity, given page limits and our already excellent results, we defer such discussions to future work.

4.3 Monitoring with Sound Fingerprints
To test our ability to monitor an audio stream in real time for the presence of a particular species of insect, we learned the sound fingerprints of three insect species native to Florida. In each case we learned from training sets consisting of ten insects. To allow visual appreciation of our method, as shown in Figure 13 we produced an eight-second sequence of audio by concatenating snippets of four different species, including holdout (i.e., not seen in the training set) examples of our three species of interest. While each fingerprint has a different threshold, for simplicity and visual clarity we show just the averaged threshold.

As we can see in Figure 13, this method achieves three true positives and, more remarkably, no false positives. Recall that the CK distance measure exploits the compression technique used by MPEG video encoding, which is among the most highly optimized computer code available. Thus, we can do this monitoring experiment in real time, even on an inexpensive laptop.

Figure 13: (Image best viewed in color) far left) Three insect sound fingerprints are used to monitor an eight-second audio clip. In each case, the fingerprint distance to the sliding window of audio dips below the threshold as the correct species sings, but not when a different species is singing

4.4 Scalability of Fingerprint Discovery
Recall the experiments shown in Section 3, where our toy example had only ten objects in both P and U. We showed a speedup of about a factor of five, although we claimed this is pessimistic because we expect to be able to prune more aggressively with larger datasets. To test this, we reran these experiments with a more realistically sized U containing 200 objects from other insects, birds, trains, helicopters, etc. As shown in Figure 14, the speedup achieved by our reordering optimization algorithm is a factor of 93 in this case.

Figure 14: A trace of the value of the bsf_Gain variable during brute-force, entropy-pruning, and reordering-optimized search on the Atlanticus dorsalis dataset with the 200-object universe

4.5 Archived Experiments
Page limitations make it difficult to show all our extensive empirical work. We urge the interested reader to visit [11], where we have more than one hundred additional experiments. Among the questions we consider are: Do sound fingerprints work for other vocal animals such as frogs and birds? (yes); Is our method robust to mislabeled training data? (so long as the majority of data in P is correctly labeled, yes); Are we robust to noisy environments, such as aircraft noise in the background as a monitored cricket chirps? (so long as the target signal is a significant fraction of the overall signal, yes).
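The threshold-based monitoring of Section 4.3 reduces to a thresholded sliding-window scan. A sketch with illustrative names of our own choosing, where `dist` abstracts the CK distance and the per-species threshold is assumed to have been learned beforehand:

```python
def monitor(stream, fingerprint, threshold, dist):
    """Scan an audio stream's spectrogram with one species' fingerprint.
    Returns the window start positions where the distance dips below the
    species' threshold, i.e., the reported detections."""
    L = len(fingerprint)
    return [i for i in range(len(stream) - L + 1)
            if dist(stream[i:i + L], fingerprint) < threshold]
```

Because each window requires only one distance computation against each monitored fingerprint, the scan keeps pace with the incoming audio, consistent with the real-time claim above.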
7 The minimum fingerprint length is set to 16, a hard limit due to the way MPEG is coded. The maximum length is set to infinity.

Moreover, at [11] we have embedded sound and video files to allow a more direct appreciation of the subtlety of the distinctions our system can make.

5. CONCLUSION AND FUTURE WORK
In this work we have introduced a novel bioacoustic recognition/classification framework. We feel that, unlike other work in this area, our ideas have a real chance to be adopted by domain practitioners, because our algorithm is essentially a "black box", requiring only that the expert can label some data. We have shown through extensive experiments that our method is accurate, robust, and efficient enough to be used in real time in the field.

In future work we will explore expanding the representational power of sound fingerprints with logical operators, such as classifying a sound as feline if we hear a "hiss" OR a "mew". We are also beginning to explore the spatiotemporal data mining problems inherent in monitoring large sites with multiple sensors.

ACKNOWLEDGMENTS
We thank the Cornell Lab of Ornithology for donating much of the data used in [16], Jesin Zakaria for help projecting snippets of sounds into 2D space, and Dr. Agenor Mafra-Neto for entomological advice. This work was funded by NSF awards 0803410 and 0808770.

6. REFERENCES
[1] R. Bardeli, Similarity search in animal sound databases, IEEE Transactions on Multimedia, 11(1): 68-76, 2009.
[2] Y. Beiderman, Y. Azani, Y. Cohen, C. Nisankoren, M. Teicher, V. Mico, J. Garcia, Z. Zalevsky, Cleaning and quality classification of optically recorded voice signals, Recent Patents on Signal Processing, 6-11, 2010.
[3] F. Bianconi, A. Fernandez, Evaluation of the effects of Gabor filter parameters on texture classification, Pattern Recognition, 40(12): 3325-35, 2007.
[4] D. T. Blumstein et al., Acoustic monitoring in terrestrial environments using microphone arrays: applications, technological considerations, and prospectus, Journal of Applied Ecology, 48: 758-767, 2011.
[5] J. C. Brown, P. Smaragdis, Hidden Markov and Gaussian mixture models for automatic call classification, J. Acoust. Soc. Am., 125(6): EL221-224, 2009.
[6] B. J. L. Campana, E. J. Keogh, A compression-based distance measure for texture, Statistical Analysis and Data Mining, 3(6): 381-398, 2010.
[7] T. Dang, N. Bulusu, W. C. Feng, W. Hu, RHA: A robust hybrid architecture for information processing in wireless sensor networks, in Proc. 6th ISSNIP, 2010.
[8] C. Dietrich, F. Schwenker, G. Palm, Classification of time series utilizing temporal and decision fusion, in Proc. Multiple Classifier Systems (MCS), pp. 378-87, 2001.
[9] A. Fu, E. Keogh, L. Lau, C. A. Ratanamahatana, R. C.-W. Wong, Scaling and time warping in time series querying, VLDB Journal, 17(4): 899-921, 2008.
[10] N. C. Han, S. V. Muniandy, J. Dayou, Acoustic classification of Australian anurans based on hybrid spectral-entropy approach, Applied Acoustics, 72(9): 639-645, 2011.
[11] Y. Hao, Animal sound fingerprint webpage, www.cs.ucr.edu/~yhao/animalsoundfingerprint.html
[12] T. E. Holy, Z. Guo, Ultrasonic songs of male mice, PLoS Biology, 3: e386, 2005.
[13] E. Keogh, S. Lonardi, C. A. Ratanamahatana, L. Wei, S. Lee, J. Handley, Compression-based data mining of sequential data, Data Mining and Knowledge Discovery, 14(1): 99-129, 2007.
[14] J. A. Kogan, D. Margoliash, Automated recognition of bird song elements from continuous recordings using dynamic time warping and hidden Markov models: a comparative study, J. Acoust. Soc. Am., 103(4): 2185-2196, 1998.
[15] M. Li, X. Chen, X. Li, B. Ma, P. Vitanyi, The similarity metric, in Proc. 14th Symposium on Discrete Algorithms, pp. 863-72, 2003.
[16] Macaulay Library, Cornell Lab of Ornithology, www.macaulaylibrary.org/index.do
[17] R. Mankin, D. Hagstrum, M. Smith, A. Roda, M. Kairo, Perspective and promise: a century of insect acoustic detection and monitoring, American Entomologist, 57: 30-44, 2011.
[18] M. Marcarini, G. A. Williamson, L. de S. Garcia, Comparison of methods for automated recognition of avian nocturnal flight calls, in Proc. ICASSP, pp. 2029-32, 2008.
[19] D. K. Mellinger, C. W. Clark, Recognizing transient low-frequency whale sounds by spectrogram correlation, J. Acoust. Soc. Am., 107(6): 3518-29, 2000.
[20] D. Mitrovic, M. Zeppelzauer, C. Breiteneder, Discrimination and retrieval of animal sounds, in Proc. IEEE Multimedia Modelling Conference, Beijing, China, pp. 339-343, 2006.
[21] A. Celis-Murillo, J. L. Deppe, M. F. Allen, Using soundscape recordings to estimate bird species abundance, richness, and composition, Journal of Field Ornithology, 80: 64-78, 2009.
[22] D. J. Nowak, J. E. Pasek, R. A. Sequeira, D. E. Crane, V. C. Mastro, Potential effect of Anoplophora glabripennis on urban trees in the United States, Journal of Economic Entomology, 94(1): 116-122, 2001.
[23] J. B. Panksepp, K. A. Jochman, J. U. Kim, J. J. Koy, E. D. Wilson, Q. Chen, C. R. Wilson, G. P. Lahvis, Affiliative behavior, ultrasonic communication and social reward are influenced by genetic variation in adolescent mice, PLoS ONE, 4: e351, 2007.
[24] K. Riede, F. Nischk, C. Thiel, F. Schwenker, Automated annotation of Orthoptera songs: first results from analysing the DORSA sound repository, Journal of Orthoptera Research, 15(1): 105-113, 2006.
[25] J. Roach, Cricket, katydid songs are best clues to species' identities, National Geographic News, news.nationalgeographic.com/news/2006/09/060905-crickets.html
[26] M. M. Wells, C. S. Henry, Songs, reproductive isolation, and speciation in cryptic species of insects: a case study using green lacewings, in Endless Forms: Species and Speciation, Oxford University Press, NY, 1998.
[27] X. Xi, K. Ueno, E. Keogh, D. J. Lee, Converting non-parametric distance-based classification to anytime algorithms, Pattern Analysis and Applications, 11(3-4): 321-36, 2008.
[28] L. Ye, E. Keogh, Time series shapelets: a new primitive for data mining, in Proc. 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 947-956, 2009.
[29] G. Yu, J.-J. Slotine, Audio classification from time-frequency texture, in Proc. IEEE ICASSP, pp. 1677-80, 2009.