Monitoring and Mining Sounds in Visual Space

Yuan Hao

Dept. of Computer Science & Engineering, University of California, Riverside

Task

• Monitoring animals by examining the sounds they produce
• Build an animal sound recognition/classification framework

[Spectrogram: Common Virtuoso Katydid (Amblycorypha longinicta), forty seconds, frequency 0–3 kHz]

Outline

• Motivation
• Our approach
• Experimental evaluation
• Conclusion & future work

Motivation: Applications

Monitoring animals:

Outdoors
• The density and variety of animal sounds can act as a measure of biodiversity

Laboratory setting
• Researchers create control groups of animals, expose them to different settings, and test for different outcomes

Commercial application: Acoustic animal detection can save money

Motivation: Difficulties

Most current bioacoustic classification tools have significant limitations. They…
• require careful tuning of many parameters
• are too computationally expensive for sensors
• are not accurate enough
• are too specialized

Related Work

• Dietrich et al. (MCS 01): several classification methods for sounds
  – Preprocessing and complicated feature extraction
  – Up to eighteen parameters
  – Learned on a data set containing just 108 exemplars

• Brown et al. (J. Acoust. Soc. 09): analyze Australian anurans (frogs and toads)
  – Identify the species of the frogs with an average accuracy of 98%
  – Requires extracting features from syllables
  – "Once the syllables have been properly segmented, a set of features can be calculated to represent each syllable"

Outline

• Motivation
• Our approach
  – Visual space: spectrogram
  – CK distance measure
  – Sound fingerprint searching
• Experimental evaluation
• Conclusion & future work

Intuition of Our Approach

• Classify animal sounds in the visual space by treating the texture of their spectrograms as an "acoustic fingerprint", using a recently introduced parameter-free texture measure as the distance measure

[Figure: a one-second subset of a common cricket's sound spectrogram, which can be considered the "fingerprint" for this sound]

Our Approach

[Diagram: overview of the search: candidate fingerprints with lengths between minLen and maxLen are extracted from the positive set P and evaluated against the universe U; a threshold of T = 0.43 is shown]

Visual Space

Spectrogram:
• Algorithmic analysis is needed instead of manual inspection
• Significant noise artifacts
• Avoid any type of data cleaning or explicit feature extraction, and use the raw spectrogram
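As a rough, hedged illustration of producing such a raw spectrogram (this is not the authors' code), the sketch below uses SciPy; the file name and FFT parameters are placeholder assumptions.

```python
# Minimal sketch (not the authors' code): turn a mono recording into a raw
# spectrogram that can be treated as a texture image. The file name and the
# FFT parameters below are illustrative assumptions.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

def sound_to_spectrogram(wav_path, nperseg=256, noverlap=128):
    rate, samples = wavfile.read(wav_path)
    if samples.ndim > 1:                       # mix stereo down to mono
        samples = samples.mean(axis=1)
    freqs, times, sxx = spectrogram(samples, fs=rate,
                                    nperseg=nperseg, noverlap=noverlap)
    # Log scaling keeps quiet harmonics visible; no other cleaning is done.
    return freqs, times, 10.0 * np.log10(sxx + 1e-12)

# Example (hypothetical file): freqs, times, img = sound_to_spectrogram("katydid.wav")
```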

[Spectrogram: Common Virtuoso Katydid (Amblycorypha longinicta), forty seconds, frequency 0–3 kHz]

CK Distance Measure

d_CK(x, y) = ( C(x|y) + C(y|x) ) / ( C(x|x) + C(y|y) ) - 1

• A distance measure of texture similarity
• Robustly extracting features from noisy field recordings is non-trivial
• Expands the scope of compression-based similarity measurements to real-valued images by exploiting the compression technique used by MPEG video encoding
• Effective on images as diverse as moths, nematodes, wood grains, tire tracks, etc. (SDM 10)
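The published CK measure obtains the C(·|·) terms from MPEG video compression of image pairs; the sketch below only illustrates the formula above, substituting zlib as a stand-in compressor, so it approximates the idea rather than reproducing the actual implementation.

```python
# Sketch of the compression-based distance formula above. The real CK measure
# compresses image pairs with an MPEG video encoder; zlib is used here purely
# as a stand-in compressor to make the formula concrete.
import zlib
import numpy as np

def _c(a, b):
    """Rough analogue of C(a|b): compressed size of image a appended to b."""
    payload = np.concatenate([b.ravel(), a.ravel()]).astype(np.uint8).tobytes()
    return len(zlib.compress(payload, 9))

def ck_like_distance(x, y):
    """d(x, y) = (C(x|y) + C(y|x)) / (C(x|x) + C(y|y)) - 1."""
    return (_c(x, y) + _c(y, x)) / (_c(x, x) + _c(y, y)) - 1.0
```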

Sanity Check: CK as a Tool for Taxonomy

[Figure: CK-based layout of recordings of Gryllus firmus and Gryllus rubens]

A National Geographic article notes that "the sand field cricket (Gryllus firmus) and the southeastern field cricket (Gryllus rubens) look nearly identical and inhabit the same geographical areas."

Outline

• Motivation
• Our approach
  – Visual space: spectrogram
  – CK distance measure
  – Sound fingerprint searching
• Experimental evaluation
• Conclusion & future work

Difficulties

• Do not have carefully extracted prototypes for each class
  – Only have a collection of sound files
• Do not know the call duration
• Do not know how many occurrences of it appear in each file
• May have mislabeled data
• Noisy: most of the recordings are made in the wild

Example: Discrete Text Strings

Assume three observations that correspond to a particular species:

  P = {rrbbcxcfbb, rrbbfcxc, rrbbrrbbcxcbcxcf}

Given access to a universe of sounds that are known not to contain any example in P:

  U = {rfcbc, crrbbrcb, rcbbxc, rbcxrf, .., rcc}

Our task is equivalent to asking: is there a substring that appears only in P and not in U?


T1 = rrbb, T2 = rrbbc, T3 = cxc
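This discrete version can be checked by brute force in a few lines; the sets are the ones listed above (U is elided with ".." on the slide, so only the listed strings are used), and the minimum substring length is an arbitrary illustrative choice.

```python
# Brute-force version of the discrete analogy: list substrings that occur in
# every string of P but in no string of U. Only the U elements shown on the
# slide are included (the slide elides the rest), and min_len is arbitrary.
P = ["rrbbcxcfbb", "rrbbfcxc", "rrbbrrbbcxcbcxcf"]
U = ["rfcbc", "crrbbrcb", "rcbbxc", "rbcxrf", "rcc"]

def discriminative_substrings(P, U, min_len=2):
    candidates = {s[i:j]
                  for s in P
                  for i in range(len(s))
                  for j in range(i + min_len, len(s) + 1)}
    return sorted(t for t in candidates
                  if all(t in p for p in P) and not any(t in u for u in U))

print(discriminative_substrings(P, U))
```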

Case Studies

[Figure: six pairs of recordings of various insects; visually determined and extracted one-second similar regions, grouped into Tettigonioidea and Grylloidea and shown at a one-second scale]

One size does not fit all when it comes to the length of the sound sequence.

Sound Fingerprint

Given U and P:
• P contains examples only from the "positive" species class
• U contains non-target species sounds

Task: find a subsequence of one of the objects in P that is close to at least one subsequence in each element of P, but far from all subsequences in every element of U.

[Figure: a potential sound fingerprint]

Example

[Figure: a candidate being tested; the objects of P (1-5) and U (A-D) are ordered on a number line from 0 to 1 by their distance to the candidate, with a split point (threshold) separating the two groups]

We want to find a subsequence of one of the objects in P that is close to at least one subsequence in each element of P, but far from all subsequences in every element of U.

How Hard is This?

[Figure: the same candidate-testing illustration, with a split point (threshold) separating the objects of P and U on a number line from 0 to 1]

The number of candidates to evaluate is

  Σ_{Si ∈ P} Σ_{l = Lmin..Lmax} (Mi - l + 1)

where l is a candidate length, Mi is the length of sound sequence Si in P, and Lmin and Lmax are the user-defined minimum and maximum possible lengths of the sound fingerprint.
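The count is easy to sanity-check in code; the sequence lengths in the commented example are made-up values, measured (say) in spectrogram columns.

```python
# Number of candidate fingerprints, following the formula above:
# sum over sequences Si in P and lengths l in [Lmin, Lmax] of (Mi - l + 1).
def num_candidates(sequence_lengths, l_min, l_max):
    return sum(m - l + 1
               for m in sequence_lengths
               for l in range(l_min, l_max + 1)
               if m >= l)

# Example with made-up lengths: num_candidates([500, 800, 650], 50, 150)
```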

Brute-Force Search: Generate and Evaluate

Step 1: Given P and U, generate all possible subsequences of length m from the objects in P as the sound fingerprint candidates.
Step 2: Using a sliding window of the same size as the candidate, locate the minimum distance to the candidate for each object in P and U.
Step 3: Apply an evaluation mechanism for splitting the dataset into two groups.
Step 4: Report the sound fingerprint with the best splitting point, i.e., the one that produces the largest information gain separating the two classes.
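A minimal sketch of this generate-and-evaluate procedure is given below. It is not the authors' code: `distance` stands in for the CK measure, a single fixed candidate length is used for brevity, and each recording is treated as a 1-D sliceable sequence (with spectrograms one would slice columns instead).

```python
# Sketch of the brute-force search (Steps 1-4). `distance` stands in for the
# CK measure; sequences are assumed to be at least as long as the candidate.
import math

def entropy(n_pos, n_neg):
    total = n_pos + n_neg
    e = 0.0
    for n in (n_pos, n_neg):
        if n:
            e -= (n / total) * math.log2(n / total)
    return e

def best_split_gain(pos_dists, neg_dists):
    """Best information gain over all split points of the ordered distances."""
    before = entropy(len(pos_dists), len(neg_dists))
    items = sorted([(d, 1) for d in pos_dists] + [(d, 0) for d in neg_dists])
    best = 0.0
    for k in range(1, len(items)):
        left, right = items[:k], items[k:]
        lp, rp = sum(c for _, c in left), sum(c for _, c in right)
        after = (len(left) / len(items)) * entropy(lp, len(left) - lp) \
              + (len(right) / len(items)) * entropy(rp, len(right) - rp)
        best = max(best, before - after)
    return best

def min_subsequence_distance(candidate, sequence, distance):
    """Step 2: slide a window of the candidate's length along the sequence."""
    m = len(candidate)
    return min(distance(candidate, sequence[i:i + m])
               for i in range(len(sequence) - m + 1))

def brute_force_fingerprint(P, U, length, distance):
    """Steps 1, 3 and 4: try every candidate, keep the best information gain."""
    best_gain, best_candidate = -1.0, None
    for seq in P:
        for start in range(len(seq) - length + 1):
            cand = seq[start:start + length]
            pos = [min_subsequence_distance(cand, s, distance) for s in P]
            neg = [min_subsequence_distance(cand, s, distance) for s in U]
            gain = best_split_gain(pos, neg)
            if gain > best_gain:
                best_gain, best_candidate = gain, cand
    return best_candidate, best_gain
```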

Evaluation Mechanism

Step 3: Use information gain to evaluate candidate splitting rules.

E(D) = -p(X)log(p(X)) - p(Y)log(p(Y)), where X and Y are the two classes in D.

Gain = E(D) - E'(D), where E(D) and E'(D) are the entropy before and after partitioning D into D1 and D2, respectively.

E’(D) = f(D1)E(D1) + f(D2)E(D2) where f(D1) is the fraction of objects in D1, and f(D2) is the fraction of objects in D2.
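These formulas transcribe directly into code; the commented calls reproduce the numbers of the worked example on the next slide.

```python
# Direct transcription of the evaluation mechanism: E(D), E'(D), and Gain.
import math

def entropy(class_counts):
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c)

def information_gain(counts_d, counts_d1, counts_d2):
    total = sum(counts_d)
    e_after = (sum(counts_d1) / total) * entropy(counts_d1) \
            + (sum(counts_d2) / total) * entropy(counts_d2)
    return entropy(counts_d) - e_after

# Worked example on the next slide: nine objects, five from P and four from U;
# the split leaves (4 P, 0 U) on one side and (1 P, 4 U) on the other.
# entropy([5, 4])                            -> about 0.991
# information_gain([5, 4], [4, 0], [1, 4])   -> about 0.590
```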

Example

A total of nine objects, five from P, and four from U.

This gives us the entropy for the unsorted data [-(5/9)log(5/9)-(4/9)log(4/9)] = 0.991

[Figure: the candidate being tested; the nine objects are ordered by distance on a number line from 0 to 1, with a split point (threshold) between the objects of P and those of U]

Four objects from P are the only objects on the left side of the split point. Of the five objects to the right of the split point, we have four objects from U and just one from P:

E'(D) = (4/9)[-(4/4)log(4/4)] + (5/9)[-(4/5)log(4/5) - (1/5)log(1/5)] = 0.401

Information Gain = 0.991 - 0.401 = 0.590

Outline

• Motivation
• Our approach
  – Visual space: spectrogram
  – CK distance measure
  – Sound fingerprint searching
• Experimental evaluation
  – Brute-force search evaluation
  – Speedup and efficiency
• Conclusion & future work

Example

[Figure: the positive set P and universe U, the resulting distance ordering, the discovered sound fingerprint, and the distance values (0 to 0.7) as the fingerprint is scanned along a recording, with a recognition threshold]

A demonstration of the brute-force search algorithm and the discrimination ability of the CK measure: one short template of an insect sound is scanned along a long sound sequence that contains one example of the target sound plus three examples of commonly confused insect sounds.

P = Atlanticus dorsalis

[Figure: the distance ordering of the objects in P and U (distance values from 0 to 0.7) and the discovered sound fingerprint, together with a plot of the best-so-far information gain against the number of candidates evaluated; the brute-force search terminates after a running time of 7.5 hours]

Speedup by Entropy-based Pruning

[Diagram: the objects of P and U placed on number lines from 0 to 1 by their distance to the candidate]

Before split: -(5/9)log(5/9) - (4/9)log(4/9) = 0.991
After split: (3/9)[-(3/3)log(3/3)] + (6/9)[-(4/6)log(4/6) - (2/6)log(2/6)] = 0.612

Upper-bound information gain for this candidate = 0.991 - 0.612 = 0.379, which is less than the best-so-far information gain of 0.991 - 0.401 = 0.590, so the candidate can be abandoned early.
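The pruning test itself is a one-line comparison; the sketch below reuses the slide's numbers, and the after-split entropy (0.612) that yields the upper bound is simply taken as given, since computing that bound is part of the full algorithm.

```python
# Entropy-based pruning test using the numbers on this slide: abandon a
# candidate once an upper bound on its information gain falls below the
# best-so-far gain. The 0.401 and 0.612 entropies are taken from the slides.
import math

def entropy(class_counts):
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c)

e_before = entropy([5, 4])              # 0.991 for five P and four U objects
best_so_far_gain = e_before - 0.401     # 0.590, from the earlier example
upper_bound_gain = e_before - 0.612     # 0.379, for the current candidate

if upper_bound_gain < best_so_far_gain:
    print("prune this candidate early")  # 0.379 < 0.590, so we prune
```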

P = Atlanticus dorsalis

[Figure: the same search as before; with entropy-based pruning the search terminates after a running time of 1.9 hours, compared to 7.5 hours for brute-force search]

In brute-force search, we search left to right, top to bottom.

Is there a better order? How can we find a good candidate earlier?

The earlier we find a good candidate, the higher the information gain, and the more instances we can prune.

But how do we resolve this “chicken and egg” paradox?

Speedup intuition

• Euclidean distance is much faster than CK
• So let us use Euclidean distance to approximate the best search order for CK
• This will only work if Euclidean distance is a good proxy for CK… (next slide)
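A minimal sketch of this reordering idea follows: score every candidate cheaply with Euclidean distance first, then run the expensive CK evaluation in that order, so a strong candidate (and with it a strong pruning threshold) is likely to appear early. The scoring rule and the `evaluate_ck` callback are illustrative assumptions, not the authors' exact criterion.

```python
# Sketch of the reordering heuristic: a cheap Euclidean score suggests the
# order in which the expensive CK-based evaluation should visit candidates.
# The score below (gap between classes) and `evaluate_ck` are stand-ins.
import numpy as np

def euclidean_score(candidate, P, U):
    """Cheap proxy: candidates that sit close to P but far from U (under
    Euclidean distance) are assumed to be promising. Spectrograms are 2-D
    arrays (frequency x time) with matching numbers of frequency rows."""
    def min_dist(seq):
        m = candidate.shape[1]
        return min(np.linalg.norm(candidate - seq[:, i:i + m])
                   for i in range(seq.shape[1] - m + 1))
    return min(min_dist(s) for s in U) - max(min_dist(s) for s in P)

def ordered_search(candidates, P, U, evaluate_ck):
    """Visit candidates in decreasing order of the cheap score, keeping the
    best CK information gain found so far (usable as a pruning threshold)."""
    order = sorted(candidates, key=lambda c: euclidean_score(c, P, U),
                   reverse=True)
    best_gain, best_candidate = -1.0, None
    for cand in order:
        gain = evaluate_ck(cand, P, U, best_gain)   # may prune internally
        if gain > best_gain:
            best_gain, best_candidate = gain, cand
    return best_candidate, best_gain
```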

Euclidean Distance Measure Pruning

[Scatter plot: Euclidean distance (roughly 1,000 to 11,000) versus CK distance (0.4 to 1.0) for pairs of sound snippets, used to check whether Euclidean distance is a reasonable proxy for CK]

Performance of Optimization

[Figure: best-so-far information gain against the number of candidates evaluated; with all optimizations the search terminates in 1.1 hours, with entropy pruning alone in 1.9 hours, and with brute-force search in 7.5 hours]

Case Study (1)

[Figure: distance values as the fingerprint is scanned along a field recording, with the recognition threshold]

For a more visual understanding, please take a look at the video on YouTube.

Case Study (2)

[Figure: distance values and recognition threshold for the second case study]

Classification

Benchmark of insect classification: the data consists of twenty species of insects, eight of which are Gryllidae (crickets) and twelve of which are Tettigoniidae (katydids).

Problems: either a twenty-class species-level problem or a two-class genus-level problem.

Method: predict each testing exemplar's class label by sliding each fingerprint across it and recording the fingerprint that produces the minimum distance as the exemplar's nearest neighbor.
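A sketch of this nearest-fingerprint rule, with `distance` standing in for the CK measure and the data structures chosen purely for illustration:

```python
# Sketch of the classification rule described above: slide every labeled
# fingerprint across the test exemplar's spectrogram and predict the label of
# the fingerprint that attains the minimum distance. `distance` stands in for
# the CK measure; spectrograms are 2-D arrays (frequency x time).
def classify(test_spectrogram, fingerprints, distance):
    """fingerprints: list of (label, fingerprint_image) pairs."""
    best_label, best_dist = None, float("inf")
    for label, fp in fingerprints:
        width = fp.shape[1]
        for start in range(test_spectrogram.shape[1] - width + 1):
            window = test_spectrogram[:, start:start + width]
            d = distance(fp, window)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```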

Insect classification accuracy

                    species-level problem          genus-level problem
                    default rate   fingerprint     default rate   fingerprint
  10 species            0.10           0.70            0.70           0.93
  20 species            0.05           0.44            0.60           0.77

Scalability of Fingerprint Discovery

[Figure: information gain versus the number of calls to the CK distance measure; the search with the reordering optimization terminates first, then the entropy-pruning search, then the brute-force search]

To test the speedup beyond the toy problem shown earlier, we reran these experiments with a more realistically sized universe U containing 200 objects from other insects, birds, trains, helicopters, etc. The result is shown in the figure above.

Mislabeled Data Sanity Check

[Figure: two runs on the same P = Atlanticus dorsalis dataset, each showing the distance ordering of P and U (distance values from 0 to 0.7) and the discovered sound fingerprint. Left: all data assumed labeled correctly. Right: two instances in the positive class mislabeled.]

Mislabeled Data Sanity Check

[Figure: distance values (0 to 1) for the objects in P and U, with the recognition threshold, on the same dataset. Top: all data assumed labeled correctly. Bottom: two instances in the positive class mislabeled.]


Noise Background Experiment

[Figure: distance values and recognition thresholds under four conditions: no added noise, and added noise at -5 dB, -4 dB, and +5 dB]

Classification

                    species-level problem          genus-level problem
                    default rate   fingerprint     default rate   fingerprint
  10 species            0.10           0.70            0.70           0.93
  20 species            0.05           0.44            0.60           0.77

Twenty insect species datasets: eight of them are Gryllidae (crickets) and twelve of them are Tettigoniidae (katydids).


Other Animals: Frogs

[Figure: distance values and recognition threshold when the CK-based fingerprint approach is applied to frog sounds]

Outline

• Motivation
• Our approach
  – Visual space: spectrogram
  – CK distance measure
  – Sound fingerprint searching
• Experimental evaluation
  – CK as a tool for taxonomy
  – Speedup and efficiency
• Conclusion & future work

Conclusion & Future Work

• Our approach to analyzing insect sounds in visual space is parameter-free
• Our optimizations can speed up the brute-force search
• We will test more species and datasets
• We will further speed up the algorithm

Thank you

Code and Data: http://www.cs.ucr.edu/~yhao/animalsoundfingerprint.html
