THE UNIVERSITY OF MANCHESTER

BSC (HONS) COMPUTER SCIENCE AND MATHEMATICS WITH INDUSTRIAL EXPERIENCE

FINAL YEAR PROJECT

Bird Song Recognition

Author: Jake PALMER
Supervisor: Dr. Andrea SCHALK

April 2016

Abstract

This project investigates the viability of, and potential approaches to, automatic bird song recognition by testing various methods on a handful of British birds. How to gather and process the data is detailed, and we discuss whether or not doing so is important. A novel approach to classification using archetypal bird songs is developed, with an augmented similarity measure using the dynamic time warping algorithm that takes into account the number of components in a song. The results, benefits, and drawbacks of the archetypes approach are compared to a more typical machine learning approach using a decision tree.

Contents

List of Figures

List of Tables

1 Introduction
  1.1 Motivation
  1.2 Project Goals
  1.3 Representing Bird Song
    1.3.1 The Raw Waveform
    1.3.2 The Spectrum
    1.3.3 Human Intuition and the Spectrogram
      Our Human Ability To Recognise Patterns
      The Spectrogram
  1.4 Existing literature

2 Gathering and Processing Data
  2.1 xeno-canto
  2.2 A Variety of Songs
  2.3 Preprocessing Gathered Data
    2.3.1 The Procedure
  2.4 Data to Be Classified

3 Approaches
  3.1 Desirable Properties
  3.2 Archetypes
    3.2.1 Finding the Archetype
      The Method
      Set Up Time
    3.2.2 Measuring Similarity
      Best Representation
      Throwing Away Data
      Unexpected Similarity Results (and What To Do About It)
    3.2.3 Results
  3.3 Machine Learning
    3.3.1 Results

4 Conclusion

5 References

A Appendix
  A.1 Dynamic Time Warping
  A.2 The (Fast) Fourier Transform

List of Figures

1.1 A recording of me saying "the cat sat"
1.2 Processed recording of the song of a Common Chiffchaff
1.3 Section of a raw Blackbird recording
1.4 A plot of the spectrum of a Woodpigeon's song
1.5 A plot of the spectrum of a Chiffchaff's song
1.6 A plot of the spectrum of a recording of me saying "the cat sat"
1.7 The spectrogram of a Woodpigeon's song
1.8 The spectrogram of a Chiffchaff's song
1.9 The spectrogram of a section of a raw recording of a Blackbird's song

2.1 Spectrogram before starting procedure
2.2 Waveform before starting procedure
2.3 Spectrogram after applying a low-pass filter
2.4 Waveform after applying a low-pass filter
2.5 Spectrogram after removing noise
2.6 Waveform after removing noise

3.1 Waveform with all values kept
3.2 Waveform with every 100th value kept
3.3 A plot comparing the time per DTW measure for various amounts of data being kept
3.4 Only the top half (values above 0) of a Woodpigeon song's waveform
3.5 Smoothed top half of a Woodpigeon song's waveform
3.6 Marked components of the smoothed top half of a Woodpigeon's waveform

A.1 A fabricated example of the result of attempting to find the best match between two signals using DTW
A.2 A plot of sin(x) for x from 0 to 2π, along with its spectrum
A.3 A plot of sin(x) for x from 0 to 2π + 1, along with its spectrum

List of Tables

2.1 A comparison of the pitch and structure similarity of the birds being used
2.2 The number of both processed and unprocessed songs for each bird

3.1 The classification accuracies for the archetypes method

1 Introduction

Bird vocalisation comes in two flavours: songs and calls. Bird song is what birds use for courtship and mating and tends to be relatively complex (depending on the bird), whereas calls (also known as signals[2]) are for signalling alarm or for members of a flock to communicate information, which may simply be their locations relative to each other. Due to the greater structural complexity of bird song, it is much easier to tell the song of one species apart from the song of another; calls tend to be quite similar both in structure and pitch. I am fairly sure the same techniques I have used for songs could be applied to calls too, but I have chosen to focus my attention on songs in particular for the reasons mentioned. For all birds within a given species it is usually the case that their songs (of a particular kind, say for courting) sound reasonably similar to a human listener. This isn't always the case, but it holds to a first approximation. Of course there are some birds of different species that have quite similar songs, and so can be difficult to tell apart, and there are some birds of the same species that have a number of songs to choose from, each of which may serve a different purpose, such as the different stages of a courting ritual.

1.1 Motivation

I think this sort of software would see a lot of use recreationally, by people wanting to identify the bird they are hearing when they don't have the knowledge necessary to recognise it themselves, or when they want a second opinion on their guess. Apart from recreational use, I can imagine it being very useful for mapping out and monitoring bird populations in a way that requires less manual intervention. For example, some recorders could be placed in an area relatively free from noise such as cars and people, left to record and recognise sounds day and night, and made to store the results on a local disk. A person could then come back some time later and retrieve the data for analysis. Once data has been gathered over several years, some precise statements could be made about the local bird population over time. This could be important for spotting a decline in population due to disease or an increase in predators. Additionally, you might be able to say some interesting things about their daily habits, and which of their particular vocalisations are more prevalent at different times of the year, perhaps to identify which songs serve which purpose. Alternatively it could be used to spot that migration times have changed over the years, and use that as secondary data to provide further evidence for some primary data such as a change in climate. For monitoring bird populations this would require much less manual labour and is thus much more scalable than some more traditional methods[13].

1.2 Project Goals

The primary goal of this project was to assess the viability of recognising bird song to some reasonable degree of accuracy. More specifically (limiting the scope of the project slightly to one that was feasible in the time frame), to recognise some modestly sized subset of British birds, again with a reasonable degree of accuracy. What a "reasonable degree of accuracy" means will of course depend on the desired application, but I am taking it to mean several times better than random guessing, at the very least. Upon achieving that goal, I wanted to be able to compare and contrast different approaches to the problem if possible in the available time period, or, if I only managed to look deeply into one, to discuss some of the pros and cons of that method in isolation. Fortunately I have been able to do some research into two relatively different methods (beyond just machine learning algorithm A versus machine learning algorithm B, which would not be especially insightful). I took quite a different first approach, which I will discuss later. I have found that it is absolutely viable, and with some more investigation it seems to me that you should be able to get good accuracy as long as you have the data to back up the algorithms. This project doesn't concern itself with how to present this information to a user or similar things in that vein, as it has not been about developing an application. It was instead about investigating and testing potential methods. Note also that I will be ignoring things such as the differences between the song of an adult and a juvenile bird, and will only gather data and test against adult bird song. I have been very pleased with the results of the project, and I think there are numerous interesting avenues for investigation in the future.

1.3 Representing Bird Song

Of course if we are to have any hope of automating the process of recognition, we first need to find suitable forms and data structures for storing our bird song. So, what exactly is a bird song, when it is in digital format?

1.3.1 The Raw Waveform

The simplest imaginable representation of a sound, at least to my intuition, is a digitised form of how it appears in nature. A collective summing of simple sine waves at particular frequencies gives us a waveform. Most uncompressed audio file formats will store the information using the pulse-code modulation method, which I won't go into here, but the WAV file format is an instance of that, and that's the file format my program reads in. Reading one of these files in will give us our waveform (as an array of amplitude values, the range of which can vary depending on the encoding and file format). Figure 1.1 shows the waveform displayed in Audacity[1] for me saying the words "the cat sat".

Figure 1.1: A recording of me saying "the cat sat"

This is an 8000Hz recording, which means that there are 8000 samples per second. This figure of 8000Hz is called the sample rate; measured in hertz (samples per second), it determines the time resolution of an audio recording. If I have some audio data of N samples, then to figure out the length of the recording I also need to know the sample rate, call it R. I can then get the length of the recording (in seconds) by: (N samples) / (R samples per second) = N/R seconds. The recording above is 1.036 seconds long, so the array we read in will have N = R × t = 8000 × 1.036 = 8288 samples.
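As a concrete illustration of this calculation, here is a minimal sketch of reading a WAV file and recovering its length, assuming scipy is available; the report does not say which library the project itself uses, and the file name is hypothetical.

    # A minimal sketch of reading a WAV file and computing its length.
    from scipy.io import wavfile

    rate, samples = wavfile.read("chiffchaff.wav")  # hypothetical file name
    n = len(samples)

    # Length in seconds = (N samples) / (R samples per second).
    length_seconds = n / rate
    print(f"{n} samples at {rate} Hz -> {length_seconds:.3f} s")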

Displayed in figure 1.2 is a recording of a Common Chiffchaff (Phylloscopus collybita)[12, 8].

Figure 1.2: Processed recording of the song of a Common Chiffchaff

This is what I consider a single instance of a Chiffchaff song, which I have extracted from a larger recording and done some processing on. I will discuss the processing stage in more detail later. Clearly this is a very regular song. What tends to change about Chiffchaff song is the number of the different "parts", which you can see as the different spikes in amplitude in this Chiffchaff recording. Here there are 11, but there can be anywhere from 5 to just under 20, in my experience. This can cause problems with certain approaches, and we need to be careful to make an algorithm robust enough to deal with that. I will discuss that in more detail later as well. However, a lot of songs are more complicated than each instance of song just differing by how often it repeats the same part; take the Blackbird (Turdus merula)[4, 3] song shown in figure 1.3.

Figure 1.3: Section of a raw Blackbird recording

This is a more raw recording where I haven't extracted a single song or applied any processing. You can see that this is true in the large baseline amplitude throughout the recording; this is the background noise. I would say there are between 4 and 6 different instances of Blackbird song here, depending on how you draw the boundaries (and I don't think it really matters too much where you do, as long as you're consistent). But none of them seem to resemble each other very much. Yet to a human ear, each of them is easy to identify as Blackbird song, at least to someone familiar with it. So what are we even hearing here to be able to identify it as such? As I mentioned in the introduction, in this case it is more about pitch and "character" than it is about the structure itself, though some regularity can be gleaned from the structure too (it is just too complex to state in simple terms). I do make extensive use of the waveform in this project, even though initially I had dismissed it and thought it would be all about the spectrum. However, my experiments have revealed that, at least for my methods, it is worth using.

1.3.2 The Spectrum

See the appendix for details on how to obtain the spectrum from the waveform using the Fourier transform, in particular an implementation known as the Fast Fourier Transform (commonly abbreviated as FFT). The spectrum is usually more useful than the comparatively messy waveform, at least in the world of human speech processing, but they both have their benefits. Two birds that have a similar song structure would be difficult to tell apart by the waveform alone (and as discussed the waveform is in general a more difficult form to work with), but if they had consistently quite dissimilar frequencies in their songs then we could easily tell them apart by the spectrum. Knowing that Woodpigeons have low-frequency songs and Chiffchaffs have relatively high-frequency songs, it is easy to see which is which in figures 1.4 and 1.5. The Woodpigeon's dominant frequency lies somewhere in the range of 300 to 800 hertz, and the Chiffchaff's in the range 4000 to 6000 hertz. So clearly there is something of use in the spectrums, if we as humans can tell so easily which is which in this case. What is difficult, of course, are the cases where it is not so easy to see, even for a human (there is a third way of displaying the data which plays very nicely into human intuition, and an expert may be able to identify a large number of birds just by looking at it; I will come onto that shortly). Additionally, simply to illustrate that the concept of a dominant frequency in a given recording of sound is much more relevant for bird song than it is for other fields such as speech recognition, see figure 1.6 for the spectrum of the recording of me saying "the cat sat". Nothing stands out particularly as a dominant frequency, at least not to the same degree as you will often see in bird song.

9 Figure 1.4: A plot of the spectrum of a Woodpigeon’s song

Figure 1.5: A plot of the spectrum of a Chiffchaff’s song

Sadly, I have not managed to make as much use of the spectrum as I wanted to. It is clearly useful, but in every experiment I have run, using only the spectrum as opposed to only the waveform results in worse accuracy and slower classification. However, I do believe an optimal approach will use both.

Figure 1.6: A plot of the spectrum of a recording of me saying "the cat sat"

1.3.3 Human Intuition and the Spectrogram

Our Human Ability To Recognise Patterns

A person need only pay attention to the sounds that tend to come from particular species of birds to be able to tell, without seeing the bird, what species the bird belongs to. We tend to recognise birds more naturally than just seeing that their song follows some very rigid pattern and comparing it against some sort of mental database. In fact, there are many birds whose song structure varies a lot, such as the Common Blackbird (Turdus merula), but the "character" of its song remains, in both its pitch and its particular intonations, among other qualities. Clearly, as is evident in the previous sentence's impreciseness, it can be difficult to express just what exactly we as humans are using to be able to recognise a given bird. As well as being able to tell one bird from another by listening, with some practice and a little studying we are also able to tell one bird from another, in many cases, simply by looking at a 3D (or 2D with colour) representation of their songs - the spectrogram. A person with more experience might even be able to tell you what bird it is, just by looking at this representation.

The Spectrogram

The spectrogram, or sonogram/sonograph as it tends to be known in the bird vocalisation community, is a way of displaying and/or storing the data that contains all the information from both the waveform and spectrum. Along its x-axis is time, along its y-axis is frequency, and the intensity of the colour (or the blackness, if it is a monochrome spectrogram) is the amplitude or presence of that particular frequency at that time. Using shades of grey or colour intensity to represent amplitude avoids the need for plotting in a third dimension, and is easier to interpret. Any information we might like to query about a signal will be contained in the spectrogram. Well, almost; the spectrogram carries only partial phase information. Apart from this limited phase information (which is not important, as we don't care much about phase), and ignoring the fact that we will never have a perfect picture of the real signal (because we are digitally sampling an analogue signal in the first place), the spectrogram contains all the information we could possibly need. Indeed, viewing bird song through spectrograms is an incredibly enlightening thing to do, and when William Homan Thorpe considered applying spectrograms to bird song in the 1950s (recounted in his book on the subject[17]), it revolutionised the field of bird song analysis. See figures 1.7, 1.8, and 1.9 for the spectrograms of the songs of a Woodpigeon, Chiffchaff, and Blackbird respectively. The brighter parts are the song. These are the same songs and recordings for which I have displayed waveforms and spectrums previously. It is clear from these figures which ones I have done some processing on to remove noise and interfering sounds: the Woodpigeon (low frequency) and the Chiffchaff. The processed Chiffchaff song is a particularly nice example, revealing more structure to the song than the waveform and spectrum did on their own. Every Woodpigeon song looks almost exactly like the one displayed: five individual parts, one short, two longer, then another two short.

Figure 1.7: The spectrogram of a Woodpigeon’s song
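For readers who want to reproduce this kind of view, the following is a rough sketch of computing and plotting a spectrogram with scipy and matplotlib. It is only an illustration under those assumptions; the report's own spectrograms were produced with other tools (such as Audacity), and the file name is hypothetical.

    # A rough sketch of computing and plotting a spectrogram.
    import matplotlib.pyplot as plt
    from scipy.io import wavfile
    from scipy.signal import spectrogram

    rate, samples = wavfile.read("woodpigeon.wav")  # hypothetical file name
    if samples.ndim > 1:
        samples = samples.mean(axis=1)  # mix a stereo recording down to mono

    # Frequencies (Hz), times (s), and the power of each frequency at each time.
    freqs, times, power = spectrogram(samples, fs=rate)

    plt.pcolormesh(times, freqs, power, shading="auto")
    plt.xlabel("Time (s)")
    plt.ylabel("Frequency (Hz)")
    plt.show()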

What I am interested in is whether we can use this intuition with spectrograms to identify an automatic way of recognising bird song.

Figure 1.8: The spectrogram of a Chiffchaff's song

Figure 1.9: The spectrogram of a section of a raw recording of a Blackbird's song

A first thought that I had was to literally represent the bird song as an image, the spectrogram, and use image similarity algorithms to compare it to various different bird songs. The method would have been backed by a spectrogram database to compare new recordings to. I abandoned this idea early on in favour of following two simpler methods that I thought would be more likely to give good results in the time available. The worry was that the idea of comparing spectrograms may be similar to having an image database of objects in simple (uncluttered) scenes, and then recognising objects in a new simple scene using the image database. This is well known to be a very hard problem. In the end, much of the information about structure can be derived from the waveform, and frequency separately from the spectrum. Nevertheless, one of my regrets in this project is not having enough time to look at combining the two, and perhaps directly using the spectrogram, to do some analysis on the "clusters" you often see in spectrograms to potentially feed into some machine learning algorithm. It seems to me that the situation is not as dire as I made it sound with my comparison to recognising objects in simple scenes, and I think using the spectrograms to recognise bird song is actually much more analogous to handwriting recognition (though I do not have experimental data to back this up). Bird song tends to be a bird's audio-based signature, after all, and the spectrogram merely represents that as an image.

1.4 Existing literature

As may be evident by now, there is surprisingly little existing literature on the subject of automatic bird song recognition. There is plenty of research on the biological aspects, such as comparing the development of human speech and bird song from birth to adulthood, on how birds themselves may recognise each other's calls and songs, and on bird song from the bird's perspective. Much of the research into this area (including some on, or tangentially related to, automatic bird song recognition) seemed to peak in the late 90s and trailed off significantly after that. A few papers on automatic bird song recognition do exist, such as Automated recognition of bird song elements from continuous recordings[10], though this paper notes that the recordings were done under laboratory conditions, using only two kinds of bird, achieving satisfactory results, and requiring expert knowledge under non-ideal conditions. So there has been some research into it, albeit with only modest success in ideal scenarios. Unfortunately it is behind a prohibitively expensive paywall so I have not been able to have an in-depth look at it. There is also some research in related areas, such as the recognition of individual dolphin whistles[9]. I have used some of this to direct my experiments (I would never have looked into Parsons encoding, mentioned later in the section on approaches, without seeing the success of it in the world of dolphin whistle recognition), though as I expected it was not directly applicable, and some of it, assuming my approach was sound, did not work at all. I think the fact that I am doing my experiments with a reasonably small number of songs for each bird (recorded in the wild by contributors to xeno-canto), and then testing my methods on both preprocessed and raw examples of a handful of both diverse and similar birds, is much more reflective of how well this may work when used in an average real-world situation.

2 Gathering and Processing Data

Clearly we are not going to get anywhere without first having enough data to work with, and that data has to be good data (for a particular sense of the word "good", which I will discuss). I will explore how much of it we need and how much the quality of the recording matters, and what we can do to improve it, in this chapter.

2.1 xeno-canto

The only resource I have used for gathering data is xeno-canto[19], a website where any member of the public may upload bird vocalisation recordings for others to use how they please. As of the time of writing, there are 302,751 recordings covering about 9,548 different species[18]. The available recordings for some birds are limited, but for the common birds of Britain (which is what I have chosen to focus on) there are plenty of recordings of reasonable to good quality. The recordings can simply be downloaded as an MP3 and then converted to WAV for use with my program (which could easily be done on the fly with a system call/script using lame --decode song.mp3). Anyone hoping to get this to a point where it truly covers a very large number of birds would probably have to do some field work, so to speak, as many birds still have very few to no recordings on xeno-canto.
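A sketch of that on-the-fly conversion, simply shelling out to lame exactly as described above; the file names are hypothetical.

    # Decode an MP3 from xeno-canto to WAV by shelling out to lame.
    import subprocess

    def mp3_to_wav(mp3_path: str, wav_path: str) -> None:
        # lame --decode <input.mp3> <output.wav> decodes an MP3 to a WAV file.
        subprocess.run(["lame", "--decode", mp3_path, wav_path], check=True)

    mp3_to_wav("XC183441.mp3", "XC183441.wav")  # hypothetical file names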

2.2 A Variety of Songs

As previously stated, this project is about investigating and testing potential methods, and because of that the data not only needs to be good in the sense of quality and quantity, but also good in the sense that it should challenge as many aspects of the algorithms being applied as possible. That is, we should have at least two birds that sound quite similar to each other in pitch and song structure so that we can see how each method deals with that, two birds that have very different pitches, and two birds that have similar pitches but different song structure. The only combination of pitch and song structure I haven't really got is very different pitches but similar structure, but I think both of those things are individually tested by two of the other cases.

Bird         Similar Pitch           Similar Structure   Dissimilar Pitch   Dissimilar Structure
Blackbird    Blackcap, Chiffchaff    Blackcap            Woodpigeon         Chiffchaff, Woodpigeon
Blackcap     Blackbird, Chiffchaff   Blackbird           Woodpigeon         Chiffchaff, Woodpigeon
Chiffchaff   Blackbird, Blackcap     N/A                 Woodpigeon         All
Woodpigeon   N/A                     N/A                 All                All

Table 2.1: A comparison of the pitch and structure similarity of the birds being used

This allows us to see which aspects of each algorithm could do with improvement, and where each excels. The birds I settled on are: the Common Blackbird, Woodpigeon, Common Chiffchaff, and the Eurasian Blackcap. See table 2.1. Additionally, though not mentioned in the table, Blackbird and Blackcap song is quite self-dissimilar from one instance to the next, whereas Woodpigeon and Chiffchaff song is very self-similar. The reason for not just throwing a huge amount of data all at once at the various different approaches is that gathering the data itself takes time, and I had to limit myself in order to focus on more important aspects of the project. Downloading and converting the files and naming them appropriately (with their xeno-canto IDs, so anyone in the future wanting to see where the data comes from can simply type the ID into xeno-canto) doesn't take a huge amount of time. What really does add up is processing each of the files afterwards and extracting each of the individual songs. It does not seem possible to me to automate this without an already incredibly sophisticated piece of software (the creation of which would probably be even harder than this project), and even if that existed, using it would likely introduce further error into the classification. For this reason I have focused on only the several birds required to test the various aspects of the algorithms, and the rest of my time has been spent on investigating and developing these solutions. The amount of data I could collect, as mentioned, depended on the amount of time I had spare on the project. As well as time, I was also limited by how long the actual algorithms take to set up and classify (to facilitate fairly quick testing), which I estimated before beginning to gather the data with some crude but reasonably accurate calculations, based on some times I found by running the various parts of it on only a few songs.

Bird         Processed Songs   Unprocessed (Raw) Songs   Total
Blackbird    26                11                        37
Blackcap     21                6                         27
Chiffchaff   24                10                        34
Woodpigeon   19                6                         25

Table 2.2: The number of both processed and unprocessed songs for each bird

I initially only collected around 15 songs per bird, but later increased that to what I have now. The number of songs I have for each bird can be seen in table 2.2. The reason processed songs outnumber raw ones is that I was initially testing on only processed songs, to test my methods in more ideal conditions; only later did I add a number of raw songs (and along with them some more processed ones). By this point I was limited not only by what was on xeno-canto, the time required to run the algorithms, and my own time, but also by the file system quota on the school computers.

2.3 Preprocessing Gathered Data

One thing that clearly needs to be done is to go through a large recording of multiple instances of song and extract each one. But apart from that, is there anything else we can do to improve the quality of the extracted songs? There are several things in fact, and I will go through the general procedure for cleaning up recordings.

2.3.1 The Procedure

The general procedure is very simple. I will go through it step by step (I used Audacity for all of this). The spectrograms and waveforms at each step are shown in figures 2.1 to 2.6.

Step 1: Find a reasonable quality recording of the bird. I will use xeno-canto 183441, a Woodpigeon recording, for this example. See figures 2.1 and 2.2.

Step 2: If the recording is stereo, make it mono.

Step 3: (Situational) This particular recording also has a Robin and a House Martin in the background. If we wanted this recording for the Robin we would be out of luck, but as we want it for the Woodpigeon we can use the fact that it has such a low-frequency song to apply a low-pass filter and eliminate the unwanted birds from the recording. Here I applied a filter of 1200Hz with a roll-off of 36dB twice. The roll-off for a low-pass filter can be seen as the filter's eagerness to remove sound above the specified frequency; it can be anywhere in the range 6 to 48 decibels in Audacity. (A rough code equivalent of this filtering step is sketched after these steps.) See figures 2.3 and 2.4.

Step 4: We still have some remaining background noise. The previous step can rarely be applied in general, but removing noise is something I do for every processed recording, though it is only possible if there is some part in the recording that is pure noise (to get the noise profile). See figures 2.5 and 2.6.

Step 5: Finally, extract each of the individual songs from the recording. It is not obvious at first but in this recording only four out of five of what may be songs are full songs. Its full song has five parts, so out of this recording I would get four processed Woodpigeon songs. There is some freedom in this step in that it is not always clear where a song starts and ends.
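As promised in step 3, here is a rough programmatic stand-in for the Audacity low-pass filter; the project itself applied the filter by hand rather than in code, so this is only an illustration. The Butterworth filter, the 6th-order choice (approximating a 36 dB/octave roll-off at roughly 6 dB/octave per order), and the file name are all assumptions.

    # A rough stand-in for Audacity's low-pass filter step.
    from scipy.io import wavfile
    from scipy.signal import butter, filtfilt

    def low_pass(samples, rate, cutoff_hz=1200, order=6):
        # Normalise the cutoff to the Nyquist frequency, as scipy expects.
        b, a = butter(order, cutoff_hz / (rate / 2), btype="low")
        # filtfilt applies the filter forwards and backwards (no phase shift).
        return filtfilt(b, a, samples)

    rate, samples = wavfile.read("XC183441.wav")  # hypothetical file name
    if samples.ndim > 1:
        samples = samples.mean(axis=1)            # step 2: make it mono
    filtered = low_pass(samples.astype(float), rate)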

Whether or not preprocessing the songs first is worthwhile will be investigated in chapter 3.

2.4 Data to Be Classified

Apart from the data we are using to train our algorithms, we also have to deal with data coming in to be classified. The recording quality should be reasonably high in the first place, and I would recommend the usual standard of 44,100Hz or higher. This is particularly important because in one of my approaches to the problem I throw away quite a lot of data in order to speed up the process of training and recognition, at the expense of a small amount of accuracy. All of the data I use for training (and testing) is either 44,100Hz or 48,000Hz. There are two interesting problems with incoming data that are outside of the scope of this project. Firstly, a recording could be given in which the bird song is not in isolation, by which I mean we are being given a recording of a bird in which its song occurs several times, or among the songs of other birds, with some significant amount of pure background noise. Second, several birds could be singing at the same time. I can imagine a solution to this in which you could place several recording devices in an environment for recognising bird song and use some sort of source separation algorithm (see the cocktail party problem[7]) to analyse them independently, which may be necessary if bird song recognition were to be used in the wild to track bird populations or for some other purpose. I don't think a pure software solution would be very satisfactory even if it were to some degree possible. I will be assuming none of these complications are present in the recordings I am using; indeed, as I am choosing the recordings to test the algorithms myself, I do not have to worry. Constraints like this do not really matter because this project is not concerned with being user-facing. These problems are nevertheless interesting to consider.

Figure 2.1: Spectrogram before starting procedure

Figure 2.2: Waveform before starting procedure

Figure 2.3: Spectrogram after applying a low-pass filter

Figure 2.4: Waveform after applying a low-pass filter

Figure 2.5: Spectrogram after removing noise

Figure 2.6: Waveform after removing noise

3 Approaches

In this chapter I will present a novel approach to bird song recognition by essentially using a database of automatically chosen archetypal bird songs to compare a new recording to. Additionally, I compare the results of this approach to a more typical one using machine learning. I will compare not only the accuracy of the approaches but also the time it takes to set up the algorithms, and the time it takes to classify new songs. There is a certain amount of compromise to be had with respect to these attributes. If I had an extra year I would also have liked to investigate Hidden Markov Models, as this is something that came up during the Natural Language Systems module, and I have seen it mentioned in relation to bird song a few times. The directory structure and how the data is stored is the same for both approaches, and is as follows:

/path/to/data/
    BarnSwallow/
        Songs/
            1.wav
            2.wav
            ...
        Calls/
    CommonBlackbird/
        Songs/
            1.wav
            ...
        Calls/
            1.wav
            ...

Both algorithms work if there is no data in any of these folders, and will simply skip the bird if that is the case. Indeed, none of my Calls folders have data because I chose to focus on songs. The names of the bird folders are used in the program to label the data for the machine learning part, so the folders need to be named using title case, else the birds will not be given their correct name. The program is also already built to handle several different kinds of songs and calls from one bird, I just haven't had the time to gather enough data for that. For example, a Pink Robin may have in its Calls folder both an Alarm and a Communication folder, and perhaps some sub-folders within them too, such as Alarm/Type1 and Alarm/Type2; the program would handle this. It expects recordings to be in the deepest directories in each branch of the directory tree. The PinkRobin/Calls/Alarm/Type1 recordings would then all be labelled Pink Robin (Call, Alarm, Type1) by the machine learning algorithm. It is possible that there are birds who have an unmanageably or prohibitively large number of songs to split into a sub-folder arrangement like this, though I know of no such bird as of yet. That is, apart from birds like the Superb Lyrebird, because of their ability to mimic most sounds, and of course there is absolutely no way of dealing with something as incredible as that. There are some additional files produced by both methods that I will describe shortly.
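The report does not show the labelling code itself, but a sketch of how the layout above could be turned into labels might look as follows; the function names and the exact label formatting are assumptions.

    # A sketch of deriving labels from the directory layout described above:
    # recordings live in the deepest folders, and the path components after
    # the bird folder become the label detail (the report renders this as,
    # e.g., "Pink Robin (Call, Alarm, Type1)").
    import os
    import re

    def split_camel(name: str) -> str:
        # "PinkRobin" -> "Pink Robin"
        return re.sub(r"(?<!^)(?=[A-Z])", " ", name)

    def labelled_recordings(root: str):
        for dirpath, dirnames, filenames in os.walk(root):
            wavs = [f for f in filenames if f.lower().endswith(".wav")]
            if not wavs:
                continue  # skip empty folders, as both algorithms do
            parts = os.path.relpath(dirpath, root).split(os.sep)
            bird = split_camel(parts[0])
            detail = ", ".join(parts[1:])
            label = f"{bird} ({detail})" if detail else bird
            for f in wavs:
                yield os.path.join(dirpath, f), label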

3.1 Desirable Properties

It is not complicated to measure whether one algorithm is better than another in terms of accuracy, as we simply take the number of correctly classified recordings and divide it by the total number of recordings (slightly more complicated in some cases to be robust but it is essentially this). However, there are a number of other considerations, such as the time it takes to set up the algorithms, the time it takes to classify, and lastly how much storage space on the file system a given approach takes. For example, we may have a K-nearest neighbour algorithm that has good accuracy, but KNN tends to be quite slow on large datasets and requires the storage of all of the feature vectors for every training example. As I am only working with a small set of birds any claims I make about the scalability of the algorithms will be extrapolations.

3.2 Archetypes

Any kind of method that is going to work for recognising bird song clearly needs to be backed by data. In its simplest form, that would be comparing new recordings to archetypal examples of each bird's different songs and calls, and seeing which one it is most similar to. If it is quite similar to several, then we simply report it as such, and say that it may be one of a potential few. That is exactly what my first attempt at solving the problem has been. There are two major questions that need to be answered to implement this method. Firstly, how to determine what the archetypal song is in a given batch of songs. Second, what exactly it means for one song to be similar to another. I recognise that what this method immediately does is ignore most of the data available in favour of focusing on one instance of song, but I find the simplicity of it very appealing and as such I wanted to see if it could be a viable approach.

3.2.1 Finding the Archetype

The answer to the first question is simple. The archetypal song is the song that is most similar to most of the other songs. Clearly, using this method, a particularly strange or malformed instance of a song will never be chosen as an archetype. I should say here that in my similarity measures the closer the returned value is to 0, the more similar the songs are; the larger the value, the more dissimilar they are.

The Method

So we must calculate the similarity of each song to every other song, and the song with the lowest mean similarity is our archetypal song. This splits the second question into two questions. The first is the same as before, and the second is whether or not we should use the same similarity measure for both finding the archetype and comparing new recordings to the archetypes. A first response may be "well, why not?"; I will discuss this in detail later on. For now we will focus on what it means to find the archetype in more concrete terms. Given some similarity measure d that is lower when two songs are more similar, and around N songs for each bird, consider the following matrix:

$$\begin{pmatrix}
d(\text{1.wav},\text{1.wav}) & d(\text{2.wav},\text{1.wav}) & \cdots & d(\text{N.wav},\text{1.wav}) \\
d(\text{1.wav},\text{2.wav}) & d(\text{2.wav},\text{2.wav}) & \cdots & d(\text{N.wav},\text{2.wav}) \\
\vdots & \vdots & \ddots & \vdots \\
d(\text{1.wav},\text{N.wav}) & d(\text{2.wav},\text{N.wav}) & \cdots & d(\text{N.wav},\text{N.wav})
\end{pmatrix}$$

Label each column by i.wav, the song appearing as the first argument in that column's entries d(i.wav, j.wav) for j ∈ [1,N]. So the first column is labelled with 1.wav, the second with 2.wav, and so on. The archetypal song is the (label) i.wav such that the mean of the measures in its column is the lowest of all the columns in the matrix. So in the following matrix:

$$\begin{pmatrix}
0 & 3 & 12 \\
3 & 0 & 9 \\
12 & 9 & 0
\end{pmatrix}$$

the column means (taken over the entries other than the diagonal self-comparison) are [7.5, 6, 10.5], so the lowest mean belongs to the second column, which will be labelled 2.wav, and this is our archetypal song.

Set Up Time

Assume the similarity measure takes T seconds to complete on average, and we have B birds in total. That means finding the archetypes for all birds in one run will take around BTN² seconds. If we had 50 birds in our database, each backed by around 30 songs, then this would take 45000T seconds. If I want this to take less than an hour I need T to be less than 0.08s. Thankfully, as is made clear in the example, the diagonal of this matrix will always be all 0s if our measure is sensible, and additionally d(i.wav, j.wav) = d(j.wav, i.wav) should be true. So actually we only need to compute the bottom left (equally the top right) of the matrix, which is 0.5N(N − 1) measures. This changes our setting-up time to 0.5BTN(N − 1), and for the values we used before, 21750T. Now our measure can take approximately twice as long. We can do even better in the long term because of how I store the information about which archetypes were chosen. Take the example file structure from the start of this chapter. An archetype file will be created at each deepest directory containing a batch of WAV files. A (real) example of the contents of one of these archetype files:

CommonBlackbird\Songs\XC128838-7.wav
CommonBlackbird\Songs\XC128838-1.wav
CommonBlackbird\Songs\XC128838-10.wav
...
CommonBlackbird\Songs\XC187043_7.wav
CommonBlackbird\Songs\XC187043_8.wav
CommonBlackbird\Songs\XC187043_9.wav

I have cut out most of the lines as they are unnecessary to explain the format. The first line of the file is the archetypal song. Every line after that is a song that took part in finding the archetype. So the next time I run the archetype-finding algorithm I first check if there is an existing file. If there is, I check whether any new songs have been added to the folder, and if there have been, I check whether one of them should be the new archetype. If no new recordings have since been added I do not need to do anything and can move on to the next batch. This drastically reduces the time required to find archetypes for all but the very first run of the algorithm, so this problem of set up time is not really an issue, as it can be done incrementally as described.
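A sketch of that incremental check; the cache file name used here is a guess, since the report does not give one.

    # The archetype file's first line names the archetypal song and the
    # remaining lines name the songs that took part; recomputation is only
    # needed when new WAVs have appeared since the last run.
    import os

    def archetype_up_to_date(folder: str) -> bool:
        cache = os.path.join(folder, "archetype.txt")   # hypothetical file name
        if not os.path.exists(cache):
            return False                     # first run: compute from scratch
        with open(cache) as f:
            lines = [line.strip() for line in f if line.strip()]
        seen = {os.path.basename(p) for p in lines}     # archetype + participants
        current = {f for f in os.listdir(folder) if f.lower().endswith(".wav")}
        return current <= seen               # no new songs since the last run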

3.2.2 Measuring Similarity

We still have not tackled the problem of measuring the similarity between two songs. It is by far the trickiest part of this method, and any improvements to this method will likely have to be improvements to this aspect of it. Easily the most common algorithm I have seen for measuring the similarity between two audio samples, for both human speech and other vocalisation, is the dynamic time warping algorithm[6, 9, 10]. See the appendix for an overview of DTW.

Best Representation

So we have our similarity measure. Whether or not DTW is the best measure I do not know; I have focused most of my time on what to apply it to, and on augmenting it to achieve better results. As is common when developing a solution iteratively like this, each question you answer throws up several new questions. We are going to have to stop somewhere, but for now we continue and next answer what the best thing is to apply DTW to. The question is simply whether we should be comparing the waveforms or the spectrums. In any case it is necessary to normalise both the waveforms and spectrums before doing any comparisons, changing the minimum and maximum y-values from [a,b] to [0,1]. Otherwise, for example, two waveforms that are exactly the same but with different amplitudes would be reported as unreasonably dissimilar. To begin with I simply applied it to the raw waveforms, but there is a problem with this, namely that the measure takes a long time to compute.

Throwing Away Data

Before I did anything else I did some experiments to check exactly how long it might take in general. To do this I took 15 Blackbird and 18 Woodpigeon songs and ran the DTW algorithm on the first of each set against the rest of that set, so 31 DTW measures. As well as checking it against the full waveform, I also tested it when keeping only every 10th value in the waveform, and also every 100th. You can see the overall structure is just about maintained when keeping only every 100th in figures 3.1 and 3.2. The results are presented in figure 3.3.

Figure 3.1: Waveform with all values kept

Figure 3.2: Waveform with every 100th value kept

These values are 21.45s, 1.87s, and 0.13s per DTW. Remember that before we calculated 0.16s to be about what we needed when we have around 50 birds with about 30 songs each to take less than an hour, so we achieve slightly better than that by throwing away 99 out of 100 values from the waveform. Just to reiterate what I went over at the start of this chapter, assuming the same number of birds and songs on average as before, 21.45s per DTW would take a few days.

26 Figure 3.3: A plot comparing the time per DTW measure for various amounts of data being kept

I did the same tests, throwing away the same amounts, for the spectrums, and the results were 9.65s, 0.97s, and 0.16s per DTW. Interestingly, throwing data away does actually preserve the ordering of similarities between songs. All that it does, as long as the overall structure is maintained (again as shown in figure 3.2), is reduce the similarity measures by approximately a factor of 100. This is significant, because if this were not the case and we could not throw away data without destroying the similarities, then everything about this method would fall apart, because it would simply take far too long to measure. There is a worry that certain bird songs will be malformed by this throwing away of data, seeing as by doing so I am essentially taking a 44,100Hz recording and reducing it to an approximately 441Hz recording. This seems to be fine for the birds I have tested it on, but if there is any bird whose song has important features that are only visible over less time than 1/441 ≈ 0.002 seconds then this will be a problem. I do not see this being a problem in general, though, considering this is such a short time-scale. For spectrums we cannot be so ruthless, and can only go as far as keeping every 25th value before the structure is destroyed. I tried averaging the values instead of just throwing them away, but this resulted in the waveform being far too flattened out and smeared, and it lost much of the structure present in the original waveform.
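The two preparation steps used here, rescaling each waveform to [0, 1] and keeping only every 100th value, might look like the following sketch (an illustration, not the project's code).

    # Rescale a waveform so its values lie in [0, 1], then keep only every
    # 100th sample before handing it to DTW.
    import numpy as np

    def normalise(signal):
        signal = np.asarray(signal, dtype=float)
        lo, hi = signal.min(), signal.max()
        return (signal - lo) / (hi - lo)       # map [a, b] onto [0, 1]

    def decimate(signal, keep_every=100):
        return signal[::keep_every]            # e.g. 44,100 Hz -> roughly 441 Hz

    # Usage: prepared = decimate(normalise(samples)), with `samples` read from
    # a WAV file as earlier in the report.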

27 Another thing I tried doing was using Parsons encoding, that is reducing the song to a sequence of 0s, 1s, and -1s depending on whether the frequency stayed the same (approximately), increased, or decreased from the last sample. This completely ruined the similarity measures and thus the accuracy as well.

Unexpected Similarity Results (and What To Do About It)

After deciding on using the waveform, I decided to check if the similarity measure agreed with my human intuition about the similarity of each pair of songs. I used the set of Woodpigeon songs I had, and also added in two new Woodpigeon songs that sounded very different to the rest, just as a sanity check: if they were reported similar to the rest then something was wrong. Unfortunately, it did report those two songs as similar to the rest. What it did report as being very dissimilar was a song that I seemed to have malformed in the processing stage, which had a somewhat ghostly sound to it. After a lot of experimentation, and trying to work out how I as a human was hearing such different songs when the algorithm could not, I solved this discrepancy by augmenting the similarity measure with a multiplicative factor. This multiplicative factor takes into account the number of "components" the songs being compared have. So if one song had 2 components and the other had 3, the measure would be multiplied by a factor of three halves (increasing the measure and thus increasing the dissimilarity). The way I calculate the number of components in a waveform is somewhat crude but effective. I simply take the top half, then apply a moving average with a window of 20 to smooth out the values, renormalise the values to be in the range [0,1], then go through the values checking when they go above and below 0.1 to mark out the components. See figures 3.4 to 3.6 for an example on a Woodpigeon song that correctly counts five components; the green vertical lines denote the start of a component and the red vertical lines denote the end. This is where the measure for finding the archetype differs from the one used to classify. If we have a Chiffchaff song and it only has, say, 6 parts, and our Chiffchaff archetype has 10, then we are going to multiply whatever similarity measure we get by 5/3, and as a result may miss out and pick the incorrect archetype. But this makes sense when finding the most typical song and does not cause problems; in fact, as I have shown, it is necessary. With this change all the similarities agree with my intuition (I have nothing else to base the correctness on besides that), and all that remains is to evaluate the classification accuracy of this method.
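A sketch of the component-counting heuristic and the resulting multiplicative factor, following the description above; this is an illustration of the idea rather than the project's exact implementation.

    # Keep the top half of the waveform, smooth it with a moving average of
    # window 20, renormalise to [0, 1], and count the stretches that rise
    # above the 0.1 threshold.
    import numpy as np

    def count_components(signal, window=20, threshold=0.1):
        top = np.clip(np.asarray(signal, dtype=float), 0, None)  # values above 0
        smoothed = np.convolve(top, np.ones(window) / window, mode="same")
        lo, hi = smoothed.min(), smoothed.max()
        smoothed = (smoothed - lo) / (hi - lo)                    # back to [0, 1]
        above = smoothed > threshold
        # A component starts wherever the signal crosses up through the threshold.
        starts = np.flatnonzero(above[1:] & ~above[:-1])
        return len(starts) + (1 if above[0] else 0)

    def component_factor(c1, c2):
        # e.g. 2 components vs 3 -> multiply the DTW measure by 3/2.
        return max(c1, c2) / min(c1, c2)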

3.2.3 Results

The final results are as follows:

Bird         Noisy Classified Correct (%)   Clean Classified Correct (%)
Blackbird    55                             77
Blackcap     95                             100
Chiffchaff   30                             58
Woodpigeon   17                             42
Overall      48                             69

Table 3.1: The classification accuracies for the archetypes method

This makes for an overall classification accuracy of 63%, including both noisy and clean recordings (the latter processed to remove noise and interfering sounds). Interestingly, the birds whose songs are most self-similar, the Woodpigeon and Chiffchaff, fared worst with this method. The two birds I expected it might confuse, the Blackbird and Blackcap, as they have both similar pitch and inconsistent structure, it did very well with. The whole process of evaluating this, including all the overhead, took around 74 seconds, so we can safely assume each classification takes around half a second, which is more than satisfactory.

3.3 Machine Learning

I considered three different algorithms for testing a machine learning approach: a Support Vector Machine (using some kernel to apply to a higher dimensional feature space), a K-Nearest Neighbour Classifier, and a Decision Tree Classifier. I chose not to use K-Nearest Neighbour because of storage concerns. Considering that you need to store the feature vectors for every training example you use, this would not allow the program to scale very well when adding a large number of birds (and songs for each bird). The time required to classify would grow with the size of the data we have as well. For decision trees the model size remains small even with a large increase in data, and thus so does the classification time.

29 I didn’t use Support Vector Machines because I wanted to easily obtain the probabilities for each potential bird as well as a simple classification without much overhead. Not that I made much use of that here, but I think that’s something that would be quite useful in the future. For decision trees this is easy, and is simply a matter of comparing the number of examples down one branch compared to the total number. Of course if we have limited data this will result in a lot of 0 probabilities; we must simply be careful in the parameters we choose for our decision tree such as its maximum depth. To store the model from an implementation point of view, for the decision tree (for which I am using scikit-learn[16]), we simply take the trained decision tree object and serialise it with pickle[14], and de-serialise that when the program is loaded anew. As decision trees are fairly lightweight this takes almost no time at all. The feature vector I used was a normalised lower-resolution spectrum (rela- tively large frequency bins), along with the song length. It is more sensible to use the spectrum rather than the waveform for the feature vector as the spectrum is more consistent, especially when you make the frequency bins large enough. If I had more time I would have liked to also test it with the cepstrum[5], some- thing which is often used in the world of human speech recognition. The cepstrum puts more weight on the differences between lower frequencies than high, which reflects how we as humans perceive sound. I wonder if some knowledge of the ways birds perceive sound could be used to develop a similar thing.

3.3.1 Results

On the small dataset I have now, the decision tree takes a few minutes to train. I am not sure how this would scale with the number of songs I feed it, as I do not know the precise implementation details as I did with my own archetypes method, so I cannot really say. Though, as I have mentioned previously, it does not matter too much how long the training phase takes. The time it takes for the decision tree to classify a new recording is negligible, on the order of a hundredth or a thousandth of a second (depending on the computer it is being run on). The overall classification accuracy for noisy and clean recordings using a decision tree classifier, tested using 5-fold cross validation, is 77% (plus or minus 0.13% of uncertainty).

Figure 3.4: Only the top half (values above 0) of a Woodpigeon song's waveform

Figure 3.5: Smoothed top half of a Woodpigeon song’s waveform

Figure 3.6: Marked components of the smoothed top half of a Woodpigeon’s waveform

4 Conclusion

I am very happy with the results of this project, particularly the reasonably good accuracy the archetypes method was able to achieve, as I put a lot of effort into getting that to work despite its relatively simplistic appearance in the end. In retrospect, though, I think I would have liked to have spent less time on the archetypes method and put some more work into pursuing the machine learning approach, and other completely different avenues like a Hidden Markov Model. Additionally, more research over the summer might have allowed me to do what I have done as well as further pursue some of the above-mentioned things. Nevertheless I have enjoyed my time with it and learned a lot.

5 References

[1] Audacity. Audacity website. http://www.audacityteam.org/.

[2] Clive K. Catchpole and Peter J. B. Slater. Bird Song: Biological themes and variations, page 6. Cambridge University Press, 1995.

[3] Juan Emilio. Common blackbird image (overlaid on the recording). https://commons.wikimedia.org/wiki/File:Turdus_merula_-Gran_Canaria,_Canary_Islands,_Spain-8_(1).jpg, 2011.

[4] Stuart Fisher. Common blackbird recording. http://www.xeno-canto.org/72861, 2011.

[5] Ethnicity group. Speech recognition. https://www.clear.rice.edu/elec532/PROJECTS98/speech/cepstrum/cepstrum.html.

[6] Dr. John G. Harris. Isolated word, speech recognition using dynamic time warping towards smart appliances. http://www.cnel.ufl.edu/~kkale/dtw.html.

[7] Simon Haykin and Zhe Chen. The cocktail party problem. Neural Compu- tation, 17(9):1875–1902, 2005.

[8] Munish Jauhar. Common chiffchaff image (overlaid on the recording). https://commons.wikimedia.org/wiki/File:Common_Chiffchaff.jpg, 2013.

[9] Arik Kershenbaum, Laela S. Sayigh, and Vincent M. Janik. The encoding of individual identity in dolphin signature whistles: How much information is needed? National Institute for Mathematical and Biological Synthesis, 2013.

[10] Joseph A. Kogan and Daniel Margoliash. Automated recognition of bird song elements from continuous recordings using dynamic time warping and hidden markov models: A comparative study. The Journal of the Acoustical Society of America, 1998.

33 [11] Richard Lyons. Windowing functions improve fft results, part 1. Electrical Design News, September, 1998.

[12] David M. Common chiffchaff recording. http://www.xeno-canto.org/121965, 2012.

[13] The Institute For Bird Populations. Jobs. http://www.birdpop.org/pages/jobs.php.

[14] Python. pickle. https://docs.python.org/2/library/pickle.html.

[15] Kazuaki Tanida. fastdtw. https://pypi.python.org/pypi/fastdtw.

[16] SKL team. scikit-learn. http://scikit-learn.org/stable/.

[17] William H. Thorpe. Bird-Song. The biology of vocal communication and expression in birds, chapter 3. Cambridge University Press, 1961.

[18] xeno-canto team. xeno-canto collection graphs. http://www.xeno-canto.org/collection/stats/graphs.

[19] xeno-canto team. xeno-canto website. http://www.xeno-canto.org/.

A Appendix

A.1 Dynamic Time Warping

Dynamic time warping, commonly abbreviated as DTW, is a method for measuring the similarity between two sequences. The elements of the sequences need not be numbers; all that is needed is some concept of distance between any given two elements. Of course in practice it does tend to be used on numbers, more specifically for measuring the similarity between two sequences that represent some kind of signal. I will provide an intuitive way of looking at what the algorithm does in this case after explaining the basic concept. Given two sequences $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$, we create a simple 2-dimensional distance matrix, like so:

$$\begin{pmatrix}
d(x_1, y_1) & \cdots & d(x_n, y_1) \\
\vdots & \ddots & \vdots \\
d(x_1, y_n) & \cdots & d(x_n, y_n)
\end{pmatrix}$$

The start of each of the sequences is in the top and left, and the ends in the bottom and right. If our distance function simply measures the distance between two elements (and does not take into account previously calculated distances), then we next need to find the shortest path through the matrix (using some path-finding algorithm), where each step moves one right, one down, or one diagonally down and to the right. The sum of the distances on this path is our similarity measure. Of course we may take the square root of this measure, or similar, and retain the same similarity ordering. So if $p = ((x_{i_1}, y_{j_1}), (x_{i_2}, y_{j_2}), \ldots, (x_{i_m}, y_{j_m}))$, for some $i_k, j_k \in [1, n]$, is the sequence of indexes this path takes, then our similarity measure might be:

$$\sum_{k=1}^{m} (x_{i_k} - y_{j_k})^2$$

and m will be at least as large as n, as the shortest possible path through the matrix would be to go straight diagonally, which would be of length n. Our distance function may take into account previously calculated distances. Indeed, this is what tends to be done when using the algorithm for finding string similarity, in which case you do not need to find a shortest path and you may just read off the bottom-right element as the similarity. This same sort of method may be taken for matching sequences, and other similar applications. I do not make use of that kind of distance function in this project. In our case we are measuring the similarity between two signals over time (the raw waveform), or the presence of certain frequencies over others (the spectrum). In the former case, the intuition is that we are "warping" the signals over the time domain until they match as much as we can get them to. You can see this in figure A.1, where you can imagine stretching the signals horizontally until those lines (which correspond to the elements of the sequences that share a point on the shortest path through the distance matrix) are vertical. If our distance function was simply $d(x_i, y_j) = |x_i - y_j|$, where $|\cdot|$ is the absolute value, and the connecting lines above were drawn vertical all the way along, this is equivalent to taking the diagonal path in the matrix, and from this we recover the simple Manhattan distance for sequences. This accounts for signals that may be out of phase by some amount, which in our case could simply be due to a bird song being extracted with, say, a longer period of silence on the left (entirely possible as I am doing this manually), or a bird "skipping" part of its song, in which case the signal will be warped or shifted along until the parts that match align, and we can measure the similarity of that. For these reasons, I believe this to be a much better choice of distance metric than a more naive one such as the Manhattan, Euclidean, or any Minkowski distance metric or similar. In fact, you can see DTW as a more general case of these metrics, as you recover them by simply taking the path along the diagonal of the distance matrix (although DTW lacks some of the mathematical properties of a true metric, such as the triangle inequality). There are additional aspects to DTW, such as weighting the measure returned by the algorithm by how much the path deviates from the main diagonal, but I will refrain from going into too much depth here. I have used the library fastdtw[15] in my implementation.
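For concreteness, a minimal sketch of a DTW measure using the fastdtw library cited above; the report does not show the exact call the project uses, so the pointwise distance and inputs here are assumptions.

    # Compute a DTW similarity between two prepared (normalised, decimated)
    # waveforms with the fastdtw library.
    import numpy as np
    from fastdtw import fastdtw

    def dtw_similarity(a, b):
        # fastdtw returns the accumulated distance along the warping path and
        # the path itself; lower distance means more similar, matching the
        # convention used throughout this report.
        distance, path = fastdtw(np.asarray(a), np.asarray(b),
                                 dist=lambda p, q: abs(p - q))
        return distance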

Figure A.1: A fabricated example of the result of attempting to find the best match between two signals using DTW

A.2 The (Fast) Fourier Transform

The spectrum is not something we will immediately have access to upon reading in an audio file. We need to do some work to find it, which we can do by using a clever technique called the Fourier transform¹, discovered by Joseph Fourier in 1822 (though the concept had been considered much earlier). The idea is simple: it is the decomposition of a waveform into a sum of sine functions of particular frequencies. We are taking our signal from the time-amplitude domain to the frequency-amplitude domain. Then we can look at the dominant frequencies, which is something that is in general more consistent from song to song. However, in the mathematical formulation, and if we want to be perfectly precise, we have to use an infinite sum. It is possible to break this infinite sum down into a finite sum in some cases when doing a Fourier transform analytically, but in practice we tend not to bother, and as such we do not end up with a result that perfectly captures our original waveform, but it will be more than good enough. This can be done for any waveform; there are no special, absolute requirements for this to work. However, the Fourier transform does assume that what you are feeding it is a repeating signal, and works best when the audio signal contains an integer number of cycles of the repeating signal.

¹ A nice, deeper explanation of the Fourier transform than I will give in this report can be found at: http://www.med.harvard.edu/jpnm/physics/didactics/improc/intro/fourier2.html

Of course in practice we would be very lucky to meet this soft requirement, and so we have to deal with the issue of spectral leakage. Spectral leakage also occurs as a result of the fact that we only have samples of the signal at discrete intervals. Spectral leakage is the name given to frequencies "bleeding" over into neighbouring frequencies in the result of the Fourier transform. It can be reduced by applying a windowing function to the data before applying the Fourier transform[11], though there are trade-offs to be made; applying a windowing function will reduce the frequency resolution of the data. Whether you care about spectral leakage depends on the application. I have come to the conclusion that it does not matter for this application, given that we only care about the rough ranges of frequencies that one species of bird uses compared to another and not so much about the precise profiles (the leakage in one Fourier transform compared to another on a similar signal should be similar anyway), so I have not looked into it in great depth. That is, I have not run extensive tests to determine whether it would be advantageous to apply a windowing function, and chose to focus my efforts instead on making more general progress on the problem. I could be wrong about this, and I think further research into this is necessary to be able to make the claims I have just made with more confidence. As a simple illustration of the phenomenon, refer to figures A.2 and A.3.

Figure A.2: A plot of sin(x) for x from 0 to 2π, along with its spectrum

In the first plot you can see the single peak at 1, which is what we would expect. Doing the same thing for sin(x + 1), the sine function shifted along by 1, gives us the same result. But notice in figure A.3 what happens if we view this function in the range 0 to 2π + 1, which is not an integer number of periods. You can very clearly see the spectral leakage in the spectrum for this function. Again, whether that is a problem or not depends on the application and problem domain.

Figure A.3: A plot of sin(x) for x from 0 to 2π + 1, along with its spectrum

There is a particular algorithm called the Fast Fourier Transform (often abbreviated as FFT) that I use in my program for doing this conversion. Also note that the original waveform can always be recovered from the resulting frequencies using the inverse fast Fourier transform, abbreviated IFFT (though phase information may be lost in the process).
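The illustration in figures A.2 and A.3 can be reproduced with numpy's FFT along the following lines (a sketch, not the code used to produce the report's figures).

    # The spectrum of sin(x) sampled over exactly one period (0 to 2*pi)
    # versus a non-integer number of periods (0 to 2*pi + 1), where spectral
    # leakage appears.
    import numpy as np

    def spectrum(end, n=1024):
        x = np.linspace(0.0, end, n, endpoint=False)
        return np.abs(np.fft.rfft(np.sin(x)))   # magnitude of each frequency bin

    clean = spectrum(2 * np.pi)        # energy concentrated in a single bin
    leaky = spectrum(2 * np.pi + 1)    # energy smeared over neighbouring bins

    print(np.argmax(clean), clean.max())
    print(np.argmax(leaky), leaky.max())

    # The waveform can be recovered from the full (complex) FFT via the
    # inverse transform: np.fft.irfft(np.fft.rfft(samples), len(samples)).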
