Temporal Sensorfusion for the Classification of Bioacoustic Time
Total Page:16
File Type:pdf, Size:1020Kb
Abteilung Neuroinformatik Prof. Dr. Gunther¨ Palm Temporal Sensorfusion for the Classification of Bioacoustic Time Series Dissertation zur Erlangung des Doktorgrades Dr. rer. nat. der Fakultat¨ fur¨ Informatik der Universitat¨ Ulm von Christian R. Dietrich aus Hirschegg 2003 ii Amtierender Dekan: Prof. Dr. Friedrich W. von Henke Erster Gutachter Prof. Dr. Gunther¨ Palm Zweiter Gutachter PD Dr. Alfred Strey Tag der Promotion 25.06.04 Abstract Classifying species by their sounds is a fundamental challenge in the study of animal vocalisations. Most of existing studies are based on manual inspection and labelling of acoustic features, e.g. amplitude signals and sound spectra, which relies on the agreement between human experts. But during the last ten years, systems for the automated classification of ani- mal vocalisations have been developed. In this thesis a system for the classification of Orthoptera species by their sounds is described in great detail and the behaviour of this approach is demonstrated on a large data set containing sounds of 53 different species. The system consists of multiple classifiers, since in previous studies it has been shown, that for many applications the classification performance of a single classifier system can be improved by combining the decisions of multiple classifiers. To determine features for the individual classifiers these features have been selected manually and automatically. For the manual feature selec- tion, pattern recognition and bioacoustics are considered as two coher- ent interdisciplinary research fields. Hereby the sound production mecha- nisms of the Orthoptera reveals significant features for the classification to family and to species level. Nevertheless, we applied a wrapper feature selection method, the sequential forward selection, in order to determine further discriminative feature sets for the individual classifiers. In particular, this thesis deals with classifier ensemble methods for time series classification applied to bioacoustic data. Hereby a set of lo- cal features is extracted inside a sliding time window which moves over the whole sound signal. The temporal combination of local features and the combination over the feature space is studied. Static combining paradigms where the classifier outputs are simply com- bined through a fixed fusion mapping, and adaptive combining paradigms where an additional fusion layer is trained through a second supervised learning procedure are proposed and discussed. The decision template (DT) fusion scheme is an intuitive approach for such a trainable fusion iv Abstract scheme, which is typically applied to recognise static objects. During the second supervised learning step the DT algorithm uses confusion matrix data to adapt the fusion layer. In several empirical studies it has been shown that the classification performance of adaptive fusion schemes, par- ticularly for the so called decision template, is superior. Many linear trainable decision fusion mappings, e.g. the combination with the linear associative memory, the pseudoinverse matrix and the naive Bayes fusion scheme are based on the same idea. Links between these methods are given. However, regarding the classification of temporal se- quences these methods do not consider the temporal variation of the clas- sifier outputs. In order to deal with variations of classifier decisions within time se- ries we propose to calculate multiple decision templates (MDTs) per class. Two new methods called temporal decision templates (TDTs) and clustered decision templates (CDTs) are introduced and the behaviour of these new methods is discussed on real data from the field of bioacoustics and arti- ficially generated data. In contrast to the combination with decision tem- plates the multiple decision template approaches lead to an increased ex- pression power of the fusion layer. This enhances the classification perfor- mance for classification problems where the outputs of the individual clas- sifiers show different characteristic patterns, which is a typical behaviour in time series classification applications. Hereby several characteristic pat- terns may exist for the whole time series belonging to the same class or for the individual local decisions of a single time series. Whereas the TDT approach offers the advantage to learn several characteristic patterns for whole time series, the CDT approach is also able to learn different charac- teristic patterns over time. Such patterns have been found in the temporal domain of the Orthoptera. Acknowledgements This thesis was supported by the help and the assistance of many people whose efforts I hereby would like to gratefully acknowledge. First of all, I would like to thank my adviser Prof. Dr. Gunther¨ Palm for his steady support, for giving me indispensable motivation and for giving me the right ideas at the right time. His valuable advice and guidance will always be appreciated. I am greateful to PD Dr. Alfred Strey for serving as a reading committee member, who supported this work with valuable directions and a meticulous review of this manuscript. Especially, I would like to thank to my thesis adviser Dr. Friedhelm Schwenker who contributed to this thesis with his guidance, generous as- sistance and many fruitful discussions. Despite of his own tight schedule, he always found a large amount of time for writing many papers in the field of classifier fusion together with me, and for carefully reading this manuscript. Many thanks to the members of the DORSA project who improved this work in the field of bioacoustics, significantly. In particular, I would like to thank PD Dr. Klaus Riede, Dr. Sigfrid Ingrisch, Dr. Klaus G. Heller and Dr. Frank Nischk for providing their sound recordings, suggestions, fruitful discussions and the successful cooperation. Special thanks to all my present and former colleagues at the depart- ment of neural information processing for their friendliness and helpful- ness that makes working each single day a real pleasure for me. Valuable discussions with my colleagues improved this work considerably. I also want to thank Barbara Previer for carefully reading this manu- script and to Dr. Axel Baune for his support regarding LATEX. Last, but most important I am grateful to my family for all their pa- tience, support and loving. My son Daniel who makes the nights shorter but my days much brighter and my marvellous wife Dr. Katrin Dietrich for her encouragement and never-ending love. vi Acknowledgements Contents Abstract iii Acknowledgements v List of Figures xi List of Tables xv Abbreviation xvii Some Common Acronyms xix 1 Introduction 1 1.1 Motivation ............................. 2 1.2 Related Work ........................... 4 1.3 Major Contribution of this Thesis ................ 8 1.4 Outline of the Dissertation .................... 9 2 Classification Methods 11 2.1 Neural Networks ......................... 11 2.1.1 Radial Basis Function Networks ............ 12 2.1.2 Learning Vector Quantisation .............. 19 2.1.3 Fuzzy-K-Nearest Neighbour Classifiers ........ 23 2.1.4 Summary and Discussion ................ 25 2.2 Time Alignment and Pattern Matching ............. 26 2.2.1 Pre-Processing of Features ................ 26 2.2.2 Dynamic Time Warping ................. 28 2.2.3 Time Delay Neural Networks .............. 30 2.2.4 Partially Recurrent Neural Networks ......... 30 2.2.5 Hidden Markov Models ................. 32 2.2.6 Summary ......................... 32 viii Contents 3 Multiple Classifier Systems 35 3.1 Introduction ............................ 35 3.2 MCS Design Methodology and Terminology ......... 42 3.3 Feature Extraction in Time Series ................ 44 3.4 Three Architectures for the Classification of Time Series ... 45 3.4.1 Average Fusion ...................... 49 3.4.2 Probabilistic Fusion .................... 49 3.4.3 Percentiles ......................... 50 3.4.4 Voting ........................... 51 3.5 Decision Templates for Time Series Classification ....... 52 3.5.1 Decision Templates .................... 52 3.5.2 Multiple Decision Templates .............. 55 3.6 Links to Neural Network Training Schemes .......... 60 3.6.1 Linear Associative Memory ............... 60 3.6.2 Decision Templates .................... 62 3.6.3 Pseudo-Inverse Solution ................. 63 3.6.4 Naive Bayes Decision Fusion .............. 64 3.6.5 Discussion ......................... 64 3.7 Statistical Evaluation ....................... 66 3.8 Summary .............................. 66 4 Orthoptera Bioacoustics 69 4.1 Introduction ............................ 69 4.1.1 Automated Song Analysis ................ 71 4.1.2 Sound Production and Morphology of the Orthoptera 72 4.1.3 Acoustic Structure of the Orthoptera Sound ...... 75 4.1.4 The Acoustic Receptor System ............. 78 4.1.5 Acoustic Characteristics ................. 80 4.1.6 Normalisation of the Temperature Influence ..... 81 4.2 Hierarchical Feature Extraction in Bioacoustic Time Series . 85 4.2.1 Pre-processing ...................... 88 4.2.2 First Level Feature Extraction .............. 88 4.2.3 Signal Filtering ...................... 92 4.2.4 Signal Segmentation Through the Signal’s Energy .. 94 4.2.5 Local Features From Sequences of Pulses ....... 98 4.2.6 Global Features ...................... 110 4.2.7 Feature Extraction in Time Scales ............ 111 4.3 Data Set Description ....................... 113 4.4 Feature Evaluation ........................ 114 4.5 Feature Selection ......................... 118 4.6 Experiments and Results ..................... 122