Acoustic Scene Classification by Ensembling Gradient Boosting Machine and Convolutional Neural Networks

Acoustic Scene Classification by Ensembling Gradient Boosting Machine and Convolutional Neural Networks DCASE 2017 Eduardo Fonseca, Rong Gong, Dmitry Bogdanov, Olga Slizovskaia, Emilia Gomez and Xavier Serra Outline ● Introduction ● Proposed System & Results ● Summary 2 2 Introduction ● Acoustic Scene Classification (ASC) ⇀ 15 acoustic scenes system recording environment 3 Introduction ● Traditionally: feature engineering ⇀ feature extraction ⇀ classifier 4 Introduction ● Traditionally: feature engineering ● Nowadays: data-driven ⇀ feature extraction ⇀ learning representations ⇀ classifier 5 Introduction ● Traditionally: feature engineering ● Nowadays: data-driven ⇀ feature extraction ⇀ learning representations ⇀ classifier How about combining both approaches for ASC ? 6 Proposed System Freesound score splitting GBM Extractor aggregation late acoustic scene 10s segment fusion pre- score splitting CNN processing aggregation mel-spectrogram 7 Gradient Boosting Machine audio feature snippets vectors acoustic n Freesound n n score splitting GBM scene Extractor aggregation ● Freesound Extractor by ● http://essentia.upf.edu/documentation/freesound_extractor.html 8 Gradient Boosting Machine audio feature snippets vectors acoustic n Freesound n n score splitting GBM scene Extractor aggregation ● Gradient Boosting Machine: ⇀ effective in Kaggle ⇀ multiple weak learners (decision trees) 9 Gradient Boosting Machine audio feature snippets vectors acoustic n Freesound n n score splitting GBM scene Extractor aggregation ● Gradient Boosting Machine: ⇀ effective in Kaggle ⇀ multiple weak learners (decision trees) ⇀ added iteratively ● Implementation: ⇀ LigthGBM https://github.com/Microsoft/LightGBM 10 Gradient Boosting Machine audio feature snippets vectors acoustic n Freesound n n score splitting GBM scene Extractor aggregation ● Score aggregation: ⇀ averaging scores across snippets ⇀ argmax ● Results: ⇀ development set ⇀ 4-fold cross-validation provided ⇀ Accuracy: 80.8% 11 Convolutional Neural Network log-scaled T-F mel-spectrogram patches acoustic pre- n n score splitting CNN scene processing aggregation ● log-scaled mel-spectrogram ⇀ 128 bands 12 Convolutional Neural Network log-scaled T-F mel-spectrogram patches acoustic pre- n n score splitting CNN scene processing aggregation ● log-scaled mel-spectrogram ⇀ 128 bands ● Time splitting: ⇀ T-F patches 1.5s 13 Convolutional Neural Network log-scaled T-F mel-spectrogram patches pre- score acoustic splitting n CNN n processing aggregation scene 14 Convolutional Neural Network log-scaled T-F mel-spectrogram patches pre- score acoustic splitting n CNN n processing aggregation scene 15 Convolutional Neural Network log-scaled T-F mel-spectrogram patches acoustic pre- n n score splitting CNN scene processing aggregation ● Global time-domain pooling (Valenti, 2016) 16 Convolutional Neural Network ● Design of convolutional filters: ⇀ spectro-temporal patterns for ASC? ⇀ different rectangular filters (Pons, 2017) (Phan, 2016) 17 Convolutional Neural Network ● Design of convolutional filters: ⇀ spectro-temporal patterns for ASC? ⇀ different rectangular filters (Pons, 2017) (Phan, 2016) ⇀ multiple vertical filter shapes ( Q = 1, 2, 3, 4, 5 ) Q = 1 18 Convolutional Neural Network ● Design of convolutional filters: ⇀ spectro-temporal patterns for ASC? ⇀ different rectangular filters (Pons, 2017) (Phan, 2016) ⇀ multiple vertical filter shapes ( Q = 1, 2, 3, 4, 5 ) Q = 4 19 Recap ● Feature engineering: ⇀ Freesound Extractor ⇀ GBM ● Accuracy 80.8% 20 Recap ● Feature engineering: ● Data-driven ⇀ Freesound Extractor ⇀ log-scaled mel-spectrogram ⇀ GBM ⇀ CNN ● Accuracy 80.8% ● Accuracy: 79.9% 21 Recap ● Feature engineering: ● Data-driven: ⇀ Freesound Extractor ⇀ log-scaled mel-spectrogram ⇀ GBM ⇀ CNN ● Accuracy 80.8% ● Accuracy: 79.9% How different do they behave? 22 Models’ Comparison ● (Confusion matrix by GBM - Confusion matrix by CNN) 23 Models’ Comparison ● (Confusion matrix by GBM - Confusion matrix by CNN) GBM performs better CNN performs better 24 Models’ Comparison ● (Confusion matrix by GBM - Confusion matrix by CNN) GBM performs better CNN performs better 25 Models’ Comparison ● (Confusion matrix by GBM - Confusion matrix by CNN) GBM performs better CNN performs better 26 Models’ Comparison ● (Confusion matrix by GBM - Confusion matrix by CNN) GBM performs better CNN performs better 27 Models’ Comparison ● (Confusion matrix by GBM - Confusion matrix by CNN) GBM performs better CNN performs better 28 Models’ Comparison ● (Confusion matrix by GBM - Confusion matrix by CNN) 29 Late Fusion ● GBM: ⇀ prediction probabilities ● CNN: ⇀ softmax activation values 30 Late Fusion ● GBM: ⇀ prediction probabilities ● CNN: ⇀ softmax activation values ● Late fusion approach: ⇀ arithmetic mean + argmax ● System accuracy on development set: ⇀ 83.0 % 31 Results ● residential area vs park 32 Results ● residential area vs park ● tram vs train 33 Results ● residential area vs park ● tram vs train ● grocery store vs cafe/resto 34 Challenge Ranking ● accuracy drop ● outperforming baseline by absolute 6.3 % 35 Summary ● Ensemble of two models ● Simplicity of models: ⇀ GBM + out-of-box feature extractor ⇀ CNN using domain knowledge ⇀ providing complementary information ● Simple late fusion method ● Reasonable results although room for improvement ⇀ individual models ⇀ fusion approach 36 Thank you! 37 References ● H. Phan, L. Hertel, M. Maass, and A. Mertins, “Robust audio event recognition with 1-max pooling convolutional neural networks”, arXiv preprint arXiv:1604.06338, 2016. ● J. Pons, O. Slizovskaia, R. Gong, E. Gómez, and X. Serra, “Timbre Analysis of Music Audio Signals with Convolutional Neural Networks”, in 25th European Signal Processing Conference (EUSIPCO2017). ● M. Valenti, A. Diment, G. Parascandolo, S. Squartini, and T. Virtanen, “DCASE 2016 acoustic scene classification using convolutional neural networks,” in Proc. Workshop Detection Classif. Acoust. Scenes Events, 2016. 38.

Acoustic Scene Classification by Ensembling Gradient Boosting Machine and Convolutional Neural Networks

Arxiv:2006.04059V1 [Cs.LG] 7 Jun 2020

Xgboost: a Scalable Tree Boosting System

Downloaded from the Cancer Imaging Archive (TCIA)1 [13]

Gradient Boosting Meets Graph Neural Networks

Comparative Performance of Machine Learning Algorithms in Cyberbullying Detection: Using Turkish Language Preprocessing Techniques

Enhanced Prediction of Hot Spots at Protein-Protein Interfaces Using Extreme Gradient Boosting

Development and Evaluation of the Combined Machine Learning Models for the Prediction of Dam Inflow

On Interpretability of Artificial Neural Networks

An Intuitive Exploration of Artificial Intelligence

Comparison of Different U-Net Models for Building Extraction from High- Resolution Aerial Imagery Firat ERDEM, Ugur AVDAN

Imitation Learning in Relational Domains: a Functional-Gradient Boosting Approach

Episodic Exploration for Deep Deterministic Policies for Starcraft Micromanagement