Sparse Methods with Applications in Multivariate Signal Processing

Thomas Robert Diethe

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of the University of London.

Department of Computer Science University College London

2010

I, Thomas Robert Diethe, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis.

Abstract

This thesis details theoretical and empirical work that draws from two main subject areas: Machine Learning (ML) and Digital Signal Processing (DSP). A unified general framework is given for the application of sparse machine learning methods to multivariate signal processing. In particular, methods that enforce sparsity will be employed for reasons of computational efficiency, regularisation, and compressibility. The methods presented can be seen as modular building blocks that can be applied to a variety of applications. Application specific prior knowledge can be used in various ways, resulting in a flexible and powerful set of tools. The motivation for the methods is to be able to learn and generalise from a set of multivariate signals. In addition to testing on benchmark datasets, a series of empirical evaluations on real world datasets were carried out. These included: the classification of musical genre from polyphonic audio files; a study of how the sampling rate in a digital radar can be reduced through the use of Compressed Sensing (CS); analysis of human perception of different modulations of musical key from Electroencephalography (EEG) recordings; classification of genre of musical pieces to which a listener is attending from Magnetoencephalography (MEG) brain recordings. These applications demonstrate the efficacy of the framework and highlight interesting directions of future research.

Acknowledgements

To my parents, who have supported my education from start to finish, thank-you so much for giving me this opportunity. To my supervisor John Shawe-Taylor, whose breadth and depth of knowledge never ceases to amaze me, thank-you for your guidance. The research leading to the results presented here has received funding from the EPSRC grant agreement EP-D063612-1, “Learning the Structure of Music”.

Contents

List of Figures

List of Tables

1 Introduction
  1.1 Machine Learning
  1.2 Sparsity in Machine Learning
  1.3 Multivariate Signal Processing
  1.4 Application Areas
    1.4.1 Learning the Structure of Music
    1.4.2 Music Information Retrieval
    1.4.3 Automatic analysis of Brain Signals
    1.4.4 Additional Application Areas
    1.4.5 Published Works
  1.5 Structure of this thesis

2 Background
  2.1 Machine Learning
    2.1.1 Reproducing Kernel Hilbert Spaces
    2.1.2 Regression
    2.1.3 Loss functions for regression
    2.1.4 Linear regression in a feature space
    2.1.5 Stability of Regression
    2.1.6 Regularisation
    2.1.7 Sparse Regression
    2.1.8 Classification
    2.1.9 Loss functions for classification
    2.1.10 Maximum Margin classification

    2.1.11 Boosting
    2.1.12 Subspace Methods
    2.1.13 Multi-view Learning
  2.2 Digital Signal Processing (DSP)
    2.2.1 Bases, Frames, Dictionaries and Transforms
    2.2.2 Sparse and Redundant Signals
    2.2.3 Greedy Methods for Sparse Estimation
    2.2.4 Compressed Sensing (CS)
    2.2.5 Incoherence With Random Measurements
    2.2.6 Multivariate Signal Processing

3 Sparse Machine Learning Framework for Multivariate Signal Processing
  3.1 Framework Outline
  3.2 Greedy methods for Machine Learning
    3.2.1 Matching Pursuit Kernel Fisher Discriminant Analysis
    3.2.2 Nyström Low-Rank Approximations
  3.3 Kernel Polytope Faces Pursuit
    3.3.1 Generalisation error bound
    3.3.2 Experiments
    3.3.3 Bound Experiments
  3.4 Learning in a Nyström Approximated Subspace
    3.4.1 Theory of Support Vector Machine (SVM) in Nyström Subspace
    3.4.2 Experiments: Classification
    3.4.3 Experiments: Regression
  3.5 Multi-View Learning
    3.5.1 Kernel Canonical Correlation Analysis with Projected Nearest Neighbours
    3.5.2 Convex Multi-View Fisher Discriminant Analysis
  3.6 Conclusions and Further Work

4 Applications I
  4.1 Introduction
  4.2 Genre Classification
    4.2.1 MIREX
    4.2.2 Feature Selection
    4.2.3 Frame level features
    4.2.4 Feature Aggregation
    4.2.5 Algorithms
    4.2.6 Multiclass Linear Programming Boosting (LPBoost) Formulation (LPMBoost)
    4.2.7 Experiments

    4.2.8 Results
  4.3 Compressed Sensing for Radar
    4.3.1 Review of Compressive Sampling
    4.3.2 Application of CS To Radar
    4.3.3 Experimental Approach
    4.3.4 Results And Analysis
  4.4 Conclusions

5 Applications II
  5.1 Introduction
  5.2 Experiment 1: Classification of tonality from EEG recordings
    5.2.1 Participants
    5.2.2 Design
    5.2.3 EEG Measurements
    5.2.4 Data Preprocessing
    5.2.5 Feature Extraction
    5.2.6 Results
    5.2.7 Leave-one-out Analysis
  5.3 Discussion
  5.4 Experiment 2: Classification of genre from MEG recordings
    5.4.1 Participants
    5.4.2 Design
    5.4.3 Procedure
    5.4.4 Feature Extraction
    5.4.5 Results
    5.4.6 Discussion

6 Conclusions
  6.1 Conclusions
    6.1.1 Greedy methods
    6.1.2 Low-rank approximation methods
    6.1.3 Multiview methods
    6.1.4 Experimental applications
  6.2 Further Work
    6.2.1 Synthesis of greedy/Nyström methods and Multi-View Learning (MVL) methods
    6.2.2 Nonlinear Dynamics of Chaotic and Stochastic Systems
  6.3 One-class Fisher Discriminant Analysis
  6.4 Summary and Conclusions

A Mathematical Addenda

B Acronyms

Bibliography

List of Figures

2.1 Modularity of kernel methods
2.2 Structural Risk Minimisation
2.3 Minimisation onto norm balls
2.4 Some examples of convex loss functions used in classification
2.5 Common tasks in Digital Signal Processing

3.1 Diagrammatic view of the process of machine learning from multivariate signals
3.2 Diagrammatic representation of the Nyström method
3.3 Plot of generalisation error bound for different values of k using RBF kernels
3.4 Plot showing how the norm of the deflated kernel and the test error vary with k
3.5 Generalisation error bound for the ‘Boston housing’ dataset
3.6 Plot of f(ǫ) = (1 − ǫ/2) ln(1 + (ǫ/2)/(1 − ǫ)) − ǫ/2 and f(ǫ) = ǫ²/8 for ǫ ∈ [0, 0.5]
3.7 Error and run-time as a function of k on ‘Breast Cancer’ for NFDA, KFDA
3.8 Error and run-time as a function of k on ‘Flare Solar’ for NFDA, KFDA
3.9 Error and run-time as a function of k on ‘Bodyfat’ by KRR, NRR and KRR
3.10 Error and run-time as a function of k for ‘Housing’ by KRR, NRR and KRR
3.11 Diagrammatic view of the process of a) MSL, b) MVL and c) MKL
3.12 Plates diagram showing the hierarchical Bayesian interpretation of MFDA
3.13 Weights given by MFDA and SMFDA on the toy dataset
3.14 Average recall curves for 3 VOC 2007 datasets for SMFDA and PicSOM

4.1 Confusion Matrix of human performance on Anders Meng dataset d004
4.2 The modified receiver chain for CS radar
4.3 Fast-time samples of the stationary target
4.4 Range profiles of the stationary target
4.5 Fast-time samples constructed from largest three coefficients
4.6 Range profiles constructed from largest three coefficients
4.7 The range-frequency surfaces for the moving targets

4.8 Range-frequency surfaces for van target using CS
4.9 Range-frequency surfaces for person target using CS
4.10 DTW applied to the person target
4.11 DTW applied to the van target

5.1 Spider plot resulting from classification of genre using audio

List of Tables

2.1 Common loss functions and corresponding density models
2.2 Example of the dyadic sampling scheme

3.1 Error estimates and Standard Deviations (SDs) for MPKFDA on 13 benchmark datasets
3.2 Error estimates for MPKFDA on 3 high dimensional datasets
3.3 Number of examples and dimensions of each of the 9 benchmark datasets
3.4 MMSE and SDs for 9 benchmark datasets for KRR, KMP, KBP and KPFP
3.5 Error estimates and SDs for 13 benchmark datasets for SVM, NSVM
3.6 Error estimates and SDs for 13 benchmark datasets for KFDA, NFDA, and MPKFDA
3.7 MMSE and SDs for 7 benchmark datasets for KRR, NRR, and KMP
3.8 Test errors over ten runs on the toy dataset
3.9 BER and AP for four VOC datasets, for PicSOM, KFDA CV, ksum and MFDA
3.10 Leave-one-out errors for each subject for SVM, KCCA/SVM, and MFDA

4.1 Summary of results of the Audio Genre Classification task from MIREX 2005
4.2 An example of the augmented hypothesis matrix
4.3 Classification accuracy on ‘Magnatune 2004’ for AdaBoost, LPBoost and LPMBoost
4.4 Classification accuracy on ‘MENG(4)’ for AdaBoost, LPBoost, and LPMBoost
4.5 The normalized errors for the moving targets
4.6 Effect of Dynamic Time Warping (DTW)

5.1 Test errors for within-subject SVM classification
5.2 Test errors for leave-one-out SVM classification using linear kernels
5.3 Test errors for within-subject classification using KCCA/PNN and SVM classification
5.4 Test errors for leave-one-subject-out KCCA/PNN
5.5 MIDI features used for genre classification
5.6 Confusion matrix for classification of genre by participants

A.1 Table of commonly used mathematical symbols

Chapter 1

Introduction

1.1 Machine Learning

ML is a relatively young field that can be considered an extension of traditional statistics, with influences from optimisation, artificial intelligence, and theoretical computer science (to name but a few). One of the fundamental tenets of ML is statistical inference and decision making, with a focus on the prediction performance of inferred models and on exploratory data analysis. In contrast to traditional statistics, there is less focus on issues such as coverage (i.e. constructing an interval that, with a given level of confidence, can be stated to contain at least a specified proportion of the sample). In statistics, classical methods rely heavily on assumptions which are often not met in practice. In particular, it is often assumed that the data residuals are normally distributed, at least approximately, or that the central limit theorem can be relied on to produce normally distributed estimates. Unfortunately, when there are outliers in the data, classical (linear) methods often have very poor performance. This calls for theoretically justified non-linear methods which require fewer assumptions. This is the approach that will be taken throughout this thesis, with a focus on developing a computational methodology for efficient inference with empirical evaluation. This will be backed up through analysis drawn from statistical learning theory, which allows us to make guarantees about the generalisation performance (or other relevant properties) of particular algorithms given certain assumptions on the classes of data.

1.2 Sparsity in Machine Learning

In information theory, the concept of redundancy is defined as the total number of bits used to transfer a message minus the number of bits of actual information in the signal. In ML, redundancy appears in data in many forms. Perhaps the most common is noise - whether this is measurement noise or system noise - but there are also often domain specific sources of redundancy due to the nature of the data itself (i.e. high self-similarity) or to the way in which it is collected. In the particular application domains of interest in this thesis, namely multivariate signals, we are faced with potentially high levels of both of these types of redundancy. Whenever there is redundancy in a dataset, there is the potential for sparse representations. In its most literal form, sparsity may involve a reduction in the number of data dimensions (“dimensionality reduction”), or in the number of examples needed to represent a pattern (“sample compression”). These two types of sparsity are known as “primal” and “dual” sparsity respectively, due to the concept of duality from the optimisation community (see e.g. [1]). Both of these types of sparsity have attractive properties, including:

• data compression,
• subset or feature selection,
• statistical stability (in terms of the generalisation of patterns),
• robustness (i.e. to outliers or small departures from model assumptions),
• space efficiency, and
• faster computations (after learning).

One of the biggest drawbacks of sparse methods tends to be in terms of computational efficiency during learning. Much of the work in this thesis will be focussed on optimisation methods for sparse learning that are computationally efficient. The most well known examples of sparse methods in statistics and ML include methods such as the Least Absolute Shrinkage and Selection Operator (LASSO) [2] and the SVM [3], which are sparse in the primal and dual respectively. There are close relations between both of these methods as outlined by [4], and indeed with many other sparse methods such as LPBoost [5] and Kernel Basis Pursuit (KBP) [6]. Other classes of sparse methods include greedy methods such as Kernel Matching Pursuit (KMP) and methods based on random subsampling such as the Nyström method [7]. Chapter 2 will outline these and other methods and try to emphasise the linkage between them, whilst Chapter 3 builds on these methods to produce novel algorithms that are theoretically motivated and empirically validated.
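As a concrete illustration of the primal/dual distinction (a minimal sketch of my own, not taken from the thesis), the following Python snippet fits a LASSO model and an RBF-kernel SVM with scikit-learn on synthetic data. The LASSO is sparse in the feature weights (primal sparsity), whilst the SVM is sparse in the number of training examples retained as support vectors (dual sparsity). The dataset, parameter values and the use of scikit-learn are illustrative assumptions only.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(200, 50)                       # 200 examples, 50 features
w_true = np.zeros(50)
w_true[:5] = 1.0                             # only 5 informative features
y_reg = X @ w_true + 0.1 * rng.randn(200)    # regression targets
y_clf = np.sign(y_reg)                       # binary labels for the SVM

lasso = Lasso(alpha=0.1).fit(X, y_reg)
print("primal sparsity: non-zero weights =", int(np.sum(lasso.coef_ != 0)))

svm = SVC(kernel="rbf", C=1.0).fit(X, y_clf)
print("dual sparsity: support vectors =", int(svm.n_support_.sum()), "of", len(y_clf))
```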

1.3 Multivariate Signal Processing

As already alluded to, the specific class of data that will be the particular focus of this thesis is multivariate signals. The issues of redundancy and sparsity are particularly magnified within this domain, as the sensors used to gather the signals are often spatially proximal, and as a result their measurements are often highly correlated. In addition, many real-world signals are affected by a high degree of noise (which can be systemic noise or measurement noise). Finally, due to high rates of sampling and dense sensor grids, the data is often extremely high dimensional. It is therefore especially important that the methods used are capable of learning in this difficult domain. Standard batch or online ML methods often fall short when analysing signals because the data violates one of the basic assumptions: that the data is independently and identically distributed (i.i.d.). There are of course a range of ML methods that deal specifically with non-i.i.d. data, and in particular with time-series data, but the models are often highly complex and do not scale well to large datasets. In particular, these approaches often become intractable in the multivariate case - when we are dealing with large sets of signals (as is often the case in biological applications, for example). Another approach to take is to break the signal into “chunks”, perform a series of DSP operations on these chunks, and use the resulting data as examples in standard ML algorithms (a sketch of this is given below). Whilst the i.i.d. assumption is still violated, its impact is often softened as significant integration over time takes place. However care must be taken to avoid learning trivial relations due to this issue. The major benefit of this approach is that it means the problem of inference on signals can be “modularised”, i.e. broken into subproblems, to which highly developed methods from both DSP and ML can subsequently be applied. This approach will form the basis of the machine learning framework for multivariate signal processing that will be outlined in Chapter 3. The links between DSP and ML run very deep, often with the same mathematical methods being used for different applications. In essence, both fields are interested in the solutions to underdetermined problems, inverse problems, and sparse estimation (see e.g. [8]). This means that there is fertile ground for cross-pollination of ideas; for example in Section 3.2 I will show how “greedy” methods from DSP can be used to solve ML optimisation problems, and use statistical learning theory analysis to give guarantees on the performance of the resulting algorithms.
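A hypothetical sketch of the “chunking” approach described above is given here; it is not the feature extraction used in later chapters. A multivariate signal is sliced into (possibly overlapping) windows, a few simple DSP features are computed per channel, and each window then becomes one example for a standard ML algorithm. All names and parameter values are my own choices.

```python
import numpy as np

def windowed_features(signal, win_len, hop):
    """signal: array of shape (n_channels, n_samples)."""
    n_ch, n_samp = signal.shape
    examples = []
    for start in range(0, n_samp - win_len + 1, hop):
        chunk = signal[:, start:start + win_len]
        spectrum = np.abs(np.fft.rfft(chunk, axis=1))
        feats = np.concatenate([
            chunk.mean(axis=1),        # per-channel mean
            chunk.std(axis=1),         # per-channel energy proxy
            spectrum.argmax(axis=1),   # dominant frequency bin per channel
        ])
        examples.append(feats)
    return np.vstack(examples)         # one row (example) per window

X = windowed_features(np.random.randn(8, 10_000), win_len=256, hop=128)
print(X.shape)                         # (n_windows, 3 * n_channels)
```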

1.4 Application Areas

1.4.1 Learning the Structure of Music

The funding and therefore main application area for this thesis was the EPSRC project entitled “Learning the Structure of Music”, which encompasses three fields of science: music cognition, representation, and machine learning. The project was a collaborative effort between the Centre for Computational Statistics and Machine Learning at University College London, the Interdisciplinary Centre for Computer Music Research at the University of Plymouth, the Leibniz Institute for Neurobiology at the University of Magdeburg, and the Department of Computational Perception at the Johannes Kepler University Linz. The aims of the project were to develop models and tools that apply novel signal processing and machine learning techniques to the analysis of both musical data and brain imaging data on music cognition. The metrics of success for the project were in terms of both theoretical results and experimental results. Specifically, the goals were to deepen the understanding of the relationship between musical structure and musical performance, quantifiable by the ability to predict performer styles; to deepen the understanding of the relationship between musical structure and listening experience, quantifiable by the ability to predict patterns of brain activity; and to develop systems for generative performance and music composition, quantifiable by the ability to generate coherent musical performances and compositions. The experimental research that falls within the scope of this thesis seeks to find common patterns between the features extracted from polyphonic music, and the representation of musical structures within the brain through the use of EEG and MEG recordings. This thesis is therefore targeted at the first two of the three goals described above. To this end, the experimental research initially naturally followed two paths, namely the understanding of polyphonic audio signals and of brain activity recordings, before integrating the two to search for common patterns. Each of these stages will be described in detail.

1.4.2 Music Information Retrieval

In the first part of the research, the goal was to investigate techniques for extracting features from music in two forms: score-based representations (e.g. Musical Instrument Digital Interface (MIDI)), and polyphonic music (e.g. Waveform Audio File Format (WAVE) audio). As most musical pieces are not available in the former of these representations, and the signal processing required to extract information from polyphonic audio is much more complicated, the research focussed on polyphonic audio. When available, however, score-based representations provide a rich source of information, and this led to their use in later experiments involving human subjects. A broad range of audio features were considered, including musical structure, melody, harmony, chord sequences, or more general spectral or timbral characteristics. An initial survey of the field identified that classification of musical genre from audio files, as a fairly well researched area of music research, provided a good starting point. What would appear on the surface to be a relatively trivial task is in reality difficult for a number of reasons, not least that the concept of a genre is rather subjective and amorphous. However despite these shortcomings, useful progress has been made in this area, including insights into the types of features that are appropriate for this kind of task and the types of algorithm best suited to the classification problem. Chapter 4 describes research into this area, and includes a description of the novel approach taken, as well as a discussion of the complications unearthed by this research.

1.4.3 Automatic analysis of Brain Signals

Neuroscience, like many other areas of science, is experiencing a data explosion, driven by improvements in existing recording technologies such as EEG, MEG, Positron Emission Tomography (PET), and functional Magnetic Resonance Imaging (fMRI). The resulting increases in the quantity of data produced by these technologies have had a significant impact on basic and clinical neuroscience research. An analysis bottleneck is inevitable, as the collection of data using these techniques now outpaces the development of new methods appropriate for analysis of the data, and the dimensionality of the data increases as the sensors improve in spatial and temporal resolution.

1.4.4 Additional Application Areas

Traditional processing of digital radar relies on sampling at the Nyquist frequency - i.e. twice the frequency of the highest part of the bandwidth required. This requires extremely fast and expensive Analogue to Digital Conversion (ADC) equipment, often operating at rates of up to 1 GHz. Methods that can reduce the frequency at which the ADC operates, or alternatively increase the signal bandwidth whilst operating at the same frequency, would be of great benefit to the radar community. A form of Compressed Sensing (CS) known as Analogue to Information Conversion (AIC) [9, 10], which reduces the sampling frequency from the traditional Nyquist rate by sampling at the information rate rather than the rate required to accurately reproduce the baseband signal, will be applied to real radar data in Chapter 4.

1.4.5 Published Works

The following publications have resulted from this work, and will be referenced where appropriate in the text.

Peer reviewed technical reports

Diethe, T., & Shawe-Taylor, J. (2007). Linear Programming Boosting for the Classification of Musical Genre. Technical report presented at the NIPS 2007 workshop Music, Brain & Cognition. [11]

Diethe, T., Durrant, S., Shawe-Taylor, J., & Neubauer, H. (2008). Semantic Dimensionality Reduction for the Classification of EEG according to Musical Tonality. Technical report presented at the NIPS 2008 workshop Learning from Multiple Sources. [12]

Diethe, T., Hardoon, D.R., & Shawe-Taylor, J. (2008). Multiview Fisher Discriminant Analysis. Technical report presented at the NIPS 2008 workshop Learning from Multiple Sources. [13]

Peer reviewed conference papers

Diethe, T., Durrant, S., Shawe-Taylor, J., & Neubauer, H. (2009). Detection of Changes in Patterns of Brain Activity According to Musical Tonality. Proceedings of IASTED Artificial Intelligence and Applications. [14]

Diethe, T., Hussain, Z., Hardoon, D.R., & Shawe-Taylor, J. (2009). Matching Pursuit Kernel Fisher Discriminant Analysis. Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS) 2009, 5, 121-128. [15]

Diethe, T., Teodoru, G., Furl, N., & Shawe-Taylor, J. (2009). Sparse Multiview Methods for Classification of Musical Genre from Magnetoencephalography Recordings. Proceedings of the 7th Triennial Conference of the European Society for the Cognitive Sciences of Music (ESCOM 2009), Jyväskylä, Finland, online at http://urn.fi/URN:NBN:fi:jyu-2009411242. [16]

Diethe, T., & Hussain, Z. (2009). Kernel Polytope Faces Pursuit. Proceedings of ECML PKDD 2009, Part I, LNAI 5781, 290-301. [17]

Smith, G.E., Diethe, T., Hussain, Z., Shawe-Taylor, J., & Hardoon, D.R. (2010). Compressed Sampling For Pulse Doppler Radar. Proceedings of RADAR 2010. [18]

1.5 Structure of this thesis

The work in this thesis draws from several disparate areas of research, including digital signal processing, machine learning, statistical learning theory, psychology, and neuroscience. The next Chapter (2) will introduce some concepts from signal processing and machine learning that underlie the theoretical and algorithmic developments, which are linked together into a coherent framework in Chapter 3. The following two Chapters, 4 and 5, will describe the experimental work described above in more detail, with a focus on univariate and multivariate signal processing respectively. The final Chapter (6) concludes by giving some philosophical insights and discussion of intended future directions.

Chapter 2

Background

Abstract

In this chapter I will provide background information for the two main subject areas that form the basis of the thesis: Machine Learning and Signal Processing. Machine Learning is a field that has grown from other fields such as Artificial Intelligence, Statistics, Pattern Recognition, Optimisation, and Theoretical Computer Science. The core goal of the field is to find methods that learn statistical patterns within data that are generalisable to unseen data, using methods that are efficient and mathematically grounded. Signal processing is broader in the sense that there are multiple goals, such as control, data compression, data transmission, denoising, filtering, smoothing, reconstruction, identification, etc., but narrower in the sense that it (generally) focusses on time-series data (which can be continuous or discrete, real or complex, univariate or multivariate). Where these fields intersect, interesting challenges can be found that drive development in both fields.

2.1 Machine Learning

An important feature of most developments in the field of ML that is derived directly from a computer science background is the notion of modularity in algorithm design. Modular programming (also known as ‘Divide-and-Conquer’) is a general approach to algorithm design which has several obvious advantages: when a problem is divided into sub-problems, different teams/programmers/research groups can work in parallel, reducing programme development time; programming, debugging, testing and maintenance are facilitated; the size of modules can be reduced to a humanly comprehensible and manageable level; individual modules can be modified to run on other platforms; modules can be re-used within a programme and across programmes. In the context of ML, modularity exists due to the existence of so called kernel functions (which will be explained below), which allow the problem of learning to be decomposed into the following stages: preprocessing, feature extraction, kernel creation (or alternatively weak-learner generation - see Section 2.1.11), and learning. This flow is depicted in Figure 2.1.

Common to both ML and DSP is a desire not only to find solutions to problems, but also to do so efficiently.

[Block diagram: Data x_i → Pre-processing → Feature extraction → Kernel function K(x_i, x_j) → Subspace projection → Learning algorithm α_i → Output f(x_i)]

Figure 2.1: Modularity of kernel methods

Drawing from optimisation theory, much work revolves around trying to find more efficient methods for solving problems that are exactly correct or approximately correct. The choice of optimisation method often comes down to a trade-off between computation time and memory requirements, or alternatively between accuracy of solutions and the time it takes to achieve them. Much of the focus of the next Chapter will be on different optimisation methods to achieve sparse solutions in computationally efficient ways. These methods include convex optimisation, iterative “greedy” methods, and methods that involve random subsampling or random projections. Examples of each of these methods will be introduced later in this Chapter. ML deals with a wide variety of problems, from ranking of web-pages to learning of trading rules in financial markets. However the present focus will be on the more fundamental problems of classification, regression (function fitting and extrapolation), subspace learning and outlier detection. Many more complex tasks can be decomposed into these fundamental tasks, so it is important to focus on the foundations before building up to more complex scenarios. However common to all of the tasks is a focus on the generalisation ability of learnt models, so this will be the key metric upon which the empirical validation is grounded. The first part of the Chapter will introduce some of the basic concepts mentioned above, firstly ML methods: regression, classification, regularisation, margin maximisation, boosting, subspace learning, and MVL; following from this will be DSP concepts such as dictionaries, bases, sparse representations, multivariate signal processing, and compressed sensing. Theoretical insights from Statistical Learning Theory (SLT) will be used to justify the methods as they are introduced.

2.1.1 Reproducing Kernel Hilbert Spaces

Outside of ML, the Reproducing Kernel Hilbert Spaces (RKHS) method provides a rigorous and effective framework for smooth multivariate interpolation of arbitrarily scattered data and for accurate approximation of general multidimensional functions. Given a Hilbert space \mathcal{H} and an example x_i, the reproducing property can be stated as follows,

f(x_i) = \langle f, \kappa(x_i, \cdot) \rangle_{\mathcal{H}}    (2.1)

for every function f belonging to \mathcal{H}, where \kappa is the reproducing kernel. This property allows us to work in the implicit feature space defined only with the inner products, and is the key to kernel methods for ML. This allows inner products between nonlinear mappings \phi : x_i \to \phi(x_i) \in \mathcal{F} of x_i into a feature space \mathcal{F}, as long as the inner product \kappa(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle can be evaluated efficiently. In many cases, this inner product or kernel function (denoted by \kappa) can be evaluated much more efficiently than the feature vector itself, which can even be infinite dimensional in principle. A commonly used kernel function for which this is the case is the Radial Basis Function (RBF) kernel, which is defined as:

\kappa_{\mathrm{RBF}}(x_i, x_j) = \exp\left( -\frac{\|x_i - x_j\|^2}{2\sigma^2} \right).    (2.2)
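As a small illustration (mine, not the thesis'), Equation (2.2) can be evaluated on all pairs of examples to give the kernel matrix K without ever forming the feature map explicitly. The following Python sketch assumes NumPy and treats the rows of X as examples; the bandwidth value is arbitrary.

```python
import numpy as np

def rbf_kernel(X, Z, sigma=1.0):
    # squared Euclidean distances between every row of X and every row of Z
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Z**2, axis=1)[None, :]
                - 2.0 * X @ Z.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

X = np.random.randn(5, 3)
K = rbf_kernel(X, X)
print(K.shape, np.allclose(K, K.T))   # (5, 5) and symmetric
```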

2.1.2 Regression

Given a sample S containing examples x \in \mathbb{R}^n and labels y \in \mathbb{R}, let X = (x_1, \ldots, x_m)' be the input vectors stored in the matrix X as row vectors, where ' denotes the transpose of vectors or matrices.

Table A.1 in Appendix A is included as a reference for some of the more commonly used mathematical symbols.

The following assumptions will be made in order to aid presentation: the data is centered (or alternatively a column of ones can be added as an extra feature, which will function as the intercept); the data is generated i.i.d. according to an unknown but fixed distribution \mathcal{D}. Furthermore, a Gaussian noise model with zero mean is assumed.

2.1.3 Loss functions for regression

Before going on to give specific examples of learning algorithms for regression, it is worth introducing the different loss functions that are commonly used for regression, along with their relation to the noise model.

Defining the square loss as

\mathcal{L}_{2} = \|f(x) - y\|_2^2,    (2.3)

where \hat{y} = f(x) is the estimate of the outputs y. This is also known as the Gaussian loss, as minimising this loss gives the Maximum Likelihood solution if a Gaussian noise model is assumed. Alternatively we can denote the vector of slack variables \xi = |y - \hat{y}| as the differences between the true and estimated labels, and we divide by a half to make the algebra easier, giving

\mathcal{L}_{2} = \frac{1}{2}\|\xi\|_2^2.    (2.4)

The ℓ1 loss is similarly defined as,

\mathcal{L}_{1} = \|\xi\|_1,    (2.5)

whose minimisation leads to the Maximum Likelihood solution under a Laplacian noise model. Defining

Loss functional L(ξ) and corresponding density model p(ξ):

ǫ-insensitive:        L(ξ) = |ξ|_ǫ;                                   p(ξ) = 1/(2(1+ǫ)) exp(−|ξ|_ǫ)
Laplacian:            L(ξ) = |ξ|;                                     p(ξ) = (1/2) exp(−|ξ|)
Gaussian:             L(ξ) = ξ²/2;                                    p(ξ) = (1/√(2π)) exp(−ξ²/2)
Huber's robust loss:  L(ξ) = ξ²/(2σ) if |ξ| ≤ σ, |ξ| − σ/2 otherwise; p(ξ) ∝ exp(−ξ²/(2σ)) if |ξ| ≤ σ, exp(σ/2 − |ξ|) otherwise
Polynomial:           L(ξ) = |ξ|^d / d;                               p(ξ) = d/(2Γ(1/d)) exp(−|ξ|^d)

Table 2.1: Common loss functions and corresponding density models, adapted from [19]

a region of width ǫ around zero within which deviations are not penalised leads to the ǫ-insensitive loss,

\mathcal{L}_{\epsilon,1} = \max(\|\xi\|_1 - \epsilon, 0) = \|\xi\|_{\epsilon,1}, \quad \text{for the } \ell_1 \text{ noise model, and}    (2.6)
\mathcal{L}_{\epsilon,2} = \max(\|\xi\|_2 - \epsilon, 0) = \|\xi\|_{\epsilon,2}, \quad \text{for the } \ell_2 \text{ noise model.}    (2.7)

Some loss functions and their equivalent noise models are given in Table 2.1. For simplicity, the rest of this Section will use the square loss of Equation 2.3. However any of the loss functions given (or other loss functions not given due to space constraints) can be substituted to give different optimisation criteria. This approach is known as the General Linear Model (GLM). In all of the cases outlined here, the loss function is convex, which leads to optimisation problems that can be solved exactly. However, non-differentiable loss functions such as the linear loss or the ǫ-insensitive loss typically lead to problems that are harder to solve.
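The following short Python sketch (my own illustration, not code from the thesis) evaluates some of the losses in Table 2.1 pointwise on the residual ξ = y − ŷ; the parameter values are arbitrary.

```python
import numpy as np

def square_loss(xi):
    return 0.5 * xi**2                       # Gaussian noise model

def laplacian_loss(xi):
    return np.abs(xi)                        # Laplacian noise model

def eps_insensitive(xi, eps=0.1):
    return np.maximum(np.abs(xi) - eps, 0.0) # deviations within eps are not penalised

def huber(xi, sigma=1.0):
    quadratic = np.abs(xi) <= sigma
    return np.where(quadratic, xi**2 / (2 * sigma), np.abs(xi) - sigma / 2)

xi = np.linspace(-2.0, 2.0, 5)
print(square_loss(xi), eps_insensitive(xi), huber(xi))
```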

2.1.4 Linear regression in a feature space

Assume that data is generated according to a linear regression model,

y_i = x_i w + n_i,    (2.8)

where n_i is assumed to be an i.i.d. random variable (noise) with mean 0 and variance \sigma^2. Let X =

(x_1, \ldots, x_m)' be the input vectors stored in matrix X as row vectors, and y = (y_1, \ldots, y_m)' be a vector of outputs. Assume the square loss as defined in Equation 2.3, as this gives the Maximum Likelihood solution to the linear regression problem of Equation 2.8. Intuitively it makes sense, as the squaring of the errors places emphasis on larger errors whilst ignoring the sign. The formulation for linear regression that minimises this loss is then given by,

\min_{w} \mathcal{L}(X, y, w)    (2.9)
= \min_{w} \|Xw - y\|_2^2.    (2.10)

By differentiating with respect to w, equating to zero and rearranging, it can be seen that there is a closed form solution for w^*,

w^* = (X'X)^{-1} X'y,    (2.11)

provided that the matrix X′X is invertible. The dual of this optimisation is formed as follows,

\min_{\alpha} \|XX'\alpha - y\|_2^2    (2.12)
= \min_{\alpha} \|K\alpha - y\|_2^2,    (2.13)

which in turn has a closed form solution,

\alpha^* = (XX')^{-1} y    (2.14)
         = K^{-1} y,    (2.15)

again provided that the matrix XX′ is invertible. The function to test this model on a new data point is given by,

f(x_i) = K(i, \cdot)\,\alpha^*.    (2.16)

This kernel trick is based on the reproducing property introduced in Section 2.1.1, with the observation that in the equation to compute α∗ (2.24) as well as in the equation to evaluate the regression function

(2.16), all that is needed are the vectors xi in inner products with each other. It is therefore sufficient to know these inner products only, instead of the actual vectors xi. Observe that the kernel regression form of Equation 2.15 when used with an RBF kernel has higher capacity than the linear regression form of Equation 2.11, i.e. it allows for a richer class of functions to be learnt than by the standard linear model. Whilst this increase in capacity may be desirable if the data is not in fact linear, in the presence of noise this can cause problems due to the ability of the model to fit the noise (overfitting). In this situation, some form of capacity control is required.

2.1.5 Stability of Regression

In statistics, this capacity control can be seen through what is known as the bias variance trade-off [20]. Typically, a model with low capacity such as the linear model of Equation 2.11 will have high bias, as it will fit only a very restricted class of data, whilst the variance is low, as perturbing some of the data points will have little effect. In contrast, if a high capacity model is used, such as Equation 2.15 with the RBF kernel as defined in (2.2), the function can fit the data exactly (low bias), but if even a single data point is perturbed the function will change drastically (high variance). Hence it would be desirable to optimise the trade-off between these two in order to generate models with predictive power on new data. This is closely related to the concepts of overfitting and regularisation that will be discussed in Section 2.1.6.

McDiarmid's inequality [21], which is a generalisation of Hoeffding's inequality [22], is a result in probability theory that gives an upper bound on the probability that a function of multiple independent random variables deviates from its expected value. It follows the line of results on the law of large numbers developed by Chernoff in relation to the convergence of Bernoulli trials [23]. The risk associated with a function f is defined as the expectation of the loss function,

\mathcal{R} = \mathbb{E}_{(x,y) \in \{\mathcal{X} \times \mathcal{Y}\}}\left[ \mathcal{L}(f(x), y) \right],    (2.17)

and the empirical risk as the expectation over a particular sample S,

\hat{\mathcal{R}} = \mathbb{E}_{(x,y) \in S}\left[ \mathcal{L}(f(x), y) \right] = \frac{1}{m}\sum_{i=1}^{m} \mathcal{L}(f(x_i), y_i).    (2.18)

Given random variables x_i lying in the range [a_i, b_i], the probability that the empirical risk \hat{\mathcal{R}} differs from the true risk (or error) \mathcal{R} by a value \epsilon can be bounded as follows,

\Pr\left( |\hat{\mathcal{R}} - \mathcal{R}| \geq \epsilon \right) \leq 2\exp\left( -\frac{2m\epsilon^2}{(b-a)^2} \right).    (2.19)

This shows that the probability of a given deviation decays exponentially as the sample size increases. This gives us a clue that to learn well, the best thing that one can do is to increase the amount of data available. However, if this is not possible, the only other option is to control the capacity of the learning algorithm.
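As a toy illustration of how such a bound might be used (my own example, with losses assumed to lie in [0, 1]), the following snippet rearranges (2.19) to estimate how many examples are needed so that the deviation is below ǫ with probability at least 1 − δ.

```python
import numpy as np

def samples_needed(eps, delta, a=0.0, b=1.0):
    # from 2 * exp(-2 m eps^2 / (b - a)^2) <= delta
    return int(np.ceil((b - a)**2 * np.log(2.0 / delta) / (2.0 * eps**2)))

print(samples_needed(eps=0.05, delta=0.01))   # roughly 1060 examples
```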

Another viewpoint, introduced by Vapnik and Chervonenkis, is the notion of Structural Risk Minimisation (SRM) [24, 25, 26]. The real error \mathcal{R} is upper bounded by the empirical error \hat{\mathcal{R}} plus another value called the structural risk \mathcal{R}_S. The structural risk is a theoretical criterion that can be computed for certain classes of models and estimated in most other cases. The model that achieves the lowest upper bound is then chosen,

\mathcal{R} \leq \hat{\mathcal{R}} + \mathcal{R}_S.    (2.20)

The idea is to impose a structure on the class of admissible functions \mathcal{F}, such that each individual function f_j has lower capacity than the next, f_{j+1}. This is depicted diagrammatically in Figure 2.2. Another closely related approach to capacity control is regularisation, which will be discussed below in Section 2.1.6. If we choose to control the capacity using a class of functions with bounded norm, we are in fact using the set of regularised functions, which gives an additional justification for this type of regularisation.

[Figure 2.2: schematic plot of error against VC-dimension (h), showing the empirical error, the capacity term, the bound on the true error and its minimum R(f∗), over the nested function classes ... ⊂ f_{j−1} ⊂ f_j ⊂ f_{j+1} ⊂ ...]

Figure 2.2: Structural Risk Minimisation (adapted from [19]). The principle is to find the optimal function f^* that satisfies the trade-off between low capacity and low training error.

2.1.6 Regularisation

Inverse problems, such as (2.22) and (2.24), are often ill-posed. This is usually due to the condition number of the matrix to be inverted (a “bad” condition number is one in which the quotient between the maximal and minimal eigenvalues of Σ = X'X is large), meaning that the problem needs to be re-formulated for numerical treatment. Typically this involves including additional assumptions, such as smoothness of solutions. This process is known in the statistics community as regularisation, and Tikhonov regularisation is one of the most commonly used types of regularisation for the solution of linear ill-posed problems [27]. There is also a secondary reason why regularisation is important: overfitting. Overfitting occurs when an inferred model describes the noise in the data rather than the underlying pattern. Overfitting generally occurs when the complexity of the model is too high in relation to the quantity of data available (i.e. in terms of degrees of freedom). A model which has been overfit will generally have poor generalisation performance on unseen data. Tikhonov regularisation is defined as,

\min_{w} \|Xw - y\|_2^2 + \|\Lambda w\|_2^2,    (2.21)

where \Lambda is the Tikhonov matrix. Although at first sight the choice of the solution to this regularised problem may look artificial, the process can be justified from a Bayesian point of view. Note that for an ill-posed problem one must necessarily introduce some additional assumptions in order to get a stable solution. A statistical assumption might be that a-priori it is known that X is a random variable drawn from a multivariate normal distribution, which for simplicity is assumed to have mean zero and independent components with standard deviation \sigma_x. The data is also subject to noise, and we take the errors in y to be also independent with zero mean and standard deviation \sigma_y. Under these assumptions,

according to Bayes' theorem the Tikhonov-regularised solution is the most probable solution given the data and the a-priori distribution of X. The Tikhonov matrix is then \Lambda = \lambda I for Tikhonov factor

\lambda = \sigma_y / \sigma_x. Of course this Tikhonov factor is not known, so it must be estimated in some way. If the assumption of normality is replaced by assumptions of homoscedasticity and uncorrelated errors, and zero mean is still assumed, then the Gauss-Markov theorem implies that the solution is the minimum-variance unbiased estimate [28].

It is therefore justified to set the Tikhonov matrix to be a multiple of the identity, \Lambda = \lambda I; this method is known in the statistics and ML literature as Ridge Regression (RR).

Ridge Regression

The primal formulation for RR is therefore given by,

\min_{w} \|Xw - y\|_2^2 + \lambda\|w\|_2^2.    (2.22)

Similarly to (2.11), a closed form solution for RR exists,

w^* = (X'X + \lambda I)^{-1} X'y.    (2.23)

Using the duality theory of optimisation and the kernel trick once more, we obtain the following formulation for dual RR and hence Kernel Ridge Regression (KRR),

\min_{\alpha} \|XX'\alpha - y\|_2^2 + \lambda\|X'\alpha\|_2^2
= \min_{\alpha} \|K\alpha - y\|_2^2 + \lambda\,\alpha'K\alpha.    (2.24)

As with the unregularised case, there is again a closed form solution (this comes from the normal equation (K^2 + \lambda K)\alpha = Ky, so the closed form solution again depends on K, or equivalently XX', being invertible),

\alpha^* = (XX' + \lambda I)^{-1} y
         = (K + \lambda I)^{-1} y.    (2.25)
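A minimal Python sketch of Equation (2.25) is given below (my own illustration; it assumes NumPy and a kernel function such as the rbf_kernel sketch earlier in this chapter, and the regularisation value is arbitrary).

```python
import numpy as np

def fit_krr(K, y, lam=1.0):
    # alpha* = (K + lambda * I)^{-1} y, computed by solving a linear system
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def predict_krr(K_test_train, alpha):
    # f(x) = sum_i alpha_i * kappa(x, x_i)
    return K_test_train @ alpha

# Hypothetical usage (X_train, y_train, X_test assumed given):
#   K = rbf_kernel(X_train, X_train)
#   alpha = fit_krr(K, y_train, lam=0.1)
#   y_hat = predict_krr(rbf_kernel(X_test, X_train), alpha)
```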

2.1.7 Sparse Regression

There is, however, nothing in either the primal (2.22) or the dual (2.24) formulations that would give rise to sparsity in the solutions (w^* or \alpha^* respectively). If we have prior knowledge that the weight vector generating the data was sparse, or alternatively we want to perform feature selection or subset selection, the above formulations can be modified to give sparse solutions.

Figure 2.3: Depiction of minimisation onto the \ell_1 and \ell_2 norm balls in \mathbb{R}^2. Note that at the optimal solution on the \ell_1 ball, the first coefficient (x-axis) is zero, and hence the solution is sparse. Note also that this will almost never be the case for the \ell_2 norm.

Replacing the \ell_2-norm on the weights with the pseudo \ell_0-norm (the \ell_0 pseudo-norm of a vector is simply a count of its non-zero entries) leads to the following optimisation,

\min_{w} \|Xw - y\|_2^2 + \lambda\|w\|_0.    (2.26)

Finding this \ell_0 solution is known to be NP-hard. However, the \ell_1 optimisation problem

\min_{w} \|Xw - y\|_2^2 + \lambda\|w\|_1    (2.27)

is a convex quadratic programming problem, and is known to approximate the \ell_0 solution (under certain conditions the solutions are identical, see e.g. [29]). Since it is non-differentiable, unlike (2.11) or (2.22), there is no closed-form solution. The problem is variously known as the LASSO [30] and Basis Pursuit

(BP) [31]. The reason for the sparsity in \ell_1 solutions can be seen graphically in Figure 2.3. Methods for solving the LASSO problem include the forward stepwise regression algorithm [32], or the Least Angle Regression Solver (LARS) [2]. The LARS algorithm computes the full regularisation path, which is a piecewise linear function between \lambda = 0 and \lambda = \infty; this is a useful property if cross-validation (CV) is employed for model selection. Whilst the dual optimisation for LASSO can be formulated [33], it does not lend itself easily to “kernelisation” - i.e. the weights cannot easily be represented as a linear combination of the data points in the form w = X'\alpha. However, it is possible to perform “soft” kernelisation, where the inputs are simply replaced with the kernel matrix and the primal weight vector is replaced with the “soft” dual. This is the approach taken by [6] for the algorithm they call KBP, the formulation for which is,

\min_{\alpha} \|K\alpha - y\|_2^2 + \lambda\|\alpha\|_1,    (2.28)

which can then be solved using any of the methods used to solve (2.27).
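As an illustrative sketch of the “soft” kernelisation idea (mine, not code from the thesis), the kernel matrix can simply be handed to an off-the-shelf \ell_1 solver. Here scikit-learn's Lasso and lars_path are assumed to be available; note that scikit-learn scales the quadratic term by 1/(2m), so its alpha parameter corresponds to \lambda only up to that constant.

```python
import numpy as np
from sklearn.linear_model import Lasso, lars_path

def fit_kbp(K, y, lam=0.1):
    # Solve min_alpha ||K alpha - y||^2 + lam ||alpha||_1 by treating K as
    # the design matrix ("soft" kernelisation); no intercept for simplicity.
    model = Lasso(alpha=lam, fit_intercept=False, max_iter=10_000).fit(K, y)
    return model.coef_                     # sparse "soft dual" vector

def kbp_path(K, y):
    # Full piecewise-linear regularisation path, as computed by LARS
    lambdas, _, coefs = lars_path(K, y, method="lasso")
    return lambdas, coefs
```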


2.1.8 Classification

This Section will introduce methods for classification - i.e. where we want to separate our data into two or more classes. The most obvious way to do this is to create a discriminant function, and as such two methods will be introduced for creating such functions: Fisher Discriminant Analysis (FDA) and the margin-based approach of the Support Vector Machine (SVM). Following on from this, two further algorithms will be presented which are based on the notion of boosting - Adaptive Boosting (AdaBoost) and Linear Programming Boosting (LPBoost) - and it will be shown how they are related to the margin maximisation principle of the SVM and, in the case of LPBoost, to the LASSO approach described earlier.

Preliminaries

Assume we have a sample S containing examples x \in \mathbb{R}^n and labels y \in \{-1, 1\}. As before let X = (x_1, \ldots, x_m)' be the input vectors stored in matrix X as row vectors, and y = (y_1, \ldots, y_m)' be a vector of outputs, where ' denotes the transpose of vectors or matrices. For simplicity it will be assumed that the examples are already projected into the kernel defined feature space, so that the kernel matrix K has entries K[i, j] = \langle x_i, x_j \rangle.

2.1.9 Loss functions for classification

Before going on to give specific examples of learning algorithms for classification, as with the regression case it is worth introducing the different loss functions that are commonly used for classification. Again there is a focus on convex functions, as these lead to optimisation problems that can (in general) be solved exactly. Perhaps the simplest loss function for classification is the zero-one loss, defined as,

\mathcal{L} = \begin{cases} 0 & \text{if } y_i = \mathrm{sgn}(f(x_i)) \\ 1 & \text{otherwise.} \end{cases}    (2.29)

If the output of the classifier can be considered a confidence level, it may make sense to penalise larger errors more. A simple modification of the zero-one loss leads to the hinge loss,

\mathcal{L} = \begin{cases} 0 & \text{if } y_i f(x_i) \geq 1 \\ 1 - y_i f(x_i) & \text{otherwise,} \end{cases}    (2.30)

where f(x_i) \in \mathbb{R}. This in turn closely resembles the logistic loss, defined as

\mathcal{L} = \log(1 + \exp(-y_i f(x_i))).    (2.31)

The square loss, which is closely related to the square loss for regression, is defined as,

\mathcal{L} = (1 - y_i f(x_i))^2.    (2.32)

Finally, the linear loss, which relates to a Laplace noise model as it did for regression, is defined as,

\mathcal{L} = |1 - y_i f(x_i)|.    (2.33)

The relations between these loss functions can be seen graphically in Figure 2.4. These loss functions

[Figure 2.4: plot of loss against margin value for the zero-one, squared, linear and hinge losses.]

Figure 2.4: Some examples of convex loss functions used in classification. Note that the hinge loss follows the linear loss for margin values less than 1, and is zero otherwise. Also note that the hinge loss is a convex upper bound on the zero-one loss.

will play an important role in the rest of the discussion on classification. I will introduce FDA and its kernel equivalent, before showing how this can be cast as a convex optimisation problem using the square loss or the logistic loss.
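For reference, the surrogates of Figure 2.4 can be written as simple functions of the margin y f(x); the following is my own sketch, not code from the thesis.

```python
import numpy as np

def zero_one(margin):  return (margin <= 0).astype(float)
def hinge(margin):     return np.maximum(1.0 - margin, 0.0)
def logistic(margin):  return np.log1p(np.exp(-margin))
def square(margin):    return (1.0 - margin) ** 2
def linear(margin):    return np.abs(1.0 - margin)

margins = np.linspace(-1.0, 3.0, 9)
print(np.all(hinge(margins) >= zero_one(margins)))   # hinge upper-bounds zero-one
```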

Fisher Discriminant Analysis

We first review Kernel Fisher Discriminant Analysis (KFDA) in the form given by [3]. The Fisher discriminant chooses w to solve the following optimisation problem

\max_{w} \frac{w'X'yy'Xw}{w'X'BXw}    (2.34)

where B is a matrix incorporating the label information and the balance of the dataset as follows:

B = D - C^+ - C^-,

where D is a diagonal matrix with entries

D_{ii} = \begin{cases} 2m^-/m & \text{if } y_i = +1 \\ 2m^+/m & \text{if } y_i = -1 \end{cases}

and C^+ and C^- are given by

C^+_{ij} = \begin{cases} 2m^-/(mm^+) & \text{if } y_i = +1 = y_j \\ 0 & \text{otherwise} \end{cases}
\qquad
C^-_{ij} = \begin{cases} 2m^+/(mm^-) & \text{if } y_i = -1 = y_j \\ 0 & \text{otherwise.} \end{cases}

Note that for balanced datasets B will be close to the identity matrix I. The motivation for this choice is that the direction chosen maximises the separation of the means of each class scaled by the variances in that direction.

To solve this problem in the kernel defined feature space \mathcal{F} we first need to show that there exists a linear expansion w = \sum_{i=1}^{m} \alpha_i x_i of the primal weight vector w [34, 3]. This leads to the following optimisation problem:

\rho = \max_{\alpha} \frac{\alpha'XX'yy'XX'\alpha}{\alpha'XX'BXX'\alpha}    (2.35)
     = \max_{\alpha} \frac{\alpha'Kyy'K\alpha}{\alpha'KBK\alpha}
     = \max_{\alpha} \frac{\alpha'Q\alpha}{\alpha'KR\alpha}    (2.36)

where Q = Kyy'K and R = BK. The bias term b must be calculated separately, and there is no fixed way to do this. The most common method is to adjust b such that the decision boundary bisects the line joining the two centres of mass,

b = -0.5\, y'Xw = -0.5\, y'K\alpha.    (2.37)

The classification function for KFDA is then,

f(x_i) = \mathrm{sgn}(\langle w, x_i \rangle + b) = \mathrm{sgn}(K[:, i]'\alpha + b),    (2.38)

by substituting w = X'\alpha. There are several ways in which the optimisation problem (2.36) can then be solved. Some algebra shows that it can be solved as the generalised eigenproblem Q\alpha = \lambda KR\alpha, by selecting the \alpha corresponding to the largest generalised eigenvalue \lambda, or in closed form as given by [3], \alpha = R^{-1}y. Note that R is likely to be singular, or at best ill-conditioned, and so a regularised solution is obtained by substituting R = R + \mu I, where \mu is a regularisation constant. This is equivalent to imposing an \ell_2 penalty on the primal weight vector.
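A hedged Python sketch of this regularised closed-form solution is given below (my own illustration; the symbol \mu for the regularisation constant and its value are assumptions, and labels are taken to be a NumPy array of ±1).

```python
import numpy as np

def fit_kfda(K, y, mu=1e-3):
    """Closed-form regularised KFDA: alpha = (BK + mu*I)^{-1} y, b from (2.37)."""
    m = len(y)
    pos, neg = (y == 1), (y == -1)
    m_pos, m_neg = pos.sum(), neg.sum()
    # D is diagonal; B = D - C^+ - C^- as defined above
    B = np.diag(np.where(pos, 2.0 * m_neg / m, 2.0 * m_pos / m))
    B[np.ix_(pos, pos)] -= 2.0 * m_neg / (m * m_pos)
    B[np.ix_(neg, neg)] -= 2.0 * m_pos / (m * m_neg)
    alpha = np.linalg.solve(B @ K + mu * np.eye(m), y)
    b = -0.5 * y @ K @ alpha
    return alpha, b

def predict_kfda(K_test_train, alpha, b):
    # classification function of Equation (2.38)
    return np.sign(K_test_train @ alpha + b)
```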

However, it has been shown [35, 36] that it is possible to exploit the structure of (2.36) to formulate KFDA as a quadratic program. This is reviewed below.

Convex Fisher Discriminant Analysis

First note that any multiple of α is also a solution to (2.36). One can further use the observation that the matrix Q is rank one. This means that α′Ky can be fixed to any non-zero value, e.g. 2. By minimising the denominator, the following quadratic programme results,

\min_{\alpha} \alpha'KR\alpha
\quad \text{s.t. } \alpha'Ky = 2.    (2.39)

Casting the optimisation problem (2.36) as the convex optimisation problem (2.39) gives several advantages. Firstly, for large sample size m, solving the eigenproblem is very costly due to the size of Q and R. The convex formulation also avoids inverting R in the closed form solution, which can be unstable. It is also possible to introduce sparsity into the \alpha solutions through the use of a different regularisation operator. Finally, it will enable the extension of the formulation naturally to multiple views, which is not easily done otherwise (see Section 3.5.2 in the following Chapter). However the unintuitive matrix B still remains in this formulation. Using the fact that KFDA minimises the variance of the data along the projection, whilst maximising the separation of the classes, it is possible to proceed by characterising the variance within a vector of slack variables \xi \in \mathbb{R}^m. The variance can then be directly minimised as follows,

\min_{\alpha, \xi} \mathcal{L}(\xi) + \mathcal{P}(\alpha)
\quad \text{s.t. } K\alpha + \mathbf{1}b = y + \xi,
\quad \xi'e^c = 0 \text{ for } c = 1, 2,    (2.40)

where

e^c_i = \begin{cases} 1 & \text{if } y_i = c \\ 0 & \text{otherwise.} \end{cases}

\mathcal{L}(\cdot) and \mathcal{P}(\cdot) are the loss function and regularisation function respectively, as follows,

\mathcal{L}(\xi) = \|\xi\|_2^2,    (2.41)
\mathcal{P}(\alpha) = \alpha'K\alpha,    (2.42)

where: the first constraint forces the outputs onto the class labels whilst minimising their variance; the second constraint ensures that the label mean for each class is the label for that class, i.e. for \pm 1 labels, and that the average distance between the classes is two. It has been shown by [35] that any optimal solution \alpha of (2.40) is also a solution of (2.39). Note that now the bias term is explicitly in the optimisation, and therefore does not need to be calculated separately. The formulation (2.40) has appealing properties that will be used later.
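With the square loss (2.41) and the quadratic regulariser (2.42), problem (2.40) is an equality-constrained quadratic programme, so one way to sketch a solver (my own illustration, not the thesis' implementation; the weight mu on the regulariser and the ±1 label encoding are assumptions) is to eliminate ξ and solve the resulting KKT linear system directly.

```python
import numpy as np

def fit_convex_kfda(K, y, mu=1e-3):
    """Sketch of (2.40)-(2.42), with y a NumPy array of +1/-1 (class 1 = +1, class 2 = -1)."""
    m = len(y)
    A = np.hstack([K, np.ones((m, 1))])          # maps z = (alpha, b) to K alpha + 1 b
    Q = np.zeros((m + 1, m + 1))
    Q[:m, :m] = K                                # regulariser P(alpha) = alpha' K alpha
    E = np.vstack([(y == 1), (y == -1)]).astype(float)   # class indicator rows e^c
    C = E @ A                                    # constraints e^c' xi = 0  ->  C z = E y
    KKT = np.block([[2 * A.T @ A + 2 * mu * Q, C.T],
                    [C, np.zeros((2, 2))]])
    rhs = np.concatenate([2 * A.T @ y, E @ y])
    sol = np.linalg.lstsq(KKT, rhs, rcond=None)[0]
    alpha, b = sol[:m], sol[m]
    return alpha, b                              # predict with sgn(K[:, i]' alpha + b)
```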

2.1.10 Maximum Margin classification

Geometrically speaking, a maximum-margin hyperplane is a hyperplane that separates two sets of points such that it is equidistant from the closest point in each set and is perpendicular to the line joining these two points. In ML, the concept of large margins encompasses many different approaches to the classification of data from examples, including boosting, mathematical programming, neural networks, and the SVM. The key fact is that it is the margin (which can be viewed as a confidence level) of a classification, rather than a raw training error, that is used when training a classifier [37]. This is known as the hard margin SVM, in which the margin \gamma is maximised as follows,

\min_{w,b,\gamma} -\gamma    (2.43)
\text{s.t. } y_i(\langle w, \phi(x_i) \rangle + b) \geq \gamma, \quad i = 1, \ldots, m,
\quad \|w\|_2^2 = 1.

Note that this is equivalent to using the hinge loss defined in Equation (2.30). Cortes and Vapnik [38] modified the maximum margin idea (also known as hard margin) to allow for mislabeled examples. In the absence of a hyperplane that can split the positive and negative examples, the soft margin method chooses a hyperplane that splits the examples as cleanly as possible, while still maximizing the distance to the nearest cleanly split examples. The method introduces slack variables, ξi, which measure the degree of misclassification of the point xi. The objective function is then increased by a function which penalises non-zero ξi, and the optimisation becomes a trade off between a large margin and a small error penalty. The 2-norm soft margin SVM is defined as the following optimisation problem

\min_{w,b,\gamma,\xi} -\gamma + C\|\xi\|_2^2    (2.44)
\text{s.t. } y_i(\langle w, \phi(x_i) \rangle + b) \geq \gamma - \xi_i, \quad i = 1, \ldots, m,
\quad \|w\|_2^2 = 1,

where the parameter C controls the trade-off between maximising the margin and the size of the slack variables. The resulting algorithm is robust to noise in the data but not sparse in its solutions. In order to enforce sparsity, the \ell_1 norm is used once again, giving the 1-norm soft margin SVM,

\min_{w,b,\gamma,\xi} -\gamma + C\|\xi\|_1    (2.45)
\text{s.t. } y_i(\langle w, \phi(x_i) \rangle + b) \geq \gamma - \xi_i, \quad i = 1, \ldots, m,
\quad \|w\|_2^2 = 1,
\quad \xi_i \geq 0, \quad i = 1, \ldots, m.

The dual of this optimisation problem can then be derived, giving us the kernel formulation,

\min_{\alpha} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j \kappa(x_i, x_j)    (2.46)
\text{s.t. } \sum_{i=1}^{m} \alpha_i y_i = 0, \quad \sum_{i=1}^{m} \alpha_i = 1, \quad \text{and} \quad 0 \leq \alpha_i \leq C, \quad i = 1, \ldots, m,

The SVM in this form can be solved by quadratic programming, or alternatively via iterative methods such as the Sequential Minimal Optimisation (SMO) algorithm [39].
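In practice such dual problems are rarely handed to a generic QP solver. The sketch below (my own, using scikit-learn's SVC, which wraps an SMO-style LIBSVM optimiser and solves the standard C-SVM dual, a closely related formulation to (2.46)) illustrates the resulting dual sparsity on synthetic data; all values here are arbitrary.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = np.sign(X[:, 0] + X[:, 1])               # a simple separable toy problem

clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)   # gamma = 1/(2 sigma^2)
print("support vectors:", int(clf.n_support_.sum()), "of", len(y))
# clf.dual_coef_ holds the non-zero y_i * alpha_i, i.e. the sparse dual solution
```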

2.1.11 Boosting

The term boosting describes any meta-algorithm for performing supervised learning, in which a set of “weak learners” create a single “strong learner”. A weak learner is defined to be a classifier which is only slightly correlated with the true classification (i.e. slightly better than chance). By contrast, a strong learner is strongly correlated with the true classification [40]. Boosting algorithms are typically iterative, incrementally adding weak learners to a final strong learner. At every iteration, a weak learner learns the training data with respect to a distribution. The weak learner is then added to the current strong learner. This is typically done by weighting the weak learner in some manner, which is typically related to the weak learner’s accuracy. After the weak learner is added to the strong learner, the data is reweighted: examples that are misclassified gain weight and examples that are classified correctly lose weight. Thus, future weak learners will focus more on the examples that previous weak learners misclassified.

Adaboost

AdaBoost is the best known example of a boosting algorithm [41]. Without a-priori knowledge, small decision trees or decision stumps (decision trees with two leaves) are often used as the weak learners. The algorithm works by iteratively adding in the weak learner that minimises the error with respect to the distribution D_t at step t over the weak learners,

h^{(t)} = \arg\min_{h_j \in \mathcal{H}} \epsilon_t = \sum_{i=1}^{m} D_t(i)\,[y_i \neq h_j(x_i)],    (2.47)

and then updating the distribution by using the weighted error rate of the classifier h_j,

\alpha_t = \frac{1}{2}\log\left( \frac{1 - \epsilon_t}{\epsilon_t} \right),    (2.48)

as follows,

D_{t+1}(i) = \frac{D_t(i)\exp(-\alpha_t y_i h_t(x_i))}{Z},    (2.49)

where Z is a normalisation constant ensuring that \sum_{i=1}^{m} D_{t+1}(i) = 1. The paper [42] describes how the original AdaBoost method [41] can be extended to the multiclass case (the authors also consider the more general multi-label case, in which a single example may belong to any number of classes). One of the approaches taken, known as AdaBoost.MH, uses the Hamming loss of the hypotheses generated from \ell orthogonal binary classification problems. The Hamming loss can be regarded as an average of the error rate of h on these \ell binary problems. Formally, for each weak hypothesis h : X \to 2^Y, and with respect to a distribution D, the loss is

$$
\frac{1}{Z}\,\mathrm{E}_{(\mathbf{x},Y)\sim D}\left[\,|h(\mathbf{x})\,\Delta\, Y|\,\right], \qquad (2.50)
$$

where $\Delta$ denotes the symmetric difference, and the leading $1/Z$ ensures that values lie in $[0,1]$. The resulting algorithm, called AdaBoost.MH, maintains a distribution over examples $i$ and labels $\ell$. On round $t$, the weak learner accepts such a distribution $D_t$ and the training set, and generates a weak hypothesis $h_t : X \times Y \to \mathbb{R}$. This reduction leads to the choice of final hypothesis, which is

$$
H(\mathbf{x},\ell) = \operatorname{sgn}\left(\sum_{t=1}^{T}\alpha_t h_t(\mathbf{x},\ell)\right). \qquad (2.51)
$$

The algorithm for AdaBoost.MH is given in Algorithm 1.

Theorem 2.1.1. The reduction used to derive this algorithm implies a bound on the Hamming loss of the final hypothesis:

$$
E(H) \le \prod_{t=1}^{T} Z_t. \qquad (2.52)
$$

In the binary classification problem, the goal is to minimise

$$
Z_t = \sum_{i,\ell} D_t(i,\ell)\exp\left(-\alpha_t\, Y_{i,\ell}\, h_t(\mathbf{x}_i,\ell)\right) \qquad (2.53)
$$

on each round, where $i = 1,\ldots,m$ and $\ell = 1,\ldots,k$ ($m$ is the number of examples and $k$ is the number of classes). Since each $h_t$ is required to be in the range $[-1,+1]$, each $\alpha_t$ is chosen as follows,

$$
\alpha_t = \frac{1}{2}\log\frac{1+r_t}{1-r_t}, \qquad (2.54)
$$

where

$$
r_t = \sum_{i,\ell} D_t(i,\ell)\, Y_{i,\ell}\, h_t(\mathbf{x}_i,\ell). \qquad (2.55)
$$

This gives

$$
Z_t = \sqrt{1-r_t^2}, \qquad (2.56)
$$

and the goal of the weak learner becomes maximisation of $|r_t|$. The quantity $(1-r_t)/2$ is the weighted Hamming loss with respect to $D_t$.

Algorithm 1 AdaBoost.MH: A multiclass version of AdaBoost based on Hamming Loss
  Given training examples $(\mathbf{x}_1, Y_1),\ldots,(\mathbf{x}_m, Y_m)$, $Y_i \in \{+1,-1\}^{\ell}$, number of iterations $T$
  Initialise $D_0(i,\ell) = \frac{1}{m}$
  for $t = 1,\ldots,T$ do
    pass distribution $D_t$ to weak learner
    get weak hypothesis $h_t : X \times Y \to \mathbb{R}$
    choose $\alpha_t$ (based on performance of $h_t$)
    update $D_{t+1}(i,\ell) = D_t(i,\ell)\exp\left(-\alpha_t\, Y_{i,\ell}\, h_t(\mathbf{x}_i,\ell)\right)/Z_t$,
    where $Z_t$ is a normalisation factor chosen so that $D_{t+1}$ will be a distribution
  end for
  Output final hypothesis: $H(\mathbf{x},\ell) = \operatorname{sign}\left(\sum_{t=1}^{T}\alpha_t h_t(\mathbf{x},\ell)\right)$

⁴The authors also consider the more general multi-label case, in which a single example may belong to any number of classes.

To relate AdaBoost to the previous discussion of loss functions in Section 2.1.9, the statistical viewpoint is that boosting can be seen as the minimisation of a convex loss function over a convex set of functions [43]. Specifically, the loss being minimised is the exponential loss

$$
\mathcal{L} = \sum_{i=1}^{m}\exp(-y_i H(\mathbf{x}_i)), \qquad (2.57)
$$

where $H(\mathbf{x}_i) = \sum_{t=1}^{T} f_t(\mathbf{x}_i)$ is the final hypothesis.

Linear Programming Boosting (LPBoost)

Referring back to the 1-norm soft margin SVM in Equation (2.45), it is possible to perform the same optimisation using the weak hypothesis matrix $\mathbf{H}$, with entries $H_{ij} = y_i h_j(\mathbf{x}_i)$, which plays the same role as $y_i(\langle\mathbf{w},\phi(\mathbf{x}_i)\rangle + b)$. This would result in the following optimisation (written in matrix form),

$$
\begin{aligned}
\min_{\mathbf{w},\gamma,\boldsymbol{\xi}} \quad & -\gamma + C\mathbf{1}'\boldsymbol{\xi} \qquad (2.58)\\
\text{s.t.} \quad & \mathbf{H}\mathbf{w} \ge \gamma\mathbf{1} - \boldsymbol{\xi},\\
& \|\mathbf{w}\|_2^2 = 1,
\end{aligned}
$$

where $\mathbf{1}$ is the vector of all ones. Since the number of weak learners in the matrix $\mathbf{H}$ is potentially very large, it is logical to enforce sparsity in the primal weight vector $\mathbf{w}$, which can be done by replacing the $\ell_2$-norm constraint with an $\ell_1$-norm constraint. This results in the following linear programme,

$$
\begin{aligned}
\min_{\mathbf{w},\gamma,\boldsymbol{\xi}} \quad & -\gamma + C\mathbf{1}'\boldsymbol{\xi} \qquad (2.59)\\
\text{s.t.} \quad & \mathbf{H}\mathbf{w} \ge \gamma\mathbf{1} - \boldsymbol{\xi},\\
& \mathbf{1}'\mathbf{w} = 1,\\
& \mathbf{w} \ge \mathbf{0},\quad \boldsymbol{\xi} \ge \mathbf{0}.
\end{aligned}
$$

The dual of this optimisation can then be formulated as follows,

$$
\begin{aligned}
\min_{\boldsymbol{\alpha},\beta} \quad & \beta \qquad (2.60)\\
\text{s.t.} \quad & \mathbf{H}'\boldsymbol{\alpha} \le \beta\mathbf{1},\\
& \mathbf{1}'\boldsymbol{\alpha} = 1,\\
& \mathbf{0} \le \boldsymbol{\alpha} \le C\mathbf{1},
\end{aligned}
$$

with dual variables $\boldsymbol{\alpha}$ and $\beta$; the box constraints on the $\boldsymbol{\alpha}$ variables are due to the primal slack variables $\boldsymbol{\xi}$.

The paper by [5] describes an efficient algorithm called LPBoost that mimics a simplex-based method known as column generation in order to solve the optimisation problem (2.60). The simplex algorithm, first introduced by George Dantzig [44], is a method for finding the numerical solution of a linear programming problem. A simplex is a polytope of $n+1$ vertices in $n$ dimensions: a line segment on a line, a triangle on a plane, a tetrahedron in three-dimensional space, and so on.

The column generation method involves formulating the problem as if all possible weak hypotheses had already been generated, with the resulting labels becoming the new feature space of the problem. The task that is solved by boosting is to construct a learning function within this output space that minimises misclassification error and maximises the (soft) margin. The authors prove that, for the purposes of classification, minimising the 1-norm soft margin error function is equivalent to optimising a generalisation error bound. LPBoost has the advantages over gradient based methods (such as AdaBoost) that it converges in a finite number of iterations to a global solution that is optimal within the hypothesis space, and that these solutions are very sparse. The paper cites results that demonstrate that LPBoost performs competitively with AdaBoost on a variety of datasets. The authors also demonstrate that the algorithm is computationally tractable: for both small and large datasets, the computation of the weak learners outweighs the linear programme running time, which means that in general the time for LPBoost iterations is of the same order of magnitude as for AdaBoost, though slightly higher.

Many linear programmes are too large to consider all the variables explicitly. Since most of the variables will be zero in the optimal solution, only a subset of variables need to be considered. Column generation generates only variables which have the potential to improve the objective function (i.e. negative reduced cost). The problem being solved is split into two problems, known as the master problem and the subproblem. The master problem is the original problem with only a subset of variables, and the subproblem is a new problem created to identify a new variable. The objective function of the subproblem is the reduced cost of the new variable with respect to the current dual variables. In the dual form the constraints correspond to the weak learners. The algorithm proceeds by adding a weak learner and checking whether the linear programme is solved; if not, the weak learner that violates the constraints the most is found, and the process is repeated until no constraints are violated, at which point the globally optimal solution within the hypothesis space has been reached. Although each LPBoost iteration is typically slower than an AdaBoost iteration, LPBoost converges in far fewer iterations. The LPBoost algorithm is given in Algorithm 2.

Algorithm 2 LPBoost algorithm
  Given training examples $(\mathbf{x}_1,y_1),\ldots,(\mathbf{x}_m,y_m)$, $y_i \in \{+1,-1\}$, upper limit on weights $C$
  Initialise $\boldsymbol{\alpha} \leftarrow \frac{1}{m}\mathbf{1}$, $\mathbf{H} \leftarrow ()$
  while $\mathbf{H}'\boldsymbol{\alpha} > \beta$ do
    $h \leftarrow \arg\max_{h\in\mathcal{H}} \sum_{i=1}^{m} y_i\alpha_i h(\mathbf{x}_i)$
    $\mathbf{H} \leftarrow [\mathbf{H},\ h]$
    Update $\boldsymbol{\alpha}$: solve the linear programme
      $\arg\min \beta$
      s.t. $\mathbf{H}'\boldsymbol{\alpha} \le \beta\mathbf{1}$,
           $0 < \boldsymbol{\alpha} < C\mathbf{1}$.
  end while
  Set $\mathbf{w}$ to the Lagrangian multipliers
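The column generation loop can be sketched directly from Algorithm 2. The following illustrative implementation uses decision stumps as the hypothesis class and a generic LP solver for the restricted master problem; the stopping tolerance and the choice of weak learner are assumptions, not part of the original LPBoost specification.

```python
# Minimal sketch of LPBoost-style column generation on the dual (2.60).
import numpy as np
from scipy.optimize import linprog

def best_stump(X, y, alpha):
    """Weak learner maximising sum_i alpha_i y_i h(x_i), i.e. the most violated constraint."""
    best_score, best_pred = -np.inf, None
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = sign * np.where(X[:, j] > thr, 1, -1)
                score = np.sum(alpha * y * pred)
                if score > best_score:
                    best_score, best_pred = score, pred
    return best_score, best_pred

def lpboost(X, y, C=0.2, tol=1e-6, max_iters=100):
    m = len(y)
    alpha, beta = np.full(m, 1.0 / m), 0.0
    H = []                                        # rows hold y_i * h_t(x_i) for chosen h_t
    for _ in range(max_iters):
        score, pred = best_stump(X, y, alpha)
        if score <= beta + tol:                   # no dual constraint violated: optimal
            break
        H.append(y * pred)
        # Restricted master problem over variables (alpha_1..alpha_m, beta):
        # minimise beta  s.t.  sum_i alpha_i y_i h_t(x_i) <= beta for all chosen t,
        #                      sum_i alpha_i = 1,  0 <= alpha_i <= C.
        c = np.r_[np.zeros(m), 1.0]
        A_ub = np.hstack([np.array(H), -np.ones((len(H), 1))])
        A_eq = np.r_[np.ones(m), 0.0].reshape(1, -1)
        res = linprog(c, A_ub=A_ub, b_ub=np.zeros(len(H)), A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, C)] * m + [(None, None)])
        alpha, beta = res.x[:m], res.x[m]
        # As in Algorithm 2, the primal weights w are the Lagrange multipliers of the
        # H' alpha <= beta constraints (the duals of the inequality rows).
    return np.array(H), alpha, beta
```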

Although at first the boosting methods described above seem rather disjoint from the convex methods described under the general loss minimisation and regularisation framework, there are in fact distinct similarities. If one considers that a general ML principle is to minimise the regularised empirical loss:

$$
\min_{\boldsymbol{\alpha}}\ \mathcal{L} + \mathcal{P}(\boldsymbol{\alpha}), \qquad (2.61)
$$

it can be seen that there is a direct relation between LPBoost and the LASSO, which both use $\ell_1$ regularisation with differing loss functions (hinge loss and quadratic loss respectively), and between regularised forms of AdaBoost [45] (exponential loss) and the SVM (hinge loss). We can also see the relation between KRR and the convex formulation of KFDA given in Section 2.1.9, where the differences are only in the constraints. See for example [46, 47, 48] for recent discussions of this issue.

2.1.12 Subspace Methods

In standard single view subspace learning, a parallel can be drawn between subspace projections that are independent of the label space, such as Principal Components Analysis (PCA), and those that incorporate label information, such as Fisher Discriminant Analysis (FDA). PCA searches for the directions in the data that have the largest variance and projects the data onto a subset of these directions. In this way a lower dimensional representation of the data is obtained that captures most of the variance. PCA is an unsupervised technique and as such does not include label information of the data. For instance, given 2-dimensional data from two classes forming two long and thin clusters, such that the clusters are positioned in parallel and very closely together, the direction of greatest total variance, ignoring the labels, would be lengthwise along the clusters. For classification this would be a poor projection, because the labels would be evenly mixed; a much more useful projection would be orthogonal to the clusters, i.e. in the direction of least overall variance, which would perfectly separate the two classes. We would then perform classification in this 1-dimensional space. FDA would find exactly this projection. However, if classification is not the goal, but instead the goal is to take a subset of the principal axes of the training data and project both the train and test data into the space spanned by this subset of eigenvectors, then PCA performs this projection by maximising the following criterion,

$$
\begin{aligned}
\max_{\mathbf{w}} \quad & \mathbf{w}'\boldsymbol{\Sigma}\mathbf{w}, \qquad (2.62)\\
\text{s.t.} \quad & \|\mathbf{w}\|_2 = 1,
\end{aligned}
$$

where $\boldsymbol{\Sigma}$ is the covariance matrix of the centred data, i.e. $\boldsymbol{\Sigma} = \frac{1}{m}\sum_{i=1}^{m}(\mathbf{x}_i-\boldsymbol{\mu})(\mathbf{x}_i-\boldsymbol{\mu})'$⁵. The dual form of PCA can be formed as follows,

$$
\begin{aligned}
\max_{\boldsymbol{\alpha}} \quad & \boldsymbol{\alpha}'\mathbf{X}\mathbf{X}'\mathbf{X}\mathbf{X}'\boldsymbol{\alpha}, \qquad (2.63)\\
\text{s.t.} \quad & \boldsymbol{\alpha}'\mathbf{X}\mathbf{X}'\boldsymbol{\alpha} = 1.
\end{aligned}
$$

Using again the kernel trick, the nonlinear version of PCA known as Kernel Principal Components Analysis (KPCA) [49] is defined as follows,

$$
\begin{aligned}
\max_{\boldsymbol{\alpha}} \quad & \boldsymbol{\alpha}'\mathbf{K}^2\boldsymbol{\alpha}, \qquad (2.64)\\
\text{s.t.} \quad & \boldsymbol{\alpha}'\mathbf{K}\boldsymbol{\alpha} = 1.
\end{aligned}
$$

⁵The purpose of centering the data (transforming the data to z-scores) is to remove undesirable fluctuations. Part of the PCA solution is the minimisation of the sum of squared errors; overall, the goal is to find the best affine linear subspace.

Each of these problems can be solved efficiently as eigenproblems.
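For concreteness, the following is a minimal sketch of KPCA (Equation 2.64) solved as an eigenproblem on the centred kernel matrix; the RBF kernel and its width are assumptions made for the example.

```python
# Minimal sketch of Kernel PCA via eigendecomposition of the centred kernel.
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kpca(K, n_components=2):
    m = K.shape[0]
    J = np.eye(m) - np.ones((m, m)) / m          # centring matrix
    Kc = J @ K @ J                               # centre the data in feature space
    vals, vecs = np.linalg.eigh(Kc)              # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    alphas = vecs / np.sqrt(np.maximum(vals, 1e-12))  # scale so alpha' Kc alpha = 1
    return Kc @ alphas                           # projections of the training points

X = np.random.randn(100, 5)
Z = kpca(rbf_kernel(X, X), n_components=2)       # two-dimensional nonlinear embedding
```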

2.1.13 Multi-view Learning

Canonical Correlation Analysis (CCA), introduced by Harold Hotelling in 1936 [50], is a method for finding linear relationships between two sets of multidimensional variables. CCA makes use of two views of the same underlying semantic object to extract a common representation of the semantics. CCA can be viewed as finding basis vectors for two sets of variables such that the correlations between the projections onto these basis vectors, $x_a = \mathbf{w}_a'\phi_a(\mathbf{x})$ and $x_b = \mathbf{w}_b'\phi_b(\mathbf{x})$, are mutually maximised.

Defining the covariance between the two views as $\boldsymbol{\Sigma}_{ab}$ and the variances of the views as $\boldsymbol{\Sigma}_{aa}$ and $\boldsymbol{\Sigma}_{bb}$ respectively, we have the following optimisation problem,

$$
\begin{aligned}
\max_{\mathbf{w}_a,\mathbf{w}_b} \quad & \mathbf{w}_a'\boldsymbol{\Sigma}_{ab}\mathbf{w}_b \qquad (2.65)\\
\text{s.t.} \quad & \mathbf{w}_a'\boldsymbol{\Sigma}_{aa}\mathbf{w}_a = 1,\\
& \mathbf{w}_b'\boldsymbol{\Sigma}_{bb}\mathbf{w}_b = 1.
\end{aligned}
$$

The major limitation of CCA is its linearity, but the method can be extended to find nonlinear relationships using the kernel trick once again. Kernel Canonical Correlation Analysis (KCCA) is an implementation of this method that results in a nonlinear version of CCA. Each of the two views of the data is projected into a distinct feature space such that $\mathbf{w}_a = \mathbf{X}_a'\boldsymbol{\alpha}_a$ and $\mathbf{w}_b = \mathbf{X}_b'\boldsymbol{\alpha}_b$, before performing CCA in the new feature space. The dual form of CCA is

$$
\begin{aligned}
\max_{\boldsymbol{\alpha}_a,\boldsymbol{\alpha}_b} \quad & \boldsymbol{\alpha}_a'\mathbf{X}_a\mathbf{X}_a'\mathbf{X}_b\mathbf{X}_b'\boldsymbol{\alpha}_b \qquad (2.66)\\
\text{s.t.} \quad & \boldsymbol{\alpha}_a'\mathbf{X}_a\mathbf{X}_a'\mathbf{X}_a\mathbf{X}_a'\boldsymbol{\alpha}_a = 1,\\
& \boldsymbol{\alpha}_b'\mathbf{X}_b\mathbf{X}_b'\mathbf{X}_b\mathbf{X}_b'\boldsymbol{\alpha}_b = 1, \qquad (2.67)
\end{aligned}
$$

which leads to the kernelised form, KCCA,

$$
\begin{aligned}
\max_{\boldsymbol{\alpha}_a,\boldsymbol{\alpha}_b} \quad & \boldsymbol{\alpha}_a'\mathbf{K}_a\mathbf{K}_b\boldsymbol{\alpha}_b \qquad (2.68)\\
\text{s.t.} \quad & \boldsymbol{\alpha}_a'\mathbf{K}_a^2\boldsymbol{\alpha}_a = 1,\\
& \boldsymbol{\alpha}_b'\mathbf{K}_b^2\boldsymbol{\alpha}_b = 1,
\end{aligned}
$$

where $\mathbf{K}_a$ and $\mathbf{K}_b$ are the kernel matrices of the two views. There have been several successful experimental applications of KCCA on bilingual text corpora, firstly by [51] and later by [52]. In the latter study the authors compare the performance of KCCA with an alternative retrieval method based on the Generalised Vector Space Model (GVSM), which aims to capture correlations between terms by looking at co-occurrence information.

[Figure 2.5: Common tasks in Digital Signal Processing — analysis (spectral estimation, signal modelling) and filtering (adaptive filtering, array processing) of random signals.]

Their results show that KCCA outperforms GVSM in both content retrieval and mate retrieval tasks. Recent work [53] presents a novel method for solving CCA in a sparse convex framework using a greedy least squares approach, called Sparse Canonical Correlation Analysis (SCCA). Stability analysis using Rademacher Complexity is given for SCCA, which provides a bound on the quality of the patterns found. The authors demonstrate on a paired English-Spanish corpus that the proposed method is able to outperform KCCA with a tighter bound.
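A minimal sketch of (regularised) KCCA is given below, solving Equation 2.68 as a generalised eigenproblem. The regularisation parameter is an assumption added for the example, since the unregularised kernel problem is ill-posed when the kernel matrices are invertible.

```python
# Minimal sketch of regularised KCCA as a generalised eigenproblem.
import numpy as np
from scipy.linalg import eigh

def kcca(Ka, Kb, reg=0.1, n_components=1):
    m = Ka.shape[0]
    Z = np.zeros((m, m))
    # off-diagonal blocks carry the cross term Ka Kb of (2.68); the block-diagonal
    # matrix carries the (regularised) within-view constraints Ka^2 and Kb^2
    A = np.block([[Z, Ka @ Kb], [Kb @ Ka, Z]])
    B = np.block([[Ka @ Ka + reg * np.eye(m), Z],
                  [Z, Kb @ Kb + reg * np.eye(m)]])
    vals, vecs = eigh(A, B)                      # symmetric-definite generalised problem
    order = np.argsort(vals)[::-1][:n_components]
    alpha_a, alpha_b = vecs[:m, order], vecs[m:, order]
    return vals[order], alpha_a, alpha_b         # correlations and dual directions
```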

2.2 Digital Signal Processing (DSP)

In this Section the focus moves to the principles underlying DSP. It will become clear that there are many links between ML and DSP, and that both fields are able to draw on each other to bring novel advances. For the sake of brevity, it will be assumed that the Analogue to Digital Conversion (ADC) process has already taken place, and as such all of the signals under consideration are discrete with equal time steps. All of the theory is able to deal with unequal time steps, but the analysis becomes more complicated. However, some of the formulas used to describe quantities and operations will be given for continuous signals, as their presentation is more straightforward. Some common tasks in DSP are depicted in Figure 2.5. Within the scope of this thesis the primary concern is signal analysis, and hence spectral estimation and signal modelling, although many results can be carried over to filtering as well.

2.2.1 Bases, Frames, Dictionaries and Transforms

A frame of a vector space V with an inner product can be seen as a generalisation of the idea of a basis to sets which may be linearly dependent. More precisely, a frame is a set of elements of V which satisfy the following condition:

Frame condition: There exist two real numbers, $A$ and $B$, with $0 < A \le B < \infty$, such that for every $\mathbf{v} \in V$,

$$
A\|\mathbf{v}\|^2 \;\le\; \sum_{j}|\langle \mathbf{v}, e_j\rangle|^2 \;\le\; B\|\mathbf{v}\|^2.
$$

Parseval's identity is a fundamental result on the summability of the Fourier series of a function. Geometrically, it is the Pythagorean theorem for inner-product spaces.

Theorem 2.2.1 (Parseval's Theorem [54]). If $\{e_j : j \in J\}$ is an orthonormal basis of a Hilbert space $\mathcal{H}$, then for every $x \in \mathcal{H}$ the following equality holds:

$$
\|x\|^2 = \sum_{j \in J} |\langle x, e_j\rangle|^2.
$$

Although frames do not in general consist of orthonormal vectors, the frame representation of a vector may still satisfy Parseval's identity. The constants $A$, $B$ are called the lower and upper frame bounds respectively; when $A = B$ the frame is a tight frame. Fourier analysis represents any continuous finite-energy function $f(t)$ as a sum of sinusoidal waves $\exp(i\omega t)$,

$$
f(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{f}(\omega)\exp(i\omega t)\,d\omega. \qquad (2.69)
$$

The amplitude $\hat{f}(\omega)$ of each sinusoid is equal to its correlation with $f$, also called the Fourier transform,

$$
\hat{f}(\omega) = \int_{-\infty}^{\infty} f(t)\exp(-i\omega t)\,dt. \qquad (2.70)
$$

The more regular the function $f(t)$ is, the faster the decay of the amplitude $|\hat{f}(\omega)|$ as $\omega$ increases. If $f(t)$ is defined only over an interval, e.g. $[0,1]$, the Fourier transform becomes a decomposition into an orthonormal basis $\{\exp(i2\pi mt)\}_{m\in\mathbb{Z}}$ of $L_2[0,1]$⁶. If the signal is uniformly regular, then the Fourier transform can represent the signal using very few nonzero coefficients; hence this class of signal is said to be sparse in the Fourier basis. The wavelet basis was introduced by Haar [55] as an alternative way of decomposing signals into a set of coefficients on a basis. The Haar wavelet basis defines a sparse representation of piecewise regular signals, and has therefore received much attention from the image processing community. The piecewise constant function, or Haar atom, is defined as,

$$
\psi(t) =
\begin{cases}
1 & \text{if } 0 \le t < 0.5,\\
-1 & \text{if } 0.5 \le t < 1,\\
0 & \text{otherwise.}
\end{cases} \qquad (2.71)
$$

An orthonormal basis on $L_2$ can be formed by dilating and translating these atoms as follows,

$$
\Psi_{j,n}(t) = \frac{1}{\sqrt{2^j}}\,\psi\!\left(\frac{t - 2^j n}{2^j}\right), \qquad (j,n) \in \mathbb{Z}^2. \qquad (2.72)
$$

Thus far all definitions have been for continuous signals. That is because a dictionary can be created through dilations and translations of the single function $\psi$, but dilations and translations are not defined for discrete signals. The transition from continuous to discrete time must be done with great care to preserve important properties such as orthogonality.

The definition of a time-frequency dictionary $\Psi = \{\psi_\gamma\}_{\gamma\in\Gamma}$ is that it is composed of waveforms of unit norm ($\|\psi_\gamma\|_2 = 1$) which have a narrow spread in time ($u$) and frequency ($\sigma^2$). The choice of the dictionary $\Psi$ should, if possible, be based on knowledge of properties of the signal. One of the most common choices for a general class of real-world signals is the Gabor dictionary, as it can represent a wide range of smooth signals. The Chirp dictionary is a generalisation of the Gabor dictionary with an extra parameter (the chirp rate). Both of these will be described below, and empirical comparisons will be made between each method.

⁶$L_2[0,1]$ is the set of functions $f$ such that $\int_0^1 |f(t)|^2\,dt < \infty$.

Gabor Dictionary

Gabor time-frequency atoms are scaled, translated and modulated Gaussian functions $g(t)$ (Gabor atoms) [56]. Without loss of generality, discrete real Gabor atoms will be considered, which are given by

$$
g_{\gamma,\phi}(t) = \frac{1}{Z}\, g\!\left(\frac{t-u}{s}\right)\cos(\theta t + \phi), \qquad (2.73)
$$

where $Z$ is a normalisation factor (to ensure that each atom has $\|g_{\gamma,\phi}\| = 1$), $\gamma = (s_n, u_n, \theta_n)$ denotes the series of parameters of the functions of the dictionary, and $g(t) = \exp(-\pi t^2)$ is the Gaussian window.

Chirp Dictionary

Chirp atoms were introduced to deal with the nonstationary behaviour of the instantaneous frequency of some signals, and were shown to form an orthonormal basis [57]. In the present analysis only linear chirps are required for the empirical applications provided later. A real chirp atom is then given by

$$
g_{\gamma,\phi,c}(t) = \frac{1}{Z}\, g\!\left(\frac{t-u}{s}\right)\cos\!\left(\theta(t-u) + \frac{c}{2}(t-u)^2 + \phi\right), \qquad (2.74)
$$

where $c$ is the chirp rate and all other parameters are the same as for the real Gabor atom. The chirp atom has an instantaneous frequency $\omega(t) = \theta + c(t-u)$ that varies linearly with time.

Dyadic Sampling

A sampling pattern is dyadic if the daughter wavelets are generated by dilating the mother wavelet as in Equation 2.72 by $2^j$ and translating it by $k2^j$, i.e. $s = 2^j$, $u = k2^j$. Dyadic sampling is optimal because the space variable is sampled at the Nyquist rate for any given frequency. The dictionary is then defined as,

$$
\Psi_{j,\Delta} = \{\psi_n = g_{\gamma,\phi}(t)\}_{0 \le q < \Delta N 2^{-j},\ 0 \le k < \Delta 2^{j}}, \qquad (2.75)
$$

where $g_{\gamma,\phi}(t)$ is the discrete Gabor atom or Chirp atom as defined above in Equations 2.73 and 2.74 respectively. An example of this sampling scheme is given in Table 2.2 for a signal of length 128 and dilation factor $\Delta = 2$.

j    2^j    2^{-j}    N 2^{-j}    q        k
2    4      1/2       64          0:128    0:8
3    8      1/4       32          0:64     0:16
4    16     1/8       16          0:32     0:32
5    32     1/16      8           0:16     0:64
6    64     1/32      4           0:8      0:128

Table 2.2: Example of the dyadic sampling scheme for a signal of length 128 and $\Delta = 2$.
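A minimal sketch of constructing such a dictionary is given below; the grid of phases and frequencies is an assumption made for illustration, and only the scale/translation sampling follows the dyadic pattern of Table 2.2.

```python
# Minimal sketch of a discrete real Gabor dictionary on a dyadic grid
# (Equations 2.73 and 2.75).
import numpy as np

def gabor_atom(N, s, u, theta, phi=0.0):
    t = np.arange(N)
    g = np.exp(-np.pi * ((t - u) / s) ** 2)       # Gaussian window g((t - u)/s)
    atom = g * np.cos(theta * t + phi)
    return atom / np.linalg.norm(atom)            # atoms are required to have unit norm

def dyadic_gabor_dictionary(N=128, delta=2, n_freqs=8):
    atoms = []
    j = 2
    while 2 ** j < N:
        s = 2 ** j                                # dyadic scales s = 2^j
        for u in range(0, N, max(1, s // delta)): # dyadic translations
            for theta in np.pi * np.arange(1, n_freqs + 1) / n_freqs:
                atoms.append(gabor_atom(N, s, u, theta))
        j += 1
    return np.array(atoms).T                      # columns of Psi are the atoms

Psi = dyadic_gabor_dictionary()                   # shape (N, number of atoms)
```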

2.2.2 Sparse and Redundant Signals

As with ML, finding sparse solutions to underdetermined inverse problems is a fundamental challenge encountered in a wide range of DSP applications, from signal acquisition to source separation. Recent theoretical advances in our understanding of this problem have further increased interest in its application to various domains. In many areas, such as medical imaging or geophysical data acquisition, it is necessary to find sparse solutions to very large underdetermined inverse problems, which therefore require fast methods. The decomposition of a signal $\mathbf{x}$ into a dictionary $\Psi \in \mathbb{R}^{n\times p}$ solves the following problem,

Ψα = x. (2.76)

If the dictionary is a tight frame, the simplest solution to this would then be the inverse problem

$$
\boldsymbol{\alpha} = \Psi^{-1}\mathbf{x}. \qquad (2.77)
$$

If additionally all of the atoms of the dictionary are orthonormal then $\Psi^{-1} = \Psi'$. However, in most practical applications the dictionary is designed to be overcomplete, i.e. $p \gg n$, and hence there are many possible solutions to this inverse problem. The method of frames [58] uses the minimum $\ell_2$-norm solution (also called the minimum energy or minimum length solution):

$$
\begin{aligned}
\min_{\boldsymbol{\alpha}} \quad & \|\boldsymbol{\alpha}\|_2^2 \qquad (2.78)\\
\text{s.t.} \quad & \mathbf{x} = \Psi\boldsymbol{\alpha}.
\end{aligned}
$$

It can be seen that this is equivalent to the least squares solution to the regression problem as defined in

Equation 2.11, and that it likewise has a closed form solution $\boldsymbol{\alpha} = (\Psi'\Psi)^{-1}\Psi'\mathbf{x}$. However, the unknown (not sampled) coefficients seldom have zero energy. A more attractive solution would be minimising the $\ell_0$-norm, or equivalently maximising the number of zero coefficients in the new basis:

$$
\begin{aligned}
\min_{\boldsymbol{\alpha}} \quad & \|\boldsymbol{\alpha}\|_0 \qquad (2.79)\\
\text{s.t.} \quad & \mathbf{x} = \Psi\boldsymbol{\alpha}.
\end{aligned}
$$

However, this is NP-hard (it contains the subset-sum problem), and so is computationally infeasible for all but the smallest datasets. Thus, following [59], the $\ell_1$-norm is usually what is minimised. This leads to comparable results to using the $\ell_0$-norm, often yielding solutions with many coefficients being zero,

$$
\begin{aligned}
\min_{\boldsymbol{\alpha}} \quad & \|\boldsymbol{\alpha}\|_1 \qquad (2.80)\\
\text{s.t.} \quad & \mathbf{x} = \Psi\boldsymbol{\alpha}.
\end{aligned}
$$

This method is known as Basis Pursuit (BP) [31]. Note that if we bring the constraint into the optimisation using a Lagrange multiplier, this is in fact equivalent to the LASSO problem for regression that was defined earlier in Equation 2.27.
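The following sketch solves (2.80) as a linear programme by splitting the coefficient vector into positive and negative parts; the random dictionary and the sparse ground truth are assumptions made for illustration only.

```python
# Minimal sketch of Basis Pursuit (2.80) as a linear programme.
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Psi, x):
    n, p = Psi.shape
    c = np.ones(2 * p)                            # minimise sum(alpha+) + sum(alpha-)
    A_eq = np.hstack([Psi, -Psi])                 # Psi (alpha+ - alpha-) = x
    res = linprog(c, A_eq=A_eq, b_eq=x, bounds=[(0, None)] * (2 * p))
    return res.x[:p] - res.x[p:]                  # recover alpha = alpha+ - alpha-

rng = np.random.default_rng(0)
Psi = rng.standard_normal((64, 256))              # overcomplete dictionary, p >> n
alpha_true = np.zeros(256)
alpha_true[rng.choice(256, 5, replace=False)] = 1.0
alpha_hat = basis_pursuit(Psi, Psi @ alpha_true)  # recovers the sparse coefficients
```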

2.2.3 Greedy Methods for Sparse Estimation

There are other ways to approximate the $\ell_0$ solution, such as by greedy iterative methods. These include (but are not limited to) Matching Pursuit (MP), Orthogonal Matching Pursuit (OMP) [56], Polytope Faces Pursuit (PFP) [60, 61] and, more recently, methods based on non-convex penalties and Difference of Convex (DC) programming [62, 63]. There are also many modifications of each of these methods, including stepwise approaches that bring more than one basis into the solution at each step. For brevity these will not be covered here, but they offer an interesting path for possible modifications of algorithms based on these methods.

Matching Pursuit and Orthogonal Matching Pursuit

Matching Pursuit (MP) was proposed as an attempt at finding a sparse set of basis functions (atoms) for a signal from a given dictionary [56]. In many ways this problem can be interpreted as a sparse version of least squares regression when the Orthogonal Matching Pursuit (OMP) version is applied [64]. In OMP each time a dictionary atom is chosen, the remaining weight vectors are projected into a space orthogonal to those chosen such that future atoms are only considered from a set far from those already picked. To link back to ML once again, as with Kernel Basis Pursuit (KBP), Kernel Matching Pursuit (KMP) [65] has been proposed as the kernel counterpart of MP.

Given a signal $f$ and a dictionary $\Psi = \{\psi_p\}_{p\in\Gamma}$, $|\Gamma| \gg n$, of atoms with unit norm, MP begins by initialising the residue $r_0 = f$, and then iterates by projecting the function $f$ onto all of the vectors $\psi_p \in \Psi$ and computing their residues $r$,

$$
f = \alpha_p\psi_p + r, \qquad p = 1,\ldots,|\Gamma|, \qquad (2.81)
$$

implying that $\alpha_p = \langle f, \psi_p\rangle$. The atom $\psi_i$ with the maximum inner product is then selected, along with its weight $\alpha_i$,

$$
\begin{aligned}
i &= \arg\max_{p\in\Gamma} \alpha_p, \qquad (2.82)\\
\alpha_t &= \alpha_i,\\
\psi_t &= \psi_i.
\end{aligned}
$$

The residue is then updated as follows,

$$
r_{t+1} = r_t - \alpha_t\psi_t. \qquad (2.83)
$$

The final solution is then given by $\sum_{t=1}^{T}\alpha_t\psi_t$, which can be shown to converge to the optimal solution given that the dictionary forms a tight frame [56]. MP approximations are improved by orthogonalising the directions of the projection using a Gram-Schmidt procedure [66]. The resulting pursuit then converges within a finite number of iterations $T$ instead of in the limit, which balances the fact that the orthogonalisation is expensive to compute. The Gram-Schmidt algorithm orthogonalises $\psi_p$ with respect to the previously selected atoms $\{\psi_q\}_{q\in P}$ as follows,

$$
\hat{\psi}_p = \psi_p - \sum_{q\in P}\frac{\langle \psi_p, \psi_q\rangle}{\|\psi_q\|_2^2}\,\psi_q. \qquad (2.84)
$$

The orthogonalised version of the atom, $\hat{\psi}_p$, is then used for calculation of the residue. The next Section describes a further modification of the MP/OMP framework that makes use of the geometry of the solution space.
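A minimal sketch of OMP is given below. Rather than explicit Gram-Schmidt, the coefficients of all selected atoms are refitted by least squares at each step, which achieves the same orthogonalisation of the residue; the dictionary is assumed to have unit-norm columns.

```python
# Minimal sketch of Orthogonal Matching Pursuit.
import numpy as np

def omp(Psi, x, n_atoms):
    residue = x.copy()
    selected = []
    coef = np.array([])
    for _ in range(n_atoms):
        correlations = Psi.T @ residue            # alpha_p = <r, psi_p> for every atom
        i = int(np.argmax(np.abs(correlations)))  # best matching atom
        if i not in selected:
            selected.append(i)
        # refit all selected coefficients: orthogonal projection of x onto their span
        coef, *_ = np.linalg.lstsq(Psi[:, selected], x, rcond=None)
        residue = x - Psi[:, selected] @ coef     # residue orthogonal to selected atoms
    alpha = np.zeros(Psi.shape[1])
    alpha[selected] = coef
    return alpha
```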

Polytope Faces Pursuit

The algorithm Polytope Faces Pursuit (PFP) [61] is based on the geometry of the polar polytope [60] where at each step a basis function is chosen by finding the maximal vertex using a path-following method.

Further investigation of the criteria under which $\ell_0/\ell_1$ equivalence holds led to consideration of the $d$-dimensional polytope (the $d$-dimensional generalisation of a polygon) [60]. Using this geometric interpretation, a greedy algorithm called PFP has been proposed [67] which adopts a path-following approach through the relative interior faces of the polar polytope. The first step is to convert (2.80) into its standard form,

$$
\begin{aligned}
\min_{\tilde{\boldsymbol{\alpha}}} \quad & \|\tilde{\boldsymbol{\alpha}}\|_1 \qquad (2.85)\\
\text{s.t.} \quad & \mathbf{x} = \tilde{\Psi}\tilde{\boldsymbol{\alpha}}, \quad \tilde{\boldsymbol{\alpha}} \ge \mathbf{0},
\end{aligned}
$$

where $\tilde{\Psi} = [\Psi, -\Psi]$ and $\tilde{\boldsymbol{\alpha}}$ has $2m$ nonnegative components, with the standard weight vector recoverable by $\alpha_i = \tilde{\alpha}_i - \tilde{\alpha}_{i+m}$ [68]. The corresponding dual of this linear programme is,

$$
\begin{aligned}
\max_{\mathbf{c}} \quad & \mathbf{x}'\mathbf{c} \qquad (2.86)\\
\text{s.t.} \quad & \tilde{\Psi}'\mathbf{c} \le \mathbf{1},
\end{aligned}
$$

which has an optimal dual weight vector $\mathbf{c}$ which corresponds to the optimum $\boldsymbol{\alpha}$ of the primal formulation. At each step the approach to the solution of this problem is to identify the optimal vertex, which is the maximiser of $\mathbf{x}'\mathbf{c}$; this is similar to the way in which OMP builds up its solution. However, the difference is that at each step the path is constrained to the polytope face $F$ given by the vertex of the previous step. This is achieved by projecting $\mathbf{x}$ into a subspace parallel to $F$ to give $\mathbf{r} = (\mathbf{I}-\mathbf{Q})\mathbf{x}$, where $\mathbf{Q} = \tilde{\Psi}_{\mathbf{i}}\tilde{\Psi}_{\mathbf{i}}'/\|\tilde{\Psi}_{\mathbf{i}}\|^2$. Since $\boldsymbol{\alpha}_{\mathbf{i}} = \tilde{\Psi}_{\mathbf{i}}^{\dagger}\mathbf{x}$ (where $A^{\dagger}$ is defined as the Moore-Penrose pseudo-inverse of a matrix $A$⁷), and $\hat{\mathbf{x}} = \tilde{\Psi}_{\mathbf{i}}\boldsymbol{\alpha}$, it follows that $\mathbf{r} = \mathbf{x} - \tilde{\Psi}_{\mathbf{i}}\boldsymbol{\alpha} = \mathbf{x} - \hat{\mathbf{x}}$, meaning that $\mathbf{r}$ is the residual from the approximation at step $i$. The second step, which is where the main difference between OMP and PFP arises, involves projecting within the face $F$ that has just been found, rather than from the origin. This is done by projecting along the residual $\mathbf{r}$. Therefore, to find the next face at each step, the maximum scaled correlation is found,

$$
i = \arg\max_{i\notin\mathbf{i}} \frac{\tilde{\Psi}_i'\mathbf{r}}{1 - \tilde{\Psi}_i'\mathbf{c}}, \qquad (2.87)
$$

where bases are only considered such that $\tilde{\Psi}_i'\mathbf{r} > 0$. PFP then proceeds by removing any constraints for which $\tilde{\boldsymbol{\alpha}}$ contains negative entries. This is achieved by finding $j \in \mathbf{i}$ such that $\tilde{\alpha}_j < 0$, removing $j$ from $\mathbf{i}$ and removing the face from the current solution. $\tilde{\boldsymbol{\alpha}}$ is then recalculated, and the algorithm continues until $\tilde{\alpha}_j \ge 0\ \forall j$. The algorithmic complexity is of a similar order to OMP, whilst PFP is able to solve problems known to be hard for MP and OMP.

2.2.4 Compressed Sensing (CS)

In this Section, some of the theory of Compressed Sensing (CS) (also known as compressive sampling and sparse sampling) will be reviewed. CS is a technique that allows signals to be acquired or reconstructed sparsely, by using prior knowledge that the signal is sparse in a given basis [59, 69]. The main result is that signals can be reconstructed exactly even with data deemed insufficient by the Nyquist-Shannon criterion⁸. Formally, given a signal $\mathbf{x} \in \mathbb{R}^n$ and a dictionary $\Psi \in \mathbb{R}^{n\times d}$ which forms an orthonormal basis, $\mathbf{x}$ is said to be sparse if $\mathbf{x}$ can be represented as a linear combination of $k$ atoms from $\Psi$, i.e. $\mathbf{x} = \sum_{i=1}^{k}\alpha_i\Psi_{.,i}$ where $k \ll d$. According to the CS theory it is possible to construct a measurement matrix $\Phi \in \mathbb{R}^{m\times n}$ with $m \ll n$, and perform stable reconstructions of the signal from measurements $\mathbf{y} = \Phi\mathbf{x}$, if and only if the measurement matrix is incoherent with the dictionary, i.e. the sensing waveforms have an extremely dense representation in $\Phi$⁹. Ordinarily, the problem of reconstructing $\mathbf{x}$ from $\mathbf{y}$ would be severely underdetermined. Estimating a sparsely represented function from a set of training examples is a classical problem in regression; fortunately the methods used for sparse regression can be directly applied to CS. Again, beginning with the $\ell_0$-minimisation,

⁷Note that if $\Psi_{\mathbf{i}}$ forms a tight frame then $\Psi_{\mathbf{i}}^{\dagger} = \Psi_{\mathbf{i}}'$, i.e. the inverse is equal to the transpose.
⁸The Nyquist-Shannon sampling theorem states that if a function $f(t)$ contains no frequencies higher than $B$ Hz, it is completely determined by giving its ordinates at a series of points spaced $1/(2B)$ seconds apart.

$$
\begin{aligned}
\min_{\boldsymbol{\alpha}} \quad & \|\boldsymbol{\alpha}\|_0 \qquad (2.88)\\
\text{s.t.} \quad & \mathbf{y} = \Phi\Psi\boldsymbol{\alpha}.
\end{aligned}
$$

Finding this $\ell_0$ solution is known to be NP-hard. However, the equivalent $\ell_1$ optimisation problem

$$
\begin{aligned}
\min_{\boldsymbol{\alpha}} \quad & \|\boldsymbol{\alpha}\|_1 \qquad (2.89)\\
\text{s.t.} \quad & \mathbf{y} = \Phi\Psi\boldsymbol{\alpha},
\end{aligned}
$$

is a convex optimisation problem and can be solved using general purpose solvers. As before, this can be reformulated such that it directly minimises the regression loss, as with the LASSO [30], which is given by

$$
\min_{\boldsymbol{\alpha}} \quad \|\mathbf{y} - \Phi\Psi\boldsymbol{\alpha}\|_2^2 + \lambda\|\boldsymbol{\alpha}\|_1, \qquad (2.90)
$$

i.e. a form of $\ell_1$-penalised least squares. This can then be solved with the Least Angle Regression Solver (LARS) as before, or with greedy methods such as OMP or PFP.
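A minimal sketch of the full CS pipeline is given below: a signal sparse in the DCT basis is measured with a random Gaussian matrix and recovered by solving (2.90) with an off-the-shelf LASSO solver. The sparsity basis, measurement matrix and value of λ are all assumptions made for the example.

```python
# Minimal sketch of compressed sensing recovery via l1-penalised least squares (2.90).
import numpy as np
from scipy.fft import idct
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, m, k = 256, 64, 8                              # signal length, measurements, sparsity

alpha_true = np.zeros(n)                          # signal that is k-sparse in the DCT basis
alpha_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x = idct(alpha_true, norm="ortho")                # x = Psi alpha, Psi = inverse DCT

Phi = rng.standard_normal((m, n)) / np.sqrt(m)    # random (incoherent) measurement matrix
y = Phi @ x                                       # compressed measurements

# effective sensing matrix acting on the sparse coefficients: A = Phi Psi
A = np.array([Phi @ idct(e, norm="ortho") for e in np.eye(n)]).T
lasso = Lasso(alpha=1e-3, max_iter=50000).fit(A, y)
x_hat = idct(lasso.coef_, norm="ortho")           # reconstructed signal
```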

2.2.5 Incoherence With Random Measurements

One major issue that has not been addressed is how to design the measurement matrix $\Psi$ such that, when sampled using this matrix, the signal will be sparse within the basis of the dictionary $\Phi$. The CS theory states that when certain conditions hold, namely that the functions $\psi_m \in \Psi$ cannot sparsely represent the elements of the basis $\phi_m \in \Phi$ (a condition known as incoherence of the two dictionaries [59, 70, 69, 71]), and the number of measurements is large enough, then it is indeed possible to recover the signal $\mathbf{x}$ from a similarly sized set of measurements $\mathbf{y}$. This incoherence property holds for many pairs of bases, including for example delta spikes and the sine waves of a Fourier basis, or the Fourier basis and wavelets. Significantly, this incoherence also holds with high probability between any arbitrary fixed basis and a randomly generated one. This means that in general, if i.i.d. Gaussian or Bernoulli matrices are used for $\Psi$, this incoherence will still hold with high probability. This surprising result is a direct follow-on from the Restricted Isometry Property (RIP), which characterises matrices that are nearly orthonormal when operating on sparse vectors.

⁹"Dense" here is in the sense that each of the measurement vectors (rows of $\Psi$) must be spread out in the $\Phi$ domain. An example would be a Dirac function (spike), which is dense in the Fourier domain as it has a flat frequency response. Conversely, a sine wave has a sparse representation in the Fourier domain as it is represented by a single frequency.

2.2.6 Multivariate Signal Processing

This Section will introduce some signal processing operations for multivariate signals. Given a set of signals $x_i(n)$, $i = 1,\ldots,M$ from a system, it is important to study whether there are possible interdependencies between the signals. Such interdependencies cause redundancies, which can be exploited for data compression. Interdependencies between the individual signals can also contain useful information about the structure of the underlying systems that generated the set of signals. The individual signals are often mixtures of unknown (latent) source signals $s_j(n)$, such that,

$$
x_i(n) = \sum_{j=1}^{M} a_{i,j}\, s_j(n), \quad i = 1,\ldots,M \qquad (2.91)
$$
$$
\Rightarrow\quad \mathbf{x}(n) = \mathbf{A}\mathbf{s}(n). \qquad (2.92)
$$

The problem of finding the source signals $\mathbf{s}(n)$ from a set of measured signals $\mathbf{x}(n)$ is called source-signal separation. If the mixing matrix $\mathbf{A}$ is known, it is trivial to determine the source signals $\mathbf{s}(n)$ by inverting the linear relation in Equation 2.91. However, in most cases this is not known; the problem of finding the source signals from the measured signals in this situation is called blind deconvolution. In order to solve the blind deconvolution problem some assumptions on the source signals have to be made; the most natural ones are that they are mutually uncorrelated or independent. PCA, which was introduced in Section 2.1.12, can be used for signal decorrelation. Independent Components Analysis (ICA) is a method that performs deconvolution under the assumption that the latent sources are independent. The algorithm works by adaptively calculating the vectors of $\mathbf{A}$ and setting up a cost function which either maximises the non-Gaussianity of the calculated $\mathbf{s} = \mathbf{A}'\mathbf{x}$ or minimises the Mutual Information (MI) [72]. In some cases, a-priori knowledge of the probability distributions of the sources can be used in the cost function. The original sources $\mathbf{s}$ can be recovered by multiplying the observed signals $\mathbf{x}$ with the inverse of the mixing matrix, $\mathbf{W} = \mathbf{A}^{-1}$, also known as the unmixing matrix. Here it is assumed that the mixing matrix is square ($n = m$). If the number of basis vectors is greater than the dimensionality of the observed vectors, $n > m$, the task is overcomplete but is still solvable.
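A minimal sketch of blind source separation under the independence assumption is given below, using scikit-learn's FastICA; the two toy sources and the mixing matrix are assumptions made for the example.

```python
# Minimal sketch of ICA for the mixing model x(n) = A s(n) of Equation 2.91.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # latent sources s(n)
A = np.array([[1.0, 0.5], [0.5, 2.0]])            # unknown mixing matrix
X = S @ A.T                                       # observed mixtures x(n) = A s(n)

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)                      # estimated sources (up to scale and order)
W = ica.components_                               # estimated unmixing matrix
```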

Chapter 3

Sparse Machine Learning Framework for Multivariate Signal Processing

Abstract

Building blocks. This Chapter presents a unified general framework for the application of sparse machine learning methods to multivariate signal processing. The methods presented can be seen as modular building blocks that can be applied to a variety of applications. Application specific prior knowledge can be used in various ways, resulting in a flexible and powerful set of tools. The motivation for the methods is to be able to learn and generalise from a set of multivariate signals.

In Pursuit of a Sparse Basis. Given a dictionary of atoms from a given basis, a significant body of research has focussed on methods to select a sparse set of bases to represent a signal. Similarly, sparsity has been seen to be desirable for Machine Learning, for reasons of computational efficiency, regularisation, and compressibility.

Greed is Good. Within the suite of tools described in this chapter are a set of sub-optimal greedy sequential solvers for the sparse recovery problem. These have been shown to have desirable properties in the signal processing and statistics literature; it is shown through analysis and experimentation that these properties are also desirable in Machine Learning applications.

Two Eyes are Better than One. The final part of the chapter will detail developments in the area of “Multi-View” or “Multi-Source” Learning. We will present algorithmic developments in this area which will allow the incorporation of two or more sets of signals from different sources that will prove to be valuable in applications.

3.1 Framework Outline

The goal of this Chapter is to outline a general modular framework designed for performing Machine Learning (ML) tasks. These are general purpose methods that link together to enable efficient inference on a particular class of data, namely multivariate signals. The general approach is to combine methods from Digital Signal Processing (DSP) with methods from ML in novel ways that leverage the power of the methods from both fields. The main focus for this chapter will be the development of the framework for ML, although various approaches to DSP will be outlined along the way. The key will be to take a set of signals (such as recordings of a set of individuals' brain activity), and learn patterns that are then generalisable to a new set of signals generated under the same conditions (i.e. another individual performing the same task).

Multivariate signal processing is a source of challenges and opportunities. The traditional approach to multivariate signals has been to perform mass univariate analysis of the signals, making the assumption that the signals are independent. However, this independence assumption is violated more often than not, and as a result a great body of work has grown up around trying to make the univariate statistics more robust. For the purposes of this work the assumption will be made that the sensor arrays being dealt with are distributed in space but measured simultaneously (or as near as is possible), and that the sampling rate is fixed. There are of course situations where this assumption does not hold, but the methods outlined here can be extended, although the technical details become more complicated. For a univariate signal, there exist many well refined techniques for processing and classifying signals. These include Bayesian methods (e.g. using Markov Chain Monte Carlo (MCMC) methods [73]), Autoregressive Moving Average (ARMA) models [74], and analysis of spectral qualities of the signal (such as in [75]).

Figure 3.1 shows a top-level diagrammatic view of the process of learning from signals. Whilst the importance of the preprocessing stage should not be underestimated, it is not the focus of the present work. Hence the preprocessing used in all of the empirical testing will be via tried and tested methods that are well established in the various application areas visited. Details of specific preprocessing methods will be given such that the results of the experiments can be reproduced, but an extensive discussion is beyond the present scope. In addition, the diagram separates out preprocessing from signal processing; of course most of the preprocessing is in fact signal processing, but I have chosen to separate out the processing that is necessary to clean up data and remove artefacts (such as eye-blinks in EEG data) from the processing that is necessary to generate a set of features that describe the signals, which are then used as inputs to ML algorithms. This approach allows the focus to be maintained on the aspects of the interplay between DSP and ML of interest to the current study.

Of course the process outlined in Figure 3.1 is rather simplistic, and in fact in some cases can be improved upon. Specifically, a central theme that will be repeated throughout the thesis is that, wherever possible, one should make use of multiple paths of information flow. This can take the form of Multi-Source Learning (MSL) (where two separate sources of information are combined), Multi-View Learning (MVL) (where two views of the same underlying semantic object are combined), and Multiple Kernel Learning (MKL) (where multiple kernels are generated from a single source or view). These concepts will be described further in Section 3.5, in which algorithms that attempt to take advantage of these various paradigms will be developed.

[Figure 3.1: Diagrammatic view of the process of machine learning from multivariate signals: data source → multivariate signals → preprocessing → signal processing / model formulation → machine learning.]

Sparse estimation and sparse recovery of patterns or signals are playing an increasingly important role in the statistics, signal processing, and ML communities. Several methods have recently been developed in both fields which rely upon the notion of sparsity (e.g. penalty methods like the LASSO, or greedy methods such as MP). Many of the key theoretical ideas and statistical analyses of the methods have been developed independently, but there is increasing awareness of the potential for cross-fertilisation of ideas between the statistics, signal processing and ML communities.

Much of the early effort has been dedicated to algorithms that solve sparsity inducing optimisation problems efficiently. This can be through first-order methods [76], or through homotopy methods that lead to the entire regularisation path (i.e. the set of solutions for all values of the regularisation parameters) at the cost of a single matrix inversion [32]. A well-known property of regularisation by the $\ell_1$-norm is the sparsity of the solutions, i.e. it leads to weight vectors with many zeros, and thus performs model selection on top of regularisation. Recent works have looked precisely at the model consistency of the LASSO [77, 78]. It has been shown that a condition known as the irrepresentable condition, which depends mainly on the covariance of the predictor variables, governs this consistency: LASSO selects the true model consistently if and (almost) only if the predictors that are not in the true model are "irrepresentable" by predictors that are in the true model (see [77] for a discussion). This is effectively a statement that if it is known that the data were generated from a sparse weight vector, the LASSO does actually recover the sparsity pattern as the number of observations grows. This analysis has been extended to the Group LASSO and to MKL [78].

Furthermore, there are interesting links between penalty-type methods and boosting (particularly LPBoost), as well as with sparse kernel regression. There has been interest in sparse methods within Bayesian ML (e.g. sparse PCA/CCA [79] or the Relevance Vector Machine (RVM) [80]). Sparse estimation is also important for other methods (e.g. sparse PCA and the One-Class Support Vector Machine (OC-SVM) for outlier detection). Recent machine learning techniques for Multi-Task Learning (MTL) [81, 82, 83] and collaborative filtering [84] have been proposed which implement sparsity constraints on matrices (rank, structured sparsity, etc.). At the same time, sparsity is playing an important role in various application fields, ranging from image and video reconstruction and compression, to speech classification, text and sound analysis.

In this Chapter we will begin by introducing a method that draws on the greedy method for sparse signal reconstruction introduced in the previous chapter (OMP) and applies it to classification using the FDA objective function. Experimental results are given for this method showing that it performs competitively with state-of-the-art methods such as the SVM whilst producing solutions that are much more sparse. Furthermore, there is a clear performance gain when the datasets are very high dimensional and contain many potentially irrelevant features. Following on from this, we show that another greedy method from signal processing (PFP) can be applied to sparse regression problems in a kernel defined feature space. Again, experimental results are given that show the power of this class of techniques. We will then go on to show that, surprisingly, it is in fact still possible to learn using a much simpler method of choosing basis vectors: that of random selection. The theoretical analysis shows that this result is due to a compression scheme being formed, which acts as a form of capacity control. Sparse learning can then be seen as a trade-off between finding the (near) optimal sparse solution by a greedy method, or finding sub-optimal solutions quickly that are good enough.

The final Section (3.5) of the Chapter is devoted to Multi-View Learning (MVL). The first contribution is an extension of the way in which KCCA projections are used for classification. Traditionally, an SVM (or any other standard ML algorithm) is trained on the projected subspace of the view of interest. However, I show that good classification performance is possible using a method that is essentially free once the projections have been learnt. This method will be used for experimental analysis in Chapter 5. A natural extension to this is to try to incorporate the classification and the subspace learning into a single optimisation routine. This was the motivation for Multiview Fisher Discriminant Analysis (MFDA) and its variants, which will be presented towards the end of the chapter, along with some experimental results on toy data and benchmark datasets. Empirical analysis on real-world datasets will be presented in Chapter 5.

3.2 Greedy methods for Machine Learning

This Section will introduce two novel sparse ML methods. The first is based on the ideas of Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP) for sparse recovery in signal processing, introduced in the last Chapter in Section 2.2.3, and focusses on the problem of classification using the KFDA algorithm outlined in Section 2.1.9. This will be followed by a method based on Polytope Faces Pursuit (PFP).

3.2.1 Matching Pursuit Kernel Fisher Discriminant Analysis

A novel sparse version of KFDA is derived using an approach based on Orthogonal Matching Pursuit (OMP). This algorithm will be called Matching Pursuit Kernel Fisher Discriminant Analysis (MPKFDA). Generalisation error bounds are provided, analogous to those constructed for the Robust Minimax algorithm, together with a sample compression bounding technique. Experimental results are provided on real world datasets, which show that MPKFDA is competitive with KFDA and the SVM on University of California, Irvine (UCI) datasets, and additional experiments show that MPKFDA on average outperforms KFDA and the SVM in extremely high dimensional settings. The idea of MP is chosen for its fast greedy iterative property, and is applied to KFDA in order to impose dual sparsity. It will be proven that this sparse version results in generalisation error bounds guaranteeing its future success. The novel bounds come from the analysis by Shawe-Taylor et al. [85] of the Robust Minimax algorithm of [86], which is similar in flavour to FDA. Together with the bounds of [85], a compression argument [87] is applied in order to gain an advantage due to the dual sparsity that results from the algorithm. However, the algorithm does not form a traditional compression scheme, so a similar idea to that of [88] is used to bound the generalisation error in the sparsely defined subspace by amalgamating both theories mentioned above. In some ways the bounds justify the choice of the fast iterative greedy strategy, which is not provably optimal [31], by guaranteeing that for a random choice of dataset from any fixed distribution, the predictions made will be probably approximately correct (PAC) [89]. One of the practical advantages of MPKFDA lies in the evaluation on test points: only $k$ kernel evaluations are required (where $k$ is the number of basis vectors chosen) compared to $m$ (the number of samples) needed for KFDA. It is also worth stating that MPKFDA, like KFDA, has the advantage of directly delivering conditional probabilities of classification (unlike the SVM). There has been some research suggesting that one cannot estimate conditional probabilities without involving all of the data (see [90]) - hence kernel methods cannot deliver this efficiently - but here all of the data is taken into account whilst still having an efficient kernel representation.

Preliminaries

Most of the key quantities have already been introduced in Chapter 2, so this Section gives a brief summary. We denote with $S$ a sample containing $m$ examples $\mathbf{x} \in \mathbb{R}^n$ and labels $y \in \{-1,1\}$. Let $\mathbf{X} = (\mathbf{x}_1,\ldots,\mathbf{x}_m)'$ be the input vectors stored in matrix $\mathbf{X}$ as row vectors, where $'$ denotes the transpose of vectors or matrices. For simplicity it is assumed that the examples are already projected into the kernel defined feature space, so that the kernel matrix $\mathbf{K}$ has entries $\mathbf{K}[i,j] = \langle\mathbf{x}_i,\mathbf{x}_j\rangle$. In the analysis Section, $\phi(\mathbf{x})$ will explicitly denote the feature map for some vector $\mathbf{x}$. The notation $\mathbf{K}[:,i]$ will denote the $i$th column of the matrix $\mathbf{K}$. When given a set of indices $\mathbf{i} = \{i_1,\ldots,i_k\}$ (say) then $\mathbf{K}[\mathbf{i},\mathbf{i}]$ denotes the square matrix defined solely by the index set $\mathbf{i}$. For analysis purposes it is assumed that the training examples are generated i.i.d. according to an unknown but fixed probability distribution that also governs the generation of the test data. Expectation over the training examples (empirical average) is denoted by $\hat{\mathrm{E}}[\cdot]$, while expectation with respect to the underlying distribution is denoted $\mathrm{E}[\cdot]$.

For the sample compression analysis, the compression function $\Lambda$ induced by a sample compression learning algorithm $A$ on training set $S$ is the map $\Lambda : S \longrightarrow \Lambda(S)$ such that the compression set $\Lambda(S) \subset S$ is returned by $A$. A reconstruction function $\Psi$ is a mapping from a compression set $\Lambda(S)$ to a set $F$ of functions, $\Psi : \Lambda(S) \longrightarrow F$. Let $A(S)$ be the function output by learning algorithm $A$ on training set $S$. Therefore, a sample compression scheme is a reconstruction function $\Psi$ mapping a compression set $\Lambda(S)$ to some set of functions $F$ such that $A(S) = \Psi(\Lambda(S))$. If $F$ is the set of Boolean-valued functions then the sample compression scheme is said to be a classification algorithm. Define $\hat{\boldsymbol{\mu}}$ ($\boldsymbol{\mu}$) to be the empirical (true) mean of a sample of $m$ points from the set $S$ projected into a higher dimensional space using $\phi$,

$$
\boldsymbol{\mu} = \mathrm{E}[\phi(\mathbf{x})], \qquad \hat{\boldsymbol{\mu}} = \frac{1}{m}\sum_{i=1}^{m}\phi(\mathbf{x}_i),
$$

and $\hat{\Sigma}$ ($\Sigma$) its empirical (true) covariance matrix.

Algorithm

OMP can be formalised as a general framework in ML that involves repeating the following two steps:

1. Function maximisation; and
2. Deflation (orthogonalisation).

It can result in OMP algorithms for learning tasks other than regression. This Section presents an application of this general framework to KFDA, resulting in a sparse form of KFDA that we refer to as MPKFDA. An OMP algorithm for FDA can be built in the following way. Initially, one example $\mathbf{i} = \{i_\ell\}$ is chosen that maximises the FDA criterion, and the remaining training examples are projected into the space defined by $\mathbf{i}$. Following this the data matrix $\mathbf{X}$ (or kernel $\mathbf{K}$) is deflated to allow the next index to be chosen. Finally this results in a set $\mathbf{i}$ of training examples that can be used to compute the final weight vector $\mathbf{w}$, together with the FDA decision function $f(\mathbf{x}) = \operatorname{sgn}(\mathbf{w}'\mathbf{x} + b)$, where $b$ is the bias and $\mathbf{x}$ an example. Using the notation from [3], the maximisation problem for FDA is given by the following:

$$
\mathbf{w} = \arg\max_{\mathbf{w}} \frac{\mathbf{w}'\mathbf{X}'\mathbf{y}\mathbf{y}'\mathbf{X}\mathbf{w}}{\mathbf{w}'\mathbf{X}'\mathbf{B}\mathbf{X}\mathbf{w}}, \qquad (3.1)
$$

where $\mathbf{B}$ is defined as in Section 2.1.9 of Chapter 2. To begin with, the Nyström method of low-rank approximation of the Gram matrix [7] is applied. This is defined in the following Section.

3.2.2 Nyström Low-Rank Approximations

The Nyström method generates a low-rank approximation of a Gram matrix $\mathbf{G}$ using a subset $\mathbf{i} = (i_1,\ldots,i_k)$ of $k$ of the columns [7]. The method readily applies to an RKHS simply by replacing $\mathbf{G}$ with the kernel matrix $\mathbf{K}$, but the more general definition will be given here. Given a sample of $k$ columns of $\mathbf{G}$ selected by some method, let $\mathbf{N} = \mathbf{G}[:,\mathbf{i}]$ be the $n \times k$ matrix of the sampled columns, and $\mathbf{W} = \mathbf{G}[\mathbf{i},\mathbf{i}]$ be the $k \times k$ matrix consisting of the intersection of these $k$ columns with the corresponding $k$ rows of $\mathbf{G}$. The Nyström method uses $\mathbf{W}$, $\mathbf{N}$ to construct a rank-$k$ approximation $\tilde{\mathbf{G}}_k$ to $\mathbf{G}$,

$$
\tilde{\mathbf{G}}_k = \mathbf{N}\mathbf{W}_k^{-1}\mathbf{N}' \approx \mathbf{G}. \qquad (3.2)
$$

In practice the matrix $\mathbf{W}_k$ may not be invertible, especially for small $k$, in which case the pseudo-inverse¹ is used. The Nyström approximation is depicted in Figure 3.2. Define $\mathbf{R}$ as the Cholesky decomposition of $\mathbf{W}_k^{-1}$, such that $\mathbf{R}$ is an upper triangular matrix that satisfies $\mathbf{R}'\mathbf{R} = \mathbf{G}[\mathbf{i},\mathbf{i}]^{-1}$.

[Figure 3.2: Diagrammatic representation of the Nyström method: $\mathbf{G} \approx \mathbf{G}[:,\mathbf{i}] \times \mathbf{G}[\mathbf{i},\mathbf{i}]^{-1} \times \mathbf{G}[\mathbf{i},:]$.]
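A minimal sketch of the Nyström construction is given below; the random choice of columns stands in for the greedy selection used later by MPKFDA and is purely an assumption for the example.

```python
# Minimal sketch of the Nystrom approximation (3.2) and the feature map of (3.4).
import numpy as np

def nystrom_features(K, idx):
    """Return K[:, idx] R' where R is upper triangular and R'R = K[idx, idx]^{-1}."""
    W = K[np.ix_(idx, idx)]                        # k x k intersection block
    W_inv = np.linalg.pinv(W)                      # pseudo-inverse in case W is singular
    R = np.linalg.cholesky(W_inv + 1e-10 * np.eye(len(idx))).T   # jitter for stability
    return K[:, idx] @ R.T                         # m x k projected data

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
K = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))      # RBF kernel matrix
idx = list(rng.choice(200, size=20, replace=False))
Phi = nystrom_features(K, idx)
K_approx = Phi @ Phi.T                             # rank-k approximation of K
```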

Nyström for Matching Pursuit Kernel Fisher Discriminant Analysis (MPKFDA)

Using the assumption that the inputs $\mathbf{X}$ have already been projected into the kernel defined feature space, the Nyström approximation can be applied to the kernel matrix $\mathbf{K}$. A greedy algorithm will be used to select a set of bases $\mathbf{i}$, such that $\mathbf{N} = \mathbf{K}[:,\mathbf{i}]$ and $\mathbf{W}_k = \mathbf{K}[\mathbf{i},\mathbf{i}]$. The Nyström approximation for MPKFDA is then,

$$
\tilde{\mathbf{K}} = \mathbf{K}[:,\mathbf{i}]\,\mathbf{K}[\mathbf{i},\mathbf{i}]^{-1}\,\mathbf{K}[:,\mathbf{i}]' = \mathbf{K}[:,\mathbf{i}]\,\mathbf{R}'\mathbf{R}\,\mathbf{K}[:,\mathbf{i}]' \approx \mathbf{K}, \qquad (3.3)
$$

where $\mathbf{R}$ is the Cholesky decomposition of $\mathbf{K}[\mathbf{i},\mathbf{i}]^{-1}$ such that $\mathbf{R}$ is an upper triangular matrix that satisfies $\mathbf{R}'\mathbf{R} = \mathbf{K}[\mathbf{i},\mathbf{i}]^{-1}$. However, rather than use the full $[m \times m]$ low rank approximation, it would be preferable to work in the $[k \times k]$ space where $k \ll m$. In order to do this, $\mathbf{K}[:,\mathbf{i}]\mathbf{R}'$ is treated as a new input $\mathbf{X}$ in FDA, which results in a projection $\tilde{\phi}$ into a $k$-dimensional subspace:

$$
\tilde{\phi}(\mathbf{x}_i) = \mathbf{K}[:,\mathbf{i}]\mathbf{R}'. \qquad (3.4)
$$

Within this space the following

$$
\tilde{\Sigma}_k = \mathbf{R}\,\mathbf{K}[:,\mathbf{i}]'\mathbf{K}[:,\mathbf{i}]\,\mathbf{R}' \qquad (3.5)
$$

is the covariance matrix within this space. This enables large scale problems containing $m$ data points to be solved with linear algorithms using $k$ features. This trick allows nonlinear discriminant analysis to be performed on a sparse subspace using standard linear FDA.

¹$A^{\dagger}$ is defined as the Moore-Penrose pseudo-inverse of a matrix $A$.

Greedy Selection of Bases

For the algorithm to proceed, a method for the greedy selection of basis vectors is required. The following maximisation problem for a dual sparse version of FDA can be defined by setting $\mathbf{w} = \mathbf{X}'\mathbf{e}_i$, where $\mathbf{e}_i$ is the $i$th unit vector of length $m$, and substituting into the FDA problem described above (ignoring constants) to yield:

$$
\arg\max_i \rho_i = \frac{\mathbf{e}_i'\mathbf{X}\mathbf{X}'\mathbf{y}\mathbf{y}'\mathbf{X}\mathbf{X}'\mathbf{e}_i}{\mathbf{e}_i'\mathbf{X}\mathbf{X}'\mathbf{B}\mathbf{X}\mathbf{X}'\mathbf{e}_i} = \frac{\mathbf{K}[:,i]'\mathbf{y}\mathbf{y}'\mathbf{K}[:,i]}{\mathbf{K}[:,i]'\mathbf{B}\mathbf{K}[:,i]}. \qquad (3.6)
$$

Maximising the quantity above leads to maximisation of the Fisher Discriminant Ratio (FDR) corresponding to $\mathbf{e}_i$, and hence a sparse subset of the original KFDA problem. The goal is to find the optimal set of indices $\mathbf{i}$. The approach taken here is to proceed in a greedy manner (MP), in much the same way as [37] and [65]. The procedure involves choosing basis vectors that maximise the Fisher Discriminant Ratio iteratively until some pre-specified number $k$ of vectors have been chosen. The next step is to orthogonalise the matrix $\mathbf{K}$ with respect to the chosen basis vector $\boldsymbol{\tau} = \mathbf{K}[:,i]$. In the primal form of PCA, the deflation can be carried out using Hotelling's method [91] with respect to the features (columns of an input matrix $\mathbf{X}$) by,

$$
\tilde{\mathbf{X}}' = \left(\mathbf{I} - \frac{\mathbf{u}\mathbf{u}'}{\mathbf{u}'\mathbf{u}}\right)\mathbf{X}', \qquad (3.7)
$$

where $\mathbf{u}$ is a chosen eigenvector and $\tilde{\mathbf{X}}$ is the deflated version of $\mathbf{X}$. However, because we are working in the dual (kernel) space, the projection directions are simply the examples in $\mathbf{X}$, so $\mathbf{u} = \mathbf{X}'\mathbf{e}$. If we define $\boldsymbol{\tau} = \mathbf{X}\mathbf{X}'\mathbf{e} = \mathbf{K}[:,i]$, the deflation $\tilde{\mathbf{K}}$ of the kernel with respect to the chosen basis $i$ is then,

$$
\tilde{\mathbf{K}} = \left(\mathbf{I} - \frac{\boldsymbol{\tau}\boldsymbol{\tau}'}{\boldsymbol{\tau}'\boldsymbol{\tau}}\right)\mathbf{K}. \qquad (3.8)
$$

This deflation ensures that remaining potential basis vectors will be chosen from a space that is orthogonal to those bases already picked². After choosing the $k$ training examples, giving $\mathbf{i} = (i_1,\ldots,i_k)$,

RK[:, i]′ can be defined as a new data matrix as defined in Section 3.2.2 above. FDA is then used for training as in Equation 3.1 in this new projected space to find a k-dimensional weight vector wk, which is indexed over the bases of the kernel matrix and hence has sparsity k in the dual sense. Given the index j of a test point xj , and using the train-test kernel on this point K[j, i] and its projection

$\phi(\mathbf{x}_j) = \mathbf{R}\mathbf{K}[j,\mathbf{i}]'$, predictions can be made using the FDA prediction function,

$$
f(\mathbf{x}_j) = \operatorname{sgn}(\langle\tilde{\mathbf{w}}, \phi(\mathbf{x}_j)\rangle + b). \qquad (3.9)
$$

²It is assumed that the vectors of the matrix $\mathbf{K}$ do not form an orthonormal basis.

The algorithm for MPKFDA is given in Algorithm 3.

Algorithm 3 Matching Pursuit Kernel Fisher Discriminant Analysis
  Input: kernel $\mathbf{K}$, sparsity parameter $k > 0$, training labels $\mathbf{y}$.
  1: calculate matrix $\mathbf{B}$
  2: initialise $\mathbf{i} = ()$
  3: for $j = 1$ to $k$ do
  4:   $t \leftarrow \arg\max_i \frac{\mathbf{K}[:,i]'\mathbf{y}\mathbf{y}'\mathbf{K}[:,i]}{\mathbf{K}[:,i]'\mathbf{B}\mathbf{K}[:,i]}$
  5:   $\mathbf{i} \leftarrow \{\mathbf{i}, t\}$
  6:   $\boldsymbol{\tau} \leftarrow \mathbf{K}[:,t]$; deflate the kernel matrix like so: $\mathbf{K} \leftarrow \left(\mathbf{I} - \frac{\boldsymbol{\tau}\boldsymbol{\tau}'}{\mathbf{K}[t,t]}\right)\mathbf{K}$
  7: end for
  8: calculate the projection $\mathbf{R}\mathbf{K}[:,\mathbf{i}]'$, where $\mathbf{R}$ is the Cholesky decomposition of $\mathbf{K}[\mathbf{i},\mathbf{i}]^{-1}$ and $\mathbf{i} = (i_1,\ldots,i_k)$
  9: train FDA using Equation 3.1 in this new projected space to find a sparse weight vector $\tilde{\mathbf{w}}$, and make predictions using Equation 3.9
  Output: final set $\mathbf{i}$, (sparse) weight vector $\tilde{\mathbf{w}}$, bias term $b$
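The following is a minimal sketch of Algorithm 3. The matrix $\mathbf{B}$ is taken as an input (to be computed as in Section 2.1.9), and the final fit in the projected space uses a standard pooled within-class covariance FDA solution as an illustrative stand-in for Equation 3.1; neither choice should be read as the exact implementation used for the experiments.

```python
# Minimal sketch following Algorithm 3 (MPKFDA): greedy selection, deflation,
# Nystrom projection, then linear FDA in the k-dimensional space.
import numpy as np

def mpkfda(K, y, B, k):
    m = len(y)
    Kd = K.copy()                                  # deflated working copy of the kernel
    idx = []
    for _ in range(k):
        num = (Kd.T @ y) ** 2                      # K[:, i]' y y' K[:, i] for every i
        den = np.einsum('ij,jk,ki->i', Kd.T, B, Kd)  # K[:, i]' B K[:, i] for every i
        t = int(np.argmax(num / (den + 1e-12)))
        idx.append(t)
        tau = Kd[:, [t]]
        Kd = (np.eye(m) - tau @ tau.T / Kd[t, t]) @ Kd   # deflation (Algorithm 3, line 6)
    # Nystrom projection into the k-dimensional subspace (Equation 3.4)
    R = np.linalg.cholesky(np.linalg.pinv(K[np.ix_(idx, idx)]) + 1e-10 * np.eye(k)).T
    Phi = K[:, idx] @ R.T
    # linear FDA in the projected space (illustrative stand-in for Equation 3.1)
    mu_p, mu_n = Phi[y == 1].mean(0), Phi[y == -1].mean(0)
    Sw = np.cov(Phi[y == 1].T) + np.cov(Phi[y == -1].T)
    w = np.linalg.solve(Sw + 1e-6 * np.eye(k), mu_p - mu_n)
    b = -0.5 * w @ (mu_p + mu_n)
    return idx, R, w, b

def predict(K_test_i, R, w, b):
    # K_test_i is the train-test kernel K[j, i]; prediction as in Equation 3.9
    return np.sign(K_test_i @ R.T @ w + b)
```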

Generalisation Error Analysis

A generalisation error bound for MPKFDA can now be constructed by applying the results from [85] with a compression argument. The following two results from [85] will be needed. The first bounds the difference between the empirical and true means.

Theorem 3.2.1 (Bound on the true and empirical means). Let $S$ be an $m$-sample generated independently at random according to a distribution $P$. Then with probability at least $1-\delta$ over the choice of $S$,

$$
\|\hat{\boldsymbol{\mu}} - \mathrm{E}_{\mathbf{x}}[\phi(\mathbf{x})]\| \le \frac{R}{\sqrt{m}}\left(2 + \sqrt{2\ln\frac{1}{\delta}}\right), \qquad (3.10)
$$

where $\hat{\boldsymbol{\mu}} = \hat{\mathrm{E}}[\phi(\mathbf{x})]$ and where $R$ is the radius of the ball in the feature space containing the support of the distribution. Consider the covariance matrix defined as

$$
\Sigma = \mathrm{E}\left[(\phi(\mathbf{x}) - \boldsymbol{\mu})(\phi(\mathbf{x}) - \boldsymbol{\mu})'\right].
$$

Let the empirical estimate of this quantity be

$$
\hat{\Sigma} = \hat{\mathrm{E}}\left[(\phi(\mathbf{x}) - \hat{\boldsymbol{\mu}})(\phi(\mathbf{x}) - \hat{\boldsymbol{\mu}})'\right].
$$

The following corollary bounds the difference between the empirical and true covariances.

Corollary 3.2.2 (Bound on the true and empirical covariances). Let $S$ be an $m$-sample generated independently at random according to a distribution $D$. Then with probability at least $1-\delta$ over the choice of $S$,

$$
\|\hat{\Sigma} - \Sigma\|_F \le \frac{2R^2}{\sqrt{m}}\left(2 + \sqrt{2\ln\frac{2}{\delta}}\right), \qquad (3.11)
$$

where $\|A\|_F = \sqrt{\operatorname{trace}(AA')}$ is the Frobenius norm of a matrix $A$, and provided

$$
m \ge \left(2 + \sqrt{2\ln\frac{2}{\delta}}\right)^2.
$$

The following Lemma is connected with a classification algorithm developed in [86], and forms the basis for the approach.

Lemma 3.2.3. Let $\boldsymbol{\mu}$ be the mean of a distribution and $\Sigma$ its covariance matrix, $\mathbf{w} \ne \mathbf{0}$ and $b$ given, such that $\mathbf{w}'\boldsymbol{\mu} \le b$ and $\alpha \in [0,1)$. Then if

$$
b - \mathbf{w}'\boldsymbol{\mu} \ge \varphi(\alpha)\sqrt{\mathbf{w}'\Sigma\mathbf{w}},
$$

where $\varphi(\alpha) = \sqrt{\frac{\alpha}{1-\alpha}}$, then

$$
\Pr\left(\mathbf{w}'\phi(\mathbf{x}) \le b\right) \ge \alpha.
$$

We will of course be using empirical estimates of µ and Σ. In order to provide a true error bound, the difference between the resulting estimate and the value that would have been obtained had the true mean and covariance been used must be bounded.

Bound for Matching Pursuit Kernel Fisher Discriminant Analysis

The above bound is applied to a subspace defined from a small number k m of basis vectors. Let ≪ i = (i1,...,ik) be a vector of indices used to form a k- dimensional subspace such as the one defined by MPKFDA. The notation Si is used to denote the samples pointed to by i. Firstly a general bound is given, which is then specialised to the case of MPKFDA.

Theorem 3.2.4 (main). Let S be a sample of m points drawn independently according to a probability distribution D, where R is the radius of the ball in the feature space containing the support of the distribution. Let $\hat{\mu}_k$ ($\mu_k$) be the empirical (true) mean of a sample of $m - k$ points from the set $S \setminus S_\mathbf{i}$ projected into a k-dimensional space, $\hat{\Sigma}_k$ ($\Sigma_k$) its empirical (true) covariance matrix, $w_k \neq 0$ with norm 1, and $b_k$ given, such that $w_k'\mu_k \leq b_k$ and $\alpha \in [0, 1)$. Then with probability $1 - \delta$ over the draw of the random sample, if

$$b_k - w_k'\hat{\mu}_k \geq \varphi(\alpha)\sqrt{w_k'\hat{\Sigma}_k w_k},$$ then

$$\Pr\left(w_k'\phi(x) - b_k > 0\right) < 1 - \alpha,$$ where

$$\alpha = \frac{\left(b_k - w_k'\hat{\mu}_k - A\right)^2}{w_k'\hat{\Sigma}_k w_k + B + \left(b_k - w_k'\hat{\mu}_k - A\right)^2},$$
such that $\|\hat{\mu}_k - \mu_k\| \leq A$ where
$$A = \frac{R}{\sqrt{m-k}}\left(2 + \sqrt{2k\ln\frac{em}{k} + 2\ln\frac{m}{\delta}}\right)$$
and $\|\hat{\Sigma}_k - \Sigma_k\|_F \leq B$ where
$$B = \frac{2R^2}{\sqrt{m-k}}\left(2 + \sqrt{2k\ln\frac{em}{k} + 2\ln\frac{2m}{\delta}}\right).$$

Proof. First, $b - w'\mu \geq \varphi(\alpha)\sqrt{w'\Sigma w}$ from Lemma 3.2.3 can be rearranged in terms of $\varphi(\alpha)$:

$$\varphi(\alpha) \leq \frac{b - w'\mu}{\sqrt{w'\Sigma w}}. \qquad (3.12)$$

These quantities are in terms of the true means and covariances in the chosen subspace. In order to achieve an upper bound, Theorem 3.2.1 and Corollary 3.2.2 must be applied for each of the $\binom{m}{k}$ choices of the compression set, and a further factor of $1/m$ is applied to δ to ensure one application of the bound for each possible choice of k. This leads to the substitution of $\delta/\left(m\binom{m}{k}\right)$ in place of δ, and the substitution of $m - k$ for m as the size of the dataset,

$$\left\|\hat{\mu}_k - E_x[\hat{\mu}_k(x)]\right\| \leq \frac{R}{\sqrt{m-k}}\left(2 + \sqrt{2\ln\frac{m\binom{m}{k}}{\delta}}\right),$$
and

$$\left\|\hat{\Sigma}_k - \Sigma_k\right\|_F \leq \frac{2R^2}{\sqrt{m-k}}\left(2 + \sqrt{2\ln\frac{2m\binom{m}{k}}{\delta}}\right).$$
Using the fact that $\binom{m}{k}$ is upper bounded by $(em/k)^k$, and rearranging, gives
$$\left\|\hat{\mu}_k - E_x[\hat{\mu}_k(x)]\right\| \leq \frac{R}{\sqrt{m-k}}\left(2 + \sqrt{2k\ln\frac{em}{k} + 2\ln\frac{m}{\delta}}\right) := A,$$
and

$$\left\|\hat{\Sigma}_k - \Sigma_k\right\|_F \leq \frac{2R^2}{\sqrt{m-k}}\left(2 + \sqrt{2k\ln\frac{em}{k} + 2\ln\frac{2m}{\delta}}\right) := B.$$
Given Equation 3.12, the empirical quantities for the means and covariances can be used in place of the true quantities. However, in order to derive a genuine upper bound, the upper bounds on the difference between the empirical and true quantities also need to be taken into account. These are included in the expression above for $\varphi(\alpha)$ by replacing δ with δ/2, to derive a lower bound, like so:

$$\varphi(\alpha) = \frac{b_k - w_k'\hat{\mu}_k - A}{\sqrt{w_k'\hat{\Sigma}_k w_k + B}}.$$
Finally, making the substitution $\varphi(\alpha) = \sqrt{\frac{\alpha}{1-\alpha}}$ and solving for α yields the result.

The following Proposition upper bounds the generalisation error of MPKFDA.

Proposition 3.2.5. Let $w_k$, $b_k$ be the (normalised) weight vector and associated threshold returned by the MPKFDA algorithm when presented with a training set S. Furthermore, let $\hat{\Sigma}_k^+$ ($\hat{\Sigma}_k^-$) be the empirical covariance matrices associated with the positive (negative) examples of the $m - k$ training samples from $S \setminus S_\mathbf{i}$ projected into a k-dimensional space. Then with probability at least $1 - \delta$ over the draw of the random training set S of m training examples, the generalisation error ε is bounded by

$$\epsilon \leq \max\left(1 - \alpha^+,\, 1 - \alpha^-\right),$$
where $\alpha^j$, $j \in \{+, -\}$, are given by,

$$\alpha^j = \frac{\left(j\left(w_k'\hat{\mu}_k^j - b_k\right) - C^j\right)^2}{w_k'\hat{\Sigma}_k^j w_k + D^j + \left(j\left(w_k'\hat{\mu}_k^j - b_k\right) - C^j\right)^2},$$
where

$$C^j = \frac{R}{\sqrt{m^j - k}}\left(2 + \sqrt{2k\ln\frac{em}{k} + 2\ln\frac{2m}{\delta}}\right),$$
and

$$D^j = \frac{2R^2}{\sqrt{m^j - k}}\left(2 + \sqrt{2k\ln\frac{em}{k} + 2\ln\frac{4m}{\delta}}\right).$$

Proof. For the negative (−1) part of the proof, $b_k - w_k'\hat{\mu}_k^- \geq \varphi(\alpha)\sqrt{w_k'\hat{\Sigma}_k^- w_k}$ is required, which is a straightforward application of Theorem 3.2.4 with δ replaced with δ/2. For the positive (+1) part, observe that $-b_k + w_k'\hat{\mu}_k^+ \geq \varphi(\alpha)\sqrt{w_k'\hat{\Sigma}_k^+ w_k}$ is required; hence, a further application of Theorem 3.2.4 with δ replaced by δ/2 suffices.
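To get a feel for why the resulting bound turns out to be loose in practice (as discussed in the experiments below), it can help to evaluate the deviation terms A and B of Theorem 3.2.4 numerically. The sketch below simply transcribes the formulas above; the numbers fed to it are arbitrary illustrative values, not figures from the thesis experiments.

```python
import numpy as np

def bound_terms(m, k, R=1.0, delta=0.05):
    """Deviation terms A and B of Theorem 3.2.4 (transcribed, illustrative only)."""
    common = 2 * k * np.log(np.e * m / k)
    A = R / np.sqrt(m - k) * (2 + np.sqrt(common + 2 * np.log(m / delta)))
    B = 2 * R**2 / np.sqrt(m - k) * (2 + np.sqrt(common + 2 * np.log(2 * m / delta)))
    return A, B

def error_bound(margin, variance, A, B):
    """1 - alpha, given the empirical margin b - w'mu_hat and the variance w'Sigma_hat w."""
    g = margin - A
    if g <= 0:
        return 1.0          # vacuous: the deviation term exceeds the empirical margin
    return 1.0 - g**2 / (variance + B + g**2)

# Illustrative numbers only:
A, B = bound_terms(m=400, k=20)
print(A, B, error_bound(margin=1.0, variance=1.0, A=A, B=B))
```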

Experiments

A comparison on 13 benchmark datasets derived from the UCI, Data for Evaluating Learning in Valid Experiments (DELVE) and STATLOG benchmark repositories follows. The performance of KFDA, MPKFDA, and SVM, each using RBF kernels, is analysed. The data comes in 100 predefined splits into training and test sets (20 in the case of the image and splice datasets) as described in [34]3. For each of the datasets CV was used to select the optimal parameters (the RBF kernel width parameter, the C parameter in the SVM, and k, the number of iterations in MPKFDA). 5-fold CV was used over the first five training splits with a coarse range of parameter values, selecting the median over the five sets as the optimal value, followed by a similar process using a fine range of parameter values4. This way of estimating the parameters leads to more robust comparisons between the methods. The means and SDs of the generalisation error for each method and dataset are given in Table 3.1. It was found that the performance of KFDA and MPKFDA is very similar, and both are competitive with the SVM. This is demonstrated by the mean values over the datasets.
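The two-stage coarse/fine parameter search described above (see also footnote 4) can be summarised in a few lines. The sketch below is illustrative only; in particular the coarse grid endpoints are an assumption rather than the exact values used in the thesis.

```python
import numpy as np

# Two-stage parameter search (sketch; grid endpoints are assumptions for illustration).
coarse_grid = 10.0 ** np.arange(-6, 4)    # coarse grid 10^-6, ..., 10^3
# 1) run 5-fold CV on each of the first five training splits over coarse_grid,
#    and take the median of the five selected values, e.g.:
best_coarse = 1.0                         # placeholder for that median
# 2) refine: 9 logarithmically spaced values between 10^(v-1) and 10^(v+1)
v = np.log10(best_coarse)
fine_grid = np.logspace(v - 1, v + 1, 9)
```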

Dataset         Dim   Train   Test    KFDA           MPKFDA               SVM
                                      Error    SD    Error    SD    k     Error    SD    k
Banana            2     400    4900   0.1069  0.00   0.1101  0.01   31    0.1068  0.00   122
Breast Cancer     9     200      77   0.2886  0.05   0.3174  0.04   19    0.2603  0.05   113
Diabetes          8     468     300   0.2596  0.02   0.2543  0.02   18    0.2332  0.02   260
Flare Solar       9     666     400   0.3500  0.02   0.3457  0.02   19    0.3239  0.02   557
German           20     700     300   0.2672  0.02   0.2808  0.02   27    0.2345  0.02   392
Heart            13     170     100   0.2125  0.03   0.1599  0.03   13    0.1543  0.03    98
Image            18    1300    1010   0.0092  0.02   0.0136  0.03   39    0.0061  0.01    27
Ringnorm         20     400    7000   0.0685  0.01   0.0573  0.03   15    0.0164  0.00   216
Splice           60    1000    2175   0.0397  0.08   0.0314  0.06   37    0.0223  0.05   110
Thyroid           5     140      75   0.0392  0.02   0.0699  0.03   29    0.0520  0.02    87
Titanic           3     150    2051   0.2259  0.02   0.2468  0.05   70    0.2256  0.01    76
Twonorm          20     400    7000   0.0253  0.00   0.0253  0.00   14    0.0280  0.00   231
Waveform         21     400    4600   0.1228  0.01   0.1027  0.00   13    0.1031  0.00   131
Mean                                  0.1550  0.02   0.1550  0.03  26.5   0.1359  0.02  185.3

Table 3.1: Error estimates and Standard Deviations (SDs) and sparsity level k (number of bases for MPKFDA or number of support vectors for SVM) for 13 benchmark datasets.

Results from the Neural Information Processing Systems (NIPS) 2003 challenge datasets [92] ARCENE, DEXTER and DOROTHEA are presented next5. These datasets were chosen with the expectation that the main advantage of MPKFDA would be shown when the data lives in high dimensions. Comparisons were made between the performance of MPKFDA and that of standard KFDA and SVM, again using RBF kernels for each of the classifiers. 5-fold CV was used on the training set to select the optimal parameters for each algorithm as before, and the classifiers were then tested on the validation set. For each dataset the following are shown: the number of features; the number of examples in the training and validation sets; and the generalisation error of each classifier on the validation set. All problems are two-class classification problems. As can be seen from Table 3.2, MPKFDA outperforms both KFDA and SVM on these high dimensional datasets, whilst giving very sparse solutions.

3Available to download from: http://ida.first.fraunhofer.de/projects/bench/benchmarks.htm
4The coarse values were $10^{-6}, \ldots, 10^{3}$; the fine range consisted of 9 logarithmically spaced values between $10^{v-1}$ and $10^{v+1}$, where v is $\log_{10}$ of the value chosen at the first stage.
5The train and validation sets and associated labels are available for download from: http://www.nipsfsc.ecs.soton.ac.uk/datasets/

Dataset       Dim     Train   Test   KFDA      MPKFDA           SVM
                                     Error     Error      k     Error      k
Arcene       10000      100    100   0.2000    0.1800     40    0.2600     80
Dexter       20000      300    300   0.1133    0.0800     40    0.0733    257
Dorothea    100000      800    350   0.0971    0.0571     11    0.0686    711
Mean                                 0.1368    0.1057   30.3    0.1340  349.3

Table 3.2: Error estimates for MPKFDA on 3 high dimensional datasets.

Figures 3.3 a) and b) show plots of the train and test error of MPKFDA on two of the datasets ('German' and 'Banana') as k increases, compared against KFDA. The plots demonstrate that the MPKFDA algorithm is very resistant to overfitting, and gives good generalisation performance with relatively small k. The value of the bound is also plotted. However, it is too pessimistic (it only levels off for much higher k) and therefore cannot be used for model selection.

It is also interesting to investigate why the algorithm is resistant to overfitting. Firstly, note that the deflation step means that the rank of the kernel matrix is reduced by at least 1 at each iteration. The Frobenius norm of the kernel matrix is also reduced, although the effect of this is greater at earlier steps. Meanwhile, the norm of the weight vector (if unnormalised) grows as bases are added, but the rate of this growth decreases over time. This means that as k grows, the bases that are added have less and less impact on the solution. Figure 3.4 shows the relative sizes of the Frobenius norm of the kernel matrix and the generalisation error as k increases (with different scales on the y-axis). Effectively the deflation step acts as a strong regulariser, which, when combined with the intrinsic regularisation effect of the compression introduced by the sparsity of the solutions, leads to a resistance to overfitting.

In this Section a novel sparse version of KFDA was derived using an approach based on MP. Generalisation error bounds were provided that are analogous to those used in the Robust Minimax algorithm [86], together with a sample compression bounding technique. As shown, the bound is too loose to perform model selection, but further analysis may enable the bound to drive the algorithm. Experimental results on real world datasets were presented, which showed that MPKFDA is competitive with both KFDA and SVM, and additional experiments showed that MPKFDA performs well in high dimensional settings. In terms of computational complexity the demands of MPKFDA during training are higher, but during evaluation on test points only k kernel evaluations are required compared to the m needed for KFDA. This does, however, pose a problem for scaling to very large datasets, as the deflation step is $O(m^3)$ at each iteration.

In the next Section an algorithm based on another greedy method, Polytope Faces Pursuit (PFP), is presented. This time the focus will be on nonlinear regression, showing that greedy methods are widely applicable in ML.


Figure 3.3: Plot of the generalisation error bound for different values of k using RBF kernels for the a) 'German' and b) 'Banana' data sets. The generalisation error is shown on the y axis. The plot shows the training error (in blue), the test error (in green), the bound value (in red), and the test error of the KFDA classifier (in black, with dotted lines showing the Standard Deviation (SD)). Note that the MPKFDA algorithm is very resistant to overfitting, and gives good generalisation performance with relatively small k. The bound is too pessimistic (it only levels off for much higher k) and therefore cannot be used for model selection.


Figure 3.4: Plot showing how the Frobenius norm of the deflated kernel matrix and the test error vary as basis vectors are added to the MPKFDA solution.

3.3 Kernel Polytope Faces Pursuit

Polytope Faces Pursuit (PFP) is a greedy algorithm that approximates the sparse solutions recovered by

ℓ1 regularised least-squares (LASSO) [60, 61] in a similar way to MP and OMP [93]. The algorithm is based on the geometry of the polar polytope, where at each step a basis function is chosen by finding the maximal vertex using a path-following method. The algorithmic complexity is of a similar order to OMP, whilst being able to solve problems known to be hard for MP and OMP. MP was extended to build kernel-based solutions to machine learning problems, resulting in the sparse regression algorithm KMP [65]. A new algorithm to build sparse kernel-based solutions using PFP is presented here, called Kernel Polytope Faces Pursuit (KPFP). The utility of this algorithm will be demonstrated firstly by providing a generalisation error bound [88] that takes into account a natural regression loss, and secondly with experimental results on several benchmark datasets. In the following the KPFP algorithm will be derived, which is a generalisation of PFP to an RKHS. PFP was outlined in the previous Chapter in Section 2.2.3. At each step the approach to the solution of this problem is to identify the optimal vertex, which is the maximiser of $y'c$, where c is the weight vector of the ℓ1-minimisation in its standard form; this is similar to the way in which KMP builds up its solution. The difference, however, is that at each step the path is constrained on the polytope face F given by the vertex of the previous step. This is achieved by projecting y into a subspace parallel to F to give $r = (I - Q)y$, where $Q = \frac{K[:,\mathbf{i}]K[:,\mathbf{i}]'}{\|K[:,\mathbf{i}]\|^2}$. Since $\alpha = K[:,\mathbf{i}]^\dagger y$ and $\hat{y} = K[:,\mathbf{i}]\alpha$, it follows that $r = y - K[:,\mathbf{i}]\alpha = y - \hat{y}$, meaning that r is the residual from the approximation at step i. The second step, which is where the main difference between OMP and PFP arises, involves projecting within the face F that has just been found, rather than from the origin. This is done by projecting along the residual r. Therefore, to find the next face at each step, the maximum scaled correlation is found:

$$i^* = \operatorname*{argmax}_{i \in \{1,\ldots,n\} \setminus \mathbf{i}} \frac{\tilde{K}[:,i]'r}{1 - \tilde{K}[:,i]'c}, \qquad (3.13)$$
where only bases such that $\tilde{K}[:,i]'r > 0$ are considered. Constraints that violate the condition that $\tilde{\alpha}$ contains no negative entries are then removed. This is achieved by finding $j \in \mathbf{i}$ such that $\tilde{\alpha}_j < 0$, removing j from $\mathbf{i}$ and removing the face from the current solution. $\tilde{\alpha}$ is then recalculated, continuing until $\tilde{\alpha}_j \geq 0, \forall j$. Although this step is necessary to provide exact solutions to (2.86), it may be desirable in some circumstances to remove it, owing to the fact that the primal space is in fact the dual space of an RKHS. This would result in faster iterations but less sparse solutions. In Section 3.3.2, a comparison of the performance of the algorithm with and without this step (KPFP and KPFPv respectively) is made. The full algorithm is given in Algorithm 4.

Algorithm 4 Kernel Polytope Faces Pursuit
Input: kernel K, sparsity parameter k > 0, training outputs y
1: Initialise $\tilde{K} = [K, -K]$, $\tilde{\alpha} = [\,]$, $\alpha = [\,]$, $\hat{y} = 0$, $\tilde{A} = [\,]$, $r = y$, $c = 0$
2: for i = 1 to k do
3:   Find face $i_i = \operatorname{argmax}_{i \notin \mathbf{i}} \tilde{K}[:,i]'r / (1 - \tilde{K}[:,i]'c)$ such that $\tilde{K}[:,i]'r > 0$
4:   Add constraint: $\tilde{A} = [\tilde{A}, \tilde{K}[:, i_i]]$
5:   Update $B = \tilde{A}^\dagger$, $\tilde{\alpha} = By$
6:   (Optional) Release violating constraints:
7:   while $\exists\, j$ such that $\tilde{\alpha}_j < 0$ do
8:     Remove face j: $\tilde{A} = \tilde{A} \setminus \tilde{K}[:, j]$, $\mathbf{i} = \mathbf{i} \setminus \{j\}$
9:     Update $B = \tilde{K}[:, \mathbf{i}]^\dagger$, $\tilde{\alpha} = By$
10:  end while
11:  Set $c = B'\mathbf{1}$, $\hat{y} = \tilde{A}\tilde{\alpha}$, $r = y - \hat{y}$
12: end for
13: Calculate $\alpha_i = \tilde{\alpha}_i - \tilde{\alpha}_{i+m}$, $i = 1, \ldots, m$
Output: final set $\mathbf{i}$, (sparse) dual weight vector α, predicted outputs $\hat{y}$
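The following is a compact NumPy sketch of Algorithm 4, assuming a precomputed kernel matrix. For readability the pseudo-inverse is recomputed from scratch at every step rather than being updated by the path-following scheme of PFP, so this is a reference sketch rather than an efficient implementation; the function name `kpfp` is illustrative.

```python
import numpy as np

def kpfp(K, y, k, release_violations=True):
    """Kernel Polytope Faces Pursuit (readable sketch of Algorithm 4)."""
    m = K.shape[0]
    Kt = np.hstack([K, -K])                     # doubled dictionary [K, -K]
    idx, c, r = [], np.zeros(m), y.astype(float).copy()
    for _ in range(k):
        num, den = Kt.T @ r, 1.0 - Kt.T @ c
        scores = np.where((num > 0) & (den > 0), num / den, -np.inf)
        scores[idx] = -np.inf                   # do not pick the same face twice
        idx.append(int(np.argmax(scores)))
        A = Kt[:, idx]
        B = np.linalg.pinv(A)
        alpha = B @ y
        if release_violations:
            while np.any(alpha < 0):            # release faces with negative weights
                idx.pop(int(np.argmin(alpha)))
                A = Kt[:, idx]
                B = np.linalg.pinv(A)
                alpha = B @ y
        c = B.T @ np.ones(len(idx))             # c = B' 1
        r = y - A @ alpha                       # residual
    dual = np.zeros(2 * m)
    dual[idx] = alpha
    return idx, dual[:m] - dual[m:]             # alpha_i = alpha~_i - alpha~_{i+m}
```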

3.3.1 Generalisation error bound

For the generalisation error bound it is assumed that the data are generated i.i.d. from a fixed but unknown probability distribution D over the joint space $X \times Y$. Given the true error of a function f,

$$\mathcal{R}(f) = E_{(x,y)\sim D}\left[\mathcal{L}(f(x), y)\right],$$
where $\mathcal{L}(\hat{y}, y)$ is the loss between the predicted $\hat{y}$ and true y, the empirical risk of f given S:

$$\hat{\mathcal{R}}(f) = \frac{1}{m}\sum_{i=1}^{m}\mathcal{L}(f(x_i), y_i),$$
and the estimation error est(f)

$$\operatorname{est}(f) = \left|\mathcal{R}(f) - \hat{\mathcal{R}}(f)\right|,$$
the aim is to find an upper bound for est(f). In order to construct this bound we can use Vapnik-Chervonenkis (VC) theory, which relies on the uniform convergence of the empirical risk to the true risk. For a general function class, a well known quantity to measure its size, which determines the degree of uniform convergence, is the covering number [94]. The covering number is calculated by discretising the parameter space so that the risk can be estimated at discrete locations.

Definition. Let B be a metric space with metric ρ. Given observations $X = [x_1, \ldots, x_m]$, and functions $f \in B^m$ that form a hypothesis class $\mathcal{H}$, the covering number in the $\ell_p$-norm, denoted by $\mathcal{N}_p(\epsilon, \mathcal{H}, X)$, is defined as the minimum number z of a collection of vectors $v_1, \ldots, v_z \in B^m$, such that $\exists\, v_j$:

$$\rho_p(f(x), v_j) \leq m^{1/p}\epsilon,$$
and further that $\mathcal{N}_p(\epsilon, \mathcal{H}, m) = \sup_X \mathcal{N}_p(\epsilon, \mathcal{H}, X)$. Note that from the definition and Jensen's inequality, we have that $\mathcal{N}_p \leq \mathcal{N}_q$ for $p \leq q$ (see [95] for a discussion), meaning that the $\ell_\infty$ covering number is always an upper bound on the $\ell_1$ covering number. A result that is relevant here (Theorem 17.1 from [96]) bounds the rate of uniform convergence of a function class in terms of its covering number (using the $\ell_\infty$ covering number as opposed to the $\ell_1$ covering number):

$$\Pr\left(\exists f \in \mathcal{H} : \left|\mathcal{R}(f) - \hat{\mathcal{R}}(f)\right| \geq \epsilon\right) \leq 4\,\mathcal{N}_\infty\!\left(\frac{\epsilon}{16}, \mathcal{H}, 2m\right)\exp\left(\frac{-\epsilon^2 m}{32}\right).$$
This covering number can be upper bounded using Theorem 12.2 from [96]:

$$\mathcal{N}_\infty(\epsilon, \mathcal{H}, m) \leq \left(\frac{emR}{\epsilon d}\right)^d,$$
where R is the support of the distribution and d denotes the pseudo-dimension. As with KMP [88], KPFP also has VC-dimension (pseudo-dimension) k, where k is the number of basis vectors chosen. However, in contrast to the KMP bound of [88], the pseudo-dimension is used to apply a natural regression loss function, the so-called squared error as defined in Section 2.1.3:

$$\mathcal{L}(f(x), y) = (f(x) - y)^2.$$

Therefore there is no need to fix a bandwidth parameter as was the case with the bound of [88], i.e. there is no need to map the regression loss into a classification one. The proof technique of [88] is followed, but instead the sample compression technique is applied over pseudo-dimension bounds, which results in a slightly more involved proof.

Theorem 3.3.1. Let $f \in \mathcal{H}: X \to [0, 1]$ be the function output by any sparse (dual) kernel regression algorithm which builds regressors using basis vectors, m the size of the training set S and k the size of the chosen set of basis vectors $\mathbf{i}$. Let $\bar{S} = S \setminus S_\mathbf{i}$ denote the examples outside of the set $S_\mathbf{i}$. Assume without loss of generality that the last k examples in S form the set $S_\mathbf{i}$. Let R be the radius of the ball containing the support of S. Then with $1 - \delta$ confidence the true error $\mathcal{R}(f)$ of function f given any training set S can be upper bounded by,

$$\mathcal{R}(f) \leq \hat{\mathcal{R}}_{\bar{S}}(f) + \frac{\sqrt{32^2 + 128(m-k)\left(k\ln\frac{em}{k} + k\ln\left(32e(m-k)R\right) + 1 + \ln\frac{4km}{\delta}\right)} - 32}{2(m-k)}.$$
Proof. First consider a fixed k and a fixed set of indices $\mathbf{i}$. Assume that the first $m - k$ points from S are drawn independently and apply Theorem 17.1 (and Theorem 12.2) from [96] to obtain the bound

$$\Pr\left(\bar{S} : \left|\mathcal{R}(f) - \hat{\mathcal{R}}_{\bar{S}}(f)\right| \geq \epsilon\right) \leq 4\left(\frac{32e(m-k)R}{\epsilon k}\right)^k \exp\left(\frac{-\epsilon^2(m-k)}{32}\right). \qquad (3.14)$$
Given that the goal is to choose k basis vectors from m choices, there are $\binom{m}{k}$ different ways of selecting them. Multiplying the r.h.s. of Equation 3.14 by $\binom{m}{k}$ gives:

$$\Pr\left(S : \exists\, \mathbf{i}, |\mathbf{i}| = k, \exists f \in \operatorname{span}\{S_\mathbf{i}\} \text{ s.t. } \left|\mathcal{R}(f) - \hat{\mathcal{R}}_{\bar{S}}(f)\right| \geq \epsilon\right) \qquad (3.15)$$
$$\leq \binom{m}{k}\, 4\left(\frac{32e(m-k)R}{\epsilon k}\right)^k \exp\left(\frac{-\epsilon^2(m-k)}{32}\right)$$
$$\leq \left(\frac{em}{k}\right)^k 4\left(\frac{32e(m-k)R}{\epsilon k}\right)^k \exp\left(\frac{-\epsilon^2(m-k)}{32}\right),$$
where we use the fact that $\binom{m}{k} \leq \sum_{i=0}^{k}\binom{m}{i} \leq \left(\frac{em}{k}\right)^k$, i.e. $\ln\binom{m}{k} \leq k\ln\frac{em}{k}$. Next, setting the r.h.s. of Equation (3.15) to δ, taking logarithms and rearranging gives

$$\frac{\epsilon^2(m-k)}{32} = k\ln\frac{em}{k} + k\ln\left(32e(m-k)R\right) - \ln\epsilon + \ln k + \ln\frac{4}{\delta}.$$

It would be desirable to write this bound in terms of ε, and we therefore use the following result [97], which states that for any $\alpha > 0$, $\ln\epsilon \leq \ln\frac{1}{\alpha} - 1 + \alpha\epsilon$. Substituting this result with $\alpha = 1$ (a smaller α can be used but would make the bound less neat) gives

$$\epsilon^2(m-k) = 32\left(k\ln\frac{em}{k} + k\ln\left(32e(m-k)R\right) - \ln 1 + 1 - \epsilon + \ln k + \ln\frac{4}{\delta}\right),$$
which yields the following quadratic equation:

$$(m-k)\epsilon^2 + 32\epsilon - 32\left(k\ln\frac{em}{k} + k\ln\left(32e(m-k)R\right) + 1 + \ln\frac{4k}{\delta}\right) = 0.$$
Therefore, solving for ε gives the result when the bound is further applied m times for each value of k.6

This bound can be specialised to the RBF kernel that uses the mean squared error loss and for which the support of the distribution R =1, which leads to the following corollary. 7

6The quadratic equation is solved only for the positive quadrant.
7The RBF kernel was used in the experiments.

Corollary 3.3.2. For an RBF kernel and using all the definitions from Theorem 3.3.1, the loss of KPFP can be upper bounded by:

$$\mathcal{R}(f) \leq \hat{\mathcal{R}}_{\bar{S}}(f) + \frac{\sqrt{32^2 + 128(m-k)\left(k\ln\frac{em}{k} + k\ln\left(32e(m-k)\right) + 1 + \ln\frac{4km}{\delta}\right)} - 32}{2(m-k)},$$
where

$$\hat{\mathcal{R}}_{\bar{S}}(f) = \frac{1}{m-k}\sum_{i=1}^{m-k}\mathcal{L}_{\bar{S}}(f(x_i), y_i).$$

Remark. The consequence of Theorem 3.3.1 (and Corollary 3.3.2) is that although the pseudo-dimension can be infinite even in cases where learning is successful,8 a bound will be generated that is always finite. Also, this is the first bound for KMP and KPFP to use the natural regression loss in order to upper bound generalisation error. The bound naturally trades off empirical error against complexity: as the training error decreases the bound gets smaller, and as the number of basis vectors (complexity) increases the bound gets larger. A good trade-off is to find small training error whilst using a small number of basis vectors. Clearly, the KMP and KPFP algorithms try to optimise this trade-off, and the bound suggests that this will result in good generalisation.
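Since the bound of Corollary 3.3.2 (and its rescaled form in Corollary 3.3.3 below) is a closed-form function of the empirical risk, m, k and δ, it can be evaluated directly; this is the kind of calculation behind bound curves such as those in Figure 3.5. A sketch, transcribing the formulas as reconstructed above with arbitrary illustrative inputs:

```python
import numpy as np

def kpfp_bound(emp_risk, m, k, delta=0.05, R=1.0, B=None):
    """Right-hand side of Theorem 3.3.1 / Corollaries 3.3.2-3.3.3 (sketch).

    emp_risk : empirical risk on the m - k points outside the compression set.
    R = 1 corresponds to the RBF case of Corollary 3.3.2; pass B to apply the
    2B rescaling of Corollary 3.3.3 for outputs in [-B, B].
    """
    inner = (k * np.log(np.e * m / k)
             + k * np.log(32 * np.e * (m - k) * R)
             + 1 + np.log(4 * k * m / delta))
    eps = (np.sqrt(32**2 + 128 * (m - k) * inner) - 32) / (2 * (m - k))
    return emp_risk + (2 * B * eps if B is not None else eps)

# Illustrative call (numbers are arbitrary, not thesis results):
print(kpfp_bound(emp_risk=0.1, m=450, k=20))
```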

It is quite obvious that the output of the function class $\mathcal{H}: X \to [0, 1]$ is not bounded between 0 and 1 in most 'real world' regression scenarios. Therefore, a more practically useful bound can be given for a function class $\mathcal{H}: X \to [-B, B]$ where the outputs are bounded in the range $[-B, B] \subset \mathbb{R}$.

Corollary 3.3.3. Let $\|w\|_2 \leq B \in \mathbb{R}$ and $\|x_i\|_2 \leq 1$, $i = 1, \ldots, m$. Let $f \in \mathcal{H}: X \to [-B, B]$ be the function output by any sparse (dual) kernel regression algorithm which builds regressors using basis vectors, m the size of the training set S and k the size of the chosen set of basis vectors $\mathbf{i}$. Let $\bar{S} = S \setminus S_\mathbf{i}$ denote the examples outside of the set $S_\mathbf{i}$. Assume without loss of generality that the last k examples in

S form the set $S_\mathbf{i}$. Let R be the radius of the ball containing the support of S. Then with $1 - \delta$ confidence the true error $\mathcal{R}(f)$ of function f given any training set S can be upper bounded by,

$$\mathcal{R}(f) \leq \hat{\mathcal{R}}_{\bar{S}}(f) + 2B\,\frac{\sqrt{32^2 + 128(m-k)\left(k\ln\frac{em}{k} + k\ln\left(32e(m-k)R\right) + 1 + \ln\frac{4km}{\delta}\right)} - 32}{2(m-k)}.$$

Proof. Denote the function class $\tilde{\mathcal{H}} = \left\{\frac{f+B}{2B} : f \in \mathcal{H}\right\}: X \to [0, 1]$. Therefore, given any function $\tilde{f} \in \tilde{\mathcal{H}}$, Theorem 3.3.1 holds. Furthermore, for any function class $\mathcal{H}: X \to [-B, B]$ the following results:

$$2B\,\mathcal{R}(\tilde{f}) \leq 2B\,\hat{\mathcal{R}}_{\bar{S}}(\tilde{f}) + 2B\,\frac{\sqrt{32^2 + 128(m-k)\left(k\ln\frac{em}{k} + k\ln\left(32e(m-k)R\right) + 1 + \ln\frac{4km}{\delta}\right)} - 32}{2(m-k)},$$
which completes the proof under the substitution $\hat{\mathcal{R}}_{\bar{S}}(f) = 2B\,\hat{\mathcal{R}}_{\bar{S}}(\tilde{f})$.

8Note that the pseudo-dimension is a generalisation of the VC-dimension and hence the same problems of infinite VC-dimension also apply to the pseudo-dimension.

3.3.2 Experiments

A comparison on 9 benchmark datasets derived from the UCI, StatLib, and DELVE benchmark repositories is presented. Details of the datasets are given in Table 3.3. The performance of KPFP, KMP, KRR and KBP are analysed using RBF kernels. KBP was implemented by solving the LASSO on the features defined by the RBF kernel using the LARS. 10 randomised splits into training and test sets were used. For each of the datasets CV was used to select the optimal RBF kernel width parameter for KRR. This kernel was then used as input to the KMP, KBP and KPFP algorithms. For both KMP and KPFP the initial sparsity level k was set in training by a heuristic method to the lesser of 100 or the number of training examples. The means and standard deviations of the generalisation error for each method and dataset are given in Table 3.4.

The results show that overall the sparse methods (KMP, KPFP, KBP) all perform better than KRR. It is interesting to compare the performance of KPFP with and without the release of violating constraints (KPFPv and KPFP respectively). KPFPv performs nearly as well as KMP on all datasets except for cpusmall, whilst requiring fewer bases in the final solutions. On the other hand, KPFP results in solutions that are the least sparse of the three methods, but results in the lowest generalisation error.

KBP, which gives an exact solution to the LASSO problem, performs the worst here, showing that the ℓ1 solution is not necessarily the optimal one for generalisation. The key to the performance of all of these methods is in selecting the appropriate stopping point k. This is quite difficult to achieve in KMP, as the algorithm tends to overfit quite quickly, and there is no obvious criterion for stopping. For example, if cross-validation were used to select k, the resulting value would be too low, as the number of bases would be selected from a smaller validation set. In the experiments it was found that selecting an initial k through a heuristic method and then choosing the minimiser of the training error resulted in the best compromise. In KPFP and KPFPv the optimal value for k is more easily achieved, as the training and test error curves tend to follow each other quite well. Additionally there is an (optional) stopping parameter $\theta_{max}$. In fact, the value of θ to which $\theta_{max}$ is compared also follows the error curves. It was found that taking the minimiser of θ as the number of bases was a reliable way of estimating k.

Dataset      # examples   # dimensions
abalone            4177              8
bodyfat             252             14
cpusmall           8192             12
housing             506             13
mpg                 392              7
mg                 1385              6
pyrim                74             27
space ga           3107              6
triazines           186             60

Table 3.3: Number of examples and dimensions of each of the 9 benchmark datasets

Dataset      KRR              KMP                   KBP                   KPFPv                 KPFP
             MMSE     σ       MMSE    σ      k      MMSE    σ      k     MMSE    σ      k      MMSE    σ      k
abalone      8.70     1.79    5.70    2.56   49.2   21.64   28.80  5.4   6.07    1.16   7.3    4.82    0.24   37.7
bodyfat      0.00     0.00    0.00    0.00   49.1   0.01    0.02   5.7   0.00    0.00   30.1   0.00    0.00   129.7
cpusmall     216.35   64.04   15.66   2.51   24.0   519.06  95.45  10.3  69.97   2.51   13.4   12.50   1.51   54.2
housing      72.19    19.59   21.93   7.17   50.3   56.84   19.35  8.9   34.16   8.19   21.9   23.22   6.67   150.8
mpg          39.47    24.57   20.70   14.37  50.6   42.05   48.27  7.7   13.11   3.35   11.5   10.98   1.97   161.1
mg           0.04     0.01    0.02    0.00   49.0   0.11    0.19   4.4   0.02    0.00   7.6    0.02    0.00   48.7
pyrim        0.02     0.01    0.02    0.02   24.3   0.02    0.01   11.6  0.02    0.01   17.8   0.01    0.01   39.0
space ga     0.03     0.01    0.02    0.00   49.9   0.05    0.05   4.8   0.02    0.00   6.0    0.02    0.00   38.2
triazines    0.02     0.01    0.03    0.02   50.9   0.02    0.00   11.3  0.02    0.00   34.4   0.02    0.00   109.7
wins         3                34                    6                    9                     39

Table 3.4: (Mean) Mean Squared Error (MMSE) and SDs (σ) for 9 benchmark datasets for KRR, KMP, KBP and KPFP with and without violation release (KPFPv, KPFP). The total number of wins over all splits of the data for each algorithm is given in the last row. Numbers in bold indicate the best performing algorithm for each dataset.

3.3.3 Bound Experiments

Finally, results on the performance of the bound will be presented. Figure 3.5 shows typical plots of the bound. For Figure 3.5 (b) the number of training examples chosen was 450 and the number of test examples was 56, with the RBF width parameter set to σ = 0.035. The bound values tend to fall as basis vectors are added, before rising again as the complexity of the solution rises. Hence the first minimum of the bound value could serve as an appropriate point to stop the algorithm. This is clearly much more efficient than using cross-validation to select the value of k, the number of basis vectors to use. However, in the experiments this caused the algorithm to stop too early, resulting in underfitting. Further refinement of the bound may improve its performance in this respect.


Figure 3.5: a) Plot of the generalisation error bound for different values of k using RBF kernels for the 'Boston housing' data set. The log of the generalisation error is shown on the y axis. The plot shows the empirical error of the set $\bar{S}$ (denoted training error, in green), the estimation error (in blue), the norm of the weight vector (in red), the bound value which is calculated from these three values (in cyan), and the generalisation error (in magenta). Note that the empirical error follows the true error very well, which justifies its use in the setting of the sparsity parameter. However the bound value is swamped by the norm of the weight vector (needed according to Corollary 3.3.3), and as such is not useful. b) The bound values for the KMP algorithm. Note that in this case the bound (which is valid for this algorithm too) is more useful, simply because the norm of the weight vector does not blow up as quickly.

In this Section the greedy PFP algorithm, which approximates the sparse solutions recovered by ℓ1 regularised least-squares (LASSO) at a cost of a similar order to OMP, was extended to a kernel version, called KPFP. The utility of this algorithm was demonstrated by providing a novel generalisation error bound which used the natural regression loss and the pseudo-dimension in order to upper bound its loss. The experimental results showed that KPFP was competitive against both KMP and KRR.

The next Section will present an alternative to the greedy strategies for the selection of bases presented thus far. It will be shown theoretically and empirically that, surprisingly, it is still possible to learn when the bases are selected at random, provided that certain assumptions hold.

3.4 Learning in a Nyström Approximated Subspace

“No random actions, none not based on underlying principles” Marcus Aurelius, Meditations Book IV

Given m observations, it is possible to define a framework that carries out learning in a $k \leq \ell \ll m$ dimensional subspace that is constructed using the Nyström method. A recently advocated and theoretically justified approach of uniform sub-sampling without replacement will be adopted to cheaply find a k-dimensional subspace in time complexity O(1). Any linear learning algorithm can then be used in this uniformly sampled k-dimensional Nyström approximated subspace to help tackle large data sets. Furthermore, for any SVM constructed in this Nyström approximated space, an upper bound on its objective function is proved in terms of the objective of the SVM solved in the original space, implying successful learning whenever the objective of the SVM in the original space is small. Finally, the proposed methodology will be demonstrated on several UCI repository datasets for both classification and regression, using primal SVM, FDA, and RR.

Kernel methods continue to play an important role in machine learning due to their ability to address real-world problems, which often have non-linear and complex structures. The key element of kernel methods is the mapping of data into a kernel-induced Hilbert space where a dot product between the points can be computed efficiently. Therefore, given m sample points, an m × m symmetric positive semi-definite (SPSD) kernel matrix is all that needs to be computed. Computing the kernel matrix requires an operation with a complexity of O(m²). Despite the obvious advantages of kernel methods, the methodology begins to falter when m becomes very large.

This potential drawback of kernel methods has been addressed in the literature through the proposal of a number of methods for kernel matrix low-rank approximations. These methods have a computational complexity smaller than O(m²). In particular, one would perform a low-rank approximation of K = C′C, where $C \in \mathbb{R}^{k \times m}$ such that $k \ll m$. For example, [37] have approximated the kernel matrix by incrementally choosing basis vectors so as to minimise an upper bound on the approximation error. Their algorithm has a complexity of O(k²mℓ), where ℓ is a random subset size. [98] have proposed a greedy sampling scheme, with complexity O(k²m), based on how well a sample point can be represented by a linear combination of the current subspace bases in the feature space. The Nyström approximation, originally proposed by [99] to solve integral equations, was proposed by [7] as a technique to approximate kernel matrices in order to speed up kernel-based predictors. The Nyström approach samples k columns of the kernel matrix to reconstruct the complete kernel matrix, and has a complexity of O(k³). When $k \ll m$ this is computationally much more efficient than the other methods.

It has recently been demonstrated that when approximating the kernel matrix using the Nyström approach, uniform subsampling without replacement is able to outperform other sampling techniques [100]. The authors show that the most computationally efficient, and cheapest, sampling technique is to randomly select columns of the kernel matrix. However, whilst they provide upper bounds on the approximation error, they do not give a theoretical analysis of learnability in the Nyström subspace. This question has in fact been investigated by Blum et al. [101, 102], who show that a Nyström projection (their projection F2, although they do not refer to it as a Nyström projection) preserves margins. By this they mean that if there is a classifier with margin γ, a suitably large Nyström subspace will have a margin of at least γ/2 for a high proportion of the training data. In practice one would not normally expect data to have a large hard margin even in a high dimensional space, but rather to have a small primal SVM objective that combines both the margin and the slack variables. Hence, their result leaves open the question of how the projection will affect the size of the SVM objective, since they do not take into account:

• some points with non-zero slack variables may fail to achieve margin γ in the original space;
• the size of the slack variables of the fraction ε of points that fail to achieve margin γ/2 in the Nyström projection.

These issues will be investigated, resulting in a theoretical extension of the Blum et al. approach, followed by experiments to verify the effect of the Nyström projection on the quality of generalisation obtained using Support Vector classification. Section 3.2.1 gave details of an OMP algorithm for KFDA that is greedy in its approach to finding a small number of basis vectors, with a complexity of O(m³k). MPKFDA greedily chooses basis vectors by maximising the Fisher quotient to solve the FDA algorithm in the Nyström approximated space [15]. The KPFP algorithm described in Section 3.3, which was used to perform regression, has the same complexity [17]. The idea of uniformly sampling (with or without replacement) [100] will be used to generate the Nyström subspace and demonstrated experimentally in both of these settings, as well as for the SVM. The experimental results will be strengthened with significance testing.

Preliminaries

Recall the definition of the Nyström approximation of the Gram matrix G, as defined in Section 3.2.2. For any such Gram matrix, there exists an $X \in \mathbb{R}^{m \times n}$ such that G = XX′. Again, if we assume that the examples have already been projected into the kernel-defined feature space, this analysis will hold for kernel matrices K in place of the Gram matrix G. $\ell \ll m$ columns of G are sampled uniformly at random without replacement. Let N be the m × ℓ matrix of the sampled columns, and W be the ℓ × ℓ matrix consisting of the intersection of these ℓ columns with the corresponding ℓ rows of G. The Nyström method uses W, N to construct a rank-k approximation $\tilde{G}_k$ to G for $k \leq \ell$, like so:

$$\tilde{G}_k = N W_k^\dagger N' \approx G, \qquad (3.16)$$
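A minimal NumPy sketch of the rank-k Nyström approximation in Equation (3.16), using uniform sampling without replacement and an eigendecomposition of W to form the rank-k pseudo-inverse; the function name and numerical tolerance are illustrative choices rather than the thesis implementation.

```python
import numpy as np

def nystrom_approximation(G, ell, k, seed=0):
    """Rank-k Nystrom approximation of an SPSD matrix G (sketch of Equation 3.16).

    ell columns are sampled uniformly without replacement; W is the ell x ell
    intersection block, N the m x ell block of sampled columns, and W_k^+ is
    formed from the top-k eigenpairs of W.
    """
    m = G.shape[0]
    cols = np.random.default_rng(seed).choice(m, size=ell, replace=False)
    N = G[:, cols]
    W = G[np.ix_(cols, cols)]
    vals, vecs = np.linalg.eigh(W)              # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]
    inv_vals = np.where(vals[top] > 1e-12, 1.0 / vals[top], 0.0)
    W_k_pinv = (vecs[:, top] * inv_vals) @ vecs[:, top].T
    return N @ W_k_pinv @ N.T                   # G~_k = N W_k^+ N'
```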

Recent studies [100, 103, 104] have shown that for a Gram matrix G and a Nyström approximated matrix $\tilde{G}_k$, constructed from k uniformly sampled columns of G, the expected loss $\|G - \tilde{G}_k\|_F$ can be bounded by the difference between G and its optimal rank-k approximation $G_k$.

Theorem 3.4.1 (Quoted from [100]). Let $G \in \mathbb{R}^{m \times m}$ be a SPSD matrix. Assume that ℓ columns of G are sampled uniformly at random without replacement, let $\tilde{G}_k$ be the rank-k Nyström approximation to G as described in Equation (3.2), and let $G_k$ be the best rank-k approximation to G. For ε > 0, if $\ell \geq \frac{64k}{\epsilon^4}$, then

$$E\left[\left\|G - \tilde{G}_k\right\|_F\right] \leq \left\|G - G_k\right\|_F + \epsilon\left[\frac{m}{\ell}\sum_{i\in D(\ell)} G_{ii}\sum_{i=1}^{m} G_{ii}\right]^{\frac{1}{2}},$$
where $\sum_{i\in D(\ell)} G_{ii}$ is the sum of the largest ℓ diagonal entries of G. Further, let $\eta = \sqrt{\frac{\log\left(\frac{2}{\delta}\right)\beta(\ell, m-\ell)}{\ell}}$, with $\beta(\ell, m-\ell) = \frac{\ell(m-\ell)}{\left(m - \frac{1}{2}\right)\left(1 - \frac{1}{2\max\{\ell, m-\ell\}}\right)}$; if $\ell \geq \frac{64k}{\epsilon^4}$, then with probability at least $1 - \delta$,

$$\left\|G - \tilde{G}_k\right\|_F \leq \left\|G - G_k\right\|_F + \epsilon\left[\frac{m}{\ell}\sum_{i\in D(\ell)} G_{ii}\left(\sum_{i=1}^{m} G_{ii} + \eta\max(mG_{ii})\right)\right]^{\frac{1}{2}}.$$

3.4.1 Theory of Support Vector Machine (SVM) in Nyström Subspace

The theoretical results concerning the Nyström approximation presented so far are the following:

• An upper bound on the expected reconstruction error of the low-rank matrix approximation described above.
• A bound which shows that if there exists a separator with hard margin γ in the original space, a Nyström projection of dimension

$$d \geq \frac{8}{\epsilon}\left[\frac{1}{\gamma^2} + \ln\frac{1}{\delta}\right] \qquad (3.17)$$
will, with probability $1 - \delta$ over the selection of the d points defining the projection, create a margin of at least γ/2 for all but at most an ε fraction of the training data.

The second statement implies the potential for good generalisation, since a large margin classifier misclassifying some points has a provable bound on generalisation. Nonetheless it is not clear that this will be found by the margin-maximising SVM, since it deals with margin errors using slack variables that do not simply count margin errors. Furthermore, the assumption that there exists a hard margin separator in the original space is in practice unrealistic. An SVM solution with small objective might be found, implying good generalisation but at the expense of a number of points with non-zero slack variables. The theorem as stated would not apply to this case.

The main result of this work is an adaptation of [101] as follows.

Lemma 3.4.2. Consider any distribution over labelled examples (with input vectors having support contained in the unit ball in Euclidean space) such that there exists a linear separator $\langle w, x\rangle = 0$ with margin γ on all but k points. Drawing

$$d \geq \frac{8}{\epsilon}\left[\frac{1}{\gamma^2} + \ln\frac{1}{\delta}\right]$$
examples $z_1, \ldots, z_d$ i.i.d. from this distribution, with probability at least $1 - \delta$, there exists a vector $\tilde{w} \in \operatorname{span}(z_1, \ldots, z_d)$ that has error at most $\epsilon + k/m$ at margin γ/2.

Proof. Given the set of examples $S = \{z_1, \ldots, z_d\}$ as defined above with $\|z_i\| = 1, \forall i$, we define $V = \operatorname{span}(S)$ as the (possibly not unique) span of this set, and $V^\perp$ as its orthogonal complement. Suppose we have a (weight) vector w in the space span(z), also assumed to be normalised ($\|w\| = 1$). Let $w_{in}$ be the part of w that lies in V, and $w_{out}$ be the part of w that lies in $V^\perp$. By definition $w_{in} \perp w_{out}$ and $w = w_{in} + w_{out}$. We need to make the following definitions:

1. $w_{out}$ is large if $\Pr_z\left(\langle w_{out}, z\rangle > \gamma/2\right) \geq \epsilon$, and
2. $w_{out}$ is small if $\Pr_z\left(\langle w_{out}, z\rangle > \gamma/2\right) < \epsilon$,

where we use $\Pr_z(\cdot)$ to denote the probability over random sampling from the training set. If $w_{out}$ is small, then as $\langle w_{in}, z\rangle = \langle w, z\rangle - \langle w_{out}, z\rangle$ and it was assumed that $\Pr_z(\langle w, z\rangle > \gamma) = 1 - k/m$, it can be seen that $\Pr_z(\langle w_{in}, z\rangle > \gamma/2) \geq 1 - \epsilon - k/m$ as required, and the proof would be complete. For the rest of the proof, we consider the situation where $w_{out}$ is large, i.e. the set z has not yet been informative enough that the weight vector enabling separation lies sufficiently within its span.

For $w_{out}$ that is large, we consider what happens when a new (random) point $\tilde{z} = z_{d+1}$, $\|\tilde{z}\| = 1$, is added to the set, with the resulting induced space $\tilde{S} = S \cup \{\tilde{z}\}$. Consider the case that $\tilde{z} \notin V$ (i.e. $\tilde{z} \in \tilde{V}^\perp$ where $\tilde{V}^\perp = \operatorname{span}(\tilde{S})^\perp = V^\perp \cup \{\tilde{z}\}$). We can, by the definition of w and $w_{out}$, deduce that $\langle w_{out}, \tilde{z}\rangle > \gamma/2$. Let $\tilde{z}_{in}$ and $\tilde{z}_{out}$ be the normalised projections of $\tilde{z}$ onto V and $V^\perp$ respectively. Similarly let $\tilde{w}_{in} = \operatorname{proj}(w_{in}, \tilde{V}^\perp)$ and $\tilde{w}_{out} = \operatorname{proj}(w_{out}, \tilde{V}^\perp)$ be the projections of $w_{in}$ and $w_{out}$ onto $\tilde{V}^\perp$ respectively. Observe that,

$$\begin{aligned} \tilde{w}_{in} &= \operatorname{proj}(w, \tilde{V}), \\ &= \operatorname{proj}(w, V) + \operatorname{proj}(w, \tilde{z}), \\ &= w_{in} + \operatorname{proj}(w, \tilde{z}). \end{aligned} \qquad (3.18)$$

Since $\tilde{w}_{in} \perp \tilde{w}_{out}$, $\tilde{w}_{out}$ must shrink by a concordant amount,

$$\begin{aligned} \tilde{w}_{out} &= w_{out} - \operatorname{proj}(w, \tilde{z}), \\ &= w_{out} - \operatorname{proj}(w_{out}, \tilde{z}), \\ &= w_{out} - \langle w_{out}, \tilde{z}\rangle\tilde{z}. \end{aligned} \qquad (3.19)$$

Since $\tilde{z} = \operatorname{proj}(z, \tilde{S})$, and by definition $\tilde{z} \perp V$, we have

$$\begin{aligned} \|\tilde{w}_{in}\|^2 &= \left\langle w_{in} + \operatorname{proj}(w, \tilde{z}),\; w_{in} + \operatorname{proj}(w, \tilde{z})\right\rangle, \\ &= \|w_{in}\|^2 + \left(\operatorname{proj}(w, \tilde{z})\right)^2 + \left\langle w_{in}, \operatorname{proj}(w, \tilde{z})\right\rangle, \\ &= \|w_{in}\|^2 + \left(\operatorname{proj}(w, \tilde{z})\right)^2, \\ &= \|w_{in}\|^2 + \left(\langle w, \tilde{z}\rangle\tilde{z}\right)^2, \\ &= \|w_{in}\|^2 + \left(\langle w, \tilde{z}_{in}\rangle\tilde{z}_{in}\right)^2, \end{aligned} \qquad (3.20)$$
and as before the corresponding norm of the orthogonal complement must shrink by a concordant amount,

$$\begin{aligned} \|\tilde{w}_{out}\|^2 &= \|w_{out}\|^2 - \left(\langle w, \tilde{z}\rangle\tilde{z}\right)^2, \\ &= \|w_{out}\|^2 - \left(\langle w, \tilde{z}_{out}\rangle\tilde{z}_{out}\right)^2. \end{aligned} \qquad (3.21)$$

Using that,

$$\begin{aligned} \langle w_{out}, \tilde{z}\rangle &\leq \langle w_{out}, \tilde{z}_{out}\rangle, \\ &= \langle w_{out}, z\rangle, \end{aligned} \qquad (3.22)$$
and by the definition of z, we have,

$$\begin{aligned} \|\tilde{w}_{out}\|^2 &= \|w_{out}\|^2 - \left(\langle w, \tilde{z}_{out}\rangle\tilde{z}_{out}\right)^2, \\ &< \|w_{out}\|^2 - (\gamma/2)^2. \end{aligned} \qquad (3.23)$$

We have therefore shown that the new point $\tilde{z}$ has at least an ε chance of significantly improving the set S by a factor of at least $\gamma^2/4$, under the assumption that $w_{out}$ is large. Since $\|w_{out}\|^2 = \|\operatorname{proj}(w, \emptyset)\|^2 = 1$, this can happen at most $4/\gamma^2$ times.

Under the assumptions above, and due to the strict inequality in Equation 3.23, we can then use Chernoff bounds to determine the number of projections d that are needed. The bounds in the multiplicative form state that, for independent random events $X_1, X_2, \ldots, X_n$ taking the values

0 or 1,

$$\Pr\left(X \geq (1+\zeta)E[X]\right) < \left(\frac{\exp(\zeta)}{(1+\zeta)^{1+\zeta}}\right)^{E[X]}. \qquad (3.24)$$

To use this form we need to switch round the statement above such that our random event is the chance that S will not be improved (i.e. $1 - \epsilon/2$), and we are bounding the probability that over n instantiations the sum of the random events is larger than $n - n\epsilon/2$. In this case we have that $E[X] = n(1-\epsilon)$, and this means that,

$$\begin{aligned} (1+\zeta)\,n(1-\epsilon) &= n - n\epsilon/2, \\ \Rightarrow \zeta &= \frac{\epsilon/2}{1-\epsilon}. \end{aligned} \qquad (3.25)$$
Substituting into Equation (3.24) leads to,

$$\Pr\left(X \geq n\left(1 - \frac{\epsilon}{2}\right)\right) \leq \left(\frac{\exp(\zeta)}{(1+\zeta)^{1+\zeta}}\right)^{n(1-\epsilon)} = \delta. \qquad (3.26)$$
We now rearrange for n,

$$\begin{aligned} \delta &= \left(\frac{\exp(\zeta)}{(1+\zeta)^{1+\zeta}}\right)^{n(1-\epsilon)}, \\ \ln(\delta) &= n(1-\epsilon)\ln\left(\frac{\exp(\zeta)}{(1+\zeta)^{1+\zeta}}\right), \\ \ln\frac{1}{\delta} &= n(1-\epsilon)\ln\left(\frac{(1+\zeta)^{1+\zeta}}{\exp(\zeta)}\right), \\ \ln\frac{1}{\delta} &= n(1-\epsilon)\left[(1+\zeta)\ln(1+\zeta) - \zeta\right], \\ n &= \frac{1}{1-\epsilon}\cdot\frac{\ln\frac{1}{\delta}}{(1+\zeta)\ln(1+\zeta) - \zeta}. \end{aligned} \qquad (3.27)$$
Substituting (3.25) into (3.27) gives us,

$$\begin{aligned} n &= \frac{1}{1-\epsilon}\cdot\frac{\ln\frac{1}{\delta}}{\left(1 + \frac{\epsilon/2}{1-\epsilon}\right)\ln\left(1 + \frac{\epsilon/2}{1-\epsilon}\right) - \frac{\epsilon/2}{1-\epsilon}}, \\ &= \frac{1}{1-\epsilon}\cdot\frac{(1-\epsilon)\ln\frac{1}{\delta}}{\left(1 - \epsilon + \epsilon/2\right)\ln\left(1 + \frac{\epsilon/2}{1-\epsilon}\right) - \epsilon/2}, \\ &= \frac{\ln\frac{1}{\delta}}{\left(1 - \epsilon/2\right)\ln\left(1 + \frac{\epsilon/2}{1-\epsilon}\right) - \epsilon/2}. \end{aligned} \qquad (3.28)$$
We will now pull together the result from (3.23) with the above to lower bound the number of projection dimensions n. Setting τ to be the denominator in Equation (3.28), we can use the fact that $\tau \leq \frac{\epsilon}{8}$ for $\epsilon \in [0, 0.5]$, as shown in Figure 3.6, together with the consequence from (3.23), which is that with

probability $1 - \delta$ we will have at least $n\epsilon/2 \geq 4/\gamma^2$ heads (implying that $n \geq \frac{8}{\gamma^2\epsilon}$), as follows,

$$\begin{aligned} n &\geq \max\left(\frac{8}{\epsilon}\ln\frac{1}{\delta},\; \frac{8}{\gamma^2\epsilon}\right), \\ n &\geq \frac{8}{\epsilon}\left[\frac{1}{\gamma^2} + \ln\frac{1}{\delta}\right]. \end{aligned} \qquad (3.29)$$


Figure 3.6: Plot of $f(\epsilon) = (1-\epsilon/2)\ln\left(1 + \frac{\epsilon/2}{1-\epsilon}\right) - \epsilon/2$ and $f(\epsilon) = \frac{\epsilon}{8}$ for $\epsilon \in [0, 0.5]$.

We now extend this to the soft margin case with the following corollary. We use the fact that the analysis still holds if some of the points fail to attain the margin.

Corollary 3.4.3. Given a soft margin separator characterised by a margin γ and slack variables ξ in the original space, where $|\xi| = k$, then a Nyström projection of dimension
$$d \geq \frac{8}{\epsilon - k/m}\left[\frac{1}{\gamma^2} + \ln\frac{1}{\delta}\right]$$
will, with probability $1 - \delta$ over the selection of the d points defining the projection, create a margin of at least γ/2 for all but at most an $\epsilon + k/m$ fraction of the training data. In particular, the objective of a support vector optimisation in the Nyström space is bounded by

$$\frac{4}{\gamma^2} + \frac{2(k + \epsilon m)}{\gamma}.$$

If we now minimise in the new space, the objective $\|w\|_2^2 + C\sum_i \xi_i$ can only increase.

In the following Section the proposed methodology will be explored empirically for both classification and regression.

3.4.2 Experiments: Classification

Firstly, a comparison on 13 benchmark datasets derived from the UCI, DELVE and STATLOG benchmark repositories is presented; these are all binary classification problems, or are converted such that they are. The performance of KFDA, SVM, Nyström KFDA (NFDA) and Nyström SVM (NSVM) are compared. Results are also included for Matching Pursuit Kernel Fisher Discriminant Analysis (MPKFDA) as presented earlier in Section 3.2.1 [15], which was trained on the same benchmark datasets using the same splits. Radial Basis Function (RBF) kernels were used for all experiments. The data comes in 100 predefined splits into training and test sets (20 in the case of the image and splice datasets) as described in [34]9. For each of the datasets two rounds of cross-validation (CV) were used to select the optimal parameters (the RBF kernel width parameter, the C parameter in the SVM, and k, the number of iterations in NFDA). For the first round a coarse range of parameter values was evaluated on the first 5 splits of the training set, with the parameter value corresponding to the median of the lowest error of the five splits being chosen for the second round. A fine range of parameters was constructed around this value, from which the optimal value was chosen using 5-fold CV over all splits of the training set. This way of estimating the parameters leads to more robust comparisons between the methods.

The sparsity parameter k for both NFDA and NSVM was set to $2\sqrt{n_{trn}}$, as this is justified by the upper bound. Previously, [100] had selected k as 20% of the dataset, but in cases of large m this could result in a complexity worse than that of the SVM (which is O(n²)). The means and Standard Deviations (SDs) of the generalisation error for each method and dataset are given in Table 3.5 for the SVM and Table 3.6 for FDA. From casual examination of the data, it can be seen that although the SVM performs best in most situations (followed by KFDA), the differences are not large. Additionally, the differences between NFDA and MPKFDA are even smaller. This is somewhat surprising, as MPKFDA is much more expensive to compute (O(kn³)), and at each step is supposedly finding an "optimal" basis (according to the Fisher ratio). Two-sided heteroscedastic t-tests were performed to test the null hypothesis that the results for the SVM versus the NSVM, KFDA versus NFDA, and MPKFDA versus NFDA were drawn from the same normal distributions. All of these tests were insignificant (p = 0.37, p = 0.39 and p = 0.42 respectively), which means that under the assumptions of the test the null hypothesis cannot be rejected. This means that the differences between the results are not significant. Note also that the solutions given by the NSVM are much more sparse (in the dual sense) than the SVM solutions, and that the solutions given by NFDA have a comparable degree of sparsity with those given by MPKFDA. Furthermore, Figures 3.7 and 3.8 compare, for the Breast Cancer and Flare Solar datasets respectively, the error and computational cost as a function of k for the Nyström approach as compared with KFDA.
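For reference, the sketch below shows one way to realise the NSVM/NFDA pipeline used in these experiments: build explicit Nyström features Z (so that ZZ′ reproduces the rank-k approximation of K) from k = 2√n_trn uniformly sampled columns, and then train any linear method on Z. scikit-learn's LinearSVC is used purely as a stand-in for the primal SVM; the helper names and the fixed C value are illustrative assumptions, not the thesis code.

```python
import numpy as np
from sklearn.svm import LinearSVC

def nystrom_features(K, cols, k):
    """Explicit features Z with Z Z' equal to the rank-k Nystrom approximation of K."""
    W = K[np.ix_(cols, cols)]
    vals, vecs = np.linalg.eigh(W)
    top = np.argsort(vals)[::-1][:k]
    scale = np.where(vals[top] > 1e-12, 1.0 / np.sqrt(vals[top]), 0.0)
    return K[:, cols] @ (vecs[:, top] * scale)   # Z = N V_k diag(lambda^{-1/2})

def nsvm_fit(K_train, y_train, seed=0, C=1.0):
    """NSVM sketch: k = 2 sqrt(n_trn) random columns, then a linear (primal) SVM on Z."""
    n = K_train.shape[0]
    k = int(2 * np.sqrt(n))
    cols = np.random.default_rng(seed).choice(n, size=k, replace=False)
    Z = nystrom_features(K_train, cols, k)
    clf = LinearSVC(C=C).fit(Z, y_train)         # C would be chosen by CV in practice
    return clf, cols
# At test time, map the test-train kernel block through the same columns and scaling.
```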

9Available to download from: http://ida.first.fraunhofer.de/projects/bench/benchmarks.htm

Dataset         SVM                       NSVM
                error    SD      SVs      error    SD     k
Banana          0.1061   0.01     76.4    0.1195   0.01   40
Breast Cancer   0.2584   0.05     58.1    0.2684   0.04   28
Diabetes        0.2367   0.02    168.8    0.2350   0.02   43
Flare Solar     0.3334   0.02    338.9    0.3361   0.02   52
German          0.2365   0.02    208.2    0.2415   0.02   54
Heart           0.1564   0.03     68.5    0.1677   0.03   26
Image           0.0061   0.00    216      0.0536   0.02   72
Ringnorm        0.0176   0.00     67.7    0.0190   0.00   40
Splice          0.1102   0.01    336.6    0.1618   0.01   63
Thyroid         0.0415   0.02      7.6    0.0532   0.03   24
Titanic         0.2243   0.01     48.3    0.2346   0.02   24
Twonorm         0.0275   0.00     48.7    0.0296   0.00   40
Waveform        0.0999   0.00    112.5    0.1070   0.00   40
Overall         0.1426   0.01    146.3    0.1559   0.01   42

Table 3.5: Generalization error estimates and Standard Deviations (SDs) for 13 benchmark datasets for the SVM and Nyström SVM (NSVM).

Dataset         KFDA            NFDA                   MPKFDA
                error    SD     error    SD     k      error    SD      k
Banana          0.1056   0.00   0.1072   0.01   40     0.1101   0.01    31
Breast Cancer   0.2892   0.04   0.3104   0.11   28     0.3174   0.04    19
Diabetes        0.2505   0.02   0.2548   0.02   43     0.2543   0.02    18
Flare Solar     0.3423   0.02   0.3471   0.03   52     0.3457   0.02    19
German          0.2643   0.01   0.2784   0.02   54     0.2808   0.02    27
Heart           0.1638   0.03   0.1613   0.03   26     0.1599   0.03    13
Image           0.0273   0.01   0.0571   0.01   72     0.0136   0.03    39
Ringnorm        0.0152   0.00   0.0179   0.00   40     0.0573   0.03    15
Splice          0.1203   0.01   0.1710   0.03   63     0.0314   0.06    37
Thyroid         0.0483   0.02   0.0600   0.03   24     0.0699   0.03    29
Titanic         0.2319   0.01   0.2478   0.02   24     0.2468   0.05     7
Twonorm         0.0261   0.00   0.0260   0.00   40     0.0253   0.00    14
Waveform        0.0983   0.00   0.1042   0.01   40     0.1027   0.00    13
Overall         0.1525   0.01   0.1648   0.02   42     0.1550   0.02   21.61

Table 3.6: Generalization error estimates and Standard Deviations (SDs) for 13 benchmark datasets for the KFDA, NFDA, and MPKFDA

Note that in these two examples, as with all of the other datasets we tested, a very small proportion of basis vectors is required for good generalisation error, and the computational cost for these values of k is an order of magnitude less than that of standard KFDA. In the experiments the Nyström classifiers were roughly an order of magnitude faster than the kernel equivalents during training for the smaller datasets, and roughly two orders of magnitude faster for the larger datasets. This is borne out by the fact that the complexity of both algorithms was $O(n_{trn}^{1.5})$ due to the method for choosing k that was used.

3.4.3 Experiments: Regression

Next, results on 7 benchmark regression datasets derived from the UCI, StatLib, and DELVE benchmark repositories will be presented. The performance of KRR and Nyström KRR (NRR), along with that of KMP, was analysed, again using RBF kernels.


Figure 3.7: Classification error (and log run-time) as a function of k for the ‘Breast Cancer’ dataset as achieved by NFDA, KFDA.

Dataset      #ex    #dim    k     KRR                   NRR                   KMP
                                  MMSE      SD          MMSE      SD          MMSE      SD
bodyfat       252     14     30   0.0000    0.0000      0.0000    0.0000      0.0156    0.0012
housing       506     13     43   11.0323   4.8159      24.2497   9.6876      82.9434   29.2603
mpg           392      7     38   7.3085    2.9387      10.0276   2.6627      47.5519   12.8119
mg           1385      6     71   0.0144    0.0008      0.0177    0.0030      0.0467    0.0159
pyrim          74     27     16   0.0057    0.0124      0.0124    0.0130      0.0514    0.0130
space ga     3107      6    106   0.0107    0.0030      0.0100    0.0039      0.0261    0.0026
triazines     186     60     26   0.0202    0.0094      0.0242    0.0103      0.0308    0.0073

Table 3.7: (Mean) Mean Squared Error (MMSE) and Standard Deviation (SD) for 7 benchmark datasets for Kernel Ridge Regression (KRR), Nyström KRR (NRR), and KMP.

The comparison against KMP was included as it is a state-of-the-art method for greedily selecting basis functions. 10 randomized splits into training and test sets were used. For each of the datasets two rounds of CV were again used to select the optimal RBF kernel width parameter for each of the algorithms and the regularization parameter in KRR and NRR. For both KMP and NRR the sparsity parameter k was set using the same method as for the classification experiments, i.e. $k = 2\sqrt{n_{trn}}$. Note that this method of choosing k is by no means optimal for KMP (or NRR for that matter), but in the absence of a more robust heuristic this avoids costly CV (as with MPKFDA, the complexity of KMP is O(km³)). The means and standard deviations of the MMSE for each method and dataset are given in Table 3.7.


Figure 3.8: Classification error (and log run-time) as a function of k for the 'Flare Solar' dataset as achieved by NFDA, KFDA.

The results show that although NRR does not perform as well as KRR, for the same choice of k it comfortably outperforms KMP. Our observation was that the poor performances of NRR and KMP on housing and mpg were caused by overfitting, indicating that the heuristic method for choosing k should not be relied upon. Overall, the results demonstrate that the Nyström method can be successfully applied to regression as well as classification.

Remark: Greedy versus random sampling. The theoretical and empirical analyses given above serve to demonstrate that greedy methods for sparse selection of basis vectors are extremely powerful and can often outperform standard ℓ1 methods for enforcing sparsity, both in terms of generalisation error and also in terms of the sparsity of solutions. However it is also clear that by simply choosing basis vectors at random it is still possible to learn effectively, whilst of course this method is significantly cheaper. It therefore comes down to a trade-off between exactness of solutions and computational resources. If a slightly sub-optimal solution is sufficient for the application, then the Nystr¨om method provides a simple way of providing sparse solutions in a computationally efficient way. However if the best possible sparse solution is sought, greedy methods such as OMP and PFP provide solutions that closely approximate

(and in many cases achieve) the best possible ℓ0 pseudo-norm solutions (as introduced in Section 3.5.2).

In the next Section the attention is turned to the problem of learning from multiple data sources or views (MSL and MVL respectively). There is certainly opportunity for a synthesis between the methods presented above and those presented below, but this is outside of the present scope. A discussion of possibilities for such a synthesis will be presented in Chapter 6.


Figure 3.9: Regression error (and log run-time) as a function of k for the 'Bodyfat' dataset as achieved by KRR, NRR and KMP.

3.5 Multi-View Learning

In the canonical form, two or more "views" of the same data source are given, which are representations of the same underlying semantic object. Multi-View Learning (MVL) seeks to use information from both views in order to improve learning. Given two sets of signals which are in some way related, it stands to reason that by making use of both signals the predictive power of the learned models can be improved. Although often used interchangeably, it can be useful for both clarity of exposition and theoretical arguments to differentiate between Multi-Source Learning (MSL), MVL and Multiple Kernel Learning (MKL). The key differences are whether there are truly separate sources of information (MSL), whether these are simply views of the same underlying semantic object (MVL), or whether different kernels are created from a single view of a data source (MKL). Whilst this might seem like splitting hairs, it can be an important distinction. Although in principle any algorithm developed for MVL can be used for MKL and vice-versa, the way in which data is amalgamated may be suboptimal. For example, a typical MKL algorithm will involve minimising over a convex set of kernels, but this assumes that the kernels are in the same family and is particularly sensitive to normalisation and similar preprocessing. MVL algorithms such as Kernel Canonical Correlation Analysis (KCCA) are designed to take advantage of correlations between views, but would perform poorly for standard MKL applications.

Dataset "housing": Loss 30 KRR 25 NRR KMP 20

MSE 15

10

5 0 50 100 150 200 250 300 350 k (sparsity) Dataset "housing": CPU Time 2 KRR 0 NRR KMP −2

−4

−6 log(Time in seconds) −8 0 50 100 150 200 250 300 350 k (sparsity)

Figure 3.10: Regression error (and log run-time) as a function of k for the Housing dataset as achieved by KRR, NRR and KRR. from each view, and work by simply placing weights over the kernels [105]. Anecdotally, it seems that in many practical situations in which the number of kernels is small, the performanceof MKL algorithms can actually be worse than simply choosing the best kernel through a heuristic method such as CV10. In the MVL or MSL paradigm, we are assuming that the number of views or sources is typically small (i.e. 2 10), and hence another viewpoint is needed in which the sources are combined more usefully. → The basic idea of MVL is to introduceone function per view which only uses the features from that view, and then jointly optimize these functions such that learning is enhanced. In MVL, we are also usually interested in having weight vectors and loadings for each of the views, which we do not have when we concatenate features (or equivalently sum kernel matrices), or take convex combinations of kernels as in the MKL setting. The distinction between MSL and MVL is more subtle, and hence, most often confused. It is also, however, less important. Generally the distinction between single versus separate sources typically does not affect the modelling process. For the rest of this chapter, it will be assumed that the canonical paradigm is MVL, although the applications may be to both MVL and MSL. A diagrammatic view of this distinction is included in Figure 3.11. Firstly KCCA is reintroduced, followed by an algorithmic developmentthat allows it to be extended to the classification in an efficient way.

10Amongst others, this topic was discussed at the NIPS 2009 Workshop “Understanding Multiple Kernel Learning Methods” 3.5.Multi-ViewLearning 83

a) b) c)

Data source Data source Data source Data source

Preprocessing Preprocessing Preprocessing Preprocessing Preprocessing

Signal Processing Signal Processing Signal Processing Signal Processing Signal Processing Signal Processing

Machine Learning Machine Learning Machine Learning

Figure 3.11: Diagrammatic view of the process of a) Multi-Source Learning (MSL), b) Multi-View Learning (MVL) and c) Multiple Kernel Learning (MKL)

Kernel Canonical Correlation Analysis (KCCA) was introduced in the previous Chapter in Sec- tion 2.1.13. KCCA finds basis vectors for two sets of variables such that the correlations between the projections onto these basis vectors are mutually maximised. The optimisation is given by,

max αa′ KaKbαb (3.30) αa,αb

2 s.t. αa′ Kaαa =1,

2 αb′ Kb αb =1,

where Ka and Kb are the kernel matrices of the two views.

3.5.1 Kernel Canonical Correlation Analysis with Projected Nearest Neighbours

In order to perform classification, typically the test data from one of the views (e.g. Ka) is projected into the shared feature space (using αa), and then a linear classification algorithm such as a primal SVM is then trained on this new feature space. However there is a way in which the projections can be used directly for classification, without incurring this additional computational cost. By using e.g. the 100 largest correlation values and the corresponding projections, the labels given by the corresponding example in the training set kernel from the other view are used as the classification. The reported errors are then the mean of the differences between these labels and the true test labels. This method is an extension of mate-based retrieval [106], and is given in Algorithm 5. It is non-parametericand essentially free once the KCCA directions have been learnt. Because this algorithm is searching for the nearest neighbour in the shared semantic space defined by KCCA of the projection of test point into this space, we have called this algorithm Projected Nearest Neighbours (PNN).

A natural extension to this is to try to incorporate the classification and the subspace learning into a single optimisation routine. This was the motivation for MFDA and its variants [13], which are presented in the following Section, along with some experimental results on toy data and benchmark datasets. 3.5.Multi-ViewLearning 84

Algorithm 5 Projected Nearest Neighbours (PNN) Classification

1: Given Kernels from each view Ka and Kb, dual weight vectors αa and αb from KCCA, training labels y, and vectors of train and test indices i and j respectively 2: Compute the projection of the training kernel of the first view

Pa ← Ka[i, i]αa

3: Compute the projection of the train-test kernel of the second view:

Pb ← Kb[j, i]αb

4: Compute the covariance matrix of the projections:

′ Σab ← PaPb

5: Find the indices of the maximal values of each column:

k[j] = arg max (Σab[i, j]) for j ∈ j i∈i

6: Select the labels of the training examples of those indices as the predictions:

yˆ ← y[k]

3.5.2 Convex Multi-View Fisher Discriminant Analysis

As discussed in the previous Section, CCA and KCCA [52] attempt to integrate two sources of informa- tion by maximising the correlations between projections of each view. They are unsupervised techniques, and as such are not ideally suited to a classification setting. A common way of performing classification on two-view data using KCCA is to use the projected data from one of the views as input to a stan- dard classification algorithm, such as a SVM, or to use the PNN method described above. However, the subspace that is learnt through such unsupervised methods may not always align well with the label space. SVM-2K [107] was an attempt to take this to its logical conclusion by combining this two stage learning into a single optimisation. The algorithm introduces the constraint of similarity between two 1-dimensional projections which identify two distinct SVMs in the two feature spaces. However SVM- 2K requires extra parameters (the C-parameter for each SVM, and another mixing parameter, along with any kernel parameters) that the methods presented here will not require. In addition, it is not easy to see how the SVM-2K formulation can be generalised to more than two views. There has been one related approach that tries to find the optimum combination of Fisher classifiers [108] using the MKL architecture [105]. In its initial form this problem is non-convex, although the authors do recast the problem in terms of a Semi-Definite Programme (SDP), at the expensive of an increase in the problem scale. In addition, the MKL architecture means that the output of the algorithm is a single weight vector for the convex combination of kernels. The formulation presented here has some similarities to that of [108], except cast here in the MVL framework and also providing additional modelling flexibility. Here the convex formulation for FDA that was presented in the previous Chapter in Section 2.1.9 will be extended to multiple views. Given p “views” of the same data source, or alternatively p aligned 3.5.Multi-ViewLearning 85 data sources, to form an m sample S with input output p + 1 tuples (x , x ,..., x ,y). It is − (1) (2) (p) assumed that each view has already been projected into a feature space , so that the kernel matrix K Fd d for that view has entries Kd[i, j] = x(d)i, x(d)j . Given matrices of inputs Xd = [x(d)1,..., x(d)m]′, the formulation (2.40) is extended to find p dual weight vectors αd, d = 1,...,p. The concatenation of these weight vectors will be denoted by α˜ = [α1′ ,..., αp′ ]′. The convex form of Multiview Fisher Discriminant Analysis (MFDA) is given in equation (3.31) below. The goal is now to minimise the variance of the data along the projection whilst maximising the distance between the average outputs for each class over all of the views.

min (ξ)+ (α˜), αd,b,ξ L P p

s.t. (Kdαd + 1bd)= y + ξ, d =1,...,p d=1 c ξ′e =0 for c =1, 2, (3.31) where ( ) is the loss function as before (2.41), L

(ξ)= ξ 2 , L 2 and the regularisation function ( ) is as follows, P

p (α˜)= (α′ K α ). (3.32) P d d d d=1 The first constraint in 3.31 ensures that the average loss between the output and its class label is min- imised. The second constraint ensures that the average output for each class is each label. The classifi- cation function on a set of examples x(d),i from views d =1,...,p now becomes,

p

f(x(d),i) = sgn f(x(d)i) (3.33) d=1 p

= sgn Kd[:,i]′αd + bd . (3.34) d=1 Observe that the solutions given will be equivalent to summing kernels (as justified by the probabilistic interpretation). Meaning that viewed in the primal form, the result is the standard criterion in the space defined by the concatenation of the features, and the norm of the full weight vector is given by 3.32. However this formulation leads to two main advantages. Firstly, it provides a flexible framework that allows for different noise models and regularisation functions. Secondly, explicit weight vectors are available for each view, which allows the calculation of implicit weightings over the views (see Section 3.5.2 below).

Further intuition on the operation of the algorithm is as follows. Given two views x(a) and x(b), and using the standard ℓ2 loss function, MFDA is trying to minimise the summed errors committed: 3.5.Multi-ViewLearning 86

2 f (x )+ f(x ) y . So if some slack is added to one of the examples, e.g. x , then the a (a) (b) − 2 (a)i algorithm will try to push the corresponding example x the other way to try to minimise the overall (b)i slack. This can be seen as “view disagreement” which means that the algorithm tries to use information from both views to aid the classification. However of course the algorithm can “give up” and allow the slack to be big for that example, meaning that x(a) and x(b) can be pushed the same way. It is actually possible to state the problem as the reverse - saying that normally in MVL the goal is 2 to search for view agreement, which would be minimising f(x ) f(x ) (ignoring the labels). (a) − (b) 2 This is one particular form of the so-called “Co-Training” problem, which in order to work requires that each of the views are sufficient for classification, and methods that use this break down when there is significant view disagreement. A recent paper tried to get around this by learning separate classifiers and then looking for view agreement/disagreement between them, before combining them into a final classifier (a form of bootstrapping)[109]. MFDA should have an advantage over this as it is directly optimising the combined classifier. However, the alternative ‘Private’ method Private Multiview Fisher Discriminant Analysis (PMFDA) has separate slacks for each view as well as the overall slacks (see Section 3.5.2 to follow). This should allow the problem to flip around in some cases. Basically, if there is a “trouble” point in view x(a), but not in view x(b), the disagreement can be soaked up by the private slack, allowing the two views to move into agreement with zero shared slack.

Probabilistic Interpretation

Following the analysis of [35], it is possible to view the KFDA algorithm from a probabilistic point of view. It is known that FDA is Bayes optimal for two Gaussian distributions with equal covariance in the input space. The data may not fall naturally into this model, but it may be the case that for certain feature spaces (e.g. the space defined by the RBF kernel), the examples projected into a manifold in this space may be well approximated by Gaussian distributions with diagonal covariance. In this case KFDA would be Bayes optimal in the feature space. If one considers KFDA as regression on to the labels, then a Gaussian noise model (as defined in Section 2.1.4) with known variance σ would result in the following expression for the likelihood Pr(y α) = exp( ξ 2). If a prior over the weights with hyperparameters is used, the log of the | − 2 posterior is simply log(Pr(y α)Pr(α )) = ξ 2 log(Pr(α )). The choice of prior then becomes | | − 2 − | equivalent to the choice of regularisation function, which will be discussed in Section 3.5.2. When viewed in this way the outputs produced by KFDA can be interpreted as probabilities, which in turn makes it possible to assign confidence to the final classifications. This view of KFDA also motivates the Multiview extension of the algorithm. We can extend and combine the graphical interpretations of [110] and [111] using the above definitions as seen in Figure 3.12. Note that explicit mixing weights β paramaterised by ρ are shown (dotted). Note that due to the optimisation (which constrains the functions over each feature space with the shared slack variable) and the fact that we have separate α vectors for each view, we are able to drop the mixing weights β from our formulation. Under the assumption that the kernels are normalised, we can calculate these weigths 3.5.Multi-ViewLearning 87 post-hoc as will be shown in Section 3.5.2. Taking the approach of Na¨ıve Bayes Probabilistic Label

µ ρ

α β d d

y c m

Figure 3.12: Plates diagram showing the hierarchical Bayesian interpretation of MFDA. β are the hypothetical mixing parameters with prior weights ρ if an explicit mixing was used - in the case of MFDA these are fixed and hence can be removed, but can be calculated post-hoc.

Fusion (NBF) [112], the first step is to assume conditional independence between classifiers given a class label. Suppose the set of labels s = s ,...,s are given from p classifiers for a given point x . { 1 p} i Denoting Pr(s ) as the probability that classifier D labels an example x in class ω Ω, (in this case d d i c ∈ Ω= 1, +1 ), then the likelihood of the classifiers given a label is, {− }

Pr(s ω ) = Pr(s ,...,s ω ) (3.35) | c 1 p| c p = Pr(s ω ). d| c d=1

The posterior probability needed to label xi is then given by,

Pr(ω )Pr(s ω ) Pr(ω s)= c | c (3.36) c| Pr(s) p 1 = Pr(ω ) Pr(s ω ), Z c d| c d=1 where Z is a normalisation constant. Assume a uniform prior over labels, the log posterior is then given by,

p log(Pr(ω s)) log(Pr(s ω )). (3.37) c| ∝ d| c d=1 This implies that by directly optimising this sum, we are optimising the NBF over KFDA classifiers, which is precisely the motivation for both the objective function and the classification function for MFDA, both of which will be described in the next Section. At first glance it seems that this conditional independence assumption could be problematic, as this assumption is seldom true. However, Kuncheva made the point that despite this NBF is experimentally observed to be surprisingly accurate and efficient [112]. However, it does open the door to further possibilities for combining KFDA classifiers, but this is 3.5.Multi-ViewLearning 88 outside the scope of the present work.

Implicit Weighting

In order to determine the importance of each of the views after training, following [113] it is possible to calculate the implicit weighting of each view simply through the weighted sum of the absolute values of the classification functions. This is justified by the intuition made in Section 3.5.2 that the outputs of each classifier can be interpreted as probabilities, with the assumption that each kernel is normalised as per [3], i.e. trace(Kd) = m. This in turn means that the overall confidence of the classifier can be calculated as the sum of the log probabilities that the function f(x(d)i) for classifier d on example i give the class label ωc.

1 u log(p(s ω )) d ≈ Z d| c c Ω ∈ m K [:,i]′α + b = i=1 | d d d| . (3.38) m p K [:,i] α + b i=1 d=1 | d ′ d d| Regularisation and Loss Functions

The natural choices for the regularisation function (α˜) would either be the sum of the ℓ -norms P 2 of the primal weight vectors (as in (3.32)), or the sum of the ℓ2-norms of the dual weight vec- tor (α˜) = p α 2. However more interesting is the ℓ -norm of the dual weight vector, P d=1 d2 1 p (α˜) = αd , as this choice leads to sparse solutions (as previously discussed) due to the fact P d=1 1 that the ℓ1-norm can be seen as an approximation to the (pseudo) ℓ0-norm. In the rest of the chapter the

ℓ1-norm regularisation method is denoted as Sparse Multiview Fisher Discriminant Analysis (SMFDA). In some situations these regularisation functions ( ) may be too simplistic, in which case addi- P tional domain knowledge can be incorporated into the function. For example, there is reason to believe a-priori that most of the views are likely not to be useful, but the individual weights in that view are, then (α˜) = A could be used where A = [α ,..., α ] is α˜ reshaped as a matrix of weights P 2,1 1 p and the block (r, p)-norm of A is defined as A = ( m α r )1/p. Another example would be a r,p i=1 ip situation it may be desirable to impose sparsity on some views but not others. For two views, this would simply be (α˜)= α 2 + α in order to promote sparsity in the second view but not the first. One P 12 21 could also promote sparsity in the primal version of one view by passing in the explicit features for that view (if available) and penalising Xd′ αd. In this way any mixture of linear with nonlinear features and primal with dual sparsity can be combined across the views, all in a single optimisation framework. One can also pre-specify the weights of views by parameterising them, if one has a strong prior belief that a view will be more or less useful, but it in general it is not necessary or helpful to do this. Following [114] the assumption of a Gaussian noise model can also be removed, resulting in dif- ferent loss functions on the slacks ξ. For example, if a Laplacian noise model is chosen ξ 2 can be 2 replaced with ξ in the objective function. The advantage of this is if the ℓ -norm regulariser from 1 1 above is chosen, the resulting optimisation is a linear programme, which can be solved efficiently using 3.5.Multi-ViewLearning 89 methods such as column generation. From a modelling perspective, it may be advantageous to choose a noise model that is robust to outliers, such as Huber’s Robust loss, which can easily be used in the framework presented here11.

Incorporating Private Directions

The above formulations seek to find the projection that is maximally discriminative averaged across views. However these problems are very tightly constrained, and optimisation may be difficult in sit- uations where one or more of the views is not informative of the labels (i.e. is essentially noise). This leads to considering the allowance of some extra slack ζd that is private to each view, which is similar in vein to the approach taken by [83] to Multi-Task learning (MTL) and [115] to probabilistic latent space modelling. This leads to the following formulation which we term PMFDA,

min H(ξ, ζ˜, τ)+ (αd), d =1,...,p αd,b,ξ,ζd P

s.t. Kdαd + 1b = y + ξ + ζd d =1,...,p

1i′ ξ =0 i =1, 2, (3.39)

with ζ˜ = [ζ′ ,..., ζ′ ]′. The regularisation function ( ) is as before (3.32), and the loss function is 1 p P updated to incorporate ζd as follows,

p H(ξ, ζ˜, τ)= ξ 2 + τ ζ 2. (3.40) 2 d2 d=1 Note the extra parameter τ which enables the tuning of the relative importance of private or shared slacks.

If τ = 1 the penalties of the private slack for an example i are proportional to ξi/p, which means that the more views that are added, the less each view is allowed to dominate. In the experiments conducted here this was simply set heuristically to 0.1 to allow a reasonable amount of leeway for each view.

Generalisation Error Bound for MFDA

We now construct a generalisation error bound for MFDA by applying the following results from [85] and [86] and extending to the Multiview case. The first bounds the difference between the empirical and true means (Theorem 3 in [85]).

Theorem 3.5.1 (Bound on the true and empirical means). Let Sd be a view of a sample of m points drawn independently according to a probability distribution Pd. Consider the mean vector µd and the empirical estimate µˆ d defined as

E µd = Pd [φ(xd)] , p 1 µˆ = Eˆx [φ(x )] = φ(x ). (3.41) d d d p d d=1 11See Section 2.1.9 in the previous Chapter for an outline of some loss functions for classification 3.5.Multi-ViewLearning 90

Then with probability at least 1 δ over the choice of S , we have − d

E R 1 µˆd xd [φ(xd)] 2+ 2 ln . (3.42) − ≤ √m δ

Consider the covariance matrix Σd and the empirical estimate Σˆd defined as

Σ = E [(φ(x ) µ )(φ(x ) µ )′] , d d − d d − d Σˆ = Eˆ [(φ(x ) µˆ )(φ(x ) µˆ )′] . (3.43) d d − d d − d

The following corollary bounds the difference between the empirical and true covariance (Corollary 6 in [85]).

Corollary 3.5.2 (Bound on the true and empirical covariances). Let Sd be an m sample from Pd as above, where R is the radius of the ball in the feature space containing the support of the distribu- d Fd tion. Provided m (2 + 2 ln 2/δ)2, we have ≥ 2 2Rd 2 Σˆ d Σd 2+ 2 ln , (3.44) − F ≤ √m δ The following Lemma is connected with the classification algorithm “Robust Minimax Classifica- tion” developed by [86], adapted here for MFDA.

Lemma 3.5.3. Let µ be the mean of a distribution and Σ its covariance matrix, w = 0, b given, d d d such that w′ µ + b 0 and ∆ [0, 1), then if d d ≤ ∈

(w′ µ + b) κ(∆) w Σ w , − d d ≥ d′ d d ∆ where κ(∆) = 1 ∆ , then −

Pr (w′ φ(x )+ b 0) ∆ d d ≤ ≥

In order to provide a true error bound we must bound the difference between this estimate and the value that would have been obtained had the true mean and covariance been used.

Theorem 3.5.4 (Main). Let Sd be a view of a sample of m points drawn from Pd as above, where Rd is the radius of the ball in the feature space containing the support of the distribution. Let µˆ (µ ) Fd d d be the empirical (true) mean of a sample of m points from the view Sd, Σˆ d (Σd) its empirical (true) covariance matrix, w = 0, w =1, and b given, such that w′ µ + b 0 and ∆ [0, 1). Then with d 2 d d ≤ ∈ probability 1 δ over the draw of the random sample, if −

(w′ µˆ + b) κ(∆) w Σˆ w d =1, . . . , p, − d d ≥ d′ d d 3.5.Multi-ViewLearning 91 then

Pr ((w′ φ (x )+ b) > 0) < 1 ∆, d d d − where

2 (w′ µˆ + b A ) ∆= d d − d , w Σˆ w + B + (w µˆ + b A )2 d′ d d d d′ d − d such that µˆ µ A where A = Rd 2+ 2 ln 2m , d − d≤ d d √m δ 2 ˆ 2Rd 4m and Σd Σd Bd where Bd = 2+ 2 ln δ . − F ≤ √m Proof. (sketch). First we re-arrange w′ µ + b κ(∆) w Σ w from Lemma 3.5.3 for each view in d d ≥ d′ d d terms of κ(∆):

w µ + b κ(∆) = d′ d . (3.45) wd′ Σwd These quantities are in terms of the true means and covariances. In order to achieve an upper bound we need the following sample compressed results for the true and empirical means (Theorem 3.5.1) and covariances (Corollary 3.5.2):

E Rd 2m µˆd xd [ˆd(xd)] Ad = 2+ 2 ln , − ≤ √m δ 2 2Rd 4m Σˆ d Σd Bd = 2+ 2 ln . − F ≤ √m δ Given equation (3.45) we can use the empirical quantities for the means and covariances in place of the true quantities. However, in order to derive a genuine upper bound we also need to take into account the upper bounds between the empirical and true means. Including these in the expression above for κ(∆) by replacing δ with δ/2, to derive a lower bound, we get:

w′ µˆ + b Ad κ(∆) = d dSd − . ˆ wd′ Σdwd + Bd ∆ Finally, making the substitution κ(∆) = 1 ∆ and solving for ∆ yields the result. − The following Proposition upper bounds the generalisation error of Multiview Fisher Discriminant Analysis (MFDA).

Proposition 3.5.5. Let wd, b, be the (normalised) weight vector and associated threshold returned by the Multiview Fisher Discriminant Analysis (MFDA) when presented with a view of the training set Sd. ˆ + ˆ Furthermore, let Σd (Σd−) be the empirical covariance matrices associated with the positive (negative) examples of the m training samples from S projected using w . Then with probability at least 1 δ d d − over the draw of all the views of the random training set Sd, d = 1,...,p of m training examples, the 3.5.Multi-ViewLearning 92 generalisation error is bounded by R

+ max(1 ∆ , 1 ∆−) R≤ − − where ∆j , j =+, such that −

2 p j j j (w′ µˆ + b) C d=1 d Sd − ∆j = , j 2 p j p j j w′ Σˆ wd + D + j( w′ µˆ + b) C d=1 d d d=1 d Sd − p R 2 p R2 where Cj = Pd=1 d 2+ 2 ln 4mp , Dj = Pd=1 d 2+ 2 ln 8mp . √mj δ √mj δ

Proof. For the negative part of the proof we require w′ µˆ − + b κ(∆) w Σˆ −w which is a straight d d ≥ d′ d d forward application of Theorem 3.5.4 with δ replaced with δ/2. For the positive part, observe that we + + require w′ µˆ b κ(∆) w Σˆ w , hence, a further application of Theorem 3.5.4 with δ replaced d d − ≥ d′ d d by δ/2 suffices. Finally, we take a union bound over the p views such that m is replaced by mp.

Experiments: Toy Data

In order to validate the outlined methods, experiments were first conducted with simulated toy data. A + data source S was created by taking two 1 dimensional Gaussian distributions (S ,S−) which were − well separated, which was then split into 100 train and 50 test points. The source S was embedded into 2 dimensional views through complementary linear projections (φ , φ ) to give new “views” X , X . − 1 2 1 2 Differing levels of independent “measurement noise” were added to each view (n1,n2), and identical

“system noise” was added to both views (nS). A third view was constructed of pure noise to simulate a faulty sensor (X3). The labels y were calculated as the sign of the original data source.

+ S = S ,S− (source) { } + S (5, 1),S− ( 5, 1) ∼ N ∼ N − y = sgn(S) (labels) φ = [1, 1], φ = [ 1, 1] (projections) 1 − 2 − n (0, 5)2,n (0, 3)2 (meas. noise) 1 ∼ N 2 ∼ N n (0, 2)2 (system noise) S ∼ N X1 = φ1′ S + n1 + nS (view 1)

X2 = φ2′ S + n2 + nS (view 2)

X3 = nS (view 3)

X1 and X2 are noisy views of the same signal, with correlated noise, which can be a typical problem in multivariate signal processing (e.g. sensors in close proximity). Linear kernels were used for each view.

3 A small value for the regularisation parameter = 10− was chosen heuristically for all the experiments. Table 3.8 gives an overview of the results on the toy dataset. Comparisons were made against: 3.5.Multi-ViewLearning 93

KFDA on each of the views (denoted as f(1), f(2) and f(3) respectively); summing the classification functions of these (fsum); summing the kernels of each view (ksum); followed by MFDA, PMFDA and SMFDA.

Note that an unweighted sum of kernels is equivalent to concatenating the features before creating a single kernel. The table shows the test error over 10 random repeats of the experiment in first column, followed by the implicit weightings for each of the algorithms calculated via (3.38). Note that the ksum method returns single m dimensional weight vector, and unless a kernel with an explicit feature − space is used it is not possible to recalculate the implicit weightings over the features. In this case, since linear kernels are used the weightings have been calculated. For the three methods outlined in this paper (MFDA, PMFDA, SMFDA), as expected the performance is roughly equivalent to the ksum method. The last row in the table (actual) is the empirical Signal to Noise Ratio (SNR) calculated as SNR = (X′ X )/var(S X ) for view d, which as can be seen is closely matched by the d d d − d weightings given. The sparsity of SMFDA can be seen in figure 3.13. The sparsity level quoted in the figure is the 5 proportion of the weights below 10− .

Method Test error W (1) W (2) W (3) f(a) 0.19 1.00 0.00 0.00 f(b) 0.10 0.00 1.00 0.00 f(c) 0.49 0.00 0.00 1.00 fsum 0.39 0.33 0.33 0.33 ksum 0.04 0.29 0.66 0.05 MFDA 0.04 0.29 0.66 0.05 PMFDA 0.04 0.29 0.66 0.05 SMFDA 0.04 0.29 0.66 0.05 Actual 0.35 0.65 0.00

Table 3.8: Test errors over ten runs on the toy dataset. Methods described in the text. W (·) refers to the implicit weightings given by each algorithm for each of the views. Note that the weightings closely match the actual SNR.

Experiments: VOC 2007 DATASET

The sets of features (“views”) used can be found in [116], with an extra feature extraction method known as Scale Invariant Feature Transformation (SIFT) [117]. RBF kernels were constructed for each of these feature sets, the RBF width parameter was set using a heuristic method 12. The Pattern Analysis, Statis- tical Modelling and Computational Learning (PASCAL) Visual Object Classes (VOC) 2007 challenge database was used which contains 9963 images, each with at least 1 object. The number of objects in each image ranges from 1 to 20, with, for instance, objects of people, sheep, horses, cats, dogs etc. For a complete list of the objects, and description of the data set see the VOC 2007 challenge website13. Figure 3.14 shows Recall-Precision curves for SMFDA with 1, 2, 3 or 11 kernels and PicSOM

12For each setting of the width parameter, histograms of the kernel values were created. The chosen kernel was the one whose histogram peak was closest to 0.5 (i.e. furthest from 0 and 1). 13http://www.pascal-network.org/challenges/VOC/voc2007/workshop/index.html 3.5.Multi-ViewLearning 94

−4 x 10 Weights for MFDA (sparsity = 0.24) 4 View 1 2 View 2 View 3 0 weight −2

−4 0 20 40 60 80 100 example −3 x 10 Weights for SMFDA (sparsity = 0.86) 2 View 1 0 View 2 View 3 −2 weight −4

−6 0 20 40 60 80 100 example

Figure 3.13: Weights given by MFDA and SMFDA on the toy dataset. Notice that many of the weights for SMFDA are close to zero, indicating sparse solutions. Also notice that most of the weights for view 3 (pure noise) are close to zero.

[116], and Table 3.9 shows the balanced error rate (the average of the errors on each class) and overall average precision for the PicSOM, KFDA using cross-validation to choose the best single kernel, KFDA using an unweighted sum of kernels, and MFDA. For the purposes of training, a random subset of 200 irrelevant images was used rather than the full training set. Results for three of the object classes (cat, cow, dog) are presented. The results show that, in general, adding more kernels into the optimisation can assist in recall performance. For each object class, the subsets of kernels (i.e. 1,2, or 3) were chosen by the weights given by SMFDA on the 11 kernels. The best single kernel (based on SIFT features) performs well alone, yet the improvement in some cases is quite marked. Results are competitive with the PicSOM algorithm, which uses all 11 feature extraction methods, and all of the irrelevant images.

Dataset Cat Cow Horse Method → BER AP BER AP BER AP PicSOM↓ n/a 0.18 n/a 0.12 n/a 0.48 KFDACV 0.26 0.36 0.32 0.14 0.22 0.51 MFDA 0.26 0.36 0.27 0.15 0.19 0.58

Table 3.9: Balanced Error Rate (BER) and Average Precision (AP) for four of the VOC challenge datasets, for four different methods: PicSOM, KFDA with cross validation (KFDA CV), KFDA using a sum of kernels (ksum) and MFDA

Experiments: Neuroimaging Dataset

This section describes analysis of fMRI data14 that was acquired from 16 right handed healthy US college male students aged 20-25 which, according to a self report, did not have any history of neurological or

14Data kindly donated by Mour˜ao-Miranda et. al. [118]. 3.5.Multi-ViewLearning 95

Average Precision plot for "horse" dataset Average Precision plot for "cow" dataset 1 0.8

0.8 0.6

0.6 0.4 0.4

0.2 Average Precision 0.2 Average Precision

0 0 0 100 200 300 0 50 100 150 number of relevant images number of relevant images

Average Precision plot for "cat" dataset 1 SMFDA 1−K SMFDA 2−K 0.8 SMFDA 3−K SMFDA 11−K 0.6 PicSOM

0.4

Average Precision 0.2

0 0 100 200 300 400 number of relevant images

Figure 3.14: Average precision recall curves for 3 VOC 2007 datasets for SMFDA plotted against PicSOM results psychiatric illness. The subjects viewed image stimuli of three different active conditions: viewing unpleasant (dermatologic diseases), neutral (people), pleasant images (female models in swimsuits and lingerie), and a control condition (fixation). In these experiments only unpleasant and pleasant image categories are used. The image-stimuli were presented in a block fashion and consisted of 42 images per category. During the experiment, there were 6 blocks of each active condition (each consisting of 7 image volumes) alternating with control blocks (fixation) of 7 images volumes. In a similar fashion to the study in [53], pleasant images are given positive labels and unpleas- ant negative labels, the image stimuli are represented using SIFT features [117]. Conventional pre- processing was applied to the fMRI data. A detailed description of the fMRI pre-processing procedure and image-stimuli representation is given in [53]. The experiments were run in a leave-subject-out fash- ion where 15 subjects are combined for training and a single subject is withheld for testing. This gave a sum total of 42 2 15 = 1260 training and 42 2 = 84 testing fMRI volumes and paired image × × × stimuli. The analysis was repeated for each participant (hence 16 times) using linear kernels. In the following experiment, the following comparisons were made:

An SVM on the fMRI data (single view) • KCCA on the fMRI + Image Stimuli (two views) followed with an SVM trained on the fMRI data • projected into the learnt KCCA semantic space SMFDA on the fMRI + Image Stimuli (two views) • The results are given in Table 3.10 where it can be observed that on average MFDA performs better than both the SVM (which is a single view approach), and the KCCA/SVM which similarly to MFDA 3.6.ConclusionsandFurtherWork 96

Sub. SVM KCCA/SVM MFDA 1 0.1310 0.1667 0.1071 2 0.1905 0.2739 0.1429 3 0.2024 0.1786 0.1905 4 0.1667 0.2125 0.1548 5 0.1905 0.2977 0.2024 6 0.1667 0.1548 0.1429 7 0.1786 0.2262 0.1905 8 0.2381 0.2858 0.2143 9 0.3096 0.3334 0.2619 10 0.2977 0.3096 0.2262 11 0.1191 0.1786 0.1429 12 0.1786 0.2262 0.1667 13 0.2500 0.2381 0.0714 14 0.4405 0.4405 0.2619 15 0.2500 0.2977 0.2738 16 0.1429 0.1905 0.1860 Mean: 0.2158 0.08 0.2508 0.08 0.1860 0.06 ± ± ± Table 3.10: In the table above the leave-one-out errors for each subject are presented. The following methods are compared: SVM on the fMRI data alone; KCCA analysis on the two views fMRI and Image Stimuli followed by an SVM on the projected fMRI data; the proposed MFDA on the two views fMRI+Image. Numbers in bold indicate the best performing algorithm for a particular subject. incorporates two views into the learning process. In this case the label space is clearly not well aligned with the KCCA projections, whereas a supervised method such as MFDA is able to find this alignment

3.6 Conclusions and Further Work

This goal of this Chapter was to present a unified general framework for the application of sparse ML methods to multivariate signal processing. The methods presented can be seen as modular building blocks that can be applied to a variety of applications. To begin with, the focus was on greedy meth- ods for sparse classification and regression, specifically Matching Pursuit Kernel Fisher Discriminant Analysis (MPKFDA) and Kernel Polytope Faces Pursuit (KPFP). This was followed by a presentation of methods that take advantage of the Nystr¨om method for low-rank kernel approximation for large scale data, including Nystr¨om KRR (NRR), Nystr¨om KFDA (NFDA), and Nystr¨om SVM (NSVM). For the rest of the Chapter the attention was turned to the problem of learning from multiple data sources or views (MSL and MVL respectively), with the development of Multiview Fisher Discriminant Analysis (MFDA), Sparse Multiview Fisher Discriminant Analysis (SMFDA) and Private Multiview Fisher Discriminant Analysis (PMFDA). Detailed conclusions for each of the methods presented can be found in 6. Chapter 4

Applications I

Abstract

Styles of Music. The first application area for the “LeStruM” project1 was the classification of musical genre from polyphonic audio files. This is a task that tests the application of Machine Learning (ML) methods to Digital Signal Processing (DSP), albeit in the univariate domain. It is also potentially an area in which sparsity can be exploited, as we are given prior knowledge that the signal was created by a finite set of instruments, be they physical or electronic, and that the degrees of freedom at any one time are far less than the sampling rate of the audio files. Radar The next application area was a study of how the Analogue to Digital Conversion (ADC) sampling rate in a digital radar can be reduced—without reduction in waveform bandwidth—throughthe use of Compressed Sensing (CS). Real radardata is used to show that through use of chirp or Gabor dictionaries and Basis Pursuit (BP) the Analogue to Digital Conversion (ADC) sampling frequency can be reduced by a factor of 128, to under 1 mega sample per second, while the waveform bandwidth remains 40 MHz. The error on the reconstructed fast-time samples is small enough that accurate range-profiles and range-frequency surfaces can be produced.

4.1 Introduction

Before moving on to multivariate signal processing (see Chapter 5), a natural stepping stone is to test some of the ML methods described to this point on univariate signals. By this it is meant that the signal of interest is characterised by a single variable that is varying through time. This variable may come from a sensor or be a direct digital instantiation of a signal. It is important to distinguish the terms univariate and multivariate with respect to signals with the same terms as they are used in general mathematical (and indeed ML) nomenclature. The processing and analysis of the signals will certainly be multidimensional, and hence multivariate, even though the originating signal was univariate. Throughout most of the Chapter, the signals will be treated as if they can be broken down into small enough segments

1EPSRC ICT project reference: EP/D063612/1 4.2. Genre Classification 98 such that the temporal shift from one segment to the next is small in comparison with the variation within the signal. This allows the signals to be modelled using descriptions based on short-term features. The first part of the Chapter examines the classification of musical genre from raw audio files. Although most musical files are produced in stereo format (hence bivariate), for the purposes of this study the files were downsampled to a mono format (univariate). This is justifiable in this setting as it is clear that humans do not require stereo information to differentiate betweeen genres. It will be shown that sparse ML methods are advantageous in this setting. The rest of the Chapter examines the application of CS to conventional radar. Again the signals are univariate, but in this case with a much higher frequency. Here the focus is on DSP, although the methods used are directly applicable in ML settings as well, and there is scope for further analysis of this data in an ML setting.

4.2 Genre Classification

To begin, an analysis was performed of the state of the art in feature extraction from polyphonic music through the use of DSP techniques. To this end, classification of musical genre from raw audio files (MPEG-1 Audio Layer 3 (MP3) format), as a fairly well researched area of music research, provided a good starting point. The Music Information Retrieval Evaluation eXchange (MIREX) is a yearly competition in a wide range of machine learning applications in music, and in 2005 included a genre classification task, the winner of which [75] was an application of the multiclass boosting algorithm AdaBoost .MH [42]. The method was duplicated, and then modifiedthroughthe use of LPBoost [5]. The hypothesis is that LPBoost is a more appropriate algorithm for this application due to the higher degree of sparsity in the solutions. The aim was to improve on the [75] result by using a similar feature set and the multiclass boosting algorithm LPBoost .MH. This work was presented at the Neural Information Processing Systems (NIPS) 2007 Workshop “Music, Brain and Cognition” [11]. A music genre is a categorisation of pieces of music that share a certain style. Music is also cat- egorised by non-musical criteria such as geographical origin, though a single geographical region will normally include a wide variety of sub-genres. Any given music genre or sub-genre could be defined by the musical instruments used, techniques, styles, context or structural themes. The groupings of musical genres and sub-genres leads naturally to the idea of a genre hierarchy. However, the distinctions both between individual sub-genres and also between sub-genres and their parent genres are not always clear-cut. While attempts have been made to automatically construct genre hierarchies (e.g. [119, 120, 121]), the performance of such systems do not appear to warrant the ad- ditional complexity they entail. In addition, the MIREX set-up uses only flat classifications, and for simplicity and comparability of results the focus of the current research is also flat classification. One of the problems with the grouping of musical pieces into genres is that the process is subjective and is directly influenced by the individual’s musical background. This is especially true in sub-genres. Another difficulty is that a single artist or band will often span multiple genres or sub-genres (sometimes intentionally), often within the space of a single album (and in some cases a single song!). It becomes 4.2. Genre Classification 99 virtually impossible to classify the artist or the album into a single genre. Further confusing the matter is that some genre labels are quite vague and non-descriptive. For example, the genres world and easy listening are often used a catch-all for music that does not fit naturally into more common genres such as rock or classical (which are themselves extremely broad and rather vague!). There are additional problems that have been noted, such as the “producer effect” or “album effect” [122], where all of the songs from a single album share overall spectral characteristics much more than from other albums from the same artist. This can even extend to greater similarities between artists sharing the same producer than between the artist’s albums. Despite these issues, the automatic classification of new material into existing genres is of interest for commercial and marketing reasons, as well as generally for ML researchers. The performance of humans in classifying musical genre has been investigated in [123]. 
In this study participants were trained using representative samples from each of ten genres, and then tested us- ing a ten-way forced-choice paradigm. Participants achieved an accuracy of 53% correct after listening to only 250ms samples and 70% correct after listening to 3s samples. Another study by [124] reports similar results. Although direct comparison of these results with the automatic musical genre classifica- tion results of various studies is not possible due to different genre labels and datasets, it is notable that human performance and the automatic retrieval system performance are broadly similar. Moreover, these results indicate the fuzzy nature of musical genre boundaries. It also indicates the difficulty of gathering ground truth annotations, and explains why some datasets appear to be afflicted with particularly poor annotations. However, probably the main practical problem for research in the field of automatic music classi- fication is the lack of a freely available high quality dataset. Due to legal obstacles it is not possible to publish datasets of popular music in the way that is possible in other fields, such as text recognition. As a result the datasets that are publicly available consist of “white label” recordings which are ostensibly of poorer quality than mainstream recordings (subjectively in terms of musical quality, but objectively in terms of production quality). The present study uses one publicly available dataset (Magnatune) and one provided by a fellow researcher (Anders Meng, see [124]). The former has been used for the MIREX competition on more than one occasion, and the latter has been used in studies [125, 124], which will be used for comparison.

4.2.1 MIREX

The Music Information Retrieval Evaluation eXchange (MIREX) is part of the annual International Conference on Music Information Retrieval (ISMIR). It takes the form of a series of competitions that have been running since 2004. The 2005 competition included an Audio Genre Classification task, in which the task was classification of polyphonic musical audio into a single high-level genre per example. The audioformat forthe task was MP3, CD-quality (PCM, 16-bit, 44100Hz), mono. Full files were used, with segmentation being done at authors’ discretion. Although the categories were organised hierarchically, submitted software was only required to 4.2.GenreClassification 100 produce classifications of leaf categories. This means that entrants did not implement hierarchical clas- sification and could treat the problem as a flat classification, effectively ignoring the hierarchy. The hierarchical structure was suggested because this reflects the natural way in which humans appear to or- ganise genre classifications, and it allows hierarchical classification techniques if desired. The approach taken at MIREX had the advantage of allowing entrants to treat the problem as either a flat or hierarchical classification problem. In addition all of the recordings used belong to one and only one category. Two sets of data were used, ‘Magnatune’2 and ‘USPOP’3. The Magnatune dataset has a hierarchical genre taxonomy, while the USPOP categories are at a single level. The audio sampling rates used were either 44.1 KHz or 22.05 KHz (mono). More data information is in the following table: The results for MIREX 2005 are summarised in table 4.1 below (see the contest wiki 4 for full results). It should be noted that the statistical validity of the results of the MIREX competitions have recently been called into question [126], due to the testing methods employed. The result is that the reported test accuracies are artificially high, so care must be taken when making direct comparisons.

Participant Algorithm Features Score Bergstraetal. AdaBoost Aggregatedfeatures 82.23% Mandel&Ellis SVM KL-Divergence 78.81% West Trees,LDA Spectral&Rhythmic 75.29% Lidy&Rauber SVM Spectral&Rhythmic 75.27% Pampalketal. 1-NN MFCC 75.14% Scaringella SVM Texture & Rhythmic 73.11% Ahrendt&Meng SVM Auto-Regression 71.55% Burred GMM/ML Aggregatedfeatures 62.63% Soares GMM Aggregatedfeatures 60.98% Tzanetakis LSVM FFT/MFCC 60.72%

Table 4.1: Summary of results of the Audio Genre Classification task from MIREX 2005 (Mean of Magnatune Hierarchical Classification Accuracy and USPOP Raw Classification Accuracy)

4.2.2 Feature Selection

The various methods for classifying musical genre generally differ in the way that acoustic features are selected, how sub-song level features are aggregated into song-level features, and the machine learning techniques used to classify based on the features. This Section describes briefly some different ap- proaches to feature selection, followed by a more detailed examination of the approach taken by [75]. The techniques that are employed for extracting acoustic features from musical pieces are inspired by speech perception, signal processing theory, and music theory. In most cases the audio waveform is broken into short frames (in the case of [75] these were 46.44ms in length, or 1024 samples of audio at 22050Hz), and then frame level features are constructed. These frames are then assumed to be inde- pendent draws from a Gaussian distribution over features. Whilst this assumption is clearly false, it is a simplifying assumption that allows a range of ML methods to be applied, such as the Support Vector Machine (SVM) or AdaBoost .

2http://www.magnatune.com 3http://www.ee.columbia.edu/ dpwe/research/musicsim/uspop2002.html 4http://www.music-ir.org/evaluation/mirex-results/audio-genre/index.html 4.2.GenreClassification 101

4.2.3 Frame level features

The frame level features that are used to describe the audio signal are described below.

Discrete Fourier Transform (DFT)

The DFT is an application of the Fourier Transform (see 2.70 in Section 2.2.1) on digitised data. Fourier analysis is used to analyse the spectral composition of the frames. Given a signal of length T , the DFT and the inverse operation (Inverse Fourier Transform (IFT)) are defined as,

T 1 − i2πdt fˆ(d)= f(t)exp − , d =1,...,T (DFT), (4.1) T t=0 T 1 1 − i2πdt f(t)= fˆ(d)exp , t =1,...,T (IFT). (4.2) T T d=0 A 512-point transform of each frame was performed, of which the lowest 32 coefficients were retained during experiments. In practice a Fast Fourier Transform (FFT) is used, which is a reorganisation of the calculation that involves (T log T ) calculations instead of (T 2) [127]. O 2 O

Real Cepstral Coefficients (RCC)

The motivation behind ‘cepstral’ analysis is the source/filter model used in speech processing. It is used to separate the source (the voicing) from the filter (the vocal tract). In musical instruments the source would be the excitation impulse caused by for example plucking a string, and the filter would be the reverberations from the body of the instrument. In general, a spectrum can be seen as having two components- a slowly varying part (the filter or spectral envelope)- and a rapidlyvaryingpart (the source or harmonic structure). These can be separated by taking a further Fourier Transform of the spectrum. This is known as the ‘cepstrum’ (which is an anagram of spectrum), and is said to be in the ‘quefrency’ domain (an anagram of frequency). Formally, the real cepstrum of a signal is defined as:

zRCC = real f log fˆ(t) (4.3) | | where fˆ( ) is the Fourier transform and f( ) is the inverse Fourier transform.

Mel Frequency Cepstral Coefficients (MFCC)

The MFCC is a measure of the perceived harmonic structure of the sound. It is similar to the RCC, except that the input x is first projected according to the Mel-scale [128]. The name Mel comes from the word melody to indicate that the scale is based on pitch comparisons. A Mel is a psychoacoustic unit of frequency which relates to human perception, the Mel scale can be approximated from the frequency q in Hz by,

1+ q m(q) = 1127.01048 log . (4.4) 700 4.2.GenreClassification 102

Zero Crossing Rate (ZCR)

The ZCR of a signal is the rate of sign changes along the signal. This is a measure which for a single instrument is correlated with dominant frequency [129] (i.e. it is a primitive pitch detection routine). The meaning of this measure is less clear for polyphonic music, but it is included for completeness. Defining the indicator variable v(t) as

1, f(t) 0, v(t)= ≥ (4.5)  0, f(t) < 0   and the squared difference g(t ,t ) = (v(t ) v(t ))2 then the ZCR over a frame is calculated as 1 2 1 − 2

T 1 1 − zZCR = g(t,t 1). (4.6) T 1 − t=1 − The complexity of the ZCR amounts to (T ) and is the cheapest of the features discussed to extract. O

Spectral Centroid

The spectral centroid describes the center of gravity of the octave spaced power spectrum and indicates whether the spectrum is dominated by low or high frequencies. It is related to the perceptual dimension of timbre. Given the Fourier transform fˆ(t), the spectral centroid is formulated as,

T 1 ˆ 2 ASC t=0− t f(t) z = T 1 | | (4.7) − fˆ(t) 2 t=0 | | Spectral Spread

The audio spectrum spread describes the second moment of the log-frequency power spectrum. It in- dicates whether the power is concentrated near the centroid, or if it is spread out in the spectrum. A large spread could indicate how noisy the signal is, whereas a small spread could indicate if a signal is dominated by a single tone. The spectral spread is formulated as,

T 1 ASC 2 ˆ 2 ASS t=0− (t z ) f(t) z = T− 1 | | (4.8) − fˆ(t) 2 t=0 | | Spectral Roll-off

Spectral roll-off is defined as the a-quantile of the total energy in fˆ(t). In other words, it is the frequency under which a fraction of a of the total energy is found, and is defined as

z T zRO = max z : fˆ(t) 2 a fˆ(t) 2 (4.9) | | ≤ | | t=0 t=0 The spectral roll-off was calculated at 16 equally spaced thresholds in the interval [0, 1]. 4.2.GenreClassification 103

Autocorrelation

The ℓ Linear Predictive Coefficients (LPC) and the Correlation Coefficient (LPCE) of the (original) signal x are defined as:

T ℓ LPC z = arg min (xt aixt i) (4.10) a − − t=1 i=1 T ℓ LPCE z = min (xt aixt i) (4.11) a − − t=1 i=1 which is equivalent to an autoregressive compression of spectral envelope. The LPC can be efficiently computed using Levinson-Durbin recursion.

4.2.4 Feature Aggregation

In order to convert the sub-song level feature sets into a manageable feature set for statistical pattern analysis, some form of aggregation of sub-song level features into a single song-level feature set is required.

Gaussian Features

Possibly the simplest approach is to calculate the mean and standard deviation over segments, which amounts to fitting a single Gaussian distribution with diagonal covariance over the features of the data. The resulting full feature vector is created by concatenating the means and variances of 256 RCC, 64 MFCC, 32 LPC, 1 LPCE, 32 FFT, 16 roll-off, and 1 ZCR. This leads to 402 2 = 804 parameters for × each song.

Autoregression (AR) Features

Another idea is to try to incorporate some of the temporal information over the length of each song into the feature aggregation. Genres may, for example, be defined more by changes in their spectral qualities than the average of those given by the Gaussian fitting. Autoregression (AR) coefficients can be calculated with an all-pole model using the Yule-Walker method. This method uses Levinson-Durbin recursions on the biased estimate of the sample sequence to compute the coefficients [130]. Using a 10th order model and ignoring the zeroth order component results in 402 10 = 4020 × parameters for each song. This method was used on the smaller of the two datasets presented here in combination with the Gaussian feature aggregation.

4.2.5 Algorithms

The empirical testing here will follow [75] by using multiclass AdaBoost (AdaBoost .MH), as was intro- duced in Section 2.1.11, in combination with aggregated features. However as the number of features is already large before the creation of weak learners, which will result in a larger number of weak classifiers 4.2.GenreClassification 104 for the boosting algorithm to choose from, it may be the case that an algorithm that enforces sparsity in the solutions would be preferable. The natural extension is therefore to use the LPBoost algorithm, as introduced in Section 2.1.11.

Multiclass

Both AdaBoost and LPBoost must be extended to cope with the multiclass setting presented here. Any binary classifier can be turned into a multiclass classifier using the “one-versus-rest” approach, where binary classifiers are built for each class versus the rest, and the classifier that gives the most positive decision value (or least negative in the case that all are negative) is the class label given. This is the first approach taken for LPBoost , and AdaBoost is extended in a similar manner to give the algorithm AdaBoost .MH (see Algorithm 1).

Uneven loss function

Multiclass classification problems in the one-vs-rest framework are inherently unbalanced, as the class which is being classified will tend to have far fewer members than the rest of the dataset. Both AdaBoost and LPBoost can be modified with uneven loss functions to try to mitigate against this prob- lem. This involves increasing the weight of false negatives more than false positives, and decreasing the weight of true positives less than true negatives. The result of this is that positive examples main- tain higher weight (misclassification cost). This leads to two new algorithms known as AdaUBoost and LPUBoost [131]. Another approach to Multiclass classification is to map the outputs to binary codes using Error- Correcting Output Codes (ECOC) [132]. This theoretically should aid classification as it overcomes the standard one-versus-rest imbalance. Experiments were conducted using this method, but it was found that due to the small number of classes in the present experiment (4, 6, or 11), no difference in performance was observed. In fact for the smallest number of classes (4), the performance was actually worse. This is most likely due to the artificial way in which the ECOC encoding partitions the data.

4.2.6 Multiclass LPBoost Formulation (LPMBoost )

This section details the formulation of a new multiclass extension, to be called LPMBoost , of the LPBoost algorithm in which the original objective function is such that the margin between the correct class and each of the incorrect classes is maximised. It is similar in flavour to the multiclass extension of the SVM [133], and also resembles the linear programming formulation of structured output learning over a path [134]. However to the author’s knowledge this extension of LPBoost to the multiclass setting is novel. Let k be the number of classes, where k > 2. Let yk 1, 1 be the vector of labels for the ∈ {− } one versus rest classification for class j where j = 1,...,k. y˜ = [y1; ... ; yk] is the vertical concate- nation of these vectors into a column vector of length mk. The goal is then to maximise the margin γ (or minimise the negative margin) between the output of the correct class and that of the other classes, 4.2.GenreClassification 105 i.e. i, Hi, (ws wsˆ) γ, sˆ = s = yi. This is done by replacing matrix H (where H = yih(xi, )) ∀ − ≥ i in the first constraint of the primal (2.59) and dual (2.60) formulations with another matrixM.

The matrix $M$ is formed by augmenting the hypothesis matrix into a large matrix containing all of the necessary comparisons (i.e. the hypotheses for the correct class of each example versus the negative of the hypotheses for every other class). For each example a row is generated for each comparison with a class other than its own, of which there are $k - 1$, together with a zero row for the comparison of the hypothesis with itself. The zero row would create a constraint that cannot be satisfied, so although this could be mopped up by the slack variable, it is better to remove it from the matrix, giving a total of $m(k - 1)$ rows. Each column corresponds to a weak learner for a particular class, and as such there are $nk$ columns. An example matrix is given in Table 4.2. Learning is then performed using the standard LPBoost algorithm. At the testing stage, given a matrix $M_{\mathrm{test}}$ containing one row for each test point and one column for each weak learner, the set of chosen weak learners $i$, and the primal weights $w^j$, $j = 1, \ldots, k$ (the Lagrange multipliers from the final step of the dual optimisation for each class), the decision function is simply

\hat{f}^j = M_{\mathrm{test}}[:, i] \, w^j, \quad j = 1, \ldots, k, \qquad (4.12)

and the classification is then given by

\hat{y} = \arg\max_{j = 1, \ldots, k} \hat{f}^j. \qquad (4.13)

Example   Comparison   y_i   Class 1   Class 2   Class 3
1         1v2          1     h_1       -h_1      0
          1v3                h_1       0         -h_1
2         1v2          2     -h_2      h_2       0
          2v3                0         h_2       -h_2
3         1v3          3     -h_3      0         h_3
          2v3                0         -h_3      h_3
4         1v2          2     -h_4      h_4       0
          2v3                0         h_4       -h_4

Table 4.2: An example of the augmented hypothesis matrix M. In this example there are four examples with class labels y = {1, 2, 3, 2} and corresponding weak learner vectors h_1, ..., h_4, which are row vectors of weak learner outputs h_i = (h_i(1), ..., h_i(n)).
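The construction of M and the decision rule (4.12) and (4.13) can be sketched as follows. This is an illustrative re-implementation based on the description above, not the original code; H is assumed to hold the raw weak-learner outputs h_j(x_i) and labels are taken as 0, ..., k-1.

```python
import numpy as np

def build_lpm_matrix(H, y, k):
    """Augmented hypothesis matrix M for LPMBoost.
    H: (m, n) raw weak-learner outputs h_j(x_i); y: labels in {0, ..., k-1}.
    For each example i with true class s, one row per wrong class s_hat
    carries +h_i in the columns of class s and -h_i in those of class s_hat,
    giving m*(k-1) rows and n*k columns (the zero self-comparison is dropped)."""
    m, n = H.shape
    rows = []
    for i in range(m):
        s = y[i]
        for s_hat in range(k):
            if s_hat == s:
                continue
            row = np.zeros(n * k)
            row[s * n:(s + 1) * n] = H[i]
            row[s_hat * n:(s_hat + 1) * n] = -H[i]
            rows.append(row)
    return np.vstack(rows)

def lpm_decision(H_test, w, k):
    """Decision rule (4.12)-(4.13): w is the stacked (n*k,) vector of per-class
    primal weights (weights of unchosen weak learners are simply zero)."""
    n = H_test.shape[1]
    F = np.stack([H_test @ w[j * n:(j + 1) * n] for j in range(k)], axis=1)
    return np.argmax(F, axis=1)
```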

4.2.7 Experiments

The dataset used in the MIREX 2005 genre classification task is not freely available due to licensing issues. Experiments were therefore run using two datasets: the older Magnatune 2004 dataset, which is publicly available, and a dataset provided by Anders Meng [124]. These are described below.

Magnatune 2004

The RWC Magnatune database used for the MIREX 2004 Audio description contest is still available (see [135]). Whilst this suffers from many of the problems discussed at the beginning of this chapter, it has the advantage of being released under the slightly more lenient framework of the “Creative Commons”. The dataset is split into 6 genres (classical, electronic, jazz & blues, metal & punk, rock & pop, and world).

Anders Meng dataset d004

This dataset consists of 11 genres, with 1100 training examples and 220 test examples. The integrity of the dataset has been evaluated by humans (experts and non-experts) at a decision time horizon of 30 seconds [124]. It is interesting to note that human performance on this dataset is only 57.2% in an 11-way forced choice paradigm (see Figure 4.1). This suggests either that the ground truth annotations are inaccurate or that the genre labels are not very descriptive. The genres in the dataset are alternative, country, easy listening, electronica, jazz, latin, pop/dance, rap/hip-hop, R&B/soul, reggae, and rock. However, the dataset was used with some success in previous studies [136, 137]. During the evaluation of this method, the full dataset of all 11 genres was used along with a subset containing the 4 genres that had the highest rate of accuracy for human performance (jazz, pop/dance, rap/hip-hop, and reggae). The reasoning behind this was that if the main problems encountered with this dataset were due to inaccuracies or vagaries of the ground truth labelling, these would be reduced by taking the most consistent results from the human evaluation.

Figure 4.1: Confusion matrix of human performance on Anders Meng dataset d004.

4.2.8 Results

In all the experiments the AdaBoost stopping parameter was selected by 5-fold CV. The average classification accuracies of the different algorithms on the datasets are shown in Tables 4.3 and 4.4. The labels for the datasets are as follows: MAGNA6 refers to the Magnatune database (6 classes); MENG4 refers to the reduced Anders Meng dataset, where the 4 classes with the highest accuracy of human performance were chosen.

MAGNA6 (6 classes)
Algorithm    Accuracy
AdaBoost     59.3%
AdaUBoost    59.8%
LPBoost      55.1%
LPUBoost     57.8%
LPMBoost     60.9%

Table 4.3: Average 6-class classification accuracy on the Magnatune 2004 dataset using AdaBoost, LPBoost, and LPMBoost classifiers.

Due to the large size of the MAGNA6 dataset only the Gaussian feature aggregation method was used. The results show that, somewhat against expectations, the performance of LPBoost is actually worse than that of AdaBoost. The modifications for the uneven nature of the dataset due to the one-versus-rest classification, LPUBoost and AdaUBoost, both resulted in slight improvements in classification accuracy, and narrowed the difference between the two algorithms. However, the best performance on the dataset was obtained by the LPMBoost algorithm, which directly optimised the margin between the multiple classes whilst enforcing sparsity.

MENG4 (4 classes)
Algorithm    Gaussian features   AR features   All features
AdaBoost     41.2%               35.0%         46.2%
AdaUBoost    42.5%               35.0%         50.0%
LPBoost      46.2%               35.0%         43.8%
LPUBoost     46.2%               35.0%         47.5%
LPMBoost     43.8%               38.7%         53.8%

Table 4.4: Average 4-class classification accuracy on the MENG4 dataset using AdaBoost, LPBoost, and LPMBoost classifiers.

For the MENG4 dataset both the Gaussian feature aggregation and the autoregressive feature aggregation were used individually, and also together. For the Gaussian feature aggregation method, the LPBoost algorithm performed better than the AdaBoost algorithm, and in this case with only 4 classes the uneven modifications AdaUBoost and LPUBoost made little or no difference. In this case the LPMBoost algorithm performed slightly worse than the standard LPUBoost algorithm. For the autoregressive feature aggregation method the overall classification accuracy was somewhat lower than for the Gaussian feature aggregation method in all cases, with the LPMBoost algorithm performing the best in this case. Interestingly, by combining the two feature extraction methods, the performance of the algorithms was improved in nearly all cases. As with the MAGNA6 dataset, when using all features the AdaBoost algorithm initially outperformed the LPBoost algorithm, and again the AdaUBoost and LPUBoost modifications improved classification accuracy. Once again, however, the LPMBoost algorithm gives the best overall classification accuracy, which demonstrates the efficacy of this method. In general, the performance of all of the algorithms on this dataset is lower than might be expected. However, results of human performance cited in [124] suggest that the dataset is extremely difficult to classify, possibly indicating that the ground truth labelling is inaccurate, or that there are other confounding factors.

4.3 Compressed Sensing for Radar

This Section presents a study of how the Analogue to Digital Conversion (ADC) sampling rate in a digital radar can be reduced, without reduction in waveform bandwidth, through the use of Compressed Sensing (CS). Real radar data is used to show that through the use of chirp or Gabor dictionaries and Basis Pursuit (BP) the ADC sampling frequency can be reduced by a factor of 128, to under 1 mega-sample per second, while the waveform bandwidth remains 40 MHz. The error on the reconstructed fast-time samples is small enough that accurate range-profiles and range-frequency surfaces can be produced.

CS is a new paradigm in Digital Signal Processing (DSP) that trades sampling frequency for computing power and allows accurate reconstruction of signals sampled at rates many times lower than the conventional Nyquist frequency [59, 69]. This technique has been applied successfully in Synthetic Aperture Radar (SAR) both to achieve higher resolution images [138, 139] and to reduce the number of measurements made of the backscatter signal, which in turn reduces data transfer and storage requirements [140, 141]. Additionally, there have been studies of how CS can be used to reduce the sampling requirements of Ultra Wide Band (UWB) radar systems [142, 143], although the latter of these did not consider the impact of the Doppler shift on the CS algorithm, and both were conducted entirely with simulated data. In this work, the CS approach used in [143, 10] will be extended to include processing of data that includes Doppler shifts. Additionally, data from a real radar system will be used that includes noise and non-ideal measurement conditions, such as the presence of clutter, small amounts of interference, and clipping of the signal at the ADC. The form of CS being employed is AIC [9, 10], which reduces the sampling frequency from the traditional Nyquist rate by sampling at the information rate, rather than the rate required to accurately reproduce the baseband signal.

Conventional sampling theory requires that digital samples of an analogue signal be measured at a rate sufficient for the signal to be reproduced without aliasing; this is the Nyquist frequency. Sampling in this way is concerned purely with accurate reconstruction of the signal and does not consider the information contained within the signal, which is what is really important. It is likely that the true information rate is much lower than the Nyquist frequency, and so long as the sampling approach captures this information then the original signal can be reconstructed. It is important to realize that while the sampling frequency has been reduced, the computational overhead has increased, since it is now required that the original signal be reconstructed. Such a trade may be desirable in radar applications to allow relaxation of the sampling requirements to reduce cost, or to permit gaps to be left in the radar bandwidth [144] that might then be used in other applications. These possibilities make the study of CS for regular radar applications attractive. The principal contributions of this study are the use of real radar data in a CS study and the consideration of how the Doppler shift affects reconstruction in the AIC approach.

Figure 4.2: The modified receiver chain for CS radar.

4.3.1 Review of Compressive Sampling

This section provides a brief review of the theory of Compressed Sensing (CS), as first introduced in Section 2.2.4: a technique that allows signals to be acquired or reconstructed sparsely, by using prior knowledge that the signal is sparse in a given basis [59, 69]. The principal result is that signals can be reconstructed exactly even with data deemed insufficient by the Nyquist criterion. Formally, given a signal $x \in \mathbb{R}^n$ and a dictionary $\Psi \in \mathbb{R}^{n \times d}$ which forms an orthonormal basis, $x$ is said to be sparse if $x$ can be represented as a linear combination of $k$ atoms from $\Psi$, i.e. $x = \sum_{i=1}^{k} \alpha_i \Psi_{\cdot,i}$ where $k \ll d$. According to CS theory it is possible to construct a measurement matrix $\Phi \in \mathbb{R}^{m \times n}$ with $m \ll n$, and perform stable reconstructions of the signal from measurements $y$, where $y = \Phi\Psi\alpha$, if the measurement matrix is incoherent with the dictionary. This principle of incoherence extends the duality between the time and frequency domains. For CS we need a stable measurement matrix $\Phi$ and a reconstruction algorithm to recover $x$ from $y$. The Restricted Isometry Property (RIP) describes a sufficient condition for a stable solution for both $k$-sparse and compressible signals [59]. It has been shown that i.i.d. random Gaussian and Bernoulli matrices satisfy both the RIP and incoherence conditions with high probability [59] (see also Section 2.2.5).

This study used a form of $\ell_1$-penalised least squares known as BP, which has been shown to approximate the $k$-sparse $\ell_0$ solution [31]:

\min_{\alpha} \; \| y - \Phi\Psi\alpha \|_2^2 + \lambda \|\alpha\|_1. \qquad (4.14)

BP can be solved using LARS [32]. LARS computes the full regularisation path, which is a piecewise linear function of $\lambda$ between $\lambda = 0$ and $\lambda = \infty$ (as described in Section 2.1.7). Details of the dictionaries and measurement matrices used are given in Section 4.3.3.
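As an indicative sketch (not the code used in the study), (4.14) can be solved with the LARS-based Lasso solver in scikit-learn; here A plays the role of the compound matrix ΦΨ, and the rescaling of λ accounts for the 1/(2m) factor in scikit-learn's objective.

```python
import numpy as np
from sklearn.linear_model import LassoLars

def basis_pursuit(A, y, lam):
    """Solve min_alpha ||y - A alpha||_2^2 + lam * ||alpha||_1 via LARS.
    A is the compound matrix Phi @ Psi (measurements x atoms)."""
    # scikit-learn minimises (1/(2*m)) * ||y - A a||_2^2 + alpha * ||a||_1
    model = LassoLars(alpha=lam / (2 * len(y)), fit_intercept=False)
    model.fit(A, y)
    return model.coef_

# toy example: recover a 3-sparse coefficient vector from 64 random
# Gaussian measurements of a length-256 signal (identity dictionary)
rng = np.random.default_rng(0)
Phi = rng.standard_normal((64, 256)) / np.sqrt(64)
alpha_true = np.zeros(256)
alpha_true[[10, 80, 200]] = [1.0, -0.7, 0.5]
y = Phi @ alpha_true
alpha_hat = basis_pursuit(Phi, y, lam=1e-4)
print(np.linalg.norm(alpha_hat - alpha_true))   # close to zero
```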

4.3.2 Application of CS To Radar

To allow the ADC to run at a sub-Nyquist rate, the radar receiver chain must be modified to allow CS. Figure 4.2 shows the additional components required for an AIC receiver. After the standard filters, downconverters and amplifiers, but before the ADC, two new components are added: another mixer and an integration filter. The first input of the mixer, $r(t)$, is the baseband signal. The second input is a pseudo-random signal, $p_c(t)$, that can take a value of either 1 or $-1$. Such a signal can be readily generated using direct digital synthesis. Following the mixer is an integration filter that sums the output of the mixer over an interval, $T_{CS}$:

T_{CS} = N \, T_{\mathrm{sample}}, \qquad (4.15)

where $T_{\mathrm{sample}}$ is the Nyquist sampling interval and $N$ the undersampling factor. This process of mixing and then summing the signal constitutes the projection of the received backscatter signal onto the measurement basis, $\Phi$, that is defined by $p_c(t)$ (see Section 4.3.1). The algorithm and seed of the random number generator used to create $p_c(t)$ must be known, since a replica of the signal is needed during the reconstruction of $r(t)$.

Each output of the AIC is a projection of the baseband signal received during the interval $T_{CS}$ onto the measurement basis. The AIC samples emerge at a rate of $1/T_{CS}$. These slower-rate samples cannot be used as they stand in the conventional processing that may follow digitisation, such as matched filtering and Fourier analysis. Instead, the fast-time samples must be reconstructed using CS. To achieve this, multiple observations of the target area are required. Fortunately, the radar already gathers these observations, since in pulsed, or Frequency Modulated Continuous Wave (FMCW), radar the same waveform is transmitted repeatedly. Only one set of fast-time samples will be reconstructed from the multiple observations: so while the radar operates with one Pulse Repetition Frequency (PRF), the emerging range profiles have a different, lower, PRF. The ratio of the two PRFs is the number of pulses used to reconstruct the fast-time samples. This reduction in the PRF will ultimately reduce the range of Doppler frequencies that may be detected.

It is possible to synthesise the AIC approach to radar processing using data gathered with a conventional digital radar. During data collection, the baseband signal is digitised with an ADC that runs at the Nyquist frequency. Once the samples have been stored, mixing with the signal $p_c(t)$ and the subsequent integration are performed digitally. The output of this pre-processing produces samples comparable to those that would be output by a true AIC receiver. This was the approach taken for this study.
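A minimal sketch of this synthesised AIC, assuming the Nyquist-rate fast-time samples are already stored: each pulse is mixed with a pseudo-random ±1 sequence and summed over blocks of N samples (the function and variable names here are illustrative).

```python
import numpy as np

def simulate_aic(r, p_c, N):
    """r: (n_fast,) Nyquist-rate baseband samples for one pulse.
    p_c: (n_fast,) pseudo-random +/-1 mixing signal (the same sequence,
    i.e. the same generator seed, is needed again at reconstruction).
    N: undersampling factor; each AIC output is the sum of r*p_c over
    N consecutive samples (the digital equivalent of the integration filter)."""
    mixed = r * p_c
    return mixed.reshape(-1, N).sum(axis=1)

# toy usage mirroring the NetRAD setup described in Section 4.3.3:
# 128 fast-time samples and N = 128, so each pulse becomes one AIC sample
rng = np.random.default_rng(1)
r = rng.standard_normal(128)
p_c = np.sign(rng.standard_normal(128))   # Bernoulli +/-1 sequence
print(simulate_aic(r, p_c, N=128).shape)  # -> (1,)
```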

4.3.3 Experimental Approach

The Radar Dataset

Data was gathered using a single node of University College London (UCL)'s NetRAD radar [145]. The radar had a 2.4 GHz carrier frequency and was set to transmit a linear Frequency Modulated (FM) pulse, with a width of 0.6 µs and a 40 MHz bandwidth, and to use a PRF of 20 kHz. The ADC digitised the baseband signal at 100 mega-samples per second, i.e. $f_s = 100$ MHz, and 128 samples were collected per pulse. There was a delay in starting the ADC, so that the transmitted signal would not be recorded, resulting in ranges between 90 m and 280.5 m being measured. The targets were placed at a range of 120 m. When moving, the velocity of the target was along the radar Line Of Sight (LOS) and always towards the radar. Three targets were used: a stationary flat metal plate; a running person; and a transit van travelling at 15 mph. For the flat plate, 40,000 pulses were recorded, while for the moving targets the number was increased to 60,000.

Specific CS Implementation

The AIC was implemented entirely in post-processing, as described in Section 4.3.2. The 128 fast-time samples collected during each pulse were compressed into a single sample, i.e. the integration duration was 128 sample periods ($128/f_s$) and the undersampling factor, $N$, was 128. This meant that if the AIC had been implemented in hardware, rather than software, the ADC would have needed a sampling rate of under 1 mega-sample per second, a substantial reduction over the data capture card used in NetRAD. The random Bernoulli signal $p_c(t)$ was generated using the Matlab functions randn and sign. The fast-time samples were reconstructed using BP (see Section 4.3.1). The reconstruction was performed using 60 compressed samples, or radar pulses, leading to the PRF of the reconstructed data being 333 Hz, one sixtieth of NetRAD's original 20 kHz. Within the BP algorithm the regularisation parameter, $\lambda$, was set by taking the value that minimised the reconstruction error on the calibration set.

Chirp atoms were introduced to deal with the nonstationary behaviour of the instantaneous frequency of some signals, and were shown to form an orthonormal basis [57]. Further, it is clear that the domain in which a radar signal should be most sparse is that composed of delayed and frequency-shifted versions of the transmitted signal [143]. A real chirp atom is given by

g_{\gamma,\phi,c}(t) = \frac{1}{Z} \, g\!\left(\frac{t-u}{s}\right) \cos\!\left(\xi (t-u) + \frac{c}{2}(t-u)^2 + \phi\right), \qquad (4.16)

where $Z$ is a normalisation factor (to ensure that each atom has $\|g_{\gamma,\phi}\| = 1$), $\gamma = (s_n, u_n, \xi_n)$ denotes the series of parameters of the functions of the dictionary, $g(t) = \exp(-\pi t^2)$ is the Gaussian window, and $c$ is the chirp rate. The chirp atom has an instantaneous frequency $\omega(t) = \xi + c(t - u)$ that varies linearly with time. For the construction of the dictionary $\Psi$ the parameters of the atoms were chosen from dyadic sequences of integers with the octave parameter $j = 1$ [56]. The Gabor dictionary is constructed in the same way, except that the chirp rate $c = 0$.
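A sketch of a real chirp atom following (4.16), under the assumptions that the Gaussian window is g(t) = exp(-πt²) and that the normalisation Z is computed numerically so each sampled atom has unit norm; the small parameter grid below is illustrative and does not reproduce the dyadic grid of [56].

```python
import numpy as np

def chirp_atom(t, s, u, xi, c, phi=0.0):
    """Real chirp atom of (4.16): a Gaussian window at scale s and
    position u times a cosine with frequency xi and chirp rate c.
    Setting c = 0 gives a Gabor atom."""
    window = np.exp(-np.pi * ((t - u) / s) ** 2)
    atom = window * np.cos(xi * (t - u) + 0.5 * c * (t - u) ** 2 + phi)
    return atom / np.linalg.norm(atom)

# a small chirp dictionary over a 128-sample fast-time window (one atom per column)
t = np.arange(128, dtype=float)
params = [(s, u, xi, c) for s in (8, 16, 32)
                        for u in range(0, 128, 16)
                        for xi in (0.2, 0.5, 1.0)
                        for c in (0.0, 0.01)]
Psi = np.stack([chirp_atom(t, *p) for p in params], axis=1)
print(Psi.shape)   # (128, 144)
```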

Testing Strategy

Each target dataset was processed using the simulated AIC radar system described in Section 4.3.2, with both the Gabor and chirp sparse bases. Once the reconstructed fast-time samples had been formed, the normalised error between the reconstruction and the actual data could be calculated according to:

\epsilon = \frac{\| x_{\mathrm{orig}} - x_{CS} \|_2}{\| x_{\mathrm{orig}} \|_2}, \qquad (4.17)

where $x_{\mathrm{orig}}$ is the original signal before projection onto the measurement basis and $x_{CS}$ is the reconstructed signal. In this study, the reconstructed signal was formed from sixty original signals, but for the calculation of $\epsilon$ only the first signal was used. The use of a mean signal was considered, but averaging radar signals is a form of integration that would improve the SNR; this improvement would not be present in the reconstructed signal, making the comparison unfavourable. The reconstructed fast-time samples were processed using a conventional matched filter to obtain range profiles, and the DFT was then used to produce range-frequency surfaces.

Figure 4.3: Fast-time samples of the stationary target.
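The normalised error of (4.17) is straightforward to compute; a minimal sketch:

```python
import numpy as np

def normalised_error(x_orig, x_cs):
    """Equation (4.17): l2 reconstruction error, normalised by the l2 norm
    of the original Nyquist-rate signal (here the first pulse of the batch)."""
    return np.linalg.norm(x_orig - x_cs) / np.linalg.norm(x_orig)
```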

4.3.4 Results And Analysis

Stationary Target

Initial testing was conducted using the data for the stationary flat-plate target. Measurement of the received signal power indicated that the SNR for the target was approximately 22 dB. Simulation of the AIC was performed using the Gabor dictionary and the chirp dictionary as the sparse basis for reconstruction. From the original 40,000 pulses, 666 reconstructed sets of fast-time samples were produced. During reconstruction the normalised error (4.17) had a mean value of 0.70 with a standard deviation of 0.18 for the Gabor dictionary, and a mean of 0.58 with a standard deviation of 0.23 for the chirp dictionary. Figure 4.3 shows the reconstructed fast-time samples using the Gabor and chirp dictionaries in parts (b) and (c) respectively, with the samples from the first pulse in the batch of sixty used for reconstruction shown in part (a) for comparison. In this case, the normalised error was 0.42 for the Gabor dictionary and 0.28 for the chirp. Visual inspection of the figure shows the reflection of the transmitted chirp at a range of 120 m, and both the Gabor and chirp dictionaries appear to reconstruct this part of the curve well (seen as the peaks in Figure 4.3). Conversely, beyond the limits of the reflected chirp the reconstruction appears poor, and it is thought that the majority of the normalised error comes from these regions. Application of a matched filter to the samples resulted in the range profiles shown in Figure 4.4. Again, both the Gabor and chirp dictionary reconstructions, parts (b) and (c) of the figure, are a good match with the Nyquist-sampled data, part (a). It was observed that the square root of the peak intensity for the Gabor reconstruction was approximately 10,000 less than that of the actual data, and that for both types of reconstruction the noise regions were much more pronounced.

Figure 4.4: Range profiles of the stationary target.

Figure 4.5: Fast-time samples constructed from largest three coefficients.

Observation of the atoms from the two dictionaries used during the reconstruction indicated why the noise parts of the reconstructed range profiles contained more energy than the original data. In the case of the chirp dictionary it was clear that the most significant atoms used related to the target. Since each atom was a delayed chirp it was straightforward to understand why the BP algorithm had selected it. The most significant atom was at a delay corresponding to the target range. After that there were several atoms, with much smaller amplitude coefficients, distributed throughout the fast-time samples. It was thought these atoms were being used to approximate the thermal noise. In the case of the Gabor dictionary, comprehension of the BP process was less certain since the atoms did not correspond directly to the transmitted waveform. There was a series of significant atoms, with narrow scale, that appeared to represent the reflected chirp at the target range. In addition there was a series of atoms with long scale but coefficients indicating a small amplitude; these were attributed to an attempt to reconstruct the thermal noise.

Figures 4.5 and 4.6 show the reconstructed fast-time samples and range profiles, respectively, when only the three most significant atoms are used during reconstruction. In both figures part (a) shows the Gabor result and part (b) the chirp. It was observed that the chirp result is almost identical to the full reconstruction, but with less energy in the noise regions, while the limited Gabor reconstruction was not successful. The ability to reconstruct with fewer atoms in the chirp case suggests that a larger regularisation parameter, $\lambda$ in (4.14), could have been used. In this case the effect of the increased sparsity would be that automatic denoising of the signal is performed during reconstruction.

Figure 4.6: Range profiles constructed from largest three coefficients.

Figure 4.7: The range-frequency surfaces for the moving targets.

Figure 4.8: Range-frequency surfaces for van target using CS.

Moving Targets

When considering moving targets, it is the range-frequency surface that is of interest, rather than the range profile, since it provides information on the target's Doppler shift as well as its range. The surface is calculated by first performing matched filtering of the fast-time samples and then performing a Fourier transform over the pulses in each range bin. Figure 4.7 shows the range-frequency surfaces for the two moving targets when no CS was employed. The results of processing the van target data with the simulated AIC are shown in Figure 4.8. It is apparent that there is very little difference between using the Gabor and chirp dictionaries, shown in parts (a) and (b) respectively.
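The range-frequency surface computation just described can be sketched as follows (an illustrative outline rather than the NetRAD processing code): each pulse's fast-time samples are matched filtered against a replica of the transmitted chirp, and an FFT is then taken over the pulse (slow-time) dimension in each range bin.

```python
import numpy as np

def range_frequency_surface(fast_time, tx_chirp):
    """fast_time: (n_pulses, n_samples) reconstructed fast-time samples.
    tx_chirp: (n_samples,) replica of the transmitted waveform.
    Returns an (n_pulses, n_samples) magnitude surface: Doppler x range."""
    # matched filtering: correlate each pulse with the transmitted chirp
    profiles = np.array([np.correlate(pulse, tx_chirp, mode='same')
                         for pulse in fast_time])
    # Doppler processing: DFT over the pulses within each range bin
    surface = np.fft.fftshift(np.fft.fft(profiles, axis=0), axes=0)
    return np.abs(surface)
```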

Figure 4.9: Range-frequency surfaces for person target using CS.

Table 4.5: The normalized errors for the moving targets

Dictionary     Van                      Person
               Av. Error    Std Dev     Av. Error    Std Dev
Gabor          1.094        0.109       0.784        0.143
Gabor top 10   1.051        0.095       0.892        0.133
Chirp          1.153        0.145       0.733        0.189
Chirp top 10   1.120        0.516       0.970        1.197

Close inspection of the surfaces indicates that the shape of the main peak from the chirp dictionary gave a slightly better match with the original surface (Figure 4.7 part (a)), but the improvement over the Gabor dictionary was only slight. It was also observed that the noise floor for the CS results was higher than in the Nyquist-sampled data, which can be seen by comparing the figures. The running person results are shown in Figure 4.9; again the Gabor dictionary is in part (a) and the chirp in part (b). In this instance it was not possible to discern any difference between the two dictionaries by inspection of the range-frequency surfaces. Both were observed to be a good match with the Nyquist data, although again the surfaces contained more noise than when CS was not used. The mean normalised errors, and their standard deviations, between the reconstructed fast-time samples and the original Nyquist versions are shown in Table 4.5. The table details the errors for both targets and both dictionaries, as well as the cases when reconstruction was performed using only the ten largest coefficients. It was observed that in this instance there was little difference between the two choices of dictionary. For the van target the Gabor dictionary had the lowest error while the chirp was superior for the person. In both instances, however, the difference between the errors was in the second decimal place. Furthermore, reducing the number of atoms used in reconstruction did not have an appreciable effect on the error.

Dynamic Time Warping (DTW)

Dynamic Time Warping (DTW) is an algorithm for measuring the similarity between two sequences which have different temporal extent [146]. DTW has been applied to many different signal processing applications including video, audio, and graphics. A well known application has been automatic speech recognition, where it is used to align the signals from speakers with different cadences and inflections (see Chapter 4 of [147]).

[Figure 4.10 panels: original signals, warped signals, DTW mapping, and warped signals (interpolated). Norm difference before warping: 1.1047, after warping: 0.5854.]

Figure 4.10: DTW applied to the person target. In this instance the warping has little effect as the target is moving slowly, meaning that the warping is minimal. There is, however, still an improvement in the resulting reconstruction (bottom right).

For targets such as the van target in the present dataset, the deviation between successive fast-time samples may become quite large, with an accompanying phase shift due to the Doppler effect. DTW is one possible way of dealing with this. The results of applying the DTW algorithm to successive samples for the person target and the van target are presented in Figures 4.10 and 4.11 respectively. It can be seen that for the person target, which is slow moving and therefore results in little phase shift or signal offset, the effect of DTW is modest. However, for the van target the effect is much more pronounced: the resulting signal has been realigned such that it is in phase, and the resulting reconstruction is greatly improved. This is demonstrated in Table 4.6, where the results of the improved reconstructions can be seen in the effect they have on the outputs of the matched filtering.

Dataset        Original   DTW
Calibration    0.1118     0.0997
Person         0.1237     0.0986
Van            0.1628     0.1138

Table 4.6: Effect of Dynamic Time Warping (DTW). The figures quoted are the normalised ℓ2 distances between the results of the matched filter with and without CS. Note that in every case DTW improves the reconstructions (and hence range profiles) made by CS.
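For reference, a compact dynamic-programming DTW sketch; this is the textbook formulation with a squared-difference local cost rather than the specific implementation used here, and it returns both the alignment cost and the warping path.

```python
import numpy as np

def dtw(x, y):
    """Classic DTW between 1-D sequences x and y.
    Returns (total alignment cost, warping path as a list of index pairs)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # backtrack from (n, m) to recover the warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]

# aligning two fast-time pulses (toy example with a small offset)
pulse1 = np.sin(np.linspace(0, 10, 128))
pulse2 = np.sin(np.linspace(0.5, 10.5, 128))
cost, path = dtw(pulse1, pulse2)
print(cost, len(path))
```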

Original signals Warped signals

8000 signal 1 8000 signal 1 signal 2 signal 2

6000 6000

4000 4000

2000 2000

0 0 Amplitude Amplitude

−2000 −2000

−4000 −4000

−6000 −6000

−8000 −8000

20 40 60 80 100 120 20 40 60 80 100 120 140 160 Samples Samples

4 x 10 Mapping 0 3.98094e+009 1 signal 1 10 signal 2 0.5 20 3.41302e+009 0 30 Amplitude 2.84509e+009 40 −0.5

50 −1

Samples 2.27717e+009 Distance 0 20 40 60 80 100 120 140 60 Samples 4 x 10 Warped signals (interpolated) 70 1 1.70924e+009 signal 1 80 signal 2 0.5 90 1.14131e+009 0 100 Amplitude 110 5.73388e+008 −0.5

120 −1 5.46157e+006 0 20 40 60 80 100 120 140 −5000 0 5000 Samples Amp 5000 Amp 0 Norm difference before warping: 0.9131, after warping: 0.2923 −5000 20 40 60 80 100 120 Samples

Figure 4.11: DTW applied to the van target. In this instance the warping has a much greater effect, as the target is moving more quickly, resulting in a bigger deviation between the two signals. The warping here has the effect of realigning the signals such that they are in phase, and the resulting reconstruction is improved greatly (bottom right).

4.4 Conclusions

The first part of the Chapter examined the classification of musical genre from raw audio files, using DSP for feature generation and aggregation together with the ML algorithms LPBoost and a novel multiclass extension, LPMBoost. It was demonstrated that sparse ML methods are advantageous in this setting. The rest of the Chapter examined the application of CS to conventional radar. As with the genre classification task, the signals are univariate in the sense of a single sensor or time series, but in this case with a recording frequency orders of magnitude higher. Here the focus was on DSP, although the methods used are directly applicable in ML settings as well, and there is scope for further analysis of this data in such settings.

Chapter 5

Applications II

Abstract

This Chapter presents the core application area of the methods described in Chapter 3: multivariate signal processing. Signals recorded from the brain activity of participants via Electroencephalography (EEG) and Magnetoencephalography (MEG) are both multivariate (there are many sensors) and high frequency (up to 100 Hz). As such they present interesting challenges for the application of ML and DSP methods. Additionally, information contained in the stimuli presented to the participant may itself be useful for classification purposes, rather than simple labels. In this situation Multiview methods are required. Two experimental studies will be described: Tonality, in which the task is to distinguish between tonal and atonal musical stimuli through EEG recordings; and Genres, in which we seek to detect the genre of music that a listener is attending to from MEG recordings.

5.1 Introduction

When sensory stimulation reaches the brain, the summed electrical activity of populations of neurons results in characteristic sequences of waves which can be observed in Electroencephalography (EEG) signals. These are known as sensory evoked potentials. One can also measure the corresponding magnetic fields associated with these electrical fields using Magnetoencephalography (MEG). The evoked potentials differ in each sensory modality and also depend on the intensity of the stimulus. They have a very reliable temporal relation to the stimulus onset. Evoked potentials have very low amplitude and are drowned out by the ordinary EEG/MEG rhythms. In order to see them, a large number of identical stimuli must be presented and averages taken over all the signals. There are also motor evoked potentials, related to the brain activity preceding movements. Event-Related Potential (ERP) analysis has been used primarily for vision research (e.g. [148]) and auditory research (e.g. [149]).

However, ERP analysis is not well suited to examining the effects of music, due to the way in which we process musical structures. By definition, a piece of music develops over time and thus engages both short-term and long-term memory systems. The individual responses to particular stimuli (i.e. notes or chords) play only a small part in the cognition of a musical piece. Secondly, ERP analysis requires many repetitions of identical stimuli with identical properties (duration, inter-stimulus interval, envelope, timbre), which when applied to musical sequences leads to distinctly unmusical sets of stimuli! The first experiment that will be described, conducted in the Leibniz Institute for Neurobiology, suffers from this problem somewhat, as the experimental design was intended for both ERP analysis and the analysis described in this Chapter.

The analysis of brain scans with a view to accurately identifying the semantic processing of the subject has received increasing attention recently [150]. Analysis of subjects listening to music has also received some attention [151], though in some cases this has caused controversy [152]. This chapter will focus on two experiments: the first is an EEG experiment to examine the brain activity related to the tonal processing of music, and the second is an MEG experiment to examine the brain activity related to the processing of musical genre. In both experiments we will be performing single-trial classification. In both cases a similar approach to the classification of time series data will be taken as in the previous Chapter: each “example” will be a segment of data corresponding to a specific musical stimulus (e.g. of duration 8 seconds) and features will be calculated for each example using multivariate DSP with feature aggregation. However, the major difference is that we will now be attempting to use information from the stimuli themselves to improve the quality of the classifiers, using the Multiview methods described in Chapter 3 (Section 3.5).

5.2 Experiment 1: Classification of tonality from EEG recordings

A common structural element of Western tonal music is the change of key within a melodic sequence. The present Section, based on [12], examines data from a set of experiments that were conducted to analyse human perception of different modulations of key. Electroencephalography (EEG) recordings were taken of participants who were given melodic sequences containing changes in key of varying distances, as well as atonal sequences, with a behavioural task of identifying the change in key. Analysis of the EEG involved the derivation of 122120 separate dependent variables (features), including measures such as inter-electrode spectral power, coherence, and phase. We present a novel method of performing semantic dimension reduction that produces a representation enabling high accuracy identification of out-of-subject tonal versus atonal sequences.

The present study is concerned with the task of distinguishing between tonal and atonal stimuli through the observed EEG recordings of the subjects. It should be stressed that EEG data is notoriously noisy and making reliable cross-subject predictions has proved difficult even for simple tasks. Indeed, it will be seen that a naive application of SVMs to the collected signals is unable to make out-of-subject predictions much better than chance, although within-subject predictions were possible. The key contribution will be the demonstration of a novel semantic dimension reduction method that makes use of a complex description of the stimuli to identify key dimensions in the space of signals that are highly correlated with the stimulus. Using even a simple nearest neighbour classifier in this semantic space can achieve very high accuracy in both within-subject and out-of-subject prediction.

The proposed analysis, which seeks statistical relationships between musical structure and EEG recordings of participants listening to the same music, is based on the premise that the brain represents structural elements of the auditory signal that it receives through shifting patterns of activity. This activity may take many forms, ranging from generalised changes in activity in certain brain regions to more complex relationships. By taking a multivariate approach to the signal processing of the EEG signal, it is possible to analyse a wide range of such relationships. As such, pairwise electrode comparisons, which provide an indication of communication between brain regions, are of paramount importance. The analysis to date has included pairwise statistics such as cross power and coherence. Cross phase is another interesting statistic that will be investigated, as it indicates whether there may be an increase (or decrease) in synchrony between brain regions. The collection of statistics derived from the EEG analysis procedure will then be compared with the features derived from the audio recordings in order to seek common patterns.

The encoding of the information about the stimulus is through a kernel designed to capture the melodic and harmonic structure of a musical score available in a simple midi format.

The data under examination in this Section was produced by an EEG experiment conducted in partnership with the University of Magdeburg. The principal hypothesis was that neural patterns should reflect relative changes in the key of music that a listener is attending to. In order to examine this, a series of stimuli (chord sequences) were constructed and ordered such that there were the following five experimental conditions:

1. Distant key (two stimuli)
2. Close key (two stimuli)
3. Same key (two stimuli)
4. No key (one stimulus)
5. Initial (two stimuli)

Section 5.2.2 gives details of the setup and protocol of the experiment upon which the analysis was performed, including details of the EEG data preprocessing. Section 5.2.5 describes the multivariate signal processing techniques used to extract features from the EEG data for classification. Section 5.2.6 describes the machine learning analysis approaches taken, including conventional SVM analysis as well as a semantic dimension reduction method based on KCCA.

5.2.1 Participants

16 right-handed participants (9 female, 7 male), aged 19 to 31, with normal hearing took part in the experiment. None had received any formal musical education. All participants gave written informed consent to the study, which was approved by the ethics committee of the University of Magdeburg.

5.2.2 Design

The stimuli consist of sequences of chords, with each stimulus in a single key (or no key). All sequences consist of 16 chords with onsets at 500 ms intervals and with duration filling the entire 500 ms, giving a total length of 8 s. The experimental conditions are defined by contiguous stimulus triplets with changes in relative key (listed below). Relative key is established by tonal stimuli, and reset by atonal stimuli. Stimuli from the first three conditions are followed by a stimulus from condition four as a contrast and a reset of relative tonality. 48 stimuli were required altogether, all chordal (in root position), of which 32 are tonal and 16 atonal. Tonal stimuli were transposed as required to fulfil their experimental role. The first stimulus in each tonal pair is in C major, to eliminate any long-term tonality effects (or at least to take advantage of them); the second is in either F# major (condition 1), G major (condition 2) or C major (condition 3). In total there were 48 initial, 48 atonal, 16 close, 16 distant and 16 same trials per participant, giving a total of 144 trials. Ordering principles:

1. Each condition should appear an equal number of times
2. Each different melody type (a, b, etc.) should appear an equal number of times
3. The three conditions should appear in each permutation (to minimise condition order effects)
4. Each different melody type should be used once for each of the three main conditions (to minimise individual melody effects)
5. Each tonal pair in the conditions should use the same stimulus
6. Each tonal pair should be followed by a unique atonal stimulus to reset tonality (and provide a control condition)
7. Same order for each run and for each subject (for direct comparison in subsequent analysis)

5.2.3 EEG Measurements

EEG recordings were acquired at the Leibniz Institute for Neurobiology (Magdeburg, Germany). 64 unipolar channels, including 2 Electrooculogram (EOG) channels and one nose reference electrode, were recorded at a sampling rate of 500 Hz and a resolution of 0.1 µV. Across all participants the voltage range was 3.2767 mV and the impedance was less than 5 kΩ. The music was played to the participants using a Terratec EWX 24/96 soundcard, a Black Cube Linear Science amplifier by Lehmann Audio (www.lehmannaudio.de), and Eartone 3A Insert Earphones (50 Ω) using binaural presentation. The volume of the amplifier was at notch 6. Stimulus delivery and scanning coordination were controlled with the Presentation software (Neurobehavioural Systems Inc., Albany, USA) using a custom-written script.

5.2.4 Data Preprocessing

Muscular activity related to eye movements and eye blinks alters the electromagnetic fields around the eyes and typically introduces artefacts into the EEG, especially in frontal regions. A number of algorithms have been proposed to correct for EOG artefacts, all of which operate by subtracting a proportion of one or more EOG channels from the EEG channels. A study by [153] evaluated four correction techniques by correcting blinks, vertical and horizontal eye movements from 26 subjects. The study concluded that, in the absence of specific calibration protocols, the method described by [154], based on multiple regression, was the best solution. The approach taken by [155] was based on the algorithm suggested by [154], with modifications described in [156]. This latter method was chosen for the present study. Prior to time-frequency analysis, the data was filtered using two-way least-squares FIR filtering, with a 0.2 Hz high-pass filter and a 100 Hz low-pass filter. The 50 Hz component of the signal was removed using a notch filter between 49 Hz and 51 Hz to suppress the AC mains interference. The electrodes were then re-referenced using the nose electrode.

5.2.5 Feature Extraction

The data from the 64 channel EEG system at 500Hz sampling rate was imported as a single matrix such that the format was [channels x frames]. The data was segmented into 8 second epochs, giving 144 epochs per subject. These epochs have a one-to-one correspondence with the experimental stimuli. This results in a data matrix of shape [channels x frames x epochs].

Time-Frequency Analysis

The time average of a discrete-time random signal is defined as,

\langle \cdot \rangle = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{t=-N}^{N} (\cdot). \qquad (5.1)

We can then describe ensemble averages in terms of this time average, as follows:

Mean value:        \bar{x} = \langle x(t) \rangle
Variance:          \sigma_x^2 = \langle | x(t) - \bar{x} |^2 \rangle          (5.2)
Autocorrelation:   r_x(l) = \langle x(t) \, x^*(t-l) \rangle
PSD:               R_x = \sum_{l=-\infty}^{\infty} r_x(l) \exp(-i\omega l)

Until now, the estimation techniques discussed for the computation of spectral properties of signals have all been univariate (i.e. those given in Section 4.2.3). In many applications we have two or more jointly stationary random processes and we wish to study relationships between them (as is the case for the class of signals in this Chapter). We will use multiple bivariate spectral estimations to perform multivariate analysis. Assume that $x(t)$ and $y(t)$ are two zero-mean, jointly stationary random processes. The following quantities can then be defined:

Cross-correlation:  r_{xy}(l) = \langle x(t) \, y^*(t-l) \rangle          (5.3)
Cross-PSD:          R_{xy} = \sum_{l=-\infty}^{\infty} r_{xy}(l) \exp(-i\omega l)
Coherence:          C_{xy} = \frac{|R_{xy}|^2}{R_x R_y}

For the analysis of EEG, these bivariate estimations should in principle be more stable than the univariate estimations. Coherence between pairs of EEG signals recorded simultaneously from different scalp sites provides a high time resolution measure of the degree of dynamic connectivity between brain regions. Coherence measures the correlation between a pair of signals as a function of frequency. Thus it provides a means for identifying and isolating frequency bands at which the EEG displays between-channel synchronization (see e.g. [157] for a recent review). In addition, electrodes have a tendency to “drift” over time (in terms of both amplitude and mean amplitude), meaning that univariate estimations can become unstable. Bivariate estimation methods overcome this problem as electrodes that are spatially proximate tend to drift in a linearly dependent manner.

A multitaper spectrum is produced by averaging multiple windowed FFTs generated with a set of orthogonal data tapering windows known as Discrete Prolate Spheroidal Sequences (DPSS) or Slepian functions. Since each of the windows in a specific sequence is uncorrelated, an unbiased average spectrum can be produced. A multitaper spectrum offers no greater frequency resolution than a single tapered spectrum; in fact, the spectral peaks resulting from the algorithm have a flat-topped envelope shape which makes the central frequency determination more difficult. What is gained is a reduced-variance spectral estimator that retains a high dynamic range [158].

Using DPSS, inter-channel coherence, cross phase and cross power were computed for all pairwise combinations of channels, excluding the EOG electrodes and the nose reference electrode. Cross power simply refers to the ratios of the power within each of the frequency bandwidths. The coherence function measures the correlation between two signals as a function of the frequency components they contain, and is therefore a correlation spectrum [159, 160]. It determines the likelihood of two stochastic signals arising from the same generating process. This differs from the cross-correlation function, which involves calculating Pearson product-moment correlation coefficients for the two signals at various displacements of the sampling interval. Quantitative analysis [160] has shown that the cross-correlation sometimes fails in situations where coherence does not, as well as being more expensive to compute. Complementary to the computation of the coherence spectrum is the phase spectrum, which indicates the phase relationship between two signals as a function of frequency, information that is lost using ordinary spectral methods. An important feature of all of these methods is that they are independent of amplitude, as the amplitudes of electrodes are known to vary greatly both within and between recording sessions. The resulting 256 Fourier coefficients for each of the measures were divided into bands, providing estimates of spectral power within the following recognised frequency bandwidths:

• delta (0.3-3.9 Hz)
• theta (4-7.9 Hz)
• alpha (8-13 Hz)
• beta1 (13-19 Hz)
• beta2 (20-30 Hz)

In addition, the following bandwidths were computed:

• low gamma (30-42 Hz)
• 40 Hz (38-42 Hz)
• mid gamma (43-63 Hz)
• high gamma (64-100 Hz)
• general gamma (30-100 Hz)
• global (0.01-100 Hz)

The means and variances of each of the measures within each of the wavebands were computed. The data was then flattened in order to create a large feature vector of length 122120 for classification.
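A sketch of the kind of pairwise spectral features described above. SciPy's Welch-based coherence and cross-spectral density estimators are used here as stand-ins for the DPSS multitaper estimates, the band edges follow the lists above, and the function and dictionary names are illustrative.

```python
import numpy as np
from scipy.signal import coherence, csd

BANDS = {'delta': (0.3, 3.9), 'theta': (4, 7.9), 'alpha': (8, 13),
         'beta1': (13, 19), 'beta2': (20, 30), 'low_gamma': (30, 42),
         'mid_gamma': (43, 63), 'high_gamma': (64, 100)}

def pairwise_band_features(epoch, fs=500, nperseg=512):
    """epoch: (n_channels, n_frames) EEG segment (e.g. one 8 s trial).
    Returns mean coherence and mean cross-power magnitude for every
    channel pair and frequency band."""
    n_ch = epoch.shape[0]
    feats = {}
    for a in range(n_ch):
        for b in range(a + 1, n_ch):
            f, coh = coherence(epoch[a], epoch[b], fs=fs, nperseg=nperseg)
            _, pxy = csd(epoch[a], epoch[b], fs=fs, nperseg=nperseg)
            for name, (lo, hi) in BANDS.items():
                sel = (f >= lo) & (f <= hi)
                feats[(a, b, name, 'coherence')] = coh[sel].mean()
                feats[(a, b, name, 'cross_power')] = np.abs(pxy[sel]).mean()
    return feats

# toy usage: 4 channels of noise, 8 s at 500 Hz
epoch = np.random.default_rng(2).standard_normal((4, 4000))
print(len(pairwise_band_features(epoch)))   # 6 pairs x 8 bands x 2 measures = 96
```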

5.2.6 Results

SVM Analysis

Recall that we are aiming to predict whether the participants were attending to tonal or atonal sequences. The data was standardised across the features to obtain “standard normal” random variables with mean 0 and standard deviation 1. The data for each subject was split randomly into 75% train and 25% test¹, and then concatenated to form the full training and test sets. The same random split was applied for all of the analysis. Classification was performed using the SVM-Light Support Vector Machine implementation [161] with linear, RBF and Laplace kernels (where the Laplace kernel is the same as the RBF kernel except that the 2-norm is replaced with a 1-norm). 5-fold CV was performed on the training set to discover the best settings of the C and σ parameters. Table 5.1 shows the test errors for the SVM classifier on the split of the data described above. The significance of the classifier was evaluated using the upper bound of the cumulative distribution function (CDF) of the binomial distribution of a random classifier, calculated as follows:

p \leq \exp\left( \frac{-2 (n\pi - k)^2}{n} \right), \qquad (5.4)

where $n$ is the number of trials (test examples), $\pi$ is the probability of success (0.5 for a random classifier) and $k$ is the test error of the classifier.
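A worked instance of (5.4), under the assumption that k is the number of misclassified test examples (an error rate is first multiplied by n):

```python
import numpy as np

def random_classifier_bound(n, k, pi=0.5):
    """Upper bound (5.4) on the probability that a random classifier with
    success probability pi produces an error count as low as k out of n."""
    return np.exp(-2.0 * (n * pi - k) ** 2 / n)

# e.g. the Tonal vs Atonal RBF result: 384 test examples, error rate 0.1175
n = 384
k = round(0.1175 * n)                      # about 45 errors
print(random_classifier_bound(n, k))       # far below 0.001
```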

Test               # Train   # Test   Linear     RBF        Laplace
Tonal vs Atonal    1152      384      0.2298**   0.1175**   0.2742**
Close vs Distant   384       128      0.3125**   0.2422**   0.4375
Same vs Distant    384       128      0.2656**   0.2344**   0.2109**
Same vs Close      384       128      0.2031**   0.1641**   0.1641**

Table 5.1: Test errors for within-subject SVM classification. ** denotes significance at the p < 0.001 level (see text)

Table 5.2 shows the leave-one-out test error for each of the participants using a linear kernel. In this test the data from 15 of the participants is used as the training set and the data from the remaining participant is used as the testing set. This is a much more difficult test, in the sense that the goal is now to learn features that can generalise from one set of brains to a new brain. It is therefore not surprising that with a subject pool of only 16 participants the classification errors are close to chance for most subjects. Results (not given) for the RBF and Laplace kernels were not significantly different. It is interesting to note that the distinction between “close” and “distant” gives the best classification results, rather than tonal vs atonal. As such it appears that conditions with key changes result in more consistent prediction across brains than those for processing atonal music.

¹Each trial was treated as a single example, and therefore with 16 participants and 96 trials each, the training set contained 16 × 72 = 1152 examples and the test set contained 16 × 24 = 384 examples.

Subject   Tonal v atonal (96)   Close v distant (32)   Same v distant (32)   Same v close (32)
1         0.4583                0.3438                 0.3125                0.3750
2         0.4947                0.4688                 0.3438                0.4375
3         0.4688                0.3438                 0.3750                0.4062
4         0.4688                0.4375                 0.5000                0.4062
5         0.4896                0.4688                 0.5000                0.4062
6         0.5000                0.5000                 0.4375                0.4688
7         0.4583                0.4688                 0.4375                0.4375
8         0.4896                0.3750                 0.3438                0.5000
9         0.4896                0.4375                 0.5000                0.5000
10        0.4688                0.4062                 0.3750                0.4688
11        0.4792                0.3438                 0.5000                0.4688
12        0.4792                0.3125                 0.4062                0.5000
13        0.4583                0.3750                 0.4375                0.5000
14        0.5000                0.3750                 0.5000                0.4062
15        0.4688                0.4375                 0.3125                0.4688
16        0.5000                0.3125                 0.3750                0.5000
mean      0.4795                0.4004                 0.4160                0.4531
median    0.4792                0.3906                 0.4219                0.4688

Table 5.2: Test errors for leave-one-out SVM classification using linear kernels. The numbers in parentheses represent the number of test examples. None of the test errors reached significance at the p< 0.01 level

KCCA Analysis

Various methods have been proposed for searching for common patterns between two sets of signals, including kernel canonical correlation analysis (KCCA), which can be viewed as a generalised form of kernel independent components analysis [162]. Canonical correlation analysis (CCA) is a technique to extract common features from paired multivariate data. Recall that KCCA is a nonlinear version of this technique which allows nonlinear relations to be found between multivariate variables effectively [52].

\rho = \max_{\alpha, \beta} \frac{\alpha' K_x K_y \beta}{\sqrt{\alpha' K_x^2 \alpha \;\, \beta' K_y^2 \beta}} \qquad (5.5)

For this analysis it was necessary to calculate kernels on the musical stimuli. For simplicity of analysis, the only distinction being examined in this section is tonal vs atonal, as the experimental setup does not lead to a simple calculation of relative pitch for stimuli that were presented following silence. The MIDI files used to generate the experimental stimuli were first embedded into pitch class space. Pitch class space [163] is the circular (quotient) space in which differences between octave-related pitches are ignored; in this space, there is no distinction between tones that are separated by an integral number of octaves. The pitch class vectors for each stimulus were then formed into kernels using a squared exponential kernel. As a sanity check, running an SVM on these kernels gives a test error of 0.0261, showing that this kernel representation is valid. Perfect classification was not achieved as there appear to be outlier stimuli, i.e. atonal sequences that appear tonal in this representation.

For the purposes of the KCCA analysis, a linear kernel is used for the EEG, as the dimensionality of the RBF kernel in this case is too high. Both kernels were projected into Gram-Schmidt space using the partial Gram-Schmidt decomposition outlined in [52]. The precision parameter was set to 0.3 using a heuristic method. The use of this decomposition results in an implicit regularisation, and as such the KCCA regularisation parameter was set to zero; experimentation with different values of this parameter did not show any improvement in results. The kernels from each view were then projected into the shared feature space using the top 100 resulting KCCA directions. The test kernel for the EEG was also projected into this space, and then normalised such that the ℓ2-norm of each vector was 1. Using the 100 largest correlation values with the corresponding projections of the training data, the most popular labels of the corresponding examples in the music kernel were used as the classification. The reported errors are then the mean of the differences between these labels and the true test labels. This method, which is an extension of mate-based retrieval [106], was given algorithmically in Algorithm 5 in Chapter 3.

The classification results using the PNN classification approach are given in Table 5.3. It can be seen that this method is able to classify between the tonal and atonal experimental conditions almost perfectly. As a comparison, an SVM was trained on the projection of the EEG data into the shared feature space, using a linear kernel and 5-fold CV to select the C parameter. The results show that the PNN method performs competitively with the SVM, whilst being essentially an unsupervised method. It is also much more computationally efficient, as there are no parameters to tune.

Classifier           # Train   # Test   Linear
KCCA+PNN             1152      383      0.0183**
KCCA+SVM (linear)    1152      383      0.0157**

Table 5.3: Test errors for within-subject classification for Tonal vs Atonal using KCCA with PNN and SVM classification. ** denotes significance at the p < 0.001 level.
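A schematic of the projected nearest neighbour (PNN) step described above, assuming the KCCA directions for the EEG view have already been computed: W_eeg below is a hypothetical matrix whose columns are the top KCCA directions, and the labels are assumed to be a non-negative integer array.

```python
import numpy as np

def pnn_classify(K_test, K_train, W_eeg, train_labels, n_corr=100):
    """Project the train and test EEG kernels onto the shared KCCA
    directions, l2-normalise each projected vector, and label each test
    point by a majority vote over the n_corr training projections with
    which it has the largest correlations.
    K_train: (n_train, n_train) kernel; K_test: (n_test, n_train) kernel;
    W_eeg: (n_train, d) directions; train_labels: (n_train,) int array."""
    P_train = K_train @ W_eeg
    P_test = K_test @ W_eeg
    P_train /= np.linalg.norm(P_train, axis=1, keepdims=True)
    P_test /= np.linalg.norm(P_test, axis=1, keepdims=True)
    corr = P_test @ P_train.T                  # cosine similarities
    preds = []
    for row in corr:
        top = np.argsort(row)[-n_corr:]        # indices of the largest correlations
        preds.append(np.bincount(train_labels[top]).argmax())
    return np.array(preds)
```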

5.2.7 Leave-one-out Analysis

We now present results for leave-one-out analysis of the data. This is the (much more difficult) classification task of taking each participant's data as the test set in turn, using only the data from the remaining participants as the training set. We are therefore given no prior knowledge of the unique physiology of the test participant, nor do we have any knowledge of the specifics of the particular recording (such as the raw electrode amplitudes). This means that the features used for classification must be robust across participants and recording sessions. Table 5.4 shows the leave-one-out test error for each of the participants using the PNN classification approach, along with the SVM trained on the projection of the EEG data into the shared feature space, again using a linear kernel and 5-fold CV to select the C parameter. The results show that the PNN method performs competitively with the SVM, whilst both significantly outperform the naive SVM approach (see Table 5.2).

Participant   KCCA+PNN    KCCA+SVM (linear)
1             0.2708      0.1667**
2             0.2737      0.2421
3             0.3125      0.2500
4             0.2083*     0.1667**
5             0.4062      0.2500
6             0.2500      0.2500
7             0.5625      0.1667**
8             0.2500      0.2500
9             0.2708      0.2500
10            0.1667**    0.1667**
11            0.7396      0.2500
12            0.2500      0.2500
13            0.1562**    0.1667**
14            0.3542      0.2500
15            0.2500      0.2500
16            0.4688      0.1667**
mean          0.3244      0.2183
median        0.2708      0.2500

Table 5.4: Test errors for leave-one-subject-out KCCA projected nearest neighbour classification. * and ** denote significance at the p < 0.01 and p < 0.001 level respectively, using the upper bound of the CDF of the binomial distribution of a random classifier as before

5.3 Discussion

The results demonstrate that using standard modern Digital Signal Processing (DSP) and Machine Learning (ML) techniques with careful manipulation of the data can enable us to differentiate between certain patterns of brain activity. Coherence analysis and other types of cross-spectral analysis may be used to identify variations which have similar spectral properties (high power in the same spectral frequency bands) if the variability of two distinct time series is interrelated in the spectral domain. The results demonstrate that it is possible to reliably distinguish whether a listener was attending to tonal or atonal music, including in the case when the test set was a "new brain" (leave-one-out analysis). This can be considered a task of high-order cognitive processing, rather than a simple sensory input task. As the differentiation was based on properties of the EEG over relatively long timespans (i.e. the length of an epoch, or 8 seconds), this is clearly not due to simple evoked potentials, but instead represents a more fundamental change in the pattern of processing over time.
Further analysis using KCCA demonstrated that through the use of unsupervised methods it is possible to significantly improve the classification accuracy. The new classification method defined in Section 3.5.1, using the shared semantic space given by projections from KCCA weight vectors together with a nearest neighbour method, was applied. This was able to distinguish between the tonal and atonal experimental conditions with a high degree of accuracy. It was also shown that an SVM trained on projected data performed extremely well. The success of both of these methods is due to the KCCA projections acting as a data cleaning step, in which a form of semantic dimensionality reduction is occurring. As the musical stimuli are sufficiently distinct between conditions, the additional information extracts the directions correlated with the differing experimental conditions. The key ingredient in the approach is the introduction of a clean source of data that encodes a complex description of the experience of the subject. It would seem that this approach to information extraction has enormous promise in a wide range of signal processing and time series data analysis tasks.
Subtler distinctions in the task of the listener, such as a move from one key to a close or distant key, were also reliably discriminated. However, the results were not as convincing as for the tonal-atonal distinction. There are several possible reasons for this. Firstly, there were fewer examples of these events by a factor of 3, which on its own increases the difficulty of learning. Secondly, the cognitive task is clearly much more subtle than the tonal vs atonal case, and as such the changes in patterns of activity are likely to be much more subtle, although this is of course speculative. Finally, the relationship between the patterns of activity in this case may be too slight for the DSP techniques employed (as opposed to the learning algorithm) to detect. Further experiments with larger datasets (more repetitions or more participants) could provide answers to these questions. EEG data is notoriously noisy and unreliable, so it is extremely encouraging that it is possible to generate reliable discriminations using fully automatic procedures.
It is usual to perform artefact rejection by hand during the preprocessing stage, as well as other manual techniques. The present study used automatic techniques at every stage of the process (preprocessing, feature extraction, data treatment, and classification). The methods presented demonstrate the ability to reliably discriminate between brain signals associated with different sequences of music in both within-subject and out-of-subject paradigms.

5.4 Experiment 2: Classification of genre from MEG recordings

Classification of musical genre from audio is a well-researched area of music research. However, to our knowledge no studies have been performed that attempt to identify the genre of music a person is listening to from recordings of their brain activity. It is believed that with the appropriate choice of experimental stimuli and analysis procedures, this discrimination is possible. The main goal of this experiment is to see whether it is possible to detect the genre of music that a listener is attending to from brain signals. The present experiment focuses on Magnetoencephalography (MEG), which measures magnetic fields produced by electrical activity in the brain. It will be shown that classification of musical genre from brain signals alone is feasible, but unreliable. Through the use of sparse multiview methods, such as Sparse Multiview Fisher Discriminant Analysis (SMFDA), reliable discrimination between different genres is possible.
The motivation for this study came from the analysis presented in Section 4.2, with the same caveats regarding the task of genre classification applying here as well. As highlighted there and in [11], the choice of an appropriate dataset was shown to be of great importance. This is interesting from a cognitive perspective, as genre classification may represent both low- and high-order cognitive processes. Using a combination of brain recordings and carefully chosen stimuli allows us to analyse this question further.

The analysis procedures employed in this study are based on those used for fMRI using standard GLM and SVM/KCCA methods [164], and on methods used for analysis of EEG using KCCA as a semantic dimensionality reduction method prior to classification [14]. The analysis begins with genre classification from the audio source only, as outlined in [11], except that in this study the features used are derived from the MIDI versions of the audio files rather than the raw audio files. The reasons for this are twofold. Firstly, the features of interest are more readily available from the MIDI, which gives direct access to the pitch values and note durations of the musical sequences. Secondly, the nature of the stimuli means that there is no timbral information available: most of the features used in previous studies such as [75, 11] are based on short-term spectral information, and these largely pick out timbral features.

Following this, features are derived from the MEG data using spectral methods common to the neuropsychological literature, after which machine learning algorithms are used to classify these features according to genre. Multiview methods are then applied, following on from [164, 14], which attempt to use the stimuli themselves as another view of the phenomenon underlying the brain signals. These methods are improved upon through the use of Sparse Multiview Fisher Discriminant Analysis (SMFDA) [12]. The key difference between this and previous approaches is that SMFDA uses label information to find informative projections of each view into a shared space, which are more appropriate in supervised learning settings. In addition, SMFDA seeks sparse solutions by using ℓ1 optimisation, which is known to approximate the optimally sparse ℓ0 solution. This is also a form of regularisation that prevents overfitting in high dimensional feature spaces. Sparsity of solutions is important in this setting as the feature set constructed from the MEG data is extremely high dimensional, with a low signal-to-noise ratio.

From Chapter 3 and [13], the optimisation for MFDA is given by,

\[
\min_{\alpha_d, b, \xi} \; \mathcal{L}(\xi) + \mathcal{P}(\tilde{\alpha}), \qquad d = 1, \ldots, p,
\]
\[
\text{s.t.} \quad \sum_{d=1}^{p} \left( K_d \alpha_d + \mathbf{1} b_d \right) = y + \xi, \qquad \xi' e_c = 0 \ \text{ for } c = 1, 2,
\]

The natural choices for the regularisation function $\mathcal{P}(\tilde{\alpha})$ would be either the ℓ2-norm of the dual weight vectors, i.e. $\mathcal{P}(\tilde{\alpha}) = \sum_{d=1}^{p} \|\alpha_d\|_2^2$, or the ℓ2-norm of the primal weight vector, $\mathcal{P}(\tilde{\alpha}) = \sum_{d=1}^{p} \alpha_d' K_d \alpha_d$. However, more interesting is the ℓ1-norm of the dual weight vectors, $\mathcal{P}(\tilde{\alpha}) = \sum_{d=1}^{p} \|\alpha_d\|_1$, as this choice leads to sparse solutions due to the fact that the ℓ1-norm can be seen as an approximation to the ℓ0-norm. This version is SMFDA.

We can also follow [114] and remove the assumption of a Gaussian noise model, resulting in different loss functions on the slacks ξ. A noise model with longer tails, such as the Laplacian noise model, may be more appropriate for the class of signals under examination (see [165] for a recent review). In this case we can simply replace $\|\xi\|_2^2$ with $\|\xi\|_1$ in the objective function. The advantage of this is that if the ℓ1-norm regulariser from above is chosen, the resulting optimisation is a linear programme, which can be solved efficiently using methods such as column generation.
The main goal of this experiment is to see whether it is possible to detect the genre of music that a listener is attending to from brain signals. The present experiment uses MEG, an imaging technique that measures the magnetic fields produced by electrical activity in the brain. The data is from an experiment conducted at the Functional Imaging Laboratory (FIL) of UCL.
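The following sketch writes out the linear programme that results from combining the ℓ1 loss on the slacks with the ℓ1 regulariser, using cvxpy for clarity rather than the column-generation approach mentioned above; the trade-off constant C and all variable names are assumptions, not part of the thesis.

```python
# A minimal sketch of the SMFDA linear programme (l1 loss, l1 regulariser).
import numpy as np
import cvxpy as cp

def smfda_lp(kernels, y, C=1.0):
    """kernels : list of p kernel matrices, each (m, m), one per view.
    y : (m,) array of +1/-1 labels."""
    m, p = len(y), len(kernels)
    alphas = [cp.Variable(m) for _ in range(p)]
    b = cp.Variable(p)
    xi = cp.Variable(m)

    # Summed per-view discriminant values must equal the labels up to slacks
    f = sum(kernels[d] @ alphas[d] + b[d] for d in range(p))
    constraints = [f == y + xi]
    # Slacks sum to zero within each class (the balance constraints above)
    for c in (1, -1):
        e_c = (y == c).astype(float)
        constraints.append(cp.sum(cp.multiply(e_c, xi)) == 0)

    objective = cp.Minimize(sum(cp.norm(a, 1) for a in alphas) + C * cp.norm(xi, 1))
    cp.Problem(objective, constraints).solve()
    return [a.value for a in alphas], b.value
```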

5.4.1 Participants

MEG recordings from 2 participants were made using a 275-channel CTF system with SQUID-based axial gradiometers at a sampling rate of 1200Hz. Sensors whose mean power was beyond a static threshold were automatically rejected, as were trials in which there was a "sensor jump". The data was filtered using least-squares FIR filters: low pass at 100Hz; notch filter at 49-51Hz. The data was then split into epochs and downsampled to 200Hz.
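A minimal sketch of this filtering and downsampling chain is given below, using least-squares FIR design in SciPy. The filter orders, transition bands, and function names are assumptions made for illustration; this is not the pipeline actually used, which also performs the epoching step before downsampling.

```python
# A minimal sketch of the MEG preprocessing described above.
import numpy as np
from scipy.signal import firls, filtfilt, decimate

def preprocess(meg, fs=1200.0):
    """meg : (n_channels, n_samples) raw recording at fs Hz."""
    nyq = fs / 2.0
    # low-pass: pass up to 100 Hz, stop from 110 Hz (transition band assumed)
    lp = firls(201, [0, 100, 110, nyq], [1, 1, 0, 0], fs=fs)
    # notch: stop 49-51 Hz, pass elsewhere (band edges assumed)
    notch = firls(401, [0, 47, 49, 51, 53, nyq], [1, 1, 0, 0, 1, 1], fs=fs)
    x = filtfilt(lp, [1.0], meg, axis=-1)     # zero-phase low-pass
    x = filtfilt(notch, [1.0], x, axis=-1)    # zero-phase notch
    return decimate(x, 6, axis=-1)            # 1200 Hz -> 200 Hz
```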

5.4.2 Design

Stimuli were 9 seconds long, with an inter-stimulus interval of 2 seconds during which behavioural responses were collected. The behavioural task was identification of genre. Participants were presented with four blocks of 20 stimuli.

5.4.3 Procedure

The independent variable was the genre of the musical piece, with 4 levels. Each stimulus was 9 seconds in duration, with an inter-stimulus interval of 2 seconds within which participants gave their responses for the behavioural task (identification of genre). Participants were presented with four blocks of 20 stimuli, with a break between each block. Blocks were randomised to ensure that practice and fatigue effects were accounted for. The following genres were included in the experiment: Classical, Jazz, Ragtime, Pop. In order to avoid confounding factors of spectral or timbral properties of the pieces within each genre becoming the main criteria of discrimination, all pieces were based on a single instrument, the piano. The stimuli were sourced and selected as MIDI files from various sources, and then rendered to WAVE format using a single instrument and normalised according to peak amplitude. Most of the excerpts in the Pop category were solo piano introductions. The experimental stimuli were validated a priori, firstly by classification of genre from the MIDI files using the analysis procedures described by [75, 5], and secondly by examination of the behavioural results.

5.4.4 Feature Extraction

The following two subsections describe the extraction of features from each of the sources of information. Recall that in addition to the MEG recordings from the participants, we also use the stimuli themselves (in the form of the original MIDI files) to generate a complementary set of features. These two sources of information are combined to build a stronger classifier than would be possible from the MEG alone. For testing purposes only the MEG data is used (i.e. the weights found for the MEG kernel), to show the effect of the addition of information from the stimuli on classification accuracy.

Feature Extraction from Audio

Following [75, 11], the general approach to genre classification taken was to create a large set of features from the audio, and then use a sparse boosting algorithm (LPBoost) which effectively performs feature selection during the classification stage. Since MIDI files are being used rather than raw audio, it is possible to take advantage of a range of features that are readily derivable from the MIDI. The features used, along with the dimensionality of each feature, are given in Table 5.5.

Feature                                          Dimensionality
Meter features
  Tempo                                          1
  Meter                                          1
  Proportion of concurrent onsets                1
  Note density                                   1
  Autocorrelation of onset times                 33
Melodic features
  Ambitus (melodic range)                        1
Tonal features
  Pitch class profiles                           12
  Distribution of pitch classes (DPC)            12
  Krumhansl-Kessler (KK) key estimation          1
  Correlation of DPC to KK profiles              24
  Mean & standard deviation of KK profiles       2
Statistical features
  Entropy                                        1
  Distribution of note durations                 9
  Number of notes                                1
Total                                            101

Table 5.5: MIDI features used for genre classification

For extraction of the features the MIDI Matlab toolbox of Eerola and Toiviainen was used [166]. These features are then concatenated to produce a single feature vector of length 101.

Feature Extraction from Brain Signals

After preprocessing, the data from each trial was split into 3 segments, representing the first, middle and last 3 seconds of each stimulus presentation. Each of these segments was then used as an example for classification. Dimensionality reduction was then performed using both Principal Components Analysis (PCA) and Independent Components Analysis (ICA) over the channels, to create two sets of 10 "virtual electrodes". The segments were flattened to form a feature vector of length 20 × 1800 for each example.
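The following sketch illustrates this virtual-electrode construction: PCA and ICA are each fitted over the MEG channels, ten components apiece, and each segment is flattened into a single feature vector. The shapes and fitting choices are assumptions for illustration, not the thesis code.

```python
# A minimal sketch of the PCA/ICA "virtual electrode" feature extraction.
import numpy as np
from sklearn.decomposition import PCA, FastICA

def meg_segment_features(segments, n_components=10):
    """segments : array of shape (n_segments, n_channels, n_samples)."""
    n_seg, n_chan, n_samp = segments.shape
    # Fit both decompositions over channels, with all time points stacked
    stacked = segments.transpose(0, 2, 1).reshape(-1, n_chan)   # (n_seg*n_samp, n_chan)
    pca = PCA(n_components=n_components).fit(stacked)
    ica = FastICA(n_components=n_components, max_iter=1000).fit(stacked)

    feats = []
    for seg in segments:
        x = seg.T                                               # (n_samples, n_channels)
        virt = np.hstack([pca.transform(x), ica.transform(x)])  # (n_samples, 20)
        feats.append(virt.T.ravel())                            # flatten to 20 * n_samples
    return np.array(feats)
```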

5.4.5 Results

Classification of Genre by Participants

Firstly, the results of the behavioural task of the participants are presented. Table 5.6 shows the confusion matrix of the behavioural performance of the subjects; the order of the genres is classical, jazz, pop, rag, with the true labels on the rows. The overall error is 0.15 (i.e. 85% classification success). Note that for 4 classes a random classifier would achieve only 25% accuracy (an error of 0.75), so this is significantly better than chance. This appears to validate the stimuli, and is similar to (or above) levels of accuracy reported elsewhere (see [124] for a review).

           classical  jazz  pop  rag   Error
classical      48       1     5    6   0.20
jazz            2      51     4    3   0.15
pop             8       1    49    2   0.18
rag             2       2     1   55   0.08
average                                0.15

Table 5.6: Confusion matrix for classification of genre by participants. True labels are in rows, estimates in columns.

From the user experiments it can be seen that pop appears to be the hardest of the genres to classify. This makes sense, given that a) pop as a genre is very derivative, and many themes are borrowed from other genres such as classical and jazz, and b) the pop pieces were chosen to have a solo piano part (e.g. as an introduction), meaning that to the uninitiated they may sound uncharacteristic.

Classification of Genre from Audio Features

Using the feature set generated from the MIDI stimuli, LPBoost [5] was applied using decision stumps as the weak learners, as per [75], which results in 6262 weak learners for the algorithm; a sketch of the underlying linear programme is given after Figure 5.1. In order to boost classification performance the files were split into 3 parts, and the sum of the classification functions for each of the 3 parts was taken before normalising and classifying. The overall 4-fold cross-validation (CV) error is 0.05 (i.e. 95% classification success). This further validates the stimuli, and shows that the methods are appropriate. Tracing back from the chosen weak learners (of which there were 114 of the 6262), it is possible to see which features were chosen. Interestingly, a wide spread of the features was used (52 of the vector of length 101). The only blocks of features not used at all were: KK key estimation, mean of KK profile, and onset autocorrelation. The key advantage of the LPBoost method is that, being a sparse method, an arbitrarily large set of candidate features can be supplied and only the useful ones will be selected. This means that the same method can be applied to a variety of classification tasks, with the algorithm effectively performing feature selection and classification simultaneously.
Figure 5.1 shows a spider diagram of the overall confusion matrix resulting from classification of genre using audio. This diagram demonstrates that the performance of the classification algorithm is similar across all four genres, with no particular bias towards confusion between any of the genres. The exception is rag, for which the performance is generally improved. This can be explained by the fact that the genre is generally more homogeneous, and also less derivative of the other genres. In each of the other genres, examples can be found which are in some way similar to one of the other genres.

Figure 5.1: Spider plot of the overall confusion matrix resulting from classification of genre using audio. This is a way of visualising the confusion matrix between classes. The true labels are the axes, and the lines denote the patterns of correct and incorrect classification by class. Note that rag (red) has the most “peaked” profile showing that the confusion between this and the other classes was smallest.
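For reference, the sketch below writes the soft-margin LPBoost problem [5] as a single linear programme in cvxpy over a precomputed matrix of decision-stump outputs. The thesis solves this by column generation rather than in one shot, and the stump matrix H, the ν parameter, and the variable names are assumptions made for illustration.

```python
# A minimal sketch of the soft-margin LPBoost linear programme.
import numpy as np
import cvxpy as cp

def lpboost(H, y, nu=0.2):
    """H  : (m, n) matrix, H[i, j] = output in {-1,+1} of weak learner j on example i
    y  : (m,) array of +1/-1 labels
    nu : soft-margin parameter controlling the fraction of margin errors"""
    m, n = H.shape
    a = cp.Variable(n, nonneg=True)      # convex weights over the weak learners
    xi = cp.Variable(m, nonneg=True)     # margin slacks
    rho = cp.Variable()                  # margin
    D = 1.0 / (nu * m)
    constraints = [cp.multiply(y, H @ a) >= rho - xi, cp.sum(a) == 1]
    cp.Problem(cp.Maximize(rho - D * cp.sum(xi)), constraints).solve()
    return a.value, rho.value            # the weights are sparse: most entries are zero
```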

Classification of Genre from MEG Features

Using the feature set generated from the MEG data, linear kernels were constructed and used with KFDA [3]. As with the classification of genre from audio features, the files were split into 3 parts, and the sum of the classification functions for each of the 3 parts was taken before normalising and classifying. The overall 4-fold cross-validation error is 0.71 for participant 1 and 0.70 for participant 2 (i.e. 29% and 30% classification success respectively). Note that this is still some way above chance level (25%), but far from reliable.
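A minimal sketch of the kernel Fisher discriminant used here is given below, in its well-known regularised least-squares form (an illustrative, equivalent formulation rather than the thesis implementation; the regularisation value and the bias choice are assumptions).

```python
# A minimal sketch of dual KFDA via regression onto class-coded targets.
import numpy as np

def kfda_fit(K, y, reg=1e-3):
    """K : (m, m) linear kernel on the MEG features, y : (m,) labels in {-1, +1}."""
    m = len(y)
    # class-coded targets: +m/m_plus for the positive class, -m/m_minus otherwise
    t = np.where(y > 0, m / np.sum(y > 0), -m / np.sum(y < 0))
    alpha = np.linalg.solve(K + reg * np.eye(m), t)
    b = -np.mean(K @ alpha)              # simple centring choice for the bias
    return alpha, b

def kfda_decision(K_test, alpha, b):
    """K_test : (n_test, m) kernel between test and training examples."""
    return K_test @ alpha + b
```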

Classification of Genre using both Data Sources

Using the feature sets generated from the MIDI data and the MEG data, linear kernels were constructed and Sparse Multiview Fisher Discriminant Analysis (SMFDA) [13] was applied. 4-fold CV was used for the selection of parameters. Since the sparse version of MFDA is being used, the regularisation parameter can be set using a heuristic method to a small value (exp(−3)), as it has little effect. As with the classification of genre from audio features, the files were split into 3 parts, and the sum of the classification functions for each of the 3 parts was taken before normalising and classifying. Note that in testing only the function learnt on the brain signals is used. In this way we can be sure that we are not simply classifying on the basis of the MIDI data alone. Furthermore, this is closer to the traditional supervised learning setting, where the labels or other significant information regarding correct classification are not known at test time. The overall 4-fold CV error is 0.65 for participant 1 and 0.63 for participant 2 (i.e. 35% and 37% classification success respectively). In themselves these classification results are not so impressive, but a side benefit is that the weights of the classifier over the MEG features can be used to calculate the brain regions involved in classification of musical genre.
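A small illustrative snippet of this testing protocol follows (assumed variable names, not the thesis code): only the MEG-view kernel and the MEG-view dual weights enter the decision function, so no information from the MIDI view is available at test time.

```python
# A minimal sketch of single-view (MEG-only) prediction from an SMFDA solution.
import numpy as np

def predict_from_meg_only(K_test_meg, alpha_meg, b_meg):
    """K_test_meg : (n_test, m) kernel between test and training MEG examples;
    alpha_meg, b_meg : the MEG-view dual weights and bias learnt by SMFDA."""
    return np.sign(K_test_meg @ alpha_meg + b_meg)
```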

5.4.6 Discussion

In this study it was shown that classification of musical genre from brain signals alone is feasible, but unreliable. It was shown that through the use of sparse multiview methods, such as SMFDA, it was possible to improve the discrimination between different genres.
The procedures of [164, 12] both incorporate information from the stimuli themselves to improve classification performance. These were extended through the use of Sparse Multiview Fisher Discriminant Analysis (SMFDA) [13]. The key difference is that SMFDA uses label information to find informative projections. It is also important that the method is sparse, as the MEG data is extremely high dimensional.
The key ingredient in the approach of this work is the introduction of a clean source of data that encodes a complex description of the experience of the subject. It seems that this approach has enormous promise in a wide range of signal processing and time series data analysis tasks.

Chapter 6

Conclusions

6.1 Conclusions

6.1.1 Greedy methods

The first part of Chapter 3 focussed on greedy methods for sparse classification and regression, firstly by applying Orthogonal Matching Pursuit (OMP) to KFDA to produce a novel sparse classification algorithm, Matching Pursuit Kernel Fisher Discriminant Analysis (MPKFDA). Generalisation error bounds were provided that are analogous to those used in the Robust Minimax algorithm [86], together with a sample compression bounding technique. Experimental results on real world datasets were presented which showed that MPKFDA is competitive with both KFDA and the SVM, along with additional experiments showing that MPKFDA performs extremely well in high dimensional settings. In terms of computational complexity the demands of MPKFDA during training are higher, but during the evaluation of test points only k kernel evaluations are required, compared to the m needed for KFDA.
In a similar vein, the greedy algorithm Polytope Faces Pursuit (PFP), which is based on the geometry of the polar polytope, where at each step a basis function is chosen by finding the maximal vertex using a path-following method, was applied to nonlinear regression using the kernel trick, resulting in KPFP. The utility of this algorithm was demonstrated by providing a novel generalisation error bound which used the natural regression loss and the pseudo-dimension in order to upper bound its loss. The experimental results showed that KPFP was competitive with KMP and KRR.
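As a reminder of the greedy selection underlying these methods, a generic Orthogonal Matching Pursuit sketch for a regression target is given below. This is an illustrative version only: the thesis variants greedily optimise the FDA quotient (MPKFDA) or operate over kernel-defined basis functions, rather than the plain dictionary used here.

```python
# A minimal sketch of Orthogonal Matching Pursuit (OMP) for greedy basis selection.
import numpy as np

def omp(Phi, y, k):
    """Select k columns of the dictionary Phi greedily to approximate y."""
    residual, idx, w = y.copy(), [], None
    for _ in range(k):
        correlations = np.abs(Phi.T @ residual)
        correlations[idx] = -np.inf                  # do not reselect chosen atoms
        idx.append(int(np.argmax(correlations)))
        sub = Phi[:, idx]
        w, *_ = np.linalg.lstsq(sub, y, rcond=None)  # orthogonal projection step
        residual = y - sub @ w
    return idx, w
```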

6.1.2 Low-rank approximation methods

Moving away from greedy methods, the following section (3.4) constructed algorithms that take advantage of the Nyström method for low-rank kernel approximation for large-scale data. Recent work which empirically justifies using a uniform subsampling technique for the Nyström approximation [100] was extended theoretically. An upper bound on the SVM objective function solved in this subspace was given, followed by empirical validation for both classification and regression using the SVM, KFDA (classification) and KRR (regression) algorithms. The empirical results support the use of uniform sampling to maintain good learnability in the Nyström subspace. The results show that in the case of MPKFDA it is possible to substantially improve on the complexity of O(n³k), reducing it to O(k³), and even to improve generalisation performance on some of the data sets. This is surprising and counter-intuitive, as MPKFDA selects projection directions that directly optimise the FDA quotient. The main conclusion from the performance of NFDA against MPKFDA and NRR against KMP is that the method by which basis functions are chosen (i.e. randomly or according to an objective function) is probably of secondary importance in most cases, unless the goal is the best possible generalisation error. It seems that the power of these methods lies in the projection into the Nyström approximated subspace.
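A minimal sketch of the uniformly subsampled Nyström projection discussed above is given below (an illustrative implementation, with an assumed clipping of small eigenvalues for numerical stability); any of the base learners (SVM, KFDA, KRR) can then be trained on the returned explicit features.

```python
# A minimal sketch of the Nystrom approximation with uniform subsampling.
import numpy as np

def nystrom_features(K, k, rng=np.random.default_rng(0)):
    """K : (m, m) kernel matrix. Returns (m, k) features whose inner products
    approximate K, using k uniformly sampled columns."""
    m = K.shape[0]
    idx = rng.choice(m, size=k, replace=False)
    W = K[np.ix_(idx, idx)]                  # (k, k) block on the sampled indices
    C = K[:, idx]                            # (m, k) sampled columns
    evals, evecs = np.linalg.eigh(W)
    evals = np.maximum(evals, 1e-12)         # clip tiny eigenvalues (assumed safeguard)
    W_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return C @ W_inv_sqrt                    # explicit features Z with Z Z' ~ K
```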

6.1.3 Multiview methods

For the rest of Chapter 3 the attention was turned to the problem of learning from multiple data sources or views (MSL and MVL respectively). To begin with, a method was presented that extends the KCCA algorithm to the classification setting. This method, Projected Nearest Neighbours (PNN), is an extension of mate-based retrieval [106], and is given in Algorithm 5. It is non-parametric and essentially free once the KCCA directions have been learnt.
KFDA can be formulated as a disciplined convex optimisation problem, which was extended to the multi-view setting (MFDA) using justifications from a probabilistic point of view. A sparse version, SMFDA, was then introduced, and the optimisation problem further extended to account for directions unique to each view (PMFDA). Experimental validation was shown on a toy dataset, followed by experimental results on part of the PASCAL 2007 VOC challenge dataset and an fMRI dataset, showing that the method is competitive with state-of-the-art methods whilst providing additional benefits.
Mika et al. [35] demonstrate that their convex formulation of KFDA can easily be extended to both multi-class problems and regression problems, simply by updating the final two constraints. The same is also true of MFDA and its derivatives, which enhances their flexibility. The possibility of replacing the Naïve Bayes Fusion method for combining classifiers is another interesting avenue for research. Finally, for the special case of SMFDA there is the possibility of using a stagewise optimisation procedure similar to LARS [32], which would have the benefit of computing the full regularisation path; alternatively, greedy methods such as OMP or PFP could be applied to the algorithm. However, as shown theoretically and empirically, a far simpler and yet powerful MVL classification algorithm could be created by combining SMFDA with the Nyström method. This remains as future work.

6.1.4 Experimental applications

Genre classification from polyphonic audio

Many different approaches to genre classification have been taken, both in terms of feature selection and in terms of algorithm choice. The MIREX 2005 results indicate that boosting with an aggregated feature set works well. However, this really indicates that in a musical sense the problem is still poorly understood. The short-term spectral features that are commonly used only examine different aspects of the texture of the sound, and not the long-term temporal dynamics. Some attempts to look at temporal dynamics using autocorrelation/autoregression have been made, but currently these methods do not perform as well as methods based on short-term spectral features. Clearly some way of combining these two approaches appears to be desirable. The experimental results using a replication of AdaBoost have so far produced seemingly poor results; more work is required to determine the source of the problems causing these results. Experiments with LPBoost are ongoing, but it is expected that improvements will be shown over the existing AdaBoost technique, due to the sparsity of the solutions and the faster convergence of the algorithm.

Compressed sensing for radar

Experimental results have been presented that showed how the ADC sampling rate in a digital radar can be reduced—without reduction in waveform bandwidth—through the use of CS. The use of a Gabor or chirp dictionary and BP allowed reconstruction of the radar backscatter signal in such a way that the range profiles and resulting range-frequency surfaces were still acceptable for conventional use.

The reconstructed data had a worse SNR than the original data. This was attributed to the BP process attempting to reconstruct the noise from the entries in the dictionary. Since these entries are not noise-like, the matched filter no longer produced a maximised SNR output. Reconstruction of the samples in low SNR situations is a recognised problem in CS [142, 143, 167, 10]. However, there are other ways to approximate the ℓ0 solution, such as greedy iterative methods (Matching Pursuit, Orthogonal Matching Pursuit [56]), and more recently non-convex penalties and DC programming [62, 63]. Such methods are more robust to noise than BP, and it is possible that the presented results can be improved through the use of these methods. Investigation of these methods forms the basis of ongoing research by the authors.

One potential problem encountered using this methodology is that for very fast moving objects there are significant deviations from one fast-time sample to the next. This manifests as a delay and phase shift. As a result, the reconstructions that are generated from a series of such samples are less accurate, because a single set of atoms cannot represent these modulations. One possible way to circumvent this, which could be implemented in hardware, would be to create atoms whose definition includes sequences of the atom shifted and translated by some predefined amount. Each signal would then be convolved with its corresponding entry (the later signals with the more shifted entries) before performing reconstruction. This would not increase the computation at the learning stage, but would increase the size of the potential dictionary.

In conclusion, this work has demonstrated that CS can be applied to conventional pulse-Doppler radars. The reconstructed signals are accurate, and so long as the reduction in received pulses is acceptable, the AIC could be used in radar.

Classification of tonality from EEG recordings

The results demonstrate that using standard modern signal processing and machine learning techniques with careful manipulation of the data can enable us to differentiate between certain patterns of brain activity. Coherence analysis and other types of cross-spectral analysis may be used to identify variations which have similar spectral properties (high power in the same spectral frequency bands) if the variability of two distinct time series is interrelated in the spectral domain. The results demonstrate that it is possible to reliably distinguish whether a listener was attending to tonal or atonal music. This can be considered a task of high-order cognitive processing, rather than a simple sensory input task. As the differentiation was based on properties of the EEG over relatively long timespans (i.e. the length of an epoch, or 8 seconds), this is clearly not due to simple evoked potentials, but instead represents a more fundamental change in the pattern of processing over time.
Further analysis using KCCA demonstrated that through the use of unsupervised methods it is possible to significantly improve the classification accuracy. The new classification method defined in Section 3.5.1, using the shared semantic space given by projections from KCCA weight vectors together with a nearest neighbour method, was applied. This was able to distinguish between the tonal and atonal experimental conditions with a high degree of accuracy. It was also shown that an SVM trained on projected data performed extremely well. The success of both of these methods is due to the KCCA projections acting as a data cleaning step, in which a form of semantic dimensionality reduction is occurring. As the musical stimuli are sufficiently distinct between conditions, the additional information extracts the directions correlated with the differing experimental conditions. The key ingredient in the approach is the introduction of a clean source of data that encodes a complex description of the experience of the subject. It would seem that this approach to information extraction has enormous promise in a wide range of signal processing and time series data analysis tasks.
Subtler distinctions in the task of the listener, such as distinguishing a move from one key to a close or distant key, were also reliably discriminated. However, the results were not as convincing as for the tonal-atonal distinction. There are several possible reasons for this. Firstly, there were fewer examples of these events by a factor of 3, which on its own increases the difficulty of learning. Secondly, the cognitive task is clearly much more subtle than the tonal vs atonal case, and as such the changes in patterns of activity are likely to be much more subtle, although this is of course speculative. Finally, the relationship between the patterns of activity in this case may be qualitatively rather than quantitatively different, meaning that the signal processing techniques employed (as opposed to the learning algorithm) were unable to detect them. Further experiments with larger datasets (more repetitions or more participants) could provide answers to these questions. EEG data is notoriously noisy and unreliable, so it is extremely encouraging that it is possible to generate reliable discriminations using fully automatic procedures. It is usual to perform artefact rejection by hand during the preprocessing stage, as well as other manual techniques.
In this work, automatic techniques were used at every stage of the process (preprocessing, feature extraction, data treatment, and classification). The methods presented demonstrate the ability to reliably discriminate between brain signals associated with different sequences of music in both within-subject and out-of-subject paradigms.

Classification of genre from MEG recordings

In this study it was shown that classification of musical genre from brain signals alone is feasible, but unreliable. It was shown that through the use of sparse multiview methods, such as SMFDA, it was possible to improve the discrimination between different genres.

The procedures of [164, 12] both incorporate information from the stimuli themselves to improve classification performance. These were extended through the use of Sparse Multiview Fisher Discriminant Analysis (SMFDA) [13]. The key difference is that SMFDA uses label information to find informative projections. It is also important that the method is sparse, as the MEG data is extremely high dimensional.

The key ingredient in the approach of this work is the introduction of a clean source of data that encodes a complex description of the experience of the subject. It seems that this approach has enormous promise in a wide range of signal processing and time series data analysis tasks.

6.2 Further Work

6.2.1 Synthesis of greedy/Nyström methods and MVL methods

A very natural extension to the work described in Chapter 3 would be a synthesis of the greedy methods (and/or Nyström methods) with the MVL methods described later in that chapter. Specifically, SMFDA lends itself to this method of optimisation. A feature that makes KFDA and its derivatives an interesting choice in many applications is their strong connection to probabilistic approaches. Often it is not only important to achieve a small generalisation error but also to be able to assign a confidence to the final classification. Unlike for the SVM, the outputs of KFDA can (under certain assumptions) be directly interpreted as probabilities. A drawback is that the theoretical framework explaining the good performance is somewhat lacking, in contrast to e.g. SVMs. Whilst maximising the average margin instead of the smallest margin does not seem to be a big difference, most up-to-date theoretical guarantees are not applicable. Two possible ways to derive generalisation error bounds for KFDA, based on stability and algorithmic luckiness, were described in [168]. Note, however, that through the use of greedy methods such as the one described in Section 3.2.1, we were able to produce generalisation error bounds relying on the compression scheme introduced by the Matching Pursuit (MP) algorithm. This raises the possibility that the theoretical analysis could also be extended to the multiview setting if the same MP framework were applied. Similarly, it was shown in Section 3.4.1 that by working in the space defined by the Nyström projection we are still able to learn efficiently, and it should be straightforward to verify that this is still true when performing multiple Nyström projections in multiple views.

6.2.2 Nonlinear Dynamics of Chaotic and Stochastic Systems

There is an emerging field of nonlinear multivariate time series analysis of neuropsychological signals. Multivariate time series analysis is used in neurophysiology with the aim of studying simultaneously recorded signals from different spatial locations. Until recently, the methods have focussed on searching for linear dependencies (e.g. cross power, cross phase, coherence). Recently the theory of nonlinear dynamical systems ("chaos theory") has increasingly been employed to study the pattern formation of complex neuronal networks in the brain [169, 170]. One approach to nonlinear time series analysis consists of reconstructing, from time series of EEG or MEG recordings, the attractors of the underlying dynamical system. These attractors can be characterised in various different ways (e.g. correlation dimension, Lyapunov exponents), which in turn can act as features for the application of Machine Learning methods. Here, in the case of the analysis of the brain as a dynamical system, we are interested in nonlinear continuous autonomous conservative systems. At present it is of no great benefit to analyse this any deeper, as we will be attempting to derive the properties of the dynamical system (the brain) from a temporal series of empirical measurements (EEG data). As such we will not be explicitly creating systems of differential equations or any other such mathematical models. This model-free approach requires that extreme care is taken in the interpretation of results, as factors such as experimental noise can introduce dramatic effects.
As mentioned before, we will not be constructing explicit mathematical models of the brain's dynamics. Instead, we will be using empirical time series data from EEG recordings and attempting to reconstruct the dynamics of the system in reverse. There are several steps that need to be taken in order to achieve this. Firstly, the time series data must be embedded into "phase space"; there are methods for achieving this, known as temporal and spatial embedding. Once the data has been embedded into phase space, the process of characterising the reconstructed attractors can begin. While it is outside the present scope to define these techniques formally, an overview is given below.
The methods that are of interest here are statistical measures, such as the correlation dimension, Lyapunov exponents, and entropy. Each of these attempts to characterise the statistical nature of the attractor, such as the exponential rate of divergence of nearby paths on the attractor in the case of Lyapunov exponents. The first two of these are described below; the nonlinear entropy measure has been excluded for brevity, but may also prove to be useful. The correlation integral is the likelihood that two randomly chosen points of the attractor will be closer than r, as a function of r, and is determined from the distribution of all pairwise distances of points on the attractor. The correlation dimension D_c can be numerically estimated by performing linear regression between log C(r, n) and log r; if the attractor dimension is finite, then as n increases D_c saturates. The exponential instability of chaotic systems is characterised by a spectrum of Lyapunov exponents [171], which are calculated by examining the time evolution of small perturbations of a trajectory; this then allows the linearisation of the evolution operator.
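A minimal sketch of temporal delay embedding followed by a correlation-dimension estimate of the kind described above is given below (an illustrative assumption, not part of the thesis): the slope of log C(r) against log r over a range of radii approximates D_c.

```python
# A minimal sketch of delay embedding and correlation-dimension estimation.
import numpy as np
from scipy.spatial.distance import pdist

def delay_embed(x, dim, tau):
    """Embed a 1-D time series x into `dim`-dimensional phase space with lag `tau`."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def correlation_dimension(x, dim=5, tau=10, n_radii=20):
    pts = delay_embed(np.asarray(x, dtype=float), dim, tau)
    dists = pdist(pts)                                   # all pairwise distances
    positive = dists[dists > 0]
    radii = np.logspace(np.log10(positive.min()), np.log10(np.median(positive)), n_radii)
    C = np.array([np.mean(dists < r) for r in radii])    # correlation integral C(r)
    slope, _ = np.polyfit(np.log(radii), np.log(C + 1e-12), 1)
    return slope                                         # estimate of D_c
```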

A list, taken from [169], of other nonlinear time series methods that have been developed recently, some of which are multivariate, is given below. Clearly there are too many methods to go into detail here, and too many to be able to experiment with all of them. Some in particular, such as phase synchronisation in multivariate systems [172, 173, 174], appear to be well suited to the particular nature of the system we are dealing with (i.e. EEG measurements). Some analysis of this is given below.

• Nonlinear forecasting
• Local deterministic properties of dynamics
• Determination of optimal probability by Gaussian vs deterministic models
• Cross recurrence
• False nearest neighbours
• 'S' statistic for time irreversibility
• Nonlinear cross prediction
• Unstable periodic orbits
• Phase synchronisation
• Phase synchronisation in multivariate systems
• Cross prediction measure of generalised synchronisation
• 'S' measure of generalised synchronisation
• Synchronisation likelihood
• Mutual dimension (shared DOF of 2 dynamical systems)

It would be an interesting line of research to see if any of these methods are capable of producing stable sets of features that can then be employed for pattern recognition tasks. With the framework outlined in this thesis, it would be a simple case of "plug-and-play" to evaluate various different nonlinear multivariate methods for feature extraction from the brain signals.

6.3 One-class Fisher Discriminant Analysis

The problem of detecting outliers is a classical topic in robust statistics. Recent methods to address this problem include One-Class Support Vector Machines (OC-SVM) [175, 3] and One-Class Kernel Fisher Discriminant Analysis (OCC-FDA) [176], where a kernel induced feature space is used to model non-spherical distributions. A natural extension of the mathematical programming approach to KFDA of Mika and colleagues [34, 35, 36] would be to the one-class setting, which can be solved using off-the-shelf optimisers. The approach allows the enforcement of sparsity through an ℓ1-norm constraint on the weight vector. Estimation of the boundary positions could be performed by calculating the quantiles of the posterior probability, which in turn are derived from the conditional class density of the single positive class. This method is simpler to compute and more intuitive than the (non-convex) method proposed in [176]. Adjustments to the size of the enclosing hypersphere can then be made using different quantile values adjusted by a single parameter. In fact, one could also naturally extend the MFDA described in Section 3.5.2 to this setting, which would result in a novel multiview one-class algorithm.

6.4 Summary and Conclusions

This thesis detailed theoretical and empirical work drawing from two main subject areas: Machine Learning (ML) and Digital Signal Processing (DSP). A unified general framework was given for the application of sparse machine learning methods to multivariate signal processing (Chapter 3). In particular, methods that enforce sparsity were employed for reasons of computational efficiency, regularisation, and compressibility. The methods presented can be seen as modular building blocks that can be applied to a variety of applications. Application specific prior knowledge can be used in various ways, resulting in a flexible and powerful set of tools. The motivation for the methods is to be able to learn and generalise from a set of multivariate signals. In addition to testing on benchmark datasets, a series of empirical evaluations on real world datasets were carried out. These included: the classification of musical genre from polyphonic audio files; a study of how the sampling rate in a digital radar can be reduced through the use of Compressed Sensing (CS); analysis of human perception of different modulations of musical key from Electroencephalography (EEG) recordings; and classification of genre of musical pieces to which a listener is attending from Magnetoencephalography (MEG) brain recordings. These applications demonstrate the efficacy of the framework and highlight interesting directions of future research.

Appendix A

Mathematical Addenda

Sets
Z          Integers
R          Real numbers
R+         Positive real numbers
C          Complex numbers
|∆|        Cardinality of set ∆

Spaces
H          Hilbert space
F          Feature space
L1(R)      Functions such that ∫ |f(t)| dt < ∞
L2(R)      Finite energy functions ∫ |f(t)|² dt < ∞
ℓ1(R)      Vector space of absolutely convergent series
ℓ2(R)      Vector space of square summable sequences
⟨·,·⟩      Inner product
‖f‖₁       ℓ1 or L1 norm
‖f‖₂       Euclidean or Hilbert space norm
‖A‖_F      Frobenius norm of matrix A

Scalars, vectors, and matrices
x ∈ Rⁿ              Examples
y ∈ {−1, 1}         Labels (for classification)
y ∈ R               Labels (for regression)
X = (x₁,...,x_m)′   Inputs as row vectors
y                   Outputs as a vector
X                   The space of all possible inputs
Y                   The space of all possible outputs
S ∼ {X × Y}ⁿ        A set of input output pairs drawn i.i.d. from a fixed but unknown distribution
R ∈ Rⁿ              Vector of i.i.d. random variables with mean 0 and variance σ²
A′                  Transpose of matrix A
A†                  Moore-Penrose pseudo-inverse of matrix A
Σ = X′X             Covariance matrix
G = XX′             Gram matrix
w                   Primal weight vector
α                   Dual weight vector
e                   Unit vector
1                   Vector of all ones
I                   Identity matrix
K                   Kernel matrix with entries K[i, j] = ⟨φ(x_i), φ(x_j)⟩
K[:, i]             ith column of K
i = {i₁,...,i_k}    Set of indices
K[i, i]             Square matrix defined by index set i
ξ ∈ Rⁿ              Vector of slack variables
γ                   Margin
ǫ                   Epsilon (small value)

Functions
φ(x)       Feature map
κ          Kernel function
L          Loss function

Probability
Pr(x)      Probability of event x
E[x]       Expected value of x
R          (True) Risk
R̂          Empirical Risk

Table A.1: Table of commonly used mathematical symbols

Appendix B

Acronyms

AdaBoost   Adaptive Boosting
ADC        Analogue to Digital Conversion
AIC        Analogue to Information Conversion
AP         Average Precision
AR         Autoregression
ARMA       Autoregressive Moving Average
BER        Balanced Error Rate
BP         Basis Pursuit
CCA        Canonical Correlation Analysis
CDF        cumulative distribution function
CS         Compressed Sensing
CV         cross-validation
DC         Difference of Convex
DELVE      Data for Evaluating Learning in Valid Experiments
DFT        Discrete Fourier Transform
DPSS       Discrete Prolate Spheroidal Sequences
DSP        Digital Signal Processing
DTW        Dynamic Time Warping
ECOC       Error-Correcting Output Codes
EEG        Electroencephalography
EOG        Electrooculogram
ERP        Event-Related Potential
FDA        Fisher Discriminant Analysis
FDR        Fisher Discriminant Ratio
FFT        Fast Fourier Transform
FIL        Functional Imaging Laboratory
FM         Frequency Modulated
FMCW       Frequency Modulated Continuous Wave
fMRI       functional Magnetic Resonance Imaging
GLM        General Linear Model
GVSM       Generalised Vector Space Model
i.i.d.     independently and identically distributed
ICA        Independent Components Analysis
IFT        Inverse Fourier Transform
ISMIR      International Conference on Music Information Retrieval
KBP        Kernel Basis Pursuit
KMP        Kernel Matching Pursuit
KCCA       Kernel Canonical Correlation Analysis
KFDA       Kernel Fisher Discriminant Analysis
KPCA       Kernel Principal Components Analysis
KPFP       Kernel Polytope Faces Pursuit
KRR        Kernel Ridge Regression
LARS       Least Angle Regression Solver
LASSO      Least Absolute Shrinkage and Selection Operator
LOS        Line Of Sight
LPBoost    Linear Programming Boosting
LPC        Linear Predictive Coefficients
LPCE       Correlation Coefficient
MCMC       Markov Chain Monte Carlo
MEG        Magnetoencephalography
MFCC       Mel Frequency Cepstral Coefficients
MFDA       Multiview Fisher Discriminant Analysis
MI         Mutual Information
MIDI       Musical Instrument Digital Interface
MIREX      Music Information Retrieval Evaluation eXchange
MKL        Multiple Kernel Learning
ML         Machine Learning
MMSE       (Mean) Mean Squared Error
MP         Matching Pursuit
MPKFDA     Matching Pursuit Kernel Fisher Discriminant Analysis
MP3        MPEG-1 Audio Layer 3
MSL        Multi-Source Learning
MTL        Multi-Task Learning
MVL        Multi-View Learning
NBF        Naïve Bayes Probabilistic Label Fusion
NFDA       Nyström KFDA
NIPS       Neural Information Processing Systems
NRR        Nyström KRR
NSVM       Nyström SVM
OC-SVM     One-Class Support Vector Machine
OMP        Orthogonal Matching Pursuit
PAC        probably approximately correct
PASCAL     Pattern Analysis, Statistical Modelling and Computational Learning
PCA        Principal Components Analysis
PET        Positron Emission Tomography
PFP        Polytope Faces Pursuit
PMFDA      Private Multiview Fisher Discriminant Analysis
PNN        Projected Nearest Neighbours
PRF        Pulse Repetition Frequency
RBF        Radial Basis Function
RCC        Real Cepstral Coefficients
RKHS       Reproducing Kernel Hilbert Spaces
RIP        Restricted Isometry Property
RR         Ridge Regression
RVM        Relevance Vector Machine
SAR        Synthetic Aperture Radar
SCCA       Sparse Canonical Correlation Analysis
SD         Standard Deviation
SDP        Semi-Definite Programme
SIFT       Scale Invariant Feature Transformation
SLT        Statistical Learning Theory
SMFDA      Sparse Multiview Fisher Discriminant Analysis
SMO        Sequential Minimal Optimisation
SNR        Signal to Noise Ratio
SPSD       symmetric positive semi-definite
SRM        Structural Risk Minimisation
SVM        Support Vector Machine
UCI        University of California, Irvine
UCL        University College London
UWB        Ultra Wide Band
VC         Vapnik-Chervonenkis
VOC        Visual Object Classes
WAVE       Waveform Audio File Format
ZCR        Zero Crossing Rate

Bibliography

[1] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[2] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.

[3] John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, U.K., 2004.

[4] F. Girosi. An equivalence between sparse approximation and support vector machines. Neural Computation, 10(6):1455–1480, 1998.

[5] Ayhan Demiriz, Kristin P. Bennett, and John Shawe-Taylor. Linear programming boosting via column generation. Machine Learning, 46(1–3):225–254, 2002.

[6] Vincent Guigue, Alain Rakotomamonjy, and Stephane Canu. Kernel basis pursuit. In Proceedings of the 16th European Conference on Machine Learning, 2005.

[7] Christopher K. I. Williams and Matthias Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems, volume 13, pages 682–688. MIT Press, 2001.

[8] Alfred M. Bruckstein, David L. Donoho, and Michael Elad. From sparse solutions of systems of equations to sparse modeling of signals and images, 2007.

[9] E. Candès, M. Wakin, and S. Boyd. Enhancing sparsity by reweighted ℓ1 minimization. Journal of Fourier Analysis and Applications, 14(5–6):877–905, December 2008.

[10] Sami Kirolos, Jason Laska, Michael Wakin, Marco Duarte, Dror Baron, Tamer Ragheb, Yehia Massoud, and Richard Baraniuk. Analog-to-information conversion via random demodulation. In Design, Applications, Integration and Software, 2006 IEEE Dallas/CAS Workshop on, pages 71–74, Oct. 2006.

[11] T. Diethe and J. Shawe-Taylor. Linear programming boosting for classification of musical genre. Technical report, Presented at the NIPS 2007 workshop Music, Brain & Cognition, 2007. BIBLIOGRAPHY 148

[12] T. Diethe, S. Durrant, J. Shawe-Taylor, and H. Neubauer. Semantic dimensionality reduction for the classification of EEG according to musical tonality. Technical report, Presented at the NIPS 2008 workshop Learning from Multiple Sources, 2008.

[13] T. Diethe, D.R. Hardoon, and J. Shawe-Taylor. Multiview fisher discriminant analysis. Technical report, Presented at the NIPS 2008 workshop Learning from Multiple Sources, 2008.

[14] T. Diethe, S. Durrant, J. Shawe-Taylor, and H. Neubauer. Detection of changes in patterns of brain activity according to musical tonality. In Proceedings of IASTED 2009 Artificial Intelligence and Applications, 2009.

[15] T. Diethe and Z. Hussain. Matching pursuit kernel fisher discriminant analysis. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics 2009 (5), pages 121–128, 2009.

[16] T. Diethe, G. Teodoru, N. Furl, and J. Shawe-Taylor. Sparse multiview methods for classification of musical genre from magnetoencephalography recordings. In J. Louhivuori, T. Eerola, S. Saarikallio, T. Himberg, and P-S. Eerola, editors, Proceedings of the 7th Triennial Conference of the European Society for the Cognitive Sciences of Music (ESCOM 2009), Jyväskylä, Finland, 2009.

[17] T. Diethe and Z. Hussain. Kernel polytope faces pursuit. In ECML/PKDD (1), pages 290–301, 2009.

[18] G.E. Smith, T. Diethe, Z. Hussain, J. Shawe-Taylor, and D.R. Hardoon. Compressed sampling for pulse doppler radar. In Proceedings of the IEEE International Radar Conference RADAR2010, Washington, USA, 2010.

[19] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

[20] Jerome H. Friedman. On bias, variance, 0/1—loss, and the curse-of-dimensionality. Data Min. Knowl. Discov., 1(1):55–77, 1997.

[21] Colin McDiarmid. On the method of bounded differences. Surveys in Combinatorics, 141:148–188, 1989.

[22] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58:13–30, 1963.

[23] H. Chernoff. A measure of asymptotic efficiency of tests of a hypothesis based on the sum of observations. Annals of Mathematical Statistics, 23:493–507, 1952.

[24] V. Vapnik and A. Chervonenkis. Theory of Pattern Recognition [in Russian]. Nauka, Moscow, 1974. (German Translation: W. Wapnik & A. Tscherwonenkis, Theorie der Zeichenerkennung, Akademie-Verlag, Berlin, 1979). BIBLIOGRAPHY 149

[25] V. Vapnik and A. Chervonenkis. Necessary and sufficient conditions for the uniform convergence of means to their expectations. Theory of Probability and its Applications, 26(3):532–553, 1981.

[26] V. Vapnik. Inductive principles of statistics and learning theory. In P. Smolensky, M. C. Mozer, and D. E. Rumelhart, editors, Mathematical Perspectives on Neural Networks. Lawrence Erlbaum, Mahwah, NJ, 1995.

[27] A. N. Tikhonov and V. Y. Arsenin. Solutions of Ill-posed Problems. W. H. Winston, Washington, D.C., 1977.

[28] R. L. Plackett. Some theorems in least squares. Biometrika, 37(1-2):149–157, 1950.

[29] David L. Donoho. For most large underdetermined systems of linear equations the minimal ℓ1-norm solution is also the sparsest solution. Comm. Pure Appl. Math, 59:797–829, 2004.

[30] R. Tibshirani. Regression selection and shrinkage via the lasso. Technical report, Department of Statistics, University of Toronto, June 1994. ftp://utstat.toronto.edu/pub/tibs/lasso.ps.

[31] Scott Shaobing Chen, David L. Donoho, and Michael A. Saunders. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1):33–61, 1998.

[32] Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani. Least angle regression. Annals of Statistics, 32:407–499, 2002.

[33] Michael R. Osborne, Brett Presnell, and Berwin A. Turlach. On the lasso and its dual. Journal of Computational and Graphical Statistics, 9:319–337, 1999.

[34] Sebastian Mika, Gunnar Rätsch, Jason Weston, Bernhard Schölkopf, and Klaus-Robert Müller. Fisher Discriminant Analysis with kernels. In Y. H. Hu, J. Larsen, E. Wilson, and S. Douglas, editors, Proc. NNSP'99, pages 41–48. IEEE, 1999.

[35] S. Mika, G. Rätsch, and K.-R. Müller. A mathematical programming approach to the kernel Fisher algorithm. In T.K. Leen, T.G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems, volume 13, pages 591–597, 2001.

[36] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. Smola, and K-R. Müller. Constructing descriptive and discriminative nonlinear features: Rayleigh coefficients in kernel feature spaces. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(5):623–628, 2003.

[37] Alex J. Smola and Bernhard Schölkopf. Sparse greedy matrix approximation for machine learning. In Proceedings of the 17th International Conference on Machine Learning, pages 911–918. Morgan Kaufmann, San Francisco, CA, 2000.

[38] Corinna Cortes and Vladimir Vapnik. Support-vector networks. Machine Learning, 20(3):273– 297, 1995. BIBLIOGRAPHY 150

[39] J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 185–208, Cambridge, MA, 1999. MIT Press.

[40] Yoav Freund. Boosting a weak learning algorithm by majority. In COLT: Proceedings of the Workshop on Computational Learning Theory, Morgan Kaufmann Publishers, 1990.

[41] Y. Freund and R.E. Schapire. Experiments with a new boosting algorithm. In Machine Learning: Proceedings of the Thirteenth International Conference, pages 148–156, 1996.

[42] R.E. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37:297–336, 1999.

[43] T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32(1):56–85, 2004.

[44] G.B. Dantzig. Linear Programming and Extensions. Princeton University Press, Princeton, 1963.

[45] G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 42(3):287–320, 2001.

[46] Saharon Rosset, Ji Zhu, and Trevor Hastie. Margin maximizing loss functions. In Advances in Neural Information Processing Systems (NIPS) 15. MIT Press, 2003.

[47] Gunnar Rätsch and Manfred K. Warmuth. Efficient margin maximizing with boosting. Journal of Machine Learning Research, 6:2131–2152, 2005.

[48] Manfred K. Warmuth, Karen A. Glocer, and Gunnar Rätsch. Boosting algorithms for maximizing the soft margin. In NIPS, 2007.

[49] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Technical Report 44, Max-Planck-Institut für biologische Kybernetik, 1996.

[50] H. Hotelling. Relations between two sets of variates. Biometrika, 28:312–377, 1936.

[51] A. Vinokourov, J. Shawe-Taylor, and N. Cristianini. Inferring a semantic representation of text via cross-language correlation analysis. In Advances in Neural Information Processing Systems 15, 2002.

[52] D.R. Hardoon, S. Szedmak, and J. Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639–2664, 2004.

[53] David R. Hardoon and John Shawe-Taylor. Sparse canonical correlation analysis. Technical report, University College London, UK, 2007.

[54] L. Eldén and L. Wittmeyer-Koch. Numerical Analysis: An Introduction. Academic Press, Cambridge, 1990.

[55] A. Haar. Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen, 69:331–371, 1910.

[56] Stéphane Mallat and Zhifeng Zhang. Matching pursuit with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993.

[57] R.G. Baraniuk and D.L. Jones. Shear madness: New orthonormal bases and frames using chirp functions. IEEE Trans. Signal Processing: Special Issue on Wavelets in Signal Processing, 41:3543–3548, 1993.

[58] Ingrid Daubechies. Time-frequency localization operators: A geometric phase space approach. IEEE Transactions on Information Theory, 34(4):605–612, 1988.

[59] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, February 2006.

[60] D.L. Donoho. Neighborly polytopes and sparse solution of underdetermined linear equations. Technical report, Department of Statistics, Stanford Univ., Stanford, CA, 2005.

[61] M. D. Plumbley. Recovery of sparse representations by polytope faces pursuit. In ICA, pages 206–213, 2006.

[62] R. Horst and N. V. Thoai. DC programming: Overview. Journal of Optimization Theory and Applications, 103(1):1–43, 1999.

[63] Gilles Gasso, Alain Rakotomamonjy, and Stéphane Canu. Recovering sparse signals with a certain family of non-convex penalties and DC programming. IEEE Transactions on Signal Processing, in press.

[64] Yagyensh Chandra Pati, Ramin Rezaiifar, and P.S. Krishnaprasad. Orthogonal matching pursuit: Recursive function approximation with applications to wavelet decomposition. In Proceedings of the 27th Annual Asilomar Conference on Signals, Systems, and Computers, pages 40–45, 1993.

[65] Pascal Vincent and Yoshua Bengio. Kernel matching pursuit. Machine Learning, 48(1-3):165–187, 2002.

[66] Gene H. Golub and Charles F. Van Loan. Matrix Computations (Johns Hopkins Studies in Mathematical Sciences). The Johns Hopkins University Press, 3rd edition, October 1996.

[67] Mark D. Plumbley. Polar polytopes and recovery of sparse representations, 2005.

[68] S. Chen. Basis Pursuit. PhD thesis, Department of Statistics, Stanford University, November 1995.

[69] David L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, April 2006.

[70] E. Candès and T. Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Transactions on Information Theory, 52(12):5406–5425, December 2006.

[71] David L. Donoho and Michael Elad. On the stability of the basis pursuit in the presence of noise. Signal Process., 86(3):511–532, 2006.

[72] Aapo Hyvärinen. Independent component analysis in the presence of Gaussian noise by maximizing joint likelihood. Neurocomputing, 22:49–67, 1998.

[73] M. Davy, C. Doncarli, and J.-Y. Tourneret. Supervised classification using MCMC methods. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1, pages 33–36, 2000.

[74] George Edward Pelham Box and Gwilym Jenkins. Time Series Analysis, Forecasting and Control. Holden-Day, Incorporated, 1990.

[75] J. Bergstra, N. Casagrande, D. Erhan, D. Eck, and B. Kégl. Aggregate features and AdaBoost for music classification. Machine Learning, 65(2-3):473–484, 2006.

[76] Stephen Becker, Jerome Bobin, and Emmanuel Candès. NESTA: A fast and accurate first-order method for sparse recovery. April 2009.

[77] Peng Zhao and Bin Yu. On model selection consistency of lasso. Journal of Machine Learning Research, 7:2541–2563, 2006.

[78] Francis R. Bach. Consistency of the group lasso and multiple kernel learning. Journal of Machine Learning Research, 9:1179–1225, 2008.

[79] Cédric Archambeau and Francis Bach. Sparse probabilistic projections. In NIPS, pages 73–80, 2008.

[80] Michael E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211–244, 2001.

[81] Theodoros Evgeniou, Charles A. Micchelli, and Massimiliano Pontil. Learning multiple tasks with kernel methods. Journal of Machine Learning Research, 6:615–637, 2005.

[82] Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Multi-task feature learning. In NIPS, pages 41–48, 2006.

[83] Jaisiel Madrid-Sánchez, Miguel Lázaro-Gredilla, and Aníbal R. Figueiras-Vidal. A single layer perceptron approach to selective multi-task learning. In IWINAC '07: Proceedings of the 2nd International Work-Conference on the Interplay Between Natural and Artificial Computation, Part I, pages 272–281, Berlin, Heidelberg, 2007. Springer-Verlag.

[84] Manos Papagelis, Dimitris Plexousakis, and Themistoklis Kutsuras. Alleviating the sparsity problem of collaborative filtering using trust inferences. In P. Herrmann et al., editors, iTrust, volume 3477 of LNCS, pages 224–239. Springer-Verlag, Berlin, Heidelberg, 2005.

[85] John Shawe-Taylor and Nello Cristianini. Estimating the moments of a random vector. In Proceedings of GRETSI 2003 Conference, volume 1, pages 47–52, 2003.

[86] Gert R. G. Lanckriet, Laurent El Ghaoui, Chiranjib Bhattacharyya, and Michael I. Jordan. A robust minimax approach to classification. Journal of Machine Learning Research, 3:555–582, 2003.

[87] Nick Littlestone and Manfred Warmuth. Relating data compression and learnability. Technical report, University of California Santa Cruz, 1986.

[88] Zakria Hussain. Sparsity in Machine Learning: Theory and Practice. PhD thesis, University College London, 2008.

[89] Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, November 1984.

[90] Peter L. Bartlett and Ambuj Tewari. Sparseness vs estimating conditional probabilities: Some asymptotic results. Journal of Machine Learning Research, 8:775–790, 2007.

[91] Y. Saad. Projection and deflation methods for partial pole assignment in linear state feedback. IEEE Trans. Automat. Contr., 33:290–297, 1988.

[92] Isabelle Guyon, Asa Ben-Hur, Steve Gunn, and Gideon Dror. Result analysis of the NIPS 2003 feature selection challenge. In Advances in Neural Information Processing Systems 17, pages 545–552. MIT Press, 2004.

[93] J. A. Tropp, A. C. Gilbert, and M. J. Strauss. Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit. Signal Processing, special issue "Sparse approximations in signal and image processing", 86:572–588, 2006.

[94] D. Pollard. Convergence of Stochastic Processes. Springer-Verlag, New York, 1984.

[95] Tong Zhang. Covering number bounds of certain regularized linear function classes. Journal of Machine Learning Research, 2:527–550, 2002.

[96] M. Anthony and P. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.

[97] J. Shawe-Taylor, M. Anthony, and N. L. Biggs. Bounding sample size with the Vapnik-Chervonenkis dimension. Discrete Applied Mathematics, 42(1):65–73, 1993.

[98] M. Ouimet and Y. Bengio. Greedy spectral embedding. In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, pages 253–260, 2005.

[99] C. Baker. The Numerical Treatment of Integral Equations. Clarendon Press, Oxford, 1977.

[100] S. Kumar, M. Mohri, and A. Talwalkar. Sampling techniques for the Nyström method. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, AISTATS, volume 5 of JMLR: W&CP, 2009.

[101] A. Blum. Random projections, margins, kernels, and feature-selection. LNCS, 3940:52–68, 2006.

[102] M.-F. Balcan, A. Blum, and S. Vempala. Kernels as features: On kernels, margins, and low-dimensional mappings. Machine Learning, 65:79–94, 2006.

[103] Kai Zhang, Ivor W. Tsang, and James T. Kwok. Improved Nyström low-rank approximation and error analysis. In ICML '08: Proceedings of the 25th International Conference on Machine Learning, pages 1232–1239, New York, NY, USA, 2008. ACM.

[104] Petros Drineas and Michael W. Mahoney. On the Nyström method for approximating a Gram matrix for improved kernel-based learning. Journal of Machine Learning Research, 6:2153–2175, 2005.

[105] Francis R. Bach, Gert R. G. Lanckriet, and Michael I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In ICML '04: Proceedings of the Twenty-First International Conference on Machine Learning, page 6, New York, NY, USA, 2004. ACM.

[106] D.R. Hardoon and J. Shawe-Taylor. KCCA for different level precision in content-based image retrieval. In Proceedings of Third International Workshop on Content-Based Multimedia Indexing, 2003.

[107] J.D.R. Farquhar, D.R. Hardoon, H. Meng, J. Shawe-Taylor, and S. Szedmak. Two view learning: SVM-2K, theory and practice. In Y. Weiss, B. Schölkopf, and J. Platt, editors, Advances in Neural Information Processing Systems 18, pages 355–362. MIT Press, Cambridge, MA, 2006.

[108] Seung-Jean Kim, Alessandro Magnani, and Stephen Boyd. Optimal kernel selection in kernel Fisher discriminant analysis. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, pages 465–472, New York, NY, USA, 2006. ACM.

[109] C. M. Christoudias, R. Urtasun, and T. Darrell. Multi-view learning in the presence of view disagreement. In Proceedings of Conference on Uncertainty in Artificial Intelligence (UAI), 2008.

[110] Tonatiuh Peña Centeno and Neil D. Lawrence. Optimising kernel parameters and regularisation coefficients for non-linear discriminant analysis. Journal of Machine Learning Research, 7:455–491, 2006.

[111] Mark Girolami and Simon Rogers. Hierarchic Bayesian models for kernel learning. In ICML, pages 241–248, 2005.

[112] Ludmila I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004.

[113] Josef Kittler, Mohamad Hatef, Robert P. W. Duin, and Jiri Matas. On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20:226–239, 1998.

[114] S. Mika, A. J. Smola, and B. Schölkopf. An improved training algorithm for kernel Fisher discriminants. In AISTATS, pages 98–104, 2001.

[115] Gayle Leen and Colin Fyfe. Learning shared and separate features of two related data sets using GPLVMs. Technical report, Presented at the NIPS 2008 workshop Learning from Multiple Sources, 2008.

[116] V. Viitaniemi and J. Laaksonen. Techniques for image classification, object detection and object segmentation applied to VOC challenge 2007. Technical report, Department of Information and Computer Science, Helsinki University of Technology (TKK), 2008.

[117] David G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the 7th IEEE International Conference on Computer Vision, pages 1150–1157, Kerkyra, Greece, 1999.

[118] Janaina Mourão-Miranda, Emanuelle Reynaud, Francis McGlone, Gemma Calvert, and Michael Brammer. The impact of temporal compression and space selection on SVM analysis of single-subject and multi-subject fMRI data. NeuroImage, 33(4):1055–1065, 2006.

[119] Juan José Burred and Alexander Lerch. A hierarchical approach to automatic musical genre classification. In Proceedings of the 6th International Conference on Digital Audio Effects, September 2003.

[120] Tao Li and M. Ogihara. Music genre classification with taxonomy. In IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[121] Stefan Brecheisen, Hans-Peter Kriegel, Peter Kunath, and Alexey Pryakhin. Hierarchical genre classification for large music collections. In IEEE 7th International Conference on Multimedia & Expo, 2006.

[122] B. Whitman, G. Flake, and S. Lawrence. Artist detection in music with Minnowmatch. In Proceedings of the 2001 IEEE Workshop on Neural Networks for Signal Processing, pages 559–568, 2001.

[123] D. Perrott and R. O. Gjerdingen. Scanning the dial: an exploration of factors in the identification of musical style. In Proceedings of the 1999 Society for Music Perception and Cognition, 1999.

[124] A. Meng. Temporal Feature Integration for Music Organisation. PhD thesis, Informatics and Mathematical Modelling, Technical University of Denmark, DTU, Richard Petersens Plads, Building 321, DK-2800 Kgs. Lyngby, 2006. Supervised by Jan Larsen and Lars Kai Hansen, IMM.

[125] P. Ahrendt and A. Meng. Music genre classification using the multivariate AR feature integration model, August 2005. Extended abstract.

[126] A. Flexer. Statistical evaluation of music information retrieval experiments. Journal of New Music Research, 35(2):113–120, 2006.

[127] S. Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1998.

[128] J. Junqua and J. Haton. Robustness in Automatic Speech Recognition. Boston: Kluwer Academic, 1996.

[129] B. Kedem. Spectral analysis and discrimination by zero-crossings. Proceedings of the IEEE, 74(11):1477–1493, 1986.

[130] Monson H. Hayes. Statistical Digital Signal Processing and Modeling. John Wiley & Sons, 1996.

[131] Jurij Leskovec and John Shawe-Taylor. Linear programming boosting for uneven datasets. In International Conference on Machine Learning, 2003.

[132] Thomas G. Dietterich and Ghulum Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of Artificial Intelligence Research, 2:263–286, 1995.

[133] K. Crammer and Y. Singer. On the algorithmic implementation of multi-class SVMs. Journal of Machine Learning Research, 2:265–292, 2001.

[134] Z. Wang and J. Shawe-Taylor. Large-margin structured prediction via linear programming. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, pages 599–606, Clearwater Beach, Florida, USA, 2009.

[135] M. Goto, H. Hashiguchi, T. Nishimura, and R. Oka. RWC Music Database: Popular, classical and jazz music databases. In Proceedings of the International Conference on Music Information Retrieval, pages 287–288, 2002.

[136] A. Meng and J. Shawe-Taylor. An investigation of feature models for music genre classification using the support vector classifier. In International Conference on Music Information Retrieval, pages 604–609, September 2005.

[137] P. Ahrendt, C. Goutte, and J. Larsen. Co-occurrence models in music genre classification. In V. Calhoun, T. Adali, J. Larsen, D. Miller, and S. Douglas, editors, IEEE International Workshop on Machine Learning for Signal Processing, pages 247–252, Mystic, Connecticut, USA, September 2005.

[138] Lei Zhang, Mengdao Xing, Cheng-Wei Qiu, Jun Li, and Zheng Bao. Achieving higher resolution ISAR imaging with limited pulses via compressed sampling. IEEE Geoscience and Remote Sensing Letters, 6(3):567–571, July 2009.

[139] R. Baraniuk and P. Steeghs. Compressive radar imaging. In IEEE Radar Conference, pages 128–133, April 2007.

[140] S. Bhattacharya, T. Blumensath, B. Mulgrew, and M. Davies. Synthetic aperture radar raw data encoding using compressed sensing. In IEEE Radar Conference (RADAR '08), pages 1–5, May 2008.

[141] S. Bhattacharya, T. Blumensath, B. Mulgrew, and M. Davies. Fast encoding of synthetic aperture radar raw data using compressed sensing. In IEEE/SP 14th Workshop on Statistical Signal Processing (SSP '07), pages 448–452, August 2007.

[142] M. A. Herman and T. Strohmer. High-resolution radar via compressed sensing. IEEE Transactions on Signal Processing, 57(6):2275–2284, June 2009.

[143] Guangming Shi, Jie Lin, Xuyang Chen, Fei Qi, Danhua Liu, and L. Zhang. UWB echo signal detection with ultra-low rate sampling based on compressed sensing. IEEE Transactions on Circuits and Systems II: Express Briefs, 55(4):379–383, April 2008.

[144] Yeo-Sun Yoon and M. G. Amin. Imaging of behind the wall targets using wideband beamforming with compressive sensing. In IEEE/SP 15th Workshop on Statistical Signal Processing (SSP '09), pages 93–96, September 2009.

[145] T. E. Derham, S. Doughty, K. Woodbridge, and C. J. Baker. Design and evaluation of a low-cost multistatic netted radar system. IET Radar, Sonar & Navigation, 1(5):362–368, 2007.

[146] Hiroaki Sakoe and Seibi Chiba. Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26:43–49, 1978.

[147] Lawrence Rabiner and Biing-Hwang Juang. Fundamentals of speech recognition. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1993.

[148] J. V. Odom, M. Bach, C. Barber, M. Brigell, M. F. Marmor, A. P. Tormene, G. E. Holder, and Vaegan. Visual evoked potentials standard (2004). Documenta Ophthalmologica, 108(2):115–123, 2004.

[149] P. Heil. Representation of sound onsets in the auditory system. Audiology and Neurootology, 6:167–172, 2001.

[150] J. Mourão-Miranda, A.L.W. Bokde, C. Born, H. Hampel, and M. Stetter. Classifying brain states and determining the discriminating activation patterns: Support Vector Machine on functional MRI data. NeuroImage, 4:980–995, 2005.

[151] S. Durrant, D.R. Hardoon, A. Brechmann, J. Shawe-Taylor, E. Miranda, and H. Scheich. GLM and SVM analyses of neural response to tonal and atonal stimuli: New techniques and a comparison. Special Issue on Music, Brain and Cognition, 2009.

[152] P. Janata, J.L. Birk, J.D. Van Horn, M. Leman, B. Tillmann, and J.J. Bharucha. The cortical topography of tonal structures underlying Western music. Science, 298:2167–2170, 2002.

[153] R.J. Croft, J.S. Chandler, R.J. Barry, and N.R. Cooper. EOG correction: A comparison of four methods. Psychophysiology, 42:16–24, 2005.

[154] G. Gratton, M. G. H. Coles, and E. Donchin. A new method for off-line removal of ocular artifact. Electroencephalography and Clinical Neurophysiology, 55:468–484, 1983.

[155] C.W. Pleydell-Pearce, S.E. Whitecross, and B.T. Dickson. Multivariate analysis of EEG: Predicting cognition on the basis of frequency decomposition, inter-electrode correlation, coherence, cross phase and cross power. In Proceedings of the 36th Hawaii International Conference on System Sciences (HICSS03), 2003.

[156] M.A. Conway, C.W. Pleydell-Pearce, and S.E. Whitecross. The neuroanatomy of autobiographical memory: A slow cortical potential study of autobiographical memory retrieval. Journal of Memory and Language, 45:493–524, 2001.

[157] Daniel Ruchkin. EEG coherence. International Journal of Psychophysiology, 57(2):83–85, 2005.

[158] D.B. Percival and A.T. Walden. Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques. Cambridge: Cambridge University Press, 1993.

[159] J.C. Shaw. An introduction to the coherence function and its use in EEG signal analysis. Journal of Medical Engineering & Technology, 5:279–288, 1981.

[160] J.C. Shaw. Correlation and coherence analysis of the EEG: a selective tutorial review. International Journal of Psychophysiology, 1:255–266, 1984.

[161] T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning, chapter 11, pages 169–184. MIT Press, Cambridge, MA, 1999.

[162] F. Bach and M. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002.

[163] D. Deutsch and J. Feroe. The internal representation of pitch sequences in tonal music. Psychological Review, 88:503–522, 1981.

[164] Simon Durrant, David R. Hardoon, Eduardo R. Miranda, John Shawe-Taylor, and André Brechmann. Neural correlates of tonality in music. Technical report, Presented at the NIPS 2007 workshop Music, Brain & Cognition, 2007.

[165] Rami Mangoubi, Mukund Desai, and Paul Sammak. Non-Gaussian methods in biomedical imaging. In AIPR '08: Proceedings of the 2008 37th IEEE Applied Imagery Pattern Recognition Workshop, pages 1–6, Washington, DC, USA, 2008. IEEE Computer Society.

[166] T. Eerola and P. Toiviainen. MIDI Toolbox: MATLAB tools for music research. Jyväskylän yliopisto, ISBN 951-39-1796-7. URL: http://www.jyu.fi/musica/miditoolbox/, 2004.

[167] E. Candès and M. Wakin. An introduction to compressive sampling. IEEE Signal Processing Magazine, 25(2):21–30, March 2008.

[168] S. Mika. Kernel Fisher Discriminants. PhD thesis, Technische Universität Berlin, 2002.

[169] C.J. Stam. Nonlinear dynamical analysis of EEG and MEG: Review of an emerging field. Clinical Neurophysiology, 116:2266–2301, 2005.

[170] Ernesto Pereda, Rodrigo Quian Quiroga, and Joydeep Bhattacharya. Nonlinear multivariate analysis of neurophysiological signals. Progress in Neurobiology, 77:1–37, 2005.

[171] J.-P. Eckmann and D. Ruelle. Ergodic theory of chaos and strange attractors. Rev. Mod. Phys., 57:617, 1985.

[172] Joydeep Bhattacharya, Ernesto Pereda, and Hellmuth Petsche. Effective detection of coupling in short and noisy bivariate data. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 33(1):85–95, 2003.

[173] Joydeep Bhattacharya and Hellmuth Petsche. Phase synchrony analysis of EEG during music perception reveals changes in functional connectivity due to musical expertise. Signal Process., 85(11):2161–2177, 2005.

[174] M.G. Rosenblum, A.S. Pikovsky, and J. Kurths. Phase synchronization of chaotic oscillators. Physical Review Letters, 76:1804–1807, 1996.

[175] Larry M. Manevitz and Malik Yousef. One-class SVMs for document classification. Journal of Machine Learning Research, 2:139–154, 2002.

[176] Volker Roth. Kernel Fisher discriminants for outlier detection. Neural Computation, 18(4):942–960, 2006.