Music Similarity: Learning Algorithms and Applications
Total Pages: 16
File Type: PDF, Size: 1020 KB
UNIVERSITY OF CALIFORNIA, SAN DIEGO

More like this: machine learning approaches to music similarity

A dissertation submitted in partial satisfaction of the requirements for the degree
Doctor of Philosophy in Computer Science

by

Brian McFee

Committee in charge:

    Professor Sanjoy Dasgupta, Co-Chair
    Professor Gert Lanckriet, Co-Chair
    Professor Serge Belongie
    Professor Lawrence Saul
    Professor Nuno Vasconcelos

2012

Copyright Brian McFee, 2012. All rights reserved.

The dissertation of Brian McFee is approved, and it is acceptable in quality and form for publication on microfilm and electronically:

    Co-Chair
    Co-Chair

University of California, San Diego
2012

DEDICATION

To my parents. Thanks for the genes, and everything since.

EPIGRAPH

I’m gonna hear my favorite song, if it takes all night.¹
    Frank Black, “If It Takes All Night.”

¹ Clearly, the author is lamenting the inefficiencies of broadcast radio programming.

TABLE OF CONTENTS

Signature Page
Dedication
Epigraph
Table of Contents
List of Figures
List of Tables
Acknowledgements
Vita
Abstract of the Dissertation

Chapter 1  Introduction
    1.1  Music information retrieval
    1.2  Summary of contributions
    1.3  Preliminaries

Chapter 2  Learning multi-modal similarity
    2.1  Introduction
        2.1.1  Contributions
        2.1.2  Preliminaries
    2.2  A graphical view of similarity
        2.2.1  Similarity graphs
        2.2.2  Graph simplification
    2.3  Partial order embedding
        2.3.1  Linear projection
        2.3.2  Non-linear projection via kernels
        2.3.3  Connection to GNMDS
    2.4  Multiple kernel embedding
        2.4.1  Unweighted combination
        2.4.2  Weighted combination
        2.4.3  Concatenated projection
        2.4.4  Diagonal learning
    2.5  Experiments
        2.5.1  Toy experiment: taxonomy embedding
        2.5.2  Musical artist similarity
    2.6  Hardness of dimensionality reduction
    2.7  Conclusion
    2.A  Embedding partial orders
    2.B  Solver
    2.C  Relationship to AUC

Chapter 3  Metric learning to rank
    3.1  Introduction
        3.1.1  Related work
        3.1.2  Preliminaries
    3.2  Structural SVM review
        3.2.1  Optimization
        3.2.2  Ranking with structural SVM
    3.3  Metric learning to rank
        3.3.1  Algorithm
        3.3.2  Implementation
    3.4  Ranking measures
        3.4.1  AUC
        3.4.2  Precision-at-k
        3.4.3  Average Precision
        3.4.4  Mean Reciprocal Rank
        3.4.5  Normalized Discounted Cumulative Gain
    3.5  Experiments
        3.5.1  Classification on UCI data
        3.5.2  eHarmony data
    3.6  Conclusion

Chapter 4  Faster structural metric learning
    4.1  Introduction
    4.2  Structural metric learning
    4.3  Alternating direction optimization
        4.3.1  Dual optimization
        4.3.2  Multiple kernel projection
        4.3.3  Implementation details
    4.4  Experiments
        4.4.1  UCI data
        4.4.2  Multimedia data
    4.5  Conclusion
    4.A  Derivation of eq. (4.11)
    4.B  Axis-aligned learning

Chapter 5  Similarity from a collaborative filter
    5.1  Introduction
        5.1.1  Related work
        5.1.2  Contributions
    5.2  Learning similarity
        5.2.1  Collaborative filters
    5.3  Audio representation
        5.3.1  Codebook training
        5.3.2  (Top-τ) Vector quantization
        5.3.3  Histogram representation and distance
    5.4  Experiments
        5.4.1  Data
        5.4.2  Procedure
        5.4.3  Comparisons
    5.5  Results
    5.6  Conclusion

Chapter 6  Large-scale similarity search
    6.1  Introduction
    6.2  Related work
    6.3  Spatial trees
        6.3.1  Maximum variance KD-tree
        6.3.2  PCA-tree
        6.3.3  2-means
        6.3.4  Random projection
        6.3.5  Spill trees
        6.3.6  Retrieval algorithm and analysis
    6.4  Experiments
        6.4.1  Audio representation
        6.4.2  Representation evaluation
        6.4.3  Tree evaluation
        6.4.4  Retrieval results
        6.4.5  Timing results
    6.5  Conclusion

Chapter 7  Modeling playlists
    7.1  Introduction
    7.2  A brief history of playlist evaluation
        7.2.1  Human evaluation
        7.2.2  Semantic cohesion
        7.2.3  Sequence prediction
    7.3  A natural language approach
        7.3.1  Evaluation procedure
    7.4  Playlist dialects
    7.5  Hyper-graph random walks
        7.5.1  The user model
        7.5.2  The playlist model
        7.5.3  Learning the weights
    7.6  Data collection
        7.6.1  Playlists: Art of the Mix 2011
        7.6.2  Edge features
    7.7  Experiments
        7.7.1  Experiment 1: Does dialect matter?
        7.7.2  Experiment 2: Do transitions matter?
        7.7.3  Experiment 3: Which features matter?
        7.7.4  Example playlists
    7.8  Conclusion

Appendix A  Optimizing Average Precision
    A.1  Cutting plane optimization of AP
    A.2  Most violated constraint DP
        A.2.1  DP algorithm
        A.2.2  Complexity

Appendix B  Online k-Means
    B.1  Introduction
    B.2  Hartigan’s method and online k-means

LIST OF FIGURES

Figure 2.1:  An overview of multi-modal feature integration
Figure 2.2:  Graphical representation of relative comparisons
Figure 2.3:  Two variants of multiple-kernel embedding
Figure 2.4:  The label taxonomy for the experiment in section 2.5.1
Figure 2.5:  Experimental results for section 2.5.1
Figure 2.6:  aset400 embedding results for each of the base kernels
Figure 2.7:  aset400 embedding results with multiple-kernel embedding
Figure 2.8:  The example-kernel weighting learned by algorithm 2.4
Figure 2.9:  t-SNE visualizations of the learned embedding of aset400
Figure 2.10: The effect of constraint processing on embedding accuracy
Figure 2.11: A constraint set that cannot be embedded in R^1
Figure 3.1:  Dimensionality reduction for UCI data sets
Figure 4.1:  Comparison of ITML, LMNN, and MLR on UCI data
Figure 4.2:  Effective rank of learned metrics with noisy dimensions
Figure 4.3:  Comparison of projections for MLR-ADMM and MLR-Proj
Figure 4.4:  Comparison of MLR-ADMM and MLR-Proj for music similarity
Figure 5.1:  Query-by-example retrieval
Figure 5.2:  Vector quantization: hard and top-τ
Figure 5.3:  Schematic diagram of audio similarity learning and retrieval
Figure 5.4:  Retrieval accuracy with vector quantized audio representations
Figure 5.5:  The effective dimensionality of audio codeword histograms
Figure 5.6:  t-SNE visualization of optimized music similarity space
Figure 5.7:  Comparison of VQ-based audio similarity to alternative methods
Figure 6.1:  Splitting data with a spatial partition tree
Figure 6.2:  Splitting data with a spill tree
Figure 6.3:  Semantic annotation performance of the learned audio feature representations
Figure 6.4:  Recall performance for spill trees with different splitting rules
Figure 6.5:  Average time to retrieve k (approximate) nearest