
Ten Lectures and Forty-Two Open Problems in the Mathematics of Data Science

Afonso S. Bandeira

December 2015

Preface

These are notes from a course I gave at MIT in the Fall of 2015, entitled "18.S096: Topics in Mathematics of Data Science". These notes are not in final form and will be continuously edited and/or corrected (as I am sure they contain many typos). Please use at your own risk and do let me know if you find any typo/mistake.

Part of the content of this course is greatly inspired by a course I took from Amit Singer while a graduate student at Princeton. Amit's course was inspiring and influential on my research interests. I can only hope that these notes may one day inspire someone's research in the same way that Amit's course inspired mine.

These notes also include a total of forty-two open problems (now forty-one, since Open Problem 1.3 has in the meantime been solved [MS15]!). This list of problems does not necessarily contain the most important problems in the field (although some will be rather important). I have tried to select a mix of important, perhaps approachable, and fun problems. Hopefully you will enjoy thinking about these problems as much as I do!

I would like to thank all the students who took my course; they were a great and interactive audience! I would also like to thank Nicolas Boumal, Ludwig Schmidt, and Jonathan Weed for letting me know of several typos. Thank you also to Nicolas Boumal, Dustin G. Mixon, Bernat Guillen Pegueroles, Philippe Rigollet, and Francisco Unda for suggesting open problems.

Contents

0.1  List of open problems
0.2  A couple of Open Problems
     0.2.1  Komlós Conjecture
     0.2.2  Matrix AM-GM inequality
0.3  Brief Review of some linear algebra tools
     0.3.1  Singular Value Decomposition
     0.3.2  Spectral Decomposition
     0.3.3  Trace and norm
0.4  Quadratic Forms

1  Principal Component Analysis in High Dimensions and the Spike Model
   1.1  Dimension Reduction and PCA
        1.1.1  PCA as best d-dimensional affine fit
        1.1.2  PCA as d-dimensional projection that preserves the most variance
        1.1.3  Finding the Principal Components
        1.1.4  Which d should we pick?
        1.1.5  A related open problem
   1.2  PCA in high dimensions and Marcenko-Pastur
        1.2.1  A related open problem
   1.3  Spike Models and BBP transition
        1.3.1  A brief mention of Wigner matrices
        1.3.2  An open problem about spike models

2  Graphs, Diffusion Maps, and Semi-supervised Learning
   2.1  Graphs
        2.1.1  Cliques and Ramsey numbers
   2.2  Diffusion Maps
        2.2.1  A couple of examples
        2.2.2  Diffusion Maps of point clouds
        2.2.3  A simple example
        2.2.4  Similar non-linear dimensional reduction techniques
   2.3  Semi-supervised learning
        2.3.1  An interesting experience and the Sobolev Embedding Theorem

3  Spectral Clustering and Cheeger's Inequality
   3.1  Clustering
        3.1.1  k-means Clustering
   3.2  Spectral Clustering
   3.3  Two clusters
        3.3.1  Normalized Cut
        3.3.2  Normalized Cut as a spectral relaxation
   3.4  Small Clusters and the Small Set Expansion Hypothesis
   3.5  Computing Eigenvectors
   3.6  Multiple Clusters

4  Concentration Inequalities, Scalar and Matrix Versions
   4.1  Large Deviation Inequalities
        4.1.1  Sums of independent random variables
   4.2  Gaussian Concentration
        4.2.1  Spectral norm of a Wigner Matrix
        4.2.2  Talagrand's concentration inequality
   4.3  Other useful large deviation inequalities
        4.3.1  Additive Chernoff Bound
        4.3.2  Multiplicative Chernoff Bound
        4.3.3  Deviation bounds on χ² variables
   4.4  Matrix Concentration
   4.5  Optimality of matrix concentration result for gaussian series
        4.5.1  An interesting observation regarding random matrices with independent entries
   4.6  A matrix concentration inequality for Rademacher Series
        4.6.1  A small detour on discrepancy theory
        4.6.2  Back to matrix concentration
   4.7  Other Open Problems
        4.7.1  Oblivious Sparse Norm-Approximating Projections
        4.7.2  k-lifts of graphs
   4.8  Another open problem

5  Johnson-Lindenstrauss Lemma and Gordon's Theorem
   5.1  The Johnson-Lindenstrauss Lemma
        5.1.1  Optimality of the Johnson-Lindenstrauss Lemma
        5.1.2  Fast Johnson-Lindenstrauss
   5.2  Gordon's Theorem
        5.2.1  Gordon's Escape Through a Mesh Theorem
        5.2.2  Proof of Gordon's Theorem
   5.3  Sparse vectors and Low-rank matrices
        5.3.1  Gaussian width of k-sparse vectors
        5.3.2  The Restricted Isometry Property and a couple of open problems
        5.3.3  Gaussian width of rank-r matrices

6  Compressed Sensing and Sparse Recovery
   6.1  Duality and exact recovery
   6.2  Finding a dual certificate
   6.3  A different approach
   6.4  Partial Fourier matrices satisfying the Restricted Isometry Property
   6.5  Coherence and Gershgorin Circle Theorem
        6.5.1  Mutually Unbiased Bases
        6.5.2  Equiangular Tight Frames
        6.5.3  The Paley ETF
   6.6  The Kadison-Singer problem

7  Group Testing and Error-Correcting Codes
   7.1  Group Testing
   7.2  Some Coding Theory and the proof of Theorem 7.3
        7.2.1  Boolean Classification
        7.2.2  The proof of Theorem 7.3
   7.3  In terms of linear Bernoulli algebra
        7.3.1  Shannon Capacity
        7.3.2  The deletion channel

8  Approximation Algorithms and Max-Cut
   8.1  The Max-Cut problem
   8.2  Can α_GW be improved?
   8.3  A Sums-of-Squares interpretation
   8.4  The Grothendieck Constant
   8.5  The Paley Graph
   8.6  An interesting conjecture regarding cuts and bisections

9  Community detection and the Stochastic Block Model
   9.1  Community Detection
   9.2  Stochastic Block Model
   9.3  What does the spike model suggest?
        9.3.1  Three or more communities
   9.4  Exact recovery …