Tensors, sparse problems and conditional hardness

by Elena-Madalina Persu

A.B., Harvard University (2013) S.M., Massachusetts Institute of Technology (2015)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

September 2018

© Massachusetts Institute of Technology 2018. All rights reserved.

Author (signature redacted): Department of Electrical Engineering and Computer Science, August 24, 2018
Certified by (signature redacted): Ankur Moitra, Rockwell International Associate Professor of Mathematics, Thesis Supervisor
Accepted by (signature redacted): Leslie A. Kolodziejski, Professor of Electrical Engineering and Computer Science, Chair, Department Committee on Graduate Students


Tensors, sparse problems and conditional hardness by Elena-Madalina Persu

Submitted to the Department of Electrical Engineering and Computer Science on August 24, 2018, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

In this thesis we study the interplay between theoretical computer science and machine learning in three different directions.

First, we make a connection between two ubiquitous sparse problems: Sparse Principal Component Analysis (SPCA) and Sparse Linear Regression (SLR). We show how to efficiently transform a blackbox solver for SLR into an algorithm for SPCA. Assuming the SLR solver satisfies prediction error guarantees achieved by existing efficient algorithms such as those based on the Lasso, we show that the SPCA algorithm derived from it achieves state-of-the-art performance, matching the guarantees for testing and for support recovery under the single spiked covariance model obtained by the current best polynomial-time algorithms.

Second, we push forward the study of linear algebra properties of tensors by giving a tensor rank detection gadget for tensors in the smoothed model. Tensors have had a tremendous impact and have been extensively applied over the past few years to a wide range of problems, for example in developing estimators for latent variable models, in independent component analysis, and in blind source separation. Unfortunately, their theoretical properties are still not well understood. We make a step in that direction.

Third, we show that many recent conditional lower bounds for a wide variety of problems in combinatorial pattern matching, graph algorithms, data structures and machine learning, including gradient computation in average case neural networks, are true under significantly weaker assumptions. This highlights that the intuition from theoretical computer science can not only help us develop faster practical algorithms but also give us a better understanding of why faster algorithms may not exist.

Thesis Supervisor: Ankur Moitra Title: Rockwell International Associate Professor of Mathematics

Acknowledgements

I will never be able to thank my parents, Camelia and Ion, enough for all their love and dedication throughout the years. Growing up, they always believed in me, taught me how to persevere and placed a great emphasis on education. I am very grateful for their constant love, support, and encouragement. I dedicate this thesis to them.

I have been extremely fortunate to have Ankur Moitra as my advisor. As a researcher, he inspires me with his deep intellectual curiosity and a level of intuition that lets him go to the very core of difficult research problems. He savors and enjoys the process of doing research, both the challenges and the small discoveries along the way. It is Ankur's qualities as a person that I am most grateful for: his patience, compassion, and generosity.

I would like to thank Guy Bresler and Costis Daskalakis for serving on my dissertation committee. Special thanks to and Prateek Jain for expanding my research horizons and hosting me at Microsoft Research over two summers. There are certain moments that completely change one's life path; the first class of Computational Learning Theory was one of those for me. Hence, I would like to thank my undergraduate advisor, , for introducing me to the world of theoretical computer science. I am also very grateful to all my collaborators, from whom I had a lot to learn: Arturs Backurs, Sam Park, Ankur Moitra, Piotr Indyk, Guy Bresler, Ryan Williams, Virginia Williams, Christopher Musco, Cameron Musco, Sam Elder and Michael Cohen. I was very lucky to have the chance to work with Michael; one can only wonder what his mind could have achieved.

I also want to thank the wonderful lab assistants, Debbie and Patrice, for always having a smile on and letting me borrow their keys the many times I got locked out of my office.

Finally, thank you to all my friends during graduate school - from the MIT theory group: especially Katerina, Manolis and the Greeks, Maryam, Ilya, Quanquan, Arturs, Sam, Prashant, Akshay, Itay, Nicole, Aloni, Daniel, Luke, Adam, Rio, Sepideh, Pritish, Ludwig, Jerry, Gautam, Henry; and outside: Horia, Sergio, Andreea, Julia, Patricia, Ioana. Your friendship has meant a lot to me and my MIT experience would not have been the same without you.

Thank you also to my Harvard friends who made my undergraduate years some of the best of my life. Thank you especially to Min and Currierism, Fiona, Lily, Robert, Katrina, Andrei, Miriam, Shiya, Gye-Hyun and Jao-ke.

Contents

List of Symbols

1 Introduction

2 Sparse PCA from Sparse Linear Regression
  2.1 Introduction
    2.1.1 Our contributions
    2.1.2 Previous algorithms
  2.2 Preliminaries
    2.2.1 Problem formulation for SPCA
    2.2.2 Problem formulation for SLR
    2.2.3 The linear model
  2.3 Algorithms and main results
    2.3.1 Intuition of test statistic
    2.3.2 Algorithms
  2.4 Analysis
    2.4.1 Analysis of Q_i under H_1
    2.4.2 Analysis of Q_i under H_0
    2.4.3 Proof of Theorem 5
    2.4.4 Proof of Theorem 6
    2.4.5 Discussion
  2.5 Experiments
    2.5.1 Support recovery
    2.5.2 Hypothesis testing
  2.6 Conclusion

3 Tensor rank under the smoothed model
  3.1 Introduction
    3.1.1 Our results
    3.1.2 Our approach
  3.2 Preliminaries and notations
  3.3 Young flattenings
    3.3.1 Young flattenings in the smoothed model
  3.4 Proof of Theorem 14
  3.5 Future directions
  3.6 Linear algebra lemmas

4 Stronger Fine Grained Hardness via #SETH
  4.1 Preliminaries
  4.2 Pattern matching under edit distance
    4.2.1 Preliminaries
    4.2.2 Reduction
  4.3 Machine learning problems
    4.3.1 Gradient computation in average case neural networks
    4.3.2 Reduction
    4.3.3 Hardness results
  4.4 Wiener index
  4.5 Dynamic graph problems
    4.5.1 Reductions framework
  4.6 Counting Matching Triangles
  4.7 Average case hardness for the Counting Orthogonal Vectors Problem
    4.7.1 Reduction

A Vector Gadgets
  A.1 Vector Gadgets

B Useful lemmas
  B.1 Linear minimum mean-square-error estimation
  B.2 Calculations for linear model from Section 2.2.3
  B.3 Properties of design matrix X
  B.4 Tail inequalities - Chi-squared

Bibliography

List of Figures

2-1 Performance of diagonal thresholding (DT), covariance thresholding (CT), and Q for support recovery at n = d = 625, 1250, varying values of k, and θ = 4.

2-2 Performance of diagonal thresholding (DT), MDP, and Q for hypothesis testing at n = 200, d = 500, k = 30, θ = 4 (left and center). T0 denotes the statistic T under H_0, and similarly T1 under H_1. The effect of rescaling the covariance matrix to make variances indistinguishable is demonstrated (right).

3-1 CANDECOMP/PARAFAC tensor decomposition of a third-order tensor.

List of Symbols

Σ        covariance matrix
Σ̂        sample covariance matrix
E[·]     expectation over the appropriate sample space
N(μ, Σ)  Gaussian distribution with mean vector μ and covariance matrix Σ
I_n      n × n identity matrix
diag{d_1, ..., d_n}  diagonal matrix with diagonal entries d_i
≲        inequality up to an absolute constant
S^n      n-dimensional unit sphere in R^{n+1}
B_0(k)   the set of k-sparse vectors in R^d
[n]      {1, ..., n}
⊗        tensor product
∧        exterior product
w.p.     "with probability"
w.h.p.   "with high probability"

Chapter 1

Introduction

In the past couple of decades machine learning has become a common tool in almost any task that requires information extraction from large data sets. We are surrounded by machine learning-based technology: search engines learn to display the best results in a personalized manner; credit card transactions are secured by software that learns to detect fraud; smart cars keep us safe during our daily commute; smartphones learn to recognize voice commands. Machine learning is also widely used in scientific applications such as bioinformatics, medicine and astronomy. Just as mathematics once was the main tool of the other sciences, computer science, and more specifically machine learning, now seems to have become the lens through which we observe the world. Unfortunately, while inheriting the role, machine learning has not inherited the mathematical rigor: many of the algorithms used in practice work without any sort of provable guarantees on their behavior. The theoretical study of machine learning is important in two ways: from the theoretical side, we hope to understand what better models exist for the study of algorithms beyond worst-case analysis; from the practical side, developing insights about why heuristics work so well allows us to improve them. In this thesis we make advances in both directions.

In modern big data instances, we often work in the so-called high-dimensional regime, where the dimensionality of the data may be significantly larger than the number of samples.

Unfortunately, classical statistical tools are not appropriate, as they focus on the opposite framework: the asymptotic regime where the number of samples n tends to infinity while other parameters are fixed. Classical estimators such as the maximum likelihood estimator are nevertheless consistent; that is, the sample estimate of a parameter converges to the true population value as we acquire more samples. In many modern applications, however, the number of samples we have access to is far less than the dimensionality of the data. Hence, it is often unreasonable to assume we have many more samples than dimensions. This motivates the study of statistical problems in the high-dimensional setting.

Working in high dimensions is both a curse and a blessing ([Wai10]): exponential blowup in sample complexity or runtime is inevitable in certain cases (the so-called "curse of dimensionality"); however, phenomena such as concentration of measure work for us, enabling inference under appropriate assumptions.

To circumvent the curse of dimensionality and make problems in high-dimensional settings tractable, a low-dimensional structure is imposed on the models. Sparsity is a simple and natural assumption for many problems, and has been extensively analyzed in theory and applied in practice. Aside from the mathematical usefulness of such an assumption, real-world data are often sparse in an appropriate basis. For instance, natural images are known to be approximately sparse in alternate bases such as wavelet or Fourier, and this fact is exploited by several compression schemes. In summary, sparse models have proven to be powerful both in theory and practice. See [EK12] or similar for a more extensive history.

We may view the success of sparse models in the light of Occam's razor: among equivalent explanations, the simplest is best. In the case of linear regression, it is reasonable to expect that only a few of the covariates affect the response variable.

Despite continuous theoretical progress in high-dimensional estimation tasks, our understanding is still lacking in some aspects. For some problems, there remains a gap between statistically optimal algorithms and known efficient algorithms: the former are often based on a brute-force search over model parameters, while the latter rely on various convex relaxations and greedy heuristics. In other cases, computationally efficient algorithms require certain restrictive assumptions on the input, and the only known proof techniques rely crucially on those assumptions. Without those assumptions, a much higher signal strength is required in order to do inference, as far as we know. There seems to be a statistical price to pay for computational efficiency.

Sparse Principal Component Analysis (SPCA) and Sparse Linear Regression (SLR) are two problems with a wide range of applications that have attracted a tremendous amount of attention over the last two decades as canonical examples of statistical problems in high dimension. A variety of algorithms have been proposed for both SPCA and SLR, but their literatures have been disjoint for the most part. We have a fairly good understanding of the conditions and regimes under which these algorithms succeed. But is there a deeper connection between the computational structure of SPCA and that of SLR?

In this thesis we show how to efficiently transform a blackbox solver for SLR into an algorithm for SPCA. Assuming the SLR solver satisfies prediction error guarantees achieved by existing efficient algorithms such as those based on the Lasso, we show that the SPCA algorithm derived from it achieves state-of-the-art performance, matching guarantees for testing and for support recovery under the single spiked covariance model as obtained by the current best polynomial-time algorithms. Our reduction not only highlights the inherent similarity between the two problems but also, from a practical standpoint, enables obtaining a collection of algorithms for SPCA directly from known algorithms for SLR. Experiments on simulated data show that these algorithms perform well. This is the content of Chapter 2 of this thesis and is based on joint work with Guy Bresler and Sam Park.

Besides imposing sparsity, another high-dimensional statistics tool that has been used extensively to reveal unknown parameters of different models is tensor decomposition. Many recent problems fit into the following framework: choose some parametric family of distributions that is rich enough to model things like evolution, writing or the formation of social networks, then design algorithms to learn the unknown parameters. These parameters are a proxy for finding hidden structure in the data: for example, a tree of life that explains how species evolved from each other, the common topics of a collection of documents, or the communities of strongly connected persons in a social network. A common approach for this class of problems is to construct a tensor from the moments of the distribution and apply tensor decomposition algorithms to find the hidden factors, which in turn reveal the unknown parameters of the models. More specifically, tensors are useful for learning latent variable models, and being able to decompose higher rank tensors generally goes hand in hand with being able to work with more components [BCMV14a].

The state of the art for guaranteed tensor decomposition involves two steps: converting the input tensor to an orthogonal symmetric form and then solving the orthogonal decomposition through tensor eigendecompositions [Com94, KM11, AGH+14]. While this procedure comes with efficiency guarantees, it suffers from a number of theoretical and practical limitations; most importantly, it is unable to recover overcomplete representations (the case when the tensor rank is larger than the dimension) due to the orthogonality constraint, which is especially limiting given the recent popularity of overcomplete feature learning in many domains [BCV, LS00]. The same drawback holds for the progress on random tensor models. Part of the motivation behind the tensor decomposition work in this thesis is to understand whether there are more tame conditions that allow one to work with overcomplete tensors.

Tensor decompositions are a powerful tool that has been used extensively in a wide range of machine learning problems, for example in developing estimators for latent variable models, in independent component analysis or blind source separation. The uniqueness of decomposition gives tensors a significant advantage over matrices. However, many of the familiar properties of matrices do not generalize to tensors and in the general setting most tensor problems are NP-hard [HL13]. The work presented in this thesis fits into the broad direction of trying to understand what sorts of conditions avoid computational intractability for tensor problems.

Our main technical contribution in Chapter 3 is to push forward the study of linear algebra properties of tensors by giving a tensor rank detection gadget for tensors in the smoothed model. The assumption here is that the model is not adversarially chosen, formalized by a perturbation of the model parameters. Our results involve applying algebraic geometry tools, through Young flattenings, to tensor rank detection. The results in this chapter are based on joint work with Ankur Moitra.

One other direction in which ideas from theoretical computer science can influence machine learning is the hardness in P (polynomial time) agenda. In particular, studying the computational complexity of widely used algorithms, for example empirical risk minimization, can shed light on which primitives in machine learning can and cannot hope to be sped up, under popular complexity conjectures. The Strong Exponential Time Hypothesis (SETH) is one such conjecture; it states that, for the problem of testing the satisfiability of CNF formulas, no algorithm can improve over the running time of the naive exhaustive search algorithm by an exponential factor. Over the last few years this hypothesis has been used to show conditional lower bounds for a wide variety of problems in combinatorial pattern matching, graph algorithms, data structures and machine learning. In Chapter 4, we show that many of the aforementioned results in fact hold under a significantly weaker assumption. This chapter is based on joint work with Arturs Backurs, , Ryan Williams and Virginia Williams. Specifically, we consider the counting analog of SETH (denoted #SETH), which postulates that an analogous lower bound holds for the complexity of counting the number of assignments satisfying a given CNF formula. Assuming #SETH, we show conditional lower bounds for problems such as: pattern matching under edit distance, kernel density estimation, solving kernel SVMs, computing the gradient of the top layer in a depth-3 neural network, maintaining the number of strongly connected components in a directed graph, and others. Our results strengthen the evidence of hardness of these problems.

Chapter 2

Sparse PCA from Sparse Linear Regression

2.1 Introduction

Principal component analysis is a popular and successful technique for dimension reduction and data analysis that aims to find directions along which multivariate data has maximum variance. The goal of sparse PCA (SPCA) is to find a few sparse linear combinations of given data that explain as much of the variance as possible. This has the advantage that the components are more interpretable. We study sparse PCA under the spiked covariance model introduced by [Joh01]. Data X_1, ..., X_n ∈ R^d are drawn from a normal distribution N(0, I_d + θuu^T), where we assume ||u|| = 1, u has at most k nonzero elements, and θ is a parameter that controls the signal-to-noise ratio. We consider two versions of the problem: the goal of hypothesis testing is to distinguish θ ≥ θ_0 vs. θ = 0 for some threshold θ_0; the goal of support recovery is to recover the support of u.

Sparsity is a natural assumption in many situations and makes problems well-defined in the high-dimensional setting. The goal of this chapter is to make a specific algorithmic connection between sparse PCA and sparse linear regression. In linear regression there is a known design matrix X ∈ R^{n×d} and one makes n noisy observations y = Xβ* + w, where w ~ N(0, σ²I_n) is the noise. The goal is to find a regression vector with small loss (either squared error or prediction error). In sparse linear regression (SLR), we impose the assumption that β* has some relatively small number k of nonzero entries. The information-theoretically optimal algorithm for SPCA [BR13b] involves searching over all possible (d choose k) supports of the hidden spike u. This bears resemblance to the optimal algorithm for SLR, which minimizes the prediction error over all (d choose k) supports of the regression vector β*. We illustrate that this similarity is not an accident by giving a general, simple and efficient procedure for transforming a blackbox solver for sparse linear regression with guarantees on prediction error into an algorithm for SPCA. At a high level, our algorithm regresses each coordinate i of our data on the rest of the coordinates, and computes a statistic Q_i based on the prediction error. If Q_i is greater than some threshold, we can safely conclude that i is in the support of u. This immediately gives algorithms for both hypothesis testing and support recovery. The advantages of such a blackbox framework are twofold: it illustrates the inherent structural similarity of the two problems; practically, it allows one to simply plug in any of the number of solvers available for SLR (such as Lasso [Tib96] or FoBa [Zha09]) and directly get an algorithm for SPCA with provable guarantees. Beyond the above, our algorithm also achieves exactly or close to state-of-the-art guarantees for SPCA. We elaborate on these in the next subsection. Our experimental results also indicate that we outperform or match existing algorithms.

2.1.1 Our contributions

We highlight some of our contributions below:

* We give a general and efficient procedure for transforming an SLR blackbox with prediction error guarantees into algorithms for hypothesis testing and support recovery for SPCA under the spiked covariance model. Most known sparse linear regression and sparse recovery algorithms can be used as this blackbox. In experiments, we demonstrate that using popular existing SLR algorithms such as Lasso [Tib96] and FoBa [Zha09] for the "blackbox" results in good performance.

* For hypothesis testing, we match the state-of-the-art provable guarantee for computationally efficient algorithms; our algorithm successfully distinguishes between isotropic and spiked Gaussian distributions as soon as the signal strength satisfies θ ≳ √(k² log d / n). This matches the phase transition of diagonal thresholding (DT) [JL09] and Minimal Dual Perturbation (MDP) [BR13b] up to constant factors.

* For support recovery: for general d and n, when each non-zero entry of u is at least Ω(1/√k) (a standard assumption in the literature), our algorithm succeeds with high probability for signal strength θ ≳ √(k² log d / n). In the scaling limit d/n → α as d, n → ∞, the recent covariance thresholding algorithm [DM14] theoretically succeeds at a signal strength that is an order of √(log d) smaller. However, our experimental results indicate that with an appropriate choice of blackbox, our Q algorithm outperforms covariance thresholding as well as diagonal thresholding.

* We also theoretically and empirically illustrate that our SPCA algorithm is robust to rescaling of the data, for instance by using a Pearson correlation matrix instead of a covariance matrix.¹

1 We remark that the idea of seeing whether an SPCA algorithm works on the correlation matrix was originally found in [VCLR13].

2.1.2 Previous algorithms

Various approaches to SPCA have been designed in an extensive list of prior work. The earliest algorithms, such as SCoTLASS [JTU03] and that of [ZHT06], were the first to impose an ℓ1 penalty on the coefficients of the components to induce sparsity, but no provable guarantees were given. In a different line of work, [JL09] used a simple procedure called diagonal thresholding (DT) to select a subset of variables with the largest variance, and then ran ordinary PCA on the reduced set of variables. Somewhat surprisingly, this simple algorithm matches the best guarantees for hypothesis testing and is nearly optimal for support recovery. [dEGJL07] first introduced a natural SDP relaxation for the problem, and since then this SDP has been used and analyzed in numerous settings.

Spiked covariance model As the most general SPCA problem is NP-hard, additional assumptions are needed in order to derive better guarantees. One popular and successfully analyzed distributional setting has been the spiked covariance model. We focus on the single spike model due to [Joh01].

In the spiked covariance model with r spikes, the data matrix X ∈ R^{n×d} is generated by the formula

X = V D U^T + Z,

where V is an n × r random effects matrix with i.i.d. N(0, 1) entries, D = diag(λ_1^{1/2}, ..., λ_r^{1/2}) with λ_1 ≥ ... ≥ λ_r > 0, U is d × r orthonormal, and Z has i.i.d. N(0, σ²) entries independent of V. Equivalently, X has rows independently drawn from N(0, Σ), where Σ = UΛU^T + σ²I_d and Λ = diag(λ_1, ..., λ_r). For simplicity, our presentation follows this normal assumption on the rows of X; however, we note that our analysis can easily be extended to the case where the rows of X are drawn from a subgaussian distribution of the form Σ^{1/2}Z, where Z has independent subgaussian entries. We focus on the single spike case, where U = u. We introduce a signal strength parameter θ, defined as θ = λ_1/σ². For the discussion below, we note that some works treat the signal strength θ as a variable parameter and compare it against a fixed threshold, while others fix θ to be a constant and analyze the required number of samples n as a function of the dimension d and the sparsity k. Below we review state-of-the-art guarantees for two different goals.
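To make the single-spike setting concrete, the following minimal NumPy sketch draws samples with covariance I_d + θuu^T using the X = √θ·vu^T + Z representation above. The function name and the choice of a uniform-magnitude spike with random signs are illustrative assumptions, not part of the thesis.

```python
import numpy as np

def sample_spiked(n, d, k, theta, seed=None):
    """Draw n samples from N(0, I_d + theta * u u^T) with a k-sparse unit spike u."""
    rng = np.random.default_rng(seed)
    # k-sparse spike with uniform magnitude and random signs on a random support
    u = np.zeros(d)
    support = rng.choice(d, size=k, replace=False)
    u[support] = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)
    # X = sqrt(theta) * v u^T + Z gives rows with covariance I_d + theta * u u^T
    v = rng.standard_normal((n, 1))
    Z = rng.standard_normal((n, d))
    X = np.sqrt(theta) * v @ u[None, :] + Z
    return X, u, support
```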

Support recovery The goal of support recovery is to find the support S of the spike u. Notice that if we exactly recover the support S, then we can recover u by finding the top eigenvector of the restricted covariance matrix Σ_{S,S}, because this puts us back in the low-dimensional or classical regime. In general, a lower bound on the entries of u is needed in order to guarantee successful recovery (see [FRG09, Wai07] for related lower bounds for sparse recovery). Under the spiked covariance model, for the subcase when the spike is uniform on all k coordinates, [AW09] analyzed both diagonal thresholding and the SDP for support recovery. They showed that the SDP requires an order of k fewer samples when the SDP optimal solution is rank one. However, [KNV13] showed that the rank-one condition does not happen in general, particularly in the regime approaching the information-theoretic limit (√n ≲ k ≲ n/log d). This is consistent with computational lower bounds from [BR13a] (k ≳ √n), but a small gap remains (diagonal thresholding and SDPs succeed only up to k ≲ √(n/log d)). The state of the art for support recovery that closes the above gap is the covariance thresholding algorithm, first suggested by [KNV13] and analyzed by [DM14], which succeeds in the regime √(n/log d) ≲ k ≲ √n, although the theoretical guarantee is limited to the regime where d/n → α due to relying on techniques from random matrix theory.

Hypothesis testing Some works [BR13b, AW09, dBEG14] have focused on the problem of detection. Here one only wants to distinguish between u = 0 and ||u||_2 = 1 (with u still k-sparse). In this case, [BR13b] observed that it suffices to work with the much simpler dual of the standard SDP, called Minimal Dual Perturbation (MDP). In the dual problem, the goal is to perturb the sample covariance matrix to minimize the maximum eigenvalue, subject to a penalty proportional to the entries of the perturbation and the sparsity level k. Diagonal thresholding (DT) and MDP work up to the same signal threshold θ as for support recovery, but MDP seems to outperform DT on simulated data [BR13b]. Note that MDP works at the same signal threshold as the standard SDP relaxation for SPCA.

[dBEG14] analyze a statistic based on an SDP relaxation and its approximation ratio to the optimal statistic. In the regime where k and n are proportional to d, their statistic succeeds at a signal threshold for θ that is independent of d, unlike the MDP. However, their statistic is quite slow to compute; the runtime is at least a high-order polynomial in d.

Regression based approaches Though some previous works ([ZHT06]) have used specific algorithms for SLR such as the Lasso as a subroutine, to the best of our knowledge our work is the first to give a general framework that uses SLR in a blackbox fashion, while matching state-of-the-art theoretical guarantees. A similar regression-based approach has been used in [MB06] as applied to a restricted class of graphical models. The goal there is to recover the neighborhood of each node in a graphical model; they do so by regressing the random variable corresponding to each node on the observations from the rest of the graph. While our regression setup is similar, their statistic is different and their analysis depends directly on the particulars of the Lasso. Further, their algorithm requires extraneous conditions on the data. In particular, their Assumption 5 on minimum partial correlation requires θ² ≳ k, compared to θ² ≳ k² log d / n in our work. [CMW13] also use a reduction to linear regression for their sparse subspace estimation, but their approach differs from ours in several ways. First, their algorithm depends crucially on a good initialization done by a diagonal thresholding-like pre-processing step, whereas our algorithm does not. This further implies that under rescaling of the data², their initialization fails. Second, their framework uses regression for the specific case of orthogonal design, whereas our design matrix can be more general as long as it satisfies a condition similar to the restricted eigenvalue condition. On the other hand, their setup allows for more general ℓ_q-based sparsity as well as the estimation of an entire subspace as opposed to a single component. [M+13] also achieves this more general setup; however, it also suffers from the first problem delineated above.

2 See Section 2.4.5 for more discussion on rescaling.

Sparsity inducing priors Connections between SPCA and SLR have been noted in the probabilistic setting, albeit in an indirect manner. At a high level, the same sparsity-inducing priors can be used for either problem.

[KKGP14] consider the problem of, given a base prior, finding another distribution closest to it in KL divergence ("information projection") while satisfying some constraints. They look at domain constraints (limiting the domain to a particular subset) in particular, and show that the desired optimal distribution is just the base distribution restricted to the subset and rescaled appropriately. Now, if we want to do information projection onto all distributions with a k-sparse support S, then it turns out that the cost function (KL divergence) is submodular in S, so one can achieve a (1 − 1/e)-approximation to the optimal objective. [KGPK15] look at the probabilistic formulation of PCA along with the EM algorithm for it. In the E-step, they optimize over a distribution Q that has sparse support. Since the E-step minimizes the KL divergence between the distribution of the latent variables (principal components) and Q, the technique from [KKGP14] can be readily applied. However, being based on EM, they can only guarantee local optimality. While the above line of work also highlights an intriguing relationship between the two problems, namely that sparsity-inducing priors can be applied to both SPCA and SLR, we view our work as different since we focus on giving a blackbox reduction from one problem to the other. Furthermore, provable guarantees for the EM algorithm/variational method are lacking in general, and it is not immediately clear what signal threshold their algorithm achieves for the single spike covariance model.

SLR background In linear regression, we observe a response vector y ∈ R^n and a design matrix X ∈ R^{n×d} that are linked by the linear model y = Xβ* + w. The vector w ∈ R^n is some form of observation noise, and our goal is to recover β* given the noisy observations y. We focus on the standard Gaussian model, where the entries of w are i.i.d. N(0, σ²). We also work with deterministic design; while the matrices X we consider arise from a (random) correlated Gaussian design (as analyzed in [Wai07], [Wai09]), it will make no difference to assume the matrices are deterministic (by conditioning). Most of the relevant results on sparse linear regression pertain to deterministic design. Analogous to the setting for PCA, linear regression in the high-dimensional setting is meaningless without further constraints. In general, when n < d, the system is under-determined and there is a whole subspace of solutions minimizing the reconstruction error. In sparse linear regression, we additionally assume that β* is sparse, i.e., has only a small number, k ≪ d, of non-zero entries. This makes the problem well posed in the high-dimensional setting, though computationally more challenging. Beyond mathematical necessity, sparsity has been found to be a very suitable assumption, as real-world signals are often sparse in an appropriate basis; that is, the intrinsic dimensionality of the data is often much lower than the dimensionality of the original dataset. Commonly used performance measures for SLR are tailored to prediction error, support recovery (recovering the support of β*), or parameter estimation (estimating β* under some norm). We focus on the prediction error, defined as (1/n)||Xβ* − Xβ̂||², and analyzed over random realizations of the noise.

The ℓ0 estimator, which minimizes the reconstruction error ||y − Xβ||² over all k-sparse regression vectors β, achieves a prediction error bound of the form ([BTW07a], [RWY11]):

(1/n)||Xβ* − Xβ̂||² ≲ σ² k log d / n.

The runtime of this estimator is O(nd^k), which is intractable both in theory and in practice as soon as k is larger than a constant.
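For illustration, here is a hedged sketch of the ℓ0-constrained least-squares estimator just described: it enumerates all (d choose k) supports and keeps the best least-squares fit, which is exactly the source of the d^k-type runtime. The function name is hypothetical, and the code is meant only to make the enumeration explicit.

```python
import numpy as np
from itertools import combinations

def l0_least_squares(y, X, k):
    """Exhaustive search over all (d choose k) supports: one least-squares fit each."""
    n, d = X.shape
    best_err, best_beta = np.inf, np.zeros(d)
    for S in combinations(range(d), k):
        cols = list(S)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        err = np.sum((y - X[:, cols] @ coef) ** 2)
        if err < best_err:
            best_beta = np.zeros(d)
            best_beta[cols] = coef
            best_err = err
    return best_beta
```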

Efficient methods Various efficient methods have been proposed to circumvent the computational intractability of the above estimator: basis pursuit, the Lasso [Tib96], and the Dantzig selector [CT07] are some of the initial approaches. Greedy pursuit methods such as OMP [MZ93], IHT [BD09], CoSaMP [NT09], and FoBa [Zha09], among others, offer more efficient alternatives.³ Many of the optimization-based approaches relax the ℓ0 penalty to some form of ℓ1 penalty or an equivalent constraint. These algorithms achieve the same prediction error guarantee as ℓ0 up to a constant, but under the assumption that X satisfies certain properties, such as restricted eigenvalue ([BRT09]), compatibility ([vdG07]), the restricted isometry property ([CT05]), and (in)coherence ([BTW07b]). In this work, we focus on the restricted eigenvalue condition (see Definition 3 for a formal definition). We remark that restricted eigenvalue is among the weakest of these conditions, and is only slightly stronger than the compatibility condition. Moreover, [ZWJ14] give complexity-theoretic evidence for the necessity of the dependence on the RE constant for certain worst case instances of the design matrix. See [VDGB+09] for implications between the various conditions. In the next subsection, we give more intuition for some of these conditions.

3 Note that some of these algorithms were presented for compressed sensing; nonetheless, their guarantees can be converted appropriately.

Slow rate Without such conditions on X, the best known guarantees provably obtain only a 1/√n decay rather than a 1/n decay in prediction error as the number of samples increases. [ZWJ15] give some evidence that this gap may be unavoidable by showing that the family of M-estimators based on minimizing the sum of a squared loss and a coordinate-wise decomposable regularizer cannot achieve a rate better than 1/√n.

Optimal estimators The SLR estimators we consider are efficiently computable. Another line of work considers arbitrary estimators that are not necessarily efficiently computable. These include BIC [BTW07a], Exponential Screening [RT11], and Q-aggregation [DRXZ14]. Such estimators achieve strong guarantees regarding minimax optimality in the form of oracle inequalities on MSE.

Notation Capital letters such as X are used to denote matrices, and lowercase letters such as y denote vectors. For clarity, we reserve X for the data matrix in SPCA and X for the design matrix in SLR. X_i is the ith column of X, and X_{−i} is the submatrix obtained by deleting the ith column from X. Similarly, u_i denotes the ith coordinate of u and u_{−i} ∈ R^{d−1} is u with the ith coordinate removed.

Σ_{S,T} is Σ = E[XX^T] restricted to rows in S and columns in T; if S = T, we abbreviate it as Σ_S. For example, Σ_{2:d} is Σ restricted to coordinates 2, ..., d. All vector norms are 2-norms unless specified otherwise. We use C, C′ to denote constants that may change from line to line.

Organization The rest of this chapter is organized as follows. In Section 2.2, we give precise formulations of both problems that will be used in the rest of the chapter. We also set up the linear model for data drawn from the spiked covariance distribution. In Section 2.3, we state our algorithms and main theorems. In Section 2.4, we give the theoretical analysis and discussion. In Section 2.5, we present an empirical evaluation of our algorithms. In Section 2.6, we conclude with some future directions.

2.2 Preliminaries

We give the precise setup for SPCA and SLR that we will study, and introduce the linear model for our data generated from the spiked covariance model that will be subsequently used in defining our statistic and algorithms.

2.2.1 Problem formulation for SPCA

Hypothesis testing Let X^(1), X^(2), ..., X^(n) be n i.i.d. copies of a Gaussian random variable X in R^d. Let X ∈ R^{n×d} be the matrix whose rows are the X^(i). The objective of the SPCA detection problem is to determine whether there is some distinguished (sparse) direction u along which X has higher variance. In SPCA, we also assume that this direction is sparse. This motivates the following null and alternate hypotheses:

H_0: X ~ N(0, I_d)    and    H_1: X ~ N(0, I_d + θuu^T),

where u has unit norm and at most k nonzero entries. The distribution under H_1 is known as the spiked covariance model. As a smaller θ makes the problem only harder, we assume θ ≤ 1 for ease of computation, as is standard in the literature.

We say that a test discriminates between H_0 and H_1 with probability 1 − δ if both the type I and type II errors have probability smaller than δ. The goal is therefore to find a statistic φ(X) and a threshold τ depending on d, n, k, δ such that, for the test ψ(X) = 1{φ(X) > τ},

P_{H_0}(ψ(X) = 1) ≤ δ    and    P_{H_1}(ψ(X) = 0) ≤ δ.

We assume the following additional condition on the spike u.

Assumption 1. c_min/k ≤ u_i² ≤ 1 − c_min for at least one i ∈ [d], where 0 < c_min < 1 is some constant.

While in general we always have at least one i ∈ [d] such that u_i² ≥ 1/k, this is not enough for our regression setup, since we want at least one other coordinate j to have sufficient correlation with coordinate i. We remark that the above condition is a very mild technical condition. If it were violated, almost all of the mass of u would be on a single coordinate, so a simple procedure for testing the variance (which is akin to diagonal thresholding) would suffice. Furthermore, for u drawn uniformly at random from S^{k−1}, we in fact expect a constant fraction of the coordinates to have mass at least 1/(2k), in which case the above assumption is immediately satisfied.

Support recovery The goal of support recovery is to identify the support of u when the X_i are drawn from the spiked distribution under H_1. More precisely, we say that a support recovery algorithm succeeds if the recovered support Ŝ equals S, the support of u. As is standard in the literature [AW09, MB06], we need to assume a minimal bound on the size of the entries of u on the support. Though the settings are a bit different, this minimal bound is also consistent with lower bounds known for sparse recovery. These lower bounds ([FRG09, Wai07]; the bound of [FRG09] is a factor of k weaker) imply that the number of samples (or measurements, in their language) must grow roughly as n ≳ k log d when u_min, the smallest entry of our signal u, is of order 1/√k. For our support recovery algorithm, we will make the following assumption (note that it implies Assumption 1 and is much stronger):

Assumption 2. |u_i| ≥ c_min/√k for some constant 0 < c_min ≤ 1, for all i ∈ S.

This is nearly optimal in comparison to the lower bounds mentioned above. If the smallest entry is smaller by a factor of some constant C, then the signal strength θ needs to be stronger by a factor of C for our recovery algorithm to succeed, which is consistent with the lower bounds.

Unknown sparsity Note that throughout this chapter we assume that the sparsity level k is known. However, if k is unknown, standard techniques could be used to adaptively find an approximate value of k. For hypothesis testing, for instance, we can start with an initial overestimate k′ and keep halving it until we get coordinates i with Q_i passing the threshold for the given k′ (a sketch of such a loop is given below).
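The following rough sketch illustrates such a halving loop. It assumes access to a routine q_statistics(X, k, slr) returning the vector of Q_i values (defined in Section 2.3) and an SLR blackbox slr; both names are hypothetical, and the stopping rule is only illustrative, not the thesis's exact procedure.

```python
import numpy as np

def adaptive_sparsity(X, k_max, q_statistics, slr):
    """Halve an overestimate k' until some coordinate passes the threshold for that k'."""
    n, d = X.shape
    k = k_max
    while k >= 1:
        Q = q_statistics(X, k, slr)          # assumed helper: vector of Q_i values
        if np.any(Q > 13 * k * np.log(d) / n):
            return k                         # some coordinate passes at this sparsity level
        k //= 2
    return None                              # no level passed: consistent with H_0
```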

2.2.2 Problem formulation for SLR

We are given (y, X), where y ∈ R^n and X ∈ R^{n×d} are linked by the linear model y = Xβ* + w. The entries of the vector w ∈ R^n are i.i.d. N(0, σ²), and our goal is to compute β̂ that minimizes the prediction error (or MSE) (1/n)||Xβ* − Xβ̂||². We define the restricted eigenvalue constant, which will be important in our analysis. Many variants exist in the literature; below, we give a definition from [ZWJ14].

Definition 3. First define the cone

C(S) = {β ∈ R^d : ||β_{S^c}||_1 ≤ 3 ||β_S||_1},

where S^c denotes the complement of S and β_T is β restricted to the subset T. The restricted eigenvalue (RE) constant of X, denoted γ(X), is defined as the largest constant γ such that

(1/n)||Xβ||² ≥ γ ||β||²    for all β ∈ ∪_{|S|=k, S⊆[d]} C(S).

Blackbox condition We now define the condition we require on our SLR blackbox, which is invoked as SLR(y, X, k). This is similar to the guarantees achieved by known results for SLR.

Condition 4 (Condition A). Let γ(X) denote the restricted eigenvalue of X. There are universal constants c, c′, c″ such that SLR(y, X, k) outputs β̂ that is k-sparse and satisfies

(1/n)||Xβ̂ − Xβ*||² ≤ c σ² k log d / (γ(X) n)    for all β* ∈ B_0(k),

with probability at least 1 − c′ exp(−c″ k log d).
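As a concrete, purely illustrative way to instantiate such a blackbox, one could run the Lasso, truncate to the k largest coefficients, and refit least squares on the selected support; this mirrors the thresholded-Lasso variant used in the experiments of Section 2.5, but the regularization choice below is an assumption of this sketch, not a prescription of the thesis.

```python
import numpy as np
from sklearn.linear_model import Lasso

def slr_blackbox(y, X, k, lam=None):
    """One possible SLR(y, X, k): Lasso, keep the k largest coefficients, then refit."""
    n, d = X.shape
    if lam is None:
        lam = np.sqrt(np.log(d) / n)        # illustrative regularization level
    beta = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y).coef_
    top_k = np.argsort(np.abs(beta))[-k:]   # truncate to a k-sparse support
    beta_hat = np.zeros(d)
    coef, *_ = np.linalg.lstsq(X[:, top_k], y, rcond=None)
    beta_hat[top_k] = coef                  # least-squares refit on the selected support
    return beta_hat
```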

32 We first discuss the underlying linear structure in the data generated from the spiked covariance model. The specifics of this linear structure will be used in defining our statistic and algorithms. We then state and prove the guarantees for our algorithms.

2.2.3 The linear model

We now set up linear regression for the data X from the SPCA problem. One natural way to apply regression to our samples X from SPCA is to regress one column (coordinate) on the remaining columns. More formally, let X_{−i} denote the matrix of samples in the SPCA model with the ith column removed. For each column i, let us take as input to the blackbox SLR the design matrix X = X_{−i} and the response variable y = X_i.

Under the alternative hypothesis H_1, if i ∈ S, then X_i is correlated with X_j for j ∈ S, j ≠ i. Using properties of multivariate Gaussians, we can write y = Xβ* + w, where

β* = (θ u_i / (1 + θ(1 − u_i²))) u_{−i}    and    w ~ N(0, σ² I_n)  with  σ² = 1 + θ u_i² − θ² u_i² (1 − u_i²) / (1 + θ(1 − u_i²)).

By the theory of LMMSE, this β* minimizes the expected squared prediction error. (See Appendices B.1 and B.2 for the details of this calculation.) If i ∉ S, and for any i ∈ [d] under the null hypothesis, y = w where w = X_i ~ N(0, I_n) (implicitly β* = 0).

Because the population covariance Σ = E[XX^T] has minimum eigenvalue 1, with high probability the sample design matrix X has a constant restricted eigenvalue given enough samples n (see Appendix B.3 for more details), and the prediction error guarantee of Condition 4 will be good enough for our analysis.

Though the dimension and the sparsity of our SLR instances are d − 1 and k − 1 (since we remove one column from the SPCA data matrix X to obtain the design matrix X), for ease of exposition we just use d and k in their place, since this only affects our analysis up to small constant factors.
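The closed forms for β* and σ² above can be checked numerically from the population covariance Σ = I_d + θuu^T (the thesis's derivation is in Appendix B.2); the small script below is only a sanity check of those formulas, with an arbitrary dense spike chosen for illustration.

```python
import numpy as np

theta, d, i = 0.8, 6, 0
u = np.full(d, 1 / np.sqrt(d))             # simple dense unit spike for illustration
Sigma = np.eye(d) + theta * np.outer(u, u)

rest = [j for j in range(d) if j != i]
# population least-squares coefficients and residual variance (LMMSE)
beta_star = np.linalg.solve(Sigma[np.ix_(rest, rest)], Sigma[rest, i])
sigma2 = Sigma[i, i] - Sigma[i, rest] @ beta_star

ui2 = u[i] ** 2
beta_closed = theta * u[i] * u[rest] / (1 + theta * (1 - ui2))
sigma2_closed = 1 + theta * ui2 - theta**2 * ui2 * (1 - ui2) / (1 + theta * (1 - ui2))

assert np.allclose(beta_star, beta_closed)
assert np.isclose(sigma2, sigma2_closed)
```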

33 2.3 Algorithms and main results

2.3.1 Intuition of test statistic

Consider a matrix X of samples generated from the single spiked covariance model. The intuition behind the algorithm is that if i is in the support of the spike, then the rest of the support should provide a nontrivial prediction for X_i, since variables in the support are correlated. Conversely, for i not in the support (or under the isotropic null hypothesis), all of the variables are independent, and the other variables are useless for predicting X_i. So we regress X_i onto the rest of the variables, and our goal is to measure the reduction in noise. How much predictive power do we gain by using X_{−i}? The linear minimum mean-square-error (LMMSE) estimate⁴ of X_i conditioned on X_{−i} (when i is on the support) turns out to put approximately θ/k weight on all the other coordinates on the support.⁵ A calculation shows that the variance in X_i is reduced by approximately θ²/k. We want to measure this reduction in noise to detect whether i is on the support or not. Suppose for instance that we have access to β* rather than β̂ (note that this is not possible in practice since we do not know the support!). Since we want to measure the reduction in noise when the variable is on support, as a first step we might employ the following statistic:

Q̃_i = −(1/n)||y − Xβ*||².

4 See Appendix B.1 for more details.
5 For illustrative purposes, we consider the case where u is uniform on all k coordinates of the support.

Unfortunately this statistic will not be able to distinguish the two hypotheses, as the reduction in LMMSE is minuscule (on the order of θ²/k, compared to an overall variance of order 1 + θ), so deviation due to random sampling will mask the reduction in noise.

We can fix this by adding the variance term (1/n)||y||²:

Q_i = (1/n)||y||² − (1/n)||y − Xβ*||².

Notice that since y = Xβ* + w, the noise term ||w||² cancels out nicely. This effectively shifts the mean of the statistic, and now we are left with a statistic that is close to 0 under H_0 and is larger by about θ²/k under H_1, so distinguishing using this statistic is more effective. On a more intuitive level, including ||y||² allows us to measure the relative gain in predictive power without being penalized by a possibly large variance in y. Fluctuations in y due to noise will typically be canceled out in the difference of terms in Q_i, minimizing the variance of our statistic.

We have to add one final fix to the above estimator. We obviously do not have access to β*, so we must use the estimate β̂ = SLR(y, X, k) (y, X are as defined in Section 2.2.3) which we get from our blackbox. The bulk of the analysis is showing that this substitution does not affect much of the discriminative power of Q_i. This gives our final statistic:

Q_i = (1/n)||y||² − (1/n)||y − Xβ̂||².

2.3.2 Algorithms

Below we give two algorithms based on the Q statistic, one for hypothesis testing and one for support recovery:

Algorithm 1: Q-hypothesis testing
Input: X ∈ R^{n×d}, k
Output: ψ
for i = 1, ..., d do
    β̂_i = SLR(X_i, X_{−i}, k)
    Q_i = (1/n)||X_i||² − (1/n)||X_i − X_{−i}β̂_i||²
    if Q_i > 13 k log d / n then
        return ψ = 1
    end if
end for
return ψ = 0

Algorithm 2: Q-support recovery
Input: X ∈ R^{n×d}, k
Output: Ŝ
Ŝ = ∅
for i = 1, ..., d do
    β̂_i = SLR(X_i, X_{−i}, k)
    Q_i = (1/n)||X_i||² − (1/n)||X_i − X_{−i}β̂_i||²
    if Q_i > 13 k log d / n then
        Ŝ := Ŝ ∪ {i}
    end if
end for
return Ŝ
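A minimal Python rendering of Algorithms 1 and 2 is given below, assuming X ∈ R^{n×d} with rows as samples and any SLR(y, X, k) blackbox satisfying Condition 4 (for instance, the Lasso-based sketch after Condition 4 in Section 2.2.2). Function names are hypothetical; this is a sketch of the pseudocode above, not the thesis's experimental code.

```python
import numpy as np

def q_statistics(X, k, slr):
    """Q_i = ||X_i||^2/n - ||X_i - X_{-i} beta_hat||^2/n for every coordinate i."""
    n, d = X.shape
    Q = np.zeros(d)
    for i in range(d):
        y = X[:, i]
        X_rest = np.delete(X, i, axis=1)
        beta_hat = slr(y, X_rest, k)
        Q[i] = (np.sum(y**2) - np.sum((y - X_rest @ beta_hat) ** 2)) / n
    return Q

def q_hypothesis_test(X, k, slr):
    n, d = X.shape
    return int(np.any(q_statistics(X, k, slr) > 13 * k * np.log(d) / n))

def q_support_recovery(X, k, slr):
    n, d = X.shape
    return np.flatnonzero(q_statistics(X, k, slr) > 13 * k * np.log(d) / n)
```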

35 Below we summarize our guarantees for the above algorithms.

Theorem 5 (Hypothesis test). Given SLR that satisfies Condition 4 with runtime T(d, n, k) per instance, and given Assumption 1, there exist universal constants c_1, c_2, c_3, c_4 such that if θ² ≥ c_1 k² log d / (c_min² n) and n ≥ c_2 k log d, then Algorithm 1 outputs ψ such that

P_{H_0}(ψ(X) = 1) ∨ P_{H_1}(ψ(X) = 0) ≤ c_3 exp(−c_4 k log d)

in time O(dT + d²n).

Theorem 6 (Support recovery). Under the same condition on SLR and given Assumption 2, if θ² ≥ c_1 k² log d / (c_min² n) and n ≥ c_2 k log d, then Algorithm 2 above finds Ŝ = S with probability at least 1 − c_3 exp(−c_4 k log d) in time O(dT + d²n).

Remark 7. Though both guarantees involve bounding the signal strength θ in terms of c_min, Assumption 2 on u in Theorem 6 is much stronger, as all entries in the support of u need to be minimally bounded for Assumption 2 to hold.

2.4 Analysis

In this section we analyze the distribution of Q_i under both H_0 and H_1 on our way to proving Theorems 5 and 6.

2.4.1 Analysis of Q_i under H_1

Without loss of generality assume the support of u, denoted S, is {1, ..., k}, and consider the first coordinate. We expand Q_1 by using y = Xβ* + w as follows:

Q_1 = (1/n)||y||² − (1/n)||y − Xβ̂||²
    = (1/n)||Xβ* + w||² − (1/n)||Xβ* + w − Xβ̂||²
    = (1/n)||Xβ*||² + (2/n) w^T Xβ* − (1/n)||Xβ* − Xβ̂||² − (2/n) w^T (Xβ* − Xβ̂).

Observe that the noise term ||w||² cancels conveniently.

Before bounding each of these four terms, we introduce a useful lemma to bound the cross terms involving the noise w.

Lemma 8 (Lemmas 8 and 9, [RWY11]). For any fixed X ∈ R^{n×d} and independent noise vector w ∈ R^n with i.i.d. N(0, σ²) entries,

(1/n)|w^T Xβ| ≤ 9σ √(||Xβ||²/n) √(k log(d/k)/n)    for all β ∈ B_0(2k),

with probability at least 1 − 2 exp(−40 k log(d/k)).

We bound each term as follows:

Term 1. The first term contains the signal from the spike; notice its resemblance to the k-sparse eigenvalue statistic. Rewritten another way,

(1/n)||Xβ*||² = (β*)^T (X^T X / n) β*.

Hence, we expect this to concentrate around (β*)^T Σ_{2:d} β*, which simplifies to (see Appendix B.2 for the full calculation):

(β*)^T Σ_{2:d} β* = Σ_{1,2:d} Σ_{2:d}^{-1} Σ_{2:d,1} = θ² u_1² (1 − u_1²) / (1 + (1 − u_1²) θ).

For concentration, observe that we may rewrite

(β*)^T (X^T X / n) β* = (1/n) Σ_{i=1}^n (X^{(i)} β*)²,

where X^{(i)} is the ith row, representing the ith sample. This is just an appropriately scaled chi-squared random variable with n degrees of freedom (since each X^{(i)}β* is i.i.d. normal), and the expected value of each term in the sum is the same as computed above. Applying a lower tail bound on the χ² distribution (see Appendix B.4), with probability at least 1 − δ we have

(1/n)||Xβ*||² ≥ (β*)^T Σ_{2:d} β* · (1 − 2√(log(1/δ)/n)).

Choosing δ = exp(−k log d),

(1/n)||Xβ*||² ≥ [θ² u_1² (1 − u_1²) / (1 + (1 − u_1²)θ)] (1 − 2√(k log d / n))
            ≥(a) (1/2) · θ² u_1² (1 − u_1²) / (1 + (1 − u_1²)θ)
            ≥(b) C θ² / k,                                             (2.1)

where (a) holds as long as n ≥ 16 k log d, and (b) holds since θ ≤ 1 and u_1²(1 − u_1²) ≥ c_min²/k under Assumption 1 (here C depends on c_min).

Term 2. The absolute value of the second term, (2/n)|w^T Xβ*|, can be bounded by 18σ √(||Xβ*||²/n) √(k log(d/k)/n) using Lemma 8. From (2.1), as long as θ² ≥ C k² log d / n, we have

(1/n)||Xβ*||² ≥ C′ θ²/k ≥ C″ k log d / n,

so the first two terms together are lower bounded by

(1/n)||Xβ*||² − 18σ √(||Xβ*||²/n) √(k log(d/k)/n) ≥ C (1/n)||Xβ*||² ≥ C θ²/k,    (2.2)

a constant fraction of the first term.

Term 3. The third term, which is the prediction error (1/n)||Xβ* − Xβ̂||², is upper bounded by C σ² k log d / (γ(X) n) with probability at least 1 − C exp(−C′ k log d) by Condition 4 on our SLR blackbox. Note that σ² ≤ 2, as we assume θ ≤ 1 (see Section 2.2.3). Moreover, γ(X) is lower bounded by a positive constant with probability at least 1 − C exp(−C′ n) if n ≥ C″ k log d, again since θ ≤ 1 (see Appendix B.3 for more details). Then

(1/n)||Xβ* − Xβ̂||² ≤ C k log d / n.

Term 4. The contribution of the last cross term, (2/n) w^T X(β* − β̂), can also be bounded by Lemma 8 w.h.p. (note that β* − β̂ ∈ B_0(2k)):

(1/n)|w^T X(β* − β̂)| ≤ 9σ √(||X(β* − β̂)||²/n) √(k log(d/k)/n).

Combined with the above bound on the prediction error, this bounds the cross term's contribution by at most C k log d / n.

Putting the bounds on the four terms together, we get the following lower bound on Q_1.

Lemma 9. There exist constants c_1, c_2, c_3, c_4 such that if θ² ≥ c_1 k² log d / n and n ≥ c_2 k log d, then with probability at least 1 − c_3 exp(−c_4 k log d), for any i ∈ S that satisfies the size bound in Assumption 1,

Q_i > 13 k log d / n.

Proof. From Terms 1-4 above, by a union bound, all four bounds fail to hold with probability at most c_3 exp(−c_4 k log d) for appropriate constants, provided θ² ≥ c_1 k² log d / n (required by Term 2) and n ≥ c_2 k log d for some c_2 > 0 (note that both Terms 1 and 3 require a sufficient number of samples n). That is, we have

Q_i ≥ C θ²/k − C′ k log d / n.

So if c_1 is sufficiently large, the above bound is greater than 13 k log d / n. □

2.4.2 Analysis of Q_i under H_0

We could proceed by decomposing Q_i the same way as under H_1; all the error terms, including the prediction error, are still bounded by O(k log d / n) in magnitude, and the signal term is gone now since β* = 0. This would give the same upper bound (up to a constant) as the following proof is about to show. However, we find the following direct analysis more informative and intuitive. Since our goal is to upper bound Q_i under H_0, we may let β̂ be the optimal possible choice given y and X (one that minimizes ||y − Xβ̂||², and hence maximizes Q_i). We further break this into two steps. We enumerate over all possible subsets S of size k, and conditioned on each S, choose the optimal β̂. Fix some support S of size k. The span of X_S is at most a k-dimensional subspace of R^n. Hence, we can consider some unitary transformation U of R^n that maps the span of X_S into the subspace spanned by the first k standard basis vectors. Since U is an isometry by definition,

n Q_i = ||y||² − ||y − Xβ̂_S||² = ||Uy||² − ||Uy − UXβ̂_S||².

Let ỹ = Uy. Since UXβ̂_S has nonzero entries only in the first k coordinates, the optimal choice (in the sense of maximizing the above quantity) of β̂_S is to choose linear combinations of the first k columns of X so that UXβ̂_S equals the first k coordinates of ỹ. Then nQ_i is just the squared norm of the first k coordinates of ỹ. Since U is some unitary matrix that is independent of y (being a function of X_S, which is independent of y), ỹ still has i.i.d. N(0, 1) entries, and hence nQ_i is a χ² random variable with k degrees of freedom. Now we apply an upper tail bound on the χ² distribution (see Appendix B.4). Choosing t = 3k log(d/k), and union bounding over all (d choose k) supports S, we get that nQ_i ≥ k + 12k log(d/k), i.e., Q_i ≥ 13k log(d/k)/n, with probability at most exp(−3k log(d/k) + k log(ed/k)) ≤ exp(−k log(d/k)) if d/k ≥ e.

Lemma 10. Under H_0, for every i, Q_i ≤ 13 k log(d/k) / n with probability at least 1 − exp(−k log(d/k)).

Remark 11. Union bounding over all S is necessary for the analysis. For instance, we cannot just fix S to be S(β̂) (the support of β̂), since β̂ is a function of y, so fixing S would change the distribution of y.

Remark 12. Observe that this analysis of Q_i under H_0 also extends immediately to H_1 when coordinate i is outside the support. The reason the analysis cannot extend to i ∈ S is that U is not independent of y in that case.

Corollary 13. Under H_1, if i ∉ S, then Q_i ≤ 13 k log(d/k) / n with probability at least 1 − exp(−k log(d/k)).
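The χ²_k behavior of nQ_i under H_0 (with β̂ taken as the best k-sparse least-squares fit, as in the argument above) is easy to check by simulation at small scale. The script below, with illustrative parameter choices, estimates how often Q_1 exceeds the 13 k log(d/k)/n threshold under isotropic data; it is a sanity check only, not part of the thesis.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
n, d, k = 200, 20, 2
threshold = 13 * k * np.log(d / k) / n

def best_k_sparse_fit_error(y, X, k):
    """Smallest residual sum of squares over all k-column subsets of X."""
    best = np.sum(y**2)
    for S in combinations(range(X.shape[1]), k):
        cols = list(S)
        coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
        best = min(best, np.sum((y - X[:, cols] @ coef) ** 2))
    return best

trials, exceed = 200, 0
for _ in range(trials):
    X = rng.standard_normal((n, d))        # H_0: isotropic Gaussian data
    y, X_rest = X[:, 0], X[:, 1:]
    Q = (np.sum(y**2) - best_k_sparse_fit_error(y, X_rest, k)) / n
    exceed += Q > threshold
print("fraction of H_0 trials with Q_1 above 13 k log(d/k)/n:", exceed / trials)
```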

40 2.4.3 Proof of Theorem 5

Proof. The proof follows immediately from Lemma 10 and Lemma 9. We use our estimators Q_i to separate H_0 and H_1. Under H_0, applying Lemma 10 to each coordinate i and union bounding, Q_i ≤ 13 k log d / n for all i with probability at least 1 − exp(−C k log d). Meanwhile, under H_1, if we consider any coordinate i that satisfies Assumption 1, Lemma 9 gives

Q_i > 13 k log d / n

with probability at least 1 − c_3 exp(−c_4 k log d). Since ψ tests whether Q_i > 13 k log d / n for at least one i, ψ distinguishes H_0 and H_1 successfully, with type I and type II error probabilities bounded by c_3 exp(−c_4 k log d) for appropriate constants c_3, c_4 (note that these may be different from those of Lemma 9). For the runtime, note that we make d oracle calls to SLR and work with matrices of size n × d. □

2.4.4 Proof of Theorem 6

Proof. As long as every u_i for i ∈ S has magnitude at least c_min/√k, as in Assumption 2, we can repeat the analysis above for all coordinates in the support. If θ meets the same threshold, Q_i > 13 k log d / n for all i ∈ S with probability at least 1 − C exp(−C′ k log d) by a union bound. Also, recall that Q_i > 13 k log d / n for any i ∉ S with probability at most C exp(−C′ k log d) by Corollary 13. By a union bound over all d − k coordinates outside the support, the error probability is at most d · C exp(−C′ k log d) ≤ C exp(−C″ k log d). We have shown that with high probability we exactly recover the support S of u. The runtime analysis is identical to that for the hypothesis test. □

2.4.5 Discussion

Running time The runtime of both Algorithms 1 and 2 is O(nd²),⁶ if we assume the SLR blackbox takes time nearly linear in the input size, O(nd), which is achieved by known existing algorithms. This seems a bit expensive at first, but computing the sample covariance matrix alone takes O(nd²) time.⁷ For a broad comparison, we consider spectral methods and SDP-based methods, though there are methods that do not fall in either category. Spectral methods such as covariance thresholding or the truncated power method have an iteration cost of O(d²) due to operating on d × d matrices, and hence have total running time Õ(d²) (Õ(·) hiding the precise convergence rate), in addition to the same O(nd²) initialization time. SDP-based methods in general take O(d³) time, the time taken by interior point methods to optimize. So overall, Algorithms 1 and 2 are competitive choices for (single spiked) SPCA, at least theoretically.

6 In what follows O(·) hides possible log and accuracy parameter ε factors.
7 Assuming one is using a naive implementation of the matrix multiplication.

Alternate blackbox The above algorithms seem rather wasteful because there is a lot of overlapping information between the different O's we get for the Qi's on support. For instance, it is plausible that / contains a good fraction of the entries in the support if the coordinate we are regressing on happens to be on support. In such case, it is unnecessary to compute Qj's for the j's we already are confident to be on support. We may be able to utilize such information more easily if instead of prediction error we consider support recovery or parameter estimation (say in f2-norm) guarantees for our SLR blackbox.

Robustness of Q statistic to rescaling

A natural and simple way to make diagonal thresholding fail is to rescale all the variables so that their variance is equal. Intuitively, we expect our algorithms based on Q to be robust to rescaling, since it should be possible to predict one variable in the support from the others in the support even after some rescaling. We can more precisely justify this intuition as follows.. Let X +- DX be the rescaling of

7 Assuming one is using naive implementation of .

42 X, where D is some diagonal matrix. Let Ds be D restricted to rows and columns in S. Note that Z, the covariance matrix of the rescaled data, is just DED by expanding the definition.

Similarly, note 52:d = D1D2:dZ2:d, where D2:d denotes D without row and column 1. Now, recall the term which dominated our analysis of Qi under HI, (*)TE 2 :d/*, which was equal to

Z1,2:dE2 A2:,1

We replace the covariances by their rescaled versions to obtain:

* 5* (Dlyl, :dD :d)D 2 2 7D-(D2:dE2:d,1D1) = D - (/*)TE2 :d/*

For the spiked covariance model, rescaling variances to one amount to rescaling with Di = 1 . Thus, we see that our signal strength is affected only by constant factor (assuming 0 < 1).

We should note though that after normalizing variances, the variance term ||yJ|2 loses its effect in the Q statistic, and Q is essentially affected by just the reconstruction error

||y - X0^1| . This robustness to rescaling is an attractive property because intuitively, our algorithms for detecting correlated structure in data should be invariant to rescaling of data; the precise scale or units for which one variable is measured should not have an impact on our ability to find meaningful structure underlying the data.

2.5 Experiments

On randomly simulated synthetic data we demonstrate the performance of our algorithm compared to other existing algorithms for SPCA. The code was implemented in Python using standard libraries. We refer to both hypothesis and support recovery variants of our algorithm from Section 2.3 as Q.

43 2.5.1 Support recovery

We randomly generate a spike u by choosing uniformly among all k-sparse vectors that are uniform on all coordinates (with random signs). In order for comparison with the work of [DM14], we use the same parameter setting of n = d. We study how the performance of four algorithms (diagonal thresholding, covariance thresholding, Q with thresholded Lasso with A = 0.1, and Q with FoBa with c = 0.1) vary over various values of k for fixed n = d. For covariance thresholding, we tried various levels of their parameter T and indeed it performed best at [DM141's recommended value of T ~ 4, which is what is shown. We modified each algorithm to return the top k most likely coordinates in the support (rather than thresholding based on a cutoff), and we count the fraction of planted support recovered. This is averaged over T = 50 trials. On the horizontal axis we measure k/v5; our metric on the vertical axis is the fraction of support correctly recovered. We observe that across almost all regimes

n =d =6j2 ,=d =1250 1,2 -DT 1.2 - DT - Q-Lasso - Q-Lasso 1.0 - -Q-FoBa 1.0 . -. Q-FoBa

0.8 - 0.8 ~0.6- t~ N0.6

a a 0.2 0 .

-0.2 0.2

0.0 0.0

-0.2 -0.2

0.2 04 0.6 0.8 1.0 1.2 1.4 1.6 1-8 2.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8 2.0 k?, W k,/ 7

Figure 2-1: Performance of diagonal thresholding (DT), covariance thresholding (CT), and Q for support recovery at n = d = 625, 1250, varying values of k, and 0 = 4

of k both versions of Q algorithms outperform covariance thresholding. It is an interesting question to investigate whether the log d factor in our analysis can be removed. Diagonal thresholding is outperformed by all other methods across most values of k.

44 2.5.2 Hypothesis testing

Here we instead generate a spike u by sampling a uniformly random direction from the k-dimensional unit sphere, and embedding the vector at a random subset of k coordinates among d coordinates. For hypothesis testing, in a single trial, we compute various statistics (diagonal thresholding (DT), Minimal Dual Perturbation (MDP), and Q) after drawing n samples from .A(O, I+ OuuT). We repeat for T = 50 trials, and plot the resulting empirical distribution for each statistic. We observe similar performance of DT and Q, while MDP seems slightly more effective at distinguishing Ho and H1 at the same signal strength (that is, the distributions of the statistics under Ho vs. H1 are more well-separated).

Rescaling variables As discussed in Section 2.4.5, our algorithms are robust to rescaling the covariance matrix to the correlation matrix. As illustrated in Figure 2-2 (right), DT fails while Q appears to be still effective for distinguishing hypotheses the same regime of parameters. Other methods such as MDP and CT also appear to be robust to such rescaling

(not shown). This suggests that more modern algorithms for SPCA may be more appropriate than diagonal thresholding in practice, particularly on instances where the relative scales of the variables may not be accurate or knowable in advance, but we still want to be able to find a correlational structure between the variables.

45 110 - Do S MDP 16 - D1 MDP-1 - QO 10 14 -Q1 121

10 A 6 8

6 4

4 2] 2

.5 0.0 0.5 1.0 1-5 2.0 2-5 4 5 6 7 8 9 20 11 value of stafitk

-- DO 4.- DI -QOI 2. Q 1 0

8 2-

4 I

-0.5 0.0 0.5 1.0 1.5 2.0 2.5

Figure 2-2: Performance of diagonal thresholding (D), MDP, and Q for hypothesis testing at n = 200, d = 500, k = 30,0 = 4 (left and center). TO denotes the statistic T under H0 , and similarly for T1. Effect of rescaling covariance matrix to make variances indistinguishable is demonstrated (right)

2.6 Conclusion

We gave a reduction from SPCA to SLR that works up to the computational threshold for SPCA that we believe based on average-case hardness assumptions. One obvious question is if there is a different reduction that extends all the way down to the statistical threshold; this would imply the average-case hardness of SLR under some conditions. A related question is formulating a model more robust than the gaussian spiked covariance model for SPCA, yet still amenable to analysis. It would also be interesting to see if the reduction can be done in the other direction,

46 from SLR to SPCA. One would probably have to restrict the design matrix to a certain class in order to have sufficient control on its distribution.

47 48 Chapter 3

Tensor rank under the smoothed model

3.1 Introduction

Tensors have received a lot of attention over the past years due to their wide application in mathematics as well as statistics and other related fields. We refer to [Lan] for a detailed mathematical introduction to tensors and to [McC],[Moi] for an introduction to the role of tensor decompositions in modern statistics and machine learning, respectively . In the machine learning community, tensors have found applications in phylogenetic reconstruction [Com94l, [MR05],hidden markov models [MR05I, mixture models [HK13], topic modeling [AFH+12], community detection [AGHK13], etc. On a high level, tensors can be thought of as higher order matrices and often times they are more useful than matrices in capturing higher order relations in data. What makes tensors a more powerful tool compared with traditional linear algebra is a uniqueness property of decompositions. More concretely, on one hand, given a matrix M = E' a(') 0 b0) this decomposition is almost never unique, unless we require the factors {a(')}i and {b(')}j to be orthogonal or that M has rank one. This makes the factors {a(z)}, and {b(2)} uninterpretable. On the other hand, given a tensor

T = E 1 a(') 9 b0) 9 c() there are general conditions (e.g see Kruskal [Kru77] or Section 3.6) under which {a(')} , {b0)}, {c(')} are uniquely determined. Even though tensors are more powerful than matrices, they are less well understood and unfortunately, many of the

49 familiar properties of matrices do not generalize to tensors. For example, [Ha's90 shows that in the general setting computing the rank of a tensor is NP-hard. More than two decates later

IHL131 expanded on the prevalence of this computational complexity issue by proving that a plethora of other problems, such as finding the best low-rank approximation, computing the speectral norm and deciding whether a tensor is nonnegative definite are NP-hard too. Our main contribution is to give a rank detection gadget for tensors in the smoothed model, in particular we show that in this model tensor rank detection is not NP-hard. The assumption here is that the model is not adversarily chosen, formalized by a perturbation of the model parameters. Our main technical result is that the rank of Young flattenings of perturbed tensors adds and the analysis involves a careful disentangling of the noise created by the perturbations. We bring Young Flattenings to the TCS community and we conjecture that the tensor machinery developed in this thesis can be used to obtain algorithms for tensor decomposition, again under the smoothed model.

3.1.1 Our results

We study tensor rank detection and tensor decomposition in the smoothed model as intro- duced in [BCMV14b]:

" An adversary chooses a tensor T = Z> w() ® u(1) 0

" Each vector w(') , U(), 0 is p-perturbed to yield iv() , j(j 1

" We are given 'T' = Zi ® iz(7) 0 j().

For tensor rank detection the goal is to recover r while for tensor decomposition the goal will be to recover the factors {j1(i)}i,{J(i},{t(i)}. This model is inspired by smoothed analysis which was first introduced by Spielman and Teng in [ST01j, [ST09I as a framework in which to understand why certain algorithms perform well on realistic imputs. Intuitively,

'An independent gaussian with zero mean and variance p2 /n in each coordinate is added to w(), u(), v() to obtain Cv() i(i)I f() The Gaussian assumption is made for convenience, but our analysis actually applies to any type of perturbation; the key is the independence.

50 good smoothed analysis guarantees show that worst instances are isolated, small perturba- tions of input make instances easy and give best polynomial time guarantees in the absence of any worst-case guarantees. We prove the following main theorem, which gives a gadget for computing the rank of a perturbed tensor in time equal to computing the rank of a matrix:

Theorem 14. Let i E R""'x be a third order perturbed tensor, where T C R""'*f has rank n < R(T) < 3n/2 and t is obtained from the above smoothed analysis model. Let

Xo, X1 ... Xn_ 1 be matrix slices along the first dimension of T. Then almost surely over the noise the following equality holds:

1 1 R(T) = n + -rank(X1X07 X - X 2 2 2X0-X 1 ).

3.1.2 Our approach

We use Young flattenings to prove Theorem 14, in particular, the following lemma is the key part of the proof:

3 Theorem 15. Let T e R xnn be a rank r perturbed third order tensor, T = IW 0 uMs 9 v). For i E [r], let P = wi) 9 u) 9 v) be rank 1 perturbed tensors, so = E'_ 1 Ts. Then almost surely over the noise, the ranks of the Young flattenings tj^,(Ti>^j also add:

r 1 rank(TA ) = rank(( i)^ ), i= 1 where Young flattenings are introduced in (3.4).

Organization

In Section 3.2 we give preliminaries and notations. In Section 3.3 we introduce the Young flattening machinery for general tensors and in Section 3.3.1 we develop our main technical tool in the context of the smoothed model. In Section 3.4 we prove our main theorem. In

51 Section 3.5 we discuss extensions of our rank detection gadget and give a glimpse into how the technique we developed can be used to obtain an algorithm for tensor decomposition. In Section 3.6 we congregate the linear algebra lemmas we use throughout the proof.

3.2 Preliminaries and notations

For an n E N+, let [n] denote the set {0, 1, 2,..., In - 1}. We use 0 to denote outer product and A to denote exterior product. For any diagonal matrix D E R"r, we define D, E Rxf and D' C R(rn)x(r-n) its block components such that

D =Dn 0 (3.1) 0 D'

We use this notation throughout the proof, in particular if Dk is diagonal we denote by

Dnk and D' its corresponding blocks as defined above. For any matrix M, we denote by dimKer(M) the dimension of the kernel of M. We introduce the basics of tensors. In general a tensor is indexed over k-tuples, and k is called the order of a tensor, hence a tensor T can be viewed as a point in Rflxf2x.xfl. Note that based on this definition, a matrix is just an order two tensor. If T is an order three tensor of size m x n x p then T can be thought of as a collection of m matrices of size n x p that are stacked on top of each other. We call these matrices matrix slices along the first dimension of the tensor T.

Definition 16. A rank one, third-order tensor T is the tensor product of three vectors w, u and v, and its entries are

Tijk = WiUjVk

Thus if the dimensions of w, u and v are n 1, n2 and n3 respectively, T is of size n1 x n2 x n3 . Moreover, this can also be written as

T =w 0u v

52 We define the rank of a tensor next:

Definition 17. The rank of a third-order tensor T is the smallest integer r so that we can write r T = iw S UM (g eVM

Tensor T

u1 9v 1 0w 1 U2 0V2 0W2

Figure 3-1: CANDECOMP/PARFAC tensor decomposition of a third-order tensor

Note that the above is a generalization to tensors of a particular definition of the rank r of a matrix M, namely as the smallest number of rank one matrices we need to add up to obtain M. One example of what makes working with tensors a lot more delicate than working with matrices is that while the rank of a matrix has a lot of other equivalent definitions, the above is the only one that can be generalized. For example, the definition of the rank of a matrix M as the dimension of its column/row space doesn't generalize to higher orders because the spans along different coordinates may have different dimensions. We denote the rank of a tensor T by R(T) and the rank of a matrix M by rank(M). Note that Theorem 14 only uses three matrix slices of i to recover its rank, hence we can assume without loss of generality that i is a 3 x n x n perturbed tensor. Let T = WM w(3U( ®v(i) where w(, U, 0) are p-perturbed and r is the rank of t. Let eO, el, e 2 be the elementary basis vectors of R', and decompose T according to this basis as

iT= eO Xo + el ® X1 + e 2 0 X 2 , where {Xj}i E R"nx" are the matrix slices of i along its first dimension. Let U be the matrix

53 with the u(')'s as columns and V be the matrix with the v(2)'s as columns. Decompose the

's along e0 , e1 , e2 as w(z- = _2 dikek and for k E [3] define the diagonal matrices Dk:

d1k 0 Dk= [ -.

0 drk

By equalizing the coefficients in the two decompositions of t above we get

Xo = UDoVT', X 1 = UD1 VT, X2 = UD 2 VT (3.2)

We note that equation 3.2 can also be obtained in the following way. The aim is to show that X, = UDVT for any index *. Recall that as an r-order tensor, T can be written as

T (Zj k) w I

As slices along the first coordinate of the tensor, X, correspond to only looking at the * coordinate of all the w(l components, hence

X= TP., j, (*) (= ( I

X*= wloul) 9 v0) = UD*VT. 1

Let U be the n x n submatrix of U formed by the first n columns of U. Since U(1), U(2),. .. , U(n) are perturbed vectors, Un is invertible almost surely. Moreover let W E R x(r-") be such that UnW gives the submatrix formed by the last r - n columns of U. One could think of W as the weights for a change of basis of the last r - n columns of U, hence W is also a

54 matrix with perturbed columns. Formally, U, and W are such that :

U = [U" U W] . (3.3)

To prove Theorem 14 we leverage Young flattenings that we introduce next.

3.3 Young flattenings

Young flattenings were introduced in [LO13] as generalizations to Strassen's equations [Str83]. While we refer the reader to [LO13] for a detailed treatment of Young flattenings, we sum- marize here only the properties that we use in our proof.

For easiness of presentation, for this section only, imagine T C AOBOC, where dim(A) a and dim(B) = dim(C) = n and A, B, C vector spaces with duals A*, B*, C*. Note that

T can be considered as a linear map B* -+ A 9 C and write TB for this form. Under this notations, Strassen's equations [Str83] may be understood as follows: tensor TB with IdA to obtain a linear map B* 0 A - A ®A 9 C and skew-symmetrize the A 0 A factor to obtain the order 1 Young flattening:

TA^A : B* 0 A 4 A2A0 C. (3.4)

Recall that for a V, A 2V = V 0 V/(v 0 w + w 0 v).

Lemma 18. [L013]If T = a 0 b 0 c has rank one, then rank((a 0 b 0 c)^Al) = a - 1. 2

Proof. Expand a = a1 to a basis a2, .. . .. aa of A with dual basis a1, ... , a' of A*. Then

T^ =lEj [j 0 b] ®[ai A ai S c], so the image is isomorphic to (A/a1 ) 0 c. This gives us that rank(TA^Q) = a - 1. l

1 2 We can similarly define TA : B * @ AP A - AP+ A & C, the order p Young flattening and conclude as above that if T has rank 1 then TAP has rank (a 1)

55 Lemma 19. [LO13] Interpret T G A 3 B D C as a linear map B* - A 0 C. Then rank(TA1 ) < (a - 1)R(T), where R(T) is the rank of the tensor T.3

Proof. Let r = R(T) and let T = Ti + T2 + .. . + T, such that R(T) = iVi. Then

r r rank(Tj^j) = rank(Z(Ti)^') < rank((Ti)^') = R(T)(a - 1), where the last equality was obtained from Lemma 18. L

Next we focus on the case a = 3, explicitly compute the matrix corresponding to the transformation TA 1 and rewrite the result of Lemma 18 in terms of ranks of matrices.

Claim 20. Let T E A9 B 9 C, where dim(A) = 3 be a tensor and TA" : B* O A _ A 2 AOSC be its order 1 Young flattening as defined in (3.4). Let eO, e 1, e 2 be the basis for A and e1 A e 2, eo A e1, eo A e2 the basis for A2 A, and let Xo, X 1, X 2 be matrix slices of T, such that

T= eo9Xo+e1 X1 +e 2 0X2. Then:

eo e1 C2

e1 A e2 0 -X 2 X1 1 AMat(TA ) = e2 A e0 X 2 0 -Xo

e0 A el -X 1 Xo 0

Proof. With the notations above in place we can compute:

TA'(eo 0 0) = (Xo) (Seo A eo + O(X1 ) 9 ei A eo + O(X 2) e 2 A eo

= -#(X 1 ) 0 eo A ei + O(X2) 0 e 2 A eo

1 TA (ei®0 ) = O(Xo) 9 eo A el - 0(X2)0 el A e2

1 TA (e 2 9 #) = -O(Xo) ® e 2 A eo + O(X1 ) 9 el A e 2

3 1 If instead we worked with TA we can similarly obtain rank(T) (P- )R(T).

56 Hence, the matrix corresponding to T^1 is:

eo el 2

e 1 A e2 0 -X2 X1 Mat(T^) =A e 2 A eo X2 0 -XO eo A e1 -X1 Xo 0

Lemma 21. If T e A®9B®C, dim(A) = 3 and T = wouov has rank 1 with w = (do, d1 , d 2 ) and w, U, v not the zero vector, then Mat((T)j') has rank 2 and can be decomposed as follows:

di 1 0 -X 2 X1i Idol

X 0 -X = [0 dovT 0] + -U -d 2V T 0 dovTI 2 [d 1vT

-X 1 X0 0 U 0

Proof. The equality can be checked by direct computation and substituting equations (3.3).

d2,1 do The conditions of u, v, w not being the 0 vector ensures that the pairs of vectors 0

[a U

do

-U and [divT dovT o], [-d 2VT 0 dovT are linearly independent, which gives us 0 rank(Mat(T^1 )) F1

We are ready now to prove our main technical result.

3.3.1 Young flattenings in the smoothed model

In this section we prove Theorem 15, which says that in the smoothed analysis framework the rank of Young flattenings has an additive property.

57 Our overall strategy is to show that the matrices corresponding to the transformations (T)^' can be decomposed into sums of rank 1 matrices whose column span and row span are independent. Since T = i-0)0 it() 9,i), where (, i I,(') are perturbed and so almost surely not equal to the zero vectors, we can apply Lemma 21, to get

di dio

Mat((i)^') =A 0 [dilV(i)T d ov(i)T 0] + -UM [-di2v(i)T 0 djov(i)TI Vi. [01 L I

Lemma 21 also implies that rank(Ii1 ) rank(Mat((Tj)^l)) = 2, Vi. Recall that T

j= 1 Ti, hence

r dio r dio

Mat(Tii ) = 0 [di 1 vi)T dov(i)T 0] + )] [-di2 v(i)T 0 djov(i)T

U~=1 0 or using notations from Section 3.2:

-UDO--D 2 UDOD 1 SD1 VT DoVT 0 Mat(TA 1) = 0 U x -D 2 VT 0 Do VT LU 0

To show that rank(T^') = Z i ra'nk(j=i) = 2r almost surely it suffices to show that

3 the above matrices, let them be C1 E R x2r and C 2 E R 2rx3n have full column and row rank, respectively, almost surely. Note that we are in the case r < 3n/2. It suffices to show the above for r = 3n/2, since in the case r < 3n/2 would require linear independence of a subset of the vectors from the case r = 3n/2. From now on we assume r = 3n/2. We focus on C1 and show that the dimension of its kernel, dimKer(Ci), is 0 almost surely. By using

58 the notation introduced in Section 3.2 we can rewrite C1 as

-UD--D2 --UnWD--'D' UnD-Da UnWDJ-'D' C1 = Un UnW 0 0

0 0 Un UnW

3 Let T1 , T3 E R , n and T2 , T4 E R3nx(-n) be sets of columns of C1 such that C1 =

[T1 T2 T3 T4] (Ti's correspond to the sets of columns above). By direct computation the following relations hold:

-UnWD' -1D + UnD-Dn 2W UnWl1

Dn T2 -T 1 W= 0 = 0 0

UnW D--1 D' - Un D-jDn1W Un W2

T4 -T 3 W= 0 = 0 ,where 0 0

1 W1 = -WD'- D' + D7 Dn2W and W2 = WD-'D' - D-jDn 1 W.

Note that to complete the proof we are only interested in the dimKer(C1 ), hence we will

manipulate C1 by adding and substracting its columns without changing its column span

59 and implicity without changing its dimKer(CI). We have:

dimKer(C ) = dimKer( 1 [T1 7' T T41 )

= dimKer( [T1 T2 -T 1W T3 T4 -T 3 W)

-UnD-Dn 2 UnW1 UnD-jDi UnW2 = dimKer( Un 0 0 0)

0 0 Un 0

Recall that for any A, dimKer(ACI) = dimKer(C1) and moreover U is invertible almost surely, as seen in Section 3.2. Hence we have:

Un-i 0 0

dimKer(C1 ) = dimKer 0 Unl 0 C1

0 0 Un

-D-- W1 W2

= dimKer In i 0 0 0 0 0 In 0 ii (3.5) dimKer [W1 W21 ,I

where (*) is true by Lemma 25. Recall now the definition of W1, W 2 :

1 1 W1= -WD- D0 + D--Dn2W and W2 = WD - D' - D-Dn1 W.

We view the determinant of the matrix [W1 W2] as a polynomial in variables the diag- onal elements of D', DI, D2 i and Dn2 (which are perturbed). If this polynomial is not the zero polynomial, since the set of roots of any multivariate polynomial has measure zero, then almost surely the determinant would not be zero and so dimKer( [W1 W2]) = 0. Suppose

60 for the sake of contradiction that Det( [W1 W2]) is the zero polynomial. We apply Lemma

23 with A = W1, B = -D-jD, 1 W, C = -WD> and D = D' and obtain

aDet IW1 W2 ]_ 8Det [w- = Det( [W WD') 1d'd12 ..-(-9d'9-n)- 1

hence Det( W1 WD-') is also the zero polynomial. Next we apply Lemma 24 with

A=WD'-', B = -WD -1D', C = -D- W and D =Dn2 . Note that WD-', -D-JW E S, E R (r-f) x( 2fl-r) R"(x,-n) and let S2 E R(r-n)x(r-n) be the bottom submatrix of WD>-' and be the top submatrix of -D,-jW. Lemma 24 gives us that Det(S2 )Det(-Si) has to be the zero polynomial, as it is obtained by taking partial derivatives of Det( [W1 WD ]) which is the zero polynomial. However, recall from Section 3.2 that W E R nx(r-n) is a matrix with perturbed entries and multiplying it to the right by D -1 only has the effect of scaling its columns, while multiplying it to the right by -D-1 only scales its rows. This ensures that S2, Si as submatrices of corresponding dimensions are full rank almost surely, and so Det(S2 )Det(-S) can not be the zero polynomial. This contradiction means that our initial assumption had to be false, and so Det( [Wi W2]) is not the zero polynomial and dimKer(C1 ) has to be 0.

By an identical analysis we can obtain that dimKer(C2) 0 and so C1, C2 have full column and row rank, respectively, equal to 2r. This concludes the proof.

Remark 22. Our disentangling method reduces matrices with correlated Gaussian noise per- turbations to smaller matrices with independent Gaussian perturbations through polynomial manipulations. Hence we can apply standard anti-concentrationresults (Carbery- Wright, for instance) to conclude that tJ is not only full rank almost surely, but also well condition. In particular, its smallest singularvalue is at least an inverse polynomial, with failure probability at most an inverse polynomial.

61 3.4 Proof of Theorem 14

For i as described in Theorem 14, let X0 be its first matrix slice along the first dimension.

From equations (3.3), we have X0 = UDoV, where U, Do, V are perturbed, hence almost surely Xo will be full rank. Recall from Claim 20 that for every tensor i we can write Mat(TA') as

0 - X2 X1 - - 0 Q Mat(TAi) X2 0 -XO =, -Q R- -X1 X 0 0

0 -XO X2 where R = IQ -2X]I Xo 0 -X1

Using the Schur Complement we have

0 Q 1 0 -QR--O Q LQ R --R--lQ I L 0 R

Note that the second factor in LHS above is full rank, hence Mat(tA") has the same rank as the RHS above, i.e. rank(Mat(iA)) = rank(R) + rank(-QR-1 Q). Also, since Xo is full rank, R will have rank 2n and by using the definitions of R, Q, Q we have -QR- 1 Q =

X 2 X 'X 1 - X 1 XO-X 2, hence

4 rank(Tj ) = 2n + rank(X2 X6-'XI - X 1X-17X 2 ).

Now, since T is perturbed, by Lemma 15 and Lemma 21 we have

r 1 rank(tAA ) = ranlk(iA' 2R(T).

62 Puting the previous two equations together we obtain the desired result:

1 R(T) = n + --rank(XX671 X - X X&-'XI). 2 2 2

3.5 Future directions

One way to extend our result is by considering Young flattenings of higher order p. Let's consider the case p - 2, so a = dim(A) = 5 and we're interested in TA 2 . We are still able to

0 Q write Mat(Tj(tA2)2 ) [ , but now R and the Q's have different dimensions. In particular, if t = ao Xo +... 5a X5 then R will be a 6n x 6n matrix having 6 blocks of X on the diagonal and

0 [X 1 , X 2] [X 1 , X3] [X1 , X 4]

[X 2, X 1] 0 [X 2, X3] [X2, X4]

[X3 , X 1] [X3, X21 0 [X3, X4]

[X4 ,X 1] [X4 , X21 [X4 , X3] 0

Note that QR-1 Q is the same as QQ if we substitute the blocks [Xi, Xj] with XiX0-1Xj - XXo-1 Xi. It is still true that rank(Mat(TA2 )) rank(R) + rank(-QR-Q), and that rank(TA 2) (4)R(T) = 6R(T), where this last inequality cames from the generalization of Lemma 19. Hence, in the case p = 2 the inequality that we obtain is

1~ R(T) > n + -rank(-QR-Q). 6

We conjecture that our noise disentangling lemmas from Section 3.6 can be generalized to prove that for perturbed tensors the above becomes equality. For general p, the 1/6 is just

I/(2). Recall that QR-1 Q has dimension ( ) x so it has maximum rank ( )n making our method work potentially for R(T) up to 2 n.

63 Decomposition algorithm for perturbed tensors

We give a high level overview of how the machinery developed here may be used to obtain an algorithm for tensor decomposition in the smoothed analysis model. In particular, a generalization of Lemma 21 gives us the insight. The decomposition of Mat(TA1 ) presented in Lemma 21 for the case R(T) 1 holds for any rank:

0 -X 2 X 1 -D2Do1U DIDO1U

X27-X = [ [D 0 1 VT DoVT 0 + -U [-D 2 VT 0 DoV -X1 X0 0 U 0

where XO, X 1, X2 are matrix slices of T and U, V, Di are as introduced in Section 3.2. Note that the Young flattening above was obtained by using only the first 3 matrix slices of T along the first dimension. One could imagine using other slices to obtain different flattenings. The key property of the decomposition above is that it separates the column span of Mat(T"')

into two parts, the first part involving X 2, Xo and the second part involving X 1, Xo, let

this be Type 1 and Type 2. By substituting X1 with other two slices, X3, X4 we obtain three different flattenings. One can show that the intersection of the column spans of the

-D 2DOU three corresponding flattenings is precisely the column span of Type 1, 0 .One U

-D 3D;-U1 can repeat the intersection to obtain another matrix with column span 0 it L U turns out that the vectors ua E R' (columns of U) can be recovered as the only solutions Uici -D2DO'U UiC2 -D3Do1U ColSpan 0 and 0 ColSpan to0 for0 some (

[ Ui o \a L U U U constants c1, c2 . These constants will enable us to recover the wi's and similarly, by looking at the row spans instead of the column spans, one can recover the vi's. There are indications

64 that this algorithm would recover the components of perturbed tensors T for rank up to (1 + E)n, for a small enough constant E.

3.6 Linear algebra lemmas

Lemma 23. Let A c Rx(n--"), BC E R x and D c R'mx" be matrices such that D is diagonal with (di)im as diagonal, where di's are random variables independent of each other

and of A, B, C. If M =A B-CD] andN= [A -C] then

ODet(M) Odj~2 .. Odm= Det(N ) 6id1 ad2 .. .&odm

Proof. We look at the determinant of M as a polynomial in the random variables (dj)j

6Det(M) = Det(N) d1 Md2 ... d

Lemma 24. Let A E Rnx(n-m), BC E Rnxn" and D e R nx" be matrices such that D is diagonal with (di)i n as diagonal, where di's are random variables independent of each other and of A, B, C. If M = [A B - DC] and A 1, C1 Rmx (nm-"), A 2 , C2 C R(n-m)x(n-m) are

65 such that A =A and C [C then A 2 C2

=Det(M)Det(A 2 )Det(-C1 ) Od18d2 ... adm

Proof. Let B 1, Di E Rxm and B 2 C R(n-m)xm, D 2 E R(n~m)-x(n-m) be such that B = B1 B2 and D= I 0 . With this new notation, M can be written as 0 D2J

- DiC1 M= A1 B1

A 2 B2 - D2 C 2

We look at the determinant of M as a polynomial in the random variables (di)i

B1 - DI01 as unique representatives for rows/columns. First note that each such monomial picks exactly one element from each column of B1 - D1 C1 , hence every (di)i

Suppose there is a monomial that doesn't vanish but picks either an element (A 1 )jj or

(B2 - D 2 C2 )ij. If it picks (A 1 )ij, then it can pick no other element from the i'th row of M and so di will not appear. Hence, when taking the partial derivative with respect to di this monomial has to vanish. If it picks (B2 - D2 C 2 )ij then out of the last m columns of M it can only pick at most m - 1 other elements. Hence, by pigeonhole principle and using the uniqueness of representatives from each row and column, we get that at least one row,

66 k, of B1 - D1C1 will not have a representative in this monomial. When taking the partial derivative with respect to dk this monomial would have to vanish again. This completes the proof, and so we have DDet(M)_ daD2 .. aM Det(A 2)Det(-CI), ad1ad2 ... adrn as desired. El

Lemma 25. For B C Rbx, A E R'xa and C E R"x' and D e Rbxd such that C is invertible we have 0 C dimKer = dzmKer( [A D]) (-A B D_

vii-

Proof. Let V2 Ker 0 C 0 where v, E Ra, v c Rnv e R d. Then CV = 0 ([ BDj 2 3 2 V3 and Avi + BV 2 + Dy 3 0. Since C is invertible, this implies v 2 = 0 and consequently

C Avi + DV 3 '0,so V3 Ker( [A D]). Similarly, for any Vi E Ker( [A D]), 0,

L 3 0 C 0 will be in Ker and this bijection ensures that the two spaces have the same A B D dimension. El

Kruskal condition

For completeness, we mention here the sufficient conditions for uniqueness of tensor decom- position as formulated by Kruskal in [Kru77].

Definition 26. The Kruskal rank of a set of vectors {Aj}< is the maximum r such that all subsets of at most r vectors are linearly independent. For a matrix A this is denoted as krank(A) and the set of vectors correspond to the columns of A.

67 Kruskal [Kru77] showed that the decomposition in Definition 17 is unique if

krank(W) + krank(U) + krank(V) > 2k + 2

which is a milder condition compared to matrix decomposition. For instance, for a rank-2 decomposition, this reduces to having all factor matrices W, U, V being full column rank, while in the matrix case we need the stronger condition of orthogonality.

68 Chapter 4

Stronger Fine Grained Hardness via

#SETH

The field of fine-grained complexity aims to establish quantitive bounds on the complexity of computational problems solvable in polynomial time. Over the last few years, the area has seen lots of progress, including the development of tight conditional hardness results for problems such as computing the edit distance or the longest common subsequence between two strings [B15, ABW15, BK15]. These results are based on plausible complexity theoretic conjectures, such as Strong Exponential Time Hypothesis (SETH), which postulates that the satisfiability of CNF formulas with n variables cannot be solved in time c" for some c < 2. This hypothesis is consistent with the state of the art in satisfiability solving algorithms, the best of which run in time roughly 2n('0-()m. Other popular conjecture include 3SUM and APSP [VW15J.

Since these hardness results rely on conjectures, it is important to make them as weak as possible. This line of research has attracted significant attention over the last few years. For example [AHWW16] showed that quadratic hardness of edit distance can be shown assuming the satisfiability of general NC circuits cannot be solved in c' time for c < 2. Since NC circuits are significantly more expressive than CNF formulas, the latter assumption is significantly weaker than SETH. Similarly, [AVWY15] show conditional hardness results

69 assuming that either SETH, 3SUM or APSP hold. In this paper we explore an alternative proposal for weakening SETH, by assuming the hardness of counting the number of satisfying assignments (#SAT), as opposed to just finding one. Such a hypothesis, called #SETH, was formulated in [Wil18] 1, and states that counting the number of assignments to CNF formulas with n variables cannot be solved in time c' for some c < 2. Although the best theoretical upper bounds for the complexity of #SAT and SAT are essentially the same [AWY15, PPSZ05], counting appears to be much more difficult in practice. For example, [Varl5 states that "#SAT is a really hard problem ... in practice quite harder than SAT"; similar observations appear in [CMV13]. Thus, reducing from #SAT gives a stronger evidence of hardness than reducing from SAT itself. We demonstrate the applicability of this assumption in fine-grained complexity, by using it to show hardness of a variety of problems in pattern matching, data structures, graph algorithms and machine learning.

Organization The rest of this chapter is organized as follows. In Section 4.1 we introduce the hardness conjectures on which lower bounds are based. In Section 4.2, we show counting based conditional hardness of the pattern matching under edit distance problem. In Section 4.3, we discuss problems from machine learning and in the Sections 4.4, 4.5, 4.6 we discuss graph problems: the Wiener index, dynamic graph problems and counting mathing triangles respectively. Finally, in Section 4.7 we give a proof in the opposite direction: we show average case hardness for the Counting Orthogonal Vectors Problem assuming the worst case hardness for the minimum-weight k-Clique Problem.

4.1 Preliminaries

Definition 27 (k-SAT Problem). Decide whether a given conjunctive normal form formula on N variables and M clauses, where each clause has at most k literals, is satisfiable. The #k-SAT problem asks to output the number of satisfying assignments.

'See also [DHM+14] for a related but weaker #ETH assumption.

70 Definition 28 (SETH). k-SAT cannot be solved in time 0(2 (1 -)N ) where E > 0 is a constant independent of k.

Definition 29 (#SETH). #k-SAT cannot be solved in time o( 2 (1--)N) where E > 0 is a constant independent of k.

Definition 30 (Counting Orthogonal Vectors (COV) Problem). Given two sets A, B C

{o, 1}" with JAI = n, |B| = m, we want to output the number of pairs a e A, b e B such that a -b = 0. That is, we want to count the number of orthogonal pairs of vectors.

The following conjecture is implied by #SETH.

Definition 31 (Counting Orthogonal Vectors Conjecture). For any constant -Y> 0 and any d = w(logn), the Counting Orthogonal Vectors Problem requires Q(nm)1-0() time, where m = n-.

Note that #SETH implies Counting Orthogonal Vectors Conjecture since the reduction in [Wil04] from SAT to OV preserves the number of solutions. Each of the following sections treats one class of problems.

4.2 Pattern matching under edit distance

4.2.1 Preliminaries

Edit distance For any two sequences P and Q over an alphabet E, the edit distance edit(P, Q) is equal to the minimum number of symbol insertions, symbol deletions or symbol substitutions needed to transform P into Q. It is well known that the edit distance induces a metric; in particular, it is symmetric and satisfies the triangle inequality. In our hardness proofs for this section we use an equivalent definition of edit distance that will make the analysis of the reductions easier, in particular it claims that allowing only deletions and substitutions would give the same result.

71 Observation 32. For any two sequences P, Q, edit(P, Q) is equal to the minimum, over all sequences T, of the number of deletions and substitutions needed to transform P into T and Q into T.

Proof. By the metric properties of the edit distance, edit(P, Q) is equal to the minimum, over all sequences T, of the number of insertions, deletions and substitutions needed to transform P into T and Q into T. To get rid of the insertions, observe that if, while transforming P, we insert a symbol that is later aligned with some symbol of Q, we can instead delete the corresponding symbol in Q. Thus, it suffices to allow deletions and substitutions only. LI

Pattern matching under edit distance [CM07 For two sequence P and T over an al- phabet E, the pattern matching under edit distance problem is to compute the minimum edit distance between P and any suffix of T, T(i), for i =1, ... , T|.

In our reduction we will use vector gadgets that embed into strings the vectors inputs to the COV problem. In the analysis of the reduction it will be intuitive to consider these gadgets as nodes in a graph with edges given by whether two vectors are orthogonal or not. We next introduce some graph theoretic lemmas that we subsequently use in our reduction.

Definition 33 (McDiarmid's indequality). Let X 1, X 2 , ... , X, be independent random vari- ables and assume that f satisfies

sup if (XI, X2 .... - xn) - f (X1, x2, . .. , Xj,i x, i+1,... <, ) ci X1,X2-.-Xn,Yi for all 1 < i < n. Then, for all t > 0 we have

2 Pr [JE[f(X1 ,..., Xn)] - f(X 1 ,..., X) > t] 2exp - t

Definition 34 (Knuth Shuffle - informal). Given a finite sequence, the following procedure produces an unbiased permutations of the original sequence. Put all the elements into a hat and continuously determine the next element by randomly drawing an element from the hat

72 until no elements remain.

Lemma 35. Let G (U U V, E) be a bipartite graph with two parts U and V of size |U| = |V| = n. Let 6 JE|/n 2 be the fraction of edges present in the graph. Let S C U x V be a set of edges (not necessarily a subset of E). Let G, = (U U V, E,) be a random graph obtained from G by applying a uniformly random permutation w to the vertices in V. We consider the random variable |E, n S|, which is the number of edges of E' that are present in S. Observe that E [IE, n Sf] = 61SI. We have:

Pr [ E, n S1 = 61S| t Vnlogn] 1 - n-().

Proof. Generate a random permutation using the Knuth shuffle. Knuth shuffle generates a random permutation using n random independent variables and we can check that the resulting random variable E, n S1 satisfies the constraints of Definition 33 with ci = 2. We use the inequality given in Definition 33 with t = Vn log n to get the desired result. El

Lemma 36. Let G = ({0,...,n - 1} U {O',..., (n - 1)'}, E) be a bipartite graph. Let

6 JE|/n 2 be the fraction of edges present in the graph. Let G, = ({0,... , n - 1} U

(0)/,-.., (n - 1)'}, Er) be a random graph obtained from G by applying a uniformly random permutation 7r to the vertices in {(0)',..., (n - 1)'}. With probability 1 - n-'(), for every 0 ij, k < n -i we have

E., n Si,j,k| = 6k v' log n, where Si,j,k A {(i, (i)'), . . . , (i + k - 1, (j + k - 1)')}. The indices of the vertices wrap around, that is, if i > n, then it corresponds to vertex i - n. Similarly for (j)'.

Proof. Use Lemma 35 and use the union bound over all n3 sets Sij, k.

4.2.2 Reduction

In this section we prove the following theorem.

73 Theorem 37. Let P and T be two sequences of length n. Let T(i) A T,...,TTj be the

suffix of T that starts with the i-th symbol. Computing edit(P, T(i)) for every i = 1,. .. ,T

requires n4 /3-o(1) time assuming COV Conjecture.

1 Let A A {a, ... , aN} and B A {b ,... , bN} be the input to the COV problem. Let 6 be such that the number of orthogonal pairs is 6N 2 . Let G(a) and G'(b) be the vector gadgets from [B1151. We include their description and properties in Appendix A.1 for completeness. Let 1 A V/'n(log n)2 , let 0(a) A 5'G(a)5' and G'(b) = 5'G'(b)5'. Finally, we define

P A G(al)G(a2)...G(aN)6666...

and the text T as follows:

2 T A O'(b7()'(bN) . .. (b7r(N))I(b"(l))d'(b( .. 61 ( )

w gives a permutation obtain from a Knuth shuffle of [N]. The number of symbols 6 in the first sequence is equal to the length of T. Note that the sequences are of length O(N 5 ) implying the promised lower bound, as (N1 5) 4/3 = N 2. Let T' be 1 TA G'(b(2+l) )'(bs+) . .. '(br(N1 )G/(b-( ) )(7) 61( )

Intuition for the reduction The vector gadgets are constructed such that each optimal allignment for a suffix of T will give us the count of ortogonal pairs among n different pairs of vectors from the input to COV. Adding all these up will gives the final count of orthogonal vector pairs. The vector gadgets in T are suffled to ensure concentration of the number of orthogonal pairs among the different suffixes. Note that the padding with 5s between different gadgets is equal to the concentration gap and ensures that the desired allignment is the optimal one. In particular, it ensures that every pair of vectors is acounted for in exactly one suffix.

74 We know that edit(G(a), G'(b)) = C - 2[a I b] (4.1) for some quantity C that depends only on d, the length of vectors a and b. We claim:

n edit(P, T') = C' - 2 Z[ai, b7'('+j) j=1 where C' = nC + TI. This is equality is sufficient to show COV hardness.

edit(P, T) < C' - 2 EnI[a , b7(a+j)] is immediate: align vector gadgets in pairs and use eq 4.1.

In the rest we prove edit(P, T) > C' - 2 EZ'[aj, bs(2+j)]. Multiple times we will use the fact that k S [aj+t I b~r(j' t)1 = k65 \/ log n1 (4.2) t=1 holds for all j, j', k. We get this by using the fact that b's are permuted randomly and using Lemma 36.

Consider an optimal alignment between P and T' and build a bipartite graph G =

({1, ... , N} U {(i + 1)',. . . (2N)'}, E). Add edge (j, (j)') to E if the j-th vector gadget in P is aligned with the (j)'-th vector gadget in T.

Consider two cases.

Case 1. For every (j, ') E E we have j' = i + j. This means that, for every j, if a vector gadget G(ai) is aligned with another gadget, then it must be G'(b"(i+j)). The lower bound follows from eq 4.1: if G(aj) and G'(b(ai+j)) are not aligned, the contribution is at least the total length of the two gadgets; if the gadgets are aligned, the lower bound follows from the proof of Lemma 4 in [B115] (see Case 1.2).

75 Case 2. Exists (j, j') C E such that j' # i +j. In this case we will show that edit(P, T) ;>

C' - 26n + 1/100. We split P and T' into substrings. Start with j = 1 and j' = i + 1. Find (k, k') E E with minimum k > j such that k' # k + j' - j. Let e be the sequence between vector gadgets corresponding to j and k and let e' be the sequence between vector gadgets corresponding to j' and k'. Then set j = k and j'= k' and repeat. For every found pair of e and e' we lower bound the contribution to the edit distance from symbols in e and e'. Let m = min(k - j, k'- j') and d = max(k - j, k'- j') - m. We show that the contribution to the edit distance from e and e' is lower bounded by (C - 26)m + dl/100. Summing over all pairs e and e' we get the required lower bound on the edit distance. Suppose that k - j < k' - j'. The other case is analogous. Let t be the largest integer such that j < t < k and (t, t') E E for t' = t + j' - j. Let f be the prefix of e before the vector gadget corresponding to t and let g be the suffix of e after the vector gadget corresponding to t. Similarly define f' and g'. By eq 4.2, the contribution to edit distance from f and f' is (t - j)(C - 26) V'nilogrn. We claim that the contribution from g and g' is at least dl/100. Let c = k - t and c' = k' - t'.

Since all c and c' gadgets in g and g' are not aligned, they contribute at least clo + c'l' where lo is the length of G(a) and 1' is the length of G'(b). Finally, we have additional contribution of max(0, d1o - c'l - clo) from sequence g' (sequence g' has more symbols 5, we subtract c'1' + clo symbols corresponding to the gadgets G and G'). The final contribution is clo + c' + max(0, dl - c'l - clo) > (C - 26)lo + dl/100.

4.3 Machine learning problems

Using the COV Conjecture we can show conditional hardness for Kernel Support Vector Machines, Kernel Principal Component Analysis, Kernel Ridge Regression, batch gradient computation in neural networks, optimizing the last layer of neural networks. This is can be done using the same reductions as in [BIS17J. Note that this generalizes the results from [BIS17] by simplyfing their hardness assumption. All their results are in the worst case; in the next section we generalize their results in a different direction: we give an average case hardness result for gradient computation in neural networks with ReLu activation functions.

76 4.3.1 Gradient computation in average case neural networks

We can show hardness for batch gradient computation in depth 3 neural networks when the input vectors and the neural networks come from a simple distribution.

4.3.2 Reduction

Consider the polynomial

g(A, B) SY 11(1 - akbk). aGA kc[d] beB COV implies that evaluating g(A, B) is hard on the worst case input A, B. Since g(A, B) E

{, ... , NM}, to evaluate g(A, B) it is sufficient to evaluate g(A, B)(mod p) for O(log N) dis- tinct primes p < (log N) 0 (1).2 Note that the degree of the polynomial g is 2d. By [BRSV17], the following connection holds. For any prime p > 100d, to evaluate g(A, B)(mod p), it is sufficient to evaluate g(A', B')(mod p), where each vector a' E A' and b' E B' has entries uniformly at random from [p]. Indeed, consider the polynomial h(t) = g(A + tA", B + tB"), where A" and B" have vectors with uniformly random entries and we think of the sets as n x d matrices. Note that the degree of h is 2d and that h(O) = g(A, B) is the quantity that we want to compute. Therefore, to evaluate h(0), it is sufficient to evaluate h(1), .. . , h(2d + 2) and interpolating the polynomial. Finally, note that A + tA" has entries uniformly random mod p for t that is not divisible by p. This gives us the average case result.

Theorem 38. Let p be a prime. Given a set of vectors B ; [p]d with |B| = M, we construct a neural network with the following properties.

e Given a vector a E [P]d, the network evaluates the function f(a) such that

f(a) = Y 1(1 - akbk) (mod p). bEB kE[d]

2 Indeed, by the Chinese remainder theorem it suffices to evaluate the expression on k distinct primes Pi, .-- , Pk such that p1 ... Pk > MN. The existance of k = O(log N) such primes pi < (log N)0 (1) (remember that M = No for a constant a > 0) follows from the theorem.

77 * The network is of size M(dp)o() and has 2 layers of hidden neurons.

* The network uses ReL U and the identity activation functions.

* Each of the weights in the network is either a constant or equal to an entry bi, i E [d] for some vector b E B.

We prove the theorem by constructing a neural network for every b C B of size (dp) 0 (') that has 3 layers and evaluates the function fb(a) such that

fb(a) = 7 (1 - akbk) (mod p). kE[d]

We construct the final network for f by putting all networks fb, b E B together. We will use the following lemma.

Lemma 39. For any set X C Z and a function g : X -+ Z we can contruct a neural network with a single input neuron, a single output neuron and O(|XI) hidden neurons such that the network outputs g(x) on every x c X.

Proof. We write g(x) = EEA g(a) (x > a] - [x > a + 1]). This expression can be easily expressed using a neural network with the ReLU activation function by observing that [x > a] = max(x - (a - 1), 0) - max(x - a, 0). L

We will use Lemma 39 to implement the discrete logarithm function defined as follows.

1(x) = -10d if x = 0(mod p) and 1(x) = y if x z 0(mod p), where y C [p] is the unique integer such that 29 = x(mod p).3 As a first step, we will implement function fb using a network with 5 layers of hidden neurons. Then we will decrease the number of hidden layers to 2. The network is as follows.

* The first layer of hidden neurons has d neurons that evaluate to 1 - akbk for every k E [d]. This can be achieved by connecting the k-th hidden neuron with the k-th input neuron (with value ak) and setting the weight of the connecting edge to -bk. We 3 fHAB02] use similar ideas to construct low-depth circuits for several computational problems.

78 can add 1 to the output because, without loss of generality, we can assume that the constant 1 is among the input neurons.

" For every neuron with value 1 - akbk we apply Lemma 39 with X {1 - p 2 to evaluate 1(1 - akbk). This adds 2 layers of hidden neurons. As a result, the k-th

neuron in the third layer of hidden neurons evaluates to 1(1 - akbk).

* We add the fourth layer of hidden neurons with a single neuron: it sums up all 1(1 -

akbk).

* We use Lemma 39 again but this time with function ', which is defined as follows. If X < 0, we set l'(x) = 0. If x > 0, we set l'(x) = y where y EE [p] is such that y = 2x(mod p). This adds the fifth layer of hidden neurons and in the last layer we have one neuron, which is the output neuron of the network.

Correctness of the construction. If fb(a) = 0(mod p), then there exists k E [d] such that 1 - akbk = 0(mod p). The corresponding neuron in the third layer evaluates to -10d. The large negative value forces the neuron in the fourth layer to evaluate to a negative quantity. This, by the definition of ' forces the network to output 0. Consider the case fb(a) # 0(mod p). This means that 1 - akbk # 0(mod p) for all k E [d]. By the definition of the discrete logarithm, the output of the network is the product of quantities 1 - akbk, which is what we needed.

Reducing the number of layers. We note that, if a layer consists only of gates with the identity activation function, we can remove this layer and connect the layer before and after with direct edges. This can increase the size of the network but by no more than a polynomial factor. This immediately implies that the network described above with 5 layers of hidden neurons can be made to have only 2 layers of hidden neurons. This is because the 1st, 3rd and 4th layers have gates with identity activation functions only. These layers can be removed and we are left with a network with 2 layers.

79 4.3.3 Hardness results

Theorem 38 allows us to prove the following two hardness results.

Theorem 40. For any constant a > 0, any d = w(log n) and any prime p with 100d < p < (log n)O(') the following holds. Let n be the size of a network where each weight of an edge is either a constant or equal to an integer ri for some index i. Integers ri are independent samples drawn uniformly at random from [p]. The network has a single output neuron. Let m = n' be the number of d-dimensional input vectors with entries drawn uniformly and independently at random from [p]. Let f(b) be the output of the network on the input vector b e B, where B denotes the set of input vectors. Computing ZbeB f(b) with probability of correctness > 1 - d- 10 requires Q(nm) 1-o() time unless the COV conjecture is false.

Proof. We use the network from Theorem 38. By the properties of the network, EbEB f(b) allows us to evaluate g(A, B)(mod p). The rest follows from the discussion at the beginning of Section 4.3.2. l

Theorem 41. Consider the network from Theorem 40. Let 1 : R -4 R be a function such that l'(0) # 0 (it has non-zero derivative at 0). Let wj be the weight of the j-th edge incoming to the output neuron. Computing (j 3 >j L with probability of correctness > 1 - d-0 requires Q(nm)l-o(l) time unless the COV conjecture is false.

Proof. We observe that the last neuron of the network has the identity activation function (it simply sums up the input values). We set wj = 0 for all j. Thus, for any b E B,

I(f a (b)) l'(0)f(b). The rest follows from the discussion at the beginning of Section 4.3.2.

4.4 Wiener index

Given a graph, the Wiener index of the graph is the sum of distances between all pairs of vertices. The following reduction was used in [RVW13 to show a quadratic conditional

80 lower bound assuming COV Conjecture for the diameter problem of deciding between a 2 diameter or a 3 diameter in a graph. Construct a graph with three layers. The first layer has one vertex for every vector from A, the third layer has one vertex for every vector from B. The second layer has one vertex for every one of d dimensions. We connect a vertex corresponding to a vector a c A with a vertex corresponding to dimension i if and only if ai = 0. Similarly, we connect a vertex corresponding to a vector b E B with a vertex corresponding to dimension i if and only if bi = 0. Since instead of asking for the diameter, the Wiener index asks for the sum of distances between all pairs of vertices, this construction is able to count the number of orthogonal pairs.

4.5 Dynamic graph problems

We consider the dynamic graph problems studied in [AW14I. It turns out that most of the SETH hard problems have a counting version for which it is possible to obtain hardness by a reduction from the Counting Orthogonal Vectors Problem. Below we first state the problem for which SETH hardness result was known and then we state the corresponding counting version.

" SC2. Maintain: directed graph, update: edge insertions/ deletions, query: "Are there more than 2 strongly connected components?". Assuming SETH, either amortized

update time or amortized query time is m1 0().

Counting version. Same updates, query: "What is the number of strongly connected components?".

" #SSR. Maintain: a directed graph with a fixed source s, update: edge insertions/deletions, query: "Given 1, is the number of nodes reachable from s less than I?". Assuming SETH, either amortized update time or amortized query time is m1 0(l).

81 Counting version is the same-we can reduce the Counting Orthogonal Vectors Prob- lem to #SSR.

" ConnSub. Maintain: a fixed undirected graph and a vertex subset S, update: in- sert/remove a node into/from S, query: "Is the subgraph induced by S connected?". Assuming SETH, either amortized update time or amortized query time is ml-"l).

Counting version. Same updates, query: "What is the size of the subgraph induced by S?".

" SubUnion. Maintain: a subset S of a fixed collection X = {X 1,... ,X"} of subsets over a universe U, wich EZ IX = m, update: insert/remove a set Xi into/from S, query:

"Is uxicsXi = U?". Assuming SETH, either amortized update time or amortized query time is ml-"().

Counting version. Same updates, query: "What is I Ux,,s XiI equal to?".

* 0-PP. Maintain: a collection X of subsets X 1, ... , Xk C [n], update: given i, j, insert Xi n Xj into X, query: "Given index i, is Xi = 0?". Assuming SETH, either amortized update time or amortized query time is nl-o(l).

Counting version. Same updates, query: "What is 1X21 equal to?".

" ST-Reach. Maintain: a directed graph and fixed node subsets S and T, update: edge insertions/deletions, query: "Are there some s E S, t E T s.t. t is unreachable from s?". Assuming SETH, either amortized update time or amortized query time is ml-o(1).

Counting version. Same updates, query: "What is the number of pairs s E S, t E T s.t. t is unreachable from s?".

4.5.1 Reductions framework

The reductions from SETH to the dynamic graph problems presented above all follow the same structure using the same graph construction. As noted in [AW14I, similar constructions are used in prior papers [CLR+14, PW10]. In this section we give a description of this general

82 construction. The set of variables V corresponding to the SAT formula are split into two sets U and V

U of size n/2 each. Three sets of nodes are created: S has 2 n/2 nodes, each corresponding

to a partial assignment of the variables in U, T has 2n/2 nodes, each corresponding to a partial assignment of the variables in V and C has 0(n) nodes each corresponding to one clause. Then the edges are added: one direct edge from each partial assignment s E S to a clause c C C iff s does not satisfy c, and a directed edge from c to a partial assigment t C T iff t does not satisfy c. Then, there is a satisfying assignment to the formula if and only if there is a pair of nodes s E S and t E T such that t is not reachable from s. Hence, any algorithm that can solve this static ST-reachability problem would decide the satisfyiability of the formula and violate SETH. [AW14I makes this static construction dynamic in the following way. We give now the description of the construction for #SSR(counting the number of nodes reachable from a source). Instead of having all nodes of S in the above graph G, in the dynamic graph G' there is a single node u. There are 2n/2 stages, one for each partial assignment s C S. In each stage, edges from u are added to C but only to the neighbors of s in G , i.e. the clauses that s does not satisfy. Suppose k edges have been inserted. After the insertions, the following query is asked: "Is the number of nodes reachable from s less than k + 2 n/2?"I. If the answer to the query is yes, then the formula is satisfiable and one can stop. Otherwise, s cannot be completed to a satisfying assignment. Then all inserted edges in this stage are removed and the move corresponding to the next partial assignment of S is executed. Note that this construction can easily be adapted to count the exact number of solutions to the formula. In particular, at each stage, one can ask "By how many oes is the number of nodes reachable from s less than k + 2n/2?. Keeping the sum of all querries in a counter gives us the total number of solutions. The reductions of the remaining problems in this section use a similar approach, with some extra work, however, in all cases, by adapting slightly altering the queries we obtain simplified hardness assumptions through counting.

4.6 Counting Matching Triangles

The $\Delta$-matching triangles problem was introduced in [AWY18] and asks: given a graph $G$ with colored nodes, is there a triple of distinct colors $a, b, c$ such that there are at least $\Delta$ triangles $(x, y, z)$ in $G$ in which $x$ has color $a$, $y$ has color $b$, and $z$ has color $c$? (In other words, are there $\Delta$ triangles with "matching" colors?) They give a reduction from SETH that uses the same split-and-list technique as the SETH-based lower bounds in the previous section, except that the variables are split into three equal sets rather than two. The same reduction goes through and shows conditional hardness for the counting version of the problem. The Counting Matching Triangles problem is as follows: given an unweighted graph with colored vertices and an integer $\Delta$, output the number of triples of colors such that there are at least $\Delta$ triangles with that triple of colors (a brute-force counter is sketched after Definition 42 below). This problem requires $n^{3(1-o(1))}$ time assuming the following conjecture, which is implied by #SETH.

Definition 42 (Counting 3-Orthogonal Vectors Conjecture). Given three sets $A, B, C \subseteq \{0,1\}^d$, each of size $n$, counting the number of triples $a \in A$, $b \in B$, $c \in C$ such that $\sum_{i=1}^{d} a_i b_i c_i = 0$ requires $n^{3-o(1)}$ time for any $d = \omega(\log n)$.
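Returning to the Counting Matching Triangles problem, the brute-force counter referred to above can be written as follows; the toy graph, coloring and threshold $\Delta$ are illustrative, and the cubic enumeration is exactly the baseline that the conditional lower bound says cannot be substantially improved.

```python
# Brute-force counter for the Counting Matching Triangles problem: given
# vertex colors and a threshold delta, count the triples of distinct colors
# that appear on at least delta triangles. Purely an illustrative baseline.
from itertools import combinations
from collections import Counter

def count_matching_triangles(n, edges, color, delta):
    adj = [[False] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = adj[v][u] = True
    per_triple = Counter()
    for x, y, z in combinations(range(n), 3):            # O(n^3) triangle enumeration
        if adj[x][y] and adj[y][z] and adj[x][z]:
            cols = frozenset((color[x], color[y], color[z]))
            if len(cols) == 3:                            # three distinct colors
                per_triple[cols] += 1
    return sum(1 for t in per_triple.values() if t >= delta)

# toy instance: a 5-cycle plus two chords
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0), (0, 2), (1, 3)]
print(count_matching_triangles(5, edges, color=[0, 1, 2, 0, 1], delta=1))
```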

4.7 Average case hardness for the Counting Orthogonal Vectors Problem

We can show average case hardness for the Counting Orthogonal Vectors Problem assuming the worst case hardness for the minimum-weight k-Clique Problem.

Preliminaries

Notation Given a set $S$ and an integer $i$, let $\binom{S}{i}$ denote the set of all subsets of $S$ of size $i$.

For an integer $i$, $[i] := \{0, \ldots, i - 1\}$.

Definition 43 (d-hypergraphs). A $d$-hypergraph $G = (V, E)$ is a set of vertices $V$ and a set of edges $E \subseteq \binom{V}{d}$. We call a hypergraph $k$-partite if the vertices are partitioned into $k$ parts such that each edge intersects each part in at most one vertex.

Definition 44 (k-clique). Given a $d$-hypergraph $G = (V, E)$, a subset $S \subseteq V$ with $|S| = k$ is a $k$-clique if $\binom{S}{d} \subseteq E$.

Definition 45 (k-Clique Problem). Given a hypergraph, decide if it contains a k-clique.

Definition 46 (Exact-Weight-k-Clique Problem). Given a $d$-hypergraph $G = (V, E)$ with a weight function $w : E \to [n^{O(k)}]$ and an integer $t$, the Exact-Weight-$k$-Clique Problem asks to decide if the graph $G$ has a $k$-clique of weight exactly $t$, that is, whether there exists a $k$-clique $S \subseteq V$ with $\sum_{T \in \binom{S}{d}} w(T) = t$.

Conjecture 47 (k-Clique Conjecture). Solving the Exact-Weight-$k$-Clique Problem on 2-hypergraphs (ordinary graphs) requires $n^{k-o(1)}$ time.

Definition 48 (k-Orthogonal Vectors (k-OV) Problem). Given $k$ sets $A^1, \ldots, A^k \subseteq \{0,1\}^D$ with $|A^i| = n$ for every $i = 1, \ldots, k$, decide if there exist $a^i \in A^i$ such that $\sum_{j=1}^{D} \prod_{i=1}^{k} a^i_j = 0$ (over the integers).

4.7.1 Reduction

[ABDN18] shows the following result.

Theorem 49. Let $1 < d < k$ be integer constants. There is an $n^{2d+o(1)}$-time oracle reduction from the Exact-Weight-$k$-Clique Problem on $d$-hypergraphs to the (unweighted) $k$-Clique Problem on $2d$-hypergraphs. If the input has $n$ vertices, every oracle query has $n$ vertices and the reduction uses at most $n^{o(1)}$ queries.

We use the above theorem with $d = 2$ and obtain the following hardness result: solving $k$-Clique on 4-hypergraphs cannot be done in $O(n^{k(1-\varepsilon)})$ time unless Exact-Weight-$k$-Clique on 2-hypergraphs can be solved in $O(n^{k(1-\varepsilon')})$ time, which in turn would contradict the $k$-Clique Conjecture.

Corollary 50. Assuming the $k$-Clique Conjecture, the $k$-Clique Problem on 4-hypergraphs requires $n^{k-o(1)}$ time.

Consider a 4-hypergraph $G = (V, E)$ and let $u : \binom{V}{4} \to \{0, 1\}$ be the function defined as follows: $u(T) = 1$ if $T \in E$ and $u(T) = 0$ otherwise. Consider the following polynomial:

$$g(u) := \sum_{S \in \binom{V}{k}} \prod_{T \in \binom{S}{4}} u(T).$$

By Corollary 50, evaluating it requires $n^{k-o(1)}$ time. Note that the degree of this polynomial is $\binom{k}{4}$, which is constant when $k$ is constant. Thus, evaluating $g(u) \pmod p$ for a large enough constant prime $p$ is hard on average when $u(T) \in [p]$ is uniformly random and independent for all $T \in \binom{V}{4}$. We adapt a reduction from [GR18] to prove the following theorem.

Theorem 51. Let $k, p > 1$ be constant integers. Given a function $u : \binom{[n]}{4} \to [p]$, in $O(n^4)$ time one can construct a 4-hypergraph $G = (V, E)$ with $|V| = O(n)$ such that the number of $k$-cliques in it is equal to $g'(u) \cdot k!$, where

$$g'(u) := \sum_{S \in \binom{[n]}{k}} \prod_{T \in \binom{S}{4}} u(T).$$

Furthermore, the constructed graph is k-partite.

Proof. First, we define a $k$-partite 4-hypergraph $G' = ([nk], E')$ with parts $\{in, \ldots, (i+1)n - 1\}$ for all $i \in [k]$. We have an edge $T' \in E'$ if $T'$ intersects each part in at most one vertex and $|T| = 4$, where $T = \{x \bmod n \mid x \in T'\}$. We define a function $u' : E' \to [p]$ by setting $u'(T') = u(T)$.

For a vector $z \in [p]^{\binom{[k]}{4}}$, let $G'_z$ be the unweighted graph in which we keep only those edges $T'$ of $G'$ for which $z_i < u'(T')$, where $i \in \binom{[k]}{4}$ indexes the set of 4 parts that $T'$ intersects.

The final graph $G$ is simply a disjoint union of all $p^{\binom{k}{4}}$ graphs $G'_z$. Thus, the number of vertices in the graph $G$ is $p^{\binom{k}{4}} \cdot nk = O(n)$. The promise follows from the observation that each clique $S'$ in $G'$ appears in $\prod_{T' \in \binom{S'}{4}} u'(T') = \prod_{T \in \binom{S}{4}} u(T)$ graphs $G'_z$, where $S = \{x \bmod n \mid x \in S'\}$. We get an additional multiplicative factor of $k!$ because for every set $S$ there are $k!$ sets $S'$. □
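The counting identity of Theorem 51 is easy to check numerically on tiny instances, under the reading of the construction given in the proof above (the parameters $n = k = 5$, $p = 3$ and the particular function $u$ below are arbitrary illustrative choices):

```python
# Toy check of Theorem 51: the union of the p^C(k,4) thresholded copies of
# the k-partite 4-hypergraph has exactly k! * g'(u) k-cliques.
import itertools
from math import factorial

n, k, p = 5, 5, 3                       # tiny illustrative parameters (n >= k, k >= 5)
quads = list(itertools.combinations(range(n), 4))
u = {T: 1 + sum(T) % 2 for T in quads}  # some function u : binom([n],4) -> [p]

def g_prime(u):
    """g'(u) = sum over k-subsets S of [n] of the product of u over binom(S,4)."""
    total = 0
    for S in itertools.combinations(range(n), k):
        prod = 1
        for T in itertools.combinations(S, 4):
            prod *= u[T]
        total += prod
    return total

# Vertex v of a copy lives in part v // n and has residue v % n.
part_index = list(itertools.combinations(range(k), 4))   # indexes binom([k],4)

def edges_of_copy(z):
    """Edges of G'_z: 4 vertices in 4 distinct parts with distinct residues, kept iff z_i < u'(T')."""
    E = set()
    for i, four_parts in enumerate(part_index):
        for residues in itertools.permutations(range(n), 4):
            if z[i] < u[tuple(sorted(residues))]:
                E.add(frozenset(four_parts[j] * n + residues[j] for j in range(4)))
    return E

def cliques_in_copy(E):
    # Every edge uses one vertex per part and four distinct residues, so (for
    # k >= 5) every k-clique picks one vertex per part with pairwise distinct
    # residues; scanning injective residue assignments therefore suffices.
    count = 0
    for residues in itertools.permutations(range(n), k):
        verts = [part * n + residues[part] for part in range(k)]
        if all(frozenset(T) in E for T in itertools.combinations(verts, 4)):
            count += 1
    return count

total = sum(cliques_in_copy(edges_of_copy(z))
            for z in itertools.product(range(p), repeat=len(part_index)))
assert total == factorial(k) * g_prime(u), (total, g_prime(u))
print("k-cliques in the union:", total, "= k! * g'(u)")
```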

We reduce the problem of counting k-cliques to the problem of counting solutions of a k-OV instance.

Theorem 52 (Lemma 3.6 in [ABDN18]). Let $k > 4$ be an integer. Given a $k$-partite 4-hypergraph with $n$ vertices in each part, in $O(n^4)$ time it is possible to construct a $k$-OV instance with $n$ vectors in each set such that the number of $k$-cliques in the original graph is equal to the number of solutions to the $k$-OV instance. The dimensionality of the vectors is $D = O(n^4)$.

Proof. Let $V_1, \ldots, V_k$ be the parts of the 4-hypergraph $G = (V, E)$. Define
$$\bar{E} = \Big\{ T \subseteq V : |T| = 4,\ |T \cap V_i| \le 1 \text{ for all } i \in [k] \Big\} \setminus E$$
to be the set of non-edges of the graph $G$. A set of vertices $v_i \in V_i$, $i \in [k]$, forms a clique in $G$ if and only if all $T \in \bar{E}$ satisfy $T \not\subseteq \{v_1, \ldots, v_k\}$. We construct an instance $A^1, \ldots, A^k$ of $k$-OV as follows. For each $v \in V_i$, we create a vector $a \in A^i \subseteq \{0, 1\}^{\bar{E}}$ as follows. If $T \in \bar{E}$ is disjoint from $V_i$, we set $a_T = 1$. If $T \cap V_i = \{v\}$, we set $a_T = 1$. Otherwise we set $a_T = 0$. It remains to prove the correctness of the reduction, which we do not include here. □
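The following toy script checks this reduction numerically on a small random $k$-partite 4-hypergraph, following the vector construction as described in the proof sketch above (it is a sketch of that description, not claimed to be literally the construction of [ABDN18]); the sizes and the edge probability are illustrative.

```python
# Toy check: the k-OV instance built from the non-edges of a k-partite
# 4-hypergraph has exactly as many solutions as the hypergraph has k-cliques.
import itertools
import random

k, n = 5, 3                                   # k parts, n vertices per part (illustrative)
random.seed(0)
parts = [[(i, v) for v in range(n)] for i in range(k)]

# candidate hyperedges: 4 vertices in 4 distinct parts; keep each with probability 0.8
candidates = [frozenset(t)
              for four in itertools.combinations(range(k), 4)
              for t in itertools.product(*(parts[i] for i in four))]
E = {T for T in candidates if random.random() < 0.8}
non_edges = [T for T in candidates if T not in E]   # one vector coordinate per non-edge

def vector_for(i, v):
    """Vector for vertex v of part i: 1 on non-edges that do not rule this vertex out."""
    vec = []
    for T in non_edges:
        in_part = [w for (j, w) in T if j == i]
        vec.append(1 if (not in_part or in_part == [v]) else 0)
    return vec

A = [{v: vector_for(i, v) for (_, v) in parts[i]} for i in range(k)]

cliques = sum(1 for choice in itertools.product(*parts)
              if all(frozenset(T) in E for T in itertools.combinations(choice, 4)))

# k-OV solutions: tuples whose coordinate-wise product vanishes on every coordinate
solutions = sum(1 for choice in itertools.product(*parts)
                if all(any(A[i][v][c] == 0 for (i, v) in choice)
                       for c in range(len(non_edges))))

assert cliques == solutions
print("k-cliques:", cliques, "= k-OV solutions:", solutions)
```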

Finally, we reduce the problem of counting solutions of a $k$-OV instance to the problem of counting solutions of a 2-OV instance.

Theorem 53 (Lemma 3.7 in [ABDN18]). Let $k > 2$ be an even integer. Given a $k$-OV instance $A^1, \ldots, A^k \subseteq \{0,1\}^D$ with $|A^i| = n$ for $i = 1, \ldots, k$, in $O(n^{k/2} D)$ time we can reduce it to a 2-OV instance $A, B \subseteq \{0,1\}^D$ with $|A| = |B| = n^{k/2}$. The number of solutions is preserved.

Proof. For every choice of $a^i \in A^i$ for $i = 1, \ldots, k/2$, we add a vector $a \in A$ such that $a_j = \prod_{i=1}^{k/2} a^i_j$. Do the same for $B$, except work with $i = (k/2) + 1, \ldots, k$. □
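A quick numerical check of Theorem 53 on a random instance: grouping the $k$ sets into two halves and taking coordinate-wise products preserves the number of solutions. All parameters below are illustrative.

```python
# Toy check of the k-OV -> 2-OV split used in Theorem 53.
import itertools
import random

k, n, D = 4, 3, 6
random.seed(2)
sets = [[[random.randint(0, 1) for _ in range(D)] for _ in range(n)] for _ in range(k)]

def coordinate_product(vectors):
    """Coordinate-wise AND/product of a tuple of 0/1 vectors."""
    return [int(all(v[c] for v in vectors)) for c in range(D)]

A = [coordinate_product(choice) for choice in itertools.product(*sets[: k // 2])]
B = [coordinate_product(choice) for choice in itertools.product(*sets[k // 2:])]

# count 2-OV solutions
two_ov = sum(1 for a in A for b in B if all(a[c] * b[c] == 0 for c in range(D)))

# count k-OV solutions directly
k_ov = sum(1 for choice in itertools.product(*sets)
           if all(any(v[c] == 0 for v in choice) for c in range(D)))

assert two_ov == k_ov
print("k-OV solutions:", k_ov, "= 2-OV solutions:", two_ov)
```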

Appendix A

Vector Gadgets

A.1 Vector Gadgets

For completeness, in this section we describe the vector gadgets introduced in [BI15] and give the intuition about why they indeed satisfy eq 4.1.

The sequences are defined over the alphabet $\Sigma = \{0, 1, 2, 3, 4\}$, and the final purpose is to construct gadgets $G(a)$ and $G'(b)$ for vectors $a, b \in \{0,1\}^d$ whose edit distance depends on whether $a$ and $b$ are orthogonal or not. Let $l_0 \triangleq 10d$. Define coordinate gadget sequences $CG$ and $CG'$ as follows. For $x \in \{0, 1\}$ set

$$CG(x) \triangleq \begin{cases} 2^{l_0}\,0111\,2^{l_0} & \text{if } x = 0; \\ 2^{l_0}\,0001\,2^{l_0} & \text{if } x = 1. \end{cases}$$

$$CG'(x) \triangleq \begin{cases} 2^{l_0}\,0011\,2^{l_0} & \text{if } x = 0; \\ 2^{l_0}\,1111\,2^{l_0} & \text{if } x = 1. \end{cases}$$

The coordinate gadgets are designed such that for any two $x, x' \in \{0, 1\}$:

$$\mathrm{edit}(CG(x), CG'(x')) = \begin{cases} 1 & \text{if } x \cdot x' = 0; \\ 3 & \text{if } x \cdot x' = 1. \end{cases}$$
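This 1-versus-3 property can be verified directly with a standard Levenshtein dynamic program, using the coordinate gadgets as reconstructed above (so the check is only as reliable as that reconstruction):

```python
# Check of the 1-vs-3 property of the coordinate gadgets as reconstructed
# above (middle blocks 0111/0001 and 0011/1111, padded by 2^{l0} on each side).
def edit(s, t):
    """Levenshtein distance via a rolling-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        cur = [i]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]

d = 2
l0 = 10 * d
pad = "2" * l0
CG  = {0: pad + "0111" + pad, 1: pad + "0001" + pad}
CGp = {0: pad + "0011" + pad, 1: pad + "1111" + pad}

for x in (0, 1):
    for xp in (0, 1):
        dist = edit(CG[x], CGp[xp])
        expected = 3 if x * xp == 1 else 1
        assert dist == expected, (x, xp, dist)
        print(f"edit(CG({x}), CG'({xp})) = {dist}")
```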

Define another parameter $l_1 \triangleq (10d)^2$. We use $\sum$-style notation to denote concatenation of sequences. For vectors $a, a', b \in \{0, 1\}^d$, define the vector gadget sequences as

$$G(a, a') \triangleq Z_1\, L(a)\, V_0\, R(a')\, Z_2,$$

$$G'(b) \triangleq V_1\, D(b)\, V_2,$$

where the parameters are set as follows:

$$V_0 = V_1 = V_2 \triangleq 3^{l_1},$$

$$Z_1 = Z_2 \triangleq 4^{l_1},$$

$$L(a) \triangleq \sum_{i \in [d]} CG(a_i), \qquad R(a') \triangleq \sum_{i \in [d]} CG(a'_i),$$

$$D(b) \triangleq \sum_{i \in [d]} CG'(b_i).$$

Let $l \triangleq |L| = |R| = |D| = d(4 + 2 l_0)$ be the common length of $L$, $R$ and $D$. Given three vectors $a, a', b \in \{0, 1\}^d$, $G(a, a')$ and $G'(b)$ are constructed such that their edit distance grows linearly in the minimum of $a \cdot b$ and $a' \cdot b$, i.e.

$$\mathrm{edit}(G(a, a'), G'(b)) = c + 2 \cdot \min(a \cdot b,\, a' \cdot b). \tag{A.1}$$

To achieve this, $G$ and $G'$ are constructed such that there are only two possibilities to achieve small edit distance; in each case it grows linearly in either $a \cdot b$ or $a' \cdot b$. More precisely, the minimum edit distance between $G$ and $G'$ is achieved by following one of the two possible sequences of operations below:

1. Case 1. Delete $Z_1$ and $L$, and substitute $Z_2$ with $V_2$. This costs $c' \triangleq |Z_1| + |L| + |Z_2| = 2 l_1 + l$. Transform $R$ and $D$ into the same sequence by transforming the corresponding coordinate gadgets into the same sequences. By the construction of the coordinate gadgets, the cost of this step is $d + 2 \cdot (a' \cdot b)$. Therefore, this case corresponds to edit distance cost $c' + d + 2 \cdot (a' \cdot b) = c + 2 \cdot (a' \cdot b)$ for $c \triangleq c' + d$.

2. Case 2. Delete $R$ and $Z_2$, and substitute $Z_1$ with $V_1$. This costs $c'$. Transform $L$ and $D$ into the same sequence by transforming the corresponding coordinate gadgets. Similarly as before, the cost of this step is $d + 2 \cdot (a \cdot b)$. Therefore, this case corresponds to edit distance cost $c' + d + 2 \cdot (a \cdot b) = c + 2 \cdot (a \cdot b)$.

Equation (A.1) is obtained by taking the minimum over the two cases above. Showing that these are indeed the optimal alignments, and that no other alignment gives a smaller distance, requires a more detailed case analysis, for which we refer the reader to [BI15].

To obtain the final form of equation (4.1), [BI15] make the following simplifications. They assume that in the orthogonal vectors problem, $b_1 = 1$ for all vectors $b \in B$. They can make this assumption without loss of generality because one can always add a 1 to the beginning of each $b \in B$ and a 0 to the beginning of each $a \in A$ without changing the orthogonality of any pair of vectors. In this way, they can ensure that the dot product $a' \cdot b$ is always equal to 1 by setting $a'$ to be the vector $1 0^{d-1}$ (a single 1 followed by zeros). This defines the final gadget $G(a) \triangleq G(a, 10^{d-1})$, with the property that $\mathrm{edit}(G(a), G'(b))$ is small if the vectors $a$ and $b$ are orthogonal, and slightly larger otherwise:

$$\mathrm{edit}(G(a), G'(b)) = \begin{cases} C_0 & \text{if } a \cdot b = 0; \\ C_1 & \text{otherwise,} \end{cases}$$

for $C_1 > C_0$. This is crucial because it guarantees that the sum of several terms $\mathrm{edit}(G(a), G'(b))$ is smaller than some threshold if and only if $a \cdot b = 0$ for at least one pair of vectors $a$ and $b$, so that one can detect whether such a pair exists. In contrast, this would not hold if $\mathrm{edit}(G(a), G'(b))$ depended linearly on the value of $a \cdot b$, which makes the need for $a'$ clearer. In the construction presented in this chapter we introduced additional padding as well as a permutation of the gadgets to ensure concentration of orthogonal pairs among suffixes. In this way we are able not only to detect whether an orthogonal pair exists but to count their number exactly.

Appendix B

Useful lemmas

B.1 Linear minimum mean-square-error estimation

Given random variables $Y$ and $X$ (the latter can more generally be a vector), a natural question is: what is the best prediction for $Y$ conditioned on knowing $X = x$? What is considered "best" can vary, but usually we consider the mean-square error. That is, we want to come up with an estimator $\hat{y}(x)$ such that $\mathbb{E}[(Y - \hat{y}(X))^2]$ is minimized. It is not hard to show that $\hat{y}(x)$ is just the conditional expectation of $Y$ given $X = x$. The minimum mean-square-error estimate can be a highly nontrivial function of $X$. The linear minimum mean-square-error (LMMSE) estimate instead restricts attention to estimators of the form $\hat{y} = AX + b$. Notice here that $A$ and $b$ are fixed and are not functions of $X$.

One can show that the LMMSE estimator is given by:

$$A = \Sigma_{YX}\, \Sigma_{XX}^{-1},$$

where $\Sigma$ is the appropriately indexed covariance matrix, and $b$ is chosen in the obvious way to make the estimator unbiased (i.e. $b = \mathbb{E}[Y] - A\,\mathbb{E}[X]$).
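As a small illustration, the following snippet estimates the LMMSE coefficients from synthetic jointly Gaussian data and compares them against the generating linear model; the data-generating process and all parameters are illustrative.

```python
# Minimal numerical illustration of the LMMSE estimator yhat = A X + b with
# A = Sigma_YX Sigma_XX^{-1} and b = E[Y] - A E[X], on synthetic Gaussian data.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200_000, 3
X = rng.standard_normal((n, d))
true_w = np.array([1.5, -2.0, 0.5])
Y = X @ true_w + 0.3 * rng.standard_normal(n)      # Y is linear in X plus noise

mu_X, mu_Y = X.mean(axis=0), Y.mean()
Sigma_XX = np.cov(X, rowvar=False)
Sigma_YX = ((Y - mu_Y)[:, None] * (X - mu_X)).mean(axis=0)

A = Sigma_YX @ np.linalg.inv(Sigma_XX)
b = mu_Y - A @ mu_X
Y_hat = X @ A + b

print("recovered A:", np.round(A, 2))               # close to true_w
print("MSE:", np.mean((Y - Y_hat) ** 2))            # close to 0.3^2
```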

B.2 Calculations for linear model from Section 2.2.3

To recap our setup, we feed the design matrix $X = X_{-i}$ and the response variable $y = X_i$ as inputs to an SLR blackbox. Our goal is to express $y$ as a linear function of $X$ plus some independent noise $w$. Without loss of generality let $i = 1$, and for the discussion below assume $S = \{1, \ldots, k\}$. For illustration, at times we will simplify the calculation further for the uniform case where $u_i = \frac{1}{\sqrt{k}}$ for $1 \le i \le k$ and $u_i = 0$ for $i > k$. For the moment, consider just one row of $X$, corresponding to one particular sample $X$ of the original SPCA distribution. Since $X$ is jointly Gaussian, we can express (the conditional expectation of) $y = X_1$ as a linear function of the other coordinates:

$$\mathbb{E}[X_1 \mid X_{2:d} = x_{2:d}] = \Sigma_{1,2:d}\, (\Sigma_{2:d})^{-1} x_{2:d}.$$

Hence we can write

$$X_1 = \Sigma_{1,2:d}\, (\Sigma_{2:d})^{-1} X_{2:d} + w,$$

where $w \sim \mathcal{N}(0, \sigma^2)$ for some $\sigma^2$ to be determined, and $w \perp X_i$ for $i = 2, \ldots, d$.

By directly computing the variance of the above expression for $X_1$, we deduce an expression for the noise level:

$$\sigma^2 = \Sigma_{11} - \Sigma_{1,2:d}\,(\Sigma_{2:d})^{-1}\,\Sigma_{2:d,1}.$$

Note that $\sigma^2$ is just $\Sigma_{11}$ under $H_0$. We proceed to compute $\sigma^2$ under $H_1$, when $\Sigma = I_d + \theta u u^T$. To compute $(\Sigma_{2:d})^{-1}$, we use (a special case of) the Sherman-Morrison formula,

$$(I + \theta v v^T)^{-1} = I - \frac{\theta}{1 + \theta\|v\|^2}\, v v^T,$$

which gives

$$(\Sigma_{2:d})^{-1} = (I_{d-1} + \theta u_{-1} u_{-1}^T)^{-1} = I_{d-1} - \frac{\theta}{1 + (1 - u_1^2)\theta}\, u_{-1} u_{-1}^T,$$

where $u_{-1} \in \mathbb{R}^{d-1}$ is $u$ restricted to coordinates $2, \ldots, d$ (so that $\|u_{-1}\|^2 = 1 - u_1^2$).

Hence,

$$\Sigma_{1,2:d}\,(\Sigma_{2:d})^{-1}\,\Sigma_{2:d,1} = \theta^2 u_1^2\; u_{-1}^T \Big( I_{d-1} - \frac{\theta}{1 + (1 - u_1^2)\theta}\, u_{-1} u_{-1}^T \Big) u_{-1} = \frac{\theta^2 u_1^2 (1 - u_1^2)}{1 + (1 - u_1^2)\theta}$$

$$= \frac{\theta^2\,\frac{k-1}{k^2}}{1 + \frac{k-1}{k}\theta} \approx \frac{\theta^2}{k(1+\theta)} \qquad \text{(specializing to the uniform case again)}.$$

Finally, substituting into the expression for $\sigma^2$:

$$\sigma^2 = 1 + \theta u_1^2 - \frac{\theta^2 u_1^2 (1 - u_1^2)}{1 + (1 - u_1^2)\theta} = 1 + \frac{\theta u_1^2}{1 + (1 - u_1^2)\theta} < 2 \quad \text{if } \theta < 1.
$$

We remark that the noise level of column 1 has been reduced by roughly $\tau := \frac{\theta^2}{k(1+\theta)}$ (in the uniform case) by regressing on the correlated columns.

In summary, under $H_1$ (and if $1 \in S$) we can write

$$y = X\beta^* + w,$$

where

$$\beta^* = (\Sigma_{2:d})^{-1}\, \Sigma_{2:d,1} = \theta u_1 \Big( I_{d-1} - \frac{\theta}{1 + (1 - u_1^2)\theta}\, u_{-1} u_{-1}^T \Big) u_{-1} = \frac{\theta u_1}{1 + (1 - u_1^2)\theta}\, u_{-1}$$

(technically, $\beta^*$ as defined on the right-hand side is a $(k-1)$-dimensional vector, but we augment it with zeros to make it $(d-1)$-dimensional) and $w \sim \mathcal{N}(0, \sigma_w^2)$ where $\sigma_w^2 = 1 + \frac{\theta u_1^2}{1 + (1 - u_1^2)\theta}$. Note that in the uniform case, $\beta^* \to \frac{1}{k-1}\mathbf{1}_{k-1}$ as $\theta \to \infty$, where $\mathbf{1}_{k-1}$ is the indicator of the first $k-1$ coordinates, as expected.
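The closed forms above are easy to sanity-check numerically; the following snippet does so for an arbitrary illustrative choice of $d$, $k$ and $\theta$ in the uniform case.

```python
# Numerical check of the closed forms above: with Sigma = I_d + theta*u*u^T,
# regressing X_1 on X_{2:d} gives
#   beta* = theta*u_1 / (1 + (1 - u_1^2)*theta) * u_{-1}
#   sigma^2 = 1 + theta*u_1^2 / (1 + (1 - u_1^2)*theta).
import numpy as np

d, k, theta = 10, 4, 2.5
u = np.zeros(d)
u[:k] = 1 / np.sqrt(k)
Sigma = np.eye(d) + theta * np.outer(u, u)

Sigma_11 = Sigma[0, 0]
Sigma_1r = Sigma[0, 1:]               # Sigma_{1,2:d}
Sigma_rr = Sigma[1:, 1:]              # Sigma_{2:d}

beta_star = np.linalg.solve(Sigma_rr, Sigma_1r)
sigma2 = Sigma_11 - Sigma_1r @ beta_star

u1, u_rest = u[0], u[1:]
beta_closed = theta * u1 / (1 + (1 - u1**2) * theta) * u_rest
sigma2_closed = 1 + theta * u1**2 / (1 + (1 - u1**2) * theta)

assert np.allclose(beta_star, beta_closed)
assert np.isclose(sigma2, sigma2_closed)
print("beta* matches closed form; sigma^2 =", sigma2)
```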

B.3 Properties of design matrix X

We give some intuition for why for SLR certain properties of the design matrix are desirable and natural for signal recovery (though not necessarily in the sense of minimizing prediction error).

One commonly used property is incoherence and (its generalization) the restricted isometry property (RIP). The intuition behind incoherence is that we want the error due to undersampling¹ to look roughly like noise. To put it another way, incoherence measures the tendency of linear reconstruction to leak energy from the true underlying source to other sources; we want to spread this out as uniformly as possible over all sources.

While incoherence just looks at pairs of vectors, RIP generalizes this by looking at subsets of $k$ vectors. Though it is hopeless for an $n \times d$ matrix to be well-conditioned² when $n < d$, RIP of order $k$ means that we only need the matrix to be well-conditioned when restricted to any submatrix spanned by $k$ columns.

It turns out that a much weaker condition, the restricted eigenvalue (RE) condition, suffices for the guarantees of certain SLR algorithms. RE says that the design matrix has its smallest eigenvalue bounded away from zero over a restricted set of directions. In fact, even more generally, this property corresponds to restricted strong convexity for real-valued functions. Strong convexity here means that

¹This expression is from compressed sensing, but basically means the same high-dimensional setting we have been discussing, where the linear system is under-determined. ²Generally, a function is said to be well-conditioned if the output value varies little relative to the change in the input value; the condition number of a matrix is the ratio of its largest to smallest singular value.

the Hessian of the function we are optimizing is strictly positive definite; this implies the function is sufficiently well-conditioned. Restricted means we only need strong convexity to hold in a certain restricted set of directions; usually what suffices is the cone of directions spanned³ by "roughly" sparse vectors. In the case of SLR, strong convexity of the $\ell_2$ reconstruction error is exactly equivalent to the design matrix having a restricted eigenvalue.

It turns out that RSC, together with a property called decomposability of the regularizer, is sufficient to imply very general results on the performance of a certain class of M-estimators for high-dimensional statistical tasks with low-dimensional structure. See [NYWR09] for a lengthier discussion and general results along these lines.

Restricted eigenvalue (RE) Here we check that $X$ defined as in Section 2.2.3 has a constant restricted eigenvalue. This allows us to apply Condition 4 for the SLR blackbox with a good guarantee on prediction error.

The rows of $X$ are drawn from $\mathcal{N}(0,\, I_{d-1} + \theta u_{-1} u_{-1}^T)$, where $u_{-1}$ is $u$ restricted to coordinates $2, \ldots, d$ wlog.⁴

Let $\bar{\Sigma} = I_{d-1} + \theta u_{-1} u_{-1}^T$. We can show that $\bar{\Sigma}^{1/2}$ satisfies RE with $\gamma = 1$ by bounding $\bar{\Sigma}$'s minimum eigenvalue. First, we compute the eigenvalues of $\theta u_{-1} u_{-1}^T$: it has a nullspace of dimension $d - 2$, so eigenvalue 0 has multiplicity $d - 2$, and $u_{-1}$ is an eigenvector with eigenvalue $\theta u_{-1}^T u_{-1} = \theta \frac{k-1}{k}$. Therefore, $\bar{\Sigma}$ has eigenvalues 1 and $1 + \theta \frac{k-1}{k}$, so its minimum eigenvalue is 1.
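A quick numerical confirmation of this eigenvalue computation (parameters illustrative):

```python
# I_{d-1} + theta * u_{-1} u_{-1}^T has eigenvalue 1 with multiplicity d-2
# and a single eigenvalue 1 + theta*(k-1)/k.
import numpy as np

d, k, theta = 12, 5, 3.0
u = np.zeros(d)
u[:k] = 1 / np.sqrt(k)
u_rest = u[1:]
Sigma_bar = np.eye(d - 1) + theta * np.outer(u_rest, u_rest)

eigs = np.sort(np.linalg.eigvalsh(Sigma_bar))
assert np.allclose(eigs[:-1], 1.0)
assert np.isclose(eigs[-1], 1 + theta * (k - 1) / k)
print("eigenvalues:", np.round(np.unique(np.round(eigs, 6)), 6))
```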

Now we can extend this to the sample matrix X by applying Corollary 1 of [RWY10]

³In the sense that the $\ell_1$-weight outside a sparse support is not much larger than the $\ell_1$-weight on the support. ⁴We assume here that $1 \in S$, as in the previous section.

(also see Example 3 therein), and conclude that as soon as

$$n \ge C\, \frac{\max_i \bar{\Sigma}_{ii}}{\gamma^2}\, k \log d = C \Big( 1 + \frac{\theta}{k} \Big) k \log d,$$

or $n = \Omega(k \log d)$, the matrix $X$ satisfies RE with $\gamma(X) = 1/8$.

We remark that the following small technical condition also appears in known bounds on prediction error:

Column normalization This is a condition on the scale of $X$ relative to the noise in SLR, which is always $\sigma$:

$$\frac{\|X\theta\|_2^2}{n} \le 10\,\|\theta\|_2^2 \qquad \text{for all } \theta \in B_0(2k).$$

We can always rescale the original data $X$ (and hence the design matrix $X$) to satisfy this, which would also rescale the noise level $\sigma$ in our linear model, since the noise is derived from $X$ coming from the SPCA generative model rather than added independently as in the usual SLR setup.

Hence, since all scale-dependent quantities are scaled by the same amount when we scale the original data $X$, wlog we may continue to use the same $X$ and $\sigma$ in our analysis. As the column normalization condition does not affect us, we drop it from Condition 4 of our blackbox assumption.

B.4 Tail inequalities - Chi-squared

Lemma 54 (Concentration on upper and lower tails of the $\chi^2$ distribution ([LM00], Lemma 1)). Let $Z$ be a $\chi^2$ random variable with $k$ degrees of freedom. Then,

$$\Pr\big(Z - k \ge 2\sqrt{kt} + 2t\big) \le \exp(-t),$$

$$\Pr\big(k - Z \ge 2\sqrt{kt}\big) \le \exp(-t).$$

We can simplify the upper tail bound as follows for convenience:

Corollary 55. For a $\chi^2$ random variable $Z$ with $k$ degrees of freedom and deviation $t \ge 1$, $\Pr(Z \ge 4kt) \le \exp(-kt)$.
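A quick Monte Carlo sanity check of the upper-tail inequality in Lemma 54 (the number of samples, the degrees of freedom and the values of $t$ below are arbitrary):

```python
# Monte Carlo check of the upper-tail bound in Lemma 54:
# Pr(Z - k >= 2*sqrt(k*t) + 2*t) <= exp(-t) for Z ~ chi^2_k.
import numpy as np

rng = np.random.default_rng(0)
k, trials = 6, 2_000_000
Z = rng.chisquare(k, size=trials)

for t in (0.5, 1.0, 2.0, 4.0):
    threshold = k + 2 * np.sqrt(k * t) + 2 * t
    empirical = np.mean(Z >= threshold)
    print(f"t={t}: empirical tail {empirical:.2e} <= bound {np.exp(-t):.2e}")
```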


M.-W, M"IFIR!"T 1 .11 IMPRIM-111 I I - - -F,- - - 100 Bibliography

[ABDN18] Amir Abboud, Karl Bringmann, Holger Dell, and Jesper Nederlof. More Consequences of Falsifying SETH and the Orthogonal Vectors Conjecture. STOC, 2018.

[ABW15] Amir Abboud, Arturs Backurs, and Virginia Vassilevska Williams. Tight hardness results for LCS and other sequence similarity measures. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 59-78. IEEE, 2015.

[AFH+12] Anima Anandkumar, Dean P Foster, Daniel J Hsu, Sham M Kakade, and Yi-Kai Liu. A spectral algorithm for latent Dirichlet allocation. In Advances in Neural Information Processing Systems, pages 917-925, 2012.

[AGH+14] Animashree Anandkumar, Rong Ge, Daniel Hsu, Sham M Kakade, and Matus Telgarsky. Tensor decompositions for learning latent variable models. The Journal of Machine Learning Research, 15(1):2773-2832, 2014.

[AGHK13] Animashree Anandkumar, Rong Ge, Daniel Hsu, and Sham Kakade. A tensor spectral approach to learning mixed membership community models. In Conference on Learning Theory, pages 867-881, 2013.

[AHWW16] Amir Abboud, Thomas Dueholm Hansen, Virginia Vassilevska Williams, and Ryan Williams. Simulating branching programs with edit distance and friends: or: a polylog shaved is a lower bound made. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 375-388. ACM, 2016.

[AVWY15] Amir Abboud, Virginia Vassilevska Williams, and Huacheng Yu. Matching triangles and basing hardness on an extremely popular conjecture. In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 41-50. ACM, 2015.

[AW09] Arash A Amini and Martin J Wainwright. High-dimensional analysis of semidefinite relaxations for sparse principal components. In Information Theory, 2008. ISIT 2008. IEEE International Symposium on, pages 2454-2458. IEEE, 2009.

[AW14] Amir Abboud and Virginia Vassilevska Williams. Popular conjectures imply strong lower bounds for dynamic problems. In Foundations of Computer Science (FOCS), 2014 IEEE 55th Annual Symposium on, pages 434-443. IEEE, 2014.

[AWY15] Amir Abboud, Ryan Williams, and Huacheng Yu. More applications of the polynomial method to algorithm design. In Proceedings of the twenty-sixth annual ACM-SIAM symposium on Discrete algorithms, pages 218-230. Society for Industrial and Applied Mathematics, 2015.

[AWY18] Amir Abboud, Virginia Vassilevska Williams, and Huacheng Yu. Matching triangles and basing hardness on an extremely popular conjecture. SIAM Journal on Computing, 47(3):1098-1122, 2018.

[BCMV14a] Aditya Bhaskara, Moses Charikar, Ankur Moitra, and Aravindan Vijayaraghavan. Smoothed analysis of tensor decompositions. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 594-603. ACM, 2014.

[BCMV14b] Aditya Bhaskara, Moses Charikar, Ankur Moitra, and Aravindan Vijayaraghavan. Smoothed analysis of tensor decompositions. In Proceedings of the forty-sixth annual ACM symposium on Theory of computing, pages 594-603. ACM, 2014.

[BCV] Yoshua Bengio, Aaron C Courville, and Pascal Vincent. Unsupervised feature learning and deep learning: A review and new perspectives.

[BD09] Thomas Blumensath and Mike E Davies. Iterative hard thresholding for compressed sensing. Applied and Computational Harmonic Analysis, 27(3):265-274, 2009.

[BI15] Arturs Backurs and Piotr Indyk. Edit distance cannot be computed in strongly subquadratic time (unless SETH is false). In Proceedings of the forty-seventh annual ACM symposium on Theory of computing, pages 51-58. ACM, 2015.

[BIS17] Arturs Backurs, Piotr Indyk, and Ludwig Schmidt. On the fine-grained complexity of empirical risk minimization: Kernel methods and neural networks. In Advances in Neural Information Processing Systems, pages 4308-4318, 2017.

[BK15] Karl Bringmann and Marvin Künnemann. Quadratic conditional lower bounds for string problems and dynamic time warping. In Foundations of Computer Science (FOCS), 2015 IEEE 56th Annual Symposium on, pages 79-97. IEEE, 2015.

[BR13a] Quentin Berthet and Philippe Rigollet. Complexity theoretic lower bounds for sparse principal component detection. In Conference on Learning Theory, pages 1046-1066, 2013.

[BR13b] Quentin Berthet and Philippe Rigollet. Optimal detection of sparse principal components in high dimension. The Annals of Statistics, 41(4):1780-1815, 2013.

[BRSV17] Marshall Ball, Alon Rosen, Manuel Sabin, and Prashant Nalini Vasudevan. Average-case fine-grained hardness. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 483-496. ACM, 2017.

[BRT09] Peter J Bickel, Ya'acov Ritov, and Alexandre B Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. The Annals of Statistics, pages 1705-1732, 2009.

[BTW07a] Florentina Bunea, Alexandre B Tsybakov, and Marten H Wegkamp. Aggregation for Gaussian regression. The Annals of Statistics, 35(4):1674-1697, 2007.

[BTW07b] Florentina Bunea, Alexandre B Tsybakov, and Marten H Wegkamp. Sparse density estimation with $\ell_1$ penalties. In Learning theory, pages 530-543. Springer, 2007.

[CLR+14] Shiri Chechik, Daniel H Larkin, Liam Roditty, Grant Schoenebeck, Robert E Tarjan, and Virginia Vassilevska Williams. Better approximation algorithms for the graph diameter. In Proceedings of the twenty-fifth annual ACM-SIAM symposium on Discrete algorithms, pages 1041-1052. Society for Industrial and Applied Mathematics, 2014.

[CM07] Graham Cormode and S Muthukrishnan. The string edit distance matching problem with moves. ACM Transactions on Algorithms (TALG), 3(1):2, 2007.

[CMV13] Supratik Chakraborty, Kuldeep S Meel, and Moshe Y Vardi. A scalable approximate model counter. In International Conference on Principles and Practice of Constraint Programming, pages 200-216. Springer, 2013.

[CMW13] T Tony Cai, Zongming Ma, and Yihong Wu. Sparse PCA: Optimal rates and adaptive estimation. The Annals of Statistics, 41(6):3074-3110, 2013.

[Com94] Pierre Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287-314, 1994.

[CT05] Emmanuel J Candes and Terence Tao. Decoding by linear programming. Information Theory, IEEE Transactions on, 51(12):4203-4215, 2005.

[CT07] Emmanuel Candes and Terence Tao. The Dantzig selector: statistical estimation when p is much larger than n. The Annals of Statistics, pages 2313-2351, 2007.

[dBEG14] Alexandre d'Aspremont, Francis Bach, and Laurent El Ghaoui. Approximation bounds for sparse principal component analysis. Mathematical Programming, 148(1-2):89-110, 2014.

[dEGJL07] Alexandre d'Aspremont, Laurent El Ghaoui, Michael I Jordan, and Gert RG Lanckriet. A direct formulation for sparse PCA using semidefinite programming. SIAM Review, 49(3):434-448, 2007.

[DHM+14] Holger Dell, Thore Husfeldt, Daniel Marx, Nina Taslaman, and Martin Wahlén. Exponential time complexity of the permanent and the Tutte polynomial. ACM Transactions on Algorithms (TALG), 10(4):21, 2014.

[DM14] Yash Deshpande and Andrea Montanari. Sparse PCA via covariance thresholding. In Advances in Neural Information Processing Systems, pages 334-342, 2014.

[DRXZ14] Dong Dai, Philippe Rigollet, Lucy Xia, and Tong Zhang. Aggregation of affine estimators. Electronic Journal of Statistics, 8(1):302-327, 2014.

[EK12] Yonina C Eldar and Gitta Kutyniok. Compressed sensing: theory and applications. Cambridge University Press, 2012.

[FRG09] Alyson K Fletcher, Sundeep Rangan, and Vivek K Goyal. Necessary and sufficient conditions for sparsity pattern recovery. IEEE Transactions on Information Theory, 55(12):5758-5772, 2009.

[GR18] Oded Goldreich and Guy N. Rothblum. Counting $t$-cliques: Worst-case to average-case reductions and direct interactive proof systems. Electronic Colloquium on Computational Complexity (ECCC), 25:46, 2018.

[HAB02] William Hesse, Eric Allender, and David A Mix Barrington. Uniform constant-depth threshold circuits for division and iterated multiplication. Journal of Computer and System Sciences, 65(4):695-716, 2002.

[Has90] Johan Håstad. Tensor rank is NP-complete. Journal of Algorithms, 11(4):644-654, 1990.

[HK13] Daniel Hsu and Sham M Kakade. Learning mixtures of spherical Gaussians: moment methods and spectral decompositions. In Proceedings of the 4th conference on Innovations in Theoretical Computer Science, pages 11-20. ACM, 2013.

[HL13] Christopher J Hillar and Lek-Heng Lim. Most tensor problems are NP-hard. Journal of the ACM (JACM), 60(6):45, 2013.

[JL09] Iain M Johnstone and Arthur Yu Lu. On consistency and sparsity for principal components analysis in high dimensions. Journal of the American Statistical Association, 2009.

[Joh01] Iain M Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, pages 295-327, 2001.

[JTU03] Ian T Jolliffe, Nickolay T Trendafilov, and Mudassir Uddin. A modified principal component technique based on the lasso. Journal of Computational and Graphical Statistics, 12(3):531-547, 2003.

[KGPK15] Rajiv Khanna, Joydeep Ghosh, Russell A Poldrack, and Oluwasanmi Koyejo. Sparse submodular probabilistic PCA. In International Conference on Artificial Intelligence and Statistics, 2015.

[KKGP14] Oluwasanmi O Koyejo, Rajiv Khanna, Joydeep Ghosh, and Russell Poldrack. On prior distributions and approximate inference for structured variables. In Advances in Neural Information Processing Systems, pages 676-684, 2014.

[KM11] Tamara G Kolda and Jackson R Mayo. Shifted power method for computing tensor eigenpairs. SIAM Journal on Matrix Analysis and Applications, 32(4):1095-1124, 2011.

[KNV13] Robert Krauthgamer, Boaz Nadler, and Dan Vilenchik. Do semidefinite relaxations really solve sparse PCA. Technical report, Weizmann Institute of Science, 2013.

[Kru77] Joseph B Kruskal. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra and its Applications, 18(2):95-138, 1977.

[Lan] Joseph M Landsberg. Tensors: geometry and applications, volume 128.

[LM00] Béatrice Laurent and Pascal Massart. Adaptive estimation of a quadratic functional by model selection. Annals of Statistics, pages 1302-1338, 2000.

[LO13] Joseph M Landsberg and Giorgio Ottaviani. Equations for secant varieties of Veronese and other varieties. Annali di Matematica Pura ed Applicata, 192(4):569-606, 2013.

[LS00] Michael S Lewicki and Terrence J Sejnowski. Learning overcomplete representations. Neural Computation, 12(2):337-365, 2000.

[M+13] Zongming Ma et al. Sparse principal component analysis and iterative thresholding. The Annals of Statistics, 41(2):772-801, 2013.

[MB06] Nicolai Meinshausen and Peter Bühlmann. High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, pages 1436-1462, 2006.

[McC] P McCullagh. Tensor Methods in Statistics. Chapman and Hall/CRC.

[Moi] Ankur Moitra. Algorithmic aspects of machine learning.

[MR05] Elchanan Mossel and Sébastien Roch. Learning nonsingular phylogenies and hidden Markov models. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 366-375. ACM, 2005.

[MZ93] Stéphane G Mallat and Zhifeng Zhang. Matching pursuits with time-frequency dictionaries. Signal Processing, IEEE Transactions on, 41(12):3397-3415, 1993.

[NT09] Deanna Needell and Joel A Tropp. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Applied and Computational Harmonic Analysis, 26(3):301-321, 2009.

[NYWR09] Sahand Negahban, Bin Yu, Martin J Wainwright, and Pradeep K Ravikumar. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers. In Advances in Neural Information Processing Systems, pages 1348-1356, 2009.

[PPSZ05] Ramamohan Paturi, Pavel Pudlák, Michael E Saks, and Francis Zane. An improved exponential-time algorithm for k-SAT. Journal of the ACM (JACM), 52(3):337-364, 2005.

[PW10] Mihai Pătraşcu and Ryan Williams. On the possibility of faster SAT algorithms. In Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms, pages 1065-1075. SIAM, 2010.

[RT11] Philippe Rigollet and Alexandre Tsybakov. Exponential screening and optimal rates of sparse estimation. The Annals of Statistics, 39(2):731-771, 2011.

[RVW13] Liam Roditty and Virginia Vassilevska Williams. Fast approximation algorithms for the diameter and radius of sparse graphs. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pages 515-524. ACM, 2013.

[RWY10] Garvesh Raskutti, Martin J Wainwright, and Bin Yu. Restricted eigenvalue properties for correlated Gaussian designs. The Journal of Machine Learning Research, 11:2241-2259, 2010.

[RWY11] Garvesh Raskutti, Martin J Wainwright, and Bin Yu. Minimax rates of estimation for high-dimensional linear regression over $\ell_q$-balls. Information Theory, IEEE Transactions on, 57(10):6976-6994, 2011.

[ST01] Daniel A Spielman and Shang-Hua Teng. Smoothed analysis of algorithms: Why the simplex algorithm usually takes polynomial time. In Proceedings of the thirty-third annual ACM symposium on Theory of computing, pages 296-305. ACM, 2001.

[ST09] Daniel A Spielman and Shang-Hua Teng. Smoothed analysis: an attempt to explain the behavior of algorithms in practice. Communications of the ACM, 52(10):76-84, 2009.

[Str83] Volker Strassen. Rank and optimal computation of generic tensors. Linear Algebra and its Applications, 52:645-685, 1983.

[Tib96] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B (Methodological), pages 267-288, 1996.

[Var15] Moshe Y Vardi. The SAT revolution: Solving, sampling, and counting. Highlights, 2015.

[VCLR13] Vincent Q Vu, Juhee Cho, Jing Lei, and Karl Rohe. Fantope projection and selection: A near-optimal convex relaxation of sparse PCA. In Advances in Neural Information Processing Systems, pages 2670-2678, 2013.

[vdG07] Sara van de Geer. The deterministic lasso. Seminar für Statistik, Eidgenössische Technische Hochschule (ETH) Zürich, 2007.

[VDGB+09] Sara A Van De Geer, Peter Bühlmann, et al. On the conditions used to prove oracle results for the lasso. Electronic Journal of Statistics, 3:1360-1392, 2009.

[VW15] Virginia Vassilevska Williams. Hardness of easy problems: Basing hardness on popular conjectures such as the strong exponential time hypothesis (invited talk). In LIPIcs-Leibniz International Proceedings in Informatics, volume 43. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2015.

[Wai07] Martin Wainwright. Information-theoretic bounds on sparsity recovery in the high-dimensional and noisy setting. In 2007 IEEE International Symposium on Information Theory, pages 961-965. IEEE, 2007.

[Wai09] Martin J Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_1$-constrained quadratic programming (Lasso). Information Theory, IEEE Transactions on, 55(5):2183-2202, 2009.

[Wai10] Martin Wainwright. High-dimensional statistics: some progress and challenges ahead. Winedale Workshop, 2010.

[Wil04] Ryan Williams. A new algorithm for optimal constraint satisfaction and its implications. In International Colloquium on Automata, Languages, and Programming, pages 1227-1237. Springer, 2004.

[Wil18] R Ryan Williams. Counting solutions to polynomial systems via reductions. In OASIcs-OpenAccess Series in Informatics, volume 61. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.

[Zha09] Tong Zhang. Adaptive forward-backward greedy algorithm for sparse learning with linear models. In Advances in Neural Information Processing Systems, pages 1921-1928, 2009.

[ZHT06] Hui Zou, Trevor Hastie, and Robert Tibshirani. Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):265-286, 2006.

[ZWJ14] Yuchen Zhang, Martin J Wainwright, and Michael I Jordan. Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. arXiv preprint arXiv:1402.1918, 2014.

[ZWJ15] Yuchen Zhang, Martin J Wainwright, and Michael I Jordan. Optimal prediction for sparse linear models? Lower bounds for coordinate-separable M-estimators. arXiv preprint arXiv:1503.03188, 2015.
