Statistics in High Dimensions Without IID Samples: Truncated Statistics and Minimax Optimization Emmanouil Zampetakis

Statistics in High Dimensions without IID Samples: Truncated Statistics and Minimax Optimization by Emmanouil Zampetakis Diploma, National Technical University of Athens (2014) S.M., Massachusetts Institute of Technology (2017) Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY September 2020 © Massachusetts Institute of Technology 2020. All rights reserved. Author............................................................. Department of Electrical Engineering and Computer Science August 14, 2020 Certified by . Constantinos Daskalakis Professor of Electrical Engineering and Computer Science Thesis Supervisor Accepted by . Leslie A. Kolodziejski Professor of Electrical Engineering and Computer Science Chair, Department Committee on Graduate Students 2 Statistics in High Dimensions without IID Samples: Truncated Statistics and Minimax Optimization by Emmanouil Zampetakis Submitted to the Department of Electrical Engineering and Computer Science on August 14, 2020, in partial fulfillment of the requirements for the degree of Doctor of Philosophy Abstract A key assumption underlying many methods in Statistics and Machine Learn- ing is that data points are independently and identically distributed (i.i.d.). However, this assumption ignores the following recurring challenges in data collection: (1) censoring - truncation of data, (2) strategic behavior. In this thesis we provide a mathematical and computational theory for designing solutions or proving im- possibility results related to challenges (1) and (2). For the classical statistical problem of estimation from truncated samples, we provide the first statistically and computationally efficient algorithms that provably recover an estimate of the whole non-truncated population. We design algorithms both in the parametric setting, e.g. Gaussian Estimation, Linear Regres- sion, and in the non-parametric setting. In the non-parametric setting, we provide techniques that bound the extrapolation error of multi-variate polynomial log densities. Our main tool for the non-parametric setting is a Statistical Taylor Theorem that is based on sample access from some probability distribution with smooth probability density function. We believe that this theorem can have applications beyond the topic of Truncated Statistics. In the context of learning from strategic behavior, we consider the problem of min-max optimization, which is a central problem in Game Theory and Statis- tics and which has recently found interesting applications in Machine Learning tasks such as Generative Adversarial Networks. Our first result is the PPAD- hardness of minmax optimization which implies the first exponential separation between finding an approximate local minimum and finding an approximate local min-max solution. Our second result is a second-order algorithm that provably asymptotically converges to an approximate local min-max solution. Due to our PPAD-hardness result an algorithm with stronger guarantee is out of reach. Thesis Supervisor: Constantinos Daskalakis Title: Professor of Electrical Engineering and Computer Science 3 4 Acknowledgments There are many people without whom this thesis would not be possible. I am really grateful to all of them and I apologize in advance if I forget someone in this very short acknowledgments section. I was really fortunate to be advised by Prof. Constantinos Daskalakis. He is in many ways an ideal advisor and nothing written in this thesis could not have been possible without his supervision. Costis cares a lot about his students and in the very stressful period of Ph.D. studies he is always a very strong support. He never hesitated to spend a lot of time and effort not only to help me with research but also to promote me in many important moments of my academic life. I had a perfect collaboration with him. I really enjoyed working on many problems and research directions. I am certainly grateful to him for his supervision all these years and I hope we will continue collaborating in the future. Next, I want to thank all the members of my thesis committee: Prof. Ankur Moitra, Prof. Ronitt Rubinfield, and Prof. Suvrit Sra for their help in the preparation of my defense and my thesis. I have been very fortunate to have an ongoing collaboration with Ronitt and I have learned a lot from her very clear way of thinking about problems in the intersection of probability and algorithms, and I hope our collaboration will continue. I had a very nice interaction with Ankur all these years where he always had an important piece of advice not only for research but also for other aspects of academic life. Suvrit was in my qualification exam committee. He was very interested in the material of this thesis and had many very interesting suggestions and comments that have been very helpful in the preparation of my defense and of my thesis. Next, I want to thank all of my collaborators all these years: Dimitris, Vasilis, Katerina, Bálász, Robert, Stratis, Noah, Christos, Pritish, Alessandro, Mohammad, Vasilis, Alexandros, Aris, Anna, Mika, Vahab, Giorgos, Ronitt, Costis, Maryam, Dimitris, Gianis, Andrew, Giorgos, Themis, Rui, Dhruv. I learned a lot from each one of them and I hope we will continue to work together. Apart from my 5 advisor Costis I want to particularly thank my academic brother Christos Tzamos with whom I have written the majority of my published papers. I really enjoyed the hours that we spend together trying to solve many interesting problems. I was fortunate to have been supported by a Google Ph.D. Fellowship for the last years of my Ph.D.. The theory group at MIT was a wonderful environment to work and study. I met a lot of friends there and I had many fruitful conversations for many aspects of research and of academic and non-academic life. I would not be able to survive all these years in Boston without the circle of my very close friends. I want to thank them all: Katerina, Konstantinos, Christina, Ilias, Vasilis, Christos, Vasso, Daphne, Artemisa, Chara, Lydia, Lydia, Marinos, Sophie, Konstantina, Themis, Dimitris, Shibani, Yorgos, Will, Agni, Alvaro, Pana- giota, Nishnath, Vardis, for the great warm and family environment I had the chance to be a part of all these years. I want to particularly thank my partner Katerina for all of her support and the joy that she brought every day to my life in Boston. Without a doubt I would never regret coming to Boston just for the great experiences that I had with all of my close friends. Going to the other side of the Atlantic ocean I want to thank all my friends and professors from the lab where I completed my undergraduate degree in Greece. I want to thank Thanasis, Natalia, Thodoris, Markos, Eleni, Antonis, Dimitris, Aris, Stathis for never giving up in communicating with me and supporting me throughout the years of my PhD, despite the long distance. Last but not least I want to thank all of my close family: Maria, Vasilis, Alexandra, Giorgos, Costas, Olympia, Despina, Marina, Tasos, Dimitris, Kate- rina, Sofia, Costas, Spuros, Vasso, Stelios, Giorgos, Fotis, Chrysa, Haralampos, Dora, Olympia, Eleni, Lambros, Christina, Maria, Konstantinos, Giorgos, Agge- los, Fotis, Eleni, Costas, Leonidas, Konstantinos. Although we are spread around the world, we always try to find ways to meet and spend wonderful times all together. Their support all these years was always a reason to never give up even in the difficult and very stressful periods. 6 This doctoral thesis has been examined by a Committee of the Department of Electrical Engineering and Computer Science as follows: Professor Ronitt Rubinfeld . Chairman, Thesis Committee Edwin Sibley Webster Professor of Electrical Engineering and Computer Science Professor Constantinos Daskalakis . Thesis Supervisor Professor of Electrical Engineering and Computer Science Professor Ankur Moitra . Member, Thesis Committee Rockwell International Career Development Associate Professor Professor Suvrit Sra . Member, Thesis Committee Esther & Harold E. Edgerton Career Development Associate Professor 8 Contents 1 Introduction 19 1.1 Statistical Analysis from Truncated Samples.............. 20 1.1.1 Early works on Truncated Statistics............... 20 1.1.2 Limited Dependent Variable Model............... 21 1.1.3 Unknown Survival Set and Non-Parametric Models..... 23 1.2 Min-Max Optimization........................... 26 1.3 Summary of Contribution – Overview of Thesis............ 30 1.3.1 Gaussian Estimation from Truncated Samples – Chapter3.. 31 1.3.2 Truncated Linear Regression – Chapter4............ 33 1.3.3 Gaussian Estimation for Unknown Truncation – Chapter5.. 35 1.3.4 Non-parametric Truncated Statistics – Chapter6........ 40 1.3.5 Hardness of Min-Max Optimization – Chapter8........ 42 1.3.6 Algorithms for 2D Min-Max Optimization – Chapter9.... 44 I Truncated Statistics 46 2 Truncated Statistics: Models and Preliminaries 47 2.1 Multivariate Gaussian and Truncated Gaussian............ 48 2.2 Multivariate Polynomials......................... 51 2.3 Hermite Polynomials and Gaussian Surface Area........... 51 2.4 Taylor Theorem and Smooth Log Densities............... 55 2.4.1 Bounded and High-order Smooth Function........... 56 9 2.4.2 Probability Distributions from Log-Density Functions.... 57 2.5 Useful Concentration of Measure Bounds................ 58 2.6 Anticoncentration of Multivariate Polynomials............. 60 2.7 Further Related

Statistics in High Dimensions Without IID Samples: Truncated Statistics and Minimax Optimization Emmanouil Zampetakis

1.2 Examples of Descriptive Statistics in Statistics, a Summary Statistic Is a Single Numerical Measure of an At- Tribute of a Sample

Chapter 4 Truncated Distributions

Statistical Inference of Truncated Normal Distribution Based on the Generalized Progressive Hybrid Censoring

Appendix Methods for Calculating the Mean of a Lognormal Distribution

Measures of Average and Spread

Re-Establishing the Theoretical Foundations of a Truncated Normal Distribution: Standardization Statistical Inference, and Convolution Jinho Cha Clemson University

Mean Estimation and Regression Under Heavy-Tailed Distributions—A

Limited Dependent Variables—Truncation, Censoring, and Sample Selection

Mean-Variance Policy for Discrete-Time Cone Constrained

1986: Estimation of the Mean from Censored Income Data

Robust Multivariate Mean Estimation: the Optimality of Trimmed Mean

Typical Values and Measures of Variability