Copyright by Xue Chen 2018 the Dissertation Committee for Xue Chen Certiﬁes That This Is the Approved Version of the Following Dissertation

Copyright by Xue Chen 2018 The Dissertation Committee for Xue Chen certifies that this is the approved version of the following dissertation: Using and Saving Randomness Committee: David Zuckerman, Supervisor Dana Moshkovitz Eric Price Yuan Zhou Using and Saving Randomness by Xue Chen DISSERTATION Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY THE UNIVERSITY OF TEXAS AT AUSTIN May 2018 Dedicated to my parents Daiguang and Dianhua. Acknowledgments First, I am grateful to my advisor, David Zuckerman, for his unwavering support and encouragements. David has been a constant resource for providing me with great advice and insightful feedback about my ideas. Besides these and many fruitful discussions we had, his wisdom, way of thinking, and optimism guided me through the last six years. I also thank him for arranging a visit to the Simons institute in Berkeley, where I enrich my understanding and made new friends. I could not have hoped for a better advisor. I would like to thank Eric Price. His viewpoints and intuitions opened a different gate for me, which is important for the line of research presented in this thesis. I benefited greatly from our long research meetings and discussions. Besides learning a lot of technical stuff from him, I hope I have absorbed some of his passion and attitudes to research. Apart from David and Eric, I have had many mentors during my time as a student. I especially thank Yuan Zhou, who has been an amazing collaborator and friend since starting college. I thank Dana Moshkovitz for discussions at various points and being in my thesis committee. I thank Xin Li for many valuable discussions and sharing many research direc- tions. I thank Pinyan Lu for inviting me to China Theory Week and arranging a visit to Shanghai University of Finance and Economics. I thank the other professors and graduate students in UT Austin. Their discussions and numerous talks give me the chance of learning new results and ideas. I also want to thank my collaborators: Guangda Hu, Daniel Kane, Zhao Song, Xiaoming Sun, and Lei Wang. Last but the most important of all, I would like to thank my parents for a lifetime of support. v Using and Saving Randomness Xue Chen, Ph.D. The University of Texas at Austin, 2018 Supervisor: David Zuckerman Randomness is ubiquitous and exceedingly useful in computer science. For example, in sparse recovery, randomized algorithms are more efficient and robust than their determin- istic counterparts. At the same time, because random sources from the real world are often biased and defective with limited entropy, high-quality randomness is a precious resource. This motivates the studies of pseudorandomness and randomness extraction. In this thesis, we explore the role of randomness in these areas. Our research contributions broadly fall into two categories: learning structured signals and constructing pseudorandom objects. Learning a structured signal. One common task in audio signal processing is to com- press an interval of observation through finding the dominating k frequencies in its Fourier transform. We study the problem of learning a Fourier-sparse signal from noisy samples, where [0;T ] is the observation interval and the frequencies can be “off-grid”. Previous meth- ods for this problem required the gap between frequencies to be above 1=T , which is necessary to robustly identify individual frequencies. We show that this gap is not necessary to recover the signal as a whole: for arbitrary k-Fourier-sparse signals under `2 bounded noise, we pro- vide a learning algorithm with a constant factor growth of the noise and sample complexity polynomial in k and logarithmic in the bandwidth and signal-to-noise ratio. In addition to this, we introduce a general method to avoid a condition number depending on the signal family F and the distribution D of measurement in the sample vi complexity. In particular, for any linear family F with dimension d and any distribution D over the domain of F, we show that this method provides a robust learning algorithm with O(d log d) samples. Furthermore, we improve the sample complexity to O(d) via spectral sparsification (optimal up to a constant factor), which provides the best known result for a range of linear families such as low degree multivariate polynomials. Next, we generalize this result to an active learning setting, where we get a large number of unlabeled points from an unknown distribution and choose a small subset to label. We design a learning algorithm optimizing both the number of unlabeled points and the number of labels. Pseudorandomness. Next, we study hash families, which have simple forms in theory and efficient implementations in practice. The size of a hash family is crucial for many applications such as derandomization. In this thesis, we study the upper bound on the size of hash families to fulfill their applications in various problems. We first investigate the number of hash functions to constitute a randomness extractor, which is equivalent to the degree of the extractor. We present a general probabilistic method that reduces the degree of any given strong extractor to almost optimal, at least when outputting few bits. For various almost universal hash families including Toeplitz matrices, Linear Congruential Hash, and Multiplicative Universal Hash, this approach significantly improves the upper bound on the degree of strong extractors in these hash families. Then we consider explicit hash families and multiple-choice schemes in the classical problems of placing balls into bins. We construct explicit hash families of almost-polynomial size that derandomizes two classical multiple-choice schemes, which match the maximum loads of a perfectly random hash function. vii Table of Contents Acknowledgmentsv Abstract vi List of Tables xii List of Figures xiii List of Algorithms xiv Chapter 1. Introduction1 1.1 Overview......................................2 1.1.1 Continuous Sparse Fourier Transforms...................2 1.1.2 Condition-number Free Query and Active Learning...........5 1.1.3 Existence of Extractors from Simple Hash Families............7 1.1.4 Hash functions for Multiple-choice Schemes................ 11 1.1.5 CSPs with a global cardinality constraint................. 12 1.2 Organization.................................... 14 Chapter 2. Preliminaries 15 2.1 Condition Numbers................................ 16 2.2 Chernoff Bounds.................................. 17 Chapter 3. Condition Numbers of Continuous Sparse Fourier Transform 19 3.1 The Worst-case Condition number of Fourier Sparse Signals.......... 20 3.2 The Average Condition number of Fourier Sparse Signals........... 24 3.3 Polynomials and Fourier Sparse Signals..................... 27 3.3.1 A Reduction from Polynomials to Fourier Sparse Signals........ 28 3.3.2 Legendre Polynomials and a Lower Bound................ 30 viii Chapter 4. Learning Continuous Sparse Fourier Transforms 32 4.1 Gram Matrices of Complex Exponentials.................... 36 4.1.1 The Determinant of Gram Matrices of Complex Exponentials..... 38 4.2 Shifting One Frequencies............................. 42 Chapter 5. Fourier-clustered Signal Recovery 44 5.1 Band-limit Signals to Polynomials........................ 45 5.2 Robust Polynomial Interpolation......................... 48 Chapter 6. Query and Active Learning of Linear Families 53 6.1 Condition Number of Linear families...................... 56 6.2 Recovery Guarantee for Well-Balanced Samples................ 58 6.2.1 Proof of Theorem 6.0.3.......................... 59 6.2.2 Proof of Corollary 6.2.2.......................... 62 6.3 Performance of i.i.d. Distributions........................ 63 6.3.1 Proof of Lemma 6.3.2........................... 64 6.4 A Linear-Sample Algorithm for Known D .................... 66 6.4.1 Proof of Lemma 6.4.1........................... 67 6.5 Active Learning.................................. 71 6.6 Lower Bounds................................... 74 Chapter 7. Existence of Extractors in Simple Hash Families 78 7.1 Tools........................................ 82 7.2 Restricted Extractors............................... 83 7.2.1 The Chaining Argument Fooling one test................ 85 7.2.2 Larger Degree with High Confidence................... 91 7.3 Restricted Strong Extractors........................... 95 7.3.1 The Chaining Argument of Strong Extractors.............. 97 Chapter 8. Hash Functions for Multiple-Choice Schemes 105 8.1 Preliminaries.................................... 107 8.2 Witness Trees................................... 108 8.3 Hash Functions.................................. 113 ix 8.3.1 Proof Overview............................... 116 8.4 The Uniform Greedy Scheme........................... 118 8.5 The Always-Go-Left Scehme........................... 122 8.6 Heavy Load.................................... 127 Chapter 9. Constraint Satisfaction Problems Above Average with Global Cardinality Constraints 130 9.1 Notation and Tools................................ 133 9.1.1 Basics of Fourier Analysis of Boolean functions............. 134 9.1.2 Distributions conditioned on global cardinality constraints....... 136 9.1.3 Eigenspaces in the Johnson Schemes................... 138 2 9.2 Eigenspaces and Eigenvalues of EDp [f ] and VarDp (f)............. 140 9.3 Parameterized algorithm for CSPs above average with the bisection constraint 147 9.3.1 Rounding.................................. 148

Copyright by Xue Chen 2018 the Dissertation Committee for Xue Chen Certiﬁes That This Is the Approved Version of the Following Dissertation

Information Theory and the Length Distribution of All Discrete Systems

Conformational Transition in Immunoglobulin MOPC 460" by Correction. in Themembership List of the National Academy of Scien

Shannon's Formula Wlog(1+SNR): a Historical Perspective

Maxwell's Demon: Entropy, Information, Computing

Information Basics Observer and Choice

Active Regression Via Linear-Sample Sparsification

Sounds of Science – Schall Im Labor (1800–1930)

Backward Chaining –! Bayesian Belief Network –! Explanation

University of Central Florida School of Electrical Engineering and Computer Science EEL-6532: Information Theory and Coding. Spring 2010 - Dcm

Claude Shannon and the Making of Information Theory

Hartley Oscillator - Wikipedia, the Free Encyclopedia Page 1 of 3

This Is IT: a Primer on Shannon's Entropy and Information