Copyright by Xue Chen 2018

The Dissertation Committee for Xue Chen certifies that this is the approved version of the following dissertation:
Using and Saving Randomness
Committee:
David Zuckerman, Supervisor
Dana Moshkovitz
Eric Price
Yuan Zhou

Using and Saving Randomness
by
Xue Chen
DISSERTATION

Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of
DOCTOR OF PHILOSOPHY
THE UNIVERSITY OF TEXAS AT AUSTIN
May 2018

Dedicated to my parents Daiguang and Dianhua.

Acknowledgments
First, I am grateful to my advisor, David Zuckerman, for his unwavering support and encouragement. David has been a constant resource for great advice and insightful feedback about my ideas. Beyond our many fruitful discussions, his wisdom, way of thinking, and optimism guided me through the last six years. I also thank him for arranging a visit to the Simons Institute in Berkeley, where I enriched my understanding and made new friends. I could not have hoped for a better advisor.
I would like to thank Eric Price. His viewpoints and intuitions opened a different gate for me, which was important for the line of research presented in this thesis. I benefited greatly from our long research meetings and discussions. Besides learning a great deal of technical material from him, I hope I have absorbed some of his passion for and attitude toward research.
Apart from David and Eric, I have had many mentors during my time as a student. I especially thank Yuan Zhou, who has been an amazing collaborator and friend since my college years. I thank Dana Moshkovitz for discussions at various points and for serving on my thesis committee. I thank Xin Li for many valuable discussions and for sharing many research directions. I thank Pinyan Lu for inviting me to China Theory Week and arranging a visit to Shanghai University of Finance and Economics.
I thank the other professors and graduate students at UT Austin. Their discussions and numerous talks gave me the chance to learn new results and ideas. I also want to thank my collaborators: Guangda Hu, Daniel Kane, Zhao Song, Xiaoming Sun, and Lei Wang.
Last but most important of all, I would like to thank my parents for a lifetime of support.
Using and Saving Randomness
Xue Chen, Ph.D.
The University of Texas at Austin, 2018
Supervisor: David Zuckerman
Randomness is ubiquitous and exceedingly useful in computer science. For example, in sparse recovery, randomized algorithms are more efficient and robust than their deterministic counterparts. At the same time, because random sources from the real world are often biased and defective with limited entropy, high-quality randomness is a precious resource. This motivates the study of pseudorandomness and randomness extraction. In this thesis, we explore the role of randomness in these areas. Our research contributions broadly fall into two categories: learning structured signals and constructing pseudorandom objects.
Learning a structured signal. One common task in audio signal processing is to compress an interval of observation by finding the dominant k frequencies in its Fourier transform. We study the problem of learning a Fourier-sparse signal from noisy samples, where $[0, T]$ is the observation interval and the frequencies can be "off-grid". Previous methods for this problem required the gap between frequencies to be above $1/T$, which is necessary to robustly identify individual frequencies. We show that this gap is not necessary to recover the signal as a whole: for arbitrary k-Fourier-sparse signals under $\ell_2$-bounded noise, we provide a learning algorithm with a constant-factor growth of the noise and sample complexity polynomial in $k$ and logarithmic in the bandwidth and signal-to-noise ratio.
In addition, we introduce a general method to avoid a condition number depending on the signal family $\mathcal{F}$ and the distribution $D$ of measurements in the sample complexity. In particular, for any linear family $\mathcal{F}$ of dimension $d$ and any distribution $D$ over the domain of $\mathcal{F}$, we show that this method provides a robust learning algorithm using $O(d \log d)$ samples. Furthermore, we improve the sample complexity to $O(d)$ via spectral sparsification (optimal up to a constant factor), which yields the best known result for a range of linear families such as low-degree multivariate polynomials. Next, we generalize this result to an active learning setting, where we receive a large number of unlabeled points from an unknown distribution and choose a small subset to label. We design a learning algorithm optimizing both the number of unlabeled points and the number of labels.
Pseudorandomness. Next, we study hash families, which have simple forms in theory and efficient implementations in practice. The size of a hash family is crucial for many applications such as derandomization. In this thesis, we study upper bounds on the size of hash families sufficient for their applications in various problems. We first investigate the number of hash functions needed to constitute a randomness extractor, which is equivalent to the degree of the extractor. We present a general probabilistic method that reduces the degree of any given strong extractor to almost optimal, at least when outputting few bits. For various almost universal hash families, including Toeplitz matrices, linear congruential hashing, and multiplicative universal hashing, this approach significantly improves the upper bound on the degree of strong extractors in these hash families. Then we consider explicit hash families and multiple-choice schemes for the classical problem of placing balls into bins. We construct explicit hash families of almost-polynomial size that derandomize two classical multiple-choice schemes, matching the maximum loads of a perfectly random hash function.
Table of Contents
Acknowledgments
Abstract
List of Tables
List of Figures
List of Algorithms
Chapter 1. Introduction
  1.1 Overview
    1.1.1 Continuous Sparse Fourier Transforms
    1.1.2 Condition-number Free Query and Active Learning
    1.1.3 Existence of Extractors from Simple Hash Families
    1.1.4 Hash functions for Multiple-choice Schemes
    1.1.5 CSPs with a global cardinality constraint
  1.2 Organization
Chapter 2. Preliminaries
  2.1 Condition Numbers
  2.2 Chernoff Bounds
Chapter 3. Condition Numbers of Continuous Sparse Fourier Transform
  3.1 The Worst-case Condition Number of Fourier Sparse Signals
  3.2 The Average Condition Number of Fourier Sparse Signals
  3.3 Polynomials and Fourier Sparse Signals
    3.3.1 A Reduction from Polynomials to Fourier Sparse Signals
    3.3.2 Legendre Polynomials and a Lower Bound
Chapter 4. Learning Continuous Sparse Fourier Transforms
  4.1 Gram Matrices of Complex Exponentials
    4.1.1 The Determinant of Gram Matrices of Complex Exponentials
  4.2 Shifting One Frequency
Chapter 5. Fourier-clustered Signal Recovery
  5.1 Band-limit Signals to Polynomials
  5.2 Robust Polynomial Interpolation
Chapter 6. Query and Active Learning of Linear Families
  6.1 Condition Number of Linear Families
  6.2 Recovery Guarantee for Well-Balanced Samples
    6.2.1 Proof of Theorem 6.0.3
    6.2.2 Proof of Corollary 6.2.2
  6.3 Performance of i.i.d. Distributions
    6.3.1 Proof of Lemma 6.3.2
  6.4 A Linear-Sample Algorithm for Known D
    6.4.1 Proof of Lemma 6.4.1
  6.5 Active Learning
  6.6 Lower Bounds
Chapter 7. Existence of Extractors in Simple Hash Families
  7.1 Tools
  7.2 Restricted Extractors
    7.2.1 The Chaining Argument Fooling One Test
    7.2.2 Larger Degree with High Confidence
  7.3 Restricted Strong Extractors
    7.3.1 The Chaining Argument of Strong Extractors
Chapter 8. Hash Functions for Multiple-Choice Schemes
  8.1 Preliminaries
  8.2 Witness Trees
  8.3 Hash Functions
    8.3.1 Proof Overview
  8.4 The Uniform Greedy Scheme
  8.5 The Always-Go-Left Scheme
  8.6 Heavy Load
Chapter 9. Constraint Satisfaction Problems Above Average with Global Cardinality Constraints
  9.1 Notation and Tools
    9.1.1 Basics of Fourier Analysis of Boolean functions
    9.1.2 Distributions conditioned on global cardinality constraints
    9.1.3 Eigenspaces in the Johnson Schemes
  9.2 Eigenspaces and Eigenvalues of $\mathbb{E}_{D_p}[f^2]$ and $\mathrm{Var}_{D_p}(f)$
  9.3 Parameterized algorithm for CSPs above average with the bisection constraint
    9.3.1 Rounding
    9.3.2 $2 \to 4$ hypercontractive inequality under distribution $D$
    9.3.3 Proof of Theorem 9.3.1
  9.4 $2 \to 4$ hypercontractive inequality under distribution $D_p$
    9.4.1 Proof of Claim 9.4.4
  9.5 Parameterized algorithm for CSPs above average with global cardinality constraints
    9.5.1 Rounding
    9.5.2 Proof of Theorem 9.5.1
Chapter 10. Integrality Gaps for Dispersers and Bipartite Expanders
  10.1 Preliminaries
    10.1.1 Pairwise Independent Subspaces
  10.2 Integrality Gaps
    10.2.1 Integrality Gaps for Dispersers
    10.2.2 Integrality Gaps for Expanders
  10.3 An Approximation Algorithm for Dispersers
Appendices
  A Omitted Proof in Chapter 3
  B Omitted Proof in Chapter 7
  C Omitted Proof in Chapter 8
Bibliography
List of Tables
  6.1 Lower bounds and upper bounds in different models
List of Figures
  8.1 A witness tree with distinct balls and a pruned witness tree with 3 collisions
  8.2 An example of extracting $C'$ from $C$ given two collisions
List of Algorithms
  1 Recover k-sparse FT
  2 RobustPolynomialLearningFixedInterval
  3 SampleDF
  4 A well-balanced sampling procedure based on Randomized BSS
  5 Regression over an unknown distribution D
  6 Construct M
Chapter 1
Introduction
In this thesis, we study the role of randomness in designing algorithms and constructing pseudorandom objects. Often, randomized algorithms are simpler and faster than their deterministic counterparts. We explore the advantage of randomized algorithms in providing more efficient sample complexities and more robust guarantees than deterministic algorithms in learning. On the other hand, both from a theoretical viewpoint and for practical applications, it is desirable to derandomize them and to construct pseudorandom objects such as hash functions using as few random bits as possible. We investigate the following two basic questions in computer science:
1. A fundamental question in many fields is to efficiently recover a signal from noisy observations. This problem takes many forms, depending on the measurement model, the signal structure, and the desired norms. For example, coding theory studies the recovery of discrete signals in the Hamming distance. In this thesis, we consider continuous signals that are approximately sparse in the Fourier domain or close to a linear family such as multivariate low-degree polynomials. This is a common setting in practice: for example, the compression of audio and radio signals exploits their sparsity in the Fourier domain. Often the sample complexity is of more interest than the running time, because taking measurements is costly in practical applications. Likewise, a sample of labeled data requires expensive human or oracle annotation in active learning. This raises a natural question: how can we learn a structured signal from a few noisy queries? In the first half of this thesis, we design several robust learning algorithms with efficient sample complexity for various families of signals.
2. While randomness is provably necessary for certain tasks in cryptography and distributed computing, the situation is unclear for randomized algorithms. In the second half, we consider an important pseudorandom object in derandomization — hash functions, which have wide applications in both theory and practice. For many derandomization problems, a key tool is a hash family small enough that a deterministic algorithm can enumerate all hash functions in the family. In this thesis, we study how to construct small hash families meeting the requirements of two applications: building randomness extractors and the classical problem of placing balls into bins.
Next we discuss the problems studied in this thesis in detail and show how our methods apply to them. We remark that several techniques developed in this thesis are very general and apply to many other problems.
1.1 Overview

1.1.1 Continuous Sparse Fourier Transforms.
The Fourier transform is a ubiquitous computational tool in processing a variety of signals, including audio, image, and video. In many practical applications, the main reason for using the Fourier transform is that the transformed signal is sparse — it concentrates its energy on $k$ frequencies for a small number $k$. The sparse representation in the Fourier basis exhibits structure that can be exploited to reduce the sample complexity and speed up the computation of the Fourier transform. For $n$-point discrete signals, this idea has led to a number of efficient algorithms for discrete sparse Fourier transforms, which achieve the optimal $O(k \log(n/k))$ sample complexity [IK14] and $O(k \log n \log(n/k))$ running time [HIKP12].
However, many signals, including audio and radio, originate from a continuous domain. Because the standard way to convert a continuous Fourier transform into a discrete one blows up the sparsity, it is desirable to solve the sparse Fourier transform directly in the continuous setting for efficiency gains. In this thesis, we study sparse Fourier transforms in the continuous setting.
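To make this sparsity blow-up concrete, here is a minimal numpy sketch (our own illustration, not from the thesis): a signal whose frequencies lie on the DFT grid has exactly $k$ nonzero DFT coefficients, while a single off-grid frequency leaks energy across many bins.

```python
import numpy as np

n = 1024
t = np.arange(n)

# On-grid: k = 4 integer frequencies -> exactly 4 nonzero DFT coefficients.
on_grid = sum(np.exp(2j * np.pi * f * t / n) for f in (50, 120, 300, 441))
print(np.sum(np.abs(np.fft.fft(on_grid)) > 1e-6))        # prints 4

# Off-grid: one non-integer frequency -> energy spreads over many bins.
spectrum = np.abs(np.fft.fft(np.exp(2j * np.pi * 50.5 * t / n)))
print(np.sum(spectrum > 0.01 * spectrum.max()))          # prints hundreds
```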
Previous work. Let $[0, T]$ be the observation interval of $k$-Fourier-sparse signals and $F$ denote the "bandlimit", i.e., the limit of the frequency domain. This problem, recovering a signal with a sparse Fourier transform off the grid (frequencies that are not multiples of $1/T$), has been a question of extensive study. All previous research starts by finding the frequencies of the signal and then recovers the magnitude of each frequency.
The first algorithm was by Prony in 1795, which worked in the noiseless setting. Its refinements MUSIC [Sch81] and ESPRIT [RPK86] empirically work better than Prony's algorithm under noise. The matrix pencil method [BM86] computes the maximum likelihood signal under Gaussian noise and evenly spaced samples. Moitra [Moi15] showed that it achieves a $\mathrm{poly}(k)$ approximation factor if the frequency gap is at least $1/T$.
However, the above results use $FT$ samples, which is analogous to $n$ in the discrete setting. A variety of works [FL12, BCG+14, TBR15] have studied how to adapt sparse Fourier techniques from the discrete setting to get sublinear sample complexity; they all rely on the minimum separation among the frequencies being at least $c/T$ for $c \ge 1$, or even larger gaps $c = \Omega(k)$, together with additional assumptions such as i.i.d. Gaussian noise. Price and Song [PS15] gave the first algorithm with an $O(1)$ approximation factor for arbitrary noise, finding the frequencies when $c \gtrsim \log(1/\delta)$ and the signal when $c \gtrsim \log^2(1/\delta) + \log k$. All of the above algorithms are designed to recover the frequencies and show that this yields a good approximation to the overall signal. Such an approach necessitates $c \ge 1$:
Moitra [Moi15] gave a lower bound, showing that any algorithm finding the frequencies with approximation factor $2^{o(k)}$ must require $c \ge 1$.
Our contribution. In joint work with Kane, Price, and Song [CKPS16], we design the first sample-efficient algorithm that recovers signals with $k$ arbitrary Fourier frequencies from noisy samples. In contrast to all previous approaches, which require a frequency gap of at least $1/T$ to recover every individual frequency, our algorithm demonstrates that the gap is not necessary to learn the signal: it outputs a sparse representation in the Fourier basis for arbitrary $k$-Fourier-sparse signals under $\ell_2$-bounded noise. This, for the first time, provides a strong theoretical guarantee for learning $k$-Fourier-sparse signals with continuous frequencies.
An important ingredient in our work is the first bound on the condition number $\frac{\|f\|_\infty^2}{\|f\|_2^2} = O(k^4 \log^3 k)$ for any $k$-Fourier-sparse signal $f$ over the interval $[0, T]$. In this thesis, we present an improved condition number of $O(k^3 \log^2 k)$. The condition number of $k$-sparse signals with $1/T$-separated frequencies is $O(k)$ by the Hilbert inequality. However, for signals with $k$ arbitrarily close frequencies, this family is not very well-conditioned: its condition number is $\Omega(k^2)$, inherited from degree-$(k-1)$ polynomials via a Taylor expansion.
On the other hand, degree-$k$ polynomials are a special case of $k$-Fourier-sparse signals, obtained in the limit as all frequencies approach 0, by a Taylor expansion. This is a regime with no frequency gap, so previous sparse Fourier results do not apply, but our results show that $\mathrm{poly}(k)$ samples suffice. In this special case, we prove that $O(k)$ samples are enough to learn any degree-$k$ polynomial under noise, which is optimal up to a constant factor. Our strategy is to query points according to the Chebyshev distribution, even though the distance is measured under the uniform distribution over $[0, T]$. A natural question is how to generalize this idea to Fourier-sparse signals and avoid the $\Omega(k^2)$ condition number. This motivates the follow-up work with Eric Price [CP17].
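To illustrate this query strategy, the following sketch (our own, under simplifying assumptions; it is not the thesis's algorithm) draws samples from the Chebyshev distribution on $[0, T]$ and solves a weighted least-squares problem, reweighting each sample by the ratio of the uniform density to the Chebyshev density.

```python
import numpy as np

def learn_polynomial(observe, k, T, m):
    """Fit a degree-k polynomial to noisy samples of a signal on [0, T],
    querying m points drawn from the (rescaled) Chebyshev distribution."""
    u = np.random.rand(m)
    x = T / 2 * (1 + np.cos(np.pi * u))      # Chebyshev-distributed on [0, T]
    y = np.array([observe(xi) for xi in x])  # noisy oracle queries
    # Importance weights: uniform density divided by the Chebyshev density.
    z = 2 * x / T - 1                        # map back to [-1, 1]
    w = (np.pi / 2) * np.sqrt(np.maximum(1 - z**2, 1e-12))
    V = np.vander(x, k + 1)                  # monomial basis, highest degree first
    sw = np.sqrt(w)
    coeffs, *_ = np.linalg.lstsq(V * sw[:, None], y * sw, rcond=None)
    return np.poly1d(coeffs)

# Example: a degree-3 signal plus noise, learned from m = O(k) samples.
true_p = np.poly1d([1.0, -2.0, 0.5, 3.0])
fit = learn_polynomial(lambda t: true_p(t) + 0.1 * np.random.randn(), 3, 1.0, 40)
```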
1.1.2 Condition-number Free Query and Active Learning.
In joint work with Price [CP17], we proposed a framework to avoid the condition number in the sample complexity of learning structured signals. Let $\mathcal{F}$ be the family of signals being learned and $D$ be the distribution used to measure the $\ell_2$ distance between signals. We show a strong lower bound on the number of random samples from $D$ needed to recover a signal in $\mathcal{F}$: the sample complexity depends on the condition number $\sup_{x \in \mathrm{supp}(D)} \sup_{f \in \mathcal{F}} \frac{|f(x)|^2}{\mathbb{E}_{y \sim D}[|f(y)|^2]}$, which can be arbitrarily large under adversarial distributions.
We consider two alternative models of access to the signal to circumvent this lower bound. The first model allows us to query the signal wherever we want, as in the problem of sparse Fourier transforms. We show how to improve the condition number by biasing the choice of queries towards points of high variance. For a large class of families, including the family of $k$-Fourier-sparse signals and any linear family, this approach significantly reduces the number of queries needed in agnostic learning.
For continuous sparse Fourier transforms, discussed in Section 1.1.1, we present an explicit distribution that scales down the condition number from $O(k^3 \log^2 k)$ to $O(k \log^2 k)$ through this approach. Because the condition number is at least $k$ for any query strategy, our bound of $O(k \log^2 k)$ is optimal up to log factors. Based on this, we show that $O(k^4 \log^3 k + k^2 \log^2 k \cdot \log(FT))$ samples suffice to recover any $k$-Fourier-sparse signal with bandlimit $F$ on the interval $[0, T]$, which significantly improves the sample complexity of the previous work [CKPS16].
For any linear family $\mathcal{F}$ of dimension $d$ and any distribution $D$, we show that this approach scales down the condition number to $d$ and yields an algorithm with $O(d \log d)$ queries that learns any signal in $\mathcal{F}$ under arbitrary noise. The $\log d$ factor in the query complexity arises because the algorithm uses only the distribution with the least condition number to generate all queries. We then improve this query complexity to $O(d)$ by designing a sequence of distributions and generating one query from each distribution. On the other hand, we prove an information-theoretic lower bound on the number of queries matching the query complexity of our algorithm up to a constant factor. Hence this work completely characterizes query-access learning of signals from linear families.
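For intuition, here is a sketch of the condition-number-minimizing distribution for a linear family (our own finite-grid discretization; `optimal_query_density` is a hypothetical helper, not code from the thesis): orthonormalize a basis with respect to $D$ and sample proportionally to the resulting leverage scores, which sum to $d$.

```python
import numpy as np

def optimal_query_density(basis, xs, weights):
    """Condition-number-minimizing query distribution for a linear family,
    on a discretization xs of the domain with base measure `weights`."""
    A = np.column_stack([b(xs) for b in basis])          # n x d design matrix
    # Orthonormalize the basis with respect to the measure D (the weights).
    Q, _ = np.linalg.qr(np.sqrt(weights)[:, None] * A)
    # v_i = sum_j q_j(x_i)^2 is a (continuous) leverage score; scores sum to d.
    v = np.sum(Q**2, axis=1)
    return v / v.sum()                                   # probabilities over xs

# Example: degree-5 polynomials under the uniform measure on [-1, 1];
# the resulting density concentrates near the endpoints, like Chebyshev.
xs = np.linspace(-1, 1, 2001)
w = np.full(xs.size, 1 / xs.size)
p = optimal_query_density([lambda x, j=j: x**j for j in range(6)], xs, w)
print(p[0] / p[1000])   # endpoint mass is much larger than the center's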
The second model of access is active learning, where the algorithm receives a set of unlabeled points from the unknown distribution $D$ and chooses a subset of these points whose labels it obtains from noisy observations. We design a learning algorithm that optimizes both the number of labeled and unlabeled examples required to learn a signal from any linear family $\mathcal{F}$ under any unknown distribution $D$.
Previous work. We notice that the distribution with the least condition number for linear families described in our work [CP17] is similar to strategies proposed in importance sampling [Owe13], leverage score sampling [DMM08, MI10, Woo14], and graph sparsification [SS11]. One significant body of work on sampling regression problems uses leverage score sampling [DMM08, MI10, Mah11, Woo14]. For linear function classes, our sampling distribution with the least condition number can be seen as a continuous limit of the leverage score sampling distribution, so our analysis of it is similar to previous work. At least in the fixed design setting, it was previously known that $O(d \log d + d/\varepsilon)$ samples suffice [Mah11].
Multiple researchers [BDMI13, SWZ17] have studied switching from leverage score sampling, which is analogous to the near-linear-size spectral sparsification of [SS11], to the linear-size spectral sparsification of [BSS12]. This improves the sample complexity to $O(d/\varepsilon)$, but the algorithms need to know all the $y_i$ as well as the $x_i$ before deciding which $x_i$ to sample; this makes them unsuitable for active/query learning settings. Because [BSS12] is deterministic, this limitation is necessary to ensure the adversary cannot concentrate the noise on the points that will be sampled. Our result uses the alternative linear-size spectral sparsification of [LS15] to avoid this limitation: we only need the $x_i$ to choose $O(d/\varepsilon)$ points that perform well (on average) regardless of where the adversary places the noise.
The above references all consider the fixed design setting, while [SM14] gives a method for active regression in the random design setting. Their result shows (roughly speaking) that $O((d \log d)^{5/4} + d/\varepsilon)$ samples of labeled data suffice for the desired guarantee. They do not bound the number of unlabeled examples needed by the algorithm. (Nor do the previous references, but the question does not make sense in fixed-design settings.)
Less directly comparable to this work is [CKNS15], where the noise distribution $(Y \mid x)$ is assumed to be known. This assumption, along with a number of assumptions on the structure of the distributions, allows for significantly stronger results than are possible in our agnostic setting. The optimal sampling distribution is quite different, because it depends on the distribution of the noise as well as the distribution of $x$.
1.1.3 Existence of Extractors from Simple Hash Families.
Next we consider hash families and their application to constructing another pseudorandom object: randomness extractors. In the real world, random sources are often biased and defective, and rarely satisfy the requirements of applications in algorithm design, distributed computing, and cryptography. A randomness extractor is an efficient algorithm that converts a "weak random source" into an almost uniform distribution. As is standard, we model a weak random source as a probability distribution with a min-entropy guarantee.
Definition 1.1.1. The min-entropy of a random variable $X$ is $H_\infty(X) = \min_{x \in \mathrm{supp}(X)} \log_2 \frac{1}{\Pr[X = x]}$.
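In other words, min-entropy is determined by the heaviest point of the distribution; a quick sketch:

```python
import numpy as np

def min_entropy(probs):
    """H_inf(X) = min_x log2(1 / Pr[X = x]) = -log2(max_x Pr[X = x])."""
    probs = np.asarray(probs, dtype=float)
    probs = probs[probs > 0]                      # restrict to the support
    return -np.log2(probs.max())

print(min_entropy([0.25] * 4))                    # uniform on 4 points: 2.0 bits
print(min_entropy([0.5, 0.25, 0.125, 0.125]))     # heaviest point dominates: 1.0
```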
It is impossible to construct a deterministic randomness extractor that works for all sources of min-entropy $k$ [SV86], even if $k$ is as large as $n - 1$. Therefore a seeded extractor also takes as input an additional independent uniform random string, called a seed, to guarantee that the output is close to uniform [NZ96].
Definition 1.1.2. For any $d \in \mathbb{N}^+$, let $U_d$ denote the uniform distribution over $\{0,1\}^d$. For two random variables $W$ and $Z$ with the same support, let $\|W - Z\|$ denote the statistical (variation) distance $\|W - Z\| = \max_{T \subseteq \mathrm{supp}(W)} \big| \Pr_{w \sim W}[w \in T] - \Pr_{z \sim Z}[z \in T] \big|$.

A function $\mathrm{Ext}: \{0,1\}^n \times \{0,1\}^t \to \{0,1\}^m$ is a $(k, \epsilon)$-extractor if for any source $X$ with min-entropy $k$ and an independent uniform distribution $Y$ on $\{0,1\}^t$, $\|\mathrm{Ext}(X, Y) - U_m\| \le \epsilon$. It is a strong $(k, \epsilon)$-extractor if, in addition, $\|(\mathrm{Ext}(X, Y), Y) - (U_m, Y)\| \le \epsilon$.
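Over a finite support, this maximum over events $T$ equals half the $\ell_1$ distance between the probability vectors; a small sketch, with a toy illustration of how combining two independent biased bits moves the output closer to uniform:

```python
import numpy as np

def statistical_distance(p, q):
    """max_T |Pr_p[T] - Pr_q[T]| over events T = half the L1 distance."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.abs(p - q).sum()

p = np.array([0.75, 0.25])                            # one biased bit
u = np.array([0.5, 0.5])
print(statistical_distance(p, u))                     # 0.25
xor = np.array([p[0]**2 + p[1]**2, 2 * p[0] * p[1]])  # XOR of two copies
print(statistical_distance(xor, u))                   # 0.125
```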
We call $2^t$ the degree of Ext because, when Ext is viewed as a bipartite graph on $\{0,1\}^n \cup \{0,1\}^m$, its left degree is $2^t$. Often the degree $2^t$ is of more interest than the seed length $t$. For example, when we view an extractor as a hash family $\mathcal{H} = \{\mathrm{Ext}(\cdot, y) \mid y \in \{0,1\}^t\}$, the degree $2^t$ corresponds to the size of $\mathcal{H}$.
Minimizing the degree of an extractor is crucial for many applications, such as constructing optimal samplers and simulating probabilistic algorithms with weak random sources. It is well known that the optimal degree of $(k, \epsilon)$-extractors is $\Theta\big(\frac{n-k}{\epsilon^2}\big)$, where the upper bound follows from the probabilistic method and the lower bound was shown by Radhakrishnan and Ta-Shma [RT00]. At the same time, explicit constructions [Zuc07] with an optimal degree, even for constant error, have a variety of applications in theoretical computer science, such as inapproximability results [Zuc07] and constructions of almost optimal Ramsey graphs [BDT17].
Most known extractors are sophisticated constructions based on error-correcting codes or expander graphs and are complicated to implement in practice. This raises natural questions: are there simple constructions from linear transformations, or even simpler ones from Toeplitz matrices? Are there extractors with good parameters and efficient implementations? In joint work with Zuckerman, we answer these questions in the context of the extractor degree. Our main result establishes the existence of strong extractors consisting of linear transformations and of Toeplitz matrices, as well as extremely efficient strong extractors, with degree close to optimal, at least when outputting few bits.
Our contributions. In joint work with Zuckerman [CZ18], we present a probabilistic construction that improves the degree of any given extractor. We consider the following method to reduce the degree of any strong extractor while keeping almost the same min-entropy and error parameters.
Definition 1.1.3. Given an extractor $\mathrm{Ext}: \{0,1\}^n \times \{0,1\}^t \to \{0,1\}^m$ and a sequence of seeds $(y_1, \ldots, y_D)$ where each $y_i \in \{0,1\}^t$, we define the restricted extractor $\mathrm{Ext}_{(y_1, \ldots, y_D)}$ to be Ext restricted to the domain $\{0,1\}^n \times [D]$, where $\mathrm{Ext}_{(y_1, \ldots, y_D)}(x, i) = \mathrm{Ext}(x, y_i)$.
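Operationally, restricting an extractor just means fixing a random multiset of seeds. A minimal sketch (our own, with a hypothetical toy inner-product extractor standing in for Ext; seeds are represented as integers):

```python
import random

def restrict_extractor(ext, t, D, seed=None):
    """Build the restricted extractor of Definition 1.1.3: sample D seeds
    y_1, ..., y_D uniformly from {0,1}^t and map (x, i) -> Ext(x, y_i)."""
    rng = random.Random(seed)
    ys = [rng.getrandbits(t) for _ in range(D)]
    return lambda x, i: ext(x, ys[i])

# Toy extractor: inner product of x and y over GF(2), outputting m = 1 bit.
def ip_ext(x, y):
    return bin(x & y).count("1") % 2

rext = restrict_extractor(ip_ext, t=16, D=64)
print(rext(0b1011001110001111, 5))
```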
Our main result is that given any strong $(k, \epsilon)$-extractor Ext, most restricted extractors of Ext with a quasi-linear degree $\tilde{O}(\frac{n}{\epsilon^2})$ are strong $(k, 3\epsilon)$-extractors for a constant number of output bits, regardless of the degree of Ext.
Theorem 1.1.4. There exists a universal constant $C$ such that given any strong $(k, \epsilon)$-extractor $\mathrm{Ext}: \{0,1\}^n \times \{0,1\}^t \to \{0,1\}^m$, for $D = C \cdot \frac{n \cdot 2^m}{\epsilon^2} \cdot \log \frac{n \cdot 2^m}{\epsilon}$ random seeds $y_1, \ldots, y_D \in \{0,1\}^t$, the restricted extractor $\mathrm{Ext}_{(y_1, \ldots, y_D)}$ is a strong $(k, 3\epsilon)$-extractor with probability 0.99.
We state corollaries of Theorem 1.1.4 for simple extractors based on linear transformations and Toeplitz matrices, and for hash families with efficient implementations, in Chapter 7.
On the other hand, the same statement as Theorem 1.1.4 holds for (non-strong) extractors. In this case, we observe that the dependence of the degree $D$ of restricted extractors on $2^m$ is necessary to guarantee error less than $1/2$ on $m$ output bits.
Proposition 1.1.5. There exists a $(k, \epsilon)$-extractor $\mathrm{Ext}: \{0,1\}^n \times \{0,1\}^t \to \{0,1\}^m$ with $k = 1$ and $\epsilon = 0$ such that any restricted extractor of Ext requires degree $D \ge 2^{m-1}$ to guarantee error less than $1/2$.
While the dependence on $2^m$ is necessary for the degree of restricted extractors, it may not be necessary for strong extractors.
Previous work. In a seminal work, Impagliazzo, Levin, and Luby [ILL89] proved the Leftover Hash Lemma: all functions from an almost universal hash family constitute a strong extractor. In particular, this implies that all linear transformations and all Toeplitz matrices constitute strong extractors, respectively.
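For concreteness, a Toeplitz matrix over GF(2) hashes $n$ bits to $m$ bits using only $n + m - 1$ seed bits, since the matrix is constant along each diagonal. A minimal sketch (our own illustration; the indexing convention is an assumption, not the thesis's notation):

```python
import numpy as np

def toeplitz_hash(diag_bits, x_bits):
    """Hash an n-bit input with an m x n Toeplitz matrix over GF(2).
    The matrix is specified by n + m - 1 bits (constant along diagonals),
    far fewer than the n*m bits of a fully random matrix."""
    n = len(x_bits)
    m = len(diag_bits) - n + 1
    # Entry (i, j) of the matrix is diag_bits[i - j + n - 1].
    T = np.array([[diag_bits[i - j + n - 1] for j in range(n)] for i in range(m)])
    return tuple(T.dot(x_bits) % 2)

rng = np.random.default_rng(0)
n, m = 16, 4
seed = rng.integers(0, 2, n + m - 1)          # the seed of the extractor
x = rng.integers(0, 2, n)                     # a sample from the weak source
print(toeplitz_hash(seed, x))
```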
Most previous research has focused on linear extractors, whose extractor functions are linear in the random source for every fixed seed. Because of their simplicity and various applications, such as serving as building blocks of extractors for structured sources [Li16, CL16], there have been several constructions of linear extractors with small degree. The first nontrivial progress was due to Trevisan [Tre01], who constructed the first linear extractor with degree polynomial in $n$ and $1/\epsilon$. Based on Trevisan's work, Shaltiel and Umans [SU05] built linear extractors with almost linear degree for constant error. Later on, Guruswami, Umans, and Vadhan [GUV09a] constructed almost optimal linear condensers and vertex-expansion expanders, which are variants of extractors and lead to an extractor with degree $n \cdot \mathrm{poly}(k/\epsilon)$. However, the GUV extractor is not linear. Moreover, it is still open whether the degree of linear extractors can match the degree $\Theta(\frac{n-k}{\epsilon^2})$ of general extractors.

On the other hand, much less is known about extractors consisting of Toeplitz matrices. Prior to this work, even for $m = 2$ output bits, the best known upper bound on the degree of extractors from Toeplitz matrices was exponential in $n$, via the Leftover Hash Lemma [ILL89] applied to all Toeplitz matrices.
At the same time, from a practical point of view, it is desirable to have an extractor with small degree that runs fast and is easy to implement. In this work, we consider efficient extractors from almost universal hash families, which are easier to implement than the error-correcting codes and expander graphs used in most known constructions of extractors. In Chapter 7, we describe a few notable almost universal hash families with efficient implementations. Prior to this work, the best known upper bounds on the degree of extractors from almost universal hash families came from the Leftover Hash Lemma [ILL89], and were exponential in the length $n$ of the random source.
Other work on improving extractors focuses on other parameters, such as the error and the number of output bits. Raz et al. [RRV99] showed how to reduce the error and enlarge the number of output bits of any given extractor at the cost of increasing the degree.
1.1.4 Hash functions for Multiple-choice Schemes.
We consider explicit hash families for the classical problem of placing balls into bins. The basic model hashes $n$ balls into $n$ bins independently and uniformly at random, which we call the 1-choice scheme. A well-known and useful fact about the 1-choice scheme is that with high probability, each bin contains at most $O(\frac{\log n}{\log \log n})$ balls. By high probability, we mean probability $1 - n^{-c}$ for an arbitrary constant $c$.
An alternative variant, which we call Uniform-Greedy, provides $d \ge 2$ independent uniform choices for each ball and places the ball in the bin with the lowest load. In a seminal work, Azar et al. [ABKU99] showed that the Uniform-Greedy scheme with $d$ independent random choices guarantees a maximum load of only $\frac{\log \log n}{\log d} + O(1)$ with high probability for $n$ balls. Later, Vöcking [Voc03] introduced the Always-Go-Left scheme to further improve the maximum load to $\frac{\log \log n}{d \log \phi_d} + O(1)$ for $d$ choices, where $\phi_d > 1.61$ is the constant satisfying $\phi_d^d = 1 + \phi_d + \cdots + \phi_d^{d-1}$. For convenience, we always use $d$-choice schemes to denote the Uniform-Greedy and Always-Go-Left schemes with $d \ge 2$ choices.
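A quick simulation (our own, for intuition, using Python's `random` to stand in for a perfectly random hash function) contrasts the 1-choice and Uniform-Greedy maximum loads; the exponential improvement from $\frac{\log n}{\log\log n}$ to $\frac{\log\log n}{\log d} + O(1)$ is already visible at moderate $n$.

```python
import random

def max_load(n, d):
    """Throw n balls into n bins with the Uniform-Greedy scheme:
    each ball draws d uniform bins and goes to the least loaded one."""
    bins = [0] * n
    for _ in range(n):
        choices = [random.randrange(n) for _ in range(d)]
        best = min(choices, key=lambda b: bins[b])
        bins[best] += 1
    return max(bins)

n = 100_000
print(max_load(n, 1))   # 1-choice: roughly log n / log log n, e.g. 7-9
print(max_load(n, 2))   # 2-choice: roughly log log n / log 2 + O(1), e.g. 4
```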
Traditional analyses of load balancing assume a perfectly random hash function. A large body of research is dedicated to removing this assumption by designing explicit hash families that use fewer random bits. For the 1-choice scheme, it is well known that $O(\frac{\log n}{\log\log n})$-wise independent functions guarantee a maximum load of $O(\frac{\log n}{\log\log n})$ with high probability, which reduces the number of random bits to $O(\frac{\log^2 n}{\log\log n})$. Celis et al. [CRSW13] designed a hash family described by $O(\log n \log \log n)$ random bits that achieves the same maximum load of $O(\frac{\log n}{\log\log n})$ as a perfectly random hash function.
In this thesis, we are interested in explicit constructions of hash families that achieve the same maximum loads as a perfectly random hash function in the $d$-choice schemes. More precisely, we study how to derandomize the perfectly random hash function in the Uniform-Greedy and Always-Go-Left schemes. For these two schemes, $O(\log n)$-wise independent hash functions achieve the same maximum loads by Vöcking's argument [Voc03], which provides a hash family with $\Theta(\log^2 n)$ random bits. Recently, Reingold et al. [RRW14] showed that the hash family in [CRSW13] guarantees a maximum load of $O(\log \log n)$ in the Uniform-Greedy scheme with $O(\log n \log \log n)$ random bits.
Our results. We construct a hash family with O(log n log log n) random bits based on the previous work of Celis et al. [CRSW13] and show the following results.
1. This hash family guarantees a maximum load of $\frac{\log\log n}{\log d} + O(1)$ in the Uniform-Greedy scheme.
2. It guarantees a maximum load of $\frac{\log\log n}{d \log \phi_d} + O(1)$ in the Always-Go-Left scheme.
The maximum loads of our hash family match those of a perfectly random hash function [ABKU99, Voc03] in the Uniform-Greedy and Always-Go-Left schemes, respectively.
1.1.5 CSPs with a global cardinality constraint.
A variety of problems in pseudorandomness, such as bipartite expanders and dispersers, can be stated as constraint satisfaction problems complying with a global cardinality constraint. In a $d$-ary constraint satisfaction problem (CSP), we are given a set of boolean variables $\{x_1, x_2, \ldots, x_n\}$ over $\{\pm 1\}$ and $m$ constraints $C_1, \ldots, C_m$, where each constraint $C_i$ consists of a predicate on at most $d$ variables. A constraint is satisfied if and only if the assignment of the related variables is in the predicate of the constraint. The task is to find an assignment to $\{x_1, \ldots, x_n\}$ such that the greatest (or the least) number of constraints in $\{C_1, \ldots, C_m\}$ is satisfied.
Given a boolean CSP instance $\mathcal{I}$, we can impose a global cardinality constraint $\sum_{i=1}^n x_i = (1 - 2p)n$ (we assume that $pn$ is an integer). Such a constraint is called the bisection constraint when $p = 1/2$. For example, the MaxBisection problem is the MaxCut problem with the bisection constraint. Constraint satisfaction problems with global cardinality constraints are natural generalizations of boolean CSPs. Researchers have been studying approximation algorithms for CSPs with global cardinality constraints for decades, where the MaxBisection problem [FJ95, Ye01, HZ02, FL06, GMR+11, RT12, ABG13] and the SmallSet Expansion problem [RS10, RST10, RST12] are two prominent examples.
Adding a global cardinality constraint can strictly increase the hardness of a problem. The SmallSet Expansion problem can be viewed as the MinCUT problem with the cardinality of the selected subset required to be $\rho|V|$ for $\rho \in (0, 1)$. While MinCUT admits a polynomial-time algorithm that finds the optimal solution, no good approximation algorithm is known for SmallSet Expansion. Raghavendra and Steurer [RS10] suggested that the SmallSet Expansion problem is where the hardness of the notorious UniqueGames problem [Kho02] stems from.
We study several problems about CSPs under a global cardinality constraint in this thesis. The first is the fixed-parameter tractability (FPT) of CSPs above average under a global cardinality constraint. Specifically, let $AVG$ be the expected number of constraints satisfied by a random assignment to $x_1, x_2, \ldots, x_n$ complying with the global cardinality constraint. We give an efficient algorithm that finds an assignment (complying with the cardinality constraint) satisfying more than $AVG + t$ constraints, for an input parameter $t$. The second is approximating the vertex expansion of bipartite graphs. Our main results are strong integrality gaps in the Lasserre hierarchy and an approximation algorithm for dispersers and bipartite expanders.
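As a sanity check on the quantity $AVG$, one can estimate it by Monte Carlo over uniformly random assignments satisfying the constraint; a small sketch for MaxBisection (the helper `avg_bisection_cut` is our own, hypothetical, not from the thesis):

```python
import random

def avg_bisection_cut(edges, n, samples=10_000):
    """Estimate AVG for MaxBisection: the expected number of edges cut by a
    uniformly random assignment satisfying the bisection constraint
    (exactly n/2 variables set to +1)."""
    total = 0
    vertices = list(range(n))
    for _ in range(samples):
        positive = set(random.sample(vertices, n // 2))
        total += sum((u in positive) != (v in positive) for u, v in edges)
    return total / samples

# Example: a 4-cycle. Each edge is cut with probability 2/3 under a random
# bisection, so AVG = 4 * (2/3) = 8/3, and the estimate is close to 2.67.
print(avg_bisection_cut([(0, 1), (1, 2), (2, 3), (3, 0)], 4))
```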
1.2 Organization
We provide preliminaries and discuss condition numbers in Chapter 2. We also introduce a few tools for the signal recovery of continuous sparse Fourier transforms and linear families in this chapter.
In Chapter 3, we prove that the condition number of $k$-Fourier-sparse signals lies in $[k^2, \tilde{O}(k^3)]$ and show how to scale it down to $\tilde{O}(k)$. In Chapter 4, we present the sample-efficient algorithm for continuous sparse Fourier transforms. In Chapter 5, we consider a special case of continuous sparse Fourier transforms and present the robust polynomial recovery algorithm. In Chapter 6, we study how to learn signals from any given linear family. Most of the material in these chapters is based on joint work with Kane, Price, and Song [CKPS16, CP17].
We consider hash functions and their applications in Chapters 7 and 8. We show how to reduce the degree of any extractor, and discuss the implications for almost universal hash families, in Chapter 7. We present our hash families that derandomize multiple-choice schemes in Chapter 8. Most of the material in these chapters is based on joint work with Zuckerman [CZ18] and on [Che17].
We study problems about CSPs under a global cardinality constraint in Chapters 9 and 10. We show that they are fixed-parameter tractable in Chapter 9. We present the integrality gaps for approximating dispersers and bipartite expanders in Chapter 10. Most of the material in these chapters is based on joint work with Zhou [CZ17] and on [Che16].
Chapter 2
Preliminaries