
The Structure of Optimal Private Tests for Simple Hypotheses∗

Clément L. Canonne ([email protected])
Gautam Kamath, Simons Institute for the Theory of Computing, USA ([email protected])
Audra McMillan, Boston University and Northeastern University, USA ([email protected])
Adam Smith, Boston University, USA ([email protected])
Jonathan Ullman, Northeastern University, USA ([email protected])

ABSTRACT

Hypothesis testing plays a central role in statistical inference, and is used in many settings where privacy concerns are paramount. This work answers a basic question about privately testing simple hypotheses: given two distributions P and Q, and a privacy level ε, how many i.i.d. samples are needed to distinguish P from Q subject to ε-differential privacy, and what sort of tests have optimal sample complexity? Specifically, we characterize this sample complexity up to constant factors in terms of the structure of P and Q and the privacy level ε, and show that this sample complexity is achieved by a certain randomized and clamped variant of the log-likelihood ratio test. Our result is an analogue of the classical Neyman–Pearson lemma in the setting of private hypothesis testing. We also give an application of our result to private change-point detection. Our characterization applies more generally to hypothesis tests satisfying essentially any notion of algorithmic stability, which is known to imply strong generalization bounds in adaptive data analysis, and thus our results have applications even when privacy is not a primary concern.

CCS CONCEPTS

• Mathematics of computing → Hypothesis testing and confidence interval computation; • Security and privacy; • Theory of computation → Design and analysis of algorithms;

KEYWORDS

differential privacy, hypothesis testing

ACM Reference Format:
Clément L. Canonne, Gautam Kamath, Audra McMillan, Adam Smith, and Jonathan Ullman. 2019. The Structure of Optimal Private Tests for Simple Hypotheses. In Proceedings of the 51st Annual ACM SIGACT Symposium on the Theory of Computing (STOC '19), June 23–26, 2019, Phoenix, AZ, USA. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3313276.3316336

∗In memory of Stephen E. Fienberg (1942–2016). A full version of this paper is available as [19].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
STOC '19, June 23–26, 2019, Phoenix, AZ, USA
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6705-9/19/06...$15.00
https://doi.org/10.1145/3313276.3316336

1 INTRODUCTION

Hypothesis testing plays a central role in statistical inference, analogous to that of decision or promise problems in computability and complexity theory. A hypothesis testing problem is specified by two disjoint sets of probability distributions over the same set, called hypotheses, H0 and H1. An algorithm T for this problem, called a hypothesis test, is given a sample x from an unknown distribution P, with the requirement that T(x) should, with high probability, output "0" if P ∈ H0, and "1" if P ∈ H1. There is no requirement for distributions outside of H0 ∪ H1. In theoretical computer science, such problems sometimes go by the name distribution property testing.

Hypothesis testing problems are important in their own right, as they formalize yes-or-no questions about an underlying population based on a randomly drawn sample, such as whether a given factor strongly influences life expectancy, or whether a particular medical treatment is effective. Successful hypothesis tests with high degrees of confidence remain the gold standard for publication in top journals in the physical and social sciences. Hypothesis testing problems are also important in the theory of statistics and machine learning, as many lower bounds for estimation and optimization problems are obtained by reducing from hypothesis testing.

This paper aims to understand the structure and sample complexity of optimal hypothesis tests subject to strong privacy guarantees. Large collections of personal information are now ubiquitous, but their use for effective scientific discovery remains limited by concerns about privacy. In addition to the well-understood settings of data collected during scientific studies, such as clinical experiments and surveys, many other data sources where privacy concerns are paramount are now being tapped for socially beneficial analysis, such as Social Science One [70], which aims to allow access to data collected by Facebook and similar companies.

We study algorithms that satisfy differential privacy (DP) [32], a restriction on the algorithm that ensures meaningful privacy guarantees against an adversary with arbitrary side information [47]. Differential privacy has come to be the de facto standard for the analysis of private data, used as a measure of privacy for data analysis systems at Google [36], Apple [25], and the U.S. Census Bureau [24]. Differential privacy and related distributional notions of algorithmic stability can be crucial for statistical validity even

when confidentiality is not a direct concern, as they provide generalization guarantees in an adaptive setting [29].

Consider an algorithm that takes a set of data points from a set X—where each point belongs to some individual—and produces some public output. We say the algorithm is differentially private if no single data point can significantly impact the distribution on outputs. Formally, we say two data sets x, x′ ∈ Xⁿ of the same size are neighbors if they differ in at most one entry.

Definition 1.1 ([32]). A randomized algorithm T taking inputs in X∗ and returning random outputs in a space with event set S is ε-differentially private if for all n ≥ 1, for all neighboring data sets x, x′ ∈ Xⁿ, and for all events S ∈ S,
    P[T(x) ∈ S] ≤ e^ε · P[T(x′) ∈ S].

For the special case of tests returning output in {0, 1}, the output distribution is characterized by the probability of returning "1". Letting g(x) = P[T(x) = 1], we can equivalently require that
    max{ g(x)/g(x′), (1 − g(x))/(1 − g(x′)) } ≤ e^ε.

For algorithms with binary outputs, this definition is essentially equivalent to all other commonly studied notions of privacy and distributional algorithmic stability (see "Connection to Algorithmic Stability", below).

Contribution: The Sample Complexity of Private Tests for Simple Hypotheses. We focus on the setting of i.i.d. data and singleton hypotheses H0, H1, which are called simple hypotheses. The algorithm is given a sample of n points x₁, ..., xₙ drawn i.i.d. from one of two distributions, P or Q, and attempts to determine which one generated the input. That is, H0 = {Pⁿ} and H1 = {Qⁿ}. We investigate the following question.

    Given two distributions P and Q and a privacy parameter ε > 0, what is the minimum number of samples (denoted SC^{P,Q}_ε) needed for an ε-differentially private test to reliably distinguish P from Q, and what are optimal private tests?

These questions are well understood in the classical, nonprivate setting. The number of samples needed to distinguish P from Q is Θ(1/H²(P,Q)), where H² denotes the squared Hellinger distance (3).¹ Furthermore, by the Neyman–Pearson lemma, the exactly optimal test consists of computing the likelihood ratio Pⁿ(x)/Qⁿ(x) and comparing it to some threshold.

We give analogous results in the private setting. First, we give a closed-form expression that characterizes the sample complexity up to universal multiplicative constants, and highlights the range of ε in which private tests use a similar amount of data to the best nonprivate ones. We also give a specific, simple test that achieves that sample complexity. Roughly, the test makes a noisy decision based on a "clamped" log-likelihood ratio in which the influence of each data point is limited. The sample complexity has the form Θ(1/adv₁), where adv₁ is the advantage of the test over random guessing on a sample of size n = 1. The optimal test and its sample complexity are described in Theorem 1.2.

Our result provides the first instance-specific characterization of a statistical problem's complexity for differentially private algorithms. Understanding the private sample complexity of statistical problems is delicate. We know there are regimes where statistical problems can be solved privately "for free" asymptotically (e.g. [20, 32, 45, 69]) and others where there is a significant cost, even for relaxed definitions of privacy (e.g. [15, 35]), and we remain far from a general characterization of the statistical cost of privacy. Duchi, Jordan, and Wainwright [26] give a characterization for the special case of simple tests by local differentially private algorithms, a more restricted setting where samples are randomized individually, and the test makes a decision based on these randomized samples. Our characterization in the general case is more involved, as it exhibits several distinct regimes for the parameter ε.

Our analysis relies on a number of tools of independent interest: a characterization of private hypothesis testing in terms of couplings between distributions on Xⁿ, and a novel interpretation of Hellinger distance as the advantage over random guessing of a specific, randomized likelihood ratio test.

The Importance of Simple Hypotheses. Many of the hypotheses that arise in applications are not simple, but are so-called composite hypotheses. For example, deciding if two features are independent or far from it involves sets H0 and H1 each containing many distributions. Yet many of those tests can be reduced to simple ones. For example, deciding if the mean of a Gaussian is less than 0 or greater than 1 can be reduced to testing if the mean is either 0 or 1. Furthermore, simple tests arise in lower bounds for estimation—the well-known characterization of parametric estimation in terms of Fisher information is obtained by showing that the Fisher information measures variability in the Hellinger distance and then employing the Hellinger-based characterization of nonprivate simple tests (e.g. [12, Chap. II.31.2, p.180]). Our characterization of private testing implies similar lower bounds for estimation (along the lines of lower bounds of Duchi and Ruan [27] in the local model of differential privacy).

Connection to Algorithmic Stability. For hypothesis tests with constant error probabilities, sample complexity bounds for differential privacy are equivalent, up to constant factors, to sample complexity bounds for other notions of distributional algorithmic stability, such as (ε, δ)-DP [31], concentrated DP [14, 34], and KL- and TV-stability [9, 77] (see [3, Lemma 5]). (Briefly: if we ensure that Pr(T(x) = 1) ∈ [0.01, 0.99] for all x, then an additive change of ε corresponds to a multiplicative change of 1 ± O(ε), and vice versa.) Consequently, our results imply optimal tests for use in conjunction with stability-based generalization bounds for adaptive data analysis, which has generated significant interest in recent years [9, 28–30, 37, 38, 64, 65, 78].

1.1 Hypothesis Testing

To put our result in context, we review classical results about nonprivate hypothesis testing. Let P and Q be two probability distributions over an arbitrary domain X. A hypothesis test K : X∗ → {"P", "Q"} is an algorithm that takes a set of samples x ∈ X∗ and attempts to determine if it was drawn from P or Q. Define the advantage of a test K given n samples as
    adv_n(K) = P_{x∼Pⁿ}[K(x) = "P"] − P_{x∼Qⁿ}[K(x) = "P"].    (1)

¹This statement is folklore, but see, e.g., [8] for the lower bound, [18] or Corollary 2.2 for the upper bound.
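The advantage (1) is easy to estimate by Monte Carlo for any concrete test. The following Python sketch is purely illustrative (it is not from the paper; the Bernoulli pair and the majority-threshold test are placeholder choices we introduce here):

```python
import numpy as np

rng = np.random.default_rng(0)

def advantage(test, sample_P, sample_Q, n, trials=20000):
    # Monte Carlo estimate of adv_n(K) = Pr_{x~P^n}[K(x)="P"] - Pr_{x~Q^n}[K(x)="P"].
    hits_P = sum(test(sample_P(n)) == "P" for _ in range(trials))
    hits_Q = sum(test(sample_Q(n)) == "P" for _ in range(trials))
    return (hits_P - hits_Q) / trials

# Placeholder instance: distinguishing Ber(0.7) from Ber(0.3) with the
# majority test based on the count statistic S(x) = sum_i x_i.
sample_P = lambda n: (rng.random(n) < 0.7).astype(int)
sample_Q = lambda n: (rng.random(n) < 0.3).astype(int)
majority = lambda x: "P" if x.sum() >= len(x) / 2 else "Q"

print(advantage(majority, sample_P, sample_Q, n=10))
```

As n grows, the estimated advantage of any reasonable test approaches 1, which is exactly the behavior that the sample-complexity definition in the next section makes quantitative.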


We say that K distinguishes P from Q with sample complexity SC^{P,Q}(K) if for every n ≥ SC^{P,Q}(K), adv_n(K) ≥ 2/3. We say SC^{P,Q} = min_K SC^{P,Q}(K) is the sample complexity of distinguishing P from Q.

Most hypothesis tests are based on some real-valued test statistic S : X∗ → R, where
    K_S(x) = "P" if S(x) ≥ κ, and "Q" otherwise,
for some threshold κ. We will sometimes abuse notation and use the test statistic S and the implied hypothesis test K_S interchangeably.

The classical Neyman–Pearson lemma says that the exactly optimal test² for distinguishing P, Q is the log-likelihood ratio test given by the test statistic
    LLR(x₁, ..., xₙ) = Σᵢ₌₁ⁿ log( P(xᵢ)/Q(xᵢ) ).    (2)

Another classical result says that the optimal sample complexity is characterized by the squared Hellinger distance between P and Q, which is defined as
    H²(P,Q) = (1/2) ∫_X ( √P(x) − √Q(x) )² dx.    (3)

Specifically, SC^{P,Q} = SC^{P,Q}(LLR) = Θ(1/H²(P,Q)). Note that the same metric provides upper and lower bounds on the sample complexity.

1.2 Our Results

Our main result is an approximate characterization of the sample complexity of ε-differentially private tests for distinguishing P and Q. Analogous to the non-private case, we will write SC^{P,Q}_ε = min_{ε-DP K} SC^{P,Q}(K) to denote the sample complexity of ε-differentially privately (ε-DP) distinguishing P from Q, and we characterize this quantity up to constant factors in terms of the structure of P, Q and the privacy parameter ε. Specifically, we show that a privatized clamped log-likelihood ratio test is optimal up to constant factors. Privacy may be achieved through either the Laplace or the exponential mechanism, and we will prove optimality of both methods.

For b ≥ a, we define the clamped log-likelihood ratio statistic
    cLLR_{a,b}(x) = Σᵢ [ log( P(xᵢ)/Q(xᵢ) ) ]ₐᵇ,
where [·]ₐᵇ denotes the projection onto the interval [a, b] (that is, [z]ₐᵇ = max(a, min(z, b))).

Define the soft clamped log-likelihood test:
    scLLR_{a,b}(x) = "P" with probability ∝ exp( (1/2)·cLLR_{a,b}(x) ), and "Q" with probability ∝ 1.
The test scLLR is an instance of the exponential mechanism [54], and thus scLLR_{a,b} satisfies ε-differential privacy for ε = (b − a)/2.

Similarly, define the noisy clamped log-likelihood ratio test:
    ncLLR_{a,b}(x) = "P" if cLLR_{a,b}(x) + Lap( (b − a)/ε ) > 0, and "Q" otherwise.
The test ncLLR is an instance of postprocessing the Laplace mechanism [32], and satisfies ε-differential privacy.

Our main result is that, for every P, Q, and every ε, the tests scLLR_{−ε′,ε} and ncLLR_{−ε′,ε} are optimal up to constant factors, for some appropriate 0 ≤ ε′ ≤ ε. To state the result more precisely, we introduce some additional notation. First define
    τ = τ(P,Q) ≜ max{ ∫_X max{P(x) − e^ε·Q(x), 0} dx , ∫_X max{Q(x) − e^ε·P(x), 0} dx },    (4)
and assume without loss of generality, for the remainder of this work, that τ = ∫_X max{P(x) − e^ε·Q(x), 0} dx.³

Next, let 0 ≤ ε′ ≤ ε be the largest value such that
    ∫_X max{P(x) − e^ε·Q(x), 0} dx = ∫_X max{Q(x) − e^{ε′}·P(x), 0} dx = τ,
whose existence is guaranteed by a continuity argument (a formal argument is in the full version). We give an illustration of the definition of τ and ε′ in Figure 1.

Finally, define P̃ = min{e^ε·Q, P} and Q̃ = min{e^{ε′}·P, Q} and normalize by (1 − τ) to obtain distributions
    P′ = P̃/(1 − τ) and Q′ = Q̃/(1 − τ).    (5)
The distributions P′, Q′ are such that
    −ε′ ≤ log( P′(x)/Q′(x) ) ≤ ε,
and
    P = (1 − τ)P′ + τP″ and Q = (1 − τ)Q′ + τQ″,
where P″ and Q″ are distributions with disjoint support. The quantity τ is the smallest possible number for which such a representation is possible. With these definitions in hand, we can now state our main result.

Theorem 1.2. For every pair of distributions P, Q, and every ε > 0, the optimal sample complexity for ε-differentially private tests is achieved by either the soft or noisy clamped log-likelihood test, and satisfies
    SC^{P,Q}_ε = Θ( SC^{P,Q}(ncLLR_{−ε′,ε}) ) = Θ( SC^{P,Q}(scLLR_{−ε′,ε}) )
              = Θ( 1 / ( ε·τ(P,Q) + (1 − τ)·H²(P′,Q′) ) ) = Θ( 1 / adv₁(scLLR_{−ε′,ε}) ).

When ε ≥ max_x |log P(x)/Q(x)|, Theorem 1.2 reduces to SC^{P,Q}_ε = Θ(1/H²(P,Q)), which is the sample complexity for distinguishing between P and Q in the non-private setting. This implies that we get privacy for free asymptotically in this parameter regime. We will

²More precisely, given any test K, there is a setting of the threshold κ for the log-likelihood ratio test that weakly dominates K, meaning that P_{x∼Q}[LLR(x) = "P"] ≤ P_{x∼Q}[K(x) = "P"] and P_{x∼P}[LLR(x) = "Q"] ≤ P_{x∼P}[K(x) = "Q"] (keeping the true positive rates P_{x∼P}[K(x) = "P"], P_{x∼Q}[K(x) = "Q"] fixed). One may need to randomize the decision when S(x) = κ to achieve some tradeoffs between false negative and false positive rates.

³For α ≥ 0, the quantity D_α(P∥Q) = ∫ max(P(x) − α·Q(x), 0) dx is an f-divergence and has appeared in the literature before under the names α-divergence, hockey-stick divergence, or elementary divergence [6, 52] (for α = 1, one obtains the usual total variation distance). Thus, τ is the maximum of the divergences D_{e^ε}(P∥Q) and D_{e^ε}(Q∥P). It can also be described as the smallest value δ such that P and Q are (ε, δ)-indistinguishable [33].
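For distributions over a finite domain, the quantities above can be computed directly. The sketch below is our own illustration rather than code from the paper: it implements cLLR, ncLLR, scLLR, and a discrete version of τ from (4) in Python. The Laplace scale (b − a)/ε reflects that changing one sample moves cLLR_{a,b} by at most b − a; the three-point densities are the example from Section 1.3 with an arbitrarily chosen α.

```python
import numpy as np

rng = np.random.default_rng(0)

def cllr(x, P, Q, a, b):
    # Clamped log-likelihood ratio: sum_i of log(P(x_i)/Q(x_i)) clipped to [a, b].
    with np.errstate(divide="ignore"):
        llr = np.log(P[x]) - np.log(Q[x])  # P[x_i] = 0 gives -inf, clipped to a
    return np.clip(llr, a, b).sum()

def ncllr(x, P, Q, a, b, eps):
    # Noisy clamped LLR: one sample changes cLLR by at most b - a, so Laplace
    # noise of scale (b - a)/eps privatizes the statistic; thresholding at 0
    # is postprocessing.
    return "P" if cllr(x, P, Q, a, b) + rng.laplace(scale=(b - a) / eps) > 0 else "Q"

def scllr(x, P, Q, a, b):
    # Soft clamped LLR: "P" with probability proportional to exp(cLLR/2) and
    # "Q" with probability proportional to 1 (exponential mechanism).
    s = cllr(x, P, Q, a, b)
    return "P" if rng.random() < 1 / (1 + np.exp(-s / 2)) else "Q"

def tau(P, Q, eps):
    # The larger of the two hockey-stick divergences in (4), finite domain.
    d_pq = np.maximum(P - np.exp(eps) * Q, 0).sum()
    d_qp = np.maximum(Q - np.exp(eps) * P, 0).sum()
    return max(d_pq, d_qp)

# Three-point densities from Section 1.3 (alpha is a placeholder value).
alpha, eps = 0.09, 0.1
P = np.array([0.0, 0.5, 0.5])
Q = np.array([2 * alpha**1.5, 0.5 + alpha - alpha**1.5, 0.5 - alpha - alpha**1.5])
x = rng.choice(3, size=200, p=Q)  # a sample drawn from Q
print(tau(P, Q, eps), ncllr(x, P, Q, -eps, eps, eps))
```

Note that clamping to [−ε′, ε] both bounds each sample's influence (giving privacy) and keeps enough of the likelihood information to preserve the advantage, which is the content of Theorem 1.2.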


focus on proving the first equality; a proof of the second appears in the full version.

[Figure 1 omitted: a plot of two densities P, Q on [0, 1] together with the curves e^ε·P(x) and e^ε·Q(x).]
Figure 1: An illustration of the definition of τ, for ε = 0.2 and for two densities P, Q over X = [0, 1]. The blue shaded area represents ∫ max{P(x) − e^ε·Q(x), 0} dx, while the red corresponds to ∫ max{Q(x) − e^ε·P(x), 0} dx. The larger of these two is τ(P,Q). If the blue area is larger than the red area, the definition of ε′ corresponds to lowering the dotted blue curve until the two are the same size.

Comparison to Known Bounds. For ε < 1, the bounds
    Ω( 1/H²(P,Q) ) ≤ SC^{P,Q}_ε ≤ O( 1/(ε·H²(P,Q)) )
follow directly from the non-private sample complexity. Namely, the lower bound is the non-private sample complexity, and the upper bound is obtained by applying the sample-and-aggregate technique [59] to the optimal non-private test. They can be recovered from Theorem 1.2 by noting that
    ε·H²(P,Q) = (ε/2)·∥√P − √Q∥₂²
              = O( ε·∥√P − √P̃∥₂² + ε·∥√P̃ − √Q̃∥₂² + ε·∥√Q̃ − √Q∥₂² )
              = O( ετ + ε(1 − τ)·H²(P′,Q′) + ετ )
              = O( ετ + (1 − τ)·H²(P′,Q′) )
and
    ετ + (1 − τ)·H²(P′,Q′) = ε·∫_S |P(x) − e^ε·Q(x)| dx + (1/2)·∥√P̃ − √Q̃∥₂²
                           ≤ ε·∫_S |P(x) − Q(x)| dx + (1/2)·∥√P − √Q∥₂²
                           ≤ ε·( (1 + e^{ε/2}) / (e^{ε/2} − 1) )·∫_S ( √P(x) − √Q(x) )² dx + H²(P,Q)
                           = O( H²(P,Q) ),
where S = { x : P(x) − e^ε·Q(x) > 0 }.

1.2.1 Application: Private Change-Point Detection. As an application of our result, we obtain optimal private algorithms for change-point detection. Given distributions P and Q, an algorithm solving offline change-point detection for P and Q takes a stream x = (x₁, x₂, ..., xₙ) ∈ Xⁿ with the guarantee that there is an index k∗ such that the first k∗ elements are sampled i.i.d. from P and the latter elements are sampled i.i.d. from Q, and attempts to output k̂ ≈ k∗. We can also consider an online variant where elements xᵢ arrive one at a time.

Change-point detection has a long history in statistics and information theory (e.g. [50, 51, 53, 55, 56, 60–63, 67, 68, 73]). Cummings et al. [22] recently gave the first private algorithms for change-point detection. Their algorithms are based on a private version of the log-likelihood ratio, and in cases where the log-likelihood ratio is not strictly bounded, they relax to a weaker distributional variant of differential privacy. Using Theorem 1.2, we can achieve the standard worst-case notion of differential privacy, and achieve optimal error bounds for every P, Q.

Theorem 1.3 (Informal). For every pair of distributions P and Q, and every ε > 0, there is an ε-differentially private algorithm that solves offline change-point detection for P and Q such that, with probability at least 9/10, |k̂ − k∗| = O(SC^{P,Q}_ε).

The expected error in this result is optimal up to constant factors for every pair P, Q, as one can easily show that the error must be at least Ω(SC^{P,Q}_ε). Theorem 1.3 can be extended to give an arbitrarily small probability β of failure, and can be extended to the online change-point detection problem, although with more complex accuracy guarantees. Our algorithm introduces a general reduction from private change-point detection for families of distributions H0 and H1 to private hypothesis testing for the same family, which we believe to be of independent interest.

1.3 Techniques

First Attempts. A folklore result based on the sample-and-aggregate paradigm [59] shows that for every P, Q and every ε > 0, SC^{P,Q}_ε = O( (1/ε)·SC^{P,Q} ), meaning privacy comes at a cost of at most O(1/ε).⁴ However, there are many examples where SC^{P,Q}_ε = O(SC^{P,Q}) even when ε = o(1), and understanding this phenomenon is crucial.

A few illustrative pairs of distributions P, Q will serve to demonstrate the difficulties that go into formulating and proving Theorem 1.2. First, consider the domain X = {0, 1} of size two (i.e. Bernoulli distributions). To distinguish P = Ber((1 + α)/2) from Q = Ber((1 − α)/2), the optimal non-private test statistic is S(x) = Σᵢ xᵢ, which requires Θ(1/α²) = Θ(1/H²(P,Q)) samples.

To make the test differentially private, one can use a soft version of the test that outputs "P" with probability proportional to exp(ε·Σᵢ xᵢ) and "Q" with probability proportional to exp(ε·Σᵢ (1 − xᵢ)). This private test has sample complexity Θ(1/α² + 1/(αε)), which is optimal. In contrast, if P = Ber(0) and Q = Ber(α), then the non-private sample complexity becomes Θ(1/α), and the optimal private sample complexity becomes Θ(1/(αε)). In general, for X = {0, 1}, one

⁴See, e.g., [16] for a proof.


can show that
    SC^{P,Q}_ε = Θ( 1/H²(P,Q) + 1/(ε·TV(P,Q)) ).    (6)

The sample complexity of testing Bernoulli distributions (6) already demonstrates an important phenomenon in private hypothesis testing—for many distributions P, Q, there is a "phase transition" where the sample complexity takes one form when ε is sufficiently large and another when ε is sufficiently small, and often the sample complexity in the "large ε regime" is equal to the non-private complexity up to lower order terms. A key challenge in obtaining Theorem 1.2 is to understand these transitions, and to understand the sample complexity in each regime.

Since each of the terms in (6) is a straightforward lower bound on the sample complexity of private testing, one might conjecture that (6) holds for every pair of distributions. However, our next illustrative example shows that this conjecture is false even for the domain X = {0, 1, 2}. Consider the two densities
    P = (0, 0.5, 0.5) and Q = (2α^{3/2}, 0.5 + α − α^{3/2}, 0.5 − α − α^{3/2}).

For these distributions, the log-likelihood ratio statistic is roughly equivalent to the statistic that counts the number of occurrences of "0," S_{0}(x) = Σᵢ 1{xᵢ = 0}, and has sample complexity Θ(1/α^{3/2}). For this pair of distributions, the optimal private test depends on the relationship between α and ε. One such test is to simply use the soft version of S_{0}, and the other is to use the soft version of the test S_{0,1}(x) = Σᵢ 1{xᵢ ∈ {0, 1}} that counts occurrences of the first two elements. One can prove that the better of these two tests is optimal up to constant factors, giving
    SC^{P,Q}_ε = Θ( min{ 1/(α^{3/2}·ε), 1/α² } + 1/(αε) ).

For these distributions, (6) reduces to Θ(1/α^{3/2} + 1/(αε)), so these distributions show that the optimal sample complexity can be much larger than (6). Moreover, these distributions exhibit that the optimal sample complexity can vary with ε in complex ways, making several transitions and never matching the non-private complexity unless α or ε is constant.

Key Ingredients. The second example above demonstrates that the optimal test itself can vary with ε in an intricate fashion, which makes it difficult to construct a single test for which we can prove matching upper and lower bounds on the sample complexity. The use of the clamped log-likelihood test arose out of an attempt to find a single test that is optimal for the second pair of distributions P and Q, and relies on a few crucial technical ingredients.

First, our upper bound on sample complexity relies on a key observation that the Hellinger distance between two distributions is exactly the advantage, adv₁, of a soft log-likelihood ratio test sLLR on a sample of size 1. The sLLR test is a randomized test that outputs "P" with probability proportional to √(Pⁿ(x)/Qⁿ(x)) = e^{LLR(x)/2}, where LLR(x) = Σᵢ log(P(xᵢ)/Q(xᵢ)), and "Q" with probability proportional to 1. This characterization of H²(P,Q) as the advantage of the sLLR tester may be of independent interest.

This observation is crucial for our work, because it implies that sLLR is ε-DP if sup_{x∈X} |log P(x)/Q(x)| ≤ ε. That is, in the case that the log-likelihood ratio is bounded by ε, we get ε-DP for free (since Hellinger distance, and thus sLLR, characterizes the optimal asymptotic sample complexity). Thus, we use the clamped log-likelihood ratio test, which forces the log-likelihood ratio to be bounded. Our lower bound in a sense shows that any loss of power in the test due to clamping is necessary for differentially private tests.

The lower bound proof proceeds by finding a coupling ρ of Pⁿ and Qⁿ with low expected Hamming distance E_{(X,Y)∼ρ}[d_H(X,Y)], which in turn implies that the sample complexity is large (see e.g. [3]). The coupling we use essentially splits the support of P and Q into two subsets: those elements with log(P(x)/Q(x)) ≥ ε, and the remaining elements. To construct the coupling, given a sample X ∼ Pⁿ, the high-ratio elements, for which log(P(x)/Q(x)) ≥ ε, are resampled with probability related to the ratio. The contribution of this step to the Hamming distance gives us the ετ part of the lower bound. The data set consisting of the m ≤ n elements that have not yet been resampled is then coupled using the total variation distance coupling between Pᵐ and Qᵐ. Therefore, with probability 1 − TV(Pᵐ, Qᵐ), this part of the data set remains unchanged. This part of the coupling results in the H²(P′,Q′) part of the lower bound.

The proof of the upper bound also splits into two parts, roughly corresponding to the same aspects of the distributions P and Q as above. That is, we view our tester as either counting the number of high-ratio elements or computing the log-likelihood ratio on low-ratio elements. A useful observation is that this duality between the upper and lower bounds is inevitable. In Section 3, we characterize the advantage of the optimal tester in terms of the Wasserstein distance between Pⁿ and Qⁿ with metric min{ε·d_H(X,Y), 1}. That is, the advantage of the optimal tester must be matched by some coupling of Pⁿ and Qⁿ.

1.4 Related Work

Early work on differentially private hypothesis testing began in the statistics community with [72, 74]. More recently, there has been a significant number of works on differentially private hypothesis testing. One line of work [17, 21, 40, 44, 49, 71, 76] designs differentially private versions of popular test statistics for testing goodness-of-fit, closeness, and independence, as well as private ANOVA, focusing on the performance at small sample sizes. Work by Wang et al. [75] focuses on generating statistical approximating distributions for differentially private statistics, which they apply to hypothesis testing problems. A recent work by Awan and Slavkovic [5] gives a universally optimal test when the domain size is two; however, Brenner and Nissim [13] show that such universally optimal tests cannot exist when the domain has more than two elements. A complementary research direction, initiated by Cai et al. [16], studies the minimax sample complexity of private hypothesis testing. [3] and [4] have given worst-case nearly optimal algorithms for goodness-of-fit and closeness testing of arbitrary discrete distributions. That is, there exists some worst-case distribution P such that their algorithm has optimal sample complexity for testing goodness-of-fit to P. Recently, [2] designed nearly optimal algorithms for estimating properties like support size and entropy. Another related area [1, 41, 66] studies hypothesis testing in the local model of differential privacy. In particular, Duchi, Jordan, and Wainwright [26] proved an analogue of our result for the restricted

314 STOC '19, June 23–26, 2019, Phoenix, AZ, USA — Canonne, Kamath, McMillan, Smith, Ullman

case of locally differentially private algorithms. Their characterization shows that the optimal sample complexity for ε-DP local algorithms is Θ(1/(ε^2 TV(P,Q)^2)). This characterization does not exhibit the same phenomena that we demonstrate in the central model—privacy never comes "for free" if ε = o(1), and the sample complexity does not exhibit different regimes depending on ε. More generally, local-model tests are considerably simpler, and simpler to reason about, than central-model tests.

There are also several rich lines of work attempting to give tight instance-specific characterizations of the sample complexities of various differentially private computations, most notably linear query release [11, 43, 48, 57, 58] and PAC and agnostic learning [10, 39, 46]. The problems considered in these works are arguably more complex than the hypothesis testing problems we consider here; the characterizations are considerably looser, and are only optimal up to polynomial factors.

There has been a recent line of work [9, 23, 28–30, 37, 38, 64, 65, 78] on adaptive data analysis, in which the same dataset is used repeatedly across multiple statistical analyses, where the choice of each analysis depends on the outcomes of previous analyses. The key theme in these works is to show that various strong notions of algorithmic stability, including differential privacy, imply generalization bounds in the adaptive setting. Our characterization applies to all notions of stability considered in these works.

As an application of our private hypothesis testing results, we provide algorithms for private change-point detection. As discussed in Section 1.2.1, change-point detection has enjoyed a significant amount of study in information theory and statistics. Our results are in the private minimax setting, as recently introduced by Cummings et al. [22]. We improve on their results by improving the detection accuracy and providing strong privacy guarantees for all pairs of hypotheses.

2 UPPER BOUND ON SAMPLE COMPLEXITY OF PRIVATE TESTING

In this section, we establish the upper bound part of Theorem 1.2, establishing that the soft clamped log-likelihood ratio test scLLR and the noisy clamped log-likelihood ratio test ncLLR achieve the stated sample complexity. In more detail, we begin in Section 2.1 by characterizing the Hellinger distance between two distributions as the advantage, adv_1, of a specific randomized test, sLLR. Besides being of independent interest, this reformulation also yields some insight on its privatized variant, scLLR. In Section 2.2, we introduce the noisy clamped log-likelihood ratio test (ncLLR), and show that its sample complexity is at least that of scLLR. We then proceed in Section 2.3 to upper-bound the sample complexity of ncLLR, which also implies the same bound on that of scLLR.

2.1 Hellinger Distance Characterizes the Soft Log-Likelihood Test

Recall that the advantage of a test, adv_n, was defined in Equation (1), and that the squared Hellinger distance, H^2(P,Q), between two distributions P and Q is defined as

    H^2(P,Q) = (1/2) ∫_X (√P(x) − √Q(x))^2 dx.

It has long been known that the Hellinger distance characterizes the asymptotic sample complexity of non-private testing (see, e.g., [12]). In this section we show that the Hellinger distance exactly characterizes the advantage of the following randomized test given a single data point:

    sLLR(x) = P with probability g(x),
              Q with probability 1 − g(x),

where

    g(x) = exp((1/2) log(P(x)/Q(x))) / (1 + exp((1/2) log(P(x)/Q(x)))) ∈ [0, 1].

Considering the advantage of sLLR might seem puzzling at first glance, since the classic likelihood ratio test LLR enjoys a better advantage. More specifically, the value of adv_1 for these two tests is H^2(P,Q) (Theorem 2.1) and TV(P,Q) (by the definition of total variation distance), respectively, and H^2(P,Q) ≤ TV(P,Q) (see, e.g., [42]), so it would appear that LLR is the better test. There are two relevant features of sLLR which will be useful. First, as mentioned before, sLLR is naturally private if the likelihood ratio is bounded. Second, a tensorization property of Hellinger distance allows us to easily relate the advantage of the n-sample test to the advantage of the 1-sample test.

Theorem 2.1. For any two distributions P, Q, the advantage, adv_1, of sLLR is H^2(P,Q).

Proof. Note that we can rewrite

    H^2(P,Q) = (1/2) ∫_X (√P(x) − √Q(x))^2 dx
             = (1/2) ∫_X (P(x) − Q(x)) · (√P(x) − √Q(x)) / (√P(x) + √Q(x)) dx
             = (1/2) ∫_X (P(x) − Q(x)) · (√(P(x)/Q(x)) − 1) / (√(P(x)/Q(x)) + 1) dx.

Now,

    g(x) = √(P(x)/Q(x)) / (√(P(x)/Q(x)) + 1) = 1/2 + (1/2) · (√(P(x)/Q(x)) − 1) / (√(P(x)/Q(x)) + 1),

and therefore

    H^2(P,Q) = ∫_X (P(x) − Q(x)) g(x) dx = E_{x∼P}[g(x)] − E_{x∼Q}[g(x)] = Pr_{x∼P}[sLLR(x) = P] − Pr_{x∼Q}[sLLR(x) = P].

Thus, the advantage of sLLR is H^2(P,Q), as claimed. □

This tells us the advantage of the test which takes only one sample. As a corollary, we can derive the sample complexity of distinguishing P and Q using sLLR.

Corollary 2.2. SC^{P,Q}(sLLR) = O(1/H^2(P,Q)).

Proof. Our analysis is similar to that of [18]. Observe that the test sLLR which gets n samples from either P or Q is equivalent to the test sLLR which gets 1 sample from either P^n or Q^n. By Theorem 2.1, we have that the advantage of either (equivalent) test is adv = H^2(P^n, Q^n). We will require the following tensorization property of the squared Hellinger distance:

    H^2(P_1 × · · · × P_n, Q_1 × · · · × Q_n) = 1 − ∏_{i=1}^n (1 − H^2(P_i, Q_i)).

With this in hand,

    adv = H^2(P^n, Q^n) = 1 − (1 − H^2(P,Q))^n = 1 − exp(n log(1 − H^2(P,Q))) ≥ 1 − exp(−n H^2(P,Q)).

Setting n = Ω(1/H^2(P,Q)), we get adv ≥ 2/3, as desired. □
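Theorem 2.1 is easy to check by simulation. A minimal sketch (assuming finite support; the specific distributions and sample sizes below are arbitrary illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical discrete example over {0, 1, 2}; any P, Q would do.
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.2, 0.3, 0.5])

def hellinger_sq(p, q):
    # H^2(P,Q) = (1/2) * sum_x (sqrt(P(x)) - sqrt(Q(x)))^2
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

# g(x) = sqrt(P(x)/Q(x)) / (1 + sqrt(P(x)/Q(x))), as in the proof of Theorem 2.1.
g = np.sqrt(P / Q) / (1 + np.sqrt(P / Q))

# Empirical advantage: Pr_{x~P}[sLLR(x) = P] - Pr_{x~Q}[sLLR(x) = P].
N = 200_000
xP = rng.choice(3, size=N, p=P)
xQ = rng.choice(3, size=N, p=Q)
outP = rng.random(N) < g[xP]   # sLLR outputs "P" with probability g(x)
outQ = rng.random(N) < g[xQ]
adv = outP.mean() - outQ.mean()
print(adv, hellinger_sq(P, Q))  # agree up to sampling error
```

For this pair the empirical advantage concentrates around H^2(P,Q) ≈ 0.068, not around TV(P,Q) = 0.3, exactly as the theorem predicts for the soft test.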

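The tensorization identity invoked in Corollary 2.2 can also be sanity-checked by brute force on a small product space (a sketch; n = 3 and the support size 3 are arbitrary illustrative choices):

```python
import numpy as np
from itertools import product

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.2, 0.3, 0.5])

def hellinger_sq(p, q):
    # Squared Hellinger distance between two discrete distributions.
    return 0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)

# Enumerate the product distributions P^n and Q^n over all 3^n outcomes.
n = 3
Pn = np.array([np.prod(P[list(w)]) for w in product(range(3), repeat=n)])
Qn = np.array([np.prod(Q[list(w)]) for w in product(range(3), repeat=n)])

lhs = hellinger_sq(Pn, Qn)               # H^2(P^n, Q^n), computed directly
rhs = 1 - (1 - hellinger_sq(P, Q)) ** n  # 1 - (1 - H^2(P,Q))^n
print(lhs, rhs)
```

The two quantities coincide up to floating-point error, which is the identity H^2(P^n,Q^n) = 1 − (1 − H^2(P,Q))^n used to pass from the 1-sample advantage to the n-sample advantage.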

2.2 The Noisy Log-Likelihood Ratio Test

We now consider the noisy log-likelihood ratio test, which, similar to scLLR_{−ε′,ε}, is also ε-differentially private:

    ncLLR_{−ε′,ε}(X) = Σ_i [log(P(x_i)/Q(x_i))]_{−ε′}^{ε} + Lap(2),

where [·]_{−ε′}^{ε} denotes clamping to the interval [−ε′, ε]. Here Lap(2) denotes a Laplace random variable, which has density proportional to exp(−|z|/2). Readers familiar with differential privacy will note that scLLR corresponds to the exponential mechanism, while ncLLR corresponds to the report noisy max mechanism [33], and thus the two should behave quite similarly. In particular, we have the following lemma.

Lemma 2.3. For any P and Q:
(1) SC^{P,Q}(ncLLR_{−ε′,ε}) = Ω(SC^{P,Q}(scLLR_{−ε′,ε})).
(2) Furthermore, if −ε′ ≤ log(P/Q) ≤ ε then SC^{P,Q}(ncLLR_{−ε′,ε}) = Θ(SC^{P,Q}(scLLR_{−ε′,ε})).

Proof. Recall that

    scLLR_{−ε′,ε}(X) = P with probability g_ε(X), Q with probability 1 − g_ε(X),

where

    g_ε(X) = e^{(1/2) cLLR_{−ε′,ε}(X)} / (1 + e^{(1/2) cLLR_{−ε′,ε}(X)}).

If we let the threshold κ = 0, then the test based on the test statistic ncLLR_{−ε′,ε} is

    ncLLR_{−ε′,ε}(X) = P with probability h_ε(X), Q with probability 1 − h_ε(X),

where

    h_ε(X) = 1 − (1/2) e^{−(1/2) cLLR_{−ε′,ε}(X)}   if cLLR_{−ε′,ε}(X) > 0,
             (1/2) e^{(1/2) cLLR_{−ε′,ε}(X)}         if cLLR_{−ε′,ε}(X) < 0.

Now, we will use the following two inequalities: if x < 0, then

    (1/2) e^{x/2} ≤ e^{x/2} / (1 + e^{x/2}) ≤ 2 · (1/2) e^{x/2},

and if x > 0,

    (8/9) (1 − (1/2) e^{−x/2}) ≤ e^{x/2} / (1 + e^{x/2}) ≤ 1 − (1/2) e^{−x/2}.

Therefore, noting that Pr_{X∼P^n}[scLLR_{−ε′,ε}(X) = P] = E_{P^n}[g_ε(X)] and Pr_{X∼P^n}[ncLLR_{−ε′,ε}(X) = P] = E_{P^n}[h_ε(X)] are the probabilities of success, we have

    (8/9) E_{P^n}[h_ε(X)] ≤ E_{P^n}[g_ε(X)] ≤ 2 E_{P^n}[h_ε(X)]   and   (8/9) E_{Q^n}[h_ε(X)] ≤ E_{Q^n}[g_ε(X)] ≤ 2 E_{Q^n}[h_ε(X)].

Therefore, if ncLLR has a probability of success of 5/6 then scLLR has a probability of success of 2/3. This implies that SC^{P,Q}(ncLLR) ≥ SC^{P,Q}(scLLR).

If log(P/Q) ∈ [−ε′, ε] then cLLR_{−ε′,ε}(X) = LLR(X), so we have P^n(X) > Q^n(X) iff LLR(X) > 0 iff g_ε(X) ≤ h_ε(X), and therefore

    E_{P^n}[g_ε] − E_{Q^n}[g_ε] = ∫ (P^n(X) − Q^n(X)) g_ε(X) dX ≤ ∫ (P^n(X) − Q^n(X)) h_ε(X) dX = E_{P^n}[h_ε] − E_{Q^n}[h_ε],

which completes the proof. □

Corollary 2.4. If ncLLR has asymptotically optimal sample complexity then scLLR has asymptotically optimal sample complexity.

2.3 The Sample Complexity of ncLLR

In this section we prove the upper bound in Theorem 1.2 for the case where TV(P,Q) < 1 (i.e., the supports of P and Q have non-empty intersection). This assumption ensures that P̃, Q̃ ≠ 0, so that P′, Q′ are well defined. A proof of the case where TV(P,Q) = 1 is in the full version. In order to prove the upper bound in Theorem 1.2, we restate it as follows.

Theorem 2.5. The ncLLR_{−ε′,ε} test is ε-DP and

    SC^{P,Q}(ncLLR_{−ε′,ε}) ≤ O(1/(ετ + (1−τ) H^2(P′,Q′))) = O(min{1/(ετ), 1/((1−τ) H^2(P′,Q′))}).

Theorem 2.5 combined with a matching lower bound (given later in Theorem 3.4) implies that ncLLR_{−ε′,ε} has asymptotically optimal sample complexity. Thus, by Corollary 2.4, scLLR_{−ε′,ε} has asymptotically optimal sample complexity.

Before proving the bound, we pause to provide some intuition for its form. As discussed in the introduction, we can write P and Q as mixtures P = (1−τ)P′ + τP″ and Q = (1−τ)Q′ + τQ″ where P″, Q″ have disjoint support. Now consider a thought experiment, in which the test that must distinguish P from Q using a sample of size n is given, along with the sample x, a list of binary labels b_1, b_2, …, b_n that indicate for each record whether it was sampled from the first component of the mixture (either P′ or Q′), or the second component (either P″ or Q″). Of course this can only be a thought experiment—these labels are not available to a real test.

Because the mixture weights are the same for both P and Q, the number of labels of each type would be distributed the same under P and under Q, and so the tester would be faced with two independent testing problems: distinguishing P″ from Q″ using a sample of size about τn, and distinguishing P′ from Q′ using a sample of size about (1−τ)n. It would suffice for the tester to solve either of these problems.

Theorem 2.5 shows that the real tests (scLLR and ncLLR) do as well as the hypothetical tester that has access to the labels. The two arguments to the minimum in the theorem statement correspond directly to the ε-DP sample complexity of distinguishing P″ from Q″ (which requires nτ ≥ 1/ε) or distinguishing P′ from Q′ (which requires n(1−τ) ≥ 1/H^2(P′,Q′)). The proof proceeds by breaking the clamped log-likelihood ratio into two pieces, each corresponding to one of the two mixture components (again, this decomposition is not known to the algorithm). These two pieces correspond to the test statistics of the optimal testers for the two separate sub-problems in the thought experiment. We show that the test does well at distinguishing P from Q as long as either of these pieces is sufficiently informative.

On a more mechanical level, our proof of Theorem 2.5 bounds the expectation and standard deviation of the two pieces of the test statistic. We use the following simple lemma, which states that a test statistic S performs well if the distributions of the test statistic on P and Q, S(P) and S(Q), do not overlap too much. The proof is a simple application of Chebyshev's inequality.

Lemma 2.6. Given a function f : X → R, a constant c > 0 and n > 0, if the test statistic S(X) = Σ_i f(x_i) satisfies

    max{√(Var_{P^n}[S(X)]), √(Var_{Q^n}[S(X)])} ≤ c |E_{P^n}[S(X)] − E_{Q^n}[S(X)]|,

then S can be used to distinguish between P and Q with probability of success 2/3 and sample complexity at most n′ = 12 c^2 n.
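The ncLLR test can be sketched by simulation. In the sketch below the discrete P, Q and the clamp levels ε = ε′ = 0.3 are illustrative assumptions, not values from the paper; note that changing one record moves the clamped sum by at most ε + ε′, which is why Laplace noise of scale 2 suffices for ε-DP when ε′ ≤ ε:

```python
import numpy as np

rng = np.random.default_rng(1)

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.2, 0.3, 0.5])
eps, eps_prime = 0.3, 0.3

def ncLLR(sample):
    # Sum of log-likelihood ratios, each clamped to [-eps_prime, eps],
    # plus Lap(2) noise (density proportional to exp(-|z|/2)).
    cllr = np.clip(np.log(P[sample] / Q[sample]), -eps_prime, eps)
    return cllr.sum() + rng.laplace(scale=2)

# Decide "P" iff the noisy statistic exceeds the threshold kappa = 0.
n, trials = 500, 200
decisions_P = [ncLLR(rng.choice(3, size=n, p=P)) > 0 for _ in range(trials)]
decisions_Q = [ncLLR(rng.choice(3, size=n, p=Q)) > 0 for _ in range(trials)]
print(np.mean(decisions_P), np.mean(decisions_Q))
```

With n large enough, the first empirical rate is close to 1 and the second close to 0: the clamped sum has mean roughly ±0.09n under the two hypotheses here, which dwarfs both its standard deviation and the Laplace noise.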

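The expectation-gap computation at the heart of the proof of Theorem 2.5 can be illustrated concretely. A sketch for small discrete P and Q (all numbers are arbitrary illustrative choices): the per-sample gap Σ_x (P(x) − Q(x)) · cLLR(x) splits exactly into an ε-term on the region where P > e^ε Q, an unclamped middle term, and an ε′-term on the region where Q > e^{ε′} P:

```python
import numpy as np

# Illustrative discrete distributions and clamp levels (hypothetical).
P = np.array([0.55, 0.25, 0.15, 0.05])
Q = np.array([0.05, 0.25, 0.25, 0.45])
eps, eps_prime = 0.5, 0.5

llr = np.log(P / Q)
cllr = np.clip(llr, -eps_prime, eps)   # clamped log-likelihood ratio

S = P > np.exp(eps) * Q                # clamped above: cLLR = eps
T = Q > np.exp(eps_prime) * P          # clamped below: cLLR = -eps_prime
A = ~(S | T)                           # middle region: no clamping

gap = np.sum((P - Q) * cllr)           # per-sample expectation gap
gap_decomposed = ((P[S] - Q[S]).sum() * eps
                  + np.sum((P[A] - Q[A]) * llr[A])
                  + (Q[T] - P[T]).sum() * eps_prime)
print(gap, gap_decomposed)             # identical by construction
```

The decomposition is exact, mirroring how the proof separates the contribution of the disjoint-support piece (the ε and ε′ terms) from the KL/Hellinger piece on the overlap.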

The definitions of P̃ and Q̃ lend themselves naturally to a partition of the space X, depending on which quantities achieve the minimum in min(e^ε Q, P) and min(e^{ε′} P, Q). This partition will itself play a crucial role in both the proof of the theorem, and later in our lower bound: accordingly, define

    S = {x : P(x) − e^ε Q(x) > 0},
    T = {x : Q(x) − e^{ε′} P(x) > 0},        (7)

and set A = X \ (S ∪ T).

Proof of Theorem 2.5. Observe that for all x ∈ A, P̃(x) = P(x) and Q̃(x) = Q(x) (so that P′(x)/Q′(x) = P(x)/Q(x)), that for all x ∈ S, log(P′(x)/Q′(x)) = ε, and that for all x ∈ T, log(P′(x)/Q′(x)) = −ε′. To show that the test works, we first show that the clamped log-likelihood ratio (without noise) is a useful test statistic. In order to apply Lemma 2.6, we first calculate the difference Δ_gap in the expectations of cLLR_{−ε′,ε} under P and Q. For the remainder of the proof, we omit the −ε′, ε subscript (since clamping always occurs to the same interval).

    (1/n) Δ_gap = (1/n) E_{P^n}[cLLR(X)] − (1/n) E_{Q^n}[cLLR(X)]
                = ∫_X (P(x) − Q(x)) log(P′(x)/Q′(x)) dx
                = (P(S) − Q(S)) ε + ∫_A (P̃(x) − Q̃(x)) log(P′(x)/Q′(x)) dx + (Q(T) − P(T)) ε′
                = (P̃(S) − Q̃(S) + τ) ε + ∫_A (P̃(x) − Q̃(x)) log(P′(x)/Q′(x)) dx + (Q̃(T) − P̃(T) + τ) ε′,

where the last equality follows by the definition of τ. Moreover,

    n KL(P′∥Q′) + n KL(Q′∥P′)
        = n ∫_X (P′(x) − Q′(x)) log(P′(x)/Q′(x)) dx
        = n (P′(S) − Q′(S)) ε + n ∫_A (P′(x) − Q′(x)) log(P′(x)/Q′(x)) dx + n (Q′(T) − P′(T)) ε′
        = (n/(1−τ)) [ (P̃(S) − Q̃(S)) ε + ∫_A (P̃(x) − Q̃(x)) log(P′(x)/Q′(x)) dx + (Q̃(T) − P̃(T)) ε′ ].

Therefore, Δ_gap = (1−τ) n (KL(P′∥Q′) + KL(Q′∥P′)) + nτε + nτε′ ≥ n (2(1−τ) H^2(P′,Q′) + τε), where the last inequality follows since H^2(P,Q) ≤ KL(P∥Q) for any distributions P and Q.

We now turn to bounding the variance of the noisy clamped LLR under P and Q. Noting that Var_{P^n}[ncLLR] = Var_{P^n}[cLLR] + 8, by Lemma 2.6 it suffices to show that max{Var_{Q^n}[cLLR] + 8, Var_{P^n}[cLLR] + 8} ≤ O(Δ_gap^2); or, equivalently, that max{Var_{Q^n}[cLLR], Var_{P^n}[cLLR]} ≤ O(Δ_gap^2) and Δ_gap = Ω(1). Recall that P″ is a distribution such that P = τP″ + (1−τ)P′ and the support of P″ is contained in S. Thus,

    Var_{P^n}[cLLR] ≤ n ∫_X P(x) log^2(P′(x)/Q′(x)) dx
                    = n ∫_X (τ P″(x) + (1−τ) P′(x)) log^2(P′(x)/Q′(x)) dx
                    = n (τ P″(S) ε^2 + (1−τ) ∫_X P′(x) log^2(P′(x)/Q′(x)) dx).

Since log(P′(x)/Q′(x)) ≤ ε ≤ 1, we have |log(P′(x)/Q′(x))| ≤ 3 · |1 − √(Q′(x)/P′(x))|. Therefore,

    ∫_X P′(x) log^2(P′(x)/Q′(x)) dx ≤ 9 ∫_X P′(x) (1 − √(Q′(x)/P′(x)))^2 dx = 18 · H^2(P′,Q′).

Also, P″(S) = 1 and thus we have that Var_{P^n}[cLLR] = O(nτε^2 + (1−τ) n H^2(P′,Q′)) = O(nτε + (1−τ) n H^2(P′,Q′)). Similarly, Var_{Q^n}[cLLR] ≤ O(nτε + (1−τ) n H^2(P′,Q′)). Finally,

    max{Var_{Q^n}[cLLR], Var_{P^n}[cLLR]} = O(nτε + n(1−τ) H^2(P′,Q′)) = O(Δ_gap^2 / (n (τε + (1−τ) H^2(P′,Q′)))).

Therefore, if n ≥ C/(τε + (1−τ) H^2(P′,Q′)) for some suitably large constant C > 0, this implies that max{Var_{Q^n}[cLLR], Var_{P^n}[cLLR]} ≤ O(Δ_gap^2) and Δ_gap = Ω(1), which concludes our proof. □

3 LOWER BOUND ON THE SAMPLE COMPLEXITY OF PRIVATE TESTING

We now prove the lower bound in Theorem 1.2. We do so by constructing an appropriate coupling between the distributions P^n and Q^n, which implies lower bounds for privately distinguishing P^n from Q^n. This style of analysis was introduced in [3], though we require a strengthening of their statement. Specifically, the lower bound of Acharya et al. involves d_ε(X,Y) = ε d_H(X,Y), whereas we have d_ε(X,Y) = min(ε d_H(X,Y), 1).

For X, Y ∈ X^n, let d_H(X,Y) be the Hamming distance between X, Y (i.e., |{i : x_i ≠ y_i}|). Given a metric d : X^n × X^n → R_{≥0}, we define the Wasserstein distance W_d(P,Q) to be W_d(P,Q) = inf_ρ E_{(X,Y)∼ρ}[d(X,Y)], where the inf is over all couplings ρ of P^n and Q^n. Let d_ε(X,Y) = min{ε d_H(X,Y), 1}.

Lemma 3.1. For every ε-DP algorithm M : X^n → {0,1}, if X and Y are neighboring datasets then E[M(X)] ≤ e^ε E[M(Y)], where the expectations are over the randomness of the algorithm M.

Lemma 3.2. For every ε-DP algorithm M : X^n → {P,Q}, we have the following bound on the advantage: Pr_{X∼P}[M(X) = P] − Pr_{X∼Q}[M(X) = P] ≤ O(W_{d_ε}(P,Q)).

Proof. Let ρ : X^n × X^n → R_{≥0} be a coupling of P^n and Q^n and M be an ε-DP algorithm. We have

    Pr_{X∼P}[M(X) = P] − Pr_{X∼Q}[M(X) = P]
        = ∫_{X^n} ∫_{X^n} (ρ(X,Y) Pr_M[M(X) = P] − ρ(X,Y) Pr_M[M(Y) = P]) dX dY
        ≤ ∫_{X^n} ∫_{X^n} ρ(X,Y) min{1, (e^{ε d_H(X,Y)} − 1) Pr_M[M(Y) = P]} dX dY
        ≤ 2 ∫_{X^n} ∫_{X^n} ρ(X,Y) min{1, ε d_H(X,Y)} dX dY
        = O(E_{(X,Y)∼ρ}[d_ε(X,Y)]),

where Pr_M[·] denotes that the probability is over the randomness of the algorithm M, and the first inequality follows from Lemma 3.1. □

The upper bound in Lemma 3.2 is in fact tight. We state the converse (whose proof appears in the full version) below for completeness, although we will not use it in this work.

Lemma 3.3. There is an ε-DP algorithm M : X^n → {P,Q} such that Pr_{X∼P}[M(X) = P] − Pr_{X∼Q}[M(X) = P] ≥ Ω(W_{d_ε}(P,Q)).

We will also rely on the following standard fact characterizing total variation distance in terms of couplings:

Fact 1. Given P and Q, there exists a coupling ρ of P and Q such that Pr_ρ[X ≠ Y] = TV(P,Q). We will refer to this coupling as the total variation coupling of P and Q.

We can now prove the lower bound component of Theorem 1.2. Recall that P′ and Q′ were defined in Equation (5).

Theorem 3.4. Given P and Q, every ε-DP test K that distinguishes P and Q has the property that SC^{P,Q}(K) = Ω(1/(ετ + H^2(P′,Q′))).

Proof. Consider the following coupling of P^n and Q^n: Given a sample X ∼ P^n, independently for all x_i ∈ S⁵ label the point 0 with probability e^ε Q(x_i)/P(x_i), otherwise label it 1. Label all the points in A ∪ T as 0. (In particular, this implies that each x_i is labeled 0 with probability 1−τ, and 1 with probability τ.) Each point labeled 1 is then independently re-sampled from T according to the probability distribution (Q − e^{ε′} P)/τ restricted to T. Let Λ ⊆ [n] be the set of points labeled 0, and n′ be its size; and note that X_Λ is distributed according to (P′)^{n′}. We transform this set to a set distributed by (Q′)^{n′} using the TV-coupling of (P′)^{n′} and (Q′)^{n′}. The result is a sample from Q^n.

Now, we can rewrite

    d_H(X,Y) = Σ_{i∉Λ} 1{X_i ≠ Y_i} + Σ_{i∈Λ} 1{X_i ≠ Y_i} = d_H(X_Λ̄, Y_Λ̄) + 1{X_Λ ≠ Y_Λ} · Σ_{i∈Λ} 1{X_i ≠ Y_i},

and therefore

    E[min{ε d_H(X,Y), 1}]
        = E[min{ε d_H(X,Y), 1} 1{X_Λ = Y_Λ}] + E[min{ε d_H(X,Y), 1} 1{X_Λ ≠ Y_Λ}]
        ≤ ε E[d_H(X,Y) 1{X_Λ = Y_Λ}] + E[1{X_Λ ≠ Y_Λ}]
        = ε E[d_H(X_Λ̄, Y_Λ̄)] + Pr[X_Λ ≠ Y_Λ].

Recalling now that the distribution of (X_Λ, Y_Λ) is that of the TV-coupling of (P′)^{n′} and (Q′)^{n′}, and that |Λ̄| = n − n′, we get

    E[min{ε d_H(X,Y), 1}] ≤ E[ε(n − n′) + TV((P′)^{n′}, (Q′)^{n′})] ≤ ετn + E[√(n′) H(P′,Q′)] ≤ ετn + √((1−τ)n) · H(P′,Q′).

Therefore, by Lemma 3.2, we have that for every ε-DP test M,

    Pr_{X∼P}[M(X) = P] − Pr_{X∼Q}[M(X) = P] ≤ ετn + √((1−τ)n) · H(P′,Q′).

Thus, in order for the probability of success to be Ω(1), we need either ετn or √((1−τ)n) H(P′,Q′) to be Ω(1). That is, n ≥ Ω(min{1/(ετ), 1/((1−τ) H^2(P′,Q′))}). □

⁵Recall the definitions of S, T and A from Section 2.3 (Equation (7)).

4 APPLICATION: DIFFERENTIALLY PRIVATE CHANGE-POINT DETECTION

In this section, we give an application of our method to differentially private change-point detection (CPD). In the change-point detection problem, we are given a time-series of data. Initially, it comes from a known distribution P, and at some unknown time step, it begins to come from another known distribution Q. The goal is to approximate when this change occurs. More formally, we have the following definition.

Definition 4.1. In the offline change-point detection problem, we are given distributions P, Q and a data set X = {x_1, …, x_n}. We are guaranteed that there exists k* ∈ [n] such that x_1, …, x_{k*−1} ∼ P and x_{k*}, …, x_n ∼ Q. The goal is to output k̂ such that |k̂ − k*| is small.

In the online change-point detection problem, we are given distributions P, Q and a stream of data points X = {x_1, …}. We are guaranteed that there exists k* such that x_1, …, x_{k*−1} ∼ P and x_{k*}, … ∼ Q. The goal is to output k̂ such that |k̂ − k*| is small.

We study the parameterization of the private change-point detection problem recently introduced by Cummings et al. [22].

Definition 4.2 (Change-Point Detection). An algorithm for an (online) change-point detection problem is (α, β)-accurate if for any input dataset (data stream), with probability at least 1 − β it outputs a k̂ such that |k̂ − k*| ≤ α, where the probability is with respect to the randomness in the sampling of the data set and the random choices made by the algorithm.

Our main result is the following:

Theorem 4.3. There exists an efficient ε-differentially private and (α, β)-accurate algorithm for offline change-point detection from distribution P to Q with

    α = Θ((1/(ετ(P,Q) + H^2(P′,Q′))) · log(1/β)).

Furthermore, there exists an efficient ε-differentially private and (α, β)-accurate algorithm for online change-point detection from distribution P to Q with the same value of α. This latter algorithm also requires as input a value n such that n = Ω(SC_ε^{P,Q} · log(k*/(nβ))). If the algorithm is accurate, it will observe at most k* + 2n data points, and with high probability observe k* + O(n log n) data points.

For constant β, the accuracy of our offline algorithm is optimal up to constant factors, since one can easily show that the best accuracy achievable is Ω(SC_ε^{P,Q}). A similar statement holds for our online algorithm when the algorithm is given an estimate n of k* such that n = poly(k*).

As one might guess, this problem is intimately related to the hypothesis testing question studied in the rest of this paper. Indeed, our change-point detection algorithm will use the hypothesis testing algorithm of Theorem 1.2 as a black box, in order to reduce to a simpler Bernoulli change-point detection problem (see Lemma 4.4 in Section 4.1). We then give an algorithm to solve this simpler problem (Lemma 4.5 in Section 4.2), completing the proof of Theorem 4.3. In Section 4.3, we show that our reduction is applicable more generally, as we describe an algorithm for change-point detection in a goodness-of-fit setting.
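The total variation coupling of Fact 1, which the proof of Theorem 3.4 applies to (P′)^{n′} and (Q′)^{n′}, is constructive for discrete distributions: keep the shared mass min(P, Q) on the diagonal, and pair the leftover masses (which have disjoint supports) off the diagonal. A sketch, with illustrative distributions:

```python
import numpy as np

rng = np.random.default_rng(2)

P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.2, 0.3, 0.5])

tv = 0.5 * np.abs(P - Q).sum()
common = np.minimum(P, Q)        # total mass 1 - TV, placed on X = Y
excess_P = (P - common) / tv     # leftover P-mass (where P > Q)
excess_Q = (Q - common) / tv     # leftover Q-mass (where Q > P)

def tv_coupling():
    # With probability 1 - TV draw X = Y from the common part; otherwise
    # X and Y come from the disjoint leftovers, so X != Y surely.
    if rng.random() < 1 - tv:
        x = rng.choice(3, p=common / (1 - tv))
        return x, x
    return rng.choice(3, p=excess_P), rng.choice(3, p=excess_Q)

N = 100_000
disagree = np.mean([x != y for (x, y) in (tv_coupling() for _ in range(N))])
print(disagree, tv)  # Pr[X != Y] matches TV(P, Q)
```

Marginally, X ∼ P and Y ∼ Q, while Pr[X ≠ Y] equals TV(P, Q) exactly, which is the property the lower-bound argument needs.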

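The non-private Bernoulli primitive that Sections 4.1 and 4.2 combine can be sketched directly: given ±1 bits whose bias flips from τ_0 > 2/3 to τ_1 < 1/3 at an unknown index, output the minimizer of the suffix sums ℓ(t) = Σ_{j=t}^n z_j. All concrete parameter values below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic restricted instance: bias flips from tau0 to tau1 at k_star.
n, k_star = 1000, 600
tau0, tau1 = 0.8, 0.2
z = np.concatenate([
    np.where(rng.random(k_star - 1) < tau0, 1, -1),      # z_1 .. z_{k*-1}
    np.where(rng.random(n - k_star + 1) < tau1, 1, -1),  # z_{k*} .. z_n
])

# ell(t) = sum_{j=t}^n z_j; the estimate is argmin_t ell(t) (1-indexed).
ell = np.cumsum(z[::-1])[::-1]
k_hat = int(np.argmin(ell)) + 1
print(k_hat, k_star)  # typically within O(log(1/beta)) of each other
```

In the private pipeline of Lemma 4.4, each bit z_j is produced by running the ε-DP test of Theorem 1.2 on a disjoint block of SC_ε^{P,Q} raw samples, so the bit sequence, and hence k̂, is already differentially private.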

4.1 A Reduction to Bernoulli CPD

In this section, we provide a reduction from private change-point detection with arbitrary distributions to non-private change-point detection with Bernoulli distributions.

Lemma 4.4. Suppose there exists an (α, β)-accurate algorithm which can solve the following restricted change-point detection problem: we are guaranteed that there exists k̃* such that z_1, …, z_{k̃*−1} ∼ 2 Ber(τ_0) − 1 for some τ_0 > 2/3, z_{k̃*+1}, … ∼ 2 Ber(τ_1) − 1 for some τ_1 < 1/3, and z_{k̃*} ∈ {±1} is arbitrary. Then there exists an ε-differentially private and ((α + 1) · SC_ε^{P,Q}, β)-accurate algorithm which solves the change-point detection problem, where SC_ε^{P,Q} is as defined in Theorem 1.2.

Proof. We describe the reduction for the offline version of the problem; the reduction in the online setting is identical. The reduction is easy to describe. We partition the samples into intervals of length SC_ε^{P,Q}. Let Y_j = {x_{(j−1) SC_ε^{P,Q} + 1}, …, x_{j SC_ε^{P,Q}}}, for j = 1 to ⌊n/SC_ε^{P,Q}⌋, and disregard the remaining "tail" of x_i's. We run the algorithm of Theorem 1.2 on each Y_j, and produce a bit z_j = 1 if the algorithm outputs that the distribution is P, and z_j = −1 otherwise.

We show that this forms a valid instance of the change-point detection problem in the lemma statement. Let k* be the change-point in the original problem, and suppose it belongs to Y_{k̃*}. For every j < k̃*, all the samples are from P, and by Theorem 1.2, each z_j will independently be 1 with probability τ_0 ≥ 2/3. Similarly, for every j > k̃*, all the samples are from Q, and each z_j will independently be −1 with probability at least 1 − τ_1 ≥ 2/3.

Finally, we show that the existence of an (α, β)-accurate algorithm that solves this problem also solves the original problem. Suppose that the output of the algorithm on the restricted change-point detection problem is j. To map this to an answer to the original problem, we let k̂ = (j − 1) SC_ε^{P,Q}.

First, note that k̂ will be ε-differentially private. We claim that the sequence of z_j's is ε-differentially private. This is because the algorithm of Theorem 1.2 is ε-differentially private, we apply the algorithm independently to each component of the partition, and each data point affects only one component (since they are disjoint). Privacy of k̂ follows since privacy is closed under post-processing.

Finally, we establish accuracy. In the restricted change-point detection problem, with probability at least 1 − β, the output j will be such that |j − k̃*| ≤ α. In the original problem's domain, this corresponds to a k̂ such that |k̂ − k*| ≤ (α + 1) SC_ε^{P,Q}, as desired. □

4.2 Solving Bernoulli CPD

In this section, we show that there is a (Θ(log(1/β)) + 1, β)-accurate algorithm for the restricted change-point detection problem. Combined with Lemma 4.4, this implies Theorem 4.3.

Lemma 4.5. There exists an efficient (O(log(1/β)), β)-accurate algorithm for the offline restricted change-point detection problem (as defined in Lemma 4.4).

Similarly, there exists an efficient (O(log(1/β)), β)-accurate algorithm for the online restricted change-point detection problem. This algorithm requires as input a value n such that n = Ω(log(k*/(nβ))). If the algorithm is accurate, it will observe at most k* + 2n data points, and with high probability observe k* + O(n log n) data points.

Proof. We start by describing the algorithm for the offline version of the problem. We then discuss how to reduce from the online problem to the offline problem.

Offline Change-Point Detection. We define the function ℓ(t) = Σ_{j=t}^n z_j. The algorithm's output will be k̂ = argmin_{1≤t≤n} ℓ(t). Let k* be the true change-point index. To prove correctness of this algorithm, we show that ℓ(k* + 1) − ℓ(t) < 0 for all t ≥ k* + 1 + Θ(log(1/β)), and that ℓ(k* − 1) − ℓ(t) < 0 for all t ≤ k* − 1 − Θ(log(1/β)). Together, these will show that argmin_{1≤t≤n} ℓ(t) ∈ [k* − 1 − Θ(log(1/β)), k* + 1 + Θ(log(1/β))], proving the result. For the remainder of this proof, we focus on the former case; the latter will follow symmetrically. Specifically, we will show that ℓ(k* + 1) − ℓ(t) < 0 for all t ≥ k* + 1 + c log(1/β), where c > 0 is some large absolute constant.

Observe that for t ≥ k* + 1, ℓ(k* + 1) − ℓ(t) = Σ_{j=k*+1}^{t−1} z_j forms a biased random walk which takes a +1-step with probability τ_1 ≤ 1/3 and a −1-step with probability 1 − τ_1 ≥ 2/3. Define M_i = ℓ(k* + 1) − ℓ(k* + 1 + i) + i(1 − 2τ_1) for i = 0 to n − k* − 1, and note that this forms a martingale sequence. We will use Theorem 4 of [7], which provides a finite-time law of the iterated logarithm result. Specialized to our setting, we obtain the following maximal inequality, bounding how far this martingale deviates away from 0.

Theorem 4.6 (Follows from Theorem 4 of [7]). Let c > 0 be some absolute constant. With probability at least 1 − β, for all i ≥ c log(1/β) simultaneously, |M_i| ≤ O(√(i (log log(i) + log(1/β)))).

This implies that, with probability at least 1 − β, we have that for all i ≥ c log(1/β),

    ℓ(k* + 1) − ℓ(k* + 1 + i) = M_i − i(1 − 2τ_1) ≤ O(√(i (log log(i) + log(1/β)))) − i/3.

Note that the right-hand side is non-increasing in i, so it is maximized at i = c log(1/β), and thus

    ℓ(k* + 1) − ℓ(k* + 1 + i) ≤ O(√(log(1/β) (log log log(1/β) + log(1/β)))) − (c/3) log(1/β) < 0,

where the last inequality follows for a sufficiently large choice of c.

Online Change-Point Detection. The algorithm will be as follows. Partition the stream into consecutive intervals of length n, which we will draw in batches. If an interval has more −1's than +1's, then call the offline change-point detection algorithm on the final 2n data points with failure probability parameter set to β/4, and output whatever it says.

Let k* be the true change-point index. First, we show that with probability ≥ 1 − β/4, the algorithm will not see more −1's than +1's in any interval before the one containing k*. The number of +1's in such an interval will be distributed as Binomial(n, τ_0) for τ_0 > 2/3. By a Chernoff bound, the probability that we have > n/2 −1's is at most exp(−Θ(n)). Taking a union bound over all O(k*/n) intervals before the change point gives a failure probability of (k*/n) exp(−Θ(n)) ≤ β/4, where the last inequality follows by our condition on n.

Next, note that the interval following the one containing k* will have a number of +1's which is distributed as Binomial(n, τ_1) for τ_1 < 1/3. By a similar Chernoff bound, the probability that we have > n/2 +1's is at most exp(−Θ(n)) ≪ β/4. Thus, with probability


1 − β/2, the algorithm will call the offline change-point detection algorithm on an interval containing the true change point k*. We conclude by the correctness guarantees of the offline change-point detection algorithm. Note that we chose the failure probability parameter to be β/4, as the offline algorithm may either be called at the interval containing k*, or the following one, and we take a union bound over both of them. □

4.3 Private Goodness-of-Fit CPD

Our reduction above is rather general, and it can apply to more general change-point detection settings. For instance, the above discussion assumes we know both the initial and final distributions P and Q. Instead, one could imagine a setting where one knows the initial distribution P but not the final distribution Q, which we term goodness-of-fit change-point detection.

Definition 4.7. In the offline γ-goodness-of-fit change-point detection problem, we are given a distribution P over domain X and a data set X = {x_1, …, x_n}. We are guaranteed that there exists k* ∈ [n] such that x_1, …, x_{k*−1} ∼ P, and x_{k*}, …, x_n ∼ Q, for some fixed (but unknown) distribution Q over domain X, such that TV(P,Q) ≥ γ. The goal is to output k̂ such that |k̂ − k*| is small.

We note that analogous definitions and results hold for the online version of this problem, as in the previous sections.

We omit the full details of the proof, but it proceeds by a very similar argument to that in Sections 4.1 and 4.2. In particular, it is possible to prove an analogue of Lemma 4.4, at which point we can apply Lemma 4.5. The only difference is that we need an algorithm for private goodness-of-fit testing, rather than Theorem 1.2 for hypothesis testing. We use the following result from [3].

Theorem 4.8 (Theorem 13 of [3]). Let P be a known distribution over X, and let Q be the set of all distributions Q over X such that TV(P,Q) ≥ γ. Given n samples from an unknown distribution which is either P, or some Q ∈ Q, there exists an efficient ε-differentially private algorithm which distinguishes between the two cases with probability ≥ 2/3 when

    n = Θ(|X|^{1/2}/γ^2 + |X|^{1/2}/(γ ε^{1/2}) + |X|^{1/3}/(γ^{4/3} ε^{2/3}) + 1/(γε)).

With this in hand, we have the following result for goodness-of-fit change-point detection.

Theorem 4.9. There exists an efficient ε-differentially private and (α, β)-accurate algorithm for offline γ-goodness-of-fit change-point detection with

    α = Θ(|X|^{1/2}/γ^2 + |X|^{1/2}/(γ ε^{1/2}) + |X|^{1/3}/(γ^{4/3} ε^{2/3}) + 1/(γε)) · log(1/β).

ACKNOWLEDGMENTS

CC was supported by a Motwani Postdoctoral Fellowship. GK was supported as a Research Fellow, as part of the Simons-Berkeley Research Fellowship program. AM was supported by NSF award AF-1763786, a Sloan Foundation Research Award, and a postdoctoral fellowship from BU's Hariri Institute for Computing. AS was supported by NSF awards IIS-1447700 and AF-1763786 and a Sloan Foundation Research Award. JU was supported by NSF grants CCF-1718088, CCF-1750640, and CNS-1816028, and a Google Faculty Research Award. We are grateful to Salil Vadhan for valuable discussions in the early stages of this work.

REFERENCES

[1] Jayadev Acharya, Clément L. Canonne, Cody Freitag, and Himanshu Tyagi. 2019. Test without Trust: Optimal Locally Private Distribution Testing. Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019 (to appear). Full version available at arXiv:1808.02174. (2019).
[2] Jayadev Acharya, Gautam Kamath, Ziteng Sun, and Huanyu Zhang. 2018. INSPECTRE: Privately Estimating the Unseen. In Proceedings of the 35th International Conference on Machine Learning (ICML '18). JMLR, Inc., 30–39.
[3] Jayadev Acharya, Ziteng Sun, and Huanyu Zhang. 2018. Differentially Private Testing of Identity and Closeness of Discrete Distributions. In Advances in Neural Information Processing Systems 31 (NeurIPS '18). Curran Associates, Inc.
[4] Maryam Aliakbarpour, Ilias Diakonikolas, and Ronitt Rubinfeld. 2018. Differentially Private Identity and Closeness Testing of Discrete Distributions. In Proceedings of the 35th International Conference on Machine Learning (ICML '18). JMLR, Inc., 169–178.
[5] Jordan Awan and Aleksandra Slavkovic. 2018. Differentially Private Uniformly Most Powerful Tests for Binomial Data. In Advances in Neural Information Processing Systems 31 (NeurIPS '18). Curran Associates, Inc., 4212–4222.
[6] Borja Balle, Gilles Barthe, and Marco Gaboardi. 2018. Privacy Amplification by Subsampling: Tight Analyses via Couplings and Divergences. In Advances in Neural Information Processing Systems 31 (NeurIPS '18). Curran Associates, Inc., 6280–6290.
[7] Akshay Balsubramani. 2015. Sharp Finite-Time Iterated-Logarithm Martingale Concentration. arXiv preprint arXiv:1405.2639 (2015).
[8] Ziv Bar-Yossef. 2002. The Complexity of Massive Data Set Computations. Ph.D. Dissertation. UC Berkeley. Adviser: Christos Papadimitriou. Available at http://webee.technion.ac.il/people/zivby/index_files/Page1489.html.
[9] Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, and Jonathan Ullman. 2016. Algorithmic Stability for Adaptive Data Analysis. In Proceedings of the 48th Annual ACM Symposium on the Theory of Computing (STOC '16). ACM, Cambridge, MA, 1046–1059.
[10] Amos Beimel, Kobbi Nissim, and Uri Stemmer. 2013. Characterizing the Sample Complexity of Private Learners. In Proceedings of the 4th Conference on Innovations in Theoretical Computer Science (ITCS '13). ACM, 97–110.
[11] Aditya Bhaskara, Daniel Dadush, Ravishankar Krishnaswamy, and Kunal Talwar. 2012. Unconditional Differentially Private Mechanisms for Linear Queries. In Proceedings of the 44th Annual ACM Symposium on Theory of Computing (STOC '12). ACM, 1269–1284.
[12] Aleksandr Alekseevich Borovkov. 1999. Mathematical Statistics. CRC Press.
[13] Hai Brenner and Kobbi Nissim. 2014. Impossibility of Differentially Private Universally Optimal Mechanisms. SIAM J. Comput. 43, 5 (2014), 1513–1540.
[14] Mark Bun and Thomas Steinke. 2016. Concentrated Differential Privacy: Simplifications, Extensions, and Lower Bounds. In Proceedings of the 14th Conference on Theory of Cryptography (TCC '16-B). Springer, Berlin, Heidelberg, 635–658.
[15] Mark Bun, Jonathan Ullman, and Salil Vadhan. 2014. Fingerprinting Codes and the Price of Approximate Differential Privacy. In Proceedings of the 46th Annual ACM Symposium on the Theory of Computing (STOC '14). ACM, New York, NY, USA, 1–10.
[16] Bryan Cai, Constantinos Daskalakis, and Gautam Kamath. 2017. Priv'IT: Private and Sample Efficient Identity Testing. In Proceedings of the 34th International Conference on Machine Learning (ICML '17). JMLR, Inc., 635–644.
[17] Zachary Campbell, Andrew Bray, Anna Ritz, and Adam Groce. 2018. Differentially Private ANOVA Testing. In Proceedings of the 2018 International Conference on Data Intelligence and Security (ICDIS '18). IEEE Computer Society, Washington, DC, USA, 281–285.
[18] Clément L. Canonne. 2017. A Short Note on Distinguishing Discrete Distributions. https://github.com/ccanonne/probabilitydistributiontoolbox/blob/master/testing.pdf. (2017).
[19] Clément L. Canonne, Gautam Kamath, Audra McMillan, Adam Smith, and Jonathan Ullman. 2018. The Structure of Optimal Private Tests for Simple Hypotheses. arXiv preprint arXiv:1811.08382 (2018).
[20] Kamalika Chaudhuri, Claire Monteleoni, and Anand D. Sarwate. 2011. Differentially Private Empirical Risk Minimization. Journal of Machine Learning Research 12 (2011), 1069–1109.
[21] Simon Couch, Zeki Kazan, Kaiyan Shi, Andrew Bray, and Adam Groce. 2019. Differentially Private Nonparametric Hypothesis Testing. arXiv preprint arXiv:1903.09364 (2019).
[22] Rachel Cummings, Sara Krehbiel, Yajun Mei, Rui Tuo, and Wanrong Zhang. 2018. Differentially Private Change-Point Detection. In Advances in Neural Information Processing Systems 31 (NeurIPS '18). Curran Associates, Inc., 10848–10857.
[23] Rachel Cummings, Katrina Ligett, Kobbi Nissim, Aaron Roth, and Zhiwei Steven Wu. 2016. Adaptive Learning with Robust Generalization Guarantees. In Conference on Learning Theory (COLT). 772–814.
[24] Aref N. Dajani, Amy D. Lauger, Phyllis E. Singer, Daniel Kifer, Jerome P. Reiter, Ashwin Machanavajjhala, Simson L. Garfinkel, Scot A. Dahl, Matthew Graham, Vishesh Karwa, Hang Kim, Philip Lelerc, Ian M. Schmutte, William N. Sexton, Lars Vilhuber, and John M. Abowd. 2017. The Modernization of Statistical Disclosure
Limitation at the U.S. Census Bureau. (2017). Presented at the September 2017 meeting of the Census Scientific Advisory Committee.
[25] Differential Privacy Team, Apple. 2017. Learning with Privacy at Scale. https://machinelearning.apple.com/docs/learning-with-privacy-at-scale/appledifferentialprivacysystem.pdf. (December 2017).
[26] John C. Duchi, Michael I. Jordan, and Martin J. Wainwright. 2013. Local Privacy and Statistical Minimax Rates. In IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS '13). 429–438.
[27] John C. Duchi and Feng Ruan. 2018. The Right Complexity Measure in Locally Private Estimation: It is not the Fisher Information. arXiv preprint arXiv:1806.05756 (2018).
[28] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. 2015. Generalization in Adaptive Data Analysis and Holdout Reuse. In Advances in Neural Information Processing Systems (NIPS).
[29] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. 2015. Preserving Statistical Validity in Adaptive Data Analysis. In Proceedings of the 47th Annual ACM Symposium on the Theory of Computing (STOC '15). ACM, 1046–1059.
[30] Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. 2015. The reusable holdout: Preserving validity in adaptive data analysis. Science 349, 6248 (June 2015), 636–638.
[31] Cynthia Dwork, Krishnaram Kenthapadi, Frank McSherry, Ilya Mironov, and Moni Naor. 2006. Our Data, Ourselves: Privacy Via Distributed Noise Generation. In Advances in Cryptology - EUROCRYPT 2006, 25th Annual International Conference on the Theory and Applications of Cryptographic Techniques, St. Petersburg, Russia, May 28 - June 1, 2006, Proceedings. 486–503.
[32] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006. Calibrating Noise to Sensitivity in Private Data Analysis. In 3rd IACR Theory of Cryptography Conference (TCC '06). Springer, New York, NY, 265–284.
[33] Cynthia Dwork and Aaron Roth. 2014. The Algorithmic Foundations of Differential Privacy. Foundations and Trends in Theoretical Computer Science 9, 3-4 (2014), 211–407.
[34] Cynthia Dwork and Guy N. Rothblum. 2016. Concentrated Differential Privacy. arXiv preprint arXiv:1603.01887 (2016).
[35] Cynthia Dwork, Adam Smith, Thomas Steinke, Jonathan Ullman, and Salil Vadhan. 2015. Robust Traceability from Trace Amounts. In Proceedings of the 56th Annual IEEE Symposium on Foundations of Computer Science (FOCS '15). IEEE Computer Society, Berkeley, CA, 650–669.
[36] Úlfar Erlingsson, Vasyl Pihur, and Aleksandra Korolova. 2014. RAPPOR: Randomized aggregatable privacy-preserving ordinal response. In ACM SIGSAC Conference on Computer and Communications Security, CCS.
[37] Vitaly Feldman and Thomas Steinke. 2017. Generalization for Adaptively-chosen Estimators via Stable Median. In COLT 2017 - The 30th Annual Conference on Learning Theory.
[38] Vitaly Feldman and Thomas Steinke. 2018. Calibrating Noise to Variance in Adaptive Data Analysis. In Proceedings of the 31st Annual Conference on Learning Theory (COLT '18). 535–544.
[39] Vitaly Feldman and David Xiao. 2014. Sample complexity bounds on differentially private learning via communication complexity. In Proceedings of the 27th Annual Conference on Learning Theory (COLT '14). 1000–1019.
[40] Marco Gaboardi, Hyun-Woo Lim, Ryan M. Rogers, and Salil P. Vadhan. 2016. Differentially Private Chi-Squared Hypothesis Testing: Goodness of Fit and Independence Testing. In ICML. 2111–2120.
[41] Marco Gaboardi and Ryan Rogers. 2018. Local Private Hypothesis Testing: Chi-Square Tests. In Proceedings of the 35th International Conference on Machine Learning (ICML '18). JMLR, Inc., 1626–1635.
[42] Alison L. Gibbs and Francis E. Su. 2002. On choosing and bounding probability metrics. International Statistical Review 70, 3 (December 2002), 419–435.
[43] Moritz Hardt and Kunal Talwar. 2010. On the geometry of differential privacy. In Proceedings of the 42nd ACM Symposium on Theory of Computing, STOC.
[44] Kazuya Kakizaki, Jun Sakuma, and Kazuto Fukuchi. 2017. Differentially Private Chi-squared Test by Unit Circle Mechanism. In Proceedings of the 34th International Conference on Machine Learning (ICML '17). JMLR, Inc., 1761–1770.
[45] Vishesh Karwa and Salil Vadhan. 2018. Finite Sample Differentially Private Confidence Intervals. In Proceedings of the 9th Conference on Innovations in Theoretical Computer Science (ITCS '18). Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 44:1–44:9.
[46] Shiva Prasad Kasiviswanathan, Homin K. Lee, Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2008. What Can We Learn Privately?. In FOCS. IEEE, 531–540.
[47] Shiva Prasad Kasiviswanathan and Adam D. Smith. 2008. A Note on Differential Privacy: Defining Resistance to Arbitrary Side Information. CoRR abs/0803.3946 (2008).
[48] Assimakis Kattis and Aleksandar Nikolov. 2016. Lower Bounds for Differential Privacy from Gaussian Width. In 33rd International Symposium on Computational Geometry. arXiv preprint arXiv:1612.02914, 45:1–45:16.
[49] Daniel Kifer and Ryan M. Rogers. 2017. A New Class of Private Chi-Square Tests. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS '17). JMLR, Inc., 991–1000.
[50] Martin Kulldorff. 2001. Prospective Time Periodic Geographical Disease Surveillance Using a Scan Statistic. Journal of the Royal Statistical Society: Series A (Statistics in Society) 164, 1 (2001), 61–72.
[51] Tze Leung Lai. 1995. Sequential Changepoint Detection in Quality Control and Dynamical Systems. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 57, 4 (1995), 613–658.
[52] Jingbo Liu, Paul W. Cuff, and Sergio Verdú. 2015. Eγ-Resolvability. CoRR abs/1511.07829 (2015). arXiv:1511.07829 http://arxiv.org/abs/1511.07829
[53] Gary Lorden. 1971. Procedures for Reacting to a Change in Distribution. The Annals of Mathematical Statistics 42, 6 (1971), 1897–1908.
[54] Frank McSherry and Kunal Talwar. 2007. Mechanism Design via Differential Privacy. In 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS '07). 94–103.
[55] Yajun Mei. 2006. Sequential Change-Point Detection when Unknown Parameters are Present in the Pre-Change Distribution. The Annals of Statistics 34, 1 (2006), 92–122.
[56] George V. Moustakides. 1986. Optimal Stopping Times for Detecting Changes in Distributions. The Annals of Statistics 14, 4 (1986), 1379–1387.
[57] Aleksandar Nikolov. 2015. An Improved Private Mechanism for Small Databases. In Automata, Languages, and Programming - 42nd International Colloquium, ICALP. 1010–1021.
[58] Aleksandar Nikolov, Kunal Talwar, and Li Zhang. 2016. The Geometry of Differential Privacy: The Small Database and Approximate Cases. SIAM J. Comput. 45, 2 (2016), 575–616. https://doi.org/10.1137/130938943
[59] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. 2007. Smooth sensitivity and sampling in private data analysis. In Proceedings of the 39th annual ACM Symposium on Theory of Computing, STOC. 75–84.
[60] Ewan S. Page. 1954. Continuous Inspection Schemes. Biometrika 41, 1/2 (1954), 100–115.
[61] Ewan S. Page. 1955. A Test for a Change in a Parameter Occurring at an Unknown Point. Biometrika 42, 3/4 (1955), 523–527.
[62] Moshe Pollak. 1985. Optimal Detection of a Change in Distribution. The Annals of Statistics 13, 1 (1985), 206–227.
[63] Moshe Pollak. 1987. Average Run Lengths of an Optimal Method of Detecting a Change in Distribution. The Annals of Statistics 15, 2 (1987), 749–779.
[64] Ryan Rogers, Aaron Roth, Adam Smith, and Om Thakkar. 2016. Max-information, differential privacy, and post-selection hypothesis testing. In IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS. 487–494.
[65] Daniel Russo and James Zou. 2016. Controlling bias in adaptive data analysis using information theory. In Artificial Intelligence and Statistics, AISTATS. 1232–1240.
[66] Or Sheffet. 2018. Locally Private Hypothesis Testing. In Proceedings of the 35th International Conference on Machine Learning (ICML '18). JMLR, Inc., 4605–4614.
[67] Walter Andrew Shewhart. 1931. Economic Control of Quality of Manufactured Product. ASQ Quality Press.
[68] Albert N. Shiryaev. 1963. On Optimum Methods in Quickest Detection Problems. Theory of Probability & Its Applications 8, 1 (1963), 22–46.
[69] Adam Smith. 2011. Privacy-Preserving Statistical Estimation with Optimal Convergence Rates. In Proceedings of the 43rd Annual ACM Symposium on the Theory of Computing (STOC '11). ACM, New York, NY, USA, 813–822.
[70] Social Science One. 2018. SocialScienceOne. https://socialscience.one/. (2018).
[71] Marika Swanberg, Ira Globus-Harris, Iris Griffith, Anna Ritz, Adam Groce, and Andrew Bray. 2019. Improved Differentially Private Analysis of Variance. Proceedings on Privacy Enhancing Technologies 2019, 3 (2019).
[72] Caroline Uhler, Aleksandra Slavković, and Stephen E. Fienberg. 2013. Privacy-preserving Data Sharing for Genome-wide Association Studies. The Journal of Privacy and Confidentiality 5, 1 (2013), 137–166.
[73] Venugopal V. Veeravalli and Taposh Banerjee. 2014. Quickest Change Detection. Academic Press Library in Signal Processing 3 (2014), 209–255.
[74] Duy Vu and Aleksandra Slavković. 2009. Differential Privacy for Clinical Trial Data: Preliminary Evaluations. In 2009 IEEE International Conference on Data Mining Workshops (ICDMW '09). IEEE, 138–143.
[75] Yue Wang, Daniel Kifer, Jaewoo Lee, and Vishesh Karwa. 2018. Statistical Approximating Distributions Under Differential Privacy. The Journal of Privacy and Confidentiality 8, 1 (2018), 1–33.
[76] Yue Wang, Jaewoo Lee, and Daniel Kifer. 2015. Revisiting Differentially Private Hypothesis Tests for Categorical Data. arXiv preprint arXiv:1511.03376 (2015).
[77] Yu-Xiang Wang, Jing Lei, and Stephen E. Fienberg. 2016. A Minimax Theory for Adaptive Data Analysis. CoRR abs/1602.04287 (2016).
[78] Aolin Xu and Maxim Raginsky. 2017. Information-Theoretic Analysis of Generalization Capability of Learning Algorithms. In Advances in Neural Information Processing Systems 30 (NIPS '17). Curran Associates, Inc., 2524–2533.
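As a concrete reading of the Θ(·) bound in Theorem 4.8, the sketch below (our own illustration, not code from the paper; the function name is ours, and all constants hidden by Θ(·) are set to 1, so it reports growth rates rather than exact sample counts) evaluates the four terms of the sample-complexity expression and identifies which one dominates for given |X|, γ, and ε:

```python
import math

def gof_sample_complexity_terms(domain_size, gamma, eps):
    """Evaluate the four terms of the Theorem 4.8 bound
    n = Theta(|X|^{1/2}/gamma^2 + |X|^{1/2}/(gamma eps^{1/2})
              + |X|^{1/3}/(gamma^{4/3} eps^{2/3}) + 1/(gamma eps)),
    with hidden constants set to 1 (growth rates only)."""
    k = domain_size
    return {
        # non-private statistical cost of goodness-of-fit testing
        "sqrt(k)/gamma^2": math.sqrt(k) / gamma ** 2,
        # privacy costs, shrinking as eps grows
        "sqrt(k)/(gamma sqrt(eps))": math.sqrt(k) / (gamma * math.sqrt(eps)),
        "k^(1/3)/(gamma^(4/3) eps^(2/3))": k ** (1 / 3) / (gamma ** (4 / 3) * eps ** (2 / 3)),
        # domain-independent privacy cost
        "1/(gamma eps)": 1 / (gamma * eps),
    }

# Example: large domain, moderate privacy -- the non-private term dominates.
terms = gof_sample_complexity_terms(domain_size=10**6, gamma=0.1, eps=1.0)
dominant = max(terms, key=terms.get)
```

Theorem 4.9's accuracy bound α is the same four-term expression multiplied by log(1/β); for small ε (strict privacy) the ε-dependent terms take over instead.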
