Sublinear-Time Computation in the Presence of An Online Adversary
Iden Kalemaj∗ Sofya Raskhodnikova† Nithin Varma‡ June 30, 2021
Abstract We initiate the study of sublinear-time algorithms that access their input via an online adversarial oracle. After answering each query to the input object, such an oracle can erase or corrupt a small number of input values. Our goal is to understand the complexity of basic computational tasks in extremely adversarial situations, where the algorithm’s access to data is blocked—or fraudulent values are introduced—during the execution of the algorithm in response to its actions. Specifically, we focus on property testing in the presence of an online oracle that makes t erasures adversarially after answering each query of the algorithm. We show that two fundamental properties of functions, linearity and quadraticity, can be tested for constant t with asymptotically the same complexity as in the standard property testing model. In contrast, some other properties, including sortedness and the Lipschitz property of sequences, cannot be tested at all, even for t = 1. Our investigation leads to a deeper understanding of the structure of violations to these widely studied properties.
∗Department of Computer Science, Boston University. Email: [email protected]. This work was supported by NSF award CCF-1909612 and Boston University’s Dean’s Fellowship. †Department of Computer Science, Boston University. Email: [email protected]. This work was supported by NSF award CCF-1909612. ‡Department of Computer Science, University of Haifa. Email: [email protected]. This work was supported by the Israel Science Foundation, grant number 497/17 and the PBC Fellowship for Postdoctoral Fellows by the Israeli Council of Higher Education. 1 Introduction
We initiate the study of sublinear-time algorithms that compute in the presence of an online adversary. After the algorithm gets the answer to each query to the input object, such an adversary can hide or corrupt a small number of input values. It might be helpful to think of the algorithm as trying to detect some fraud and of the adversary as trying to obstruct access to data by hiding or corrupting it in order to make it hard to uncover any evidence. Our goal is to understand the complexity of basic computational tasks in extremely adversarial situations, where the algorithm’s access to data is blocked—or fraudulent values are introduced—during the execution of the algorithm in response to its actions. Specifically, we represent the input object as a function f on an arbitrary finite domain1, which the algorithm can access by querying a point x from the domain and receiving the answer O(x) from an oracle. At the beginning of computation, O(x) = f(x) for all points x in the domain of the function. We parameterize our model by a natural number t that controls the number of function values the adversary can modify after the oracle answers each query. Mathematically, we represent the oracle and the adversary as one entity. However, it might be helpful to think of the oracle as the data holder and of the adversary as the obstructionist. A t-online-erasure oracle can replace values O(x) on up to t points x with a special symbol ⊥, thus erasing them, whereas a t-online-corruption oracle can replace them with arbitrary values in the range. The new values will be used by the oracle to answer future queries to the corresponding points. The locations of erasures and corruptions are unknown to the algorithm. The actions of the oracle can depend on the input, the queries made so far, and even on the publicly known code that the algorithm is running, but not on future coin tosses of the algorithm. We focus on investigating property testing in the presence of online erasures. Some of our results also apply to online corruptions and computational tasks related to property testing, as discussed later. In the property testing model, introduced by [RS96, GGR98] with the goal of formally studying sublinear-time algorithms, a property is represented by a set P (of functions satisfying the desired property). A function f is ε-far from P if f differs from each function g ∈ P on at least an ε fraction of domain points. The goal is to distinguish, with constant probability, functions f ∈ P from functions that are ε-far from P. We call an algorithm a t-online-erasure-resilient ε-tester for property P if, given parameters t ∈ N and ε ∈ (0, 1), and access to an input function f via a t-online-erasure oracle, the algorithm accepts with probability at least 2/3 if f ∈ P and rejects with probability at least 2/3 if f is ε-far from P. We study the query complexity of online-erasure-resilient testing of several fundamental properties. We show that for linearity and quadraticity of functions f : {0, 1}d → {0, 1}, the query complexity of t-online- erasure-resilient testing for constant t is asymptotically the same as in the standard model. A function f(x) is linear if it can be represented as a sum of monomials of the form xi, where x = (x1, . . . , xd) is a vector of d bits; the function is quadratic if it can be represented as a sum of monomials, each of the form xi or xixj. To appreciate the difficulty of testing in the presence of online erasures, consider the case of linearity and t = 1. The celebrated tester for linearity in the standard property testing model was proposed by Blum, Luby, and Rubinfeld [BLR93]. It looks for witnesses of non-linearity that consist of three points x, y, and x ⊕ y satisfying f(x) + f(y) 6= f(x ⊕ y), where addition is mod 2, and ⊕ denotes the bitwise XOR operator. Bellare et al. [BCH+96] show that if f is ε-far from linear, then a triple (x, y, x ⊕ y) is a witness with probability at least ε when x, y ∈ {0, 1}d are chosen uniformly and independently at random. In our model, after x and y are queried, the oracle can erase the value of x⊕y. To address this issue, the tester could query three points x, y, and z, and then uniformly select either x ⊕ z or y ⊕ z. If t = 1, the oracle does not have the erasure budget to erase both x ⊕ z and y ⊕ z, and the tester succeeds in catching a triple of the right form with probability 1/2. However, both triples that the tester could have queried—namely, (x, z, x ⊕ z) and (y, z, y ⊕ z)—have to be witnesses of non-linearity for the tester to succeed in catching a witness. So, we need to understand not only the fraction of triples that are witnesses, but also how they interact. Witnesses of non-quadraticity are even more complicated. The tester of Alon et al. [AKK+05] looks for witnesses consisting of points x, y, z, and all four of their linear combinations. We describe a two-player game that models the interaction between the tester and the adversary and give a winning strategy for the tester-
1Input objects such as strings, sequences, images, matrices, and graphs can all be represented as functions.
1 player. We also consider witness structures in which all specified tuples are witnesses of non-quadraticity. We analyze the probability of getting a witness structure under uniform sampling when the input function is ε-far from quadratic. Our investigation leads to a deeper understanding of the structure of witnesses for both properties, linearity and quadraticity. In contrast to linearity and quadraticity, we show that several other properties, specifically, sortedness and the Lipschitz property of sequences, and the Lipschitz property of functions f : {0, 1}d → {0, 1, 2} cannot be tested in the presence of an online-erasure oracle, even with t = 1, no matter how many queries the algorithm makes. Interestingly, witnesses for these properties have a much simpler structure than witnesses for linearity and quadraticity. Consider the case of sortedness of integer sequences, represented by functions f :[n] → N. We say that a sequence is sorted (or the corresponding function is monotone) if f(x) ≤ f(y) for all x < y in [n]. A witness of non-sortedness consists of two points x < y, such that f(x) > f(y). In the standard model, sortedness can be ε-tested with an algorithm that queries an O(pn/ε) + log εn uniform and independent points [FLN 02]. (The fastest testers for this property have O( ε ) query complexity [EKK+00, DGL+99, BGJ+12, CS13, Bel18], but they make correlated queries that follow a more complicated distribution.) Our impossibility result demonstrates that even the simplest testing strategy of querying independent points can be thwarted by an online adversary. To prove this result, we use sequences that are far from being sorted, but where each point is involved in only one witness, allowing the oracle to erase the second point of the witness as soon as the first one is queried. Using a version of Yao’s principle that is suitable for our model, we turn these examples into a general impossibility result for testing sortedness with a 1-online-erasure oracle. Our impossibility result for testing sortedness uses sequences with many (specifically, n) distinct integers. We show that this is not a coincidence by designing a t-online-erasure-resilient sortedness tester that works ε2n for sequences that have O( t ) distinct values. However, the number of distinct values does not have to be large to preclude testing the Lipschitz property in our model. A function f :[n] → N, representing an n-integer sequence, is Lipschitz if |f(x)−f(y)| ≤ |x−y| for all x, y ∈ [n]. Similarly, a function f : {0, 1}d → R d is Lipschitz if |f(x)−f(y)| ≤ kx−yk1 for all x, y ∈ {0, 1} . We show that the Lipschitz property of sequences, as well as d-variate functions, cannot be tested even when the range has size 3, even with t = 1, no matter how many queries the algorithm makes.
Implications for online corruptions. To conclude, we discuss implications of our algorithmic results to computation in the presence of online corruptions. These implications rely on the fact that all three of our testers have 1-sided error, that is, they always accept functions with the property, and that they are nonadaptive, that is, their queries do not depend on answers to previous queries. We consider two types of computational tasks in the presence of online corruptions. The first one is t-online-corruption-resilient testing, defined analogously to t-online-erasure-resilient testing. We show that every nonadaptive, 1-sided error, online-erasure-resilient tester that has a small chance of encountering an erasure (under all adversarial strategies) is also a nonadaptive (2-sided error) online-corruption-resilient tester for the same property, with the same guarantees. This result applies to our testers for linearity and for sortedness with few distinct values. For the second type of computational task, we return to the motivation of the algorithm looking for evidence of fraud. We show that every nonadaptive, 1-sided error, online-erasure-resilient tester for any property P can be modified so that, given access to an input function f that is far from P via an online-corruption oracle, it outputs a witness of f not being in P. This result applies to our testers for sortedness with few distinct values, linearity, and quadraticity.
1.1 Our Results We design t-online-erasure-resilient testers for linearity and quadraticity, two properties widely studied be- cause of their connection to probabilistically checkable proofs (PCPs), applications to hardness of approxi- mating NP-hard problems, and coding theory.
2 Linearity. Starting from the pioneering work of [BLR93], linearity testing has been investigated, e.g., in [BGLR93, BS94, FGL+96, BCH+96, BGS98, Tre98, ST98, ST00, HW03, BSVW03, Sam07, ST09, SW06, KLX10] (see [RR16] for a survey). Linearity can be ε-tested in the standard property testing model with O(1/ε) queries by the BLR tester. We say that a pair (x, y) violates linearity if f(x) + f(y) 6= f(x ⊕ y). The BLR tester repeatedly selects a uniformly random pair of domain points and rejects if it violates linearity. A tight lower bound on the probability that a uniformly random pair violates linearity was proven by Bellare et al. [BCH+96] and Kaufman et al. [KLX10]. We show that linearity can be ε-tested with O(t/ε) queries with a t-online-erasure oracle. When t is constant, the query complexity is O(1/ε), as in the standard property testing model.
Theorem 1.1. There exist a constant c0 ∈ (0, 1) and a 1-sided error, nonadaptive, t-online-erasure-resilient d d/4 t ε-tester for linearity of functions f : {0, 1} → {0, 1} that works for t ≤ c0 · ε · 2 and makes O( ε ) queries. The main ideas behind the linearity tester and its analysis are described in Section 1.3.
Quadraticity. Quadraticity and, more generally, low-degree testing have been studied, e.g., in [BFL91, BFLS91, GLR+91, FGL+96, FS95, RS96, RS97, AKK+05, AS03, MR08, Mos17, KR06, Sam07, ST09, JPRZ09, BKS+10, HSS13, RS13, DG13]. Low-degree testing is closely related to local testing of Reed- Muller codes. The Reed-Muller code C(k, d) consists of evaluations of all functions f : {0, 1}d → {0, 1} that are polynomials of degree at most k. A local tester for a code queries a few locations of a codeword; it accepts if the codeword is in the code, and otherwise it rejects with probability proportional to the distance of the codeword from the code. In the standard property testing model, quadraticity can be ε-tested with O(1/ε) queries by the tester of Alon et al. [AKK+05] that repeatedly selects x, y, z ∼ {0, 1}d and queries f on all of their linear combinations—the points themselves, the double sums x ⊕ y, x ⊕ z, y ⊕ z, and the triple sum x ⊕ y ⊕ z. The tester rejects if the values of the function on all seven queried points sum to 1, since this cannot happen for a quadratic function. A tight lower bound on the probability that the resulting 7-tuple is a witnesses of non-quadraticity was proved by Alon et al. [AKK+05] and Bhattacharyya et al. [BKS+10]. We prove that quadraticity can be ε-tested with O(1/ε) queries with a t-online-erasure-oracle for con- stant t. Our tester can be easily modified to give a local tester for the Reed-Muller code C(2, d) that works with a t-online-erasure oracle. Theorem 1.2. There exists a 1-sided error, nonadaptive, t-online-erasure-resilient ε-tester for quadraticity d 1 of functions f : {0, 1} → {0, 1} that makes O( ε ) queries for constant t. The quadraticity tester and its analysis are the most technical of our contributions. The main ideas behind our quadraticity tester are explained in Section 1.3.
Sortedness. Sortedness testing (see [Ras16] for a survey) was introduced by Ergun et al. [EKK+00]. Its log(εn) + query complexity has been pinned down to Θ( ε ) by [EKK 00, Fis04, CS14, Bel18]. We show that online-erasure-resilient testing of integer-valued sequences is, in general, impossible.
1 Theorem 1.3. For all ε ∈ (0, 12 ], no algorithm can ε-test sortedness of integer sequences when accessed via the 1-online-erasure oracle. In the case without erasures, sortedness can be tested with O(pn/ε) uniform and independent queries [FLN+02]. Theorem 1.3 implies that a uniform tester for a property does not translate into the existence of an online-erasure-resilient tester, counter to the intuition that testers that make only uniform and independent queries should be less prone to adversarial attacks. Our lower bound construction demonstrates that the structure of violations to a property plays an important role in determining whether the property is testable. We prove Theorem 1.3 using Yao’s minimax principle, a common method for showing lower bounds in property testing. A formulation of Yao’s minimax principle suitable for our model is shown in Section A. The hard sequences from the proof of Theorem 1.3 have n distinct values. Pallavoor et al. [PRV18, Pal20] considered the setting when the tester is given an additional parameter r, the number of distinct elements
3 log r in the sequence, and obtained an O( ε )-query tester. Two lower bounds apply to this setting: Ω(log r) for nonadaptive testers [BRY14b] and Ω( log r ) for all testers for the case when r = n1/3 [Bel18]. Pallavoor et log log r √ al. also showed that sortedness can be tested with O( r/ε) uniform and independent queries. We extend the result of Pallavoor et al. to the setting with online erasures for the case when r is small.
Theorem 1.4. Let c0 > 0 be a constant. There exists a 1-sided error, nonadaptive, t-online-erasure-resilient√ r ε-tester for sortedness of n-element sequences with at most r distinct values. The tester makes O( ε ) uniform 2 and independent queries and works when r < ε n . c0t Thus, sortedness is not testable with online erasures when r is large and is testable in the setting when r is small. For example, for Boolean sequences, it is testable with O(1/ε) queries.
The Lipschitz property. Lipschitz testing, introduced by [JR13], was subsequently studied in [CS13, DJRT13, BRY14a, AJMR16, CDJS17]. Lipschitz testing of functions f :[n] → {0, 1, 2} can be performed 1 d d with O( ε ) queries [JR13]. For functions f : {0, 1} → R, it can be done with O( ε ) queries [JR13, CS13]. We show that the Lipschitz property is impossible to test in the online-erasures model even when the range of the function has only 3 distinct values. This applies to both domains, [n] and {0, 1}d.
1 Theorem 1.5. For all ε ∈ (0, 8 ], there is no 1-online-erasure-resilient ε-tester for the Lipschitz property of functions f :[n] → {0, 1, 2}. The same statement holds when the domain is {0, 1}d instead of [n].
Computation in the Presence of Online Corruptions Finally, we state the implications of our algorithmic results to computation in the presence of online corrup- tions. These results are proved in Section 6. Lemma 1.6 states that testers that work in the presence of online erasures and satisfy some additional conditions also work in the presence of online corruptions. Lemma 1.6 uses the notion of an adversarial strategy. We model an adversarial strategy as a distribution on decision trees, where each tree dictates the erasures to be made for a given input and queries of the algorithm. Lemma 1.6. Let P be a property of functions that has a 1-sided error, nonadaptive, t-online-erasure-resilient tester T , such that, for all adversarial strategies, the probability that T queries an erased value of f during its execution is at most 1/3. Then T is a 2-sided error, nonadaptive, t-online-corruption-resilient tester for P. Our online-erasure-resilient testers for linearity and for sortedness with few distinct values have a small probability of encountering an erasure during their computation. As a result, Lemma 1.6 implies online- corruption-resilient testers for these properties. Their performance is summarized in Corollaries 6.1 and 6.2. Our final lemma states the implications of our algorithmic results for the computational task of finding witnesses in the presence of online corruptions. Lemma 1.7. Let P be a property of functions that has a 1-sided error, nonadaptive, t-online-erasure- resilient tester. Then there exists a nonadaptive algorithm with the same query complexity that, given a parameter ε and access via a t-online-corruption-oracle O to a function f that is ε-far from P, outputs with probability at least 2/3 a witness of f∈ / P. The witness consists of queried domain points x1, . . . , xk and values O(x1),..., O(xk) obtained by the algorithm, such that no g ∈ P has g(xi) = O(xi) for all i ∈ [k]. All our online-erasure-resilient testers are nonadaptive and have 1-sided error, so Lemma 1.7 implies algorithms for detecting witnesses for linearity, quadraticity, and for sortedness with few distinct values.
1.2 Comparison to Related Models Our model is closely related to (offline) erasure-resilient testing of Dixit et al. [DRTV18] and tolerant testing of Parnas, Ron, and Rubinfeld [PRR06]. These two models can be thought of as versions of property testing in the presence of offline erasures and corruptions, respectively. In the model of Dixit et al., also investigated in [RV18, RRV19, BFLR20, PRW20, LPRV21, NV21], the adversary performs all erasures to the function
4 x1 x1 x2 x1 x2 y x1 x2 y
Figure 1: The stages in the linearity game for t = 1. Each frame shows a move of Player 1 (as a blue point or a blue solid line) in the winning strategy and a response by Player 2 (as a red dashed line). Each red dashed line between points a and b corresponds to Player 2 erasing a ⊕ b. before the execution of the algorithm. An (offline) erasure-resilient tester is given a parameter α ∈ (0, 1), an upper bound on the fraction of the values that are erased. The adversary we consider is more powerful in the sense that it can perform erasures online, during the execution of the tester. However, in some parameter regimes, our oracle cannot perform as many erasures. Importantly, all three properties that we show are impossible to test in our model, are testable in the model of Dixit et al. with essentially the same query complexity as in the standard model [DRTV18]. The tolerant testing model of [PRR06] has received a lot of attention. An algorithm is an (α, ε)-tolerant tester for a property P if, when given oracle access to a function f, the algorithm accepts with probability at least 2/3 if f is α-close to P and rejects with probability at least 2/3 if f is ε-far from P, where 0 ≤ α < ε ≤ 1. When α ε, this model can be thought of as (ε − α)-testing with at most an α fraction of the input adversarially corrupted before the execution of the algorithm. As pointed out in [PRR06], the BLR tester is a tolerant tester of linearity for α significantly smaller than ε. Tolerant testing of linearity with distributional assumptions was studied in [KS09] and tolerant testing of Reed-Muller codes over large alphabets was studied in [GR05]. Tolerant testing of sortedness is closely related to approximating the distance to monotonicity and estimating the longest increasing subsequence. These tasks can be performed with polylogorithmic in n number of queries [PRR06, ACCL07, SS17]. As we showed, sortedness is impossible to test in the presence of online erasures. Thus, sortedness is an example of a property easily testable with offline erasures and corruptions, but not testable at all with online erasures. However, it is open if there are properties that have lower query complexity in the online model than in the offline models, since erasures in our model are budgeted differently.
1.3 The Ideas Behind Our Linearity and Quadraticity Testers One challenge in generalizing the testers of [BLR93] and [AKK+05] to work with an online-erasure oracle is that their queries are correlated. First, we want to ensure that the tester can obtain function values on tuples of the right form—(x, y, x ⊕ y) for linearity and (x, y, z, x ⊕ y, x ⊕ z, y ⊕ z, x ⊕ y ⊕ z) for quadraticity. Then we want to ensure that, if the original function is far from the property, the tester is likely to catch such a tuple that is also a witness to not satisfying the property. Next, we formulate a two-player game2 that abstracts the first task. In the game, the tester-player sees what erasures are made by the oracle-player. This assumption is made to abstract out the most basic challenge and is not used in the algorithms’ analyses.
Low-degree testing as a two-player game. Player 1 represents the tester and Player 2 represents the adversary. The players take turns drawing points and edges connecting them, each in their own color. For the case of linearity, Player 1 wins the game if it draws in blue two points and an edge connecting them. (The points represent x, y ∈ {0, 1}d and the edge is their sum x ⊕ y). A move of Player 1 consists of drawing a blue point or a blue edge connecting two existing non-adjacent points. A move of Player 2 consists of drawing at most t red edges between existing points, thus spoiling the picture for Player 1. (This corresponds to the oracle erasing the corresponding sums). A winning strategy for Player 1 is to draw t + 2 points x1, . . . , xt+1, y in its first t + 2 moves. The points xi can be thought of as alternatives—or decoys—for point x. After y is drawn, Player 2 spoils at most t of
2This game has been tested on real children, and they spent hours playing it.
5 Figure 2: Stages in the quadraticity game for t = 1, played according to the winning strategy for Player 1: connecting the y-decoys from the first tree to x-decoys (frames 1-4); drawing z and connecting it to y-decoys and an x-decoy (frames 5-6), and creation of a blue triangle (frames 7–8). Frame 5 contains edges from z to two structures, each replicating frame 4. We depict only points and edges relevant for subsequent frames.
the potential edges connecting y to other points. Now Player 1 draws a remaining valid edge between y and another point, thus winning the game. Fig. 1 shows the strategy playing out for the case of t = 1. In the case of quadraticity, Player 1 wins the game if it draws in blue all the vertices and edges of a triangle and colors the triangle blue. The vertices represent the points x, y, z ∈ {0, 1}d, the edges are the sums x ⊕ y, x ⊕ z, y ⊕ z, and the triangle is the sum x ⊕ y ⊕ z. A move of Player 1 consists of drawing a point or an edge between two existing non-adjacent points or coloring an uncolored triangle between three existing points (in blue). A move of Player 2 consists of at most t steps; in each step, it can draw a red edge between existing points or color a triangle between three existing points (in red). Our online-erasure-resilient quadraticity tester is based on a winning strategy for Player 1 with tO(t) moves. At a high level, Player 1 first draws many decoys for x. The y-decoys are organized in t + 1 full (t + 1)-ary trees of depth t. The root for tree i is yi, its children are yi,j, where j ∈ [t + 1], etc. We jot the rest of the winning strategy for t = 1 and depict it in Fig. 2. In this case, Player 1 does the following (i) (i) (i) (i) for each of two trees: it draws points x1 , . . . , x12 , yi; connects yi to half of x-decoys (w.l.o.g., x1 , . . . , x6 ); (i) (i) draws point yi,1, connects it to two of the x-decoys adjacent to y1 (w.l.o.g., x1 and x2 ); draws point yi,2, (i) (i) (i) (i) connects it to two of x3 , . . . , x6 (w.l.o.g., x3 and x4 ); draws z and connects it to one of the roots (w.l.o.g., (1) (1) (1) y1), connects z to one of y1,1 and y1,2 (w.l.o.g., y1,1), connects z to one of x1 and x2 (w.l.o.g., x1 ), and (1) (1) finally colors one of the triangles x1 y1z and x1 y1,1z, thus winning the game. The decoys are arranged to guarantee that Player 1 always has at least one available move in each step of the strategy. For general t, the winning strategy is described in full detail in Algorithm 2. Recall that the y-decoys are organized in t + 1 full (t + 1)-ary trees of depth t. For every root-to-leaf path in every tree, Player 1 draws edges from all the nodes in that path to a separate set of t + 1 decoys for x. After z is drawn, the tester “walks” along a root-to-leaf path in one of the trees, drawing edges between z and the y-decoys on the path. The goal of this walk is to avoid the parts of the tree spoiled by Player 2. Finally, Player 1 connects z to an x-decoy that is adjacent to all vertices in the path, and then colors a triangle involving this x-decoy, a y-decoy from the chosen path, and z. The structure of decoys guarantees that Player 1 always has t + 1 options for its next move, only t of which can be spoiled by Player 1.
From the game to a tester. There are two important aspects of designing a tester that are abstracted away in the game: First, the tester does not actually know which values are erased until it queries them. Second, the tester needs to catch a witness demonstrating a violation of the property, not merely a tuple of the right form with no erasures. Here, we briefly describe how we overcome these challenges. The simplest tester that can be constructed based on the winning strategy for the linearity game repeats
6 d the following procedure. It follows the winning strategy, whereby it queries a random point xi ∼ {0, 1} in lieu of drawing xi, picks a random i ∼ [t + 1], and then queries the sum xi ⊕ y in lieu of drawing an edge 1 (xi, y). The probability that this sum is not erased by the adversary is at least t+1 , since the adversary only has budget for erasing t of the sums. To get a constant probability of successfully obtaining a triple of the form (x, y, x ⊕ y), this procedure has to be repeated Θ(t) times. Since the procedure makes Θ(t) queries, it would result in a tester with query complexity Ω(t2). To get a linear dependence on t, our tester first t queries O( ε ) points to build a reserve, and only then starts querying sums x ⊕ y, where x and y are sampled from the reserve. In some sense, it can be viewed as combining multiple iterations of the winning strategy (without explicitly dividing the points into x’s and y’s). This addresses the first challenge described above. The second challenge is addressed by a structural result (Lemma 2.1) on the concentration of violated pairs among the queried points in the reserve. Our quadraticity tester is based directly on the game. It converts the moves of the winning strategy of Player 1 into a corresponding procedure, making a uniformly random guess at each step about the choices that remain nonerased. There are three core technical lemmas used in the analysis of the algorithm. Lemma 3.4 lower bounds the probability that the tester makes correct guesses at each step about which edges (double sums) and triangles (triple sums) remain nonerased. This probability depends only on the erasure budget t. Lemma 3.3 gives a lower bound on the probability that uniformly random sampled points (the x- and y- decoys together with z) form a large violation structure, where all triangles that Player 1 might eventually complete violate quadraticity. Building on a result of Alon et al., we show that even though the number of triangles involved in the violation structure is large, namely tO(t), the probability of sampling such a structure is α = Ω(min(ε, ct)), where ct ∈ (0, 1) depends only on t. Finally, Lemma 3.2 shows that despite the online adversarial erasures, the tester has a probability of α/2 of sampling the points of such a large violation structure and obtaining their values from the oracle. The three lemmas combined show that quadraticity can be tested with O(1/ε) queries for constant t.
1.4 Conclusions and Open Questions We initiate a study of sublinear-time algorithms in the presence of online adversarial erasures and corruptions. We design efficient online-erasure-resilient testers for several important properties (linearity, quadraticity, and—for the case of small number of distinct values—sortedness). We also show that several basic properties, specifically, sortedness of integer sequences and the Lipschitz properties, cannot be tested in our model. We now list several open problems. • Sortedness is an example of a property that is impossible to test with online erasures, but is easy to test with offline erasures, as well as tolerantly. Is there a property that has smaller query complexity in the online-erasure-resilient model than in the (offline) erasure-resilient model of [DRTV18]? • We design a t-online-erasure-resilient quadraticity tester that makes O(1/ε) queries for constant t. What is the query complexity of t-online-erasure-resilient quadraticity testing in terms of t and ε?
1 k • The query complexity of ε-testing if a function is a polynomial of degree at most k is Θ( ε + 2 ) [AKK+05, BKS+10]. Is there a low-degree test for k ≥ 3 that works in the presence of online erasures?
• Our tester for linearity works in the presence of online corruptions, but our tester for quadraticity does not. Is there an online-corruption-resilient quadraticity tester?
2 An Online-Erasure-Resilient Linearity Tester
In this section, we present our online-erasure-resilient linearity tester (Algorithm 1) and prove Theorem 1.1. In the analysis of Algorithm 1, we let xi, for all i ∈ [q], be random variables denoting the points queried in Step 3. Recall that a pair (x, y) of domain points violates linearity if f(x) + f(y) 6= f(x ⊕ y). The proof of Theorem 1.1 crucially relies on the following lemma about the concentration of pairs violating linearity.
7 Algorithm 1 An Online-Erasure-Resilient Linearity Tester 1 d Input: ε ∈ (0, 2 ), erasure budget t ∈ N, access to function f : {0, 1} → {0, 1} via a t-online-erasure oracle. ct 1: Let c = 88 and q = ε . 2: for all i ∈ [q]: d 3: Sample xi ∼ {0, 1} and query f at xi. 24 4: repeat ε times: 5: Sample a uniform (i, j) such that i, j ∈ [q] and i < j, then query f at xi ⊕ xj. 6: Reject if xi, xj, and xi ⊕ xj are nonerased and f(xi) + f(xj) 6= f(xi ⊕ xj). 7: Accept.
ct 2 Lemma 2.1. Suppose f is ε-far from linear and q = ε . Let Y denote the number of pairs (i, j) ∈ [q] such ε q that i < j and (xi, xj) violates linearity. Then Pr[Y ≤ 4 · 2 ] ≤ 0.25. We prove Lemma 2.1 in Section 2.1. In the rest of this section, we first prove an auxiliary lemma (Lemma 2.2) and then use Lemmas 2.1 and 2.2 to prove Theorem 1.1. We say a query u ∈ {0, 1}d is successfully obtained if it is nonerased right before it is queried, i.e., the tester obtains f(u) as opposed to the symbol ⊥. q Lemma 2.2. Let G1 be the (good)event that all 2 sums xi ⊕ xj of points queried in Step 3 of Algorithm 1 are distinct, and G2 be the (good) event that all q points xi queried in Step 3 are successfully obtained. Then Pr[G1 ∪ G2] ≤ 0.05 for all adversarial strategies. d Proof. First, we analyze event G1. Consider points xi, xj, xk, x` ∼ {0, 1} , where {i, j}= 6 {k, `}. Since these points are distributed uniformly and independently, so are the sums xi ⊕ xj and xk ⊕ x`. The probability d 1 that two uniformly and independently sampled points x, y ∼ {0, 1} are identical is 2d . Therefore, by a q2 1 q4 union bound over all pairs of sums, Pr[G1] ≤ 2 · 2d ≤ 4·2d . To analyze G2, fix an arbitrary adversarial strategy. In Steps 2–3, the algorithm makes at most q queries. d d Hence the oracle erases at most qt points from {0, 1} . Since each point xi is sampled uniformly from {0, 1} , qt Pr [xi is erased when queried] ≤ . d d xi∼{0,1} 2
q2t ct By a union bound over the q points sampled in Step 3, we get Pr[G2] ≤ 2d . We substitute q = ε to get q4 q2t q4 c4t4 Pr[G ∪ G ] ≤ + ≤ = ≤ c4c4 ≤ 0.05, 1 2 4 · 2d 2d 2d ε4 · 2d 0 d/4 since t ≤ c0 · ε · 2 , as stated in Theorem 1.1, and assuming c0 is sufficiently small. Proof of Theorem 1.1. The query complexity of Algorithm 1 is evident from its description. Clearly, Algo- rithm 1 is nonadaptive and accepts if f is linear. Suppose now that f is ε-far from linear and fix an arbitrary 2 adversarial strategy. We show that Algorithm 1 rejects with probability at least 3 . We call a sum xi ⊕ xj queried in Step 5 violating if the pair (xi, xj) violates linearity. For Algorithm 1 to reject, it must sample i, j ∈ [q] such that xi, xj, and xi ⊕ xj are successfully obtained and xi ⊕ xj is a violating sum. Let Ek, for all k ∈ [24/ε], be the event that in iteration k of Step 4, the algorithm successfully S obtains a violating sum. Let E = k∈[24/ε] Ek. We define the following good event concerning the execution of the for loop: G = G1 ∩ G2 ∩ G3, where G1 and G2 are as defined in Lemma 2.2, and G3 is the event ε q that Y , defined in Lemma 2.1, is at least 4 · 2 . By a union bound and Lemmas 2.1 and 2.2, we know that Pr[G] ≥ 0.7. Therefore, the probability that Algorithm 1 rejects is at least Pr[E ∩ G] = Pr[G] · Pr[E|G] ≥ 0.7 · Pr[E|G].
Suppose that G occurs for the execution of the for loop. Since G2 occurred, all q points queried in Step 3 of Algorithm 1 were successfully obtained. So, the algorithm will reject as soon as it successfully obtains a
8 q violating sum. Since G1 occurred, there are at least 2 distinct sums that can be queried in Step 5. Since ε q G3 occurred, at least 4 · 2 of them are violating. The total number of erasures the adversary makes over the course of the entire execution is at most 2qt, since Algorithm 1 queries at most 2q points. Therefore, the fraction of sums that are nonerased and violating before each iteration of Step 5 is at least ε 2qt ε 5t ε 5ε ε 5ε ε − q ≥ − = − = − > . 4 2 4 q 4 c 4 88 6
That is, Pr[Ek|G] > ε/6 for all k ∈ [24/ε]. Observe that the indicator for the event Ek, for every fixed setting of random coins (of the algorithm and the adversary) used prior to iteration k of Step 5, is a Bernoulli random variable with the probability parameter equal to the fraction of sums that are violating and nonerased right before iteration k. Since independent random coins are used in each iteration of Step 5,
\ 24/ε −4 Pr[E|G] ≤ Pr Ek | G < (1 − ε/6) ≤ e . k∈[24/ε]
−4 This yields that Pr[E ∩ G] > 0.7 · (1 − e ) > 2/3, completing the proof of the theorem.
2.1 Concentration of Pairs that Violated Linearity In this section, we prove Lemma 2.1 on the concentration of pairs that violate linearity. The proof relies on Claim 2.3 that gives upper bounds on the fraction of pairs (x, y) ∈ ({0, 1}d)2 that violate linearity and on the fraction of triples (x, y, z) ∈ {0, 1}d×3 such that (x, y) and (x, z) violate linearity. The upper bounds are stated as a function of the distance to linearity. The distance of a function f to linearity, denoted εf , is the minimum over all linear functions g over the same domain as f of Prx[f(x) 6= g(x)]. We remark that a tighter upper bound than (1) in Claim 2.3 is shown in [BCH+96, Lemma 3.2]. Since we only need a looser upper bound, we provide a proof for completeness. Claim 2.3. For all f : {0, 1}d → {0, 1}, the following hold:
Pr [(x, y) violates linearity] ≤ 3εf ; (1) x,y∼{0,1}d 2 Pr [(x, y) and (x, z) violate linearity] ≤ εf + 4εf . (2) x,y,z∼{0,1}d
Proof. Let g be the closest linear function to f. Let S ⊆ {0, 1}d be the set of points on which f and g differ, d d i.e., S = {x ∈ {0, 1} | f(x) 6= g(x)}. Then |S| = εf · 2 . For a pair (x, y) to violate linearity, at least one of x, y, and x ⊕ y must be in S. By a union bound,
Pr [(x, y) violates linearity for f] ≤ Pr [x ∈ S ∨ y ∈ S ∨ x ⊕ y ∈ S] ≤ 3 Pr [x ∈ S] = 3εf , x,y∼{0,1}d x,y∼{0,1}d x∼{0,1}d
completing the proof of (1). To prove (2), let Ex,y,z be the event that (x, y) and (x, z) violate linearity. By the Law of Total Probability,
Pr [Ex,y,z] = Pr[Ex,y,z | x ∈ S] Pr[x ∈ S] + Pr[Ex,y,z | x∈ / S] Pr[x∈ / S] x,y,z∼{0,1}d
≤ 1 · εf + Pr[Ex,y,z | x∈ / S]
≤ εf + Pr[(y, z ∈ S) ∨ (x ⊕ y, z ∈ S) ∨ (y, x ⊕ z ∈ S) ∨ (x ⊕ y, x ⊕ z ∈ S) | x∈ / S] 2 ≤ εf + 4 Pr[y, z ∈ S | x∈ / S] = εf + 4εf ,
where, in the last line, we used symmetry and a union bound, and then independence of x, y, and z. Proof of Lemma 2.1. We prove the lemma by applying Chebyshev’s inequality to Y . For all i, j ∈ [q], where P i < j, let Yi,j be the indicator for the event that (xi, xj) violates linearity. Then Y = i 9 First, we obtain a lower bound on E[Y ]. By linearity of expectation and symmetry, X q q [Y ] = [Yi,j] = · Pr [(xi, xj) violates linearity] ≥ εf , E E d 2 xi,xj ∼{0,1} 2 i X q Var[Y ] ≤ 3 ε . (3) i,j 2 f i When {i, j} ∩ {k, `} = ∅, the random variables Yi,j and Yk,` are independent, and thus Cov[Yi,jYk,l] = 0. Consider the case when |{i, j} ∩ {k, `}| = 1. Suppose that i = k. Then 2 2 2 Cov[Yi,jYi,`] = E[Yi,jYi,`] − E[Yi,j]E[Yi,`] ≤ εf + 4εf − εf = εf + 3εf , where the inequality follows from (2) and [BCH+96, Theorem 1.2]. By symmetry, X X q q Cov[Y Y ] = 6 Cov[Y Y ] ≤ 6 (ε + 3ε2 ) ≤ 15 ε , (4) i,j k,l i,j i,l 3 f f 3 f i