A Zero Knowledge Sumcheck and its Applications
Alessandro Chiesa Michael A. Forbes Nicholas Spooner [email protected] [email protected] [email protected] UC Berkeley Simons Institute for the Theory of Computing University of Toronto and UC Berkeley
April 6, 2017
Abstract Many seminal results in Interactive Proofs (IPs) use algebraic techniques based on low-degree polynomials, the study of which is pervasive in theoretical computer science. Unfortunately, known methods for endowing such proofs with zero knowledge guarantees do not retain this rich algebraic structure. In this work, we develop algebraic techniques for obtaining zero knowledge variants of proof protocols in a way that leverages and preserves their algebraic structure. Our constructions achieve unconditional (perfect) zero knowledge in the Interactive Probabilistically Checkable Proof (IPCP) model of Kalai and Raz [KR08] (the prover first sends a PCP oracle, then the prover and verifier engage in an Interactive Proof in which the verifier may query the PCP). Our main result is a zero knowledge variant of the sumcheck protocol [LFKN92] in the IPCP model. The sumcheck protocol is a key building block in many IPs, including the protocol for polynomial-space computation due to Shamir [Sha92], and the protocol for parallel computation due to Goldwasser, Kalai, and Rothblum [GKR15]. A core component of our result is an algebraic commitment scheme, whose hiding property is guaranteed by algebraic query complexity lower bounds [AW09; JKRS09]. This commitment scheme can then be used to considerably strengthen our previous work [BCFGRS16] that gives a sumcheck protocol with much weaker zero knowledge guarantees, itself using algebraic techniques based on algorithms for polynomial identity testing [RS05; BW04]. We demonstrate the applicability of our techniques by deriving zero knowledge variants of well-known protocols based on algebraic techniques. First, we construct zero knowledge IPCPs for NEXP starting with the Multi-prover Interactive Proofs of Babai, Fortnow, and Lund [BFL91]. This result is a direct application of our zero knowledge sumcheck and our algebraic commitment scheme, augmented with the use of ‘randomized’ low-degree extensions. We also construct protocols in a more restricted model where the prover and verifier engage in a standard Interactive Proof with oracle access to a uniformly random low-degree polynomial (soundness holds with respect to any oracle). In this setting we achieve zero knowledge variants of the protocols of Shamir and of Goldwasser, Kalai, and Rothblum.
Keywords: zero knowledge; sumcheck; algebraic query complexity; probabilistically checkable and interactive proofs
1 Contents
1 Introduction 3 1.1 Prior techniques for achieving zero knowledge ...... 3 1.2 Our goal: algebraic techniques for zero knowledge ...... 4 1.3 Main result: a zero knowledge sumcheck ...... 4 1.4 Applications: delegating computation in zero knowledge ...... 6
2 Techniques 8 2.1 An algebraic approach for zero knowledge in the BFL protocol ...... 8 2.2 Algebraic commitments from algebraic query complexity lower bounds ...... 9 2.3 A zero knowledge sumcheck protocol ...... 11 2.4 Challenges: handling recursion ...... 12 2.5 Sum-product circuits ...... 13
3 Roadmap 15
4 Preliminaries 16 4.1 Basic notations ...... 16 4.2 Sampling partial sums of random low-degree polynomials ...... 16 4.3 Interactive probabilistically checkable proofs ...... 17 4.4 Zero knowledge for Interactive PCPs ...... 17 4.5 Sumcheck protocol and its zero knowledge variant ...... 18
5 Algebraic query complexity of polynomial summation 20
6 Zero knowledge sumcheck from algebraic query lower bounds 22 6.1 Step1 ...... 24 6.2 Step2 ...... 27
7 Zero knowledge for non-deterministic exponential time 28
8 Delegating sum-product computations 31 8.1 Intuition for definition ...... 31 8.2 Sum-product formulas ...... 32 8.3 Sum-product circuits ...... 35
9 Zero knowledge sum-product protocols 39 9.1 The case of sum-product evaluation ...... 39 9.2 The case of sum-product satisfaction ...... 43
10 Zero knowledge for polynomial space 45
11 Zero knowledge for the evaluation of low-depth circuits 48 11.1 Notations for layered arithmetic circuits ...... 49 11.2 Sum-product subcircuits and oracle inputs ...... 50 11.3 Sum-product subcircuits for layered arithmetic circuits ...... 50 11.4 Sum-product subcircuits for small-space Turing machines ...... 51 11.5 Proof of Theorem 11.1 ...... 52
Acknowledgments 54
A Algebraic query complexity of polynomial summation: details 55 A.1 Proof of Theorem 5.1 ...... 55 A.2 Proof of Corollary 5.3 ...... 55 A.3 Upper bounds ...... 56
B Proof of Theorem 7.2 via sum-product circuits 59
References 61
2 1 Introduction
The notion of Interactive Proofs (IPs) [BM88; GMR89] is fundamental in Complexity Theory and Cryptography. An Interactive Proof for a language L is a protocol between a probabilistic polynomial-time verifier and a resource- unbounded prover that works as follows: given a common input x, the prover and verifier exchange some number of messages and then the verifier either accepts or rejects. If x is in L then the verifier always accepts; if instead x is not in L then the verifier rejects with high probability, regardless of the prover’s actions. The seminal results of Lund, Fortnow, Karloff, and Nisan [LFKN92] and Shamir [Sha92] demonstrate the surprising expressiveness of Interactive Proofs, in particular showing that every language decidable in polynomial space has an Interactive Proof. Research on IPs has recently focused on new and more refined goals, motivated by the paradigm of delegation of computation, in which a resource-limited verifier receives the help of a resource-rich prover to check the output of an expensive (but tractable) computation. In this setting bounding the complexity of the honest prover is important. While every IP protocol has a polynomial-space prover, this prover may run in superpolynomial time, even if the protocol is for a tractable language. Recent work has focused on doubly-efficient IPs, where the prover is efficient (it runs in polynomial time) and the verifier is highly efficient (it runs in, say, quasilinear time). Doubly-efficient IPs can be achieved, with various tradeoffs, for many types of computation: languages decidable by uniform circuits of polylogarithmic depth [GKR15]; languages decidable in polynomial time and bounded-polynomial space [RRR16]; and languages decidable by conjunctions of polynomially-many ‘local’ conditions [RG17]. A key building block in all of these protocols is the sumcheck protocol [LFKN92], which is an Interactive Proof for P claims of the form “ ~α∈Hm F (~α) = 0”, where H is a subset of a finite field F and F is an m-variate polynomial over F of small individual degree. The use of sumcheck imbues the aforementioned protocols with an algebraic structure, where the verifier arithmetizes a boolean problem into a statement about low-degree polynomials, which can then be checked via the sumcheck protocol. This algebraic structure is not only elegant, but also very useful. Indeed, this structure is crucial not only for highly-efficient software and hardware systems for delegating computation [CMT12; TRMP12; Tha13; Tha15; WHGSW16; WJBSTWW17] but also for a diverse set of compelling theoretical applications such as memory delegation [CKLR11], refereed delegation [CRR13], IPs of proximity [RVW13], and many others. Despite these subsequent works demonstrating the flexibility of sumcheck to accommodate additional desirable properties, the zero knowledge [GMR89] properties of sumcheck have not been explored. This is surprising because zero knowledge, the ability of the prover to establish the validity of an assertion while revealing no insight into its proof, is highly desirable for the cryptographic applications of Interactive Proofs. Unfortunately, achieving zero knowledge is nontrivial because the sumcheck protocol reveals the results of intermediate computations, in particular the partial P sums ~α∈Hm−i F (c1, . . . , ci, ~α) for c1, . . . , ci ∈ F chosen by the verifier. These partial sums are in general #P-hard to compute so they convey significant additional knowledge to the verifier. The goal of this work is to enlarge the existing algebraic toolkit based on low-degree polynomials and use these tools to provide a native extension of the sumcheck protocol that is zero knowledge, and to explore applications of such an extension. As we discuss shortly, however, we cannot expect to do so within the model of Interactive Proofs.
1.1 Prior techniques for achieving zero knowledge We briefly describe why existing methods fall short of our goal, which is making the sumcheck protocol zero knowledge in an algebraic way. A seminal result in cryptography says that if one-way functions exist then every language having an IP also has a computational zero knowledge IP [GMR89; IY87; BGGHKMR88]; this assumption is ‘minimal’ in the sense that if one-way functions do not exist then computational zero knowledge IPs capture only “average-case” BPP [Ost91; OW93]. While powerful, such results are unsatisfactory from our perspective. First, cryptography adds significant efficiency overheads, especially when used in a non-blackbox way as these results do. Second, the results rely on transformations that erase all the algebraic structure of the underlying protocols. While these limitations can be mitigated by using cryptography that leverages some of the underlying structure [CD98], the costs incurred by the use of cryptography remain significant. Ideally, we wish to avoid intractability assumptions. Unfortunately, this is impossible to achieve under standard complexity assumptions, because Interactive Proofs that are statistical zero knowledge are limited to languages in AM∩coAM [For87; AH91]. Such languages (conjecturally) do not even include NP, so that we cannot even hope to achieve a ‘zero knowledge sumcheck protocol’ (which would give #P).
3 The quest for zero knowledge without relying on intractability assumptions led to the formulation of Multi-prover Interactive Proofs (MIPs) [BGKW88], where the verifier exchanges messages with two or more non-communicating provers. Groundbreaking results establish that MIPs are very powerful: all (and only) languages decidable in non- deterministic exponential time have MIPs [BFL91] and, in fact, even perfect zero knowledge MIPs [BGGHKMR88; DFKNS92]. Similar results hold even for the related model of Probabilistically Checkable Proofs (PCPs) [FRS88; BFLS91; FGLSS96; AS98; ALMSS98], where the prover outputs a proof string that the verifier can check by reading only a few randomly-chosen locations. Namely, all (and only) languages decidable in non-deterministic exponential time have PCPs [BFLS91] and, in fact, even statistical zero knowledge PCPs [KPT97; IMSX15]. However, while information-theoretic, the aforementioned works rely on transformations that, once again, discard the rich algebraic structure of the underlying protocols. Thus, zero knowledge in this setting continues to be out of reach of simple and elegant algebraic techniques.
1.2 Our goal: algebraic techniques for zero knowledge Our goal is to develop information-theoretic techniques for achieving zero knowledge in a way that leverages, and preserves, the algebraic structure of the sumcheck protocol and other protocols that build on it. An additional goal is to preserve the simplicity and elegance of these foundational protocols. Yet, as discussed, we cannot hope to do so with Interactive Proofs, and so we work in another model. We choose to work in a model that combines features of both Interactive Proofs and PCPs: the Interactive PCP (IPCP) model of Kalai and Raz [KR08]. The prover first sends to the verifier a long string as a PCP oracle, after which the prover and verifier engage in an Interactive Proof. The verifier is free at any point to query the PCP oracle at locations of its choice, and the verifier only pays for the number of queries it makes, so that exponentially-large PCP oracles are allowed. Kalai and Raz [KR08] show that the IPCP model has efficiency advantages over both PCPs and IPs (individually). Goyal, Ishai, Mahmoody, and Sahai [GIMS10] construct efficient zero knowledge IPCPs, but their techniques mirror those for zero knowledge PCPs and, in particular, are not algebraic. One can think of the IPCP model as lying somewhere ‘in between’ the IP and MIP models. Indeed, it is equivalent to a (2-prover) MIP where one of the provers is stateless (its answers do not depend on the verifier’s prior messages or queries). This means that soundness is easier to achieve for an IPCP than for an MIP. Zero knowledge, however, is more difficult for an IPCP than for an MIP, because the stateless prover cannot choose which queries it will answer. A significant advantage of the IPCP model over the MIP model is that one can easily compile (public-coin) IPCPs into cryptographic proofs via transformations that preserve zero knowledge, while only making a black-box use of cryptography. For example, using collision-resistant functions one can obtain public-coin interactive arguments by extending ideas of [Kil92; IMSX15]; also, using random oracles one can obtain publicly-verifiable non-interactive arguments via [BCS16] (extending the Fiat–Shamir paradigm [FS86] and Micali’s “CS proofs” [Mic00]). In contrast, known transformations for MIPs yield private-coin arguments [BC12], or do not preserve zero knowledge [KRR14].
1.3 Main result: a zero knowledge sumcheck Our main result is a zero knowledge analogue of the sumcheck protocol [LFKN92], a key building block in many protocols. We now informally state and discuss this result, and in the next sub-section we discuss its applications. P The goal of the sumcheck protocol is to efficiently verify claims of the form “ ~α∈Hm F (~α) = 0”, where H is a subset of a finite field F and F is an m-variate polynomial over F of low individual degree. As the sumcheck protocol is often used in situations where the polynomial F is only implicitly defined (including this paper), it is helpful to adopt the P viewpoint of Meir [Mei13], regarding the sumcheck protocol as a reduction from the summation “ ~α∈Hm F (~α) = 0” to an evaluation “F (~c) = b”; the latter can be checked directly by the verifier or by another protocol. The verifier in this reduction does not need any information about F , aside from knowing that F has small individual degree. The completeness property of the reduction is that if the summation claim is true, then so is the evaluation claim with probability one. Its soundness property is that if the summation claim is false, then so is the evaluation claim with high probability. The theorem below states the existence of sumcheck protocol in the above sense that works in the IPCP model and is zero knowledge, which means that a verifier does not learn any information beyond the fact that F sums to 0 on Hm (and, in our case, a single evaluation of F ). As usual, this means that we establish an efficient procedure for simulating
4 the interaction of a (possibly malicious) verifier with the prover, where the simulator only uses the knowledge of the sum of F on Hm (and a single evaluation of F ) but otherwise has no actual access to the prover (or F ). This interaction is a random variable depending on the randomness of both the verifier and the prover, and we show that the simulated interaction perfectly replicates this random variable. Theorem 1.1 (Informal version of Theorem 6.4). There exists an IPCP for sumcheck with the following zero knowledge guarantee: the view of any probabilistic polynomial-time verifier in the protocol can be perfectly and efficiently simulated by a simulator that makes only a single query to F . Moreover, we do not require the full power of the IPCP model: the honest prover’s PCP consists only of a random multi-variate polynomial over F of small individual degree. Our result significantly strengthens the IPCP for sumcheck of [BCFGRS16] (co-authored by this paper’s authors), which is only zero knowledge with respect to a simulator which requires unrestricted query access to F . Namely, in order to simulate a verifier’s view, their simulator must make a number of queries to F that equals the number of queries to the PCP oracle made by the verifier. This zero knowledge guarantee is weaker than the above because a malicious verifier can make an arbitrarily-large (but polynomial) number of queries to the PCP oracle. However, this weaker guarantee suffices in some cases, such as in the previous work [BCFGRS16]. Perhaps more damaging is that when using this ‘weakly zero knowledge’ sumcheck protocol recursively (as required in applications), we would incur an exponential blowup: each simulated query recursively requires further queries to be simulated. In contrast, the ‘strongly zero knowledge’ sumcheck protocol that we achieve only requires the simulator to make a single query to F regardless of the malicious verifier’s runtime, both providing a stronger zero knowledge guarantee and avoiding any exponential blow-up. An important property of our sumcheck protocol (which also holds for [BCFGRS16]), is that it suffices for the honest prover to send as an oracle a uniformly random polynomial of a certain arity and degree. This brings the result ‘closer to IP’, in the sense that while IPCP captures all of NEXP, only languages in PSPACE have IPCPs (with perfect completeness) where the honest prover behaves in this way. The same property holds for some of our applications. Algebraic commitments. As detailed in Section 2, a key ingredient of our result is a commitment scheme based on algebraic techniques. That is, the prover wishes to commit to a value b ∈ F, and to do so sends a PCP oracle to the verifier. To then reveal (decommit) b, the prover and verifier engage in an Interactive Proof. We show that the sumcheck protocol naturally yields such a commitment scheme, where the PCP oracle is simply a random low-degree polynomial P R such that ~α∈Hm R(~α) = b. The soundness guarantee of the sumcheck protocol shows that this commitment scheme is binding, so that the prover cannot “reveal” a value other than the correct b. To establish the hiding property, which states that the verifier cannot learn anything about b before the prover reveals it, we leverage lower bounds on the algebraic query complexity of polynomial summation, previously studied for completely different reasons [AW09; JKRS09]. As our commitments are themselves defined by low degree polynomials, they are ‘transparent’ to low degree testing. That is, in various protocols the prover sends to the verifier the evaluation table of a low-degree polynomial as a PCP oracle, and the verifier ensures that this evaluation table is (close to) low degree via low degree testing. In our zero knowledge setting, we need the prover to hide the evaluation table under a commitment (to be revealed selectively), and yet still allow the verifier to check that the underlying evaluation table represents a low-degree polynomial. Our commitments naturally have this property due to their algebraic structure, which we exploit in our applications discussed below. Overall, the methods of this paper not only significantly deviate from traditional methods for achieving zero knowledge but also further illustrate the close connection between zero knowledge and Algebraic Complexity Theory, the theory of efficient manipulations of algebraic circuits and low-degree polynomials. This connection was first seen in our prior work developing the ‘weakly zero knowledge sumcheck’ of [BCFGRS16], used here as a subroutine. Indeed, this subroutine derives its zero-knowledge guarantee from an efficient algorithm for adaptively simulating random low-degree polynomials [BCFGRS16; BW04]. This algorithm itself relies on deterministic algorithms for polynomial identity testing of certain restricted classes of algebraic circuits [RS05]. We believe that it is an exciting research direction to further investigate this surprising connection, and to further broaden the set of information-theoretic algebraic techniques that are useful towards zero knowledge.
5 1.4 Applications: delegating computation in zero knowledge The original sumcheck protocol (without zero knowledge) has many applications, including to various methods of delegating computation. Our zero knowledge sumcheck protocol can be used to obtain zero knowledge analogues of foundational results in this area: we achieve natural zero knowledge extensions of the first construction of PCPs/MIPs [BFL91; BFLS91], Shamir’s protocol [Sha92], and doubly-efficient Interactive Proofs for low-depth circuits [GKR15].
1.4.1 Delegating non-deterministic exponential time One of the earliest and most influential applications of the sumcheck protocol is the construction of Multi-prover Interactive Proofs for NEXP due to Babai, Fortnow, and Lund [BFL91]; the same construction also demonstrated the power of low-degree testing as a tool for checking arbitrary computations, another highly influential insight. The subsequent improvements by Babai, Fortnow, Levin, and Szegedy [BFLS91] led to the formulation and the study of Probabilistically-Checkable Proofs [BFLS91; FGLSS96] and then the celebrated PCP Theorem [AS98; ALMSS98]. We show how, by using our zero knowledge sumcheck protocol, we can obtain a zero knowledge analogue of the classical constructions of [BFL91; BFLS91].
Theorem 1.2 (Informal version of Theorem 7.2). NEXP has perfect zero knowledge Interactive PCPs. Our construction extends the protocol of [BFL91; BFLS91], which can be viewed as an IPCP that is later ‘compiled’ into an MIP or a PCP. This protocol reduces the NEXP-complete problem of determining the satisfiability of a ‘succinct’ 3CNF, to testing whether there exists a low-degree polynomial satisfying an (exponentially large) set of constraints. A polynomial satisfying these constraints is necessarily a low-degree extension of a satisfying assignment, and thus implies the existence of such an assignment. The prover sends such a polynomial as an oracle, which the verifier then low-degree tests. That the constraints are satisfied can be checked using the sumcheck protocol. To make this protocol zero knowledge we need to ensure that the oracle hides the original witness. We achieve this by sending not the low-degree extension itself but an algebraic commitment to it. The zero knowledge sumcheck then reduces the problem of checking the constraint on this witness to a single evaluation point of the low degree extension. However, this itself is not zero knowledge as evaluations of the low-degree extension can reveal information about the witness, especially if this evaluation is over the interpolating set Hm. Thus, our construction exploits the fact that the sumcheck protocol works for any low-degree extension of the witness, and not just the one of minimal degree. Thus, the prover will instead send (the commitment to) a randomly sampled extension of the witness of slightly higher (but still constant) individual degree. The evaluations of this polynomial will (outside the interpolating set Hm) be O(1)-wise independent. As the sumcheck reduction will reduce to an evaluation point outside Hm with high probability, the prover can then decommit the evaluation at this point to complete the sumcheck protocol without revealing any non-trivial information. We discuss this construction in more detail in Section 2.1.
1.4.2 Delegating polynomial space The above result shows that a powerful prover can convince a probabilistic polynomial-time verifier of NEXP statements in perfect zero knowledge in the IPCP model. Now we turn our attention to protocols where the honest prover need not be a NEXP machine. One such protocol, due to Shamir [Sha92], provides an Interactive Proof for the PSPACE-complete True Quantified Boolean Formula (TQBF) problem. This protocol is a more sophisticated application of the sumcheck protocol because sumcheck is applied recursively. We aim to obtain zero knowledge analogues for these types of more complex protocols as well but now, to tackle the greater complexity, we proceed in two steps. First, we design a generic framework called sum-product circuits (which we believe to be of independent interest) that can express in a unified way a large class of ‘sumcheck-based Interactive Proofs’, such as Shamir’s protocol. Second, we show how to use our zero knowledge sumcheck protocol to obtain zero knowledge analogues of these, and thus also for Shamir’s protocol. We discuss this further in Section 2.5. As before, the resulting protocols are within the IPCP model. However, a key feature of these protocols that differentiates them from our result for NEXP is that the prover does not need the full power of IPCPs: it suffices for the honest prover to send a PCP oracle that is the evaluation table of a random low-degree polynomial. Of course, soundness will continue to hold against any malicious prover that uses the PCP oracle in arbitrary ways.
6 Theorem 1.3 (Informal version of Theorem 10.2). PSPACE has perfect zero knowledge Interactive PCPs, where the honest prover sends a random low-degree polynomial as the oracle.
As discussed above, any language having an IPCP where the honest prover sends a random low-degree polynomial (and achieves perfect completeness) can be decided in PSPACE. In contrast, in the general case, deciding languages having IPCPs is NEXP-hard. This result shows that moreover, all that is required to achieve unconditional zero knowledge for PSPACE is the ability to send a uniformly random polynomial as an oracle.
1.4.3 Delegating low-depth circuits The doubly-efficient Interactive Proofs for low-depth circuits due to Goldwasser, Kalai, and Rothblum [GKR15] are another landmark result that makes a recursive use of the sumcheck protocol. Their construction can be viewed as a ‘scaled down’ version of Shamir’s protocol where the prover is efficient and the verifier is highly efficient. We obtain a zero knowledge analogue of this protocol; again it suffices for the honest prover to send a random low- degree polynomial as the oracle (and soundness holds against any oracle). We do so by showing how the computation of low-depth circuits can be reduced to a corresponding sum-product circuit, by following the arithmetization in [GKR15]; then we rely on our zero knowledge results for sum-product circuits, mentioned above. The protocol of [GKR15] is an IP for delegating certain tractable computations: the evaluation of log-space uniform NC (circuits of polynomial size and polylogarithmic depth). The prover runs in polynomial time, while the verifier runs in quasilinear time and logarithmic space. But what does achieving zero knowledge mean in this case? If the simulator can run in polynomial time, then it can trivially simulate the verifier’s view by simply running the honest prover. We thus need to consider a fine-grained notion of zero knowledge, by analyzing in more detail the overhead incurred by the simulator with respect to the malicious verifier’s running time. This reckoning is similar to [BRV17], who study zero knowledge for Interactive Proofs of Proximity, and is a relaxation of knowledge tightness [Gol01, Section 4.4.4.2]. Concretely, we show that the running time of our simulator is a fixed (and small) polynomial in the verifier’s running time, with only a polylogarithmic dependence on the size of the circuit. For example, the view of a malicious verifier running in time, say, O(n2) can be simulated in time O˜(n6). If the circuit has size O(n8), then the zero knowledge guarantee is meaningful because the simulator is not able to evaluate the circuit. Theorem 1.4 (Informal version of Theorem 11.1). Log-space uniform NC has perfect zero knowledge Interactive PCPs, where the honest prover sends a random low-degree polynomial as the oracle, and the verifier runs in quasilinear time and logarithmic space. The simulator overhead is a small polynomial: the view of a verifier running in time T can be simulated in time T 3 · polylog(n). An interesting open problem is whether the simulator overhead can be reduced to T · polylog(n), as required in the (quite strict) definition given in [Gol01, Section 4.4.4.2]. It seems that our techniques are unlikely to achieve this because they depend on solving systems of linear equations in Ω(T ) variables. Finally, a property in [GKR15] that has been very useful in subsequent work is that the verifier only needs to query a single point in the low-degree extension of its input. In this case, the verifier runs in polylogarithmic time and logarithmic space. Our zero knowledge analogue retains these properties. Additionally, in this setting the size of the circuit is subexponential in the running time of the verifier. Our zero knowledge guarantee then implies that we obtain zero knowledge under the standard (not fine-grained) definition.
7 2 Techniques
We summarize the techniques underlying our contributions. We begin in Section 2.1 by recalling the protocol of Babai, Fortnow, and Lund, in order to explain its sources of information leakage and how one could prevent them via algebraic techniques. This discussion motivates the goal of an algebraic commitment scheme, described in Section 2.2. Then in Section 2.3 we explain how to use this tool to obtain our main result, a zero knowledge sumcheck protocol. The rest of the section is then dedicated to explaining how to achieve our other applications, which involve achieving zero knowledge for recursive uses of the sumcheck protocol. First we explain in Section 2.4 what are the challenges that arise with regard to zero knowledge in recursive invocations of the sumcheck protocol, such as in the protocol of Shamir. Then in Section 2.5 we describe the framework of sum-product circuits, and the techniques within it that allows us to achieve zero knowledge for the protocols of Shamir and of Goldwasser, Kalai, and Rothblum.
2.1 An algebraic approach for zero knowledge in the BFL protocol We recall the protocol of Babai, Fortnow, and Lund [BFL91] (‘BFL protocol’), in order to explain its sources of information leakage and how one could prevent them via algebraic techniques. These are the ideas that underlie our algebraic construction of an unconditional (perfect) zero knowledge IPCP for NEXP (see Section 1.4.1). The BFL protocol, and why it leaks. The O3SAT problem is the following NEXP-complete problem: given a boolean formula B, does there exist a boolean function A such that
r s B(z, b1, b2, b3,A(b1),A(b2),A(b3)) = 0 for all z ∈ {0, 1} , b1, b2, b3 ∈ {0, 1} ? The BFL protocol constructs an IPCP for O3SAT and later converts it to an MIP. Only the first step is relevant for us. In the BFL protocol, the honest prover first sends a PCP oracle Aˆ: Fs → F that is the unique multilinear extension (in some finite field F) of a valid witness A: {0, 1}s → {0, 1}. The verifier must check that (a) Aˆ is a boolean function on {0, 1}s, and (b) Aˆ’s restriction to {0, 1}s is a valid witness for B. To do these checks, the verifier arithmetizes B into an arithmetic circuit Bˆ, and reduces the checks to conditions that involve Aˆ, Bˆ, and other low-degree polynomials. A technique of [BFLS91] allows the verifier to ‘bundle’ all of these conditions into a low-degree polynomial f such that (with high probability over the choice of f) the conditions hold if and only if f sums to 0 on {0, 1}r+3s+3. The verifier checks that this is the case via a sumcheck protocol with the prover. The soundness of the sumcheck protocol depends on the PCP oracle being the evaluation of a low-degree polynomial; the verifier checks this using a low-degree test. We see that the BFL protocol is not zero knowledge for two reasons: (i) the verifier has oracle access to Aˆ and, in particular, to the witness A; (ii) the prover’s messages during the sumcheck protocol leak further information about A (namely, hard-to-compute partial sums of f, which itself depends on A). A blueprint for zero knowledge. We now describe the ‘blueprint’ for an approach to achieve zero knowledge in the BFL protocol. The prover does not send Aˆ directly but instead a commitment to it. After this, the prover and verifier engage in a sumcheck protocol with suitable zero knowledge guarantees; at the end of this protocol, the verifier needs to evaluate f at a point of its choice, which involves evaluating Aˆ at three points. Now the prover reveals the requested values of Aˆ, without leaking any information beyond these, so that the verifier can perform its check. We explain how these ideas motivate the need for certain algebraic tools, which we later obtain and use to instantiate our approach. (1) Randomized low-degree extension. Even if the prover reveals only three values of Aˆ, these may still leak information about A. We address this problem via a randomized low-degree extension. Indeed, while the prover in the BFL protocol sends the unique multilinear extension of A, one can verify that any extension of A of sufficiently low degree also works. We exploit this flexibility as follows: the prover randomly samples Aˆ in such a way that any three evaluations of Aˆ do not reveal any information about A. Of course, if any of these evaluations is within {0, 1}s, then no extension of A has this property. Nevertheless, during the sumcheck protocol, the prover can ensure that the verifier chooses only evaluations outside of {0, 1}s (by aborting if the verifier deviates), which incurs only a small increase in the soundness error. With this modification in place, it suffices for the prover to let Aˆ be a random degree-4 extension of A: by a dimensionality argument, any 3 evaluations outside of {0, 1}s are now independent and uniformly random in F. Remarkably, we are thus able to reduce a claim about A to a claim which contains no information about A. (2) Low-degree testing the commitment. The soundness of the sumcheck protocol relies on f having low degree, or at least being close to a low-degree polynomial. This in turn depends on the PCP oracle Aˆ being close to a low-degree
8 polynomial. If the prover sends Aˆ, the verifier can simply low-degree test it. However, if the prover sends a commitment to Aˆ, then it is not clear what the verifier should do. One option would be for the prover to reveal more values of Aˆ (in addition to the aforementioned three values), in order to enable the verifier to conduct its low-degree test on Aˆ. The prover would then have to ensure that revealing these additional evaluations is ‘safe’ by increasing the amount of independence among values in Aˆ (while still restricting the verifier to evaluations outside of {0, 1}s), which would lead to a blowup in the degree of Aˆ that is proportional to the number of queries that the verifier wishes to make. For the low-degree test to work, however, the verifier must see more evaluations than the degree. In sum, this circularity is inherent. To solve this problem, we will design an ‘algebraic’ commitment scheme that is transparent to low-degree tests: the verifier can perform a low-degree test on the commitment itself (without the help of the prover), which will ensure access to a Aˆ that is low-degree. We discuss this further in Section 2.2. (3) Sumcheck in zero knowledge. We need a sumcheck protocol where the prover’s messages leak little information about f. The prior work in [BCFGRS16] achieves an IPCP for sumcheck that is ‘weakly’ zero knowledge: any verifier learns at most one evaluation of f for each query it makes to the PCP oracle. If the verifier could evaluate f by itself, as was the case in that paper, this guarantee would suffice for zero knowledge. In our setting, however, the verifier cannot evaluate f by itself because f is (necessarily) hidden behind the algebraic commitment. One approach to compensate would be to further randomize Aˆ by letting Aˆ be a random extension of A of some well-chosen degree d. We are limited to d of polynomial size because the honest verifier’s running time is Ω(d). But this means that a polynomial-time malicious verifier, participating in the protocol of [BCFGRS16] and making d2 queries to the PCP oracle, could learn information about A. We resolve this by relying on more algebraic techniques, achieving an IPCP for sumcheck with a much stronger zero knowledge guarantee (see Theorem 1.1): any malicious verifier that makes polynomially-many queries to the PCP oracle learns only a single evaluation of f. This suffices for zero knowledge in our setting: learning one evaluation of f implies learning only three evaluations of Aˆ, which can be made ‘safe’ if Aˆ is chosen to be a random extension of A of high-enough degree. Our sumcheck protocol uses as building blocks both our algebraic commitment scheme and the [BCFGRS16] sumcheck; we summarize its construction in Section 2.3. Remark 2.1. Kilian, Petrank, and Tardos [KPT97] construct PCPs for NEXP that are statistical zero knowledge, via a combinatorial construction that makes black-box use of the PCP Theorem. Our modification of the BFL protocol achieves a perfect zero knowledge IPCP via algebraic techniques, avoiding the use of the PCP Theorem.
2.2 Algebraic commitments from algebraic query complexity lower bounds We describe how the sumcheck protocol can be used to construct an information-theoretic commitment scheme that is ‘algebraic’, in the IPCP model. (Namely, an algebraic interactive locking scheme; see Remark 2.2 below.) The prover commits to a message by sending to the verifier a PCP oracle that perfectly hides the message; subsequently, the prover can reveal positions of the message by engaging with the verifier in an Interactive Proof, whose soundness guarantees statistical binding. A key algebraic property that we rely on is that the commitment is ‘transparent’ to low-degree tests. Committing to an element. We first consider the simple case of committing to a single element a in F. Let k be a k N PN security parameter, and set N := 2 . Suppose that the prover samples a random B in F such that i=1 Bi = a, and sends B to the verifier as a commitment. Observe that any N − 1 entries of B do not reveal any information about a, and so any verifier with oracle access to B that makes less than N queries cannot learn any information about a. PN However, as B is unstructured it is not clear how the prover can convince the verifier that i=1 Bi = a. Instead, we can consider imbuing B with additional structure by providing its low-degree extension. That is, the prover thinks of B as a function from {0, 1}k to F, and sends its unique multilinear extension Bˆ : Fk → F to the verifier. Subsequently, the prover can reveal a to the verifier, and then engage in a sumcheck protocol for the claim P ˆ ~ “ β~∈{0,1}k B(β) = a” to establish the correctness of a. The soundness of the sumcheck protocol protects the verifier against cheating provers and hence guarantees that this scheme is binding. However, giving B additional structure calls into question the hiding property of the scheme. Indeed, surprisingly a result of [JKRS09] shows that this new scheme is not hiding (in fields of characteristic different than 2): it holds that Bˆ(2−1,..., 2−1) = a · 2−k for any choice of B, so the verifier can learn a with only a single query to Bˆ! Sending an extension of B has created a new problem: querying the extension outside of {0, 1}k, the verifier can learn information that may require many queries to B to compute. Indeed, this additional power is precisely what
9 underlies the soundness of the sumcheck protocol. To resolve this, we need to understand what the verifier can learn about B given some low-degree extension Bˆ. This is precisely the setting of algebraic query complexity [AW09]. A natural approach is to let Bˆ be chosen uniformly at random from the set of degree-d extensions of B for some d > 1. It is not hard to see that if d is very large (say, |F|) then 2k queries are required to determine the summation of Bˆ on Hm. But we need d to be small to achieve soundness. A result of [JKRS09] shows that d = 2 suffices: given a ˆ k ˆ P ˆ ~ random multiquadratic extension B of B, one needs 2 queries to B to determine β~∈{0,1}k B(β). Committing to a polynomial. The prover in our zero knowledge protocols needs to commit not just to a single element but to the evaluation of an m-variate polynomial Q over F of degree dQ. We extend our ideas to this setting. ~x N PN ~x Letting K be a subset of F of size dQ + 1, the prover samples a random B in F such that i=1 Bi = Q(~x) for each ~x ∈ Km. We can view all of these strings as a single function B : Km × {0, 1}k → F, and as before we consider ˆ m k ˆ ~ ~ ~ its unique low-degree extension B : F × F → F; viewed as a polynomial, B(X, Y ) has degree at most dQ in X and ~ P ˆ ~ ~ is multilinear in Y . Observe that since β~∈{0,1}k B(X, β) is a polynomial of individual degree dQ that agrees with Q on Km, it must equal Q. The binding property of the commitment scheme is clear: the prover can decommit to Q(~α) for any ~α ∈ Fm by using the sumcheck protocol as before. We are left to argue the hiding property. It is not difficult to see that we run into the same issue as in the single-value case: we have Bˆ(~α, 2−1,..., 2−1) = Q(~α) · 2−k for any ~α ∈ Fm. We resolve this by again choosing a random extension Bˆ of degree d > 1 in Y~ (and degree dQ in X~ ). Yet, arguing the hiding property now requires a stronger statement than the one proved in [JKRS09]. Not only do we need to know that the verifier cannot determine Q(~α) for a particular ~α ∈ Fm, but we need to know that the verifier cannot determine Q(~α) for any ~α ∈ Fm, or even any linear combination of any such values. We prove that this stronger guarantee holds in the same parameter regime: if d > 1 then 2k queries are both necessary and sufficient. Transparency to low-degree tests. Recall that a key algebraic property we required from our commitment scheme is that the verifier can perform a low-degree test on the committed polynomial without the assistance of the prover. Our commitment scheme naturally has this property. If the PCP oracle B˜ : Fm × Fk → F is low-degree, then it is a m ~ P ˜ ~ ~ ˜ m k commitment to the low-degree Q: F → F defined as Q(X) := β~∈{0,1}k B(X, β). In fact, even if B : F ×F → F is merely close to a low-degree Bˆ : Fm × Fk → F, then we can still regard B˜ as a commitment to the low-degree m ~ P ˆ ~ ~ Q: F → F defined as Q(X) := β~∈{0,1}k B(X, β), because the verifier can check claims of the form “Q(~α) = a” by obtaining the value of Bˆ it needs at the end of the sumcheck protocol via self-correction on B˜. Beyond the boolean hypercube. For efficiency reasons analogous to those in [BFLS91; GKR15], instead of viewing 0 B as a function from Km × {0, 1}k to F, we view B as a function from Km × Hk to F for a subset H of F of size Ω(1) and k0 := log N/ log |H|. This requires us to extend our claims about the algebraic query complexity of F 0 polynomial summation to arbitrary sets H. We show that if d > 2(|H| − 1), then |H|k = N queries are necessary to determine Q(~α) for any ~α (or any linear combination of these). See Section 5 for details. Decommitting in zero knowledge. To use our commitment scheme in zero knowledge protocols, we must ensure that, in the decommitment phase, the verifier cannot learn any information beyond the value a := Q(~α) for a chosen ~α. P ˆ ~ To decommit, the prover sends the value a and has to convince the verifier that the claim “ β~∈{0,1}k B(~α, β) = a” is true. However, if the prover and verifier simply run the sumcheck protocol on this claim, the prover leaks partial sums P ˆ ~ β~∈{0,1}k−i B(~α, c1, . . . , ci, β) for c1, . . . , ci ∈ F chosen by the verifier, which could reveal additional information about Q. Instead, the prover and verifier run on this claim the IPCP for sumcheck of [BCFGRS16], whose ‘weak’ zero knowledge guarantee ensures that this cannot happen. (Thus, in addition to the commitment, the honest prover also sends the evaluation of a random low-degree polynomial as required by the IPCP for sumcheck of [BCFGRS16].)
Remark 2.2 (comparison with [GIMS10]). Goyal, Ishai, Mahmoody, and Sahai [GIMS10] define and construct inter- active locking schemes, information-theoretic commitment schemes in the IPCP model. Their scheme is combinatorial, and we do not know how to use it in our setting (it is not clear how to low-degree test the committed message without disrupting zero knowledge). Putting this difference aside, their construction and our construction are incomparable. On the one hand, we achieve perfect hiding while they only achieve statistical hiding. On the other hand, their scheme is ‘oracle efficient’ (any query to the oracle can be computed statelessly in polynomial time) while our scheme is not.
10 2.3 A zero knowledge sumcheck protocol We summarize the ideas behind our main result, a zero knowledge sumcheck protocol (see Theorem 1.1). This result not only enables us to modify the BFL protocol to achieve zero knowledge (as discussed above), but also to modify the Shamir and GKR protocols to achieve zero knowledge (as discussed below). The two building blocks underlying our sumcheck protocol are our algebraic commitments (see Section 2.2 above) and the IPCP for sumcheck of [BCFGRS16]. We now cover necessary background and then describe our protocol. P Previous sumcheck protocols. The sumcheck protocol [LFKN92] is an IP for claims of the form “ ~α∈Hm F (~α) = 0”, where H is a subset of a finite field F and F is an m-variate polynomial over F of small individual degree. The proto- P col has m rounds: in round i, the prover sends the univariate polynomial gi(Xi) := ~α∈Hm−i F (c1, . . . , ci−1,Xi, ~α); the verifier checks that P and replies with a uniformly random challenge ∈ . After αi∈H gi(αi) = gi−1(ci−1) ci F round m, the verifier outputs the claim “F (c1, . . . , cm) = gm(c1, . . . , cm)”. If F is of sufficiently low degree and does not sum to a over the space, then the output claim is false with high probability. Note that the verifier does not need access to F . The IPCP for sumcheck of [BCFGRS16] modifies the above protocol as follows. The prover first sends a PCP oracle that equals the evaluation of a random ‘masking’ polynomial R; the verifier checks that R is (close to) low degree. After P that the prover and verifier conduct an Interactive Proof. The prover sends z ∈ F that allegedly equals ~α∈Hm R(~α), and the verifier responds with a uniformly random challenge ρ ∈ F∗. The prover and verifier now run the (standard) P sumcheck protocol to reduce the claim “ ~α∈Hm ρF (~α) + R(~α) = ρa + z” to a claim “ρF (~c) + R(~c) = b” for random m b−R(~c) P ~c ∈ F . The verifier queries R at ~c and then outputs the claim “F (~c) = ρ ”. If ~α∈Hm F (~α) =6 a then with high probability over the choice of ρ and the verifier’s messages in the sumcheck protocol, this claim will be false. A key observation is that if the verifier makes no queries to R, then the prover’s messages are identically distributed to the sumcheck protocol applied to a uniformly random polynomial Q. When the verifier does make queries to R, simulating the resulting conditional distribution involves techniques from Algebraic Complexity Theory, as shown in [BCFGRS16]. Given Q, the verifier’s queries to R(~α) for ~α ∈ Fm are identically distributed to Q(~α) − ρF (~α). Thus the simulator need only make at most one query to F for every query to R. That is, any verifier making q queries to R learns no more than it would learn by making q queries to F alone. As discussed, this zero knowledge guarantee does not suffice for the applications that we consider: when a sumcheck protocol is used as a subroutine of another protocol, F may itself be recursively defined in terms of large sums which the verifier cannot evaluate on its own. The verifier does, however, have oracle access to R, and so can learn enough information about F to break zero knowledge. Our sumcheck protocol. The zero knowledge guarantee that we aim for is the following: any polynomial-time verifier learns no more than it would by making one query to F , regardless of its number of queries to the PCP oracle. The main idea to achieve this guarantee is the following. The prover sends a PCP oracle that is an algebraic commitment Z to the aforementioned masking polynomial R. Then, as before, the prover and verifier run the sumcheck P m protocol to reduce the claim “ ~α∈Hm ρF (~α) + R(~α) = ρa + z” to a claim “ρF (~c) + R(~c) = b” for random ~c ∈ F . b−R(~c) We now face two problems. First, the verifier cannot simply query R at ~c and then output the claim “F (~c) = ρ ”, since the verifier only has oracle access to the commitment Z of R. Second, the prover could cheat the verifier by having Z be a commitment to an R that is far from low degree, which allows cheating in the sumcheck protocol. The first problem is addressed by the fact that our algebraic commitment scheme has a decommitment sub-protocol that is zero knowledge: the prover can reveal R(~c) in such a way that no other values about R are also revealed as a side-effect. As discussed, this relies on the protocol of [BCFGRS16], used a subroutine (for the second time). The second problem is taken care of by the fact that our algebraic commitment scheme is ‘transparent’ to low-degree tests: the verifier simply performs a low-degree test on Z, which by self-correction gives the verifier oracle access to a low-degree Z0 that is a commitment to a low-degree R. Overall, the only value that a malicious verifier can learn is F (~c) for ~c ∈ Fm of its choice. Remark 2.3. Our sumcheck protocol ‘leaks’ a single evaluation of F . We believe that this limitation is inherent: the honest verifier always outputs a true claim about one evaluation of F , which it cannot do without learning that evaluation. Either way, this guarantee is strong enough for applications: we ensure that learning a single evaluation of F does not harm zero knowledge, either because it carries no information or because the verifier can evaluate F itself.
11 2.4 Challenges: handling recursion We have so far discussed the ideas behind our main result (a zero knowledge sumcheck protocol) and how to use it to achieve a natural zero knowledge analogue of the classical MIP/PCP construction for NEXP [BFL91; BFLS91]. Other applications require additional ideas to overcome challenges that arise when the sumcheck protocol is used recursively. Shamir’s protocol. Consider the goal of achieving zero knowledge for Shamir’s protocol for PSPACE [Sha92]. This protocol reduces checking any PSPACE computation to checking that: X Y X Y ... φˆ(x1, . . . , xn) = 0
x1∈{0,1} x2∈{0,1} xn−1∈{0,1} xn∈{0,1} where φˆ is the (efficiently computable) arithmetization over a finite field F of a certain boolean formula φ. Since Shamir’s protocol is similar to the sumcheck protocol, a natural starting point would be to try to merely adapt the techniques that ‘worked’ in the case of the sumcheck protocol. However, the similarity between the two protocols is only superficial (e.g., it lacks the useful linear structure present in the sumcheck protocol). An accurate way to compare the two is to view Shamir’s protocol as a recursive application of the sumcheck protocol, as shown by Meir [Mei13]. For example, the TQBF problem is downward self-reducible [TV07]: for i ∈ {1, . . . , n} let X Y X Y Gi(X1,...,Xi) := ... φˆ(X1,...,Xi, xi+1, . . . , xn) .
xi+1∈{0,1} xi+2∈{0,1} xn−1∈{0,1} xn∈{0,1}
From this one obtains the recurrence X Gi(X1,...,Xi) = Gi+2(X1,...,Xi, xi+1, 0) · Gi+2(X1,...,Xi, xi+1, 1) . (1)
xi+1∈{0,1}
In other words, with oracle access to Gi+2, one can use a (small) sumcheck to compute Gi. This suggests a recursive approach: if we could check evaluations of Gi+2 in zero knowledge, then maybe we could use this as a subprotocol to check evaluations of Gi also in zero knowledge. GKR’s protocol. The recursive structure is perhaps more evident in the doubly-efficient Interactive Proof of Gold- wasser, Kalai, and Rothblum [GKR15] (‘GKR protocol’). Its barebones sub-protocol checks a more complex arithmetic expression: the output of a layered arithmetic circuit. Fix an input x to the circuit, and let Vi(j):[S] → F be the value of the j-th gate in layer i (S is the number of gates in a layer). For some subset H ⊆ F and sufficiently large m, one m m views Vi as a function from H to F by imposing some ordering on H . One can relate Vi−1 to Vi as follows: X Vi−1(~z) = addi(~z, ~ω1, ~ω2) · Vi(~ω1) + Vi(~ω2) + muli(~z, ~ω1, ~ω2) · Vi(~ω1) · Vi(~ω2) (2) m ~ω1,~ω2∈H where addi(~z, ~ω1, ~ω2) is 1 if the ~z-th gate in layer i − 1 is an addition gate whose inputs are the ~ω1-th and ~ω2-th gates in layer i, and muli is defined similarly for multiplication gates. We again see a recursive structure: a function defined as the summation over some product space of a polynomial whose terms are functions of the same form; this allows to check Vi−1 given a protocol for checking Vi. The use of recursion in the GKR protocol is even more involved: the barebones protocol relies on the verifier having oracle access to low-degree extensions of addi and muli. For very restricted classes of circuits, the verifier can efficiently ‘implement’ these oracles; however, for the class of circuits that is ultimately supported by the protocol this requires a further sub-protocol that delegates the evaluation of these oracles to the prover, and this is done by composing multiple instances of the GKR protocol. To achieve zero knowledge we also have to tackle this form of recursion. The leakage of recursion. By now the central role of recursion in applications of the sumcheck protocol is clear. There are two main sources of leakage that we need to overcome in order to achieve zero knowledge in such applications.
1. Checking evaluations of Gi+2, Vi, or addi and muli, even in zero knowledge, leaks the evaluations themselves. The verifier, however, is not able to compute these itself (else it would not need to delegate), which means that information is leaked.
12 2. The number of claims can grow exponentially: a claim about Gi (resp. Vi−1) is reduced to two claims about Gi+2 (resp. Vi). There are standard techniques that leverage interaction to reduce multiple claims about a low-degree polynomial to a single one, but we need to replace these with zero knowledge equivalents. We tackle both issues by devising a general framework that captures their shared algebraic structure, solving these problems within this framework, and then recovering the protocols of Shamir and GKR as special cases.
2.5 Sum-product circuits We introduce the notion of sum-product circuits and show that the sumcheck protocol naturally gives rise to algebraic Interactive Proofs for checking the value of such circuits. We then explain how to achieve zero knowledge variants of these by building on the techniques discussed in Section 2.1. We recover zero knowledge variants of the protocols of Shamir and GKR as special cases of this approach. Sum-product circuits are an abstract way of encoding ‘sum-product expressions’. A sum-product expression is either a polynomial over some finite field F represented by a small arithmetic circuit or a polynomial of the form X C X,~ β,~ P1(X,~ β~),...,Pn(X,~ β~) (3) β~∈Hm where C is a low-degree ‘combiner’ polynomial represented by a small arithmetic circuit, and P1,...,Pn are sum- product expressions. Both Equation 1 (for Shamir’s protocol) and Equation 2 (for GKR’s protocol) are of this form. Like a standard arithmetic circuit, a sum-product circuit is a directed acyclic graph associated with a field F in which we associate to each vertex a value, which in our case is the sum-product expression that it computes. Each internal vertex is labeled by a combiner polynomial, and there is an edge from u to v if the sum-product expression of v appears in that of u. For example, the above expression would correspond to a vertex labeled with C, with outgoing edges to the vertices corresponding to P1,...,Pn. An input to the circuit is a labeling of the leaf vertices with small arithmetic circuits. We now spell this out in a little more detail.
Definition 2.4 (Informal version of Definition 8.8). A sum-product circuit C is a rooted directed acyclic graph where x each internal vertex is labeled with an arithmetic circuit Cv over a finite field F. An input to C labels each leaf v with x x x a polynomial v over F. The value of a vertex v on input is a multivariate polynomial v[ ] over F defined as follows: if v is a leaf vertex then v[x] equals xv; if instead v is an internal vertex then, for a chosen integer m, X v[x](X~ ) := Cv X,~ β,~ u1[x](X,~ β~), . . . , ut[x](X,~ β~) . (4) β~∈Hm
The value of C on input x is denoted C[x] and equals the value of the root vertex r (and we require that C[x] ∈ F). We next describe an Interactive Proof that works for any sum-product circuit. The protocols of Shamir [Sha92] and of GKR [GKR15] can be viewed as this protocol applied to specific sum-product circuits (computing G0 and V0 respectively). After that, we explain how to modify the Interactive Proof to obtain a corresponding zero knowledge IPCP for any sum-product circuit, which allows us to derive our zero knowledge variants of these two protocols. A significant advantage of working with sum-product circuits is that they are easy to compose. For example, we can view the composition of the GKR protocol with itself as a composition of sum-product circuits. We can then apply our zero knowledge IPCP to the resulting circuit and directly obtain a zero knowledge analogue of the full GKR protocol.
2.5.1 Delegating the evaluation of a sum-product circuit We explain how to use the sumcheck protocol to obtain an Interactive Proof for checking the value of a sum-product circuit. The protocol is recursively defined: to prove that C[x] = a (i.e., that r[x] = a), it suffices to show that the values of r’s children u1, . . . , ut satisfy Equation 4 where the left-hand side is a. The sumcheck protocol interactively x x m reduces this claim to a new claim “Cv ~c,u1[ ](~c), . . . , ut[ ](~c) = b” for ~c ∈ F chosen uniformly at random by the x x verifier and b ∈ F chosen by the prover. The prover sends h1 := u1[ ](~c), . . . , ht := ut[ ](~c), reducing this new claim
13 to the set of claims “hi = ui[x](~c)” for i = 1, . . . , t and “Cv(~c,h1, . . . , ht) = b”. The latter can be checked by the verifier directly and the rest can be recursively checked via the same procedure. Eventually the protocol reaches the leaf vertices, which are labeled with small arithmetic circuits that the verifier can evaluate on its own. One technicality is that, as defined, the degree of the polynomial at a vertex may be exponentially large, and so the prover would have to send exponentially-large messages in the sumcheck protocol. To avoid this, we use a well-known interactive sub-protocol for degree reduction [She92; GKR15]. Since for all ~x the value v[x](~x) depends only on m u1[x](~x, β~), . . . , ut[x](~x, β~) for β~ ∈ H , we can safely replace each ui[x] with the unique degree-(|H| − 1) extension m uˆi[x] of its evaluation over H . The degree of the summand in Equation 4 is now at most δ|H|, where δ is the total degree of Cv. Now that the sumcheck protocol is only invoked on low-degree polynomials, efficiency is recovered. Another technicality is that since a sum-product circuit is a directed acyclic graph (as opposed to a tree), it is possible that a single vertex v will have many claims about it. If each such claim reduces to many claims about other vertices, the number of claims to check could grow exponentially. This is in fact the case in both Shamir’s and GKR’s protocols. To avoid this blowup, the verifier checks a random linear combination of the claims about each vertex v. It is not difficult to see that soundness is preserved, and the number of claims per vertex is reduced to one.
2.5.2 Achieving zero knowledge The Interactive Proof for sum-product circuits that we have described above is not zero knowledge. First, the sumcheck protocol, which is used to reduce claims about parent vertices to claims about child vertices, leaks information in the form of partial sums of the summand polynomial, as usual. Second, in order to reduce a claim about the root to claims about its children, the prover must provide evaluations of the polynomials of the children. These may be hard for the verifier to compute (indeed, if the verifier could compute both of these on its own then there would be no need to recurse). We use the ideas discussed in Section 2.1 to resolve both of these issues, obtaining a zero knowledge variant in the IPCP model (where the honest prover sends a random low-degree polynomial as the oracle). We resolve the first issue by using our zero knowledge sumcheck protocol. Its zero knowledge guarantee states that the protocol reveals only one value of the summand function, which can be computed via one query to each of the uˆi[x], which are precisely the hi’s sent by the prover. We are left to ensure that hi’s do not leak information. As in our modification of the BFL protocol, rather than taking the unique degree-(|H| − 1) extension vˆ[x](X~ ) of v[x](X~ ), we will instead take a random degree-(|H| + δ) extension v˙[x], where δ depends only on the circuit structure (in all of our protocols, δ is a small constant). This ensures that the few evaluations actually revealed by the prover are uniformly random in F. The prover sends, for each vertex v, the evaluation of a random polynomial Rv, which x ~ x ~ ~ ~ defines the random low-degree extension as v˙[ ](X) :=v ˆ[ ](X) + ZHm (X) · Rv(X) where ZHm is a degree-|H| m m polynomial that is zero on H and nonzero on (F − H) . The prover cannot simply send Rv, however, because the verifier could then query it in order to ‘derandomize’ v˙[x]. Instead, the prover sends a commitment to Rv using our algebraic commitment scheme. The decommitment is performed ‘implicitly’ during the sumcheck for vertex v. See Section 9.1 for details. Finally, recall that in order to avoid a blowup in the number of claims we have to check, the verifier checks a random linear combination of the claims about any given vertex; this is a linear operation. Also, to avoid a blowup in the degree, we take the low-degree extension, which is also a linear operation. Both of these operations are ‘compatible’ with sumcheck, and thus zero knowledge is straightforwardly maintained.
14 3 Roadmap
After providing formal definitions in Section 4, the rest of the paper is organized as summarized by the table below. The shaded boxes denote some previous results that we rely on.
§10: Theorem 10.2 §11: Theorem 11.1 §7: Theorem 7.2 PZK analogue PZK analogue PZK analogue of Shamir’s protocol of GKR’s protocol of BFLS’s protocol for PSPACE for low-depth circuits for NEXP
§9.1: Theorem 9.1 §9.2: Theorem 9.2 PZK for sum-product PZK for sum-product
sum-product alternations circuit evaluation circuit satisfaction
[BCFGRS16] §6: Theorem 6.4 PZK analogue of LFKN’s strong PZK sumcheck protocol for #P
§6 Interactive Probabilistically Checkable Proofs [BCFGRS16] perfectly-hiding weak PZK sumcheck statistically-binding sum-only computations algebraic commitment
[BCFGRS16] §5: Theorem 5.1 [RS05] succinct constraint detection lower bounds for derandomize PIT for sums for multi-variate low-degree algebraic query complexity of products of univariates
Algebraic polynomials and their sums of polynomial summation Complexity
15 4 Preliminaries 4.1 Basic notations
For n ∈ N we denote by [n] the set {1, . . . , n}. For m, n ∈ N we denote by m + [n] the set {m + 1, . . . , m + n}. For a set , ∈ , ⊆ , and ∈ n, we denote by the vector that is restricted to the coordinates in . X n N I [n] ~x X ~xI xi i∈I ~x I Functions, distributions, fields. We use f : D → R to denote a function with domain D and range R; given a subset ˜ ˜ D of D, we use f|D˜ to denote the restriction of f to D. Given a distribution D, we write x ← D to denote that x is sampled according to D. We denote by F a finite field and by Fq the field of size q. Arithmetic operations over Fq take time polylog q and space O(log q).
Polynomials. We denote by F[X1,...,m] the ring of polynomials in m variables over F. Given a polynomial P in
F[X1,...,m], degXi (P ) is the degree of P in the variable Xi. The individual degree of a polynomial is its maximum
degree in any variable, max1≤i≤m degXi (P ); we always refer to the individual degree unless otherwise specified. We ≤d denote by F[X1,...,m] the subspace consisting of P ∈ F[X1,...,m] with individual degree at most d. Languages and relations. We denote by L a language consisting of instances x, and by R a (binary ordered) relation consisting of pairs (x, w), where x is the instance and w is the witness. We denote by Lan(R) the language corresponding to R, and by R|x the set of witnesses in R for x (if x 6∈ Lan(R) then R|x := ∅). As always, we assume that |w| is bounded by some computable function of n := |x|; in fact, we are mainly interested in relations arising from nondeterministic languages: R ∈ NTIME(T ) if there exists a T (n)-time machine M such that M(x, w) outputs 1 if and only if (x, w) ∈ R. Throughout, we assume that T (n) ≥ n. Low-degree extensions. Let F be a finite filed, H a subset of F, and m a positive integer. The low-degree extension m ˆ ≤|H|−1 m (LDE) of a function f : H → F is denoted f and is the unique polynomial in F[X1,...,m ] that agrees with f on H . In particular, fˆ: Fm → F is defined as follows: X fˆ(X~ ) := IHm (X,~ β~) · f(β~) , β~∈Hm
m (Xi−γ)(Yi−γ) ≤|H|−1 m ~ ~ Q P Q where IH (X, Y ) := i=1 ω∈H γ∈H\{ω} (ω−γ)2 is the unique polynomial in F[X1,...,m ] such that, for m m all (~α, β~) ∈ H × H , IHm (~α, β~) equals 1 when ~α = β~ and equals 0 otherwise. Note that IHm (X,~ Y~ ) can be generated and evaluated in time poly(|H|, m, log |F|) and space O(log |F| + log m), so fˆ(~α) can be evaluated in time |H|m · poly(|H|, m, log |F|) and space O(m · log |F|).
4.2 Sampling partial sums of random low-degree polynomials
≤d Let F be a finite field, m, d positive integers, and H a subset of F, and recall that F[X1,...,m] is the subspace of ≤d ≤m F[X1,...,m] consisting of those polynomials with individual degrees at most d. Given Q ∈ F[X ] and ~α ∈ F P 1,...,m (vectors over F of length at most m), we define Q(~α) := ~γ∈Hm−|~α| Q(~α,~γ), i.e., the answer to a query that specifies only a prefix of the variables is the sum of the values obtained by letting the remaining variables range over H. In Section 6 we rely on the fact, formally stated below and proved in [BCFGRS16], that one can efficiently ≤d ≤m sample the distribution R(~α), where R is uniformly random in F[X1,...,m] and ~α ∈ F is fixed, even conditioned on ≤m any polynomial number of (consistent) values for R(~α1),...,R(~α`) (with ~α1, . . . , ~α` ∈ F ). More precisely, the sampling algorithm runs in time that is only poly(log |F|, m, d, |H|, `), which is much faster than the trivial running time of Ω(dm) achieved by sampling R explicitly. This “succinct” sampling follows from the notion of succinct constraint detection studied in [BCFGRS16] for the case of partial sums of low-degree polynomials. Corollary 4.1 ([BCFGRS16]). There exists a probabilistic algorithm A such that, for every finite field F, positive ≤m ≤m integers m, d, subset H of F, subset S = {(α1, β1),..., (α`, β`)} ⊆ F × F, and (α, β) ∈ F × F, R(α1) = β1 h i . Pr A( , m, d, H, S, α) = β = Pr R(α) = β . . F ≤d . R←F[X1,...,m] R(α`) = β`
16 Moreover A runs in time m(d`|H| + d3`3) · poly(log |F|) = `3 · poly(m, d, |H|, log |F|).
4.3 Interactive probabilistically checkable proofs An Interactive Probabilistically Checkable Proof (Interactive PCP, IPCP) [KR08] is a Probabilistically Checkable Proof [BFLS91; FGLSS91; AS98; ALMSS98] followed by an Interactive Proof [Bab85; GMR89]. Namely, the prover P and verifier V interact as follows: P sends to V a probabilistically checkable proof π; afterwards, P and V π engage in an interactive proof. Thus, V may read a few bits of π but must read subsequent messages from P in full. An IPCP system for a relation R is thus a pair (P,V ), where P,V are probabilistic interactive algorithms working as described, that satisfies naturally-defined notions of perfect completeness and soundness with a given error ε(·); see [KR08] for details. We say that an IPCP has k rounds if this “PCP round” is followed by a k-round interactive proof. (Though note that [BCFGRS16] counts the PCP round towards round complexity.) Beyond round complexity, we also measure how many bits the prover sends and how many the verifier reads: the proof length l is the length of π in bits plus the number of bits in all subsequent prover messages; the query complexity q is the number of bits of π read by the verifier plus the number of bits in all subsequent prover messages (since the verifier must read all of those bits). In this work, we do not count the number of bits in the verifier messages, nor the number of random bits used by the verifier; both are bounded from above by the verifier’s running time, which we do consider. Overall, we say that a language L (resp., relation R) belongs to the complexity class IPCP[ε, k, l, q] if there is an IPCP system for L (resp., R) in which: (1) the soundness error is ε(n); (2) the number of rounds is at most k(n); (3) the proof length is at most l(n); (4) the query complexity is at most q(n). We sometimes also specify the time and/or space complexity of the (honest) prover algorithm and/or (honest) verifier algorithm. Finally, an IPCP is non-adaptive if the verifier queries are non-adaptive, i.e., the queried locations depend only on the verifier’s inputs; it is public-coin if each verifier message is chosen uniformly and independently at random, and all of the verifier queries happen after receiving the last prover message. All of the IPCPs discussed in this paper are both non-adaptive and public-coin.
4.4 Zero knowledge for Interactive PCPs We define the notion of zero knowledge for IPCPs that we consider: perfect zero knowledge via straightline simulators. This notion is quite strong not only because it unconditionally guarantees perfect simulation of the verifier’s view but also because straightline simulation typically implies desirable properties. We first provide context and then definitions. At a high level, zero knowledge requires that the verifier’s view can be efficiently simulated without the prover. Converting the informal statement into a mathematical one involves many choices, including choosing which verifier class to consider (e.g., the honest verifier? all polynomial-time verifiers?), the quality of the simulation (e.g., is it identically distributed to the view? statistically close to it? computationally close to it?), the simulator’s dependence on the verifier (e.g., is it non-uniform? or is the simulator universal?), and others. The definition below considers the case of perfect simulation via universal simulators against verifiers making a bounded number of queries to the proof oracle. Moreover, in the case of universal simulators, one distinguishes between a non-blackbox use of the verifier, which means that the simulator takes the verifier’s code as input, and a blackbox use of it, which means that the simulator only accesses the verifier via a restricted interface; we consider this latter case. Different models of proof systems call for different interfaces, which grant carefully-chosen “extra powers” to the simulator (in comparison to the prover) so to ensure that efficiency of the simulation does not imply the ability to efficiently decide the language. For example: in ZK IPs, the simulator may rewind the verifier; in ZK PCPs, the simulator may adaptively answer oracle queries. In ZK IPCPs (our setting), the natural definition would allow a blackbox simulator to rewind the verifier and also to adaptively answer oracle queries. The definition below, however, considers only simulators that are straightline [FS89; DS98], that is they do not rewind the verifier, because our constructions achieve this stronger notion. We are now ready to define the notion of perfect zero knowledge via straightline simulators for IPCPs [GIMS10].
Definition 4.2. Let A, B be algorithms and x, y strings. We denote by View hB(y),A(x)i the view of A(x) in an IPCP protocol with B(y), i.e., the random variable (x, r, s1, . . . , sn, t1, . . . , tm) where x is A’s input, r is A’s randomness, s1, . . . , sn are B’s messages, and t1, . . . , tm are the answers to A’s queries to the proof oracle sent by B.
17 Straightline simulators in the context of IPs were used in [FS89], and later defined in [DS98]. The definition below considers this notion in the context of IPCPs, where the simulator also has to answer oracle queries by the verifier. Note that since we consider the notion of perfect zero knowledge, the definition of straightline simulation needs to allow the efficient simulator to work even with inefficient verifiers [GIMS10]. Definition 4.3. We say that an algorithm B has straightline access to another algorithm A if B interacts with A, without rewinding, by exchanging messages with A and also answering any oracle queries along the way. We denote by BA the concatenation of A’s random tape and B’s output. (Since A’s random tape could be super-polynomially large, B cannot sample it for A and then output it; instead, we restrict B to not see it, and we prepend it to B’s output.) Definition 4.4. An IPCP system (P,V ) for a relation R is perfect zero knowledge (via straightline simulators) against unbounded queries (resp., against query bound b) with simulator overhead s: N × N → N if there exists a simulator algorithm S such that for every algorithm (resp., b-query algorithm) V˜ and instance-witness pair (x, w) ∈ R, V˜ x x w ˜ x x x S ( ) and View hP ( , ), V ( )i are identically distributed. Moreover, S must run in time O(s(| |, qV˜ (| |))), where ˜ qV˜ (·) is V ’s query complexity. The case of a language L is similar: the quantification is for all x ∈ L and the view to simulate is View hP (x), V˜ (x)i. Remark 4.5. Throughout this paper, an algorithm is b-query if it makes strictly fewer than b queries to its oracle. This is because all of our results will be of a ‘query threshold’ character, i.e. if the verifier makes b queries it learns some information, but any verifier making strictly fewer queries learns nothing. Remark 4.6. The standard definition of zero knowledge allows the simulator overhead s to be any fixed polynomial. Remark 4.7. The definition above places a strict bound on the running time of the simulator. This is in contrast to most zero knowledge results, which can only bound its expected running time.
We say that a language L (resp., relation R) belongs to the complexity class PZK-IPCP[ε, k, l, q, b, s] if there is an IPCP system for L (resp., R), with the corresponding parameters, that is perfect zero knowledge with query bound b; also, it belongs to the complexity class PZK-IPCP[ε, k, l, q, ∗, s] if the same is true with unbounded queries. In this paper we only consider zero knowledge against bounded queries. (Note that, even in this case, one can ‘cover’ all polynomial-time malicious verifiers by setting b to be superpolynomial in the input size.) Remark 4.8. Kalai and Raz [KR08] give a general transformation for IPCPs that reduces the verifier’s query complexity q to 1. The transformation preserves our zero knowledge guarantee, with a small increase in the simulator overhead.
4.5 Sumcheck protocol and its zero knowledge variant The sumcheck protocol [LFKN92] is a fundamental building block of numerous results in complexity theory and cryptography. We rely on it in Section 6, so we briefly review it here. The protocol consists of an Interactive Proof for a claim of the form “P ”, where is an -variate polynomial of individual degree with α1,...,αm∈H F (α1, . . . , αm) = a F m d coefficients in a finite field F, H is a subset of F, and a is an element of F. The prover and verifier receive (F, m, d, H, a) as input; in addition, the prover receives F as input while the verifier has only oracle access to F . In the i-th round, the P prover sends the univariate polynomial Fi(X) := α ,...,α ∈H F (c1, . . . , ci−1, X, αi+1, . . . , αm), and the verifier i+1 m P replies with a uniformly random element ci ∈ F and checks that Fi−1(ci−1) = α∈H Fi(α) (defining F0(c0) to be the element a). At the end of the interaction, the verifier also checks that Fm(cm) = F (c1, . . . , cm), by querying F at the random location (c1, . . . , cm). This interactive proof is public-coin, and has m rounds, communication complexity poly(log | |, m), and soundness error md . The prover runs in time poly(log | |, |H|m) and space poly(log | |, m, |H|) F |F| F F and the verifier runs in time poly(log |F|, m, d, |H|) and space O(log |F| · m). The sumcheck protocol is not zero knowledge, because the prover reveals partial sums of F to the verifier. If we assume the existence of one-way functions, the protocol can be made computational zero knowledge by leveraging the fact that it is public-coin [GMR89; IY87; BGGHKMR88] (in fact, if we further assume the hardness of certain problems related to discrete logarithms then more efficient transformations are known [CD98]); moreover, there is strong evidence that assuming one-way functions is necessary [Ost91; OW93]. Even more, achieving statistical zero knowledge for sumcheck instances would cause unlikely complexity-theoretic collapses [For87; AH91].
18 Nevertheless, [BCFGRS16] have shown that, in the Interactive PCP model (see Section 4.3), a simple variant of the sumcheck protocol is perfect zero knowledge. The variant is as follows: the prover sends a proof oracle containing the evaluation of a random m-variate polynomial A of individual degree d, conditioned on summing to 0 on Hm; the verifier replies with a random element ρ ∈ F; then the prover and verifier engage in a sumcheck protocol for the claim P “ ~α∈Hm ρF (~α) + A(~α) = a”, with the verifier accessing A via self-correction (after low-degree testing it). The proof oracle thus consists of |F|m field elements, and the verifier accesses only poly(log |F|, m, d) of them. The auxiliary polynomial A acts as a “masking polynomial”, and yields the following zero knowledge guarantee: there exists a polynomial-time simulator algorithm that perfectly simulates the view of any malicious verifier, provided it can query F in as many locations as the total number of queries that the malicious verifier makes to either F or A.
19 5 Algebraic query complexity of polynomial summation
We have described in Section 2.2 an algebraic commitment scheme based on the sumcheck protocol and lower bounds on the algebraic query complexity of polynomial summation. The purpose of this section is to describe this construction in more detail, and then provide formal statements for the necessary lower bounds. We begin with the case of committing to a single element a ∈ F. The prover chooses a uniformly random string N PN B ∈ F such that i=1 Bi = a, for some N ∈ N. Fixing some d ∈ N, G ⊆ F and k ∈ N such that |G| ≤ d + 1 and |G|k = N, the prover views B as a function from Gk to F (via an ordering on Gk) and sends the evaluation of a degree-d extension Bˆ : Fk → F of B. The verifier tests that Bˆ is indeed (close to) a low-degree polynomial but (ideally) cannot learn any information about a without reading all of B (i.e., without making N queries). Subsequently, the P ˆ ~ prover can decommit to a by convincing the verifier that β~∈Gk B(β) = a via the sumcheck protocol. To show that the above is a commitment scheme, we must show both binding and hiding. Both properties depend on the choice of d. The binding property follows from the soundness of the sumcheck protocol, and we thus would like the degree d of Bˆ to be as small as possible. A natural choice would be d = 1 (so |G| = 2), which makes Bˆ the unique multilinear extension of B. However (as discussed in Section 2.2) this choice of parameters does not provide P ˆ −1 −1 k any hiding: it holds that β∈{0,1}k B(β) = B(2 ,..., 2 ) · 2 (as long as char(F) =6 2). We therefore need to understand how the choice of d affects the number of queries to Bˆ required to compute a. This is precisely the setting of algebraic query complexity, which we discuss next. The algebraic query complexity (defined in [AW09] to study ‘algebrization’) of a function f is the (worst-case) number of queries to some low-degree extension Bˆ of a string B required to compute f(B). This quantity is bounded from above by the standard query complexity of f, but it may be the case (as above) that the low-degree extension confers additional information that helps in computing f with fewer queries. The usefulness of this information depends on parameters d and G of the low-degree extension. Our question amounts to understanding this dependence for the N PN function SUM : F → F given by SUM(B) := i=1 Bi. This has been studied before in [JKRS09]: if G = {0, 1} and d = 2 then the algebraic query complexity of SUM is exactly N. For our purposes, however, it is not enough to commit to a single field element. Rather, we need to commit to the m evaluation of a polynomial Q: F → F of degree dQ, which we do as follows. Let K be a subset of F of size dQ + 1. The prover samples, for each ~α ∈ Km, a random string B~α ∈ FN such that SUM(B~α) = Q(~α). The prover views these strings as a function B : Km ×Gk → F, and takes a low-degree extension Bˆ : Fm ×Fk → F. The polynomial Bˆ(X,~ Y~ ) ~ ~ P ˆ ~ ~ has degree dQ in X and d in Y ; this is a commitment to Q because β~∈Gk B(X, β) is a degree-dQ polynomial that agrees with Q on Km, and therefore equals Q. Once again we will decommit to Q(~α) using the sumcheck protocol, and so for binding we need d to be small. For hiding, as in the single-element case, if d is too small then a few queries to Bˆ can yield information about Q. Moreover, it could be the case that the verifier can leverage the fact that Bˆ is a joint low-degree extension to learn some linear combination of evaluations of Q. We must exclude these possibilities in order to obtain our zero knowledge guarantees. This question amounts to a generalization of algebraic query complexity where, given a list of strings B1,...,BM , we determine how many queries we need to make to their joint low-degree extension Bˆ to determine any nontrivial PM linear combination i=1 ci · SUM(Bi). We will show that the ‘generalized’ algebraic query complexity of SUM is exactly N provided d ≥ 2(|G| − 1) (which is also the case for the standard algebraic query complexity). In the remainder of the section we state our results in a form equivalent to the above that is more useful to us. Given ≤d ≤d0 an arbitrary polynomial Z ∈ F[X1,...,m,Y1,...,k], we ask how many queries are required to determine any nontrivial P m linear combination of ~y∈Gk Z(~α, ~y) for ~α ∈ F . The following theorem is more general: it states that not only do we require many queries to determine any linear combination, but that the number of queries grows linearly with the number of independent combinations that we wish to learn.
Theorem 5.1 (algebraic query complexity of polynomial summation). Let F be a field, m, k, d, d0 ∈ N, and G, K, L be finite subsets of F such that K ⊆ L, d0 ≥ |G| − 2, and |K| = d + 1. If S ⊆ Fm+k is such that there exist matrices 0 Lm×` S×` ≤d ≤d C ∈ F and D ∈ F such that for all Z ∈ F[X1,...,m,Y1,...,k] and all i ∈ {1, . . . , `} X X X C~α,i Z(~α, ~y) = D~q,iZ(~q) , ~α∈Lm ~y∈Gk ~q∈S
20 m m then |S| ≥ rank(BC) · (min{d0 − |G| + 2, |G|})k, where B ∈ FK ×L is such that column ~α of B represents Z(~α) ~ in the basis (Z(β))β~∈Km . We describe a special case of the above theorem that is necessary for our zero knowledge results, and then give an equivalent formulation in terms of random variables that we use in later sections. (Essentially, the linear structure of the problem implies that ‘worst-case’ statements are equivalent to ‘average-case’ statements.)
Corollary 5.2. Let F be a finite field, G be a subset of F, and d, d0 ∈ N with d0 ≥ 2(|G| − 1). If S ⊆ Fm+k is such that there exist (c~α)~α∈ m and (d~)~ m+k such that F β β∈F ≤d ≤d0 P P P • for all Z ∈ [X ,Y ] it holds that m c k Z(~α, ~y) = d Z(~q) and F 1,...,m 1,...,k ~α∈F ~α ~y∈G ~q∈S ~q 0 0 ≤d ≤d P P 0 • there exists Z ∈ [X ,Y ] such that m c k Z (~α, ~y) =6 0, F 1,...,m 1,...,k ~α∈F ~α ~y∈G then |S| ≥ |G|k.
Corollary 5.3 (equivalent statement of Corollary 5.2). Let F be a finite field, G be a subset of F, and d, d0 ∈ N with 0 0 m+k k ≤d ≤d d ≥ 2(|G| − 1). Let Q be a subset of F with |Q| < |G| and let Z be uniformly random in F[X1,...,m,Y1,...,k]. P The ensembles k Z(~α, ~y) m and Z(~q) are independent. ~y∈G ~α∈F ~q∈Q The proofs of these results, and derivations of corresponding upper bounds, are provided in Appendix A.
21 6 Zero knowledge sumcheck from algebraic query lower bounds
We leverage lower bounds on the algebraic query complexity of polynomial summation (Section 5) to obtain an analogue of the sumcheck protocol with a strong zero knowledge guarantee, which we use in the applications that we consider. P The sumcheck protocol [LFKN92] is an Interactive Proof for claims of the form ~x∈Hm F (~x) = a, where H is a subset of a finite field F, F is an m-variate polynomial over F of individual degree at most d, and a is an element of F. The sumcheck protocol is not zero knowledge (conjecturally). Prior work [BCFGRS16] obtains a sumcheck protocol, in the Interactive PCP model, with a certain zero knowledge guarantee. In that protocol, the prover first sends a proof oracle that consists of the evaluation of a random m-variate polynomial R of individual degree at most d; after that, the prover and the verifier run the (standard) sumcheck protocol on a new polynomial obtained from F and R. The purpose of R is to ‘mask’ the partial sums, which are the intermediate values sent by the prover during the sumcheck protocol. The zero knowledge guarantee in [BCFGRS16] is the following: any verifier that makes q queries to R learns at most q evaluations of F . This guarantee suffices to obtain a zero knowledge protocol for #P (the application in [BCFGRS16]) because the verifier can evaluate F efficiently at any point (as F is merely an arithmetization of a 3SAT formula). We achieve a much stronger guarantee: any verifier that makes polynomially-many queries to R learns at most a single evaluation of F (that, moreover, lies within a chosen subset Im of Fm). Our applications require this guarantee because we use the sumcheck simulator as a sub-simulator in a larger protocol, where F is a randomized low-degree extension of some function that is hard to compute for the verifier. The randomization introduces bounded independence, which makes a small number of queries easy to simulate. The main idea to achieve zero knowledge as above is the following. Rather than sending the masking polynomial R directly, the prover sends a (perfectly-hiding and statistically-binding) commitment to it in the form of a random ~ P ~ ~ (m + k)-variate polynomial Z. The ‘real’ mask is recovered by summing out k variables: R(X) := β~∈Gk Z(X, β). Our lower bounds on the algebraic query complexity of polynomial summation (Section 5) imply that any q queries to Z, with q < |G|k, yield no information about R. The prover, however, can elect to decommit to R(~c) for a single point ~c ∈ Im chosen by the verifier. This is achieved using the zero knowledge sumcheck protocol of [BCFGRS16] as a P ~ subroutine: the prover sends w := R(~c) and then proves that w = β~∈Gk Z(~c, β). The protocol thus proceeds as follows. Given a security parameter λ ∈ N, the prover sends the evaluations of ≤d ≤2λ ≤2λ two polynomials Z ∈ F[X1,...,m,Y1,...,k] and A ∈ F[Y1,...,k] as proof oracles. The verifier checks that both of these evaluations are close to low-degree, and uses self-correction to make for querying them. The prover sends two field m k k elements z1 and z2, which are (allegedly) the summations of Z and A over H × G and G , respectively. The verifier replies with a random challenge ρ ∈ F \{0}. The prover and the verifier then engage in the standard (not zero P knowledge) sumcheck protocol on the claim “ ~α∈Hm ρF (~α) + R(~α) = ρa + z1”. This reduces the correctness of this claim to checking a claim of the form “ρF (~c) + R(~c) = b” for some ~c ∈ Im and b ∈ F; the prover then decommits to w := R(~c) as above. In sum, the verifier deduces that, with high probability, the claim “ρF (~c) = b − w” is true if and only if the original claim was. If the verifier could evaluate F then the verifier could simply check the aforementioned claim and either accept or reject. We do not give the verifier access to F and, instead, we follow [Mei13] and phrase sumcheck as a reduction from a claim about a sum of a polynomial over a large product space to a claim about the evaluation of that polynomial at a single point. This view of the sumcheck protocol is useful later on when designing more complex protocols, which employ sumcheck as a subprotocol. The completeness and soundness definitions below are thus modified according to this viewpoint, where the verifier simply outputs the claim at the end. Our protocol will be sound relative to a promise variant of the sumcheck protocol, which we now define. Definition 6.1. The sumcheck relation and its promise variant are defined as follows. • The sumcheck relation is the relation RSC of instance-witness pairs (F, m, d, H, a),F such that: – is a finite field, H is a subset of , a is an element of , and m, d are positive integers with md < 1 ; F F F |F| 2 ≤d m – F is a polynomial in F[X1,...,m] and sums to a on H . yes no yes no • The sumcheck promise relation is the pair of relations (RSC, RSC) where RSC := RSC and RSC are the pairs ≤d m (F, m, d, H, a),F such that (F, m, d, H, a) is as above and F is in F[X1,...,m] but does not sum to a on H .
22 Remark 6.2. In the case where the verifier can easily determine that F is low-degree (e.g., F is given as an arithmetic circuit), a protocol for the promise relation can be used to check the plain relation. In our setting, the verifier cannot even access F , and so the promise is necessary. The definition below captures our zero knowledge goal for the sumcheck promise relation, in the Interactive PCP model. The key aspect of this definition is that the simulator is only allowed to make a single query to the summand polynomial F ; in contrast, the definition of [BCFGRS16] allows the simulator to make as many queries to F as the malicious verifier makes to the proof oracle. (Another aspect, motivated by the simulator’s limitation, is that we now have to explicitly consider a bound b on a malicious verifier’s queries.) This strong form of zero knowledge, achieved by severely restricting the simulator’s access to F , is crucial for achieving the results in our paper. Definition 6.3. A b-strong perfect zero knowledge Interactive PCP system for sumcheck with soundness error ε is a pair of interactive algorithms (P,V ) that satisfies the following properties.