<<

arXiv:2001.04383v2 [quant-] 29 Sep 2020 hnfn Ji Zhengfeng ¶ ∗ § ‡ † mi:[email protected] Email: [email protected] Email: [email protected] Email: [email protected] Email: [email protected] Email: 3 5 eateto optn n ahmtclSine,Califo Sciences, Mathematical and Computing of Department eateto optrSineadDprmn fMathemati of Department and Science Computer of Department rdc orltosi tityicue nteset the in included strictly is correlations product oko Fiz e.Mt.Py.21)ad(ug ta. .Ma J. al., et theory (Junge the and from 2012) conjecture embedding Phys. Connes’ of Math. refutation Rev. (Fritz, of work rbe:w hw ypoiiga xlcteape htthe that example, explicit an providing by show, v we entangled problem: the of undecidability connection, known a Using ihmlil -oeflqatmpoessaigentang sharing provers quantum -powerful multiple with otepolmo eiigwehratopae olclgam nonlocal two-player a whether deciding of problem the to 2019). STOC al., combinin et and (Fitzsimons 2019) of FOCS framework ( Wright, of and test (Natarajan low degree from low-individual quantum opments classical the the upon and builds 2018) FOCS proof Our languages. enumerable 2 1 eso htteclass the that show We nimdaebpouto u euti htteei neffici an is there that is result our of byproduct immediate An nttt o unu nomto n atr aionaI California Matter, and Information Quantum for Institute etefrQatmSfwr n nomto,Uiest of University Information, and Software Quantum for Centre ∗ 1 4 nn Natarajan Anand , eateto optrSine nvriyo ea tAust at Texas of University Science, Computer of Department MIP ∗ flnugsta a edcddb lsia eie inter verifier classical a by decided be can that languages of † 2,3 etme 0 2020 30, September hmsVidick Thomas , MIP Abstract ∗ C 1 = qc fqatmcmuigcreain.Following correlations. commuting quantum of RE eeti qa oteclass the to equal is lement ‡ leipisangtv nwrt Tsirelson’s to answer negative a implies alue i ta. 00 yitgaigrcn devel- recent integrating by 2020) al., et Ji, 3 fvnNuanalgebras. Neumann von of onWright John , closure n euto rmteHligProblem Halting the from reduction ent a nage value entangled has e dge eto NtrjnadVidick, and (Natarajan of test -degree hmwt h eusv compression recursive the with them g h hs 01 u eut rvd a provide results our 2011) Phys. th. C qa naIsiueo Technology of Institute rnia siueo Technology of nstitute ftesto unu tensor quantum of the of s nvriyo Toronto of University cs, ehooySydney Technology § 2,3,4 n er Yuen Henry and , RE 1 in frecursively of ra most at or acting 2 1 . ¶ 5 Contents

1 Introduction 5 1.1 Interactive proof systems ...... 6 1.2 Statementofresult ...... 10 1.3 Consequences...... 12 1.4 Openquestions ...... 14

2 Proof Overview 17

3 Preliminaries 24 3.1 Turingmachines ...... 24 3.2 Linearspaces ...... 25 3.3 Finitefields ...... 27 3.3.1 Subfieldsandbases...... 27 3.3.2 Bit string representations ...... 30 3.4 Polynomials and the low-degree code ...... 32 3.5 Linear spaces and registers ...... 33 3.6 Measurements and observables ...... 34 3.7 Generalized Pauli observables ...... 34

4 Conditionally Linear Functions, Distributions, and Samplers 38 4.1 Conditionally linear functions and distributions ...... 38 4.2 Conditionally linear samplers ...... 43

5 Nonlocal Games 46 5.1 Gamesandstrategies ...... 46 5.2 Distancemeasures ...... 48 5.3 Normalformverifiers...... 51

6 Types 53 6.1 Typed samplers, deciders, and verifiers ...... 53 6.2 Graphdistributions ...... 55 6.3 Detyping typed verifiers ...... 57

7 Classical and Quantum Low-degree Tests 60 7.1 The classical low-degree test ...... 60 7.1.1 Thegame...... 60 7.1.2 Complexity of the classical low-degree test...... 64 7.2 TheMagicSquaregame ...... 65 7.3 ThePaulibasistest ...... 69 7.3.1 Thegame...... 69 7.3.2 Canonical parameters and complexity of the Pauli basistest ...... 77

2 8 Introspection Games 80 8.1 Overview ...... 80 8.2 The introspective verifier ...... 81 8.2.1 Complexity of the introspective verifier ...... 86 8.2.2 Introspection theorem ...... 87 8.3 Completeness of the introspective verifier ...... 88 8.3.1 Preliminarylemmas ...... 88 8.3.2 Completeness of the introspective verifier ...... 90 8.4 Soundness of the introspective verifier ...... 92 8.4.1 ThePaulitwirl ...... 92 8.4.2 Preliminarylemmas ...... 98 8.4.3 Soundnessproof ...... 100

9 Oracularization 109 9.1 Overview ...... 109 9.2 Oracularizing normal form verifiers ...... 109 9.3 Completeness and complexity of the oracularized verifier...... 110 9.4 Soundness of the oracularized verifier ...... 112

10 Answer Reduction 114 10.1 Circuit preliminaries ...... 115 10.2 A Cook-Levin theorem for bounded deciders ...... 115 10.3 A succinct 5SAT description for deciders ...... 123 10.4 A PCP for normal form deciders ...... 128 10.4.1 Preliminaries ...... 128 10.4.2 ThePCP ...... 129 10.5 A normal form verifier for the PCP ...... 134 10.5.1 Parameters and notation ...... 134 10.5.2 The answer-reduced verifier ...... 134 10.5.3 Main theorem for answer reduction ...... 136 10.6 Completeness of the answer-reduced verifier ...... 138 10.7 Soundness of the answer-reduced verifier ...... 140

11 Parallel Repetition 148 11.1 The anchoring transformation ...... 148 11.2 Parallel repetition of anchored games ...... 149

12 Gap-preserving Compression 152 12.1 Proof of Theorem 12.1 ...... 152 12.2 An MIP∗ protocol for the ...... 158 12.3 An explicit separation ...... 162

A Analysis of the Pauli basis test 164 A.1 Preliminaries ...... 164 A.2 Strategies ...... 166 A.3 Expanding the Hilbert and defining commuting observables ...... 170

3 A.4 Combining the X and Z measurements...... 174 A.5 Applying the classical low-degree test ...... 181 A.6 Pulling the X and Z measurementsapart...... 189

4 1 Introduction

2 2 For integer n, k 2 define the quantum (spatial) correlation set C (n, k) as the subset of Rn k that ≥ qs contains all (pabxy) representing families of bipartite distributions that can be locally generated in non-relativistic quantum mechanics. Formally, (p ) C (n, k) if and only if there exist separable abxy ∈ qs Hilbert spaces A and B, for every x 1,..., n (resp. y 1,..., n ), a collection of projec- x H H y ∈ { } ∈ { } tions Aa a 1,...,k on A (resp. Bb b 1,...,k on B) that sum to identity, and a state (unit vector) { } ∈{ } H { } ∈{ } H ψ A B such that ∈H ⊗H x y x, y 1,2,..., n , a, b 1,2,..., k , p = ψ∗ A B ψ . (1) ∀ ∈ { } ∀ ∈ { } abxy a ⊗ b that due to the normalization conditions on ψ and on Ax and By , for each x, y, (p ) is a { a } { b } abxy probability distribution on 1,2,..., k 2. By taking direct sums it is easy to see that the set C (n, k) is { } qs convex. Let C (n, k) denote its closure (it is known that C (n, k) = C (n, k), see [Slo19a]). qa qs 6 qa Our main result is that the family of sets Cqa(n, k) n,k N is extraordinarily complex, in the following { } ∈ computational sense. For any 0 < ε < 1 define the ε-weak membership problem for Cqa as the problem of 2 2 deciding, given n, k N and a point p = (p ) Rn k , whether p lies in C (n, k) or is ε-far from ∈ abxy ∈ qa it in ℓ1 distance, promised that one is the case. Then we show that for any given 0 < ε < 1 the ε-weak membership problem for Cqa cannot be solved by a that halts with the correct answer on every input. We show this by directly reducing the Halting problem to the weak membership problem for Cqa: we show that for all 0 < ε < 1 and any Turing machine one can efficiently compute integers n, k N and a linear functional ℓ on Rn2k2 such that, whenever M halts it holds that ∈ M M sup ℓ (p) = 1, (2) M p Cqa(n,k) ∈ whereas if does not halt then M sup ℓ (p) 1 ε . (3) M ≤ − p Cqa(n,k) ∈ By standard results in convex optimization, this implies the aforementioned claim on the undecidability of the ε-weak membership problem for Cqa (for any 0 < ε < 1). Our result has interesting consequences for long-standing conjectures in quantum information theory and the theory of von Neumann algebras. Through a connection that follows from the work of Navascues, Piro- nio, and Acin [NPA08] the undecidability result implies a negative answer to Tsirelson’s problem [Tsi06]. Let Cqc(n, k) denote the set of quantum commuting correlations, which is the set of tuples (pabxy) arising from operators Ax and By acting on a single Hilbert space and a state ψ such that { a } { b } H ∈H x y x y x, y 1,..., n , a, b 1,..., k , p = ψ∗ A B ψ and A , B = 0. (4) ∀ ∈ { } ∀ ∈ { } abxy a b a b Then Tsirelson’s problem asks if, for all n, k, the sets Cqa(n, k)and Cqc(n, k) are equal. Using results 2 2 from [NPA08] we give integer n, k and an explicit linear function ℓ on Rn k such that 1 sup ℓ(p) = 1, but sup ℓ(p) , ≤ 2 p Cqc(n,k) p Cqa(n,k) ∈ ∈ which implies that C (n, k) = C (n, k). By an implication of Fritz [Fri12] and Junge et al. [JNP+11] qa 6 qc we further obtain that Connes’ Embedding Conjecture [Con76] is false; in other words, there exist type II1

5 von Neumann factors that do not embed in an ultrapower of the hyperfinite II1 factor. We explain these connections in more detail in Section 1.3 below. Our approach to constructing such linear functionals on correlation sets goes through the theory of inter- active proofs from complexity theory. To explain this connection we first review the concept of interactive proofs. The reader familiar with interactive proofs may skip the next section to arrive directly at a formal statement of our main complexity-theoretic result in Section 1.2.

1.1 Interactive proof systems An is an abstraction that generalizes the familiar notion of proof. Intuitively, given a formal statement z (for example, “this graph admits a proper 3-coloring”), a proof π for z is information that enables one to check the validity of z more efficiently than without access to the proof (in this example, π could be an explicit assignment of colors to each vertex of the graph). Complexity theory formalizes the notion of proof in a way that emphasizes the role played by the veri- fication procedure. To explain this, first recall that in complexity theory a language L is a subset of 0, 1 ∗, the set of all bit strings of any length, that intuitively represents all problem instances to which the{ answer} should be “yes”. For example, the language L = 3-COLORING contains all strings z such that z is the description (according to some pre-specified encoding scheme) of a 3-colorable graph G. We say that a language L admits efficiently verifiable proofs if there exists an algorithm V (formally, a polynomial-time Turing machine) that satisfies the following two properties: (i) for any z L there is a string π such that ∈ V(z, π) returns 1 (we say that V “accepts”), and (ii) for any z / L there is no string π such that V(z, π) accepts. Property (i) is generally referred to as the completeness∈ property, and (ii) is the soundness. The set of all languages L with both these completeness and soundness properties is denoted by the NP. Research in complexity and cryptography in the 1980s and 1990s led to a significant generalization of the notion of “efficiently verifiable proof”. The first modification is to allow randomized verification procedures by relaxing (i) and (ii) to high probability statements: every z L should have a proof π that is ∈ accepted with probability at least c (the completeness parameter), and for no z / L should there be a proof ∈ π that is accepted with probability larger than s (the soundness parameter). A common setting is to take 2 1 c = 3 and s = 3 ; standard amplification techniques reveal that the exact values do not significantly affect the class of languages that admit such proofs, provided that they are chosen within reasonable bounds. The second modification is to allow interactive verification. Informally, this means that instead of - ceiving a proof string π in its entirety and making a decision based on it, the verification algorithm (called the “verifier”) instead communicates with another algorithm called a “prover”, and based on the communi- cation decides whether z L. There are no restrictions on the computational power of the prover, whereas the verifier is required to run∈ in polynomial time.1 To understand how randomization and interaction can help for proof checking, consider the following example of an interactive proof for the language GRAPH NON-ISOMORPHISM, which contains all pairs of graphs (G , G ) such that G and G are not isomorphic.2 It is not known if GRAPH NON-ISOMORPHISM 0 1 0 1 ∈ NP, because it is not clear how to give an efficiently verifiable proof string that two graphs G0 and G1 are

1The reader may find the following mental model useful: in an interactive proof, an all-powerful prover is trying to convince a skeptical, but computationally limited, verifier that a string z (known to both) lies in the set L, even when it may be that in fact z / L. By interactively interrogating the prover, the verifier can reject false claims, i.e. determine with high statistical confidence ∈ whether z L or not. Importantly, the verifier is allowed to probabilistically and adaptively choose its messages to the prover. 2Here and∈ in the rest of the section, we implicitly assume that graphs and tuples of graphs have a canonical encoding as binary strings.

6 not isomorphic. (A proof of isomorphism is, of course, trivial: given a bijection from the vertices of G0 to those of G1 it is straightforward to verify that the bijection induces an isomorphism.) However, consider the following randomized, interactive verification procedure. Suppose the input to the verifier and prover is a pair of n-vertex graphs (G0, G1) (if the graphs do not have the same number of vertices, they are trivially non-isomorphic and the verification procedure can automatically accept). The verifier first selects a uniformly random b 0, 1 and a uniformly random permutation σ of 1,..., n and sends the graph ∈ { } { } H = σ(G ) to the prover. The prover is then supposed to respond with a bit b 0, 1 ; if b = b the b ′ ∈ { } ′ verifier accepts and if b = b it rejects. ′ 6 Clearly, if G0 and G1 are not isomorphic then there exists a prover strategy to compute b from H with probability 1: using its unlimited computational power, the prover can determine whether H is isomorphic to G0 or to G1. However, if G0 and G1 are isomorphic then the distribution of H is uniform over the isomorphism class of G0, which is the same as the isomorphism class of G1, and the prover (despite having unlimited computational power) cannot distinguish between whether the verifier generated H using G0 or 1 G1. Thus the probability that any prover can correctly guess b′ = b is exactly 2 . As a result, we have shown that the graph non-isomorphism problem has an interactive proof system with completeness c = 1 1 and soundness s = 2 . Note how little “information” is communicated by the prover: a single bit! The extreme succinctness of the “proof” comes from the fact that whether G0 is isomorphic to G1 determines whether a prover can reliably compute, given the data available to it (which is G0,G1, and H), the correct bit b. We denote by IP the class of languages that admit randomized interactive proof systems such as the one just described. The class IP is easily seen to contain NP, but it is thought to be a much larger class: one of the famous results of complexity theory is that IP is exactly the same as PSPACE [LFKN90, Sha90], the class of languages decidable by Turing machines using polynomial space.3 Thus a polynomial-time verifier, when augmented with the ability to interrogate an all-powerful prover and use randomization, can solve computational problems that are (believed to be) vastly more difficult than those that can be checked using static, deterministic proofs (i.e. NP problems).

Multiprover interactive proofs. We now discuss a generalization of interactive proofs called multiprover interactive proofs. Here, a polynomial-time verifier can interact with two (or more) provers to decide whether a given instance z is in a language L or not. In this setting, after the verifier and all the provers receive the common input z, the provers are not allowed to communicate with each other, and the verifier “cross-interrogates” the provers in order to decide if z L. The provers may coordinate a joint strategy ahead of time, but once the protocol begins the provers∈ can only interact with the verifier. As we will see, the extra condition that the provers cannot communicate with each other is a powerful constraint that can be leveraged by the verifier. Consider the computational problem of deciding membership in a promise language called GAP-MAXCUT. A promise language L is specified by two disjoint subsets L , L 0, 1 , and the task is to decide yes no ⊆ { }∗ whether a given instance z is in L or L , promised that z L L . In a proof system for a promise yes no ∈ yes ∪ no language, the completeness case consists of accepting with probability at least c if z L , and the sound- ∈ yes ness case consists of accepting with probability at most s if z Lno. If z / Lyes Lno, then there are no constraints on the behavior of the verifier. ∈ ∈ ∪ The promise language GAP-MAXCUT is defined as follows: GAP-MAXCUTyes (resp. GAP-MAXCUTno)

3The reason PSPACE is considered a “difficult” class of problems is because many computational problems believed to require super-polynomial or exponential time (such as 3-COLORING or deciding whether a quantified Boolean formula is true) can be solved using a polynomial amount of space.

7 is the set of all graphs G with a cut (i.e. a bipartition of the vertices) such that at least 90% of edges cross the cut (resp. at most 60% of edges cross the cut).4 For simplicity, we also assume that all graphs in GAP-MAXCUTyes GAP-MAXCUTno are regular, i.e. the degree is a constant across all vertices in the graph. ∪ The GAP-MAXCUT problem clearly lies in NP, since given a candidate cut it is easy to count the number of edges that cross it and verify that it is at least 90% of the total number of edges. Observe that the length of the proof and the time required to verify it are linear in the size of the graph (the number of vertices and edges). Finding the proof is of course much harder, but we are only concerned with the complexity of the verification procedure. Now consider the following simple two-prover interactive proof system for GAP-MAXCUT. Given a graph G, the verification procedure first samples a uniformly random edge e = u, v in G. It then sends a { } uniformly random x u, v to the first prover, and a uniformly random y u, v to the second prover. ∈ { } ∈ { } Each prover sees its respective question only and is expected to respond with a single bit, a, b 0, 1 ∈ { } respectively. The verification procedure accepts if and only if a = b if x = y, and a = b if x = y. We claim that the verification procedure described in the preceding paragraph is a6multiprover6 interactive proof system for the language GAP-MAXCUT, with completeness c = 0.95 and soundness s = 0.9, in the following sense. First, whenever G GAP-MAXCUT then there is a successful strategy for the provers: ∈ yes specifically, the provers can fix an optimal bipartition and consistently answer “0” when asked about a vertex from one side of the partition, and “1” when asked about a vertex from the other side; assuming there exists 1 1 a cut that is crossed by at least 90% of the edges, this strategy succeeds with probability at least 2 + 2 0.9, 1 where the first factor 2 arises from the case when both provers are sent the same vertex, in which case they always succeed. Conversely, suppose given a strategy for the provers that is accepted with probability p = 1 + 1 (1 δ) 2 2 − when the verification procedure is executed on a (regular) n-vertex graph G. We then claim that G has a cut crossed by at least a 1 2δ of all edges. To show this, we leverage the non-communication assumption on the provers.− Since either prover’s question is always a single vertex, their strategy can be rep- resented by a function from the vertices of G to answers in 0, 1 . Any such function specifies a bipartition { } of G. While the provers’ bipartitions need not be identical, the fact that they succeed with high probability, for the case when they are sent the same vertex, implies that they must be consistent with high probability. Finally, the fact that they also succeed with high probability when sent opposite endpoints of a randomly chosen edge implies that either prover’s bipartition must be cut by a large number of edges. Taking the contrapositive establishes the soundness property. We denote by MIP the class of languages that have multiprover interactive proof systems such as the one described in the preceding paragraph. Note that, in comparison to the NP verification procedure for GAP-MAXCUT considered earlier, the interactive, two-prover verification is much more efficient in terms of the effort required for the verifier. Assuming the graph is provided in a convenient format,5 it is possible to sample a random edge and verify the provers’ answers in time and space that scales logarithmically with the size of the graph. This exponential improvement in the efficiency of the verification procedure serves as the starting point for another celebrated result from complexity theory: MIP is exactly the same as the class NEXP [BFL91], which are problems that admit exponential-time checkable proofs.6 The class NEXP

4The specific numbers 90% and 60% are not too important; the only thing that really matters is that the first one is strictly less than 100% and the second strictly larger than 50%, as otherwise the problem becomes much easier. 5For example, the graph can be specified via a circuit that takes as input an edge — using some arbitrary ordering — and returns labels for the two endpoints of the edge. 6An example of such a problem is the language SUCCINCT-3-COLORING, which contains descriptions of polynomial-size circuits C that specify a 3-colorable graph GC on exponentially many vertices.

8 contains PSPACE, but is believed to be much larger; this suggests that the ability to interrogate more than one prover enables a polynomial-time verifier to verify much more complex statements.

Nonlocal games. In this paper we will only be concerned with multiprover interactive proof systems that consist of a single round of communication with two provers: the verifier first sends its questions to each of the provers, the provers respond with their answers, and the verifier decides whether to accept or reject. The class of problems that admit such interactive proofs is denoted MIP(2, 1), and it is known that MIP = MIP(2, 1) [FL92]. Such proof systems have a convenient reformulation using the language of nonlocal games, that we now explain. In a nonlocal game, we say that a verifier interacts with multiple non-communicating players (instead of provers — there is no formal difference between the two terms). An n-question, k-answer nonlocal game G is specified by two procedures: a question sampling procedure that samples a pair of questions (x, y) 1,..., n 2 for the players according to a distribution µ (known to the verifier and the players), and ∈ { } a decision procedure that takes as input the players’ questions and their respective answers a, b 1,..., k ∈ { } and evaluates a predicate D(x, y, a, b) 0, 1 to determine the verifier’s acceptance or rejection. In classical complexity theory, the main quantity∈ { associated} with a nonlocal game G is its classical value, which is the maximum success probability that two cooperating but non-communicating players have in the game. Formally, the classical value is defined as

val(G)= sup ∑ µ(x, y) ∑ D(x, y, a, b)pabxy , (5) p Cc(n,k) x,y a,b ∈ where the set Cc(n, k) is the set of classical correlations, which are tuples (pabxy) such that there exists a set Λ with probability measure ν and for every λ Λ functions Aλ, Bλ : 1,2,..., n 1,2,..., k such that ∈ { } → { }

λ λ x, y 1,2,..., n , a, b 1,2,..., k , pabxy = Pr (A (x)= a B (y)= b). ∀ ∈ { } ∀ ∈ { } λ ν ∧ ∼ This definition captures the intuitive notion that a classical strategy for the players is specified by (i) a distribution ν on Λ that represents some probabilistic information shared by the players that is independent of the verifier’s questions, and (ii) two functions Aλ, Bλ that represent each players’ “local strategy” for answering given their shared randomness λ and question x or y. 7 Note that due to the shared randomness n2 k2 λ, the set Cc(n, k) is a (closed) convex subset of [0, 1] . To make the connection with interactive proof systems, observe that the assertion that L MIP(2, 1) 8 ∈ precisely amounts to the specification of an efficient mapping from problem instances z to games Gz such that whenever z L then val(G ) 2 , whereas if z / L then val(G ) 1 . Thus the complexity ∈ z ≥ 3 ∈ z ≤ 3 of the optimization problem (5) captures the complexity of the decision problem L. The aforementioned characterization of MIP as the class NEXP by [BFL91] shows that in general this optimization problem is very difficult: it is as hard as deciding any language in NEXP.

7For the functional analyst we briefly note that if we define a tensor Rnk Rnk L = ∑ µ(x, y)D(x, y, a, b)exa eyb x,y,a,b ⊗ ∈ ⊗ then val(G)= L ℓn ℓ k ℓn ℓ k , with ǫ denoting the injective tensor norm of Banach spaces. (For more connections between 1 ( ∞) ǫ 1 ( ∞) interactive proofs,k nonlocalk ⊗ games and tensor⊗ norms we refer to the survey [PV16].) 8Here by “efficient” we mean that there should be a polynomial-time Turing machine that on input z returns (i) a polynomial-size randomized circuit that samples from µ, and (ii) a polynomial-size circuit that evaluates the predicate D.

9 1.2 Statement of result

We now introduce the main complexity class that is the focus of this paper: MIP∗, the “entangled-prover” analogue of the class MIP considered earlier. Informally the class MIP∗, first introduced in [CHTW04], contains all languages that can be decided by a classical polynomial-time verifier interacting with multiple quantum provers sharing entanglement. We focus on the class MIP∗(2, 1), which corresponds to the setting of one-round protocols with two provers. Equivalently, a language L is in MIP∗(2, 1) if and only if there is an efficient mapping from instances z 0, 1 to nonlocal games G such that if z L, then val∗(G ) 2/3 ∈ { }∗ z ∈ z ≥ and otherwise val∗(Gz) 1/3. Here, for an n-question, k-answer game G, we let val∗(G) denote its entangled value, which is≤ defined as

val∗(G)= sup ∑ µ(x, y) ∑ D(x, y, a, b)pabxy , (6) p Cq(n,k) x,y a,b ∈ where the set Cq(n, k) is the set of all finite-dimensional quantum correlations, i.e. correlations of the form (1) where is restricted to be finite-dimensional. Although the sets C (n, k) and C (n, k) in general H q qs are distinct [CS18], it is an easy exercise to verify that they have the same closure Cqa(n, k), and there- fore the supremum in (6) can equivalently be taken over Cq(n, k), Cqs(n, k) or Cqa(n, k). We use Cq for convenience in the analysis. Since Cc(n, k) Cq(n, k) for all n, k, we have that val(G) val∗(G); in other words, quantum spatial strategies can perform⊆ at least as well as classical strategies in a≤ nonlocal game. The consideration of quantum strategies and the set Cqs(n, k) for the definition of MIP∗ is motivated by a long line of works in the foundations of quantum mechanics around the topic of Bell inequalities, that are linear functionals which separate the sets Cc(n, k) and Cqs(n, k). The simplest such functional is the CHSH inequality [CHSH69], that shows Cc(2, 2) ( Cqs(2, 2). The CHSH inequality can be reformulated as a G G > G 1 game such that val∗( ) val( ). This game is very simple: it is defined by setting µ(x, y) = 4 for G 3 all x, y 0, 1 and D(x, y, a, b) = 1 if and only if a b = x y. It can be shown that val( ) = 4 and ∈ { 1 } 1 3 ⊕ ∧ val∗(G)= + > . The study of Bell inequalities is a large area of research not only in foundations, 2 2√2 4 where they are a tool to study the nonlocal properties of entanglement, but also in quantum cryptography, where they form the basis for cryptographic protocols for e.g. quantum key distribution [Eke91]. The introduction of entanglement in the setting of interactive proofs has interesting consequences for complexity theory; indeed it is not a priori clear how the class MIP∗ compares to MIP. Take a language G G 2 L MIP(2, 1), and let z be an instance. Then the associated game z is such that val( z) 3 if z L, ∈ G 1 G G ≥ ∈ and val( z) 3 otherwise. The fact that in general val∗( z) val( z) (and that as demonstrated by the CHSH game≤ inequality can be strict) cuts both ways. On the≥one hand, the soundness property can be affected, so that instances z / L could have val∗(G )= 1, meaning that we would not be able to establish ∈ z that L MIP∗. On the other hand, a language L MIP∗(2, 1) may not necessarily be in MIP, because for ∈ 2 ∈ 1 z L the fact that val∗(G ) does not automatically imply val(G ) > (in other words, the game G ∈ z ≥ 3 z 3 z may require the players to use a quantum strategy in order to win with probability greater than 1/3). Just as the complexity of the class MIP is characterized by the complexity of approximating the classical value of nonlocal games (the optimization problem in (5)), the complexity of MIP∗ is intimately related to the complexity of approximating the entangled value of games (the optimization problem in (6)). In [IV12] the first non-trivial lower bound on MIP∗ was shown, establishing that MIP = NEXP MIP∗. (Earlier results [KKM+11, IKM09] gave more limited hardness results, for approximating the entangled⊆ value to inverse polynomial precision.) This was proved by arguing that for the specific games con- structed by [BFL91] that show NEXP MIP, the classical and entangled values are approximately the ⊆

10 same. In other words, the classical soundness and completeness properties of the proof system of [BFL91] are maintained in the presence of shared entanglement between the provers. Following [IV12] a sequence of works [Vid16, Ji16, NV18b, Ji17, NV18a, FJVY19] established progressively stronger lower bounds on the complexity of approximating the entangled value of nonlocal games, culminating in [NW19] which showed that approximating the entangled value is at least as hard as NEEXP, the collection of languages decidable in non-deterministic doubly exponential time. This proves that NEEXP MIP∗, and since it is known that ⊆ NEXP ( NEEXP it follows that MIP = MIP∗. 6 In contrast to these increasingly strong lower bounds the only upper bound known on MIP∗ is the trivial inclusion MIP∗ RE, the class of recursively enumerable languages, i.e. languages L such that there ⊆ exists a Turing machine such that x L if and only if halts and accepts on input x. This inclusion follows since the supremumM in (6) can be∈ approximated fromM below by performing an exhaustive search in increasing dimension and with increasing accuracy. We note that, in addition to containing all decidable languages, this class also contains undecidable problems such as the Halting problem, which is to decide whether a given Turing machine eventually halts. Our main result is a proof of the reverse inclusion: RE MIP∗. Combined with the preceding observa- tion it follows that ⊆ MIP∗ = RE , which is a full characterization of the power of entangled-prover interactive proofs. In particular for any 0 < ε < 1, it is an undecidable problem to determine whether a given nonlocal game has entangled value 1 or at most 1 ε (promised that one is the case). −

Proof summary. The proof of the inclusion RE MIP∗ is obtained by designing an entangled-prover interactive proof for the Halting problem, which is⊆ complete for the class RE. Specifically, we design an efficient transformation that maps any Turing machine to a nonlocal game G such that, if halts (when run on an empty input tape) then there is a quantumM strategy for the proversM that succeedsM with probability 1 in G (i.e. val∗(G ) = 1), whereas if does not halt then no quantum strategy can M M 1 M 1 succeed with probability larger than in the game (i.e. val∗(G ) ). Furthermore, the game G 2 M ≤ 2 M has the property that whenever val∗(G ) = 1 then this fact is witnessed by a synchronous strategy, i.e. a strategy where the players always giveM the same answer when simultaneously asked the same question. Synchronous strategies, or correlations, were first introduced in [PSS+16] and have played an important role in approaches to CEP based on quantum information and the study of nonlocal games; see e.g. [KPS18] and references therein. In the paper we use a terminology of projective, consistent and commuting (PCC) strategies (see Definition 5.11 in Section 5.1), a notion which implies the notion of being synchronous. A very rough sketch of this construction is as follows (we give a detailed overview in Section 2). Given an infinite family of games Gn n N, we say that the family is uniformly generated if there is a polynomial- { } ∈ time Turing machine that on input n returns a description of the game G . Given a game G and p [0, 1] n ∈ let E (G, p) denote the minimum local dimension of an entangled state shared by the players in order for them to succeed in G with probability at least p. We proceed in two steps. First, we design a compression procedure for a specific class of nonlocal games that we call normal form. Given as input a uniformly generated family Gn n N of normal form {G } ∈ games, the compression procedure returns another uniformly generated family n′ n N of normal form G G { } ∈ games with the following properties: (i) for all n, if val∗( 2n )= 1 then val∗( n′ )= 1, and (ii) for all n, if 1 1 val∗(G n ) then val∗(G ) and moreover 2 ≤ 2 n′ ≤ 2 1 1 2Ω(n) E (G′ , ) max E G n , , 2 . n 2 ≥ 2 2 n   o 11 The construction of this compression procedure is our main contribution. Informally, it combines the recursive compression technique developed in [Ji17, FJVY19] with the so-called “introspection” technique of [NW19] that was used to prove NEEXP MIP∗. The introspection technique itself relies heavily on the quantum low individual degree test of [NV18a⊆ , JNV+20] to robustly self-test certain distributions that arise from constructions of classical probabilistically checkable proofs. The quantum low-degree test and the introspection technique allow us to avoid the shrinking gap limitation of the results from [FJVY19]. In the second step, we use the compression procedure in an iterated fashion to construct an interactive proof system for the Halting problem. Fix a Turing machine and consider the following family of (0) M G N N nonlocal games ,n n : for all n , if halts in at most n steps (when run on an empty input { M } ∈ ∈ M G(0) G(0) 1 9 tape), then val∗( ,n)= 1, and otherwise val∗( ,n) 2 . Constructing suchM a family of games is trivial;M furthermore≤ , they can be made in the “normal form” required by the compression procedure. However, consider applying the compression procedure to obtain a (1) G N N n family of normal form games ,n n . Then for all n , it holds that if halts in at most 2 steps { M } ∈ ∈ M G(1) G(1) 1 then val∗( ,n)= 1, and otherwise val∗( ,n) 2 , and furthermore any strategy that achieves a value M M ≤ Ω 1 2 (n) of at least 2 requires an entangled state of dimension at least 2 . Intuitively, one would expect that iterating this procedure and “taking the limit” gives a family of games (∞) (∞) G N G N ,n n such that if halts then val∗( ,n)= 1 for all n , whereas if does not halt then no { M } ∈ M M ∈ M 1 G(∞) N finite-dimensional strategy can succeed with probability larger than 2 in ,n, for all n ; in particular M ∈ G(∞) 1 val∗( ,n) 2 . Formally, we do not take such a limit but instead define directly the family of games (∞) M ≤ G n N as a fixed point of the Turing machine that implements the compression procedure. The game { ,n} ∈ G M G(∞) can then be taken as ,1. We describe this in more detail in Section 2. M M 1.3 Consequences Our result is motivated by a connection with Tsirelson’s problem from quantum information theory, itself related to Connes’ Embedding Conjecture in the theory of von Neumann algebras [Con76]. In a celebrated sequence of papers, Tsirelson [Tsi93] initiated the systematic study of quantum correlation sets. Recall the definition of the set of quantum spatial correlations

x y x y C (n, k)= (p ) p = ψ A B ψ , ψ A B, xy, A , B POVM , (7) qs abxy | abxy h | a ⊗ b | i | i∈H ⊗H ∀ { a }a { b }b  where here ψ ranges over all unit norm vectors ψ A B with A and B arbitrary (separable) Hilbert spaces,| i and a POVM is defined as a collection| i∈H of positi⊗Hve semidefiniteH operatorsH that sum to identity. (From now on we use the Dirac ket notation ψ for states.) Recall the closure Cqa(n, k) of Cqs(n, k). Tsirelson observed that there is a natural| alternativei definition to the quantum spatial correlation set, called the quantum commuting correlation set and defined as

C (n, k)= (p ) p = ψ Ax By ψ , (8) qc abxy | abxy h | a b | i  y y where ψ is a quantum state, Ax and B are POVMs for all x, y, and [Ax, B ] = 0 for all | i∈H { a } { b } a b a, b, x, y. Note the key difference with spatial correlations is that in (8) all operators act on the same (separa- ble) Hilbert space. The requirement that operators associated with different inputs (questions) x, y commute

9 1 There is nothing special about the choice of 2 ; this can be set to any constant that is less than 1.

12 is arguably a minimal requirement within the context of quantum mechanics for there to not exist any causal connection between outputs (answers) a, b obtained in response to the respective input. The set Cqc(n, k) is closed and convex, and it is easy to see that Cqa(n, k) Cqc(n, k) for all n, k 1. When Tsirelson initially introduced these sets he claimed that equality holds. However,⊆ it was later pointed≥ out that this is not obviously true. The question of equality between Cqc and Cqa (for all n, k) is now known as Tsirelson’s problem [Tsi06]. Let Cq(n, k) denote the same as Cqs(n, k) except that both A and B in (7) are restricted to finite-dimensional spaces. Then more generally one can consider the followingH chainH of inclusions C (n, k) C (n, k) C (n, k) C (n, k) , (9) q ⊆ qs ⊆ qa ⊆ qc for all n, k N, and ask which (if any) of these inclusions are strict. We let C , C , C , C denote the ∈ q qs qa qc of C (n, k), C (n, k), C (n, k), C (n, k), respectively, over all integers n, k N. In a breakthrough q qs qa qc ∈ work, Slofstra established the first separation between these four correlation sets by proving that C = qs 6 Cqc [Slo19b]; he later proved the stronger statement that Cqs = Cqa [Slo19a]. As a consequence of the technique used to demonstrate the separation Slofstra also obtains6 the complexity-theoretic statement that the problem of determining whether an element p lies in Cqc, even promised that if it does, then it also lies in Cqa, is undecidable. Interestingly, this is shown by reduction from the of the halting problem; for our result we reduce from the halting problem (see Section 1.4 for further discussion of this point). Since his work, simpler proofs of Slofstra’s results have been found [DPP19, MR18, Col19]. In [CS18], Coladangelo and Stark showed that Cq = Cqs by exhibiting a 5-input, 3-output correlation that can be attained using infinite-dimensional spatial6 strategies (i.e. infinite-dimensional Hilbert spaces, a state and POVMs satisfying (7)) but cannot be attained via finite-dimensional strategies. As already noted in [FNT14] (and further elaborated on by [FJVY19]), the undecidability of MIP∗(2, 1) 10 implies the separation Cqa = Cqc. This follows from the observation that if Cqa = Cqc, then there exists 6 G G G 1 an algorithm that can correctly determine if a nonlocal game satisfies val∗( )= 1 or val∗( ) 2 and always halts: this algorithm interleaves a hierarchy of semidefinite programs providing outer approximations≤ to the set Cqc [NPA08, DLTW08] with a simple exhaustive search procedure providing inner approximations to Cq. Our result that RE MIP∗(2, 1) implies that no such algorithm exists, thus resolving Tsirelson’s problem in the negative. ⊆ co We furthermore exhibit an explicit nonlocal game G such that val∗(G) < val (G) = 1, where co val (G) is defined as val∗(G) except that the supremum is taken over the set Cqc(n, k) in (8). This in turn yields an explicit correlation that is in the set Cqc but not in Cqa. This game closely resembles the game G described in the sketch of the proof that RE MIP∗, where is the Turing machine that runs the M ⊆ M co hierarchy of semidefinite programs on the game G and halts if it certifies that val (G ) < 1. It is in principle possible to determine an upper bound on theM parameters n, k for our separating correlationM from the proof. While we do not provide such a bound, there is no step in the proof that requires it to be astronomical; e.g. we believe (without proof) that 1020 is a clear upper bound.

Connes’ Embedding Conjecture. Connes’ Embedding Conjecture (CEC) [Con76] is a conjecture in the theory of von Neumann algebras. Briefly, CEC posits that every type II1 von Neumann factor embeds into an ultrapower of the hyperfinite II1 factor. We refer to [Oza13] for a precise formulation of the conjecture and connections to other conjectures in operator algebras, such as Kirchberg’s QWEP conjecture. In inde- pendent work Fritz [Fri12] and Junge et al. [JNP+11] showed that a positive answer to CEC would imply

10Technically [FNT14] make the observation for the commuting-prover analogue MIPco(2, 1), discussed further in Section 1.4, but the reasoning is the same.

13 a positive resolution of Tsirelson’s problem, i.e. that Cqa(n, k) = Cqc(n, k) for all n, k. (This was later promoted to an equivalence by Ozawa [Oza13].) Since our result disproves this equality for some n, k it also implies that CEC does not hold. In work that appeared subsequently to the first announcement of our results, Goldbring and Hart [GH16, GH20] show using arguments from continuous logic that the uncomputability of approximating val∗(G) refutes the CEC. Interestingly, their argument uses general consider- ations from logic and completely bypasses Tsirelson’s problem and its equivalence with Kirchberg’s QWEP conjecture. We note that using the constructive aspect of our result it may be possible to give an explicit description of a factor that does not embed into an ultrapower of the hyperfinite II1 factor, but we do not give such a construction.

Entanglement tests. As a step towards showing our result for any integer n 1 we construct a game G , with question and answer length polynomial in the size of the smallest Turing≥ machine that halts n Mn (on the empty tape) in exactly n steps (i.e. the Kolmogorov complexity of n), such that val∗(Gn) = 1 yet G 1 any quantum strategy that succeeds in n with probability larger than 2 must use an entangled state whose Schmidt rank is at least 2Ω(n). This is by far the most efficient entanglement test that we are aware of.

Prover and round reduction for MIP∗ protocols. Let MIP∗(k, ) denote the collection of languages decidable by MIP∗ protocols with k 2 provers and r rounds. Prior to our work it was known how to ≥ perform round reduction for MIP∗ protocols, at the cost of adding provers; it was shown by [Ji17, FJVY19] that MIP∗(k, r) MIP∗(k + 15, 1) for all k, r. However, it was an open question whether the complexity ⊆ of the class MIP∗ increases if we add more provers. Our main complexity-theoretic result implies that MIP∗ = MIP∗(2, 1). This follows from the following chain of inclusions: for all polynomially-bounded functions k, r, MIP∗(2, 1) MIP∗(k, r) RE MIP∗(2, 1) . ⊆ ⊆ ⊆ The first inclusion follows since the verifier in an MIP∗ protocol can always ignore extra provers and rounds; the second inclusion follows from a simple exhaustive-search procedure that enumerates over strategies for 11 a given MIP∗(k, r) protocol; the third result is proven in this paper. However, this method of reducing provers and rounds in a given MIP∗ protocol is indirect; it involves first converting a given MIP∗ protocol into a Turing machine that accepts if and only if the MIP∗ protocol 1 has value larger than 2 , and then constructing an MIP∗(2, 1) protocol to decide whether the Turing machine halts. In particular this transformation does not generally preserve the complexity of the provers and verifier in the original protocols. We leave it as an open question to find a more direct method for reducing the number of provers in an MIP∗ protocol.

1.4 Open questions We mention several questions left open by our work.

Explicit constructions of counter-examples to Connes’ Embedding Conjecture. We provide an ex- plicit counter-example to Tsirelson’s problem in the form of a game whose entangled value differs from its commuting-operator value. Through the aforementioned connection with Connes’ embedding conjec- ture [Fri12, JNP+11, Oza13], the counter-example may lead to the construction of interesting objects in

11 In fact, we note that the second term MIP∗(k, r) can be replaced by QMIP∗(k, r), which is the analogous class with a quantum verifier and quantum messages, since the first inclusion is trivial and the second remains true. As a result, we obtain that QMIP∗ = MIP∗(2, 1) as well.

14 other areas of mathematics. A first question is whether it can lead to an explicit description of a type II1 factor that does not satisfy the Connes embedding property. Such a construction could be obtained along co the lines of [KPS18], using the fact that our game G such that val∗(G) < val (G) = 1 has the property of being synchronous, i.e. perfect strategies in the game are required to return the same answer when both parties are provided the same question. Going further, one may ask if the example can eventually lead to a construction of a group that is not sofic, or even not hyperlinear (see e.g. [CLP15] for the connection).

The complexity of variants of MIP∗. Our result characterizes the complexity class MIP∗ as the set of recursively enumerable languages. One can also consider the complexity class MIPco, which stands for multiprover interactive proofs in the commuting-operator model. For the sake of the discussion we consider only two-prover one-round protocols; a language L is in MIPco(2, 1) if there exists an efficient reduction that maps z 0, 1 to a nonlocal game G such that if z L then valco(G ) 2 , and otherwise ∈ { }∗ z ∈ z ≥ 3 valco(G ) 1 . x ≤ 3 The semidefinite programming hierarchy of [NPA08, DLTW08] can be used to show that MIPco(2, 1) is contained in the complement of RE, denoted as coRE: to certify that z / L it suffices to run the hierarchy co G < 2 ∈ 12 until it obtains a certificate that val ( z) 3 . Since it is known that RE = coRE, this implies that co 6 MIP∗(2, 1) = MIP (2, 1). It is thus6 plausible that MIPco = coRE,13 which would provide a very pleasing “dual” complexity characterization to MIP∗ = RE. One possible route to proving this would be to adapt our gap-preserving compression framework to the commuting-operator setting by showing that each of the steps (question reduction, answer reduction, and parallel repetition) remain sound against commuting-operator strategies. Using the connection established in [FNT14], this would imply that the operator norm over the maximal C∗ algebra C∗(F2 F2), where F2 is the free group on two elements, is uncomputable. ∗ co Another interesting open question concerns the zero gap variants of MIP∗ and MIP , which we denote co by MIP0∗ and MIP0 , respectively. These classes capture the complexity of deciding whether a nonlocal game G has entangled value (or commuting-operator value respectively) exactly equal to 1. In [Slo19a], Slofs- tra shows that there is an efficient reduction from Turing machines to nonlocal games G such that G co G M M co does not halt if and only if val∗( ) = val ( ) = 1. This implies that coRE = MIP0 (2, 1) M M M and furthermore coRE MIP∗. However, since RE MIP∗(2, 1) MIP∗(2, 1), this implies that ⊆ 0 ⊆ ⊆ 0 MIP0∗(2, 1) is strictly bigger than both RE and coRE. In fact, it was shown by Mousavi, et al. [MNY20] G that the class MIP0∗(3, 1) (the complexity class corresponding to determining whether val∗( ) = 1 for three-player nonlocal games) is equal to Π0, which is the set of all languages L such that x L if and 2 ∈ only if y, zR(x, y, z) = 1 for a R that depends on L.14 Thus it is plausible that ∀ ∃ Π0 the complexity landscape of nonlocal games looks like the following: MIP∗ = RE, MIP0∗ = 2, and co co co MIP = MIP0 = coRE. Such statements about the complexity of MIP∗ versus MIP , in both the gapped and zero-gap cases, may reveal additional insights into the difference between the tensor product and commuting-operator models of correlations.

Acknowledgments. We thank Matthew Coudron, William Slofstra and Jalex Stark for enlightening dis- cussions regarding possible consequences of our work. We thank William Slofstra and Jalex Stark for

12RE = coRE follows from the fact that RE coRE is the set of decidable languages and RE contains undecidable languages. 13We note6 that the “co” modifier on both sides∩ of the equation MIPco = coRE refer to different things! 14 Π0 The class 2 is also characterized as being part of the second level of the arithmetical hierarchy from , Σ0 Π0 where RE = 1 and coRE = 1 form the first level.

15 suggestions regarding explicit separations between Cqa and Cqc. We thank Peter Burton, William Slofstra and Jalex Stark for comments on a previous version. We thank Lewis Bowen for pointing out typos and minor errors in a previous version. Zhengfeng Ji is supported by Australian Research Council (DP200100950). Anand Natarajan is sup- ported by IQIM, an NSF Physics Frontiers Center (NSF Grant PHY-1733907). Thomas Vidick is supported by NSF CAREER Grant CCF-1553477, AFOSR YIP award number FA9550-16-1-0495, a CIFAR Azrieli Global Scholar award, MURI Grant FA9550-18-1-0161 and the IQIM, an NSF Physics Frontiers Center (NSF Grant PHY-1125565) with support of the Gordon and Betty Moore Foundation (GBMF-12500028). Henry Yuen is supported by NSERC Discovery Grant 2019-06636. Part of this work was done while John Wright was at the Massachusetts Institute of Technology. He is supported by IQIM, an NSF Physics Fron- tiers Center (NSF Grant PHY-1733907), and by ARO contract W911NF-17-1-0433.

16 2 Proof Overview

In this section we give an overview of the proof of the inclusion RE MIP∗. Since all interactive proof systems considered in the paper involve a single-round interaction between⊆ a classical verifier and two quantum provers sharing entanglement we generally use the language of nonlocal games to describe such proof systems, and often refer to the provers as “players”. In a nonlocal game G (or simply “game” for short), the verifier can be described as the combination of two procedures: a question sampling procedure that samples a pair of questions (x, y) for the players according to a distribution µ (known to the verifier and the players), and a decision procedure (also known to all parties) that takes as input the players’ questions and their respective answers a, b and evaluates a predicate D(x, y, a, b) 0, 1 to determine the verifier’s ∈ { } acceptance or rejection. Given a description of a nonlocal game G, recall that val∗(G) denotes the entangled value of the game, which is defined as the supremum (6) of the players’ success probability in the game over all finite-dimensional tensor product strategies. (We refer to Section 5 for definitions regarding nonlocal games.) Our results establish the existence of transformations on families of nonlocal games Gn n N hav- ing certain properties. In order to keep track of efficiency (and ultimately, computability){ properties} ∈ it is important to have a way to specify such families in a uniform manner. Towards this we introduce the fol- lowing formalism. A uniformly generated family of games is specified through a pair of Turing machines =( , ) that satisfy certain conditions, in which case the pair is called a normal form verifier. The Tur- V S D ing machine (called a sampler) takes as input an index n N and returns the description of a procedure S ∈ that can be used to sample questions (x, y) in the game (this procedure itself obeys a certain format asso- ciated with “conditionally linear” distributions, defined below). The Turing machine (called a decider) D takes as input an index n, questions (x, y), and answers (a, b), and returns a single-bit decision. For the sake of this proof overview we assume that the sampling and decision procedures run in time polynomial in the index n; we refer to the running time of these procedures as the complexity of the verifier. Given a normal form verifier = ( , ) we associate to it an infinite family of nonlocal games n indexed by positive integers in theV naturalS way.D {V } The main technical result of this paper is a gap-preserving compression transformation on normal form verifiers. The following theorem presents an informal summary of the properties of this transformation. Recall that for a game G and probability 0 p 1, E (G, p) denotes the minimum local dimension of an ≤ ≤ entangled state shared by the players in order for them to succeed in G with probability at least p. Theorem 2.1 (Gap-preserving compression of normal form verifiers, informal). There exists a polynomial- time Turing machine Compress that, when given as input the description of a normal form verifier = V ( , ), outputs the description of another normal form verifier = ( , ) that satisfies the following S D V ′ S ′ D′ properties: for all n N, letting N = 2n, ∈ 1. (Completeness) If val∗( )= 1 then val∗( )= 1. VN Vn′ 1 1 2. (Soundness) If val∗( ) then val∗( ) . VN ≤ 2 Vn′ ≤ 2 Ω(n) 3. (Entanglement lower bound) E ( , 1 ) max E ( , 1 ), 22 . Vn′ 2 ≥ { VN 2 } The formal version of this theorem is stated in Section 12 as Theorem 12.1. The terminology compression is motivated by the fact, implicit in the informal statement of the theorem, that the time complexity of the ver- ifier’s sampling and decision procedures in the game , which is polynomial in n, is exponentially smaller Vn′ than the time complexity of the verifier in the game , which is polynomial in N and thus exponential in VN n.

17 Before giving an overview of the proof of Theorem 2.1 we sketch how the existence of a Turing machine Compress with the properties stated in the theorem implies the inclusion RE MIP∗. Recall that the ⊆ complexity class RE consists of all languages L such that there is a Turing machine that accepts instances M x in L, and does not accept instances x that are not in L (but is not required to terminate on such instances). To show RE MIP∗ we give an MIP∗ protocol for the Halting Problem, which is a complete problem for RE. The Halting⊆ Problem is the language that consists of all Turing machine descriptions such that halts when run on an empty input tape. (For the purposes of this overview, we blur the distinctionM betweenM a Turing machine and its description as a string of bits.) We give a procedure that given a Turing machine M as input returns the description of a normal form verifier M =( M, M) with the following properties. V S D N First, if does eventually halt on an empty input tape, then it holds that for all n , val∗( nM) = 1. M N 1 ∈ V Second, if does not halt then for all n , val∗( nM) 2 . We describeM the procedure that achieves∈ this. Informally,V ≤ the procedure returns the specification of a verifier = ( , ) such that proceeds as follows: on input (n, x, y, a, b) it first executes the V M SM DM DM Turing machine for n steps. If halts, then accepts. Otherwise, computes the description M M DM DM of the compressed verifier ′ = ( ′, ′) that is the output of Compress on input M, then executes the V S D 15 V decision procedure ′(n, x, y, a, b) and accepts if and only if ′ accepts. To show that thisD procedure achieves the claimed transformaDtion, consider two cases. First, observe that if eventually halts in some number of time steps T, then by definition val∗( ) = 1 for all n T. M VnM ≥ Using Theorem 2.1 along with an inductive argument it then follows that val∗( ) = 1 for all n 1. VnM ≥ Second, if never halts, then observe that for any n 1 Theorem 2.1 implies two separate lower bounds M ≥ 1 on the amount of entanglement required to win the game nM with probability at least 2 : the dimension 2Ω(n) V is (a) at least 2 , and (b) at least the dimension needed to win the game 2Mn with probability at least 1 V 2 . Applying an inductive argument it follows that an infinite amount of entanglement is needed to win the 1 game n with any probability greater than 2 . Thus, a sequence of finite-dimension strategies for n cannot V 1 1 V lead to a limiting value larger than 2 , and val∗( nM) 2 . We continue with an overview of the ideas behindV ≤ the proof of Theorem 2.1.

Compression by introspection. To start, it is useful to review the protocol introduced in [NW19] to show the inclusion NEEXP MIP∗. Fix an NEEXP-complete language L. The MIP∗ protocol for NEXP from [NV18b], when scaled⊆ up to decide languages from NEEXP, yields a family of nonlocal games G { z} that are indexed by instances z 0, 1 . The family of games decides L in the sense that for all z, the ∈ { }∗ game G has entangled value 1 if z L, and has entangled value at most 1 if z / L. Furthermore, if z ∈ 2 ∈ n = z is the length of z, the verifier of the game Gz has complexity poly(N) = exp( z ) (recall that we use this| | terminology to refer to an upper bound on the running time of the verifier’s sampling| | and decision procedure). Thus, this family of games does not by itself an MIP∗ protocol for L. To overcome this the main contribution in [NW19] is the design of an efficient compression procedure CompressNW that applies NW specifically to the family of games Gz . When given as input the description of Gz, Compress returns G { } G G G 1 the description of a game z′ such that if val∗( z) = 1, then val∗( z′ ) = 1, and if val∗( z) 2 , then G 1 G ≤ val∗( z′ ) 2 . Furthermore, the complexity of the verifier for z′ is poly(n). Thus the family of games G ≤ z′ decides L and this shows that NEEXP MIP∗, which is the best lower bound known on MIP∗ prior to{ our} work. ⊆ Presented in this way, it is natural to suggest iterating the procedure CompressNW to achieve e.g. the inclusion NEEEXP MIP∗. To explain the difficulty in doing so, we give a little more detail on the ⊆ 15 The fact that the decider M can invoke the Compress procedure on itself follows from a well-known result in computability theory known as Kleene’s recursionD theorem (also called Roger’s fixed point theorem)[Kle54, Rog87].

18 compression procedure. It consists of two main steps: starting from Gz, perform (1) question reduction, and (2) answer reduction. The goal of (1) is to reduce the length of the questions generated by the verifier in Gz from poly(N) to poly(n). The goal of (2) is to achieve the same with respect to the length of answers G expected from the players. Furthermore, the complexity of the verifier of the resulting game z′ should be reduced from poly(N) to poly(n). Part (1) is achieved through a technique referred to as “introspection” where, rather than sampling ques- tions (x, y) of length poly(N) as in the game Gz, the verifier instead executes a carefully crafted nonlo- cal game with the players that (a) requires questions of length poly(n), (b) checks that the players share poly(N) EPR pairs, and (c) checks that the players measure the EPR pairs in such a way as to sample for themselves a question pair (x, y) such that one player gets x and the other player gets y. In other words, the players are essentially forced to introspectively ask themselves the questions of Gz. After question reduction, the players still respond with poly(N)-length answers, which the verifier has to check satisfies the decision predicate of the original game Gz. The goal of Part (2) is to enable the decision procedure to implement the verification procedure while not requiring the entire full-length answers from the players. In the answer reduction scheme of [NW19] this is achieved by having the verifier run a probabilistically checkable proof (PCP) with the players so that they succinctly prove that first, they have introspected questions (x, y) from the correct distribution, and second, that they are able to generate poly(N)-length answers (a, b) that would satisfy the decision predicate of the original game Gz when executed on (x, y) and (a, b). Since the questions and answers in the PCP are of length poly(n), this achieves the desired answer length reduction. Iterating this scheme presents a number of immediate difficulties that have to do with the fact that the G G sampling and decision procedures of the verifier in z′ do not have such a nice form as those in z. First of all, the compression procedure of [NW19] can only “introspect” a specific question distribution of a nonlocal game from [NV18b]; we call this distribution a “low-degree test distribution”.16 However, the resulting question distribution of the question-reduced verifier, which is used to check the introspection, has a much more complex structure. A similar issue arises with the modifications required to perform answer reduction. In the PCP employed to achieve this the question distribution appears to be much more complex than the low-degree test distribution (this is in large part due to the need for a specially tailored PCP procedure that encodes separately different chunks of the witness, corresponding to answers from different G players). As a result it is entirely unclear at first whether the question distribution used by the verifier in z′ can be “introspected” for a second time. To overcome these difficulties we identify a natural class of question distributions, called conditionally linear distributions, that generalize the low-degree test distribution. We show that conditionally linear distri- butions can be “introspected” using conditionally linear distributions only, enabling recursive introspection. (In particular, they are a rich enough class to capture the types of question distributions produced by the com- pression scheme of [NW19].) We define normal form verifiers by restricting their sampling procedure to generate conditionally linear question distributions, and this allows us to obtain the compression procedure on normal form verifiers described in Theorem 2.1. Conceptually, the identification of a natural class of distributions that is “closed under introspection” is a key step that enables the introspection technique to be applied recursively. (As we will see later, other

16The nonlocal game of [NV18b] is part of a more general family of games called “low-degree tests”, which have been studied extensively for their applications in complexity theory [BFL91], property testing [RS96], and PCPs[AS98]. Generally, the question distribution of a low-degree test is a random question pair (x, y) where y is a randomly chosen affine subspace in Fm and x is a uniformly random point on y, where F is a finite field and m 2 is an integer. The specific low-degree test game in [NV18b] (which is based on the low-degree test of [RS97]) uses two-dimensional≥ subspaces; thus the question distribution of [RS97, NV18b] is referred to as the “plane-point distribution”.

19 closure properties of conditionally linear distributions, such as taking direct products, play an important role as well.) Since conditionally linear distributions are central to our construction we describe them next.

Conditionally linear distributions. Fix a vector space V that is identified with Fm, for a finite field F and integer m. Informally (see Definition 4.1 for a precise definition), a function L on V is conditionally linear (CL for short) if it can be evaluated by a procedure that takes the following form: (i) read a substring z(1) (1) of z; (ii) evaluate a linear function L1 on z ; (iii) repeat steps (i) and (ii) with the remaining coordinates z z(1), such that the next steps are allowed to depend in an arbitrary way on L (z(1)) but not directly on \ 1 z(1) itself. What distinguishes a function of this form from an arbitrary function is that we restrict the number of iterations of (i)—(ii) to a constant number (at most 9, in our case). (One may also think of CL functions as “adaptively linear” functions, where the number of “levels” of adaptivity is the number of iterations of (i)—(ii).) A distribution µ over pairs (x, y) V V is called conditionally linear if it is the ∈ × image under a pair of conditionally linear functions LA, LB : V V of the uniform distribution on V, i.e. → (x, y) (LA(z), LB(z)) for uniformly random z V. An∼ important class of CL distributions are low-degree∈ test distributions, which are distributions over question pairs (x, y) where y is a randomly chosen affine subspace of Fm and x is a uniformly random point on y. We explain this for the case where the random subspace y is one-dimensional (i.e. a line). Let m A V = VX VV where VX = VV = F . Let L be the projection onto VX (i.e. it maps (x, v) (x, 0) ⊕ B LN LN → where x VX and v VV). Define L : V V as the map (x, v) (L (x), v) where L : VX VX ∈ ∈ → 7→ v v → is a linear map that, for every v VV, projects onto a complementary subspace to the one-dimensional ∈ B subspace of VX spanned by v (one can think of this as an “orthogonal subspace” to the span of v ). L is { } conditionally linear because it can be seen as first reading the substring v VV (which can be interpreted as LN ∈ specifying the direction of a line), and then applying a linear map L to x VX (which can be interpreted v ∈ as specifying a canonical point on the line ℓ = x + tv : t F ). It is not hard to see (and shown formally { ∈ } in Section 7.1.1) that the distribution of (LA(z), LB(z)) for z uniform in V, is identical (up to relabeling) to the low-degree test distribution (x, ℓ) where ℓ is a uniformly random affine line in Fm, and x is a uniformly random point on ℓ. Our main result about CL distributions, presented in Section 8, is that any CL distribution µ, associated with a pair of CL functions (LA, LB) over a linear space V = Fm, can be “introspected” using a CL distribution that is “exponentially smaller” than the initial distribution. Slightly more formally, to any CL distribution µ we associate a two-player game Gµ (called the “introspection game”) in which questions m from the verifier are sampled from a CL distribution µ′ over F ′ for some m′ = poly log(m) and such that in any successful strategy for the game Gµ, when the players are queried on a special question labeled INTRO, they must respond with a pair (x, y) that is approximately distributed according to µ. (The game allows us to do more: it allows us to conclude how the players obtained (x, y) — by measuring shared EPR pairs in a specific basis — and this will be important when using the game as part of a larger protocol that involves other checks.) Crucially for us, the distribution µ′ only depends on a size parameter associated with (LA, LB) (essentially, the integer m together with the number of “levels” of adaptivity of LA and LB), but A B not on any other structural property of (L , L ). Only the decision predicate for the introspection game Gµ depends on the entire description of (LA, LB). We say a few words about the design of µ′ and the associated introspection game, which borrow heavily from [NW19]. Building on the “quantum low-degree test” introduced in [NV18a] it is already known how a verifier can force a pair of players to measure m EPR pairs in either the computational or Hadamard basis and report the (necessarily identical) outcome z obtained, all the while using questions of length polylogarithmic in m only. The added difficulty is to ensure that a player obtains, and returns, precisely the information about

20 z that is contained in LA(z) (resp. LB(z)), and not more. A simple example is the line-point distribution described earlier: there, the idea to ensure that e.g. the “point” player only obtains the first component, x of (x, v) VX VV, the verifier demands that the “point” player measures their qubits associated with ∈ ⊕ the space VV in the Hadamard, instead of computational, basis; due to the uncertainty principle this has the effect of “erasing” the outcome in the computational basis. The case of the “line” player is a little more complex: the goal is to ensure that, conditioned on the specification of the line ℓ received by the “line” player, the point x received by the “point” player is uniformly random within ℓ. This was shown to be possible in [NW19]. We can now describe how samplers of normal form verifiers are defined: these are Turing machines that specify an infinite family of CL distributions µn by computing, for each index n, the CL func- S A, n B, n { } tions (L , L ) associated with µn. (See Definition 4.12 for a formal definition of samplers.) Thus, the question distributions of a normal form verifier =( , ) are the CL distributions corresponding to . V S D S Question reduction. Just like the compression procedure of [NW19], the compression procedure Compress of Theorem 2.1 begins with performing question reduction on the input game. Given a normal form verifier = ( , ), the procedure Compress first computes a normal form verifier INTRO = ( INTRO, INTRO) V S D V S D where for all n N, the game INTRO consists of playing the original game where N = 2n, except ∈ Vn VN that instead of sampling the questions according to the CL distribution µN specified by the sampler N, G S the verifier executes the introspection game µN described in the previous subsection. Thus, in the game INTRO, when both players receive the question labeled INTRO they are expected to sample (x, y) respec- Vn tively according to µN, and respond with the sampled question together with answers a, b respectively. The decider INTRO on index n evaluates (N, x, y, a, b) and accepts if and only if accepts. As a result the D D D time complexity of decider INTRO on index n remains that of , i.e. poly(N). However, the length of INTRO D INTROD questions asked in n and the complexity of the sampler are exponentially reduced, to poly(n). V S INTRO For convenience we refer to the questions asked by the verifier in the “question-reduced” game n INTRO V as “small questions,” and the questions that are introspected by the players in n (equivalently, the questions asked in the original game ) as “big questions.” V VN Answer reduction. Having reduced the complexity of the question sampling, the next step in the compres- sion procedure Compress is to reduce the complexity of decider INTRO from poly(N) to poly(n) (which D necessarily implies reducing the answer length to poly(n)). To achieve this the compression procedure computes a normal form verifier AR = ( AR, AR) from INTRO such that both the sampler and decider V S D V complexity in AR are poly(n) (here, AR stands for “answer reduction”). Similarly toV the answer reduction performed in [NW19], at a high level this is achieved by composing the game INTRO with a probabilistically checkable proof (PCP). In our context a PCP is a proof encoding Vn that allows a verifier to check whether, given Turing machine and time bound T provided as input, there A exists some input x that accepts in time T. The PCP proof can be computed from , T, and the accepting A A input (if it exists) and has length polynomial in T and the description length of . Crucially, the verifier can check a purported proof while only reading a constant number of symbols|A| ofA it, each of length polylog(T, ), and executing a verification procedure that runs in time polylog(T, ). We use PCPs|A| for answer reduction as follows. The verifier in the game AR samples|A| questions as INTRO Vn Vn would and sends them to the players. Instead of receiving the introspected questions and answers (x, y, a, b) for the original game and running the decision procedure (N, x, y, a, b), the verifier instead asks the VN D players to compute a PCP Π for the statement that the original decider accepts the input (N, x, y, a, b) in D time T = poly(N). The verifier then samples additional questions for the players that ask them to return

21 specific entries of the proof Π. Finally, upon receipt of the players’ answers, the verifier executes the PCP verification procedure. Because of the efficiency of the PCP, both the sampling of the additional questions and the decision procedure can be executed in time poly(n).17 This very rough sketch presents some immediate difficulties. A first difficulty is that in general no player by themselves has access to the entire input (N, x, y, a, b) to , so no player can compute the entire proof Π. We discuss this issue in the next paragraph. A second difficulDty is that a black-box application of an existing AR PCP, as done in [NW19], results in a question distribution for n (i.e. the sampling of the proof locations to be queried) that is rather complex — and in particular, itV may no longer fall within the framework of CL distributions for which we can do introspection. To avoid this, we design a bespoke PCP based on the classical MIP for NEXP (in particular, we borrow and adapt techniques from [BSS05, BSGH+06]). Two essential properties for us are that (i) the PCP proof is a collection of several low-degree polynomials, two of INTRO which are low-degree encodings of each player’s big answer in the game n , and (ii) verifying the proof only requires (a) running low-degree tests, (b) querying all polynomialsV at a uniformly random point, and (c) performing simple consistency checks. Property (i) allows us to eliminate the extra layer of encoding in [NW19], who had to consider a PCP of proximity for a circuit applied to the low-degree encodings of AR the players’ big answers. Property (ii) allows us to ensure that the question distribution employed by n remains conditionally linear. V

Oracularization. The preceding paragraph raises a non-trivial difficulty. In order for the players to com- pute a proof for the claim that (N, x, y, a, b) = 1 they need to know the entire input (x, y, a, b). However, D in general a player only has access to their own question and answer: one player only knows (x, a) and the other player knows (y, b). The standard way of circumventing this difficulty is to consider an “orac- ularized” version of the game, where one player gets both questions (x, y) and is able to determine both answers (a, b), while the other player only gets one of the questions at random, and is only asked for one of the answers, that is then checked for consistency with the first player’s answer. While this technique works well for games with classical players, when the players are allowed to use quantum strategies using entanglement oracularization does not, in general, preserve the completeness prop- erty of the game. To ensure that completeness is preserved we need an additional property of a completeness- achieving strategy for the original game: that there exists a commuting and consistent strategy on all pairs of questions (x, y) that are asked in the game with positive probability. Here commuting means that the mea- surement Ax performed by the player receiving x commutes with the measurement By performed by { a }a { b }b the player receiving y.18 Consistent means that if both players perform measurements associated with the same question they obtain the same answer. If both properties hold then in the oracularized game when one player receives a pair (x, y) and the other player receives the question x (say), the first player can simultane- x y ously measure both Aa a and Bb b on their own space to obtain a pair of answers (a, b), and the second { x } { } player can measure Aa a to obtain a consistent answer a. For answer reduction{ } to be possible it is thus applied to the oracularized version of the introspection INTRO INTRO game n . This in turn requires us to ensure that the introspection game n has a commuting and V INTROV consistent strategy achieving value 1 whenever it is the case that val∗( n ) = 1. For this property to hold we verify that it holds for the initial game that is used to seed the compressionV procedure (this is true because we can start with an MIP∗ protocol for NEXP for which there exists a perfect classical strategy)

17This idea is inspired by the technique of composition in the PCP literature, in which the complexity of a verification procedure can be reduced by composing a proof system (often a PCP itself) with another PCP. 18We stress that the commuting property only applies to question pairs that occur with positive probability, and does not mean that all pairs of measurement operators are required to commute; indeed this would imply that the strategy is effectively classical.

22 and we also ensure that each of the transformations of the compression protocol (question reduction, answer reduction, and parallel repetition described next) maintains it.

Parallel repetition. The combined steps of question reduction (via introspection) and answer reduction AR (via PCP composition) result in a game n such that the complexity of the verifier is poly(n). Further- V AR more, if the original game N has value 1, then n also has value 1. Unfortunately the sequence of V V 1 transformations incurs a loss in the soundness parameters: if val∗( N) 2 , then we can only establish that AR < 1 V ≤ val∗( n ) 1 C for some positive constant C 2 (we call C the soundness gap). Such a loss would preventV us from≤ − recursively applying the compression procedure Compress an arbitrary number of times, which is needed to obtain the desired complexity results for MIP∗. To overcome this we need a final transformation to restore the soundness gap of the games after answer 1 reduction to a constant larger than 2 . To achieve this we use the technique of parallel repetition. The parallel repetition of a game G is another nonlocal game Gk, for some number of repetitions k, which consists of playing k independent and simultaneous instances of G and accepting if and only if all k instances accept. Intuitively, parallel repetition is meant to decrease the value of a game G exponentially fast in k, provided val∗(G) < 1 to begin with. However, it is an open question of whether this is generally true for the entangled value val∗. Nevertheless, some variants of parallel repetition are known to achieve exponential amplification. We use a variant called “anchored parallel repetition” and introduced in [BVY17]. This allows us to devise REP a transformation that efficiently amplifies the soundness gap to a constant. The resulting game n has AR REP V the property that if val∗( n ) = 1, then val∗( n ) = 1 (and moreover this is achieved using a com- V V AR > muting and consistent strategy), whereas if val∗( n ) 1 C for some universal constant C 0 then REP 1 V ≤ − val∗( n ) 2 . Furthermore, we have the additional property, essential for us, that good strategies in V R≤EP AR the game n require as much entanglement as good strategies in the game n (which in turn require as much entanglementV as good strategies in the game ). The complexity of theV verifier in REP remains VN Vn poly(n). The anchored parallel repetition procedure, when applied to a normal form verifier, also yields a normal form verifier: this is because the direct product of CL distributions is still conditionally linear.

Putting it all together. This completes the overview of the transformations performed by the compres- sion procedure Compress of Theorem 2.1. To summarize, given an input normal form verifier , question reduction is applied to obtain INTRO, answer reduction is applied to the oracularized version ofV INTRO to obtain AR, and anchored parallelV repetition is applied to obtain REP, which is returned by the compressionV procedure.V Each of these transformations preserves completenessV (including the commuting and consistent properties of a value-1 strategy) as well as the entanglement requirements of each game; moreover, the overall transformation preserves soundness.

23 3 Preliminaries

Notation. We use Σ to denote a finite alphabet. N is the set of positive integers. For w 0, 1 , w ∈ { } denotes 1 w. For w A, B , w = B if w = A and w = A otherwise. (For notational convenience we − ∈ { } often implicitly make the identifications 1 A and 2 B.) We use F to denote a finite field. We write ↔ ↔ Mn(F) to denote the set of n n matrices over F. We write I to denote the identity operator on a vector space. We write Tr( ) for the× matrix trace. We write to denote a separable Hilbert space. For a linear · H operator T, T denotes the operator norm. k k Asymptotics. All logarithms are base 2. We use the notation O( ), poly( ), and polylog( ) in the fol- · · · lowing way. For f , g : N R we write f (n) = O(g(n)) (omitting the integer n when it is clear from → + context) to mean that there exists a constant C > 0 such that for all n N, f (n) Cg(n). When we ∈ ≤ write f (a1,..., ak) = poly(a1,..., ak), this indicates that there exists a universal constant C > 0 (which may vary each time the notation is used in the paper) such that f (a ,..., a ) C(a a )C for all pos- 1 k ≤ 1 · · · k itive a1,..., ak. Similarly, when we write f (a1,..., ak) = polylog(a1,..., ak), there exists a universal k C constant C such that f (a1,..., ak) C ∏i=1 log (1 + ai) for all positive a1,..., ak. Finally, we write k≤ 19 log(a1,..., ak) as short hand for ∏i=1 log(1 + ai).

3.1 Turing machines Turing machines are a model of computation introduced in [Tur37]. Turing machines play a central role in our modeling of verifiers for nonlocal games. For an in-depth discussion of Turing machines, we refer the reader to Papadimitriou’s textbook [Pap94]. Here we establish notation used throughout the paper. All Turing machines considered in the paper are deterministic and use the binary alphabet. The tapes of a Turing machine are infinite one-dimensional arrays of cells that are indexed by natural numbers. A k-input Turing machine has k input tapes, one work tape, and one output tape. Each cell of a tape has symbols M taken either from the set 0, 1 or the blank symbol . At the start of the execution of a Turing machine, the work and output tapes{ are} initialized to have all blank⊔ symbols. A Turing machine halts when it enters a designated halt state. The output of a Turing machine, when it halts, is the binary string that occupies the longest initial stretch of the output tape that does not have a blank symbol. If there are only blank symbols on the output tape, then by convention we say that the Turing machine’s output is 0. Every k-input Turing machine computes a (partial) function f : ( 0, 1 )k 0, 1 where the M { }∗ → { }∗ function is only defined on subset S ( 0, 1 )k of inputs x on which halts. We use (x , x ,..., x ) ⊆ { }∗ M M 1 2 k to denote the output of a k-input Turing machine when x 0, 1 is written on the i-th input tape for M i ∈ { }∗ i 1,2,..., k . If does not halt on an input x, then we define (x) to be . A Turing machine that halts∈ { on all inputs} computesM a total function. M ⊥ We often leave the number of input tapes of a Turing machine implicit. The time complexity of a Turing machine on input x = (x1, x2,..., xk), denoted by TIME , x, is the number of time steps that M M M takes on input x before it enters its designated halt state; if never halts on input x, then we define M TIME ,x = ∞. TheM finite number of states and the transition rules of a Turing machine can be encoded as a bit M string 0, 1 , called the description of . For every integer k N and every string α 0, 1 , M ∈ { }∗ M ∈ ∈ { }∗ the k-input Turing machine described by α is denoted [α]k. We assume without loss of generality that for

19The additional 1 in the argument of the log( ) is to ensure that this quantity is strictly positive. ·

24 all k N, every bit string represents some k-input Turing machine and every k-input Turing machine is represented∈ by infinitely many different bit strings. Throughout the paper we frequently construct Turing machines that run or simulate other Turing ma- chines. Implicitly we assume that this simulation can be done efficiently, as given by the following re- sult [HS65]. We assume that there is an unambiguous, binary encoding enc (x) of k-tuples x ( 0, 1 )k k ∈ { }∗ that is computable in time O(k + x + + x ) where x is the i-th component of the x.20 | 1| · · · | k| i Theorem 3.1 (Efficient universal Turing machine [HS65]). For all k N, there exists a 2-input Turing ∈ machine such that for every x ( 0, 1 )k and α 0, 1 , (α,enc (x)) = [α] (x). Moreover, Uk ∈ { }∗ ∈ { }∗ Uk k k if [α] halts on input x in T steps then (α, x) halts within C( x T2) steps,21 where C is a constant k Uk | | · depending only on k and the number of states of [α] , and x denotes the sum of lengths x + + x . k | | | 1| · · · | k| The existence of an efficient universal Turing machine such as the one described in Theorem 3.1 implies that a Turing machine can call another Turing machine as a “subroutine” with polynomial overhead. M N Remark 3.2. Throughout this paper, we often define Turing machines with some number k of inputs, M and then for some string a 0, 1 define a (k 1)-input Turing machine whose behavior on input ∈ { }∗ − Ma (y1,..., yk 1) is to execute on input (a, y1,..., yk 1). Informally, the Turing machine a “hardwires” − M − M the input a onto the first tape of . We often informally write a(y1,..., yk 1)= (a, y1,..., yk 1). M M − M − More precisely, the Turing machine a on input (y1,..., yk 1) first computes the encoding M − e = enck(a, y1,..., yk 1), and then runs the universal Turing machine k on input ( , e). The description − U M of can be computed from and a in time O( a + + O(1)), where the O(1) comes from the Ma M | | |M| description length of the universal Turing machine k (and thus depends on k). U 2 Furthermore, the time complexity of the Turing machine a on input (y1,..., yk 1) is at most O(T ( a + M − | | y1 + + yk 1 )) where T is the time complexity of on input (a, y1,..., yk 1). This bound comes | | · · · | − | M − from the complexity of encoding the inputs from the k tapes into the enc ( ) format, and then simulating k · M on input (a, y1,..., yk 1). − Remark 3.3. Although the inputs and outputs of a Turing machine are strictly speaking binary strings, we oftentimes slightly abuse notation and specify Turing machines that treat their inputs and outputs as objects with more structure, such as finite field elements, integers, symbols from a larger alphabet, and so on. In this case we implicitly assume that the Turing machine specification uses a consistent convention to represent these structured objects as binary strings. Conventions for objects such as integers are straightforward. For representations of finite field elements, we refer the reader to Section 3.3.2.

3.2 Linear spaces Linear spaces considered in the paper generally take the form V = Fn for a finite field F and integer n 1. ≥ In particular, when we write “let V be a linear space”, unless explicitly stated otherwise we always mean a space of the form Fn. Let E = e , e ,..., e denote the standard basis of V, where for i 1,2,..., n , { 1 2 n} ∈ { } ei =(0,...,0,1,0,...,0) has a 1 in the i-th coordinate and 0’s elsewhere. We write End(V) to denote the set of linear transformations from V to itself. 20 A canonical choice of such an encoding is the following: a tuple (x1,..., xk) is in encoded into a concatenation of a “dual rail” encoding of each x : every bit of x is expanded via the map 0 01, 1 10, and the end of the string x is indicated by 00. i i 7→ 7→ i 21The factor x comes from the fact that simulating the movement of k tape heads using one may require the single tape head to sweep over the| entire| input x.

25 Definition 3.4 (Register subspace). A register subspace S of V is a subspace that is the span of a subset of the standard basis of V.22 We often represent such a subspace as an indicator vector u 0, 1 s, where ∈ { } s = dim(V), such that if e ,..., e is the standard basis of V then S = span e u = 1 . { 1 s} { i| i } Fn n Definition 3.5. Let E = ei be the standard basis of V = . For two vectors u = ∑i=1 uiei, v = n { } ∑i=1 viei in V, define the dot product n u v = ∑ uivi F . · i=1 ∈ Let S be a subspace of V. The subspace orthogonal to S in V is

S⊥ = u V : u v = 0 for all v S . ∈ · ∈  Although the notation S⊥ does not explicitly refer to V, the ambient space will always be clear from context. We note that over finite fields, the notion of orthogonality does not possess all of the same intuitive properties of orthogonality over fields such as R or C; for example, a non-zero subspace S may be orthogonal to itself (e.g. span (1, 1) over F ). However, the following remains true over all fields. { } 2 Lemma 3.6. Suppose S is a subspace of V. Then

(S⊥)⊥ = S .

Furthermore, dim(S)+ dim(S⊥)= dim(V).

Proof. The “furthermore” part follows from the fact that vectors in S⊥ are the solution to a feasible linear system of equations with dim(S) linearly independent rows; this implies that the solution space has dimen- sion exactly dim(V) dim(S). Next, we argue that S (S ) . Let u S. Since all vectors v S − ⊆ ⊥ ⊥ ∈ ∈ ⊥ are orthogonal to every vector in S, in particular u, this implies that u (S ) . By dimension counting, it ∈ ⊥ ⊥ follows that (S⊥)⊥ = S. Definition 3.7. Given a linear space V, two subspaces S and T of V are said to form a pair of complementary subspaces of V if S T = 0 , S + T = V . ∩ { } In this case, we write V = S T. Any x V can be written as x = xS + xT for xS S and xT T in ⊕ ∈ ∈ ∈ a unique way. We refer to xS (resp. xT) as the projection of x onto S parallel to T (resp. onto T parallel to S). We call the unique linear map L : V V that maps x xS the projector onto S parallel to T. → 7→ A given subspace may have many different complementary subspaces: consider the example of S = span (1, 1) in F2. Different complementary subspaces include T = span (1, 0) and T = span (0, 1) . { } 2 { } ′ { } It is convenient to define the notion of a canonical complement of a subspace S, given a basis for S. Definition 3.8. Let E be the standard basis of linear space V = Fn. Let F = v , v ,..., v V be a set { 1 2 m}⊂ of m linearly independent vectors in V. The canonical complement F⊥ of F is the set of n m independent n − vectors defined as follows. Write vi = ∑j=1 ai,j ej. Using a canonical algorithm for Gaussian elimination that works over arbitrary fields, transform the m n matrix (a ) to reduced row echelon form (b ). Let × i,j i,j J be the set of m column indices of the leading 1 entry in each row of (bi,j). The canonical complement is defined as F = e : j J . ⊥ { j 6∈ } 22The use of the term “register” is meant to create an analogy for how the space of multiple qubits is often partitioned into “registers” containing a few qubits each.

26 Remark 3.9. We emphasize that the canonical complement F⊥ is a set (rather than a subspace), and it is defined with respect to a set of linearly independent vectors F. While there are equivalent ways of defining a canonical complement of a subspace in a basis-independent manner, we use this particular definition because it makes it clear that the canonical complement of a set F is efficiently computable. Remark 3.10. Let E be the standard basis of V. Suppose subspace S is a register subspace of V spanned by E E. Then it is not hard to verify that the canonical complement of S is the span of E E and coincides 0 ⊆ \ 0 with S⊥. Lemma 3.11. Let S be the span of linearly independent vectors F = v ,..., v V and let F be the { 1 m} ⊆ ⊥ canonical complement of F as defined in Definition 3.8. Let T = span(F⊥). Then S T = 0 , S + T = V . ∩ { } Proof. Let A =(a ) be the m n matrix over F associated with the v as in Definition 3.8. Write A = UB i,j × i where U is invertible and B is in reduced row echelon form. Let J be as in Definition 3.8. Then the columns of A indexed by J are linearly independent and span Fm. This means that for any vector u V there is ∈ v S such that u = v for all j J. Then u = v + w for some w T. This shows S + T = V. Counting ∈ j j ∈ ∈ dimensions shows that necessarily S T = 0 . ∩ { } Definition 3.12. Let F V be a set of linearly independent vectors. Let F be the canonical complement ⊆ ⊥ of F. Define the canonical linear map L End(V) with kernel basis F as the projector onto T parallel to ∈ S, where S = span(F) and T = span(F⊥). When the basis F for S is clear from context, we refer to this map as the canonical linear map with kernel S. Definition 3.13. Let L End(V) be a linear map, and let F be a basis for ker(L) . Define L : V V ∈ ⊥ ⊥ → as the canonical linear map with kernel basis F. Lemma 3.14. Let L End(V) be a linear map and F a basis for ker(L) . Let L End(V) be the ∈ ⊥ ⊥ ∈ linear map defined in Definition 3.13. Then ker(L⊥)= ker(L)⊥.

Proof. Let F⊥ be the canonical complement of F. By definition, L⊥ is the projector onto span(F⊥) parallel to span(F) = ker(L)⊥. By Lemma 3.11, span(F⊥) and span(F) are complementary subspaces, and the projector onto span(F⊥) parallel to ker(L)⊥ must map all vectors in ker(L)⊥ to 0. Furthermore, if the projector maps a vector v to 0, it must be that v ker(L) . ∈ ⊥ 3.3 Finite fields

k Let p be a and q = p be a prime power. We denote the finite fields of p and q elements by Fp and Fq respectively. The prime p is the characteristic of field Fq, and field Fp is the prime subfield of Fq. We sometimes omit the subscript and simply use F to denote the finite field when the size of the field is implicit from context. For general background on finite fields, and explicit algorithms for elementary arithmetic operations, we refer to [MP13].

3.3.1 Subfields and bases F F F Let q be a prime power and k an integer. The field q is a subfield of qk and qk is a linear space of F k F F dimension k over q. Let ei i=1 be a basis of qk as a linear space over q. Introduce a bijection κq : k { k} k F k F between F k and F defined with respect to the basis e by q → q q q { i}i=1 κ : a (a )k q 7→ i i=1 27 k F where a = ∑i=1 aiei. This map satisfies several nice properties. First, the map is q-linear and, in particular, k addition in F k naturally corresponds to vector addition in F . Namely, for all a, b F k , q q ∈ q

κq(a + b)= κq(a)+ κq(b) .

k Second, multiplication by a field element in F k corresponds to a linear map on F . For all a F k , there q q ∈ q exists a matrix K M (F ) such that for all b F k , a ∈ k q ∈ q

κq(ab)= Ka κq(b) .

k The matrix Ka is called the multiplication table of a with respect to basis ei i=1. { } n We extend the map κ to vectors, matrices and sets over F k . For v =(v , v ,..., v ) F , define q q 1 2 n ∈ qk n κ (v)= κ (v ) Fkn . q q i i=1 ∈ q  Similarly, for matrix M =(M ) M (F k ), define i,j ∈ m,n q χ (M)= K M (F ) , q Mi,j ∈ mk,nk q  the block matrix whose -th block is the multiplication table of with respect to basis k . (i, j) KMi,j Mi,j ei i=1 For a set of vectors in Fn , define { } S qk

κ (S)= κ (v) : v S . q { q ∈ }

We omit the subscript and write κ and χ for κq and χq respectively when q equals to p, the characteristic of the field. F F The trace of qk over q is defined as

trqk q : a Tr(Ka) (10) → 7→ F for a qk , where Tr(Ka) is the trace of the multiplication table of a with respect to the basis ei . By ∈ F F F { } definition, the trace is an q-linear map from qk to q. An equivalent definition of the trace is

k 1 − qj trqk q(a)= ∑ a . → j=0

A dual basis e1′ , e2′ ,..., ek′ of e1, e2,..., ek is a basis such that trqk q(eie′j) = δi,j for all i, j { } { } → ∈ F qj k 1 1,2,..., k .A self-dual basis is one that is equal to its dual. If for some α qk the set α j=−0 forms a { F } F ∈ { } basis of qk over q, the basis is called a normal basis. We record some convenient facts about the maps κ( ) and χ( ) for self-dual bases. · ·

Lemma 3.15. Let q be a prime power, k an integer and e a self-dual basis for F k over F . The map { i} q q κ ( ) corresponding to e satisfies the following properties: q · { i} F 1. For all x qk , κq(x)= trqk q(xe1),...,trqk q(xek) . ∈ → → F  2. For all x, y qk , trqk q(xy)= κq(x) κq(y). ∈ → · 28 n 3. For all M M (F k ) and v F , we have χ (M)κ (v)= κ (Mv). ∈ m,n q ∈ qk q q q Proof. The properties follow from the definition of the map κ ( ) and the fact that e is a self-dual basis. q · { i}

For z Fn and V, W a pair of complementary subspaces, recall from Definition 3.7 the notation zV for ∈ the projection of z onto V and parallel to W.

Lemma 3.16. Let κ ( ) denote the map corresponding to a self-dual basis e for F k over F . Let V be a q · { i} q q subspace of Fn with linearly independent basis b ,..., b Fn . Then the following hold: qk { 1 t}⊆ qk Fnk 1. κq(V) is a subspace of q . 2. κ (e b ) is a linearly independent basis of κ (V) over F . { q i j }i,j q q 3. Let be complementary subspaces of Fn . Then and are comple- V, W qk V′ = κq(V) W′ = κq(W) mentary subspaces of Fkn, and furthermore for all vectors z Fn , we have κ (zV ) = κ (z)V′ and q ∈ qk q q W W κq(z )= κq(z) ′ . Proof. For the first item, we first verify that κ (V) is a subspace. Since V is a subspace, it contains 0 Fn , q ∈ qk and therefore κ (0) = 0 is also in κ (V). Let u , v κ (V). Using that κ is a bijection there exist q q ′ ′ ∈ q q u, v Fn such that u = κ (u) and v = κ (v). Therefore ∈ qk ′ q ′ q

u′ + v′ = κ (u)+ κ (v)= κ (u + v) κ (V) , q q q ∈ q where the inclusion follows because V is a subspace and thus contains u + v. Finally, for all x F , for all ′ ∈ q v V, we have that x′κq(v)= κq(x′v) κq(V) where we used that V is closed under scalar multiplication ∈F F F ∈ F by qk and thus by q (since q is a subfield of qk ). Thus κq(V) is closed under scalar multiplication by Fq. For the second item, note that an element v V can be expressed uniquely as v = ∑t v b for ∈ i=1 i i v F k . The element v can further be written as ∑ v e where v F . Thus v is a linear combination i ∈ q i j i,j j i,j ∈ q of the vectors e b , and therefore κ (v) is a linear combination of the vectors κ (e b ) . To establish that { j i} q { q j i } the vectors κ (e b ) are linearly independent, suppose towards contradiction that they are not. Then there { q j i } would exist α F such that at least one α is nonzero and i,j ∈ q i,j

0 = ∑ αi,jκq(ejbi) i,j

= κq ∑ ∑ αi,jej bi i j   

= κq ∑ βibi , i ! where we define β = ∑ α e . Since at least one α = 0 and the e are linearly independent over F , i j i,j j i,j 6 { j} q there exists i such that β = 0, which means that there is a non-trivial linear combination of the basis ele- i 6 ments b that equals 0 under κ ( ). Since κ ( ) is injective, we get a contradiction with linear independence i q · q · of the b . { i}

29 For the third item, we observe that κ (V) and κ (W) must be complementary because κ ( ) is a linear q q q · map as well as a bijection. Let v ,..., v and v ,..., v denote bases for V and W, respectively. { 1 m} { m+1 n} Thus the set v ,..., v forms a basis for Fn , and from the previous item, the set κ (e v ) is a basis { 1 n} qk { q j i }i, j Fkn for q . Furthermore, the sets κq(ejvi) j, i=1,...m and κq(ejvi) j, i=m+1,...n are bases for κq(V) and κq(W), respectively. { } { } There is a unique choice of coefficients α F such that κ (z)= ∑ α κ (e v ). But then i,j ∈ q q i,j i,j q j i

κq(z)= κq ∑ ∑ αi,jej vi i j    = κq ∑ αivi ,  i  where we define αi = ∑j αi,jej. Since κq( ) is a bijection, this implies that z = ∑i αivi, and therefore V m W n · z = ∑i=1 αivi (and similarly z = ∑i=m+1 αivi). This implies that

m V′ V κq(z) = ∑ ∑ αi,jκq(ejvi)= κq(z ) , i=1 j

W W and similarly κq(z) ′ = κq(z ). This completes the proof of the lemma.

3.3.2 Bit string representations As mentioned in Remark 3.3, we sometimes treat the inputs and outputs of Turing machines as representing elements of a finite field, or a vector space over a finite field. We discuss some important details about bit representations of finite field elements and arithmetic over finite fields. F In the paper we only consider fields 2k where k is odd. Definition 3.17. A field size q is called an admissible field size if q = 2k for odd k. F F Elements of 2 are naturally represented using bits. To represent elements of 2k as binary strings we k require the specification of a basis of F k over F . Given a basis e of F k , every element a F k has 2 2 { i}i=1 2 ∈ 2 a unique expansion a = ∑k a e and can be represented as the k-bit string corresponding to κ(a) Fk. i=1 i i ∈ 2 Note that we omitted the subscript 2 of κ as it maps to the linear space over the prime subfield F2. Thus the k binary representation of a F k is defined as the natural binary representation of κ(a) F (which in turn ∈ 2 ∈ 2 is the F2-representation of a). Throughout the paper we freely associate between the binary representation F F of a field element a 2k and its 2-representation, although—technically speaking—these are distinct objects. ∈

Given the representations κ(a), κ(b) of a, b F k , to compute the binary representation of a + b it ∈ 2 suffices to compute the addition bit-wise, modulo 2. Computing the multiplication of elements a, b requires the specification of the multiplication tables F k for the basis . Given representations Kei Mk( 2) i=1 ei k k F{ ∈ } { } κ(a)=(ai)i=1, κ(b)=(bi)i=1 for a, b 2k respectively, the representation κ(ab) of the product ab is computed as ∈ k k

κ(ab)= ∑ ai κ eib = ∑ ai Kei κ(b) . (11) i=1 i=1   F Thus, using our representation for field elements, efficiently performing finite field arithmetic in 2k reduces F F to having access to the multiplication table of some basis of 2k over 2.

30 The following fact provides an efficient deterministic algorithm for computing a self-dual normal basis F F for 2k over 2 and the corresponding multiplication tables for any odd k. Lemma 3.18. There exists a deterministic algorithm that given an odd integer k > 0, outputs a self-dual F F normal basis of 2k over 2 and the multiplication tables of the basis in poly(k) time. Proof. The algorithm of Shoup [Sho90, Theorem 3.2] shows that for prime p, an irreducible polynomial in Fp[X] of degree k can be computed in time poly(p, k). Then, the algorithm of Lenstra [LJ91, Theorem 1.1] F F shows that given such an irreducible polynomial, the multiplication table of a normal basis of pk over p can be computed in poly(k,log p) time. Finally, the algorithm of Wang [Wan89] shows that for odd k and F F a multiplication table K of a normal basis of 2k over 2, a multiplication table K′ for a self-dual normal F F basis of 2k over 2 can be computed in poly(k) time. Putting these three algorithms together yields the claimed statement.

k For q = 2 for k odd (i.e. q is an admissible field size) we use the shorthand tr for trq 2. → k F F Lemma 3.19. Let k be an odd integer and ei i=1 be a self-dual normal basis of 2k over 2. Then { } k tr(e )= 1 for all i, and furthermore the representation κ(1) of the unit 1 F k is the all ones vector in F . i ∈ 2 2 2i F Proof. Since ei is a normal basis, ei = α for some element α 2k . Furthermore, for every element { } 2 ∈ b F k , we have that tr(b )= tr(b). This is because ∈ 2 k 1 k 1 − i+1 − i tr(b2)= ∑ b2 = ∑ b2 = tr(b) , i=0 i=0

2k 2 where we use that b = b for all b F k . Since e = e , we get that tr(e )= tr(e ) for all i, j. It cannot ∈ 2 i+1 i i j be the case that tr(ei) = 0 for all i. Suppose that this were the case. This would imply that tr(b) = 0 for all b F k . But then for all j 1,..., k and for some b = 0, we would also have that b = tr(be )= 0 ∈ 2 ∈ { } 6 j j where b = ∑ b e with b F . This implies that b is the all zero element of F k , which is a contradiction. j j j j ∈ 2 2 Thus tr(ei)= 1 for all i = 1,2,..., k. The “furthermore” part follows from the expansion

k k 1 = ∑ tr(1 ei) ei = ∑ ei . i=1 · i=1

k Lemma 3.20. For any odd integer k, let e denote the self-dual normal basis of F k over F that is { i}i=1 2 2 returned by the algorithm specified in Lemma 3.18 on input k. Then the following can be computed in time poly(k) on input k:

1. The representation κ(a + b) of the sum a + b given the representations κ(a) and κ(b) of a, b F k . ∈ 2

2. The representation κ(ab) of the product ab given the representations κ(a) and κ(b) of a, b F k . ∈ 2

3. The multiplication table K M (F ) given the representation κ(a) of a F k . a ∈ k 2 ∈ 2 1 4. The representation κ(a ) of the multiplicative inverse of a F k , given the representation κ(a). − ∈ 2

5. The trace tr(a) given the multiplication table K of a F k . a ∈ 2

31 Furthermore, for all integers , the representations of projections S and T of Fn for com- n κ(x ) κ(x ) x 2k plementary subspaces of Fn can be computed in time, given the representations∈ , S, T 2k poly(k, n) κ(x) κ(v1), κ(v2),..., κ(vm) and κ(w1), κ(w2),..., κ(wn m) where vi and wj are bases for S and { } { − } { } { } T respectively. Proof. Given an odd integer k as input, by Lemma 3.18 it is possible to compute the self-dual basis e k { i}i=1 together with the multiplication tables Kei for i = 1,2,..., k. Addition is performed component-wise, and multiplication is done using Eq. (11). For the multiplication table Ka it suffices to compute the k products 1 1 κ(aei) for i 1,..., k . To compute inverses, observe that κ(1)= κ(aa− )= Kaκ(a− ). The matrices Ka ∈ { F } 1 1 are invertible over 2, so therefore κ(a− ) = Ka− κ(1); moreover, κ(1) can be computed by Lemma 3.19. Inverting the matrix can be done in poly(k) time via Gaussian elimination. The trace of an element a F k ∈ 2 is by definition the trace of the multiplication table Ka. For the “Furthermore” part, we observe that since v1, v2,..., vm w1, w2,..., wn m forms a basis n { } ∪ { − } for F , there is a unique way to write as a F k linear combination of and . Via Gaussian 2k x 2 vi wj F F { } { } elimination over 2k , the 2-representation of the coefficients of this linear combination can be computed in F poly(n, k) time. Here we use that addition, multiplication and division over 2k can be performed in time poly(k) using the previous items of the Lemma. Remark 3.21. Throughout this paper, whenever we refer to Turing machines that perform computations k with elements of a field Fq for an admissible field size q = 2 , we mean that that the Turing machines are k representing elements of Fq as vectors in 0, 1 using the basis specified by the algorithm of Lemma 3.18 and performing arithmetic as described in{ Lemma} 3.20.

3.4 Polynomials and the low-degree code In this section, we introduce some basic definitions about polynomials over finite fields. An m-variate polynomial f over Fq is a function of the form

α1 αm f (x1,..., xm)= ∑ cαx1 xm α 0,1,...,q 1 m · · · ∈{ − } where c is a collection of coefficients in F . We say that f has individual degree d if c = 0 only if { α} q α 6 α d for all 1 i m, and that it has total degree d if c = 0 only if ∑ α d. (Affine) multilinear i ≤ ≤ ≤ α 6 i i ≤ polynomials are polynomials with individual degree 1. Low-degree polynomials play an important role in this paper. For one, the classical and quantum low- degree tests of Section 7 are nonlocal games that efficiently certify that the players’ answers are consistent with low-degree polynomials. Low-degree polynomials are also crucial in the probabilistically checkable proofs (PCP) construction that are used in “answer reduction” transformation (Section 10). We recall the Schwartz-Zippel lemma:

Lemma 3.22 (Schwartz-Zippel lemma [Sch80, Zip79]). Let f , g : Fm F be two unequal polynomials q → q with total degree at most d. Then Pr [ f (x)= g(x)] d/q . x Fm ≤ ∼ q

The low-degree code. The Schwartz-Zippel lemma implies that the set of low-degree polynomials form an error-correcting code with good distance, which we call the low-degree code.23

23It is also known as the generalized Reed-Muller code in the coding theory literature.

32 Fix an integer m N and let M = 2m. For every y 0, 1 m define the following m-variate ∈ ∈ { } multilinear polynomial over Fq:

indm,y(x)= ∏ x ∏ (1 x ) . i · − i i:yi=1 i:yi=0 Here, we identify 0, 1 as a subset of F . Notice that, when restricted to the subcube 0, 1 m, ind is { } q { } m,y zero everywhere except for when x = y. For a FM, label the coordinates of a as a for y 0, 1 m ∈ q y ∈ { } (identifying the latter set with 1,..., M using, say, the lexicographic ordering on strings). For any such a { } define the multilinear polynomial g : Fm F , called the low-degree encoding of a, as follows: a q → q

ga(x)= ∑ ay indm,y(x) . (12) y 0,1 m · ∈{ } Note that for any y 0, 1 m, g (y)= a . Furthermore, the map a g is linear: for every x Fm, ∈ { } a y 7→ a ∈ q g (x)= a ind (x) , (13) a · m FM where indm(x) is the vector (indm,y(x))y 0,1 m q . ∈{ } ∈ Lemma 3.23. Let q be an admissible field size, and let M = 2m for some integer m. The complexity of computing the evaluation of the low-degree encoding g (x), given a FM and x Fm represented in a ∈ q ∈ q binary, is poly(M,log q). Proof. Computing g (x) requires computing the sum of products a ind (x) over all y 0, 1 m. Via a y · m,y ∈ { } Lemma 3.20, evaluating indm,y(x) takes time poly(m,log q) because it requires performing m multiplica- tions of F elements, and therefore the product a ind (x) requires poly(m,log q) time. Computing the q y · m,y sum over all y 0, 1 m requires M poly(m,log q)= poly(M,log q) time, as claimed. ∈ { } · Finally, for any H F we define the decoding map Dec ( ) which takes as input a polynomial g : ⊆ H · Fm F and returns the following vector a FM: for all y 0, 1 m, if g(y) H, then set a = g(y), q → q ∈ q ∈ { } ∈ y and otherwise set a = 0. Note that for all a HM we have Dec (g ) = a where g is the low-degree y ∈ H a a encoding of a. For notational brevity we often write Dec( ) to denote the boolean decoding map Dec 0,1 ( ). · { } · 3.5 Linear spaces and registers

For a set V, we write CV for the complex vector space of dimension V . The space CV is endowed with a | | canonical orthonormal basis x x V . By “a quantum state on V” we mean a unit vector {| i} ∈ ψ CV . | iV ∈ k F CV k CVi If V = i=1 Vk is the direct sum of subspaces Vk over , then can be identified with i=1 . As a special case, if is a basis of the decomposition = k (F ) yields the tensor product L ei V V i=1 ei N decomposition CV = { k} C F . We sometimes refer to the spaces C F as the “qudits” of CV (or of a state i=1 | | L| | on it). N Definition 3.24. For linear space V over finite field F, define the EPR state on CV CV by ⊗ 1 EPR V = ∑ x x . | i V x V | i ⊗ | i | | ∈ We also write EPR F as EPR and EPR aspEPR . | i q | qi | 2i | i 33 3.6 Measurements and observables Quantum measurements are modeled as positive operator-valued measures (POVMs). A POVM consists of a set of positive semidefinite operators Ma a S indexed by outcomes a S that satisfy the condition { } ∈ ∈ ∑a Ma = I. We sometimes use the same letter M to refer to the collection of operators defining the POVM. The probability that the measurement returns outcome a on state ψ is given by | i Pr(a)= ψ M ψ . h | a| i A POVM M = M is said to be projective if each operator M is a projector (M2 = M ). An observable { a} a a a is a unitary matrix. A binary observable is an observable O such that O2 = I, i.e. O has eigenvalues in 1, 1 . {− We} follow the convention that subscripts of the measurement index the outcome and superscripts of the measurement are used to index different measurements. For example, we use Mx to represent a { a, b} measurement indexed by x whose outcome consists of two parts a and b. In this case, by slightly abusing the notation, we use Mx and Mx to denote { a } { b } x x x x Ma = ∑ Ma, b , Mb = ∑ Ma, b . b a For any x, Mx and Mx are POVMs sometimes referred to as the “marginals” of Mx . { a } { b } { a, b} x Definition 3.25. Let Ma a A be a family of POVMs indexed by x . Let f : A B be an arbitrary { } ∈ ∈ X → function. We write Mx for the POVM derived from Mx by applying the function f before [ f ( )=b] { a } returning the outcome. More· precisely,  x x M[ f ( )=b] = ∑ Ma . · a: f (a)=b x If b is not in the image of f , then we define M[ f ( )=b] to be 0. In cases where the outcome a has a natural · x interpretation as a tuple (a1,..., ak) we sometimes slightly abuse notation and write M for some i [ai=b] ∈ 1,..., k to denote { } Mx = Mx . [ai=b] ∑ (a1,...,ak) (a1,...,ak): ai=b

3.7 Generalized Pauli observables

For prime number p, the generalized Pauli operators over Fp are a collection of observables indexed by a basis setting X or Z and an element a or b of Fp, with eigenvalues that are p-th roots of unity. They are given by σX(a)= ∑ j + a j and σZ(b)= ∑ ωbj j j , (14) j F | ih | j F | ih | ∈ p ∈ p 2πi p where ω = e , and addition and multiplication are over Fp. These observables obey the “twisted commu- tation” relations X Z ab Z X a, b F , σ (a) σ (b)= ω− σ (b) σ (a) . (15) ∀ ∈ p Similarly, over a field Fq we can consider a set of generalized Pauli operators, indexed by a basis setting X or Z and an element of F and with eigenvalues that are p-th roots of unity. For a, b F they are given by q ∈ q τX(a)= ∑ j + a j and τZ(b)= ∑ ωtr(bj) j j , j F | ih | j F | ih | ∈ q ∈ q

34 where addition and multiplication are over Fq. For all W X, Z , a Fq, and b Fp, powers of these observables obey the relation ∈ { } ∈ ∈ b τW(a) = τW (ab) . W p In particular, since pa = 0 for any a Fqwe get that that (τ (a)) = I for any a Fq. The observables obey analogous “twisted commutation”∈ relations to (15), ∈

X Z tr(ab) Z X a, b F , τ (a) τ (b)= ω− τ (b) τ (a) . (16) ∀ ∈ q It is clear from the definition that all of the τX operators commute with each other, and similarly all the τZ operators commute with each other. Thus, it is meaningful to speak of a common eigenbasis for all τX operators, and a common eigenbasis for all τZ operators. The common eigenbasis for the τZ operators is the computational basis. To map this basis to the common eigenbasis of the τX operators, one can apply the Fourier transform 1 tr(ab) F = ∑ ω− a b . (17) √q a,b F | ih | ∈ q Explicitly, the eigenbases consist of the vectors eW labeled by an element e Fq and W X, Z , given by | i ∈ ∈ { } 1 tr(ej) eX = ∑ ω− j , eZ = e . | i √q j | i | i | i We denote the POVM whose elements are projectors onto basis vectors of the eigenbasis associated with the observables τW by τW . Then for all W X, Z and a F , the observables τW(a) can be written as { e }e ∈ { } ∈ q W tr(ab) W τ (a)= ∑ ω τb . (18) b F ∈ q This relation can be inverted as

tr(ab) W tr(a(b b)) W W E ω− τ (a)= ∑ E ω ′− τ = τ , (19) F F b′ b a q b F a q ∈ ′∈ q ∈ where the second step follows from Lemma 8.4. For systems with many qudits, we will consider tensor products of the operators τW. Slightly abusing Fn W W W notation, for W X, Z and a q we denote by τ (a) the tensor product τ (a1) ... τ (an). These obey the twisted∈ { commutation} ∈ relations ⊗ ⊗

n X Z tr(a b) Z X a, b F , τ (a) τ (b) = ω− · τ (b) τ (a) , ∀ ∈ q where a b = ∑n a b F . For W X, Z and e Fn define the eigenstates · i=1 i i ∈ q ∈ { } ∈ q e = (e ) ... (e ) , | W i | 1 Wi⊗ ⊗ | n W i W and associated rank-1 projectors τe . k Since we only consider finite fields Fq such that q = 2 the maximally entangled state EPRq and the corresponding qudit Pauli observables/projectors are isomorphic to a tensor product of maximally| entangledi states EPR2 and qubit Pauli observables/projectors respectively; this is shown in the next lemma. This is used to| arguei that the Pauli basis test (described in Section 7.3) gives a self-test for Pauli observables and maximally entangled states over qubits.

35 Lemma 3.26. For all admissible field sizes q = 2k and integers L, there exists an isomorphism φ : q L 2 Lk (C )⊗ (C )⊗ such that → L Lk φ φ EPR ⊗ = EPR ⊗ , (20) ⊗ | qi | 2i and for all W X, Z and for all u FL ∈ { } ∈ q L k W = † W (21) τu φ σuij φ . Oi=1 Oj=1  Here, the (u ) denotes a vector of F values such that u = ∑ u e for all i 1,..., L with ij i,j 2 i j ij j ∈ { } e ,..., e being the self-dual basis of F over F specified by Lemma 3.18. For i 1,..., L and { 1 k} q 2 ∈ { } j 1,..., q the (i, j)-th factor σW denotes the projector 1 I +( 1)uij σW (1) acting on the s-th qubit ∈ { } uij 2 − of EPR Lk, where s =(i 1)k + j. | 2i⊗ −  Proof. Since q = 2k is an admissible field size, there exists a self-dual basis e ,..., e of F over F . { 1 k} q 2 Define the isometry θ : Cq (C2) k as θ a = a a a where κ(a)=(a , a ,..., a ) Fk → ⊗ | i | 1i ⊗ | 2i···| ki 1 2 k ∈ 2 is the bijection introduced in Section 3.3 corresponding to the basis e ,..., e . { 1 k} Let a F , and let κ(a)=(a ,..., a ) Fk. Then from (19), we get ∈ q 1 k ∈ 2 W tr(ab) W τa = E ( 1) τ (b) b F − ∈ q = E ( 1)tr(∑j ajejb) τW(b) b F − (22) ∈ q

∑j ajbj W = E ( 1) τ ∑ bjej , b F − ∈ q j ! where b = ∑ b e ; since the basis e is self-dual, we have that b = tr(be ). From (22) we get that j j j { j} j j k W ajbj W τ = E ∏( 1) τ (bjej) a F b1,..., bk 2 j=1 − ∈ (23) k ajbj W = ∏ E ( 1) τ (bjej) . b F − j=1 j∈ 2

W † W,j W,j Next we claim that for all c F2, we have τ (cej) = θ σ (c)θ where σ (c) = I if c = 0, and ∈ 2 k otherwise is the Pauli W observable acting on the j-th qubit of (C )⊗ . This can be verified by comparing the actions of both operators on the basis states of Cq. Thus we obtain that the right-hand side of (23) is equal to

k k ajbj † W,j † ajbj W ∏ E ( 1) θ σ (bj)θ = θ E ( 1) σ (bj) θ (24) F F = bj 2 − bj 2 − j 1 ∈ h i Oj=1  ∈  k = θ† σW θ. (25) bj  Oj=1 

36 Define = L. The projector W can be decomposed as the tensor product L W where W acts on φ θ⊗ τu i=1 τui τui the i-th factor of (Cq) L. Express each u as ∑ u e where u F . Then from Equation (25) we get that ⊗ i j ij j ij ∈ 2 N

L L k τW = τW = φ† σW φ, (26) u ui  uij  Oi=1 Oi=1 Oj=1   which establishes Equation (21). We observe that (20) follows immediately from (21).

37 4 Conditionally Linear Functions, Distributions, and Samplers

4.1 Conditionally linear functions and distributions We first introduce conditionally linear functions, which are used to specify the question distribution for games considered in the paper in a way that the question distribution can be “introspected”, as described in Section 8. Intuitively, a conditionally linear function takes as input an element x V = Fn for some ∈ n 0, and applies linear maps L sequentially on xVj where V , V ,... are a sequence of complementary ≥ j 1 2 register subspaces such that both the linear maps Lj and the subspace Vj may depend on the values taken by V V2 previous linear maps L1(x 1 ), L2(x ), etc. In the remainder of the section we use V to denote the linear space Fn for some integer n 0. For ≥ ease of notation we extensively use the subscript range notation. For example, if V1, V2,..., Vℓ are fixed subspaces of V and k 1,2,..., ℓ we write ∈ { } Vk = Vj , j : 1 jk M≤ M≥ and it is understood that V k and V k are identical to Vk 1, respectively. Moreover, if V′ is a ≤ ≥ − F V register subspace of V, F : V′ V′ a linear map, and x V, we write x to denote F(x ′ ). For example, L → ∈ V in the following definition x 1 is used as shorthand notation for L1(x 1 ). Definition 4.1. Let V be Fn for some n 0. For all integers ℓ 0 the collection of ℓ-level conditionally ≥ ≥ linear functions (implicitly, on V) is defined inductively as follows. 1. There is a single 0-level conditionally linear function, which is the 0 function on V.

2. Let ℓ 1 and suppose the collection of (ℓ 1)-level conditionally linear functions has been defined. ≥ − The collection of ℓ-level conditionally linear functions on V consists of all functions L on V that can be expressed in the following form. There exist complementary register subspaces V1 and V>1 of V, a linear function L on V , and for all v L (V ), an (ℓ 1)-level conditionally linear function L> 1 1 ∈ 1 1 − 1, v on V> , such that for all x V, 1 ∈ L1 V>1 L(x)= x + L>1, xL1 (x ) .

Remark 4.2. Note that for any integer ℓ 1 the collection of ℓ-level CL functions trivially contains the ≥ collection of (ℓ 1)-level CL functions: for this it suffices to note that the 0 function, which is a 0-level − CL function, is also a 1-level CL function by setting V = V, V> = 0 , L (x) = 0 for all x V, and 1 1 { } 1 ∈ L> L is the 0 map for all x V. 1, x 1 ∈ Definition 4.3. Let L, R : V V be conditionally linear functions. The conditionally linear distribution → µ corresponding to (L, R) is defined as the distribution over pairs (L(x), R(x)) V V for x drawn L,R ∈ × uniformly at random from V. Throughout the paper we abbreviate “conditionally linear functions” and “conditionally linear distribu- tions” as CL functions and CL distributions, respectively. The following lemma elucidates structural properties of ℓ-level CL functions. Recall that using our L

38 (i) For each k 1,2,..., ℓ , a function L k : V V called the k-th marginal of L; ∈ { } ≤ → (ii) For each k 1,2,..., ℓ and u L< (V), a register subspace V of V called the k-th factor ∈ { } ∈ k k, u space with prefix u;

(iii) For each k 1,2,..., ℓ and u L< (V), a linear map L : V V called the k-th linear ∈ { } ∈ k k, u k, u → k, u map of L with prefix u; such that the following conditions hold for all k 1,2,..., ℓ . ∈ { } 1. L k is a k-level CL function on V; ≤ ℓ 2. V = V L< for all x V; i=1 i, x i ∈ L k Li 3. L k(x)= ∑i=1 x for all x V, where Li is shorthand notation for L L

As in Item 3, we sometimes use Vk and Lk to denote Vk, u and Lk, u respectively, leaving the prefix u implicit. Proof. We first prove the “if” direction: if there exist spaces and functions satisfying the conditions in the lemma, the fact that L is an ℓ-level CL function follows from Items 1 and 4 of the lemma statement. We now prove the “only if” direction. Given a CL function L on V, we construct the k-th family of subspaces and functions for all k 1,..., ℓ by induction on the level ℓ. First consider the base case ∈ { } ℓ = 1. Since L<1 = 0, we omit the mentioning of the prefix u L<1(V). Define L 1 = L, the factor ∈ ≤ space V1 = V, and the linear map L1 = L. It is straightforward to verify that the conditions in the lemma hold for these choices of linear maps and spaces. Now, assume that the lemma holds for CL functions of level at most ℓ 1, and we prove the lemma for − ℓ-level CL functions. By definition, an ℓ-level CL function L can be written as

L1 V>1 L(x)= x + L>1, xL1 (x ) for some linear map L : V V and a family of (ℓ 1)-level CL functions 1 1 → 1 − > > > L 1, v : V 1 V 1 v L (V ) → ∈ 1 1  where V1 and V>1 are complementary register subspaces of V. Next, using the inductive hypothesis on the (ℓ 1)-level CL function L> we get that for all v L (V ) and all k 1,2,..., ℓ 1 there exist k-th − 1, v ∈ 1 1 ∈ { − } marginal functions L : V> V> , k-th factor spaces V , and k-th linear maps L of L> with v′ , k 1 → 1 v′, k, u v′ , k, u 1, v prefix u L (V> )≤such that the conditions of the lemma for L> hold. ∈ v′ , 1 L k : x x + L′ L < (x ) for x V; (27) ≤ 7→ x 1 , k ∈ ℓ (iii) For all k 2,3,..., and u L1 .

39 We verify that the conditions of the lemma are satisfied. Since L is by assumption a (k 1)-level CL v′ , 1, we get that L k is a k-level CL function on V from Eq. (27), establishing Item 1. By the ≤ induction hypothesis, we have for all v L (V ) and y V> , ∈ 1 1 ∈ 1 ℓ 1 − L V>1 = Vv′, i, y v′ , 1 L k(x)= v + Lv′ , 1, v with prefix Lv′ , 1, x 1 ≤ − by the inductive hypothesis. This shows that L k, Vk, u, and Lk, u satisfy the conditions of the lemma and completes the induction. ≤

We note that the marginal functions, factor spaces, and linear maps of a given CL function L may not be unique; for example, consider the identity function on a linear space V = Fn. This is clearly a 1-level CL function, but it can also be viewed as a k-level CL function for k 2,..., n with an arbitrary partition of ∈ { } V into factor spaces. Lemma 4.5. Let ℓ, k 0 be integers and U = Fn, V = Fm be linear spaces. Suppose L is a k-level CL ≥ function on U and R is an ℓ-level CL function on V for each u L(U). Then the concatenation T of L u ∈ and Ru u defined as { } U V T(x)= L(x )+ RL(xU)(x ) is a (k + ℓ)-level conditionally linear function on U V. ⊕ Proof. We prove the claim by induction on k. The case k = 0 follows from the Definition 4.1. Assume that the lemma holds for L being at most (k 1)-level. By Definition 4.1, there are complementary register − subspaces U and U> of U, a linear function L on U , and a family of (k 1)-level CL functions L> 1 1 1 1 − 1, v for v L1(U1) such that ∈ U L1 U>1 L(x )= x + L>1, xL1 (x ). L For all x 1 , define function T> L on U> V as 1, x 1 1 ⊕

U>1 V U>1 V T>1, xL1 (x ⊕ )= L>1, xL1 (x )+ RxL1 +xL>1 (x ),

40 the concatenation of L>1, xL1 and RL(xU) xL>1 where L>1 is the shorthand notation of L>1, xL1 . By the { ℓ } induction hypothesis, T>1, xL1 is (k + 1)-level conditionally linear. The lemma follows from Defini- tion 4.1. −

Lemma 4.6 (Direct sums of CL functions). Let V(1), V(2),..., V(m) be register subspaces of V such that V = m V(j). Suppose that, for each j 1,2,..., m , L(j) is an ℓ -level conditionally linear function j=1 ∈ { } j (j) m (j) ℓ ℓ ℓ on V L. Then the direct sum L = j=1 L is an -level CL function over V for = maxj j , where L is defined by { } L m L(x)= ∑ L(j)(x(j)) j=1 for all x = ∑m x(j) m V(j). j=1 ∈ j=1 Proof. It is easy to see thatL an ℓ-level CL function is also k-level conditionally linear for all k ℓ. Hence, ≥ it suffices to prove the claim where ℓj = ℓ for j = 1,2,..., m. We prove the theorem by an induction on ℓ. For ℓ = 1, the functions L(j) are linear and the claim follows by the fact that the direct sum of linear maps is linear. Assume now the theorem holds for conditionally linear functions of level at most ℓ 1 and L(j) are − ℓ-level conditionally linear functions for j = 1,2,,..., m. By definition, L(j) is the concatenation of condi- (j) (j) (j) (j) tionally linear functions L on V and L> v on V> of levels 1, and ℓ 1 respectively. Furthermore, 1 1 { 1, vj } j 1 −

m m (j) (j) (j) (j) (j) V L(x)= L (x )= v + L (x ) >1 , ∑ ∑ j >1, vj j=1 j=1  

(j) (j) (j) V1 where vj = L1 (x ) . By the induction hypothesis,   m (j) m (j) V (j) (j) V V> (j) (j) V L (x 1 )= L (x ) 1 , L> (x 1 )= L (x ) >1 1 ∑ 1 1, v ∑ >1, vj j=1   j=1  

m (j) are 1-level and (ℓ 1)-level conditionally linear respectively for v = ∑ v , V = V , and V> = − j j 1 j=1 1 1 m (j) ℓ j=1 V>1. This proves that L is -level conditionally linear. L L Lemma 4.7. For each i 1,..., m let L(i), R(i) : V(i) V(i) be ℓ -level conditionally linear functions ∈ { } → i and let L, R : V V be the direct sums L = i Li and R = i Ri, respectively, as defined in Lemma 4.6. → m Then the conditionally linear distribution µ is the product distribution ∏ µ (i) (i) over V V. L,RL L i=1 L , R × Proof. The distribution µL,R is the distribution over pairs (L(x), R(x)) where x is sampled uniformly from V V. By Lemma 4.6, this is equivalent to the distribution over pairs (L (xVi ))m , (R (xVi ))m where x × i i=1 i i=1 is chosen uniformly at random from V. This distribution is exactly the product of the distributions µ (i) (i)  L , R for i = 1,2,..., m. F CL functions used in the paper are frequently defined over a “large” field 2t (e.g., the CL functions used in the low degree tests of Section 7). However, the introspection protocol in Section 8 handles CL functions defined over F2. The following definition and lemma show that CL functions over prime power fields can be viewed as CL functions over the prime field via a “downsizing” operation.

41 Fn t Definition 4.8 (Downsizing CL functions). Let V = q be a linear space for a prime power q = p . Let L : V V be a function. Let κ( ) denote the downsize map from Section 3.3 corresponding to the basis → · e ,..., e of F over F specified by Lemma 3.18. In particular, κ is linear over F , and by Lemma 3.16, { 1 t} q p p the set κ(V) is the linear space Fnt. Define the downsized function Lκ : κ(V) κ(V) by Lκ = κ L κ 1. p → ◦ ◦ − Lemma 4.9. Let V = Fn for a prime power q = pt. Let L : V V be an ℓ-level CL function over V for q → some integer ℓ 0. Let L j, Vj, u, and Lj, u denote the j-th marginal functions, factor spaces, and linear ≥ ≤ maps corresponding to L as guaranteed by Lemma 4.4. Then Lκ : κ(V) κ(V) is an ℓ-level CL function κ Fnt κ κ → κ on V = κ(V)= p with marginal functions L j, factor spaces Vj, v, and linear maps Lj, v that satisfy the ≤ following for all j 1,..., ℓ . ∈ { } κ κ 1 1. The j-th marginal function L j of L is equal to κ L j κ− . ≤ ◦ ≤ ◦ 2. For all u L< (V), the j-th factor space Vκ and the j-th linear map Lκ of Lκ are equal to ∈ j j, κ(u) j, κ(u) κ(V ) and κ L κ 1 respectively. j, u ◦ j, κ(u) ◦ − Proof. We prove the lemma by induction on ℓ. Let L : V V be an ℓ-level CL function. For the base case ℓ →F Ft F = 1, observe that since κ is a linear bijection between q and p as linear spaces over p, the function κ Fnt L is linear, and thus a 1-level CL function over κ(V) = p . Furthermore, the first marginal function κ κ 1 κ κ 1 L 1 = L = κ L 1 κ− ; the factor space V1 = κ(V1)= κ(V), and L1 = L = κ L1 κ− . ≤ ◦ ≤ ◦ ◦ ◦ Assume that the statement of the lemma holds for some ℓ 1 1. Let L j, Vj, u, and Lj, u denote − ≥ ≤ the marginal functions, factor spaces, and linear maps corresponding to L as guaranteed by Lemma 4.4. Recursively define the following functions and spaces, for j 1,..., ℓ . ∈ { } κ 1 1. L j = κ L j κ− . ≤ ◦ ≤ ◦ 2. For all u L< (V), set Vκ = κ(V ) and set Lκ = κ L κ 1. ∈ j j, κ(u) j, u j, κ(u) ◦ j, κ(u) ◦ − κ κ κ κ We argue that L j , Vj, v , and Lj, v satisfy the conditions of Lemma 4.4 for the function L , which { ≤ } { } { } implies that Lκ is an ℓ-level CL function over κ(V). We first establish Item 4 of Lemma 4.4. Since L ℓ = L, this implies ≤ κ 1 1 κ L = κ L κ− = κ L ℓ κ− = L ℓ, ◦ ◦ ◦ ≤ ◦ ≤ as desired. Next, for all j 1,2,..., ℓ , for all y κ(V) with y = κ(x) for some x V, letting ∈ { } ∈ ∈ uj = L

L< L< κ L κ(V)= κ Vj, x j = κ Vj, x j = Vj, κ(x 1 such that V = V1 V>1, ℓ ⊕ a linear map L1 : V1 V1 and a collection of ( 1)-level CL functions L>1, v : V>1 V>1 v L (V ) → − { → } ∈ 1 1 L1 V>1 κ κ such that L(x) = x + L>1, xL1 (x ) for all x V. Observe that L1 is a 1-level CL function on V1 = κ κ ∈ κ κ(V1), and for v′ = κ(v) L (V ), the inductive hypothesis implies the function L> is an (ℓ 1)-level ∈ 1 1 1, v′ − CL function on Vκ = κ(V> ). Furthermore, since Lκ = Lκ, we have that for all y κ(V) with y = κ(x) >1 1 ℓ ∈ for some x V, ≤ ∈ κ κ κ κ κ V>1 L ℓ(y)= L (y)= L1(y)+ L>1, Lκ(y)(y ) ≤ 1 κ which implies that L ℓ is an ℓ-level CL function over κ(V)= κ(V1) κ(V>1). This establishes Item 1 of Lemma 4.4, and completes≤ the induction. ⊕

Lemma 4.10. Let V = Fn for some integer n and prime power q = pt. Let L, R : V V be CL functions. q → Let Lκ, Rκ : κ(V) κ(V) be the associated downsized CL functions, as defined in Definition 4.8. Then → the distribution µ κ κ over κ(V) κ(V) defined in Definition 4.3 is identical to the distribution of (x, y) L ,R × ∈ κ(V) κ(V) obtained by first sampling (x , y ) according to µ and then returning (κ(x ), κ(y )). × ′ ′ L,R ′ ′ Proof. The fact that Lκ, Rκ are well-defined CL functions follows from Lemma 4.9. The lemma is immedi- ate from the definition of µLκ,Rκ and the fact that κ is a bijection.

4.2 Conditionally linear samplers Samplers are Turing machines that perform computations corresponding to CL functions defined in Sec- tion 4.1. The inputs and outputs of the sampler are binary strings that are interpreted as representing data Fs of different types (integers, bits, vectors in q, etc.). See Section 3.3.2 and in particular Remark 3.21 for an in-depth discussion of representing structured objects on a Turing machine.

Definition 4.11. A function q : N N is an admissible field size function if for all n N, q(n) is an admissible field size as defined in Definition→ 3.17. ∈

Definition 4.12 (Conditionally linear samplers). Let q : N N be an admissible field size function, and → let s : N N be a function. A 6-input Turing machine is a ℓ-level conditionally linear sampler with → S field size q(n) and dimension s(n) if for all n N, letting q = q(n) and s = s(n), there exist ℓ-level CL ∈ w, n w, n functions LA, n, LB, n : Fs Fs with marginal functions L and factor spaces V for w A, B q → q { j } { j, u } ∈ { } satisfying the conditions of Lemma 4.4, such that for all w ≤ A, B , j 1,..., ℓ , z Fs : ∈ { } ∈ { } ∈ q On input (n, DIMENSION), the sampler outputs the dimension s(n). • S MARGINAL w, n On input (n, w, , j, z), the sampler outputs the binary representation of L j (z), • S ≤ w, n On input (n, w, LINEAR, j, u, y), the sampler outputs the binary representation of Lj, u (y), where u • w, n S is interpreted as an element of V

43 On input (n, w, FACTOR, j, u), the sampler outputs the j-th factor space Vw, n of Lw, n with prefix • S j, u w, n s Fs u L

TIMEκ( )(n)= O TIME (n) log q(n) . S S Furthermore, for every integer n N the CL functions of κ( ) on indexn are (LA, n)κ and (LB, n)κ, where ∈ S LA, n, LB, n are the CL functions of on index n. S 44 Proof. To show that κ( ) is an ℓ-level CL sampler we first show the “Furthermore” part, i.e. verify that S for any integer n 1 the CL functions (LA, n)κ and (LB, n)κ are its associated CL functions on index n, as defined in Definition≥ 4.12. Observe that for z V, the binary representation of z as an element of 0, 1 s log q passed as input to ∈ { } is, by definition (see Section 3.3.2), identical to the binary representation of κ(z). Using the definition S (Lw, n)κ = κ Lw, n κ 1 for w A, B this justifies that κ( ) returns the correct output when executed ◦ ◦ − ∈ { } S on inputs of the form (n, DIMENSION), (n, w, MARGINAL, j, z) and (n, w, LINEAR, j, u′, y). Next, if T is a register subspace of Fs with indicator vector C 0, 1 s, then κ(T) is a register subspace q ∈ { } Fs log q of 2 with indicator vector D defined from C as in Definition 4.15. Thus the output of κ( ) on input w, n S FACTOR w, n (n, w, , j, u′) is equal to the indicator vector of κ(Vj, u ), which is the j-th factor space of L with prefix u′ = κ(u). The time complexity of κ( ) is the same as with the sampler , except it takes O(log q(n)) times longer to output the factor spaceS indicator vectors. S

45 5 Nonlocal Games

We introduce definitions associated with nonlocal games and strategies that will be used throughout.

5.1 Games and strategies Definition 5.1 (Two-player one-round games). A two-player one-round game G is specified by a tuple ( , , , , µ, D) where X Y A B 1. and are finite sets (called the question alphabets), X Y 2. and are finite sets (called the answer alphabets), A B 3. µ is a probability distribution over (called the question distribution), and X × Y 4. D : 0, 1 is a function (called the decision predicate). X×Y×A×B→{ } Definition 5.2 (Tensor product strategies). A tensor product strategy S for a game G =( , , , , µ, D) X Y A B is a tuple ( ψ , A, B) where | i ψ is a pure quantum state in for finite dimensional complex Hilbert spaces , , • | i HA ⊗HB HA HB x x x A is a set A such that for every x , A = Aa a is a POVM over A, and • { } ∈X { } ∈A H y y y B is a set B such that for every y , B = B b is a POVM over B. • { } ∈ Y { b } ∈B H Definition 5.3 (Tensor product value). The tensor product value of a tensor product strategy S =( ψ , A, B) | i with respect to a game G =( , , , , µ, D) is defined as X Y A B G S x y val∗( , )= ∑ µ(x, y) D(x, y, a, b) ψ Aa Bb ψ . x, y, a, b h | ⊗ | i

For v [0, 1] we say that the strategy S passes (or wins) G with probability v if val∗(G, S ) v. The tensor∈ product value of G is defined as ≥

val∗(G)= sup val∗(G, S ) , S where the supremum is taken over all tensor product strategies S for G. Remark 5.4. Unless specified otherwise, all strategies considered in this paper are tensor product strate- gies, and we simply call them strategies. Similarly, we refer to val∗(G) as the value of the game G. Definition 5.5 (Projective strategies). We say that a strategy S = ( ψ , A, B) is projective if all the mea- y | i surements Ax and B are projective. { a }a { b }b Remark 5.6. A game G = ( , , , , µ, D) is symmetric if the question and answer alphabets are X Y A B the same for both players (i.e. = and = ), the distribution µ is symmetric (i.e. µ(x, y) = X Y A B µ(y, x)), and the decision predicate D treats both players symmetrically (i.e. for all x, y, a, b, D(x, y, a, b)= D(y, x, b, a)). Furthermore, we call a strategy S = ( ψ , A, B) symmetric if ψ is a state in , for some Hilbert space , that is invariant under permutation| i of the two factors,| i and the measurementH⊗H operators of both playersH are identical. We specify symmetric games G and symmetric strategies S using a more compact notation: we write G = ( , , µ, D) and S = ( ψ , M) where M denotes the set of measurement operators for both players. X A | i

46 Lemma 5.7. Let G =( , , , , µ, D) be a symmetric game such that val∗(G)= 1 ε for some ε 0. X Y A B − ≥ Then there exists a symmetric and projective strategy S =( ψ , M) such that val∗(G, S ) 1 2ε. | i ≥ − Proof. By definition there exists a strategy S ′ = ( ψ′ , A, B) such that val∗(G, S ′) 1 2ε. Using Naimark’s theorem (for example as formulated in [NW19| i , Theorem 4.2]) we can assume≥ without− loss of d d x y generality that ψ′ C C for some integer d and that for every x and y, A and B is a projective A′ B′ measurement. Let| i ∈ ⊗

1 C2 Cd C2 Cd ψ = 0 A 1 B ψ′ A B + 1 A 0 B ψτ′ A B ( A ) ( B ) , | i √2 | i | i | i ′ ′ | i | i | i ′ ′ ∈ ⊗ A′ ⊗ ⊗ B′  where ψ is obtained from ψ by permuting the two players’ registers. Observe that ψ is invariant under | τ′ i | i | i permutation of AA′ and BB′. x x For any question x = , define the measurement M = Ma a acting on the Hilbert space C2 Cd as follows: ∈ X Y { } ∈A ⊗ Mx = 0 0 Ax + 1 1 Bx . a | ih |⊗ a | ih |⊗ a x When Alice receives question x, she measures M on registers AA′, and when Bob receives question y, he y measures M on registers BB′. Using that by assumption the decision predicate D for G is symmetric, it is not hard to verify that val∗(G, S )= val∗(G, S ′). Definition 5.8. Let G = ( , , , , µ, D) be a game, and let S = ( ψ , A, B) be a strategy for G such X Y A B | i that the spaces canonically. Let S denote the support of the question distribution µ, HA ≃ HB ⊆X×Y i.e. the set of (x, y) such that µ(x, y) > 0. We say that S is a commuting strategy for G if for all question x y pairs (x, y) S, we have [Aa , Bb ] = 0 for all a , b , where [A, B] = AB BA denotes the commutator.∈ ∈ A ∈ B −

Definition 5.9 (Consistent measurements). Let be a finite set, let ψ a state, and Ma a a A | i∈H⊗H { } ∈A projective measurement on . We say that Ma a is consistent on ψ if and only if H { } ∈A | i a , M IB ψ = IA M ψ . ∀ ∈A a ⊗ | i ⊗ a | i Definition 5.10 (Consistent strategies). Let S = ( ψ , A, B) be a projective strategy with state ψ , for some Hilbert space , which is defined on| i question alphabets and and answer alphabets| i ∈ H⊗H H S X Y x and , respectively. We say that the strategy is consistent if for all x , the measurement Aa a A B y ∈X { } ∈A is consistent on ψ and if for all y , the measurement B b is consistent on ψ . | i ∈ Y { b } ∈B | i Definition 5.11. We say that a strategy S for a game G is PCC if it is projective, consistent, and commuting for G. Additionally, we say that a PCC strategy S is SPCC if it is furthermore symmetric. Remark 5.12. A strategy S = ( ψ , A, B) for a symmetric game G = ( , , , , µ, D) is called | i x x X X A A synchronous if it holds that for every x and a = b , ψ Aa Bb ψ = 0; in other words, the players never return different answers when∈ X simultaneousl6 y∈ asked A h the| same⊗ question.| i As shown in [PSS+16] the condition for a finite-dimensional strategy of being synchronous is equivalent to the condition that it is projective, consistent, and moreover ψ is a maximally entangled state. (The equivalence is extended to infinite-dimensional strategies, as well| i as correlations induced by limits of finite-dimensional strategies, in [KPS18].) Definition 5.13 (Entanglement requirements of a game). For all games G and ν [0, 1], let E (G, ν) denote ∈ the minimum integer d such that there exists a finite dimensional tensor product strategy S that achieves success probability at least ν in the game G with a state ψ whose Schmidt rank is at most d. If there is no | i finite dimensional strategy that achieves success probability ν, then define E (G, ν) to be ∞.

47 5.2 Distance measures We introduce several distance measures that are used throughout.

Definition 5.14 (Distance between states). Let ψn n N and ψn′ n N be two families of states in the {| i} ∈ {| i} ∈ same space . For some function δ : N [0, 1] we say that ψ and ψ are δ-close, denoted as H → {| ni} {| n′ i} ψ ψ , if ψ ψ 2 = O(δ(n)). (For convenience we generally leave the dependence of the | i ≈δ | ′i k| ni − | n′ ik states and δ on the indexing parameter n implicit.)

Definition 5.15 (Consistency between POVMs). Let be a finite set and µ a distribution on . Let X X ψ be a quantum state, and for all x , Ax and Bx POVMs. We write | i∈HA ⊗HB ∈X { a } { a } x x A IB IA B a ⊗ ≃δ ⊗ a on state ψ and distribution µ if | i E x x ∑ ψ Aa Bb ψ O(δ) . x µ h | ⊗ | i≤ ∼ a=b 6 In this case, we say that Ax and Bx are δ-consistent on ψ . { a } { a } | i Note that a consistent measurement according to Definition 5.9 is 0-consistent with itself, under the singleton distribution, according to Definition 5.15 (and vice-versa).

Definition 5.16 (Distance between POVMs). Let be a finite set and µ a distribution on . Let ψ X X | i∈H be a quantum state, and for all x , Mx and Nx two POVMs on . We say that Mx and Nx ∈ X { a } { a } H { a } { a } are δ-close on state ψ and under distribution µ if | i x x 2 E ∑ (Ma Na ) ψ O(δ) , x µ k − | ik ≤ ∼ a x x and we write Ma δ Na to denote this when the state ψ and distribution µ are clear from context. This distance is referred≈ to as the state-dependent distance. | i

Definition 5.17 (Distance between strategies). Let G = ( , , , , µ, D) be a nonlocal game and let X Y A B S =(ψ, A, B), S ′ =(ψ′, A′, B′) be strategies for G. For δ [0, 1] we say that S is δ-close to S ′ if the following conditions hold. ∈

1. The states ψ , ψ are states in the same Hilbert space and are δ-close. | i | ′i HA ⊗HB 2. For all x , y , we have Ax (A )x and By (B )y, with the approximations holding ∈ X ∈ Y a ≈δ ′ a b ≈δ ′ b under the distribution µ, and on either ψ or ψ . | i | ′i We record several useful facts about the consistency measure and the state-dependent distance without proof. Readers are referred to Sections 4.4 and 4.5 in [NW19] for additional discussion and proofs.

Fact 5.18 (Fact 4.13 and Fact 4.14 in [NW19]). For POVMs Ax and Bx , the following hold. { a } { a } x x x x 1. If A IB IA B then A IB IA B . a ⊗ ≃δ ⊗ a a ⊗ ≈δ ⊗ a x x x x x x 2. If A IB IA B and A and B are projective measurements, then A IB IA B . a ⊗ ≈δ ⊗ a { a } { a } a ⊗ ≃δ ⊗ a

48 x x x x x 3. If Aa IB δ IA Ba and either Aa or Ba is a projective measurement, then Aa IB δ1/2 ⊗x ≈ ⊗ { } { } ⊗ ≃ IA B . ⊗ a Fact 5.19 (Fact 4.20 in [NW19]). Let , , be finite sets, and let D be a distribution over question x x A B C pairs (x, y). Let Aa,b and Ba,b be operators whose outcomes range over the product set . { } y { } A × B Suppose a set of operators Ca,c , whose outcomes range over the product set , satisfies the condition y y { } A×C ∑ (C )†C I for all y. If Ax Bx on average over x sampled from the corresponding marginal a,c a,c a,c ≤ a,b ≈δ a,b of distribution D, then Cy Ax Cy Bx on average over (x, y) sampled from D. a,c a,b ≈δ a,c a,b Proof. Fix questions x, y and answers a , b . We have then that ∈A ∈ B y x y x 2 x x † y † y x x ∑ (Ca,c Aa,b Ca,cBa,b) ψ = ∑ ψ (Aa,b Ba,b) (Ca,c) (Ca,c)(Aa,b Ba,b) ψ (31) c − | i c h | − − | i

ψ (Ax Bx )†(Ax Bx ) ψ (32) ≤h | a,b − a,b a,b − a,b | i = (Ax Bx ) ψ 2 (33) a,b − a,b | i y † y y † y where the inequality follows from the fact that ∑c(Ca,c) Ca,c ∑a,c(Ca,c) Ca,c I. Thus we obtain the desired conclusion ≤ ≤

E y x y x 2 E x x 2 ∑ (Ca,c Aa,b Ca,cBa,b) ψ ∑ (Aa,b Ba,b) ψ δ. (x,y) D a,b,c − | i ≤ (x,y) D a,b − | i ≤ ∼ ∼

Fact 5.20. Let , denote finite sets, and let denote a set of functions g : . Let Ax , Bx X A G X →A { a } { a } be POVMs indexed by and outcomes in . Let Sx denote a set of operators such that for all x , X A { g } ∈ X ∑ (Sx)†Sx I. If Ax Bx on average over x, then Sx Ax Sx Bx . g g g ≤ a ≈δ a g g(x) ≈δ g g(x) Proof. We expand:

2 E x x x E x x † x † x x x ∑ Sg(Ag(x) Bg(x)) ψ = ∑ ψ (Ag(x) Bg(x)) (Sg) Sg(Ag(x) Bg(x)) ψ x − | i x h | − − | i g g x x † x † x x x = E ∑ ψ (Aa Ba ) ∑ (Sg) Sg (Aa Ba ) ψ x a h | − − | i  g:g(x)=a  x x † x x E ∑ ψ (Aa Ba ) (Aa Ba ) ψ ≤ x a h | − − | i x x 2 = E ∑ (Aa Ba ) ψ . x a k − | ik

x † x x † x The inequality follows from the fact that ∑g:g(x)=a(Sg) Sg ∑g(Sg) (Sg) I. The last line is at most δ by assumption, and we obtain the desired conclusion. ≤ ≤

x x Lemma 5.21. Let be a finite set. Let Aa a be a projective measurement and let Ba a be a set of A { } ∈A { } ∈A matrices. If Ax Bx, then for all subsets S , we have a ≈δ a ⊆A x x x ∑ Aa δ ∑ Aa Ba . a S ≈ a S · ∈ ∈

49 Proof. We expand:

2 2 x x x x x x E ∑ Aa Aa Ba ψ = E ∑ Aa Aa Ba ψ x x a S − · | i a S · − | i ∈   ∈   x x † x x x = E ∑ ψ (Aa Ba ) Aa (Aa Ba ) ψ x a Sh | − − | i ∈ x x 2 E ∑ (Aa Ba ) ψ . ≤ x a k − | ik In the second line we used the projectivity of Ax , and in the third line we used that Ax I. { a } a ≤ Fact 5.22 (Triangle inequality, Fact 4.28 in [NW19]). If Ax Bx and Bx Cx, then Ax Cx. a ≈δ a a ≈ǫ a a ≈δ+ǫ a + x x x Fact 5.23 (Triangle inequality for “ ”, Proposition 4.29 in [JNV 20]). If Aa IB ε IA Ba , Ca IB δ x x x≃ x x ⊗ ≃ ⊗ ⊗ ≃ IA B , and C IB IA D , then A IB IA D . ⊗ a a ⊗ ≃γ ⊗ a a ⊗ ≃ε+2√δ+γ ⊗ a x x x Fact 5.24 (Data processing, Fact 4.26 in [NW19]). Suppose A IB IA B . Then A IB a ⊗ ≃δ ⊗ a [ f ( )=b] ⊗ ≃δ x . · IA B[ f ( )=b] ⊗ · The state-dependent distance is the right tool for reasoning about the closeness of measurement operators in a strategy. The following lemma ensures that, when two families of measurements are close on a state, changing from one family of measurement to the other only introduces a small error to the value of the strategy. Lemma 5.25. Let Ax , Bx , Cx be POVMs. Suppose Bx is projective, and { a, b} { a, b, c} { a, c} { a, b, c} x x A IB IA B , a, b ⊗ ≈δ ⊗ a, b x x C IB IA B . a, c ⊗ ≈δ ⊗ a, c Then the following approximate commutation relation holds:

x x [A , C ] IB 0. a, b a, c ⊗ ≈δ x x x Proof. Applying Fact 5.19 to C IB IA B and A IB , we have a, c ⊗ ≈δ ⊗ a, c { a,b ⊗ } x x x x A C IB A B . (34) a, b a, c ⊗ ≈δ a, b ⊗ a, c x x x x Similarly, applying Fact 5.19 to Aa, b IB δ IA Ba, b and IA Ba,c , and using the fact that Ba,b,c is projective, we have ⊗ ≈ ⊗ { ⊗ } { }

x x x x A B δ IA B B a, b ⊗ a, c ≈ ⊗ a, c a, b (35) x = IA B . ⊗ a, b, c Combining Equations (34) and (35), we have

x x x A C IB IA B . (36) a, b a, c ⊗ ≈δ ⊗ a, b, c A similar argument gives x x x C A IB IA B . (37) a, c a, b ⊗ ≈δ ⊗ a, b, c The claim follows from Equations (36) and (37).

50 The following lemma is a slightly modified version of [NW19, Fact 4.34].

Lemma 5.26. Let k 0 be an integer and let ε > 0. Let be a finite set and µ a distribution over . ≥ X i, x X For each 1 i k let i be a set of functions gi : i and for each x let Gg g be a ≤ ≤ G Y → R ∈ X { } ∈Gi projective measurement. Suppose that for all i 1,..., k , satisfies the following property: for any two ∈ { } Gi g = g , the probability that g (y)= g (y) over a uniformly random y is at most ε. i 6 i′ ∈ Gi i i′ ∈ Y Let Ax be a projective measurement with outcomes (g ,..., g ) . For each g1, g2,..., gk 1 k ∈ G1 ×···×Gk 1 i k, suppose that on average over x µ and y sampled uniformly at random, ≤ ≤ ∼ ∈ Y x i, x A IB IA G . (38) [evaly( )i=ai] δ [eval ( )=a ] · ⊗ ≃ ⊗ y · i Define the POVM family Cx , for x , by { g1, g2,..., gk } ∈X Cx = Gk, x G2, x G1, x G2, x Gk, x . g1, g2,..., gk gk · · · g2 g1 g2 · · · gk Then on average over x µ and y sampled uniformly at random, ∼ ∈ Y x x A[eval ( )=(a , a ,..., a )] IB k(δ+ε)1/2 IA C[eval ( )=(a , a ,..., a )] . (39) y · 1 2 k ⊗ ≃ ⊗ y · 1 2 k Proof. The proof is identical to the one given in [NW19, Fact 4.34], with the only modification needed to insert the dependence on x for all measurements considered.

Lemma 5.27 (Fact 4.35 in [NW19]). Let be a distribution on (x, y1, y2) 1 2. For i 1, 2 D x ∈X×Y × Y ∈ { } let i be a collection of functions gi : i i and let (Gi)g g be families of measurements such that G Y → R { } ∈Gi (G )x is projective for every x. Suppose further that for every (x, y ) it holds that for g = g { 2 g}g 1 2 6 2′ ∈ G2 the probability, on average over y2 chosen from conditioned on (x, y1), that g2(y2)= g2′ (y2) is at most x,y1,y2 D η. Let A be a family of projective measurements with outcomes (a , a ) such that for { a1,a2 } 1 2 ∈ R1 × R2 i 1, 2 , ∈ { } x,y1,y2 x (40) Aai I δ I (Gi)[eval ( )=a ] ⊗ ≃ ⊗ yi · i and Ax,y1,y2 I I Ax,y1,y2 . (41) a1,a2 ⊗ ≃δ ⊗ a1,a2 Define a family of measurements Jx as { g1,g2 } x x x x (42) Jg1,g2 = (G2)g2 (G1)g1 (G2)g2 . Then there is a δpasting = δpasting(η, δ) = poly(η, δ) (43) such that x,y1,y2 x Aa1,a2 I δp I J[eval ( )=a ,eval ( )=a ] . (44) ⊗ ≃ ⊗ y1 · 1 y2 · 2 5.3 Normal form verifiers We introduce a normal form for verifiers in nonlocal games. The normal form uses Turing machines to specify the two actions performed by the verifier in a game: the generation of questions and the verification of answers. For the generation of questions, we use the formalism of samplers introduced in Section 4.2. The normal form for verifiers gives a uniform method to specify an infinite family of nonlocal games.

51 Definition 5.28 (Decider). A decider is a 5-input Turing machine that on all inputs of the form (n, x, y, a, b) D where n is an integer and x, y, a, b 0, 1 ∗, halts and returns a single bit. Let TIME (n) denote the ∈ { } D D maximum time complexity of over all inputs of the form (n, x, y, a, b). When the decider outputs 0 we D D say that it rejects, otherwise we say that it accepts. Furthermore, we call the input n to a decider the index.

Definition 5.29. A normal form verifier is a pair =( , ) where is a sampler with field size q(n)= 2 V S D S and is a decider. The description length of is defined to be = max , , the maximum of the descriptionD lengths of and . The numberV of levels of verifier|V| is defined{|S to| be|D the|} number of levels of its sampler . S D V S Next we introduce a notion of bounded normal form verifiers, in which a single parameter (in this case, an integer λ) specifies a bound on the time complexity of , as well as on the description length of the verifier. V

Definition 5.30 (λ-bounded verifiers). Let λ N be an integer. A normal form verifier = ( , ) is ∈ V S D λ-bounded if the following two conditions hold

1. The time complexity bounds of the verifier TIME (n), TIME (n) are at most nλ for n 2.24 S D ≥ 2. The description length of the verifier is bounded by λ. |V| Normal form verifiers specify an infinite family of nonlocal games indexed by natural numbers in the following way.

Definition 5.31. Let = ( , ) be a normal form verifier. For n N, we define the following nonlocal V S D ∈ TIME (n) game n to be the n-th game corresponding to the verifier . The question sets and are 0, 1 S . V TIME (n) V X Y { } The answer sets and are 0, 1 D . The question distribution is the distribution µ , n specified in A B { } S Definition 4.14. The decision predicate is the function computed by (n, , , , ), when the last four inputs D · · · · are restricted to . The value of the game is denoted by val∗( ). X×Y×A×B Vn We note that the game n is well-defined since for a normal form verifier the distribution µ , n is sup- V S ported on 0, 1 TIME (n) 0, 1 TIME (n) and a normal form decider always halts with a single-bit output. { } S × { } S Definition 5.32 (Verifier with commuting strategy). Let = ( , ) be a normal form verifier. For v : V S D N [0, 1] say that has a value-v commuting strategy if for all n N, the game n has a value-v(n) commuting→ strategy. V ∈ V

24We do not require the bound to hold for n = 1 as nλ is always 1 and hence it is usually not satisfied.

52 6 Types

We augment the definition of conditionally linear functions with a construct we call types. A type t is an element of a type set , and a -typed family of conditionally linear functions is a collection Lt t T T { } ∈T containing a CL function Lt for each type t . The utility of this definition is that it allows us to ∈ T define another object, namely conditionally linear distributions parameterized by an undirected graph G = ( , E) on the set of types known as a type graph. Given two -typed families of conditionally linear T T functions Lu u , Rv v , the ( , G)-typed conditionally linear distribution corresponding to them is { } ∈T { } ∈T T the distribution which samples a pair of types (u, v) uniformly at random from the edges of G (with each endpoint having equal probability as being chosen for u or v, respectively) and then samples (x, y) from

µLu, Rv . The output is the pair ((u, x), (v, y)). The normal form verifiers we present in the paper frequently use typed CL distributions to sample their questions, rather than untyped CL distributions. Types allow us to model the parts of their question distributions which are unstructured and unsuitable for being sampled from CL distributions. A common use of types is to allow the verifier to use previously defined games as subroutines. Here, the type helps indicate which subroutine the verifier selects, and an edge in the type graph between two different types allows us to introduce a test that cross-checks the results of one subroutine with the results of another. Finally, we show how to convert any typed CL distribution into an equivalent (in the precise sense defined below) untyped CL distribution with two additional levels, a technique we call detyping. This entails showing how to “simulate” the graph distribution of G =( , E), i.e. the uniform distribution on its edges, using an untyped CL distribution. The simulation we give isT based on rejection sampling and is only approximate: its quality degrades exponentially with the number of types in . As a result, we will ensure throughout the paper that all type sets we consider are of a small, in fact generallyT constant, size. This section is organized as follows. In Section 6.1 we define typed variants of CL distributions, sam- plers, deciders, and verifiers. In Section 6.2 we define a CL distribution which samples from the graph distribution of a given graph G = ( , E). In Section 6.3 we define a canonical way to detype typed sam- plers, deciders, and verifiers using theT graph sampler from Section 6.2. We then prove the main result of the section, Lemma 6.18, which relates the value of the detyped normal form verifier to the value of the original typed verifier.

6.1 Typed samplers, deciders, and verifiers Definition 6.1 (Typed conditionally linear functions). Let be a finite set and V be Fn for some integer T n 0.A -typed family of ℓ-level conditionally linear functions (implicitly, on V) is a collection Lt t ≥ T { } ∈T such that, for each t , Lt is an ℓ-level conditionally linear function on V. ∈T Definition 6.2 (Graph distribution). Let G = (U, E) be an undirected graph with vertex set U and edge set E. Edges in E are written as multisets u, v of two vertices; the case u = v represents a self-loop. { } Suppose there are m edges, k of which are self-loops. Then the graph distribution µG of G is the distribution over U U such that for every (u, v) U U, × ∈ × 1/(2m k) if u, v E, µG(u, v)= − { } ∈ (0 otherwise.

This is identical to the uniform distribution over pairs (u, v) U U such that u, v E. ∈ × { } ∈

53 Definition 6.3 (Typed conditionally linear distributions). Let be a type set and L = Lu u , R = T { } ∈T Rv v be -typed families of conditionally linear functions on V. Let G =( , E) be a graph with vertex { } ∈T T T set . The ( , G)-typed conditionally linear distribution µG corresponding to (L, R) is the distribution T T L, R over pairs ((u, x), (v, y)), where (u, v) is drawn from µG and (x, y) is drawn from µLu, Rv . Definition 6.4 (Typed conditionally linear samplers). Let q : N N be an admissible field size function → and s : N N be a function. Let be a finite type set. A 6-input Turing machine is a -typed, ℓ-level → T S T conditionally linear sampler with field size q(n) and dimension s(n) if for all n N, letting q = q(n) and ∈ A, n B, n s = s(n), there exist -typed families of ℓ-level conditionally linear functions Lt t and Lt t Fs T w{, n } ∈T { } ∈T on V = q where t , w A, B , the conditionally linear function Lt has marginal functions w, n ∈ T w, n∈ { } Lt, j and factor spaces Vt, j, u satisfying the conditions of Lemma 4.4, and for all t , w A, B , { ≤ } { } ∈T ∈ { } j 1,..., ℓ , and z V: ∈ { } ∈ On input (n, DIMENSION), the sampler returns the dimension s(n). • S MARGINAL w, n On input (n, w, , j, z, t), the sampler returns the binary representation of Lt, j(z). • S ≤ w n On input (n, w, LINEAR, j, u, y, t), the sampler outputs the binary representation of L , (y), • S t, j, u On input (n, w, FACTOR, j, u, t), the sampler returns the factor space Vw, n of Lw, n, represented as • S t, j, u t an indicator vector in 0, 1 s. { } ( ) We call Fs n the ambient space of . We call LA, n , LB, n the CL functions of on index n. The time q(n) S { t } { t } S complexity of , denoted TIME (n), is the number of steps before halts for index n. S S S We assume that types t are represented using binary strings of length at most log ; if a type ∈ T ⌈ |T |⌉ t is given as input to the sampler and is not an element of , then the sampler returns 0. Furthermore, as described in Remark 4.13 forS un-typed samplers, we writeT typed samplers with different numbers of arguments depending on the input.

Definition 6.5 (Distribution of a typed sampler). Let be a -typed sampler and G = ( , E) be a graph. w w S T T Let L = L t for w A, B be the CL functions of on index n. The distribution of sampler with { t } ∈ { } S S graph G on index n, denoted µG , is the ( , G)-typed conditionally linear distribution corresponding to , n T (LA, LB). S Definition 6.6 (Downsizing typed CL samplers). Let be a typed sampler. The typed downsized sampler S κ( ) is defined as in Definition 4.15 with the only difference that the type t is included as part of the input toS the sampler, as in Definition 6.4.

Lemma 6.7. Let be a -typed ℓ-level CL sampler, for some finite set and integer ℓ 0. Let q(n) and S T T ≥ s(n) be as in Definition 6.4. Then κ( ) defined in Definition 6.6 is a -typed ℓ-level CL sampler with field S T size 2, dimension s(n) log q(n), and time complexity

TIMEκ( )(n)= O(TIME (n) log q(n)) . S S w,n κ Furthermore, for every integer n 1, the CL functions of κ( ) on index n are (Lt ) w A,B ,t , as defined in Definition 4.8. ≥ S { } ∈{ } ∈T

Proof. The proof is analogous to the proof of Lemma 4.16, and we omit it.

54 Definition 6.8 (Typed decider). A typed decider is a 7-input Turing machine that on all inputs of the form D (n, u, x, v, y, a, b) where n is an integer and u, x, v, y, a, b 0, 1 , halts and returns a single bit. When ∈ { }∗ D returns 0 we say that it rejects, otherwise we say that it accepts. We use TIME (n) to denote the time D D complexity of on inputs of the form (n,...). D Definition 6.9. Let be a finite set and let G = ( , E) be a graph. A ( , G)-typed normal form verifier T T T is a pair =( , ) where is a -typed sampler with field size q(n)= 2 and is a typed decider. V S D S T D Definition 6.10. Let = ( , ) be a ( , G)-typed normal form verifier. For n N, we define the V S D T ∈ following nonlocal game n to be the n-th game corresponding to the verifier . The question sets TIMEV (n) TIME (n) V X and are 0, 1 S . The answer sets and are 0, 1 D . The question distribution Y T × { G} A B { } is the distribution µ , n specified in Definition 6.5. The decision predicate is the function computed by (n, , , , ), when theS last four inputs are restricted to . The value of the game is denoted D · · · · X×Y×A×B by val∗( ). Vn For w A, B and a question (u, x) to player w we refer to u as the question type and x as the question content. ∈ { }

6.2 Graph distributions We describe a construction of conditionally linear distributions which sample from the graph distribution (see Definition 6.2) of a graph G =(U, E). We begin with a technical definition, followed by the definition of the conditionally linear distribution.

Definition 6.11 (Neighbor indicator). Given a graph G =(U, E), the neighbor indicator of a vertex u U ∈ is the vector neigh (u) FU in which, for all v U, G ∈ 2 ∈ 1 if u, v E, neighG(u)v = { } ∈ (0 otherwise.

In addition, the F -encoding of a vertex u U is the vector enc (u) FU FU given by enc (u) = 2 ∈ G ∈ 2 × 2 G (eu, neighG(u)), where eu is the standard basis vector with a 1 in the u-th position and 0’s everywhere else. Definition 6.12 (Graph sampler). Let G =(U, E) be a graph with n vertices. Then the conditionally linear A B functions corresponding to G are the pair of functions LG, LG on linear space VG specified in Fig. 1 where V = VVA VNA VVB VNB. G ⊕ ⊕ ⊕ These conditionally linear functions do not simulate the graph distribution in the sense of sampling directly from it. The following proposition, however, does show a sense in which these functions simulate the graph distribution, namely via rejection sampling.

Proposition 6.13 (Simulating the graph distribution). Let G = (U, E) be a graph with n vertices and m A B edges, k of which are self-loops. Let LG, LG be the conditionally linear functions corresponding to G (see Figure 1 for the definition and associated notation). Let (x, y) µLA, LB . Consider the event G that there ∼ G G E exists u, v U such that the following two statements are true. ∈ VVA VNA VVB VNB (i) x ⊕ = encG(u) and y ⊕ = encG(v),

VNB VNA (ii) (x )u =(y )v = 1.

55 Subspaces

VVA VNA VVB VNB FU FU FU FU 2 2 2 2

A Conditionally linear function LG

1st factor subspace VVA VNA 1st linear function Identity⊕ function

2nd factor subspace VVB VNB ⊕ 2nd linear functions For all x VVA VNA, suppose there exists a u U such that ∈ ⊕ A ∈ x = encG(u). Then for all y VVB VNB, LG, 2, x zeroes out all VNB ∈ ⊕ A entries of y except for (y )u. Otherwise, LG, 2, x = 0.

B Conditionally linear function LG

1st factor subspace VVB VNB 1st linear function Identity⊕ function

2nd factor subspace VVA VNA ⊕ A 2nd linear functions Similarly defined as those for LG by swapping VVA and VNA with VVB and VNB respectively.

Figure 1: Specification of the conditionally linear functions corresponding to G.

Then

1. Pr ( )=(2m k)/16n. x,y EG − 2. Conditioned on , (u, v) are distributed as the graph distribution of G (see Definition 6.2). EG Note that occurs if and only if both xVNB and yVNA are nonzero. In particular, if x and y are sampled EG from µ A B and given to the respective players, then at least one of them knows when the event G does not LG, LG occur. E

A Proof. Let z be drawn uniformly at random from VVA VNA VVB VNB, and let x = L (z) and ⊕ ⊕ ⊕ G y = LB (z). Then with probability n2/16n, there exist u, v U such that G ∈

VVA VNA VVB VNB x ⊕ = encG(u) and y ⊕ = encG(v) .

Conditioned on this occurring, u and v are distributed as independent, uniformly random vertices in U. If we further condition on u, v E, which occurs with probability (2m k)/n2, then by definition, (u, v) { } ∈ − is distributed as the graph distribution of G. But this event is exactly the event that G holds on (x, y), establishing the proposition. E

56 6.3 Detyping typed verifiers We give a canonical method for taking a typed normal form verifier and producing an untyped normal form verifier which simulates it. Throughout this section, denotes a finite set, G = ( , E) denotes a A B T T graph, and LG, LG denote the conditionally linear functions corresponding to G acting on the vector space V = VVA VNA VVB VNB of dimension 4 over F , as in Definition 6.12. G ⊕ ⊕ ⊕ ·|T | 2 Definition 6.14 (Detyped CL functions). Let LA = LA , LB = LB be -typed families of ℓ-level { t } { t } T conditionally linear functions on V. We define the detyped CL functions corresponding to (LA, LB) on G to A B A B be the pair of (ℓ + 2)-level CL functions (R , R )= Detype (L , L ) on linear space VDETYPE = V V G G ⊕ as follows. For w A, B , and z Lw(V ), define the family of ℓ-level CL functions Lw on V as ∈ { } ∈ G G { z } 0 if zVNw = 0, Lw = z w VVw (Lt otherwise, for z = et.

V V We note that when z Nw is nonzero, it is always the case that z Vw = et for some type t, by Definition 6.12. For w A, B , Rw is the concatenation of Lw and Lw (cf. Lemma 4.5). ∈ { } G { z }z Definition 6.15 (Detyped samplers). Let be a -typed sampler. For each n N, let LA, n , LB, n be S T ∈ { t } { t } the CL functions of with graph G on index n, and set (RA, n, RB, n) = Detype (LA, n, LB, n). Then the S G detyped sampler Detype ( ) is the (standard) sampler whose CL functions on index n are RA, n, RB, n. Its G S dimension function is sDETYPE(n)= 4 + s(n). |T | Definition 6.16 (Detyped deciders). Let be a typed decider. We define the detyped decider Detype ( ) D G D to be the (standard) decider that behaves as follows: on input (n, x, y, a, b), it attempts to parse x =(x′, x′′), y = (y , y ) V 0, 1 (using a canonical scheme for representing pairs of strings). If it cannot, it ′ ′′ ∈ G × { }∗ accepts. Otherwise, suppose that there exists u, v E such that, using notation from Definition 6.11, { } ∈ G

x′ =(eu, neigh (u), 0, eu) , y′ =(0, ev, ev, neigh (v)) VVA VNA VVB VNB . G G ∈ ⊕ ⊕ ⊕ Then it returns the output of on input (n, u, x , v, y , a, b). Otherwise, it accepts. D ′′ ′′ Definition 6.17 (Detyped verifiers). Let =( , ) be a ( , G)-typed normal form verifier. We define the V S D T detyped verifier, denoted by Detype( ), to be the (standard) normal form verifier (Detype ( ), Detype ( )). V G S G D Lemma 6.18 (Typed verifiers to detyped verifiers). Let =( , ) be a ( , G)-typed normal form verifier. V S D T The detyped verifier Detype( )=(Detype ( ), Detype ( )) satisfies the following properties: for all V G S G D n N, ∈ 1. (Completeness) If has a value-1 PCC strategy, then Detype( ) has a value-1 PCC strategy. Vn V n 2. (Soundness) If val∗(Detype( ) ) 1 ε, then val∗( ) 1 16 ε. Furthermore, V n ≥ − Vn ≥ − |T | ·

E (Detype( ) , 1 ε) E ( , 1 16|T | ε). V n − ≥ Vn − · ℓ ℓ 3. (Sampler parameters) If is an -level sampler, then DetypeG( ) is an ( + 2)-level sampler. The time complexity of DetypeS ( ) satisfies the following: S G S

TIMEDetype ( )(n)= poly( , TIME (n)). G S |T | S 57 4. (Decider complexity) The decider DetypeG( ) has time complexity poly( , TIME (n)). D |T | D 5. (Efficient computability) The descriptions of Detype ( ) and Detype ( ) are polynomial time G S G D computable from the description of G and the descriptions of and , respectively. S D Proof. Throughout this proof, we fix an index n. Let s = s(n) be the dimension of . The ambient space Fs S of is V = 2 and the ambient space of DetypeG( ) is VDETYPE = VG V. Let u, v . For this proof, weS introduce the notation S ⊕ ∈T

A B view (u)=(eu, neigh (u), 0, eu), view (v)=(0, ev, ev, neigh (v)) VVA VNA VVB VNB . G G ∈ ⊕ ⊕ ⊕

Supposing that players A and B receive x and y in VDETYPE, and supposing that (x, y) satisfies event G A B E from Proposition 6.13, then xVG = view (u) and yVG = view (v) for some u, v E. { } ∈

Completeness. Let S = ( ψ , A, B) be a value-1 PCC strategy for n. We construct a PCC strategy DETYPE | i V S for Detype( )n with value 1. This strategy also uses the state ψ . When a player receives a question, they perform measurementsV described as follows. | i

VG A Player A: given x VDETYPE the player checks if for some u , x = view (u). If so, they perform the measurement∈ ∈ T (u, xV ) A a to obtain an outcome a, which they use as theirn answer.o If not, they reply with the empty string. (This entails performing the measurement whose POVM element corresponding to the empty string is the identity matrix.)

VG B Player B: given y VDETYPE, the player checks if for some v , y = view (v). If so, they perform the measurement∈ ∈ T (v, yV ) B b n o to obtain an outcome b, which they use as their answer. If not, they reply with the empty string.

This strategy is projective and consistent because the only measurements it uses are those in S and “trivial” measurements containing the identity matrix. Suppose the players receive questions x and y such that both xVG = viewA(u) and yVG = viewB(v). In this case, the questions (u, xV ) and (v, yV ) are in the support of the question distribution of . As a result, the players succeed with probability 1 on these questions, V and their measurements always commute. For the remaining pairs of questions, the decider DetypeG( ) always accepts, and the measurements always commute by virtue of the fact that at least one is trivial,D i.e. containing the identity matrix as a POVM element.

Soundness. Let S =( ψ , A, B) be a strategy for Detype( )n with value 1 ε. Suppose G has m edges, of which are self-loops.| Fori any ( ) drawn from V , the decider−Detype ( ) automatically k x, y µDetypeG ( ), n G V V S D accepts unless (x G , y G ) satisfies event G from Proposition 6.13, which occurs with probability (2m V EV A B − k)/16|T |. When this happens, x G and y G are distributed as view (u) and view (v), where (u, v) are distributed as the graph distribution on G. As a result, conditioned on , the probability that S succeeds EG on Detype( ) is equal to the probability that the strategy S =( ψ , A , B ) succeeds on , where V n ′ | i ′ ′ Vn A B u, xV (view (u),xV) v, yV (view (v),yV) (A′)a = Aa , (B′)b = Bb .

58 This means that 2m k 2m k val∗(Detype( )n, S )= 1 − + − val∗( n, S ′) V − 16 16 · V  |T |  |T | 1 1 1 + val∗( n, S ′) . ≤ − 16 16 · V  |T |  |T | Thus, S has value at least 1 16 ǫ. This proves the first statement in the soundness. As for the second, ′ − |T | · S and S use the same state ψ , and therefore both strategies have the same Schmidt rank, which by ′ | i definition is at least E ( , 1 16 ε). Vn − |T | · Complexity. Definition 6.14 implies that Detype ( ) is an (ℓ + 2)-level sampler by Lemma 4.5 and G S that VDETYPE has dimension 4 + s. The claimed time bounds of Detype ( ) and Detype ( ) follow |T | G S G D from the fact that these perform simple, poly( )-time computations followed by running and as subroutines. |T | S D

59 7 Classical and Quantum Low-degree Tests

In this section we introduce the classical and quantum low individual degree tests, that form the basis of our classical, quantum-sound and quantum PCP constructions. For quantum soundness of the classical test we refer to [JNV+20]. For the quantum test, we refer to [NV18a] and its adaptation to the low individual degree test given in Appendix A. The protocol in this work combines both of these tests, using the quantum low individual degree test for question reduction and the classical low individual degree test for answer reduction. Section 7.1 below introduces the classical test, and Section 7.3 does the same for the quantum test. Prior to doing this, we introduce the Magic Square game in Section 7.2, a key subroutine in the quantum test.

7.1 The classical low-degree test We begin with a generalization of the classical low-degree test known as the “simultaneous individual low- degree test”. We sometimes refer to this as the “classical low-degree test” for short. The low-degree test is used as a subroutine in the Pauli Basis test (see Section 7.3) as well as the answer-reduction normal form verifier (see Section 10). We describe the test as a nonlocal game in Section 7.1.1 and show that the question distribution can be implemented as a (typed) conditionally linear distribution (see Section 6 for the definition of typed CL distributions).

7.1.1 The game The game GLD is parametrized by a tuple ldparams = (q, m, d, k) where m, d, k N are integers, q N GLD ∈ ∈ is an admissible field size, and m divides q. We sometimes write ldparams to emphasize the dependence of the classical low-degree test on the parameter tuple ldparams. We first provide a high-level description of the game for k = 1. It is based on the low-individual degree variant of the multilinearity test of [BFL91], where one player (called the “points player”) receives a random point u Fm, and the other player (called the “lines player”) receives a random axis-parallel line ∈ q ℓ that contains u, i.e. the line parallel to the i-th axis which goes through point u, for uniformly random i 1,..., m . The points player is supposed to respond with a value a F , and the lines player is ∈ { } ∈ q supposed to respond with a univariate polynomial fℓ of degree at most d such that fℓ(ui) = a, where ui is the i-th coordinate of u. Suppose the players agree beforehand on a global polynomial g : Fm F such that every variable q → q has degree at most d. Then a winning strategy is the following: the points player responds with g(u), and the lines player responds with the univariate polynomial fℓ that is the restriction of g to line ℓ; formally, for t F , ∈ q fℓ(t)= g(u1, u2,..., ui 1, t, ui+1,..., um) . − It is easy to see that fℓ(t) has degree at most d in the variable t. Thus in this case the players will pass the low-degree test with probability 1. The low-degree testing theorem states that the converse approximately holds: if the players pass the low-degree test with probability close to 1, then their responses must be approximately consistent (in some sense) with a global polynomial g with individual degree d. When k > 1, the players are supposed to respond with a tuple of answers; and the test is intended to check that the players’ responses are consistent with a k-tuple of functions g : Fm F that are i q → q polynomials where every variable has degree at most d. The game that we use is slightly more elaborate than the “axis-parallel lines-versus-points” test just described. In addition, we have two other subtests. One is a “diagonal line-vs-points” test: here, the points

60 player receives a point u Fm as before, but the lines player receives a line ℓ that goes through u but is ∈ q not necessarily axis-aligned. Instead, it’s chosen by first picking an index i 1,2,..., m uniformly at Fm ∈ { } random, picking v q uniformly at random, and letting v′ = πi 1(v)=(0,0,...,0, vi, vi+1,..., vm) ∈ − (i.e., πi 1 zeroes out the first i 1 coordinates of its input). Then, the line is defined as the set − − ℓ = u + tv′ : t F , { ∈ q} Fm which is a uniformly random line in the affine subspace of q corresponding to all points v whose first (i 1) coordinates match those of u. The points player responds with a value a Fq, the lines player − 25 ∈ responds with a univariate polynomial fℓ of degree md, and the verifier checks that fℓ(tu) = a (where tu Fq is a projection of u onto the line ℓ with respect to a canonical parameterization of ℓ). ∈The other additional test is a “consistency test”, where both players receive either the same points ques- tion, the same axis-parallel lines question, or the same diagonal lines question. In all cases they are expected to respond with the same answer.

Axis-parallel lines and diagonal lines. Before presenting a formal description of the game GLD, we first define what we mean by lines, and their properties. Definition 7.1 (Lines). A line ℓ in Fm specified by a pair (u, v) (Fm)2 is the subset q ∈ q ℓ(u, v)= u + tv : t F Fm . (45) ∈ q ⊆ q We call the vector v a direction of line ℓ(u, v).26 A line ℓ(u, v) is axis-parallel if there is a coordinate i 1,2,..., m for which v is equal to ei, the Fm ℓ∈ { } ℓ i-th elementary basis vector in q . We say that such a line is parallel to the i-th direction. A line (u, v) is diagonal if there exists an i 0,1,..., m 1 for which v = v = = v = 0. ∈ { − } 1 2 · · · i The following Proposition establishes that, for a fixed direction, a line ℓ is an equivalence class of points on the line. Proposition 7.2. Let u, v Fm. Then for all u ℓ(u, v), we have that ℓ(u, v)= ℓ(u , v). ∈ q ′ ∈ ′ Proof. Fix u ℓ(u, v). We have that u = u + tv for some t F . Let a ℓ(u, v). This means that ′ ∈ ′ ∈ q ∈ a = u + sv for some s F . We also have that a = u +(s t + t)v = u + tv +(s t)v = u +(s t)v, ∈ q − − ′ − so ℓ(u, v) ℓ(u , v). A similar argument shows that ℓ(u , v) ℓ(u, v). ⊆ ′ ′ ⊆ We now specify a canonical representation of lines ℓ as pairs (u0, v) where u0 is a canonical representa- tive of ℓ specified as follows. Let LLN : Fm Fm denote the canonical linear map with the one-dimensional v q → q kernel basis v (see Definition 3.12 for a definition of canonical linear map). Note that for all u ℓ, we L{N } ℓ LN ∈ have that Lv (u) is a point in . This is because since the image and kernel of Lv are complementary Fm subspaces by definition (see Definition 3.12), any u q has a unique decomposition u = u0 + u1 where LN LN ∈ u0 is in the image of Lv and u1 ker(Lv ), which by definition is the subspace spanned by v. Thus F ∈ ℓ ℓ LN ℓ u1 = tv for some t q and we have that u0 = u tv . In particular, if u then u0 = Lv (u) ∈ LN − ∈ ∈ LN ∈ as well. Furthermore, since Lv is a projection onto its image by definition, we have Lv (u0)= u0. ℓ F LN LN Next, note that for any u, u′ , since u u′ = sv for some s q it follows that Lv (u)= Lv (u′). ∈ LN − ∈ Thus u0 = u0′ and this shows that Lv gives a means to compute a canonical representative of lines. 25Since ℓ is no longer axis-parallel, the restriction of an individual degree-d polynomial g to a line ℓ may yield a degree md polynomial. 26For convenience we explicitly allow v = 0, in which case the line is reduced to a singleton.

61 Definition 7.3 (Canonical representative of a line). Let ℓ be a line in direction v. A canonical representative ℓ Fm of is the point u0 q such that ∈ LN u0 = Lv (u) for all u ℓ. ∈ Question distribution. We now present the question distribution of the game GLD as a typed CL distribu- tion (see Section 6 for the definition of typed CL functions and typed CL distributions). The type set LD of the game GLD we use is T LD = POINT, ALINE, DLINE . T { } The distribution over question types is uniform over LD LD. T ×T We introduce the corresponding typed CL functions LPOINT, LALINE, LDLINE where the corresponding CL distributions µLALINE ,LPOINT and µLDLINE ,LPOINT implement the axis line-versus-point and diagonal line-versus- point distributions, respectively. The functions are parametrized by a field size q, a dimension m, and three complementary register subspaces VX, VI, VV of some ambient space. The register VX (which is isomorphic Fm F to q ) is called the point register, VI (which is isomorphic to q) is called the coordinate register, and VV Fm (which is isomorphic to q ) is called the direction register, respectively. We let V denote the direct sum m m VX VI VV. We identify elements of V as triples (u, s, v) F F F . ⊕ ⊕ ∈ q × q × q Define LPOINT to be the 1-level CL function that projects onto VX: for every (u, s, v) V, ∈

LPOINT(u, s, v)=(u, 0, 0) . (46)

Define LALINE to be the following 2-level CL function: forall (u, s, v) V, ∈ ( )=( LN( ) ) (47) LALINE u, s, v Lei u , s, 0 where LLN is the linear function used to compute the canonical representative of lines (see Definition 7.3), and i = χ(s) where we define χ(s) to be the unique integer such that q s =(χ(s) 1) + r (48) − m for some 0 r < q/m, and we interpret the field element s F as an integer between 0 and q 1). The ≤ ∈ q − function LALINE is the concatenation of the 1-level CL function that projects the subspaces VI VV down LN ⊕ to VI (i.e. zeroes out the entire VV register), and the family of 1-level CL functions L acting on VX, { ei } indexed by i 1,2,..., m . Thus LALINE is a 2-level CL function. ∈ { } Define LDLINE to be the following 3-level CL function: for all (u, s, v) V, ∈ LN LDLINE(u, s, v)=(L (u), s, π (v)) (49) πi 1(v) i 1 − − where i = χ(s). The function LDLINE is the concatenation of the following CL functions (see Lemma 4.5 for definition of CL function concatenation): (i) the identity function on VI (which is a 1-level CL function); (ii) for every i 1,2,..., m the 1-level CL function on VV that maps v πi 1(v) (i.e. it zeroes out the ∈ { } 7→ − LN first i 1 coordinates of v), and (iii) for every pair (s, v′) VI VV, the 1-level CL function L acting on − ∈ ⊕ v′ VX. Thus LDLINE is a 3-level CL function.

The following two lemmas establish that the CL functions LPOINT, LALINE, LDLINE give rise to the axis- parallel line-versus-point and diagonal line-versus-point distributions.

62 Lemma 7.4. Consider the pair (u, ℓ) sampled in the following way: first, sample the tuple ((u, 0, 0), (u0, s, 0)) ℓ ℓ from the CL distribution µLPOINT ,LALINE , and let = (u0, ei) where i = χ(s). Then u is a uniformly random point in Fm and ℓ is a line that goes through u and is parallel to a uniformly random axis i 1,2,..., m . q ∈ { } Proof. Since s is uniformly random in Fq and m divides q by assumption, we have that i is uniformly random between 1,2,..., m and therefore (u , s) specifies a uniformly random axis-parallel line ℓ(u , e ). Since { } 0 0 i u = LLN(u), we have that u ℓ(u , e ). 0 ei ∈ 0 i Lemma 7.5. Consider the pair (u, ℓ) sampled in the following way: first, sample the tuple ((u, 0, 0), (u0, s, v)) ℓ ℓ from the CL distribution µLPOINT ,LDLINE , and let = (u0, v) where i = χ(s). Then u is a uniformly random point in Fm and ℓ is a random diagonal line that goes through u and shares the first i 1 coordinates with q − u, for uniformly random i. Proof. The proof of this follows identically to Lemma 7.4.

For later convenience we make the following definition.

Definition 7.6 (Line-point distributions). The axis line-point distribution DALINE is the distribution µLALINE ,LPOINT over pairs (ℓ = (u0, ei), u) (we omit the 0 elements for simplicity). The diagonal line-point distribution is ℓ the distribution µLDLINE ,LPOINT over ( =(u0, s, v), u). The line-point distribution DLINE is the equal mixture of the axis-parallel line-point distribution and the diagonal line-point distribution.

Decision procedure The decision procedure LD for the game GLD is presented in Figure 2. The table at the top specifies a parsing scheme for the questionsD and answers, depending on the type of question. For example, when a player receives a question with type POINT, the question content x, a bit string of length m log q, should be interpreted by the decision procedure and the players as an element of the vector space Fm q , as indicated in Section 3.3.2. Similarly the answer to a question with type POINT is expected to be a bit Fk string of length k log q, and is interpreted as an element of q. For questions with type ALINE, the question Fm F content is a (m log q + log q)-bit string, which is parsed as a pair (u0, s) q q, which in turn can ℓ Fm ∈ × be interpreted as a specification for an axis-parallel line (u0, ei) in q where i = χ(s), as described in Lemma 7.4. The answer is interpreted as the description of k degree-d univariate polynomials defined on the line ℓ(u0, ei), each polynomial given by a list of d + 1 coefficients. For questions with type DLINE, the question content is a (2m log q + log q)-bit string, which is parsed as a triple (u , s, v) Fm F Fm 0 ∈ q × q × q that specifies a diagonal line ℓ(u0, v′) where v′ = πi 1(v) as described in Lemma 7.5. The answer is − interpreted as the description of k degree-md univariate polynomials defined on the line ℓ(u0, v′). If the answers returned by the players do not fit this format the decision procedure rejects. We define a special class of measurements that are relevant to the soundness properties of the low-degree test.

Definition 7.7 (Low-degree polynomial measurements). Define PolyMeas(m, d, q) to be the set of POVM measurements whose outcomes correspond to (individual) degree-d polynomials of m variables over Fq. More generally, for an integer k and tuples m =(m1, m2,..., mk), d =(d1, d2,..., dk) and q =(q1, q2,..., qk), we let PolyMeas(m, d, q, k) be the set of measurements G = Gg1, g2,..., gk such that for i 1,2,..., k , Fmi F { } ∈ { } gi is a polynomial gi : qi qi with individual degree di (see Section 3.4 for a definition of individual degree). →

Quantum soundness of a version of the classical low-degree test for polynomials of low total degree was claimed in [NV18a], building on an analysis of a three-prover version of the test in [Vid16]. Unfortunately

63 Type Question Content Answer Format

POINT u Fm Element of Fk ∈ q q ALINE (u , s) Fm F k degree-d polynomials f : F F 0 ∈ q × q j q → q DLINE (u , s, v) Fm F Fm k degree-md polynomials f : F F 0 ∈ q × q × q j q → q

LD Input to : (tA, xA, tB, xB, aA, aB). In all cases where no action is indicated, accept. For D w A, B , ∈ { }

1. (Consistency test) If tA = tB, accept iff aA = aB.

2. (Axis-parallel line-versus-point test) If tw = ALINE and tw = POINT, accept iff f (s)=(a ) for all j 1,2,..., k where s F is such that x = u + se where j w j ∈ { } ∈ q w 0 i i = χ(s).

3. (Diagonal line-versus-point test) If tw = DLINE and tw = POINT, accept iff fj(s) = (a ) for all j 1,2,..., k where s F is such that x = u + sv , where v = w j ∈ { } ∈ q w 0 ′ ′ πi 1(v) with i = χ(s). −

Figure 2: The decision procedure LD for the simultaneous low-degree test, parameterized by ldparams = D (q, m, d, k). The function χ(s) is defined in Equation (48), and πi 1(v) zeroes out the first i 1 coordinates − − of v. there is a gap in the soundness analysis of this test. For our construction we use the low individual degree test, whose quantum soundness is shown in [JNV+20]. The test can be extended to the case of general k via a standard reduction, as in [NW19, Theorem 4.43]. We quote this result adapted to our notation. Lemma 7.8 (Quantum soundness of the simultaneous classical low-degree test). There exists a function a b b bmd δLD(ε, q, m, d, k) = a(dmk) (ε + q + 2 ) for universal constants a 1 and 0 < b 1 such that − − ≥ ≤ the following holds. For all ε > 0 and parameter tuple ldparams =(q, m, d, k), for all projective strategies (ψ, A, B) that succeed with probability at least 1 ε in the game GLD there exists measurements − ldparams Gw PolyMeas(m, d, q, k) ∈ on , for w A, B , such that Hw ∈ { } POINT, u B A IB δ IA G , a1, a2,..., ak ⊗ ≃ LD ⊗ [(g1(u),g2(u),...,gk(u))=(a1, a2,..., ak)] A POINT, u G IB δ IA B , [(g1(u),g2(u),...,gk(u))=(a1, a2,..., ak)] ⊗ ≃ LD ⊗ a1, a2,..., ak A B G IB IA G , g1, g2,..., gk ⊗ ≃δLD ⊗ g1, g2,..., gk m where δLD = δLD(ε, q, m, d, k), and the approximation holds under the uniform distribution over u F . ∈ q 7.1.2 Complexity of the classical low-degree test. The CL functions and decision procedure of the low-degree test are incorporated as subroutines in some of the normal form verifiers constructed in subsequent sections. The next lemma establishes the time com- plexity of these procedures as a function of the parameter tuple ldparams = (q, m, d, k). The lemma also

64 establishes the time complexity of computing the description of the decision procedure LD as a Turing machine, given the parameter tuple ldparams as input. D

Lemma 7.9 (Complexity of the classical low-degree test). Let ldparams =(q, m, d, k) denote a parameter tuple.

1. The time complexity of the decision procedure LD parametrized by ldparams is poly(m, d, k,log q). D

2. The time complexity of evaluating marginals of the CL functions LPOINT, LALINE, and LDLINE at a given input point is poly(m,log q). 3. The Turing machine description of the decision procedure LD parametrized by ldparams can be D computed from ldparams in polylog(q, m, d, k) time.

Proof. Finite field arithmetic over Fq can be performed in time polylog q, by Lemma 3.20. The most LD F Fk F expensive step in is to evaluate a univariate polynomial f : q q at some point tu q that D m → ∈ corresponds to a point u F , which takes time poly(m, d, k,log q). The function LPOINT is a projection ∈ q onto VX, which takes time poly(m,log q) to compute (it just involves “zeroing out” the registers outside of VX). The functions LALINE and LDLINE require computing a canonical linear map, which requires performing Gaussian elimination and can be done in time poly(m,log q). The Turing-machine description of the decision procedure LD can be uniformly computed from the D integers (q, m, d, k) expressed in binary; the complexity of computing the description comes from describing the parameter tuple ldparams, which takes time that is at most polynomial in the bit length of (q, m, d, k).

7.2 The Magic Square game We recall the Magic Square game of Mermin and Peres [Mer90, Per90, Ara02]. The Magic Square game is a simple self-test for EPR pairs (it tests for two of them). In addition, it allows one to test that a pair of observables anticommutes. Here we use it as a building block to construct the quantum low-degree test. There are several formulations of the Magic Square game; here we present it as a binary constraint satisfaction game [CM14]. In this formulation of the game (denoted by GMS) there are 6 linear equations defined over 9 variables that take values in F2. The variables correspond to the cells of a 3 3 grid, as depicted in Figure 3. Five of the equations correspond to the constraint that the sum of the variables× in each row and the first two columns must be equal to 0, and the last equation requires that the sum of the variables in the last column must be equal to 1.

x1 x2 x3

x4 x5 x6

x7 x8 x9

Figure 3: The Magic Square game

65 The question set MS of the Magic Square game is the following: T MC = CONSTRAINT : i = 1, 2, . . . , 6 , T { i } MV = VARIABLE : j = 1, 2, . . . , 9 , T { j } MS = MC MV . T T ∪T The questions CONSTRAINT for i 1, 2, 3 correspond to the three row constraints, the questions i ∈ { } CONSTRAINT4,CONSTRAINT5 correspond to the first two column constraints, and question CONSTRAINT6 corresponds to the third column constraint. In the Magic Square game, the verifier first samples a constraint CONSTRAINT MC uniformly i ∈ T at random, and then samples VARIABLEj, one of the three variables in the row or column correspond- ing to CONSTRAINTi, uniformly at random. One player is randomly assigned to be the CONSTRAINT player, and the other is assigned to be the VARIABLE player. The CONSTRAINT player is sent the question CONSTRAINT and is expected to respond with three bits (β , β , β ) F3, where (v , v , v ) are the i v1 v2 v3 ∈ 2 1 2 3 indices of the three variables corresponding to CONSTRAINTi. The VARIABLE player is given question VARIABLE and is expected to respond with a single bit γ F . The players win if the CONSTRAINT j ∈ 2 player’s answers satisfy the equation associated with CONSTRAINTi, and γ = βj. More precisely, the veri- fier samples an edge of the type graph (see Section 6) GMS in Fig. 4, sends one endpoint to a random player, and the other endpoint to the other player.

VARIABLE1

VARIABLE CONSTRAINT1 2

VARIABLE3 CONSTRAINT2

VARIABLE4 CONSTRAINT3

VARIABLE5 CONSTRAINT4 VARIABLE6

CONSTRAINT5 VARIABLE7

CONSTRAINT6 VARIABLE8

VARIABLE9

Figure 4: Type graph GMS for the Magic Square game.

The following theorem records the self-testing (also known as rigidity) properties of the Magic Square game. Although we do not explicitly refer to the theorem, its self-testing properties are crucial to the Pauli basis test. In particular, it is used to enforce anticommutation relations between certain pairs of operators. Theorem 7.10 (Rigidity of the Magic Square game). Let S = ( ψ , A, B) be a strategy that succeeds in | i the Magic Square game GMS with probability 1 ε, where ψ is a state. Then there exist local − | i∈HA ⊗HB isometries φ : , φ : (where , = (C2) 2 and , A HA → HA′ ⊗HA′′ B HB → HB′ ⊗HB′′ HA′ HB′ ∼ ⊗ HA′′ HB′′ are finite dimensional) and a state AUX such that | i∈HA′′ ⊗HB′′ 1. φ φ ψ EPR 2 AUX O(√ε), A ⊗ B| i − | 2i⊗ ⊗ | i ≤

66 ˜ x x † ˜ y y † 2. Letting Aa = φA Aa φA and Bb = φBBb φB, we have that

A˜ VARIABLE1 I (σX ) I b ⊗ B′B′′ ≈√ε b A′ ⊗ A′′B′B′′ A˜ VARIABLE5 I (σZ) I b ⊗ B′B′′ ≈√ε b A′ ⊗ A′′B′ B′′ I B˜ VARIABLE1 I (σX) A′ A′′ ⊗ b ≈√ε A′ A′′B′′ ⊗ b B′ I B˜ VARIABLE5 I (σZ) A′ A′′ ⊗ b ≈√ε A′ A′′B′′ ⊗ b B′ 2 where the √ statement holds with respect to the state EPR2 ⊗ AUX A B and the answer ≈ ε | iA′B′ ⊗ | i ′′ ′′ summation is over b 0, 1 . As a consequence, letting ∈ { } VARIABLE VARIABLE A˜ VARIABLE1 = A˜ 1 A˜ 1 0 − 1 VARIABLE VARIABLE A˜ VARIABLE5 = A˜ 5 A˜ 5 0 − 1

it holds that

A˜ VARIABLE1 A˜ VARIABLE5 I A˜ VARIABLE5 A˜ VARIABLE1 I ⊗ B′B′′ ≈√ε − ⊗ B′B′′ I B˜ VARIABLE1 B˜ VARIABLE5 I B˜ VARIABLE5 B˜ VARIABLE1 . A′ A′′ ⊗ ≈√ε − A′ A′′ ⊗ Proof. See [WBMS16].

We will need the following theorem, which shows that any pair of anticommuting observables can be used to form a value-1 strategy for the Magic Square game.

Theorem 7.11. Let A = Ab b F and B = Bb b F be two-outcome projective measurements acting on { } ∈ 2 { } ∈ 2 (Cq) n which are consistent on EPR n, and let = A A and = B B be the correspond- ⊗ | qi⊗ OA 0 − 1 OB 0 − 1 ing observables. Suppose that A B = B A. Then there exists a symmetric strategy S =(ψ, M) for the Magic Square game with theO followingO −O properties.O

1. S is an SPCC strategy of value 1.

2. The state ψ has the form ψ = EPR n EPR . | i | i | qi⊗ ⊗ | 2i 3. For b 0, 1 , we have MVARIABLE1 = A I and MVARIABLE5 = B I. ∈ { } b b ⊗ b b ⊗ Proof. The strategy S is based on the canonical two-qubit strategy for the Magic Square game as described n in, for example, [Ara02]. The state is ψ = EPRq ⊗ EPR2 . We specify the measurements of S in Figure 5 as an operator solution for the| i Magic| Squarei game,⊗ | meanti to be read as follows: each cell con- tains a two-outcome projective measurement E , E on (Cq) n C2 written as its 1-valued observable { 0 1} ⊗ ⊗ ± E E . When Player A or B receives the question VARIABLE for j 1,...,9 , they measure their 0 − 1 j ∈ { } share of ψ using the measurement specified by the cell corresponding to VARIABLE and receive a single- | i j bit measurement. When they receive the question CONSTRAINT for i 1, 2, . . . , 6 , they simultaneously i ∈ { } perform the three measurements in the corresponding row or column on ψ to obtain three bits. For exam- | i ple, if Player A receives question VARIABLE , they measure ψ using the measurement A I, A I 1 | i { 0 ⊗ 1 ⊗ } corresponding to the observable I (where the first operator acts on EPR n and the second acts on OA ⊗ | qi⊗ EPR2 ). Similarly, on question VARIABLE5, they use the measurement B0 I, B1 I . This establishes Item| 3 iof the theorem. { ⊗ ⊗ }

67 I I σX σX OA ⊗ ⊗ OA ⊗

I σZ I σZ ⊗ OB ⊗ OB ⊗

σZ σX σZσX OA ⊗ OB ⊗ OAOB ⊗ Figure 5: Observables for Magic Square strategy

First, we show that this gives a well-defined strategy. The VARIABLEj measurements are well-defined because each cell contains a 1-valued observable. This is obvious for all j = 9; when j = 9, the bottom- ± 6 right cell contains σZσX. Because and anti-commute, OAOB ⊗ OA OB σZσX = σXσZ = σXσZ =( σZσX)†. (50) OAOB ⊗ −OAOB ⊗ OBOA ⊗ OAOB ⊗ As a result, this matrix is Hermitian. In addition,

( σZσX)2 =( σZσX) ( σXσZ)=( ) (σZσX σX σZ)= I, OAOB ⊗ OAOB ⊗ · OBOA ⊗ OAOB · OBOA ⊗ · X Z where the first step uses Equation (50) and the final step uses the fact that A, B, σ , σ are 1-valued observables and hence square to the identity. As a result, this matrix is HermitianO O and squares to the± identity. Therefore, it is a 1-valued observable. ± As for the CONSTRAINTi measurements, we must show that the three measurements in each row and column are simultaneously measurable. This is equivalent to the three 1-valued observables being simul- taneously diagonalizable, which is equivalent to them being pairwise commuting.± This can be easily verified for the cases of i = 1, 2, 4, 5 (i.e. the first two rows and columns). In the case of i = 3, commutativity of Z X A σ and B σ follows from Equation (50). Since these two matrices commute, they also commute O ⊗ O ⊗ Z X Z X with their product ( A σ )( B σ )= A B σ σ . The case of i = 6 is similar. By construction,OS ⊗is symmetric,O ⊗ and weO haveO already⊗ shown that it is projective. It remains to show that it is commuting, consistent, and value 1. To show that it is commuting, it suffices to show that the measurement for each cell is simultaneously measurable with all three measurements in its row or column, which was already proved above. Now we show consistency. By linearity, because A and B are consistent on EPR n, so too are and . We claim that this implies the observable in each cell of Figure 5 is | qi⊗ OA OB consistent on ψ . To see why, consider the j = 9 case: | i Z X n Z X n ( σ σ )A IB EPR ⊗ EPR =( σ )A (I σ )B EPR ⊗ EPR OAOB ⊗ ⊗ · | qi ⊗ | 2i OAOB ⊗ ⊗ ⊗ · | qi ⊗ | 2i X Z n =( I)A (I σ σ )B EPR ⊗ EPR OAOB ⊗ ⊗ ⊗ · | qi ⊗ | 2i X Z n =( I)A ( σ σ )B EPR ⊗ EPR OA ⊗ ⊗ OB ⊗ · | qi ⊗ | 2i X Z n = IA ( σ σ )B EPR ⊗ EPR ⊗ OBOA ⊗ · | qi ⊗ | 2i Z X n = IA ( σ σ )B EPR ⊗ EPR . ⊗ OAOB ⊗ · | qi ⊗ | 2i X Z The first two steps use the consistency of σ , σ on EPR2 , the next two use the consistency of B, A on n | i O O EPRq ⊗ , and the last step is by Equation (50). The remaining cases of j 1,...,8 are similar, and we omit| them.i ∈ { }

68 Now, we show that the measurements in S are consistent. Let E = E , E be any of the measure- { 0 1} ments in the cells of Figure 5. We have shown that = E E is consistent on ψ . But for each OE 0 − 1 | i b 0, 1 , E = (E + E +( 1)b(E E ))/2 = (I +( 1)b )/2, and so each E is consistent ∈ { } b 0 1 − 0 − 1 − OE b on ψ by the following calculation | i 1 b 1 1 b E IB ψ = (I +( 1) ) IB ψ = ψ + ( 1) IB ψ b ⊗ | i 2 − OE ⊗ | i 2 | i 2 − OE ⊗ | i 1 1 b 1 b = ψ + ( 1) IA ψ = IA (I +( 1) ) ψ = IA E ψ , 2 | i 2 − ⊗ OE| i 2 ⊗ − OE | i ⊗ b| i where the third equality is by the consistency of . As a result, the VARIABLE measurements are con- OE j sistent. As for the CONSTRAINTi measurements, each such measurement Fa,b,c a,b,c 0,1 is of the form { } ∈{ } F = E1 E2 E3, where E , E , and E are VARIABLE measurements. But then consistency of F follows a,b,c a · b · c 1 2 3 from the VARIABLE consistencies:

1 2 3 1 2 3 1 3 2 F IB ψ =(E E E )A IB ψ =(E E )A (E )B ψ =(E )A (E E )B ψ a,b,c ⊗ | i a · b · c ⊗ | i a · b ⊗ c | i a ⊗ c · b | i 3 2 1 1 2 2 = IA (E E E )B ψ = IA (E E E )B ψ = IA F ψ , ⊗ c · b · a | i ⊗ a · b · c | i ⊗ a,b,c| i where the second-to-last step is because the VARIABLEj measurements in the same row or column commute. Hence, all measurements are consistent. Since all measurements are consistent, this implies that the answer bit of the player receiving a VARIABLE question is always consistent with the corresponding answer bit of the player receiving the CONSTRAINT question. Similarly, the answers of the player receiving theCONSTRAINT question always satisfy the given constraint; observe that in all rows and the first two columns, the observables multiply to I, whereas the observables in the last column multiply to I. This implies that the strategy is value-1, concluding the proof. −

7.3 The Pauli basis test We introduce the quantum low-degree test of [NV18a] in the form of a slight modification to it by [NW19] known as the Pauli basis test. Informally, the quantum low-degree test asks the players to measure a large number of qubits and return a highly compressed version of the measurement outcome. The Pauli basis test simply asks that the players return their uncompressed measurement outcomes, and it is designed by direct reduction to the quantum low-degree test. In Section 7.3.1 we describe the Pauli basis test as a nonlocal game GPAULI and show that its question distribution is implementable via a CL distribution, as we did with the classical low-degree test in Section 7.1. In Section 7.3.2 we exhibit canonical parameters for the Pauli basis test and give bounds on the time complexity of executing the test.

7.3.1 The game We start by discussing parameter settings. The game GPAULI is parametrized by a tuple

qldparams =(q, m, d) ,

N GPAULI where q, m, d are integers. We sometimes write qldparams to emphasize the dependence of the Pauli basis test on the∈ parameters. M Informally, the test is meant to certify that the players share a state of the form EPRq ⊗ , where m | i Fm M = 2 . Its question set includes questions that are axis-parallel/diagonal lines and points in q , which

69 are meant to correspond to questions in the classical low-degree test, and questions of the form (PAULI, W), for W X, Z . Upon receipt of a question of the latter form, the players are expected to perform the ∈ {W } POVM τh h FM and report the outcome h as their answer. { } ∈ q Definition 7.12 (Admissible parameters). We say that the tuple qldparams = (q, m, d) is admissible if q is an admissible field size (Definition 3.17).

Question distribution. We now describe the question distribution of the game GPAULI, and show that it is a (typed) CL distribution. The question types in the game GPAULI are

PAULI = POINT, ALINE, DLINE, PAULI, PAIR X, Z MS PAIR , (51) T { } × { } ∪T ∪ { } where MS is the question type set of the Magic Square game defined in Sec tion 7.2. BeforeT presenting the CL functions of the Pauli basis test, we first give an intuitive description of the PAULI question distribution of the Pauli basis test: a pair of questions ((tA, xA), (tB, xB)) in G can be sampled via the following procedure:

PAULI 1. Sample a pair of types by sampling an edge (tA, tB) of the graph G given in Figure 6 uniformly at random (including the self-loops).

2. Sample the following uniformly at random:

m (a) (Points) uX, uZ F , ∈ q (b) (Directions) s F , v Fm, ∈ q ∈ q (c) (Qubit basis for (anti-)commutation) rX, rZ F . ∈ q 3. For w A, B and W X, Z , ∈ { } ∈ { } (a) If tw =(POINT, W), then set xw = uW, (b) If t = (ALINE ), then set = ( ), where = LN( ), with = ( ) (see w , W xw u0, s u0 Lei uW i χ s Section 7.1.1 for definition of LLN and χ(s)),

(c) If tw = (DLINE, W), then set xw = (u0, s, v′), where i = χ(s), v′ = πi 1(v), and u0 = LN − L (uW), v′ (d) If t = CONSTRAINT for some i 1,...,6 , then set x =(uX, uZ, rX, rZ), w i ∈ { } w (e) If t = VARIABLE for some j 1,...,9 , then set x =(uX, uZ, rX, rZ), w j ∈ { } w (f) If tw = PAIR, then set xw =(uX, uZ, rX, rZ),

(g) If tw =(PAIR, W), then set xw =(uX, uZ, rX, rZ),

(h) If tw =(PAULI, W), then set xw = 0.

We now specify the corresponding CL functions for each of the question types in the Pauli basis test.

70 (DLINE, X) (ALINE, X)

(PAULI, X) VARIABLE1 (POINT, X)

CONSTRAINT1 (PAIR, X) VARIABLE2

PAIR CONSTRAINT 2 VARIABLE3 (PAIR, Z) CONSTRAINT VARIABLE 3 4 (POINT, Z) (PAULI, Z) VARIABLE5 CONSTRAINT4

VARIABLE6 (DLINE, Z) (ALINE, Z) CONSTRAINT5

VARIABLE7 CONSTRAINT6

VARIABLE8

VARIABLE9

Figure 6: Graph GPAULI for the Pauli basis test. Each vertex also has a self-loop which is not drawn on the figure for clarity.

Conditional linear functions for the Pauli basis test. The CL functions corresponding to the Pauli basis test question distribution are parameterized by the parameter tuple qldparams = (q, m, d). Let VPAULI denote the linear space (Fm)2 F Fm (F )2. The space VPAULI is decomposed into a direct sum q × q × q × q of the following register subspaces: VX, VZ (which are m-dimensional), VI (which is 1-dimensional), VV PAULI (which is m-dimensional), and VRX , VRZ (which are 1-dimensional). We identify elements of V as m 2 m m 2 PAULI PAULI tuples (uX, uZ, s, v, rX, rZ) (F ) F F (F ) . We define CL functions Lt : V V ∈ q × q × q × q → for every type t PAULI: ∈T

1. For W X, Z , define LPOINT,W(uX, uZ, s, v, rX, rZ)= LPOINT(uW, s, v) where LPOINT is the 1-level CL function∈ { defined} in Equation (46).27

2. For W X, Z , define LALINE,W(uX, uZ, s, v, rX, rZ) = LALINE(uW, s, v) where LALINE is the 2- level CL∈ function { } defined in Equation (47).

3. For W X, Z , define LDLINE,W(uX, uZ, s, v, rX, rZ) = LDLINE(uW, s, v) where LDLINE is the 3- level CL∈ function { } defined in Equation (49).

MS 4. For t PAIR, (PAIR, X), (PAIR, Z) , Lt(uX, uZ, s, v, rX, rZ)=(uX, uZ, 0, 0, rX, rZ), i.e., it ∈T ∪ { } projects onto VX VZ VR VR . Thus Lt is a 1-level CL function. ⊕ ⊕ X ⊕ Z 5. For W X, Z , define L = 0 as the identically 0 function. Thus it is a 0-level CL function. ∈ { } PAULI,W

We note that the maps L OINT , L INE , L INE act as the zero function on V VR VR where P ,W AL ,W DL ,W W ⊕ X ⊕ Z W = X if W = Z and W = Z if W = X.

27 PAULI The range VX VI VV or VZ VI VV of LPOINT is embedded in V in the natural way. The same convention is used ⊕ ⊕ ⊕ ⊕ also in Item 2 and 3 for LALINE and LDLINE .

71 The question distribution of GPAULI is thus a typed CL distribution on PAULI VPAULI PAULI PAULI T × ×T PAULI× V where (tA, xA, tB, xB) is sampled by first uniformly sampling an edge (tA, tB) from the graph G PAULI defined in Figure 6, sampling a uniformly random z V , and then setting x = Lt (z) for w ∈ w w ∈ A, B . { } Decision procedure. The decision procedure for GPAULI is presented in Figure 7. Similarly to Figure 2, we provide a table that summarizes a parsing scheme for the questions and answers, depending on the type of question. The answers are bit strings that are interpreted as more structured objects such as ele- ments over Fq, vectors, or polynomials, depending on the question. In the “low-degree check”, the de- cision procedure PAULI calls the classical low-degree decision procedure LD parametrized by the tuple D D ldparams =(q, m, d, 1) (defined in Section 7.1) as a subroutine.

72 Type Question Content Answer Format

(POINT, W) y Fm Element of F ∈ q q (ALINE, W) (u , s) Fm F Polynomial f : F F W ∈ q × q q → q (DLINE, W) (u , s, v) Fm F Fm Polynomial f : F F W ∈ q × q × q q → q m 2 2 2 PAIR (uX, uZ, rX, rZ) (F ) F (βX, βZ) F ∈ q × q ∈ 2 m 2 2 (PAIR, W) (uX, uZ, rX, rZ) (F ) F Element of F ∈ q × q 2 m 2 2 3 CONSTRAINT (uX, uZ, rX, rZ) (F ) F (α , α , α ) F i ∈ q × q v1 v2 v3 ∈ 2 m 2 2 VARIABLE (uX, uZ, rX, rZ) (F ) F Element of F j ∈ q × q 2 FM (PAULI, W) 0 Element of q Table: Question and answer format of the Pauli basis game.

PAULI On input (tA, xA, tB, xB, aA, aB), the decision procedure performs the following checks D for w A, B : ∈ { } 1. (Consistency check). If tA = tB, accept iff aA = aB. 2. (Low-degree). Let ldparams =(q, m, d, 1).

(a) If t = (POINT, W), t = (ALINE, W), accept if LD accepts w w Dldparams (POINT, xw, ALINE, xw, aw, aw). (b) If t = (POINT, W), t = (DLINE, W), accept if LD accepts w w Dldparams (POINT, xw, DLINE, xw, aw, aw).

3. (Consistency check). If tw = (POINT, W), tw = (PAULI, W), accept if gaw (xw) = aw, where g is the low-degree encoding of a FM defined in Section 3.4. aw w ∈ q In the remaining four cases, the decision procedure first computes the number

γ = tr (ind (uX)rX) (ind (uZ)rZ) , (52) m · m where we recall the ind ( ) vector from Section 3.4.  m · 4. (Commutation check). If t =(PAIR, W), t = PAIR, accept if a = β or γ = 0. w w w W 6 5. (Consistency check). If tw = (POINT, W), tw = (PAIR, W), accept if tr(awrW) = aw or γ = 0. 6 6. (Magic square check). If tw = CONSTRAINTi, tw = VARIABLEj, accept if γ = 0, or aw satisfies constraint CONSTRAINTi and αj = aw.

7. (Consistency check). If tw = (POINT, W), tw = VARIABLEj, accept if γ = 0 or if j = 1, W = X, and tr(awrX)= aw or if j = 5, W = Z, and tr(awrZ)= aw.

Figure 7: Specification of the decision procedure PAULI . Dqldparams

73 We describe an honest, value-1 PCC strategy for the Pauli basis test.

GPAULI Lemma 7.13. The Pauli basis test qldparams has a value-1 SPCC strategy. Proof. We begin by specifying the value-1 strategy S =( ψ , M). The state is | i M ψ = EPR ⊗ EPR . | i | qi ⊗ | 2i Now we specify the measurements. We start with measurements associated with questions of type POINT, ALINE,DLINE, and PAULI. Using notation introduced in Section 3.4, for W X, Z , ∈ { } (POINT,W), y W M a = τ[g (y)=a] I , · ⊗ (ALINE,W), ℓ W M f = τ[(g ) ℓ= f ] I , · | ⊗ (DLINE,W), ℓ W M f = τ[(g ) ℓ= f ] I , · | ⊗ (PAULI ) M ,W = τW I . a a ⊗ Here, the notation used to post-process measurement outcomes is defined in Definition 3.25. Thus, for example, the measurement on question ((POINT, W), y) corresponds to first performing the measure- W FM ment τh h FM , receiving an outcome h q , and then outputting the value a = gh(y). { } ∈ q ∈ For ALINE and DLINE questions, the question content ℓ denotes either an axis-parallel line ℓ(u0, ei) ℓ Fm for some i 1,2,..., m , or a diagonal line (u0, v′) for some v′ q . As described at the beginning ∈ { } F∈m F of Section 7.3.1, axis-parallel lines are specified by pairs (u0, s) q q and diagonal lines are speci- Fm F Fm ∈ × ℓ fied by triples (u0, s, v) q q q . The measurement on question ((ALINE, W), ), for example, ∈ × × W FM coresponds to first performing the measurement τh h FM to get an outcome h q , and reporting the { } ∈ q ∈ univariate polynomial fℓ(t)= gh(u0 + tei) where ℓ is specified by (u0, s) and i = χ(s). Next we specify the POVMs associated with questions of type CONSTRAINT,VARIABLE, and PAIR. Fm 2 Questions with these question types have a question content that is a tuple ω =(uX, uZ, rX, rZ) ( q ) F2 F ∈ × q. Given such a tuple, consider the two 2-valued POVMs A = Ab b F and B = Bb b F defined as { } ∈ 2 { } ∈ 2 A = τX I , (53) b [tr(g (uX) rX)=b] · ⊗ B = τZ I . (54) b [tr(g (uZ) rZ )=b] · ⊗

We would like to determine when the two observables A = A0 A1 and B = B0 B1 commute or anti-commute. Towards this we derive alternative expresO sions for− these observablesO from− which their commutativity becomes plain from inspection. We begin by inspecting the first matrices on the right-hand sides of Equations (53) and (54):

τW = τW [tr(g (uW ) rW )=b] ∑ h · h:tr(gh(uW) rW )=b W = ∑ τh , (55) h:tr((h ind (u )) r )=b · m W · W

74 where (55) follows by Definition 13. As a result,

τW τW [tr(g (uW ) rW )=0] [tr(g (uW ) rW )=1] · − · W W = ∑ τh ∑ τh h:tr((h ind (u )) r )=0 − h:tr((h ind (u )) r )=1 · m W · W · m W · W tr((h indm(uW )) rW ) W = ∑( 1) · · τh h − W = τ (indm(uW)rW ) , (56) where the last step uses (18). As a result, Equation (56) and Equations (53) and (54) imply that

= τX(ind (u )r ) I, = τZ(ind (u )r ) I . OA m X X ⊗ OB m Z Z ⊗ Now, let γ = tr((ind (u )r ) (ind (u )r )) F , as in Equation (52). Then by Equation (16), m X X · m Z Z ∈ 2 =( 1)γ . OAOB − OBOA As a result, γ quantifies whether the observables and commute, and therefore whether the measure- OA OB ments A and B commute. If γ = 0, they commute. If γ = 1, they anti-commute. We now specify the POVMs associated with questions of type CONSTRAINT,VARIABLE, and PAIR, considering separately the cases when γ = 0 or γ = 1.

1. If γ = 0, for each β , β F define X Z ∈ 2 PAIR, ω M = Aβ Bβ , (57) βX, βZ X · Z

(PAIR,X), ω (PAIR,Z), ω Ma = Aa , Ma = Ba .

The POVM associated with questions of type CONSTRAINT and VARIABLE are defined to be trivial. In particular, we define VARIABLE CONSTRAINTi, ω j, ω M 0, 0, 0 = M 0 = I , and the remaining POVM elements in these measurements are set to be zero.

CONSTRAINT , ω 2. If γ = 1, then A and B anti-commute. In this case we define measurements M i and ARIABLE O O MV j, ω, for all i 1,...,6 and j 1,...,9 to be those guaranteed by Theorem 7.11. ∈ { } ∈ { } Measurements associated with inputs of type PAIR are defined to be trivial. In particular, we define

PAIR, ω (PAIR,X), ω (PAIR,Z), ω M0, 0 = M0 = M0 = I , and the remaining matrices in these measurements are set to be zero.

This completes the specification of the strategy. Now, we show that S is a value-1 SPCC strategy. It is clearly symmetric and projective. To show that it is consistent, we note that all measurements are Pauli basis measurements, which are consistent, or measure- ments produced by Theorem 7.11, which are also consistent. The only exception is the PAIR measurement in the γ = 0 case, which by Equation (57) is a product of two commuting, consistent measurements, and so it is therefore also consistent. To show that it is commuting, we note that for each W X, Z , all ∈ { }

75 (POINT, W), (ALINE, W), (DLINE, W) and (PAULI, W) measurements commute as they are all measure- ments in the Pauli W basis. Next, if γ = 0 then the CONSTRAINT and VARIABLE measurements commute trivially, and for W X, Z , the (POINT, W) measurement commutes with the (PAIR, W) measure- ∈ { } PAIR,ω ment, as they are both W basis measurements, and the measurement M = Aβ Bβ commutes with βX, βZ X · Z (PAIR,W),ω M because A and B commute. On the other hand, if γ = 0, then the PAIR measurements com- a 6 mute trivially, and the CONSTRAINT and VARIABLE measurements commute by Theorem 7.11. Finally, the (POINT, X) measurement commutes with VARIABLE1 because both are X-basis measurements, and likewise both (POINT, Z) and VARIABLE5 are Z-basis measurements. It remains to show that S is value-1. Consider first the first three tests executed by the decision proce- dure in Figure 7. The strategy passes the consistency checks with probability 1 because it is projective and consistent. It passes the low-degree checks because it answers those consistently with an honest strategy in the classical low-degree test. Next, consider the remaining four tests. Fix an ω = (uX, uZ, rX, rZ) and γ as in (52). If γ = 0, then the strategy passes the commutation check with probability 1 by construction. As for the consistency check in Item 5, we can write the (POINT, W) measurement as follows:

(POINT,W), y W (PAIR,W),ω M = τ I = Ma I. [tr( rW )=aw] [tr(g (y)rW)=aw] w · · ⊗ ⊗ As a result, due to the consistency of the (PAIR, W) measurement, the consistency check is passed with prob- ability 1. On the other hand, if γ = 0, then the strategy passes the Magic Square check by Theorem 7.11. 6 As for the consistency check in Item 7, we can write the (POINT, X) measurement as follows:

(POINT,X), y M = τX I = MVARIABLE1, ω I, [tr( rX)=aw] [tr(g (y)rX)=aw] aw · · ⊗ ⊗ where the last step is by Theorem 7.11. As these measurements are consistent, this test is passed with probability 1.

Soundness of the Pauli basis test. We state the soundness properties of the Pauli basis test. The following is an adaptation of the self-testing statement in [NW19, Theorem 6.4]. Theorem 7.14. There exists a function

a b b bmd δQLD(ε, m, d, q)= a(md) (ε + q− + 2− ) for universal constants a 1 and 0 < b < 1 such that the following holds. For all admissible parameter ≥ S GPAULI tuples qldparams = (q, m, d) and for all strategies = ( ψ , A, B) for the game qldparams that succeed with probability at least , there exist local isometries | i 1 ε φA : A A′ A′′ , φB : B B′ B′′ − q M H m→H ⊗H H →H ⊗H (where ψ A B and , = (C ) with M = 2 ) and a state AUX such A′′ B′′ ∼ ⊗ A′ B′ that | i∈H ⊗H H H | i∈H ⊗H M 1. φA φ ψ AUX EPR δQLD(ε, m, d, q), ⊗ B| i − | i ⊗ | qi⊗ ≤ 2. Letting A˜ x = φ Ax φ† and B˜ y = φ Byφ† , we have for W X, Z a A a A b B b B ∈ { } (PAULI ) A˜ ,W I (τW) I u ⊗ B′ B′′ ≈δQLD u A′′ ⊗ A′ B′B′′ (PAULI ) I B˜ ,W I (τW ) , A′ A′′ ⊗ u ≈δQLD A′ A′′B′ ⊗ u B′′ M where the δ statement holds with respect to the state AUX A B EPRq ⊗ and the answer ≈ QLD | i ′ ′ ⊗ | iA′′B′′ summation is over u FM. ∈ q 76 We provide a proof of Theorem 7.14 in Appendix A. The strategies described in the proof of Lemma 7.13 and in the conclusion of Theorem 7.14 are de- scribed in terms of qudit Pauli measurements and maximally entangled states defined over q-dimensional qudits. However, it is more convenient for our application of the Pauli basis test to instead deal with qubits. In particular, we use the Pauli basis test in the “introspection game” of Section 8, where the players are com- manded to sample questions according to a sampler of a normal form verifier. By definition of normal form verifier, is a sampler over F , and therefore itS is natural to use a test for qubit Pauli measurements. S 2 Since we use field sizes q that are powers of 2, Lemma 3.26 shows that the conclusions of Theorem 7.14 can also be described in terms of testing for maximally entangled states over qubits and qubit Pauli mea- surements.

Corollary 7.15. There exists a function

a b b bmd δQLD(ε, q, m, d)= a(md) (ε + q− + 2− ) for universal constants a 1 and 0 < b < 1 such that the following holds. For all admissible parameter ≥ S GPAULI tuples qldparams =(q, m, d) and for all strategies =( ψ , A, B) for the game qldparams that succeeds with probability at least , there exist local isometries | i 1 ε φA : A A′ A′′ , φB : B B′ B′′ − 2 M log q H →Hm ⊗H H →H ⊗H (where ψ A B and , = (C ) with M = 2 ) and a state AUX A′′ B′′ ∼ ⊗ A′ B′ such that| i∈H ⊗H H H | i∈H ⊗H

M log q 1. φA φ ψ AUX EPR δQLD(ε, m, d, q), ⊗ B| i − | i ⊗ | 2i⊗ ≤ 2. Letting A˜ x = φ Ax φ† and B˜ y = φ Byφ † , we have for W X, Z a A a A b B b B ∈ { } (PAULI ) A˜ ,W I (σW ) I u ⊗ B′ B′′ ≈δQLD u A′′ ⊗ A′ B′ B′′ PAULI I B˜ ( ,W) I (σW ) A′ A′′ ⊗ u ≈δQLD A′ A′′ B′ ⊗ u B′′ M log q where the δQLD statement holds with respect to the state AUX A B EPR2 ⊗ and the answer ≈ | i ′ ′ ⊗ | iA′′B′′ summation is over u FM. Here, σW denotes the tensor product of single-qubit Pauli measurements ∈ q u

M log q W σuij Oi=1 Oj=1 F Flog q where κ(ui)=(uij)1 j log q (using the bijection κ : q 2 from Section 3.3). ≤ ≤ → Proof. Lemma 3.26 implies that there are local isometries that map EPR M to EPR M log q, and map | qi⊗ | 2i⊗ the generalized Pauli measurements W to M log q W where ( )=( ). Combining τu i=1 j=1 σuij κ ui ui1,..., ui log q this with the isometries guaranteed by Theorem 7.14, we obtain the statement of the Corollary. N N

7.3.2 Canonical parameters and complexity of the Pauli basis test We specify a canonical setting of the parameter tuple qldparams as a function of an integer R. We call this canonical setting introparams because we use these in the “introspection” section (Section 8). We then give bounds on the complexity of computing the decision procedure and CL functions of the Pauli basis test corresponding to the parameter tuple qldparams, as a function of R.

77 Definition 7.16 (Canonical parameters of the Pauli basis test). Let a, b be the universal constants specified in Theorem 7.14. Let c denote the smallest even integer that is at least (b + 2a)/b. For all integers R 2, ≥ define the tuple introparams(R)=(q, m, d) where

q = 2k for k = c log log R + 1. • ⌈ ⌉ m = c log R + 1. • ⌈ ⌉ d = 1. • Intuitively, this choice of parameters is such that the Pauli basis test certifies the presence of an M = 2m- qudit EPR state (of local dimension q), or equivalently an M log q-qubit EPR state.

Lemma 7.17. For all integers R 2, the parameter tuple introparams(R) is admissible (see Defini- ≥ tion 7.12), and furthermore there exist universal constants a 1 and 0 < b 1 such that the function ′ ≥ ′ ≤ δQLD(ε, m, d, q) from Theorem 7.14 is at most

a b b a′(log(R) ′ ε ′ + log(R)− ′ ) . · Proof. We verify the admissibility of introparams(R) first: the field size q = 2k is admissible because k is odd. Next, we show the desired bound on the error function δQLD(ε, m, d, q). We show this by first establish- bmd b ing that 2− q− , which is true because k = m. Note that for R 2, we have m 2c log R, and ≤ c ≤ ≥ ≤ furthermore q = 2k log R. Putting everything≥ together, we get

a b b bd δQLD(ε, m, d, q)= a(md) (ε + q− + 2− ) (58) a 2a b b a(2c) log (R)(ε + 2q− ) (59) ≤ a 2a b 2a cb 2a(2c) (log R) ε +(log R) − (60) ≤  a b b  a′((log R) ′ ε +(log R)− ) (61) ≤ a where a′ is defined as the quantity 2a(2c) and we used the fact that cb 2a b. This completes the proof of the lemma. − ≥

Lemma 7.18. Fix an integer R 2, let introparams(R)=(q, m, d). The integers (q, m, d)= introparams(R) ≥ can be computed in time polylog R when given the binary representation of R as input. Proof. This follows from the definitions of the parameters in Definition 7.16.

Lemma 7.19. Let R 2 be an integer, and let introparams(R) denote the parameter tuple specified by Definition 7.16. ≥

1. The time complexity of the decision procedure PAULI parameterized by introparams(R) is poly(R). D 2. The time complexity of computing marginals of the CL functions Lt as well as the associated factor spaces, for t PAULI, is polylog R. ∈T

78 3. The Turing machine description of the decision procedure PAULI parameterized by introparams can D be computed from the binary presentation of R in polylog(R) time.

Proof. Finite field arithmetic over Fq can be performed in time polylog q, by Lemma 3.20. The parameters of introparams(R), which the decision procedure PAULI implicitly computes given R, can be computed D in time polylog(R), by Lemma 7.18. The most expensive step in PAULI is to evaluate the low-degree FM Fm D encoding ga(y) where a q and y q , which takes time poly(M,log q)= poly(R) by Lemma 3.23. ∈ ∈ PAULI The complexity of computing the CL functions Lt for types t is dominated by the complexity ∈T of computing the CL functions LALINE, LDLINE, and LPOINT from the classical low-degree test, which takes time poly(m,log q)= polylog(R). The factor spaces of Lt depend only on the question type t (of which there are only constantly many), and outputting the length-m indicator vectors of the factor spaces takes O(m)= polylog(R) time. Finally, the time complexity of computing the description of PAULI from the binary representation D of R requires polylog R time, because the checks performed in the decision procedure PAULI depend on D introparams which ultimately depends on R; we assume that the decision procedure computes the parameter tuple introparams based on R. Thus the time complexity is dominated by the time to write R in binary.

79 8 Introspection Games

8.1 Overview Consider a normal form verifier =( , ) (see Definition 5.31). In this section we design a normal form V S D verifier INTRO = ( INTRO, INTRO) (called the introspective verifier) such that in the n-th game INTRO V S D Vn (called the introspection game; see Definition 5.31 for the definition of the n-th game associated with a normal form verifier) the verifier expects the players to sample for themselves questions x and y distributed n as their questions in the game N for index N = 2 —this is the “introspection” step. Note the exponen- tial separation of the indices ofV the introspection game versus the original game! Our construction of the introspection game generalizes the introspection technique of [NW19]. Recall the execution of the N-th game corresponding to the “original” verifier (see Definition 5.31). V Let n 1, N = 2n, and suppose that the CL functions of on index N are LA, LB acting on an ambient ≥ S space V. In the game , the verifier first samples z V uniformly at random and gives each player VN ∈ w A, B the question x = Lw(z). In this context, the string z is referred to as the “seed”. The players ∈ { } w respond with answers aA and aB, respectively, and the verifier accepts or rejects according to the output of (N, xA, xB, aA, aB). D In the introspection game, with some constant probability independent of n, the verifier INTRO sends Vn the question pair (INTRO, A) to player w and (INTRO, B) to the other player w, where w A, B and ∈ { } recall that w = B if w = A and w = A otherwise. The verifier expects player w to measure their share of the state EPR using a coarse-grained Z-basis measurement whose outcomes range over LA(V), and | iV similarly player w measures the state EPR V using a coarse-grained Z-basis measurement with outcomes B | i that range over L (V). If the players perform these measurements honestly, then the outcomes (xA, xB) are distributed exactly according to µ , N, the question distribution of the game N. Players w and w are then S V expected to respond with the question xw and xw that they each sampled, together with strings aw and aw, respectively, which are intended to be the answers for the question pair (xA, xB) in the game . In other VN words, the players introspectively sample the question pair (xA, xB) and then respond with the question itself and an answer for it. To facilitate comprehension, we call the players that interact with the introspective verifier INTRO the “introspecting players”, and the players that interact with the “original” verifier the “original players”.V To V INTRO ensure that the introspecting players follow the above procedure honestly, the introspective verifier n first uses the (binary) Pauli basis test described in Section 7.3 to force the introspecting players to shareV the state EPR . The Pauli basis test also ensures that the players measure σW and report the outcome honestly | iV when they receive questions (PAULI, W) for W X, Z . For v A, B the verifier cross-checks the ∈ { } ∈ { } question pairs (INTRO, v) and (PAULI, Z) to enforce that the honest measurement is performed for question (INTRO, v). The main difficulty in the soundness analysis is to ensure that the answer of player w who received question (INTRO, v) depends only on Lv(z) and not on any other information about the string z. First assume for simplicity that Lv is a linear function. As shown below (based on Lemma 8.5), Lv(z) can be obtained by measuring a specific collection of σZ observables; namely, the set

Z v σ (u) : u ker(L )⊥ . (62) { ∈ } To prevent the player from obtaining any additional information the verifier needs to enforce that the player does not additionally measure any σZ(u) for u / ker(Lv) . (We refer to any such σZ as a “prohibited” ∈ ⊥ σZ.)

80 The introspective verifier achieves this by sometimes sending the READ type question (READ, v) or HIDEk type questions (HIDEk, v) to the players. When receiving the READ question, the players are required to also measure observables from the set

σX (r) : r ker(Lv) , (63) { ∈ } which (as shown in Lemma 8.6 below) commute with every Z-basis measurement in (62). On the other hand, any prohibited σZ(u) must anticommute with at least one of the σX(r), as otherwise u would be in v X ker(L )⊥. As a result, honestly measuring the σ observables of (63) has the effect of preventing the player from measuring any of the prohibited σZ observables, so that the answer a can effectively only depend v v on the question L (z). (In the protocol, the verifier asks the player to measure the function (L )⊥(z) in the X basis, rather than all of the σX observables. By Lemma 8.5 the two are equivalent.) Similarly to how questions (PAULI, Z) and (INTRO, v) are cross-checked, the questions (PAULI, X), (HIDEk, v) and (READ, v) are cross-checked in order to ensure that the honest X-basis measurements are performed for the HIDEk questions and the READ questions. When the CL functions Lv are ℓ-level for ℓ > 1, the introspective verifier sends one of O(ℓ) different hiding questions to the players, chosen at random; together these hiding checks ensure that each of the con- stituent linear maps of Lv are honestly measured. Intuitively, these hiding questions “interpolate” between questions (PAULI, Z), (INTRO, v) and (PAULI, X) in a way that, for all pairs of questions asked by the veri- fier, the honest measurements commute. (See Figure 10 for an overview of the honest measurements.) This strong commuting property of the strategy is essential for the oracularization transformation in Section 9 to be possible. A key property of the introspection game is that the distribution of questions (which include the Pauli basis test questions as well as the introspection questions and the hiding questions) is also condition- ally linear. This means that the introspection game can be ultimately specified by a normal form verifier INTRO =( INTRO, INTRO), which is crucial for recursive compression of games. Moreover, while the time complexityV S of the introspectiveD verifier’s decider INTRO remains polynomially related to that of on index D D N, the time complexity of the sampler INTRO is polylog(N) (exponentially smaller), due to the efficiency of the Pauli basis test. Finally, INTRO Sonly depends on through the number of levels ℓ of as well as upper bounds on the time complexitiesS of and . V S S D 8.2 The introspective verifier Let =( , ) be an ℓ-level normal form verifier. We call , , and the “original” verifier, sampler, and decider,V respectively.S D For all original verifier , we defineV theS introspectiveD verifier INTRO in this section. V V We use N = 2n to denote the index of the original verifier that is simulated by the introspective verifier V INTRO on index n. V The introspective verifier corresponding to and parameters (λ, ℓ) is a typed normal form verifier V ˆ INTRO = ( ˆINTRO, ˆ INTRO), sketched in Section 8.1 and specified in detail in the present section (see SectionV 6 forS the definitionD of typed normal form verifiers). In the following descriptions of the sampler ˆINTRO and decider ˆ INTRO, all parameters are functions of the index n, the number of levels ℓ of the sampler S D , and the parameter λ; we often leave this dependence implicit. S Let R = Nλ, and let introparams(R)=(q, m, d) denote the parameter tuple specified in Section 7.3.2. Note that introparams is implicitly a function of n (since R is a function of n). The associated parameter tuple introparams is intended to parametrize a Pauli basis test that certifies an R log q-qubit EPR state; intuitively, the R log q-qubit EPR state is meant to serve as the source of randomness for the CL functions of the original sampler . S 81 Recall that the players in the introspection game are referred to as “introspecting players” and the players in the original game are referred to as “original players”. We use the following notation in order to distin- guish between questions and answers meant for the introspecting players versus the original players. The questions and answers of the introspecting players are denoted by hatted variables (e.g., xˆ and aˆ). Similarly, the associated question types are denoted using hatted variables tˆ. The questions and answers of the original players in the original game are denoted using non-hatted variables (e.g. x, a, and so on). VN Types and type graph. The type set INTRO for the introspective verifier ˆ INTRO is T V ℓ INTRO = PAULI INTRO, SAMPLE, READ HIDE A, B , T T ∪ ∪ k × { } k=1   [    where PAULI is the type set of the Pauli basis test, defined in Section 7.3. The type graph GINTRO is specified in FigureT 8.

(DLINE, X) (ALINE, X) (HIDE1, A) (HIDE2, A) (HIDEℓ, A) · · · (PAULI, X) VARIABLE1 (POINT, X)

CONSTRAINT1 (PAIR, X) (HIDE1, B) (HIDE2, B) · · · (HIDEℓ, B) VARIABLE2

PAIR CONSTRAINT 2 VARIABLE3 (PAIR, Z) (SAMPLE, B) (INTRO, B) (READ, B) CONSTRAINT VARIABLE 3 4 (POINT, Z) (PAULI, Z) VARIABLE5 CONSTRAINT4 (SAMPLE, A) (INTRO, A) (READ, A) VARIABLE6 (DLINE, Z) (ALINE, Z) CONSTRAINT5

VARIABLE7 CONSTRAINT6

VARIABLE8

VARIABLE9

Figure 8: Type graph GINTRO for the introspection game. Each vertex also has a self-loop which is not drawn in the figure for clarity.

Sampler. We first define a 3-level ( INTRO, GINTRO)-typed sampler ˜INTRO, which has field size q(n) and T S dimension 3m(n)+ 3, where q(n) and m(n) are specified by introparams(R). Note that the dimension is the same as that of the ambient space VPAULI of the CL functions of the Pauli basis test for R log q qubits, specified in Section 7.3.1. Fix n N. We specify the CL functions of ˜INTRO on index n. Since the functions LA, n and LB, n are ∈ S tˆ tˆ identical for all n and tˆ, we omit the superscripts A and B. For types tˆ PAULI, the CL functions Ln are ∈ T tˆ given by those specified in Section 7.3.1, parameterized by introparams(R). For types tˆ INTRO PAULI, ∈T \T the associated CL functions are defined to be 0-level CL functions (i.e., they are the 0 map). This means that

82 for question types such as tˆ = (INTRO, v) or tˆ = (SAMPLE, v) for v A, B the associated question is solely comprised of the question type label. ∈ { } Finally, we define the typed sampler ˆINTRO to be κ( ˜INTRO), the downsized sampler (Definition 6.6) S S corresponding to ˜INTRO. The typed CL functions associated with the Pauli basis test are 3-level; using S Remark 4.2 and Lemma 4.9 it follows that ˆINTRO is a 3-level typed sampler. The following lemma establishes the complexityS of the sampler ˆINTRO as well as the complexity of S computing a description of it from the parameters (λ, ℓ).

Lemma 8.1. There exists a 2-input Turing machine IntroSampler that on input (λ, ℓ) outputs a description of the sampler ˆINTRO in time polylog(λ, ℓ). Furthermore, S ℓ 1. TIME ˆINTRO (n)= poly(n, λ, ), and S 2. ˆINTRO is a 3-level typed sampler. S Proof. Define the following 9-input Turing machine RawIntroSampler, that does not depend on any param- eters (so that its description length is constant). On input (λ, ℓ, n, x1,..., x6), RawIntroSampler computes the output of the typed sampler ˆINTRO (parameterized by (λ, ℓ)) with input tapes set to (n, x ,..., x ). S 1 6 In more detail, the Turing machine RawIntroSampler first computes introparams(R) for R = Nλ and N = 2n. Using Lemma 7.18, this computation takes time poly log(R). Next, depending on the contents of the last 7 input tapes of RawIntroSampler, the Turing machine evaluates the dimension of ˆINTRO (which S can easily be computed from introparams(R)), or one of the CL functions, or returns a representation of a factor space of ˆINTRO. If the type passed as input is tˆ PAULI then by Lemma 7.19 this takes time S NTRO AULI ∈ T polylog(R). If tˆ I P then this can be done in O(log ℓ) time (to read the input type). Overall, ∈T \T RawIntroSampler runs in time poly log(R, ℓ). We now define the Turing machine IntroSampler as follows. On input (λ, ℓ), IntroSampler outputs the description of RawIntroSampler with the first two input tapes hardwired to λ and ℓ, respectively, yielding the sampler ˆINTRO corresponding to parameters (λ, ℓ). Computing this description takes O(log λ + log ℓ) time. S The time complexity of ˆINTRO follows from the time complexity of RawIntroSampler and the number of levels is by construction. S

Decider. The typed decider ˆ INTRO is specified in Figure 9. We explain how to interpret the figure, includ- ing the notation. (It may alsoD be helpful to review the description of the intended strategy for the players in the game, described in Section 8.3.)

Question and answer format. The decider takes as input a tuple (n, tˆA, xˆA, tˆB, xˆB, aˆA, aˆB) where (tˆw, xˆw) denotes the question for introspecting player w A, B , and aˆw denotes their answer. As in the specifi- cation of the Pauli basis test, in Figure 9 we include∈ { an “answer} key” indicating how the players’ answers are parsed, depending on the question type. When the question type is from PAULI, the question and an- swer format are as described in Figure 7. When the question type is in INTROT PAULI, the answer format is described in the table at the top of Figure 9.28 For each such question,T there\T is an associated variable v A, B that indicates to the introspecting player which original player it is supposed to impersonate in the∈ introspection { } game.

28There is no question format specification for the question types in INTRO PAULI , because the question is solely comprised of the question type label. T \ T

83 Type Answerformat

(INTRO, v)(y, a) V 0,1 ∈ × { }∗ (SAMPLE, v)(z, a) V 0,1 ∈ × { }∗ (READ, v)(y, y , a) V V 0,1 ⊥ ∈ × × { }∗ (HIDE , v)(y, y , x) V V V k ⊥ ∈ × ×

In the following, whenever or is called as a subroutine, ˆ INTRO aborts and rejects if the S λ D D subroutine takes more than N time steps. On input (n, tˆA, xˆA, tˆB, xˆB, aˆA, aˆB), the decider ˆ INTRO first computes the dimension s(N) of V by calling the original sampler on input D λ S (N, DIMENSION). The decider rejects if max s(N), aˆA , aˆB 9N . | | | | ≥ Otherwise, it performs the following tests for all w, v A, B . (If no test applies, the decider  accepts.) ∈ { }

PAULI PAULI 1. (Pauli test). If tˆA, tˆB , the decider accept if accepts the input ∈ T Dintroparams (tˆA, xˆA, tˆB, xˆB, aˆA, aˆB). 2. (Sampling tests). V (a) If tˆw =(PAULI, Z) and tˆw =(SAMPLE, v), accept if aˆw = zw. v (b) If tˆw =(INTRO, v), tˆw =(SAMPLE, v), accept if yw = L (zw) and aw = aw. 3. (Hiding tests).

(a) If tˆw =(INTRO, v), tˆw =(READ, v), accept if yw = yw and aw = aw.

(b) If tˆw =(HIDEℓ, v), tˆw =(READ, v), accept if yw, <ℓ = yw, <ℓ, and yw⊥ = yw⊥. (c) If tˆ =(HIDE , v), tˆ =(HIDE , v) for some k 1,2,..., ℓ 1 , accept if w k w k+1 ∈ { − }

yw, k+1 = xw, >k+1 , ≤ ≤

v ⊥ and y⊥w k+ = L + (xw, k+1) where u = yw, k. , 1 k 1, u ≤ V V>  v ⊥ 1 1 (d) If tˆw = (PAULI, X), tˆw = (HIDE1, v), accept if yw⊥, 1 = L1 (aˆw ) and aˆw = x > . w, 1  4. (Game test). If tˆ = (INTRO, A) and tˆ = (INTRO, B), accept if accepts w w D (N, yw, yw, aw, aw).

5. (Consistency test). If tˆA = tˆB, accept if and only if aˆA = aˆB.

Figure 9: The typed decider ˆ INTRO for the introspective verifier, parameterized by integers λ, ℓ, on index n D v n. N denotes 2 , V is the ambient space for , and L v A,B the associated CL functions on index N. S { } ∈{ }

84 In the figure, V denotes the ambient space of the original sampler on index N = 2n. Since V is Fs(N) S isomorphic to 2 , where s(N) is the dimension of , the space V is identified in a canonical way as the FR S register subspace of 2 spanned by e1,..., es(N) where ei is the i-th elementary basis vector. For example, if tˆ = (READ, v), then syntactically the player’s answer is a triple (y, y , a) in FR FR 0, 1 . w ⊥ 2 × 2 × { }∗ We assume that the decider computes the dimension s(N) of the subspace V by calling on input D S (N, DIMENSION), and if y, y⊥ are not presented as vectors in the subspace V, then the decider rejects. Thus in the analysis we directly consider y, y⊥ as vectors in V. In Figure 9, the components of the players’ answers are subscripted by the player index. For example, if player w receives question (HIDEk, v), then their answers are denoted by (yw, yw⊥, xw). The notation used in the “answer key” is meant to give an indication of the intended meaning of the players’ answers. We use y to denote a vector that is supposed to be the result of measuring a CL function v L ; y⊥ is supposed to be the result of measuring “dual” linear maps L⊥ (as used in Step 3 in Figure 9); x is supposed to be the result of σX measurements, and z is supposed to be the result of σZ measurements. We use a to denote the introspected answers meant for the original decider . D CL functions and factor spaces. For v A, B , let Lv = Lv, N denote the CL function for original ∈ { } player v specified by on index N = 2n. For z V, the decider computes Lv(z) by calling on input S ∈ D S (N, v, MARGINAL, ℓ, z). ℓ v v For y V and k 1,..., we define register subspaces Vk (y) by induction on k. For k = 1, V1 (y) ∈ ∈29 { v } v < is the first factor space of L and is independent of y. Suppose Vj (y) has been defined for all j k. Then v k 1 v v v v we define the marginal space Vk(y) to v ≤ be the complementary register subspace to V k(y) within V. ˆ INTRO ≤ v The decider computes factor spaces Vj (y) from y V in the following sequential manner: D v ∈ first, the indicator vector for the factor space V1 is computed by calling the original sampler on input v ℓ S (N, v, FACTOR, 1, 0). Let y1 denote the projection of y to V1 . Then, for j 2,..., , the factor space v ∈ { } Vj (y) is computed by calling on input (N, v, FACTOR, j, y j 1), where y j 1 is the projection of y to v S ≤ − ≤ − V j 1(y). ≤ − We give more details on the implementation of decider ˆ INTRO specified in Figure 9. D 1. The decider ˆ INTRO first checks that the answers are not too long; the maximum length answer should D be either a tuple (y, y , a) where y, y V and a is an answer intended for the original decider on ⊥ ⊥ ∈ D index N (which we assume runs in time at most Nλ), or a tuple (y, y , x) V V V. This check ⊥ ∈ × × is necessary in order to ensure that the decider ˆ INTRO halts in time poly(N). D 2. If the question types for the players are both in PAULI, ˆ INTRO executes the decision procedure PAULI for the Pauli basis test parametrized by introparamsT (seeD Section 7.3.1 for the definition of DPAULI). D The Pauli basis test is intended to ensure that the players share an R log-qubit EPR state, where R = Nλ. This number of qubits is chosen so that, under the assumption that the original verifier is V λ-bounded (see Definition 5.30), the time complexity of on index N (and therefore its dimension) is S at most R = Nλ. When analyzing the Completeness, Soundness, and Entanglement properties of the introspective verifier INTRO, we assume that the original verifier is λ-bounded (see Theorem 8.3). V V 3. In Step 2a of ˆ INTRO, player w A, B receives question (PAULI, Z) and player w receives question D ∈ { } (SAMPLE, v), for some v A, B . According to the answer key, player w is expected to return an ∈ { } 29See Lemma 4.4 for the definitions of factor spaces of a CL function.

85 answer aˆ FR and player w is expected to respond with a pair (y , a ) V 0, 1 . Thus, w ∈ 2 w w ∈ × { }∗ the dimension of answer aˆw may be larger than that of yw; this is the reason that Step 2a checks consistency between yw and the projection of aˆw to V.

4. In Step 2b of ˆ INTRO, player w A, B receives question (INTRO, v) and player w receives question D ∈ { } (SAMPLE, v). As specified by the “answer key”, player w responds with (y , a ) V 0, 1 and w w ∈ × { }∗ player w responds with (z , a ) V 0, 1 . The decider ˆ INTRO checks that a = a and w w ∈ × { }∗ D w w y = Lv(z ) where Lv denotes the CL function for player v specified by for index N = 2n. The w w S CL function is computed by calling on input (N, v, MARGINAL, ℓ, z). S INTRO 5. In Step 3b, ˆ checks that the answer of player w who received (HIDEℓ, v) is consistent with the D answer of player w who received (READ, v). One of the checks is that yw,<ℓ = yw,<ℓ ; these are, v v respectively, the projections of yw to V<ℓ(yw) and yw to V<ℓ(yw).

6. In Steps 3c, the vectors xw, >k+1 and xw, >k+1 denote the projections of xw and xw to the factor spaces v v V>k+1(yw) and V>k+1(yw), respectively. Similarly, yw⊥, k+1 denotes the projection of y⊥w to the factor v v space Vk+1(yw) and xw, k+1 denotes the projection of xw to the factor space Vk+1(yw). We emphasize 30 that the latter factor space depends on yw and not yw. The decider also has to compute (Lv ) (x ). According to Definition 3.13, this requires k+1, yw, k ⊥ w, k+1 ≤ specifying a basis for ker(Lv ) . To compute the value, the decider performs the following k+1, yw, k ⊥ steps: ≤

(a) Call on input (N, v, FACTOR, k + 1, yw, k) to obtain a subset H = h1,..., hm of the S ≤ { } Fs(N) v canonical basis for 2 that is a basis of the register subspace Vk+1(yw). (b) For i 1,2,..., m compute ci = (N, v, LINEAR, k + 1, yw, k, hi). Compute a matrix ∈ { } v S ≤ representation M for L in the basis H, whose columns are the vectors c ,..., cm as k+1, yw, k 1 v ≤ elements of Vk+1(yw). (c) Using a canonical algorithm for Gaussian elimination, compute a basis F for ker(M). (d) Compute the canonical complement S of F, as in Definition 3.8. S is a basis for ker(Lv ) . k+1, yw, k ⊥ ≤ (e) To compute (Lv ) on input x , compute the canonical linear map with kernel basis k+1, yw, k ⊥ w, k+1 ≤ S (see Definition 3.12) on input xw, k+1. FR 7. In Step 3d, the player w that receives (PAULI, X) is expected to return an answer aˆw in 2 . Part of this step checks that the projection of aˆw to V>1 is equal to xw, >1 (which is the projection to V>1 of the third component of the answer triple of player w that receives question (HIDE1, v)).

8.2.1 Complexity of the introspective verifier In this section we determine the complexity of the introspective verifier. The following lemma establishes the complexity of the decider ˆ INTRO as well as the complexity of computing a description of it from the D parameters (λ, ℓ) and the description of the original verifier =( , ). V S D 30 This is because it is the w player who receives the (HIDEk+1, v) question, and in the “honest” strategy (described in Sec- tion 8.3) they are supposed to measure k registers to obtain a sequence of values (yw,1, yw,2,..., yw,k) which specify the (k + 1)-st factor subspace, whereas the w player is only supposed to measure k 1 registers. −

86 Lemma 8.2. There exists a polynomial time Turing machine IntroDecider that on input ( , λ, ℓ) outputs a description of the decider ˆ INTRO. Furthermore, the decider ˆ INTRO has time complexity V D D λn ℓ TIME ˆ INTRO (n)= poly(2 , ). D Proof. Define the following 10-input Turing machine RawIntroDecider. The Turing machine does not de- pend on any parameters, so its size is constant. On input

( , λ, ℓ, n, x , x ,..., x ), V 1 2 6 RawIntroDecider computes the output of the decider ˆ INTRO with input tapes set to (n, x ,..., x ). In D 1 6 more detail, the Turing machine ˆ INTRO first computes introparams(R) for R = Nλ and N = 2n. Using D Lemma 7.18, this computation takes time polylog(r). It then executes the tests described in Figure 9. Write = ( , ). The complexity of performing the entire procedure is subsumed by the complexity of the followingV S steps:D

1. Running the decider PAULI, which takes time poly(R) by Lemma 7.19; D 2. Running the original decider (on index N = 2n) for at most Nλ steps; D 3. Running the original sampler (on index N) in order to compute the dimension s(N) and the marginal and factor spaces, andS the CL functions as described in Section 8.2. is called at most S poly(s(N), ℓ) times; due to the timeout, each call takes time at most Nλ.

v 4. Computing (Lk, u)⊥(xw, k) in Step 3c. This only requires to perform Gaussian elimination and other simple finite field manipulations that can be implemented in poly(s(N),log F ) time. | | All other tests are elementary. Thus the time complexity of RawIntroDecider is poly(r, ℓ). Note that the bound is independent of : due to the abort condition in the definition of ˆ INTRO, the Turing machine ˆ INTRO V D D aborts if the runtime of or is larger than Nλ. S D We now define the Turing machine IntroDecider: on input ( , λ, ℓ), it outputs the description of V RawIntroDecider with the first three input tapes hardwired to , λ, and ℓ, respectively, yielding the de- V cider ˆ INTRO corresponding to original verifier and parameters (λ, ℓ). Computing this description takes D V NTRO poly( ,log λ,log ℓ) time. The time complexity of ˆ I follows from the time complexity of the Tur- ing machine|V| RawIntroDecider. D

8.2.2 Introspection theorem We are now ready to state the introspection theorem. Theorem 8.3 (Introspection theorem). There is a polynomial time Turing machine IntroVerifier which takes as input a tuple ( , λ, ℓ) for λ, ℓ N and returns the description of the introspective verifier INTRO. For NTROV ∈ V all ℓ N, I is a 5-level normal-form verifier with complexity measures ∈ V

TIME INTRO (n)= poly n, λ, ℓ , S λn ℓ TIME INTRO (n)= poly 2 , . D Moreover, for all ℓ, there exist constants a > 0, 0 < b < 1 (depending on ℓ), and a function δ(ε, n) = a b b a (λn) ε +(λn)− such that for all n N and ε 0 the following statements hold if is a λ-bounded ℓ-level verifier. ∈ ≥ V 

87 1. (Completeness) If n has a projective, consistent, and commuting (PCC) strategy with value 1 then V2 INTRO also has a PCC strategy with value 1. Vn INTRO 2. (Soundness) If val∗( ) > 1 ε, then val∗( n ) 1 δ(ε, n). Vn − V2 ≥ − 3. (Entanglement) The entanglement bound E defined in Definition 5.13 satisfies

INTRO 2λn E ( , 1 ε) max E ( n , 1 δ(ε, n)), (1 δ(ε, n)) 2 . Vn − ≥ V2 − − n o Proof. The Turing machine IntroVerifier does the following on input ( , ℓ): it first computes V ˆINTRO = IntroSampler(λ, ℓ) , S ˆ INTRO = IntroDecider( , λ, ℓ) , D V using Lemmas 8.1 and 8.2, runs the detyping procedure from Definition 6.17 to compute

INTRO = Detype( ˆ INTRO), V V and then outputs the resulting detyped verifier. This takes time poly( ,log λ,log ℓ), a polynomial in the input length. The complexity parameters of the typed sampler ˆINTRO|Vand| typed decider ˆ INTRO follow from Lemmas 8.1 and 8.2. The complexity parameters of the detypedS verifier INTRO thenD follow from V Lemma 6.18. As ˆ INTRO is a 3-level verifier, INTRO is 5-level as detyping will increase the number of levels V V by at most 2. The completeness part of the theorem is proven in Section 8.3 and the soundness and entanglement properties are proven in Section 8.4.

8.3 Completeness of the introspective verifier

n In this section we establish the completeness property of the introspection game: if for N = 2 , N has INTRO V a PCC strategy (see Definition 5.11) with value 1, then so does n . For this we describe the actions that are expected of the players in the introspection game (i.e. theV “honest strategy”). We first prove several preliminary lemmas that will be used in both the completeness and soundness analysis.

8.3.1 Preliminary lemmas The lemmas in this section are stated for general fields F and generalized Pauli observables and projectors, although in the application to introspection games we use F = F2, ω = 1, and qubit Pauli observables W −W CFk and projectors. Furthermore, the Pauli observables τ (v) and projectors τu act on for some integer k (in our application, we set k = R).

Lemma 8.4 (Fact 3.2 of [NW19]). Let V be a subspace of Fk. For all v V , 6∈ ⊥ tr(u v) E ω · = 0, u V ∼ where the expectation is over a uniformly random vector u from V. The next lemma generalizes Eq. (19).

88 Lemma 8.5. Let L : Fk Fk be a linear map, and let W X, Z . For each a in the range of L, let → ∈ { } u Fk be such that L(u )= a. a ∈ a 1. For each v ker(L)⊥, ∈ W tr(ua v) W τ (v)= ∑ ω · τ[L( )=a] . a Fk · ∈ 2. For all a in the range of L, W E tr(v ua) W τ[L( )=a] = ω− · τ (v) . · v ker(L) ∼ ⊥ Proof. Let V denote the image of Fk under L. Let a V, v ker(L) . For all u, u L 1(a), we have ∈ ∈ ⊥ ′ ∈ − that u u′ ker(L) and thus tr(u v) = tr(u′ v). As a result, using (18), for any v ker(L)⊥ it holds that − ∈ · · ∈

W tr(u v) W tr(u v) W tr(ua v) W tr(ua v) W τ (v)= ∑ ω · τu = ∑ ∑ ω · τu = ∑ ω · τ[L( )=a] = ∑ ω · τ[L( )=a] , u Fk a V u L 1(a) a V · a Fk · ∈ ∈ ∈ − ∈ ∈ where in the last equality we used that for a / V, the projector τW vanishes. This shows the first item. ∈ [L( )=a] For the second item, · W W τ[L( )=a] = ∑ τu · u L 1(a) ∈ − tr(u v) W = ∑ E ω− · τ (v) Fk u L 1(a) v ∈ − ∼   tr((u +u) v) W = ∑ E ω− a · τ (v) Fk u ker(L) v ∈ ∼   ker(L) = E ω tr(u v) ω tr(ua v)τW (v) | Fk | ∑ − · − · v Fk u ker(L) | | ∈  ∼   tr(v u ) W = E ω− · a τ (v) , v ker(L) ∈ ⊥ where the second equality follows from (19) and the last uses the fact that Fk = ker(L) ker(L) , as | | | || ⊥| shown in Lemma 3.6, and Lemma 8.4 applied to the expectation over u. Lemma 8.6 (Commuting X and Z measurements). For all linear maps L, R : Fk Fk such that → ker(R)⊥ ker(L) , ⊆ the measurements Z and X commute. τ[L( )=b] b Fk τ[R( )=d] d Fk · ∈ · ∈ F Proof. Let b, d  k. If either b is not in the range of L, or d is not in the range of R, then at least one of Z X ∈ τ[L( )=b] or τ[R( )=d] is 0, and thus the operators trivially commute. Otherwise, both b and d are in the range · · of L and R, respectively. Let a L 1(b), c R 1(d). By Lemma 8.5, 0 ∈ − 0 ∈ − Z E tr(u a0) Z τ[L( )=b] = ω · τ (u) , · u ker(L) ∈ ⊥ X E tr(v c0) X τ[R( )=d] = ω · τ (v) . · v ker(R) ∈ ⊥ Z For any v ker(R)⊥, by assumption v ker(L), so for u ker(L)⊥ it holds that u v = 0. Thus τ (u) X ∈ ∈Z X ∈ · and τ (v) commute, and it follows that τ[L( )=b] and τ[R( )=d] commute as well. · · 89 8.3.2 Completeness of the introspective verifier

Proof of the completeness part of Theorem 8.3. We analyze the completeness of the typed verifier ˆ INTRO; the corresponding completeness of the detyped verifier INTRO = Detype( ˆ INTRO) follows from LemmaV 6.18, V V and the fact that the type set INTRO has size O(ℓ). T Completeness. We first show completeness. Let n 1 be an index for ˆ INTRO and N = 2n be the ≥ V corresponding index for . Since we assume that the verifier is λ-bounded, we have that V V max TIME (N), TIME (N) Nλ = R . (64) S D ≤  NTRO This assumption on the time complexity of ensures that ˆ I never aborts due to a timeout. Let Lv = V D Lv, N denote the CL function of the original sampler on index N corresponding to player v A, B . Let S ∈ { } R = Nλ, and let introparams(R)=(q, m, d), as in Section 8.2. Set Q = M log q where M = 2m ; this represents the number of qubits that are certified by the Pauli basis test parameterized by introparams(R). S S INTRO Let = ( AUX , A, B) be a PCC strategy for N with value 1. We first construct a PCC strategy n | i NTRO V for the typed verifier ˆ I with value 1. We then conclude using Lemma 6.18. Vn

v v v V1 V2 V3 AUX

(INTRO, v) σZ σZ σZ Ax/Bx L1 L2 L3 (SAMPLE, v) σZ σZ σZ Ax/Bx

(READ, v) σZ σX σZ σX σZ σX Ax/Bx L1 L1⊥ L2 L2⊥ L3 L3⊥

Z X Z X X (HIDE3, v) σ σ σ σ σ I L1 L1⊥ L2 L2⊥ L3⊥ Z X X X (HIDE2, v) σ σ σ σ I L1 L1⊥ L2⊥ X X X (HIDE1, v) σ σ σ I L1⊥ The left-most column denotes the introspection/hiding questions that a player may receive. The top row denotes the registers corresponding to the factor spaces of a CL function Lv (we note that the partition of the registers depends on the prefixes), as well as the register corresponding to the Z Z state AUX coming from the original PCC strategy S . We use σ as shorthand for σ v Lj [L ( )=xj] | i j, x

Figure 10: Summary of the honest strategy S INTRO for INTRO for a 3-level sampler. n Vn

INTRO Remark 8.7. Note that by definition in n the players receive questions (x, y) that are sampled - INTRO V cording to the distribution µG associated with the downsized typed sampler κ( ˜INTRO). Using the ˆINTRO , n S definition of the downsized typedS sampler, Definition 6.6, and Lemma 4.10 the distribution is identical to the

90 INTRO distribution µG , up to the bijective mapping κ. This mapping can be computed by the players them- ˜INTRO , n S INTRO selves. Therefore, we construct a strategy for players that receive questions from µG , and a strategy ˜INTRO , n INTRO S for questions from µG follows immediately. ˆINTRO , n S The strategy S INTRO uses the state EPR (Q+1) AUX , where recall that EPR = 1 ( 00 + n | 2i⊗ ⊗ | i | 2i √2 | i 11 ). For all register subspaces K FQ (see Definition 4.1 for the definition of register subspace), when- | i ⊆ 2 ever we refer to “register K”, we mean the qubits of EPR Q corresponding to K (see Section 3.5). The | 2i⊗ most frequent register subspace we consider is V, spanned by e1,..., es(N). We write V for the complement of V, i.e. the register subspace spanned by es(N)+1,..., eQ. (Note that s(N) R Q by assumption (64) Q ≤ ≤ and the definition of Q.) Then EPR2 ⊗ = EPR V EPR V. S PAULI | i | i ⊗ | i Let n be the honest Pauli strategy with respect to introparams defined in the proof of Lemma 7.13. Cq C2 log q S PAULI Using the isomorphism between and ( )⊗ specified by Lemma 3.26, the measurements of n can be treated as qubit Pauli measurements acting on the maximally entangled state over qubits. Furthermore the questions and answer labels of the measurements can be viewed as binary strings using the bijection F Flog q PAULI κ : q 2 specified in Section 3.3. For a question with a type from the player measures → (Q+1) S PAULI T the shared state EPR2 ⊗ using the measurements specified by n , and reports the measurement outcomes. When| a playeri receives questions of type tˆ INTRO PAULI, they perform measurements ∈ T \TW described as follows. (Below, whenever we write a Pauli operator σa the register on which the operator acts should always be clear from context, and is implicit from the space in which the outcome a ranges.) S INTRO The reader may find it helpful to consult Figure 10 to see a summary of the honest strategy n for the special case when ℓ = 3.

(INTRO, v): The player performs the measurement

Z σ[Lv( )=y] (65) · to obtain an y V. Intuitively, the player has now introspectively sampled the question y for original ∈ y player v in game N. The player then measures AUX using player A’s measurement Aa from S V y| i { } if v = A and using player B’s measurement B if v = B to obtain an answer a. The player replies { a } with (y, a).

(SAMPLE, v): The player measures their share of EPR in the Z basis to obtain z V. Using this, they | iV ∈ compute the question y = Lv(z). The player then uses player v’s strategy and question y to measure AUX and obtain outcome a. The player replies with (z, a). | i (READ, v): The player first performs all measurements as in the (INTRO, v) question for player v and records the outcomes as y L(V) and a 0, 1 ∗. For j 1,2,..., ℓ , the player measures the v ∈ ∈ { } ∈ { } register Vj (y) with the measurement X (66) σ[L ( )=y ] y ⊥j · ⊥j ⊥j  v to obtain y⊥ = y⊥ + + y⊥ℓ . Here L⊥ is shorthand for the function (L )⊥ defined in Item 6 of 1 · · · j j, y

in (65) follows from Lemma 8.6 and the fact that ker(L⊥j )⊥ = ker(Lj) by Lemma 3.6 and the definition of L in Section 8.2.) The player measures AUX with player v’s measurement strategy in ⊥j | i S for question y to obtain a and replies with (y, y⊥, a).

91 IDE Z (H k, v): The player performs the following sequence of measurements: first measure σ[Lv( )=y ] on { 1 · 1 } register Vv to obtain y . Then, use y to specify the second linear function Lv ( ) and measure 1 1 1 2, y1 v Z · register V using σ v to obtain y2. This process continues until the (k 1)-th linear 2, y1 [L ( )=y2] { 2, y1 · } − v v map L ( ) has been measured to obtain y in factor space V . Let y = y + y2 + k 1, yk X using σ to obtain outcome x> . Let x = x> . The player replies with (y, y⊥, x). { x>k } k k By definition, when player w performs the honest measurement for question (INTRO, A) and player w performs the honest measurement for question (INTRO, B), the joint outcome (y, y′) has distribution µ , N. In this case, the players play according to strategy S and succeed with probability 1. In all other cases,S it is straightforward to verify that the players succeed in all tests performed by ˆ INTRO (Figure 9) with D probability 1. As a result, the value of this strategy is 1. The strategy S INTRO is projective by construction. It is also consistent because of the assumed consis- tency of the strategy S as well as consistency of the honest Pauli strategy S PAULI. Furthermore, note that S INTRO only calls S for both players on question pairs such that both types are in INTRO, SAMPLE, READ . { } In all these cases, S is called on a pair of questions (y, y ) distributed as (Lv(z), Lv′ (z)) for v, v A, B ′ ′ ∈ { } and z uniform in V. When v = v′, any such pair by definition has positive probability under µ , n, and so 6 S by assumption the associated measurements from S commute. On the other hand, when v = v′, then the players apply the same measurements from S , and because S only uses projective measurements, their measurements commute as well. Examining all other cases, it follows by direct inspection that the strategy commutes on all question pairs whose corresponding types appear as an edge in the graph GINTRO. Thus the strategy commutes with respect to the support of the distribution µ ˆINTRO , n. S Remark 8.8. For future reference, we note that on any input ( , λ, ℓ), the Turing machine IntroVerifier V always returns a normal form verifier ˆ INTRO = ( ˆINTRO, ˆ INTRO) — even if itself is not the description V S D V of a normal form verifier. This is because, for any two integer λ, ℓ, IntroSampler(λ, ℓ) returns a sampler with field size 2, and for any = ( , ), λ and ℓ the decider ˆ INTRO specified in Figure 9 takes 7 inputs and always halts with a single-bitV output,S D even if or themselvesD do not halt. S D 8.4 Soundness of the introspective verifier The main result of this section is the soundness part of Theorem 8.3.

8.4.1 The Pauli twirl A key tool in the proof of soundness is the Pauli twirl. In this section we introduce the Pauli twirl and establish several of its properties. The section closely follows Sections 8 and 10 of [NW19]. To begin, we define the twirl with respect to an arbitrary distribution over unitaries.

92 Definition 8.9 (Twirl). Let µ be a probability distribution over a finite set of unitary matrices. Then for any matrix A, the twirl of A with respect to µ, denoted Tµ(A), is defined as

† Tµ(A)= E UAU . U µ ∼  In the next two lemmas we consider the Pauli twirl, in which the distribution µ is over subsets of Pauli observables. First, we show how the Pauli twirl acts on Pauli matrices. Then, using this, we derive an expression for the Pauli twirl applied to general matrices.

Lemma 8.10 (Pauli twirl of Pauli matrices). Let V be a subspace of Fk, let W X, Z , and let µ be the ∈ { } uniform distribution over τW (w) : w V . Let W = W and u Fk. Then { ∈ } ′ 6 ∈ τW′ (u) if u V , W′ ⊥ Tµ(τ (u)) = ∈ (0 if u / V⊥ . ∈ Proof. Let u Fk. Let c = 1 if W = Z and let c = 1 if W = X. Then ∈ − W′ W W′ W † Tµ(τ (u)) = E (τ (z)τ (u)τ (z) ) z V ∼ c tr(u z) W W W † = E (ω · · τ ′ (u)τ (z)τ (z) ) z V ∼ c tr(u z) W = E ω · · τ ′ (u) . z V  ∼  The lemma now follows from Lemma 8.4 and the fact that c u V if and only if u V . · ∈ ⊥ ∈ ⊥ Lemma 8.11 (Pauli twirl of general matrices). Let V = Fk, and let L : V V be a linear map. Let ζ be → the uniform distribution over τZ(z) z V and χ the uniform distribution over τX(x) x ker(L) . { V | ∈ } { | ∈ } Let M be a matrix acting on C A, where A is a finite dimensional Hilbert space. Then there exist y ⊗H H matrices M y L(V) acting on A such that the twirl of M with respect to ζ and χ is given by { } ∈ H T T Z y ( χ ζ IA)(M)= ∑ τ[L( )=y] M . (67) ◦ ⊗ y V · ⊗ ∈

Moreover, if we apply (67) to each element of a POVM measurement Ma , then for each y V, the set y { } ∈ M also forms a POVM measurement. { a } X Z Proof. The collection τ (x)τ (z) x,z V forms a basis for the complex linear space of matrices acting on CV. As a result, we can{ write } ∈ X Z M = ∑ τ (x)τ (z) Mx,z , x,z V ⊗ ∈ for matrices M on A. We now use Lemma 8.10 to compute the twirl first with respect to ζ and then x,z H with respect to ζ and χ:

X Z Z (Tζ IA)(M)= ∑ Tζ (τ (x))τ (z) Mx,z = ∑ τ (z) M0,z , ⊗ x,z V ⊗ z V ⊗ ∈ ∈ Z Z (Tχ Tζ IA)(M)= ∑ Tχ(τ (z)) M0,z = ∑ τ (z) M0,z . ◦ ⊗ z V ⊗ z ker(L) ⊗ ∈ ∈ ⊥

93 For all y V, let u denote an arbitrary element of L 1(y) if y is in the image of L; otherwise, set u = 0. ∈ y − y Expanding τZ(z) using the first part of Lemma 8.5,

Z tr(uy z) Z ∑ τ (z) M0,z = ∑ ∑ ω · τ[L( )=y] M0,z z ker(L) ⊗ z ker(L) y · ⊗ ∈ ⊥ ∈ ⊥ Z tr(uy z) = ∑ τ[L( )=y] ∑ ω · M0,z . (68) y · ⊗ z ker(L)  ∈ ⊥ 

y tr(uy z) Equation (67) follows by setting M = ∑z ker(L) ω · M0,z. ∈ ⊥ For the “moreover” part, note first that whenever M 0 it holds that any twirl satisfies 0 Tµ(M). As y ≥ ≤ Z a result, each matrix M must be positive semi-definite due to Equation (68) and the fact that the τ[L( )=y] y { · } matrices are orthogonal projections. Next, suppose Ma is a POVM measurement, and write N = ∑a Ma { } y y for the identity matrix. Then by linearity, for each y V, ∑a Ma = N . In addition, N0,0 = I, and ∈ y N = 0 otherwise. As a result, Ny = N = I, and so M also forms a POVM. x,z 0,0 { a } In the next few lemmas we derive a sufficient condition for a measurement to be close to its own Pauli twirl, namely that it satisfies certain commutation relations with the Pauli basis measurements.

Lemma 8.12 (Commuting with Pauli basis implies commuting with Pauli observables). Let , be finite X A sets and D be a distribution over . For each x , let V be a register subspace of V = Fk, and let X ∈ X x Lx : Vx Vx be a linear map. → CV Consider a state ψ = EPR V AUX , where EPR V A B, for A, B ∼= , is defined | i | i ⊗ | i | i ∈H ⊗H Hx H in Definition 3.24 and AUX A B is arbitrary. For each x , let Ma a be a measurement | i∈H ′ ⊗H ′ ∈X { } ∈A on A . Let W X, Z . Then the following are equivalent on state ψ : H ⊗HA′ ∈ { } | i On average over x D, • ∼ Mx , τW I I I 0. a [L ( )=y] Vx A′ B ε x · ⊗ ⊗ ⊗ ≈   On average over x D and v drawn uniformly from ker(L ) , • ∼ x ⊥ x W M , τ (v) I I IB ε 0. a ⊗ Vx ⊗ A′ ⊗ ≈   Proof. For x , a , y in the range of L , and v Fk define ∈X ∈A x ∈ ∆x = Mx , τW I I I , a,y a [L ( )=y] Vx A′ B x · ⊗ ⊗ ⊗ x  x W  ∆ (v)= M , τ (v) I I IB . a a ⊗ Vx ⊗ A′ ⊗ By the second item of Lemma 8.5, for each v ker(L ) ,  ∈ x ⊥ x tr(uy v) x ∆a (v)= ∑ ω · ∆a,y , y V ∈ x 1 where for every y in the range of Lx, uy is a fixed element in L−x (y). The expression for the closeness of x ∆a to 0 on average over x D and v ker(Lx)⊥ (i.e. the second quantity of the Lemma statement), is equal to ∼ ∼ x 2 E E ∑ ∆a (v) ψ , x D v ker(Lx)⊥ a | i ∼ ∼

94 and can be expanded as

tr((u u ) v) x † x E E ψ ω y′ − y · ∆ ∆ ψ . (69) ∑ ∑ a,y a,y′ x D v ker(L ) h | | i ∼ x ⊥ a y,y′ ∼  If y = y , then by definition L (u ) = L (u ), and so u u is not in ker(L ). Lemma 8.4 (with 6 ′ x y 6 x y′ y′ − y x V = ker(Lx)⊥) thus implies that (69) equals

x † x x 2 E ∑ ψ ∑ ∆a,y ∆a,y ψ = E ∑ ∆a,y ψ x D h | | i x D | i ∼ a y ∼ a,y   which is the closeness of ∆x to 0 on average over x D. Thus, ∆x(v) 0 on average over x and v if a,y ∼ a ≈ε and only if ∆x 0 on average over x. a,y ≈ε Lemma 8.13 (Commuting implies twirl). Let be a finite set and D a distribution on . For each x , x X X ∈X let M be a POVM on A, and let µ be a distribution over unitary matrices acting on A. Suppose { a } H x H that on average over x D and U µ it holds that ∼ ∼ x x † M , U IB 0, a ⊗ ≈ε   where the commutator is evaluated on a state ψ A B. Then on average over x D, | i∈H ⊗H ∼ x x M IB T (M ) IB . a ⊗ ≈ε µx a ⊗ Proof. Observe that

E T x x 2 E E x † 2 ∑ µx (Ma ) Ma IB ψ = ∑ U Ma , U IB ψ . (70) x D − ⊗ | i x D U µ ⊗ | i ∼ a ∼ a ∼ x    Applying Jensen’s inequality, the right-hand side of (70) is at most

x † 2 x † 2 E E ∑ U Ma , U IB ψ = E E ∑ Ma , U IB ψ , x D U µ ⊗ | i x D U µ ⊗ | i ∼ ∼ x a ∼ ∼ x a     using the unitary invariance of the Euclidean norm. This last quantity is O(ε), by assumption.

Lemma 8.14 (Commuting with each implies commuting with both). Let be a finite set and D a distri- x X bution on . For each x , let M be a POVM on A and let µ , µ be two distributions over X ∈ X { a } H x,1 x,2 unitary matrices acting on A. Suppose that for each i 1, 2 , on average over x D and U µ , H ∈ { } ∼ i ∼ x,i x † M , U IB 0, (71) a i ⊗ ≈ε   where the expression is evaluated on some state ψ A B, where B = A. Suppose further that | i∈H ⊗H H ∼ H on average over x D and U µ , ∼ 2 ∼ x,2 † U IB I U . (72) 2 ⊗ ≈ε A ⊗ 2 (The corresponding statement for i = 1 is not needed.) Then on average over x D, U µ , and ∼ 1 ∼ x,1 U2 µx,2, ∼ x † † M , U U IB 0. a 1 2 ⊗ ≈ε   95 Proof. The claim follows from the following sequence of approximations:

x † † x † M U U IB M U U (by (72)) a 1 2 ⊗ ≈ε a 1 ⊗ 2 U† Mx U (by (71) for i = 1) ≈ε 1 a ⊗ 2 † x † U M U IB (by (72)) ≈ε 1 a 2 ⊗ † † x U U M IB , (by (71) for i = 2) ≈ε 1 2 a ⊗ x † † where each step also uses Fact 5.19. This is equivalent to M , U U IB 0, completing the proof. a 1 2 ⊗ ≈ε The following is a slight generalization of [NW19, Fact 4.25], and we give a similar proof.

Lemma 8.15 (Close to sub-measurement implies close to measurement). Let , be finite sets and D be x X A x a distribution on . Suppose that for each x , Aa a is a projective measurement and Ba a is X x ∈ X { } ∈A x x { } ∈A a set of matrices such that each Ba is positive semidefinite and ∑a Ba I. Suppose Ca a is a POVM x x ≤ x x { } ∈A such that Ca Ba for all x and a. Then if, on average over x D, Aa ε Ba , then, on average over x ≥ x ∼ ≈ x D, A 1/2 C . ∼ a ≈ε a Proof. By the triangle inequality,

x x 2 x x 2 x x 2 E ∑ (Aa Ca ) ψ 2 E ∑ (Aa Ba ) ψ + 2 E ∑ (Ca Ba ) ψ . x a k − | ik ≤ x a k − | ik x a k − | ik

The first term on the right-hand side is O(ε) by assumption. For the second,

x x 2 x x 2 x x E ∑ (Ca Ba ) ψ = E ∑ ψ (Ca Ba ) ψ E ∑ ψ (Ca Ba ) ψ x a k − | ik x a h | − | i≤ x a h | − | i x x 2 = 1 E ∑ ψ Ba ψ 1 E ∑ ψ (Ba ) ψ , − x a h | | i≤ − x a h | | i where the middle inequality uses 0 Cx Bx I for all x, a. Write 1 = E ∑ ψ (Ax)2 ψ , which holds ≤ a − a ≤ x ah | a | i because A is a projective measurement. Then

x 2 x 2 x x x x E ∑ ψ ((Aa ) (Ba ) ) ψ = E ∑ ψ (Aa + Ba )(Aa Ba ) ψ x a h | − | i ℜ x a h | − | i   x x 2 x x 2 E ∑ (Aa + Ba ) ψ ∑ (Aa Ba ) ψ ≤ x k | ik · k − | ik r a r a where the first equality follows from the fact that Ax and Bx are Hermitian. For each x X the first square a a ∈ root is O(1). This allows us to move the expectation into the second square root by Jensen’s inequality. The result is O(ε1/2) by assumption.

Now we put everything together to show the main result of this section.

Lemma 8.16. Let , be a finite sets and D be a distribution over . For each x , let Vx be a register Xk A X ∈X subspace of V = F , let Ux be a register subspace of Vx, and let Lx : Ux Ux be a linear map. → V Consider a state ψ = EPR AUX , where EPR A B, for A, B = C , is defined | i | iV ⊗ | i | iV ∈H ⊗H H H ∼ AUX x in Definition 3.24 and A B is arbitrary. For each x , let My,a y Ux, a be a | i∈H ′ ⊗H ′ ∈ X { } ∈ ∈A

96 projective measurement on . Suppose that on average over the following conditions Vx A′ x D hold. H ⊗H ∼

(Consistency): Mx I τZ I I I 0, y Vx [L ( )=y] Ux A′ B ε ⊗ − x · ⊗ ⊗ ⊗ ≈ x Z (Commutation): M I , τ I I  IB ε 0, y, a ⊗ Vx z ⊗ Ux ⊗ A′ ⊗ ≈ Mx I , τX I I I 0. y, a  Vx [L ( )=y ] Ux A′  B ε ⊗ ⊥x · ⊥ ⊗ ⊗ ⊗ ≈   Here, the projector Z acts on the register subspace , and and denote the complementary τ[L ( )=y] Ux Ux Vx x · H register subspace of Ux and Vx, respectively, within V. Then for each x and y Ux, there exists a x, y ∈ X ∈ POVM measurement M on such that on average over x D, a a Vx Ux A′ ∈A H \ ⊗H ∼  x Z x, y (My, a IV ) IB ε1/2 τ[L ( )=y] Ma IV IB . ⊗ x ⊗ ≈ x · ⊗ ⊗ x ⊗   Proof. For each x X, let ζ be the uniform distribution over U and χ be the uniform distribution ∈ x x x over ker(Lx). We apply Lemma 8.12 to each of the two commutation assumptions and use the fact that ker(L ) = ker(L ) from Lemma 3.14. Lemma 8.12 implies that on average over x and v ζ if ⊥x ⊥ x ∼ X ∼ x W = Z or v χ if W = X, ∼ x x W M I , τ (v) I I IB ε 0. y, a ⊗ Vx ⊗ Ux ⊗ A′ ⊗ ≈ By Lemma 8.14, this implies that on average over x D, u ζ , and v χ , ∼ ∼ x ∼ x x Z X M I , τ (u)τ (v) I I IB ε 0. y, a ⊗ Vx ⊗ Ux ⊗ A′ ⊗ ≈ By Lemma 8.13 and Lemma 8.11, this implies that on average overx D, ∼ x x M I IB ε Tχ Tζ (M ) I IB y, a ⊗ Vx ⊗ ≈ x ◦ x y, a ⊗ Vx ⊗ Z x, y′ = ∑ τ My, a I IB , (73) [Lx( )=y′] Vx y V · ⊗ ⊗ ⊗  ′∈ x  for some POVM measurement x, y′ on . My, a Vx Ux A′ In the following sequence of{ equations,} H whenever\ ⊗H an operator does not act on a subsystem it should be assumed that it is appropriately tensored with the identity. For clarity, we explicitly indicate using a subscript A or B whether a Pauli operator acts on A or B. Then on average over x D we have H H ∼ Mx = Mx Mx (Mx is projective) y, a y, a · y x Z ε My, a τ[L ( )=y] A (Consistency assumption) ≈ · x · x Z 0 My, a τ[L ( )=y] B (Paulis are self-consistent) ≈ ⊗ x · Z  x, y′ Z ε ∑ τ[L ( )=y ] A My, a τ[L ( )=y] B (Equation (73)) ≈ x · ′ ⊗ ⊗ x · y′     Z Z x, y′ 0 ∑ τ[L ( )=y ]τ[L ( )=y] A My, a (Paulis are self-consistent) ≈ x · ′ x · ⊗ y′ Z x, y  = τ[L ( )=y] A My, a , x · ⊗  97 where 0 indicates equality with respect to the state EPR V AUX , and we have repeatedly used ≈ | i ⊗ | i x, y Fact 5.19. This is essentially the statement promised by the lemma, except that My, a a does not nec- x, y { }x, y essarily sum to identity (since we only sum over a). To remedy this, define Ma = ∑ M and note that y′ y′, a Mx, y Mx, y, so by Lemma 8.15 and the fact that Mx is projective, a ≥ y, a x Z x, y (M I ) I 1/2 τ M I I . (74) y, a Vx B ε [L ( )=y] a Vx B ⊗ ⊗ ≈ x · ⊗ ⊗ ⊗   Mx y is the POVM measurement guaranteed in the lemma statement, which concludes the proof. { a } 8.4.2 Preliminary lemmas We show a few simple lemmas that allow us to argue about measurements that have a decomposition across a tensor product of two Hilbert spaces, within the space of a single player.

Lemma 8.17. Let , be finite sets. Let ψ A B be a state. Consider the following: for all A B | i∈H ⊗H a , ∈A 1. Let A , B , , and be Hilbert spaces such that H , a H , a HA′, a HB′, a A = A and B = B , H H , a ⊗HA′, a H H , a ⊗HB′, a and let ψQUE A B be a “question state” and ψANS be an “answer | , ai∈H , a ⊗H , a | , ai∈HA′, a ⊗HB′, a state” such that ψ = ψ ψ . | i | QUE, ai ⊗ | ANS, ai 2. Let Q be projectors on such that Q I forms a projective measurement on and a A, a a A , a A a a H { ⊗ H ′ } H let A b and B b be matrices acting on A , a. { b} ∈B { b} ∈B H ′ 3. Let D be the distribution on obtained by measuring ψ using Qa I a . A | i { ⊗ HA′, a } ∈A Then the following are equivalent: On average over a D and with respect to state ψ , • ∼ | i a a (I A, a Ab) IB ε (I B, a Bb) IB . H ⊗ ⊗ ≈ H ⊗ ⊗ a a Q A IB Q B IB on state ψ . • a ⊗ b ⊗ ≈ε a ⊗ b ⊗ | i Proof.Expand   a a 2 ∑ (Qa Ab Qa Bb) IB ψQUE, a ψANS, a a,b k ⊗ − ⊗ ⊗ · | i ⊗ | ik a a 2 = ∑ Qa (Ab Bb) IB ψQUE, a ψANS, a a,b k ⊗ − ⊗ · | i ⊗ | ik 2 2 a a = ∑ Qa I B, a ψQUE, a (Ab Bb) I ψANS, a ⊗ H | i · − ⊗ HB′, a | i a,b 2 2 a a = ∑ Qa IB ψ (Ab Bb) I ψANS, a k ⊗ | ik · − ⊗ HB′, a | i a,b 2 E a a = ∑ (Ab Bb) I ψANS, a a D − ⊗ HB′, a | i ∼ b E a a 2 = ∑ I A, a (Ab Bb) IB ψ QUE, a ψANS, a . H a D b ⊗ − ⊗ · | i ⊗ | i ∼

98 In going from the third to the fourth line we used the fact that

2 2 2 Qa I B, a ψQUE, a = Qa I IB ψQUE, a ψANS, a = Qa I IB ψ . ⊗ H | i ⊗ HA′, a ⊗ | i ⊗ | i ⊗ HA′, a ⊗ | i

Hence, the first line is O(ε ) if and only if the last one is.

Lemma 8.18. Let , , be finite sets. Let ψ A B be a state, and let Aa, b a ,b and A B C | i∈H ⊗H { } ∈A ∈B Ba, c a ,c be POVMs acting on A. Suppose further that: { } ∈A ∈C H 1. The measurements approximately commute, i.e.

A , B IB 0, a, b a, c ⊗ ≈δ where the approximation holds with respect to the state ψ . | i

2. For all a , there exist Hilbert spaces A , , B , , and states ψQUE A ∈ A H , a HA′, a H , a HB′, a | , ai∈H , a ⊗ B , ψANS such that H , a | , ai∈HA′, a ⊗HB′, a

A = A , H H , a ⊗HA′, a B = B , H H , a ⊗HB′, a ψ = ψANS ψQUE . | i | , ai ⊗ | , ai a a 3. For all a , there exist projectors Qa on A , a and matrices A b , Bc c acting on A , a ∈ A H ′ { b} ∈B { } ∈C H ′ such that Qa I is a projective measurement on A and { ⊗ HA′, a } H A = Q Aa , B = Q Ba . a, b a ⊗ b a, c a ⊗ c Then a a I A, a Ab , I A, a Bc IB δ 0, H ⊗ H ⊗ ⊗ ≈ on average over a D where D is the distribution on obtained by measuring ψ using Q ∼ A | i { a ⊗ I a . HA′, a } ∈A Proof. The assumptions of the lemma imply that

a a a a (Q A B ) IB =(Q A Q B ) IB (Q is a projector) a ⊗ b c ⊗ a ⊗ b · a ⊗ c ⊗ a =(A B ) IB (Item 3) a,b · a,c ⊗ δ (Ba,c Aa,b) IB (Item 1) ≈ · a ⊗ a =(Qa Bc Qa Ab) IB (Item 3) ⊗ a · a ⊗ ⊗ =(Q B A ) IB . (Q is a projector) a ⊗ c b ⊗ a We apply Lemma 8.17 as follows. The set “ ” in Lemma 8.17 is the same as here, and the set “ ” is A A B the product set here. The matrices “ Aa ” are Aa Ba here and “ Ba ” are Ba Aa here. We then B×C { b} { b c } { b} { c b} obtain, on average over a D, ∼ a a a a (I A,a AbBc ) IB δ (I A,a Bc Ab) IB . H ⊗ ⊗ ≈ H ⊗ ⊗ This implies the conclusion of the lemma.

99 y Lemma 8.19. Let be a finite set and for all y let A be a POVM on A. Let B be a Y ∈ Y { x, z} H { x, y, z} projective measurement on B. Suppose that H y ∑ ψ Ax, z Bx, y, z ψ 1 δ . (75) x, y, zh | ⊗ | i≥ − Then with respect to state ψ , | i y IA B A B . ⊗ x, y, z ≈δ x, z ⊗ x, y Proof. Using the fact that B is projective, we have B = B B for all x, y, z, so that (75) implies { x, y, z} x, y, z x, y z y ∑ ψ Ax, z Bx, yBz ψ 1 δ . x, y, zh | ⊗ | i≥ − y Define Aˆ = ∑ A B and Bˆ = IA B . The above equation simplifies to z x, y x, z ⊗ x, y z ⊗ z

∑ ψ Aˆ z Bˆz ψ 1 δ . y, zh | | i≥ −

This implies that Aˆ Bˆ as z ≈δ z ˆ ˆ 2 ˆ 2 ˆ 2 ˆ ˆ ∑ (Az Bz) ψ = ∑ ψ Az + Bz ψ 2 ψ Az Bz ψ 2δ , z − | i z h | | i− h | | i≤

where the equality uses the fact that Aˆ z and Bˆz commute. To conclude the proof, we have

y′ y IA Bx, y, z = IA Bx, yBz δ (IA Bx, y) ∑ A Bx , y = Ax, z Bx, y, ⊗ ⊗ ≈ ⊗ x′, z ⊗ ′ ′ ⊗ x′, y′ where the approximation follows from Aˆ Bˆ and Fact 5.19. z ≈δ z

8.4.3 Soundness proof We analyze the soundness of the introspective verifier.

Proof of the soundness part of Theorem 8.3. Let = ( , ) be a normal form verifier such that is an V S D S ℓ-level sampler. Recall the definition of the introspective verifier ˆ INTRO =( ˆINTRO, ˆ INTRO) corresponding V S D to from Section 8.2. Fix an index n 1 and let N = 2n. Let R = Nλ and introparams(R)=(q, m, d), asV in Section 8.2. ≥ As in the proof of completeness part of Theorem 8.3, we make the following simplifications. First, we analyze the soundness of the typed introspective verifier ˆ INTRO; the soundness of the detyped verifier NTRO NTRO V NTRO I = Detype( ˆ I ) follows from Lemma 6.18 and the fact that the type set I has size O(ℓ). Second,V analogouslyV to Remark 8.7 without loss of generality for notational simplicityT we consider strategies for questions sampled from ˜INTRO rather than the downsized sampler INTRO. INTROS > < < S S ˆ ˆ Suppose that val∗( n ) 1 ε for some 0 ε 1, and let = (ψ, A, B) be a strategy for ˆ INTRO V −S ˆ INTRO n with value at least 1 ε. Since has success probability that is strictly positive, the decider doesV not automatically reject,− which means that D s(N) Nλ . (76) ≤ We analyze each of the tests performed by ˆ INTRO (see Figure 9) in sequence, and state consequences of each test. We start with Item 1, the Pauli test.D

100 a b b Lemma 8.20. There exists a function δ1(ε, R)= a((log R) ε− +(log R)− ) for universal constants a > 1 and 0 < b < 1 such that the following holds. Let Q = M log q, where M = 2m. There is a projective strategy S =( ψ , A, B) for ˆ INTRO that succeeds with probability at least 1 δ and furthermore ′ | i Vn − 1 Q ψ AB = EPR2 ⊗ AUX A B (77) | i | iA′ B′ ⊗ | i ′′ ′′ for some bipartite state AUX , and for all W X, Z , | i ∈ { } PAULI,W W PAULI,W W Ax = σx , Bx = σx , (78) where σW acts on the first s(N) qubits of player A’s share (resp. B’s share) of EPR Q. x | 2i⊗ INTRO ˆ ˆ Proof. Given the definition of the type graph G , for ((tA, xˆA), (tB, xˆB)) sampled according to µ ˆINTRO ,n PAULI PAULI S it holds that (tˆA, tˆB) with constant probability. Therefore, conditioned on the Pauli test, ∈T ×T Item 1, being executed, S must succeed in the test with probability 1 O(ε). − Observe that conditioned on the test being executed, the distribution of ((tˆA, xˆA), (tˆB, xˆB)) is, by defi- nition, exactly the distribution of questions in the Pauli basis game with parameters qldparams, as described in Section 7.3.1. By Corollary 7.15 it follows that there exists a local isometry φ = φA φB and a state ⊗ AUX such that | i∈HA′′ ⊗HB′′ Q 2 φ( ψ ) EPR ⊗ AUX δ′(ε, R) , (79) k | i − | 2i ⊗ | ik ≤ where δ′(ε, R) is an upper bound on δQLD(O(ε), q, m, d) that only depends on ε and R, as stated in Lemma 7.17. xˆ xˆ In addition, defining A = φA(Aˆ ) for all questions xˆ and answers aˆ, for W X, Z it holds that aˆ aˆ ∈ { } PAULI,W W A IB σ IB , (80) x ⊗ ≈δ′(ε,R) x ⊗ and a similar set of equations hold for operators associated with the second player. Using Naimark’s the- orem as formulated in [NW19, Theorem 4.2], at the cost of extending the state AUX we may assume | i that the measurements are projective without loss of generality. Define the strategy S ′ which uses the state Q xˆ xˆ EPR2 ⊗ AUX and measurement operators Aaˆ and Baˆ for all questions xˆ, except for (PAULI, W)- | i ⊗ | i { } W { } S type questions where instead the Pauli measurements σaˆ are used. Using (79) and (80) the strategy ′ succeeds in ˆ INTRO with probability at least 1 δ (ε, R). Vn − ′ The claimed bound on δ1 follows from the bound given in Lemma 7.17.

In the remainder of the proof we analyze the strategy S ′ from Lemma 8.20. We use the following notation conventions:

1. We use indices A and B to label each player’s Hilbert space after application of the isometry φ from Lemma 8.20. FQ 2. We write V for the register subspace of 2 spanned by e1,..., es(N) and V for its complement. (Note that by definition of introparams in Section 7.3.2 it holds that s(N) R Q, where the first inequality follows from (76).) ≤ ≤

W 3. Whenever we write a Pauli operator σa the register on which the operator acts should always be clear from context, and is implicit from the space in which the outcome a ranges.

101 4. For measurement operators in the introspection game, the variables for the measurement outcomes follow the specification of the “answer key” in Figure 7 (for PAULI-type questions) and Figure 9 T PAULI,W (for all other question types). For example, the measurement operators Ax corresponding Q { } to question type (PAULI, W) are indexed by vectors x F where Q = M log q where M = 2m. ∈ q The measurement operators corresponding to question type (INTRO, v) for v A, B are indexed 9R 31 ∈ { } by pairs (y, a) V 0, 1 ≤ . We often refer to marginalized measurement operators, e.g., the INTRO∈,v × { } INTRO,v operator Ay denotes marginalizing Ay, a over all a. In these cases, the part of the answer that is marginalized over will be clear from context.

5. We use the notation δ to denote a function which is polynomial in δ1, although the exact expression may differ from occurrence to occurrence. The polynomial itself may depend on ℓ, but we leave this dependence implicit; due to the use of inductive steps that involve taking the square root of the error ℓ times in sequence (e.g. Lemma 8.25) the exponent generally depends on ℓ. The next two lemma derive conditions implied by Items 2 and 3 of the checks performed by the decider ˆ INTRO described in Figure 9. As these two parts are performed independently for the two possible values of D v A, B , we only discuss the case where v = A. For notational simplicity, whenever possible we omit ∈ { } INTRO SAMPLE v when referring to the measurement operators. For example L, Ay, a and Bz, a are used as shorthand A INTRO, A SAMPLE, A notation for L , Ay, a , and Bz, a respectively. Lemma 8.21 (Sampling test, Item 2 of Figure 9). For each k 1,2,..., ℓ , ∈ { } SAMPLE Z IA B σ IB , (81) ⊗ z ≃δ z ⊗ INTRO SAMPLE A IB IA B , (82) y k, a δ [L k( )=y k], a ≤ ⊗ ≃ ⊗ ≤ · ≤ where z ranges over V and y k ranges over L k(V). Moreover, analogous equations hold with operators acting on the other side of the≤ tensor product. ≤ ˆ ˆ Proof. When ((tA, xˆA), (tB, xˆB)) is sampled according to µ ˆINTRO ,n, each check in Item 2 of Figure 9 is executed with probability Ω(1/ℓ) (this is due to the number ofS types in INTRO and the structure of the type INTRO T graph G ). Therefore, in each of the checks specified by Items 2a and 2b, the strategy S ′ succeeds with probability at least 1 O(ℓδ), conditioned on the test being executed. Item 2a for w = A combined − with (78) implies (81). Item 2b for w = A, combined with Fact 5.24, implies (82). The lemma follows from repeating the same argument with the tensor factors interchanged.

Lemma 8.22 (Hiding test, Item 3 of Figure 9). For each k 1,..., ℓ + 1 , ∈ { } INTRO READ A IB IA B , (83) y

102 ˆ ˆ Proof. When ((tA, xˆA), (tB, xˆB)) is sampled according to µ ˆINTRO ,n, each check in Item 3 of Figure 9 is executed with probability Ω(1/ℓ). Therefore, in each of theS checks specified by Items 3a, 3b, and 3c (conditioned on the right types) the strategy S succeeds with probability at least 1 O(ℓδ). ′ − Item 3a for w = A combined with Fact 5.24 implies (83). Combining Eqs. (81) and (82) with (83) yields READ Z (86) IA By

HIDEk HIDEk+1 A IB δ IA B . (87) y

We exploit the tests performed in Item 3 further to show the following lemma. Lemma 8.23. For all k 1,2,..., ℓ , ∈ { } HIDEk Z X X IA B δ σ σ σx> IB , yk [L

Proof. The proof is by induction on k. We first show the case k = 1. Under the distribution µ ˆINTRO ,n, the check in Item 3d of Fig. 9 is executed with probablity Ω(1/ℓ). That part of the check for w = SA together with Eq. (78) implies that

HIDE1 X X IA B δ σ σ > IB . (89) y , x> [L⊥ ( )=y⊥] x 1 ⊗ 1⊥ 1 ≈ 1, y<1 · 1 ⊗ ⊗  This proves the case for k = 1. Next we perform the induction step. Assume that the lemma holds for some k 1,2,..., ℓ 1 . The check of Item 3c is executed with probablity Ω(1/ℓ); using that S succeeds in ∈ { − } ′ Item 3c (conditioned on the right types having been sampled) with probablity at least 1 O(ℓδ), we have − HIDE ∑ ψ AHIDEk B k+1 ψ 1 O(ℓδ) . yk+1 y k, y⊥ , x>k+1 h | k+1, y k · k+1 ⊗ ≤ k+1 | i≥ − y k, y⊥ , x>k+1 ≤ ≤ k+1 We now apply Lemma 8.19, choosing the measurements A, B and outcomes x, y, z in the lemma as follows:

HIDE “A” : AHIDEk , “B” : B k+1 , yk+1 y k, y⊥ , x>k+1 k+1, y k · k+1 ≤ k+1 ≤ “x” : yk+1) .

103 Note that here A does not depend on y, so we use the same A for all values of y. Lemma 8.19 with the above choices of parameters implies that

HIDEk+1 HIDEk HIDEk+1 IA B δ A By k y k, y⊥ , x>k+1 yk+1 ≤ ⊗ ≤ k+1 ≈ k+1, y k · k+1 ⊗ ≤ Z X X HIDEk+1 δ σ σ σ By [Lk+1 k ≈ · ⊗ k+1, yk+1 ≈ · ≤ · ≤ ⊗ k+1, yk+1 ≈ · ⊗ k+1, y

Lemma 8.24. For all k 1,..., ℓ , ∈ { } READ Z X IA B δ σ σ IB . (90) y

Proof. Lemma 8.23 and Fact 5.24 imply that

HIDEk Z X A IB δ σ σ IB . (91) y

Lemma 8.25. For each k 1,2,..., ℓ + 1 , there exists a product state ANCk = ANCk,A ANCk,B ∈ { } |INTRO, yi< | i ⊗ | i ∈ and, for each < < , a projective measurement k that acts on A′ B′ y k L k(V) Ay k, a A A′ H k ⊗H k ∈ ≥ H ⊗H k such that the following holds. First, for all , the operator INTRO, y

104 Commutation with Z-basis measurements. We first prove that on average over y

INTRO “A” : A , Z

INTRO, Z

“ ” INTRO, Z

We choose the Hilbert spaces “ A ” and “ B ” as the register subspace V< (y< ), and “ ” and H ,a H ,a k k HA′,a “ ” as the register subspace < tensored with , the Hilbert space of the state AUX . B′,a V k(y k) A′′ B′′ H ≥ Q H ⊗HS | i Thus for every y

EPR EPR AUX 2 V

Let µL

Commutation with X-basis measurements. Next, we first prove that on average over y

105 We again apply Lemma 5.25, choosing the measurements and outcomes in the lemma as follows:

INTRO, Z

“a” : y

INTRO, Z

“a” : y

INTRO, Z

Z INTRO, y

INTRO, y

“x” : yk, a) , “Lx” : Lk, yk, a . ≥ The “Consistency” condition is implied by Equation (101) and the “Commutation” conditions are implied INTRO, y k by Equations (97) and (99). We obtain that for all y k there exists POVM measurements Ay> , a ≤ that ≤ k act as identity on the register subspace spanned by basis vectors for the subspace V< (y< ) and, on average k k over y< µ , we have k ∼ L

INTRO, yk, a IB . (102) ≥ ⊗ ≈ k, y

INTRO, Z ≤ IB . (104) δ [L k( )=y k] y k, a ≈ ≤ · ≤ ⊗ ⊗ where the second line follows from (102) and Lemma 8.17. By(104), replacing the projective measurement INTRO, Z ≤  [L k( )=y k] y k, a ≤ · ≤ ⊗ INTRO, y k in the strategy S results in a strategy that succeeds with probability at least . To show that ≤ k′′ 1 δ Ay>k, a can furthermore be turned into a projective measurement we use Naimark’s theorem− as formulated{ in [NW19,} Theorem 4.2]. By inspecting the proof of Naimark’s theorem, one can see that the state ANC it pro- | ′i duces is a product state with no entanglement. This yields a strategy S with ancilla state ANC = k′′+1 | k+1i ANC ANC , which establishes the induction hypothesis for k + 1. This completes the proof of Lemma 8.25. | ki| ′i

Taking k = ℓ + 1 in Lemma 8.25 we obtain a strategy S = S with value 1 δ in which, given ′′ ℓ′′+1 − question (INTRO, A), player A performs the measurement

Z INTRO, y σ[LA ( )=y] Aa , (105) · ⊗ INTRO, y for a family of measurements A acting on the state EPR AUX ANC where ANC = a | iV ⊗ | i ⊗ | i | i ANCℓ is unentangled. An analogous argument for Player B’s measurements shows that we may addi- | +1i  tionally assume Player B responds to the question (INTRO, B) using the measurement

Z INTRO, y σ[LB ( )=y] Ba , (106) · ⊗ INTRO, y  for a family of measurements Ba acting on the state EPR V AUX ANC′ , where ANC′ is unentangled. Summarizing, the strategy S uses the state | i ⊗ | i ⊗ | i | i  ′′ Q θ = EPR ⊗ AUX ANC ANC′ , | i | i ⊗ | i ⊗ | i ⊗ | i and the measurements given by Equations (105) and (106). To conclude the proof of the soundness part of the theorem we analyze Item 4 in Figure 9. The test in Item 4 is executed with probability Ω(1/ℓ), so the strategy S succeeds with probability at least 1 δ ′′ − in that test, conditioned on the right types (here we absorb factors of O(ℓ) into δ). Using (105) and (106), conditioned on the test being executed the distribution of the part (yA, yB) of the players’ answers in the test is exactly the distribution µ ,N associated with game N. As a result, the strategy which uses the state S V INTRO, y INTRO, y EPR AUX ANC ANC and measurements A , B succeeds with probability | iV ⊗ | i ⊗ | i ⊗ | ′i { a } { a } at least 1 δ in the game N. Thus, val∗( N) 1 δ. It remains to show that δ has the required form. As − V a b V ≥ − b n λ δ = poly(δ′), δ′(ε, R) = a′ (log R) ′ ε ′ +(log R)− ′ , and R = (2 ) , there is a constant C 1 such that ≥  1 a b b C δ(ε, n) C a′ (λn) ′ ε ′ +(λn)− ′ ≤  1/C a /C b /C  b /C C(a′) (λn) ′ ε ′ +( λn)− ′ ≤ · a b b a (λn) ε +(λn)− , ≤ ·   107 for a = max C(a )1/C, a /C , and b = b /C, where the second line uses the inequality (x + y)1/C ′ ′ ′ ≤ x1/C + y1/C for x, y > 0 and C 1.  This establishes the soundness≥ part of the theorem. To show the entanglement bound, we observe that local isometries do not change the Schmidt rank of INTRO, y INTRO,y a state. Define ψ′ = φ( ψ ) ANC ANC′ . Since the strategy ( ψ′ , Aa , Ba ) is | IiNTRO | iINTRO⊗ | i ⊗ | i | i { } { INTRO} δ-close to ( θ , A , y , B , y ) which has value 1 δ, the strategy ( ψ , AINTRO, x , B , y ) | i { a } { a } − | ′i { a } { a } has value at least 1 2δ in the game , and therefore the Schmidt rank of φ( ψ ) (and thus of ψ ) must − VN | i | i be at least E ( , 1 2δ). Here, we use the fact that ancilla states are product states and therefore have VN − Schmidt rank 1. Moreover, recall that φ( ψ ) is δ-close to EPR Q AUX whose Schmidt coefficients are all at | i | 2i⊗ ⊗ | i most 2 Q/2. For any bipartite state a with Schmidt rank at most R and b whose Schmidt coefficients are − | i | i all at most β, it follows from the Cauchy-Schwarz inequality that a b 2 Rβ2. Therefore the Schmidt |h | i| ≤ rank of φ( ψ ) (and thus of ψ ) is at least | i | i λ (1 δ)2 2Q (1 δ)2 2N , − · ≥ − · where we used that Q = M log q R (using the canonical parameter settings of Definition 7.16) and ≥ δ φ( ψ ) θ 2 = 2 2 θ φ( ψ ). Combining the two lower bounds on the Schmidt rank of ψ ≥ k | i − | ik − ℜh | | i | i shows the desired lower bound on E ( ˆ INTRO, 1 ε). Vn −

108 9 Oracularization

9.1 Overview In this section we introduce the oracularization transformation. At a high level, the oracularization GORAC of a nonlocal game G is intended to implement the following: one player (called the oracle player) is supposed to receive questions (x, y) meant for both players in the original game G, and the other player (called the isolated player) only receives either x or y (but not both), along with a label indicating which player in the original game the question is associated with (we refer to such players as the original players, e.g. “original A player” and “original B player”). The oracle player is supposed to respond with an answer pair (a, b), and the other player is supposed to respond with an answer c. The oracle and isolated player win the oracularized game if (x, y, a, b) satisfies the predicate of the original game G and the isolated player’s answer is consistent with the oracle player’s answer. The oracularization step is needed in preparation to the next section, in which we perform answer re- duction on the introspection game. To implement answer reduction we need at least one player to be able to compute a proof, in the form of a PCP, that the decider of the original game would have accepted the questions (x, y) and answers (a, b). This requires the player to have access to both questions, and be able to compute both answers.

9.2 Oracularizing normal form verifiers Let =( , ) be a normal form verifier such that is an ℓ-level sampler for some ℓ 0. We specify the V S D S ≥ typed oracularized verifier ˆ ORAC =( ˆORAC, ˆ ORAC) associated with as follows. V S D V

Sampler. Define the type set ORAC = ORACLE, A, B . (In the remainder of this section we refer to RAC T { RAC} RAC the types in O as roles.) Define the type graph GO that is the complete graph on vertex set O (including self-loopsT on all vertices). Define the ORAC-type sampler ˆORAC as follows. For a fixedT index T S n N, let V be the ambient space of and Lw for w A, B be the pair of CL functions of on index ∈ S ∈ { } S n. ORAC w ORAC Define two -typed families of CL functions Lt : V V , for w A, B and t , as follows: T { → } ∈ { } ∈T t w L if t A, B , Lt = ∈ { } (Id if t = ORACLE . In other words, if a player gets the type t A, B , then they get the question that original player t would ∈ { } have received in the game played by . If they get type t = ORACLE, then they get the entire seed z that is Vn used by the sampler , from which they can compute both LA(z) and LB(z), the pair of questions sampled for the players in gameS . Vn By definition, the sampler distribution µ ˆORAC , n has the following properties. S 1. Conditioned on both players receiving the ORACLE role, both players receive z for a uniformly ran- dom z V. ∈ 2. Conditioned on both players receiving the isolated player role, the player(s) with role A (respec- tively, B) receives LA(z) (respectively, LB(z)) for a uniformly random z V. ∈ 3. Conditioned on player w A, B receiving the ORACLE role and player w receiving the isolated ∈ { } player role, their question tuple is distributed according to (ORACLE, z), (v, Lv(z)) if w = A and  109 (v, Lv(z)), (ORACLE, z) if w = B, where z V is uniformly random and v indicates the role of ∈ player w.  Decider. The typed decider ˆ ORAC is specified in Figure 11. D

ORAC Input to decider ˆ : (n, tA, xA, tB, xB, aA, aB). For w A, B , if t = ORACLE, then D ∈ { } w parse aw as a pair (aw,A, aw,B). Perform the following steps sequentially.

1. (Game check). For all w A, B , if t = ORACLE, then compute x = Lv(x ) for ∈ { } w w,v w v A, B . If rejects (n, x A, x B, a A, a B), then reject. ∈ { } D w, w, w, w, 2. (Consistency checks).

(a) If tA = tB and aA = aB, then reject. 6 (b) If for some w A, B , t = ORACLE, t A, B , and a t = a , then reject. ∈ { } w w ∈ { } w, w 6 w 3. Accept if none of the preceding steps rejects.

Figure 11: Specification of the typed decider ˆ ORAC. D

9.3 Completeness and complexity of the oracularized verifier We determine the complexity of the oracularized verifier and establish the completeness property. Theorem 9.1 (Completeness and complexity of the oracularized verifier). Let = ( , ) be a normal V S D form verifier. Let ˆ ORAC = ( ˆORAC, ˆ ORAC) be the corresponding typed oracularized verifier. Then the following hold. V S D (Completeness) For all n N, if has a PCC strategy of value 1, then ˆ ORAC has a symmetric • ∈ Vn Vn PCC strategy of value 1. (Sampler complexity) The sampler ˆORAC depends only on (and not on ). Moreover, the time • complexity of ˆORAC satisfies S S D S

TIME ˆORAC (n)= O (TIME (n)) , S S Furthermore, if is an ℓ-level sampler, then ˆORAC is a ℓ-level typed sampler. S S (Decider complexity) The time complexity of ˆ ORAC satisfies • D

TIME ˆ ORAC (n)= poly (TIME (n), TIME (n)) . D D S (Efficient computability) There is a Turing machine OracleVerifier which takes as input =( , ) • V S D and returns ˆ ORAC =( ˆORAC, ˆ ORAC) in time poly( ). V S D |V| Remark 9.2. Unlike with the Introspection transformation, we do not detype the oracularized verifier ˆ ORAC; this is because the analysis of the Answer Reduction transformation in Section 10 directly reduces toV the analysis of the typed oracularized verifier ˆ ORAC defined here. V Proof. We analyze the completeness and complexity properties of the typed verifier ˆ ORAC. V 110 Completeness. For n N, let S = ( ψ , A, B) be a PCC strategy for n with value 1. Consider the ∈ S ORAC | i ˆ ORAC V following symmetric strategy = ( ψ , M) for n . Depending on the role received, each player performs the following: | i V

1. Suppose the player receives role v A, B and question x. Then the player performs the measure- ∈ { } ment that player v would on question x according to strategy S to obtain outcome a (either Ax or { a } Bx , depending on v). The player replies with a. { a } w 2. Suppose the player receives role v = ORACLE and question x. The player first computes yw = L (x) for w A, B where for w A, B , Lw is the CL functions of corresponding to player w. Then, ∈ { } ∈ { } ORACLE, x S the player measures using the POVM M where { aA , aB } ORACLE, x yB yA (107) MaA, aB = BaB AaA .

yA yB The projectors AaA and BaB commute because (yA, yB) is distributed according to µ , n (over the S choice of x) and S is a commuting strategy for . Thus MORACLE, x is a projector. The player replies Vn aA, aB with (aA, aB). The strategy S ORAC is symmetric and projective by construction, and consistency follows from the S ˆ ORAC consistency of . We now argue that the strategy is commuting and has value 1 in the game n . We consider all possible pairs of roles. V

1. (Oracle, isolated) Suppose without loss of generality that player w = A gets the ORACLE role and player w = B gets the isolated player B role. Then player w gets question x and player w gets B v question L (x), where x is uniformly sampled from V. The oracle player computes yv = L (x) for all v A, B . Notice that (yA, yB) is distributed according to µ , n. The two players return ∈ { } S (aA, aB), a′B with probability

 ORACLE, x yB yB yA yB ψ Ma , a B ψ = ψ BaB AaA B ψ h | A B ⊗ aB′ | i h | ⊗ aB′ | i yA yB yB = ψ AaA BaB B ψ h | ⊗ aB′ | i yA yB = δ ψ Aa Ba ψ , aB, aB′ h | A ⊗ B | i ORACLE, x where the first equality uses the definition of MaA, aB from Eq. (107), the second equality uses S S the consistency of , and the third equality uses the projectivity of . Notice that when aB = a′B, this is exactly the probability of obtaining answers (aA, aB) when player A and player B get question pair (yA, yB) in the game . Since S is value-1, the answers satisfy the decision procedure of Vn Vn with probability 1. Thus the oracle’s answers pass the “Game check” of the oracularized decider with certainty, and furthermore the oracle’s answers are consistent with the isolated player’s answers and thus pass the “Consistency check” with certainty as well. Commutativity of MORACLE, x and ByB follows from the commutativity of S for the game . aA, aB aB Vn 2. (Both oracle) If both players get the ORACLE role, then both players receive the same question x V. Using a similar analysis as for the previous item, the players return the same answer pair (thus passing∈ the “Consistency check”) and pass the “Game check”. Both players’ measurements commute because they are identical.

111 3. (Both isolated) Suppose that both players receive the same isolated player role (e.g., they both receive the isolated player role A). They then perform the same measurements, which produce the same outcomes due to the consistency of the strategy S , and thus they pass the “Consistency check”. Otherwise, suppose that one player receives the A role and the other player receives the B role. Then the decider ˆ ORAC automatically accepts. Furthermore, their measurements commute because their D questions are distributed according to µ , n, and S is a commuting strategy with respect to µ , n. S S Complexity. It is clear from the definition that ˆORAC depends only on . The time complexity of the sampler ˆORAC is dominated by those of the samplerS . The complexityS of ˆ ORAC is dominated by the S S D complexity of and performing consistency checks. The sampler ˆORAC is a max ℓ, 1 -level sampler D S { } because is an ℓ-level sampler and the new CL functions for t = ORACLE are 1-level. S Efficient computability. The description of ˆORAC can be computed, in polynomial time, from the de- scription of alone. The description of ˆ ORACScan be computed in polynomial time from the descriptions of and S. Moreover, in each case theD computation amounts to copying the description of and respectively,S D and adding constant-sized additional instructions. S D

9.4 Soundness of the oracularized verifier Theorem 9.3 (Soundness of the oracularized verifier). Let = ( , ) be a normal form verifier and RAC RAC RAC V S D ˆ O = ( ˆO , ˆ O ) be the corresponding typed oracularized verifier. Then there exists a function V S D δ(ε)= poly(ε) such that for all n N the following hold. ∈ ORAC 1. If val∗( ˆ ) > 1 ε, then val∗( ) 1 δ(ε). Vn − Vn ≥ − 2. For all ε > 0, we have that

E ˆ ORAC, 1 ε E ( , 1 δ(ε)) Vn − ≥ Vn − where E ( ) is as in Definition 5.13.  · Proof. Fix n N. Let S ORAC = ψ , A, B be a projective strategy for ˆ ORAC with value 1 ε for some ∈ | i Vn − 0 < ε 1. Let (t, x) ORAC V be a question to player w = A. In the event that t = ORACLE (which ≤ ∈T ×  occurs with probability 1/3), let y = Lv(x) for each v A, B . From the consistency check performed v ∈ { } by ˆ ORAC and item 1 of Fact 5.18, we have that for all v A, B and on average over x sampled by ˆORACD , ∈ { } S ORACLE, x v, yv A IB IA B . (108) av ⊗ ≈ε ⊗ av Here, we used that with probability 1/9 player w = A gets the ORACLE role and player w = B gets the isolated player v role; conditioned on this, player B gets question yv. Using the fact that the POVM elements AORACLE, x are projective and Fact 5.19, we get { aA, aB } ORACLE, x ORACLE, x B, yB A IB A B aA, aB ⊗ ≈ε aA, aB ⊗ aB ORACLE, x B, yB ε A Ba ≈ aA ⊗ B (109) B, yB A, yA IA B B ≈ε ⊗ aB aA A, yA B, yB IA B B . ≈ε ⊗ aA aB 112 Using Item 2a of the consistency check, we have that on average over a random y = LA(x) V, ∈ A, y A, y A IB IA B . (110) a ⊗ ≈ε ⊗ a Define, for all x LA(V), measurement operators Cx where Cx = AA, x. Similarly, define Dx = BB, x. ∈ { a }a a a a a This defines a strategy S =( ψ , C, D) for the game n that we now argue succeeds with high probability. | i A V B Let x V be uniformly random. Let yA = L (x) and yB = L (x). ∈ ORACLE, x B, yB A, yA A IB IA B B aA, aB ⊗ ≈ε ⊗ aB aA A B A , yA B , yB ≈ε aA ⊗ aB = CyA DyB . aA ⊗ aB The first approximation follows from Equation (109). The second approximation follows from Equa- y B, yB tion (110) and Fact 5.19 (where we let Cb in Fact 5.19 represent BaB ). The last equality follows from yA yB definition of CaA and DaB . The pair of questions (yA, yB) V V is distributed according to µ , n. ∈ × S The game check part of ˆ ORAC succeeds with probability 1 O(ε), which implies that the answer pair D ORACLE, x − (aA, aB) that arises from the measurement A IB is accepted by the decider on question pair aA, aB ⊗ D (yA, yB) with probability 1 O(ε). This in turn implies that the strategy S = ( ψ , C, D) succeeds with − | i probability 1 O(√ε) in the game . As an additional consequence, the Schmidt rank of ψ must be at − Vn | i least E ( , 1 O(√ε)). Vn −

113 10 Answer Reduction

In this section we show how to transform a normal form verifier = ( , ) into an “answer reduced” V S D normal form verifier AR =( AR, AR) such that the values of the associated nonlocal games are directly related, yet the answer-reducedV S verifier’sD decision runtime is only polylogarithmic in the answer length of the original verifier (the answer-reduced verifier’s sampling runtime remains polynomially related to the sampling runtime of ). The polylogarithmic dependence is achieved by composing a probabilistically checkable proof (PCP)S with the oracularized verifier given in Section 9. This step generalizes the answer reduction technique of [NW19, Part V]. Given index n N, the answer reduced verifier AR simulates the oracularization ORAC of on ∈ V V V index n. To do so, it first samples questions x and y using the oracularized sampler ORAC and distributes S them to the players, who compute answers a and b. Let us suppose that the first player is assigned the ORACLE role, and parse their question and answer as pairs x = (xA, xB) and a = (aA, aB), while the second player is an “isolated” player receiving the question xA and responding with answer b. Instead of executing the decider ORAC on the answers (a, b), the verifier AR asks the first player to compute a PCP D V AR Π of (n, xA, xB, aA, aB) = 1, and the second player to compute an encoding g of answer b. then D b V requests randomly chosen locations of the proof Π and the encoding gb, and executes the PCP verifier on the players’ answers. By the soundness of the PCP, AR accepts with high probability only if the player’s V answers satisfy (n, xA, xB, aA, aB)= 1 and b = aA. There are severalD challenges that arise when implementing answer reduction. One challenge, already encountered in [NW19], is that we need to ensure the PCP Π computed by the first player can be cross-tested against the encoding gb computed by the second player (who doesn’t know the entire structure of the PCP Π). This was handled in [NW19] by using a special type of PCP called a probabilistically checkable proof of proximity (PCPP), which allows one to efficiently check that a specific string x is a satisfying assignment to a Boolean formula ϕ, as opposed to simply checking that ϕ is satisfiable. In a PCPP, an encoding of the specific string x is provided separately from the proof of satisfiability. The answer reduction scheme of [NW19] was able to use an “off-the-shelf” PCPP in a relatively black-box fashion to handle this. In our answer reduction scheme, however, there is a further requirement: we need the question distri- bution of the answer reduced verifier to be conditionally linear. This is necessary to maintain the invariant that the verifier after each step of the compression procedure (introspection, answer reduction, parallel repe- tition) is a normal form verifier. Unfortunately, simulating the question distributions of off-the-shelf PCPPs with conditionally linear distributions can be quite cumbersome. Instead, we design a bespoke PCP verifier for the protocol whose question distribution is more easily seen to be conditionally linear. This section is organized as follows. We start with some preliminaries on formulas and encodings in Section 10.1. In Section 10.2 we show how to use the Cook-Levin reduction to reduce the Bounded Halting problem for deciders to a succinct satisfiability problem called Succinct-3SAT. Following this, in Section 10.3, we reduce the Succinct-3SAT instance to an instance of a related problem called Succinct Decoupled 5SAT, which is easier to use in our answer reduction step. Then in Section 10.4 we introduce a PCP for Succinct Decoupled 5SAT. The verifier for the PCP expects a proof consisting of the evaluation tables of low-degree polynomials, including the low-degree encodings of the players’ answers a and b. In Section 10.5 we provide the definition of a normal-form verifier AR that executes the composition of ORAC with the PCP verifier from Section 10.4. In Section 10.6 we showV completeness of the construction andV analyze its complexity. In Section 10.7 we prove soundness.

114 10.1 Circuit preliminaries Recall the definitions pertaining to Turing machines from Section 3.1.

Remark 10.1 (Plugging integers into circuits). Let be a circuit with a single input of length n. Inputs to C C are strings x 0, 1 n. In this section, we will also allow to receive inputs a 0, 1, . . . , 2n 1 . In ∈ { } C ∈ { − } doing so, we use the convention that a number a between 0 and 2n 1 is interpreted as its n-digit binary − encoding binary (a) when provided as input to a set of n single-bit wires. In other words, (a) = (x), n C C where x = binaryn(a). More generally, if the circuit has k different inputs of length n ,..., n , then we can evaluate it on C 1 k inputs a 0, 1, . . . , 2n1 1 ,..., a 0, 1, . . . , 2nk 1 as follows: 1 ∈ { − } k ∈ { − } (a ,..., a )= (x ,..., x ) , C 1 k C 1 k where x = binary (a ),..., x = binary (a ). 1 n1 1 k nk k A 3SAT formula is a Boolean formula in conjunctive normal form in which at most three literals appear m in each clause. More precisely, ϕ is a 3SAT formula on N variables x1, x2,..., xN if it has the form j=1 Cj and each clause is the disjunction of at most three literals, where a literal is either a variable or its Cj xVi negation x . We use xo to denote the literal x if o = 1 and x if o = 0. ¬ i i i ¬ i Definition 10.2 (Succinct description of 3SAT formulas). Let N = 2n, and let ϕ be a 3SAT formula on N variables named x0,..., xN 1. Let be a Boolean circuit with 3 inputs of length n and three single-bit − C inputs. Then is a succinct description of ϕ if for each i , i , i 0,1,..., N 1 and o , o , o 0, 1 , C 1 2 3 ∈ { − } 1 2 3 ∈ { } (i , i , i , o , o , o )= 1 (111) C 1 2 3 1 2 3 if and only if xo1 xo2 xo3 is a clause in ϕ. In Equation (111), we use the notation from Remark 10.1. i1 ∨ i2 ∨ i3 Definition 10.3 (Succinct-3SAT problem). The Succinct-3SAT problem is the language containing encod- ings of circuits in which is a succinct description of a satisfiable 3SAT formula ϕ. C C 10.2 A Cook-Levin theorem for bounded deciders

Definition 10.4 (Bounded Halting problem). The k-input Bounded Halting problem is the language BHk containing the set of tuples (α, T, z ,..., z ) where α is the description of a k-input Turing machine, T N, 1 k ∈ z ,..., z 0, 1 , and accepts input (z ,..., z ) in at most T time steps. 1 k ∈ { }∗ Mα 1 k We begin by defining natural encodings of a decider’s tape alphabet and set of states.

Definition 10.5 (Decider encodings). Let be a decider with tape alphabet Γ = 0, 1, and set of D 2 { ⊔} states K. We will write encΓ : Γ  0, 1 for the function which encodes the elements of Γ, and a special “” symbol described below,∪ { as} →length-two { } binary strings in the following manner:

encΓ(0)= 00 , encΓ(1)= 01 , encΓ( )= 10 , encΓ()= 11 . ⊔ In addition, we write enc : K 0, 1 κ for some arbitrary fixed κ-bit encoding of the elements of K, K → { } where κ = log( K ) . ⌈ | | ⌉

115 Now, we give the main result of this section. It states that any decider can be converted into a circuit D C which succinctly represents a 3SAT formula ϕ that carries out the time T computation of . In addition, 3SAT D is extremely small—size poly log(T) rather than poly(T). C Proposition 10.6 (Succinct representation of deciders). There is an algorithm with the following properties. Let be a decider, let n, T, Q, and σ be integers with Q T and σ, and let x and y be strings of D ≤ |D| ≤ length at most Q. Then on input ( , n, T, Q, σ, x, y), the algorithm outputs a circuit on 3m + 3 inputs D m C which succinctly describes a 3SAT formula ϕ3SAT on M = 2 variables. Furthermore, ϕ3SAT has the following property:

For all a, b 0, 1 2T , there exists a c 0, 1 M 4T such that w = (a, b, c) satisfies ϕ if and • ∈ { } ∈ { } − 3SAT only if there exist a , b 0, 1 of lengths ℓ , ℓ T, respectively, such that prefix prefix ∈ { }∗ a b ≤ T ℓ T ℓ a = encΓ(a , − a ) and b = encΓ(b , − b ) prefix ⊔ prefix ⊔ and accepts (n, x, y, a , b ) in time T. D prefix prefix Finally, the following statements hold:

1. The parameter m controlling the number of inputs to the circuit depends only on T and σ, and m(T, σ)= O(log(T)+ log(σ)),

2. has at most s(n, T, Q, σ)= poly(log(n),log(T), Q, σ) gates, C 3. The algorithm runs in time poly(log(n),log(T), Q, σ),

4. Furthermore, explicit values for m(T, σ) and s(n, T, Q, σ) can be computed in time polynomial in n,log(T), Q, σ. Proposition 10.6 is essentially the standard fact that Succinct-3SAT is an NEXP-complete language, i.e. that every nondeterministic computation which takes time 2n can be represented as a Succinct-3SAT instance of size only poly(n). However, it has several peculiarities which requires us to prove it from scratch rather than simply appealing to the NEXP-completeness of Succinct-3SAT. First, we require that the coordinates of a and b embed into w not randomly but as its lexicographically first coordinates (for reasons that are explained below in Section 10.3). Second, we need explicit bounds on how quantities such as the size of relate to quantities such as σ, an upper bound on the description length of in bits. To proveC Proposition 10.6, we follow the standard proof that Succinct-3SAT is NEXPD -complete as pre- sented in [Pap94]. This proof observes that the Cook-Levin reduction, which is used to show that 3SAT is NP-complete, produces a 3SAT instance whose clauses follow such a simple pattern that they can be described succinctly using an exponentially-smaller circuit. One key difference in our proof is that we will apply the Cook-Levin reduction directly to the 5-input Turing machine , which by Section 3.1 has 7 tapes; D traditional proofs such as the one in [Pap94] would first convert to a single-tape Turing machine single, and then apply the Cook-Levin reduction for single-tape TuringD machines to . Though this addsD no- Dsingle tational overhead to our proof, it allows us to more easily track of which variables in ϕ3SAT correspond to the strings a and b (see Proposition 10.6 to see what these refer to). Proof of Proposition 10.6. The Cook-Levin reduction considers the execution tableau of when run for D time T. The execution tableau contains, for each time t 1,..., T , variables describing the state of ∈ { } D and the contents of each of its tape cells at time t. More formally, it consists of the following three sets of variables.

116 1. For each time t 1,..., T , tape i 1,...,7 , and tape position j 0,1,..., T + 1 , the tableau contains two∈ { Boolean-valued} variables∈ { } ∈ { }

c =(c , c ) 0, 1 2 t,i,j t,i,j,1 t,i,j,2 ∈ { } which are supposed to correspond to the contents of the j-th tape cell on tape i at time t according to encΓ( ). The variables with j 0, T + 1 do not correspond to any cell on the tape; rather, the · ∈ { } j = 0 variables correspond to the left-boundary of the tape, and the j = T + 1 variables correspond to the right-boundary of the first T cells on the tape. These are expected to always contain the special boundary symbol “”, i.e. c should be equal to encΓ() whenever j 0, T + 1 . As we will t,i,j ∈ { } see below, it is convenient to define these so that for each t 1,..., T , i 1,...,7 , and ∈ { } ∈ { } j 1,..., T , the variable ct,i,j also has a variable to its left ct,i,j 1 and to its right ct,i,j+1. ∈ { } − 2. For each time t 1,..., T , tape i 1,...,7 , and tape position j 0,1,..., T + 1 , the ∈ { } ∈ { } ∈ { } tableau contains Boolean-valued variables h 0, 1 which are supposed to indicate whether the t,i,j ∈ { } i-th tape head is in cell j at time t. For the boundary cells j 0, T + 1 , we expect that h = 0 for ∈ { } t,i,j all t 1,..., T and i 1,...,7 . ∈ { } ∈ { } 3. For each time t 1,..., T , the tableau contains κ Boolean-valued variables ∈ { } s =(s ,..., s ) 0, 1 κ t t,1 t,κ ∈ { } which are supposed to correspond to the state of at time t according to enc ( ). D K · Finally, we let denote the set of all of these variables. In other words, V = c h s . V { t,i,j,k}t,i,j,k ∪ { t,i,j}t,i,j ∪ { t,k}t,k In total, the number of variables in the execution tableau is given by

= O T2 + T log( K ) = O T2 + T log( ) . (112) |V| · | | · |D|   The first term in Equation (112) corresponds to the tape cell encodings ct,i,j and ht,i,j, and the second term corresponds to the Turing machine state encodings s . The second equality uses the fact that K . t | | ≤ |D| As stated above, we expect the variables in to correspond to some time-T execution of the decider . V D However, in general these are just arbitrary 0, 1 -valued variables. We now describe a set of constraints { } placed on these variables which, if satisfied, ensure they do indeed correspond to some time-T execution of . These constraints will be split into two categories: (i) the constraints corresponding to the boundary, D which ensure that the t = 1 variables are initialized to a valid starting configuration and the j 0, T + 1 variables are set according to Items 1 and 2, and (ii) the constraints corresponding to the execution∈ { of }, D which ensure that the variables at each time (t + 1) follow from the variables at time t according to the computation of . We start with the boundary constraints, which are simple enough to be described with a 3SAT formula. D Definition 10.7. The boundary formula ϕ is the 3SAT formula on the variables described as Boundary V follows. Let o = enc (start) 0, 1 κ , where start K is the start state of . For the t = 1 boundary, K ∈ { } ∈ D

117 ϕBoundary contains the following set of clauses.

Indices Clauses i 1,...,7 h (113) ∈ { } 1,i,1 i 1,...,7 , j = 1 h (114) ∈ { } 6 ¬ 1,i,j k 1,..., κ sok (115) ∈ { } 1,k i 1,...,7 , j 1,..., T c c (116) ∈ { } ∈ { } ¬ 1,i,j,1 ∨ ¬ 1,i,j,2 i 1,...,5 , j < j′ 1,..., T c c (117) ∈ { } ∈ { } ¬ 1,i,j,1 ∨ 1,i,j′,1 i 6, 7 , j 1,..., T c and c (118) ∈ { } ∈ { } 1,i,j,1 ¬ 1,i,j,2 This is meant to be read as follows: for each row, the “Indices” column specifies the range of the indices that the clauses in the “Clauses” column are quantified over. For example, row (113) specifies that for all i 1,...,7 , ϕBoundary contains the clause h1,i,1. For the j 0, T + 1 boundary, ϕBoundary contains the∈ following { } set of clauses. ∈ { }

Indices Clauses t 1,..., T , i 1,...,7 , j 0, T + 1 , k 1, 2 c (119) ∈ { } ∈ { } ∈ { } ∈ { } t,i,j,k Rows (113) and (114) ensure that at time t = 1, each tape has exactly one tape head, and it is located on cell j = 1. Row (115) ensures that at time t = 1, the state is given by the start state start. (Recall the notation xo to denote the literal x if o = 1 and x if o = 0.) For the remaining rows, we recall that under i i ¬ i the encoding of the tape alphabet, encΓ( ) = 10 and encΓ() = 11. As a result, (i) row (119) ensures ⊔ that for all times and tapes, the cells j 0, T + 1 contain the  symbol, (ii) row (116) ensures that for ∈ { } time t = 1, no cell j / 0, T + 1 contains the  symbol, and (iii) row (118) ensures that for time t = 1 ∈ { } and tapes 6 and 7, all cells j 1,..., T contain the symbol. Finally, row (117) says that for tapes ∈ { } ⊔ i 1,...,5 , if cell j contains , then every cell j > j must contain as well. This means that the five ∈ { } ⊔ ′ ⊔ strings encoded by c ,..., c each consist of a string of 0’s and 1s followed by a string of ’s. In short, 1,1 1,5 ⊔ suppose we write n, x, y, a, and b for the prefixes of these strings with no ’s. If ϕ is satisfied, then ⊔ Boundary the execution tableau correctly encodes that the tapes of contain inputs n, x, y, a, and b at time t = 1. Next, we describe the execution constraints. These areD more complicated than the boundary constraints, and so we will begin by describing them in terms of a general Boolean circuit known as the local check circuit. For any time t 1,..., T 1 and tape positions j ,..., j 1,..., T , the local check circuit ∈ { − } 1 7 ∈ { } can check that the execution tableau properly encodes these tape positions at time t + 1 by looking only at the encodings of these tape positions and their neighbors (i.e. the tape positions j 1 for each i 1,...,7 ) i ± ∈ { } at time t. Definition 10.8. In this definition, we will define the local check circuit . It has the following inputs. CCheck For each i 1,...,7 , it has the eight inputs • ∈ { }

cCheck,0,i, 1 cCheck,0,i,0 cCheck,0,i,1 hCheck,0,i, 1 hCheck,0,i,0 hCheck,0,i,1 − − (120) cCheck,1,i,0 hCheck,1,i,0

where the c-inputs are in 0, 1 2 and the h-inputs are in 0, 1 . { } { } It has two inputs s , s 0, 1 κ. • Check,0 Check,1 ∈ { }

118 In addition, for each time t 1,..., T 1 and tape positions j ,..., j 1,..., T , we will define a ∈ { − } 1 7 ∈ { } circuit Check,t,j1,...,j7 by associating the inputs of Check with certain variables in . We do this by associating the followingC eight inputs from with the correspondingC variables in EquationV (120): V

ct,i,j 1 ct,i,j ct,i,j +1 ht,i,j 1 ht,i,j ht,i,j +1 i− i i i− i i ct+1,i,ji ht+1,i,ji as well as by associating st and st+1 from with sCheck,0 and sCheck,1, respectively. To define the behavior of , itV will be more convenient to define the behavior of the circuits CCheck for all values of t, j ,..., j . However, it will be clear that each of these is in fact the same CCheck,t,j1,...,j7 1 7 circuit applied to different inputs, and hence this will define Check as well. Now, the circuit Check,t,j1,...,j7 acts as follows. C C

Suppose for each i 1,...,7 , exactly one of the tape positions j 1, j , and j + 1 at time t • i i i contains a tape head.∈ First,{ } computes the transition function− of the Turing machine CCheck,t,j1,...,j7 applied to the contents of these 7 tape positions and the state of at time t, which produces the state D of and contents of these tape positions at time t + 1, as well as directions to move the 7 tape heads D in. Then checks that the variables for the tape positions j ,..., j and the state of at CCheck,t,j1,...,j7 1 7 D time t + 1 match what they should be. In addition, if a tape head is marked as moving into a tape cell containing a  symbol, that tape head remains in place instead.

Suppose for each i 1,...,7 , none of the tape positions j 1, j , or j + 1 contain a tape head • ∈ { } i − i i at time t. Then checks that the variables for the tape positions j ,..., j at time t + 1 CCheck,t,j1,...,j7 1 7 match the variables at time t. Otherwise, accepts. • CCheck,t,j1,...,j7 This defines , and therefore . CCheck,t,j1,...,j7 CCheck Definition 10.8 shows the utility of introducing the boundary variables ct,i,0 and ct,i,T+1. The circuit Check checks the contents of a cell ct,i,j at time t + 1 by looking at the cell and its neighbors ct,i,j 1, ct,i,j+1 C − at time t. However, those cells with j 1, T only have either a left neighbor or a right neighbor, and so without the boundary variables we’d have∈ { to introduce} two other local check circuits designed just for these boundary cases. The boundary variables then allow us to use the same local check circuit for all cells. Proposition 10.9. The circuit has size at most poly( ) and can be computed in time poly( ). CCheck |D| |D| Proof. The circuit has 7 12 + 2 κ = 84 + 2κ total Boolean inputs, giving a total of 284 4κ = O( K 2) possible input strings. For· each possible· fixed input string, will check if the actual input· is equal| | to CCheck the fixed input, which takes O(κ) gates, and then it will accept if the fixed input should be accepting. This 2 takes O( K κ) gates. Computing Check requires looping over all possible input strings and checking which ones| | are· accepting or rejecting.C This requires computing the transition function of , a task which D takes time poly( ). The proposition follows by noting that K . |D| | | ≤ |D| Proposition 10.10. Suppose that the execution tableau satisfies ϕ and the circuit , for Boundary CCheck,t,j1,...,j7 each t 1,..., T 1 and j ,..., j 1,..., T . Then the execution tableau correctly encodes the ∈ { − } 1 7 ∈ { } execution of on input (n, x, y, a, b) when run for time T. D

119 Proof. To show this, we will show for each time t 1,... T that the variables in the execution tableau ∈ { } corresponding to time t correctly encode the state of and the contents of the seven tapes at time t. The D proof is by induction on t. The base case of t = 1 follows from the tableau satisfying ϕBoundary. Next we perform the induction step. Assuming the statement holds for time t 1,..., T 1 , we ∈ { − } will show it holds for time t + 1 as well. Let i 1,...,7 be a tape, and consider a tape position ∗ ∈ { } j 1,..., T . We will show that the variables correctly encode the contents of this tape position at time i∗ ∈ { } t + 1. Suppose one of the tape positions j 1, j , or j + 1 at time t has a tape head. For each of the other i∗ − i∗ i∗ tapes i = i∗, we select a tape position ji such that either ji 1, ji, or ji + 1 has a tape head at time t. (These positions6 are guaranteed to exist since each tape has exactl−y one tape head.) By the induction hypothesis, for each tape i 1,...,7 the variables corresponding to tape cells j 1, ∈ { } i − j , and j + 1 correctly encode the contents of these cells at time t. By assumption, evaluates i i CCheck,t,j1,...,j7 to 1. In this case, it calculates the transition function of to compute the contents of the tape cells j ,..., j D 1 7 at time t + 1 and checks that the corresponding variables encode these contents. As a result, the tape position on tape is correctly encoded. In addition, it computes the state of at time and checks that the ji∗ i∗ t + 1 corresponding variables encode this state. This completes the inductionD step. The case when none of the tape positions j 1, j , and j + 1 at time t contain a tape head follows similarly. Finally, the variables i∗ − i∗ i∗ for all tape positions j 0, T + 1 are correctly encoded due to ϕ being satisfied. ∈ { } Boundary Proposition 10.10 gives a set of constraints that ensure the execution tableau properly encodes the ex- ecution of . Our next step will be to convert these constraints into a single 3SAT formula. This entails transformingD each circuit into a 3SAT formula. We do so using the following reduction. CCheck,t,i1,...,i7 Proposition 10.11 (Circuit-to-3SAT). There is an algorithm which, on input a size-r circuit on variables C x 0, 1 n, runs in time poly(r) and outputs a 3SAT formula ϕ on variables x 0, 1 n and y 0, 1 r ∈ { } ∈ { } ∈ { } with O(r) clauses such that for all x, (x)= 1 if and only if there exists a y such that x and y satisfy ϕ. C Proof. This is the textbook circuit-to-3SAT reduction. For each gate i 1,..., r , the algorithm intro- ∈ { } duces a variable yi 0, 1 , so that the total collection of variables is total = xi i 1,...,n yi i 1,...,r . ∈ { } V { } ∈{ } ∪ { } ∈{ } Consider gate i 1,..., r , and let z , z be the pair of variables feeding into it. If gate i is an ∈ { } 1 2 ∈ Vtotal AND gate, then ϕ includes the constraints

( z z y ) , (z z y ) , (z z y ) , ( z z y ) . (121) ¬ 1 ∨ ¬ 2 ∨ i 1 ∨ 2 ∨ ¬ i 1 ∨ ¬ 2 ∨ ¬ i ¬ 1 ∨ 2 ∨ ¬ i These constraints are satisfied if and only if y = z z . The case of gate i being an OR gate follows i 1 ∧ 2 similarly. Finally, ϕ includes the constraint (y ), where i 1,..., r is the output gate. It follows that i∗ ∗ ∈ { } x, y satisfy ϕ if and only if (x) = 1 and for each i 1,..., r , y is the value computed by gate i in C ∈ { } i circuit on input x. In total, ϕ has 4r + 1 clauses and is computable in time poly(r). C We now apply the algorithm from Proposition 10.11 to , which by Proposition 10.9 has size r CCheck ≤ poly( ). This produces a 3SAT formula ϕ on the variables in plus auxiliary variables a for |D| Check CCheck k k 1,..., r added by the reduction. Now, for each t 1,..., T 1 and j ,..., j 1,..., T , we ∈ { } ∈ { − } 1 7 ∈ { } define a 3SAT formula ϕ analogously to . We begin by associating those variables Check,t,j1,...,j7 CCheck,t,j1,...,j7 in ϕ which come from ’s inputs with the variables in as in Definition 10.8. Next, for each Check CCheck V k 1,..., r , we introduce a new variable a 0, 1 and associate it with the variable a . This ∈ { } t,j1,...,j7,k ∈ { } k defines ϕCheck,t,j1,...,j7 . In summary, the final 3SAT instance produced by the Cook-Levin reduction is

ϕ := ϕBoundary ϕCheck,t,j1,...,j7 . (122) ∧ t,j ,...,j 1^ 7  120 By Proposition 10.11, the execution tableau properly encodes the execution of if and only if there exists D a setting to the auxiliary variables satisfying the 3SAT formula ϕ. In total, ϕ contains

V + O(T8) r O T2 + T log( )+ T8 poly( ) = poly(T, ) (123) | | · ≤ · |D| · |D| |D| variables. The second term on the left-hand side of Equation (123) corresponds to the auxiliary variables in each of the O(T8) copies of ϕ . The inequality follows by Equation (112) and the fact that r Check ≤ poly( ). |D| Our next is to represent the 3SAT formula ϕ succinctly. To do this, we will provide a circuit which C succinctly describes a 3SAT formula ϕ3SAT which, while not literally equal to ϕ, will be isomorphic to it. This means that, for example, ϕ3SAT may not even have the same number of variables as ϕ, but any variable in ϕ will correspond in a clear and direct manner to a variable in ϕ3SAT, and any remaining variables in ϕ3SAT do not appear in any clauses. This circuit is constructed as follows. Definition 10.12. In this definition we construct the circuit . It has three inputs z , z , z of length m, C 1 2 3 which we specify below in Equation (124), and three inputs o , o , o 0, 1 . For each ν 1, 2, 3 , each 1 2 3 ∈ { } ∈ { } zν is supposed to specify a variable in ϕ according to a format we will now specify. If zν is not properly formatted, then it does not correspond to a variable in ϕ; if any of z , z , z is not properly formatted, then 1 2 3 C automatically outputs 0. Below, we will often write substrings of the zν’s as though they are integers from some specified range, i.e. a b,..., c . This means that a is represented as a binary string of length ∈ { } log(c + 1) , which is to be interpreted as the binary encoding of an integer between b and c. ⌈ ⌉ The input z is formatted as a string (ω, α, β , β , β , β ). The first substring ω has length m ( α + ν 1 2 3 4 − | | β + + β ) bits and is formatted to be the all-zeroes string. Its purpose is to pad the inputs to have | 1| · · · | 4| the length m we specify below. Next, α is formatted as an integer α 1, 2, 3, 4 . The variable encoded ∈ { } by zν is specified by βα, and the other three β’s should be the all-zeros string. We now specify the encoding of βα, conditioned on the value of α.

1. β1 is formatted as (t, i, j, k), where

t 1,..., T , i 1,...,7 , j 0,1,..., T + 1 , k 1, 2 . ∈ { } ∈ { } ∈ { } ∈ { }

This corresponds to the variable ct,i,j,k.

2. β2 is formatted as (t, i, j), where

t 1,..., T , i 1,...,7 , j 0,1,..., T + 1 . ∈ { } ∈ { } ∈ { }

This corresponds to the variable ht,i,j.

3. β3 is formatted as (t, k), where

t 1,..., T , k 1,..., κ . ∈ { } ∈ { }

This corresponds to the variable st,k.

4. β4 is formatted as (t, j1,..., j7, k), where

t 1,..., T 1 , j ,..., j 1,2,..., T , k 1,..., r . ∈ { − } 1 7 ∈ { } ∈ { }

This corresponds to the variable at,j1,...,j7,k.

121 In total, the length of these substrings is

α + β + + β = O(log(T)+ log(κ)+ log(r)) = O(log(T)+ log( )) . | | | 1| · · · | 4| |D| As a result, because σ, the length of the “padding” ω can be chosen so that each z has length |D| ≤ i m = O(log(T)+ log(σ)) . (124)

Checking that each zν is properly formatted can be done by checking that certain substrings of z1, z2, and z3 encode integers which fall within specified ranges. This can be done using O(m) gates. Having specified the inputs, we can now specify the execution of the circuit, and we may assume that z1, z2, z3 are properly formatted. Implementing the clauses from ϕBoundary is simple; we specify how to implement the clause from Equation (113).

Suppose for input z , α = 2. Then β can be parsed as (t, i, j). The circuit accepts if j = 1 and • 1 2 o = 1, regardless of z and z . This ensures that ϕ includes x xo2 xo3 for any z , z , o , o , 1 1 2 3SAT z1 ∨ z2 ∨ z3 2 3 2 3 which is equivalent to including the arity-one clause xz1 . This can be implemented with O(m) gates. Similar arguments can be used to implement the clauses from Equations (114)-(119) using O(m) gates apiece.

Implementing the clauses from the formulas ϕCheck,t,j1,...,j7 is more challenging. From Equation (121), we can see that any constraint in this formula always involves a variable of the form at,j1,...,j7,k for some k 1,..., r . ∈ { } 1. First check if one of its inputs z1, z2, z3 corresponds to such a variable. This can be done with O(1) gates simply by checking if for any of the zi’s, a = 4. If so, this specifies the values of t, j1,..., j7.

2. Each variable in ϕCheck,t,j1,...,j7 is associated with a variable in ϕCheck. The circuit checks if all the zi’s

are contained in ϕCheck,t,j1,...,j7 and then it computes which variable in ϕCheck they are associated with. We include below the example of checking whether z1 is associated with the variable cCheck,1,1,0,0. The circuit first computes t + 1, which takes O(m) gates. It then looks at z and checks • C 1 if α = 1. If so, then β1 = (t′, i′, j′, k′), and so it tests the equalities t′ = t + 1, i′ = 1, j′ = j1, and k′ = 0, each of which takes O(m) gates to test. If so, then z1 is associated with cCheck,1,1,0,0. There are poly( ) variables in ϕ and, for each variable z , it takes O(m) gates to determine |D| Check i whether zi is associated with this variable. As a result, because σ, this takes poly(σ,log(T)) gates to compute. |D| ≤

3. Whether z1, z2, and z3 share a clause in ϕCheck,t,j1,...,j7 depends only on which variables in ϕCheck they are associated with. As a result, after having computed these variables, the algorithm can hard-code whether the circuit should accept. C This completes the description of . In total, it contains poly(σ,log(T)) gates. Computing first requires C C computing ϕ , which takes time poly( ) poly(σ) by Propositions 10.9 and 10.11. After that, the Check |D| ≤ steps outlined above for construction take time poly(σ,log(T)). C The circuit succinctly describes ϕ in the loose sense described above. Recall that ϕ accepts if and C only if it encodes the execution of up to time T. We now modify to (i) hard code n, x, and y onto ’s D C D input tapes, and (ii) ensure that accepts. We demonstrate how to do so by hard coding n as an example. D 122 The input n is described by some string ν of length ℓ = O(log(n)). We would like to hard-code the • ℓ string (ν, T ) into the first tape of . To do this, we first check if z corresponds to the variable ⊔ − D 1 ct,i,j,k for some values of t, i, j, k. Then, we check if t = 1, the time where the inputs appear on the tapes, and i = 1, corresponding to the first tape. If so, we branch on whether j ℓ. If it is, then if ≤ k = 1 the circuit accepts if o1 = 0, and if k = 2 the circuit accepts if o1 = νj. Otherwise, if j > ℓ, then if k = 1 the circuit accepts if o1 = 1, and if k = 2 the circuit accepts if o1 = 0. This can be done with poly(log(n),log(T)) gates.

This modifies ϕ so that it only accepts if n is written on its first tape at input. We can similarly hard-code x and y onto the second and third input tapes and hard-code the accepting state as the final state of . In D total, this takes poly(log(n), x , y ,log(T), σ), which is poly(log(n), Q,log(T), σ) because x and y | | | | have length at most Q. It remains to ensure that the variables corresponding to the 4th and 5th at time t = 1 are the lexicographically- first named variables in ϕ. However, this is simple and can be done using poly(log(n), Q,log(T), σ) gates. This concludes the construction.

10.3 A succinct 5SAT description for deciders

Proposition 10.6 allows us to convert any decider and inputs n, x, y into a 3SAT formula ϕ3SAT, suc- cinctly described by a circuit , which represents it.D However, there are two undesirable properties of this construction, which we describeC below. o o o i1 i2 i3 1. First, evaluating any clause (w w w ) of ϕ3SAT requires evaluating the same assignment w i1 ∨ i2 ∨ i3 at three separate points. While this is fine when the assignment w is provided in full to the verifier, it can be a problem when the verifier is only able to query the points in w by interacting with a prover. In this case, the verifier might send the prover the values i1, i2, i3, who responds with three M bits b1, b2, b3 0, 1 , purported to be the values wi1 , wi2 , wi3 for some assignment w 0, 1 . As it will turn∈ out, { in} the answer reduced protocol below, the verifier will actually be able∈ { to force} the prover to reply using three different assignments. In other words, the prover will have three assignments w , w , w 0, 1 M such that, for any i , i , i provided to it by the verifier, it will 1 2 3 ∈ { } 1 2 3 respond with b1 = w1,i1 , b2 = w2,i2 , b3 = w3,i3 . However, even given this there is no guarantee that the three assignments are the same assignments (i.e. that w1 = w2 = w3). In the work of [NW19], this was accomplished by an additional subroutine called the intersecting lines test, which would enforce consistency between w1, w2, and w3. In this work, on the hand, we would like to relax the assumption that w1, w2, and w3 must be the same. This will allow us to not use the intersecting lines test, simplifying the answer reduction protocol. (In fact, the answer reduced verifier will not be querying the assignments directly, but rather low-degree encodings of these assignments.)

2. Second, we are guaranteed that w = (a, b, c) satisfies ϕ if and only if accepts (n, x, y, a, b). 3SAT D However, it is inconvenient that a and b are contained as substrings of w. To see why, recall from Section 9 that the oracularized verifier sometimes gives one prover a pair of questions (x, y) and another prover just one of the questions—say, x. The answer reduced verifier will sample its questions similarly; as for its answers, it might expect the first prover to respond with a string w =(a, b, c) that satisfies ϕ3SAT and the second prover to respond with a string a′ such that a = a′. Verifying that a = a′ requires the verifier to sample a uniformly random point from w, restricted to the coordinates in a. As it turns out, generating a uniform point from a substring is extremely cumbersome, though not impossible, to do when the verifier’s questions are expected to be sampled from conditional linear

123 functions. To remove this complication, though, we would instead prefer if the provers’ answers were formatted in a way that gave the verifier direct access to the strings a and b.

In the remainder of this section, we will show how to modify the Succinct-3SAT circuits produced by Proposition 10.6 in order to ameliorate these two difficulties. Doing so entails modifying the ϕ3SAT formula from Proposition 10.6 to produce a 5SAT formula ϕ5SAT. The clauses of ϕ5SAT will be of the form

ao1 bo2 wo3 wo4 wo5 , i1 ∨ i2 ∨ 1,i3 ∨ 2,i4 ∨ 3,i5 where a, b, w1, w2, and w3 are five separate assignments which are not assumed to be equal. The guarantee is that (a, b, w , w , w ) satisfies ϕ if and only if accepts (n, x, y, a, b). This addresses the two concerns 1 2 3 5SAT D from above: each clause is totally decoupled, meaning it samples 5 variables from 5 different assignments, and so no consistency check must be performed between the assignments. In addition, the first two strings exactly correspond to a and b, addressing the second item. We now formally define decoupled 5SAT instances and how they succinctly represent bounded deciders. Following that, we show how the succinct 3SAT instance ϕ3SAT produced by Proposition 10.6 can be modi- fied to produce a succinct 5SAT instance ϕ which represents . 5SAT D Definition 10.13 (Decoupled 5SAT and its succinct descriptions). A block of variables xi is a tuple xi =

(xi,0,..., xi,Ni 1). A formula ϕ on 5 blocks x1, x2,..., x5 of variables is called a decoupled 5SAT formula if every clause− is of the form xo1 xo2 xo3 xo4 xo5 , (125) 1, i1 ∨ 2, i2 ∨ 3, i3 ∨ 4, i4 ∨ 5, i5 for i 0,1,..., N 1 and o ,..., o 0, 1 . (Recall from Definition 10.2 that the notation xo means j ∈ { j − } 1 5 ∈ { } x if o = 1 and x if o = 0.) ¬ For each i 1, 2, . . . , 5 , suppose each N is a power of two, and write it as N = 2ni . Let be ∈ { } i i C a circuit with five inputs of length n , n ,..., n and five single-bit inputs. Then succinctly describes 1 2 5 C decoupled ϕ if, for all i 0,1,..., N 1 and o , o ,..., o 0, 1 , j ∈ { j − } 1 2 5 ∈ { } (i , i ,..., i , o , o ,..., o )= 1 (126) C 1 2 5 1 2 5 if and only if the clause in (125) is included in ϕ. As in Definition 10.2, we slightly abuse notation and use the convention that a number a between 0 and 2ni 1 is interpreted as its binary encoding binary (a) − ni when provided as input to a set of ni single-bit wires. Definition 10.14 (Succinct descriptions for bounded deciders). Let be a decider. Fix an index n N and ℓ D ∈ a time T N. Let L = 2 be a power of two that is at least as large as 2T. Let x and y be strings, r N ∈ ∈ and R = 2r. Consider a circuit with two inputs of length ℓ, three inputs of length r, and 5 single-bit inputs. Let ϕ C C be the decoupled 5SAT instance with two blocks of variables of size L and three blocks of size R which C succinctly describes. Then we say that succinctly describes (on inputs n, x, and y and time T) if, for L C R D all a, b 0, 1 , there exists w1, w2, w3 0, 1 such that a, b, w1, w2, w3 satisfy ϕ if and only if there ∈ { } ∈ { } C exist a , b 0, 1 of lengths ℓ , ℓ T, respectively, such that prefix prefix ∈ { }∗ a b ≤ L/2 ℓ L/2 ℓ a = encΓ(a , − a ) and b = encΓ(b , − b ) prefix ⊔ prefix ⊔ and accepts (n, x, y, a , b ) in time T. D prefix prefix

124 In this definition of succinct descriptions, the answers a and b are isolated, in that the first input of of C length ℓ indexes into a and the second input of length ℓ indexes into b. The next proposition shows how to construct such descriptions.

Proposition 10.15 (Explicit succinct descriptions). There is a Turing machine SuccinctDecider with the following properties. Let be a decider, let n, T, Q, and σ be integers with Q T and σ, and D ≤ |D| ≤ let x and y be strings of length at most Q. Then on input ( , n, T, Q, σ, x, y), SuccinctDecider outputs a D circuit with two inputs of length ℓ (T), three of length r (T, σ), and five single-bit inputs which succinctly C 0 0 describes on inputs n, x, and y and time T. Moreover, the following hold. D 1. ℓ (T)= log(2T) . 0 ⌈ ⌉ 2. r0(T, σ)= O(log(T)+ log(σ)), 3. has at most s (n, T, Q, σ)= poly(log(T),log(n), Q, σ) gates, C 0 4. SuccinctDecider runs in time poly(log(T),log(n), Q, σ), and the parameters ℓ0, r0, s0 can be com- puted from n, T, Q, σ in time poly(log(T),log(n),log(Q), σ). Proof. The Turing machine SuccinctDecider begins by running the algorithm in Proposition 10.6 on input ( , n, T, Q, σ, x, y) to produce a circuit 3SAT on 3r0 + 3 inputs, where r0 = O(log(T)+ log(σ)) is the D C ℓ parameter m from the proposition. Set ℓ = log(2T) , L = 2 0 and R = 2r0 . Given this, SuccinctDecider 0 ⌈ ⌉ returns the circuit with inputs i , i 0,1,..., L 1 , i , i , i 0,1,..., R 1 , o , o ,..., o C 1 2 ∈ { − } 3 4 5 ∈ { − } 1 2 5 ∈ 0, 1 , and { } (i , i ,..., i , o , o ,..., o )= 1, C 1 2 5 1 2 5 if one of the following conditions hold.

(i , i , i , o , o , o )= 1, C3SAT 3 4 5 3 4 5 (i < 2T) (i = i ) (o = o ) , 1 ∧ 1 3 ∧ 1 6 3 (i < 2T) (i = i 2T) (o = o ) , 2 ∧ 2 3 − ∧ 2 6 3 (i 2T) (i is odd) (o = 1) , 1 ≥ ∧ 1 ∧ 1 (i 2T) (i is even) (o = 0) , 1 ≥ ∧ 1 ∧ 1 (i 2T) (i is odd) (o = 1) , 2 ≥ ∧ 2 ∧ 2 (i 2T) (i is even) (o = 0) , 2 ≥ ∧ 2 ∧ 2 (i = i ) (o = o ) , 3 4 ∧ 3 6 4 (i = i ) (o = o ) . 4 5 ∧ 4 6 5

It is not hard to verify that testing “(i1 < 2T)” can be done with O(ℓ0) AND and OR gates, and testing (i2 = i3 2T) can be done with O(m) AND and OR gates. Using similar estimates for the remaining sub-circuits,− we compute

size( )= size( )+ O(r + ℓ )= size( )+ O(r ) C C3SAT 0 0 C3SAT 0 poly(log(T),log(n), Q, σ)+ O(log(T)+ log(σ)) . ≤ In addition, due to the simplicity of these modifications, we conclude that the runtime of SuccinctDecider is dominated by the runtime of the algorithm from Proposition 10.6, which is poly(log(T),log(n), Q, σ).

125 Now we show that succinctly describes on inputs n, x, and y and time T. To begin, we describe C D the decoupled 5SAT formula ϕ . Let us first consider the constraints in ϕ which are implied by the final constraint, i.e. those of the formC C

o1 o2 o3 o4 o5 a b (w1) (w2) (w3) i1 ∨ i2 ∨ i3 ∨ i4 ∨ i5 whenever i = i and o = o . For any fixed i , i , i , the negations o , o , o can take any values, and 4 5 4 6 5 1 2 3 1 2 3 as a result, the first three bits in the constraint vary over all assignments in 0, 1 3. This means that these o4 o5 { } constraints are satisfied if and only if (w2) (w3) is satisfied whenever i4 = i5 and o4 = o5. This, in i4 ∨ i5 6 turn, is equivalent to the constraint w3 = w4. Carrying out similar arguments for the entire circuit, we can express the formula ϕ as follows. C

ϕ (a, b, w1, w2, w3)= ϕ3SAT(w1, w2, w3) (w1,1 = a1) (w1,2 = b1) C ∧ ∧ L/2 T L/2 T (a =(10) − ) (b =(10) − ) (w = w ) (w = w ). ∧ 2 ∧ 2 ∧ 1 2 ∧ 2 3

Here, we write ϕ3SAT(w1, w2, w3) for the formula in which, for each constraint in ϕ3SAT, the first variable is taken from w1, the second from w2, and the third from w3. In addition, we write a = (a1, a2), where a is the first 2T bits in a and a is the remaining L 2T bits, and similarly for b = (b , b ). We also 1 2 − 1 2 write w1 = (w1,1, w1,2, w1,3), where w1,1 contains the first 2T bits in w1, w1,2 contains the second 2T bits, and w1,3 contains the remaining R 4T bits. As a result, ϕ is satisfied only if w1 = w2 = w3 = R −4T C (a1, b1, c) for some string c 0, 1 − . In this case, calling w = (a1, b1, c), ϕ is satisfied only if ∈ { } C ϕ3SAT(w) is. By Proposition 10.6, this implies that there exists string aprefix of length ℓa T such that T ℓ ≤ a = encΓ(a , a ). This, in turn, implies that 1 prefix ⊔ − T ℓ L 2T L/2 ℓ a =(a , a )=(encΓ(a , − a ), (10) − )= encΓ(a , − a ) , 1 2 prefix ⊔ prefix ⊔ using the fact that encΓ( ) = 10, and similarly for b. Finally, Proposition 10.6 implies that accepts ⊔ D (n, x, y, aprefix, bprefix) in time T. This completes the proof. As stated above, moving from 3SAT to 5SAT allows us to devote the first two inputs to a and b. In addition, we have added extra constraints into ϕ which enforce that w1 = w2 = w3, which means that we can relax this assumption on these assignments. C We now show a simple transformation that takes in a succinct circuit and outputs another succinct C circuit whose 5 inputs are “padded” to contain more input bits. C′ Proposition 10.16 (Padding). Let be a circuit of size s with two inputs of length ℓ, three inputs of length r, C and 5 single-bit inputs. Suppose succinctly describes on inputs n, x, and y and time T. Then there is an C D algorithm which takes as input ( , ℓ, r, ℓ , r ), with ℓ ℓ and r r, and in time poly(s, ℓ , r ) outputs a C ′ ′ ′ ≥ ′ ≥ ′ ′ circuit with the following properties. First, has two inputs of length ℓ , three inputs of length r , and 5 C′ C′ ′ ′ single-bit inputs, and its size is s + poly(ℓ , r ). Second, it succinctly describes on inputs n, x, and y and ′ ′ D time T. ℓ ℓ Proof. Write L = 2 , R = 2r and L = 2 ′ , R = 2r′ . The algorithm constructs the circuit which on ′ ′ C′ inputs i , i 0,..., L 1 , i , i , i 0,..., R 1 , and o ,..., o 0, 1 , outputs 1 if and only if 1 2 ∈ { ′ − } 3 4 5 ∈ { ′ − } 1 5 ∈ { }

126 one of the following conditions hold:

(i , i < L) (i , i , i < R) (i , i , i , i , i , o , o , o , o , o )= 1, 1 2 ∧ 3 4 5 ∧C 1 2 3 4 5 1 2 3 4 5 (i L) (i is odd) (o = 1) , 1 ≥ ∧ 1 ∧ 1 (i L) (i is even) (o = 0) , 1 ≥ ∧ 1 ∧ 1 (i L) (i is odd) (o = 1) , 2 ≥ ∧ 2 ∧ 2 (i L) (i is even) (o = 0) . 2 ≥ ∧ 2 ∧ 2 Now we show that succinctly describes on inputs n, x, and y and time T. To begin, we describe the C′ D decoupled 5SAT formula ϕ . The second and third constraints imply that for each i1 L, ϕ contains the C′ ≥ C′ constraint (a ) if i is odd and ( a ) if i is even. Likewise, the fourth and fifth constraints imply that for i1 1 ¬ i1 1 each i2 L, ϕ contains the constraint (bi ) if i2 is odd and ( ai ) if i2 is even. Thus, we can express the ≥ C′ 2 ¬ 2 formula ϕ as follows. C′

(L′ L)/2 (L′ L)/2 ϕ (a, b, w1, w2, w3)= ϕ (a1, b1, w1,1, w2,1, w3,1) (a2 =(10) − ) (b2 =(10) − ). (127) C′ C ∧ ∧ Here, we write a =(a , a ) and b =(b , b ), where a , b have length L and a , b have length L L, and 1 2 1 2 1 1 2 2 ′ − for each i 1, 2, 3 , we write w =(w , w ), where w has length R and w has length R R. ∈ { } i i,1 i,2 i,1 i,2 ′ − Now, suppose there exist w1, w2, w3 such that a, b, w1, w2, w3 satisfy ϕ . Then because succinctly C′ C represents , there exists aprefix, bprefix of lengths ℓa, ℓb T such that accepts (n, x, y, aprefix, bprefix). In addition, D ≤ D L/2 ℓ a = encΓ(a , − a ), 1 prefix ⊔ and likewise for b1. Equation (127) then implies that

L/2 ℓ (L L)/2 a =(a , a )=(encΓ(a , − a ), (10) ′− ) 1 2 prefix ⊔ L/2 ℓ (L L)/2 =(encΓ(a , − a ),encΓ( ) ′− ) prefix ⊔ ⊔ L /2 ℓ = encΓ(a , ′ − a ), prefix ⊔ where the third step used the fact that encΓ( ) = 10. As a similar statement holds for b, this establishes ⊔ that succinctly describes on inputs n, x, and y and time T. C′ D By combining Proposition 10.15 and Proposition 10.16 (choosing ℓ′ = r′ = r0 where r0 is from Proposition 10.15), we get the following:

Proposition 10.17 (Explicit padded succinct descriptions). There is a Turing machine PaddedSuccinctDecider with the following properties. Let be a decider, let n, T, Q, and σ be integers with Q T and σ, D ≤ |D| ≤ and let x and y be strings of length at most Q. Then on input ( , n, T, Q, σ, x, y), PaddedSuccinctDecider D outputs a circuit with five inputs of length m(T, σ) and five single-bit inputs which succinctly describes C D on inputs n, x, and y and time T. Moreover, the following hold.

1. m(T, σ)= O(log(T)+ log(σ)) and 2m 2T. ≥ 2. has at most s(n, T, Q, σ)= poly(log(T),log(n), Q, σ) gates, C 3. PaddedSuccinctDecider runs in time poly(log(T),log(n), Q, σ), and the parameters m(T, σ), s(n, T, Q, σ) can be computed from n, T, Q, σ in time poly(log(T),log(n),log(Q), σ).

127 10.4 A PCP for normal form deciders We give a probabilistically checkable proof (PCP) for the Bounded Halting problem specialized to the case of normal form deciders. Our PCP will use standard techniques from the algebraic, low-degree-code-based PCP literature. In particular, we slightly modify the PCP for Succinct-3SAT described in [NW19, Section 11] (which itself is based on the proof of the PCP theorem in [Har04]) to apply it to the decoupled Succinct- 5SAT instances described in Section 10.3. We follow their treatment closely. As the PCPs constructed in this section are only an intermediate object towards the normal form verifier introduced in the next section we do not include standard definitions on PCPs, and refer to these references (in particular [Har04]) for background. We begin with some preliminaries.

10.4.1 Preliminaries A key part of the PCP will be to design a function f : Fn F which is zero on the subcube H = → subcube 0, 1 n. Our next proposition shows that given such a function f , there is a way of writing it so that the fact { } that it is zero on Hsubcube is self-evidently true. Doing so involves showing that f can be written in a simple basis of polynomials which are constructed to be zero on the subcube. This fact is standard in the literature (see, for example, [BSS08, Lemma 4.11]), and we include its proof for completeness.

Proposition 10.18 (Polynomial basis of zero functions). Let F be a field. Define zero : F F as the → univariate polynomial x x(1 x). Define H = 0, 1 n. Suppose f : Fn F is an individual 7→ − subcube { } → degree d polynomial such that f (x) = 0 for all x H . Then there exist polynomials c ,..., c : ∈ subcube 1 n Fn F such that for all x Fn, → ∈ n f (x)= ∑ ci(x) zero(xi) . i=1 · In addition, for each i 1,..., n , c has individual degree-d. ∈ { } i Proof. To prove this, we first prove the following statement for each k 0,1,..., n : there exists an ∈ { } individual degree-d polynomial r : Fn F and polynomials c ,..., c : Fn F such that k → 1 k → k f (x)= ∑ ci(x) zero(xi)+ rk(x) . (128) i=1 ·

In addition, for each i 1,..., k , c has individual degree d and the degree of x in r is at most 1. ∈ { } i i k The proof is by induction on k, the base case of k = 0 being trivial. Now, we perform the induction step. Assuming that Equation (128) holds for k, we will show that it holds for k + 1 as well. Let rk be the polynomial guaranteed by the inductive hypothesis. We now divide rk by zero(xk+1) using polynomial division. This guarantees a polynomial ck+1 and an individual degree-d polynomial rk+1(x) such that

r (x)= c (x) zero(x )+ r (x) . k k+1 · k+1 k+1 In addition, c still has individual degree d (and in fact the degree of x in c is at most d 2). k+1 k+1 k+1 − Furthermore, for each i 1,..., k + 1 , rk+1 has degree at most 1 in xi. Plugging this into Equation (128), we see that ∈ { } k+1 f (x)= ∑ ci(x) zero(xi)+ rk+1(x) . (129) i=1 · This completes the induction.

128 Applying the k = n case of Equation (128), we see that

n f (x)= ∑ ci(x) zero(xi)+ r(x) , (130) i=1 · where the degree of each variables in r is at most 1. Since f (x) vanishes on the subcube Hsubcube, it must also be that r(x) vanishes on Hsubcube as well. We claim that r must therefore be the zero polynomial. We prove this by showing the following statement: for every polynomial s(x1,..., xn) with individ- ual degree 1 (i.e. a multilinear polynomial) that vanishes on all points x 0, 1 n, s must be the zero polynomial. We show this by induction on the number of variables. ∈ { } Consider the base case n = 1. Then s is a univariate polynomial with degree at most 1. However it is zero on two points, so therefore it must be the zero polynomial. Now for the inductive step: assuming the proposition holds for some n 1, we show that it holds for n + 1 as well. For any multilinear polynomials, we can write ≥ s(x1,..., xn+1)= xn+1s1(x1,..., xn)+ s2(x1,..., xn) where both s1 and s2 are n-variate multilinear polynomials. Fix xn+1 = 0. Then s(x1,..., xn, 0) = s (x ,..., x ). Since s vanishes on 0, 1 n+1, this implies that s vanishes on 0, 1 n as well. By the 2 1 n { } 2 { } inductive hypothesis, s2 is the identically zero polynomial. Now fix xn+1 = 1. Then s(x1,..., xn, 1) = s (x ,..., x ). Again, since s vanishes on 0, 1 n+1, this implies that s vanishes on 0, 1 n as well. Again 1 1 n { } 1 { } by the inductive hypothesis, s1 is the identically zero polynomial. This shows that s is the identically zero polynomial, completing the induction. Thus, r is the zero polynomial. Applying this fact to Equation (130), we arrive at the statement in the proposition.

10.4.2 ThePCP The problem. The input to the PCP verifier is a tuple ( , n, T, Q, σ, γ, x, y). Here, is a decider, n, T, D D Q, σ and γ are integers with Q T and σ, and x and y are a pair of strings of length at most Q each. ≤ |D| ≤ The goal of the verifier is to check whether there exists two strings aprefix and bprefix of length at most T such that halts on input (n, x, y, aprefix, bprefix) in time at most T. To do that the verifier makes random queries toD a specially encoded PCP proof Π, and decides whether to accept or reject based on the parts of Π that it reads. We first set the parameters used in the PCP construction.

Definition 10.19 (Parameters for the PCP). For all integers n, T, Q, σ, γ N such that Q T and ∈ ≤ |D| ≤ σ define the tuple pcpparams(n, T, Q, σ, γ)=(q, m, d, m′, s) as follows. Let m = m(T, σ), and s = s(n, t, Q, σ) be as in Proposition 10.17. Let a′ > 1 and 0 < b′ < 1 be the universal constants from Lemma 7.8. Define the following integers.

1. Let m′ = 5m + 5 + s. 2. Let q = 2k where k is the smallest odd integer satisfying the following:

(a) k (γb + 3a )/b log s. ≥ ′ ′ ′ ·  k  (b) (2 + 5k)m′/2 < 1/2. (c) km /2k s b′γ. ′ ≤ − 3. Let d = k.

129 Given n, T, Q, σ, γ represented in binary, the parameter tuple pcpparams(n, T, Q, σ, γ) can be computed in time poly(log(n),log(T),log(Q),log(γ),log(σ)). Next, we define the format of a valid PCP proof, which for our construction consists of evaluation tables of low-degree polynomials.

Definition 10.20. Given n, T, Q, σ, γ N and (q, m, d, m , s) = pcpparams(n, T, Q, σ, γ), a low-degree ∈ ′ PCP proof is a tuple Π of evaluation tables of polynomials Fm F and Fm′ g1,..., g5 : q q c0,..., cm′ : q F with all polynomials having individual degree at most . We divide the →input variables of → q d m′ c0,..., cm′ into blocks as follows: Fm′ z =( x ,..., x , o , w ) . q ∋ 1 5 Fm Fm F5 Fs q q q q |{z} |{z} Definition 10.21. Given a low-degree PCP|{z} proof Π and|{z} a point z = (x ,..., x , o, w) Fm′ , where 1 5 ∈ q x ,..., x Fm, o F5, and w Fs , the evaluation of Π at z is given by 1 5 ∈ q ∈ q ∈ q eval (Π)=(α ,..., α , β ,..., β ) F6+m′ , z 1 5 0 m′ ∈ q where αi = gi(xi) and βj = cj(z).

Theorem 10.22. There exists a Turing machine AR with the following properties. M 1. (Input format) The input to AR consists of two parts: a “decider specification” and a “PCP view.” M (a) (Decider specification) Let be a decider, n, T, Q, σ and γ be integers with Q T, and σ = D ≤ , and let x and y be strings of length at most Q. Let (q, m, d, m′, s)= pcpparams(n, T, Q, σ, γ) be|D| as in Definition 10.19. Then the corresponding decider specification is the tuple

( , n, T, Q, σ, γ, x, y). D (b) (PCP view) Let z Fm′ and let Ξ F6+m′ . Then the PCP view is the pair (z, Ξ). ∈ q ∈ q The Turing machine AR returns either 1 (accept) or 0 (reject). M 2. (Complexity): AR runs in time at most poly(log(T),log(n), Q, σ, γ). M For the remaining items, fix a decider specification ( , n, T, Q, σ, q, x, y); we think of AR as a function of the PCP view input only. D M

2. (Completeness): Suppose a , b 0, 1 are two strings of length ℓ , ℓ T, respectively, prefix prefix ∈ { }∗ a b ≤ such that halts in time T on input (n, x, y, a , b ). Write D prefix prefix M/2 ℓ M/2 ℓ a = encΓ(a , − a ) and b = encΓ(b , − b ) , prefix ⊔ prefix ⊔ where recall that by choice 2T M and is a special symbol that is encoded using two bits; thus ≤ ⊔ a and b are each an M-bit string. Then there exists a low-degree PCP proof (Definition 10.20) Π = with and , the low-degree encodings of and , respectively (g1,..., g5, c0,..., cm′ ) g1 = ga g2 = gb a b m (see Section 3.4), which causes AR to accept with probability 1 over a uniformly random z F ′ : M ∈ q

Pr AR(z, evalz(Π)) = 1 = 1. Fm z q ′ M ∈  130 3. (Soundness): Let Π be a low-degree PCP proof such that at a = (g1,..., g5, c0,..., cm′ ) AR 1 M uniformly random z accepts with probability larger than psound = 2 :

Pr AR(z, evalz(Π)) = 1 > psound . Fm z q ′ M ∈  Then there exist strings a, b 0, 1 M with the following properties. ∈ { } (a) There exist strings a , b 0, 1 of length ℓ , ℓ , respectively, such that prefix prefix ∈ { }∗ a b M/2 ℓ M/2 ℓ a = encΓ(a , − a ) and b = encΓ(b , − b ) . prefix ⊔ prefix ⊔ (b) halts in time T on input (n, x, y, a , b ). D prefix prefix (c) a = Dec(g1) and b = Dec(g2), where Dec( ) is the Boolean decoding map of the low-degree encoding defined in Section 3.4. ·

Proof of Theorem 10.22. We present the description of the Turing machine AR in Figure 12. M

The Turing machine AR takes the following as input: M Decider specification ( , n, T, Q, σ, γ, x, y) • D PCP view (z, Ξ) • The Turing machine performs the following steps sequentially:

1. Compute = PaddedSuccinctDecider( , n, T, Q, σ, x, y). The circuit has five m-bit C D C inputs and five single-bit inputs, and contains at most s AND and OR gates.

2. Compute the Tseitin formula corresponding to , which is a boolean formula on m′ = F Fm′ F C 5m + 5 + s variables. Let arith : q q denote the arithmetization of . (See [NW19, Section 3.8] for a definitionF of the→ Tseitin transformation and its arithmetization.)F

3. Parse z = (x, o, w) Fm′ where x ,..., x Fm, o F5, and w Fs . Parse ∈ q 1 5 ∈ q ∈ q ∈ q Ξ =(α ,..., α , β ,..., β ) F6+m′ . 1 5 0 m′ ∈ q 4. (Formula test) Reject if β0 = arith(x, o, w) (α1 o1) (α5 o5). Otherwise, con- tinue. 6 F · − · · · −

5. (Zero on subcube test) Reject if β = ∑m′ β zero(z ). Otherwise, accept. 0 6 i=1 i · i

Figure 12: The decision procedure AR. M

Before establishing the Complexity, Completeness, and Soundness properties of AR, we point out M several important items. Let M = 2m.

1. We recall what it means for the circuit computed in Step 1 of Figure 12 to succinctly describe C M the 5SAT formula ϕ . Let the inputs of the formula ϕ be strings a, b, u3, u4, u5 0, 1 . Then C C ∈ { }

131 m 5 for all x1,..., x5 0, 1 and o 0, 1 , we have that (x1,..., x5, o) = 1 if and only if o1 o2 o3 ∈o4 { }o5 ∈ { } C ax bx u u u is a clause in ϕ , where the coordinates of a, b, u3, u4, u5 are indexed 1 ∨ 2 ∨ 3,x3 ∨ 4,x4 ∨ 5,x5 C by strings of length m. Furthermore, the formula ϕ is related to the decider in the following way. For all a, b 0, 1 M, C M D ∈ { } there exist u3, u4, u5 0, 1 such that a, b, u3, u4, u5 satisfy ϕ if and only if there exist aprefix, bprefiix ∈ { } C ∈ 0, 1 of lengths ℓ , ℓ T, respectively, such that { }∗ a b ≤ M/2 ℓ M/2 ℓ a = encΓ(a , − a ) and b = encΓ(b , − b ) prefix ⊔ prefix ⊔ and accepts (n, x, y, a , b ) in time T. D prefix prefix 2. The boolean formula is related to in the following way. For all x ,..., x 0, 1 m and o F C 1 5 ∈ { } ∈ 0, 1 5, (x ,..., x , o)= 1 if and only if there exists a w 0, 1 s such that (x ,..., x , o, w)= { } C 1 5 ∈ { } F 1 5 1. The size of the formula is linear in the size of the circuit , which is s. F C 3. The arithmetization := arith ( ) is a function : Fm′ F such that Farith q F Farith q → q (x, o, w) 0, 1 m′ , (x, o, w)= (x, o, w) . (131) ∀ ∈ { } Farith F By [NW19, Proposition 3.29], is an individual degree 2 polynomial32. Farith

Complexity. We bound the complexity of the Turing machine AR. From Proposition 10.17, Step 1 takes M time poly(log(T),log(n), Q, σ). Computing the Tseitin formula in Step 2 takes time that is polynomial in the size of the circuit , which is poly(s) = poly(log(T),log(n), Q, σ). The Formula Test, Step 4, C requires computing the arithmetization at a point z Fm′ , which takes time poly(s,log q). The Farith ∈ q Zero on Subcube Test, Step 5, takes time poly(m′,log q) to compute a sum of m′ products of Fq elements (evaluating zero( ) takes time poly(log q)). Thus the complexity of AR is bounded by · M poly(log(T),log(n), Q, σ, γ) using our choice of pcpparams from Definition 10.19.

Completeness. We now establish the Completeness property of AR by concocting a PCP proof Π that M is accepted with probability 1. Let a, b 0, 1 M be as specified in Item 3 of Theorem 10.22. Then by definition of the formula ϕ , ∈ { } M C there exist strings u3, u4, u5 0, 1 such that (a, b, u3, u4, u5) satisfies ϕ . ∈ { } C Let g1, g2 denote the low-degree encodings of a and b, respectively (see Section 3.4 for definition of low- degree encoding). In particular, g ,..., g are m-variate multilinear polynomials, and for all x ,..., x 1 5 1 5 ∈ 0, 1 m, { }

g1(x1)= ax1 , g2(x2)= bx2 , g3(x3)= u3,x3 , g4(x4)= u4,x4 , g5(x5)= u5,x5 , where we index the coordinates of a, b, u3, u4, u5 by m-bit strings in the natural way.

32 Proposition 3.29 in [NW19] only establishes that arith has total degree at most O(s), where s is the number of gates of the circuit . By inspecting the construction, however, oneF can see that every variable in occurs at most twice, and therefore the C F arithmetization has individual degree 2. Farith

132 Next, define the polynomial c : Fm′ F as 0 q → q c (x, o, w)= (x, o, w) (g (x ) o ) (g (x ) o ) . 0 Farith · 1 1 − 1 · · · 5 5 − 5 Note that c has individual degree 3, since as recalled in item 3 above (x, o, w) has individual degree 0 Farith 2 and each of the gi’s are multilinear). We now show that c (x, o, w) = 0 for all (x, o, w) 0, 1 m′ . By construction, the arithmetization 0 ∈ { } arith(x, o, w) is either 0 or 1. If it is 0, then we are done. If it is 1, then this means that (x, o, w) = 1, F o1 o2 o3 o4 oF5 which by definition means that (x, o)= 1, which means that ax bx u u u is a clause in C 1 ∨ 2 ∨ 3,x3 ∨ 4,x4 ∨ 5,x5 the formula ϕ . But since (a, b, u3, u4, u5) satisfies ϕ , this means that at least one of ax1 = o1, bx2 = o2, u = o , u C = o , or u = o . But this means thatC the product (g (x ) o ) (g (x ) o )= 0, 3,x3 3 4,x4 4 5,x5 5 1 1 − 1 · · · 5 5 − 5 by definition of the gi’s. Thus c0(x, o, w)= 0. Thus, by Proposition 10.18, there exist polynomials c ,..., c : Fm′ F of individual degree at most 1 m′ q → q m′ Fm 3 that certify that c0 vanishes on 0, 1 . In other words, for all z q (not just over the boolean cube), we have { } ∈ m′ c0(z)= ∑ ci(z) zero(zi) (132) i=1 · where zero(x)= x(1 x). Let the PCP proof Π− be the collection of polynomials , where the ’s act on (g1,..., g5, c0, c1,..., cm′ ) gi disjoint variables and the c ’s act on all of them. Let z Fm′ and let Ξ = (α ,..., α , β , β ,..., β ) = i ∈ q 1 5 0 1 m′ evalz(Π). The PCP view (z, Ξ) passes the Formula Test always, because β0 = c0(z) and αi = gi(z) for i 1, 2, . . . , 5 . The PCP view also passes the Zero on Subcube Test, because of Equation (132). ∈ { } Thus the PCP view is accepted by the Turing machine AR with probability 1. This establishes the Completeness property. M

Soundness. Fix a low-degree PCP proof Π such that the PCP view Ξ for = (g1,..., g5, c0,..., cm′ ) (z, ) Ξ = eval (Π) causes AR to accept with probability greater than p . This, in particular, implies that z M sound the Formula Test is satisfied with high probability. With probability greater than psound over the choice of z Fm′ , we have ∼ q c (z)= (z) (g (x ) o ) (g (x ) o ) 0 Farith · 1 1 − 1 · · · 5 5 − 5 Note that c0(z), by definition of low-degree PCP proof, has individual degree at most d. The right-hand side has individual degree at most 2 + 5d, because has individual degree 2 and the g ’s have in- Farith i dividual degree at most d. Thus both sides have total degree at most (2 + 5d)m′. Suppose c0(z) was not identical to (z) (g (x ) o ) (g (x ) o ). Then the Schwartz-Zippel lemma implies Farith · 1 1 − 1 · · · 5 5 − 5 that the probability they agree on a randomly chosen z is at most (2 + 5d)m′/q. By our choice of q, m′, and d in Definition 10.19, this is less than psound, which is a contradiction. Thus c0(z) is equal to (z) (g (x ) o ) (g (x ) o ) for all z Fm′ . Farith · 1 1 − 1 · · · 5 5 − 5 ∈ q Next, we consider the Zero on Subcube Test. With probability greater than psound over the choice of z Fm′ , we have ∼ q m′ c0(z)= ∑ ci(z) zero(zi). i=1 · Again, the polynomials on both sides of the equation have individual degree at most d + 2, and therefore total degree at most (2 + d)m′. By the Schwartz-Zippel lemma, the probability that both sides would agree on the evaluation of a random z, if they weren’t equal polynomials, would be at most (2 + d)m′/q, which

133 m′ Fm′ is also less than psound. Thus c0(z) = ∑i=1 ci(z) zero(zi) for all z q . This in particular implies that m · ∈ c0(z) vanishes on the subcube 0, 1 ′ . { } M We can now decode assignments a, b, u3, u4, u5 0, 1 that satisfy the formula ϕ . Let a = ∈ { } C Dec(g ), b = Dec(g ), u = Dec(g ), u = Dec(g ), u = Dec(g ) where Dec( ) is the decoding 1 2 3 3 4 4 5 5 · map defined in Section 3.4. The fact that c (z) vanishes on the subcube 0, 1 m′ implies that all clauses 0 { } of ϕ is satisfied by a, b, u3, u4, u5. By construction of ϕ , this implies that there exists aprefix, bprefix such C C that (n, x, y, aprefix, bprefix) accepts in time T. This completes the proof of the Soundness property, and the proofD of the Theorem.

10.5 A normal form verifier for the PCP In this section we show how to convert the PCP from Section 10.4 into a normal form verifier. This results in an “answer reduction” scheme: a way to map a verifier into a new (typed) verifier ˆ AR with a smaller answer size. V V Let =( , ) be a normal form verifier and (λ, µ, γ) a tuple of integers. In the rest of this section we V S D define the (typed) answer-reduced verifier ˆ AR =( ˆAR, ˆ AR) associated with and parameters (λ, µ, γ). Completeness, complexity and soundnessV of the constructioS D n are shown in the followingV sections.

10.5.1 Parameters and notation We establish some notation and parameters that are used throughout Section 10.5. First, define the functions

T(n)=(2λn)µ and Q(n)=(λn)µ . (133)

In the main theorem of this section, Theorem 10.24, we assume that the input verifier satisfies V TIME (n) T(n) and TIME (n) Q(n) . (134) D ≤ S ≤ In what follows we write T and Q as free parameters (though they are implicitly functions of the index n). Next, for all integers n N define the PCP parameters (q, m, d, m , s) = pcpparams(n, T, Q, σ, γ) ∈ ′ where σ = . Note that the parameters (q, m, d, m , s) are all implicitly functions of n. |D| ′ Recall from Definition 4.14 the notation µ denoting the distribution over pairs of questions (xA, xB) S generated by . We use µ ,A to indicate the marginal distribution of µ on the first question xA, and S S S SUPP(µ ) to indicate the set of question pairs that have nonzero probability under µ . S S Remark 10.23. Throughout Section 10.5, for convenience we often identify the label A with 1 and B with 2. For example, gA is another label for the polynomial g1.

10.5.2 The answer-reduced verifier Let =( , ) be a normal form verifier and (λ, µ, γ) integers. All other required parameters and notation are introducedV S D in Section 10.5.1.

Sampler. In this section we define the (typed) sampler ˆAR for the (typed) answer reduced verifier ˆ AR. We first give an intuitive description of the sampler,S and then proceed to give a formal definition.V The sampler ˆAR is a product of the oracularized sampler ˆORAC corresponding to , and a typed PCP sampler ˆPCP S S S . The corresponding question distribution µ ˆAR is then a product distribution µ ˆORAC µ ˆPCP . S S S × S

134 The PCP distribution µ ˆPCP corresponds to six copies of the classical low-degree test question distribu- tion (see Section 7.1.1). RecallS that the classical low-degree test is a nonlocal game that checks whether the players’ answers are consistent with low-degree polynomials. The answer-reduced verifier will use the classical low-degree test to check that the players respond with evaluations of low-degree polynomials g ,..., g , c ,..., c , and then process the evaluations using the PCP verifier AR from Theorem 10.22. 1 5 0 m′ M Five of the six copies are meant to certify the “low-degreeness” of g1,..., g5, and the sixth copy is used to certify the “low-degreeness” of the entire bundle of polynomials . The reason that the g1,..., g5, c0,..., cm′ gi’s are checked individually is to ensure that they only depend on certain blocks of input variables, because ultimately the soundness of the answer-reduced verifier reduces to the soundness of the PCP verifier AR, M and Theorem 10.22 assumes that the polynomials g1,..., g5 of the PCP proof only depend on certain subsets of variables. We now formally define the typed PCP sampler ˆPCP. The type set is S PCP = POINT ,..., POINT ALINE ,..., ALINE DLINE ,..., DLINE , T { 1 6} ∪ { 1 6} ∪ { 1 6} and the type graph GPCP =( PCP, EPCP) uses the complete edge set EPCP = PCP PCP. The ambient vector space for the sampler isT T ×T

5 PCP V = V V V VAUX X VAUX I VAUX V , (135) i,X ⊕ i,I ⊕ i,V ⊕ , ⊕ , ⊕ , Mi=1  Fm F where the spaces Vi,X, Vi,V are each isomorphic to q , the space Vi,I is isomorphic to q, the spaces F5+s F VAUX,X, VAUX,V are each isomorphic to q , and the space VAUX,I is isomorphic to q. In addition, de- fine the following direct sums:

5

V X = V VAUX X , 6, i,X ⊕ ,  Mi=1  5

V I = V VAUX I , 6, i,I ⊕ ,  Mi=1  5

V V = V VAUX V . 6, i,V ⊕ ,  Mi=1  For all i 1, 2, . . . , 6 , we call the space V the i-th point register, the space V the i-th coordinate regis- ∈ { } i,X i,I ter, and the space Vi,V the i-th direction register (compare this with the subspaces defined in Section 7.1.1). For every type t PCP, we define the following CL functions on VPCP: ∈T For the types t = POINT for i 1,...,6 , the corresponding 1-level CL function LPOINT is • i ∈ { } i identical to the CL function LPOINT defined in Equation (46), but acts on the subspace V V i,X ⊕ i,I ⊕ Vi,V.

For the types t = ALINE for i 1,...,6 , the corresponding 2-level CL function LALINE is • i ∈ { } i identical to the CL function LALINE defined in Equation (47), but acts on the subspaces V V i,X ⊕ i,I ⊕ Vi,V.

For the types t = DLINE for i 1,...,6 , the corresponding 3-level CL function LDLINE is • i ∈ { } i identical to the CL function LDLINE defined in Equation (49), but acts on the subspaces V V i,X ⊕ i,I ⊕ Vi,V.

135 Observe that for i 1,...,5 , the CL distributions µL OINT ,L and µL OINT ,L INE correspond to the ∈ { } P i ALINEi P i DL i question distributions of the classical low-degree test parameterized by q and m (see Section 7.1.1 for the definition of the low-degree test distributions, which only depend on the first two arguments of the 4-tuple ldparams), and the CL distributions µ and µ correspond to the classical low-degree LPOINT6 ,LALINE6 LPOINT6 ,LDLINE6 test parameterized by q and m′. We finally give the formal definition of the sampler ˆAR. The type set of ˆAR is AR = ORAC PCP S S T T ×T and the type graph is GAR = GORAC GPCP with edge set × AR ORAC PCP E = (u, v), (u′, v′) : u, u′ E and v, v′ E . { } { } ∈ { } ∈  Let VORAC denote the ambient space of the oracularized sampler ˆORAC. The ambient space of ˆAR is then ORAC PCP S AR S the direct product V V . For all type pairs (tORAC, tPCP) , the corresponding CL function ⊕ ∈ T ˆORAC LtORAC ,tPCP is simply the direct product (i.e. concatenation) of the CL functions LtORAC (coming from ) PCP AR S and Lt (coming from ˆ ). Thus one can see that the distribution corresponding to ˆ is the product PCP S S distribution µ ORAC µ PCP . ˆ × ˆ Note that SˆAR is anSmax ℓ, 3 -level sampler, where ℓ is the number of levels of the sampler . S { } S Decider. The decider ˆ AR is described in Fig. 13. D 10.5.3 Main theorem for answer reduction Theorem 10.24. There exists a polynomial time Turing machine AnsRedVerifier that, on input ( , λ, µ, γ) V where is an ℓ-level normal form verifier and λ, µ, γ N, outputs the description of the answer reduced V ∈ verifier AR = ( AR, AR). Let T(n) and Q(n) be the functions specified in Eq. (133), and suppose that V S D satisfies the complexity conditions specified in Eq. (134). Then AR is max ℓ + 2, 5 -level, has time complexityV bounds V { }

µ TIME AR (n)= O TIME ˆORAC (n)+ poly(log(T(n)), , γ) = poly (λn) , , γ , S |D| |D| S µ TIME AR (n)= poly log(T(n)), Q(n), , γ = poly (λn) , , γ ,  D |D| |D| and furthermore the sampler AR only depends on , the parameter tuple (λ, µ, γ), and the description length (but nothing else aboutS ). Moreover, thereS exists a function |D| D a aµ b µbγ δ(ε, n)= aγ (λ n) ε +(λ n)− ·|D|· · ·|D|· for universal constants a > 0, 0 < b < 1, such that the following hold for all n  2: ≥ 1. (Completeness) If has a projective, consistent, and commuting (PCC) strategy of value 1, then Vn AR has a symmetric PCC strategy with value 1. Vn AR 2. (Soundness) If val∗( ) > 1 ε for some ε > 0 then val∗( ) 1 δ(ε, n). Vn − Vn ≥ − 3. (Entanglement) Let E ( ) be as defined in Definition 5.13. Then for all ε 0, · ≥ E ( AR, 1 ε) E ( , 1 δ(ε, n)) . Vn − ≥ Vn − Proof. The Turing machine AnsRedVerifier can be described as follows:

136 Type QuestionFormat AnswerFormat

POINT for i = 1,...,5 y Fm α F i i ∈ q i ∈ q ALINE for i = 1,...,5 v Fm F h : F F i i ∈ q × q i q → q DLINE for i = 1,...,5 v Fm F Fm h : F F i i ∈ q × q × q i q → q POINT z =(y, o, w) Fm′ (α ,..., α , β ,..., β ) Fm′+6 6 ∈ q 1′ 5′ 0 m′ ∈ q ALINE v Fm′ F (h ,..., h , f ,..., f ) : F Fm′+6 6 q q 1′ 5′ 0 m′ q q ∈ m × m → m +6 DLINE v F ′ Fq F ′ (h ,..., h , f ,..., f ) : Fq F ′ 6 ∈ q × × q 1′ 5′ 0 m′ → q Table 1: Question and answer formats for types in PCP. T AR On input (n, tA, xA, tB, xB, aA, aB), the decider ˆ parses tA, tB as (tQ,A, tΠ,A), and (tQ,B, tΠ,B) ORAC PCP D respectively in , parses xA and xB as (xQ,A, xΠ,A) and (xQ,B, xΠ,B) respectively. The decider performsT the following× T steps sequentially, for all w A, B : ∈ { } 1. (Global consistency check): If tA = tB, reject if aA = aB. 6 2. (Input consistency check): If t = ORACLE and t = v A, B , and if (tΠ , tΠ ) = Q,w Q,w ∈ { } ,w ,w (POINT , POINT ), reject if α = α (where A 1 and B 2, as per Remark 10.23). 6 v v 6 v′ ↔ ↔ 3. (Input low degree test) If tQ,w = tQ,w = v A, B , and if (tΠ,w, tΠ,w) = ∈ { } LD (POINT , ALINE ) (resp. DLINE instead of ALINE ), execute on input v v v v Dldparams (POINT, xΠ,w, ALINE, xΠ,w, aw, aw) (resp. DLINE instead of ALINE), where ldparams = (q, m, d,1). Reject if LD rejects. Dldparams 4. (Proof encoding checks): If tQ,w = tQ,w = ORACLE,

(a) (Consistency test) If (tΠ,w, tΠ,w)=(POINTi, POINT6) for some i 3,...,5 , reject if α = α . ∈ { } i 6 i′ (b) (Individual low degree test) If (tΠ,w, tΠ,w)=(POINTi, ALINEi) (resp. DLINEi) for some LD i 3,...,5 , execute on input (POINT, xΠ , ALINE, xΠ , a , a ) (resp. ∈ { } Dldparams ,w ,w w w DLINE). Reject if LD rejects. Dldparams (c) (Simultaneous low degree test) If (tΠ,w, tΠ,w) = (POINT6, ALINE6) (resp. DLINE6), LD execute on input (POINT, xΠ,w, ALINE, xΠ,w, aw, aw) (resp. DLINE), where Dldparams′ LD ldparams′ =(q, m′, d, m′ + 6). Reject if rejects. Dldparams′ v 5. (Game check): If t = ORACLE, then for v A, B , compute x = L (x ). If tΠ = Q,w ∈ { } w,v Q,w ,w POINT , reject if AR(( , n, T, Q, γ, x A, x B), (z, a )) rejects. Otherwise, accept. 6 M D w, w, w

Figure 13: The decision procedure ˆ AR. Parameters T, Q, q, m, m , d, γ are defined in Section 10.5.1. D ′

1. From the description of , compute the description of the (typed) oracularized verifier ˆ ORAC = V V ( ˆORAC, ˆ ORAC) using the Turing machine OracleVerifier from Theorem 9.1. S D 2. From the description of the typed sampler ˆORAC and the parameter tuple (λ, µ, γ), compute the descriptions of the typed sampler ˆAR as describedS in Section 10.5.2. S

137 3. From the descriptions of the decider and the parameter tuple (λ, µ, γ), compute the descriptions of the typed decider ˆ AR as described inD Figure 13. D 4. Compute the detyped verifier AR =( AR, AR)= Detype( ˆ AR). V S D V The Turing machine AnsRedVerifier takes time that is poly( ,log λ,log µ,log γ), because each step, |V| including the detyping procedure, runs in time that is polynomial in the length of the input ( , λ, µ, γ). V Properties of the sampler AR. It can be verified via inspection that the typed sampler ˆAR depends on ˆORAC (which itself only dependsS on ; see Theorem 9.1), and ˆPCP (which only depends onS the parameter S S S tuple (λ, µ, γ) and the description length ). Using Theorem 9.1, ˆORAC is a ℓ-level typed sampler and, |D| S therefore, ˆAR is a max ℓ, 3 -level sampler as ˆPCP is a typed 3-level sampler. Using Lemma 6.18 for the S { } S detyping it follows that AR is a max ℓ + 2, 5 -level typed sampler. S { } Complexity. In addition to the running time of ˆORAC, the time complexity of ˆAR also includes the run- PCP S S ning time of ˆ , which is dominated by the complexity of computing the CL functions LPOINT , LALINE , S i i and LDLINE . From Lemma 7.9, this takes time poly(m ,log q)= poly(γ, s)= poly(log T,log n, Q, , γ)= i ′ |D| poly(log T, , γ), therefore the time complexity of ˆAR is |D| S

O (TIME ˆORAC (n)+ poly(log T(n), , γ)) . S |D| The complexity of AR = Detype( ˆAR) then follows by Lemma 6.18 and Equation (134). ARS S LD LD The decider ˆ executes subroutines and AR. The runtime of is poly(m, d, m ,log q). D D M D ′ The runtime of AR is given in Theorem 10.22; for T and Q as in (133) it is poly(log T,log n,log q, Q, ). M |D| In addition to theese subroutines, ˆ AR computes the CL functions LA and LB of the oracularized sam- pler ˆORAC in Step 5 of Figure 13. ByD Theorem 9.1, the complexity of ˆORAC is polynomial in the time S S complexity of , which by Equation (134) is at most Q(n). Thus the overallS time complexity of ˆ AR, using both the PCP parameter settings of Definition 10.19 D and the assumptions of Equation (134), is poly((λn)µ, , γ). The complexity of Detype( ˆ AR) follows by Lemma 6.18. |D| D

Completeness, Soundness, and Entanglement. The Completeness property is established in Section 10.6 and the Soundness and Entanglement properties are proven in Section 10.7.

10.6 Completeness of the answer-reduced verifier We establish the Completeness property of the answer reduced verifier as stated in Theorem 10.24.

Proof of the Completeness part of Theorem 10.24. We establish the Completeness property of ˆ AR, which by the Detyping Lemma (Lemma 6.18) establishes the Completeness property of AR. V V Let n 1 be an index for ˆ AR. Let S be a PCC strategy for with value 1. By Theorem 9.1 it ≥ V Vn follows that there exists a symmetric PCC strategy S ORAC using a state ψ with value 1 for ˆ ORAC. We | i Vn define a strategy S AR for the typed verifier ˆ AR as follows. The shared state is ψ . Given the index n Vn | i and (λ, µ, γ), each player can compute (q, m, d, m′, s) = pcpparams(n, T, Q, σ, γ) (see Definition 10.19), where T = T(n) and Q = Q(n) are as in (133), and σ = . Let M = 2m. |D|

138 1. On receipt of a question ((tQ, tΠ), (xQ, xΠ)) a player first measures their share of ψ using the ORAC | i projective measurement for S for the typed question (tQ, xQ) to obtain an outcome aQ. The player then computes an answer, depending on tQ, aQ and tΠ, xΠ, as follows:

(a) Suppose t = v A, B and tΠ POINT , ALINE (resp. DLINE instead of ALINE ). Q ∈ { } ∈ { v v} v v Let a′Q = aQ if aQ has length at most T, and let a′Q be the truncation of aQ to its first T symbols otherwise. Let ℓ T be the length of a , and set a ≤ ′Q M/2 ℓ a′′ = encΓ(a′ , − a ). Q Q ⊔ Next, the player computes the low-degree encoding g of a using the low-degree encoding aQ′′ ′′Q described in Section 3.4, in the same way as described in Section 10.4.2. The player then returns the restriction of g to the axis-parallel line (resp. the diagonal line) specified by xΠ. aQ′′ v (b) If tQ = ORACLE, for v A, B the player computes questions xv = L (xQ), as in Step 1 of ˆ ORAC ∈ { } . The player parses aQ as a pair (aA, aB). Let a′A = aA if aA has length at most T, and D ℓ let a′A be the truncation of aA to its first T symbols otherwise. Let A T be the length of a′A, and set ≤ M/2 ℓA a′′ = encΓ(a′ , − ). A A ⊔ Define similarly. The player computes a PCP proof Π as de- a′′B = (g1,..., g5, c0,..., cm′ ) scribed in the completeness case of Theorem 10.22 for the tuple ( , n, T, xA, xB), where the D polynomials g1, g2 are low-degree encodings of a′′A and a′′B, respectively.

i. If tΠ POINTi, ALINEi (resp. DLINEi instead of ALINEi) for i 1,...,5 , the player ∈ { } ∈ { Fm } returns the restriction of gi to the axis-parallel line (resp. diagonal line) of q specified by xΠ. ii. If tΠ POINT , ALINE (resp. DLINE instead of ALINE ), the player returns the ∈ { 6 6} 6 6 restriction of all the polynomials to the axis-parallel line of Fm′ (resp. g1,..., g5, c0,..., cm′ q the diagonal line) specified by xΠ. (c) In all other cases the player returns 0.

The strategy S AR is projective and consistent because S ORAC is. To show that it has value 1, we first observe that by definition it satisfies all consistency checks. Moreover, the strategy passes all low-degree tests with certainty because it always returns restrictions of consistent polynomials. Finally, it also passes the game check with probability 1. This follows from the completeness statement of the PCP made in Theorem 10.22 and the fact that, if accepts the input (n, xA, xB, aA, aB) in time at most T then it also D accepts (n, xA, xB, a′A, a′B) in time at most T, where a′A and a′B are obtained from aA and aB by truncating them to strings of length M if their lengths exceed M. To show that S AR is commuting, note that using the product structure of ˆAR every typed question pair S with positive probability consists of a pair of questions ((tQ,A, xQ,A), (tQ,B, xQ,B)) with positive probability ORAC ORAC for ˆ , together with an arbitrary pair ((tΠ , xΠ ), (tΠ , xΠ )). Using that S is commuting S ,A ,A ,B ,B and that the additional operations associated with ((tΠ,A, xΠ,A), (tΠ,B, xΠ,B)) amount to classical post- processing it follows that S AR is commuting. This establishes the existence of a symmetric PCC strategy for ˆ AR with value 1. By Lemma 6.18 it Vn follows that there exists a symmetric PCC strategy for AR = Detype( ˆ AR) with value 1. Vn V n

139 10.7 Soundness of the answer-reduced verifier

Proof of the soundness part of Theorem 10.24. We first show the soundness for the typed verifier ˆ AR. Soundness for the detyped verifier AR follows from Lemma 6.18, with a constant-factor loss usingV that the type set AR for ˆ AR has constantV size. T V ˆ AR > > We proceed in two steps. Fix an index n 1 and suppose that val∗( n ) 1 ε for some ε 0. Observe that ˆORAC and ˆPCP both sample distributions≥ that are invariantV under permuta− tion of the two players; therefore,S the sameS holds for ˆAR. Moreover, the decider ˆ AR treats both players symmetrically. Therefore, the game played by ˆ AR isS a symmetric game. ApplyingD Lemma 5.7 it follows that there exists Vn a symmetric projective strategy S =( ψ , M) for ˆ AR with value greater than 1 ε. | i Vn − We use the following shorthand notation. A pair of questions to the players is ((tA, xA′ ), (tB, xB′ )) where for w A, B , tw = (tQ,w, tΠ,w) and xw′ = (xQ′ ,w, xΠ′ ,w). When w is clear from context we omit it from ∈ { } A B the subscript. Fixing a w, whenever tQ = ORACLE we introduce xA = L (xQ) and xB = L (xQ′ ) and often write directly the player’s question as x =(xA, xB). Whenever t = v A, B we slightly abuse Q Q ∈ { } notation and write the question as xQ = (xv, v), explicitly including the type to clarify which player it points to. We denote the measurements used by both players in strategy S by M(x )xΠ , where for the sake of { Q a } clarity we have notationally separated the two parts xQ and xΠ of the question and omitted explicit mention tΠ,xΠ of the associated types tQ and tΠ (we include the type and write M(xQ)a when it is needed for clarity). First we show that the strategy S is close to a strategy S ′ that performs “low-degree” measurements: upon receipt of a typed question (t, x) = ((tQ, tΠ), (xQ, xΠ)) a player first performs a measurement depending on xQ to obtain a tuple of low-degree polynomials, and then returns evaluations of those polynomials on the subspaces (either axis-parallel or diagonal lines) specified by xΠ. This step of the argument uses the quantum soundness of the low-degree test performed in Stems 3 and 4 of Figure 13. Next, we “decode” ORAC this strategy to produce a strategy S ′′ for ˆ with a high value. This step makes use of the classical soundness of the underlying PCP shown inV Section 10.4. The conclusion of the theorem then follows from the soundness of ˆ ORAC (Theorem 9.3). We proceed with the details. We start by showingV a sequence of claims that establish approximations implied by the assumption that S AR succeeds with probability greater than 1 ε in the decision procedure implemented by the decider in Figure 13. −

Claim 10.25 (Global consistency check, Step 1). On average over questions (tA, xA)=((tQ, tΠ), (xQ, xΠ)) sampled from the marginal distribution of µ ˆAR on the first player it holds that S M(x )xΠ I I M(x )xΠ . (136) Q a ⊗ ≃ε ⊗ Q a

Proof. First we observe that the condition tA = tB for the global consistency check, Step 1 in Figure 13, holds with constant probability over the choice of a pair of questions (tA, xA), (tB, xB) sampled according to µ ˆAR . Thus S must succeed in this test with probability 1 O(ε), conditioned on the test being executed: thisS is because each of ˆORAC and ˆPCP have a constant probability− of returning a pair of questions of the same type. S S

Moreover, observe that conditioned on tA = tB a pair of questions ((tA, xA), (tA, xB)) µ ˆAR is ∼ S such that xA = xB = LtA (z), where z is the sampler seed and LtA the CL function of type tA associated with ˆAR. The claim then follows directly from the test and the definition of approximate consistency (DefinitionS 5.15).

140 Claim 10.26 (Input consistency check, Step 2). For all v A, B , on average over question pairs Fm′ ∈ { } (xA, xB) µ and z =(y1,..., y5, o, w) q sampled uniformly at random, ∼ S ∈

POINT ,z POINTv,yv M(xA, xB) 6 I I M(x , v) , (137) αv ⊗ ≃ε ⊗ v αv where as in Remark 10.23 we made the identification 1 A and 2 B. Moreover, an analogous relation holds for operators acting on opposite sides of the tensor↔ product. ↔

Proof. For w = A and fixed v A, B there is a constant probability that t = ORACLE, t = v, ∈ { } Q,w Q,w and (tΠ,w, tΠ,w)=(POINT6, POINTv). Therefore, the input consistency check in Step 2 is executed with constant probability, and S must pass it with probability 1 O(ε), conditioned on the test being executed. − Moreover, conditioned on tQ,w = ORACLE, tQ,w = v, and (tΠ,w, tΠ,w)=(POINT6, POINTv), the distribution of (xQ,w, xQ,w) is ((xA, xB), xv) for (xA, xB) µ and the distribution of (xΠ,w, xΠ,w) is ∼ S (z, y ) for a uniformly random z Fm′ . Eq. (137) then follows directly from the specification of the test v ∈ q and the definition of approximate consistency. The “moreover” part follows from the case w = B.

Claim 10.27 (Input low degree test, Step 3). For each v A, B and for each x in the support of the x,v∈ { } marginal of µ on player v there exists a measurement G PolyMeas(m, d, q) such that the following S { g } ∈ hold for some δ1 = O(δLD(O(ε), q, m, d, 1)), where δLD is defined in Lemma 7.8. For all v A, B , on Fm ∈ { } average over x chosen from the marginal of µ on player v and yv q sampled uniformly at random, S ∈ POINTv,yv x,v (138) M(x, v)α I δ1 I G[eval ( )=α] , ⊗ ≃ ⊗ yv · POINTv,yv x,v (139) I M(x, v)α δ1 G[eval ( )=α] I , ⊗ ≃ yv · ⊗ Gx,v I I Gx,v , (140) g ⊗ ≃δ1 ⊗ g where we used the notation evalyv (g)= g(yv) for the evaluation map. Proof. Fix v A, B . For any w A, B there is a constant probability that t = t = v ∈ { } ∈ { } Q,w Q,w and (tΠ,w, tΠ,w)=(POINTv, ALINEv) (resp. DLINEv). Therefore, the input low degree test in Step 3 of Figure 13 is executed with constant probability, and S must pass it with probability 1 O(ε), conditioned on the test being executed. − Observe that by definition the distribution of (xΠ,A, xΠ,B) conditioned on tQ,w = tQ,w = v, uniformly random xQ = (xv, v), and (tΠ,w, tΠ,w)=(POINTv, ALINEv) (resp. DLINEv), where w A, B is uniformly random, is exactly the distribution of questions in the game GLD described in Section∈ { 7.1.1} , parametrized by ldparams =(q, m, d, 1). For every v A, B and question x = Lv(z) in the support of the marginal distribution of µ on ∈ { } S player v let εx,v be the probability that S is accepted in Step 3, conditioned on the test being executed and on average over w A, B . Then E[ε ] = O(ε), where the expectation is taken over a uniformly ∈ { } x,v random v A, B and x = Lv(z) for uniformly random z. By definition∈ { it} follows that the strategy ˆAR conditioned on the first part of the players’ questions being S tQ,w = tQ,w = v and xQ,A = xQ,B = x is a projective strategy that succeeds with probability 1 εx,v in the low-degree test LD executed in Item 3. − Dldparams We may thus apply Lemma 7.8 to obtain Gx,v PolyMeas(m, d, q) such that (138), (139) and (140) { g } ∈ each hold with approximation error O(δLD(ε , q, m, d, 1)). Using that for fixed q, m, d the function ε x,v 7→ δLD(ε, q, m, d, 1) is concave, the claim follows from Jensen’s inequality.

141 Claim 10.28 (Proof encoding checks, Step 4). For each xQ = (xA, xB) in the support of µ there exist ( ) S measurements G xA,xB ,i PolyMeas(m, d, q) for each i 3, 4, 5 and { g } ∈ ∈ { }

(xA ,xB) J PolyMeas(m′, d, q, m′ + 6) f1,..., f5,c0,...,c { m′ } ∈ such that the following hold for some

δ2 = O δLD (O(ε), q, m, d, 1)+ δLD (O(ε), q, m′, d, m′ + 6) .  First, for all i 3, 4, 5 , on average over (xA, xB) µ and z = (y1,..., y5, o, w) of type POINT6 sampled uniformly∈ { at random,} ∼ S

OINT P i,yi POINT6,z I M(xA, xB) ε M(xA, xB) I . (141) ⊗ αi ≃ αi ⊗ Fm Second, for all i 3, 4, 5 and on average over (xA, xB) µ and yi q sampled uniformly at random, ∈ { } ∼ S ∈

POINTi,yi (xA,xB),i (142) M(xA, xB)α I δ2 I G[eval ( )=α] , ⊗ ≃ ⊗ yi · ( ) ( ) G xA,xB ,i I I G xA,xB ,i . (143) g ⊗ ≃δ2 ⊗ g

Fm′ Third, for all i 1,...,5 and j 0,..., m′ , on average over (xA, xB) µ and z q sampled uniformly at random,∈ { } ∈ { } ∼ S ∈

POINT6,z (xA ,xB) M(xA, xB) I δ I J , (144) α1,...,α5,β0,...,β 2 [evalz( )=(α1,...,α5,β0,...,β )] m′ ⊗ ≃ ⊗ · m′ ( ) ( ) J xA,xB I I J xA ,xB . (145) f1,..., f5,c0,...,c δ2 f1,..., f5,c0,...,c m′ ⊗ ≃ ⊗ m′ Moreover, analogous equations to (141), (142) and (144) hold with operators acting on opposite sides of the tensor product. Here, recall the definition of PolyMeas( , , ) from Definition 7.7. · · · Proof. The proof of the first item is similar to the proof of Claim 10.26, and we omit it. The proof of the second and third items is similar to the proof of Claim 10.27, and we include more de- tails. Fix an i 3, 4, 5 . For any w A, B there is a constant probability that t = t = ORACLE ∈ { } ∈ { } Q,w Q,w and (tΠ,w, tΠ,w)=(POINTi, ALINEi) (or DLINEi instead of ALINEi), in which case the individual low- degree test in Step 4b is executed. Therefore, S must succeed in that part of the test with probability 1 O(ε) conditioned on the test being executed. − Furthermore, for fixed i 3, 4, 5 and uniformly random w A, B , conditioned on the test being ∈ { } ∈ { } executed for that i and w the distribution of (xΠ,A, xΠ,B) is exactly the distribution of questions in the game GLD described in Section 7.1.1, parametrized by ldparams =(q, m, d, 1). For every i 3, 4, 5 and x = (xA, xB) in the support of µ let εx,i be the probability that S is ∈ { } S accepted in Step 4b, conditioned on the test being executed for that i and on average over w A, B . ∈ { } Then for each i, E[εx,i]= O(ε), where the expectation is taken over a uniformly random x µ . ∼ S By definition of the individual low-degree test it follows from Lemma 7.8 that for every x = (xA, xB) (xA,xB),i in the support of µ and i 3, 4, 5 there is a measurement Gg PolyMeas(m, d, q) such that S ∈ { } { } ∈ on average over y Fm′ of type POINT sampled uniformly at random, Eq. (142) and (143) both hold i ∈ q i with approximation error O(δLD(εx,i, q, m, d, 1)). Eq(142) and (143) follow using the concavity of δLD as a function of ε.

142 Finally we consider the simultaneous low-degree test, Step 4c. Here as well, using that there is a con- stant probability that tQ,w = tQ,w = ORACLE and (tΠ,w, tΠ,w)=(POINT6, ALINE6) (resp. DLINE6) it follows that S AR must succeed in that part of the test with probability 1 O(ε). Using a similar ar- − gument as before it follows from Lemma 7.8 (this time for parameters (q, m′, d, m′ + 6)) that for every (xA ,xB ) (xA, xB) there is a family of measurements J PolyMeas(m , d, q, m + 6) such that on f1,..., f5,c0,...,c ′ ′ { m′ } ∈ average over z Fm′ sampled uniformly at random, Eq. (144) and (145) both hold with approximation ∈ q error O(δLD(O(ε), q, m′, d, m′ + 6)).

x ,i x The families of measurements G Q and J Q , for x in the support of µ and i g f1,..., f5,c0,...,c Q { } { m′ } S ∈ 1,...,5 , whose existence follows from Claim 10.27 and Claim 10.28 have outcomes that are low (in- { } dividual) degree polynomials: for the first family, individual degree d polynomials g : Fm F , and for q → q the second, tuples of individual degree d polynomials f , c : Fm′ F . Recall that m = 5m + 5 + s and i j q → q ′ Fm′ F5m F5 Fs that an element z q is written as a triple (y, o, w) with x = (y1,..., y5) q , o q and w q. The following claim,∈ whose proof is based on Lemma 5.26, shows that we can∈ reduce to∈ a situation where∈ the polynomials f ,..., f returned by J are such that for each i 1,...,5 , f only depends on the y , 1 5 ∈ { } i i and not on the entire variable z. m Claim 10.29. For all (xA, xB) in the support of µ and individual degree d polynomials g ,..., g : F S 1 5 q → F and c ,..., c : Fm′ F define q 0 m′ q → q ( ) ( ) ( ) ( ) ( ) ΛxA,xB = xA,1 xB,2 xA,xB ,3 xA,xB ,5 xA ,xB xA,xB ,5 xA,xB ,3 xB,2 xA,1 (146) g1,...,g5,c0,...,c Gg1 Gg2 Gg3 Gg5 Jc0,...,cm Gg5 Gg3 Gg2 Gg1 , m′ · · · ′ · · · where the outcomes ( f1,..., f5) of the J operator in the middle have been marginalized over. Then there is a 1/2 1/2 m′d δ = O δLD (ε, q, m′, d, m′ + 6) + 3 q    such that δ max ε, δ , δ 3 ≥ 1 2 Fm′ and on average over (xA, xB) µ and z q sampled uniformly at random, ∼ S ∈ ΛxA,xB xA ,xB (147) [eval ( )=(α,β)] I δ3 I J[eval ( )=(α,β)] . z · ⊗ ≃ ⊗ z · Moreover, a similar equation holds with the operators acting on opposite sides of the tensor product.

Proof. We apply Lemma 5.26 with the following setting of parameters. The number of sets of functions k is set to 6. The question set is set to the support of µ , and the distribution µ on it is the distribution µ . X S S The sets for i 1,...,5 consist of individual degree d polynomials over Fm′ that depend only on the Gi ∈ { } q i-th block of m variables. The set 6 consists of (m′ + 1)-tuples of individual degree d polynomials over Fm′ G q . We first verify the assumption on the sets of functions. Since all polynomials have individual degree at most d (and therefore total degree at most m′d), by Lemma 3.22 the parameter ε in Lemma 5.26 can be set to m′d/q. The family of measurements Ax in Lemma 5.26 is the family of measurements J(xA ,xB) g1,...,g6 f1,..., f5,c0,...,c { } { m′ } here, where we set g = f for i 1,...,5 and g = (c ,..., c ). The measurements Gi,x in i i ∈ { } 6 0 m′ { g }

143 ( ) ( ) Lemma 5.26 are GxA,i for i 1, 2 , G xA,xB ,i for i 3, 4, 5 , and J xA ,xB for i = 6. To ensure { g } ∈ { } { g } ∈ { } { g } that all polynomials are defined over the same range, we treat g : Fm F that is an outcome of some q → q Gi,x as a polynomial g : Fm′ F , where the role of the m variables of g is taken by the i-th block of { g } ′ q → q m variables of g′. We verify that assumption (38) in the lemma holds with an error δ = O((max ε, δ , δ )1/2). For { 1 2} i 1, 2 the assumption follows by combining (137) and (139) with (144) and Fact 5.24. For i 3, 4, 5 ∈ { } ∈ { } we use (141) instead of (137) and (142) instead of (139). Finally, for i = 6 we use (145) and Fact 5.24. In these derivations, we use Fact 5.23, the triangle inequality for “ ”. ≃ 1/2 The conclusion follows from Lemma 5.26, with an error δ3 = O((δ + m′d/q) ). Observing that ε, δ1, δ2 = O(δLD(ε, q, m′, d, m′ + 6)), as can be verified from the definition of δLD given in Lemma 7.8, and 1/2 therefore δ = O(δLD(ε, q, m′, d, m′ + 6) ), the claimed bound 1/2 1/2 m′d δ = O δLD (ε, q, m′, d, m′ + 6) + 3 q    follows.

At this point, we have constructed measurements G and Λ that return low-degree polynomials in a ˆ AR similar way as is expected from the honest strategy in n , as described in the proof of Theorem 10.24. These measurements can be used to specify a new strategyV S for the game ˆ AR as follows. The shared ′ Vn state remains the state ψ used in S . For w A, B , upon reception of a question (t , x ) player w | i ∈ { } w w performs the following. If tw = (tQ,w, tΠ,w) is such that tQ,w = ORACLE, the player measures their share of using the measurement ΛxQ,w defined in Claim 10.29 to obtain a tuple of ψ (g1,..., g5, c0,..., cm′ ) polynomials.| i The player then answers exactly as in the strategy described in the “completeness” part of the proof of Theorem 10.24. Similarly, if t = v A, B the player first measures their share of ψ using Q,w ∈ { } | i the measurement GxQ,w,v from Lemma 10.27 to obtain a polynomial g as outcome; the player then answers according to the same honest strategy. Lemma 10.30. The strategy S succeeds with probability 1 O(δ ) in the game ˆ AR. ′ − 3 Vn Proof. First we establish useful consistency relations. By combining Equation (147) and Equation (144) Fm′ and applying Fact 5.24 we obtain that for all i 1,...,5 , on average over (xA, xB) µ and z q sampled uniformly at random, ∈ { } ∼ S ∈

(x ,x ) M(x , x )POINT6,z I I Λ A B , (148) A B αi δ3 [eval ( ) =α ] ⊗ ≃ ⊗ z · i i and a similar equation holds with the operators acting on opposite sides of the tensor product. Next, combin- ing (148) together with Equation (137) and Equation (138) it follows that for each v A, B , on average Fm′ ∈ { } over (xA, xB) µ and z =(y1,..., y5, o, w) q sampled uniformly at random, ∼ S ∈ xv,v Λ(xA,xB ) (149) G[eval ( )=α ] I δ3 I [eval ( ) =α ] . yv · v ⊗ ≃ ⊗ z · v v From the Schwartz-Zippel lemma (Lemma 3.22) it follows that the probability that any two distinct individ- xv,v (xA,xB) ual degree d polynomials gv (an outcome of G ) and gv′ (an outcome of Λ ) agree at a uniformly Fm random point yv q is at most m′d/q. It thus follows from (149) that for all v A, B , on average ∈ FM ∈ { } over (xA, xB) µ and yv q sampled uniformly at random, ∼ S ∈ ( ) Gxv,v I I Λ xA,xB . (150) gv ⊗ ≃δ3+m′d/q ⊗ gv 144 S ˆ AR We now show that ′ that is accepted by n with high probability. We bound the probability of succeeding in each subtest. V First note that the strategy is accepted in Step 1 of Figure 13. For the G measurements, consistency follows from (140). For the Λ measurements, note first that by (147) and (145) it follows that on average over (xA, xB) µ , ∼ S ΛxA,xB ΛxA,xB (151) [eval ( )=(α,β)] I δ3 I [eval ( )=(α,β)] . z · ⊗ ≃ ⊗ z · Using that all outcomes of ΛxA,xB are individual degree d polynomials and the Schwartz-Zippel lemma (Lemma 3.22) it follows that whenever a measurement of ΛxA,xB ΛxA,xB returns distinct outcomes, the ⊗ outcomes take a different value at a uniformly random z with probability at least 1 m d/q. It then follows − ′ from (151) and the fact that δ m d/q by definition that the strategy S is accepted in Step 1 with 3 ≥ ′ ′ probability 1 O(δ3). Next, the− strategy is also accepted in the consistency check performed in item 2 due to (150), and the consistency check in item 4(a) for the same reasons as for item 1. Finally, for the low-degree tests performed in item 3 and items 4(b) and 4(c), the strategy succeeds due to consistency and the fact that, as long as both players obtain the same polynomial outcomes, they pass the low-degree tests with probability 1. It remains to analyze the strategy’s success probability in Step 5, the game check. Note that by as- sumption the original strategy S succeeds with probability 1 O(ε) in that test. Using (147) and (144) − together with the consistency relations established at the start of the proof, it follows that S and S ′ generate outcomes aw in Step 5 that are within total variation distance O(δ3). The lemma fellows. We now complete the proof by a reduction to the game ORAC: from the strategy S we construct a Vn ′ symmetric strategy S = ( ψ , A) for ˆ ORAC by “decoding” the low-degree measurements G and Λ. The ′′ | i V state ψ in S ′′ is identical to the state used in S ′ (which is identical to the state used in S ). To begin, we | i Fm F define a decoding map ∆( ), which takes in a polynomial g : q q and outputs a string in 0, 1 ∗. This map is computed as follows:· → { }

First, compute a = Dec(g) 0, 1 M, where Dec is the (boolean) decoding of the low-degree code • defined in Section 3.4. ∈ { }

M/2 ℓ If there exists an a 0, 1 of length ℓ T such that a = encΓ(a , a ), then • prefix ∈ { }∗ a ≤ prefix ⊔ − ∆(g)= aprefix. Otherwise, ∆(g) is allowed to be arbitrary.

We can now define the “decoded” measurements Axv,v and AxA,xB as follows: { } { } Axv,v = Gxv,v , AxA,xB = ΛxA,xB . (152) a [∆( )=a] aA,aB [∆( )A B =(aA,aB )] · · , Lemma 10.31. The strategy S succeeds with probability 1 O(δ ) in the game ˆ ORAC. ′′ − 3 Vn Proof. We consider the different subtests executed by ˆ ORAC (see Figure 11). We start with Step 2, the D consistency checks. Success in the first check, Step 2a, follows from the success of S ′ in the global AR consistency check, Step 1 of ˆ , the definition (152), and the fact that conditioned on tQ,A = tQ,B = D AR ORAC ORACLE, the distribution of (x , x ) in ˆ is the same as the distribution of (xA, xB) in , Q,A Q,B Vn Vn conditioned on tA = tB = ORACLE. Similarly, success in the second check, Step 2b, follows from success AR of S ′ in the input consistency check, Step 2 of ˆ . ORACD Next we consider the game check of ˆ , Step 1. To analyze the success probability of S ′′ we use that S succeeds in the game check of Dˆ AR, Step 5, and the soundness of the PCP, as shown in Theo- ′ D rem 10.22. Let psound be as in Theorem 10.22.

145 Let A B , be in the support of , and Π an outcome of w , (xA, xB) µ = (g1,..., g5, c0,..., cm′ ) x ,x ∈ { } S AR Λ A B such that conditioned on that outcome being obtained by player w in the game check of ˆ , AR D M accepts the pair of inputs ( , n, T, Q, γ, xA, xB) and (z, a ) with probability at least p over the choice D w sound of a uniformly random z Fm′ and a = eval (Π). ∈ q w z For any such Π, the soundness statement of Theorem 10.22 states that there exist aA, aB 0, 1 such ∈ { }∗ that (n, xA, xB, aA, aB)= 1 and furthermore av = ∆(gv) for v A, B . It follows that for any proof Π D x ,x ∈ { } returned by Λ A B which is accepted with probability greater than p in the game check of ˆ AR it { Π } sound D holds that (n, xA, xB, ∆(gA), ∆(gB)) = 1. Using this observation we evaluate the probability q that the D ′′g strategy S succeeds in the game check of ˆ ORAC. Let q be the probability that S succeeds in the game ′′ V ′g ′ check of ˆ AR. D E (xA,xB ) q′′g = ∑ ψ AaA,aB I ψ (xA,xB) µ h | ⊗ | i aA,aB : (n,xA,xB,aA,aB )=1 ∼ S D E (xA,xB) = ∑ ψ ΛΠ I ψ (xA,xB) µ h | ⊗ | i Π: (n,xA,xB,∆(g ),∆(g ))=1 ∼ S D 1 2 E (xA,xB) ∑ ψ ΛΠ I ψ Pr [V(evalz(Π)) = 1] ≥ (xA,xB) µ h | ⊗ | i · z Fm′ Π: (n,xA,xB,∆(g ),∆(g ))=1 q ∼ S D 1 2 ∼ E (xA,xB ) = q′g ∑ ψ ΛΠ I ψ Pr [V(evalz(Π)) = 1] − (xA,xB) µ h | ⊗ | i · z Fm′ ∼ S Π: (n,xA,xB,∆(g1),∆(g2))=0 q D ∼ q′ (1 q′′) p . ≥ g − − g · sound Rearranging terms, q′g psound 1 q′g q′′ − = 1 − . (153) g ≥ 1 p − 1 p − sound − sound Altogether, using Lemma 10.30 we have shown that S is accepted in each subtest performed by ˆ ORAC ′′ D with probability at least 1 O(δ ). Since every subtest occurs with constant probability, the lemma follows. − 3

To conclude the proof of the soundness, we appeal to the soundness statement for ˆ ORAC, given in V Theorem 9.3. Now we establish the form of the error function δ as stated in the theorem. Let a′, b′ be the universal constants from Lemma 7.8 such that

a′ b′ b′ b′md δLD(ε, q, m, d, t)= a′(dmt) (ε + q− + 2− ). By our choice of PCP parameters in Definition 10.19, we have that m s, so therefore we can upper 3 ≤ bound the product dm′(m′ + 6) by cγs where c > 1 is a universal constant. Furthermore, we see from Definition 10.19 that 2dm q s(γb′+3a′)/b′ , so we have ≥ ≥ a′ 3a′ b′ 3a′ b′ δLD (ε, q, m′, d, m′ + 6) a′(cγ) (s ε + 2 s q− ) ≤ · · · a 3a b 3a (γb +3a ) 2a′(cγ) ′ (s ′ ε ′ + s ′ s− ′ ′ ) ≤ · · a a b γb a′′γ ′′ (s ′′ ε ′′ + s− ′′ ) (154) ≤ · where we set a = max 2a ca′ , 3a and b = b . From Proposition 10.17 and Eq. (134) we have that s is ′′ { ′ ′} ′′ ′ at most poly((λn)µ, σ), so there exists a universal constant C > 0 such that for all n 2, ≥ s C(λ σ n)Cµ. ≤ · · 146 Plugging this bound into Equation (154) and using the choice of q, d, m′ from Definition 10.19, we get that there exist universal constants a > 1 and 0 < b < 1 such that

1/2 1/2 m′d δ = O δLD(ε, q, m′, d, m′ + 6) + 3 q   a  µa b µbγ  aγ ((λ σ n) ε +(λ σ n)− ). ≤ · · · · · To show the entanglement bound, we observe that the strategy S for ˆ ORAC constructed above uses the ′′ V same entangled state ψ as the strategy S for ˆ AR that we started with; the claimed bound follows. | i V

147 11 Parallel Repetition

In each of the transformations on verifiers presented so far (introspection, oracularization, and answer re- duction), the soundness gap of the resulting verifier is slightly degraded: while the completeness property, i.e. the property of having a PCC strategy with success probability 1, is preserved, if the starting game Vn has value at most 1 ε, the resulting game n′ has value at most 1 δ(ε, n) for some function δ. In order to apply the compression− procedure recursivelyV we need a way to restore− the soundness gap after a sequence of transformations. We accomplish this using (a modification of) the technique of parallel repetition. In- formally, this amounts to transforming a two-player game G into another two-player game Gk in which the verifier plays k simultaneous copies of G with the players, and accepts if and only if the players’ answers correspond to valid answers in each copy. Intuitively, if the value of G is v < 1, then one would expect the value of Gk to decay exponen- tially with the number of repetitions k. It is not true in general that the value of Gk is vk, but exponential decay bounds on the (tensor product) value of parallel-repeated games are known for specific classes of games [JPY14, DSV15, BVY17]. In particular, it was shown in [BVY17] that the class of anchored games satisfies exponential-decay parallel repetition, and furthermore every game can be efficiently transformed into an equivalent anchored game. Put together, this gives a general soundness amplification procedure called “anchored parallel repetition,” which we use in our compression procedure to reset the soundness gap to a fixed constant. The parallel repetition theorems of [DSV15, Yue16] are also applicable for the purpose of soundness amplification, but are not sufficient for us. The crucial point here is that the anchored parallel repetition result of [BVY17] allows us to relate the amount of entanglement needed to play the repeated game Gk to the entanglement needed to play the original anchored game G: roughly speaking, [BVY17] show that for v exp( cε8 k/A)), where c is a universal constant and A an upper bound on the length of answers ≥ − · from the players in G, we have E (Gk, v) E (G, 1 ε). On the other hand, the parallel repetition theorems ≥ − of [DSV15, Yue16] only imply that E (Gk, v) log E (G, 1 ε),33 which is not sufficient for our purposes. The main theorem of this section, Theorem≥ 11.2, is given− in Section 11.2 after we briefly recall the definition of the anchoring transformation in Section 11.1 from [BVY17].

11.1 The anchoring transformation Let =( , ) be a normal form verifier. We present a transformation on the verifier , called anchoring, V S D V that produces another normal form verifier ANCH =( ANCH, ANCH). V S D We define the anchoring ANCH of by first defining a typed verifier ˆ ANCH = ( ˆANCH, ˆ ANCH), and V V V S D then detyping ˆ ANCH using Lemma 6.18 to obtain ANCH. Define the type set ANCH = GAME, ANCHOR V V T { } and type graph GANCH the complete graph over ANCH along with self-loops at each vertex. T The sampler ˆANCH is a ANCH-typed sampler, with field size q(n)= 2 and the same dimension s(n) as S T ( ) that of the sampler . Fix an integer n N. Let V = Fs n denote the ambient space of on index n. Let S ∈ 2 S LA, LB : V V denote the CL functions of on index n. For w A, B the associated CL functions ANCH, w → ANCH S ∈ { } Lt of ˆ are { } S w ANCH, w L if t = GAME , Lt = ( 0 if t = ANCHOR . Intuitively, when the type t sampled for player w is GAME, then they receive a question Lw(z) as they

33The reason is due to the use of the “quantum correlated sampling lemma” of [DSV15].

148 would according to . Otherwise if t = ANCHOR, then their question is the zero string. Thus if Lw is an ℓ S ANCH, w ℓ ANCH, w -level CL function, then LGAME is also an -level CL function, and LANCHOR is a 0-level CL function. ANCH The decider ˆ performs the following: on input (n, tA, xA, tB, xB, aA, aB), if either tA or tB is D equal to the type ANCHOR, then the decider accepts. Otherwise, it accepts only if (n, xA, xB, aA, aB) accepts. D We define the anchoring of to be the detyped verifier ANCH = Detype( ˆ ANCH). V V V Proposition 11.1. Let = ( , ) be a normal form verifier where is an ℓ-level sampler. Let ANCH = V S D S V ( ANCH, ANCH) be the anchoring of . Then the verifier ANCH is a normal form verifier that satisfies the S D V V following properties for all n N, ∈ 1. (Completeness) If there is a value-1 PCC strategy for n, then there is a value-1 PCC strategy for ANCH. V Vn ANCH 2 2. (Soundness) For all ε > 0, if val∗( ) 1 ε, then val∗( ) 1 (4 16 )ε. Furthermore, Vn ≥ − Vn ≥ − · E ( ANCH, 1 ε) E , 1 (4 162)ε . Vn − ≥ Vn − · 3. (Complexity) The time complexities of the verifier ANCH satisfy  V TIME ANCH (n)= poly(TIME (n)) , S S TIME ANCH (n)= poly(TIME (n)) . D D Furthermore the number of levels of ANCH is ℓ + 2. S 4. (Efficient computability) The descriptions of ANCH and ANCH can be computed in polynomial time from the descriptions of and , respectively.S In particular,D the sampler ANCH only depends on the sampler . S D S S Proof. We analyze the completeness, soundness, and complexity properties of the typed verifier ˆ ANCH; the corresponding properties of the detyped verifier ANCH follow from Lemma 6.18 and the fact thatV the type V set ANCH has size 2. T Fix an index n N. For the completeness property, let S be a value-1 PCC strategy for n. We define ∈ S ANCH ˆ ANCH V a value-1 PCC strategy for n : whenever a player receives the ANCHOR type as a question type, they perform the trivial measurementV (i.e. measure the identity operator). Otherwise, the player performs the same measurement as in S . This is clearly value-1 and PCC. Item 1 follows from this and Lemma 6.18. For the soundness property, we observe that if a strategy S ANCH has value 1 ε in ˆ ANCH, then − Vn 1 ε = 1 γ + γp − − where p is the value of S ANCH in the game , and γ = 1/4 is the probability that neither player receives the Vn question type ANCHOR; this follows from the distribution associated with the typed sampler ˆANCH. Thus ANCH S S has value 1 ε/γ in n, and thus val∗( n) 1 4ε. Item 2 follows from this and Lemma 6.18. Items 3 and 4 are− straightforwardV and also followV ≥ from− Lemma 6.18.

11.2 Parallel repetition of anchored games We present a second transformation on normal form verifiers that amounts to performing parallel repetition. Fix a function k : N N, and let =( , ) be a normal form verifier where is an ℓ-level sampler with → V S D S dimension s(n). Let denote a 1-input Turing machine that computes a function k : N N (i.e. the inputs K → and outputs are interpreted as positive integers). The -fold parallel repeated verifier REP =( REP, REP) is defined as follows. K V S D

149 Sampler. The sampler REP is an ℓ-level sampler. On index n, it has field size q(n) = 2 and dimension S sREP(n) = k(n)s(n). We treat the ambient space VREP of REP on index n as the k(n)-fold direct sum of S the ambient space V of on index n. For all integers n N, w A, B , we define the CL functions S ∈ ∈ { } LREP, w : VREP VREP as follows: → k(n) LREP, w = Lw , Mi=1 where LA, LB : V V are the CL functions of the sampler on index n. The CL functions LREP, w are ℓ → REP, w REP, w S -level; the j-th factor spaces Vj, u of L are defined as

k(n) VREP, w = Vw j, u j, ui Mi=1 w w for all u =(u1,..., uk(n)) where V is the j-th factor space of L with prefix ui V. In other words, the j, ui ∈ sampler REP is a k(n)-fold product of the sampler . S S Decider. The decider REP is defined as follows: on input (n, x, y, a, b) where x, y, a, b are k(n)-tuples of D questions and answers, respectively, accept if and only if (n, x , y , a , b ) accepts for all i 1,..., k(n) . D i i i i ∈ { } Theorem 11.2 (Anchored Parallel Repetition Theorem [BVY17]). There exist a universal constant c > 0 and a polynomial time Turing machine RepeatedVerifier such that the following holds. On input a tuple ( , ) where is an ℓ-level normal form verifier and a Turing machine computing the number of repetitions,V K theV Turing machine returns the description ofKthe anchored -fold parallel repeated verifier K REP = ( REP, REP). Let k(n) be the function computed by the Turing machine on input n. Then the V S D K following hold for all n N. ∈ 1. (Completeness) If has a value-1 PCC strategy, then REP has a value-1 PCC strategy. Vn Vn 2. (Soundness and Entanglement) For all ε > 0, for all

c ε8 k(n) v > exp , −TIME (n)  D  REP if val∗( ) > v then val∗( ) 1 ε and furthermore Vn Vn ≥ − E ( REP, v) E ( , 1 ε) . Vn ≥ Vn − 3. (Complexity) The repeated verifier REP is (ℓ + 2)-level and has The time complexities V

TIME REP (n)= O(TIME (n)+ k(n) TIME (n)) , S K · S TIME REP (n)= O(TIME (n)+ k(n) TIME (n)) . D K · D Proof. The existence of the Turing machine RepeatedVerifier is clear from the above discussions. The Turing machine is polynomial time as (a) the description of the repeated sampler REP only depends on the the description of the sampler and the description of the Turing machine , andS (b) the description of the repeated decider only dependsS on and . K D K

150 Item 1 follows from the following straightforward observation: if S = (ψ, A, B) is a value-1 PCC strategy for , then the strategy where the players share k(n) copies of ψ , and for the i-th instance of the Vn | i game ANCH, the players use strategy S on the i-th copy of ψ (and performing the identity measurement Vn | i whenever they receive the ANCHOR type). It is straightforward to check that this strategy has value 1 and is PCC. REP To show Item 2 we apply [BVY17, Theorem 17]. The exponential decay bound on the value of n pre- sented in [BVY17] depends on the answer length of the players in the original game . By DefinitionV 5.31, Vn this answer length is restricted to TIME (n), so the claimed bound follows. Item 3 follows from the fact that computingD the direct sum of k(n) CL functions and factor spaces of the “single-copy” sampler ANCH requires k(n) times the complexity of the “single-copy” sampler, and the complexity of ANCH followsS from Proposition 11.1. The dependence on the time complexity of as the S K sampler REP need to compute the function k(n). The time complexity of the decider follows from the S repeated decider having to run k(n) instances of the decider ANCH and take the logical AND of the k(n) decider outputs. The complexity of ANCH follows from PropositionD 11.1 which in turn runs an instance of D the original decider . Since the CL functions of the sampler ANCH are (ℓ + 2)-level (by Proposition 11.1), and taking the directD sum of CL functions does not increaseS the number of levels (by Lemma 4.6), the CL functions LREP, w are (ℓ + 2)-level.

151 12 Gap-preserving Compression

We combine the transformations from the previous sections (the introspection games, oracularization, an- swer reduction, and parallel repetition) to obtain our main technical result, a gap-preserving compression theorem (Theorem 12.1) for normal form verifiers. Recall from Definition 5.30 that a verifier = ( , ) is λ-bounded if the size = max , V S D |V| {|S| |D|} is at most λ and the time complexity bounds TIME (n) and TIME (n) are at most nλ for all n 2. The compression theorem states the existence of an efficientS “compressionD procedure” Compress that≥ achieves the following. Given as input a λ-bounded verifier and the parameter λ, the procedure returns a “com- COMPR VCOMPR n pressed verifier” such that the n-th game n simulates the N-th game N for N = 2 , and furthermore bothV the sampler time complexity andV the decider time complexity of CVOMPR can be exponen- Vn tially smaller than those of , which can be as large as Nλ. We note that the time complexity bounds of VN the compressed verifier hold without the assumption that the input verifier is λ-bounded. The completeness, soundness, and entanglement bounds of the Theorem require the λ-boundedness of the input verifier . V Theorem 12.1 (Compression theorem). There exists a universal constant C0 > 0 such that the following holds. There is a polynomial time Turing machine Compress that takes as input a tuple ( , λ) where is a V V normal form verifier and λ > 0 is an integer, and returns the description of an 9-level normal form verifier COMPR =( COMPR, COMPR) that has time complexities V S D

TIME COMPR (n)= poly(n, λ), S TIME COMPR (n)= poly(n, λ). D The description of COMPR is independent of and is computable from the binary representation of λ in S V time polylog(λ). n Moreover, if the input verifier is 9-level and λ-bounded, then for all n C0 and N = 2 the following statements hold. V ≥

1. (Completeness) If has a value-1 PCC strategy, then COMPR has a value-1 PCC strategy. VN Vn 1 COMPR 1 2. (Soundness) If val∗( ) , then val∗( ) . VN ≤ 2 Vn ≤ 2 λ 3. (Entanglement) E ( COMPR, 1 ) max E ( , 1 ), 2N 1 . Vn 2 ≥ VN 2 − n o 12.1 Proof of Theorem 12.1 Recall the following Turing machines.

1. IntroVerifier takes as input a tuple ( , λ, ℓ) and returns in polynomial time a description of the (de- V typed) introspective verifier INTRO corresponding to the verifier and parameters (λ, ℓ). V V 2. AnsRedVerifier takes input ( , λ, µ, γ) and returns in polynomial time a description of the answer V reduced verifier AR corresponding to and parameters (λ, µ, γ). V V 3. RepeatedVerifier takes input ( , ) and returns in polynomial time the anchored repeated verifier REP corresponding to and theV TuringK machine computing the number of repetitions. V V K In addition we introduce the following Turing machine.

152 Lemma 12.2. There is a polynomial-time Turing machine ComputeRepetitions that on input (λ, τ), repre- sented in binary, returns the description of a 1-input Turing machine that on input n returns the binary Kλ,τ representation of (λn)τ. The time complexity of is poly(log n,log λ, τ). Kλ,τ Proof. Consider a 3-input Turing machine that on input (n, λ, τ) computes the quantity (λn)τ. This K clearly takes time at most poly(log n,log λ, τ). The Turing machine ComputeRepetitions, on input (λ, τ), outputs a description of the 1-input Turing machine (n) = (n, λ, τ) derived from by hardwiring Kλ,τ K K λ, τ on the second and third input tapes, respectively (see Remark 3.2 for a discussion of hardwiring inputs of Turing machines). This takes time poly(log λ,log τ).

We specify the Turing machine Compress in Fig. 14. The constants µ and γ are specified in Eq. (157), and CREP is specified in Eq. (162).

Input: ( , λ) where is the description of a normal form verifier and λ is an integer. V V 1. If the description length of is at most λ, then let (0) = . Otherwise, let (0) denote V V V V the trivial verifier whose description is (0, 0).

2. Compute (1) = IntroVerifier( (0), λ, 9). V V 3. Compute (2) = AnsRedVerifier( (1), λ, µ, γ). V V

4. Compute = ComputeRepetitions(λ, CREP). K 5. Compute (3) = RepeatedVerifier( (2), ). V V K 6. Return COMPR = (3). V V

Figure 14: The definition of the Turing machine Compress, where the constants µ and γ are specified in

Eq. (157), and CREP is specified in Eq. (162).

Lemma 12.3. Let Compress be the Turing machine specified in Fig. 14. For all verifiers and integer λ, the V output of Compress( , λ) is a normal form verifier COMPR = ( COMPR, COMPR) such that COMPR does V V S D S not depend on but only on the parameter λ. Furthermore, there is a Turing machine ComputeSampler V that on input λ outputs the description of the compressed sampler COMPR in time polylog(λ). S Proof. The verifier COMPR = (3) is a normal form verifier because the introspective verifier (1) is a normal form verifierV (even if theV input verifier is not; see Remark 8.8), and the verifier transformationsV AnsRedVerifier, RepeatedVerifier preserve the normalV form (by Theorems 10.24 and 11.2). The introspective verifier computed in Step 2 of Figure 14 has a sampler (1) that only depends on S the parameter λ, as implied by Lemma 8.1. The answer reduced verifier from Step 3 has a sampler (2) S that, according to Theorem 10.24, only depends on the sampler (1), the parameter tuple (λ, µ, γ), and the S description length (1) . Note that (1) only depends on λ, the parameters (µ, γ) are universal constants, |D | S and the description length of (1) is at most the time complexity of IntroVerifier on input ( (0), λ, 9), which D V is polynomial in λ. Thus (2) only depends on λ. Similarly, the sampler (3) of the repeated verifier from S (2) S Step 5 only depends on the sampler , the parameter λ, and the universal constant CREP, as implied by S Theorem 11.2. Therefore the final sampler COMPR = (3) only depends on λ. S S

153 Let ∗ be the trivial verifier whose description is (0, 0). Define ComputeSampler as the Turing machine V DUMMY that on input λ runs Compress to compute ( , ) = Compress( ∗, λ) and returns the description of as the output. By the above discussion,S theD description of is theV same as that of COMPR and the S S S time complexity of ComputeSampler is polylog(λ) as Compress runs in polynomial time: this is stated in Theorem 12.1 and follows easily from the fact that the Turing machines IntroVerifier, ComputeRepetitions, AnsRedVerifier, and RepeatedVerifier are all polynomial time.

Proof of Theorem 12.1. We evaluate the parameters of the verifiers generated in each step of the Compress procedure. These parameters are summarized in Table 2 and are explained in the following. We fix an index n 2 and let N = 2n. ≥

Time complexity Level Completeness Soundness Entanglement Sampler Decider

(1) E , 1 ε (1) > n 1 (1) val∗( n ) 1 ε1 V −(0) ≥ poly(n, λ) poly(Nλ) 5 value-1 PCC V − ⇒ max E ( , 1 δ ), V (0) N  1 val∗( ) 1 δ1(ε1, n) V λ − VN ≥ − (1 δ ) 2N −1 (2) (2) val ( ) > 1 ε E , 1 ε (2) value- PCC ∗ n 2 n 2 poly(n, λ) poly(n, λ) 7 1 V (1) − ⇒ V (1) − ≥ V val∗( ) 1 δ (ε , n) E , 1 δ (ε , n) Vn ≥ − 2 2 Vn − 2 2 ( ) (3) val ( 3 ) > 1 ε E , 1 ε  (3) value- PCC ∗ n 3 n 3 poly(n, λ) poly(n, λ) 9 1 V (2) − ⇒ V (2) − ≥ V val∗( ) 1 δ (ε , n) E , 1 δ (ε , n) Vn ≥ − 3 3 Vn − 3 3  Table 2: Parameters and properties of the input verifier , introspective verifier (1), answer reduced verifier V V (2), and parallel repeated verifier (3) computed by Compress. The functions δ , δ , δ are specified below V V 1 2 3 in the main text. The quantity δ in the “Entanglement” entry of (1) is a shorthand for δ (ε , n). 1 V 1 1

Preprocessing the input verifier. Let ( , λ) be the input to the Turing machine Compress. The “trun- cated” verifier (0) computed in Step 1 isV to ensure that the description length of the verifier that is passed V through IntroVerifier is bounded by λ. If already has description length at most λ, then (0) is identical to . Otherwise there is no guarantee aboutV any properties of the verifier (0). V V V Introspection. The verifier (1) = ( (1), (1)) is the introspective verifier corresponding to (0) and V S D V parameters λ and ℓ = 9 computed by IntroVerifier. We state its parameters and properties in the second row of Table 2 in terms of the index n N. The complexity bounds and the∈ number of levels of the introspective verifier are given by Theorem 8.3.

Using that the argument ℓ in IntroVerifier is set to 9, there exists a universal constant CINTRO 1 such that ≥ for all λ N and for all n CINTRO, we have ∈ ≥

CINTRO TIME (1) (n) (λ n) , S ≤ · (155) CINTRO λ n TIME (1) (n) 2 . D ≤ By Theorem 8.3 these bounds hold for all inputs ( (0), λ, 9) (in particular these bounds do not assume that V the verifier (0) is λ-bounded). V

154 The statements in the remaining entries of the row assume that the input verifier is 9-level and λ- V bounded (and therefore (0) is 9-level and λ-bounded). The “Completeness”, “Soundness”, and “Entangle- V ment” entries all follow from Theorem 8.3 where δ1(ε1, n) is defined as

a b1 b δ (ε , n)= a (λn) 1 ε +(λn)− 1 (156) 1 1 1 · 1   for some universal constants a1 > 0, 0 < b1 < 1.

Answer reduction. Let (2) =( (2), (2)) denote the answer reduced verifier corresponding to (1) and V S D V parameters (λ, µ, γ) computed by AnsRedVerifier where we define µ and γ as

2a1 µ = CINTRO , γ = (157) ⌈ ⌉ b1b2 l m where a1, b1 are the constants defining δ1 in Eq. (156). The third row in Table 2 gives the properties of the answer reduced verifier. The bounds on the number of sampler levels and the time complexities follow from Theorem 10.24. Using the bounds on the number of levels and complexities of the introspective verifier (1) from Eq. (155) V (which do not depend on the λ-boundness, or the number of levels, of the input verifier ), as well as the (1) V fact that the description length of is poly(λ), we get that there exists a universal constant CAR such V that for all n CAR, ≥ CAR max TIME (2) (n), TIME (2) (n) (λ n) , (158) S D ≤ · The statement in the “Completeness”n entry follows fromo the completeness part of Theorem 10.24, Eq. (155) and assumption (134). The “Soundness” and “Entanglement” entries also follow from Theo- rem 10.24 where δ2(ε2, n) is defined as

a (1) µa b (1) µb γ δ (ε , n)= a′ γ ′ (λ n) ′ ε ′ +(λ n)− ′ 2 2 · · · |D | · · 2 · |D | ·   for universal constants a > 0 and 0 < b < 1. By construction, the description length of (1) is at most λ, ′ ′ D so we have that δ2 can be bounded by

a b2 b γ δ (ε , n) a (λn) 2 ε +(λn)− 2 2 2 ≤ 2 · 2   where we set a = max a γa′ , 2µa and b = b . Since a , b , µ, and γ are universal constants, so are 2 { ′ ′} 2 ′ ′ ′ a2, b2.

Parallel repetition. Let (3) = ( (3), (3)) denote the anchored repetition (see Section 11.2) of (2) computed by RepeatedVerifierV . WeS stateD the parameters and properties of (3) in the fourth row ofV Ta- V ble 2. By Lemma 12.2 the Turing machine λ,CREP returned by ComputeRepetitions(λ, CREP) computes the function K k(n)=(λ n)CREP , (159) · where CREP is the universal constant specified later in Eq. (162). The time complexities of (3) and (3) follow from Item 3 of Theorem 11.2 and the complexity upper bounds on (2) and (2) specifiedS in theD third row of Table 2. These bounds only depend on the bounds S D

155 stated for verifier (2) in the third row of Table 2, and again do not depend on whether the input verifier V V is λ-bounded. The statement in the “Completeness” entry follows from Item 1 of Theorem 11.2. The “Soundness” and “Entanglement” entries follow from Item 2 of Theorem 11.2 where δ3(ε3, n) is defined as

1 8 log(1 ε3) TIME (2) (n) δ (ε , n)= a − − D , (160) 3 3 3 k(n)   for some universal constant a3 > 0.

Putting everything together. The verifier COMPR is (3). We now put together the bounds and parame- ters from Table 2 to obtain the conclusions statedV in TheoremV 12.1. The claim that the Turing machine Compress is polynomial time follows from the fact that the Tur- ing machines IntroVerifier, ComputeRepetitions, AnsRedVerifier, and RepeatedVerifier are all polynomial time. The claimed number of levels and time complexity of the sampler COMPR follows from the complexity S bounds in Table 2, which in turn only depend on the parameter λ. In particular the sampler COMPR is independent of the input verifier , as shown in Lemma 12.3. Finally, the claimed time complexityS to compute the description of COMPRValso follows from Lemma 12.3. The claimed time complexityS of COMPR in the theorem statement follows from Lemma 12.2 to estimate D the complexity of the Turing machine which computes (159) and appears in the bound on TIME (3) (n). We now establish the completeness,K soundness, and entanglement properties of COMPR. AssumeD that V the input verifier is 9-level and λ-bounded. Therefore, by construction, the verifier (0) constructed in V V Step 1 is identical to . Fix an index n N. The completeness property (Item 1 in Theorem 12.1) follows V ∈ from chaining together the completeness properties of (3), (2), (1) and the assumption that has a Vn Vn Vn VN value-1 PCC strategy. We now establish the soundness property (Item 2 in Theorem 12.1). Assume for now that we have set the integers C and CREP so that for all integers λ, n such that n C the following inequality holds: 0 ≥ 0 1 1 δ 2δ 2δ , n , n < . (161) 1 2 3 2 2       1 (3) 1 Suppose for contradiction that n C , the verifier is λ-bounded, and val∗( ) , but val∗( ) > . ≥ 0 V VN ≤ 2 Vn 2 By chaining together the soundness guarantees of the three verifiers (3), (2), and (1), we obtain Vn Vn Vn 1 1 val∗( ) 1 δ 2δ 2δ , n , n > , VN ≥ − 1 2 3 2 2       a contradiction.34

It remains to show that there is a suitable choice for the constants C0 and CREP. We work backwards and first identify ε∗1, depending on n, such that

a b b 1 δ (ε∗, n)= a (λn) 1 (ε∗) 1 +(λn)− 1 < 1 1 1 · 1 2   34Due to the strict inequality vs. non-strict inequality distinction in the soundness statements, in order to do the chaining, we need that δ (ε) < 2δ (ε) for non-zero ε and j 1, 2, 3 , which holds in our case. j j ∈{ }

156 1/b1 1 1/b1 for all sufficiently large n. Choosing ε∗ = a , we get that for all n C1 =(4a1) , 1 8a1(λn) 1 ≥   a b b δ (ε∗, n)= a (λn) 1 (ε∗) 1 +(λn)− 1 1 1 1 · 1   a b 1 a (λn) 1 (ε∗) 1 + ≤ 1 · 1 4  1 1  a1 + ≤ 8a1 4 1   < , 2 where we used the fact that λ 1 in the second line. ≥ Next, we identify ε∗2, depending on n, such that

a b c ε∗1 δ (ε∗, n)= a (λn) 2 (ε∗) 2 +(λn)− 2 < 2 2 2 · 2 2   1/b2 2a1 ε∗1 for all sufficiently large n. For the choice of γ in (157), and for ε∗ = a , we get that for ≥ b1b2 2 8a2(λn) 2 all n C =(4a )b1/a1 (8a )1/a1 ,   ≥ 2 2 1 a b b γ δ (ε∗, n)= a (λn) 2 (ε∗) 2 +(λn)− 2 2 2 2 · 2  a b b γ 1 = a (λn) 2 (ε∗) 2 + a (λn)− 2 (ε∗)− ε∗ 2 · · 2 2 · 1 · 1 a b (b γ a /b ) 1/b = a (λn) 2 (ε∗) 2 + a (λn)− 2 − 1 1 (8a ) 1 ε∗ 2 · · 2 2 · 1 · 1 a b a /b 1/b a (λn) 2 (ε∗) 2 + a (λn)− 1 1 (8a ) 1 ε∗ ≤ 2 · · 2 2 · 1 · 1 ε∗1 ε∗1 a2 + ≤ 8a2 4 ε   < ∗1 . 2

Using the condition that is λ-bounded (thus λ), and Equation (158), we have that TIME (2) (n) 2 CAR V |V| ≤ D ≤ (λ n) for all n CAR. We specify the universal constant CREP used in the description of the Turing ≥ machine Compress in Figure 14. Let CREP be the minimum integer such that for all n 2, ≥ 8 2a3 (λ2n)CAR < (λ n)CREP . (162) ε∗2 · ·  

Note that such an integer CREP exists because the left-hand side of Equation (162) is some fixed polynomial in λ and n, due to our choice of ε∗2 and ε∗1. CREP For this choice of CREP, using that k(n) is defined as (λ n) we have that the function δ3 defined 1 < · in (160) satisfies 2δ3( 2 ) ε∗2. Let

C = max 2, CAR, CINTRO, C , C . (163) 0 { 1 2}

Since CAR, CINTRO, C1, C2 are all universal constants, then C0 is also a universal constant. For this choice of C , the inequality in (161) holds for all n C . This establishes Item 2. 0 ≥ 0

157 Finally we show the entanglement lower bound. For all n C , ≥ 0 (3) 1 (2) E , E , 1 ε∗ Vn 2 ≥ Vn − 2    (1)  E , 1 ε∗ ≥ Vn − 1   1 Nλ 1 max E , , 2 − . ≥ VN 2 n   o This establishes Item 3 in Theorem 12.1, and concludes the proof of the theorem.

12.2 An MIP∗ protocol for the Halting problem

For every Turing machine , we construct a normal form game G such that val∗(G) = 1 if halts and 1 M M val∗(G) otherwise. ≤ 2

Input: ( , , λ, n, x, y, a, b) where is the description of an 8-input Turing machine and is the descriptionR M of Turing machine R . M M 1. Run for n steps. Accept if it halts and continue otherwise. M 2. Compute the description of the 5-input Turing machine that on input (n, x, y, a, b) D D simulates with input ( , , λ, n, x, y, a, b). R R M 3. Compute the description = ComputeSampler(λ). S 4. Compute the description COMPR = Compress( , λ) where =( , ). V V V S D 5. Accepts if the decider COMPR of verifier COMPR accepts (n, x, y, a, b). D V

Figure 15: Description of Turing machine . The Turing machines ComputeSampler and Compress are given in Lemma 12.3 and Theorem 12.1. F

First, we define a Turing machine as in Fig. 15. For all Turing machines and parameters λ N HALT F M ∈ we define the decider ,λ to be the 5-input Turing machine that on input (n, x, y, a, b) simulates with DM F input ( , , λ, n, x, y, a, b). We sometimes omit the subscripts , λ for notational simplicity and denote F M M the decider as HALT. Define HALT = ComputeSampler(λ) and HALT =( HALT, HALT). By inspecting the definition ofD the Turing machineS in Fig. 15, it is easy to concludeV that theS descriptionD computed by HALT at step 2 is the description of FHALT itself.35 Therefore, the input to the Compress TuringD machine at D D step 4 is the description of HALT and λ. We have the following lemma for HALT. V V Lemma 12.4. Let be a Turing machine and λ integer. Then the verifier HALT has the following prop- erties. M V 35Alternatively, we can use Kleene’s recursion theorem [Kle54] here so that HALT is the fixed point of a Turing machine similarly defined as . This approach was taken in the first arXiv version of theD paper. We choose to present the proof without resorting to Kleene’sF recursion theorem for the sake of simplicity.

158 HALT 1. For any integer n, if halts in n steps then val∗( n )= 1. Furthermore, there is a value-1 PCC strategy for the gameMHALT. V Vn 2. For any integer n, if does not halt in n steps then HALT has a value-1 PCC strategy if and only if M Vn COMPR does. Furthermore, under the same assumption it holds that for any α [0, 1], Vn ∈ E HALT, α = E COMPR, α . Vn Vn   Proof. Suppose halts in n steps. Then from the definition of and HALT, it follows that HALT accepts M HALT F D D on input (n, x, y, a, b) for all x, y, a, b, and hence val∗( n ) = 1. If does not halt in n steps, then ALT OMPRV M H accepts on input (n, x, y, a, b) if and only if C accepts on (n, x, y, a, b). D D We show in Lemma 12.6 below that for all Turing machine there is a choice of λ such that HALT = M V ( HALT, HALT) is λ-bounded. We first show a technical lemma. S D Lemma 12.5. Suppose C, C N are two integers. Then for all λ 2C2C and n 2, ′ ∈ ≥ ′ ≥ C λ C(C′ λn) n . · ≤ x 2 Proof. By calculating the derivative of 2 2C′C x with respect to x, it is easy to see that for all x 2 2C C′ 2 x − λ 2 ≥ log ln 2 , the inequality 2C C′x 2 holds. Substituting C in for x, we have for all λ 2C C′ 2 ≤ ≥ ≥ 2C C′ C log ln 2 , λ 1 λ 1 CC′λ 2 C − n C − . ≤ ≤ This implies that, for λ 2C2C and n 2, ≥ ′ ≥ C C λ C(C′ λn) (CC′ λn) n . · ≤ · ≤

Lemma 12.6. There is a integer parameter λ, polynomial-time computable from the description of the Turning machine , such that the verifier HALT corresponding to and parameter λ is λ-bounded. M V M Proof. We first do an accounting of the time complexity of HALT. By definition, for all and λ, the D M decider HALT simulates on input ( , , λ, n, x, y, a, b). Its running time is therefore a polynomial in the timeD bounds of followingF steps. F M

1. Simulating : this takes poly( , n) time. M |M| 2. Computing the description : this takes poly( , ) time. D |M| |F| 3. Computing the description COMPR: this takes polylog(λ) time. S 4. Computing the description COMPR: this takes poly( ,log λ) time. D |V| 5. Executing COMPR on input (n, x, y, a, b): this takes poly(n, λ) time. D

159 As is computed in time poly( , ) the size poly( , ). Similarly, the size of D |M| |F| |D| ≤ |M| |F| the sampler polylog(λ) as ComputeSampler is polynomial time. Therefore, the size of the verifier |S| ≤ is bounded by poly( , ,log λ). So the time complexity bounds in Items 4 and 5 are at most |V| |M| |F| poly( , ,log λ) and poly(λ, n) respectively. We emphasize that we did not use the λ-boundedness of the| verifierM| |F|, as this is what we want to prove here! And, as a consequence, the total time complexity of HALT, V D CD TIME HALT (n) CD λ n , D ≤ |M| · |F| · · 2 for some integer CD N. By Lemma 12.5, we have for all λ 2C  and n 2, ∈ ≥ D |M| |F| ≥ λ TIME HALT (n) n . D ≤ HALT Similarly, the sampler has running time poly(n, λ) and therefore there is an integer CS > 0 such that S CS TIME HALT (n) CS(λn) , S ≤ which, again by Lemma 12.5, is at most nλ for λ 2C2 and n 2. ≥ S ≥ As we have shown = poly( , ,log λ) and the description of the verifier is the same as ALT |V| |M| |F| V that of H , there exists integer C > 0 such that V HALT C( log λ)C. V ≤ |M| |F| ·

There is a λ0 such that the above is at most λ for all λ λ0. Finally, taking ≥ λ = max 2C2 , 2C2, λ { D |M| |F| S 0} completes the proof. The computability of λ from is obvious.  |M| Putting things together we obtain the following result.

Theorem 12.7. For all Turing machines , there exists a game G such that M 1. If halts on the empty tape, then val∗(G) = 1. Moreover, there is a (finite-dimensional) PCC strategyM for the players in G that wins with certainty.

1 2. If does not halt on the empty tape, then val∗(G) . M ≤ 2 Furthermore, the game G is polynomial-time computable from the description of . M Proof. Fix α = 1 . Fix a Turing machine . Let λ be the computable integer promised by Lemma 12.6 2 M such that HALT is λ-bounded. V Let C0 be the universal constant specified in Theorem 12.1. For all n C0, let Gn denote the game corresponding to the verifier HALT, and let GCOMPR denote the game corresponding≥ to the compressed Vn n verifier where = Compress( HALT, λ) Finally, let G = G . Vn′ V ′ V C0 Suppose that halts on the empty tape; let T be the minimum number of time steps required for M M to halt on the empty tape. Observe that for all n T, by Lemma 12.4 it holds that G has a value-1 PCC ≥ n strategy. We will use this to show inductively that G also has a value-1 PCC strategy, for all C n < T. n 0 ≤ Claim 12.8. Let C n < T. Suppose that G has a value 1 PCC strategy for all m > n. Then G also 0 ≤ m n has a value 1 PCC strategy.

160 Proof. Since by assumption does not halt in n steps, by Lemma 12.4 it holds that Gn has a value 1 PCC GCOMPR M HALT GCOMPR strategy if n does. Since is λ-bounded and n C0, by Theorem 12.1 it follows that n V n ≥ has a value 1 PCC strategy if G2n does. Since 2 > n, this is true by the hypothesis of the claim. Thus, Gn has a value 1 PCC strategy as claimed.

By Claim 12.8 and downwards induction on n (with the base case n = T), we have that Gn has a value-1 G G PCC strategy for all n C0. In particular, we have val∗( ) = val∗( C0 ) = 1. This shows the first item in the theorem statement.≥ Now suppose that does not halt on the empty tape. We have that for all n N: M ∈ E (G , α)= E (GCOMPR, α) E (G , α) , n n ≥ N where the equality follows from Lemma 12.4 and the inequality follows from Theorem 12.1 (again using the λ-bounded property of HALT). By induction, we get that for all k N, V ∈ (k+1) λ E G E G E G E GCOMPR (g (C0)) ( , α)= ( C0 , α) (k)( ), α = (k) , α α 2 , ≥ g C0 g (C0) ≥     where g(k)( ) is the k-fold composition of the function g(n) = 2n and the second inequality follows from · Theorem 12.1 again. Since g( ) is a monotonically increasing function and by definition α > 0, this implies · that there is no finite upper bound on E (G, α), and therefore every finite dimensional strategy for the game G 1 must have success probability at most α = 2 . Recall the definition of the complexity class RE, which stands for the set of recursively enumerable languages (also called Turing-recognizable languages). Precisely, a language L 0, 1 is in RE if ⊆ { }∗ and only if there exists a Turing machine such that if x L, then (x) halts and outputs 1, and if M ∈ M x / L, then either (x) outputs 0 or it does not halt. The Halting Problem is the language that contains descriptions∈ of TuringM machines that halt on the empty input tape. The following well-known lemma shows that the Halting Problem is complete for RE. We include the simple proof for convenience. Lemma 12.9. The Halting Problem is complete for RE. Proof. To see that the Halting Problem is in RE, define to take as input an x that represents a Turing M machine = [x], and runs the universal Turing machine to simulate on the empty tape; if halts with N N N a 1 then so does . M To show that the Halting problem is complete for RE, let L RE and a Turing machine such that if ∈ M x L, then (x) halts and outputs 1. For an input x, let be the following Turing machine. first ∈ M Nx Nx runs on input x. If accepts, then accepts. On all other outcomes, goes into an infinite loop. M M Nx Nx Thus halts if and only if x L. Nx ∈ Corollary 12.10. MIP∗ = RE.

Proof. Since the Halting Problem is complete for RE, and by Theorem 12.7 is contained in MIP∗, we have the inclusion RE MIP∗. The reverse inclusion MIP∗ RE follows from the following observation. ⊆ ⊆ Let L MIP∗ . From the definition of MIP∗ (see e.g. [VW16, Section 6.1], from which we borrow the notation∈ used here) it follows that there exists a polynomial-time Turing machine such that for all R x 0, 1 , (x) is the description of an m-turn verifier V interacting with k provers, where m and k are ∈ { }∗ R x both polynomial functions of x and such that | | val∗(V ) 2/3 if x L x ≥ ∈ val∗(V ) 1/3 if x / L x ≤ ∈

161 Consider the following Turing machine : on input x, it computes Vx = (x), and then exhaustively searches over tensor-product strategies ofA increasing dimension and increasingR accuracy to evaluate a lower bound on val∗(V ). If val∗(V ) 2/3, then for arbitrarily small δ there exists a finite dimensional tensor- x x ≥ product strategy for the players that achieves value 2/3 δ > 1/3. When the Turing machine identifies − A such a strategy it terminates, outputting 1. If there is no such strategy, then never halts. This implies that A L RE. ∈ 12.3 An explicit separation

As discussed in Section 1.3, Theorem 12.7 implies that Cqa, the set of approximately finite-dimensional correlations, is a strict subset of Cqc, the set of commuting-operator correlations. This is because if Cqa = Cqc, then there exists an algorithm to approximate the entangled value of a given nonlocal game G to arbitrary accuracy. On the other hand, Theorem 12.7 shows that deciding whether a game has entangled value 1 or at most 1/2, promised that one is the case, is undecidable. Therefore the correlation sets must be different. In fact, Theorem 12.7 implies that there is an infinite family M of Turing machines that do not halt on co an empty input tape such that for all M , the corresponding game G has val∗(G ) < val (G ), M ∈ M M M where recall that valco(G ) denotes the commuting-operator value of G , which is supremum of success probabilities over all commuting-operatorM strategies for G .36 However,M it is not immediately clear, given a specific non-halting Turing machine , whether the associatedM game G separates the commuting- M M operator model from the tensor product model of strategies. While Theorem 12.7 implies that val∗(G ) 1 co G G M ≤ 2 , it could also be the case that val ( ) = val∗( ) in that particular instance. We conjecture that valco(G )= 1 for all non-halting TuringM machines M, but it appears to be difficult to identify an explicit M M value-1 commuting operator strategy that demonstrates this. In this section we identify an explicit game G that separates the tensor product model from the commuting- 1 co operator model; we show that val∗(G) but val (G) = 1. Interestingly, the proof does not exhibit an ≤ 2 explicit value-1 commuting-operator strategy for G. We construct the separating game in a similar manner to the games constructed in Section 12.2. Let denote the following 1-input Turing machine: it takes as input a description of a nonlocal game G and runsA the semidefinite programming hierarchy of [NPA08, DLTW08] to compute a non-increasing sequence co co α1, α2,... of upper bounds on val (G) such that limn ∞ αn = val (G). The Turing machine halts if → co A it obtains a bound αn < 1. Thus this algorithm eventually halts whenever val (G) < 1, and otherwise it runs forever. Consider the Turing machine in Figure 16. We follow the same steps as in Section 12.2. Let SEP be N Dλ the decider that on input (n, x, y, a, b), simulates on input ( , λ, n, x, y, a, b). We sometimes leave the N N parameter λ implicit. Define SEP = ComputeSampler(λ) and SEP =( SEP, SEP). By the definition of Turing machine in Fig. 16,S the description computed by VSEP is theS descriptionD of SEP itself and the N SEP V D SEP SEP V input to Compress is the description of and λ. Define the game G = where C0 is the constant V VC0 from Theorem 12.1. Then GSEP is the game G computed in SEP. D A similar argument to that of Lemma 12.6 shows that there exist choices of λ (computable from the description of ) such that SEP is λ-bounded. A V 36 co To see why this holds, observe that if it were the case that val∗(G ) = val (G ) for all but finitely many non-halting then we could construct an algorithm to decide the Halting problemM as follows. OnM input , first checks if is one M A co M A M of the finitely many Turing machines for which val∗(G ) < val (G ); if so, then it outputs a hard-coded answer for whether halts on the empty tape or not. Otherwise, computesM the nonlocalM game G and executes the aforementioned algorithm for M A M approximating the entangled value of games assuming that Cqa = Cqc.

162 Input: ( , λ, n, x, y, a, b) where is the description of a 7-input Turing machine. R R 1. Compute the description = ComputeSampler(λ). S 2. Compute the description of the 5-input Turing machine that on input (n, x, y, a, b) D D simulate with input ( , λ, n, x, y, a, b). Let =( , ). R R V S D 3. Compute the description of the game G = where C is the integer from Theorem 12.1. VC0 0 4. Simulate on input G for n steps and accept if halts. A A 5. Compute the description COMPR = Compress( , λ). V V 6. Accepts if the decider COMPR of verifier COMPR accepts (n, x, y, a, b). D V

Figure 16: Description of Turing machine . The Turing machines ComputeSampler and Compress are given in Lemma 12.3 and Theorem 12.1. N

SEP SEP SEP 1 co SEP Theorem 12.11. For the game G = it holds that val∗(G ) and val (G )= 1. VC0 ≤ 2 Proof. Suppose that valco(GSEP)= 1. The invocation of the Turing machine on input GSEP never halts, and therefore the decider SEP never accepts in Step 2 of Figure 16. ApplyingA Theorem 12.1, we get that SEP 1 SEP 1 D E ( , ) E ( n , ) and Vn 2 ≥ V2 2 SEP 1 2λn 1 E , 2 − Vn 2 ≥   E SEP 1 for all n C0. An inductive argument implies that there is no finite upper bound on ( n , 2 ), and thus SEP≥ SEP 1 V val∗(G )= val∗( ) , which implies the theorem. C0 2 V ≤ co SEP On the other hand, suppose that val (G ) < 1. Then there exists some m C0 such that GSEP SEP ≥ A halts on input after m steps, so n has a value-1 PCC strategy for all n m (i.e. the players do not respond with any answers). ThusV by the completeness statement of Theorem≥12.1 and an induction SEP GSEP argument, we have that n has a value-1 PCC strategy for all n C0, which implies that val∗( )= 1, V SEP co SEP ≥ a contradiction because of val∗(G ) val (G ). This completes the theorem. ≤

163 A Analysis of the Pauli basis test

In this appendix we give a proof of Theorem 7.14. For convenience, we repeat the statement of the theorem here. Theorem A.1. There exists a function

a b b bmd δQLD(ε, m, d, q)= a(md) (ε + q− + 2− ) for universal constants a 1 and 0 < b < 1 such that the following holds. For all admissible parameter ≥ S GPAULI tuples qldparams = (q, m, d) and for all strategies = ( ψ , A, B) for the game qldparams that succeed with probability at least , there exist local isometries | i 1 ε φA : A A′ A′′ , φB : B B′ B′′ − q M H m→H ⊗H H →H ⊗H (where ψ A B and , = (C ) with M = 2 ) and a state AUX such A′′ B′′ ∼ ⊗ A′ B′ that | i∈H ⊗H H H | i∈H ⊗H

M 1. φA φ ψ AUX EPR δQLD(ε, m, d, q), ⊗ B| i − | i ⊗ | qi⊗ ≤ 2. Letting A˜ x = φ Ax φ† and B˜ y = φ Byφ† , we have for W X, Z a A a A b B b B ∈ { } PAULI A˜ ( ,W) I (τW) I u ⊗ B′ B′′ ≈δQLD u A′′ ⊗ A′ B′B′′ (PAULI ) I B˜ ,W I (τW ) , A′ A′′ ⊗ u ≈δQLD A′ A′′B′ ⊗ u B′′ M where the δQLD statement holds with respect to the state AUX A B EPRq ⊗ and the answer ≈ | i ′ ′ ⊗ | iA′′B′′ summation is over u FM. ∈ q We also recall the meaning of the parameters: q is the field size, m the number of variables, d the degree m of the low-degree code, and M = 2 is the number of EPRq states being tested for. This appendix is structured as follows. First, we| establishi some preliminary facts and definitions in Appendix A.1. In the remainder of the appendix, we show Theorem A.1 by showing how to construct a representation of the Pauli group on the provers’ Hilbert space, starting from a strategy S that succeeds GPAULI in qldparams with high probability. This is done in several stages: in Appendix A.2, we use the strategy measurements to define a set of observables that correspond to a subset of the Pauli X and Z observables, and show that these approximately satisfy the associated Pauli group relations. In Appendix A.3, we adjoin ancillary registers containing EPR states to the provers’ state ψ , and define a new set of approximately- commuting observables acting on the expanded state. In Appe| ndixi A.4, we combine the approximately- commuting observables into a strategy for the classical low-degree test, and in Appendix A.5, we apply the soundness theorem of the classical low-degree test to construct a single projective measurement that simultaneously measures all of the approximately-commuting observables. Finally, in Appendix A.6, we use this single simultaneous measurement to construct an exact representation of the Pauli group, show that GPAULI it is close to the original strategy observables on the expanded state, and deduce that the game qldparams is a self-test for the strategy S PAULI.

A.1 Preliminaries F Definition A.2. Given integer d, m and a prime power q we denote by idegd,m( q) the set of polynomials in m variables with individual degree at most d in each variable. When the field is clear from context we write idegd,m. When the number of variables is clear from context we write idegd. We denote by degd the set of polynomials with total degree at most d.

164 FM Recall the low-degree code defined in Section 3.4, in which vectors h q get mapped to polynomials Fm m ∈ gh idegd,m( q ) such that for every u 0, 1 , gh(u) = hu where the coordinates of h are indexed ∈ ∈F {m } FM by binary strings of length m. For x q , recall the definition of the vector indm(x) q from ∈ k ∈ Section 3.4. Recall also the finite field trace tr( ) = trq 2( ) : Fq F2, where q = 2 for odd k, defined in Section 3.3.1. · → · → m 2 2 Call a tuple ω =(uX, uZ, rX, rZ) (F ) (F ) anticommuting if the quantity ∈ q × q γ = tr((rX ind (uX)) (rZ ind (uZ))) · m · · m introduced in Equation (52) is not 0, and commuting if γ = 0. Fm 2 Fact A.3. The probability that a tuple ω = (uX, uZ, rX, rZ) sampled uniformly at random from ( q ) 2 × (Fq) is anti-commuting is at least

m 1 md 1 3md 1 (1 q− ) (1 q− ) 1 1 . − · − · − q · 2 ≥ − q · 2     The probability that ω is commuting is also at least 1 3md 1 . − q · 2   Proof. First, note that indm(uX)= 0 if and only if uX = 0. If uX = 0 then ω is commuting. Otherwise, if uX = 0, then observe that for all rX, rZ, uZ, 6 tr((rX ind (uX)) (rZ ind (uZ))) = tr(rXrZ(ind (uX) ind (uZ))) · m · · m m · m = tr((rXrZ) g (uZ)) , · indm(uX) where is a nonzero polynomial in F . gindm(uX) idegd,m( q) If rX = 0, then ω is commuting. Condition on rX = 0. Since g = 0 has total degree at most md, 6 indm(uX) 6 by the Schwartz-Zippel lemma the probability that gindm(uX)(uZ) is zero over a uniformly random choice of uZ is at most md/q. Conditioned on g (uZ) = 0, the product (rXrZ) g (uZ) is a uniformly indm(uX) 6 · indm(uX) random element of Fq when rZ is chosen uniformly at random. m 1 The probability that uX = 0, rX = 0, g (uZ) = 0 is at least (1 q ) (1 q ) (1 md/q) 6 6 indm(uX) 6 − − · − − · − since uX, rX, uZ are chosen independently. Since q = 2t is an admissible field size, we have that for all a F , the trace of a is equal to ∑ a ∈ q i i where (a ,..., a )= κ(a) Ft are defined in Section 3.3.1. Thus the probability that a uniformly random 1 t ∈ 2 element of Fq has trace 0 is exactly 1/2. In this appendix we often use the “ ” notation to specify closeness of general operators (not necessarily ≈δ POVMs) that are indexed by questions but not answers: given sets of operators Ax and Bx indexed by { } { } questions x , we write Ax Bx to denote ∈X ≈δ E ψ (Ax Bx)†(Ax Bx) ψ δ x µh | − − | i ≤ ∼ where µ is a question distribution and ψ is a state that have been fixed by context. This will be useful for expressing closeness of observables, for| example.i The following two lemmas relate the closeness between sets of operators and weighted sums of those operators. Lemma A.4. Let Ax and Bx be operators indexed by some finite set , and let µ be a probability { } { } X distribution over . Let α be a set of complex numbers such that α 1. Then if Ax Bx on X { x} | x| ≤ ≈δ average over x drawn from µ, then A B, where A = E α Ax and B = E α Bx. ≈δ x x x x 165 Proof. Expand

2 2 x x x x E αx A B ψ E (A B ) ψ x − | i ≤ x k − | ik    E (Ax Bx) ψ 2 ≤ x k − | ik δ . ≤ The first inequality is the triangle inequality and the second inequality is Jensen’s inequality.

x x Lemma A.5. Let be a finite set. Let Aa a and Ba a be POVMs and let αa a be a collection A { }x ∈A { x} ∈A x x { } ∈A of complex numbers on the unit circle. Let A = ∑a αa Aa and B = ∑a αa Ba . Then

Ax Bx implies Ax Bx a ≈δ a ≈δ′ for δ = δ. ′ |A| · Proof. Expand

2 x x 2 x x E (A B ) ψ = E ∑ αa(Aa Ba ) ψ x k − | ik x a − | i 2 x x E ∑ (Aa Ba ) ψ ≤ x a k − | ik   x x 2 E ∑ (Aa Ba ) ψ ≤ x |A| · a k − | ik δ . ≤ |A|· The first inequality follows from the triangle inequality, the second inequality follows from Cauchy-Schwarz, and the last inequality follows from the assumption that Ax Bx. a ≈δ a The following lemma shows that POVMs that are approximately projective (in the sense of being self- consistent) are close to exact projective measurements. The lemma first appears in [KV11]; for a self- contained proof, see [JNV+20].

Lemma A.6 (Orthonormalization lemma). Let 0 δ 1. There exists a function η (δ) = O δ1/4 ≤ ≤ ortho such that the following holds. Let ψ be a bipartite state on A B where A, B are finite dimen- | i H ⊗H H H  sional. Let Q be a POVM and { a} Q IB IA Q . (164) a ⊗ ≃δ ⊗ a Then there exists a projective measurement P such that { a}

P IA Q IA . (165) a ⊗ ≈ηortho a ⊗ A.2 Strategies Let qldparams = (q, m, d) be an admissible parameter tuple (Definition 7.12) and let S = ( ψ , M) be a | i strategy for GPAULI that succeeds with probability at least 1 ε, where ψ is a bipartite state on registers qldparams − | i A and B, with associated Hilbert spaces A and B respectively. For notational convenience we use M to denote the operators for both players; it willH be clearH from context which player’s operators we are referring

166 (POINT,W),u (POINT,W),u to (for example, we write Ma IB to indicate that Ma is viewed as an operator acting on register A). ⊗ Using Naimark’s theorem (as formulated in [NW19, Theorem 4.2]), at the cost of extending the state ψ by adding sufficiently many qubits initialized in the 0 state we may assume that the measurements in S| iare projective. Let | i ψ′ =( ψ 0 0 0 0 )AB | i | i ⊗ | · · · i ⊗ | · · · i be the extended state. In the remainder of this section, we will work with the projective strategy on the extended state, which we continue labeling ψ (instead of ψ′ ) for notational convenience. We introduce several additional notational| i shorthands.| Byi definition the strategy S contains mea- surement operators for each of the possible question types and content summarized in Fig. 7, such as (POINT,W),y Fm Ma a Fq for all y q , etc. For “line” questions (of type ALINE or DLINE) we will use { } ∈ ∈ LINE as a formal placeholder for either type ALINE or DLINE; which type is meant will be clear from the nature of the line. Moreover, we will use ℓ to denote the description of either an axis-parallel line (given by a pair ℓ = (u0, s)) or a diagonal line (given by a triple ℓ = (u0, s, v)). We also introduce the following collection of observables constructed from the measurement operators in the strategy. For every u Fm, ∈ q r F , and W X, Z , define ∈ q ∈ { } r tr(ar) (POINT,W),u W (u)= ∑ ( 1) Ma . (166) a F − ∈ q

r (POINT,W),u The observable W (u) acts on register A or B depending on whether Ma is viewed as being an operator acting on register A or B; this will be made clear from context. Recall the definitions of (anti-)commuting tuple given at the start of Section A.1.

S GPAULI Lemma A.7. Assume that succeeds in qldparams with probability at least 1 ε, for some ε 0. Then the following hold: − ≥

GPAULI 1. (Consistency check) On average over (t, x) sampled from the question distribution of qldparams, we t,x t,x have that M IB IA M . a ⊗ ≃ε ⊗ a 2. (Low-degree check) For all W X, Z , on average over (ℓ, u) drawn from the line-point dis- ∈ { (}LINE,W),ℓ (POINT,W),u tribution (see Definition 7.6), we have M IB ε IA Ma , where the answer [evalu( )=a] ⊗ ≃ ⊗ summation is over a F . · ∈ q Fm (POINT,W),u 3. (Pauli basis consistency check) For all W X, Z , on average over u q , we have Ma (PAULI,W) (PAULI,W)∈ { } (PAULI,W)∈ ⊗ I I M where M = ∑ M M and g is the low-degree en- B ε A [g (u)=a] [g (u)=a] h Fq :g (u)=a h h ≃ ⊗ h h ∈ h coding of h.

4. (Commutation check) For all W X, Z , on average over commuting ω = (uX, uZ, rX, rZ), we (PAIR,W),ω ∈PAIR { ,ω } have Ma IB ε IA M , where the answer summation is over a F2. ⊗ ≃ ⊗ [βW =a] ∈

5. (Commutation consistency check) For all W X, Z , on average over commuting ω =(uX, uZ, rX, rZ), ∈ { } we have (POINT,W),uW (PAIR,W),ω, where the answer summation is over F . M[tr( r )=a] IB ε IA Ma a 2 · W ⊗ ≃ ⊗ ∈

167 6. (Magic square check) For any anticommuting ω = (uX, uZ, rX, rZ) let Λω to be the value of the VARIABLE ,ω strategy ψ , MCONSTRAINTi,ω M j in the game GMS. Then | i { α1,α2,α3 } ∪ { a }   E Λω = 1 O(ε) . ω : tr((rX uX) (rZ uZ))=0 − · · · 6

7. (Magic square consistency check) For all W X, Z , on average over anticommuting ω = ∈ { } (uX, uZ, rX, rZ),

(POINT,X),uX VARIABLE ,ω M IB IA M 1 , (167) [tr( rX)=a] ε a · ⊗ ≃ ⊗ (POINT,Z),uZ VARIABLE ,ω M IB IA M 5 , (168) [tr( rZ)=a] ε a · ⊗ ≃ ⊗

where the answer summations are over F2.

Furthermore, all of the approximations above hold with “ ε” instead of “ ε”, and they also hold with the tensor factors interchanged. ≈ ≃

Proof. Since the question types are sampled according to the type graph GPAULI of finite size in Fig. 6 in the GPAULI PAULI game qldparams, the probability of any of the subtests of qldparams (described in Fig. 7) being executed is at least some universal constant. Thus the consistency condD itions expressed in Items 1, 2, 3 of the lemma follow directly from the corresponding tests in Fig. 7 (where Item 2 follows from the consistency test that is LD S a subtest of ldparams) and the fact that the strategy is assumed to succeed with probability at least 1 ε GPAULI D − in qldparams. For the remaining items, recall that by Fact A.3 the probability that a tuple ω =(uX, uZ, rX, rZ) sampled as question content for a question of type PAIR, (PAIR, W),CONSTRAINTi or VARIABLEj is commuting is at least (1 3md/q) (1/2), which is at least 1/4 assuming 6md q; the latter condition is without loss − · ≤ of generality provided that the constant a in Theorem 7.14 is at least 6 (as otherwise δ 1 and the theorem ≥ statement is vacuous). The same lower bound holds for the probability that ω is anticommuting. Therefore, the consistency conditions in Items 4, 5 and 7 of the lemma also follow directly from the corresponding tests in Fig. 7. Finally, Item 6 in the lemma follows from the fact that the Magic Square check in Fig. 7 is executed with constant probability over the choice of a pair of questions. That all of the approximations in the Lemma statement also hold with ε replaced by ε is immediate from Fact 5.18. ≃ ≈

Lemma A.8. The following hold for the observables defined in (166).

For all W X, Z , r F , on average over uniformly random u Fm we have • ∈ { } ∈ q ∈ q r r W (u) IB IA W (u) (169) ⊗ ≈ε ⊗

On average over uniformly random ω =(uX, uZ, rX, rZ), •

rX rZ X (uX)Z (uZ) IB ⊗ tr((rX uX) (rZ uZ )) rZ rX ( 1) · · · Z (uZ)X (uX) IB . (170) ≈√ε − ⊗ Furthermore, all approximations above hold with the tensor factors interchanged.

168 Proof. We first establish (169). Fix a W X, Z and r F . From item 1 of Lemma A.7 and the data ∈ { } ∈ q processing inequality (Fact 5.24), using that the question type (POINT, W) has constant probability of being sampled by PAULI, we get that on average over u F , S ∈ q (POINT,W),u (POINT,W),u M[tr( r)=a] IB ε IA M[tr( r)=a] , · ⊗ ≈ ⊗ · where the answer summation is over a F . Equation (169) follows directly from Lemma A.5. ∈ 2 We now establish (170). We have that on average over commuting ω = (uX, uZ, rX, rZ) each of the approximate consistency relations hold, for any W X, Z : ∈ { } (POINT,W),uX (PAIR,W),ω M IB IA M (Item 5 of Lemma A.7) [tr( rX)=a] ε a · ⊗ ≃ ⊗ PAIR,ω ε M IB (Item 4) ≃ [βW =a] ⊗ PAIR,ω ε IA M . (Item 1 and Fact 5.24) ≃ ⊗ [βW =a] Fact 5.18 then implies that on average over commuting ω,

(POINT,W),uX PAIR,ω (171) M[tr( r )=a] IB ε IA M[β =a] . · W ⊗ ≈ ⊗ W Applying Lemma 5.25, since MPAIR,ω is projective it follows from (171) for W = X and for W = Z that { βX,βZ }

(POINT,X),uX (POINT,Z),uZ (POINT,Z),uZ (POINT,X),uX M M IB M M IB (172) [tr( rX)=a] [tr( rZ)=a] ε [tr( rZ)=a] [tr( rX)=a] · · ⊗ ≈ · · ⊗ t,x on average over commuting ω. Then for every ω = (uX, uZ, rX, rZ), since the measurements Ma are projective we get that { }

rX rZ (POINT,X),uX (POINT,Z),uZ X (uX)Z (uZ)= I 2M I 2M [tr( rX)=1] [tr( rZ)=1] − · − ·  (POINT,X),u  (POINT,Z),u (POINT,X),u (POINT,Z),u = I 2M X 2M Z + 4M X M Z . (173) [tr( rX)=1] [tr( rZ)=1] [tr( rX)=1] [tr( rZ)=1] − · − · · · Using (172) we deduce from (173) that on average over commuting ω,

rX rZ rZ rX X (uX)Z (uZ) Z (uZ)X (uX). ≈ε This shows the “commuting” part of (170). Now we consider the case that ω is anticommuting. Item 7 of Lemma A.7, combined with Lemma A.5, implies that the following two approximations hold on average over anticommuting ω:

rX VARIABLE ,ω X (uX) IB IA M 1 , (174) ⊗ ≈ε ⊗ rZ VARIABLE ,ω Z (uZ) IB IA M 5 , (175) ⊗ ≈ε ⊗ VARIABLE ,ω VARIABLE ,ω where for j 1, 5 , MVARIABLEj,ω = M j M j . ∈ { } 0 − 1 For all anticommuting ω, let ε = 1 Λ , where Λ is defined in item 6 of Lemma A.7. From ω − ω ω item 6 of Lemma A.7 we get that E anticomm. ε O(ε). Theorem 7.10 combined with Jensen’s inequality ω ω ≤ implies that on average over anticommuting ω,

VARIABLE ,ω VARIABLE ,ω VARIABLE ,ω VARIABLE ,ω IA M 1 M 5 IA M 5 M 1 . (176) ⊗ ≈√ε − ⊗ 169 Using 5.19 twice we get from (174) and (175) that on average over anticommuting ω,

rX rZ VARIABLE ,ω VARIABLE ,ω X (uX)Z (uZ) IB IA M 1 M 5 , (177) ⊗ ≈ε ⊗ rX rZ VARIABLE ,ω VARIABLE ,ω Z (uX)X (uZ) IB IA M 5 M 1 . (178) ⊗ ≈ε ⊗ Using (176) we get that on average over anticommuting ω, we have

rX rZ rZ rX X (uX)Z (uZ) IB Z (uZ)X (uX) IB . ⊗ ≈√ε − ⊗ This shows the “anticommuting” part of (170).

A.3 Expanding the Hilbert space and defining commuting observables

q M Expanded state. We introduce registers A′, A′′, B′, B′′ that are each isomorphic to (C )⊗ . Define the state M M ψˆ AA A BB B =( ψ )AB EPRq ⊗ EPRq ⊗ , (179) | i ′ ′′ ′ ′′ | i ⊗ | iA′ A′′ ⊗ | iB′ B′′ where we have added maximally entangled states in registers A′A′′ and B′B′′; recall also that the state ψ is already implicitly appended with sufficiently many ancilla qubits initialized in the 0 state. For the remainder| i of the Appendix all approximations will be evaluated with respect to this state. | i

Expanded observables. For all W X, Z , r F , and u Fm define the following observable ∈ { } ∈ q ∈ q Wˆ r(u): Wˆ r(u)= Wr(u) τW (r ind (u)) , (180) ⊗ · m r W where W (u) is the observable defined in Equation (166) and τ (r indm(u)) is the generalized Pauli q M · observable (see Section 3.7) acting on (C )⊗ . For a F define corresponding projections ∈ q (POINT,W),u tr(ar) r Mˆ a = E ( 1) Wˆ (u) . r F − ∈ q (POINT,W),u Note that Mˆ a can be equivalently written as a sum of projectors by applying Fourier identities. Specifically, for W X, Z , u Fm, and a F , define the projector ∈ { } ∈ q ∈ q τW,u = τW = τW . (181) a [g (u)=a] ∑ [h indm(u)=a] · h FM · ∈ q (POINT,W),u Then we can write Mˆ a as follows:

(POINT,W),u tr(ar) r W Mˆ a = E ( 1) W (u) τ (r indm(u)) r F − ⊗ · ∈ q (POINT ) = E ( 1)tr(ar) ∑ ( 1)tr(a′r) M ,W ,u ∑ ( 1)tr(a′′r)τW,u F a′ a′′ r q − a F − ⊗ a F − ∈  ′∈ q   ′′∈ q  (POINT ) = ∑ E ( 1)tr((a+a′+a′′)r) M ,W ,u τW,u F a′ a′′ a ,a F r q − ⊗ ′ ′′∈ q ∈ (POINT ) = ∑ M ,W ,u τW,u . a′ a′′ a ,a F ⊗ ′ ′′∈ q a′+a′′=a

170 For a fixed u Fm, r F and b F , define a projection ∈ q ∈ q ∈ 2 ˆ (POINT,W),u,r ˆ (POINT,W),u Mb = ∑ Ma , (182) a F :tr(ar)=b ∈ q which can be equivalently written as

(POINT ) 1 Mˆ ,W ,u,r = I +( 1)bWˆ r(u) . (183) b 2 −   ˆ r ˆ (POINT,W),u ˆ (POINT,W),u,r The registers that the operators W (u), Ma , and Mb act on depend on context. For example if Wr(u) is viewed as an operator acting on register A, then we will view Wˆ r(u) as acting on W registers AA′ (with the τ observable acting on register A′). We generally indicate via subscripts which registers the operators act on (for example we will write ˆ r or ˆ r ). (W (u))AA′ (W (u))BA′′ Lemma A.9. There exists a function δ(ε)= poly(ε) such that the following holds.

1. (Self-consistency) For all W X, Z , on average over r F and u Fm, we have ∈ { } ∈ q ∈ q (POINT ) (POINT ) (Mˆ ,W ,u) (Mˆ ,W ,u) . a AA′ ≈ε a BA′′

2. (Approximate commutation) On average over uniformly random ω =(uX, uZ, rX, rZ), we have

(POINT,X),uX,rX (POINT,Z),uZ,rZ (POINT,Z),uZ,rZ (POINT,X),uX,rX Mˆ Mˆ δ Mˆ Mˆ . b b′ ≈ b′ b

Furthermore, all the approximations above hold when the registers A, A′, A′′ are exchanged with B, B′, B′′ respectively.

(POINT,W),u Proof. Item 1, the self-consistency of Mˆ a , follows from the fact that the original points mea- (POINT,W),u { } surements Ma are approximately self-consistent between registers A and B in ψˆ (item 1 in { } W | i Lemma A.7), the Pauli projections τ (ind (u)) are perfectly self-consistent on between registers A′ { a m } and A′′, and Fact 5.24. We now establish the approximate commutation relations. On average over uniformly random ω = (uX, uZ, rX, rZ), we have

(POINT X) (POINT Z) Mˆ , ,uX,rX Mˆ , ,uZ,rZ b b′

1 b rX b′ rZ = I +( 1) Xˆ (uX) I +( 1) Zˆ (uZ) 4 − −    1 b rX b′ rZ b+b′ rX rZ = I +( 1) Xˆ (uX)+( 1) Zˆ (uZ)+( 1) Xˆ (uX)Zˆ (uZ) , 4 − − −   where the first equality follows from (183). Note that

rX rZ Xˆ (uX)Zˆ (uZ)

rX rZ X Z = X (uX)Z (uZ) τ (rX uX)τ (rZ uZ) ⊗ · · rZ rX Z X Z (uZ)X (uX) τ (rZuZ)τ (rX uX) ≈√ε ⊗ · rZ rX = Zˆ (uZ)Xˆ (uX) ,

171 where the first equality follows from (180) and for the second line we used the approximate (anti-)commutation r r relation (170) for the X X (uX) and Z Z (uZ) observables, as well as the exact (anti-)commutation relations Z X for the τ (rZ uZ) and τ (rX uX) observables. This shows the desired approximate commutation rela- tion. · ·

Lemma A.10. There is exists a function δ(ε)= poly(ε) such that the following holds. For all W X, Z (LINE,W),ℓ ∈ { } and lines ℓ there exists a projective measurement Mˆ ℓ , acting on registers AA′ or BA′′, f f degmd( ) such that: { } ∈

1. (Self-consistency) For all W X, Z and r Fq, on average over a uniformly random pair (ℓ, u) drawn from the line-point distribution∈ { } (Definition∈ 7.6),

(LINE ) ℓ (LINE ) ℓ (Mˆ ,W , ) (Mˆ ,W , ) . f AA′ ≈δ f BA′′

2. (Consistency with points measurements I) On average over a uniformly random pair (ℓ, u) drawn from the line-point distribution,

(LINE ) ℓ (LINE ) ℓ (POINT ) (Mˆ ,W , ) (Mˆ ,W , ) (Mˆ ,W ,u) . f AA′ ≈δ f AA′ ⊗ f (u) BA′′

3. (Consistency with points measurements II) On average over a uniformly random pair (ℓ, u) drawn from the line-point distribution,

ℓ ( ˆ (LINE,W), ) ( ˆ (POINT,W),u) M[eval ( )=a] AA′ δ Ma BA′′ . u · ≈

Furthermore, all the approximations above hold when the registers A, A′, A′′ are exchanged with B, B′, B′′ respectively.

Proof. Define the operator

(LINE ) ℓ (LINE ) ℓ ℓ Mˆ ,W , = ∑ M ,W , τW, , f f ′ f ′′ f , f deg (ℓ): ⊗ ′ ′′∈ md f ′+ f ′′= f

W,ℓ where the Pauli operator τ acting on register A′ is obtained by measuring all M qudits in the basis W to f ′′ obtain an outcome h FM and then returning the restriction f of g : Fm F to the line ℓ. Here, g is ∈ q ′′ h q → q h the degree-d low-degree extension of h defined in (12). LINE ℓ Item 1, the self-consistency of Mˆ ( ,W), , follows from the fact that the original lines measure- { f } (LINE ) ℓ ment M ,W , is self-consistent between registers A and B in ψˆ (Item 1 in Lemma A.7), the Pauli { f } | i W,ℓ projections τ are perfectly self-consistent on between registers A′ and A′′, and Fact 5.24. { f ′′ } Item 2, the consistency with the points measurements, follows because on average over (ℓ, u) drawn

172 from the line-point distribution we have ℓ ˆ (LINE,W), (184) (M f )AA′ (LINE ) ℓ ℓ = ∑ (M ,W , τW, ) (185) f ′ f ′′ AA′ f , f deg (ℓ): ⊗ ′ ′′∈ md f ′+ f ′′= f (LINE,W),ℓ W,ℓ (POINT,W),u W,u ε ∑ (M τ )AA (M τ )BA (186) f ′ f ′′ ′ f ′(u) f ′′(u) ′′ ≈ f , f deg (ℓ): ⊗ ⊗ ⊗ ′ ′′∈ md f ′+ f ′′= f (LINE,W),ℓ W,ℓ (POINT,W),u W,u = ∑ (M τ )AA ∑ M τ (187) f ′ f ′′ ′ a′ a′′ BA f , f deg (ℓ): ⊗ ⊗ a ,a : ⊗ ′′ ′ ′′∈ md  ′ ′′  f ′+ f ′′= f a′+a′′= f ′(u)+ f ′′(u) (LINE ) ℓ (POINT ) =(Mˆ ,W , ) (Mˆ ,W ,u) . (188) f AA′ ⊗ f (u) BA′′ The first approximation follows from the consistency between the line and point measurements (Item 2 in ℓ Lemma A.7) as well as the (exact) consistency between the τW, and τW,u measurements. The third { f ′′ } { f ′′(u)} line follows because the consistency of the Pauli measurements force a′ = f ′(u) and a′′ = f ′′(u) . This establishes Item 2. To show Item 3 we first apply Fact 5.24 to Item 2 to obtain ℓ ℓ ( ˆ (LINE,W), ) ( ˆ (LINE,W), ) ( ˆ (POINT,W),u) (189) M[eval ( )=a] AA′ δ M[eval ( )=a] AA′ Ma BA′′ . u · ≈ u · ⊗ (LINE ) ℓ (POINT ) Since both the Mˆ ,W , and Mˆ ,W ,u measurements are projective, we also get { f } { a } LINE ℓ I = ∑(Mˆ ( ,W), ) [evalu( )=a] AA′ a · (LINE,W),ℓ (LINE,W),ℓ (POINT,W),u = ∑ (Mˆ ) (Mˆ ) (Mˆ a ) [evalu( )=a] AA′ [evalu( )=a] AA′ BA′′ a · − · ⊗   (LINE,W),ℓ (POINT,W),u + (Mˆ ) (Mˆ a ) . (190) ∑ [evalu( )=a] AA′ BA′′ a · ⊗ For projective sub-measurements A and B and a state ϕ , { i} { i} | i ∑ ϕ Ai Bi ϕ = ∑ ϕ Ai(Ai Bi) ϕ + ∑ ϕ (Ai Bi)Bi ϕ i h | − | i i h | − | i i h | − | i  1/2 1/2 2 2 ∑ ϕ A ϕ ∑ ϕ (Ai Bi) ϕ ≤ h | i | i h | − | i  i   i  1/2 1/2 + ∑ ϕ (A B )2 ϕ ∑ ϕ B2 ϕ h | i − i | i h | i | i  i   i  1/2 2 2 ∑ ϕ (Ai Bi) ϕ . ≤ h | − | i  i  (LINE,W),ℓ (LINE,W),ℓ Applying this fact to the projective sub-measurements (Mˆ )AA a and (Mˆ )AA { [evalu( )=a] ′ } { [evalu( )=a] ′ ⊗ (POINT ) · · (Mˆ ,W ,u) in (190) and using (189) gives a BA′′ } (LINE,W),ℓ (POINT,W),u I (Mˆ ) (Mˆ a ) . (191) √δ ∑ [evalu( )=a] AA′ BA′′ ≈ a · ⊗

173 To go from (191) to Item 3 of the lemma we use another fact about projective measurements. For families of projective measurements Ax and Bx acting on separate registers of a bipartite state φ , suppose { i } { i } | i I ∑ Ax Bx. Then ≈δ i i ⊗ i 2 x x δ E I ∑ Ai Bi φ ≥ x − ⊗ | i  i  2 x x x x = E 1 + φ ∑ Ai Bi φ 2 ∑ φ Ai Bi φ x h | ⊗ | i− h | ⊗ | i   i  i  x x = 1 E ∑ φ Ai Bi φ , − x i h | ⊗ | i where we used the projectivity in going from the second to the third line, to remove the square. Thus

x x E ∑ φ Ai Bj φ δ , x i=jh | ⊗ | i≤ 6 x x x x and Ai I δ I Bi . By Item 1 of Fact 5.18, this in turn implies that Ai I δ I Bi . Applying this ⊗ ≃ ⊗ ℓ ⊗ ≈ ⊗ fact to the projective measurements ( ˆ (LINE,W), ) and ( ˆ (POINT,W),u) in (191) gives M[eval ( )=a] AA′ a Ma BA′′ { u · } { }

LINE ℓ POINT (Mˆ ( ,W), ) (Mˆ ( ,W),u) [eval ( )=a] AA′ δ a BA′′ u · ≈ as desired.

A.4 Combining the X and Z measurements In this section our goal is to create a set of combined measurements that simultaneously measure the approximately-commuting measurements constructed in the previous section. We do this first for the points. Recall that A and B denote the Hilbert spaces corresponding to registers A and B, respectively. H H Lemma A.11. There is exists a function δQ(ε) = poly(ε) such that the following holds. For every x, z Fm ˆ x,z Cq M ∈ q and A, B there are projective measurements Q a,b F acting on ( )⊗ such that: H∈{H H } { a,b } ∈ q H⊗ 1. (Self-consistency) On average over uniformly random x, z Fm, ∈ q (Qˆ x,z) (Qˆ x,z) . (192) a,b AA′ ≈δQ a,b BA′′

2. (Consistency with Mˆ ) On average over uniformly random x, z Fm, ∈ q

(POINT ) (POINT ) (Qˆ x,z) (Mˆ ,X ,x Mˆ ,Z ,z) , (193) a,b AA′ ≈δQ a b BA′′ (POINT ) (POINT ) (Qˆ x,z) (Mˆ ,Z ,z Mˆ ,X ,x) . (194) a,b AA′ ≈δQ b a BA′′

Furthermore, all the approximations above hold when the registers A, A′, A′′ are exchanged with B, B′, B′′ respectively.

174 The construction of the measurements Qˆ x,z whose existence is guaranteed in the lemma proceeds { a,b } in three steps. In the first step, for every (r, s) F2 we combine the binary-outcome measurements ∈ q ˆ (POINT,X),x,r ˆ (POINT,X),z,s Ma and Mb ; this is made possible by their approximate commutation as shown in Lemma A.9. In a second step, we show that the measurements thus constructed have a particular “ap- proximate linearity” property, when seen as a function of the pair (r, s). This allows us, in the third step, to combine them into a single measurement ˆ x,z with outcomes in F2 F2t by applying the following Qa,b q ∼= 2 result from [NV17]: { }

Theorem A.12 (Quantum linearity test). Let 0 δ 1, let t be a positive integer, and let ψ be a state in u ≤ ≤ Ft | i a Hilbert space A B. Let be a set of observables indexed by u 2 that act on A. Suppose H ⊗H {O } Ft ∈ H that on average over u, u′ chosen uniformly from 2, we have the following “approximate linearity” holds:

u u′ u+u′ ( )A ( )A O O ≈δ O where the statement is taken with respect to the state . Then there exists an extended state ψ ψ′ AA′ B = ≈ in an extended space | andi observables u acting on | i such ψ AB 0 A′ A B A′ A A′ | i ⊗ | i Ft H ⊗H ⊗H {L } H ⊗H that for all u, u′ 2, ∈ u u u+u u u ′ = ′ and ( ) ( )A . L L L L AA′ ≈δ O Furthermore, the observable u = I when u = 0. L We now give the proof of Lemma A.11.

Proof of Lemma A.11. For convenience in the proof we slightly abuse the notation to mean for ≈δ ≈poly(ε) some polynomial function δ = poly(ε) that may differ each time the notation is used. First step: construction of projective binary measurements Rˆ ω . For every ω = (x, z, r, s) and { a,b} a, b F , define a measurement operator ∈ 2 (POINT Z) (POINT X) (POINT Z) Rˆ ω = Mˆ , ,z,s Mˆ , ,x,r Mˆ , ,z,s . a,b b · a · b ω It is clear that Rˆ a,b F is a POVM. Observe that on average over uniformly random ω, { a,b} ∈ 2

(POINT Z) (POINT X) Rˆ ω Mˆ , ,z,s Mˆ , ,x,r , (195) a,b ≈√ε b · a

(POINT Z) which follows from the approximate commutation relation of Lemma A.9 and the projectivity of Mˆ , ,z,s , { b } together with Fact 5.19. Next we show that Rˆ ω is approximately self-consistent: we aim to show that on { a,b} average over ω, it holds that (Rˆ ω ) (Rˆ ω ) for some polynomial function δ = poly(ε). To show a,b AA′ ≃δ a,b BA′′

175 this, we perform the following calculation: (recall the state ψˆ defined in (179)) | i E ˆ ˆ ω ˆ ω ˆ ∑ ψ (Ra,b)AA (Ra,b)BA ψ ω ′ ′′ a,b h | ⊗ | i E ˆ ˆ (POINT,Z),z,s ˆ (POINT,X),x,r ˆ ω ˆ ε1/4 ∑ ψ (M Ma )AA (Ra,b)BA ψ (196) ω b ′ ′′ ≈ a,b h | · ⊗ | i ˆ ˆ (POINT,Z),z,s ˆ (POINT,X),x,r ˆ (POINT,Z),z,s ˆ (POINT,X),x,r ˆ ε1/4 E ∑ ψ (M Ma )AA (M Ma )BA ψ (197) ω b ′ b ′′ ≈ a,b h | · ⊗ · | i ˆ ˆ (POINT,Z),z,s ˆ (POINT,Z),z,s ˆ (POINT,X),x,r ˆ √ε E ∑ ψ (M )AA (M Ma )BA ψ (198) ω b ′ b ′′ ≈ a,b h | ⊗ · | i (POINT,Z),z,s (POINT,Z),z,s = E ∑ ψˆ (Mˆ )AA (Mˆ )BA ψˆ ω b ′ b ′′ b h | ⊗ | i ˆ ˆ (POINT,Z),z,s ˆ √ε E ∑ ψ (M )AA ψ (199) ω b ′ ≈ b h | | i = 1. The approximation in (196) is derived by bounding the magnitude of the difference:

E ˆ ˆ ω ˆ (POINT,Z),z,s ˆ (POINT,X),x,r ˆ ω ˆ ∑ ψ (Ra,b Mb Ma )AA′ (Ra,b)BA′′ ψ ω h | − · ⊗ | i a,b

ω (POINT,Z),z,s (POINT,X),x,r 2 ω 2 E ∑ ψˆ (Rˆ Mˆ Mˆ a ) ψˆ E ∑ ψ (Rˆ ) ψ ω a,b b AA′ ω a,b BA′′ ≤ a b h | − · | i · a b h | | i s , r , √ε1/2 √1, ≤ · where the second line follows from Cauchy-Schwarz, and the third line follows from (195) and the fact ˆ ω 2 that ∑a,b(Ra,b) I. The approximation in (197) follows from a similar calculation. The approximation in (198) is derived≤ by bounding the magnitude of the difference:

E ˆ (POINT,Z),z,s ˆ (POINT,X),x,r ˆ (POINT,Z),z,s ˆ (POINT,X),x,r ∑ ψ (Mb (I Ma ))AA′ (Mb Ma )BA′′ ψ ω h | · − ⊗ · | i a,b

OINT OINT OINT OINT E ˆ (P ,Z),z,s ˆ (P ,Z),z,s ˆ (P ,X),x,r ˆ (P ,X),x,r = ψ ∑ Mb Mb ∑(I Ma ) Ma ψ ω h | ⊗ · a − ⊗ | i  b   

(POINT,X),x,r 2 (POINT,X),x,r 2 E ∑ ψ (I Mˆ a ) (Mˆ a ) ψ ≤ ω h | − AA′ ⊗ BA′′ | i r a √ε . ≤ (POINT,X),x,r Here the first inequality follows from the Cauchy-Schwarz inequality, the projectivity of Mˆ a , ˆ (POINT,Z),z,s ˆ (POINT,Z),z,s and the fact that ∑b Mb Mb I. The second inequality follows from the self- POINT X ⊗ ≤ POINT X consistency of Mˆ ( , ),x,r, established in Lemma A.9: combined with Fact 5.19, (Mˆ ( , ),x,r) a a AA′′ ≈ε ˆ (POINT,X),x,r implies ˆ (POINT,X),x,r ˆ (POINT,X),x,r ˆ (POINT,X),x,r . The ap- (Ma )BA′′ (Ma )AA′′ (Ma )BA′′ ε (Ma )BA′′ proximation in (199) follows from a similar calculation.⊗ ≈ This shows that (Rˆ ω ) (Rˆ ω ) , as desired. Combined with Fact 5.18, we have a,b AA′ ≃δ a,b BA′′ (Rˆ ω ) (Rˆ ω ) . (200) a,b AA′ ≈δ a,b BA′′ 176 Finally we apply the orthonormalization Lemma (Lemma A.6) to Rˆ ω in order to obtain projective { a,b} measurements Lˆ ω such that on average over ω, { a,b} Lˆ ω Rˆ ω . (201) a,b ≈δ a,b ˆ ω Second step: approximate linearity relations for Ra,b . We proceed in three steps. First we establish (POINT ) { } linearity relations for Mˆ ,W ,u,r . For all W X, Z , u Fm, r, r F and c F , { b } ∈ { } ∈ q ′ ∈ q ∈ 2

(POINT,W),u,r (POINT,W),u,r′ (POINT,W),u (POINT,W),u ∑ Mˆ Mˆ = ∑ Mˆ a Mˆ b b′ a′ b,b F : a,a F : · ′∈ 2 ′∈ q b+b′=c tr(ar+a′r′)=c (POINT,W),u = ∑ Mˆ a a:tr(a(r+r′))=c

(POINT,W),u,r+r′ = Mˆ c , (202)

(POINT ) where the second line follows from projectivity of Mˆ ,W ,u . { a } Second we show approximate linearity of the Rˆ ω . On average over x, z Fm and r, r , s, s F , { a,b} ∈ q ′ ′ ∈ q ∑ Rˆ x,z,r,s Rˆ x,z,r′,s′ a,b a′,b′ · AA′ a,b,a′,b′:   a+a′=c b+b′=d

x,z,r,s x,z,r′,s′ δ ∑ Rˆ Rˆ (203) a,b a′,b′ ≈ AA′ ⊗ BA′′ a,b,a′,b′:     a+a′=c b+b′=d

(POINT,Z),z,s (POINT,X),x,r (POINT,Z),z,s′ (POINT,X),x,r′ δ ∑ Mˆ Mˆ a Mˆ Mˆ (204) b b′ a′ ≈ · AA′ ⊗ · BA′′ a,b,a′,b′:     a+a′=c b+b′=d

(POINT,Z),z,s (POINT,Z),z,s′ (POINT,X),x,r+r′ δ ∑ Mˆ Mˆ Mˆ c (205) b b′ ≈ AA′ ⊗ · BA′′ b,b′:     b+b′=d

(POINT,Z),z,s (POINT,X),x,r+r′ (POINT,Z),z,s′ δ ∑ Mˆ Mˆ c Mˆ . (206) b b′ ≈ AA′ ⊗ · BA′′ b,b′:     b+b′=d

In each approximation, the answer summation is over c, d F2. Line (203) follows from the self- ˆ ω ∈ consistency of Ra,b shown in (200). Line (204) follows from using (195) twice. Line (205) follows from { } (POINT,X),x,r the self-consistency and linearity properties of Mˆ a shown in Lemma A.9 and (202) respectively. Line (206) follows from the approximate commutativity{ relation} shown in Lemma A.9. Continuing,

ˆ (POINT,X),x,r+r′ ˆ (POINT,Z),z,s+s′ (206) δ Mc Md (207) ≈ · BA′′   ˆ x,z,r+r′,s+s′ δ Rc,d (208) ≈ BA′′   ˆ x,z,r+r′,s+s′ δ Rc,d . (209) ≈ AA′   177 (POINT ) Line (207) follows from self-consistency and linearity properties of Mˆ ,Z ,z,s . Line (208) follows { b } from (195), and line (209) follows from the self-consistency of Rˆ ω from Eq. (200). { a,b} ˆ ω Fm F Third we deduce approximate linearity of the La,b . On average over x, z q , and r, r′, s, s′ q, we have { } ∈ ∈

x,z,r,s x,z,r′,s′ x,z,r,s x,z,r′,s′ ∑ Lˆ Lˆ δ ∑ Lˆ Lˆ a,b a′,b′ a,b a′,b′ AA′ ≈ AA′ ⊗ BA′′ a,b,a′,b′:   a,b,a′,b′:     a+a′=c a+a′=c b+b′=d b+b′=d

x,z,r,s x,z,r′,s′ δ ∑ Rˆ Rˆ a,b a′,b′ ≈ AA′ ⊗ BA′′ a,b,a′,b′:     a+a′=c b+b′=d ˆ x,z,r+r′,s+s′ δ Rc,d ≈ AA′   ˆ x,z,r+r′,s+s′ δ Lc,d . (210) ≈ AA′   The first two approximations follow from the closeness of the Rˆ ω and Lˆ ω guaranteed by (201) as { a,b} { a,b} well as the self-consistency of Rˆ ω shown in (200). The third approximation follows from the linearity { a,b} ˆ ω properties of Ra,b in (206). { } x,z ω a+b ω Third step: construction of ˆ . For each define an observable = ∑ F ( ) ˆ . The Qa,b ω a,b 2 1 La,b { } O ∈ − ω ˆ ω self-consistency of between registers AA′ and BA′′ follows from the self-consistency of the La,b , which is obtained byO combining (200) and (201). We now verify approximate linearity. On average{ over} x, z Fm, and r, r , s, s F , ∈ q ′ ′ ∈ q x,z,r,s x,z,r′,s′ = ∑ ( 1)a+b+a′+b′ Lˆ x,z,r,s Lˆ x,z,r′,s′ O O − a,b a′,b′ a,b,a′,b′

c,c′ x,z,r+r′,s+s′ δ ∑( 1) Lˆ ≈ − c,c′ c,c′ = x,z,r+r′,s+s′ O ˆ ω where the approximation follows from the approximate linearity properties of La,b shown in (210). Iden- F Ft { F}2t Fm tify the field q with the vector space 2, and thus we treat pairs (r, s) as vectors in 2 . For every x, z q , x,z,r,s ∈ we can apply Theorem A.12 to the collection of observables r,s F to obtain the existence of “ex- {O } ∈ q actly linear” observables x,z,r,s that are δ-close to x,z,r,s , on average over the choice of x, z, r, s. By {L } {O } assuming that ψˆ has sufficiently many ancilla zero qubits, we do not need to extend the state further. | i x,z We are ready to define the measurements Qˆ whose existence is claimed in the statement of the { a,b } lemma. For all x, z Fm and a, b F , define ∈ q ∈ q Qˆ x,z = E( 1)tr(ra+sb) x,z,r,s . a,b r,s − L This operator is a projection (and thus positive semidefinite):

ˆ x,z 2 E tr((r+r′)a+(s+s′)b) x,z,r,s x,z,r,s (Qa,b ) = ( 1) r,s,r′,s′ − L L = E ( 1)tr((r+r′)a+(s+s′)b) x,z,r+r′,s+s′ r,s,r′,s′ − L = E( 1)tr(ra+sb) x,z,r,s . r,s − L

178 The second equality follows from the exact linearity of the x,z,r,s . Next, the operators sum to identity: {L } ∑ Qˆ x,z = E ∑( 1)tr(ra+sb) x,z,r,s a,b r,s a,b a,b − L = x,z,0,0 L = I .

ˆ x,z Thus Qa,b forms a projective measurement. We{ establish} self-consistency:

ˆ x,z E tr(ra+sb) x,z,r,s Qa,b = ( 1) AA′ r,s − L AA′     tr(ra+sb) x,z,r,s δ E( 1) ≈ r,s − O AA′   tr(ra+sb) x,z,r,s δ E( 1) ≈ r,s − O BA′′   ˆ x,z δ Qa,b . ≈ BA′′   The first approximation is due to Lemma A.4. The second approximation is due to Lemma A.4 and the self-consistency of the x,z,r,s observables. The third approximation follows from Lemma A.4 again. {O } Finally we show consistency with the (POINT,W),u . {Ma } ˆ x,z E tr(ra+sb) x,z,r,s Qa,b δ ( 1) AA′ ≈ r,s − O BA′′     E tr(ra+sb)+c+d ˆ x,z,r,s = ∑( 1) Lc,d r,s − BA′′  c,d  E tr(ra+sb)+c+d ˆ x,z,r,s δ ∑( 1) Rc,d ≈ r,s − BA′′  c,d  E tr(ra+sb)+c+d ˆ (POINT,Z),z,s ˆ (POINT,X),x,r δ ∑( 1) Md Mc ≈ r,s − BA′′  c,d  E tr(sb)+d ˆ (POINT,Z),z,s E tr(ra)+c ˆ (POINT,X),x,r = ∑( 1) Md ∑( 1) Mc s − · r c − BA′′  d    ˆ (POINT,Z),z ˆ (POINT,X),x = Md Mc . · BA′′   The second line follows from the definition of x,z,r,s . The third line follows from the closeness of Lˆ ω {O } { c,d} and Rˆ ω . The fourth line follows from (195). { c,d} An analogous derivation shows the same properties of Qˆ x,z with the tensor factors switched to BB { a,b } ′ and AB′′, respectively. This concludes the proof of the lemma.

Lemma A.13. There exists a function δP(ε, m, d, q) = poly(ε, md/q) such that the following holds. For ℓX,ℓZ q M A, B and for every pair of lines ℓX and ℓZ there exists a POVM T acting on (C )⊗ H∈{H H } { fX, fZ } H⊗ with answers consisting of pairs of polynomials f , f in deg (ℓ ) deg (ℓ ) such that on average X Z md X × md Z

179 over pairs (ℓX, x) and (ℓZ, z) independently sampled from the line-point distribution,

ℓ ℓ ( X, Z ) ( ˆ x,z) T[eval ( )=a,eval ( )=b] AA′ δP Qa,b BA′′ x · z · ≃ ℓ ℓ (T X, Z ) (Qˆ x,z) [eval ( )=a,eval ( )=b] BB′ δP a,b AB′′ . x · z · ≃

(LINE ) ℓ Proof. We construct the required measurements by combining the measurements Mˆ ,W , W for W fW ∈ X, Z . For convenience in the proof we slightly abuse the notation to mean for some polyno- { } ≈δ ≈poly(ε) mial function δ = poly(ε) that may differ each time the notation is used. (LINE ) ℓ We first argue that the measurements Qˆ x,z obtained in Lemma A.11 are consistent with the Mˆ ,W , W . { a,b } { fW } Applying Fact 5.19 to (194), we get that on average over uniformly random x, z,

ˆ x,z ˆ (POINT,Z),z ˆ (POINT,Z),z ˆ x,z Qa,b Mb = Mb Qa,b AA′ · BA′′ BA′′ · AA′      ˆ (POINT,Z),z ˆ (POINT ,X),x δ Mb Ma ≈ BA′′  ˆ x,z  δ Qa,b . ≈ AA′   Since ˆ x,z is a projective measurement, applying Lemma 5.21 with ˆ x,z playing the role of x and Qa,b (Qa,b )A,A′ Aa { } POINT Z (Qˆ x,z) (Mˆ ( , ),z) playing the role of Bx, we obtain that a,b A,A′ · b BA′′ a ˆ x,z ˆ x,z ˆ (POINT,Z),z Qa δ ∑ Qa,b Mb . (211) AA′ ≈ AA′ · BA′′   b     We now show that on average over uniformly random x, z,

ˆ x,z ˆ (POINT,X),x Qa δQ Ma . (212) AA′ ≈ BA′′     This follows from (211) and the following calculation:

2 E ˆ x,z ˆ (POINT,Z),z ˆ (POINT,X),x ˆ ∑ ∑ Qa,b Mb Ma ψ x,z a AA′ · BA′′ − BA′′ | i b       2

E ˆ (POINT,Z),z ˆ x,z ˆ (POINT,X),x ˆ = ∑ ∑ Mb Qa,b Ma ψ x,z a BA′′ · AA′ − BA′′ | i b        2 E ˆ (POINT,Z),z ˆ x,z ˆ (POINT,X),x ˆ = ∑ Mb Qa,b Ma ψ x,z BA′′ · AA′ − BA′′ | i a,b       

= O(δ) ,

ˆ (POINT,Z),z where the second line uses the property ∑b Mb = I and the third line uses the projectivity of POINT Z Mˆ ( , ),z . This shows (212). A similar argument shows that { b } ˆ x,z ˆ (POINT,Z),z (213) Qb δQ Mb . AA′ ≈ BA′′    

180 Next, from Lemma A.10 we get that on average over (ℓX, x) and (ℓZ, z) sampled independently from the line-point distribution (Definition 7.6),

ℓ ( ˆ (LINE,X), X ) ( ˆ (POINT,X),x) (214) M[eval ( )=a] BA′′ δ Ma BA′′ , x · ≈ LINE ℓ POINT Z (Mˆ ( ,Z), Z ) (Mˆ ( , ),z) (215) [eval ( )=b] BA′′ δ b BA′′ . z · ≈ where we used both the self-consistency of the line measurements (item 1 in Lemma A.10) as well as their ˆ (LINE,W),ℓ ˆ x,z consistency with the points measurements (item 3). Since the M f and Qa,b are projective, together with (212) and (213) by Fact 5.18 we get { } { }

x,z (LINE,X),ℓX x,z (LINE,Z),ℓZ Qˆ δ Mˆ and Qˆ δ Mˆ . (216) a Q [evalx( )=a] b Q [evalz( )=b] AA′ ≃ · BA′′ AA′ ≃ · BA′′         For all lines ℓX, ℓZ and ( fX, fZ) deg (ℓX) deg (ℓZ) define the operator ∈ md × md ℓ ℓ (LINE X) ℓ (LINE Z) ℓ (LINE X) ℓ T X, Z = Mˆ , , X Mˆ , , Z Mˆ , , X . fX, fZ fX · fZ · fX ℓ ℓ Then the collection T X, Z forms a valid POVM. Using the following choices of measurements, { fX, fZ }

(LINE X) ℓ (LINE Z) ℓ ℓ ℓ “Ax,y1,y2 ”: Qˆ x,z , “(G )x”: Mˆ , , X , “(G )x”: Mˆ , , Z , “Jx ”: T X, Z , a1,a2 a,b 1 g fX 2 g fZ g1,g2 fX, fZ Lemma 5.27 implies that ℓ ℓ Qˆ x,z T X, Z , (217) a,b δP [evalx( )=a,evalz( )=b] AA′ ≃ · · BA′′     where δP = poly(ε, md/q) is the δpasting as given by Lemma 5.27, conditions (40) and (41) in the lemma are given by (216) and (192) respectively, and the distribution over (ℓX, ℓZ) are two independently and uniformly chosen lines. This is the exactly first consistency relation in the conclusion of the Lemma. An identical argument with the appropriate modifications to the registers yields the second consistency relation from the Lemma as well.

A.5 Applying the classical low-degree test In the previous section we showed how to combine the approximately commuting X and Z basis point and line measurements into a joint point measurement and a joint line measurement. In this section we show how these measurements can be combined in a single measurement that returns a pair of global polynomials encoding the X-basis and Z-basis information, respectively, by applying the lemma that states quantum soundness of the classical low-degree test, Lemma 7.8. Introduce a “combining” map that takes two polynomials f , g ideg (Fm) and returns a new poly- ∈ d,m q nomial combine ideg (F2m+2) defined by f ,g ∈ d,2m+2 q

combine f ,g( x , z , α , β )= α f (x)+ βg(z) . Fm Fm F q q q F ∈ ∈ ∈ ∈ q |{z} |{z} |{z} |{z} ℓ F2m+2 This combining map can also be defined for polynomials restricted to lines. Let be a line in q with ℓ ℓ Fm intercept u = (uX, uZ, uα, uβ) and slope v = (vX, vZ, vα, vβ). Let X and Z be lines in q such that for

181 any point (x, z, α, β) ℓ, it holds that x ℓ and z ℓ . Then for any two polynomials f deg (ℓ ) ∈ ∈ X ∈ Z ∈ dm X and g deg (ℓ ), we define combine deg (ℓ) by ∈ dm Z f ,g ∈ dm+1 combine (u + t v)=(u + t v ) f (u + t v )+(u + t v )g(u + t v ) . (218) f ,g · α · α X · X β · β Z · Z In the following two lemma we define combined point and line measurements according to this definition of combine f ,g.

m Lemma A.14. For all x, z F and α, β F and A, B there exists a projective measurement ∈ q ∈ q H∈{H H } x,z,α,β q M Qˆ c c F acting on (C )⊗ such that the following holds: { } ∈ q H⊗ 1. (Self-consistency) On average over uniformly random (x, z, α, β) F2m+2: ∈ q (Qˆ x,z,α,β) (Qˆ x,z,α,β) . (219) c AA′ ≈δQ c BA′′

2. (Consistency with Mˆ ) On average over uniformly random (x, z, α, β) F2m+2: ∈ q ˆ x,z,α,β (POINT,X),x (POINT,Z),z (Qc )AA δ ∑ Mˆ a Mˆ , (220) ′ Q b BA ≈ a,b: ′′  αa+βb=c  ˆ x,z,α,β (POINT,Z),z (POINT,X),x (Qc )AA δ ∑ Mˆ Mˆ a . (221) ′ Q b BA ≈ a,b: ′′  αa+βb=c 

Furthermore, all the approximations above hold when the registers A, A′, A′′ are exchanged with B, B′, B′′ respectively.

Proof. Define ˆ x,z,α,β ˆ x,z Qc = ∑ Qa,b. a,b: αa+βb=c

x,z,α,β The self-consistency and consistency properties of Qˆ c follow from the consistency properties of x,z { } Qˆ established by Lemma A.11, and Fact 5.24. { a,b } 2 Lemma A.15. There exists a function δcombine(ε, m, d, q) = poly(m ε, md/q) such that the following ℓ F2m+2 ˆ ℓ Cq M holds. For every line in q and A, B there exists a POVM Q f acting on ( )⊗ ℓ H ∈ {H H } { } ℓ H⊗ with outcomes f degmd+1( ) such that on average over a uniformly random pair ( , (x, z, α, β)) drawn ∈ F2m+2 from the line-point distribution over q ,

ˆ ℓ ˆ x,z,α,β (222) (Q[eval ( )=a])AA′ δcombine (Qa )BA′′ , (x,z,α,β) · ≃ ˆ ℓ ˆ x,z,α,β (223) (Q[eval ( )=a])BB′ δcombine (Qa )AB′′ , (x,z,α,β) · ≃ where the answer summation is over a F . ∈ q Before proving the lemma we give definitions regarding distributions over lines and points.

182 Definition A.16. For any i 1,..., m the i-th restricted axis parallel line distribution D is the ∈ { } ALINE,i restriction of DALINE to pairs (ℓ, u) where the line ℓ is along the direction ei (i.e. is of the form ℓ =(u0, ei). ℓ Similarly, the i-th restricted diagonal line distribution DDLINE,i is the restriction of DDLINE to pairs ( , u), where the line ℓ =(u0, s, v) is such that χ(s)= i, and coordinates 1,..., i 1 of v are 0. (Recall that χ is defined in (48).) −

Observe that DALINE is a uniform mixture over the distributions D INE for i 1,..., m , and AL ,i ∈ { } likewise for DDLINE. As a consequence, any approximation that holds on average over the line-point dis- tribution DLINE with error δ holds over each of the i-th restricted distributions DALINE,i and DDLINE,i with error 2mδ. Similarly, any approximation that holds over DLINE DLINE holds over any product of the i-th × restricted distributions with error 4m2δ. In particular, by Lemma A.13 for any t , t ALINE, DLINE 1 2 ∈ { } and i, j 1,..., m , we have ∈ { } ℓ ℓ E E ψˆ (T X, Z ) (I (Qˆ x,z )) ψˆ = O(m2δ ) (224) ∑ f , f AA′ f (x), f (z) P . (ℓ ,x) D (ℓ ,z) D h | X Z ⊗ − X Z | i X ∼ t1,i Z ∼ t2,j fX, fZ

Lemma A.17. There exists a distribution D over tuples (ℓ, ℓX, ℓZ) with the following properties: ℓ F2m+2 1. The marginal distribution over is the same as the marginal of the line-point distribution over q . 2. For any point of the form u =(u , u , α, β) lying on ℓ, it holds that u ℓ and u ℓ . Moreover, X Z X ∈ X Z ∈ Z for u chosen uniformly at random in ℓ the marginal distributions over (ℓX, uX) and over (ℓZ, uZ) can Fm be expressed as mixtures of the i-th restricted line-point distributions DALINE,i and DDLINE,i over q . Proof. We describe D by giving a procedure to sample from it. We start by sampling (ℓ, u) from the line- F2m+2 ℓ point distribution over q . Write u =(uX, uZ, α, β). There are two cases to consider: the case where is axis-parallel and the case where ℓ is diagonal. In each case, we sample lines ℓX and ℓZ such that the pairs (ℓX, uX) and (ℓZ, uZ) satisfy Property 2 of the lemma. ℓ ℓ ℓ 1. Suppose is an axis-parallel line (v, ej) where v =(vX, vZ, α0, β0). Then we sample X =(v′X, eiX ), Z = (v , e ) Fm 1,2,..., m in the following way. ′Z iZ ∈ q × { } (a) If 1 j m, then set iX = j and choose iZ to be uniformly random in 1,..., m . ≤ ≤ { } (b) If m + 1 j 2m, then set iX to be uniformly random in 1,..., m and iZ = j m. ≤ ≤ { } − (c) If 2m + 1 j 2m + 2, then set iX and iZ to both be uniformly random in 1,..., m . ≤ ≤ { } LN (d) Let v′ be the canonical representative L (vX), and likewise let v′ be the canonical represen- X eiX Z LN tative L (vZ) (see Definition 7.3 for the definition of the canonical representative of a line). eiZ ℓ ℓ Note that vX lies on the line X =(v′X, eiX ), and similarly for vZ and Z =(v′Z, eiZ ).

We claim that this construction satisfies Property 2 of the lemma. We show this first for ℓX and uX. If we are in case (a), then uX is a uniformly random point on the line (vX, ej), which is equal to ℓX as ℓ well. Thus, the marginal distribution over ( X, uX) conditioned on j in this case is equal to DALINE,j. If we are in case (b) or case (c), then uX = vX, and hence in these cases as well uX is a uniformly random point on the line ℓX. Thus, the marginal distribution over (ℓX, uX) conditioned in j in these

cases is equal to DALINE, i.e. the uniform mixture over D INE for i 1,..., m . An analogous AL ,i ∈ { } argument shows the same for ℓZ and uZ.

183 2. Suppose ℓ is a random diagonal line (v, s, w), where v =(vX, vZ, α, β), and where w =(wX, wZ, γ, δ) is a direction vector whose first nonzero coordinate is j = χ(s) (where χ is defined in Equation (48)). ℓ ℓ We sample X =(v′X, sX, w′X) and Z =(v′Z, sZ, w′Z) in the following manner:

(a) If 1 j m, then w = wX and w = wZ. We choose sX and sZ such that χ(sX) = j and ≤ ≤ ′X ′Z χ(sZ)= 1.

(b) If m + 1 j 2m, then w is a random diagonal line direction (and sX is chosen accordingly), ≤ ≤ ′X and w = wZ, with sZ such that χ(sZ)= j m. ′Z − (c) If 2m + 1 j 2m + 2 then both wX, and wZ are uniformly random, with sX and sZ chosen accordingly.≤ ≤ LN LN (d) Set v′X = L (vX) and v′Z = L (vZ). wX′ wZ′

We claim that this construction satisfies Property 2 of the Lemma. We show this first for ℓX and uX. If we are in case (a), then uX is a uniformly random point on the diagonal line (vX, sX, wX) which is equal to ℓX. Thus, the marginal distribution over (ℓX, uX) conditioned on j in this case is equal to DDLINE,j. If we are in case (b), (c), or (d), then uX = vX and thus uX is a uniformly random point on the line ℓX, which passes through vX. The marginal distribution over (ℓX, uX) conditioned on j in one of these cases is DDLINE. For ℓZ and uZ, the situation is slightly different: in case (a), then the marginal distribution on (ℓZ, uZ) conditioned on j is equal to DDLINE,1; in case (b), it is equal to DDLINE,j m, and in case (c) it is DDLINE. −

Let D be the distribution over tuples (ℓ, ℓX, ℓZ) induced by this procedure. This distribution D satisfies Property 1 in the lemma by construction, and Property 2 as argued above. ℓ ℓ Proof of Lemma A.15. We construct the measurements Qˆ ℓ out of the lines measurements T X, Z guaranteed f fX, fZ ℓ F2m+2 by Lemma A.13. To do so we first show how to map any line-point pair ( , u) in the expanded space q ℓ ℓ to two pairs (ℓ , u ) and (ℓ , u ) in Fm. We will define Qˆ ℓ in terms of T X, Z and show that the resulting X X Z Z q fX, fZ measurement is consistent with the combined points measurement Qˆ x,z,α,β from Lemma A.14. ℓ To define Qˆ we make use of the distribution D from Lemma A.17. Let D ℓ denote the distribution ℓ |ℓ F2m+2 derived from D by conditioning on the first element being . Then for every line in q and function f deg (ℓ), define ∈ (md+1) ℓ ℓ ℓ Qˆ = E T X, Z , f ∑ fX, fZ (ℓX,ℓZ) D ℓ ℓ ℓ ∼ | fX degmd( X), fZ degmd( Z): ∈ f =combine∈ fX, fZ where is defined in (218). Note that ˆ ℓ forms a valid POVM for every choice of line ℓ, and combine fX, fZ Q f { } ℓ ℓ ℓ moreover that combine fX, fZ is well-defined for the lines , X, Z because of Property 2 of Lemma A.17.

184 We now show the consistency relation (222). We show the first relation explicitly. The inconsistency is

∆ E ˆ ˆ ℓ ˆ x,z,α,β ˆ = ∑ ψ Q f (I Q f (x,z,α,β)) ψ (ℓ,(x,z,α,β)) DLINE h | ⊗ − | i ∼ f ℓ ℓ = E E ∑ ∑ ψˆ T X, Z (I Qˆ x,z,α,β ) ψˆ ℓ ℓ ℓ fX, fZ α fX(x)+β fZ(z) ( ,(x,z,α,β)) DLINE ( X, Z) D ℓ f f , f : f =combine h | ⊗ − | i ∼ ∼ | X Z fX, fZ ℓ ℓ E E ˆ X, Z ˆ x,z,α,β ˆ = ∑ ψ Tf , f (I Qα f (x)+β f (z)) ψ (ℓ,ℓ ,ℓ ) D (x,z,α,β) ℓ h | X Z ⊗ − X Z | i X Z ∼ ∈ fX, fZ ℓ ℓ = E E ψˆ T X, Z (I Qˆ x,z) ψˆ ∑ fX, fZ ∑ a,b (ℓ,ℓX,ℓZ) D (x,z,α,β) ℓ h | ⊗ − | i ∼ ∈ fX, fZ a,b:αa+βb=α fX(x)+β fZ(z) ℓ ℓ E E ˆ X, Z ˆ x,z ˆ ∑ ψ Tf , f (I Q f (x), f (z)) ψ ≤ (ℓ,ℓ ,ℓ ) D (x,z,α,β) ℓ h | X Z ⊗ − X Z | i X Z ∼ ∈ fX, fZ ℓ ℓ = E E E ∑ ψˆ T X, Z (I Qˆ x,z ) ψˆ . (225) fX, fZ fX(x), fZ(z) (ℓ,ℓ ,ℓ ) D x ℓX z ℓZ h | ⊗ − | i X Z ∼ ∈ ∈ fX, fZ Here, we used Property 1 of Lemma A.17 in going from the second to the third line. We used the data- processing inequality (Fact 5.24) in the second to last line, and we used Property 2 of Lemma A.17 to rewrite the expectation over x and z in the last line. Now, to complete the bound, observe that by Property 2 of Lemma A.17 the right-hand side of Equation (225) is a mixture of terms of the form ℓ ℓ E E ˆ ( X, Z ) ( ( ˆ x,z )) ˆ ∑ ψ Tf , f AA′ I Q f (x), f (z) ψ (ℓ ,x) D (ℓ ,z) D h | X Z ⊗ − X Z | i X ∼ t1,i Z ∼ t2,j fX, fZ for t , t ALINE, DLINE and i, j 1,..., m . Hence using Equation (224) it follows that 1 2 ∈ { } ∈ { } 2 (225) = O(m δP) . This establishes the first relation (222). An analogous argument shows the second relation.

a b b bmd Lemma A.18. There exists a function δS(ε, m, d, q) = a(md) (ε + q− + 2− ) for some universal constants a > 1, 0 < b < 1 such that the following holds. For A, B there exists a projective q M H ∈ {H H } measurement Sˆ acting on (C ) with outcomes consisting of pairs (gX, gZ) of polynomials { gX,gZ } H⊗ ⊗ each in ideg (F ) such that for W X, Z , on average over uniformly random u Fm, d,m q ∈ { } ∈ q ˆ ˆ (POINT,W),u (S[eval ( )=a])AA δS (Ma )BA (226) u ·W ′ ≃ ′′ ˆ ˆ (POINT,W),u (S[eval ( )=a])BB δS (Ma )AB , (227) u ·W ′ ≃ ′′ where the notation eval ( ) means the evaluation of the first outcome g of Sˆ if W = X, or the second u ·W X outcome gZ if W = Z, at the point u.

ℓ x,z,α,β Proof. Lemma A.15 establishes the existence of a symmetric strategy SLD = (ψˆ, Qˆ Qˆ ) for { f } ∪ { a } the classical low-degree test GLD for ldparams =(q, 2m + 2, d, 1), with value 1 δ . Lemma 7.8 ldparams − combine implies the existence of a projective measurement Sˆ with outcomes g ideg (F ) that is δLD-self- { g} ∈ d,2m+2 q ˆ x,z,α,β a b consistent and consistent with the “points” measurements Qa , where δLD = a(md) (δcombine + b bmd { } q− + 2− ) for some universal constants a > 1, 0 < b < 1 is given by Lemma 7.8.

185 We now argue that with high probability, the polynomial g returned by the measurement Sˆ has the { g} form g(x, z, α, β) = αgX(x)+ βgZ(z) for degree-d polynomials gX, gZ. Let denote the set of such G polynomials. On average over uniformly random (x, z, α, β),

(Sˆ ) (Sˆ ) (Qˆ x,z,α,β ) g AA′ ≈δLD g AA′ ⊗ g(x,z,α,β) BA′′ (Sˆ ) (Qˆ x,z,α,β ) ≈δQ g AA′ g(x,z,α,β) AA′ (POINT,X),x (POINT,Z),z δ (Sˆg) ∑ 1 Mˆ Mˆ (228) Q AA′ αb+βb′=g(x,z,α,β) b b′ ≈ ⊗ BA′′  b,b′  (POINT,Z),z (POINT,X),x δ (Sˆg) ∑ 1 Mˆ Mˆ , (229) Q AA′ αb+βb′=g(x,z,α,β) b′ b ≈ ⊗ BA′′  b,b′  x,z,α,β where the first approximation uses the consistency of Sˆg with Qˆ a from Lemma 7.8 and Fact 5.18, { } { } x,z,α,β and the second, third and fourth lines follow from the consistency properties of Qˆ a from Lemma A.14 combined with Fact 5.20. Note that we will need both Equation (228) and Equation{ (229}) for the subsequent calculations. For all g define ψˆ =(Sˆ ) ψˆ . Then on average over uniformly random (x, z, α, β), | gi g AA′ | i 2 (POINT,X),x (POINT,Z),z E ∑ 1 Mˆ Mˆ ψˆg αb+βb′=g(x,z,α,β) b b′ x,z,α,β BA′′ | i  b,b′  E ˆ ˆ (POINT,Z),z ˆ (POINT,X),x ˆ (POINT,Z),z ˆ = ∑ 1αb+βb =g(x,z,α,β)1αb+βb =g(x,z,α,β) ψg Mb M b Mb ψg x,z,α,β ′ ′′ h | ′′ ′ | i b,b′,b′′ E ˆ ˆ (POINT,Z),z ˆ (POINT,X),x ˆ (POINT,Z),z ˆ 1/q ∑ 1αb+βb =g(x,z,α,β) ψg Mb Mb Mb ψg (230) ≈ x,z,α,β ′ h | ′ ′ | i b,b′

(POINT ) where in the second equality, we used the fact that Mˆ ,X ,x is projective, and in the third line, we { b } used that for fixed x, z, α, β with β = 0, if αb + βb = g(x, z, α, β) and αb + βb = g(x, z, α, β), then 6 ′ ′′ b′ = b′′. The probability that β = 0 is 1/q, and the term under the approximation has an absolute value of at most 1, so the error incurred is at most 1/q. Fix b, b F . Let h (x, z, α, β)= g(x, z, α, β) (αb + βb ). Write ′ ∈ q b,b′ − ′ d+1 i j hb,b′ (x, z, α, β)= ∑ hb,b′,i,j(x, z) α β i,j=0 for some polynomials h (x, z) of degree at most d + 1. Let ideg (F ) denote the polynomi- b,b′,i,j G′ ⊂ d,2m+2 q als that are linear in α, β, i.e., g = αg (x, z)+ βg (x, z). Fix a g / . Then h (x, z, α, β) is a nonzero 1 2 ∈ G′ b,b′ polynomial of degree at most (2m + 2)d, and in particular there must exist an (i, j) such that the polyno- mial h (x, z) is not the identically zero polynomial. Call a pair (x, z) F2 good if h (x, z, α, β) b,b′,i,j ∈ q b,b′ is a nonzero polynomial function of α, β, and bad otherwise. The probability that (x, z) is bad is at most (2m + 2)d/q by the Schwartz-Zippel lemma. Otherwise, conditioned on a good (x, z) pair, the probability that over the choice of is at most , again by the Schwartz-Zippel hb,b′ (x, z, α, β) = 0 α, β (2m + 2)d/q 2 lemma. Thus for g / we can upper-bound (230) by 2(2m+2)d ψˆ . Combined with (228) we get that ∈ G′ q | gi 2 ˆ ˆ (231) ∑ (Sg)AA′ ψ O(δLD + δQ + md/q). g/ | i ≤ ∈G′

186 Next, we argue that not only is g linear in α, β with high probability, but also that g = αgX + βgZ where gX depends on x only and gZ depends on z only. Fix a polynomial g , i.e., g(x, z, α, β) = ∈ G′ αg (x, z)+ βg (x, z) for polynomials g , g ideg (F ). For all x, z Fm, α, β, b, b F , 1 2 1 2 ∈ d,2m q ∈ q ′ ∈ q

1αb+βb =αg (x,z)+βg (x,z) = 1b=g (x,z)1b =g (x,z) + 1b=g (x,z) b =g (x,z) 1αb+βb =αg (x,z)+βg (x,z) . ′ 1 2 1 ′ 2 6 1 ∨ ′6 2 · ′ 1 2 By the Schwartz-Zippel lemma,

E (232) 1b=g1(x,z) b =g2(x,z) 1αb+βb =αg1(x,z)+βg2(x,z) 1/q . α,β 6 ∨ ′6 · ′ ≤

Let = , where (resp. Z) is the set of polynomials g where g (x, z) depends G′ GX′ ∪ GZ′ ∪ G GX′ G′ ∈ G′ 1 nontrivially on z (resp. g (x, z) depends nontrivially on x). Fix g . We can upper-bound (230) by 2 ∈ GX′ 1 2 ˆ (POINT,Z),z ˆ (POINT,X),x ˆ (POINT,Z),z ψˆg + E ∑ 1b=g (x,z)1b =g (x,z) ψˆg M M M ψˆg (233) x,z 1 ′ 2 b′ b b′ q | i b,b h | | i ′ 1 2 ˆ (POINT,Z),z ψˆg + E ∑ E 1b =g (x,z) ψˆg M ψˆg (234) z x ′ 2 b′ ≤ q | i b h | | i ′   where the indicator is removed using positivity in the inequality. To bound the second term 1b=g1(x,z) in (234), we use an argument similar to the one we used to derive (231). That is, say that a value of z is good if g ( , z) is not a constant function. The probability that z is good is at least 1 (2m + 2)d/q by the 2 · − Schwarz-Zippel lemma. Moreover, for each good z and for each b the expectation E 1 is at most ′ x b′=g2(x,z) 2(2m+2)d+1 ˆ 2 (2m + 2)d/q. Hence, we conclude that if g2(x, z) depends on x, then (234) is at most q ψg , and thus | i 2 ˆ ˆ (235) ∑ (Sg)AA′ ψ O(δLD + δQ + md/q). g | i ≤ ∈GX′ POINT POINT By starting with (229) instead (i.e., the order of M( ,Z),z and M( ,X),x are switched), we can b′ b perform similar reasoning to deduce that

ˆ ˆ 2 (236) ∑ (Sg)AA′ ψ O(δLD + δQ + md/q). g | i ≤ ∈GZ′

We thus obtain

2 2 2 2 ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ (237) ∑ (Sg)AA′ ψ ∑ (Sg)AA′ ψ + ∑ (Sg)AA′ ψ + ∑ (Sg)AA′ ψ g/ | i ≤ g/ | i g | i g | i ∈G ∈G′ ∈GX′ ∈GZ′ O(δLD + δ + md /q). (238) ≤ Q

187 Let δ be the error in the right hand side of Equation (238). Putting everything together, we obtain that G ˆ ˆ ˆ (239) 1 = ∑ ψ (Sg)AA′ ψ g h | | i ˆ ˆ ˆ (240) δS ∑ ψ (Sg)AA′ ψ ≈ g h | | i ∈G (POINT,Z),z (POINT,X),x 1/2 E ∑ ψˆ (Sˆg ,g ) ∑ 1 Mˆ Mˆ ψˆ (δQ) X Z AA′ αb+βb′=αgX (x)+βgZ(z) b′ b ≈ x,z,α,β g ,g h | ⊗ BA′′ | i X Z  b,b′  (241)

1/2 E ψˆ (Sˆ ) (δQ) ∑ gX,gZ AA′ ∑ 1αb+βb′=αgX (x)+βgZ(z)=αb′′+βb′ ≈ x,z,α,β g ,g h | ⊗ · X Z  b,b′,b′′ POINT POINT POINT Mˆ ( ,X),x Mˆ ( ,Z),z Mˆ ( ,X),x ψˆ (242) b′′ b′ b BA′′ | i  E ψˆ (Sˆ ) 1 1/q ∑ gX ,gZ AA′ ∑ αb+βb′=αgX(x)+βgZ(z) ≈ x,z,α,β g ,g h | ⊗ · X Z  b,b′ (POINT ) (POINT ) (POINT ) Mˆ ,X ,x Mˆ ,Z ,z Mˆ ,X ,x ψˆ (243) b b′ b BA′′ | i (POINT ) (POINT ) (POINT ) E ψˆ (Sˆ ) Mˆ ,X ,x Mˆ ,Z ,z Mˆ ,X ,x ψˆ . (244) δS+1/q ∑ gX,gZ AA′ gX (x) gZ (z) gX (x) ≈ x,z g ,g h | ⊗ BA′′ | i X Z  

Here Equation (240) follows from (238) and the projectivity of Sˆg:

ˆ ˆ ˆ ˆ ˆ 2 ∑ ψ (Sg)AA′ ψ = ∑ (Sg)AA′ ψ δS. g/ h | | i g/ | i ≤ ∈G ∈G Equation (241) follows by approximation (229) together with the Cauchy-Schwarz inequality and the pro- jectivity of Sˆg:

(POINT,Z),z (POINT,X),x E ∑ ψˆ (Sˆg ,g ) (I ∑ 1 Mˆ Mˆ ψˆ X Z AA′ αb+βb′=αgX(x)+βgZ(z) b′ b x,z,α,β g ,g h | · − BA′′ | i X Z   b,b′   2 E ∑ ψˆ (Sˆg ,g ) ψˆ X Z AA′ ≤ x,z,α,β g ,g h | | i r X Z 2 ˆ ˆ (POINT,Z),z ˆ (POINT,X),x ˆ E ∑ (SgX,gZ ) I ∑ 1 M M ψ AA′ αb+βb′=αgX(x)+βgZ(z) b′ b · vx,z,α,β ⊗ − BA′′ | i u gX,gZ b,b u   ′   t 1 δ . ≤ · Q q Equation (242) follows by a similar calculation. To pass from here to Equation (243), we argue that for α = 0, the indicator forces b = b , and the terms where α = 0 are bounded in absolute value by 1/q. 6 ′′ Finally, to reach Equation (244), we apply Equation (232) to the expectation of the indicator over α, β. Equation (244) tells us that

(POINT,X),x (POINT,Z),z (POINT,X),x Sˆ 1/2 Mˆ Mˆ Mˆ [evalx( )=a,evalz( )=b] O(δS+(δQ) +1/q) a b a . · · AA′ ≃ BA′′  1/2  Let δS be the error in this approximation and observe that δS = O(δLD +(δQ) + md/q) has the form a b b b md a′(md) ′ (ε ′ + q− ′ + 2− ′ ) for some universal constants a′ > 1, 0 < b′ < 1. We thus conclude by

188 an application of Fact 5.24 that Equation (226) holds for W = X. To obtain the same result for W = Z, we perform exactly the same steps starting from Equation (239) but with X and Z interchanged (so we use Equation (228) instead of Equation (229)). Finally, entirely analogous calculations with the registers swapped yield Equation (227).

A.6 Pulling the X and Z measurements apart Recall the decoding map of the low-degree code defined in Section 3.4. This map takes in a polynomial g ideg (F ) and returns a string Dec(g) FM consisting of the evaluations of g on the points in the ∈ d,m q ∈ q hypercube 0, 1 m. We now use this decoding map to construct a new set of measurement operators out of ˆ { } Fm F S. First, we introduce some notation. For all W X, Z and all degree d polynomials g : q q define projectors ∈ { } → ˆX ˆ ˆZ ˆ Sg = ∑ Sg,gZ and Sg = ∑ SgX,g , gZ gX where the measurement Sˆ is given by Lemma A.18. { gX ,gZ } For all W X, Z , u˜ FM, and a F , define the projector ∈ { } ∈ q ∈ q W W τa (u˜)= ∑ τh (245) h FM : h u˜=a ∈ q · W Cq M where recall that τh is the projector corresponding to measuring ( )⊗ in the W basis and obtaining FM FM ˜ W,u˜ h q as the outcome. For all u˜ q define the projective measurement Ma a Fq by ∈ ∈ { } ∈ ˜ W,u˜ ˆW W Ma = ∑ Sg τ(Dec(g) u˜) a(u˜) . (246) g ⊗ · −

ˆW ˜ W,u˜ If Sg is viewed as an operator acting on registers AA′ (resp. BB′) then we view Ma as an operator acting on registers AA′A′′ (resp. BB′B′′). (When necessary we will indicate via subscripts which registers the ˜ W,u˜ ˆW W operators are acting on.) The Ma are projective because the Sg and τ operators are projective. For all j 1,..., t define the observable ∈ { } ˜ j tr(eja) ˜ W,u˜ W (u˜)= ∑( 1) Ma , (247) a −

F F W where ej j 1,...,t denotes the self-dual basis for q over 2 specified in Section 3.3.1. Let τ be the generalized{ } ∈{ Pauli observable} defined in Section 3.7. Observe that

tr(ej(Dec(gW ) u˜)) ˆ W tr(ej(Dec(gW ) u˜)) tr(ej(u˜ h)) ˆ W ∑ ( 1) · SgX,gZ τ (eju˜)= ∑ ∑( 1) · ( 1) · SgX,gZ τh g ,g − ⊗ g ,g − − ⊗  X Z  X Z h tr(ej(u˜ Dec(gW )+a)) ˆW W = ∑ ∑( 1) · SgW ∑ τh gW a − ⊗ h: h u˜=a  ·  tr(ej(u˜ Dec(gW )+a)) ˆW W = ∑ ∑( 1) · SgW τa (u˜) gW a − ⊗ = W˜ j(u˜) ,

189 where the first equality uses the definition of W( ), the second uses the definition of ˆW and regroups τ eju˜ SgW W terms, the third uses the definition (245) of τa (u˜), and the last is by a change of variables. These equations in particular show that the observable W˜ j(u˜) is Hermitian and squares to the identity, because the measure- ment Sˆ is projective. Furthermore, it can be verified by direct calculation that the observables W˜ j(u˜) { gX ,gZ } satisfy the same relations as the generalized Pauli group. Specifically, for all j, j 1,..., t , u˜, v˜ FM, ′ ∈ { } ∈ q j j Z˜ ′ (v˜)X˜ (u˜) j = j′ X˜ j(u˜)Z˜ j′ (v˜)= tr((e u˜) (e v˜)) j j 6 (( 1) j · j Z˜ (v˜)X˜ (u˜) j = j′ − W,u˜ j The next lemma shows that the measurements M˜ a and observables W˜ (u˜) are consistent with the original points measurements, and are self-consistent.{ }

FM Lemma A.19. Let δS as in Lemma A.18. For all W X, Z , all u˜ q and j 1,..., t , the W,u˜ j ∈ { } ∈ ∈ { } measurements M˜ a a F and observables W˜ (u˜) satisfy the following properties: { } ∈ q 1. (Consistency with points measurements) For all W X, Z , for all j 1,..., t , and on average ∈ { } ∈ { } over uniformly random u Fm, ∈ q

POINT I M˜ W,indm(u) M( ,W),u I AA′ A′′ ⊗ a ≃δS a ⊗ BB′ B′′ and ( ) (POINT ) M˜ W,indm u I I M ,W ,u . a ⊗ BB′B′′ ≃δS AA′ A′′ ⊗ a 2. (Self-consistency) On average over uniformly random u˜ FM, ∈ q W˜ j(u˜) I I W˜ j(u˜) . ⊗ BB′ B′′ ≈δS AA′ A′′ ⊗

(POINT,W),u W,indm(u) Proof. We show the first item of the lemma for the case when Ma acts on register A and M˜ a acts on registers BB′B′′ (the argument for when the A and B registers are interchanged proceeds analo- gously):

(POINT,W),u W,indm(u) E ∑ ψˆ Ma M˜ a ψˆ u Fm h | ⊗ | i ∈ q a E ˆ (POINT,W),u ˆW W ˆ = ∑ ψ (Ma )A (Sg )BB′ (τDec(g) ind (u) a(indm(u)))B′′ ψ u Fm h | ⊗ ⊗ · m − | i ∈ q g,a ˆ (POINT,W),u ˆW = E ∑ ψˆ (M )AB (Sg )BB ψˆ Fm Dec(g) indm(u) ′′ ′ u q h | · ⊗ | i ∈ g = E ˆ ( ˆ (POINT,W),u) ( ˆW) ˆ ∑ ψ Mg(u) AB′′ Sg BB′ ψ u Fm h | ⊗ | i ∈ q g 1 δ . ≥ − S

W,indm(u) The second line follows from expanding the definition of M˜ a using (246), the third line follows (POINT ) from the definition of Mˆ ,W ,u given in Section A.3, the fourth line follows from the fact that Dec(g) a · indm(u)= g(u), and the fifth line follows from Lemma A.18. This implies the first item of the Lemma.

190 To show the second item we first argue that, on average over u˜,

˜ W,u˜ ˜ W,u˜ Ma δS Ma . (248) AA′ A′′ ≈ BB′ B′′     This follows because on average over u Fm and u˜ FM we have ∈ q ∈ q ˜ W,u˜ Ma (249) AA′ A′′   ˜ W,u˜ ( ˆW) ( ˆ (POINT,W),u) (250) δS Ma ∑ Sg AA′ Mg(u) BA′′ ≈ AA′ A′′ · g ⊗     ˆW W (POINT,W),u W (251) = ∑ (Sg )AA′ (τ(Dec(g) u˜) a(u˜))A′′ ∑ (Mr )B (τs (indm(u)))A′′ g ⊗ · − · ⊗    r,s:r+s=g(u)  W W (POINT,W),u = ∑ (Sˆ ) (τ ) (M )B (252) g AA′ h A′′ (g gh)(u) g,h: ⊗ ⊗ − (Dec(g) h) u˜=a − · The first inequality follows from right-multiplying (249) by the approximation (270) in Lemma A.20 (to be W,u˜ proved below) and using Fact 5.19. Line (251) follows from expanding the definition of M˜ a . Line (252) follows from the fact that W ( ) = ∑ W, W ( ( )) = ∑ W, and τ( ( ) ) u˜ h:(Dec(g) h) u˜=a τh τs indm u h:gh(u)=s τh Dec g u˜ a − · W · − the τh operators are orthogonal projections. { } POINT Using the self-consistency of M( ,W),u shown in Lemma A.7, we then obtain { a } ˆW (POINT,W),u W (POINT,W),u (252) ε ∑ (Sg )AA (M )A (τh )A M (253) ′ (g gh)(u) ′′ (g gh)(u) B ≈ g,h: · − ⊗ · − (Dec(g) h) u˜=a     − · ˆW (POINT,W),u W W (POINT,W),u 0 ∑ (Sg )AA (M )A (τh )A (τh )A M (254) ′ (g gh)(u) ′ ′′ (g gh)(u) B ≈ g,h: · − ⊗ ⊗ · − (Dec(g) h) u˜=a     − · ˆW ˆ (POINT,W),u W (POINT,W),u = ∑ (Sg )AA (M )AA (τh )A M . (255) ′ g(u) ′ ′′ (g gh)(u) B g,h: · ⊗ · − (Dec(g) h) u˜=a     − · The “ ” in Line (254) indicates that the left hand side is equal to the right hand side when applied to the ≈0 state ψˆ . Line (254) follows from the self-consistency of the operators τW , and Line (255) follows from | i { h } ˆ (POINT,W),u W W the definition of Ma . Using the fact that ψˆ = ∑h (τ )B (τ )B ψˆ , we get that { } | i ′ h′ ′ ⊗ h′ ′′ | i W (POINT,W),u W (POINT,W),u W W (255) 0 ∑ (Sˆ Mˆ ) (τ ) M ∑ τ τ g g(u) AA′ h A′′ (g gh)(u) h′ h′ ≈ · ⊗ · − ⊗ ⊗ BB′B′′ g,h: h′ (Dec(g) h) u˜=a     − · (256)

W (POINT,W),u W (POINT,W),u W = ∑ (Sˆ Mˆ ) (τ ) (Mˆ )BB (τ )B . (257) g g(u) AA′ h A′′ (g gh+g )(u) ′ h′ ′′ · ⊗ · − h′ ⊗ g,h,h′: (Dec(g) h) u˜=a     − ·

(POINT ) Line (257) follows from (256) by expanding the definition of Mˆ ,W ,u . Next, we argue that, on { a }

191 average over u Fm, ∈ q W W (POINT,W),u W (257) δ ∑ (Sˆ ) (τ ) (Mˆ )BB (τ )B (258) S g AA′ h A′′ (g gh+g )(u) ′ h′ ′′ ≈ ⊗ ⊗ − h′ ⊗ g,h,h′: (Dec(g) h) u˜=a     − · ˆW W We show this by bounding the magnitude of the difference. Using the fact that Sg and τh are projective, we have: 2 W (POINT,W),u W (POINT,W),u W E ∑ ∑ (Sˆ (I Mˆ )) (τ ) (Mˆ )BB (τ )B ψˆ g g(u) AA′ h A′′ (g gh+g )(u) ′ h′ ′′ u a · − ⊗ ⊗ − h′ ⊗ | i g,h,h′: (Dec(g) h) u˜=a − · E ˆ ˆ (POINT,W),u ˆW ˆ (POINT,W),u W = ∑ ψ ((I M ) Sg (I M ))AA (τh )A u h | − g(u) · · − g(u) ′ ⊗ ′′ g,h,h′   (POINT,W),u 2 W (Mˆ ) (τ )B ψˆ (g gh+g )(u) BB′ h′ ′′ · − h′ ⊗ | i (POINT,W),u (POINT,W),u  E ˆ (( ˆ ) ˆW ( ˆ )) ˆ (259) ∑ ψ I Mg(u) Sg I Mg(u) AA′ ψ ≤ u g h | − · · − | i δ . (260) ≤ S (POINT ) The inequality in Line (259) follows from the fact that for all g, h, the sum ∑ (Mˆ ,W ,u )2 h′ (g gh+g )(u) BB′ − h′ ⊗ (τW) is at most I. The inequality in Line (260) follows from the second item of Lemma A.20 (to be h′ B′′ proved below). Next, we show that on average over u Fm, ∈ q W W W (POINT,W),u W (258) = ∑ (Sˆ ) (τ ) (Sˆ Mˆ )BB (τ )B (261) g AA′ h A′′ g′ (g gh+g )(u) ′ h′ ′′ ⊗ · · − h′ ⊗ g,h,g′,h′: (Dec(g) h) u˜=a     − · ˆW W ˆW ˆ (POINT,W),u W O(δ +√ε) ∑ (Sg )AA (τh )A (Sg M )BB (τh )B . ≈ S ′ ⊗ ′′ · ′ · g′(u) ′ ⊗ ′ ′′ g,h,g′,h′:     (g′ g )(u)=(g gh)(u) − h′ − (Dec(g) h) u˜=a − · (262)

W W W W δ ∑ (Sˆ ) (τ ) (Sˆ )BB (τ )B (263) ≈ S g AA′ ⊗ h A′′ · g′ ′ ⊗ h′ ′′ g,h,g′,h′:     (g′ g )(u)=(g gh)(u) − h′ − (Dec(g) h) u˜=a − · ˆW W ˆW W md/q ∑ (S )AA (τ )A (S )BB (τ )B (264) ≈ g ′ ⊗ h ′′ · g′ ′ ⊗ h′ ′′ g,h,g′,h′:     g′ g =g gh h′ (Dec−(g) h)−u˜=a − · Before we justify the sequence of steps to go from Line (258) to Line (264), we first argue how this estab- lishes Line (248), and hence the second item of the Lemma statement. Absorbing the error terms O(md/q) and O(√ε) into δS, we have shown that

M˜ W,u˜ (SˆW) (τW ) (SˆW) (τW) . a δS ∑ g AA′ h A′′ g′ BB′ h′ B′′ AA′ A′′ ≈ ⊗ · ⊗   g,h,g′,h′:     g′ g =g gh − h′ − (Dec(g) h) u˜=a − ·

192 Note that the conditions and are equivalent to the conditions (Dec(g) h) u˜ = a g gh = g′ gh′ and − · . Thus the− expression in− (264) is symmetric between and (Dec(g′) h′) u˜ = a g gh = g′ gh′ g, h − · − − W,u˜ g′, h′, and so an analogous derivation shows that M˜ a is δS-close to (264), and thus δS-close to BB′ B′′ W,u˜ W,u˜   M˜ a . This shows (248). Since M˜ a is projective, by Fact 5.24, we get for all j 1,..., t , AA′A′′ { } ∈ { } on average over u˜ FM, ∈ q ( ˜ W,u˜ ) ( ˜ W,u˜ ) M[tr(e )=b] AA′ A′′ δS M[tr(e )=b] BB′ B′′ j· ≈ j· where the answer summation is over b F2. Finally, the second item of the Lemma follows from apply- ing Lemma A.5. ∈ We now return to justifying the sequence of steps between Lines (258) and (264). The equality in Line (261) follows by left-multiplying (258) by I = ∑ (SˆW) . BB′ g′ g′ BB′

The approximation in Line (262). The approximation in Line (262) follows by bounding the magnitude of the difference:

2 W W W (POINT,W),u W E ∑ ∑ (Sˆ ) (τ ) (Sˆ Mˆ )BB (τ )B ψˆ g AA′ h A′′ g′ (g gh+g )(u) ′ h′ ′′ u a ⊗ · · − h′ ⊗ | i g,h,g′,h′:     (g′ g )(u)=(g gh)(u) h′ −(Dec(g) 6 h) −u˜=a − · POINT = E ∑ ψˆ SˆW) (τW ) (Mˆ ( ,W),u SˆW g AA′ h A′′ (g gh+g )(u) g′ u h | ⊗ · − h′ · · g,h,g′,h′:    (g′ g )(u)=(g gh)(u) − h′ 6 − POINT Mˆ ( ,W),u ) (τW) ψˆ (g gh+g )(u) BB′ h′ B′′ − h′ ⊗ | i OINT OINT  E ˆ ˆW W ˆ (P ,W),u ˆW ˆ (P ,W),u W ˆ ∑ ψ Sg )AA (τh )A (M Sg M )BB (τh )B ψ ≤ u h | ′ ⊗ ′′ · a′ · ′ · a′ ′ ⊗ ′ ′′ | i g,h,g′,h′,a′: a =g (u)     ′6 ′ E ˆ ˆ (POINT,W),u ˆW ˆ (POINT,W),u ˆ = ∑ ψ (M Sg M )BB ψ (265) u a′ ′ a′ ′ g ,a :a =g (u)h | · · | i ′ ′ ′6 ′ E ˆ ˆ (POINT,W),u ˆW ˆ O(√ε) ∑ ψ (I M )AB (Sg )BB ψ (266) ≈ u h | − g′(u) ′′ ⊗ ′ ′ | i g′ δ . (267) ≤ S The approximation in Line (266) follows from the following calculation:

(266) (265) −

E ˆ ˆ (POINT,W),u ˆW ˆ (POINT,W),u ˆ (POINT,W),u ˆ ∑ ψ (M Sg )BB ((M )AB (M )BB ) ψ (268) ≤ u h | a′ · ′ ′ · a′ ′′ − a′ ′ | i g′,a′:a′ =g′(u) 6 E ˆ ˆ (POINT,W),u ˆ (POINT,W),u ˆW ˆ (POINT,W) ,u ˆ + ∑ ψ ((M )AB (M )BB ) (Sg )BB (M )AB ψ u h | a′ ′′ − a′ ′ · ′ ′ · a′ ′′ | i g′,a′:a′ =g′(u) 6 (269)

193 Using Cauchy-Schwarz, we can bound the first term on the right hand side of the inequality by 1/2 E ˆ ˆ (POINT,W),u ˆW ˆ (POINT,W),u ˆ ∑ ψ (M Sg M )BB ψ u a′ ′ a′ ′ g ,a :a =g (u)h | · · | i! · ′ ′ ′6 ′ E ˆ ˆ (POINT,W),u ˆ (POINT,W),u ˆW ∑ ψ ((M )AB (M )BB ) Sg u a′ ′′ a′ ′ ′ g ,a :a =g (u)h | − · · ′ ′ ′6 ′ 1/2 ˆ (POINT,W),u ˆ (POINT,W),u ˆ ((Ma )AB′′ (Ma )BB′ ) ψ ′ − ′ | i! √1 √ε ≤ · (POINT ) The inequality follows from the self-consistency of Mˆ ,W ,u, as established in Lemma A.9. Similarly, a′ we can bound the second term in Line (269) by √ε. This establishes the approximation in Line (266). The inequality in Line (267) follows from Equation (227) of Lemma A.18. This concludes the derivation of the approximation in Line (262).

The approximation in Line (263). We establish this approximation by bounding the magnitude of the difference: 2 ˆW W ˆW (POINT,W),u W E ∑ ∑ (S )AA (τ )A (S (I Mˆ ))BB (τ )B ψˆ g ′ h ′′ g′ g′(u) ′ h′ ′′ u a ⊗ · · − ⊗ | i g,h,g′,h′:     (g′ g )(u)=(g gh)(u) − h′ − (Dec(g) h) u˜=a − · E ˆ ˆW W ˆ (POINT,W),u ˆW = ∑ ψ (Sg )AA (τh )A ((I M ) Sg u h | ′ ⊗ ′′ ⊗ − g′(u) · ′ · g,h,g′,h′: (g′ g )(u) h′ =(−g g )(u) − h ˆ (POINT,W),u W ˆ (I M ))BB (τh )B ψ − g′(u) ′ ⊗ ′ ′′ | i E ˆ ˆ (POINT,W),u ˆW ˆ (POINT,W),u ˆ ∑ ψ ((I M ) Sg (I M ))BB ψ ≤ u h | − g′(u) · ′ · − g′(u) ′ | i g′ δ . ≤ S The last inequality follows from the second item of Lemma A.20 (to be proved below).

The approximation in Line (264). We establish this approximation by bounding the magnitude of the difference: 2 E (SˆW) (τW) (SˆW) (τW ) ψˆ ∑ ∑ g AA′ h A′′ g′ BB′ h′ B′′ u a ⊗ · ⊗ | i g,h,g′,h′:     g′ g =g gh − h′ 6 − (g′ g )(u)=(g gh)(u) h′ −(Dec(g) h) −u˜=a − · ˆ ˆW W ˆW W ˆ E 1 = ∑ ψ (Sg )AA (τh )A (Sg )BB (τh )B ψ [(g′ gh )(u)=(g gh)(u)] h | ′ ⊗ ′′ · ′ ′ ⊗ ′ ′′ | i · u − ′ − g,h,g′,h′:     g′ g =g gh − h′ 6 − md/q. ≤

194 The last inequality follows from the Schwartz-Zippel lemma: two distinct polynomials of total degree at most md cannot agree on more than md/q fraction of points.

We now give a proof of the following Lemma, which was used in several derivations in Lemma A.19.

Lemma A.20. For W X, Z , on average over u Fm, we have that ∈ { } ∈ q ( ˆW) ( ˆ (POINT,W),u) (270) ∑ Sg AA′ Mg(u) BA′′ δS I g ⊗ ≈ and

(POINT ) (SˆW (I Mˆ ,W ,u)) 0 (271) g · − g(u) AA′ ≈δS where the answer summation is over g ideg (F ). Furthermore, all the approximations above hold ∈ d,m q when the registers A, A′, A′′ are exchanged with B, B′, B′′ respectively. Proof. We first prove Equation (270). From Equation (226) of Lemma A.18 we have that on average over u Fm, ∈ q ˆ ( ˆW) ( ˆ (POINT,W),u) ˆ (272) ψ ∑ Sg AA′ Mg(u) BA′′ ψ 1 δS h | g ⊗ | i≥ −

(POINT ) which is equivalent to ∑ (SˆW) (Mˆ ,W ,u) I. By Fact 5.18, we have that g g AA′ ⊗ g(u) BA′′ ≃δS ( ˆW) ( ˆ (POINT,W),u) ∑ Sg AA′ Mg(u) BA′′ δS I. g ⊗ ≈

We now prove Equation (271). Equation (272) and Fact 5.18 shows that

POINT (SˆW ) (Mˆ ( ,W),u) [g(u)=a] AA′ ≈δS a BA′′ where the answer summation is over a F . Left-multiplying this expression by (SˆW ) and using ∈ q [g(u)=a] AA′ Fact 5.19 we get

(POINT ) (SˆW ) (SˆW ) (Mˆ ,W ,u) [g(u)=a] AA′ ≈δS [g(u)=a] AA′ ⊗ a BA′′ POINT (SˆW Mˆ ( ,W),u) ≈ε [g(u)=a] · a AA′ where the second approximation follows from the following calculation:

2 E ˆW ˆ (POINT,W),u ˆ (POINT,W),u ˆ (273) ∑ (S[g(u)=a])AA′ (Ma )AA′ (Ma )BA′′ ψ u a · − | i   2 E ˆ ˆ (POINT,W),u ˆ (POINT,W),u ˆ (274) ∑ ψ (Ma )AA′ (Ma )BA′′ ψ ≤ u a h | − | i   (POINT,W),u (POINT,W),u 2 = E ∑ (Mˆ a )AA (Mˆ a )BA ψˆ (275) u ′ − ′′ | i a ε (276) ≤

195 The first inequality follows from the fact that SˆW I for all a, and the second inequality follows from [g(u)=a] ≤ (POINT,W),u the self-consistency of Mˆ a (established in Lemma A.9). Thus

(POINT ) (SˆW ) (SˆW Mˆ ,W ,u) [g(u)=a] AA′ ≈O(δS+ε) [g(u)=a] · a AA′ On the other hand, notice that

2 ˆW ˆ (POINT,W),u E ∑ (Sg (I M ))AA ψˆ u · − g(u) ′ | i g POINT POINT = E ˆ (( ˆ ( ,W),u) ˆW ( ˆ ( ,W),u)) ˆ ∑ ψ I Mg(u) Sg I Mg(u) AA′ ψ u g h | − · · − | i E ˆ ˆ (POINT,W),u ˆW ˆ (POINT,W),u ˆ = ∑ ψ ((I Ma ) S[g(u)=a] (I Ma ))AA′ ψ u a h | − · · − | i O(δ + ε). ≤ S

Equation (271) is thus obtained by absorbing ε into the function δS.

The next lemma shows that we can construct local unitaries VA, VB from the measurements Sˆ such that M j the shared state ψˆ is close to AUX EPRq ⊗ for some state AUX , and the observables W˜ (u˜) are | i W| i ⊗ | i M | i W,u˜ mapped to Pauli observables τ (eju˜) acting on EPRq ⊗ . Using the consistency between the M˜ a a F ∈ q |(POINT,Wi ),u { } measurements and the points measurements Ma a F established in Lemma A.19, and the consis- ∈ q { } (PAULI,W) tency between the points measurements and the “total Pauli measurements” Mh h FM established { } ∈ q in Lemma A.7, we deduce that the total Pauli measurements must be close (up to the local unitaries VA, VB) W M to the Pauli projectors τh h FM acting on EPRq ⊗ . { } ∈ q | i 1/4 Lemma A.21. There exists a function δQLD(ε, m, d, q)= O(δS + md/q) (where δS is defined in Lemma A.18) and unitaries VA acting on registers AA′A′′ and VB acting on BB′B′′ such that

1. There exists a state AUX on registers AA′BB such that | i ′ 2 M VA VB ψˆ AA A BB B AUX AA BB EPRq ⊗ δQLD. ⊗ | i ′ ′′ ′ ′′ − | i ′ ′ ⊗ | iA′′B′′ ≤

2. For all W X , Z , h FM ∈ { } ∈ q

(PAULI,W) † W VA M VA δQLD IAA (τh )A h A ≈ ′ ⊗ ′′  (PAULI,W) † W VB M VB δQLD IBB (τh )B h B ≈ ′ ⊗ ′′   M where the statements hold with respect to the state AUX AA BB EPRq ⊗ . ≈ | i ′ ′ ⊗ | iA′′B′′ Proof. Define the linear map

ˆ X Z VA = ∑ (SgX,gZ )AA′ (τ (Dec(gZ))τ (Dec(gX)))A′′ , gX,gZ ⊗

196 X Z q M X where the τ ( ), τ ( ) act on (C )⊗ . (Also note that the argument of τ is Dec(gZ) and the argument of Z · · τ is Dec(gX).) We verify the unitarity of VA: † ˆ X Z Z X VAVA = ∑ (SgX,gZ )AA′ (τ (Dec(gZ))τ (Dec(gX))τ (Dec(gX))τ (Dec(gZ)))A′′ gX,gZ ⊗ ˆ = ∑ (SgX,gZ )AA′ IA′′ gX,gZ ⊗

= IAA′ A′′ ˆ where in the first line we used that the measurement SgX ,gZ is projective and in the second line we used Z X { } that the τ and τ observables are self-inverse. The unitary VB is defined analogously. F F ˜ j Let ej j 1,...,t denote the self-dual basis for q over 2 specified in Section 3.3.1, and let W be { } ∈{ } j the observables introduced in (247). Conjugating the observable W˜ acting on the registers AA′A′′ by the unitary VA we get j † VA W˜ (u˜) VA

tr(ej(Dec(gW ) u˜)) ˆ X Z W = ∑ ( 1) · SgX,gZ τ (Dec(gZ))τ (Dec(gX))τ (eju˜) g ,g − AA′ ⊗ X Z    Z X τ (Dec(gX))τ (Dec(gZ)) A′′ ˆ W  = ∑ SgX,gZ τ (eju˜) g ,g AA′ ⊗ A′′ X Z     W =IAA′ τ (eju˜) , ⊗ A′′ where in the third line we used the (anti-)commutation relation X Z W tr(e (Dec(g ) u˜)) W X Z τ (Dec(g ))τ (Dec(g ))τ (e u˜)=( 1) j W · τ (e u˜)τ (Dec(g ))τ (Dec(g )) . Z X j − j Z X In the third line we also used that the observables τW square to the identity. An entirely analogous calcula- tion shows that ˜ j † W . VB W (u˜) VB = IBB′ τ (eju˜) ⊗ B′′ Next, from the self-consistency property of the observables W˜ j specified in Lemma A.19 we get that for W X, Z , ∈ { } † j j j j E E ψˆ W˜ (u˜) I I W˜ (u˜) W˜ (u˜) I I W˜ (u˜) ψˆ δS , j 1,...,t u˜ FMh | ⊗ − ⊗ ⊗ − ⊗ | i≤ ∈{ } ∈ q     which, since W˜ j is Hermitian, is equivalent to j j E E ψˆ W˜ (u˜) W˜ (u˜) ψˆ 1 δS/2 . (277) j 1,...,t u˜ FMh | ⊗ | i≥ − ∈{ } ∈ q

Let ϑ = VA VB ψˆ , and for W X, Z define the operator | i ⊗ | i ∈ { } W W HW = E E τ (eju˜) τ (eju˜) j 1,...,t u˜ FM ⊗ ∈{ } ∈ q M W W ⊗ = E E τ (ejs) τ (ejs) j 1,...,t s Fq ⊗ ∈{ }  ∈  M = E τW(s) τW (s) ⊗ s Fq ⊗  ∈ 

197 where in the second line we used that for all u˜ FM, τW (u˜) = M τW (u˜ ) with τW(u˜ ) being the ∈ q i=1 i i generalized Pauli W observable acting on the i-th qudit. In the third line, we used the fact that for every W W W W N fixed r, Es F τ (ejs) τ (ejs)= Es F τ (s) τ (s). Equation (277) is then equivalent to ∈ q ⊗ ∈ q ⊗ ϑ H ϑ 1 δ /2 , h | W | i≥ − S for W X, Z . Observe∈ { that}

X Z X Z tr(as ) tr(bs ) E τ (s)τ (s′) τ (s)τ (s′)= E ∑ ( 1) ′ a + s a ( 1) ′ b + s b s,s′ ⊗ s,s′ a,b F − | ih |⊗ − | ih | ∈ q = E ∑ a + s a a + s a s a F | ih | ⊗ | ih | ∈ q = EPR EPR . | qih q| M X Z X Z ⊗ M This implies that HX HZ = Es,s F τ (s)τ (s′) τ (s)τ (s′) = EPRq EPRq ⊗ . By the ′∈ q ⊗ | ih | triangle inequality, we have  

H ϑ H ϑ ϑ H ϑ + ϑ H ϑ . k X| i− Z| ik ≤ k| i− X| ik k| i− Z| ik Note that for W X, Z , ∈ { } ϑ H ϑ 2 = 1 2 ϑ H ϑ + ϑ H2 ϑ 2 2 ϑ H ϑ δ , k| i− W| ik − h | W | i h | W| i≤ − h | W | i≤ S where in the inequality we used that H I. Furthermore, W ≤ ϑ H2 ϑ = H ϑ 2 h | W| i k W | ik = ϑ +(H I) ϑ 2 k| i W − | ik ( ϑ (H I) ϑ )2 ≥ k| ik−k W − | ik (1 δ )2 ≥ − S 1 2δ . ≥ − S Therefore

2 δ H ϑ H ϑ 2 S ≥ k X| i− Z| ik 2 2 p = ϑ H ϑ + ϑ H ϑ 2 ϑ HX HZ ϑ h | X| i h | Z| i− h | | i 2 4δ 2 ϑ H H ϑ , ≥ − S − h | X Z| i which implies that

M ϑ (I ( EPR EPR ⊗ ) ) ϑ 1 2δ δ . h | AA′ BB′ ⊗ | qih q| A′′B′′ | i≥ − S − S p Define the unnormalized state AUX on the registers AA′BB as | 0i ′ M AUX =( EPR ⊗ ) ϑ | 0iAA′BB′ h q| A′′B′′ · | iAA′A′′BB′ B′′

198 and define the normalized state AUX 1 AUX . The inner product between and AUX = AUX 0 ϑ | i k | 0ik | i | i | i⊗ EPR M can thus be evaluated as | qi⊗ M 1 M ϑ ( AUX EPRq ⊗ )= ϑ ( AUX0 EPRq ⊗ ) h | | i ⊗ | i AUX ·h | | i ⊗ | i k | 0i k 1 M = ϑ (IAA BB ( EPRq EPRq ⊗ )A B ) ϑ AUX ·h | ′ ′ ⊗ | ih | ′′ ′′ | i k | 0i k 1 2 = AUX0 AUX · k | i k k | 0i k = AUX k | 0i k 1 2δ δ ≥ − S − S 1q O( δ )p. ≥ − S M 2 p This implies ϑ AUX EPRq ⊗ O(√δS), which establishes the first item of the lemma. We now establishk| i − | thei second ⊗ | itemi ofk the≤ lemma. In what follows, all and statements hold with ≃ ≈ respect to the state ψˆ . From Lemma A.7 we have the following implications: for all W X, Z and on | i ∈ { } average over u Fm, ∈ q (PAULI,W) (POINT,W),u M IBB B ε IAA A Ma , (278) [gh(u)=a] ⊗ ′ ′′ ≃ ′ ′′ ⊗ (POINT ) (POINT ) I M ,W ,u M ,W ,u I , (279) AA′ A′′ ⊗ a ≃ε a ⊗ BB′ B′′ PAULI PAULI where M( ,W) = ∑ M( ,W). From Lemma A.19 we get that [gh(u)=a] h:gh(u)=a h

(POINT ) ( ) M ,W ,u I I M˜ W,indm u . (280) a ⊗ BB′ B′′ ≃δS AA′A′′ ⊗ a Using Fact 5.23 with Equations (278), (279), and (280), we get

(PAULI,W) ˜ W,indm(u) M IBB B √δ IAA A Ma , (281) [gh(u)=a] ⊗ ′ ′′ ≃ S ′ ′′ ⊗ where we used that δ ε. Thus we have for all u Fm S ≥ ∈ q

W,indm(u) † W W VB M˜ a VB = IBB (τ (indm(u)))B = IBB (τ )B , (282) ′ ⊗ a ′′ ′ ⊗ [gh(u)=a] ′′ where we used the fact that for all h F , the inner product h ind (u) is equal to g (u), the evaluation ∈ q · m h of the low-degree encoding of h at the point u. (PAULI,W) Let ∆ AUX EPR M. We show that † W (the = q ⊗ VA Mh VA δQLD IAA′ (τh )A′′ | i | i ⊗ | i AA′ A′′ ≈ ⊗ argument for the operators acting on registers BB′B′′ proceeds identically).

† (PAULI,W) † W (PAULI,W) † W ∑ ∆ VA M VA (τh )A VA M VA (τh )A ∆ h | [gh(u)=a] − ′′ [gh(u)=a] − ′′ | i h     ∆ (PAULI,W) † W ∆ = 2 2 ∑ VA Mh VA (τh )A′′ − ℜ h h | | i ∆ (PAULI,W) † W ∆ (283) = 2 2 ∑ VA Mh VA (τh )B′′ − h h | ⊗ | i

199 where in the last line we used that (τW ) EPR M =(τW) EPR M. We now claim that h A′′ | qi⊗ h B′′ | qi⊗ ∆ (PAULI,W) † W ∆ ∑ VA Mh VA (τh )B′′ h h | ⊗ | i E (PAULI,W) † W md/q ∑ ∆ VA M VA (τh )B ∆ (284) ≈ u h | h ⊗ ′ ′′ | i h,h′: gh (u)=g (u) h′ (PAULI,W) † W = E ∑ ∆ VA M VA (τ )B ∆ [gh(u)=a] [gh(u)=a] ′′ u a F h | ⊗ | i ∈ q (PAULI,W) † W,indm(u) † = E ∑ ∆ VA M VA VB M˜ a VB ∆ (285) [gh(u)=a] u a F h | ⊗ | i ∈ q where the equality in Equation (285) follows from the identity in Equation (282), and the approximation in Equation (284) follows from the following calculation:

E (PAULI,W) † W ∑ ∆ VA M VA (τh )B ∆ u h ′ ′′ h=h : h | ⊗ | i 6 ′ gh(u)=g (u) h′ E 1 (PAULI,W) † W = ∑ [gh(u)= gh (u)] ∆ VA M VA (τh )B ∆ u ′ h ′ ′′ h=h h | ⊗ | i 6 ′   md . ≤ q

In the last line we used that for for distinct h = h , the polynomials g = g and thus by the Schwartz- 6 ′ h 6 h′ Zippel lemma can only agree on at most md/q fraction of points u Fm. ∈ q Continuing from Equation (285), we use the first item of the Lemma which shows that ∆ ϑ 2 k| i − | ik ≤ O(√δ ) and equivalently V† V† ∆ ψˆ 2 O(√δ ), so S k A ⊗ B| i − | ik ≤ S

(PAULI,W) W,indm(u) Eq. (285) 1/4 E ∑ ψˆ M M˜ a ψˆ O(δS ) [gh(u)=a] ≈ u a F h | ⊗ | i ∈ q 1 O(δ1/2) , ≥ − S where the last line comes from (281). Together with Equation (283),

Eq. (283) O(δ1/4 + md/q) . ≤ S This concludes the proof of the second item of the lemma.

We now prove Theorem A.1.

S GPAULI Proof of Theorem A.1. Let = (ψ, M) be a projective strategy for qldparams that succeeds with prob- ability at least 1 ε, where ψ is a bipartite state on registers AB. As mentioned at the beginning of − | i Appendix A.2, we assume that the state ψ is padded with sufficiently many ancilla 0 qubits. Let φA be | i | i the following isometry mapping the register A to the registers AA′A′′ where A′ and A′′ are isomorphic to q M (C )⊗ : M φA : θ A VA( θ A ( EPR ⊗ ) ) | i 7→ | i ⊗ | qi A′ A′′

200 where VA is the unitary acting on AA′A′′ from Lemma A.21. Define the isometry φB analogously. Lemma A.21 then implies that there exists a state AUX on registers AA′BB such that | i ′ 2 M φA φB ψ AUX EPR ⊗ δQLD ⊗ | i − | i ⊗ | qi ≤ and for W X, Z , ∈ { } (PAULI,W) † W (PAULI,W) † W φA (M )A φ (τ ) and φB (M )B φ (τ ) h A ≈δQLD h A′′ h B ≈δQLD h B′′ M GPAULI where the statement holds with respect to the state AUX EPRq ⊗ . This shows that qldparams is ≈ PAULI | i ⊗ | i a self-test for the strategy S with robustness δQLD(ε, m, d, q). We conclude the proof by making the identification of registers AA′ with A′ and BB′ with B′ to be consistent with the statement of Theorem A.1. We also note that, when unpacking the dependence on parameters ε, m, d, q, the function δQLD(ε, m, d, q)= 1/4 a b b bmd > < < O(δS + md/q) has the form a(md) (ε + q− + 2− ) for some universal constants a 1, 0 b 1, as desired.

201 References

[Ara02] PK Aravind. A simple demonstration of Bell’s theorem involving two observers and no prob- abilities or inequalities. arXiv preprint quant-ph/0206070, 2002. 7.2, 7.2

[AS98] Sanjeev Arora and Shmuel Safra. Probabilistic checking of proofs: A new characterization of NP. Journal of the ACM (JACM), 45(1):70–122, 1998. 16

[BFL91] L´aszl´oBabai, Lance Fortnow, and Carsten Lund. Non-deterministic exponential time has two-prover interactive protocols. Computational complexity, 1(1):3–40, 1991. 1.1, 1.1, 1.2, 16, 7.1.1

[BSGH+06] Eli Ben-Sasson, Oded Goldreich, Prahladh Harsha, Madhu Sudan, and Salil Vadhan. Robust PCPs of proximity, shorter PCPs, and applications to coding. SIAM Journal on Computing, 36(4):889–974, 2006. 2

[BSS05] Eli Ben-Sasson and Madhu Sudan. Simple PCPs with poly-log rate and query complexity. In Proceedings of the thirty-seventh annual ACM symposium on Theory of computing, pages 266–275. ACM, 2005. 2

[BSS08] Eli Ben-Sasson and Madhu Sudan. Short PCPs with polylog query complexity. SIAM Journal on Computing, 38(2):551–607, 2008. 10.4.1

[BVY17] Mohammad Bavarian, Thomas Vidick, and Henry Yuen. Hardness amplification for entangled games via anchoring. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 303–316. ACM, 2017. 2, 11, 11.2, 11.2

[CHSH69] John F Clauser, Michael A Horne, Abner Shimony, and Richard A Holt. Proposed experiment to test local hidden-variable theories. Physical review letters, 23(15):880, 1969. 1.2

[CHTW04] Richard Cleve, Peter Hoyer, Benjamin Toner, and John Watrous. Consequences and limits of nonlocal strategies. In Proceedings. 19th IEEE Annual Conference on Computational Com- plexity, 2004., pages 236–249. IEEE, 2004. 1.2

[CLP15] Valerio Capraro, Martino Lupini, and Vladimir Pestov. Introduction to sofic and hyperlinear groups and Connes’ embedding conjecture, volume 2136. Springer, 2015. 1.4

[CM14] Richard Cleve and Rajat Mittal. Characterization of binary constraint system games. In Inter- national Colloquium on Automata, Languages, and Programming, pages 320–331. Springer, 2014. 7.2

[Col19] Andrea Coladangelo. A two-player dimension witness based on embezzlement, and an elementary proof of the non-closure of the set of quantum correlations. arXiv preprint arXiv:1904.02350, 2019. 1.3

[Con76] Alain Connes. Classification of injective factors cases II1, II∞, IIIλ, λ = 1. Annals of Mathematics, pages 73–115, 1976. 1, 1.3, 1.3 6

[CS18] Andrea Coladangelo and Jalex Stark. Unconditional separation of finite and infinite- dimensional quantum correlations. arXiv preprint arXiv:1804.05116, 2018. 1.2, 1.3

202 [DLTW08] Andrew C Doherty, Yeong-Cherng Liang, Ben Toner, and Stephanie Wehner. The quantum moment problem and bounds on entangled multi-prover games. In 2008 23rd Annual IEEE Conference on Computational Complexity, pages 199–210. IEEE, 2008. 1.3, 1.4, 12.3

[DPP19] Ken Dykema, Vern I Paulsen, and Jitendra Prakash. Non-closure of the set of quantum corre- lations via graphs. Communications in Mathematical Physics, 365(3):1125–1142, 2019. 1.3

[DSV15] Irit Dinur, David Steurer, and Thomas Vidick. A parallel repetition theorem for entangled projection games. Computational Complexity, 24(2):201–254, 2015. 11, 33

[Eke91] Artur K Ekert. Quantum cryptography based on bells theorem. Physical review letters, 67(6):661, 1991. 1.2

[FJVY19] Joseph Fitzsimons, Zhengfeng Ji, Thomas Vidick, and Henry Yuen. Quantum proof systems for iterated exponential time, and beyond. In Proceedings of the 51st Annual ACM SIGACT Symposium on Theory of Computing, pages 473–480. ACM, 2019. 1.2, 1.2, 1.3, 1.3

[FL92] Uriel Feige and L´aszl´oLov´asz. Two-prover one-round proof systems: Their power and their problems. In Proceedings of the twenty-fourth annual ACM symposium on Theory of comput- ing, pages 733–744. ACM, 1992. 1.1

[FNT14] Tobias Fritz, Tim Netzer, and Andreas Thom. Can you compute the operator norm? Proceed- ings of the American Mathematical Society, 142(12):4265–4276, 2014. 1.3, 10, 1.4

[Fri12] Tobias Fritz. Tsirelson’s problem and Kirchberg’s conjecture. Reviews in Mathematical Physics, 24(05):1250012, 2012. 1, 1.3, 1.4

[GH16] Isaac Goldbring and Bradd Hart. Computability and the Connes embedding problem. Bulletin of Symbolic Logic, 22(2):238–248, 2016. 1.3

[GH20] Isaac Goldbring and Bradd Hart. The universal theory of the hyperfinite II1 factor is not computable. arXiv preprint arXiv:2006.05629, 2020. 1.3

[Har04] Prahladh Harsha. Robust PCPs of proximity and shorter PCPs. PhD thesis, Massachusetts Institute of Technology, 2004. 10.4

[HS65] Juris Hartmanis and Richard E Stearns. On the computational complexity of algorithms. Transactions of the American Mathematical Society, 117:285–306, 1965. 3.1, 3.1

[IKM09] Tsuyoshi Ito, Hirotada Kobayashi, and Keiji Matsumoto. Oracularization and two-prover one- round interactive proofs against nonlocal strategies. In 2009 24th Annual IEEE Conference on Computational Complexity, pages 217–228. IEEE, 2009. 1.2

[IV12] Tsuyoshi Ito and Thomas Vidick. A multi-prover interactive proof for NEXP sound against en- tangled provers. In 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, pages 243–252. IEEE, 2012. 1.2

[Ji16] Zhengfeng Ji. Classical verification of quantum proofs. In Proceedings of the forty-eighth annual ACM symposium on Theory of Computing, pages 885–898. ACM, 2016. 1.2

203 [Ji17] Zhengfeng Ji. Compression of quantum multi-prover interactive proofs. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 289–302. ACM, 2017. 1.2, 1.2, 1.3

[JNP+11] Marius Junge, Miguel Navascues, Carlos Palazuelos, David Perez-Garcia, Volkher B Scholz, and Reinhard F Werner. Connes’ embedding problem and Tsirelson’s problem. Journal of Mathematical Physics, 52(1):012102, 2011. 1, 1.3, 1.4

[JNV+20] Zhengfeng Ji, Anand Natarajan, Thomas Vidick, John Wright, and Henry Yuen. Quantum soundness of the classical low individual degree test. arXiv preprint arXiv:2009.12982, 2020. 1.2, 5.23, 7, 7.1.1, A.1

[JPY14] Rahul Jain, Attila Pereszl´enyi, and Penghui Yao. A parallel repetition theorem for entangled two-player one-round games under product distributions. In 2014 IEEE 29th Conference on Computational Complexity (CCC), pages 209–216. IEEE, 2014. 11

[KKM+11] Julia Kempe, Hirotada Kobayashi, Keiji Matsumoto, Ben Toner, and Thomas Vidick. En- tangled games are hard to approximate. SIAM Journal on Computing, 40(3):848–877, 2011. 1.2

[Kle54] Stephen C. Kleene. Introduction to Metamathematics. Journal of Symbolic Logic, 19(3):215– 216, 1954. 15, 35

[KPS18] Se-Jin Kim, Vern Paulsen, and Christopher Schafhauser. A synchronous game for binary constraint systems. Journal of Mathematical Physics, 59(3):032201, 2018. 1.2, 1.4, 5.12

[KV11] Julia Kempe and Thomas Vidick. Parallel repetition of entangled games. In Proceedings of the 43rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2011, pages 353–362, 2011. A.1

[LFKN90] Carsten Lund, Lance Fortnow, Howard Karloff, and Noam Nisan. Algebraic methods for inter- active proof systems. In Proceedings of 31st Annual Symposium on Foundations of Computer Science, pages 2–10. IEEE, 1990. 1.1

[LJ91] H.W. Lenstra Jr. Finding isomorphisms between finite fields. Mathematics of Computation, 56(193):329–347, 1991. 3.3.2

[Mer90] David Mermin. Simple unified form for the major no-hidden-variables theorems. Physical Review Letters, 65(27):3373, 1990. 7.2

[MNY20] Hamoon Mousavi, Seyed Sajjad Nezhadi, and Henry Yuen. On the complexity of zero gap MIP∗. arXiv preprint arXiv:2002.10490, 2020. 1.4 [MP13] Gary L Mullen and Daniel Panario. Handbook of finite fields. Chapman and Hall/CRC, 2013. 3.3

[MR18] Magdalena Musat and Mikael Rørdam. Non-closure of quantum correlation matrices and fac- torizable channels that require infinite dimensional ancilla. arXiv preprint arXiv:1806.10242, 2018. 1.3

204 [NPA08] Miguel Navascu´es, Stefano Pironio, and Antonio Ac´ın. A convergent hierarchy of semidef- inite programs characterizing the set of quantum correlations. New Journal of Physics, 10(7):073013, 2008. 1, 1, 1.3, 1.4, 12.3

[NV17] Anand Natarajan and Thomas Vidick. A quantum linearity test for robustly verifying entangle- ment. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2017, pages 1003–1015, New York, NY, USA, 2017. ACM. A.4

[NV18a] Anand Natarajan and Thomas Vidick. Low-degree testing for quantum states, and a quantum entangled games PCP for QMA. In 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS), pages 731–742. IEEE, 2018. 1.2, 1.2, 2, 7, 7.1.1, 7.3

[NV18b] Anand Natarajan and Thomas Vidick. Two-player entangled games are NP-hard. In Proceed- ings of the 33rd Computational Complexity Conference, page 20. Schloss Dagstuhl–Leibniz- Zentrum fuer Informatik, 2018. 1.2, 2, 16

[NW19] Anand Natarajan and John Wright. NEEXP MIP∗. arXiv preprint arXiv:1904.05870v3, 2019. 1.2, 1.2, 2, 2, 2, 2, 5.1, 5.2, 5.18, 5.19, ⊆5.22, 5.24, 5.2, 5.2, 5.27, 7.1.1, 7.3, 7.3.1, 8.1, 8.4, 8.4.1, 8.4.1, 8.4.3, 8.4.3, 10, 1, 10.4, 2, 3, 32, A.2

[Oza13] Narutaka Ozawa. About the Connes embedding conjecture. Japanese Journal of Mathematics, 8(1):147–183, 2013. 1.3, 1.4

[Pap94] Christos Papadimitriou. Computational complexity. Addison Wesley, 1994. 3.1, 10.2

[Per90] Asher Peres. Incompatible results of quantum measurements. Physics Letters A, 151(3- 4):107–108, 1990. 7.2

[PSS+16] Vern I Paulsen, Simone Severini, Daniel Stahlke, Ivan G Todorov, and Andreas Winter. Es- timating quantum chromatic numbers. Journal of Functional Analysis, 270(6):2188–2222, 2016. 1.2, 5.12

[PV16] Carlos Palazuelos and Thomas Vidick. Survey on nonlocal games and operator space theory. Journal of Mathematical Physics, 57(1):015220, 2016. 7

[Rog87] Hartley Rogers, Jr. Theory of Recursive Functions and Effective Computability. MIT Press, Cambridge, MA, USA, 1987. 15

[RS96] Ronitt Rubinfeld and Madhu Sudan. Robust characterizations of polynomials with applica- tions to program testing. SIAM Journal on Computing, 25(2):252–271, 1996. 16

[RS97] Ran Raz and Shmuel Safra. A sub-constant error-probability low-degree test, and a sub- constant error-probability PCP characterization of NP. In Proceedings of the twenty-ninth annual ACM symposium on Theory of computing, pages 475–484, 1997. 16

[Sch80] Jacob Schwartz. Fast probabilistic algorithms for verification of polynomial identities. Journal of the ACM, 27(4):701–717, 1980. 3.22

[Sha90] Adi Shamir. IP = PSPACE. In Proceedings of 31st Annual Symposium on Foundations of Computer Science, pages 11–15. IEEE, 1990. 1.1

205 [Sho90] Victor Shoup. New algorithms for finding irreducible polynomials over finite fields. Mathe- matics of Computation, 54(189):435–447, 1990. 3.3.2

[Slo19a] William Slofstra. The set of quantum correlations is not closed. In Forum of Mathematics, Pi, volume 7. Cambridge University Press, 2019. 1, 1.3, 1.4

[Slo19b] William Slofstra. Tsirelsons problem and an embedding theorem for groups arising from non- local games. Journal of the American Mathematical Society, 2019. 1.3

[Tsi93] Boris S Tsirelson. Some results and problems on quantum bell-type inequalities. Hadronic Journal Supplement, 8(4):329–345, 1993. 1.3

[Tsi06] Boris S Tsirelson. Bell inequalities and operator algebras, 2006. Problem state- ment from website of open problems at TU Braunschweig (2006), available at http://web.archive.org/web/20090414083019/http://www.imaph.tu-bs.de/qi/problems/33.html. 1, 1.3

[Tur37] Alan M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London mathematical society, 2(1):230–265, 1937. 3.1

[Vid16] Thomas Vidick. Three-player entangled xor games are -hard to approximate. SIAM Journal on Computing, 45(3):1007–1063, 2016. 1.2, 7.1.1

[VW16] Thomas Vidick and John Watrous. Quantum proofs. Foundations and Trends in Theoretical Computer Science, 11(1-2):1–215, 2016. 12.2

[Wan89] Charles C. Wang. An algorithm to design finite field multipliers using a self-dual normal basis. IEEE Transactions on Computers, 38(10):1457–1460, 1989. 3.3.2

[WBMS16] Xingyao Wu, Jean-Daniel Bancal, Matthew McKague, and Valerio Scarani. Device- independent parallel self-testing of two singlets. Physical Review A, 93(6):062121, 2016. 7.2

[Yue16] Henry Yuen. A parallel repetition theorem for all entangled games. In 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016). Schloss Dagstuhl- Leibniz-Zentrum fuer Informatik, 2016. 11

[Zip79] Richard Zippel. Probabilistic algorithms for sparse polynomials. In Symbolic and Algebraic Computation, pages 216–226, 1979. 3.22

206