Reconstruction of Generalized Depth-3 Arithmetic Circuits with Bounded Top Fan-in

Zohar S. Karnin∗ Amir Shpilka∗

Abstract

In this paper we give reconstruction algorithms for depth-3 arithmetic circuits with k multiplication gates (also known as ΣΠΣ(k) circuits), where k = O(1). Namely, we give an algorithm that, when given a black box holding a ΣΠΣ(k) circuit C over a field F as input, makes queries to the black box (possibly over a polynomial sized extension field of F) and outputs a circuit C′ computing the same polynomial as C. In particular we obtain the following results.

1. When C is a multilinear ΣΠΣ(k) circuit (i.e. each of its multiplication gates computes a multilinear polynomial), our algorithm runs in polynomial time (when k is a constant) and outputs a multilinear ΣΠΣ(k) circuit computing the same polynomial.

2. In the general case, our algorithm runs in quasi-polynomial time and outputs a generalized depth-3 circuit (a notion defined in the paper) with k multiplication gates. In fact, this algorithm works in the slightly more general case where the black box holds a generalized depth-3 circuit.

Prior to this work there were reconstruction algorithms for several different models of bounded-depth circuits: the well studied class of depth-2 arithmetic circuits (that compute sparse polynomials) and the closely related model of depth-3 set-multilinear circuits. For the class of depth-3 circuits only the case of k = 2 (i.e. ΣΠΣ(2) circuits) was known.

Our proof technique combines ideas from [Shp07] and [KS08] with some new ideas. Our most notable new ideas are the following. We prove the existence of a unique canonical representation of depth-3 circuits; this enables us to work with a specific representation in mind. Another technical contribution is an isolation lemma for depth-3 circuits that enables us to reconstruct a single multiplication gate of the circuit.

∗Faculty of Computer Science, Technion, Haifa 32000, Israel. Email: {zkarnin,shpilka}@cs.technion.ac.il. Research supported by the Israel Science Foundation (grant number 439/06).

Contents

1 Introduction
  1.1 Depth-3 circuits
  1.2 Statement of our results
  1.3 Related works
  1.4 Our Techniques
  1.5 Organization

2 Preliminaries
  2.1 Generalized Depth-3 Arithmetic Circuits
  2.2 Rank Preserving Subspaces
    2.2.1 Rank Preserving Subspaces that Preserve Multilinearity
  2.3 A "Distance Function" for ΣΠΣ Circuits

3 Canonical κ-Distant Circuits
  3.1 Proof of Existence
  3.2 Uniqueness

4 Reconstructing ΣΠΣ(k, d, ρ) Circuits
  4.1 Finding a κ-Distant Representation in a Low Dimension Subspace
    4.1.1 Step 1: Finding the set of subspaces
    4.1.2 Steps 2 & 3: Gluing the Restrictions Together
    4.1.3 The Algorithm for finding a κ-distant circuit
  4.2 Gluing Together the Restrictions of a Low Rank Circuit
  4.3 The Reconstruction Algorithm

5 Reconstructing Multilinear ΣΠΣ Circuits
  5.1 Finding a Partition of the Circuit in a Low Dimension Subspace
  5.2 Lifting a Low Rank Multilinear ΣΠΣ(k) Circuit
  5.3 The Algorithm

A Toolbox
  A.1 Finding Linear Factors
  A.2 Brute Force Interpolation
  A.3 Reconstructing Linear Functions
  A.4 A Deterministic PIT Algorithm for Depth-3 Circuits

B Proof of Lemma 4.9
  B.1 Proof of Lemma B.1
    B.1.1 Notations
    B.1.2 The Proof

List of Algorithms

1 Canonical partition of a circuit
2 Reconstructing a circuit given a depth-1 m-linear function
3 Reconstructing a circuit given an m-linear function tree
4 Finding the ξ_d(k)-distant circuit of a polynomial
5 Gluing together low dimension restrictions of a low rank circuit
6 Learning a ΣΠΣ(k, d, ρ) circuit
7 Lifting the g.c.d. of a low ∆-measure circuit
8 Lifting a low rank multilinear circuit to F^n
9 Lifting a multilinear circuit to F^n
10 Reconstruction of a multilinear ΣΠΣ(k) circuit
11 Brute force interpolation

1 Introduction

In this work we consider the problem of reconstructing an arithmetic circuit to which we have oracle access: we get a black box holding an arithmetic circuit C belonging to some predetermined class of circuits, and we wish to construct a circuit C′ that computes the same polynomial as C. Our only access to the polynomial computed by C is via black-box queries. That is, we are allowed to pick inputs (adaptively) and query the black box for the value of C on those inputs.

The problem of reconstructing an arithmetic circuit using queries is an important question in algebraic complexity that has received a lot of attention. As the problem is notoriously difficult, research has focused on restricted models such as depth-2 arithmetic circuits (circuits computing sparse polynomials), which were studied over various fields (see e.g. [BOT88, KS01] and the references within), read-once arithmetic formulas [HH91, BHH95, BB98, SV08], and set-multilinear depth-3 circuits [BBB+00, KS06a].

The focus of this work is the class of depth-3 arithmetic circuits that have a bounded number of multiplication gates, over some finite field F. We give two different algorithms for this model. The first is a polynomial time reconstruction algorithm for the case that the circuit is multilinear. In the general case, without the multilinearity assumption, we give a quasi-polynomial time reconstruction algorithm. These results extend and generalize a previous work by the second author that gave reconstruction algorithms for depth-3 circuits with only 2 multiplication gates [Shp07].

1.1 Depth-3 circuits

In this paper we study the model of depth-3 arithmetic circuits. We only consider circuits with a plus gate at the top. Notice that for depth-3 circuits with a multiplication gate at the top, reconstruction is possible using known factoring algorithms [Kal85, KT90, Kal95]. A depth-3 circuit with a plus gate at the top is also known as a ΣΠΣ circuit and has the following structure:

C = Σ_{i=1}^{k} M_i = Σ_{i=1}^{k} Π_{j=1}^{d_i} L_{i,j}(x_1, . . . , x_n),

where the L_{i,j}'s are linear functions. The gates {M_i}_{i=1}^{k} are called the multiplication gates of the circuit and k is the fan-in of the top plus gate. We call a depth-3 circuit with top fan-in k a ΣΠΣ(k) circuit.

Although depth-3 circuits seem to be a very restricted model of computation, understanding them is one of the greatest challenges in arithmetic circuit complexity. It is the first model for which lower bounds are difficult to prove. Over finite fields exponential lower bounds were proved [GK98, GR00], but over fields of characteristic zero only quadratic lower bounds are known [SW01, Shp02]. Moreover, Agrawal and Vinay recently showed¹ that exponential lower bounds for depth-4 circuits imply exponential lower bounds for general arithmetic circuits [AV08]. In addition they showed that a polynomial time black-box PIT algorithm for depth-4 circuits implies a (quasi-polynomial time) derandomization of PIT for general arithmetic circuits. As most questions about depth-2 circuits are quite easy, depth-3 circuits stand between the relatively easy (depth-2) and the very difficult general case (depth-4). Hence it is an important goal to get a good understanding of depth-3 circuits.

¹This result also follows from the earlier works of [VSBR83, Raz08].
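Throughout, the only access the reconstruction algorithms have to C is evaluation at chosen points. As a concrete illustration, here is a small Python sketch (with an assumed coefficient-vector representation of linear functions; the helper names are ours, not the paper's) of evaluating a ΣΠΣ(k) circuit, i.e. of answering a single black-box query:

```python
# Assumed representation (ours): a SigmaPiSigma(k) circuit is a list of
# multiplication gates; each gate is a list of linear functions, and each
# linear function is its coefficient vector (a_0, a_1, ..., a_n), computing
# a_0 + a_1*x_1 + ... + a_n*x_n.
from math import prod

def eval_linear(coeffs, point):
    """Evaluate a_0 + a_1*x_1 + ... + a_n*x_n at `point`."""
    return coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], point))

def eval_sps(gates, point):
    """Evaluate C = sum_i prod_j L_{i,j} at `point` -- one black-box query."""
    return sum(prod(eval_linear(L, point) for L in gate) for gate in gates)

# Example: C = (x1 + x2)*(x1 - x2) + (2*x1)*x2, a SigmaPiSigma(2) circuit.
C = [[(0, 1, 1), (0, 1, -1)], [(0, 2, 0), (0, 0, 1)]]
# At (x1, x2) = (3, 2): 5*1 + 6*2 = 17.
assert eval_sps(C, (3, 2)) == 17
```

The algorithms in this paper make such queries adaptively, possibly at points of an extension field.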

1.2 Statement of our results

We give two reconstruction algorithms: the first for general ΣΠΣ(k) circuits and the second for multilinear ΣΠΣ(k) circuits. While in the case of multilinear depth-3 circuits our algorithm returns a multilinear ΣΠΣ(k) circuit, in the general case we return what we call a generalized depth-3 circuit. A polynomial f(x̄) that is computable by a generalized depth-3 circuit has the following form:

f(x̄) = Σ_{i=1}^{k} M_i = Σ_{i=1}^{k} ( Π_{j=1}^{d_i} L_{i,j}(x̄) ) · h_i( L̃_{i,1}(x̄), . . . , L̃_{i,ρ_i}(x̄) )        (1)

where the L_{i,j}'s and the L̃_{i,j}'s are linear functions in the variables x̄ = (x_1, . . . , x_n), over F. Every h_i is a polynomial in ρ_i ≤ ρ variables, and the functions {L̃_{i,j}}_{j=1}^{ρ_i} are linearly independent. We shall assume, w.l.o.g., that each h_i depends on all its ρ_i variables. We call M_1, . . . , M_k the multiplication gates of the circuit (M_i = Π_{j=1}^{d_i} L_{i,j} · h_i). The degree of the circuit, d = deg(C), is defined as the maximal degree of its multiplication gates (i.e. d = max_{i=1...k}{deg(M_i)}). A degree d generalized depth-3 circuit with k multiplication gates is called a ΣΠΣ(k, d, ρ) circuit. The size of a ΣΠΣ(k, d, ρ) circuit is defined as the sum of the degrees of its multiplication gates (thus, the size of the circuit given by Equation (1) is Σ_{i=1}^{k} deg(M_i) ≤ dk). We denote the size of a circuit C by size(C).

When ρ = 0 (i.e. each h_i is a constant function) we get the class of depth-3 circuits with k multiplication gates and degree d, also known as ΣΠΣ(k, d) circuits (similarly we have the class of ΣΠΣ(k) circuits: depth-3 circuits with k multiplication gates). When k and d are arbitrary we get the class of depth-3 circuits, which we denote by ΣΠΣ. Notice that the difference between generalized ΣΠΣ(k) circuits and ΣΠΣ(k) circuits is the non-linear part h_i of the multiplication gates. Also note that ΣΠΣ(k, d, ρ) circuits can be simulated by depth-3 circuits with k · d^ρ multiplication gates.
Thus, when the ρ_i's are polylogarithmic in n (as in this paper), a ΣΠΣ(k, d, ρ) circuit can be simulated by a depth-3 circuit with quasi-polynomially many gates. Our result for reconstructing general ΣΠΣ(k) circuits actually holds for generalized depth-3 circuits. Namely, our reconstruction algorithm, when given black-box access to a ΣΠΣ(k, d, ρ) circuit, returns a ΣΠΣ(k, d, ρ′) circuit computing the same polynomial. Therefore, instead of stating our results for ΣΠΣ(k) circuits we state them in their most general form.
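To make the shape of Equation (1) concrete, here is a hedged Python sketch of evaluating one generalized multiplication gate; the representation (coefficient vectors for linear functions, an arbitrary callable standing in for h_i) and all names are our own illustrative choices, not the paper's:

```python
# A sketch (ours) of one generalized multiplication gate from Equation (1):
# M_i = (prod_j L_{i,j}) * h_i(Ltilde_{i,1}, ..., Ltilde_{i,rho_i}).
# Linear functions are coefficient vectors (a_0, a_1, ..., a_n); `h` is any
# Python callable standing in for the rho_i-variate polynomial h_i.
from math import prod

def eval_linear(coeffs, point):
    return coeffs[0] + sum(a * x for a, x in zip(coeffs[1:], point))

def eval_generalized_gate(lin_factors, h, h_inputs, point):
    linear_part = prod(eval_linear(L, point) for L in lin_factors)
    return linear_part * h(*(eval_linear(Lt, point) for Lt in h_inputs))

# Example with rho_i = 1: M = x1 * x2 * (L^2 + 1) where L = x1 + x2.
# Note h(t) = t^2 + 1 has no linear factors over the rationals, as required.
M = lambda p: eval_generalized_gate(
    [(0, 1, 0), (0, 0, 1)],   # linear factors: x1, x2
    lambda t: t * t + 1,      # the non-linear term h_i
    [(0, 1, 1)],              # Ltilde_{i,1} = x1 + x2
    p,
)
assert M((1, 2)) == 20        # (1*2) * (3^2 + 1)
```

With ρ = 0 the callable degenerates to a constant and the gate is an ordinary ΣΠΣ multiplication gate.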

Theorem 1. Let f be an n-variate polynomial computed by a ΣΠΣ(k, d, ρ) circuit over a field F. Then there is a reconstruction algorithm for f that runs in time poly(n) · exp(log(|F|) · log^{O(k³)}(d) · ρ^{O(k)}), and outputs a ΣΠΣ(k, d, ρ′) circuit for f (ρ′ is equal to ρ up to an additive polylogarithmic factor). When |F| = O(d⁵) the algorithm is allowed to make queries to f from a polynomial size algebraic extension field of F.

Our second result deals with the case of multilinear ΣΠΣ(k) circuits. A multilinear circuit is a circuit in which every multiplication gate computes a multilinear polynomial.

Theorem 2. Let f be a multilinear polynomial in n variables that is computed by a multilinear ΣΠΣ(k) circuit over a field F. Then there is a reconstruction algorithm for f that runs in time (n + |F|)^{2^{O(k²)}} and outputs a multilinear ΣΠΣ(k) circuit computing f. When |F| < n⁵ the algorithm is allowed to make queries to f from a polynomial size algebraic extension field of F.

1.3 Related works

We are aware of two reconstruction algorithms for restricted depth-3 circuits. The first, by [BBB+00, KS06a], gives a reconstruction algorithm for set-multilinear depth-3 circuits. Although being depth-3

circuits, set-multilinear circuits are closer in nature to depth-2 circuits and in fact it is better to think of them as a generalization of depth-2 circuits. In particular, the techniques used to reconstruct set-multilinear circuits also work when the input is a depth-2 circuit, but completely fail even for the case of multilinear ΣΠΣ(2) circuits (see [Shp07] for an explanation). The second related result is by the second author, who gave a reconstruction algorithm for ΣΠΣ(2) circuits [Shp07]. Our work uses ideas from this earlier work together with some new techniques to get the more general result.

Another line of results related to this paper is work on algorithms for polynomial identity testing (PIT for short) of depth-3 circuits. In [DS06] a quasi-polynomial time non-black-box PIT algorithm was given for ΣΠΣ(k) circuits (for a constant k). This result was later improved by [KS07], which gave a polynomial time algorithm for ΣΠΣ(k) circuits (again for a constant k). In [KS08] we managed to generalize the results of [DS06] to the case of black-box algorithms. Unfortunately, this is the best black-box PIT algorithm for ΣΠΣ(k) circuits today; i.e., no polynomial time algorithm is known in the black-box case. We note that black-box algorithms for the polynomial identity testing problem are closely related to reconstruction algorithms, as they give a set of inputs such that the values of the circuit on those inputs completely determine the circuit, and thus provide sufficient information for reconstructing the circuit. The main problem, of course, is coming up with an efficient algorithm that will use this information to reconstruct the circuit. Such algorithms are known for depth-2 circuits, where several PIT algorithms were used for reconstruction (see e.g. [KS01] and the references within). We are unaware of any generic way of doing so (namely, moving from PIT to reconstruction).
This is exactly the reason that the black-box PIT algorithms of [KS01] for ΣΠΣ(2) circuits and of [KS08] for ΣΠΣ(k) circuits do not immediately imply reconstruction algorithms.

The problem of reconstructing arithmetic circuits is also closely related to the problem of learning arithmetic circuits. In the learning scenario the focus is on learning arithmetic circuits over small fields (usually over F₂). Moreover, we are usually satisfied with outputting an approximation to the unknown arithmetic circuit. For example, it is not difficult to see that the learning algorithm of Kushilevitz and Mansour [KM93] can learn ΣΠΣ(k) circuits over constant sized fields, for k = O(log n), in polynomial time. However, the model that we are considering is different: we are interested in larger fields, in particular we allow queries from extension fields, and we want to output the exact same polynomial. This difference is prominent when considering interpolation (i.e. reconstruction) algorithms; see e.g. the discussion in [KS01].

Other related works include hardness results for learning arithmetic circuits. In [FK06] Fortnow and Klivans showed that polynomial time reconstruction of arithmetic circuits implies a lower bound for the class ZPEXP^RP. However, for ΣΠΣ(k) circuits this does not give much, as it is easy to think of polynomials that are not computed in this model. Another related result was given in [KS06b], where Klivans and Sherstov showed hardness results for PAC learning depth-3 arithmetic circuits. Namely, they show that there is no efficient PAC learning algorithm for depth-3 arithmetic circuits that works for every distribution. It is unclear, however, whether this result can be extended to show a hardness result for a membership-query algorithm over the uniform distribution (like the model under consideration here). Indeed, for the reconstruction problem of arithmetic circuits the uniform distribution is the most relevant model.
Thus, although we believe that the problem of reconstructing general depth-3 circuits is difficult, we do not have sufficiently strong hardness results to support this belief. It is an interesting open question to prove (or disprove) it.

1.4 Our Techniques

Our algorithms use the intuition behind the construction of the test sets of [KS08] (that give a black-box PIT algorithm), although here we need to evaluate the circuit at a much larger set of points. The scheme of the algorithm is similar in nature to the algorithm of [Shp07]; however it is much more

complicated and requires several new ideas, which we now outline. Conceptually we have two main new ideas: the first is the notion of a canonical circuit for depth-3 circuits, and the second is an isolation lemma. We now briefly discuss each of these ideas.

Canonical κ-distant circuits: An important ingredient of our proof is the definition of a canonical κ-distant circuit for a polynomial f computed by a ΣΠΣ(k, d, ρ) circuit. The canonical circuit is also a ΣΠΣ(k, d, ρ) circuit computing f, and it is defined uniquely (see Corollary 3.7). The advantage of this definition is that, since the canonical circuit is unique, if we manage to compute its restriction to some low dimensional subspace then we can "lift" each multiplication gate separately to the whole space and obtain the canonical circuit for f. In other words, if C = M₁ + ... + M_k is the canonical circuit for f, then M₁|_V + ... + M_k|_V is the canonical circuit for f|_V, when V is a rank-preserving subspace (as defined in [KS08]; see Section 2.2). An important step towards the definition of a canonical circuit is the definition of a distance function between multiplication gates (see Section 2.3). We use the distance function to cluster multiplication gates together into one new (generalized) multiplication gate in a way that any two new multiplication gates are far from each other. This robust structure allows us to deal with each multiplication gate separately.

Isolation Lemma: The second new ingredient in our proof is a new isolation lemma that basically shows that for any canonical circuit C = M₁ + ... + M_k there exist an index 1 ≤ m ≤ k and a set of subspaces V = {U_i} such that M_j|_{U_i} = 0 for every j ≠ m and every i. Moreover, it is possible to reconstruct M_m from the set of circuits {M_m|_{U_i}} (we discuss this in more detail in Section 4). The proof of the existence of such a structure is very technical and relies on a generalization of a technical lemma of [DS06].

Finally, we also require an idea that appeared in our previous work [KS08]. There we defined the notion of rank-preserving subspaces and used it to give a deterministic sub-exponential black-box PIT algorithm for ΣΠΣ(k, d, ρ) circuits. Here we show that rank-preserving subspaces can be used to derandomize the reconstruction algorithm of [Shp07]. In particular this makes our algorithm deterministic, whereas the algorithm of [Shp07] was randomized.

Given these tools we manage to follow the algorithmic scheme laid out in [Shp07] to reconstruct a ΣΠΣ(k, d, ρ) circuit for f. A sketch of the scheme is: first we restrict our inputs to a rank-preserving subspace V. Then, using the isolation lemma, we reconstruct the multiplication gates of the canonical circuit of f|_V. After that we further reconstruct f|_{V_i} (for i ∈ [n]), where V is of co-dimension 1 inside each V_i and span(∪_{i=1}^{n} V_i) = F^n. Then we use the uniqueness of the canonical circuit for f to glue the different circuits {f|_{V_i}} into a single circuit for f. A more detailed sketch can be found in Section 4.

1.5 Organization

The paper is organized as follows. In Section 2 we give some definitions and discuss some easy properties of restrictions of linear functions to affine subspaces. We then describe the results of [DS06] regarding identically zero depth-3 circuits. We also discuss the results of [KS08] regarding a deterministic PIT algorithm for depth-3 circuits; specifically, we present the notion of rank-preserving subspaces as given in [KS08]. In Section 3 we define the notion of a κ-distant ΣΠΣ(k, d, ρ) circuit and prove the existence and uniqueness of such a circuit. In Section 4 we give the proof of Theorem 1 and in Section 5 we give the proof of Theorem 2.

2 Preliminaries

For a positive integer k we denote [k] = {1, . . . , k}. A partition of a set S is a set of nonempty subsets of S such that every element of S is in exactly one of these subsets. Let F be a field. We denote by F^n the n-dimensional vector space over F. We shall use the notation x̄ = (x_1, . . . , x_n) to denote the vector of n indeterminates. For two linear functions L_1, L_2 we write L_1 ∼ L_2, or alternatively say that L_1 and L_2 are equivalent, whenever L_1 and L_2 are linearly dependent (that is, αL_1 + βL_2 = 0 for some α, β ∈ F, not both zero).

Let V = V_0 + v_0 ⊆ F^n be an affine subspace, where v_0 ∈ F^n and V_0 ⊆ F^n is a linear subspace. Let L(x̄) be a linear function. We denote by L|_V the restriction of L to V. Assume the dimension of V_0 is t; then L|_V can be viewed as a linear function of t variables in the following way. Let {v_i}_{i∈[t]} be a basis for V_0. For v ∈ V let v = Σ_{i=1}^{t} x_i · v_i + v_0 be its representation according to the basis. Writing L_0 for the homogeneous part of L, we get that

L(v) = Σ_{i=1}^{t} x_i · L_0(v_i) + L(v_0) ≜ L|_V(x_1, . . . , x_t).

In order for L|_V(x_1, . . . , x_t) to be well defined, {v_i}_{i∈[t]} should be chosen in some unique way; we give a definition for a "default" basis later. A linear function L will sometimes be viewed as a vector of n + 1 entries. Namely, the function L(x_1, . . . , x_n) = Σ_{i=1}^{n} α_i · x_i + α_0 corresponds to the vector of coefficients (α_0, α_1, . . . , α_n). Accordingly, we define the span of a set of linear functions of n variables as the span of the corresponding vectors (i.e. as a subspace of F^{n+1}). For an affine subspace V of dimension t, the linear function L|_V can be viewed as a vector of t + 1 entries. Thus, V defines a linear transformation from F^{n+1} to F^{t+1}. For example, let V = V_0 + v_0 be as above, and let {v_i}_{i∈[t]} be a basis for V_0. Let A be the (t + 1) × (n + 1) matrix whose i'th row, for 1 ≤ i ≤ t, is given by v_i and whose (t + 1)'th row is given by v_0; then A represents the required linear transformation. Let L_1, . . . , L_m be a set of linear functions.
We define the span of these linear functions together with 1 (i.e., the constant function) as span_1(L_1, . . . , L_m). For a subspace L of linear functions, we define a measure of its dimension "modulo 1" as the dimension of the subspace obtained by taking the homogeneous parts of its linear functions. We denote this "modulo 1" dimension measure by dim_1(L). For convenience we say that L_1, . . . , L_m are linearly independent_H to indicate that their homogeneous parts are linearly independent.
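The restriction map described above is mechanical; the following Python sketch (our own notation, applying the homogeneous part of L to the basis vectors and the full L to the shift v_0) computes the coefficient vector of L|_V:

```python
# A sketch (ours) of restricting a linear function L to an affine subspace
# V = span{v_1, ..., v_t} + v_0, following the (t+1) x (n+1) matrix view above.
# L is a coefficient vector (a_0, a_1, ..., a_n).

def restrict(L, basis, shift):
    """Return the coefficient vector of L|_V as a function of t variables:
    coefficient i is the homogeneous part of L applied to v_i, and the new
    constant term is the full L evaluated at the shift v_0."""
    a0, a = L[0], L[1:]
    dot = lambda u, w: sum(x * y for x, y in zip(u, w))
    return (a0 + dot(a, shift),) + tuple(dot(a, vi) for vi in basis)

# Example: L = 1 + 2*x1 + 3*x2 + x3 restricted to V = span{v1, v2} + v0.
L = (1, 2, 3, 1)
v1, v2, v0 = (1, 0, 1), (0, 1, 0), (1, 1, 1)
Lv = restrict(L, [v1, v2], v0)          # -> (7, 3, 3)
# Consistency check against direct evaluation at (y1, y2) = (2, 5):
point = tuple(2 * a + 5 * b + c for a, b, c in zip(v1, v2, v0))
assert L[0] + sum(c * x for c, x in zip(L[1:], point)) == Lv[0] + 2 * Lv[1] + 5 * Lv[2]
```

The fixed choice of basis in the sketch mirrors the "default basis" requirement: a different basis would yield a different (though equivalent) coefficient vector for L|_V.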

2.1 Generalized Depth-3 Arithmetic Circuits

We first recall the usual definition of depth-3 circuits. A depth-3 circuit with k multiplication gates of degree d has the following form:

C = Σ_{i=1}^{k} M_i = Σ_{i=1}^{k} Π_{j=1}^{d_i} L_{i,j}(x_1, . . . , x_n)        (2)

where each L_{i,j} is a linear function in the input variables and d = max_{i=1...k}{deg(M_i)}. Recall that we defined a ΣΠΣ(k, d, ρ) circuit (see Equation (1)) to be a circuit of the form

C = Σ_{i=1}^{k} M_i = Σ_{i=1}^{k} ( Π_{j=1}^{d_i} L_{i,j}(x̄) ) · h_i( L̃_{i,1}(x̄), . . . , L̃_{i,ρ_i}(x̄) ).        (3)

We thus see that in a generalized depth-3 circuit the multiplication gates can have an additional term that is a polynomial in at most ρ linear functions. For each M_i (as in Equation (3)), we assume w.l.o.g. that h_i has no linear factors. The following notions will be used throughout this paper.

Definition 2.1. Let C be a ΣΠΣ(k, d, ρ) arithmetic circuit that computes a polynomial as in Equation (3).

1. For every multiplication gate M_i we define Lin(M_i) = Π_{j=1}^{d_i} L_{i,j}(x̄) (we use the notations of Equation (3)). That is, Lin(M_i) is the product of all the linear factors of M_i (recall that h_i has no linear factors). We call h_i the non-linear term of M_i.

2. For each A ⊆ [k], we define C_A(x̄) to be a sub-circuit of C as follows: C_A(x̄) = Σ_{i∈A} M_i(x̄).

3. Define gcd(C) as the product of all the non-constant linear functions that appear as factors in all the multiplication gates, i.e. gcd(C) = gcd(Lin(M_1), . . . , Lin(M_k)). A circuit is called simple if gcd(C) = 1.

4. The simplification of C, sim(C), is defined as sim(C) ≜ C / gcd(C).

5. We define

   Lin(C) ≜ {L_{i,j}}_{i∈[k], j∈[d_i]} ∪ ( ∪_{i=1}^{k} span_1{L̃_{i,j}}_{j∈[ρ_i]} ).

   Notice that we take every linear function in the span of each {L̃_{i,j}}_{j∈[ρ_i]} to be in Lin(C).

6. We define rank(C) as the dimension of the span of the linear functions in C. That is,

   rank(C) = dim_1( span_1( {L_{i,j}}_{i,j} ∪ {L̃_{i,j}}_{i,j} ) ) = dim_1(Lin(C)).
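To illustrate items 1, 3 and 4 on the linear parts of the gates, here is a small Python sketch; the multiset representation of Lin(M_i) and the helper names are our own assumptions, and scalar multiples are deliberately identified (factors are normalized), so the sketch tracks linear factors only up to equivalence:

```python
# A sketch (ours): Lin(M_i) as a multiset of coefficient vectors, gcd(C) as the
# multiset intersection of the gates' linear factors, and sim(C) as the
# corresponding multiset difference. Factors are normalized by their leading
# coefficient, so equivalent (linearly dependent) functions compare equal.
from collections import Counter
from fractions import Fraction

def normalize(L):
    """Scale a coefficient vector by its first nonzero entry."""
    lead = next(c for c in L if c != 0)
    return tuple(Fraction(c, lead) for c in L)

def gcd_of_gates(gates):
    """gcd(C) = gcd(Lin(M_1), ..., Lin(M_k)) as a multiset of linear factors."""
    common = Counter(normalize(L) for L in gates[0])
    for gate in gates[1:]:
        common &= Counter(normalize(L) for L in gate)
    return common

def simplify(gates):
    """sim(C): remove gcd(C) from every gate's multiset of linear factors."""
    g = gcd_of_gates(gates)
    return [list((Counter(normalize(L) for L in gate) - g).elements()) for gate in gates]

# Example: C = (x1 + x2)*(x2 + x3) + (2*x1 + 2*x2)*x3 has gcd equivalent to x1 + x2.
gates = [[(0, 1, 1, 0), (0, 0, 1, 1)], [(0, 2, 2, 0), (0, 0, 0, 1)]]
assert list(gcd_of_gates(gates).elements()) == [(0, 1, 1, 0)]
```

Since normalization drops scalar factors, an actual implementation over F would also track the constant multiplier of each gate; that bookkeeping is omitted here.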

A word of clarification is needed regarding the definitions of Lin(C) and rank(C). Notice that the definition seems to depend on the specific choice of linear functions L̃_{i,j}. That is, it may be the case (and it is indeed the case) that every polynomial h_i(L̃_{i,1}, . . . , L̃_{i,ρ_i}) can be represented as a (different) polynomial in some other set of linear functions. However, the following lemma from [Shp07] shows that the specific representation that we chose changes neither the rank nor the set Lin(C). We say that a polynomial h(x̄) is a polynomial in exactly k linear functions if h can be written as a polynomial in k linear functions but not in k − 1 linear functions.

Lemma 2.2 (Lemma 20 in [Shp07]). Let h(x̄) be a polynomial in exactly k linear functions, and let

h = P(ℓ_1, . . . , ℓ_k) = Q(ℓ′_1, . . . , ℓ′_k)

be two different representations for h. Then span_1({ℓ_i}_{i∈[k]}) = span_1({ℓ′_i}_{i∈[k]}).

We use the notation C ≡ f to denote the fact that a ΣΠΣ circuit² C computes the polynomial f. Notice that this is a syntactic definition: we think of the circuit as computing a polynomial and not a function over the field. Let C be a ΣΠΣ(k) circuit. We say that C is minimal if there is no ∅ ≠ A ⊊ [k] such that C_A ≡ 0. The following theorem of [KS08] gives a bound on the rank of identically zero ΣΠΣ(k, d, ρ) circuits:

Theorem 2.3 (Lemma 4.2 of [KS08]). Let k ≥ 3 and let C be a simple and minimal ΣΠΣ(k, d, ρ) circuit such that deg(C) ≥ 2 and C ≡ 0. Using the notation of Equation (3), we have that

rank(C) < 2^{O(k²)} log^{k−2}(d) + Σ_{i=1}^{k} ρ_i ≤ 2^{O(k²)} log^{k−2}(d) + kρ.

²When speaking of ΣΠΣ circuits we refer to generalized depth-3 circuits.

For convenience, we define R(k, d, ρ) ≜ 2^{O(k²)} log^{k−2}(d) + kρ as the bound on the rank from the theorem above. It follows that R(k, d, ρ) is larger than the rank of any identically zero simple and minimal ΣΠΣ(k, d, ρ) circuit. We also define R(k, d) ≜ R(k, d, 0) as the upper bound on the rank of a simple and minimal identically zero ΣΠΣ(k, d) circuit. The following theorem gives a bound on the rank of multilinear ΣΠΣ(k) circuits that are identically zero.
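The shape of R(k, d, ρ) is easy to experiment with numerically. In the sketch below, the constant c hidden inside 2^{O(k²)} and the base of the logarithm are purely illustrative assumptions; the theorem only guarantees that some such constants exist:

```python
import math

def R(k, d, rho, c=1):
    """Illustrative stand-in for R(k, d, rho) = 2^{O(k^2)} * log^{k-2}(d) + k*rho.
    Both c (the hidden constant) and the log base (here, 2) are assumptions made
    only to visualize how the bound scales; the theorem fixes neither."""
    return 2 ** (c * k * k) * math.log2(d) ** (k - 2) + k * rho

# Polylogarithmic in d, but growing like 2^{c*k^2} in k:
assert R(3, 4, 0) == 1024
```

The key point for the algorithms is only that R(k, d, ρ) is polylogarithmic in d for constant k, so rank-preserving subspaces of dimension roughly R suffice.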

Theorem 2.4 (Corollary 6.9 of [DS06]). There exists an integer function R_M(k) = 2^{O(k²)} such that every multilinear ΣΠΣ(k) circuit C that is simple, minimal and identically zero satisfies rank(C) < R_M(k).

Specifically, R_M(k) denotes the minimal integer larger than the rank of any identically zero simple and minimal multilinear ΣΠΣ(k) circuit. This theorem will be used in Section 5, where we discuss multilinear circuits.

2.2 Rank Preserving Subspaces

Throughout the paper we use subspaces of low dimension that preserve the rank of the circuit to some extent. Such subspaces were introduced in [KS08] for the purpose of deterministic black-box identity testing of polynomials that are computable by ΣΠΣ(k, d, ρ) circuits. We define these subspaces, state some of their useful properties and give the construction of [KS08]. Most of the lemmas of this section appear in [KS08], so their proofs are omitted.

Given a ΣΠΣ(k, d, ρ) circuit C = Σ_{i=1}^{k} M_i and a subspace V we define C|_V to be the circuit whose multiplication gates are {M_i|_V}_{i∈[k]}. Note that this is a syntactic definition; we do not make an attempt to find a "better" representation for C|_V.

Definition 2.5. Let C be a ΣΠΣ(k, d, ρ) circuit and V an affine subspace. We say that V is r-rank-preserving for C if the following properties hold:

1. For any two linear functions L_1, L_2 ∈ Lin(C) such that L_1 ≁ L_2, it holds that L_1|_V ≁ L_2|_V.

2. ∀A ⊆ [k], rank(sim(C_A)|_V) ≥ min{rank(sim(C_A)), r}.

3. No multiplication gate M ∈ C³ vanishes on V. In other words, M|_V ≢ 0 for every multiplication gate M ∈ C.

4. Lin(M)|_V = Lin(M|_V) for every multiplication gate M in C (that is, the non-linear term of M has no new linear factors when restricted to V).

The following lemma lists some of the useful properties of rank-preserving subspaces.

Lemma 2.6 (Lemma 3.2 and special cases of Theorem 3.4 of [KS08]). Let C be a (generalized) depth-3 circuit and V an r-rank-preserving affine subspace for C. Then we have the following:

1. For every ∅ ≠ A ⊆ [k], V is r-rank-preserving for C_A.

2. V is r-rank-preserving for sim(C).

3. gcd(C)|_V = gcd(C|_V).

4. sim(C)|_V = sim(C|_V).

5. If C is a ΣΠΣ(k, d, ρ) circuit and r ≥ R(k, d, ρ), then C ≡ 0 if and only if C|_V ≡ 0.

³At times we abuse notation and treat a circuit C as a set of multiplication gates.

6. If C is a multilinear ΣΠΣ(k) circuit, r ≥ R_M(k) and C|_V is a multilinear circuit, then C ≡ 0 if and only if C|_V ≡ 0.

Now that we have seen their definition and some of their useful properties, we explain how to construct rank-preserving subspaces. We show two construction methods. One method, used for ΣΠΣ(k, d, ρ) circuits, finds an r-rank-preserving subspace. The second method, used for multilinear ΣΠΣ(k) circuits, finds an r-rank-preserving subspace that also preserves the multilinearity of circuits. That is, for a multilinear ΣΠΣ(k) circuit C we find an r-rank-preserving subspace V such that C|_V is also a multilinear circuit. The following definition and lemma explain how to find a rank-preserving subspace for general ΣΠΣ(k, d, ρ) circuits.

Definition 2.7. Let α ∈ F and r ∈ ℕ⁺.

• For 0 ≤ i ≤ r let v_{i,α} ∈ F^n be the following vector:

v_{i,α} = (α^{i+1}, α^{2(i+1)}, . . . , α^{n(i+1)}).

• Let P_{α,r} be the matrix whose j-th column (for 1 ≤ j ≤ r) is v_{j,α}. Namely,

P_{α,r} = (v_{1,α}, . . . , v_{r,α}) =
[ α^2      α^3      . . .   α^{r+1}    ]
[ α^4      α^6      . . .   α^{2(r+1)} ]
[  ⋮                 ⋱       ⋮         ]
[ α^{2n}   α^{3n}   . . .   α^{n(r+1)} ]

• Let V_{0,α,r} be the linear subspace spanned by {v_{i,α}}_{i∈[r]}, and let V_{α,r} ⊆ F^n be the affine subspace V_{α,r} = V_{0,α,r} + v_{0,α}. In other words,

V_{α,r} = { P_{α,r} ȳ + v_{0,α} : ȳ ∈ F^r }.
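Definition 2.7 can be transcribed directly; the following Python sketch (helper names ours, working over the integers rather than a finite field to stay self-contained) builds v_{i,α}, the matrix P_{α,r}, and points of V_{α,r}:

```python
# A direct transcription (ours) of Definition 2.7, over the integers for
# self-containedness; in the paper alpha ranges over (an extension of) F.

def v(i, alpha, n):
    """v_{i,alpha} = (alpha^{i+1}, alpha^{2(i+1)}, ..., alpha^{n(i+1)})."""
    return [alpha ** (row * (i + 1)) for row in range(1, n + 1)]

def P(alpha, r, n):
    """The n x r matrix P_{alpha,r} whose j-th column (1 <= j <= r) is v_{j,alpha}."""
    cols = [v(j, alpha, n) for j in range(1, r + 1)]
    return [[cols[j][i] for j in range(r)] for i in range(n)]

def point_in_V(alpha, r, n, y):
    """The point P_{alpha,r} * y + v_{0,alpha} of the affine subspace V_{alpha,r}."""
    Pm, v0 = P(alpha, r, n), v(0, alpha, n)
    return [sum(Pm[i][j] * y[j] for j in range(r)) + v0[i] for i in range(n)]

# alpha = 2, n = 3, r = 2: the shift is (2, 4, 8), the first basis vector (4, 16, 64).
assert point_in_V(2, 2, 3, [1, 0]) == [6, 20, 72]
```

The Vandermonde-like structure of P_{α,r} is what lets Lemma 2.8 below argue that most choices of α yield a rank-preserving subspace.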

Lemma 2.8 (Corollary 4.9 in [KS08]). Let r ≥ R(k, d, ρ) and let ε > 0. Let S ⊆ F be a set of n((dk choose 2) + 2^k · (r+2 choose 2))/ε different elements of the field⁴. Then, for every ΣΠΣ(k, d, ρ) circuit C over F there are at least (1 − ε)|S| elements α ∈ S such that V_{α,r} is an r-rank-preserving subspace for C.

2.2.1 Rank Preserving Subspaces that Preserve Multilinearity

As stated before, when dealing with multilinear circuits we require that the rank-preserving subspaces also preserve the multilinearity of the circuit. Actually, for multilinear circuits we need to slightly change Definition 2.5. We first explain why it is necessary to change the definition and then give the modified definition for the multilinear case. Let C be an n-input multilinear ΣΠΣ(k) circuit having a multiplication gate M = Π_{i=1}^{n} x_i. Let V be a subspace of F^n of co-dimension r. Assume that C|_V is a multilinear circuit. Then at least r linear functions in M have been restricted to constants and are thus linearly dependent. Whenever r > 1, this violates Property 1 of Definition 2.5, indicating that C does not have any low dimension rank-preserving subspaces. We now give the definition of multilinear-rank-preserving subspaces⁵:

⁴Recall our assumption that if |F| is not large enough then we work over an algebraic extension field of F.
⁵These subspaces are used only for non-generalized depth-3 circuits; thus, we only define them for such circuits.

Definition 2.9. Let C be a ΣΠΣ(k) circuit and V an affine subspace. We say that V is r-multilinear-rank-preserving for C if the following properties hold:

1. For any two linear functions L_1 ≁ L_2 ∈ Lin(C), we either have that L_1|_V ≁ L_2|_V or that both L_1|_V, L_2|_V are constant functions.

2. ∀A ⊆ [k], rank(sim(C_A)|_V) ≥ min{rank(sim(C_A)), r}.

3. No multiplication gate M ∈ C vanishes on V. In other words, M|_V ≢ 0 for every multiplication gate M ∈ C.

4. The circuit C|_V is a multilinear circuit.

Despite the modification of Property 1, Lemma 2.6 applies also to r-multilinear-rank-preserving subspaces⁶. The following definition shows how to construct such a subspace, and the two lemmas after it prove the correctness of the construction:

Definition 2.10 (Definition 5.3 of [KS08]). Let B ⊆ [n] be a non-empty subset of the coordinates and let α ∈ F be a field element.

• Define V_B as the following subspace:

V_B = span{e_i : i ∈ B}

where e_i ∈ {0, 1}^n is the vector that has a single nonzero entry, in the i'th coordinate.

• Let v_{0,α} be, as before, the vector

v_{0,α} = (α, α², . . . , α^n).

• Let V_{B,α} ≜ V_B + v_{0,α}.

Lemma 2.11 (Theorem 5.4 of [KS08]). Let C be a multilinear ΣΠΣ(k) depth-3 circuit over the field F and let r ∈ ℕ. There exists a subset B ⊆ [n] with |B| = 2^k · r that has the following properties:

1. ∀A ⊆ [k], rank(sim(CA)|VB) ≥ min{rank(sim(CA)), r}.

2. For every ū ∈ F^n, C|VB+ū is a multilinear ΣΠΣ(k) circuit.

Lemma 2.12. (easy modification of Theorem 5.6 of [KS08]) Let C be a ΣΠΣ(k) multilinear n-variate circuit. Let B be the set guaranteed by Lemma 2.11 for some integer r. Then there are fewer than n^3·k^2 many α ∈ F such that VB,α is not r-multilinear-rank-preserving for C.

We now state an additional requirement of r-multilinear-rank-preserving subspaces.

Definition 2.13. Let C be an n-variate ΣΠΣ(k, d) multilinear circuit over the field F. Let B ⊆ [n] and α ∈ F. We say that VB,α is a liftable r-multilinear-rank-preserving subspace for C if the following holds: for each B′ ⊇ B, the subspace VB′,α is an r-multilinear-rank-preserving subspace for C.

6 As a matter of fact, in the definition of rank preserving subspaces in [KS08], Property 1 was the same as in Definition 2.9.

Clearly, the subspaces of Definition 2.10 might restrict linear functions from the circuit to constants. Hence, these subspaces are not always liftable. For example, take C(x, y, z) = (x + y + 1) + (x + 1)z, V1 as the subspace of F^3 where x = y = 0 and V2 as the subspace where y = 0. Clearly, V1 ⊆ V2, V1 is 1-rank preserving and V2 is not (as (x + y + 1)|V2 ∼ (x + 1)|V2). The following lemma shows how to construct an r-multilinear-rank-preserving liftable subspace.

Lemma 2.14. Let C be a ΣΠΣ(k) multilinear arithmetic circuit over a field F. Let r ∈ N and let B be the set guaranteed by Lemma 2.11 for C and r. Then there are fewer than n^4·k^2 many α ∈ F such that for some B′ ⊇ B, VB′,α is not r-multilinear-rank-preserving for C.

Proof. Let B ⊆ B′. It can easily be seen that the only property that might be violated is Property 1 of Definition 2.9. Assume that for each B̂ ⊇ B of size |B̂| = |B| + 1, it holds that VB̂,α is r-multilinear-rank-preserving for C. Let L1, L2 ∈ C be such that L1 ≁ L2 and L1|VB′,α is not a constant function. Then there exists some B′ ⊇ B̂ ⊇ B, where |B̂| = |B| + 1, such that L1|VB̂,α is not a constant function. Hence, by our assumption, L1|VB̂,α ≁ L2|VB̂,α and thus L1|VB′,α ≁ L2|VB′,α. It follows that VB′,α is r-multilinear-rank-preserving for C.

We now notice that by Lemma 2.12, there are at most n^4·k^2 many α ∈ F such that for some B̂ ⊃ B of size |B̂| = |B| + 1, it holds that VB̂,α is not r-multilinear-rank-preserving for C. This proves the lemma.

We conclude this section with the following corollary giving the method to find a liftable r- multilinear-rank-preserving subspace.

Corollary 2.15. Let r, n, k ∈ N. Let S ⊆ F be some set of size |S| > n^4·k^2. Let C be a multilinear ΣΠΣ(k) circuit of n inputs. There exist some B ⊆ [n], with |B| = 2^k · r, and α ∈ S such that VB,α is r-multilinear-rank-preserving and liftable for C.
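To make the construction of Definition 2.10 concrete, here is a small Python sketch of restricting a black-box polynomial to VB,α (our own illustration, not code from the paper; coordinates are 0-indexed here, so the shift vector stores α^{i+1} in position i):

```python
def restrict_to_subspace(f, n, B, alpha):
    """Return the restriction f|_{V_{B,alpha}} of a black-box polynomial f
    on n variables, where V_{B,alpha} = span{e_i : i in B} + (alpha, ..., alpha^n).
    The result is a black box in |B| variables, one per free coordinate in B."""
    B = sorted(B)
    def g(t):
        assert len(t) == len(B)
        x = [alpha ** (i + 1) for i in range(n)]  # the shift vector v_{0,alpha}
        for t_j, i in zip(t, B):
            x[i] += t_j                           # move along the free direction e_i
        return f(x)
    return g
```

For instance, for f(x1, x2, x3) = x1·x2 + x3, n = 3, B = {0} and α = 2, the restriction is the univariate polynomial t ↦ (t + 2) · 4 + 8.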

2.3 A “Distance Function” for ΣΠΣ Circuits

In this section we define a “distance function” for ΣΠΣ circuits. Then we discuss some of its properties. The function measures the dimension of the span of the linear functions in the simplification of the sum of the circuits. Finally we prove that the weights of any two circuits^7 computing the same polynomial are equal up to some additive constant.

Definition 2.16. Let C1,...,Ci be a collection of ΣΠΣ circuits (i ≥ 1). Define:

∆(C1, . . . , Ci) =∆ rank(sim(C1 + · · · + Ci)).

Note that this is a syntactic definition as this sum might contain a multiplication gate M and the multiplication gate −M.

The following lemma explains why we refer to ∆ as a distance function.

Lemma 2.17. (triangle inequality) Let C1,C2,C3 be ΣΠΣ circuits. Then

∆(C1,C3) ≤ ∆(C1,C2,C3) ≤ ∆(C1,C2) + ∆(C2,C3).

7The weight of a circuit is its distance from 0.

Proof. The first inequality is trivial since Lin(sim(C1 + C3)) ⊆ Lin(sim(C1 + C2 + C3)). We now show the second inequality. Let L be a linear function in Lin(sim(C1 + C2 + C3)). It suffices to prove that either L ∈ Lin(sim(C1 + C2)) or L ∈ Lin(sim(C2 + C3)). We do this by simply checking all possibilities:

L ∈ Lin(sim(C1)) ⇒ L ∈ Lin(sim(C1 + C2)).
L ∈ gcd(C1) ⇒ w.l.o.g. L ∉ gcd(C2) ⇒ L ∈ Lin(sim(C1 + C2)).   (1)
L ∉ Lin(C1) ⇒ w.l.o.g. L ∈ Lin(C2) ⇒ L ∈ Lin(sim(C1 + C2)).   (2)

Implication (1) follows from the fact that L ∉ gcd(C1) ∩ gcd(C2) ∩ gcd(C3). Implication (2) holds since L ∈ Lin(C1) ∪ Lin(C2) ∪ Lin(C3).

We now prove that the weights of two minimal circuits computing the same polynomial are roughly the same. To do so we define a default circuit for a polynomial f. We then show that its weight is roughly the same as the weight of any other minimal circuit computing f.
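Since ∆ is ultimately a matrix rank (the rank of the coefficient vectors of the linear functions involved), the triangle inequality can be checked mechanically on small examples. The following Python sketch is our own illustration; it represents a set of linear functions simply by their coefficient vectors and ignores the gcd/sim bookkeeping:

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of coefficient vectors, via Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r, col = 0, 0
    ncols = len(rows[0]) if rows else 0
    while r < len(rows) and col < ncols:
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            col += 1
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
        col += 1
    return r
```

With S1 = {x1}, S2 = {x2}, S3 = {x3}, one can verify rank(S1 ∪ S3) ≤ rank(S1 ∪ S2) + rank(S2 ∪ S3), mirroring Lemma 2.17.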

Definition 2.18. Let U be a linear space of n-input linear functions. Define the default basis of U as the Gaussian elimination of some basis of linear functions^8.
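One concrete reading of this definition (our own sketch, under the assumption that “Gaussian elimination” means reduced row echelon form): every basis of U row-reduces to the same canonical matrix, so the default basis is well defined.

```python
from fractions import Fraction

def default_basis(vectors):
    """Canonical basis (reduced row echelon form) of the span of the given
    linear functions, each represented by its coefficient vector."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r, ncols = 0, len(rows[0])
    for col in range(ncols):
        piv = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if piv is None:
            continue
        rows[r], rows[piv] = rows[piv], rows[r]
        rows[r] = [a / rows[r][col] for a in rows[r]]      # normalize the pivot
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                f = rows[i][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return rows[:r]                                        # drop zero rows
```

Two different bases of the same space, e.g. {x1 + 2x2, x2} and {x1 + 3x2, x1 + x2}, produce the same default basis.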

Definition 2.19. Let f(x̄) be an n-variate polynomial of degree d. Define Lin(f) as the product of the linear factors of f (i.e., for f(x1, x2, x3) = x3(x1 + x2)^2(x1^2 + x2), Lin(f) = x3(x1 + x2)^2). Let r ∈ N+ be such that f/Lin(f) is a polynomial of exactly r linear functions (as defined in Lemma 2.2). Let h be an r-variate polynomial and let L̃1, . . . , L̃r be r linear functions such that

• f/Lin(f) = h(L˜1,..., L˜r).

• L˜1,..., L˜r are the default basis of the linear space they span.

Notice that the existence of h and L˜1,..., L˜r is guaranteed by the definition of r. Define Cf , the default circuit of f, as the following ΣΠΣ(1, d, r) circuit:

Cf =∆ Lin(f) · h(L̃1, . . . , L̃r).

In Appendix A.2 we give a brute force algorithm that, given black-box access to a polynomial, constructs its default circuit. The next lemma implies that for a polynomial f, the ∆ measure of the circuit Cf is the lowest among all ΣΠΣ circuits computing f. It also shows that the ∆ weight of any circuit computing f is close to the ∆ measure of Cf, thus showing that the ∆ measures of any two circuits computing the same polynomial are close.

Lemma 2.20. Let C be a minimal ΣΠΣ(k, d, ρ) circuit computing the polynomial f. Then

∆(Cf ) ≤ ∆(C) < ∆(Cf ) + R(k + 1, d) + k · ρ.

Proof. Using the notations of Definition 2.19, it is not hard to see that f/Lin(f) is a factor of sim(C) and that sim(Cf) ≡ f/Lin(f). Clearly, sim(C) is a polynomial of at most ∆(C) = rank(sim(C)) linear functions. Since f/Lin(f) is a polynomial in exactly ∆(Cf) linear functions we get that

∆(Cf ) = rank(sim(Cf )) ≤ rank(sim(C)) = ∆(C).

8The Gaussian elimination of a set of linear functions is done by performing a Gaussian elimination on the matrix whose rows are the coefficients of the linear functions.

We proceed to the second inequality. If C ≡ 0 (i.e., f = 0), then since it is also minimal, we get that

∆(C) < R(k, d, ρ) < R(k + 1, d) + k · ρ.

Assume that C does not compute the zero polynomial. Consider the ΣΠΣ(k + 1, d, max{ρ, ∆(Cf)}) circuit C − Cf. As no subcircuit of C (nor C itself) computes the zero polynomial, we have that C − Cf is minimal. Since this circuit clearly computes the zero polynomial, Theorem 2.3 implies that

∆(C) ≤ ∆(C − Cf ) < R(k + 1, d) + k · ρ + ∆(Cf ).

Given a ΣΠΣ(k, d, ρ) circuit C we define its canonical representation in the following way. Let C = M1 + · · · + Mk be the representation of C as a sum of multiplication gates. Let fi be the polynomial computed by Mi. Then the canonical representation of C is

C = Cf1 + · · · + Cfk.   (4)

Note that the only difference from the description C = M1 + · · · + Mk is the basis with respect to which we represent sim(Mi).

3 Canonical κ-Distant Circuits

In this section we define the notion of a κ-distant circuit. We prove the existence of such a circuit C′ computing f (Theorem 3.2) and prove its uniqueness (Theorem 3.6). We then show that for a subspace V that is rank preserving for C′, the restriction C′|V is the unique κ-distant circuit computing f|V (Corollary 3.7).

Definition 3.1. Let C be a ΣΠΣ(s, d, r) circuit computing a polynomial f (in particular, assume that C is not a ΣΠΣ(s, d, r − 1) circuit). We say that C is κ-distant if for any two multiplication gates of C, M and M′, ∆(M + M′) ≥ κ · r.

3.1 Proof of Existence

In this section we prove the existence of a κ-distant ΣΠΣ(s, d, r) circuit C′ computing f. We would like to have r as small as possible (as a function of k, d, ρ) as it will affect the running time of the reconstruction algorithm. Our results are given in the following theorem.

Theorem 3.2. Let f be a polynomial that can be computed by a ΣΠΣ(k, d, ρ) circuit and let κ ∈ N+. Let rinit ∈ N+ be such that rinit ≥ R(k + 1, d) + k · ρ. Then there exist s, r ∈ N+ and a ΣΠΣ(s, d, r) circuit C′ computing f such that s ≤ k, rinit ≤ r ≤ rinit · k^{⌈log_k(κ+1)⌉·(k−2)} and C′ is κ-distant.

Proof. The proof is algorithmic. That is, we give an algorithm for constructing a κ-distant circuit C′ that computes the same polynomial as C. The idea behind the algorithm is to cluster the multiplication gates of C, such that any two multiplication gates in the same cluster are close to each other, and any two multiplication gates in different clusters are far away from each other. Then we replace each cluster with the default circuit (recall Definition 2.19) for the polynomial that it computes. Before giving the algorithm and its analysis we make the following definition.

Definition 3.3. Let C be a ΣΠΣ(k, d, ρ) circuit and I = {A1, . . . , As} be some partition of [k]. For each i ∈ [s] define Ci =∆ CAi. The set {Ci}_{i=1}^s is called a partition of C. We say that {Ci}_{i=1}^s is (κ′, r)-strong when the following conditions hold:

• ∀i ∈ [s], ∆(Ci) ≤ r.
• ∀i, j ∈ [s] such that i ≠ j, ∆(Ci, Cj) ≥ κ′ · r.

Input: n, k, d, ρ, rinit, κ′ ∈ N such that rinit ≥ ρ and a ΣΠΣ(k, d, ρ) circuit C of n inputs.
Output: An integer r ≥ rinit and I, a partition of [k].
1  ι ← ⌈log_k(κ′)⌉;
2  I1 ← {{1}, {2}, . . . , {k}};
3  r1 ← rinit;
4  while the partition was changed in any one of the former ι iterations do
5      Define j as the number of the current iteration (its initial value is 1);
6      Let Gj(Ij, Ej) be a graph where each subset belonging to the partition Ij is a vertex;
7      Ej ← ∅;
8      foreach Ai ≠ Ai′ ∈ Ij do
9          if ∆(CAi, CAi′) < rj then
10             Ej ← Ej ∪ {(Ai, Ai′)}
11         end
12     end
13     Ij+1 ← the set of connected components of Gj. That is, every connected component is now a set in the partition;
14     rj+1 ← rj · k;
15 end
16 Define m as the total number of iterations (that is, in the last iteration we had j = m);
17 r ← rm/k^ι;
18 I ← Im;
Algorithm 1: Canonical partition of a circuit
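The clustering loop above can be sketched in Python as follows. This is our own illustration: `delta` stands in for the distance ∆ between subcircuits (given as an oracle on pairs of index sets), and the union-find bookkeeping is an implementation choice, not part of the paper's listing; the sketch assumes k ≥ 2.

```python
from itertools import combinations
from math import ceil, log

def canonical_partition(k, r_init, kappa_prime, delta):
    """Sketch of Algorithm 1.  delta(A, B) plays the role of Delta(C_A, C_B)
    for the subcircuits indexed by the frozensets A and B."""
    iota = max(1, ceil(log(kappa_prime, k)))           # line 1: ceil(log_k(kappa'))
    parts = [frozenset([i]) for i in range(1, k + 1)]  # line 2: singletons
    r_j, r_m, quiet = r_init, r_init, 0                # quiet = iterations since a change
    while quiet < iota:                                # line 4
        r_m = r_j                                      # threshold used in this iteration
        parent = {p: p for p in parts}                 # lines 6-12: union-find merging
        def find(p):
            while parent[p] != p:
                p = parent[p]
            return p
        changed = False
        for a, b in combinations(parts, 2):
            if delta(a, b) < r_j:                      # line 9: close parts get an edge
                ra, rb = find(a), find(b)
                if ra != rb:
                    parent[ra] = rb
                    changed = True
        comps = {}                                     # line 13: components -> new parts
        for p in parts:
            comps.setdefault(find(p), []).append(p)
        parts = [frozenset().union(*grp) for grp in comps.values()]
        r_j *= k                                       # line 14
        quiet = 0 if changed else quiet + 1
    return r_m // (k ** iota), parts                   # line 17: r = r_m / k^iota
```

On a toy instance with k = 3 where gates 1 and 2 are close and gate 3 is far from both, the sketch merges {1, 2} and leaves {3} as its own cluster.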

Lemma 3.4. The partition outputted by Algorithm 1 is (κ′, r)-strong.

Proof. Let I = {A1, . . . , As} be the partition found by the algorithm. We shall use the notations of Definition 3.3. First we note that at the end of the algorithm, for each i ≠ i′ ∈ [s], we have that

∆(Ci, Ci′) ≥ rm = r · k^ι ≥ r · κ′.

Thus, we only have to prove that for every i ∈ [s] it holds that ∆(Ci) ≤ r. Fix some i ∈ [s]. We consider two cases. The first is that Ci is a multiplication gate (i.e., Ai is a singleton). Clearly, its non-linear term is a polynomial of at most ρ linear functions. Hence, ∆(Ci) ≤ ρ ≤ rinit ≤ r. In the second case Ci = Ci,1 + · · · + Ci,s′, where the Ci,l were computed at an earlier stage of the algorithm. Let j′ be the iteration in which the circuit Ci was computed (that is, the iteration in which the set of indices of the multiplication gates of Ci became a member of the partition). Let E′ ⊆ Ej′ be some spanning tree of the connected component Ci,1, . . . , Ci,s′. Then,

∆(Ci) = ∆(Ci,1, . . . , Ci,s′) ≤(1) Σ_{(Ci,l1, Ci,l2) ∈ E′} ∆(Ci,l1, Ci,l2) < |E′| · rj′ ≤ k · rj′ ≤(2) rm−ι = r.

Inequality (1) can be reached by repeatedly using the second inequality of Lemma 2.17. To justify inequality (2) notice that the partition did not change in the last ι iterations and thus j′ < m − ι. This proves the lemma.

Now that we are guaranteed that we have a strong partition we prove an upper bound on r. I.e., we show that the weight of each Ci is not too large.

Lemma 3.5. At the end of Algorithm 1, rm is at most rinit · k^{ι·(k−1)}. Thus, r ≤ rinit · k^{ι·(k−2)}.

Proof. In every ι iterations, the number of elements in I is reduced by at least one (otherwise the algorithm terminates). The number of elements in I begins at k and ends at at least 1. Hence, the number of iterations is at most ι · (k − 1), indicating that rm is at most rinit · k^{ι·(k−1)}. Hence, r is bounded from above by rinit · k^{ι·(k−2)}.

We proceed with the proof of Theorem 3.2. We now set κ′ = κ + 1 and fix some integer rinit such that rinit ≥ R(k + 1, d) + k · ρ. Let {Ci}_{i=1}^s be the partition outputted by Algorithm 1. Let {fi}_{i=1}^s be the polynomials computed by the subcircuits of the partition. That is, Ci computes fi. Define

C′ =∆ Cf1 + · · · + Cfs.

We now show that C′ satisfies the requirements of Theorem 3.2. Since the partition is (κ′, r)-strong, Lemma 2.20 implies that ∆(Cfi) ≤ ∆(Ci) ≤ r for each i ∈ [s]. Hence, C′ is a ΣΠΣ(s, d, r) circuit. Let Ci, Ci′ be two subcircuits in the partition of C. Then,

∆(Cfi + Cfi′) ≥(1) ∆(Cfi+fi′) >(2) ∆(Ci + Ci′) − (R(k + 1, d) + k · ρ) ≥(3) κ′ · r − r = κ · r.

Inequalities (1) and (2) stem from Lemma 2.20 (we assume w.l.o.g. that C is minimal and thus so is Ci + Ci′). Inequality (3) holds since r ≥ rinit ≥ R(k + 1, d) + k · ρ and ∆(Ci + Ci′) ≥ κ′ · r (by Lemma 3.4). This concludes the proof of Theorem 3.2.

3.2 Uniqueness

In this section we prove, for a large enough value of κ, the uniqueness of a κ-distant ΣΠΣ(k, d, r) circuit computing a polynomial f. As a corollary we obtain a result showing that if C′ is a κ-distant circuit computing f and V is rank preserving for C′, then the unique κ-distant circuit computing f|V is C′|V.

Theorem 3.6. Let f be a polynomial of degree d. Let k, r, κ ∈ N+ be such that κ ≥ R(2k, d, r)/r. Then there exists at most one canonical minimal κ-distant ΣΠΣ(s, d, r) circuit computing f such that s ≤ k.

Proof. Let s, s′ ≤ k and let C1 =∆ Cf1 + · · · + Cfs and C2 =∆ Cg1 + · · · + Cgs′ be two canonical minimal κ-distant circuits computing f. It suffices to prove that s = s′ and that for some reordering of the multiplication gates, ∀i ∈ [s], Cfi = Cgi. Consider the ΣΠΣ(s + s′, d, r) circuit

C =∆ (Cf1 + · · · + Cfs) − (Cg1 + · · · + Cgs′).

Clearly, C computes the zero polynomial. We now show that each minimal subcircuit of C is composed of exactly two multiplication gates Cfi and Cgj, where i ≤ s and j ≤ s′. This will prove our claim. Let C̃ be some minimal subcircuit of C. Clearly C̃ is a ΣΠΣ(m, d′, r) circuit where 2 ≤ m ≤ s + s′ and d′ ≤ d. It suffices to prove that C̃ cannot contain two multiplication gates originating from the same Cl (l ∈ {1, 2}). If s = s′ = 1 then both circuits are of the form Cf and are thus clearly equal.

So assume w.l.o.g. that s ≥ 2. Assume for a contradiction and w.l.o.g. that both Cf1 and Cf2 are multiplication gates in C˜. Then,

κ · r ≤(1) ∆(Cf1, Cf2) ≤ ∆(C̃) <(2) R(s + s′, d, r) ≤ R(2k, d, r).

Inequality (1) holds since C1 is κ-distant, and Inequality (2) follows from the definition of R(k, d, r) right after Theorem 2.3. This contradicts our assumption regarding κ and thus proves the theorem.

Corollary 3.7. Let f be an n-variate polynomial of degree d.

• Let k, s, r, κ ∈ N+ be such that κ ≥ R(2k, d, r)/r and s ≤ k.
• Let C′ be a minimal ΣΠΣ(s, d, r) κ-distant circuit computing f.

• Let V be an (r · κ)-rank preserving subspace for C′.

Then C′|V is a minimal κ-distant ΣΠΣ(s, deg(f|V), r) circuit computing f|V. In addition there is no other κ-distant ΣΠΣ(k′, deg(f|V), r) minimal circuit computing f|V for any k′ ≤ k.

Proof. Let M, M′ be two different multiplication gates of C′. As V is (r · κ)-rank preserving for C′ we get that

∆((M + M′)|V) ≥ min{∆(M + M′), r · κ} ≥ r · κ,

∆(M|V) ≤ ∆(M) ≤ r.

Hence C′|V is a κ-distant ΣΠΣ(s, d′, r) circuit computing f|V. Since κ ≥ R(2k, d, r)/r, Theorem 3.6 implies the uniqueness of C′|V.

4 Reconstructing ΣΠΣ(k, d, ρ) Circuits

In this section we give our main reconstruction algorithm (Theorem 1). Recall the general scheme of the algorithm that was described in Section 1.4: we first restrict the inputs to the polynomial f to a low dimensional rank-preserving subspace V, then we reconstruct the unique κ-distant circuit for f|V, and finally we lift this to a circuit over F^n. We now explain the intuition behind this approach. Recall that Corollary 3.7 states that if C is a κ-distant circuit for f and V is a rank-preserving subspace for C (with the adequate parameters) then C|V is the unique κ-distant circuit for f|V. Stated differently, if we manage to find a κ-distant circuit C′ that computes f|V, then by lifting each multiplication gate of C′, separately, to F^n we get back the circuit C. Therefore, our goal is to find a κ-distant circuit for f|V. We do this in Section 4.1. This is the main technical part of the algorithm. We now describe the main idea in the reconstruction of f|V. Let us first start with the simple case that C′ (= C|V) has a single generalized multiplication gate. Then by factoring C′ we can find gcd(C′), and then use brute force interpolation to compute sim(C′). This is very nice and easy, but it is not clear what to do if we have more than a single multiplication gate. The solution is to find a reduction to the case of a circuit with a single gate. We are able to achieve such a reduction using an isolation lemma that roughly says the following.

Lemma 4.1 (Informal). For every t-variate κ-distant circuit C = Cf1 + . . . + Cfk, there exists a polynomial sized set V of subspaces of F^t, each of co-dimension at most k, and an index i0 ∈ [k] such that for every U ∈ V we have that Cfj|U = 0 if and only if j ≠ i0. Moreover, there is an efficient algorithm for reconstructing Cfi0 from the restrictions {Cfi0|U}_{U∈V}.

The exact version of the lemma is given in Lemma 4.4 (which strongly depends on Lemma 4.6). The “moreover” part of the lemma is proved by gluing the different gates Cfi0|U into a single Cfi0. The way to glue the different restrictions together is given in Section 4.1.2 (Algorithms 2 and 5), and is based on the earlier work of [Shp07]. The main question that remains then is how to find the isolating set of subspaces V (or even how to prove its existence). It turns out that because the dimension of V is relatively low (eventually it will be polylog(n)), if we know that such a set exists then we can go over all possibilities for it. I.e. we go over all possibilities for the set V, and for each “guess”, we try to reconstruct a multiplication gate, that is, a circuit of the form Cfi0. The point is that after we reconstruct Cfi0 we can continue by recursion and learn all the other multiplication gates. In this way we get many guesses for C′, and then we can simply check (by going over all elements of V) which one is a correct representation for it (this is given in Algorithm 4). In view of the above, we just have to prove the existence of such a set V. The idea is to construct the subspaces step by step. That is, we first find a set of linear functions L that splits the multiplication gates of C′ into two sets A and Ā such that for every ℓ ∈ L all the multiplication gates in Ā vanish when restricted to the subspace defined by ℓ = 0. On the other hand, none of the multiplication gates in A vanishes on ℓ = 0 (actually there is a stronger demand that we skip now).
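The vanishing condition is easiest to picture when every gate is a product of linear forms: a gate vanishes on the hyperplane ℓ = 0 exactly when one of its linear factors is a scalar multiple of ℓ. A toy Python illustration (our own; gates are given explicitly by the coefficient vectors of their linear factors):

```python
from fractions import Fraction

def is_multiple(u, v):
    """Whether the linear forms u and v (coefficient tuples) are proportional."""
    ratio = None
    for a, b in zip(u, v):
        if a == 0 and b == 0:
            continue
        if a == 0 or b == 0:
            return False
        r = Fraction(a, b)
        if ratio is None:
            ratio = r
        elif r != ratio:
            return False
    return ratio is not None

def vanishing_gates(gates, ell):
    """Indices of the gates (each a list of linear factors) that vanish on
    the hyperplane ell = 0, i.e. contain a factor proportional to ell."""
    return [i for i, g in enumerate(gates) if any(is_multiple(f, ell) for f in g)]
```

For the gates M1 = x1 · x2 and M2 = (x1 + x2), restricting to 2x1 = 0 kills M1 but not M2, while restricting to 3x1 + 3x2 = 0 kills only M2.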
The definition of a splitting set and the proof of its existence are given in Section 4.1.1 (Definition 4.3 and Lemma 4.6). Given a splitting set L and the sets A, Ā we can look for a splitting set L′ for A (that splits it into A′ and A \ A′ for some ∅ ⊊ A′ ⊊ A). The sets L and L′ define a set of co-dimension 2 subspaces: for every ℓ ∈ L and ℓ′ ∈ L′ we have the space defined by ℓ = ℓ′ = 0. We can continue to do so until we are left with a single multiplication gate that was split from the other multiplication gates by the subspaces that we generated, which are of co-dimension at most s when C is a ΣΠΣ(s, d, r) circuit (actually, this is not a completely accurate description; see Definition 4.2 and Lemma 4.4, which discuss m-linear function trees, for a more formal treatment). This proves the existence of V. In particular, we reduced the problem of proving the existence of the subspaces V to the problem of proving the existence of a splitting set. The proof of the existence of splitting sets is the most technical part of the proof. It is based on an extension of the main lemma of [DS06] (Lemma 4.9 in our paper). At the heart of the proof of Lemma 4.9 lies a connection to locally decodable codes. Basically, one way to think of the proof regarding the existence of splitting sets is that if there is no splitting set then we can find a too-good locally decodable code inside C′, and since there are known bounds on the performance of such codes, it must be the case that splitting sets exist. The proof of the lemma is quite technical and we cannot give further intuition at this stage (except to advise the reader to read the intuition given in [DS06]). To conclude, the algorithm for constructing C has the following form (Algorithm 6).

• Find a set of subspaces, of low dimension, that contains a rank-preserving subspace for C. Denote this subspace with V . We do the following for each space in the family but focus on V as we will later verify which of the circuits that we constructed is the correct one.

• Run Algorithm 4 to get C|V. The algorithm uses as a subroutine Algorithm 3, which constructs a single multiplication gate of C̃ using an m-linear function tree (whose existence is based on the existence of splitting sets). In particular we have to try Algorithm 4 for every “guess” of an m-linear function tree.

• Run Algorithm 5 to lift each multiplication gate of the κ-distant circuit C′|V to F^n.

16 • Use the PIT algorithm of [KS08] to find the circuit C.

4.1 Finding a κ-Distant Representation in a Low Dimension Subspace

In this section we show how to find a minimal κ-distant ΣΠΣ(s, d, r) circuit computing f|V, given black box oracle access to f|V. The values of s, d, r and κ that we choose are such that the uniqueness of the circuit is guaranteed (according to Theorem 3.6 and Corollary 3.7). Since we only deal with the restriction of f to the subspace V we assume for convenience that f is a polynomial of t =∆ dim(V) variables and find the minimal κ-distant ΣΠΣ(s, d, r) circuit C′ =∆ Cf1 + · · · + Cfs computing f (that is, we abuse notation and write f instead of f|V). In order to find such a circuit we show an algorithm that finds a single multiplication gate of the circuit. Namely, we show how to reconstruct the circuit Cfi for some i ∈ [s]. In the main algorithm we recursively find the other multiplication gates by running the algorithm on f − fi (which clearly has a ΣΠΣ(s − 1, d, r) κ-distant circuit). Given black-box access to the polynomial fi we can construct the circuit Cfi using brute force interpolation (see Algorithm 11 in Section A.2 of the appendix). Obviously, the lack of access to any such polynomial fi complicates matters. To overcome this problem we find a set of subspaces {Uj} such that for some i ∈ [s], we have that for every Uj,

f|Uj = fi|Uj .

Moreover, for every i′ ≠ i ∈ [s], it holds that fi′|Uj = 0. For each such subspace Uj ⊆ F^t we use Algorithm 11 to reconstruct Cfi|Uj. Our choice of subspaces {Uj} is such that we are able to construct Cfi given the circuits {Cfi|Uj}. To conclude, the scheme of our algorithm consists of three steps:

1. Obtain the set of subspaces {Uj}j where for some i ∈ [s], we have that for every Uj, f|Uj = fi|Uj.

2. Reconstruct the circuits {Cf|Uj} (that are the same as {Cfi|Uj}).

3. Glue together the different restrictions {Cfi|Uj} to get the circuit Cfi.

Section 4.1.1 deals with the first step. There we present the properties that the set of subspaces {Uj} has to satisfy in order for the third step (the “gluing” step) to work. We then construct the set {Uj} having the required properties. In the second step we simply apply the brute force algorithm given in the appendix (Section A.2). In Section 4.1.2 we give the algorithm for the third step. That is, we reconstruct Cfi given the circuits {Cfi|Uj}. We conclude in Section 4.1.3 with Algorithm 4, which finds the κ-distant circuit C′.

4.1.1 Step 1: Finding the set of subspaces In this section we deal with the following problem. Let

C′ =∆ Cf1 + · · · + Cfs   (5)

be a ΣΠΣ(s, d, r) minimal κ-distant t-variate circuit computing a polynomial f. Recall that s is a constant, d is the degree of the polynomial f and t is relatively small (roughly polylog(d)). The values of r and κ will be revealed later. We are searching for a set of (affine) subspaces {Uj} with the following properties:

1. ∃i ∈ [s] s.t. ∀Uj, f|Uj = fi|Uj .

2. Given the circuits {Cfi|Uj} it is possible to reconstruct the circuit Cfi.

Actually, we are not able to find one set of subspaces realizing the requirements. Instead, we find a family of sets of subspaces that contains at least one “good” set for every ΣΠΣ(s, d, r) circuit C. By gluing together the restrictions of f to each set of subspaces in the family we get a set of possible “guesses” for fi. For each guess of fi we recursively search for a κ-distant circuit for f − fi and thus obtain a set of possible circuits computing f. It will be clear that one of the circuits is the needed κ-distant circuit C′. Given the resulting circuits, it is not difficult to find the correct one by using the black-box polynomial identity test of [KS08]. Note that the size of the family of sets of subspaces determines the running time of the algorithm. The size of our family will be exponential in t and in polylog(d). Most of this section is devoted to proving the existence of a set {Uj} containing a low (polylogarithmic) number of subspaces of a small (< s) co-dimension. The subspaces {Uj} are found recursively. We first find a set of m subspaces {U1,j}_{j∈[m]} of co-dimension 1 (m = polylog(d) and its exact value will be determined later) such that f|U1,j has a ΣΠΣ(s′, d, r) κ-distant circuit for some 0 < s′ < s. Specifically, in each subspace, at least one of the polynomials {fi}_{i=1}^s vanishes while f does not. Within each subspace U1,j we recursively find m co-dimension 1 subspaces in which the restriction of f has a ΣΠΣ(s″, d, r) κ-distant circuit for some 0 < s″ < s′. We continue this process until we obtain many subspaces such that on any one of them exactly one of the polynomials {fi}_{i=1}^s does not vanish. From this large set of subspaces we extract a sufficiently large set of subspaces {Uj} on which fi is the only polynomial that does not vanish, for some fixed i. The process of finding the set of subspaces {Uj} implicitly defines a directed tree.
The root corresponds to the entire space of t-variate linear functions. Each edge corresponds to a linear function defining a co-dimension 1 subspace. Each non-root vertex of distance l from the root corresponds to a co-dimension l subspace in a natural way. Hence, what we are after is a family of trees (of small depth) that contains at least one tree whose leaves correspond to a “good” set of subspaces. In fact, we shall prove the existence of a “good” tree of small depth whose non-leaf vertices have exactly m children, and then use brute force search to find this tree. We now define m-linear function trees, whose existence will guarantee the existence of a “good” set of subspaces {Uj}.

Definition 4.2. An “m-linear function tree” is a tree graph with the following properties.

• Each non-leaf vertex has exactly m children.

• Each edge e is labelled with a linear function ϕe.

• All linear functions labelling the edges from a vertex u to its children are linearly independent.

• For each vertex u let πu = {e1, . . . , el} be the path from the root to u. Define Vu as the co-dimension l subspace on which {ϕei}_{i=1}^l vanish.

• Define the size of the “m-linear function tree” as the number of non-leaf vertices in the tree.

An “m-linear function tree” is an interpolating tree for a polynomial f of degree d if the following hold:

1. m ≥ max{100 log(d), ∆(Cf) + 2}.

2. For every vertex u in the tree and every edge e connecting u to one of its children, ϕe ∉ Lin(Cf|Vu).
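As a data structure, an m-linear function tree only needs to store the edge labels; the subspace Vu attached to a vertex is recovered from the labels on the root-to-u path. A minimal Python sketch (our own; it does not enforce the m-children or linear-independence constraints, which a real implementation would have to check separately):

```python
class FunctionTree:
    """A node of an m-linear function tree.  Each edge to a child carries a
    label (the linear function phi_e); the subspace V_u of a vertex u is cut
    out by the labels on the root-to-u path."""

    def __init__(self):
        self.children = []  # list of (label, FunctionTree) pairs

    def add_child(self, label):
        child = FunctionTree()
        self.children.append((label, child))
        return child

    def leaves_with_subspaces(self, path=()):
        """Yield (leaf, V_u) pairs, where V_u is the tuple of labels on the
        path from the root to the leaf."""
        if not self.children:
            yield self, path
        else:
            for label, child in self.children:
                yield from child.leaves_with_subspaces(path + (label,))
```

A depth-2 tree with edges ϕa, ϕb at the root and ϕc, ϕd below ϕa has leaves attached to the label sequences (ϕa, ϕc), (ϕa, ϕd) and (ϕb).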

The usefulness of this definition is demonstrated in the following section. There we give an algorithm that receives as input an interpolating tree for fi and the circuits Cfi|Vu, for every leaf u of the tree, and outputs Cfi. In the rest of the section we prove the existence of an m-linear function tree of depth smaller than s that is an interpolating tree for some fi. Moreover, in each leaf u of the tree it holds that f|Vu = fi|Vu. We now explain how to choose the labels of the edges such that the resulting tree will have the required properties.

Definition 4.3. Let f be a t-variate polynomial of degree d. Let r, s, k ∈ N+ and let ξ(k) be some increasing positive integer function. Let C′ =∆ Cf1 + · · · + Cfs be a ξ(k)-distant ΣΠΣ(s, d, r) circuit computing f. Let L be a set of linearly independent linear functions. We say that L is a (ξ, r, k)-splitting set for C′ when it satisfies the following properties:

1. |L| ≥ max{100 log(d), r + 2} · k.

2. There exists some ∅ ≠ A ⊊ [s] such that each ϕ ∈ L is a linear factor of fi if and only if i ∉ A.

3. For every ϕ ∈ L and i ∈ [s], we have that ϕ ∉ Lin(sim(Cfi)).

4. For each i ≠ j ∈ A and ϕ ∈ L, it holds that ∆(Cfi|ϕ=0, Cfj|ϕ=0) ≥ ξ(k − 1) · r.

The reason for the name “splitting set” comes from the fact that it splits the multiplication gates of C′ into two non-trivial sets (A and [s] \ A), such that for every i ∈ [s] \ A and every ϕ ∈ L we have that Cfi|ϕ=0 = 0. Before we prove the existence of such a set we give a lemma that demonstrates its usefulness.

Lemma 4.4. Let ξ(k) be an increasing integer function and r, d ∈ N+. Assume that for every s ≤ k ∈ N+ and every ξ(k)-distant ΣΠΣ(s, d, r) circuit C′ =∆ Cf1 + · · · + Cfs computing the polynomial f, there exists a (ξ, r, k)-splitting set. Define m =∆ max{100 log(d), r + 2}. Then for each such circuit, there exists an m-linear function tree T with the following properties:

1. The depth of T is smaller than s.

2. There exists i′ ∈ [s] such that T is an interpolating tree for fi′.

3. For every subspace V corresponding to a leaf of T we have that f|V = fi′|V (recall Definition 4.2).

Proof. We prove the claim by induction on s. For s = 1, the tree is a single node and the claim is trivial. For s > 1 let L be the guaranteed (ξ, r, k)-splitting set. Let ϕ ∈ L and consider f|ϕ=0. Then, by Definition 4.3, there exists some ∅ ≠ A ⊊ [s] such that Cϕ =∆ Σ_{i∈A} Cfi|ϕ=0 is a ξ(k − 1)-distant ΣΠΣ(|A|, d, r) circuit computing f|ϕ=0. Since |A| < s we get by induction that there exists an m-linear function tree Tϕ with the following properties:

1. The depth of Tϕ is lower than |A| ≤ s − 1.

2. There exists iϕ ∈ [s] such that Tϕ is an interpolating tree for fiϕ .

3. For every subspace V corresponding to a leaf of Tϕ we have that f|V = fiϕ|V.

In particular, there is some i′ ∈ [s] such that at least |L|/s ≥ |L|/k ≥ m of the trees Tϕ satisfy iϕ = i′. We now construct a new tree T by connecting m of these trees (those with iϕ = i′) to a new root and labelling each edge from the root to Tϕ with the linear function ϕ. It is not hard to see that T is an m-linear function tree satisfying the required properties (recall that by the definition of ∆(Cf) we have that ∆(Cfi′) ≤ r, as fi′ is computed by a multiplication gate in a ΣΠΣ(s, d, r) circuit).
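The pigeonhole step at the end of this proof — selecting an index i′ on which at least m of the subtrees agree — is simple enough to spell out (our own helper, not from the paper):

```python
from collections import Counter

def majority_index(trees, m):
    """trees: list of (phi, i_phi) pairs.  Return an index i' that at least
    m of the trees agree on, together with m such trees.  Pigeonhole: if
    len(trees) >= m * k and there are at most k indices, some i' recurs >= m times."""
    counts = Counter(i for _, i in trees)
    i_star, c = counts.most_common(1)[0]
    if c < m:
        raise ValueError("no index occurs m times")
    chosen = [(phi, i) for phi, i in trees if i == i_star][:m]
    return i_star, chosen
```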

In other words, the lemma above implies that a splitting set guarantees the existence of an interpolating tree. We proceed to proving the existence of splitting sets. That is, we find a function ξ such that for every s ≤ k ∈ N+ and every ξ(k)-distant ΣΠΣ(s, d, r) circuit C′, there exists a (ξ, r, k)-splitting set L. Our proof will have the following structure. We first define the function ξ and discuss some of its properties. We then find a set L of linearly independent linear functions for which Property 2 of Definition 4.3 holds. We then give an upper bound on the number of functions in L that do not satisfy either Property 3 or Property 4 of Definition 4.3. Finally, we remove the “bad” functions from L and verify that the number of functions remaining is large enough (thus guaranteeing Property 1). The function we work with is the following:

ξ_d(k) = ∏_{j=3}^{k} \binom{j-1}{2} · 2^{(k+3)(k+4)/2} · (log(d))^k for k > 2, and ξ_d(k) = 2^{(k+3)(k+4)/2} · (log(d))^k for k = 2.   (6)
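As a purely illustrative aside (not part of the original analysis), the function of Equation (6) is easy to evaluate numerically. The sketch below assumes the reconstruction of Equation (6) given above and assumes the logarithm is base 2; both are assumptions about the intended reading of the garbled formula.

```python
from math import comb, log2

def xi(k: int, d: int) -> float:
    """xi_d(k), assuming the reading of Equation (6):
    prod_{j=3}^{k} C(j-1, 2) * 2^((k+3)(k+4)/2) * (log d)^k  for k > 2,
    and 2^((k+3)(k+4)/2) * (log d)^2 for k = 2 (the product is empty)."""
    assert k >= 2 and d >= 2
    prod = 1
    for j in range(3, k + 1):          # empty product (= 1) when k = 2
        prod *= comb(j - 1, 2)
    return prod * 2 ** ((k + 3) * (k + 4) // 2) * log2(d) ** k
```

Under this reading, consecutive values satisfy xi(k, d) / xi(k-1, d) = C(k-1, 2) · 2^(k+3) · log2(d), which is exactly the ratio used later in the proof of Lemma 4.6 when Property 4 fails.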

The next lemma gives some properties of ξ_d. Its proof is technical but straightforward and is thus omitted:

Lemma 4.5. For k ≥ 2, d ≥ 2 and r ≥ 50 log(d) the following properties hold:

• ξd(k) is an increasing function.

• ξd(k) ≥ 2k + 1.

• ξd(k) − 2 ≥ ξd(k) − k ≥ ξd(k)/2.

• ξ_d(k)·r / (2^{k+1}·\binom{k-1}{2}) ≥ 2k log(dk) + 2k (for k = 2, replace \binom{k-1}{2} with 1).

• ξ_d(k)/2^{k+1} > 2k·(\binom{k-1}{2} + 1) (for k = 2, replace \binom{k-1}{2} with 1).

We are now ready to find the linear functions required for the splitting set.

Lemma 4.6. Let f be a t-variate polynomial of degree d ≥ 2. Let C' = Σ_{i=1}^{s} C_{f_i} be a ξ_d(k)-distant ΣΠΣ(s, d, r) circuit computing f, where r ≥ 50 log(d). Then there exists a (ξ_d, r, k)-splitting set for C'.

Proof. We start with the following lemma that shows that there are many linear functions satisfying Property 2 of Definition 4.3.

Lemma 4.7. Let i ≠ i' ∈ [s]. There exists a set L of ξ_d(k)·r/2^k linearly independent linear functions and a partition of [s] into two sets A, Ā ≜ [s] \ A such that

• i and i' are not in the same set.

• For every ϕ ∈ L and j ∈ [s] we have that ϕ is a linear factor of f_j if and only if j ∉ A.

Proof. We first prove a weaker lemma showing that for every two distinct multiplication gates of C' there are many linearly independent linear functions that divide one multiplication gate but not the other. We then see that the required set L can be extracted from those linear functions.

Lemma 4.8. There are at least (ξ_d(k)·r − 2r)/2 linearly independent linear functions that are either linear factors of f_i and not of f_{i'}, or vice versa.

Proof. To ease the notation we prove the claim for i = 1 and i' = 2. It is not hard to see that

Lin(sim(C_{f_1} + C_{f_2})) = Lin(sim(C_{f_1})) ∪ Lin(sim(C_{f_2})) ∪ (gcd(C_{f_1}) \ gcd(C_{f_2})) ∪ (gcd(C_{f_2}) \ gcd(C_{f_1})).

For each set, we consider the dimension of its span and obtain the following inequality:

∆(C_{f_1}, C_{f_2}) ≤ ∆(C_{f_1}) + ∆(C_{f_2}) + dim_1(gcd(C_{f_1}) \ gcd(C_{f_2})) + dim_1(gcd(C_{f_2}) \ gcd(C_{f_1})).

Since C_{f_1}, C_{f_2} are multiplication gates of a ξ_d(k)-distant ΣΠΣ(s, d, r) circuit, we have that ξ_d(k)·r ≤ ∆(C_{f_1}, C_{f_2}) and that ∆(C_{f_1}) + ∆(C_{f_2}) ≤ 2r. Hence,

ξ_d(k)·r ≤ 2r + dim_1(gcd(C_{f_1}) \ gcd(C_{f_2})) + dim_1(gcd(C_{f_2}) \ gcd(C_{f_1})).

Therefore, w.l.o.g. we get that

dim_1(gcd(C_{f_1}) \ gcd(C_{f_2})) ≥ (ξ_d(k)·r − 2r)/2.

According to the definition of a default circuit (Definition 2.19), each linear function in gcd(C_{f_1}) \ gcd(C_{f_2}) is a factor of f_1 and not of f_2. This concludes the proof of Lemma 4.8.

We return to the proof of Lemma 4.7. Assume w.l.o.g. that there are (ξ_d(k)·r − 2r)/2 linearly independent linear functions that divide⁹ f_{i'} and do not divide f_i. We initialize A = {i}, Ā = {i'} and L as the set containing these linear functions. Consider a polynomial f_j where j ≠ i, i'. If most of the functions in L divide f_j, then we keep in L only the functions dividing f_j and add j to Ā. Otherwise, we keep in L only the functions that do not divide f_j and add j to A. We repeat the procedure for all j ≠ i, i'. In each step we keep at least half of L. As there are k − 2 such steps, the size of the set L that remains at the end of the process is at least

(ξ_d(k)·r − 2r)/2^{k−1}.

According to Lemma 4.5,

(ξ_d(k)·r − 2r)/2^{k−1} ≥ ξ_d(k)·r/2^k.

Thus, it can easily be seen that the acquired set L satisfies the requirements of the lemma. This completes the proof of Lemma 4.7.
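The filtering procedure in the proof above is easy to state operationally. The following sketch (an illustration only) represents each linear function abstractly and takes an assumed predicate `divides(phi, j)` telling whether ϕ is a linear factor of f_j; it performs the k − 2 majority-filtering steps and returns the partition (A, Ā) together with the surviving set L.

```python
def split_gates(funcs, divides, i, i_prime, s):
    """Filtering step from the proof of Lemma 4.7 (a sketch; `divides(phi, j)`
    is an assumed predicate).  `funcs` initially holds functions dividing
    f_{i'} but not f_i.  Returns (A, A_bar, L); each step keeps at least half
    of L, so |L| >= |funcs| / 2^(s-2) at the end."""
    A, A_bar = {i}, {i_prime}
    L = list(funcs)
    for j in range(1, s + 1):
        if j in (i, i_prime):
            continue
        dividing = [phi for phi in L if divides(phi, j)]
        if 2 * len(dividing) >= len(L):      # the majority divides f_j
            L = dividing
            A_bar.add(j)
        else:                                # the majority does not divide f_j
            L = [phi for phi in L if not divides(phi, j)]
            A.add(j)
    return A, A_bar, L
```

A toy run, modelling each function by the set of gate indices it divides, confirms the invariant that every surviving ϕ divides f_j exactly when j ∉ A.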

We continue with the proof of Lemma 4.6. Let L denote the set of functions guaranteed by Lemma 4.7. To bound the number of functions that do not satisfy Condition 3 of Definition 4.3, we notice that dim_1(Lin(sim(C_{f_i}))) ≤ r for every i ∈ [s]. Hence, there are at most s·r ≤ k·r linear functions in L that belong to Lin(sim(C_{f_i})) for some i ∈ [s] (because the functions of L are linearly independent). To bound the number of functions that do not satisfy Condition 4 we need the following lemma, which is a generalization of Claim 4.8 of [DS06]. The proof is given in Section B of the appendix.

9 When ϕ is a linear factor of f we also say that ϕ divides f.

Lemma 4.9. Let k, s, d, r ∈ ℕ be such that s ≤ k and let C ≜ Σ_{i=1}^{s} C_{f_i} be a ΣΠΣ(s, d, r) circuit. Let L̂ be a set of linearly independent linear functions and A ⊆ [s] be a set of size |A| ≥ 2. Let χ, r' ∈ ℕ be such that χ > 1 and r' satisfies that for every A' ⊆ A of size |A'| = 2 we have that r' ≤ ∆(Σ_{i∈A'} C_{f_i}) − r·k. Assume that for each ϕ ∈ L̂, the following holds:

• For every i ∈ [s], ϕ divides f_i if and only if i ∉ A.

• For every i ∈ [s], ϕ ∉ Lin(sim(C_{f_i})).

• There exist i ≠ j ∈ A such that ∆(C_{f_i|_{ϕ=0}}, C_{f_j|_{ϕ=0}}) < r'/(2·χ·log(d)).

Then

|L̂| ≤ \binom{|A|}{2} · (max{2k log(dk) + 2k, r'/χ} + r·k).

In other words, the lemma bounds the number of linear functions in L that satisfy Properties 2 and 3 of Definition 4.3 but not Property 4. We now proceed with the proof of Lemma 4.6. Consider the functions of L for which Property 4 does not hold (but Properties 2 and 3 do hold). For such a function ϕ, there are i ≠ j ∈ A such that¹⁰

∆(C_{f_i|_{ϕ=0}}, C_{f_j|_{ϕ=0}}) < ξ_d(k − 1)·r = (ξ_d(k − 1)/ξ_d(k))·(ξ_d(k)·r) = ξ_d(k)·r / (2^{k+3}·\binom{k-1}{2}·log(d)).

Set r' = r·(ξ_d(k) − k) and χ = 2^{k+1}·\binom{k-1}{2}. It follows that

∆(C_{f_i|_{ϕ=0}}, C_{f_j|_{ϕ=0}}) < ξ_d(k)·r / (2^{k+3}·\binom{k-1}{2}·log(d)) = ξ_d(k)·r / (4χ·log(d)) ≤^{(1)} (ξ_d(k) − k)·r / (2χ·log(d)) = r'/(2χ·log(d)),

where inequality (1) stems from Lemma 4.5. In addition, for every i_1 ≠ i_2 ∈ [s] we have that

∆(C_{f_{i_1}}, C_{f_{i_2}}) − r·k ≥ (ξ_d(k) − k)·r = r'.

Hence, we may apply Lemma 4.9 with the parameters r' and χ, and get that there are at most

\binom{k-1}{2} · (max{2k log(dk) + 2k, ξ_d(k)·r/(2^{k+1}·\binom{k-1}{2})} + r·k) = ξ_d(k)·r/2^{k+1} + r·k·\binom{k-1}{2}

linear functions in L for which Condition 4 is the only condition that does not hold (the equality follows from Lemma 4.5). We now remove the functions that do not satisfy Conditions 3 or 4 from L. According to the previous calculations, the number of functions that remain is at least:

r·(ξ_d(k)/2^k − k) − ξ_d(k)·r/2^{k+1} − r·k·\binom{k-1}{2} ≥^{(1)}

2r·k ≥^{(2)} max{100 log(d), r + 2}·k, where inequality (1) is derived from Lemma 4.5 and inequality (2) holds since r ≥ 50 log(d) ≥ 50. This completes the proof of Lemma 4.6 and shows the existence of a (ξ_d, r, k)-splitting set for C'. We now have the following corollary.

10 Notice that Condition 4 is only interesting for k ≥ 3. When k = 2 we have that |A| = 1 and the condition always holds.

Corollary 4.10. Let r, d ∈ ℕ⁺ be such that d ≥ 2 and r ≥ 50 log(d). Let m ≜ max{100 log(d), r + 2} and let ξ_d(k) be as in Equation (6). Let s ≤ k ∈ ℕ⁺ and C' ≜ Σ_{i=1}^{s} C_{f_i} be a minimal ξ_d(k)-distant ΣΠΣ(s, d, r) circuit. Then there exists an m-linear function tree T with the following properties:

• The depth of T is less than s.

• There exists i' ∈ [s] such that T is an interpolating tree for f_{i'}.

• In every subspace V_u corresponding to a leaf u of T, we have that f|_{V_u} = f_{i'}|_{V_u}.

4.1.2 Steps 2 & 3: Gluing the Restrictions Together

In this section we deal with the following problem: Let f be a t-variate polynomial and T be an m-linear function tree that is interpolating for f. Let {V_u}_u be the set of subspaces corresponding to the leaves of T. Given the tree T and black-box access to the polynomials {f|_{V_u}}_u, we would like to reconstruct the circuit C_f (notice that f is actually the polynomial f_i described at the beginning of Section 4.1). We give an algorithm for this problem. We first describe its general scheme: As a first step we construct, for each leaf u of the tree, the circuit C_{f|_{V_u}} using the brute-force algorithm of Appendix A.2. From here on, our methods are "local" with regard to the input tree. Specifically, let v be a vertex in the tree and denote its children by {u_j}_j. We show how to construct the circuit C_{f|_{V_v}} given the circuits {C_{f|_{V_{u_j}}}}_j. Using this "local construction method" we gradually construct the circuit C_{f|_{V_v}} for each vertex v until we reach the root (at which we construct C_f). The goal of this section is to prove the following theorem.

Theorem 4.11. Let f be a t-variate polynomial and T an m-linear function tree that is an interpolating tree for f. Then Algorithm 3, when given f and T as input, runs in time size(T)·|F|^{O(t²)} and outputs C_f.

Algorithm 2 is the "local" algorithm that in fact deals with an m-linear function tree of depth 1. It works in two stages. First we find the linear functions of gcd(C_f) (all linear functions dividing f) and then we construct sim(C_f). In the first stage (finding gcd(C_f)) we find the restrictions {gcd(C_f)|_{V_u}}_u and glue them together using a method presented in [Shp07]. In that paper, an algorithm is devised for reconstructing a single (non-generalized) multiplication gate given its restrictions to various co-dimension 1 subspaces (see Appendix A.3). In the second stage we glue the different restrictions of

{sim(C_f)|_{V_u}}_u. For this we use the gluing algorithm given in Section 4.2. In particular, we quote Theorem 4.18 of Section 4.2 and show that its requirements are satisfied, which guarantees that we can perform the gluing. Since we do not use any of the results of this section when analyzing the gluing algorithm of Section 4.2, this does not affect the correctness of the algorithms.

Lemma 4.12. Algorithm 2 runs in d^{O(t)} time, for d = max{d_i}. If the given tree is interpolating for f and f_i = f|_{ϕ_i=0}, then the algorithm outputs C_f.

Proof. Notice that the linear functions of the m-linear function tree given as input to Algorithm 2 are linearly independent. Hence, we will hereon assume w.l.o.g. that the functions at hand are x_1, ..., x_m. In particular, f_i = f|_{x_i=0} for every i ∈ [m]. The proof has the following structure: First we show that in each co-dimension 1 subspace we are able to obtain the restrictions of gcd(C_f) and sim(C_f). We then prove that for every i ∈ [m], f_i/Lin(f_i) is a polynomial of exactly ∆(C_f) linear functions. Hence, Step 1 of the algorithm succeeds and r = ∆(C_f). We proceed to prove that gcd(C_f) is found correctly by using the results of [Shp07], where an algorithm for Step 2 is given. To show that sim(C_f) is properly constructed, we first show that for the polynomial h computed by

Input: A set {ϕ_1, ..., ϕ_m} of m linearly independent t-variate linear functions over the field F, defining an m-linear function tree of depth 1. In addition, for each i ∈ [m], the circuit C_{f_i} ≜ p_i(L_1^i, ..., L_{r_i}^i) · ∏_{j=1}^{d_i} l_j^i.
Output: The circuit C_f in F^t, of the form C_f = p(L_1(x̄), ..., L_r(x̄)) · ∏_{j=1}^{d'} l_j(x̄).
1 If for some i, j ∈ [m] we have r_i ≠ r_j, output "fail". Otherwise, define r ≜ r_1;
2 Find a set of linear functions l_1, ..., l_{d'} such that for each i ∈ [m], ∏_{j=1}^{d'} l_j|_{x_i=0} = ∏_{j=1}^{d_i} l_j^i. If no such functions exist, output "fail";
3 Output gcd(C_f) = ∏_{j=1}^{d'} l_j;
4 Find an integer 1 < î ≤ m for which L_1^1|_{ϕ_î=0}, ..., L_r^1|_{ϕ_î=0} are linearly independent. If no such î exists, output "fail";
5 Define U_0 as the co-dimension 2 subspace where ϕ_1 = ϕ_î = 0. Let U_1 be the co-dimension 1 subspace where ϕ_1 = 0 and U_2 be the co-dimension 1 subspace where ϕ_î = 0;
6 Let h be the polynomial computed by sim(C_f). Run Algorithm 5 with input {C_{h|_{U_l}}}_{l=0}^{2}. Set sim(C_f) as the output of the algorithm;
Algorithm 2: Reconstructing a circuit given a depth-1 m-linear function tree

sim(C_f), it holds that C_h = sim(C_f). We then prove that the index î exists and so the subspace U_0 can easily be found. Moreover, we show that it is ∆(C_h)-rank preserving for C_h. Note that C_h|_{U_0} = sim(C_{f_1})|_{ϕ_î=0}, C_h|_{U_1} = sim(C_{f_1}) and C_h|_{U_2} = sim(C_{f_î}). Given this, the results of Section 4.2 guarantee the correctness of Step 6 (see Theorem 4.18). We start by proving that Step 1 succeeds.

Lemma 4.13. For each i ∈ [m], the co-dimension 1 subspace defined by the equation x_i = 0 is ∆(C_f)-rank preserving for sim(C_f).

Proof. Let

sim(C_f) = p̃(L̃_1, ..., L̃_{∆(C_f)}).

Assume for a contradiction that for some i ∈ [m], the subspace where x_i = 0 is not ∆(C_f)-rank preserving for sim(C_f). Then L̃_1|_{x_i=0}, ..., L̃_{∆(C_f)}|_{x_i=0} must be linearly dependent. Hence, there exists a non-trivial combination of these functions that gives a constant function:

Σ_{j=1}^{∆(C_f)} α_j · L̃_j|_{x_i=0} = γ

for some α_1, ..., α_{∆(C_f)}, γ ∈ F. The same combination of L̃_1, ..., L̃_{∆(C_f)} cannot be constant since these functions are linearly independent. Hence for some 0 ≠ β ∈ F:

Σ_{j=1}^{∆(C_f)} α_j · L̃_j = β·x_i + γ.

Therefore, x_i ∈ Lin(C_f), meaning that Condition 2 of Definition 4.2 is violated, contradicting the interpolating properties of the input tree.

We continue with the proof and show how to obtain, in each co-dimension 1 subspace, the corre- sponding restrictions of gcd(Cf ) and sim(Cf ).

Lemma 4.14. Let f be a non-zero t-variate polynomial over F. Let V ⊆ F^t be a subspace that is ∆(C_f)-rank preserving for sim(C_f) such that f|_V ≠ 0. Then

gcd(C_f)|_V = gcd(C_{f|_V}) and sim(C_f)|_V = sim(C_{f|_V}).

Proof. Let

sim(C_f) = p̃(L̃_1, ..., L̃_{∆(C_f)}).

Let g(x̄) be some irreducible factor of sim(C_f). Obviously, g is not a linear function. Let g' be a ∆(C_f)-variate polynomial such that g'(L̃_1, ..., L̃_{∆(C_f)}) = g. Since V is ∆(C_f)-rank preserving for sim(C_f) we get that

dim_1(span_1{L̃_1|_V, ..., L̃_{∆(C_f)}|_V}) = ∆(C_f).

Thus, we have that g|_V (= g'(L̃_1|_V, ..., L̃_{∆(C_f)}|_V)) is also an irreducible non-linear polynomial, meaning that sim(C_f)|_V does not have any linear factor. Therefore, gcd(C_f)|_V (≡ C_{f|_V}/sim(C_f)|_V) contains all the linear factors of f|_V, indicating that

gcd(C_f)|_V = gcd(C_{f|_V}).

It follows that¹¹

sim(C_f)|_V = sim(C_{f|_V}).

Since the given tree is interpolating for f, we have that x_i ∉ Lin(C_f) for every i ∈ [m]. Therefore, f|_{x_i=0} ≠ 0. Hence, by Lemmas 4.13 and 4.14 it holds that

gcd(C_{f_i}) = gcd(C_f)|_{x_i=0},   sim(C_{f_i}) = sim(C_f)|_{x_i=0}.   (7)

Namely, if sim(C_f) = p̃(L̃_1, ..., L̃_{∆(C_f)}), then

sim(C_{f_i}) = p̃(L̃_1|_{x_i=0}, ..., L̃_{∆(C_f)}|_{x_i=0}).

Since L̃_1|_{x_i=0}, ..., L̃_{∆(C_f)}|_{x_i=0} are linearly independent, we have that r_i = ∆(C_f) for every i ∈ [m]. Hence, the algorithm does not fail in Step 1 and r = ∆(C_f). We now prove that we can indeed reconstruct the linear functions of gcd(C_f). According to Equation (7), the set of linear functions in gcd(C_{f_i}) is the set of projections of the linear functions in gcd(C_f) to the co-dimension 1 subspace defined by x_i = 0. In [Shp07] an algorithm (Algorithm 6 there) is given for exactly this problem (see Section A.3 in the appendix). Therefore, we can assume that we have reconstructed gcd(C_f). We proceed to the second stage of Algorithm 2. The following lemma proves that there is an index î satisfying the requirement of Step 4.

Lemma 4.15. There exists 1 < î ≤ m such that

dim_1(span_1{L_1^1|_{x_î=0}, ..., L_r^1|_{x_î=0}}) = r.

11 Actually, we have shown that sim(C_{f|_V}) and sim(C_f)|_V compute the same polynomial, which does not have any linear factor. The representations of this polynomial may not be the same in both circuits. However, it is easy to see that given the "correct" definition of a default basis (used in the definition of default circuits), we can make sure that the representations are equal. We assume w.l.o.g. the equality of these circuits.

Proof. Let 1 < i ≤ m. Assume that

dim_1(span_1{L_1^1|_{x_i=0}, ..., L_r^1|_{x_i=0}}) < r.

Then there exists a non-trivial linear combination of L_1^1|_{x_i=0}, ..., L_r^1|_{x_i=0} that sums to some constant. Alternatively, for some multiset {α_j}_{j=0}^{r} in F (containing at least one non-zero element),

α_0 + Σ_{j=1}^{r} α_j · L_j^1|_{x_i=0} = 0.

The same combination of L_1^1, ..., L_r^1 sums to x_i (up to some multiplicative constant). As L_1^1, ..., L_r^1 are linearly independent, we have that x_i ∈ span_1({L_1^1, ..., L_r^1}). Since m ≥ r + 2, we get by a simple dimension argument that there must exist some 1 < î ≤ m such that x_î ∉ span_1({L_1^1, ..., L_r^1}). For this î it holds that

dim_1(span_1{L_1^1|_{x_î=0}, ..., L_r^1|_{x_î=0}}) = r,

as required.
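The dimension argument above is constructive: given the coefficient vectors of the homogeneous parts of L_1^1, ..., L_r^1, one can search for an index i with x_i outside their span by rank computations. The following sketch (an illustration, not part of the original algorithm) does this over the rationals with exact arithmetic; the names are hypothetical.

```python
from fractions import Fraction

def rank(rows):
    """Row rank by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    r, cols = 0, len(m[0]) if m else 0
    for c in range(cols):
        piv = next((i for i in range(r, len(m)) if m[i][c]), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and m[i][c]:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def find_rank_preserving_index(L):
    """Sketch of the argument in Lemma 4.15: given rows of coefficients of
    linearly independent linear functions, return an index i with x_i outside
    their span; restricting to x_i = 0 (dropping coordinate i) then keeps the
    rank equal to r.  Returns None if every x_i lies in the span."""
    r, n = len(L), len(L[0])
    assert rank(L) == r
    for i in range(n):
        e_i = [1 if j == i else 0 for j in range(n)]
        if rank(L + [e_i]) == r + 1:   # x_i not in span(L_1, ..., L_r)
            return i
    return None
```

For example, with L_1 = x_1 + x_3 and L_2 = x_2 + x_3 in three variables, coordinate 0 works, and dropping it indeed leaves two independent restrictions.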

Thus, the lemma implies that the subspace U_0 exists. We are now close to completing the proof of Lemma 4.12. In Step 6 we run Algorithm 5 of Section 4.2 on the different restrictions of the circuit C_h, where h is the polynomial computed by sim(C_f). As h does not have any linear factors, it follows that C_h = sim(C_f). Thus, it suffices to prove that the output of Algorithm 5 is the circuit C_h, given the restrictions {C_{h|_{U_l}}}_{l=0}^{2}. The requirement of Algorithm 5 is that U_0 should be ∆(C_h)-rank preserving for C_h (see Theorem 4.18). Clearly, since C_h = sim(C_f), it holds that U_0 satisfies this requirement, and thus Step 6 outputs C_h = sim(C_f). This proves the algorithm's correctness. We now analyze the running time of the algorithm. By Lemma A.4, finding the linear functions of gcd(C_f) requires poly(d, t) time. The last step of the algorithm (constructing sim(C_f)) requires t·d^{O(∆(C_h))} = d^{O(t)} time (this is shown in Section 4.2, Theorem 4.18). It can easily be seen that the time required in all other steps is also poly(t, d, size(C_f)). Hence, the total running time of the algorithm is d^{O(t)} (as size(C_f) = d^{O(t)}). This completes the proof of Lemma 4.12.

Now that we have analyzed Algorithm 2, we are ready to present Algorithm 3, which handles any m-linear function tree and not only a tree of depth 1.

Input: Two integers t, m ∈ ℕ⁺ and a t-variate polynomial f. An m-linear function tree T.

In addition, for every leaf u of T, a black box computing f|_{V_u}.
Output: The circuit C_f.
1 if T is a single node then
2   reconstruct C_f via Algorithm 11 of Section A.2 (brute-force interpolation algorithm)
3 end
4 else
5   For each child u of the root, recursively run with the subtree rooted at u and the black boxes corresponding to its leaves;
6   Let {u_j}_j be the children of the root. Let {ϕ_j}_j be the linear functions labelling the edges connecting the root to its children. Run Algorithm 2 with input {ϕ_j}_j and {C_{f|_{V_{u_j}}}}_j to obtain C_f;
7 end
Algorithm 3: Reconstructing a circuit given an m-linear function tree
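The control flow of Algorithm 3 is a plain bottom-up recursion. The following sketch makes that structure explicit; the two callbacks are stand-ins (assumptions, not the paper's code) for Algorithm 11 (brute-force interpolation at a leaf) and Algorithm 2 (gluing over a depth-1 tree).

```python
def reconstruct(tree, interpolate_leaf, glue_depth_one):
    """Recursive skeleton of Algorithm 3 (a sketch).  A tree is either
    ('leaf', black_box) or ('node', [(phi_j, subtree_j), ...]).  The callbacks
    `interpolate_leaf` and `glue_depth_one` abstract Algorithm 11 and
    Algorithm 2, respectively."""
    kind, payload = tree
    if kind == 'leaf':
        return interpolate_leaf(payload)       # reconstruct C_{f|V_u}
    phis, children = zip(*payload)
    circuits = [reconstruct(c, interpolate_leaf, glue_depth_one)
                for c in children]             # circuits at the children
    return glue_depth_one(list(phis), circuits)
```

A toy run with strings in place of circuits shows the gluing happening once per internal vertex, from the leaves up to the root.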

We now give the analysis of Algorithm 3, thus proving Theorem 4.11.

Proof of Theorem 4.11. The proof of correctness follows by a simple induction. We analyze the running time of the algorithm. In total, Algorithm 2 is invoked size(T) times (once for every non-leaf vertex). Similarly, we see that Algorithm 11 is called as a subroutine at most size(T)·m times. Hence, according to Lemma A.3 and Lemma 4.12, the running time of Algorithm 3 is

d^{O(t)}·size(T) + |F|^{O(t²)}·m·size(T).

Since m ≤ t (in a t-dimensional space there are at most t linearly independent linear functions) and d ≤ |F| (otherwise we work with an algebraic extension of F containing more than d elements), the claim follows.

4.1.3 The Algorithm for finding a κ-distant circuit

We conclude Section 4.1 by giving its main algorithm. The following theorem concludes its analysis and this section.

Theorem 4.16. Let C be a t-variate minimal ΣΠΣ(k, d, ρ) circuit computing a polynomial f. Let r ∈ ℕ be such that r ≥ max{50 log(d), R(2k, d)}. Algorithm 4, given the inputs k, t, d, r and C, runs in |F|^{O(k·t^{k+1})} time. It outputs an integer s ≤ k and a ξ_d(k)-distant ΣΠΣ(s, d, r) circuit computing f.

Input: k, t, d, r ∈ ℕ and oracle access to a ΣΠΣ(k, d, ρ) circuit C in t indeterminates computing a polynomial f.
Output: s ∈ ℕ⁺ such that s ≤ k and a ξ_d(k)-distant ΣΠΣ(s, d, r) circuit C' = Σ_{i=1}^{s} C_{f_i} computing f.
1 If ∆(C_f) ≤ r, output s = 1 and C' = C_f;
2 m ≜ max{⌈100 log(d)⌉, r + 2};
3 foreach m-linear function tree T of depth lower than k do
4   Run Algorithm 3 with inputs T and f;
5   If the algorithm failed, or output a circuit C_g such that ∆(C_g) > r, continue to the next tree;
6   Recursively construct a ξ_d(k)-distant circuit with at most k − 1 multiplication gates computing the polynomial f − f_1. If it does not exist, continue to the next tree;
7   Denote the found circuit by C_1 and the number of its multiplication gates by ŝ. Check whether C_1 + C_{f_1} is a ξ_d(k)-distant ΣΠΣ(ŝ + 1, d, r) circuit computing f. If so, output s = ŝ + 1 and C' = C_1 + C_{f_1}. Otherwise, continue to the next tree;
8 end
9 If no ξ_d(k)-distant circuit was found, output "fail";

Algorithm 4: Finding the ξ_d(k)-distant circuit of a polynomial

Proof (of Theorem 4.16). We first note that there is a unique ξ_d(k)-distant ΣΠΣ(s, d, r) circuit computing f. Indeed, as ξ_d(k) ≥ 2k + 1 ≥ 2k + R(2k, d)/r = R(2k, d, r)/r (see Lemma 4.5), we get by Theorem 3.6 that if such a circuit exists then it is unique. We start by proving the correctness of the algorithm. Note that before the algorithm outputs any circuit (Step 7) it verifies that it is indeed a ξ_d(k)-distant circuit for f, so whenever we output a circuit we are guaranteed to have the unique circuit at hand. So assume that for some s ≤ k there exists a ξ_d(k)-distant ΣΠΣ(s, d, r) minimal circuit C' = Σ_{i=1}^{s} C_{f_i} for f. We prove that for some s' ≤ k, a ξ_d(k)-distant ΣΠΣ(s', d, r) circuit is output by the algorithm, thus proving its correctness.

Our proof is by induction on s. When s = 1, we clearly output {C_f}. Assume that s > 1. According to Corollary 4.10, for some tree that we check, Algorithm 3 will produce, w.l.o.g., C_{f_1}. According to Theorem 3.6, the circuit Σ_{i=2}^{s} C_{f_i} is the unique ξ_d(k)-distant ΣΠΣ(s − 1, d, r) circuit computing f − f_1. Hence, by the induction hypothesis, the recursive call will output C_1 = Σ_{i=2}^{s} C_{f_i}. Therefore, the next step of the algorithm will output the circuit C', as we wanted to prove. We now analyze the running time. Our first action requires checking whether ∆(C_f) ≤ r. To do so we reconstruct C_f using Algorithm 11 (brute-force algorithm) in |F|^{O(t²)} time.¹² We now analyze the number of iterations in each recursive call. For this we just have to bound the number of m-linear function trees. Notice that ξ_d(k)·r must be lower than t, otherwise we can immediately output "fail" (since then there are no ξ_d(k)-distant ΣΠΣ(s, d, r) circuits). Hence, t = Ω(log(d) + r). The size of any set of linearly independent linear functions is at most t, so m ≤ t. Each tree has at most O(m^k) = O(t^k) edges. For each edge there exists a t-variate linear function. The number of t-variate linear functions over the field F is at most |F|^{t+1}. Hence, the number of trees we check is bounded by |F|^{O(t^{k+1})} and so is the number of iterations. In each iteration, besides the recursive call, we construct a circuit given an m-linear function tree (Step 5) and perform a PIT test (Step 7). The former requires |F|^{O(t²)} time, according to Theorem 4.11. The latter (PIT) may be done deterministically by brute-force interpolation in d^{O(t)} time. Hence, the time spent in each iteration, not including the recursive call, is |F|^{O(t²)}. Concluding, we get that the total time spent without including recursive calls is |F|^{O(t^{k+1})}. The recursion depth is obviously bounded by k since we reduce its value by one in each call. Thus, the total running time is |F|^{O(k·t^{k+1})}.

4.2 Gluing Together the Restrictions of a Low Rank Circuit

In this section we deal with the following problem: Let f be an n-variate polynomial of degree d over a field F. Given the circuits {C_{f|_{V_i}}}_i for various low dimensional subspaces V_i, we would like to construct the circuit C_f. That is, we show how to "glue" together representations of the restrictions of f to various low dimensional subspaces. The set of subspaces {V_i} that we work with is such that each subspace V_i is ∆(C_f)-rank preserving for C_f. The algorithm has two parts. First we find the linear functions in gcd(C_f) and then we find sim(C_f). Using the properties of rank-preserving subspaces we manage to isolate the restriction of each linear function in gcd(C_f) to every subspace V_i. Having these restrictions, we reconstruct each linear function separately. In the second part (the reconstruction of sim(C_f)) we use a result of [Shp07], where an algorithm is given for gluing together restrictions of a low rank circuit to various low dimensional subspaces. For convenience we denote r ≜ ∆(C_f). We now describe the subspaces in which we receive the restrictions of f. One of the subspaces we have is contained in all other subspaces. We denote it by V. Define t ≜ dim(V) and keep in mind that in our main algorithm, both r and t have small values (polylogarithmic in the circuit size). The following definition describes the entire set of subspaces {V_i} and contains the notation used in this section:

Definition 4.17. Let V be an affine subspace of dimension t.

• Denote by V̂ the homogeneous subspace of V and let v_0 be some fixed vector such that V = V̂ + v_0.

• Let v_1, ..., v_n be a basis of F^n such that v_1, ..., v_t is the default basis (as in Definition 2.18) of V̂ and v_{t+1}, ..., v_n is the default basis of some complement subspace of V̂ (that is, of some subspace of largest possible dimension that has trivial intersection with V̂).

12 There are more efficient ways to check whether ∆(Cf ) ≤ r. However, this does not affect the analysis of the running time.

• For each 0 ≤ i ≤ n − t, set V_i ≜ span_1(V̂ ∪ {v_{t+i}}) + v_0 (note that V_0 = V).

• Let v_1^*, ..., v_n^* be the dual basis of v_1, ..., v_n. That is, each v_i^* is a linear function and v_i^*(v_j) = 1 if and only if i = j (it is zero otherwise).

The set of subspaces w.r.t. which we receive the restrictions of f is {V_i}_{i=0}^{n−t}. The following theorem summarizes the properties of Algorithm 5 (the gluing algorithm).

Theorem 4.18. Let f be an n-variate polynomial over a field F. Let V be an affine t-dimensional subspace of F^n (equivalently, V is of codimension n − t). Algorithm 5, given {C_{f|_{V_i}}}_{i=0}^{n−t} as input (as defined in Definition 4.17), runs in time O(n·d^{∆(C_f)}). If V is ∆(C_f)-rank preserving for C_f then the algorithm outputs C_f.
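The dual basis of Definition 4.17 is a standard linear-algebra object: if B is the matrix whose rows are v_1, ..., v_n, then the coefficient vectors of v_1^*, ..., v_n^* are the rows of (Bᵀ)⁻¹. The sketch below (illustrative only, over the rationals rather than a general field F) computes it by Gauss-Jordan elimination.

```python
from fractions import Fraction

def dual_basis(basis):
    """Dual basis (a sketch): given a basis v_1,...,v_n of F^n as rows,
    return coefficient rows w_1,...,w_n with w_i . v_j = 1 iff i = j.
    The w_i are the rows of the inverse of the transposed basis matrix."""
    n = len(basis)
    # Build [B^T | I] and row-reduce to [I | (B^T)^{-1}].
    m = [[Fraction(basis[j][i]) for j in range(n)] +
         [Fraction(int(j == i)) for j in range(n)] for i in range(n)]
    for c in range(n):
        piv = next(i for i in range(c, n) if m[i][c])   # basis => invertible
        m[c], m[piv] = m[piv], m[c]
        m[c] = [x / m[c][c] for x in m[c]]
        for i in range(n):
            if i != c and m[i][c]:
                m[i] = [a - m[i][c] * b for a, b in zip(m[i], m[c])]
    return [row[n:] for row in m]
```

By construction the returned functionals satisfy exactly the defining relation v_i^*(v_j) = δ_{ij}.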

Input: For each 0 ≤ i ≤ n − t, the circuit C_{f|_{V_i}} = p_i(L_1^{(i)}, ..., L_{r_i}^{(i)}) · ∏_{j=1}^{d_i} (ℓ_j^{(i)})^{e_j}.
Output: The circuit C_f = p(L_1, ..., L_r) · ∏_{j=1}^{d'} ℓ_j^{e_j}.
1 Part 1: Reconstructing gcd(C_f)
2 If there exist 0 ≤ i < i' ≤ n − t such that d_i ≠ d_{i'}, output "fail". Otherwise, define d' ≜ d_1;
3 foreach linear function ℓ^{(0)} dividing f|_V, with some multiplicity e, do
4   For each i ∈ [n − t] find a linear function ℓ^{(i)} such that ℓ^{(i)} ∈ gcd(C_{f|_{V_i}}) and ℓ^{(i)}|_V = ℓ^{(0)}. If for some i ∈ [n − t] no such linear function ℓ^{(i)} exists, or it is not unique, output "fail";
5   For each 0 ≤ i ≤ n − t, denote ℓ^{(i)} = α_{i,0} + Σ_{j=1}^{n} α_{i,j}·v_j^*. Define ℓ ≜ α_{0,0} + Σ_{j=1}^{t} α_{0,j}·v_j^* + Σ_{j=1}^{n−t} α_{j,j+t}·v_{j+t}^*;
6   Insert the linear function ℓ into gcd(C_f) with multiplicity e;
7 end
8 Part 2: Reconstructing sim(C_f)
9 If there exist 0 ≤ i < i' ≤ n − t such that r_i ≠ r_{i'}, output "fail". Otherwise, define r ≜ r_1;
10 foreach i ∈ [n − t] do
11   find the unique r × r matrix that transforms the linear functions L_1^{(i)}|_V, ..., L_r^{(i)}|_V into L_1^{(0)}, ..., L_r^{(0)}. If no such matrix exists, output "fail";
12   transform the linear functions L_1^{(i)}, ..., L_r^{(i)} and the polynomial p_i according to the found matrix, so that for each j ∈ [r] and i ∈ [n − t] it will hold that L_j^{(i)}|_V = L_j^{(0)} and p̂ ≜ p_0 = p_i;
13 end

14 Find the unique set of linear functions L̂_1, ..., L̂_r such that for every j ∈ [r] and 0 ≤ i ≤ n − t, we will have L̂_j|_{V_i} = L_j^{(i)};
15 Set L_1, ..., L_r as the default basis of span_1{L̂_1, ..., L̂_r}. Construct the polynomial p such that p(L_1, ..., L_r) = p̂(L̂_1, ..., L̂_r);
16 output C_f ≜ p(L_1, ..., L_r) · gcd(C_f);
Algorithm 5: Gluing together low dimension restrictions of a low rank circuit
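Step 5 of Algorithm 5 is a purely coordinate-wise operation in the dual basis: the glued function ℓ takes its constant term and its first t coefficients from ℓ^{(0)}, and its (t + i)-th coefficient from ℓ^{(i)}. A minimal sketch (names hypothetical; coefficients are plain lists):

```python
def glue_linear_function(alpha, t):
    """Step 5 of Algorithm 5 (a sketch).  alpha[i] holds the coefficients
    (alpha_{i,0}, alpha_{i,1}, ..., alpha_{i,n}) of the restriction l^(i) in
    the dual basis, for i = 0..n-t.  Returns the coefficients of
    l = alpha_{0,0} + sum_{j<=t} alpha_{0,j} v_j*
      + sum_{j<=n-t} alpha_{j,j+t} v_{j+t}*."""
    n = len(alpha[0]) - 1                  # each row: constant + n coordinates
    coeffs = [alpha[0][0]] + [alpha[0][j] for j in range(1, t + 1)]
    coeffs += [alpha[i][t + i] for i in range(1, n - t + 1)]
    return coeffs
```

Note the design rationale stated in the text: each V_i only adds the single direction v_{t+i} to V, so restriction i is exactly the one subspace that "sees" the coefficient of v_{t+i}^*, and all restrictions agree on the coefficients over V.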

Proof of Theorem 4.18. We first note that if V = V_0 is r-rank preserving for C_f, then for each i ∈ [n − t] it holds that V_i is also r-rank preserving for C_f (since V_0 ⊆ V_i). Thus, according to Lemma 4.14, in each subspace V_i we have that

gcd(C_{f|_{V_i}}) = gcd(C_f)|_{V_i} and sim(C_{f|_{V_i}}) = sim(C_f)|_{V_i}.

Hence, the circuits C_f|_{V_i} and C_{f|_{V_i}} are identical. This gives us the method for obtaining the restrictions gcd(C_f)|_{V_i} and sim(C_f)|_{V_i} for every 0 ≤ i ≤ n − t. We now deal with the different parts of Algorithm 5.

Part 1: Constructing gcd(C_f). Denote the distinct linear factors of gcd(C_f) by ℓ'_1, ..., ℓ'_{d'} and let their multiplicities be e_1, ..., e_{d'}, respectively. Namely,

gcd(C_f) = ∏_{j=1}^{d'} (ℓ'_j)^{e_j}.

We shall think of these linear functions as represented with respect to the basis v_1^*, ..., v_n^*. That is, for each i ∈ [d'], we define:

ℓ'_i = α'_{i,0} + Σ_{j=1}^{n} α'_{i,j}·v_j^*.

The following lemma gives the analysis of Part 1 of the algorithm.

Lemma 4.19. Part 1 of Algorithm 5 runs in time n·poly(d). If V is r-rank preserving for C_f, it outputs gcd(C_f).

Proof. The complexity analysis is trivial, so we only prove the correctness of the algorithm. Since all the subspaces are r-rank preserving, we have that gcd(C_f)|_{V_i} = gcd(C_{f|_{V_i}}) for every subspace V_i (Lemma 4.14). Furthermore, the restrictions of every two non-equivalent linear functions remain non-equivalent¹³ (Property 1 of Definition 2.5). It follows that deg(gcd(C_f)|_{V_i}) = deg(gcd(C_f)). Hence, Step 2 of the algorithm does not fail. As no two linearly independent linear functions become linearly dependent when restricted to V, we get that indeed there exists a unique ℓ^{(i)} such that ℓ^{(i)}|_V = ℓ^{(0)}. We now note that the function ℓ defined in Step 5 satisfies ℓ|_{V_i} = ℓ^{(i)} for every 0 ≤ i ≤ n − t. By the structure of the V_i's it is clear that this ℓ is unique. In particular ℓ must belong to gcd(C_f) with multiplicity e, as required.

Part 2: Gluing the Restrictions of sim(C_f). In [Shp07] an algorithm for exactly this task was given (Algorithm 4 there). Although it is not presented in this form, Algorithm 4 of [Shp07] can be seen as getting representations of the form sim(f)|_{V_i} = Q(ℓ_1^{(i)}, ..., ℓ_r^{(i)}), where for each i and j, ℓ_j^{(i)}|_V = ℓ_j, and then computing ℓ̃_j such that for every i it holds that ℓ̃_j|_{V_i} = ℓ_j^{(i)}. This is exactly what we wish to achieve in Part 2 of our algorithm, and indeed the algorithm we give is exactly the algorithm (presented implicitly) in [Shp07]. Thus correctness follows from the following lemma.

Lemma 4.20 (implicit in [Shp07]). Let h be a non-zero n-variate polynomial of degree d. Let r ∈ ℕ⁺ be such that h is a polynomial of exactly r linear functions. Let V_0 be a subspace of F^n such that h|_{V_0} is a polynomial of exactly r linear functions. Algorithm 4 of [Shp07], given the input {h|_{V_i}}_{i=0}^{n−t}, outputs a representation of h as a polynomial of r linear functions in O(n·d^r) time.

By setting h = f/Lin(f), we clearly satisfy the requirements of Lemma 4.20. Hence, Part 2 of Algorithm 5 (which acts exactly as Algorithm 4 of [Shp07]) outputs the circuit sim(C_f) in O(n·d^r) time. Theorem 4.18 follows from this and from Lemma 4.19.

13 It is possible for exactly one linear function in gcd(C_f) to be restricted to a constant. This case can be easily dealt with. We assume for convenience and w.l.o.g. that no functions are restricted to constants.

4.3 The Reconstruction Algorithm

We are now ready to summarize our findings and present the learning algorithm. Algorithm 6, given the inputs k, n, d ∈ ℕ and black-box access to an n-input ΣΠΣ(k, d, ρ) circuit C, outputs a ΣΠΣ(s, d, r) circuit C' computing the same polynomial as C. Its running time is quasipolynomial in n, d, |F|. The following lemma gives the summary of the algorithm's analysis. Theorem 1 is a corollary of it.

Lemma 4.21. Let k, n, d, ρ ∈ ℕ and let C be an n-variate ΣΠΣ(k, d, ρ) circuit over the field F. Then Algorithm 6, given a black box computing C as input, outputs, for some r ≤ max{50 log(d), R(2k, d), R(k + 1, d) + k·ρ}·(ξ_d(k)·k)^k and s ≤ k, a ΣΠΣ(s, d, r) circuit C' such that C' ≡ C. The running time of the algorithm is

n²·|F|^{O((max{50 log(d), R(2k,d), R(k+1,d)+k·ρ}·((ξ_d(k)+1)·k)^{k−2}·ξ_d(k))^{k+1})} =

poly(n)·exp(log(|F|)·log(d)^{O(k³)}·ρ^{O(k)}).

Input: k, n, d, ρ ∈ N and a black box holding an n-variate ΣΠΣ(k, d, ρ) circuit C.
Output: A ΣΠΣ(s, d, r) circuit C′ computing the same polynomial as C.

1. r_init ≜ max{50 log(d), R(2k, d), R(k + 1, d) + k·ρ};
2. Let S ⊆ F be such that |S| = n^2 · \binom{d^k · r_init · k^{⌈log_k(ξ_d(k)+1)⌉·k} + 2}{2} + 1;
3. foreach r_init ≤ r ≤ r_init · k^{⌈log_k(ξ_d(k)+1)⌉·(k−2)} and α ∈ S do
4.   Set t ≜ r · ξ_d(k);
5.   Define V ≜ V_{α,t}. Let V̂ be the homogeneous part of V, let v_1, ..., v_t be the default basis of V̂, and let v_0 be such that V = V̂ + v_0. Also let v_{t+1}, ..., v_n be the default basis for some V′ satisfying V′ ⊕ V̂ = F^n;
6.   For each j ∈ [n − t], define V_j ≜ span(V̂ ∪ {v_{t+j}}) + v_0;
7.   For each 0 ≤ j ≤ n − t run Algorithm 4 on the circuit C|_{V_j}. If the algorithm failed on any of the restrictions then proceed to the next pair (r, α); otherwise, for every j, denote the outputted circuit by Σ_{i=1}^{s_j} C_{f_i^j};
8.   If s_{j_1} ≠ s_{j_2} for some 0 ≤ j_1 < j_2 ≤ n − t then proceed to the next pair (r, α). Otherwise define s ≜ s_1;
9.   Reorder the numbering of the multiplication gates so that for every i ∈ [s] and j ∈ [n − t] it holds that f_i^j|_V = f_i^0. If this is not possible then output "fail";
10.  For every i ∈ [s], run Algorithm 5 with input {C_{f_i^j}}_{j=0}^{n−t}. If the algorithm failed for any i ∈ [s] then proceed to the next pair (r, α). Otherwise, for each i ∈ [s] set C_{f_i} to be the circuit found by Algorithm 5, given i;
11.  If Σ_{i=1}^{s} C_{f_i} ≡ C, output Σ_{i=1}^{s} C_{f_i}. Otherwise, proceed to the next pair (r, α);
12. end

Algorithm 6: Learning a ΣΠΣ(k, d, ρ) circuit
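At a high level, Algorithm 6 is a search over the pairs (r, α) followed by verification; its control flow can be sketched as below. The three callbacks are placeholders standing in for Algorithm 4 (restricted reconstruction), Steps 8-10 (consistency checks and gluing via Algorithm 5), and the [KS08] identity test; they are assumptions of this sketch, not the paper's actual interfaces.

```python
# Skeleton of Algorithm 6's outer loop (illustrative only; the callbacks
# reconstruct_on_subspaces, glue and pit are supplied by the caller).
def reconstruct(candidate_params, reconstruct_on_subspaces, glue, pit):
    """Try every (r, alpha) candidate; output the first glued circuit passing the PIT."""
    for r, alpha in candidate_params:
        restricted = reconstruct_on_subspaces(r, alpha)   # Step 7; None on failure
        if restricted is None:
            continue                                      # next pair (r, alpha)
        candidate = glue(restricted)                      # Steps 8-10; None on failure
        if candidate is not None and pit(candidate):      # Step 11 verification
            return candidate
    return None

# Toy stubs: only the pair (2, 5) leads to a verified circuit.
found = reconstruct([(1, 0), (2, 5), (3, 7)],
                    lambda r, a: (r, a) if r == 2 else None,
                    lambda x: x,
                    lambda c: c == (2, 5))
assert found == (2, 5)
```

The structure makes the correctness argument of Lemma 4.21 transparent: since every output is verified before being returned, it suffices that at least one pair succeeds.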

Proof. (of Lemma 4.21) We first prove the correctness of the algorithm. Notice that before we output any circuit we verify that it computes the correct polynomial. Hence, it suffices to prove that there exists at least one pair (r, α) for which a circuit is outputted. Let f be the polynomial computed by the input circuit C. In the main iteration, the integer r takes each value between r_init and r_init · k^{⌈log_k(ξ_d(k)+1)⌉·(k−2)}. According to Theorem 3.2, there exists at least one such value r for which there exist an integer s ≤ k and a minimal ξ_d(k)-distant ΣΠΣ(s, d, r) circuit C′ computing f. We focus on the iterations in which r takes a minimal such value. According to Lemma 2.8, there exists α ∈ S for which the chosen subspace V_{α,t} is t-rank preserving for C′. We focus on the iteration where such an element α is chosen along with the specified value of r. Let C′ ≜ Σ_{i=1}^{s} C_{f′_i} be a minimal ξ_d(k)-distant ΣΠΣ(s, d, r) circuit computing f whose existence stems from the choice of r. Notice that

R(2k, d, r)/r = R(2k, d)/r + 2k ≤^{(1)} 2k + 1 ≤^{(2)} ξ_d(k).

Inequality (1) holds since r ≥ R(2k, d). Inequality (2) follows from Lemma 4.5. Hence, by Corollary 3.7, the circuit C′|_V is a ξ_d(k)-distant ΣΠΣ(s, deg(f|_V), r) minimal circuit computing f|_V. In addition, by the same corollary, there is no other ξ_d(k)-distant ΣΠΣ(k′, deg(f|_V), r) minimal circuit computing f|_V for any k′ ≤ k. Therefore, for each subspace V_j, the circuit reconstructed in Step 7 is in fact C′|_{V_j}. In particular, all the s_j's are the same (and equal to s as defined in Step 8) and for each i ∈ [s] and 0 ≤ j ≤ n − t it holds, after Step 9, that C_{f_i^j} = C_{f′_i}|_{V_j}. Since each V_j is r-rank preserving for C′, we have by Lemma 4.14 that C_{f′_i}|_{V_j} = C_{f′_i|_{V_j}}. From this, and from V being t-rank preserving for each C_{f′_i}, it follows by Theorem 4.18 that in Step 10, for every i ∈ [s], Algorithm 5 outputs the circuit C_{f′_i} (that is, for every i ∈ [s], f_i = f′_i).

We have shown so far that for some pair (r, α), in Step 10 we reconstruct the circuits {C_{f′_i}}_{i=1}^{s}. Hence, the circuit we check in the following step is C′. As C′ ≡ f, the algorithm outputs it as the required circuit. This proves the correctness of Algorithm 6. We now analyze the complexity of the algorithm. The number of iterations we have is

|S| · 2^{O(k^3)} · (log(d))^{O(k^2)} · ρ = n^2 · poly(d, ρ) · 2^{O(k^3)} · (log(d))^{O(k^2)}.

The only steps whose running time analysis is neither trivial nor already analyzed (in previous sections) are Steps 9 and 11. In Step 9, by performing a brute force PIT, we reach a running time bound of

n^2 · s · d^{O(t)} ≤ n · |F|^{O(t^{k+1})}.

In Step 11, we would like to deterministically check whether the ΣΠΣ(s + k, d, r) circuit C − C′ computes the zero polynomial, given only a black box computing the circuit. In [KS08], an algorithm is given that does this in

n · exp(2^{O(k^2)} · (log d)^{2k−1} + 2kr log d)

time (see Lemma A.5 of Section A.4). Recall that

t^{k+1} = (ξ_d(k) · r)^{k+1} = Ω(2^{O(k^2)} · (log d)^{2k−1} + 2kr log d).

Therefore, assuming that d ≤ |F|, the running time of Step 11 is in fact n · |F|^{O(t^{k+1})}. We have that the total time of each iteration, according to Theorems 4.16 and 4.18, is

n · |F|^{O(t^{k+1})} + n · d^{O(r)} = n · |F|^{O(t^{k+1})}.

Hence, the total running time of the algorithm is

(n^2 · poly(d, ρ) · 2^{O(k^3)} · (log(d))^{O(k^2)}) · n · |F|^{O(t^{k+1})} = n^2 · |F|^{O(t^{k+1})}.

This proves Lemma 4.21.
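The brute force PIT invoked in Step 9 rests on a standard fact: a nonzero n-variate polynomial of total degree at most d cannot vanish on the whole grid {0, ..., d}^n. A minimal sketch (the black box is modeled as a plain function; names are illustrative):

```python
from itertools import product

def brute_force_pit(black_box, n, d):
    """Deterministically zero-test an n-variate polynomial of degree <= d,
    given as a black box, using (d+1)^n evaluations on the grid {0,...,d}^n."""
    return all(black_box(*x) == 0 for x in product(range(d + 1), repeat=n))

# (x + y)^2 - x^2 - 2xy - y^2 is identically zero; x*y - 1 is not.
assert brute_force_pit(lambda x, y: (x + y)**2 - x**2 - 2*x*y - y**2, n=2, d=2)
assert not brute_force_pit(lambda x, y: x*y - 1, n=2, d=2)
```

The (d+1)^n query count is what produces the d^{O(t)} factor in the bound above once the test is applied to t-dimensional restrictions.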

5 Reconstructing Multilinear ΣΠΣ circuits

When reconstructing multilinear ΣΠΣ(k) circuits we obtain much better results. In this section we present an algorithm that, given black box access to some n-variate multilinear ΣΠΣ(k) circuit C, deterministically outputs, in poly(n, |F|) time, a multilinear ΣΠΣ(k) circuit C′ computing the same polynomial as C.

The main outline of the reconstruction algorithm is the same as for ΣΠΣ(k, d, ρ) circuits. We first reconstruct the restriction of the circuit to several low dimension subspaces and then glue together the different restricted circuits. As before, the representation of the circuit in the low dimension subspaces (detailed in Section 5.1) ensures a small number of possible "lifts" of the circuit. Specifically, we reduce the case of lifting a multilinear ΣΠΣ(k) circuit to the case of lifting a ΣΠΣ(k′) circuit (for some k′ ≤ k) of low ∆ measure. This gives us a polynomial size list of circuits (the way to obtain this list is described in Section 5.2) containing at least one circuit computing the same polynomial as C. Using a deterministic polynomial identity test [KS08] we find the desired circuit.

The similarity between the two reconstructions ends here. Each low dimension subspace V we work on is of constant dimension. Furthermore, the restricted circuit C|_V is multilinear. These properties ensure that the entire circuit is of constant size, and thus the time required to go over all possible multilinear circuits equivalent to C|_V is polynomial in |F|. Hence, given the existence of some "liftable" representation of C|_V, we can simply try to lift each possible representation until we succeed. The questions remaining are which representation to use and how to lift it. Assume first that the ∆ measure of C (i.e., the rank of the simplified circuit) is low. Notice that this case is somewhat analogous to the case of only one multiplication gate when dealing with ΣΠΣ(k, d, ρ) circuits.
Since we are looking for a ΣΠΣ(k) circuit and not a single multiplication gate, this task is not as simple as before. The main problem arising is that many different lifts might compute the same polynomial. Hence, we must make sure that the number of (lifted) circuits computing the same polynomial is small. In Section 5.2 we give an algorithm that reconstructs a low ∆ measured multilinear ΣΠΣ(k) circuit given black boxes computing its restrictions in several low dimension subspaces. As in the previous section, the subspaces we work with contain a low dimension subspace V and several other subspaces containing it. For each possible multilinear ΣΠΣ(k) circuit C′ equivalent to C|_V, we obtain, by using the fact that C is a low ∆ measured circuit, a single "guess" of an n-variate ΣΠΣ(k) circuit. We prove that at least one of our guesses is correct (i.e., is equivalent to C). Since there is only a polynomial number of circuits equivalent to C|_V in V, the total number of "guesses" is of polynomial size.

When the ∆ measure of C is high, we reduce our problem to the low ∆ measure problem using a unique partition of the circuit into low ∆ measured subcircuits. In Section 5.1, we present a method to find integers κ and r that ensure the existence of a (κ, r)-strong partition (as defined in Section 3) C_1, ..., C_s of C, with the following property: For each multilinear-rank preserving subspace V and multilinear circuit C′ s.t. C′ ≡ C|_V, there is a unique (κ, r)-strong partition C′_1, ..., C′_{s′} of C′ that is equivalent to the restriction of the partition of C. Namely, s′ = s and for each 1 ≤ i ≤ s, C′_i ≡ C_i|_V. Hence, given a circuit C′ ≡ C|_V, by using Algorithm 1 of Section 3, we obtain some equivalents of C_1|_V, ..., C_s|_V as required. In the final section we gather our findings and present the main algorithm for reconstructing a multilinear ΣΠΣ(k) circuit.
The first phase of finding the circuit in a low dimension subspace is done in a completely different manner than in the general case. In each subspace V, we first find a ΣΠΣ(k) circuit computing C|_V and then find a partition of it^14 into low ∆ measured circuits. In Section 5.1 we explain the process

14 For a ΣΠΣ(k) circuit, each partition of [k] defines a partition of the circuit. Each subcircuit is the sum of multiplication gates in the corresponding subset. A formal definition is given in Section 3.1.

(that is, show how to obtain a partition of a circuit) and prove that the partition we find is equivalent (i.e., computes the same polynomials) to the restriction of some specific partition of C. This implies a way to reduce our problem to the problem of reconstructing a single subcircuit of the partition. In the second phase, as before, we reduce our problem to the problem of finding a single low ∆ measured circuit given its restrictions to various subspaces. Unlike in the general case, we now construct a ΣΠΣ(k) circuit computing the polynomial f rather than C_f. Also, as opposed to the general case, some of the linear functions in the gcd may be restricted to constants in the low dimension subspaces. This makes the task of finding the gcd of the circuit more difficult. In Section 5.2 we give an algorithm that reconstructs a low ∆ measured multilinear ΣΠΣ(k) circuit given black boxes computing its restrictions in several low dimension subspaces. In the last section we summarize our results and present the reconstruction algorithm for multilinear circuits. We then prove its correctness and analyze its complexity.

5.1 Finding a Partition of the Circuit in a Low Dimension Subspace

This section includes results analogous to Sections 3 and 4.1. Given a multilinear ΣΠΣ(k) circuit C and a low dimensional subspace V, we show how to find a (κ, r)-strong partition (recall Definition 3.3 of Section 3.1) of C|_V into subcircuits. We then prove that the partition we find is equal to the restriction of some specific (predetermined) (κ, r)-strong partition of C (namely, each subcircuit in the partition that we found is the restriction of some subcircuit from the predetermined partition). The way we find the partition of C|_V is very simple. In the multilinear case V is of dimension O(1), and as the degree of a multilinear circuit is bounded by its rank, we get that C|_V is computable by a multilinear ΣΠΣ(k) circuit of constant size. Hence, by using brute force techniques, we can construct a multilinear ΣΠΣ(k) circuit computing the same polynomial as C|_V in time poly(|F|). We then run Algorithm 1 (see Section 3.1) to find the required partition. The main result of this section shows that the partition found above is equivalent to the restriction of the partition that Algorithm 1 finds when given as input the circuit C. Specifically, we show the following: Assume that given C, κ and r_init as input, Algorithm 1 outputs the (κ, r)-strong partition C_1, ..., C_s of C. Then for any ΣΠΣ(k) circuit C′ such that C′ ≡ C|_V, Algorithm 1, given the input C′, κ and r_init, outputs a (κ, r)-strong partition C′_1, ..., C′_{s′} such that there exists a reordering of C′_1, ..., C′_{s′} where for each i ∈ [s], C′_i ≡ C_i|_V. To prove the result we shall need to analyze the running time of Algorithm 1 and discuss some of its properties. Note that in Section 4 we only used the existence of a partition and did not run Algorithm 1, so we did not need to analyze its running time there. For convenience, we repeat the notations and analysis of Algorithm 1 given in Section 3.1:

• Let n, k, r_init, κ ∈ N, and let C be an n-variate multilinear ΣΠΣ(k, d) circuit.

• Algorithm 1 outputs an integer r_init ≤ r ≤ k^{(k−2)·⌈log_k(κ)⌉} · r_init and a partition I of [k] corresponding to a (κ, r)-strong partition of C.

• Denote by m the total number of iterations performed by the algorithm.

• ι ≜ ⌈log(κ)⌉.

Since we are about to use the algorithm it is essential to discuss its running time:

Lemma 5.1. Algorithm 1 finds a (κ, r)-strong partition of [k] in time O(log(κ) · n^3·k^4).

Proof. In each iteration of the main loop, all actions except the inner loop require O(k^2) time. The inner loop has O(k^2) iterations, in which we check the ∆ measure of a circuit having at most nk linear functions. To check the rank of a circuit we perform Gaussian elimination on the matrix of coefficients of the linear functions in it. This requires O(n^3·k) time (there are faster algorithms, of course, but this will not affect the total running time). Hence, the total running time of a single iteration of the main loop is O(n^3·k^3). In every ι iterations, the number of elements in I is reduced by at least one. The number of elements in I starts at k and ends at 1 or more. Hence, the number of iterations is at most ι · (k − 1). Thus, the total running time is O(ι · n^3·k^4) = O(log(κ) · n^3·k^4).
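The rank check inside the loop can be sketched as Gaussian elimination on the coefficient matrix of the circuit's linear functions. The toy implementation below assumes a prime field GF(p), so inverses come from Fermat's little theorem; the paper's setting allows more general fields.

```python
def rank_mod_p(rows, p):
    """Rank of a matrix over GF(p) (p prime) by Gaussian elimination.
    rows: coefficient vectors of the circuit's linear functions."""
    m = [[x % p for x in row] for row in rows]
    rank, col, n_cols = 0, 0, len(m[0]) if m else 0
    while rank < len(m) and col < n_cols:
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            col += 1
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], p - 2, p)          # inverse via Fermat, p prime
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):                    # clear the pivot column
            if i != rank and m[i][col]:
                c = m[i][col]
                m[i] = [(a - c * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
        col += 1
    return rank

# x1 + x2, x2 + x3 and their sum have rank 2 over GF(5).
assert rank_mod_p([[1, 1, 0], [0, 1, 1], [1, 2, 1]], 5) == 2
```

The cubic cost of this elimination is exactly the O(n^3·k) term in the proof above.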

To prove the uniqueness of the partition we first analyze the ∆ measure of the subcircuits and their multiplication gates. The following lemma uses the notations of Algorithm 1 (most importantly, m and r_m were defined there).

Lemma 5.2. Let C_1, ..., C_s denote the subcircuits of C induced by the partition found in Algorithm 1. If ι ≥ 2 then we have the following: For every i ≠ i′ ∈ [s] and any two multiplication gates M, M′ in the circuits C_i, C_{i′} (respectively), it holds that ∆(M, M′) ≥ r_{m−1}.

Proof. Let C_i = M_1 + ... + M_l (where each M_j is a multiplication gate) and C_{i′} = M′_1 + ... + M′_{l′} be two different subcircuits in the partition. Assume w.l.o.g., towards a contradiction, that ∆(M_1, M′_1) < r_{m−1}. Then

∀ j_1, j_2 ∈ [l]: ∆(M_{j_1}, M_{j_2}) ≤^{(1)} ∆(C_i) ≤^{(2)} r = r_{m−ι} ≤ r_{m−2}.

Inequality (1) holds according to Lemma 2.17. Since the partition is (κ, r)-strong (Lemma 3.4), inequality (2) holds as well. Analogously, we have that

∀ j′_1, j′_2 ∈ [l′]: ∆(M′_{j′_1}, M′_{j′_2}) ≤ r_{m−2}.

Hence,

r_m ≤^{(1)} ∆(C_i, C_{i′}) = ∆(M_l, M_{l−1}, ..., M_1, M′_1, M′_2, ..., M′_{l′}) ≤^{(2)} Σ_{j=2}^{l} ∆(M_j, M_{j−1}) + Σ_{j=2}^{l′} ∆(M′_j, M′_{j−1}) + ∆(M_1, M′_1) < r_{m−1} + (k − 1) · r_{m−2} ≤ 2 · r_{m−1} ≤ k · r_{m−1} = r_m,

a contradiction. Note that inequality (1) holds as otherwise the m'th iteration would not have been the last one (contradicting the definition of m). Inequality (2) follows from Lemma 2.17. This proves the claim.

Theorem 5.3. Let r_init, κ ∈ N⁺ and let C and C′ be two minimal multilinear ΣΠΣ(k) circuits computing the same non-zero polynomial.

• Let C_1, ..., C_s and C′_1, ..., C′_{s′} (s′ ≥ s) be the partitions of C and C′ found by Algorithm 1 when given κ, r and C (or C′) as input.

• Assume that r_init ≥ R_M(2k) and that κ > k^2 (that is, ι ≥ 3).

Then s = s′ and there exists a reordering of the subcircuits such that for each i ∈ [s], it holds that C_i ≡ C′_i.

Proof. Consider the circuit C′ − C. Clearly it computes the identically zero polynomial. Let M be a multiplication gate in C_1. M also appears in some minimal zero subcircuit of C′ − C. Denote this minimal zero subcircuit by C̃. Note that C̃ must contain at least one multiplication gate from C′, as otherwise we would get that C is not minimal, in contradiction. Assume w.l.o.g. that there exists a multiplication gate M′ in C̃ that was "originally" a multiplication gate of C′_1.^15 Then for each M̂ ∈ C̃, we have that

∆(M, M̂) ≤ ∆(C̃) <^{(∗)} R_M(2k) ≤ r_init ≤ r_{m−1},   ∆(M′, M̂) ≤ ∆(C̃) < r_{m−1},

where inequality (∗) holds as C̃ computes the zero polynomial. Therefore, by Lemma 5.2, either M̂ ∈ C_1 or M̂ ∈ C′_1. It follows that each minimal identically zero subcircuit of C′ − C is contained in the union of exactly two subcircuits, one from C and one from C′. In other words, we have shown that for a minimal subcircuit C̃ of C′ − C, there exist i ∈ [s] and i′ ∈ [s′] such that C̃ ∩ C ⊆ C_i and C̃ ∩ C′ ⊆ C′_{i′}.

It is now left to prove that any two minimal zero circuits of C′ − C are either composed of multiplication gates of the same pair of subcircuits (of C and C′) or of distinct pairs of subcircuits. Formally, we must prove the following claim: Let C̃, Ĉ be two different minimal zero circuits of C′ − C. Assume that C̃ ∩ C_1 ≠ ∅, Ĉ ∩ C_1 ≠ ∅ and that C̃ ∩ C′_1 ≠ ∅. Then Ĉ ∩ C′_1 ≠ ∅. Proving this will show that C_1 − C′_1 is a sum of identically zero circuits.

Let N ∈ Ĉ ∩ C_1, M ∈ C̃ ∩ C_1, M′ ∈ C̃ ∩ C′_1 and N′ ∈ Ĉ ∩ C′. We will show that N′ ∈ Ĉ ∩ C′_1. The following inequalities hold:

∆(M′, M) ≤ ∆(C̃) < R_M(2k) ≤ r_init ≤ r_{m−ι} ≤ r_{m−3},

∆(M, N) ≤ ∆(C_1) ≤ r_{m−ι} ≤ r_{m−3},   ∆(N, N′) ≤ ∆(Ĉ) < R_M(2k) ≤ r_{m−3}.

Hence, by Lemma 2.17 we get that

∆(M′, N′) ≤ ∆(M′, M) + ∆(M, N) + ∆(N, N′) < 3 · r_{m−3} ≤ k^2 · r_{m−3} = r_{m−1}.

Since M′ ∈ C′_1, we have by Lemma 5.2 that N′ ∈ C′_1. It follows that C_1 − C′_1 is a sum of identically zero circuits, i.e., C_1 ≡ C′_1. Analogously, we have w.l.o.g. that C_i ≡ C′_i for all i ∈ [s]. Furthermore, if s < s′ then C′ is not minimal, meaning that s = s′.

Corollary 5.4. Let C be a non-zero minimal multilinear ΣΠΣ(k) circuit:

• Let r_init, κ ∈ N⁺ be such that r_init ≥ R_M(2k) and κ > k^2.

• Let C_1, ..., C_s be the partition of C outputted by Algorithm 1 when given C, r_init and κ as input.

• Let V be an (r_init · k^{(k−1)·⌈log_k(κ)⌉})-multilinear-rank-preserving subspace for C.

• Let C′ be a ΣΠΣ(k) circuit in dim(V) indeterminates such that C′ ≡ C|_V.

• Let C′_1, ..., C′_{s′} be the partition of C′ outputted by Algorithm 1 when given C′, r_init and κ as input.

Then s = s′ and there exists a reordering of the subcircuits such that for each i ∈ [s], it holds that C_i|_V ≡ C′_i.

15 For convenience we abuse notation and sometimes refer to a circuit as a set of multiplication gates.

Proof. According to Theorem 5.3, it suffices to show that Algorithm 1, given C|_V, r_init and κ as input, outputs C_1|_V, ..., C_s|_V. Let r be the integer outputted by Algorithm 1 when given C, r_init and κ as input. We have that r ≤ r_init · k^{(k−2)·⌈log_k(κ)⌉}. Hence V is (r · k^{⌈log_k(κ)⌉})-multilinear-rank-preserving for C. It follows that for every subcircuit C̃ of C and every r̃ ≤ k^{⌈log_k(κ)⌉} · r, it holds that

∆(C̃) ≤ r̃ ⇔ ∆(C̃|_V) ≤ r̃.

Hence, by uniqueness of the partition found by Algorithm 1 (Theorem 3.6),^16 we get that the two partitions are equal (up to reordering of the indices). This proves the claim.

5.2 Lifting a Low Rank Multilinear ΣΠΣ(k) Circuit

In this section we give an algorithm that, given restrictions of a low ∆ measured circuit to various low dimension subspaces, glues these representations into one n-variate multilinear ΣΠΣ(k) circuit computing the same polynomial as the original circuit. As we saw in the previous section how to find a strong partition of the circuit in a low dimensional subspace, we can combine the two results to get an algorithm for reconstructing the original circuit. More formally, the problem we solve is the following: Let k, r, n ∈ N. Let C be an n-variate multilinear ΣΠΣ(k) circuit computing a polynomial f such that ∆(C) ≤ r. Let B ⊆ [n] and α ∈ F be such that V_{B,α} is a liftable r-multilinear-rank-preserving t dimensional subspace for C (recall Definition 2.13). Given black boxes computing the polynomial f|_{V_{B′,α}} for each B′ ⊇ B of size |B′| = |B| + 2, we would like to construct an n-variate ΣΠΣ(k) multilinear circuit C′ that computes the polynomial f. For convenience, we set V ≜ V_{B,α}. For each A ⊆ [n] let

V^A = V^A_{B,α} ≜ V_{B∪A,α}.   (8)

We assume w.l.o.g. that the set B is in fact the set {n − t + 1, n − t + 2, ..., n} and that the shifting vector v_{α,0} = (0, 0, ..., 0) (that is, α = 0). As a first step towards computing a ΣΠΣ(k) circuit for f, we first construct a ΣΠΣ(k) circuit computing the restriction of f to V. Notice that since C|_V is a multilinear ΣΠΣ(k) circuit in t variables, its size is bounded from above by some integer function of t and k. Hence, by going over all circuits bounded by that size restriction we will at some point find C|_V. This "pool of circuits" is of size poly(|F|), assuming that k and t are constants. Therefore, in the first model we work on, the circuit C|_V is given to us as input, while in practice we try to lift the circuit for each guess of C|_V.

The lifting process consists of two phases. First we find the linear functions of gcd(C) and then, having access to sim(C), we find a simple circuit C′ such that C′ ≡ sim(C). Algorithm 7 constructs gcd(C).

Lemma 5.5. Algorithm 7 outputs gcd(C) in n^2 · |F|^{O(t)} time.

Proof. To prove the correctness of the algorithm we show that each linear function in gcd(C) is found exactly once. Clearly, each linear function ℓ ∈ gcd(C) such that ℓ|_V is not constant either belongs to L or is found in the main iteration. We prove that each linear function that is constant on V is found in the main iteration and that no unnecessary function is added to gcd(C). Step 5 is meant to find whether x_i appears in gcd(C). The other steps are reached only when x_i appears in gcd(C). In these steps we find the linear function containing x_i (there is only one such function since the circuit is multilinear). To prove the correctness of Step 5, we use the following two lemmas stating sufficient and necessary conditions for the appearance of a variable and of a linear function in gcd(C):

16 We proved uniqueness in the case that κ ≥ R(2k, d, r)/r, but in the case of multilinear circuits it is not difficult to see that κ ≥ R_M(2k)/r is sufficient to guarantee uniqueness.

Input: The circuit C|_V, where V is a liftable r-multilinear-rank-preserving t dimensional subspace for C. Black boxes computing the circuits C|_{V^{{i,j}}} for each i ≠ j ∈ [n − t].
Output: gcd(C).

1. S ← [n − t];
2. L ← gcd(C|_V);
3. while S ≠ ∅ do
4.   Pick some i ∈ S and remove it from S;
5.   In a brute force manner, look for a linear function ℓ such that x_i appears in ℓ and ℓ is a factor of C|_{V^{{i}}}. If no such function exists, or ℓ contains a variable x_j that appears in sim(C|_V), then continue to the next iteration;
6.   Assuming that a linear function ℓ was found, find, for each j ∈ S, a linear function ℓ^j dividing C|_{V^{{i,j}}} such that ℓ^j|_{V^{{i}}} = ℓ. Denote by α_j the coefficient of x_j in ℓ^j. If α_j ≠ 0, then remove j from S;
7.   Add the linear function ℓ + Σ_j α_j x_j to gcd(C);
8.   L ← L \ {ℓ|_V};
9. end
10. Add all functions left in L to gcd(C);

Algorithm 7: Lifting the g.c.d of a low ∆ measured circuit

Lemma 5.6. Let i ∈ [n − t]. Then x_i appears in gcd(C) if and only if x_i appears in gcd(C|_{V^{{i}}}).

Proof. Notice that V^{{i}} is a multilinear-rank-preserving subspace for C. Hence, Lemma 2.6 implies that

gcd(C|_{V^{{i}}}) = gcd(C)|_{V^{{i}}}   and   sim(C|_{V^{{i}}}) = sim(C)|_{V^{{i}}}.

Now, if x_i appears in gcd(C) then x_i appears in gcd(C)|_{V^{{i}}}, and hence x_i appears in gcd(C|_{V^{{i}}}). If, on the other hand, x_i does not appear in gcd(C) then it appears in sim(C),^17 and in a similar manner we get that x_i appears in sim(C|_{V^{{i}}}). Since the circuit C|_{V^{{i}}} is multilinear, x_i does not appear in gcd(C|_{V^{{i}}}).

Lemma 5.7. The following holds for every linear function ℓ appearing in the circuit C: ℓ ∈ gcd(C) if and only if ℓ divides C and each x_i appearing in ℓ does not appear in sim(C|_V).

Proof. Assume that ℓ ∈ gcd(C). Clearly ℓ divides C. Since C is a multilinear circuit, any x_i appearing in ℓ cannot appear in sim(C). Hence, it cannot appear in sim(C|_V). This proves the first direction of the claim. Let ℓ be a linear function dividing C such that ℓ ∉ gcd(C). To prove the second direction it is enough to show that some variable appearing in ℓ also appears in sim(C|_V). Indeed, as ℓ ∉ gcd(C), it follows that ℓ divides sim(C). Hence, ℓ is equal to some non-trivial linear combination of the linear functions in the default basis of span_1(Lin(sim(C))). Let L_1, ..., L_r be this basis. Since V is r-multilinear-rank-preserving for C, we have that L_1|_V, ..., L_r|_V are linearly independent^H linear functions, and thus ℓ|_V is not a constant function. Let x_i be some variable appearing in ℓ|_V. Obviously, x_i appears in ℓ (and hence in sim(C)). By Lemma 2.6 we get that sim(C|_V) = sim(C)|_V, and therefore x_i appears in sim(C|_V). This completes the proof of the lemma.

17 We ignore the possibility that xi does not appear in the circuit at all. In that case, xi obviously does not appear in any restriction of the circuit.

We continue the proof of Lemma 5.5. The two previous lemmas show that in order to check whether x_i appears in gcd(C), it is enough to check that there exists a function ℓ containing x_i such that ℓ divides C|_{V^{{i}}} and each other variable x_j that appears in ℓ does not appear in sim(C|_V). This proves that if we found a "good" ℓ in Step 5 then x_i appears in gcd(C).

Lemma 5.8. Let i ∈ [n] and let ℓ ∈ gcd(C|_{V^{{i}}}) be the linear function containing x_i. There exists a unique function L ∈ gcd(C) such that L|_{V^{{i}}} = ℓ.

Proof. Since ℓ divides C|_{V^{{i}}}, the function ℓ must originate from some irreducible polynomial g dividing C (that is, ℓ divides g|_{V^{{i}}}). If g is not a linear function then we must have that g divides sim(C), meaning that x_i must appear in sim(C), which is a contradiction. Hence, there exists a linear function L ∈ gcd(C) such that L|_{V^{{i}}} = ℓ. Assume that the function L is not unique. That is, there exists a function L′ ∈ gcd(C) such that L′ ≠ L and L′|_{V^{{i}}} = ℓ. In such a case, x_i appears in two linear functions of gcd(C) and the circuit is not multilinear.

We now finish the proof of Lemma 5.5. So far we have shown that there exists a single linear function L ∈ C such that L|_{V^{{i}}} = ℓ. Hence, for each j ∈ [n − t] chosen in Step 6, the linear function ℓ^j is unique and L|_{V^{{i,j}}} = ℓ^j. This proves the algorithm's correctness.

We now analyze the complexity of the algorithm. We require, for each circuit C|_{V^{{i,j}}}, the linear functions dividing it. According to Lemma A.2 of Section A.1, there is a deterministic algorithm that finds the linear functions dividing a polynomial, given black box access to it, in |F|^{O(t)} time. Hence, to find the linear functions dividing each circuit C|_{V^{{i,j}}} we require n^2 · |F|^{O(t)} running time. Having these linear functions, it is clear that the algorithm requires an additional n^2 · poly(t) running time. Hence, the total time required by the algorithm is n^2 · |F|^{O(t)}.

After constructing gcd(C) we obtain black boxes computing the different restrictions of C/gcd(C) = sim(C). We now present an algorithm that is given sim(C)|_V and black boxes computing sim(C)|_{V^{{i}}} for each i ∈ [n − t] as input and outputs a circuit C′ computing the same polynomial as sim(C). We abuse notation for convenience and refer to the circuit sim(C) as C.

Input: The circuit C|_V. Black box access to the polynomials computed by C|_{V^{{i}}} for each i ∈ [n − t].
Output: A circuit C′ of n inputs such that C′ ≡ C.

1. Let L_1, ..., L_r be the default basis of span_1(Lin(C|_V)). Find a circuit C̄ of r variables such that C̄(L_1, ..., L_r) = C|_V. Note that the equality is between the circuits and not just the polynomials they compute;
2. For each i ∈ [n − t], find a (super) set of elements of F, {β_{i,j}}_{j∈[r]}, for which C^i ≜ C̄(L_1 + β_{i,1}x_i, ..., L_r + β_{i,r}x_i) ≡ C|_{V^{{i}}} and C^i is a multilinear circuit. If such a set does not exist, output "fail";
3. Output C′ ≜ C̄(L_1 + Σ_{i∈[n−t]} β_{i,1}x_i, ..., L_r + Σ_{i∈[n−t]} β_{i,r}x_i);

Algorithm 8: Lifting a low rank multilinear circuit to F^n
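A toy instance of the brute force search in Step 2 of Algorithm 8. All parameters here are illustrative and not from the paper: the field is GF(5), r = 2, and C̄ is the single multiplication gate y_1·y_2. We search for β with C̄(L_1 + β_1·x_3, L_2 + β_2·x_3) equal to a hidden target restriction; since both sides have degree below |F|, checking equality on all of GF(5)^3 certifies the polynomial identity.

```python
from itertools import product

P = 5                                     # toy prime field GF(5)

def C_bar(y1, y2):                        # assumed toy circuit: one gate
    return (y1 * y2) % P

def target(x1, x2, x3):                   # hidden lift: (x1 + 2*x3) * x2
    return ((x1 + 2 * x3) * x2) % P

def lift_step2():
    """Find beta in GF(5)^2 with C_bar(x1 + b1*x3, x2 + b2*x3) == target everywhere."""
    for b1, b2 in product(range(P), repeat=2):
        if all(C_bar(x1 + b1 * x3, x2 + b2 * x3) == target(x1, x2, x3)
               for x1, x2, x3 in product(range(P), repeat=3)):
            return (b1, b2)
    return None

assert lift_step2() == (2, 0)
```

The |F|^r candidates times the cost of each identity check is exactly the |F|^r · 2^{O(t)} accounting used in Lemma 5.9 below.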

Lemma 5.9. Let V ⊆ F^n be an affine subspace of dimension t. Let C be an n-variate ΣΠΣ(k) multilinear circuit over F. Algorithm 8, given the corresponding inputs, runs in n · |F|^{O(t)} time. If V is a liftable ∆(C)-multilinear-rank-preserving subspace then it outputs a multilinear ΣΠΣ(k) circuit equivalent to C.

Proof. We first analyze the time complexity. In Step 2, in order to find a circuit of the wanted form, we simply go over all possibilities for {β_{i,j}}_{j∈[r]}. There are |F|^r such possible sets. For each option we perform a PIT test to check whether the found circuit computes the needed polynomial. We perform a deterministic PIT test in 2^{O(t)} time, since C is multilinear (see the Schwartz-Zippel lemma). We have that for each i ∈ [n − t] we "spend" |F|^r · 2^{O(t)} = |F|^{O(t)} time. Hence, the total running time of Step 2 is at most n · |F|^{O(t)}. Clearly, all other steps require poly(t, k) time and thus, since t ≥ k, the total running time is n · |F|^{O(t)}.

We now prove the correctness of the algorithm. We first prove that in Step 2, there exists a circuit of the form C^i for each i ∈ [n − t]. Since V is rank(C)-multilinear-rank-preserving for C, we have that rank(C|_V) = rank(C) = r. Let {L̄_j}_{j=1}^{r} be a basis of span_1(Lin(sim(C))). Since rank(C|_V) = rank(C), we have that the linear functions {L̄_j|_V}_{j=1}^{r} are linearly independent^H. We define {L̄_j}_{j=1}^{r} as the basis of span_1(Lin(sim(C))) such that the linear functions {L̄_j|_V}_{j=1}^{r} are the default basis of the subspace they span (that is, L̄_j|_V = L_j for each j ∈ [r]). Let C̃ be the r-variate circuit such that C̃(L̄_1, ..., L̄_r) = C. Then

C̃(L̄_1|_V, ..., L̄_r|_V) = C|_V = C̄(L_1, ..., L_r).

It follows that C̃ = C̄. Therefore, for each i ∈ [n − t], the circuit C|_{V^{{i}}} is a circuit of the form

C̄(L_1 + α_{i,1}x_i, ..., L_r + α_{i,r}x_i).

This means that in Step 2, we are guaranteed to find a circuit C^i for each i ∈ [n − t]. In the following lemma we prove that C′ ≡ C, thus proving the algorithm's correctness.

Lemma 5.10. Let {β_{i,j}}_{i∈[n−t], j∈[r]} be a (super) set of field elements such that for each i ∈ [n − t]:

C̄(L_1 + β_{i,1}x_i, ..., L_r + β_{i,r}x_i) ≡ C|_{V^{{i}}}.

Let {α_{i,j}}_{i∈[n−t], j∈[r]} be a (super) set of field elements such that

C = C̄(L_1 + Σ_{i∈[n−t]} α_{i,1}x_i, ..., L_r + Σ_{i∈[n−t]} α_{i,r}x_i).

Let A_1, A_2 ⊆ [n − t] be such that A_1 ∩ A_2 = ∅. Then

C̄(L_1 + Σ_{i∈A_1} β_{i,1}x_i + Σ_{i∈A_2} α_{i,1}x_i, ..., L_r + Σ_{i∈A_1} β_{i,r}x_i + Σ_{i∈A_2} α_{i,r}x_i) ≡ sim(C)|_{V^{A_1∪A_2}}.

Specifically,

C̄(L_1 + Σ_{i∈[n−t]} β_{i,1}x_i, ..., L_r + Σ_{i∈[n−t]} β_{i,r}x_i) ≡ C.

Proof. By induction on |A_1|. The base case where A_1 = ∅ is trivial. To prove the induction step we present a few definitions: For each i ∈ [n − t] let p_i(y_1, ..., y_r, x) be the polynomial computed by C̄(y_1 + α_{i,1}x, ..., y_r + α_{i,r}x), and let q_i(y_1, ..., y_r, x) be the polynomial computed by C̄(y_1 + β_{i,1}x, ..., y_r + β_{i,r}x). Since for each i ∈ [n − t] we have that L_1, ..., L_r, x_i are linearly independent^H linear functions, it follows that for any r + 1 indeterminates y_1, ..., y_r, x it holds that

qi(y1, . . . , yr, x) = pi(y1, . . . , yr, x).

Let m ∈ A_1 be some integer. For each j ∈ [r] define:

L̂_j ≜ L_j + Σ_{i∈A_2} α_{i,j}x_i + Σ_{i∈A_1\{m}} β_{i,j}x_i.

Then,

C̄(L_1 + Σ_{i∈A_1} β_{i,1}x_i + Σ_{i∈A_2} α_{i,1}x_i, ..., L_r + Σ_{i∈A_1} β_{i,r}x_i + Σ_{i∈A_2} α_{i,r}x_i) = C̄(L̂_1 + β_{m,1}x_m, ..., L̂_r + β_{m,r}x_m) ≡ q_m(L̂_1, ..., L̂_r, x_m) = p_m(L̂_1, ..., L̂_r, x_m) ≡ C̄(L̂_1 + α_{m,1}x_m, ..., L̂_r + α_{m,r}x_m) ≡ sim(C)|_{V^{A_1∪A_2}},

where the last equivalence follows from the induction hypothesis applied to the pair (A_1 \ {m}, A_2 ∪ {m}). This proves the lemma.

By setting A_1 = [n − t], A_2 = ∅ in the former lemma, we obtain the proof of Lemma 5.9. We conclude this section with Algorithm 9, which combines the results of this section. The algorithm works in the more general model where we are only given black boxes computing the restrictions of C.

Input: k, n ∈ N, B ⊆ [n], α ∈ F. For some ΣΠΣ(k) multilinear n-variate circuit C we receive black box access to the polynomials computed by C|_{V^A_{B,α}} for each A ⊆ [n] of size |A| ≤ 2.
Output: A set C of n-variate ΣΠΣ(k) multilinear circuits.

1. Define V ≜ V_{B,α} and t ≜ dim(V);
2. For each ΣΠΣ(k) multilinear circuit Ĉ of t inputs such that Ĉ ≡ C|_V, run Algorithms 7 and 8, where Ĉ is given as C|_V;
3. For each Ĉ for which Algorithms 7 and 8 did not fail, add their output to C;

Algorithm 9: Lifting a multilinear circuit to F^n

Theorem 5.11. Let k, n ∈ N. Let C be an n-variate ΣΠΣ(k) multilinear circuit. Let B ⊆ [n], α ∈ F and the corresponding restrictions of C be the input of Algorithm 9. Let t = |B|. Then Algorithm 9 runs in n^2 · |F|^{O(k·t)} time and outputs a set of size |F|^{O(k·t)}. If V_{B,α} is a liftable ∆(C)-multilinear-rank-preserving subspace for C then there exists at least one circuit C′ ∈ C such that C′ ≡ C.

Proof. Clearly, if VB,α is a liftable ∆(C)-multilinear-rank-preserving subspace for C, then when we 0 run Algorithms 7 and 8 on C|V then we get a circuit C computing the same polynomial as C. This proves the correctness of the algorithm. We now analyze the time complexity and the size of C. For each “guess” of C|V , according to 2 O(t) Lemmas 5.9 and 5.5, we require running time of n · |F| . For each such guess we increase the size of C by at most one. The question remaining is how many guesses for C|V do we have. Using combinatorial methods18, it can be shown that the number of multilinear multiplication gates is at 2t most |F| · exp(t). Since we have at most k multiplication gates, the number of possible ΣΠΣ(k) O(k·t) multilinear circuits is at most |F| . Hence, the total running time of the algorithm is

|F|^{O(k·t)} · |F|^{O(t)} · n^2 = n^2 · |F|^{O(k·t)}.

The maximum size of C is |F|^{O(k·t)}. This proves the theorem.

^{18} We bound the number of coefficients in a multiplication gate by 2t; this gives us |F|^{2t} options. To determine which indeterminates appear in the same linear functions, we require a partition of [t]. There are exp(t) possible partitions of [t].
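To make the partition count concrete: the number of partitions of [t] is the t-th Bell number, which the Bell triangle computes with O(t^2) additions. An illustrative sketch (the function name is ours, not from the paper):

```python
def bell(t):
    """Number of partitions of the set [t] = {1, ..., t} (the t-th Bell number),
    computed with the Bell triangle: each new row starts with the last entry of
    the previous row, and each further entry adds the entry above-left."""
    row = [1]
    for _ in range(t):
        prev, row = row, [row[-1]]
        for v in prev:
            row.append(row[-1] + v)
    return row[0]
```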

5.3 The Algorithm

We are now ready to present the main reconstruction algorithm for ΣΠΣ(k) multilinear circuits. Algorithm 10 receives a black box computing an n-variate multilinear ΣΠΣ(k) circuit over a field F and outputs a multilinear ΣΠΣ(k) circuit computing the same polynomial. We give a lemma analyzing the algorithm that leads to Theorem 2.

Input: k, n ∈ N. A black box computing an n-variate ΣΠΣ(k) multilinear circuit C.
Output: A multilinear ΣΠΣ(k) circuit C' such that C' ≡ C.
1 Let r_init ≜ R_M(2k), κ ≜ k^3, r' ≜ r_init · k^{3(k−2)}, and S ⊆ F be such that |S| = n^4·k^2 + 1;
2 foreach α ∈ S and B ⊆ [n] of size |B| = r' · 2k do
3 For each A ⊆ [n] define V^A_{B,α} as the affine subspace span_1{x_i | i ∈ B ∪ A} + (α, α^2, ..., α^n);
4 Define A = {A | A ⊆ [n]\B, |A| ≤ 2}. For each A ∈ A find a multilinear ΣΠΣ(k) circuit C^A such that C^A ≡ C|_{V^A_{B,α}};
5 For each such circuit run Algorithm 1 with inputs n, k, r_init, κ, and C^A. Denote the polynomials computed by the partition circuits as f^A_1, ..., f^A_{s_A};

6 If there exist two sets A_1, A_2 ∈ A such that s_{A_1} ≠ s_{A_2}, proceed to the next pair of B, α. Otherwise, denote by s the number of polynomials in each partition. Also, reorder the polynomials so that for each A_1, A_2 ∈ A and each j ∈ [s] it holds that f^{A_1}_j|_{V_{B,α}} = f^{A_2}_j|_{V_{B,α}};
7 foreach k_1, ..., k_s ∈ N^+ such that Σ_{i=1}^s k_i = k do
8 For each j ∈ [s], run Algorithm 9 with inputs k_j, B, α and {f^A_j}_{A∈A}. Denote the output set as C_{B,α,j};
9 For each (Ĉ_1, ..., Ĉ_s) ∈ C_{B,α,1} × ... × C_{B,α,s} check whether Σ_{j=1}^s Ĉ_j ≡ C. If so, output Σ_{j=1}^s Ĉ_j. If no such circuits exist, proceed to the next iteration;
10 end
11 end
Algorithm 10: Reconstruction of a multilinear ΣΠΣ(k) circuit
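Step 7 of Algorithm 10 ranges over all tuples of positive integers k_1, ..., k_s summing to k. There are (k−1 choose s−1) ≤ 2^{k−1} such compositions, consistent with the exp(k) bound on the number of inner-loop iterations used in the analysis of Lemma 5.12. An illustrative sketch of this enumeration (the function name is ours):

```python
def compositions(k, s):
    """Yield all tuples (k_1, ..., k_s) of positive integers with
    k_1 + ... + k_s = k, as ranged over by the inner loop of Algorithm 10."""
    if s == 1:
        yield (k,)
        return
    # the first part leaves at least s - 1 units for the remaining parts
    for first in range(1, k - s + 2):
        for rest in compositions(k - first, s - 1):
            yield (first,) + rest
```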

Lemma 5.12. Algorithm 10, given as input k, n and a black box computing an n-variate ΣΠΣ(k) multilinear circuit C, outputs a circuit C' computing the same polynomial as C in (n + |F|)^{2^{O(k^2)}} time.

Proof. We first prove the correctness of the algorithm. Before the algorithm outputs a circuit, it verifies that the output is legal (that is, we verify that C' ≡ C). Hence, it suffices to prove that in some iteration the algorithm outputs a circuit. According to Corollary 2.15, there exist a set B ⊆ [n] and a field element α ∈ S such that V_{B,α} is a liftable r'-multilinear-rank-preserving subspace for C. We focus on the iterations where such B, α are chosen. Let C_1, ..., C_s and r be the circuits of the partition and the integer outputted by Algorithm 1 when its input is the circuit C, r_init = R_M(2k) and κ = k^3. Let k'_1, ..., k'_s be the numbers of multiplication gates in C_1, ..., C_s (respectively).

Recall that the subspace V_{B,α} is a liftable r'-multilinear-rank-preserving subspace for C. Also, r_init ≥ R_M(2k), κ > k^2 and r = r_init · k^{(k−1)·⌈log_k(κ)⌉}. Hence, by Corollary 5.4 we have that for each A ∈ A it holds that s_A = s, and that for each set A there exists some reordering of the polynomials {f^A_j}_j such that f^A_j ≡ C_j|_{V^A_{B,α}}. In particular, this shows that Step 6 does not fail the algorithm. In Step 6 we reorder the polynomials so that w.l.o.g. it holds that f^A_j ≡ C_j|_{V^A_{B,α}} for each j ∈ [s].

In the inner loop, we focus on the iteration where k_1, ..., k_s are chosen as k'_1, ..., k'_s. By Theorem 5.11 we have that for each j, the set C_{B,α,j} contains a circuit C'_j such that C'_j ≡ C_j. Hence, one of the circuits we check is the circuit Σ_{j=1}^s C'_j. Clearly, Σ_{j=1}^s C'_j ≡ C. This proves the correctness of the algorithm.

We now analyze the time complexity of Algorithm 10. In Step 4, we find several circuits via brute-force techniques. As detailed in the proof of Theorem 5.11, the number of ΣΠΣ(k) multilinear (t + 2)-variate circuits over F is |F|^{O(k·t)}. For each such circuit, we perform a PIT check against |A| different multilinear O(t)-variate circuits. The time required for the PIT checks for each circuit is 2^{O(t)} · |A| = 2^{O(t)} · n^2. Hence, the total amount of time needed in Step 4 is n^2 · |F|^{O(kt)}. Step 5 requires, according to Lemma 5.1, O(log(κ) · t^3 k^3 · |A|) = n^2 · poly(t, k) time. Step 6 requires O(n^4 k^2) PIT checks of t-variate multilinear polynomials. Hence, it requires n^4 · 2^{O(t)} time (since t ≥ 2k).

The inner loop has exp(k) iterations. We analyze the running time of each iteration. Step 8 contains s ≤ k calls to Algorithm 9. This requires, according to Theorem 5.11, n^2 · |F|^{O(k·t)} running time. In Step 9 we perform, according to Theorem 5.11, (|F|^{O(kt)})^s = |F|^{O(k^2·t)} PIT tests. Each such test is between two n-variate ΣΠΣ(k) circuits. To do so deterministically we use an algorithm given in [KS08] that does so in n^{2^{O(k^2)}} time (see Lemma A.6 in Section A.4). Hence, Step 9 requires |F|^{O(k^2·t)} · n^{2^{O(k^2)}} time.

In total, each iteration of the main loop requires |F|^{O(k^2·t)} · n^{2^{O(k^2)}} time. The number of such iterations is at most |S| · n^{|B|} = |S| · n^t. Hence, the total running time of the algorithm is:

(n^4·k^2 + 1) · n^{k^{3(k−1)}·R_M(2k)·2k} · |F|^{O(k^{3(k−1)+2}·R_M(2k)·2k)} · n^{2^{O(k^2)}} = (n + |F|)^{2^{O(k^2)}}.

This proves Lemma 5.12.

References

[AV08] M. Agrawal and V. Vinay. Arithmetic circuits: A chasm at depth four. In Proceedings of the 49th Annual FOCS, pages 67–75, 2008.

[BB98] D. Bshouty and N. H. Bshouty. On interpolating arithmetic read-once formulas with ex- ponentiation. J. of Computer and System Sciences, 56(1):112–124, 1998.

[BBB+00] A. Beimel, F. Bergadano, N. H. Bshouty, E. Kushilevitz, and S. Varricchio. Learning functions represented as multiplicity automata. JACM, 47(3):506–530, 2000.

[BHH95] N. H. Bshouty, T. R. Hancock, and L. Hellerstein. Learning arithmetic read-once formulas. SIAM J. on Computing, 24(4):706–735, 1995.

[BOT88] M. Ben-Or and P. Tiwari. A deterministic algorithm for sparse multivariate polynomial interpolation. In Proceedings of the 20th Annual STOC, pages 301–309, 1988.

[DS06] Z. Dvir and A. Shpilka. Locally decodable codes with 2 queries and polynomial identity testing for depth 3 circuits. SIAM J. on Computing, 36(5):1404–1434, 2006.

[FK06] L. Fortnow and A. R. Klivans. Efficient learning algorithms yield circuit lower bounds. In Proceeding of the 19th annual COLT, pages 350–363, 2006.

[GK98] D. Grigoriev and M. Karpinski. An exponential lower bound for depth 3 arithmetic circuits. In Proceedings of the 30th Annual STOC, pages 577–582, 1998.

[GR00] D. Grigoriev and A. A. Razborov. Exponential complexity lower bounds for depth 3 arith- metic circuits in algebras of functions over finite fields. Applicable Algebra in Engineering, Communication and Computing, 10(6):465–487, 2000.

[HH91] T. R. Hancock and L. Hellerstein. Learning read-once formulas over fields and extended bases. In Proceedings of the 4th Annual COLT, pages 326–336, 1991.

[Kal85] E. Kaltofen. Polynomial-time reductions from multivariate to bi- and univariate integral polynomial factorization. SIAM J. on Computing, 14(2):469–489, 1985.

[Kal95] E. Kaltofen. Effective Noether irreducibility forms and applications. J. of Computer and System Sciences, 50(2):274–295, 1995.

[KM93] E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. SIAM J. Comput., 22(6):1331–1348, 1993.

[KS01] A. Klivans and D. Spielman. Randomness efficient identity testing of multivariate polyno- mials. In Proceedings of the 33rd Annual STOC, pages 216–223, 2001.

[KS06a] A. Klivans and A. Shpilka. Learning restricted models of arithmetic circuits. Theory of computing, 2(10):185–206, 2006.

[KS06b] A. R. Klivans and A. A. Sherstov. Cryptographic hardness for learning intersections of half- spaces. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 553–562, 2006.

[KS07] N. Kayal and N. Saxena. Polynomial identity testing for depth 3 circuits. Computational Complexity, 16(2):115–138, 2007.

[KS08] Z. Karnin and A. Shpilka. Deterministic black box polynomial identity testing of depth-3 arithmetic circuits with bounded top fan-in. In Proceedings of the 23rd Annual CCC, pages 280–291, 2008.

[KT90] E. Kaltofen and B. M. Trager. Computing with polynomials given by black boxes for their evaluations: Greatest common divisors, factorization, separation of numerators and denominators. J. of Symbolic Computation, 9(3):301–320, 1990.

[Raz08] R. Raz. Elusive functions and lower bounds for arithmetic circuits. In Proceedings of the 40th Annual ACM Symposium on Theory of Computing (STOC), pages 711–720, 2008.

[Sch80] J. T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. JACM, 27(4):701–717, 1980.

[Shp02] A. Shpilka. Affine projections of symmetric polynomials. J. of Computer and System Sciences, 65(4):639–659, 2002.

[Shp07] A. Shpilka. Interpolation of depth-3 arithmetic circuits with two multiplication gates. In Proceedings of the 39th Annual STOC, pages 284–293, 2007.

[SV08] A. Shpilka and I. Volkovich. Read-once polynomial identity testing. In Proceedings of the 40th Annual STOC, pages 507–516, 2008.

[SW01] A. Shpilka and A. Wigderson. Depth-3 arithmetic circuits over fields of characteristic zero. Computational Complexity, 10(1):1–27, 2001.

[VSBR83] L. G. Valiant, S. Skyum, S. Berkowitz, and C. Rackoff. Fast parallel computation of polynomials using few processors. SIAM J. on Computing, 12(4):641–644, November 1983.

[Zip79] R. Zippel. Probabilistic algorithms for sparse polynomials. In Symbolic and algebraic computation, pages 216–226. 1979.

A Toolbox

In this section we present several algorithms that are used throughout the paper. These algorithms are not specific to the paper's main topic and are therefore deferred to the appendix.

A.1 Finding Linear Factors

In several sections of the paper we encounter the following problem: given black-box oracle access to a circuit computing a t-variate polynomial f of degree d over a field F, we would like to find all the linear factors of f. To solve this problem we use an algorithm for black-box factoring of a multivariate polynomial devised in [Kal85, KT90, Kal95]. The algorithm outputs black boxes to the irreducible factors of the polynomial and their multiplicities. It requires that the field we are working over is of a large enough size (Ω(d^5)). We assume that if the field is not large enough, we are allowed to make queries from an extension field. The following lemma gives the requirements and results of the algorithm:

Lemma A.1. Let d, t be integers and f be a polynomial in t variables of degree d over the field F. Assume that |F| = Ω(d^5). Then there is a randomized algorithm that gets as input black-box access to f and the parameters t and d, and outputs, in poly(t, d, log |F|) time, with probability 1 − exp(−t),

black boxes to all the linear factors, with their multiplicities^{19}, of f. The algorithm uses O(t log |F|) random bits.

In order to find the linear factors of a given black-box polynomial, we check which factors are linear functions and reconstruct them. Namely, we interpolate each factor as a linear function (to interpolate a linear function in t indeterminates we query the polynomial at t + 1 different points). We then use a polynomial identity test to check whether the factor is identical to the linear function. The PIT we use is the "classic" randomized test based on the Schwartz-Zippel lemma [Sch80, Zip79], which requires O(t log(d)) random bits and one query. In our setting t is relatively small, so we can go over all choices of random bits to obtain a deterministic algorithm.
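The t + 1 queries suffice because a linear function is determined by its value at the origin and at the t unit vectors. A minimal sketch over a prime field F_p (the names are ours, not from the paper):

```python
def interpolate_linear(query, t, p):
    """Recover c_0 + c_1*x_1 + ... + c_t*x_t over F_p from t + 1 black-box
    queries: the origin yields the constant term c_0, and each unit vector
    e_i yields c_0 + c_i, hence c_i."""
    c0 = query([0] * t) % p
    coeffs = []
    for i in range(t):
        e = [0] * t
        e[i] = 1
        coeffs.append((query(e) - c0) % p)
    return c0, coeffs
```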

Lemma A.2. Let d, t be integers and f be a polynomial in t variables of degree d. Let F be a finite field such that |F| = Ω(d^5). Then there is a deterministic algorithm that gets as input black-box access to f and the parameters t and d, and outputs, in |F|^{O(t)} time, all the linear factors, with their multiplicities, of f.

We note that while the original algorithm of Lemma A.1 does not necessarily output all the correct multiplicities and all the correct linear functions (in case the characteristic of the field divides the multiplicity), there is an easy way of taking care of this when the linear factors are all that we care about. Indeed, there is a simple algorithm that, given black-box access to g^{p^i}, outputs both g and p^i, when g is a linear function, using queries from a polynomial-size extension field (the algorithm is randomized, but when t is small we can use its brute-force version).
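The identity tests discussed above can be sketched as follows: the randomized test compares the two polynomials at random points and relies on the Schwartz-Zippel bound, while for small t one can simply exhaust all of F_p^t to derandomize. The function names are ours:

```python
import random
from itertools import product

def schwartz_zippel_pit(f, g, t, p, trials=20):
    """Randomized identity test over F_p. If f != g as polynomials of degree
    <= d, a uniform point of F_p^t distinguishes them with probability at
    least 1 - d/p (Schwartz-Zippel), so any disagreement is a certain 'no'."""
    for _ in range(trials):
        x = [random.randrange(p) for _ in range(t)]
        if f(x) % p != g(x) % p:
            return False          # definitely distinct
    return True                   # identical with high probability

def deterministic_pit(f, g, t, p):
    """Derandomization for few variables: when p**t is small enough,
    simply compare f and g on every point of F_p^t."""
    return all(f(list(x)) % p == g(list(x)) % p
               for x in product(range(p), repeat=t))
```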

A.2 Brute Force Interpolation

Throughout the paper we construct circuits computing polynomials to which we have black-box access. In Definition 2.19 we presented the notion of the default circuit for a polynomial f. Algorithm 11 shows how to construct this circuit. Namely, we solve the following problem: let f be a t-variate polynomial over the field F of degree d. Given a black box computing f, we would like to construct the circuit C_f.

Algorithm 11 solves the stated problem. Since we use brute-force techniques, the running time of the algorithm is exponential in t. However, for our applications the number of indeterminates is relatively small. That is, t is substantially smaller than d and |F|. The methods we use take these relations into consideration (that is, the running time is exponential in t but not in |F| nor in d). We start by finding the linear factors of f using the methods described in Section A.1. We then construct sim(C_f) via brute-force interpolation of f/Lin(f) according to all possible bases for F^t.

Lemma A.3. Let f be a t-variate polynomial over a field F. Algorithm 11, given a black box holding f as input (as well as t and deg(f)), outputs the circuit C_f in |F|^{O(t^2)} time.

Proof. We first prove the correctness of the algorithm. The linear functions we find in Step 1 are the linear functions of Lin(f). By definition, gcd(C_f) = Lin(f), so the output of gcd(C_f) is correct. By the choice of r it follows that sim(C_f) is a polynomial in exactly r linear functions, and clearly the polynomial h(L_1, ..., L_r) equals sim(C_f). Since L_1, ..., L_r are the default basis, it follows that h(L_1, ..., L_r) ≡ sim(C_f).

We now analyze the running time of the algorithm. The first step finds all the linear factors of f. Lemma A.2 states that this can be done deterministically in |F|^{O(t)} time. In Step 2, at each iteration

^{19} In fact, the basic algorithm only guarantees that if p is the characteristic of the field and g^{p^i·e} is a factor of f (where e is not divisible by p), then the black box corresponding to g that is outputted by the algorithm holds g^{p^i}, and its multiplicity is e.

Input: t, d ∈ N and a black box holding a t-variate polynomial f of degree bounded by d.
Output: The circuit C_f = gcd(C_f) · sim(C_f).
1 Find the linear factors of f. Output their product as gcd(C_f);
2 foreach set of t linearly independent linear functions {L_i}_{i∈[t]} in the variables x̄ = (x_1, ..., x_t) do
3 Represent f/Lin(f) as a polynomial in the L_i's. That is, look for a polynomial h̃ such that h̃(L_1, ..., L_r) = f/Lin(f), for some r ≤ t;
4 end
5 For some set {L_i} for which the polynomial depends on a minimal number of linear functions^{20} (any such basis will do), denote the polynomial as ĥ and the r linear functions on which it depends as L̂_1, ..., L̂_r;
6 Let L_1, ..., L_r be the linear functions forming the default basis of span_1{L̂_1, ..., L̂_r}. Let h be the polynomial such that ĥ(L̂_1, ..., L̂_r) = h(L_1, ..., L_r);
7 Set sim(C_f) as h(L_1, ..., L_r);
Algorithm 11: Brute force interpolation

we interpolate a polynomial of degree bounded by d with t inputs. This requires^{21} d^{O(t)} time. The number of iterations is the number of bases for F^t; it can easily be shown that this number is |F|^{O(t^2)}. The time required by Step 6 is linear in the size of the description of the polynomial ĥ. Hence, it requires d^{O(t)} time. To conclude, assuming that |F| ≥ d, the total running time is |F|^{O(t^2)}.
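The interpolation in Step 2 can be sketched concretely over a prime field F_p, in the spirit of footnote 21: tabulate the black box on the grid {0, ..., d}^t, which takes (d + 1)^t queries, and rebuild the polynomial as a combination of products of univariate Lagrange basis polynomials. This is an illustrative sketch (the names are ours) and assumes individual degree at most d in each variable:

```python
from itertools import product

def lagrange_basis(nodes, j, x, p):
    """Evaluate at x the univariate Lagrange basis polynomial that equals 1 at
    nodes[j] and 0 at every other node, over F_p."""
    num, den = 1, 1
    for m, nm in enumerate(nodes):
        if m != j:
            num = num * (x - nm) % p
            den = den * (nodes[j] - nm) % p
    return num * pow(den, -1, p) % p

def interpolate(query, t, d, p):
    """Return an evaluator of the unique t-variate polynomial over F_p of
    individual degree <= d that agrees with the black box on {0, ..., d}^t."""
    nodes = list(range(d + 1))   # grid values double as their own indices
    table = {pt: query(list(pt)) % p for pt in product(nodes, repeat=t)}
    def evaluate(x):
        total = 0
        for pt, v in table.items():
            w = v
            for i in range(t):
                w = w * lagrange_basis(nodes, pt[i], x[i], p) % p
            total = (total + w) % p
        return total
    return evaluate
```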

A.3 Reconstructing Linear Functions

In [Shp07] a method for efficiently reconstructing ΣΠΣ(2) circuits was given. One of the algorithms presented there reconstructs a multiplication gate when given its restrictions to several co-dimension 1 subspaces on which it does not vanish. We use this algorithm in our paper for a similar purpose (we reconstruct the linear factors of a generalized multiplication gate). Parts 2 and 3 of Algorithm 6 in [Shp07] reconstruct a set of linear functions given their restrictions to several co-dimension 1 subspaces. The following lemma summarizes the results of the algorithm needed for our paper:

Lemma A.4 (implicit in [Shp07]). Let L be a (multi-)set containing d linear functions in n indeterminates. Let {ϕ_1, ..., ϕ_m} be a set of linearly independent linear functions such that m ≥ 100 log(d). For each j ∈ [m] define the (multi-)set

L_j ≜ {ℓ|_{ϕ_j=0} | ℓ ∈ L}.

Then there exists a deterministic algorithm that, given {L_j}_{j=1}^m, outputs L in poly(n, d) time.

A.4 A Deterministic PIT Algorithm for Depth-3 Circuits

In [KS08] a deterministic black-box PIT algorithm for ΣΠΣ(k, d, ρ) circuits was given. That is, [KS08] gives a deterministic algorithm (Algorithm 1 in [KS08]) that verifies, in quasi-polynomial time, whether a black-box ΣΠΣ(k, d, ρ) circuit C computes the zero polynomial. The following lemma gives the exact analysis of their algorithm.

^{21} In order to use a simple interpolation of a polynomial of t inputs and degree d we may use Lagrange's formula, which gives a description of the polynomial of size d^{O(t)}.

Lemma A.5 (Lemma 4.10 of [KS08]). Let C be an n-variate ΣΠΣ(k, d, ρ) circuit. Then there exists a deterministic algorithm that, given black-box access to C, verifies whether C computes the zero polynomial. The running time of the algorithm is

(kd choose 2) · ((R(k, d, ρ) + 2) choose 2) · (n + 2k + 1) · (d + 1)^{R(k,d,ρ)} = n · exp(2^{O(k^2)} · (log d)^{k−1} + kρ · log d).

When C is a multilinear ΣΠΣ(k) circuit, [KS08] obtain an even better result.

Lemma A.6 (Lemma 5.8 of [KS08]). Let C be an n-variate ΣΠΣ(k) multilinear circuit. Then there exists a deterministic algorithm that, given black-box access to C, verifies whether C computes the zero polynomial. The running time of the algorithm is n^{2^{O(k^2)}}.

B Proof of Lemma 4.9

In this section we give the proof of Lemma 4.9. The proof relies on the following lemma, which is a small generalization of Claim 4.8 of [DS06], regarding the rank of ΣΠΣ(k, d) circuits when restricted to many co-dimension 1 subspaces. Intuitively, the lemma shows that if L is a set of linear functions such that for every ϕ ∈ L the rank ∆(C) drops when C is restricted to the co-dimension 1 subspace defined by ϕ = 0, then the set L is not too large.

Lemma B.1. Let C = Σ_{i=1}^k M_i be a simple and minimal ΣΠΣ(k, d) circuit such that k ≥ 3, d ≥ 4.^{22} Let χ ∈ N^+ and L be a set of m linearly independent linear functions such that for some A ⊆ [k], the following holds for every ϕ ∈ L:

• For every i ∈ [k], it holds that i ∈ A if and only if ϕ ∉ M_i (in particular, ϕ is a linear factor of gcd(C_{[k]\A})).

• ∆(C_A|_{ϕ=0}) < ∆(C_A)/(2χ·log(d)).

Assume that ∆(C_A) ≥ 4k. Then

|L| = m ≤ max{2k log(dk) + 2k, ∆(C_A)/χ}.

The proof is almost identical to the proof of Claim 4.8 in [DS06] and is postponed to Appendix B.1. We continue with generalizations of the claim that are needed for the proof of Lemma 4.9.

Lemma B.2. Let C be an n-variate ΣΠΣ(k, d) circuit over the field F such that k ≥ 3, d ≥ 4. Let A ⊆ [k] and χ, r' ∈ N^+ be such that χ > 1 and 4k ≤ r' ≤ ∆(C_A). Let L be a set of m linearly independent linear functions. Assume that the following holds for every ϕ ∈ L:

• For every i ∈ [k], it holds that i ∈ A if and only if ϕ ∉ M_i (in particular, ϕ is a linear factor of gcd(C_{[k]\A})).

• ∆(C_A|_{ϕ=0}) < r'/(2χ·log(d)).

^{22} If d < 4 then the reconstruction problem is trivial using simple interpolation, so we assume that d ≥ 4. If k < 3 then the lemma is trivial, as it concerns the ∆ function of a non-empty subcircuit of C, which must consist of a single multiplication gate.

Then

|L| = m ≤ max{2k log(dk) + 2k, r'/χ}.

Proof. We prove the claim by induction on ∆(C_A) − r'. The basis of the induction is when ∆(C_A) − r' = 0, i.e. r' = ∆(C_A). In this case the claim follows directly from Lemma B.1. So assume that ∆(C_A) − r' > 0. Towards a contradiction, assume that m > max{2k log(dk) + 2k, r'/χ}. Note that as ∆(C_A) − r' > ∆(C_A) − (r' + 1) ≥ 0, the induction hypothesis holds for ∆(C_A) − (r' + 1), and so m ≤ max{2k log(dk) + 2k, (r' + 1)/χ}. In particular, this implies that m = r'/χ + 1. As m < r' ≤ ∆(C_A), there is some linear function in Lin(sim(C_A)) that is not in span_1(L). Assume w.l.o.g. that this linear function is x_1. We would now like to apply the induction hypothesis to the circuit C|_{x_1=α} for some α ∈ F (or for α from an extension field of F). Assume that for some α ∈ F, the following properties hold.

• The linear function x_1 − α does not appear in the circuit C. That is, x_1 − α ∉ Lin(C).

• For every ϕ ∈ L, the linear function ϕ|_{x_1=α} does not appear in any multiplication gate of C_A|_{x_1=α}. That is, ϕ|_{x_1=α} ∉ Lin(C_A|_{x_1=α}).

• The circuit sim(C_A)|_{x_1=α} is simple.

We now continue the proof assuming that we have an α satisfying the above properties; later we shall prove that such an α indeed exists (it will be a corollary of Lemma B.3). In order to apply the induction hypothesis to the circuit C|_{x_1=α}, we now show that the conditions of the lemma are satisfied.

Let C' ≜ C|_{x_1=α}. As x_1 − α ∉ Lin(C), we have that C' is a ΣΠΣ(k) circuit (and not a ΣΠΣ(k − 1) circuit, since no multiplication gate vanished). By the assumptions that sim(C_A)|_{x_1=α} is simple and that x_1 ∈ Lin(sim(C_A)), it follows that ∆(C_A|_{x_1=α}) = ∆(C_A) − 1. Let L' = {ϕ|_{x_1=α} | ϕ ∈ L}. From the assumption that x_1 ∉ span_1(L) we get that the linear functions in L' are linearly independent. Consider some ϕ ∈ L. By the choice of α we have that ϕ|_{x_1=α} ∉ M_i|_{x_1=α} for any i ∈ A. In addition, as ϕ ∈ M_i for i ∉ A, we have that ϕ|_{x_1=α} ∈ M_i|_{x_1=α} for all i ∉ A. Hence, the following holds for every ϕ' ∈ L':

• For each i ∈ [k], it holds that i ∈ A if and only if ϕ' ∉ M'_i (where M'_i is the i-th multiplication gate of C|_{x_1=α}).

• Let ϕ ∈ L be the linear function such that ϕ' = ϕ|_{x_1=α}. Then ∆((C|_{x_1=α})_A|_{ϕ'=0}) = ∆(C_A|_{ϕ=0, x_1=α}) ≤ ∆(C_A|_{ϕ=0}) < r'/(2χ·log(d)).

Since r' ≤ ∆(C_A) − 1 = ∆(C_A|_{x_1=α}), we may apply the induction hypothesis on C|_{x_1=α}, L', A, r' and χ to conclude^{23} that

m = |L| = |L'| ≤ max{2k log(dk) + 2k, r'/χ}.

However, this contradicts the assumption, made towards a contradiction, that m = r'/χ + 1. Therefore, in order to conclude the proof of Lemma B.2, we only need to prove the existence of an appropriate α ∈ F (or in some small extension field of F). The following lemma is stronger than what we currently need; however, we state it in its stronger form as it will be needed later.

Lemma B.3. Let C ≜ Σ_{i=1}^k C_{f_i} be an n-variate ΣΠΣ(k, d, ρ) circuit over a field F. Let Ĉ ≜ Σ_{i=1}^k gcd(C_{f_i}). Let V_0 ⊆ F^n be a subspace of co-dimension r̃. Let A ⊆ [k] and let L be a set of linearly independent linear functions with the following properties.

^{23} We did not make sure that deg(C|_{x_1=α}) ≥ 4. However, if this is not the case, we may add a new indeterminate y and work with the circuit C|_{x_1=α} · y^4 (that is, multiply every multiplication gate by y^4).

• For every ϕ ∈ L, we have that ϕ ∉ span_1(∪_{i=1}^k Lin(sim(C_{f_i}))).

• For every ϕ ∈ L and i ∈ [k], it holds that i ∈ A if and only if ϕ is not a factor of C_{f_i} (i.e. ϕ is a factor of gcd(Σ_{i∉A} C_{f_i})).

Then there exists an affine subspace V = V_0 + v_0 (V is a shift of V_0) with the following properties.

1. No linear function ℓ ∈ Lin(Ĉ) vanishes when restricted to V.

2. For every ϕ ∈ L and ℓ ∈ ∪_{i∈A} gcd(C_{f_i}) it holds that either ϕ|_V ≁ ℓ|_V or ϕ|_V is a constant function.

3. For every pair of linear functions ℓ_1 ≁ ℓ_2 ∈ Lin(Ĉ), we have that either ℓ_1|_V and ℓ_2|_V are constants or ℓ_1|_V ≁ ℓ_2|_V.

Proof. Let V' be a subspace of dimension r̃ such that F^n = V_0 ⊕ V'. Let {ψ_i}_{i∈[r̃]} be a basis for V'. For every ū = (u_1, ..., u_{r̃}) ∈ F^{r̃} let V_ū = V_0 + Σ_{i=1}^{r̃} u_i·ψ_i be the affine shift of V_0 corresponding to ū. We now show that there exists an r̃-variate polynomial p such that if V_ū does not satisfy the conclusion of the lemma then p(ū) = 0. The following lemma of [KS08] shows the existence of an r̃-variate polynomial possessing similar properties.

Lemma B.4 (implicit in the proof of Lemma 4.3 of [KS08]). Let L̃ be a set of n-variate linear functions over the field F. Let V ⊆ F^n be a subspace of co-dimension r̃. Let {V_ū}_{ū∈F^{r̃}} be the different shifts of V. Then there exists an r̃-variate non-zero polynomial h̃, of degree at most 2·(|L̃| choose 2) + |L̃|, such that for every ū ∈ F^{r̃}, if h̃(ū) ≠ 0 then:

1. For any two linear functions ℓ ≁ ℓ' ∈ L̃ we have that either ℓ|_{V_ū} ≁ ℓ'|_{V_ū} or both functions become constant when restricted to V_ū.

2. For any ℓ ∈ L̃ we have that ℓ|_{V_ū} ≠ 0.

We proceed with the proof of Lemma B.3. Let h̃ be the r̃-variate polynomial guaranteed by Lemma B.4 for L̃ ≜ Lin(Ĉ) ∪ L. Let h ≜ h̃ · Π_{i=1}^k sim(C_{f_i}). Let ū ∈ F^{r̃} be a vector such that h(ū) ≠ 0 (note that if |F| > deg(h) then such a ū exists; otherwise we move to a slightly larger extension field). We now show that for this ū, the affine space V_ū has the required properties. It is clear that V_ū satisfies Properties 1 and 3 of the lemma. In order to show that Property 2 is also satisfied, we assume w.l.o.g.^{24} that there is at least one linear function in L that is not constant on V_0. From the assumption of the lemma we get that every ϕ ∈ L and every ℓ ∈ ∪_{i∈A} gcd(C_i) satisfy ℓ ≁ ϕ.
Therefore, by the choice of ū we have that Property 2 also holds. This completes the proof of Lemma B.3.

To complete the proof of Lemma B.2, set r̃ = 1 and let V_0 be the co-dimension 1 subspace on which x_1 = 0. Thus, there exists a field element α such that V = V_0 + α·(1, 0, ..., 0) is the affine shift guaranteed by Lemma B.3. It is clear that restricting C to V amounts to substituting x_1 = α. We now notice that the three properties of Lemma B.3 that are satisfied by V imply the corresponding three properties that α should satisfy. This completes the proof of Lemma B.2.

The next lemma is a generalization of Lemma B.2 for ΣΠΣ(s, d, r) circuits.

^{24} Notice that if this is not the case then any V_ū satisfies the second requirement of the lemma.

Lemma B.5. Let s, d, r ∈ N^+ and let C ≜ Σ_{i=1}^s C_{f_i} be an n-variate ΣΠΣ(s, d, r) circuit of degree d over a field F. Let A ⊆ [s] and r', χ ∈ N^+ be such that χ > 1 and 4s ≤ r' ≤ ∆(C_A) − s·r. Let L be a set of m linearly independent linear functions. Assume that the following holds for every ϕ ∈ L:

• For every i ∈ [s], we have that i ∉ A if and only if ϕ divides f_i.

• ϕ ∉ span_1(∪_{i=1}^s Lin(sim(C_{f_i}))).

• ∆(C_A|_{ϕ=0}) < r'/(2χ·log(d)).

Then

m ≤ max{2s log(ds) + 2s, r'/χ} + r·s.

Proof. The proof is by reduction to Lemma B.2. The idea is to first restrict the circuit to a subspace on which all the non-linear terms of the multiplication gates are set to constants; after the restriction we are left with a ΣΠΣ(s, d) circuit. Then we use Lemma B.2 to get the required result. Let Ĉ = Σ_{i=1}^s gcd(C_{f_i}). Set

r̃ ≜ dim_1(span_1(∪_{i=1}^s Lin(sim(C_{f_i})))) ≤ s·r.

Assume w.l.o.g. that the linear functions in sim(C_{f_i}) are homogeneous, i.e. have no constant term. Let V_0 be the co-dimension r̃ subspace on which the linear functions of span_1(∪_{i=1}^s Lin(sim(C_{f_i}))) vanish. Similarly to the proof of Lemma B.3, we take V' to be a subspace of dimension r̃ such that F^n = V_0 ⊕ V'. Let {ψ_i}_{i∈[r̃]} be a basis for V'. For every ū = (u_1, ..., u_{r̃}) ∈ F^{r̃} let V_ū = V_0 + Σ_{i=1}^{r̃} u_i·ψ_i be the affine shift of V_0 corresponding to ū. We now show that there exists an affine subspace V, a shift of V_0, with the following properties:

• No linear function ℓ ∈ Lin(Ĉ) vanishes when restricted to V.

• For every ϕ ∈ L and ℓ ∈ Ĉ_A it holds that either ϕ|_V ≁ ℓ|_V or ϕ|_V is a constant function.

• For every pair of linear functions ℓ_1 ≁ ℓ_2 ∈ Lin(Ĉ), we have that either ℓ_1|_V and ℓ_2|_V are constants or ℓ_1|_V ≁ ℓ_2|_V.

• For every i ∈ [s] it holds that sim(C_{f_i})|_V ≠ 0.

Indeed, in the proof of Lemma B.3 we showed that there exists a non-zero r̃-variate polynomial h such that if h(ū) ≠ 0 then the affine space V_ū satisfies the first three requirements above (we use the notation of the proof of Lemma B.3). Consider the polynomial h'(ū) ≜ Π_{i=1}^s sim(C_{f_i})(ū). Clearly, if h'(ū) ≠ 0 then for every i ∈ [s] it holds that sim(C_{f_i})|_{V_ū} ≠ 0. Let ū be such that h(ū)·h'(ū) ≠ 0 (if |F| > deg(h) + deg(h') then such a ū exists; otherwise we pick it from an extension field of F). It follows that V ≜ V_ū is the required affine shift of V_0. We also note that as span_1(∪_{i=1}^s Lin(sim(C_{f_i}))) vanishes on V_0, for any ȳ ∈ V_ū it holds that sim(C_{f_i})(ȳ) = sim(C_{f_i})(ū) for every i ∈ [s]. In particular, for every i ∈ [s] we have that sim(C_{f_i})|_V is a nonzero constant. Let

L|_V ≜ {ϕ|_V | ϕ ∈ L}.

The linear functions of L|_V are not necessarily linearly independent. However,

dim_1(span_1(L|_V)) ≥ dim_1(span_1(L)) − (n − dim(V)) = m − r̃ ≥ m − s·r.

Let L' ⊆ L|_V be some set of linearly independent linear functions such that |L'| ≥ m − s·r. We would like to use the result of Lemma B.2 with regard to the set L', the circuit C|_V, the set A and the integers χ and r'. We now prove that the requirements of Lemma B.2 are satisfied. We first notice that C|_V is indeed an n-variate ΣΠΣ(s, d, 0) = ΣΠΣ(s, d) circuit (indeed, each sim(C_{f_i}) is a non-zero constant when restricted to V). We now prove that 4s ≤ r' ≤ ∆((C|_V)_A). Note that sim(C_A)|_V is a simple circuit, as otherwise there would exist two linear functions ℓ_1 ≁ ℓ_2 ∈ Lin(Ĉ) such that ℓ_1|_V is not a constant function and ℓ_1|_V ∼ ℓ_2|_V, in contradiction to the choice of V. Therefore we get that sim(C_A|_V) = sim(C_A)|_V. It follows that

∆(C_A|_V) = rank(sim(C_A|_V)) = rank(sim(C_A)|_V) ≥ rank(sim(C_A)) − (n − dim(V)) = ∆(C_A) − r̃ ≥ ∆(C_A) − r·s.

Hence,

4s ≤ r' ≤ ∆(C_A) − s·r ≤ ∆((C|_V)_A).

We now prove that L' satisfies the requirements of the lemma. Let ϕ' be some linear function in L'. Let ϕ ∈ L be such that ϕ|_V = ϕ'. Let i ∈ [s]. If i ∉ A then ϕ is a factor of f_i and thus ϕ' is a factor of f_i|_V. If i ∈ A then ϕ is not a factor of f_i. Hence, for every linear function ℓ ∈ gcd(C_{f_i}) we have that ℓ ≁ ϕ. It follows from the properties of V that since ϕ' is not constant, ϕ' = ϕ|_V ≁ ℓ|_V. Since gcd(C_{f_i})|_V = β · C_{f_i|_V} for some 0 ≠ β ∈ F (as the non-linear term of C_{f_i} is restricted to a non-zero constant on V), we have that ϕ' is not a factor of f_i|_V. To show that the second requirement of L' holds we notice that

∆((C|_V)_A|_{ϕ'=0}) ≤ ∆(C_A|_{ϕ=0}) < r'/(2χ·log(d)).

We thus have that the requirements of Lemma B.2 are satisfied and thus

m − r·s ≤ |L'| ≤ max{2s log(ds) + 2s, r'/χ}.

This completes the proof of Lemma B.5.

We now give the final lemma before the proof of Lemma 4.9. The difference from Lemma B.5 is that we only require that the rank of some subcircuit of CA is reduced when restricted to the subspace on which ϕ = 0.

Lemma B.6. Let d, r, s ∈ N^+ and let C ≜ Σ_{i=1}^s C_{f_i} be an n-variate ΣΠΣ(s, d, r) circuit over a field F. Let A ⊆ [s] be such that |A| ≥ 2. Let r', χ ∈ N^+ be such that χ > 1 and for every subset A' ⊆ A of size |A'| = 2 we have that 4s ≤ r' ≤ ∆(C_{A'}) − s·r. Let L be a set of m linearly independent linear functions. Assume that the following holds for every ϕ ∈ L:

• For every i ∈ [s], it holds that ϕ is a factor of f_i if and only if i ∉ A.

• ϕ ∉ span_1(∪_{i=1}^s Lin(sim(C_{f_i}))).

• There exists some subset A' ⊆ A of size |A'| = 2 such that ∆(C_{A'}|_{ϕ=0}) < r'/(2χ·log(d)).

Then

m ≤ (|A| choose 2) · (max{2s log(ds) + 2s, r'/χ} + r·s).

Proof. For each A′ ⊆ A define L_{A′} as the set of linear functions ϕ ∈ L for which

∆(C_{A′}|_{ϕ=0}) < r′ / (2χ·log(d)).

According to Lemma B.5, applied to the circuit C_{([s]\A)∪A′}, the set of linear functions L_{A′}, the subset A′, and the integers χ and r′, we have that

|L_{A′}| ≤ max{2s·log(ds) + 2s, r′/χ} + r·s.

Thus,

|L| ≤ Σ_{A′⊆A, |A′|=2} |L_{A′}| ≤ (|A| choose 2) · (max{2s·log(ds) + 2s, r′/χ} + r·s).

We now give the proof of Lemma 4.9. For convenience we repeat it:

Lemma. Let k, s, d, r ∈ ℕ be such that s ≤ k and let C ≜ Σ_{i=1}^s C_{f_i} be a ΣΠΣ(s, d, r) circuit. Let L̂ be a set of linearly independent linear functions and let A ⊆ [s] be a set of size |A| ≥ 2. Let χ, r′ ∈ ℕ⁺ be such that χ > 1 and r′ satisfies, for every A′ ⊆ A of size |A′| = 2, that r′ ≤ ∆(Σ_{i∈A′} C_{f_i}) − r·k. Assume that for each ϕ ∈ L̂ the following holds:

• For every i ∈ [s], ϕ is a factor of f_i if and only if i ∉ A.

• For every i ∈ [s], ϕ ∉ Lin(sim(C_{f_i})).

• There exist i ≠ j ∈ A such that ∆(C_{f_i}|_{ϕ=0}, C_{f_j}|_{ϕ=0}) < r′ / (2χ·log(d)).

Then

|L̂| ≤ (|A| choose 2) · (max{2k·log(dk) + 2k, r′/χ} + r·k).

Proof. Note that if for every i ∈ A and ϕ ∈ L̂ it holds that C_{(f_i|_{ϕ=0})} = (C_{f_i})|_{ϕ=0},²⁵ then the claim follows directly from Lemma B.6. So let i ∈ A and ϕ ∈ L̂. Let V be the co-dimension 1 subspace on which ϕ = 0. Since ϕ ∉ Lin(sim(C_{f_i})), we have that rank(sim(C_{f_i})|_V) = rank(sim(C_{f_i})). In particular, V is ∆(C_{f_i})-rank preserving for sim(C_{f_i}). Furthermore, as i ∈ A we have that ϕ is not a factor of f_i. Namely, f_i|_V ≠ 0. Hence, by Lemma 4.14 we have that C_{f_i}|_V = C_{f_i|_V}. This completes the proof of Lemma 4.9.

B.1 Proof of Lemma B.1

In this section we prove Lemma B.1. The following notation is required for the proof. The proof is almost identical to the proof of Claim 4.8 of [DS06]; however, in order to make the reading easier we give the complete proof here instead of pointing out the required changes.

²⁵ The first circuit is the default circuit for C_{(f_i|_{ϕ=0})} and the second circuit is the circuit resulting from the restriction of the default circuit of f_i to the subspace on which ϕ = 0.

B.1.1 Notations

For convenience, we refer to sim(C_A) as Ĉ and define k̂ ≜ |A| and r̂ ≜ rank(Ĉ). As in [DS06] we assume w.l.o.g.²⁶ that the linear functions appearing in the circuit are homogeneous, and that all the multiplication gates have the same degree, d̂. Let

Ĉ = Σ_{i=1}^{k̂} N̂_i = Σ_{i=1}^{k̂} Π_{j=1}^{d̂} L_{i,j},

where N̂₁, …, N̂_{k̂} are the multiplication gates of Ĉ. It will be very convenient to think of Ĉ, N̂₁, …, N̂_{k̂} as sets of indices. That is, we shall abuse notation and write

Ĉ = {(i, j) | i ∈ [k̂], j ∈ [d̂]},

N̂_i = {(i, j) | j ∈ [d̂]}.

For a set H ⊆ Ĉ, we denote by rank(H) the dimension of the vector space spanned by the linear functions whose indices appear in H. That is,

rank(H) ≜ dim(span₁{L_{i,j} : (i, j) ∈ H}).

For the rest of the proof we will treat subsets of Ĉ interchangeably as sets of indices and as (multi)sets of linear functions. We also assume w.l.o.g. that L = {x₁, …, x_m}.

We would next like to define, for each t ∈ [m], certain subsets of Ĉ that capture the structure of Ĉ|_{x_t=0}. Fix some t ∈ [m], and consider what happens to Ĉ when we set x_t to zero. The resulting circuit Ĉ|_{x_t=0} is generally not simple, and its indices can therefore be partitioned into two disjoint sets: a set containing the indices of the linear functions appearing in gcd(Ĉ|_{x_t=0}), and a set containing the indices of the remaining linear functions (these are the linear functions appearing in sim(Ĉ|_{x_t=0})). To be more precise, denote by δ_t the degree of gcd(Ĉ|_{x_t=0}). In every multiplication gate N̂_i there are δ_t linear functions such that the restriction of their product to the linear space defined by the equation x_t = 0 is equal to gcd(Ĉ|_{x_t=0}). Denote the set of indices of these functions by G_t^i, and let R_t^i ≜ N̂_i \ G_t^i be the set of indices of the remaining linear functions of this multiplication gate. We thus have (for some choice of constants {c_i}_{i=1}^{k̂})

sim(Ĉ|_{x_t=0}) = Σ_{i=1}^{k̂} c_i · Π_{(i,j)∈R_t^i} (L_{i,j}|_{x_t=0}),

and

∀i ∈ [k̂], gcd(Ĉ|_{x_t=0}) = Π_{(i,j)∈G_t^i} (L_{i,j}|_{x_t=0}).

We now define, for each t ∈ [m], the sets R_t ≜ ∪_{i=1}^{k̂} R_t^i and G_t ≜ ∪_{i=1}^{k̂} G_t^i. The following claim summarizes some facts that we will later need.

Claim B.7. For every t ∈ [m] :

1. R_t ∩ G_t = ∅.

²⁶ This can be done by adding another indeterminate, y, to the circuit, as is done in Lemma 3.5 of [DS06]. This will not have any effect on the claim or its proof, besides saving some additional notation.

2. Ĉ = R_t ∪ G_t.

3. |G_t^i| = |G_t^{i′}| for all i, i′.

4. |G_t| = k̂ · deg(gcd(Ĉ|_{x_t=0})) = k̂ · δ_t.

5. R_t contains the indices of the linear functions appearing in sim(Ĉ|_{x_t=0}).

6. r_t ≜ rank(sim(Ĉ|_{x_t=0})) = rank(R_t).

Proof. Items (1) and (2) follow directly from the definition of R_t^i and G_t^i, as R_t^i and G_t^i give a partition of the indices in N̂_i. Items (3) and (4) hold as the linear factors of gcd(Ĉ|_{x_t=0}) belong to all the multiplication gates. By definition, R_t^i is the set of linear functions in N̂_i that belong to sim(Ĉ|_{x_t=0}), which implies item (5). Finally, by definition r_t = rank(sim(Ĉ|_{x_t=0})), and by (5) we have that R_t is the set of linear functions appearing in sim(Ĉ|_{x_t=0}); item (6) follows.

B.1.2 The Proof

We finally give the proof of Lemma B.1. For the reader's convenience we restate it here, rephrased to fit the new notations.

Lemma. Let C = Σ_{i=1}^k M_i be a simple and minimal ΣΠΣ(k, d) circuit such that k ≥ 3, d ≥ 4. Let χ, m ∈ ℕ⁺ and let A ⊆ [k] be such that:

• For each i ∈ [k] and t ∈ [m], it holds that i ∈ A if and only if x_t ∉ M_i.

• For each t ∈ [m], it holds that r_t = rank(sim(C|_{x_t=0})) < r̂ / (2χ·log(d)).

Assume that r̂ ≥ 4k. Then

m ≤ max{2k·log(dk) + 2k, r̂/χ}.

As a first step we assume, for a contradiction, that

m > max{2k·log(dk) + 2k, r̂/χ}.

Having defined, for each t ∈ [m], the sets R_t and G_t, we would now like to show that there exists a 'small' (∼ log(d)) number of sets R_t such that their union covers almost all of Ĉ. As rank(Ĉ) is relatively high, while for each t the rank r_t = rank(R_t) is (assumed to be) relatively small, we will get a contradiction. We construct the cover step by step: in each step we find an index t ∈ [m] such that the set R_t covers at least half of the linear functions not yet covered. This idea is made precise by the following claim.
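As an aside, the covering scheme just described is a standard halving argument, and can be sketched in code. The following toy sketch is purely illustrative: the universe and the candidate sets are made-up stand-ins for the index set of Ĉ and for the sets R_t, and `greedy_halving_cover` is a hypothetical helper, not part of the paper. It shows why q greedy steps, each covering at least half of what remains, leave at most a 2^{−q} fraction uncovered.

```python
# Sketch of the halving cover argument of Claim B.8 (illustrative only:
# the universe stands in for the index set of the circuit, the candidate
# sets for the sets R_t; all data here is made up).

def greedy_halving_cover(universe, candidate_sets, q):
    """Pick q sets, each covering at least half of the uncovered part.

    Assumes that at every step some candidate covers half of what remains,
    which is what the claims of the paper guarantee in their setting.
    """
    uncovered = set(universe)
    chosen = []
    for _ in range(q):
        best = max(candidate_sets, key=lambda s: len(uncovered & s))
        assert len(uncovered & best) * 2 >= len(uncovered), \
            "halving assumption violated"
        chosen.append(best)
        uncovered -= best
    return chosen, uncovered

universe = range(64)
# Candidates engineered so that a halving choice always exists.
candidates = [set(range(0, 32)), set(range(32, 48)), set(range(48, 56)),
              set(range(56, 60)), set(range(60, 62)), set(range(62, 64))]
chosen, uncovered = greedy_halving_cover(universe, candidates, q=3)
# After q steps at most |universe| * 2^(-q) elements remain uncovered.
assert len(uncovered) <= 64 * 2 ** -3
```

Each step halves the uncovered part, so after q steps at most |universe|·2^{−q} indices remain, matching the bound in Claim B.8 below.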

Claim B.8. For every integer 1 ≤ q ≤ log(d̂) there exist q indices t₁, …, t_q ∈ [m] for which

|∪_{s=1}^q R_{t_s}| ≥ k̂d̂(1 − 2^{−q}).

Proof. By induction on q.

Base case q = 1: In order to prove the claim for q = 1, it is sufficient to show that there exists t ∈ [m] for which |R_t| ≥ k̂d̂/2. Suppose, on the contrary, that for all t ∈ [m], |R_t| < k̂d̂/2. Claim B.7 implies that for all t ∈ [m], |G_t| ≥ k̂d̂/2 (as |G_t| + |R_t| = k̂·d̂). This in turn implies (by item (3) of Claim B.7) that for all t ∈ [m]

|G_t^1| ≥ d̂/2.  (9)

The next lemma, Lemma B.9, is more general than what is required at this point; however, we will need it in its full generality when we handle q > 1.

Lemma B.9. Let C be a simple ΣΠΣ(k, d) circuit with n inputs. Let t ∈ [n] and i₀ ∈ [k]. Denote

δ_t = deg(gcd(C|_{x_t=0})). Suppose that the linear functions in N_{i₀} are ordered such that

gcd(C|_{x_t=0}) = (L_{i₀,1}|_{x_t=0})(x) · (L_{i₀,2}|_{x_t=0})(x) · … · (L_{i₀,δ_t}|_{x_t=0})(x).

Then there exists a matching M = {P₁, …, P_{δ_t}} ⊆ C × C, consisting of δ_t disjoint pairs of linear functions, such that for each j ∈ [δ_t]:

• The two linear functions in P_j span x_t.

• The first element of P_j is L_{i₀,j}.

Proof. We can reorder the linear functions in each gate N_i, i ≠ i₀, such that

∀j ∈ [δ_t]: L_{1,j}|_{x_t=0} ∼ L_{2,j}|_{x_t=0} ∼ … ∼ L_{i₀,j}|_{x_t=0} ∼ … ∼ L_{k,j}|_{x_t=0}.

As C is simple, it cannot be the case that, for some j, L_{i₀,j} appears in all the multiplication gates.

Therefore, for every j ∈ [δ_t] there exists an index α(j) ∈ [k] such that L_{i₀,j} ≁ L_{α(j),j}. It follows that

∀j ∈ [δ_t]: x_t ∈ span₁{L_{i₀,j}, L_{α(j),j}}.

For each j ∈ [δ_t] let P_j = (L_{i₀,j}, L_{α(j),j}), and set M = {P₁, …, P_{δ_t}}. It is clear that each P_j satisfies the two conditions of the lemma, and that the P_j's are disjoint.

We continue with the proof of Claim B.8. From Equation (9) and Lemma B.9 we conclude that for each t ∈ [m] there exists a matching M_t ⊂ C × C, containing at least d̂/2 disjoint pairs of linear functions, such that every pair in M_t spans x_t.

We now present a lemma, initially given in [DS06]. It is in fact related to a lower bound on the size of 2-query locally decodable codes; as the topic of locally decodable codes does not directly relate to this paper, we do not elaborate on it any further.

Lemma B.10 (Corollary 2.9 of [DS06]). Let F be any field, and let a₁, …, a_m ∈ Fⁿ. For every i ∈ [n] let M_i ⊂ [m] × [m] be a set of disjoint pairs of indices (j₁, j₂) such that e_i ∈ span₁{a_{j₁}, a_{j₂}}. Then

Σ_{i=1}^n |M_i| ≤ m·log(m) + m.
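To make the statement of Lemma B.10 concrete, here is a small numerical sanity check on a made-up instance: four vectors in ℚ², and for each of the two coordinates a set of disjoint pairs whose span contains the corresponding standard basis vector. The data is hypothetical and the floating-point rank computation merely illustrates the inequality; it is not part of the proof.

```python
import math
import numpy as np

# Sanity check of the counting bound of Lemma B.10 (Corollary 2.9 of
# [DS06]) on made-up data: a_1..a_4 in Q^2, and for each coordinate i a
# set M_i of disjoint pairs whose span contains the basis vector e_i.
a = [np.array([1, 0]), np.array([0, 1]),
     np.array([1, 1]), np.array([1, -1])]
m, n = len(a), 2
matchings = {0: [(0, 1), (2, 3)],   # pairs spanning e_0 = (1, 0)
             1: [(0, 1), (2, 3)]}   # pairs spanning e_1 = (0, 1)

for i, pairs in matchings.items():
    e_i = np.eye(n)[i]
    for (j1, j2) in pairs:
        # e_i lies in span{a_j1, a_j2} iff adding it keeps the rank equal.
        base = np.stack([a[j1], a[j2]])
        assert np.linalg.matrix_rank(np.vstack([base, e_i])) == \
               np.linalg.matrix_rank(base)

total = sum(len(p) for p in matchings.values())
assert total <= m * math.log2(m) + m   # the bound of Lemma B.10
```

Here the total number of pairs is 4, while the bound m·log(m) + m evaluates to 12, so the inequality holds with room to spare; the lemma says this cannot fail for any choice of vectors and matchings.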

We proceed with the proof of Claim B.8. From Lemma B.10 we get that

(d̂/2)·m ≤ Σ_{t=1}^m |M_t| ≤ k̂d̂·log(k̂d̂) + k̂d̂,

which gives

m ≤ 2k̂·log(k̂d̂) + 2k̂ < 2k·log(kd) + 2k.

This contradicts the assumption that m > 2k·log(dk) + 2k.

Induction step: Let us now assume that we have found q − 1 indices t₁, …, t_{q−1} ∈ [m] for which

|∪_{s=1}^{q−1} R_{t_s}| ≥ k̂d̂(1 − 2^{−(q−1)}).

Let

R ≜ ∪_{s=1}^{q−1} R_{t_s},  (10)

S ≜ Ĉ \ R.  (11)

Then, by our assumption,

|S| ≤ k̂d̂·2^{−(q−1)}.  (12)

The proof goes along the same lines as the proof for q = 1: we show that there exists an index t ∈ [m] such that R_t covers at least half of S. We will argue that if such an index does not exist, then a contradiction to our assumption can be derived. Our main tools in doing so are Lemma B.9 and Lemma B.10.

Claim B.11. There exists t ∈ [m] such that for all i ∈ [k̂],

|G_t^i ∩ S| < d̂·2^{−q}.

Roughly, the claim states that there exists some variable x_t such that most of the linear functions in S do not belong to gcd(Ĉ|_{x_t=0}). In particular, it implies that R_t covers a large fraction of S, as needed.

Proof. Assume, on the contrary, that for every t ∈ [m] there exists i_t ∈ [k̂] for which

|G_t^{i_t} ∩ S| ≥ d̂·2^{−q}.

From Lemma B.9 we get that, for every t ∈ [m], there exists a matching M_t consisting of d̂·2^{−q} disjoint pairs of linear functions, such that each pair spans x_t and the first element of each pair is in G_t^{i_t} ∩ S (from the lemma we actually get that M_t contains deg(gcd(Ĉ|_{x_t=0})) many pairs, but we are only interested in the pairs whose first element is in G_t^{i_t} ∩ S). We would now like to apply Lemma B.10 to the matchings {M_t}_{t∈[m]}; however, we also need all the linear functions in all the matchings to belong to S. We achieve this by projecting all the functions in R to zero. As the dimension of the linear functions in R is small (by our assumption that each r_{t_s} is small), we can find a linear transformation that sends the linear functions in R to zero but leaves many of the linear functions {x_t} linearly independent. This is formalized in the next claim.

Claim B.12. There exists a subset B ⊂ [m], of size |B| = m − rank(R) ≥ m/2, and a linear transformation π : Fⁿ → Fⁿ such that

• ker(π) = span1(R).

• ∀t ∈ B , π(xt) = xt.

Proof. Calculating, we get that

rank(R) = rank(∪_{s=1}^{q−1} R_{t_s}) ≤ Σ_{s=1}^{q−1} rank(R_{t_s}) = Σ_{s=1}^{q−1} r_{t_s} ≤ (q − 1) · r̂/(2χ·log(d)) ≤ r̂/(2χ) ≤ m/2,  (13)

where the second inequality follows from the assumptions of Lemma B.1, the third inequality follows from the fact that q ≤ log(d̂) ≤ log(d), and the last inequality follows from our (contradiction) assumption that m > r̂/χ. Let m′ = m − rank(R). From Equation (13) we get that m′ ≥ m/2. In particular, there exists a subset B ⊂ [m], of size |B| = m′, such that

span₁({x_t | t ∈ B}) ∩ span₁(R) = {0}.

Hence, there exists a linear transformation π : Fⁿ → Fⁿ such that

• ker(π) = span1(R).

• ∀t ∈ B , π(xt) = xt. This completes the proof of Claim B.12.
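The construction of π in Claim B.12 is elementary linear algebra: prescribe π on a basis consisting of a basis of span₁(R) together with the variables indexed by B. The following toy sketch (made-up data over the reals; the proof itself works over an arbitrary field, and this numerical check is only an illustration) builds such a π explicitly.

```python
import numpy as np

# Sketch of the projection pi of Claim B.12 on a toy instance (all data
# made up): R spans the line through (1, 1, 0), and B = {2, 3} indexes
# variables x_2, x_3, whose span meets span(R) only in 0.
kernel_basis = np.array([[1.0, 1.0, 0.0]])        # basis of span(R)
fixed = np.array([[0.0, 1.0, 0.0],                # e_2
                  [0.0, 0.0, 1.0]])               # e_3

# Define pi on the basis (kernel_basis, fixed): kernel vectors map to 0,
# the fixed vectors map to themselves; pi is the induced linear map.
basis = np.vstack([kernel_basis, fixed]).T         # columns: a basis of R^3
images = np.vstack([np.zeros_like(kernel_basis), fixed]).T
pi = images @ np.linalg.inv(basis)

assert np.allclose(pi @ kernel_basis[0], 0)        # ker(pi) contains span(R)
for e in fixed:
    assert np.allclose(pi @ e, e)                  # pi fixes x_t for t in B
```

Prescribing the images on a full basis and composing with the inverse change-of-basis matrix is exactly the existence argument used in the claim, made computational.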

Let B be the set obtained from the claim above, and let π be the corresponding linear transformation. We assume, w.l.o.g., that B = [m′]. From here on, we only consider variables x_t such that t ∈ [m′] (i.e. t ∈ B). Fix such a t ∈ [m′], and let M_t′ = π(M_t). In other words, M_t′ = {(π(L), π(L′)) : (L, L′) ∈ M_t}. Clearly,

|M_t′| = |M_t| ≥ d̂·2^{−q}.  (14)

Note that the pairs in M_t′ still span x_t, as for any pair (L, L′) ∈ M_t with x_t = αL + βL′, we have that x_t = π(x_t) = π(αL + βL′) = απ(L) + βπ(L′). Since all the linear functions appearing in R were projected to zero, we know that all the pairs in each M_t′ are contained in the multiset S′ ≜ {π(L) : L ∈ S}.²⁷ After this long preparation we apply Lemma B.10 to the matchings M_t′ and derive the following inequality:

Σ_{t=1}^{m′} |M_t′| ≤ |S′|·log(|S′|) + |S′|.  (15)

As |S′| = |S| (remember that S′ is a multiset), we get by Equation (12) that

|S′| ≤ k̂d̂·2^{−(q−1)}.  (16)

By Equations (14), (15) and (16), it follows that

(m/2)·(d̂·2^{−q}) ≤ m′·(d̂·2^{−q}) ≤ Σ_{t=1}^{m′} |M_t′| ≤ |S′|·log(|S′|) + |S′| ≤ k̂d̂·2^{−(q−1)}·log(k̂d̂·2^{−(q−1)}) + k̂d̂·2^{−(q−1)}.

Hence,

m ≤ k̂·log(k̂d̂·2^{−(q−1)}) + k̂ < k̂·log(k̂d̂) + k̂,

which contradicts our (contradiction) assumption. This completes the proof of Claim B.11.

²⁷ Note that we can replace each pair in M_t′ that contains the zero vector with a singleton.

Let us now proceed with the proof of Claim B.8. Take t_q to be the index guaranteed by Claim B.11. We have that

∀i ∈ [k̂]: |G_{t_q}^i ∩ S| < d̂·2^{−q}.

In particular,

|G_{t_q} ∩ S| < k̂d̂·2^{−q}.

Notice that by Equations (10) and (11), and by the fact that R_{t_q} and G_{t_q} give a partition of Ĉ, the complement of ∪_{s=1}^q R_{t_s} is exactly G_{t_q} ∩ S. From this we get that adding R_{t_q} to R gives

|∪_{s=1}^q R_{t_s}| ≥ k̂d̂(1 − 2^{−q}).

This completes the proof of Claim B.8.

Having proved Claim B.8, we are now just steps away from completing the proof of Lemma B.1. Taking q to be ⌊log(d̂)⌋ in Claim B.8, we get that there exist indices t₁, …, t_{⌊log(d̂)⌋} ∈ [m] such that

|∪_{s=1}^{⌊log(d̂)⌋} R_{t_s}| ≥ k̂d̂ − 2k̂.

Thus,

r̂ − 2k̂ ≤ rank(∪_{s=1}^{⌊log(d̂)⌋} R_{t_s}) ≤ Σ_{s=1}^{⌊log(d̂)⌋} r_{t_s}.

The last inequality tells us that there exists some t ∈ [m] for which

r_t ≥ (r̂ − 2k̂)/⌊log(d̂)⌋ ≥ (r̂ − 2k̂)/log(d).  (17)

Since r̂ ≥ 4k and χ ≥ 1, we have that

r_t ≥ (r̂ − 2k)/log(d) ≥ r̂/(2·log(d)) ≥ r̂/(2χ·log(d)).

This is in contradiction to the assumption in the statement of the lemma. This completes the proof of Lemma B.1.
