Formalizing Randomized Matching Algorithms

Dai Tri Man Leˆ and Stephen A. Cook Department of Computer Science, University of Toronto {ledt, sacook}@cs.toronto.edu

Abstract—Using Jerˇabek’s´ framework for probabilistic rea- set to another. Using this method, Jerˇabek´ developed tools for soning, we formalize the correctness of two fundamental RNC2 describing algorithms in ZPP and RP. He also showed in [13], algorithms for bipartite within the theory VPV [14] that the theory VPV+sWPHP(L ) is powerful enough to for polytime reasoning. The first algorithm is for testing if a FP has a perfect matching, and is based on the formalize proofs of very sophisticated derandomization results, Schwartz-Zippel Lemma for polynomial identity testing applied e.g. the Nisan-Wigderson theorem [23] and the Impagliazzo- to the Edmonds polynomial of the graph. The second algorithm, Wigderson theorem [12]. (Note that Jerˇabek´ actually used the due to Mulmuley, Vazirani and Vazirani, is for finding a perfect single-sorted theory PV1+sWPHP(PV), but these two theories matching, where the key ingredient of this algorithm is the are isomorphic.) Isolating Lemma. In [15], Jerˇabek´ developed an even more systematic ap- proach by showing that for any bounded P/poly definable set, I.INTRODUCTION there exists a suitable pair of surjective “counting functions” 1 There is a substantial literature on theories such as PV, S2 , which can approximate the cardinality of the set up to a poly- VPV, V 1 which capture polynomial time reasoning [8], [4], nomially small error. From this and other results he argued [17], [7]. These theories prove the existence of polynomial convincingly that VPV + sWPHP(LFP) is the “right” theory time functions, and in many cases they can prove properties for reasoning about probabilistic polynomial time algorithms. of these functions that assert the correctness of the algorithms However so far no one has used his framework for feasible computing these functions. But in general these theories cannot reasoning about specific interesting randomized algorithms in prove the existence of probabilistic polynomial time relations classes such as RP and RNC2. such at those in ZPP, RP, BPP because defining the relevant In the present paper we analyze (in VPV) two such algo- probabilities involves defining cardinalities of exponentially rithms using Jerˇabek’s´ framework. The first one is the RNC2 large sets. Of course stronger theories, those which can define algorithm for determining whether a bipartite graph has a #P or PSPACE functions can treat these probabilities, but perfect matching, based on the Schwartz-Zippel Lemma [26], such theories are too powerful to capture the spirit of feasible [33] for polynomial identity testing applied to the Edmonds reasoning. polynomial [9] associated with the graph. The second algo- Note that we cannot hope to find theories that exactly cap- rithm, due to Mulmuley, Vazirani and Vazirani [21], is in the ture probabilistic complexity classes such as ZPP, RP, BPP function class associated with RNC2, and uses the Isolating because these are ‘semantic’ classes which we suppose are not Lemma to find such a perfect matching when it exists. Prov- recursively enumerable (cf. [29]). Nevertheless there has been ing correctness of these algorithms involves proving that the significant progress toward developing tools in weak theories probability of error is bounded above by 1/2. We formulate that might be used to describe some of the algorithms in these this assertion in a way suggested by Jerˇabek’s´ framework (see classes. Definition 1). This involves defining polynomial time functions Paris, Wilkie and Woods [24] and Pudlak´ [25] observed that from {0, 1}n onto {0, 1} × Φ(n), where Φ(n) is the set of we can simulate approximate counting in bounded arithmetic random bit strings of length n which cause an error in the by applying variants of the weak pigeonhole principle. It seems computation. We then show that VPV proves that the function unlikely that any of these variants can be proven in the the- is a surjection. ories for polynomial time, but they can be proven in Buss’s Our proofs are carried out in the theory VPV for polyno- theory S2 for the polynomial hierarchy. The first connection mial time reasoning, without the weak surjective pigeonhole between the weak pigeonhole principle and randomized algo- principle sWPHP(LFP). Jerˇabek´ used the sWPHP(LFP) prin- rithms was noticed by Wilkie (cf. [17]), who showed that ran- ciple to prove theorems justifying the above definition of error b 1 domized polytime functions witness Σ1-consequences of S2 + probability, but we do not need it to apply his definition. B 1 sWPHP(PV) (i.e., Σ1 -consequences of V + sWPHP(LFP) Many proofs concerning determinants are based on the La- in our two-sorted framework), where sWPHP(PV) denotes the grange expansion. Since our proofs in VPV can only use poly- surjective weak pigeonhole principle for all PV functions (i.e. nomial time concepts, we cannot formalize such proofs, and polytime functions). must use other techniques. In the same vein, the standard proof Building on these early results, Jerˇabek´ [13] showed that of the Schwartz-Zippel Lemma assumes that a multivariate we can “compare” the sizes of two bounded P/poly definable polynomial given by an arithmetic circuit can be expanded to sets within VPV by constructing a surjective mapping from one a sum of monomials. But this sum in general has exponentially B many terms so again we cannot directly formalize this proof function is Σ0 -definable from L if it is p-bounded and its B in VPV. graph is represented by a Σ0 (L)-formula. The theory V 0 for AC0 is the basis to develop theories II.PRELIMINARIES for small complexity classes within P in [7]. The theory V 0 2 A. Basic bounded arithmetic consists of the vocabulary LA and axiomatized by the sets of 2-BASIC axioms, which express basic properties of symbols The theory VPV for polynomial time reasoning used here 2 in LA, together with the comprehension axiom schema is a two-sorted theory described by Cook and Nguyen [7]. B 2  The two-sorted language has variable x, y, z, . . . ranging over Σ0 (LA)-COMP: ∃X ≤ y ∀z < y X(z) ↔ ϕ(z) , N and variables X,Y,Z,... ranging over finite subsets of N, where ϕ ∈ ΣB(L2 ) and X does not occur free in ϕ. 2 0 A interpreted as bit strings. Two sorted vocabulary LA includes In [7, Chapter 5], it was shown that V 0 is finitely axioma- the usual symbols 0, 1, +, ·, =, ≤ for arithmetic over N, the tizable and a p-bounded function is in FAC0 iff it is provably length function |X| for strings, the set membership relation total in V 0. A universally-axiomatized conservative extension ∈, and string equality =2 (subscript 2 is usually omitted). We V 0 of V 0 was also obtained by introducing function symbols will use the notation X(t) for t ∈ X, and think of X(t) as and their defining axioms for all FAC0 functions. th the t bit in the string X. In [7, Chapter 9], Cook and Nguyen showed how to asso- 2 The number terms in the base language LA are built from ciate a theory VC to each complexity class C ⊆ P, where VC the constants 0, 1, variables x, y, z, . . . and length terms |X| extends V 0 with an additional axiom asserting the existence using + and ·. The only string terms are string variables, but of a solution to a complete problem for C. General techniques 2 when we extend LA by adding string-valued functions, other are also presented for defining a universally-axiomatized con- string terms will be built as usual. The atomic formulas are servative extension VC of VC which has function symbols t = u, X = Y , t ≤ u, t ∈ X for any number terms x, y and defining axioms for every function in FC , and VC ad- and string variables X,Y . Formulas are built from atomic mits induction on open formulas in this enriched vocabulary. formulas using ∧, ∨, ¬ and ∃x, ∃X, ∀x, ∀X. Bounded number It follows from Herbrand’s Theorem that the provably-total quantifiers are defined as usual, and bounded string quantifier functions in VC (and hence in VC ) are precisely the func- ∃X ≤ t, ϕ stands for ∃X(|X| ≤ t ∧ ϕ) and ∀X ≤ t, ϕ stands tions in FC . Using this framework, Cook and Nguyen defined for ∀X(|X| ≤ t → ϕ), where X does not appear in term t. explicitly theories for various complexity classes within P. B 2 Σ0 is the class of all LA-formulas with no string quantifiers Since we need some basic linear algebra in this paper, we B and only bounded number quantifiers. Σ1 -formulas are those are interested in the two-sorted theory V #L and its univer- ~ ~ B of the form ∃X < t ϕ, where ϕ ∈ Σ0 and the prefix of sal conservative extension V #L from [6]. Recall that #L the bounded quantifiers might be empty. These classes are is usually defined as the class of functions f such that for B B extended to Σi (and Πi ) for all i ≥ 0, in the usual way. some nondeterministic logspace Turing machine M, f(x) is Two-sorted complexity classes contain relations R(~x, X~ ), the number of accepting computations of M on input x. Since where ~x are number arguments and X~ are string arguments. counting the number of accepting paths of nondeterministic In defining complexity classes using machines or circuits, the logspace is AC0-equivalent to powering, V #L was number arguments are represented in unary notation and the defined to be the extension of the base theory V 0 with an string arguments are represented in binary. The string argu- additional axiom stating the existence of powers Ak for every 0 ments are the main inputs, and the number arguments are matrix A over Z. The closure of #L under AC -reductions auxiliary inputs that can be used to index the bits of strings. is called DET. It turns out that computing the determinant In the two sorted setting, we can define AC0 to be the of integer matrices is complete for DET under AC0- reduc- class of relations R(~x, X~ ) such that some alternating Turing tions. In fact Berkowitz’s algorithm can be used to reduce the machine accepts R in time O(log n) with a constant number determinant to matrix powering. Moreover, V #L proves that of alternations. Then from the descriptive complexity charac- the function Det, which computes the determinant of integer terization of AC0, it can be shown that a relation R(~x, X~ ) is matrices based on Berkowitz’s algorithm, is in the language of 0 B ~ in AC iff it is represented by some Σ0 -formula ϕ(~x, X). V #L. Unfortunately it is an open question whether the theory Given a class of relations C, we associate a class FC of V #L also proves the cofactor expansion formula and other string-valued functions F (~x, X~ ) and number functions f(~x, X~ ) basic properties of determinants. However from results in [28] with C as follows. We require that these functions to be p- it follows that V #L proves that the usual properties of de- bounded, i.e., the length of the outputs of F and f is bounded terminants follow from the Cayley-Hamilton Theorem (which by a polynomial in x and |X|. Then we define FC to consist states that a matrix satisfies its characteristic polynomial). of all p-bounded number functions whose graphs are in C and In this paper, we are particularly interested in the theory all p-bounded string functions whose bit graphs are in C. VPV for polytime reasoning [7, Chapter 8.2] since we will B B We write Σi (L) to denote the class of Σi -formulas which use it to formalize all of our theorems. The universal theory 2 may have function and predicate symbols from L ∪ LA.A VPV is based on Cook’s single-sorted theory PV [8], which B string function is Σ0 (L)-definable if it is p-bounded and its bit was historically the first theory designed to capture polytime B graph is represented by a Σ0 (L)-formula. Similarly, a number reasoning. A nice property of PV (and VPV) is that their universal theorems translate into families of propositional tau- We use [X,Y ) to denote {Z ∈ Z | X ≤ Z < Y }, i.e., the tologies with polynomial size proofs in any extended Frege interval of integers between X and Y − 1. We also use the proof system. standard notation [n] to denote the set {1, . . . , n}. 0 The vocabulary LFP of VPV extends that of V with addi- Given a square matrix M, we write M[i | j] to denote the tional symbols introduced based on Cobham’s machine inde- (i, j) of M, i.e., the square matrix formed by removing pendent characterization of FP [5]. Let Z

then F ∈ LFP. proved within the theory T . We will often abuse the notation by letting LFP denote the set C. The weak pigeonhole principle of function symbols in L . FP The weak surjective pigeonhole principle for a function F , The theory VPV can then be defined to be the theory over sWPHP(F ) 0 denoted by , is the universal closure of the formula LFP whose axioms are those of V together with defining axioms for every function symbols in LFP. VPV proves the (n > 0 ∧ A > 0) → ∃Y < (n + 1)A ∀X < nA, F (X) 6= Y, scheme ΣB(L )-COMP and the following schemes 0 FP stating that F cannot map [0, nA) onto [0, (n + 1)A). Thus, B  the weak surjective pigeonhole principle for the class of VPV Σ0 (LFP)-IND: ϕ(0) ∧ ∀x ϕ(x) → ϕ(x + 1) → ∀xϕ(x) B  functions, denoted by sWPHP(LFP), is the schema Σ0 (LFP)-MIN: ϕ(y) → ∃x ϕ(x) ∧ ¬∃z < xϕ(z) B sWPHP(F ) | F ∈ L . Σ0 (LFP)-MAX: FP  ϕ(0) → ∃x ≤ y ϕ(x) ∧ ¬∃z ≤ y z > x ∧ ϕ(z) Note that this principle is believed to be weaker than the

B usual “strong” surjective pigeonhole principle stating that we where ϕ is any Σ0 (LFP)-formula. It follows from Herbrand’s cannot map [0,A) onto [0,A+1). For example, sWPHP(LFP) Theorem that the provably-total functions in VPV are precisely can be proven in the theory V 3 (the two-sorted version of the functions in L . ΣP FP Buss’s theory S3) for FP 3 reasoning (cf. [29]), but it is not Observe that VPV extends V #L since matrix powering can 2 known if the usual surjective pigeonhole principle for VPV easily be carried out in polytime, and thus all theorems of functions can be proven within the theory S V i for the V #L from [6], [28] are also theorems of VPV. From results in i≥1 polynomial hierarchy (the two-sorted version of Buss’s theory [28] it follows that VPV proves the Cayley-Hamilton Theorem, S := S Si in [4]). and hence the cofactor expansion formula and other usual 2 i≥1 2 properties of determinants of integer matrices. D. Jerˇabek’s´ framework for probabilistic reasoning 1 In our introduction, we mentioned V , the two sorted ver- We will give a brief and simplified overview of Jerˇabek’s´ 1 1 sion of Buss’s S2 theory [4]. Although V is also designed framework [13], [14], [15] for probabilistic reasoning within for polytime reasoning in the sense that the provably total VPV + sWPHP(LFP). For more complete definitions and re- 1 functions of V are FP functions, there is evidence showing sults, the reader is referred to Jerˇabek’s´ work. 1 1 that V is stronger than VPV. For example, the theory V Let F (R) be a VPV 0-1 valued function (which may have B B B proves the Σ1 -IND, Σ1 -MIN and Σ1 -MAX schemes while other arguments). We think of F as defining a relation on bi- B VPV cannot prove these Σ1 schemes assuming factoring is nary numbers R. Let Φ(n) = {R < 2n | F (R) = 0}. Observe   hard. We refer to [7, Section 8.2.1] for a more detailed com- that bounding the probability PrR<2n F (R) = 0 from above 1 parison between VPV and V . It is worth noting that, in this by the ratio s/t is the same as showing that t·|Φ(n)| ≤ s·2n. 1 paper, we do not use V to formalize our theorems since the More generally, many probability inequalities can be restated weaker theory VPV suffices for all our needs. as inequalities between cardinalities of sets. This is problem- atic since even for the case of polytime definable sets, it B. Notation follows from Toda’s theorem [30] that we cannot express their If ϕ is a function of n variables, then let ϕ(α1, . . . , αn−1, •) cardinalities directly using bounded formulas (assuming that denote the function of one variable resulting from ϕ by fixing the polynomial hierarchy does not collapse). Hence we need the first n − 1 arguments to α1, . . . , αn−1. We write ψ : X  an alternative method to compare the sizes of definable sets Y to denote that ψ is a surjection from X onto Y. without exact counting. The method proposed by Jerˇabek´ in [13], [14], [15] is based s, t be VPV terms. Then on the following simple observation: if Γ(n) and Φ(n) are  Pr n R ∈ Φ(n)] s/t definable sets and there is a function F mapping Γ(n) onto R<2 - Φ(n), then the cardinality of Φ(n) is at most the cardinality means that there exists a VPV function G(n, •) mapping the of Γ(n). Thus instead of counting the sets Γ(n) and Φ(n) di- set [s] × 2n onto the set [t] × Φ(n). rectly, we can compare the sizes of Γ(n) and Φ(n) by showing Since we are not concerned with justifying the above defini- the existence of a surjection F , which in many cases can be tion, our theorems can be formalized in VPV without sWPHP. easily carried out within weak theories of bounded arithmetic. In this paper, we will restrict our discussion to the case when III.EDMONDS’THEOREM the sets are bounded polytime definable sets and the surjections Let G be a bipartite graph with two disjoint sets of vertices are polytime functions, all of which can be defined within U = {u1, . . . , un} and V = {v1, . . . , vn}. We use a pair (i, j) VPV, since this is sufficient for our results. to encode the edge {ui, vj} of G. Thus, the edge relation of G The remaining challenge is then to formally verify that can be encoded by a En×n, where we define the definition of cardinality comparison through the use of (i, j) ∈ E, i.e., E(i, j) = 1, iff {ui, vj} is an edge of G. surjections is a meaningful and well-behaved definition. The Each perfect matching in G can be encoded by an n × n basic properties of surjections like “any set can be mapped M satisfying M(i, j) → E(i, j) for all onto itself” and “surjectivity is preserved through function i, j ∈ [n]. Recall that a permutation matrix is a square boolean compositions” roughly correspond to the usual reflexivity and matrix that has exactly one entry of value 1 in each row and transitivity of cardinality ordering, i.e., |Φ| ≤ |Φ| and |Φ| ≤ each column and 0’s elsewhere. |Γ| ≤ |Λ| → |Φ| ≤ |Λ| for all bounded definable sets Φ, Γ and Let An×n be the matrix obtained from G by letting Ai,j be ∆. However, more sophisticated properties, e.g., dichotomy an indeterminate Xi,j for all (i, j) ∈ E, and let Ai,j = 0 for |Φ| ≤ |Γ| ∨ |Γ| ≤ |Φ| or “uniqueness” of cardinality, turn out all (i, j) 6∈ E. The matrix of indeterminates A(X~ ) is called the to be much harder to show. Edmonds matrix of G, and Det(A)(X~ ) is called the Edmonds As a result, Jerˇabek´ proposed in [15] a systematic and so- polynomial of G. Here, we can think of A(X~ ) as a function phisticated framework to justify his definition of size compar- that takes as input an W~ n×n and returns an in- ison. He observed that estimating the size of a P/poly defin- teger matrix A(W~ ), where the (i, j) entry of A(W~ ) has value able set Φ ⊆ [0, 2n) within an error 2n/poly(n) is the same Wi,j if (i, j) ∈ E, and has zero value otherwise. Similarly, we as estimating Pr n [X ∈ Φ] within an error 1/poly(n), 2 X∈[0,2 ) can think of Det(A)(X~ ) as polynomial of n variables Xi,j, which can be solved by drawing poly(n) independent random where the terms with at least one variable not present in the n samples X ∈ [0, 2 ) and check if X ∈ Φ. This gives us a Edmonds matrix will have zero coefficients. polytime random sampling algorithm for approximating the The following theorem draws an important connection be- size of Φ. Since a counting argument [13] can be formalized tween determinants and matchings. The standard proof uses within VPV + sWPHP(LFP) to show the existence of suitable the Lagrange expansion for the determinant, which has expo- average-case hard functions for constructing Nisan-Wigderson nentially many terms, and hence cannot be formalized in VPV. generators, this random sampling algorithm can be derandom- However we will give an alternative proof which can be so ized to show the existence of an approximate cardinality S of formalized. Φ for any given error E = 2n/poly(n) in the following sense. ~ The theory VPV + sWPHP(LFP) proves the existence of S, y Theorem 1 (Edmonds’ Theorem [9]). (VPV `) Let Det(A)(X) and a pair of P/poly “counting functions” (F,G) such that be the Edmonds polynomial of the bipartite graph G. Then G has a perfect matching iff Det(A)(X~ ) 6≡ 0.  F : [0, y) × Φ ] [0,E)  [0, y · S) Proof: For the direction (=⇒) we need the following   G : 0, y · (S + E)  [0, y) × Φ lemma.

Intuitively the pair (F,G) witnesses that S−E ≤ |Φ| ≤ S+E. Lemma 1. (VPV `) Det(M) ∈ {−1, 1} for any permutation This allows him to show many properties, expected from car- matrix M. dinality comparison, that are satisfied by his method within Proof of Lemma 1: We will construct a sequence of VPV+sWPHP(LFP) (see Lemmas 2.10 and 2.11 in [15]). It is matrices worth noting that proving the uniqueness of cardinality within Nn,Nn−1,...,N1, some error seems to be the best we can do within bounded arithmetic, where exact counting is not available. where Nn = M and we construct Ni−1 from Ni by choosing ji satisfying N(i, ji) = 1 and letting Ni−1 = Ni[i | ji]. N For the present paper, the following definition is all we need From how the matrices i are constructed, we can easily ΣB(L ) ` = n, . . . , 1 to know about Jerˇabek’s´ framework. show by 0 FP induction on that the matrices N` are permutation matrices. Finally, using the cofactor expan- n B Definition 1. Let Φ(n) = {R < 2 | F (R) = 1}, where F (R) sion formula, we prove by Σ0 (LFP) induction on ` = 1, . . . , n is a VPV function (which may have other arguments) and let that Det(N`) ∈ {−1, 1}. From the lemma we see that if M is the permutation ma- Using this lemma, we have the following coRP algorithm trix representing a perfect matching of G, then VPV proves for the PIT problem when F = Z. Given a polynomial Det(A)(M) = Det(M) ∈ {1, −1}, so Det(A)(X~ ) is not P (X ,...,X ) identically 0. 1 n For the direction (⇐=) it suffices to describe a polytime of degree at most D, we choose a sequence R~ ∈ [0, 2D)n function F that takes as input an integer matrix Bn×n = at random. If P is given implicitly as a circuit, the degree of A(W~ ), where A(X~ ) is the Edmonds matrix of a bipartite P might be exponential, and thus the value of P (R~) might ~ graph G and Wn×n is an integer value assignment, and reason require exponentially many bits to encode. In this case we use in VPV that if Det(B) 6= 0, then F outputs a perfect matching the method of Ibarra and Moran [10] and let Y be the result of G. of evaluating P (R~) using arithmetic modulo a random integer Assume Det(B) 6= 0. Note that finding a perfect matching from the interval [1,Dk] for some fixed k. If Y = 0, then we of G is the same as extracting a nonzero diagonal of B. For report that P ≡ 0. Otherwise, we report that P 6≡ 0. (Note that this purpose, we construct a sequence of matrices if P has small degree, then we can evaluate P (R~) directly.) Unfortunately the Schwartz-Zippel Lemma seems hard to Cn,Cn−1,...,C1, prove in bounded arithmetic. The main challenge is that the as follows. We let Cn = B. For i = n, . . . , 2, we let Ci−1 = degree of P can be exponentially large. Even in the special Ci[i | ji] and the index ji is chosen using the following method. case when P is given as the symbolic determinant of a matrix Suppose we already know Ci satisfying Det(Ci) 6= 0. By the of indeterminates and hence the degree of P is small, the cofactor expansion along the last row of Ci, polynomial P still has up to n! terms. Thus, we will focus on i a much weaker version of the Schwartz-Zippel Lemma that X i+j Det(Ci) = C(i, j)(−1) Det(Ci[i | j]). involves only Edmonds’ polynomials since this will suffice 2 j=1 for us to establish the correctness of a FRNC algorithm for deciding if a bipartite graph has a perfect matching. Thus, since Det(Ci) 6= 0, at least one of the terms in the sum on the right-hand side is nonzero. Thus, we can choose the A. Edmonds’ polynomials for complete bipartite graphs least index ji such that C(i, ji) · Det(Ci[i | ji]) 6= 0. In this section we will start with the simpler case when To extract the perfect matching, we let P be an n×n matrix, every entry of an Edmonds matrix is a variable, since it clearly where each entry P (i, j) contains the pair (i, j) (we can use demonstrates our techniques. This case corresponds to the the value n·i+j to represent the pair (i, j)). Then we construct Schwartz-Zippel Lemma for Edmonds’ polynomials of com- a sequence of matrices plete bipartite graphs. Let A be the full n × n Edmonds’ matrix A, where A = Qn,Qn−1,...,Q1, i,j xi,j for all 1 ≤ i, j ≤ n. We consider the case that S is the where Qn = P and Qi−1 = Qi[i | ji], i.e., we delete from Qi interval of integers S = [0, s) for s ∈ N, so |S| = s. Then exactly the row and column we deleted from Ci. It is not hard Det(A)(~x) is a nonzero polynomial of degree exactly n, and to prove that the set of edges {P`(`, j`) | 1 ≤ ` ≤ n} is our we want to show that desired perfect matching.   Pr~r∈Sn2 Det(A)(~r) = 0 - n/s. IV. SCHWARTZ-ZIPPEL LEMMA  n2 The Schwartz-Zippel Lemma [26], [33] is one of the most Let Z(n, s) := ~r ∈ S | Det(A)(~r) = 0 , i.e., the set fundamental tools in the design of randomized algorithms. of zeros of the Edmonds polynomial Det(A)(~x). Then by Definition 1, it suffices to exhibit a VPV function mapping The lemma provides us a coRP algorithm for the polynomial 2 [n] × Sn onto S × Z(n, s). For this it suffices to give a VPV identity testing problem (PIT): given an arithmetic circuit com- 2 function mapping [n] × Sn −1 onto Z(n, s). We will define a puting a multivariate polynomial P (X~ ) over a field F, we want to determine if P (X~ ) is identically zero. The PIT problem VPV function is important since many problems, e.g., primality testing [1], n2−1 F (n, s, •):[n] × S  Z(n, s), perfect matching [21], and software run-time testing [32], can be reduced to PIT. Moreover, many fundamental results in so F (n, s, •) takes as input a pair (i, ~r), where i ∈ [n] and 2 complexity theory like IP = PSPACE [27] and the PCP the- ~r ∈ Sn −1 is a sequence of n2 − 1 elements. orem [2], [3] make heavy use of PIT in their proofs. The Let B be an n×n matrix with elements from S. For i ∈ [n] Schwartz-Zippel lemma can be stated as follows. let Bi denote the leading principal submatrix of B that consists of the i × i upper-left part of C. In other words, Bn = B, and Theorem 2 (Schwartz-Zippel Lemma). Let P (X1,...,Xn) Bi−1 := Bi[i | i] for i = n, . . . , 2. The following fact follows be a non-zero polynomial of degree D ≥ 0 over a field (or B easily from the least number principle Σ0 (LFP)-MIN. integral domain) F. Let S be a finite subset of F and let R~ denote the sequence hR1,...,Rni. Then Fact 1. (VPV `) If Det(B) = 0, then there is i ∈ [n] such that Det(B ) = 0 for all i ≤ j ≤ n, and either i = 1 or i > 1  ~  j Pr ~ n P (R) = 0 ≤ D/|S|. R∈S and Det(Bi−1) 6= 0. We claim that given Det(B) = 0 and given i as in the then the element A(i, j) in the Edmonds matrix A is not zero. fact, the element B(i, i) is uniquely determined by the other We apply the algorithm in the proof of Theorem 3, except elements in B. Thus if i = 1 then B(i, i) = 0, and if i > 1 that the sequence of principal submatrices of B used in Fact 1 then by the cofactor expansion of Det(Bi) along row i, is replaced by the sequence Bn,Bn−1,...,B1 determined by 0 M as follows. We let Bn = B, and for i = n, . . . , 2 we 0 = Det(Bi) = Bi(i, i) · Det(Bi−1) + Det(Bi) (IV.1) let Bi−1 = Bi[i | ji], where the indices ji are chosen the 0 0 where Bi is obtained from Bi by setting B (i, i) = 0. This same way as in the proof of Theorem 1 when constructing equation uniquely determines Bi(i, i) because Det(Bi−1) 6= 0. the perfect matching M. The output of F (n, s, (i, ~r)) is defined a follows. Let B We note that the mapping H(n, s, A, •) in this case may be the n × n matrix determined by the n2 − 1 elements in ~r not be in DET since the construction of M depends on the by inserting the symbol ∗ (for unknown) in the position for sequential polytime algorithm from Theorem 1 for extracting B(i, i). Try to use the method above to determine the value of a perfect matching. ∗ = Bi(i, i), assuming that ∗ is chosen so that Det(B) = 0. C. Formalizing the RNC2 algorithm for the bipartite perfect This method could fail because Det(Bi−1) = 0. In this case, or matching decision problem if the solution to the equation (IV.1) gives a value for Bi(i, i) which is not in S, output the default “dummy” zero sequence An instance of the bipartite perfect matching decision prob- ~0n×n. Otherwise let C be B with ∗ replaced by the obtained lem is a bipartite graph G encoded by a matrix En×n, and value of Bi(i, i). If Det(C) = 0 then output C, otherwise we are to decide if G has a perfect matching. Here is an output the dummy zero sequence. RDET algorithm for the problem. The algorithm is essentially due to Lovasz´ [19]. From E, construct the Edmonds matrix Theorem 3. (VPV `) Let A(~x) be the Edmonds matrix of a 2 A(~x) for G and choose a random sequence ~r ∈ [2n]n . If complete bipartite graph K . Let S denote the set [0, s). n×n n,n Det(A)(~r) 6= 0 then we report that G has a perfect matching. Then the function F (n, s, •) defined above is a polytime sur- 2 Otherwise, we report G does not have a perfect matching. jection that maps [n] × Sn −1 onto Z(n, s). We claim that VPV proves correctness of this algorithm. Proof: It is easy to see that F (n, s, •) is polytime (in The correctness assertion states that if G has a perfect match- fact it belongs to the complexity class DET). To see that ing then the algorithm reports YES with probability at least F is surjective, let C be an arbitrary matrix in Z(n, s), so 1/2, and otherwise it certainly reports NO. Theorem 1 shows Det(C) = 0. Let i ∈ [n] be determined by Fact 1 when that VPV proves the latter. Conversely, if G has a perfect B = C. Let ~r be the sequence of n2 − 1 elements consisting matching given by a permutation matrix M then the function of the rows of C with C(i, i) deleted. Then the algorithm H(n, 2n, A, M, •) of Theorem 4 witnesses that the probability for computing F (n, s, (i, ~r)) correctly computes the missing of Det(A)(~r) = 0 is at most 1/2, according to Definition 1, element C(i, i) and outputs C. where A is the Edmonds matrix for G. Hence VPV proves the correctness of this case too. B. Edmonds’ polynomials for general bipartite graphs Since RDET ⊆ FRNC2, this algorithm is also an FRNC2 For general bipartite graphs, an entry of an Edmonds matrix algorithm. A might be 0, so we cannot simply use leading principal submatrices in our construction of the surjection F . However V. FORMALIZINGTHE HUNGARIANALGORITHM ~ ~ given a sequence Wn×n making Det(A)(W ) 6= 0, it follows The Hungarian algorithm is a combinatorial optimization from Theorem 1 that we can find a perfect matching M in algorithm which solves the maximum-weight bipartite match- polytime. Thus, the nonzero diagonal corresponding to the ing problem in polytime and anticipated the later development perfect matching M will play the role of the main diagonal in of the powerful primal-dual method. The algorithm was devel- our construction. The rest of the proof will proceed similarly. oped by Kuhn [18], who gave the name “Hungarian method” Thus, we have the following theorem. since it was based on the earlier work of two Hungarian math- Theorem 4. (VPV `) There is a VPV function H(n, s, A, W,~ •) ematicians: D. Konig˝ and J. Egervary.´ Munkres later reviewed the algorithm and showed that it is indeed polytime [22]. where An×n is the Edmonds matrix for an arbitrary bipartite graph and W~ is a sequence of n2 (binary) integers, such that Although the Hungarian algorithm is interesting by itself, we 2 if Det(A)(W~ ) 6= 0 then H(n, s, A, W,~ •) maps [n] × Sn −1 formalize the algorithm since we need it in the VPV proof of 2 onto {~r ∈ Sn | Det(A)(~r) = 0}, where S = [0, s). the Isolating Lemma for perfect matchings in Section VI-A. The Hungarian algorithm finds a maximum-weight match- In other words, it follows from Definition 1 that the function ing for any weighted bipartite graph. The algorithm and its H(n, s, A, W,~ •) in the theorem witnesses that correctness proof are simpler if we make the two following   changes. First, since edges with negative weights can never Pr~r∈Sn2 Det(A)(~r) = 0 - n/s. be in a maximum-weight matching, and thus can be safely Proof: Assume Det(A)(W~ ) 6= 0. Then the polytime func- deleted, we can assume that every edge has nonnegative weight. tion described in the proof of Theorem 1 produces an n × n Second, by assigning zero weight to every edge not present, permutation matrix M such that for all i, j, if M(i, j) = 1 we only need to consider weighted complete bipartite graphs. Let G = (X ] Y,E) be a complete bipartite graph, where this process can only repeat at most as many time as the initial X = {xi | 1 ≤ i ≤ n} and Y = {yi | 1 ≤ i ≤ n}, and let cost of the cover (~u,~v). Assuming that all edge weights are ~w be an integer weight assignment to the edges of G, where small (i.e. presented in unary), the algorithm terminates in wi,j ≥ 0 is the weight of the edge {xi, yj} ∈ E. polynomial time. Finally, we get an equality subgraph H~u,~v n n A pair of integer sequences ~u = huiii=1 and ~v = hviii=1 containing a perfect matching M, which by Theorem 5 is also is called a weight cover if a maximum-weight matching of G.

∀i, j ∈ [n], wi,j ≤ ui + vj. (V.1) Algorithm 2 (Finding a minimum-weight perfect matching). In the Isolating Lemma for bipartite matchings, we need a VPV The cost of a cover is cost(~u,~v) := Pn (u + v ). We also i=1 i i function Mwpm that takes as inputs an edge relation E of define w(M) := P w . The Hungarian algorithm is n×n (i,j)∈M i,j a bipartite graph G and a nonnegative weight assignment ~w to based on the following important observation. the edges in E, and outputs a minimum-weight perfect match- Lemma 2. (VPV `) For any matching M and weight cover ing if such a matching exists, or outputs ∅ to indicate that no (~u,~v), we have w(M) ≤ cost(~u,~v). perfect matching exists. Recall that the Hungarian algorithm returns a maximum-weight matching, and not a minimum- Proof: Since the edges in a matching M are disjoint, weight perfect matching. However, we can use the Hungarian summing the constraints wi,j ≤ ui + vj over all edges of M P algorithm to compute Mwpm(n, E, ~w) as follows. Let yields w(M) ≤ (i,j)∈M (ui+vj). Since no edge has negative  weight, we have ui + vj ≥ 0 for all i, j ∈ [n]. Thus, c = n · max wi,j | (i, j) ∈ E + 1. P w(M) ≤ (ui + vj) ≤ cost(~u,~v) 0 0 (i,j)∈M Construct a sequence ~wn×n by letting wi,j = c − wi,j for 0 for every matching M and every weight cover (~u,~v). all (i, j) ∈ E and letting wi,j = 0 for all (i, j) 6∈ E. Then, Given a weight cover (~u,~v), the equality subgraph H is run the Hungarian algorithm on the complete bipartite graph ~u,~v 0 the subgraph of G whose vertices are X ]Y and whose edges Kn,n with edge weight assignment ~w . Since we assigned zero weights to the edges not present in G and add very large are precisely those {xi, yj} ∈ E satisfying wi,j = ui + vj. weights to other edges, the Hungarian algorithm will always Theorem 5. (VPV `) Let H = H~u,~v be the equality subgraph, prefer the edges present in the original bipartite graph G. Thus, and let M be a maximum cardinality matching of H. Then the if the Hungarian algorithm returns a perfect matching M of following three statements are equivalent Kn,n with at least one edge not in E, then G cannot have 1) w(M) = cost(~u,~v). a perfect matching, so we let Mwpm output ∅. On the other 2) M is a maximum-weight matching of G and the cover hand, if the Hungarian algorithm returns a perfect matching 0 (~u,~v) is a minimum-weight cover of G. M of Kn,n satisfying M ⊆ E, then, by the way w~ is defined, 3) M is a perfect matching of the equality subgraph H. M is a minimum-weight perfect matching of G, and hence we (cf. Appendix II-C for the full proof of this theorem.) let Mwpm output M.

2 Below we give a simplified version of the Hungarian algo- VI. FRNC ALGORITHM FOR FINDING A BIPARTITE rithm which runs in polynomial time when the edge weights PERFECT MATCHING are small (i.e. presented in unary notation). The correctness of Below we recall the elegant FRNC2 (or more precisely the algorithm easily follows from Theorem 5. RDET) algorithm due to Mulmuley, Vazirani and Vazirani [21] Algorithm 1 (The Hungarian algorithm). We start with an for finding a bipartite perfect matching. Although the original arbitrary weight cover (~u,~v) with small weights: e.g. let algorithm works for general undirected graphs, we will only focus on bipartite graphs in this paper. u = max{w | 1 ≤ j ≤ n} i i,j Let G be a bipartite graph with two disjoint sets of vertices and vi = 0 for all i ∈ [n]. If the equality subgraph H~u,~v has U = {u1, . . . , un} and V = {v1, . . . , vn}. Suppose each a perfect matching M, we report M as a maximum-weight edge (i, j) ∈ E is assigned an integer weight wi,j ≥ 0, matching of G. Otherwise, change the weight cover (~u,~v) and we want to a find a minimum-weight perfect matching as follows. Since the maximum matching M is not a perfect of G. It turns out there is a DET algorithm for this problem matching of H, the Hall’s condition fails for H. Thus, it is under two assumptions: the weights must be polynomial in not hard (cf. Corollary 1 from Appendix II-B) to construct n, and the minimum-weight perfect matching must be unique. in polytime a subset S ⊆ X satisfying |N(S)| < |S|, where We let A(X~ ) be an Edmonds matrix of the bipartite graph. wi,j N(S) denotes the neighborhood of S. Thus, we can calculate Replace Xi,j with Wi,j = 2 (this is where we need the the quantity weights to be small). We then compute Det(A)(W~ ) using Berkowitz’s FNC2 algorithm. Assume that there exists exactly δ = min{u + v − w | x ∈ S ∧ y 6∈ N(S)}, i j i,j i j one (unknown) minimum-weight perfect matching M. We will and decrease ui by δ for all xi ∈ S and increase vj by δ show in Theorem 8 that w(M) is exactly the position of the for all yj ∈ N(S) without violating the weight cover prop- least significant 1-bit, i.e., the number of trailing zeros, in the Pn ~ erty (V.1). This strictly decreases the sum i=1(ui +vi). Thus binary expansion of Det(A)(W ). Once having w(M), we can test if an edge (i, j) ∈ E belongs to the unique minimum- Example 2. Mulmuley, Vazirani and Vazirani [21] used the weight perfect matching M as follows. Let w0 be the position Isolating Lemma to construct a randomized polytime reduction of the least significant 1-bit of Det(A[i | j])(W~ ). We will show from the CLIQUE problem (decide if a graph has a clique of in Theorem 9 that the edge (i, j) is in the perfect matching if size k) to the UNIQUE-CLIQUE problem (given a graph with 0 and only if w is precisely w(M) − wi,j. Thus, we can test all at most one clique of size k, decide if the graph has such a edges in parallel. Note that up to this point, everything can be clique). done in DET ⊆ FNC2 since the most expensive operation is If we have a polytime algorithm for UNIQUE-CLIQUE or the Det function, which is complete for DET. UNIQUE-SAT, then RP = NP. Thus, we will aim to prove What we have so far is that, assuming that the minimum- a special case of the Isolating Lemma, which still suffices to weight perfect matching exists and is unique, there is a DET formalize the FRNC2 algorithm for finding a bipartite perfect algorithm for finding this minimum-weight perfect matching. matching. For this weak version of the Isolating Lemma, we But how do we guarantee that if a minimum-weight perfect want to show that, given a bipartite graph G and the family F matching exists, then it is unique? It turns out that we can of perfect matchings of G, if we assign random weights to the assign every edge (u , v ) ∈ E a random weight w ∈ [2m], i j i,j edges, then the probability that this weight assignment does where m = |E|, and use the Isolating Lemma [21] to ensure not isolate a perfect matching is small. Note that although the that the graph has a unique minimum-weight perfect matching family F here might be exponentially large, F is polytime with probability at least 1/2. definable since recognizing a perfect matching is easy. The RDET ⊆ FRNC2 algorithm for finding a perfect match- ing is now complete: assign random weights to the edges, and Theorem 7 (Isolating a Perfect Matching). (VPV `) Let F run the DET algorithm for the unique minimum-weight perfect be the family of perfect matchings of a bipartite graph G matching problem. If a perfect matching exists, with probabil- with edges E = {e1, . . . , em}. Let ~w be a random weight ity at least 1/2, this algorithm returns a perfect matching. assignment to the edges in E. Then

A. Isolating a perfect matching Pr ~w∈[k]m [~w is not isolating for F] - m/k. We will recall the Isolating Lemma [21], the key ingredient 2 For brevity, we will call a weight assignment ~w “bad” if ~w of Mulmuley-Vazirani-Vazirani FRNC algorithm for finding is not isolating for F. Let Φ := ~w ∈ [k]m | ~w is bad for F . a perfect matching. Let X be a set with n points {a1, . . . , an} Then Theorem 7 wants us to show the existence of a VPV and let F be a family of subsets of X. We assign a weight function mapping [m] × [k]m−1 onto Φ. We also note that the wi to each point of ai ∈ X, and define the weight of a set P set Φ is polytime definable since ~w ∈ Φ iff ∃i, j ∈ [n], Y ∈ F to be w(Y ) := wi. Note that several sets of ai∈Y  E(i, j) and M(i, j) and ¬M 0(i, j) and M,M 0  F might achieve the minimum-weight. However, if minimum- , weight is achieved by a unique Y ∈ F, then we say that the encode two perfect matchings with the same weight n weight assignment ~w = hwiii=1 is isolating for F. (Every where M denotes the output produced by applying the Mwpm weight assignment is isolating if the family F is empty.) function (Algorithm 2) on G, and M 0 denotes the output pro- duced by applying Mwpm on the graph obtained from G by Theorem 6 (Isolating Lemma [21]). Let F be a family of sub- deleting the edge (i, j). sets of an n-element set X = {a , . . . , a }. Let ~w = hw in 1 n i i=1 Proof of Theorem 7: By Definition 1, it suffices for us to be a random weight assignment to the elements in X. Then construct explicitly a VPV function ϕ mapping [m] × [k]m−1 ~ ~ Pr ~w∈[k]n [~w is not isolating for F] ≤ n/k. onto Φ ∪ {0}, where 0 is our designated “dummy” target. We interpret each copy {i} × [k]m−1 as the set of all possible Unfortunately this theorem seems too general for us to give weight assignments to m − 1 edges E \{e }. Our function ϕ VPV i a proof since such a proof seems to require the set of non- will map each set {i} × [k]m−1 onto the set of bad weight isolating weight assignments to be polytime definable, which assignments ~w, with respect to which, the graph G contains in turn requires an algorithm to find the minimum weight two distinct minimum-weight perfect matchings M1 and M2 subset of any family of sets, and we do not know if such such that ei ∈ M1 \ M2. an algorithm exists. In other words, F can be the solution The function ϕ takes as input a sequence set of a hard problem, and thus it is not likely that we can find the minimum-weight set in F in polynomial time. The hi, w1, . . . , wi−1, wi+1, . . . , wmi problem remains hard even for problem instances, where we from [m] × [k]m−1 and does the following. Use the function are promised that |F| ≤ 1 and we want to find the unique Mwpm (from Algorithm 2) to find a minimum-weight perfect element in F. matching M1 of G with the edge ei deleted. Use Mwpm to Example 1. Valiant and Vazirani showed in their paper with a find a minimum-weight perfect matching M2 of the subgraph suggestive title “NP is as easy as detecting unique solutions” G \{uj, vk}, where uj and vk are the two endpoints of ei. If [31] that NP problems with unique solutions remain difficult. both perfect matchings M1 and M2 exist and satisfy w(M1)− They constructed a randomized polytime reduction from SAT w(M2) ∈ [k], then ϕ outputs the sequence to UNIQUE-SAT using the hash-function technique. hw1, . . . , wi−1, w(M1) − w(M2), wi+1, . . . , wmi . Otherwise, ϕ outputs the “dummy” sequence ~0. weight wi,j such that G has a unique minimum-weight perfect wi,j To show that ϕ is surjective, consider an arbitrary bad weight matching. Let W~ n×n be a sequence satisfying Wi,j = 2 m assignment ~w = hwiii=1 ∈ Φ. Since ~w is bad, there are two for all (i, j) ∈ E. Let A be the Edmonds matrix of G, and let distinct minimum-weight perfect matchings M1 and M2 and B = Det(A)(W~ ). Let M denote the unique minimum weight some edge ei ∈ M1 \ M2. Thus from how ϕ was defined, perfect matching of G. hi, w , . . . , w , w , . . . , w i ∈ [m]×[k]m−1 is an element 1 i−1 i+1 m Theorem 8. (VPV `) The weight p = w(M) is exactly the that gets mapped to the bad weight assignment ~w. position of the least significant 1-bit of Det(B). B. Extracting the unique minimum-weight perfect matching If in Lemma 3 we tried to extract an appropriate nonzero In the previous section we formalized the Isolating Lemma, diagonal of B using the determinant and minors of B as our which with high probability gives us a weight assignment ~w guide, then in the proof of this theorem we do the reverse. for which a bipartite graph G has a unique minimum weight From a minimum-weight perfect matching M of G, we want perfect matching. This is the first step of the Mulmuley-Vazirani- to rebuild in polynomially many steps suitable minors of B Vazirani algorithm. Now we proceed with the second step, until we fully recover the determinant of B. It is not hard to where we need to output this minimum-weight perfect match- B prove by Σ0 (LFP) induction that in every step of this process, ing using a DET function. each “partial determinant” of B has the least significant 1-bit wi,j Let B be the matrix we get by substituting Wi,j = 2 at position p. Note that the technique we used to prove this for each nonzero entry (i, j) of the Edmonds matrix A of G. theorem does have some similarity to that of Lemma 1, even We want to show that if M is the unique minimum weight though the proof of this theorem is much more complicated. perfect matching of G with respect to ~w, then the weight We refer to Appendix IV for the detailed proof of this theorem. w(M) is exactly the position of the least significant 1-bit in the To extract the edges of M in DET, we need to decide if binary expansion of Det(B). The usual proof of this fact is not an edge (i, j) belongs to the unique minimum-weight perfect hard, but it uses properties of the Lagrange expansion, which matching M without knowledge of other edges in M. The has exponentially many terms and hence cannot be formal- next theorem, whose proof follows directly from Lemma 3 ized in VPV. Our proof avoids using the Lagrange expansion and Theorem 8, gives us that method. completely, and utilizes properties of the cofactor expansion instead. Theorem 9. (VPV `) For every edge (i, j) ∈ E, we have (i, j) ∈ M if and only if w(M) − wi,j is exactly the position Lemma 3. (VPV `) There is a VPV function that takes as of the least significant 1-bit of the minor Det(B[i | j]). inputs: an n × n Edmonds’ matrix A and a weight sequence We refer to Appendix V for the proof of this theorem. wi,j W~ = hWi,j = 2 | 1 ≤ i, j ≤ ni . C. Related bipartite matching problems And if B = A(W~ ) satisfying Det(B) 6= 0 and p is the position The correctness of the Mulmuley-Vazirani-Vazirani algo- of the least significant 1-bit of Det(B), then the VPV function rithm can easily be used to establish the correctness of RDET outputs a perfect matching M of weight at most p. algorithms for related matching problems, for example, the It is worth noting that the lemma holds regardless of whether maximum (cardinality) bipartite matching problem and the or not the bipartite graph corresponding to A and weight as- minimum weight bipartite perfect matching problem, where signment W~ has a unique minimum-weight perfect matching. the weights assigned to the edges are small. We refer to [21] The proof of Lemma 3 is very similar to that of Theo- for more details on these reductions. rem 1. Recall that in Theorem 1, given a matrix B satisfying Det(B) 6= 0, we want to extract a nonzero diagonal of B. In VII.CONCLUSIONAND FUTURE WORK this lemma, we are given the position p of the least significant We have only considered randomized matching algorithms 1-bit of Det(B), and we want to get a nonzero diagonal of for bipartite graphs. For general undirected graphs, we need B whose product has the least significant 1-bit at position at Tutte’s matrix (cf. [21]), a generalization of Edmonds’ matrix. most p. For this, we can use the same method of extracting the Since every is a skew where nonzero diagonal from Theorem 1 with the following modi- each variable appears exactly twice, we cannot directly apply fication. When choosing a term of the Lagrange expansion our technique for Edmonds’ matrices, where each variable ap- on the recursive step, we will also need to make sure the pears at most once. However, by using the recursive definition chosen term will produce a nonzero sub-diagonal of B that of the Pfaffian instead of the cofactor expansion, we believe will not contribute too much weight to the diagonal we are that it is also possible to generalize our results to general trying to extract. This will ensure that the least significant 1- undirected graphs. We also note that the Hungarian algorithm bit of diagonal’s weight is at most p. We refer to Appendix III only works for weighted bipartite graphs. To find a maximum- for the full proof of this lemma. weight matching of a weighted undirected graph, we need to For the next two theorems, we are given a bipartite graph formalize Edmonds’ blossom algorithm (cf. [16]). Once we G = (U ] V,E), where we have U = {u1, . . . , un} and have the correctness of the blossom algorithm, the proof of V = {v1, . . . , vn}, and each edge (i, j) ∈ E is assigned a the Isolating Lemma for undirected graph perfect matchings will be the same as that of Theorem 7. We leave the detailed [6] S. Cook and L. Fontes. Formal Theories for Linear Algebra. In proofs for the general undirected graph case for future work. Computer Science Logic, pages 245–259. Springer, 2010. [7] S. Cook and P. Nguyen. Logical foundations of proof complexity. It is worth noticing that symbolic determinants of Edmonds’ Cambridge University Press, 2010. matrices result in very special polynomials, whose structures [8] S.A. Cook. Feasibly constructive proofs and the propositional calculus can be used to define the VPV surjections witnessing the prob- (preliminary version). In Proceedings of the seventh annual ACM symposium on Theory of computing, pages 83–97. ACM, 1975. ability bound in the Schwartz-Zippel Lemma as demonstrated [9] J. Edmonds. Systems of distinct representatives and linear algebra. J. in this paper. It remains an open problem whether we can Res. Natl. Bureau of Standards, 71A:241–245, 1967. prove the full version of the Schwartz-Zippel Lemma using [10] O.H. Ibarra and S. Moran. Probabilistic Algorithms for Deciding Equivalence of Straight-Line Programs. Journal of the ACM (JACM), Jerˇabek’s´ method within the theory VPV. 30(1):217–228, 1983. We have shown that the correctness proofs for several ran- [11] R. Impagliazzo and V. Kabanets. Constructive proofs of concentration domized algorithms can be formalized in the theory VPV for bounds. Approximation, Randomization, and Combinatorial Optimiza- tion. Algorithms and Techniques, pages 617–631, 2010. polynomial time reasoning. But some of the algorithms in- [12] R. Impagliazzo and A. Wigderson. P= BPP if E requires exponential volved are in the subclass DET of polynomial time, where circuits: Derandomizing the XOR lemma. In Proceedings of the twenty- DET is the closure of #L (and also of the integer deter- ninth annual ACM symposium on Theory of computing, pages 220–229. 0 ACM, 1997. minant function Det) under AC reductions. As mentioned [13] E. Jerˇabek.´ Dual weak pigeonhole principle, Boolean complexity, and in Section II-A the subtheory V #L of VPV can define all derandomization. Annals of Pure and Applied Logic, 129(1–3):1–37, functions in DET, but it is open whether V #L can prove 2004. [14] E. Jerˇabek.´ Weak pigeonhole principle, and randomized computation. properties of Det such as the expansion by minors. However PhD thesis, Faculty of Mathematics and Physics, Charles University, the theory V #L + CH can prove such properties, where CH Prague, 2005. is an axiom stating the Cayley-Hamilton Theorem. Thus in [15] E. Jerˇabek.´ Approximate counting in bounded arithmetic. Journal of Symbolic Logic, 72(3):959–993, 2007. the statements of Lemma 1 and Theorem 3 we could have [16] B.H. Korte and J. Vygen. Combinatorial optimization: theory and replaced (VPV `) by (V #L + CH `). We could have done algorithms. Springer Verlag, 2008. the same for Theorem 4 if we changed the argument W~ of the [17] J. Kraj´ıcek.ˇ Bounded arithmetic, propositional logic, and complexity theory. Cambridge University Press, 1995. function H to M, where M is a permutation matrix encoding [18] H.W. Kuhn. The Hungarian method for the assignment problem. Naval a perfect matching for the underlying bipartite graph. This research logistics quarterly, 2(1-2):83–97, 1955. modified statement of the theorem still proves the interesting [19] L. Lovasz.´ On determinants, matchings and random algorithms. In L. Budach, editor, Fundamentals of Computation Theory, pages 565– direction of the correctness of the bipartite perfect matching 574. Akademia-Verlag, 1979. algorithm in Section IV-C, since the function H is used only to [20] R.A. Moser and G. Tardos. A constructive proof of the general Lovasz´ bound the error assuming that G does have a perfect matching. Local Lemma. Journal of the ACM (JACM), 57(2):1–15, 2010. [21] K. Mulmuley, U.V. Vazirani, and V.V. Vazirani. Matching is as easy as We leave open the question of whether any of the other matrix inversion. Combinatorica, 7(1):105–113, 1987. correctness proofs can be formalized in V #L + CH . [22] J. Munkres. Algorithms for the assignment and transportation problems. We believe that Jerˇabek’s´ framework deserves to be studied Journal of the Society for Industrial and Applied Mathematics, pages 32–38, 1957. in greater depth since it helps us to understand better the con- [23] N. Nisan and A. Wigderson. Hardness vs randomness. Journal of nection between probabilistic reasoning and weak systems of Computer and System Sciences, 49(2):149–167, 1994. bounded arithmetic. We are working on using Jerˇabek’s´ ideas [24] J.B. Paris, A.J. Wilkie, and A.R. Woods. Provability of the pigeonhole principle and the existence of infinitely many primes. Journal of to formalize constructive aspects of fundamental theorems in Symbolic Logic, 53(4):1235–1244, 1988. finite probability in the spirit of the recent beautiful work by [25] P. Pudlak.´ Ramsey’s theorem in bounded arithmetic. In Computer Moser and Tardos [20], Impagliazzo and Kabanets [11], etc. Science Logic, pages 308–317. Springer, 1991. [26] J.T. Schwartz. Fast probabilistic algorithms for verification of polyno- mial identities. Journal of the ACM (JACM), 27(4):717, 1980. ACKNOWLEDGEMENT [27] A. Shamir. IP = PSPACE. Journal of the ACM, 39(4):869–877, 1992. [28] M. Soltys and S. Cook. The proof complexity of linear algebra. Annals We would like to thank Emil Jerˇabek´ for answering our of Pure and Applied Logic, 130:277–323, 2004. questions regarding his framework. We also thank the referees [29] N. Thapen. The weak pigeonhole principle in models of bounded for their constructive comments. This work was financially arithmetic. PhD thesis, University of Oxford, 2002. [30] S. Toda. PP is as Hard as the Polynomial-Time Hierarchy. SIAM Journal supported by the Ontario Graduate Scholarship and the Natural on Computing, 20(5):865–877, 1991. Sciences and Engineering Research Council of Canada. [31] L.G. Valiant and V.V. Vazirani. NP is as Easy as Detecting Unique Solutions. Theoretical Computer Science, 47:85–93, 1986. REFERENCES [32] H. Wasserman and M. Blum. Software reliability via run-time result- checking. Journal of the ACM, 44(6):826–849, 1997. [1] M. Agrawal and S. Biswas. Primality and identity testing via chinese [33] R. Zippel. An explicit separation of relativised random polynomial time remaindering. Journal of the ACM, 50(4):429–443, 2003. and relativised deterministic polynomial time. Information Processing [2] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy. Proof Letters, 33(4):207–212, 1989. verification and the hardness of approximation problems. Journal of the ACM, 45(3):501–555, 1998. [3] S. Arora and S. Safra. Probabilistic checking of proofs: A new characterization of NP. Journal of the ACM, 45(1):122, 1998. [4] S.R. Buss. Bounded arithmetic. Bibliopolis, Naples, 1986. [5] A. Cobham. The Intrinsic Computational Difficulty of Functions. In Proceedings of the 1995 International Congress for Logic, Methodology and Philosophy of Science, pages 24–30. North-Holland, 1965. APPENDIX I Given a matching M, a vertex v is M-saturated if v is THE 2-BASIC AXIOMS incident with an edge in M. We will say v is M-unsaturated The axioms B1–B12 are based on the 1-BASIC axioms of if it is not M-saturated. A path P = hv1, . . . , vki is an M- alternating path if P alternates between edges in M and edges theory I∆0. The axioms L1 and L2 characterize |X|, the length of X, to be the largest element of the set X plus one, in E \ M. More formally, P is an M-alternating path if either or 0 if X is empty. of the following two conditions holds: • For every i ∈ {1, . . . , k − 1}, {vi, vi+1} ∈ E \ M if i is odd, and {vi, vi+1} ∈ M if i is even. B1. x + 1 6= 0 • For every i ∈ {1, . . . , k − 1}, {vi, vi+1} ∈ E \ M if i is B2. x + 1 = y + 1 → x = y even, and {vi, vi+1} ∈ M if i is odd.

B3. x + 0 = x An M-alternating path hv1, . . . , vki is an M-augmenting B4. x + (y + 1) = (x + y) + 1 path if the vertices v1 and vk are M-unsaturated. B5. x · 0 = 0 Theorem 10 (Berge’s Theorem). (VPV `) Let G = (X]Y,E) B6. x · (y + 1) = (x · y) + x be a bipartite graph. A matching M is maximum iff there is B7. (x ≤ y ∧ y ≤ x) → x = y no M-augmenting path in G. B8. x ≤ x + y Proof: (=⇒): Assume that all matchings N of E satisfy B9. 0 ≤ x |N| ≤ |M|. Suppose for a contradiction that there is an M- augmenting path P . Then M 0 = M ⊕ P is a matching greater B10. x ≤ y ∨ y ≤ x than M, a contradiction. B11. x ≤ y ↔ x < y + 1 (⇐=): We will prove the contrapositive. Assume there is B12. x 6= 0 → ∃y ≤ x (y + 1 = x) another matching M 0 satisfying |M 0| > |M|. We want to L1. X(y) → y < |X| construct an M-augmenting path in G. Consider Q = M 0 ⊕ M. Since |M 0| > |M| implies that L2. y + 1 = |X| → X(y) |M 0 \ M| > |M \ M 0|, it follows that SE. |X| = |Y | ∧ ∀i < |X| (X(i) = Y (i)) → X = Y |Q ∩ M 0| > |Q ∩ M| (II.1) Note that we can compute cardinalities of the sets directly here APPENDIX II since all the sets we are considering here are small. Now let FORMALIZINGTHE HUNGARIANALGORITHM H be the graph whose edge relation is Q and whose vertices Before proceeding with the Hungarian algorithm, we need are simply the vertices of G. We then observe the following to formalize the two most fundamental theorems in bipartite properties of H: 0 matching: Berge’s Theorem and Hall’s Theorem. • Since Q is constructed from two matchings M and M , A. Formalizing Berge’s Theorem and the augmenting-path al- every vertex of H can only be incident with at most two 0 gorithm edges one from M and another from M . So every vertex of H has degree at most 2. Let G = (X ] Y,E) be a bipartite graph, where X = • Any path of H must alternate between the edges of M {x | 1 ≤ i ≤ n} and Y = {y | 1 ≤ i ≤ n}. Formally to i i and M 0. make sure that X and Y are disjoint, we can let xi := i and We will provide a polytime algorithm to extract from the graph yi := n + i. The edge relation E of G can be encoded by a H an augmenting path with respect to M, which gives us the matrix En×n, where E(i, j) = 1 iff xi is adjacent to yj. Note that we often abuse notation and write {u, v} ∈ E to denote contradiction. that u and v are adjacent in G, which formally means either 1: Initialize K = H and i = 1 2: while K 6= ∅ do u ∈ X ∧ v ∈ Y ∧ E(u, v − n), or 3: Pick the least vertex v ∈ K v ∈ X ∧ u ∈ Y ∧ E(v, u − n). 4: Compute the connected component Ci containing v 5: if C is an M-augmenting path then This complication is due to the usual convention of using an i 6: return C and halt. n × n matrix to encode the edge relation of a bipartite graph i 7: end if with 2n vertices. 8: Update K = K \ C and i = i + 1. An n × n matrix M encodes a matching of G iff M is a i 9: end while permutation matrix satisfying Note that since H has n vertices, the while loop can only ∀i, j ∈ [n],M(i, j) → E(i, j). iterate at most n times. It only remains to show the following.

We represent a path by a sequence of vertices hv1, . . . , vki Claim: The algorithm will always return an M-augmenting 0 with {vi, vi+1} ∈ E for all 1 ≤ i < k. path under the assumption that |M | > |M|. Suppose for a contradiction that the algorithm would never NL in [7, Chapter 9]. The matrix T is constructed to help us produce any M-augmenting path. Since H has degree at most trace a path for every vertex v ∈ R to s. 2, in every iteration of the while loop, the connected compo- By induction on i, we can prove that Row(i, S) ⊆ X for nent Ci is every odd i ≤ n, and Row(i, S) ⊆ Y for every even i ≤ 2n. 0 ∗ • either a cycle, which means |Ci ∩ M| = |Ci ∩ M |, or Thus, after having S and T , we choose the largest even i such • a path but not an M-augmenting path, which implies that that Row(i∗,S) 6= ∅, and then search the set Row(i∗,S) ∩ Y 0 |Ci ∩ M| ≥ |Ci ∩ M |. for an M-unsaturated vertex t. If such vertex t exists, then 0 S we use T to trace back a path to s. This path will be our Since Q = M ⊗ M = i Ci and all Ci are disjoint, we have M-augmenting path. If no such vertex t exists, we report that [ [ 0 0 |Q ∩ M| = | (Ci ∩ M)| ≥ | (Ci ∩ M )| = |Q ∩ M |. there is no M-augmenting path. i i B. Formalizing Hall’s Theorem But this contradicts (II.1). Theorem 11 (Hall’s Theorem). (VPV `) Let G = (X ] Y,E) Algorithm 3 (The augmenting-path algorithm). As a corollary be a bipartite graph. Then G has a perfect matching iff for of Berge’s Theorem, we have the following simple algorithm every subset S ⊆ X, |S| ≤ |N(S)|, where N(S) denotes the for finding a maximum matching of a bipartite graph G. We neighborhood of S in G. start from any matching M of G, say empty matching. Repeat- The condition ∀S ⊆ X, |S| ≤ |N(S)|, which is necessary edly locate an M-augmenting path P and augment M along P and sufficient for a bipartite graph to have a perfect matching, and replace M by the resulting matching. Stop when there is is called Hall’s condition. We encode a set S ⊆ X in the no M-augmenting path. Then we know that M is maximum. theorem as a binary string of length n, where S(i) = 1 iff Thus, it remains to show how to search for an M-augmenting x ∈ S. Similarly, we encode the neighborhood N(S) as a path given a matching M of G. i binary string of length n, and we define Algorithm 4 (The augmenting-path searching algorithm). [ First, from G and M we construct a directed graph H, where N(S) := Row(i, E) | xi ∈ S , the vertices VH of H are exactly the vertices X ] Y of G, where the union can be computed by taking the disjunction  and the edge relation EH of H is a 2n × 2n matrix defined of all binary vectors in Row(i, E) | xi ∈ S componentwise. as follows: Note that we can compute the cardinalities of S and N(S)  directly since both of these sets are subsets of small sets X EH := (x, y) ∈X × Y | {x, y} ∈ E \ M  and Y . ∪ (y, x) ∈ Y × X | {y, x} ∈ M . Proof: (=⇒): Assume that G has a perfect matching The key observation is that t is reachable from s by an M- M, constructed using the algorithm discussed above. Given a alternating path in the bipartite graph G iff t is reachable from subset S ⊆ X, we know vertices S are matched to the vertices s in the directed graph H. in some subset T ⊆ Y by the perfect matching M, where After constructing the graph H, we can search for an M- |S| = |T |. Since T ⊆ N(S), we have |N(S)| ≥ |T | = |S|. augmenting path using the breadth first search algorithm as (⇐=): We will prove the contrapositive. Assume G does follows. Let s be an M-unsaturated vertex in X. We construct not have a perfect matching, we want to construct a subset two 2n×2n matrices S and T as follows. S ⊆ X such that |N(S)| < |S|. Let M be a maximum matching constructed by the above 1: The row Row(1,S) of S encodes the set {s}, the starting algorithm. Since M is not a perfect matching, there is some point of our search. M-unsaturated vertex s ∈ X. Let S and T be the result of run- 2: From Row(i, S), the set Row(i + 1,S) is defined as fol- ning the “augmenting-path searching” algorithm from s, then lows: j ∈ Row(i + 1,S) ↔ there exists some k ≤ n, Sn R := i=2 Row(i, S) is the set of all vertices reachable from  Row(i, S)(k) ∧ EH (k, j) ∧ ∀l ≤ i, ¬Row(`, S)(k) . s by an M-alternating path. Since there is no M- augmenting path, all the vertices in R are M-saturated. We want to show After finish constructing Row(i + 1,S), we can update the following two claims. T by setting T (j, k) to 1 for every k ∈ Row(i, S) and Claim 1: The vertices in R∩X are all matched to the vertices j ∈ Row(i + 1,S) satisfying EH (k, j). in R ∩ Y by M, and |R ∩ X| = |R ∩ Y |. Suppose for a Intuitively, Row(i, S) encodes the set of vertices that are of contradiction that some vertex v ∈ R is not matched to some distance i − 1 from s, and T is the auxiliary matrix that can vertex u ∈ R by M. Since we already know that all vertices R be used to recover a path from s to any vertex j ∈ Row(i, S) are M-saturated, v is matched by some vertex w 6∈ R by M. for all i ∈ [2n]. But this is a contradiction since w must be reachable from s Sn Remark 1. Let R = i=2 Row(i, S), then R the set of vertices by an alternating path, and so the augmenting-path searching reachable from s by an M-alternating path. This follows from algorithm must already have added w to R. Thus, the vertices the fact that our construction mimics the construction of the in R ∩ X are all matched to the vertices in R ∩ Y by M, formula δCONN, which was used to define the theory VNL for which implies that |R ∩ X| = |R ∩ Y |. Claim 2: N(R ∩ X) = R ∩ Y . We claim that (~u0,~v0) is again a weight cover. The condition 0 0 Since R ∩ X are matched to R ∩ Y , we know N(R ∩ X) ⊇ wi,j ≤ ui + vj might only be violated for xi ∈ S and yi 6∈ R ∩ Y . Suppose for a contradiction that N(R ∩ X) ⊃ R ∩ Y . N(S). But since we chose δ ≤ ui + vj − wi,j, it follows that Let v ∈ N(R ∩ X) \ R ∩ Y , and u ∈ R ∩ X be the vertex w ≤ (u − δ) + v = u0 + v0 . adjacent to v. Since u is reachable from s by an M- alternating i,j i j i j path P , we can extend P to get an M-alternating path from Since s to v, which contradicts that v is not added to R. 0 0 Pn 0 0 We note that N({s} ∪ (R ∩ X)) = R ∩ Y . Let S = {s} ∪ cost(~u ,~v ) = i=1(ui + vi) Pn (R ∩ X). Then S is the desired set since = i=1(ui + vi) + δ|N(S)| − δ|S|, | {z } | {z } |N(S)| = |R ∩ Y | = |R ∩ X| < |S|. =cost(~u,~v) <0 it follows that cost(~u0,~v0) < cost(~u,~v). (3)=⇒(1): Suppose M is a perfect matching of H. Then From the proof of Hall’s Theorem, we have the following w = u + v holds for all edges in M. Summing equali- corollary saying that if a bipartite graph does not have a perfect i,j i j ties w = u + v over all edges of M yields the equality matching, then we can find a subset of vertices that violates i,j i j cost(~u,~v) = w(M). Hall’s condition in polytime. Corollary 1. (VPV `) There is a VPV function that, on input D. Using the Hungarian algorithm to find a minimum-weight a bipartite graph G that does not have a perfect matching, perfect matching outputs a subset S ⊆ X such that |S| > |N(S)|. In this section, we will elaborate a bit more on the correct- ness of Algorithm 4. We will show that there is a VPV function C. Proof of Theorem 5 Mwpm(n, E, ~w), where the matrix En×n encodes the edge relation of a bipartite graph and ~w is a nonnegative weight Let H = H~u,~v be the equality subgraph for the weight cover (~u,~v), and let M be a maximum matching of H. Recall assignment to the edges in E, and Mwpm(n, E, ~w) outputs Theorem 5 wants us to show that VPV proves equivalence of the minimum-weight perfect matching if such matching exists. the following three statements: Note that the Hungarian algorithm we formalized previously returns a maximum weight matching and not minimum-weight 1) w(M) = cost(~u,~v) matching, and the matching might not be perfect. However, we 2) M is a maximum-weight matching and the cover (~u,~v) can use the Hungarian algorithm to define Mwpm(n, E, ~w) as is a minimum-weight cover of G follows. 3) M is a perfect matching of H  1: Let c = n · max w | (i, j) ∈ E + 1 Proof of Theorem 5: (1)=⇒(2): Assume that cost(~u,~v) = i,j 2: Construct the sequence ~w0 as follows w(M). By Lemma 2, no matching has weight greater than ( cost(~u,~v), and no cover with weight less than w(M). c − w if (i, j) ∈ E w0 = i,j (2)=⇒(3): Assume M is a maximum-weight matching and i,j 0 otherwise. (~u,~v) is a minimum-weight cover of G. Suppose for a con- tradiction that the maximum matching M is not a perfect 3: Run the Hungarian algorithm on the complete bipartite matching of H. We will construct a weight cover whose cost 0 graph Kn,n with weight assignment ~w to get a maximum- is strictly less than cost(~u,~v), which is absurd since (~u,~v) is weight matching M. a minimum-weight cover. 4: if M contains an edge that is not in E then Since the maximum matching M is not a perfect matching 5: return the empty matching ∅ of H, by Corollary 1, we can construct in polytime a subset 6: else S ⊆ X satisfying 7: return M |N(S)| < |S|. 8: end if Then we can calculate the quantity Note that we assign zero weights to the edges not present and very large weights to other edges, the Hungarian algorithm δ = min{ui + vj − wi,j | xi ∈ S ∧ yj 6∈ N(S)}, will always prefer the edges that are present in the original bipartite graph. More formally, for any perfect matching M 0 0 n 0 and construct a pair of sequences ~u = huiii=1 and ~v = and non-perfect matching N, we have 0 n hviii=1, as follows w0(M) ≥ nc − n · maxw | (i, j) ∈ E ( i,j ui − δ if xi ∈ S   0 = (n − 1)c + c − n · max wi,j | (i, j) ∈ E ui = ui if xi 6∈ S = (n − 1)c + 1 ( v + δ if y ∈ N(S) > (n − 1)c v0 = j j i 0 vj if yj 6∈ N(S) ≥ w (N) 0 The last inequality follows from the fact that wi,j ≤ c for matching M, we delete the row i and column ji, where ji is all (i, j) ∈ N. Thus, if the Hungarian algorithm returns a the index satisfying P (i, ji) = (i, k). So we can never match matching M with at least one edge not in E, then the original any vertex to k again. graph cannot have a perfect matching. Also from the way the It remains to show that w(M) ≤ p. Since weight assignment ~w0 was defined, every maximum-weight n 0 Y w(M) perfect matching of Kn,n with weight assignment ~w is a C`(`, j`) = 2 , minimum-weight matching of the original bipartite graph. `=1 the binary expansion of Qn C (`, j ) has a unique one at APPENDIX III `=1 ` ` position w(M) and zeros elsewhere. Thus, it follows from PROOFOF LEMMA 3 how the matching M was constructed that w(M) ≤ p. Proof: We construct a sequence of matrices APPENDIX IV Cn,Cn−1,...,C1 PROOFOF THEOREM 8 where Cn = B and Ci−1 = Ci[i | ji] for i = n . . . , 2 and the Proof: Let P be a “position” matrix, where P (i, j) = index ji is chosen as follows. Assume we are given jn, . . . , ji+1 (i, j) for all i, j ∈ [n]. We construct two matrix sequences and the least significant 1-bit in the binary expansion of Cn,Cn−1,...,C1 Qn,Qn−1,...,Q1 Qn  `=i+1 C`(`, j`) Det(Ci) where • we let Cn = B and Qn = P , and is at most p, we want to choose ji so that the least significant • for i = 2, . . . , n, let j be the index satisfying 1-bit of i n ! n ! (i, ji) = Pi(i, M(i)), Y Y C`(`, j`) Det(Ci−1) = C`(`, j`) Det(Ci[i | ji]) then we let Ci−1 = Ci[i | ji] and Qi−1 = Qi[i | ji]. `=i `=i B We will prove by Σ0 (LFP) induction on i = 1, . . . , n that the is also at most p. This can be done as follows. From the position of the least significant 1-bit of cofactor expansion, we have Qn  C`(`, j`) Det(Ci) Qn  `=i+1 C`(`, j`) Det(Ci) `=i+1 is always p. The base case i = 1 trivially holds. For the Pi i+jQn  Qn  = j=1(−1) `=i+1 C`(`, j`) Ci(i, j)Det(Ci[i | j]). inductive case, we assume that `=i+1 C`(`, j`) Det(Ci) has the least significant 1-bit at position p, we want to show Since the sum has the least significant 1-bit at position p, at   Qn C (`, j ) Det(C ) least one of the term in the sum must have the least significant that `=i+2 ` ` i+1 also has least significant 1-bit at position p. From the cofactor expansion formula, 1-bit at position at most p. Thus, we can choose ji such that the position of the least significant 1-bit of Det(Ci+1) Qn  i+1 `=i+1 C`(`, j`) Ci(i, ji)Det(Ci[i | ji]) X (i+1)+j = (−1) Ci+1(i + 1, j)Det(Ci+1[i + 1 | j]), is minimized, which guarantees that the least significant 1-bit j=1 in its binary expansion is at most p. From the construction and thus it suffices to show that the least significant 1-bit in B of Ci, we can prove by Σ0 (LFP) induction on i = n, . . . , 1 the binary expansion of the term that the position of the least significant 1-bit in the binary ex- Qn  Ti+1 := Ci+1(i + 1, ji+1)Det(Ci+1[i + 1 | ji+1]) pansion of `=i+1 C`(`, j`) Det(Ci) is at most p. Observe that, when i = 1, we actually have is strictly less than that of   Qn Qn Ci+1(i + 1, j)Det(Ci+1[i + 1 | j]) `=i+1 C`(`, j`) Det(Ci) = `=1 C`(`, j`). for all j 6= j . Thus, the position of the least significant 1-bit of the binary i+1 Qn Let q be the position of least significant 1-bit in the binary expansion of C`(`, j`) is at most p. `=1 expansion of Ti+1. Suppose for a contradiction that there is Similarly to the proof of Theorem 1, we can extract the 0 some j 6= j` such that least significant 1-bit of desired perfect matching by letting P be a “position” matrix, 0 0 where P (i, j) = (i, j) for all i, j ∈ [n]. Then we compute Ci+1(i + 1, j )Det(Ci+1[i + 1 | j ]) another sequence of matrices is at most q. Then, we can extend the set of edges 0 Qn,Qn−1,...,Q1, {(n, M(n)),..., (i + 2,M(i + 2)), (i + 1, j )} 0 where Qn = P and Qi−1 = Qi[i | ji], i.e., we delete from Qi with i edges we extract from Ci+1[i+1 | j ] (using the method exactly the row and column we deleted from Ci. from Lemma 3) to get a perfect matching of G with weight To prove that M = {Q`(`, j`) | 1 ≤ ` ≤ n} is a perfect at most p, which contradicts that M is the unique minimum- matching, we note that whenever a pair (i, k) is added to the weight perfect matching of G. APPENDIX V PROOFOF THEOREM 9 Proof: (=⇒): Assume (i, j) ∈ M. Then the bipartite 0 graph G = G \{ui, vj} must have a unique minimum-weight perfect matching of weight w(M) − wi,j. Thus, from Theo- rem 8, the position of the least significant 1-bit of the minor Det(B[i | j]) is w(M) − wi,j. (⇐=): We prove the contrapositive. Assume (i, j) 6∈ M. Suppose for a contraction that the position of the least signifi- cant 1-bit of Det(B[i | j]) is w(M)−wi,j. Then by Lemma 3, we can extract from the submatrix B[i | j] a perfect matching 0 Q of the bipartite graph G = G\{ui, vj} with weight at most 0 w(M) − wi,j. But then M = Q ∪ {(i, j)} is another perfect matching of G with w(M 0) ≤ w(M), a contradiction.