
COMPUTING THE PERMANENT OF (SOME) COMPLEX MATRICES

Alexander Barvinok

June 2014

Abstract. We present a deterministic algorithm which, for any given 0 < ε < 1 and an n × n real or complex matrix A = (a_{ij}) such that |a_{ij} − 1| ≤ 0.19 for all i, j, computes the permanent of A within relative error ε in n^{O(ln n − ln ε)} time. The method can be extended to computing hafnians and multidimensional permanents.

1. Introduction and main results

The permanent of an n × n matrix A = (a_{ij}) is defined as

per A = ∑_{σ ∈ S_n} ∏_{i=1}^{n} a_{iσ(i)},

where S_n is the symmetric group of permutations of the set {1, . . . , n}. The problem of efficient computation of the permanent has attracted a lot of attention. It is #P-hard already for 0-1 matrices [Va79], but a fully polynomial randomized approximation scheme, based on the Monte Carlo approach, is constructed for all non-negative matrices [J+04]. A deterministic polynomial time algorithm based on matrix scaling that approximates the permanent of a non-negative matrix within a factor of e^n is constructed in [L+00], and the bound was recently improved to 2^n in [GS13]. An approach based on the idea of "correlation decay" from statistical physics results in a deterministic polynomial time algorithm approximating per A within a factor of (1 + ε)^n for any ε > 0, fixed in advance, if A is the adjacency matrix of a constant degree expander [GK10]. There is also interest in computing permanents of complex matrices [AA13].

1991 Mathematics Subject Classification. 15A15, 68C25, 68W25. Key words and phrases. permanent, hafnian, algorithm. This research was partially supported by NSF Grant DMS 0856640.

The well-known Ryser's algorithm (see, for example, Chapter 7 of [Mi78]) computes the permanent of a matrix A over any field in O(n 2^n) time. A randomized approximation algorithm of [Fü00] computes the permanent of a complex matrix within a (properly defined) relative error ε in O(3^{n/2} ε^{−2}) time. The randomized algorithm of [Gu05], see also [AA13] for an exposition, computes the permanent of a complex matrix A in time polynomial in n and 1/ε within an additive error of ε‖A‖^n, where ‖A‖ is the operator norm of A.

In this paper, we present a new approach to computing permanents of real or complex matrices A and show that if |a_{ij} − 1| ≤ γ for some absolute constant γ > 0 (we can choose γ = 0.19) and all i and j, then, for any ε > 0, the value of per A can be computed within relative error ε in n^{O(ln n − ln ε)} time (we say that α ∈ C approximates per A within relative error 0 < ε < 1 if per A = α(1 + ρ), where |ρ| < ε). We also discuss how the method can be extended to computing hafnians of symmetric matrices and multidimensional permanents of tensors.

(1.1) The idea of the algorithm. Let J denote the n × n matrix filled with 1s. Given an n × n complex matrix A, we consider (a branch of) the univariate function

(1.1.1)   f(z) = ln per(J + z(A − J)).

Clearly, f(0) = ln per J = ln n! and f(1) = ln per A. Hence our goal is to approximate f(1), and we do it by using the Taylor polynomial expansion of f at z = 0:

(1.1.2)   f(1) ≈ f(0) + ∑_{k=1}^{m} (1/k!) d^k/dz^k f(z) |_{z=0}.

It turns out that the right hand side of (1.1.2) can be computed in n^{O(m)} time. We present the algorithm in Section 2. The quality of the approximation (1.1.2) depends on the location of complex zeros of the permanent.

(1.2) Lemma. Suppose that there exists a real β > 1 such that

per(J + z(A − J)) ≠ 0 for all z ∈ C satisfying |z| ≤ β.

Then for all z ∈ C with |z| ≤ 1 the value of f(z) = ln per(J + z(A − J)) is well-defined by the choice of the branch of the logarithm for which f(0) is a real number, and the right hand side of (1.1.2) approximates f(1) within an additive error of

n / ( (m + 1) β^m (β − 1) ).

In particular, for a fixed β > 1, to ensure an additive error of 0 < ε < 1, we can choose m = O(ln n − ln ε), which results in an algorithm approximating per A within relative error ε in n^{O(ln n − ln ε)} time.

We prove Lemma 1.2 in Section 2. Thus we have to identify a class of matrices A for which the number β > 1 of Lemma 1.2 exists. We prove the following result.

(1.3) Theorem. There is an absolute constant δ > 0 (we can choose δ = 0.195) such that if Z = (z_{ij}) is a complex n × n matrix satisfying

|z_{ij} − 1| ≤ δ for all i, j,

then per Z ≠ 0.
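Although it plays no role in the argument, Theorem 1.3 is easy to spot-check numerically on small matrices. The Python sketch below computes the permanent by brute force directly from the definition and samples random matrices from the polydisc |z_{ij} − 1| ≤ 0.195; the sampler, the matrix size and the number of trials are arbitrary illustrative choices, not taken from the paper.

```python
import cmath
import random
from itertools import permutations

def per(A):
    """Permanent of a square matrix, computed directly from the definition
    (O(n * n!) time, so only usable as a reference for tiny n)."""
    n = len(A)
    total = 0
    for sigma in permutations(range(n)):
        p = 1
        for i in range(n):
            p *= A[i][sigma[i]]
        total += p
    return total

def random_polydisc_matrix(n, delta):
    """Random n x n complex matrix with |z_ij - 1| <= delta."""
    def entry():
        r = delta * random.random() ** 0.5            # radius, uniform in the disc
        return 1 + r * cmath.exp(2j * cmath.pi * random.random())
    return [[entry() for _ in range(n)] for _ in range(n)]

random.seed(0)
smallest = min(abs(per(random_polydisc_matrix(5, 0.195))) for _ in range(1000))
print("smallest |per Z| over 1000 samples:", smallest)   # observed to stay far from 0
```

The same brute-force per is convenient as a reference value when testing the approximation algorithm of Section 2 on small matrices.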

We prove Theorem 1.3 in Section 3. For any matrix A = (a_{ij}) satisfying

|a_{ij} − 1| ≤ 0.19 for all i, j,

we can choose β = 195/190 in Lemma 1.2 and thus obtain an approximation algorithm for computing per A.

The sharp value of the constant δ in Theorem 1.3 is not known to the author. A simple example of the 2 × 2 matrix

A = ( (1+i)/2   (1−i)/2 )
    ( (1−i)/2   (1+i)/2 ),

for which per A = 0, shows that in Theorem 1.3 we must have

δ < √2/2 ≈ 0.71.

What is also not clear is whether the constant δ can improve as the size of the matrix grows.

(1.4) Question. Is it true that for any 0 < ε < 1 there is a positive integer N(ε) such that if Z = (z_{ij}) is a complex n × n matrix with n > N(ε) and

|z_{ij} − 1| ≤ 1 − ε for all i, j,

then per Z ≠ 0?

We note that for any 0 < ε < 1, fixed in advance, a deterministic polynomial time algorithm based on scaling approximates the permanent of a given n × n real matrix A = (a_{ij}) satisfying

ε ≤ a_{ij} ≤ 1 for all i, j

within a multiplicative factor of n^{κ(ε)} for some κ(ε) > 0 [BS11].

(1.5) Ramifications. In Section 4, we discuss how our approach can be used for computing hafnians of symmetric matrices and multidimensional permanents of tensors. The same approach can be used for computing partition functions associated with cliques in graphs [Ba14] and graph homomorphisms [BS14]. In each case, the main problem is to come up with a version of Theorem 1.3 bounding the complex roots of the partition function away from the vector of all 1s. Isolating zeros of complex extensions of real partition functions is a problem studied in statistical physics and also in connection with the Lovász Local Lemma, see, for example, [SS05].

2. The algorithm

(2.1) The algorithm for approximating the permanent. Given an n × n complex matrix A = (a_{ij}), we present an algorithm which computes the right hand side of the approximation (1.1.2) for the function f(z) defined by (1.1.1). Let

(2.1.1)   g(z) = per(J + z(A − J)),

so f(z) = ln g(z). Hence

f′(z) = g′(z) / g(z)   and   g′(z) = g(z) f′(z).

Therefore, for k ≥ 1 we have

(2.1.2)   d^k/dz^k g(z) |_{z=0} = ∑_{j=0}^{k−1} \binom{k-1}{j} ( d^j/dz^j g(z) |_{z=0} ) ( d^{k−j}/dz^{k−j} f(z) |_{z=0} )

(we agree that the 0-th derivative of g is g). We note that g(0) = n!. If we compute the values of

(2.1.3)   d^k/dz^k g(z) |_{z=0}   for k = 1, . . . , m,

then the formulas (2.1.2) for k = 1, . . . , m provide a non-degenerate triangular system of linear equations that allows us to compute

d^k/dz^k f(z) |_{z=0}   for k = 1, . . . , m.

Hence our goal is to compute the values (2.1.3). We have

d^k/dz^k g(z) |_{z=0} = d^k/dz^k |_{z=0} ∑_{σ ∈ S_n} ∏_{i=1}^{n} ( 1 + z ( a_{iσ(i)} − 1 ) )

  = ∑_{σ ∈ S_n} ∑_{1 ≤ i_1, . . . , i_k ≤ n} ( a_{i_1 σ(i_1)} − 1 ) ··· ( a_{i_k σ(i_k)} − 1 )

  = (n − k)! ∑_{1 ≤ i_1, . . . , i_k ≤ n; 1 ≤ j_1, . . . , j_k ≤ n} ( a_{i_1 j_1} − 1 ) ··· ( a_{i_k j_k} − 1 ),

where the sums over i_1, . . . , i_k and over j_1, . . . , j_k are over tuples of pairwise distinct indices, that is, the last sum is over all pairs of ordered k-subsets (i_1, . . . , i_k) and (j_1, . . . , j_k) of the set {1, . . . , n}. Since the last sum contains (n!/(n − k)!)^2 = n^{O(k)} terms, the complexity of the algorithm is indeed n^{O(m)}.

(2.2) Proof of Lemma 1.2. The function g(z) defined by (2.1.1) is a polynomial in z of degree d ≤ n with g(0) = n! ≠ 0, so we factor

g(z) = g(0) ∏_{i=1}^{d} ( 1 − z/α_i ),

where α_1, . . . , α_d are the roots of g(z). By the condition of Lemma 1.2, we have

|α_i| ≥ β > 1 for i = 1, . . . , d.

Therefore,

(2.2.1)   f(z) = ln g(z) = ln g(0) + ∑_{i=1}^{d} ln( 1 − z/α_i )   for |z| ≤ 1,

where we choose the branch of ln g(z) that is real at z = 0. Using the standard Taylor expansion, we obtain

ln( 1 − 1/α_i ) = − ∑_{k=1}^{m} (1/k) (1/α_i)^k + ζ_m,

where

|ζ_m| = | ∑_{k=m+1}^{+∞} (1/k) (1/α_i)^k | ≤ 1 / ( (m + 1) β^m (β − 1) ).

Therefore, from (2.2.1) we obtain

f(1) = f(0) + ∑_{k=1}^{m} ( −(1/k) ∑_{i=1}^{d} (1/α_i)^k ) + η_m,

where

|η_m| ≤ n / ( (m + 1) β^m (β − 1) ).

It remains to notice that

−(1/k) ∑_{i=1}^{d} (1/α_i)^k = (1/k!) d^k/dz^k f(z) |_{z=0}.  □

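To make the algorithm of this section concrete, here is a minimal Python transcription (an illustration only, not taken from the paper; the function name and the brute-force enumeration over pairs of ordered k-tuples are illustrative choices). It computes the derivatives (2.1.3) by the last displayed formula of Section 2.1, solves the triangular system (2.1.2), and returns n! · exp of the truncated Taylor series (1.1.2). The double enumeration makes the n^{O(m)} complexity explicit, so the sketch is only practical for small n and m.

```python
import cmath
import math
from itertools import permutations

def taylor_permanent(A, m):
    """Approximate per A by the Taylor scheme of Section 2 (small n only).

    Assumes the entries of A are close enough to 1 that ln per(J + z(A - J))
    is analytic on a disc of radius > 1 (Lemma 1.2 and Theorem 1.3)."""
    n = len(A)
    # g^{(k)}(0) = (n-k)! * sum over ordered k-tuples of pairwise distinct rows
    # and pairwise distinct columns of prod_l (a_{i_l j_l} - 1); g(0) = n!.
    g = [complex(math.factorial(n))]
    for k in range(1, m + 1):
        total = 0j
        for rows in permutations(range(n), k):
            for cols in permutations(range(n), k):
                p = 1 + 0j
                for i, j in zip(rows, cols):
                    p *= A[i][j] - 1
                total += p
        g.append(math.factorial(n - k) * total)
    # Triangular system (2.1.2):
    # g^{(k)}(0) = sum_{j=0}^{k-1} binom(k-1, j) g^{(j)}(0) f^{(k-j)}(0).
    f = [0j] * (m + 1)          # f[k] holds f^{(k)}(0) for k >= 1
    for k in range(1, m + 1):
        s = sum(math.comb(k - 1, j) * g[j] * f[k - j] for j in range(1, k))
        f[k] = (g[k] - s) / g[0]
    # Taylor approximation (1.1.2): per A ~ n! * exp( sum_k f^{(k)}(0) / k! ).
    return g[0] * cmath.exp(sum(f[k] / math.factorial(k) for k in range(1, m + 1)))
```

Comparing taylor_permanent(A, m) with a brute-force permanent on small matrices whose entries lie within 0.19 of 1 should reproduce the exact value up to the additive error in ln per A guaranteed by Lemma 1.2.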
3. Proof of Theorem 1.3

Let us denote by U^{n×n}(δ) ⊂ C^{n×n} the closed polydisc

U^{n×n}(δ) = { Z = (z_{ij}) : |z_{ij} − 1| ≤ δ for all i, j }.

Thus Theorem 1.3 asserts that per Z ≠ 0 for Z ∈ U^{n×n}(δ) and δ = 0.195. First, we establish a simple geometric lemma.

(3.1) Lemma. Let u_1, . . . , u_n ∈ R^d be non-zero vectors such that for some 0 ≤ α < π/2 the angle between any two vectors u_i and u_j does not exceed α. Let u = u_1 + · · · + u_n. Then

‖u‖ ≥ √(cos α) ∑_{i=1}^{n} ‖u_i‖.

Proof. We have

‖u‖^2 = ∑_{1≤i,j≤n} ⟨u_i, u_j⟩ ≥ ∑_{1≤i,j≤n} ‖u_i‖ ‖u_j‖ cos α = (cos α) ( ∑_{i=1}^{n} ‖u_i‖ )^2,

and the proof follows. □

We prove Theorem 1.3 by induction on n, using Lemma 3.1 and the following two lemmas.

(3.2) Lemma. For an n × n matrix Z = (z_{ij}) and j = 1, . . . , n, let Z_j be the (n − 1) × (n − 1) matrix obtained from Z by crossing out the first row and the j-th column of Z. Suppose that for some δ > 0 and some 0 < τ < 1, for any Z ∈ U^{n×n}(δ) we have per Z ≠ 0 and

|per Z| ≥ τ ∑_{j=1}^{n} |z_{1j}| |per Z_j|.

Let A, B ∈ U^{n×n}(δ) be any two n × n matrices that differ in one column (or in one row) only. Then the angle between the two complex numbers per A and per B, interpreted as vectors in R^2 = C, does not exceed

θ = 2δ / ( (1 − δ) τ ).

Proof. Since per Z ≠ 0 for all Z ∈ U^{n×n}(δ), we may consider a branch of ln per Z defined for Z ∈ U^{n×n}(δ). Using the expansion

(3.2.1)   per Z = ∑_{j=1}^{n} z_{1j} per Z_j,

we conclude that

∂/∂z_{1j} ln per Z = per Z_j / per Z   for j = 1, . . . , n.

Therefore, since |z_{1j}| ≥ 1 − δ for j = 1, . . . , n, we conclude that for any Z ∈ U^{n×n}(δ), we have

(3.2.2)   ∑_{j=1}^{n} | ∂/∂z_{1j} ln per Z | ≤ 1 / ( (1 − δ) τ ).

Since the permanent is invariant under permutations of rows, permutations of columns and taking the transpose of the matrix, without loss of generality we may assume that the matrix B ∈ U^{n×n}(δ) is obtained from A ∈ U^{n×n}(δ) by replacing the entries a_{1j} by numbers b_{1j} such that

|b_{1j} − 1| ≤ δ for j = 1, . . . , n.

Then

|ln per A − ln per B| ≤ ( sup_{Z ∈ U^{n×n}(δ)} ∑_{j=1}^{n} | ∂/∂z_{1j} ln per Z | ) ( max_{j=1,...,n} |a_{1j} − b_{1j}| ).

Since |b_{1j} − a_{1j}| ≤ 2δ for all j = 1, . . . , n, the proof follows from (3.2.2). □

(3.3) Lemma. Suppose that for some

0 ≤ θ < π/2 − 2 arcsin δ

and for any two matrices A, B ∈ U^{n×n}(δ) which differ in one row (or in one column), the angle between the two complex numbers per A and per B, interpreted as vectors in R^2 = C, does not exceed θ. Then for any matrix Z ∈ U^{(n+1)×(n+1)}(δ), we have

|per Z| ≥ τ ∑_{j=1}^{n+1} |z_{1j}| |per Z_j|

with

τ = √( cos(θ + 2 arcsin δ) ),

where Z_j is the n × n matrix obtained from Z by crossing out the first row and the j-th column.

Proof. We use the first row expansion (3.2.1) and observe that any two matrices Z_j and Z_k can be obtained from one another by replacing one column and permuting columns. Therefore, the angle between any two complex numbers per Z_j and per Z_k does not exceed θ. Since

−arcsin δ ≤ arg z_{1j} ≤ arcsin δ for j = 1, . . . , n,

the angle between any two numbers z_{1j} per Z_j and z_{1k} per Z_k does not exceed θ + 2 arcsin δ. The proof follows by Lemma 3.1. □

(3.4) Proof of Theorem 1.3. One can see that for a sufficiently small δ > 0, the equation

(3.4.1)   θ = 2δ / ( (1 − δ) √( cos(θ + 2 arcsin δ) ) )

has a solution 0 < θ < π/2. Numerical computations show that we can choose δ = 0.195 and θ ≈ 0.7611025127. Let τ = √( cos(θ + 2 arcsin δ) ) ≈ 0.6365398112. We proceed by induction on n. More precisely, we prove the following three statements (3.4.2)–(3.4.4) by induction on n:

(3.4.2) For every Z ∈ U^{n×n}(δ), we have per Z ≠ 0;

(3.4.3) Suppose A, B ∈ U^{n×n}(δ) are two matrices which differ by one row (or one column). Then the angle between the two complex numbers per A and per B, interpreted as vectors in R^2 = C, does not exceed θ;

(3.4.4) For a matrix Z ∈ U^{n×n}(δ), Z = (z_{ij}), let Z_j be the (n − 1) × (n − 1) matrix obtained by crossing out the first row and the j-th column. Then

|per Z| ≥ τ ∑_{j=1}^{n} |z_{1j}| |per Z_j|.

For n = 1 the statement (3.4.2) is obviously true. Moreover, the angle between any two numbers a, b ∈ U^{1×1}(δ) does not exceed

2 arcsin δ ≈ 0.3925149004 < θ,

so (3.4.3) holds as well. The statement (3.4.4) is vacuous. Lemma 3.3 implies that if the statement (3.4.3) holds for n × n matrices then the statement (3.4.4) holds for (n + 1) × (n + 1) matrices. The statement (3.4.4) for (n + 1) × (n + 1) matrices together with the statement (3.4.2) for n × n matrices implies the statement (3.4.2) for (n + 1) × (n + 1) matrices. Finally, Lemma 3.2 implies that if the statement (3.4.4) holds for (n + 1) × (n + 1) matrices then the statement (3.4.3) holds for (n + 1) × (n + 1) matrices. This concludes the proof of (3.4.2)–(3.4.4) for all positive integers n. □
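The numerical values quoted in (3.4) are easy to reproduce; the sketch below (not part of the paper) solves equation (3.4.1) for δ = 0.195 by naive fixed-point iteration, an arbitrary choice of method whose convergence is simply observed here rather than proved.

```python
import math

def solve_341(delta, iterations=500):
    """Solve (3.4.1): theta = 2*delta / ((1 - delta) * sqrt(cos(theta + 2*asin(delta))))
    by fixed-point iteration starting from theta = 0."""
    theta = 0.0
    for _ in range(iterations):
        theta = 2 * delta / ((1 - delta) * math.sqrt(math.cos(theta + 2 * math.asin(delta))))
    return theta

theta = solve_341(0.195)
tau = math.sqrt(math.cos(theta + 2 * math.asin(0.195)))
print(theta, tau)   # approximately 0.7611025127 and 0.6365398112, as quoted above
```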

4. Ramifications

A similar approach can be applied to computing other quantities of interest.

(4.1) Hafnians. Let A = (a_{ij}) be a 2n × 2n real or complex matrix. The quantity

haf A = ∑_{ {i_1, j_1}, . . . , {i_n, j_n} } a_{i_1 j_1} ··· a_{i_n j_n},

where the sum is taken over all (2n)!/(n! 2^n) unordered partitions of the set {1, . . . , 2n} into n pairwise disjoint unordered pairs {i_1, j_1}, . . . , {i_n, j_n}, is called the hafnian of A, see, for example, Section 8.2 of [Mi78]. For any n × n matrix A we have

haf ( 0     A )
    ( A^T   0 )  =  per A,

and hence computing the permanent of an n × n matrix reduces to computing the hafnian of a symmetric 2n × 2n matrix. The computational complexity of hafnians is understood less well than that of permanents. Unlike in the case of the permanent, no fully polynomial approximation scheme (randomized or deterministic) is known to compute the hafnian of a non-negative real symmetric matrix. Unlike in the case of the permanent, no deterministic polynomial time algorithm approximating the hafnian of a 2n × 2n non-negative symmetric matrix within a factor of c^n, where c > 0 is an absolute constant, is known. On the other hand, there is a polynomial time randomized algorithm, based on the representation of the hafnian as the expectation of the determinant of a random matrix, which approximates the hafnian of a given non-negative symmetric 2n × 2n matrix within a factor of c^n, where c ≈ 0.56 [Ba99]. Also, for any 0 < ε < 1 fixed in advance, there is a deterministic polynomial time algorithm based on scaling, which, given a 2n × 2n symmetric matrix A = (a_{ij}) satisfying

ε ≤ a_{ij} ≤ 1 for all i, j,

computes haf A within a multiplicative factor of n^{κ(ε)} for some κ(ε) > 0 [BS11]. With minimal changes, the approach of this paper can be applied to computing hafnians. Namely, let J denote the 2n × 2n matrix filled with 1s and let us define

f(z) = ln haf( J + z(A − J) ).

Then

f(0) = ln haf J = ln( (2n)! / (n! 2^n) )   and   f(1) = ln haf A,

and one can use the Taylor polynomial approximation (1.1.2) to estimate f(1). As in Section 2, one can compute the right hand side of (1.1.2) in n^{O(m)} time. The statement and the proof of Theorem 1.3 carry over to hafnians almost verbatim. Namely, let δ > 0 be a real number for which the equation (3.4.1) has a solution 0 < θ < π/2 (hence one can choose δ = 0.195). Then haf Z ≠ 0 as long as Z = (z_{ij}) is a 2n × 2n symmetric complex matrix satisfying

|z_{ij} − 1| ≤ δ for all i, j.
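As an aside (not from the paper), a small brute-force hafnian, written via the first-row expansion used below, is handy for spot-checking both this non-vanishing claim and the identity relating the hafnian of (0 A; A^T 0) to per A on small matrices; the function names are illustrative choices.

```python
def hafnian(B):
    """Hafnian of a symmetric matrix of even order, by expansion along the first row.
    Exponential time; intended only for small sanity checks."""
    m = len(B)
    if m == 0:
        return 1
    total = 0
    for j in range(1, m):
        rest = [r for r in range(1, m) if r != j]        # remove rows/columns 0 and j
        minor = [[B[r][c] for c in rest] for r in rest]
        total += B[0][j] * hafnian(minor)
    return total

def block_matrix(A):
    """The symmetric 2n x 2n matrix (0 A; A^T 0) built from an n x n matrix A."""
    n = len(A)
    Z = [[0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            Z[i][n + j] = A[i][j]
            Z[n + j][i] = A[i][j]
    return Z
```

For any small n × n matrix A, hafnian(block_matrix(A)) coincides with the brute-force permanent of A, and for random symmetric 2n × 2n matrices with entries within 0.195 of 1 the value hafnian(Z) is observed to stay away from zero, in line with the claim above.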

Instead of the row expansion of the permanent (3.2.1) used in Lemmas 3.2 and 3.3, one should use the row expansion of the hafnian

haf Z = ∑_{j=2}^{2n} z_{1j} haf Z_j,

where Z_j is the symmetric (2n − 2) × (2n − 2) matrix obtained from Z by crossing out the first and the j-th row and the first and the j-th column. As in Section 2, we obtain an algorithm of n^{O(ln n − ln ε)} complexity approximating haf Z within relative error ε > 0, where Z = (z_{ij}) is a 2n × 2n symmetric complex matrix satisfying

|z_{ij} − 1| ≤ γ for all i, j,

and γ > 0 is an absolute constant (one can choose γ = 0.19).

(4.2) Multidimensional permanents. Let us fix an integer ν ≥ 2 and let

A = (a_{i_1 ... i_ν}),   1 ≤ i_1, . . . , i_ν ≤ n,

be a ν-dimensional cubical n × · · · × n array of real or complex numbers. We define

PER A = ∑_{σ_1, . . . , σ_{ν−1} ∈ S_n} ∏_{i=1}^{n} a_{i σ_1(i) . . . σ_{ν−1}(i)}.
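To make the definition concrete, here is a small brute-force evaluator (an illustration, not from the paper); it enumerates all (n!)^{ν−1} tuples of permutations and is therefore usable only for tiny n and ν.

```python
from itertools import permutations, product

def multi_per(A, nu):
    """PER of a nu-dimensional cubical n x ... x n array A given as nested lists,
    directly from the definition: a sum over (nu - 1)-tuples of permutations."""
    n = len(A)
    total = 0
    for sigmas in product(permutations(range(n)), repeat=nu - 1):
        p = 1
        for i in range(n):
            entry = A[i]
            for sigma in sigmas:
                entry = entry[sigma[i]]    # pick the index sigma_t(i) in dimension t + 1
            p *= entry
        total += p
    return total
```

For example, multi_per([[1, 2], [3, 4]], 2) returns 10, the permanent of that 2 × 2 matrix, which gives a quick consistency check with the case ν = 2 discussed next.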

If ν = 2 then A is an n × n matrix and PER A = per A. For ν > 2 it is already an

NP-hard problem to tell PER A from 0 even if a_{i_1 ... i_ν} ∈ {0, 1}, since the problem reduces to detecting a perfect matching in a hypergraph, see, for example, Problem SP1 in [A+99]. However, for any 0 < ε < 1, fixed in advance, there is a polynomial time deterministic algorithm based on scaling, which, given a real array A satisfying

ε ≤ a_{i_1 ... i_ν} ≤ 1 for all 1 ≤ i_1, . . . , i_ν ≤ n,

computes PER A within a multiplicative factor of n^{κ(ε,ν)} for some κ(ε, ν) > 0 [BS11]. With some modifications, the method of this paper can be applied to computing this multidimensional version of the permanent. Namely, let J be the array filled with 1s and let us define

f(z) = ln PER( J + z(A − J) ).

Then

f(0) = ln PER J = (ν − 1) ln n!   and   f(1) = ln PER A,

and one can use the Taylor polynomial approximation (1.1.2) to estimate f(1). As in Section 2, one can compute the right hand side of (1.1.2) in n^{O(m)} time, where the implicit constant in “O(m)” depends on ν. The proof of Theorem 1.3 carries over to multidimensional permanents with some modifications. Namely, for some sufficiently small δ_ν > 0 the equation

θ = 2δ_ν / ( (1 − δ_ν) √( cos( (ν − 1)θ + 2 arcsin δ_ν ) ) )

has a solution θ ≥ 0 such that (ν − 1)θ + 2 arcsin δ_ν < π/2. For ν = 2, we get the equation (3.4.1) with a possible choice of δ_2 = 0.195, while for ν = 3 we can choose δ_3 = 0.125 and for ν = 4 we can choose δ_4 = 0.093. Then PER Z ≠ 0 as long as

Z = (z_{i_1 ... i_ν}) is an array of complex numbers satisfying

|z_{i_1 ... i_ν} − 1| ≤ δ_ν for all 1 ≤ i_1, . . . , i_ν ≤ n.
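In the same spirit as the check after the proof of Theorem 1.3, the constants δ_ν quoted above can be reproduced numerically; the sketch below (an illustration, with fixed-point iteration again an arbitrary choice of method) also verifies the constraint (ν − 1)θ + 2 arcsin δ_ν < π/2 along the way.

```python
import math

def theta_nu(nu, delta, iterations=500):
    """Fixed-point iteration for the equation above; returns theta, or None if the
    constraint (nu - 1)*theta + 2*asin(delta) < pi/2 fails during the iteration."""
    theta = 0.0
    for _ in range(iterations):
        x = (nu - 1) * theta + 2 * math.asin(delta)
        if x >= math.pi / 2:
            return None
        theta = 2 * delta / ((1 - delta) * math.sqrt(math.cos(x)))
    return theta

for nu, delta in [(2, 0.195), (3, 0.125), (4, 0.093)]:
    theta = theta_nu(nu, delta)
    print(nu, delta, theta)   # a solution is found for each of the quoted delta_nu
```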

We proceed as in the proof of Theorem 1.3, only instead of the first row expansion of the permanent (3.2.1) used in Lemmas 3.2 and 3.3, we use the first index expansion

PER Z = ∑_{1 ≤ j_2, . . . , j_ν ≤ n} z_{1 j_2 ... j_ν} PER Z_{j_2 ... j_ν},

where Z_{j_2 ... j_ν} is the ν-dimensional array of size (n − 1) × · · · × (n − 1) obtained from Z by crossing out the section with the first index 1, the section with the second index j_2, and so forth, concluding with crossing out the section with the last index j_ν. As in Section 2, we obtain an algorithm of n^{O(ln n − ln ε)} complexity approximating PER Z within relative error ε > 0, where Z is a ν-dimensional cubical n × · · · × n array of complex numbers satisfying

|z_{i_1 ... i_ν} − 1| ≤ γ_ν for all 1 ≤ i_1, . . . , i_ν ≤ n,

and 0 < γ_ν < δ_ν are absolute constants (one can choose γ_2 = 0.19, γ_3 = 0.12 and γ_4 = 0.09).

References

[AA13] S. Aaronson and A. Arkhipov, The computational complexity of linear optics, Theory of Computing 9 (2013), 143–252.

[A+99] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and M. Protasi, Complexity and Approximation. Combinatorial Optimization Problems and their Approximability Properties, Springer-Verlag, Berlin, 1999.

[Ba99] A. Barvinok, Polynomial time algorithms to approximate permanents and mixed discriminants within a simply exponential factor, Random Structures & Algorithms 14 (1999), no. 1, 29–61.

[Ba14] A. Barvinok, Computing the partition function for cliques in a graph, preprint arXiv:1405.1974 (2014).

[BS11] A. Barvinok and A. Samorodnitsky, Computing the partition function for perfect matchings in a hypergraph, Combinatorics, Probability and Computing 20 (2011), no. 6, 815–835.

[BS14] A. Barvinok and P. Soberón, Computing the partition function for graph homomorphisms, preprint arXiv:1406.1771 (2014).

[C+13] J.-Y. Cai, X. Chen and P. Lu, Graph homomorphisms with complex values: a dichotomy theorem, SIAM Journal on Computing 42 (2013), no. 3, 924–1029.

[Fü00] M. Fürer, Approximating permanents of complex matrices, Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing, ACM, New York, 2000, pp. 667–669.

[GK10] D. Gamarnik and D. Katz, A deterministic approximation algorithm for computing the permanent of a 0, 1 matrix, Journal of Computer and System Sciences 76 (2010), no. 8, 879–883.

[Gu05] L. Gurvits, On the complexity of mixed discriminants and related problems, Mathematical Foundations of Computer Science 2005, Lecture Notes in Computer Science, vol. 3618, Springer, Berlin, 2005, pp. 447–458.

[GS13] L. Gurvits and A. Samorodnitsky, Bounds on the permanent and some applications, preprint available at http://www.cs.huji.ac.il/~salex/ (2013).

[J+04] M. Jerrum, A. Sinclair and E. Vigoda, A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries, Journal of the ACM 51 (2004), no. 4, 671–697.

[L+00] N. Linial, A. Samorodnitsky, and A. Wigderson, A deterministic strongly polynomial algorithm for matrix scaling and approximate permanents, Combinatorica 20 (2000), no. 4, 545–568.

[Mi78] H. Minc, Permanents, Encyclopedia of Mathematics and its Applications, Vol. 6, Addison-Wesley Publishing Co., Reading, Mass., 1978.

[SS05] A.D. Scott and A.D. Sokal, The repulsive lattice gas, the independent-set polynomial, and the Lovász local lemma, Journal of Statistical Physics 118 (2005), no. 5-6, 1151–1261.

[Va79] L.G. Valiant, The complexity of computing the permanent, Theoretical Computer Science 8 (1979), no. 2, 189–201.

Department of Mathematics, University of Michigan, Ann Arbor, MI 48109-1043, USA E-mail address: [email protected]
