Parameterization of Tensor Network Contraction
Bryan O'Gorman
Berkeley Quantum Information & Computation Center, University of California, Berkeley, CA, USA
Quantum Artificial Intelligence Laboratory, NASA Ames, Moffett Field, CA, USA
[email protected]

Abstract

We present a conceptually clear and algorithmically useful framework for parameterizing the costs of tensor network contraction. Our framework is completely general, applying to tensor networks with arbitrary bond dimensions, open legs, and hyperedges. The fundamental objects of our framework are rooted and unrooted contraction trees, which represent classes of contraction orders. Properties of a contraction tree correspond directly and precisely to the time and space costs of tensor network contraction. The properties of rooted contraction trees give the costs of parallelized contraction algorithms. We show how contraction trees relate to existing tree-like objects in the graph theory literature, bringing to bear a wide range of graph algorithms and tools on tensor network contraction. Independent of tensor networks, we show that the edge congestion of a graph is almost equal to the branchwidth of its line graph.

2012 ACM Subject Classification Theory of computation → Fixed parameter tractability; Theory of computation → Quantum information theory; Theory of computation → Graph algorithms analysis

Keywords and phrases Tensor networks, Parameterized complexity, Tree embedding, Congestion

Funding The author was supported by a NASA Space Technology Research Fellowship.

Acknowledgements The author thanks Benjamin Villalonga for motivating this work, for useful discussions, and for feedback on the manuscript.

1 Introduction

Tensor networks are widely used in chemistry and physics. Their graphical structure provides an effective way of expressing and reasoning about quantum states and circuits. As a model for quantum states, they have been very successful in expressing the ansatzes of variational algorithms (e.g., PEPS, MPS, and MERA). As a model for quantum circuits, they have been used in state-of-the-art simulations [27, 19, 18, 20, 24]. In the other direction, quantum circuits can also simulate tensor networks, in the sense that (additively approximate) tensor network contraction is complete for quantum computation [3].

The essential computation in the application of tensor networks is tensor network contraction, i.e., computing the single tensor represented by a tensor network. Tensor network contraction is #P-hard in general [6] but fixed-parameter tractable. Markov and Shi [22] defined the contraction complexity of a tensor network and showed that contraction can be done in time that scales exponentially only in the treewidth of the line graph of the tensor network. Given a tree decomposition of the line graph of a tensor network, a contraction order can be found such that the contraction takes time exponential in the width of the decomposition, and vice versa. However, the translation between contraction orders and tree decompositions does not account for polynomial prefactors. This is acceptable in theory, where running times of $O(n2^n)$ and of $O(2^n)$ are both "exponential"; in practice, the difference between $\Theta(n2^n)$ and $\Theta(2^n)$ can be the difference between feasible and infeasible.

We give an alternative characterization of known results in terms of tree embeddings of the tensor network rather than tree decompositions of the line graph thereof; in this context, we call such tree embeddings contraction trees.
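To make the basic operation concrete, here is a minimal sketch (ours, not the paper's; the network, names, and dimensions are illustrative) of contracting a small chain-shaped tensor network as a sequence of pairwise contractions in NumPy. Each pairwise step is a matrix multiplication; different orders yield the same tensor but at different costs.

```python
import numpy as np

d = 2  # every wire has weight 1, so every index has dimension 2^1 = 2

# A four-tensor chain A -i- B -j- C -k- D with one open leg on each end.
A = np.random.rand(d, d)  # indices (a, i); a is an open leg
B = np.random.rand(d, d)  # indices (i, j)
C = np.random.rand(d, d)  # indices (j, k)
D = np.random.rand(d, d)  # indices (k, b); b is an open leg

# One contraction order, ((A*B)*C)*D: each step is a matrix multiplication
# whose cost corresponds to a vertex of the underlying contraction tree.
AB = np.einsum('ai,ij->aj', A, B)
ABC = np.einsum('aj,jk->ak', AB, C)
out = np.einsum('ak,kb->ab', ABC, D)

# Any contraction order yields the same tensor; orders differ only in
# their time and space costs.
assert np.allclose(out, np.einsum('ai,ij,jk,kb->ab', A, B, C, D))
```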
While one can efficiently interconvert between a contraction tree of a tensor network and a tree decomposition of the line graph, contraction trees exactly model, in an abstract way, the matrix multiplications done by a contraction algorithm. That is, the time complexity of contraction is exactly and directly expressed as a property of contraction trees, in contrast to tree decompositions of line graphs, which capture only the exponent. Our approach is thus more intuitive and precise, and easily applies to tensor networks with arbitrary bond dimensions and open legs.

We show that contraction trees also capture the space needed by a matrix-multiplication-based contraction algorithm. In practice, space often competes with time as the limiting constraint. Further, we can express the time used by parallel algorithms as a property of rooted contraction trees, which are to contraction orders as partial orders are to total orders.

In a contraction tree, tensors are assigned to the leaves and each wire is "routed" through the tree from one leaf to another. The congestion of a vertex of the contraction tree is the number of such routings that pass through it, and similarly for the congestion of an edge. The vertex congestion of a graph $G$, denoted $\mathrm{vc}(G)$, is the minimum over contraction trees of the maximum congestion of a vertex, and similarly for the edge congestion, denoted $\mathrm{ec}(G)$. Formally, our main results are the following two theorems.

▶ Theorem 1. A tensor network $(G, M)$ can be contracted in time $n2^{\mathrm{vc}(G)+1}$ and space $n2^{\mathrm{vc}(G)+1}$, or in time $2^{1.5\,\mathrm{ec}(G)+1}$ and space $2^{\mathrm{ec}(G)+1}$. More precisely, the tensor network can be contracted in time $\min_{(T,b)} \sum_{t \in T} 2^{\mathrm{vc}(t)}$, where the minimization is over contraction trees $(T, b)$. The contraction can be done using space equal to the minimum weighted, directed modified cutwidth of a rooted contraction tree using edge weights $w(f) = 2^{\mathrm{ec}(f)}$. If the contraction is done as a series of matrix multiplications, these precise space and time bounds are tight (though not necessarily simultaneously achievable).

▶ Theorem 2. A parallel algorithm can contract a tensor network $(G, M)$ in time $\min_{(T,b)} \max_l \sum_t 2^{\mathrm{vc}(t)}$, where the minimization is over rooted contraction trees $(T, b)$, the maximization is over leaves $l$ of $T$, and the summation is over vertices $t$ on the unique path from the leaf $l$ to the root $r$. In other words, the time is the minimum vertex-weighted height of a rooted contraction tree, where the weight of a vertex $t$ is $w(t) = 2^{\mathrm{vc}(t)}$. If the contraction is done as matrix multiplications in parallel, this is tight.

Given a tree decomposition of a line graph with width $k - 1$, we can efficiently construct a contraction tree of the original graph with vertex congestion $k$. Thus one immediate application of our framework is as a way of precisely assessing the costs of contraction implied by different tree decompositions (even of the same width) computed using existing methods. This is especially useful in distinguishing between contraction orders that have the same time requirements but different space requirements; prior to this work, there was no comprehensive way of quantifying the space requirements, which in practice can be the limiting factor. Alternatively, one can start with existing algorithms for computing good branch decompositions, which can be converted into contraction trees of small edge congestion.
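To illustrate how these quantities can be read off a concrete contraction tree, here is a minimal sketch (our own illustrative example with unit wire weights, not code from the paper) that routes each wire through a given tree, computes the vertex and edge congestions, and evaluates the time bound $\sum_{t \in T} 2^{\mathrm{vc}(t)}$ from Theorem 1:

```python
# Contraction tree as nested pairs; leaves are the tensors of the network.
tree = ((('A', 'B'), 'C'), 'D')
# Wires of the network (unit weights), each connecting two tensors.
wires = [('A', 'B'), ('B', 'C'), ('C', 'D'), ('A', 'D')]

def leaves(t):
    return {t} if isinstance(t, str) else leaves(t[0]) | leaves(t[1])

def subtrees(t):
    yield t
    if not isinstance(t, str):
        yield from subtrees(t[0])
        yield from subtrees(t[1])

all_leaves = leaves(tree)

def vertex_congestion(t):
    if isinstance(t, str):
        # A leaf's congestion is the number of wires incident to its tensor.
        return sum(t in w for w in wires)
    # An internal vertex splits the leaves into three parts: the two child
    # subtrees and the rest of the tree. A wire is routed through the
    # vertex iff its endpoints land in two distinct parts.
    parts = [leaves(t[0]), leaves(t[1]), all_leaves - leaves(t)]
    side = lambda x: next(i for i, p in enumerate(parts) if x in p)
    return sum(side(u) != side(v) for u, v in wires)

def edge_congestion(t):
    # Wires routed through the edge above t: exactly one endpoint below t.
    below = leaves(t)
    return sum((u in below) != (v in below) for u, v in wires)

vc = max(vertex_congestion(t) for t in subtrees(tree))
ec = max(edge_congestion(t) for t in subtrees(tree) if t is not tree)
time_cost = sum(2 ** vertex_congestion(t) for t in subtrees(tree))
print(f'max vertex congestion {vc}, max edge congestion {ec}, '
      f'sum_t 2^vc(t) = {time_cost}')
```

Minimizing these quantities over all trees, as in the theorems, is the hard part; the sketch only scores a tree that is already given.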
More broadly, identifying the right abstraction (i.e., contraction trees) and precisely quantifying the space and time costs provide a foundation for minimizing those costs as much as possible.

In Section 2, we go over the graph-theoretic concepts that are the foundation of this work. In Section 3, we present seemingly unrelated graph properties in a unified framework that may be of independent interest; while strictly unnecessary for understanding the main results, Section 3 helps explain the relationship between our work and prior work. In Section 4, we introduce the cost model on which our results are based. In Section 5, we give our main results. In Section 6, we discuss extensions and generalizations of the main ideas. In Section 7, we conclude with some possible directions for future work. In Appendix A, we prove that the edge congestion of a graph is almost equal to the branchwidth of its line graph.

Figure 1 Two unrooted contraction trees for a tensor network with 6 tensors. Each color corresponds to a wire of the tensor network. The inclusion of a color in the representation of a vertex or edge of the contraction tree indicates the contribution of the corresponding wire's weight to the congestion of the vertex or edge, respectively.

2 Background

Let $[i, n] = \{j \in \mathbb{Z} \mid i \le j \le n\}$, $[n] = [1, n]$, and $[\mathbf{n}] = [n_1] \times [n_2] \times \cdots \times [n_r]$ for $\mathbf{n} = (n_1, \ldots, n_r) \in (\mathbb{Z}^+)^r$. Let $G[S] = \left(S, E \cap \binom{S}{2}\right)$ be the subgraph of $G$ induced by a subset of the vertices $S \subset V(G)$. For two disjoint sets of vertices of an edge-weighted graph, $w(S, S') = \sum_{\{u,v\} \in E \,:\, u \in S,\, v \in S'} w(\{u, v\})$ is the sum of the weights of the edges between $S \subset V$ and $S' \subset V$. More generally, for $r$ disjoint sets of vertices, $w(S_1, \ldots, S_r) = \sum_{\{i,j\} \in \binom{[r]}{2}} w(S_i, S_j)$ is the sum of the weights of the edges with endpoints in distinct sets. In this context, we will denote singleton sets by their sole elements, e.g., $w(u, v) = w(\{u\}, \{v\}) = w(\{u, v\})$.

2.1 Tensor networks and contraction

A tensor can be defined in several equivalent ways. Most concretely, it is a multidimensional array. Specifically, a rank-$r$ tensor is an $r$-dimensional array of size $\mathbf{d} = (d_1, \ldots, d_r)$. More abstractly, a tensor is a collection of numbers indexed by the Cartesian product of some sets of indices, e.g., $\left[T_{i_1, i_2, \ldots, i_r}\right]_{(i_1, i_2, \ldots, i_r) \in [d_1] \times [d_2] \times \cdots \times [d_r]}$ indexed by $\mathbf{i} \in [\mathbf{d}]$. Alternatively, a tensor can be thought of as a multilinear map $T : [\mathbf{d}] \to \mathbb{C}$. (Our focus will be on complex-valued tensors.)

▶ Definition 3 (Tensor network). A tensor network $(G, M)$ is an undirected graph $G$ with edge weights $w$ and a set of tensors $M = \{M_v \mid v \in V(G)\}$ such that $M_v$ is a rank-$|N(v)|$ tensor with $2^{\deg(v)}$ entries, where $\deg(v) = \sum_{u \in N(v)} w(\{u, v\})$ is the weighted degree of $v$.
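As a sanity check of Definition 3, the following minimal NumPy sketch (with an illustrative three-vertex graph of our own choosing) builds tensors whose shapes respect the edge weights, so that each $M_v$ indeed has rank $|N(v)|$ and $2^{\deg(v)}$ entries:

```python
import numpy as np

# Edge-weighted graph: each wire {u, v} carries a weight w({u, v}).
weights = {frozenset('uv'): 2, frozenset('vx'): 1, frozenset('ux'): 1}
vertices = sorted({x for e in weights for x in e})

def wires_at(v):
    # Wires incident to v, in a fixed order (one tensor axis per wire).
    return sorted((e for e in weights if v in e), key=sorted)

# Each tensor M_v has one axis of dimension 2^w(e) per incident wire e.
tensors = {v: np.random.rand(*(2 ** weights[e] for e in wires_at(v)))
           for v in vertices}

for v in vertices:
    deg_v = sum(weights[e] for e in wires_at(v))   # weighted degree of v
    assert tensors[v].ndim == len(wires_at(v))     # rank |N(v)|
    assert tensors[v].size == 2 ** deg_v           # 2^deg(v) entries
```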