
Fast Search of the Optimal Contraction Sequence in Tensor Networks

Ling Liang, Jianyu Xu, Lei Deng, Member, IEEE, Mingyu Yan, Xing Hu, Zheng Zhang, Member, IEEE, Guoqi Li, Member, IEEE, Yuan Xie, Fellow, IEEE

(This work was partially supported by the National Science Foundation (Grant No. 1817037). Corresponding author: Lei Deng. Ling Liang, Jianyu Xu, Xing Hu, Zheng Zhang, and Yuan Xie are with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106, USA (email: {lingliang, xu_jy15, xinghu, zhengzhang}@ucsb.edu, [email protected]). Guoqi Li is with the Department of Precision Instrument, Center for Brain Inspired Computing Research, Tsinghua University, Beijing 100084, China (email: [email protected]). Mingyu Yan is with the State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China (email: [email protected]), and also with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106, USA. Lei Deng is with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106, USA (email: [email protected]), and also with the Department of Precision Instrument, Center for Brain Inspired Computing Research, Tsinghua University, Beijing 100084, China.)

Abstract—Tensor networks and tensor computation are widely applied in scientific and engineering domains such as quantum physics, electronic design automation, and machine learning. As one of the most fundamental operations on tensor networks, a tensor contraction eliminates the sharing orders among tensors and produces a compact sub-network. Different contraction sequences usually yield distinct storage and compute costs, and searching for the optimal sequence is known to be a hard problem. Prior works have designed heuristic and fast algorithms to solve this problem; however, several issues remain unsolved: the data format and data structure are not efficient, the constraints imposed during modeling are impractical, the search for the optimal solution might fail, and the search cost is very high. In this paper, we first introduce a logk order representation and design an adjacency matrix-based data structure to efficiently accelerate the search for the optimal contraction sequence. Then, we propose an outer product pruning method with acceptable overhead to reduce the search space. Finally, we apply multithread optimization in our implementation to further improve the execution performance. We also present an in-depth analysis of the factors that influence the search time. This work provides a full-stack solution for optimal contraction sequence search, from the high-level data structure and search algorithm down to low-level execution parallelism, and it will benefit a broad range of tensor-related applications.

Keywords: Tensor Contraction, Adjacency Matrix, BFS Algorithm, Search Space Reduction, Multithread Optimization

I. INTRODUCTION

Tensor networks are applied in a wide range of fields. The most well-known are many-body quantum physics [1]–[5], matrix product states and projected entangled pair states [6]–[11], the multiscale entanglement renormalization ansatz [12], [13], and quantum circuit design [14], [15]. Besides the applications in quantum physics, tensor networks have recently been applied to IC modeling [16] and EDA problems [17]. In addition, tensor networks are capable of compressing the large-size parameters or data in neural networks [18]–[20] and signal processing algorithms [21], [22].

Tensor contraction, the process of computing a tensor network by eliminating the sharing orders among pairs of tensors, is one of the most fundamental operations in tensor network processing [23]. In a tensor network, the contraction operation iteratively merges two nodes into one until the whole network cannot be merged any further. Different contraction sequences can result in distinct memory and compute costs. Therefore, finding the optimal contraction sequence, i.e. the one that consumes the least compute or storage resources, is critical for reducing the consequent contraction cost.

However, this might be a very hard problem. On one hand, finding the contraction sequence with the optimal compute cost is proved to be NP-hard in [24]. On the other hand, for any tensor network, the optimal storage cost over all contraction sequences equals the treewidth of its line graph, as proved in [25]. Since computing the treewidth of a graph is NP-hard in general, it is a rational hypothesis that finding the contraction sequence with the optimal storage cost is also a hard problem. Therefore, designing heuristic search algorithms seems to be the only way to find the contraction sequence with optimal storage or compute cost. There exist both depth-first constructive search (DFS) algorithms [24] and breadth-first constructive search (BFS) algorithms [26], [27] for this purpose, and dynamic programming can also be applied [27], [28]. These techniques go through all possible contraction sequences before determining the optimal one.
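To make the combinatorial nature of this exhaustive search concrete, the sketch below (our own illustration, not code from the cited works) enumerates every pairwise contraction sequence of a toy network. Tensors are modeled as sets of order labels, each sharing order is assumed to connect exactly two tensors, and the objective follows the MC metric of Table I: the maximum single-step compute expense along a sequence. The network, order lengths, and helper names are all hypothetical.

```python
from itertools import combinations
from math import prod

def step_expense(a, b, dim):
    # Compute expense of one contraction: the product of the lengths of
    # every order involved (free and sharing orders of both tensors).
    return prod(dim[o] for o in a | b)

def best_sequence(tensors, dim):
    """Exhaustively try every pairwise contraction sequence and return
    (minimal MC, sequence), where MC is the maximum per-step compute
    expense. Exponential in the network size, so toy networks only."""
    if len(tensors) == 1:
        return 0, []
    best_mc, best_seq = float("inf"), []
    for i, j in combinations(range(len(tensors)), 2):
        merged = tensors[i] ^ tensors[j]   # sharing orders are eliminated
        rest = [t for k, t in enumerate(tensors) if k not in (i, j)]
        mc, seq = best_sequence(rest + [merged], dim)
        mc = max(mc, step_expense(tensors[i], tensors[j], dim))
        if mc < best_mc:
            best_mc, best_seq = mc, [(i, j)] + seq
    return best_mc, best_seq

# Toy chain: tau_0 --a-- tau_1 --b-- tau_2, free orders x and y at the ends.
dim = {"x": 2, "a": 4, "b": 4, "y": 2}
tensors = [frozenset({"x", "a"}), frozenset({"a", "b"}), frozenset({"b", "y"})]
print(best_sequence(tensors, dim))  # pair indices refer to the current list
```

Even this three-tensor chain already distinguishes good from bad sequences: contracting the two end tensors first is an outer product and doubles the peak step expense.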
However, the search space grows exponentially as the network size increases, so several algorithms have been investigated to shrink the original search space. For the storage cost, an optimization algorithm is proposed in [29], but it does not guarantee an optimal sequence. For the compute cost, cost capping is used to prune sequences that cost more than the optimal one, and outer product constraints are added to further reduce the search space [27]; however, the resulting search complexity depends on the variance of the sharing orders between tensors. In addition, some algorithms have been proposed to accelerate the search of contraction sequences in closed tensor networks (i.e. tensor networks that are contracted into a scalar) [30]. A polynomial search solution considering both storage and compute costs is proposed in [31]; however, it is only effective for tree structures.

Based on the observations from prior work, several issues should be considered in the search of the optimal contraction sequence. First, since the search space is vast, efficient data formats and data structures matter and should be designed to accelerate the search process. Second, the algorithm should find the optimal solution on top of that data structure, shrink the search space, and fit general tensor networks without specific structural constraints. Finally, the search time should be competitive. To make the optimal contraction search more efficient, we propose the following techniques. (1) We design a search algorithm based on an adjacency matrix structure that is friendly to data access and network update, as sketched below. (2) Since the outer product between two tensors can be pruned from the search space, we design an efficient algorithm to identify the prunable tensors. (3) We adopt multithread optimization for parallel execution of our search algorithm, which further improves efficiency. Our proposed method will benefit a broad range of applications that rely on tensor computation.
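The adjacency-matrix structure named in contribution (1) is straightforward to prototype. The sketch below is our own reading of the idea rather than the authors' implementation: if we store log2 order lengths (one way to realize a logk order representation), entry A[i, j] holds the log2 total length of the sharing orders between tensors i and j, the diagonal holds each tensor's free orders, every expense becomes a row sum, and contracting two tensors reduces to a cheap row/column update.

```python
import numpy as np

class AdjTensorNetwork:
    """Tensor network as a symmetric matrix of log2 order lengths.

    A[i, j] (i != j): log2 of the product of sharing-order lengths
                      between tensors i and j (0 if there are none).
    A[i, i]:          log2 of the product of tensor i's free-order lengths.
    Log-domain entries turn products of order lengths into row sums.
    """

    def __init__(self, A):
        self.A = np.array(A, dtype=float)

    def compute_expense(self, i, j):
        # log2 of the product of all orders touched by contracting i and j;
        # A[i, j] sits in both row sums, so subtract it once.
        return self.A[i].sum() + self.A[j].sum() - self.A[i, j]

    def contract(self, i, j):
        """Merge tensor j into tensor i with an O(V) row/column update."""
        ce = self.compute_expense(i, j)
        free_j = self.A[j, j]
        self.A[i, j] = self.A[j, i] = self.A[j, j] = 0.0  # sharing orders go
        self.A[i] += self.A[j]          # j's other sharing orders move to i
        self.A[i, i] += free_j          # j's free orders become i's
        self.A[:, i] = self.A[i]        # restore symmetry
        self.A[j] = self.A[:, j] = 0.0  # tensor j leaves the network
        se = self.A[i].sum()            # log2 storage expense of the result
        return float(ce), float(se)

# Chain tau_0 --a-- tau_1 --b-- tau_2 with free orders x on tau_0 and
# y on tau_2; lengths x = y = 2 and a = b = 4 (log2: 1 and 2).
tn = AdjTensorNetwork([[1, 2, 0],
                       [2, 0, 2],
                       [0, 2, 1]])
print(tn.contract(0, 1))  # (5.0, 3.0): 2^5 multiplications, 2^3 entries
```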
Table I: Variable definition.

τ_i: An original single tensor from a tensor network
T_I: A tensor after contraction, where I is the set of subscripts i of the involved original single tensors
FO_{T_I}: Free order collection of tensor T_I
SO_{T_I T_J}: Sharing order collection between tensors T_I and T_J, where I ∩ J = ∅
SE_{T_I}: Storage expense of tensor T_I
CE_{T_I T_J}: Compute expense for contracting tensors T_I and T_J, where I ∩ J = ∅
MS: Maximum storage expense of a contraction sequence
MC: Maximum compute expense of a contraction sequence
R_{T_I}: Row vector of possible tensor T_I
O_{T_I}: Outer product vector used in the outer product pruning for possible tensor T_I
T_{I1}, T_{I2}: The two split source tensors that contract into possible tensor T_I
sq: A contraction sequence
C_V^v: Binomial coefficient
Set_v: The set of possible tensors that are contracted from v original tensors

Figure 1: Examples of tensors with different order configurations (M = 1, 2, 3). The top subfigures visualize the original tensors; the middle and bottom subfigures show the graph representation of the tensors.

II. PRELIMINARIES

In this section, we first introduce the background of tensors, tensor contraction, and the problem definition of finding the optimal contraction sequence of a tensor network in Section II-A. Then, in Section II-B, we describe the vanilla BFS search algorithm that we adopt as the basis of our algorithm design.

A. Tensor Network Contraction

Tensor. The variables commonly used in this paper are listed in Table I. We define a tensor in a network as $\tau_i$. A tensor can be regarded as a generalization of vectors and matrices to represent high-order data. The number of orders in a tensor is denoted as $M$, and the length of the $m$-th order is denoted as $N^m$, where $M, N^m \in \mathbb{Z}^+$. Any element in a tensor $\tau_i$ can be represented as $\tau_i(n^0_{\tau_i}, n^1_{\tau_i}, \ldots, n^{M-1}_{\tau_i})$, where $n^m_{\tau_i} \in \{0, 1, \ldots, N^m_{\tau_i} - 1\}$.

A contracted tensor is denoted as $T_I$, where $I$ denotes the subscript set of the involved original single tensors, e.g. here $T_I = T_{01}$. In this example, $T_{01}$ is a two-order tensor (i.e. a matrix). An element of tensor $T_{01}$, such as $T_{01}(2, 1)$, can be calculated from the contraction between $\tau_0$ and $\tau_1$ by

$$T_{01}(2, 1) = \sum_{\alpha=0}^{N^0_{\tau_0}} \sum_{\beta=0}^{N^1_{\tau_0}} \tau_0(\alpha, \beta, 2) \times \tau_1(\alpha, \beta, 1). \quad (1)$$
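Eq. (1) is the element-wise form of a contraction that eliminates two sharing orders, which NumPy's einsum expresses directly. The check below is our own illustration; the order lengths are hypothetical.

```python
import numpy as np

# Hypothetical order lengths: the two sharing orders have length 3;
# the free orders of tau_0 and tau_1 have lengths 4 and 5.
rng = np.random.default_rng(0)
t0 = rng.random((3, 3, 4))  # tau_0 with orders (N^0, N^1, N^2)
t1 = rng.random((3, 3, 5))  # tau_1 with orders (N^0, N^1, N^2)

# Contract the two sharing orders; the two free orders survive.
T01 = np.einsum("abi,abj->ij", t0, t1)

# Element-wise form of Eq. (1): T01(2, 1) as a double sum.
elem = sum(t0[a, b, 2] * t1[a, b, 1]
           for a in range(t0.shape[0]) for b in range(t0.shape[1]))
assert np.isclose(T01[2, 1], elem)
```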
In Figure 2(b), we term $N^2_{\tau_0}$ and $N^2_{\tau_1}$ free orders, which have only one end; these orders are preserved after the contraction. Furthermore, $\tau_0$ and $\tau_1$ share two orders whose ends coincide, which are marked as $N^0_{\tau_0\tau_1}$ and $N^1_{\tau_0\tau_1}$. We call them sharing orders, and we have $N^m_{\tau_i\tau_j} = N^m_{\tau_j\tau_i}$. The contraction between two tensors can thus be interpreted as eliminating their sharing orders.
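Given per-tensor order labels, telling the two classes apart is mechanical: a free order has one end, a sharing order has two. A minimal sketch with hypothetical label names:

```python
from collections import Counter

def classify_orders(tensors):
    """Split a network's orders into free orders (one end) and
    sharing orders (two ends), given each tensor's order labels."""
    ends = Counter(label for orders in tensors for label in orders)
    free = {o for o, c in ends.items() if c == 1}
    sharing = {o for o, c in ends.items() if c == 2}
    return free, sharing

# tau_0 and tau_1 as in Figure 2(b): two sharing orders, one free order each.
free, sharing = classify_orders([["s0", "s1", "f0"], ["s0", "s1", "f1"]])
print(free, sharing)  # {'f0', 'f1'} {'s0', 's1'} (set print order may vary)
```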