Quantum Neural Network States: A Brief Review of Methods and Applications

Zhih-Ahn Jia,1, 2, 3, ∗ Biao Yi,4 Rui Zhai,5 Yu-Chun Wu,1, 2, † Guang-Can Guo,1, 2 and Guo-Ping Guo1, 2, 6 1Key Laboratory of Quantum Information, Chinese Academy of Sciences, School of Physics, University of Science and Technology of China, Hefei, Anhui, 230026, P.R. China 2CAS Center For Excellence in Quantum Information and Quantum Physics, University of Science and Technology of China, Hefei, Anhui, 230026, P.R. China 3Microsoft Station Q and Department of Mathematics, University of California, Santa Barbara, California 93106-6105, USA 4Department of Mathematics, Capital Normal University, Beijing 100048, P.R. China 5Institute of Technical Physics, Department of Engineering Physics, Tsinghua University, Beijing 10084, People’s Republic of China 6Origin , Hefei, 230026, P.R. China One of the main challenges of quantum many-body physics is the exponential growth in the dimensionality of the Hilbert space with system size. This growth makes solving the Schr¨odinger equation of the system extremely difficult. Nonetheless, many physical systems have a simplified internal structure that typically makes the parameters needed to characterize their ground states exponentially smaller. Many numerical methods then become available to capture the physics of the system. Among modern numerical techniques, neural networks, which show great power in approximating functions and extracting features of big data, are now attracting much interest. In this work, we briefly review the progress in using artificial neural networks to build quantum many- body states. We take the Boltzmann machine representation as a prototypical example to illustrate various aspects of the states of a neural network. We briefly review also the classical neural networks and illustrate how to use neural networks to represent quantum states and density operators. Some physical properties of the neural network states are discussed. For applications, we briefly review the progress in many-body calculations based on neural network states, the neural network state approach to tomography, and the classical simulation of quantum computing based on Boltzmann machine states.

I. INTRODUCTION and many classical algorithm-based tensor networks have been developed, such as the density-matrix renormaliza- tion group [13], projected entangled pair states (PEPS) One of the most challenging problems in condensed [14], folding algorithm [15], entanglement renormaliza- matter physics is to find the eigenstate of a given Hamil- tion [16], and time-evolving block decimation [17]. The tonian. The difficulty stems mainly from the power scal- research on tensor-network states also includes studies ing of the Hilbert space dimension, which grows expo- on finding new representations of quantum many-body nentially with the system size [1,2]. To obtain a bet- states. ter understanding of quantum many-body physical sys- During the last few years, has grown tems beyond the mean-field paradigm and to study the rapidly as an interdisciplinary field. Machine learning behavior of strongly correlated electrons requires effec- techniques have also been successfully applied in many tive approaches to the problem. Although the dimen- different scientific areas [18–20]: , speech sion of the Hilbert space of the system grows exponen- recognition, and chemical synthesis, Combining quantum tially with the number of particles in general, fortunately, physics and machine learning has generated a new ex- physical states frequently have some internal structures, citing field of research, [21], for example, obeying the entanglement area law, making which has recently attracted much attention [22–29]. The it easier to solve problems than in the general case [3– research on quantum machine learning can be loosely arXiv:1808.10601v3 [quant-ph] 26 Feb 2019 7]. Physical properties of the system usually restrict the categorized into two branches: developing new quan- form of the ground state, for example, area-law states tum algorithms, which share some features of machine [8], ground states of local gapped systems [9]. There- learning and behave faster and better than their classi- fore, many-body localized systems can be efficiently rep- cal counterparts [22–24], using classical machine learning resented by a tensor network [3, 10–12], which is a new methods to assist the study of quantum systems, such as tool developed in recent years to attack difficulties in rep- distinguishing phases [25], quantum control [30], error- resenting quantum many-body states efficiently. Tensor correcting of topological codes [31], and quantum tomog- network approach achieves some great success in quan- raphy [32, 33]. The latter is the focus of this work. Given tum many-body problems. It has become a standard tool the substantial progress so far, we stress here that ma- chine learning can also be used to attack the difficulties encountered with quantum many-body states. ∗Electronic address: [email protected], [email protected] Since 2001, researchers have been trying to use ma- †Electronic address: [email protected] chine learning techniques, especially neural networks, to 2 deal with the quantum problems, for example, solving we briefly review the neural network representation of the Schr¨odingerequations [34–37]. Later, in 2016, neural the density operator and the quantum state tomography networks were introduced as a variational ansatz for rep- scheme based on neural network states. In SectionV, we resenting quantum many-body ground states [26]. This discuss the entanglement features of the neural network stimulated an explosion of results to apply machine learn- states. The application of these states in classically sim- ing methods in the investigations of condensed matter ulating quantum computing circuit is discussed in Sec- physics; see, e.g., Refs. [22–25, 27–29, 38]. Carleo and tionVI. In the last section, some concluding remarks are Troyer initially introduced the restricted BM (RBM) to given. solve the transverse-field and antiferromag- netic Heisenberg model and study the time evolution of these systems [26]. Later, the entanglement properties of II. ARTIFICIAL NEURAL NETWORKS AND the RBM states were investigated [27], as was their rep- THEIR REPRESENTATIONAL POWER resentational power [29, 39]. Many explicit RBM con- structs for different systems were given, including the A neural network is a mathematical model that is an Ising model [26], toric code [28], graph states [29], sta- abstraction of the biological nerve system, which con- bilizer code [38, 40], and topologically ordered states sists of adaptive units called neurons that are connected [28, 38, 39, 41]. Furthermore, the deep BM (DBM) via a an extensive network of synapses [44]. The ba- states were also investigated under different approaches sic elements comprising the neural network are artificial [29, 42, 43]. neurons, which are the mathematical abstractions of the Despite all the progress in applying neural networks in biological neurons. When activated each neuron releases quantum physics, many important topics still remain to neurotransmitters to connected neurons and changes the be explored. The obvious topics are the exact definition electric potentials of these neurons. There is a threshold of a quantum neural network state and the mathematics potential value for each neuron; while the electric poten- and physics behind the efficiency of quantum neural net- tial exceeds the threshold, the neuron is activated. work states. Although RBM and DBM states are being There are several kinds of artificial neuron mod- investigated from different aspects, there are many other els. Here, we introduce the most commonly used Mc- neural networks. It is natural to ask if they can similarly Culloch–Pitts neuron model [45]. Consider n inputs be used for representing quantum states and what are x1, x2, ··· , xn, which are transmitted by n correspond- the relationships and differences between these represen- ing weighted connections w1, w2, ··· , wn (see Figure1). tations. Digging deeper, one central problem in studying After the signals have reached the neuron, they are added neural network is its representational power. We can ask together according to their weights, and then the value a) what is its counterpart in quantum mechanics and how is compared with the bias b of the neuron to determine to make the neural network work efficiently in represent- whether the neuron is activated or deactivated. The pro- ing quantum states, and b) what kind of states can be cess is governed by the activation function f, and the Pn efficiently represented by a specific neural network. In output of a neuron is written y = f( i=1 wixi − b). this work, we investigate partially these problems and Note that we can regard the bias as a weight w0 for some review the important progress in the field. fixed input x0 = −1, the output then has a more com- Pn The work is organized as follows. In SectionII, we pact form, y = f( i=0 wixi). Putting a large number of introduce the definition of artificial neural network. We neurons together and allowing them to connect with each explain in SectionIIA the feed-forward neural network, other in some kind of connecting pattern produces a neu- perceptron and logistic neural networks, convolutional ral network. Note that this is just a special representation neural network, and , to be able to picture the neural network intuitively, espe- the so-called Boltzmann machine (BM). Next, in Sec- cially in regard to feed-forward neural networks. Indeed, tionIIB we explain the representational power for the there are many other forms of mathematical structures feed-forward neural network in representing given func- by which to characterize the different kinds of neural net- tions and the BM in approximating given probability works. Here we give several important examples of neural distributions. In SectionIII, we explain how a neural networks. network can be used as a variational ansatz for quan- tum states; the method given in SectionIIIA is model- independent, that is, the way to construct states can be A. Some examples of neural networks applied to any neural network (with the ability to con- tinuously output real or complex numbers). Some con- An artificial neural network is a set of neurons where crete examples of neural network states are given in Sec- some, or all, the neurons are connected according to a tion IIIA1. Section IIIA2 is devoted to the efficient certain pattern. Note that we put neurons at both in- representational power of neural network in represent- put and output ends. These input and output neurons ing quantum states, and in SectionIIIB we introduce are not neurons as introduced previously, but depend on the basic concepts of a tensor-network state, which is the learning problems. There may or may not be activa- closely relevant to neural network states. In SectionIV, tion functions associated with them. In what follows, we 3

x1 w1 TABLE I: Some popular activation functions. activation sum output function x2 w2 function value (a) input u f(x) = 1 f () 1+e−x values  x −x weights y e −e tanh tanh(x) = ex+e−x cos cos(x) b x a e j xn wn bias softmax σ(x)j = P xi i e rectified linear unit ReLU(x) = max{0, x}  x, x ≥ 0 hidden layers exponential linear unit ELU(x) = α(ex − 1), x < 0 softplus SP(x) = ln(ex + 1)

output aThe softmax function acts on vectors x, which are usually used layer input in the final layer of the neural networks. (b) layer

tron simulates the NAND operation. Suppose that x1, x2 are two inputs of the neuron each with weight −2 and the bias of the neuron set to −3. When the inputs are x1 = x2 = 1, f(x1w1 + x2w2) = 0, otherwise its output FIG. 1: (a) McCulloch–Pitts neuron model, where x1, ··· , xn is 1, which is exactly the output of the NAND operation. are inputs to the neurons, w1, ··· , wn are weights correspond- ing to each input, Σ is the summation function, b the bias, f Although the perceptron is powerful in many applica- the activation function, and y the output of the neuron; (b) tions, it still has some shortcomings. The most outstand- a simple artificial neural network. ing one is that the activation function is not continuous, a small change in the weights may produce a large change in the output of the network, this makes the learning pro- briefly introduce the feed-forward, convolutional neural cess difficult. One way to remedy this shortcoming is to networks, and the Boltzmann machines. smooth out the activation function, usually, by choosing the logistic function,

1. Rosenblatt’s perceptron and logistic neural network 1 f(x) = . (2) 1 + e−x To explain the neural network, we first see an example This resulting network is usually named the logistic neu- of the feed-forward neural network, perceptron, which ral network (or sigmoid neural network). In practice, was invented by Rosenblatt. In the history of artificial logistic neural networks are used extensively. Many prob- neural networks, the multilayer perceptron plays a crucial lems can be solved by this network. We remark that the role. In a perceptron, the activation function of each logistic function is chosen for convenience in the updat- neuron is set to a Heaviside step function ing process of learning and many other smooth step-like  0, x ≤ 0, functions can be chosen as an activation function. In a f(x) = (1) 1, x > 0, neural network, the activation functions need not be all the same. In TableI, we list some popular activation where one value represents the activation status of the functions. neuron and zero value represents the deactivation status Next, we see how the neural network learns with of the neuron. The output value of one neuron is the the method. We illustrate the learn- input value of the neurons connecting it. ing process in the supervised machine framework using N The power of the perceptron mainly comes from its two different sets of labelled data S = {(xi, yi)}i=1 and M hierarchical recursive structure. It has been shown that T = {(zi, ti)}i=1, known as training data and test data, the perceptron can be used for doing universal classical respectively; here yi (resp. ti) is the label of xi (resp. computation [45–47]. To see this, we note that NAND zi). Our aim is to find the weights and biases of the neu- and FANOUT operations are universal for classical com- ral network such that the network output y(xi) (which putation. In the perceptron, we still assume a FANOUT depends on network parameters Ω = {wij, bi}) approx- 1 operation works . We only need to show that the percep- imates yi for all training inputs xi. To quantify how well the neural network approximates the given labels, we need to introduce the cost function, which measures the difference between y(xi) and yi, 1 Here, we emphasize the importance of the FANOUT operation, N which is usually omitted from the universal set of gates in the 1 X classical computation theory. However, the operation is forbid- C(Ω) := C(y(x ), y ) = ky(x ) − y k2, (3) i i 2N i i den in quantum computation by the famous no-cloning theorem. i=1 4 where N denotes the number of data in the training set, connected layer. Through a differentiable function, every Ω the set of network parameters wij and bj, and the sum layer transforms the former layer’s data (usually pixels) runs over all data in the training set. Here, we choose into a new set of data (pixels). the quadratic norm, therefore, the cost function is called For regular neural networks, each neuron is fully con- quadratic. Now our aim is to minimize the cost function nected with the neurons in the previous layer. However, as a multivariable function of the network parameters for the convolution layer of a convolutional neural net- such that C(Ω) ≈ 0, this can be done by the well-known work, the neurons only connect with neurons in a local gradient descent method. neighborhood of the previous layer. More precisely, in The intuition behind the gradient decent method is a convolutional layer, the new pixel values of the k-th that we can regard the cost function, in some sense, as layer are obtained from the (k − 1)-th layer by a filter the height of a map where the place is marked by network that determines the size of the neighborhood and then parameters. Our aim is to go down repeatedly from some (k) P (k−1) gives vij = p,q wij;pqvp,q where the sum runs over initial place (given a configuration of the neural network) (k) until we reach the lowest point. Formally, from some the neurons in the local neighborhood of vij . After the given configuration of the neural network, i.e., given pa- filter scans the whole image (all pixel values), a new im- age (new set of pixel values) is obtained. The pooling rameters wij and bi, the gradient decent algorithm needs to compute repeatedly the gradient ∇C = ( ∂C , ∂C ). layers are usually periodically added in-between succes- ∂wij ∂bi sive convolutional layers and its function is to reduce the The updating formulae are given by data set. For example, the max (or average) pooling 0 ∂C chooses the maximum (or average) value of the pixels of wij → w = wij − η , (4) ij ∂wij the previous layer contained in the filter. The last fully 0 ∂C bi → b = bi − η , (5) connected layer is the same as the one in the regular neu- i ∂bi ral network and outputs a class label used to determine where η is a small positive parameter known as the learn- which class the image is categorized in. ing rate. The weights and biases of the convolutional neural net- In practice, there are many difficulties in applying gra- works are learnable parameters, but the variables such as dient method to train the neural network. The mod- the size of the filter and the number of interleaved con- ified form, the stochastic gradient descent, is usually volutional and pooling layers are usually fixed. The con- used to speed up the training process. In the stochas- volutional neural network performs well in classification- tic gradient method, sampling over the training set is type machine learning tasks such as image recognition 0 0 introduced, i.e., we randomly choose N samples S = [18, 50, 51]. As has been shown numerically [52], the {(X1,Y1), ··· , (XN 0 ,YN 0 )} such that the average gradi- convolutional neural network can also be used to build 0 ent of cost function over S equals roughly the average quantum many-body states. gradient over the whole training set S. Then the updat- ing formulae are accordingly modified as

0 0 η PN ∂C(Xi) 3. Boltzmann machine wij → w = wij − 0 , (6) ij N i=1 ∂wij 0 0 η PN ∂C(Xi) bi → b = bi − 0 , (7) i N i=1 ∂bi Now we introduce another special type of artificial neural networks, the Boltzmann machine (also known 2 where C(Xi) = ky(Xi) − Yik /2 is the cost function over as the stochastic Hopfield network with hidden units), the training input Xi. which is an energy-based neural network model [53, 54]. The test data T is usually chosen differently from S, Recently introduced in many different physical areas and when the training process is done, the test data is [26, 27, 29, 31, 38, 39, 55–60], the quantum versions of used to test the performance of the neural network, which BMS, quantum BMs, have also been investigated [61]. for many traditionally difficult problems (such as classifi- As the BM is very similar to the classical Ising model, cation and recognition) is very good. As discussed later, here we explain the BM neural network by frequently the feed-forward neural network and many other neural referring to the terminology of the Ising model. Notice networks also work well in approximating quantum states that the BM is very different from the perceptrons and [48, 49], this being the main theme of this paper. logistic neural network as it does not treat each neuron individually. Therefore, there is no activation function attached to each specific neuron. Instead, the BM treats 2. Convolutional neural network neurons as a whole. Given a graph G with vertex set V (G) and edge set Convolutional neural networks are another important E(G), the neurons s1, ··· , sn (spins in the Ising model) class of neural network and are most commonly used to are put on vertices, n = |V (G)|. If two vertices i and j analyze images. A typical convolutional neural network are connected, there is a weight wij (coupling constant in consists of a sequence of different interleaved layers, in- the Ising model) between the corresponding neurons si cluding a convolutional layer, a pooling layer, and a fully and sj. For each neuron si, there is also a corresponding 5 local bias (local field in the Ising model). As has been renormalization ansatz (MERA) [16], branching [70, 71], done for Ising model, for each series of input values s = and tree tensor network [72], matrix product operator (s1, ··· , sn) (spin configuration in the Ising model), we [73–76], projective entangled pair operator [77–79], and define its energy as continuous tensor networks [80–82]. A large number of numerical algorithms based on tensor networks are now X X E(s) = − wijsisj − sibi. (8) available, including the density-matrix renormalization hiji∈E(G) i group [13], folding algorithm [15], entanglement renor- malization [16], time-evolving block decimation [17], and Up to now, everything is just as for the Ising model. No tangent space method [83]. new concepts or techniques are introduced. The main One of the most important properties that empowers difference is that, the BM construction introduces a col- tensor networks is that entanglement is much easier to oring on each vertex. Each vertex receives a label hidden treat in this representation. Many studies have appeared or visible. We assume the first k neurons are hidden in recent years that indicate that tensor networks have a neurons denoted by h , ··· , h , and the left l neurons 1 k close relationship with state-of-the-art neural network ar- are visible neurons denoted by v , ··· , v and k + l = n. 1 l chitectures. From theory, machine learning architectures Therefore, the energy is now E(h, v). The BM is a para- were shown in Ref. [84] to be understood via the tensor metric model of a joint probability distribution between networks and their entanglement pattern. In practical variables h and v with the probability given by applications, tensor networks can also be used for many e−E(h,v) machine-learning tasks, for example, performing learn- p(h, v) = , (9) ing tasks by optimizing the MPS [85, 86], preprocessing Z the dataset based on layered tree tensor networks [87], P −E(h,v) classifying images via the MPS and tree tensor networks where Z = h,v e is the partition function. The general BM is very difficult to train, and therefore [85–88], and realizing quantum machine learning via ten- some restricted architecture on the BM is introduced. sor networks [89]. Both tensor networks and neural net- The restricted BM (RBM) was initially invented by works can be applied to represent quantum many-body Smolensky [62] in 1986. In the RBM, it is assumed that states; the difference and connections of the two kinds of the graph G is a bipartite graph; the hidden neurons only representations are extensively explored in several works connect with visible neurons and there are no intra-layer [27, 29, 38, 40, 43, 59, 90],. We shall review some of these connections. This kind of restricted structure makes the progress in detail in SectionIII. neural network easier to train and therefore has been ex- tensively investigated and used [26, 27, 29, 31, 38, 39, 55– 60]. The RBM can approximate every discrete probabil- ity distribution [63, 64]. B. Representational power of neural network The BM is most notably a stochastic recurrent neu- ral network whereas the perceptron and the logistic neu- Next we comment on the representational power of ral network are feed-forward neural networks. There are neural networks, which is important in understanding many other types of neural networks. For a more compre- the representational power of quantum neural network hensive list, see textbooks such as Refs. [65, 66]. The BM states. In 1900, Hilbert formulated his famous list of 23 is crucial in quantum neural network states and hence its problems, among which the thirteenth problem is devoted neural network states are also the most studied. In later to the possibility of representing an n-variable function sections, we shall discuss the physical properties of the as a superposition of functions of a lesser number of vari- BM neural network states and their applications. ables. This problem is closely related to the represen- tational power of neural networks. Kolmogorov [91, 92] 4. Tensor networks and Arnold [93] proved that for continuous n-variable functions, this is indeed the case. The result is known as the Kolmogorov–Arnold representation theorem (al- Tensor networks are certain contraction pattern of ten- ternatively, the Kolmogorov superposition theorem); sors, which play an important role in many scientific ar- eas such as , quantum informa- tion and quantum computation, computational physics, Theorem 1. Any n-variable real continuous function and quantum chemistry [3–7, 12]. We discuss some de- f : [0, 1]n → R expands as sums and compositions of con- tails of tensor networks latter in this review. Here, we tinuous univariate functions; more precisely, there exist only comment on the connection between tensor networks real positive numbers a, b, λp, λp,q and a real monotonic and machine learning. increasing function φ : [0, 1] → [0, 1] such that Many different tensor network structures have been de- veloped over the years for solving different problems like 2n+1 n X X the matrix product states (MPS) [67–69], projective en- f(x1, ··· , xn) = F ( λpφ(xp + aq) + bq), (10) tangled pair states (PEPS) [14], multiscale entanglement q=1 p=1 6 or (v1,,vN ,) 2n+1 n X X f(x1, ··· , xn) = F ( λpqφ(xp + aq)), (11) q=1 p=1 neural network where F is a real and continuous function called the outer  function, and φ called the inner function. Note that a and F may be different in two representations.  v v v v v v v Obviously, the mathematical structure in the theo- v1 v2 3 v4 5 6 7 8 N 1 N rem is very similar to the mathematical structure of feed-forward neural networks. Since the initial work of FIG. 2: Schematic diagram for the neural network ansatz Kolmogorov and Arnold, numerous follow-up work con- state. tributed to understanding more deeply the representa- tion power of neural networks from different aspects [94]. build a neural network. If a neural network can represent Mathematicians have considered the problem in differ- a function or distribution in polynomial time (the number ent support sets and different metric between functions. of parameters depends polynomially on the number of The discrete-function version of the problem is also ex- input neurons), we say that the representation is efficient. tensively investigated. As mentioned in the previous sec- tion, McCulloch and Pitts [45] showed that any Boolean function can be represented by the perceptron and, based III. ARTIFICIAL NEURAL NETWORK ANSATZ on this fact, Rosenblatt developed the learning algorithm FOR QUANTUM MANY-BODY SYSTEM [95]. Slupecki proved that all k-logic functions can be represented as a superposition of one-variable functions A. Neural network ansatz state and any given significant function [96]. Cybenko [97], Fu- nahashi [98] and Hornik and colleagues [99] proved that n We now describe how neural network can be used as a n-variable functions defined on a compact subset of R may be approximated by a four-layer network with only variational ansatz for quantum many-body systems. For logistic activation functions and a linear activation func- a given many-body pure state |Ψi of an N-particle p-level tion. Hecht [100] went a step further; he proved that physical system any n-variable continuous function can be represented p X by a two-layer neural network involving logistic activa- |Ψi = Ψ(v1, v2 ··· , vN )|v1i⊗|v2i⊗· · ·⊗|vN i. tion functions of the first layer and arbitrary activation v1,v2··· ,vN =1 functions on the second layer. These results are summa- rized as follows; The coefficient Ψ(v1, v2 ··· , vN ) of the state can be re- garded as an N-variable complex function. To character- Theorem 2. The feed-forward neural network can ap- ize a state, we only need to give the corresponding value proximate any continuous n-variable functions and any of Ψ function for each variable v = (v1, v2 ··· , vN ). One n-variable discrete functions. of the difficulties in quantum many-body physics is that the complete characterization of an N-particle system re- For the stochastic recurrent neural network BM, the quires O(pN ) coefficients, which is exponentially large in power in approximating probability distributions has also the system size N and therefore is computationally inef- been studied extensively. An important result of Le Roux ficient. Let us now see how neural network can be used and Bengio [63] claims that to attack the difficulty. Theorem 3. Any discrete probability distribution p : To represent a quantum state, we first need to build a n B → R≥0 can be approximated with an RBM with k +1 specific architecture of the neural network for which we hidden neurons where k = |supp(p)| is the cardinality of denote the set of adjustable parameters as Ω = {wij, bi}. the support of p (i.e., the number of vectors with non- The number of input neurons is assumed to be the same zero probabilities) arbitrarily well in the metric of the as the number of physical particles N. For each series Kullback–Leibler divergence. of inputs v = (v1, ··· , vN ), we anticipate the neural net- work to output a complex number Ψ(v, Ω), which de- The theorem states that any discrete probability dis- pends on values of both the input and parameters of the tribution can be approximated by the RBM. The bound neural network. In this way, a variational state of the number of hidden neurons is later improved [64]. X Here we must stress that these representation theorems |Ψ(Ω)i = Ψ(v, Ω)|vi, (12) are applicable only if the given function or probability N v∈Zp distribution can be represented by the neural network. In practice, the number of parameters to be learned cannot is obtained, where the sum runs over all basis labels, |vi be too large for the number of input neurons when we denotes |v1i ⊗ · · · ⊗ |vN i, and Zp the set {0, 1, ··· , p − 1} 7

(the labels of local basis). The state in Equation (12) an example. We assume the biases are b1, ··· , b4 for hid- is a variational state. For a given Hamiltonian H, the den neurons h1, ··· , h4 respectively; the weights between corresponding energy functional is neurons are denoted by wij. We construct the state co- efficient neuron by neuron next. hΨ(Ω)|H|Ψ(Ω)i In Figure3, the output for h , i = 1, 2, 3 is y = E(Ω) = . (13) i i hΨ(Ω)|Ψ(Ω)i f(v1w1i + v2w2i − bi), respectively. These outputs are transmitted to h4; after acting with h4, we get the state In accordance with the variational method, the aim now coefficient, is to minimize the energy functional and obtain the cor- responding parameter values, with which the (approxi- Ψlog(v1, v2, Ω) = f(w14y1 + w24y2 + w34y3 − b4), (14) mate) ground state is obtained. The process of adjusting where Ω = {wij, bi}. Summing over all possible in- parameters and finding the minimum of the energy func- put values, we obtain the quantum state |Ψlog(Ω)i = tional is performed using neural network learning (see P Ψ (v , v , Ω)|v , v i up to a normalization fac- v1,v2 log 1 2 1 2 Figure2). Alternatively, if the appropriate dataset exists, tor. We see that the logistic neural network states have a we can also build the quantum neural network states by hierarchical iteration control structure that is responsible standard machine learning procedures rather than min- for the representation power of the network in represent- imizing the energy functional. We first build a neural ing states. network with learnable parameters Ω and then train the However, when we want to give the neural network network with the available dataset. Once the training parameters of a given state |Ψi explicitly, we find that process is completed, the parameters of the neural net- f(z) = 1/(1 + e−z) cannot exactly take values zero and work are fixed; we also obtain the corresponding approx- one as they are the asymptotic values of f. This short- imate quantum states. coming can be remedied by a smoothing step function in The notion of the efficiency of the neural network another way. Here we give a real function solution; the ansatz in representing a quantum many-body state is de- complex case can be done similarly. The idea is very sim- fined as the dependency relation of the number of non- ple. We cut the function into pieces and then glue them zero parameters |Ω| involved in the representation and together in some smooth way. Suppose that we want to the number of physical particles N: if |Ω| = O(poly(N)), construct a smooth activation function F (x) such that the representation is called efficient. The aim when solv-  a ing a given eigenvalue equation is therefore to build a = 0, x ≤ − 2 ,  a a neural network for which the ground state can be repre- F (x) ∈ (0, 1) − 2 < x < 2 , (15)  a sented efficiently. = 1, x ≥ 2 , To obtain the quantum neural network states from the above construction, we first need to make the neural net- we can choose a kernel function work a complex neural network, specifically, use complex  4x 2 a a2 + a , − 2 ≤ x ≤ 0, parameters and output complex values. In practice, some K(x) = 2 4x a (16) a − a2 , 0 ≤ x < 2 , neural networks may have difficulty outputing complex values. Therefore, we need to develop another way to The required function can then be constructed as build a quantum neural network state |Ψi. We know that a Z x+ 2 wavefunction Ψ(v) can be written as Ψ(v) = R(v)eiθ(v) F (x) = K(x − t)s(t)dt, (17) a where the amplitude R(v) and phase θ(v) are both real x− 2 functions; hence, we can represent them by two separate where s(t) is step function. It is easy to check that the neural networks with parameter sets Ω and Ω . The 1 2 constructed function F (x) is differentiable and satisfies quantum states are determined from the union of these Equation (15). In this way, the explicit neural network sets, Ω = Ω ∪ Ω (Figure4). In SectionIV, we give an 1 2 parameters can be obtained for any given state. explicit example to clarify the construction. Hereinafter, Note that the above representation of the quantum most of our discussion remains focused on the complex state Ψ(v) by a neural network is to develop the complex- neural network approach. valued neural network. It will be difficult in some Let us now see some concrete examples of neural net- cases. Because the quantum state Ψ(v) can also be work states. expressed as an amplitude R(v) and phase eiθ(v) as Ψ(v) = R(v)eiθ(v), we can also represent the amplitude and phase by two neural networks separately as R(Ω1, v) 1. Some examples of neural networks states and θ(Ω2, v) where Ω1 and Ω2 are two respective param- eter sets of the neural networks. The approach is used The first neural network state we consider is the logis- in representing a density operator by purification; to be tic neural network state, where weights and biases now discussed in SectionIV. must be chosen as complex numbers and the activation For the BM states, we notice that the classical BM function f(z) = 1/(1 + e−z) is also a complex function. networks can approximate a discrete probability distri- As shown in Figure3, we take the two-qubit state as bution. The quantum state coefficient Ψ(v) is the square 8

Then the wavefunction is X X P a v +P h (b +P v W ) Ψ(v, Ω) ∼ ··· e i i i j j j i i ij ,

h1 hl

Y aivi Y = e Γj(v; bj,Wij), (21) i j logistic neural network state where by ‘∼’ we mean that the overall normalization fully connected BM state factor and the partition function Z(Ω) are omitted, h (b +P v W ) Γ = P e j j i i ij is 2 cosh(b + P v W ) or j hj j i i ij P bj + viWij 1 + e i for hj takes values in {±1} and {0, 1}, respectively. This kind of product form of the wavefunc- tion plays an important role in understanding the RBM states. The DBM has more than one hidden layer; indeed, as RBM state DBM state has been shown in Ref. [29], any BM can be transformed into a DBM with two hidden layers. Hence, we shall only FIG. 3: Examples of two-qubit neural network ansatz states. be concerned with the DBM with two hidden layers. The wavefunction is written explicitly as X X X X exp−E(v,h,g) root of the probability distribution and therefore should Ψ(v, Ω) ∼ ··· ··· , (22) Z also be able to be represented by the BM. This is one h1 hl g1 gq reason that the BM states are introduced as a represen- where the energy function is now of the form E(v, h, g) = tation of quantum states. Here we first treat instances of P P P P − i viai − k ckgk − j hjbj − i,j;hiji Wijvihj − fully connected BM states (Figure3). Instances for the P RBM and DBM are similar. As in the logistic states, the jk;hkji Wkjhjgk. It is also difficult to train the DBM, weights and biases of the BM are now complex numbers. in general, but the DBM states have a stronger repre- The energy function is defined as sentational power than the RBM states; the details are discussed in the next subsection. X X X E(h, v) = − ( viai + hjbj + wijvihj i j hiji 2. Representational power of neural network states X X + wjj0 hjhj0 + wii0 vivi0 ), (18) hjj0i hii0i As the neural network states were introduced in many-body physics to represent the ground state of the where ai and bj are biases of visible neurons and hidden transverse-field Ising model and the antiferromagnetic neurons, respectively; wij, wjj0 , and wii0 are connection Heisenberg model efficiently [26], many researchers have weights. The state coefficients are now studied their representation power. We now know that RBMs are capable of representing many different classes X X e−E(h,v) Ψ (v, Ω) = ··· (19) of states [28, 29, 38, 58]. Unlike their unrestricted coun- BM Z h1 hl terparts, RBMs allow an efficient sampling and they are also the most studied cases. The DBM states are also P −E(h,v) with Z = v,h e the partition function, and the explored in various works [29, 42, 43]. In this section, we sum runs over all possible values of the hidden neurons. briefly review the progress in this direction. P Ψ (v,Ω)|vi v BM We first list some known classes of states that can The quantum state is |ΨBM (Ω)i = N where N is the normalizing factor. be efficiently represented by RBM: Z2-toric code states Because the fully connected BM states are extremely [28]; graph states [29]; stabilizer states with generators of difficult to train in practice, the more commonly used pure type, SX , SY , SZ and their arbitrary union [38]; per- ones are the RBM states where there is one hidden layer fect surface code states, surface code states with bound- and one visible layer. There are no intra-layer connec- aries, defects, and twists [38]; Kitaev’s D(Zd) quantum tions [hidden (resp. visible) neurons do not connect with double ground states [38]; the arbitrary stabilizer code hidden (resp. visible) neurons]. In this instance, the en- state [40]; ground states of the double semion model ergy function becomes and the twisted quantum double models [41]; states of the Affleck–Lieb–Kennedy–Tasaki model and the two- X X X dimensional CZX model [41]; states of Haah’s cubic code E(h, v) = − aivi − bjhj − viWijhj. i j ij model [41]; and the generalized-stabilizer and hypergraph X X X states [41]. The algorithmic way to obtain the RBM pa- = − aivi − hj(bj + viWij). (20) rameters of the stabilizer code state for arbitrary given i j i stabilizer group S has also been developed [40]. 9

Although many important classes of states may be rep- P A B . A tensor net- j1,··· ,jl i1,··· ,ip,j1,··· ,jl j1,··· ,jl,k1,··· ,kq resented by the RMB, there is a crucial result regarding work is a set of tensors for which some (or all) of the a limitation: [29] there exist states that can be expressed indices are contracted. as PEPS [101] but cannot be efficiently represented by Representing the tensor network graphically is quite a RBM; moreover, the class of RBM states is not closed convenient. The corresponding diagram is called a tensor under unitary transformations. One way to remedy the network diagram, in which, a rank-n tensor is represented defect is by adding one more hidden layer, that is, using as a vertex with n-edges, for example, a scalar is just a the DBM. vertex, a vector is a vertex with one edge, and a matrix The DBM can efficiently represent physical states in- is a vertex with two edges: cluding:

• Any state which can be efficiently represented by scalar : ; vector : ; matrix : . (23) RBMs 2;

• Any n-qubit quantum states generated by a quan- The contraction is graphically represented by connecting tum circuit of depth T ; the number of hidden neu- two vertices with the same edge label. For two vectors rons is O(nT )[29]; and matrices, this corresponds to the inner product and the matrix product, respectively. Graphically, they look • Tensor network states consist of n-local tensors like with bound dimension D and maximum coordina- X tion number d; the number of hidden neurons is inner product : aibi = ; (24) 2d O(nD )[29]; i

• The ground states of Hamiltonians with gap ∆; the m2 X number of hidden neurons is O( ∆ (n−log )) where matrix product : AijBjk = . (25)  is the representational error [29]; j Although there are many known results concerning the How can we use the tensor network to represent a BM states, the same for other neural networks neverthe- many-body quantum state? The idea is to regard the less has been barely explored. wavefunction Ψ(v1, ··· , vn) = hv|Ψi as a rank-n ten-

sor Ψv1,··· ,vn . In some cases, the tensor wavefunction can break into some small pieces, specifically, contrac-

B. Tensor network states tion of some small tensors. For example Ψv1,··· ,vn = P A[1] A[2] ··· A[n] . Graphically, α1,··· ,αn i1;αnα1 i2;α1α2 in;αn−1αn Let us now introduce a closely related representation of we have the quantum many-body states—the tensor network rep- resentation, which was originally developed in the con- text of condensed matter physics based on the idea of the renormalization group. Tensor network states have = , (26) now applications in many different scientific fields. Ar- guably, the most important property of the tensor net- work states is that entanglement is much easier to read out than other representations. Although there are many different types of tensor net- where each A[k] is a local tensor depending only on works, we focus here on the two simplest and easily ac- ik;αk−1αk cessible ones, the MPS and the PEPS. For other more some subset of indices {v1, ··· , vn}. In this way, physi- comprehensive reviews, see [3–7, 12]. cal properties such as entanglement are encoded into the By definition, a rank-n tensor is a complex variable contraction pattern of the tensor network diagram. It turns out that this kind of representation is very power- with n indices, for example Ai1,i2,··· ,in . The number of values that an index i can take is called the bond di- ful in solving many physical problems. k There are several important tensor network structures. mension of ik. The contraction of two tensors is a new tensor, that being defined as the sum over any num- We take two prototypical tensor network states used ber of pairs of indices; for example, C = for 1d and 2d systems, MPS states, [67–69] and PEPS i1,··· ,ip,k1,··· ,kq states [14], as examples to illustrate the construction of tensor-network states. In TableII, we list some of the most popular tensor-network structures including MPS, 2 This can be done by setting all the parameters involved in the PEPS, MERA [16], branching MERA [70, 71], and tree deep hidden layer to zeros; only the parameters of the shallow tensor networks [72], We also list the main physical prop- hidden layer remain nonzero. erties of these structures, such as correlation length and 10

TABLE II: Some popular tensor network structures and their properties.

Tensor network structure Entanglement S(A) correlation length ξ local observable hOˆi diagram Matrix product state O(1) finite exact

Projective entangled pair state (2d) O(|∂A|) finite/infinite approximate

Multiscale entanglement

renormalization ansatz (1d) O(log |∂A|) finite/infinite exact

Branching multiscale entanglement

renormalization ansatz (1d) O(log |∂A|) finite/infinite exact

Tree tensor networks O(1) finite exact

entanglement entropy. For more examples, see Refs. [3– ods have been developed in recent years. 7, 12] The tensor network states have a close relationship A periodic-boundary-condition MPS state is just like with neural network states. Their connections are ex- the right-hand side of Equation (26), which consists of tensively explored in many studies [29, 39, 59]. Here, we many local rank-3 tensors. For the open boundary case, briefly discuss how to transform a RBM state into a ten- the boundary local tensor is replaced with rank-2 ten- sor network state. To do this, we need to regard visible sors, and the inner part remains the same. The MPSs and hidden neurons as tensors. For example, the visible correspond to the low energy eigenstates of local gapped neuron vi and hidden neuron hj is now replaced by 1d Hamiltonians [102, 103]. The correlation length of the   MPS is finite and they obey the entanglement area law, (i) 1 0 V = a , (28) thus they cannot be used for representing quantum states 0 e i of critical systems that break the area law [8]. The PEPS state can be regarded as a higher-  1 0  H(j) = , (29) dimensional generalization of MPS. Here we give an ex- 0 ebj ample of a 2d 3 × 3 PEPS state with open boundary and the weighted connection between vi and hj is now also replaced by a tensor  1 1  W (ij) = . (30) 1 ewij

ΨPEPS(v) = . It is easy to check that both RBM and tensor network representations give the same state. Note that by some further optimization, any local RBM state can be trans- formed into an MPS state [59]. The general correspon- dence between RBM and tensor-network states has been (27) discussed in Ref. [59]. One crucial thing is that here we The typical local tensors for PEPS states are rank-5 ten- are only concerned with reachability, specifically, whether sors for the inner part, rank-4 tensors for the bound- one representation can be represented by another. How- ary part and rank-3 tensors for the corner part. The ever, in practical applications, we must also know the 2d PEPSs capture the low-energy eigenstates of 2d local efficiency to represent one by the other. As indicated Hamiltonians, which obey the entanglement area law [8]. in Section IIIA2, there exist some tensor network states PEPSs have some difference with MPS; their correlation which cannot be efficiently represented by RBM. length is not always finite and can be used to represent We note that there are also several studies trying to quantum states of critical systems. However, there is, combine the respective advantages of a tensor network by now, no efficient way to contract physical information and a neural network to give a more powerful represen- from PEPS exactly, therefore, many approximate meth- tation of the quantum many-body states [90]. 11

C. Advances in quantum many-body calculations pairs. Cai and Liu [48] produced expressions of the neu- ral network states in this model using the feed-forward There are several studies concerning numerical tests neural networks. They used the variational Monte Carlo of the accuracy and efficiency of neural network states method to find the ground state for the 1d system and −3 for different physical systems and different physical phe- obtained precisions to ∼ O(10 ). Liang and colleagues nomena [22–29, 34–38]. The early work trying to use a [52] investigated the model using the convolutional neural neural network to solve the Schr¨odingerequations [34–37] network, and showed that the precision of the calculation date back to 2001. Recently, in 2016, Carleo and Troyer based on convolutional neural network exceeds the string made the approach popular in calculating physical quan- bond state calculation. tities of the quantum systems [26]. Here we briefly dis- Hubbard model. —The Hubbard model is a model of cuss several examples of numerical calculations in many- interacting particles on a lattice and endeavors to capture body physics, including spin systems, and bosonic and the between conductors and insulators. fermionic systems. It has been used to describe superconductivity and cold Transverse-field Ising model. —The Hamiltonian for atom systems. The Hamiltonian is of the form the Ising model immersed in a transverse field is given X † † X by H = −t (ˆci,σcˆj,σ +c ˆj,σcˆi,σ) + U nˆi,↑nˆi,↓, (34) hiji,σ i X X HtIsing = −J ZiZj − B Xi, (31) hiji i where the first term accounts for the kinetic energy and † the second term the potential energy; ci,σ and ci,σ de- where the first sum runs over all nearest neighbor pairs. note the usual creation and annihilation operators, with For the 1d case, the system is gapped as long as J 6= B † nˆi,σ = ci,σci,σ. The phase diagrams of the Hubbard but gapless when J = B. In Ref. [26], Carleo and Troyer model have not been completely determined yet. In demonstrated that the RBM state works very well in Ref. [105], Nomura and colleagues numerically analyzed finding the ground state of the model. By minimizing the ground state energy of the model by combining the the energy E(Ω) = hΨ(Ω)|HtIsing|Ψ(Ω)i/hΨ(Ω)|Ψ(Ω)i RBM and the pair product states approach. They showed with respect to the network parameters Ω using the im- numerically that the accuracy of the calculation sur- proved gradient-descent optimization, they showed that passes the many-variable variational Monte Carlo ap- the RBM states achieve an arbitrary accuracy for both proach when U/t = 4, 8. A modified form of the model, 1d and 2d systems. described by the Bose–Hubbard Hamiltonian, was stud- Antiferromagnetic Heisenberg model. —The antifer- ied in Ref. [49] using a feed-forward neural network. The romagnetic Heisenberg model is of the form result is in good agreement with the calculation given by X an exact diagonalization and the Gutzwiller approxima- H = J SiSj, (J > 0) (32) tion. hiji Here we briefly mention several important examples of numerical calculations of many-body physical systems. where the sum runs over all nearest neighbor pairs. In Numerous other numerical works concerning many dif- Ref. [26], the calculation of the model is performed for the ferent physical models have appeared. We refer the inter- 1d and 2d systems using the RBM states. The accuracy ested readers to e.g., Refs. [22–29, 34–38, 48, 49, 52, 105] of the neural network ansatz turns out to be much better than the traditional spin-Jastrow ansatz [104] for the 1d system. The 2d system is harder, and more hidden neu- IV. DENSITY OPERATORS REPRESENTED rons are needed to reach a high accuracy. In Ref. [105], BY NEURAL NETWORK a combined approach is presented; the RBM architec- ture was combined with a conventional variational Monte A. Neural network density operator Carlo method with paired-product (geminal) wave func- tions to calculate the ground-state energy and ground state. They showed that the combined method has a In realistic applications of quantum technologies, the higher accuracy than that achieved by each method sep- states that we are concerned about are often mixed be- arately. cause the system is barely isolated from its environment. The mixed states are mathematically characterized by J1-J2 Heisenberg model. —The J1-J2 Heisenberg † model (also known as the frustrated Heisenberg model) the density operator ρ which is (i) Hermitian ρ = ρ; is of the form (ii) positive semi-definite hΨ|ρ|Ψi ≥ 0 for all |Ψi; and (iii) trace one Trρ = 1. The pure state |Ψi provides a X X H = J1 SiSj + J2 SiSj, (33) representation of the density operator ρΨ = |ΨihΨ| and hiji hhijii the general mixed states are non-coherent superpositions (classical mixture) of pure density operators. Let us con- where the first sum runs over all nearest neighbor pairs sider the situation for which the physical space of the and the second sum runs over the next-nearest-neighbor system is HS with basis v1, ··· , vn and the environment 12 AmplitudePhase by the RBM are s P −E(Ω1,v,h,e) log P e−E(Ω2,v,h,e) h e i h ΨSE(Ω, v, e) = e 2 , Z(Ω1) (38) P P P −E(Ωi,v,h,e) where Z(Ωi) = h e v e is the partition function corresponding to Ωi. The density operator can FIG. 4: RBM construction of the latent space purification for now be obtained from Equation (37). a density operator.

B. Neural network quantum state tomography space is HE with basis e1, ··· , em. For a given mixed state ρS of the system, then if we take into account the Quantum state tomography aims to identify or recon- effect of the environment there is a pure state |ΨSEi = struct an unknown quantum state from a dataset of ex- P P v e Ψ(v, e)|vi|ei for which ρS = TrE|ΨSEihΨSE|. perimental measurements. The traditional exact brute- Every mixed state can be purified in this way. force approach to quantum state tomography is only fea- In Ref. [106], Torlai and Melko explored the possibil- sible for systems with a small number of degress of free- ity of representing mixed states ρS using the RBM. The dom otherwise the demand on computational resources is idea is the same as that for pure states. We build a neu- high. For pure states, the compressed sensing approach ral network with parameters Ω, and for the fixed basis circumvents the experimental difficulty and requires only |vi, the density operator is given by the matrix entries a reasonable number of measurements [107]. The MPS ρ(Ω, v, v0), which is determined by the neural network. tomography works well for states with low entanglement Therefore, we only need to map a given neural network [108, 109]. For general mixed states, the efficiency of the with parameters Ω to a density operator as permutationally invariant tomography scheme based on the internal symmetry of the quantum states is low [110]. X ρ(Ω) = |viρ(Ω, v, v0)hv0|. (35) Despite all the progress, the general case for quantum v,v0 state tomography is still very challenging. The neural network representation of quantum states To this end, the purification method of the density provides another approach to state tomography. Here we operators is used. The environment is now represented by review its basic idea. For clarity (although there will be some extra hidden neurons e1, ··· , em besides the hidden some overlap), we discuss its application to pure states neurons h1, ··· , hl. The purification |ΨSEi of ρS is now and mixed states separately. captured by the parameters of the network, which we still From the work by Torlai and colleagues, [33] for a denote as Ω, i.e., pure quantum state, the neural network tomography works as follows. To reconstruct an unknown state |Ψi, X X (i) |ΨSEi = ΨSE(Ω, v, e)|vi|ei. (36) we first perform a collection of measurements {v }, v e i = 1, ··· ,N and therefore obtain the probabilities (i) (i) 2 pi(v ) = |hv |Ψi| . The aim of the neural network By tracing out the environment, the density operator also tomography is to find a set of RBM parameters Ω such is determined by the network parameters that the RBM state Φ(Ω, v(i)) mimics the probabilities p (v(i)) as closely as possible in each basis. This can X X ∗ 0 0 i ρS = [ ΨSE(Ω, v, e)ΨSE(Ω, v , e)]|vihv |. (37) be done in neural network training by minimizing the v,v0 e distance function (total divergence) between |Φ(Ω, v(i))|2 (i) and pi(v ). The total divergence is chosen as To represent the density operators, Ref. [106] takes the approach to represent the amplitude and phase of the pu- N X (i) 2 (i) rified state |ΨSEi by two separate neural networks. First, D(Ω) = DKL[|Φ(Ω, v )| |pi(v )], (39) the environment units are embedded into the hidden neu- i=1 ron space, i.e., they introduced some new hidden neurons (i) 2 (i) e1, ··· , em, which are fully connected to all visible neu- where DKL[|Φ(Ω, v )| |pi(v )] is the Kullback–Leibler rons (See Figure4). The parameters corresponding to (KL) divergence in basis {v(i)}. the amplitude and phase of the wave function are now en- Note that to estimate the phase of |Ψi in the refer- coded in the RBM with two different sets of parameters. ence basis, a sufficiently large number of measurement iθ(Ω2,a,v) That is, the state ΨSE(Ω, v, e) = R(Ω1, a, v)e bases should be included. Once the training is completed, with Ω = Ω1 ∪ Ω2. R(Ω1, a, v) and θ(Ω2, a, v) are both we get the target state |Φ(Ω)i in the RBM form, which characterized by the corresponding RBM (this structure is the reconstructed state for |Ψi. In Ref. [33], Torlai is called the latent space purification by authors). In this and colleagues test the scheme for the W-state, modified way, the coefficients of the purified state |ΨSEi encoded W state with local phases, Greenberger–Horne–Zeilinger 13 and Dicke states, and also the ground states for the transverse-field Ising model and XXZ model. They find the scheme is very efficient and the number of measure- (a) ment bases usually scales only polynomially with system size. The mixed state case is studied in Ref. [106] and is based on the RBM representations of the density oper- local RBM ators. The core idea is the same as for the pure state; that is, to reconstruct an unknown density operator ρ, we need to build an RBM neural network density σ(Ω) (b) with RBM parameter set Ω. Before training the RBM, we must perform a collection of measurements {v(i)} and obtain the corresponding probability distribution local DBM (i) (i) (i) pi(v ) = hv |ρ|v i. The training process involves minimizing the total divergence between the experimen- FIG. 5: Example of (a) a local RBM state and (b) a local tal probability distribution and the probability distribu- DBM state. tion calculated from the test RBM state σ(Ω). After the training process, we obtain a compact RBM repre- sentation of the density operator ρ, which may be used using the folding trick, folding the odd layers and even to calculate the expectation of the physical observable. layers separately, every BM is reduced into a DBM with Neural network state tomography is efficient and accu- only two hidden layers [29, 43]. Then we showed that rate in many cases. It provides a good supplement to the the locally connected DBM states obey the entanglement traditional tomography schemes. area law, and the DBM with nonlocal connections pos- sess volume-law entanglement [43], see Figure5(b) for an illustration of a local DBM state. V. ENTANGLEMENT PROPERTIES OF The relationship between the BM and tensor network NEURAL NETWORK STATES states was investigated in Refs. [29, 58, 59], and some algorithmic way of transforming an RBM state into a The notion of entanglement is ubiquitous in physics. MPS was given in Ref. [59]. The capability to represent To understand the entanglement properties of the many- tensor network states using the BM was investigated in body state is a central theme in both condensed matter [29, 58] from a complexity theory perspective. physics and quantum . Tensor net- One final aspect is realizing the holographic geometry- work representations of quantum states have an impor- entanglement correspondence using BM states [43, 60]. tant advantage in that entanglement can be read out When proving the entanglement area law and volume more easily. Here, we discuss the entanglement prop- law of the BM states, the concept of locality must be in- erties of the neural network states for a comparison with troduced, this means that we must introduce a geometry tensor networks. between neurons. This geometry results in the entangle- For a given N-particle quantum system in state |Ψi, ment features of the state. When we try to understand we divide the N particles into two groups A and Ac. the holographic entanglement entropy, we first tile the With this bipartition, we calculate the R´enyi entan- neurons in a given geometry and then make it learn from α 1 α data. After the learning process is done, we can see the glement entropy SR(A) := 1−α log TrρA, which char- acterizes the entanglement between A and Ac, where connecting pattern of the neural network and analyze the corresponding entanglement properties, which have a di- ρA = TrAc (|ΨihΨ|) is the reduced density matrix. If the R´enyi entanglement entropy is nonzero, then A and rect relationship to the given geometry, such as the signs Ac are entangled. of the space curvature. The entanglement property is encoded in the geometry Although much progress on the entanglement proper- of the contraction patterns of the local tensors for tensor ties of neural network states has been made, we still know network states. For neural network states, it was shown very little about it. The entanglement features of neural that the entanglement is encoded in the connecting pat- networks other than the BM have not been investigated terns of the neural networks [27, 43, 58–60]. For RBM at all and remain to be explored in future work. states, Deng, Li, and Das Sarma [27] showed that locally connected RBM states obey the entanglement area law, see Figure5(a) for an illustration of a local RBM state. VI. QUANTUM COMPUTING AND NEURAL Nonlocal connections result in the volume-law entangle- NETWORK STATES ment of the states [27]. We extended this result for any BM, showing that by cutting the intra-layer connection There is another crucial application of neural network and adding hidden neurons, any BM state may be re- states, namely, classical simulation of quantum comput- duced to a DBM state with several hidden layers. Then ing, which we briefly review in this section. It is well- 14

iπHi 0 known that quantum algorithms can provide exponen- 4 + iπvHi, where v = vi, vi. We easily check that W (v ,H )+W (v0 ,H ) 1 v v0 tial speedup over some of the best known classical algo- P e H i i H i i = √ (−1) i i , which com- Hi=0,1 2 rithms for many problems such as factoring integers [111]. pletes the construction of the Hadamard gate operation. Quantum computers are being actively developed of late, The Z(θ) gate operation, but one crucial problem, known as quantum supremacy

[112], emerges naturally. Quantum supremacy concerns −iθ iθ the potential capabilities of quantum computers that Z(θ)|0i = e 2 |0i,Z(θ)|1i = e 2 |1i, (43) classical computers practically do not have and the re- sources required to simulate quantum algorithms using can be constructed similarly. We can also add a new vis- 0 a classical computer. Studies of classical simulations ible neuron vi and a hidden neuron Zi, and vi becomes of quantum algorithms can also guide us to understand a hidden neuron that should be traced. The connection ln 2 iθv what are the practical applications of the quantum com- weight is given be WZ(θ)(v, Zi) = − 2 + 2 + iπvZi 0 puting platforms developed recently in different labora- where v = vi, vi. The DBM transform of the controlled tories. Here we introduce the approach to simulating Z(θ) gates is slightly different from single qubit gates quantum circuits based on the neural network represen- because it is a two-qubit operation acting on vi and vj. tation of quantum states. To simplify the calculation, we give here the explicit con- Following Ref. [29], we first discuss how to simulate struction for CZ. This can be done by introducing a new quantum computing via DBM states, since in the DBM hidden neuron Hij, which connects both vi and vj with formalism, all operations can be written out analytically. the same weights as those given by the Hadamard gate. A general quantum computing process can be loosely di- In summary, we have vided into three steps: (i) initial state preparation, (ii) applying quantum gates, and (iii) measuring the output DBM state. For the DBM state simulation in quantum com- H ⇔ → , (44) puting, the initial state is first represented by a DBM network. We are mainly concerned in how to apply a uni- versal set of quantum gates in the DBM representations. As we shall see, this can be achieved by adding hidden neurons and weighted connections. Here the universal quantum gates is chosen as single-qubit rotation around Z(θ) ⇔ DBM→ , (45) zˆ-axis Z(θ), the Hadamard gate H, and controlled rota- tions around thez ˆ-axis CZ(θ)[113]. We continue still to denote the calculating basis by |vi; the input state is then represented by DBM neural net- work as Ψin(v, Ω) = hv|Ψin(Ω)i. To simulate the circuit • DBM quantum computing, characterized by unitary transform ⇔ → . (46) Z UC , we need to devise strategies so that we can apply all the universal quantum gates to achieve the transform,

DBM Note that we can also achieve the quantum gate H in hv|Ψ (Ω)i → hv|Ψ (Ω)i = hv|U |Ψ (Ω)i. (40) in out C in the DBM setting by adding directly a new visible neuron 0 Let us first consider how to construct the Hadamard vi and connecting it with the (hidden) neuron vi. Z(θ) gate operation can also be realized in the DBM setting by changing the bias of the visible neuron vi. We choose the method 1 1 H|0i = √ (|0i + |1i),H|0i = √ (|0i − |1i). (41) presented above simply to make the construction clearer 2 2 and more systematic. The above protocol based on the DBM is an exact If H acts on the i-th qubit of the system, we can then simulation but has a drawback in that the sampling of represent the operation in terms of the coefficients of the the DBM quickly becomes intractable with increasing state, depth of the circuit because the gates are realized by adding deep hidden neurons. In contrast, RBMs are Ψ(··· v ··· ) →H Ψ0(··· v0 ··· ) i i easier to train; a simulation based on the RBM has al- 1 0 X viv ready been developed [114]. The basic idea is the same = √ (−1) i Ψ(··· vi ··· ). (42) as the DBM approach, the main difference being that v =0,1 2 i Hadamard gate cannot be exactly simulated in the RBM In DBM settings, it is clear now that the Hadamard setting. In Ref. [114], the authors developed the approx- DBM transform of the i-th qubit adds a new visible neu- imation method to simulate the Hadamard gate oper- 0 ron vi, which replaces vi, and another hidden neuron ation. The RBM realizations of Z(θ) and CZ(θ) are Hi and vi now becomes a hidden neuron. The connec- achieved by adjusting the bias and introducing a new iπ ln iπv tion weight is given by WH (v, Hi) = 8 − 2 − 2 − hidden neuron and weighted connections, respectively. 15

VII. CONCLUDING REMARKS • Developing the representation theorem for the com- plex function is also a very important topic in quan- In this work, we discussed aspects of the quantum neu- tum neural network states. Because we must build ral network states. Two important kinds of neural net- the quantum neural network states from complex works, feed-forward and stochastic recurrent, were cho- neural networks, as we have discussed, so it is im- sen as examples to illustrate how neural networks can portant to understand the expressive power of the be used as a variational ansatz state of quantum many- complex neural network. body systems. We reviewed the research progress on neu- ral network states. The representational power of these • Having a good understanding of entanglement fea- states was discussed and entanglement features of the tures is of great importance in understanding quan- RBM and DBM states reviewed. Some applications of tum phases and the quantum advantage over some quantum neural network states, such as quantum state information tasks. Therefore, we can also ask if tomography and classical simulations of quantum com- there is an easy way to read out entanglement prop- puting, were also discussed. erties from specific neural networks such as the ten- In addition to the foregoing, we present some remarks sor network. on the main open problems regarding quantum neural network states. We hope that our review of the quantum neural net- • One crucial problem is to explain why the neu- work states inspires more work and exploration of the ral network works so well for some special tasks. crucial topics highlighted above. There should be deep reasons for this. Understand- ing the mathematics and physics behind the neural networks may help to build many other important Acknowledgments classes of quantum neural network states and guide us in applying the neural network states to different scientific problems. Z.-A. Jia thanks Zhenghan Wang and hospitality of Department of Mathematics of UCSB. He also acknowl- • Although the BM states have been studied from edges Liang Kong and Tian Lan for discussions during various aspects, many other neural networks are his stay in Yau Mathematical Science Center of Tsinghua less explored in regard to representing quantum University, and he also benefits from the discussion with states both numerically and theoretically. This Giuseppe Carleo during the first international conference raises the question whether other networks can also on “Machine Learning and Physics” at IAS, Tsinghua efficiently represent quantum states, and what are University. This work was supported by the Anhui Ini- the differences between these different representa- tiative in Quantum Information Technologies (Grant No. tions? AHY080000).

[1] T. J. Osborne, “Hamiltonian complexity,” Reports on [8] J. Eisert, M. Cramer, and M. B. Plenio, “Colloquium,” Progress in Physics 75, 022001 (2012). Rev. Mod. Phys. 82, 277 (2010). [2] F. Verstraete, “Quantum hamiltonian complexity: [9] L. Amico, R. Fazio, A. Osterloh, and V. Vedral, “En- Worth the wait,” Nature Physics 11, 524 (2015). tanglement in many-body systems,” Rev. Mod. Phys. [3] R. Or´us,“A practical introduction to tensor networks: 80, 517 (2008). Matrix product states and projected entangled pair [10] M. Friesdorf, A. H. Werner, W. Brown, V. B. Scholz, states,” Annals of Physics 349, 117 (2014). and J. Eisert, “Many-body localization implies that [4] Z. Landau, U. Vazirani, and T. Vidick, “A polynomial eigenvectors are matrix-product states,” Phys. Rev. time algorithm for the ground state of one-dimensional Lett. 114, 170505 (2015). gapped local hamiltonians,” Nature Physics 11, 566 [11] F. Verstraete, V. Murg, and J. Cirac, “Matrix prod- (2015). uct states, projected entangled pair states, and vari- [5] I. Arad, Z. Landau, U. Vazirani, and T. Vidick, “Rig- ational renormalization group methods for quantum orous rg algorithms and area laws for low energy eigen- spin systems,” Advances in Physics 57, 143 (2008), states in 1d,” Communications in Mathematical Physics https://doi.org/10.1080/14789940801912366. 356, 65 (2017). [12] R. Orus, “Tensor networks for complex quantum sys- [6] N. Schuch, M. M. Wolf, F. Verstraete, and J. I. Cirac, tems,” arXiv preprint arXiv:1812.04011 (2018). “Computational complexity of projected entangled pair [13] S. R. White, “Density matrix formulation for quan- states,” Phys. Rev. Lett. 98, 140506 (2007). tum renormalization groups,” Phys. Rev. Lett. 69, 2863 [7] A. Anshu, I. Arad, and A. Jain, “How local is the in- (1992). formation in tensor networks of matrix product states [14] F. Verstraete and J. I. Cirac, “Renormalization algo- or projected entangled pairs states,” Phys. Rev. B 94, rithms for quantum-many body systems in two and 195143 (2016). higher dimensions,” arXiv preprint cond-mat/0407066 16

(2004). [37] S. Manzhos and T. Carrington, “An improved neural [15] M. C. Ba˜nuls,M. B. Hastings, F. Verstraete, and J. I. network method for solving the schr¨odingerequation,” Cirac, “Matrix product states for dynamical simulation Canadian Journal of Chemistry 87, 864 (2009). of infinite chains,” Phys. Rev. Lett. 102, 240603 (2009). [38] Z.-A. Jia, Y.-H. Zhang, Y.-C. Wu, L. Kong, G.-C. Guo, [16] G. Vidal, “Entanglement renormalization,” Phys. Rev. and G.-P. Guo, “Efficient machine-learning representa- Lett. 99, 220405 (2007). tions of a surface code with boundaries, defects, domain [17] G. Vidal, “Efficient classical simulation of slightly en- walls, and twists,” Phys. Rev. A 99, 012307 (2019). tangled quantum computations,” Phys. Rev. Lett. 91, [39] Y. Huang and J. E. Moore, “Neural network representa- 147902 (2003). tion of tensor network and chiral states,” arXiv preprint [18] Y. LeCun, Y. Bengio, and G. Hinton, “,” arXiv:1701.06246 (2017). Nature 521, 436 (2015). [40] Y.-H. Zhang, Z.-A. Jia, Y.-C. Wu, and G.-C. Guo, [19] G. E. Hinton and R. R. Salakhutdinov, “Reducing the “An efficient algorithmic way to construct boltzmann dimensionality of data with neural networks,” Science machine representations for arbitrary stabilizer code,” 313, 504 (2006). arXiv preprint arXiv:1809.08631 (2018). [20] R. S. Sutton and A. G. Barto, Reinforcement learning: [41] S. Lu, X. Gao, and L.-M. Duan, “Efficient represen- An introduction, Vol. 1 (MIT press Cambridge, 1998). tation of topologically ordered states with restricted [21] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, boltzmann machines,” arXiv preprint arXiv:1810.02352 N. Wiebe, and S. Lloyd, “Quantum machine learning,” (2018). Nature 549, 195 (2017). [42] W.-C. Gan and F.-W. Shu, “Holography as deep learn- [22] P. Rebentrost, M. Mohseni, and S. Lloyd, “Quan- ing,” International Journal of Modern Physics D 26, tum support vector machine for big data classification,” 1743020 (2017). Phys. Rev. Lett. 113, 130503 (2014). [43] Z.-A. Jia, Y.-C. Wu, and G.-C. Guo, “Holographic [23] V. Dunjko, J. M. Taylor, and H. J. Briegel, “Quantum- entanglement-geometry duality in deep neural network enhanced machine learning,” Phys. Rev. Lett. 117, states,” to be published (2018). 130501 (2016). [44] T. Kohonen, “An introduction to neural computing,” [24] A. Monr`as,G. Sent´ıs, and P. Wittek, “Inductive super- Neural Networks 1, 3 (1988). vised quantum learning,” Phys. Rev. Lett. 118, 190503 [45] W. S. McCulloch and W. Pitts, “A logical calculus of (2017). the ideas immanent in nervous activity,” The bulletin [25] J. Carrasquilla and R. G. Melko, “Machine learning of mathematical biophysics 5, 115 (1943). phases of matter,” Nature Physics 13, 431 (2017). [46] M. Minsky and S. A. Papert, Perceptrons: An introduc- [26] G. Carleo and M. Troyer, “Solving the quantum many- tion to computational geometry (MIT press, 2017). body problem with artificial neural networks,” Science [47] M. A. Nielsen, Neural Networks and Deep Learning (De- 355, 602 (2017). termination Press, 2015). [27] D.-L. Deng, X. Li, and S. Das Sarma, “Quantum en- [48] Z. Cai and J. Liu, “Approximating quantum many-body tanglement in neural network states,” Phys. Rev. X 7, wave functions using artificial neural networks,” Phys. 021021 (2017). Rev. B 97, 035116 (2018). [28] D.-L. Deng, X. Li, and S. Das Sarma, “Machine learn- [49] H. Saito, “Solving the bose–hubbard model with ma- ing topological states,” Phys. Rev. B 96, 195145 (2017). chine learning,” Journal of the Physical Society of Japan [29] X. Gao and L.-M. Duan, “Efficient representation of 86, 093001 (2017). quantum many-body states with deep neural networks,” [50] Y. LeCun, Y. Bengio, et al., “Convolutional networks Nature Communications 8, 662 (2017). for images, speech, and time series,” The handbook of [30] M. August and X. Ni, “Using recurrent neural networks brain theory and neural networks 3361, 1995 (1995). to optimize dynamical decoupling for quantum mem- [51] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Ima- ory,” Phys. Rev. A 95, 012335 (2017). genet classification with deep convolutional neural net- [31] G. Torlai and R. G. Melko, “Neural decoder for topo- works,” in Advances in neural information processing logical codes,” Phys. Rev. Lett. 119, 030501 (2017). systems (2012) pp. 1097–1105. [32] Y. Zhang and E.-A. Kim, “Quantum loop topography [52] X. Liang, W.-Y. Liu, P.-Z. Lin, G.-C. Guo, Y.-S. Zhang, for machine learning,” Phys. Rev. Lett. 118, 216401 and L. He, “Solving frustrated quantum many-particle (2017). models with convolutional neural networks,” Phys. Rev. [33] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, B 98, 104426 (2018). R. Melko, and G. Carleo, “Neural-network quantum [53] G. E. Hinton and T. J. Sejnowski, “Optimal perceptual state tomography,” Nature Physics 14, 447 (2018). inference,” in Proceedings of the IEEE conference on [34] C. Monterola and C. Saloma, “Solving the nonlinear Computer Vision and Pattern Recognition (IEEE New schrodinger equation with an unsupervised neural net- York, 1983) pp. 448–453. work,” Optics Express 9, 72 (2001). [54] D. H. Ackley, G. E. Hinton, and T. J. Sejnowski, “A [35] C. Monterola and C. Saloma, “Solving the nonlinear learning algorithm for boltzmann machines,” Cognitive schr¨odinger equation with an unsupervised neural net- science 9, 147 (1985). work: estimation of error in solution,” Optics commu- [55] G. Torlai and R. G. Melko, “Learning thermodynamics nications 222, 331 (2003). with boltzmann machines,” Phys. Rev. B 94, 165134 [36] C. Caetano, J. Reis Jr, J. Amorim, M. R. Lemes, and (2016). A. D. Pino Jr, “Using neural networks to solve nonlinear [56] K.-I. Aoki and T. Kobayashi, “Restricted boltzmann differential equations in atomic and molecular physics,” machines for the long range ising models,” Modern International Journal of Quantum Chemistry 111, 2732 Physics Letters B 30, 1650401 (2016). (2011). [57] S. Weinstein, “Learning the einstein-podolsky-rosen cor- 17

relations on a restricted boltzmann machine,” arXiv baum, “Exponential thermal tensor network approach preprint arXiv:1707.03114 (2017). for quantum lattice models,” Phys. Rev. X 8, 031082 [58] L. Huang and L. Wang, “Accelerated monte carlo simu- (2018). lations with restricted boltzmann machines,” Phys. Rev. [77] P. Czarnik and J. Dziarmaga, “Variational approach to B 95, 035105 (2017). projected entangled pair states at finite ,” [59] J. Chen, S. Cheng, H. Xie, L. Wang, and T. Xiang, Phys. Rev. B 92, 035152 (2015). “Equivalence of restricted boltzmann machines and ten- [78] M. M. Parish and J. Levinsen, “Quantum dynamics of sor network states,” Phys. Rev. B 97, 085104 (2018). impurities coupled to a fermi sea,” Phys. Rev. B 94, [60] Y.-Z. You, Z. Yang, and X.-L. Qi, “Machine learn- 184303 (2016). ing spatial geometry from entanglement features,” Phys. [79] A. Kshetrimayum, H. Weimer, and R. Or´us,“A sim- Rev. B 97, 045153 (2018). ple tensor network algorithm for two-dimensional steady [61] M. H. Amin, E. Andriyash, J. Rolfe, B. Kulchytskyy, states,” Nature communications 8, 1291 (2017). and R. Melko, “Quantum boltzmann machine,” Phys. [80] F. Verstraete and J. I. Cirac, “Continuous matrix prod- Rev. X 8, 021050 (2018). uct states for quantum fields,” Phys. Rev. Lett. 104, [62] P. Smolensky, Information processing in dynamical sys- 190405 (2010). tems: Foundations of harmony theory, Tech. Rep. [81] J. Haegeman, J. I. Cirac, T. J. Osborne, I. Pizorn, (COLORADO UNIV AT BOULDER DEPT OF COM- H. Verschelde, and F. Verstraete, “Time−dependent PUTER SCIENCE, 1986). variational principle for quantum lattices,” Phys. Rev. [63] N. L. Roux and Y. Bengio, “Representational power Lett. 107, 070601 (2011). of restricted boltzmann machines and deep belief net- [82] J. Haegeman, T. J. Osborne, H. Verschelde, and F. Ver- works,” Neural Computation 20, 1631 (2008). straete, “Entanglement renormalization for quantum [64] G. Montufar and N. Ay, “Refinements of universal ap- fields in real space,” Phys. Rev. Lett. 110, 100402 proximation results for deep belief networks and re- (2013). stricted boltzmann machines,” Neural Comput. 23, [83] J. Haegeman, T. J. Osborne, and F. Verstraete, 1306 (2011). “Post−matrix product state methods: To tangent space [65] C. Bishop, C. M. Bishop, et al., Neural networks for and beyond,” Phys. Rev. B 88, 075133 (2013). pattern recognition (Oxford university press, 1995). [84] Y. Levine, O. Sharir, N. Cohen, and A. Shashua, [66] L. V. Fausett et al., Fundamentals of neural net- “Bridging many-body quantum physics and deep works: architectures, algorithms, and applications, learning via tensor networks,” arXiv preprint Vol. 3 (Prentice-Hall Englewood Cliffs, 1994). arXiv:1803.09780 (2018). [67] M. Fannes, B. Nachtergaele, and R. F. Werner, [85] E. Stoudenmire and D. J. Schwab, “Supervised learning “Finitely correlated states on quantum spin chains,” with tensor networks,” in Advances in Neural Informa- Communications in mathematical physics 144, 443 tion Processing Systems (2016) pp. 4799–4807. (1992). [86] Z.-Y. Han, J. Wang, H. Fan, L. Wang, and P. Zhang, [68] A. Kl¨umper, A. Schadschneider, and J. Zittartz, “Ma- “Unsupervised generative modeling using matrix prod- trix product ground states for one-dimensional spin-1 uct states,” Phys. Rev. X 8, 031012 (2018). quantum antiferromagnets,” EPL (Europhysics Letters) [87] E. M. Stoudenmire, “Learning relevant features of data 24, 293 (1993). with multi-scale tensor networks,” Quantum Science [69] A. Klumper, A. Schadschneider, and J. Zittartz, and Technology 3, 034003 (2018). “Equivalence and solution of anisotropic spin-1 models [88] D. Liu, S.-J. Ran, P. Wittek, C. Peng, R. B. Garc´ıa, and generalized tj fermion models in one dimension,” G. Su, and M. Lewenstein, “Machine learning by Journal of Physics A: Mathematical and General 24, two-dimensional hierarchical tensor networks: A quan- L955 (1991). tum information theoretic perspective on deep architec- [70] G. Evenbly and G. Vidal, “Class of highly entangled tures,” arXiv preprint arXiv:1710.04833 (2017). many-body states that can be efficiently simulated,” [89] W. Huggins, P. Patel, K. B. Whaley, and E. M. Phys. Rev. Lett. 112, 240502 (2014). Stoudenmire, “Towards quantum machine learning [71] G. Evenbly and G. Vidal, “Scaling of entanglement en- with tensor networks,” arXiv preprint arXiv:1803.11537 tropy in the (branching) multiscale entanglement renor- (2018). malization ansatz,” Phys. Rev. B 89, 235113 (2014). [90] I. Glasser, N. Pancotti, M. August, I. D. Rodriguez, [72] Y.-Y. Shi, L.-M. Duan, and G. Vidal, “Classical simula- and J. I. Cirac, “Neural-network quantum states, string- tion of quantum many-body systems with a tree tensor bond states, and chiral topological states,” Phys. Rev. network,” Phys. Rev. A 74, 022320 (2006). X 8, 011006 (2018). [73] M. Zwolak and G. Vidal, “Mixed-state dynamics in [91] A. Kolmogorov, “The representation of continuous func- one-dimensional quantum lattice systems: A time- tions of several variables by superpositions of continuous dependent superoperator renormalization algorithm,” functions of a smaller number of variables,” . Phys. Rev. Lett. 93, 207205 (2004). [92] A. N. Kolmogorov, “On the representation of contin- [74] J. Cui, J. I. Cirac, and M. C. Ba˜nuls,“Variational uous functions of many variables by superposition of matrix product operators for the steady state of dissi- continuous functions of one variable and addition,” in pative quantum systems,” Phys. Rev. Lett. 114, 220601 Doklady Akademii Nauk, Vol. 114 (Russian Academy of (2015). Sciences, 1957) pp. 953–956. [75] A. A. Gangat, T. I, and Y.-J. Kao, “Steady states of [93] V. I. Arnold, Vladimir I. Arnold-Collected Works: Rep- infinite-size dissipative quantum chains via imaginary resentations of Functions, Celestial Mechanics, and time evolution,” Phys. Rev. Lett. 119, 010501 (2017). KAM Theory 1957-1965, Vol. 1 (Springer Science & [76] B.-B. Chen, L. Chen, Z. Chen, W. Li, and A. Weichsel- Business Media, 2009). 18

[94] D. Alexeev, “Neural-network approximation of func- tation,” Phys. Rev. A 52, 3457 (1995). tions of several variables,” Journal of Mathematical Sci- [114] B. J´onsson,B. Bauer, and G. Carleo, “Neural-network ences 168, 5 (2010). states for the classical simulation of quantum comput- [95] F. Rosenblatt, Principles of neurodynamics. perceptrons ing,” arXiv preprint arXiv:1808.05232 (2018). and the theory of brain mechanisms, Tech. Rep. (COR- NELL AERONAUTICAL LAB INC BUFFALO NY, 1961). [96] J. Slupecki, “A criterion of fullness of many-valued systems of propositional logic,” Studia Logica 30, 153 (1972). [97] G. Cybenko, “Approximations by superpositions of a sigmoidal function,” Mathematics of Control, Signals and Systems 2, 183 (1989). [98] K.-I. Funahashi, “On the approximate realization of continuous mappings by neural networks,” Neural net- works 2, 183 (1989). [99] K. Hornik, M. Stinchcombe, and H. White, “Mul- tilayer feedforward networks are universal approxima- tors,” Neural networks 2, 359 (1989). [100] R. Hecht-Nielsen, “Kolmogorov’s mapping neural net- work existence theorem,” in Proceedings of the IEEE International Conference on Neural Networks III (IEEE Press, 1987) pp. 11–13. [101] X. Gao, S.-T. Wang, and L.-M. Duan, “Quantum supremacy for simulating a translation-invariant ising ,” Phys. Rev. Lett. 118, 040502 (2017). [102] M. B. Hastings, “Solving gapped hamiltonians locally,” Phys. Rev. B 73, 085115 (2006). [103] M. B. Hastings, “An area law for one-dimensional quan- tum systems,” Journal of : Theory and Experiment 2007, P08024 (2007). [104] R. Jastrow, “Many-body problem with strong forces,” Phys. Rev. 98, 1479 (1955). [105] Y. Nomura, A. S. Darmawan, Y. Yamaji, and M. Imada, “Restricted boltzmann machine learning for solving strongly correlated quantum systems,” Phys. Rev. B 96, 205152 (2017). [106] G. Torlai and R. G. Melko, “Latent space purification via neural density operators,” Phys. Rev. Lett. 120, 240503 (2018). [107] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, “Quantum state tomography via compressed sensing,” Phys. Rev. Lett. 105, 150401 (2010). [108] M. Cramer, M. B. Plenio, S. T. Flammia, R. Somma, D. Gross, S. D. Bartlett, O. Landon-Cardinal, D. Poulin, and Y.-K. Liu, “Efficient quantum state to- mography,” Nature communications 1, 149 (2010). [109] B. Lanyon, C. Maier, M. Holz¨apfel, T. Baumgratz, C. Hempel, P. Jurcevic, I. Dhand, A. Buyskikh, A. Da- ley, M. Cramer, et al., “Efficient tomography of a quantum many-body system,” Nature Physics 13, 1158 (2017). [110] G. T´oth, W. Wieczorek, D. Gross, R. Krischek, C. Schwemmer, and H. Weinfurter, “Permutationally invariant quantum tomography,” Phys. Rev. Lett. 105, 250403 (2010). [111] M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information (Cambridge university press, 2010). [112] J. Preskill, “Quantum computing and the entanglement frontier,” arXiv preprint arXiv:1203.5813 (2012). [113] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus, P. Shor, T. Sleator, J. A. Smolin, and H. Weinfurter, “Elementary gates for quantum compu-