arXiv:1902.04057v3 [cond-mat.dis-nn] 19 Jan 2020
Deep autoregressive models for the efficient variational simulation of many-body quantum systems

Or Sharir,$^{1,\ast}$ Yoav Levine,$^{1,\dagger}$ Noam Wies,$^{1,\ddagger}$ Giuseppe Carleo,$^{2,\S}$ and Amnon Shashua$^{1,\P}$
$^1$The Hebrew University of Jerusalem, Jerusalem, 9190401, Israel
$^2$Center for Computational Quantum Physics, Flatiron Institute, 162 5th Avenue, New York, NY 10010, USA

∗ [email protected]
† [email protected]
‡ [email protected]
§ gcarleo@flatironinstitute.org
¶ [email protected]

Artificial Neural Networks were recently shown to be an efficient representation of highly-entangled many-body quantum states. In practical applications, neural-network states inherit numerical schemes used in Variational Monte Carlo, most notably the use of Markov-Chain Monte-Carlo (MCMC) sampling to estimate quantum expectations. The local stochastic sampling in MCMC caps the potential advantages of neural networks in two ways: (i) Its intrinsic computational cost sets stringent practical limits on the width and depth of the networks, and therefore limits their expressive capacity; (ii) Its difficulty in generating precise and uncorrelated samples can result in estimations of observables that are very far from their true value. Inspired by the state-of-the-art generative models used in machine learning, we propose a specialized Neural Network architecture that supports efficient and exact sampling, completely circumventing the need for Markov Chain sampling. We demonstrate our approach for two-dimensional interacting spin models, showcasing the ability to obtain accurate results on larger system sizes than those currently accessible to neural-network quantum states.

Introduction.– The theoretical understanding and modeling of interacting many-body quantum matter has represented an outstanding challenge since the early days of quantum mechanics. At the heart of several problems in condensed matter, chemistry, nuclear matter, and more lies the intrinsic difficulty of fully representing the many-body wave-function, in principle needed to exactly solve Schrödinger's equation. Variational approaches to this problem mainly fall into two categories: on one hand, there are states traditionally used in stochastic Variational Monte Carlo (VMC) calculations [1]. Chief examples are Jastrow wave-functions [2], carrying high entanglement, but also with a limited variational freedom. On the other hand, more recently, tensor-network approaches have been put forward, based on non-stochastic variational optimization, and most chiefly on entanglement-limited variational wave-functions [3–6].

In an attempt to circumvent the limitations of the approaches above, architectures based on Artificial Neural Networks (ANN) were proposed as variational wave functions [7]. Restricted Boltzmann machines (RBM), which represent relatively veteran machine learning constructs, were shown to be capable of representing volume-law entanglement scaling in 2D [8–11]. Recently, other neural-network architectures have been explored. Most notably, convolutional neural networks (ConvNets), leading deep learning architectures that stand at the forefront of empirical successes in various Artificial Intelligence domains, have been applied to both bosonic [12] and frustrated spin systems [13].

Despite the provable theoretical advantage of ConvNet architectures [14], however, early numerical studies have been limited to relatively shallow architectures, far from the very deep networks used in modern machine learning applications. This practical limitation is mostly due to two main factors. First, it is computationally expensive to obtain quantum expectation values over ConvNet states using stochastic sampling based on Markov Chain Monte Carlo (MCMC), as is customary in VMC applications. Second, there is an intrinsic optimization bottleneck to be faced when dealing with a large number of parameters. However, both limitations are routinely faced when learning deep autoregressive models, recently introduced machine-learning techniques that have enabled previously intractable applications.

In this paper, we propose a pivotal shift in the use of Neural-network Quantum States (NQS) for many-body quantum systems, one that marks a discontinuity with traditionally adopted VMC methods. Inspired by the latest advances in generative machine learning models, we introduce variational states for which both the sampling and the optimization issues are substantially alleviated. Our model is composed of a ConvNet that allows direct, efficient, and i.i.d. sampling from the highly entangled wave function it represents. The network architecture draws upon successful autoregressive models for representing and sampling from probability distributions. Those are widely employed in the machine learning literature [15], and have been recently used for statistical mechanics applications [16], as well as density matrix reconstructions from experimental quantum systems [17]. We generalize these autoregressive models to treat complex-valued wave-functions, obtaining highly expressive architectures parametrizing an automatically normalized many-body quantum wave-function.

Neural Autoregressive Quantum States.– We consider in the following a pure quantum system, constituted by $N$ discrete degrees of freedom $s \equiv (s_1, \ldots, s_N)$ (e.g. spins, occupation numbers, etc.) such that the wave-function amplitudes $\Psi(s)$ fully specify its state. Here we follow the approach introduced in [7], and represent $\ln(\Psi(s))$ as a feed-forward ANN, parametrized by a possibly large number of network connections. Given an arbitrary set of quantum numbers, $s$, the output value computation of the corresponding NQS, known as its forward pass, can generally be described as a sequence of $K$ matrix-vector multiplications separated by the applications of a non-linear element-wise activation function $\sigma:\mathbb{C}\to\mathbb{C}$. More formally, the unnormalized log amplitudes are given by

$$\ln(\Psi(s)) = W_K\,\sigma\left(W_{K-1}\,\sigma\left(\cdots\sigma\left(W_1 s\right)\right)\right), \tag{1}$$

where $\mathcal{W} \equiv \{W_i \in \mathbb{C}^{r_i \times r_{i-1}}\}_{i=1}^{K}$, $r_0 = N$, $r_K = 1$, $r_1, \ldots, r_{K-1}$ are known as the widths of the network, and $K$ as the depth. In practice, specialized variants of Eq. (1) are commonly used, e.g. early applications have focused on shallow architectures ($K=1$) such as Restricted Boltzmann Machines, for which the activation function is typically taken to be $\sigma(z) = \ln\cosh(z)$. Other, deeper, choices are often advantageous, such as convolutional networks, in which most of the matrices are restricted to act on a subset of the quantum numbers, computing convolutions with small filters.

Given an NQS representation of a many-body quantum state, estimating physical observables $\langle\Psi|\mathcal{O}|\Psi\rangle$ of a local operator $\mathcal{O}$ is in general analytically intractable, but can be realized numerically through a stochastic procedure, as done in VMC. Specifically, $\langle\Psi|\mathcal{O}|\Psi\rangle = \langle O^{\mathrm{loc}}\rangle_P$, where $\langle\ldots\rangle_P$ denotes statistical expectation values over the Born probability density $P(s) \equiv |\Psi(s)|^2$, and $O^{\mathrm{loc}} \equiv \sum_{s'} \langle s|\mathcal{O}|s'\rangle\,\Psi(s')/\Psi(s)$ is the corresponding statistical estimator. In the vast majority of VMC applications, including NQS so far, an MCMC algorithm is typically used to generate samples from $P(s)$. While MCMC is a rather flexible technique, it comes with a large computational cost, especially for deep ANNs. Additionally, though MCMC asymptotically generates samples that are correctly distributed, in practice it can be plagued by very large autocorrelation times and by lack of ergodicity, both of which can severely affect the quality of the samples being generated.

In light of these limitations, we propose here a specialized network architecture that instead supports efficient and exact sampling. Our approach is an extension of Neural Autoregressive Density Estima-

probabilities for $s_i$ to take one of the $M$ possible discrete values $s_j$, conditioned on given $s_1, \ldots, s_{i-1}$. It is crucial that each output vector $v_i$ does not depend on the value of $s_i$ or any of the variables appearing with a larger index, $s_{i+1}, \ldots, s_N$, for a pre-chosen ordering. To ensure that each network outputs a valid conditional distribution, it is then sufficient to take the exponential of each entry and normalize it according to the $\ell_1$ norm, i.e. $p_i(s_i|s_{i-1}, \ldots, s_1) = \exp(v_{i,s_i}) / \sum_{s'} |\exp(v_{i,s'})|$, also known as a Softmax operation.

Even though it is possible to use $N$ separate networks for each of the $N$ conditional probabilities, each accepting a variable number of inputs, in practice it is more common to use a single ANN that accepts $N$ inputs and outputs $N$ probability vectors. In this case, the autoregressive property is enforced by masking the inputs $s_i, \ldots, s_N$ for the $i$-th output vector, i.e. ensuring that the contributions of higher-ordered spins to the output of the network vanish. PixelCNN [18] is such an architecture, and is built as a sequence of masked convolutional layers, whose filters are restricted to having zeros at positions "ahead". For example, in a one dimensional system, a filter of width $R$, where $R$ is odd, would be constrained to have the form $(w_1, \ldots, w_{(R-1)/2}, 0, \ldots, 0)$, and thus the $i$-th output of each layer depends only on the inputs at $s_1, \ldots, s_{i-1}$.

A chief advantage of networks with the autoregressive property is that directly drawing samples according to $P(s)$ is conceptually straightforward. One can sample each $s_i$ in sequence, according to its given conditional probability, which depends just on the previously sampled $(s_1, \ldots, s_{i-1})$. Carefully exploiting the intrinsic sparseness of the network weights further leads to a very efficient algorithm for sampling [19]. Remarkably, the complexity of sampling a full string $s_1 \ldots s_N$ in a PixelCNN architecture can be reduced to the complexity of just a single forward pass.

Our NAQS model for representing wave-functions is based on the same NADE principles so far described. Specifically, just as probability functions can be factorized into a product of conditional probabilities, we represent a normalized wave-function as a product of normalized conditional wave-functions, such that N Ψ(s1, .
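The masking, Softmax normalization, and sequential sampling steps described in the text can be illustrated with a short sketch in plain Python. It is only a toy under stated assumptions and not the architecture of the paper: a single masked linear layer stands in for the PixelCNN stack, the parameters are random, and sampling recomputes each conditional naively instead of using the efficient algorithm of [19]. The symbols `N` and `M` follow the text; `HALF`, `W`, `B`, `logits`, `conditional`, `probability`, and `sample` are hypothetical names introduced for the illustration.

```python
import itertools
import math
import random

N = 6      # number of spins (small, so all M**N configurations can be enumerated)
M = 2      # local dimension: each s_i takes a value in {0, 1}
HALF = 2   # causal taps, i.e. a width-5 filter masked to (w_1, w_2, 0, 0, 0)

init_rng = random.Random(0)
# Assumption: random toy parameters, one causal filter and one bias per local value.
W = [[init_rng.gauss(0.0, 1.0) for _ in range(HALF)] for _ in range(M)]
B = [init_rng.gauss(0.0, 1.0) for _ in range(M)]


def logits(s, i):
    """Output vector v_i; it reads only s[i-HALF], ..., s[i-1] (the masking)."""
    v = []
    for m in range(M):
        acc = B[m]
        for k in range(1, HALF + 1):
            j = i - k
            if j >= 0:  # zero padding before the first site
                acc += W[m][k - 1] * (2 * s[j] - 1)  # map {0, 1} to {-1, +1}
        v.append(acc)
    return v


def softmax(v):
    """Exponentiate and normalize by the l1 norm (max subtracted for stability)."""
    top = max(v)
    e = [math.exp(x - top) for x in v]
    z = sum(e)
    return [x / z for x in e]


def conditional(s, i):
    """p_i(. | s_1, ..., s_{i-1}) as a list of M probabilities."""
    return softmax(logits(s, i))


def probability(s):
    """P(s) as the product of the N normalized conditionals."""
    p = 1.0
    for i in range(N):
        p *= conditional(s, i)[s[i]]
    return p


def sample(rng):
    """Draw one exact, independent sample by fixing s_1, then s_2, and so on."""
    s = []
    for i in range(N):
        s.append(rng.choices(range(M), weights=conditional(s, i))[0])
    return s


if __name__ == "__main__":
    total = sum(probability(list(c)) for c in itertools.product(range(M), repeat=N))
    print("sum of P(s) over all configurations:", total)
    print("one sample:", sample(random.Random(1)))
```

Because every conditional is normalized separately, the product distribution sums to one over all configurations without any partition function being computed, and each call to `sample` is independent of the previous one, so no Markov chain or burn-in is involved in this sketch.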