Variational Analysis of Ising Model Statistical Distributions and Phase Transitions

David Yevick Department of Physics University of Waterloo Waterloo, ON N2L 3G7

Abstract: Variational autoencoders employ an encoding neural network to generate a probabilistic representation of a data set within a low-dimensional space of latent variables, followed by a decoding stage that maps the latent variables back to the original variable space. Once trained, a statistical ensemble of simulated data realizations can be obtained by randomly assigning values to the latent variables and processing these with the decoding section of the network. To determine the accuracy of such a procedure when applied to lattice models, an autoencoder is here trained on a thermal equilibrium distribution of Ising spin realizations. When the output of the decoder for synthetic data is interpreted probabilistically, spin realizations can be generated by randomly assigning spin values according to the computed likelihood. The resulting state distribution in energy-magnetization space then qualitatively resembles that of the training samples. However, because correlations between spins are suppressed, the computed energies are unphysically large for low-dimensional latent variable spaces. Nevertheless, the features of the learned distributions as a function of temperature provide a qualitative indication of the presence of a phase transition and of the distribution of realizations with characteristic cluster sizes.

Keywords: Statistical methods, Ising model, Variational Autoencoders, Phase Transitions

Introduction: Machine learning can simulate the behavior of many physics and engineering systems using standardized and easily adapted program libraries. However, in contrast to standard numerical or analytic methods, which can attain arbitrary accuracy if the system properties and initial conditions are known, machine learning extracts features of a data set through multivariable optimization. Errors therefore arise both from the limited number of data records and from the dependence of the optimization procedure on computational metaparameters such as the number and connectivity of the network computing elements. [1][2][3] On the other hand, if the properties of a physical system are partially indeterminate because of intrinsic stochasticity or measurement inaccuracy, machine learning can yield predictions that are at least as accurate as those of deterministic procedures.

Initial applications of machine learning in condensed matter physics employed principal component analysis [4][5][6][7] and neural networks. [8][9][10][11][12][13] These were followed by increasingly sophisticated applications involving both supervised learning, in which the network learns from sets of input data and output labels [14][15][16][17][18][19][20], and unsupervised learning, in which the network is instead trained on unlabeled data [5,21–29]. Representative supervised learning physics examples include quantum system [30–35] and impurity [36] simulations, predictions of crystal structures [37,38], and density functional approximations. [39] Unsupervised learning procedures include clustering, which classifies data according to perceived similarities, and feature extraction, which projects the data onto a low-dimensional space while retaining its distinguishing properties. These have been successfully applied, often together with additional information such as locality or symmetry, to identify manifest and hidden order parameters and different phases or states. [5,25,40–44]

Additionally, algebraic expressions have been obtained for the symmetries or governing equations associated with a physical system [45–47] by algebraically parametrizing the functional form of the single-neuron output layer of a Siamese neural network [39] or by similarly parametrizing the highest-order principal component of the relevant data records. [48–52] While such procedures in principle enable the identification of novel system properties, they appear difficult to employ if the desired result cannot be expressed as simple combinations of low-order polynomials in the system variables.

This article examines in detail the applicability of variational autoencoders (VAEs) to the Ising model in the vicinity of the phase transition temperature. VAEs are simply implemented with high-level machine learning packages and therefore can be readily adapted to physics problems [53–56][19,57,58]. Conversely, VAE methods have been analyzed and enhanced with statistical mechanics algorithms. [59,60] The VAE is a generative modeling procedure that employs neural networks to encode and then decode the information of the input data set. [61,62] While a VAE employs latent variables similarly to, for example, a restricted Boltzmann machine, which applies simple expectation maximization to a single hidden layer, the VAE transformations between the physical variables and a restricted set of latent variables employ multilayer, differentiable neural networks and a novel cost function. In particular, the VAE input data is mapped to a bottleneck with a small number of nodes (generally 2) through an encoder typically consisting of one or more neural layers with decreasing node numbers. The output of the encoder, which is termed the latent value representation, is then passed through a decoder network with an expanding number of nodes in each layer until the original number of nodes is attained. These neural networks can consist of either standard dense layers or convolutional layers that preserve local features in the data. The VAE is trained by comparing each output record to the original input. Subsequently, simulated data can be generated by providing appropriately distributed random values for the latent variables and evaluating the corresponding output variables.

A single-layer VAE with a linear activation function is analogous to principal component analysis (PCA), which identifies orthogonal linear projections of the data that maximize the variance of each projection. Reconstructing the original data from these projections minimizes the mean squared error, such that Gaussian distributed data yields the smallest residual error. Although easily coded, the PCA requires the eigenvalues and eigenvectors of a matrix formed from the data records and is therefore computationally intensive for large input data sets. This article therefore quantifies the accuracy of VAEs trained on the standard two-dimensional Ising model and investigates the properties of the outputs associated with different regions of the latent variables for a range of temperatures around the phase transition temperature.
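As a point of comparison with the VAE construction discussed below, the following minimal sketch (independent of the codes in the cited references) applies PCA to a set of flattened spin configurations stored row-wise in a NumPy array; the array name `data` and the number of retained components are illustrative.

```python
import numpy as np

def pca(data, n_components=2):
    """Return the leading eigenvalues, eigenvectors and projections of the data covariance."""
    centered = data - data.mean(axis=0)
    # Covariance matrix formed from the data records, as described in the text.
    cov = centered.T @ centered / (len(data) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]    # retain the largest-variance directions
    return eigvals[order], eigvecs[:, order], centered @ eigvecs[:, order]
```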

Variational Autoencoders: As indicated above, the initial stage of a variational autoencoder inputs a set of data records, denoted here by $\mathbf{X}$, into a series of network layers with a diminishing number of neurons termed the encoder. Each data record $\vec{x}$ in the set can be described by a point in a high-dimensional space with $N$ coordinates corresponding to, for example, each pixel in an image or each spin in a lattice. The encoder network with parameters $\{\lambda_e\}$ maps each $\vec{x}$ in $\mathbf{X}$ to a point $\vec{z}$ in a lower-dimensional space of latent variables. This step is performed in a stochastic manner described by a conditional probability distribution $E_{\{\lambda_e\}}(\vec{z}|\vec{x})$. A decoder neural net with an increasing number of neurons in successive layers then maps the latent parameters back to a new point $\vec{x}'$ in the original variable space in a manner similarly represented by $D_{\{\lambda_d\}}(\vec{x}'|\vec{z})$. The parameters $\{\lambda_e\}$ and $\{\lambda_d\}$ of the encoder and decoder are trained by minimizing a loss function, $L(\lambda_e, \lambda_d)$, through gradient descent backpropagation. This function incorporates both the difference between the input data and the output of the encoder followed by the decoder and the deviation of the encoder and decoder functions from compact probability distributions in $\vec{z}$.

Closely separated points in latent variable space then yield decoder outputs with nearly identical features.

To formulate the first objective of the loss function, the distribution of the data records in $\mathbf{X}$ is associated with a probability distribution function $P(\vec{x})$ termed the evidence. The associated information in nats is given by $-\sum_{m=1}^{N} \log P(\vec{x}_m)$, here simply abbreviated as $-\log P(\vec{x})$. Since an arbitrary point $\vec{z}'$ in the latent space will in general not map to a reconstructed point $\vec{x}'$ in the high-dimensional output space for which $P(\vec{x}')$ is large, the loss function should first include the information in the output image corresponding to

$$L^{(1)}(\lambda_e, \lambda_d) = -\,\mathbb{E}_{\vec{z} \in E_{\{\lambda_e\}}(\vec{z}|\vec{x})}\!\left[\log P(\vec{x}')\right] \qquad (1)$$

In Eq. (1) the subscripted expectation symbol refers to an average over the latent data point distribution generated from the input data records according to the encoder probability distribution function. Accordingly, Eq. (1) quantifies the reconstruction fidelity in reproducing the data records in $\mathbf{X}$ from their truncated representation in the latent space. The VAE accomplishes this in part by training the decoder on latent space representations of the actual data (e.g. the subscript in Eq. (1)), which yield large values of $P(\vec{x}')$, rather than decoding random latent space points, which would overwhelmingly generate decoder outputs with negligible $P(\vec{x}')$.

The loss function further biases the latent space points that generate the reconstructed data toward a compact posterior probability distribution, $P(\vec{z}) = D_{\{\lambda_d\}}(\vec{z}|\vec{x}')\,P(\vec{x}')$. Adjacent regions in latent space then yield reconstructed records with slightly different properties, such that if two physical features are associated with two closely separated points in the latent space, positions in the latent space between these points yield synthetic output data that interpolates between the features. Since the encoder is described by a conditional probability distribution $E_{\{\lambda_e\}}(\vec{z}|\vec{x})$ that will be associated later with a Gaussian distribution, this is accomplished by adding a second contribution, $L^{(2)}(\lambda_e, \lambda_d)$, termed the Kullback-Leibler (KL) divergence between $E_{\{\lambda_e\}}(\vec{z}|\vec{x})$ and $D_{\{\lambda_d\}}(\vec{z}|\vec{x}')$, to the loss function with

$$L^{(2)}(\lambda_e, \lambda_d) = \mathbb{E}_{\vec{z} \in E_{\{\lambda_e\}}(\vec{z}|\vec{x})}\!\left[\log E_{\{\lambda_e\}}(\vec{z}|\vec{x}) - \log D_{\{\lambda_d\}}(\vec{z}|\vec{x}')\right] \equiv \mathbb{E}_{\vec{z} \in E_{\{\lambda_e\}}(\vec{z}|\vec{x})}\!\left[\mathrm{KL}\!\left(E_{\{\lambda_e\}}(\vec{z}|\vec{x}) \parallel D_{\{\lambda_d\}}(\vec{z}|\vec{x}')\right)\right] \qquad (2)$$

Eq. (2) expresses in nats the information lost when $E_{\{\lambda_e\}}(\vec{z}|\vec{x})$ is represented by $D_{\{\lambda_d\}}(\vec{z}|\vec{x}')$ and therefore the information distance between the two distributions. The Kullback-Leibler divergence is of the form $-\sum_m p_m \log(q_m/p_m)$ and, since $\log x \le x - 1$, is greater than or equal to $-\sum_m p_m (q_m/p_m - 1) = 0$, with equality only when the $p_m$ and $q_m$ are identical. Accordingly, $L^{(2)}$ is minimized when the decoder inverts the action of the encoder.

To minimize the total loss, $L(\lambda_e, \lambda_d) = L^{(1)}(\lambda_e, \lambda_d) + L^{(2)}(\lambda_e, \lambda_d)$, numerically, the KL divergence is first rewritten in terms of simply evaluated quantities. Applying Bayes' rule in the form $P(\vec{z})\,D_{\{\lambda_d\}}(\vec{x}'|\vec{z}) = D_{\{\lambda_d\}}(\vec{z}|\vec{x}')\,P(\vec{x}')$ yields

$$\log P(\vec{z}) + \log D_{\{\lambda_d\}}(\vec{x}'|\vec{z}) = \log P(\vec{x}') + \log D_{\{\lambda_d\}}(\vec{z}|\vec{x}') \qquad (3)$$

and hence

$$L(\lambda_e, \lambda_d) = \mathbb{E}_{\vec{z} \in E_{\{\lambda_e\}}(\vec{z}|\vec{x})}\!\left[\mathrm{KL}\!\left(E_{\{\lambda_e\}}(\vec{z}|\vec{x}) \parallel P(\vec{z})\right)\right] - \mathbb{E}_{\vec{z} \in E_{\{\lambda_e\}}(\vec{z}|\vec{x})}\!\left[\log D_{\{\lambda_d\}}(\vec{x}'|\vec{z})\right] \qquad (4)$$

In Eq. (4) the structure of the VAE, in which the encoder maps the data in $\vec{x}$ to the latent variable space and the decoder transforms $\vec{z}$ back to a physical space point, is particularly evident. A variational autoencoder typically approximates the posterior of the encoder, $E_{\{\lambda_e\}}(\vec{z}|\vec{x}_i)$, for a given data point $\vec{x}_i$ by a point chosen from a Gaussian probability distribution with a mean value given by $\vec{\mu}(\vec{x}_i, \lambda_e)$ and a variance equal to $\vec{\sigma}^2(\vec{x}_i, \lambda_e)$. Similarly $P(\vec{z})$, the sampling distribution in the space of latent variables of the decoder, is generally equated to a Gaussian function with zero mean and unit variance. As a consequence, input records with similar properties yield localized distributions of latent space values with variances smaller than or of the order of unity. If $E_{\{\lambda_e\}}(\vec{z}|\vec{x}_i)$ and $P(\vec{z})$ are both Gaussian, an analytic evaluation of the KL divergence in Eq. (4) for a $k$-dimensional latent vector space yields

$$\mathbb{E}_{\vec{z} \in E_{\{\lambda_e\}}(\vec{z}|\vec{x}_i)}\!\left[\mathrm{KL}\!\left(E_{\{\lambda_e\}}(\vec{z}|\vec{x}_i) \parallel P(\vec{z})\right)\right] = -\frac{1}{2}\sum_{i=1}^{k}\left(1 + \log \sigma_i^2 - \sigma_i^2 - \mu_i^2\right) \qquad (5)$$

in which $\mu_i$ and $\sigma_i$ are the mean and standard deviation, respectively, of the $i$:th component of the coding (e.g. the $i$:th latent variable component). To incorporate neural network-based encoders and decoders that support backpropagation into the above formalism, the last layer of the encoder, which outputs the latent space variables, is duplicated, with the outputs of one branch associated with either $\sigma_i$ or $\log \sigma_i$ and those of the second branch with $\mu_i$ for each data record. The KL component of the loss function, Eq. (5), is evaluated from these outputs and combined with the standard neural network reconstruction loss. Further, to generate a normalized Gaussian probability distribution for the latent variable coding of the data, the outputs from the two encoder branches are combined with the output of a normalized Gaussian random number generator according to $\vec{z} \to \tilde{\vec{z}}$ with

$$\tilde{z}_i = \mu_i + \sigma_i \cdot \mathrm{Normal}(\sigma = 1, \mu = 0) \qquad (6)$$

before the decoder is applied. In the VAE, all latent variables are local by construction; that is, the latent variables corresponding to one input realization are independent of those of other realizations. Therefore, the loss averaged over the input data can be expressed as a sum of the losses over each realization in the data set. Accordingly, stochastic gradient descent can be applied to the parameters $\lambda_e$, $\lambda_d$, $\vec{\sigma}$, $\vec{\mu}$, which can be adjusted for each data point or minibatch or can instead be shared across all data points.

Once the VAE is trained, a synthetic output realization can be generated simply by selecting any meaningful set of values $z_i$ for the decoder inputs and calculating the resulting output distribution. If the VAE is trained on sets of differing discrete images, points in latent vector space will generate images that interpolate between the elements of the sets. In standard machine learning applications, this property of the VAE can be employed, for example, to generate facial images that are not associated with a particular person.
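The reparameterization step of Eq. (6) and the analytic Gaussian KL term of Eq. (5) can be written compactly with a high-level package. The following is a minimal sketch assuming a tf.keras implementation; the layer and function names are illustrative rather than taken from the code used for the calculations below.

```python
import tensorflow as tf

class Sampling(tf.keras.layers.Layer):
    """Draw z = mu + sigma * epsilon with epsilon ~ Normal(0, 1), as in Eq. (6)."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

def kl_loss(z_mean, z_log_var):
    """Analytic KL divergence of Eq. (5) between N(mu, sigma^2) and N(0, 1)."""
    return -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.exp(z_log_var) - tf.square(z_mean), axis=-1)
```

In training, this KL term is added to the reconstruction loss to form the total loss of Eq. (4).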

Computational Results: This paper considers the classical two-dimensional ferromagnetic Ising model on a periodic lattice of 8 × 8 unit spins described here by the Hamiltonian

$$H = \frac{J}{2} \sum_{i,j \,\in\, \text{nearest neighbors}} S_i S_j \qquad (7)$$

with spin variables $S_i = \pm 1$. This small system size enables numerous calculations to be performed with limited computational resources and has therefore been consistently employed as a standard benchmark for the application of neural network procedures to spin systems. [48,63,64] In the remainder of this paper, the temperature and energy will be expressed in normalized units $k_b T/J$ and $2E/J$ so that the value of $J$ does not enter the numerical results. Further, energies are referred to the energy of the antiferromagnetic state, as spin states at higher temperatures sample both the ferromagnetic and antiferromagnetic regions. For an infinite system, the $Z_2$ inversion symmetry is broken below $T = 2.269$, where the system transitions from a disordered to an ordered phase.

This paper examines two implementations of the VAE. While the synthetic output distributions generated by the decoder for different values of the latent variables are highly dependent on the layer architecture, as will be evident from the results of this paper, their general characteristics are effectively identical. The models, however, employ a relatively small number of network parameters to avoid overfitting, which can lead to divergences or unphysical asymmetries in the energy-magnetization space distributions of the synthetic states. At each temperature, the VAE was trained with 400,000 Ising model spin configurations generated with the Wolff cluster reversal procedure. [65] All VAE layers employed rectified linear (relu) activation functions except for the latent variable layers, which instead employ linear activation functions, and the final sigmoid layer. Additional calculations indicated that replacing the rectified linear units with other standard activation functions did not significantly affect the results.
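For concreteness, a minimal NumPy sketch of the nearest-neighbor sum entering Eq. (7) and of the magnetization for a periodic configuration is given below; the conversion to the normalized energy $2E/J$ and the offset to the antiferromagnetic state described above are left to the caller, and the function name is illustrative.

```python
import numpy as np

def bond_sum_and_magnetization(spins):
    """spins: two-dimensional array of +/-1 values on a periodic lattice."""
    # Shifting along each axis counts every nearest-neighbor bond exactly once.
    bonds = (spins * np.roll(spins, 1, axis=0)).sum() + \
            (spins * np.roll(spins, 1, axis=1)).sum()
    return bonds, spins.sum()

# Example: a random 8 x 8 configuration
rng = np.random.default_rng(0)
config = rng.choice([-1, 1], size=(8, 8))
print(bond_sum_and_magnetization(config))
```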

In the dense layer (DL) implementation the 8 × 8 input spin values are converted from ±1 to 0, 1 and, after flattening into a 64 × 1 vector, are input into a dense 32-neuron layer followed by a dense 8-neuron layer and finally a 16-neuron layer, which was found to enhance the quality of the results. The subsequent latent variable layer possesses the standard number, $N = 2$, of neurons unless otherwise stated. Finally, the matched decoder consists of successive dense 16-, 8- and 32-neuron layers. To interpret the latent decoder output as a probability, the final 64-neuron layer employs a sigmoid activation function to output a number between 0 and 1 for each array element, which is finally reshaped into an 8 × 8 two-dimensional array.
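The DL architecture just described might be assembled as in the following hedged sketch, which assumes tf.keras and reuses the `Sampling` layer from the earlier sketch; the optimizer, initializers and other unstated training details are therefore omitted.

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 2  # the standard number of latent neurons, N = 2

# Encoder: 64 -> 32 -> 8 -> 16 -> (mu, log sigma^2)
enc_in = tf.keras.Input(shape=(8, 8))
x = layers.Flatten()(enc_in)
x = layers.Dense(32, activation="relu")(x)
x = layers.Dense(8, activation="relu")(x)
x = layers.Dense(16, activation="relu")(x)
z_mean = layers.Dense(latent_dim)(x)        # linear activations for the latent branches
z_log_var = layers.Dense(latent_dim)(x)
z = Sampling()([z_mean, z_log_var])         # Sampling layer from the earlier sketch
dl_encoder = tf.keras.Model(enc_in, [z_mean, z_log_var, z])

# Decoder: latent -> 16 -> 8 -> 32 -> 64 sigmoid probabilities -> 8 x 8
dec_in = tf.keras.Input(shape=(latent_dim,))
y = layers.Dense(16, activation="relu")(dec_in)
y = layers.Dense(8, activation="relu")(y)
y = layers.Dense(32, activation="relu")(y)
y = layers.Dense(64, activation="sigmoid")(y)
dl_decoder = tf.keras.Model(dec_in, layers.Reshape((8, 8))(y))
```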

The convolutional layer (CL) procedure instead inserts the 8 × 8 array of input values into a single two-dimensional convolutional layer with sixteen 3 × 3 filters that employs a stride of two and “same” zero padding, resulting in 16 filtered 4 × 4 outputs. These are subsequently flattened into a single vector that is coupled to the latent layer. The decoder employs a 16-neuron dense layer to generate 16 values from the latent layer output that are afterwards reshaped into a 4 × 4 two-dimensional array. A transposed two-dimensional convolutional layer with 16 filters, ‘same’ padding and a stride of 2 then inverts the corresponding encoder layer and is followed by an 8 × 8 single-filter two-dimensional convolution layer with sigmoid activation functions.
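A corresponding hedged sketch of the CL architecture is given below, again assuming tf.keras and reusing `Sampling`, `layers` and `latent_dim` from the earlier sketches; the 3 × 3 kernel of the final single-filter convolution is an assumption, since only its 8 × 8 output is specified in the text.

```python
# Encoder: 8 x 8 x 1 input -> Conv2D(16 filters, 3 x 3, stride 2, "same") -> flatten -> latent
conv_in = tf.keras.Input(shape=(8, 8, 1))
c = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(conv_in)    # 4 x 4 x 16
c = layers.Flatten()(c)
z_mean = layers.Dense(latent_dim)(c)
z_log_var = layers.Dense(latent_dim)(c)
z = Sampling()([z_mean, z_log_var])
cl_encoder = tf.keras.Model(conv_in, [z_mean, z_log_var, z])

# Decoder: latent -> 16 values reshaped to 4 x 4 -> transposed convolution -> sigmoid output
d_in = tf.keras.Input(shape=(latent_dim,))
d = layers.Dense(16, activation="relu")(d_in)
d = layers.Reshape((4, 4, 1))(d)
d = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(d)  # 8 x 8 x 16
d_out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(d)                # 8 x 8 x 1
cl_decoder = tf.keras.Model(d_in, d_out)
```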

Two procedures were employed to convert the continuous output of the final sigmoid activation layer into a discrete spin distribution. In the first of these, a +1 spin was assigned to each array element (lattice site) with a value above 0.5 and the remaining elements were set to −1. The second technique generates a random number at each site and assigns the +1 spin value to the sites for which the value output by the neural network at the site is larger than the associated random value.
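The two conversion rules can be stated in a few lines of NumPy, where `probs` denotes the 8 × 8 array of sigmoid outputs produced by the decoder; the function names are illustrative.

```python
import numpy as np

def spins_deterministic(probs):
    """Assign +1 where the output probability exceeds 0.5, otherwise -1 (first procedure)."""
    return np.where(probs > 0.5, 1, -1)

def spins_probabilistic(probs, rng=None):
    """Assign +1 at each site with probability given by the network output (second procedure)."""
    if rng is None:
        rng = np.random.default_rng()
    return np.where(rng.random(probs.shape) < probs, 1, -1)
```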

To examine the energy-magnetization distribution of the VAE synthetic realizations, a CL VAE was first trained for 130 epochs on the Wolff realizations at $T = 3.5$, for which the spin configurations sample both the disordered and ordered phases to an approximately equal extent. As discussed in detail in [66–69], not only is the distribution at this temperature particularly sensitive to numerical modeling error, but its qualitative features for small spin systems are effectively identical to those exhibited by larger spin systems at corresponding temperatures.

Accordingly, 200,000 spin configurations were generated with the trained VAE by assigning random values from a unit normal distribution to each coordinate of the two-dimensional latent parameter space. The joint histograms constructed from the energies and magnetizations of the spin states generated from the associated decoder outputs, when positive and negative spins are assigned to the output neurons with values greater and less than 0.5 respectively (the first procedure of the previous paragraph), are given by the thick lines in Figure 1. The thin lines in the figure represent the corresponding distribution associated with the input states. While the energies of the synthetic states are clearly lower than those of the input realizations, the magnetization distribution is reasonably well reconstructed. Assigning instead probabilities given by the VAE output variables according to the second procedure yields the diagram on the left side of Figure 2. The input distribution is then qualitatively reproduced, but the state energies are displaced toward larger values. While synthetic output distributions generated from the VAE have been previously displayed, the essential information associated with the energy-magnetization diagram was not presented. [70,71]
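The generation of the synthetic ensemble described above could proceed along the lines of the following sketch, which reuses `cl_decoder`, `spins_probabilistic` and `bond_sum_and_magnetization` from the earlier sketches; the histogram binning is illustrative, and the bond sums stand in for the energies up to the normalization conventions discussed earlier.

```python
import numpy as np

n_samples = 200_000                       # as in the text; reduce for a quick test
rng = np.random.default_rng(1)
z = rng.normal(size=(n_samples, 2))       # unit normal values for the two latent coordinates
probs = cl_decoder.predict(z, batch_size=4096)[..., 0]   # (n_samples, 8, 8) spin probabilities

energies, magnetizations = [], []
for p in probs:
    s = spins_probabilistic(p, rng)
    e, m = bond_sum_and_magnetization(s)
    energies.append(e)
    magnetizations.append(m)

hist, e_edges, m_edges = np.histogram2d(energies, magnetizations, bins=(33, 65))
```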

Similar behavior has been observed in both the PCA [48] and restricted Boltzmann machine (RBM) [64] approaches for limited numbers of principal components and hidden neurons, respectively. Indeed, since the number of states increases with energy, the probability of generating a state of higher energy when the local correlations of the spin system are inaccurately modeled exceeds that of generating a lower energy state. Increasing the number of hidden-layer neurons in the RBM or the number of high-order and hence small spatial frequency components in the PCA markedly improved the reconstruction accuracy, and therefore the energy-magnetization distribution of the synthetically generated system realizations, in the above references. Similarly, the spatial resolution of the reconstructed states in the VAE improves monotonically with the VAE latent space dimension, as evident from the middle and rightmost diagrams in Figure 2, which employ 3- and 10-dimensional latent spaces, respectively.

As the VAE maps input distributions with similar structures to closely separated regions in the latent vector space, synthetic system realizations constructed from a grid of points covering the latent vector space display a series of continuously changing configurations with evolving physical properties. This set of realizations is analogous to a basis into which the input realizations are decomposed. Consequently, information regarding the nature of a phase transition can be inferred from the physical properties of these synthetic states. The four diagrams in Figure 3, for example, display a 10 × 10 array where the $(n, m)$:th array element consists of the 8 × 8 output probability distribution generated with the convolutional layer (CL) decoder applied to latent coordinates for which the cumulative probability function associated with the normal Gaussian distribution, $\mathrm{Normal}(\sigma = 1, \mu = 0)$, equals $0.05 + 0.1n$ and $0.05 + 0.1m$, respectively. In python, these values form a two-dimensional grid with points at grid_x, grid_y = [ norm.ppf( np.linspace( 0.05, 0.95, 10 ) ) ] * 2; a sketch of this construction is given after the following paragraph. The probability distributions are obtained from the sigmoid activation function values for the output neurons and therefore correspond to the positive spin probability at each lattice site. The diagram in the upper left corner is calculated at $T = 2.0$ while the diagrams in the upper right, lower left, and lower right quadrants are obtained at $T = 2.375$, $T = 2.5$ and $T = 3.0$, respectively ($T = 2.375$ approximates the temperature at which the 8 × 8 spin system exhibits a specific heat maximum).

The structure of the 100 spin distributions in each subfigure of Figure 3 becomes more visible if, as in Figure 1, a spin of +1 is assigned to each output location for which the output probability is larger than 0.5 and a spin of −1 to the remaining lattice locations. This yields the four diagrams of Figure 4 together with the magnetization and energy distributions displayed in Figure 5 and Figure 6, respectively. The distributions generated by points that are adjacent in latent coordinates differ incrementally, as can be seen by following either a row or a column of any of the four diagrams in these sets of figures. Figure 4 possesses the additional feature that below the critical temperature, the synthetic realizations are dominated by two regions of highly magnetized states separated by a narrow region of partially magnetized states. Near the transition temperature, the region of partially magnetized states widens and the spin distributions in the intermediate region interpolate between a large cluster of one spin orientation and a large cluster of the opposite orientation. The CL VAE therefore transforms a spin distribution into patterns with high frequency spatial components mediated by the small diameter clusters. The wide regions of intermediate cluster sizes at higher temperatures further indicate that a broad spectrum of spin patterns is present.
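Expanding the `norm.ppf` snippet quoted above, the latent-grid construction used for Figures 3 and 4 might be sketched as follows, with `cl_decoder` referring to the convolutional decoder sketched earlier.

```python
import numpy as np
from scipy.stats import norm

# 10 x 10 grid of latent points equally spaced in the cumulative normal probability
grid_x, grid_y = [norm.ppf(np.linspace(0.05, 0.95, 10))] * 2

prob_maps = np.empty((10, 10, 8, 8))
for n, zx in enumerate(grid_x):
    for m, zy in enumerate(grid_y):
        z = np.array([[zx, zy]])
        prob_maps[n, m] = cl_decoder.predict(z, verbose=0)[0, ..., 0]   # Figure 3 panels

spin_maps = np.where(prob_maps > 0.5, 1, -1)   # thresholded panels as in Figure 4
```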

The patterns generated by points in the latent value space are dependent on the internal structure of the VAE. For example, Figure 7 displays the probability distributions corresponding to those of Figure 3 but for the DL VAE. Since in the DL procedure the 64 inputs are first flattened into a single vector and the original 8 × 8 lattice is only reconstructed in the last layer, the ordered pattern of the output of a CL VAE is replaced by a series of vertical stripes. However, Figure 8, which depicts the DL VAE energy distribution analogous to the CL result of Figure 6, demonstrates that the region of latent vector space associated with intermediate magnetizations again increases rapidly with increasing temperature close to and above the critical transition temperature, reproducing the qualitative behavior of the CL network.

Conclusions and Future Directions: Although the RBM, PCA and VAE generative algorithms have been successfully applied to image manipulation, this and previous papers have demonstrated that they do not accurately reproduce statistical quantities such as the specific heat in lattice theory applications unless the dimensionality, and hence the number of degrees of freedom, of the intermediate representation (e.g. the number of hidden neurons, principal components or latent space variables) is comparable to the number of lattice sites. The lack of accuracy is especially apparent when contrasted with optimized Monte Carlo or renormalization group-based approaches. [68,72–91] On the other hand, even for low-dimensional intermediate representations, neural networks reproduce the qualitative behavior of the spin system. The latent representation of a VAE can further, as shown here, indicate the existence of clusters of different scales and the extent to which they contribute to a statistical system at a given temperature. While outside the scope of this paper, this result, together with the known ability of the VAE to distinguish images of multiple categories, indicates that in physical systems with multiple competing phases the VAE will assign each phase to a separate latent vector space region. The critical temperature for phase transitions and the distribution of cluster sizes for the various phases could then be estimated from the growth with temperature of the width of the boundary layer between the locations of the different phase regions.

An examination of the synthetic VAE realizations for 3 or more latent space variables could also be of further interest, since the agreement between the energy-magnetization distributions of the synthetic and training data was here found to be far closer than in the standard two latent variable case. In addition, the analysis of systems with 3 or more coexisting phases could be significantly improved if the differing phase regions are more distinctly separated in higher dimensional latent spaces.

Acknowledgements: The Natural Sciences and Engineering Research Council of Canada (NSERC) is acknowledged for financial support.


References:

[1] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, L. Zdeborová, Machine learning and the physical sciences, Reviews of Modern Physics. 91 (2019) 045002. https://doi.org/10.1103/RevModPhys.91.045002.

[2] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, Cambridge, MA, 2016.

[3] L. Zdeborová, Machine learning: New tool in the box, Nature Physics. 13 (2017) 420–421. https://doi.org/10.1038/nphys4053.

[4] K. Pearson, LIII. On lines and planes of closest fit to systems of points in space, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. 2 (1901) 559–572. https://doi.org/10.1080/14786440109462720.

[5] L. Wang, Discovering phase transitions with unsupervised learning, Physical Review B. 94 (2016). https://doi.org/10.1103/PhysRevB.94.195105.

[6] H. Kiwata, Deriving the order parameters of a spin-glass model using principal component analysis, Physical Review E. 99 (2019). https://doi.org/10.1103/PhysRevE.99.063304.

[7] D. Lozano-Gómez, D. Pereira, M.J.P. Gingras, Unsupervised Machine Learning of Quenched Gauge Symmetries: A Proof-of-Concept Demonstration, n.d.

[8] A. Morningstar, R.G. Melko, Deep learning the ising model near criticality, Journal of Machine Learning Research. 18 (2018).

[9] G. Torlai, R.G. Melko, Learning thermodynamics with Boltzmann machines, Physical Review B. 94 (2016). https://doi.org/10.1103/PhysRevB.94.165134.

[10] K. Ch’Ng, J. Carrasquilla, R.G. Melko, E. Khatami, Machine learning phases of strongly correlated fermions, Physical Review X. 7 (2017). https://doi.org/10.1103/PhysRevX.7.031038.

[11] R. Zhang, B. Wei, D. Zhang, J.J. Zhu, K. Chang, Few-shot machine learning in the three-dimensional Ising model, Physical Review B. 99 (2019). https://doi.org/10.1103/PhysRevB.99.094427.

[12] N. Yoshioka, Y. Akagi, H. Katsura, Transforming generalized Ising models into Boltzmann machines, Physical Review E. 99 (2019). https://doi.org/10.1103/PhysRevE.99.032113.

[13] C. Giannetti, B. Lucini, D. Vadacchino, Machine Learning as a universal tool for quantitative investigations of phase transitions, Nuclear Physics B. 944 (2019). https://doi.org/10.1016/j.nuclphysb.2019.114639.

[14] S. Lloyd, M. Mohseni, P. Rebentrost, Quantum algorithms for supervised and unsupervised machine learning, (2013).

[15] S. Efthymiou, M.J.S. Beach, R.G. Melko, Super-resolving the Ising model with convolutional neural networks, Physical Review B. 99 (2019) 075113. https://doi.org/10.1103/PhysRevB.99.075113.


[16] C. Casert, T. Vieijra, J. Nys, J. Ryckebusch, Interpretable machine learning for inferring the phase boundaries in a nonequilibrium system, Physical Review E. 99 (2019). https://doi.org/10.1103/PhysRevE.99.023304.

[17] T. Ohtsuki, T. Mano, Drawing phase diagrams of random quantum systems by deep learning the wave functions, Journal of the Physical Society of Japan. 89 (2020). https://doi.org/10.7566/JPSJ.89.022001.

[18] R. Zhang, B. Wei, D. Zhang, J.J. Zhu, K. Chang, Few-shot machine learning in the three-dimensional Ising model, Physical Review B. 99 (2019). https://doi.org/10.1103/PhysRevB.99.094427.

[19] Q. Ni, M. Tang, Y. Liu, Y.C. Lai, Machine learning dynamical phase transitions in complex networks, Physical Review E. 100 (2019). https://doi.org/10.1103/PhysRevE.100.052312.

[20] M.J.S. Beach, A. Golubeva, R.G. Melko, Machine learning vortices at the Kosterlitz-Thouless transition, Physical Review B. 97 (2018). https://doi.org/10.1103/PhysRevB.97.045207.

[21] O. Balabanov, M. Granath, Unsupervised learning using topological data augmentation, Physical Review Research. 2 (2020). https://doi.org/10.1103/physrevresearch.2.013354.

[22] S. Durr, S. Chakravarty, Unsupervised learning eigenstate phases of matter, Physical Review B. 100 (2019). https://doi.org/10.1103/PhysRevB.100.075102.

[23] J.F. Rodriguez-Nieva, M.S. Scheurer, Identifying topological order through unsupervised machine learning, Nature Physics. 15 (2019) 790–795. https://doi.org/10.1038/s41567-019-0512-x.

[24] H.Y. Kwon, N.J. Kim, C.K. Lee, C. Won, Searching magnetic states using an unsupervised machine learning algorithm with the Heisenberg model, Physical Review B. 99 (2019). https://doi.org/10.1103/PhysRevB.99.024423.

[25] S.J. Wetzel, Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders, Physical Review E. 96 (2017). https://doi.org/10.1103/PhysRevE.96.022140.

[26] T. Wu, M. Tegmark, Toward an artificial intelligence physicist for unsupervised learning, Physical Review E. 100 (2019). https://doi.org/10.1103/PhysRevE.100.033311.

[27] X. Xu, Q. Wei, H. Li, Y. Wang, Y. Chen, Y. Jiang, Recognition of polymer configurations by unsupervised learning, Physical Review E. 99 (2019). https://doi.org/10.1103/PhysRevE.99.043307.

[28] D. Lozano-Gómez, D. Pereira, M.J.P. Gingras, Unsupervised Machine Learning of Quenched Gauge Symmetries: A Proof-of-Concept Demonstration, n.d.

[29] Y.H. Liu, E.P.L. van Nieuwenburg, Discriminative Cooperative Networks for Detecting Phase Transitions, Physical Review Letters. 120 (2018). https://doi.org/10.1103/PhysRevLett.120.176401.


[30] M. Matty, Y. Zhang, Z. Papić, E.A. Kim, Multifaceted machine learning of competing orders in disordered interacting systems, Physical Review B. 100 (2019). https://doi.org/10.1103/PhysRevB.100.155141.

[31] J. Carrasquilla, G. Torlai, R.G. Melko, L. Aolita, Reconstructing quantum states with generative models, Nature Machine Intelligence. 1 (2019) 155–161. https://doi.org/10.1038/s42256-019-0028-1.

[32] T. Vieijra, C. Casert, J. Nys, W. De Neve, J. Haegeman, J. Ryckebusch, F. Verstraete, Restricted Boltzmann Machines for Quantum States with Non-Abelian or Anyonic Symmetries, Physical Review Letters. 124 (2020) 097201. https://doi.org/10.1103/PhysRevLett.124.097201.

[33] Y.A. Kharkov, V.E. Sotskov, A.A. Karazeev, E.O. Kiktenko, A.K. Fedorov, Revealing quantum chaos with machine learning, Physical Review B. 101 (2020). https://doi.org/10.1103/PhysRevB.101.064406.

[34] R.A. Vargas-Hernández, J. Sous, M. Berciu, R. V. Krems, Extrapolating Quantum Observables with Machine Learning: Inferring Multiple Phase Transitions from Properties of a Single Phase, Physical Review Letters. 121 (2018). https://doi.org/10.1103/PhysRevLett.121.255702.

[35] T. Ohtsuki, T. Mano, Drawing phase diagrams of random quantum systems by deep learning the wave functions, Journal of the Physical Society of Japan. 89 (2020). https://doi.org/10.7566/JPSJ.89.022001.

[36] L.F. Arsenault, A. Lopez-Bezanilla, O.A. Von Lilienfeld, A.J. Millis, Machine learning for many-body physics: The case of the Anderson impurity model, Physical Review B - Condensed Matter and Materials Physics. 90 (2014) 155136. https://doi.org/10.1103/PhysRevB.90.155136.

[37] S. Curtarolo, D. Morgan, K. Persson, J. Rodgers, G. Ceder, Predicting crystal structures with data mining of quantum calculations, Physical Review Letters. 91 (2003). https://doi.org/10.1103/PhysRevLett.91.135503.

[38] C.S. Adorf, T.C. Moore, Y.J.U. Melle, S.C. Glotzer, Analysis of Self-Assembly Pathways with Unsupervised Machine Learning Algorithms, Journal of Physical Chemistry B. 124 (2020) 69–78. https://doi.org/10.1021/acs.jpcb.9b09621.

[39] J.C. Snyder, M. Rupp, K. Hansen, K.R. Müller, K. Burke, Finding density functionals with machine learning, Physical Review Letters. 108 (2012). https://doi.org/10.1103/PhysRevLett.108.253002.

[40] E.P.L. Van Nieuwenburg, Y.H. Liu, S.D. Huber, Learning phase transitions by confusion, Nature Physics. 13 (2017) 435–439. https://doi.org/10.1038/nphys4037.

[41] K. Liu, J. Greitemann, L. Pollet, Learning multiple order parameters with interpretable machines, Physical Review B. 99 (2019). https://doi.org/10.1103/PhysRevB.99.104410.

[42] C. Giannetti, B. Lucini, D. Vadacchino, Machine Learning as a universal tool for quantitative investigations of phase transitions, Nuclear Physics B. 944 (2019). https://doi.org/10.1016/j.nuclphysb.2019.114639.


[43] M.J.S. Beach, A. Golubeva, R.G. Melko, Machine learning vortices at the Kosterlitz-Thouless transition, Physical Review B. 97 (2018). https://doi.org/10.1103/PhysRevB.97.045207.

[44] H. Kiwata, Deriving the order parameters of a spin-glass model using principal component analysis, Physical Review E. 99 (2019). https://doi.org/10.1103/PhysRevE.99.063304.

[45] A. Morningstar, R.G. Melko, Deep learning the ising model near criticality, Journal of Machine Learning Research. 18 (2018).

[46] S.J. Wetzel, R.G. Melko, J. Scott, M. Panju, V. Ganesh, Discovering Symmetry Invariants and Conserved Quantities by Interpreting Siamese Neural Networks, (2020).

[47] S.J. Wetzel, M. Scherzer, Machine learning of explicit order parameters: From the Ising model to SU(2) lattice gauge theory, Physical Review B. 96 (2017). https://doi.org/10.1103/PhysRevB.96.184410.

[48] D. Yevick, Conservation laws and spin system modeling through principal component analysis, Computer Physics Communications. 262 (2021). https://doi.org/10.1016/j.cpc.2021.107832.

[49] C. Wang, H. Zhai, Machine learning of frustrated classical spin models. I. Principal component analysis, Physical Review B. 96 (2017). https://doi.org/10.1103/PhysRevB.96.144432.

[50] C. Wang, H. Zhai, Machine learning of frustrated classical spin models (II): Kernel principal component analysis, Frontiers of Physics. 13 (2018). https://doi.org/10.1007/s11467-018-0798-7.

[51] S.J. Wetzel, Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders, Physical Review E. 96 (2017). https://doi.org/10.1103/PhysRevE.96.022140.

[52] H. Kiwata, Deriving the order parameters of a spin-glass model using principal component analysis, Physical Review E. 99 (2019). https://doi.org/10.1103/PhysRevE.99.063304.

[53] T. Heimel, G. Kasieczka, T. Plehn, J. Thompson, QCD or what?, SciPost Physics. 6 (2019). https://doi.org/10.21468/scipostphys.6.3.030.

[54] D. Wu, L. Wang, P. Zhang, Solving Statistical Mechanics Using Variational Autoregressive Networks, Physical Review Letters. 122 (2019). https://doi.org/10.1103/PhysRevLett.122.080602.

[55] S.J. Wetzel, Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders, Physical Review E. 96 (2017). https://doi.org/10.1103/PhysRevE.96.022140.

[56] S. Efthymiou, M.J.S. Beach, R.G. Melko, Super-resolving the Ising model with convolutional neural networks, Physical Review B. 99 (2019) 075113. https://doi.org/10.1103/PhysRevB.99.075113.

[57] A. Rocchetto, S. Aaronson, S. Severini, G. Carvacho, D. Poderini, I. Agresti, M. Bentivegna, F. Sciarrino, Experimental learning of quantum states, (2017). http://arxiv.org/abs/1712.00127 (accessed March 31, 2021).


[58] Z. Liu, S.P. Rodrigues, W. Cai, Simulating the Ising Model with a Deep Convolutional Generative Adversarial Network, (2017). http://arxiv.org/abs/1710.04987 (accessed April 16, 2020).

[59] A. Alemi, A. Abbara, Exponential capacity in an autoencoder neural network with a hidden layer, ArXiv. (2017).

[60] A.A. Alemi, I. Fischer, J. v. Dillon, K. Murphy, Deep variational information bottleneck, in: 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings, International Conference on Learning Representations, ICLR, 2017.

[61] D.P. Kingma, M. Welling, Auto-encoding variational bayes, in: 2nd International Conference on Learning Representations, ICLR 2014 - Conference Track Proceedings, International Conference on Learning Representations, ICLR, 2014.

[62] D.J. Rezende, S. Mohamed, D. Wierstra, Stochastic backpropagation and approximate inference in deep generative models, in: 31st International Conference on Machine Learning, ICML 2014, International Machine Learning Society (IMLS), 2014: pp. 3057–3070.

[63] G. Torlai, R.G. Melko, Learning thermodynamics with Boltzmann machines, Physical Review B. 94 (2016). https://doi.org/10.1103/PhysRevB.94.165134.

[64] D. Yevick, R. Melko, The accuracy of restricted Boltzmann machine models of Ising systems, Computer Physics Communications. 258 (2021). https://doi.org/10.1016/j.cpc.2020.107518.

[65] J.S. Wang, R.H. Swendsen, Cluster Monte Carlo algorithms, Physica A: Statistical Mechanics and Its Applications. (1990). https://doi.org/10.1016/0378-4371(90)90275-W.

[66] D. Yevick, Y.H. Lee, A cluster controller for transition matrix calculations, International Journal of Modern Physics C. 30 (2019). https://doi.org/10.1142/S0129183119500190.

[67] D. Yevick, Y.H. Lee, Dynamic canonical and microcanonical transition matrix analyses of critical behavior, European Physical Journal B. 90 (2017). https://doi.org/10.1140/epjb/e2017-70747-x.

[68] D. Yevick, A projected entropy controller for transition matrix calculations, European Physical Journal B. 91 (2018). https://doi.org/10.1140/epjb/e2018-90171-0.

[69] Y.H. Lee, D. Yevick, Renormalized multicanonical sampling in multiple dimensions, Physical Review E. 94 (2016). https://doi.org/10.1103/PhysRevE.94.043323.

[70] S.J. Wetzel, Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders, Physical Review E. 96 (2017) 022140. https://doi.org/10.1103/PhysRevE.96.022140.

[71] P. Mehta, M. Bukov, C.H. Wang, A.G.R. Day, C. Richardson, C.K. Fisher, D.J. Schwab, A high-bias, low-variance introduction to Machine Learning for physicists, Physics Reports. 810 (2019) 1–124. https://doi.org/10.1016/j.physrep.2019.03.001.

[72] D. Yevick, Accelerated rare event sampling, International Journal of Modern Physics C. 27 (2016). https://doi.org/10.1142/S0129183116500418.


[73] D. Yevick, Renormalized multicanonical sampling, International Journal of Modern Physics C. 27 (2016). https://doi.org/10.1142/S0129183116500339.

[74] D. Yevick, J. Thompson, Accuracy and efficiency of simplified tensor network codes, ArXiv. (2019).

[75] T. Lu, D. Yevick, Efficient multicanonical algorithms, IEEE Photonics Technology Letters. 17 (2005). https://doi.org/10.1109/LPT.2004.843244.

[76] T. Lu, D. Yevick, Biased multicanonical sampling, IEEE Photonics Technology Letters. 17 (2005). https://doi.org/10.1109/LPT.2005.849981.

[77] D. Yevick, T. Lu, Improved multicanonical algorithms, Journal of the Optical Society of America A: Optics and Image Science, and Vision. 23 (2006). https://doi.org/10.1364/JOSAA.23.002912.

[78] J. Chen, S. Cheng, H. Xie, L. Wang, T. Xiang, Equivalence of restricted Boltzmann machines and tensor network states, Physical Review B. 97 (2018). https://doi.org/10.1103/PhysRevB.97.085104.

[79] E.M. Stoudenmire, D.J. Schwab, Supervised Learning with Quantum-Inspired Tensor Networks, (2016). http://arxiv.org/abs/1605.05775 (accessed April 16, 2020).

[80] C. Roberts, A. Milsted, M. Ganahl, A. Zalcman, B. Fontaine, Y. Zou, J. Hidary, G. Vidal, S. Leichenauer, TensorNetwork: A Library for Physics and Machine Learning, ArXiv. (2019).

[81] J. Liu, Y. Qi, Z.Y. Meng, L. Fu, Self-learning Monte Carlo method, Physical Review B. 95 (2017). https://doi.org/10.1103/PhysRevB.95.041101.

[82] R.H. Swendsen, Monte carlo renormalization group, Physical Review Letters. 42 (1979) 859–861. https://doi.org/10.1103/PhysRevLett.42.859.

[83] J.C. Walter, G.T. Barkema, An introduction to Monte Carlo methods, Physica A: Statistical Mechanics and Its Applications. 418 (2015) 78–87. https://doi.org/10.1016/j.physa.2014.06.014.

[84] L. Wang, Exploring cluster Monte Carlo updates with Boltzmann machines, Physical Review E. 96 (2017). https://doi.org/10.1103/PhysRevE.96.051301.

[85] L. Huang, L. Wang, Accelerated Monte Carlo simulations with restricted Boltzmann machines, Physical Review B. 95 (2017). https://doi.org/10.1103/PhysRevB.95.035105.

[86] H.W.J. Blöte, Y. Deng, Cluster Monte Carlo simulation of the transverse Ising model, Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics. 66 (2002) 8. https://doi.org/10.1103/PhysRevE.66.066110.

[87] T.A. Bojesen, Policy-guided Monte Carlo: Reinforcement-learning Markov chain dynamics, Physical Review E. 98 (2018). https://doi.org/10.1103/PhysRevE.98.063303.

[88] Y. Nagai, M. Okumura, A. Tanaka, Self-learning Monte Carlo method with Behler-Parrinello neural networks, Physical Review B. 101 (2020). https://doi.org/10.1103/physrevb.101.115111.


[89] S. Li, P.M. Dee, E. Khatami, S. Johnston, Accelerating lattice quantum Monte Carlo simulations using artificial neural networks: Application to the Holstein model, Physical Review B. 100 (2019). https://doi.org/10.1103/PhysRevB.100.020302.

[90] E.M. Inack, G.E. Santoro, L. Dell’Anna, S. Pilati, Projective quantum Monte Carlo simulations guided by unrestricted neural network states, Physical Review B. 98 (2018). https://doi.org/10.1103/PhysRevB.98.235145.

[91] J. Liu, H. Shen, Y. Qi, Z.Y. Meng, L. Fu, Self-learning Monte Carlo method and cumulative update in fermion systems, Physical Review B. 95 (2017). https://doi.org/10.1103/PhysRevB.95.241104.

Figure 1: The joint energy-magnetization distribution of the input 8 × 8 Ising model data for $T = 3.5$ (dashed lines) compared to the result of a VAE with 2 latent variables, where the VAE distribution is generated with the deterministic procedure described in the text.


Figure 2: The joint energy-magnetization distribution of the input 8 × 8 Ising model data at $T = 3.5$ (dashed lines) compared to the result of a VAE with 2, 3 and 10 latent variables (left, center and right diagrams, respectively), where the VAE results were generated with the probabilistic procedure described in the text.


Figure 3: The 8 × 8 synthetic convolutional layer (CL) VAE probability distributions for a 10 × 10 set of values in the latent vector space that are equally spaced in the cumulative probability function of the normal Gaussian distribution. The temperatures are $T = 2.0$, 2.375, 2.5 and 3.0 for the upper left, upper right, lower left, and lower right quadrants, respectively.


Figure 4: As in the previous figure, but with a spin of +1 assigned to each output location for which the VAE output probability is larger than 0.5 and a spin of −1 to the remaining lattice locations.


Figure 5: The magnetization distributions associated with Figure 4.


Figure 6: The energy distributions associated with Figure 4.


Figure 7: The probability distributions corresponding to those of Figure 3 but for the DL implementation of the VAE.


Figure 8: The energy distributions corresponding to those of Figure 6 but for the DL implementation of the VAE.
