Variational Analysis of Ising Model Statistical Distributions and Phase Transitions

David Yevick Department of Physics University of Waterloo Waterloo, ON N2L 3G7

Abstract: Variational autoencoders employ an encoding neural network to generate a probabilistic representation of a data set within a low-dimensional space of latent variables, followed by a decoding stage that maps the latent variables back to the original variable space. Once trained, a statistical ensemble of simulated data realizations can be obtained by randomly assigning values to the latent variables and processing these with the decoding section of the network. To determine the accuracy of such a procedure when applied to lattice models, an autoencoder is here trained on a thermal equilibrium distribution of Ising spin realizations. When the output of the decoder for synthetic data is interpreted probabilistically, spin realizations can be generated by randomly assigning spin values according to the computed likelihood. The resulting state distribution in energy-magnetization space then qualitatively resembles that of the training samples. However, because correlations between spins are suppressed, the computed energies are unphysically large for low-dimensional latent variable spaces. Nevertheless, the features of the learned distributions as a function of temperature provide a qualitative indication of the presence of a phase transition and of the distribution of realizations with characteristic cluster sizes.

Keywords: Statistical methods, Ising model, Variational Autoencoders, Phase Transitions

Introduction: Machine learning can simulate the behavior of many physics and engineering systems using standardized and easily adapted program libraries. However, in contrast to standard numerical or analytic methods, which can attain arbitrary accuracy if the system properties and initial conditions are known, machine learning extracts features of a data set through multivariable optimization. Errors therefore arise both from the limited number of data records and from the dependence of the optimization procedure on computational metaparameters such as the number and connectivity of the network computing elements. [1][2][3] On the other hand, if the properties of a physical system are partially indeterminate because of intrinsic stochasticity or measurement inaccuracy, machine learning can yield predictions that are at least as accurate as those of deterministic procedures.

Initial applications of machine learning in condensed matter physics employed principal component analysis [4][5][6][7] and neural networks. [8][9][10][11][12][13] These were followed by increasingly sophisticated applications involving both supervised learning, in which the network learns from sets of input data and output labels [14][15][16][17][18][19][20], and unsupervised learning, in which the network is instead trained on unlabeled data [5,21–29]. Representative supervised learning physics examples include quantum system [30–35] and impurity [36] simulations, predictions of crystal structures [37,38], and density functional approximations. [39] Unsupervised learning procedures include clustering, which classifies data according to perceived similarities, and feature extraction, which projects the data onto a low-dimensional space while retaining its distinguishing properties. These have been successfully applied, often together with additional information such as locality or symmetry, to identify manifest and hidden order parameters and different phases or states. [5,25,40–44]

Additionally, algebraic expressions have been obtained for the symmetries or governing equations associated with a physical system [45–47] by algebraically parametrizing the functional form of the single-neuron output layer of a Siamese neural network [39] or by similarly parametrizing the highest-order principal component of the relevant data records. [48–52] While such procedures in principle enable the identification of novel system properties, they appear difficult to employ if the desired result cannot be expressed as simple combinations of low-order polynomials in the system variables.

This article examines in detail the applicability of variational autoencoders (VAEs) to the Ising model in the vicinity of the phase transition temperature. VAEs are simply implemented with high-level machine learning packages and therefore can be readily adapted to physics problems [53–56][19,57,58]. Conversely, VAE methods have been analyzed and enhanced with statistical mechanics algorithms. [59,60] The VAE is a generative modeling procedure that employs neural networks to encode and then decode the information of the input data set. [61,62] While a VAE employs latent variables similarly to, for example, a restricted Boltzmann machine, which applies simple expectation maximization to a single hidden layer, the VAE transformations between the physical variables and a restricted set of latent variables employ multilayer, differentiable neural networks and a novel cost function. In particular, the VAE input data is mapped to a bottleneck with a small number of nodes (generally 2) through an encoder typically consisting of one or more neural layers with decreasing node numbers. The output of the encoder, which is termed the latent value representation, is then passed through a decoder network with an expanding number of nodes in each layer until the original number of nodes is attained. These neural networks can consist of either standard dense layers or convolutional layers that preserve local features in the data. The VAE is trained by comparing each output record to the original input. Subsequently, simulated data can be generated by providing appropriately distributed random values for the latent variables and evaluating the corresponding output variables.

A single-layer VAE with a linear activation function is analogous to principal component analysis (PCA), which identifies orthogonal linear projections of the data that maximize the variance of each projection. Reconstructing the original data from these projections minimizes the mean squared error, such that Gaussian distributed data yields the smallest residual error. Although easily coded, the PCA requires the eigenvalues and eigenvectors of a matrix formed from the data records and is therefore computationally intensive for large input data sets. This article therefore quantifies the accuracy of VAEs trained on the standard two-dimensional Ising model and investigates the properties of the outputs associated with different regions of the latent variables for a range of temperatures around the phase transition temperature.
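As a point of comparison with the VAE construction discussed below, the following minimal sketch (independent of the codes in the cited references) applies PCA to a set of flattened spin configurations stored row-wise in a NumPy array; the array name `data` and the number of retained components are illustrative.

```python
import numpy as np

def pca(data, n_components=2):
    """Return the leading eigenvalues, eigenvectors and projections of the data covariance."""
    centered = data - data.mean(axis=0)
    # Covariance matrix formed from the data records, as described in the text.
    cov = centered.T @ centered / (len(data) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]    # retain the largest-variance directions
    return eigvals[order], eigvecs[:, order], centered @ eigvecs[:, order]
```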

Variational Autoencoders: As indicated above, the initial stage of a variational autoencoder inputs a set of data records, denoted here by $\mathbf{X}$, into a series of network layers with a diminishing number of neurons termed the encoder. Each data record $\vec{x}$ in the set can be described by a point in a high-dimensional space with $N$ coordinates corresponding to, for example, each pixel in an image or each spin in a lattice. The encoder network with parameters $\{\lambda_e\}$ maps each $\vec{x}$ in $\mathbf{X}$ to a point $\vec{z}$ in a lower-dimensional space of latent variables. This step is performed in a stochastic manner described by a conditional probability distribution $E_{\{\lambda_e\}}(\vec{z}|\vec{x})$. A decoder neural net with an increasing number of neurons in successive layers then maps the latent parameters back to a new point $\vec{x}'$ in the original variable space in a manner similarly represented by $D_{\{\lambda_d\}}(\vec{x}'|\vec{z})$. The parameters $\{\lambda_e\}$ and $\{\lambda_d\}$ of the encoder and decoder are trained by minimizing a loss function, $L(\lambda_e, \lambda_d)$, through gradient descent backpropagation. This function incorporates both the difference between the input data and the output of the encoder followed by the decoder and the deviation of the encoder and decoder functions from compact probability distributions in $\vec{z}$.

Closely separated points in latent variable space then yield decoder outputs with nearly identical features.

To formulate the first objective of the loss function, the distribution of the data records in $\mathbf{X}$ is associated with a probability distribution function $P(\vec{x})$ termed the evidence. The associated information in nats is given by $-\sum_{m=1}^{N} \log P(\vec{x}_m)$, here simply abbreviated as $-\log P(\vec{x})$. Since an arbitrary point $\vec{z}'$ in the latent space will in general not map to a reconstructed point $\vec{x}'$ in the high-dimensional output space for which $P(\vec{x}')$ is large, the loss function should first include the information in the output image corresponding to

$$L^{(1)}(\lambda_e, \lambda_d) = -\,\mathbb{E}_{\vec{z} \in E_{\{\lambda_e\}}(\vec{z}|\vec{x})}\!\left[\log P(\vec{x}')\right] \qquad (1)$$

In Eq. (1) the subscripted expectation symbol refers to an average over the latent data point distribution generated from the input data records according to the encoder probability distribution function. Accordingly, Eq. (1) quantifies the reconstruction fidelity in reproducing the data records in $\mathbf{X}$ from their truncated representation in the latent space. The VAE accomplishes this in part by training the decoder on latent space representations of the actual data (e.g. the subscript in Eq. (1)), which yield large values of $P(\vec{x}')$, rather than decoding random latent space points, which would overwhelmingly generate decoder outputs with negligible $P(\vec{x}')$.

The loss function further biases the latent space points that generate the reconstructed data toward a compact posterior probability distribution, $P(\vec{z}) = D_{\{\lambda_d\}}(\vec{z}|\vec{x}')\,P(\vec{x}')$. Adjacent regions in latent space then yield reconstructed records with slightly different properties, such that if two physical features are associated with two closely separated points in the latent space, positions in the latent space between these points yield synthetic output data that interpolates between the features. Since the encoder is described by a conditional probability distribution $E_{\{\lambda_e\}}(\vec{z}|\vec{x})$ that will be associated later with a Gaussian distribution, this is accomplished by adding a second contribution, $L^{(2)}(\lambda_e, \lambda_d)$, termed the Kullback-Leibler (KL) divergence between $E_{\{\lambda_e\}}(\vec{z}|\vec{x})$ and $D_{\{\lambda_d\}}(\vec{z}|\vec{x}')$, to the loss function with

$$L^{(2)}(\lambda_e, \lambda_d) = \mathbb{E}_{\vec{z} \in E_{\{\lambda_e\}}(\vec{z}|\vec{x})}\!\left[\log E_{\{\lambda_e\}}(\vec{z}|\vec{x}) - \log D_{\{\lambda_d\}}(\vec{z}|\vec{x}')\right] \equiv \mathbb{E}_{\vec{z} \in E_{\{\lambda_e\}}(\vec{z}|\vec{x})}\!\left[\mathrm{KL}\!\left(E_{\{\lambda_e\}}(\vec{z}|\vec{x}) \parallel D_{\{\lambda_d\}}(\vec{z}|\vec{x}')\right)\right] \qquad (2)$$

Eq. (2) expresses in nats the information lost when $E_{\{\lambda_e\}}(\vec{z}|\vec{x})$ is represented by $D_{\{\lambda_d\}}(\vec{z}|\vec{x}')$ and therefore the information distance between the two distributions. The Kullback-Leibler divergence is of the form $-\sum_m p_m \log(q_m/p_m)$ and, since $\log x \le x - 1$, is greater than or equal to $-\sum_m p_m (q_m/p_m - 1) = 0$, with equality only when the $p_m$ and $q_m$ are identical. Accordingly, $L^{(2)}$ is minimized when the decoder inverts the action of the encoder.

To minimize the total loss, $L(\lambda_e, \lambda_d) = L^{(1)}(\lambda_e, \lambda_d) + L^{(2)}(\lambda_e, \lambda_d)$, numerically, the KL divergence is first rewritten in terms of simply evaluated quantities. Applying Bayes' rule in the form $P(\vec{z})\,D_{\{\lambda_d\}}(\vec{x}'|\vec{z}) = D_{\{\lambda_d\}}(\vec{z}|\vec{x}')\,P(\vec{x}')$ yields

$$\log P(\vec{z}) + \log D_{\{\lambda_d\}}(\vec{x}'|\vec{z}) = \log P(\vec{x}') + \log D_{\{\lambda_d\}}(\vec{z}|\vec{x}') \qquad (3)$$

and hence

$$L(\lambda_e, \lambda_d) = \mathbb{E}_{\vec{z} \in E_{\{\lambda_e\}}(\vec{z}|\vec{x})}\!\left[\mathrm{KL}\!\left(E_{\{\lambda_e\}}(\vec{z}|\vec{x}) \parallel P(\vec{z})\right)\right] - \mathbb{E}_{\vec{z} \in E_{\{\lambda_e\}}(\vec{z}|\vec{x})}\!\left[\log D_{\{\lambda_d\}}(\vec{x}'|\vec{z})\right] \qquad (4)$$

In Eq. (4) the structure of the VAE, in which the encoder maps the data in $\vec{x}$ to the latent variable space and the decoder transforms $\vec{z}$ back to a physical space point, is particularly evident. A variational autoencoder typically approximates the posterior of the encoder, $E_{\{\lambda_e\}}(\vec{z}|\vec{x}_i)$, for a given data point $\vec{x}_i$ by a point chosen from a Gaussian probability distribution with a mean value given by $\vec{\mu}(\vec{x}_i, \lambda_e)$ and a variance equal to $\vec{\sigma}^2(\vec{x}_i, \lambda_e)$. Similarly $P(\vec{z})$, the sampling distribution in the space of latent variables of the decoder, is generally equated to a Gaussian function with zero mean and unit variance. As a consequence, input records with similar properties yield localized distributions of latent space values with variances smaller than or of the order of unity. If $E_{\{\lambda_e\}}(\vec{z}|\vec{x}_i)$ and $P(\vec{z})$ are both Gaussian, an analytic evaluation of the KL divergence in Eq. (4) for a $k$-dimensional latent vector space yields

$$\mathbb{E}_{\vec{z} \in E_{\{\lambda_e\}}(\vec{z}|\vec{x}_i)}\!\left[\mathrm{KL}\!\left(E_{\{\lambda_e\}}(\vec{z}|\vec{x}_i) \parallel P(\vec{z})\right)\right] = -\frac{1}{2}\sum_{i=1}^{k}\left(1 + \log \sigma_i^2 - \sigma_i^2 - \mu_i^2\right) \qquad (5)$$

in which $\mu_i$ and $\sigma_i$ are the mean and standard deviation, respectively, of the $i$:th component of the coding (e.g. the $i$:th latent variable component). To incorporate neural network-based encoders and decoders that support backpropagation into the above formalism, the last layer of the encoder, which outputs the latent space variables, is duplicated, with the outputs of one branch associated with either $\sigma_i$ or $\log \sigma_i$ and those of the second branch with $\mu_i$ for each data record. The KL component of the loss function, Eq. (5), is evaluated from these outputs and combined with the standard neural network reconstruction loss. Further, to generate a normalized Gaussian probability distribution for the latent variable coding of the data, the outputs from the two encoder branches are combined with the output of a normalized Gaussian random number generator according to $\vec{z} \to \tilde{\vec{z}}$ with

$$\tilde{z}_i = \mu_i + \sigma_i \cdot \mathrm{Normal}(\sigma = 1, \mu = 0) \qquad (6)$$

before the decoder is applied. In the VAE, all latent variables are local by construction; that is, the latent variables corresponding to one input realization are independent of those of other realizations. Therefore, the loss averaged over the input data can be expressed as a sum of the losses over each realization in the data set. Accordingly, stochastic gradient descent can be applied to the parameters $\lambda_e$, $\lambda_d$, $\vec{\sigma}$, $\vec{\mu}$, which can be adjusted for each data point or minibatch or can instead be shared across all data points.

Once the VAE is trained, a synthetic output realization can be generated simply by selecting any meaningful set of values $z_i$ for the decoder inputs and calculating the resulting output distribution. If the VAE is trained on sets of differing discrete images, points in latent vector space will generate images that interpolate between the elements of the sets. In standard machine learning applications, this property of the VAE can be employed, for example, to generate facial images that are not associated with a particular person.
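The reparameterization step of Eq. (6) and the analytic Gaussian KL term of Eq. (5) can be written compactly with a high-level package. The following is a minimal sketch assuming a tf.keras implementation; the layer and function names are illustrative rather than taken from the code used for the calculations below.

```python
import tensorflow as tf

class Sampling(tf.keras.layers.Layer):
    """Draw z = mu + sigma * epsilon with epsilon ~ Normal(0, 1), as in Eq. (6)."""
    def call(self, inputs):
        z_mean, z_log_var = inputs
        epsilon = tf.random.normal(shape=tf.shape(z_mean))
        return z_mean + tf.exp(0.5 * z_log_var) * epsilon

def kl_loss(z_mean, z_log_var):
    """Analytic KL divergence of Eq. (5) between N(mu, sigma^2) and N(0, 1)."""
    return -0.5 * tf.reduce_sum(
        1.0 + z_log_var - tf.exp(z_log_var) - tf.square(z_mean), axis=-1)
```

In training, this KL term is added to the reconstruction loss to form the total loss of Eq. (4).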

Computational Results: This paper considers the classical two-dimensional ferromagnetic Ising model on a periodic lattice of 8 × 8 unit spins described here by the Hamiltonian

$$H = \frac{J}{2} \sum_{i,j \,\in\, \text{nearest neighbors}} S_i S_j \qquad (7)$$

with spin variables $S_i = \pm 1$. This small system size enables numerous calculations to be performed with limited computational resources and has therefore been consistently employed as a standard benchmark for the application of neural network procedures to spin systems. [48,63,64] In the remainder of this paper, the temperature and energy will be expressed in normalized units $k_b T/J$ and $2E/J$ so that the value of $J$ does not enter the numerical results. Further, energies are referred to the energy of the antiferromagnetic state, as spin states at higher temperatures sample both the ferromagnetic and antiferromagnetic regions. For an infinite system, the $Z_2$ inversion symmetry is broken below $T = 2.269$, where the system transitions from a disordered to an ordered phase.

This paper examines two implementations of the VAE. While the synthetic output distributions generated by the decoder for different values of the latent variables are highly dependent on the layer architecture, as will be evident from the results of this paper, their general characteristics are effectively identical. The models, however, employ a relatively small number of network parameters to avoid overfitting, which can lead to divergences or unphysical asymmetries in the energy-magnetization space distributions of the synthetic states. At each temperature, the VAE was trained with 400,000 Ising model spin configurations generated with the Wolff cluster reversal procedure. [65] All VAE layers employed rectified linear (relu) activation functions except for the latent variable layers, which instead employ linear activation functions, and the final sigmoid layer. Additional calculations indicated that replacing the rectified linear units with other standard activation functions did not significantly affect the results.
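For concreteness, a minimal NumPy sketch of the nearest-neighbor sum entering Eq. (7) and of the magnetization for a periodic configuration is given below; the conversion to the normalized energy $2E/J$ and the offset to the antiferromagnetic state described above are left to the caller, and the function name is illustrative.

```python
import numpy as np

def bond_sum_and_magnetization(spins):
    """spins: two-dimensional array of +/-1 values on a periodic lattice."""
    # Shifting along each axis counts every nearest-neighbor bond exactly once.
    bonds = (spins * np.roll(spins, 1, axis=0)).sum() + \
            (spins * np.roll(spins, 1, axis=1)).sum()
    return bonds, spins.sum()

# Example: a random 8 x 8 configuration
rng = np.random.default_rng(0)
config = rng.choice([-1, 1], size=(8, 8))
print(bond_sum_and_magnetization(config))
```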

In the dense layer (DL) implementation the 8 × 8 input spin values are converted from ±1 to 0, 1 and, after flattening into a 64 × 1 vector, are input into a dense 32-neuron layer followed by a dense 8-neuron layer and finally a 16-neuron layer, which was found to enhance the quality of the results. The subsequent latent variable layer possesses the standard number, $N = 2$, of neurons unless otherwise stated. Finally, the matched decoder consists of successive dense 16-, 8- and 32-neuron layers. To interpret the latent decoder output as a probability, the final 64-neuron layer employs a sigmoid activation function to output a number between 0 and 1 for each array element, which is finally reshaped into an 8 × 8 two-dimensional array.
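The DL architecture just described might be assembled as in the following hedged sketch, which assumes tf.keras and reuses the `Sampling` layer from the earlier sketch; the optimizer, initializers and other unstated training details are therefore omitted.

```python
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 2  # the standard number of latent neurons, N = 2

# Encoder: 64 -> 32 -> 8 -> 16 -> (mu, log sigma^2)
enc_in = tf.keras.Input(shape=(8, 8))
x = layers.Flatten()(enc_in)
x = layers.Dense(32, activation="relu")(x)
x = layers.Dense(8, activation="relu")(x)
x = layers.Dense(16, activation="relu")(x)
z_mean = layers.Dense(latent_dim)(x)        # linear activations for the latent branches
z_log_var = layers.Dense(latent_dim)(x)
z = Sampling()([z_mean, z_log_var])         # Sampling layer from the earlier sketch
dl_encoder = tf.keras.Model(enc_in, [z_mean, z_log_var, z])

# Decoder: latent -> 16 -> 8 -> 32 -> 64 sigmoid probabilities -> 8 x 8
dec_in = tf.keras.Input(shape=(latent_dim,))
y = layers.Dense(16, activation="relu")(dec_in)
y = layers.Dense(8, activation="relu")(y)
y = layers.Dense(32, activation="relu")(y)
y = layers.Dense(64, activation="sigmoid")(y)
dl_decoder = tf.keras.Model(dec_in, layers.Reshape((8, 8))(y))
```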

The convolutional layer (CL) procedure instead inserts the 8 × 8 array of input values into a single two-dimensional convolutional layer with sixteen 3 × 3 filters that employs a stride of two and “same” zero padding, resulting in 16 filtered 4 × 4 outputs. These are subsequently flattened into a single vector that is coupled to the latent layer. The decoder employs a 16-neuron dense layer to generate 16 values from the latent layer output that are afterwards reshaped into a 4 × 4 two-dimensional array. A transposed two-dimensional convolutional layer with 16 filters, ‘same’ padding and a stride of 2 then inverts the corresponding encoder layer and is followed by an 8 × 8 single-filter two-dimensional convolution layer with sigmoid activation functions.
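A corresponding hedged sketch of the CL architecture is given below, again assuming tf.keras and reusing `Sampling`, `layers` and `latent_dim` from the earlier sketches; the 3 × 3 kernel of the final single-filter convolution is an assumption, since only its 8 × 8 output is specified in the text.

```python
# Encoder: 8 x 8 x 1 input -> Conv2D(16 filters, 3 x 3, stride 2, "same") -> flatten -> latent
conv_in = tf.keras.Input(shape=(8, 8, 1))
c = layers.Conv2D(16, 3, strides=2, padding="same", activation="relu")(conv_in)    # 4 x 4 x 16
c = layers.Flatten()(c)
z_mean = layers.Dense(latent_dim)(c)
z_log_var = layers.Dense(latent_dim)(c)
z = Sampling()([z_mean, z_log_var])
cl_encoder = tf.keras.Model(conv_in, [z_mean, z_log_var, z])

# Decoder: latent -> 16 values reshaped to 4 x 4 -> transposed convolution -> sigmoid output
d_in = tf.keras.Input(shape=(latent_dim,))
d = layers.Dense(16, activation="relu")(d_in)
d = layers.Reshape((4, 4, 1))(d)
d = layers.Conv2DTranspose(16, 3, strides=2, padding="same", activation="relu")(d)  # 8 x 8 x 16
d_out = layers.Conv2D(1, 3, padding="same", activation="sigmoid")(d)                # 8 x 8 x 1
cl_decoder = tf.keras.Model(d_in, d_out)
```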

Two procedures were employed to convert the continuous output of the final sigmoid activation layer into a discrete spin distribution. In the first of these, a +1 spin was assigned to each array element (lattice site) with a value above 0.5 and the remaining elements were set to −1. The second technique generates a random number at each site and assigns the +1 spin value to the sites for which the value output by the neural network at the site is larger than the associated random value.
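The two conversion rules can be stated in a few lines of NumPy, where `probs` denotes the 8 × 8 array of sigmoid outputs produced by the decoder; the function names are illustrative.

```python
import numpy as np

def spins_deterministic(probs):
    """Assign +1 where the output probability exceeds 0.5, otherwise -1 (first procedure)."""
    return np.where(probs > 0.5, 1, -1)

def spins_probabilistic(probs, rng=None):
    """Assign +1 at each site with probability given by the network output (second procedure)."""
    if rng is None:
        rng = np.random.default_rng()
    return np.where(rng.random(probs.shape) < probs, 1, -1)
```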

To examine the energy-magnetization distribution of the VAE synthetic realizations, a CL VAE was first trained for 130 epochs on the Wolff realizations at $T = 3.5$, for which the spin configurations sample both the disordered and ordered phases to an approximately equal extent. As discussed in detail in [66–69], not only is the distribution at this temperature particularly sensitive to numerical modeling error, but its qualitative features for small spin systems are effectively identical to those exhibited by larger spin systems at corresponding temperatures.

Accordingly, 200,000 spin configurations were generated with the trained VAE by assigning random values from a unit normal distribution to each coordinate of the two-dimensional latent parameter space. The joint histograms constructed from the energies and magnetizations of the spin states generated from the associated decoder outputs, when positive and negative spins are assigned to the output neurons with values greater and less than 0.5 respectively (the first procedure of the previous paragraph), are given by the thick lines in Figure 1. The thin lines in the figure represent the corresponding distribution associated with the input states. While the energies of the synthetic states are clearly lower than those of the input realizations, the magnetization distribution is reasonably well reconstructed. Assigning instead probabilities given by the VAE output variables according to the second procedure yields the diagram on the left side of Figure 2. The input distribution is then qualitatively reproduced, but the state energies are displaced toward larger values. While synthetic output distributions generated from the VAE have been previously displayed, the essential information associated with the energy-magnetization diagram was not presented. [70,71]
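The generation of the synthetic ensemble described above could proceed along the lines of the following sketch, which reuses `cl_decoder`, `spins_probabilistic` and `bond_sum_and_magnetization` from the earlier sketches; the histogram binning is illustrative, and the bond sums stand in for the energies up to the normalization conventions discussed earlier.

```python
import numpy as np

n_samples = 200_000                       # as in the text; reduce for a quick test
rng = np.random.default_rng(1)
z = rng.normal(size=(n_samples, 2))       # unit normal values for the two latent coordinates
probs = cl_decoder.predict(z, batch_size=4096)[..., 0]   # (n_samples, 8, 8) spin probabilities

energies, magnetizations = [], []
for p in probs:
    s = spins_probabilistic(p, rng)
    e, m = bond_sum_and_magnetization(s)
    energies.append(e)
    magnetizations.append(m)

hist, e_edges, m_edges = np.histogram2d(energies, magnetizations, bins=(33, 65))
```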

Similar behavior has been observed in both the PCA [48] and restricted Boltzmann machine (RBM) [64] approaches for limited numbers of principal components and hidden neurons, respectively. Indeed, since the number of states increases with energy, the probability of generating a state of higher energy when the local correlations of the spin system are inaccurately modeled exceeds that of generating a lower energy state. Increasing the number of hidden-layer neurons in the RBM or the number of high-order and hence small spatial frequency components in the PCA markedly improved the reconstruction accuracy, and therefore the energy-magnetization distribution of the synthetically generated system realizations, in the above references. Similarly, the spatial resolution of the reconstructed states in the VAE improves monotonically with the VAE latent space dimension, as evident from the middle and rightmost diagrams in Figure 2, which employ 3- and 10-dimensional latent spaces, respectively.

As the VAE maps input distributions with similar structures to closely separated regions in the latent vector space, synthetic system realizations constructed from a grid of points covering the latent vector space display a series of continuously changing configurations with evolving physical properties. This set of realizations is analogous to a basis into which the input realizations are decomposed. Consequently, information regarding the nature of a phase transition can be inferred from the physical properties of these synthetic states. The four diagrams in Figure 3, for example, display a 10 × 10 array where the $(n, m)$:th array element consists of the 8 × 8 output probability distribution generated with the convolutional layer (CL) decoder applied to latent coordinates for which the cumulative probability function associated with the normal Gaussian distribution, $\mathrm{Normal}(\sigma = 1, \mu = 0)$, equals $0.05 + 0.1n$ and $0.05 + 0.1m$, respectively. In python, these values form a two-dimensional grid with points at grid_x, grid_y = [ norm.ppf( np.linspace( 0.05, 0.95, 10 ) ) ] * 2; a sketch of this construction is given after the following paragraph. The probability distributions are obtained from the sigmoid activation function values for the output neurons and therefore correspond to the positive spin probability at each lattice site. The diagram in the upper left corner is calculated at $T = 2.0$ while the diagrams in the upper right, lower left, and lower right quadrants are obtained at $T = 2.375$, $T = 2.5$ and $T = 3.0$, respectively ($T = 2.375$ approximates the temperature at which the 8 × 8 spin system exhibits a specific heat maximum).

The structure of the 100 spin distributions in each subfigure of Figure 3 becomes more visible if, as in Figure 1, a spin of +1 is assigned to each output location for which the output probability is larger than 0.5 and a spin of −1 to the remaining lattice locations. This yields the four diagrams of Figure 4 together with the magnetization and energy distributions displayed in Figure 5 and Figure 6, respectively. The distributions generated by points that are adjacent in latent coordinates differ incrementally, as can be seen by following either a row or a column of any of the four diagrams in these sets of figures. Figure 4 possesses the additional feature that below the critical temperature, the synthetic realizations are dominated by two regions of highly magnetized states separated by a narrow region of partially magnetized states. Near the transition temperature, the region of partially magnetized states widens and the spin distributions in the intermediate region interpolate between a large cluster of one spin orientation and a large cluster of the opposite orientation. The CL VAE therefore transforms a spin distribution into patterns with high frequency spatial components mediated by the small diameter clusters. The wide regions of intermediate cluster sizes at higher temperatures further indicate that a broad spectrum of spin patterns is present.
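Expanding the `norm.ppf` snippet quoted above, the latent-grid construction used for Figures 3 and 4 might be sketched as follows, with `cl_decoder` referring to the convolutional decoder sketched earlier.

```python
import numpy as np
from scipy.stats import norm

# 10 x 10 grid of latent points equally spaced in the cumulative normal probability
grid_x, grid_y = [norm.ppf(np.linspace(0.05, 0.95, 10))] * 2

prob_maps = np.empty((10, 10, 8, 8))
for n, zx in enumerate(grid_x):
    for m, zy in enumerate(grid_y):
        z = np.array([[zx, zy]])
        prob_maps[n, m] = cl_decoder.predict(z, verbose=0)[0, ..., 0]   # Figure 3 panels

spin_maps = np.where(prob_maps > 0.5, 1, -1)   # thresholded panels as in Figure 4
```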

The patterns generated by points in the latent value space are dependent on the internal structure of the VAE. For example, Figure 7 displays the probability distributions corresponding to those of Figure 3 but for the DL VAE. Since in the DL procedure the 64 inputs are first flattened into a single vector and the original 8 × 8 lattice is only reconstructed in the last layer, the ordered pattern of the output of a CL VAE is replaced by a series of vertical stripes. However, Figure 8, which depicts the DL VAE energy distribution analogous to the CL result of Figure 6, demonstrates that the region of latent vector space associated with intermediate magnetizations again increases rapidly with increasing temperature close to and above the critical transition temperature, reproducing the qualitative behavior of the CL network.

Conclusions and Future Directions: Although the RBM, PCA and VAE generative algorithms have been successfully applied to image manipulation, this and previous papers have demonstrated that they do not accurately reproduce statistical quantities such as the specific heat in lattice theory applications unless the dimensionality, and hence the number of degrees of freedom, of the intermediate representation (e.g. the number of hidden neurons, principal components or latent space variables) is comparable to the number of lattice sites. The lack of accuracy is especially apparent when contrasted with optimized Monte Carlo or renormalization group-based approaches. [68,72–91] On the other hand, even for low-dimensional intermediate representations, neural networks reproduce the qualitative behavior of the spin system. The latent representation of a VAE can further, as shown here, indicate the existence of clusters of different scales and the extent to which they contribute to a statistical system at a given temperature. While outside the scope of this paper, this result, together with the known ability of the VAE to distinguish images of multiple categories, indicates that in physical systems with multiple competing phases the VAE will assign each phase to a separate latent vector space region. The critical temperature for phase transitions and the distribution of cluster sizes for the various phases could then be estimated from the growth with temperature of the width of the boundary layer between the locations of the different phase regions.

An examination of the synthetic VAE realizations for 3 or more latent space variables could also be of further interest, since the agreement between the energy-magnetization distributions of the synthetic and training data was here found to be far closer than in the standard two latent variable case. In addition, the analysis of systems with 3 or more coexisting phases could be significantly improved if the differing phase regions are more distinctly separated in higher dimensional latent spaces.

Acknowledgements: The Natural Sciences and Engineering Research Council of Canada (NSERC) is acknowledged for financial support.


References:

[1] G. Carleo, I. Cirac, K. Cranmer, L. Daudet, M. Schuld, N. Tishby, L. Vogt-Maranto, L. Zdeborová, Machine learning and the physical sciences, Reviews of Modern Physics. 91 (2019) 045002. https://doi.org/10.1103/RevModPhys.91.045002.

[2] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, Cambridge, MA, 2016.

[3] L. Zdeborová, Machine learning: New tool in the box, Nature Physics. 13 (2017) 420–421. https://doi.org/10.1038/nphys4053.

[4] K. Pearson, LIII. On lines and planes of closest fit to systems of points in space, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science. 2 (1901) 559–572. https://doi.org/10.1080/14786440109462720.

[5] L. Wang, Discovering phase transitions with unsupervised learning, Physical Review B. 94 (2016). https://doi.org/10.1103/PhysRevB.94.195105.

[6] H. Kiwata, Deriving the order parameters of a spin-glass model using principal component analysis, Physical Review E. 99 (2019). https://doi.org/10.1103/PhysRevE.99.063304.

[7] D. Lozano-Gómez, D. Pereira, M.J.P. Gingras, Unsupervised Machine Learning of Quenched Gauge Symmetries: A Proof-of-Concept Demonstration, n.d.

[8] A. Morningstar, R.G. Melko, Deep learning the ising model near criticality, Journal of Machine Learning Research. 18 (2018).

[9] G. Torlai, R.G. Melko, Learning thermodynamics with Boltzmann machines, Physical Review B. 94 (2016). https://doi.org/10.1103/PhysRevB.94.165134.

[10] K. Ch’Ng, J. Carrasquilla, R.G. Melko, E. Khatami, Machine learning phases of strongly correlated fermions, Physical Review X. 7 (2017). https://doi.org/10.1103/PhysRevX.7.031038.

[11] R. Zhang, B. Wei, D. Zhang, J.J. Zhu, K. Chang, Few-shot machine learning in the three-dimensional Ising model, Physical Review B. 99 (2019). https://doi.org/10.1103/PhysRevB.99.094427.

[12] N. Yoshioka, Y. Akagi, H. Katsura, Transforming generalized Ising models into Boltzmann machines, Physical Review E. 99 (2019). https://doi.org/10.1103/PhysRevE.99.032113.

[13] C. Giannetti, B. Lucini, D. Vadacchino, Machine Learning as a universal tool for quantitative investigations of phase transitions, Nuclear Physics B. 944 (2019). https://doi.org/10.1016/j.nuclphysb.2019.114639.

[14] S. Lloyd, M. Mohseni, P. Rebentrost, Quantum algorithms for supervised and unsupervised machine learning, (2013).

[15] S. Efthymiou, M.J.S. Beach, R.G. Melko, Super-resolving the Ising model with convolutional neural networks, Physical Review B. 99 (2019) 075113. https://doi.org/10.1103/PhysRevB.99.075113.


[16] C. Casert, T. Vieijra, J. Nys, J. Ryckebusch, Interpretable machine learning for inferring the phase boundaries in a nonequilibrium system, Physical Review E. 99 (2019). https://doi.org/10.1103/PhysRevE.99.023304.

[17] T. Ohtsuki, T. Mano, Drawing phase diagrams of random quantum systems by deep learning the wave functions, Journal of the Physical Society of Japan. 89 (2020). https://doi.org/10.7566/JPSJ.89.022001.

[18] R. Zhang, B. Wei, D. Zhang, J.J. Zhu, K. Chang, Few-shot machine learning in the three-dimensional Ising model, Physical Review B. 99 (2019). https://doi.org/10.1103/PhysRevB.99.094427.

[19] Q. Ni, M. Tang, Y. Liu, Y.C. Lai, Machine learning dynamical phase transitions in complex networks, Physical Review E. 100 (2019). https://doi.org/10.1103/PhysRevE.100.052312.

[20] M.J.S. Beach, A. Golubeva, R.G. Melko, Machine learning vortices at the Kosterlitz-Thouless transition, Physical Review B. 97 (2018). https://doi.org/10.1103/PhysRevB.97.045207.

[21] O. Balabanov, M. Granath, Unsupervised learning using topological data augmentation, Physical Review Research. 2 (2020). https://doi.org/10.1103/physrevresearch.2.013354.

[22] S. Durr, S. Chakravarty, Unsupervised learning eigenstate phases of matter, Physical Review B. 100 (2019). https://doi.org/10.1103/PhysRevB.100.075102.

[23] J.F. Rodriguez-Nieva, M.S. Scheurer, Identifying topological order through unsupervised machine learning, Nature Physics. 15 (2019) 790–795. https://doi.org/10.1038/s41567-019-0512-x.

[24] H.Y. Kwon, N.J. Kim, C.K. Lee, C. Won, Searching magnetic states using an unsupervised machine learning algorithm with the Heisenberg model, Physical Review B. 99 (2019). https://doi.org/10.1103/PhysRevB.99.024423.

[25] S.J. Wetzel, Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders, Physical Review E. 96 (2017). https://doi.org/10.1103/PhysRevE.96.022140.

[26] T. Wu, M. Tegmark, Toward an artificial intelligence physicist for unsupervised learning, Physical Review E. 100 (2019). https://doi.org/10.1103/PhysRevE.100.033311.

[27] X. Xu, Q. Wei, H. Li, Y. Wang, Y. Chen, Y. Jiang, Recognition of polymer configurations by unsupervised learning, Physical Review E. 99 (2019). https://doi.org/10.1103/PhysRevE.99.043307.

[28] D. Lozano-Gómez, D. Pereira, M.J.P. Gingras, Unsupervised Machine Learning of Quenched Gauge Symmetries: A Proof-of-Concept Demonstration, n.d.

[29] Y.H. Liu, E.P.L. van Nieuwenburg, Discriminative Cooperative Networks for Detecting Phase Transitions, Physical Review Letters. 120 (2018). https://doi.org/10.1103/PhysRevLett.120.176401.


[30] M. Matty, Y. Zhang, Z. Papić, E.A. Kim, Multifaceted machine learning of competing orders in disordered interacting systems, Physical Review B. 100 (2019). https://doi.org/10.1103/PhysRevB.100.155141.

[31] J. Carrasquilla, G. Torlai, R.G. Melko, L. Aolita, Reconstructing quantum states with generative models, Nature Machine Intelligence. 1 (2019) 155–161. https://doi.org/10.1038/s42256-019-0028-1.

[32] T. Vieijra, C. Casert, J. Nys, W. De Neve, J. Haegeman, J. Ryckebusch, F. Verstraete, Restricted Boltzmann Machines for Quantum States with Non-Abelian or Anyonic Symmetries, Physical Review Letters. 124 (2020) 097201. https://doi.org/10.1103/PhysRevLett.124.097201.

[33] Y.A. Kharkov, V.E. Sotskov, A.A. Karazeev, E.O. Kiktenko, A.K. Fedorov, Revealing quantum chaos with machine learning, Physical Review B. 101 (2020). https://doi.org/10.1103/PhysRevB.101.064406.

[34] R.A. Vargas-Hernández, J. Sous, M. Berciu, R. V. Krems, Extrapolating Quantum Observables with Machine Learning: Inferring Multiple Phase Transitions from Properties of a Single Phase, Physical Review Letters. 121 (2018). https://doi.org/10.1103/PhysRevLett.121.255702.

[35] T. Ohtsuki, T. Mano, Drawing phase diagrams of random quantum systems by deep learning the wave functions, Journal of the Physical Society of Japan. 89 (2020). https://doi.org/10.7566/JPSJ.89.022001.

[36] L.F. Arsenault, A. Lopez-Bezanilla, O.A. Von Lilienfeld, A.J. Millis, Machine learning for many-body physics: The case of the Anderson impurity model, Physical Review B - Condensed Matter and Materials Physics. 90 (2014) 155136. https://doi.org/10.1103/PhysRevB.90.155136.

[37] S. Curtarolo, D. Morgan, K. Persson, J. Rodgers, G. Ceder, Predicting crystal structures with data mining of quantum calculations, Physical Review Letters. 91 (2003). https://doi.org/10.1103/PhysRevLett.91.135503.

[38] C.S. Adorf, T.C. Moore, Y.J.U. Melle, S.C. Glotzer, Analysis of Self-Assembly Pathways with Unsupervised Machine Learning Algorithms, Journal of Physical Chemistry B. 124 (2020) 69–78. https://doi.org/10.1021/acs.jpcb.9b09621.

[39] J.C. Snyder, M. Rupp, K. Hansen, K.R. Müller, K. Burke, Finding density functionals with machine learning, Physical Review Letters. 108 (2012). https://doi.org/10.1103/PhysRevLett.108.253002.

[40] E.P.L. Van Nieuwenburg, Y.H. Liu, S.D. Huber, Learning phase transitions by confusion, Nature Physics. 13 (2017) 435–439. https://doi.org/10.1038/nphys4037.

[41] K. Liu, J. Greitemann, L. Pollet, Learning multiple order parameters with interpretable machines, Physical Review B. 99 (2019). https://doi.org/10.1103/PhysRevB.99.104410.

[42] C. Giannetti, B. Lucini, D. Vadacchino, Machine Learning as a universal tool for quantitative investigations of phase transitions, Nuclear Physics B. 944 (2019). https://doi.org/10.1016/j.nuclphysb.2019.114639.


[43] M.J.S. Beach, A. Golubeva, R.G. Melko, Machine learning vortices at the Kosterlitz-Thouless transition, Physical Review B. 97 (2018). https://doi.org/10.1103/PhysRevB.97.045207.

[44] H. Kiwata, Deriving the order parameters of a spin-glass model using principal component analysis, Physical Review E. 99 (2019). https://doi.org/10.1103/PhysRevE.99.063304.

[45] A. Morningstar, R.G. Melko, Deep learning the ising model near criticality, Journal of Machine Learning Research. 18 (2018).

[46] S.J. Wetzel, R.G. Melko, J. Scott, M. Panju, V. Ganesh, Discovering Symmetry Invariants and Conserved Quantities by Interpreting Siamese Neural Networks, (2020).

[47] S.J. Wetzel, M. Scherzer, Machine learning of explicit order parameters: From the Ising model to SU(2) lattice gauge theory, Physical Review B. 96 (2017). https://doi.org/10.1103/PhysRevB.96.184410.

[48] D. Yevick, Conservation laws and spin system modeling through principal component analysis, Computer Physics Communications. 262 (2021). https://doi.org/10.1016/j.cpc.2021.107832.

[49] C. Wang, H. Zhai, Machine learning of frustrated classical spin models. I. Principal component analysis, Physical Review B. 96 (2017). https://doi.org/10.1103/PhysRevB.96.144432.

[50] C. Wang, H. Zhai, Machine learning of frustrated classical spin models (II): Kernel principal component analysis, Frontiers of Physics. 13 (2018). https://doi.org/10.1007/s11467-018-0798-7.

[51] S.J. Wetzel, Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders, Physical Review E. 96 (2017). https://doi.org/10.1103/PhysRevE.96.022140.

[52] H. Kiwata, Deriving the order parameters of a spin-glass model using principal component analysis, Physical Review E. 99 (2019). https://doi.org/10.1103/PhysRevE.99.063304.

[53] T. Heimel, G. Kasieczka, T. Plehn, J. Thompson, QCD or what?, SciPost Physics. 6 (2019). https://doi.org/10.21468/scipostphys.6.3.030.

[54] D. Wu, L. Wang, P. Zhang, Solving Statistical Mechanics Using Variational Autoregressive Networks, Physical Review Letters. 122 (2019). https://doi.org/10.1103/PhysRevLett.122.080602.

[55] S.J. Wetzel, Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders, Physical Review E. 96 (2017). https://doi.org/10.1103/PhysRevE.96.022140.

[56] S. Efthymiou, M.J.S. Beach, R.G. Melko, Super-resolving the Ising model with convolutional neural networks, Physical Review B. 99 (2019) 075113. https://doi.org/10.1103/PhysRevB.99.075113.

[57] A. Rocchetto, S. Aaronson, S. Severini, G. Carvacho, D. Poderini, I. Agresti, M. Bentivegna, F. Sciarrino, Experimental learning of quantum states, (2017). http://arxiv.org/abs/1712.00127 (accessed March 31, 2021).


[58] Z. Liu, S.P. Rodrigues, W. Cai, Simulating the Ising Model with a Deep Convolutional Generative Adversarial Network, (2017). http://arxiv.org/abs/1710.04987 (accessed April 16, 2020).

[59] A. Alemi, A. Abbara, Exponential capacity in an autoencoder neural network with a hidden layer, ArXiv. (2017).

[60] A.A. Alemi, I. Fischer, J. v. Dillon, K. Murphy, Deep variational information bottleneck, in: 5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings, International Conference on Learning Representations, ICLR, 2017.

[61] D.P. Kingma, M. Welling, Auto-encoding variational bayes, in: 2nd International Conference on Learning Representations, ICLR 2014 - Conference Track Proceedings, International Conference on Learning Representations, ICLR, 2014.

[62] D.J. Rezende, S. Mohamed, D. Wierstra, Stochastic backpropagation and approximate inference in deep generative models, in: 31st International Conference on Machine Learning, ICML 2014, International Machine Learning Society (IMLS), 2014: pp. 3057–3070.

[63] G. Torlai, R.G. Melko, Learning thermodynamics with Boltzmann machines, Physical Review B. 94 (2016). https://doi.org/10.1103/PhysRevB.94.165134.

[64] D. Yevick, R. Melko, The accuracy of restricted Boltzmann machine models of Ising systems, Computer Physics Communications. 258 (2021). https://doi.org/10.1016/j.cpc.2020.107518.

[65] J.S. Wang, R.H. Swendsen, Cluster Monte Carlo algorithms, Physica A: Statistical Mechanics and Its Applications. (1990). https://doi.org/10.1016/0378-4371(90)90275-W.

[66] D. Yevick, Y.H. Lee, A cluster controller for transition matrix calculations, International Journal of Modern Physics C. 30 (2019). https://doi.org/10.1142/S0129183119500190.

[67] D. Yevick, Y.H. Lee, Dynamic canonical and microcanonical transition matrix analyses of critical behavior, European Physical Journal B. 90 (2017). https://doi.org/10.1140/epjb/e2017-70747-x.

[68] D. Yevick, A projected entropy controller for transition matrix calculations, European Physical Journal B. 91 (2018). https://doi.org/10.1140/epjb/e2018-90171-0.

[69] Y.H. Lee, D. Yevick, Renormalized multicanonical sampling in multiple dimensions, Physical Review E. 94 (2016). https://doi.org/10.1103/PhysRevE.94.043323.

[70] S.J. Wetzel, Unsupervised learning of phase transitions: From principal component analysis to variational autoencoders, Physical Review E. 96 (2017) 022140. https://doi.org/10.1103/PhysRevE.96.022140.

[71] P. Mehta, M. Bukov, C.H. Wang, A.G.R. Day, C. Richardson, C.K. Fisher, D.J. Schwab, A high-bias, low-variance introduction to Machine Learning for physicists, Physics Reports. 810 (2019) 1–124. https://doi.org/10.1016/j.physrep.2019.03.001.

[72] D. Yevick, Accelerated rare event sampling, International Journal of Modern Physics C. 27 (2016). https://doi.org/10.1142/S0129183116500418.


[73] D. Yevick, Renormalized multicanonical sampling, International Journal of Modern Physics C. 27 (2016). https://doi.org/10.1142/S0129183116500339.

[74] D. Yevick, J. Thompson, Accuracy and efficiency of simplified tensor network codes, ArXiv. (2019).

[75] T. Lu, D. Yevick, Efficient multicanonical algorithms, IEEE Photonics Technology Letters. 17 (2005). https://doi.org/10.1109/LPT.2004.843244.

[76] T. Lu, D. Yevick, Biased multicanonical sampling, IEEE Photonics Technology Letters. 17 (2005). https://doi.org/10.1109/LPT.2005.849981.

[77] D. Yevick, T. Lu, Improved multicanonical algorithms, Journal of the Optical Society of America A: Optics and Image Science, and Vision. 23 (2006). https://doi.org/10.1364/JOSAA.23.002912.

[78] J. Chen, S. Cheng, H. Xie, L. Wang, T. Xiang, Equivalence of restricted Boltzmann machines and tensor network states, Physical Review B. 97 (2018). https://doi.org/10.1103/PhysRevB.97.085104.

[79] E.M. Stoudenmire, D.J. Schwab, Supervised Learning with Quantum-Inspired Tensor Networks, (2016). http://arxiv.org/abs/1605.05775 (accessed April 16, 2020).

[80] C. Roberts, A. Milsted, M. Ganahl, A. Zalcman, B. Fontaine, Y. Zou, J. Hidary, G. Vidal, S. Leichenauer, TensorNetwork: A Library for Physics and Machine Learning, ArXiv. (2019).

[81] J. Liu, Y. Qi, Z.Y. Meng, L. Fu, Self-learning Monte Carlo method, Physical Review B. 95 (2017). https://doi.org/10.1103/PhysRevB.95.041101.

[82] R.H. Swendsen, Monte carlo renormalization group, Physical Review Letters. 42 (1979) 859–861. https://doi.org/10.1103/PhysRevLett.42.859.

[83] J.C. Walter, G.T. Barkema, An introduction to Monte Carlo methods, Physica A: Statistical Mechanics and Its Applications. 418 (2015) 78–87. https://doi.org/10.1016/j.physa.2014.06.014.

[84] L. Wang, Exploring cluster Monte Carlo updates with Boltzmann machines, Physical Review E. 96 (2017). https://doi.org/10.1103/PhysRevE.96.051301.

[85] L. Huang, L. Wang, Accelerated Monte Carlo simulations with restricted Boltzmann machines, Physical Review B. 95 (2017). https://doi.org/10.1103/PhysRevB.95.035105.

[86] H.W.J. Blöte, Y. Deng, Cluster Monte Carlo simulation of the transverse Ising model, Physical Review E - Statistical Physics, Plasmas, Fluids, and Related Interdisciplinary Topics. 66 (2002) 8. https://doi.org/10.1103/PhysRevE.66.066110.

[87] T.A. Bojesen, Policy-guided Monte Carlo: Reinforcement-learning Markov chain dynamics, Physical Review E. 98 (2018). https://doi.org/10.1103/PhysRevE.98.063303.

[88] Y. Nagai, M. Okumura, A. Tanaka, Self-learning Monte Carlo method with Behler-Parrinello neural networks, Physical Review B. 101 (2020). https://doi.org/10.1103/physrevb.101.115111.


[89] S. Li, P.M. Dee, E. Khatami, S. Johnston, Accelerating lattice quantum Monte Carlo simulations using artificial neural networks: Application to the Holstein model, Physical Review B. 100 (2019). https://doi.org/10.1103/PhysRevB.100.020302.

[90] E.M. Inack, G.E. Santoro, L. Dell’Anna, S. Pilati, Projective quantum Monte Carlo simulations guided by unrestricted neural network states, Physical Review B. 98 (2018). https://doi.org/10.1103/PhysRevB.98.235145.

[91] J. Liu, H. Shen, Y. Qi, Z.Y. Meng, L. Fu, Self-learning Monte Carlo method and cumulative update in fermion systems, Physical Review B. 95 (2017). https://doi.org/10.1103/PhysRevB.95.241104.

Figure 1: The joint energy-magnetization distribution of the input 8 × 8 Ising model data for $T = 3.5$ (dashed lines) compared to the result of a VAE with 2 latent variables, where the VAE distribution is generated with the deterministic procedure described in the text.


Figure 2: The joint energy-magnetization distribution of the input 8 × 8 Ising model data at $T = 3.5$ (dashed lines) compared to the result of a VAE with 2, 3 and 10 latent variables (left, center and right diagrams, respectively), where the VAE results were generated with the probabilistic procedure described in the text.


Figure 3: The 8 × 8 synthetic convolutional layer (CL) VAE probability distributions for a 10 × 10 set of values in the latent vector space that are equally spaced in the cumulative probability function of the normal Gaussian distribution. The temperatures are $T = 2.0$, 2.375, 2.5 and 3.0 for the upper left, upper right, lower left, and lower right quadrants, respectively.


Figure 4: As in the previous figure, but with a spin of +1 assigned to each output location for which the VAE output probability is larger than 0.5 and a spin of −1 to the remaining lattice locations.


Figure 5: The magnetization distributions associated with Figure 4.


Figure 6: The energy distributions associated with Figure 4.


Figure 7: The probability distributions corresponding to those of Figure 3 but for the DL implementation of the VAE.


Figure 8: The energy distributions corresponding to those of Figure 6 but for the DL implementation of the VAE.
