<<

arXiv:1804.02709v2 [cond-mat.stat-mech] 8 Jun 2019 l,tesrnl orltdFrisseswr studied were exam- systems For Fermi correlated of strongly [11]. description the properties the ple, dynamical in networks and , stud- neural equilibrium classical been artificial also Beyond with have was ied problems [10]. method many-body useful quantum moving-window transi- be the a phase to just shown similarly the of also predict and Instead to confu- [9], invented called [8]. tions was scheme e.g. learning method see a sion transition, used; the been learning also have ensem- em- neighboring ble, trees stochastic 7]. random t-distributed [3–5, as and such bedding network) mainly methods, neural model unsupervised artificial Other and XY (an method the (PCA) autoencoder and analysis the component model principal the Ising learning also by the was machine learning to unsupervised supervised applied addition, by In variant [2]. model methods Ising gauge thermo- classical the its the of study and of the transitions in phase used was dynamical it matter, of phases learning [6]. machine data) unsupervised unlabeled labeled of (II) ma- (clustering with and supervised The regression data) (I) or training into [2–5]. (classification divided physics learning roughly in chine are transitions phases tools of phase main context and the matter in of and particular, science have in of methods and, fields various ML engineering, to adopted au- that, recently even Besides been also and [1]. feeds, vehicles digital news as tonomous successfully vision, such been life, computer everyday has recognition, to intel- (ML) applied artificial increasingly learning and and Machine data big ligence. processing enabled has ogy mns h aletdvlpeti hsdrcinof direction this in development earliest the Amongst technol- and science computer in advancement Recent ahn erigo hs rniin nteproainan percolation the in transitions phase of learning Machine a elandadtedt olpeb sn h vrg outp average the using by neu collapse fully-connected data exponent a the critical in and layer learned hidden mo be one XY can just the using percolations, that bond find and tr site whose including lattices, dimensional properties, two the on models mechanical eue otepaedarmfrohrszs( sizes other netw for trained diagram t The phase learn the works. to to indeed network used it the be that train demonstrate to and engin order rela model feature in a the spin orientations devise or is spin we angles network, the ∆ neural (either where convolutional configurations a values, spin in ∆ the w parameter trained using network particular Besides the two from at app diagram also axis phase we consistent and a the as phases, obtain such ferromagnetic phases, quasi-long-range richer contains the model and XY defects—vortices generalized topological The of unbinding and binding 1 olg fPyisadOteetois aya Universit Taiyuan Optoelectronics, and Physics of College nti ae,w pl ahn erigmtost td pha study to methods learning machine apply we paper, this In .INTRODUCTION I. 2 ..Yn nttt o hoeia hsc n Department and Physics Theoretical for Institute Yang C.N. tt nvriyo e oka tn ro,SoyBok Y1 NY Brook, Stony Brook, Stony at York New of University State ν eas td h eeisi-otrizTols trans Berezinskii-Kosterlitz-Thouless the study also We . azo Zhang, Wanzhou Dtd ue1,2019) 11, June (Dated: ,2 1, iy Liu, Jiayu L ′ × L bv h suspected the above nargo containing region a in nesadtertclyb apn tt h renormal- to the [37]. made to group it been ization mapping have auto by Attempts the theoretically understand ac- reduce [34–36]. to and helpful time sampling is correlation Carlo and Monte the 33], the in [32, celerate discussed [24– networks been tensor others also of has many context learning and machine The [21], 23] 31]. model [22, percolation models quantum [15–19], non-equilibrium been [20], phases also systems topological have as disorder systems limitation such Other beyond successfully, [14]. methods studied problems ML sign meth- with the self-learning even of [12], and networks [13], connected ods e.g. using in ntrergms[] i eo h suspected the below (i) [2]: regimes configura- phase three Carlo Monte in calculate prepare tions models. to to (GXY) learning needs one supervised XY boundaries, apply generalized to the order and percolations, In model bond and XY topo- site the or including non-local involve properties, statisti- transitions logical classical whose of types models, two cal study to methods learning o riig yapyn h rie loih ocon- to of algorithm vicinity trained the the in applying figurations by training, for rniin yesnilyaeaigoe oa pn [2]. spins local over phase averaging recognize essentially can by actually transitions fully- (FCN) optimized the network magnetization, connected the as such (OP) eter indeed high with network phases the distinct [2]. the that confidence and verify transition the to learn with used can testing, being and training latter both the all for used from are configurations regimes three transition, phase the predicting) rate ′ where , 1 eew ou nuigmsl uevsdmachine supervised mostly using on focus we Here hntemdludrsuyhsalclodrparam- order local a has study under model the When n z-he Wei Tzu-Chieh and T c niin nov o-oa rtopological or non-local involve ansitions nivrie,i h lsia Ymodel. XY classical the in anti-vortices, eigapoc sn h itgasof histograms the using approach eering oee,i h ups st er isedof (instead learn to is purpose the if However, . fTcnlg hni002,China 030024, Shanxi Technology of y e n h eeaie Ymdl We model. XY generalized the and del ymcielann ehdt t We it. to method learning machine ly opnns steiptinformation input the as components) etrepae ntegnrlzdXY generalized the in phases three he a ewr,teproaintransition percolation the network, ral L eai hs,teprmgei and paramagnetic the phase, nematic tlyrgvscretetmt fthe of estimate correct gives layer ut t nydt ln h temperature the along data only ith ′ r yuigsse size system using by ork 6= iewih fpr Ycoupling. XY pure of weight tive etastosi eti statistical certain in transitions se L fPyisadAstronomy, and Physics of ihu n ute training. further any without ) T 7434,USA 1794-3840, c T c n ii nitreit regime intermediate an (iii) and 2 to,wihinvolves which ition, h rttorgmsaeused are regimes two first The . Ymodels XY d T c n a ne naccu- an infer can one , L × L can T c (ii) , 2

For the characterized by the non-local a CNN and verify that both give successful learning of order parameters, such as the topological phase of the the BKT transition in the XY model. Additionally, we Ising gauge model [2] or the classical XY model [38], the find that using the histograms of the spin orientations convolutional neural network (CNN) is a better tool than also works for the XY model and can be applied effi- the FCN as it encodes spatial information. It was demon- ciently to larger system sizes Additionally, we find that strated in the classical 2d Ising gauge model that the op- using the histograms of the spin orientations also works timized CNN essentially uses violation of local energetic for the XY model and the training can be done efficiently constraints to distinguish the low-temperature from the for larger system sizes. To go beyond the XY model, we high-temperature phase [2]. find it interesting to apply the ML to the generalized XY In percolation, non-local information such as the wrap- (GXY) model [44, 45] as it contains more complex config- ping or spanning of a cluster is needed to characterize the urations such as half-vortices linked by strings (domain phases and the transition in between. Percolation is one walls) and an additional nematic phase. We find the use of the simplest statistical physical models that exhibits of spin configurations (either angles or spin components) a continuous phase transition [39–41], and the system is and histograms both works. The advantage of the latter characterized by a single parameter, the occupation prob- approach is that the training can be done for larger sys- ability p of a site or bond, instead of temperature. If a tem sizes and, moreover, trained network for one system spanning cluster exists in a randomly occupied lattice, size can be applied to other system sizes. then the configuration percolates [42]. Can the neural network be trained to recognize such nonlocal informa- tion and learn or even predict the correct critical point? If so, can it be used to reveal other properties of the con- tinuous transition, such as any of the critical exponents? This motivates us to study percolation using machine learning methods. We first use the unsupervised t-SNE method to charac- (a) (b) terize configurations randomly generated for various oc- cupation probability p and find that it gives clear separa- tion of configurations away from the percolation thresh- old pc. This indicates that other machine learning meth- ods such as supervised ones will likely work, which we also employ. We find that both the FCN and the CNN works for learning the percolation transition, and the networks trained with configurations labeled with information of whether they are generated with p>pc or p

σ σ ns N−n system to form a nematic phase. For both ∆ = 0 and 1 Z = X ps (1 ps) s , (1) − the model reduces to the pure XY model (redefining qθ as {σ} θ in the first case), and hence the transition temperature is identical to that of the pure XY model. However, such where, e.g., on the square lattice with N = L L sites, ps σ × a redefinition is not possible with ∆ = 1. The phase is the probability of site occupation, ns is the numbers diagrams of the GXY models (4) depend6 on the integer of sites being occupied in the configuration labeled by σ. parameter q, and they have been explored extensively [44, In order to obtain the critical phase transition points by 45]. Monte-Carlo simulations, usually the wrapping probabil- ity R is defined [42] in the case of the periodic bound- ary condition. With open boundaries, a cluster growing III. PERCOLATIONS large enough could touch the two opposite boundaries and hence it is referred to a spanning cluster. The wrap- Even though we mainly use supervised learning meth- ping cluster is defined as a cluster that connects opposite ods, we will begin the study of percolation using an un- sides that would be in the otherwise open boundary con- supervised method, the t-distributed stochastic neighbor dition, as illustrated in Fig. 1 (a) and (b). A cluster embedding (t-SNE). We shall see that it can characterize forming along either the x or y direction can contribute configurations randomly generated in percolation to two to R. In ML method, we do not need to measure this distinct groups with high and low probabilities p of oc- observable directly and a naive labeling of each configu- cupation as well as a belt containing configurations gen- ration is given according to how it is generated according erated close to the p = p . This to the occupation probability p and p’s relation with the c gives us confidence to proceed with supervised methods critical value (the percolation threshold) p , i.e., whether c such as FCN and CNN. p>pc (say labeled as ‘1’) or p

1 1 (A) (a) the square lattice and on the triangular lattice, respec- tively, both with L = 16. The behavior is similar to that of site percolation. The upshot is that the t-SNE method can characterize percolation configurations into different phases and near the transition. In order to obtain the transition point p , one can probably divide the belt into 0 0 c 1 1 two halves and the probability value p at the cut can be (B) (b) used as an estimation of the percolation threshold. But we do not do that here, as we will use supervised learning below to learn the transition more accurately.

0 0 B. Learning percolation by FCN 1 1

(C ) ( c )

Neuron

0 0

1 2 3 4 XW1 WY2 FIG. 2: (Color online) The t-SNE distribution of site perco- 1 lation on the square lattice with L = 32 for (A) one step of 2 1 iteration, and (B) 2000 steps of iterations. (C) The result for 3 0 bond percolation on the square lattice with size L = 16 after 4 output layer 1100 iterations. The t-SNE distribution of site percolation on Input input layer the triangular lattice with L = 32 for (a) only one step of it- hidden layer eration and (b) 2000 steps of iteration. (c) The result for the bond percolation on triangular lattice at L = 16 after 1100 iterations. Note that in (A), (a), (B) and (b), we only use two colors pink and yellow, but in (C) and (c) the samples are colored according to the probability p of occupation, from pink to yellow.

FIG. 3: (Color online) Architecture of the fully connected neural network used to obtain pc for the site percolation t-SNE procedure, where each configuration x contains model. The input is a set of = configurations in a two- 32 32 = 1024 elements 0 or 1. Fig. 2 (A) is the dis- × dimensional lattices with periodic lattice with size L =4. A tribution obtained after only one step of iteration in the percolating configuration is illustrated, where the blue squares t-SNE method. Clearly, after the first iteration, the data are the occupied sites while the empty squares are not occu- for both labels are still mixed together and there is no pied. separation into distinct groups. However, after 2000 it- erations, as shown in Fig. 2 (B), the data converges into In Fig. 3, we show the structure of the FCN (which we three distinct groups, with two concentrated clusters and implemented with the TensorFlow library [48]), which a wide ‘belt’. The concentrated cluster with yellow solid consists of one input layer, one hidden layer and one out- circles indicates data generated from non-percolating (or put layer of neurons. The network between two neighbor- subcritical) phase, i.e. ppc. In addition to the two distinct clus- layer. The layers are interconnected by sets of correla- ters, the belt contains the data around the percolation tion weights (usually denoted by a matrix W ) and there transition point pc (roughly between 0.2 and 0.8). Simi- are biases (denoted by a vector b associated with neurons lar behavior in the t-SNE analysis of the distribution is at each layer (except the input one). The input layer ac- also obtained in the percolation study on the triangular cepts the data sets of images or configurations and then lattice, as shown in Figs. 2 (a) & (b). the network processes the data according to the weights We remarked that, there are only two colors used in and biases, as well as some activation function for each Figs. 2 (A), (a), (B) & (b), indicating only above or below neuron. The optimization is performed to minimize some pc, but a continuous hue between purple and yellow was cost function, e.g., the cross entropy function in Eq. (8) used in Figs. 2(C) and (c) to denote the occupation prob- via the stochastic gradient decent method. The number ability p. In Fig. 2 (C) & (c), we show the embeddings of neurons in the input layer should be equal to the total for occupation probability p of the bond percolation on number of lattice sites, i.e. L L in the site percolation × 5

(or the number of spins in the Ising model), or the total or (ii) the average cross-entropy between such pairs, number of bonds in the bond percolation. The values of 1 the neurons are denoted as the input vector x. CE X X yT j log yj + (1 yT j ) log(1 yj ) . ≡−N − −  The hidden layer is to transform the inputs into some- x j thing that the output layer can use and one can employ (8) as many such hidden layers (but we will focus on just An additional term called regularization, such as 2 one hidden layer in FCN). It determines the mapping λ/(2N) i Wi , is introduced to the cost function in P | | relationships. For example, as illustrated in Fig. 3, we order to prevent overfitting. The optimization is done denote W1 as the weight matrix from the input to the with stochastic gradient descent using the TensorFlow hidden layer and b1 the bias vector for the hidden layer. library. The number of neurons in the hidden layers generally is Once the network is optimized after the training stage, chosen to be approximately of the same order as the size we use as input different and independently generated of the input layer or less. Assume the activation function configurations with possibly different sets of p values, and for the neurons in the hidden layer is f. Then the neu- use the average values of the output layer y (sometimes rons in the hidden layer will output yH = f(x W1 +b1). referred to as the average output layer) from the network · Denote W2 as the weight matrix from the hidden to the to estimate the transition. This is referred to as the test output layer and b2 the bias vector for the output layer. stage. The two components in y are in the range [0, 1], Assume the activation function for the neurons in the and the larger the value of the component (associated hidden layer is g. Then the neurons in the output layer with a neuron) gives the more probable prediction the will have states described by y = g(yH W2 + b2). One neuron makes. Usually we plot such average numbers choice usually used as the activation is the· so-called sig- for both output neurons (one is associated label 0 and moid function, the other 1). This results in two curves as a function of the probability p, as illustrated in Fig. 4. The value of 1 p at which the two curves cross is used as an estimate σ(z) , (5) ≡ 1+ e−z of the percolation threshold. Note that, as seen below in Sec. IV, when we encounter three or more phases to related to the Fermi-Dirac distribution in physics. An- identify, then the number of neurons in the output layer other is the so-called softmax function that, when applied will be accordingly three or more. to a vector with components zj , gives another vector with components. (A) (a) 1 L4 1 L8 zj 2 e L16

aj = , (6) , y 0.5 zk 1 L24 Pk e y L32 0 L48 0 0.562 0.593 0.625 0.465 0.5 0.535 and it is related to the Boltzmann weight in physics. Yet (B) p (b) p 0.65 0.54

another choice that has become popular recently is the c p rectifier, defined as fRec(z) = max(0,z), and one unit 0.593 0.5 that uses this activation function is called a rectified lin- 0.55 0.46 ear unit. In principle, one can employ as many hidden 0 0.1 0.2 0.3 0 0.1 0.2 0.3 1/L 1/L layers in the network. It is the universality of the neural (C) (c) network, i.e., it can approximate any given function, that 1 1 makes machine learning powerful. 2

, y 0.5 0.5 1

These weights W ’s and biases b’s need to be optimized y at the training stage. The inputs consist of pairs of x : 0 0 { -0.3 0.2 -0.1 0 0.1 0.2 0.3 -0.4 -0.2 0 0.2 0.4 yT , where each configuration is represented by a vector 1/v 1/v (p - p )L (p - p L x of} L L components of value being 1 (whether a site is c c) occupied)× or 0 (not occupied) and a corresponding label yT indicating whether the configuration x is generated FIG. 4: (Color online) The two values y1 and y2 of the output layer of site percolation on the square lattice (left columns) above or below the percolation threshold pc. This label can be described by by one single binary number, e.g. and triangular lattice (right columns) with the system sizes 1 representing p>p and 0 representing p

In Fig. 4 (A), we show the two average values y1 1 2 and y in the output layer, obtained by testing the net- L2 X y(x) yT (x) , (7) 2 ≡ N x | − | work with site percolation configurations in the range of 6

tions according to whether p>pc or p

1 0.5

y L16 L24 transition point pc = 0.5 [51]. The data collapse and 0.2 0.2 0.48 0.5 0.52 0.33 0.347 0.36 the finite size scaling are shown in Figs. 4 (b) and (c), p p (B) (b) respectively. 0.515 0.375 Similarly, we use the FCN to study bond percolation, c p 0.5 0.347 and the results are shown in Figs. 5 (A) and (B) give the 0.485 0.315 average output layer for the square lattice (with sizes L = 0 0.1 0.2 0.3 0 0.1 0.2 0.3 1/L 1/L 8, 16, 24, 32), and finite size scaling (yielding the critical point at pc =0.5), respectively. Figs. 5 (a) and (b) show FIG. 5: (Color online) The two values y1 and y2 of the out- the results of bond percolation on the triangular lattices. put layer of the machine learning for bond percolation on The bond percolation threshold [51] is at 2 sin(π/18) the square lattice (left columns) and triangular lattice (right 0.347 and our result agrees with it. We remark that in≈ columns) with the system sizes L = 4, 8, 16, 24 using the fully the training, each input configuration has 2L L bond connected network. The first and the second rows show the variables for the square lattice and 3L L bond× variables average results in the output layer y1 and y2 and the finite for triangular lattices. × size scaling for the critical points, respectively. Site percolation on the Bethe lattice was studied an- alytically and exact transition was known, and thus it is interesting to apply the neural network. In Fig. 6 (a) 0.562

K=1 432 1 K5 probability p greater or less than pc. The information y about whether the individual configuration is percolating or not is not known. However, one would expect that 0.48 0.5 0.52 p for the configurations generated sufficiently away from p = pc, the neural network is learning such a property. FIG. 6: (Color online) (a) Bethe lattice with coordination But close to pc even if a configuration is generated at number z = 3. The lattice sites are represented by solid circles ppc it may not be and y2 in the output layer of site percolation on the Bethe percolating. There are a lot of fluctuations near pc; see lattice for K = 3 and K = 5. also Figs. 9 (e) and (f). In Sec. IIIC3, we will use the alternative labeling by giving the information of whether a configuration is percolating or not. We also apply the FCN to site percolation on other C. Learning percolation by CNN lattices. For example, on the triangular lattice, the per- colation threshold pc = 0.5 is also exactly known [51]. We have seen in the previous section that the FCN Similar to the square lattices, we generate the configu- works well in learning percolation transition. However ration with randomly occupied sites with a statistically the information about the lattice structure is not explic- independent probability p and then label the configura- itly used, but rather it might be inferred during opti- 7 mization. For problems that have such natural spatial 2. Site and bond percolations structure, the CNN is naturally suited and can yield bet- ter results. (A) (a) 0.8 6 0.8 8 2 10 ,y

1 0.5 0.5

y 12 14 1. CNN structure 0.2 16 0.2 0.55 0.593 0.65 0.48 0.52

(B) 4 (b)e 0.75 6 0.75

2 8

,y 10 1 0.5 0.5

y 12 14 0.25 16 0.48 0.52 0.25 p 0.33 0.347 0.397 p

FIG. 8: (Color online) The two values y1 and y2 of the output layer, using the CNN for site [(A)&(a)] and bond percolation convolution pooling fully connected layers [(B)&(b)] on the square (left) and triangular (right) lattices, respectively. FIG. 7: (Color online) Architecture of the fully connected neural network used to obtain pc for the site percolation model. The input is the configurations in a two-dimensional We repeat the same study of percolation on square and lattice with periodic boundary condition, illustrated with size triangular lattices with CNN. In the CNN structure, we L = 4. The the colored or shaded squares are the occupied use two combination layers (i.e., convolution+maxpool). sites while the white squares are not occupied. In Figs. 8 (A) and (B), we show the results of site perco- lation and bond percolation results on the square lattice, respectively. The critical point is converged to 0.593 and We first begin by discussing the structure of the CNN 0.5, respectively. During the training, for the site perco- shown in Fig. 7, where the input is a two-dimensional lation on e.g. the square lattice, the L L site occupation × array or an image. There is a filter with a small size configuration is used as input. For the bond percolation, such as 5 5 also called a local receptive field, that pro- the 2L L bond configuration is used. As expected, the × cesses information× of a small region. This same local CNN works well in learning the phase in percolation. For receptive field moves along the lattice to give a coarse- the site and bond percolation on the triangular lattices, grained version of the original 2D array or image. We the results are also shown in Figs. 8 (a) and (b). We note can move the filter not one lattice site but a few (which that some slight improvement in the learning of the tran- is usually called stride to the next region). We can also sition point can be made by careful choosing of p’s; see pad the outer regions with columns or rows of zeros so App. C. We also comment that since the Bethe lattices is as to maintain the same size of the filtered array, which not regular and we do not use CNN for the corresponding is referred to as padding. This results in a filtered or percolation problem. But it should be possible in princi- generally coarse-grained hidden layer. We can use many ple. Moreover, the FCN result of the percolation on the different local receptive fields to obtain many such lay- Bethe lattices is good enough. ers, usually referred to as kernels. Roughly speaking, the original image is converted to small ones with differ- ent features. For each kernel, a further processing called 3. Labeling by cluster identifying algorithm max-pooling is done on non-overlapping small patches, e.g., 2 2 regions, that further coarse grain the arrays. In this subsection, we would like to show a different Other pooling× methods can be used. One can repeat such ways of training the data using the non-local property convolution+pooling layers a few times, but we will use of whether the configuration is percolating or not as the one such combination layer in the percolation and two in label y. In the previous works for thermodynamic phase the later part of the XY and the GXY models. After this, transitions [2, 5, 9], for a configuration x at parameter there is a fully connected layer of neurons, as in the FCN. such as T , the corresponding label y is set to be 0 if This can be repeated a few more layers, but we will only T Tc, the configurations belong to is connected to a final output layer, and the number of another phase, and the label is set be 1. Because the neurons depends on the output type; for example, to dis- configurations around the percolation threshold pc have tinguish digits 0 to 9, there are ten output neurons. For fluctuations and the configurations at ppc may not neurons. Our optimization of the CNN again takes the be percolating, as illustrated in Fig. 9 (e) and (f). There- advantage of the TensorFlow library. fore, it is interesting to label the configuration according 8

The results in Ref. [38] show that the detection of vortices does not necessarily result in the best classification ac- (a) (b) 1 1 curacy, especially for lattices of less than approximately L8 1000 spins. The advantage may show up for larger sizes, 2 L16 but the training becomes more costly. Here we limit our-

, y , L32 1 L8 selves to smaller sizes for using spin orientations (or their y L48 L12 0 0 L20 components) in the training, but later introduce a differ- 0.520 0.593 0.520 0.593 ent approach in feature engineering that can be efficiently (c) p (d) p 1 0.85 applied to larger systems. In term of the two-dimensional L8 image for the CNN, when we use the spin components, L16 the L L sites need to be effectively doubled to L 2L L32 L8 × × L48 L12 that gives the same information of x. L20 0.5 0.6 In the following, we will focus on the pure XY model on 0.520 0.593 0.52 0.593 (e) p (f) p both the square and the honeycomb lattices and the GXY 1 1 model with q =2, 3 and q = 8 on the square lattices. 0.5 L16 0.5 L12 A. The pure XY model 0 0 0 0.593 1 0 0.593 1 p p (A) (a) 0.8 FIG. 9: (Color online) The two values y1 and y2 of the output 0.8 6 4 layer , using the cluster identifying method to label the config- 2 8 6 y 8 1, 0.5 12 0.5

uration, with (a) the FCN and (b) the CNN. The respective y 14 10 accuracy curves of (a) and (b) are shown in (c) with FCN 0.2 16 and in (d) with CNN. Labels {0, 1} for percolation or not are 0.75 0.8935 1 0.2 shown for randomly generated configurations with (e) L = 16 T 0.75 0.8935 1 (B) T and (f) L = 12. Notice the fluctuations near the transition. (b) 0.9 0.9 c

T T =0.891(8) 0.8 Tc=0.89(1) 0.8 c to whether or not the configuration has a spanning or 0 0.1 0.2 0 0.1 0.2 wrapping cluster, instead of the relationship between oc- 1/L 1/L cupation probability p and pc. FIG. 10: (Color online) The two values y1 and y2 of the out- Using the new labeling scheme, we show the results put layer using the CNN for XY model by inputting (A){θi} in Figs. 9 (a) and (c) with sizes L = 8, 16, 32, 48 by us- with sizes L = 6, 8, 12, 14, 16 and (a) {sin θi, cos θi} with sizes ing FCN and (b) and (d) with sizes L = 8, 12, 20 by L = 4, 6, 8, 10 on the square lattices. (B) and (b) Finite-size using CNN. Here no information about whether p>pc scaling of the critical points; the results obtained are consis- or p

x = (cosθ1, sinθ1, ..., cosθN , sinθN ), (9) Since the phase transition points are already known for the XY model on both lattices [46, 56, 57], i.e., Square Honeycomb or the spin orientations θi . The vorticity are defined Tc = 0.8935(1), and Tc = √2/2, respec- as the winding numbers,{ i.e.,} a collection 1 for vortices tively, it is thus interesting to see whether or not ma- and anti-vortices, but we shall not use those± as input, for chine learning could recognize the phase transition of XY the reason remarked earlier due to the results in Ref. [38]. model. Although the BKT transition of the XY model 9

(A) (a) 0.8 including those at Tc; see App. C and Fig. 20.) In the 4 4 next section we will study the GXY model to extend the 2 6 6 y 8 machine learning beyond the XY model.

1, 0.5 8 y 10 10 12 0.2 0.6 0.707 0.8 0.68 0.707 0.76 B. The generalized XY models (B) T (b) T 0.73 0.73 c 1. q=2 and q=3 T Tc=0.71+/-0.01 Tc=0.725+/-0.009 0.67 0.67 0 0.1 0.2 0 0.1 0.2 1/L 1/L

FIG. 11: (Color online) The two values y1 and y2 of the out- (a) q=2 (b) q=3 put layer using the CNN for XY model by inputting (A){θi} P P with sizes L = 4, 6, 8, 10 and (a) {sin θi, cos θi} with sizes 0.8935 0.8935 L = 4, 6, 8, 10, 12 on the honeycomb lattices. (B) and (b) T N F T N Finite-size scaling of the critical points; the results obtained F 0.365 are consistent with Tc = 0.707 using both types of the inputs. 0 0 0 0.5 1 0 0.5 1 ∆ ∆ has been studied by machine learning methods [38], it is 4 4 our motivation to go beyond and study the GXY model. (c ) (d )

In Figs. 10 (A) and (a), we show the learning results ) / using the raw spin configurations (i.e. both spin direc- 2 2 tions θ and spin components sinθ , cosθ ) on the θ { i} { i i} P( square lattices with sizes L =4 to L = 16. The lines y1 & y2 correspond to the average values of the two output 0 0 0 1 2 0 1 2 neurons. Due to the fact that the performance of the θ / θ / network is lowest near the critical points, we use 25000 2 2 Monte-Carlo samples of the training data and of test data ( e ) ( f ) for each temperature T . In this way, we obtain results ) / with low standard deviations around the critical point in 1 1 θ the range 0.75

tional layers, and the network indeed can learn the tran- (A) (a) 0.8 sition. For both ∆ = 1 (pure XY model) and 0 (the GXY 6 6 model) the phase transition is located at T/J =0.8935. 2 8 8 y 10 As remarked earlier, this is because the ∆ = 0 GXY

1, 0.5 12 y 14 12 model isomorphic to the usual XY model [63] by chang- 16 ¯ 0.2 ing the variable qθi as θi. Forboth∆=0and∆=1, us- 0.75 0.8935 1 0.75 0.8935 1 ing either (1) or (2) as the input works well and the CNN T (B) (b) T can distinguish, respectively, the quasi-long-range ferro- Tc=0.92(5) c 0.9 magnetic phase and the nematic phase, from the high T T =0.92(3) c temperature disordered paramagnetic phase. Fig. 13 (A) 0.8 0 0.1 0.2 0 0.1 0.2 displays the average values of the output layer and the 1/L 1/L performance of learning by using θi as input for q =2 at ∆ = 0, with L =6, 8, 12, 14, 16,{ while} that in (a) is ob- FIG. 13: (Color online) The two values y1 and y2 of the output tained from using sin θi, cosθi as input. The resulting layer of learning of (A) {θi} and (a) {sin θi, cos θi} for (a) q = { } 2, ∆ = 0 with various lattice sizes. (B) and (b) The estimated two curves y1 and y2 from the two neurons in the output critical points from the dependence on lattices sizes 1/L. In layer can distinguish different phases and their crossing the thermodynamical limit, Tc = 0.92(3) and Tc = 0.92(5) gives the transition. In the thermodynamic limit, the are estimated. critical points are estimated to be Tc = 0.92 0.03 and ± Tc = 0.92 0.05, respectively, using the two different types of input.± 1 1 Here in the network there are 32 and 64 kernels in the

2 (a) (b) first and second layers, respectively. We use the Wolff- L40 L40 cluster algorithm to generate configurations for the GXY , y 0.5 ∆0 0.5 q2∆1 1 q2 L60 L60

y model. To obtain enough equilibrated states, we throw 0 0 out the configurations during the first 10000 Monte-Carlo 0.8 0.8935 1 0.8 0.8935 1 steps. To avoid the correlations between configurations, T T 1 1 we pick up configurations with interval of 2-5 Monte Carlo steps at each temperature. 2 (c) L40 L40 (d) , y ∆1 L60 Inspired by the main difference between the above 1 0.5 q3∆0 L60 0.5 q3 y three phases being the shape of the histograms, we inves- 0 0 tigate whether using such feature engineering (i.e. his- 0.8 0.8935 1 0.8 0.8935 1 tograms) can help the learning better. In principle, spin T T 1 configurations from each sample in the Monte Carlo al-

2 (e)L20 (f) ∆ =0.2505(5) gorithm generates a histogram. However, to make the L40 0.25 c , y histogram smooth, we use multiple configurations to av- 1 0.5 T=0.365 L60 y L80 erage (e.g. 20) for small system sizes. Employing wolff 0 cluster algorithm allows us to access larger lattices, we 0.22 0.25 0.265 0 0.02 0.04 ∆ can use just one single configuration to generate a his- 1/L togram. After obtaining the histogram, we segment the images into a 32 32 matrix of black and white pixels, FIG. 14: (Color online) The two values y1 and y2 of the output in which the white× area is set to 0 and colorful area is set layer in the learning of histograms of {θi} for (a) q = 2, ∆ = 0; (b) q = 2, ∆ = 1; (c) q = 3, ∆ = 0; (d) q = 3, ∆= 1. (e) to 1. These matrices of pixels are our engineered feature The average results in the output layer and (f) the finite-size and are used as the input to the CNN for training. scaling of the critical points by scanning the parameter ∆, We directly recognize the histograms and obtain re- and critical point ∆c is estimated to be 0.2505(5). sults as accurate as other kinds of inputs, using the two- layer-convolution CNN as shown in Figs. 14 (a)-(d) along the four red dashed lines in Figs. 12 (a)-(b). For both ∆ = 1 limit, the system is quasi-long-range ferromagnetic ∆ = 0 and 1, the GXY model has the phase transition (broken reflection symmetry) in low-temperature limit located at T/J =0.8935. The results of using histograms and the distribution of the spin orientations is shown in as the engineered feature make the CNN able to recog- Fig. 12(d). In the higher temperature, the system be- nize different phases in the generalized XY model and the come disordered, and is in a paramagnetic phase. The associated transitions in these two limits of ∆. For com- distribution for the paramagnetic phase are also shown in pleteness, we also scan a path in the phase diagram of Figs. 12(e) and (f) for q = 2 and 3, respectively. Clearly, q = 3 model by first varying ∆ at T/J =0.365 (for which the distributions spread through a very wide arrange of the ∆c = 0.25 [45]), and the results shown in Figs. 14 angles due to strong thermal fluctuations. (e)-(f) demonstrate that the neural network, in partic- We use the configuration of (1) θi and (2) ular, can also distinguish the nematic phase from the cos θ , cos θ as input to the CNN with{ two} convolu- ferromagnetic phase and learn the transition point. The { i i} 11

2.0 2.0 approach of using histograms helps us to access larger (a) (b) P P lattices without any further training. 1.4 1.4

Armed with the success of learning two distinct phases, T T we move on to test whether our method can be used 0.8 0.8 N N to distinguish three phases. In particular, we scan the F F 0.2 0.2 phase diagram in Fig. 12 (b) along the green dashed lines, 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Δ Δ combining the previous horizontal path varying ∆ (at 2.0 2.0 T/J =0.365) and then the vertical path by varying T/J (c) (d) P P (at ∆ = 1); this path cuts through the three phases: N, 1.4 1.4

F and P. To do this, we need to use three neurons in T T the output layer. During the training of the network, 0.8 0.8 N N the configurations in the three phases are labeled as 0, F F 0.2 0.2 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 1 and 2 respectively. The sigmoid function maps the Δ Δ output layer located between the range [0, 1]. In the test stage, the first neuron in the output layer (representing FIG. 16: The phase diagram of the GXY model, which con- the N phase) is 1 for ∆ < 0.25 and becomes zero at tains, Nematic phase (N, blue), Ferromagnetic phase (F, other two phases (the F and P phases). The output of green), and paramagnetic phase (P, red), obtained by the the other two neurons shows converse behavior as shown semi-supervised method discussed in the text. The neural in Fig. 15 (a). There are three curves corresponding to network is trained only for data from ∆ = 0 and ∆ = 1.(a) the three neurons in the output layer. The accuracy is The q = 2 model: the input data for training is the histogram shown in Fig. 15 (b) and it equals to 100% at non-critical of spin orientations { θi }. (b) The q = 2 model: the in- regimes and decreases near the two critical points around put data for training is the spin orientations. (c) The q = 3 model: the input data for training is the histogram of spin ori- (∆ = 0.25,T = 0.365) and (∆ = 1,T = 0.89). In short, entations. (d) The q = 3 model: the input data for training we have demonstrated successful learning of three phases is spin orientations θi. These agree with those (from Monte (N, P and F) and the transitions. Carlo simulations) in Figs. 12(a)&(b). The phase boundaries represented by white lines by Monte Carlo methods are also 1 shown. (a) F 0.5 y1 N y2 P thus obtained, as shown in Fig. 16, agrees well with those y3 in Figs. 12 (a)&(b). 0 0 0.25 0.8935 ∆ T

1 2. q=8 (b) 0.5 For q > 3, the results in Refs [61, 62] suggest the ex- Accuracy istence of new phases at intermediate values of ∆ that 0 do not appear for either ∆ = 0 or 1. Since the existence and/or nature of some of those transitions are still dis- FIG. 15: (a) The results in averaging outcomes separately in puted, it would be interesting to see the outcome of the the three neurons of the output layer by scanning the green neural networks in those cases. dashed lines in Fig. 12(b). (b) The accuracy of the learning. Without loss of generality, a typical value q = 8 is chosen. In Fig. 17 (a), the phase diagram, containing N, As a final test, we use a semi-supervised method to P, and F phases, is shown. The blue dashed lines are the retrieve the global phase diagram of GXY model with data from Refs [61, 62]. The color of the small squares q = 2 and q = 3 on the square lattice of 12 12 sites. is the mapping of the average values in the output layer × What we mean by the semi-supervised method is that, yN , yF2 , yP , and yF of the two-layer CNN. The color is only the limited data that are not generic are used in the mapped via z = yP +20 yN +40 yF2 +60 yF and its training, for example, those along ∆ = 0 and ∆ = 1, normalized expression z =∗ [z min∗(z)]/max(∗z). Besides − i.e., the two vertical paths in Fig. 14(a) and (b). These the previous phases of the q = 3 model, a new phase F2 are non-generic, as they only give very limited repre- phases emerges, consistent with Refs. [61, 62]. sentations in the phase diagram. Once the network is Here we comment on the way we train the CNN. trained and optimized, we use the neural network to pre- Firstly, the lattice size should be large enough such as dict the phases, using the configurations generated from L = 100. By fixing the temperatures at T = 0.35 and Monte Carlo in the whole phase diagram, with parame- 0.8, we scan the parameter ∆ from 0 to 1 in the phase ters (∆, T ) covering the range ∆ = 0, 0.1, 0.2, , 1 and diagram with an interval of 0.05. For each parameter T =0.2, 0.4, , 1.8, 2. We use both the spin orientation··· point, 2500 histogram samples for training are used and and the histogram··· as the input, and the phase diagrams 25000 samples of histograms for testing, and four labels 12

1 1.00 (a) the test is from that training using data from the 100 (b) T=0.35 ∆=0 N ×

P θ/π) 0.75 100 system size. We note that the size of our histogram P( images was chosen and fixed more or less arbitrarily at, ∆

T (c) T=0.35 =0.5 F 0.50 2 e.g., 32 32, and that other (larger) sizes can be used N F × F to increase accuracy. Using previously trained network 0.25 2 (d)T=0.35 ∆=1 F avoids training the network repeatedly for other system 0 0.0 0.2 0.4 0.6 0.8 1.0 sizes. Δ 0 1 2 θ/π 1 1 1.00 (e) 1.00 (f) P P 0.75 0.75 T T 0.50 N F 0.50 N F V. CONCLUSION

0.25 F2 0.25 F2 0 0 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 In summary, we have used machine learning methods Δ Δ to study the percolation, the XY model and the GXY model in the two dimensional lattices. For the percola- FIG. 17: (a) The phase diagram of the GXY model with tion phase transition, the unsupervised t-SNE can map q = 8, containing N, P, F , and F2 phases from Ref. [61]. The the high dimensional data-sets of configurations into an network is trained from the spin histograms of size L = 100. two-dimensional image with classifiable data. Using the (b) Illustration of the distributions of spins directions in the FCN even without explicitly giving the two-dimensional nematic phase show 8 peaks. (c) The distributions of spins spatial structure, still allows to recognize the percolation directions in F phase. Using the trained network of (a), we phase transition. By feeding the information about the obtain the phase diagrams of systems with sizes L = 150 (e) and L = 200 (f). existence or not of the spanning cluster, the transition can be predicted without any training information on whether the configurations are generated with p>pc or p

(a) (b) Appendix A: t-SNE method 0.8

1 2 0.6

Here we summarize for convenience the t-Distributed , y

Sigmoid 1 Stochastic Neighbor Embedding (t-SNE) method [47], 0.5 y 0.4 Sigmoid Softmax which is an improvement from the Stochastic Neibh- Softmax 0 0.2 gor Embedding (SNE) method [58]. The main idea 0 1 2 3 0.56 0.58 0.6 0.62 0.64 Epoch of these methods is to, from a set of high-dimensional p data, represented by high-dimensional vectors xi , ob- tain a corresponding set of low-dimensional (e.g.{ } 2- or FIG. 18: (a)The training accuracy by using the sigmoid and softmax activation, respectively. (b) For the site percolation 3-dimensional) data, represented by a set of vectors y { i} on the square lattices, we find the average output layers are such that the latter maintains key features of the former. the same using enough samples (25000). In the t-SNE method, the pairwise similarity qij ’s for a pair of low-dimensional vectors yi and yj is defined as (1 + y y 2)−1 Appendix C: Choosing training data q = || i − j || , (A1) i,j (1 + y y 2)−1 Pk6=l || k − l|| whereas that for the high-dimensional vectors is defined as pi,j (pj|i + pi|j )/(2n), where n is the total number of vectors≡ and (a) exp( x x 2/2σ2) p = −|| i − j || i . (A2) 0.8 j|i exp( x x 2/2σ 2) Pk6=l −|| k − l|| i

The lower dimensional vectors y’s are obtained by us- 8 2 10 ing a gradient descent approach by minimizing the cost y

1, 0.5 14 function, y 16 p C = p log ij , (A3) X X ij q i j ij 0.2 0.32 0.3470.36 0.37 (b) p where the gradient by varying yi’s is given by 0.37

δC c p =0.349(8) 2 −1 p c =4 X(pij qij )(yi yj)(1 + yi yj ) . (A4) δy − − || − || 0.347 i j 0.33 In the above, the standard deviation σi’s are obtained by 0 0.05 0.1 fixing the so-called perplexity (supplied by the user), 1/L

− P pj|i log2 pj|i Perp(Pi)=2 j , (A5) FIG. 19: The result of bond percolation on the triangular lat- which is typically chosen between 5 and 50, according to tice via training the CNN with configurations generated sym- the performance. Here we set it to be 20. metrically w.r.t. pc. This is to be compared with Fig. 8(b), and the results here show some improvement. The initial low-dimensional vectors yi’s are generated randomly. Iterating the gradient descent procedure will yield an improved approximation consecutively, until the There are several factors that could affect the trained gradient is very small. network. In this appendix, we compare two choices of the training data and the resultant learned transition points, i.e. intersections of y1 and y2. The first approach for the Appendix B: activation functions of sigmoid and training data is to use configurations corresponding to softmax probabilities p’s uniformly, as we have used in most of the plots in the main text, such as the percolation in Fig. 8. There are many activation functions to be used in the The second choice is to use configurations generated at neural network, here we use the sigmoid function for the p’s symmetric with respect to pc (but not including those FCN. Taking the site percolation as an example for test, at pc), which is demonstrated at Fig. 19. we output the results with 25000 samples per occupa- We remark that the training data used in Fig. 8 were tion probability. In Fig. 18 (a), the training accuracy of obtained at p’s that are uniform. By judiciously making the using sigmoid function reaches an equilibrated stage the choice such that these p’s are symmetric with respect faster than that by the softmax function. It is also found to the transition point (excluding the transition point), that the results of using softmax and sigmoid function better learning of the transition can be made. are almost the same as shown in Fig. 18 (b). Therefore, We also test this for the XY model on the honeycomb. enough samples can avoid systematic error induced by In Fig. 20 (a), y1 and y2 for the two neurons in the output different activation functions. layer are plotted and the result Tc =0.709(2) from finite- 14 size scaling in Fig. 20 (b) appears more accurate than Acknowledgments Tc =0.729(8) in Fig. 11 (b).

(a) 0.8 4 6 8 WZ thanks for the valuable discussion with C. X. Ding 0.6 10 2 on Monte-Carlo simulations and gratefully acknowledge y 1,

y financial support from China Scholarship Council and the 0.4 NSFC under Grant No.11305113; TCW is supported by the National Science Foundation under Grants No. PHY 0.2 1314748 and No. PHY 1620252. 0.6 0.707 0.8 (b) T 0.72 Tc=0.709(2) 0.707 c T 0.69 Notes added. After we finished the manuscript, we 0 0.1 0.2 1/L learned a very interesting paper on “Parameter diagnos- tics of phases and phase transition learning by neural FIG. 20: (a) Averaged values of the two neurons in the out- networks” by Suchsland and Wessel [60]. Even though put layer, y1 and y2, for the XY model on the honeycomb it is beyond the scope of the current manuscript, but as lattice by choosing the training data {cosθi, sinθi} below Tc a future direction, it will be interesting to employ the and above Tc symmetrically with respect to Tc but not in- methods presented there in percolation and the general- cluding it. (b) The finite-size scaling of Tc and the value in ized XY models. (They have already analyzed the XY the thermodynamic limit is estimated to be 0.709(2). model.)

[1] M. Jordan and T. Mitchell, Machine learning: Trends, chine Learning Phases of Strongly Correlated Fermions, perspectives, and prospects, Science 349, 255(2015). Phys. Rev. X 7, 031038 (2017). [2] J. Carrasquilla and R. Melko, Machine learning phases [13] J. Liu, H. Shen, Y. Qi, Z. Meng and L. Fu, Self-Learning of matter, Nat. Phys. 13, 431 (2017). Monte Carlo Method in Fermion Systems, Phys. Rev. B [3] L. Wang, Discovering Phase Transitions with Unsuper- 95, 241104 (2017). vised Learning, Phys. Rev. B 94, 195105, (2016). [14] P. Broecker, J. Carrasquilla, R. Melko and S. Trebst, [4] Sebastian J. Wetzel, Unsupervised learning of phase tran- Machine learning quantum phases of matter beyond the sitions: from principal component analysis to variational fermion sign problem, Sci. Rep. 7, 8823 (2017). autoencoders, Phys. Rev. E 96, 022140 (2017). [15] G. Torlai and R. Melko, A Neural Decoder for Topologi- [5] W. Hu, R. Singh and R. Scalettar, Discovering Phases, cal Codes, Phys. Rev. Lett. 119, 030501 (2017). Phase Transitions and Crossovers through Unsupervised [16] Y. Zhang and E. A. Kim, Quantum Loop Topography for Machine Learning: A critical examination, Phys. Rev. E Machine Learning, Phys. Rev. Lett. 118, 216401 (2017). 95, 062122 (2017). [17] D. Deng, X. Li and S. Sarma, Exact Machine Learning [6] I. Goodfellow, Y. Bengio and A. Courville, in Deep Topological States, Phys. Rev. B 96, 195145 (2017). Learning (MIT Press, 2016). [18] D. Deng, X. Li and S. Sarma, Quantum Entanglement in [7] C. Wang and H. Zhai, Unsupervised Learning of Frus- Neural Network States, Phys. Rev. X 7, 021021 (2017). trated Classical Spin Models I: Principle Component [19] P. Zhang, H. Shen and H. Zhai, Machine Learning Topo- Analysis, Phys. Rev. B 96, 144432 (2017). logical Invariants with Neural Networks, Phys. Rev. Lett. [8] K. Ch’ng, N. Vazquez, and E. Khatami, Unsupervised 120, 066401 (2018). machine learning account of magnetic transitions in the [20] F. Schindler, N. Regnault and T. Neupert, Probing Hubbard model Phys. Rev. E 97, 013306 (2018). many-body localization with neural networks, Phys. Rev. [9] E. van Nieuwenburg, Y. Liu and S. Huber, Learning B 95, 245134 (2017). phase transitions by confusion, Nature Phys 13, 435 [21] T. Mano and T. Ohtsuki, Phase diagrams of three- (2017). dimensional Anderson and quantum percolation models [10] P. Broecker, F. Assaad and S. Trebst, Quantum using deep three-dimensional Convolutional Neural Net- phase recognition via unsupervised machine learning, work, J. Phys. Soc. Jpn. 86, 113704 (2017). arXiv:1707.00663 [22] M. Bukov, A. Day, D. Sels, P. Weinberg, A. Polkovnikov [11] G. Carleo and M. Troyer, Solving the Quantum Many- and P. Mehta, Reinforcement Learning in Different Body Problem with Artificial Neural Networks, Science Phases of Quantum Control, arXiv:1705.00565. 355, 602 (2017). [23] J. Venderley, V. Khemani and E. Kim, Machine learning [12] K. Ch’ng, J. Carrasquilla, R. Melko and E. Khatami, Ma- out-of-equilibrium phases of matter,arXiv:1711.0002. 15

[24] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. phases in a class of generalized XY models, Phys. Rev. Melko and G. Carleo, Many-body quantum state tomog- Lett. 106, 067202 (2011). raphy with neural networks, arXiv:1703.05334. [46] S. Chung, Essential finite-size effect in the 2D XY model, [25] L. Li, T. E. Baker, S. R. White and K. Burke, Pure Phys. Rev. B 60, 11761 (1999); P. Olsson, Monte Carlo density functional for strong correlations and the ther- analysis of the two-dimensional XY model. II. Compari- modynamic limit from machine learning, Phys. Rev. B son with the Kosterlitz renormalization-group equations, 94, 245129 (2016). Phys. Rev. B 52, 4526 (1995). [26] D. Crawford, A. Levit, N. Ghadermarzy, J. Oberoi and P. [47] L. van der Maaten, Accelerating t-SNE using Tree-Based Ronagh, Reinforcement Learning Using Quantum Boltz- Algorithms. J Mach Learn Res. 15, 3221 (2014); L. mann Machines, arXiv:1612.05695. van der Maaten and G. Hinton, Visualizing Non-Metric [27] K. I. Aoki, T. Kobayashi, Restricted Boltzmann Ma- Similarities in Multiple Maps, Machine Learning 87, 33 chines for the Long Range Ising Models, Mod. Phys. Lett. (2012); L. van der Maaten and G. Hinton, Visualizing B 30, 1651401 (2016). High-Dimensional Data Using t-SNE, J Mach Learn Res. [28] C. Li, D. Tan and F. Jiang, Applications of neural 9, 2579 (2008). networks to the studies of phase transitions of two- [48] M. Abadi, A. Agarwal and P. Barham et al, Tensor- dimensional Potts models, arXiv:1703.02369. Flow: Large-Scale Machine Learning on Heterogeneous [29] L. Arsenault, A. Lopez-Bezanilla, O. Lilienfeld and A Distributed Systems, arXiv:1603.04467. Millis, Machine learning for many-body physics: The [49] M. Newman and R. Ziff, Efficient Monte Carlo algo- case of the Anderson impurity model Phys. Rev. B 90, rithm and high-precision results for percolation, Phys. 155136 (2014). Rev. Lett. 85, 4104 (2000). [30] G. Torlai, R. G. Melko, Learning Thermodynamics with [50] M. Levenshteˇin, B. Shklovskiˇi; M. Shur and A. Eˇfros, Boltzmann Machines, Phys. Rev. B 94, 165134 (2016). The relation between the critical exponents of percolation [31] X. Gao and L. Duan, Efficient Representation of Quan- theory, Zh. Eksp. Teor. Fiz. 69 386 (1975). tum Many-body States with Deep Neural Networks, Nat. [51] M. Sykes and and J. Essam, Exact critical percolation Commun.8, 662 (2017). probabilities for site and bond problems in two dimen- [32] E. Stoudenmire and D. Schwab, Supervised Learn- sions, J. Math. Phys. 5 1117 (1964). ing with Quantum-Inspired Tensor Networks, [52] A. Saberi, Recent advances in percolation theory and its arXiv:1605.05775. applications, Phys. Rep. 578, 1 (2015). [33] J. Chen, S. Cheng, H. Xie, L. Wang and T. Xiang, On [53] M. E. J. Newman and R. M. Ziff, Fast Monte Carlo al- the Equivalence of Restricted Boltzmann Machines and gorithm for site or bond percolation, Phys. Rev. E 64, Tensor Network States, arXiv:1701.04831. 016706 (2001). [34] L. Huang, L. Wang, Accelerate Monte Carlo Simulations [54] R. Swendsen, and J. Wang, Nonuniversal critical dynam- with Restricted Boltzmann Machines, Phys. Rev. B 95, ics in Monte Carlo simulations, Phys. Rev. Lett. 58, 86 035105 (2017); Can Boltzmann Machines Discover Clus- (1987). ter Updates, Phys. Rev. E 96, 051301 (2017) [55] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller [35] J. Liu, Y. Qi, Z. Meng, L. Fu, Self-Learning Monte Carlo and E. Teller. Equations of state calculations by fast com- Method, Phys. Rev. B 95, 041101 (2017). puting machines. J. Chem. Phys. 21 , 1087 (1953). [36] N. Portman and I. Tamblyn, Sampling algorithms for val- [56] Y. Hsieh, Y. J. Kao and A. W. Sandvik. Finite-size scal- idation of supervised learning models for Ising-like sys- ing method for the Berezinskii-Kosterlitz-Thouless tran- tems, J. Comput. Phys.350, 871 (2017). sition, J Stat. Mech-Theory E. (09), P09001 (2013). [37] P. Mehta and D. Schwab An exact mapping between the [57] B. Nienhuis, Exact Critical Point and Critical Exponents Variational Renormalization Group and Deep Learning, of O(n) Models in Two Dimensions, Phys. Rev. Lett. arXiv:1410.3831. 49, 1062 (1982); Y. Deng, T. Garoni, W. Guo, H. Bl¨ote [38] M. Beach, A. Golubeva and R. Melko, Machine learning and A. Sokal, Cluster simulations of loop models on two- vortices at the Kosterlitz-Thouless transition, Phys. Rev. dimensional lattices Phys. Rev. Lett. 98, 120601 (2007). B 97, 045207 (2018). [58] G. E. Hinton and S. T. Roweis, Stochasitic Neighbor [39] S. R. Broadbend and J. M. Hammersley, Percolation pro- Embedding. In Advances in Neural Information Process- cesses. I. Crystals and Mazes, Proc. Camb. Phil. Soc.53, ing Systems, volume 15, pages 833-840 (The MIT Press, 629 (1957). Cambridge, MA 2002). [40] J. M. Hammersley, in Percolation Structure and Process, [59] D. R. Nelson and J. M. Kosterlitz, Universal jump in the edited by G. Deutscher, R. Zallen and J. Adler(Adam superfluid density of two-dimensional superfluids, Phys. Hilger, Bristol, 1983). Rev. Lett. 39, 1201 (1977). [41] G. Grimmett, in Percolation (Springer-Verlag, New York, [60] P. Suchsland and S. Wessel, Parameter diagnostics of 1989). phases and phase transition learning by neural networks, [42] J. P. Hovi and A. Aharony, Scaling and universality in Phys. Rev. B 97, 174435 (2018). the spanning probability for percolation , Phys.Rev.E 53, [61] G. A. Canova, Y. Levin, and J. J. Arenzon, Competing 235 (1996). nematic interactions in a generalized XY model in two [43] M. Cristoforetti, G. Jurman, A. I. Nardelli and C. and three dimensions, Phys. Rev. E 94, 032140 (2016). Furlanello, Towards meaningful physics from generative [62] G. A. Canova, Y. Levin, and J. J. Arenzon, Kosterlitz- models, arXiv:1705.09524 Thouless and Potts transitions in a generalized XY [44] D. H¨ubscher and S. Wessel, Stiffness jump in the gener- model, Phys. Rev. E 89, 012126 (2014). alized XY model on the square lattice, Phys. Rev.E 87, [63] G. A. Canova, F. Poderoso, J. J. Arenzon and Y. Levin, 062112 (2013). Reply to “Incommensurate vortices and phase transitions [45] F. Poderoso, J. Arenzon and Y. Levin, New ordered in two-dimensional XY models with interaction having 16 auxiliary minima” by S. E. Korshunov, arXiv:1207.3447.