arXiv:1804.02709v2 [cond-mat.stat-mech] 8 Jun 2019 l,tesrnl orltdFrisseswr studied were exam- systems For Fermi correlated of strongly [11]. description the properties the ple, dynamical in networks and physics, stud- neural equilibrium classical been artificial also Beyond with have was ied problems [10]. method many-body useful quantum moving-window transi- be the a phase to just shown similarly the of also predict and Instead to confu- [9], invented called [8]. tions was scheme e.g. learning method see a sion transition, used; the been learning also have ensem- em- neighboring ble, trees stochastic 7]. random t-distributed [3–5, as and such bedding network) mainly methods, neural model unsupervised artificial Other and XY (an method the (PCA) autoencoder and analysis the component model principal the Ising learning also by the was machine learning to unsupervised supervised applied addition, by In variant [2]. model methods Ising gauge thermo- classical the its the of study and of the transitions in phase used was dynamical it matter, of phases learning [6]. machine data) unsupervised unlabeled labeled of (II) ma- (clustering with and supervised The regression data) (I) or training into [2–5]. (classification divided physics learning roughly in chine are transitions phases tools of phase main context and the matter in of and particular, science have in of methods and, fields various ML engineering, to adopted au- that, recently even Besides been also and [1]. feeds, vehicles digital news as tonomous successfully vision, such been life, computer everyday has recognition, to intel- (ML) applied artificial increasingly learning and and Machine data big ligence. processing enabled has ogy mns h aletdvlpeti hsdrcinof direction this in development earliest the Amongst technol- and science computer in advancement Recent ahn erigo hs rniin nteproainan percolation the in transitions phase of learning Machine a elandadtedt olpeb sn h vrg outp average the using by neu collapse fully-connected data exponent a the critical in and layer learned hidden mo be one XY can just the using percolations, that bond find and tr site whose including lattices, dimensional properties, two the on models mechanical eue otepaedarmfrohrszs( sizes other netw for trained diagram t The phase learn the works. to to indeed network used it the be that train demonstrate to and engin order rela model feature in a the spin orientations devise or is spin we angles network, the ∆ neural (either where convolutional configurations a values, spin in ∆ the w parameter trained using network particular Besides the two from at app diagram also axis phase we consistent and a the as phases, obtain such ferromagnetic phases, quasi-long-range richer contains the model and XY defects—vortices generalized topological The of unbinding and binding 1 olg fPyisadOteetois aya Universit Taiyuan Optoelectronics, and Physics of College nti ae,w pl ahn erigmtost td pha study to methods learning machine apply we paper, this In .INTRODUCTION I. 2 ..Yn nttt o hoeia hsc n Department and Physics Theoretical for Institute Yang C.N. tt nvriyo e oka tn ro,SoyBok Y1 NY Brook, Stony Brook, Stony at York New of University State ν eas td h eeisi-otrizTols trans Berezinskii-Kosterlitz-Thouless the study also We . azo Zhang, Wanzhou Dtd ue1,2019) 11, June (Dated: ,2 1, iy Liu, Jiayu L ′ × L bv h suspected the above nargo containing region a in nesadtertclyb apn tt h renormal- to the [37]. made to group it been ization mapping have auto by Attempts the theoretically understand ac- reduce [34–36]. to and helpful time sampling is correlation Carlo and Monte the 33], the in [32, celerate discussed [24– networks been tensor others also of has many context learning and machine The [21], 23] 31]. model [22, percolation models quantum [15–19], non-equilibrium been [20], phases also systems topological have as disorder systems limitation such Other beyond successfully, [14]. methods studied problems ML sign meth- with the self-learning even of [12], and networks [13], connected ods e.g. using in ntrergms[] i eo h suspected the below (i) [2]: regimes configura- phase three Carlo Monte in calculate prepare tions models. to to (GXY) learning needs one supervised XY boundaries, apply generalized to the order and percolations, In model bond and XY topo- site the or including non-local involve properties, statisti- transitions logical classical whose of types models, two cal study to methods learning o riig yapyn h rie loih ocon- to of algorithm vicinity trained the the in applying figurations by training, for rniin yesnilyaeaigoe oa pn [2]. spins local over phase averaging recognize essentially can by actually transitions fully- (FCN) optimized the network magnetization, connected the as such (OP) eter indeed high with network phases the distinct [2]. the that confidence and verify transition the to learn with used can testing, being and training latter both the all for used from are configurations regimes three transition, phase the predicting) rate ′ where , 1 eew ou nuigmsl uevsdmachine supervised mostly using on focus we Here hntemdludrsuyhsalclodrparam- order local a has study under model the When n z-he Wei Tzu-Chieh and T c niin nov o-oa rtopological or non-local involve ansitions nivrie,i h lsia Ymodel. XY classical the in anti-vortices, eigapoc sn h itgasof histograms the using approach eering oee,i h ups st er isedof (instead learn to is purpose the if However, . fTcnlg hni002,China 030024, Shanxi Technology of y e n h eeaie Ymdl We model. XY generalized the and del ymcielann ehdt t We it. to method learning machine ly opnns steiptinformation input the as components) etrepae ntegnrlzdXY generalized the in phases three he a ewr,teproaintransition percolation the network, ral L eai hs,teprmgei and paramagnetic the phase, nematic tlyrgvscretetmt fthe of estimate correct gives layer ut t nydt ln h temperature the along data only ith ′ r yuigsse size system using by ork 6= iewih fpr Ycoupling. XY pure of weight tive etastosi eti statistical certain in transitions se L fPyisadAstronomy, and Physics of ihu n ute training. further any without ) T 7434,USA 1794-3840, c T c n ii nitreit regime intermediate an (iii) and 2 to,wihinvolves which ition, h rttorgmsaeused are regimes two first The . Ymodels XY d T c n a ne naccu- an infer can one , L × L can T c (ii) , 2
For the phase transition characterized by the non-local a CNN and verify that both give successful learning of order parameters, such as the topological phase of the the BKT transition in the XY model. Additionally, we Ising gauge model [2] or the classical XY model [38], the find that using the histograms of the spin orientations convolutional neural network (CNN) is a better tool than also works for the XY model and can be applied effi- the FCN as it encodes spatial information. It was demon- ciently to larger system sizes Additionally, we find that strated in the classical 2d Ising gauge model that the op- using the histograms of the spin orientations also works timized CNN essentially uses violation of local energetic for the XY model and the training can be done efficiently constraints to distinguish the low-temperature from the for larger system sizes. To go beyond the XY model, we high-temperature phase [2]. find it interesting to apply the ML to the generalized XY In percolation, non-local information such as the wrap- (GXY) model [44, 45] as it contains more complex config- ping or spanning of a cluster is needed to characterize the urations such as half-vortices linked by strings (domain phases and the transition in between. Percolation is one walls) and an additional nematic phase. We find the use of the simplest statistical physical models that exhibits of spin configurations (either angles or spin components) a continuous phase transition [39–41], and the system is and histograms both works. The advantage of the latter characterized by a single parameter, the occupation prob- approach is that the training can be done for larger sys- ability p of a site or bond, instead of temperature. If a tem sizes and, moreover, trained network for one system spanning cluster exists in a randomly occupied lattice, size can be applied to other system sizes. then the configuration percolates [42]. Can the neural network be trained to recognize such nonlocal informa- tion and learn or even predict the correct critical point? If so, can it be used to reveal other properties of the con- tinuous transition, such as any of the critical exponents? This motivates us to study percolation using machine learning methods. We first use the unsupervised t-SNE method to charac- (a) (b) terize configurations randomly generated for various oc- cupation probability p and find that it gives clear separa- tion of configurations away from the percolation thresh- old pc. This indicates that other machine learning meth- ods such as supervised ones will likely work, which we also employ. We find that both the FCN and the CNN works for learning the percolation transition, and the networks trained with configurations labeled with information of whether they are generated with p>pc or p σ σ ns N−n system to form a nematic phase. For both ∆ = 0 and 1 Z = X ps (1 ps) s , (1) − the model reduces to the pure XY model (redefining qθ as {σ} θ in the first case), and hence the transition temperature is identical to that of the pure XY model. However, such where, e.g., on the square lattice with N = L L sites, ps σ × a redefinition is not possible with ∆ = 1. The phase is the probability of site occupation, ns is the numbers diagrams of the GXY models (4) depend6 on the integer of sites being occupied in the configuration labeled by σ. parameter q, and they have been explored extensively [44, In order to obtain the critical phase transition points by 45]. Monte-Carlo simulations, usually the wrapping probabil- ity R is defined [42] in the case of the periodic bound- ary condition. With open boundaries, a cluster growing III. PERCOLATIONS large enough could touch the two opposite boundaries and hence it is referred to a spanning cluster. The wrap- Even though we mainly use supervised learning meth- ping cluster is defined as a cluster that connects opposite ods, we will begin the study of percolation using an un- sides that would be in the otherwise open boundary con- supervised method, the t-distributed stochastic neighbor dition, as illustrated in Fig. 1 (a) and (b). A cluster embedding (t-SNE). We shall see that it can characterize forming along either the x or y direction can contribute configurations randomly generated in percolation to two to R. In ML method, we do not need to measure this distinct groups with high and low probabilities p of oc- observable directly and a naive labeling of each configu- cupation as well as a belt containing configurations gen- ration is given according to how it is generated according erated close to the percolation threshold p = p . This to the occupation probability p and p’s relation with the c gives us confidence to proceed with supervised methods critical value (the percolation threshold) p , i.e., whether c such as FCN and CNN. p>pc (say labeled as ‘1’) or p 1 1 (A) (a) the square lattice and on the triangular lattice, respec- tively, both with L = 16. The behavior is similar to that of site percolation. The upshot is that the t-SNE method can characterize percolation configurations into different phases and near the transition. In order to obtain the transition point p , one can probably divide the belt into 0 0 c 1 1 two halves and the probability value p at the cut can be (B) (b) used as an estimation of the percolation threshold. But we do not do that here, as we will use supervised learning below to learn the transition more accurately. 0 0 B. Learning percolation by FCN 1 1 (C ) ( c ) Neuron 0 0 1 2 3 4 XW1 WY2 FIG. 2: (Color online) The t-SNE distribution of site perco- 1 lation on the square lattice with L = 32 for (A) one step of 2 1 iteration, and (B) 2000 steps of iterations. (C) The result for 3 0 bond percolation on the square lattice with size L = 16 after 4 output layer 1100 iterations. The t-SNE distribution of site percolation on Input input layer the triangular lattice with L = 32 for (a) only one step of it- hidden layer eration and (b) 2000 steps of iteration. (c) The result for the bond percolation on triangular lattice at L = 16 after 1100 iterations. Note that in (A), (a), (B) and (b), we only use two colors pink and yellow, but in (C) and (c) the samples are colored according to the probability p of occupation, from pink to yellow. FIG. 3: (Color online) Architecture of the fully connected neural network used to obtain pc for the site percolation t-SNE procedure, where each configuration x contains model. The input is a set of = configurations in a two- 32 32 = 1024 elements 0 or 1. Fig. 2 (A) is the dis- × dimensional lattices with periodic lattice with size L =4. A tribution obtained after only one step of iteration in the percolating configuration is illustrated, where the blue squares t-SNE method. Clearly, after the first iteration, the data are the occupied sites while the empty squares are not occu- for both labels are still mixed together and there is no pied. separation into distinct groups. However, after 2000 it- erations, as shown in Fig. 2 (B), the data converges into In Fig. 3, we show the structure of the FCN (which we three distinct groups, with two concentrated clusters and implemented with the TensorFlow library [48]), which a wide ‘belt’. The concentrated cluster with yellow solid consists of one input layer, one hidden layer and one out- circles indicates data generated from non-percolating (or put layer of neurons. The network between two neighbor- subcritical) phase, i.e. p (or the number of spins in the Ising model), or the total or (ii) the average cross-entropy between such pairs, number of bonds in the bond percolation. The values of 1 the neurons are denoted as the input vector x. CE X X yT j log yj + (1 yT j ) log(1 yj ) . ≡−N − − The hidden layer is to transform the inputs into some- x j thing that the output layer can use and one can employ (8) as many such hidden layers (but we will focus on just An additional term called regularization, such as 2 one hidden layer in FCN). It determines the mapping λ/(2N) i Wi , is introduced to the cost function in P | | relationships. For example, as illustrated in Fig. 3, we order to prevent overfitting. The optimization is done denote W1 as the weight matrix from the input to the with stochastic gradient descent using the TensorFlow hidden layer and b1 the bias vector for the hidden layer. library. The number of neurons in the hidden layers generally is Once the network is optimized after the training stage, chosen to be approximately of the same order as the size we use as input different and independently generated of the input layer or less. Assume the activation function configurations with possibly different sets of p values, and for the neurons in the hidden layer is f. Then the neu- use the average values of the output layer y (sometimes rons in the hidden layer will output yH = f(x W1 +b1). referred to as the average output layer) from the network · Denote W2 as the weight matrix from the hidden to the to estimate the transition. This is referred to as the test output layer and b2 the bias vector for the output layer. stage. The two components in y are in the range [0, 1], Assume the activation function for the neurons in the and the larger the value of the component (associated hidden layer is g. Then the neurons in the output layer with a neuron) gives the more probable prediction the will have states described by y = g(yH W2 + b2). One neuron makes. Usually we plot such average numbers choice usually used as the activation is the· so-called sig- for both output neurons (one is associated label 0 and moid function, the other 1). This results in two curves as a function of the probability p, as illustrated in Fig. 4. The value of 1 p at which the two curves cross is used as an estimate σ(z) , (5) ≡ 1+ e−z of the percolation threshold. Note that, as seen below in Sec. IV, when we encounter three or more phases to related to the Fermi-Dirac distribution in physics. An- identify, then the number of neurons in the output layer other is the so-called softmax function that, when applied will be accordingly three or more. to a vector with components zj , gives another vector with components. (A) (a) 1 L4 1 L8 zj 2 e L16 aj = , (6) , y 0.5 zk 1 L24 Pk e y L32 0 L48 0 0.562 0.593 0.625 0.465 0.5 0.535 and it is related to the Boltzmann weight in physics. Yet (B) p (b) p 0.65 0.54 another choice that has become popular recently is the c p rectifier, defined as fRec(z) = max(0,z), and one unit 0.593 0.5 that uses this activation function is called a rectified lin- 0.55 0.46 ear unit. In principle, one can employ as many hidden 0 0.1 0.2 0.3 0 0.1 0.2 0.3 1/L 1/L layers in the network. It is the universality of the neural (C) (c) network, i.e., it can approximate any given function, that 1 1 makes machine learning powerful. 2 , y 0.5 0.5 1 These weights W ’s and biases b’s need to be optimized y at the training stage. The inputs consist of pairs of x : 0 0 { -0.3 0.2 -0.1 0 0.1 0.2 0.3 -0.4 -0.2 0 0.2 0.4 yT , where each configuration is represented by a vector 1/v 1/v (p - p )L (p - p L x of} L L components of value being 1 (whether a site is c c) occupied)× or 0 (not occupied) and a corresponding label yT indicating whether the configuration x is generated FIG. 4: (Color online) The two values y1 and y2 of the output layer of site percolation on the square lattice (left columns) above or below the percolation threshold pc. This label can be described by by one single binary number, e.g. and triangular lattice (right columns) with the system sizes 1 representing p>p and 0 representing p In Fig. 4 (A), we show the two average values y1 1 2 and y in the output layer, obtained by testing the net- L2 X y(x) yT (x) , (7) 2 ≡ N x | − | work with site percolation configurations in the range of 6 tions according to whether p>pc or p 1 0.5 y L16 L24 transition point pc = 0.5 [51]. The data collapse and 0.2 0.2 0.48 0.5 0.52 0.33 0.347 0.36 the finite size scaling are shown in Figs. 4 (b) and (c), p p (B) (b) respectively. 0.515 0.375 Similarly, we use the FCN to study bond percolation, c p 0.5 0.347 and the results are shown in Figs. 5 (A) and (B) give the 0.485 0.315 average output layer for the square lattice (with sizes L = 0 0.1 0.2 0.3 0 0.1 0.2 0.3 1/L 1/L 8, 16, 24, 32), and finite size scaling (yielding the critical point at pc =0.5), respectively. Figs. 5 (a) and (b) show FIG. 5: (Color online) The two values y1 and y2 of the out- the results of bond percolation on the triangular lattices. put layer of the machine learning for bond percolation on The bond percolation threshold [51] is at 2 sin(π/18) the square lattice (left columns) and triangular lattice (right 0.347 and our result agrees with it. We remark that in≈ columns) with the system sizes L = 4, 8, 16, 24 using the fully the training, each input configuration has 2L L bond connected network. The first and the second rows show the variables for the square lattice and 3L L bond× variables average results in the output layer y1 and y2 and the finite for triangular lattices. × size scaling for the critical points, respectively. Site percolation on the Bethe lattice was studied an- alytically and exact transition was known, and thus it is interesting to apply the neural network. In Fig. 6 (a) 0.562 percolation theory [50]. During the stage of training, the labeling yT gives the (a) information whether a configuration is generated at the 2 K3 , y 0.5 K=1 432 1 K5 probability p greater or less than pc. The information y about whether the individual configuration is percolating or not is not known. However, one would expect that 0.48 0.5 0.52 p for the configurations generated sufficiently away from p = pc, the neural network is learning such a property. FIG. 6: (Color online) (a) Bethe lattice with coordination But close to pc even if a configuration is generated at number z = 3. The lattice sites are represented by solid circles p 1 0.5 0.5 y 12 14 1. CNN structure 0.2 16 0.2 0.55 0.593 0.65 0.48 0.52 (B) 4 (b)e 0.75 6 0.75 2 8 ,y 10 1 0.5 0.5 y 12 14 0.25 16 0.48 0.52 0.25 p 0.33 0.347 0.397 p FIG. 8: (Color online) The two values y1 and y2 of the output layer, using the CNN for site [(A)&(a)] and bond percolation convolution pooling fully connected layers [(B)&(b)] on the square (left) and triangular (right) lattices, respectively. FIG. 7: (Color online) Architecture of the fully connected neural network used to obtain pc for the site percolation model. The input is the configurations in a two-dimensional We repeat the same study of percolation on square and lattice with periodic boundary condition, illustrated with size triangular lattices with CNN. In the CNN structure, we L = 4. The the colored or shaded squares are the occupied use two combination layers (i.e., convolution+maxpool). sites while the white squares are not occupied. In Figs. 8 (A) and (B), we show the results of site perco- lation and bond percolation results on the square lattice, respectively. The critical point is converged to 0.593 and We first begin by discussing the structure of the CNN 0.5, respectively. During the training, for the site perco- shown in Fig. 7, where the input is a two-dimensional lation on e.g. the square lattice, the L L site occupation × array or an image. There is a filter with a small size configuration is used as input. For the bond percolation, such as 5 5 also called a local receptive field, that pro- the 2L L bond configuration is used. As expected, the × cesses information× of a small region. This same local CNN works well in learning the phase in percolation. For receptive field moves along the lattice to give a coarse- the site and bond percolation on the triangular lattices, grained version of the original 2D array or image. We the results are also shown in Figs. 8 (a) and (b). We note can move the filter not one lattice site but a few (which that some slight improvement in the learning of the tran- is usually called stride to the next region). We can also sition point can be made by careful choosing of p’s; see pad the outer regions with columns or rows of zeros so App. C. We also comment that since the Bethe lattices is as to maintain the same size of the filtered array, which not regular and we do not use CNN for the corresponding is referred to as padding. This results in a filtered or percolation problem. But it should be possible in princi- generally coarse-grained hidden layer. We can use many ple. Moreover, the FCN result of the percolation on the different local receptive fields to obtain many such lay- Bethe lattices is good enough. ers, usually referred to as kernels. Roughly speaking, the original image is converted to small ones with differ- ent features. For each kernel, a further processing called 3. Labeling by cluster identifying algorithm max-pooling is done on non-overlapping small patches, e.g., 2 2 regions, that further coarse grain the arrays. In this subsection, we would like to show a different Other pooling× methods can be used. One can repeat such ways of training the data using the non-local property convolution+pooling layers a few times, but we will use of whether the configuration is percolating or not as the one such combination layer in the percolation and two in label y. In the previous works for thermodynamic phase the later part of the XY and the GXY models. After this, transitions [2, 5, 9], for a configuration x at parameter there is a fully connected layer of neurons, as in the FCN. such as T , the corresponding label y is set to be 0 if This can be repeated a few more layers, but we will only T The results in Ref. [38] show that the detection of vortices does not necessarily result in the best classification ac- (a) (b) 1 1 curacy, especially for lattices of less than approximately L8 1000 spins. The advantage may show up for larger sizes, 2 L16 but the training becomes more costly. Here we limit our- , y , L32 1 L8 selves to smaller sizes for using spin orientations (or their y L48 L12 0 0 L20 components) in the training, but later introduce a differ- 0.520 0.593 0.520 0.593 ent approach in feature engineering that can be efficiently (c) p (d) p 1 0.85 applied to larger systems. In term of the two-dimensional L8 image for the CNN, when we use the spin components, L16 the L L sites need to be effectively doubled to L 2L L32 L8 × × L48 L12 that gives the same information of x. L20 0.5 0.6 In the following, we will focus on the pure XY model on 0.520 0.593 0.52 0.593 (e) p (f) p both the square and the honeycomb lattices and the GXY 1 1 model with q =2, 3 and q = 8 on the square lattices. 0.5 L16 0.5 L12 A. The pure XY model 0 0 0 0.593 1 0 0.593 1 p p (A) (a) 0.8 FIG. 9: (Color online) The two values y1 and y2 of the output 0.8 6 4 layer , using the cluster identifying method to label the config- 2 8 6 y 8 1, 0.5 12 0.5 uration, with (a) the FCN and (b) the CNN. The respective y 14 10 accuracy curves of (a) and (b) are shown in (c) with FCN 0.2 16 and in (d) with CNN. Labels {0, 1} for percolation or not are 0.75 0.8935 1 0.2 shown for randomly generated configurations with (e) L = 16 T 0.75 0.8935 1 (B) T and (f) L = 12. Notice the fluctuations near the transition. (b) 0.9 0.9 c T T =0.891(8) 0.8 Tc=0.89(1) 0.8 c to whether or not the configuration has a spanning or 0 0.1 0.2 0 0.1 0.2 wrapping cluster, instead of the relationship between oc- 1/L 1/L cupation probability p and pc. FIG. 10: (Color online) The two values y1 and y2 of the out- Using the new labeling scheme, we show the results put layer using the CNN for XY model by inputting (A){θi} in Figs. 9 (a) and (c) with sizes L = 8, 16, 32, 48 by us- with sizes L = 6, 8, 12, 14, 16 and (a) {sin θi, cos θi} with sizes ing FCN and (b) and (d) with sizes L = 8, 12, 20 by L = 4, 6, 8, 10 on the square lattices. (B) and (b) Finite-size using CNN. Here no information about whether p>pc scaling of the critical points; the results obtained are consis- or p x = (cosθ1, sinθ1, ..., cosθN , sinθN ), (9) Since the phase transition points are already known for the XY model on both lattices [46, 56, 57], i.e., Square Honeycomb or the spin orientations θi . The vorticity are defined Tc = 0.8935(1), and Tc = √2/2, respec- as the winding numbers,{ i.e.,} a collection 1 for vortices tively, it is thus interesting to see whether or not ma- and anti-vortices, but we shall not use those± as input, for chine learning could recognize the phase transition of XY the reason remarked earlier due to the results in Ref. [38]. model. Although the BKT transition of the XY model 9 (A) (a) 0.8 including those at Tc; see App. C and Fig. 20.) In the 4 4 next section we will study the GXY model to extend the 2 6 6 y 8 machine learning beyond the XY model. 1, 0.5 8 y 10 10 12 0.2 0.6 0.707 0.8 0.68 0.707 0.76 B. The generalized XY models (B) T (b) T 0.73 0.73 c 1. q=2 and q=3 T Tc=0.71+/-0.01 Tc=0.725+/-0.009 0.67 0.67 0 0.1 0.2 0 0.1 0.2 1/L 1/L FIG. 11: (Color online) The two values y1 and y2 of the out- (a) q=2 (b) q=3 put layer using the CNN for XY model by inputting (A){θi} P P with sizes L = 4, 6, 8, 10 and (a) {sin θi, cos θi} with sizes 0.8935 0.8935 L = 4, 6, 8, 10, 12 on the honeycomb lattices. (B) and (b) T N F T N Finite-size scaling of the critical points; the results obtained F 0.365 are consistent with Tc = 0.707 using both types of the inputs. 0 0 0 0.5 1 0 0.5 1 ∆ ∆ has been studied by machine learning methods [38], it is 4 4 our motivation to go beyond and study the GXY model. (c ) (d ) In Figs. 10 (A) and (a), we show the learning results ) / using the raw spin configurations (i.e. both spin direc- 2 2 tions θ and spin components sinθ , cosθ ) on the θ { i} { i i} P( square lattices with sizes L =4 to L = 16. The lines y1 & y2 correspond to the average values of the two output 0 0 0 1 2 0 1 2 neurons. Due to the fact that the performance of the θ / θ / network is lowest near the critical points, we use 25000 2 2 Monte-Carlo samples of the training data and of test data ( e ) ( f ) for each temperature T . In this way, we obtain results ) / with low standard deviations around the critical point in 1 1 θ the range 0.75