A Learning Algorithm with Emergent Scaling Behavior for Classifying Phase Transitions
N. Maskara,^{1,2,∗} M. Buchhold,^{1,3} M. Endres,^{1} and E. van Nieuwenburg^{1,4}
^{1}Division of Physics, Mathematics and Astronomy, California Institute of Technology, Pasadena, CA 91125, USA
^{2}Department of Physics, Harvard University, Cambridge, MA 02138, USA
^{3}Institute for Theoretical Physics, University of Cologne, Zülpicher Strasse 77, 50937 Cologne, Germany
^{4}Niels Bohr International Academy, Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100 Copenhagen, Denmark
(Dated: March 31, 2021)
arXiv:2103.15855v1 [cond-mat.stat-mech] 29 Mar 2021
∗ [email protected]

Machine learning-inspired techniques have emerged as a new paradigm for analysis of phase transitions in quantum matter. In this work, we introduce a supervised learning algorithm for studying critical phenomena from measurement data, which is based on iteratively training convolutional networks of increasing complexity, and test it on the transverse field Ising chain and q = 6 Potts model. At the continuous Ising transition, we identify scaling behavior in the classification accuracy, from which we infer a characteristic classification length scale. It displays a power-law divergence at the critical point, with a scaling exponent that matches with the diverging correlation length. Our algorithm correctly identifies the thermodynamic phase of the system and extracts scaling behavior from projective measurements, independently of the basis in which the measurements are performed. Furthermore, we show the classification length scale is absent for the q = 6 Potts model, which has a first order transition and thus lacks a divergent correlation length. The main intuition underlying our finding is that, for measurement patches of sizes smaller than the correlation length, the system appears to be at the critical point, and therefore the algorithm cannot identify the phase from which the data was drawn.

Introduction. – Machine learning techniques have emerged as a new tool for analyzing complex many-body systems [1, 2]. A particularly well-studied application of such techniques is that of the identification and classification of phase transitions directly from data, assuming little to no prior knowledge of the underlying physics [3–12]. Recent efforts have expanded such explorations to a diverse range of systems including disordered [13–15] and topologically ordered systems [16–20], as well as applications to experiments [21–23].

An often-voiced concern, however, is that machine learning methods appear as a black box and that it is difficult to trust neural network classification without traditional supporting evidence. For example, in the study of phase transitions, the phase boundary identified by a machine learning algorithm may be affected by short-distance correlations, which turn out to be irrelevant for the thermodynamic phase of the system [8]. Instead, learning algorithms should ideally focus on phase transition features which characterize the transition, such as power-law divergences near the critical point of a second order phase transition.

Figure 1. Conceptual illustration of our method for a 1D spin-chain. Snapshots near a second order phase transition reveal a characteristic length scale ξ_corr, over which spins are correlated, and which diverges at the critical point. The modules m^k are designed to only capture correlations in the data up to a certain length scale ℓ_k, and their outputs are aggregated into M^k. On length scales shorter than ξ_corr, the algorithm cannot make a firm distinction between the two phases. As the module size is increased, the prediction ⟨M^k⟩ is improved until ℓ_k ∼ ξ_corr, after which it quickly saturates.
In this paper, we develop a machine learning algorithm inspired by this fundamental feature of critical phenomena, i.e. the emergence of long-distance correlations and scale invariance. The algorithm systematically analyzes critical behavior near a suspected transition point, using only snapshot data, by varying the functional form of a neural network. Specifically, we restrict the architecture so the network can only access patches of the snapshot at a time, and then vary the largest patch size. The resulting architecture is similar to those in Refs. [10, 24–27]. We observe that, under these conditions, we can extract information about the spatial growth of correlations in the underlying data from the behavior of the classification probabilities.

Our main result is the identification of an emergent length scale, which we extract from classification data, and which displays scaling behavior. Physical arguments suggest that this classification length scale reflects the system's correlation length, which diverges at the critical point according to a power law with a universal exponent. We exemplify this on the one-dimensional (1D) transverse field Ising model, whose critical exponents are known, and compare the scaling of the physical correlation length with the classification length. We also consider a first order transition in the two-dimensional (2D) q = 6 Potts model, where, in line with expectations, no scaling of the classification length is observed.

The learning algorithm proves quite versatile, and neither requires prior knowledge of the structure of the order parameter nor of the measurement data. We demonstrate this by considering projective measurements in different measurement bases, with which the thermodynamic phase and its order parameter cannot be readily inferred from conventional two-point correlation functions. Nevertheless, the algorithm is capable of learning more complex correlations, and manages to distinguish the two phases with a high degree of accuracy. This is especially promising in light of studying phase transitions with non-local or hidden order, for which algorithms based on purely local structures hardly gain access.

The Algorithm. – We start by introducing a supervised learning algorithm that allows one to systematically add complexity to a machine learning model. The model is composed of a set of independent computational units, termed modules. The algorithm takes and trains modules iteratively, and, by design, each new module learns correlations in the data that the prior modules did not capture. Conceptually, complexity is added by increasing the amount of correlations representable by the model in each step. Here, we are interested in scaling behaviour near critical points, so each subsequent module is designed to capture spatial correlations at a larger length scale.

Each module m^i : ~x → R, labeled with an index i, takes as input a snapshot (projective measurement) ~x and maps it to a scalar. Then, we apply an aggregation function M^k that aggregates the outputs of the first k modules m^1, ..., m^k. In practice, both the modules and the aggregation function are implemented through a neural network. The modules plus the aggregation function constitute our combined machine learning model M^k(m^1, ..., m^k) : ~x → R, mapping an input ~x to a corresponding classification target y. This model is then trained to minimize a classification loss function L using the iterative training algorithm described in pseudo-code in Algorithm 1:

Algorithm 1: Iterative Training Algorithm
  input : Sequence of modules m^1, ..., m^l
  input : Aggregate models M^k(m^1, ..., m^k), k ≤ l
  input : Labeled dataset {(~x_i, y_i)}
  input : Loss function L
  Result: Trained set of models {M^k}, k = 1, ..., l
  for k = 1, ..., l do
      train M^k on dataset {(~x_i, y_i)} by minimizing L

Crucially, during the k-th step of training, the variational parameters (weights) of all the modules m^j, j < k are frozen, and hence m^k is trained only on the residuals ỹ_i of all the prior models. These residuals are the leftover errors from the previous training step, and each subsequent module only learns features which prior modules did not capture. In our actual implementation, we use a linear classifier for the aggregation function and the binary cross-entropy for the loss function (see below). As a result, our modules are not exclusively trained on residuals, but the intuitive picture of subtracting prior features from the cost function still approximately holds.
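For concreteness, a minimal sketch of the training loop of Algorithm 1 could look as follows. The paper does not specify an implementation; the use of PyTorch, the optimizer, the epoch count, and the function and argument names below are illustrative assumptions. The factory `make_aggregate` builds the combined model M^k from the first k modules; an example implementation is sketched in the next code block.

```python
# Illustrative sketch of Algorithm 1 (iterative training with frozen prior modules).
# Framework (PyTorch), optimizer, and hyperparameters are assumptions, not the paper's code.
import torch
import torch.nn as nn

def train_iteratively(modules, make_aggregate, loader, n_epochs=10, lr=1e-3):
    """Train M^1, ..., M^l in sequence; at step k the modules m^j, j < k stay frozen."""
    loss_fn = nn.BCELoss()                        # binary cross-entropy loss L
    trained = []
    for k in range(1, len(modules) + 1):
        # Freeze the variational parameters (weights) of all prior modules m^j, j < k.
        for m in modules[: k - 1]:
            for p in m.parameters():
                p.requires_grad_(False)
        model = make_aggregate(modules[:k])       # M^k = LC(m^1(x), ..., m^k(x))
        opt = torch.optim.Adam(
            [p for p in model.parameters() if p.requires_grad], lr=lr)
        for _ in range(n_epochs):
            for x, y in loader:                   # snapshots x_i and float labels y_i in {0, 1}
                opt.zero_grad()
                loss = loss_fn(model(x), y)
                loss.backward()
                opt.step()
        trained.append(model)
    return trained                                # one trained aggregate model M^k per step
```

Because the linear classifier of each M^k is re-trained together with the newly added module m^k, this sketch reproduces the behavior described above: the modules are not trained exclusively on residuals, but earlier features are effectively subtracted from the cost.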
The Network. – In this section, we provide more details about our choice of neural networks. For the modules, we develop a class of convolutional neural networks designed to probe the spatial locality of correlations. Each module m^k takes the functional form

    m^k(~x) = (1/N) Σ_j m_0^k(~x_j^{ℓ_k}),    (2)

where m_0^k is a two-layer neural network that acts on a subset ~x_j^{ℓ_k} of the data ~x. The label ℓ_k indicates the size of a spatial region corresponding to the subset (e.g., ℓ_k adjacent sites in a one-dimensional lattice) and the index j enumerates all N regions of size ℓ_k.

The aggregation function we choose is a linear classifier LC, acting on the module outputs m^k(~x):

    M^K(~x) = LC(m^1(~x), m^2(~x), ..., m^K(~x)).    (3)

The linear classifier is defined by LC({m^i(~x)}) = σ(Σ_i w_i m^i(~x) − b), where σ is the sigmoid (or logistic) function σ(z) = 1/(1 + e^{−z}), and the w_i and b are free parameters. The non-linearity σ maps the linear combination of module outputs to a value in [0, 1], as expected for binary classification. The loss function we use is the binary cross-entropy, L(y_i, M^k(x_i)) = −⟨y_i log M^k(x_i) + (1 − y_i) log(1 − M^k(x_i))⟩, where y_i ∈ {0, 1} is the target label and the expectation value is taken over the training dataset.

Choosing a linear classifier ensures that the network cannot capture additional, spurious correlations between modules. For example, for a linear classifier, the first module that extracts information on a 5-site region will be the module m^5(~x). Instead, a non-linear classifier containing, e.g., terms quadratic in the arguments would also include non-local products of the form m^2(~x)m^3(~x).
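A minimal sketch of Eqs. (2) and (3) for a 1D spin chain is given below, again assuming PyTorch; the class names, the hidden-layer width, and the choice of a sliding-window (unfold) implementation are our own illustrative choices rather than the paper's code. `PatchModule` averages a two-layer network m_0^k over all contiguous ℓ_k-site patches, and `AggregateModel` applies the sigmoid-activated linear classifier to the module outputs.

```python
# Illustrative sketch of a patch module (Eq. 2) and the linear-classifier aggregate (Eq. 3).
# Input x: float tensor of shape (batch, L) holding one spin snapshot per row, L >= patch_size.
import torch
import torch.nn as nn

class PatchModule(nn.Module):
    """m^k(x) = (1/N) * sum_j m_0^k(x_j^{l_k}): a two-layer network applied to every
    contiguous patch of l_k sites, with the N patch outputs averaged."""
    def __init__(self, patch_size, hidden=16):
        super().__init__()
        self.patch_size = patch_size
        self.net = nn.Sequential(nn.Linear(patch_size, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))
    def forward(self, x):
        patches = x.unfold(1, self.patch_size, 1)          # (batch, N, patch_size)
        return self.net(patches).mean(dim=1).squeeze(-1)   # (batch,)

class AggregateModel(nn.Module):
    """M^K(x) = sigma(sum_i w_i m^i(x) - b): linear classifier over module outputs.
    nn.Linear uses +bias internally; the sign is just a reparametrization of b."""
    def __init__(self, modules_list):
        super().__init__()
        self.mods = nn.ModuleList(modules_list)
        self.lc = nn.Linear(len(modules_list), 1)           # weights w_i and bias
    def forward(self, x):
        feats = torch.stack([m(x) for m in self.mods], dim=1)   # (batch, K)
        return torch.sigmoid(self.lc(feats)).squeeze(-1)        # value in [0, 1]
```

With an increasing sequence of patch sizes ℓ_1 < ℓ_2 < ..., e.g. `train_iteratively([PatchModule(l) for l in (2, 3, 5, 8)], AggregateModel, loader)` combines these pieces into the iterative scheme of Algorithm 1 under the stated assumptions.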