
Emergence of a finite-size-scaling function in the supervised learning of the Ising phase transition

Dongkyu Kim and Dong-Hee Kim

Department of Physics and Photon Science, Gwangju Institute of Science and Technology, Gwangju 61005, Republic of Korea

E-mail: [email protected]

Abstract. We investigate the connection between the supervised learning of the binary phase classification in the ferromagnetic Ising model and the standard finite-size-scaling theory of the second-order phase transition. Proposing a minimal one-free-parameter neural network model, we analytically formulate the supervised learning problem for the canonical ensemble being used as a training data set. We show that just one free parameter is sufficient to describe the data-driven emergence of the universal finite-size-scaling function in the network output that is observed in a large neural network, theoretically validating its critical point prediction for unseen test data from different underlying lattices yet in the same universality class of the Ising criticality. We also numerically demonstrate the interpretation with the proposed one-parameter model by providing an example of finding a critical point with the learning of the Landau mean-field free energy being applied to the real data set from the uncorrelated random scale-free graph with a large degree exponent.

1. Introduction

Understanding how an artificial neural network learns the state of matter is an intriguing subject for the applications of machine learning to various domains including the study of phase transitions in physical systems [1–5]. In a typical form of the multilayer perceptron, a neural network consists of layers of neurons that are connected through a feedforward network structure. The network can in principle produce a mathematical function approximating any desired outputs for given inputs [6–9], and one can optimize the associated neural network parameters for a particular purpose in a data-driven way. In the supervised learning for the classification of data, which we particularly focus on in this study, the network is optimized to reproduce the labels of the already classified training input data. Remarkably, the neural network trained in a data-driven way often produces predictions with reasonable accuracy even for unacquainted data of a similar type, which is not necessarily from the same data set or system given in the training. With various machine learning schemes being examined, the phase classification and the detection of a phase transition point have been extensively studied in classical and quantum model systems in recent years [10–50].

Because the approach is data-driven rather than based on first principles, witnessing the empirical successes naturally leads to fundamental questions such as what specific information the neural network learns from the training data, to what extent and why it works even for unacquainted data or systems, how trustworthy such data-driven prediction can be, and most importantly, what is the mathematical foundation of the learnability. A general difficulty in addressing these questions is due to the nature of the “black box” model, where one can hardly see inside because of the high complexity generated by the interplay between a large number of neural network components. While the opaque nature may not harm its empirical usefulness, especially when it works as a recommender, transparency can be crucial in applications requiring extreme reliability, where one wants logical justification of how the network reaches its predictions. Explainable machine learning to deal with issues along this direction has attracted much attention in domains of scientific applications [51]. In the machine-learning detection of phase transitions and critical phenomena, there are increasing efforts in interpreting how the machine prediction works or in designing transparent machines, as demonstrated in several previous studies [19–21, 32–38, 47].

Our goal in this paper is to interpret the predicting power of a neural network classifying the phases of the Ising model into the conventional physics language of the critical phenomena by proposing an analytically solvable model network having just one free parameter. The Ising model has been employed as a popular test bed of machine learning and is particularly useful for our purpose of discussing the learnability since it is a well-established model of the second-order phase transition in statistical physics. Our work is closely related to the seminal work by Carrasquilla and Melko [10] where a network with a single hidden layer of 100 neurons was trained with the pre-assigned phase labels of the Ising spin data that were given according to whether the data were sampled below or above the known critical point of the training system.
It turned out that the one trained for the square lattices was reusable for the unseen data from the triangular lattices without any cost of a new training, providing a good estimate of a critical point with a finite-size-scaling behavior. In our previous work [21], we investigated this reusability by downsizing the neural network. We found that the hidden layer could be as small as the one with just two neurons without loss of the prediction accuracy. In the downsized network model, we argued that its reusability for the systems in the different lattices is encoded in the system-size scaling behavior of the network parameters, which is universal for any other lattices in the same universality class.

In this paper, we further simplify the neural network model into a minimal setting with just one free parameter, providing a more transparent mathematical view on how the learning and prediction of the critical point occur with the data of the Ising model. Despite the minimal design, we find that a single parameter is all that is necessary to capture the behavior observed in a large neural network that plays an essential role in the prediction accuracy and the reusability acquired from the training. The present one-parameter model improves the idea of the previous neural network models [10, 21] in terms of transparency and analytical interpretability. In our previous two-node model [21] that needs two free parameters, the fluctuations of the order parameter were ignored for the convenience of analytic treatment, which we find to be important and which is now fully incorporated into the present derivation with the one-parameter model. On the other hand, the three-node model proposed previously by Carrasquilla and Melko [10] also had one free parameter but unfortunately was not analytically explored further. While the third hidden neuron is unnecessary in our model, our derivations and discussions can be directly applied to the previous three-node model because of the similarity between their functional forms of the output.

Analytically minimizing the cross entropy for the supervised learning with the canonical ensemble at an arbitrarily large system size, we show that the trained network output becomes a universal scaling function of the order parameter with the standard critical exponents. This emergence of the scaling function is consistent with the empirical observation in a large neural network, and we find that it works as a universal kernel for the prediction with unseen test data from different lattices but belonging to the same Ising universality class. We demonstrate the operation of the one-parameter model by presenting the learning with the Landau mean-field free energy and its prediction accuracy of the critical point with the data from the uncorrelated random scale-free graph that belongs to the mean-field class.

This paper is organized as follows. In Sec. 2, the procedures of the supervised learning are described. In Sec. 3, the implications of the scaling form emerging in the network output are discussed. In Sec. 4, the one-parameter neural network is presented with the derivation of the analytic scaling solution. The demonstration with the Landau mean-field free energy and the application to the data of the Ising model on the random scale-free graph is given in Sec. 5. The summary and conclusions are given in Sec. 6.

2. Supervised learning of the phase transition in the Ising model

We consider the classical ferromagnetic spin-1/2 Ising model with the nearest-neighbor exchange interactions without a magnetic field, which is described by the Hamiltonian H = −J Σ_{⟨i,j⟩} si sj, where the spin si at a site i takes the value of either 1 or −1, and the summation runs over all the nearest-neighbor sites ⟨i, j⟩ in the given lattices. The interaction strength J and the Boltzmann constant kB are set to be unity throughout this paper. The spin configuration s ≡ {s1, s2, . . . , sN} is given as an input to train the neural network, which is labeled as the ordered or disordered phase depending on whether the temperature associated with the data is lower or higher than the critical temperature Tc given for the supervision. The learning with the labeled data for the binary classification can be done by minimizing the cross entropy [56, 57],

L(x) = − Σ_s [Q(s) ln F(s; x) + (1 − Q(s)) ln(1 − F(s; x))] ,   (1)

with respect to the neural network parameters x. The function Q(s) returns the binary value 0 or 1 representing the label of the data s. The function F(s; x) is the output of the neural network, giving a value between 0 and 1 for an input s. The parameter x is to be optimized to maximize the likelihood between the distribution of the output F and the given distribution of the actual label Q.

We prepare the data set of spin configurations at a given temperature by assuming an unbiased sampling with the Boltzmann probability in the canonical ensemble. The unbiased sampling is important to the mechanism of predicting a correct Tc with the trained network. While our main results are obtained from the analytic calculation of the cross entropy minimization based on our one-parameter model of the neural network, we also need the numerically generated data for the verification with real lattice geometries. Depending on the necessities in the numerical demonstration, we employ the Wang-Landau sampling method [52–54] computing the joint density of states or the Wolff cluster update [55] generating the spin configuration data.

The prediction of a critical point is done based on how the network output behaves with the temperature associated with the test data given as an input. However, it comes with practical ambiguity arising from the fact that the value of the output fluctuates severely across the inputs at the temperatures near the critical point. While one might consider a smooth curve of an average ⟨F⟩ evaluated over many test inputs at a given temperature, one would still need a criterion or threshold to discriminate the ordered and disordered phases. There are previously suggested ways to obtain the location of a transition point, such as the scheme of the learning by confusion [11]. In the simplest case, where the training data thoroughly cover very fine grids of temperature across the transition as we consider here, one can just pick a certain cut such as ⟨F⟩ = 1/2 used in Ref. [10] to get the estimate of a transition temperature. Interestingly, the temperature corresponding to the cut showed a finite-size-scaling behavior with various sizes of the systems being examined [10], and it turned out that a specific value of the cut does not matter in the finite-size-scaling analysis [21].
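For concreteness, the minimization of Eq. (1) can be set up numerically along the following lines. This is a minimal sketch assuming NumPy and an already labeled set of Monte Carlo spin configurations; the function names and the particular choice of a logistic output acting on |m| are only illustrative placeholders for the generic F(s; x) above.

import numpy as np

def cross_entropy(x, samples, labels, output):
    """Cross entropy of Eq. (1) for a binary phase classifier.

    x       : neural-network parameter(s) to be optimized
    samples : spin configurations, shape (n_samples, N), entries +/-1
    labels  : Q(s) = 0 (ordered) or 1 (disordered) for each configuration
    output  : callable F(samples, x) returning values between 0 and 1
    """
    F = np.clip(output(samples, x), 1e-12, 1.0 - 1e-12)   # keep the logarithms finite
    return -np.sum(labels * np.log(F) + (1.0 - labels) * np.log(1.0 - F))

def logistic_of_m(samples, x, a=50.0):
    """Illustrative output: a logistic function of |m| with a threshold parameter x."""
    m = np.abs(samples.mean(axis=1))             # order parameter of each configuration
    return 1.0 / (1.0 + np.exp(a * (m - x)))     # close to 1 for |m| < x, close to 0 above

Minimizing cross_entropy over x with any standard optimizer then realizes the training described above.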


Figure 1. Neural network models for the phase classification in the Ising model. The output F of the trained network is plotted as a function of the order parameter m computed for every individual input data of the spin configuration. (a) The network model with a single hidden layer of many neurons [10], which is trained here with 50 hidden neurons (see Ref. [21] for the details of the data preparation). The marker and error bar indicate the average and range of the output at each m, respectively. (b) The previous two-node model trained with the Wang-Landau data [21]. (c) A schematic diagram of the structure producing the universal scaling function. Minimal models of the gray box in (c) are sketched in (d) and (e) with the sigmoid and Heaviside step activation functions, respectively.

3. Emergence of the universal scaling function in the network output

Revisiting the behavior of the previous large-size neural network trained with many hidden neurons [10], we observe a particular scaling form emerging in the network output when it is plotted as a function of the order parameter m = Σ_i si/N for every individual data s. Figure 1(a) shows the training results with the data in the square lattices of N = L × L sites, and it turns out that these input-output curves for various system sizes fall onto a common curve in the scaling of |m|L^{β/ν} with the critical exponents β and ν of the Ising universality class in two dimensions. Our previous two-node model [21] shows the same feature in the network output as shown in Fig. 1(b).

The emergence of the scaling function F∗(|m|L^{β/ν}) in the network output reveals a simple explanation of how it finds a genuine critical point and why it works even for unseen test data generated in different lattices yet in the same Ising universality class of the critical exponents. For instance, two different trainings on the square and triangular lattices would give neural networks with exactly the same scaling form of the output if it is plotted as a function of m.

With the test inputs of s being sampled from the probability distribution pL(s, T) at a temperature T in the system of size L, the averaged network output is written as ⟨FL⟩ = ∫ pL(s, T) FL(s) ds. If the test data set is prepared unbiasedly from the canonical ensemble, then the test data distribution can be expressed by the finite-size-scaling form [58–60] of pL(s, T) ≡ L^{β/ν} p∗(|m|L^{β/ν}, tL^{1/ν}) near the critical point Tc, where t ≡ T/Tc − 1 denotes the reduced temperature. Consequently, going across the critical point in the temperature axis, the averaged network output is finally rewritten as

⟨FL⟩ = ∫ dm L^{β/ν} p∗(|m|L^{β/ν}, tL^{1/ν}) F∗(|m|L^{β/ν}) ≡ G∗(tL^{1/ν}) ,   (2)

which is exactly what was observed numerically in the previous work [10]. The function G∗(tL^{1/ν}) immediately indicates that the crossing point at t = 0 between the curves of different L's gives an exact critical point Tc that is associated with p∗ of the test data set. In Ref. [10], the temperatures corresponding to ⟨FL⟩ = 1/2 were extrapolated toward an infinite L to predict Tc. Equation (2) explicitly shows that this is equivalent to following a constant G∗(tL^{1/ν}), which leads to the line of TL = Tc + aL^{−1/ν}, where the specific value of 1/2 does not play any role. Therefore, the key feature that guarantees the physically meaningful Tc prediction is whether or not the neural network properly approximates the scaling form F∗ of the output function. While the numerical training can be affected in practice by the details of the data preparation such as the temperature grid spacing, the accuracy of the F∗ form determines the quality of the training in this particular learning problem with the data of the Ising model.
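In practice, the extrapolation described above amounts to reading off the temperature of a fixed cut for each L and fitting against L^{−1/ν}. The following is a minimal sketch, assuming NumPy, that ⟨FL⟩ has been measured on a fine temperature grid for each size, and that it rises monotonically through the transition; the function names and the fitting choice are ours.

import numpy as np

def t_at_cut(temps, avg_F, cut=0.5):
    """Temperature at which <F_L> crosses a chosen cut (assumes <F_L> increasing in T)."""
    return np.interp(cut, avg_F, temps)

def extrapolate_tc(sizes, temps_by_L, avg_F_by_L, nu=1.0, cut=0.5):
    """Fit T_L = Tc + a * L**(-1/nu), i.e. follow a constant G*(t L^{1/nu}) to L -> infinity."""
    TL = np.array([t_at_cut(temps_by_L[L], avg_F_by_L[L], cut) for L in sizes])
    x = np.asarray(sizes, dtype=float) ** (-1.0 / nu)
    slope, Tc = np.polyfit(x, TL, 1)       # the intercept at x = 0 is the Tc estimate
    return Tc

In line with Eq. (2), changing the value of the cut shifts the slope of the fit but not its intercept.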

We emphasize that F∗ and p∗, the two constituents of Eq. (2), correspond to the two different data sets of the training and testing systems, respectively. Because the validity of Eq. (2) requires the same critical exponents for both of the training and testing data sets, this condition precisely defines the limit of the applicability, indicating that the prediction works for unseen data only in a particular group of systems of the same criticality. For instance, as previously demonstrated in Ref. [21], the neural network trained for the Ising model in the square lattices fails to predict a correct transition point for the data of the three-dimensional lattices. If the test and training systems are not in the same universality class of the Ising critical exponents, a crossing point between the ⟨FL⟩ curves cannot be properly identified, and the extrapolation of TL at a fixed ⟨FL⟩ gives a different Tc depending on a choice of the value of ⟨FL⟩.

An important question is then how the neural network comes to approximate the universal kernel F∗(|m|L^{β/ν}) in the supervised learning of the binary phase labels. The functional form suggests that the network is trained to read the order parameter from the input, which is consistent with the previous observations [10, 18, 21]. Thus, we may consider a picture that is schematically shown in Fig. 1(c), where m is assumed to be transmitted with the trivial link weights 1/N from the input to the hidden layer that belongs to the gray box. In the following section, we present a minimal neural network model of the gray box to show how the critical behavior of the training data leads to such a scaling form of the output function.

4. One-free-parameter neural network model

4.1. Previous two-node model and further simplification

In our previous work [21], we introduced a model network with the downsized hidden layer of just two hidden neurons receiving the explicit order parameter and demonstrated that it did not lose any predicting power and accuracy. In Fig. 1(b), we verify that it

indeed produces the expected form F∗ of its output, explaining the high accuracy of the critical point predicted by this downsized network in the previous work. However, despite the simple structure that allows analytic treatment to some extent, our previous two-node model is not mathematically transparent enough to see how the exact form of F∗(|m|L^{β/ν}) emerges from the learning. The technical difficulties in our previous analytic approach stem from the use of the sigmoid activation function assigned to both of the hidden and output neurons. While this is a common setting for a usual large-size neural network to be trained for a binary classifier, it leads to an output function that can be written as

FL(m) = f[4ΛL f(m − µL) + 4ΛL f(−m − µL) − 2ΛL] ,   (3)

where f(x) = (1 + tanh(x/2))/2 is the sigmoid function, which is plotted in Fig. 1(b) with the parameters trained in the square lattices. We previously derived the system-size scaling behavior of the two neural parameters as µL ∼ L^{−2β/ν} and ΛL ∼ L^{2β/ν} by ignoring the order parameter fluctuation in the input data set, which was a crude assumption since the fluctuations of m are severe near the critical point. With careful approximations with expansions for m around FL = 1/2, the form of F∗(|m|L^{β/ν}) might be justifiable, but a simpler and more intuitive model would certainly be preferred to provide a more transparent picture of the learning process. Thus, we present a minimally simple one-parameter model of the gray box in Fig. 1(c) generating a very simple step-wise output function,

F(m; ε) = Θ(m + ε) − Θ(m − ε) ,   (4)

where Θ(x) is the Heaviside step function. The corresponding neural network structure is sketched in Fig. 1(e). The one with the sigmoid neuron given in Fig. 1(d) is its differentiable version, which is equivalent to the Heaviside one in the limit of large a and b with ε ≡ b/a being finite. Note that the output neuron simply reduces the signal from the hidden layer without any activation function and bias being involved. It is trivial to see that the desired form of F∗(|m|L^{β/ν}) would appear if the free parameter is given as εL ∝ L^{−β/ν}, which we show below indeed occurs in the supervised learning with the cross entropy minimization.
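The equivalence between the two versions of the output function is easy to check numerically. The sketch below (our illustration, with an arbitrarily chosen a and ε = b/a = 0.1) writes out the Heaviside form of Eq. (4) and the differentiable form of Fig. 1(d), given explicitly in Eq. (6) below, and confirms that they coincide once a is large.

import numpy as np

def F_step(m, eps):
    """Heaviside version, Eq. (4): equals 1 for |m| < eps and 0 for |m| > eps."""
    return np.heaviside(m + eps, 0.5) - np.heaviside(m - eps, 0.5)

def F_sigmoid(m, a, b):
    """Differentiable version of Fig. 1(d); depends only on eps = b/a when a is large."""
    eps = b / a
    return 0.5 * (np.tanh(a * (m + eps) / 2.0) - np.tanh(a * (m - eps) / 2.0))

m = np.linspace(-1.0, 1.0, 2001)
diff = np.abs(F_sigmoid(m, a=1e5, b=1e4) - F_step(m, eps=0.1))
away = np.abs(np.abs(m) - 0.1) > 2e-3     # exclude the jump points |m| = eps themselves
print(diff[away].max())                    # essentially zero in the large-a limit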

4.2. Scaling solution of the free parameter

The binary phase label of the data is expressed as Q(T ) ≡ Θ(T − Tc) with the critical

point Tc being given for the supervision. The training data set is represented by the probability distribution function pL(m, T) of the order parameter m at a temperature

T in the system of size L. The temperature range of the data set can be given as

T ∈ [Tl, Th] where Tl ≪ Tc ≪ Th. The cross entropy is then rewritten as

L(ε) = − ∫_{Tl}^{Th} dT ∫_{−∞}^{∞} dm pL(m, T) [Q(T) ln F(m, ε) + [1 − Q(T)] ln[1 − F(m, ε)]] .   (5)

For the mathematical convenience, we first employ a differentiable version of the output function F shown in Fig. 1(d) that is written as

F(m, ε ≡ b/a) = (1/2)[tanh(a(m + ε)/2) − tanh(a(m − ε)/2)] .   (6)

While the two parameters a and b appear in this expression, it is effectively a one-parameter model because the output function essentially depends on the ratio ε ≡ b/a in the limit of large a and b that we assume. Taking the derivative of L with respect to ε in the limit of large a, we obtain an integral equation,

∫_{Tc}^{Th} dT ∫_{ε}^{∞} dm pL(m, T) = ∫_{Tl}^{Tc} dT ∫_{0}^{ε} dm pL(m, T) ,   (7)

where we assume that pL is an even function of m as we consider the unbiased preparation of the training data set preserving the Ising symmetry. Equation (7) is solvable for the system-size scaling behavior of ε under ideal training conditions where the training data is in the canonical ensemble and uniformly available at all temperatures. While such a training data set is typically considered in the Monte Carlo simulations, it also allows a fully analytic treatment based on the standard finite-size-scaling ansatz of pL near the critical point Tc.

While Th and Tl can be given to be arbitrarily far from Tc, their specific values are unimportant because ∫_{ε}^{∞} pL(m, T) dm is only meaningful in the critical area. On the right-hand side of Eq. (7), pL is sharply peaked at |m| = 1 deep in the ordered phase (T ≪ Tc), leading to a negligibly small value of ∫_{0}^{ε} pL dm if ε is much less than one. On the other hand, on the left-hand side, at T ≫ Tc in the disordered phase, pL is governed by the central limit theorem, and then the integral ∫_{ε}^{∞} pL dm decays asymptotically as exp(−Nε^2)/(√N ε) if √N ε increases with the number of spins N. Therefore, provided that ε decreases with the system size L while √N ε increases

with L, we can replace Tl and Th with effective bounds of the critical area. The width of the critical area decreases with increasing L, suggesting that one must consider the finer grids of temperature for the data of the larger system in a numerical approach. In the analytic calculations, all temperatures are available in the training data set. Considering the temperature integration over the critical area of t ∈ [−δt, δt], where

t ≡ (T − Tc)/Tc denotes the reduced temperature, the probability distribution function pL(m, T) near Tc can be expressed as pL ≡ L^{β/ν} p∗(mL^{β/ν}, tL^{1/ν}) by the standard finite-size-scaling ansatz. Then, Eq. (7) can be rewritten as

∫_{0}^{δtL^{1/ν}} dτ ∫_{εL^{β/ν}}^{∞} dx p∗(x, τ) = ∫_{−δtL^{1/ν}}^{0} dτ ∫_{0}^{εL^{β/ν}} dx p∗(x, τ) ,   (8)

where the change of variables is performed for x ≡ mL^{β/ν} and τ ≡ tL^{1/ν}. This equation holds for an arbitrarily large system to be trained. With the scale invariance of the equation for an arbitrary L being imposed, the width of the critical area is clarified to be δt ∼ L^{−1/ν}, and more importantly, the neural parameter has to behave as ε ∼ L^{−β/ν}. This system-size scaling behavior of ε directly leads to the expected form of the network output F∗(|m|L^{β/ν}) that we have discussed above.

One can verify that the resulting scaling solution of ε ∼ L^{−β/ν} indeed validates the assumption that √N ε would increase as L increases, which we have used in the derivation. In d dimensions, it is rewritten as √N ε ∼ L^{d/2 − β/ν}. It is easy to see that the condition (d/2 − β/ν) > 0 follows from the hyperscaling relations, which indicates its equivalence to γ > 0 of the susceptibility divergence.

Instead of using the differentiable version of F and taking the limit of infinite a, one can also introduce a small shift 0 < δ ≪ 1 directly to Eq. (4) to avoid the undefined evaluations of ln F and ln(1 − F) as

F(m, ε) = (1 − δ)[Θ(m + ε) − Θ(m − ε)] + δ ,   (9)

which can be trivially implemented in the model sketched in Fig. 1(e) by adjusting the link weights and the bias of the output neuron. After ln δ is factored out, the cross entropy can be rewritten as

L(ε)/(2|ln δ|) = ∫_{Tc}^{Th} dT ∫_{ε}^{∞} dm pL(m, T) + ∫_{Tl}^{Tc} dT ∫_{0}^{ε} dm pL(m, T) ,   (10)

where δ does not affect the optimization. Following the procedures that we have shown above, the minimization of the cross entropy can then be rewritten, with the temperature integration range effectively being limited to the area around a given Tc, as

∫_{0}^{τo} p∗(εL^{β/ν}, τ) dτ = ∫_{−τo}^{0} p∗(εL^{β/ν}, τ) dτ ,   (11)

where τo ≡ δtL^{1/ν} denotes the effective width of the critical area normalized with the system size. The scale-invariant solution of this equation that holds for an arbitrarily large L provides the same finite-size-scaling behavior of ε ∼ L^{−β/ν} and thereby produces the universal kernel of the F∗ function of the network output.

In addition, we numerically verify the derived scaling solution of ε ∼ L^{−β/ν} in the system on the square lattices. The input data for the training is provided from the estimate of the probability distribution pL(m, T) that is directly given by the Wang-Landau sampling of the joint density of states [21, 52, 53]. Since the Wang-Landau estimate of pL provides unlimited access to all temperatures, one can numerically evaluate Eq. (5) and perform the minimization. Figure 2 shows the system-size scaling of the trained parameter b/a for the model of Fig. 1(d) and ε for the model of Fig. 1(e), respectively. Because m is discrete in a finite system, the use of the Heaviside step function causes a range of ε corresponding to the same value of the cross entropy. The error bars in Fig. 2(b) present such ranges of ε, showing that the degeneracy diminishes as the system size gets larger. Both of the two numerical training results show excellent agreement with our derivation of ε ∼ L^{−β/ν} with the critical exponent β/ν = 1/8 being in the universality class of the two-dimensional Ising model.
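The same minimization can be carried out numerically when pL(m, T) is available only on a grid, as in the Wang-Landau construction used here. The following is a minimal sketch with hypothetical array names, evaluating the δ-independent part of the cross entropy of Eq. (10) over candidate values of ε:

import numpy as np

def train_eps(m_grid, T_grid, p_LmT, Tc):
    """Cross-entropy training of the single parameter eps, following Eq. (10).

    p_LmT : array of shape (len(T_grid), len(m_grid)) holding p_L(m, T),
            normalized over m at each temperature of the grid.
    """
    hot = T_grid > Tc                      # Q = 1 (disordered) labels
    cold = T_grid < Tc                     # Q = 0 (ordered) labels
    best_eps, best_loss = None, np.inf
    for eps in np.unique(np.abs(m_grid)):
        outside = np.abs(m_grid) > eps     # F ~ 0 there, penalized for the Q = 1 data
        inside = ~outside                  # F ~ 1 there, penalized for the Q = 0 data
        loss = p_LmT[hot][:, outside].sum() + p_LmT[cold][:, inside].sum()
        if loss < best_loss:
            best_eps, best_loss = eps, loss
    return best_eps

Repeating the training for several sizes and fitting log(εL) against log(L) should then give a slope close to −β/ν, i.e. −1/8 for the square lattices, as in Fig. 2.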


Figure 2. Parameters of the minimal neural network models determined by training with the Wang-Landau data in the square lattices. The scaling behavior is plotted as a function of the linear dimension L of the training system for (a) the ratio b/a of the model in Fig. 1(d) and (b) the parameter ε of the model in Fig. 1(e).

4.3. Fluctuations in the input layer extracting the order parameter

The minimal network model presented above has the designed input layer with the fixed link weights wi = 1/N, transmitting the explicit order parameter m = Σ_i wi si from the input of the spin configuration s ≡ {si}. This design provides the simplest model that corroborates the observation from the large-scale network trained under the null hypothesis [10, 21]. From a practical point of view, discarding the uncertainty of the link weights helps to remove the training noises and the numerical overfitting, as implied in the comparison of the output functions between Fig. 1(a) and Fig. 1(b), eventually reducing the error in the Tc estimate. Still, it is an interesting question to ask how stable this ideal model of the input weight wi = 1/N would be if variations of wi are allowed in the training and also how the fluctuations of wi would depend on the details of the training data preparation.

Introducing an additional set of undetermined parameters w = {wi} for m = Σ_i wi si in the minimal model, we obtain wi by solving the equation (∂L/∂wi)|_{ε=εL} = 0 with the parameter ε being fixed at the ideal training solution εL that we have already obtained from (∂L/∂ε)|_{wi=1/N} = 0. We employ the stochastic gradient method in the scheme of the online learning [61] in combination with the Wolff cluster update algorithm to sample the data of spin configurations in the square lattices. The temperature grids of the

training data are set in the range of [Tc/2, 3Tc/2] with the spacing of ∆T. Figure 3 displays how the probability distribution of the weights depends on the system size and the temperature grid spacing of the training data prepared. It turns out that the resulting distribution P(w) is bell-shaped with a well-defined average at the ideal value of w̄ = 1/N. The ratio of the standard deviation and the average of

P(w) represents the magnitude of fluctuations in wi, which increases as the system size gets larger but decreases as the temperature grid spacing ∆T gets smaller. This observation hints at a proper data preparation for a more accurate prediction of

Tc in the numerical training. The observed behavior of P(w) implies that the training of the larger system would need the finer grids in temperature for the training data set to suppress the noises in the final form of the network output F∗(|m|L^{β/ν}), as this affects its accuracy as a Tc locator. This test indicates the importance of the thorough coverage of the critical area in the training data set, which in some sense makes the machine learning less magical but is reasonable in terms of statistical physics because finding a genuine critical point cannot be separated from the critical behavior of the system.
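A minimal sketch of the stochastic update used to probe these weight fluctuations is given below; it assumes the differentiable output of Eq. (6) evaluated at the fixed ε = εL, one labeled configuration per step, and a learning rate chosen here only for illustration.

import numpy as np

def sgd_step(w, s, Q, eps, a=100.0, lr=1e-3):
    """One online update of the input weights w at fixed eps.

    s : one spin configuration (entries +/-1), Q : its label (0 ordered, 1 disordered).
    """
    m = np.dot(w, s)                                          # order-parameter estimate
    up, dn = a * (m + eps) / 2.0, a * (m - eps) / 2.0
    F = np.clip(0.5 * (np.tanh(up) - np.tanh(dn)), 1e-9, 1.0 - 1e-9)
    dF_dm = (a / 4.0) * ((1.0 - np.tanh(up) ** 2) - (1.0 - np.tanh(dn) ** 2))
    dL_dF = -(Q / F - (1.0 - Q) / (1.0 - F))                  # from the cross entropy of Eq. (1)
    return w - lr * dL_dF * dF_dm * s                         # dL/dw_i = dL/dF * dF/dm * s_i

Iterating updates of this kind over configurations sampled across the temperature grid yields weight statistics of the type examined in Fig. 3.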


Figure 3. Weight fluctuations of the input links extracting the order parameter. The statistics of the link weights w connecting the input and hidden layers is examined by using the stochastic training in the model of the Heaviside neurons in Fig. 1(e) with the data of spin configurations in the square lattices. (a) The system-size dependence of the link weight distribution P(w) for a given temperature spacing of ∆T = 0.0001 of the Monte Carlo training data. (b) The average w̄ (symbols) and standard deviation σw (error bars) plotted as a function of the system size L for the training data prepared with ∆T = 0.0001. The panels (c) and (d) indicate that the fluctuations of w decrease as the temperature spacing ∆T gets smaller.

5. Learning the Landau mean-field theory of the Ising model

For the demonstration of how the prediction works on unseen data from a different underlying geometry, we choose the Landau mean-field free energy as a generator of the training data and then apply it to the test data produced on a scale-free graph as an underlying geometry of the Ising model. By using the analytically trained one-parameter network model, we attempt to locate the critical point of the Ising model in

the random scale-free graph with a large degree exponent that is known to be in the mean-field class [62–64]. For the order parameter m with the Ising symmetry, the Landau mean-field free

energy per spin can be written at T near the critical point Tc as

f(m, t) = f0 + a2 t m^2 + a4 m^4 ,   (12)

where t ≡ (T − Tc)/Tc denotes the reduced temperature, and a2 and a4 are positive constants. In the system of N spins, the corresponding probability distribution of the order parameter m near Tc is written directly from the Landau free energy as

pN(m, t) ∝ N^{1/4} exp[−N(a2 t m^2 + a4 m^4)/Tc] ,   (13)

which leads to the finite-size-scaling form,

pN(m, t) = N^{1/4} p∗(mN^{1/4}, tN^{1/2}) .   (14)

One can verify the mean-field exponents β = 1/2 and ν̄ = 2 in the comparison with the standard finite-size-scaling ansatz pN = N^{β/ν̄} p∗(mN^{β/ν̄}, tN^{1/ν̄}). We do not consider any possibility of the logarithmic corrections, and we strictly limit our demonstration to the class of systems that is described by the Landau free energy in Eq. (12).
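The scaling form of Eq. (14) can be checked directly: substituting x ≡ mN^{1/4} and τ ≡ tN^{1/2} into Eq. (13) gives

pN(m, t) ∝ N^{1/4} exp[−(a2 τ x^2 + a4 x^4)/Tc] ≡ N^{1/4} p∗(mN^{1/4}, tN^{1/2}) ,

from which β/ν̄ = 1/4 and 1/ν̄ = 1/2 follow at once.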

Provided the scaling form of the probability distribution pN(m, t) of a training data set, it is now straightforward to obtain the scaling behavior of the parameter ε in the trained network. Following the analytic minimization of the cross entropy given in Sec. 4, we can write down the scaling solution as εN = ε0 N^{−1/4} by just putting N^{β/ν̄} in place of L^{β/ν}. The constant ε0 can be determined by the detail of pN(m, t) and the range of

temperature of the training data set. However, a particular value of ε0, which we just set to be unity in the calculation below, is unimportant to the performance of the neural network because it does not affect the form of the output function FN(m) = F∗(|m|N^{β/ν̄}) that works as a universal kernel.
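The operation of Eq. (2) in this mean-field setting can also be illustrated numerically. The snippet below is a rough sketch with arbitrary constants a2 = a4 = Tc = 1 and ε0 = 1 (not the parameters of the actual test below); it builds pN(m, t) from Eq. (13), applies the step kernel with εN = ε0 N^{−1/4}, and shows that the curves of ⟨FN⟩ versus t for different N meet at t = 0.

import numpy as np

def avg_output(N, t, a2=1.0, a4=1.0, Tc=1.0, eps0=1.0, n_m=4001):
    """<F_N> at reduced temperature t for the Landau distribution of Eq. (13)."""
    m = np.linspace(-1.0, 1.0, n_m)
    dm = m[1] - m[0]
    w = np.exp(-N * (a2 * t * m**2 + a4 * m**4) / Tc)
    p = w / (w.sum() * dm)                                 # normalized p_N(m, t)
    F = (np.abs(m) < eps0 * N ** (-0.25)).astype(float)    # step kernel with eps_N = eps0 N^{-1/4}
    return (p * F).sum() * dm

for N in (1000, 4000, 16000):
    print(N, [round(avg_output(N, t), 3) for t in (-0.02, 0.0, 0.02)])
# The t = 0 column is (nearly) independent of N: the curves cross at Tc.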

We examine the accuracy of the Tc prediction with the data from the uncorrelated random scale-free graph model [65]. In the graph of underlying vertices of the Ising spins, a vertex is randomly connected to some number of other vertices by the edges representing the exchange interactions between the residing spins. The number of the edges from a vertex, referred to as the degree, follows a power-law distribution p(k) ∼ k^{−γ} with a degree exponent γ. It is known that when γ > 5, the Ising model on this scale-free graph exhibits the mean-field critical exponents [62–64]. Here we examine the scale-free graph with the degree exponent γ = 6.5 and the minimum degree 4. The degree of each site is given as the greatest integer less than or equal to the value drawn randomly from the power-law distribution. The test input data set consists of 10,000 spin configurations per graph sample obtained from the Wolff cluster updates at each temperature, and 50,000 random graph samples are included in the test data set.

Figure 4 shows the output of the one-parameter model averaged over the test inputs of the Ising spin data prepared on the scale-free graph geometry. The crossing point between the curves of different system sizes provides the estimate of the critical point Tc = 3.6595(5), which is in good agreement with the standard detection using the fourth-order cumulant that gives Tc = 3.6601(2).


Figure 4. Critical point detection by applying the mean-field-trained neural network to the data of the Ising model on the uncorrelated scale-free graph with the degree exponent γ = 6.5. (a) The network output ⟨F⟩ averaged over the inputs of Monte Carlo data sampled at each temperature. (b) The fourth-order cumulant as a function of temperature given for comparison. Each data point is an average over the ensemble of 50,000 random graph samples, and the error bars (not shown) are smaller than the marker size. The vertical dotted lines indicate the exact location of the critical point.

The exact critical point for the uncorrelated random scale-free graph with γ > 5 was derived previously in Ref. [62] as Tc = 2/ln[⟨k^2⟩/(⟨k^2⟩ − 2⟨k⟩)], which becomes Tc ≈ 3.6599 for our degree sequences of the scale-free graph samples examined. These results demonstrate the operation of Eq. (2) in the mean-field regime with the training of the Landau free energy and the Monte Carlo test data generated on the random scale-free graph.

In the previous work [21], we did similar tests with the two-node model that has two free parameters, presenting the cases where the training and test data sets are in the same and different universality classes. The validity of

the Tc prediction is only guaranteed when the training and test data sets are in the same universality class while the underlying geometries are not necessarily the same.

We argue that Eq. (2) is the simple physical basis that explains the valid Tc prediction with the supervised learning on the Ising model, which can be implemented by using just a single free parameter of the neural network.
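For reference, the number quoted above can be reproduced approximately from a sampled degree sequence alone. The sketch below is a rough illustration only: it draws degrees from the continuous power law with γ = 6.5 and minimum degree 4, floors them to integers as described in Sec. 5, and evaluates the formula of Ref. [62]; the structural cutoff and the edge-matching details of the construction in Ref. [65] are ignored here.

import numpy as np

def predicted_tc(n_sites=256000, gamma=6.5, k_min=4, seed=0):
    """Tc = 2 / ln[<k^2> / (<k^2> - 2<k>)] of Ref. [62] for one sampled degree sequence."""
    rng = np.random.default_rng(seed)
    u = rng.random(n_sites)
    k = np.floor(k_min * (1.0 - u) ** (-1.0 / (gamma - 1.0)))   # inverse-transform sampling, floored
    k1, k2 = k.mean(), (k ** 2).mean()
    return 2.0 / np.log(k2 / (k2 - 2.0 * k1))

print(predicted_tc())   # roughly 3.65-3.66 for this degree exponent and minimum degree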

6. Conclusions

We have investigated the connection between the data-driven prediction of a critical point based on the supervised learning in the ferromagnetic Ising model and the standard finite-size-scaling theory of the second-order phase transition. It turns out that the scaling form F∗(|m|L^{β/ν}) emerging in the network output is the source of the predicting power, which works as a universal kernel guaranteeing a physically legitimate estimate of a critical point for unseen test data from different lattices but in the same universality class as the training data. We have shown that a minimal network with just one free parameter suffices to model such emergence of the scaling form in the minimization of the

cross entropy. For the training data unbiasedly sampled from the canonical ensemble, we have derived the analytic scaling solution of the one-parameter model that leads to the expected scaling form of the output function. For the numerical demonstration, we have considered the Landau mean-field free energy as a generator of the training data and verified that it accurately locates the critical point on the random uncorrelated scale-free graph that belongs to the mean-field class. While we have shown that the conventional finite-size-scaling ansatz can be implemented in a very simple data-driven way with just one parameter being learned, the model benefits from the simple order parameter structure of the Ising model that is extracted easily, as empirically observed in the large-size neural network. Possible directions for future studies include the generalization to more complex symmetries of an order parameter and the interpretation of the learning in a broader range of phase transitions and critical phenomena in complex systems.

Acknowledgments

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea funded by the Ministry of Science and ICT (NRF-2019R1F1A1063211) and also by a GIST Research Institute (GRI) grant funded by GIST.

References

[1] Carleo G, Cirac I, Cranmer K, Daudet L, Schuld M, Tishby N, Vogt-Maranto L and Zdeborová L 2019 Rev. Mod. Phys. 91 045002
[2] Ohtsuki T and Mano T 2020 J. Phys. Soc. Jpn. 89 022001
[3] Zdeborová L 2020 Nat. Phys. 16 602–604
[4] Carrasquilla J 2020 Adv. Phys. X 5 1797528
[5] Bedolla-Montiel E A, Padierna L C and Castañeda-Priego R 2020 J. Phys.: Condens. Matter 33 053001
[6] Cybenko G 1989 Math. Control Signals Syst. 2 303–314
[7] Hornik K, Stinchcombe M and White H 1989 Neural Netw. 2 359–366
[8] Hornik K 1991 Neural Netw. 4 251–257
[9] Leshno M, Lin V Ya, Pinkus A and Schocken S 1993 Neural Netw. 6 861–867
[10] Carrasquilla J and Melko R G 2017 Nat. Phys. 13 431–434
[11] van Nieuwenburg E P L, Liu Y-H and Huber S D 2017 Nat. Phys. 13 435–439
[12] Wang L 2016 Phys. Rev. B 94 195105
[13] Ohzeki M 2016 J. Phys. Soc. Jpn. 85 123706
[14] Ohtsuki T and Ohtsuki T 2017 J. Phys. Soc. Jpn. 86 044708
[15] Tanaka A and Tomiya A 2017 J. Phys. Soc. Jpn. 86 063001
[16] Hu W, Singh R R P and Scalettar R T 2017 Phys. Rev. E 95 062112
[17] Wetzel S J 2017 Phys. Rev. E 96 022140
[18] Wetzel S J and Scherzer M 2017 Phys. Rev. B 96 184410
[19] Ponte P and Melko R G 2017 Phys. Rev. B 96 205146
[20] Suchsland P and Wessel S 2018 Phys. Rev. B 97 174435
[21] Kim D and Kim D-H 2018 Phys. Rev. E 98 022138
[22] Iso S, Shiba S and Yokoo S 2018 Phys. Rev. E 97 053304

[23] Huembeli P, Dauphin A and Wittek P 2018 Phys. Rev. B 97 134109
[24] Liu Y-H and van Nieuwenburg E P L 2018 Phys. Rev. Lett. 120 176401
[25] Beach M J S, Golubeva A and Melko R G 2018 Phys. Rev. B 97 045207
[26] Vargas-Hernández R A, Sous J, Berciu M and Krems R V 2018 Phys. Rev. Lett. 121 255702
[27] Mills K and Tamblyn I 2018 Phys. Rev. E 97 032119
[28] Morningstar A and Melko R G 2018 J. Mach. Learn. Res. 18 1–17
[29] Li C-D, Tan D-R and Jiang F-J 2018 Ann. Phys. NY 391 312–331
[30] Kashiwa K, Kikuchi Y and Tomiya A 2019 Prog. Theor. Exp. Phys. 2019 083A04
[31] Zhang W, Liu J and Wei T-C 2019 Phys. Rev. E 99 032142
[32] Casert C, Vieijra T, Nys J and Ryckebusch J 2019 Phys. Rev. E 99 023304
[33] Zhang W, Wang L and Wang Z 2019 Phys. Rev. B 99 054208
[34] Greitemann J, Liu K and Pollet L 2019 Phys. Rev. B 99 060404(R)
[35] Liu K, Greitemann J and Pollet L 2019 Phys. Rev. B 99 104410
[36] Greitemann J, Liu K, Jaubert L D C, Yan H, Shannon N and Pollet L 2019 Phys. Rev. B 100 174408
[37] Liu K, Sadoune N, Rao N, Greitemann J and Pollet L 2020 arXiv:2004.14415
[38] Rao N, Liu K and Pollet L 2020 arXiv:2007.07000
[39] Kiwata H 2019 Phys. Rev. E 99 063304
[40] Efthymiou S, Beach M J S and Melko R G 2019 Phys. Rev. B 99 075113
[41] Li Z, Luo M and Wan X 2019 Phys. Rev. B 99 075418
[42] Dong X-Y, Pollmann F and Zhang X-F 2019 Phys. Rev. B 99 121104(R)
[43] Canabarro A, Fanchini F F, Malvezzi A L, Pereira R and Chaves R 2019 Phys. Rev. B 100 045129
[44] Giannetti C, Lucini B and Vadacchino D 2019 Nucl. Phys. B 944 114639
[45] Lee S S and Kim B J 2019 Phys. Rev. E 99 043308
[46] Shiina K, Mori H, Okabe Y and Lee H K 2020 Sci. Rep. 10 2177
[47] Blücher S, Kades L, Pawlowski J M, Strodthoff N and Urban J M 2020 Phys. Rev. D 101 094507
[48] D’Angelo F and Böttcher L 2020 Phys. Rev. Res. 2 023266
[49] Munoz-Bauza H, Hamze F and Katzgraber H G 2020 J. Stat. Mech. 2020 073302
[50] Veiga R and Vicente R 2020 arXiv:2006.10176
[51] Roscher R, Bohn B, Duarte M F and Garcke J 2020 IEEE Access 8 42200–42216
[52] Wang F and Landau D P 2001 Phys. Rev. Lett. 86 2050–2053
[53] Wang F and Landau D P 2001 Phys. Rev. E 64 056101
[54] Landau D P, Tsai S-H and Exler M 2004 Am. J. Phys. 72 1294–1302
[55] Wolff U 1989 Phys. Rev. Lett. 62 361–364
[56] Nielsen M A 2015 Neural Networks and Deep Learning (Determination Press)
[57] Goodfellow I, Bengio Y and Courville A 2016 Deep Learning (Cambridge, MA: MIT Press)
[58] Binder K 1981 Z. Physik B - Condensed Matter 43 119–140
[59] Bruce A D 1981 J. Phys. C: Solid State Phys. 14 3667–3688
[60] Nicolaides D and Bruce A D 1988 J. Phys. A: Math. Gen. 21 233–244
[61] Bottou L 1998 Online Algorithms and Stochastic Approximations Online Learning and Neural Networks (Cambridge: Cambridge University Press)
[62] Dorogovtsev S N, Goltsev A V and Mendes J F F 2002 Phys. Rev. E 66 016104
[63] Goltsev A V, Dorogovtsev S N and Mendes J F F 2003 Phys. Rev. E 67 026123
[64] Hong H, Ha M and Park H 2007 Phys. Rev. Lett. 98 258701
[65] Catanzaro M, Boguñá M and Pastor-Satorras R 2005 Phys. Rev. E 71 027103