arXiv:1802.08862v1 [cond-mat.str-el] 24 Feb 2018 lsiyn ufc rb mgsi togycreae el correlated strongly in images probe surface Classifying eti nu “xeine) nsc a st improve to as way a such in (“experience”), input with certain presented when program the modifications is, iterative That deter- to accumulates as metric. ability performance experience, its some with by if increases mined ML task A given exhibit a to diagnosis. perform said medical is vi- program and computer computer economics, bioinformatics, marketing, applications as sion, broad such with disciplines science, across data and science puter from arise they whether . or surface arise material, interactions the the inside whether deep so, from if and interac- present, are whether the tions identifying for responsible formation, is physics complex which can identify networks concen- to neural right trained artificial be the that here at show or We model physics, tration. non-interacting surface-only a interactions from from or from even material, arise a 9] within 4, can deep elsewhere.[3, complexity shown scale-free have encode we Such properties as geometric exponents, critical the the critical by Therefore, controlled is point. that way fixed multi- a in on be- scales complexity length clusters critical- ple spatial geometric displaying near of scale-free, systems configurations come For spatial the formation surfaces. ity, pattern on multiscale observed emergent often information the spatial detailed in the available interpret to majority microscopic guid- how offering the for on treatments ance theoretical date, focused few on with To have [8], data physics treatments of materials[7]. theoretical wealth of of variety increasing wide ever a an yielding ma- scan- of terials, understanding 1982, our revolutionized in have invention elec- probes their ning correlated Since strongly of systems[1–6]. surface tronic the at formation tern ahn erig(L sabrenn edi com- in field burgeoning a is (ML) learning Machine pat- complex reveal often experiments probe Scanning a o n eepc oesnrybtenmcielann a learning machine between synergy u more our expect future. broadens we This machi and system. do, that physical can demonstrates a of configuratio also behavior Usi classifying investigation universal on system. This accuracy met a 97% learning in . achieve machine formation to pattern able the driving are of is classifi investigation model image first unifi physical this the a to approach report revealing alternative we 4] an point,[3, As the critical s materials. that universal disorder-driven systems the a electronic studying to correlated By distinct 2] several scales.[1, in length system multiple electronic on correlated strongly on (AFM) microscopy 1 cnigpoeeprmnssc ssann unln micr tunneling scanning as such experiments probe Scanning eateto hsc n srnm,Pru nvriy We University, Purdue Astronomy, and Physics of Department INTRODUCTION .Burzawa, L. Dtd eray2,2018) 27, February (Dated: 1 .Liu, S. learning 1 n .W Carlson W. E. and erigcndsigihteodrdfo h disordered supervised the models, from ordered different phase.[17] the few distinguish varied a can for learning is that temperature Melko shown and have the Carrasquilla transition.[16] fac- phase as the structure through significantly and change parameter could order tor algorithm the learning that unsupervised “discover” an that Wang systems.[13] of showed wavefunction spin the interacting study im- many-body to quantum theory.[24]. Anderson ML used mean-field Troyer the and dynamical as Carleo and such model[23] physics purity many-body in lems hsc,o rmtebl ftematerial). the of surface bulk by the driven from are or data physics, example, in for determine, observed are (to whether dimensions phenomenon many the how relevant in and a involved Hamiltonian, is interactions the disorder which in quenched term as whether such important, information are identifica- reveal of can type spa- tion This many experiments.[1–6] in resolved observed tially interpreting formation to pattern relevant multiscale most the are con- near-critical these on since focus We a figurations, criticality. generate to configurations to close on are used that focusing was configuration, model can spin which ML particular identify that paper to this used in be show We for transition. responsible phase actually the is Hamiltonian underlying which t efrac ttets,wtottoemodifications those programmed.[10] without explicitly task, being the at performance its ru[8 9,adbgdt suso aeil science[20– Arsenault materials of issues 22]. data big and body renormalization 19], 17], group[18, many glassy transitions[16, quantum transport[14], phase dynamics[15], in quantum example electronic for problems[13], condensed elementary in applied physics, to be to clusters[11] beginning galaxy is ML from particles[12]. scales, of range oee,n n a e akdM ihidentifying with ML tasked yet has one no However, npyis Lhsbe ple ot hsc ta at physics to to applied been has ML physics, In atr omto sdie yproximity by driven is formation pattern aino h atr omto nthese in formation pattern the of cation fe eelcmlxptenformation pattern complex reveal often s ainpolmo oe aeil,here materials, novel of problem cation aigi hs mgs ehv shown have we images, these in caling mgsfo he oeswt Ising with models three from images n drtnigo htmcielearning machine what of nderstanding elann a atr h implicit the capture can learning ne ganua ewr rhtcue we architecture, network neural a ng o odtriewihunderlying which determine to hod dcnesdmte hsc nthe in physics matter condensed nd soy(T)adaoi force atomic and (STM) oscopy tLfyte N497 USA 47907, IN Lafayette, st tal. et crncssesvamachine via systems ectronic 1, ∗ aeue Lt drs prob- address to ML used have 2

In this paper, we are interested in whether ML can classify microscopy images according to the underlying physics driving the complex pattern formation. To be more specific, we want to know whether ML can capture the universal properties and critical behavior implied by the image patterns, and then identify which model gen- erated the image. In the present work we limit ourselves to three theoretical models: the two-dimensional (2D) clean Ising model on a square lattice (Fig. 1); the three- dimensional (3D) clean Ising model on a cubic lattice (Fig. 2); and the square lattice site model (Fig. 3). The Ising models may be written as: − H = J X σiσj (1) where σi = ±1 and J is the coupling strength between nearest neighbor sites, and the summation runs over the sites of either a two-dimensional square lattice or a three- dimensional cubic lattice. Ising models were first used to describe magnetic tran- sitions, from a ferromagnetically ordered phase at low temperature T Tc. Temperatures are in units of J, which point either “up” or “down.” However, the model can be is the coupling strength between Ising variables. Black and applied to a wide variety of physical systems. For exam- white pixels represent Ising variables σ = +1 and −1 respec- ple, the behavior of the critical endpoint of the liquid-gas tively. is well described by the criticality of the three-dimensional Ising model. Electrons inside of solids have their own phase transitions, and when comparing to experiments on these systems, σ may be viewed as a generalized pseudospin in the context of two-component electronic behavior. This may be mapped, for example, to the two perpendicular orientations of nematic stripes in cuprate superconductors,[3] or to the metal and insu- lator islands in VO2.[4] In the absence of interactions, pseudospins may be said to follow a percolation model, which is a non-interacting model. Therefore we also consider site percolation on a square lattice, assigning a probability p to having pseu- dospin σ = 1 on any given site, otherwise the pseudospin is assigned as σ = −1. In two dimensions, this model has a continuous phase transition (and therefore exhibits criticality) at pc =0.59. We aim to explore the efficiency of ML for capturing features associated with the corre- sponding universality class, including interactions, disor- der, and dimension. Other universal features, such as the type of random disorder in the system, and the symmetry of the order parameter, are left for future work.

METHODS FIG. 2. Ising 3D images. Row 1: T Tc. Temperatures are in units of J, which With ML, a software program undergoes significant is the coupling strength between Ising variables. Black and white pixels represent Ising variables σ = +1 and −1 respec- changes based on new input, without those changes be- tively. ing explicitly hard-coded by the programmer. Rather, 3

network topology is illustrated in Fig. 4. The input layer consists of the black and white image itself, where each circle represents one pixel. In the hidden layer, each circle represents a single neuron, with multiple inputs, which turns on according to some nonlinear function of its input weights. The output layer in our case has three nodes, one for each of the three models which may generate a spatially complex image near criticality. The goal of neural network training is to optimize the weights and biases of the network, by minimizing a loss function. In the present case, we use cross as the loss function. The training proceeds towards a point where the loss function is at a minimum. We have used scaled conjugate gradient descent method[27] to mini- mize the loss function. We furthermore use backprop- agation of errors, a type of automatic differentiation in which the derivatives associated with the chain rule are computed in reverse order (i.e. working from the top- most dependent variable down through to the indepen- dent variables).

Neurons 50 75 100 125 150 200 250 % Accuracy 94.93 95.53 95.95 96.48 96.68 97.00 96.72 FIG. 3. Percolation images. Row 1: ppc. On a square lattice, the site percolation threshold is TABLE I. Classification accuracy for a given number of hid- pc = 0.59. Black and white pixels represent variables σ = +1 den neurons. Accuracy initially increases with increasing and −1 respectively. number of hidden neurons.

We use the 2D percolation model and Monte Carlo training algorithms contained within the program train simulations of the 2D clean Ising model and 3D clean the neural network as new input is received.[10] Whereas Ising model to generate images to train the neural net- human visual pattern classification and recognition can work. For the 2D models, the neural network is directly be viewed as a qualitative process, ML turns this into an fed spin configurations near criticality for training pur- explicitly quantitative process, albeit within some mar- poses. For the 3D model, the neural network is fed spin gin of error. In our case, we have used a scaled conjugate configurations from a 2D slice of the 3D lattice. This is gradient (SCG) algorithm[27] under supervised learning to mimic surface probe experiments, which have access conditions to train a neural network algorithm using the to only 2D information. The goal is for the neural net- MATLAB Neural Network Toolbox through XSEDE[28]. work to learn to identify which model generated which With supervised learning, the program is presented with images. a training set of data for which the “right answer” is also The “interesting” cases occur near criticality, where ge- supplied to the training algorithm. The point is to de- ometric clusters (as defined by connected sets of like-spin velop an algorithm that gives the correct output for a nearest neighbors) grow in such a way as to have struc- given input. In our case, the goal will be to identify the ture over multiple length scales. Near criticality, sys- underlying interacting physics model (the output) from tems experience fluctuations over all length scales from which a particular Ising spin configuration was generated atomic to the size of the system. Because there is no (the input). characteristic length scale in between, critical systems In order to accomplish this, we have used an artificial are described by power law behavior, where the expo- neural network with a single hidden layer containing 200 nent of each power law is called a “critical exponent.” “neurons”. Each neuron consists of multiple inputs, a At any second order phase transition, the unique set of nonlinear activation function, and a single output. We critical exponents acts like a fingerprint to identify the use a hyperbolic tangent activation function, which re- universality class of the phase transition, which is set sults in outputs between -1 and 1. The network is defined by universal features such as the dimension of the lat- by its topology (how the neurons are interconnected) and tice and the symmetry of the order parameter (Z2 in the weights associated with each connection. We use a the case of Ising variables σ = ±1), and independent of feedforward neural network, which can be represented by “non-universal” short-distance physics like the inclusion a directed acyclic graph where each edge has a weight and of 2nd-nearest-neighbor interactions.[25, 26] Ultimately, each node has a bias. A very small example of a neural this behavior arises because the geometric clusters have 4

RESULTS AND DISCUSSION

We find that the neural network, once trained to iden- tify which model produced which spin configurations, achieves an average classification accuracy of about 97%, once the number of hidden neurons is at least 150. The dependence of the classification accuracyon the number FIG. 4. Schematic of a small artificial neural network. In of hidden neurons is shown in Table I. The error bars are this case there are 4 features in the input data (for example calculated from the case with 200 neurons where multiple pixels in a 2x2 image), 3 hidden neurons, and 2 outputs (for training runs where conducted. The classification error example cat/dog, or percolation/Ising). drops from around 5% at 50 neurons to around 3% at 150 neurons. Results between 150 and 250 neurons are very similar and adding more neurons than 150 does not help much, although there does appear to be a minimum in the error for 200 neurons. The case study presented in Table shows how errors a character near criticality, whereby each cluster were distributed among the models. The rows represent is scale-free. In these cases, we have shown[3, 4, 9] that the models that images belong to and columns represent the of the shapes of the clusters in the images the models that images were classified as. The most mis- encodes the universality class of the underlying model. classified model is the 3D Ising model. It was particularly The hardest cases to judge by eye are those with roughly hard for the neural network to distinguish between the equal numbers of up and down spins. For this reason, 2D and 3D Ising models. Ising 2D and percolation are the we focus attention within 4% of the critical region of the most easily distinguished with almost negligible errors. Ising models (since magnetization rapidly develops inside the ordered region), and within 15% of the percolation We have shown that ML can be used to classify config- model (because equal numbers of up and down spins in uration images into their associated universality classes, that case is p=0.5, which is close but not at the criti- which means that it identifies implicit universal proper- cal value of pc = 0.59 for site percolation on a square ties underlying the complex pattern formation of image lattice). Note that net magnetization is not a trivial dis- configurations near a critical point. The conventional ap- criminator of these models near criticality. Within these proach to identifying a universality class is to first explic- parameter ranges, each model covers states with no net itly fit the experimental data in order to find the critical magnetization as well as those with net magnetization exponents, then match that set of critical exponents to in the thermodynamic (infinite size) limit. Even more a known universality class. However, the approach de- importantly, for the same nominal set of model param- veloped here does not depend upon critical exponents. eters, thermal fluctuations can cause a net magnetiza- Instead, we feed image data and universality class labels tion to appear in a finite size system for T >Tc. It is to the model and let the ML algorithm especially important to distinguish whether interactions develop its own internal parameters in the neural net- are present, in order to determine in any given surface work. We haven’t programmed the model to include any probe experiment whether interactions are responsible theories from physics, but the model learns directly from for the observed pattern formation, in order to distin- the data, and then captures the universal features in the guish it from putative “dirt effects” which may arise at configuration images to identify the universality class. the surface of a material. Because the Ising models are Thus, we have shown that it is possible to go directly near criticality, we use the Wolff algorithm, and sam- from data to an identification of the universality class ple spin configurations every 100 Monte Carlo steps to of that data, without explicitly considering any critical ensure independent sampling. We simulate system sizes exponents or scaling theory. of L2 = 1002 for the 2D case, and L3 = 1003 for the 3D This finding broadens our understanding of the extent case. (For comparison, experimental images from surface to which ML can be applied, and thus has important im- probes in condensed matter physics today are typically plications in the field of ML and computer vision where about 200×200 pixels.[8]) The images passed to the neu- the image classification problem is mainly focused on ex- ral network consist of 100 × 100 = 10, 000 pixels, called plicit object recognition, rather than discovering under- “features” in the context of ML; the number of training lying physics. We also expect that machine learning can examples given to the neural network must be signifi- be used to study and even learn more complex physics cantly greater than We use 400,000 configurations from given large enough datasets. This synergy between ma- each model, for a total of 1,200,000 configurations. These chine learning and physics opens a new perspective for configurations are then divided randomly, using 70% for future research in condensed matter physics, or even in all training, 15% for validation, and 15% for testing. fields of physics, that we can draw important conclusions 5

Neural network output [2] Y. Kohsaka, C. Taylor, K. Fujita, A. Schmidt, C. Lupien, perc I-2D I-3D T. Hanaguri, M. Azuma, M. Takano, H. Eisaki, H. Tak- perc 98.05% 0.01% 1.94% agi, S. Uchida, and J. C. Davis, Science 315, 1380 (2007). Input model I-2D 0.03% 98.30% 1.67% [3] B. Phillabaum, E. W. Carlson, and K. A. Dahmen, Nat. I-3D 1.82% 2.96% 95.22% Commun. 3, 915 (2012). [4] S. Liu, B. Phillabaum, E. W. Carlson, K. A. Dahmen, TABLE II. Classfication results for testing data for the case N. S. Vidhyadhiraja, M. M. Qazilbash, and D. N. Basov, of 200 neurons in the hidden layer. The model listed on each Phys. Rev. Lett. 116, 036401 (2016). row denotes the Hamiltonian used to generate spin configu- [5] E. Dagotto, Science 309, 257 (2005). rations to present to the neural network. In each column, the [6] A. Moreo, M. Mayr, A. Feiguin, S. Yunoki, and output of the neural network is listed. The goal is to train E. Dagotto, Phys. Rev. Lett. 84, 5568 (2000). the neural network to identify which model produced which [7] G. Binnig and H. Rohrer, Reviews of Modern Physics 71, spin configurations. 324 (2013). [8] D. A. Bonnell, D. N. Basov, M. Bode, U. Diebold, S. V. Kalinin, V. Madhavan, L. Novotny, M. Salmeron, U. D. Schwarz, and P. S. Weiss, Reviews of Modern Physics via ML, without the need to interpret the intermediate 84, 1343 (2012). steps of the ML algorithm. [9] E. W. Carlson, S. Liu, B. Phillabaum, and K. A. Dah- men, Journal of Superconductivity and Novel Magnetism (2015). [10] S. Russell and P. Norvig, “Artificial intelligence: A mod- CONCLUSION ern approach,” (Pearson, 2015) Chap. V. [11] N. Ball and R. Brunner, Int. Journal of Modern Physics D 19, 1049 (2010). In conclusion, we have shown that machine learning [12] S. Whiteson and D. Whiteson, Engineering Applications can successfully determine which spin configurations were of AI 22, 1203 (2009). 355 generated from which theoretical model when the images [13] G. Carleo and M. Troyer, Science , 602 (2017), arXiv:1606.02318. are near criticality. As scanning probe experimental ca- [14] A. Lopez-Bezanilla and O. von Lilienfeld, Phys. Rev. B pabilities grow, they are producing a growing wealth of 89 (2014). data, including more examples of systems in which the [15] S. Schoenholz, E. Cubuk, D. Sussman, and et al., Nature electronic textures display multiscale pattern formation Physics 12 (2016). at the surface of the material.[1, 2, 29, 30] In cases where [16] L. Wang, “Discovering phase transitions with unsuper- the pattern formation is driven by proximity to a critical vised learning,” (2016), arXiv: 1606.00318. point,[3, 4, 9] the techniques employed here can be used [17] J. Carrasquilla and R. Melko, “Machine learning phases of matter,” (2016), arXiv: 1605.01735. to identify the underlying physics driving the pattern for- [18] P. Mehta and D. Schwab, “An exact mapping between mation, without the need to explicitly determine critical the variational renormalization group and deep learning,” exponents or scaling forms. (2014), arXiv: 1410.3831. [19] C. Beny, “Deep learning and the renormalization group,” (2013), arXiv: 1301.3124. [20] A. Kusne, T. Gao, A. Mehta, and et al., Scientific Re- ACKNOWLEDGEMENTS ports 4 (2014). [21] L. M. Ghiringhelli, J. Vybiral, S. V. Levchenko, C. Draxl, and M. Scheffler, Phys. Rev. Lett. 114, 105503 (2015). LB acknowledges support from the SURF program [22] S. Kalinin, B. Sumpter, and R. Archibald, Nature Ma- through the College of Engineering at Purdue Univer- terials 14 (2015). sity. SL acknowledges support from the Bilsland Dis- [23] L.-F. Arsenault, A. Lopez-Bezanilla, O. A. von Lilienfeld, sertation Fellowship at Purdue University. EWC and SL and A. J. Millis, Phys. Rev. B 90, 155136 (2014). acknowledge support from NSF DMR-1508236 and Dept. [24] L.-F. Arsenault, O. A. von Lilienfeld, and A. J. Mil- of Education Grant No. P116F140459. This work used lis, “Machine learning for many-body physics: effi- the Extreme Science and Engineering Discovery Environ- cient solution of dynamical mean-field theory,” (2015), arXiv:1506.08858. ment (XSEDE), which is supported by National Science [25] M. E. Fisher, Rev. Mod. Phys. 46, 597 (1974). Foundation grant number ACI-1053575. [26] H. E. Stanley, Introduction to Phase Transitions and Critical Phenomena (Oxford University Press, 1971). [27] M. F. Moller, Neural Networks 6, 525 (1993). [28] J. Towns, T. Cockerill, M. Dahan, I. Foster, K. Gaither, A. Grimshaw, V. Hazlewood, S. Lathrop, D. Lifka, G. D. ∗ [email protected] Peterson, R. Roskies, J. R. Scott, and N. Wilkins-Diehr, [1] M. M. Qazilbash, M. Brehm, B.-G. Chae, P.-C. Ho, G. O. Computing in Science and Engineering 16, 62 (2014). Andreev, B.-J. Kim, S. J. Yun, A. V. Balatsky, M. B. [29] S. Lupi, L. Baldassarre, B. Mansart, A. Perucchi, Maple, F. Keilmann, H.-T. Kim, and D. N. Basov, A. Barinov, P. Dudin, E. Papalazarou, F. Rodolakis, Science 318, 1750 (2007). J. P. Rueff, J. P. Iti´e, S. Ravy, D. Nicoletti, P. Postorino, 6

P. Hansmann, N. Parragh, A. Toschi, T. Saha-Dasgupta, anen, F. Baumberger, and M. P. Allan, Nature Physics O. K. Andersen, G. Sangiovanni, K. Held, and M. Marsi, (2016). Nature Communications 1, 105 (2010). [30] I. Battisti, K. M. Bastiaans, V. Fedoseev, A. de la Torre, N. Iliopoulos, A. Tamai, E. C. Hunter, R. S. Perry, J. Za-