PHYSICAL REVIEW RESEARCH 2, 023369 (2020)

Machine learning holographic mapping by neural network renormalization group

Hong-Ye Hu,1 Shuo-Hui Li,2,3 Lei Wang,2,4,5 and Yi-Zhuang You1,*

1Department of Physics, University of California at San Diego, La Jolla, California 92093, USA
2Institute of Physics, Chinese Academy of Sciences, Beijing 100190, China
3University of Chinese Academy of Sciences, Beijing 100049, China
4CAS Center for Excellence in Topological Quantum Computation, University of Chinese Academy of Sciences, Beijing 100190, China
5Songshan Lake Materials Laboratory, Dongguan, Guangdong 523808, China

*[email protected]

(Received 28 February 2020; accepted 1 June 2020; published 19 June 2020)

Exact holographic mapping (EHM) provides an explicit duality map between a conformal field theory (CFT) configuration and a massive field propagating on an emergent classical geometry. However, designing the optimal holographic mapping is challenging. Here we introduce the neural network renormalization group as a universal approach to design generic EHM for interacting field theories. Given a field theory action, we train a flow-based hierarchical deep generative neural network to reproduce the boundary field ensemble from uncorrelated bulk field fluctuations. In this way, the neural network develops the optimal renormalization-group transformations. Using the machine-designed EHM to map the CFT back to a bulk effective action, we determine the bulk geodesic distance from the residual mutual information. We show that the geometry measured in this way is the classical saddle-point geometry. We apply this approach to the complex φ⁴ theory in two-dimensional Euclidean space-time in its critical phase, and show that the emergent bulk geometry matches the three-dimensional hyperbolic geometry when geometric fluctuation is neglected.

DOI: 10.1103/PhysRevResearch.2.023369

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

I. INTRODUCTION

The holographic duality, also known as the anti-de Sitter space and conformal field theory correspondence (AdS/CFT) [1–4], is a duality between a CFT on a flat boundary and a gravitational theory in the AdS bulk with one higher dimension. It is intrinsically related to the renormalization-group (RG) flow [5–11] of the boundary quantum field theory, since the dilation transformation, as a part of the conformal group, naturally corresponds to the coarse-graining procedure in the RG flow. The extra dimension emergent in the holographic bulk can be interpreted as the RG scale. In the traditional real-space RG [12], the coarse-graining procedure decimates irrelevant degrees of freedom along the RG flow; therefore the RG transformation is irreversible due to information loss. However, if the decimated degrees of freedom are collected and hosted in the bulk, the RG transformation becomes a bijective map between the degrees of freedom on the CFT boundary and the degrees of freedom in the AdS bulk. Such mappings, generated by information-preserving RG transforms, are called exact holographic mappings (EHMs) [13–15], which were first formulated for free-fermion CFT. Under the EHM, the boundary features of a quantum field theory at different scales are mapped to different depths in the bulk, and vice versa. The field variable deep in the bulk represents the overall or infrared (IR) features, while the variable close to the boundary controls the detailed or ultraviolet (UV) features. Such a hierarchical arrangement of information is often observed in deep neural networks, particularly in convolutional neural networks (CNNs) [16]. The similarity between the renormalization group and deep learning has been discussed in several works [17–22]. With the development of machine learning in quantum many-body physics [23–28], machine-learning techniques have also been employed to construct the optimal RG transformations [29,30] and to uncover the holographic geometry [31–34]. In this work, we further explore the possibility of designing the EHM for interacting quantum field theories using deep learning approaches. By training a flow-based hierarchical generative model [35,36] to generate field configurations following the probability distribution specified by the field theory action, the model converges to the optimal EHM with an emergent bulk gravitational description, where neural network parameters and latent variables correspond respectively to the gravity (geometry) and matter degrees of freedom in the holographic bulk. The learned holographic mapping is useful for both sampling and inference tasks. In the sampling task, the learned mapping is used to propose efficient global updates for boundary field configurations, which helps to boost the Monte Carlo simulation efficiency of the CFT. In the inference task, the boundary field theory is pulled back to an effective theory of the bulk field, which enables us to probe the emergent dual geometry (on the classical level) by measuring the mutual information in the bulk field.


II. RENORMALIZATION GROUP AND GENERATIVE MODEL

The renormalization group (RG) plays a central role in the study of quantum field theory (QFT) and many-body physics. The RG transformation progressively coarse-grains the field configuration to extract relevant features. The coarse-graining rules (or RG schemes) are generally model dependent and require human design. Take the real-space RG [12] for example: for a ferromagnetic Ising model, the RG rule should progressively extract the uniform spin components as the most relevant feature; for an antiferromagnetic Ising model, the staggered spin components should be extracted instead; and if the spin couplings are randomly distributed on the lattice, the RG rule can be more complicated. When it comes to the momentum-space RG [37], the rule becomes to renormalize the low-energy degrees of freedom by integrating out the high-energy degrees of freedom. What is the general design principle behind all these seemingly different RG schemes? Can a machine learn to design the optimal RG scheme based on the model action?

With these questions in mind, we take a closer look at the RG procedure in a lattice field theory setting. In the traditional RG approach, the RG transformation is not invertible, due to the information loss at each RG step when the irrelevant features are decimated, as illustrated in Fig. 1(a). However, if the decimated features are kept at each RG scale, the RG transformation can be inverted. Under the inverse RG flow, the decimated degrees of freedom ζ(x, z) are supplied to each layer (step) of the inverse RG transformation such that the field configuration φ(x) can be regenerated, as shown in Fig. 1(b). Here we assume that the φ(x) field is defined in a flat Euclidean space-time coordinated by x = (x_1, x_2, ...) ∈ R^d; then ζ(x, z) will live on a manifold with one higher dimension, and the extra dimension z corresponds to the RG scale. Given its close analogy to the holographic duality, we may view ζ(x, z) as the field in the holographic bulk and φ(x) as the field on the holographic boundary.

FIG. 1. Relation between (a) RG and (b) generative model. The inverse RG can be viewed as a generative model that generates the ensemble of field configurations from random sources. The random sources are supplied at different RG scales (coordinated by z), which can be viewed as a field ζ(x, z) living in the holographic bulk with one more dimension. The original field φ(x) is generated on the holographic boundary.

The inverse RG can be considered as a deep generative model G, which organizes the bulk field ζ(x, z) to generate the boundary field φ(x),

    φ(x) = G[ζ(x, z)].    (1)

The renormalization G^{-1} and generation G procedures are thus unified as the forward and backward maps of a bijective (invertible) map between the boundary and the bulk, known as the EHM [13,14]. At first glance, such an information-preserving RG does not seem to have much practical use, because ζ(x, z) is simply a rewriting of φ(x), which does not reduce the degrees of freedom. However, since the bulk field ζ(x, z) represents the irrelevant features to be decimated under RG, it should look like independent random noise, which contains a minimal amount of information. So instead of memorizing the bulk field configuration ζ(x, z) at each RG scale for reconstruction purposes, we can simply sample ζ(x, z) from an uncorrelated (or weakly correlated) random source and serve it to the inverse RG transformation. Suppose the bulk field ζ(x, z) is drawn from a prior distribution P_prior[ζ]; the transformation φ = G[ζ] will deform the prior distribution to a posterior distribution P_post[φ] for the boundary field φ(x),

    P_post[φ] = P_prior[ζ] |det(δG[ζ]/δζ)|^{-1},    (2)

where |det(δG[ζ]/δζ)| is the Jacobian determinant of the transformation. In such a manner, the objective of the inverse RG is not to reconstruct a particular original field configuration, but to generate an ensemble of field configurations φ(x), whose probability distribution P_post[φ] should match the target distribution

    P_target[φ] = e^{-S_QFT[φ]} / Z_QFT,    (3)

specified by the action functional S_QFT[φ(x)] of the boundary field theory, where Z_QFT = Σ_{[φ]} e^{-S_QFT[φ]} denotes the partition function.

This setup provides us a theoretical framework to discuss the design principles of a good RG scheme. We propose two objectives for a good RG scheme (or EHM): the RG transformation should aim at decimating irrelevant features and preserving relevant features, and the inverse RG must aim at generating field configurations matching the target field theory distribution P_target[φ] in Eq. (3). An information theoretic criterion for "irrelevant" features is that they should have minimal mutual information, so the prior distribution P_prior[ζ] should be chosen to minimize the mutual information between bulk fields at different points, i.e., min I(ζ(x, z) : ζ(x', z')). We refer to this design principle as the minimal bulk mutual information (minBMI) principle, which is a general information theoretic principle behind different RG schemes and is independent of the notion of field pattern or energy scale.
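As a side illustration (not part of the original derivation), the change-of-variables rule of Eq. (2) can be spelled out in a few lines of NumPy. The bijector G below, an elementwise sinh echoing the activation of Appendix B, and all of its details are our own toy choices, not the network used in this work; the point is only how the log-Jacobian converts the prior density into the posterior density.

```python
import numpy as np

# Toy bijector G: an elementwise invertible map zeta -> phi = sinh(zeta).
def G(zeta):
    return np.sinh(zeta)

def G_inv(phi):
    return np.arcsinh(phi)

def log_det_jacobian(zeta):
    # dphi/dzeta = cosh(zeta); the Jacobian of an elementwise map is diagonal.
    return np.sum(np.log(np.cosh(zeta)), axis=-1)

def log_P_prior(zeta):
    # Uncorrelated Gaussian prior of Eq. (4), including its normalization.
    return -0.5 * np.sum(zeta**2, axis=-1) - 0.5 * zeta.shape[-1] * np.log(2 * np.pi)

def log_P_post(phi):
    # Eq. (2): ln P_post[phi] = ln P_prior[zeta] - ln|det(dG/dzeta)|.
    zeta = G_inv(phi)
    return log_P_prior(zeta) - log_det_jacobian(zeta)

# Check: a histogram of generated samples phi = G(zeta) matches exp(log_P_post).
rng = np.random.default_rng(0)
phi_samples = G(rng.standard_normal((200000, 1)))
hist, edges = np.histogram(phi_samples, bins=np.linspace(-3, 3, 61), density=True)
centers = 0.5 * (edges[1:] + edges[:-1])[:, None]
print(np.max(np.abs(hist - np.exp(log_P_post(centers)))))  # small
```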

The close relation between RG and deep learning has been discussed in several early works [18,20,21]. However, as pointed out in Refs. [30,38], the hierarchical architecture itself cannot guarantee the emergence of the RG transformation in a deep neural network. Additional information theoretic principles must be imposed to guide the learning. In light of this observation, Refs. [30,38] proposed the maximal real-space mutual information (maxRSMI) principle, which aims at maximizing the mutual information between the coarse-grained field and the fine-grained field in the surrounding environment. Our minBMI principle is consistent with, and is more general than, the maxRSMI principle (see Appendix A for a detailed discussion of the relation between these two principles).

In the simplest setting, we can hard code the minBMI principle by assigning the prior distribution to be the uncorrelated Gaussian distribution,

    P_prior[ζ] = N[ζ; 0, 1] ∝ e^{-‖ζ‖²},    (4)

where ‖ζ‖² = Σ_{x,z} |ζ(x, z)|². Hence the mutual information vanishes for every pair of points in the holographic bulk. Given the prior distribution, the problem of finding the optimal EHM boils down to training the optimal generative model G to minimize the Kullback-Leibler (KL) divergence between the posterior distribution P_post[φ] in Eq. (2) and the target distribution P_target[φ] in Eq. (3), i.e., min L with

    L = KL(P_post[φ] ‖ P_target[φ])
      = E_{ζ∼P_prior}[ S_QFT[G[ζ]] + ln P_prior[ζ] − ln det(δG[ζ]/δζ) ],    (5)

where E_{ζ∼P_prior} denotes the average over the ensemble of ζ drawn from the prior distribution. This fits perfectly into the framework of flow-based generative models [35,36] in machine learning, which can be trained efficiently thanks to their tractable and differentiable posterior likelihood. We model the bijective map G by a neural network (to be detailed later) with trainable network parameters. We initiate the sampling from the bulk, ζ ∼ P_prior, and push the bulk field to the boundary by φ = G[ζ], collecting the logarithm of the Jacobian determinant along the way. Given the action S_QFT[φ], we can evaluate the loss function L in Eq. (5) and back-propagate its gradient with respect to the network parameters. We then update the network parameters by gradient descent. We iterate the above steps to train the neural network. In this way, simply by presenting the QFT action S_QFT to the machine, the machine learns to design the optimal RG transformation G by continually probing S_QFT with various machine-generated field configurations. Thus our algorithm may be called the neural network renormalization group (neural RG) [29], which can be implemented by using deep learning platforms such as TENSORFLOW [39].

III. HOLOGRAPHIC DUALITY AND CLASSICAL APPROXIMATION

We would like to provide an alternative interpretation of the loss function L in Eq. (5) in the context of holographic duality, which will deepen our understanding of the capabilities and limitations of our approach. Suppose we can sample the boundary field configuration φ(x) from the target distribution P_target[φ] and map φ(x) to the bulk by applying the EHM along the RG direction, ζ = G^{-1}[φ]; the obtained bulk field ζ(x, z) will follow the distribution

    P_bulk[ζ] = P_target[φ] det(δG^{-1}[φ]/δφ)^{-1}
              = Z_QFT^{-1} e^{-S_QFT[G[ζ]]} det(δG[ζ]/δζ),    (6)

where we have used Eq. (3) to express P_target[φ] in terms of ζ. The normalization of the bulk field probability distribution, Σ_{[ζ]} P_bulk[ζ] = 1, further implies that the QFT partition function Z_QFT, which was originally defined on the holographic boundary, can now be written in terms of the bulk field ζ as well,

    Z_QFT = Σ_{[ζ]} e^{-S_QFT[G[ζ]] + ln det(δG[ζ]/δζ)}.    (7)

Note that Z_QFT is by definition independent of G, so we are allowed to sum over all possible G on both sides of Eq. (7), which establishes a duality between the following two partition functions:

    Z_QFT = Σ_{[φ]} e^{-S_QFT[φ]}  ↔  Z_grav = Σ_{[ζ,G]} e^{-S_grav[ζ,G]},    (8)

with the bulk theory S_grav given by

    S_grav[ζ, G] = S_QFT[G[ζ]] − ln det(δG[ζ]/δζ).    (9)

By "duality" we mean that Z_QFT and Z_grav only differ by a proportionality constant (because Z_grav = Σ_{[G]} Z_QFT), so they are equivalent descriptions of the same physics theory. S_grav[ζ, G] describes how the bulk variables ζ (matter field) and the neural network G (geometry) would fluctuate and interact with each other, which resembles a "quantum gravity" theory in the holographic bulk. The bulk has more degrees of freedom than the boundary, because there can be many different choices of ζ and G that lead to the same boundary field configuration φ = G[ζ]. This is a gauge redundancy in the bulk theory, which covers the diffeomorphism invariance as well as the interchangeable role between matter and space-time geometry in a gravity theory. At this level, the bulk theory looks intrinsically nonlocal and the geometry can fluctuate strongly.

However, it is usually more desirable to work with quantum gravity theories with a classical limit, which describe weak fluctuations (matter fields and gravitons) around a classical geometry. Although not every CFT admits a classical gravity dual, we still attempt to find the classical approximation of the dual quantum gravity theory, neglecting the fluctuation of G. Aiming at a classical geometry, we look for the optimal G that maximizes its marginal probability P_EHM[G] = Z_grav^{-1} Σ_{[ζ]} e^{-S_grav[ζ,G]}, with the bulk-matter field traced out. This optimization problem seems trivial because, according to Eq. (7), P_EHM[G] = Z_QFT/Z_grav is independent of G. It is understandable that any choice of G is equally likely if we have no preference on the prior distribution P_prior[ζ] of the bulk-matter field, because there is a trade-off between G and P_prior: one can always adjust P_prior to compensate the change in G. Such a trade-off behavior is fundamentally required by the gauge redundancy in the bulk gravity theory. To fix the gauge, we evoke the minBMI principle to bias the bulk-matter field toward independent random noise, such that the classical solution of G will look like an RG transformation, in line with our expectation for a holographic mapping.
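As a minimal illustration of the optimization just described (minimizing the loss of Eq. (5), which fixes the classical saddle point), consider a solvable toy case: a linear flow G[ζ] = Wζ trained against a free Gaussian "action" S_QFT(φ) = ½φᵀAφ. For this case the expectation in Eq. (5) can be taken analytically, giving the exact gradient ∂L/∂W = AW − W^{−T}. The network of this paper is nonlinear and trained by automatic differentiation instead; this sketch only demonstrates that minimizing Eq. (5) reproduces the target ensemble.

```python
import numpy as np

n = 8
# Toy Gaussian "QFT": S_QFT(phi) = 0.5 * phi^T A phi with A positive definite.
A = 2.5 * np.eye(n) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)

# Linear flow phi = W zeta with zeta ~ N(0, 1). The loss Eq. (5) becomes
#   L(W) = 0.5 * tr(W^T A W) - ln|det W| + const,
# whose exact gradient is  dL/dW = A W - W^{-T}.
W = np.eye(n)
for step in range(4000):
    W -= 0.05 * (A @ W - np.linalg.inv(W).T)

# At the optimum, W W^T = A^{-1}: the generated ensemble phi = W zeta carries
# exactly the target covariance, i.e., P_post = P_target and the KL loss vanishes.
print(np.max(np.abs(W @ W.T - np.linalg.inv(A))))  # ~0
```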

Choosing a minBMI prior distribution such as Eq. (4) and replacing det(δG[ζ]/δζ) in Eq. (9) by P_prior/P_post, P_EHM[G] can be cast into

    P_EHM[G] = (Z_QFT / Z_grav) E_{ζ∼P_prior}[ P_target[G[ζ]] / P_post[G[ζ]] ] ≥ (Z_QFT / Z_grav) e^{-L},    (10)

which is bounded by e^{-L} from below, with L being the KL divergence between P_post and P_target as defined in Eq. (5). Therefore the objective of maximizing P_EHM[G] can be approximately replaced by minimizing the loss function L, which is no longer a trivial optimization problem. From this perspective, the loss function L can be approximately interpreted as the action (negative log-likelihood) for the holographic bulk geometry associated with the EHM G. Minimizing the loss function corresponds to finding the classical saddle-point solution of the bulk geometry. We will build a flow-based generative model to parametrize G and train the neural network by using deep-learning approaches. The fluctuation of neural network parameters in the learning dynamics reflects (at least partially) the gravitational fluctuation in the holographic bulk.

At the classical saddle point G* = argmin_G L, we may extract an effective theory for the bulk-matter field:

    S_eff[ζ] ≡ S_grav[ζ, G*]
             = ‖ζ‖² + ln P_post[G*[ζ]] − ln P_target[G*[ζ]].    (11)

As the KL divergence L = KL(P_post ‖ P_target) is minimized after training, we expect P_post and P_target to be similar, such that their log-likelihood difference ln P_post − ln P_target will be small, so the effective theory S_eff[ζ] will be dominated by the first term ‖ζ‖² in Eq. (11), implying that the bulk field ζ will be massive. The small log-likelihood difference further provides kinetic terms (and interactions) for the bulk field ζ, allowing it to propagate on a classical background that is implicitly specified by G*. In this way, the bulk field will be correlated in general. Even though one of our objectives is to minimize the bulk mutual information as much as possible, the learned EHM typically cannot resolve all correlations in the original QFT, so the residual correlations will be left in the bulk field ζ, as described by the log-likelihood difference in Eq. (11). The mismatch between P_post and P_target may arise for several reasons: first, limited by the design of the neural network, the generative model G may not be expressive enough to precisely deform the prior distribution to the target distribution; second, even if G has sufficient representation power, the training may not be able to converge to the global minimum; finally, and perhaps most fundamentally, not every QFT has a classical gravitational dual: the bulk theory should be quantum gravity in general. Taking the classical approximation and ignoring the gravitational fluctuation leads to the unresolvable correlation and interaction for the matter field ζ that has to be kept in the bulk.

Nevertheless, our framework could in principle include fluctuations of G by falling back to Z_grav in Eq. (8). We can either model P_EHM[G] by techniques like graph generative models, or directly analyze the gravitational fluctuations by observing the fluctuations of neural network parameters in the learning dynamics, as mentioned below Eq. (10). We will leave these ideas for future exploration. In the following, we will use a concrete example, a two-dimensional (2D) compact boson CFT on a lattice, to illustrate our approach of learning the EHM as a generative model and to demonstrate its applications in both the sampling and the inference tasks.

IV. APPLICATION TO COMPLEX φ⁴ MODEL

We consider a lattice field theory defined on a 2D square lattice, described by the Euclidean action

    S_QFT[φ] = −t Σ_{⟨ij⟩} φ_i^* φ_j + Σ_i ( μ|φ_i|² + λ|φ_i|⁴ ),    (12)

where φ_i ∈ C is a complex scalar field defined on each site i of the square lattice and ⟨ij⟩ denotes the summation over all nearest-neighbor sites. The model has a global U(1) symmetry under which the field rotates by φ_i → e^{iϕ} φ_i on every site. We choose μ = −200 + 2t and λ = 25 to create a deep Mexican hat potential that basically pins the complex field on a circle φ_i = √ρ e^{iθ_i} of radius √ρ = 2. In this way, the field theory falls back to the XY model S_QFT = −(1/T) Σ_{⟨ij⟩} cos(θ_i − θ_j) with an effective temperature T = (ρt)^{-1}. By tuning the temperature T, the model exhibits two phases: the low-T algebraic liquid phase with a power-law correlation ⟨φ_i^* φ_j⟩ ∼ |x_i − x_j|^{-α}, and the high-T disordered phase with a short-range correlation. The two phases are separated by the Kosterlitz-Thouless (KT) transition. Several recent works [40–43] have focused on applying machine learning methods to identify phase transitions or topological defects (vortices). Our purpose is different here: we stay in the algebraic liquid phase, described by a Luttinger liquid CFT, and seek to develop the optimal holographic mapping for the CFT.

We design the generative model G as a bijective deep neural network following the architecture of the neural network renormalization group (neural RG) proposed in Ref. [29]. Its structure resembles the MERA network [44], as depicted in Fig. 2(a). Each RG step contains a layer of disentangler blocks (like a CNN convolutional layer) to resolve local correlations, and a layer of decimator blocks (like a CNN pooling layer) to separate the renormalized and decimated variables. Given that the space-time dimension is two on the boundary, we can overlay decimators on top of disentanglers in an interweaving manner, as shown in Fig. 2(b). Both the disentangler and the decimator are made of three bijective layers: a linear scaling layer, an orthogonal transformation layer, and an invertible nonlinear activation layer, as arranged in Fig. 3 (see Appendix B for more details). They are designed to be invertible, nonlinear, and U(1)-symmetric transformations, which are used to model generic RG transformations for the complex φ⁴ model. The bijector parameters are subject to training (the training process and the number of parameters are specified in Appendix C). The Jacobian matrices of these transformations are calculable. After each decimator, only one renormalized variable flows to the next RG layer, and the other three decimated variables are positioned into the bulk, as the little crosses shown in Figs. 2(a) and 2(b). The entire network constitutes an EHM between the original boundary field φ(x) and the dual field ζ(x, z) in the holographic bulk.
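For concreteness, the lattice action Eq. (12) can be evaluated directly. In the sketch below, two details are our assumptions rather than statements from the text: periodic boundary conditions, and taking the real part of the hopping sum, which keeps the action real and reproduces the quoted XY limit.

```python
import numpy as np

def S_QFT(phi, t=1.0, lam=25.0):
    """Eq. (12) on a periodic L x L lattice. Each nearest-neighbor bond <ij> is
    counted once via the right and down neighbors; the real part keeps the
    action real (both conventions are our assumptions here)."""
    mu = -200.0 + 2.0 * t
    hop = (phi.conj() * (np.roll(phi, -1, axis=0) + np.roll(phi, -1, axis=1))).real
    return -t * hop.sum() + (mu * np.abs(phi)**2 + lam * np.abs(phi)**4).sum()

# The deep Mexican-hat potential pins |phi_i| near sqrt(rho) = 2: among uniform
# configurations, radius 2 has by far the lowest action per site.
L = 32
for r in (1.0, 2.0, 3.0):
    print(r, S_QFT(r * np.ones((L, L), dtype=complex)) / L**2)  # minimum at r = 2
```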


FIG. 2. (a) Side view of the neural-RG network. x is the spatial dimension(s) and z corresponds to the RG scale. There are two types of blocks: disentanglers (dark green) and decimators (light yellow). The network forms an EHM between the boundary variables (blue dots) and the bulk variables (red crosses). (b) Top view of one RG layer in the network. Disentanglers and decimators interweave in the space-time (taking two-dimensional space-time, for example). Each decimator pushes the coarse-grained variable (black dot) to the higher layer and leaves the decimated variables (red crosses) in the holographic bulk. (c) The training contains two stages. In the first stage, we fix the prior distribution P[ζ] to be uncorrelated Gaussian and train the EHM G to bring it to the Boltzmann distribution of the CFT. In the second stage, we learn the prior distribution with the trained EHM held fixed. (d) The behavior of the loss function L in the two training stages.

FIG. 3. Neural network architecture within a decimator block (the disentangler block shares the same architecture). Starting from the renormalized variable φ′ and the bulk noise ζ_1, ζ_2, ζ_3 as complex variables, the Re and Im channels are first separated; then S applies the scaling separately to the four variables within each channel, and O implements the O(4) transformation that mixes the four variables together. S and O are identical for the Re and Im channels to preserve the U(1) symmetry. Then the channels merge into complex variables, followed by an element-wise nonlinear activation described by an invertible U(1)-symmetric map φ_i → (φ_i/|φ_i|) sinh|φ_i|.

We start with a 32 × 32 square lattice as the holographic boundary and build up the neural-RG network. The network will have five layers in total. Since the boundary field theory has a global U(1) symmetry, the bijectors in the neural network are designed to respect the U(1) symmetry (see Fig. 3), such that the bulk field also preserves the U(1) symmetry. The training will be divided into two stages, as pictured in Fig. 2(c). In the training stage I, we fix the prior distribution to Eq. (4) and train the network parameters in the generative model G to minimize the loss function L. The training method is outlined below Eq. (5). The loss function L decays with the training steps; its typical behavior is shown in Fig. 2(d). We will discuss the stage II training later. We perform the stage I training for several neural networks at different T separately, i.e., we use S_QFT[φ] of different parameters to train different neural networks. After training, each neural network can efficiently generate configurations of the boundary field φ from the bulk uncorrelated Gaussian field ζ. To test how well these generative models work, we measured the order parameter ⟨φ⟩ and the correlation function ⟨φ_i^* φ_j⟩ using the field configurations generated by the neural network. Although the order parameter ⟨φ⟩ is expected to vanish in the thermodynamic limit, for our finite-size system it is not vanishing and can exhibit a crossover around the KT transition, as shown in Fig. 4(a).

FIG. 4. Performance of the trained EHM for the complex φ⁴ theory. (a) Order parameter ⟨φ⟩ vs temperature T. Different models are trained separately at different temperatures. For a finite-sized system, ⟨φ⟩ crosses over to zero around the KT transition. (b), (c) Correlation function ⟨φ_i^* φ_j⟩ scaling in a log-log plot (b) and a log-linear plot (c). (d), (e) Distribution of φ_i in a single sample generated by the neural network trained in (d) the algebraic liquid phase and (e) the disordered phase.
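The diagnostics of Fig. 4 are simple ensemble averages over generated samples. A sketch of the two measurements follows; the sampler `model.sample` named in the comment is a hypothetical stand-in for the trained network, not an API defined in this paper.

```python
import numpy as np

def order_parameter(samples):
    # samples: complex array of shape (n_samples, L, L).
    # Spatial average of phi in each sample, then |.| averaged over samples:
    # a finite-size proxy for the order parameter of Fig. 4(a).
    return np.mean(np.abs(samples.mean(axis=(1, 2))))

def correlation(samples, r):
    # <phi_i^* phi_j> averaged over all site pairs at horizontal separation r
    # (periodic boundaries assumed).
    return np.mean((samples.conj() * np.roll(samples, -r, axis=2)).real)

# Usage with the trained model (hypothetical API):
#   samples = model.sample(n=1000)
#   corr = [correlation(samples, r) for r in range(1, 16)]
# A straight line of ln(corr) vs ln(r) indicates the algebraic phase, Fig. 4(b).

# Quick sanity check on synthetic uncorrelated phases (disordered-like data):
rng = np.random.default_rng(0)
fake = 2.0 * np.exp(2j * np.pi * rng.random((100, 32, 32)))
print(order_parameter(fake), correlation(fake, 1))  # both ~0
```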


FIG. 5. The boundary field configuration φ before (left) and after (right) a local update in the most IR layer of the bulk field ζ. The complex field φ_i is represented by the small arrow on each site. The background color represents the vorticity. The inset shows the distribution of φ_i in the complex plane.

The crossover temperature T ≈ 0.9 agrees with the previous Monte Carlo studies [45–47] of the KT transition temperature T_KT = 0.8929 in the two-dimensional XY model. We measure the correlation function ⟨φ_i^* φ_j⟩ at two different temperatures: one at T = 0.5 in the algebraic liquid phase, and one at T = 1.0 in the disordered phase. We plot the two-point function ⟨φ_i^* φ_j⟩ as a function of the Euclidean distance r_ij ≡ |x_i − x_j| (on the square lattice), on both a log-log scale, as in Fig. 4(b), and a log-linear scale, as in Fig. 4(c). The comparison shows that the correlation function in the algebraic liquid (or the disordered) phase fits better to the power-law (or the exponential) decay. Figure 4(d) shows the statistics of φ_i in one sample generated by the machine trained in the algebraic liquid phase. It exhibits the "spontaneous symmetry breaking" behavior due to the finite-size effect, although accumulating over multiple samples will restore the U(1) symmetry. By contrast, the similar plot in Fig. 4(e) for the disordered phase respects the U(1) symmetry in every single sample. Based on these tests, we conclude that the neural network has learned to generate field configurations φ(x) that reproduce the correct physics of the complex φ⁴ model. The trained generative model G maps an almost uncorrelated bulk field ζ to a correlated boundary field φ, and vice versa; therefore G provides a good EHM for the φ⁴ theory.

The learned EHM can be useful in both the backward and forward directions. The backward mapping from bulk to boundary provides efficient sampling of the CFT configurations, which can be used to boost the Monte Carlo simulation of the CFT. The forward mapping from boundary to bulk enables direct inference of bulk field configurations, allowing us to study the bulk effective theory and to probe the bulk geometry.

Let us first discuss the sampling task. The EHM establishes a mapping between the massive bulk field ζ and the massless boundary field φ. The bulk field admits efficient sampling in terms of local updates, because it is uncorrelated (or short-range correlated). Local updates in the bulk get mapped to global updates on the boundary, which allows us to sample the critical boundary field efficiently, minimizing the effect of critical slowdown. To demonstrate this, we first generate one field configuration, as shown in the left panel of Fig. 5. Then we push this field configuration back into the bulk field ζ. We tweak the bulk field in the most IR degree of freedom by adding a small random Gaussian number, whose variance is five times smaller than the variance of the bulk field degree of freedom. Under the EHM, we observe a global change of the boundary field configuration, as shown in Fig. 5. It is interesting to note that the change of the IR bulk field basically induces a global U(1) rotation of φ_i (see the insets of Fig. 5), which corresponds to the "Goldstone mode" associated with the "spontaneous symmetry breaking" in the finite-sized system, showing that the machine can identify the order parameter as the relevant IR degree of freedom without prior knowledge about the low-energy modes of the system. We also check that the Hamiltonian Monte Carlo sampling in the bulk converges much faster than applying the same algorithm on the boundary (see Appendix D for more evidence). In connection to several recent works, our neural-RG architecture can be integrated into self-learning Monte Carlo approaches [48–54] to boost the numerical efficiency in simulating CFTs. The inverse RG transformation can also be used to generate super-resolution samples [55] for finite-size extrapolation of thermodynamic observables.

Now let us turn to the inference task. We can use the optimal EHM to push the boundary field back into the bulk and investigate the effective bulk theory S_eff[ζ] induced by the boundary CFT. As analyzed below Eq. (11), the mismatch between P_post and P_target will give rise to residual correlation (mutual information) of the bulk-matter field, which can be used to probe the holographic bulk geometry. Assuming an emergent locality in the holographic bulk, the expectation is that the bulk effective theory S_eff[ζ] will take the following form in the continuum limit:

    S_eff[ζ] = ∫_M ( g^{μν} ∂_μ ζ^* ∂_ν ζ + m²|ζ|² + u|ζ|⁴ + ··· ),    (13)

which describes the bulk field ζ on a curved space-time background M equipped with the metric tensor g_{μν}. Strictly speaking, ζ is not a single field but contains a tower of fields corresponding to different primary operators in the CFT. We choose to focus on the lightest component and model it by a scalar field, as it will dominate the bulk mutual information at large scale. Because the bulk field excitation is massive and cannot propagate far, we expect the mutual information between the bulk variables at two different points to decay exponentially with their geodesic distance in the bulk. Following this idea, suppose ζ_i = ζ(x_i, z_i) and ζ_j = ζ(x_j, z_j) are two bulk field variables; then their distance d(ζ_i : ζ_j) can be inferred from their mutual information I(ζ_i : ζ_j) as follows:

    d(ζ_i : ζ_j) = −ξ ln [ I(ζ_i : ζ_j) / I_0 ],    (14)

where the correlation length ξ and the information unit I_0 are global fitting parameters.

To estimate the mutual information among bulk field variables, we take a quadratic approximation of the bulk effective action, S_eff[ζ] ≈ Σ_{ij} ζ_i^* K_{ij} ζ_j = ζ† K ζ, ignoring the higher-order interactions of ζ for now. This amounts to relaxing the prior distribution of the bulk field ζ to a correlated Gaussian distribution

    P_prior[ζ] = det(2π K^{-1})^{-1/2} e^{-ζ† K ζ}.    (15)
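Sampling from the correlated Gaussian of Eq. (15) with gradients flowing through the sampler is exactly the reparametrization trick used in the stage II training described next. A minimal sketch follows; the chain kernel below is a toy placeholder, not the trained bulk kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
# Toy positive-definite kernel (a 1D chain), standing in for the bulk kernel K.
K = 2.0 * np.eye(n) - 0.6 * (np.eye(n, k=1) + np.eye(n, k=-1))

# Reparametrization: write zeta ~ exp(-zeta^dag K zeta) as a deterministic
# transform of standard complex noise u, zeta = (L^dag)^{-1} u with K = L L^dag,
# so that gradients with respect to K can propagate through the sampler.
Lc = np.linalg.cholesky(K)
u = (rng.standard_normal((n, 20000)) + 1j * rng.standard_normal((n, 20000))) / np.sqrt(2)
zeta = np.linalg.solve(Lc.conj().T, u)

# Check: the sample covariance <zeta zeta^dag> approaches K^{-1}, the bulk
# correlation that enters Eq. (16).
cov = zeta @ zeta.conj().T / zeta.shape[1]
print(np.max(np.abs(cov - np.linalg.inv(K))))  # small
```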


The kernel matrix K is carefully designed to ensure positivity and bulk locality (see Appendix E for more details). To determine the best fit of K, we initiate the stage II training to learn the prior distribution, with the EHM fixed at its optimal solution obtained in the stage I training, as illustrated in Fig. 2(c). We use the reparametrization trick [56] to sample the bulk field ζ from the correlated Gaussian in Eq. (15); then ζ is pushed to the boundary by the fixed EHM to evaluate the loss function L in Eq. (5), and the gradient signal can back-propagate to train the kernel K. As we relax the Gaussian kernel K for training, the loss function continues to drop in stage II, as shown in Fig. 2(d). This indicates that the Gaussian model is learning to capture the residual bulk field correlation (at least partially), such that the overall performance of generation gets improved. One may wonder why we do not train the generative model G and the bulk field distribution P_prior[ζ] jointly. This is because there is a trade-off between these two objectives. For example, one can weaken the disentanglers in G and push more correlation into the bulk field distribution P_prior[ζ]. Such a trade-off would undermine our objective of minimizing the bulk mutual information in training a good EHM; therefore the two training stages should be separated, or at least assigned very different learning rates. Intuitively, the machine learns the background geometry in the stage I training and the bulk field theory (to the quadratic order) in the stage II training. The trade-off between the two training stages resembles the interchangeable roles between matter and space-time geometry in a gravity theory.

After the stage II training, we obtain the fitted kernel matrix K. The mutual information I(ζ_i : ζ_j) can be evaluated from

    I(ζ_i : ζ_j) = −(1/2) ln ( 1 − |⟨ζ_i^* ζ_j⟩|² / (⟨ζ_i^* ζ_i⟩⟨ζ_j^* ζ_j⟩) ),    (16)

where the bulk correlation ⟨ζ_i^* ζ_j⟩ = (K^{-1})_{ij} is simply given by the inverse of the kernel matrix K. Then we can measure the holographic distance d(ζ_i : ζ_j) between any pair of bulk variables ζ_i and ζ_j following Eq. (14). To probe the bulk geometry, we further define the distance between two decimators A and B to be the average distance between all pairs of bulk variables separately associated with them,

    D(A : B) = avg_{ζ_i∈A, ζ_j∈B} d(ζ_i : ζ_j).    (17)

The result is presented in Fig. 6(a).

FIG. 6. (a) Distance matrix D(A : B), indexed by the decimator indices A, B, obtained based on Eq. (17). (b) Visualization of the bulk geometry by multidimensional scaling, projected to the leading three principal dimensions. Each point represents a decimator in the neural network, colored according to layers from UV to IR. The neighboring UV-IR links are added to guide the eye. Panels (c) and (d) show the distance scaling along (c) the angular and (d) the radial direction.
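Equations (14), (16), and (17) translate directly into code. A sketch follows; the toy chain kernel stands in for the fitted K, and ξ, I_0 would be fitted globally in practice.

```python
import numpy as np

def mutual_information(Kinv, i, j):
    # Eq. (16), with the bulk correlation <zeta_i^* zeta_j> = (K^{-1})_ij.
    rho2 = abs(Kinv[i, j])**2 / (Kinv[i, i].real * Kinv[j, j].real)
    return -0.5 * np.log(1.0 - rho2)

def distance(Kinv, i, j, xi=1.0, I0=1.0):
    # Eq. (14): geodesic distance from the decay of mutual information.
    return -xi * np.log(mutual_information(Kinv, i, j) / I0)

def decimator_distance(Kinv, A, B, xi=1.0, I0=1.0):
    # Eq. (17): average over all pairs of bulk variables attached to A and B.
    return np.mean([distance(Kinv, i, j, xi, I0) for i in A for j in B])

# Toy usage: bulk variables on a chain; more distant groups are farther apart.
K = 2.0 * np.eye(8) - 0.6 * (np.eye(8, k=1) + np.eye(8, k=-1))
Kinv = np.linalg.inv(K)
print(decimator_distance(Kinv, A=[0, 1], B=[2, 3]),
      decimator_distance(Kinv, A=[0, 1], B=[6, 7]))  # the latter is larger
```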
To visualize the bulk geometry qualitatively, we perform a multidimensional scaling to obtain a three-dimensional embedding of the decimators in Fig. 6(b). One can see that a hyperbolic geometry emerges in the bulk. To be more quantitative, we label each decimator by three coordinates (x_1, x_2, z), where x = (x_1, x_2) denotes its center position projected to the holographic boundary and z = 2^l is related to its layer depth l (ascending from UV to IR). We find that the measured distance function follows the scaling behavior

    D(x_1, x_2, z : x_1 + r, x_2, z) ∝ ln r,
    D(x_1, x_2, z : x_1, x_2, z + r) ∝ r,    (18)

as demonstrated in Figs. 6(c) and 6(d). These scaling behaviors agree with the geometry of the three-dimensional hyperbolic space H³, which corresponds to the AdS₃ space-time under the Wick rotation of the time dimension. This indicates that the emergent bulk geometry is indeed hyperbolic at the classical level.

Our result demonstrates that the Luttinger liquid CFT can be approximately dual to a massive scalar field on an AdS₃ background geometry. The duality is only approximate because we have assumed a classical geometry in the bulk, ignoring all the gravitational fluctuations. In the AdS₃/CFT₂ correspondence, the bulk gravitational coupling G = 3/(2c) is inversely proportional to the central charge c of the CFT [57]. The Luttinger liquid CFT has a relatively small central charge c = 1, and hence a large gravitational coupling in the bulk, so we should not expect a classical dual description. It would be more appropriate to consider holographic CFTs, which admit classical duals. However, our current method only applies to lattice field theories of bosons with explicit action functionals, which prevents us from studying interesting holographic CFTs. Generalizing the neural RG approach to involve fermions and gauge fields and to work with continuous space-time will be important directions for future development.
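The visualization step behind Fig. 6(b) is classical multidimensional scaling of the distance matrix D of Eq. (17); a self-contained sketch:

```python
import numpy as np

def classical_mds(D, dim=3):
    """Embed points with pairwise distance matrix D into `dim` dimensions
    via double centering and the leading eigenvectors (classical MDS)."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D**2) @ J              # Gram matrix of the centered points
    w, v = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:dim]        # leading principal dimensions
    return v[:, idx] * np.sqrt(np.clip(w[idx], 0, None))

# Sanity check: MDS recovers a circle from its Euclidean distance matrix.
theta = np.linspace(0, 2 * np.pi, 20, endpoint=False)
pts = np.c_[np.cos(theta), np.sin(theta)]
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
X = classical_mds(D, dim=2)
print(np.allclose(np.linalg.norm(X, axis=1), 1.0))  # True (up to rotation)

# For Fig. 6(b), D would be the decimator distance matrix of Eq. (17):
#   X = classical_mds(D_decimators, dim=3)
```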

V. SUMMARY AND DISCUSSIONS

In conclusion, we introduced the neural RG algorithm to allow automated construction of the EHM by machine learning instead of human design. Previously, the EHM was only designed for free-fermion CFT. Using machine learning approaches, we are able to develop more general EHMs that also apply to interacting field theories. Given the QFT action as input, the machine effectively digests the information contained in the action and encodes it into the structure of the EHM network, which represents the emergent holographic geometry. Our result provides a concrete example that the holographic space-time geometry can emerge as the optimal generative network of a quantum field theory [58]. The obtained EHM simultaneously provides an information-preserving RG scheme and a generative model to reproduce the QFT, which could be useful for both inference and sampling tasks.

However, as a version of the EHM, our approach also bears the limitations of the EHM. By construction, the bulk geometry is discrete and classical, such that the model cannot resolve the sub-AdS geometry and cannot capture gravitational fluctuations. Recent developments of neural ordinary differential equation approaches [59–61] are natural ways to extend our flow-based generative model to the continuum limit. Continuous formulations of real-space RG have been discussed in the context of gradient flows [62–64] and trivializing maps [65], where the RG flow equations are human designed. Our research may pave the way for machine-learned RG flow equations for continuous holographic mappings. Our formalism also allows the inclusion of gravitational fluctuations in principle, by relaxing the optimization to allow superposition of different EHMs. We conjecture that the bulk gravitational fluctuation could be partially captured by the fluctuation of neural network parameters. The learned EHM provides us a starting point to investigate the corrections on top of the classical geometry approximation, which may enable us to go beyond holographic CFTs and study the quantum gravity dual of generic QFTs. Another feature of the EHM is that it is a one-to-one mapping of field configurations (operators) between bulk and boundary, while in holographic duality a local bulk operator can be mapped to multiple boundary operators in different regions. A resolution [66,67] of the paradox is that the nonunique bulk-boundary correspondence only applies to the low-energy degrees of freedom in the bulk, which can be encoded on the boundary in a redundant and error-correcting manner. The bidirectional holographic code (BHC) [68,69] was proposed as an extension of the EHM to capture the error-correction property of the holographic mapping. Extending our current network design to realize a machine-learned BHC will be another open question for future research.

ACKNOWLEDGMENTS

We acknowledge stimulating discussions with Xiao-Liang Qi, John McGreevy, Maciej Koch-Janusz, Cédric Bény, Koji Hashimoto, Wenbo Fu, and Shang Liu. S.-H.L. and L.W. were supported by the National Natural Science Foundation of China under Grant No. 11774398 and the Strategic Priority Research Program of Chinese Academy of Sciences Grant No. XDB28000000. H.-Y.H. and Y.-Z.Y. are supported by a startup fund from UCSD. S.-H.L. and Y.-Z.Y. also benefit from the 2019 Swarma Club Study Camp supported by the Kaifeng Foundation.

APPENDIX A: MINIMAL BULK MUTUAL INFORMATION PRINCIPLE

The maximal real-space mutual information (maxRMI) principle proposed in Refs. [30,38] aims to maximize the mutual information between the coarse-grained field and the fine-grained field in the surrounding environment at a single RG step. In this Appendix, we show that the maxRMI principle can be derived from our minimal bulk mutual information (minBMI) principle under certain assumptions.

Let us set up the problem based on Fig. 7. Assume φ_A and φ_B are field configurations in two neighboring regions A and B in the UV layer. Under one step of the RG transformation, φ_A gets mapped to the coarse-grained variable φ′_A and the bulk variable ζ_A, and the mapping is bijective. Similarly, another bijection takes φ_B to φ′_B and ζ_B. Eventually, φ′_A and φ′_B will be mapped to the bulk field ζ_C in deeper IR layers. Therefore, the random variables that appear in Fig. 7 are related by the following bijections f_A, f_B, f_C as

    (φ′_A, ζ_A) = f_A(φ_A),  (φ′_B, ζ_B) = f_B(φ_B),  ζ_C = f_C(φ′_A, φ′_B).    (A1)

FIG. 7. Functional dependence of variables in the neural-RG network. Each block represents a bijective map.

What are the information theoretic principles that guide the bijections f_A, f_B, f_C toward good RG transformations? We propose the minBMI principle: these bijections should minimize the mutual information among the bulk variables,

    min I(ζ_A : ζ_B) + I(ζ_A : ζ_C) + I(ζ_B : ζ_C).    (A2)

References [30,38] propose another principle, the maxRMI principle, whereby the RG transformation should maximize the mutual information between the coarse-grained variable (such as φ′_A) and its environment (such as φ_B),

    max I(φ′_A : φ_B).    (A3)


We can show that the objective of the maxRMI principle in Eq. (A3) is consistent with the objective of the minBMI principle in Eq. (A2) in the limit of UV-IR decoupling.

The minBMI principle aims to minimize the mutual information among all bulk variables, both between different RG scales and within the same RG scale. Its objective has a broader scope than the maxRMI principle, because the latter does not specify its objectives across the RG scales. So to make a connection between these two principles, one must first restrict the scope of the minBMI principle to a single layer. This can be achieved by assuming that there is no mutual information between bulk variables at different RG scales. In our setup, this corresponds to I(ζ_A, ζ_B : ζ_C) = 0, which factorizes the joint probability p(ζ_A, ζ_B, ζ_C) = p(ζ_A, ζ_B) p(ζ_C) and decouples the bulk variables between UV and IR. As a result, the mutual information between any bulk variables across different RG scales vanishes, I(ζ_A : ζ_C) = I(ζ_B : ζ_C) = 0. This already minimizes the bulk mutual information across layers and reduces the minBMI objective in Eq. (A2) to

    min I(ζ_A : ζ_B).    (A4)
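As an aside (our own illustration, not part of the original argument), for jointly Gaussian variables all the mutual informations above have the closed form I(X : Y) = ½(ln det Σ_XX + ln det Σ_YY − ln det Σ), with Σ the joint covariance; this is also the origin of Eq. (16) in the main text. The sketch below additionally checks the invariance of mutual information under bijections (here, linear ones) that underlies the steps of Eq. (A5).

```python
import numpy as np

def gaussian_mi(Sigma, idx_x, idx_y):
    # I(X:Y) = 0.5*(ln det S_xx + ln det S_yy - ln det S_joint) for real
    # jointly Gaussian variables with covariance matrix Sigma.
    ld = lambda M: np.linalg.slogdet(M)[1]
    return 0.5 * (ld(Sigma[np.ix_(idx_x, idx_x)])
                  + ld(Sigma[np.ix_(idx_y, idx_y)])
                  - ld(Sigma[np.ix_(idx_x + idx_y, idx_x + idx_y)]))

# Invariance under an invertible map acting on one argument, as used in Eq. (A5):
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
Sigma = M @ M.T + np.eye(4)              # covariance of (phi_A, phi_B), 2+2 dims
W = rng.standard_normal((2, 2))          # invertible linear "RG step" on phi_A
T = np.block([[W, np.zeros((2, 2))], [np.zeros((2, 2)), np.eye(2)]])
print(gaussian_mi(Sigma, [0, 1], [2, 3]),
      gaussian_mi(T @ Sigma @ T.T, [0, 1], [2, 3]))  # equal
```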

In this UV-IR decoupled limit, we can prove that max I(φ′_A : φ_B) and min I(ζ_A : ζ_B) are equivalent. The proof starts by considering the mutual information between φ_A and φ_B. We can see that

    I(φ_A : φ_B) = I(φ′_A, ζ_A : φ_B)
                 = I(φ′_A : φ_B) + I(ζ_A : φ_B)
                 = I(φ′_A : φ_B) + I(ζ_A : φ′_B, ζ_B)
                 = I(φ′_A : φ_B) + I(ζ_A : φ′_B) + I(ζ_A : ζ_B)
                 = I(φ′_A : φ_B) + I(ζ_A : ζ_B).    (A5)

Here we have used the bijective property of f_A, f_B, f_C to obtain I(φ_A : φ_B) = I(φ′_A, ζ_A : φ_B), I(ζ_A : φ_B) = I(ζ_A : φ′_B, ζ_B), and I(ζ_A : ζ_C) = I(ζ_A : φ′_A, φ′_B). In the UV-IR decoupled limit, I(ζ_A : ζ_C) = 0, so I(ζ_A : φ′_A, φ′_B) = 0, which further implies I(ζ_A : φ′_A) = I(ζ_A : φ′_B) = 0. With these relations, all steps in Eq. (A5) are justified. On the left-hand side, I(φ_A : φ_B) is determined by the field theory in the UV layer, which can be treated as a constant. For the given amount of information between regions A and B, Eq. (A5) tells us that I(φ′_A : φ_B) and I(ζ_A : ζ_B) are competing for information resources. Therefore maximizing I(φ′_A : φ_B) is equivalent to minimizing I(ζ_A : ζ_B). We can apply this argument layer by layer. Then, to achieve the objective of the maxRMI principle, we need to minimize the mutual information among bulk variables at the same RG scale, which is precisely the statement of the minBMI principle when restricted to each layer. In this sense, the maxRMI and minBMI principles are consistent. However, the minBMI principle actually relaxes the assumption that bulk variables at different RG scales are fully decoupled. Instead, we want to minimize the mutual information among all bulk variables, including those across the scales. In this sense, the minBMI principle is more general than the maxRMI principle.

FIG. 8. Orthogonal transformation.

APPENDIX B: DESIGN OF BIJECTORS

We designed a set of symmetry-preserving bijectors to make sure that the U(1) symmetry of the boundary theory is preserved by each bijector. For the generative process, each RG step takes four complex degrees of freedom through three layers of bijectors: S, O, and A.

(I) Scaling layer (S): At the scaling layer, each complex variable φ_i is multiplied by a factor e^{λ_i}. The inverse and the Jacobian of this transformation can be obtained easily.

(II) Orthogonal transformation layer (O): The orthogonal transformation in the disentangler and decimator is in general an O(4) transformation. Instead, we implemented it by stacking multiple O(2) transformations. In Fig. 8(a), each blue block represents the matrix

    M_blue(θ_i) = [ sin θ_i    cos θ_i
                    cos θ_i   −sin θ_i ],    (B1)

and the orange block in Fig. 8(b) represents the matrix

    M_orange(θ_i) = [ cos θ_i   −sin θ_i
                      sin θ_i    cos θ_i ].    (B2)

The θ_i in those blocks are training parameters. The arrangement of the type I and II blocks is so designed that for M_blue(θ_i = π/4) and M_orange(θ_i = 0) the network reproduces the ideal EHM originally proposed in Ref. [13]. We initialize the parameters to this ideal limit.

(III) Nonlinear layer (A): For the nonlinear part, we use the amplitude hyperbolic function for the complex field φ_i. In the coarse-graining direction, it acts in the following way:

    Re(ζ) = sinh|φ| Re(φ)/|φ|,
    Im(ζ) = sinh|φ| Im(φ)/|φ|.    (B3)

The corresponding inverse and Jacobian can be calculated easily.
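A sketch of the three block types of this Appendix, at the level of a single variable (the assembly into the interweaving O(4) stacks follows Fig. 8 and is omitted here):

```python
import numpy as np

def M_blue(theta):   # Eq. (B1)
    return np.array([[np.sin(theta),  np.cos(theta)],
                     [np.cos(theta), -np.sin(theta)]])

def M_orange(theta): # Eq. (B2)
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def scaling(phi, lam):
    # Scaling layer (S): multiply each complex variable by exp(lam).
    return np.exp(lam) * phi

def nonlinear(phi):
    # Nonlinear layer (A), Eq. (B3): zeta = (phi/|phi|) sinh|phi|.
    r = np.abs(phi)
    return phi * np.sinh(r) / np.where(r > 0, r, 1.0)

def nonlinear_inv(zeta):
    r = np.abs(zeta)
    return zeta * np.arcsinh(r) / np.where(r > 0, r, 1.0)

# Both O(2) blocks are orthogonal (|det| = 1), so only S and A contribute to
# the log-Jacobian; for the radial map f(r) = sinh(r) acting on the complex
# plane, det J = f'(r) f(r) / r = cosh(r) sinh(r) / r.
phi = np.array([0.3 + 0.4j, -1.2 + 0.1j])
print(np.allclose(nonlinear_inv(nonlinear(phi)), phi))      # True: invertible
print(np.allclose(M_blue(0.7) @ M_blue(0.7).T, np.eye(2)))  # True: orthogonal
```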


APPENDIX C: NEURAL NETWORK TRAINING

All the training parameters of our neural RG network are contained in the scaling bijectors and the orthogonal transformation bijectors, as illustrated in Appendix B. We imposed translation invariance of our network at each layer, due to the translation invariance of the system at each energy scale. The total number of training parameters scales as O(log N), where N is the size of the boundary theory. The prefactor depends on the depth of the bijector neural networks; in our case, the total number of training parameters is 24 log N. For faster convergence of the training, we first set the learning rate for the parameters contained in the scaling bijectors to 10^{-2} and gradually reduce it to 10^{-4}. The learning rate for the parameters contained in the orthogonal transformation bijectors is always 10^{-4}.

APPENDIX D: MONTE CARLO SAMPLING EFFICIENCY

We tested the numerical efficiency of our method by comparing the convergence rate of Hamiltonian Monte Carlo (HMC) on the boundary system with that of HMC in the bulk system. Both methods are implemented using the TENSORFLOW Probability API with the same parameters. The result is shown in Fig. 9. As we can see, the HMC in the bulk system converges faster than the HMC on the boundary system.

FIG. 9. Hamiltonian Monte Carlo (HMC) simulations in the latent space and the physical space.

APPENDIX E: DESIGN OF THE CORRELATED GAUSSIAN PRIOR

In finding the effective bulk field theory, we assume the bulk field is very massive. Under this assumption, higher-order interaction terms are irrelevant. Therefore, we use a correlated Gaussian distribution with a positive definite kernel matrix K as our effective bulk field theory. We also assumed locality of the effective bulk field theory, which means that K_{ij} is nonzero if and only if ζ_i and ζ_j are nearest neighbors in the bulk, including both interscale and intrascale neighbors. To further reduce the fitting parameters of the matrix K, we also imposed translation invariance of the bulk field at each scale. This is reasonable, because our RG scheme also has translation invariance at each scale.

To ensure the matrix K is positive definite, we decomposed K into a set of positive semidefinite matrices and a mass term. Particularly,

    K = Σ_{⟨ij⟩} λ_{ij}² ( |i⟩⟨i| + |j⟩⟨j| − |i⟩⟨j| − |j⟩⟨i| ) + m I,    (E1)

where I is the identity matrix, and λ_{ij} and m are positive numbers. This ensures that the matrix K we constructed is positive definite.
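A sketch of the construction Eq. (E1); the chain of bonds and the coupling values below are toy placeholders for the actual bulk connectivity.

```python
import numpy as np

def build_kernel(bonds, lambdas, m, n):
    """Eq. (E1): each bond <ij> contributes the positive-semidefinite matrix
    lambda_ij^2 (|i><i| + |j><j| - |i><j| - |j><i|), i.e., a graph-Laplacian
    term lambda_ij^2 (e_i - e_j)(e_i - e_j)^T; the mass term m*I with m > 0
    then makes K strictly positive definite."""
    K = m * np.eye(n)
    for (i, j), lam in zip(bonds, lambdas):
        K[i, i] += lam**2
        K[j, j] += lam**2
        K[i, j] -= lam**2
        K[j, i] -= lam**2
    return K

rng = np.random.default_rng(0)
bonds = [(i, i + 1) for i in range(5)]           # toy bulk graph: a chain
K = build_kernel(bonds, rng.random(5), m=0.1, n=6)
print(np.all(np.linalg.eigvalsh(K) > 0))         # True: K is positive definite
```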

[1] E. Witten, Anti-de Sitter space and holography, Adv. Theor. Math. Phys. 2, 253 (1998).
[2] E. Witten, Anti-de Sitter space, thermal phase transition, and confinement in gauge theories, Adv. Theor. Math. Phys. 2, 505 (1998).
[3] S. S. Gubser, I. R. Klebanov, and A. M. Polyakov, Gauge theory correlators from non-critical string theory, Phys. Lett. B 428, 105 (1998).
[4] J. Maldacena, The large-N limit of superconformal field theories and supergravity, Int. J. Theor. Phys. 38, 1113 (1999).
[5] J. de Boer, E. Verlinde, and H. Verlinde, On the holographic renormalization group, J. High Energy Phys. 08 (2000) 003.
[6] K. Skenderis, Lecture notes on holographic renormalization, Classical Quantum Gravity 19, 5849 (2002).
[7] I. Heemskerk and J. Polchinski, Holographic and Wilsonian renormalization groups, J. High Energy Phys. 06 (2011) 031.
[8] B. Swingle, Constructing holographic spacetimes using entanglement renormalization, arXiv:1209.3304.
[9] B. Swingle, Entanglement renormalization and holography, Phys. Rev. D 86, 065007 (2012).
[10] M. Nozaki, S. Ryu, and T. Takayanagi, Holographic geometry of entanglement renormalization in quantum field theories, J. High Energy Phys. 10 (2012) 193.
[11] V. Balasubramanian, M. Guica, and A. Lawrence, Holographic interpretations of the renormalization group, J. High Energy Phys. 01 (2013) 115.
[12] L. P. Kadanoff, Scaling laws for Ising models near T_c, Physics (Long Island City, N.Y.) 2, 263 (1966).
[13] X.-L. Qi, Exact holographic mapping and emergent space-time geometry, arXiv:1309.6282.
[14] C. H. Lee and X.-L. Qi, Exact holographic mapping in free fermion systems, Phys. Rev. B 93, 035112 (2016).
[15] Y. Gu, C. H. Lee, X. Wen, G. Y. Cho, S. Ryu, and X.-L. Qi, Holographic duality between (2+1)-d quantum anomalous Hall state and (3+1)-d topological insulators, Phys. Rev. B 94, 125107 (2016).
[16] Y. LeCun, Y. Bengio, and G. Hinton, Deep learning, Nature (London) 521, 436 (2015).
[17] C. Bény, Deep learning and the renormalization group, arXiv:1301.3124.
[18] P. Mehta and D. J. Schwab, An exact mapping between the variational renormalization group and deep learning, arXiv:1410.3831.
[19] C. Bény and T. J. Osborne, The renormalization group via statistical inference, New J. Phys. 17, 083005 (2015).
[20] D. Oprisa and P. Toth, Criticality & deep learning II: Momentum renormalisation group, arXiv:1705.11023.


[21] H. W. Lin, M. Tegmark, and D. Rolnick, Why does deep and cheap learning work so well? J. Stat. Phys. 168, 1223 (2017).
[22] W.-C. Gan and F.-W. Shu, Holography as deep learning, Int. J. Mod. Phys. D 26, 1743020 (2017).
[23] G. Torlai and R. G. Melko, Learning thermodynamics with Boltzmann machines, Phys. Rev. B 94, 165134 (2016).
[24] L. Wang, Discovering phase transitions with unsupervised learning, Phys. Rev. B 94, 195105 (2016).
[25] E. P. L. van Nieuwenburg, Y.-H. Liu, and S. D. Huber, Learning phase transitions by confusion, Nat. Phys. 13, 435 (2017).
[26] G. Carleo and M. Troyer, Solving the quantum many-body problem with artificial neural networks, Science 355, 602 (2017).
[27] J. Carrasquilla and R. G. Melko, Machine learning phases of matter, Nat. Phys. 13, 431 (2017).
[28] G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G. Carleo, Neural-network quantum state tomography, Nat. Phys. 14, 447 (2018).
[29] S.-H. Li and L. Wang, Neural Network Renormalization Group, Phys. Rev. Lett. 121, 260601 (2018).
[30] M. Koch-Janusz and Z. Ringel, Mutual information, neural networks and the renormalization group, Nat. Phys. 14, 578 (2018).
[31] Y.-Z. You, Z. Yang, and X.-L. Qi, Machine learning spatial geometry from entanglement features, Phys. Rev. B 97, 045153 (2018).
[32] K. Hashimoto, S. Sugishita, A. Tanaka, and A. Tomiya, Deep learning and the AdS/CFT correspondence, Phys. Rev. D 98, 046019 (2018).
[33] K. Hashimoto, S. Sugishita, A. Tanaka, and A. Tomiya, Deep learning and holographic QCD, Phys. Rev. D 98, 106014 (2018).
[34] K. Hashimoto, AdS/CFT as a deep Boltzmann machine, Phys. Rev. D 99, 106017 (2019).
[35] L. Dinh, J. Sohl-Dickstein, and S. Bengio, Density estimation using Real NVP, arXiv:1605.08803.
[36] D. P. Kingma and P. Dhariwal, Glow: Generative flow with invertible 1 × 1 convolutions, arXiv:1807.03039.
[37] K. G. Wilson, The renormalization group and critical phenomena, Rev. Mod. Phys. 55, 583 (1983).
[38] P. M. Lenggenhager, Z. Ringel, S. D. Huber, and M. Koch-Janusz, Optimal Renormalization Group Transformation from Information Theory, Phys. Rev. X 10, 011037 (2020).
[39] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg et al., TensorFlow: Large-scale machine learning on heterogeneous systems (2015); software available from tensorflow.org.
[40] M. J. S. Beach, A. Golubeva, and R. G. Melko, Machine learning vortices at the Kosterlitz-Thouless transition, Phys. Rev. B 97, 045207 (2018).
[41] W. Zhang, J. Liu, and T.-C. Wei, Machine learning of phase transitions in the percolation and XY models, Phys. Rev. E 99, 032142 (2019).
[42] J. F. Rodriguez-Nieva and M. S. Scheurer, Identifying topological order via unsupervised machine learning, Nat. Phys. 15, 790 (2019).
[43] K. Zhou, G. Endrődi, L.-G. Pang, and H. Stöcker, Regressive and generative neural networks for scalar field theory, Phys. Rev. D 100, 011501 (2019).
[44] G. Vidal, Entanglement Renormalization, Phys. Rev. Lett. 99, 220405 (2007).
[45] P. Olsson, Monte Carlo analysis of the two-dimensional XY model. II. Comparison with the Kosterlitz renormalization-group equations, Phys. Rev. B 52, 4526 (1995).
[46] M. Hasenbusch and K. Pinn, Computing the roughening transition of Ising and solid-on-solid models by BCSOS model matching, J. Phys. A: Math. Gen. 30, 63 (1997).
[47] M. Hasenbusch, The two-dimensional XY model at the transition temperature: A high-precision Monte Carlo study, J. Phys. A: Math. Gen. 38, 5869 (2005).
[48] J. Liu, H. Shen, Y. Qi, Z. Y. Meng, and L. Fu, Self-learning Monte Carlo method in fermion systems, Phys. Rev. B 95, 241104 (2017).
[49] K.-I. Aoki and T. Kobayashi, Restricted Boltzmann machines for the long range Ising models, Mod. Phys. Lett. B 30, 1650401 (2016).
[50] L. Huang and L. Wang, Accelerated Monte Carlo simulations with restricted Boltzmann machines, Phys. Rev. B 95, 035105 (2017).
[51] J. Liu, Y. Qi, Z. Y. Meng, and L. Fu, Self-learning Monte Carlo method, Phys. Rev. B 95, 041101(R) (2017).
[52] Y. Nagai, H. Shen, Y. Qi, J. Liu, and L. Fu, Self-learning Monte Carlo method: Continuous-time algorithm, Phys. Rev. B 96, 161102(R) (2017).
[53] A. Tanaka and A. Tomiya, Towards reduction of autocorrelation in HMC by machine learning, arXiv:1712.03893.
[54] Y. Nagai, M. Okumura, and A. Tanaka, Self-learning Monte Carlo method with Behler-Parrinello neural networks, Phys. Rev. B 101, 115111 (2020).
[55] S. Efthymiou, M. J. S. Beach, and R. G. Melko, Super-resolving the Ising model with convolutional neural networks, Phys. Rev. B 99, 075113 (2019).
[56] D. P. Kingma and M. Welling, Auto-encoding variational Bayes, arXiv:1312.6114.
[57] J. D. Brown and M. Henneaux, Central charges in the canonical realization of asymptotic symmetries: An example from three-dimensional gravity, Commun. Math. Phys. 104, 207 (1986).
[58] X. Dong and L. Zhou, Spacetime as the optimal generative network of quantum states: A roadmap to QM = GR?, arXiv:1804.07908.
[59] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. Duvenaud, Neural ordinary differential equations, arXiv:1806.07366.
[60] L. Zhang, W. E, and L. Wang, Monge-Ampère flow for generative modeling, arXiv:1809.10188.
[61] W. Grathwohl, R. T. Q. Chen, J. Bettencourt, I. Sutskever, and D. Duvenaud, FFJORD: Free-form continuous dynamics for scalable reversible generative models, arXiv:1810.01367.
[62] K. Fujikawa, The gradient flow in λφ⁴ theory, J. High Energy Phys. 03 (2016) 021.
[63] Y. Abe and M. Fukuma, Gradient flow and the renormalization group, Prog. Theor. Exp. Phys. 2018, 083B02 (2018).
[64] A. Carosso, A. Hasenfratz, and E. T. Neil, Nonperturbative Renormalization of Operators in Near-Conformal Systems Using Gradient Flows, Phys. Rev. Lett. 121, 201601 (2018).


[65] M. Lüscher, Trivializing maps, the Wilson flow and the HMC algorithm, Commun. Math. Phys. 293, 899 (2010).
[66] A. Almheiri, X. Dong, and D. Harlow, Bulk locality and quantum error correction in AdS/CFT, J. High Energy Phys. 04 (2015) 163.
[67] F. Pastawski, B. Yoshida, D. Harlow, and J. Preskill, Holographic quantum error-correcting codes: Toy models for the bulk/boundary correspondence, J. High Energy Phys. 06 (2015) 149.
[68] Z. Yang, P. Hayden, and X.-L. Qi, Bidirectional holographic codes and sub-AdS locality, J. High Energy Phys. 01 (2016) 175.
[69] X.-L. Qi, Z. Yang, and Y.-Z. You, Holographic coherent states from random tensor networks, J. High Energy Phys. 08 (2017) 060.
