<<

JHEP05(2021)013 Springer -structure May 3, 2021 : March 29, 2021 : -structure met- SU(3) . December 21, 2020 4 : b P , SU(3) -structure Published Accepted Received , SU(3) Sven Krippendorf, Published for SISSA by a https://doi.org/10.1007/JHEP05(2021)013 ,d James Gray, b [email protected] , . [email protected] 3 Mathis Gerdes, , [email protected] and Fabian Ruehle a , 2012.04656 a The Authors. c Superstring Vacua, Superstrings and Heterotic Strings, Differential

We use machine learning to approximate Calabi-Yau and , [email protected] structure, our machine learning approach allows us to engineer metrics with certain E-mail: [email protected] [email protected] Arnold Sommerfeld Center forTheresienstr. Theoretical , 37, Ludwig-Maximilians-Universität, 80333 Munich, Germany CERN, Theoretical Physics Department 1,Geneva Esplanade 23, des CH-1211, Particules, Switzerland Rudolf Peierls Centre forParks Theoretical Physics, Road, University Oxford OX1 of 3PU, Oxford, U.K. Department of Physics, Robeson Hall,Blacksburg, VA Virginia 24061, Tech, U.S.A. b c d a Open Access Article funded by SCOAP Keywords: and Algebraic Geometry ArXiv ePrint: role in swampland conjectures, toSU(3) mirror symmetry and the SYZtorsion conjecture. properties. In the case Ourmanifolds of based methods on are a demonstrated one-parameter family for of Calabi-Yau quintic and hypersurfaces in rics, including for the firstfurthermore time improve complex existing structure numerical moduliKnowing approximations these dependence. in metrics Our terms has new ofaspects methods numerous accuracy of applications, the and ranging effective speed. frommalizations field for computations theory Yukawa of of couplings, crucial and compactifications the such massive as string the spectrum canonical which plays nor- a crucial Abstract: Lara B. Anderson, Nikhil Raghuram Moduli-dependent Calabi-Yau and metrics from machine learning JHEP05(2021)013 37 40 35 11 41 ) 10 < | ψ ψ | 29 13 < 36 32 0 4 = 10 30 directly ψ 37 H 9 24 4 – 1 – 27 19 32 30 38 35 10 14 40 structure 33 39 networks structure directly dependence 16 23 4 20 7 H ψ SU(3) 6 SU(3) by minimizing the Ricci loss by minimizing the Monge-Ampère loss H H 2 29 networks with H B.2.1 Finding equivariant elements via Donaldson’s algorithm 3.1.1 An example 2.6.1 Supervised2.6.2 regression of Donaldson’s Kähler Learning potentials the Hermitian matrix 2.7.1 Experiments D.1 NN withD.2 homogeneous coordinate input NN with affine coordinate input (LDL output C.1 Supervised trainingC.2 with Donaldson’s algorithm Learning C.3 Learning C.4 C.5 Network architectures B.1 Constructing theB.2 basis Donaldson’s algorithm A.1 Sampling byA.2 solving for the Illustration dependent ofA.3 coordinate rejection sampling Homogeneous sampling in projective space 3.1 Learning an ansatz 3.2 Learning the 2.7 Learning the metric directly 2.2 CY example:2.3 quintic hypersurfaces Metric ansätze 2.4 Accuracy measures 2.5 Finding metrics2.6 with machine learning Learning the Kähler potential 2.1 Ricci flatness from a Monge-Ampère equation D Details on metric training C Details for training B Algebraic metrics and Donaldson’s algorithm A Sampling 3 CY metrics with 4 Conclusions and future directions Contents 1 Introduction 2 Ricci-flat CY metrics JHEP05(2021)013 ] ]. 2 17 SU(3) SU(3) ]). 18 ]. One common feature which is seen 16 ] for a review on data science applications – 3 19 structure solutions discovered thus far seem – 2 – SU(3) ] for some related work and [ 12 ] and techniques from algebraic geometry allow many quantities of interest to be 1 -structure metrics in the string compactification literature. 
Machine learning (ML), in particular through the recent advances in deep learning, The need for a numerical approach to obtaining the metric on compactification man- In particular, knowing how the metric changes with the moduli is important for under- Unfortunately, Ricci-flat metrics on CY three-folds, being high-dimensional structures However, there are several quantities in the effective field theory for which knowledge of SU(3) offers a flexible approachwill to look at finding the solutions application ofmanifolds of machine (see differential learning [ equations. techniques to bothin In Ricci-flat string this theory) metrics and on paper, numerical CY approximations we structures. to metrics associated to more general to generically suffer fromsistency the conditions presence for of a smallperhaps cycles good surprising, therefore, in that the there approximation is geometry,to to essentially violating no the the work on compactification. con- numerical approximations It is ifolds becomes even morestructure. acute The in methods the ofthus algebraic more even geometry general quantities are that arena largely can of notexplicit be knowledge available compactifications addressed in of of quasi-topologically the these in compactification cases,the the metric and in CY class this case of more may compactification-suitable general require setting. In addition, metrics, are relevant such a restriction is a serious impedimentstanding to moduli progress. dependent massesand in imprints relation of to moduli the in solutions swampland to distance the conjecture electroweak [ hierarchy problem (e.g. [ with no continuous isometries,has are led seemingly to prohibitively the hard developmentthese to of and find a number analytically. related of This quantities numerical,in in and such other, the work approaches literature to isspace [ computing that at each a time. computationFor of In example, a many whenever potential metric the applications is dynamics of performed such of at work, fields, this one and is point hence a in the serious moduli limitation. moduli dependence of the determined by the Kählerinaccessible potential. with This the is methods ametric of non-holomorphic (and algebraic function geometry. indeed that One often islow-energy needs other therefore effective explicit structures) field knowledge in theory. of order the to compute this crucial aspect of the computed without explicit knowledge ofFor example, the the Ricci-flat massless metric spectrumcomputed on of in the fields a extra and purely dimensions superpotential quasi-topological [ Yukawa manner. couplings can be the metric is apparently indispensable. The kinetic terms of matter fields, for example, are Finding numerical approximations to metricsis for a the subject compact dimensions whichmetrics of is by string not now theory always hastheorem necessary. a [ In long the history case within of Calabi-Yau the (CY) literature. compactifications, Yau’s Knowledge of such 1 Introduction JHEP05(2021)013 ] which is 22 . Technical de- 4 , we discuss our setup 2 -structure. This contrasts -structures with non-trivial SU(3) SU(3) ). ). ). 2.6 3 holonomy, we will then turn to apply 2.7 ]. We then train metrics in an unsupervised 4 SU(3) – 3 – -structure metric obtained by one of the authors SU(3) -structure and then see which torsion classes that ansatz ] and are learned as a function of moduli. We then discuss a 20 SU(3) structure metrics (see section SU(3) -structure with the required properties. 
Clearly, the approach that we will detail describes how approaches similar to the previous sections can be used to learn -structure metrics. We conclude and present an outlook in section ]. One of the reasons that the second of these approaches in particular appears to Learning Learning the CY metricsupervised from or unsupervised a fashion Kähler (see potential section whichLearning can the be CY trained metric directly in (see either section a As this work was being completed, we became aware of related work in [ The rest of this paper is organized as follows. In section Having studied the case of metrics of In more detail, we first construct NNs that interpolate between numerical approxi- We will explore a number of different but related approaches to learning moduli- 3 21 SU(3) • • • SU(3) tails about our experiments andappendices. implementation of known algorithms can be found incoordinated the to appear simultaneously. theory, and there isan no guarantee thathere any avoids this given issue. proposed ansatz will turnto out approximate to the give complextion structure moduli dependence of Ricci-flat CY metrics. Sec- one can choose thewith non-zero the torsion analytic approaches classes that of havefor appeared the the in target forms the literature, defining whichgives an propose rise an to. ansatz InSpecific the constraints context on of the string torsion compactifications, classes this leads are to imposed a by shooting the problem. equations of motion of the torsion. We will again presentthe two other approaches: concerning one learning based thethe around metric the exact directly. use We analytic verify of results these anin methods ansatz for [ by and an reproducing be promising is that, in choosing contributions to the loss function, we will describe how existing techniques for computingproved numerical performance approximations in to efficiency CY (i.e.to given metrics. existing accuracy algorithms. We per find computation im- time) in comparison machine learning techniques to the metrics associated to fashion by optimizing loss functionsout measuring using the Donaldson’s deviation from algorithm).gebraic Ricci For Kähler flatness both (i.e. potential” types, with- [ themethod for metrics constructing are neural networks obtained whichusing from output ML approximations an to techniques “al- Ricci-flat directly. metrics We compare the efficiency of the methods we present with mations to Ricci-flat CYmoduli metrics space computed using at Donaldson’s different algorithm fixed [ points in complex structure dependent metrics with neural network (NN) approximations: JHEP05(2021)013 . . 3 g ¯ Ω J ∧ (2.2) (2.3) (2.1) ), this Ω 2.2 . Solving this equation ]. He proved uniqueness of 23 K ]. 1 In this section, we first highlight 1 ¯ g , Ω ∧ -fold and its associated Kähler form on the CY manifold by forming d Ω ∂φ Ω ∂ is supposed to be in the same cohomology κ log det ¯ +  J = ∂ g i form J J – 4 – ∂ . on a CY ∧ − = 0) φ , g J J = (3 ¯  ∧ i J corresponding to the Rici-flat metric. Another volume form . The second order Monge-Ampère equation arises by noting J φ is obtained from the Kähler potential with Kähler form ∂K CY g ∂ = g that is constant at any given point in . 
Using ( C ∈ κ The deviation from this proportionality measures how close a given metric is to Ricci One starts with any Kähler metric Calabi conjectured that for a compact Kähler manifold with vanishing first Chern class there exists a 1 While there are many CY spaces,for and all string of theory these applications spaces desire techniques — available in particular examples withKähler metric larger in Hodge the numbersthis same — metric cohomology we and, class restrict subsequently, with Yau proved vanishing the Ricci existence curvature of [ such a metric [ flatness. We will returnin to detail. this later in this section,2.2 when we discuss CY accuracy example: measures quintic hypersurfaces Since the top form is unique, these need to be proportional: for some becomes a Monge-Ampère equation for for some smooth zero-form that there are twoarises ways of from the building Kähler a form is (top) given volume in form. terms The of first the (top) holomorphic volume form (MA) equation. The Ricci flat metric class. Hence, it can be written as where the metric for a Kähler potential(PDE). corresponds The to following solving idea reduces a the fourth problem order to partial solving differential a equation second order Monge-Ampère using deep learning and compare their performance with2.1 existing techniques. Ricci flatness fromThe Ricci a curvature Monge-Ampère on equation Kähler manifolds can be written in a simple form The uniqueness and existence ofmanifolds metrics with with vanishing vanishing first Ricci Chern curvature on classseveral compact is important Kähler long known known. results relevantKähler for metrics, obtaining with numerical the approximations goalin to of our these setting ML up our approaches. notation and We to then introducing discuss examples several used approaches to finding numerical metrics 2 Ricci-flat CY metrics JHEP05(2021)013 (2.7) (2.6) (2.4) (2.5) , and as an b : z f = 1 | ]. We fix a a 6 z | gives five points , ) i ~z ( ) = 0 . Specifically, we focus ~z w is given by ( ) +1 i ψ d Ω ~z p ( P f ). , N , =1 2.4 i X ]. If we restrict to a patch where c = 0 . 1 24 dz N i ] z , the form +1 ≈ d a,b ,...,d +1 =0 z i Y d ^ 6= a, b : c =1 dA ψ c 6= + ... b CY c : dA +2 1 Vol – 5 – /∂z d i z ) d z : 1 with ~z ( 0 f c +1 =0 ψ z z i d X ψ X ∂p can be constructed straightforwardly for hypersurfaces = [ Z is largest as the one the defining equation is solved for. ) = ~z Ω -dimensional hypersurfaces in ~z | = d ( b Ω = encodes the complex structure dependence and where we ψ p CY C form /∂z ) ~z Vol ∈ 0) ( ψ d, ψ p f d ( | ψ by choosing two random points (with flat prior) on the unit sphere X 4 Z . Intersecting this line with the hypersurface 4 P P is the hypersurface constraint as introduced in ( (i.e. pick a set of local affine coordinates) and consider the coordinate ψ p For sampling points on the CY space we use the method outlined in [ In practice, we will use two types of conventions for coordinates on the CY mani- The holomorphic In line with the degree of this equation, CY one-folds (i.e. tori) of this type are called = 1 a not sampled with a flatweighted prior accordingly on in the the CY numerical manifold, Monte which Carlo means integration that of the some points function have to be choice of affine coordinates leads tofor values our in neural a unit networks. ball, which readily normalizes the input random line in isomorphic to on the CY manifold which we use as our sample. It should be noted that these points are coordinate as before. 
Forscale numerical stability, the it homogeneous is coordinate bestpick the to with dependent go the coordinate to largest asambient the space absolute above. affine and patch value This local where convention to coordinates defines we and one, on the the the relation variables CY between the manifold the CY by hypersurface uniquely equations specifying are the implicitly patch solved for. Note that this where fold. The first choicecoordinate is for to which stickAlternatively, to we the can full use set affine of coordinates homogeneous in coordinates, each and different pick patch the and pick the induced (implicit) function of the coordinates folds are called quintics. Mostquintic of hypersurfaces. our subsequent discussion is focused on the one-parameter or complete intersections in projectivez ambient spaces [ denote the homogeneous coordinates cubics, CY two-folds (i.e. K3 manifolds) of this type are called quartics, and CY three- on the one-parameter family of hypersurfaces where the parameter ourselves to prototype complex JHEP05(2021)013 An (2.9) (2.8) freely 2 (2.10) (2.11) (2.12) is much +2 d +2) d +2 Z ( d d × Z πi/ and grows like 2 o e +2 k ) d = Z +2 d for the quintic. Z ) 4 × k , α ( ] +2 d O ) enjoy a . The numerator is evaluated , , , the FS Kähler potential can +1 S i d ( n a ~z b ¯ ∼ 2.4 | z ¯ z , P k . a semi-direct product). We note a ] k a ) 0 +1 z CY ¯ k z ~ d z N o ( dA 2 . : α n Vol ¯ =0 β − k b ¯ d : a X ¯ s ¯ b z ¯ and the denominator can be obtained +1 a ¯ β b = ¯ d ... a α z ¯ ) = Ω kk k : i for more details. : H H ~z 1 ∧ ( ) a π z A 1 z ~z ... 1 2 w Ω ( : α , – 6 – α =0 = : 2 n s ) b ∝ ¯ z X 0 k a, : z K k =0 b ¯ 0 ¯ 1 β CY = N ln( X ∂ α z α, [ [ a k π 1 ∂ Vol 2 = 7→ 7→ d = ] ] k = b ¯ a +1 +1 g K d d H z z denote differentiation. These FS metrics can be generalised by : : ) defined in high-dimensional projective spaces to our ambient k -dimensional CY manifold, i.e...... d 2.11 : : . points is weighted with 1 1 denotes the group and z z N : : B.1 for a 0 0 +2 d z z [ [ ) in the case of the quintic). The embedding is constructed via global sections S : : 4 +1 . We refer the reader to appendix d P 4 k FS +2 +2 P ( (1) d (2) d ω ∗ p O Z Z i The following discrete symmetries can reduce the number of independent components ∼ ∝ containing the defining equations are removed. Details on the implementation are sum- of line bundles which are non-trivial on the CY: 2 k α In practice, the basisN of these sections is given by polynomials ofmarized degree in appendix eralised FS metrics ( space (i.e. s where the subscripts on adding a Hermitian matrix So-called algebraic metrics are obtained by considering non-trivial pull-backs of these gen- and the corresponding Kähler metric is There is a canonicalStudy choice (FS) of metrics. Kähler For metrics abe given in written complex complex as projective projective space spaces called Fubini- It should be noted thatbigger the (here, full (non-free) symmetrythat group the most genericenforce hypersurface them will in our not ML have models these to symmetries2.3 keep and our we ansatz hence as Metric general do ansätze as not possible. acting symmetry i.e. these symmetries act by multiplication of complex phases and by cyclic permutation. with the value offrom the the pullback top of form thedA Fubini-Study metric of the ambient space onto thein hypersurface, the Kähler potential ansatz. The quintic hypersurfaces ( where each of the JHEP05(2021)013 . 
= 0 such (2.13) (2.14) (2.15) k σ , we learn it in eigenvalues of the 2.7 for any given ] and do not rescale ] propose to compute 6 6 + 1 H k .

, ¯ Ω g , 3 punishes outliers in this measure ∧ n J

¯ Ω n Ω ¯ Ω ) det ∧ , the option of doing a Monte-Carlo -matrix. 3 i 1 κ ∧ Ω 6 J Ω . H − − Ω ( or 1 1 κ

B.2 = J X − have increasing multiplicities due to the fact that they Z J 1 – 7 – 4

here can be any Kähler form in the same Kähler ¯ Ω ∧ , these balanced metrics are unique and converge P ¯ α Ω J J ∧ ∧ = ∧ Ω 1 Ω → ∞ J MA X , i.e. we force the metric networks to learn the rescaled k In this sense, the algebraic metrics can be understood as R L = 3 η , we generally follow the authors of [ = action on the projective space. = 1 ]. ] provides a method which determines 8 4 σ κ 2.6 ). Note that fixed. accuracy. Note that a larger SU(4) , cf. [ 2.3 Ω σ 4 . When learning the metric directly in section P = 1 give the eigenfunctions corresponding to the first κ k is constant at each point on the CY space, the integrand vanishes and η to set is balanced. In the limit merical approximations. uses these measures as lossin functions, the finding language Ricci-flat of CY ML. metrics is readily defined Ω H One can check the convergence of the numerical method andOne compare can different optimize nu- the metric by minimizing these measures. In different words, if one Donaldson’s algorithm [ The eigenvalues of the scalar Laplacian on at degree 1. 2. 3 or defined in equation ( α J a normalization such that metrics since we keep carry representations of the which allows for someapproximation of overall the re-scaling of more strongly. In section Hence, if In practice, our losses can be of the generalized form which should be constant throughout theκ CY manifold. In fact,class this should (e.g. just be the equalmetric. Fubini-Study to metric) As an and accuracy need measure, the not authors be propose the to compute Kähler form of the Ricci-flat In order to evaluate the qualitythe of quantity an approximation, the authors of [ to the Ricci-flat CYour metric. implementation can Details be about found Donaldson’s in appendix algorithm,2.4 more definitions, and Accuracy measures Measuring how close a given metric is to the Ricci-flat CY metric is useful for two reasons: s scalar Laplacian on spectral expansions with coefficients given by the that interesting aspect of this parametrization is that linear of the global sections JHEP05(2021)013 (2.18) (2.19) (2.20) (2.16) (2.17) , we can ) j ( /∂~z . ) i ¯ ( k ¯ z d ∂~z ∧ = ¯  ¯ z via ij d . i T . Hence, if we use differ- ∧ and obtain an affine patch ,  . A good cross-check which i +1 n d i || +1 dz z ) P d = 2 = 1 ¯ k . This allows us to compute the z . Defining the standard patches j i ¯ , ijk n of i z c in patch . g /z i ) i . real conditions 2.7 ..., i U z ( † ij Im( and g = 0 i 18 T || +1 z i · z ¯ ,i = 0 = + ) i k , = 1 ( k n g 1 g i || n − · ) dz z i for our networks. We measure the Kählerity − z i ij ∧ – 8 – ijk c ¯ ,k T ¯  c i ¯ z g d = is then simply ,..., Re( ) = ∧ i 1 j j || ( z i z from the metric U g , ijk dz c i j 0 i,j,k X z to z i conditions ¯ ,k  = i U g = dJ 18 ) L i . Denoting the transition matrix with ( in patch g ~z ↔ ) j ( g lie in all standard affine patches 4 = 0 non-trivial complex or respectively , we can use the projective scaling to set dJ 9 automatically satisfies the overlap conditions, so these conditions are pri- } 2.3 6= 0 i z { The third consistency condition we need to impose is that the metric transforms cor- In addition to being Ricci flat, the solution has to be Kähler. Of course, if we learn a Alternatively, one can use the vanishing of the Ricci scalar as an accuracy measure That is, all points up to a measure zero set. = 4 i points ent patches as inputstransform to as describe dictated the by same the transition point functions on between the the CY patches. 
manifold, the metric should The transition function from transition functions for compute the metric marily used when weU go beyond the ansatzwith in coordinates section where in our experimentswe we used have to used test both our implementations is that thisrectly Kähler on loss overlaps is of zero patchesin of for section the the projective FS ambient metric. space. The Kähler potential ansatz network, we need toimplemented properly each of take these thisaccuracy into as follows account when taking . We have We implement these conditions byvariables. taking Note derivatives of that the thiswhere NN is derivatives with different are respect from taken toAs the with the the usual respect input induced backpropagation to coordinate, in theother i.e. neural parameters coordinates the networks, upon of coordinate imposing which the the is neural hypersurface implicitly network constraint, specified is layers. in an additional terms input of to the our This leads to when we learn thetakes longer Kähler and potential the directly. numerics appeared However, to using be less twoKähler stable additional potential, and this derivatives accurate. propertythe is metric guaranteed, so directly. we The only condition need is to that impose the fundamental it two-form when is learning closed, JHEP05(2021)013 (2.21) (2.22) overlaps, we . Finally, we . d µν . We define the 2.4 M and compute the , is the index of the . Typically, in the n 2.7 j

ψ j

p ) j 6= ¯ z ~ ∂ ( k † . Alternatively, instead of jk T · , where ) j with z ~z 2.6.1 . Now, we will also input the ( i k ) z j U , ( NN ) and the numerical values of points g n | · ) µν ~z 2.20 ( ] in our experiments. M | jk 27 T µ,ν X − ). Since we compute the loss for ) j – 9 – = ~z ( n ) into the loss. Again, this loss is non-negative and || k ( NN µν g ], and JAX [ is the index of the coordinate of a point on the CY

/d

1 i M 26 || k,j X . The present ML approach is in large parts facilitated 1 d 1 where = i U using the transition functions and write the transition loss as 5 via the sum of all matrix components, ], Tensorflow [ k 1 25 U Transition L n > as the sum of the absolute values of all matrix elements = 1 n A first approach is to use NNs for supervised regression on Kähler potentials using The motivation for an ML approach is that an improvement in speed and accuracy by In order to compute the overlap loss, we proceed as follows. As explained above, we We utilize PyTorch [ 5 constructing more accurate metric approximations in the sense ofthe section output of Donaldson’s algorithmperforming as fixed discussed point in iteration section of the T-operator in Donaldson’s algorithm, we can use the and metric respectively. using these numerical methodsThe enables current a numerical studymetrics benchmark of at CY is a metrics given fixed at by point a in Donaldson’s much moduli broader algorithm, space scope. which and computes becomes significantly more expensive when differential equations. Thelearn representations first of the implementation Kähler choicesetup potential or one can the has be metric.by to found A schematic readily make in overview available of is figure frameworks either optimize whether implementing the to auto-differentiation, appropriate as loss this functions allows which involve one derivatives to of the Kähler potential code, we can again use the Fubini-Study metric, which2.5 is well-defined on Finding the overlaps. metrics withGiven machine the learning accuracy measuresbe just formulated discussed, as it a is continuous clear ML that optimization finding problem Ricci-flat of metrics the can underlying algebraic and and for sum in the loss overintroduce all patches a (except conventional for factorgoes of to zero if the metric transforms correctly across patches. In order to cross-check the where the transition functions arewe as use explained in are ( explainedmatrix when norm we for describe our metric networks in section resulting expression for the metricvalue based in on the this NN patches input. We then compute the expected usually go to themanifold patch with the largestcoordinate absolute which value, has and we theinput solve largest for for absolute the value of NNcoordinates the we of already the divide point all in coordinates question by in other patches JHEP05(2021)013 ). ¯ b = 0 a g C.3 dJ ) and the left ). We stress ( ). Moreover, θ

H 6= 0 2.6.2

dJ . Model 2.7 earnable parameters l -accuracy measure to directly opti- ⃗ σ z ψ ¯ b ], our approach takes into account the moduli a 9 grows rapidly. K g – 10 – 2 k N -matrix as a parametrization of Kähler potentials has there exist better approximations (as quantified by the directly is that it is more general, for instance allowing H ⃗ k g z H in order to be able to back-propagate in the optimization step i z θ

-matrix as an input to our neural network (cf. section ) than the ones obtained from Donaldson’s algorithm. We also demon- H σ metric, while the ansatz for the Kähler potential requires dealing with , for finite, fixed ) are designed. The respective models are neural networks of different complexity. d Model × → ∞ learnable parameters right d ( and the inputs k . Schematic overview of how models predicting the Hermitian matrix b ¯ g a g The feed-forward neural networks are implemented with standard packages. However, The advantage of learning the Kähler potential is that it automatically satisfies ψ Tensorflow and PyTorch but can be avoided by using JAX. 2.6 Learning the KählerAs potential mentioned above, learning the several advantages: optimization task needed to solve theIn equations particular, that ensure we that implemented the themultiplication resulting and transition metric is the function CY. complex computations derivativesoutput as in well terms as of real the andthrough matrix imaginary the parts respective of losses. the NN This splitting into real and imaginary parts is required in the loss functions associatedis also to worth the noting accuracysupervised that, learning measures when approach. are learning Indeed, customprovide the we implementations. labels metrics do for not directly, It supervised know we the learning. are CY not Instead, metrics dealing the and with hence loss a cannot functions encode the continuous advantage of learning the metric for larger functional flexibilitylearning (e.g. the ability to metric capturehermitian directly solutions requires with onlymatrices learning whose number the of independent components components of the terms of the architecture.Kähler In potential principle, in one theInstead, can we neural also learn network, start the which metric with directly we a which do different we ansatz not discuss forand in pursue in the section further the in case of this the article. algebraic metric ansatz the overlap conditions are guaranteed. The flatness measure strate that using theAlthough more this is expensive similar Ricci todependence scalar the of approach as the in a [ lossthat is altering feasible the setup (cf. to appendix include multiple complex structure moduli is straightforward in metric same ansatz for themize Kähler the potential, output but of utilize ourverge the neural for network. While Donaldson’s algorithm is guaranteed to con- Figure 1 JHEP05(2021)013 . σ = 35 , and α k s N points. In matrix. Due . To this end, 15 80000 H × 15 , which has is a real independent entries, = 3 from a flat prior with 2 k H k N ψ middle). We assess the quality obtained by the NN on the test ). Hence, this approach requires has 2 ) that σ ) with B.3 H B.3 2.4 to zero or to be equal. For example in independent real components. However, H . a (see figure matrix z k N 100 = 225 ). – 11 – 2 × ≤ k using Donaldson’s algorithm using ) 15 , we find from ( N ψ ψ ( B.2.1 = 2 as can be seen from ( Im k k ≤ . It is a nice cross-check that the off-diagonal components for different choices of complex structures using Donaldson’s 100 H ) with − H can force many entries of 2.4 (see appendix ¯ s and · (0) . 4 H H We present results for the quintic ( 100 · for different values of P and impose relations between the diagonal entries. This leaves only two s ≤ H 0 ) ψ ( . Re grows rapidly with larger H k ≤ The CY metric is bypatches construction of globally defined, i.e. 
it glues nicely acrossThe different resulting Kähler potentialconsequently in is terms given of explicitly the coordinates in terms of the sections The CY metric is guaranteed to be complex Kähler. being Hermitian, this matrix has and the output are the independent real and imaginary components of the Hermitian N H • • • ψ 100 distance). For reference,from we computing also the compare Kählerprovide these potential a results at lower each with bound point for theNNs in the result or the quality from that of test using the is set.Donaldson’s a approximation obtained that algorithm wrong In can but theory, does be this close-by not obtained should approximation. produce from In the the practice, Kähler as potential alluded with to the above, lowest possible We computed one experiment,− we randomlyof the NN drew interpolation byset comparing 100 with the the error result measure values onea would point for obtain of by the using training the set “wrong” that Kähler is potential closest computed in for complex structure moduli space (in Euclidean algorithm. The input toof the NN is justmatrix the real part, imaginary part, andExperiments. absolute value 2.6.1 Supervised regressionFirst, of we demonstrate Donaldson’s that Kähler it is potentials in easily possible a to train supervised a NN learninghas to learn sufficient setup. the functional moduli dependence capabilities Thiswe to provides compute learn a the the simple matrix moduli check dependence that of the NN architecture real degrees of freedomautomatically become in zero and thesame respective numerical diagonal components values automatically in have Donaldson’sthe the algorithm, initial even matrix if they have not been chosen to for equivariance of the case of the quinticto ( the (multiplication by a complex phaseentries and to permutation) symmetries be force the off-diagonal However, the disadvantage is thatand the more and more training datato in learn order the to complex fix structure the dependencemetries coefficients efficiently. can and It tremendously allow should reduce the be the interpolating noted NN that number discrete of sym- complex structure parameters. Moreover, JHEP05(2021)013

In 1

. We 6

i − . H

Donaldson NN prediction nearest value Donaldson NN prediction nearest value i Algorithm Algorithm ; indeed, . On the

C

|

1 − ψ for different for different ψ

| i

σ σ

i

46

0+10 + 10 . i

61

10 i − −

17

.

5

. middle). Given

10 i i

51

86

. of the test set

7+63 + 87

40

.

i 3

− σ

8

0+10 + 10 − .

i 53 −

for sparse sampling.

10

29 −

. i

51 for random sampling. −

25 . On the left, we can

.

4+52 + 94 10

− i .

64 i

− ψ 81 −

100

41

. 3

of the test set 11

ψ .

i i

σ 95 34

.

100 −

9+99 + 49

. i

1

91

. increases by a factor of 2

7

8+86 + 68 −

100 . i

45 and 46

.

29

.

80 σ i

32

100

36 2 ): comparison of ): comparison of

. (see figure

96 . 3 2 1 0 19

. . . .

) 5 4 3 2 1 0

0 0 0 0 66 + 82 ......

.

σ 0 0 0 0 0 0

σ 69 − } in the intermediate range. Right Right , this grid contains only very few ψ 100 k ± , train test closest train test closest 10 ± , 2 1 100 10 ± , 1 10 0 – 12 – 50 -plane. We find that ψ 0 ∈ { 10 ) ) plane, which is why we do not think that this is related to the ψ ψ 0 0 0 Re( Re( ψ 10 a, b . This illustrates that for larger complex structure, we − in train and test set in train and test set -dimensional and gives the independent entries of ψ 50 ψ 2 1 − k 10 − ]. Overall, we find that the results do not depend stron gly N with (100) 2 10 100 28 − − ib for Donaldson’s algorithm as a function of the error is larger than for 0 1 2 0 2 1 0 0 50 50 for Donaldson’s algorithm as a function of 10 10 10 100 10 10 10 100 −

∼ O

− − − −

+ ) ψ Im( ) ψ Im( σ ψ σ in the complex ψ a σ in order to achieve the same quality of approximation of the Ricci-flat = used in the training and evaluation set. ( used in the training and evaluation set. ( k is very costly, especially for larger and ψ ψ ψ H ): errors , and we find that sometimes the NN and/or the nearest points produce ): errors (1) k Left Left ∼ O . The output layer is .( .( ψ ψ ): values of ): values of The results of our two experiments are presented in figures We use feedforward neural networks with 3 hidden layers and ReLU activation. The We also repeat this analysis, but this time we sample from a grid of complex structure This happens throughout the complex 6 Middle Middle metric. However, interestingly, the errorfor does very not increase small monotonically non-zero the with figure in the middle, we show the training (blue)conifold and point. test (orange) sets for function, the optimizer, the batch size, etc. Further detailssee are the given error in measure appendix between need to go to larger input layer is three-dimensional, asvalue it of takes the real part,use imaginary ADAM part, as and an the optimizeron absolute [ hyperparameter choices such as the learning rate, the network architecture or activation error at fixed better results than a direct computation following Donaldson’s algorithm. values of the form that computing samples as compared to the rather dense sampling used in the first experiment. Figure 3 ( approximation approaches. Figure 2 ( approximation approaches. JHEP05(2021)013 H . that C . This from a H H = 2 is positive n H as computed via . However, as the matrices exist for ) with ψ H H values. 2.15 ψ . This is followed some , the differences for this ) 2 ψ is very time-consuming. Of ( H independent components in matrices produced by Donald- H 42025 in the training set. directly ψ dependence of the coefficients in H ψ – 13 – and the complex angle arg | ψ ], with the key difference that we are learning the moduli | 9 that converge to the Ricci-flat metric at a significantly faster , which corresponds to ψ = 6 k , the results of using the nearest neighbor and the NN are small, as to } 1 (note the log scale on the axes). In that case, for the innermost “square” ± as computed by the NN, and from using the wrong . We find that this approach produces better accuracies while taking , 3 1 for all points in the evaluation set as obtained from Donaldson’s algorithm, H H and value of σ . Based on several experiments, we have found the following architecture to k H } ∈ {± to a, b ψ { As a balance between numerical cost and quality of approximation, our experiments We performed several experiments to find the best NN architecture to model the maps Instead of building on top of Donaldson’s algorithm, we now study networks that are As one can see for the randomly sampled points in figure . We have found that the Cholesky decomposition typically leads to better results than definite in addition to Hermitian. Further technical details canhere be are found performed in for appendix (as a general Hermitianeach matrix). degree We expect generally that optimal from work well. 
Asnumber input of dense we layers take using(without the activation) sigmoid is activation added function.H that Finally, maps another dense to layer theencoding needed the number of real complex and parameters imaginary of parts directly, which ensures the output trained directly using the Monge-Ampèreis loss similar defined to in the equationdependence approach ( of of [ similar amounts of time as Donaldson’s algorithm over a range of that approximate the Ricci-flatson’s metric algorithm. by learning While the training we to produce have useful seen interpolations,corresponding that the computational approach only cost is of few still Donaldson’s limited data algorithm. by are the accuracy needed and for the supervised neural networks. However,training given time, and the the ease quality of of the implementing results2.6.2 the we NN, refrained from the Learning exploring extremely the thisIn Hermitian further. fast the matrix previous section, we have shown that it is in principle feasible to train networks the nearest neighbor approximationbe becomes seen much from worse the than outerapproximation the square. of NN This the prediction, illustrates functional as thatrelatively form can the small of NN sample, the can which already iscourse learn important since a one computing reasonable could compare the NN against interpolation/regression algorithms other than has been obtained fromThis Donaldson’s changes algorithm if the for sampling a ofas points shown nearby in in point figure the is complex structure notwith plane too becomes large more sparse, either. be expected from our resultsdistance for densely between sampled, the randomly nearest distributed neighbors and the actual complex structure point increases, from using the Donaldson’s algorithm for the closest available rather fine sampling of pointsmeans in that the the complex NN structure works plane well, are it not also very shows large. that While the this error one makes by taking the right, we plot JHEP05(2021)013 , 2 d 10 , a = 12 < 175 | k , we can ψ over the | accuracies H | ≈ < σ ψ 0 | = 100 ψ values over which we optimize. . This improvement is not only k ψ -accuracy achieved by a network with one is strongly dependent on the chosen network σ ψ ]. Our networks could in principle find these functions for learning the Kähler potential, the 9 – 14 – -fold. 2 k directly. This has several potential advantages: d N g . One can see an improvement in the , and by the range of takes only on the order of minutes, while Donaldson’s shows the H ψ = 1000 4 to | ψ | ψ accuracy comparable to Donaldson’s algorithm at degree σ . Figure takes on the order of days using the same hardware. . However, their ability to do so will be limited by the complexity of 100 H < = 12 | in the appendix). This is a noteworthy result, as training the network at k ψ | 12 < over the whole range of 0 Instead of the need ofNN predicting always only needsreal to parameters predict for the a independent complex components CY of theIn comparison metric, to i.e. approaches which use aing general the ansatz metric for directly the Kähler saves potential, two derivatives learn- when evaluating the Monge-Ampère loss. Our experiments show that this ML approach can outperform Donaldson’s algorithm For the main experiment, we have chosen to use uniformly distributed values in the • • = 6 also train a NNof to the directly metric will learnstructure. 
depend a In on functional contrast to the expression theto position for methods learn in the presented the the to components CY learn CY of metric. the manifold the Kähler metric as potential, The we well value now as aim on the complex feasibility of this MLfuture approach, research. and we leave further optimization of the2.7 architectures to Learning the metricInstead directly of using a NN to learn the complex structure dependence of the matrix outside the training region. in efficiency (i.e. theaccuracies achieved accuracy over the which desired range canarchitecture, of be as achieved is in the a extrapolation given computing behavior. time). Our The primary focus here is showing the extrapolation no longer outperformsbetter Donaldson’s algorithm approximation beyond than this anentire point, extrapolation shown it range. of is still Donaldson’s Theresult a metric latter shows at is that shown theover in a machine the range learning figure of approach as values can a in dashed predict moduli line. space. very good In Moreover, it summary, approximations can the also extrapolate (within reason) training range, the figureset shows for how larger well valuescompared up the to to network Donaldson’s extrapolates algorithmpresent beyond at over the the same the training degree rangefactor of our 2 algorithm beyond was the trained regime on, used but during extends training. up While to the accuracy of the network k algorithm at range and two dense hiddenperforming layers, ones respectively. from a These search architectures over several were architectures. chosen Besides as the the performance best- on the optimal values of the model that mapsIn from an initial experiment wherethe the network network reached was a optimized(see on uniform figure values rate than Donaldson’s balanced metrics [ JHEP05(2021)013 3 σ 0 1 (2.23) . The error band in each ψ , accuracy achieved by Donaldson’s σ overlap L 3 λ | + 2 | 0 over real values of dJ 1 L 2 λ – 15 – pulled back to the CY space. Since this provides = 100 + that was not used during training, and thus shows the | ψ +1 0 ψ d MA 0 | 1 P L 1 is shown. The dashed line corresponds to the extrapolation of these losses have to be chosen experimentally in a = λ | i

achieved by the dense network with one and two hidden layers. ψ λ | = m ), we will need to impose that the Kähler and gluing con- o L r = 6 f 2.3

n k o i . So, in addition to finding a Ricci-flat metric that solves the is not automatically Kähler, nor does it automatically glue nicely t a g l +1 d o p P Accuracy for Accuracy a r t x DenseModel-1 DenseModel-2 Donaldson k=6 E accuracies at σ . 1 . The metric 0 1 -structure metrics. 1 0 0 Experimentally, we have found that it is beneficial to start near a solution which Finding a NN that computes the CY metric for a given point and complex structure However, there is also a disadvantage as compared to the method discussed in sec- 2.6 0 1 1 satisfies the overlap conditionsknow approximately. several While metrics we do thatStudy not are (FS) know metric Kähler the on and CYa the metric, glue ambient promising we nicely starting on pointthe in the Ricci-flatness the CY criterion sense manifold: needs that out the to of Fubini- be the optimized, three we consistency will conditions start only our network as a small components: where the optimalhyperparameter weighting search. ditions are satisfied.ensured As by construction mentioned also previously, allows usSU(3) the to fact apply this that approach to the more Kähler general (non-Kähler) property isthen not comes down to optimizing the parameters of the NN subject to these three loss To the best of ourdifferences knowledge, can our be experiments numerically are advantageous. the first to test whethertion these heuristic across patches of Monge-Ampère equation ( algorithm for the sameof range using of Donaldson’s balancedcase metric corresponds at to theaccuracy maximal at and different angles. minimal value obtained respectively when evaluating the Figure 4 The shaded area indicatesextrapolation the range behavior of of the networks. For reference, the JHEP05(2021)013 (2.27) (2.24) (2.25) (2.26) is a real .  ) D .  32 ) L 32 g Im( , ) Im( , 31 ) L 31 , but reconstructing the g ii . Im( D ) , i Im( ) , NN ) Q 21 g L 21 g ) = g Im( (1 + , Im( ) , ’s along its diagonal, while ) FS 1 32 det( g . 32 L g † = . Adding information about which ambient Re( | Re( (2) , i , ) ) – 16 – 31 , g = LDL 31 /∂z L g g ψ NN g ∂p Re( Re( | , , + ) ) 21 21 FS g g L here. = Re( Re( , , (1) = 10 33 33 g ψ , g ,D 22 22 , g ,D 11 g 11 D ) = and the ambient space coordinates in which the pulled-back metric is expressed. is a complex lower triangular matrix with ) = ψ z ψ ~z, ( L z ψ ~z, In the first experiment, we aim to learn the metric using a single neural network for all When outputting the information about the metric components we have explored two N ( N space coordinates have notambiguity at been a measure scaled zerospace to set coordinates one of have points or the oninformation solved the same CY actually for (largest) manifold absolute does only where value. not twoonly resolves or display We impact a more results checked training ambient theoretical for that or omitting final this accuracies. For concreteness, we manifold in homogeneous ambient spaceparts coordinates, of together with the realIt and should imaginary be notedimplicitly that to the information the about NN,homogeneous the ambient since space ambient we space coordinates coordinates has gofor been is the to scaled available coordinate the to with one patch and the where where largest the we have largest solved absolute value of the Below, we present two experiments todirectly demonstrate works that this and methodstarting produces of point. learning results the metric that clearly improve throughout trainingpatches. 
This from network our takes as an input the real and imaginary parts of a point on the CY In the latter case, theactual determinant metric is requires computed two as matrix multiplications. 2.7.1 Experiments In the LDL parametrization,composition: we output the non-determined components of this LDL de- Here, diagonal matrix. When learning the metricof directly, the we output metric the independent components directions, either tocomponents output in the the LDL real decomposition and of imaginary the metric parts of the metric directly or the A priori, it ismined unclear which which approach type leads to of the network best works result. better, and we experimentally deter- perturbation around the Fubini-Study metric utilizing one of the two ansätze: JHEP05(2021)013 , 3 9 λ 10 from error < | σ Losses (2) ψ K¨ahler Overlap Total Monge-Amp`ere | g < 0 20 18 (middle) and 16 2 λ 14 ): optimizing the NN 12 10 epoch Right 8 loss can be read off from 6 ). ( Average loss per epoch 4 σ 2 = 0 0 2 ): optimizing the NN with all three λ 3 2 1 Losses

K¨ahler Overlap Total Monge-Amp`ere

10 10 10 loss Left 20 . The evolution of the three components 18 5 16 14 . Note that the 12 – 17 – 10 06 . epoch . 8 ψ 0 accuracy is improving during training. We observe 6 Average loss per epoch to σ 4 2 2 . We have observed that including more training points . 0 0 3 2 1 Losses

= 6 ).

K¨ahler Overlap Total Monge-Amp`ere 10 10 10 loss are shown in figure k = 0 . The results shown below are achieved with standard feedforward 3 . This flatness accuracy is the same level that is reached with 20 D λ = 10 18 16 ψ 14 = 9000 12 1 10 ) outperforms the additive ansatz. ): optimizing the NN without Kähler loss (i.e. epoch λ 8 × 6 , where we see that the 2.24 Average loss per epoch 4 . Evolution of the training loss during training. ( 6 Middle 2 0 In the second example, we learn the LDL-components of the metric, and we have a The results for For both types of networks, we have performed hyperparameter tuning, as discussed in In the second experiment, we train the network on points in the interval 3 2 1

10 10 10 loss separate network for each ofin the figure five coordinatethat patches. the The individual training networks evolutionwhich exhibit is occur jumps shown in in close theimplementation, vicinity Monge-Ampère we to loss observe increases at that in different when the times, we overlap loss. do not In include contrast the to the overlap loss previous condition the the overlap condition, respectively.even Interestingly, if we they find wereThey that not increase nevertheless, being by these a optimized losses, end factor for of of training in 15 where the and they multiplicative 2.5 are ansatz, respectively included in when do the compared not optimizer. with blow-up. the loss at the Donaldson’s algorithm for (which are easily obtainable) improves theNote accuracies, and that we expect the thisperformance trend initial to of points continue. the at Fubini-Studythe zero metric, initialized perturbed epochs but untrained by provide NN.(right) a an to We zero; (small) also approximate in trained random other comparison the words, permutation NN we to of do with not the setting optimize the NN to solve the Kähler condition and that make up themeasure total loss has function gone arethe down plotted Monge-Ampère on from the loss, left.batch_size since After they 20 are epochs, the just proportional with proportionality constant more detail in appendix neural networks thatoutput have dimensions. three In hidden bothequation dense cases, ( we layers have and found a that the dense multiplicative output ansatz layer with and we use a separateand neural imaginary network for parts each of patch.with the Each the points network complex takes in structure as affine parameter input coordinates the real for the respective patch, together Figure 5 losses. ( without overlap loss (i.e. JHEP05(2021)013 σ 200 ): 175 in the . If one should 150 J 6 Middle = 1 125 κ 100 Epoch and is obtained from , this is not the 75 5 1 20 50 . The networks were > Mean Deviation from induced FS-Metric 2 < 25 | 11 π/ ψ h 3 | 0 0 1 2 <

° °

10

values used for training.

| °

10 | 10

| i | i g g

FS NN ) and by setting 10 ψ , and . This can be scaled out and 3 0 2.6 . , π 20 2 J 5 . X 17 accuracies (as explained above) and R , π/ Metric Network Induced FS 0 0 . σ 15 5 . 12 0 | . √ | 10 5 , namely Sigma-Accuracy . 7 ): deviation from the induced Fubini study metric iθ – 18 – 0 . 5 re 5 . = Right 2 ψ 0 . from our construction ( 0 accuracies for the second experiment, which are plotted 2 1 1 1 1 ° ° ° ° ° Ω for 10 10 10 10 10 σ £ £ £ £

4 3 2 6 θ æ 200 , and the performance in the range 10 175 . Losses Monge-Amp`ere K¨ahler Overlap Total < 6 150 | . By fixing ψ ¯ Ω | 125 ∧ 100 Ω Epoch ): evolution of the validation loss components for our network trained on the range X 75 R Average Loss per Epoch . The loss components are averaged over the respective patch-networks. ( Left 50 10 ), we fix the Kähler class. In more general setups with .( < 25 | 2.14 ψ 0 As a final comment, it should be noted that for the CYs considered in this paper, the We want to point out that the magnitudes of the losses in figures | 6 7 0 1 2 3 4 5 ° ° ° ° ° ° °

10 <

10 10 10 10 10 10 10 Loss wants to fix the Kählerenforce class, it one by computing can these easilyleave integrals add and the the adding Kähler Kähler them class class to unspecified,integration: as the but loss. input since then to Alternatively, one one the the can needs NNswith weights to and respect depend be to on careful the the with Kähler the Kähler class Monte of class, Carlo the they current, need approximate metric to after be each re-evaluated update. Kähler class is fixedabsorbed by in the volumeloss as ( measured by case. Then, one canappropriate simply basis integrate of the intersections pull-backnumbers, of of one pairs the can of then numerical divisors. Kähler pick out With form the knowledge over coefficients an of of the the intersection duals of divisor classes in the Monge-Ampère losses for thethen first compare experiment the to resultin to the the middle of figure not be compared directly,the as networks the were evaluated architectures,the are the relative different normalization, scaling for and of both thearchitectures the start experiments. losses, points close one to For where can the a induced look Fubini-Study rough at metric. estimate the Alternatively, one of beginning can of convert the training, since both overlap conditions are severely violated. Sincefor the all first patches, approach has the a difference singlenetwork between neural the in network two the experiments first comes experiment downthe to has second the shared experiment fact weights have that between separate the the weights that patches, are, while however, the simultaneously NNs optimized. in accuracy on different patches.is (We also have over multiple four networkstrained which different on are angles values trained of here.)extrapolation beyond This average the trainingduring training, set. averaged over ( each patch-network and across the Figure 6 0 accuracy of our networkindicate in the comparison with mean the value induced of Fubini-Study the accuracy. performance, The and data the points error bars show the minimal and maximal JHEP05(2021)013 (3.3) (3.4) (3.1) (3.2) SU(3) denotes y . satisfying the + Ω Ω . They read: four-dimensional d Ω y + . Ω = 1 and 1 2 , which make the above − J structures appear in the N can be trivially obtained = 3 = 0 , 5 arbitrary 3 W J Ω W 3 SU(3) ∧ ∧ 5 Ω = 0 J + and ∧ W ∧ J 2 dJ , W 2 ]. In that case, the requirement for a ∧ + y W 4 J W 31 holonomy. We will need a small amount dφ , W J structure, one can easily reconstruct the ,J 1 2 – W ∧ = and a complex three-form Ω 29 2 = 5 is the almost complex structure determined Ω = 4 ∧ J l W W SU(3) ∧ m SU(3) Ω) + Ω 2 1 1 + 3 J – 19 – i 3 4 ,W J W = W ( 4 ∧ of an = Ω , = ], before proceeding to discuss how one can set up Im d J Ω J where y J 1 3 2 21 Ω) 2 structure ,W ∧ ∧ − W ln J J, J obeying the algebraic conditions above, the nature of the 3 ( J and 1 l = 12 m unique. Frequently in what follows it will be useful to have ∧ W J = 0 to indicate real and imaginary parts, and the symbol J Ω = Ω) J = 2 dJ Ω d ), the other two classes d ± J, SU(3) = W structures in general and how they appear in com- ( dJ 3.3 y = mn and Ω 1 g i . 1 6 W Ω SU(3) dJ − structure is encoded in five torsion classes. These are determined in terms = structure on a real six-dimensional manifold can be specified by two nowhere 1 W SU(3) ). And given the data is the heterotic dilaton. 
In this section, as an example of how machine learn- SU(3) structures than the CY metrics with 3.2 φ As mentioned in the introduction, a wide variety of An structures, we will generate metrics associated to structures of this form. good supersymmetric vacuum can succinctly be stated as follows: Here ing techniques can be used to numerically find metrics associated to non-Ricci-flat all of the torsionbeen the classes focus vanish: offor previous this section. computational reduces However, ease, to this andpaper, special the we many case Ricci-flat will is other present CY frequently possible justclasses studied manifolds one torsion simply of illustrative that classes example: the have are the Strominger-HullMinkowski general of vacuum system constraints in interest. that on heterotic the are string In torsion theory required this [ for an from ( associated metric as by the three-form subject of string compactifications. By far the most widely studied case is that were Here, we use subscripts of contraction with indices beingthe raised torsion with classes the in metric. ( Given the expression for three of together with the conditions decomposition of straight forward formulae for extracting the torsion classes given Given the pair ofresulting forms of the exterior derivatives of numerical approaches to finding the associated metrics. vanishing forms. These arefollowing a algebraic real relations. two-form In this section, weSU(3) will turn our attentionof to background metrics on that arepactifications in associated particular. to Here, more wethe general will notation quickly summarize and the conventions needed material of following [ 3 CY metrics with JHEP05(2021)013 Ω and (3.7) (3.6) (3.5) SU(3) J . i real functions, m with the restriction in the ansatz for for all are structure is to ensure i 0 i J ) is automatic, the first Ω a + 1 structures on CY three- i 3.1 n SU(3) . are taken to be everywhere = 0 i k ]. The . The condition that the first i r a a Ω SU(3) q j 2 K r 32 a , i A − a P . 24 i +     n ijk 0 homogeneous equations in an ambient and replace the Λ 1 m . . . K K Ω structure metrics is somewhat similar in ) denotes the homogeneous multi-degree =1 1 m i K =1 A . . . 3.5 m = 0 P . . . q . . . q X 2 i,j,k 1 1 m . . . 1 SU(3) ) is that it automatically ensures that A Ω = q q = – 20 – ’s in ( , 2 3.6 1 | q m n 2 . . . n i J P A P i | a     + i 2 m ] if we set | structure for such choices of the undetermined functions, X 1 21 A = | is the usual expression for the closed holomorphic three-form are chosen to be nowhere vanishing. In addition, this ansatz J 0 2 SU(3) Ω are globally well-defined and nowhere vanishing. One approach to A . Each column of Ω . It is important to note that in most of the discussion that follows, m n and P are complex functions. This ansatz becomes the same as that which was and 1 2.6.2 defining equations in the coordinates of the ambient space factors. Clearly, 2 × A J A is the restriction to the CY manifold of an algebraic Kähler form for the K ... i J × and 1 )). Meanwhile, n 1 P A ]. 
This approach to machine learning The benefit of an ansatz such as ( Here, On such a manifold, we make the following ansatz The ansatz we will consider will provide (torsional) 2.11 ambient projective space factor (this can be derived from a Kähler potential described 21 are nowhere vanishing and globally well-defined if the th automatically defines an subject to one further condition.is While only the satisfied second if condition the in ( following relationship between the functions holds: structure being considered from thespace. integral We complex will structure see inherited from this the in ambient more detail shortly. Ω positive and if associated with the Ricci-flat structurewhile on the CICY [ used in the analyticof work Fubini-Study of Kähler [ forms.can be We important note in that that it including allows the us to form divorce the almost complex structure of the i by ( space of one of the the complex dimension ofChern such class a vanishes can manifold be is satisfied by insisting that Such a manifold is the common solution set of spirit to section we will consider the case where the moduli havefolds been described fixed as to a acan specific complete describe value. intersection such a in manifold products in of terms projective of a spaces configuration (CICYs). matrix. We One of the more difficultthat issues the in numerically forms searching foraddressing an this issue is toAs impose an an example for ansatz this, whichin we enforces [ will such consider behavior a from generalization the of outset. the ansatz that was considered 3.1 Learning an ansatz JHEP05(2021)013 , i ). a and 3.3 (3.8) (3.9) = 0 2 0 ) explicitly ) and ( Ω W ], even when and the real 2 structure, for 3.7 i 3.2 A 21 a y ) 2 2 A structure given the A SU(3) ] where + + 21 1 1 A A ( SU(3) ∂ i − 0 0 Ω Ω 1 ∧ A 0 ) gives rise to a ). y Ω ) 2 3.7 3.9 2 ijk ’s, perhaps with some additional ancillary A i A Λ i J in ( + + 3 4 1 2 1 A = W . A ( – 21 – k ∂ ) i J 2 2 A ∧ + A as exponentials of real functions would be sufficient j 0 + J + 2 Ω 1 1 y i A ∧ ) A 2 J i i A ( J J ∂ ∧ ). There is no need to have a contribution to the loss function ∧ ) i∂A and we regain the expressions produced in [ + 4 i 3.6 1 + ), our goal is to set up a NN which takes as input the real and ) W structure, we can compute its torsion classes using ( da 2 0 A structures described by the ansatz is no longer strongly linked to 2 ) holds. We have two options here. We can solve ( ( = 0 A 3.6 − Ω y A i y i 2 is defined via the following equation. 3.7 ), subject to the constraint ( J + 1 + A 1 da Λ 1 SU(3) i ( holonomy structure. As such, it no longer has to be integrable: we can parameters appearing in the 3.6 ∂A SU(3) A X A i ( with Fubini-Study Kähler forms. The almost complex structure which is i was simpler. We see again here that the generalization of the ansatz we are 2 ∂ 1 H X − i J 5 . The ansatz itself guarantees the former, and encoding the = = = = 0 = W SU(3) 4 5 3 1 2 2.7 W W W W W and the 2 In terms of loss functions, several of the requirements that should be imposed are Given the ansatz ( Given such an A , 1 in section and imaginary parts of to enforce the latter.above discussion, The if result ( for is one also of the guaranteed defining to functions be of an the ansatz in terms of the others. Or we can include a A data as we will describe shortly. 
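To make the role of the network concrete, the following minimal sketch (the class name, layer widths and input conventions are illustrative assumptions, not the implementation used for the results below) parametrizes the everywhere-positive real functions a_i of the ansatz as exponentials of unconstrained outputs, so that they are nowhere vanishing by construction, and returns the complex functions A_1 and A_2 through their real and imaginary parts; A_0 can be treated in the same way, or fixed in terms of the other functions through the constraint (3.7).

import torch
import torch.nn as nn

class AnsatzNet(nn.Module):
    """Map point data to the ansatz functions: positive a_i and complex A_1, A_2."""
    def __init__(self, n_in, n_pos, width=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(n_in, width), nn.LeakyReLU(),
            nn.Linear(width, width), nn.LeakyReLU(),
        )
        self.head_log_a = nn.Linear(width, n_pos)   # outputs log a_i, so a_i = exp(...) > 0
        self.head_A = nn.Linear(width, 4)           # real and imaginary parts of A_1, A_2

    def forward(self, x):
        h = self.body(x)
        a = torch.exp(self.head_log_a(h))           # nowhere vanishing by construction
        A = self.head_A(h)
        A1 = torch.complex(A[:, 0], A[:, 1])
        A2 = torch.complex(A[:, 2], A[:, 3])
        return a, A1, A2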
automatically satisfied by ( which aims to enforce global well-definedness and non-vanishing, for example, as we did describe non-integrable almost complex structuresthis on manner, the leading underlying to complex the manifold non-vanishing in imaginary parts of a pointcoordinates), on together the perhaps CICY with threefold the (indefining real terms and relation of imaginary if homogeneous parts such ambient of space some dependence coefficients is in desired. the As an output, the NN should give the the form of introducing does produce a qualitativereplacing difference to the that which appeared inassociated [ to the that of the Note that if we set We find that Thus the ansatz ( any appropriate choice of the functions that appear. In this expression, JHEP05(2021)013 (3.12) (3.13) (3.14) (3.10) (3.11) 5 . W n ) analytically, one would L 4

The loss contributions imposing the torsion-class constraints, eqs. (3.10)–(3.14), take the schematic form

L_{W_1} = |W_1| ,   L_{W_2} = |W_2| ,   L_{W_4,W_5} = |W_4 − (1/2) W_5| + |W_4 − dφ| ,

evaluated at the sampled points, together with the combined total

L_{Strominger} = L_{SU(3)} + Σ_i γ_i L_{W_i} .
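A minimal sketch of how the individual contributions might be combined with the relative weights discussed below; the dictionary keys and the helper name are assumptions, and the individual loss terms are taken to be computed elsewhere.

def strominger_loss(losses, gammas):
    """losses, gammas: dicts keyed by the contributions, e.g. "su3", "W1", "W2", "W45"."""
    return sum(gammas.get(name, 1.0) * value for name, value in losses.items())

# e.g. total = strominger_loss({"su3": L_su3, "W1": L_W1, "W2": L_W2, "W45": L_W45},
#                              {"W1": 1.0, "W2": 1.0, "W45": 1.0})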

2 L L 2 ). In the case where one solves ( 1 W = structure that we are trying to produce. Instead of imposing W L W 2.23 ), together with the condition SU(3) can therefore be enforced by including the loss function contribution 3.3 Strominger L SU(3) ) that ’s in ( allow us to weight the various conditions being imposed differently, L . Clearly, analogous loss functions could be set up for the constraints λ = 0 3.9 + 2 = 0 R ) and ( 1 W ∈ γ 3.2 is arbitrary so that we do not need to include these quantities in any loss i γ 3 would be. This leads us to include W φ Combining the contributions in ( The final set of conditions We see from ( The remaining contributions to the loss function would all be concerned with the take a given desired form. What would be required here would depend upon the i Here, the analogously to the of course set placed upon torsion classes by other string compactifications. total loss function, in theStrominger case system: where we are interested in mentioned above. Given the expressionscontributions for to the loss function: we currently do notdilaton know, in any realisticexample of application, the what extra the ancillary data profile that for can the sometimes heterotic be required in the output that was The condition system function. This just leavesCombining us ( with a loss function tryingW to enforce Kählerityphysical as application, with in different ( stringthe constructions torsion placing classes. different As constraintsused a upon if concrete a example, solution let to us the discuss Strominger the system loss ( functions that would be torsion classes of the contribution to the loss function of the form, JHEP05(2021)013 ] 7 – are 4 ), to a (3.15) (3.16) X 3.9 to be the Fubini- ] give the following 1 J structures, there is a 21 . , ) 1 a = 0 SU(3) -structure metrics. (Indeed, structures, there are, to our (ln 2 d -structure solution with torsion is the Kähler form derived from ,A ], and Donaldson’s approach [ = 2 SU(3) J 1 SU(3) 2 1 4 ). These choices lead, from ( a SU(3) -structure metrics of any kind in the W = 3.6 1 = 2 5 ) where SU(3) ,A . 3.6 2 2 ,W | | – 23 – a 4 p σ X | |∇ = 0 3 . In addition, the authors take 4 =0 3 1 4 π a X W P = = ]. For this solution, the authors of [ = 1 σ a 2 21 W restricted to the quintic CY threefold, rather than the more ] can rest on Yau’s theorem [ = 4 1 13 P ]. In the case of more general where , ) shows that these choices indeed lead to a solution to the torsion is the defining relation of the quintic hypersurface and the 9 W 33 p , 3.4 4 -structure metric. In developing NN’s to describe -structure metrics is viable, we will, in the next sub-section, show that the techniques To combat this issue, we will show that the numerical results that we will present in SU(3) Comparison with ( class constraints thattheory. arise To from show that considering theSU(3) methodology the being Strominger proposed in systembeing this of implemented section can for heterotic machine correctly string learning reproduce this known solution. Study Kähler formgeneral of algebraic Kähler potentialstorsion considered classes in ( In this expression, the homogeneous coordinates on on the quintic CYexpressions threefold for [ the functionsthe appearing Fubini-Study in Kähler ( potential: can use certain resultsKähler potential pertaining [ toknowledge, balanced no metrics such and existence theorems the available. 
algebraic ansatz forthe next the sub-section approximate an explicitly known theorems guarantee that athat solution the to numerical the approximationsthe system that system, exists. are rather being than Thissystem just obtained which is being admits are no metrics important close exact which solution. as toan approximate In it energy full the particular, functional shows solutions desired the [ method properties to utilizing in extremization of a one last issue thatthe we should address beforequestion moving as on to to the howconstructing subject to Ricci-flat evaluate of the directly metrics trustworthiness learning onover of CY the those manifolds results. aimed benefit Numerical at from methods producing several for more notable general advantages structures. One of these is that existence Running a full analysis ofattempt an at ansatz using of machine thewe learning type believe techniques described this to above is learn isphysics the too literature). first complex As work for such, on awe instead numerical first will of defer providing providing an explicit sample example computational in results this until sub-section, the next. However, there is 3.1.1 An example JHEP05(2021)013 . ) ). 3.6 = 0 3.15 struc- 2 SU(3) W . Such a = J 1 SU(3) ) and ( W 3.6 ) could be used 3.13 . This is important, ) that the ansatz ( 3.9 by using ( -structure metrics, we will 3.1.1 ) and ( Ω ) analytically. Nevertheless, ) and we have 3.7 3.12 SU(3) 3.3 and attempt to learn Ω structure structure, as we did in the last section, SU(3) is fixed by ( SU(3) ) with its complex conjugate, would also have to 1 a – 24 – Ω ) and simple contributions imposing the algebraic (ln ] and repeated in section d ) for an 21 2.21 = 2 3.6 5 for the CY case, one could try and learn the Kähler form W structure directly structure directly rather than leaning on an ansatz. Such an structure if we choose to solve ( 2.7 were nowhere vanishing, perhaps by constraining the eigenvalues Ω SU(3) SU(3) SU(3) and J ) can be implemented in a trivial manner. Contributions to the loss function structure obtained in [ 3.1 structure in what follows, we will specify ) we have provided relies on the existence of known nowhere vanishing forms on and the real and imaginary parts of the components of the complex three-form 3.6 SU(3) J at each point and the contraction of In more detail, we fix a three-form for the To illustrate this approach to numerically determining An important advantage to such an approach to numerically determining One possible strategy would be to take as inputs to a NN the real and imaginary SU(3) at those points. The global well definedness of the forms could be imposed using loss J of an since reproducing a known solution gives usgiven more the confidence lack in the of methods existence being theorems espoused, in this setting. In that case, the torsion class approach would take isat rather this clear stage given tothe the present a above concrete discussion, result. andsimplification Rather it has than is two attempting perhaps benefits. toSecond, useful in learn First, doing both so it forms we makes of will this show that initial we foray can guide into the such system work to simpler. learn the known example reproducing the torsion classes necessaryapproach for such a as given that typeto of being string obtain discussed compactification. any here, In pattern analogueson an of of the torsion ( manifold classes under desired, consideration. 
assuming that such aprovide pattern an is explicit possible example rather than outlining general formalism. The form a general tures for string compacitificationsconstraints would by be appropriate the choices abilitystructures to of ‘choose’ loss that any functions. have set been Ininvolved of and the torsion applied then analytic class to computes approachesproblem date, the to in torsion one classes that first that there makes can is an be no ansatz achieved. guarantee for This that the is a geometry a given shooting choice of ansatz may be capable of conditions ( guaranteeing that of be included. parts of points onform some algebraic variety, and asΩ outputs the components offunctions the contributions real similar two- to ( approach, while much more ambitious,ansatz clearly ( has potential advantages.the For space example, the on whichtechniques it can be is applied. defined In —is addition, restricting constrained we can in the see the possible from forms equation manifolds of ( to torsion which classes analogous it can give rise to. Starting from an ansatz suchhas as many ( advantages. Thealso resulting automatically structure an is automaticallyjust globally as well considered defined. in It section (and is threeform) of an 3.2 Learning the JHEP05(2021)013 (3.18) (3.19) (3.20) (3.17) structure ap- and the form loss function to J norm for the losses SU(3) 1 SU(3) error measure, i.e. the . In contrast, the metric η .

L_{SU(3)} = (1/N_pts) Σ_{i=1}^{N_pts} || J ∧ J ∧ J − (3i/4) Ω ∧ Ω̄ ||_n ,

L_W = (1/N_pts) Σ_{i=1}^{N_pts} γ_i || dJ ||_n ,

η_{SU(3)} = ⟨ | 1 − J ∧ J ∧ J / ((3i/4) Ω ∧ Ω̄) | ⟩ .

γ , i.e. an improvement of 5 orders of magnitude. pts 7 = 1 1 || dJ 2 SU(3) = . N n = 1 ), given a different name as we are no longer searching 4 = ). In order to try and reproduce the known solution of i W 0 i ≈ SU(3) L 3.4 ) 2.15 L . We also leave all other hyperparameters unchanged; in NN SU(3) and indeed we could leave this torsion class as an output of the g η via ( ( h , given the nowhere vanishing nature of the expression being used 5 = 10 = 0 J W 3 SU(3) ψ 1 2 η W h = to be one, and set -structure metric, we get ). For our torsion classes, the condition simply becomes i 4 γ ). Note that given the index structure we will impose on W 3.2 , we will look for a case where . We then have a complete specification of the torsion classes desired and can 3.1 SU(3) 3.1.1 3.1.1 being taken, the second of the constraints in that equation are automatic. This is . We also need to impose the transition loss ( We will use the same example as for the CY metrics in earlier sections, i.e. the quintic We will need several contributions to the loss function. First we implement a loss of Ω Ω We find that if we settion the to NN the to zero, i.e.after use training the gives FS metric as the lowest order approxima- training. As ato measure the for Fubini-Study how metric, muchdeparture we the from compute metric the improves Monge-Ampere thethe during equation equivalent test training of averaged set: as over the all compared points on the manifold in with one parameter particular, we choose the weightbe factor 10, all other and not weighting outliersfrom disproportionately the strongly). Fubini-Study metric. We use Figure multiplicative boosting Hence, we will use the loss which closely resembles the Kähler loss ( vanishing condition on for from the CY case. Inuse order equation to ( impose the torsion class constraints discussed above, we can in order to imposepearing the in first ( ofof the algebraic conditionsthe defining same the loss asfor appeared a in Ricci-flat ( metric. It is a useful fact that this loss function also enforces the nowhere section attempt to learn the two form of the the form this also fixes section find a solution with NN rather than specifyingwill its allow value. us However to in the verify case the at validity of hand, requiring our its techniques vanishing by recovering the known solution of If we wish to look for torsion classes compatible with the Strominger-Hull system, then JHEP05(2021)013 , Ω known (3.21) SU(3) E ). In addition, 50 48 structure. It does 3.17 46 is the known solution respectively. Thus we , we find that 44 42 017 SU(3) = 1 . 40 Losses known 0 Monge-Amp`ere K¨ahler Overlap Total g -structure example. n 38 36 n and some || 34 SU(3) 59 32 . for the output of the NN. Moreover, ), then it must be linear. Additional 0 30 ) we choose the transformation that known . In fact, some caution is required here g 28 2.4 025 − 26 . 3.21 0 24 epoch 3.1.1 . If we choose 22 20 numeric – 26 – g 18 || 3.1.1 Average loss per epoch 16 structures on six-manifolds. This gives us confidence = 14 12 10 known SU(3) 8 E ) is closely related to the loss function ( makes the improvement even more notable — showing that then we obtain values of 6 n 4 is the output of our trained NN and 3.20 = 2 2 n 0 . Change in loss during training for the numeric g 3 2 1 0
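A minimal sketch of estimating such a Monge-Ampère-type accuracy measure as a weighted average over the test points; the array names, and the convention of fixing the overall constant kappa from the sampled volumes, are assumptions rather than the exact normalization used for the quoted numbers.

import numpy as np

def eta_measure(det_g, omega_density, weights):
    """det_g: determinant of the approximate metric at each test point;
    omega_density: the Omega wedge Omega-bar density at the same points;
    weights: the Monte Carlo weights of the points."""
    kappa = np.sum(weights * det_g) / np.sum(weights * omega_density)
    return np.sum(weights * np.abs(1.0 - det_g / (kappa * omega_density))) / np.sum(weights)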

for the Fubini-Study metric to 10 10 10 10 loss . In order show that our numerics are approaching this known solution we 511 . Figure 7 0 3.1.1 One can imagine many long term goals of the approach to obtaining explicit Proceeding in this manner, we obtain a measure of how accurately our NN is repro- The error measure ( butions to the loss function designed to ensure that there are no small cycles anywhere in find that the machinereproducing learning known techniques results described for inthat this such techniques section can are be indeed useful capable in of this arenastructure going metrics forward. discussed in this section. For example, one could in principle add contri- ducing the analyticgoes solution from of section choosing higher values of the numerical result has fewerexample, outlying if regions we that choose are far from the desired solution. For which is the same for thethe numerical set and of exact solutions. possiblethat Imposing coordinate these must transformation two provides be constraints on usminimizes considered, with its and value. a small in list considering of ( possibilities computed from the quantities givenas in even section if thebe numerical related results by a were non-trivial approachingis coordinate the to transformation. preserve analytic the If expression, such formconstraints the a of are coordinate two the placed transformation could quintic upon ( this transformation by the requirement that it preserve not demonstrate that wesection are correctly approximatingwish the to analytic consider example an error described measure in of the following form. In this expression this quantity is only a measure of how close we are to JHEP05(2021)013 5 ) by 7 -structure. ). We also 4 SU(3) structure solutions with non- SU(3) ]. 21 -structure manifolds. Comparing to existing – 27 – SU(3) -holonomy and SU(3) structure. ) n ). This latter approach is a crucial step away from past approaches (which were, 6 SU( reproducing the known, exact results of [ observe that the loss canout be with, kept while at a atmagnitude. small the order We same hence compared consider time to these the what solutionsflat as Monge-Ampère-loss we metrics. non-trivial have is approximations started This for changed Ricci- allows by usWe an to demonstrate order also for search of for the general first solutions with time that NNs can find such solutions (figure and by construction, tied to Kählerfor geometry) and the first to beOne generalizable additional to difficulty metrics arises foron our the directly learned overlap metrics, and namely for that the the loss Kähler condition is non-vanishing. Pragmatically, we find that our metrics generalizegeneral well the beyond runtime the of all range networkson they is standard have very been desktop reasonable trained and CPU our on. or results GPU In can systems. We be have obtained presented themetric: viability of 1) learning two the distinct Kähler approaches potential to and 2) approximating directly a learning CY the metric (figures We have demonstrated that MLthe is case a of viable approachmethods, to we finding find Ricci-flat metrics thatalgebraic in networks metrics with outperform relatively Donaldson’srespect algorithm few in to dense terms layers the of converging achieved efficiency to accuracy (i.e. the with given a certain runtime, cf. 
figure The key results of this work include the following: • • tively study moduli dependencethe of Donaldson CY algorithm metrics formoduli (something example, space) very and which difficult importantly, isby to to approximating formulated achieve move Ricci-flat with at away but from a non-Kähler metrics the single for complex/Kähler point manifolds regime in of entirely, special the structure. CY or surpassing those obtainedminimization, from while methods at such as thependence. the same Donaldson In time algorithm particular, naturallycertain and the including quantitative energy and techniques complex qualitative presented structure improvementssignificant in on moduli qualitative the the de- advances prior previous being state sections that of provide the machine both art. learning techniques The most allow us to effec- CY geometries play an importantno role explicit, in analytic string CYin compactifications. metrics a However, are the wide fact known rangetions that has of has formed physical been applications. a long-standing. substantialML As In barrier can a this to serve result, work, progress as the we an need have important demonstrated for addition that numerical the to approxima- techniques this of literature, producing results on par with trivial torsion to date).the scope Even of the the mostto current basic paper, future implementation work. however, and of we this leave approach the is exploration beyond of4 such possibilities Conclusions and future directions the target geometry (an issue common to all known JHEP05(2021)013 ]. ]). ]. 17 36 35 , 34 , 11 , 5 given an explicit -structure metrics, or indeed to any special rather than just non-vanishing components due – 28 – fibration according to the SYZ conjecture [ 3 H SU(3) T -structure solutions, our methods have a potentially im- . This is in contrast to most other available methods of gen- SU(3) ). 3 -structure solutions, which often fix the torsion classes. This flexibility holonomy/structure could potentially provide interesting new classes of 2 G and SU(3) 2 structure studied here as a proof of concept that ML methods are capable ]). These masses, together with their moduli dependence, play an important role 8 are optimizing all components of portant flexibility in thatchoice of it torsion is classes possibleerating to approximate acould prove metric useful in applications within string model building. moduli dependence of CYrithm metrics. to In obtain expressions particular, for we themoduli have CY space applied metric at and Donaldson’s different algo- then points(figures trained in complex a structure NN to learn fromWithin that the the context CS of moduli dependence We have demonstrated that ML can shine light on previously difficult to determine SU(3) Finally, our primary goal in beginning this study was the hope that these tools will It is clear that are a number of natural and very related geometric applications for these There are many possible directions in which this work could be extended or applied • • = 6 be of use in applicationsAs to mentioned string previously, phenomenology canonically andparticle the normalized masses/excitations study in kinetic of string terms vacua, the(see and string are e.g. for swampland. [ needed this the to explicitin determine metric the must recent be discussion known of the string swampland, especially in the distance conjecture [ (something that has already beenLastly, we attempted could via use the the approximate Donaldson metricsstructure. 
Algorithm generated here [ This to could probe theoretically include expectedthe decompositions case of of the elliptic metricable or into to K3 fiber/base see fibrations, components that or in any in CY the manifold large is complex a structure limit one should be tools and the approximateinvolve CY additional metrics geometric we data have in generated.cial the sub-cycles form Many (including of Special string slope-stable Lagrangian compactifications subvarieties vector ofhave bundles, developed CY here 3-folds). fluxes, could or The readily spe- techniques be we ample extended the to associated learn these Hermitian-Yang-Mills connection associated on structures a — for slope ex- poly-stable vector bundle tigations into more generalstructure classes manifold. of Asmanifolds with one particular example,examples, where applying existing ML examples/constructions techniques are scarce. to metrics for k to symmetry constraints (whichthat have algorithms been can actually heavily finishwith employed in in finite previous time). workof In to producing a ensure related non-Kähler spirit, results. we view Clearly, the it metrics would be of interest to continue such inves- in the future. Beginningextended with to CY more metrics, generalfocused it on algebraic is the varieties. clear quintic one thataccommodate parameter For our the hypersurfaces. concreteness additional approach complexity However, in could of ourambient complete be the architectures intersection spaces. can readily present manifolds easily in work, Towards more this general we end, we find it encouraging that our algebraic metrics for JHEP05(2021)013 . , i for Ge- 6= (A.1) j = 0  ) i with ( . ~z j z ψ 2.7 p complex numbers . These numbers, , we go to a patch n X +1 n ), we have in the affine P , SCGP programs on = 3 n LA and JG are supported in . and . We then solve 5 ) i  ( j z = 0 (0) i is to first sample has a solution for the last coordinate. z ψ  and X of the ambient space, uniquely define a -dimensional variety 4 ) =1 n i i i X ( ) = 0 i U ) z of the ambient space i ( , and we solve for a coordinate Neural Networks and Data Science Revolution i ~z ( U – 29 – ψ = 1 + = 1 p and i  z (0) ~z  0 p where +1 n P , where we have skipped over  String Theory, Geometry and String Model Building ) . Depending on the manifold, one may have to restrict the sampling of the +1 i ( n that X Recent Developments in Strings and Gravity 2019. 0 U , . . . , z For example, in the case of the Fermat quintic ( Recall that to obtain an affine patch of an to obtain affine coordinates in patch ) of the ambient space i ) ( 0 i i ( j z patch point on initial coordinates so thatNote the that, equation if the definingbe polynomial able is symmetric to under use coordinatepoints the permutation, on coordinates one other may generated patches. for a point on one patch to immediately obtain U Thus, a basic approach for generating points on z along with information specifying the chart A.1 Sampling by solvingWe for use the the dependent followingutilizing coordinate one method network to per generate coordinate patch, training which data are presented for in our section metric neural networks This appendix summarizesmarizes known our results conventions. from We thehypersurface. start We literature by then discussing present about a two samplingcan simple methods example lead and for on to why sampling sum- a restricting point points tois sample the on implemented. 
with CY the a hypersurface non-flat prior and discuss how our re-weighting of points shop on part by NSF grant PHY-2014086. NR is supported by NSFA grant PHY-1720321. Sampling and Dieter Lüst forworkshops helpful in discussions. recent years We whichresearch are and allowed very to us present grateful to preliminary to discussprogram results in on various of this various conferences work. combinations and about Inometry particular and this this Physics includes: of Hitchin MITP String Systems Phenomenology conferences in 2018 and 2019, DLAP 2019 in Kyoto and Corfu work- moduli stabilization. We hope to turn to some of theseAcknowledgments open questions in future work. We would like to thank Chris Beasley, Michael Douglas, Koji Hashimoto, Andre Lukas Finally, the moduli dependence of the metric will also play a vital role in the quest for JHEP05(2021)013 using (A.2) X 1.0 and throw . This type 8 1] , 1 0.5 − [ × 1] , 1 0.0 − [ and sampled with a flat prior iϕ . : -0.5 5  re  ) using the example of the unit disk. (0) 3 = (0) i z , z iy right  (0) 2 -1.0 + 1.0 0.5 0.0 3 -0.5 -1.0 =1 ) and how the induced measure for sampling , z i X x (0) 1 left − z 1  – 30 – 1.0 − ], and we comment below on the implementation. 5 6 v u u t given = 0.5 (0) 4 (0) 4 z z 0.0 , one would get only points inside the unit disk. However, as shown ] (with all but one affine coordinate fixed). A fast method to do this is π 2 -0.5 , . [0 X ) = 0 ~z ∈ ( ψ ϕ on the right, the induced measure on the disk is not flat. In our case, the way p . Illustration of rejection sampling ( , -1.0 8 . So if one used spherical coordinates , one can then solve for 0.5 0.0 1.0 1] -0.5 -1.0 ψ 4 , The crucial step is to find the solutions of the single-variable complex polynomial [0 is j ∈ weights of each point) is explained in [ A.3 Homogeneous sampling inFor projective all space other network discussedintersections with in a this line, paper, which we involves sample the points following steps: on the manifold of rejection sampling works inlarger our case as well, butr it is extremely ineffective,in especially figure at to correct the auxiliary measure to account for this sample bias (i.e. how to compute the A.2 Illustration of rejectionTo sampling illustrate that the measureconsider points when on restricting the to unitdisk the disk. would CY One be hypersurface way of is toaway getting non-flat, those just a points let randomly flat that us sample do distribution not of points lie points in inside inside the the the disk, interval cf. the left-hand-side of figure five points on equation by computing the eigenvalues of the polynomial’s companion matrix. Since there are in general five fifth roots, one gets for each choice of initial complex values Figure 8 spherical coordinates with a flat prior can be non-flat ( If JHEP05(2021)013 +2) + 2 n (A.3) n 2( S 1.0 Real Imag . 0.5 ], making it more = 1 | 0 37 z 1 z 0.0 : | ≤ | t 2 . This can either be done z | X , | 1 line intersection z | 0.5 , 1.0 ) = 0 defined by b ~ t 0 by first sampling real numbers from , thereby defining a complex line. ˆ U + 1.0 +1 +1 ~a n ( n – 31 – P ψ Real Imag P , generated using either sampling algorithm introduced p ∈ , for example by finding the eigenvalues of the poly- b ~ = 0 t 0.5 2 ~a, z 1 z 0 z 1 z 0.0 + 10 3 2 z . . + 9 b ~ t 0.5 3 1 z is the defining of + + ) ~a 3 0 solve for dependent coordinate solve for ~z z ( = ψ 1.0 ~z p . The values all lie in the affine patch 0 5 5 0 0 ...... 
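A minimal sketch of this procedure for the Fermat quintic in the patch z_0 = 1; the Fermat defining polynomial and the flat prior on the unit box are purely illustrative, and the one-parameter family p_ψ used in the main text works the same way with a different polynomial.

import numpy as np

def sample_point_fermat(rng):
    """Draw z_1, z_2, z_3 with a flat prior and solve 1 + z_1^5 + ... + z_4^5 = 0 for z_4."""
    z123 = rng.uniform(-1, 1, size=3) + 1j * rng.uniform(-1, 1, size=3)
    z4 = (-(1.0 + np.sum(z123 ** 5))) ** (1.0 / 5.0)   # one of the five possible roots
    return np.concatenate(([1.0 + 0j], z123, [z4]))

rng = np.random.default_rng(0)
points = np.stack([sample_point_fermat(rng) for _ in range(1000)])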
Figure 9. Scatter-plot comparing real and imaginary parts of the affine coordinates of the variety.


2 z points nomial’s companion matrix or simply numerically. where manually given a specific definingtions. equation, or This using was a done libraryeasily for for the extendable symbolic manipula- implementation to here other using defining SymPy equations) [ defined by Solve the defining equation for Due to the multiplicity of roots, each chosen line intersects the manifold in Uniformly sample two points Compute the following polynomial in the complex variable Since the line is chosen uniformly in projective space, this sampling algorithm leads to A specific example of the difference between the two sampling algorithms defined above One can uniformly sample points on 2 3. 4. 1. 2. P points on the manifold thatto are the not Fubini-Study metric uniform on to the its ambient volume space. form, butcan uniform be with found respect in figure and combining them into complexare numbers multiple representing algorithms homogeneous for coordinates. samplingdently points There sample on coordinates a from real sphere; a an normal efficient distribution one and is then to divide indepen- by their norm. Figure 9 in in section JHEP05(2021)013 . 1 ] 4 (B.6) (B.3) (B.4) (B.5) (B.1) (B.2) can be
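A minimal sketch of the line-intersection sampling just described, again using the Fermat quintic purely for illustration; numpy.roots plays the role of the companion-matrix eigenvalue computation, and the function names are assumptions.

from math import comb
import numpy as np

def random_cp4_point(rng):
    x = rng.normal(size=10)
    x /= np.linalg.norm(x)                        # uniform point on S^9
    return x[:5] + 1j * x[5:]                     # homogeneous coordinates on P^4

def line_intersections(rng):
    a, b = random_cp4_point(rng), random_cp4_point(rng)
    # p(a + t b) = sum_i (a_i + t b_i)^5 is a degree-5 polynomial in t;
    # its coefficient of t^k is sum_i C(5, k) a_i^(5-k) b_i^k.
    coeffs = [np.sum(comb(5, k) * a ** (5 - k) * b ** k) for k in range(6)]
    ts = np.roots(coeffs[::-1])                   # numpy.roots expects the highest power first
    return [a + t * b for t in ts]                # the five intersection points with the quintic

points = line_intersections(np.random.default_rng(1))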

{s_α} , α = 1, …, N_k , spanning H^0(X, O_X(k)) , the degree-k homogeneous polynomials in C[z_0, …, z_4] modulo the ideal generated by p_ψ ,

N_k = (k+4 choose 4) − (k−1 choose 4) .

= 0 j , 5 0 z equations can be used to eliminate one monomial. One choice is to remove j z 5 . The reason is that on corresponds to the row vector + 2 vanishes, the following relations are generated is then given by multiplying n ψ (5) i p 3 ) ≥ P So far, the discussion of how the reduced is obtained was on a mathe- To make this clearer, consider ~z k ( O ψ p of Each of these all monomials matical level. In practice,example, the the sections monomial can be represented using a matrix of . For Since The second term isder precisely pullback. the (We number follow ofis the sections zero). convention that that become a linearly dependent coefficient un- with negativeh entries We get the following expression for the number of basis sections of Another perspective on this isrewritten that to each express linearly one independent of polynomial the in constituent monomials in terms of the remaining monomials. that all polynomials containing a basis. Formally, the basis is defined as where B.1 Constructing the monomialWhen basis the basis of thein line the bundle homogeneous projective coordinates,for is restricted to B Algebraic metrics and Donaldson’s algorithm JHEP05(2021)013 (B.8) (B.7) could be 0] , . [2 i ]. To test our ~z = 27 ~z
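A sketch of this exponent-matrix representation: each degree-k monomial is a row of exponents, and every row divisible by the monomial eliminated through p_ψ = 0 (here z_0^5, an illustrative choice) is removed. For k = 6 this leaves 205 rows, in agreement with the counting formula above.

from itertools import combinations_with_replacement
import numpy as np

def degree_k_monomials(k, n_coords=5):
    rows = []
    for combo in combinations_with_replacement(range(n_coords), k):
        row = np.zeros(n_coords, dtype=int)
        for i in combo:
            row[i] += 1
        rows.append(row)
    return np.array(rows)                               # one row of exponents per monomial

def reduce_on_quintic(monomials, eliminated=(5, 0, 0, 0, 0)):
    divisible = np.all(monomials - np.array(eliminated) >= 0, axis=1)
    return monomials[~divisible]                        # drop monomials divisible by z_0^5

basis_k6 = reduce_on_quintic(degree_k_monomials(6))     # 205 monomials remain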

and for this choice (e.g. ¯ Ω 2] 3 , ∧ ]. The algorithm relies J X ). This is described in [0 Ω k 12 , i on 7 w . p , ¯ β 6 for the implementation. i (0) α X A H p 1 N ,...,N ] an approximation scheme for Ricci- ]. The algorithm finds the balanced depends on 4 can be written as a matrix with each → ], so we will just outline the different = 1 k matrix 12 3 i , N J 12 , each summand corresponds to a row in 7 k (2) , , 2 i 1 1 . X N ~z z P 7 . The (multi-) degrees fix a direction in the of the line bundle which restrict non-trivially    Z , of the induced distribution of sampled points × O 6 k + k p 2 0 2 0 1 1 0 2 z – 33 – N    → ∞ , ) and uses numerical integration paired with an k i ~z ) = ,...,N w ~z ,...,N ( i p = 1 X of an ample line bundle to work with. The approximation = 1 p α . Both the generation of the monomial basis on projective 1 i , k N , X in question (where α i s w → ], Donaldson presented in [ X ¯ Ω 20 having an embedding into projective spaces (whose homogeneous ∧ X Ω X Z . B.1 . (These are not drawn from a flat prior even though the ambient points were.) X In terms of these weights, the numerical integration reduces to by intersecting a lineambient defined space by with two the randomly, CY uniformly manifold). distributed See points appendix in the on to the CY manifold section error was proven to goPicard to lattice zero dual as to the Kähler cone. Choose a random initial Hermitian Compute the weights Find a basis of sections Fix a complex structure and find points Choose a (multi-) degree The algorithm is described in detail in [ 5. 4. 2. 3. 1. implementations, we compared themetric results as with follows: [ on the CY manifold coordinates we denote collectivelyiteration procedure by to approximate the Ricci-flat metric. steps. We have implemented the algorithm in Mathematica and JAX [ B.2 Donaldson’s algorithm Extending work of Tian [ flat CY metrics, which lends itselfmethod to was numerical adopted in implementation the on physics a literature computer. soon afterwards Indeed, [ the Given a defining equation suchthe as matrix. Solving forto either deleting summand a and row in removingremoved the it to matrix. from obtain For the a the basis basisspace, current on thus example, and corresponds either the of reductionallows the given defining equation a to defining be replaced polynomial without adding can significant implementation be work. done algorithmically. This row representing a monomial: The full basis of monomials of, for example, JHEP05(2021)013 is ) ` ( (B.9) (B.11) (B.12) (B.13) (B.10) H is a sim- 1 and , this is just , we can think +2) +1) ` d X ( ( × H +2) Algorithm d ( 1 . = ) ) i i ~z H ~z , we need to perform two more ( ( ¯ δ ¯ β .  ¯ s CY . In practice around 10–20 steps ¯ ) s β ¯ δ ) † ` g )¯ s ` ( γ i ( ¯ C β , and ~z H · ( α ~z H ) α i H s ~z = ≈ , α CY i ( ˆ s g γ s K, µ a w ·  s b , i +1) i ∂ ∂z ` ln ∂x C ( a P P . ∂ = 1 H = ~z 1 πk i – 34 – ) = k = 2 k w g µa . It should be noted that this can be computed in (ˆ ab N = C ∗ ˆ g X P in an affine patch but prior to pullback is given by i K ˆ g = = ) ¯ β ` ( α CY g ˜ H and return to the previous step. Alternatively, sample new affine coordinates as being (implicit) functions of the others. metric 1 ) −  m + 3 ) ` . ( 6 m ˜ − H (3 +  10 × = ) pullback map is given by m ) are local coordinates on +1) m ` ( µ x coordinates that are to be eliminated. The pulled back metric is then (3 + H (3 + m From this example, we seethe FS that Kähler for potential. 
smaller than points and re-calculate the weights and then go to step 6. are typically enough. Westeps terminate or the when procedure the either maximum after absolute a value certain of number the of difference of The sum over the points and the weights appear from the numerical integration. × of the remaining The Ricci-flat Kähler metric is given in terms of the Kähler potential Repeat until we reach a fixpoint, i.e. Set Compute Second, we need to pull back the metric computed from the Kähler potential, which is In order to arrive at an expression for the CY metric 3 9. 8. 7. 6. m Implementation of Donaldson’s algorithmplified in Python pseudocode pseudocode. which illustrates howis a computed. single iteration of Donaldson’s algorithm where the terms of derivatives of thefor defining the equations with no need to actually solve the equations of Since the the done by going to anthe affine largest patch. absolute values We to goWe unity to denote in the the order patch affine to where patch ensure we coordinates numerical scale stability by the of coordinates theproduced algorithm. with by the algorithm above, to the CY manifold. On the CY space The metric found in this fixpoint procedure is calledsteps. balanced. First, we need to account for the projective rescaling degrees of freedom. This is best JHEP05(2021)013 ‘ k ‘ degree . simplicity of for , points ) matrix ψ , vol_cy, count): now ‘- ψ H ‘ sample ‘ point variable the count ) this for ψ ‘ – 35 – single in a ¯ over β s ¯ β α operator We leave a more systematic study of this observation for networks in relation to their absolute value, we can identify two sum H T 7 α ¯ β s returns integral ¯ s ψ H α . The blue cluster, containing most of the components, corre- s (count): the the Carlo 10 ) and this s k range new_H Monte in a = conj( = compute_monomials(z, patch, k) Simplified algorithm for computing a single iteration of Donaldson’s algorithm. pretend i . ¯ T = T + dT / count dT = numerator / denominator * weight denominator = numerator = s s weight = mc_weight(variety, z, patch, z, patch = to_affine(z) z = line_sample(variety, # accumulate by Approximate donaldson_step(variety, H, k, pows, return new_H = invert(T).transpose() T = T * basis_size(variety, k) / vol_cy for T = zeros_like(H) # # # def ] for work on detecting symmetries in theoretical high-energy settings using NNs. 9 8 7 6 5 4 3 2 1 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 38 matrix with and without using data obtained using Donaldson’s algorithm. H See [ 7 Algorithm 1 C Details for training In this appendix, we provide more detailsthe on the experiments we have performed for learning polynomials under the discretethe symmetries numerical as approximation it is should.metries valid. in This Conversely, the provides it Kähler a can potential. the cross-check serve future. that to detect underlying sym- When evaluating the relative standardfor deviation multiple over values iterations of ofclusters Donaldson’s as shown algorithm in figure sponds to elements which areThe essentially vanishing orange and the cluster fluctuations has areThe small relatively number large. fluctuations of but vertical includes lines several in elements the which orange are cluster small. matches with the number of invariant B.2.1 Finding equivariant elements via Donaldson’s algorithm JHEP05(2021)013 has 1 0 p 0 0 would 1 6 , ): close- 1 1 4 = 2 ), which is 0 | 1 6 Hence, even k = 9 H 2.14 5 | k 2 4 1 4 to the input. 
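For reference, a cleaned-up reading of the Algorithm 1 listing above. The helpers line_sample, to_affine, mc_weight, compute_monomials and basis_size are the names used in the original pseudocode and are assumed to be defined elsewhere; some of their arguments are omitted here for brevity.

import numpy as np

def donaldson_step(variety, H, k, vol_cy, count):
    """One iteration of Donaldson's T-operator, estimated by a Monte Carlo sum over points."""
    T = np.zeros_like(H)
    for _ in range(count):
        z = line_sample(variety)                         # sample a point on the manifold
        z, patch = to_affine(z)                          # go to an affine patch
        weight = mc_weight(variety, z, patch)            # Monte Carlo weight of the point
        s = compute_monomials(z, patch, k)               # values of the basis sections s_alpha
        numerator = np.outer(s, np.conj(s))              # s_alpha conj(s_beta)
        denominator = s @ H @ np.conj(s)                 # sum_ab H^{a b-bar} s_a conj(s_b)
        T += numerator / denominator * weight / count
    T *= basis_size(variety, k) / vol_cy
    return np.linalg.inv(T).T                            # the updated H matrix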
However, Bottom row 0 0 1 ψ 1 ): orange clusters correspond as defined in ( 1 σ 0 1 1 0 0 independent (real and imaginary) 4 1 in the specified range. Top row 4 2 k , 1 ψ 5 N .( 1 | 0 5 1 5 = 8 and H | k 1 = 10 0 3 0 = 3 ψ 1 0 . The input is (the real part, imaginary part, 1 k – 36 – as features. for 1 | ψ | H gets larger. | , and ) ψ | ψ ( 0 2 1 8 , Im randomly from a flat prior, it will already have roughly zero mean), which and the output are the ) , 0 4 ψ ψ had zero error, the numerical error dictated by using 1 ( ψ 0 0 | 4 8 0 for the quintic with H 1 = as a measure for precision. In our experiment, we trained the network H | gets larger as k 39 . 3 σ 8 σ 0 . 0 1 H and 14 . Clustering of elements in . , positive powers tend to produce rather large features. So one should either normalize them to 0 ψ 1 1 when using 0 0 1 2 As explained above, we choose the patch where we set the largest absolute value of the . 1
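For concreteness, the supervised-regression network of table 1, sketched in PyTorch; the framework choice and the function name are assumptions, while the hidden-layer sizes (100, 2000, 2000), the leaky-ReLU activations and the linear output layer follow the table.

import torch.nn as nn

def make_regressor(n_in, n_out):
    """n_in: real input features built from psi; n_out: independent real components of H."""
    return nn.Sequential(
        nn.Linear(n_in, 100), nn.LeakyReLU(),
        nn.Linear(100, 2000), nn.LeakyReLU(),
        nn.Linear(2000, 2000), nn.LeakyReLU(),
        nn.Linear(2000, n_out),                 # identity activation on the output layer
    )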

We ran experiments where we added (the real and imaginary part of) powers of We observe that


r o t a r e p o - T f o s n o i t a r e t i r e v o ) H ( d t s l e r for large unit variance (since we draw is problematic if onechoose wants a to branch extrapolate or beyondsmall, include the we all training ended branches set. up as For using features. fractional Re powers, Since one the will observed have accuracy to improvements are rather between if the NN computing be with 90 percent of the grid points and evaluated on the remaining 10 percent. We train with (leaky) ReLU activation,and cf. absolute table value of) components of coordinates to 1 andthe solve largest implicitly absolute for value. the With these coordinate results, for we which compute the derivative of C.1 Supervised trainingIn with designing Donaldson’s and algorithm training theparameter tuning NN, and we does found not thatwe require chose the complicated a result network simple is architectures. feed-forward not NN For very this with sensitive paper, 3 to hidden hyper- layers of dimensions 100, 2000 and 2000 Figure 10 to vanishing elements,up blue view clusters of correspond theequivariant to components. blue non-vanishing clusters. values. The ( number of vertical lines is in close relation to the number of JHEP05(2021)013 . . k H H and (C.1) ψ weight 2 ). Both losses 2 ] for degrees up k 9 , the Ricci-based -dependence of C.1 N 6 ψ + ≤ 2 k k – ψ N 400 × 101 000 1 001 000 measure at each degree σ 1000 , 2 Number of Parameters | ) , consider the case of fixed a z H ( is one based on minimizing the Ricci R . | , defined as the one that minimizes the ) ψ a – H 2.4 z is the associated weight. Because this loss ( ) identity w a Activation z leaky ReLU leaky ReLU leaky ReLU – 37 – ( =1 M . This is precisely the situation that was explored a X w k 1 M . This establishes the basic stochastic gradient descent = ψ 0 2 R k 3 by least-squares, we use multiple steps of gradient descent, N 100 for a given value of 1000 1000 . This takes less than a minute and is orders of magnitude accuracies achieved by a similar gradient descent setup as in the H H 001 σ . 0 Number of Nodes by minimizing the Ricci loss by minimizing the Monge-Ampère loss -dependent networks that output H H ψ shows the and several values of input Layer output hidden 3 hidden 2 hidden 1 11 is the number of points and = 6 . Neural network architecture for the neural network that learns the k M ]. As a first step towards ML, we have repeated the optimization using stochastic Figure 9 . We now want to find the optimal matrix Table 1 loss converged within tens of minutes. previous section, using instead theshould Ricci-scalar have loss the defined same inNo global equation exhaustive ( minimum parameter with search respect was to conducted the to obtain optimal convergence in either where depends on the Kählerto potential in compute its than fourth thematrix derivative, Monge-Ampère it converged loss is within applied significantly minutes more in for expensive the the Monge-Ampère rest of loss with the paper. Where the C.3 Learning The second type ofcurvature. loss Here we introduced use in the section Ricci scalar as a loss function sample of points fordecreased, each batch while has avoiding over-fitting. the advantagedegree We that have replicated the the number results ofsetup in points that [ used will can be be used for the more complex models. Monge-Ampère loss at the givenin degree [ gradient descent. Thethe main manifold difference is and that finding each instead time of computed picking over a a large random set batch of of points fewer on points. 
Choosing a different random faster than re-computing C.2 Learning Before training k the NN for 200decay epochs with with parameter stochastic gradient descent, ADAM optimizer and L JHEP05(2021)013 . 6 H 0 = (C.2)
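A minimal sketch of this stochastic optimization of H, drawing a fresh random batch of points at every step; sample_batch and monge_ampere_loss are assumed helpers returning a point batch and a real scalar loss, and the Hermiticity projection and hyperparameters shown are illustrative.

import torch

N_k = 35                                        # e.g. the number of sections at degree k = 3
H_re = torch.randn(N_k, N_k, requires_grad=True)
H_im = torch.randn(N_k, N_k, requires_grad=True)
opt = torch.optim.Adam([H_re, H_im], lr=1e-3)

for step in range(2000):
    H = torch.complex(H_re, H_im)
    H = 0.5 * (H + H.conj().T)                  # project onto Hermitian matrices
    batch = sample_batch(n_points=1000)         # a fresh random set of points every step
    loss = monge_ampere_loss(H, batch)
    opt.zero_grad()
    loss.backward()
    opt.step()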

L = (1/M) Σ_{a=1}^{M} w(z_a) | R(z_a) − c | ,

, =1 M 5 n a X o 5 = s 1 d l M a n = o c 4 D . dependence ) Gradient descent optimization for descent optimization Gradient R s s ψ o ( ψ k l - 0 to a set of real input features. We have tried several possibilities: = ψ 3 and arg | ψ 2 | -accuracies achieved by an optimization of the σ . networks with 1 denotes the target curvature. c H Introduce an additional array of powers andRaise compute to a power and split into real and imaginary parts, Re Split into 1 2 Following the choice of input features, we add some number of dense layers (how many 0 0 • • • 1 1 are best seems to depend on the range of We now want to findSince a within the network network that we describes wantmap to a the work map with complex real from value numbers, we first have to choose how to where C.4 complicated loss functions, dependingto on its higher higher derivatives complexity, we ofit have the not has Kähler pursued the the potential. advantage Ricci thatnot loss Due it further further can investigated in be here: this extended work. to However, the case of constant Ricci scalar, which is respect to the Monge-Ampère loss on the left, and thecase, Ricci so loss of the equation results ( approximations should of be the understood flatMonge-Ampère as metric. loss. showing This both Convergence shows losses is that are gradient similar descent feasible is to and in the principle lead one also to possible achieved for using more the Figure 11 JHEP05(2021)013 0 (C.3) (C.4) = 20.0 , 2 1 trained over = k
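A sketch of the input-feature choices listed above for the complex structure modulus ψ; the function name and the default set of powers are assumptions.

import numpy as np

def psi_features(psi, mode="powers", powers=(1, 2, 3)):
    if mode == "cartesian":
        return np.array([psi.real, psi.imag])
    if mode == "polar":
        return np.array([np.abs(psi), np.angle(psi)])
    vals = [psi ** p for p in powers]                    # powers of psi, split into Re/Im
    return np.array([x for v in vals for x in (v.real, v.imag)])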

H = L L† .

n L using the MA loss. The shaded area points out the extrapolation of the o s d 10 l 2.5 a < n matrix directly, we also used a parametrization via the following Cholesky o | D ψ H | DenseModel-1 . Accuracies achieved by two network architectures detailed in section 0.0 denotes the sigmoid-function. This makes it more stable for gradient descent to set σ 1 2 are shown. The error band indicates the minimal and maximal values measured over multiple During our experiments we observed that it is beneficial to multiply the final output 0 0 ψ 1 1 this leads to slightly more stable gradient descentC.5 training. Network architectures The following is aresults brief in summary the above of figures. different network architectures used to produce the where now the diagonalwhich entries may are lead positive. to non-definite This metrics prevents on negative or the zero manifold. eigenvalues, Our experiments indicate that where some output values tothe zero. Hermitian Besides parametrizingdecomposition: the real and imaginary parameters of trivial activation function maps from the last features toparameters the by required a number modulation of factor values. as in of complex angles. sigmoid activation function. This numberof of dense hidden layers layers. is referred In to order above as to the get number the right number of parameters, a final dense layer with Figure 12 uniform values networks. As a comparison, the JHEP05(2021)013 , σ ) ψ ( with i − raised to 10 matrix, we ψ and arg . The output , in our case | H k k ψ | N N ) need to be rather 2.23 . in ( i (two components), and of a λ = 10 ψ ψ = 10 ψ ) for . To construct the final H weight decays with parameter 2 2.7.1 L – 40 – and 6] , [3 , the input to the neural network consists of the real on a dataset with 50000 points, which we split according , each. ∈ k i 2.7 N = 10 with ψ i . This is followed by a single hidden layer of size − 2 / 10 1 This model is exactly the same as the above, exceptAs that input now features we we use take the real and imaginary parts of For the first model, we start with the input features , and 3 , 2 , 1 . This is motivated by the fact that due to our choice of symmetric manifold, we . We also tried different activation functions (leaky ReLU, GELU, ELU, Tanh) and . As explained in section 8] , 2 , no weight decay, leaky ReLU activation and ADAM optimizer. We chose a rather We found that the linear metric perturbation only improves the error measure 4 [4 = 205 is constructed using the decomposition of a Hermitian matrix into real and imaginary − 6 ∈ marginally as compared toloss the grow FS rapidly metric. for the Weto additive also the ansatz multiplicative if observed ansatz. we that do This thefine-tuned not means overlap actively in that and optimize the the for Kähler parameters former themresults in case. were contrast shown For in these the reasons main we text focus (cf. on section the multiplicative loss. The True/False encoding of whichand of as the a patch ambient coordinatecompare space the (5 coordinates CY components). metrics is at For usedto the train:test=90:10, sake for and of pulling we concreteness, back trainand we compute for the test and 20 loss epochs. and During stop training, earlier we if monitor they the start training to diverge (which does not happen). large batch size ofsample 900 (memory-wise, is this not is tootable not big). 
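A minimal sketch of building a Hermitian, positive-definite H from unconstrained network outputs through such a Cholesky-type decomposition. Softplus is used here as one convenient way of keeping the diagonal positive; the text's own variant instead modulates the output with a sigmoid factor, and all names are illustrative.

import torch
import torch.nn.functional as F

def hermitian_from_cholesky(diag_raw, lower_re, lower_im):
    """diag_raw: (n,) real outputs; lower_re, lower_im: (n(n-1)/2,) real outputs."""
    n = diag_raw.shape[0]
    L = torch.zeros(n, n, dtype=torch.complex64)
    idx = torch.tril_indices(n, n, offset=-1)
    L[idx[0], idx[1]] = torch.complex(lower_re, lower_im)
    L = L + torch.diag(F.softplus(diag_raw)).to(torch.complex64)
    return L @ L.conj().T                        # Hermitian and positive definite by construction

For an n x n matrix this requires n + n(n-1) real network outputs, e.g. 25 for n = 5.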
a We problem summarize since theand each architecture imaginary individual and training part the ofcomponents number the of (10 point parameters nodes), on in of the the quintic real expressed and in imaginary homogeneous part ambient of space optimizers (ADAM, Adagrad, SGD) as wellnodes as in varying the the number hidden ofnot hidden layers. significantly layers and We change the also our includedsmall, results. simple, dropout feedforward In or neural the batch net10 end, norm (without we layers, dropout got but or batch good this norm), did results with already learning for rate a rather D.1 NN withFor homogeneous this coordinate input input andtried output learning setup, rates we of havei performed some hyperparameter searches. We D Details on metricHere training we provide morehyperparameter details choices. about the training of the metric learning networks and our two hidden layers of dimension DenseModel-3. the powers H components. followed by a singleN hidden layer ofexpect relatively dimension few equal independentuse components to the of the Cholesky basis decomposition. size DenseModel-2. DenseModel-1. JHEP05(2021)013 ) ) ) . The 3 3 3 − − − 10 (instead 10 10 10 g < , , , ) as an input 2 | (0 (0 (0 − ψ ψ k k k | ) 10 N N N < , – , , , ) ) ) 10 0 (0 4 4 4 . We used ADAM 2 − − − 1 < k,b . d | 10 10 10 Initialization N – , , , ψ 1800 | = 0 10 100 10 100 101 (0 (0 (0 k k k < dJ N N N λ 0 Number of Parameters . The initialization is chosen such ) ) ) ) 6 6 6 4 , and as additional input to the NN in the 3 1 − − − − . ψ – 10 10 10 10 = 0 points for each patch and overlap region . – L2( L2( L2( L2( Regularization identity 5000 Activation leaky ReLU leaky ReLU leaky ReLU – 41 – overlap 50000 λ and train a second NN that interpolates points respectively in the range , ψ – – , reducing it when reaching a plateau. We trained our = 1 4 ReLU ReLU ReLU − ). Since providing 10000 Activation 10 2 MA d 17 λ 100 100 100 2.6.1 Number of Nodes 9 10 1000 1000 1000 ). We then readily compute the metric from this output. Each of these for different (fixed) g number nodes 2.25 input Layer output hidden 3 hidden 2 hidden 1 . Neural network architecture for each ‘patch’ neural network that approximates the . Neural network architecture for the neural network that approximates the CY metric input Layer as described in section output We have performed experiments with various architectures and different training ob- We have trained this network using Finally, we want to remark that these results do not change if we include hidden 1 hidden 2 hidden 3 H Table 2 directly by optimizingCY a metrics. loss function that combines the various consistency conditions for the network for 200 epochs with a batchsize of jectives. We havebetween varied multiplicative the and size additivements of metric on the corrections. different hidden complex We structure layers, have ranges. the also respective performed loss experi- weights, and how close the network is to the Fubini Studyand metric. validated the network using respective loss weights were with an initial learning rate of D.2 NN with affineThis coordinate class of input NNs (LDL hasof output one the NN metric for ( eachnetworks patch has and trainable the parameters outputthat is as the in shown the initial in LDL network table decomposition is close to the Fubini study metric. During training we monitor to the NN andlearn train the it metric for different complexof structures. 
A differenttraining approach process would worked be well, to we have not pursued this second option further. Table 3 CY metric. JHEP05(2021)013 ] 22 J. , ] ]. SPIRE ]. (2010) 107 Pure Appl. Math. IN arXiv:0805.3689 , Adv. Theor. Math. [ ][ 06 , . SPIRE Class. Quant. Grav. , IN hep-th/0606261 3 [ ][ JHEP K , (2008) 120 ]. 07 (1978) 339 3 Numerical solution to the hermitian Numerical Calabi-Yau metrics Calabi-Yau Metrics for Quotients and Eigenvalues and Eigenfunctions of the Numerical Hermitian Yang-Mills (2007) 083 arXiv:1103.3041 [ SPIRE JHEP 12 IN ]. , ][ arXiv:0712.3563 [ JHEP SPIRE Numerical Hermitian Yang-Mills Connections Vol. 2: 25th Anniversary , (2012) 014 IN – 42 – ]. ][ 01 . This guarantees that we do not require our neural  (2008) 080 SPIRE ]. IN hep-th/0612075 JHEP Numerical Ricci-flat metrics on 05 1 + [ , ][ ), which permits any use, distribution and reproduction in ]. Energy functionals for Calabi-Yau metrics Commun. Pure Appl. Math. , JHEP ]. , SPIRE arXiv:0908.2635 IN [ ][ math/0512625 CC-BY 4.0 Some numerical results in complex differential geometry SPIRE (2008) 032302 ,[ IN This article is distributed under the terms of the Creative Commons hep-th/0506129 49 ][ [ On the Ricci curvature of a compact Kähler manifold and the complex (2013) 867 ]. ]. DOI , Cambridge Monographs on Mathematical Physics, Cambridge University Press 17 (2009) 571 2 SPIRE SPIRE arXiv:1004.4399 IN IN Connections and Vector Bundle Stability[ in Heterotic Theories and Kähler Cone Substructure Scalar Laplace Operator on Calabi-Yau[ Manifolds Phys. [ Math. Phys. Complete Intersections (2005) 4931 Q. Yang-Mills equation on the Fermat quintic Monge-Ampère equation I Edition (2012), [ Unlike in the NNs with homogeneous coordinate inputs, we observe that the overlap is V. Braun, T. Brelidze, M.R. Douglas and B.A. Ovrut, M. Headrick and A. Nassar, M.R. Douglas, R.L. Karp, S. Lukic and R. Reinbacher, V. Braun, T. Brelidze, M.R. Douglas and B.A. Ovrut, S.K. Donaldson, M.R. Douglas, R.L. Karp, S. Lukic and R. Reinbacher, M.B. Green, J.H. Schwarz and E. Witten, M. Headrick and T. Wiseman, S.T. Yau, L.B. Anderson, V. Braun, R.L. Karp and B.A. Ovrut, L.B. Anderson, V. Braun and B.A. Ovrut, [8] [9] [6] [7] [4] [5] [2] [3] [1] [10] [11] any medium, provided the original author(s) and source are credited. References of always dividing by thethan largest one, homogeneous we coordinate allow such valuesnetwork to that up make all to predictions values very are far smaller away from where itOpen is trained Access. to make predictionsAttribution on. License ( crucial. One possible explanation iswhich the do fact that not we have areonly to training one five share independent network any networks deals common withpoints property points used unlike from to in all evaluate the patches. theare homogeneous For overlap defined these case as experiments on follows. where we overlap, In choose we order the to slightly make relax the the patches numerical the networks coordinate prescription. Instead JHEP05(2021)013 , ]. 01 Nucl. , , , in 32 ]. SPIRE ]. ]. JHEP ]. IN (2003) 5 , Fortsch. [ ]. , (1986) 357 SPIRE SPIRE 652 SPIRE IN SUSY Breaking in , DOI [ SPIRE IN IN 178 [ IN [ arXiv:1412.6980v9 ][ Structure (2020) 1 , J. Diff. Geom. , 839 Algebraic geometry and SU(3) ]. (1986) 253 , in Nucl. Phys. B , Phys. Lett. B Complete Intersection Calabi-Yau , 274 ]. , H. Wallach et al. eds., Curran arXiv:2010.12581 Phys. Rept. arXiv:0906.3297 , , [ ]. arXiv:2006.02435 ]. ]. ]. , Numerical Calabi-Yau metrics from SPIRE IN , vol. 
12 (1957), pp. 78–89 [ metrics SPIRE SPIRE ][ http://papers.neurips.cc/paper/9015-pytorch- SPIRE SPIRE 3 IN IN metrics from little string theory metrics ][ IN [ IN Nucl. Phys. B (2009) 007 K ][ ]. [ , 3 3 ][ – 43 – Machine Learning Calabi-Yau Metrics K K 09 ]. Calabi-Yau Manifolds and SPIRE IN SPIRE JHEP ][ (1988) 493 , IN . [ ]. A plethora of ]. arXiv:1910.08605 298 Adam: A Method for Stochastic Optimization [ hep-th/0605264 On the Geometry of the String Landscape and the Swampland arXiv:1912.01703 arXiv:2012.04797 [ JAX: composable transformations of Python+NumPy programs SPIRE arXiv:1912.11068 , SPIRE Numerical Metrics, Curvature Expansions and Calabi-Yau Manifolds [ IN [ IN ][ Pytorch: An imperative style, high-performance deep learning library TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Superstrings with Torsion (2007) 21 arXiv:1805.08499 Compactifications of the Heterotic Superstring Nucl. Phys. B Data science applications to string theory On kähler manifolds with vanishing canonical class [ (2020) 044 , (2020) 2000068 . arXiv:1603.04467 On a set of polarized Kähler metrics on algebraic manifolds ]. ]. 766 , 05 68 SPIRE SPIRE IN hep-th/0211118 IN [ NonKähler string backgrounds and their[ five torsion classes S. Wanderman-Milne, http://github.com/google/jax [ Associates, Inc. (2019) [ an-imperative-style-high-performance-deep-learning-library.pdf Systems topology, a symposium in honor of S. Lefschetz Manifolds Advances in Neural Information Processing Systems 32 (2019) 171 holomorphic networks Local String/F-Theory Models (1990) 99 Phys. B Phys. JHEP arXiv:1810.10540 C.M. Hull, G. Lopes Cardoso, G. Curio, G. Dall’Agata, D. Lüst, P. Manousselis and G. Zoupanos, D.P. Kingma and J. Ba, A. Strominger, M. Abadi et al., J. Bradbury, R. Frostig, P. Hawkins, M.J. Johnson, C. Leary, D. Maclaurin and P. Candelas, A.M. Dale, C.A. Lütken and R. Schimmrigk, A. Paszke et al., M. Larfors, A. Lukas and F. Ruehle, M.R. Douglas, S. Lakshminarasimhan and Y. Qi, E. Calabi, R. Blumenhagen, J.P. Conlon, S. Krippendorf, S. Moster and F. Quevedo, F. Ruehle, G. Tian, S. Kachru, A. Tripathy and M. Zimet, A. Tripathy and M. Zimet, H. Ooguri and C. Vafa, W. Cui and J. Gray, S. Kachru, A. Tripathy and M. Zimet, A. Ashmore, Y.-H. He and B.A. Ovrut, [30] [31] [28] [29] [26] [27] [24] [25] [21] [22] [23] [18] [19] [20] [15] [16] [17] [13] [14] [12] JHEP05(2021)013 . . 479 JHEP , (2005) 56 (2017) e103 (2005) 253 3 , 13 Nucl. Phys. B , Quart. J. Math. , PeerJ Comput. Sci. , Comm. Anal. Geom. , . Stability Walls in Heterotic Theories ]. Mirror symmetry is T duality (1973) 145 – 44 – ]. Lacunas for hyperbolic differential operators with SPIRE 131 IN ][ SPIRE Detecting Symmetries with Neural Networks IN ]. ][ Acta. Math. , SPIRE IN [ Sympy: symbolic computing in python Scalar curvature and projective embeddings II arXiv:0905.1748 [ hep-th/9606040 [ Canonical metrics on stable vector bundles . (2009) 026 arXiv:2003.13679 09 (1996) 243 constant coefficients II 345 A. Strominger, S.-T. Yau and E. Zaslow, A. Meurer et al., S. Krippendorf and M. Syvaeri, S.K. Donaldson, X. Wang, L.B. Anderson, J. Gray, A. Lukas and B. Ovrut, M.F. Atiyah, R. Bott and L. Garding, [36] [37] [38] [33] [34] [35] [32]