Machine learning for QFT and string theory

Harold Erbin

Università di Torino & INFN (Italy)

In collaboration with: Maxim Chernodub (Institut Denis Poisson, Tours); Riccardo Finotello (Università di Torino); Vladimir Goy, Ilya Grishmanovky, Alexander Molochkov (Pacific Quantum Center, Vladivostok).

arXiv: 1911.07571, 2006.09113, 2007.13379, 2007.15706

Outline

1. Motivations
   - Lattice QFT
   - String theory
2. Machine learning
3. Lattice QFT
   - Introduction to lattice QFT
   - Casimir effect
   - 3d QED
4. Calabi–Yau 3-folds
   - Calabi–Yau 3-folds
   - ML analysis
5. Conclusion


Machine learning

Machine Learning (ML): set of techniques for pattern recognition / function approximation without explicit programming.

- learn to perform a task implicitly by optimizing a cost function
- flexible → wide range of applications
- general theory unknown (black box problem)

Question: where does it fit in theoretical physics?

- particle physics
- quantum information
- cosmology
- lattice theories
- many-body physics
- string theory

[1903.10563, Carleo et al.]

Outline: 1.1. Lattice QFT



Lattice QFT

Ideas:
- discretization of the action and path integral
- Monte Carlo (MC) algorithms

Applications:
- QCD phenomenology (confinement, quark-gluon plasma...)
- statistical physics
- Regge / CDT / matrix model approaches to quantum gravity
- supersymmetric gauge theories for AdS/CFT

Limitations:
- computationally expensive
- convergence only for some regions of the parameter space

→ use machine learning

Machine learning for Monte Carlo

Support MC with ML [1605.01735, Carrasquilla-Melko]:
- compute observables, predict the phase
- generalize to other parameters
- identify order parameters
- generate configurations
- reduce autocorrelation times
- prevent the sign problem for fermions

Selected references: 1608.07848, Broecker et al.; 1703.02435, Wetzel; 1705.05582, Wetzel-Scherzer; 1805.11058, Abe et al.; 1801.05784, Shanahan-Trewartha-Detmold; 1807.05971, Yoon-Bhattacharya-Gupta; 1810.12879, Zhou-Endrõdi-Pang; 1811.03533, Urban-Pawlowski; 1904.12072, Albergo-Kanwar-Shanahan; 1909.06238, Matsumoto-Kitazawa-Kohno; ... QCD review: 1904.09725, Joó et al.

Outline: 1.2. String theory



String phenomenology

Goal: find "the" Standard Model from string theory.

Method:
- type II / heterotic strings, M-theory, F-theory: D = 10, 11, 12
- vacuum choice (flux compactification):
  - typically a Calabi–Yau (CY) 3- or 4-fold
  - fluxes and intersecting branes
  → reduction to D = 4
- check consistency (tadpole, susy...)
- read off the D = 4 QFT (gauge group, spectrum...)

No vacuum selection mechanism ⇒ string landscape


Landscape mapping

String phenomenology:
- find consistent string models
- find generic/common features
- reproduce the Standard Model

Typical questions:
- understand geometries: properties and equations involving many integers
- find the distribution of some parameters
- find good EFTs to describe the low-energy limit
- (construct an explicit string field theory)

Number of geometries

Calabi–Yau (CY) manifolds:
- CICY (complete intersection in products of projective spaces): 7890 (3-folds), 921,497 (4-folds)
- Kreuzer–Skarke (reflexive polyhedra): 473,800,776 (d = 4)

String models and flux vacua:
- type IIA/IIB models: 10^500
- F-theory: 10^755 (geometries), 10^272,000 (flux vacua)

[Lerche-Lüst-Schellekens ’89; hep-th/0303194, Douglas; hep-th/0307049, Ashok-Douglas; hep-th/0409207, Douglas; 1511.03209, Taylor-Wang; 1706.02299, Halverson-Long-Sun; 1710.11235, Taylor-Wang; 1810.00444, Constantin-He-Lukas]


Challenges

- the numbers are huge
- difficult maths problems (NP-complete, NP-hard, undecidable) [hep-th/0602072, Denef-Douglas; 1809.08279, Halverson-Ruehle]
- methods from algebraic topology: cumbersome, rarely closed-form formulas

→ use machine learning

Selected references: 1404.7359, Abel-Rizos; 1706.02714, He; 1706.03346, Krefl-Song; 1706.07024, Ruehle; 1707.00655, Carifio-Halverson-Krioukov-Nelson; 1804.07296, Wang-Zang; 1806.03121, Bull-He-Jejjala-Mishra; ...

Review: Ruehle ’20

Plan

1. Casimir energy for arbitrary boundaries of a 3d scalar field → speed improvement and accuracy [2006.09113, Chernodub-HE-Goy-Grishmanovky-Molochkov]

2. Deconfinement phase transition in 3d compact QED → extrapolation to different lattice sizes [1911.07571, Chernodub-HE-Goy-Molochkov]

3. Hodge numbers of CICY 3-folds → extends methods from algebraic topology [2007.13379, 2007.15706, HE-Finotello]

Outline: 2. Machine learning


Definition

Machine learning (Samuel): the field of study that gives computers the ability to learn without being explicitly programmed.

Machine learning (Mitchell): a computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.

Approaches to machine learning

Learning approaches (task: x → y):

- supervised: learn a map from a set (x_train, y_train), then predict y_data from x_data
- unsupervised: give x_data and let the machine find structure (i.e. appropriate y_data)
- reinforcement: give x_data, let the machine choose an output following rules, reward good and/or punish bad results, iterate

Applications

General idea = pattern recognition:
- classification / clustering
- regression (prediction)
- transcription / translation
- structuring
- anomaly detection
- denoising
- synthesis and sampling
- density estimation
- conjecture generation

Applications in industry: language processing, medical diagnosis, fraud detection, recommendation systems, autonomous driving...

Examples

Multimedia applications:
- MuZero, AlphaZero (DeepMind): play chess, shogi, Go
- MuZero, AlphaStar (DeepMind), OpenAI Five, etc.: play video games (Starcraft 2, Dota 2, Atari...)
- GPT-2 (OpenAI): conditional synthetic text sampling (+ general language tasks)
- DeepL: translation
- YOLO: real-time object detection [1804.02767]
- Face2Face: real-time face reenactment (deep fake)

Scientific applications:
- AlphaFold (DeepMind): protein folding
- (astro)particles [1806.11484, 1807.02876, darkmachines.org]
- astronomy [1904.07248]
- geometrical structures [geometricdeeplearning.com]

Deep neural network

Architecture:
- 1 to many hidden layers, vector x^(n)
- link: weighted input, matrix W^(n)
- neuron: non-linear activation function g^(n)

x^(n+1) = g^(n+1)(W^(n) x^(n))

Example (3 inputs, one hidden layer of 4 neurons, 2 outputs):

x^(1)_{i1} := x_{i1}
x^(2)_{i2} = g^(2)(W^(1)_{i2 i1} x^(1)_{i1})
f_{i3}(x_{i1}) := x^(3)_{i3} = g^(3)(W^(2)_{i3 i2} x^(2)_{i2})

with i1 = 1, 2, 3; i2 = 1, ..., 4; i3 = 1, 2.

Generic method: fixed functions g^(n), learn the weights W^(n).
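To make the notation concrete, here is a minimal NumPy sketch of this forward pass for the 3-4-2 example above; the sigmoid activation and the random weights are illustrative placeholders, not choices made in the talk.

```python
import numpy as np

def g(z):
    # non-linear activation function (sigmoid chosen as a placeholder)
    return 1.0 / (1.0 + np.exp(-z))

# random weights for a 3 -> 4 -> 2 network, x^(n+1) = g^(n+1)(W^(n) x^(n))
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # W^(1): hidden layer (i2 = 1..4, i1 = 1..3)
W2 = rng.normal(size=(2, 4))   # W^(2): output layer (i3 = 1, 2)

def forward(x):
    x1 = x                     # x^(1) := x
    x2 = g(W1 @ x1)            # x^(2) = g^(2)(W^(1) x^(1))
    x3 = g(W2 @ x2)            # f(x) := x^(3) = g^(3)(W^(2) x^(2))
    return x3

print(forward(np.array([0.1, -0.5, 2.0])))   # two outputs
```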

Learning method

- define a loss function L:
  L = Σ_{i=1}^{N_train} distance(y_i^(train), y_i^(pred))
- minimize the loss function (iterated gradient descent...)
- main risk: overfitting (= cannot generalize)
  → various solutions (regularization, dropout...)
  → split the data set in two (training and test)
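A minimal sketch of such a training loop, using a linear model and a squared-error distance as stand-ins (the actual networks and losses used later in the talk are different):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(1000, 5))                  # inputs
y = x @ np.array([1.0, -2.0, 0.5, 0.0, 3.0])    # synthetic targets

# split the data set in two (training and test) to monitor overfitting
x_train, y_train = x[:800], y[:800]
x_test, y_test = x[800:], y[800:]

w = np.zeros(5)      # parameters of a linear model (placeholder for a network)
lr = 0.01            # learning rate

for epoch in range(200):                        # iterated gradient descent
    residual = y_train - x_train @ w
    loss = np.mean(residual ** 2)               # L = sum_i distance(y_i, y_i^pred)
    w += lr * 2 * x_train.T @ residual / len(y_train)

print("train loss:", loss)
print("test loss :", np.mean((y_test - x_test @ w) ** 2))
```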

ML workflow

"Naive" workflow:
1. get raw data
2. write a neural network with many layers
3. feed the raw data to the neural network
4. get nice results (or give up)

https://xkcd.com/1838

Real-world workflow:
1. understand the problem
2. exploratory data analysis
   - feature engineering
   - feature selection
3. baseline model
   - full working pipeline
   - lower bound on accuracy
4. validation strategy
5. machine learning model
6. ensembling

Pragmatic ref.: [coursera.org/learn/competitive-data-science]

Advanced neural network

Particularities:
- f_i(I): engineered features
- identical outputs (stabilisation)

Some results

Universal approximation theorem: under mild assumptions, a feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of R^n.

Comparisons:
- results comparable and sometimes superior to human experts (cancer diagnosis, traffic sign recognition...)
- generically performs better than any other machine learning algorithm

Drawbacks:
- black box
- magic
- numerical (= how to extract analytical / predictable / exact results?)

Outline: 3. Lattice QFT


Outline: 3.1. Introduction to lattice QFT


Discretization

- Euclidean periodic lattice Λ with spacing a: x^µ / a ∈ Λ = {0, ..., Lt − 1} × {0, ..., Ls − 1}^{d−1}
- scalar field on sites: φ(x) → φ_x
- gauge field → phase factor on the link ℓ = (x, µ):
  U_µ(x) = P exp(i ∫_x^{x+µ̂} dx′^ν A_ν) → U_{x,µ} = e^{i a A_µ + O(a²)}
- field strength → phase factor on the plaquette P = (x, µ, ν):
  U_{µν}(x) = U_ν(x)† U_µ(x+ν̂)† U_ν(x+µ̂) U_µ(x) → U_{x,µν} = e^{i a² F_{µν} + O(a³)}

Monte Carlo methods

- interpret the path integral as a statistical partition function:
  ∫ Π_x dφ_x → Σ_C,  ⟨O[C]⟩ = Σ_C e^{−βS[C]} O[C] / Σ_C e^{−βS[C]},
  where C = {φ_x}_{x∈Λ} is a field configuration
- Monte Carlo: sample a subset E = {C_1, ..., C_N} such that Prob(C_k) = Z^{−1} e^{−βS[C_k]}, then ⟨O⟩ ≈ (1/N) Σ_{k=1}^N O[C_k]
- Markov chain: build E through a sequence of stochastic state transitions Prob(C_k → C_{k+1}) = Prob(C_k, C_{k+1})
- Metropolis algorithm: select a trial configuration C′ and accept C_{k+1} = C′ with a probability set by the action difference:
  Prob(C_k → C′) = min(1, e^{−β(S[C′]−S[C_k])}),  Prob(C_k → C_k) = 1 − Prob(C_k → C′)
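As an illustration, a compact (and deliberately inefficient) Metropolis sweep for a scalar lattice field; the Gaussian proposal and the free-field action are placeholder choices, not the algorithm actually used in the simulations below:

```python
import numpy as np

rng = np.random.default_rng(2)

def metropolis_sweep(phi, action, beta, step=0.5):
    """One Metropolis sweep over a scalar field configuration phi.

    `action` is any function returning S[phi]; recomputing it fully at each
    site is wasteful (a real code would use the local change of the action),
    but it keeps the sketch short.
    """
    for idx in np.ndindex(phi.shape):
        trial = phi.copy()
        trial[idx] += step * rng.normal()        # trial configuration C'
        dS = action(trial) - action(phi)         # S[C'] - S[C_k]
        if dS <= 0 or rng.random() < np.exp(-beta * dS):
            phi = trial                          # accept with prob min(1, e^{-beta dS})
    return phi

# placeholder action: free massless scalar on a periodic lattice
def free_action(phi):
    return 0.5 * sum(np.sum((np.roll(phi, -1, axis=mu) - phi) ** 2)
                     for mu in range(phi.ndim))

phi = rng.normal(size=(8, 8, 8))
for _ in range(10):
    phi = metropolis_sweep(phi, free_action, beta=1.0)
```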

Outline: 3.2. Casimir effect


Scalar field theory

- action and Dirichlet boundary conditions (µ = 0, 1, 2):
  S[φ] = (1/2) ∫ d³x ∂_µφ ∂^µφ,  φ(x)|_{x∈S} = 0
- static boundary S: quasi-parallel lines or a closed curve
- Casimir energy E_S = ⟨T_00⟩_S − ⟨T_00⟩_0 = change in the vacuum energy density due to the defect
- motivations:
  - 3d scalar: simplest model exhibiting this effect
  - interesting for technological applications (micro-electromechanical and micro-fluidic systems) [Rodriguez-Capasso-Johnson, Photonics ’11]
  - modify the QCD vacuum → chiral symmetry breaking / confinement [1805.11887, Chernodub et al.]

Discretization

- partition function and action:
  Z = ∫ Π_x dφ_x e^{−S[φ]},  S[φ] = (1/2) Σ_{x,µ} (φ_{x+µ̂} − φ_x)²,
  where µ̂ is the unit vector in direction µ
- Euclidean energy:
  T_00 = (1/4) Σ_µ η_µ [(φ_{x+µ̂} − φ_x)² + (φ_x − φ_{x−µ̂})²],  (η_0, η_1, η_2) = (−1, 1, 1)
- Hybrid Monte Carlo algorithm (MC + molecular dynamics)
- boundaries: parallel lines or closed curves
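A small NumPy sketch of these two formulas on a periodic lattice (applied to a random toy configuration rather than to HMC samples, and ignoring the Dirichlet boundary):

```python
import numpy as np

def lattice_action(phi):
    # S[phi] = 1/2 sum_{x,mu} (phi_{x+mu} - phi_x)^2 on a periodic lattice
    return 0.5 * sum(np.sum((np.roll(phi, -1, axis=mu) - phi) ** 2)
                     for mu in range(3))

def energy_density(phi, eta=(-1.0, 1.0, 1.0)):
    # T00 = 1/4 sum_mu eta_mu [(phi_{x+mu} - phi_x)^2 + (phi_x - phi_{x-mu})^2]
    t00 = np.zeros_like(phi)
    for mu in range(3):
        forward = np.roll(phi, -1, axis=mu) - phi
        backward = phi - np.roll(phi, 1, axis=mu)
        t00 += eta[mu] * (forward ** 2 + backward ** 2)
    return t00 / 4.0

rng = np.random.default_rng(3)
phi = rng.normal(size=(16, 16, 16))   # toy configuration, not an HMC sample
print(lattice_action(phi), energy_density(phi).mean())
```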

ML analysis

- input: 2d boundary condition (= BW image), Ls = 255
- output: Casimir energy ∈ R
- network: 4 layers, ∼ 400k parameters
- data: 80% train, 10% validate, 10% test
- time comparison:
  - MC: 3.1 hours / sample
  - training: 5 min / 800 samples
  - prediction: 5 ms / sample
  → 10^6 times faster

[1911.07571]
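For orientation, a minimal Keras sketch of a CNN regressing the Casimir energy from a boundary image; the layer sizes and the `boundary_images` / `casimir_energies` arrays are hypothetical and do not reproduce the 4-layer, ~400k-parameter network of the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

# input: 2d boundary condition as a black-and-white image; output: E_S in R
model = tf.keras.Sequential([
    layers.Input(shape=(255, 255, 1)),
    layers.Conv2D(16, 5, strides=2, activation="relu"),
    layers.Conv2D(32, 5, strides=2, activation="relu"),
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1),                      # regression: no activation on the output
])
model.compile(optimizer="adam", loss="mse")
# model.fit(boundary_images, casimir_energies, validation_split=0.1, epochs=150)
```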

Examples

[Figure: four sample boundary configurations with MC (true) vs. ML (predicted) Casimir energies: id 136: true −13.5286, pred −13.5205, error 0.000596; id 471: true −1.54119, pred −3.37649, error 1.190834; id 722: true −36.4675, pred −36.4593, error 0.000225; id 98: true −37.6339, pred −36.0213, error 0.042850.]

Predictions

[Figure: histograms of the true vs. predicted Casimir energies E_c and of the relative errors on E_c, for closed curves and for parallel lines.]

Training and learning curves

[Figure: training curve (loss vs. epochs) and learning curve (loss vs. fraction of the training data, n = 2000), for the training and validation sets.]

Relative errors and RMSE

errors (relative)   closed curves   parallel lines
mean                0.064           0.0037
min                 0.000087        0.000019
75%                 0.069           0.0051
max                 2.1             0.016
RMSE                0.97            0.18

rel. error = (ML − MC) / MC

Comparison MC and ML

Best and worst in terms of absolute error (closed curves):

         MC               ML
         E       err_E    E       err_E
best     −22.62  0.13     −22.60  0.014
         −20.34  0.12     −20.34  0.0018
         −12.22  0.09     −12.23  0.011
         −9.57   0.16     −9.57   0.0028
         −9.57   0.13     −9.56   0.011
worst    −0.82   0.12     −2.54   1.72
         −1.63   0.10     −2.67   1.04
         −1.48   0.09     −2.30   0.82

Outline: 3.3. 3d QED


Compact QED: properties

Model: 3d compact QED at finite temperature [Polyakov ’76]
- well understood [hep-lat/0106021, Chernodub-Ilgenfritz-Schiller]
- good toy model for QCD (linear confinement, mass gap generation, temperature phase transition)
- topological defects (monopoles): drive the phase transition

Confinement-deconfinement phase transition:
- low temperature: confinement caused by a Coulomb monopole-antimonopole gas
- high temperature: deconfinement, rare monopoles bound into neutral monopole-antimonopole pairs

Compact QED: lattice

- lattice gauge field: angle θ_{x,µ} = a A_µ(x) ∈ [−π, π)
- elementary plaquette angle:
  θ_{Px,µν} = θ_{x,µ} + θ_{x+µ̂,ν} − θ_{x+ν̂,µ} − θ_{x,ν} = a² F_{x,µν} + O(a⁴)
- lattice action (continuum coupling g, temperature T):
  S[θ] = β Σ_x Σ_{µ<ν} (1 − cos θ_{Px,µν}),  β = 1/(a g²) = Lt T / g²
- Polyakov loop → order parameter for confinement:
  L(x) = exp(i Σ_{t=0}^{Lt−1} θ_0(t, x)),  ⟨L(R)⟩ = e^{−F/T}
  for an infinitely heavy charged test particle with free energy F
- confining potential (σ = string tension):
  ⟨L(0) L(R)⟩ ∝ e^{−Lt V(R)},  V(R) ≈ σ|R| at T ≈ 0
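A short NumPy sketch of the spatially averaged Polyakov loop for a configuration of temporal link angles θ_0(t, x); the array layout is an assumption, not the authors' code:

```python
import numpy as np

def polyakov_loop(theta0):
    """Spatially averaged Polyakov loop from the temporal links theta_0(t, x).

    `theta0` is assumed to have shape (Lt, Ls, Ls), holding the lattice gauge
    angles in the time direction.
    """
    # L(x) = exp(i sum_t theta_0(t, x)), then average over the spatial volume
    loop = np.exp(1j * theta0.sum(axis=0))
    return loop.mean()

rng = np.random.default_rng(4)
theta0 = rng.uniform(-np.pi, np.pi, size=(4, 16, 16))   # random (hot) configuration
L = polyakov_loop(theta0)
print(abs(L))   # |L| is small in the confined phase, larger when deconfined
```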

Monte Carlo computations

- MC simulations for different temperatures β:
  1. gauge field configurations
  2. monopole configurations
  3. extract properties
- useful quantities:
  - spatially averaged Polyakov loop L
  - plaquettes U (spatial and temporal)
  - monopole density ρ
- study the phase transition from |L|:
  - critical temperature βc
  - phase φ = 0 (confined) or φ = 1 (deconfined)

[2006.09113]

ML analysis

Objective:
1. train for (Lt, Ls) = (4, 16)
2. predict the phase Prob(φ) and the Polyakov loop |L| for (Lt, Ls) ≠ (4, 16) (Lt = 4, 6, 8, Ls = 16, 32)
3. compute the critical temperature

Characteristics:
- input: 3d monopole configuration (= 3d BW image)
- main outputs: |L|, Prob(φ)
- auxiliary outputs: L, U, ρ, β
- network: convolution + dense layers, 1.28M parameters
- data:
  - train 1: 2000 samples for each β ∈ [1.5, 3], ∆β = 0.05
  - train 2: 100 samples for each β ∈ [0.1, 2.2], ∆β = 0.1
  - validation/test: 200 samples for each β ∈ [1.5, 2.5], ∆β = 0.05

Neural network

[Figure: architecture of the convolutional network with main and auxiliary outputs.]
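A hedged Keras sketch of a network of this type, with a convolutional trunk and separate heads for the main and auxiliary outputs; the filter counts, input shape and loss weights are illustrative and do not reproduce the 1.28M-parameter architecture of the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers

# input: 3d monopole configuration as a binary volume (shape is an assumption)
inp = layers.Input(shape=(4, 16, 16, 1))
x = layers.Conv3D(32, 3, padding="same", activation="relu")(inp)
x = layers.Conv3D(64, 3, padding="same", activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)

phase = layers.Dense(1, activation="sigmoid", name="phase")(x)   # Prob(deconfined)
pl_mod = layers.Dense(1, name="pl_mod")(x)                       # |L|
aux = layers.Dense(3, name="aux")(x)                             # e.g. rho, beta, U

model = tf.keras.Model(inp, [phase, pl_mod, aux])
model.compile(optimizer="adam",
              loss={"phase": "binary_crossentropy", "pl_mod": "mse", "aux": "mse"},
              loss_weights={"phase": 1.0, "pl_mod": 1.0, "aux": 0.2})
```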

Predictions (temperature, density)

[Figure: monopole density and Polyakov loop modulus |L| as functions of β, ML predictions vs. MC, for (Lt, Ls) = (4, 16) and (6, 32).]

Predictions (phase)

[Figure: |L| vs. β with points labelled confined/deconfined, for MC and for the ML predictions, for (Lt, Ls) = (4, 16) and (6, 32).]

Predictions (errors)

RMSE:
- |L|: 0.089
- ρ: 0.0027
- β: 0.19
- U: 0.016

Phase φ scores:
- accuracy: 94.5%
- precision: 95.8%
- recall: 96.0%
- F1: 0.96

[Figure: histograms of true vs. predicted Polyakov loop modulus |L| and monopole density.]

Training and learning curves

[Figure: training curve (loss vs. epochs) and learning curve (loss vs. fraction of the training data, n = 64200), for the training and validation sets.]

Critical temperature: estimations

- maximum slope of the Polyakov loop:
  βc = argmax_β ∂_β ⟨|L|⟩_β
- maximum probability variance:
  βc = argmax_β Var_β p(φ)
- maximum probability uncertainty:
  ⟨p(φ)⟩_β |_{β=βc} = 0.5
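A small NumPy sketch of the first and third estimators (the variance criterion would need the per-configuration probabilities, which are not modelled here); the sigmoid-like toy curves are placeholders:

```python
import numpy as np

def beta_c_estimates(beta, pl_mod, phase_prob):
    """Two estimates of the critical coupling from curves sampled in beta.

    `pl_mod` is <|L|> and `phase_prob` the mean deconfinement probability
    at each beta value.
    """
    slope = np.gradient(pl_mod, beta)
    bc_slope = beta[np.argmax(slope)]           # maximum slope of the Polyakov loop
    bc_half = np.interp(0.5, phase_prob, beta)  # <p(phi)> crosses 0.5
    return bc_slope, bc_half

beta = np.linspace(1.5, 2.5, 21)
pl_mod = 0.3 / (1 + np.exp(-(beta - 2.0) / 0.05))      # toy curves
phase_prob = 1 / (1 + np.exp(-(beta - 2.0) / 0.05))
print(beta_c_estimates(beta, pl_mod, phase_prob))
```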

Phase probability distribution

[Figure: mean and variance of the phase probability as functions of β for the lattices (Lt, Ls, Ls) = (4, 16, 16), (4, 32, 32), (6, 16, 16), (6, 32, 32), (8, 16, 16), (8, 32, 32).]

Critical temperature: predictions

[Figure: critical temperature βc vs. Lt for Ls = 16 and Ls = 32, Monte Carlo vs. machine learning.]

Error correction

βc prediction could be improved to < 5% error:

- modify the decision function:
  φ = 0 if p(φ) < pc,  φ = 1 if p(φ) ≥ pc;
  tune pc and predict βc from ⟨φ⟩ and Var φ
- error linear in Lt → apply a correction

[Figure: relative error on βc as a function of Lt, for Ls = 16 and Ls = 32.]

Notes:
- a form of boosting/hyperparameter tuning using several lattices
- useful if considering many more lattices

Outline: 4. Calabi–Yau 3-folds


Outline: 4.1. Calabi–Yau 3-folds


Calabi–Yau

Complete intersection Calabi–Yau (CICY) 3-fold:
- CY: complex manifold with vanishing first Chern class
- complete intersection: non-degenerate hypersurface in a product of projective spaces
- hypersurface = solution to a system of homogeneous polynomial equations
- described by an m × k configuration matrix

  X = [ P^{n_1} | a^1_1 ··· a^1_k ]
      [   ···   |       ···       ]
      [ P^{n_m} | a^m_1 ··· a^m_k ]

  dim_C X = Σ_{r=1}^m n_r − k = 3,   n_r + 1 = Σ_{α=1}^k a^r_α

  where a^r_α is the power of the coordinates of P^{n_r} in the αth equation
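These two conditions are easy to check programmatically; a minimal sketch (the helper name `check_cicy` is ours, and the two examples are the quintic and the two-P³ configuration discussed just below):

```python
import numpy as np

def check_cicy(n, a):
    """Check the defining conditions of a CICY 3-fold configuration matrix.

    n = [n_1, ..., n_m] lists the projective factors P^{n_r};
    a is the m x k matrix of multi-degrees a^r_alpha.
    """
    n = np.asarray(n)
    a = np.asarray(a)
    m, k = a.shape
    dim = n.sum() - k                               # dim_C X = sum_r n_r - k
    cy_condition = np.all(a.sum(axis=1) == n + 1)   # sum_alpha a^r_alpha = n_r + 1
    return dim == 3 and cy_condition

# quintic in P^4: one equation of degree 5
print(check_cicy([4], [[5]]))                       # True
# two P^3 factors, three equations of multi-degrees (3,0,1) and (0,3,1)
print(check_cicy([3, 3], [[3, 0, 1], [0, 3, 1]]))   # True
```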

Configuration matrix

Examples:
- quintic (a = 0, ..., 4):
  X = [ P⁴ | 5 ]  ⟹  Σ_a (X^a)⁵ = 0
- 2 projective spaces, 3 equations (a, α = 0, ..., 3):
  X = [ P³_x | 3 0 1 ]
      [ P³_y | 0 3 1 ]
  ⟹  f_{abc} X^a X^b X^c = 0,  g_{αβγ} Y^α Y^β Y^γ = 0,  h_{aα} X^a Y^α = 0

Classification:
- invariances (→ huge redundancy):
  - permutation of lines and columns
  - identities between subspaces
- but:
  - constraints ⇒ bound on the matrix size
  - ∃ a "favourable" configuration


Topology

Why topology?
- no metric is known for compact CYs (cannot perform the KK reduction explicitly)
- topological numbers → 4d properties (number of fields, representations, gauge symmetry...)

Topological properties:
- Hodge numbers h^{p,q} (number of harmonic (p, q)-forms), here h^{1,1} and h^{2,1}
- Euler number χ = 2(h^{1,1} − h^{2,1})
- Chern classes
- triple intersection numbers
- line bundle cohomologies

Datasets

CICYs have been classified:
- 7890 configurations (but ∃ redundancies)
- number of product spaces: 22
- h^{1,1} ∈ [0, 19], h^{2,1} ∈ [0, 101]
- 266 combinations (h^{1,1}, h^{2,1})
- a^r_α ∈ [0, 5]

Original [Candelas-Dale-Lutken-Schimmrigk ’88; Green-Hubsch-Lutken ’89]:
- maximal size: 12 × 15
- number of favourable matrices: 4874

Favourable [1708.07907, Anderson-Gao-Gray-Lee]:
- maximal size: 15 × 18
- number of favourable matrices: 7820

Data

[Figure: distributions of h^{1,1} and h^{2,1} (and their combinations) and of the configuration matrix sizes in the CICY dataset.]

Goal and methodology

Philosophy: start with the original dataset, and derive everything else from the configuration matrix and machine learning only.

Current goal: input: configuration matrix → output: h^{1,1}, h^{2,1}

1. CICY: well studied, all topological quantities known → use as a sandbox
2. improve over [1706.02714, He; 1806.03121, Bull-He-Jejjala-Mishra]
3. both original and favourable datasets

[2007.13379, 2007.15706]

Outline: 4.2. ML analysis


Strategy

Questions:
- classification or regression?
- feature engineering?
- data diminution: remove outliers (0.49%)?
- data augmentation: generate more inputs using invariances?
- single- or multi-tasking?

Classification vs regression:
- classification: assumes knowledge of the boundaries
- regression: better for generalization; different scales → normalize the data ≈ use a continuous variable

Algorithms

[Figure (two slides): accuracy of different algorithms (linear regression, linear SVM, random forest, gradient boosting, Gaussian SVM, ConvNet, Inception) for predicting h^{1,1} and h^{2,1} at 30% and 80% training ratios; the Inception network reaches 97–99% accuracy for h^{1,1}.]

Inception neural network (1)

- designed by Google for computer vision → breakthrough in image classification [Szegedy et al., 1409.4842, 1512.00567, 1602.07261]
- sequence of inception modules → parallel convolutions with kernels of different sizes
- learns different combinations of features at different scales
- motivations:
  - 1d parallel kernels of maximal sizes: look at all P^n / equations for each equation / P^n at the same time
  - weight sharing (convolution): same operations for each P^n and equation
- the neural network performs badly on h^{2,1} (but still much better than any other approach) → focus on h^{1,1}

Inception neural network (2)

[Figure: network architecture: the 12×15 input configuration matrix passes through three concatenation (inception) modules, each combining a horizontal 1×15 kernel and a vertical 12×1 kernel with ReLU activations; the result is flattened and fed to a fully connected output layer with 1 unit (ReLU activation).]

234,000 parameters: 7× fewer than [Bull et al.]
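A hedged Keras sketch of such an inception-style module with parallel 1×15 and 12×1 kernels acting on the padded configuration matrix; the filter counts and the single regression head are illustrative and do not reproduce the exact 234k-parameter model:

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, filters):
    # parallel 1d kernels of maximal size: look at whole rows (equations)
    # and whole columns (projective spaces) of the configuration matrix
    horizontal = layers.Conv2D(filters, (1, 15), padding="same", activation="relu")(x)
    vertical = layers.Conv2D(filters, (12, 1), padding="same", activation="relu")(x)
    return layers.Concatenate()([horizontal, vertical])

inp = layers.Input(shape=(12, 15, 1))         # padded 12 x 15 configuration matrix
x = inception_module(inp, 32)
x = inception_module(x, 64)
x = inception_module(x, 32)
x = layers.Flatten()(x)
out = layers.Dense(1, activation="relu")(x)   # h^{1,1} is a non-negative number
model = tf.keras.Model(inp, out)
model.compile(optimizer="adam", loss="mse")
```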

Learning curve and errors

[Figure: accuracy for h^{1,1} (training and validation) as a function of the training ratio; histogram of the absolute errors and residual plot of the model on the validation and test sets.]

0.5 0.0 0.5 1.0 1.5 2.0 2.5 2 4 6 8 10 12 14 16 errors predicted values 63 / 67 Comparing architectures

1.0 97% 99%

85%

0.8 75% 68%

0.6 55%

45% 0.4 37%

0.2

0.0

lin. reg. (80%) He (reg., 63%) InceptionInception (30%) (80%) SVM RBF SVM(reg., RBF 30%) (reg., 80%) Bull et al.Bull (reg., et al. 70%) (class., 70%)

Ablation study

[Figure: accuracy of the ConvNet and of the Inception network (1d and 2d kernels), with and without outliers, for 30% and 80% of the training data.]

Outline: 5. Conclusion


Conclusion

Results:
- machine learning = promising tool for different fields of theoretical physics
- wide range of applications for lattice QFT and string theory

Outlook:
- lattice QFT: dynamics of topological objects in non-Abelian theories, generate boundaries from the Casimir energy...
- string theory: extend to other geometries (4-folds, Kreuzer–Skarke polytopes...)
- dissect the neural network data to understand what it learns
- symbolic computations with graph representation and ...
- explore the space of effective field theories
