CLARIANT: LORENTZ-COVARIANT NEURAL NETWORK
Alexander Bogatskiy, Brandon Anderson, Risi Kondor, David Miller, Jan Offermann, Marwah Roussi
SYMMETRIES
[ml4a.github.io]

TRANSLATIONAL SYMMETRY
[NOAA]
▸ Solution: CNNs
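The translation equivariance that motivates CNNs can be checked directly: shifting the input and then convolving gives the same result as convolving and then shifting. A minimal numpy sketch (circular padding assumed; illustrative only, not the network discussed in this talk):

```python
import numpy as np

def conv1d_circ(x, w):
    """Circular 1-D cross-correlation of signal x with kernel w."""
    n, k = len(x), len(w)
    return np.array([sum(w[j] * x[(i + j) % n] for j in range(k)) for i in range(n)])

x = np.arange(8.0)               # toy 1-D "image"
w = np.array([1.0, -2.0, 1.0])   # toy filter

shift_then_conv = conv1d_circ(np.roll(x, 3), w)
conv_then_shift = np.roll(conv1d_circ(x, w), 3)
assert np.allclose(shift_then_conv, conv_then_shift)  # translation equivariance
```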
ROTATIONAL SYMMETRY
[Cohen & Welling: Group equivariant CNNs (ICML 2016)]
▸ Solution: randomly rotated training samples? Preprocessing?
▸ Solution: convolutions on Lie groups
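The group-convolution idea can be sketched for the finite rotation subgroup C4: correlate the input with all four rotations of a filter, and a 90° rotation of the input becomes a rotation of each feature map plus a cyclic shift of the channels. An illustrative toy on a periodic grid, not the Cohen & Welling implementation:

```python
import numpy as np

def corr2d(x, k):
    """Circular 2-D cross-correlation: out[u] = sum over kernel offsets v of k[v] x[u+v]."""
    c = k.shape[0] // 2
    out = np.zeros_like(x)
    for a in range(k.shape[0]):
        for b in range(k.shape[1]):
            out += k[a, b] * np.roll(np.roll(x, c - a, axis=0), c - b, axis=1)
    return out

def lift(x, k):
    """C4 'lifting' convolution: one output channel per filter rotation."""
    return [corr2d(x, np.rot90(k, r)) for r in range(4)]

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
k = rng.normal(size=(3, 3))

f = lift(x, k)
f_rot = lift(np.rot90(x), k)
for r in range(4):
    # rotating the input rotates each map and cyclically shifts the channels
    assert np.allclose(f_rot[r], np.rot90(f[(r - 1) % 4]))
```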
[Cohen & Welling: Steerable CNNs (ICLR 2017)]

BENEFITS OF SYMMETRY-BASED NEURAL NETWORKS
▸ Greatly reduce the number of learnable parameters
▸ Reduce the number of training samples
▸ Improve generalization and regression ability
▸ Provide physically interpretable models
▸ Bring elegance and mathematical transparency
▸ Build on physical principles of symmetries and constraints

LORENTZ GROUP
[Spacetime diagram: t and x axes with Lorentz-boosted frames]

LIE GROUP COVARIANCE/EQUIVARIANCE
INPUTS: F1 → ρ1(g)·F1,  F2 → ρ2(g)·F2,  F3 → ρ3(g)·F3
OUTPUT: Fout → ρout(g)·Fout
PERMUTATION INVARIANCE
Any continuous function can be represented as

F(x_1, \dots, x_n) = \sum_{k=0}^{2n\cdot d} F_k\!\left(\sum_{i=1}^{n} f_{ki}(x_i)\right)

[Kolmogorov–Arnold representation theorem]
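A concrete instance of this nested-sum structure: the product x1·x2 can be written with two outer functions F_k applied to sums of elementwise features f_k (an illustrative identity, not the general construction):

```python
import numpy as np

# x1*x2 = (x1 + x2)**2 / 2 - (x1**2 + x2**2) / 2, i.e. sum_k F_k(sum_i f_ki(x_i)) with
#   k = 1: f_1(x) = x,    F_1(s) =  s**2 / 2
#   k = 2: f_2(x) = x**2, F_2(s) = -s / 2
def F(x1, x2):
    s1 = x1 + x2           # sum of f_1 over the inputs
    s2 = x1**2 + x2**2     # sum of f_2 over the inputs
    return s1**2 / 2 - s2 / 2

rng = np.random.default_rng(1)
a, b = rng.normal(size=2)
assert np.isclose(F(a, b), a * b)
```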
Symmetric functions:

F(x_1, \dots, x_n) = \hat{F}\!\left(\sum_i f(x_i)\right)

[Zaheer et al. (Deep Sets, 2017)]

COVARIANCE IN THE LITERATURE
Group equivariant neural networks (Cohen & Welling, 2016)
Harmonic networks: deep translation and rotation equivariance (Worrall, Garbin, Turmukhambetov & Brostow, 2016)
Steerable CNNs (Cohen & Welling, 2017)
On the generalization of equivariance and convolution in neural networks to the action of compact groups (Kondor & Trivedi, 2018)
Intertwiners between induced representations (Cohen, Geiger & Weiler, 2018)
3D steerable CNNs (Weiler, Geiger, Welling, Boomsma & Cohen, 2018)
Gauge equivariant convolutional networks (Cohen, Weiler, Kicanaoglu & Welling, 2019)
Tensor field networks (Thomas, Smidt, Kearnes, Yang, Li, Kohlhoff & Riley, 2018)
Relativistic Harmonic Networks (Chase Shimmin @ Boost 2019)
Cormorant: Covariant Molecular Neural Networks (Anderson, Hy & Kondor, 2019)

SIMPLE NEURAL NETWORK COOKBOOK
INGREDIENTS
▸ Activations valued in a vector space
▸ Learnable linear operation
▸ Composable nonlinear operation

COVARIANT NEURAL NETWORK COOKBOOK
INGREDIENTS
▸ Activations valued in vector representations of a group
▸ Equivariant learnable linear operation (“commutes” with the action of the group)
▸ Equivariant composable nonlinear operation (maps representations to representations)

EQUIVARIANT LEARNABLE LINEAR OPERATION
I. MIXING IRREDUCIBLES
W : R \to R',\qquad W\cdot\rho(g) = \rho'(g)\cdot W

Schur’s lemma implies that W acts as a scalar on each irreducible component.
Activations need to be stored as collections of irreducible components.
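Schur's statement can be checked numerically for the 3-dimensional irrep of SO(3): group-averaging RᵀWR over random rotations projects any matrix W onto the commutant, which is just multiples of the identity. A Monte Carlo sketch (Haar-ish sampling via QR is an assumption of this toy, not part of the talk):

```python
import numpy as np

rng = np.random.default_rng(2)

def random_rotation():
    """Random 3x3 rotation: QR of a Gaussian matrix, signs fixed, det forced to +1."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q *= np.sign(np.diag(r))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1
    return q

W = rng.normal(size=(3, 3))
samples = []
for _ in range(20000):
    R = random_rotation()
    samples.append(R.T @ W @ R)
avg = np.mean(samples, axis=0)

# Schur: on an irreducible real 3-dim representation the commutant is scalar,
# so the group average converges to (tr W / 3) * identity.
assert np.allclose(avg, np.trace(W) / 3 * np.eye(3), atol=0.05)
```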
EQUIVARIANT NONLINEAR OPERATION
II. CLEBSCH-GORDAN PRODUCT
The “only” bilinear equivariant operation mapping two representations to a third is the tensor product

\otimes : R_1 \times R_2 \to R_3

\rho_1(g) \otimes \rho_2(g) = C_{\rho_1,\rho_2}\left(\bigoplus_{\rho}\,\bigoplus^{\mu(\rho)} \rho(g)\right) C^{\dagger}_{\rho_1,\rho_2}

where \mu(\rho) is the multiplicity of the irrep \rho. Since our linear operation requires knowledge of irreducible components, each product must be followed by a Clebsch-Gordan decomposition.
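For SO(3) the smallest example is 3 ⊗ 3 = 1 ⊕ 3 ⊕ 5: the dot product, cross product, and symmetric-traceless part of u ⊗ v are the Clebsch-Gordan components, and each transforms within its own irrep. A numpy sketch (illustrative for SO(3), not the Lorentz-group code):

```python
import numpy as np

rng = np.random.default_rng(3)
q, r = np.linalg.qr(rng.normal(size=(3, 3)))
R = q * np.sign(np.diag(r))
if np.linalg.det(R) < 0:
    R[:, 0] *= -1                 # a random proper rotation

u, v = rng.normal(size=3), rng.normal(size=3)

scalar = u @ v                    # l = 0 piece: invariant
vector = np.cross(u, v)           # l = 1 piece: rotates like a vector
S = (np.outer(u, v) + np.outer(v, u)) / 2 - scalar / 3 * np.eye(3)  # l = 2 piece

assert np.isclose((R @ u) @ (R @ v), scalar)
assert np.allclose(np.cross(R @ u, R @ v), R @ vector)
S_rot = (np.outer(R @ u, R @ v) + np.outer(R @ v, R @ u)) / 2 - scalar / 3 * np.eye(3)
assert np.allclose(S_rot, R @ S @ R.T)
```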
ARCHITECTURE
‣ N particles with input 4-momenta p_i^\mu
‣ N activations \mathcal{F}_i at each level live in representations of the Lorentz group
‣ The update rule involves pair interactions*:

\mathcal{F}_i \mapsto W\cdot\left(\mathcal{F}_i \,\oplus\, \mathcal{F}_i^{\otimes 2} \,\oplus\, \sum_j f(p_{ij}^2)\, p_{ij} \otimes \mathcal{F}_j\right)

‣ Arbitrary traditional sub-networks can be applied to Lorentz invariants
‣ Output layer sums over i and projects onto invariants (or other irreps), e.g. a squared matrix element such as

\mathcal{M}^2 = \frac{1}{4}\,\frac{(2g_e)^4}{\left((p_1 - p_3)^2 - m_\gamma^2 c^2\right)^2}\,\big[p_1\cdot p_3 + m_e c^2\big]\big[p_2\cdot p_4 + m_\mu c^2\big]

*IRC safety to be addressed separately
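The Lorentz invariants the architecture relies on, such as the pairwise dot products entering f(p_ij²), can be sanity-checked with an explicit boost. A numpy sketch with the metric convention (+,−,−,−):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])   # Minkowski metric

def boost_x(rapidity):
    """Lorentz boost along the x axis."""
    ch, sh = np.cosh(rapidity), np.sinh(rapidity)
    L = np.eye(4)
    L[0, 0] = L[1, 1] = ch
    L[0, 1] = L[1, 0] = -sh
    return L

L = boost_x(1.7)
assert np.allclose(L.T @ eta @ L, eta)    # defining property of the Lorentz group

rng = np.random.default_rng(4)
p = rng.normal(size=(5, 4))               # five toy 4-momenta (rows)
dots = p @ eta @ p.T                      # all pairwise invariants p_i . p_j
p_boosted = p @ L.T
assert np.allclose(p_boosted @ eta @ p_boosted.T, dots)
```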
PERFORMANCE

INVARIANCE TESTS
Rotational invariance / Boost invariance
[Plots: out(θ)/out(0) − 1 vs rotation angle θ; out(γ)/out(1) − 1 vs boost factor γ up to ≈ 2000]
▸ Rotations: relative invariance within 10⁻⁹ using double precision, limited by floating-point precision
▸ Boosts: relative invariance within 10⁻²

TOP TAGGING CLASSIFIER
Dataset: [Butter, Kasieczka, Plehn & Russell (Deep-learned Top Tagging with a Lorentz Layer), arXiv:1707.08966]
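The standard top-tagging figure of merit is the background rejection 1/ε_B at a fixed signal efficiency ε_S = 0.3. A sketch of the computation on hypothetical Gaussian classifier scores (not the model's actual outputs):

```python
import numpy as np

def background_rejection(scores_sig, scores_bkg, signal_eff=0.3):
    """1/eps_B at the score threshold that keeps the requested signal fraction."""
    cut = np.quantile(scores_sig, 1.0 - signal_eff)   # keep top 30% of signal
    eps_b = np.mean(scores_bkg > cut)                 # background passing rate
    return 1.0 / eps_b

rng = np.random.default_rng(5)
sig = rng.normal(1.5, 1.0, size=100_000)   # hypothetical scores, signal jets
bkg = rng.normal(0.0, 1.0, size=100_000)   # hypothetical scores, background jets

rej = background_rejection(sig, bkg)
eff = np.mean(sig > np.quantile(sig, 0.7))  # achieved signal efficiency, ~0.3
assert rej > 1.0 and abs(eff - 0.3) < 0.01
```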
Results (top-tagger landscape comparison):

Network | AUC | Accuracy | 1/ε_B (ε_S = 0.3) | Parameters
CLARIANT | 0.970 | 0.922 | 426 | 3k

[Kasieczka et al. (ML Landscape of top taggers, 2019)]
Compared against: [Xie et al. (ResNeXt, 2016)], [Moore et al. (Multi-body N-subjettiness, 2018)], [Qu & Gouskos (ParticleNet, 2019)], [Thaler et al. (Energy/Particle Flow Polynomials, 2018/19)], [Butter et al. (LoLa: Lorentz Layer, 2018)]

CONCLUSION
FUTURE WORK AND POTENTIAL EXTENSIONS
▸ Complete hyperparameter optimization;
▸ Include particle information (label, charge, spin, …);
▸ Regression tasks and measurements:
๏ invariant mass detection,
๏ covariant 4-momentum measurements;
▸ Detection of hidden symmetries;
▸ Multiple symmetries combined (Standard Model?)

Physics Department · Computer Science Department
Brandon Anderson, Jan Offermann, Marwah Roussi, Risi Kondor, David Miller

DATASET ENERGY DISTRIBUTIONS
Top tagging reference dataset (arXiv:1902.09914)
[Histograms: number of jets per p_T bin, 400–800 GeV, and per 10 GeV; anti-kT, R = 0.8; hadronic top decays vs. light quarks and gluons]

VISUALIZATION OF SAMPLE EVENTS

HIERARCHICAL NETWORKS
[ATLAS event display]

[Network diagram: INPUT LAYER, followed by repeated ITERATED CG LAYERs, ending in an OUTPUT LAYER]

CLARIANT