
CLARIANT: LORENTZ COVARIANT NEURAL NETWORK

Alexander Bogatskiy, Brandon Anderson, Risi Kondor, David Miller, Jan Offermann, Marwah Roussi

[ml4a.github.io]

TRANSLATIONAL SYMMETRY

[NOAA]

▸ Solution: CNNs

ROTATIONAL SYMMETRY

▸ Solution: randomly rotated training samples? Preprocessing?

▸ Solution: convolutions on Lie groups

[Cohen & Welling: Group equivariant CNNs (ICML 2016)]
[Cohen & Welling: Steerable CNNs (ICLR 2017)]

BENEFITS OF SYMMETRY-BASED NEURAL NETWORKS

▸ Greatly reduce the number of learnable parameters

▸ Reduce the number of training samples

▸ Improve generalization and regression ability

▸ Provide physically interpretable models

▸ Bring elegance and mathematical transparency

▸ Build on physical principles of symmetries and constraints

LORENTZ GROUP

[figure: t–x spacetime diagrams]

LIE GROUP COVARIANCE/EQUIVARIANCE

INPUTS F1, F2, F3 and OUTPUT Fout transform under a group element g as

F1 → ρ1(g)F1,  F2 → ρ2(g)F2,  F3 → ρ3(g)F3,  Fout → ρout(g)Fout
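The definition can be verified numerically. A minimal toy sketch (illustrative, not from the talk), with G = SO(2) acting on ℝ²:

```python
import numpy as np

def rho(theta):
    """A representation of SO(2): 2x2 rotation matrices acting on R^2."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# An equivariant map commutes with the group action;
# an invariant map is the special case rho_out = identity.
f_equi = lambda x: 3.0 * x       # f(rho(g) x) == rho(g) f(x)
f_inv  = lambda x: float(x @ x)  # f(rho(g) x) == f(x)

x, g = np.random.randn(2), 0.7
assert np.allclose(f_equi(rho(g) @ x), rho(g) @ f_equi(x))
assert np.isclose(f_inv(rho(g) @ x), f_inv(x))
```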

PERMUTATION INVARIANCE

Any continuous function can be represented:

F(x1, …, xn) = ∑_{k=0}^{2n⋅d} Fk ( ∑_{i=1}^{n} fki(xi) )

[Kolmogorov-Arnold representation theorem]

Symmetric functions:

F(x1, …, xn) = F̂ ( ∑i f(xi) )

[Zaheer et al. (Deep Sets, 2017)]
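A minimal PyTorch sketch of this construction (layer sizes and names are illustrative): summing over the set dimension before the outer network makes the output permutation-invariant by construction.

```python
import torch
import torch.nn as nn

class DeepSets(nn.Module):
    """F(x_1, ..., x_n) = F_hat( sum_i f(x_i) ): permutation-invariant."""
    def __init__(self, d_in, d_hidden, d_out):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                               nn.Linear(d_hidden, d_hidden))
        self.F_hat = nn.Sequential(nn.Linear(d_hidden, d_hidden), nn.ReLU(),
                                   nn.Linear(d_hidden, d_out))

    def forward(self, x):                         # x: (batch, n, d_in)
        return self.F_hat(self.f(x).sum(dim=1))   # sum over the set dimension

model = DeepSets(4, 64, 2)
x = torch.randn(8, 10, 4)
assert torch.allclose(model(x), model(x[:, torch.randperm(10)]), atol=1e-6)
```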

COVARIANCE IN THE LITERATURE

Group equivariant neural networks (Cohen & Welling, 2016)

Harmonic networks: deep translation and rotation equivariance (Worrall, Garbin, Turmukambetov & Brostow, 2016)

Steerable CNNs (Cohen & Welling, 2017)

On the generalization of equivariance and convolution (Kondor & Trivedi, 2018)

Intertwiners between induced representations (Cohen, Geiger & Weiler, 2018)

3D steerable neural networks (Weiler, Geiger, Welling, Boomsma & Cohen, 2018)

Gauge equivariant neural networks (Cohen, Weiler, Kicanaoglu, Welling, 2019)

Tensor field networks (Thomas, Smidt, Kearns, Yang, Li, Kohlhoff & Riley, 2018)

Relativistic Harmonic Networks (Chase Shimmin @ Boost 2019)

Cormorant: Covariant Molecular Neural Networks (Anderson, Hy & Kondor, 2019)

SIMPLE NEURAL NETWORK COOKBOOK

INGREDIENTS

▸ Activations valued in a vector space

▸ Learnable linear operation

▸ Composable nonlinear operation

COVARIANT NEURAL NETWORK COOKBOOK

INGREDIENTS

▸ Activations valued in vector representations of a group

▸ Equivariant learnable linear operation (“commutes” with the action of the group)

▸ Equivariant composable nonlinear operation (maps representations to representations)

EQUIVARIANT LEARNABLE LINEAR OPERATION

I. MIXING IRREDUCIBLES

Finite-dimensional representations of the Lorentz group are decomposable, i.e. are direct sums of irreps. The learnable linear operation is an intertwiner

W : R → R′,  W ⋅ ρ(g) = ρ′(g) ⋅ W

Schur’s lemma implies that W acts as scalar multiplication on each irrep, and only linearly combines vectors of the same weight.

Activations need to be stored as collections of irreducible components.
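A minimal sketch of such a layer, assuming activations are stored as a dict from an irrep label to a tensor of shape (batch, channels, irrep dimension); the layout and names are assumptions, not the talk's implementation:

```python
import torch
import torch.nn as nn

class IrrepLinear(nn.Module):
    """Equivariant linear layer: by Schur's lemma, one channel-mixing
    matrix per irrep; the irrep (group) index is never touched."""
    def __init__(self, channels):               # e.g. {"(0,0)": 8, "(1/2,1/2)": 4}
        super().__init__()
        self.weight = nn.ParameterDict({
            k: nn.Parameter(torch.randn(c, c) / c ** 0.5)
            for k, c in channels.items()
        })

    def forward(self, acts):                     # acts[k]: (batch, c_k, dim_k)
        # Contract only the channel index; the irrep dimension is untouched,
        # so the layer commutes with the group action on each component.
        return {k: torch.einsum('oc,bcd->bod', self.weight[k], v)
                for k, v in acts.items()}
```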

EQUIVARIANT NONLINEAR OPERATION

II. CLEBSCH-GORDAN PRODUCT

The “only” bilinear equivariant operation mapping two representations to another one is the tensor product

⊗ : R1 × R2 → R3

ρ1(g) ⊗ ρ2(g) = C_{ρ1,ρ2} ( ⨁_ρ ⨁^{μ(ρ)} ρ(g) ) C†_{ρ1,ρ2}

Since our linear operation requires knowledge of irreducible components, each product must be followed by a Clebsch-Gordan decomposition. EQUIVARIANT NONLINEAR OPERATION 16

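Schematically, one such product step can be sketched as below, assuming precomputed Clebsch-Gordan matrices cg[(r1, r2)][r] of shape (dim_r, dim_r1 · dim_r2); all names are hypothetical:

```python
import torch

def cg_product(acts1, acts2, cg):
    """Tensor product of two irrep-indexed activation dicts, immediately
    decomposed back into irreducible components."""
    out = {}
    for r1, v1 in acts1.items():                  # v1: (batch, c1, d1)
        for r2, v2 in acts2.items():              # v2: (batch, c2, d2)
            prod = torch.einsum('bci,bkj->bckij', v1, v2)
            b, c1, c2, d1, d2 = prod.shape
            prod = prod.reshape(b, c1 * c2, d1 * d2)   # raw tensor product
            for r, C in cg[(r1, r2)].items():     # project onto each irrep
                piece = torch.einsum('mi,bci->bcm', C, prod)
                out[r] = piece if r not in out else torch.cat([out[r], piece], 1)
    return out
```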

ARCHITECTURE

‣ N particles with input 4-momenta p^μ_i

‣ N activations ℱi at each level live in representations of the Lorentz group

‣ The update rule involves pair interactions*:

ℱi ↦ W ⋅ ( ℱi ⊕ ℱi^{⊗2} ⊕ ∑_j f(pij²) ⋅ pij ⊗ ℱj )

‣ Arbitrary traditional sub-networks can be applied to Lorentz invariants (see the sketch below)

‣ Output layer sums over i and projects onto invariants (or other irrep)

For example, a squared matrix element is built entirely from such invariants:

ℳ² = ½ (2g_c)⁴ [p1 ⋅ p3 + m_e²c²] [p2 ⋅ p4 + m_μ²c²] / ((p1 − p3)² − m_γ²c²)²

*IRC safety to be addressed separately
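The scalar functions f, and any traditional sub-network, act on Lorentz invariants of the inputs. A minimal sketch of the pairwise invariant dot products (metric signature (+, −, −, −) assumed):

```python
import torch

def minkowski_dot(p, q):
    """Lorentz-invariant dot product with metric (+, -, -, -)."""
    return p[..., 0] * q[..., 0] - (p[..., 1:] * q[..., 1:]).sum(-1)

def pairwise_invariants(p):
    """All invariants p_i . p_j for four-momenta p of shape (N, 4);
    safe inputs for any non-covariant sub-network."""
    return minkowski_dot(p[:, None, :], p[None, :, :])    # shape (N, N)
```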

PERFORMANCE

INVARIANCE TESTS

[figures: out(θ)/out(0) − 1 vs. rotation angle θ, and out(γ)/out(1) − 1 vs. boost factor γ up to γ ≈ 2000]

Rotational invariance: relative invariance within 10⁻⁹, using double precision

Boost invariance: relative invariance within 10⁻², limited by floating point precision
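A boost scan like the one above is straightforward to script. A minimal sketch, assuming a model that maps a jet's four-momenta of shape (N, 4) to invariant outputs, in double precision:

```python
import torch

def boost_x(gamma, dtype=torch.float64):
    """4x4 Lorentz boost along x with boost factor gamma."""
    beta = (1.0 - 1.0 / gamma**2) ** 0.5
    L = torch.eye(4, dtype=dtype)
    L[0, 0] = L[1, 1] = gamma
    L[0, 1] = L[1, 0] = -gamma * beta
    return L

def relative_deviation(model, p, gamma):
    """max |out(boosted p) / out(p) - 1| over the model outputs."""
    return ((model(p @ boost_x(gamma).T) / model(p)) - 1).abs().max()
```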

TOP TAGGING CLASSIFIER

Dataset: [Butter, Kasieczka, Plehn & Russell (Deep-learned Top Tagging with a Lorentz Layer), arXiv:1707.08966]

CLARIANT: AUC = 0.970, accuracy = 0.922, background rejection 1/εB = 426, ≈3k parameters

[Kasieczka et al. (ML Landscape of top taggers, 2019)]

[Xie et al. (ResNeXt, 2016)]
[Moore et al. (Multi-body N-subjettiness, 2018)]
[Qu & Gouskos (ParticleNet, 2019)]
[Thaler et al. (Energy/Particle Flow Polynomials, 2018/19)]
[Butter et al. (LoLa: Lorentz Layer, 2018)]

CONCLUSION

FUTURE WORK AND POTENTIAL EXTENSIONS

▸ Complete hyperparameter optimization;

▸ Include particle information (label, charge, spin, …);

▸ Regression tasks and measurements:

๏ mass detection,

๏ covariant 4-momentum measurements;

▸ Detection of hidden symmetries;

▸ Multiple symmetries combined (Standard Model?);

Physics Department / Computer Science Department

Brandon Anderson, Jan Offermann, Marwah Roussi, Risi Kondor, David Miller

DATASET ENERGY DISTRIBUTIONS

[figures: top tagging reference dataset (arXiv:1902.09914); anti-kT, R = 0.8; hadronic top decays vs. light quarks and gluons; number of jets per 10 GeV vs. pT ∈ [400, 800] GeV]

VISUALIZATION OF SAMPLE EVENTS

HIERARCHICAL NETWORKS

[ATLAS]

[architecture diagram: INPUT LAYER → ITERATED CG LAYER (repeated) → OUTPUT LAYER]

CLARIANT