
Particle identification on the DAMPE experiment

David F. Droz, University of Geneva (on behalf of the DAMPE collaboration)

Deep Learning in the Natural Sciences, Hamburg, March 2018

Cosmic Rays

● Very high energy radiation coming from space
  ○ Composition: protons, atomic nuclei, electrons, gamma rays, neutrinos, …
  ○ Identified in the early 20th century (Nobel Prize 1936)
  ○ Energies up to 10²¹ eV
● Extrasolar origins (extragalactic?)
  ○ Supernovae, pulsars, active galactic nuclei, hot gases, …
● Many open questions
  ○ Sources
  ○ Acceleration mechanisms
  ○ Exact composition
  ○ Ultra-high energies
  ○ Dark matter

DArk Matter Particle Explorer (DAMPE)

● Space-based cosmic-ray observatory of the Chinese Academy of Sciences
  ○ In operation since December 2015
  ○ Planned lifetime: >3 years
● International collaboration
  ○ CAS (China), INFN (Italy), UniGE (Switzerland)
  ○ Leadership: Purple Mountain Observatory, Nanjing, China (Prof. Jin Chang)
● Scientific goals:
  ○ Measure the high-energy spectrum of cosmic electrons and gamma rays
  ○ Study the cosmic-ray spectrum and composition
  ○ High-energy gamma-ray astronomy
  ○ Indirect detection of dark matter particles
    ■ E.g. annihilation: χ + χ → e⁺ + e⁻

Launch video: https://youtu.be/Iyy_A4cQzgE

DAMPE detector system

BGO calorimeter

● Three primary purposes:
  ○ Energy measurement
  ○ Imaging the 3D profile of the particle shower
  ○ Level 0 trigger
● 14 layers of 22 bismuth germanate (BGO) bars
  ○ 7 layers in the X direction, 7 in Y

J. Chang et al., arXiv:1706.08453; L. Wu et al., arXiv:1901.00734

Particle identification

p/e ratio: O(100), energy dependent, a priori unknown

PSD for γ rejection

BGO for e/p separation

Dr. S. Zimmer, TeVPA 2017, Columbus, USA

Event displays

[Event displays: electron Monte Carlo (SIGNAL) vs. proton Monte Carlo (BACKGROUND), particle direction indicated]

Electron identification - the classical method

● Electrons have narrow and short showers
  ○ Combine spread and depth into a single variable ζ (see the sketch below)
    ■ Method: Chang, J. et al., Advances in Space Research 42.3 (2008): 431-436.
    ■ DAMPE CRE flux: Ambrosi, G. et al., Nature 552.7683 (2017): 63. arXiv:1711.10981

[Figure: ζ separation of electrons and protons, shower spread vs. shower depth]
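For reference, a minimal sketch of ζ following the definition in the DAMPE CRE paper (arXiv:1711.10981); the per-layer transverse RMS values, the last-layer energy fraction, and the exact normalisation constant should be treated as assumptions here:

```python
import numpy as np

def zeta(rms_per_layer, f_last):
    """Shower-shape variable (assumed form, after arXiv:1711.10981):
    zeta = F_last * (sum of transverse RMS over the 14 BGO layers)^4 / (8e6 mm^4).
    Electrons (short, narrow showers) pile up at low zeta; protons at high zeta."""
    total_rms = np.sum(rms_per_layer)   # total transverse spread [mm]
    return f_last * total_rms**4 / 8.0e6
```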

What about machine learning?

Machine learning on DAMPE

● PCA and boosted decision trees showed performance similar to the ζ variable
● What about deep neural networks?

Pattern recognition: Convolutional Neural Networks (CNN)

● Very powerful, lots of ongoing research and applications
● Translation/rotation invariance
● Taking into account the XZ/YZ planes of the calorimeter is not trivial
● Integrating other sub-detectors is not trivial

Multivariate analysis: Multilayer Perceptron (MLP)

● Simple
● Can receive physically meaningful quantities → easier analysis
● Selectively control the amount of information (number of features)

The data

Data selection

CNN

● Consider each BGO bar as a “pixel”, with pixel value = energy deposited
● Build either one image of 14x22 or two images of 7x22 (XZ view, YZ view) (see the sketch below)

MLP

● Extract BGO calorimeter variables that characterise the event
  ○ Shower width, spread, energy, angle, …
● Can choose different sets of variables
  ○ From ~30 to ~70 quantities

Both cases: extensive data cleaning, event preselection, and variable normalisation
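To make the CNN input concrete, a minimal sketch, assuming a (14, 22) array of per-bar deposited energies and an alternating X/Y layer ordering (both assumptions about the data layout):

```python
import numpy as np

def make_images(bar_energy):
    """Build the two-image CNN input from BGO bar energies.

    bar_energy: array of shape (14, 22), deposited energy per bar.
    If layers alternate between X and Y orientation (assumed here),
    the event splits into two 7x22 projections (XZ and YZ views)."""
    xz = bar_energy[0::2]                 # 7 X-measuring layers -> (7, 22)
    yz = bar_energy[1::2]                 # 7 Y-measuring layers -> (7, 22)
    # Illustrative normalisation: scale pixels by the total deposited energy
    total = bar_energy.sum() or 1.0
    return xz / total, yz / total
```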

Train, optimise and evaluate on MC data

Architectures

● ReLU activation
● Dropout 20-25%
● Adam optimiser
● ~50-60 epochs
● ~1M events per class
● Hardware: Baobab cluster @ University of Geneva (Nvidia X, Nvidia P100)

Performances

Application: straightforward?

● Significant gain in accuracy, I have all my metrics, problem solved?


Training data ≠ Application data

● Train/test on simulations, which are not exactly the same as flight data
  ○ Pretty close match between simulation and reality, still usable to build methods
  ○ But inaccurate for evaluating e.g. systematic uncertainties with high precision
● Need to evaluate neural networks on real, unlabelled data

Mismatch examples

[Figure: electron beam test vs. Monte Carlo (spread of energy deposition in the 1st layer: energy RMS [mm] vs. number of hits); flight data vs. Monte Carlo (energy deposited on BGO bars, normalised event count)]

Application

● Plot the classifier output as a histogram (log scale!) - sketched below
  ○ Flight data, unlabelled
● Recognise background and signal peaks at 0 and 1

● … Extrapolate?
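A minimal sketch of this diagnostic plot (Matplotlib; the scores array is a placeholder for the network output on flight data):

```python
import matplotlib.pyplot as plt

def plot_classifier_output(scores):
    """Histogram of classifier outputs; log scale so the peaks at 0 and 1
    do not hide the sparsely populated intermediate region."""
    plt.hist(scores, bins=100, range=(0.0, 1.0), histtype="step")
    plt.yscale("log")
    plt.xlabel("Classifier output")
    plt.ylabel("Event count")
    plt.show()
```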

Not monotonic behaviour! → Extrapolation not trivial

Output layer: role of the sigmoid

● The very last operation is applying the sigmoid function
  ○ Maps the output to the [0, 1] interval, where 0 and 1 are the usual class labels
  ○ Limited float accuracy: sigmoid(40) == 1.0 exactly (in exact arithmetic it would be 0.9999…)
    ■ Loss of information if the neural network produces large values
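A quick numerical check of this saturation (NumPy, double precision):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# exp(-40) is ~4e-18, below double-precision resolution (~2.2e-16),
# so 1 + exp(-40) rounds to exactly 1.0 and the distinction is lost.
print(sigmoid(40.0) == 1.0)   # True
print(sigmoid(40.0))          # 1.0
```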

● What if we remove it?
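A minimal Keras sketch of this change (layer sizes are illustrative, not the deck's models): drop the final sigmoid activation and train on the raw logit with from_logits=True, so the training objective is unchanged while the network outputs an unbounded score.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Dense(250, activation="relu", input_shape=(50,)),  # illustrative sizes
    layers.Dense(50, activation="relu"),
    layers.Dense(1),                     # linear output: raw logit, no sigmoid
])

# from_logits=True applies the sigmoid inside the loss, numerically stably,
# so training is equivalent while predictions remain unbounded scores.
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True))
```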

[Figure: distribution of the pre-sigmoid network output, log scale]

Monotonic behaviour → can extrapolate

Same performance, better usability

● Can select pure background or pure signal samples
● ROC curves overlap
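The overlap is expected: the sigmoid is strictly monotonic, so removing it does not change how events are ranked, and the ROC curve depends only on that ranking. A quick check with placeholder data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=10_000)          # placeholder truth labels
logits = rng.normal(loc=4.0 * (2 * labels - 1))   # placeholder raw network outputs
probs = 1.0 / (1.0 + np.exp(-logits))             # the same scores after a sigmoid

# Identical AUC: the ROC depends only on the ordering of events
print(roc_auc_score(labels, logits), roc_auc_score(labels, probs))
```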

Yet...

Work in progress: no solution yet

… still not good enough

How to estimate e.g. signal efficiency?

● Add a correction factor post-NN?
  ○ But: introduces a new systematic uncertainty
● MLP approach: remove mismodelled variables?
  ○ Improves matching, but decreases performance: the MLP may no longer outperform ζ
● CNN approach: set low-energy pixels to zero to cut noise?
  ○ No notable improvement
● Improve the Monte Carlo simulation?
  ○ Not trivial, computationally very expensive. Work in progress in that area
● Other ideas?

Conclusions

● On academic cases, deep neural networks significantly outperform the classical analysis
● Complex application: multivariate techniques are sensitive to training/application mismatch
  ○ The age-old problem of getting good training data
  ○ Changing the output activation can make a model easier to use
● Training a good model is only a first step

Back-up slides

The DAMPE collaboration

● Purple Mountain Observatory, CAS, Nanjing
● Institute of High Energy Physics, CAS, Beijing
● National Space Science Center, CAS, Beijing
● University of Science and Technology of China, CAS, Hefei
● Institute of Modern Physics, CAS, Lanzhou
● INFN Perugia & University of Perugia
● INFN Bari & University of Bari
● INFN Lecce & University of Salento
● University of Geneva

DAMPE satellite

● Mission:
  ○ Launch on 2015-12-17. Planned duration: 3 years
  ○ Smooth operation since 2015-12-30
● Launch system: Long March 2D
● Orbit: Sun-synchronous, 500 km periapsis, 97.4° inclination, 95 min period
● Mass: 1400 kg

Prof. Xin Wu, TeVPA 2016, CERN

Convolutional Network architecture

1. 2x Conv2D, 32 filters, 3x3 kernel, ReLU
2. MaxPool 2x2
3. Dropout 20%
4. 2x Conv2D, 64 filters, 3x3 kernel, ReLU
5. MaxPool 2x2
6. Dropout 25%
7. Conv2D, 128 filters, 3x3 kernel, ReLU, dropout 25%
8. Dense layer, 128, ReLU
9. Batchnorm, 25% dropout
10. Dense layer, 1, sigmoid

Adam optimiser, binary crossentropy, learning rate scheduler, batch size 100

Model source: https://www.kaggle.com/adityaecdrid/mnist-with-keras-for-beginners-99457/notebook
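A Keras sketch of the architecture listed above. The input shape (one 14x22 image, single channel) and the use of "same" padding are assumptions needed to keep the feature maps large enough for the two pooling stages:

```python
from tensorflow.keras import layers, models

def build_cnn(input_shape=(14, 22, 1)):
    """CNN following the list above; input shape and padding are assumptions."""
    m = models.Sequential([
        layers.Conv2D(32, (3, 3), activation="relu", padding="same",
                      input_shape=input_shape),
        layers.Conv2D(32, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.20),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.Conv2D(64, (3, 3), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(128, (3, 3), activation="relu", padding="same"),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.BatchNormalization(),
        layers.Dropout(0.25),
        layers.Dense(1, activation="sigmoid"),
    ])
    m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return m
```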

Multilayer perceptron architecture

1. Dense, 250 neurons, ReLU, 40% dropout
2. Dense, 150 neurons, ReLU, 30% dropout
3. Dense, 100 neurons, ReLU, 20% dropout
4. Dense, 50 neurons, ReLU, 20% dropout
5. Dense, 1 neuron, sigmoid

Adam optimiser, binary crossentropy, learning rate scheduler, batch size 40

Model resulting from a random grid search for hyperparameter optimisation
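The corresponding Keras sketch; the number of input features is an assumption, following the ~30-70 variable sets mentioned earlier:

```python
from tensorflow.keras import layers, models

def build_mlp(n_features=50):
    """MLP following the list above; n_features is an assumption (~30-70 variables)."""
    m = models.Sequential([
        layers.Dense(250, activation="relu", input_shape=(n_features,)),
        layers.Dropout(0.40),
        layers.Dense(150, activation="relu"),
        layers.Dropout(0.30),
        layers.Dense(100, activation="relu"),
        layers.Dropout(0.20),
        layers.Dense(50, activation="relu"),
        layers.Dropout(0.20),
        layers.Dense(1, activation="sigmoid"),
    ])
    m.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return m
```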
