
Air-Shower Reconstruction at the Pierre Auger Observatory based on Deep Learning

Jonas Glombitza*,a for the Pierre Auger Collaboration†,b

a RWTH Aachen University, Aachen, Germany
b Observatorio Pierre Auger, Av. San Martín Norte 304, 5613 Malargüe, Argentina
  Full author list: http://www.auger.org/archive/authors_icrc_2019.html
E-mail: [email protected]

The surface detector array of the Pierre Auger Observatory measures the footprint of air showers induced by ultra-high energy cosmic rays. The reconstruction of event-by-event information sensitive to the cosmic-ray mass is a challenging task and so far mainly based on fluorescence detector observations with their duty cycle of ≈ 15%. Recently, great progress has been made in multiple fields of machine learning using deep neural networks and techniques associated to these networks. Applying these new techniques to air-shower physics opens up possibilities for an improved reconstruction, including an estimation of the cosmic-ray composition. In this contribution, we show that deep convolutional neural networks can be used for air-shower reconstruction, using surface-detector data. The focus of the machine-learning algorithm is to reconstruct depths of the shower maximum. In contrast to traditional reconstruction methods, the algorithm learns to extract the essential information from the signal and arrival-time distributions of the secondary particles. We present the neural-network architecture, describe the training, and assess the performance using simulated air showers.

36th International Cosmic Ray Conference (ICRC2019), 24 July – 1 August, 2019, Madison, Wisconsin, USA

* Speaker.
† For the collaboration list see PoS(ICRC2019)1177.

© Copyright owned by the author(s) under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0). http://pos.sissa.it/ PoS(ICRC2019)270

1. Introduction

The search for the origin of ultra-high energy cosmic rays (UHECRs) is one of the greatest challenges of astroparticle physics. Studying the mass of these particles helps to localize possible source candidates and to understand the acceleration mechanisms at the sources. UHECRs exceed primary energies of 10^18 eV, leading to extensive air showers when impacting the Earth's atmosphere. To allow for accurate measurements of UHECRs, the Pierre Auger Observatory [1] was completed in 2008 featuring a hybrid design. The baseline of the Observatory is the surface detector [2] (SD), formed by 1660 water-Cherenkov stations placed in a hexagonal grid, which measure the footprint of extensive air showers induced by UHECRs. The stations are separated by a distance of 1500 m, covering an area of ≈ 3000 km². This surface detector grid is overlooked by 27 fluorescence telescopes which measure the light emission of nitrogen excited by the particles of the extensive cascade.

The air-shower development is especially dependent on the energy and the mass of the primary cosmic ray, causing a greater depth of the shower maximum for lighter masses and higher energies. Hence, the depth of the shower maximum X_max enables an estimation of the cosmic-ray composition [3]. The observable X_max can be directly measured with the fluorescence detector (FD). However, measurements with the FD are only possible on moonless nights, which reduces the duty cycle to ≈ 15%. Alternatively, measuring the shower maximum using the SD is a challenging task, since it has to be inferred indirectly from the secondary particles recorded on the ground.

Recently, great progress has been made in machine learning using deep neural networks and associated techniques (deep learning) [4, 5]. The basic elements of deep learning are neural networks holding adaptive weights, which can be adjusted to the task at hand by exploring large amounts of data. The use of graphics processing units (GPUs) and the latest developments have recently made it possible to train deep networks which hold several million parameters. Especially in computer vision, pattern and speech recognition [6, 7, 8, 9], using deep networks paved the way for results of previously unreachable precision.
In this contribution we show that using deep networks enables the reconstruction of several air-shower properties by combining spatial and time information of the detected air shower. First, we introduce the deep-convolutional network architecture and discuss the training. Subsequently, we show that the network is capable of reconstructing the basic shower geometry and providing a reconstruction of the primary particle energy. Finally, we show that the deep-convolutional neural network provides promising results in the reconstruction of the depth of the shower maximum on an event-by-event level.

Apart from the total signal and the signal start time as used in the standard reconstruction [10], the machine-learning approach relies on the time-dependent distributions of particles traversing the water-Cherenkov stations. In contrast to shower-universality analyses [11, 12], where humans parametrize the signal and arrival-time distributions of secondary particles using analytic models, the deep-learning algorithm discovers the best parametrization itself.

The symmetric placement of sensors in a regular grid is ideally suited for the use of deep-learning techniques. Besides using pattern recognition for analyzing the spatial characteristic of the footprint, the arrival-time distribution of the secondary particles is encoded in the measured signal (signal trace), i.e., in the time evolution of each water-Cherenkov station. Recognizing single particles of the electromagnetic component and inferring the longitudinal shower development further encourages the use of techniques of speech recognition or natural language processing.

2. Surface detector data

The air-shower footprint triggers several water-Cherenkov stations, thus inducing a characteristic signal pattern on the hexagonal SD grid. Each of the triggered stations provides a signal trace which encodes the time-dependent density of particles traversing the station. The resulting data are illustrated in Figure 1.

Figure 1: (a) Measured surface-detector signal pattern, induced by a simulated air shower. (b) Simulated time traces measured at one surface-detector station.

2.1 Preprocessing

In order to stabilize the training of the deep network, we used data preprocessing. By normalizing the input variables to [-1, 1], we observed better training stability and faster convergence. To further enforce all mappings in the neural network to act on the same scale, we adopted a standard normalization (µ = 0, σ = 1) of our label distributions.
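As an illustration of this preprocessing step, the short sketch below rescales an input variable to [-1, 1] and standardizes a label distribution to µ = 0, σ = 1. The function names, toy data and value ranges are assumptions for illustration only, not taken from the analysis.

```python
import numpy as np

def rescale_to_minus_one_one(x, x_min, x_max):
    """Linearly map values from [x_min, x_max] to the interval [-1, 1]."""
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def standardize(y):
    """Standard normalization (mu = 0, sigma = 1) of a label array."""
    return (y - y.mean()) / y.std(), y.mean(), y.std()

rng = np.random.default_rng(seed=1)
signal_sum = rng.exponential(scale=50.0, size=1000)   # toy input variable
energy = rng.uniform(1.0, 160.0, size=1000)           # toy label in EeV

x = rescale_to_minus_one_one(signal_sum, signal_sum.min(), signal_sum.max())
y, mu, sigma = standardize(energy)
print(x.min(), x.max(), round(y.mean(), 3), round(y.std(), 3))
```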
using an In axial order representation,alignment to by is use padding achieved convolutions, with [ we zeros transform and our shifting each signal second row until 2.1.1 Signal patterns and coordinates kilometers, the pattern is cropped to a window of 13 2.1 Preprocessing izing the input variables to Figure 1: (a) Measured surface-detector signal(b) pattern, Simulated induced time by traces a measured simulated at air one shower. surface-detector station. further enforce all mappings in thenormalization neural ( network to act on the same scale, we adopted a standard Deep Learning network provides promising results in the reconstruction of the depth of the shower maximum istic signal pattern on the hexagonalwhich SD encodes grid. a Each time-dependent of density theillustrated of triggered in stations particles Figure provides traversing a the signal station. trace The resulting data are on an event-by-event level. 2. Surface detector data PoS(ICRC2019)270 of i , 0 (2.1) (2.2) t Jonas Glombitza is linearly re-scaled ) t ( i S that can be digitized by the sat S , which is the spread of the distri- C σ broken) as additional input. During , the standard deviation of the arrival . ] = ] 1 . 1 data set , + t ) + t σ ( sat center i τ S S [ [ − 10 1. The signal amplitude 4 10 train data set i , , t 0 t σ log log ) + t of the respective station we use standard normalization ( = i z i S , ) = 0 t ˜ t ( i ready to measure, 0 ˜ S = ]. The first stage is used for the characterization of the time trace, hence using the minimum and maximum signal level ] measured at the central station and 13 0;1 0), otherwise the relative height would be calibrated out, which would cause shifts [ center = τ z µ , 1 Due to aging effects, bad calibrations or broken PMTs, some of the water-Cherenkov stations The network design for reconstructing cosmic-ray induced air showers features three stages The first part relies on characterizing the measured signal trace. Investigating the time-dependent The arrival direction of the primary particle can be reconstructed using the arrival times As the number of particles measured in each detector is approximately exponentially dis- = z σ may not be ready to measuremap real encoding data. In the order station to states let (1 the network correct for this, we add a feature and is based on [ To allow for positive values only we use the shower front at the stations.arrival time For each station, the arrival time is normalized with respect to the 2.1.4 Station states the training, we further imitate brokeninformation stations to by the randomly feature masking map stations of and stations propagating states. this 3. Deep network for air shower reconstruction signals allows for measuring theshower content which of each the induce differentformation and signal electromagnetic about components shapes. the of shower The the age, air separation energy of and the the components primary yields mass. in- To let the network exploit local learning features which includeallows information forming about spatial features. the In shower theused. development. final stage, several The sub-parts following for each part reconstruction task are 3.1 Time trace characterization into the range simulated data-acquisition system. 2.1.3 Arrival time time distribution: Deep Learning grid. Additionally, we normalize the deviationbution by of dividing grid by deviations. For the( altitude of the observed atmospheric depth.shower core In is transformed order in to the use same way. 
2.1.2 Signal traces

As the number of particles measured in each detector is approximately exponentially distributed, we use a logarithmic re-scaling of the trace:

    \tilde{S}_i(t) = \log_{10}\left( S_i(t) + 1 \right).    (2.1)

The offset of 1 is added to allow for positive values only. The signal amplitude \tilde{S}_i(t) is then linearly re-scaled into the range [0, 1] using the minimum and the maximum signal level that can be digitized by the simulated data-acquisition system (saturation level S_sat).

2.1.3 Arrival time

The arrival direction of the primary particle can be reconstructed using the arrival times t_{i,0} of the shower front at the stations. For each station, the arrival time is normalized with respect to the arrival time τ_center measured at the central station and σ_t, the standard deviation of the arrival-time distribution in the training data set:

    \tilde{t}_{i,0} = \frac{t_{i,0} - \tau_{\mathrm{center}}}{\sigma_t}.    (2.2)

2.1.4 Station states

Due to aging effects, bad calibrations or broken PMTs, some of the water-Cherenkov stations may not be ready to measure real data. In order to let the network correct for this, we add a feature map encoding the station states (1 = ready to measure, 0 = broken) as additional input. During the training, we further imitate broken stations by randomly masking stations and propagating this information to the feature map of station states.
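The per-station preprocessing of Eqs. (2.1) and (2.2), together with the random masking of stations described above, could be sketched as follows. The saturation level, the arrival-time spread and the masking probability are placeholder values, not the numbers used in the analysis.

```python
import numpy as np

S_SAT = 1.0e4        # assumed saturation level of the digitizer (placeholder)
SIGMA_T = 150.0      # assumed spread of arrival times in the training set, in ns (placeholder)

def rescale_trace(trace):
    """Eq. (2.1) plus a linear mapping of the log signal into [0, 1]."""
    return np.log10(trace + 1.0) / np.log10(S_SAT + 1.0)

def normalize_arrival_time(t_i0, tau_center, sigma_t=SIGMA_T):
    """Eq. (2.2): arrival time relative to the central station, in units of sigma_t."""
    return (t_i0 - tau_center) / sigma_t

def mask_random_stations(traces, station_state, p_broken=0.05, rng=None):
    """Imitate broken stations: zero their traces and flag them in the state map."""
    rng = rng or np.random.default_rng()
    broken = rng.random(station_state.shape) < p_broken
    return (np.where(broken[..., None], 0.0, traces),
            np.where(broken, 0.0, station_state))

rng = np.random.default_rng(0)
traces = rng.exponential(5.0, size=(13, 13, 251))    # toy traces for a 13 x 13 window
states = np.ones((13, 13))                           # all stations ready to measure
traces, states = mask_random_stations(rescale_trace(traces), states, rng=rng)
print(round(traces.max(), 3), int(states.sum()), "active stations")
```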

3. Deep network for air shower reconstruction

The network design for reconstructing cosmic-ray induced air showers features three stages and is based on [13]. The first stage is used for the characterization of the time trace, hence learning features which include information about the shower development. The following part allows forming spatial features. In the final stage, several sub-parts for each reconstruction task are used.

3.1 Time trace characterization

The first part relies on characterizing the measured signal trace. Investigating the time-dependent signals allows for measuring the content of the different components of the air shower, which induce different signal shapes. The separation of the components yields information about the shower age, energy and the primary mass. To let the network exploit local features in the signal traces, e.g., sharp edges or spikes, we apply convolutions. We use rectified linear units as activation functions and three convolutional layers with varying kernel sizes. To stabilize the training and to increase generalization performance, we use convolutional weight sharing along the spatial dimensions. Hence, the convolution operation works on the time dimension only and reduces the time trace with a length of 251 time steps to 10 features. In each station the same type of features is extracted, which allows the concatenation of station-dependent inputs in the following part.
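A possible tf.keras sketch of such a shared trace-characterization stage: three 1D convolutions with different kernel sizes compress each 251-bin trace to 10 features, and the same weights are applied to every station of the 13 × 13 window. Layer widths, kernel sizes and the use of global average pooling are illustrative assumptions, not the published configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def trace_characterization(n_time=251, n_features=10):
    """Map a single signal trace of n_time bins to n_features values."""
    trace = layers.Input(shape=(n_time, 1))
    x = layers.Conv1D(32, kernel_size=7, activation="relu")(trace)
    x = layers.Conv1D(32, kernel_size=5, activation="relu")(x)
    x = layers.Conv1D(n_features, kernel_size=3, activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)            # assumption: pool the time axis away
    return tf.keras.Model(trace, x, name="trace_net")

# the same weights are shared by all 13 x 13 stations (flattened to 169 slices)
grid_traces = layers.Input(shape=(13 * 13, 251, 1), name="traces")
station_features = layers.TimeDistributed(trace_characterization())(grid_traces)
trace_features = layers.Reshape((13, 13, 10))(station_features)
trace_stage = tf.keras.Model(grid_traces, trace_features, name="trace_stage")
trace_stage.summary()
```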

3.2 Spatial representation

After the characterization of the signal traces, the extracted features are used as input for the second part and concatenated with the additional feature maps holding the arrival times, station coordinates and the station states. The following network structure is based on densely connected convolutional layers [14], separable convolutions [15] and pooling operations. We observed that using batch normalization to normalize the data between the layers adds too much disturbance in the training. Hence, as activation functions we used scaled exponential linear units [16]. This stabilizes the training process and preserves important physics parameters like the arrival time in each layer.

3.3 Task-related part

The final part of the neural-network architecture consists of individual network parts for each reconstruction task. This last stage is based on residual units [6]. In total, we used two residual blocks covering three residual units each. Each of the residual units contains two layers using a rectified linear unit in between. As the last layer we used global average pooling.
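Continuing the sketch above, the spatial stage and the task-related heads could look roughly like this. The densely connected blocks of [14] are omitted for brevity, and all widths, block counts and output names are illustrative assumptions rather than the published architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers

def spatial_stage(x):
    """Spatial feature extraction with separable convolutions and SELU activations."""
    for width in (32, 64):
        x = layers.SeparableConv2D(width, 3, padding="same", activation="selu")(x)
        x = layers.SeparableConv2D(width, 3, padding="same", activation="selu")(x)
        x = layers.AveragePooling2D(2)(x)
    return x

def residual_unit(x):
    """Two convolutional layers with a rectified linear unit in between, plus a skip."""
    width = x.shape[-1]
    y = layers.Conv2D(width, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(width, 3, padding="same")(y)
    return layers.Activation("relu")(layers.Add()([x, y]))

def task_head(x, units, name):
    """Task-related part: two residual blocks of three units each, then global pooling."""
    for _ in range(2):
        for _ in range(3):
            x = residual_unit(x)
    x = layers.GlobalAveragePooling2D()(x)
    return layers.Dense(units, name=name)(x)

# trace features (10) + arrival time (1) + coordinates dx, dy, z (3) + station state (1)
feature_maps = layers.Input(shape=(13, 13, 15), name="feature_maps")
x = spatial_stage(feature_maps)
outputs = [task_head(x, 3, "axis"), task_head(x, 2, "core"),
           task_head(x, 1, "energy"), task_head(x, 1, "xmax")]
model = tf.keras.Model(feature_maps, outputs, name="sd_dnn")
model.summary()
```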

4. Performance in simulations

4.1 Training

We used air showers simulated with CORSIKA [17] using the hadronic interaction model EPOS-LHC [18]. The data are simulated in an energy range of 1 to 160 EeV following a spectrum of E^{-1}, covering the full azimuth range and a zenith range of θ ∈ [0°, 65°]. As primary particles, proton, helium, oxygen and iron nuclei are used. We used 360,000 showers for training the network, 33,000 for the validation and around 37,000 for the final evaluation.
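For illustration, energies following an E^{-1} spectrum between 1 and 160 EeV can be drawn by inverse-transform sampling and the quoted split applied afterwards. This toy sketch is not the simulation chain used for the analysis.

```python
import numpy as np

N_TRAIN, N_VAL, N_TEST = 360_000, 33_000, 37_000    # split quoted in the text

def sample_e_minus_one(n, e_min=1.0, e_max=160.0, rng=None):
    """Draw energies (EeV) following dN/dE ~ 1/E via inverse-transform sampling."""
    rng = rng or np.random.default_rng()
    return e_min * (e_max / e_min) ** rng.random(n)

rng = np.random.default_rng(42)
energies = sample_e_minus_one(N_TRAIN + N_VAL + N_TEST, rng=rng)
indices = rng.permutation(energies.size)
train, val, test = np.split(indices, [N_TRAIN, N_TRAIN + N_VAL])
print(len(train), len(val), len(test), round(energies.min(), 2), round(energies.max(), 2))
```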

For training the network, we used Keras [19] with the TensorFlow [20] backend. As optimizer, Adam [21] was used with its default settings. We further smoothly decayed the learning rate and dropped it by a factor of 0.1 when the validation loss did not decrease after three epochs. As the loss for all tasks we used the mean squared error and re-weighted the losses to the same order of magnitude. We trained the model for 27 epochs with a batch size of 32 on an Nvidia GeForce GTX 1080 graphics card.
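The training setup described above could be configured roughly as follows in tf.keras. The loss weights are placeholders for the re-weighting to a common order of magnitude, and `model`, `x_train`, etc. refer to the architecture sketch from Section 3.

```python
import tensorflow as tf

def compile_and_train(model, x_train, y_train, x_val, y_val):
    """Multi-task training with Adam, MSE losses and a plateau-based learning-rate drop."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),                 # default settings
        loss="mse",                                            # mean squared error per task
        loss_weights={"axis": 1.0, "core": 1.0,                # placeholder re-weighting to a
                      "energy": 1.0, "xmax": 1.0},             # similar order of magnitude
    )
    reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.1, patience=3)            # drop LR when val loss stalls
    return model.fit(x_train, y_train,
                     validation_data=(x_val, y_val),
                     epochs=27, batch_size=32,
                     callbacks=[reduce_lr])
```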

4.2 Geometry reconstruction

Before investigating the reconstruction of complex shower variables like the shower maximum, the network needs to be capable of providing a reconstruction of the basic air-shower properties. For the validation of the deep-learning based reconstruction, we first investigated the quality of the reconstructed shower geometry. In Figure 2a, the angular distance between the reconstructed and the true shower axis is illustrated, showing a good angular resolution (68% quantile). Figure 2b shows the reconstruction of the shower core. The absolute resolution amounts to ≈ 37 m. Further, no azimuth-dependent bias is visible, which implies that the network is able to correct for shower asymmetry effects.

Figure 2: (a) Angular distance between the reconstructed and the true shower axes. Different colors indicate different primaries. (b) Azimuth dependence of the shower-core reconstruction. The network provides on average an unbiased estimate of the core position (yellow marker).

4.3 Reconstruction of the energy

After validating the basic reconstruction of the shower geometry, the reconstruction of the primary cosmic-ray energy can be studied. Figure 3a shows the reconstruction of the energy. The overall reconstruction shows no bias. The energy dependence of the resolution and bias is shown in Figure 3b. The composition bias depends on the energy and increases for higher energies. The relative energy resolution improves for higher energies due to decreasing fluctuations and achieves a relative resolution σ_E/E of about 10% at around 100 EeV.

Figure 3: (a) Correlation between reconstructed and true energies. (b) Energy-dependent bias and resolution of the reconstruction shown for different primaries.
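As an illustration of the quantity shown in Figure 2a, the opening angle between a reconstructed and a true shower axis and its 68% quantile can be computed as below. The smeared toy reconstruction is only there to make the snippet runnable and carries no physics content.

```python
import numpy as np

def axis_from_angles(zenith, azimuth):
    """Unit vector of a shower axis from zenith and azimuth angles (radians)."""
    return np.stack([np.sin(zenith) * np.cos(azimuth),
                     np.sin(zenith) * np.sin(azimuth),
                     np.cos(zenith)], axis=-1)

def angular_distance(axis_a, axis_b):
    """Opening angle in degrees between two sets of unit vectors."""
    cos = np.clip(np.sum(axis_a * axis_b, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

rng = np.random.default_rng(0)
true_axis = axis_from_angles(rng.uniform(0, np.radians(65), 1000),
                             rng.uniform(0, 2 * np.pi, 1000))
smeared = true_axis + rng.normal(0, 0.01, true_axis.shape)      # toy "reconstruction"
reco_axis = smeared / np.linalg.norm(smeared, axis=-1, keepdims=True)

delta = angular_distance(true_axis, reco_axis)
print("68% quantile of the angular distance:", round(np.quantile(delta, 0.68), 3), "deg")
```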

4.4 Reconstruction of the shower maximum

With the precision described previously, the network is able to reconstruct the air-shower geometry and estimate the primary cosmic-ray energy. In the following, the reconstruction of the shower maximum X_max is investigated. In Figure 4a, the reconstructed shower maxima are shown for different primaries. It can be seen that the elements show separation in the reconstructed shower maxima, and hence the algorithm can be used for an estimation of the cosmic-ray mass. The overall resolution of the reconstruction amounts to roughly 30 g/cm². In Figure 4b, the different resolutions and biases are plotted for different primaries. The bias decreases from ≈ 40 g/cm² at 3 EeV towards the highest energies. Due to the decreased shower-to-shower fluctuations, the reconstruction of heavier elements shows an improved resolution of ≤ 25 g/cm² at 150 EeV.

Figure 4: (a) Correlation between reconstructed and true depths of shower maximum for proton, helium, oxygen and iron. (b) Energy-dependent resolution and bias of the X_max reconstruction.

5. Conclusion

Extensive air showers induce time-dependent signal patterns at the surface detector of the Pierre Auger Observatory. To reconstruct the properties of the primary cosmic ray we used a deep neural network. The presented algorithm relies on convolutions exploring the intrinsic structure in huge amounts of data. We have discussed our model, which learns time-dependent representations in a first stage. In the second stage, the network allows for the extraction and combination of features in space and time. We have shown that deep learning can be used to reconstruct the geometry and energy of air showers using simulated surface-detector data. In addition, the reconstruction of the shower maximum X_max showed very promising results. Further validating the performance with fluorescence detector measurements, and investigating different hadronic interaction models, would allow for a mass-composition estimation with the full SD event statistics.

References

[1] A. Aab et al. [Pierre Auger Collaboration], The Pierre Auger Cosmic Ray Observatory, Nucl. Instrum. Meth. A 798 (2015) 172.
[2] A. Aab et al. [Pierre Auger Collaboration], The Surface Detector System of the Pierre Auger Observatory, Nucl. Instrum. Meth. A 586 (2008) 409.
[3] A. Aab et al. [Pierre Auger Collaboration], Depth of Maximum of Air-Shower Profiles at the Pierre Auger Observatory: Composition Implications, Phys. Rev. D 90 (2014) 122006.
[4] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, Cambridge, MA, US (2016).
[5] Y. LeCun, Y. Bengio, G. Hinton, Deep Learning, Nature 521 (2015) 436.
[6] K. He et al., Deep residual learning for image recognition, arXiv/1512.03385.
[7] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional Networks for Biomedical Image Segmentation, arXiv/1505.04597.
[8] D. Yu, L. Deng, Automatic Speech Recognition: A Deep Learning Approach, Springer, London, UK (2014).
[9] O. Russakovsky et al., ImageNet large scale visual recognition challenge, IJCV 115 (2015) 211.
[10] S. Argiro et al., The Offline software framework of the Pierre Auger Observatory, Nucl. Instrum. Meth. A 580 (2007) 1485.
[11] A. M. Hillas, Angular and energy distributions of charged particles in the electron cascades in air, J. Phys. G 8 (1982) 1461.
[12] M. Ave, M. Roth, A. Schulz, A generalized description of the time dependent signals in extensive air shower detectors and its applications, Astropart. Phys. 88 (2017) 46.
[13] M. Erdmann, J. Glombitza, D. Walz, A Deep Learning-based Reconstruction of Cosmic Ray-induced Air Showers, Astropart. Phys. 97 (2018) 46.
[14] G. Huang et al., Densely Connected Convolutional Networks, arXiv/1608.06993.
[15] F. Chollet, Xception: Deep Learning with Depthwise Separable Convolutions, arXiv/1610.02357.
[16] G. Klambauer, T. Unterthiner, A. Mayr, S. Hochreiter, Self-Normalizing Neural Networks, arXiv/1706.02515.
[17] D. Heck et al., CORSIKA: A Monte Carlo code to simulate extensive air showers, Forschungszentrum Karlsruhe Report FZKA 6019 (1998).
[18] T. Pierog et al., EPOS LHC: test of collective hadronization with LHC data, arXiv/1306.0121.
[19] F. Chollet, Keras: Deep Learning for Python (2015), github.com/fchollet/keras.
[20] M. Abadi et al., TensorFlow: Large-scale machine learning on heterogeneous systems (2015), tensorflow.org.
[21] D. P. Kingma, J. Ba, Adam: A Method for Stochastic Optimization, Proc. 3rd Int. Conf. on Learning Representations (ICLR), San Diego (2015), arXiv/1412.6980.