Tesis Doctorals En Xarxa
Total Page:16
File Type:pdf, Size:1020Kb
Coprocessor integration for real-time event processing in particle physics detectors Alexey Pavlovich Badalov http://hdl.handle.net/10803/396128 ADVERTIMENT. L'accés als continguts d'aquesta tesi queda condicionat a l'acceptació de les condicions d'ús establertes per la següent llicència Creative Commons: http://creativecommons.org/licenses/by/4.0/ ADVERTENCIA. El acceso a los contenidos de esta tesis queda condicionado a la aceptación de las condiciones de uso establecidas por la siguiente licencia Creative Commons: http://creativecommons.org/licenses/by/4.0/ The access to the contents of this doctoral thesis it is limited to the acceptance of the use WARNING. conditions set by the following Creative Commons license: http://creativecommons.org/licenses/by/4.0/ 90) - 02 - TESIS DOCTORAL Título Coprocessor integration for real-time event processing in particle physics detectors Realizada por Alexey Badalov en el Centro La Salle – Ramon Llull University y en el Departamento GR-SETAD C.I.F. G: 59069740 Universitat Ramon Llull Fundació Rgtre. Fund. Generalitat de Catalunya núm. 472 (28 472 núm. de Catalunya Generalitat Rgtre. Fund. Fundació Llull Ramon Universitat 59069740 G: C.I.F. Dirigida por Dr. Xavier Vilasis i Cardona Dr. Niko Neufeld C. Claravall, 1-3 | 08022 Barcelona | Tel. 93 602 22 00 | Fax 93 602 22 49 | [email protected] | www.url.edu Coprocessor integration for real-time event processing in particle physics detectors Alexey Badalov 2 Abstract High-energy physics experiments today have higher energies, more accurate sensors, and more flexible means of data collection than ever before. Their rapid progress requires ever more computational power; and massively parallel hardware, such as graphics cards, holds the promise to provide this power at a much lower cost than traditional CPUs. Yet, using this hardware requires new algorithms and new approaches to organizing data that can be difficult to integrate with existing software. In this work, I explore the problem of using parallel algorithms within existing CPU-orientated frameworks and propose a compromise between the different trade-offs. The solution is a service that communicates with multiple event- processing pipelines, gathers data into batches, and submits them to hardware- accelerated parallel algorithms. I integrate this service with Gaudi — a framework underlying the software environments of two of the four major experiments at the Large Hadron Collider. I examine the overhead the service adds to parallel algorithms. I perform a case study of using the service to run a parallel track reconstruction algorithm for the LHCb experiment's prospective VELO Pixel subdetector and look at the performance characteristics of using different data batch sizes. Finally, I put the findings into perspective within the context of the LHCb trigger's requirements. 3 Table of Contents 1 Overview ......................................................................................................................................... 6 2 Large Hadron Collider beauty experiment ........................................................................ 9 2.1 Large Hadron Collider ...................................................................................................... 9 2.2 The LHCb experiment ..................................................................................................... 10 2.3 LHCb Computing ............................................................................................................... 14 3 LHCb software infrastructure ............................................................................................... 18 3.1 Requirements ..................................................................................................................... 18 3.2 Gaudi ..................................................................................................................................... 19 3.3 Software organization .................................................................................................... 21 3.4 Event data model .............................................................................................................. 21 3.5 Gaudi Hive ........................................................................................................................... 22 4 The need for speed .................................................................................................................... 26 4.1 High-luminosity upgrade .............................................................................................. 26 4.2 Computation alternatives.............................................................................................. 27 4.3 Intel Xeon Phi ..................................................................................................................... 27 4.4 GPU programming ........................................................................................................... 28 5 State of the art ............................................................................................................................. 31 5.1 NA62 ...................................................................................................................................... 31 5.2 NA62 NaNet ........................................................................................................................ 32 5.3 ALICE ..................................................................................................................................... 35 5.4 ATLAS .................................................................................................................................... 37 5.5 ATLAS APE .......................................................................................................................... 39 6 Towards a new system ............................................................................................................ 41 6.1 Coprocessor access .......................................................................................................... 41 6.2 Interprocess communication ....................................................................................... 42 6.3 Data management ............................................................................................................ 44 6.4 Gaudi integration .............................................................................................................. 45 6.5 Apache Thrift experiment ............................................................................................. 45 7 Coprocessor Manager ............................................................................................................... 47 7.1 Overview .............................................................................................................................. 47 7.2 Design ................................................................................................................................... 48 7.3 Gaudi Hive considerations ............................................................................................ 52 7.4 Components ........................................................................................................................ 52 8 VELO Tracking on the GPU ..................................................................................................... 58 4 8.1 Track types in LHCb ........................................................................................................ 58 8.2 Track reconstruction ...................................................................................................... 59 8.3 Tracking efficiency ........................................................................................................... 61 8.4 CPU-based VELO Pixel tracking .................................................................................. 61 8.5 GPU-based VELO Pixel tracking .................................................................................. 62 9 Performance evaluation .......................................................................................................... 65 9.1 Hardware setup ................................................................................................................ 65 9.2 Overhead measurement ................................................................................................ 65 9.3 Event batching ................................................................................................................... 67 10 Analysis ..................................................................................................................................... 71 10.1 Performance ....................................................................................................................... 71 10.2 Cost ........................................................................................................................................ 73 11 Conclusion ............................................................................................................................... 75 Bibliography ........................................................................................................................................ 77 5 1 Overview We want to understand the world. For millennia, since times before Socrates, Western science was guided by various versions of the theory of elements; it was thought that all