ALICE-TDR-XX CERN-LHCC-2013-XX

Technical Design Report for the Upgrade of the Online–Offline Computing System

March 4, 2015

The ALICE Collaboration∗

Copyright CERN, for the benefit of the ALICE Collaboration. This article is distributed under the terms of Creative Commons Attribution License (CC-BY-3.0), which permits any use provided the original author(s) and source are credited.

O2 TDR tag for comments by the ALICE collaboration and for the internal review.

∗ See Appendix D for the list of collaboration members.


Scope of the document

The ALICE Collaboration is preparing a major upgrade which has been presented to the LHCC in an Upgrade Letter of Intent and four Upgrade Technical Design Reports (TDRs) for the Inner Tracking System (ITS), the Time Projection Chamber (TPC), the Muon Forward Tracker (MFT) and the Read-out and Trigger System.

This document is the TDR for the Upgrade of ALICE Online and Offline Computing to a new common system called O2. The ALICE computing model, the functions of the new software framework and the new computing facility proposed to be built at the ALICE experimental area are set out below, together with the schedule of the project and the resources needed.

Executive Summary

In 2020, the ALICE experiment at CERN will start collecting data with an upgraded detector. This upgrade is based on the LHC running conditions after LS2¹, which will deliver Pb–Pb collisions at up to L = 6×10^27 cm^-2 s^-1, corresponding to an interaction rate of 50 kHz. The ALICE goal is to integrate a luminosity of 13 nb^-1 for Pb–Pb collisions recorded in minimum bias mode.
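As a quick cross-check of these two numbers (using a hadronic Pb–Pb cross section of roughly 8 b, a value assumed here and not quoted in this summary), the interaction rate follows directly from the instantaneous luminosity:

```latex
R = L\,\sigma_{\mathrm{had}}
  \approx 6\times10^{27}\,\mathrm{cm^{-2}\,s^{-1}} \times 8\times10^{-24}\,\mathrm{cm^{2}}
  \approx 5\times10^{4}\,\mathrm{s^{-1}} = 50\,\mathrm{kHz}.
```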

The main physics topics addressed by the ALICE upgrade are precise measurements of heavy-flavour hadrons, low-momentum quarkonia and low-mass di-leptons. These physics probes are characterised by a very small signal-to-noise ratio, requiring very large statistics, and by a large background, which makes triggering techniques very inefficient, if not impossible. In order to keep up with the 50 kHz interaction rate, the TPC will also require continuous read-out to deal with event pile-up and to avoid trigger-generated dead time. Compared to RUN 1 and 2, this is significantly more challenging for the online and offline computing systems. The resulting data throughput from the detector has been estimated to be greater than 1 TB/s for Pb–Pb events, roughly two orders of magnitude more than in RUN 1. To minimise the cost and requirements of the computing system for data processing and storage, the ALICE Computing Model for RUN 3 is designed for a maximal reduction of the data volume read out from the detector, as early as possible in the data flow. The zero-suppressed data of all collisions will be shipped to the O2 facility at the anticipated interaction rate of 50 kHz for Pb–Pb collisions or 200 kHz for pp and p–Pb. Detector read-out will be activated either by a minimum bias trigger or in a self-triggered or continuous mode. The data volume reduction will be achieved by reconstructing the data in several steps during data taking. For example, the raw data of the TPC (the largest contributor to the data volume) will first be rapidly reconstructed using online cluster finding and a first fast tracking based on an early calibration that uses average running conditions. Data produced during this stage will be stored temporarily.

Taking advantage of the duty factor of the accelerator and the experiment, the second reconstruction stage will be performed asynchronously, using the final calibration in order to reach the required data quality. The O2 facility will be sufficient to perform both the synchronous and asynchronous reconstruction during pp data taking. The asynchronous reconstruction will process data archived at the O2 facility or parked at the Tier 0. However, during Pb–Pb data taking, part of the asynchronous data processing load will be shed to external sites, such as the Grid Tier 0 and Tier 1s, that can provide archiving capability.

The O2 facility will be a high-throughput system which will include heterogeneous computing platforms, similar to many high-performance computing centres. The computing nodes will be equipped with hardware acceleration. The O2 software framework will provide the necessary abstraction so that common code can deliver the selected functionality on different platforms. The framework will also support a concurrent computing model for a wide spectrum of computing facilities, ranging from laptops to the complete O2 system. Off-the-shelf open source software conforming to open standards will be used as much as possible, either for the development tools or as a basis for the framework itself.

¹ The terms LS1, LS2 and LS3 refer to the LHC Long Shutdowns in 2013-14 and anticipated in 2018-19 and 2023, during which the detector upgrades occur. RUN 1, RUN 2, RUN 3 and RUN 4 refer to the periods of data-taking operation of ALICE.

This TDR describes the ALICE computing model for RUN 3 and 4. An essential aspect is the presentation of a combined ALICE online and offline computing system which has evolved from the systems used in RUN 1 and 2. It shows in more detail how the O2 facility can be viably implemented using proven technologies and the software framework currently under development, given a conservative but realistic budget.

A continuous evolution of the Grid has been assumed, in line with WLCG projections based on a funding model for RUN 3 and 4 similar to that for RUN 1 and 2. The development schedule is compatible with the starting date for RUN 3 and the projected running scenarios. Adequate safety margins are included in the schedule. Human resources with the required competences and experience are available in the institutes contributing to the project. The initial ALICE computing team working on the O2 project has been reinforced by the involvement of those institutes which successfully contributed to the design, installation and operation of the ALICE DAQ, HLT and offline systems in the past, as well as by new member institutes.

Contents

1 Introduction
   1.1 Physics objectives
   1.2 ALICE upgrade
   1.3 Computing upgrade overview
      1.3.1 Upgrade concept
      1.3.2 The ALICE computing model
      1.3.3 The O2 software framework
      1.3.4 The O2 facility
      1.3.5 The O2 project resources
   1.4 Document summary
2 Physics programme and data taking scenario
   2.1 Physics programme with the upgraded ALICE detector
   2.2 Integrated-luminosity requirements for Pb–Pb and trigger strategy
   2.3 Integrated luminosity requirements for proton–proton reference runs
   2.4 ALICE running scenario after Long Shutdown 2
3 Requirements and Constraints
   3.1 The ALICE detector upgrade
   3.2 Data rates, sizes and throughput
   3.3 Detector read-out
   3.4 Local/global processing
   3.5 Physics simulation
   3.6 Distributed processing and analysis
4 Computing model
   4.1 Computing model parameters
   4.2 Computing and storage requirements
   4.3 Role of the O2 facility
   4.4 Differences between pp and Pb–Pb data taking
   4.5 Roles of Tiers
   4.6 Analysis facilities
   4.7 Distributed computing
   4.8 Data management
   4.9 Replication policy
   4.10 Deletion policy
   4.11 Conditions data and software distribution
   4.12 Workflows
      4.12.1 Calibration and reconstruction
      4.12.2 Simulation
      4.12.3 Analysis
5 O2 architecture
   5.1 Overview
   5.2 Data flow
      5.2.1 Data model
      5.2.2 Time Frame size
      5.2.3 Read-out links
      5.2.4 FLP and EPN
      5.2.5 The FLP-EPN network
   5.3 Data storage
      5.3.1 Calibration constants database
   5.4 Calibration, reconstruction and data reduction
      5.4.1 Calibration and reconstruction flow
      5.4.2 Data volume reduction
   5.5 Data quality control and assessment
      5.5.1 General QC workflow
      5.5.2 Detector requirements
      5.5.3 Input and output
      5.5.4 Automation
      5.5.5 Access to QC results
   5.6 Facility control, configuration and monitoring
      5.6.1 Overview
      5.6.2 System functions
   5.7 Detector Control System
6 Technology survey
   6.1 Introduction
   6.2 Computing platforms
      6.2.1 Input-Output (I/O)
      6.2.2 Memory bandwidth
      6.2.3 FPGA data flow online processing
      6.2.4 CPUs
      6.2.5 Accelerator architectures
      6.2.6 Selection criteria
   6.3 High-level benchmarks
      6.3.1 FPGA based Online Data Preprocessing
      6.3.2 HLT TPC Track Finder
      6.3.3 HLT TPC Track Fitter (HLT Track Merger)
      6.3.4 ITS Cluster Finder
   6.4 Low-level benchmarks
      6.4.1 PCIe
      6.4.2 Compute benchmarks (DGEMM / matrix library based)
      6.4.3 memcpy() benchmark
      6.4.4 NUMA benchmark
   6.5 Performance Conversion Factors
   6.6 Programming models
      6.6.1 Compute kernels
      6.6.2 Programming languages and frameworks
   6.7 Networking
   6.8 Data storage
      6.8.1 Data storage hardware
      6.8.2 Cluster file system
      6.8.3 Object stores
      6.8.4 CERN EOS system – Object storage software
   6.9 Control, Configuration and Monitoring
      6.9.1 DIM and SMI
      6.9.2 MonALISA
7 O2 software design
   7.1 Introduction
   7.2 ALFA
      7.2.1 Data transport layer
      7.2.2 ZeroMQ: base for the data transport layer in ALFA
      7.2.3 Payload protocol
   7.3 Facility Control, Configuration and Monitoring (CCM)
      7.3.1 Tasks
      7.3.2 Process State Machine
      7.3.3 Roles and Activities
      7.3.4 Agents
      7.3.5 System Monitoring
   7.4 Readout
   7.5 Data model
      7.5.1 Single data headers
      7.5.2 Multiple data headers
      7.5.3 Time frame descriptor
      7.5.4 FLP data aggregation
      7.5.5 EPN handling
   7.6 Data quality control and assessment
      7.6.1 Data collection and QC data production
      7.6.2 Merging
      7.6.3 Automatic checks
      7.6.4 Correlation and trending
      7.6.5 Storage
      7.6.6 QC Results
      7.6.7 Flexibility
      7.6.8 Event display
   7.7 DCS
      7.7.1 DCS-O2 communication interfaces
8 Physics software design
   8.1 Calibration, Reconstruction
      8.1.1 Calibration and reconstruction steps
      8.1.2 Calibration procedures
      8.1.3 Reconstruction procedures
      8.1.4 Global tracking
      8.1.5 Event extraction
      8.1.6 Computing requirements
   8.2 Physics simulation
      8.2.1 RUN 3 simulation requirements
      8.2.2 Transport code and the virtual Monte Carlo interface
      8.2.3 Detector response and digitisation
      8.2.4 Fast simulation
9 Data reduction
10 O2 facility design
   10.1 Introduction
   10.2 O2 Facility
   10.3 Power and cooling facilities
   10.4 Demonstrators
      10.4.1 Data transport
      10.4.2 Dynamic Deployment System (DDS)
   10.5 Computing system simulation
      10.5.1 Simulation of the network needs
      10.5.2 Simulation of the buffering needs
      10.5.3 Storage simulation
      10.5.4 Simulation of the system scalability
11 Project organisation, cost estimate and schedule
   11.1 Project management
   11.2 O2 Project
   11.3 Schedule
   11.4 Responsibilities and human resources estimates
   11.5 Cost estimates and spending profile
   11.6 Production and procurement
   11.7 Risk analysis
   11.8 Data preservation
   11.9 Open access
   11.10 O2 TDR editorial committee
   11.11 O2 TDR task force

Appendices
A Glossary
B Integrated luminosity requirements for proton–proton reference runs
C Tools and procedures
   C.1 Evaluation procedure
   C.2 Selected tools
      C.2.1 Issue/bug tracking system: JIRA
      C.2.2 Version control system: Git
      C.2.3 Website creation tool: Drupal
      C.2.4 Source code documentation tool: Doxygen
      C.2.5 Software build system: CMake
      C.2.6 Computing simulation tool: OMNeT++ and the MONARC simulation tool
   C.3 Policies
      C.3.1 C++ coding conventions
D The ALICE Collaboration

References
List of Figures
List of Tables

Chapter 1

Introduction

ALICE (A Large Ion Collider Experiment) [1] is a general-purpose heavy-ion collision detector at the CERN LHC [2]. It is designed to study the physics of strongly interacting matter, and in particular the properties of the Quark-Gluon Plasma (QGP), using proton-proton, nucleus-nucleus and proton-nucleus collisions at high energies. The ALICE experiment will be upgraded during the Long Shutdown 2 (LS2) in order to exploit to the full the scientific potential of the future LHC.

1.1 Physics objectives

The physics programme of the ALICE experiment after its upgrade is discussed in the Letter of Intent (LoI) [3,4]. The study of the QGP in the second generation of LHC heavy-ion research following LS2 will focus on rare probes and the study of their coupling with the medium and of hadronisation processes. These include heavy-flavour particles, quarkonium states, real and virtual photons, jets and their correlations with other probes. A description of the main physics items of the programme with the upgraded detector is given in Chapter 2, along with the required measurements and the projected performance estimated from simulation studies.

The LHC running conditions after LS2 foresee Pb–Pb collisions at a centre-of-mass energy per nucleon-nucleon collision of √s_NN = 5.5 TeV with an instantaneous luminosity of up to L = 6×10^27 cm^-2 s^-1, corresponding to a hadronic interaction rate of 50 kHz. The upgraded ALICE detector will be able to read out all interactions and achieve the goal of collecting 13 nb^-1 of Pb–Pb collisions (10 nb^-1 at the nominal value of the ALICE solenoid magnetic field and 3 nb^-1 at a reduced value of the field) [3]. Compared to the original programme of the experiment, this integrated luminosity represents an increase of a factor of ten in the data sample for rare triggers and of a factor of one hundred for the minimum bias data sample. The physics programme also requires a reference pp sample corresponding to an integrated luminosity of 6 pb^-1 at the same centre-of-mass energy as for Pb–Pb, √s = 5.5 TeV, and a sample of p–Pb collisions corresponding to 50 nb^-1 (the p–Pb centre-of-mass energy will be either 5.5 TeV, as for Pb–Pb, or 8.8 TeV, the maximum LHC energy for this colliding system). Data from proton-proton collisions will also be collected at the nominal LHC energy, √s = 14 TeV, during the first year after the upgrade and then for about two months every year, prior to the heavy-ion run.

1.2 ALICE upgrade

The ALICE detector upgrade includes:

– A new, high-resolution, low-material Inner Tracking System (ITS) [5];


– An upgrade of the Time Projection Chamber (TPC) [6], consisting of the replacement of the wire chambers with Gas Electron Multiplier (GEM) detectors and new continuous read-out electronics;

– The addition of a Muon Forward Tracker (MFT) [7];

– An upgrade of the read-out electronics of several detectors: the Muon CHamber system (MCH), the Muon IDentifier (MID), the Transition Radiation Detector (TRD) and the Time-Of-Flight detector (TOF) [8];

– A new Central Trigger Processor (CTP) and a new Fast Interaction Trigger (FIT) detector [8].

The existing detectors that will continue to operate after LS2 are listed in Chap. 3.

The topic of this Technical Design Report is the replacement of the Data AcQuisition (DAQ) [9], the High-Level Trigger (HLT) and the offline systems by a new common Online-Offline computing system (O2).

1.3 Computing upgrade overview

The ALICE upgrade addresses the challenge of reading out and inspecting the expected Pb–Pb collisions at a rate of 50 kHz, and of sampling pp and p–Pb collisions at rates of up to 200 kHz. The tracking precision of the experiment will be improved at both central and forward rapidity. This will result in the collection and inspection of a data volume of heavy-ion events roughly 100 times greater than that of RUN 1.

1.3.1 Upgrade concept

The ALICE computing upgrade concept is the transfer of all detector data to the computing system. The data volume reduction will be performed by processing the data on the fly, in parallel with the data collection, and not by rejecting complete events as is done by the high-level triggers or event filter farms of most high-energy physics experiments. The O2 system will perform a partial calibration and reconstruction online and replace the original raw data with compressed data. The online detector calibration and data reconstruction will therefore be instrumental in keeping the total data throughput within an envelope compatible with the available computing resources.

The functional flow of the O2 system includes a succession of steps, as shown in Fig. 1.1. The data will be transferred from the detectors either in a continuous fashion or by using a minimum bias trigger. The continuous read-out of some detectors is a substantial change from current practice. The data are not delimited by a physics trigger but are composed of several constant data streams that are transferred to the computing system. Dedicated triggers will be used to chop these data flows into manageable pieces called Time Frames (TF), and the LHC clock will be used as a reference to synchronise, aggregate and buffer the data.
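To make the Time Frame idea concrete, the sketch below groups timestamped samples of a continuous read-out stream into fixed-length Time Frames keyed by the LHC clock. The frame length and the sample format are placeholders invented for this illustration; the actual Time Frame structure and size are discussed in Chap. 5 and Chap. 7.

```python
# Minimal illustration of chopping a continuous, timestamped data stream into
# Time Frames (TF). The TF length (in LHC-clock ticks) and the sample layout
# are invented for this sketch; they are not the values used by the O2 system.
from collections import defaultdict

TF_LENGTH_TICKS = 1_000_000  # placeholder Time Frame length

def build_timeframes(samples):
    """Group (lhc_clock_tick, payload) samples of one read-out link into TFs."""
    frames = defaultdict(list)
    for tick, payload in samples:
        frames[tick // TF_LENGTH_TICKS].append((tick, payload))
    return dict(frames)

# Three samples: the first two fall into Time Frame 0, the last one into TF 1.
stream = [(10, b"cluster-a"), (999_999, b"cluster-b"), (1_000_050, b"cluster-c")]
print({tf: len(chunk) for tf, chunk in build_timeframes(stream).items()})
# -> {0: 2, 1: 1}
```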

A first local calibration and detector-specific pattern recognition will be carried out with a high degree of parallelism, made possible by the local and independent nature of the data. Some of the detector raw data will already be replaced at this step by the results of a first local processing step; for example, the TPC raw data will be replaced by the results of the cluster finding. A second step of data aggregation is performed to assemble the data from all the detector inputs. A global calibration and data processing follows, typically associating the clusters to tracks. The results are stored in the O2 farm if its capacity allows it, or are parked at the Tier 0.

A final processing step is then performed asynchronously before permanently storing the reconstructed events. This final step may use computing resources from the Grid to absorb peak needs beyond the capacity of the O2 system. The reconstructed events will then be available for analysis on the Grid.

[Figure 1.1: Functional flow of the O2 computing system. The diagram shows the chain: detector electronics → continuous and triggered streams of raw data → read-out, splitting into sub-time frames and aggregation, local pattern recognition and calibration, local data compression, quality control → compressed sub-time frames → data aggregation, synchronous global reconstruction, calibration and data volume reduction, quality control → compressed time frames → data storage and archival → asynchronous refined calibration and reconstruction, event extraction, quality control → reconstructed events.]

The data processing steps performed during data taking will keep the option open for subsequent calibrations of the most critical parameters, to protect the physics results in case of a software error or operational mistake.

Due to the short collision separation at the anticipated rates of 50 kHz in Pb–Pb and 200 kHz in pp, there will be substantial detector pile-up. Event identification will only be possible at the very end of the reconstruction, by associating the tracks and secondary vertices to a particular bunch crossing. At this point, the fully reconstructed data will be stored at the experimental area, ready for archiving.

One of the roles of the O2 system will be to perform detector calibration and data reconstruction concurrently with data taking. The integration of online and offline data processing will require a common O2 software framework and a common computing facility dedicated to both data collection and processing.

1.3.2 The ALICE computing model

Data processing will be shared between the O2 facility and the Grid. During data taking, the O2 facility will carry out the synchronous fast reconstruction and, within the limits of its capacity, the asynchronous reconstruction, storing the output in the local O2 data store. The O2 facility will be used for subsequent iterations on calibration, raw data processing and extraction of analysis object data.

The ALICE computing model for RUN 3 assumes a continuous evolution of the Grid according to WLCG projections based on a funding model similar to the one used during RUN 1 and 2. In order to be compatible with the expected Grid evolution and to minimise the cost of the offline data processing, there will be a drastic reduction in the volume of data collected from the experiment, as early as possible in the process.

1.3.3 The O2 software framework

Building on the experience accumulated during the design and operation of the online and offline systems during RUN 1 and RUN 2, the O2 software framework will implement a distributed, parallel and staged data processing model. It is designed from the start to combine all the computing functionalities needed in a HEP experiment: detector read-out, event building, data recording, detector calibration, detector reconstruction, physics simulation and analysis. It can be tailored to different configurations by the adaptation or addition of detector-specific algorithms, and to specific computing systems. The O2 framework will support the evolution from today's scheme of single-threaded applications to sets of small components communicating by messages and running concurrently, and in a transparent manner, on several cores of the same or different nodes. The O2 framework will also support the use of hardware accelerators where available.
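The message-passing style described above can be illustrated with a few lines of plain pyzmq (ZeroMQ is the transport chosen for ALFA, see Sec. 7.2.2). This is only a sketch of the pattern, not the ALFA or O2 API; the endpoint name and message fields are invented here.

```python
# Two tiny "devices" exchanging messages over a ZeroMQ PUSH/PULL pipeline.
# Illustrative only: endpoint and message content are placeholders.
import threading
import zmq

ENDPOINT = "inproc://o2-sketch"  # hypothetical in-process endpoint

def consumer(ctx, n):
    pull = ctx.socket(zmq.PULL)
    pull.connect(ENDPOINT)
    for _ in range(n):
        msg = pull.recv_json()
        print("processing time frame", msg["timeframe"])
    pull.close()

ctx = zmq.Context()
push = ctx.socket(zmq.PUSH)
push.bind(ENDPOINT)              # bind before the consumer connects

worker = threading.Thread(target=consumer, args=(ctx, 3))
worker.start()
for i in range(3):
    push.send_json({"timeframe": i, "payload_bytes": 1024})
worker.join()
push.close()
ctx.term()
```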

1.3.4 The O2 facility

The O2 facility, located at the experimental area at Point 2, will also provide the interfaces with the Grid, notably with the permanent data store at Tier 0, as well as for the data processing in all centres. The O2 facility will therefore be part of the overall ALICE computing model; it will have a data storage system with a capacity large enough to accommodate a large fraction of the output data of a full year's data taking.

Outside of data-taking periods, the O2 facility will allow normal reconstruction and provide the necessary capability for detector tests and commissioning.

1.3.5 The O2 project resources

The O2 facility, like the other ALICE upgrade projects scheduled for RUN 3, will largely be built on the existing infrastructure at Point 2.

Institutes that have participated in the successful design, building and operation of the online and offline systems since the start of the ALICE experiment are participating in the O2 project. The addition of further institutes has reinforced the project, giving it the size, capacity and competences necessary to realise the O2 system. So far, the project comprises 30 institutes with a total of over 90 participants.

The development schedule is compatible with LS2, which runs from mid-2018 until the end of 2019, and with RUN 3 starting in 2020.

The project has established a realistic construction budget based on the estimated needs and on a conservative evolution of computing technology until deployment, which is scheduled from 2017 until 2020.

1.4 Document summary

The physics programme addressed by the ALICE upgrade, the running scenarios with the integrated luminosity requirements for pp and Pb–Pb beams, the trigger strategy and the corresponding data-taking scenarios are discussed in Chap. 2.

The requirements and constraints imposed by the physics programme are presented in Chap. 3, including: the architecture requirements for detector read-out; the triggering and the resulting event rates and data throughput; the data processing and physics simulation needs for pp and Pb–Pb collisions.

Chapter 4 presents an update of the computing model following the ALICE upgrade. The sharing of the data processing load between the O2 facility and the Grid is presented, together with its usage by the different categories of data processing: calibration, synchronous and asynchronous reconstruction, simulation and analysis.

Chapter 5 expands on the architecture of the O2 data processing system. The data model and the architecture of the main hardware and software components of the O2 facility and software framework are described.

The current state of computing technology is considered in Chap. 6. In particular, the existing computing platforms and programming models are reviewed, together with their benchmarking using actual ALICE algorithms.

The designs of the O2 software, the physics software and the data reduction are presented in Chap. 7, Chap. 8 and Chap. 9, respectively. Chapter 10 is entirely dedicated to hardware considerations for the design of the O2 facility. The subjects discussed include:

– the software modules: detector read-out, calibration, reconstruction, physics simulation, physics analysis, data quality control and assessment;

– the infrastructure software modules: control, configuration, monitoring and logging system;

– the computing system simulation and the hardware and software prototypes;

– the O2 facility hardware and infrastructure.

The project organisation, the cost estimate and the project schedule are considered in Chap. 11. This chapter also includes a list of the participating institutes and a preliminary view of how responsibilities can be shared out.

Chapter 2

Physics programme and data taking scenario

The physics requirements that define the ALICE data-taking strategy and schedule for RUN 3 and 4 are presented in this chapter. The physics programme of the ALICE experiment after its upgrade is discussed in detail in the Letter of Intent [1] and its Addendum on the Muon Forward Tracker [2]. See Section 2.1 below for a brief summary.

The targeted integrated luminosity is 13 nb^-1 in Pb–Pb collisions at the LHC design energy √s_NN = 5.5 TeV, of which 3 nb^-1 are to be collected with a lower magnetic field in the ALICE solenoid. The integrated luminosity value for RUN 3 and 4 represents an increase of a factor of ten with respect to the rare-trigger sample expected for RUN 2 and of a factor of 100 with respect to the minimum bias sample. As summarised in Section 2.2, this integrated luminosity is required for many of the central measurements of the physics programme, from low-pT heavy-flavour and charmonium production to low-mass dilepton production. Due to the very low signal-to-background ratio expected in the low-pT region for most of the relevant channels (e.g. D0 → Kπ and Λc → pKπ), the rate of collisions of interest, i.e. those containing a signal candidate, will be of the same order as the interaction rate. As a result, it has been concluded that online event filtering is not an adequate strategy for these low-pT measurements [3]; the preferred approach is to record all Pb–Pb interactions. Section 2.2 also reiterates this conclusion, on the basis of updated performance estimates from the Technical Design Report of the ITS Upgrade [4].

Requirements for pp reference data are discussed in Section 2.3. The requirement, first introduced in the LoI [1], for a sample at the same centre-of-mass energy as the Pb–Pb sample, √s = 5.5 TeV, with an integrated luminosity of about 6 pb^-1, is confirmed by estimates for a number of reference measurements. The requirements for data at the full LHC energy of 14 TeV have not yet been worked out with good accuracy; therefore, a minimal data-taking scenario at this energy is assumed for the moment. The integrated luminosity requirement for p–Pb collisions is 50 nb^-1 [1]. A possible implementation of the running schedule is presented in Section 2.4.

2.1 Physics programme with the upgraded ALICE detector

The study of the strongly-interacting state of matter in the second generation of LHC heavy-ion studies following LS2 will focus on rare probes and the study of their coupling with the medium and of hadronisation processes, including heavy-flavour particles, quarkonium states, real and virtual photons, jets and their correlations with other probes. The cross sections of all these processes are significantly larger at the LHC than at previous accelerators. In addition, the interaction of heavy-flavour probes with the medium is better controlled theoretically than the propagation of light partons. All these investigations involve soft momentum scales, and thus benefit from the ALICE detector strengths: excellent tracking performance in a high-multiplicity environment and particle identification over a large momentum range. In most of these studies, the azimuthal anisotropy of different probes will be measured. Major highlights of the proposed programme will focus on the following physics topics:

– Thermalisation of partons in the QGP, with a focus on the charm and beauty quarks. Heavy-quark azimuthal-flow anisotropy is especially sensitive to the partonic equation of state. Ultimately, heavy quarks might fully equilibrate and become part of the strongly-coupled medium.

– The low-momentum quarkonium dissociation and, possibly, regeneration pattern, as a probe of deconfinement and an evaluation of the medium temperature.

– The production of thermal photons and low-mass dileptons emitted by the QGP. This should allow the assessment of the initial temperature and the equation of state of the medium, as well as the investigation of the chiral nature of the phase transition.

– The study of the in-medium parton energy-loss mechanism, which provides both a testing ground for the multi-particle aspects of QCD and a probe of the QGP density. The relevant observables are: jet structure; jet–jet and photon–jet correlations; jet correlations with high-momentum identified hadrons; heavy-flavour particle production in jets. In particular, it is crucial to characterise the dependence of the energy loss on the parton colour-charge, mass and energy, as well as on the density of the medium.

– The search for heavy nuclear states such as the light multi-Λ hyper-nucleus 5ΛΛH, bound states of (ΛΛ) or the H dibaryon, the (Λn) bound state, as well as bound states involving multi-strange baryons; and a systematic study of light nuclei and anti-nuclei production.

2.2 Integrated-luminosity requirements for Pb–Pb and trigger strategy

The physics performance for an integrated luminosity of 10 nb^-1 at full magnetic field in the ALICE solenoid (plus 3 nb^-1 with a reduced field of B = 0.2 T) is presented in the ALICE Upgrade LoI [1] and its Addendum [2] for all measurements listed in the previous section, with updates for heavy-flavour measurements at central rapidity, low-mass dielectrons and hypernuclei in the ITS Upgrade TDR [4]. The requirement for an integrated luminosity of 10+3 nb^-1 is motivated mainly by the following performance figures.

Heavy flavour measurements:

– The nuclear modification factor RAA and elliptic flow v2 of strange-charmed mesons (Ds) down to a transverse momentum pT of at least 2 GeV/c, with a statistical precision better than 10% for both observables; this will allow a precise comparison of strange and non-strange charm meson dynamics;

– Λc baryon RAA and v2 down to 2 GeV/c and 3 GeV/c, respectively, with a precision of about 20%, and the baryon-to-meson ratio for charm (Λc/D) down to 2 GeV/c with the same precision. This will make it possible to address the charm quark hadronisation mechanisms at low and intermediate momentum;

– RAA and v2 of beauty-decay particles via non-prompt D0, non-prompt J/ψ and beauty-decay leptons (the latter two at both central and forward rapidity) down to 1–2 GeV/c with precisions ranging from a few percent to 10%. This will allow a detailed assessment of the b quark transport properties in the medium;

– Fully reconstructed B+ meson decays (B+ → D0 π+) down to 3 GeV/c with a precision of about 10%. This will provide an important direct measurement of beauty production;

– Λb baryon production for pT > 7 GeV/c. This will be a unique measurement in heavy-ion collisions and should allow the determination of the nuclear modification factor of the beauty baryon, which is sensitive to the b quark hadronisation mechanism.

Charmonium measurements:

– RAA of the J/ψ down to pT = 0 with a precision better than 1%, at both central and forward rapidity;

– RAA of the ψ(2S) down to pT = 0 with a precision of about 10%, at both central and forward rapidity;

– v2 of the J/ψ down to pT = 0 with a precision of about 0.05 (absolute uncertainty), at both central and forward rapidity;

These measurements will allow a detailed investigation of the mechanisms of dissociation and regeneration of charmonium states in the deconfined medium.

Low-mass dileptons: the additional sample of 3 nb^-1 with a reduced magnetic field value (0.2 T) in the central barrel is essential for the low-mass dielectron analysis in order to reach the projected precision of about 10% on the slope of the high-invariant-mass region and of about 10% on the dielectron elliptic flow; this measurement will make it possible to assess the time evolution of the thermal radiation emitted by the hot medium.

All these analyses (with the exception of the exclusive reconstruction of beauty hadron decays) are characterised by a very small signal-to-background ratio, implying that there is a signal candidate in essentially every Pb–Pb collision. This means that it is not possible to use dedicated triggers to select collisions online for offline analysis. Instead, a minimum bias trigger must be used and all collisions recorded. In the following, this point is reiterated using an update of the estimates presented in the Conceptual Design Report of the ITS Upgrade [3].

Table 2.1 reports the expected signal per event (S/ev), signal-to-background ratio (S/B) and number of background candidates per event (B0/ev)¹ for central Pb–Pb collisions at √s_NN = 5.5 TeV. The signal per event takes into account the branching ratios and the reconstruction and selection efficiencies (see Tables 2.2 and 2.3 of Ref. [3]). Since S/B is very small, the total number of candidates (signal plus background) is essentially the same as the number of background candidates. For the D0 and Ds+ mesons there are more than 7 candidates per event. For the Λc+, even considering a minimum pT threshold of 2 GeV/c (the lowest accessible pT according to the studies presented in [4]), the small S/B ratio leads to about 7 candidates per event. Therefore, it can be concluded that an online event selection is not adequate for low-pT heavy-flavour measurements: all Pb–Pb interactions have to be recorded and inspected offline.

Table 2.1: Estimated signal per event (S/ev), signal-to-background ratio (S/B) and number of background candidates per event (B0/ev, see text for details) for central Pb–Pb collisions at √s_NN = 5.5 TeV. The reported values are derived in the Conceptual Design Report and in the TDR for the ITS upgrade, where the geometrical selections are also described [3,4].

Analysis                        S/ev [3]     S/B [4]      B0/ev
D0 → K-π+                       7.6×10^-3    10^-2        3.0
Ds+ → K+K-π+                    2.3×10^-3    < 2×10^-3    > 4.6
Λc+ → pK-π+                     6.5×10^-4    < 10^-4      > 26
Λc+ → pK-π+ (pT > 2 GeV/c)      3.7×10^-4    2×10^-4      7.4

¹ The number of background candidates per event, for a given particle, is estimated as the background in the broad invariant-mass range needed to fit the invariant-mass distribution (e.g. ±12σ, where σ is the invariant-mass resolution for that particle). Therefore B0/ev = (S/ev)/(S/B0) = 4·(S/ev)/(S/B), where S/B is the signal-to-background ratio within ±3σ of the invariant-mass distribution.

2.3 Integrated luminosity requirements for proton–proton reference runs

The integrated luminosity required for the pp reference runs was estimated on the premise that the statistical uncertainty of the pp reference should be smaller than that of the Pb–Pb measurement. The pp uncertainty was required to be √2 times smaller than the Pb–Pb uncertainty. Thus, the combined relative statistical uncertainty in, for example, a ratio of yields in Pb–Pb and in pp is at most about 20% larger than the Pb–Pb uncertainty alone. Since the relative statistical uncertainty is the inverse of the statistical significance, 1/S = √(S+B)/S, the requirement is S_pp = √2 · S_Pb-Pb.

The requirements for some of the measurements described in the previous section are derived in Appendix B. The results are summarised in Table 2.2 for a centre-of-mass energy of 5.5 TeV. All values are in the range 3–6 pb^-1.
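The "about 20% larger" statement follows from adding the two relative uncertainties in quadrature; a short sketch of the reasoning, in the notation of the text:

```latex
% With \sigma_{pp} = \sigma_{\mathrm{PbPb}}/\sqrt{2}, i.e. S_{pp} = \sqrt{2}\,S_{\mathrm{PbPb}},
% the relative uncertainty of a Pb-Pb/pp ratio of yields is
\frac{\sigma_{\mathrm{ratio}}}{\sigma_{\mathrm{PbPb}}}
  = \frac{\sqrt{\sigma_{\mathrm{PbPb}}^{2}+\sigma_{pp}^{2}}}{\sigma_{\mathrm{PbPb}}}
  = \sqrt{1+\tfrac{1}{2}} \approx 1.22 ,
```

i.e. roughly 20% larger than the Pb–Pb uncertainty alone.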

Table 2.2: Summary of integrated luminosity requirements for pp collisions for heavy flavour and quarkonium measurements (see Appendix B for details).

Measurement                              L_int for pp at √s = 5.5 TeV (pb^-1)
D0                                       6
Λc+                                      5
(B →) J/ψ → e+e- (central rapidity)      5
J/ψ → e+e- (central rapidity)            5
ψ(2S) → e+e- (central rapidity)          5
J/ψ → µ+µ- (forward rapidity)            5
ψ(2S) → µ+µ- (forward rapidity)          5

2.4 ALICE running scenario after Long Shutdown 2

Table 2.3 shows the running scenario presented in the ALICE Upgrade LoI [1], indicating the corresponding number of recorded collisions. Estimates for the required integrated luminosity in pp collisions at √s = 14 TeV are not yet available. Therefore, a baseline scenario is considered in which data at this energy are collected during six periods of a few weeks, prior to each yearly heavy-ion run. In addition, the recommissioning of the full upgraded detector will be performed during the entire first year of operation after LS2, although with limited efficiency.

Table 2.3: ALICE running scenario for LHC RUN 3 and 4 (taken from [1], with the addition of pp collisions at √s = 14 TeV).

Year   System   √s_NN (TeV)   L_int         N_collisions
2020   pp       14            0.4 pb^-1     2.7×10^10
2020   Pb–Pb    5.5           2.85 nb^-1    2.3×10^10
2021   pp       14            0.4 pb^-1     2.7×10^10
2021   Pb–Pb    5.5           2.85 nb^-1    2.3×10^10
2022   pp       14            0.4 pb^-1     2.7×10^10
2022   pp       5.5           6 pb^-1       4×10^11
2025   pp       14            0.4 pb^-1     2.7×10^10
2025   Pb–Pb    5.5           2.85 nb^-1    2.3×10^10
2026   pp       14            0.4 pb^-1     2.7×10^10
2026   Pb–Pb    5.5           1.4 nb^-1     1.1×10^10
2026   p–Pb     8.8           50 nb^-1      10^11
2027   pp       14            0.4 pb^-1     2.7×10^10
2027   Pb–Pb    5.5           2.85 nb^-1    2.3×10^10

Chapter 3

Requirements and Constraints

This chapter summarises the computing requirements of the ALICE experiment for RUN 3 and 4, according to the physics programme detailed in Chap. 2. These requirements cover all the phases from data taking, data processing and data analysis to physics simulation. They will be addressed by the O2 facility and the Grid resources.

Section 3.1 presents a short summary of the detector system after the upgrade. Section 3.2 gives the typical data-taking parameters necessary for estimating the dimensions of the data flow, data processing and data storage. The system is scalable to address larger requirements, but the project size and budget are based on the baseline scenario presented below. A short review of detector read-out is included in Sec. 3.3; it covers the new detectors, those which will be equipped with new electronics and those which will keep their existing electronics. Sections 3.4 and 3.5 are dedicated to an estimate of the computing requirements for the data processing needed for data volume reduction before archiving, and for physics simulation. The requirements for distributed processing and analysis on the Grid are discussed in Sec. 3.6.

3.1 The ALICE detector upgrade

To meet the challenges of the physics programme outlined in Chap. 2, the ALICE detector needs to be upgraded to enhance its low-momentum vertexing and tracking capability, and to allow data taking at substantially higher rates. The new detector system is described in detail in the ALICE Upgrade documents: the Letter of Intent [1] and its Addendum on the Muon Forward Tracker [2], and the four Upgrade Technical Design Reports (Inner Tracking System [3], Time Projection Chamber [4], Muon Forward Tracker [5] and Read-out and Trigger System [6]). The upgraded ALICE detector will include:

– A new, high-resolution, low-material Inner Tracking System (ITS), based on 7 layers of monolithic silicon pixel detectors, replacing the current Silicon Pixel Detector (SPD), Silicon Drift Detector (SDD) and Silicon Strip Detector (SSD). The ITS will be able to provide read-out rates of 100 kHz for Pb–Pb and 200 kHz for pp collisions.

– An upgraded Time Projection Chamber (TPC). The current TPC is based on a gated read-out with wire chambers; due to the gating-grid closure time, it is limited to read-out rates of less than 3.5 kHz. In order to allow continuous read-out of 50 kHz Pb–Pb collisions, the wire chambers will be replaced with Gas Electron Multiplier (GEM) detectors. Digitised and timestamped data will be pushed to the online system. A triggered mode will be available for commissioning and calibration.

– The addition of a Muon Forward Tracker (MFT), located in the forward region between the ITS and the front absorber. It consists of several discs of monolithic silicon pixel detectors in the acceptance of the Muon Spectrometer (MCH and MID, see below).

– The Muon Chamber System (MCH), consisting of a sequence of 5 wire-chamber stations in the forward region of ALICE. The detector implementation will not be modified, but the entire read-out electronics will be changed to achieve a read-out rate of 100 kHz, two orders of magnitude larger than now. The dead-time-free read-out will support a self-triggered continuous mode as well as a triggered mode.

– The Muon Identifier (MID), which will be the future role of the present Muon Trigger system: two stations of two planes of single-gap Resistive Plate Chamber (RPC) detectors located 16 m and 17 m from the interaction point. All events will be read out upon the interaction trigger and the data will be used online for hadron rejection.

– The Transition Radiation Detector (TRD), using tracklets to reduce the data volume. A trigger rate of 100 kHz can be reached by increasing the throughput of the off-detector electronics. As the front-end electronics will not support multi-event buffers, only 60% of the events at a 100 kHz interaction rate and 75% at 50 kHz are read out for Pb–Pb collisions.

– A new Fast Interaction Trigger (FIT) detector providing the minimum bias trigger for the experiment. It will replace the current V0/T0/FMD detector system. It consists of Cherenkov and scintillator detectors.

– An upgrade of the read-out electronics of the Time-Of-Flight (TOF) detector, increasing the maximum read-out rate to 200 kHz for Pb–Pb events.

– The Zero Degree Calorimeter (ZDC), with upgraded read-out electronics to accept the higher interaction rate of 100 kHz.

– The Electro-Magnetic Calorimeter (EMC) and the Photon Spectrometer (PHO), which have been upgraded for 50 kHz operation during LS1.

– The High Momentum Particle Identifier (HMP), which will not be modified and will therefore be capable of reading out events at only 2.5 kHz.

– The ALICE Cosmic Ray Detector (ACO), which is already capable of a read-out rate of 100 kHz and will not be modified.

– The Central Trigger Processor (CTP), which provides trigger and timing distribution to the detectors. It will be upgraded to support the required interaction rate.

3.2 Data rates, sizes and throughput

Table 3.1 shows the maximum detector read-out rate, as well as the average data throughput and data size per interaction (or per trigger, for the detectors triggered at a lower rate) at the detector, for Pb–Pb collisions at 50 kHz.

As can be seen from the table, the main contributors are the TPC (92.5%), ITS (3.6%), TRD (1.8%) and MFT (0.9%). The other detectors contribute the remaining 1% of the total data volume. The total data size per interaction is approximately 23 MB.

As described in Chap. 5, data are grouped for processing into time windows whose duration spans several thousand interactions. Peak data sizes per interaction are therefore not detailed here, as they average out.

Table 3.1: Detector parameters: maximum read-out rate, data rate and data size. The data rate and data size are estimated for Pb–Pb interactions at a rate of 50 kHz. The numbers have been extracted from published sources where references are available, and from draft documents otherwise. Red numbers are to be updated.

Detector   Maximum read-out rate (kHz)   Data rate for Pb–Pb at 50 kHz (GB/s)   Average data size per interaction or trigger (MB)
ACO [7]    100                           0.014                                  0.00028
CTP [6]    200                           0.02                                   0.0004
EMC [6]    50                            4                                      0.08
FIT [6]    50                            0.115                                  0.023
HMP [6,7]  2.5                           0.06                                   0.024
ITS [3]    100                           40                                     0.8
MCH [6]    100                           2.2                                    0.04
MFT [5]    100                           10.0                                   0.2
MID [6]    100                           0.3                                    0.006
PHS [6]    50                            2                                      0.04
TOF [6]    200                           2.5                                    0.05
TPC [4]    50                            1012                                   20.7
TRD [6]    90.9¹                         20                                     0.5
ZDC [6]    100                           0.06                                   0.0012
TOTAL                                    1093                                   22.4

¹ The TRD accepted rates are 38.5, 62.5 and 90.9 kHz for interaction rates of 50, 100 and 200 kHz, respectively.
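The totals of Tab. 3.1 can be checked by summing the per-detector rates; the result also ties the table to the >1 TB/s figure of the Executive Summary (the values below are simply copied from the table):

```python
# Consistency check of Tab. 3.1: total Pb-Pb throughput and TPC share.
rates_gb_s = {
    "ACO": 0.014, "CTP": 0.02, "EMC": 4, "FIT": 0.115, "HMP": 0.06,
    "ITS": 40, "MCH": 2.2, "MFT": 10.0, "MID": 0.3, "PHS": 2,
    "TOF": 2.5, "TPC": 1012, "TRD": 20, "ZDC": 0.06,
}
total_gb_s = sum(rates_gb_s.values())          # ~1093 GB/s, i.e. >1 TB/s
tpc_share = rates_gb_s["TPC"] / total_gb_s     # ~92.5%
mb_per_interaction = total_gb_s * 1e3 / 50e3   # ~22 MB at 50 kHz; the table's
                                               # 22.4 MB column sum differs a
                                               # little because some detectors
                                               # are triggered at lower rates
print(f"{total_gb_s:.0f} GB/s, TPC share {tpc_share:.1%}, "
      f"~{mb_per_interaction:.0f} MB per interaction")
```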

3.3 Detector read-out

The upgraded ALICE trigger system supports both continuously read-out and triggered detectors. Not all sub-systems will be capable of reading out the full event rate. These detectors will therefore be read out whenever they are not busy, and their data will be merged with the data from the other detectors in the O2 computing system.

Two different types of read-out links will be used to transport data from the detectors to the O2 facility. Some detectors will continue to use their RUN 1 or RUN 2 Detector-Specific Read-Out (DSRO) electronics and will remain read out via the Detector Data Links 1 and 2 (DDL1 and DDL2) [8,9], which also retain their connection to the Detector Control System (DCS). The detectors with new electronics will use the GigaBit Transceiver (GBT).

DDL1 is clocked at 2.125 Gb/s and the DDL1 Source Interface Unit (SIU) is implemented as a radiation-tolerant daughter card plugged onto the DSRO systems. The DDL2 SIU is implemented as an Intellectual Property (IP) core and can be clocked at 4.25 or 5.3125 Gb/s, according to the capabilities of the detector electronics to which it is connected. PCI Express based Common Read-Out Receiver Cards (C-RORCs), used during RUN 2, will interface up to 6 DDL1 or DDL2 links to the O2 facility. Up to two C-RORCs will be hosted in the data-receiving PCs of the O2 computing farm, the so-called First Level Processors (FLPs), which are described in more detail in Sec. 5.2.4.

For the detectors using the GBT, the Common Read-Out Unit (CRU) acts as the interface between the DSRO electronics, the O2 facility, the DCS (via the O2 facility), as well as the CTP and the Trigger, Timing and clock distribution System (TTS). One CRU will interface up to 24 GBT links and one FLP can host a maximum of two CRUs. Figure 3.1 shows the general ALICE read-out scheme with the three variations of read-out. The CRUs are based on high-performance FPGAs (Field Programmable Gate Arrays) equipped with multi-gigabit optical inputs and outputs. Depending on detector specifications, the detector data sent to the CRU are multiplexed, processed and formatted. The CRU on-detector interface is based on the GBT and on the optical Versatile Link protocol and its components. Furthermore, the CRU is capable of multiplexing the data, trigger and control signals over the same GBT.

The FLPs, and therefore the CRUs, are located in a counting room close to the surface, while the trigger system and the detectors are in the experimental cavern. This results in two different approaches for the trigger distribution, depending on the maximum allowed delay for delivering the trigger signal to the detector: either the latency is not critical and the trigger is transmitted to the CRU, which forwards it over the GBT, or it is critical and the trigger signal is transmitted directly from the Local Trigger Unit (LTU) to the DSRO electronics.

All detector needs in terms of number of links, as well as read-out board type and number, are summarised in Tab. 3.2.

[Figure 3.1: Detector read-out and interfaces of the O2 system with the trigger, detector electronics and DCS. The figure shows the Central Trigger Processor distributing physics and heartbeat triggers through LTUs and the TTS, the DCS network for control and monitoring, and the three read-out variations: front-end electronics connected via GBT links to CRUs hosted in FLPs, and DSRO electronics connected via DDL1/DDL2 links to C-RORCs hosted in FLPs, with the FLPs feeding the O2 facility.]

Table 3.2: Detector links and read-out boards used to transfer the data from the detectors to the O2 system [6].

Detector   Link type   DDL1   DDL2   GBT    Board type   C-RORC   CRU
ACO        DDL1        1      –      –      C-RORC       1        –
EMC        DDL2        –      20     –      C-RORC       4        –
FIT        DDL2        –      2      –      C-RORC       1        –
HMP        DDL1        14     –      –      C-RORC       3        –
ITS        GBT         –      –      495    CRU          –        23
MCH        GBT         –      –      480    CRU          –        20
MFT        GBT         –      –      –      CRU          –        –
MID        GBT         –      –      2      CRU          –        2
PHO        DDL2        –      16     –      C-RORC       3        –
TOF        GBT         –      –      72     CRU          –        3
TPC        GBT         –      –      5904   CRU          –        360
TRD        GBT         –      –      1044   CRU          –        54
CTP        DDL2        –      2      –      C-RORC       1        –
ZDC        GBT         –      –      1      CRU          –        1
Total                  15     40     7998                13       463

3.4 Local/global processing

The overall processing scheme includes the following tasks:

– Calibration and reconstruction of the raw data coming from the ALICE detector.

– Monte Carlo simulation and reconstruction of simulated data. The amount of fully simulated data is expected to be a factor of two greater than in RUN 1. The results of the full (slow) simulation can be used either directly or to tune fast simulation tools (see Sec. 3.5).

– Analysis. The analysis of reconstructed data typically takes place on the Grid or at dedicated analysis facilities (see Sec. 3.6).

The local processing takes place directly where the data links deliver the continuous raw data flow (e.g. from parts of the ITS or TPC) or triggered data (from parts of the other detectors). The term local processing underlines the fact that only partial information from the ALICE detector is available at this stage. One of the main goals of the local processing is data reduction, in particular for the TPC (see Chap. 9).

During global processing, the information of the full ALICE detector is available as input. Global processing includes detector reconstruction, calibration and quality control. Different global processing strategies are possible: standalone tracking in the barrel detectors and in the MUON arm; track matching between the detectors, using fast seeding algorithms and track propagation; and detector-specific calibration algorithms. The details of the reconstruction and calibration steps are described in Sec. 8.1.1. The main goals of the global processing are data compression, physics-quality calibration and reconstruction, and the production of AODs.

3.5 Physics simulation

The requirement on the number of Monte Carlo events can be estimated from the need to determine the reconstruction efficiency of rare signals (charm/beauty hadrons, high-pT jets, ...) up to the highest reachable pT. Moreover, it can be assumed that trivial variance-reduction techniques, such as equalising the number of events over pT bins, will be employed. With respect to RUN 1, increasing the number of events in data by a factor of 1000 will increase the pT reach by approximately a factor of 5, justifying an equal increase in the number of simulated events. In general, the availability of larger data sets triggers new and more differential analyses that may need higher-dimensional correction maps. To account for this possibility, the number of simulated events needs to be further increased by an order of magnitude. For RUN 1, about 10^7 central Pb–Pb collisions have been simulated. This leads to an estimate of 5×10^8 events for RUN 3, i.e. 0.5% of the number of data events. Assuming the retention of the current practice of increasing the number of rare signals per event by at least an order of magnitude, the generated number of signal events becomes commensurable with the data. The corresponding number of pp MC events amounts to 5×10^10.

Table 3.3: Number of simulated events for the different systems.

System   Events
pp       5×10^10
Pb–Pb    5×10^8
p–Pb     5×10^9
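The arithmetic behind the 5×10^8 figure, written out explicitly (the data-event count used below is the approximate sum of the Pb–Pb rows of Tab. 2.3, an assumption of this sketch):

```python
# Back-of-envelope reproduction of the Pb-Pb simulation estimate of Sec. 3.5.
run1_sim_events = 1e7        # central Pb-Pb events fully simulated for RUN 1
pt_reach_factor = 5          # x1000 more data -> pT reach grows roughly x5
differential_margin = 10     # extra order of magnitude for differential analyses
run3_sim_events = run1_sim_events * pt_reach_factor * differential_margin  # 5e8

pbpb_data_events = 1e11      # ~sum of the Pb-Pb N_collisions in Tab. 2.3
print(run3_sim_events, run3_sim_events / pbpb_data_events)  # 5e8, ~0.5%
```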

3.6 Distributed processing and analysis

656 Grid resources

The Worldwide LHC Computing Grid (WLCG) infrastructure is used by the LHC experiments for all large-scale storage and computational needs, including the storage and replication of RAW data and its processing, MC simulation, and individual and organised user analysis. This infrastructure represents a significant investment in hardware, in the form of large, medium and small scale regional and institute computing centres and networks, as well as in support personnel. Over the years, the WLCG software has undergone consistent development and tuning to meet the evolving requirements of the user community. The resources deployed in WLCG have also increased, following the needs of the experiments. In general, the resources growth has been proportional to the size of the individual collaborations, with ALICE using approximately 20% of the globally available tape, CPU and disk. The ALICE-specific increase of CPU, disk and tape during RUN 2 of LHC operation is shown in Table 3.4; the table also includes the projected values at the start of RUN 3. The projections are based on a "flat" hardware investment scenario, adopted already during RUN 1, where the growth is based on improvements in CPU and storage technologies. The values given for the period 2015-2017 are taken from the ALICE requirements documents approved by the Computing Resources Scrutiny Group. These are evaluated and updated every calendar year, and the table values reflect both the averages taken from the documents and the historical resources evolution.

Table 3.4: Grid resources evolution 2015-2019. The year-on-year increase is shown in brackets in the corresponding cells. Year CPU Disk Tape (kHEPSpec) (PB) (PB) 2015 495 49 26 2016 615 (+25%) 59 (+20%) 37 (+42%) 2017 725 (+18%) 68 (+16%) 45 (+22%) 2018 870 (+20%) 93 (+20%) 54 (+20%) 2019 1045 (+20%) 112 (+20%) 54 (no increase)

The growth of the Grid resources and capabilities is expected to cover the needs for compressed data storage, as outlined in Chap. 4. In addition, the Grid will continue to be the primary platform for MC simulations and for end-user analysis. Various physics simulation scenarios are given in Sec. 3.5, with an emphasis on fast and parametric models, which will require substantially less CPU power than the full MC simulation used during RUN 1 and RUN 2. The specific distribution of work on the Grid is given in Sec. 4.5.

Chapter 4

Computing model

The ALICE Computing Model for RUN 3 is driven by the need to reduce the data volume to the maximum possible extent, in order to minimise the storage cost and the requirements on the computing resources needed for data processing, while minimising the impact on the physics performance.

In spite of significant network improvements over the past decades, the ALICE experience during RUN 1 shows that certain workflows, such as I/O-limited data analysis, cannot be run efficiently in a fully distributed computing environment such as the Grid. By reducing the data volume early in the processing chain, the data that need to be moved between the components of the system are minimised, thereby reducing the impact on the network.

This is the rationale for changing the current computing model, based on the Grid as an abstraction of a computing infrastructure which is, in principle, uniform and capable of executing any kind of job anywhere, to a new approach where certain sites and facilities are dedicated to specific activities. In this model, the bulk of the data is produced and stored locally for subsequent re-processing and, in its digested and reduced form, pre-placed at those specific locations that are dedicated to and optimised for the analysis activity.

Table 4.1 summarises the components of the computing model in terms of computing facilities, their primary roles and capabilities.

4.1 Computing model parameters

This chapter summarises the parameters of the ALICE computing model in terms of data types, event rates and data flow. The data types that constitute the RUN 3 computing model are shown in Tab. 4.2. Table 4.3 shows the size of the different data types used in the O2 system.

4.2 Computing and storage requirements

This section estimates the resources required to process and store the data of RUN 3 and RUN 4. The number of reconstructed collisions and the average data size per interaction are taken from Tab. 2.3 and Tab. 3.1, respectively. Table 4.4 summarises the estimated storage requirements, assuming an overall compression factor of 14 after the data reconstruction in Pb–Pb (see Chap. 9) and a factor of 10 in pp and p–Pb.


Table 4.1: Components of the system.
  O2: ALICE Online–Offline Facility at LHC Point 2. Its primary task during data taking is to run the online reconstruction in order to achieve maximal data compression. It provides storage sufficient for 2/3 of the CTF data and runs the calibration and reconstruction tasks.
  Tier 0: CERN Computer Centre facility providing CPU, storage and archiving resources. Here reconstruction and calibration tasks are carried out on a portion of the archived CTF data, plus simulation if required.
  Tier 1: Grid site connected to Tier 0 with high-bandwidth network links (at least 100 Gbps), providing CPU, storage and archiving resources. It runs the reconstruction and calibration tasks on its portion of the archived CTF data, with simulation if needed.
  Tier 2: Regular Grid site with good network connectivity, running simulation jobs.
  AF: Dedicated Analysis Facility of HPC type that collects and stores the AODs produced elsewhere and runs the organised analysis activity.

Figure 4.1: Data flow between components of the system. [Diagram: the O2 facility (x1) performs RAW -> CTF -> ESD -> AOD; the T0/T1 centres (x1..n) perform CTF -> ESD -> AOD; the T2/HPC centres (x1..n) perform MC -> CTF -> ESD; the AFs (x1..3) perform AOD -> HISTO, TREE; the AODs produced everywhere are shipped to the AFs.]

4.3 Role of the O2 facility

The ALICE Computing Model for RUN 3 assumes that in the first stage of data processing all (RAW) data coming from the detector will be synchronously reconstructed in the O2 facility, including cluster finding and fast tracking based on an approximate calibration. The resulting output, Compressed Time Frames (CTFs) containing the history of possibly thousands of overlapping events, will then be stored in the local data store. Up to this point, data compression will reduce the data volume by a large factor (see Chap. 9) relative to that of the raw data coming from the detectors.

From here on, the Time Frames are immutable and eligible for archiving.

Table 4.2: Data types in RUN 3.
  RAW (Transient): Raw data as it comes from the detector.
  STF (Transient): Sub-Time Frame containing the raw data of a single detector for a period of time of ≈20 ms.
  CTF (Persistent): Compressed Time Frame containing the processed raw data of all active detectors for a period of time of ≈20 ms. In the case of the TPC, clusters not belonging to tracks or identified as junk are rejected and the remaining information is compressed to the maximum. Once written, the CTF becomes read-only data.
  ESD (Temporary): Event Summary Data. Auxiliary data to the CTF containing the output of the reconstruction process, linking the tracks and clusters.
  MC (Transient): Simulated energy deposits in sensitive detectors. Removed once the reconstruction of the MC data is completed.
  AOD (Persistent): Analysis Object Data containing the final track parameters in a given vertex and for a given physics event. AODs are collected on dedicated facilities for subsequent analysis.
  MCAOD (Persistent): Analysis Object Data for a given simulated physics event. Same as AOD, with the addition of the kinematic information that allows comparison to MC. MCAODs are collected on dedicated facilities for subsequent analysis.
  HISTO (Temporary): The subset of AOD information specific to a given analysis. Can be generated during analysis but needs to be offloaded from the Grid.

To ensure sufficiently fast processing, it is assumed that at least part of the O2 facility will have to be equipped with hardware accelerators such as Graphics Processing Units (GPUs) and FPGAs, and that the software will need to be developed to take full benefit of the hardware capabilities.

The second stage in this process is an asynchronous reconstruction procedure that will take place in the available part of the O2 facility during data taking, or across the whole facility whenever data taking stops. After this stage the aim is to have a final calibration and reconstruction of the required quality. Figure 4.2 shows the processing steps and data flow within the O2 facility.

The calibration/reconstruction procedures will produce an auxiliary ESD containing information about the tracks and events found during the reconstruction and will reference the clusters in the immutable Time Frame file. It is assumed that at most two iterations of reconstruction/calibration will be needed in order to obtain data of sufficient quality for physics analysis. The size of the ESD files is estimated to be at most 15% of the CTF data.

4.4 Differences between pp and Pb–Pb data taking

During pp data taking, full synchronous and asynchronous reconstruction should be possible in the O2 facility. However, during Pb–Pb data taking, it will be necessary to delay or offload part of the asynchronous data processing to well connected Grid sites. Once the entire dataset has been reconstructed with a given software version, AOD re-filtering will generate datasets suitable for analysis.

Table 4.3: Relative size of different data types.
  Data type     Event size    Copies on disk   Versions   Tape copy
  CTF (pp)      50 kB         1.5              1          Yes
  CTF (p–Pb)    100 kB        1.5              1          Yes
  CTF (Pb–Pb)   1600 kB       1.5              1          Yes
  ESD           15% of CTF    1                1          No
  AOD           10% of CTF    1                2          Yes
  MC            100% of CTF   1                1          No
  MCAOD         30% of ESD    1                2          Yes
  HISTO         1% of ESD     1                1          No

Table 4.4: Number of reconstructed collisions and storage requirements for different systems and scenarios.
  Year   System   Collisions   Storage CTF (PB)   Storage Calibration (TB)   Storage ESD/AOD (PB)   Required CPU (s, single CPU core)
  2020   pp       2.7·10^10    1.5                5                          0.4                    1.7·10^10
         Pb–Pb    2.3·10^10    37                 23                         9                      2.8·10^11
  2021   pp       2.7·10^10    1.5                5                          0.4                    1.7·10^10
         Pb–Pb    2.3·10^10    37                 23                         9                      2.8·10^11
  2022   pp       4.3·10^11    23                 76                         6                      2.7·10^11
  2025   pp       2.7·10^10    1.5                5                          0.4                    1.7·10^10
         Pb–Pb    2.3·10^10    37                 23                         9                      2.8·10^11
  2026   pp       2.7·10^10    1.5                5                          0.4                    1.7·10^10
         Pb–Pb    1.1·10^10    18                 11                         4.5                    1.3·10^11
         p–Pb     1.0·10^11    10                 20                         2.5                    7.2·10^10
  2027   pp       2.7·10^10    1.5                5                          0.4                    1.7·10^10
         Pb–Pb    2.3·10^10    37                 23                         9                      2.8·10^11
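As a rough cross-check (not part of the TDR estimates), the CTF volumes in Tab. 4.4 follow from the number of reconstructed collisions multiplied by the per-interaction CTF sizes of Tab. 4.3. The sketch below redoes the multiplication for three representative rows; the row selection is arbitrary.

```cpp
#include <cstdio>

// Cross-check of the CTF storage column of Tab. 4.4 using the CTF event
// sizes of Tab. 4.3. Illustrative arithmetic only.
int main() {
    struct Row { const char* system; double collisions; double ctfSizeKB; double tabulatedPB; };
    const Row rows[] = {
        {"pp (2020)",    2.7e10,   50.0,  1.5},
        {"Pb-Pb (2020)", 2.3e10, 1600.0, 37.0},
        {"p-Pb (2026)",  1.0e11,  100.0, 10.0},
    };
    for (const Row& r : rows) {
        const double estimatedPB = r.collisions * r.ctfSizeKB * 1e3 / 1e15; // kB -> bytes -> PB
        std::printf("%-12s estimated %5.1f PB (Tab. 4.4: %4.1f PB)\n",
                    r.system, estimatedPB, r.tabulatedPB);
    }
    return 0;
}
```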

4.5 Roles of Tiers

As shown in Fig. 4.1, the roles of the WLCG Tiers will remain similar, but not identical, to those of the current ALICE computing model. While the intention is to retain the ability to run any task anywhere, the main difference compared to RUN 1 is the specialised role assigned to the different sites, as described below. During data taking, the O2 facility will carry out the synchronous fast reconstruction and, within the limits of its capacity, the asynchronous reconstruction, storing the output in the local data store. The O2 farm should be able to keep up with the processing and will store up to 2/3 of all CTF data during Pb–Pb data taking and 50% during pp periods. The O2 facility will be used for subsequent iterations of calibration, reconstruction and AOD re-filtering.

It is expected that Tier 0 will continue to evolve towards a private cloud offering infrastructure as a service. It will complement the O2 computing resources and take part in the asynchronous reconstruction. Tier 0 will back up on tape all data from the O2 disk buffer, as shown in Fig. 4.3.

Table 4.5: Number of simulated events and storage requirements for different systems.
  System          Events     MCAOD storage (PB)
  pp              5·10^10    0.75
  central Pb–Pb   5·10^8     1.2
  p–Pb            5·10^9     0.15

Figure 4.2: O2 processing flow. [Diagram: swimlanes for Detector, Local processing, Synchronous processing, Storage, Asynchronous processing, QA, Tier 0/Tier 1s and AF. Synchronously, clusters are found in the raw data locally, the STFs are sent for Time Frame building, calibration, fast track finding, QA and cluster rejection, and the resulting CTF is saved, with 1/3 of the data exported to T0/T1. Asynchronously, the CTF is read back, calibrated and reconstructed with QA feedback, the ESDs and AODs are saved, and the AODs are exported to the AF.]

All CTF data stored in the O2 facility (2/3 of the total CTF volume) will be archived on Tier 0 tape. In addition to its role in prompt reconstruction, Tier 0 will run the simulation. Simulation-type jobs will run as backfill at times when there is no calibration or reconstruction activity. The resulting AODs will be archived and sent for subsequent processing to the dedicated Analysis Facilities.

In general, the AODs will be systematically sent to the Analysis Facilities to run the organised analysis. During the latter process, a further skimming of datasets will be possible, to generate micro AODs containing n-tuples or histograms suitable for processing off the Grid.

Tier 1 sites will contribute to the asynchronous reconstruction of the remaining 1/3 of the CTF data (see Fig. 4.3). These sites will need to have, on average, an attainable 20 Gbps network connectivity, roughly corresponding to the expected ALICE share of the network bandwidth at the Tier 1s. Once the data are replicated to the remote site, they will be reconstructed in exactly the same way as in the O2 facility and finally archived to tape. Similarly to Tier 0, Tier 1 sites will contribute to the subsequent calibration and reprocessing, as well as to other activities such as simulation when there is no other activity.

Figure 4.3: Tier 0/Tier 1 processing flow. [Diagram: swimlanes for Tier 1s/Tier 0, Storage, Archive and AF. The CTF is archived on arrival; for the asynchronous calibration it is read back, calibrated with QA and the ESDs are saved; for the asynchronous reconstruction the CTF and ESDs are read (unarchiving the CTF if needed), reconstructed with QA, and the resulting ESDs and AODs are saved, with the AODs archived and exported to the AF.]


The asynchronous data processing (calibration and re-reconstruction) of the data collected during one data taking period has to be completed before the next data taking period starts. At that point, all CTF data will be removed from the O2 and Tier 1 disk buffers to make room for new data. Unprocessed data, if any, will remain parked on tape until the next long shutdown, which will open a window of opportunity for re-processing.

Figure 4.4: Tier 2 simulation processing flow. [Diagram: swimlanes for Tier 2, Tier 1, Archive and AF. The Tier 2 runs the Monte Carlo detector simulation, the Compressed Time Frame generation and the reconstruction; the resulting AODs are exported to the Tier 1, archived, and exported to the AF.]

From the data management perspective, in the current ALICE computing model each Tier 2 storage element appears individually. In future, they will be assigned to regional groups, including at least one Tier 1 site, to form a cloud of sites responsible for storing, archiving and subsequently processing the data (see Fig. 4.4). These sites should be close enough geographically and in network terms (latency, bandwidth) to ensure cross-site data access with minimal performance penalties. Within such regional clouds, tasks of organised simulation will be carried out, with the associated reconstruction of the MC data followed by the re-filtering and production of AODs. As in all other cases, the resulting AODs will be transferred to the AFs for subsequent analysis.

4.6 Analysis facilities

At present there are no explicit Tier 3 sites in the ALICE computing model, but several national or regional Analysis Facilities provide a service for processing subsets of ALICE data of specific interest to certain communities. It is expected that a small number (2-3) of similar Analysis Facilities will grow big enough to provide a general service to all ALICE users and will also run the organised analysis following the well established analysis train model (see Sec. 4.12.3). The sites concerned could either be dedicated to the Grid or be some of the few larger HPC sites with very good network connectivity and a large, efficient local disk buffer (5 PB) dedicated to the storage of an entire set of active ALICE AODs, with the associated CPU capacity to allow efficient processing of the AODs several times per day.

Typically, the analysis trains today have from 5 to 100 different wagons and run on datasets of up to 50 TB. Each train is launched twice a week and is synchronised with the deployment of the analysis code. Analysis trains average about 5,000 concurrently running jobs, with peaks at 20,000. As no significant increase in the number of physics working groups is foreseen, it is estimated that the number of analysis trains and wagons will not change for RUN 3. Analysis trains need, on average, 5 MB/s per job slot to be reasonably efficient.

This number, together with the performance of the cluster file system, will define how many job slots can be used efficiently. Given the performance of large HPC installations, it should be possible to serve 20,000 job slots from the local cluster file system at an aggregate throughput of 100 GB/s. The Analysis Facility will therefore need to be able to digest more than 4 PB of AODs in a 12 hour period, corresponding to the desired turnaround time.
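The sizing quoted above is simple arithmetic: 20,000 job slots at 5 MB/s give 100 GB/s, and sustaining that rate for the 12 hour turnaround corresponds to slightly more than 4 PB of AODs. The snippet below only illustrates this estimate.

```cpp
#include <cstdio>

// Back-of-the-envelope check of the Analysis Facility sizing quoted above.
int main() {
    const double jobSlots        = 20000.0;
    const double ratePerSlotMBps = 5.0;   // MB/s needed per job slot
    const double turnaroundHours = 12.0;  // desired turnaround time

    const double aggregateGBps = jobSlots * ratePerSlotMBps / 1000.0;            // 100 GB/s
    const double volumePB      = aggregateGBps * turnaroundHours * 3600.0 / 1e6; // GB -> PB

    std::printf("Aggregate throughput: %.0f GB/s\n", aggregateGBps);
    std::printf("AOD volume digested in %.0f h: %.1f PB\n", turnaroundHours, volumePB);
    return 0;
}
```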

4.7 Distributed computing

The Workload Management and Data Management systems in ALICE are presently based on AliEn, a set of middleware tools and services developed by the Collaboration and used for massive Monte Carlo event production since the end of 2001 and for user data analysis since 2005. The AliEn job management services compose a three-layer lightweight system that leverages the deployed resources of the underlying WLCG infrastructures and services, including the local variations such as EGI, NDGF and OSG. The three layers are the AliEn Central Services, which manage the whole system and distribute the workload; the AliEn Site Services, which manage the interfacing to the local resources and Grid services; and the JobAgents running on the worker nodes. Most of the Workload Management related services revolve around the Task Queue (TQ), a central database that keeps track of all jobs submitted to the system and their current execution status. A number of Job Optimisers scan the TQ, re-arranging the jobs in order to enforce fair-share and priority policies and splitting them into sub-jobs according to the location of the required input data or to user-defined criteria. Following the late-binding approach, the JobAgents that are started on the Worker Nodes download and execute the actual payload from the central Task Queue.

ALICE has used AliEn for the distributed production of Monte Carlo data, reconstruction and analysis at over 80 sites. So far more than 280 million jobs have been successfully run worldwide from the AliEn Task Queue (TQ), resulting in the currently active volume of 14 PB of data.

While the Grid has served the purpose well for the processing of RUN 1 data for the LHC experiments, the sites are gradually embracing the model of private clouds and the virtualisation of their physical computing resources. This process is already under way at CERN and at several big WLCG Tier 1 sites; it is not expected to introduce a big shift in the Grid distributed computing paradigm already in place. Using virtualisation technology, each experiment will be able to construct its own Grid, spanning various Tiers and regional clouds, while continuing to use most of the existing distributed computing infrastructure already built and tested during RUN 1 and RUN 2.

In particular, the very complex existing workflows and production management tools, with the associated job and system monitoring, will be kept, as they do not show scalability issues. Similarly, the job handling model based on pilot jobs and late binding to the real job has proven to be the solution for the large scale computing problems of the LHC experiments. This will continue to be used, with possible simplifications coming from the use of virtual machines, which provide natural isolation between the users of the physical resources. The scalability of the central services that handle job distribution has to be assessed and improved if necessary, but the basic concept is retained.
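The pilot-job and late-binding model retained here can be illustrated with a deliberately simplified sketch. This is not the AliEn implementation (the real Task Queue, Job Optimisers and JobAgents are far richer); all types and names below are invented for illustration.

```cpp
#include <deque>
#include <optional>
#include <string>
#include <iostream>

// Minimal illustration of the late-binding pattern: a central queue holds
// job descriptions, and generic agents started on worker nodes pull the
// actual payload only once they are running. Names are illustrative and
// do not correspond to the real AliEn classes.
struct JobDescription {
    std::string payload;   // command or macro to execute
    std::string inputData; // location of the required input
};

class CentralTaskQueue {
public:
    void submit(JobDescription job) { jobs_.push_back(std::move(job)); }
    // Late binding: the agent asks for "any suitable job" at run time.
    std::optional<JobDescription> pull() {
        if (jobs_.empty()) return std::nullopt;
        JobDescription job = jobs_.front();
        jobs_.pop_front();
        return job;
    }
private:
    std::deque<JobDescription> jobs_;
};

// A JobAgent-like loop running on a worker node.
void agentLoop(CentralTaskQueue& queue) {
    while (auto job = queue.pull()) {
        std::cout << "running '" << job->payload
                  << "' on input " << job->inputData << "\n";
    }
}

int main() {
    CentralTaskQueue tq;
    tq.submit({"reco.sh", "CTF_chunk_001"});
    tq.submit({"mc.sh", "generator_config_A"});
    agentLoop(tq); // in reality many agents pull from the queue concurrently
    return 0;
}
```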

4.8 Data management

Data management is a highly important issue, common to clouds and other parts of the distributed computing infrastructure. In RUN 3, it will have to scale to much higher levels in terms of number of files, overall size and I/O performance. It is vital that this topic is addressed together with the other experiments and CERN/IT. The ALICE approach would be to evolve the current data management model, based on a network of xrootd (ref) Storage Elements, to a generalised global namespace based on the EOS (ref) system, developed by CERN/IT and currently used to manage about 100 PB of RUN 1 data for all four large LHC experiments. This system is itself based on the next iteration of the xrootd technology and allows for a gradual and non-disruptive transition. Its implementation is a goal for the coming years.

The necessary generalisations of the EOS system in terms of a global namespace that would permit the replacement of the File Catalog are in the early stages of development. It is encouraging that initial benchmarks demonstrate scalability with no performance degradation when managing the target of more than 10^9 files. The present size of the EOS disk cache at CERN is already comparable to that of the future ALICE O2 facility, proving its scalability to the requirements.

4.9 Replication policy

At present, the raw data replication model in ALICE calls for two replicas: one full set at Tier 0 and one distributed set at the Tier 1s supporting ALICE. The Tier 1 replication fraction is proportional to the amount of tape resources pledged by the centres. Due to the substantial increase in data volume, in RUN 3 there will be only one instance of each raw data file (CTF) stored on disk, with a backup on tape. During pp data taking, the O2 disk buffer should be sufficient to accommodate the data from the entire period. As soon as it is available, the CTF data will be archived to the Tier 0 tape buffer. During Pb–Pb data taking, the data are replicated to the Tier 1s at the maximum speed allowed by the fair share of the network and by the maximum rate supported by the Tier 1 tape systems. It is expected that 1/3 of the data will be transferred and archived at the Tier 1s by the end of data taking. Upon archiving, the CTF data will remain on disk until the subsequent calibration and asynchronous reconstruction steps are completed.

For all data on disk we assume an effective replication factor of 1.5, corresponding to the extra disk space used by the higher level software or hardware RAID configurations that are necessary to guarantee data integrity and protect against partial data loss. In case of a complete data loss, the files will be recalled from tape.

4.10 Deletion policy

With the exception of the raw data (CTF) and the derived analysis data (AOD), all other intermediate data created at the various processing stages are transient (removed after a given processing step) or temporary (with a limited lifetime). Given the limited size of the disk buffers in the O2 facility and at the Tier 1s, all CTF data collected in the previous year will have to be moved off disk before a new data taking period starts. This constraint effectively limits the asynchronous processing stage to a maximum of 9 months in the year for the final calibration and re-reconstruction. All data not finally processed during this period will remain parked on tape until the next opportunity for re-processing arises: LS3.

The final products of the reconstruction and simulation steps are AODs. In the case of reconstruction, they will be backed up on tape at Tier 0 and the Tier 1s. As the simulation will predominantly run on Tier 2 sites, the AODs generated there will be sent to an associated Tier 1 site for archiving and removed from the local storage. In all cases, the resulting AODs will be sent to the Analysis Facilities (AF) for subsequent analysis. While the disk buffer at the AFs is expected to be substantial, it will again be limited, and strict policies will be put in place to remove unused datasets from this buffer. This will be helped by a further refinement of the popularity service, which will add the possibility of fine-grained data removal for active datasets with a low access level. If needed, the AODs removed from this buffer can be recalled from tape at the Tier 1s and the analysis can be repeated there, or they can be sent again to the AFs.

4.11 Conditions data and software distribution

Efficient distribution of new software releases throughout the Grid is essential. To meet this requirement the present CernVM File System (CVMFS) will continue to be used. CVMFS can also distribute the ALICE conditions data, therefore lowering the impact on the data management components. This option is currently being evaluated for RUN 2 within the existing ALICE Offline framework and, if it proves successful, it may be used for RUN 3. Alternatively, a solution based on the FairRoot (ref) parameter manager will be investigated. This approach offers an abstraction of the storage layer that can be realised as ROOT or ASCII files (possibly stored in CVMFS), an SQL database or a key-value store (such as Riak, Memcached or Redis), which gives ready-to-use data replication and synchronisation across nodes or data centres.

4.12 Workflows

4.12.1 Calibration and reconstruction

During RUN 1 the ALICE detector calibration was carried out in both the Online (HLT) and the Offline (Grid) environments. This was an iterative procedure that required quite a lot of time and needed human intervention. The reconstruction was performed at Tier 0 and the Tier 1s, depending on the data location and the availability of processing power. Given that in RUN 3 the first stage reconstruction needs to be done in the O2 facility for better data compression and maximum volume reduction, calibration and reconstruction become closely coupled and an essential part of the functionality provided by the O2 facility. The calibration and reconstruction procedures are detailed in Sec. 8.1.

In RUN 3, Tier 0 and the Tier 1s will receive up to 1/3 of the data stream (CTF). For this reason, the subsequent calibration and reconstruction processing will follow the same model as in the O2 facility.

4.12.2 Simulation

So far, simulation activities have accounted for 70% of the ALICE computing needs. While specific measures, such as the use of parametrised and fast Monte Carlos, will ensure that the future CPU needs for simulation are kept to a minimum, they will continue to require a large share of resources. In addition, the large number of simulation jobs puts a heavy load on the central services, as each job is accounted for and monitored separately.

For RUN 3 an alternative approach is being considered for bulk simulation campaigns. Dedicated simulation facilities will be implemented to handle job distribution and accounting internally, while acting as data sources for the rest of the infrastructure, providing only the feed of simulated files. This model could be particularly suitable for interfacing potentially large High Performance Computing facilities that could opportunistically contribute large numbers of CPU cycles. However, it is expected that the bulk of the simulation jobs will run on Tier 2 sites.

As the output, in the form of AOD files, will be sent to the Analysis Facilities for subsequent analysis, it is planned to rebalance the CPU/disk requirements on the Tier 2s in favour of CPU. This will give an increase in the number of CPU cores available for simulation, within the limits of a fixed budget.

4.12.3 Analysis

In RUN 1 and in the present ALICE computing model there are two distinct types of analysis: user level analysis and scheduled analysis. They differ in their data access pattern, in the storage and registration of the results, and in the frequency of changes in the analysis code.

The user level analysis is focused on a single physics task and is typically based on data filtered by scheduled tasks. Usually, physicists develop the code using a small sub-sample of the data, changing the algorithms and criteria frequently. The analysis macros and software are tested many times on relatively small data volumes of both experimental and Monte Carlo data. The output is often only a set of histograms. Such tuning of the analysis code can be done on a local data set or on distributed data using Grid tools. The final version of the analysis is eventually submitted to the Grid and will access large portions or even the totality of the AODs.

While both types of analysis were allowed and supported in RUN 1, the user level analysis has proved to be less efficient by far. In RUN 3 it is planned to replace this option completely by organised or scheduled analysis.

Scheduled analysis typically uses all the available data from a given period. The AOD files read during scheduled tasks can be used for several subsequent analyses, or by a class of related physics tasks that are combined in what are known as "analysis trains".

Analysis trains group together the analyses of different users that process the same dataset, with the following advantages:

– The total overhead resource usage associated with executing each analysis code separately is significantly reduced. This includes job management, input and output file management, the job-start overhead and the resources consumed for the decoding of the input data;

– Users have to execute far fewer Grid operations themselves and so save time;

– Automatic systems deal with failures and merging, reducing the total turnaround time to less than that achievable by an individual user;

– Tests before the submission of an analysis train spot wrong configurations or faulty code, reducing the number of job failures.

The composition, testing and submission of the trains are steered with a dedicated web front-end that allows users to configure the specific analysis code that they would like to run on certain data sets. In addition, operators can use the front-end to test train compositions, to evaluate the resource consumption per train wagon (CPU time and memory), and to issue warnings in case of memory leaks or excessive output. The analysis train model worked well during RUN 1 and it is planned to build on its success, improving its efficiency and making it the only way of performing analysis over the entire data samples in RUN 3.
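The efficiency argument behind the train model is that the input dataset is read and decoded once per train rather than once per analysis. The sketch below schematically illustrates this idea; it is not the actual ALICE train framework and all names are invented.

```cpp
#include <functional>
#include <string>
#include <vector>
#include <iostream>

// Schematic analysis train: each "wagon" is a user analysis registered as a
// callback, and the shared event loop reads and decodes every event exactly
// once before handing it to all wagons in turn.
struct Event { int id; /* decoded AOD content would live here */ };

class AnalysisTrain {
public:
    void addWagon(std::string name, std::function<void(const Event&)> task) {
        wagons_.push_back({std::move(name), std::move(task)});
    }
    void run(const std::vector<Event>& dataset) {
        for (const Event& ev : dataset) {            // single pass over the input
            for (auto& wagon : wagons_) wagon.task(ev);
        }
        std::cout << wagons_.size() << " wagons processed "
                  << dataset.size() << " events in one pass\n";
    }
private:
    struct Wagon { std::string name; std::function<void(const Event&)> task; };
    std::vector<Wagon> wagons_;
};

int main() {
    AnalysisTrain train;
    train.addWagon("D-meson spectra", [](const Event&) { /* fill histograms */ });
    train.addWagon("dielectron mass", [](const Event&) { /* fill histograms */ });
    train.run({{1}, {2}, {3}}); // toy dataset standing in for the AOD input
    return 0;
}
```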

To help individual physicists carry out exploratory work, the scheduled analysis tools will be improved to facilitate the skimming and extraction of subsets of the data for processing on local resources.

Chapter 5

O2 architecture

5.1 Overview

The data flow and processing pipeline of the O2 system is shown in Fig. 5.1. The O2 architecture has been designed to support both online synchronous data reduction and asynchronous, iterative data processing.

Data are produced by the detectors in continuous or triggered read-out mode, synchronised by the trigger system. The First Level Processors (FLPs) read out the raw data samples over the GBT and DDL optical links. Several streams may be aggregated on each FLP and buffered in memory. These nodes achieve a first data reduction and compression factor of 2.5 by performing local data processing tasks on data fragments, without needing the full detector data set (e.g. TPC cluster finding).

The continuous streams of data samples are split into data frames, using as a reference clock arbitrary Heartbeat triggers embedded in the raw data streams. The frames are accumulated over a time period of the order of 20 ms, to minimise the amount of incomplete data at their boundaries for the collisions producing tracks that span the frame boundaries, as detailed in Sec. 5.2.2. All FLPs produce a Sub-Time Frame (STF), which could be empty for those FLPs receiving data from triggered detectors inactive during the corresponding time period.

The STFs are then dispatched to the Event Processing Nodes (EPNs) for aggregation, as shown in Fig. 5.2. The STFs related to the same time period from all FLPs are received by the same EPN and aggregated into a Time Frame (TF). The EPNs perform the reconstruction for each detector, further reducing the data by an average factor of eight. The fully compressed TFs are then stored on disk in the O2 facility or in the Tier 0/Tier 1 data centres. Permanent storage is used for archiving.

Global reconstruction and event extraction will be performed asynchronously in iterative passes, either on the EPNs or on the Grid to offload the processing. The ESDs produced are stored at the O2 facility. The AODs are transferred to dedicated facilities for analysis.

Calibration is integrated at different processing stages on the FLPs and EPNs, both to minimise the data transport and to make results available as early as possible in the chain; this is the key to real-time data reduction. The condition and calibration database is populated and used at all stages, synchronously and asynchronously.

The global data flow also includes data sampling at all levels, in order to feed the Quality Control (QC) system as needed; the results will be stored in a dedicated database.

Simulation is done in the Tier 2 centres, and the resulting AODs and physics data are analysed in the Analysis Facilities.


Figure 5.1: Data flow and processing pipeline of the O2 system. [Diagram: the detector electronics (TPC, ITS, TRD, ...), synchronised by the trigger and clock, deliver data samples interleaved with heartbeat triggers to O(100) FLPs, which perform buffering, local aggregation, time slicing and local processing (Data Reduction 0 and Calibration 0 on partial detector data, tagging) and dispatch partially compressed sub-timeframes. O(1000) EPNs perform the synchronous timeframe building, detector reconstruction on full detectors (e.g. track finding) with Calibration 1 (e.g. space-charge distortion) and Data Reduction 1, producing compressed timeframes that are written to the O2/T0/T1 storage and archive. Asynchronous passes on O(1) O2/T0/T1 installations perform Calibration 2, global reconstruction, event extraction, tagging and AOD extraction, producing ESDs and AODs and using the Condition and Calibration Database (CCDB). O(10) T2 centres run simulation, reconstruction, event building and AOD extraction; O(1) Analysis Facilities store the AODs and run the analysis, producing histograms and trees. Quality Control receives data at all stages.]

Figure 5.3 shows the hardware pipeline with the estimated number of processing nodes and the throughput at the different online stages. As detailed in Chap. 10, of the order of 250 FLPs and 1500 EPNs will be needed to cope with the data processing. The global detector read-out rate is 1.1 TB/s over 8100 read-out links. Each FLP produces up to 20 Gb/s after data reduction, for a total of 500 GB/s. The TFs built at 50 Hz by the EPNs are 10 GB each before compression. The EPNs further reduce the data to a total peak throughput to storage of up to 90 GB/s, which corresponds to a local write throughput to disk of 60 MB/s per EPN after compression. One of the key components is the network fabric switching the data from the FLPs to the EPNs at 4 Tb/s.
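The per-node figures quoted above follow from the global rates and the farm sizes. The snippet below simply redoes the divisions (illustrative arithmetic; the small differences with respect to the rounded values in the text come from using 1.1 TB/s as the input rate).

```cpp
#include <cstdio>

// Consistency check of the O2 throughput figures quoted in the text.
int main() {
    const double detectorTBps   = 1.1;  // global detector read-out rate
    const double flpCompression = 2.5;  // local data reduction on the FLPs
    const double epnCompression = 8.0;  // average further reduction on the EPNs
    const double tfRateHz       = 50.0; // one 20 ms Time Frame every 1/50 s
    const int    nEPN           = 1500;

    const double flpOutputGBps = detectorTBps * 1000.0 / flpCompression; // ~440 GB/s (text: 500 GB/s)
    const double tfSizeGB      = flpOutputGBps / tfRateHz;               // ~9 GB (text: 10 GB)
    const double storageGBps   = flpOutputGBps / epnCompression;         // ~55 GB/s average (peak: 90 GB/s)
    const double perEpnMBps    = 90.0 * 1000.0 / nEPN;                   // 60 MB/s per EPN at peak

    std::printf("FLP aggregate output: %.0f GB/s, TF size: %.1f GB\n", flpOutputGBps, tfSizeGB);
    std::printf("Average rate to storage: %.0f GB/s, peak per EPN: %.0f MB/s\n", storageGBps, perEpnMBps);
    return 0;
}
```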

Figure 5.2: Aggregation of data, from the fragments read out from the detectors to the full Time Frames built on the EPNs. [Diagram: data samples from the read-out links of detectors A and B, interleaved with heartbeat triggers HB1-HB3, are collected by FLP1 and FLP2, which perform local processing (2.5x compression), link aggregation and time slicing into partially compressed sub-timeframes; EPN1 and EPN2 then perform the timeframe building and global processing (8x compression), producing compressed timeframes.]

Figure 5.3: Possible hardware implementation of the logical data flow described in Fig. 5.1. [Diagram: 8100 read-out links deliver 1.2 TB/s from the detectors to 250 FLPs; a switching network with 250 input ports and 1500 output ports carries 500 GB/s to 1500 EPNs, which write 90 GB/s (1500 x 60 MB/s) to storage.]

5.2 Data flow

5.2.1 Data model

A sustainable reduction factor of about twenty in the data throughput can only be achieved by running at least the initial calibration and reconstruction steps as pipelined processes, synchronous with the data acquisition. The read-out scheme of the ALICE detector calls for a hierarchical organisation of the data model, following the data flow from the individual data links to the EPNs processing complete TFs. The data model throughout the O2 facility is designed to assemble the data in a fast and optimal manner for processing by the reconstruction algorithms, while taking into consideration navigation and performance issues. The following paragraphs address the data model starting from the main requirements as presented in Chap. 3. The scope is limited to the "transient" form of the data within the processing pipeline, up to the stage of writing it in the first persistent format.

The data flow and processing in RUN 3 will be centred on Time Frames, whose size is justified in Sec. 5.2.2; all data blocks will need to have clear time identifiers. The data model does not require a strict format specification for the data blocks, but it has to impose certain format constraints on the metadata (headers) describing them. This is mainly to allow a seamless implementation of interfaces to identify and group different data types, which is an important requirement in the later processing stages.
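As an illustration of the header constraints discussed above, a data-block header could carry the time identifiers, the origin and the payload type needed to identify and group blocks without inspecting their payload. This is only an assumption about what such a header might contain, not the actual O2 header definition; all fields are illustrative.

```cpp
#include <cstdint>

// Illustrative data-block header, not the actual O2 header definition.
// Every block carries the time identifiers and typing information needed
// to group blocks of one (Sub-)Time Frame without inspecting the payload.
struct DataBlockHeader {
    uint32_t headerSize;      // allows schema evolution of the header itself
    uint32_t headerVersion;
    uint64_t timeFrameId;     // which Time Frame this block belongs to
    uint32_t firstOrbit;      // finer time identifier within the run (assumption)
    uint16_t originDetector;  // e.g. TPC, ITS, ... (numeric code)
    uint16_t originLink;      // FLP / read-out link that produced the block
    uint32_t payloadType;     // raw, clusters, tracks, QC sample, ...
    uint64_t payloadSize;     // size in bytes of the block that follows
};

static_assert(sizeof(DataBlockHeader) % 8 == 0,
              "keep the header 64-bit aligned for cheap (de)serialisation");
```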

Fast navigation among the different data blocks produced within a given Time Frame is another critical requirement of the data format. The data corresponding to different but matching Time Frames will lie on different processing nodes. For this reason, there is a strong need for self-containment of the data, not only to allow TFs to be regrouped for event processing purposes, but also to perform opportunistic processing when resources are available. This uniformity is an important principle in the data model design: the data blocks on the FLP nodes should look the same as on the EPNs, so that processing tasks can be interchangeable.

Persistency is a critical issue. During the synchronous phase the raw data cannot be made persistent due to the high data throughput. Therefore the data will only be made persistent after the replacement of the raw data by the more compact result of the synchronous calibration and reconstruction. The subsequent asynchronous passes of reconstruction, calibration or analysis will produce persistent formats representing more elaborate data. Consequently, the main requirements for the persistent format are the support for schema evolution and a binary object format.

5.2.2 Time Frame size

The choice of the Time Frame size (i.e. the duration of the time window) depends on several criteria.

– A fraction of the data at the TF edges cannot be used due to the TPC drift. The fraction lost amounts to 0.1/t, where t is the duration of the Time Frame in milliseconds. For example, a TF spanning 100 ms would cause a data loss of 0.1%. A longer TF is better in this respect.

– The TFs are distributed over the entire EPN farm. It would be preferable to keep the interval between two consecutive TFs received by a given EPN at the level of a few minutes, in order to minimise discontinuities in the calibrations at the synchronous stage. A TF of 0.1 s and a farm of 1500 EPNs would lead to a cycle time of less than 3 minutes.

– The calibration data produced within each single TF will have some fixed-size overhead, since a lot of data will be accumulated in fixed-size containers (histograms). Therefore, a shorter TF would mean more frequent input to the CDB and also a larger data rate.

– The TF should be a multiple of the shortest calibration update period, which is currently estimated to be 5 ms for the TPC space-charge distortion (SCD) maps.

– A longer TF reduces the query overhead for some of the calibration parameters, as some calibration data from each side of the TF are needed to interpolate values with greater precision (e.g. taking an extra 5 ms SCD bin on each side of the TF). However, the overall query rate is still small compared to the TF data (around 20 MB of calibration data per second of physics data, i.e. less than 0.1%).

– A shorter TF reduces the size of the data to be handled on every EPN and eases the staggered data transfer from the FLPs. On the EPNs, we anticipate the need to buffer 3 full time frames (one being received, one being processed and one being sent).

Any TF between 20 ms and 100 ms should be adequate for calibration and reconstruction. Since a finer TF granularity makes the distribution and buffering of the data easier, the selected design value for the TF duration is 20 ms, i.e. a TF rate of 50 Hz. This corresponds to a TF size of 11 GB on the EPN (before compression). The corresponding 0.5% of data unusable at the frame boundaries is still acceptable, and buffers sized at a few tens of GB can easily be implemented. For a farm of 1500 EPNs, each EPN would receive a new TF every 30 s. Finally, a TF corresponds to 1000 interactions in normal running conditions (Pb–Pb at 50 kHz), which is enough to compute the required calibration parameters for each Time Frame.
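The consequences of the 20 ms design value can be verified with the numbers given in the text; the snippet below recomputes the boundary loss, the per-EPN cycle time and the number of interactions per Time Frame (illustrative arithmetic only).

```cpp
#include <cstdio>

// Numerical check of the Time Frame design value chosen above.
int main() {
    const double tfMs            = 20.0;  // chosen TF duration
    const double tpcDriftMs      = 0.1;   // ~100 us full TPC drift time
    const int    nEPN            = 1500;
    const double interactionRate = 50e3;  // Pb-Pb interaction rate (Hz)

    const double boundaryLoss   = tpcDriftMs / tfMs;            // the 0.1/t rule from the text
    const double cycleTimeSec   = nEPN * tfMs / 1000.0;         // time between TFs on one EPN
    const double interactionsTF = interactionRate * tfMs / 1e3; // collisions per TF

    std::printf("Boundary loss: %.1f%%\n", 100.0 * boundaryLoss); // 0.5%
    std::printf("Cycle time per EPN: %.0f s\n", cycleTimeSec);    // 30 s
    std::printf("Interactions per TF: %.0f\n", interactionsTF);   // 1000
    return 0;
}
```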

5.2.3 Read-out links

The data are transferred from the detectors to the O2 system via optical read-out links of three different types. The detectors installed or upgraded during LS2 will use the GBT. The GBT link, developed at CERN, is a bi-directional link multiplexing control, data and trigger information. DDL1 and DDL2 will be used by the detectors which do not modify their electronics during LS2. The DDL link is a bi-directional link: one direction is used during data taking for receiving data from the detector; the other direction is used to send slow control information or to download configuration data. Several read-out links are multiplexed into the FLP I/O bus by dedicated adapters.

5.2.4 FLP and EPN

The O2 farm will consist of two different categories of computing nodes, corresponding to the two data aggregation steps.

The FLPs will collect the detector data at a rate of 1.1 TB/s (see Tab. 3.1) over approximately 8100 read-out links (see Tab. 3.2). Each FLP will receive the data from up to 48 optical links at up to 5 Gb/s. This allows a natural segmentation for the TPC, with one FLP for all the optical links of one chamber. The data from these streams will be compressed by a factor of 2.5, merged, split into Time Frames and buffered until they are sent to the EPNs. The most heavily loaded FLPs are those of the TPC inner chambers, with an aggregate sustained average of 50 Gb/s at the 50 kHz interaction rate. The FLPs will therefore need an I/O capacity of the order of 100 Gb/s as peak input, 50 Gb/s as sustained input and 20 Gb/s as sustained output.

The EPNs will perform the second step of data processing: assigning each cluster to a track and finally to an interaction.

This step has therefore to be performed before individual interactions can be identified in the data produced by the continuously read-out detectors. It will provide an additional reduction factor of 8 in the data volume. The total throughput to data storage reaches a peak of 90 GB/s, which corresponds to a local write throughput to disk of 60 MB/s per EPN after compression.

The characteristics of the FLPs and EPNs are summarised in Tab. 5.1, which shows the peak and average needs. For the FLPs, the peak needs correspond to a period when all read-out links use their maximum bandwidth. For the EPNs, they correspond to the period at the beginning of the fill, when the luminosity, and therefore the data throughput, reach their maximum value.

Table 5.1: Hardware characteristics for FLP and EPN nodes.
  Type   Nodes   Input peak (Gb/s)   Input average (Gb/s)   Output peak (Gb/s)   Output average (Gb/s)
  FLP    250     100                 50                     40                   20
  EPN    1500    10                  2.7                    0.48                 0.33

5.2.5 The FLP-EPN network

The network fabric that connects the FLPs to the EPNs must be capable of sustaining a high throughput traffic (4 Tb/s) to transfer, for each TF, the corresponding STFs from all FLPs to the same EPN for aggregation and processing.

A relatively high number of point-to-point transfers must be done in parallel to keep up with the total traffic. At the same time, the data traffic and processing load should be balanced and distributed over the entire EPN online farm. This "fan-in"/"fan-out" traffic shape requires the core switching fabric to be able to cross-switch the entire traffic without blocking.

Three possible layouts are considered:

– The first one consists of connecting all the FLPs and EPNs to a single large switching network, with all ports having the same bandwidth, sufficient for the nodes with the highest needs, as shown in Fig. 5.3.

– The second layout consists of adapting the port bandwidth to the estimated need of each type of node in order to reduce the cost. This option is justified by the large difference between the peak output bandwidth of the FLP (40 Gb/s) and the peak input bandwidth of the EPN (10 Gb/s). This second layout, shown in Fig. 5.4, also has the characteristic of dividing the EPN farm into four identical parts, therefore requiring four smaller networks, which is also practical for a staged deployment. Each FLP is connected to each of the subfarms. This layout reduces the network cost by splitting the total data traffic into four. The data flow must take into account the actual topology of the network at the application level. The application should apply some form of traffic shaping to avoid the saturation of any link or switch, and should monitor performance to ensure that the load is adequately balanced between the EPNs.

Figure 5.4: FLP-EPN network layout with four EPN subfarms. [Diagram: groups of FLPs (FLP 1-37, ..., FLP 222-259) are connected at 40/56 Gb/s to leaf switches, each serving one of four EPN clusters (EPN 1-54, ..., EPN 324-378) at 10 Gb/s per EPN.]


– The third layout consists of splitting the fan-in and fan-out functions into two separate layers of the network, as shown in Fig. 5.5. The fan-in function is performed by a standard switch or a combination of switches and aims at minimising the number of high-speed links used to perform the assembly of the Time Frames. The fan-out is performed by a new type of node, the super-EPN (SEPN), in charge of assembling complete Time Frames and distributing them to the EPNs.

No decision has been taken at this stage on the actual network layout to be implemented, because the evolution of the different technologies will have a significant impact on the most cost-effective solution at the time of deploying the system.

5.3 Data storage

The O2 architecture includes a storage system used by the EPNs as a buffer between the synchronous and asynchronous steps of the data processing. The storage system is also used for interfacing the O2 system with the Grid: temporary data parking and permanent archiving at Tier 0; data processing at Tier 0 and the Tier 1s; simulation at the Tier 2s. The data transfer operations between the O2 facility and the Grid are performed by dedicated nodes, the Data Movers (DMs), in order to avoid overloading the EPN nodes, which are already saturated by data acquisition and processing tasks. The storage back-end will be physically decoupled from the EPN nodes but will still provide a global storage space accessible by all EPNs and DMs. Sufficient bandwidth has to be guaranteed to avoid data pile-up in the EPNs and DMs.

Figure 5.5: FLP-EPN network layout with SEPNs in charge of assembling and distributing the TFs to the EPNs. [Diagram: the 250 FLPs are connected through a fan-in layer of 40/56 Gb/s links to about 50 SEPNs, which fan the assembled Time Frames out to the 1500 EPNs over 10 Gb/s links.]

5.3.1 Calibration constants database

As part of the data storage, a Condition and Calibration Data Base (CCDB) is foreseen. This will be the database used to store the information related to the detector calibrations, needed mainly for reconstruction and simulation purposes. The database will be implemented on the basis of the experience with the Offline Condition DataBase (OCDB) in RUN 1 and RUN 2.

The design of the database will offer simple read/write/update/delete access patterns, based on a simple access configuration (e.g. using strings), which will be regulated in order to limit the non-read operations. The database will be built around the concept of "validity", which will guarantee that the desired object is returned depending, for example, on whether the query was aiming at the latest version of the object or at some different criteria. In this respect, the versioning of the stored objects will be a key feature of the CCDB.

The technical implementation of the CCDB will allow low latency when retrieving an object and scalability (fundamental due to the expected growing size of the database and to the high number of processes which are foreseen to connect to it), and will include a publish/subscribe mechanism in order to keep up to date all the services requiring access to the CCDB. A redundancy policy will be in place in order to avoid loss of data.

Typically, the content of the objects stored in the CCDB will be time dependent, essentially because the data processing in RUN 3 will be based on time frames. Metadata will be associated with the objects in order to describe their main properties, such as the data taking period to which they refer, the calibration type, the source, etc.
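The access pattern described above (string-based identifiers, validity intervals, versioning and a publish/subscribe channel) can be summarised in a small interface sketch. This is an assumption about how such an API might look, not the actual CCDB design; all names are illustrative.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Illustrative CCDB-like interface, not the actual implementation.
// Objects are addressed by a string path, carry a validity interval
// expressed in Time Frame ids, and every write creates a new version.
struct ValidityInterval {
    uint64_t firstTimeFrame;
    uint64_t lastTimeFrame;
};

struct ConditionObject {
    std::string path;           // e.g. "TPC/Calib/SpaceChargeMap" (example path)
    ValidityInterval validity;
    uint32_t version;           // assigned by the store, monotonically increasing
    std::vector<char> payload;  // serialised calibration object
    // metadata such as data taking period, calibration type, source, ...
};

class ConditionStore {
public:
    // Store a new version of an object for the given validity interval.
    virtual uint32_t put(const ConditionObject& object) = 0;
    // Retrieve the object valid for a Time Frame; by default the latest version.
    virtual ConditionObject get(const std::string& path, uint64_t timeFrameId) = 0;
    // Publish/subscribe: notify interested services when a path is updated.
    virtual void subscribe(const std::string& path,
                           std::function<void(const ConditionObject&)> onUpdate) = 0;
    virtual ~ConditionStore() = default;
};
```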

5.4 Calibration, reconstruction and data reduction

There are two phases of reconstruction and calibration, synchronous and asynchronous:

– Synchronous: performed during the data taking.

– Asynchronous: decoupled from the data taking process. Its aim is to provide analysis-grade data for the full detector, analysable in terms of separate collisions.

The goal of the online reconstruction and calibration is to reduce the data volume as much as possible while remaining compatible with the physics performance.

5.4.1 Calibration and reconstruction flow

Figure 5.6 shows the calibration and reconstruction flow, divided into five conceptual steps which are described in more detail in Sec. 8.1.

Figure 5.6: Schematic outline of the reconstruction and calibration data flow. [Diagram: Step 0, local processing (e.g. clusterisation) on all FLPs; Steps 1 and 2, synchronous detector reconstruction, calibration and matching procedures on the EPNs, producing the CTF; Steps 3 and 4, asynchronous final calibration and matching, second PID and event extraction, producing the AODs.]

The output of the FLP processing is merged and shaped into the STF format and shipped to the EPNs for integration into a TF and for the track and event-level reconstruction. The principal difference of RUN 3 compared to the current operation of ALICE is the presence of continuously read-out detectors like the TPC and ITS. For this reason, and because a given EPN does not receive adjacent TFs, the collisions corresponding to the edges of the TF will be discarded; part of the collision data may appear in a different TF on a different EPN. The length of the time interval to be discarded is ∼100 µs, corresponding to the full drift time in the TPC. Reconstruction and calibration on the EPNs refer to single-TF processing (Sec. 8.1). As it is an essential part of the TPC data volume reduction scheme, only the TPC full track finding described in Sec. 8.1.3 must be done synchronously.

The calibrations produced on the EPNs during the synchronous stage of processing will be available during the same stage only for the data that arrive at the same EPN; no cross-talk is foreseen between different EPNs. This should be possible if no calibrations requiring a large amount of statistics are needed. The drawback of such an approach appears in the case where calibrations obtained from standalone reconstructions on the FLPs are needed: the first portion of the data on each EPN will not be calibrated initially and a second processing will be necessary. Nevertheless, the EPNs should be able to read from the calibration database in order to make use of the calibrations previously produced. The details of the calibration, reconstruction, data reduction and event extraction procedures will be presented in Sec. 8.1.

5.4.2 Data volume reduction

The reduction of the raw data volume shipped by the detectors will start already on the FLPs, by converting the raw data to clusterised data and compressing them with loss-less algorithms. It will be followed by a lossy data reduction during the synchronous stage of the reconstruction on the EPNs, where as many as possible of the clusters which have no chance of contributing to physics analyses (noise, background from δ-rays, etc.) will be rejected. An additional factor in data compression will be gained by exploiting the correlations between the clusters and the tracks they can be attached to during the synchronous reconstruction stage. Details will be presented in Sec. 9.
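The two reduction stages compound multiplicatively and account for the overall factor of about twenty quoted in Sec. 5.2.1; the snippet below only makes this bookkeeping explicit.

```cpp
#include <cstdio>

// The data reduction chain: ~2.5x on the FLPs (clusterisation and loss-less
// compression) followed by ~8x on the EPNs (cluster rejection and track-based
// compression) gives the overall factor of about twenty quoted in Sec. 5.2.1.
int main() {
    const double flpFactor = 2.5;
    const double epnFactor = 8.0;
    std::printf("Overall throughput reduction: about %.0fx\n", flpFactor * epnFactor);
    return 0;
}
```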

5.5 Data quality control and assessment

The aim of data Quality Control and assessment (QC) is to provide prompt online feedback on the quality of the data being recorded and on the processes underlying the handling and transformation of these data, such as reconstruction and calibration. It is also of great importance to control the quality of the subsequent reconstruction/calibration steps performed asynchronously to the data taking. As a consequence, the QC covers what is known as Quality Assurance and Data Quality Monitoring. The output of the QC chain should consist of an assessment of the data quality, usually computed automatically. The QC decision is used to check the transition from uncompressed to fully compressed data.

The Event Display allows experts and shifters to visually check the events in a qualitative, non-statistical way. For this reason, it is considered a part of the QC. It is designed to visualise, manipulate and analyse physics events in a three dimensional environment. The system will be able to receive events from different sources, at different stages of the reconstruction chain and in different formats, such as Time Frames, CTF, raw data or AODs. The Event Display will provide features such as bookmarks, history browsing and event filtering.

5.5.1 General QC workflow

The QC system can be described by the workflow shown in Fig. 5.7. Data from the online data flow or from the data storage are processed by the QC. User-defined algorithms are applied in order to generate and populate the monitoring objects, such as histograms. The quality assessment of these objects comes next, either at the same time or later in a separate step. Finally, the monitoring objects, as well as their degree of quality, are stored and made available for visualisation, for post-processing and for direct use from the data flow (not shown in the figure). One of the most important aspects of post-processing is the capacity to trend the monitoring objects and results over time.

Figure 5.7: Quality control and assessment general workflow. [Diagram: raw and reconstructed data from the data flow and from the data storage, together with condition and calibration information, enter the Quality Control system, which performs data collection, generation of monitoring objects and quality assessment; the results are stored and made available through API access and visualisation.]

5.5.2 Detector requirements

The detectors have provided the estimated number of monitoring objects they plan to produce at each stage, as well as the absolute or relative quantity of data to be monitored to obtain a relevant QA.

The total number of monitoring objects foreseen to be produced concurrently is of the order of 2000. Experience from RUN 1 shows that it is safer to aim at a higher number of monitoring objects to take into account the natural tendency to produce more QA objects than initially planned. In view of this, the baseline for the QA system architecture and design is ∼5000 objects, with peaks at ∼10000. While most detectors only need to monitor 1-10% of the raw data or Time Frames, a few others need to run on 100%. Some of these tasks will need a large amount of computing power.

1185 into account the natural tendency to produce more QA objects than initially planned. In view of this, the 1186 baseline for the QA system architecture and design is 5000 objects with peaks at 10000. ∼ ∼ 1187 While most detectors only need to monitor 1-10% of the raw data or Time Frames, a few others need to 1188 run on 100%. Some of these tasks will need a large amount of computing power.

5.5.3 Input and output

The QC system receives data at different stages of the online data flow or from the offline data storage. Information about the conditions under which these data were collected, and the associated calibration, is also made available to this system. The QC provides objects (e.g. histograms, graphs or values) and a quality assessment for nearly all of them. The quality assessment is provided by setting one of the following quality flags: Undefined, Ignored, Good, Bad or Warning. In order to assess or refine the quality of an object, different types of monitoring can be carried out, depending on the time and information available. In addition, the QC output objects will require metadata to index the monitoring information and reduce the need for collecting such information from other systems.

5.5.4 Automation

In general, the QC results cannot be analysed manually. The main reason for this is that the latency introduced by non-automatic (i.e. human) checks is too high to allow efficient feedback and remedial action if something goes wrong. Automatic checks can categorise data as good or bad so that the experiment and the O2 systems can react accordingly. The framework will provide a set of common tools to easily check the quality of the monitoring data, for example assessing whether the mean value of a histogram is below a certain limit.
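As an illustration of such a common check, the sketch below flags a histogram whose mean exceeds a limit, using the quality flags listed in Sec. 5.5.3. The function name, the plain-vector histogram representation and the warning threshold are illustrative assumptions, not the O2 QC framework API.

```cpp
#include <numeric>
#include <vector>

// Illustrative quality flags, mirroring the ones listed in Sec. 5.5.3.
enum class QualityFlag { Undefined, Ignored, Good, Bad, Warning };

// Minimal automatic check: flag a histogram whose mean drifts towards a limit.
// 'binCentres' and 'binContents' stand in for whatever histogram type the
// QC framework will eventually provide.
QualityFlag checkMeanBelow(const std::vector<double>& binCentres,
                           const std::vector<double>& binContents,
                           double limit, double warningFraction = 0.9)
{
    if (binCentres.size() != binContents.size() || binCentres.empty())
        return QualityFlag::Undefined;

    const double sumW = std::accumulate(binContents.begin(), binContents.end(), 0.0);
    if (sumW <= 0.0)
        return QualityFlag::Undefined;

    double mean = 0.0;
    for (std::size_t i = 0; i < binCentres.size(); ++i)
        mean += binCentres[i] * binContents[i];
    mean /= sumW;

    if (mean < warningFraction * limit) return QualityFlag::Good;
    if (mean < limit)                   return QualityFlag::Warning;
    return QualityFlag::Bad;
}
```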

Automatic checks on QC objects are important for taking early decisions on the data flow, especially in the case of critical parameters which have reference values or a known behaviour. QC objects, such as histograms, are evaluated by algorithms that produce a clear and discrete value, possibly based on references, representing their level of quality. Such values are important to help shifters evaluate the data quality and decide on the continuation of data taking. In addition, they inform the data flow procedure so that automatic and unambiguous decisions can be made. The automatic evaluation of QC objects is important for critical online processing stages like calibration and reconstruction. At a later stage, it will be used for deciding on the quality of subsequent calibration and reconstruction passes and whether a data taking period is of sufficient physics quality for analysis.

5.5.5 Access to QC results

QC output must be available to Collaboration members worldwide in the form of downloadable and manipulable object files and of interactive web-based items such as histograms.

5.6 Facility control, configuration and monitoring

5.6.1 Overview

The Control, Configuration and Monitoring (CCM) components of the ALICE O2 system act as a tightly-coupled entity with the role of supporting and automating day-to-day operations. The Control system is responsible for coordinating all the O2 processes according to the system status and monitoring data. The Configuration system ensures that both the application and environmental parameters are properly set. Finally, the Monitoring system gathers information from the O2 system with the aim of identifying unusual patterns and raising alarms. Figure 5.8 shows the relationship between the CCM components.

For the global operation of ALICE, the CCM system interfaces with the Trigger and DCS systems to send commands, transmit configuration parameters and receive status and monitoring data (as shown in Fig. 5.9).


Figure 5.8: Overview of the relationship between the CCM systems.

The CCM systems also interface with the LHC to automate certain operations and keep a record of the data taking conditions.

To offload asynchronous data processing tasks to the Grid, the CCM systems dispatch them as normal Grid jobs. This approach has the advantage of reusing the existing and proven Grid middleware, thus minimising O2 software development.

The use of the O2 farm as a Grid facility is controlled by the CCM systems, depending on the availability of resources. The CCM systems only enable or disable the access to the resources, while the actual handling of the Grid tasks is performed by the Grid middleware.


Figure 5.9: CCM interfaces with external systems.

5.6.2 System functions

Control

The Control system starts and stops the processes running on the O2. This includes not only the processes implementing the different functional blocks (data read-out, data reduction, reconstruction, etc.) but also processes providing auxiliary services such as databases and the Domain Name Server (DNS). The Control system also sends commands to running processes according to changing conditions and operational requirements. Typical commands include pausing or resuming ongoing actions, applying a new configuration, or terminating the current action and moving to a standby status. Processes must acknowledge when they receive a command and inform the Control system when the command execution is completed.

The Control system executes and coordinates the different O2 functions. These functions are defined as sequences of actions known as "Tasks".
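The command/acknowledgement exchange described above could look roughly as follows. This is a minimal sketch: the command vocabulary and the types used are assumptions made for illustration and do not represent the actual O2 control protocol.

```cpp
#include <iostream>
#include <string>

// Illustrative command set; the real O2 control vocabulary may differ.
enum class Command { Pause, Resume, Reconfigure, Stop };

struct Ack {
    bool received  = false;   // sent back immediately on reception
    bool completed = false;   // sent back once the action has finished
    std::string state;
};

// A controlled process: it acknowledges reception, executes the action,
// then reports completion, as required by the Control system.
class ControlledProcess {
public:
    Ack handle(Command cmd) {
        Ack ack;
        ack.received = true;                  // first acknowledgement
        switch (cmd) {
            case Command::Pause:       state_ = "paused";       break;
            case Command::Resume:      state_ = "running";      break;
            case Command::Reconfigure: state_ = "reconfigured"; break;
            case Command::Stop:        state_ = "standby";      break;
        }
        ack.completed = true;                 // completion report
        ack.state = state_;
        return ack;
    }
private:
    std::string state_ = "idle";
};

int main() {
    ControlledProcess reco;
    const Ack ack = reco.handle(Command::Pause);
    std::cout << "received=" << ack.received << " completed=" << ack.completed
              << " state=" << ack.state << '\n';
}
```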

The Control system reacts to internal and external events to achieve a high level of automation, thereby reducing the need for human intervention in the experiment's operations. Identified external events include: LHC state changes (stable beams, beam dump) and changes of the ALICE detector status. Internal events include: process failures, a system load greater than a predefined threshold, or non-nominal data quality. Starting and/or stopping data taking activities based on the LHC and detector status are examples of automatic Control system actions.

Configuration

The Configuration system distributes the configuration from a central repository to all processes in the O2 system. The system supports both static configuration, where a process is restarted in order to read the new configuration parameters, and dynamic configuration, where the parameters are loaded on-the-fly, without stopping and restarting the process.
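A minimal sketch of the difference between static and dynamic configuration is given below, assuming a simple key/value representation; the class and method names are illustrative and not part of the Configuration system design.

```cpp
#include <iostream>
#include <map>
#include <string>

// Illustrative key/value configuration; the real repository, keys and
// transport are not specified here.
using Config = std::map<std::string, std::string>;

class Configurable {
public:
    // Static configuration: applied once, before the process starts working.
    explicit Configurable(Config initial) : cfg_(std::move(initial)) {}

    // Dynamic configuration: new parameters are pushed and applied on-the-fly,
    // without stopping the process.
    void onUpdate(const Config& updated) {
        for (const auto& kv : updated) cfg_[kv.first] = kv.second;
        reloaded_ = true;
    }

    std::string get(const std::string& key) const {
        auto it = cfg_.find(key);
        return it != cfg_.end() ? it->second : "";
    }
    bool reloaded() const { return reloaded_; }

private:
    Config cfg_;
    bool reloaded_ = false;
};

int main() {
    Configurable flpReadout({{"link.count", "12"}, {"compression", "on"}});
    // The central repository pushes an update while the process keeps running.
    flpReadout.onUpdate({{"compression", "off"}});
    std::cout << "compression=" << flpReadout.get("compression")
              << " reloaded=" << flpReadout.reloaded() << '\n';
}
```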

The Configuration system takes care of software installation and configuration, including the base Operating System, software packages and configuration files.

Monitoring

All O2 components are capable of providing monitoring parameters, such as heartbeats, processing status and other critical metrics. All these data are collected by the Monitoring system, where they are processed in quasi real time in order to trigger alerts or take automatic corrective actions. This system also aggregates monitoring data streams and persistently stores the relevant metrics to provide high-level views of the system and to support analysis over long periods.
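The heartbeat-based health check mentioned here might look schematically as follows; the metric structure, the names and the timeout value are assumptions made for illustration only.

```cpp
#include <chrono>
#include <iostream>
#include <string>

using Clock = std::chrono::steady_clock;

// Illustrative monitoring record: one metric sample from one O2 process.
struct Metric {
    std::string source;     // e.g. "flp-042" (hypothetical node name)
    std::string name;       // e.g. "heartbeat" or "readout.rate"
    double value = 0.0;
    Clock::time_point stamp = Clock::now();
};

// Minimal quasi-real-time check: a process whose last heartbeat is older
// than 'timeout' is reported as missing, so the Control system can react.
bool heartbeatMissing(const Metric& lastHeartbeat,
                      std::chrono::seconds timeout,
                      Clock::time_point now = Clock::now())
{
    return (now - lastHeartbeat.stamp) > timeout;
}

int main() {
    Metric hb{"flp-042", "heartbeat", 1.0, Clock::now() - std::chrono::seconds(15)};
    if (heartbeatMissing(hb, std::chrono::seconds(10)))
        std::cout << "ALERT: " << hb.source << " failed to report as running\n";
}
```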

The Control system can assess the health of the system in general and trigger actions accordingly, for example if a process fails to report as running or critical services report an error condition.

Logging

Log messages are considered to be part of the monitoring data and therefore use the same infrastructure. Dedicated visualisation tools allow the shift crew and remote experts to display, filter and query log messages, thus providing a global monitoring perspective of all O2 components.

5.7 Detector Control System

The ALICE Detector Control System (DCS) ensures safe, reliable, and uninterrupted operation of the experiment. It also serves as an important communication exchange point, providing vital data for detector operation, physics analysis, and safety systems as well as for external services, including the LHC.

The DCS data is processed in the central Supervisory Control and Data Acquisition (SCADA) system, based on WINCC Open Architecture (WINCC OA), provided by SIEMENS. The core of the control system is autonomous and serves its purpose even in the absence of external systems and services (e.g. during a network outage or external system maintenance).

All devices are continuously monitored by WINCC OA and the acquired values are compared to predefined thresholds. In case of significant deviations from the nominal settings, the SCADA system can take automatic remedial action, or alert the operator.

The operational limits, device settings and configuration parameters are stored in the Configuration database or directly in the WINCC OA systems. Whenever the experiment mode changes, for example from standby to data taking, the SCADA system reads the new settings from the database and reconfigures the control systems and devices accordingly. All parameters tagged for archival are sent to the archival database (Oracle). The stored data is available for later retrieval either directly from WINCC OA or by external clients.

The DCS interacts with external systems and services such as cooling, safety and gas. The acquired information is distributed to the detectors and ALICE systems, and feedback is provided to the services. Synchronisation between the DCS and O2 components is achieved using the Finite-State Machine (FSM) mechanism. The full DCS context is shown in Fig. 5.10.


Figure 5.10: DCS context: the control system interfaces detector devices, external services and infrastructure with the O2.

The DCS field layer consists of all the hardware equipment required for experiment operation. A software-based device abstraction level fills the gap between the field and control (WINCC OA) layers. Its role is to hide the implementation details and provide a standardised communication interface.

The detector conditions data as well as parameters acquired from external systems are transmitted to the O2 at regular time slots via a dedicated FLP. The transmitted data frames contain the full map of all monitored parameters. A majority of the controlled devices have direct connections to the DCS and can be

operated autonomously. Part of the detector frontend hardware is, however, physically linked to the FLPs using GBT links. These devices are accessed by the DCS using dedicated interfaces.

Chapter 6

Technology survey

6.1 Introduction

This chapter gives an overview of available computer hardware and software. Performance characteristics, costs and efficiency are investigated, and criteria are defined for the selection of an appropriate computing platform for the future O2 compute facility. A considerable part of this study is devoted to the identification and development of benchmarks that can be used to determine the applicability of a system for typical workloads.

6.2 Computing platforms

Short and mid-term heterogeneous target architectures with power-efficient, highly parallel performance have been considered. Requirements are indicated for highly portable languages and frameworks that will provide flexibility for future hardware choices.

6.2.1 Input-Output (I/O)

The PCI-Express (PCIe) bus is the state-of-the-art I/O bus in recent server PCs [1]. Almost any interface to the outside, like network interfaces, graphics accelerators and even storage devices, can be connected via PCIe. The standard has been available for several years and has evolved continuously. Each new generation has come with increased bandwidth capabilities while retaining backward compatibility down to the very first release of the specification. In the last few years, the PCIe root port has been moved from near-CPU I/O hubs (Nehalem architecture) into the CPUs themselves (SandyBridge / IvyBridge architecture), boosting the performance of the bus. An overview of the evolution of the PCIe standard is shown in Tab. 6.1. The first generation of PCIe uses a link rate of 2.5 Gb/s per lane. This corresponds to a theoretical raw throughput of about 250 MB/s per lane. With Gen2, the link rate as well as the throughput per lane were doubled. The Gen3 progress comes mostly from a change of the serial encoding of the data, allowing a further doubling of the bandwidth, and to a lesser extent from the link rate, which increased only by a factor of 1.6. Typical PCIe Gen2 or Gen3 devices have 8 or 16 parallel lanes. The bandwidth of the bus that is actually usable for payload transfer is about 70-80% of the raw bandwidth.
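For illustration, the short program below reproduces the raw per-lane and x16 throughput figures of Tab. 6.1 from the link rate and the serial encoding; it is a worked example of the numbers quoted above, not a benchmark.

```cpp
#include <cstdio>

// Raw per-lane throughput = link rate x encoding efficiency / 8 bits,
// and a x16 device has 16 such lanes. The usable payload bandwidth is a
// further 70-80% of these numbers, as stated in the text.
int main() {
    struct Gen { const char* name; double linkGbPerS; double encoding; };
    const Gen gens[] = {
        {"Gen1",  2.5,   8.0 / 10.0},    // 8b10b
        {"Gen2",  5.0,   8.0 / 10.0},    // 8b10b
        {"Gen3",  8.0, 128.0 / 130.0},   // 128b130b
        {"Gen4", 16.0, 128.0 / 130.0},
    };
    for (const Gen& g : gens) {
        const double perLaneMBs = g.linkGbPerS * 1e9 * g.encoding / 8.0 / 1e6;
        std::printf("%s: %4.0f MB/s per lane, %5.2f GB/s raw for x16\n",
                    g.name, perLaneMBs, perLaneMBs * 16.0 / 1000.0);
    }
}
```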

At the time of writing, standard server and desktop PCs already come with PCIe Gen3. The specification for this generation was announced in 2007 and finally released in 2010. Typically, the industry has had to wait a few months for the first devices with the new generation bus to be released and another few months to resolve initial bugs or incompatibilities. The next generation of the PCIe standard, Gen4, has already been announced and its specification release is expected for late 2015 [2]. Extrapolating from past experience, Gen4 devices should be usable in production environments in 2016/2017.


Table 6.1: Evolution of the PCI Express standard [1].

                                  Gen1    Gen2    Gen3      Gen4
Specification Announcement         -       -      2007      2011
Specification Release              -      2006    2010      late 2015
First Products                     -      2007    2012      ?
Per Lane Bitrate (Gb/s)           2.5     5.0     8.0       16.0
Encoding                          8b10b   8b10b   128b130b  -
Per Lane Raw Throughput (MB/s)    250     500     985       1969
x16 Raw Throughput (GB/s)         4.0     8.0     15.75     31.51
x16 Payload Throughput (GB/s)     ~3.5    ~7.0    ~13.5     ?

The throughput of PCIe Gen2 and Gen3 links has been extensively measured during the development of the RUN 2 read-out board (C-RORC), the evaluation of possible future read-out boards (i.e. RORC3) and the characterisation of GPUs. The plot in Fig. 6.1 shows measurements on a dual-socket IvyBridge machine with two C-RORCs (PCIe Gen2 x8) and one GPU (PCIe Gen3 x16). The blue points show the DMA throughput performance of a single C-RORC board while the second C-RORC and the GPU were idle. The red points show the same measurement with the second C-RORC also transmitting data via DMA at its full speed. Even when enabling high speed I/O transactions with the GPU while both C-RORCs are active, the performance of the devices was not affected. In this test, a combined throughput of approximately 17 GB/s distributed over 3 PCIe devices was measured in the system.

Figure 6.1: DMA to host performance for multi-device PCIe operation using two C-RORCs and a GPU.

The plot in Fig. 6.2 shows the throughput measurement of a Xilinx Virtex-7 XC7VX330T FPGA board with an 8-lane PCIe Gen3 interface, using the Xilinx example design for PCIe DMA on a Supermicro X9SRE-F machine. This design already provides a throughput of 5-6 GB/s both from FPGA to host and from host to FPGA. The expected input/output data rates on the FLP and EPN nodes of the O2 system are within the capability

Figure 6.2: DMA throughput measurements on a Xilinx Virtex-7 FPGA with a PCIe Gen3 x8 interface.

range of PCIe Gen3. Today's bus technology is sufficient for the development of the system without relying on future developments of PCIe generations. The backward compatibility of future PCIe generations will still permit the use of hardware developed and tested for earlier generations, making possible the use of accelerator cards like the NVIDIA Kepler, the Intel Xeon Phi, the AMD FirePro and FPGA-based processing boards in the O2 system.

6.2.2 Memory bandwidth

DDRx memory is the standard for host memory in current computers. Current Intel Ivy-Bridge CPUs support DDR3 at 1866 MHz, achieving about 15 GB/s per memory channel. With four channels per processor, the aggregate for a standard dual-socket server is 120 GB/s (15 GB/s × 4 channels × 2 sockets). After the integration of the memory controllers in the CPU, the first processor families showed significant NUMA (Non Uniform Memory Architecture) effects if a CPU core accessed memory connected to the other CPU socket. These effects are significantly attenuated with Intel Ivy-Bridge CPUs, and since event-based parallelism can be used in ALICE, local memory will suffice in most situations. NUMA benchmarks have been developed to measure these effects and will be the object of continued observation.

6.2.3 FPGA data flow online processing

FPGAs have a long history of data processing for experiments in HEP, from the handling of low-level protocols up to online processing and first event building tasks like clustering. Every new FPGA generation comes with an increased device size and, as a consequence, a larger number of more complex algorithms can be implemented in the hardware to save processing resources later on. Examples include existing hardware cluster finders and algorithms for future data compression. Until now, these algorithms have been described using low-level hardware description languages like VHDL or Verilog. Low-level languages are well suited for describing interface blocks like PCIe, DRAM controllers or serial optical links. However, development in these languages is expensive for processing algorithms described at the data flow or algorithmic level. Complex pipeline architectures with hundreds of stages can lead to code that is

Figure 6.3: Evolution of the maximum throughput to DDRx memory over the last years.

hard to read, while optimised processing modules with differing latencies cannot be integrated easily. Every new FPGA generation makes it harder to make efficient use of the available resources with low-level hardware description languages. Maintenance and modification of this kind of code is a complex task. Fortunately, in recent years, frameworks for pipeline generation have become available that can produce an optimised pipeline architecture from a high-level data flow description and simplify the development and maintenance of the code. Furthermore, several examples have shown that code generated from a high-level framework produces better results than handwritten code in a low-level language. Employing such techniques in HEP can dramatically reduce the design effort of developing firmware while producing more efficient hardware. First evaluations of a data flow implementation of the FastClusterFinder algorithm used during RUN 1 have shown a comparable FPGA resource usage with a significantly reduced code volume compared to the plain VHDL implementation. These techniques can be applied to preprocessing steps in an FPGA-based RORC3 or as a co-processing component in the FLP nodes to ease the demands on the FLP and EPN processing capacities.

6.2.4 CPUs

This section gives a short summary of the existing CPU architectures and discusses the perspectives for Run 3.

– AMD Opteron / Intel Xeon: These x86 architectures have been the natural choice since they are widely supported and very generically usable.

The Xeon processors are typically used in 2-socket systems, with 4-16 cores per CPU today. Each core may run 2 threads. It is not expected that the number of cores will increase significantly by 2018: systems with 2 CPUs are likely to have between 20 and 40 cores. New CPU features are expected to include wider vector instructions, going from 256 bits now to 512 bits later. Today's highest clock speeds are close to 4GHz in the systems with fewer cores and are not likely to increase. On CPUs with more cores, the clock speed is actually reduced to 2-3 GHz when all cores are active, dynamically boosting close to 4GHz when the thermal dissipation allows it, as when only a few threads are running. By 2018, the performance of a single core will therefore increase through new features like wider vectors rather than through higher clock speeds. Assessing the per-core performance of a top CPU today could be a good basis for estimating that of a single core of a mainstream CPU in 4 years' time. There is no evidence that this would be more than 30-50%. Significant improvements in the performance of existing code will therefore be possible only by

using the new features provided with each generation of CPUs.

– Atom: This is an x86-based architecture with a focus on low power consumption and low cost production. The architecture might therefore be an interesting alternative to the typical Intel Xeon / AMD Opteron CPUs.

Although a single Atom core is much slower than a Xeon core, the price/performance ratio can be up to 3 times better in some applications [3]. As Atom processors are smaller and optimised for power, they can be packed into a high density system, like the HP Moonshot (45 servers in 4.3 rack units).

– ARM: This is the main architecture used in mobile and embedded systems today. ARM systems have a very low power-to-performance ratio, especially nodes with an integrated GPU.

– AMD Fusion: The AMD Fusion APU (Accelerated Processing Unit) is a high efficiency SoC combining an AMD x86 processor and an AMD graphics processing unit. The x86 part (CPU) is derived from the Opteron designs but with fewer cores and smaller caches, since the GPU part (see Sec. 6.2.5) is the primary processing device.

6.2.5 Accelerator architectures

– AMD Fusion: The GPU part of the Fusion System on a Chip (SoC) is derived from the standard AMD GPU cores. The difference is that both CPU and GPU have direct access to the host memory. Compared to traditional accelerators, the memory is thus larger but slower.

– AMD GPU: This architecture provides thousands of low-power cores for highly parallelised performance and for offloading computation from the CPU. Programming is done via OpenCL or OpenACC compiler directives.

– NVIDIA GPU: This architecture provides thousands of low-power cores for highly parallelised performance and offloading computation from the CPU. Programming is achieved via CUDA, OpenCL, or OpenACC compiler directives.

– Xeon Phi: This architecture is a low-power, many-core coprocessor for highly parallelised performance. The Xeon Phi can be used as an accelerator, similar to a GPU. In addition, it can be used as just another compute node in the system since it runs a Linux OS. For these reasons, it can be programmed both via offload models such as OpenMP or OpenCL, and directly in C++.

The current Xeon Phi (Knights Corner, KNC) is a PCIe board with 54 cores. The next generation (Knights Landing, KNL), announced for the end of 2014, is planned to have 72 Atom cores (2-3 times faster than the existing KNC cores), able to run 4 threads per core, with 16GB of on-board MCDRAM working at 500GB/s, support for DDR4, and 512-bit registers (AVX-512 instruction set). It will also be available as a standalone CPU (or as a co-processor, as now). According to [4], an integrated host interface fabric will be available for 2015, with on-board QSFP possibly up to 100Gb/s.

6.2.6 Selection criteria

The selection of the hardware components will have to be made at the appropriate time. Using a portable programming model can reduce the dependence of the code development on specific hardware details. Deferring the hardware purchase will give the benefit of improved hardware functionality. The following hardware selection criteria will be used:

– Total Cost of Ownership: The combined cost of purchase, operation, and maintenance.

– Power Consumption: The power consumption of the system under a typical workload.

– Programmability: Flexibility for the selection of programming languages and frameworks which meet our software criteria.

– Reliability: Stability of the device driver, stability of the development tools, and expected failure rate of the hardware.

– Performance Benchmarks: The high-level performance benchmarks described in Sec. 6.3 can be used to normalise the above criteria.

6.3 High-level benchmarks

The high-level benchmarks are based on typical reconstruction algorithms used by the ALICE experiment.

6.3.1 FPGA-based Online Data Preprocessing

The FastClusterFinder algorithm, operating on TPC raw data, is an FPGA-based online processing core for extracting clusters and computing their properties [5]. The FastClusterFinder was successfully used in the H-RORC firmware as part of the HLT online reconstruction chain during RUN 1. The FastClusterFinder is designed so that it can handle the full bandwidth of data transmitted via the DDL. Its design follows the paradigm of a data flow architecture, streaming the data through the different processing sub-stages and performing all necessary calculations in parallel. Since there are no feed-back loops or branches, the design can be highly pipelined and parallelised. This allows a more efficient implementation than is possible on a CPU and easily compensates for a CPU's significantly higher clock rate.

Figure 6.4 shows the performance of the FastClusterFinder compared to a software implementation. The measurements are based on about 5000 heavy-ion event fragments recorded during RUN 1. The blue points show the processing time required on a recent Ivy Bridge CPU running at 3 GHz for events of various input sizes. This is the same server hardware as used for the compute nodes of the HLT during RUN 2. The processing time is linearly dependent on the input data size. The green points show the processing time of the same raw data run through the FastClusterFinder hardware design with a DDL2 input link running at 4.25 Gb/s. The hardware processing time is, by design, limited only by the bandwidth of the input link. As the data is fed through the FPGA anyway, the FastClusterFinder only induces an additional latency of a few microseconds to the read-out. The measurements of the FastClusterFinder for DDL1 as used in RUN 1 are shown in red for reference. As DDL1 has only half the input bandwidth, its processing time is double that of DDL2.

The FastClusterFinder for DDL2 is approximately 25 times faster than the software implementation. To achieve the same processing rate with the software implementation, 25 CPU cores would be needed for each DDL link. With 6 FastClusterFinder instances on each C-RORC for RUN 2, each C-RORC therefore saves the equivalent of 150 CPU cores (6 links × 25 cores) for the HLT.

There is a strong argument for using FPGAs in the data path because of their processing capabilities: they can significantly reduce the number of CPUs required in the O2 facility.

6.3.2 HLT TPC Track Finder

The HLT TPC Track Finder uses the Cellular Automaton principle to construct track seeds and then uses a (simplified) Kalman filter to fit tracks to the seeds and extend the seeds to full tracks in the TPC volume. It is implemented for OpenMP (CPU), CUDA (NVIDIA GPU), and OpenCL (AMD GPU), with all versions sharing a common source code. The Track Finder process consists of many sub-steps, each with its own computational hotspots, making it a benchmark covering a mixture of compute throughput, memory bandwidth, and memory latency. The benchmark permits a comparison of the execution time on different GPU and CPU systems, as shown in Fig. 6.5.

Figure 6.4: Performance of the FPGA-based FastClusterFinder algorithm for DDL1 and DDL2 compared to the software implementation on a recent server PC. The plot on the top of the figure shows the required processing time relative to the event size. The one on the bottom shows the same measurements as a rate equivalent.

6.3.3 HLT TPC Track Fitter (HLT Track Merger)

The HLT TPC Track Fitter runs on top of the HLT TPC Track Finder and performs a refit of the tracks using the Kalman filter. It is implemented in OpenMP (CPU) and CUDA (NVIDIA GPU). An OpenCL version will be developed for performance analysis with AMD GPUs. In contrast to the Track Finder, it uses the full Kalman filter, making it mostly limited by compute throughput.

6.3.4 ITS Cluster Finder

The ITS Cluster Finder algorithm for the Inner Tracking System detector identifies and computes the centre-of-gravity coordinates of adjacent hits on the pixel chips, grouped into clusters. The input data consist of a list of hits, a few hundred per chip: integer 2D coordinates, ordered by row and column. The output data is a list of clusters

Figure 6.5: Tracking time of the HLT TPC CA tracker on a Nehalem CPU (6 cores) and an NVIDIA Fermi GPU.

of a few tens per chip: floating point 2D coordinates. This algorithm is interesting because of the small input/output data sets (∼1kB), the ease of implementation and its self-consistency (no external library needed). It can therefore be ported to a number of different devices, including ones with limited resources. The algorithm can easily be run in parallel, with one pixel chip processed per thread. Vectorisation is quite limited due to the serial nature of the operations to be performed on the incoming data: decisions on a pixel depend on what comes next in the stream. Implementations for x86 and GPU exist and can be a good basis of comparison for current and future devices with this type of workload. Fig. 6.6 shows the performance on a dual-socket IvyBridge server, capable of 7kHz on the simulated data sample from 430 inner chip modules. This represents around 1/50th of the full detector but a larger fraction of the data, as the occupancy greatly decreases in the outer modules.

The same algorithm running on the full detector data set, assuming a noise level of 10^-5, is able to process events at 500Hz on the 12-core reference server. Around 1200 cores (50kHz / 500Hz × 12 cores) would therefore be needed for the ITS cluster finder for Pb–Pb at 50kHz.
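A naive, single-threaded version of such a per-chip cluster finder is sketched below to illustrate the algorithm described above (grouping of adjacent hits and centre-of-gravity calculation). It is not the ALICE implementation; the data structures and the O(n²) neighbour search are simplifications.

```cpp
#include <cstdio>
#include <vector>

// One fired pixel on a chip: integer row/column, as described in the text.
struct Hit { int row, col; };
// A reconstructed cluster: centre of gravity in floating point.
struct Cluster { float row, col; int size; };

// Naive per-chip cluster finder: group hits that touch each other
// (8-connectivity) and return the centre of gravity of each group.
std::vector<Cluster> findClusters(const std::vector<Hit>& hits)
{
    std::vector<Cluster> clusters;
    std::vector<bool> used(hits.size(), false);

    for (std::size_t seed = 0; seed < hits.size(); ++seed) {
        if (used[seed]) continue;
        std::vector<std::size_t> stack{seed};
        used[seed] = true;
        double sumR = 0, sumC = 0;
        int n = 0;
        while (!stack.empty()) {
            const std::size_t i = stack.back();
            stack.pop_back();
            sumR += hits[i].row;
            sumC += hits[i].col;
            ++n;
            for (std::size_t j = 0; j < hits.size(); ++j) {
                if (used[j]) continue;
                const int dr = hits[i].row - hits[j].row;
                const int dc = hits[i].col - hits[j].col;
                if (dr >= -1 && dr <= 1 && dc >= -1 && dc <= 1) {
                    used[j] = true;
                    stack.push_back(j);
                }
            }
        }
        clusters.push_back({float(sumR / n), float(sumC / n), n});
    }
    return clusters;
}

int main() {
    // Two touching pixels and one isolated pixel -> two clusters.
    const auto clusters = findClusters({{10, 20}, {10, 21}, {40, 5}});
    for (const auto& c : clusters)
        std::printf("cluster: row=%.1f col=%.1f size=%d\n", c.row, c.col, c.size);
}
```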

Compared to the x86 ITS cluster finder, the GPU implementation is one order of magnitude slower for this computing task, which has too few floating point operations for this architecture. On the other hand, the algorithm benefits from the out-of-order execution and pipelining of the x86.

6.4 Low-level benchmarks

The low-level benchmarks are not intended to select the fastest hardware but to identify possible bottlenecks of the hardware components in a controlled environment.

6.4.1 PCIe

The PCIe benchmark shows that PCIe yields a single-duplex bandwidth of 75-90% of the specified peak bandwidth, which is sufficient for the ALICE workload. This benchmark is not intended to find the hardware with the highest peak bandwidth, but to exclude hardware that cannot deliver sufficient bandwidth under the


Figure 6.6: ITS cluster finding performance on simulated data for 430 inner chip modules, as a function of the number of processing threads running.

following conditions:

– In full duplex mode

– Using multiple PCIe cards (at the same and at different root complexes) to achieve full performance in concurrent transfers

– With non-linear transfers (i.e. submatrix transfers)

– In a system with a high memory load caused by other applications

6.4.2 Compute benchmarks (DGEMM / matrix library based)

Optimised DGEMM (Matrix-Matrix Multiplication) implementations usually achieve close to peak performance, making this a simple method to test the peak achievable compute throughput under optimal conditions. Most vendors deliver optimised DGEMM libraries, so only a small effort is required to run this benchmark.
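The sketch below shows how the GFLOP/s figure of such a benchmark is obtained (2·N³ floating point operations per multiplication); a real measurement would call an optimised vendor DGEMM rather than this naive triple loop.

```cpp
#include <chrono>
#include <cstdio>
#include <vector>

// Naive C = A * B timing loop, only to illustrate how the GFLOP/s figure is
// computed. An optimised vendor DGEMM (called through a BLAS interface)
// gets much closer to the hardware peak than this simple loop.
int main() {
    const int N = 512;
    std::vector<double> A(N * N, 1.0), B(N * N, 2.0), C(N * N, 0.0);

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < N; ++i)
        for (int k = 0; k < N; ++k) {
            const double a = A[i * N + k];
            for (int j = 0; j < N; ++j)
                C[i * N + j] += a * B[k * N + j];
        }
    const std::chrono::duration<double> dt =
        std::chrono::steady_clock::now() - start;

    const double flop = 2.0 * N * N * double(N);
    std::printf("N=%d  time=%.3f s  %.2f GFLOP/s\n",
                N, dt.count(), flop / dt.count() / 1e9);
}
```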

6.4.3 memcpy() benchmark

The memcpy() benchmark is a simple yet valid test to measure the speed reached when copying a large (> 1GB) chunk of memory. This metric has proved to be useful for estimating the global throughput of intra- or inter-thread communication. It helps to plan the balance between I/O and CPU resources. On [email protected] and [email protected], a throughput of 2.1GB/s was measured. Experience shows that at least 3 threads are necessary to saturate a 40Gb/s link.
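A minimal version of this benchmark is sketched below; the buffer size and repetition count are arbitrary illustrative choices.

```cpp
#include <chrono>
#include <cstdio>
#include <cstring>
#include <vector>

// Times std::memcpy on a large buffer, as described above.
int main() {
    const std::size_t size = std::size_t(1) << 30;   // 1 GiB
    std::vector<char> src(size, 1), dst(size, 0);
    const int repetitions = 5;

    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i)
        std::memcpy(dst.data(), src.data(), size);
    const std::chrono::duration<double> dt =
        std::chrono::steady_clock::now() - start;

    const double gib = double(size) * repetitions / (1024.0 * 1024.0 * 1024.0);
    std::printf("copied %.1f GiB in %.2f s -> %.2f GiB/s\n",
                gib, dt.count(), gib / dt.count());
}
```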

6.4.4 NUMA Benchmark

This benchmark was developed to execute different memory usage patterns on all possible memory addresses in a system [6]. For a NUMA (non-uniform memory access) system, it will show the different NUMA regions in the address ranges and how much the CPU interconnect lowers the available memory performance. Figure 6.7 shows the total load and store bandwidth for a test that reads values from memory, adds 1 to the values, and stores them back to the same address. Every value has to move over the memory bus twice: once for the load and a second time for the store. A bandwidth of 40 GB/s thus implies that 20 GB of memory were modified in one second.

Figure 6.7: Throughput of the "Add One" test of NUMAbench on a dual socket Intel Xeon SandyBridge system with 64 GiB RAM (DDR3 1600 MHz).

Other Benchmarks

FPGA Benchmarks: FPGA benchmarks will be included as far as possible, as long as the requestor contributes the FPGA code and a CPU code for comparison. FPGA implementations will not be made for comparison with the benchmarks listed above.

Network Benchmarks: Simple benchmarks will be made to measure latency and bandwidth for Ethernet and Infiniband (native and IPoIB). This is mostly to see whether the architectures can fulfil the network demands. The network design of the future compute farms is outside the scope of this study.

6.5 Performance Conversion Factors

Conversion factors are used to estimate the performance gain of multi-core (with Hyperthreading) or GPU implementations. In each case two factors are defined, representing the performance increase and the increase in CPU usage. The factors are defined as follows:

– Speedup of Multithreading: With n1 the runtime on 1 core and n2 the runtime on x cores, the factors are defined as n1/n2 and x/1.

– Speedup of Multithreading with Hyperthreading: With n2' the runtime with x' threads on x cores (x' > 2x), the factors are defined as n1/n2' and x'/1. We consider that this will not scale linearly, but it is needed to compare single-thread performance to full CPU performance.

Table 6.2: Results of the TPC Track Finder benchmark implemented on different computing platforms from different vendors: in OpenMP for CPUs (upper part) and CUDA / OpenCL for GPU (lower part).

CPU platform         Clock (GHz)  Cores    Threads  Hyperthreading     Time per event (ms)  Factors
Intel Nehalem          3.6          4        1                           3921
(smaller event)                             4                           1039                 3.77 / 4
                                            12       x = 4, x' = 12     816                  4.80 / 4
Intel Westmere         3.6          6        1                           4735
                                            6                           853                  5.55 / 6
                                            12       x = 6, x' = 12     506                  9.36 / 6
Intel Sandy-Bridge     2.0         2 × 8     1                           4526
(dual socket)                               16                          403                  11.1 / 16
                                            36       x = 16, x' = 36    320                  14.1 / 16
AMD Magny-Cours        2.1         2 × 12   36       x = 24, x' = 36    495

GPU platform            CPU cores used by the   Time per event  Compared to Sandy-Bridge system
                        GPU for data transfer   (ms)            Factor vs. x' (full CPU)  Factor vs. 1 (1 CPU core)
NVidia GTX580                  3                 174             1.8 / 0.19                26 / 3 / 23
NVidia GTX780                  3                 151             2.11 / 0.19               30 / 3 / 27
NVidia Titan                   3                 143             2.38 / 0.19               32 / 3 / 29
AMD S9000                      3                 160             2 / 0.19                  28 / 3 / 25
AMD S10000 (dual GPU)          6                 85              3.79 / 0.38               54 / 6 / 48

Table 6.3: Results of the TPC Track Fit Benchmark on different computing platforms from different vendors: in OpenMP for CPUs (upper part) and CUDA / OpenCL for GPU (lower part).

CPU platform      Clock (GHz)  Cores  Threads  Hyperthreading    Time per event (ms)  Factors
Intel Westmere      3.6          6      1                         125
                                        12      x = 6, x' = 12    17                   7.36 / 6

GPU platform     CPU cores used by the   Time per event  Compared to Sandy-Bridge system
                 GPU for data transfer   (ms)            Factor vs. x' (full CPU)  Factor vs. 1 (1 CPU core)
NVidia GTX580           0                 7               2.5                       18.4 / - / 18.4

– Speedup of GPU (Factor vs. x', full CPU): The GPU will be compared to the full CPU, and the comparison shall also take into account the CPU resources occupied by the GPU implementation. With the GPU implementation requiring y CPU cores and having a runtime of n3, the factors are n2'/n3 and y/x.

– Speedup of GPU (Factor vs. 1, i.e. how many CPU cores the GPU saves): Using y CPU cores, the GPU achieves a speedup of n1/n3, thus saving n1/n3 - y cores. The factors are n1/n3 and n1/n3 - y.

Table 6.4: Result of the Matrix Multiplication Benchmark on different computing platforms from different vendors: in OpenMP for CPUs (upper part) and CUDA / OpenCL for GPU (lower part). DGEMM scales completely linearly on CPUs, only GPU factors are stated.

CPU platform         Clock (GHz)  Cores   Performance (GFLOP/s)
Intel Sandy-Bridge     2.2          8       180
AMD Magny-Cours        2.1          12      180
                                            270

GPU platform   CPU cores used by the   Performance   Compared to Sandy-Bridge system
               GPU for data transfer   (GFLOP/s)     Factor vs. x' (full CPU)  Factor vs. 1 (1 CPU core)
AMD S10000            0                 2900          10.7 / 0.5                -

Table 6.5: Result of the PCI Express benchmark on a dual 8-core Sandy-Bridge system. Implemented in CUDA / OpenCL.

                                       Unidirectional bandwidth (1 GPU) (GB/s)   Aggregate bandwidth (GB/s)
4 × GTX580 (half duplex, PCIe Gen2)      3.63                                      29.1
2 × S9000 (full duplex, PCIe Gen3)       7.90                                      31.6

Table 6.6: Result of the Infiniband Network Benchmark. Tested with Mellanox ConnectX3 on dual-socket Sandy-Bridge.

                            Unidirectional bandwidth (GB/s)
QDR - Native IB Verbs         3.9
FDR - Native IB Verbs         5.6
FDR - IPoIB TCP Transfer      2.5

6.6 Programming models

6.6.1 Compute-Kernels

Since computational hotspots are usually limited to small parts of the code, it is possible to use different paradigms for code of different computational effort. Given this heterogeneous platform, it is reasonable to keep generic code running on standard processors. Computational hotspots, however, should use kernel code (here, "kernel code" refers to a computational kernel, not the OS kernel) for parallel hardware to achieve a general application speedup. The programming models and hardware models for this kernel code must be investigated thoroughly.

Generic code, requiring little computational effort, must focus on maximum readability and maintainability, while code with a larger computational effort must be written with vectorisation and parallelisation in mind (Fig. 6.8). Nevertheless, all code should use data structures that have been defined with parallelisation in mind in order to avoid additional conversions. Kernel code, being burdened with extreme computational effort, must follow the strictest guidelines to make it compatible with the restrictions imposed by the languages and libraries providing the most computational power (Fig. 6.9).


Figure 6.8: The computational hotspot often lies within a few lines of code, while the majority of the code is largely irrelevant from a performance perspective.


Figure 6.9: Separating the hotspots into kernel code allows the paradigms to be differentiated.

6.6.2 Programming languages and frameworks

Some languages and frameworks are presented below. The focus is on known high performance computing applications suitable for the ALICE upgrade. Although not exhaustive, this selection would permit the development of software for all the types of hardware described in Sec. 6.2.

Languages & frameworks for compute-kernels

The following languages and frameworks are being considered for use in the O2 project:

– CUDA / OpenCL: These are considered the two most important languages for GPU (accelerator) programming with low-level control. Since it is possible to write kernel code that is straightforward to convert between CUDA and OpenCL, the two are listed as one item. CUDA is a set of


language extensions and directives for the management of NVIDIA GPUs [7]. OpenCL is a portable, open, royalty-free language standard plus API for parallel programming of heterogeneous systems including GPUs and other devices [8].

– OpenACC / OpenMP 4: It is anticipated that OpenACC [9] will be incorporated within future releases of OpenMP [10]. OpenACC is an open, directive-based standard for parallel programming of CPU and GPU systems, with future support of the Xeon Phi anticipated. The OpenACC programming model is relevant because small problems can be brought to the accelerator via directives placed within the high-level language (such as C++). OpenMP is a set of language extensions plus an API for multi-threading, or shared memory multiprocessing, on multi-core CPUs and accelerators. A minimal directive-based example is sketched after this list.

– Vc: The Vc library enables the expression of data-parallelism used for vectorisation in a portable and efficient way. It does so by extending the C++ type system with the necessary types to abstract SIMD hardware features. The library is important for high performance code on the CPU and makes porting to the Xeon Phi easy.

– Upcoming C++ concurrency/parallelism technical specifications (TS): The C++ standards committee is investing in native support for (highly) parallel code that can (in theory) be executed on accelerators. This can supersede developments like C++ AMP in a portable way. A TS for this work may appear in 2015.
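As a minimal illustration of the directive-based style referred to in the OpenACC / OpenMP item above, the following loop is multi-threaded and vectorised with a single OpenMP 4 pragma. The "calibration" it performs is invented purely for this example.

```cpp
#include <cstdio>
#include <vector>

// The loop body is the computational kernel; the pragma asks the compiler
// to multi-thread and vectorise it. Compile e.g. with: g++ -fopenmp example.cpp
int main() {
    std::vector<float> charge(1000000, 1.0f);
    const float gain = 1.02f;          // illustrative calibration constant
    const std::size_t n = charge.size();

    #pragma omp parallel for simd
    for (std::size_t i = 0; i < n; ++i)
        charge[i] *= gain;

    std::printf("first calibrated charge: %f\n", charge[0]);
}
```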

The following languages and frameworks have been excluded:

– OpenGL compute: This interface is most interesting for code that is already based on OpenGL. No further investigation was done to test its applicability to ALICE workloads.

– OpenHMPP: OpenHMPP only has proprietary compiler support. OpenACC is the better alternative.

– OpenMP 3: OpenMP in version 3 does not have support for offloading to accelerator hardware or for SIMD loops. OpenMP 4 will be supported in upcoming C++ compilers and is the preferred version.

– C++ AMP: This interface is available for Windows only. Alternatives are in development for standard C++ or via directives such as OpenACC and OpenMP 4.

Languages & frameworks for generic code

– C++11: Most of the generic code will be written in standard C++. We require the C++ version to be at least C++11 because it introduced standardised thread support and a suitable memory model.

– Vc: Vc will be important to create data structures suitable for vectorisation on the CPU. Also, some of the generic code may be vectorised, and Vc provides an easy interface.

– OpenMP 4: OpenMP can be used as a portable solution for parallelised generic code. Especially if OpenMP 4 is used for kernel code, it will be a natural choice throughout the whole code base.

– OpenCL: OpenCL is primarily considered for kernel code; however, since it can also target CPUs, it might be useful for generic code, too.

6.7 Networking

Today's high performance networking technology is dominated by two standards: Ethernet and Infiniband. Ethernet network interface cards are available for 10 and 40Gb/s. Transferring data over an

Ethernet network using the TCP/IP protocol requires a substantial CPU capacity: a data transfer at a sustained bandwidth of 10Gb/s uses one CPU core at 50%, and the full use of a 40Gb/s port requires 2 cores. Infiniband Host Channel Adapters (HCA) are available for 10, 20, 40 and 56Gb/s. Data transfer over an Infiniband link using the TCP/IP protocol (IP over IB) also uses a substantial amount of CPU. Remote Direct Memory Access (RDMA) has been developed to reduce the CPU usage. RDMA allows server-to-server data movement directly between application memories without any CPU involvement. Both standards have plans to increase their maximum bandwidth to 100Gb/s and above [11][12]. A new technology, Omni-Path, has recently been announced by Intel, aiming at bandwidths of 100Gb/s and more [13]. This technology, initially available as discrete components in 2015, will also be integrated into the next-generation Intel Xeon and Xeon Phi processors, providing more effective direct I/O into the processor.

6.8 Data storage

6.8.1 Data storage hardware

The O2 local data storage system must be scaled to absorb the maximum data rate, which is reached at the beginning of an LHC fill. Two points must be taken into consideration: the storage type with its performance, and its attachment to the data initiator. For attachment types that have enough storage bandwidth behind them to absorb the rate, the following rates are considered acceptable:

– SAS: 12Gb/s

– Fibre Channel: 16Gb/s, 32Gb/s in preparation

– iSCSI: depending on the network speed, 10Gb/s (Ethernet), minus protocol overhead

– Infiniband: 40Gb/s for DAS, minus protocol overhead

Present day SAS disks can exceed a sustained throughput of 120MB/s, so substantial performance can already be achieved with relatively few disks in a logical volume (RAID set, Dynamic Disk Pools (DDP)); 8 disks in a RAID-6 set can already deliver more than 1GB/s, as shown in Fig. 6.10. The performance of a logical volume can be increased by adding disks, up to a maximum of about 20. With more disks, the setup becomes very inefficient.

The block size used for storage transactions is another critical factor; see Fig. 6.11, which shows the storage throughput versus block size for three different common file systems on the same storage volume.

These measurements were taken on a pure flash storage system optimised for full throughput down to 2kB blocks. The present ALICE DAQ storage system already provides more than 20GB/s of data throughput. Future developments will almost certainly provide easier and more compact solutions. Another critical requirement is that the O2 data storage capacity has to be large enough to store the data of a complete year of ALICE data taking. At the time of writing, the first disks with a storage density of 1Tb/in² for a total capacity of 8TB are on the market. The clear trend of research is to find ways to increase the storage density to 4Tb/in² by 2021 [14][15]. If the industry predictions (see Fig. 6.12) are correct, hard disks with a storage capacity increased by a factor of 2 to 3 will be available at the time of procurement.

6.8.2 Cluster file system

Several solutions are envisaged for accessing the data storage hardware from all EPNs and DMs. Network Attached Storage (NAS) systems, for example the Network File System (NFS), could provide the required functionality but impose severe restrictions on performance and redundancy. For this reason ALICE is


Figure 6.10: Performance using different numbers of disks in a RAID set.


Figure 6.11: Storage performance using different block sizes.

considering different approaches such as the Cluster File System (CFS), already used during RUN 1. The architecture of a CFS differs from a traditional NAS by the fact that all clients can access all storage elements directly, without directing the data traffic through a central server.

In a CFS, the central server plays the role of a Meta Data controller (block allocation, file directory) while the clients access the storage elements through the network. The CFS architecture is a long-standing industry standard for all data-throughput-intensive applications. For Runs 1 and 2, ALICE chose Quantum StorNext as the CFS base as it was a mature product; it still meets the present performance needs. With the arrival of Big Data and the evolution of networks and storage technologies over recent years, other


Figure 6.12: Prediction of hard disk storage density by the Advanced Storage Technology Consortium (ASTC). The evolution as a function of time shows the impact of the successive technologies: Perpendicular Magnetic Recording (PMR), PMR with Two Dimensional Magnetic Recording (TDMR) and/or Shingled Magnetic Recording (SMR) (PMR+), Heat Assisted Magnetic Recording with TDMR and/or SMR (HAMR+), Bit Patterned Magnetic Recording (BPMR) with TDMR and/or SMR (BPMR+), and Heated-Dot Magnetic Recording (HDMR).

interesting products are available that are also suited to address the requirements of ALICE for Run 3. In particular, the Intel HPC file system Lustre, presently used in many computing centres, has become a serious candidate, mostly due to its flexibility in supporting many different technologies for storage platforms and networking.

6.8.3 Object stores

"Object stores", also known as "BLOB stores" or "software-defined storage", address high storage demands for unstructured and immutable data. These stores have already reached capacities of tens of petabytes. They are installed as production systems by web and cloud companies for the storage of videos, pictures, backups, or saved game states [16–19]. Apart from these customised object stores, commercial providers sell object store software at the petascale [20–23].

Object stores have a flat namespace and a simple interface that supports the addition of pairs of unique keys and retrievable large binary objects (BLOBs). In this way, they differ from the hierarchical data organisation found in distributed file systems as well as from the data model of tables and mutable rows found in NoSQL databases. The simple data model of object stores permits horizontal scaling of the aggregated volume and throughput.
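The flat key-to-BLOB interface can be illustrated with the minimal in-memory mock below; the class, the key naming scheme and the immutability rule shown here are illustrative assumptions, not the interface of any particular product.

```cpp
#include <cstdint>
#include <optional>   // requires C++17
#include <string>
#include <unordered_map>
#include <vector>

using Blob = std::vector<std::uint8_t>;

// In-memory mock of a flat key -> BLOB interface: a single namespace,
// immutable objects, no directories and no row updates. A real object store
// would shard, erasure-code and replicate the data across many nodes.
class ObjectStore {
public:
    // Returns false if the key already exists (objects are immutable).
    bool put(const std::string& key, Blob data) {
        return objects_.emplace(key, std::move(data)).second;
    }
    std::optional<Blob> get(const std::string& key) const {
        auto it = objects_.find(key);
        if (it == objects_.end()) return std::nullopt;
        return it->second;
    }
private:
    std::unordered_map<std::string, Blob> objects_;
};

int main() {
    ObjectStore store;
    // Hypothetical key; real payloads would be the ~6 GB input files.
    store.put("ctf/run500123/tf-000042", Blob(1024, 0));
    return store.get("ctf/run500123/tf-000042") ? 0 : 1;
}
```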

Object store system blueprints and lessons learned from operational experience have been published for Microsoft Azure Storage [18], a 70PB system, and for Facebook f4 [17], a 65PB storage system. Microsoft declared a sustained disk bandwidth utilisation of 70% of the raw capacity. In view of the number of hard drives needed to store 70PB in 2011 and the typical throughput per hard drive, the aggregated throughput comes to much more than 60GB/s. Azure Storage and f4 have similar architectures: the raw storage is provided by clusters comprising 200–300 disk-heavy commodity nodes each, where every node is stocked with commodity hard drives. Both systems use Reed-Solomon erasure coding [24], thus keeping

the replication factor at ∼1.5 with the same or better fault tolerance guarantees than three independent data copies. The data placement and failure response is centrally managed by a coordinator, which is a highly available cluster of a few nodes, either in a primary-backup configuration or using a distributed consensus protocol [25–27]. Similarly, the index of stored keys and their associated metadata is kept in a central, easily accessed NoSQL database. Additional compute-heavy nodes are deployed for regular data health checks and the re-computation of redundancy blocks upon hardware faults.

An open-source object store with support for erasure coding is provided by RADOS [28], an independent base layer of the Ceph file system [29].

Unlike Azure Storage and f4, RADOS does not rely on central entities; the data placement is computed on the client side, taking into account the physical layout of the storage nodes. While this approach has better scalability, it is partly based on randomisation and so can result in less than optimal performance and load distribution. At present, the largest known RADOS deployments are 1 to 3PB, combining ∼1000 hard drives. Several such deployments are at data centres within the LHC computing grid (e.g. CERN, RAL, BNL).

A major benefit of using an object store for the O2 is that the input comes in large, equally sized 6GB files that will not need to be repackaged into larger units. A modest amount of metadata is produced which has to be managed. A storage system providing 75PB of logical storage can store at most 12.5 million 6GB files. So the index of files plus a few hundred bytes of metadata per file fit into the DRAM of a single server. The rate of writing new metadata is well within the limits of highly available consistent tools such as ZooKeeper [26].

With 10+4 Reed-Solomon erasure coding, a minimum of 840 hard drives at 100MB/s is required to sustain the input rate of 60GB/s (60GB/s × 14/10 = 84GB/s of raw writes). Before writing to the hard drives, the input data needs to be striped and the redundancy stripes need to be computed. A standard multi-core CPU can encode more than 1GB/s of input data [30, 31]. Assuming a packaging of 12 hard drives and two 10Gb network adapters per node, 60GB/s of input files can be striped, encoded using a 10+4 Reed-Solomon code and stored over 70 nodes. The final resource calculation must include a safety margin for the rebuilding of redundancy blocks after hardware faults.
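The sizing argument above can be reproduced with the short calculation below, using only the figures quoted in the text (10+4 coding, 100MB/s per drive, 12 drives per node).

```cpp
#include <cmath>
#include <cstdio>

// Raw write rate = input rate * (K + M) / K, then divide by the per-drive
// throughput and by the number of drives per node.
int main() {
    const double inputGBs      = 60.0;   // sustained input rate
    const int    K = 10, M = 4;          // data and parity fragments
    const double driveMBs      = 100.0;  // sustained throughput per drive
    const int    drivesPerNode = 12;

    const double rawGBs = inputGBs * double(K + M) / K;
    const int    drives = int(std::ceil(rawGBs * 1000.0 / driveMBs));
    const int    nodes  = int(std::ceil(double(drives) / drivesPerNode));

    std::printf("raw write rate: %.0f GB/s -> %d drives -> %d nodes\n",
                rawGBs, drives, nodes);   // 84 GB/s -> 840 drives -> 70 nodes
}
```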

6.8.4 CERN EOS system - Object storage software

A multi-petabyte storage system must be highly scalable with no single point of failure. CEPH is a free software storage platform meeting such requirements; it is designed to present object, block and file storage from a single distributed computing cluster. The object storage component of CEPH is called RADOS (Reliable Autonomic Distributed Object Store).

CEPH/RADOS

RADOS is implemented by a network of two types of daemons:

– cluster monitor daemons (MON) keep track of the current cluster configuration, state and node failures. They use the Paxos protocol to elect the currently active monitor collecting and gossiping the cluster state. There should be at least three of them in a cluster, and odd numbers are preferred.

– object storage daemons (OSD) store the actual objects and their metadata on different storage backends (filestore, key-value store, memory store), using replication or erasure encoding to provide fault tolerance. One OSD runs on each storage device.

Reliability, data availability & storage overhead

In a capacity-driven storage system, the overhead for reliability has to be kept to a minimum for a minimal cost. CEPH offers simple n-fold object replication, which is not applicable for the envisaged use case.

Additionally, CEPH currently offers two erasure code plug-in libraries (Jerasure, Intel ISA-L) providing Reed-Solomon encoding of objects. Reed-Solomon encoding splits each object into K pieces and creates M parities. The K+M fragments are stored on K+M OSDs in distinct failure domains (typically distinct hosts).

Large storage systems like GoogleFS use (K=9, M=3) and Facebook HDFS uses (K=10, M=4), ensuring no data loss for up to three or four concurrent disk failures in each group of 12 or 14 OSDs. A larger K further reduces the storage overhead. The price to pay is higher repair traffic in case of a disk failure, as the reconstruction of a disk of size V means reading a volume of K*V.

In the CERN Computing Centre, about 2.5% of disks fail per year. This represents an average of one disk per week for every 2000 disks or, assuming 10TB disks, one disk per week in 20PB of raw storage, and an average repair traffic of 264MB/s, i.e. ∼16MB/s per disk drive in a failure group. Facebook uses a more advanced Reed-Solomon encoding matrix which further reduces the repair traffic by 30%. This is not currently implemented in CEPH.

A possible configuration for a large ALICE storage pool would be (K=16, M=3) with 18.75% storage volume overhead or (K=20, M=4) with 20% storage volume overhead.
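The overhead and repair-traffic figures quoted in this subsection follow directly from the (K, M) profile; the sketch below reproduces them for the (K=16, M=3) candidate under the stated assumptions (2.5% annual failure rate, 10TB drives, 20PB of raw storage).

    #include <cstdio>

    int main() {
      const int    k = 16, m = 3;          // candidate ALICE erasure-code profile
      const double driveTB        = 10.0;  // drive capacity
      const int    drives         = 2000;  // 20 PB of raw storage
      const double annualFailRate = 0.025; // observed at the CERN Computing Centre

      const double failuresPerWeek = annualFailRate * drives / 52.0;             // ~1 per week
      const double repairMBs  = k * driveTB * 1e12 / (7 * 86400.0) / 1e6;        // K*V read per rebuild, ~264 MB/s
      const double perDriveMBs = repairMBs / k;                                  // ~16 MB/s per drive in the group
      const double overhead    = 100.0 * m / k;                                  // 18.75 % volume overhead

      std::printf("%.2f failures/week, %.1f MB/s repair traffic (%.1f MB/s per drive), %.2f%% overhead\n",
                  failuresPerWeek, repairMBs, perDriveMBs, overhead);
      return 0;
    }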

Erasure encoding CPU consumption

Reed-Solomon encoding requires Galois-field multiplications, which are accelerated on modern CPUs using the AVX extensions. A 2.2GHz Xeon core can encode ∼5GB/s. The additional CPU time requirements are negligible in a large storage installation if the OSDs run on the computing nodes.

OSD CPU consumption & memory requirements

The OSD daemon is not lightweight in terms of CPU and memory consumption. Each OSD should be matched by one core and 2GB of memory in the storage cluster. Under normal load the daemon does not require 2GB of memory, but during reconstruction and storage peering it may do so.

CEPH deployment model

There are three possible deployment models:

– Storage and computing clusters are separated: this is the worst scenario in terms of cost, since more nodes and network ports are required.

– Storage and computing clusters are overlaid: each computing node contributes its attached storage devices to the storage pool. There is no data locality: data is read from remote machines and the network design has to take into account the cross-node traffic.

– Embedded storage servers and a separate computing cluster: Seagate is developing embedded storage servers on shingled disks. The idea is to run the CEPH OSD on an ARM processor on each disk. In this model, all disks are attached to a Gbit port and are accessed from a separate batch cluster.

CERN IT has been evaluating CEPH since early 2013 and has been running a 3PB production cluster since November 2013.

CEPH can be used as a hyperscale low-level layer to provide a robust multi-petabyte storage platform. Using erasure encoding, the storage system is highly reliable with ∼20% storage space overhead. Several high-level APIs and storage services allow the efficient use of the object storage layer for application and physics data storage and processing. At present, the most cost-effective storage deployment is an overlay deployment of storage and computing: computing nodes with disk arrays attached.

6.9 Control, Configuration and Monitoring

Several tools providing essential CCM functions have been identified and are listed in Tab. 6.7. As new tools are expected before the CCM is implemented, the final selection will be deferred until the opportune moment. Nevertheless, several tests have been conducted to ensure that critical performance requirements can be fulfilled with existing tools.

Table 6.7: Tools to implement CCM functions.

Module          Function                           Tools
All             Inter Process Communication        DIM, ZeroMQ
Control         Start/stop processes               DDS
Control         Send commands to processes         SMI, ZeroMQ
Control         Task Management                    SMI
Control         State Machine                      SMI, Boost Meta State Machine
Control         Automation                         SMI
Configuration   System Configuration Management    Puppet, Chef
Configuration   Configuration Distribution         ZooKeeper
Configuration   Dynamic Process Configuration      ZooKeeper
Monitoring      Data Collection and Archival       MonALISA, Zabbix
Monitoring      Alarms and Action Triggering       MonALISA, Zabbix

6.9.1 DIM and SMI

The current ALICE Control system uses the DIM/SMI software stack for inter-process communication, state machine definition, command distribution and process synchronisation. As there will be a large increase in the number of processes and commands for O2, performance tests of DIM/SMI have been carried out to assess its potential for the future Control system.

As shown in Fig. 6.13, the initialisation time is the limiting factor. Once this initial STANDBY state is reached by all processes, subsequent commands execute almost instantaneously. Given the current performance of DIM/SMI, the maximum number of processes compatible with a reasonable startup time is roughly 25 000. This allows for a complete restart of the O2 system in less than 5 minutes.

6.9.2 MonALISA

In RUN 1 and RUN 2, MonALISA is used for monitoring all Grid processes and activities. The service layout closely follows the resource distribution, with at least one data collecting service in each computing centre. This first layer of services implements data aggregation and filtering in order to generate high-level views from the ∼7 million distinct parameters that are published by the ∼60000 concurrently running processes. While the entire system matches in scale the expected rates for the O2 system, it has the advantage of distributing this load over 100 different services.

To evaluate the requirements for the O2 monitoring system, a setup with one collecting machine and 12 sender hosts with 1000 threads each was deployed. Each thread sends a UDP packet with a single parameter at fixed intervals. Figure 6.14 shows the received message rate and the lost message rate.

This setup was thus able to collect up to 50 000 messages per second before the loss became significant. As such, up to 15 collecting services would be needed to handle the largest expected monitoring data rates from the O2 system.


Figure 6.13: SMI performance measurements as a function of the number of processes. The red line shows the time it takes for all processes to reach the initial STANDBY state. The green line shows the time it takes for all processes to move from STANDBY to the final RUNNING state (STANDBY - CONFIGURING - CONFIGURED - STARTING - RUNNING).

Figure 6.14: Collected and lost message rates in MonALISA as a function of the sleep time between messages.

Chapter 7

O2 software design

7.1 Introduction

The O2 software consists of a framework which runs on the O2 farm as well as on the other sites (T0, T1, T2, see Chap. 4). It has been designed with generic functionalities that can be specialised or replaced. The framework also drives the execution of the programmes and implements the general design principles presented in this introduction and in Sec. 7.2.

The effect of advances in technology and frequency scaling on modern applications is not as great as in the past. Now, the primary method of gaining extra performance is the parallelisation of the application code, using multi- and many-core CPU capabilities as well as specialised hardware accelerators. Dealing with such systems is usually complex and error prone, especially for non-experts. For this reason, the O2 framework aims to minimise the development effort for end users while running transparently in a distributed and heterogeneous environment.

As shown in Fig. 7.1, the O2 software not only relies on general libraries and tools such as Boost [1], ROOT [2] or CMake [3], but also on two other frameworks called ALFA and FairRoot. ALFA is the result of a common effort of ALICE and FAIR to provide the underlying communication layer as well as the common parts of a multi-process system. The multi-process approach, rather than a purely multi-threaded one, is justified by the need for high-throughput parallelisation and the necessity to have a flexible and easy-to-use software framework. At the same time, the system is fully capable of multi-threading within processes when required.

Figure 7.1: O2 software ecosystem (not drawn to scale). From top to bottom: experiment-specific packages (Panda, ..., Cbm, ALICE O2), FairRoot, ALFA, and common libraries and tools.

This chapter is organised in sections describing the main components of the O2 software. The future packages of the software will follow a similar structure. The order in which the packages appear in this chapter goes from lower-level to higher-level functionalities.


7.2 ALFA

ALFA is the new ALICE-FAIR concurrency framework of the O2 software system for high-quality parallel data processing and reconstruction on heterogeneous computing systems. It provides a data transport layer and the capability to coordinate multiple data processing components. The existing algorithms of these components must be optimised for speed and the framework has to manage the high throughput of data between these algorithms. Moreover, these modules, i.e. the algorithms and the associated calibration and reconstruction procedures, must run very efficiently on a highly parallel system with several levels of parallelism and granularity. This will require the code to be developed and optimised taking into account newly emerging parallel architectures.

ALFA is a flexible, elastic system which balances reliability and ease of development with performance by using multi-processing in addition to multi-threading. With multi-processing, each process assumes limited communication with, and reliance on, other processes. Such applications are much easier to scale horizontally to meet computing and throughput demands (by instantiating new instances) than applications that exclusively rely on multiple threads, which can only scale vertically. Moreover, such a system can be extended with different hardware (accelerators) and possibly with different or new languages, without rewriting the whole system.

7.2.1 Data transport layer

The data transport layer is the part of the software which ensures the reliable arrival of messages and provides error-checking mechanisms and data flow control. The data transport layer in ALFA provides a number of components that can be connected to each other in order to construct a processing topology. They all share a common base called device. Devices are grouped into three categories:

– Source: devices without inputs are categorised as sources. A sampler is used to feed the pipeline (task topology) with data from files.

– Message-based processor: devices that operate on messages without interpreting their content.

– Content-based processor: this is the place where the message content is accessed and the user algorithms process the data.

7.2.2 ZeroMQ: Base for the data transport layer in ALFA

ZeroMQ [4] is a very lightweight messaging system, specially designed for high-throughput and low-latency scenarios. It is an open source, embeddable socket library that redefines the term socket as a general transport endpoint for atomic messages. ZeroMQ sockets provide efficient transport options for inter-thread, inter-process and inter-node communication. Moreover, it provides Pragmatic General Multicast (PGM), a reliable multicast transport protocol.

A multi-part message is a message that has more than one frame but is sent as a single unit on the wire. The multi-part message support in ZeroMQ allows the concatenation of multiple messages into a single message without copying: all parts of the message are treated as a single atomic unit of transfer, while strictly preserving the boundaries between the message parts. Such features are crucial for implementing Multiple Data Headers (MDH), see Sec. 7.5.2.
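The multi-part mechanism can be sketched with the plain ZeroMQ C API as follows; the in-process endpoint name and the two frames (a header and a payload) are illustrative only.

    #include <zmq.h>
    #include <cstdio>

    int main() {
      void *ctx  = zmq_ctx_new();
      void *push = zmq_socket(ctx, ZMQ_PUSH);
      void *pull = zmq_socket(ctx, ZMQ_PULL);
      zmq_bind(push, "inproc://flp-to-epn");
      zmq_connect(pull, "inproc://flp-to-epn");

      // Send one logical message made of two frames: a header and a payload.
      zmq_send(push, "MDH", 4, ZMQ_SNDMORE);        // more frames follow
      zmq_send(push, "payload-bytes", 14, 0);       // last frame

      // Receive all frames of the same message; the boundaries are preserved.
      int more = 1;
      size_t moreSize = sizeof(more);
      while (more) {
        zmq_msg_t frame;
        zmq_msg_init(&frame);
        zmq_msg_recv(&frame, pull, 0);
        std::printf("frame of %zu bytes\n", zmq_msg_size(&frame));
        zmq_getsockopt(pull, ZMQ_RCVMORE, &more, &moreSize);
        zmq_msg_close(&frame);
      }

      zmq_close(pull);
      zmq_close(push);
      zmq_ctx_term(ctx);
      return 0;
    }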

7.2.3 Payload protocol

ALFA does not dictate any application protocol. Potentially, any content-based processor or any source can change the application protocol. Therefore, only a generic message class is provided that works with any arbitrary and contiguous chunk of memory. A pointer to the memory buffer must be passed, together with its size in bytes and, if required, a function pointer to the destructor, which will be called once the message object is discarded.
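In ZeroMQ terms, such a message can be built around an existing buffer without copying, by handing over the pointer, the size and a destructor callback; the buffer size and the ownership convention in the sketch below are illustrative.

    #include <zmq.h>
    #include <cstdlib>

    // Destructor invoked by the transport once the message object is discarded.
    static void freeBuffer(void *data, void * /*hint*/) { std::free(data); }

    int main() {
      const size_t size = 1024;
      void *buffer = std::malloc(size);   // payload owned by the application until handed over

      zmq_msg_t msg;
      // Wrap the existing buffer: pointer, size in bytes and destructor callback (no copy).
      zmq_msg_init_data(&msg, buffer, size, freeBuffer, nullptr);

      // ... the message would now be handed to zmq_msg_send(); here we just release it,
      // which invokes freeBuffer() on the wrapped memory.
      zmq_msg_close(&msg);
      return 0;
    }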

The framework supports different serialisation standards that allow for data exchange between different hardware and software languages. Moreover, some of these include built-in schema evolution, which naturally simplifies the development of the software and guarantees the backward compatibility of the payloads. "Object serialisation" means writing the current values of the data members of an object to a socket or file. The most common way of doing this is to decompose the object into its data members and write them to disk or to a socket. Only the persistent data members are written, not the methods of the class. To decompose the parent classes, the serialisation process is also called for the parent classes, moving up the inheritance tree until it reaches an ancestor without a parent. Many easy-to-use tools exist for this purpose. The following are supported by the framework:

– Boost serialization

This method depends only on ANSI C++ facilities. Moreover, it exploits features of C++ such as RTTI (Run-Time Type Information), templates and multiple inheritance. It also provides independent versioning for each class definition: when a class definition changes, older files can still be imported into the new version of the class. Another useful feature is the saving and restoring of deep pointers. A minimal example is given after this list.

– Protocol buffers

Protocol buffers [5] are Google's language- and platform-independent mechanism for serialising structured data. The structure of the data is defined once and used to generate code that reads and writes data easily to and from a variety of data streams, using a variety of languages: Java, C++ or Python.

– ROOT

The ROOT Streamer can decompose ROOT objects into data members and write them to a buffer. This buffer can be written to a file or to a socket for sending over the network.

– User defined

In case it is decided not to use any of the above methods, binary structures or arrays can still be written or sent to a buffer. Although this method does not add any size overhead to the data, several issues then have to be managed: schema evolution, different hardware, different languages.
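As a concrete illustration of the first option, the sketch below serialises a vector of a hypothetical Cluster class with Boost serialization; the class, its data members and the file name are invented for the example only.

    #include <boost/archive/binary_iarchive.hpp>
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/serialization/access.hpp>
    #include <boost/serialization/vector.hpp>
    #include <fstream>
    #include <vector>

    // Hypothetical payload class, used only for this illustration.
    class Cluster {
      friend class boost::serialization::access;
      template <class Archive>
      void serialize(Archive &ar, const unsigned int /*version*/) {
        ar & fPad & fTime & fCharge;   // only the persistent data members are written
      }
    public:
      int   fPad    = 0;
      int   fTime   = 0;
      float fCharge = 0.f;
    };

    int main() {
      const std::vector<Cluster> clusters(10);
      {
        std::ofstream ofs("clusters.bin", std::ios::binary);
        boost::archive::binary_oarchive oa(ofs);
        oa << clusters;                 // decompose the objects and write them
      }
      std::vector<Cluster> readBack;
      std::ifstream ifs("clusters.bin", std::ios::binary);
      boost::archive::binary_iarchive ia(ifs);
      ia >> readBack;                   // restore, honouring per-class versioning
      return 0;
    }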

7.3 Facility Control, Configuration and Monitoring (CCM)

The CCM components of the ALICE O2 coordinate all the O2 processes, ensuring that both the application and environment parameters are properly set. They also gather information from the O2 system with the aim of identifying unusual patterns and raising alarms. The CCM systems interface with the Trigger, the DCS, the Grid and the LHC to send commands, transmit configuration parameters, submit jobs and receive status and monitoring data.

7.3.1 Tasks

The O2 functions are defined as sequences of actions known as tasks, which can be base or composite. Base tasks are sequences of precise instructions executed on a single node. Composite tasks are groups of base tasks and/or composite tasks that can be executed on a single node or on multiple nodes, in sequence or concurrently (see Fig. 7.2). The Control system must also ensure that tasks are executed in the correct order and that task execution failures are properly handled. Examples of tasks are the reset of detector FEE using the read-out link or the execution of Monte Carlo simulations on unused resources. Tasks should be considered as an input to the Control System and their definition handled via proper editing and validation tools.
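A composite-pattern data structure is a natural way to represent such task definitions; the sketch below is purely illustrative and the names are not taken from any existing O2 code.

    #include <memory>
    #include <string>
    #include <vector>

    // Common interface for base and composite tasks.
    struct Task {
      virtual ~Task() = default;
      virtual void execute() = 0;
    };

    // A base task: a sequence of precise instructions executed on a single node.
    struct BaseTask : Task {
      std::vector<std::string> instructions;
      void execute() override {
        // run the instructions sequentially on the local node (placeholder)
      }
    };

    // A composite task: a group of base and/or composite tasks, run in sequence or concurrently.
    struct CompositeTask : Task {
      std::vector<std::unique_ptr<Task>> children;
      bool concurrent = false;   // execution policy chosen by the Control system
      void execute() override {
        for (auto &child : children)
          child->execute();      // sequential dispatch; a real system would also fan out to other nodes
      }
    };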

Figure 7.2: Definition of an abstract task.

7.3.2 Process State Machine

To ensure uniformity in the control of the O2 processes, a base state machine is defined as presented in Fig. 7.3. Each O2 process must implement it as a prerequisite and can extend it to fulfil specific requirements. This state machine foresees dynamic reconfiguration, allowing for faster configuration changes and greater flexibility. It also foresees a pause and resume mechanism to allow process throttling according to available resources and operational priorities.
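A minimal sketch of such a base state machine is given below; the transition table is a simplified reading of Fig. 7.3, and a concrete O2 process may extend it with additional states and commands.

    #include <map>
    #include <stdexcept>
    #include <utility>

    enum class State   { Standby, Configured, Running, Paused, Exited };
    enum class Command { Configure, Start, Stop, Pause, Resume, Exit };

    class ProcessStateMachine {
      State fState = State::Standby;
      const std::map<std::pair<State, Command>, State> fTable{
          {{State::Standby,    Command::Configure}, State::Configured},
          {{State::Configured, Command::Configure}, State::Configured}, // dynamic reconfiguration
          {{State::Configured, Command::Start},     State::Running},
          {{State::Running,    Command::Stop},      State::Configured},
          {{State::Running,    Command::Pause},     State::Paused},
          {{State::Paused,     Command::Resume},    State::Running},
          {{State::Standby,    Command::Exit},      State::Exited},
          {{State::Configured, Command::Exit},      State::Exited}};

    public:
      State state() const { return fState; }
      void handle(Command cmd) {
        const auto it = fTable.find({fState, cmd});
        if (it == fTable.end())
          throw std::logic_error("command not allowed in the current state");
        fState = it->second;
      }
    };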

7.3.3 Roles and Activities

The control and operation of the O2 system is based on logical entities called "Roles" that represent and group computing nodes and functional blocks. Each role has specific commands, monitoring metrics and configuration parameters. As shown in Fig. 7.4, the roles are organised in a hierarchical structure representing the ALICE detectors and the O2 farm. This hierarchy not only provides a closer match to the experiment's setup (e.g. each individual detector is read out via a set of specific, non-interchangeable FLPs) but also increases scalability and parallelism by reducing the scope of each role to a subset of the whole system.

Activity

ALICE operations are made up of different tasks which are limited in time and normally do not make use of the full O2 farm. To achieve an optimal usage of the available resources, it is possible to execute multiple tasks in parallel. A task executed by a set of roles during a finite time period is called an "Activity".

Figure 7.3: Process base state machine (states STANDBY, CONFIGURED, RUNNING and PAUSED, with CONFIGURE, START, STOP, PAUSE, RESUME and EXIT transitions).

Figure 7.4: Examples of parallel Activities and their associated Partitions (e.g. a debug session with one detector and no reconstruction, a reconstruction pass with a new calibration set, and a physics data taking run with reconstruction).

An Activity includes both synchronous and asynchronous tasks as defined in Sec. 5.1. Activities are identified by an Activity Number for bookkeeping purposes. Examples of Activities are: physics data taking; detector calibration; reconstruction passes.

Run

A "Run" is a finite data taking time period with stable experimental conditions, represented by a unique identifier called the "Run Number". It corresponds to a specific data set generated at the end of the synchronous part of the data flow. To increase operational flexibility, a physics data taking Activity can include multiple Runs, thus allowing for fast run transitions whenever the experimental conditions change. Run transition tasks dynamically handle the allocation of resources and the required reconfiguration of components, thereby ensuring that unnecessary operations are skipped.

Specific roles

The FLP role represents the functional blocks of a single ALICE detector executed on a single FLP node. Typical actions include: starting/stopping data input/reduction processes; monitoring data rates; sending commands to the detector FEE; enabling/disabling read-out links. Multiple FLP roles can coexist on the same FLP node.

The EPN role represents the functional blocks of a specific Activity on a single EPN node. Typical actions include: starting/stopping data reduction/reconstruction processes; monitoring reconstruction metrics; configuring reconstruction data sources. Multiple EPN roles can coexist on the same EPN node. EPN roles can be dynamically assigned to, or removed from, an ongoing Activity.

The Detector role represents an ALICE detector and groups the FLP roles defined on the FLP nodes physically connected to a specific detector. Typical actions include the aggregation of monitoring values per detector and the sending of commands and configuration parameters to all its FLPs.

The EPN Cluster role groups all the EPN roles participating in a specific Activity. Typical actions of this role include the aggregation of monitoring values and the sending of commands and configuration parameters to all the EPNs in the cluster.

The Partition role is the root node of the role hierarchy responsible for the execution of a specific Activity; it includes Detector and/or EPN Cluster roles.

Activity Manager

The Activity Manager is responsible for the instantiation and termination of Activities, including the creation and destruction of the Partition hierarchies and the sending of commands to the Partition roles. Users interact with the Activity Manager to control and configure Activities.

Roles Locking Mechanism

To avoid conflicts in the usage of available resources (e.g. read-out links) and to simplify the management of parallel Activities, a role locking mechanism guarantees that each role can only be assigned to a single Activity. All the same, to increase operational flexibility and to optimise the use of resources, multiple roles can be defined on a single node and assigned to concurrent Activities.

7.3.4 Agents

Each defined role is implemented by a logical entity called a "CCM Agent", consisting of several processes and/or software modules. As seen in Fig. 7.5, a CCM Agent starts, stops and sends commands to local O2 processes, configures and monitors them, and interacts with other CCM Agents. The CCM Agents implementing the Partition roles also interact with the Activity Manager.

Figure 7.5: Interaction between CCM Agents and Processes.

Control

The CCM Agent receives a task to execute from its parent CCM Agent (or, in the case of Partition CCM Agents, from the Activity Manager). Based on the task content, it launches and sends commands to the required local O2 processes. Once these processes are no longer needed, they are either stopped or put into an idle state by the CCM Agent. The CCM Agent also has the role of forwarding the required subtasks to descendant CCM Agents and of handling asynchronous events based on the monitoring data (e.g. a dead process that needs to be restarted).

Configuration

The Partition CCM Agents retrieve the configuration of the Activity from a central repository where all configuration parameters are stored. To minimise network traffic, every other CCM Agent only receives from its parent CCM Agent the configuration subset for itself and for its descendant CCM Agents. The configuration of the local O2 processes is performed as part of the task sequence executed by the CCM Agent. The descendant configuration parameters are forwarded to the corresponding CCM Agents. Configuration distribution and usage is optimised by keeping a local copy of previously used configurations.

Monitoring

All main processes are instrumented to send periodic, high-frequency heartbeats (not to be confused with heartbeat triggers) that are critical in establishing the functional status of the system in general. In addition, the main processes have access to an API to explicitly publish internal parameters to be monitored, either as events or as periodic metrics.

Processes are also monitored from an operating system point of view, periodically reporting the memory footprint, CPU consumption, number of threads, file descriptors and other relevant metrics. This can be done either internally, from the library providing the monitoring API, or externally by the CCM Agent that started the processes.

The CCM Agents also implement monitoring data aggregation to help reduce the volume of data received continuously by the subscribers, in particular by the long-term archival clients. At this level they can also trigger events and in particular raise alarms under certain conditions, for example: processes not running; hardware failing; backlog over certain thresholds. The actual configuration of the triggers and alarms is distributed by the Configuration component of the CCM system and can be applied dynamically on the running system.

7.3.5 System Monitoring

The monitoring of all the components of the O2 system assesses the status and health of all hardware and software entities. It also provides high-level views of the global system and archives relevant metrics for long-term analysis and forensic investigation.

A monitoring API allows any software component to publish heartbeats (not to be confused with heartbeat triggers) and explicit monitoring data to a common data store. The same data store also receives periodic operating system views of the main processes and other critical services, as well as monitoring data collected from the infrastructure. This includes server health and utilisation, and fabric monitoring data from switches, routers, power supplies and other intelligent components.

Other components external to the CCM can also push monitoring data into the same data store or, if their implementation does not allow it, dedicated modules will pull the metrics from the external systems and inject them into the common data store.

The monitoring API also allows current monitoring values or historical data to be queried. Historical data is only available for the archived metrics. In addition, the API allows subscription to arbitrary cuts in the common monitoring metrics namespace, so that a client can be notified in near real time of newly available data for those metrics.

This mechanism will be used by the Control system to assess the health of the system in general and to trigger actions accordingly, including when processes fail to report as running or when critical services report an error condition.

The monitoring system uses two classes of messages: a regular monitoring data stream, which is consumed by the subscribers or simply discarded, and out-of-band events, which are persistently stored until they are acknowledged.

7.4 Readout

The readout process will be running on all the FLPs participating in data taking. Its main objective is to move data from the detector electronics into the memory of the hosting PC as fast as possible. The general readout structure contains a few routines needed to initialise the read-out electronics used to collect data and a main loop in which data is moved into the memory of the PC. A general schema of the read-out process is shown in Fig. 7.6.

Figure 7.6: Main phases executed by readout.

Readout has three main phases:

Start of run phases

At each start of run, readout performs the following procedures:

– Initialisation of the read-out hardware installed in the FLP needed to collect the data,

– Initialisation of the communication with the different EPNs used during the run.

Main control loop

Each time an event is received, readout performs general data quality checks and, in case of FATAL errors, it can request to stop the data taking. Inside this loop, readout updates different counters, such as the run number and the number of bytes received. The statistics of each run can be used for later studies. The loop continues until the end of run is requested.

End of run phases

When the end of run is requested, the following operation is executed:

– Release of the hardware channels used during data taking.

While the main structure of readout is independent of the protocol used to collect data, the communication with the read-out board is affected by the data transmission protocol and the related hardware. The read-out program has to be tailored to the different DDL3 protocols and read-out boards. Two options are currently considered for the DDL3:

1. DDL3 based on an ad hoc protocol and hardware.

2. DDL3 using an Ethernet link with IP traffic and an off-the-shelf card.

Readout needs drivers and APIs to access the hardware. Using a custom protocol requires the development of a custom board. If the custom board is a PCIe card, an additional effort is needed for firmware and driver development to be able to transfer data into the memory of the PC. Using an Ethernet protocol to transfer data from the detector to the FLP removes the need for driver development, which would be carried out by the company that delivers the hardware. Whichever protocol is selected as DDL3, the general structure of readout remains untouched; only the part of the code that accesses the hardware directly to perform the necessary communication has to be adapted using the proper APIs.
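One way to confine the protocol-dependent part is a thin hardware-access interface, as sketched below; the class and function names are hypothetical and only illustrate how the readout loop can stay identical for both DDL3 options.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Only concrete implementations of this interface (custom PCIe card, Ethernet NIC)
    // would differ between the two DDL3 options.
    class ReadoutCard {
    public:
      virtual ~ReadoutCard() = default;
      virtual void initialise() = 0;                                         // start-of-run setup
      virtual std::size_t getData(uint8_t *buffer, std::size_t maxSize) = 0; // fill buffer, return bytes
      virtual void release() = 0;                                            // end-of-run channel release
    };

    // Protocol-independent main loop, corresponding to the structure of Fig. 7.6.
    void readoutLoop(ReadoutCard &card, const volatile bool &endOfRunRequested) {
      std::vector<uint8_t> buffer(1 << 20);
      uint64_t bytesReceived = 0;          // per-run statistics
      card.initialise();
      while (!endOfRunRequested) {
        const std::size_t n = card.getData(buffer.data(), buffer.size());
        bytesReceived += n;
        // general data quality checks and counter updates would go here
      }
      card.release();
    }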

7.5 Data model

An important task of the FLPs from the data model perspective is the creation of the metadata that facilitates navigation within Time Frames. The structure of this metadata is described in Sec. 7.5.2. The EPN nodes collect the Time Frames produced by each individual FLP for a given time interval, then aggregate them in a folder-like structure which can be navigated using a Time Frame descriptor as described in Sec. 7.5.5. The reconstruction performed on the EPN nodes will use these Time Frames as input, producing a lossless intermediate persistent format as output, migrated to the local storage. Subsequent calibration and reconstruction processing can run asynchronously with respect to the data taking flow to further reprocess this data.

While the most suitable format for pipeline processing and data transfers is raw C-like structures, the reconstruction, calibration or analysis/QA algorithms are usually accustomed to object-like APIs. Since performance is a critical aspect, the approach of transient objectification on top of flat C-like structures is adopted, using references or transient pointers. Using references is necessary to avoid or at least minimise data duplication. The best strategy for this is to keep the data in the original buffers and reference them from objects with a higher level of abstraction, which also add metadata information as needed. This approach allows for C++ object-like manipulation of the data while keeping the memory management under the control of the framework. A controlled, contiguous memory allocation pattern for the most used data structures (e.g. clusters) is important both for optimising the memory layout and for exploiting vectorised access in tight loops.
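The idea can be sketched as follows: a flat, contiguous POD structure as shipped in the message buffers, and a lightweight transient wrapper giving object-like access without copying. The field names are illustrative.

    #include <cstdint>
    #include <vector>

    // Flat C-like layout, stored contiguously in the framework-owned buffer.
    struct ClusterData {
      float    x, y, z;
      float    charge;
      uint32_t detectorId;
    };

    // Transient objectification: references into the buffer, no data duplication.
    class ClusterRef {
      const ClusterData *fData;
    public:
      explicit ClusterRef(const ClusterData &d) : fData(&d) {}
      float charge() const { return fData->charge; }
      float radius2() const { return fData->x * fData->x + fData->y * fData->y; }
    };

    // Tight loop over the contiguous raw buffer, suitable for vectorisation.
    float totalCharge(const std::vector<ClusterData> &buffer) {
      float sum = 0.f;
      for (const ClusterData &c : buffer) sum += c.charge;
      return sum;
    }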

In the sections below, self-contained data blocks coming from the FEE are referred to as raw data samples. DAQ terminology is used for properties such as attributes (entities specific to each type of data) and flags (status and error fields common to all data samples). The concept of embedded commands is also used. These are data fields inside the trigger messages which are used to perform specific operations on the detectors and/or on the data processing nodes, including: start data taking; calibrate; reset; open/close Time Frame. The raw data samples coming from the FEE can be associated with different procedures such as the read-out of physics or test data, the collection of detector-specific counters and flags, or information associated with detector control.

7.5.1 Single data headers

The FLPs receive the raw data sample payloads, which can be either triggered or created by continuous read-out procedures, from all the FEEs which are assigned to them. The raw payloads for all read-out schemes will be individually preceded by a Single Data Header (SDH), created by the FEEs, describing the raw data itself as seen by the source equipment. Each SDH will be followed by a Single Data Block (SDB) which will contain the payload created by the FEE, the format of which will be FEE-dependent, as described in Fig. 7.7.

Figure 7.7: Data format at the level of the FLPs. [entity]* means zero or more occurrences of entity.

The SDH contains all the attributes common to all FEEs, including: precise timestamping of the data block (either the time of a trigger or the beginning timestamp of a continuous read-out sample); status and error flags; unique FEE identifiers; an SDH-specific version ID. Some of these fields are mandatory while others are optional. In the latter case, a field-dependent NULL value must be used by the source FEE.
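A possible C-like layout of the SDH, with the fields listed above, is sketched below; the field names, widths and ordering are illustrative and will be fixed by the FEE interface specifications.

    #include <cstdint>

    struct SingleDataHeader {
      uint32_t headerVersion;  // SDH-specific version ID
      uint32_t feeId;          // unique FEE identifier
      uint64_t timestamp;      // trigger time or start of the continuous read-out sample
      uint32_t dataType;       // PHY, CAL, HBE, ...
      uint32_t statusFlags;    // status and error flags
      uint32_t attributes;     // e.g. a bit declaring that an SDT follows the SDB
      uint32_t reserved;       // optional fields use a field-dependent NULL value
      uint64_t payloadSize;    // size in bytes of the following Single Data Block
    };
    static_assert(sizeof(SingleDataHeader) == 40, "keep the header size fixed and 64-bit aligned");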

Whenever the FEEs need to tag the end of an SDB, a Single Data Trailer (SDT) has to be appended to it. The SDT, created by the FEE, will contain any information that cannot be sent within the SDH, such as status and error bits associated with the end of the raw data read-out procedure and the timestamping of the end of the data block. The SDT presence is optional and is declared by a dedicated attribute bit in the SDH prepended to the same data block. Although the fields making up an SDT and an SDH are not identical, each field has to use a common scheme: timestamping, attributes, error and status bits.

One of the fields of the SDH is the data type. This is used to characterise the information stored in the SDB. So far, three data types coming from the FEEs have been identified:

– PHYSICS (PHY): physics data, either triggered or continuously read out.

– CALIBRATION (CAL): detector-specific calibration data, always triggered.

– HEARTBEAT (HBE): dedicated triggered events used to re-synchronise the FEEs and to send them commands.

More data types, such as information associated with detector control procedures, can be declared if necessary.

The FLPs also have to create and replace payloads locally and autonomously. The newly created data blocks are described by the same SDH (and optionally SDT) as if they were coming from the FEE. The fields of the FLP-created SDHs are usually filled using data (SDHs, SDBs and SDTs) coming from the FEEs. Specific data types must be allocated according to the content of the payloads.

All payloads may contain within them payload-specific headers and/or trailers. The specifications of these data blocks, such as their format and positioning in the payload, are not defined here as the actual implementation will be payload specific. All SDHs will include a unique ID that gives the source of the payload: either a FEE, in which case a detector-specific ID will be used, or a software procedure running on the O2 farm.

7.5.2 Multiple data headers

The FLPs encapsulate the data blocks coming from the FEEs, as well as the data blocks generated by the FLP processing, using Multiple Data Headers (MDHs). An MDH fully describes one Multiple Data Block (MDB), which contains zero or more sets of SDH+SDB(+SDT). FLPs handling multiple FEEs are expected to encapsulate multiple triggered payloads (one per FEE) using a single MDH. Continuous read-out payloads, on the other hand, will be encapsulated individually. The MDHs contain a summary of the payload(s) plus some information collected or generated by the FLPs themselves: MDH version ID; FLP ID; data type; number of payloads. Each MDH is equivalent to what is known in trigger-driven data acquisition architectures as an event and has to serve two main purposes: the encapsulation of correlated information coming from different links, and the provision of transparent access to all data blocks independent of their type.

In case the MDB contains dynamic data, a Multiple Data Trailer (MDT) will be used to signal the end of the block.

Besides the specific device (data block sender) extensions, the FLPs may add further extensions to the MDHs dedicated to local processing procedures, such as a list of scattered memory pages used to store the payloads or pointers to the SDHs. These extensions, transient by nature, will be dropped when the data leave the FLPs for the EPNs.

7.5.3 Time Frame descriptor

The data granularity in RUN 3 is driven by the so-called heartbeat (HB) trigger signals, which will be fired at equal time intervals, typically on the same bunch crossing ID. The spacing between two consecutive HB events is dictated mostly by the TPC drift time and is foreseen to have values in the range of O(100) milliseconds. The HBE events will carry both the information for synchronising the HW/SW components in the system and the steering commands to be propagated to all relevant processing components to enforce a given behaviour or functionality of the system.

For the data model, all input data blocks acquired in an HB interval and all new data produced from those data blocks are assigned to a single Time Frame descriptor tag. This has the double advantage of easy navigation through the information within the HB interval in a folder-like fashion plus the eventual correlation of data coming from different streams. In addition, the Time Frame descriptor permits the pushing of relevant HB information and commands downstream to higher-level processing units. This section describes how the Time Frame descriptor can be aggregated from the MDHs at the level of the FLP and how the navigation can be done on the EPN.

A schematic representation of the input data aggregation on each FLP node, for both continuous read-out and triggered detectors, is shown in Fig. 7.8. The HB trigger chops the data into equidistant time frames. The FLP logically groups together all input or produced data within the same frame so that they can be shipped and further processed on the EPN nodes. The MDH is self-contained, providing pointers to the transient data blocks in the FLP input buffer, including information about correlated events, as described in Sec. 7.5.2. The Time Frame descriptor, behaving like a folder for a given FLP and corresponding to a unique frame ID, summarises all MDHs created transiently for a given Time Frame interval.

According to the data flow approach, in a scenario where the MDH + MDB (+ MDT) are scheduled for asynchronous dispatching as soon as they are assembled, the full Time Frame descriptor can only be aggregated after the end-of-frame HBE and is sent to the EPN as a trailer in the frame data stream. Note that all absolute addresses used to navigate through transient data blocks while on the FLP become addresses relative to the current frame stream offset and will eventually be reconverted to absolute addresses after landing on the EPN.

Figure 7.8: The Time Frame descriptor assembled on the first level processors. Top: schematic flow of data blocks (events) as sent by the FEE. Bottom: the aggregation of individual data blocks into MDH headers.

7.5.4 FLP data aggregation

Figure 7.7 shows the format of the data coming into the FLPs as generated by the FEEs and while stored inside the FLPs, with the following considerations:

1. SDTs are always optional.

2. For continuous read-out detectors, if the single block is coming from a continuous read-out procedure, then there will be exactly one single data block per event.

3. For triggered events, coming either from triggered detectors or from continuous read-out detectors when these are triggered by the central trigger system, the expected event structure is one single data block for each channel belonging to any of the detectors that have been triggered.

The data blocks are grouped by the FLPs into Time Frames. Each Time Frame is delimited by two HBEs (a "Start HBE" event and an "End HBE" event) before being sent to one specific EPN. These two HBEs will be created locally on the FLPs, mainly using data coming from the FEEs (usually two sets of triggered single-block Heartbeats having the "CLOSE TIMEFRAME" command enabled).

Figure 7.9 shows an example where three Time Frames are transferred from one FLP to three EPNs. The component parts of the middle Time Frame are shown in detail.

The "Start HBE" also contains a summary HBE-specific header extension providing a detailed summary of all the data collected by the FLP during the associated Time Frame. On the FLPs, the HBEs contain both transient and persistent data: the transient portion is dropped when the data leaves the FLP while the persistent payload is kept throughout the whole lifetime of the event.

Figure 7.9: Format of the Time Frames between FLPs and EPNs.

The HBE that closes a Time Frame sets a dedicated attribute (e.g. "END OF TIMEFRAME"), used by the EPN to identify the Time Frame boundaries. Usually the HBE events have to be duplicated in the output stream of the FLPs towards the EPNs, as the HBE that closes one Time Frame will also be used as the "Start HBE" of the following Time Frame. Offline handling must therefore be prepared to find in the collected data multiple HBEs tagged with the same timestamp. In the anomalous case of an FLP receiving a data block after the corresponding Time Frame has already been closed and sent to the appropriate EPN, the FLP will include the data in the current Time Frame, raising an appropriate error flag in the MDH such as "OUT OF TIMEFRAME". The EPN must then be ready to handle such a condition according to a predefined policy (e.g. drop-and-report).

7.5.5 EPN handling

Each FLP has to send any given frame to the same EPN. For fast navigation between the Time Frames on the EPN, it is recommended to have folder-like navigation support. The Time Frame descriptor summarises the information in each sub-Time Frame by making it look like a directory. It will contain at least the following fields: the FLP identifier; time information (start and end HB); links summary; IDs; error status; number of MDBs of the different types; pattern of fired triggers. The status and error conditions are propagated along the processing chain embedded in the MDH headers, to avoid messaging bottlenecks and to associate them permanently with the data objects.
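An illustrative layout for such a descriptor entry is sketched below; the field names and widths are not final and merely mirror the list given above.

    #include <cstdint>

    struct SubTimeFrameDescriptor {
      uint32_t flpId;               // FLP identifier
      uint32_t errorStatus;         // propagated status and error conditions
      uint64_t startHeartbeatId;    // HB opening the Time Frame
      uint64_t endHeartbeatId;      // HB closing the Time Frame
      uint64_t linkSummaryMask;     // summary of the contributing read-out links
      uint32_t nPhysicsBlocks;      // number of MDBs per data type
      uint32_t nCalibrationBlocks;
      uint32_t nHeartbeatBlocks;
      uint32_t triggerPattern;      // pattern of fired triggers
    };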

The data dispatching policy adopted for the transfer from the FLPs to the EPNs has a large impact on the size of the buffers, on networking contingencies and on fault assessment. Independently of that, the Time Frame has to be instrumented with search procedures based on indexing by MDH type, time and link ID, while allowing for fast and vectorisable reconstruction.

Without detailing the processing requirements on the event building nodes, there are two main scenarios in which the output of the EPN processing can be entirely different.

In the first one, the detailed reconstruction procedure will be fast enough to fit the time budget for event building. This ideal situation will eventually allow for physics event separation and the writing of the output in a form which is close to today's tracklets+tracks, allowing for asynchronous processing improvements when a better calibration is available. The main difference with respect to the present format will be the provision for recalibration within the data format: a kind of lossless compression with a larger data size.

The second and more probable scenario is that the calibration procedure will not permit the required precision within the EPN time budget. At the very least, this scenario will have to achieve a data compression in a lossless format compatible with the storage and reprocessing capabilities. It is likely to produce larger raw data sizes and disk contingencies. In this case, the resulting format will be an intermediate form of Time Frames and data blocks, due to the impossibility of fully separating the events.

The scenarios described above show that the higher-level components of the data model, including persistency, depend on the online calibration and reconstruction capability of the O2 system, described in the following sections.

7.6 Data quality control and assessment

In line with the general Quality Control (QC) architecture description (Sec. 5.5), a design is proposed (see Fig. 7.10) that will meet the requirements of the detectors and users. It ensures an efficient decoupling between the different QC processes: a slow or misbehaving task does not directly affect the other QC tasks, as there are no connections or bounds between them. The key elements are described in the subsections below.

7.6.1 Data collection and QC data production

The architecture proposed in Fig. 7.10 differs slightly from the schema shown in Fig. 5.7, as the "generation of a monitoring object" will be done within the data flow itself as well as in an independent step. As shown in Fig. 7.10, the QC objects are produced in three different places:

1. Within other processes of the data taking pipeline, like "Calibration 0" on the top left of the figure. This case applies in priority to the tasks which would anyway produce QC objects during their execution. It is also used when monitoring the decoding, calibration or reconstruction processes themselves. It has the advantage of saving resources by not duplicating the work, but it could affect data taking and so should be used with care.

2. As independent processes on the EPNs/FLPs. Within the synchronous data flow (e.g. see "RAW QC"), these processes can have shared memory access to the data going through the node and will process as much as they can while the data are still accessible in the buffers, but without slowing down the data taking. They run on key nodes and must behave properly within the allocated resources. When run asynchronously, see "CTF QC" in Fig. 7.10, the execution time is no longer a critical constraint and QC can potentially process 100% of the data.

3. As independent processes on dedicated QC nodes, see "Advanced QC" in the figure. These processes have fewer constraints because they cannot affect the data taking. A side effect is that they receive fewer samples, possibly around 1% of the full data. This approach is favoured for QC tasks that consume many resources (e.g. memory or CPU).

The system is flexible enough to manage QC object producers with different needs in terms of resources, performance, stability and data sample quantity. It will also accept objects produced in non-QC processes.

7.6.2 Merging

Most of the QC objects which are produced on the FLP and EPN nodes need to be merged.

As far as the FLPs are concerned, since QC is often done per detector, merging could happen for the subset of FLPs which serve a given detector rather than for all of them.

Figure 7.10: Quality control and assessment general design.

The merging procedure can be done either by embedding the QC results in the data stream as separate blocks, or by using a scalable merging procedure as will be done for the QC objects of the EPNs.

On the EPNs, the QC objects will be produced while processing aggregated Time Frames. Since successive Time Frames will be reconstructed by different EPN nodes, the merging process will have to collect the QC objects from all EPNs. This merging can be executed synchronously for the fraction needed for fast quality assessment and asynchronously for the rest of the data. It means that the QC objects can still be populated and produced even if the mergers are not able to cope with the full amount of data.

Given the high number of inputs (up to one per EPN, i.e. more than 1000), a multi-level approach, such as Map-Reduce, will be used. Mergers and producers will be decoupled and each will be unaware of the existence of the other. Producers will populate objects and make them available whenever possible. Mergers will pull the latest version of the QC objects at regular intervals and merge them.
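The pull-and-merge step can be sketched with ROOT's built-in histogram merging; the transport by which the merger fetches the producers' copies is omitted and the names are illustrative.

    #include <cstddef>
    #include <memory>
    #include <vector>
    #include "TH1F.h"
    #include "TList.h"

    // Merge the latest copies pulled from all producers into a single QC object.
    std::unique_ptr<TH1F> mergeQcObjects(const std::vector<TH1F *> &producerCopies) {
      if (producerCopies.empty()) return nullptr;
      std::unique_ptr<TH1F> merged(
          static_cast<TH1F *>(producerCopies.front()->Clone("mergedQcObject")));
      TList others;
      for (std::size_t i = 1; i < producerCopies.size(); ++i)
        others.Add(producerCopies[i]);
      merged->Merge(&others);   // ROOT histogram merging
      return merged;
    }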

7.6.3 Automatic checks

Automatic checks usually consist of the comparison of the QC data with a reference value, a threshold or a distribution. In general, references may change according to the run and the data taking conditions, including the detector hardware status, the beam conditions or the collision system. For this reason, the QC is able to set and modify the reference online. For the sake of the reproducibility of the quality assessment and for asynchronous use, the reference is stored and versioned in a database.

There is a dual interaction with the CCDB: the system allows the reference data set online to be written to the database, while the previously set references and thresholds, as well as the variables related to the data taking conditions, are read from the database by the QC processes. Both synchronous and asynchronous QC processes need to access the reference in the database.

Separating the generation of QC objects from the quality assessment means that the latter can be done asynchronously, and even iteratively if the reference data or the evaluation procedure were to be modified.

7.6.4 Correlation and trending

Some QC processes need to take as input the merged, and often time-aggregated, output of other QC processes. This is true for correlation and trending, which can be considered as standard QC processes running within the QC farm. Although they run asynchronously and should not need any merging, automatic check procedures still need to be applied and a quality assigned.

7.6.5 Storage

Merged QC objects and their associated quality are stored for future reference and in view of correlation and trending (see Sec. 7.6.4). The storage is able to handle the foreseen load, which consists of an average of 5000 objects, peaking at 10000, updated every 60 seconds while serving the clients. A relational database is typically capable of such performance. Moreover, the limiting factor for such a system has been demonstrated to be bandwidth [6]. For a pessimistic average size of 1MB per monitoring object, the peak load corresponds to about 10000MB every 60s, i.e. roughly 1.3Gb/s, so a 10Gb/s input connection would be more than sufficient. Finally, such a system is able to cache the data seamlessly in case of congestion.

2280 7.6.6 QC Results

2281 The QC objects and their associated quality are used to check the transition from compressed to fully 2282 compressed data in the asynchronous part of the data flow.

2283 During data taking operations, the QC results are accessed by shifters and specialists, inside and outside 2284 the control room, via one or several dedicated web applications. The system provides a web API, such 2285 as a RESTful API [7], to allow individual access to the necessary information for tool building.

A generic application is provided which meets the needs of the shifter and the common needs of the users in general. This application is also web-based and should allow the display and manipulation of objects in a dynamic way. This implies the ability to transfer and display ROOT-based objects in a web browser. It uses the recent developments in ROOT which allow the serialisation of ROOT objects and files either in binary or in JSON format. Most of the application logic runs client-side in JavaScript.
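As an illustration of the serialisation path mentioned above, the sketch below converts a histogram to JSON on the server side using ROOT's TBufferJSON facility; the web-server plumbing is omitted and the helper function name is ours. On the client side, a JavaScript library such as JSROOT can turn the JSON back into a drawable object.

    // Sketch only: server-side serialisation of a ROOT object to JSON.
    #include <TBufferJSON.h>
    #include <TH1F.h>
    #include <string>

    std::string histogramAsJson(const TH1F& h) {
      // ConvertToJSON produces a JSON representation of the ROOT object that
      // the web client can reconstruct and display in the browser.
      return std::string(TBufferJSON::ConvertToJSON(&h).Data());
    }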

2291 As the visualisation clients connect via a web server to the database, they cannot affect other QC pro- 2292 cesses. The web server can even limit the number of requests going to the database which could indirectly 2293 impact the QC chain.

2294 7.6.7 Flexibility

The proposed design provides maximum flexibility for running on different nodes in the O2 farm. For instance, a QC agent that can process sub-Time Frames on the FLP nodes should also be able to run transparently on EPN nodes. This allows the dynamic balancing of the workload and postpones the execution of QC tasks in the data flow in the event that the FLPs are busy aggregating data.

The monitoring of CTFs can be delayed until resources are freed, even days after the end of the run. In this case, a percentage of CTFs would still be monitored immediately to give feedback to the shifter. Therefore, the system can do an immediate partial QC, with the partial calibration available at this stage, completing it later when more resources and more accurate calibration data are available.

2303 7.6.8 Event display

2304 The architecture of the Event Display System for RUN 3 has been dictated by the following main design 2305 requirements:

2306 – Stability;

2307 – Single Event Display switching between data sources;

2308 – Features like bookmarks or event history browsing.

The design of the Event Display System is shown in Figure 7.11. It introduces the Event Display Data Manager, which is the central part of the whole system. It fetches and then stores data temporarily (Temp), adding entries to the database (DB). If the data is bookmarked, it is moved to permanent storage (Perm). Simultaneously, the Event Server communicates with client applications, providing information stored in the database and sending the requested data back.

This pattern helps to increase stability by decoupling the data sources and the clients. It also makes it possible to browse the event history and makes it easy to receive, store and provide events requested by clients from different sources and with different formats. Depending on the future implementation of the main O2 data storage, a link rather than the CTF itself could be stored. For other data types (e.g. TF or AOD), it would remain necessary to store them temporarily.

This architecture will be tested during RUN 2. In parallel, another project called Total Event Display (TEV) will be developed. TEV is managed by CERN MediaLab and will provide a multi-platform solution for the visualisation of events for any experiment. It will be tested and adapted to the needs of ALICE during RUN 2. After comparison, the candidate system better suited to ALICE will be installed for RUN 3.

The interface between the data repository and the client applications will be especially important if TEV is used, since it requires data in XML instead of ROOT files. The Event Display should not access data directly. In such a scenario, the Storage Manager could be partially adapted to act as the interface between the data repository and the client applications. A final decision on this matter will be made at a later stage.

Figure 7.11: Event Display design.

2328 7.7 DCS

7.7.1 DCS-O2 communication interfaces

The data exchanged between the DCS and the O2 system can be divided into two categories: the conditions and the configuration data.

The configuration data is sent to devices at different stages. Firstly, a static configuration is loaded to the devices at Startup of data taking, defining, for example, the assignment of the channels to detector modules or the composition of groups that will be operated together. Following this stage, the dynamic configuration carries the channel settings for each instance, such as trip or monitoring limits and alert thresholds. These data are reloaded each time the detector configuration changes, as during Ramp Up. All configuration data are retrieved from the DCS configuration database. A subset of these, like noise maps or channel thresholds, is required by the O2 system and is provided along with the conditions data.

The conditions data is collected from devices such as temperature or humidity probes, power supplies or frontend cards. Detector conditions data required by O2 represent about 10% of the parameters supervised by the DCS. To isolate the O2 system from DCS implementation details, a Data Collector process is implemented. The Data Collector connects to all detector systems and acquires the available conditions data. The data exchange mechanism between the collector and the detector systems is based on a publisher/consumer paradigm. Any change in the monitored values is pushed to the Data Collector by the data publisher.

Collected values are stored in a formatted memory block, called the Conditions Image (CI). A dedicated O2 process retrieves conditions data from the CI and inserts it into the DCS data frames to be injected into O2 via a dedicated FLP. For each conditions parameter the alias, the measured value and the measurement timestamps are stored.

The Data Collector maintains a list of parameters to be provided to O2. At Startup, the Data Collector consults the DCS configuration and Archival databases and finds the physical location of each datapoint. With this information, it establishes connections to the individual systems and subscribes to the published values.

The frontend modules connected to O2 via the GBT links are a special category of devices. Physical access to these devices is achieved via the FLPs, which are not controlled by the DCS. A dedicated interface based on a client-server architecture is implemented both on the DCS and FLP sides. DCS data produced by the frontend modules are transmitted to the CRU in dedicated DCS frames which are interleaved with the standard data traffic. An FLP-side process strips the DCS information from the data stream and publishes the received values. The client process on the DCS side subscribes to the publications and injects all values into the standard DCS processing stream. The same mechanism is implemented for data which need to be sent from the DCS to the frontend modules, including register settings and on/off commands. The DCS server contacts the FLP maintaining the physical connection with the target devices and sends a command and the required parameters to the listening client. The FLP-side client ensures the transfer of this data to the target device over the GBT.
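The bookkeeping behind the Conditions Image can be pictured with the minimal sketch below; the class and method names are illustrative assumptions and do not correspond to the actual DCS implementation. The essential point is that the image is updated only on value changes, which is what the consuming O2 process eventually reads.

    // Sketch only: publish-on-change bookkeeping for the Conditions Image.
    #include <chrono>
    #include <string>
    #include <unordered_map>

    struct ConditionEntry {
      double value = 0.;
      std::chrono::system_clock::time_point timestamp{};
    };

    class ConditionsImage {
     public:
      // Called by the Data Collector when a detector system publishes a new
      // value for a parameter (identified by its alias). Returns true if the
      // stored image actually changed.
      bool update(const std::string& alias, double value,
                  std::chrono::system_clock::time_point t) {
        auto& entry = fImage[alias];
        const bool firstValue = (entry.timestamp == std::chrono::system_clock::time_point{});
        if (!firstValue && entry.value == value) {
          return false;               // unchanged values are not re-published
        }
        entry = {value, t};
        return true;                  // a dedicated O2 process can now pick up the change
      }
      const std::unordered_map<std::string, ConditionEntry>& snapshot() const { return fImage; }
     private:
      std::unordered_map<std::string, ConditionEntry> fImage;
    };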

The read-out of DCS data is not centrally triggered. Each controlled device provides its own mechanism, typically based on the polling of individual channels, to obtain the values. The internal read-out frequency varies widely between the devices provided by different manufacturers. To save bandwidth, only values that have changed during the actual read-out cycle are published. The update of these data in WINCC OA, a commercial SCADA system manufactured by Siemens, therefore occurs at a different rate for each channel. On each value change, the Data Collector is notified and the Conditions Image is updated. For stable channels, the value update occurs on average once every few seconds.

To create the initial Conditions Image, the Data Collector contacts all relevant WINCC OA systems and retrieves all current values. The value update request is executed for each conditions parameter at least once, at the Data Collector startup. During operation, the collector can access the DCS at several points, as shown in Figure 7.12. If high update frequencies are required, the access points can be implemented outside WINCC OA, for example, at the level of the OPC server. After any change, each access point pushes data to the Data Collector.

Figure 7.12: The DCS Data Collector and DCS Access Points.

A dedicated Data Collector Manager (DCM), implemented in WINCC OA, serves as the main access point for the Data Collector and covers most of the O2 needs. It receives all value changes from WINCC OA and pushes them to the Data Collector. Laboratory tests have shown that DIM can be used as the transfer protocol between the DCM and the Data Collector with a sufficient performance margin. The additional load introduced on the WINCC OA systems by the DCM stays within reasonable limits.

For fast-changing parameters, the device access layer can be used as a complementary access point and the Data Collector can retrieve the values directly from the device drivers. This approach makes it possible to bypass the data processing in WINCC OA and provides instant access to the measured values.

Each stage of DCS data processing adds latency. The biggest contributors to the delay between the physical value change and the published timestamp are the control devices and the channel polling mechanism implemented in their firmware. For most parameters the effect is negligible, except perhaps for the fast detection of glitches. To overcome this limitation, dedicated measuring devices based on fast hardware have been installed. These devices monitor the fast-changing parameters and publish them to the Data Collector. In parallel, all values are timestamped and sent to WINCC OA to be archived along with the standard data.

Finally, an access point attached to the archival database gives access to historical values. The Data Collector can retrieve DCS data for any period of time and make them available to consumers. This working mode is reserved mainly for interfaces to external systems, such as the LHC.

Chapter 8

Physics software design

2397 8.1 Calibration, Reconstruction

2398 In the following sub-sections, the steps of the calibration and reconstruction flow introduced in Fig. 5.6 2399 are described in detail (see Sec. 8.1.1). The calibration and reconstruction procedures for different de- 2400 tectors are discussed in Sections 8.1.2 and 8.1.3. Global tracking and event building are presented in 2401 Sec. 8.1.4 and Sec. 8.1.5. Finally, Sec. 8.1.6 deals with the requirements in terms of CPU and memory 2402 of the calibration and data taking procedures.

2403 8.1.1 Calibration and reconstruction steps

In RUN 3, the calibration and reconstruction procedures will guarantee a maximum reduction of permanently stored data. As described in Sec. 5.4.1, the flow of these procedures can be split into five phases, which are schematically depicted in Fig. 8.1 (see also Fig. 5.6). These steps are now discussed below in more detail than in Sec. 5.4.1.

– Step 0: processing on FLPs. FLPs carry out low-level standalone processing (clusterisation, masking, calibration) of the data supplied by the parts of the detector they service. The output of this step is sub-Time Frames that are shipped to the EPNs and contain partially compressed clusterised or raw data. Additionally, since the calibration is affected by a detector's running conditions, a dedicated FLP will collect data from the DCS, processing and feeding them to the EPNs together with the detector data.

– Step 1: detectors' standalone processing on EPNs. This is the stage where the final data reduction will be achieved. Standalone track-finding is carried out for the detectors concerned (ITS and TPC), both for data reduction and calibration purposes.

At the very beginning of a fill only the calibrations from the previous fill will be available. As a consequence, in case the standalone processing needs calibrations from the FLPs, either some delay will be introduced between FLPs and EPNs in order to prepare such calibrations, or the first data shipped to the EPNs will not be processed with the most recent calibrations. In the latter case it should be possible to reprocess the data concerned. The output of this step is filtered detector data sent to the permanent storage in the form of CTFs. Since the compression algorithms rely on the reconstructed tracks, these are also stored permanently in dedicated containers (ESD). At this stage calibration data will be buffered on a dedicated CCDB server and will be ready for aggregation at the end of the synchronous processing period.

2426 – Step 2: tracking. A first ITS-TPC matching should be performed at this stage, as well as the TRD 2427 tracking using TPC tracks as seeds. In addition, a sample of high pT tracks will be extracted from


Figure 8.1: Schematic outline of the reconstruction and calibration data flow.

2428 the ITS-TPC-TRD matching to provide the rescaling factor for the Space-Charge Monte Carlo 2429 reference map to be adjusted to the current data taking conditions. Despite the fact that the full 2430 ITS and TRD tracking will be done in the later asynchronous processing, such a low frequency 2431 sample can be reconstructed synchronously at a low cost.

2432 – Step 3: final calibrations. Here the TPC and TRD will be calibrated to the final quality level. The 2433 last ITS-TPC-TRD matching takes place at this step, together with a refit of the tracks enhanced 2434 by the updated calibrations. This step will also accommodate the track finding in the MFT and 2435 MCH detectors and their matching to each other. It will write the calibration data and feed track 2436 data for the final step.

– Step 4: event extraction and AOD production. The last step of the data flow includes matching with the outer detectors, calibrations that require the best barrel tracking performance, in particular those providing PID information, V0 reconstruction, and, finally, event extraction. The output of step 4 will be the AOD.

2441 8.1.2 Calibration procedures

2442 In the following subsections, some details about the calibration strategies of the ALICE detectors are 2443 given, focussing on the most complex ones. Table 8.1 and Table 8.2 summarise the calibrations needed 2444 for different detectors and the type of processing they need (e.g. FLPs, EPNs). Calibrations done on 2445 the FLPs will occur during the synchronous phase of the process. Calibrations on the EPNs can be done 2446 synchronously or asynchronously. The mode will be indicated unless the calibrations are carried out in 2447 dedicated standalone runs or they require accumulated statistics.

2448 TPC

2449 The most demanding detector in terms of calibration is the TPC. As discussed in [1], the calibration 2450 requirements for this detector vary according to the expected performance. Two main stages can be 2451 identified, as outlined later in this section.

The most important adverse effect on this detector is the space charge distortion, which can reach a peak value of ∼20 cm in the r direction (and 7 cm in rφ) [1]. The final TPC calibration will reduce this effect to the intrinsic detector resolution of a few hundred µm.

2455 Data volume reduction is carried out by synchronous processing on the EPNs. At this first stage, the 2456 quality of the calibration should be sufficient to perform a cluster to track association precise enough to 2457 obtain cluster corrections that allow the reduction of space charge distortions to the level of the intrinsic

1 EMCAL needs EMCAL-triggered events for this calibration.
2 EMCAL and DCAL only. This calibration will be obtained from DCS data so no dependence on the number of events is given.
3 The data for this calibration will come from DCS, so no number of events is given.
4 EMCAL needs EMCAL-triggered events for this calibration.
5 EMCAL needs EMCAL-triggered events for this calibration.
6 To be studied during RUN 2.
7 In case the calibration obtained from pp data is not stable enough, a further recalibration for Pb–Pb data will be needed, requiring a factor 400 less statistics of Minimum Bias events.
8 Possible only if FIT is read out by one FLP only; otherwise, an EPN is needed.
9 The events mentioned for this calibration are FIT-triggered events.
10 The frequency will decrease once the procedure is well established.
11 Possible only if FIT is read out by one FLP only; otherwise, an EPN is needed.
12 The frequency will decrease once the procedure is well established.
13 This calibration will be obtained from DCS data so no dependence on the number of events is given.
14 This calibration will be obtained from DCS data so no dependence on the number of events is given.
15 The frequency of this calibration could increase during a run.

Table 8.1: Table summarising different detector calibrations for Calorimeters, CPV, FIT, HMPID, ITS, MCH, MFT, and MID.

Detector | Calibration | Processing type | Beam [y/n] | Freq. | N. of events
Calorimeters (EMCal, PHOS, DCAL) | Energy Calibration | Accumulation on EPNs or AODs | y | 1-2 periods | O(200M)1
Calorimeters (EMCal, PHOS, DCAL) | Temperature dependence of Gain2 | Accumulation on EPNs | y | run | 3
Calorimeters (EMCal, PHOS, DCAL) | Time Calibration | Accumulation on EPNs or AODs | y | period | O(10M)4
Calorimeters (EMCal, PHOS, DCAL) | Bad Channel Map | Accumulation on EPNs or AODs | y | period | O(1M)5
CPV | Pedestal calculation | FLP | n | fill | O(2k)
CPV | Gain calibration | Accumulation on EPNs | y | 6 | O(10M) (pp MB)7
FIT | Global Offset | FLP8 | y | run | O(1k)9
FIT | Multiplicity Calibration | Standalone runs | n | d10 | ?
FIT | Multiplicity Calibration | FLP11 | y | run12 | ?
FIT | Time Slewing | Commissioning run | y | 2-3 times per year | ?
FIT | Channel Equalization | Commissioning run | y | month | O(1k)
HMPID | Chamber Gain | EPN (async) | y | run | 13
HMPID | Refractive Index | EPN (async) | y | run | 14
HMPID | Pedestal | Standalone runs | n | fill | O(1k)
ITS | Noisy/dead/faulty channels | Standalone runs | n | run | O(1k)
MCH | Occupancy | FLP | y | run15 | O(10k)
MFT | Noisy/dead/faulty channels | Standalone runs | n | run | O(k)
MID | Noise | FLP | y | run | O(1k)

2458 cluster resolution, i.e. O(1mm). To achieve this result, the main requirement is a long term (O(15 min)) 2459 average map of space charge distortions, adjusted to take into account the current data taking properties 2460 such as luminosity and detector configuration (“Average TPC map” in Fig. 8.1). Two possible scenarios 2461 are foreseen for this processing step:

1. A subset of one or more EPNs is dedicated to the ITS-TPC-TRD matching in order to prepare the average map from the reference one, to be then propagated back (possibly through a CCDB server) to all EPNs (including themselves) to perform the standalone reconstruction for data reduction. This would have the advantage of a single object being used on all EPNs. Using a limited number of EPNs will probably reduce the time necessary to prepare the calibration;

2467 2. Each EPN collects the ITS-TPC-TRD matching sample to produce the average map when process- 2468 ing the data that the EPN itself will receive. This approach would have the advantage of decreasing 2469 the amount of data to be exchanged, with the drawback of having to account for various objects 2470 from different Time Frames that reach different EPNs in a non-predetermined order.

A more granular rescaling based on the event and track multiplicity information from the previous ∼160 ms (corresponding to the maximum ion drift time in the TPC) will also have to be applied (“Rescaled TPC map” in Fig. 8.1). For this purpose the information about the ion current in the TPC will be collected and kept on the FLP before being transferred to the EPNs and embedded in the data. The drift velocity is another time dependent calibration that will also be applied.

Other calibrations independent of time will also be considered at this stage. Typically these reflect detector conditions like pedestal values and dead channel maps. The pad-by-pad gain equalisation coming from dedicated calibration runs will also be needed.

After the first step described above, the TPC data will be used in a synchronous standalone tracking procedure (“TPC track finding” in Fig. 8.1, see also Sec. 8.1.3) aimed at reaching the data volume reduction factor needed for efficient data storage. The following step can be performed asynchronously, allowing for more CPU- and time-consuming tasks, the aim of which is to provide the full tracking resolution for physics analysis with a space charge correction at the level of the TPC intrinsic track resolution of 200 µm.

In order to reach this level of precision, further space charge correction maps will need to be calculated with high granularity in space and time. The update interval for these maps is estimated to be on the level of a few ms (O(2-5 ms)), driven by the remaining local fluctuations (∼1%) of the space charge which are not already included in the average rescaled map used for the first part of the processing [1]. To follow the fluctuations it is foreseen to have information on the read-out currents of the TPC during the last 160 ms, integrated in steps of 1 ms, available during the reconstruction (with a different granularity than their usage for the average map).

2492 Additionally, the interpolation of track segments from the ITS and TRD will be used to calibrate residual 2493 distortions (“Final TPC calibration (constrained by ITS, TRD)” in Fig. 8.1). This will be possible since 2494 the level of calibration available for these detectors at this stage will be sufficient for such a procedure. 2495 The necessary data for this task will probably be collected during the synchronous processing. The 2496 calibrations for the drift velocity and the gain will also be taken into account with greater precision than 2497 in the previous standalone tracking step. For the gain, the calibration data will be prepared during the 2498 synchronous step, as for the high-granularity space-charge distortion correction map.

16 This calibration will be obtained from DCS data so no dependence on the number of events is given.
17 Two numbers are quoted here in order to account for two different calibrations. The one requiring the highest statistics will be done with a much smaller frequency (see “Frequency” column).
18 The update of this calibration could occur more often depending on the stability of the data taking conditions (e.g. luminosity).

Table 8.2: Table summarising different detector calibrations for TOF, TPC, TRD, and ZDC.

Detector | Calibration | Processing type | Beam [y/n] | Freq. | N. of events
TOF | Channel Efficiency | FLP | y | run | all
TOF | Detector Status | EPN | y | run | 16
TOF | Noisy Channels | Standalone runs | n | fill | O(100k)
TOF | Time Calibration | Accumulation on EPNs and Commissioning run | y | year/period | O(1M)/O(10M)17
TOF | “Problematic” | Accumulation on EPNs and Commissioning run | y | period | O(1k)
TPC | Pedestal/Noise | FLP | n | fill | O(100)
TPC | Pad Gain | Standalone runs | n | year | -
TPC | Drift Velocity | EPN (sync) | y | 15 min18 | O(1k)
TPC | Space Charge | EPN (sync) | y | 15 min19 | O(5k)
TPC | Drift Velocity | EPN (async20) | y | 15 min | O(1k)
TPC | Gain | EPN (async21) | y | 15 min | O(1k)
TPC | Space Charge | EPN (async22) | y | 5 ms | all
TRD | Pedestal/Noise | FLP | n | fill | 100
TRD | t0 | FLP (monitoring on EPNs) | y | run | O(10k)
TRD | Chamber Status | EPN | y | run | O(35k)
TRD | Drift Velocity | EPN | y | run | O(3.5k)
TRD | ExB | EPN | y | run | O(3.5k)
TRD | Chamber Gain | EPN | y | run | O(35k)
TRD | Pad by pad gain | Standalone runs | n | year | O(2G)
TRD | PID reference | Asynchronous | n | period | all
ZDC | Energy Calibration | Standalone runs | y | 2-3 days | O(1k)

2499 ITS

2500 ITS calibrations concern the identification of noisy/dead pixels and will be performed on the FLPs during 2501 standalone runs, without imposing any constraints on the TPC calibration procedure. Moreover, the 2502 identification of the faulty pixels that demonstrate problematic behaviour would be carried out at the 2503 same time.

2504 TRD

The main parameter to be calibrated for the TRD in order to perform track finding is the time reference (or offset, t0) with respect to which the r-coordinate of the TRD cluster is obtained (“TRD seeded track finding and matching with TPC” in Fig. 8.1). The time offset depends on the triggering conditions in the experiment and on the TRD FEE configuration, and enters the tracking as a shift of the points in the radial direction. The t0 calibration will be performed on the FLP using the measurement of the pulse shape (height) of the signal from the raw data acquired in usual physics runs. Calibration at this level will allow the TRD to contribute to the final calibration of the space charge distortion (and drift velocity) in the TPC.

The remaining TRD calibrations will include the status of the TRD chambers, the drift velocity, the Lorentz angle (ExB) and the gas gain. The identification of malfunctioning chambers to be excluded from the tracking will be carried out on the EPNs. An algorithm that checks the detector occupancy will be used together with the information on the drift velocity, ExB and gas gain calibration. The drift velocity and ExB will be derived from the matching with the TPC and can rely on the TPC performance after the first level calibration. The gain calibration, which is crucial for particle identification, will be done at different levels. The gain calibration on the chamber level will be carried out at the same time as the drift velocity and ExB calibrations on the EPNs. The pad-by-pad gain calibration will need a dedicated run with Krypton with high statistics in order to reach the required precision. It will therefore be done only once per year. Finally, the particle identification references will be extracted from a larger amount of data, usually covering a whole period.

2524 The TRD will also take short standalone runs to measure the noise and calibrate the pedestals. These 2525 will be processed on the FLPs.

2526 Figure 8.2 summarises schematically the calibration and reconstruction inter-dependencies between ITS, 2527 TPC and TRD, as just described.

2528 TOF

The TOF calibration will update, when necessary, the TOF channel status map received from the configured electronics. Additional channels, identified as noisy in dedicated standalone runs at the beginning of each fill, or found to be not functioning properly during data taking, might be tagged to be excluded later by the physics analysis. These procedures will be running on the FLPs, accumulating data and exporting the final map at the end of the run. In addition, the calibration will include the measurement of the single channel time offsets and time-slewing corrections and the identification of the channels with an unstable time response. These steps will be run on the EPNs. Due to the high statistics

19 See comment above.
20 This calibration assumes that the synchronous drift velocity calibration will be merged and made available to all EPNs. A further improvement of this calibration will be obtained during the calibration of the space charge through a fitting procedure.
21 Data for this calibration will be collected during the synchronous phase to be used then for the asynchronous one, where a scaling as a function of the high voltage settings, and pressure and temperature (P/T), will be used. It is assumed to have stable ion backflow conditions in the detector. If this is not true, a higher granularity with respect to the one in the table will be needed (update frequency of 20s using 1M events).
22 Data for this calibration will be collected during the synchronous phase to be used then for the asynchronous processing.

Figure 8.2: Schematic outline of the reconstruction and calibration data flow.

required for the channel offsets and time-slewing calibrations and for identifying problematic channels, it will be necessary to accumulate the information during data taking, followed by further processing. This means that the TOF calibrations will only be used at the analysis stage. As a consequence, this will constrain the TOF reconstruction by postponing the matching stage until the final calibrations can be applied.

2540 MUON Chamber and identifier

The only calibrations foreseen for the Muon spectrometer are the occupancy map (for the Muon Chamber detector, MCH) and the noisy channel map (for the Muon Identifier, MID), which will both be calculated on the FLPs.

2544 MFT

2545 Like the ITS, the MFT will create maps for noisy/dead/faulty channels on the FLPs during dedicated 2546 standalone runs.

2547 FIT

The FIT detector will need three types of calibration: slewing corrections (with laser data or data collected at the beginning of the data taking); channel equalisation (on the FLPs, using the first O(1000) events, or using commissioning data); and global offsets (to align the measured times to zero). The last calibration is time-dependent, and will also be performed on the FLPs. Moreover, the measurement of the event multiplicity will need the calibration of the number of charged particles as a function of the signal amplitude. The starting point for this calibration will be the output of standalone laser runs. This will be used for the subsequent synchronous fitting procedure on collision data, which will possibly be improved in the asynchronous stage.

2556 ZDC

2557 The energy calibration for the ZDC will be obtained in standalone runs where ZDC triggers will be used.

2558 HMPID

For every run, the HMPID will perform calibrations on the EPNs for the chamber gain and the refractive index, based solely on the input provided by the DCS. The pedestals will be calculated in very short (2-3 minutes) dedicated calibration runs before every fill; in physics runs, their subtraction will be performed on the read-out electronics.

2563 Calorimeters

The energy and time calibration of the EMCAL detector will be based on the reconstruction of the π0 peak in every cell and on the distribution of the times measured by each cell, respectively. Large statistics are required for such calibrations (O(100M) EMCAL L0-triggered events). This, together with the need for several passes over the data (4 passes for the energy and 2 for the time calibration), makes them unsuitable for online calculation. As a consequence, an accumulation procedure on the EPNs followed by an offline (after data taking) analysis of the data will be necessary. The same approach will be followed for the energy and time calibrations of the PHOS detector. The time dependence of the gain of the EMCAL towers, which can be considered as a second-order energy calibration, will be accounted for using DCS data. Finally, both EMCAL and PHOS will produce a bad channel map for Monte Carlo simulations. This will be built offline, based on data collected on the EPNs. The strategy for the calorimeters' calibration will not impose any constraint on the online processing as the EMCAL calibrations are needed only at the analysis level. It should be noted that the accumulation of data on the EPNs could be substituted by the direct use of the final AOD-like objects.

2577 CPV

As the CPV has similar amplitude electronics to the HMPID, it will need a pedestal calibration run of about 2 minutes before each fill. The runs will be processed on the FLPs. During data taking, the pedestal table will be loaded to the front-end electronics for further zero suppression. The gain calibration will be calculated on the EPN using physics data by equalisation of the most probable value of the ionisation loss spectra per channel. The statistics needed for the gain calibration have to be large enough to cover a few hundred hits of charged particles per CPV channel. Since this corresponds to several days of data taking, the data will be accumulated on the EPN and, once enough statistics are obtained, the calibration will be made ready for future use.

Special calibration strategies

2587 – Standalone calibrations: As in RUN 1, some calibrations will take place in dedicated runs (STAN- 2588 DALONE runs). This is a requirement for the CCM and is addressed in 7.3. The output of such 2589 calibrations will be needed either before the beginning of the next PHYSICS run, for synchronous 2590 processing, or at a later stage, for asynchronous processing (e.g. analysis). Standalone calibrations 2591 can take place on both FLPs and EPNs, and particularly the latter, if full detector information is 2592 required.

2593 – Calibrations requiring large statistics: if these cannot be obtained during a single run, synchronous 2594 processing should allow for the accumulation of the necessary data. A post-processing task should 2595 then be able to run over such data and produce the calibrations required. This approach will in- 2596 clude only the calibrations that are not needed for any stage of the synchronous processing. An 2597 example is the energy and timing calibration for the calorimeters. Typically, such a requirement 2598 arises from the need to properly simulate Monte Carlo samples that take into account the detec- 2599 tor efficiency averaged over a large time interval. Examples include bad channels and detector 2600 hardware response.

2601 8.1.3 Reconstruction procedures

2602 TPC

The zero-suppressed raw data (pad, arrival time and signal amplitude information) is converted on the FLPs to clusters by a one-dimensional FPGA algorithm. It processes the pads sequentially, finding the maxima within neighbouring pads on the same pad row.
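The sketch below is a software illustration of the pad-direction part of this search (the real firmware also correlates time bins): for the pads of one row and a given time bin, pads whose charge exceeds both neighbours and a noise threshold are flagged as maxima. Function and parameter names are ours.

    // Sketch only: one-dimensional local-maximum search along one pad row.
    #include <cstddef>
    #include <vector>

    std::vector<std::size_t> findPadMaxima(const std::vector<float>& charge,
                                           float threshold) {
      std::vector<std::size_t> maxima;
      for (std::size_t pad = 1; pad + 1 < charge.size(); ++pad) {
        if (charge[pad] > threshold &&
            charge[pad] >= charge[pad - 1] &&
            charge[pad] > charge[pad + 1]) {  // strict on one side avoids double counting
          maxima.push_back(pad);
        }
      }
      return maxima;
    }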

2606 Reconstruction on the EPNs will use an algorithm for the assignment of the collision timestamp t0 to each 2607 cluster, necessary for the determination of its z coordinate and applying position dependent calibration. 2608 In absence of the trigger signal, the following method, shown in Fig. 8.3, has given satisfactory test 2609 results [1]: (i) A very rough estimate of t0 is obtained for each cluster assuming that it belongs to a track 2610 with about half of the maximum drift length (i.e. η = 0.45). This reduces the maximal distortions by a | | 2611 factor of 2, thereby improving seeding; (ii) Then, for any seed made of two clusters, the t is adjusted ∼ 0 2612 in such a way that the seed points on the central Luminous Region (LR). This reduces the uncertainty in 2613 t to the typical size of the LR divided by the drift speed, i.e. to 3 µs. At this point, the t estimate can 0 ∼ 0 2614 be matched to one of the interaction timestamps from the list of interactions recorded by the FIT detector 2615 within the possible maximum drift time of 100µs. After this, the space charge distortion correction with 2616 average map rescaled for luminosity can be applied to the TPC clusters.
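The quoted ∼3 µs can be cross-checked assuming a nominal TPC drift velocity of about 2.6 cm/µs (a value quoted here as an assumption, not taken from this document):

\[
\sigma_{t_0} \approx \frac{\sigma_z^{\mathrm{LR}}}{v_{\mathrm{drift}}} \approx \frac{6\ \mathrm{cm}}{2.6\ \mathrm{cm}/\mu\mathrm{s}} \approx 2.3\ \mu\mathrm{s},
\]

consistent with the ∼3 µs quoted above and small compared to the ∼100 µs maximum drift time used for the FIT matching.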

Figure 8.3: Schematic diagram of the seeding procedure and t0 estimation.

The next step is the standalone track finding using clusters with residual mis-calibration, performed by a fast and easily parallelisable cellular automaton approach currently used in the HLT tracking. Once the track finding is carried out, an after-burner algorithm will search for the large-curvature looper tracks, corresponding to secondaries with a momentum below ∼50 MeV/c. The clusters of these tracks do not contribute to physics analysis and constitute nearly 70% of the total TPC occupancy. Their elimination greatly contributes to the reduction of the data volume. Once these synchronous reconstruction steps are carried out, the compressed cluster data are saved to permanent storage.

2624 ITS

2625 Cluster finding is done on the FLPs using a simple algorithm which searches for the adjacent pixels 2626 assuming certain row-column ordering in the shipped data.

2627 Contrary to the current situation where most of the tracks in the ITS are found as a prolongation of TPC 2628 tracks, in RUN 3 the TPC will need ITS tracks to carry out final calibration of the space charge distortions. 2629 For this reason, standalone tracking in the ITS is foreseen on the EPNs. The principal requirement for 2630 ITS tracking code is speed, since a partial reconstruction of high pT tracks must be done synchronously 2631 to provide constraints for the TPC calibration.

2632 The algorithm that is currently under study is based on a Cellular Automaton (CA) [2][3].

2633 TRD

As described in [4], the TRD will be able to take data in different configurations. The default mode in RUN 3 will be the read-out of online tracklets, where the FLP stage would only be needed for their post-processing, such as quality selection or the marking of pad row crossings.

In the case of the full or partial raw ADC data read-out, the reconstruction will start on the FLP with the conversion of the pad and time bin raw data to clusters. These will then be combined into tracklets (track segments within a single layer), which are connected into full tracks on the EPN, using seeds from the TPC.

A third possibility consists of running the TRD in mixed mode, for example with online tracklets augmented by the raw data for a small fraction (∼10^-3) of events. This will require the execution of both the operations outlined above.

2644 In all cases, the reconstruction will be performed in synchronous mode in order to provide constraints 2645 for the final TPC calibration that will take place asynchronously. Reducing TRD data rates by discarding 2646 unusable online tracklets is being studied.

2647 TOF

2648 The reconstruction is performed on the EPN in asynchronous mode and consists of matching the TOF 2649 hits to the extrapolations of tracks from smaller radii.

2650 MUON chamber and identifier

The reconstruction will start with pre-clustering on the FLPs. This means grouping adjacent fired pads, irrespective of their charges, which currently takes up 20% of the total clustering time. This would be carried out with an improved algorithm, 20 times faster than the current one. This would be very interesting if coupled with the option of using FPGAs hosted by the FLPs. The next step is the synchronous clusterisation on the EPNs for data reduction. The possibility of performing the full cluster finding on the FLPs has still to be investigated; the FLPs would need to see complete detection elements (either slats or quadrants). If this is the case, then converting the clustering algorithm to a GPU version, which would accelerate the process by a factor of 50, would be worth considering. The track finding will be performed at the asynchronous stage.

2660 MFT

2661 The reconstruction procedures for this detector are the same as for the ITS.

2662 FIT

2663 The synchronous reconstruction stage consists of: (i) An estimation of the precise interaction time by 2664 averaging the mean arrival time on each side (first estimate using the first signals generated on the A and 2665 C sides is performed already in the FE); (ii) The determination of the vertex position from the difference 2666 of the mean signal arrival times; (iii) Preliminary multiplicity estimates from the amplitudes of each 2667 MCP sector by applying corrections based on simulations.

2668 ZDC

2669 The reconstruction takes place on the EPN requiring the subtraction of the baselines and the summing 2670 of the amplitudes of different channels weighted with calibration parameters obtained in dedicated stan- 2671 dalone runs.

2672 HMPID

Cluster finding from the fired pads' coordinates and charge will be performed on the FLPs, and the clusters will be matched to barrel tracks at the asynchronous stage.

2675 Calorimeters

Both detectors will extract the signal amplitude and the time in each cell on the FLPs by fitting the pulse shape, reducing an array of up to 30 10-bit samples to only two parameters: cell energy E and timestamp t. Both detectors will perform cluster finding in adjacent cells on the EPNs. EMCAL will apply temperature corrections using DCS data. The shower shape and cluster parameters, as well as the cell signals and time information, will be stored for the analysis stage.

2681 CPV

2682 The first step for this detector is the reconstruction of the pad amplitudes after pedestal subtraction 2683 and scaling by the gain calibration parameters. Then cluster finding will run on the EPN from the 2684 reconstructed pad amplitudes. As CPV is used for the neutral particle identification in PHOS, cluster 2685 matching in CPV and PHOS is performed, taking into account track matching with CPV clusters and their 2686 further propagation to the surface of PHOS. Since particle identification is based on objects reconstructed 2687 in several detectors (CPV, PHOS, ITS, TPC), the algorithm has to work at the EPN level.

2688 8.1.4 Global tracking

After performing in asynchronous mode the final calibration of the TPC space charge distortions using constraints from the ITS and TRD (“Final TPC calibration (constrained by ITS, TRD)” in Fig. 8.1), the tracking in the TPC, ITS and TRD is repeated to maximise efficiency; global tracks are refitted (“Final ITS-TPC-TRD reconstruction, outward refitting” in Fig. 8.1) and extrapolated outwards to find the matching clusters in the TOF, HMPID and calorimeters (“Matching to TOF, HMPID, calorimeters” in Fig. 8.1). The barrel track-finding is completed by refitting the tracks inwards to the Luminous Region (“Global track inward refitting” in Fig. 8.1) and with the finding of the primary and secondary vertices. In parallel, the matching between the MUON and MFT tracks is performed, followed by the propagation to the LR (“MUON/MFT matching” in Fig. 8.1).

2698 8.1.5 Event extraction

2699 The final step of the asynchronous Time Frame reconstruction is the event extraction: attributing the 2700 tracks and secondary vertices to primary vertices and their association to trigger timestamps from the 2701 FIT. The ambiguity of such an operation in high pile-up rate conditions depends on whether a timestamp 2702 can be given to individual tracks. In the following estimates, a 50kHz Pb–Pb interaction rate is assumed 2703 with the same beam structure as in [1]: 12 equally-spaced 2.34 µs bunch trains with 48 bunches each. 2704 This gives an interaction probability µ = 0.0079 per bunch crossing and 0.38 per train crossing.
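The quoted probabilities follow from the stated bunch structure, assuming the nominal LHC revolution period of about 88.9 µs (an external number quoted here as an assumption):

\[
\mu_{\mathrm{train}} \approx \frac{50\ \mathrm{kHz}}{12/88.9\ \mu\mathrm{s}} \approx \frac{50\ \mathrm{kHz}}{135\ \mathrm{kHz}} \approx 0.37,
\qquad
\mu \approx \frac{\mu_{\mathrm{train}}}{48} \approx 0.008,
\]

in agreement with the values of 0.0079 per bunch crossing and 0.38 per train crossing used in the following.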

For tracks successfully matched between the TPC and the ITS (more than 95% above a pT of ∼150 MeV), the time resolution is determined by the TPC track (pT dependent) z uncertainty (at the outermost cluster of the ITS) divided by the drift speed. This resolution is ∼0.25 cm for the average track, giving a time resolution of ∼0.1 µs, thereby reducing the pile-up rate (the probability of having a track from a different collision in the same time interval) for such tracks from ∼50% in the ITS read-out window of ∼30 µs to the ∼1.6% level. Given that, with a typical σz of ∼6 cm for the luminous region, less than 1% of pile-up collisions are separated by a distance smaller than 1 mm, and that the z-resolution after full calibration is better than 200 µm even for the global tracks of lowest momenta, finding unambiguous primary vertices and attributing them to one of the trigger timestamps from the FIT should not pose any problem. The same is true also for the association of global tracks (both primary and secondary, as well as the secondary vertices made of such tracks) to the corresponding primary vertex. Similar conclusions are valid for pp interactions at 200 kHz.

The situation is somewhat different for the ITS tracks which were not matched in the TPC (mostly tracks below 200 MeV), especially if the most pessimistic scenario for the ITS read-out cycle of 30 µs is assumed. In this case a single ITS read-out cycle will integrate on average 1.5 (6) collisions in Pb–Pb at 50 kHz (pp at 200 kHz). The only way to associate the ITS tracks from such piled-up interactions with a specific trigger timestamp is via their relation to the primary vertex found with global tracks. The possibility of doing this depends on the spatial vertex separation in piled-up collisions. Figure 8.4 shows for

Figure 8.4: Probability of at least one pile-up collision happening in proximity (Y-axis) for a selected vertex as a function of the pile-up rate (X-axis). A luminous region σZ = 6 cm is assumed.

the randomly picked vertex the probability of having at least one pile-up vertex within a certain distance (Y-axis) as a function of the number of piled-up collisions. Assigning a primary track to its vertex on the condition of its isolation from other vertices by at least 1 mm leads to an ambiguous attribution in ∼1.5% of collisions in Pb–Pb and ∼5.5% in pp. Increasing the vertex isolation condition to 2 mm (e.g. for the attribution of secondaries from heavy flavour decays) increases the fraction of ambiguous attributions by a factor of 2.

2729 For the above reasons, the following event extraction strategy can be considered:

2730 – The global tracks are sorted according to their matching with the FIT timestamps;

2731 – The primary vertices are found with the requirement that only the global tracks with the same 2732 timestamp contribute to a given vertex; the same condition is set on secondary vertices made of 2733 global tracks;

– The secondary global tracks (not attached to any of the primary vertices at the vertexing stage) as well as the associated secondary vertices are attributed to one of the vertices using their time and distance to primary vertex information;

2737 – The same operation is carried out for the unmatched ITS tracks and secondary vertices associated 2738 with them; in case of ambiguities in time and distance to vertex information, these tracks are tagged 2739 by the list of the FIT timestamps (or primary vertices identifiers) to which they can be attributed.

2740 In this approach, the output of the event extraction will be the container of the primary vertices associated 2741 with the timestamps and the list of unambiguously attributed tracks as well as the separate list of tracks 2742 which can be associated with more than one vertex or timestamp.
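A hedged sketch of this output container is given below; the type and field names are illustrative and do not correspond to the final AOD/ESD data model.

    // Sketch only: possible shape of the event-extraction output.
    #include <cstdint>
    #include <vector>

    struct PrimaryVertex {
      float x = 0.f, y = 0.f, z = 0.f;
      std::uint64_t fitTimestamp = 0;       // FIT interaction time this vertex is attached to
      std::vector<int> trackIds;            // unambiguously attributed global tracks
      std::vector<int> secondaryVertexIds;  // secondary vertices built from those tracks
    };

    struct AmbiguousTrack {
      int trackId = -1;
      std::vector<std::uint64_t> candidateTimestamps;  // all FIT timestamps (or vertices) it may belong to
    };

    struct EventExtractionOutput {
      std::vector<PrimaryVertex> vertices;    // one entry per reconstructed collision
      std::vector<AmbiguousTrack> ambiguous;  // tracks compatible with more than one vertex/timestamp
    };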

The output of each reconstruction step should contain metadata identifying and giving a precise description of: (i) The software version used; (ii) The set of relevant calibration objects; (iii) The status of the reconstruction for each detector and its usefulness for physics analysis.
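One possible, purely illustrative shape for this provenance metadata (names are ours, not a defined interface):

    // Sketch only: per-step provenance metadata.
    #include <map>
    #include <string>

    struct ReconstructionMetadata {
      std::string softwareVersion;                         // e.g. a release tag
      std::map<std::string, int> calibrationObjects;       // CCDB object name -> version used
      std::map<std::string, bool> detectorGoodForPhysics;  // per-detector reconstruction status
    };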

2746 8.1.6 Computing requirements

Table 8.3 summarises the requirements of the most important calibration and reconstruction tasks. The label CPU refers to a single CPU core.

Table 8.3: CPU requirements for processing the data from Pb–Pb interactions at 50kHz in the synchronous mode.
Detector | Process | Processing requirement [Cores]23 | Processing Platform | System reference
TPC | Calibration | 1000 | CPU |
TPC | Track seeding, following | 5000 | GPU | AMD S9000
TPC | Track merging, fitting | 15000 | CPU |
ITS | Tracking | 75000 | CPU | Intel I7-2720QM 2.20GHz
MCH | Preclustering | 200 | CPU | Intel I7 2.30GHz
MCH | Clustering | 5000 | CPU | Intel I7 2.20GHz

2749 8.2 Physics simulation

2750 The objective of Physics Simulation is to provide large samples of events that resemble as closely as 2751 possible real data. The simulation process comprises the generation of the final states of pp, p–Pb and 2752 Pb–Pb collisions and the subsequent simulation of the passage of all generated particles through a matter 2753 distribution that represents as faithfully as possible the actual experimental setup. For those particles 2754 passing sensitive detector elements, the analogue detector response is simulated in specific user code and 2755 in a subsequent digitisation step converted into the raw simulated detector information used for real data 2756 (digits, raw data). These events are reconstructed like real data. In addition to the raw and reconstructed 2757 data the full history of the event (event record, Monte Carlo truth) is stored. This makes it possible to 2758 evaluate the detector performance in terms of acceptance, efficiency and resolution from which correction 2759 procedures and their corresponding systematic uncertainties are derived. The computing time spent in 2760 simulation must be kept low enough so that, depending on resources, sufficiently large Monte Carlo data 2761 samples can be generated for a timely analysis of the data and subsequent publication of physics results.

2762 8.2.1 RUN 3 simulation requirements

With an increase of the data volume by two orders of magnitude, the RUN 3 simulation requirements will notably increase. Taking into account the priorities of the physics programme and the requirements of multiple analysis topics, it is estimated that the simulation requirements will increase by a factor of 20. Most of this increase will be met by using fast or parameterised Monte Carlos. For the remaining full simulations, it is essential to use a computationally efficient particle transport code without compromising accuracy. This full simulation must be able to make efficient use of the computing and opportunistic supercomputing resources available on the GRID/Cloud. At present, only the Geant4 transport package, with multi-threading enabled, can achieve this.

Optimised simulation strategy At the same time, a goal-oriented strategy has to be adopted to simulate only what is strictly needed. This strategy requires planning the simulation in three stages: in the first stage, the simulation and reconstruction code is validated; in the second stage, enough events are produced to obtain acceptance, efficiency and resolution maps for single particles over a wide phase space volume; in the third stage, the parameterised acceptance, efficiency and resolution are used for a fast parametrised simulation of multi-particle signals (N-particle phase space integration). In subsequent simulations the second stage can also be replaced by fast full simulations reducing the particle transport

2778 and detector response simulation to the minimum needed to perform a meaningful stage 3 simulation. 2779 Hybrid procedures combining full, fast full and parameterised simulations can also be envisaged.

Optimal usage of the resources needed for the second stage includes trivial variance reduction and merging techniques. The merging techniques comprise:

– Merged Kinematics: embedding of signals in the background event during primary event generation;

– Merging with MC: merging of generated signals using the summable digits of the background event;

– Embedding: merging of a generated signal with real data.

2787 8.2.2 Transport code and the virtual Monte Carlo interface

Simulations performed for RUN 1 used the transport codes Geant3, Geant4 and FLUKA interfaced to the Virtual Monte Carlo (VMC) [5]. The VMC interface insulates the user code from changes of the detector transport code. The replacement of the geometrical modellers of the different packages by a single modeller from the ROOT package represents a further level of generalisation and simplification for the user code. While Geant3 has been used for all our main physics productions, Geant4 has been used for some specific applications and FLUKA mainly for radiation studies.

Geant4 VMC MT Starting with Geant4 version 10, full support for multi-threaded simulation applications has been included. Parallelism is achieved at the event level, with events dispatched to different threads. Geant4 VMC represents the realisation of the VMC interface for Geant4. At the same time it can be seen as a Geant4 application implemented via the VMC interfaces. Starting from version 3 it supports Geant4 multi-threading. The MT mode is activated automatically when Geant4 VMC is built against Geant4 MT libraries.

Geant4 VMC MT and the scaling behaviour of the computing time with the number of cores have been tested using a simplified but realistic multi-threaded simulation application. It is based on the Geant4 VMC standard example A01. Particles from a parameterised Pb–Pb generation are transported through the full ALICE geometry including a realistic field map; hits are generated and stored for the TPC.

2804 8.2.3 Detector response and digitisation

Further developments of the detector response and digitisation code will be driven by the requirements from the continuous read-out scheme (simultaneous snapshots or sequential "rolling shutter" read-out). The main challenge here is to avoid breaking the trivial event parallelism. Sequentiality is introduced by the conversion of digits into the new raw data format (Time Frames) and by the need to simulate the detector response based on the event history. The most stringent requirement comes from the realistic TPC continuous read-out simulation: at an interaction rate of 50kHz, ions from 7500 events contribute to the space charge distortions, and the event-by-event fluctuations require a calibration every 2 ms. Even assuming that these specific simulations will be limited to relatively small event samples (3M Pb–Pb events every 3 months), a procedure must be implemented that allows for a maximum of parallelism in the production of these events.

Currently under investigation is a procedure for the generation and transport of all events in parallel and the storing of the resulting hit maps. From the hit map of event n and the drift of electrons in the distortion map of event n−1, the new space charges from primary ions and ion back-flow (20 ions per electron) are generated. This step can be parallelised if the distortion from the ion back-flow can be ignored.

2819 8.2.4 Fast simulation

"Fast simulation" is fast full simulation and parametrised simulation or any mixture of both. Whereas analysis tries to undo the effects of efficiency and limited resolution, fast parametrised simulation applies these effects to simulated particle kinematics. Hence, an ideal place to integrate fast parametrised simulation is the existing ALICE analysis framework. The present implementation permits the combination of primary event simulation with the application of efficiency losses and 4-momentum smearing.

The implementation of a framework for fast full simulation, or a combination of fast and full simulation, can benefit from the existing VMC framework. Two new elements have to be added. The first is a dispatcher class that handles the step-by-step switching between different instances of the transport code applications, at least one of which would be a fast simulation package. The second is a virtual track class which is used by all transport engines to store the current track information. The design resembles the ATLAS Integrated Simulation Facility approach [6], with the added advantage that the fully virtualised ALICE framework allows new components to be plugged in transparently.
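A minimal sketch of these two elements, assuming hypothetical class and method names rather than the actual framework interfaces: a common virtual track representation and a dispatcher that switches per track between a full and a parametrised transport engine.

```python
from dataclasses import dataclass

@dataclass
class VirtualTrack:
    """Common track representation shared by all transport engines
    (hypothetical; stands in for the 'virtual track class' described above)."""
    pdg: int
    px: float
    py: float
    pz: float

class FullTransport:
    def process(self, track: VirtualTrack):
        # Placeholder for a detailed Geant-like transport step.
        return f"full transport of pdg={track.pdg}"

class FastTransport:
    def process(self, track: VirtualTrack):
        # Placeholder for a parametrised response (efficiency + smearing).
        return f"parametrised response for pdg={track.pdg}"

class Dispatcher:
    """Switches step by step between transport engines, e.g. full simulation
    in the region of interest and fast simulation elsewhere."""
    def __init__(self, full, fast, use_fast):
        self.full, self.fast, self.use_fast = full, fast, use_fast

    def process(self, track: VirtualTrack):
        engine = self.fast if self.use_fast(track) else self.full
        return engine.process(track)

if __name__ == "__main__":
    # Toy rule: treat low transverse momentum tracks with the fast engine.
    dispatcher = Dispatcher(FullTransport(), FastTransport(),
                            use_fast=lambda t: (t.px**2 + t.py**2) ** 0.5 < 1.0)
    print(dispatcher.process(VirtualTrack(pdg=211, px=0.3, py=0.1, pz=2.0)))
    print(dispatcher.process(VirtualTrack(pdg=2212, px=3.0, py=0.5, pz=5.0)))
```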

Chapter 9

Data reduction

Data reduction is the driving requirement of the synchronous reconstruction, since the peak data rate from the read-out of the detectors is expected to be ∼1.1 TB/s, requiring an overall reduction by a factor of ∼14. Listed below are the data reduction procedures envisaged by the different detectors, excluding those that have a comparatively negligible data rate.

TPC

The TPC is the detector with the largest data rate (∼1 TB/s at the 50 kHz Pb–Pb interaction rate). The reduction of its data thus determines to a large degree the parameters of the processing model, i.e. the data rate to permanent storage. The envisioned steps for the TPC calibration have already been presented in the ALICE Upgrade Letter of Intent [1] and the TPC Upgrade Technical Design Report [2]. Table 9.1 lists these steps and the planned reduction factors.

Table 9.1: Data reduction steps for the TPC.

Data Reduction Step                          Data Reduction Factor   Data Rate at 50 kHz Pb–Pb interactions (GB/s)
Detector Input                               –                       1000
Cluster Finding                              2.5                     400
Cluster Compression (Cluster+Track Info.)    3                       135
Background Identification                    2                       68
Charge Transformation & Compression          1.35                    50

The first step in the data reduction is the identification of the TPC clusters: the charge deposits along the particle track through the TPC gas. This step is currently implemented in the ALICE High Level Trigger system and results in a reduction of the data volume by a factor of 2.5. It has been successfully used in RUN 1 and is implemented on the FPGA of the Read-out and Receiver Cards (H-RORC and C-RORC). This implementation is also the baseline for the upgraded TPC and will be implemented on the Common Read-Out Unit (CRU).

The next step in the current system is an entropy-reducing transformation of some of the cluster properties, using only information from the cluster and its neighbours in the data stream. The ordering of the clusters by the FPGA algorithm allows the calculation of differences between subsequent clusters and thus benefits from correlations in the data. In this way, the effectiveness of lossless compression algorithms, e.g. Huffman coding, is increased significantly, allowing a further reduction of the cluster size by a factor of 1.6. Figure 9.1 shows the results obtained during RUN 1.


For the ALICE O2 system these transformations, including the lossless transformation, will be extended by calculating differentials for more cluster parameters. Some of these transformations require information from the track finding step for the position and direction of the particle track at the cluster position. Using RUN 1 data, the achievable reduction factor for the Huffman step has been estimated to be ∼3.
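The gain from this ordering can be illustrated with a toy model: the sketch below (illustrative numbers, not the real TPC cluster format) compares the Shannon entropy of absolute cluster coordinates with that of the differences between neighbouring clusters, which bounds what a lossless entropy coder such as Huffman coding can achieve.

```python
import math
import random
from collections import Counter

def entropy_bits(values):
    """Shannon entropy per symbol in bits, a lower bound for any
    lossless entropy coder such as Huffman coding."""
    counts = Counter(values)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

# Toy model: cluster coordinates ordered along the read-out, so that
# neighbouring clusters in the data stream are strongly correlated.
rng = random.Random(1)
pads, pos = [], 0
for _ in range(20000):
    pos = (pos + rng.randint(0, 8)) % 4096      # slowly varying coordinate
    pads.append(pos)

deltas = [b - a for a, b in zip(pads, pads[1:])]

print(f"entropy of absolute values : {entropy_bits(pads):5.2f} bits/cluster")
print(f"entropy of differences     : {entropy_bits(deltas):5.2f} bits/cluster")
```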

Figure 9.1: Data compression factor versus raw data size achieved using cluster finder and data-format optimizations in the High-Level Trigger during the Pb–Pb data taking in RUN 1. Adopted from [3].

The availability of track information allows a further reduction of the TPC data size by removing clusters belonging to identified background tracks. Figure 9.2 shows a projection of the TPC clusters onto the end-caps of the TPC. The clusters from low-momentum background tracks, like delta electrons, and from upstream ion-gas/beam-pipe interactions are clearly visible. By extending the momentum range of the current TPC tracking down to these low momenta, as well as allowing the identification of tracks parallel to the beam-pipe, these background tracks and their associated clusters can be identified and subsequently removed from the data. Prototyping this algorithm on the TPC data of RUN 1 has shown rejection factors in the range 2.5 (Pb–Pb collisions) to 4 (high-intensity pp collisions), depending on the collision system and the LHC beam intensity and background conditions. A conservative factor of 2 has been assumed. A further reduction of the data volume is possible by transforming and/or removing the charge information from clusters on the track. This step makes use of the Landau distribution of the specific energy loss in the TPC gas and is estimated to yield a reduction of ∼1.35; it is conceptually very similar to the additional cluster transformations mentioned earlier.

ITS

At 50 kHz Pb–Pb, the ITS will ship ∼8.2 GB/s of data from hadronic collisions and QED interactions. The contribution from noisy clusters is currently unknown, with a pessimistic upper limit of ∼28 GB/s (corresponding to a random noise probability of ∼10⁻⁵ per pixel). Therefore, together with a 10% protocol overhead, an input data rate of ∼40 GB/s can be expected. The data reduction will start at the cluster finding stage on the FLPs, by storing the Huffman-coded relative values for the cluster centroid rows and chips, as well as the identifier of the cluster topology, rather than the absolute pixel addresses. With multiplicity-dependent Huffman tables, it is estimated that the average cluster record size will be squeezed to ∼23 bits in the absence of noise and ∼19 bits if the noise dominates the data volume. This will lead to a compressed data rate of ∼26 GB/s for the maximum noise scenario and ∼4.3 GB/s if the noise turns out to be negligible. In the pessimistic case where dominant noise is present, a further reduction factor of ∼3 will be necessary.

Figure 9.2: Projection of a pp event on the transverse plane of the TPC, showing clusters attached to physics tracks (red) and not assigned clusters (black).

If full synchronous ITS reconstruction proves to be feasible and efficient, discarding the clusters not attached to any track will eliminate virtually all noise clusters and approximately half of the real hit clusters (produced by non-reconstructable tracks), thereby reducing the peak data rate for storage to ∼3 GB/s. On the other hand, if fully efficient synchronous reconstruction turns out to be too slow, discarding only those clusters on the outer layers can be considered. The data reduction factor of this approach is being estimated.

TRD

The analysis of RUN 1 data shows that only 10% of the online tracklets can be matched to tracks extrapolated from the TPC. Work is in progress to assess whether this fraction remains the same for RUN 3 data. If this turns out to be the case, we consider storing only those tracklets which are in the vicinity of the extrapolation of a TPC track found during the synchronous reconstruction stage, which should lead to a reduction of the ∼20 GB/s input data rate to ∼3 GB/s for permanent storage.

TOF

The optimisation of the current TOF data format is expected to give a ∼20% reduction of the ∼2.5 GB/s data rate (in Pb–Pb). This reduction, which will occur on the FLPs, depends on: the removal of slow control information (temperatures, thresholds) from the stored raw data, which is available in the DCS; the suppression of the headers and trailers from TRM (TDC Read-out Module) data in case no hits are registered; and possible improvements in the format of the data stored by the DRM (Data Read-out Module). When running in continuous mode in pp, it should be noted that a larger data size (2.3 GB/s) and a lower impact of the data reduction technique are expected. The current plan is to use 10 µs windows for read-out. In this case, the number of TRMs without hits per asynchronous trigger will be lower.

MUON chamber and identifier

The raw input data rate from the MCH, extrapolated from the RUN 1 experience, is ∼2.2 GB/s and can be compressed to ∼1.5 GB/s already on the FLPs. After pre-clustering on the FLPs and clusterisation on the EPNs, an output to permanent storage of ∼0.7 GB/s is expected once the pad information is dropped. Estimates are being made to check whether this operation is safe; without suppressing the pad information the estimated output would amount to ∼2.1 GB/s, in which case there is no gain from carrying out the clusterisation in the synchronous mode.

MFT

Using the same type of sensors as the ITS, with the worst noise scenario of ∼10⁻⁵, the MFT will ship ∼7.2 GB/s. For the data reduction, the same approach as for the ITS will apply, leading to ∼5.0 GB/s output to storage with Huffman compression only.

Calorimeters

The raw data from the EMCAL (∼4 GB/s) and from PHOS (∼2 GB/s) can be reduced after the FLP processing. This can be achieved by compressing the 30 10-bit samples into two floats for the cell energy and time information.

Table 9.2 summarises the compression factors foreseen for every detector in routine data taking with Pb–Pb at 50 kHz interaction rate.

Table 9.2: Data rates for input to the O2 system and output to permanent storage for routine data taking with Pb–Pb at 50 kHz interaction rate.

Detector   Initial raw data rate [GB/s]   Compressed data rate [GB/s]   Data reduction factor
TPC        1000.0                         50.0                          20
ITS        40.0                           26.0 (8.0)                    1.5 (5)
TRD        20.0                           3.0 (?)                       6.6
TOF        2.5                            2.0                           1.25
MCH        2.2                            0.7                           2.9
MFT        10.0                           5.0                           2
EMCAL      4.0                            1.0                           4
PHOS       2.0                            0.5                           4
Total      1080.7                         88.2 (70.2)                   12.3 (15.4)

Chapter 10

O2 facility design

10.1 Introduction

The O2 facility is the local computer farm at the ALICE experimental area where the O2 architecture will be implemented, taking into account: the requirements presented in Chap. 3; the system architecture and design presented in Chap. 4 and Chap. 5; the needs of the software as presented in Chap. 7; and the current state of the technology as presented in Chap. 6. The chapter also discusses the infrastructure and the demonstrators of feasibility.

10.2 O2 Facility

This section describes the O2 facility.

Table 10.1 details the number of FLPs per detector. Table 10.2 details the role and types of the computing nodes. Table 10.3 shows their requirements for network connectivity, and Tab. 10.4 summarises the storage needs in terms of capacity and throughput.

The number of FLPs is derived from the number of detector links and associated read-out boards, balanced with the computing needs and the FLP output network bandwidth specific to each detector, as shown in Table 10.1. Each FLP will accommodate between 1 and 3 PCIe read-out boards. The O2 facility will need about 250 FLPs in total.

Each FLP should provide enough processing power to perform the initial data compression and enough memory to buffer the data while doing the Time Frame building.

The total output rate of the FLPs is:

    totalFLPoutRate = DetectorDataRate / FLPcompression
                    = TPCrate / 2.5 + otherDetectorsRate
                    = 1000 / 2.5 + 100 = 500 GB/s

This corresponds to an average output rate of 2 GB/s per FLP. The maximum rate is reached for the FLPs of the TPC inner chambers, which will each produce approximately 20 Gb/s after the first compression.

The Time Frame rate is set to 50 Hz (as described in Sec. 5.2.2). The full Time Frame size on the EPN is:

    fullTimeFrameSize = totalFLPoutRate / timeFrameRate = 10 GB

The EPN incoming bandwidth is set to 10 Gb/s. This is the expected commodity network port, selected to minimise the cost given the large number of EPNs:

    EPNincomingBw = 10 Gb/s


Table 10.1: Number of read-out boards and FLPs per detector to the O2 system.

Detector   Number of read-out boards   Read-out board type   Number of FLPs
ACO        1                           C-RORC                1
DCS        1                           Network               1
EMC        4                           C-RORC                2
FIT        1                           C-RORC                1
HMP        3                           C-RORC                1
ITS        23                          CRU                   23
MCH        20                          CRU                   10
MFT        1                           CRU                   1
MID        1                           CRU                   1
PHO        3                           C-RORC                2
TOF        3                           CRU                   3
TPC        360                         CRU                   180
TRD        54                          CRU                   27
CTP        1                           C-RORC                1
ZDC        1                           CRU                   1
Total                                                        255

The minimum number of EPNs needed to read the data produced by the FLPs is:

    minimumActiveEPNs = totalFLPoutRate / EPNincomingBw = 500 × 8 / 10 = 400

The minimum buffer size needed on the FLP for network transmission is:

    minFLPbufferSize = EPNtimeToReceiveFullTimeFrame × singleFLPoutgoingRate
                     = (fullTimeFrameSize / EPNincomingBw) × FLPoutgoingRate = 25 GB

Simulation indicated the additional memory needed for the read-out buffers and processing pipelines (see Fig. 10.6 and Fig. 10.7). Taking into account the operating system and the memory needs of the processes, an FLP system with 32 GB of RAM would be a realistic option.

The number of EPNs depends on the CPU processing needed for the Time Frame building and the event reconstruction (for the final compression). In view of the requirements presented in Table 8.3, and assuming dual-CPU systems with 32 cores per CPU, the facility will need 1500 EPNs.

Each EPN will process one full Time Frame every

    NEPNs / timeFrameRate = 1500 / 50 = 30 s

The memory needed on each EPN consists of: a buffer to receive the next Time Frame (10 GB), a buffer for the current Time Frame (10 GB), and a buffer for processing the current Time Frame (1 GB per CPU core), i.e. a total of approximately 100 GB.

The size of the final Compressed Time Frame (written to disk) is

    compressedTimeFrameSize = detectorThroughput / (timeFrameRate × compressionFactor)
                            = 1100 / (50 × 20) ≈ 1 GB

This makes a total EPN output throughput to disk of 1100 / 14 = 78.6 GB/s on average.

For each EPN, this corresponds to a disk write throughput of 78.6 / 1500 = 52 MB/s on average.

The total yearly amount of data to be stored in 2020 and 2021 is approximately 30 PB, corresponding to 30'000 / 1500 = 20 TB on average for each EPN.
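The sizing figures quoted in this section can be reproduced with the following short cross-check (a sketch using only the numbers given in the text; small differences with respect to the quoted values are due to rounding).

```python
# Recomputation of the sizing figures quoted above (a cross-check sketch,
# not a design tool; all inputs are taken from the text).
flp_out_rate_gb_s    = 1000 / 2.5 + 100  # after first TPC compression -> 500 GB/s
time_frame_rate_hz   = 50
full_tf_size_gb      = flp_out_rate_gb_s / time_frame_rate_hz        # 10 GB
epn_in_bw_gbit_s     = 10
min_active_epns      = flp_out_rate_gb_s * 8 / epn_in_bw_gbit_s      # 400
n_epns               = 1500
tf_processing_time_s = n_epns / time_frame_rate_hz                   # 30 s per EPN
ctf_size_gb          = 1100 / (time_frame_rate_hz * 20)              # ~1 GB
storage_rate_gb_s    = 1100 / 14                                     # ~78.6 GB/s
per_epn_write_mb_s   = storage_rate_gb_s / n_epns * 1000             # ~52 MB/s
per_epn_capacity_tb  = 30_000 / n_epns                               # 20 TB for ~30 PB/year

print(full_tf_size_gb, min_active_epns, tf_processing_time_s,
      round(ctf_size_gb, 2), round(storage_rate_gb_s, 1),
      round(per_epn_write_mb_s, 1), per_epn_capacity_tb)
```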

Table 10.2: Nodes of the O2 facility with their minimal characteristics.

Node type             Number of nodes   Number of cores   Number and type of accelerators   Memory (GB)   Network bandwidth (Gb/s)   I/O slots PCIe Gen3
FLP TPC               180               2 x 6             FPGA                              32            18                         2
FLP ITS               23                2 x 24            –                                 32            12                         1
FLP other detectors   52                2 x 6             –                                 32            12                         1 or 2
EPN                   1500              2 x 24            2 x GPU cards                     100           10                         0

Table 10.3: Network ports, throughput and bandwidth.

Node type   Network ports                             Data traffic type and bandwidth (Gb/s)
FLP (out)   1 x 40 GbE or 56 Gb IB, or 4 x 10 GbE     Mostly outgoing traffic, continuous @ 10-20 Gb/s
EPN (in)    1 x 40 GbE or 56 Gb IB, or 1 x 10 GbE     Input traffic in bursts (full speed during the burst and idle the rest of the time)

Table 10.4: Storage needs (throughput and capacity).

Storage location   Throughput (MB/s)   Capacity (TB)
EPN                52                  20
Total              78'600              30'000

The cheapest network topology for the Time Frame building (the FLP to EPN data flow) is layout 2, as presented in Fig. 5.4. It consists of 4 independent EPN clusters. An implementation example based on this approach is detailed below, with each FLP having 4 x 10 Gb/s ports, each one connected to one of the EPN clusters. The switching size of each cluster network is 257 FLPs @ 10 Gb/s x 375 EPNs @ 10 Gb/s, i.e. 632 ports in total. The aggregated outgoing traffic of all FLPs to one EPN cluster is FLPoutgoingRate / 4 = 1 Tb/s.

This corresponds to an average output of 4 Gb/s per FLP. In each cluster, at least 100 EPNs must be active at the same time (each reading 10 Gb/s) to cope with the 1 Tb/s produced by the FLPs. Using a leaf switch providing 96 ports at 10 Gb/s and 8 uplinks at 40 Gb/s, such as the DELL Z9000, the corresponding network can be built with the following characteristics (per cluster):

– Each cluster has 7 switches, providing a total of 672 ports at 10 Gb/s (28 switches needed in total for O2).

– Each switch is connected point-to-point to each other switch with a 40 Gb/s link, to implement a fully connected network mesh (21 cross-links).

– Each switch receives data from 37 FLPs (up to 259 FLPs in total).

– Each switch provides data to 54 EPNs (up to 378 EPNs in total).

– Each switch has 5 spare ports to accommodate more nodes if needed.

– In these conditions, if the Time Frames are distributed evenly across the EPNs (one switch after the other), the uplink traffic of each switch is symmetric and amounts in each direction to 37 × 4 × 6/7 ≈ 125 Gb/s, i.e. less than 50% of the available bandwidth (7 × 40 = 280 Gb/s); see the cross-check sketch after this list.
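A small cross-check of the figure in the last item, using the per-switch parameters quoted above (the slight difference with respect to the 125 Gb/s quoted in the text is rounding).

```python
# Cross-check of the per-switch uplink estimate for one EPN cluster,
# using the figures from the text (37 FLPs per switch, 4 Gb/s per FLP,
# 7 leaf switches, even Time Frame distribution).
flps_per_switch  = 37
flp_rate_gbit_s  = 4
n_switches       = 7
# For an evenly distributed Time Frame, 6/7 of a switch's FLP traffic
# leaves through the cross-links towards the other switches.
uplink_gbit_s    = flps_per_switch * flp_rate_gbit_s * (n_switches - 1) / n_switches
available_gbit_s = n_switches * 40      # uplink bandwidth figure used in the text
print(f"uplink ~{uplink_gbit_s:.0f} Gb/s of {available_gbit_s} Gb/s "
      f"({100 * uplink_gbit_s / available_gbit_s:.0f}% utilisation)")
```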

10.3 Power and cooling facilities

The O2 facility will be located at LHC Point 2 and will use the facilities available at the ALICE experimental area. Two Counting Rooms (CR1 and CR2) are available, each with 40 racks, with a usable height of 40 U in CR1 and 45 U in CR2. CR1 will be used for the racks containing the FLPs and services. CR2 will be used for the racks containing the EPNs and the data storage equipment. Table 10.5 summarises the power and cooling needs of the O2 facility.

Table 10.5: Power and cooling needs of the O2 facility.

Type       Number of items   Item height (U)   Number of racks   Power per rack (kVA)   Total power (kVA)   Cooling per rack (kW)   Total cooling (kW)
FLP        250               2                 17                12                     204                 14                      238
EPN        1500              1                 30                20                     600                 20                      600
Storage    150               3                 20                10                     200                 20                      400
Switches   10                1                 1                 10                     10                  14                      14
Services   110               1                 4                 10                     40                  14                      56
Total                                          72                                       1054                                        1308

The existing power distribution at Point 2 includes a total UPS power of 500 kVA available in CR1 and CR2, plus 270 kW of normal power for each CR. The power distribution requires a moderate upgrade to meet the requirements of the O2 facility.

10.4 Demonstrators

This section is dedicated to the demonstrators, proofs of concept and prototypes which have been developed to show key elements of the O2 system.

10.4.1 Data transport

Two different systems were used for the performance measurement of the data transport layer in ALFA. The performance tools delivered by ZeroMQ were also used to investigate any penalties introduced by the FairMQ package.

Prototype A

This system consists of 8 machines, 4 connected with 40 Gb Ethernet while the other 4 are connected with 10 Gb Ethernet. Using a message part size of 10 MB and sending it from the FLP to the EPN prototypes, a rate of about 4.7 Gb/s was achieved using 4 CPU cores for sending data (see Fig. 10.1).

Figure 10.1: Throughput in Gbit/s between two machines connected with 40 Gbit Ethernet, for different numbers of processes sending simultaneously.

Prototype B

The second system is composed of 4 machines connected with 40 Gb/s InfiniBand, all running the same software but using the IP over IB protocol. Three processes were used to send data from one machine, and 4 processes on each of the other machines received the data. A message size of 10 MB was used. An average rate of 2.5 GB/s was reached without any optimisation of the kernel parameters.
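As an illustration of this kind of point-to-point measurement, the sketch below uses pyzmq directly (PUSH/PULL over loopback); it is not the ALFA/FairMQ benchmark itself, and the endpoint, message size and message count are arbitrary choices.

```python
import threading
import time
import zmq  # pyzmq

MSG_SIZE = 10 * 1024 * 1024   # 10 MB message parts, as in the prototypes
N_MSG    = 100
ENDPOINT = "tcp://127.0.0.1:5555"

def receiver(n):
    # PULL side: receive n messages and discard them.
    pull = zmq.Context.instance().socket(zmq.PULL)
    pull.connect(ENDPOINT)
    for _ in range(n):
        pull.recv()
    pull.close()

if __name__ == "__main__":
    push = zmq.Context.instance().socket(zmq.PUSH)
    push.bind(ENDPOINT)
    t = threading.Thread(target=receiver, args=(N_MSG,))
    t.start()

    payload = b"\x00" * MSG_SIZE
    start = time.time()
    for _ in range(N_MSG):
        push.send(payload)
    t.join()                        # wait until everything has been received
    elapsed = time.time() - start
    print(f"{N_MSG * MSG_SIZE * 8 / elapsed / 1e9:.2f} Gbit/s over loopback")
    push.close()
```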

10.4.2 Dynamic Deployment System (DDS)

This is an independent set of utilities and interfaces providing a dynamic distribution of different user processes according to any given topology on any Resource Management System (RMS). DDS uses a plug-in system in order to support different job submission front-ends. The first and main plug-in of the system is an SSH plug-in, which can be used to dynamically transform a set of machines into user worker nodes. The DDS has to do the following:

– Deploy a task or set of tasks

– Use any RMS (Slurm, Grid Engine, etc.)

– Execute nodes securely (watchdog)

– Support different topologies and task dependencies

– Support a central log engine

During 2014 the core modules of the DDS were developed and the first stable prototype was released. This has been tested on the ALICE HLT development cluster using 40 computing nodes with 32 processes per node. The SSH plug-in for DDS has been used to successfully distribute and manage 1281 AliceO2 user tasks (640 FLP and 640 EPN). The DDS was able to propagate the ports allocated by each process to the dependent processes and to set up the required topology for the test. Throughout the test on this cluster, one DDS commander server propagated more than 1.5 million properties in less than 30 seconds. A test with more nodes and cores is ongoing on the test cluster of the new GSI cluster (Kronos). A rack with 50 machines with 40 cores per machine is being used for the test. On this cluster 200 agents are started on each machine and a total of 10,000 processes are managed by the DDS commander. Port and status properties have been propagated and updated between the different processes (to be finalised with the latest performance test results).

10.5 Computing system simulation

The computing system simulation has been done with Omnet++ [1] and the Monarc simulation tool [2]. Please see Sec. C.2.6 for more details about this choice.

The high-level simulation based on the Monarc tool includes all the elements to be simulated in a large program. It has been used to simulate the network and the buffering needs as reported in Sec. 10.5.1 and Sec. 10.5.2.

The Omnet++ simulation models are made of several blocks or modules interacting with each other by sending messages through links. A number of modules have been customised, while others were already available in Omnet++. Different simulation models have been made to reflect the levels of detail needed, so that the same parts of the real system are represented by different modules in different simulations. For instance, an FLP in a simulation of the network between FLPs and EPNs will be a data source, whereas in a simulation that is more focused on the part of the system before the FLPs, it will be a data sink. In a complete simulation scenario it will be simulated using yet another module. Three different models (network, full system and storage) have been created and used to evaluate the overall system scalability (see Sec. 10.5.4 and Sec. 10.5.3).

10.5.1 Simulation of the network needs

The study of the data traffic pattern is essential to properly dimension the system. Two different network layouts have been simulated: layouts 2 and 3, introduced in Chap. 5 and shown in Fig. 5.4 and Fig. 5.5. Two types of FLPs are used for these simulations: the TPC FLPs, transferring an aggregate of 400 GB/s to the EPNs, and the FLPs of the Other Detectors (OD FLPs).

In order to maintain the total data throughput, several transfers must be performed in parallel by each FLP. All the FLPs transfer data all the time to some EPNs. The EPNs receive a burst of data when selected as a destination and nothing while processing those data. Figure 10.2 shows the network bandwidth out of the TPC and OD FLPs and into the EPNs for layout 2 with a maximum of 100 or 20 concurrent transfers out of the FLPs (left-hand and right-hand panels respectively). In the latter case, there is an initial perturbation before the system settles in a state similar to the first case but with far fewer parallel data transfers.
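The burst-like reception pattern follows directly from the numbers given earlier in this chapter; the short sketch below (an illustration only, not the Omnet++ model) estimates the duty cycle of an EPN input link and the number of Time Frames in flight at any time.

```python
# Toy duty-cycle estimate for the traffic pattern described above
# (all numbers are taken from the text of this chapter).
tf_rate_hz   = 50
tf_size_gb   = 10
epn_in_gb_s  = 10 / 8                   # 10 Gb/s input port
n_epns       = 1500

receive_time_s   = tf_size_gb / epn_in_gb_s     # burst length on one EPN: 8 s
period_per_epn_s = n_epns / tf_rate_hz          # one Time Frame every 30 s per EPN
duty_cycle       = receive_time_s / period_per_epn_s
in_flight_tfs    = tf_rate_hz * receive_time_s  # Time Frames being built at any time

print(f"EPN input link busy {100 * duty_cycle:.0f}% of the time "
      f"({receive_time_s:.0f} s bursts every {period_per_epn_s:.0f} s)")
print(f"~{in_flight_tfs:.0f} Time Frames are being built in the system at any time")
```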

For layout 3, the FLPs transmit the data to SEPNs, which are in charge of buffering the data and redistributing it to the EPNs. The network bandwidth out of the TPC and OD FLPs and into the SEPNs for layout 3 is shown in Fig. 10.3 for a configuration based on an InfiniBand network at 56 Gb/s. As can be seen, the network is well used, with the SEPNs receiving data at the nominal bandwidth more than 50% of the time.


This simulation has also been used to verify the bisection data traffic in the system.

Figure 10.2: Link speed on the FLPs and EPNs for a network layout with four EPN subfarms, for 100 (left) and 20 (right) parallel transfers from the FLPs.

Figure 10.3: Link speed on the FLPs and SEPNs for a network layout with SEPNs.

Figure 10.4 shows the bisection data traffic for one of the 4 EPN subfarms of layout 2, equal to 1 Tb/s, for the two conditions simulated above: a maximum of 100 or 20 concurrent transfers out of the FLPs (left-hand and right-hand panels respectively).

Figure 10.4: Bisection traffic in one sub-system for a network layout with four EPN subfarms, for 100 (left) and 20 (right) parallel transfers from the FLPs.


For layout 3, the bisection data traffic of 4 Tb/s can be seen in Fig. 10.5.

Figure 10.5: Bisection traffic in the network layout with SEPNs.

10.5.2 Simulation of the buffering needs

The data buffering needs for the different types of nodes have also been estimated using the high-level simulation. Two kinds of buffer management have been simulated: either no special management is used and the space made available by the transfer of an STF from the FLP is returned to the pool of available memory at the end of the transfer, or the space is returned gradually to the pool during the transfer. Given the large size of the STF, it is relevant to see whether this special management would provide any substantial benefit.
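The difference between the two policies can be illustrated with a toy model of a single FLP buffer (all rates and sizes below are illustrative placeholders, not the simulation inputs): memory is either freed only when the STF transfer completes, or freed gradually as data leaves the node.

```python
# Toy comparison of the two buffer-management policies described above.
stf_size_gb     = 2.0        # sub-Time-Frame produced by one FLP (illustrative)
production_rate = 2.0        # GB/s written into the buffer by the read-out
transfer_rate   = 2.5        # GB/s sent out towards the EPNs
dt, t_end       = 0.01, 60.0

def peak_occupancy(gradual):
    buffered, in_flight, sent, peak, t = 0.0, 0.0, 0.0, 0.0, 0.0
    while t < t_end:
        buffered += production_rate * dt
        if in_flight == 0.0 and buffered >= stf_size_gb:
            in_flight, buffered = stf_size_gb, buffered - stf_size_gb
        if in_flight > 0.0:
            sent = min(in_flight, sent + transfer_rate * dt)
            if gradual:
                occupancy = buffered + (in_flight - sent)   # freed as it is sent
            else:
                occupancy = buffered + in_flight            # freed only at the end
            if sent >= in_flight:
                in_flight, sent = 0.0, 0.0
        else:
            occupancy = buffered
        peak = max(peak, occupancy)
        t += dt
    return peak

print(f"peak buffer, freed at end of transfer : {peak_occupancy(False):.2f} GB")
print(f"peak buffer, freed gradually          : {peak_occupancy(True):.2f} GB")
```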

Figure 10.6 shows the buffering needs for the TPC and OD FLPs of layout 2 with and without buffer management, for the two conditions simulated above: a maximum of 100 or 20 concurrent transfers out of the FLPs (left-hand and right-hand panels respectively).

Figure 10.6: Size of the memory buffers with and without memory management for a network layout with four EPN subfarms, for 100 (left) and 20 (right) parallel transfers from the FLPs.


Figure 10.7 shows the buffering needs in the TPC and OD FLPs of layout 3 with and without buffer management.

10.5.3 Storage simulation

The simulation has also allowed the evaluation of the total amount of data that will be stored. Figure 10.8 shows the total amount of data stored in the O2 facility as a function of time, culminating in a total of about 30 PB. A realistic input data pattern has been generated based on the length of the Pb–Pb runs in 2011 and the data size and reduction foreseen for RUN 3.

10.5.4 Simulation of the system scalability

The O2 system must scale from its nominal interaction rate of 50 kHz up to 100 kHz. The simulation has been used to verify its scalability and to determine the rate at which the system no longer scales.

Figure 10.7: Size of the memory buffers with and without memory management for a network layout with SEPNs.

Figure 10.8: Total data stored for a month of data taking similar to RUN 1.

The scalability has been estimated by simulating the total latency of a TF in the system. This latency is stable as long as the system is within its scalability range and increases outside of it. As can be seen in Fig. 10.9, the system scales well up to 166 kHz but diverges at 200 kHz.

Figure 10.9: System scalability at up to 166 kHz.

Chapter 11

Project organisation, cost estimate and schedule

11.1 Project management

The O2 project is run by the leaders of the DAQ, HLT and Offline projects. They are also members of the O2 Steering Board (SB), the body that deals with managerial, financial, technical and organisational matters.

The R&D work is being carried out by the O2 Computing Working Groups (CWGs), listed in Tab. 11.1. The CWGs include members from the DAQ, HLT and Offline projects and from other institutes which have joined the O2 project.

Table 11.1: O2 Project Computing Working Groups and their topics.

Group    Topic                          Group     Topic
CWG 1    Architecture                   CWG 8     Physics Simulation
CWG 2    Tools & Procedures             CWG 9     Quality Control, Visualization
CWG 3    Data flow                      CWG 10    Control, Configuration, Monitoring
CWG 4    Data Model                     CWG 11    Software Lifecycle
CWG 5    Computing Platforms            CWG 12    Hardware
CWG 6    Calibration                    CWG 13    Software framework
CWG 7    Reconstruction


11.2 O2 Project

The O2 Project holds regular plenary meetings during ALICE weeks and mini-weeks. These meetings are open to all members of the ALICE Collaboration. Plenary meetings provide a forum for discussion and presentation of major topics on the design, construction and operation of the O2 system, as well as for regular reports from the CWGs.

The institutes are represented by their Team Leaders in the O2 Institute Board (IB). The O2 Project Leaders are ex-officio members of the IB. The managerial, financial and organisational issues are discussed and decided on in the IB. This board also endorses technical matters recommended by the CWGs and proposed by the SB.

All institutes participating in the O2 Project are shown in Tab. 11.2.


Table 11.2: Institutes participating in the O2 Project.

     Country          City            Institute                                                                                   Acronym
1    Brazil           São Paulo       University of São Paulo                                                                     USP
2    CERN             Geneva          European Organization for Nuclear Research                                                  CERN
3    Croatia          Split           Technical University of Split                                                               FESB
4    Czech Republic   Rez u Prahy     Nuclear Physics Institute, Academy of Sciences of the Czech Republic                        ASCR
5    France           Clermont-Ferrand   Laboratoire de Physique Corpusculaire (LPC), Université Blaise Pascal Clermont-Ferrand, CNRS-IN2P3   LPC
6    France           Grenoble        Laboratoire de Physique Subatomique et de Cosmologie (LPSC), Université Grenoble-Alpes, CNRS-IN2P3      LPSC
7    France           Nantes          SUBATECH, Ecole des Mines de Nantes, Université de Nantes, CNRS-IN2P3                       SUBATECH
8    France           Orsay           Institut de Physique Nucléaire (IPNO), Université Paris-Sud, CNRS-IN2P3                     IPNO
9    France           Strasbourg      Institut Pluridisciplinaire Hubert Curien                                                   IPHC
10   Germany          Darmstadt       Research Division and ExtreMe Matter Institute EMMI, GSI Helmholtzzentrum für Schwerionenforschung      GSI
11   Germany          Frankfurt       Frankfurt Institute for Advanced Studies, Johann Wolfgang Goethe-Universität                FIAS
12   Germany          Frankfurt       Institut für Informatik, Johann Wolfgang Goethe-Universität Frankfurt                       IRI
13   Hungary          Budapest        Wigner RCP, Hungarian Academy of Sciences                                                   WRCP
14   India            Jammu           University of Jammu                                                                         JU
15   India            Mumbai          Indian Institute of Technology                                                              IIT
16   Indonesia        Bandung         Indonesian Institute of Sciences                                                            LIPI
17   Korea            Daejeon         Korea Institute of Science and Technology Information                                       KISTI
18   Korea            Sejong City     Korea University                                                                            KU
19   Poland           Warsaw          Warsaw University of Technology                                                             WUT
20   Romania          Bucharest       Institute of Space Science                                                                  ISS
21   South Africa     Cape Town       University of Cape Town                                                                     UCT
22   Thailand         Bangkok         King Mongkut's University of Technology Thonburi                                            KMUTT
23   Thailand         Bangkok         Thammasat University                                                                        TU
24   Turkey           Konya           KTO Karatay University                                                                      KTO
25   United States    Berkeley, CA    Lawrence Berkeley National Laboratory                                                       LBNL
26   United States    Detroit, MI     Wayne State University                                                                      WSU
27   United States    Houston, TX     University of Houston                                                                       UH
28   United States    Knoxville, TN   University of Tennessee                                                                     UTK
29   United States    Oak Ridge, TN   Oak Ridge National Laboratory                                                               ORNL
30   United States    Omaha, NE       Creighton University                                                                        CU
31   United States    Pasadena, CA    California Institute of Technology                                                          CALTECH

11.3 Schedule

The ALICE upgrade is planned to be operational after LS2, continuing into the High Luminosity LHC (HL-LHC) era after LS3 in 2025. According to the current LHC schedule, LS2 will take place in 2018-19 and LS3 will start in 2023. As for RUN 1, the O2 software framework will be released early and often, and the O2 facility will be incrementally deployed to give optimal support to the detector upgrade activities. A first version of the software framework, including that for detector read-out, will be released at the start of 2017, in time for the first tests with the ITS, MFT and TPC electronics. Significant fractions of the facility will be available for the global commissioning of the ITS, MFT and TPC detectors towards the end of 2018, before these detectors are installed in the experimental area. The system deployed in 2019 will handle the read-out of all detectors and the data processing at a reduced rate. The deployment of the full data processing capacity will continue in 2020, following a schedule compatible with the LHC and the beginning of the heavy-ion collisions. The main project activities and milestones are shown in the Gantt chart in Fig. 11.1.

11.4 Responsibilities and human resources estimates

The work on the design and development of the O2 system has been distributed among the participating institutes. A summary of the responsibilities is shown in Tab. 11.3 for the period from 2015 to 2019.

Table 11.3: Sharing of responsibilities and human resources needed.

Tasks                                            Institutes                                              Human resources (FTE)
Architecture                                     CERN, FIAS, GSI, IRI                                    2
Tools, procedure and software process            CERN, IPNO, JU, LIPI, WRCP                              2
Data flow, detector read-out                     CALTECH, CERN, FESB, FIAS, IRI, LIPI, WRCP              12
Computing platforms                              CERN, FIAS, IRI, JU, KISTI, KMUTT, KU, ORNL             12
Software framework and data model                CERN, IPNO, GSI, LBNL                                   14
Calibration                                      JU, WSU                                                 16
Reconstruction                                   CERN, FESB, GSI, IPHC, LIPI, LPC, SUBATECH, UH, WSU     16
Physics simulation                               CERN, CU, IPHC, IPNO, LBNL, ORNL, UH, UTK               14
Data quality monitoring and visualisation        CERN, ISS, JU, WUT                                      6
Control, configuration, monitoring and logging   ASCR, CALTECH, CERN, CU, KMUTT, IRI                     10
O2 facility hardware procurement, installation   CERN, FIAS, IRI, GSI                                    8
O2 facility and grid/cloud operation             CERN, KISTI                                             M&O

The manpower estimate for software tasks closely related to the detectors includes the design and development of the core software and the integration and commissioning of the detector-specific algorithms, but not the design and development of the algorithms themselves.

Figure 11.1: O2 Project schedule.

11.5 Cost estimates and spending profile

The project cost estimate is given in Tab. 11.4. All CORE costs are included: the material cost of the equipment purchased or produced, the industrial or outsourced manpower for production, and the general infrastructure at the experimental area. The institutes' personnel costs and the R&D activities are not included.

Table 11.4: Cost estimates and spending profile.

Item             Cost estimate (kCHF)   Spending profile (kCHF)
                                        2018     2019     2020
Infrastructure   1'291                  775      517
FLPs and CRUs    779                    312      467
EPNs             3'793                  379      759      2'665
Network          817                    123      123      572
Data storage     2'059                  103      103      1'853
Servers          482                    337      145
Total            9'020                  2'028    2'112    5'080


The maintenance and operation costs are not included in Tab. 11.4 but will be funded by the ALICE Maintenance and Operation (M&O) budget, as during RUN 1 and 2.

11.6 Production and procurement

The O2 facility will consist almost entirely of commercial off-the-shelf equipment (servers, PCs, data storage equipment, network switches and routers). This equipment will be procured by one of the institutes involved in the project according to the calendar specified in Tab. 11.4. None of the equipment comes from a single source, so the procurement can follow the standard procedure for a competitive tender.

The CRU, used to perform the TPC cluster finding in its FPGA, is the only piece of equipment which might not be commercial off-the-shelf. The CRUs will be financed by the detector groups and the O2 project. They will be acquired centrally by the collaboration. The baseline solution for the CRU is the PCIe40 board designed by the LHCb collaboration. The full specification will be made in collaboration with LHCb and the hardware purchased after a competitive tender.

11.7 Risk analysis

The project risks have been taken into consideration and are addressed in the following paragraphs.

All the hardware and software technologies used for the O2 system already exist and have been demonstrated to work according to the requirements. None of the hardware items under consideration comes from a single source. Every piece of equipment used today for the tests could be replaced by one of another brand at the time of the purchase. The technologies employed do not constitute risks for the project.

The budget of the project has been evaluated on the basis of recent purchases by ALICE or the CERN IT department and on conservative projections of the price evolution. Still, there might be variations due to independent factors. The budget estimate includes a margin to meet this risk. Should a component turn out to be more expensive than anticipated, its deployment would be staged.

Exchange rate risk: the project's budget is in Swiss Francs, but a large fraction of the contributions is made in Euros. The project's expenses mostly concern computing equipment, the prices of which are initially expressed in US Dollars. Variations of the exchange rates between these three currencies can affect the budget of the project.

If a risk related to the budget materialises, the deficit will be covered by increasing the fraction of data sent to the Tier 1 sites for asynchronous processing and by parking data for processing during LS3.

The project schedule has been evaluated taking into account all the experience accumulated by the different teams contributing to the project. The hardware procurement and deployment have been scheduled with sufficient margin to absorb delays due to the administrative procedures or in the delivery of goods. Software development is often a difficult process to track. The strategy adopted in this project is to release early and often in order to have realistic feedback on the progress. Margins have been introduced for all the major deliverables to address unforeseen delays.

11.8 Data preservation

A Study Group on Data Preservation and Long Term Analysis in High Energy Physics (DPHEP) was established in 2009, involving all of the main HEP institutes and experiments worldwide. In 2012, the group brought out a Blueprint document [1]. A summary of this publication was used in the update of the European Strategy for Particle Physics. The revised version of the strategy was officially adopted at the 16th European Strategy Session of the CERN Council in Brussels in May 2013. Data preservation is one of the few computing and infrastructure related items included in the document.

To be cost effective, long term data preservation must be fully integrated into the ongoing activities of the experiments and, in the context of WLCG, of the project itself. Being able to retain the full potential of the data over decades is not a trivial matter, nor is it guaranteed to succeed, as past experience shows. Some changes in culture and practice may be required to reach the desired goals effectively. These changes imply designing the software, the complete environment including the framework, as well as the necessary validation steps, in such a way as to allow future use of the data. The DPHEP Blueprint foresees that common projects should include the use of virtualisation techniques, validation frameworks and collaboration between different projects. This last point is directed specifically at the ongoing work on the redesign and implementation of software for concurrency and modern architectures.

Documentation, long term data preservation at various levels of abstraction, data access, analysis policy and software availability are the key elements of a data preservation strategy that allows future collaborators, the wider scientific community and the general public to analyse the data for educational purposes and for eventual reassessment of the published results.

11.9 Open access

In 2013 the ALICE Collaboration Board approved a Data Preservation Strategy document, which is in line with similar documents already drafted by other experiments and stipulates the principle of open access to the data, software and documentation.

The ALICE collaboration is in agreement with the principle of open access to data, software and documentation, which will allow the processing of data by non-ALICE members under the conditions listed in the future policy implementation document. Data at a high level of abstraction, such as AODs, will be conditionally made public after 5 years for 10% of the data and after 10 years for 100% of the data. Depending on the available resources for open access, external users can be conditionally granted access to computing resources to process the data. Different levels of preservation and open access are currently foreseen

and described in the ALICE policy document.

11.10 O2 TDR editorial committee

The composition of the O2 editorial committee is: L. Betev, P. Buncic, S. Chapeland, F. Cliff, P. Hristov, T. Kollegger, M. Krzewicki, K. Read, J. Thäder, P. Vande Vyvre, B. von Haller.

11.11 O2 TDR task force

3209 The following persons have contributed to the work presented in this TDR (The arabic institute numbers 3210 refer to the list from Tab. 11.2 and the roman institute numbers refer to footnotes): 22 4 15 22 1 11 3211 T. Achalakul , D. Adamova , N. Agrawal , K. Akkarajitsakul , A. Alarcon Do Passo Suaide , T. Alt , 10 1 15 7 15 11 17 3212 M. Al-Turany , C. Alves Garcia Prado , Ananya , L. Aphecetche , A. Avasthi , M. Bach , S. Bae , 14 13 14 9 i 2 27 11, 12 3213 R. Bala , G.G. Barnafoldi¨ , A. Bhasin , J. Belikov , F. Bellini , L. Betev , L. Bianchi , T. Breitner , 2 2 2 22 2 30 2 3214 P. Buncic , F. Carena , W. Carena , B. Changaival , S. Chapeland , M. Cherney , V.Chibante Barroso , 18 2 ii 2 5 2 15 13 3215 H. Cho , P. Chochula , F. Cliff , F. Costa , P. Crochet , L. Cunqueiro Mendez , S. Dash , E. David , 2 13 21 2 10 12 11 2 3216 C. Delort , E. Denes´ , T. Dietel , R. Divia` , B. Doenigus , H. Engel , D. Eschweiler , U. Fuchs , 2 20 12 3 11 19 2 3217 A. Gheata , M. Gheata , A. Gomez Ramirez , S. Gotovac , S. Gorbunov , L. Graczykowski , A. Grigoras , 2 2 27 6 14 16 3218 C. Grigoras , A. Grigore , R. Grosso , R. Guernane , A. Gupta , N. Hadi Lestriandoko , F. Hen- 16 8 2 2 10 19 17 3219 san Muttaqien , I. Hrivnˇ a´covˇ a´ , P. Hristov , C. Ionita , M. Ivanov , M. Janik , H. Jang , P. Jen- 22 11 1 12 15 4 11 3220 viriyakul , S. Kalcher , N. Kassalias , U. Kebschull , R. Khandelwal , S. Kushpil , I. Kisel , 13 13, iii 10 iv 11 11 12 3221 G. Kiss , T. Kiss , T. Kollegger , M. Kowalski , M. Kretz , M. Krzewicki , I. Kulakov , 29 12 10 31 11 v 10 3222 Laosooksathit , C. Lara , A. Lebedev , I. Legrand , V.Lindenstruth , A. Maevskaya , P. Malzacher , 10 2 3 22 15 20 19 3223 A. Manafov , A. Morsch , E. Mudnic , S. Na Ranong , B. Nandi , M. Niculescu , J. Niedziela , vi 3 3 17 22 7 vii 19 3224 J. Otwinowski , J. Ozegovic , V.Papi , S. Park , P. Phunchongharn , P. Pillot , M. Planinic , J. Pluta , vii 22 22 22 10 14 28 3225 N. Poljak S. Prom-on , K. Pugdeethosapol , S. Pumma , A. Rabalchenko , S. Rajput , K. Read , 2 viii 11 16 13 16 2 14 3226 A. Ribon , M. Richter , D. Rohr , D. Rosiyadi , G. Rubin , R. Sadikin , R. Shahoyan , A. Sharma , 2 ii 2 16 18 19 2 3227 G. Simonetti , O. Smorholm C. Soos´ , S. Sumowidagdo , J. Sun , M. Szymanski , A. Telesca , 25 27 15 2 11, 12 3 2 3228 J. Thader¨ , A. Timmins , A. Udupa , P. Vande Vyvre , F. Vennedey , L. Vickovic , B. von Haller , 2 10 16 22 i 12 3229 S. Wenzel , N. Winckler , T. Wirahman , K. Yamnual , C. Zampolli , and M. Zyzak .

i    University and Sezione INFN, Bologna, Italy
ii   School of Physics and Astronomy, University of Birmingham, Birmingham, United Kingdom
iii  Cerntech Ltd., Budapest, Hungary
iv   The Henryk Niewodniczanski Institute of Nuclear Physics, Polish Academy of Sciences, Cracow, Poland
v    Institute for Nuclear Research, Academy of Sciences, Moscow, Russia
vi   Polish Academy of Sciences, Cracow, Poland
vii  Institute Rudjer Boskovic, Zagreb, Croatia
viii Department of Physics, University of Oslo, Oslo, Norway

Appendices

Appendix A

Glossary

Acronyms

A
AF          Dedicated Analysis Facility of HPC type
ALICE       A Large Ion Collider Experiment
AliROOT     ALICE software framework based on ROOT
AOD         Data type: Analysis Object Data
APU         Accelerated Processing Unit combining a CPU and a GPU

B
BC          Bunch Crossing

C
CAF         Central Analysis Facility
CCDB        Condition and Calibration Data Base
CCM         Control, Configuration and Monitoring
CDH         Common Data Header
CEPH        Storage platform designed to present object, block, and file storage from a distributed computer cluster
CERNVMFS    CERN Virtual Machine File System
CFS         Cluster File System
CPU         Central Processing Unit
CR1, 2, 3, 4   Counting Room Level 1 to 4 in PX24
C-RORC      Common Read-Out Receiver Card interfacing the DDL 2 to the computer I/O bus
CRU         Common Read-out Unit
CTF         Data type: Compressed Time Frame
CTP         Central Trigger Processor
CWG         O2 Computing Working Group

D
DAQ         Data Acquisition System
DCS         Detector Control System
DDL1, 2, 3  Detector Data Link successive releases
DDL SIU     DDL Source Interface Unit
DDP         Dynamic Disk Pool
DDR         Double Data Rate memory
DDS         Dynamic Deployment System
DIM         Distributed Information Management system
DM          Data Movers
DMA         Direct Memory Access
DNS         Domain Name Server
DRAM        Dynamic Random Access Memory
DSRO        Detector Specific Read-Out

E
EMCAL       Electromagnetic Calorimeter
EOS         CERN disk-based service providing a low latency storage infrastructure
EPN         Event Processing Node
ESD         Data type: Event Summary Data

F
FEE         Front-End Electronics
FDR         Infiniband Fourteen Data Rate
FIT         Fast Interaction Trigger
FLP         First Level Processor
FPGA        Field Programmable Gate Array
FSM         Finite-State Machine

G
GBT         GigaBit Transceiver
GPU         Graphics Processing Unit
GTU         Global Tracking Unit

H
HB          HeartBeat trigger
HBE         HeartBeat Event
HISTO       Data type: subset of AOD information specific for a given analysis
HLT         High-Level Trigger
HPC         High-Performance Computing
H-RORC      HLT Read-Out Receiver Card interfacing the DDL 1 to the computer I/O bus
HTTP        Hyper Text Transfer Protocol

I
I/O         Input/Output
IB          Infiniband
IP          Internet Protocol
IPoIB       IP over IB
IP core     Intellectual Property core
iSCSI       Internet Small Computer System Interface
ITS         Inner Tracking System

J
JSON        JavaScript Object Notation

K

L
LHC         Large Hadron Collider
LoI         ALICE upgrade Letter of Intent
LS1, 2, 3   LHC Long Shutdown respectively in 2013-14, 2018-19, 2023-24
LTU         Local Trigger Unit

M
MC          Monte-Carlo
MC          Data type: simulated energy deposits in sensitive detectors
MCH         Muon Chamber System
MDB         Data format: Multiple Data Block
MDH         Data format: Multiple Data Header
MDT         Data format: Multiple Data Trailer
MID         Muon Identifier
MFT         Muon Forward Tracker

N
NAS         Network Attached Storage
NUMA        Non-Uniform Memory Architecture

O
O2          New online-offline computing system for the LS2 ALICE upgrade
OO          Object-Oriented
OSD         Object Storage Daemons

P
PC          Personal Computer
PCI         Peripheral Component Interconnect
PCI-X       Peripheral Component Interconnect eXtended
PCIe Gen1, 2, 3, 4   Peripheral Component Interconnect Express successive releases
PHOS        Photon Spectrometer
PX24        Main access shaft to the eXperimental area at Point 2
PM25        Access shaft to the LHC Machine at Point 2

Q
QA          Quality Assurance
QC          Quality Control
QDR         Infiniband Quad Data Rate
QGP         Quark-Gluon Plasma
QSFP        Quad Small Form-factor Pluggable

R
RADOS       Ceph Reliable, Autonomic Distributed Object Store
RAID        Redundant Array of Independent Disks
RAM         Random Access Memory
RMS         Resource Management System
RORC        Read-Out Receiver Card
RUN 1, RUN 2, RUN 3   Periods of data taking operation of ALICE, respectively in 2009-13, 2015-18, 2020-22

S
SAN         Storage Area Network
SCADA       Supervisory Control and Data Acquisition system
SDB         Data format: Single Data Block
SDH         Data format: Single Data Header
SDT         Data format: Single Data Trailer
SMI         State Machine Interface
SoC         System on a Chip
SRAM        Static RAM
S/N         Signal-to-Noise ratio
STF         Sub-Time-Frame: a Time Frame containing data from a few sources

T
T1, T2, T3  Grid Tier 1, 2, 3
TDR         Technical Design Report
TF          Time-Frame: the data from all data sources of a period of time
TOF         Time Of Flight detector
TPC         Time Projection Chamber
TRD         Transition Radiation Detector
TRG         Trigger
TTC         Timing Trigger and Control
TTS         Trigger and Timing distribution System

U
UPS         Uninterruptible Power Supply
UX25        Underground eXperimental area

V
VHDL        VHSIC Hardware Description Language
VHSIC       Very-High Speed Integrated Circuit

W
WLCG        Worldwide LHC Computing Grid

Z
ZDC         Zero Degree Calorimeter
ZeroMQ or 0MQ   High-performance asynchronous messaging library

Appendix B

Integrated luminosity requirements for proton–proton reference runs

The integrated luminosity required for the pp reference runs is estimated starting from the consideration that the statistical uncertainty on the pp reference has to be negligible with respect to that of the Pb–Pb measurement. In order to set numbers, the pp uncertainty is required to be √2 times smaller than the Pb–Pb uncertainty, so that the combined relative statistical uncertainty for e.g. an RAA observable is at most 20% larger than the Pb–Pb uncertainty. Since the relative statistical uncertainty is the inverse of the statistical significance, 1/S = √(S + B)/S, the requirement is:

    S_pp = √2 · S_PbPb .    (B.1)

This condition leads to different requirements in terms of statistics, depending on the S/B ratio, thus on the background level. For high-background measurements (which is the case for essentially all topics discussed in Chap. 2), S ≪ B, thus S = S/√B, and the condition becomes

    S_pp/√B_pp = √2 · S_PbPb/√B_PbPb

thus

    N_pp = 2 · N_PbPb · [(S/√N)_PbPb / (S/√N)_pp]²

3296 In the following, the specific cases of several of the heavy flavour and quarkonium measurements are 3297 discussed.

3298 Open heavy flavour

3299 After the second long shut-down (LS2), the LHC is foreseen to accelerate the hadron beams at the nomi- 3300 nal energy of Z/A 7 TeV, and the centre-of-mass energy will be √s = 14 TeV for pp and √s = 5.5 TeV · NN 3301 for Pb–Pb collisions. If the pp reference data are collected at 14 TeV, the charm and beauty production 3302 cross sections have to be scaled to the Pb–Pb energy, in order to define the nuclear modification factors 3303 RAA(pT). The scaling factors can be obtained using perturbative QCD calculations, like FONLL [1]. The 3304 definition of the scaling factor and of its theoretical uncertainty are described in [2]. Figure B.1 shows the 3305 relative uncertainties of the scaling factors for D mesons (left) and B mesons (right) from 14 to 5.5 TeV. 3306 The scaling factors and their uncertainties for Λc and Λb baryons are the same as for D and B mesons, 3307 respectively. For the case of charm at low pT, the uncertainty is larger than 50% (pT < 2 GeV/c). This 3308 would be the dominant uncertainty in the measurement of the D and Λc nuclear modification factors. 3309 Therefore, for charm, the reference data should be collected with pp collisions at √s = 5.5 TeV. For

133 134 The ALICE Collaboration

0.6 0.6 D mesons, 14 TeV → 5.5 TeV B meson, 14 TeV → 5.5 TeV 0.5 0.5

0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1 Relative scaling uncertainty Relative scaling uncertainty 0 0

-0.1 -0.1

0 5 10 15 20 25 30 0 5 10 15 20 25 30 p (GeV/c) p (GeV/c) T T Figure B.1: Relative uncertainty for the scaling factor of the cross section of D (left) and B (right) mesons from √s = 14 to 5.5 TeV using FONLL [1].

For beauty hadrons, the uncertainty of the scaling factor is smaller than 10–15%. Therefore, it is not expected to be larger than the systematic uncertainties of the measurements in Pb–Pb and pp collisions. This gives the possibility to use, for the reference measurement of B+ and Λb, proton–proton collisions at the full LHC energy, where the beauty production cross section is larger by a factor of two to three, so that a correspondingly lower integrated luminosity is required.

The required integrated luminosity in pp collisions was estimated using the high-background procedure described at the beginning of this section for the following measurements: D0, Λc, and the beauty production measurement via non-prompt J/ψ (the latter is described in the next section together with the charmonium measurements). Estimates are in progress for the other beauty hadron studies described in [3], namely B+ and Λb.

The estimate for the D0 meson is described in the ITS Upgrade CDR and corresponds to Lint = 6 pb⁻¹ at √s = 5.5 TeV [4].

For Λc the following procedure was used. The significance per event in central Pb–Pb collisions, (S/√N)_PbPb, was taken from the ITS Upgrade TDR [3]. The significance per event in pp collisions, (S/√N)_pp, was estimated by multiplying (S/√N)_PbPb with a correction factor for the signal per event and a correction factor for the background per event. The former is the inverse of the product of the average number of binary nucleon–nucleon collisions in central Pb–Pb collisions and the (pT-dependent) nuclear modification factor RAA that was assumed for the Pb–Pb studies of each particle. The latter, the scaling factor for the background, was estimated as the (pT-dependent) ratio of the background yield in pp simulations and in Pb–Pb simulations at the same energy. The resulting integrated luminosity that fulfils the requirement of a √2-larger significance in pp than in Pb–Pb depends slightly on pT. The maximum value was defined as the required integrated luminosity for the measurement. The resulting value is Lint = 5 pb⁻¹ at √s = 5.5 TeV. Table B.1 reports the projected statistical uncertainties for Pb–Pb collisions (10 nb⁻¹) and pp collisions.
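A minimal sketch of this scaling, with placeholder inputs (none of the values below are the ones used in the actual estimate); for a high-background measurement the background correction enters under a square root:

```python
import math

def signif_per_event_pp(signif_per_event_pbpb, n_coll, r_aa, bkg_ratio_pp_over_pbpb):
    """Scale (S/sqrt(N)) from central Pb-Pb to pp: the signal per event scales
    with 1/(Ncoll * RAA), the background per event with the simulated pp/Pb-Pb
    background ratio, which enters as 1/sqrt for a high-background measurement."""
    signal_factor = 1.0 / (n_coll * r_aa)
    background_factor = 1.0 / math.sqrt(bkg_ratio_pp_over_pbpb)
    return signif_per_event_pbpb * signal_factor * background_factor

# Placeholder inputs (hypothetical, pT-dependent in the real procedure):
s_pbpb           = 1.0e-3   # (S/sqrt(N)) in central Pb-Pb
n_coll           = 1600     # average number of binary collisions, central Pb-Pb
r_aa             = 0.8      # assumed nuclear modification factor
bkg_pp_over_pbpb = 1.0e-3   # background per event, pp / Pb-Pb

print(f"(S/sqrt(N)) in pp: "
      f"{signif_per_event_pp(s_pbpb, n_coll, r_aa, bkg_pp_over_pbpb):.2e}")
```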

Table B.1: Statistical uncertainties for the Λc production measurement in central Pb–Pb collisions (10 nb−1) [3] and pp collisions, for a subset of the pT intervals covered by the measurement.

pT (GeV/c)    Λc, Pb–Pb 0–20%    Λc, pp 5.5 TeV (5 pb−1)
2–4           12%                8%
6–8           4%                 2%
7–10          4%                 3%
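To make the estimation procedure concrete, the short sketch below (a toy calculation with hypothetical function and parameter names, not part of any O2 analysis code) shows how the required number of pp events follows from the per-event significance in Pb–Pb under the assumptions listed above:

import math

def required_pp_events(signif_per_event_pbpb, n_coll, r_aa,
                       bkg_ratio_pp_to_pbpb, n_events_pbpb):
    """Toy estimate following the procedure described in the text.

    signif_per_event_pbpb : (S/sqrt(N)) per central Pb-Pb event at a given pT
    n_coll                : average number of binary nucleon-nucleon collisions
    r_aa                  : nuclear modification factor assumed for the Pb-Pb studies
    bkg_ratio_pp_to_pbpb  : background per event in pp / background per event in Pb-Pb
    n_events_pbpb         : number of central Pb-Pb events in the 10 nb^-1 sample
    """
    # Signal per event is reduced by 1/(<Ncoll> * RAA); the background in the
    # denominator scales with the simulated pp/Pb-Pb background ratio.
    signif_per_event_pp = (signif_per_event_pbpb / (n_coll * r_aa)
                           / math.sqrt(bkg_ratio_pp_to_pbpb))
    # The total significance grows as sqrt(N_events) times the per-event value;
    # require it to be sqrt(2) times larger in pp than in Pb-Pb.
    target_signif = math.sqrt(2.0) * math.sqrt(n_events_pbpb) * signif_per_event_pbpb
    return (target_signif / signif_per_event_pp) ** 2

Dividing the resulting number of pp events by the pp cross section used for the event counting gives the integrated luminosity; taking the maximum over the measured pT intervals corresponds to the logic behind the quoted 5 pb−1.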

Charmonium

For charmonium measurements the proton–proton reference should be collected at the same centre-of-mass energy as the Pb–Pb data, because the systematic uncertainty of the energy scaling from 14 to 5.5 TeV would be similar to that for open charm measurements, namely up to 30–50% for pT close to 0.

For charmonium measurements in the dielectron decay channel at central rapidity the signal-to-background ratio is small. For example, for J/ψ in central Pb–Pb collisions, it ranges from 0.01 at pT < 1 GeV/c to 0.20 at 10 GeV/c. Therefore, the relation for high-background measurements discussed at the beginning of this section was used. The significance per event in pp collisions, (S/√N)pp, was taken from the analysis of inclusive and non-prompt J/ψ production in pp data at √s = 7 TeV [5]. It was assumed that the significance per event is the same at √s = 7 and 5.5 TeV and that it is the same with the present ALICE detector layout and with the upgraded detector. The latter is a realistic assumption for inclusive J/ψ and a conservative one for non-prompt J/ψ from B meson decays, for which the significance per event is expected to be larger with the improved tracking precision of the new ITS detector. The resulting integrated luminosity required for pp collisions at √s = 5.5 TeV is about 5 pb−1.

For charmonium measurements at forward rapidity with the MFT and the MUON spectrometer, the inclusive J/ψ and ψ(2S) significance per event for pp collisions at √s = 5.5 TeV was conservatively taken to be the same as for measurements with the MUON spectrometer alone at √s = 7 TeV [6]. The statistical uncertainty for non-prompt J/ψ was assumed to scale from Pb–Pb to pp by the same factor as the one for inclusive J/ψ. The statistical uncertainties for Pb–Pb collisions (10 nb−1) and pp collisions (5 pb−1) are reported in Tab. B.2.

Table B.2: Statistical uncertainties for inclusive and non-prompt J/ψ and for inclusive ψ(2S), at central and forward rapidity, in Pb–Pb collisions (10 nb−1) and pp collisions (5 pb−1) at √sNN = 5.5 TeV, for a subset of the pT intervals covered by the measurements. The text ‘not av.’ indicates that the estimate of the uncertainty is not available; the text ‘not acc.’ indicates that the measurement is not accessible for a given pT interval. The statistical uncertainties for Pb–Pb collisions are extracted from [7] for inclusive J/ψ and ψ(2S) at central rapidity, from [8] for inclusive J/ψ and ψ(2S) at forward rapidity, from [3] for non-prompt J/ψ at central rapidity, and from [9] for non-prompt J/ψ at forward rapidity. The statistical uncertainties for pp collisions are estimated starting from those in [5] and [6] for central and forward rapidity, respectively (more details are given in the text).

pT (GeV/c)    Incl. J/ψ                 Non-prompt J/ψ            Incl. ψ(2S)
              Pb–Pb 0–10%   pp          Pb–Pb 0–10%   pp          Pb–Pb 0–10%   pp
Central rapidity, dielectron channel
> 0           not av.       not av.     not av.       not av.     7%            5%
0–1           1%            0.5%        not acc.      not acc.    not av.       not av.
1–2           0.5%          0.5%        5%            3.5%        not av.       not av.
3–4           1%            0.7%        4%            1.5%        not av.       not av.
7–8           3%            2%          6%            3%          not av.       not av.
Forward rapidity, dimuon channel
> 0           0.07%         0.30%       not av.       not av.     4.3%          3.0%
0–1           0.23%         0.75%       1.5%          1.3%        10.7%         7.5%
4–5           0.18%         0.75%       not av.       not av.     10.9%         7.5%

Appendix C

Tools and procedures

Two working groups (CWG2 on Tools, Procedures and Guidelines, and CWG11 on Software Lifecycle) have been established to evaluate, respectively, common tools for the O2 project and tools belonging to the software lifecycle.

C.1 Evaluation procedure

In order to set the basis for the evaluation of tools, CWG2 has defined a procedure to be followed when the need for a tool is identified and a selection should be made. The evaluation procedure has been proposed to all working groups and approved; it can be summarized as follows:

– The person or group (the “evaluator”) carrying out the evaluation should contact CWG2 to check whether the tool is already being surveyed by someone else.

– The evaluator should define the problem and identify a list of requirements.

– The evaluator should gather specifications by getting input from at least the three projects behind the O2 framework (HLT, DAQ, Offline). The specification gathering is an opportunity to benefit from their experience.

– The evaluator should prepare a list of candidate tools. The list of requirements should be sent to CWG2 for information, as well as the list of tools and the possible candidate for a shortcut.

– CWG2 will quickly check that the procedure has been followed and that the three projects have been contacted for the requirements gathering.

– If a single tool that covers the requirements is identified, the proposal can be presented to the O2 plenary meeting after CWG2 has been informed.

– If a consensus is found at the meeting, the proposal is accepted as official; otherwise an evaluation will follow.

– If there are several candidate tools in the shortlist, the evaluator will identify the specific criteria used to compare the shortlisted tools, start prototyping (if needed) and evaluate the tools in the shortlist.

– The evaluator will share his/her proposal with all working group members via email and subsequently present the findings at the O2 plenary meeting.

– Eventually, an implementation of the proposed solution will follow.


Figure C.1 shows the schematic of the evaluation procedure.

Figure C.1: The evaluation procedure

C.2 Selected tools

While establishing the evaluation procedure, CWG2 identified a list of tools that needed to be selected in order to carry out activities common to the other working groups. The tools in the list have then been categorized by priority. The next sections describe the evaluations that led to the selection of tools covering the following tasks:

– Support: issue/bug tracking and version control.

– Documentation: website creation and source code documentation.

– Software: build system and computing simulation.

C.2.1 Issue/Bug tracking system: JIRA

An evaluation of different tools for bug and issue tracking has been carried out, and the tool meeting the requirements was found to be JIRA [1].

In particular, the bug tracking part can be considered as a specialization of the issue tracking system, since it focuses on software bugs, improvements and tasks. The issue tracking side should manage activities, issues and interventions, mostly related to the experiment’s operations and to system administration activities. Specifications have been collected and resulted in the list of mandatory requirements summarized below.

– Users:

  – Read access for the ALICE collaboration (about 1500 users)

  – Write access for the ALICE O2 project developers and system administrators (about 100 users)

  – Incremental addition of users

– Authentication/authorization: users shall be able to use their CERN NICE credentials. It should be possible to use the CERN e-groups as well. This implies a way to define groups and/or roles.

– Multiple projects: the tool should allow creating multiple projects.

– Software versioning: it should be possible to associate versions with projects.

– Sub-projects / components: the issue tracking system should be able to reflect a structure divided into several modules.

– API: the tool should provide an interface (API) for integration with other systems, for automatic issue creation and resolution, or for extraction of information.

– Integration with the version control system: the tool is expected to be able to show the commits related to a ticket (consider at least SVN [2**] and Git [3**] integration).

– Customization: the issue workflow should be customizable according to issue type and/or project. This means that the issue states, and their transitions, can be modified to fit specific use cases.

– Prioritization of issues

– Notification capabilities

Several tools have been considered as candidates for bug/issue tracking, and JIRA was found to meet the requirements.

JIRA has been proposed to the computing working groups and accepted.
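As an illustration of the API requirement above, tickets can be created programmatically through JIRA's standard REST interface. The sketch below is a minimal example; the server URL, project key and credentials are placeholders and not the actual O2 setup:

import requests  # third-party HTTP library

JIRA_URL = "https://alice-o2-jira.example.cern.ch"   # placeholder server
AUTH = ("username", "password")                      # placeholder credentials

def create_issue(project_key, summary, description, issue_type="Bug"):
    """Create a ticket via the JIRA REST API (POST /rest/api/2/issue)."""
    payload = {
        "fields": {
            "project": {"key": project_key},
            "summary": summary,
            "description": description,
            "issuetype": {"name": issue_type},
        }
    }
    response = requests.post(JIRA_URL + "/rest/api/2/issue",
                             json=payload, auth=AUTH, timeout=10)
    response.raise_for_status()
    return response.json()["key"]   # e.g. "O2-123"

# Example use (hypothetical project key): a monitoring agent opening a ticket.
# create_issue("O2", "Readout link down", "Automatic report from the monitoring system.")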

C.2.2 Version control system: Git

An evaluation was carried out in order to find a tool able to track and provide control over changes of the project source code and to facilitate the collaboration between ALICE O2 software developers. The recommended tool meeting the requirements was found to be Git [2]. The evaluation was based on the specifications and requirements collected by CWG2 from the ALICE O2 collaboration:

– Integration with CERN central services

– Authentication and authorization using CERN credentials

– User management with CERN e-groups

– 24/7 support (service hosting, support, daily backups)

– Service web interface synchronized with CERN authentication and authorization

– Ability to store large repositories (more than 1 GB of sources)

– Support for multiple operating systems, with support for CERN SLCxxx mandatory

– Atomic commits: a set of distinct changes is applied as a single operation

– Capability to handle locks

– Renamed/copied/moved/removed files and directories retain the full revision history

– Native support for binary files

– Hooks: the possibility to run different scripts before or after an operation on the repository

– Possibility to automatically export the sources as an archive or in a different format

– Dated checkouts (get a snapshot of the repository trunk at a given date/time; mostly necessary for the automatic build system, to make builds reproducible)

– Purge capabilities

– Capability to import from the former VCS into the new one

CWG2 followed the Evaluation Procedure and found Git to be the most appropriate tool for the version control system of the ALICE O2 collaboration.
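Git does not take a date directly at checkout, so the "dated checkout" requirement listed above can be met by resolving the last commit before a given date and checking that commit out. A minimal sketch (repository path and date are placeholders) could look as follows:

import subprocess

def checkout_at_date(repo_dir, branch, date_string):
    """Check out the last commit on `branch` made before `date_string`
    (e.g. "2015-03-01 12:00"), which makes automated builds reproducible."""
    commit = subprocess.check_output(
        ["git", "rev-list", "-n", "1", "--before=" + date_string, branch],
        cwd=repo_dir).decode().strip()
    subprocess.check_call(["git", "checkout", commit], cwd=repo_dir)
    return commit

# Example (placeholder path): pin a nightly build to a fixed point in time.
# checkout_at_date("/build/O2", "master", "2015-03-01 12:00")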

C.2.3 Website creation tool: Drupal

Tools for creating, editing and publishing websites have also been evaluated. The recommended tool meeting the requirements was found to be Drupal [3]. The focus was put on how to make the website content available online to the ALICE O2 community, how to easily add and maintain online content, and how to integrate content generated by external tools. Specifications have been collected and resulted in the list of mandatory requirements summarized below.

– Easy editing: the tool must allow users to create or edit content in a simple and easy way. Even users without knowledge of HTML should be able to contribute to the website, ideally via a WYSIWYG editor. It must nevertheless be possible to add HTML and/or specific markup languages directly.

– Content organization: the tool must allow for efficient content structuring and the development of an intuitive website hierarchy and navigation.

– Efficient search: the tool must provide an efficient search engine that allows users to find pages without having to navigate the website.

– Revision control: the tool must allow users to access the change history of a page, see the differences between versions of a page and watch pages for changes, ideally with email notification of changes.

– Look and feel: the tool must provide an attractive look and feel by default, ideally via selectable themes.

– Access control: the tool must provide an access control mechanism allowing the following cases: no access, read-only access, and write access per page or group of pages.

– Integration with CERN authentication services: users should be able to authenticate with their CERN credentials.

– Integration with external tools: the tool must provide a mechanism to automatically import content generated by external tools, in particular the tool selected for source code documentation.

– Print: the tool must be able to generate a printer-friendly version of a page.

Based on these requirements and on the experience collected in ALICE, the other LHC experiments and CERN IT, the selected tool for website creation was Drupal.

C.2.4 Source code documentation tool: Doxygen

An evaluation of source code documentation tools has been carried out, and the recommended tool meeting the requirements was found to be Doxygen [4]. The evaluation of the source code documentation tool was based on the specifications and requirements collected by CWG2 from the ALICE O2 collaboration:

– Documentation embedded in source code: the tool must be able to extract and generate the documentation from the comments and code in the source files.

– Online output format: the tool must be able to generate documentation in a web-based format such as HTML.

– Offline output format: the tool must be able to generate documentation in an offline format such as PDF or LaTeX.

– Attractive output: the tool must generate documentation with an attractive look and feel.

– Flexibility in formatting, rich syntax: the tool must allow flexibility in the documentation formatting, including the usage of rich text markup.

– Simplicity: a minimal amount of macros/hooks needed when documenting the code.

Based on these requirements and on the experience collected in ALICE, the other LHC experiments and the ROOT and Geant4 projects, the selected tool for source code documentation was Doxygen.
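To illustrate the "documentation embedded in source code" requirement, the snippet below shows Doxygen-style comment blocks. Python is used here only for consistency with the other sketches in this appendix (Doxygen also parses "##" comment blocks and commands such as @brief and @param in Python sources); the module and class names are hypothetical:

## @package o2_example
#  Hypothetical module used only to illustrate Doxygen markup.

## @brief Accumulate a running mean for a stream of measurements.
class RunningMean:
    ## @brief Create an empty accumulator.
    def __init__(self):
        self._n = 0
        self._sum = 0.0

    ## @brief Add one measurement.
    #  @param value The measured quantity (float).
    #  @return The updated mean.
    def add(self, value):
        self._n += 1
        self._sum += value
        return self._sum / self._n

Running doxygen on such sources then produces the HTML (and optionally LaTeX) output mentioned in the requirements above.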

C.2.5 Software build system: CMake

Following the Evaluation Procedure, CWG11 tried to find a C/C++ software build system that would:

– control C/C++ software compilation, the process of converting source code into binaries

– install the resulting binaries in a convenient format

The evaluation was based on the requirements and specifications received from the ALICE O2 collaboration:

– Incremental builds: the system should be able to build only the modified code and its dependencies

– Parallel building support for multi-core systems

– Support for source code dependency management: the system should be able to define dependencies between the code and external libraries, but also inside the source tree

– Verbose error reporting and tracking

– Cross-compilation: the system should be capable of creating executable code for a platform other than the one on which the compiler is running

– Several builds from the same source tree using different build configurations

– Run on different platforms such as Linux, Mac OS and Windows

– Configurability: the possibility to enable or disable different features or compilation flags

– Easy to use and implement

– Taking into consideration the size of the project, build speed is an important criterion

– Generation of build configurations for different IDEs, such as XCode or Visual Studio

– Built-in possibility to create binary distributions (RPM, deb, etc.)

The chosen C/C++ software build system was CMake [5].

C.2.6 Computing simulation tool: Omnet++ and the Monarc simulation tool

The choice was made to use a discrete-event simulation tool, which models the operation of a system as a discrete sequence of events in time, because this approach fits the needs well. A large number of tools is available to create simulations of computing systems. The evaluation of the tools was based on the following criteria:

– Scalability: the tool should be able to simulate efficiently of the order of 2000 nodes.

– Ease of use: to facilitate the creation of the simulations.

– Price of a licence.

– Availability on Linux.

– Quality of the GUI.

– Possibility to create plots.

In order to select the best possible tool for this purpose, a survey has been carried out. More than 20 tools were considered, and 5 were thoroughly tested: FlexSim (trial version), Omnet++, Ptolemy II, SimPy and SystemC. Although Omnet++ is a bit less efficient than SystemC in terms of speed, its ease of use, the possibility to very easily reuse network modules (TCP/IP, Ethernet, etc.) and the possibility to display data plots nicely led us to choose Omnet++.

In addition to Omnet++, a tool developed by the Monarc project for Grid simulation is also used for some high-level simulations. The simulation and modelling task for the O2 system requires describing complex data flow patterns, running on large-scale distributed systems and exchanging very large amounts of data. A process-oriented approach [6] to discrete event simulation is well suited to model precisely a large number of concurrent I/O programs as well as all the stochastic patterns in the DAQ systems. Threaded objects or active objects (having an execution thread, program counter, stack, mutual exclusion mechanism, etc.) offer much greater flexibility in simulating concurrent, large-scale distributed systems in an effective way.
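As a toy illustration of the process-oriented approach (not the actual Omnet++ or Monarc models; SimPy, one of the tools tested above, is used here and all numbers are arbitrary), each data source and processing node is written as its own process exchanging items through a bounded buffer:

import random
import simpy

def readout_link(env, buffer, rate_hz, fragment_kb):
    """Self-triggered link: Poisson-distributed fragment arrivals."""
    while True:
        yield env.timeout(random.expovariate(rate_hz))
        yield buffer.put(fragment_kb)          # blocks if the buffer is full

def processing_node(env, buffer, throughput_kb_s, stats):
    """Node draining the buffer at a fixed throughput."""
    while True:
        size_kb = yield buffer.get()
        yield env.timeout(size_kb / throughput_kb_s)
        stats["processed"] += 1

env = simpy.Environment()
buffer = simpy.Store(env, capacity=10000)      # bounded input buffer
stats = {"processed": 0}
for _ in range(10):                            # ten input links (toy numbers)
    env.process(readout_link(env, buffer, rate_hz=5e3, fragment_kb=20.0))
env.process(processing_node(env, buffer, throughput_kb_s=2.0e6, stats=stats))
env.run(until=1.0)                             # simulate one second
print(stats["processed"], "fragments processed")

The same structure, with many more node types and realistic rates, is what the discrete-event models of the data flow are built from.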

C.3 Policies

C.3.1 C++ Coding conventions

CWG2 proposed a series of coding conventions for the C++ language covering two main aspects:

– Coding guidelines.

– Coding style.

The goal of the coding guidelines is to provide the dos and don’ts of writing C++ code that allow the O2 programmer to manage the complexity of the C++ language. The C++ coding guidelines cover different aspects of the language such as header files, namespaces, scoping, classes, etc.

The goal of the coding style is to provide a number of rules that keep the code base manageable by enforcing consistency. Consistency across the O2 code allows any programmer to look at another programmer’s code and understand it quickly. Moreover, properly documenting the code ensures the code’s readability and helps efficiency. The coding style is divided into three parts:

– A section on “Naming” that consists of general guidelines on how to name variables, files, functions, enumerators, macros, etc.

– A section on “Formatting”, which includes rules on line length, spaces, braces and other aspects.

– A section on “Comments” that covers what to document and how.

Appendix D

The ALICE Collaboration

71 37 78 128 82 33 3556 B. Abelev , J. Adam , D. Adamova´ , A.M. Adare , M.M. Aggarwal , G. Aglieri Rinella , 102,88 127 27 123 17 17 3557 M. Agnello , A.G. Agocs , A. Agostinelli , Z. Ahammed , N. Ahmad , A. Ahmad Masoodi , 64 40,64 22 15 50 94 102 3558 S.A. Ahn , S.U. Ahn , I. Aimo , M. Ajaz , A. Akindinov , D. Aleksandrov , B. Alessandro , 98,12 3 60 35 39 31 18 124 3559 A. Alici , A. Alkin , E. Almaraz´ Avina˜ , J. Alme , T. Alt , V. Altini , S. Altinpinar , I. Altsybeev , 74 91 87 58 19 92 99 3560 C. Andrei , A. Andronic , V. Anguelov , J. Anielski , C. Anson , T. Anticiˇ c´ , F. Antinori , 98 108 56 67 27 56 16 3561 P. Antonioli , L. Aphecetche , H. Appelshauser¨ , N. Arbor , S. Arcelli , A. Arend , N. Armesto , 102 128 91 56 124 33 91 3562 R. Arnaldi , T. Aronsson , I.C. Arsene , M. Arslandok , A. Asryan , A. Augustinus , R. Averbeck , 79 42 17,84 39 105 66,40 56 3563 T.C. Awes , J. Ayst¨ o¨ , M.D. Azmi , M. Bach , A. Badala` , Y.W. Baek , R. Bailhache , 85,102 12 14 33 51 52 3564 R. Bala , R. Baldini Ferroli , A. Baldisseri , F. Baltasar Dos Santos Pedrosa , J. Ban´ , R.C. Baral , 26 31 127 96 66 111 27 3565 R. Barbera , F. Barile , G.G. Barnafoldi¨ , L.S. Barnby , V. Barret , J. Bartke , M. Basile , 66 123 58 108 33 62 61 3566 N. Bastid , S. Basu , B. Bathen , G. Batigne , M. Battistin , B. Batyunya , J. Baudot , 56 109 76 56 44 61 27 3567 C. Baumann , R. Bavontaweepanya , I.G. Bearden , H. Beck , N.K. Behera , I. Belikov , F. Bellini , 117 60 127 22 74 74 3568 R. Bellwied , E. Belmont-Moreno , G. Bencedi , S. Beole , I. Berceanu , A. Bercuci , 80 127 108 22,102 33 85 3569 Y. Berdnikov , D. Berenyi , A.A.E. Bergognon , D. Berzano , L. Betev , A. Bhasin , 82 121 22 68 37 78 76 3570 A.K. Bhati , J. Bhom , L. Bianchi , N. Bianchi , J. Bielcˇ´ık , J. Bielcˇ´ıkova´ , A. Bilandzic , 49 117 10 94 56 33 55 72 3571 S. Bjelogrlic , F. Blanco , F. Blanco , D. Blau , C. Blume , M. Boccioli , S. Bottger¨ , A. Bogdanov , 76 47 127 38 56 14 126 3572 H. Bøggild , M. Bogolyubsky , L. Boldizsar´ , M. Bombara , J. Book , H. Borel , A. Borissov , 84 33 33 77 22 70 91 3573 F. Bossu´ , C. Bortolin , J.A. Botelho Direito , M. Botje , E. Botta , E. Braidot , P. Braun-Munzinger , 108 55 56 89 36 33 22,102 3574 M. Bregant , T. Breitner , T.A. Broker , T.A. Browning , M. Broz , R. Brun , E. Bruna , 31 93 56 22,102 33 87 84 3575 G.E. Bruno , D. Budnikov , H. Buesching , S. Bufalino , P. Buncic , O. Busch , Z. Buthelezi , 128 28,99 7 128 97 24 3576 D. Caballero Orduna , D. Caffarri , X. Cai , H. Caines , E. Calvo Villar , P. Camerini , 11 98 33 33 114 33 3577 V. Canoa Roman , G. Cara Romeo , W. Carena , F. Carena , N. Carlin Filho , F. Carminati , 68 14 91 23 74 3578 A. Casanova D´ıaz , J. Castillo Castellanos , J.F. Castillo Hernandez , E.A.R. Casula , V. Catanescu , 33 33 9 37 102 42,130 3579 T. Caudron , C. Cavicchioli , C. Ceballos Sanchez , J. Cepila , P. Cerello , B. Chang , 109 33 14 95 123 82 3580 N. Chankhunthot , S. Chapeland , J.L. Charvet , S. Chattopadhyay , S. Chattopadhyay , I. Chawla , 81 33,116 116 33 117 33 3581 M. Cherney , C. Cheshkov , B. Cheynis , V. Chibante Barroso , D.D. Chinellato , P. Chochula , 76,49 123 77 76 32 121 3582 M. Chojnacki , S. Choudhury , P. Christakoglou , C.H. Christensen , P. Christiansen , T. Chujo , 90 101 27,33,12 98 84 12 31 3583 S.U. Chung , C. Cicalo , L. Cifarelli , F. Cindolo , J. Cleymans , F. Coccetti , F. Colamaria , 31 23 67 33 128 24 3584 D. Colella , A. Collu , G. Conesa Balbastre , Z. Conesa del Valle , M.E. Connors , G. 
Contin , 11 126 22 30 2 3585 J.G. Contreras , T.M. Cormier , Y. Corrales Morales , P. Cortese , I. Cortes´ Maldonado , 70 33 10 11 66 60 11 3586 M.R. Cosentino , F. Costa , M.E. Cotallo , E. Crescio , P. Crochet , E. Cruz Alaniz , R. Cruz Albino , 59 68 28,99 76 54 33 46 3587 E. Cuautle , L. Cunqueiro , A. Dainese , H.H. Dalsgaard , A. Danu , E. Da Riva , I. Das , 95 4 95 115 44 123 114 29,12 3588 D. Das , S. Das , K. Das , A. Dash , S. Dash , S. De , G.O.V. de Barros , A. De Caro , 104 33 39 23 29 108 3589 G. de Cataldo , C. Decosse , J. de Cuveland , A. De Falco , D. De Gruttola , H. Delagrange , 73 102 127 29 114 31 49 3590 A. Deloff , N. De Marco , E. Denes´ , S. De Pasquale , A. Deppman , G. D Erasmo , R. de Rooij , 10 31 58 31 100 33 3591 M.A. Diaz Corchero , D. Di Bari , T. Dietel , C. Di Giglio , S. Di Liberto , A. Di Mauro , 68 33 18 126,32 73 91 21 3592 P. Di Nezza , R. Divia` , Ø. Djuvsland , A. Dobrin , T. Dobrowolski , B. Donigus¨ , O. Dordic , 108 123 49 116 61 66 3593 O. Driga , A.K. Dubey , A. Dubla , L. Ducroux , W. Dulinski , P. Dupieux , 95 104 58 55 33,108 35 3594 A.K. Dutta Majumdar , D. Elia , D. Emschermann , H. Engel , B. Erazmus , H.A. Erdal , 46 108 121 96 21 28,99 67 3595 B. Espagnon , M. Estienne , S. Esumi , D. Evans , G. Eyyubova , D. Fabris , J. Faivre ,


27 68 91,87 84 18 58 54 3596 D. Falchieri , A. Fantoni , M. Fasel , R. Fearick , D. Fehlker , L. Feldkamp , D. Felea , 102 70 124 37 2 22 3597 A. Feliciello , B. Fenton-Olsen , G. Feofilov , J. Ferencei , A. Fernandez´ Tellez´ , A. Ferretti , 28 111 114 93 48 31 31 3598 A. Festanti , J. Figiel , M.A.S. Figueredo , S. Filchagin , D. Finogeev , F.M. Fionda , E.M. Fiore , 83 33 84 91 94 103 33,28 3599 E. Floratos , M. Floris , S. Foertsch , P. Foka , S. Fokin , E. Fragiacomo , A. Francescon , 91 33 67 29 76 22 97 3600 U. Frankenfeld , U. Fuchs , C. Furget , M. Fusco Girard , J.J. Gaardhøje , M. Gagliardi , A. Gago , 22 19 79 91 13 33 3601 M. Gallio , D.R. Gangadharan , P. Ganoti , C. Garabatos , E. Garcia-Solis , C. Gargiulo , 71 39 108 14 54,33 33 31 3602 I. Garishvili , J. Gerhard , M. Germain , C. Geuna , M. Gheata , A. Gheata , B. Ghidini , 123 68 125 33 111 87 61 3603 P. Ghosh , P. Gianotti , M.R. Girard , P. Giubellino , E. Gladysz-Dziadus , P. Glassel¨ , M. Goffe , 113,11 33 16 60 10 3604 R. Gomez , M. Gomez Marzoa , E.G. Ferreiro , L.H. Gonzalez-Trueba´ , P. Gonzalez-Zamora´ , 39 86 110 125 87 49 3605 S. Gorbunov , A. Goswami , S. Gotovac , L.K. Graczykowski , R. Grajcarek , A. Grelli , 33 33 72 1 62 3 103 3606 C. Grigoras , A. Grigoras , V. Grigoriev , A. Grigoryan , S. Grigoryan , B. Grinyov , N. Grion , 32 33 116 33 48 67 3607 P. Gros , J.F. Grosse-Oetringhaus , J.-Y. Grossiord , R. Grosso , F. Guber , R. Guernane , 27 116 76 1 120 85 85 3608 B. Guerzoni , M. Guilbaud , K. Gulbrandsen , H. Gulkanyan , T. Gunji , A. Gupta , R. Gupta , 58 18 46 54 120 127 20 3609 R. Haake , Ø. Haaland , C. Hadjidakis , M. Haiduc , H. Hamagaki , G. Hamar , B.H. Han , 96 76 38 128 56 13 3610 L.D. Hanratty , A. Hansen , Z. Harmanova-T´ othov´ a´ , J.W. Harris , M. Hartig , A. Harton , 98 120 33,1 56 58 35 3611 D. Hatzifotiadou , S. Hayashi , A. Hayrapetyan , S.T. Heckel , M. Heide , H. Helstrup , 74 11 87 122 35 128 3612 A. Herghelegiu , G. Herrera Corral , N. Herrmann , B.A. Hess , K.F. Hetland , B. Hicks , 61 120 33 46 61 18 19 3613 B. Hippolyte , Y. Hori , P. Hristov , I. Hrivnˇ a´covˇ a´ , C. Hu , M. Huang , T.J. Humanic , 20 66 33 33 93 73 33 121 3614 D.S. Hwang , R. Ichou , S. Igolkin , P. Ijzermans , R. Ilkaev , I. Ilkiv , H. Illemanns , M. Inaba , 23 33 22 94 17 91 80 3615 E. Incani , P.G. Innocenti , G.M. Innocenti , M. Ippolitov , M. Irfan , C. Ivan , V. Ivanov , 124 91 3 26 70 64 125 3616 A. Ivanov , M. Ivanov , O. Ivanytskyi , A. Jachołkowski , P. M. Jacobs , H.J. Jang , M.A. Janik , 36 117 44 126 59 96 3617 R. Janik , P.H.S.Y. Jayarathna , S. Jena , D.M. Jha , R.T. Jimenez Bustamante , P.G. Jones , 40 33 96 50 39 51 42 3618 H. Jung , A. Junique , A. Jusko , A.B. Kaidalov , S. Kalcher , P. Kalinˇak´ , T. Kalliokoski , 57,33 130 72 33,129,65 48 48 3619 A. Kalweit , J.H. Kang , V. Kaplin , A. Karasu Uysal , O. Karavichev , T. Karavicheva , 48 94 55 131 95 123 17 3620 E. Karpechev , A. Kazantsev , U. Kebschull , R. Keidel , P. Khan , S.A. Khan , M.M. Khan , 15 80 47 35 130 40 20 42 3621 K. H. Khan , A. Khanzadeev , Y. Kharlov , B. Kileng , B. Kim , J.S. Kim , J.H. Kim , D.J. Kim , 40,64 130 20 40 130 39 39 50 125 3622 D.W. Kim , T. Kim , S. Kim , M.Kim , M. Kim , S. Kirsch , I. Kisel , S. Kiselev , A. Kisiel , 6 87 58 56 33 91 112 3623 J.L. Klay , J. Klein , C. Klein-Bosing¨ , M. Kliemant , A. Kluge , M.L. Knichel , A.G. Knospe , 91 109 39 124 124 124 3624 M.K. Kohler¨ , C. Kobdaj , T. Kollegger , A. Kolojvari , M. Kompaniets , V. Kondratiev , 72 48 124 111 67 3625 N. Kondratyeva , A. 
Konevskikh , V. Kovalenko , M. Kowalski , S. Kox , 44 42 51 56 38 87,34 3626 G. Koyithatta Meethaleveedu , J. Kral , I. Kralik´ , F. Kramer , A. Kravcˇakov´ a´ , T. Krawutschke , 37 39 96,51 42 37 80 91 3627 M. Krelina , M. Kretz , M. Krivda , F. Krizek , M. Krus , E. Kryshen , M. Krzewicki , 94 33 61 77 56 44 73 3628 Y. Kucheriaev , T. Kugathasan , C. Kuhn , P.G. Kuijer , I. Kulakov , J. Kumar , P. Kurashvili , 48 48 93 78 78 21 87 3629 A. Kurepin , A.B. Kurepin , A. Kuryakin , S. Kushpil , V. Kushpil , H. Kvaerno , M.J. Kweon , 130 59 46 18 49 55 108 3630 Y. Kwon , P. Ladron´ de Guevara , I. Lakomov , R. Langoy , S.L. La Pointe , C. Lara , A. Lardeux , 26 24 33 40 40 96 33 56 3631 P. La Rocca , R. Lea , M. Lechman , K.S. Lee , S.C. Lee , G.R. Lee , I. Legrand , J. Lehnert , 106 91 104 60 113 56 3632 R. Lemmon , M. Lenhardt , V. Lenti , H. Leon´ , I. Leon´ Monzon´ , H. Leon´ Vargas , 33 127 7 18 96 21 39 91,33 3633 Y. Lesenechal , P. Levai´ , S. Li , J. Lien , R. Lietava , S. Lindal , V. Lindenstruth , C. Lippmann , 19 32 18 126 72 87 70 3634 M.A. Lisa , H.M. Ljunggren , P.I. Loenne , V.R. Loggins , V. Loginov , D. Lohner , C. Loizides , 42 66 9 21 87 56 28 7 3635 K.K. Loo , X. Lopez , E. Lopez´ Torres , G. Løvhøiden , X.-G. Lu , P. Luettig , M. Lunardon , J. Luo , 49 33 128 7 117 48 3636 G. Luparello , C. Luzzi , R. Ma , K. Ma , D.M. Madagodahettige-Don , A. Maevskaya , 57,33 52 87 80 59 62,,ii 3637 M. Mager , D.P. Mahapatra , A. Maire , M. Malaev , I. Maldonado Cervantes , L. Malinina , 50 91 93 102 85 94 66 3638 D. Mal’Kevich , P. Malzacher , A. Mamonov , L. Manceau , L. Mangotra , V. Manko , F. Manso , 33 104 7 33 66,22 53 24,103 3639 C. Mansuy , V. Manzari , Y. Mao , A. Mapelli , M. Marchisone , J. Maresˇ , G.V. Margagliotti , 98 91 112 56 23 119 91 3640 A. Margotti , A. Mar´ın , C. Markert , M. Marquard , D. Marras , I. Martashvili , N.A. Martin , 33 2 60 108 3 108 3641 P. Martinengo , M.I. Mart´ınez , A. Mart´ınez Davalos , G. Mart´ınez Garc´ıa , Y. Martynov , A. Mas , 91 22 101 108 31 111,108 3642 S. Masciocchi , M. Masera , A. Masoni , L. Massacrier , A. Mastroserio , A. Matyja , 111 119 22 100 25 60 3643 C. Mayer , J. Mazer , G. Mazza , M.A. Mazzoni , F. Meddi , A. Menchaca-Rocha , 87 36 121 22 21,,iii 49 3644 J. Mercado Perez´ , M. Meres , Y. Miake , L. Milano , J. Milosevic , A. Mischke , 86,45 91 54 121 126 123,75 3645 A.N. Mishra , D. Miskowiec´ , C. Mitu , S. Mizuno , J. Mlynarz , B. Mohanty , 127,33,61 11 102 10 130 28 3646 L. Molnar , L. Montano˜ Zetina , M. Monteno , E. Montes , T. Moon , M. Morando , 114 28 42 33 68 110 3647 D.A. Moreira De Godoy , S. Moretto , A. Morreale , A. Morsch , V. Muccifora , E. Mudnic , 123 123 33 114 84 33 51 3648 S. Muhuri , M. Mukherjee , H. Muller¨ , M.G. Munhoz , S. Murray , L. Musa , J. Musinsky , 102 44 98 104 119 123 93 3649 A. Musso , B.K. Nandi , R. Nania , E. Nappi , C. Nattrass , T.K. Nayak , S. Nazarenko , 50 31,91 54,33 76 121 94 92 3650 A. Nedosekin , M. Nicassio , M.Niculescu , B.S. Nielsen , T. Niida , S. Nikolaev , V. Nikolic , 94 80 81 21 98,12 62 49 3651 S. Nikulin , V. Nikulin , B.S. Nilsen , M.S. Nilsson , F. Noferini , P. Nomokonov , G. Nooren , O2 Upgrade TDR 145

42 94 44 76 18 124 57,33 3652 N. Novitzky , A. Nyanin , A. Nyatha , C. Nygaard , J. Nystrand , A. Ochirov , H. Oeschler , 128 40 125 114 102 32,59 3653 S. Oh , S.K. Oh , J. Oleniacz , A.C. Oliveira Da Silva , C. Oppedisano , A. Ortiz Velasquez , 32 125 91 87 120 87 37 3654 A. Oskarsson , P. Ostrowski , J. Otwinowski , K. Oyama , K. Ozawa , Y. Pachmayer , M. Pachr , 22 29 59 39 16 123 96 105 3655 F. Padilla , P. Pagano , G. Paic´ , F. Painke , C. Pajares , S.K. Pal , A. Palaha , A. Palmeri , 1 105 91 58 51 104 47 3656 V. Papikyan , G.S. Pappalardo , W.J. Park , A. Passfeld , B. Pastircˇak´ , C. Pastore , D.I. Patalakha , 104 95 126 125 49 14 3657 V. Paticchio , B. Paul , A. Pavlinov , T. Pawlak , T. Peitzmann , H. Pereira Da Costa , 114 94 77 33 31 125 3658 E. Pereira De Oliveira Filho , D. Peresunko , C.E. Perez´ Lara , D. Perini , D. Perrino , W. Peryt , 98 33,59 5 33 37 37 74 96 3659 A. Pesci , V. Peskov , Y. Pestov , P. Petagna , V. Petra´cekˇ , M. Petran , M. Petris , P. Petrov , 74 26 103 36 108 33 117 56 3660 M. Petrovici , C. Petta , S. Piano , M. Pikna , P. Pillot , O. Pinazza , L. Pinsky , N. Pitz , 117 92 70 125 62 127 3661 D.B. Piyarathna , M. Planinic , M. Płoskon´ , J. Pluta , T. Pocheptsov , S. Pochybova , 113 33 53 47 109 74 3662 P.L.M. Podesta-Lerma , M.G. Poghosyan , K. Polak´ , B. Polichtchouk , W. Poonsawat , A. Pop , 66 37 85 126 98,12 102 3663 S. Porteboeuf-Houssais , V. Posp´ısilˇ , B. Potukuchi , S.K. Prasad , R. Preghenella , F. Prino , 126 48 23 23 93 38 126 3664 C.A. Pruneau , I. Pshenichnov , G. Puddu , C. Puggioni , V. Punin , M. Putisˇ , J. Putschke , 33 21 103 33 42 42 3665 E. Quercigh , H. Qvigstad , A. Rachevski , A. Rademakers , T.S. Raih¨ a¨ , J. Rak , 14 30 11 86 86 42 3666 A. Rakotozafindrabe , L. Ramello , A. Ram´ırez Reyes , R. Raniwala , S. Raniwala , S.S. Ras¨ anen¨ , 56 82 119 67 73,,iv 128 18 3667 B.T. Rascanu , D. Rathee , K.F. Read , J.S. Real , K. Redlich , R.J. Reed , A. Rehman , 56 49 56 68 48 39 33 3668 P. Reichelt , M. Reicher , R. Renfordt , A.R. Reolon , A. Reshetin , F. Rettig , J.-P. Revol , 87 102 69 32 21 33 33 3669 K. Reygers , L. Riccati , R.A. Ricci , T. Richert , M. Richter , P. Riedler , W. Riegler , 26,105 22 2 77 18,21 39 3670 F. Riggi , A. Rivetti , M. Rodr´ıguez Cahuantzi , A. Rodriguez Manso , K. Røed , D. Rohr , 18 91,106 68 66 33 33,28 61 95 3671 D. Rohrich¨ , R. Romita , F. Ronchetti , P. Rosnet , S. Rossegger , A. Rossi , C. Roy , P. Roy , 10 24 22 94 111 47 33 3672 A.J. Rubio Montero , R. Rui , R. Russo , E. Ryabinkin , A. Rybicki , S. Sadovsky , K. Safaˇ rˇ´ık , 45 52 123 43 70 121 15 3673 R. Sahoo , P.K. Sahu , J. Saini , H. Sakaguchi , S. Sakai , D. Sakata , A.B. Saleem , 16 19 85 80 61 51 3674 C.A. Salgado , J. Salzwedel , S. Sambyal , V. Samsonov , X. Sanchez Castro , L. Sˇandor´ , 60 121 26 33,12 42 98 28 3675 A. Sandoval , M. Sano , G. Santagati , R. Santoro , J. Sarkamo , E. Scapparone , F. Scarlassara , 89 74 87 91 122 56 3676 R.P. Scharenberg , C. Schiaua , R. Schicker , C. Schmidt , H.R. Schmidt , S. Schuchmann , 33 128 33,108 91 91 27 102 3677 J. Schukraft , T. Schuster , Y. Schutz , K. Schwarz , K. Schweda , G. Scioli , E. Scomparin , 96 119 28 91 61 90 23 3678 P.A. Scott , R. Scott , G. Segato , I. Selyuzhenkov , S. Senyukov , J. Seo , S. Serci , 10,60 54 108 62 33 82,119 3679 E. Serradilla , A. Sevcenco , A. Shabetai , G. Shabratova , R. Shahoyan , N. Sharma , 85 85 104 43 9 94 58 101 3680 S. Sharma , S. Rohni , I. Sgura , K. Shigaki , K. Shtejer , Y. Sibiriak , E. Sicking , S. 
Siddhanta , 73 79 67 59,92 33 123 3681 T. Siemiarczuk , D. Silvermyr , C. Silvestre , G. Simatovic , G. Simonetti , R. Singaraju , 85 123,75 123 123 95 36 30 21 3682 R. Singh , S. Singha , V. Singhal , B.C. Sinha , T. Sinha , B. Sitar , M. Sitta , T.B. Skaali , 18 37 128 49 33 76,32 71 3683 K. Skjerdal , R. Smakal , N. Smirnov , R.J.M. Snellings , W. Snoeys , C. Søgaard , R. Soltz , 20 90 130 33 28 111 83 3684 H. Son , J. Song , M. Song , C. Soos , F. Soramel , I. Sputowska , M. Spyropoulou-Stassinaki , 89 87 54 54 73 19 32 84 3685 B.K. Srivastava , J. Stachel , I. Stan , I. Stan , G. Stefanek , M. Steinpreis , E. Stenlund , G. Steyn , 87 108 47 36 114 22 3686 J.H. Stiller , D. Stocco , M. Stolpovskiy , P. Strmen , A.A.P. Suaide , M.A. Subieta Vasquez´ , 43 46 50 78 7 92 70 3687 T. Sugitate , C. Suire , R. Sultanov , M. Sumberaˇ , X. Sun , T. Susa , T.J.M. Symons , 114 36 111,33 18 125 3688 A. Szanto de Toledo , I. Szarka , A. Szczepankiewicz , A. Szostak , M. Szymanski´ , 115 46 56 33 33 3689 J. Takahashi , J.D. Tapia Takaki , A. Tarantola Peloni , A. Tarazona Martinez , A. Tauro , 2 33 31 91 49 116 117 3690 G. Tejeda Munoz˜ , A. Telesca , C. Terrevoli , J. Thader¨ , D. Thomas , R. Tieulent , A.R. Timmins , 37 33 39,28,99 120 102 3 19 3691 D. Tlusty , C. Tobon Marin , A. Toia , H. Torii , L. Toscano , V. Trubnikov , D. Truesdale , 42 120 93 107 99 21 56 3692 W.H. Trzaska , T. Tsuji , A. Tumkin , R. Turchetta , R. Turrisi , T.S. Tveter , J. Ulery , 18 63,55 116 38 100 23 37,78 3693 K. Ullaland , J. Ulrich , A. Uras , J. Urban´ , G.M. Urciuoli , G.L. Usai , M. Vajzer , 62,51 46 87 33 33 33 3694 M. Vala , L. Valencia Palomo , S. Vallero , J. Van Beelen , P. Vande Vyvre , J. Van Hoornen , 49 69 2 44 83 94 124 3695 M. van Leeuwen , L. Vannucci , A. Vargas , R. Varma , M. Vasileiou , A. Vasiliev , V. Vechernin , 49 24 22 2 8 49 110 3696 M. Veldhoen , M. Venaruzzo , E. Vercellin , S. Vergara , R. Vernet , M. Verweij , L. Vickovic , 28 42 84 96 93 94 3697 G. Viesti , J. Viinikainen , Z. Vilakazi , O. Villalobos Baillie , Y. Vinogradov , A. Vinogradov , 124 29 123 62 126 50 33 3698 L. Vinogradov , T. Virgili , Y.P. Viyogi , A. Vodopyanov , S. Voloshin , K. Voloshin , G. Volpe , 33 124 91 38 66 93 18 3699 B. von Haller , I. Vorobyev , D. Vranic , J. Vrlakov´ a´ , B. Vulpescu , A. Vyushin , B. Wagner , 37 7 7 7 87 7 7 121 3700 V. Wagner , R. Wan , D. Wang , Y. Wang , Y. Wang , M. Wang , D. Wang , K. Watanabe , 117 33,58 58 122 21 58 73 3701 M. Weber , J.P. Wessels , U. Westerhoff , J. Wiechula , J. Wikne , M. Wilde , G. Wilk , 58 98 87 61 112 126 3702 A. Wilk , M.C.S. Williams , B. Windelband , M. Winter , L. Xaplanteris Karampatsos , C.G. Yaldo , 120 14,49 18 7 94 90 7 90 3703 Y. Yamaguchi , H. Yang , S. Yang , P. Yang , S. Yasnopolskiy , J. Yi , Z. Yin , I.-K. Yoo , 130 56 7 94 76 37 98 62 3704 J. Yoon , W. Yu , X. Yuan , I. Yushmanov , V. Zaccolo , C. Zach , C. Zampolli , S. Zaporozhets , 124 53 93 125 55 54 80 3705 A. Zarochentsev , P. Zavada´ , N. Zaviyalov , H. Zbroszczyk , P. Zelnicek , I.S. Zgura , M. Zhalov , 7 70,66,7 7 49 7 7 7 7 7 3706 H. Zhang , X. Zhang , F. Zhou , Y. Zhou , D. Zhou , H. Zhu , J. Zhu , J. Zhu , X. Zhu , 27,12 87 3 116 3 56 3707 A. Zichichi , A. Zimmermann , G. Zinovjev , Y. Zoccarato , M. Zynovyev , M. Zyzak 146 The ALICE Collaboration

Affiliation notes

i Deceased

ii Also at: M.V. Lomonosov Moscow State University, D.V. Skobeltsyn Institute of Nuclear Physics, Moscow, Russia

iii Also at: University of Belgrade, Faculty of Physics and “Vinča” Institute of Nuclear Sciences, Belgrade, Serbia

iv Also at: Institute of Theoretical Physics, University of Wroclaw, Wroclaw, Poland

3715 Collaboration Institutes 1 3716 A. I. Alikhanyan National Science Laboratory (Yerevan Physics Institute) Foundation, Yerevan, Armenia 2 3717 Benemerita´ Universidad Autonoma´ de Puebla, Puebla, Mexico 3 3718 Bogolyubov Institute for Theoretical Physics, Kiev, Ukraine 4 3719 Bose Institute, Department of Physics and Centre for Astroparticle Physics and Space Science (CAPSS), 3720 Kolkata, India 5 3721 Budker Institute for Nuclear Physics, Novosibirsk, Russia 6 3722 California Polytechnic State University, San Luis Obispo, California, United States 7 3723 Central China Normal University, Wuhan, China 8 3724 Centre de Calcul de l’IN2P3, Villeurbanne, France 9 3725 Centro de Aplicaciones Tecnologicas´ y Desarrollo Nuclear (CEADEN), Havana, Cuba 10 3726 Centro de Investigaciones Energeticas´ Medioambientales y Tecnologicas´ (CIEMAT), Madrid, Spain 11 3727 Centro de Investigacion´ y de Estudios Avanzados (CINVESTAV), Mexico City and Merida,´ Mexico 12 3728 Centro Fermi - Museo Storico della Fisica e Centro Studi e Ricerche “Enrico Fermi”, Rome, Italy 13 3729 Chicago State University, Chicago, United States 14 3730 Commissariat a` l’Energie Atomique, IRFU, Saclay, France 15 3731 COMSATS Institute of Information Technology (CIIT), Islamabad, Pakistan 16 3732 Departamento de F´ısica de Part´ıculas and IGFAE, Universidad de Santiago de Compostela, Santiago de 3733 Compostela, Spain 17 3734 Department of Physics Aligarh Muslim University, Aligarh, India 18 3735 Department of Physics and Technology, University of Bergen, Bergen, Norway 19 3736 Department of Physics, Ohio State University, Columbus, Ohio, United States 20 3737 Department of Physics, Sejong University, Seoul, South Korea 21 3738 Department of Physics, University of Oslo, Oslo, Norway 22 3739 Dipartimento di Fisica dell’Universita` and Sezione INFN, Turin, Italy 23 3740 Dipartimento di Fisica dell’Universita` and Sezione INFN, Cagliari, Italy 24 3741 Dipartimento di Fisica dell’Universita` and Sezione INFN, Trieste, Italy 25 3742 Dipartimento di Fisica dell’Universita` ‘La Sapienza’ and Sezione INFN, Rome, Italy 26 3743 Dipartimento di Fisica e Astronomia dell’Universita` and Sezione INFN, Catania, Italy 27 3744 Dipartimento di Fisica e Astronomia dell’Universita` and Sezione INFN, Bologna, Italy 28 3745 Dipartimento di Fisica e Astronomia dell’Universita` and Sezione INFN, Padova, Italy 29 3746 Dipartimento di Fisica ‘E.R. Caianiello’ dell’Universita` and Gruppo Collegato INFN, Salerno, Italy 30 3747 Dipartimento di Scienze e Innovazione Tecnologica dell’Universita` del Piemonte Orientale and Gruppo 3748 Collegato INFN, Alessandria, Italy 31 3749 Dipartimento Interateneo di Fisica ‘M. Merlin’ and Sezione INFN, Bari, Italy 32 3750 Division of Experimental High Energy Physics, University of Lund, Lund, Sweden 33 3751 European Organization for Nuclear Research (CERN), Geneva, Switzerland 34 3752 Fachhochschule Koln,¨ Koln,¨ Germany 35 3753 Faculty of Engineering, Bergen University College, Bergen, Norway 36 3754 Faculty of Mathematics, Physics and Informatics, Comenius University, Bratislava, Slovakia 37 3755 Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague, Prague, 3756 Czech Republic 38 3757 Faculty of Science, P.J. 
Safˇ arik´ University, Kosice,ˇ Slovakia 39 3758 Frankfurt Institute for Advanced Studies, Johann Wolfgang Goethe-Universitat¨ Frankfurt, Frankfurt, 3759 Germany 40 3760 Gangneung-Wonju National University, Gangneung, South Korea 41 3761 Gauhati University, Department of Physics, Guwahati, India 42 3762 Helsinki Institute of Physics (HIP) and University of Jyvaskyl¨ a,¨ Jyvaskyl¨ a,¨ Finland O2 Upgrade TDR 147

43 3763 Hiroshima University, Hiroshima, Japan 44 3764 Indian Institute of Technology Bombay (IIT), Mumbai, India 45 3765 Indian Institute of Technology Indore, Indore, India (IITI) 46 3766 Institut de Physique Nucleaire´ d’Orsay (IPNO), Universite´ Paris-Sud, CNRS-IN2P3, Orsay, France 47 3767 Institute for High Energy Physics, Protvino, Russia 48 3768 Institute for Nuclear Research, Academy of Sciences, Moscow, Russia 49 3769 Nikhef, National Institute for Subatomic Physics and Institute for Subatomic Physics of Utrecht University, 3770 Utrecht, Netherlands 50 3771 Institute for Theoretical and Experimental Physics, Moscow, Russia 51 3772 Institute of Experimental Physics, Slovak Academy of Sciences, Kosice,ˇ Slovakia 52 3773 Institute of Physics, Bhubaneswar, India 53 3774 Institute of Physics, Academy of Sciences of the Czech Republic, Prague, Czech Republic 54 3775 Institute of Space Sciences (ISS), Bucharest, Romania 55 3776 Institut fur¨ Informatik, Johann Wolfgang Goethe-Universitat¨ Frankfurt, Frankfurt, Germany 56 3777 Institut fur¨ Kernphysik, Johann Wolfgang Goethe-Universitat¨ Frankfurt, Frankfurt, Germany 57 3778 Institut fur¨ Kernphysik, Technische Universitat¨ Darmstadt, Darmstadt, Germany 58 3779 Institut fur¨ Kernphysik, Westfalische¨ Wilhelms-Universitat¨ Munster,¨ Munster,¨ Germany 59 3780 Instituto de Ciencias Nucleares, Universidad Nacional Autonoma´ de Mexico,´ Mexico City, Mexico 60 3781 Instituto de F´ısica, Universidad Nacional Autonoma´ de Mexico,´ Mexico City, Mexico 61 3782 Institut Pluridisciplinaire Hubert Curien (IPHC), Universite´ de Strasbourg, CNRS-IN2P3, Strasbourg, 3783 France 62 3784 Joint Institute for Nuclear Research (JINR), Dubna, Russia 63 3785 Kirchhoff-Institut fur¨ Physik, Ruprecht-Karls-Universitat¨ Heidelberg, Heidelberg, Germany 64 3786 Korea Institute of Science and Technology Information, Daejeon, South Korea 65 3787 KTO Karatay University, Konya, Turkey 66 3788 Laboratoire de Physique Corpusculaire (LPC), Clermont Universite,´ Universite´ Blaise Pascal, 3789 CNRS–IN2P3, Clermont-Ferrand, France 67 3790 Laboratoire de Physique Subatomique et de Cosmologie (LPSC), Universite´ Joseph Fourier, CNRS-IN2P3, 3791 Institut Polytechnique de Grenoble, Grenoble, France 68 3792 Laboratori Nazionali di Frascati, INFN, Frascati, Italy 69 3793 Laboratori Nazionali di Legnaro, INFN, Legnaro, Italy 70 3794 Lawrence Berkeley National Laboratory, Berkeley, California, United States 71 3795 Lawrence Livermore National Laboratory, Livermore, California, United States 72 3796 Moscow Engineering Physics Institute, Moscow, Russia 73 3797 National Centre for Nuclear Studies, Warsaw, Poland 74 3798 National Institute for Physics and Nuclear Engineering, Bucharest, Romania 75 3799 National Institute of Science Education and Research, Bhubaneswar, India 76 3800 Niels Bohr Institute, University of Copenhagen, Copenhagen, Denmark 77 3801 Nikhef, National Institute for Subatomic Physics, Amsterdam, Netherlands 78 3802 Nuclear Physics Institute, Academy of Sciences of the Czech Republic, Reˇ zˇ u Prahy, Czech Republic 79 3803 Oak Ridge National Laboratory, Oak Ridge, Tennessee, United States 80 3804 Petersburg Nuclear Physics Institute, Gatchina, Russia 81 3805 Physics Department, Creighton University, Omaha, Nebraska, United States 82 3806 Physics Department, Panjab University, Chandigarh, India 83 3807 Physics Department, University of Athens, Athens, Greece 84 3808 Physics Department, University of Cape Town and iThemba LABS, National Research Foundation, 3809 
Somerset West, South Africa 85 3810 Physics Department, University of Jammu, Jammu, India 86 3811 Physics Department, University of Rajasthan, Jaipur, India 87 3812 Physikalisches Institut, Ruprecht-Karls-Universitat¨ Heidelberg, Heidelberg, Germany 88 3813 Politecnico di Torino, Turin, Italy 89 3814 Purdue University, West Lafayette, Indiana, United States 90 3815 Pusan National University, Pusan, South Korea 91 3816 Research Division and ExtreMe Matter Institute EMMI, GSI Helmholtzzentrum fur¨ 3817 Schwerionenforschung, Darmstadt, Germany 92 3818 Rudjer Boskoviˇ c´ Institute, Zagreb, Croatia 148 The ALICE Collaboration

93 3819 Russian Federal Nuclear Center (VNIIEF), Sarov, Russia 94 3820 Russian Research Centre Kurchatov Institute, Moscow, Russia 95 3821 Saha Institute of Nuclear Physics, Kolkata, India 96 3822 School of Physics and Astronomy, University of Birmingham, Birmingham, United Kingdom 97 3823 Seccion´ F´ısica, Departamento de Ciencias, Pontificia Universidad Catolica´ del Peru,´ Lima, Peru 98 3824 Sezione INFN, Bologna, Italy 99 3825 Sezione INFN, Padova, Italy 100 3826 Sezione INFN, Rome, Italy 101 3827 Sezione INFN, Cagliari, Italy 102 3828 Sezione INFN, Turin, Italy 103 3829 Sezione INFN, Trieste, Italy 104 3830 Sezione INFN, Bari, Italy 105 3831 Sezione INFN, Catania, Italy 106 3832 Nuclear Physics Group, STFC Daresbury Laboratory, Daresbury, United Kingdom 107 3833 STFC Rutherford Appleton Laboratory, Chilton, United Kingdom 108 3834 SUBATECH, Ecole des Mines de Nantes, Universite´ de Nantes, CNRS-IN2P3, Nantes, France 109 3835 Suranaree University of Technology, Nakhon Ratchasima, Thailand 110 3836 Technical University of Split FESB, Split, Croatia 111 3837 The Henryk Niewodniczanski Institute of Nuclear Physics, Polish Academy of Sciences, Cracow, Poland 112 3838 The University of Texas at Austin, Physics Department, Austin, TX, United States 113 3839 Universidad Autonoma´ de Sinaloa, Culiacan,´ Mexico 114 3840 Universidade de Sao˜ Paulo (USP), Sao˜ Paulo, Brazil 115 3841 Universidade Estadual de Campinas (UNICAMP), Campinas, Brazil 116 3842 Universite´ de Lyon, Universite´ Lyon 1, CNRS/IN2P3, IPN-Lyon, Villeurbanne, France 117 3843 University of Houston, Houston, Texas, United States 118 3844 University of Technology and Austrian Academy of Sciences, Vienna, Austria 119 3845 University of Tennessee, Knoxville, Tennessee, United States 120 3846 University of Tokyo, Tokyo, Japan 121 3847 University of Tsukuba, Tsukuba, Japan 122 3848 Eberhard Karls Universitat¨ Tubingen,¨ Tubingen,¨ Germany 123 3849 Variable Energy Cyclotron Centre, Kolkata, India 124 3850 V. Fock Institute for Physics, St. Petersburg State University, St. Petersburg, Russia 125 3851 Warsaw University of Technology, Warsaw, Poland 126 3852 Wayne State University, Detroit, Michigan, United States 127 3853 Wigner Research Centre for Physics, Hungarian Academy of Sciences, Budapest, Hungary 128 3854 Yale University, New Haven, Connecticut, United States 129 3855 Yildiz Technical University, Istanbul, Turkey 130 3856 Yonsei University, Seoul, South Korea 131 3857 Zentrum fur¨ Technologietransfer und Telekommunikation (ZTT), Fachhochschule Worms, Worms, 3858 Germany 3859 References

References for Chapter 1

[1] ALICE Collaboration. “The ALICE experiment at the CERN LHC”. In: JINST 3.08 (2008), S08002. DOI: 10.1088/1748-0221/3/08/S08002.

[2] Lyndon Evans and Philip Bryant (editors). “LHC Machine”. In: JINST 3.08 (2008), S08001. DOI: 10.1088/1748-0221/3/08/S08001.

[3] ALICE Collaboration. Upgrade of the ALICE Experiment: Letter Of Intent. Tech. rep. 2012. URL: http://cds.cern.ch/record/1475243.

[4] ALICE Collaboration. Addendum of the Letter Of Intent for the Upgrade of the ALICE Experiment: The Muon Forward Tracker. CERN-LHCC-2013-014 / LHCC-I-022-ADD-1. 2013. URL: http://cds.cern.ch/record/1592659.

[5] ALICE Collaboration. Upgrade of the ALICE Inner Tracking System, Technical Design Report. CERN-LHCC-2013-024 / ALICE-TDR-017. 2013. URL: http://cds.cern.ch/record/1625842/.

[6] ALICE Collaboration. Upgrade of the ALICE Time Projection Chamber, Technical Design Report. CERN-LHCC-2013-020 / ALICE-TDR-016. 2013. URL: https://cds.cern.ch/record/1622286/.

[7] ALICE Collaboration. Technical Design Report for the Muon Forward Tracker. CERN-LHCC-2015-001; ALICE-TDR-018.

[8] ALICE Collaboration. Upgrade of the ALICE Read-Out and Trigger system, Technical Design Report. CERN-LHCC-2013-019; ALICE-TDR-015.

[9] F. Carena et al. “The ALICE data acquisition system”. In: Nucl. Instr. Meth. A 741.0 (2014), pp. 130–162. ISSN: 0168-9002. DOI: 10.1016/j.nima.2013.12.015. URL: http://www.sciencedirect.com/science/article/pii/S0168900213016987.

References for Chapter 2

[1] ALICE Collaboration. Upgrade of the ALICE Experiment: Letter Of Intent. Tech. rep. 2012. URL: http://cds.cern.ch/record/1475243.

[2] ALICE Collaboration. Addendum of the Letter of Intent for the Upgrade of the ALICE Experiment: The Muon Forward Tracker. CERN-LHCC-2013-014 / LHCC-I-022-ADD-1. 2013. URL: http://cds.cern.ch/record/1592659/.

[3] ALICE Collaboration. Conceptual Design Report for the Upgrade of the ALICE Inner Tracking System. CERN-LHCC-2012-005 / LHCC-G-159. 2012. URL: http://cds.cern.ch/record/1431539/.

[4] ALICE Collaboration. Upgrade of the ALICE Inner Tracking System, Technical Design Report. CERN-LHCC-2013-024 / ALICE-TDR-017. 2013. URL: http://cds.cern.ch/record/1625842/.

References for Chapter 3

[1] ALICE Collaboration. Upgrade of the ALICE Experiment: Letter Of Intent. Tech. rep. 2012. URL: http://cds.cern.ch/record/1475243.

[2] ALICE Collaboration. Addendum of the Letter of Intent for the Upgrade of the ALICE Experiment: The Muon Forward Tracker. CERN-LHCC-2013-014 / LHCC-I-022-ADD-1. 2013. URL: http://cds.cern.ch/record/1592659/.

[3] ALICE Collaboration. Upgrade of the ALICE Inner Tracking System, Technical Design Report. CERN-LHCC-2013-024 / ALICE-TDR-017. 2013. URL: http://cds.cern.ch/record/1625842/.

[4] ALICE Collaboration. Upgrade of the ALICE Time Projection Chamber, Technical Design Report. CERN-LHCC-2013-020 / ALICE-TDR-016. 2013. URL: https://cds.cern.ch/record/1622286/.

[5] ALICE Collaboration. Technical Design Report for the Muon Forward Tracker. CERN-LHCC-2015-001; ALICE-TDR-018.

[6] ALICE Collaboration. Upgrade of the ALICE Read-Out and Trigger system, Technical Design Report. CERN-LHCC-2013-019; ALICE-TDR-015.

[7] E. Kryshen and B. von Haller. Detector readout time and event size. ALICE-INT-2009-006. 2006. URL: https://edms.cern.ch/document/992666.

[8] E. Denes et al. for the ALICE collaboration. “Radiation Tolerance Qualification Tests of the Final Source Interface Unit for the ALICE Experiment”. In: Proceedings of the Topical Workshop on Electronics for Particle Physics, Valencia, Spain, 25–29 Sep 2006, 1.1-2 (2006), pp. 438–441. DOI: 10.5170/CERN-2007-001.438.

[9] F. Costa et al. for the ALICE collaboration. “DDL, the ALICE data transmission protocol and its evolution from 2 to 6 Gb/s”. In: Proceedings of the Topical Workshop on Electronics for Particle Physics, Aix en Provence, France, 22–26 September 2014 (2014).

References for Chapter 6

[1] PCIe specifications. URL: https://www.pcisig.com/specifications/pciexpress/ (visited on 03/02/2015).

[2] PCI-SIG - PCI Express 4.0 Frequently Asked Questions. URL: https://www.pcisig.com/news_room/faqs/FAQ_PCI_Express_4.0 (visited on 09/12/2014).

[3] D. Eadline. Low Cost/Power HPC. URL: http://www.linux-mag.com/id/7799/ (visited on 03/02/2015).

[4] S. Anthony. Intel unveils 72-core x86 Knights Landing CPU for exascale supercomputing. URL: http://www.extremetech.com/extreme/171678-intel-unveils-72-core-x86-knights-landing-cpu-for-exascale-supercomputing (visited on 03/02/2015).

[5] T. Alt. High-speed algorithms for event reconstruction in FPGAs. Talk at the 17th Real Time Conference, Lisbon, 24–28 May 2010. URL: http://portal.ipfn.ist.utl.pt/rt2010/ConferencePresentations/3-Wednesday/8h30/04-RT2010.pdf (visited on 03/02/2015).

[6] NUMA Bench — Overview. URL: http://code.compeng.uni-frankfurt.de/projects/numabench (visited on 05/26/2014).

[7] NVIDIA Corp. CUDA Parallel Programming. 2013. URL: http://www.nvidia.com/object/cuda_home_new.html.

[8] Khronos Group. OpenCL Standard for Parallel Programming. 2013. URL: http://www.khronos.org/opencl/.

[9] OpenACC Standards Organization. OpenACC 2.0 Standard. 2013. URL: http://www.openacc-standard.org/node/297.

[10] OpenMP Architecture Review Board. OpenMP API Specification for Parallel Programming. 2013. URL: http://openmp.org/wp/openmp-specifications/.

[11] Infiniband Trade Association Announces PF23 Results and First-Ever EDR Compliance Testing. 2013. URL: http://www.infinibandta.org/content/pages.php?pg=press_room_item&rec_id=794.

[12] Ethernet Alliance highlights Ethernet’s continued journey at SC14. 2014. URL: http://www.ethernetalliance.org/wp-content/uploads/2014/11/EA_SC14_FINAL_110514-2.pdf.

[13] Intel Re-architects the Fundamental Building Block for High-Performance Computing. 2014. URL: http://newsroom.intel.com/community/intel_newsroom/blog/2014/06/23/intel-re-architects-the-fundamental-building-block-for-high-performance-computing.

[14] D. Weller et al. “A HAMR Media Technology Roadmap to an Areal Density of 4 Tb/in2”. In: IEEE Trans. on Magnetics 50.1 (2014).

[15] B. Marchon et al. “The Head-Disk Interface Roadmap to an Areal Density of 4 Tbit/in2”. In: Hindawi Publishing Corporation (2013). DOI: 10.1155/2013/521086.

[16] Twitter. Blobstore: Twitter’s in-house photo storage system. https://blog.twitter.com/2012/blobstore-twitter’s-house-photo-storage-system. 2012.

[17] Subramanian Muralidhar et al. “f4: Facebook’s Warm BLOB Storage System”. In: Proc. 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14). 2014.

[18] Brad Calder et al. “Windows Azure Storage: a highly available cloud storage service with strong consistency”. In: Proc. 23rd ACM Symposium on Operating Systems Principles (SOSP’11). 2011, pp. 143–157.

[19] Jianjun Chen et al. “Walnut: A Unified Cloud Object Store”. In: Proc. ACM conf. Management of Data (SIGMOD’12). 2012, pp. 743–754.

[20] Scality RING. URL: http://www.scality.com/ring (visited on 03/04/2015).

[21] Caringo Swarm. URL: http://www.caringo.com/products/swarm.html (visited on 03/04/2015).

[22] NEC Hydrastor. URL: http://www.necam.com/hydrastor (visited on 03/04/2015).

[23] DDN WOS. URL: http://www.ddn.com/products/object-storage-web-object-scaler-wos (visited on 03/04/2015).

[24] I. S. Reed and G. Solomon. “Polynomial Codes Over Certain Finite Fields”. In: Journal of the Society for Industrial and Applied Mathematics (SIAM) 8.2 (1960), pp. 300–304.

[25] Leslie Lamport. “The part-time parliament”. In: ACM Transactions on Computer Systems 16.2 (1998), pp. 133–169.

[26] Patrick Hunt et al. “ZooKeeper: Wait-free coordination for Internet-scale systems”. In: Proc. of the 2010 USENIX annual technical conference (2010).

[27] Diego Ongaro and John Ousterhout. “In Search of an Understandable Consensus Algorithm”. In: Proc. of the 2014 USENIX Annual Technical Conference (USENIX ATC 14). 2014, pp. 305–319.

[28] CEPH. URL: http://ceph.com/ceph-storage/object-storage (visited on 03/04/2015).

[29] Sage A. Weil. “Ceph: reliable, scalable, and high-performance distributed storage”. PhD thesis. University of California Santa Cruz, 2007.

[30] Peter Sobe and Peter Schumann. “A Performance Study of Parallel Cauchy Reed/Solomon Coding”. In: Proc. 27th int. conf. Architecture of Computing Systems (ARCS’14). 2014.

[31] Jianqiang Luo et al. “Efficient Encoding Schedules for XOR-Based Erasure Codes”. In: IEEE Transactions on Computers 63.9 (2014).

References for Chapter 7

[1] Boost C++ libraries. URL: http://www.boost.org/.
[2] R. Brun. “ROOT - An object oriented data analysis framework”. In: Nucl. Instr. Meth. A 389.1-2 (Apr. 1997), pp. 81–86. ISSN: 0168-9002. DOI: 10.1016/S0168-9002(97)00048-X.
[3] CMake Software build system. URL: http://www.cmake.org/.
[4] ZeroMQ - 0MQ. URL: http://www.zeromq.org/.
[5] Protocol Buffers: Google’s Data Interchange Format. Documentation and open source release. URL: https://github.com/google/protobuf/.
[6] F. Carena et al. “The ALICE data acquisition system”. In: Nucl. Instr. Meth. A 741 (2014), pp. 130–162. ISSN: 0168-9002. DOI: 10.1016/j.nima.2013.12.015. URL: http://www.sciencedirect.com/science/article/pii/S0168900213016987.
[7] Representational state transfer (REST). URL: http://en.wikipedia.org/wiki/Representational_state_transfer.

References for Chapter 8

[1] ALICE Collaboration. Upgrade of the ALICE Time Projection Chamber, Technical Design Report. CERN-LHCC-2013-020 / ALICE-TDR-016. 2013. URL: https://cds.cern.ch/record/1622286/.
[2] ALICE Collaboration. Upgrade of the ALICE Inner Tracking System, Technical Design Report. CERN-LHCC-2013-024 / ALICE-TDR-017. 2013. URL: http://cds.cern.ch/record/1625842/.
[3] I. Kisel. “Event reconstruction in the CBM experiment”. In: Nucl. Instr. Meth. A 566.1 (2006), pp. 85–88. DOI: 10.1016/j.nima.2006.05.040.
[4] ALICE Collaboration. Upgrade of the ALICE Read-Out and Trigger System, Technical Design Report. CERN-LHCC-2013-019 / ALICE-TDR-015.
[5] I. Hrivnacova et al. “The Virtual Monte Carlo”. In: CoRR cs.SE/0306005 (2003). URL: http://arxiv.org/abs/cs.SE/0306005.
[6] W. Lukas. “Fast Simulation for ATLAS: Atlfast-II and ISF”. In: J. Phys. Conf. Ser. 396 (2012), p. 022031.

References for Chapter 9

[1] ALICE Collaboration. Upgrade of the ALICE Experiment: Letter of Intent. Tech. rep. 2012. URL: http://cds.cern.ch/record/1475243.
[2] ALICE Collaboration. Upgrade of the ALICE Time Projection Chamber, Technical Design Report. CERN-LHCC-2013-020 / ALICE-TDR-016. 2013. URL: https://cds.cern.ch/record/1622286/.
[3] “The ALICE high level trigger: The 2011 run experience”. In: (2012), pp. 1–4. DOI: 10.1109/RTC.2012.6418366.

References for Chapter 10

[1] OMNeT++ Discrete Event Simulator. URL: http://www.omnetpp.org/.
[2] I. C. Legrand. “Multi-threaded, discrete event simulation of distributed computing systems”. In: Computer Physics Communications 140.1-2 (2001), pp. 274–285. DOI: 10.1016/S0010-4655(01)00281-8.

References for Chapter 11

[1] Z. Akopov et al. (DPHEP Study Group). “Status Report of the DPHEP Study Group: Towards a Global Effort for Sustainable Data Preservation in High Energy Physics”. In: arXiv:1205.4667v1 (2012).

References for Chapter B

[1] Matteo Cacciari et al. “Theoretical predictions for charm and bottom production at the LHC”. In: JHEP 1210 (2012), p. 137. DOI: 10.1007/JHEP10(2012)137. arXiv: 1205.6344 [hep-ph].
[2] R. Averbeck et al. “Reference Heavy Flavour Cross Sections in pp Collisions at √s = 2.76 TeV, using a pQCD-Driven √s-Scaling of ALICE Measurements at √s = 7 TeV”. In: (2011). arXiv: 1107.3243 [hep-ph].
[3] ALICE Collaboration. Upgrade of the ALICE Inner Tracking System, Technical Design Report. CERN-LHCC-2013-024 / ALICE-TDR-017. 2013. URL: http://cds.cern.ch/record/1625842/.
[4] ALICE Collaboration. Conceptual Design Report for the Upgrade of the ALICE Inner Tracking System. CERN-LHCC-2012-005 / LHCC-G-159. 2012. URL: http://cds.cern.ch/record/1431539/.
[5] Betty Abelev et al. “Measurement of prompt J/ψ and beauty hadron production cross sections at mid-rapidity in pp collisions at √s = 7 TeV”. In: JHEP 1211 (2012), p. 065. CERN-PH-EP-2012-132. DOI: 10.1007/JHEP11(2012)065. arXiv: 1205.5880 [hep-ex].
[6] Betty Bezverkhny Abelev et al. “Measurement of quarkonium production at forward rapidity in pp collisions at √s = 7 TeV”. In: Eur. Phys. J. C74.8 (2014), p. 2974. DOI: 10.1140/epjc/s10052-014-2974-4. arXiv: 1403.3648 [nucl-ex].
[7] ALICE Collaboration. Upgrade of the ALICE Experiment: Letter of Intent. Tech. rep. 2012. URL: http://cds.cern.ch/record/1475243.
[8] ALICE Collaboration. Addendum of the Letter of Intent for the Upgrade of the ALICE Experiment: The Muon Forward Tracker. CERN-LHCC-2013-014 / LHCC-I-022-ADD-1. 2013. URL: http://cds.cern.ch/record/1592659/.
[9] ALICE Collaboration. Technical Design Report for the Muon Forward Tracker. CERN-LHCC-2015-001 / ALICE-TDR-018.

References for Chapter C

[1] JIRA issue and bug tracking system. URL: https://www.atlassian.com/software/jira.
[2] Git version control system. URL: http://git-scm.com/.
[3] Drupal website creation tool. URL: https://www.drupal.org/.
[4] Doxygen source documentation tool. URL: http://www.doxygen.org/.
[5] CMake Software build system. URL: http://www.cmake.org/.
[6] I. Legrand. “Multi-threaded, discrete event simulation of distributed computing systems”. In: Computer Physics Communications 140.1-2 (2001), pp. 274–285. URL: http://www.sciencedirect.com/science/article/pii/S0010465501002818.

List of Figures

1.1 Functional flow of the O2 computing system ...... 3
3.1 Detector read-out and interfaces of the O2 system ...... 16
4.1 Data flow between components of the system ...... 20
4.2 O2 processing flow ...... 23
4.3 Tier 0/Tier 1 processing flow ...... 24
4.4 Tier 2 simulation processing flow ...... 24
5.1 O2 dataflow ...... 32
5.2 Detector data aggregation ...... 33
5.3 Hardware pipeline ...... 34
5.4 FLP-EPN network layout with four EPN subfarms ...... 37
5.5 FLP-EPN network layout with SEPNs ...... 38
5.6 Schematic outline of the reconstruction and calibration data flow ...... 39
5.7 Quality control and assessment general workflow ...... 40
5.8 CCM systems overview ...... 42
5.9 CCM interfaces with external systems ...... 42
5.10 DCS context ...... 44
6.1 DMA to host performance ...... 48
6.2 DMA throughput measurements ...... 49
6.3 Evolution of maximum throughput to DDRx memory over the last years ...... 50
6.4 Performance of the FPGA-based FastClusterFinder algorithm ...... 53
6.5 HLT TPC Tracker Performance ...... 54
6.6 Cluster finding performance on x86 ...... 55
6.7 Throughput of the “Add One” test of NUMAbench ...... 56
6.8 Code size as opposed to computational effort ...... 59
6.9 Conventional kernel code ...... 59
6.10 Performance using different number of disks in RAID set ...... 62
6.11 Storage performance using different block sizes ...... 62
6.12 Prediction of hard disks storage density ...... 63
6.13 CCM Control Processes Plot ...... 67
6.14 CCM MonALISA Plot ...... 67
7.1 O2 software ecosystem ...... 69
7.2 Definition of an abstract task ...... 72
7.3 Process base state machine ...... 73
7.4 Examples of parallel Activities and their associated Partitions ...... 73
7.5 Interaction between CCM Agents and Processes ...... 75
7.6 Main phases executed by read-out ...... 77
7.7 Data format at the level of the FLPs ...... 79
7.8 The Time Frame descriptor assembled on front level processors ...... 81
7.9 Format of the Time Frames between FLPs and EPNs ...... 82
7.10 Quality control and assessment general design ...... 84
7.11 Event Display design ...... 86
7.12 The DCS Data Collector and DCS Access Points ...... 88
8.1 Schematic outline of the reconstruction and calibration data flow ...... 90
8.2 Schematic outline of the reconstruction and calibration data flow ...... 96
8.3 Schematic diagram of the seeding procedure and t0 estimation ...... 98
8.4 Probability of at least one pile-up collision ...... 101
9.1 Data compression factor versus raw data size ...... 106
9.2 Projection of a pp event on the transverse plane of the TPC ...... 107
10.1 Throughput between two machines connected with 40Gbit ethernet ...... 113
10.2 Link speed - 4 EPN farms ...... 115
10.3 Link speed - Super-EPNs ...... 115
10.4 Bisection traffic - 4 EPN farms ...... 115
10.5 Bisection traffic - Super-EPNs ...... 116
10.6 Buffers size - 4 EPN farms ...... 116
10.7 Buffers size - Super-EPNs ...... 117
10.8 Total data stored for a month of data taking similar to RUN 1 ...... 117
10.9 System scalability at up to 166kHz ...... 118
11.1 O2 Project timeline ...... 122
B.1 Relative uncertainty for the scaling factor of the cross section of D and B mesons ...... 134
C.1 The evaluation procedure ...... 137

List of Tables

2.1 Estimated S/ev, S/B and B0/ev ...... 10
2.2 Summary of integrated luminosity requirements ...... 10
2.3 ALICE running scenario ...... 11
3.1 Data rates per detector ...... 15
3.2 Detector links and read-out boards used to transfer the data from the detectors to the O2 system [6] ...... 17
3.3 Number of simulated events for different systems ...... 18
3.4 Grid resources evolution ...... 18
4.1 Components ...... 20
4.2 Data Types ...... 21
4.3 Sizes ...... 22
4.4 Number of reconstructed collisions and storage requirements ...... 22
4.5 Number of simulated events and storage requirements for different systems ...... 23
5.1 FLP and EPN characteristics ...... 36
6.1 Evolution of PCI Express ...... 48
6.2 Results of the TPC Track Finder benchmark ...... 57
6.3 Results of the TPC Track Fit Benchmark ...... 57
6.4 Result of the Matrix Multiplication Benchmark ...... 58
6.5 Result of the PCI Express Benchmark ...... 58
6.6 Result of the Infiniband Network Benchmark ...... 58
6.7 Tools to implement CCM functions ...... 66
8.1 Table summarising the detector calibrations (1) ...... 92
8.2 Table summarising the detector calibrations (2) ...... 94
8.3 CPU requirements for processing the data ...... 102
9.1 Data reduction steps for the TPC ...... 105
9.2 Data rates for input and output of the O2 system ...... 108
10.1 Number of read-out boards and FLPs per detector to O2 system ...... 110
10.2 Nodes of the O2 facility with their minimal characteristics ...... 111
10.3 Network ports, throughput and bandwidth ...... 111
10.4 Storage needs ...... 111
10.5 Power and cooling needs ...... 112
11.1 O2 Project Computing Working Groups and their topics ...... 119
11.2 Institutes participating in the O2 Project ...... 120
11.3 Sharing of responsibilities and human resources needed ...... 121
11.4 Cost estimates and spending profile ...... 123
B.1 Statistical uncertainties for Λc production measurement ...... 134
B.2 Statistical uncertainties for charmonium ...... 135