Big Data for Big Discoveries
How the LHC Looks for Needles by Burning Haystacks

Alberto Di Meglio, CERN openlab Head
CERN LHC - HPC & Big Data 2016
DOI: 10.5281/zenodo.45449, CC-BY-SA, images courtesy of CERN

The Large Hadron Collider (LHC): Burning Haystacks

The detectors are 7,000-ton "microscopes" carrying 150 million sensors, generating data 40 million times per second.
• Detector readout: petabytes per second
• Low-level filters and triggers: 100,000 selections per second, terabytes per second
• High-level filters (HLT): 100 selections per second, gigabytes per second

Storage, Reconstruction, Simulation, Distribution
• Data flows to storage and processing at about 6 GB/s.

Worldwide LHC Computing Grid (WLCG)
• Tier-0 (CERN): data recording, initial data reconstruction, data distribution
• Tier-1 (12 centres): permanent storage, re-processing, analysis
• Tier-2 (68 federations, ~140 centres): simulation, end-user analysis
• Overall: 525,000 cores and 450 PB of storage

Key figures:
• 1 PB/s of data generated by the detectors
• Up to 30 PB/year of stored data
• A distributed computing infrastructure of half a million cores working 24/7
• An average of 40 million jobs per month
• A continuous data transfer rate of 4-6 GB/s across the WLCG

The Higgs boson completes the Standard Model, but the Model explains only about 5% of our Universe. What is the other 95% of the Universe made of? How does gravity really work? Why is there no antimatter in nature? Exotic particles? Dark matter? Extra dimensions?

LHC Schedule (2009 to ~2030): First run, LS1 (Long Shutdown 1), Second run, LS2, Third run, LS3, HL-LHC, and possibly FCC.
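The rates quoted in the filtering cascade above are internally consistent, which a quick order-of-magnitude sketch can show; the per-event size below is derived from the quoted figures, not an official number:

```python
# Order-of-magnitude check of the trigger cascade on the "Burning Haystacks"
# slide above. The ~25 MB event size is inferred from the quoted rates,
# not an official figure.

BUNCH_SPACING_S = 25e-9                      # 25 ns between bunch crossings
collision_rate_hz = 1.0 / BUNCH_SPACING_S    # -> 40 million crossings per second

raw_throughput_bps = 1e15                    # ~1 PB/s out of the detectors
event_size_bytes = raw_throughput_bps / collision_rate_hz  # ~2.5e7 B (25 MB)

stages = {
    "detector readout ": collision_rate_hz,  # 40 MHz              -> PB/s scale
    "low-level trigger": 1e5,                # 100,000 selections/s -> TB/s scale
    "high-level filter": 1e2,                # 100 selections/s     -> GB/s scale
}
for name, rate_hz in stages.items():
    print(f"{name}: {rate_hz:.0e} Hz -> {rate_hz * event_size_bytes:.1e} B/s")
```

Dividing 1 PB/s by 40 MHz gives roughly 25 MB per collision, which reproduces the terabyte-per-second and gigabyte-per-second scales quoted for the two trigger stages.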
LHC startup: 900 GeV
First run: 7 TeV, L = 6x10^33 cm^-2 s^-1, bunch spacing 50 ns
Phase-0 upgrade (design energy, nominal luminosity): 14 TeV, L = 1x10^34 cm^-2 s^-1, bunch spacing 25 ns
Phase-1 upgrade (design energy, design luminosity): 14 TeV, L = 2x10^34 cm^-2 s^-1, bunch spacing 25 ns
Phase-2 upgrade (High Luminosity): 14 TeV, L = 1x10^35 cm^-2 s^-1, bunch spacing 12.5 ns

The outlook: 50 times more data than today in the next 10 years, 50 PB/s out of the detectors, and 5 PB/day to be stored.

Information Technology Research Areas
• Data acquisition and filtering
• Computing platforms, data analysis, simulation
• Data storage and long-term data preservation
• Compute provisioning (cloud)
• Networks
• Data analytics

How Can We Address the Challenges?
With a flat budget, through education, technology, and collaborations.

Technology Evolution: Computing
• More performance, less cost
• Redesign of the DAQ: less custom hardware, more off-the-shelf CPUs, fast interconnects, optimized software
• Redesign of software to exploit modern CPU/GPU platforms (vectorization, parallelization)
• New architectures (e.g. automata)

Technology Evolution: Storage
• Reduce the data to be stored: move computation from offline to online
• More tapes, fewer disks: dynamic data placement
• New media: object disks, shingled drives, flash/SSD, NVRAM, NVDIMM
• Persistent meta-data, in-memory operations

Technology Evolution: Infrastructure
• Economies of scale: large-scale, joint procurement of services and equipment
• Hybrid infrastructure models allowing commercial resource procurement
• Large-scale cloud procurement models are not yet clear
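As a minimal illustration of the vectorization point on the computing slide above, the same event-selection cut can be written as a Python-level loop or as a single NumPy array operation; the momentum data and threshold here are invented for the example:

```python
import numpy as np

# Toy data: fake transverse momenta (GeV) for one million events.
rng = np.random.default_rng(42)
pt = rng.exponential(scale=20.0, size=1_000_000)

def count_loop(values, threshold):
    """Scalar version: one interpreted Python iteration per event."""
    n = 0
    for v in values:
        if v > threshold:
            n += 1
    return n

def count_vectorized(values, threshold):
    """Vectorized version: the same cut as one array operation,
    executed in NumPy's optimized (SIMD-friendly) C loops."""
    return int(np.count_nonzero(values > threshold))

# Both formulations select exactly the same events.
assert count_loop(pt, 50.0) == count_vectorized(pt, 50.0)
```

On typical hardware the vectorized form runs orders of magnitude faster, because the cut executes in compiled loops over contiguous memory rather than in the Python interpreter; the same idea underlies the SIMD and GPU rewrites mentioned on the slide.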
Technology Evolution: Data Analytics
• Reduce analysis cost and time
• More efficient operation of the LHC systems: log analysis, proactive maintenance
• Infrastructure monitoring and optimization
• Data reduction, event filtering and identification on large data sets (~10 PB)

Collaborations
• HEP community: laboratories, academia, research organizations
• Industrial collaboration frameworks
• Joint infrastructure initiatives

New Educational Requirements
• Software & computing engineers: multicore CPU programming, graphics processors (GPU), multithreaded software
• Data scientists: data analysis technologies, tools, data storage, visualization, monitoring, security, etc.
• Multidisciplinary applications: applying physics to other domains (hadron therapy, etc.), knowledge transfer

Take-Away Messages
• The LHC is a long-term project: it must adapt and evolve over time.
• Look for cost-effective solutions: continuous technology tracking, aiming for "just as good as necessary" based on concrete scientific objectives.
• Collaboration and education: people are the most important asset for making this endeavour sustainable over time.

EXECUTIVE CONTACT
Alberto Di Meglio, CERN openlab Head, [email protected]

TECHNICAL CONTACT
Maria Girone, CERN openlab CTO, [email protected]
Fons Rademakers, CERN openlab CRO, [email protected]

COMMUNICATION CONTACT
Andrew Purcell, CERN openlab Communication Officer, [email protected]
Mélissa Gaillard, CERN IT Communication Officer, [email protected]

ADMIN CONTACT
Kristina Gunne, CERN openlab Administration Officer, [email protected]

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. It includes photos, models and videos courtesy of CERN and uses content provided by CERN and CERN openlab.