IMPLEMENTATION AND ANALYSIS OF XCACHE ON SOUTHWEST TIER 2 CLOUD CLUSTER FOR LARGE HADRON COLLIDER ATLAS EXPERIMENT

by

PRIYAM BANERJEE

Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of the Requirements for the Degree of MASTER OF SCIENCE IN COMPUTER SCIENCE

THE UNIVERSITY OF TEXAS AT ARLINGTON
MAY 2019

Copyright © by Priyam Banerjee 2019
All Rights Reserved

Acknowledgements

I would like to extend my heartfelt thanks to my supervisor, Professor David Levine, who inspired and motivated me to work on this thesis. This thesis would not have been possible without his encouragement and support. I extend my sincere gratitude to Prof. Levine for his invaluable support and supervision.

A big thanks to Mr. Patrick McGuigan of the Dept. of Physics, University of Texas at Arlington. Patrick has been an exceptional guide who left no stone unturned in explaining the concepts of High Throughput Computing to me. In short, this research could not have seen the light of day without Patrick's constant support.

Sincerest gratitude to my committee member, Prof. Ramez Elmasri, and to the Department of Computer Science and Engineering, University of Texas at Arlington, for their invaluable cooperation and support. Special thanks to Dr. Kaushik De (Dept. of Physics, University of Texas at Arlington), Mr. Andrew Hanushevsky, and Mr. Wei Yang (SLAC National Accelerator Laboratory).

Last but not least, I would like to thank my Grandma (Dinda) and late Grandpa (Dadu), who have always encouraged me to strive for excellence and perfection.

March 5, 2019

To Maa and Moni...

Abstract

IMPLEMENTATION AND ANALYSIS OF XCACHE ON SOUTHWEST TIER 2 CLOUD CLUSTER FOR LARGE HADRON COLLIDER ATLAS EXPERIMENT

Priyam Banerjee, MS
The University of Texas at Arlington, 2019

Supervising Professor: David Levine
Committee Members: Ramez Elmasri, Patrick McGuigan (Dept. of Physics)

The ATLAS Experiment is one of the four major particle-detector experiments at the Large Hadron Collider (LHC) at CERN, the birthplace of the World Wide Web. ATLAS was one of the LHC experiments that demonstrated the discovery of the Higgs boson in July 2012. At the end of 2018, CERN's tape-based data archive reached 330 PB. Through the Worldwide LHC Computing Grid (WLCG), a distributed computing infrastructure, the calibrated data from the particle accelerator is split into chunks and distributed around the world for analysis. The WLCG runs more than two million jobs per day. At peak data-taking rates, 10 GB of data is transferred per second and may require immediate storage. The workflow management system known as PanDA (Production and Distributed Analysis) handles the data analysis jobs for the LHC's ATLAS Experiment. The University of Texas at Arlington hosts two compute and storage data centers, together known as SouthWest Tier 2 (SWT2), to process ATLAS data. SWT2 is a consortium between The University of Texas at Arlington and the University of Oklahoma.

This thesis focuses on finding an efficient way to compensate for the limitations of the available hardware and to optimize its use with a caching mechanism. The caching mechanism (called Xcache), which uses the XRootD framework and the ROOT protocol, is installed on one of the machines of the cluster. The machine acts as a file caching proxy server (with respect to the network) that redirects incoming client requests for data files to the redirector at CPB (Chemistry-Physics Building, UT Arlington), thereby acting as a direct-mode proxy. In the process, the machine caches data files into its storage space (106 TB), so that they can be reused from the cache (disk caching storage). This research focuses on the adoption of Xcache into the cluster and on identifying its network dependencies and the performance parameters of the cache (cache rotation using high and low watermarks, and bytes in/bytes out for monitoring the network). A proxy caching mechanism (Xcache), introduced to address bandwidth and access latency by reducing network traffic, therefore also helps optimize the storage servers.
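For orientation, the deployment summarized above can be sketched as an XRootD proxy file cache (Xcache) configuration of roughly the following form. This is a minimal, hypothetical sketch rather than the actual SWT2 configuration: the redirector hostname, port, cache path, and watermark fractions are placeholders, and the exact directive names and defaults depend on the XRootD release in use.

    # Run xrootd as a caching (direct-mode) proxy rather than a plain data server
    ofs.osslib   libXrdPss.so
    pss.cachelib libXrdFileCache.so

    # Origin that cache misses are forwarded to -- placeholder for the CPB redirector
    pss.origin   redirector.cpb.example.edu:1094

    # Local disk area that holds the cached files (placeholder path)
    oss.localroot /data/xcache
    all.export    /

    # Cache rotation: low and high watermarks as fractions of disk usage (placeholder values)
    pfc.diskusage 0.90 0.95

    xrd.port 1094

The two pfc.diskusage values correspond to the low and high watermarks mentioned above: the cache is allowed to fill until usage crosses the high watermark, and files are then purged until usage falls back below the low watermark, so eviction happens in batches rather than on every newly cached file.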
Table of Contents

Acknowledgements .......... 3
Abstract .......... 5
List of Illustrations .......... 10
Chapter 1: Introduction .......... 13
    About CERN .......... 13
    Scientific Achievements at CERN .......... 14
    Nobel Prize and CERN: A close association .......... 14
    Storage and Computing at CERN .......... 15
    The Large Hadron Collider .......... 16
    Working of the Large Hadron Collider .......... 17
    Achievements of LHC so far .......... 18
    Computing and storage upgrades during Long Shutdown 1 for Run 2 .......... 18
    High Luminosity Large Hadron Collider (HL-LHC) .......... 19
    Plans for ATLAS in the next two years during Long Shutdown 2 .......... 21
Chapter 2: The ATLAS Experiment .......... 22
    ATLAS Detector .......... 23
    Inner Detector .......... 25
        Pixel Detector Specifications [28] .......... 26
        Semiconductor Tracker Specifications [28] .......... 26
        Transition Radiation Tracker Specifications [28] .......... 27
    Calorimeter .......... 27
        Tile Calorimeter (TileCal) Specifications [29] .......... 28
        Liquid Argon (LAr) Calorimeter Specifications [30] .......... 28
    Muon Spectrometer .......... 29
    Magnet System .......... 29
        Barrel Toroid Specifications [31] .......... 29
        End-cap Toroid Specifications [31] .......... 30
        Central Solenoid Magnet Specifications [31] .......... 30
    Trigger and Data Acquisition System .......... 31
    ATLAS Computing .......... 32
    Worldwide LHC Computing Grid .......... 32
    Atlas Distributed Data Management .......... 33
    ATLAS Computing Model (Tier-based Architecture) .......... 34
        Tier 0: CERN Resources .......... 34
        Tier 1: National Resources (based on participating countries) .......... 34
        Tier 2: Regional Resources (smaller in storage and compute capacity than Tier 1) .......... 35
        Tier 3: Local Resources .......... 36
    GRID FTP (For File Transfer) .......... 37
    Globus File Sharing .......... 38