Development of a Performant Process for a Robotic Tape Library within the CASTOR HSM

by Felix Ehm

A thesis submitted in partial fulfillment of the requirements for the degree Diplom-Informatiker (Fachhochschule) at the University of Applied Sciences Wiesbaden, Academic Unit of Information Technology

Examiner at academy: Prof. Dr. Detlef Richter
Examiner at company: Dr. Sebastien Ponce
Supervisor: German Cancio-Melio
Company: CERN, European Organisation for Nuclear Research

Declaration

I hereby declare in lieu of oath that I have written this diploma thesis independently and have used no sources or aids other than those indicated.

Place, date                    Signature of the candidate

Declaration on the Form of Distribution

I hereby declare my consent to the forms of distribution of this diploma thesis listed below:

Form of distribution                                      yes   no
Placement of the thesis in the library of the FHW          X
Publication of the title of the thesis on the Internet     X
Publication of the thesis on the Internet                  X

Place, date                    Signature of the candidate

"We see only what we know." Johann Wolfgang von Goethe (1749-1832), German poet.

Abstract

Modern magnetic tape technology is used at the European Organisation for Nuclear Research [CERN] for permanent storage of data from high energy physics experiments. CERN is the largest particle physics institute in the world, and with the start of data taking from the Large Hadron Collider [LHC] experiments in 2008 the technical infrastructure for a predicted fifteen petabytes per year of data has to be provided. To ensure high data integrity for the unique particle events, which will be used for analysis in the following decades, the development and maintenance of the storage system is vital.

Several areas for enhancement within the CERN-developed HSM system, CASTOR (CERN Advanced STORage Manager), were discovered during the testing phase of this technical infrastructure. In particular the fragmentation of files over several tapes and the effect of this problem on file retrieval time are significant issues and the main topic of this thesis.

In this thesis an analysis of these problems is made. Their solutions, in terms of performance improvements applicable to the robotic tape libraries currently used at CERN, are elaborated upon and presented. The implementation of a new defragmentation process is described.

Contents

1 Introduction
  1.1 Thesis Overview
  1.2 Structure of this Document
  1.3 Validation of this Document
  1.4 Acknowledgement
  1.5 Prerequisites

2 The Castor Project
  2.1 History
  2.2 General Overview
    2.2.1 Disk Servers
    2.2.2 Tape Libraries
    2.2.3 Technical Information
  2.3 Architecture
    2.3.1 The Components
    2.3.2 The Central Services
    2.3.3 The Stager Logic
    2.3.4 The Recall of Files
    2.3.5 The Migration of Files

3 What is Repack?
  3.1 Introduction to Tapes
    3.1.1 History
    3.1.2 Current Use
    3.1.3 Organisation of Tapes
    3.1.4 Organisation of Files on Tapes
  3.2 The Reasons in Detail

4 The old Repack
  4.1 Introduction
  4.2 Architecture

5 Analysis and Design for a new Repack
  5.1 Limitation
  5.2 Requirements

6 Alternative Solutions
  6.1 IBM Tivoli Tape Optimizer on z/OS
  6.2 Novastor TapeCopy
  6.3 Result
  6.4 Conclusion

7 The new Repack
  7.1 Use Cases
  7.2 Requirements
  7.3 The Definition of a Repack Process
  7.4 Architecture
    7.4.1 Software Framework
  7.5 High Level Design
  7.6 Low Level Design
    7.6.1 The Data Model
    7.6.2 The State Diagram
    7.6.3 Repack Client
    7.6.4 Repack Server
  7.7 The modified CASTOR II Stager Workflow
    7.7.1 The affected components
    7.7.2 Retrieving Files from Tape
    7.7.3 Copying Files back to Tape

8 Optimisation
  8.1 Definition of Read Efficiency
  8.2 Tape Loading
  8.3 Reading Files from Tape
  8.4 Writing to Tape

9 Results

10 Conclusion

A Repack Class Diagram

B Repack Sequence Diagram

C Stager Catalogue Schema

D CD Contents

E Glossary

Bibliography

Index

List of Figures

1.1 Illustration of the LHC and the four experiments
1.2 The CDC3800 used in 1967 at CERN to process the Swiss election

2.1 Illustration of data transport between Tier zones
2.2 Data storage in CASTOR HSM
2.3 Disk server network utilisation over one year period
2.4 Rack with disk servers installed at CERN, August 2006
2.5 Tape server network utilisation over the last year
2.6 The STK SL8500 tape library installed at CERN, August 2006
2.7 The IBM 3584 tape library installed at CERN, August 2006
2.8 Illustration of the generated code from Umbrello
2.9 Capture of the framework for multi-threaded daemons class diagram
2.10 The overview of the CASTOR II architecture

3.1 Illustration of reading / writing data to tape
3.2 The file sequence number for new data
3.3 Illustration of the optimisation effect of repack on tapes

4.1 Sequence diagram for Repack I

7.1 Use case diagram for Repack client
7.2 The activity diagram
7.3 Repack II interaction with a subset of CASTOR II
7.4 The Repack data model
7.5 The states of a RepackSubRequest
7.6 The detailed overview of Repack and its server components
7.7 Capture of the states of Stager SubRequests
7.8 Repacking two tape copies
7.9 The activity diagram of the new NameServer function

8.1 Illustration of the advantage reading files in distance to BOT
8.2 Illustration of the parameters of the seek time algorithm
8.3 Illustration of the advantage of sorting the order to write files

A.1 The complete Repack class overview

B.1 The complete Repack sequence diagram

C.1 The simplified Stager Catalogue schema

Chapter 1

Introduction

The European Organisation for Nuclear Research [CERN] is the largest particle physics institute in the world. Since the 1950s fundamental research in this field has been conducted there. Not only has this research influenced the basic pillars of science, but our daily life has also been affected by the technologies developed for the experiments. The World Wide Web [WWW], initially used to facilitate the sharing of data and information among researchers, was one of these technologies, first presented in 1991 by Tim Berners-Lee [1].

Today's efforts greatly exceed the technical dimensions of those earlier experiments. The plans for a new generation of experiment facilities were decided at the end of the 1980s. The Large Hadron Collider [LHC] will become the world's largest particle accelerator. It is being built in a circular tunnel, 27 kilometers in circumference, buried around 100 to 120 meters underground. It straddles the Swiss and French borders on the outskirts of Geneva. Its purpose is to accelerate particles to slightly below the speed of light and force them to collide in one of the four High Energy Physics [HEP] experiments: ALICE (A Large Ion Collider Experiment), ATLAS (A Toroidal LHC ApparatuS), LHCb (Large Hadron Collider beauty) and CMS (Compact Muon Solenoid). Each targets a different field of particle physics and, like the LHC, they are currently under construction.


Figure 1.1 shows the arrangement of these experiments along the LHC tunnel.

Figure 1.1: Illustration of the LHC and the four experiments

With the start of the experiments in 2007/2008 a large amount of data has to be managed. After filtering the raw data from about 40 million particle collisions per second in the detectors, between 100 and 1000 megabytes per second have to be stored permanently. Running the experiments for seven months a year results in a permanent storage requirement of about 15 petabytes each year [2] [3].

The Information Technology Department [IT-DEP] at CERN is responsible for providing this storage and analysis infrastructure. The IT-DEP develops in-house solutions if commercial products do not exist.

But storing this information for later analysis is not the only concern: the data integrity of the results also has to be ensured for the subsequent decades. For this, magnetic tape technology has been chosen. Since the introduction of computers at CERN, magnetic tape has been used to store data (see Figure 1.2). Tapes have been found to be a reliable and cheap solution. Tape libraries organise tapes and provide a robotic control mechanism to load them into tape drives. They are very expensive and reading data from tape takes much more time than from disk, but compared to a disk solution of the same capacity they are cheaper. Disks offer faster access times, but need power, produce heat and are more expensive per megabyte. However, tapes have to be copied to other tapes from time to time to prevent data loss; the reasons for this are presented in this thesis. Several areas for enhancement to this process motivate us to deal with this topic.

Figure 1.2: The CDC3800 used in 1967 at CERN to process the Swiss election


1.1 Thesis Overview

This thesis has been developed over a six-month period at CERN with a preceding internship of three months. Its aim is to describe a subset of the problems related to tape storage management for the robotic tape libraries used at CERN and to find suitable, performant solutions for them. Specifically, the defragmentation of files stored on tape will be the main subject of this thesis.

The following are the key goals of the thesis:

• Analysis of the current system

• Evaluation of available methods of file defragmentation

• Implementation of a new system called ’Repack II’

• Elaboration of new possibilities to optimise tape-related activities

1.2 Structure of this Document

This section gives an overview of this document's sequential structure. The CASTOR project will be described briefly in Chapter 2, its history presented and its current architecture explained. The interaction of its main components is explained by describing the steps to store/retrieve files to/from the robotic tape library. This introduction to its functionality is necessary to understand the developments presented later in this thesis.

Chapter 3 will show how this thesis is related to the topic "Repack". An analysis of the current implementation is provided in Chapter 4. Chapter 5 will conclude that the current system is not optimal and does not satisfy the requirements for CASTOR presented in Section 5.2.


Alternative solutions will be researched in Chapter 6, leading to the conclusion that the development of a new system is unavoidable.

The design and implementation of a new application will be described in depth in Chapter 7, focusing on structural design decisions and possible alternatives. Encountered problems, as well as ways to overcome them, will be mentioned.

In Chapter 8 new considerations and partial solutions for optimising state-of-the-art functionality in CASTOR will be presented.

At the end of this document, a personal evaluation will be given to conclude the results and discuss potential future developments.

1.3 Validation of this Document

The validation of this document is based on the state of the CASTOR II project at the time of writing. Changes in the architecture influence the presented developments and have to be considered.


1.4 Acknowledgement

I am deeply indebted to Professor Dr. D. Richter, University of Applied Sciences Wiesbaden, whose help, personal guidance and encouragement have provided the basis for the present thesis.

My sincere thanks to Dr. Sebastien Ponce, CERN IT-FIO-FD, and Dr. Giuseppe Lo Presti, CERN IT-FIO-FD for their vital support, constructive comments and ideas.

During this work I have collaborated with many colleagues for whom I have great regard, and I wish to extend my warmest thanks to all those who have supported me in my work at the CERN IT Department. In particular I would like to thank Dr. Alasdair Earl and Dennis Waldron for their language support.

I warmly thank Professor Dr. D. Eversheim, University of Bonn, Department of Physics and Astronomy, Faculty of Mathematics and Science, for his help and advice.

Special thanks to Anja Schmidt-Amelung, Anna-Carol Burtoft, Angela Bray, Judith Naef and Christian Szczensny for their guidance and distraction in less enjoyable moments.

I owe my loving thanks to my family, especially to my parents Christoph and Gabriele Ehm. Without their encouragement, support and patience throughout my education and studies I would not have been able to finish this work.

Geneva, Switzerland, August 2006 Felix Ehm


1.5 Prerequisites

To fully understand the technologies and fields this thesis builds upon, readers should be familiar with modern storage and network technologies such as servers, clustering and grid computing. This document contains functionality descriptions based on C++ and SQL (Structured Query Language) code extracts. It is assumed that the reader is acquainted with these, as well as with object-oriented programming languages such as C++. Common technology will be described only briefly, while CERN and project specific areas are discussed in detail. Later in this thesis common terminology related to relational databases is used and described only briefly.

The target audience for this document comprises developers in this field as well as newcomers to the CASTOR software development team.

Readers should be familiar with modern software development paradigms, such as Software Design Patterns. A wide variety of such patterns will be mentioned in this document, and even though each pattern's basic meaning will be paraphrased, there will not be an in-depth description.

When new terms are first introduced, they are printed in italics. Words in bold are emphasised to indicate their importance.

Citations are formatted in the following way: [3]. The number in the citation refers to the corresponding entry in the bibliography at the end of this document. If the bibliography entry is a website, the date indicates the last visit.

Wherever "he" or "his" is used in this document, both genders are addressed.


Chapter 2

The Castor Project

The CASTOR (CERN Advanced Storage Manager) project is a hierarchical storage management [HSM] system developed at CERN to provide a storage service for all physics data. Files can be stored, listed, retrieved and accessed in CASTOR using command line tools or applications built on top of RFIO (Remote File Input/Output). These files may be migrated between front-end disk and back-end tape storage. The CERN Data Recording Service [CDR] uses CASTOR to transport data from the experiment areas to the central storage. CASTOR is continuously enhanced for the LHC experiments and, as such, has been integrated with Grid technologies. Currently CASTOR manages over 50 million files representing about 5 petabytes of data. The future requirements estimate that CERN will need about 15 petabytes per year of storage with an input of 4 gigabytes per second [GB/s] while experiments are running. It also has to cope with about 30,000 concurrent file requests with about 1 million files on disk [4].

Since CERN is not able to manage and provide all the resources required by the experiments on site, external institutes in 29 countries offer additional storage and CPU resources. These are classified into tiers related to their storage and CPU capacity.

Tier0 (CERN) does the data acquisition, primary storage and the initial processing prior to distribution to the Tier1 sites. Currently, 11 institutes in Tier1 are connected to CERN, and to each other, via an optical private network. Another 40-50 institutes in the Tier2 zone retrieve data from Tier1 sites via the internet. Tier1 sites also mainly use CASTOR as their storage solution.


Figure 2.1: Illustration of data transport between Tier zones

2.1 History

CASTOR is an evolution of SHIFT (Scalable Heterogeneous Integrated FaciliTy), which started in the early 1990s. Its architecture was developed and introduced in 1999 to address the immediate needs of the NA48 [6] and COMPASS [5] experiments and also to provide a base that would scale to meet the LHC requirements. Increasing storage resources led to a revision of the architecture in 2003. With clusters [7] of hundreds of disk and tape servers, automated storage management faced problems similar to those of CPU cluster management. The idea was to create a storage resource sharing facility consisting of a pluggable framework rather than a total solution.

2.2 General Overview

This section gives a short introduction to the HSM concept and how it is realised from the hardware point of view for CASTOR at CERN.


As seen in Figure 2.2, user data is first copied from the user resource to disk servers (front-end) and then copied to tape (migrated) in a tape library (back-end). This idea of 'buffering' data allows the system to optimise the disk-to-tape copy process by bundling data before writing to tape. The same applies in reverse when a user gets a file from CASTOR: data is copied from tape to disk (recalled) and then to the user's resource.

Figure 2.2: Data storage in CASTOR HSM

2.2.1 Disk Servers

Disk servers are a form of disk storage that hosts files on a network-accessible resource. They do not need to be equipped with high-end CPUs, but must have enough disks and the ability to sustain a high I/O load to support a large amount of data. Currently there are about 200 disk servers used by CASTOR installed in the Computer Centre at CERN¹. This number is going to be increased by another 150 by the end of 2006.

¹ Status from August 2006

In total, about 10,000 disks in 350 disk servers are going to offer about 7 petabytes of native space. They are connected via Gigabit Ethernet technology, which allows a theoretical input/output transfer speed of 200 megabytes per second [MB/s]. In production, this provides 70 MB/s on average and 80-100 MB/s at peak. Each disk server provides up to five terabytes over several file systems. Typically each system has two RAID [8] controllers that ensure high data integrity by replicating data among 12 drives, which are configured as two six-disk RAID-5 arrays. This huge number of machines is managed by Quattor, which is introduced in Section 2.2.3.

Figure 2.3 shows the network input/output of the CASTOR disk server cluster monitored by Lemon (see Section 2.2.3) during the last year². Massive test scenarios during the IT Data Challenge [ITDC] [9] in March, April and May showed that CASTOR and the underlying disk servers can already deal with data input rates of up to 2.2 GB/s.

Figure 2.3: Disk server network utilisation over one year period

² August 2005 to August 2006


Figure 2.4: Rack with disk servers installed at CERN, August 2006

2.2.2 Tape Libraries

At the moment CERN uses four Powderhorn 9310 robotic tape libraries from StorageTek [STK], placed in two different buildings for disaster planning reasons. Each library has 22 9940B tape drives and space for up to 6,000 tapes. This gives a maximum total storage capacity of 600-1,200 terabytes. As mentioned in the introduction, CASTOR has to store about 15 petabytes per year with a throughput of 4 GB/s. To fulfil these needs in the near future, the latest robotic tape library generation from IBM (IBM 3584) and STK (STK SL8500) was installed in May 2006. Both are based on a modular library concept and are easily extendable. The STK SL8500 system has the highest expansion potential, with a possible storage capacity of 120 petabytes when using 300,000 tapes with 2,048 tape drives. By using four 'handbots' per module STK ensures quick loading of tapes into a drive and high fault tolerance (redundant handbots). It can be equipped with 9940, 9840, SDLT, LTO or T10000 tape drives in any mixture [10]. The IBM 3584 library has a maximum native capacity of 3.1 petabytes when using 6,260 tape cartridges with 192 tape drives.

Like STK, IBM prevents failures by having (only) two robotic arms in the role of 'accessors' [11]. Currently³ the IBM library is installed in this configuration, whereas the STK library is installed with 5 modules containing around 10,000 storage slots holding 5 petabytes of data and 64 drives [12]. In both cases, drives and media types may be changed to increase the total capacity. The presented specification refers to a configuration with LTO-3 (Linear Tape Open, 3rd generation) tapes. For each tape drive in both libraries, dedicated servers (tape servers) deal with the copy process to tape. These are not vendor-specific and each can connect to any drive via Fibre Channel technology, which supports data transfer rates of up to 400 MB/s. Tests were successful and both libraries now run in production. The scenario report for 2007 concludes that both are candidates for data services, since the final architecture is not yet decided [12]. From the results, and the future prospects of tape development, one may assume that the STK solution is to be preferred, because its tape media are re-usable for the next generation of drives (1 terabyte per tape with an estimated 200 MB/s recording speed [12]).

Independent of this decision, which is to be taken soon, the estimated total tape space requirement at CERN is 18 petabytes. This capacity has to be installed and tested before the experiments commence.

Figure 2.5 is a plot of the summed network input/output of the tape servers during the last year, produced by Lemon (see Section 2.2.3). Conspicuous is again the high load during the ITDC tests from March to May, where input rates of up to 1.5 GB/s were reached.

³ Status of installation: July 2006


Figure 2.5: Tape server network utilisation over the last year

Figure 2.6: The STK SL8500 tape library installed at CERN, August 2006


Figure 2.7: The IBM 3584 tape library installed at CERN, August 2006

2.2.3 Technical Information

This section explains some tools and technologies used throughout the CASTOR II project. It is important to understand their usage and their advantages for CASTOR II, as this is necessary to understand the decisions taken in Chapter 7. However, their full details are outside the scope of this thesis and they are therefore discussed only briefly.

• The Quattor Administration Tool

• The Lemon Tool

• The Distributed Logging Facility

• The Umbrello UML Designer

• Framework for Multithreaded Daemons

• Services and Converters


The Quattor Administration Tool

Quattor is a CERN-developed system administration toolkit for installing/updating software packages on clusters and farms, as well as for configuring/managing them [13]. Currently around 3900 machines at CERN are managed by Quattor. They are organised in clusters, subclusters and nodes, where each node can be assigned a previously specified profile. The profiles contain information such as software packages (and version numbers) as well as network related configuration (e.g. domain name). Since the computers at CERN have a great variety of functions (analysis of data, mail, storage, etc.), the number of profiles is about 5150, although currently only 3900 machines are installed. This tool allows service managers and system administrators to easily add/remove a machine to/from a cluster, or redeploy it for another purpose [14].

The Lemon Tool

The Lemon (LHC Era Monitoring) tool is a client-server monitoring system used to retrieve and present monitoring information through a web-based interface [15]. Each machine has different sensors, which collect data and report it to the Lemon server. The Lemon server stores the data in a measurement repository. This enables system administrators to see if a machine has a problem, e.g. a defective power supply or a crashed hard disk. Furthermore, it provides the history of recorded values from the sensors, from which plots can be created [16] [15]. The client is installed on every machine at CERN and is therefore part of every Quattor profile (see Section 2.2.3).

The Distributed Logging Facility

The "Distributed Logging Facility" [DLF] is designed for centrally logging messages and accounting information from CASTOR I & II services/facilities in a relational database. The framework consists of a DLF server daemon acting as a centralised collector of log messages; a client API for writing log messages; and a web interface for retrieving the messages once they have been stored in the database.

The message itself can be assigned to one of ten different severity levels. The DLF framework was completely re-written in 2006 to address issues of data management, scalability and recoverability within the numerous framework components. As a result, the server now has the capability to handle 2,500 messages per second with an average of five parameters each (3.75 million rows of data every five minutes).

The client API is asynchronous. This means that client applications do not have to wait for the DLF server to acknowledge receipt of a message and can continue normal functionality unhindered by any DLF server related issues or network problems.

The DLF server itself is essentially a memory cache, which acts as a gateway between the DLF clients and a relational database. The internal cache, or queue, has the capacity to store a default 250,000 messages pending database insertion.
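To make this gateway idea concrete, the following is a minimal sketch of a bounded in-memory message queue in C++. It is not the actual DLF implementation; the class and method names are purely illustrative.

#include <cstddef>
#include <queue>
#include <string>

// Simplified sketch of the gateway described above: a bounded in-memory
// queue of log messages awaiting database insertion (illustrative only).
class MessageQueue {
public:
  explicit MessageQueue(std::size_t capacity = 250000) : m_capacity(capacity) {}

  // Called by the receiving side; returns false if the queue is full,
  // leaving the overflow policy (drop, throttle, ...) to the caller.
  bool push(const std::string& msg) {
    if (m_queue.size() >= m_capacity) return false;
    m_queue.push(msg);
    return true;
  }

  // Called by the database writer; returns false when the queue is empty.
  bool pop(std::string& msg) {
    if (m_queue.empty()) return false;
    msg = m_queue.front();
    m_queue.pop();
    return true;
  }

private:
  std::size_t m_capacity;
  std::queue<std::string> m_queue;
};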

The Umbrello UML Modeller

The ”Umbrello UML Modeller” [Umbrello] is an open source software project to design Unified Modelling Language [UML] diagrams for Linux operating systems [17]. These diagrams are highly recommended for structuring and documentation during the development of new applications. Umbrello uses the XML Metadata Interchange [XMI] specification to store the information about the designed objects. For the CASTOR II project the following diagrams are used and presented in this document:

• Use cases

• Class diagrams

• Activity diagrams

• State diagrams

• Sequence diagrams


Umbrello was chosen because of its additional ability to generate C++ code from UML class diagrams. It has been used by the CASTOR development team since 2005. The developer designs the data model of the new application and the code generator creates C++ classes with getter and setter methods for the specified attributes. These classes do not contain any logic and can therefore be compared to Sun's JavaBeans technology [19]. This code generation was adapted to the needs of CASTOR II. It also produces SQL statements to create the corresponding relational database schema, as well as the converter classes enabling different representations of the objects (see Section 2.2.3). The table names correspond to the names of the classes in the diagram and the columns to the attributes of the classes. The time flow of the sequence diagrams shown in this thesis is always top down.

Figure 2.8: Illustration of the generated code from Umbrello
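To illustrate the kind of code Umbrello generates, a hypothetical data class with generated accessors might look like the following. The class and attribute names are invented for this example; they are not taken from the actual CASTOR II code base.

#include <string>

namespace castor {
namespace repack {

  // Illustrative data class in the style of the generated code: plain
  // getter/setter pairs with no business logic, comparable to JavaBeans.
  class RepackSegment {
  public:
    RepackSegment() : m_fileid(0), m_fseq(0) {}

    unsigned long long fileid() const { return m_fileid; }
    void setFileid(unsigned long long fileid) { m_fileid = fileid; }

    int fseq() const { return m_fseq; }
    void setFseq(int fseq) { m_fseq = fseq; }

    const std::string& vid() const { return m_vid; }
    void setVid(const std::string& vid) { m_vid = vid; }

  private:
    unsigned long long m_fileid; // unique file identifier (NameServer)
    int m_fseq;                  // file sequence number on tape
    std::string m_vid;           // volume id of the tape
  };

} // namespace repack
} // namespace castor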


Framework for Multithreaded Daemons

This framework was designed as a part of CASTOR II to provide an easily accessible API for creating and running Cthreads in C++ as well as to simplify the program code of daemons. A capture of the class diagram is given in Figure 2.9. The advantage of this framework is that the developer can concentrate on his application and the program code is far cleaner. As we explain later, the new Repack system uses this framework to realise its server part, which runs threads in a thread pool. A thread pool means that a user-specified class is instantiated once and started with a given number of threads. The prerequisite is that the class inherits from IThread and implements the run method. A BaseServer can have several thread pools, each targeted at a specific purpose. There are currently two implementations inheriting from BaseThreadPool, which provides common functionality for both thread pool classes. Each uses a concrete instance of the IThread class, in which the developer defines the behaviour of the thread. The first one is the ListenerThreadPool, which binds to a socket on a defined port and executes the thread method whenever an object arrives. The second one is the SignalThreadPool, which runs the thread periodically. The interval is specified when the SignalThreadPool class is instantiated, e.g. by a BaseDaemon.

To summarise: each thread pool has exactly one instance of a concrete IThread class, whose code can be executed by a given number of threads. This means that the developer must take care of the thread-safety of this class.
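As a rough illustration of how a daemon might use this framework, consider the sketch below. The class names follow Figure 2.9, but the header paths, constructor arguments and exact signatures are assumptions, not the definitive CASTOR II API.

#include "castor/server/BaseDaemon.hpp"        // assumed header locations
#include "castor/server/SignalThreadPool.hpp"
#include "castor/server/IThread.hpp"

// The user-specified thread class: one instance is shared by all threads
// of the pool, so run() must be written in a thread-safe way.
class WorkerThread : public castor::server::IThread {
public:
  virtual void run(void* param) {
    // ... poll the database and process one unit of work ...
  }
  virtual void stop() {}
};

int main(int argc, char** argv) {
  castor::server::BaseDaemon daemon("TestDaemon");
  // A SignalThreadPool executes the thread method periodically.
  daemon.addThreadPool(
      new castor::server::SignalThreadPool("Worker", new WorkerThread()));
  daemon.parseCommandLine(argc, argv);
  daemon.start();
  return 0;
}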

Conversion Services and Converters

Conversion services are a special kind of service that allow conversion of data from/to a given representation to/from memory (that is to/from a data object in memory) in C/C++. This can typically be used for streaming (the representation being a byte stream) or for database storage (the representation being data in the database).


Figure 2.9: Capture of the framework for multi-threaded daemons class diagram

The typical interface of a conversion service provides two main methods, createRep and createObj, that respectively create the representation of a data object from the object and create a data object from its representation. A naive implementation of these methods would be a large decision on the different object types with dedicated code for each case. This is not dynamic: the service implementation has to be changed for each new object type.

Another approach is to use converters. These dedicated objects are able to convert one type of object from/to a given representation. They are, in some sense, very similar to services: they are also stateless and they have factories that are listed in a central list. This list contains a two-level entry: representation type and object type.

The conversion service can now work in the same way that the client works with services.

When a conversion needs to be done, it will get the right converter using the central list (and maybe loading the correct library), and will then instantiate and use it. If a new object type is added to the system, the new converter has to be put in a library and declared in the configuration file. The system is then able to use it without recompilation and without relinking. If the configuration is read from a file at run time, the new converter can even be used without restarting [35].
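The two-level lookup can be sketched as follows. This is a simplified illustration of the mechanism, not the actual CASTOR II code; apart from createRep and createObj, the names are assumptions.

#include <map>
#include <utility>

class IObject;    // any data object
class IAddress;   // describes where/how the representation lives (DB, stream)

// Interface of a converter for one (representation type, object type) pair.
class IConverter {
public:
  virtual ~IConverter() {}
  virtual void createRep(IAddress* address, IObject* object) = 0;  // object -> representation
  virtual IObject* createObj(IAddress* address) = 0;               // representation -> object
};

// Central list of converters, keyed by representation type and object type.
class ConverterRegistry {
public:
  typedef std::pair<unsigned int, unsigned int> Key;  // (repType, objType)

  void registerConverter(unsigned int repType, unsigned int objType,
                         IConverter* conv) {
    m_converters[Key(repType, objType)] = conv;
  }

  // The conversion service resolves the converter at run time, so a new
  // object type only requires registering a new converter.
  IConverter* converter(unsigned int repType, unsigned int objType) const {
    std::map<Key, IConverter*>::const_iterator it =
        m_converters.find(Key(repType, objType));
    return it == m_converters.end() ? 0 : it->second;
  }

private:
  std::map<Key, IConverter*> m_converters;
};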

2.3 Architecture

Since we now have a general overview of what CASTOR II is intended for, and how it is used in production, it is necessary to take a closer look at its architecture. We do not intend to present all the details, since this is outside the scope of this thesis. However, highlighting the details of some components is helpful for understanding the following chapters. The pluggable framework mentioned in Section 2.1 is explained by introducing the relevant components of CASTOR II. Their interactions are explained on the basis of the workflow of a typical user put and get request.

CASTOR II is driven by a database-centric architecture with independent distributed components. Most are stateless and the code is interfaced with Oracle or MySQL relational database management systems [DBMS]. The structure allows the use of other relational DBMSs if necessary (currently Oracle Enterprise Edition 10 is used). Of course this implies that in terms of availability the components rely on the performance of this DBMS. The components can be restarted and parallelised easily to provide scalable performance for CASTOR II. Since the Stager Logic uses this database, it is also referred to in the following as the Stager Catalogue. The simplified schema of this database can be found in Appendix C.1. In terms of the software projects already discussed, CASTOR II was designed using UML diagrams created with the Umbrello UML Designer. The Umbrello code generator was used to create the database schema.


2.3.1 The Components

Figure 2.10 shows that CASTOR II consists of many daemons ”surrounding” the central database. The two pink rectangles indicate which components have to be installed on the same host.

Figure 2.10: The overview of the CASTOR II architecture

Request Handler The RequestHandler [RH] is the interface between the Stager Logic and the client, as well as for requests coming from CASTOR II components (e.g. the GcDaemon). It verifies the sent object and stores it in the database to be handled by the Stager Logic. The role of the RequestHandler is also to insulate the rest of the system from high request bursts.


RMMaster and RMNode The RMMaster gathers monitoring information from the RMNodes running on diskservers. This is used for load balancing the migration and recall process of files.

GcDaemon The GcDaemon is the 'cleaner' of CASTOR II. It deletes files from a disk server whenever no space is left or a file becomes too 'old'. The decision as to which files are deleted, and when, is taken by the Expert System.

Expert System The intention of the Expert System (Expertd) is to externalise decisions based on policies. It receives requests from other components (Stager, MigHunter, GcDaemon) and executes user-defined scripts. The advantage is that policies in scripts (Perl, POSIX) can be changed on the fly and no recompilation of the affected parts is necessary.

MigHunter The MigHunter periodically queries the Stager Catalogue for files to be written to tape. If candidates are found, the Expertd is contacted to ask for special handling, and the candidates are subsequently attached to Streams. Each Stream is assigned to one tape request. The Migrator, for example, takes all candidates in a Stream and copies the files to the assigned tape.

RTCPClientD The RTCPClientD is the master daemon controlling the tape migration/recall process. Whenever files are marked in the Stager Catalogue to be recalled or migrated it executes either the migrator or recaller.

Migrator/Recaller These two multithreaded components are executed by the RTCPClientD to either retrieve files from tape or write them to tape. In detail, they instruct the RTCPD to transfer the files from/to the disk server to/from the tape via the RFIO protocol. The disk server itself runs an RFIODaemon, which accepts the connection.


Tape Daemon The Tape Daemon runs unattended on a tape server and is responsible for the communication between tape server and tape library. It uses the tape library specific drivers to send commands to the assigned tape drive. It supports DLT/SDLT, LTO, IBM 3590, IBM 3592, STK 9840 and STK 9940A/B tape drives and ADIC Scalar, IBM 3494, IBM 3584, Odetics, Sony DMS24, STK Powderhorn and STK SL8500 tape libraries, as well as all generic SCSI driver compatible robotics [20].

The Scheduler The Scheduler determines the best candidate resource (file sys- tem) for a job. Its decision depends on the load of disk servers (see RMMaster and RMNode).

2.3.2 The Central Services

The NameServer

The NameServer is one of the central services components; it provides the CASTOR namespace, which appears as a normal UNIX filesystem hierarchy. It assigns a unique 64-bit file identifier to all name space elements (files and directories). If a file has been written to tape, the tape-related data is added to the database. The CASTOR II Recaller uses this information if a file has to be copied from tape back to disk. Files in CASTOR are classified by FileClasses. These determine how many copies of a file are going to be created on tape. The intention is to create different backups of the file. For example, the FileClass of ATLAS and CMS files allows two copies on tape (in the following also referred to as tapecopies). In the era of CASTOR I, files were allowed to be split over more than one tape to use the full tape space for storage. The result is that a file can consist of 1..n segments. Segments are the internal representation of those parts of a file in the NameServer and Stager databases. This policy has changed in CASTOR II and files are now written completely to one tape.


The VDQM

The VDQM (Volume Drive Queue Manager) provides a FIFO queue for accessing the tape drives. Requests for already mounted volumes are given priority in order to reduce the number of physical tape mounts [21].

The VMGR

The VMGR (Volume Manager) keeps information about all tapes in CASTOR II (e.g. free space, capacity, library, etc.). It also allows tapes to be grouped for given activities. It is, for example, useful to have different tape pools for the experiments. Tape pools are independent of the library, and tapes can be added to or removed from them easily. This tape pool information is used during the migration of files.

The CUPV

The CUPV (Castor User Privileges) service provides rights to users and administrators for tape-related operations, e.g. reading a file from tape. It manages a role-based authorisation mechanism and is queried by the NameServer or the VMGR when they need to process an access request.

2.3.3 The Stager Logic

The Stager Logic interfaces with the central services to enable put/get file requests by users. In this case, the user has to specify a ServiceClass with the request. The ServiceClass has to be defined in the Stager Catalogue by the CASTOR operator and keeps the following information for the recall/migration process:

• disk and tape pools for recall/migration

• the number of tape drives used for migration

• policy for tape migration

• policy for recall


• policy for garbage collection

In combination with the FileClass from the NameServer (Section 2.3.2), which is also kept in the Stager database, the migration can be tuned very effectively. Two use cases are listed below to explain how the combination of both achieves this.

Selective Migration Based on Filesize

A ServiceClass, userSvc, is configured with two tape pools: fastAccess and largeCapacity. The former contains media with relatively low capacity but fast mount/load/positioning times (e.g. STK 9840 media). The largeCapacity tape pool contains high capacity media with relatively slow mount/load/positioning times (e.g. STK 9940 media). The use case to be addressed is the following: it is desirable to be able to selectively construct the migration streams so that small files, say with a file size below 100 MB, are migrated to the fastAccess pool while large files go to the largeCapacity pool.
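Expressed as code, the selection rule of this use case amounts to the following sketch. The pool names are taken from the example above; the function itself is purely illustrative.

#include <string>

// Choose the target tape pool by file size (threshold from the use case).
std::string selectTapePool(unsigned long long fileSizeBytes) {
  const unsigned long long threshold = 100ULL * 1024 * 1024;  // ~100 MB
  return fileSizeBytes < threshold ? "fastAccess" : "largeCapacity";
}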

Migration of Dual Tape Copies to Different Tape Pools

A ServiceClass, rawDataSvc, is configured with two tape pools: copy1 and copy2. The files written to rawDataSvc all belong to the FileClass dualCopy, which is defined with two tape copies. The default behaviour of the MigHunter program is to assign both copies to all available Streams. The desired behaviour is to selectively write files with copyNb = 1 to tape pool copy1 and files with copyNb = 2 to copy2.

2.3.4 The Recall of Files

The interplay of the components for a get request (recall) is explained in the following steps.

1. The client sends its request to the RequestHandler, which stores it in the DB

2. The Stager polls the DB to get the request and checks for file availability in disk cache


3. If the file is not available it is set to be copied from tape

4. RTCPClientD

• polls the DB to get files to be copied from tape to disk (recall)

• queries the Stager for the target filesystem

• gets the location of the file on tape from the NameServer and submits a request to the VDQM

• executes the Recaller

5. VDQM reserves a tape drive and tape server for the tape request

6. If the tape drive is ready, the RTCPD transfers the data to the selected filesystem

7. The Stager polls the DB to check for file availability (as in step 2)

8. The file is available; Stager launches a StagerJob through the scheduler

9. The StagerJob triggers the RFIODaemon and informs the client as to which disk server and port to contact.

10. The data is transferred to the client

11. The Stager Catalogue is updated and cleaned up

2.3.5 The Migration of Files

The collaboration of components for a put request (migration):

1. The client sends its request to RequestHandler, which stores it in the DB

2. The Stager polls the DB to get the request and launches the StagerJob through the scheduler

3. The StagerJob answers the client through the RequestReplier, giving it the machine and port with which to contact the RFIODaemon.


4. The data is transferred from the client to a disk server

5. The MigHunter attaches the file to a stream

6. The RTCPClientD will launch a Migrator

7. Migrator

• Queries a tape from the VMGR, depending on the tape pool information in the ServiceClass the file was staged in

• Submits it to the VDQM

• Asks the DB for the next migration candidate in the stream (based on filesystem availability) and writes it to the assigned tape

• The NameServer is updated with the location on tape returned by the tape server

8. The Stager Catalogue is updated

Data Verification

An important aspect of transporting data is validating it to ensure correct transmission. There is no problem if the data stays on one medium, but as soon as it is moved between two instances (e.g. tape and disk) there is a chance that an error occurs during transmission via the network. Corrupted data has an enormous effect on the later analysis of the experiments and leads to wrong results. To ensure safe transport, the RFIO protocol checks the received data with the Adler-32 checksum algorithm, which was invented by Mark Adler. It is almost as reliable as a 32-bit cyclic redundancy check (CRC-32) [22] for protecting against accidental modification of data, such as distortions occurring during a transfer, but is significantly faster to calculate in software. Its specification and a detailed description of the algorithm can be found in RFC 1950 [23]. This Adler checksum is stored with the file segment information in the NameServer database and used to validate the data after every transfer.

If the calculated Adler checksum of the received data differs from the original one, an error is written to DLF and the transfer is retried up to three times (the Stager default value).
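For reference, the Adler-32 checksum described above can be computed as follows. This is a direct implementation of the algorithm from RFC 1950; in practice an existing routine such as zlib's adler32() would typically be used.

#include <cstddef>
#include <stdint.h>

// Adler-32: two 16-bit sums modulo 65521, combined into one 32-bit value
// (b in the high word, a in the low word), as specified in RFC 1950.
uint32_t adler32(const unsigned char* data, std::size_t len) {
  const uint32_t MOD_ADLER = 65521;  // largest prime below 2^16
  uint32_t a = 1, b = 0;
  for (std::size_t i = 0; i < len; ++i) {
    a = (a + data[i]) % MOD_ADLER;
    b = (b + a) % MOD_ADLER;
  }
  return (b << 16) | a;
}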

Chapter 3

What is Repack?

This chapter gives a general introduction to the topic of Repack: what it is, why it is used and why it is important. An explanation of Repack is given and the reasons for it are listed. A detailed understanding of tapes is indispensable for this and is therefore also presented.

Repack is a synonym for a copy process which copies all of a tape's data onto temporary disk storage and then rewrites it onto another tape.

From Chapter 2 we know that CERN uses CASTOR to permanently store huge amounts of data from the experiments on magnetic tapes. Tape operators are responsible for maintaining these tapes and the tape libraries. If an operator has reason to suspect that a particular cartridge is giving problems, it is possible to remove it from use by moving all the data to other tapes using the Repack utility. These problems and further issues are listed below and explained in detail in Section 3.2:

• To prevent data loss caused by reaching the tape's mechanical life cycle limit

• To stack data on high-density tapes to reduce the number of tapes to store and manage

• To move data to more durable tape media


• To release tapes for reuse

• To optimise space usage of tapes

To understand why those problems appear and how they are solved, we have to take a closer look at the technology and implementation of tapes.

3.1 Introduction to Tapes

A magnetic tape is a non-volatile storage medium consisting of a magnetic coating on a thin plastic strip. Nearly all recording tape is of this type, whether used for video, audio storage or general purpose digital data storage using a computer.

3.1.1 History

Magnetic tape was first used to record computer data in 1951 on the Eckert-Mauchly UNIVAC I [24]. The recording medium was a strip of 1/2" (12.7 mm) wide thin metal, consisting of nickel-plated bronze. The recording density was 128 characters per inch (198 micrometres/character) on eight tracks at a linear speed of 100 in/s (2.54 m/s), yielding a data rate of 12,800 characters per second. Of the eight tracks, seven were data and one was a clock, or timing track. Making allowance for the empty space between tape blocks, the actual transfer rate was around 7,200 characters per second [25]. Effective recording density increased over time and a multitude of tape formats have been developed and used.

3.1.2 Current Use

Today, most magnetic tape systems use reels that are much smaller and are fixed inside a cartridge to protect the tape and facilitate handling. Modern cartridge formats include Quarter Inch Cartridge [QIC], Digital Linear Tape [DLT] and Linear Tape Open [LTO].

A tape drive uses precisely controlled motors to wind the tape from one reel to the other, passing a read/write head over the tape. The latest generations used at CERN are the LTO-3, IBM 3592 J1A and T10000 tapes. LTO is a computer storage magnetic tape format developed and initiated by Certance, Hewlett-Packard and IBM as an open alternative to the proprietary DLT. LTO originally came in two variants, Accelis and Ultrium:

• Ultrium, the high capacity variant

• Accelis, the high speed variant

However, the performance of the Accelis tapes never exceeded that of the Ultrium tape format, so there was never a demand for Accelis¹. LTO-3 is the third generation of this standard and is currently the latest. However, the consortium has announced a roadmap for LTO development with the goal of achieving 6.4 TB of capacity and 540 MB/s reading speed for the sixth generation in 2010 [26] [27]. The T10000 tapes and drives were developed by StorageTek, now Sun Microsystems, and offer even higher performance for read/write actions than the LTO-3 tapes. The IBM 3592 solution is comparable to the T10000 from STK, but offers higher mechanical durability.

Table 3.1 shows the tapes and tape drives currently used for storage at CERN, with some of their technical specifications.

Model      Capacity         Average file   Data transfer rate   Load/Unload
           (uncompressed)   access time    (uncompressed)       cycles
T9940B     200 GB           41 sec         30 MB/sec            10,000
LTO-3      400 GB           72 sec         80 MB/sec            5,000
3592 J1A   500 GB           60 sec         120 MB/sec           20,000
T10000     500 GB           62 sec         50-120 MB/sec        10,000

Table 3.1: Overview of the most common tape drive models ([26], [28], [29], [30])

¹ LTO came to mean the same as Ultrium


3.1.3 Organisation of Tapes

The following is an example of the LTO-3 tape recording technique, which is typical of the tapes used at CERN; other tape types differ only in track length, number of tracks and data density. The Ultrium 3 format records 704 tracks across the half-inch of tape width. This linear recording format has a serpentine characteristic: the drive mechanism makes multiple passes from the beginning of the tape [BOT] to the end of the tape [EOT] and back to read or write the full capacity of the cartridge. Figure 3.1 illustrates how the Ultrium 3 format splits the 704 tracks into four bands of 176 tracks each. Data is written to the innermost bands first, to provide protection to the data recorded earliest in the process, by writing it in the centre, the most physically stable area of the tape. Data is also verified as it is written, because the read head follows the write head. On pass one of a round trip down the length of the tape and back, 16 tracks are read, or written, concurrently. At the end of the tape, pass two of the round trip starts. The read/write heads are indexed and positioned over 16 new tracks, and the tape reverses direction back toward the beginning of the tape to complete the round trip. For the next round trip, the heads are again indexed to a new position over a new group of 16 tracks. Figure 3.1 illustrates only 8 read/write components of the drive head. The servo bands guarantee the positioning of the head and ensure that it follows a straight line on the data band. The guard bands stabilise the tape band. The arrows indicate the movement direction of the head. Table 3.2 shows various tape drives and the number of full-length passes they need. Fewer passes mean less media wear. The T10000 tape drives use a dual head technology, physically separating the dual heads down the length of the tape by more than an inch. Using the 32 channels for reading/writing simultaneously causes less tape wear compared to the LTO technology. Still, they are ten times as expensive (37,000 USD) as a 'normal' LTO3 drive (an IBM LTO3 SCSI-2 drive costs around 4,000 USD)².

² Market prices from July 2006



Figure 3.1: Illustration of reading / writing data to tape.

Drive      Tracks   Channels   Full-length passes
T9940B     576      16         36
T9840C     288      16         18
T10000     768      32         24
LTO3       704      16         44
IBM 3592   512      8          64

Table 3.2: Track, channel, and pass requirements for various tape drives.
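The relationship behind the last column of Table 3.2 can be verified with a few lines of code: the number of full-length passes needed to cover the whole tape is the number of tracks divided by the number of channels read or written concurrently.

#include <iostream>

int main() {
  struct Drive { const char* name; int tracks; int channels; };
  const Drive drives[] = {
    {"T9940B", 576, 16}, {"T9840C", 288, 16}, {"T10000", 768, 32},
    {"LTO3",   704, 16}, {"IBM 3592", 512, 8}
  };
  // passes = tracks / channels, matching Table 3.2 (36, 18, 24, 44, 64)
  for (int i = 0; i < 5; ++i)
    std::cout << drives[i].name << ": "
              << drives[i].tracks / drives[i].channels
              << " full-length passes\n";
  return 0;
}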


3.1.4 Organisation of Files on Tapes

In general, data on tape is organised by labels. A label is a small area on tape where information about the data is stored (e.g. a checksum). Assuming that the data blocks are files (or, in CASTOR, segments of files), labels mark their beginning. To find a file on tape, the file sequence number [FSEQ] (the label number) is given to the tape drive, which then seeks over the labels until it finds the correct one and reads the data. In the case of writing, the tape drive seeks to the last label (last FSEQ number) and appends the new data (see Figure 3.2). Tape drives currently allow two ways of accessing data on tape: by FSEQ or by user blockid. Both are reported when the data is written and are stored in the NameServer. Modern tape drives, like the STK 9940, optimise the seek process if the user blockid is given. They translate it to the physical location on tape and determine the quickest method to read the data block. If the block is some physical distance from the current location, a calculation will result in a high-speed head move to the block location, followed by low-speed data reading. This user blockid is used later in Section 8.3.


Figure 3.2: The file sequence number for new data
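As a small illustration, the per-segment location information described above could be represented like this. The field names are hypothetical and do not correspond to the actual NameServer schema.

#include <stdint.h>
#include <string>

// Illustrative record of where a file segment lives on tape.
struct TapeSegmentLocation {
  std::string vid;      // volume id of the tape holding the segment
  int         fseq;     // file sequence number (label number) on the tape
  uint64_t    blockId;  // drive-reported block id, used for fast positioning
  uint32_t    adler32;  // checksum stored for data verification
};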


3.2 The Reasons in Detail

Mechanical Durability

The magnetic tape inside the cartridge is made of highly durable materials. However, the tape wears out after repeated cycles (winding/rewinding). Eventually, such wear can cause an increase in tape errors. Modern tape drives detect errors during a read or write process and avoid the affected areas next time. Thanks to a very efficient and secure way of writing, the data can still be read. But the increase of reported errors from the drive is an indicator that the end of the tape's lifecycle is reached and the data has to be removed to ensure data integrity. These errors are logged by the tape part of CASTOR II, and whenever the tape errors reach a certain threshold, an email is sent to the tape operators. In most cases, the data then has to be repacked to another tape to avoid total data loss or, at least, expensive data recovery (e.g. by specialist companies). Not only the read/write errors are logged. The VMGR also stores, in addition to administrative information (e.g. tape library, vendor), the following statistical data for all tapes:

• number of read mounts

• number of write mounts

• last tape drive it was mounted in

• the number of files it contains

• free space and capacity

This allows the operators to track and monitor cartridge performance and enables predictive failure analysis, enhancing data integrity. CASTOR tape operators decide, based on either the reported errors on a tape or on the total number of mounts, if a tape has to be repacked.
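As an illustration of such an operator policy, a repack-eligibility check could look like the following sketch. The thresholds and field names are hypothetical assumptions, not actual CASTOR values.

// Illustrative policy check on the VMGR statistics listed above.
struct TapeStatistics {
  unsigned int readMounts;
  unsigned int writeMounts;
  unsigned int reportedErrors;    // read/write errors logged for this tape
  unsigned int loadUnloadRating;  // vendor rating, e.g. 10,000 for a T9940B
};

bool needsRepack(const TapeStatistics& s) {
  const unsigned int errorThreshold = 10;    // assumed operator policy
  const double mountSafetyFactor    = 0.8;   // repack before the rated limit
  unsigned int totalMounts = s.readMounts + s.writeMounts;
  return s.reportedErrors >= errorThreshold
      || totalMounts >= mountSafetyFactor * s.loadUnloadRating;
}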


Stack Data on high-density / more durable Tapes

We move data from one type of tape to another for several reasons and advantages. In the case of repacking low-density tapes to fewer high-density tapes, fewer mounts and less drive usage are needed to access the files in future. An example: Repack shifts 480 GB of data from eight T9940A tapes to one T10000 tape. Only one tape has to be mounted to access the same number of files in future. Experience from the last few years shows that the experiments retrieve their data for analysis by a list of files (one mount, many reads), not file by file (each read requiring one mount). Therefore we achieve less tape drive usage, which results in greater availability for other read/write requests. So, for the same throughput to tape during a running experiment, fewer tape drives are needed. This results in lower costs. In fact, since CERN wants to get rid of the old StorageTek Powderhorn silos to make room for more IT equipment, the remaining tapes have to be repacked to the new tapes. Another advantage of repacking to other types of tape is that we can move data to more durable tapes. In case users need data from one tape very often (measured by the mounts in a certain period of time), it is preferable to move it to more durable tapes, like the IBM 3592s. Their higher number of possible mounts (see Table 3.1) decreases the chance of data errors compared to other tape types, e.g. LTO-3. Conversely, moving less-used data from an IBM 3592 tape to a more adequate LTO-3 tape optimises the usage of tapes.

Optimisation of Tape Space Usage

Another reason for repacking is to reclaim tape space occupied by segments marked as invalid (see Section 2.3.2). Segments become invalid whenever a user deletes or modifies a file in CASTOR II. The location entry of that file is deleted from the NameServer database, but the segments cannot be physically removed, since this is a characteristic of data organisation on tapes (see Section 3.1.4). This media type is written incrementally (like writeable Compact Disks [CD]), which means that new files are always written after the end of the last file.

In consequence the invalid data cannot be overwritten and the tape space is lost. To reclaim this space, the valid segments have to be copied to another tape and the old tape completely erased. The tape operator can then decide to use this tape again. This 'recycling' results in lower media costs per year, because fewer new tapes have to be bought for the same data space requirement. Figure 3.3 illustrates the effect of this tape space optimisation.

[Figure 3.3: Tape 1 holds segments 1-5 between BOT and EOT, with segments 2 and 4 deleted; after Repack, Tape 2 holds only the valid segments 1, 3 and 5, followed by free space.]

Figure 3.3: Illustration of the optimisation effect of repack on tapes


Chapter 4

The old Repack

The focus of this chapter lies on the functionality of the current Repack system (referred to as Repack I). The architecture is introduced and the functionality briefly outlined.

4.1 Introduction

Repack I was developed in 2001 as a part of CASTOR I to achieve the goals listed in Chapter 3. As such, it uses the old Stager API to copy segments from tape to disk and back to another tape. Hence, it needs a CASTOR I instance. It is able to migrate data between different types of tape, as well as optimise tape space usage. Repack I has been in use since 2001, and since the deployment of CASTOR II a dedicated CASTOR I instance currently runs only to provide repack functionality. After Repack had been introduced and in production for several months, it turned out that it did not work correctly and about 30,000 files were lost due to wrong segment information handling. The files were rescued during a recovery process based on log information, which took more than two months. The VMGR and NameServer are the same as in CASTOR II; only the Stager is from CASTOR I.


4.2 Architecture

The Repack I application is tailored to CASTOR I and, like many other modules, written in C. It uses a Stager command to copy files from tape to the disk cache as well as VMGR function calls to mount or unmount tapes for migration. It is available in the CASTOR I software package as one executable, which deals with the full repack process. The sequence diagram in Figure 4.1 shows the workflow of Repack I. First the files are recalled in sets by triggering the Stager. The number of segments in such a set is determined by the Stager configuration and heavily influences the number of mount/unmount commands Repack sends. The reason for this can be seen in the next steps. Once the first set of segments is recalled, Repack mounts a tape through the VMGR and locks it for writing. As soon as this is done, it triggers an RFIO command to copy the segments from the disk cache onto the mounted tape and subsequently updates the VMGR. The free space of the new tape is decreased and the NameServer is informed about the new location of the processed segments. When this is done, the next set of segments is handled. If the configuration allows only a small set of files, more mount/unmount operations are required.


[Figure 4.1 (sequence diagram): the repack process interacts with the NameServer, the Stager and the VMGR of CASTOR 1 — get segments from tape, loop over sets of segments, stage sets of segments, mount tape, write segments to tape, update new tape information, update file information.]

Figure 4.1: Sequence diagram for Repack I


Chapter 5

Analysis and Design for a new Repack

Now that the Repack I architecture has been introduced, this chapter outlines its limitations and, from these, presents the motivation for a new development of Repack. The main requirements necessary for the further development are elaborated as well.

5.1 Limitation

Looking at the implementation of the current Repack, we discover several problems and limitations:

1. It is not compatible with the existing CASTOR II architecture. As explained in Section 4.2 it uses the CASTOR I Stager API, which was reimplemented and parts of which have changed. The functions which Repack I needs are no longer available. To provide a repacking of tapes in the existing situation, a CASTOR I instance has to be run. This is an unsustainable situation and has to be changed.


2. Secondly, and very significantly, Repack I is not able to concatenate files which are split over more than one tape. In the era of CASTOR I tape space was more expensive than nowadays. To limit tape costs the developers decided to use the full tape space and write the remaining file data to a new tape. Thus, one file can have several segments (see Section 2.3.2). Today tapes provide more space for less money and the policy of writing files to tape has been changed. It is now recommended to write a file completely to one tape. Hence only one tape has to be mounted to read a file. Repack I just copies segments from tape to tape and is therefore not able to fulfill this new requirement. The result is that the repackaging process is not optimal and has to be redeveloped. The time a recall process takes is mainly determined by the usage of tape drives and, much more, by the number of segments. For each segment a tape request has to be submitted to the VDQM and, if a free tape drive is found, the tape is fetched by the handbot of the tape library and loaded into a tape drive. Mounting just one T9940B tape takes about 30 seconds (see Table 8.1). In the case of two segments, the minimal time is already 60 seconds. In both cases, the waiting time for a tape has not been considered. Adding the average head positioning time of 45 seconds for both tapes already results in a total of 150 seconds. Up to this point no data has been read from tape. At an average reading speed of 26MB/sec for this drive type, and for a segment size of 300MB each, another 24 seconds (12 seconds read time each) have to be added. To read a file of 600MB in two segments, i.e. under perfect theoretical conditions with no delay due to read errors and no network transport times, it takes around 180 seconds to copy the file from tape to the disk cache. It takes half this time for the same file consisting of one segment, which is an improvement of 100%. This example shows the great disadvantage of not concatenating the files at the time the tape is being repacked.
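The arithmetic behind these figures can be summarised as follows; this is only a back-of-the-envelope check using the rounded T9940B values quoted above, not a new measurement:

\[
t_{2\,\mathrm{seg}} = 2\,\bigl(t_{\mathrm{mount}} + t_{\mathrm{pos}} + t_{\mathrm{read}}\bigr)
                   = 2\,\Bigl(30\,\mathrm{s} + 45\,\mathrm{s} + \tfrac{300\,\mathrm{MB}}{26\,\mathrm{MB/s}}\Bigr)
                   \approx 174\,\mathrm{s}
\]
\[
t_{1\,\mathrm{seg}} = 30\,\mathrm{s} + 45\,\mathrm{s} + \tfrac{600\,\mathrm{MB}}{26\,\mathrm{MB/s}}
                    \approx 98\,\mathrm{s}
\]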


Table 5.1 gives an impression of how many files are currently affected by this segmentation. The large number of files with two segments underlines the need for defragmentation. The number of files with three or more segments is very low compared to the total amount (50 million), but around 30 CASTOR I instances with the old migration policy are still running and the number of segmented files will increase.

Segments    Number of Files
1              47,908,006
2                  22,200
3                     429
4                      19
> 5                     1

Table 5.1: Statistics about segmented files from the NameServer (August 2006)

The (very expensive) robotic tape libraries contain highly sophisticated parts like the robotic arms, which must sustain permanent movements for loading/unloading tapes into the drives 24 hours a day. For the IBM library, tests were done with 10,000 mounts per week with 1% failure [33]. Our intention is clear: fewer mounts would increase the lifetime of these parts and result in less maintenance and down-time. This will also have a beneficial effect on the running costs.

3. Repack I is not able to recover after it has crashed. This is because the information about the Repack process is kept in memory and lost whenever the program exits. Files processed up to this point were repacked correctly and they do not affect performance if the tape is repacked again. Still, the data which was stored in the disk cache has to be removed by hand, which means a time-intensive procedure is needed to clean up everything before restarting the process.


4. The tape load/unload operations in Repack I are strongly dependent on the configuration of the CASTOR I Stager instance. The lower the configured number of files, the more operations are required. Considering the load on the tape libraries, this solution is hardware intensive and not optimal, especially because the Repack system knows that more data is to be written, but the tape is unloaded because the Stager limits the number of available files.

5. For each tape the user has to start a separate Repack instance on a console1, which handles the process. The program runs as long as the process is not finished and all segments repacked (up to eight hours).

We summarise the limitations of the current Repack implementation:

• Mismatch of the old Repack to the new CASTOR II architecture.

• No Defragmentation of files

• Difficult to handle after crashes

• C Code

• Difficult handling of concurrent repack processes

• Cleanup on diskcache

• Bad migration functionality

1Unix text entry and display device for the system


5.2 Requirements

The requirements are a result of the existing limitations and give us the basic framework for what the new Repack application should provide. But there are also further aspects. Like other CASTOR II components, the DLF (see Section 2.2.3) is recommended to be used for storing and accessing log messages through a comfortable web interface. This gives the Repack user a simple way to get detailed internal information about the Repack system. The program code needs to be changed to an object-oriented language to make the code easier to maintain. From the general CASTOR II architectural point of view, it is recommended for non-time-critical applications (like Repack) to be written in C++ and to use existing C++ modules and frameworks.

Another important aspect is the defragmentation of files. Whenever a segment on a tape is repacked, the corresponding file has to be recalled with all its segments, merged and written to a new tape. As mentioned before (see Section 5.1), this is required in CASTOR II for faster file access and it reduces the load/unload operations for tapes. By this the longevity of tapes is increased.

System and program crashes should not influence the progress of repacking tapes. The process is allowed to be interrupted or paused, but not disturbed in the sense of losing all information about the progress. It is very useful for the user to see the progress of repackaging. This can be used to estimate the remaining time for a tape to repack.

We conclude the following requirements:

• Usage of the CASTOR II Stager API and no direct tape operations

• Able to consolidate a file that spans multiple tapes onto one tape (Defragmentation)

• Scalability for handling jobs

• Change from C code to more easily maintainable C++ code


• Robust against unexpected program crashes

• Usage of the DLF

• Monitoring of the process

• Minimal tape drive usage

Chapter 6

Alternative Solutions

The last chapter showed that a new solution for the Repack utility in CASTOR needs to be found. In this chapter we look at existing products which may be applicable. A '+' indicates a positive aspect, a '-' a negative one.

6.1 IBM Tivoli Tape Optimizer on z/OS

The IBM "Tivoli Tape Optimizer" product copies information from one or more tape volumes to other tapes in a single copy request. It can be used with any source or target media that is capable of storing physical or logical tape volumes. In detail, it allows all types of tape media and tape devices that are compatible with IBM systems [32].

+ support is provided

+ verifies written data

- works only with IBM Tape Libraries

- needs a running IBM Tivoli Storage Manager


- tapes must be Tivoli managed

- runs only on z/OS

- additional effort would be needed to update CASTOR NameServer information

- no possibility to concatenate files (see Section 5.1)

6.2 Novastor TapeCopy

Novastor TapeCopy gives the ability to copy existing tape data from one tape to another, independent of the tape format (LTO, 9940, etc.). It performs an exact bit-for-bit copy of a tape to another tape or to a disk. The output can be any SCSI/EIDE tape device or a multi-tape library [31].

+ works with any SCSI or EIDE tape drive

+ copies simultaneously from one source to one or many destinations.

+ verifies written data

+ functions with one tape drive only

- only for Microsoft Operating Systems

- additional effort would be needed to update CASTOR NameServer information

- no possibility to concatenate files (see Section 5.1)

6.3 Result

From our research on this topic, we discovered that there is very little information about, or solutions to, our specific scenario. However, looking at them is necessary for completeness. All the systems presented create an exact copy of the original tape, to ensure data integrity and make tape format changes available (see Section 3.2). Still, they only cover a few of the requirements presented in Section 5.2. Concatenating files is a CASTOR-specific problem and is not implemented by them. Nevertheless, this is a major issue and has to be fulfilled. Furthermore the usage of the DLF is questionable, as it is not implemented by third-party software.

6.4 Conclusion

The result shows that the existing commercial solutions are not satisfactory for the current and future needs of CASTOR Repack. The problem is CASTOR-specific and a new development project is unavoidable.


Chapter 7

The new Repack

In this chapter we describe the top-down development of the new Repack system (referred to as Repack II). First we define the use cases and from these the requirements for the application are determined (Section 7.2). Based on this information, a detailed view of the software framework is needed to subsequently create the high and low level designs. The workflow defined in Section 7.5 gives a guideline for the necessary development steps.

7.1 Use Cases

The development of a new application requires the examination of the use cases. These use cases are shown in a UML diagram (see Figure 7.1). The descriptions define the ways in which the user can control the new Repack system. The use cases are:

Get Help The user has to get information about the control of Repack II. The list of available command line arguments is shown to him on the console.

Start new repack process The user starts a repack for a tape. This requires that at least one tape is given as a parameter. The target tape pool to which the files should be written can be specified. The given tape must not already be in the Repack system as a currently running process (no double repacking). The tapes given as parameters can be in the same or in different tape pools. The system responds with an answer. The tape status must be marked as FULL in the VMGR.

[Figure 7.1 (use case diagram): the user can get help, start a new repack, remove a running repack, retrieve information about one repack or about all repacks, and archive repacked tapes.]

Figure 7.1: Use case diagram for the Repack client

Remove Repack process Removes a tape from the Repack system and aborts the repack for this tape. It is not checked whether the tape has been fully repacked or not. The tape must not be in an archived state. The user can specify more than one tape for removal. No answer is sent by the system.

Retrieving information about all Repack processes The user should be able to get an overview of the active (running) Repack processes with their statistics. If an archived process is queried, the system responds with an error.


Retrieving information about one Repack process The user should be able to get the statistics and details for one specific Repack process.

Archive finished tapes The user should have the possibility to archive repacked tapes. For each candidate (finished, repacked tapes), we get an answer from the system. An archived tape is no longer active and is therefore available for repack again. This allows the system to keep track of the history of the repacked tapes. Once archived, a tape is not shown anymore in the status list.

In case of an error in the system when the user submits commands, a message with a detailed description is shown. If DLF logging is enabled and configured for Repack II, more messages can be seen there, depending on the logging level. If we take a closer look at the second use case, we see that the tapes to be repacked can be freely chosen in different numbers from different tape pools. Table 7.1 shows the resulting possibilities.

1..n Tapes    From      To
              1 TP      1 TP
              1 TP      n TPs
              n TPs     1 TP
              n TPs     m TPs

Table 7.1: The possibilities to repack a tape (TP=Tape Pool)


7.2 Requirements

The requirements result from the use cases presented in Section 7.1 and from the analysis and design in Section 5.2. Additionally, the operating system used at CERN, SLC, is listed to ensure maximum compatibility with CASTOR II. The requirements are:

• the defragmentation of files

• a scaleable architecture to satisfy the need for multi-user request processing

• to provide a user interface, which offers commands for input and monitoring for the user

• monitoring information about the running process

• Robust against program crashes. After a restart, the process should continue without user intervention (stateless)

• to be easily maintainable and extendable

• the usage of the DLF logging facility

• to run on SLC-3 / SLC-4

• to guarantee correct data transportation

• flexibility in repacking source tapes to targets (different media types).


7.3 The Definition of a Repack Process

For the following we define three main steps for a full Repack process of a tape:

1. Read the valid tape segments

2. Copy the corresponding files to disk

3. Write files to a new tape.

7.4 Architecture

This section deals with the functionality of Repack and its interaction with other components of CASTOR II. We are introduced to the high level design, which highlights the overall functionality of the Repack II system. This prepares the reader for the low level design, which defines the details of the new application.

7.4.1 Software Framework

The new Repack is based on the Model-View-Controller [MVC] concept. This represents the idea of separating an application's data model, user interface and control logic. The Model represents the business data objects and should be cleanly separated from the Controller and the View - it should not need to know anything about them. In this application, the Model is typically implemented as simple classes that provide getters and setters for a number of attributes. They do not contain any logic, but are exclusively used to transport data between the backend system and the frontend. As we see later on, the Model is designed using UML with Umbrello (see Section 2.2.3).


The Controller, which takes care of the business logic, responds to events, typically user actions, and invokes changes on the Model.

The last component is the View. Its responsibility is to be an interface to the user. It provides control mechanisms for the backend and presents information to the user.

This architecture is commonly used in modern software development, because it gives the advantage of modifying one component with minimal impact on the others. Here, it is applied through a client-server model. The View is represented by the client, the Controller by the server. Hence, we achieve a certain portability, because the client can run on a different machine than the server. They communicate via the network using a stream representation of the Model.

Usage of existing functionality

Repack II benefits from the fact that the basic functionality already exists and is available in the CASTOR II framework. The Stager API, for example, provides an interface to recall and migrate files. The new Repack application uses this and therefore does not deal with low level implementation details like tape operations. The main reason for this is to avoid code redundancy. From a software design point of view it is a bad policy to rewrite functionality instead of reusing it: robustness suffers from having to maintain the same code twice and it makes Repack depend on more low level components.

With the migration policies in mind, a powerful instrument for the migration to tape is available. One example is the handling of files depending on their size to optimise the reading of them (see Section 2.3.3). Another advantage is the independence of hardware like disk servers or tape servers. As it was decided that it is not a function of Repack to deal with the low level implementation (e.g. filesystem or tape layer), we can focus the development on usability, stability and new features. From the high level point of view, the server itself implements the Delegate pattern [34] by sending the real task (getting the files from tape) to a CASTOR II instance.

Performance

For performance reasons, the generic framework for multi-threaded daemons is used. The Repack server is able to run in the background as an unattended application. By multi-threading the request handling process it satisfies the demand of being highly available. This means that if two users submit a request to the server, two threads will be started to handle the commands. The number of concurrent threads has to be specified at compilation time.

Statelessness

Since the application is to be stateless, the logic part does not keep any data in memory. The repack process information is stored permanently and reliably in a relational database. This prevents data loss during a program crash (e.g. CPU failure, harddisk failure). The DBMS currently used at CERN is the Oracle Enterprise Edition database, version 10. The structure of the tables corresponds to the Model we use to represent our business objects. If we want to achieve statelessness for the application, it is necessary to separate the repack process into logically independent steps, each represented by a state. If the process fails between two states, it falls back to the previous one and stays there until it is successfully set to the next one. As we see later on, we get to know the different states and the modules in the Repack server responsible for them. This stateless architecture is commonly used in high availability systems in industry. Instead of trying to do as many process steps as possible at once, the idea is to do little steps where, in each phase, a few actions are safely executed.
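A minimal sketch of this idea follows; fetchNextInState and storeState are hypothetical stand-ins for the real database helper, not the actual Repack interfaces. A worker loads one subrequest in a given state from the database, performs exactly one step and only then persists the new state, so a crash at any point simply leaves the subrequest in its last committed state.

#include <iostream>

// Hypothetical states mirroring Section 7.6.2 (names shortened for the sketch).
enum Status { READYFORSTAGING, STAGING, MIGRATING, READYFORCLEANUP, FINISHED, ARCHIVED };

struct SubRequest { int id; Status status; };

// Stand-ins for the real database helper: in Repack II these would read and
// write the Oracle tables; here they only illustrate the contract.
bool fetchNextInState(Status wanted, SubRequest& out) {
  static SubRequest demo = { 42, READYFORSTAGING };   // pretend one row exists
  if (demo.status != wanted) return false;
  out = demo;
  return true;
}
void storeState(SubRequest& sreq, Status next) {
  sreq.status = next;                                 // would be an UPDATE + COMMIT
  std::cout << "subrequest " << sreq.id << " -> state " << next << std::endl;
}

int main() {
  SubRequest sreq;
  // One iteration of a stateless worker: load, do one step, persist the new state.
  if (fetchNextInState(READYFORSTAGING, sreq)) {
    // ... submit the stage request here (omitted) ...
    storeState(sreq, STAGING);   // a crash before this line leaves the row untouched
  }
}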


Modifications in CASTOR II modules

Several changes have to be made to the existing CASTOR II modules. We are able to stage files, but not to directly migrate files, since this is decided by the Stager (more precisely: by the Stager policy). As we see later in Section 7.7.1, these modifications affect the Migrator, the NameServer and the Stager catalogue.

7.5 High Level Design

The high level design gives us an overview of the workflow of the application including the interaction with the CASTOR II parts (see Figure 7.3). This is one of the most important steps during development, because during this phase a lot of issues have to be considered in order to have fewer problems in deployment. To get an idea what the basic action of Repack II includes (repack a tape), we work out the necessary steps according to the existing possibilities. Figure 7.2 shows the activity diagram of a repack process. The boxes indicate its two main phases.

1. The Repack system must be informed that a tape has to be repacked. The user (tape operator) uses the Repack client to send a request containing the tape volume id to the Repack server. This implies that the tape operator decides and determines which tape is to be repacked.

2. The server checks whether the tape specified in the request is valid for a repack (it must be marked as FULL in the VMGR).

3. As we already know, the tape consists of segments of data which can be in a valid or invalid state (see Section 2.3.2). The server retrieves the valid segments from the NameServer and thereby also their corresponding file pathnames in CASTOR II.

[Figure 7.2 (activity diagram): get request, check tape, get segments, get file pathnames, stage files (Recall phase); store file changes, migrate, update NameServer (Migration phase).]

Figure 7.2: The activity diagram

4. Since the Stager provides an API for copying files from tape to disk, we use this mechanism for sending a request to the Stager, containing the file names for this tape from the previous step. In detail, we send a request for each file, but the Stager is smart enough to process multiple concurrent requests for one tape together. In principle, the problem is reduced from repacking one tape to the much simpler task of repacking files. We benefit from the fact that the files are automatically defragmented by recalling them like a user, because the Stager takes care of the read-out from tape and the concatenation of all segments in case of a fragmented file. This implies that for each segment one tape has to be mounted and read. A detailed analysis of how many files are affected by this is presented in Section 5.1.

5. As soon as the files are copied to disk, they can be written back to a tape. It is not necessary to know on which tape(s) we store the files, as long as they belong to the same tape pool. The solution for this part is to use the same mechanism as in the normal case. Using the Migrator (see Section 2.3.5) avoids a lot of redundancy compared to rewriting this part (as was done in Repack I). The tape pool information which is used by the Migrator to write the files to tape is kept in the ServiceClass table of the Stager catalogue. In fact, this ServiceClass can be set by the Repack system, which provides a mechanism to control the migration of files. To realize this step, the internal mechanism of the Stager has to be examined and the affected parts changed. A first idea: as soon as the files are written to disk, mark them to be written to tape again.

6. After the file has been written to tape, the new location of the file has to be updated in the NameServer. Since this procedure is not implemented, its functionality has to be designed and a solution developed.

7. The last step is not strictly necessary, but as good practice we send a command to the Stager to remove the repacked files from the disk cache to free space. Normally the GarbageCollector runs and removes files which are not accessed anymore after a certain period of time.


[Figure 7.3: the RepackClient sends a job (1) to the RepackServer, which checks the tape in the VMGR (2), gets the segments from the NameServer (3), stages the files (4) and recalls/migrates them through the Stager, updates the file location in the NameServer (5) and finally removes the files from the Stager (6); requests are stored and updated in the RepackDB.]

Figure 7.3: Repack II interaction with a subset of CASTOR II

7.6 Low Level Design

After examining the high level design, this section introduces the detailed design of the application. First, the data model is presented followed by the functionality of the Repack components. The problems we discovered in the previous section are considered and solutions are proposed.

7.6.1 The Data Model

The data model is necessary to specify the representation of the real world objects like tapes and files in the Repack system. These objects are designed using the Umbrello UML Modeller and, in combination with the service and converter framework, they can be sent through the network between Repack client and server as well as stored in the database. But what kind of objects do we need? First of all, a general request object which keeps some administrative information.


These are basics like the machine name, the submitting user, the creation time of the request, etc. We define the RepackRequest with the following attributes:

- Machine, the machine name where the request is sent from

- Username, the username of the submitting user

- Creation time, the time when the request is created

- Pool, the pool name of a pool to be repacked

- Pid, the process id of the client

- ServiceClass, the Stager ServiceClass to be used when staging in the files.

Secondly, an object for a tape to be repacked is to be created. It contains data about aggregate size of all files, the name of the tape, the number of files, etc. From the hierarchical point of view, it is lower than the RepackRequest, so we define the RepackSubRequest object.

A RepackSubRequest contains the following information:

- Volume ID, the tapename (see Section 3.1)

- Xsize, the total size of the files on tape

- Status, a number representing the status of this tape in the system

- FilesMigrating, the number of files which are migrating

- FilesStaging, the number of files which are staging

- Files, the total amount of files on the tape

- FilesFailed, the number of files where problems have occurred during tape-to- disk or disk-to-tape copy process.

- Cuuid, a system unified string, which identifies the request


A tape itself consists of many segments which have to be repacked. Therefore we define the RepackSegment object:

- Fileid, The file id of the corresponding file

- Segsize, the size of the segment

- Compression, the compression rate of the segment

- Filesec, the file section (segment) number (if fragmented file, this is >1)

- Copyno, the tape copy number of the related file.

All objects inherit from the IObject class to be usable by the database and stream converters (see Section 2.2.3). The relationships between the objects, shown in Figure 7.4, are important. A RepackSegment belongs to one RepackSubRequest and a RepackSubRequest can have 1 to n RepackSegments. The same holds for RepackRequest and RepackSubRequest. Taking into account that a file can be fragmented and can consist of more than one RepackSegment, we could design a file object between RepackSubRequest and RepackSegment, too. But from the tape point of view we do not have files, only segments. These three objects are modeled in Umbrello and hence code can be generated by the code generator. The table names correspond to the names of the classes without their namespaces. The column names and column types of a table can be derived from the appropriate variable names and the corresponding associations. An association column contains the unique row identification number [id], which is stored in a 64 bit unsigned internal variable. The column for the unique id is automatically added to the table and does not have to be explicitly inserted into the schema. All time values are stored in seconds, counted from the first of January 1970.
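To illustrate what such a data object looks like, the following is a simplified, hand-written sketch of the RepackSegment class; the real classes are generated by Umbrello from the UML model, use the CASTOR u_signed64 type and additionally implement the IObject interface, so the exact accessor names and types here are illustrative only.

// Simplified sketch of a Repack data object: plain attributes with
// getters and setters, no business logic (see the MVC Model in Section 7.4.1).
class RepackSegment {
public:
  RepackSegment()
    : m_fileid(0), m_segsize(0), m_compression(0),
      m_filesec(0), m_copyno(0), m_id(0) {}

  unsigned long long fileid() const      { return m_fileid; }
  void setFileid(unsigned long long v)   { m_fileid = v; }

  unsigned long long segsize() const     { return m_segsize; }
  void setSegsize(unsigned long long v)  { m_segsize = v; }

  int compression() const                { return m_compression; }
  void setCompression(int v)             { m_compression = v; }

  int filesec() const                    { return m_filesec; }
  void setFilesec(int v)                 { m_filesec = v; }

  int copyno() const                     { return m_copyno; }
  void setCopyno(int v)                  { m_copyno = v; }

  unsigned long long id() const          { return m_id; }
  void setId(unsigned long long v)       { m_id = v; }

private:
  unsigned long long m_fileid;       // file id of the corresponding file
  unsigned long long m_segsize;      // size of the segment
  int m_compression;                 // compression rate of the segment
  int m_filesec;                     // file section number (>1 if fragmented)
  int m_copyno;                      // tape copy number of the related file
  unsigned long long m_id;           // unique row id added by the framework
};

int main() {
  RepackSegment seg;
  seg.setFileid(123456);
  seg.setFilesec(2);                 // second segment of a fragmented file
  return seg.filesec() == 2 ? 0 : 1;
}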


[Figure 7.4 (UML class diagram): RepackRequest (machine, userName, creationTime, pool, pid, serviceclass), RepackSubRequest (vid, xsize, status, filesMigrating, filesStaging, files, filesFailed, cuuid) and RepackSegment (fileid, segsize, compression, filesec, copyno) all implement the IObject interface (setId, id, type, clone); a RepackAck (errorCode, errorMessage) carries the RepackRequest, a RepackRequest has 1..n RepackSubRequests and a RepackSubRequest has 1..n RepackSegments.]

Figure 7.4: The Repack data model


7.6.2 The State Diagram

Since the RepackSubRequest is the representation of a tape in the Repack system, it is useful to set it to different states, each standing for a process step. With this in mind, Figure 7.5 shows the resulting states:

[Figure 7.5 (state diagram of a RepackSubRequest): a new request arrives — SUBREQUEST_READYFORSTAGING; the tape is staged (recall) — SUBREQUEST_STAGING — SUBREQUEST_MIGRATING; when there are no files staging or migrating anymore — SUBREQUEST_READYFORCLEANUP; after clean up — SUBREQUEST_FINISHED; the user archives the RepackSubRequest — SUBREQUEST_ARCHIVED.]

Figure 7.5: The states of a RepackSubRequest
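For reference, the states of Figure 7.5 could be represented in code roughly as follows; this is an illustrative sketch and the numeric codes are placeholders, not the values used in the Repack database.

// Illustrative status enumeration for a RepackSubRequest; the numeric
// values are placeholders, not the codes used by the real Repack schema.
enum RepackSubRequestStatus {
  SUBREQUEST_READYFORSTAGING = 0,  // request stored, waiting to be staged
  SUBREQUEST_STAGING         = 1,  // StagePrepareToGetRequest sent, files recalling
  SUBREQUEST_MIGRATING       = 2,  // at least one file is being written to a new tape
  SUBREQUEST_READYFORCLEANUP = 3,  // no files staging or migrating anymore
  SUBREQUEST_FINISHED        = 4,  // disk copies cleaned up
  SUBREQUEST_ARCHIVED        = 5   // archived by the user, no longer listed
};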

7.6.3 Repack Client

The Repack client is designed to represent the Repack II system to the user. It validates the commands, builds a request and, as shown in Figure 7.3, sends it to the Repack server. Being the View (see Section 7.4.1), it implements the use cases from Section 7.1. As a result, the parameters which can be passed to the client realize these use cases. Its responsibility is limited, but still the parameters have to be validated. In fact, it represents the Facade pattern for the Repack II system [34]. Keeping it as simple as possible is one goal, and therefore it is easily maintainable and extendable. The command line signature of the client is shown in Listing 7.1. An explanation of the parameters is listed below.

repack -V VID1[:VID2..] [-o serviceclass]
       -R VID1[:VID2..]
       -P PoolID [-o serviceclass]
       -S VID
       -s
       -a
       -h

Listing 7.1: The signature of the Repack client

- VID the tape name to repack

- o the output ServiceClass (see Section 2.3.3)

- R remove a tape from the repack system

- P tape pool name

- S Status of one tape

- s Status of all tapes

- a Archive finished tapes

- h Help
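For illustration, a few example invocations of this signature; the volume IDs and the ServiceClass name used here are hypothetical:

repack -V I10234                       (repack a single tape into the default ServiceClass)
repack -V I10234:I10235 -o repackSvc   (repack two tapes, writing through ServiceClass 'repackSvc')
repack -S I10234                       (show the status of one running repack process)
repack -s                              (show the status of all non-archived repack processes)
repack -a                              (archive all finished tapes)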

Creating the Request

A RepackRequest object is created and, depending on the user parameters passed, the command field in the RepackRequest is set. In the case of repacking a tape, the passed volume ID is set in the RepackSubRequest VID field. If no target ServiceClass (-o option) is specified, the default one is taken from the CASTOR config file on the server side. It is allowed to create a Repack process for more than one tape in a single request. This results in multiple ways to do a repack for the target tape pool.

1 tape to target pool

>1 tapes from same tape pool to target pool

>1 tapes from different tape pools to target pool

Sending Request

After examining the parameters and validating the input, the request has to be sent to the Repack server. For this we use the ClientSocket class which is already provided in the project. It simply sends an IObject to a recipient and handles the socket operations as well as errors (e.g. host not available).

Handling the Response

After sending the RepackRequest, the received response is checked for errors from the server side. In case the errorCode attribute of the RepackAck object is set, the related error message (MessageText) is shown to the user. In the case of requesting a detailed status of a tape in the Repack system, the response also contains information about the related RepackRequest. If no error occurs, a response (RepackAck object) with the RepackRequest from the client is sent back by the server. Table 7.2 shows how the sent RepackRequest object is used for the response.

Whenever an error occurs on the server side, the errorCode number is set in the received RepackAck object and the corresponding message text is shown to the user. In this case no RepackRequest (not even the original) is added to the response.
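A sketch of this error handling on the client side could look as follows; the attribute names follow Figure 7.4, while the surrounding helper code and the example error values are illustrative only.

#include <iostream>
#include <string>

// Minimal stand-in for the acknowledgement object of Figure 7.4.
struct RepackAck {
  int errorCode;            // 0 means success
  std::string errorMessage; // filled by the server on failure
};

// Print the server's answer: an error message if errorCode is set,
// otherwise a simple confirmation. In the real client the RepackRequest
// contained in the ack would be inspected and printed as well.
void handleResponse(const RepackAck& ack) {
  if (ack.errorCode != 0) {
    std::cerr << "Repack error " << ack.errorCode
              << ": " << ack.errorMessage << std::endl;
    return;
  }
  std::cout << "Request accepted by the Repack server." << std::endl;
}

int main() {
  RepackAck ok  = { 0, "" };
  RepackAck bad = { 22, "tape is not marked as FULL in the VMGR" }; // hypothetical values
  handleResponse(ok);
  handleResponse(bad);
}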


Command   RepackRequest            Answer from server
V         one RepackSubRequest     same, but RepackSubRequest updated to READYFORSTAGING
R         one RepackSubRequest     same, without RepackSubRequest
s         one RepackSubRequest     both replaced by the information found in the Repack DB
P         no RepackSubRequest      same, for each tape found in the given pool one RepackSubRequest is added
S         no RepackSubRequest      same, for each tape found in the Repack system one RepackSubRequest is added

Table 7.2: Usage of the created RepackRequest object for queries and answers from the server

7.6.4 Repack Server

This section deals with the server part of the new Repack system. We are introduced to its modules and a description of their functionality. Further on we go into detail, and the workflow of the interaction between these modules as well as with the Stager is illustrated (see Figure 7.6).

The Repack server is the main application of the new Repack system. It contains the logic and interacts with the Repack client(s). The detailed internal design is based on the repack process steps (see Section 7.5) and a clean separation of functional parts into modules is necessary to support the idea of robustness against modifications. The RepackServer class inherits from the castor::server::BaseServer class and starts different modules in thread pools each responsible for a step in the Repack process. It provides different ways to run (eg. run in foreground or as a daemon). For the correct execution some necessary information has to be entered in the CASTOR config file:

- STAGE_HOST, the Stager hostname to contact for all activities (recalling, staging)

- CNS_HOST, the NameServer hostname for file information

- REPACK_SVCCLASS, the ServiceClass which is used for default staging and migrating of files (see Section 7.5)

- REPACK_PROTOCOL, the transfer protocol which is used to copy files from and to tape

When the RepackServer class is instantiated, it first (in the constructor) tries to retrieve the information given in the CASTOR config file. By default the Repack system listens on port 62800. This can be changed by setting the environment variable REPACK_PORT to a valid value. In case some necessary information is missing, the user gets a message and the program is aborted. After this step the thread pools are created and started. As we see further on, this information is requested by some modules for their use (Observer pattern) [34].
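The start-up logic described here can be sketched as follows; getConfigEntry is a hypothetical helper standing in for the CASTOR configuration-file lookup (here simply backed by environment variables), and the thread pool start is only indicated in a comment.

#include <cstdlib>
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the CASTOR config file lookup.
std::string getConfigEntry(const std::string& category, const std::string& name) {
  const char* v = std::getenv((category + "_" + name).c_str()); // sketch only
  return v ? std::string(v) : std::string();
}

class RepackServerSketch {
public:
  RepackServerSketch() : m_port(62800) {               // default listen port
    m_stageHost = getConfigEntry("STAGE", "HOST");
    m_cnsHost   = getConfigEntry("CNS", "HOST");
    m_svcClass  = getConfigEntry("REPACK", "SVCCLASS");
    m_protocol  = getConfigEntry("REPACK", "PROTOCOL");
    if (m_stageHost.empty() || m_cnsHost.empty())
      throw std::runtime_error("missing mandatory configuration entry");
    if (const char* p = std::getenv("REPACK_PORT"))     // optional override
      m_port = std::atoi(p);
  }
  int port() const { return m_port; }
private:
  std::string m_stageHost, m_cnsHost, m_svcClass, m_protocol;
  int m_port;
};

int main() {
  try {
    RepackServerSketch server;
    std::cout << "listening on port " << server.port() << std::endl;
    // here the real server would create and start its thread pools
  } catch (std::exception& e) {
    std::cerr << e.what() << std::endl;                 // abort on missing configuration
    return 1;
  }
}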

The DatabaseHelper

The DatabaseHelper’s task is to decouple the database logic from the Repack logic and thereby implement the Bridge design pattern [34]. It inherits from DbBaseObj and provides methods to access Repack objects in the underlying database. In detail, the DbBaseObj class offers a service to do this. This service uses the autogenerated Repack converter classes, which are not compiled in the shared library of CASTOR and are therefore instantiated and registered separately whenever the Repack server is loaded.

Repack Worker

At the very beginning of the process chain, whenever a request is received, an instance is required which handles the incoming request as well as sends back responses to the client. This is the RepackWorker. It inherits from the castor::server::IThread class and is started as a thread in a ListenerThreadPool class (see Section 2.2.3). If the validation of an incoming object as a RepackRequest is successful, its command field is checked and the corresponding action is executed. The internal commands between the Repack II client and server and a brief workflow for each are listed below:

REPACK The tape status is checked by querying the VMGR. If it is marked as FULL, the status of the RepackSubRequest is set to SUBREQUEST_READYFORSTAGING and the RepackRequest is stored in the database. In case a pool name is given, the VMGR is queried for the tapes in this tape pool and they are added as new RepackSubRequests to the RepackRequest. Of course they are also checked for their status. If an error occurs, the handling is aborted and the client receives an answer with an error message.

GET_STATUS The RepackSubRequest with the given volume ID is looked up in the database and, if the lookup is successful, the RepackSubRequest sent by the client is replaced by the one which was found.

GET_STATUS_ALL All RepackSubRequests which are not archived are retrieved from the database and added to the RepackRequest which was sent by the client.

ARCHIVE All RepackSubRequests in SUBREQUEST_FINISHED are set to SUBREQUEST_ARCHIVED.

REMOVE The RepackSubRequests in SUBREQUEST_STAGING and SUBREQUEST_READYFORCLEANUP are removed from the Repack system and the repack of the corresponding tapes is aborted.

Repack File Stager

For the recall part, we define the second module: the RepackFileStager class. Its task is to retrieve a RepackSubRequest in status SUBREQUEST_READYFORSTAGING from the DB, get the filenames from the NameServer and send a request to the Stager. We only want to store the files in the disk cache, but not to receive them on our system. CASTOR II offers a request for exactly this purpose. The StagePrepareToGetRequest causes the Stager to copy the requested files from tape to a diskserver only, without a subsequent transfer to the client. The following description refers to Listing 7.2. The request is created (line 5) and the Stager SubRequests, each containing a filename, are added to it (lines 12-23). A Stager SubRequest represents the request for exactly one file. Later on, in Section 7.5, this object is important for the internal CASTOR II mechanism. Finally the StagePrepareToGetRequest is sent to the Stager host and the returned CUUID replaces the one of the RepackSubRequest. The response time of the Stager depends on the number of files which are sent. In the case of, for example, 2000 files it takes about 30 seconds, depending on the load of the Stager. The RepackFileStager does not store the RepackRequest with its RepackSegments in the database until it receives this response, because the returned CUUID replaces the one of the RepackSubRequest; it is needed to keep track of the status of the submitted files. This is described in Section 7.6.4.

1  // get a tape
2  sreq =
3    m_dbhelper->checkSubRequestStatus(SUBREQUEST_READYFORSTAGING);
4  ...
5  castor::Stager::StagePrepareToGetRequest req;
6  std::vector<std::string>* filelist =
7    m_filehelper->getFilePathnames(sreq, cuuid);
8
9  std::vector<std::string>::iterator filename = filelist->begin();
10
11
12 while (filename != filelist->end()) {
13   castor::Stager::SubRequest *subreq =
14     new castor::Stager::SubRequest();
15   subreq->setFileName((*filename));
16   // this marks the Request as a 'special' one.
17   subreq->setRepackVid(sreq->vid());
18   // specify a protocol, otherwise the call is not scheduled
19   subreq->setProtocol(ptr_server->getProtocol());
20   subreq->setRequest(&req);
21   req.addSubRequests(subreq);
22   filename++;
23 }
24
25 // we create the Stager options
26 sendStagerRepackRequest(&req, &reqId, &opts);
27 ...
28 sreq->setCuuid(reqId);

Listing 7.2: The creation of the request

The original Stager SubRequest object was extended by the repackVid field. This is set here with the volume id of the tape to be repacked, for later use in the Stager logic. As explained in Section 7.7.3, this information is used to cause the Stager to handle this file request differently from a normal user request.

Repack Cleaner

The RepackCleaner finishes a running repack process. It polls the database for RepackSubRequests in SUBREQUEST_READYFORCLEANUP status. If one is retrieved, the RepackCleaner removes the segment entries in the Repack tables and sets the RepackSubRequest to SUBREQUEST_ARCHIVED, so it is not shown anymore whenever a user lists the active repack processes (see Section 7.1). Additionally the related file pathnames are queried in the NameServer, using the RepackSegment information from the RepackSubRequest, and the files are deleted from the diskserver. In detail, a remove request is created through the Stager API function Stager_rm and sent to the Stager. As the files are now marked to be deleted in the Stager catalogue, the garbage collector daemon running on the affected diskserver removes the files and more free space becomes available. This is not mandatory, since the garbage collector removes files whenever there is no space left on the device.

Repack Monitor

The RepackMonitor is responsible for the synchronisation between the Stager and the Repack system. Repack does not know when a file has been recalled or migrated, but this information is needed to provide an overview to the user as well as to control the status of a RepackSubRequest. In fact, the RepackMonitor is the instance which changes this state in the recall and migration phase. As soon as the StagePrepareToGetRequest has been sent to the Stager, the RepackSubRequest is in SUBREQUEST_STAGING (see Section 7.6.4). The RepackMonitor leaves it in this status as long as there are no files ready for migration. The file status of the RepackSubRequest is checked by querying the Stager through the Stager API method stage_file_query. For the query the CUUID is used, which was stored with the RepackSubRequest after sending it to the Stager (see Section 7.6.4). The Stager sends an answer containing, for each file, a Stager response with the file status. The numbers of files in each status are accumulated by the RepackMonitor and the RepackSubRequest is updated. As soon as at least one file is ready for migration, the status of the corresponding RepackSubRequest is changed to SUBREQUEST_MIGRATING. The statistics stored with the RepackSubRequest (FilesMigrating, FilesStaging, FilesFailed) show the number of files left in each state. There can be an inconsistency between the total number of files and the sum of the files in migration and recall state. The explanation for this is the garbage collector of CASTOR II: it deletes migrated files and therefore they are no longer listed in the response of the Stager.
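The monitoring step can be sketched roughly as follows; queryStagerByCuuid and the FileResponse type are hypothetical stand-ins for the stage_file_query call and its response list, and the status names are simplified.

#include <string>
#include <vector>

// Hypothetical, simplified view of one file response returned for a query
// by CUUID (the real call is the Stager API's stage_file_query).
enum FileState { FILE_STAGING, FILE_CANBEMIGR, FILE_FAILED };
struct FileResponse { std::string filename; FileState state; };

// Placeholder for the actual Stager query; here it only returns test data.
std::vector<FileResponse> queryStagerByCuuid(const std::string& /*cuuid*/) {
  std::vector<FileResponse> r;
  r.push_back(FileResponse{ "/castor/cern.ch/file1", FILE_CANBEMIGR });
  r.push_back(FileResponse{ "/castor/cern.ch/file2", FILE_STAGING });
  return r;
}

struct Counters { int staging = 0, migrating = 0, failed = 0; };

// Accumulate the per-file states into the statistics kept with the
// RepackSubRequest; if at least one file can be migrated, the subrequest
// would be moved from SUBREQUEST_STAGING to SUBREQUEST_MIGRATING.
Counters monitor(const std::string& cuuid) {
  Counters c;
  for (const FileResponse& f : queryStagerByCuuid(cuuid)) {
    switch (f.state) {
      case FILE_STAGING:   ++c.staging;   break;
      case FILE_CANBEMIGR: ++c.migrating; break;
      case FILE_FAILED:    ++c.failed;    break;
    }
  }
  return c;
}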

[Figure 7.6: the RepackClient sends a job to the Repack server. The RepackWorker validates the tape against the VMGR and stores the job; the RepackFileStager gets the segments, stores them and stages the files through the Stager; the RepackMonitor queries the Stager and updates the job; the RepackCleaner cleans the files; the DatabaseHelper stores, updates and retrieves the requests in the Repack DB; the migrator updates the tapecopy in the NameServer.]

Figure 7.6: The detailed overview of Repack and its server components

7.7 The modified CASTOR II Stager Workflow

This section deals with the changes to the CASTOR II components which are affected in order to allow and realize the correct workflow of a Repack process. These components are listed at the beginning, followed by a functional explanation and their modifications. Particular importance is attached to this section, since it is required for a profound understanding of the internal Stager mechanisms used by Repack II.

7.7.1 The affected components

The following parts were changed to achieve the goal of repacking tapes.


- The FILERECALLED PL/SQL procedure

- The ARCHIVE SUBREQUEST PL/SQL procedure

- The Stager SubRequest object (autogenerated by Umbrello)

- The CASTOR II Migrator

- The CASTOR II NameServer

Before going into details, some terminology has to be introduced. The terms are only explained briefly, to give a better understanding of the context. For further details please refer to the CASTOR documentation [35]. For the sake of completeness the simplified Stager catalogue schema can be found in Appendix C.1.

DiskCopy A DiskCopy is the disk orientated representation of a file in the Stager Catalogue and can be in different states, each starting with 'DISKCOPY_'.

TapeCopy A TapeCopy is the tape orientated representation of a file in the Stager Catalogue and can be in different states, each starting with 'TAPECOPY_'.

SubRequest A SubRequest is the representation of a file request in the Stager Catalogue, see Section 7.6.4.

Example: If a file is going to be copied from tape to disk, the file's DiskCopy status is DISKCOPY_WAITTAPERECALL and its TapeCopy status is TAPECOPY_TOBERECALLED.

7.7.2 Retrieving Files from Tape

The StagePrepareToGetRequest, sent by Repack II, triggers the Stager to retrieve the files from tape. The workflow has been introduced in Section 2.3.4. In the case of two existing tapecopies of a file of this request, it is not determined from which tape the file is read. But this is not necessary, since we just want to have the file's data. The Recaller will keep the tape mounted until all files have been copied to disk.

As soon as a file is copied from tape to disk, the Recaller executes the FILERECALLED PL/SQL procedure in the Stager database. By doing this, the corresponding entry in the DiskCopy table is set to DISKCOPY_STAGED. This is the point in the normal process chain where we have to change the behaviour. We do not want the file only to be staged, but directly to be written back to a new tape. Also, the life of the corresponding Stager SubRequest must be extended until the Repack process is over. The Migrator requires the diskcopy status to be DISKCOPY_CANBEMIGR and a related tapecopy in TAPECOPY_CREATED.

These steps are already implemented in the PUTDONEFUNC PL/SQL procedure in the database. It is used for the use case where a user puts a list of files into CASTOR II without closing the transfer process after each file. The created diskcopies in the database are set to DISKCOPY_STAGEOUT as soon as the file is copied from the user's computer to the diskserver. If the user finishes this multi-copy process, the status of the related diskcopies in the database is set to DISKCOPY_CANBEMIGR and tapecopies in status TAPECOPY_CREATED are created. As seen from above, this is exactly the behaviour which is needed to trigger the migration of the files for a repack process. The recalled file has to be set to DISKCOPY_STAGEOUT and PUTDONEFUNC executed. The advantage of this idea is clear. No additional Repack module for the migration part is necessary, which would make the application more complex. Instead, the decision is made by the Stager and the files are ready for migration as soon as they are staged in. There is no delay due to long request handling or synchronising operations between Repack and the Stager, which would have been the case if Repack was responsible. Still, it is important to handle the modifications precisely, because they are also used for the normal production environment. It has to be ensured that no side effects occur for normal user recall and migration.

The involved PL/SQL functions which have to be changed are FILERECALLED and ARCHIVESUBREQ. ARCHIVESUBREQ is invoked by the StagerJob after FILERECALLED. The change in FILERECALLED contains the decision for the Stager whether the recalled file is part of a repack process or a normal user recall. For this decision, the repackVid information of the corresponding Stager SubRequest is used. If this field is set, the recall belongs to a repack process and the diskcopy status entry has to be set to DISKCOPY_STAGEOUT; the condition for the PUTDONEFUNC function. Figure 7.7 shows the time sequence in which the two functions are executed in a successful recall process. At the end, the StagerJob is restarted and puts the related SubRequest for the recalled file to SUBREQUEST_FINISHED by invoking the PL/SQL function ARCHIVESUBREQ. If all SubRequests belonging to one Stager request have been handled, they are all put into SUBREQUEST_ARCHIVED.

[Figure 7.7: timeline of the Stager SubRequest states during a recall, from the arrival of the requests to the execution of ARCHIVESUBREQ by the recaller and the StagerJob. SubRequest 1 passes through WAITTAPERECALL, RESTART, READY, FINISHED and ARCHIVED; SubRequests 2 and 3 pass through WAITSUBREQUEST, RESTART, READY and ARCHIVED.]

Figure 7.7: Capture of the states of Stager SubRequests

A very important impact of this change is the consideration of handling two tapecopies of a file. The scenario is to repack two tapes, both containing a tapecopy of the same file. Repack sends a StagePrepareToGet request to the Stager for each of them. In the actual design, the recall of the second tapecopy is ignored if the first one is already scheduled. This is coherent, since the user who sends the second recall request can use the first copy, which is already staged or in a recall process. The problem for Repack occurs in the second phase, when two tapecopies have to be created again for the migration. Normally PUTDONEFUNC would insert as many entries in the tapecopy table as are specified in the Fileclass (see Section 2.3.2) the file belongs to. It can happen that more than two tapecopies are created, even if only one tapecopy is to be repacked. The consequence is that the file has more than the allowed two tapecopies in the CASTOR system. This is one of the side effects which must be avoided.

The redesign adds a special behaviour for the repack case. The FILERECALLED procedure passes to PUTDONEFUNC the number of active (not archived) SubRequests with a valid repackVid field found in the catalogue, because they represent the information needed to create the tapecopies of a file. Figure 7.8 shows the general internal handling of the tapecopies during a repack process for two tapecopies of a file. The dotted line between the Recaller and the tapes implies that it is not determined which tape is taken for the tape-to-disk copy process.

[Figure 7.8: repacking two tapecopies of a file stored on Tape 1 and Tape 2. Repack sends a request for each tapecopy (1); the recaller checks the diskcopy (2) and copies the file from one of the two tapes to a diskserver (3); this triggers the FileRecalled PL/SQL procedure (4), which creates the tapecopies to be migrated (5); the migrator checks the diskcopy (6) and copies the file to new tapes (7).]

Figure 7.8: Repacking two tapecopies


1  CREATE OR REPLACE PROCEDURE fileRecalled(..)
2  AS repackVid VARCHAR2(2048);
3  ...
4    SELECT DiskCopy.id, SubRequest.repackVid, CastorFile.id, ..
5      INTO dci, repackVid, cfid, ..
6      FROM TapeCopy, SubRequest, DiskCopy, CastorFile
7     WHERE ...
8    -- set the diskcopy
9    UPDATE DiskCopy SET status = decode(repackVid, NULL, 0, 6)
10    WHERE id = dci RETURNING fileSystem INTO fsid;
11   IF repackVid IS NOT NULL THEN
12     SELECT count(*) INTO nbTC FROM subrequest
13      WHERE subrequest.castorfile = cfid
14        AND subrequest.repackvid IS NOT NULL
15        AND subrequest.status in (4,5,6);
16     internalputDoneFunc(cfid, fsId, 0, nbTC);
17   END IF;
18 ...

Listing 7.3: The changed PL/SQL procedure in the database

Listing 7.3 shows the new FILERECALLED PL/SQL procedure. In line 9 the diskcopy is set to either DISKCOPY_STAGED (0) or DISKCOPY_STAGEOUT (6), depending on the value of the repackVid field in the SubRequest. If it is set, other SubRequests for Repack are looked up (lines 12-15) and the corresponding number of tapecopies to create is passed to INTERNALPUTDONEFUNC (line 16). An internal mechanism restarts the SubRequest handling and, after finishing, the new status is ARCHIVED. In fact, the Stager SubRequest should not be archived, because it belongs to a Repack process and has to exist as long as the file has not been written to tape. Therefore its 'life' is extended by an additional state SUBREQUEST_REPACK. A detailed diagram of the new Stager SubRequest states is given in Figure 7.7. The importance of this change is explained in the migration part, when this SubRequest is needed again to update the NameServer.

7.7.3 Copying Files back to Tape

The next step is the preparation for the migration part. The CASTOR II MigHunter initiates this phase by frequently checking the Stager catalogue for files to be written to tape (their DiskCopy entries are in status DISKCOPY_CANBEMIGR and at least one tapecopy is in TAPECOPY_CREATED). It creates rows in the table Streams according to the number of tape drives specified in the ServiceClass and assigns the tapecopies to them by inserting a row containing the tapecopy id and the created stream id. As soon as this is done, the RTCPClientD starts a Migrator instance for each stream. Each submits a tape request to the VDQM1 and, after the successful mount in a tape drive, sends the data to the assigned tape server via the RTCPD (see Section 2.10). During the writing process the Migrator waits for the callback of the RTCP interface. After successful transfer of a file it informs the VMGR to decrease the space left on the device by the file size. Of course the NameServer has to be updated with the new segment information retrieved from the tape server.

1The volume id for migration is received from the VMGR

The last step during the migration is the replacement of the old file segment attributes in the NameServer, invoked by the Migrator. It is the most important one for Repack, because here the information about the old tapecopy of the file is removed and this action is not reversible. This functionality was not available in the NameServer and therefore had to be implemented. As soon as the file has been written to tape, the RTCPD returns the new location (tape volume id, file sequence number, block id) of the new tapecopy. Until now, the Migrator does not have any information on whether the segment attributes have to be added to the NameServer (the usual case) or whether they replace another entry (the Repack case). These are two different scenarios and two different NameServer calls. The Migrator has the ability to access the Stager catalogue and hence is able to check the repackVid field of the corresponding SubRequest entry.

1  UPDATE subrequest SET subrequest.status = 11 -- ARCHIVED
2   WHERE id =
3    (SELECT subrequest.id
4       FROM subrequest, diskcopy, castorfile
5      WHERE diskcopy.id = subrequest.diskcopy
6        AND diskcopy.status = 10    -- CANBEMIGR
7        AND subrequest.status = 12  -- SUBREQUEST_REPACK
8        AND diskcopy.castorfile = castorfile.id
9        AND castorfile.fileid = :1  -- the id of the written file
10       AND subrequest.repackvid IS NOT NULL AND ROWNUM < 2)
11  RETURNING subrequest.id INTO :2

Listing 7.4: SQL statement for the Migrator to check for a repack process

Listing 7.4 shows the SQL statement which is executed by the checkFileForRepack function of the Migrator. If a valid entry for the fileid (passed through PL/SQL parameter 1) exists, the entry of the repackVid field is returned and the Stager SubRequest status is set to ARCHIVED (line 1), so that this SubRequest can never be selected again. If the repackVid is set, the Migrator calls the new Cns_replaceTapeCopy NameServer API function with the file ID, the tape to repack and the new tapecopy location. As the name of the function suggests, it replaces a tapecopy entry in the CNS_SEG_METADATA table of the NameServer database with the new file segment information from the tapeserver. Figure 7.9 shows the activity diagram of the new Cns_replaceTapeCopy function on the NameServer side. It reads the new segment attributes from the stream and sets their tapecopy number to the one found in the NameServer database for the old segments of the old tape. To remove all old entries the new function Cns_get_smd_by_copyno is called to first retrieve and then delete them. The order of removing and adding segments cannot be reversed, since the constraints of the NameServer database do not allow two entries for the same file with the same copy number. The function locks the file entry at the beginning and saves the changes by committing the database connection when finished. This guarantees that the operations are done atomically and that no other NameServer instance can interfere.
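To summarise the order of operations, the following C++ sketch mirrors the flow described above. All helper functions in it (lockFileEntry, getTapeCopyNumber, getSegmentsByCopyNo, deleteSegment, insertSegment, commitTransaction) are hypothetical stand-ins for the NameServer database layer; the sketch only illustrates the sequence, not the real CASTOR code.

// Illustrative sketch only: the helpers below are hypothetical stand-ins
// for the NameServer database layer, not the actual CASTOR implementation.
#include <string>
#include <vector>

struct Segment { int copyNo = 0; std::string vid; int fseq = 0; std::string blockId; };

// --- hypothetical database helpers (stubs) -------------------------------
static void lockFileEntry(long) {}
static void unlockFileEntry(long) {}
static void commitTransaction() {}
static int  getTapeCopyNumber(long, const std::string&) { return 1; }
static std::vector<Segment> getSegmentsByCopyNo(long, int) { return {}; }  // cf. Cns_get_smd_by_copyno
static void deleteSegment(long, const Segment&) {}
static void insertSegment(long, const Segment&) {}
// --------------------------------------------------------------------------

// Order of operations as described for Cns_replaceTapeCopy.
void replaceTapeCopy(long fileId, const std::string& oldVid,
                     std::vector<Segment> newSegments /* reported by the tape server */) {
  lockFileEntry(fileId);                                // serialise concurrent NameServer access

  // Copy number under which the file is stored on the tape being repacked.
  const int copyNo = getTapeCopyNumber(fileId, oldVid);

  // The old segments are retrieved and deleted first: the database
  // constraints forbid two entries for the same file with the same copy
  // number, so this order cannot be reversed.
  for (const Segment& old : getSegmentsByCopyNo(fileId, copyNo))
    deleteSegment(fileId, old);

  for (Segment& s : newSegments) {                      // new location inherits the copy number
    s.copyNo = copyNo;
    insertSegment(fileId, s);
  }

  commitTransaction();                                  // one commit: the replacement is atomic
  unlockFileEntry(fileId);
}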


Figure 7.9: The activity diagram of the new NameServer function

Chapter 8

Optimisation

Tape storage management should not be a bottleneck for a high rate of data input into CASTOR II. If the resources are used well, a speed-up for reading as well as writing files can be achieved. Discovering inefficiencies in the current system and suggesting improvements is the main purpose of this chapter. It is not intended to present implemented solutions, but to give ideas for further development.

Fundamental for the following ideas is Table 8.1. It shows the steps which are necessary to get data from tape and their duration in seconds. The times were measured for a T9940B tape drive. The specifications in red indicate optimisations.

A request starts by sending a mount request for a tape to the VDQM, which schedules it. After the library receives the mount request, it puts the tape into the "Tape load" phase. Before data can be read, the drive has to wind to the position of the file (the head is positioned by the FSEQ number, see Section 3.1.4). The data can then be read and the tape is rewound afterwards.


Tape Status                                    Time
Request for a tape to be mounted in a drive    20-60 sec
Tape load                                      20-30 sec
Tape ready / Drive head positioned             about 45 sec
Read/Write data                                1GB: 20-30 sec, 10GB: 200 sec (estim. for 100MB/sec)
Tape rewinded                                  about 45 sec
Tape unload

Table 8.1: The tape handling times for a read/write job

8.1 Definition of Read Efficiency

Based on the times presented above, we can measure the read efficiency by defining the following equation:

read efficiency = pure data read time / administrative time

where

administrative time = time of complete read operation - pure data read time

If we manage to decrease the administrative time, the efficiency increases. This should be our goal.
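For illustration, taking mid-range values from Table 8.1: reading a single 1GB file costs about 40 s (request) + 25 s (load) + 45 s (positioning) + 45 s (rewind), i.e. roughly 155 s of administrative time for only about 25 s of pure data read time, giving a read efficiency of roughly 25/155, about 0.16. For a 10GB read the same overhead gives about 200/155, roughly 1.3, which already hints at why grouping as much data as possible per mount pays off.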


8.2 Tape Loading

Looking at the time a tape robot needs to mount a tape (20-30 seconds) and keeping the geometrical conditions of a library in mind, we see that this time is mainly determined by the distance from the slot to the tape drive. It is worth taking a look at this issue to discover whether it can be optimised. An initial approach is an algorithm which knows these geometrical conditions and finds the free drive closest to the tape slot. The information about free tape drives is kept in the VDQM, which also reserves the drives for a request from CASTOR. It is, therefore, the only instance where this functionality can be applied; the tape library itself cannot fulfil this task, since the VDQM is the main instance which coordinates access to the tape drives.

This idea of an algorithm is mentioned here for completeness. It has not been developed further due to time constraints.
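As an illustration only, a minimal sketch of such a distance-aware selection is given below. It assumes a hypothetical two-dimensional coordinate model of the library (slot and drive positions) that the VDQM would have to be given, which does not exist today.

// Minimal sketch of a distance-aware drive selection, assuming a hypothetical
// 2D coordinate model of the library; the real VDQM has no such knowledge.
#include <cmath>
#include <vector>

struct Position { double x = 0.0; double y = 0.0; };
struct Drive    { int id = -1; bool free = false; Position pos; };

// Return the id of the closest free drive to the given tape slot, or -1 if
// no drive is free. The VDQM would then reserve the returned drive.
int closestFreeDrive(const Position& slot, const std::vector<Drive>& drives) {
  int    best     = -1;
  double bestDist = 0.0;
  for (const Drive& d : drives) {
    if (!d.free) continue;                                  // only free drives can be reserved
    const double dist = std::hypot(d.pos.x - slot.x, d.pos.y - slot.y);
    if (best == -1 || dist < bestDist) { best = d.id; bestDist = dist; }
  }
  return best;
}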

8.3 Reading Files from Tape

The idea of optimising the reading of files from tape was originally raised by the problems of the CASTOR I implementation. For each file request a tape was mounted, read and unmounted, independently of other requests for the same tape. This has changed in CASTOR II: the Recaller delays the mount of tapes for reading until a sensible number of files is requested, which is already an improvement. In the case of Repack a great number of files is requested, and therefore the time spent reading a tape determines the time a repack process needs. Shortening this reading time is clearly desirable, so that a higher availability of drives for production is provided.

Traditionally, files have always been read back from a tape in file sequence order. This is still the case in CASTOR II, even for modern serpentine recording formats.


This is far from optimal. The serpentine track arrangement removes the simple access ordering of 'lowest file sequence is always fastest to retrieve'. Instead, it is necessary to calculate the distance of the files from the beginning of tape (BOT) and to retrieve them in ascending order of this distance. This avoids needless positioning of the tape, saving a great deal of drive time.

A simple example of the advantage of this optimisation can be seen in Figure 8.1. The file order in file sequence is 1, 10, 15. Comparing the winding distance for this order to the effort for positioning by user blockid (solid green arrows), the advantage is visible. Reading the files in order of their distance to the BOT reduces the winding even more.

Figure 8.1: Illustration of the advantage of reading files in order of their distance to the BOT

Less winding of the tape also increases the lifetime of the tape (see Table 3.2 in Section 3.1.3).

Based on the findings above, an implementation of an algorithm to calculate the distance to the BOT has to be developed for the CASTOR Recaller part.


The Recaller is the only instance which can influence the reading from tape; the tape drive will not change the order. Before sending the positioning request to the RTCPD, the files have to be ordered beginning with the lowest seek time. This seek time, and therefore the position on tape, can be found by using the tape characteristic information and the user blockid from the NameServer. Unfortunately the positioning block id does not correspond to the real block id on the tape, since the vendors of the tape drives do not allow access to this number. Therefore, we have to consider tape-specific data. The tape properties listed below are a prerequisite for this calculation:

- The total capacity in bytes tcap

- The used tracksets utkset

- The default blocksize defblocksize

The following calculations are given in pseudo C code. They are not intended to present correct code, but to impart the idea and the main steps to the reader.

• By dividing the total capacity by the used tracksets we get the capacity per track (in bytes):

    capacity_per_track = tcap / utkset

• Subsequently we get the number of blocks on one track for this tape by dividing this capacity by the default block size of the tape:

    blocks_per_track = capacity_per_track / defblocksize

• The resulting blockid on the track:

    blockid_on_track = total_blockid mod blocks_per_track

• We need the number of the track to calculate the direction in which the data is read from tape. The result of the division can be 0, which is why 1 is added to obtain the correct direction in the next step:

    trackno   = (int)(total_blockid / blocks_per_track) + 1
    direction = !(trackno mod 2)

• Depending on the direction of the head on the track, the position is expressed as a percentage of the full track length. Since the idea is to calculate the time from the BOT, we need a common denominator of the position for time and blocks on track:

    if (direction == bot_to_eot)
        percentage_of_track = 100.0 * blockid_on_track / blocks_per_track
    else
        percentage_of_track = 100.0 * (blocks_per_track - blockid_on_track) / blocks_per_track

• With this, the seek time (in seconds) for the header block of the file can be derived as follows:

    seek_time = (int)(min_seektime + percentage_of_track * (max_seektime - min_seektime) / 100.0)

Figure 8.2 shows an illustration of the parameters of the algorithm and the resulting seek time.

Figure 8.2: Illustration of the parameters of the seek time algorithm
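The following C++ sketch renders these steps in compact form, together with the ordering the Recaller would apply before sending the positioning requests. The type names and tape parameters are illustrative placeholders and not the actual Ctape or Recaller interface; the assumption that odd-numbered tracks run from BOT to EOT is mine.

// Sketch of the seek-time estimate and the resulting recall ordering.
// Types and parameters are illustrative, not the real Ctape interface.
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

struct TapeModel {
  double tcap;          // total capacity in bytes
  int    utkset;        // number of used tracksets
  double defblocksize;  // default block size in bytes
  double minSeek;       // seconds to reach a block at the BOT end of a track
  double maxSeek;       // seconds to reach a block at the EOT end of a track
};

struct FileSegment {
  std::uint64_t blockId = 0;  // user blockid from the NameServer
  double seekTime = 0.0;      // filled in by sortByDistanceFromBot()
};

// Estimated seconds to position the head on the first block of a segment.
double estimateSeekTime(const TapeModel& t, std::uint64_t totalBlockId) {
  const double capacityPerTrack = t.tcap / t.utkset;
  const double blocksPerTrack   = capacityPerTrack / t.defblocksize;

  const double blockIdOnTrack = std::fmod(static_cast<double>(totalBlockId), blocksPerTrack);
  const int    trackNo        = static_cast<int>(totalBlockId / blocksPerTrack) + 1;
  const bool   botToEot       = (trackNo % 2) != 0;  // assumption: odd tracks run BOT -> EOT

  const double percentageOfTrack = botToEot
      ? 100.0 * blockIdOnTrack / blocksPerTrack
      : 100.0 * (blocksPerTrack - blockIdOnTrack) / blocksPerTrack;

  return t.minSeek + percentageOfTrack * (t.maxSeek - t.minSeek) / 100.0;
}

// Order the recall list by ascending distance (seek time) from the BOT.
void sortByDistanceFromBot(const TapeModel& t, std::vector<FileSegment>& segments) {
  for (FileSegment& s : segments) s.seekTime = estimateSeekTime(t, s.blockId);
  std::sort(segments.begin(), segments.end(),
            [](const FileSegment& a, const FileSegment& b) { return a.seekTime < b.seekTime; });
}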


Some tests were made with a STK 9940B for a set of 20 files selected at random from an LTO-3 tape [36]:

- 825 seconds to access header data of these files in random order

- 800 seconds to access header data of these files in file sequence order

- 300 seconds to access header data of these files in distance from BOT order

There appears to be almost no 'penalty' for changing head positions vertically (i.e. moving between tracksets) or for reversing the direction of reading. This test did not consider the sizes of the files (a large file will influence the choice of the next file to read) or their compressibility. Nevertheless, the results show that there is considerable room for optimisation of multi-file reads.

This implementation is done in the Ctape_sortSegments function and is executed by the Recaller. The movement of the head is controlled by the tape drive and is therefore not accessible to higher level implementations. This is not a problem, since we know from Section 3.1.4 that the tape drive always calculates the shortest path from its actual position to the requested one.

As seen above, this solution greatly improves the performance of reading files, but there is room for further optimisation. If the file size is taken into account, the result is expected to be even better. The idea is to search for the next closest start position in the list of files after the end of the last file read. This is not handled here, but should be considered for future developments.

8.4 Writing to Tape

The way data is written to tape determines how it can be read back and is therefore worth looking at more closely. Small files are a problem for CASTOR I and for CASTOR II. They are expensive to handle and to repack. From Table 8.1 we see the loading, positioning and rewinding times. The read efficiency decreases for small files, because the pure reading time is very small compared to the aggregate time costs for retrieving them. In consequence, frequent requests for small files degrade the response for retrieving large files. However, any serpentine tape can in principle offer reasonably fast access times for such files, with almost STK 9840-like performance [36]. These small files should all be stored near the BOT region, so they are quickly accessible.

As seen before (see Section 8.3), we are able to calculate the capacity of one track depending on the tape type. Using this and the file size information, the idea is to order the write sequence to tape in such a way that small files are always written close to the BOT. This depends on where on the tape the drive starts to write. If a partially used tape is taken for migration, the sorting algorithm has to take the information of the last file already on the tape into account when ordering the new list. We cannot ask the drive, because it only returns location information once the data has been written.

If we manage to write small files in the first 40GB the administrative time for reading decreases and the efficiency increases. Figure 8.3 illustrates this idea for a 400GB tape like the LTO-3.

Figure 8.3: Illustration of the advantage of sorting the order to write files.
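A very simple interpretation of this idea is sketched below: the migration candidates are sorted by ascending size, so that the smallest files end up closest to the BOT. The type name and the threshold are illustrative only, and the case of a partially written tape discussed above is not handled.

// Sketch: sort the write list so that small files land near the BOT.
// The 40GB figure follows the text; names are illustrative placeholders.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct MigrationCandidate { std::uint64_t fileId = 0; std::uint64_t sizeBytes = 0; };

// Rough size of the fast-access region at the beginning of the tape.
constexpr std::uint64_t kBotRegionBytes = 40ULL * 1000 * 1000 * 1000;

// Smallest files first: they are written immediately after the BOT and can
// later be recalled with the shortest positioning time.
void orderForMigration(std::vector<MigrationCandidate>& files) {
  std::sort(files.begin(), files.end(),
            [](const MigrationCandidate& a, const MigrationCandidate& b) {
              return a.sizeBytes < b.sizeBytes;
            });
}

// Rough check of how many of the sorted files actually fit into the BOT region.
std::size_t filesInBotRegion(const std::vector<MigrationCandidate>& sorted) {
  std::uint64_t written = 0;
  std::size_t   count   = 0;
  for (const MigrationCandidate& f : sorted) {
    if (written + f.sizeBytes > kBotRegionBytes) break;
    written += f.sizeBytes;
    ++count;
  }
  return count;
}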

Chapter 9

Results

The results of the development of the new Repack application are presented in this chapter and a summary of its features is given.

A new application, Repack II, was developed and implemented. Changes to the Stager workflow and some CASTOR components have been made. With this new system CASTOR is now able to repack tapes.

Repack II has not yet been proven under production conditions, but several tests with the scenarios given by the Use Cases in Section 7.1, using ten tapes of different types assigned to four tape pools, successfully demonstrated its functionality. The files on the specified tapes were correctly repacked to the target tape(s) without data corruption. Problems like the double tapecopy handling in a repack process were discovered and solved in Section 7.7.2: two tapes, each containing a copy of a file, were chosen and successfully repacked to two other tapes. It has not been tested with more than two tapecopies, since this was never required for CASTOR.

The user in the following summary refers to the tape operator.


Repack II is adapted to CASTOR II

The new Repack system uses the CASTOR II Stager API and is therefore fully compliant with it. The CASTOR II MigHunter and Migrator deal with the disk-to-tape copy process and can be tuned individually via the ServiceClasses used for repacking files.

Defragmentation of Files

Files are defragmented in a repack process. The responsible CASTOR II Stager concatenates the file's segments when the file is requested to be copied to disk (see the second step in Section 7.5). This major requirement has been fulfilled.

Multithreaded in C++

The multi-threaded request handling process of the Repack II server (RepackWorker, see Section 7.6.4) adapts to the current CASTOR II architecture and provides high availability for its clients.

Robust against program crashes

Repack II is able to continue a repack process after the application has crashed. The state of the process is kept in the database and not in memory; the Repack II server modules are therefore stateless (see Section 7.4.1). If the Repack server crashes during the submission of the initial Stager request, it becomes unsynchronised with the Stager and will send the request again when it is restarted. With two SubRequests in the Stager catalogue, the FILERECALLED PL/SQL procedure would then create an additional, not allowed, tapecopy of all files. This part is therefore still vulnerable and has to be improved. Other problems influencing the stability of the Repack workflow have not occurred.


Usage of DLF

Repack II uses the DLF for logging system messages. They are categorised in different levels of severity and the user can access these messages, machine independently, through a web interface. They can be filtered by several properties (e.g. severity, time).

Monitored Repack Process

The repack process of a tape is monitored by the Repack II server (see Section 7.6.4). The user can see the progress of one or all tapes being repacked in the system. A history is provided and is accessible through the database.

Portability

The implemented client-server architecture results in machine independence. Both instances can run on the same or on different (distributed) systems.

Multiple ways to repack tapes

By defining the target tape(s) through the service class, the user has multiple ways to repack a tape. The tape operator must know the ServiceClasses in the Stager catalogue, and the tape pools linked to them, to repack the tape correctly. In this way 1..many source tape(s) can be repacked to 1..many target tapes; the tape types may differ.

Usage of Tape Drives is controllable

Since the Stager ServiceClass determines the number of drives and Repack II gives the opportunity to choose the ServiceClass, the user gets control over their usage. The speed of the repack process of course depends on the available hardware (and tape drives). Fewer drives result in lower performance.


In the case of more than one allocated tape drive, and the target tapepool having more than one tape, simultaneous copy is achievable.

Performance depends on available Hardware

The performance of the repack process is correlated with the available hardware and its specifications. More disk servers and tape drives allow a higher throughput of data, and the higher the throughput, the higher the speed of repacking tapes. Tests were done with only one diskserver and up to two tape drives. It turned out that the diskserver is the bottleneck of the system, since the drives support more throughput than the diskserver can provide. It therefore makes no sense to have one disk server (max. 100MB/sec I/O) and two tape drives (each 80MB/sec I/O), because the disk server cannot provide the full throughput for both drives, which means that the drive efficiency decreases.

Independent from Media Type

Repack II is independent of the media type, because it delegates the task of copying files from/to tape to/from disk to the Stager logic. It is a high level implementation and does not deal with file transfer or tape operations. In fact, it does not know anything about the hardware used, only the APIs it deals with (NameServer, VMGR, Stager).

Chapter 10

Conclusion

Looking back at the development phase and the results of the devised application described in the previous chapters, this chapter will present the experiences that were made. A personal evaluation of the actual state of the development will be given, as well as a brief description of its benefits and downsides.

The results from the previous chapters show that the newly developed Repack system covers all requirements listed in Section 7.2. By defragmenting files (if necessary) during a repack process, future hardware intensive tape load operations by the robotic tape libraries are avoided and lost tape space is recovered. The client-server architecture of Repack II allows flexibility and portability in terms of running on different machines in the same network. The tape operator can submit and control Repack processes independently of the Repack server instance. Since Repack II allows the unattended, automated repacking of a tape, operators do not have to observe the process. Only in case of an internal error, reported by the DLF, does the operator have to intervene; this enables the operator to concentrate on other tasks. The usual case is simply to query the status of the running Repack processes. By using the Stager API, Repack II is robust against internal Stager changes, because the Stager API is rarely changed. This massive reuse of existing functionality also makes the design of the application cleaner and it is, therefore, easier to maintain and extend.

This is important since CASTOR II is used by, and can be changed by, external institutes worldwide (see Chapter 2). If a developer has experience with its architecture, he is able to understand (and extend) Repack II quickly. Section 7.4.1 shows that storing information about a repack process in a highly available database rather than in the memory of the new Repack application ensures easy recovery after system crashes (harddisk failure, power cut, etc.), because the data is not lost. The Repack logic does not participate in the internal copy process, which is outsourced to the Stager, which in turn uses highly available hardware. The use of the Distributed Logging Facility was not only a requirement for the project, but was also very useful for debugging internal problems, especially those that occur in multi-threaded applications like the Repack server.

The Optimisation chapter shows that there is room for improvement in reading files from tape. The presented ideas should be developed further and applied to CASTOR. The optimisation of file reading, for example, results in less winding of the tape and therefore a longer lifetime of the tapes. This has not been empirically proven, but the theory shows that it has promise (Section 8.3). The way files are written to tape influences the time needed to read them back. The idea of storing small files close to the beginning of the tape, so that the efficiency increases, is one example (see Section 8.4).

Repack II is expected to be in production soon. External institutes from Tier1 and Tier2 sites have also requested it. They cannot use the old Repack, since most of them run CASTOR II instances, and they rely on an early release.

Personal Evaluation

Working in this field of scientific activity was an outstanding experience for me. The storage scenario at CERN is unique. Developing a new solution for such a sensitive component as Repack was intellectually stimulating for me. It gave me the opportunity to deepen my knowledge about tape storage management and the related problems. The responsibility for creating a new application, which is used to support the work of hundreds of scientists worldwide, was a greatly rewarding experience.


Appendix A

Repack Class Diagram


Figure A.1: The complete Repack class overview

Appendix B

Repack Sequence Diagram


Figure B.1: The complete Repack sequence diagram

Appendix C

Stager Catalogue Schema


Figure C.1: The simplified Stager Catalogue schema

Appendix D

CD Contents

• PDF of this thesis

• Source code of the CASTOR project

• PDF files of the electronically available references

• All images used in this thesis


Appendix E

Glossary

ALICE - A Large Ion Collider Experiment at CERN's Large Hadron Collider
API - Application Programming Interface
ATLAS - A Toroidal Large Hadron Collider ApparatuS

BOT - Beginning Of Tape

CASTOR - CERN Advanced STORage manager
CCM - Configuration Cache Manager
CDB - Configuration Data Base
CERN - European Organisation for Nuclear Research
CDR - Central Data Recording service
CMS - The Compact Muon Solenoid
CVS - Concurrent Versioning System

DLF - Distributed Logging Facility
DBMS - Database Management System

EOT - End Of Tape

FNAL - Fermi National Accelerator Laboratory

GridFTP - Grid File Transfer Protocol

HSM - Hierarchical Storage Management
IBM - International Business Machines

LCG - LHC Computing Grid project (Distributed Production Environment for Physics Data Processing)
LEMON - LHC Era Monitoring
LEP - Large Electron Positron collider
LHC - The Large Hadron Collider
LHCb - The Large Hadron Collider study of CP violation in B-meson decays
LTO - Linear Tape Open

Quattor - System Administration Toolsuite

RFIO - Remote File Input/Output
RTCPD - CASTOR Remote Tape Copy Daemon
RTCPClientD - CASTOR Remote Tape Copy Client Daemon

SAN - Storage Area Network
SLC - Scientific Linux Cern
STK - StorageTek

UML - Unified Modelling Language

VDQM - CASTOR Volume and Drive Queue Manager
VMGR - CASTOR Volume Manager

XMI - XML Metadata Interchange
XML - Extensible Markup Language

Bibliography

[1] Hobbes Internet Timeline, Webpage, http://www.zakon.org/robert/internet/timeline/

[2] CERN/LHCC ”Technical Design Report”, Presentation, http://doc.cern.ch/archive/electronic/cern/preprints/lhcc/public/lhcc-2005-019.pdf, June 2005

[3] Les Robertson, ”LCG Overview”, Presentation, http://les.web.cern.ch/les/talks/LCG%20Overview%20-%20aug06.ppt, August 2006

[4] Castor Presentation at the Post C5 meeting, Presentation, http://castor.web.cern.ch/castor/presentations/2006/Castor C5Presentation.pdf June 2006

[5] COmmon Muon Proton Apparatus for Structure and Spectroscopy, Webpage, http://wwwcompass.cern.ch/compass/

[6] CP Violation Experiment homepage, CERN, Webpage, http://na48.web.cern.ch/NA48/Welcome.html

[7] Computer cluster, Webpage, http://en.wikipedia.org/wiki/Computer_cluster

[8] David A Patterson, Garth Gibson, and Randy H Katz, ”A Case for Redundant Arrays of Inexpensive Disks (RAID)”, http://www2.cs.cmu.edu/~garth/RAIDpaper/Patterson88.pdf

[9] Bernd Panzer-Steindel, CERN, Presentation, ”CASTOR2 performance and reliability”, http://indico.cern.ch/getFile.py/access?contribId=14&sessionId=4&resId=1&materialId=slides&confId=2916


[10] StreamLine SL8500 Modular Library System, Document, http://www.storagetek.com/upload/documents/TC0018B SL8500 OC.pdf, December 2005

[11] IBM, ”IBM System Storage TS3500 Tape Library”, Document, ftp://ftp.software.ibm.com/common/ssi/rep sp/n/ TSD00872USEN/TSD00872USEN.PDF

[12] Charles Curran, ”Data Services Tape Service Scenario 2007”, CERN, Webpage, http://it-div-ds.web.cern.ch/it-div-ds/HO/tape 2007.html, August 2006

[13] Quattor, Webpage, http://quattor.org, December 2005

[14] German Cancio and Piotr Poznanski, ”Managing Computer Centre machines with Quattor”, Presentation, http://quattor.org/documentation/presentations/quattor-c5-12122003.pdf, December 2003

[15] Miroslav Siket, German Cancio, David Front, Maciej Stepniewsk, ”Lemon Monitoring”, Presentation, http://lemon.web.cern.ch/lemon/doc/presentations/lemon-bologna-2005.ppt, May 2005

[16] LHC Era Monitoring (Lemon), Webpage, http://cern.ch/lemon, December 2005

[17] Umbrello UML Modeller, Webpage, http://umbrello.org, December 2005

[18] Object Management Group, UML 2.0, Webpage, http://www.omg.org/technology/documents/formal/uml.htm

[19] SUN JavaBeans technology, Webpage, http://java.sun.com/products/javabeans/ June,2006

[20] Olof Bärring, Ben Couturier, Jean-Damien Durand, Emil Knezo, Sebastien Ponce, ”Storage Resource Sharing with CASTOR”, Presentation, http://castor.web.cern.ch/castor/presentations/2004/MSST2004/MSST2004-CASTOR-1.pdf, 2004

[21] Matthias Br¨ager, ”Redesign and Functional Extension of the Robotic Tape Queueing System within the CASTOR HSM”, December 2005


[22] Ross N. Williams, ”A Painless Guide to CRC Error Detection Algorithms”, Webpage, http://www.ross.net/crc/download/crc_v3.txt, August 1993

[23] P. Deutsch, J-L. Gailly, ”ZLIB Compressed Data Format Specification”, Document, http://www.ietf.org/rfc/rfc1950.txt

[24] Allan G. Reiter, ”UNIVAC I Computer System”, Webpage, http://mywebpage.netscape.com/reitery2k/univac1.htm

[25] University Klagenfurt, ”Magnetic Tape”, Webpage, http://cs-exhibitions.uni-klu.ac.at/index.php?id=221

[26] Linear Tape Open (LTO), The LTO Webpage, http://www.lto.org, May 2006

[27] Exabyte, ”LTO Tape Drive RoadMap”, Presentation

[28] H. Cacote, C. Curran, ”IBM 3592 E05 tests”, Presentation, December 2005

[29] V. Bahyl, H. Cacote, C. Curran , ”STK T10000A tape drive”, Presentation, April 2006

[30] IBM TotalStorage 3592 Tape Drive Model J1A, Webpage, http://www.nctgmbh.de/download/3592TapeDriveModelJ1A.pdf, December 2005

[31] Novastor TapeCopy, Webpage, http://interchange.novastor.com/datasheets/tapecopy.html

[32] IBM Tivoli Tape Optimizer, Webpage, http://www-306.ibm.com/software/tivoli/products/tape-optimizer-zos/

[33] C. Curran, ”Data Services Testing of IBM 3592 drives and IBM 3584 robot”, Webpage, http://it-div-ds.web.cern.ch/it-div-ds/HO/acceptance.html

[34] Erich Gamma, Richard Helm, Ralph Johnson, John Vlissides. ”Design Patterns: Elements of Reusable Object-Oriented Software”, Addison-Wesley, 1994

[35] The Castor Documentation, Webpage, http://castor.web.cern.ch/castor/docs.htm


[36] C. Curran, CERN, ”Data Services Tape Use Optimisations”, Webpage, http://it-div-ds.web.cern.ch/it-div-ds/HO/optimisations.html, 2005

[37] C.Curran, J.−P.Baud, F.Collin, ”Forum 2000−T9940A tests”, Presentation, 2000
