An Investigation of a High Availability DPM-Based Grid Storage Element


Kwong Tat Cheung
August 17, 2017

MSc in High Performance Computing with Data Science
The University of Edinburgh
Year of Presentation: 2017

Abstract

As the data volume of scientific experiments continues to increase, there is an increasing need for Grid Storage Elements to provide a reliable and robust storage solution. This work investigates the single point of failure in DPM's architecture and identifies the components which prevent the use of redundant head nodes to provide higher availability. This work also contributes a prototype of a novel high availability DPM architecture, designed using the findings of our investigation.

Contents

1 Introduction
  1.1 Big data in science
  1.2 Storage on the grid
  1.3 The problem
    1.3.1 Challenges in availability
    1.3.2 Limitations in DPM legacy components
  1.4 Aim
  1.5 Project scope
  1.6 Report structure
2 Background
  2.1 DPM and the Worldwide LHC Computing Grid
  2.2 DPM architecture
    2.2.1 DPM head node
    2.2.2 DPM disk node
  2.3 DPM evolution
    2.3.1 DMLite
    2.3.2 Disk Operations Manager Engine
  2.4 Trade-offs in distributed systems
    2.4.1 Implication of CAP Theorem on DPM
  2.5 Concluding Remarks
3 Setting up a legacy-free DPM testbed
  3.1 Infrastructure
  3.2 Initial testbed architecture
  3.3 Testbed specification
  3.4 Creating the VMs
  3.5 Setting up a certificate authority
    3.5.1 Create a CA
    3.5.2 Create the host certificates
    3.5.3 Create the user certificate
  3.6 Nameserver
  3.7 HTTP frontend
  3.8 DMLite adaptors
  3.9 Database and Memcached
  3.10 Creating a VO
  3.11 Establishing trust between the nodes
  3.12 Setting up the file systems and disk pool
  3.13 Verifying the testbed
  3.14 Problems encountered and lessons learned
4 Investigation
  4.1 Automating the failover mechanism
    4.1.1 Implementation
  4.2 Database
    4.2.1 Metadata and operation status
    4.2.2 Issues
    4.2.3 Analysis
    4.2.4 Options
    4.2.5 Recommendation
  4.3 DOME in-memory queues
    4.3.1 Issues
    4.3.2 Options
    4.3.3 Recommendation
  4.4 DOME metadata cache
    4.4.1 Issues
    4.4.2 Options
    4.4.3 Recommendation
  4.5 Recommended architecture for High Availability DPM
    4.5.1 Failover
    4.5.2 Important considerations
5 Evaluation
  5.1 Durability
    5.1.1 Methodology
    5.1.2 Findings
  5.2 Performance
    5.2.1 Methodology
    5.2.2 Findings
6 Conclusions
7 Future work
A Software versions and configurations
  A.1 Core testbed components
  A.2 Test tools
  A.3 Example domehead.conf
  A.4 Example domedisk.conf
  A.5 Example dmlite.conf
  A.6 Example domeadapter.conf
  A.7 Example mysql.conf
  A.8 Example Galera cluster configuration
B Plots

List of Tables

3.1 Network identifiers of VMs in testbed

List of Figures

2.1 Current DPM architecture
2.2 DMLite architecture
2.3 Simplified view of DOME in head node
2.4 Simplified view of DOME in disk node
3.1 Simplified view of architecture of initial testbed
4.1 Failover using keepalived
4.2 Synchronising records with Galera cluster
4.3 Remodeled work flow of the task queues using replicated Redis caches
4.4 Remodeled work flow of the metadata cache using replicated Redis caches
4.5 Recommended architecture for High Availability DPM
5.1 Plots of average rate of operations compared to number of threads
B.1 Average rate for a write operation
B.2 Average rate for a stat operation
B.3 Average rate for a read operation
B.4 Average rate for a delete operation
Acknowledgements

First and foremost, I would like to express my gratitude to Dr Nicholas Johnson for supervising and arranging the budget for this project. Without the guidance and motivation he has provided, the quality of this work would certainly have suffered. I would also like to thank Dr Fabrizio Furano from the DPM development team for putting up with the stream of emails I have bombarded him with, and for answering my queries on the inner workings of DPM.

Chapter 1
Introduction

1.1 Big data in science

Big data has become a well-known phenomenon in the age of social media. The vast amount of user-generated content has undeniably influenced research and advancement in modern distributed computing paradigms [1][2]. However, even before the advent of social media websites, researchers in several scientific fields already faced similar challenges in dealing with the massive amounts of data generated by experiments. One such field is high energy physics, including the Large Hadron Collider (LHC) experiments based at the European Organization for Nuclear Research (CERN). In 2016 alone, it is estimated that 50 petabytes of data were gathered by the LHC detectors post-filtering [3]. Since the financial resources required to host an infrastructure able to process, store, and analyse the data are far too great for any single organisation, the experiments turned to the grid computing approach.

Grid computing, which is mostly developed and used in academia, follows the same principle as its commercial counterpart, cloud computing: computing resources are provided to end-users remotely and on demand. Similarly, the physical location of the sites which provide the resources, as well as the underlying infrastructure, is abstracted away from the users. From the end-users' perspective, they simply submit their jobs to an appropriate job management system without any knowledge of where the jobs will run or where the data are physically stored. In grid computing, these computing resources are often distributed across multiple locations, where a site that provides data storage capacity is called a Storage Element, and one that provides computation capacity is called a Compute Element.

1.2 Storage on the grid

Grid storage elements have to support some unique requirements of the grid environment. For example, the grid relies on the concept of Virtual Organisations (VO) for resource allocation and accounting. A VO represents a group of users, not necessarily from the same organisation but usually involved in the same experiment, and manages their membership. Resources on the grid (e.g. storage space provided by a site) are allocated to specific VOs instead of individual users. Storage elements also have to support file transfer protocols that are not commonly used outside of the grid environment, such as GridFTP [4] and xrootd [5].

Various storage management systems were developed for grid storage elements to fulfil these requirements, and one such system is the Disk Pool Manager (DPM) [6]. DPM is a storage management system developed by CERN. It is currently the most widely deployed storage system on Tier 2 sites, providing the Worldwide LHC Computing Grid (WLCG) with around 73 petabytes of storage across 160 instances [7]. The main functionalities of DPM are to provide a straightforward, low-maintenance solution for creating a disk-based grid storage element, and to support remote file and metadata operations using multiple protocols commonly used in the grid environment.
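To make the protocol side concrete: one of the interfaces DPM exposes is HTTP/WebDAV (the testbed's HTTP frontend is configured in Section 3.7). The sketch below is not taken from the report; it shows how a client could query metadata and read a file over that interface, with the endpoint, namespace path and credential locations used purely as illustrative placeholders.

    # Minimal sketch (not from the report) of talking to a DPM storage element
    # over its HTTP/WebDAV frontend. Endpoint, VO path and credential file
    # names below are assumptions for illustration only.
    import requests

    ENDPOINT = "https://dpm-head.example.org"                 # hypothetical DPM head node
    REMOTE = "/dpm/example.org/home/myvo/data/file1"          # hypothetical path in a VO namespace
    CERT = ("/tmp/usercert.pem", "/tmp/userkey.pem")          # client certificate/key (or a proxy)
    CA_PATH = "/etc/grid-security/certificates"               # directory of trusted grid CAs

    # Metadata query: a WebDAV PROPFIND returns properties such as size and
    # modification time for the entry, served by the head node.
    resp = requests.request(
        "PROPFIND",
        ENDPOINT + REMOTE,
        cert=CERT,
        verify=CA_PATH,
        headers={"Depth": "0"},
    )
    print(resp.status_code)

    # Data access: a plain HTTP GET; the head node typically redirects the
    # client to the disk node holding a replica, which requests follows.
    with requests.get(ENDPOINT + REMOTE, cert=CERT, verify=CA_PATH, stream=True) as data:
        data.raise_for_status()
        with open("file1", "wb") as out:
            for chunk in data.iter_content(chunk_size=1 << 20):
                out.write(chunk)

In practice grid users reach such an endpoint through clients like davix or gfal2 rather than raw HTTP, but the sketch highlights the point that matters for this work: every request starts at the single head node, so if that node is unreachable the whole namespace is unreachable.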
1.3 The problem

This section presents the main challenges for DPM and the specific limitations that motivate this work, and outlines the project's aim.

1.3.1 Challenges in availability

Due to limitations in the DPM architecture, the current deployment model supports only one meta-data server and command node. This deployment model exposes a single point of failure in a DPM-based storage element. There are several scenarios in which this deployment model could affect the availability of a site:

• Hardware failure in the host
• Software/OS update that results in the host being offline
• Retirement or replacement of machines

If any of the scenarios listed above happens to the command node, the entire storage element will become inaccessible, which ultimately means expensive downtime for the site.

1.3.2 Limitations in DPM legacy components

Some components in DPM were first developed over 20 years ago. The tightly-coupled nature of these components has limited the extensibility of the DPM system and makes it impractical to modify DPM into a multi-server system. As the grid evolves, the number of users and the demand for storage have increased, and new software practices and designs have emerged that could better fulfil the requirements of a high-load storage element.

In light of this, the DPM development team have put a considerable amount of effort into modernising the system in the past few years, resulting in new components that bypass some limitations of the legacy stack. The extensibility of these new components has opened up an opportunity to modify the current deployment model, which this work aims to explore.

1.4 Aim

The aim of this work is to explore the possibility of increasing the availability of a DPM-based grid storage element by modifying its current architecture and components.
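As a preview of how the report approaches this aim, Section 4.1 ("Automating the failover mechanism", Figure 4.1) pairs a primary and a standby head node behind a floating virtual IP managed by keepalived. The fragment below is a minimal sketch of such a configuration, in the spirit of the example configurations in Appendix A; it is not taken from the report, and the interface name, router ID, password and addresses are placeholders.

    # /etc/keepalived/keepalived.conf on the primary head node (sketch only;
    # interface, router ID, password and virtual IP are illustrative placeholders)
    vrrp_script check_dpm_frontend {
        script "/usr/bin/curl -k -s -o /dev/null https://localhost/"   # crude health probe
        interval 5      # run every 5 seconds
        fall 2          # mark failed after 2 consecutive failures
    }

    vrrp_instance DPM_HEAD {
        state MASTER            # the standby head node would use state BACKUP
        interface eth0
        virtual_router_id 51
        priority 100            # the standby would use a lower priority, e.g. 90
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass changeme
        }
        virtual_ipaddress {
            192.168.1.50        # the virtual IP that clients and disk nodes contact
        }
        track_script {
            check_dpm_frontend
        }
    }

Moving the address is only part of the problem: as Chapter 4 discusses, the database, DOME's in-memory queues and its metadata cache also have to be replicated or shared before a standby head node can genuinely take over.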