Operating an HPC/HTC Cluster with Fully Containerized Jobs Using Htcondor, Singularity, Cephfs and CVMFS
Total Page:16
File Type:pdf, Size:1020Kb

Load more
Recommended publications
-
Copy on Write Based File Systems Performance Analysis and Implementation
Copy On Write Based File Systems Performance Analysis And Implementation Sakis Kasampalis Kongens Lyngby 2010 IMM-MSC-2010-63 Technical University of Denmark Department Of Informatics Building 321, DK-2800 Kongens Lyngby, Denmark Phone +45 45253351, Fax +45 45882673 [email protected] www.imm.dtu.dk Abstract In this work I am focusing on Copy On Write based file systems. Copy On Write is used on modern file systems for providing (1) metadata and data consistency using transactional semantics, (2) cheap and instant backups using snapshots and clones. This thesis is divided into two main parts. The first part focuses on the design and performance of Copy On Write based file systems. Recent efforts aiming at creating a Copy On Write based file system are ZFS, Btrfs, ext3cow, Hammer, and LLFS. My work focuses only on ZFS and Btrfs, since they support the most advanced features. The main goals of ZFS and Btrfs are to offer a scalable, fault tolerant, and easy to administrate file system. I evaluate the performance and scalability of ZFS and Btrfs. The evaluation includes studying their design and testing their performance and scalability against a set of recommended file system benchmarks. Most computers are already based on multi-core and multiple processor architec- tures. Because of that, the need for using concurrent programming models has increased. Transactions can be very helpful for supporting concurrent program- ming models, which ensure that system updates are consistent. Unfortunately, the majority of operating systems and file systems either do not support trans- actions at all, or they simply do not expose them to the users. -
PDF, 32 Pages
Helge Meinhard / CERN V2.0 30 October 2015 HEPiX Fall 2015 at Brookhaven National Lab After 2004, the lab, located on Long Island in the State of New York, U.S.A., was host to a HEPiX workshop again. Ac- cess to the site was considerably easier for the registered participants than 11 years ago. The meeting took place in a very nice and comfortable seminar room well adapted to the size and style of meeting such as HEPiX. It was equipped with advanced (sometimes too advanced for the session chairs to master!) AV equipment and power sockets at each seat. Wireless networking worked flawlessly and with good bandwidth. The welcome reception on Monday at Wading River at the Long Island sound and the workshop dinner on Wednesday at the ocean coast in Patchogue showed more of the beauty of the rather natural region around the lab. For those interested, the hosts offered tours of the BNL RACF data centre as well as of the STAR and PHENIX experiments at RHIC. The meeting ran very smoothly thanks to an efficient and experienced team of local organisers headed by Tony Wong, who as North-American HEPiX co-chair also co-ordinated the workshop programme. Monday 12 October 2015 Welcome (Michael Ernst / BNL) On behalf of the lab, Michael welcomed the participants, expressing his gratitude to the audience to have accepted BNL's invitation. He emphasised the importance of computing for high-energy and nuclear physics. He then intro- duced the lab focusing on physics, chemistry, biology, material science etc. The total head count of BNL-paid people is close to 3'000. -
Wang Paper (Prepublication)
Riverbed: Enforcing User-defined Privacy Constraints in Distributed Web Services Frank Wang Ronny Ko, James Mickens MIT CSAIL Harvard University Abstract 1.1 A Loss of User Control Riverbed is a new framework for building privacy-respecting Unfortunately, there is a disadvantage to migrating applica- web services. Using a simple policy language, users define tion code and user data from a user’s local machine to a restrictions on how a remote service can process and store remote datacenter server: the user loses control over where sensitive data. A transparent Riverbed proxy sits between a her data is stored, how it is computed upon, and how the data user’s front-end client (e.g., a web browser) and the back- (and its derivatives) are shared with other services. Users are end server code. The back-end code remotely attests to the increasingly aware of the risks associated with unauthorized proxy, demonstrating that the code respects user policies; in data leakage [11, 62, 82], and some governments have begun particular, the server code attests that it executes within a to mandate that online services provide users with more con- Riverbed-compatible managed runtime that uses IFC to en- trol over how their data is processed. For example, in 2016, force user policies. If attestation succeeds, the proxy releases the EU passed the General Data Protection Regulation [28]. the user’s data, tagging it with the user-defined policies. On Articles 6, 7, and 8 of the GDPR state that users must give con- the server-side, the Riverbed runtime places all data with com- sent for their data to be accessed. -
How to Create a Custom Live CD for Secure Remote Incident Handling in the Enterprise
How to Create a Custom Live CD for Secure Remote Incident Handling in the Enterprise Abstract This paper will document a process to create a custom Live CD for secure remote incident handling on Windows and Linux systems. The process will include how to configure SSH for remote access to the Live CD even when running behind a NAT device. The combination of customization and secure remote access will make this process valuable to incident handlers working in enterprise environments with limited remote IT support. Bert Hayes, [email protected] How to Create a Custom Live CD for Remote Incident Handling 2 Table of Contents Abstract ...........................................................................................................................................1 1. Introduction ............................................................................................................................5 2. Making Your Own Customized Debian GNU/Linux Based System........................................7 2.1. The Development Environment ......................................................................................7 2.2. Making Your Dream Incident Handling System...............................................................9 2.3. Hardening the Base Install.............................................................................................11 2.3.1. Managing Root Access with Sudo..........................................................................11 2.4. Randomizing the Handler Password at Boot Time ........................................................12 -
In Search of the Ideal Storage Configuration for Docker Containers
In Search of the Ideal Storage Configuration for Docker Containers Vasily Tarasov1, Lukas Rupprecht1, Dimitris Skourtis1, Amit Warke1, Dean Hildebrand1 Mohamed Mohamed1, Nagapramod Mandagere1, Wenji Li2, Raju Rangaswami3, Ming Zhao2 1IBM Research—Almaden 2Arizona State University 3Florida International University Abstract—Containers are a widely successful technology today every running container. This would cause a great burden on popularized by Docker. Containers improve system utilization by the I/O subsystem and make container start time unacceptably increasing workload density. Docker containers enable seamless high for many workloads. As a result, copy-on-write (CoW) deployment of workloads across development, test, and produc- tion environments. Docker’s unique approach to data manage- storage and storage snapshots are popularly used and images ment, which involves frequent snapshot creation and removal, are structured in layers. A layer consists of a set of files and presents a new set of exciting challenges for storage systems. At layers with the same content can be shared across images, the same time, storage management for Docker containers has reducing the amount of storage required to run containers. remained largely unexplored with a dizzying array of solution With Docker, one can choose Aufs [6], Overlay2 [7], choices and configuration options. In this paper we unravel the multi-faceted nature of Docker storage and demonstrate its Btrfs [8], or device-mapper (dm) [9] as storage drivers which impact on system and workload performance. As we uncover provide the required snapshotting and CoW capabilities for new properties of the popular Docker storage drivers, this is a images. None of these solutions, however, were designed with sobering reminder that widespread use of new technologies can Docker in mind and their effectiveness for Docker has not been often precede their careful evaluation. -
Master Boot Record Vs Guid Mac
Master Boot Record Vs Guid Mac Wallace is therefor divinatory after kickable Noach excoriating his philosophizer hourlong. When Odell perches dilaceratinghis tithes gravitated usward ornot alkalize arco enough, comparatively is Apollo and kraal? enduringly, If funked how or following augitic is Norris Enrico? usually brails his germens However, half the UEFI supports the MBR and GPT. Following your suggested steps, these backups will appear helpful to restore prod data. OK, GPT makes for playing more logical choice based on compatibility. Formatting a suit Drive are Hard Disk. In this guide, is welcome your comments or thoughts below. Thus, making, or paid other OS. Enter an open Disk Management window. Erase panel, or the GUID Partition that, we have covered the difference between MBR and GPT to care unit while partitioning a drive. Each record in less directory is searched by comparing the hash value. Disk Utility have to its important tasks button activated for adding, total capacity, create new Container will be created as well. Hard money fix Windows Problems? MBR conversion, the main VBR and the backup VBR. At trial three Linux emergency systems ship with GPT fdisk. In else, the user may decide was the hijack is unimportant to them. GB even if lesser alignment values are detected. Interoperability of the file system also important. Although it hard be read natively by Linux, she likes shopping, the utility Partition Manager has endeavor to working when Disk Utility if nothing to remain your MBR formatted external USB hard disk drive. One station time machine, reformat the storage device, GPT can notice similar problem they attempt to recover the damaged data between another location on the disk. -
Understanding the Performance of Container Execution Environments
Understanding the performance of container execution environments Guillaume Everarts de Velp, Etienne Rivière and Ramin Sadre [email protected] EPL, ICTEAM, UCLouvain, Belgium Abstract example is an automatic grading platform named INGIn- Many application server backends leverage container tech- ious [2, 3]. This platform is used extensively at UCLouvain nologies to support workloads formed of short-lived, but and other institutions around the world to provide computer potentially I/O-intensive, operations. The latency at which science and engineering students with automated feedback container-supported operations complete impacts both the on programming assignments, through the execution of se- users’ experience and the throughput that the platform can ries of unit tests prepared by instructors. It is necessary that achieve. This latency is a result of both the bootstrap and the student code and the testing runtime run in isolation from execution time of the containers and is impacted greatly by each others. Containers answer this need perfectly: They the performance of the I/O subsystem. Configuring appro- allow students’ code to run in a controlled and reproducible priately the container environment and technology stack environment while reducing risks related to ill-behaved or to obtain good performance is not an easy task, due to the even malicious code. variety of options, and poor visibility on their interactions. Service latency is often the most important criteria for We present in this paper a benchmarking tool for the selecting a container execution environment. Slow response multi-parametric study of container bootstrap time and I/O times can impair the usability of an edge computing infras- performance, allowing us to understand such interactions tructure, or result in students frustration in the case of IN- within a controlled environment. -
Push-Based Job Submission Using Reverse SSH Connections
rvGAHP – Push-Based Job Submission using Reverse SSH Connections Scott Callaghan Gideon Juve Karan Vahi University of Southern California USC Information Sciences Institute USC Information Sciences Institute Los Angeles, California Marina Del Rey, California Marina Del Rey, California [email protected] [email protected] [email protected] Philip J. Maechling Thomas H. Jordan Ewa Deelman University of Southern California University of Southern California USC Information Sciences Institute Los Angeles, California Los Angeles, California Marina Del Rey, California [email protected] [email protected] [email protected] ABSTRACT using Reverse SSH Connections. In WORKS’17: WORKS’17: 12th Workshop Computational science researchers running large-scale scientific on Workflows in Support of Large-Scale Science, November 12–17, 2017, Denver, CO, USA. ACM, New York, NY, USA, 8 pages. https://doi.org/10.1145/3150994. workflow applications often want to run their workflows onthe 3151003 largest available compute systems to improve time to solution. Workflow tools used in distributed, heterogeneous, high perfor- mance computing environments typically rely on either a push- 1 INTRODUCTION based or a pull-based approach for resource provisioning from these Modern scientific applications typically require the execution of compute systems. However, many large clusters have moved to multiple codes, may include computational and data dependencies, two-factor authentication for job submission, making traditional au- and contain varied computational models ranging from a bag of tomated push-based job submission impossible. On the other hand, tasks to a monolithic parallel code. These applications are often pull-based approaches such as pilot jobs may lead to increased com- complex software suites with substantial computational and data plexity and a reduction in node-hour efficiency. -
Hardware-Driven Evolution in Storage Software by Zev Weiss A
Hardware-Driven Evolution in Storage Software by Zev Weiss A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Sciences) at the UNIVERSITY OF WISCONSIN–MADISON 2018 Date of final oral examination: June 8, 2018 ii The dissertation is approved by the following members of the Final Oral Committee: Andrea C. Arpaci-Dusseau, Professor, Computer Sciences Remzi H. Arpaci-Dusseau, Professor, Computer Sciences Michael M. Swift, Professor, Computer Sciences Karthikeyan Sankaralingam, Professor, Computer Sciences Johannes Wallmann, Associate Professor, Mead Witter School of Music i © Copyright by Zev Weiss 2018 All Rights Reserved ii To my parents, for their endless support, and my cousin Charlie, one of the kindest people I’ve ever known. iii Acknowledgments I have taken what might be politely called a “scenic route” of sorts through grad school. While Ph.D. students more focused on a rapid graduation turnaround time might find this regrettable, I am glad to have done so, in part because it has afforded me the opportunities to meet and work with so many excellent people along the way. I owe debts of gratitude to a large cast of characters: To my advisors, Andrea and Remzi Arpaci-Dusseau. It is one of the most common pieces of wisdom imparted on incoming grad students that one’s relationship with one’s advisor (or advisors) is perhaps the single most important factor in whether these years of your life will be pleasant or unpleasant, and I feel exceptionally fortunate to have ended up iv with the advisors that I’ve had. -
Container-Based Virtualization for Byte-Addressable NVM Data Storage
2016 IEEE International Conference on Big Data (Big Data) Container-Based Virtualization for Byte-Addressable NVM Data Storage Ellis R. Giles Rice University Houston, Texas [email protected] Abstract—Container based virtualization is rapidly growing Storage Class Memory, or SCM, is an exciting new in popularity for cloud deployments and applications as a memory technology with the potential of replacing hard virtualization alternative due to the ease of deployment cou- drives and SSDs as it offers high-speed, byte-addressable pled with high-performance. Emerging byte-addressable, non- volatile memories, commonly called Storage Class Memory or persistence on the main memory bus. Several technologies SCM, technologies are promising both byte-addressability and are currently under research and development, each with dif- persistence near DRAM speeds operating on the main memory ferent performance, durability, and capacity characteristics. bus. These new memory alternatives open up a new realm of These include a ReRAM by Micron and Sony, a slower, but applications that no longer have to rely on slow, block-based very large capacity Phase Change Memory or PCM by Mi- persistence, but can rather operate directly on persistent data using ordinary loads and stores through the cache hierarchy cron and others, and a fast, smaller spin-torque ST-MRAM coupled with transaction techniques. by Everspin. High-speed, byte-addressable persistence will However, SCM presents a new challenge for container-based give rise to new applications that no longer have to rely on applications, which typically access persistent data through slow, block based storage devices and to serialize data for layers of block based file isolation. -
Introduction to Python University of Oxford Department of Particle Physics
Particle Physics Cluster Infrastructure Introduction University of Oxford Department of Particle Physics October 2019 Vipul Davda Particle Physics Linux Systems Administrator Room 661 Telephone: x73389 [email protected] Particle Physics Computing Overview 1 Particle Physics Linux Infrastructure Distributed File System gluster NFS /data Worker Nodes /data/atlas /data/lhcb NFS HTCondor /home Batch Server Interactive Servers physics_s/eduroam Network Printer Managed Laptops Managed Desktops Particle Physics Computing Overview 2 Introduction to the Unix Operating System Unix is a Multi-User/Multi-Tasking operating system. Developed in 1969 at AT&T’s Bell Labs by Ken Thompson (Unix) Dennis Ritchie (C) Unix is written in C programming language. Unix was originally a command-line OS, but now has a graphical user interface. It is available in many different forms: Linux , Solaris, AIX, HP-UX, freeBSD It is a well-suited environment for program development: C, C++, Java, Fortran, Python… Unix is mainly used on large servers for scientific applications. Particle Physics Computing Overview 3 Linux Distributions Source: https://www.muylinux.com/2009/04/24/logos-de-distribuciones-gnulinux/ Particle Physics Computing Overview 4 Particle Physics Linux Infrastructure Particle Physics uses CentOS Linux on the cluster. It is a free version of RedHat Enterprise Linux. Particle Physics Computing Overview 5 Basic Linux Commands Particle Physics Computing Overview 6 Basic Linux Commands o ls - list directory contents ls –l - long listing -
The Translational Journey of the Htcondor-CE
Journal of Computational Science xxx (xxxx) xxx Contents lists available at ScienceDirect Journal of Computational Science journal homepage: www.elsevier.com/locate/jocs Principles, technologies, and time: The translational journey of the HTCondor-CE Brian Bockelman a,*, Miron Livny a,b, Brian Lin b, Francesco Prelz c a Morgridge Institute for Research, Madison, USA b Department of Computer Sciences, University of Wisconsin-Madison, Madison, USA c INFN Milan, Milan, Italy ARTICLE INFO ABSTRACT Keywords: Mechanisms for remote execution of computational tasks enable a distributed system to effectively utilize all Distributed high throughput computing available resources. This ability is essential to attaining the objectives of high availability, system reliability, and High throughput computing graceful degradation and directly contribute to flexibility, adaptability, and incremental growth. As part of a Translational computing national fabric of Distributed High Throughput Computing (dHTC) services, remote execution is a cornerstone of Distributed computing the Open Science Grid (OSG) Compute Federation. Most of the organizations that harness the computing capacity provided by the OSG also deploy HTCondor pools on resources acquired from the OSG. The HTCondor Compute Entrypoint (CE) facilitates the remote acquisition of resources by all organizations. The HTCondor-CE is the product of a most recent translational cycle that is part of a multidecade translational process. The process is rooted in a partnership, between members of the High Energy Physics community and computer scientists, that evolved over three decades and involved testing and evaluation with active users and production infrastructures. Through several translational cycles that involved researchers from different organizations and continents, principles, ideas, frameworks and technologies were translated into a widely adopted software artifact that isresponsible for provisioning of approximately 9 million core hours per day across 170 endpoints.