Lonestar 5: Customizing the Cray XC40 Software Environment
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Slurm Overview and Elasticsearch Plugin
Slurm Workload Manager Overview SC15 Alejandro Sanchez [email protected] Copyright 2015 SchedMD LLC http://www.schedmd.com Slurm Workload Manager Overview ● Originally intended as simple resource manager, but has evolved into sophisticated batch scheduler ● Able to satisfy scheduling requirements for major computer centers with use of optional plugins ● No single point of failure, backup daemons, fault-tolerant job options ● Highly scalable (3.1M core Tianhe-2 at NUDT) ● Highly portable (autoconf, extensive plugins for various environments) ● Open source (GPL v2) ● Operating on many of the world's largest computers ● About 500,000 lines of code today (plus test suite and documentation) Copyright 2015 SchedMD LLC http://www.schedmd.com Enterprise Architecture Copyright 2015 SchedMD LLC http://www.schedmd.com Architecture ● Kernel with core functions plus about 100 plugins to support various architectures and features ● Easily configured using building-block approach ● Easy to enhance for new architectures or features, typically just by adding new plugins Copyright 2015 SchedMD LLC http://www.schedmd.com Elasticsearch Plugin Copyright 2015 SchedMD LLC http://www.schedmd.com Scheduling Capabilities ● Fair-share scheduling with hierarchical bank accounts ● Preemptive and gang scheduling (time-slicing parallel jobs) ● Integrated with database for accounting and configuration ● Resource allocations optimized for topology ● Advanced resource reservations ● Manages resources across an enterprise Copyright 2015 SchedMD LLC http://www.schedmd.com -
Version 7.8-Systemd
Linux From Scratch Version 7.8-systemd Created by Gerard Beekmans Edited by Douglas R. Reno Linux From Scratch: Version 7.8-systemd by Created by Gerard Beekmans and Edited by Douglas R. Reno Copyright © 1999-2015 Gerard Beekmans Copyright © 1999-2015, Gerard Beekmans All rights reserved. This book is licensed under a Creative Commons License. Computer instructions may be extracted from the book under the MIT License. Linux® is a registered trademark of Linus Torvalds. Linux From Scratch - Version 7.8-systemd Table of Contents Preface .......................................................................................................................................................................... vii i. Foreword ............................................................................................................................................................. vii ii. Audience ............................................................................................................................................................ vii iii. LFS Target Architectures ................................................................................................................................ viii iv. LFS and Standards ............................................................................................................................................ ix v. Rationale for Packages in the Book .................................................................................................................... x vi. Prerequisites -
Pluggable Authentication Modules
Who this book is written for This book is for experienced system administrators and developers working with multiple Linux/UNIX servers or with both UNIX and Pluggable Authentication Windows servers. It assumes a good level of admin knowledge, and that developers are competent in C development on UNIX-based systems. Pluggable Authentication Modules PAM (Pluggable Authentication Modules) is a modular and flexible authentication management layer that sits between Linux applications and the native underlying authentication system. The PAM framework is widely used by most Linux distributions for authentication purposes. Modules Originating from Solaris 2.6 ten years ago, PAM is used today by most proprietary and free UNIX operating systems including GNU/Linux, FreeBSD, and Solaris, following both the design concept and the practical details. PAM is thus a unifying technology for authentication mechanisms in UNIX. This book provides a practical approach to UNIX/Linux authentication. The design principles are thoroughly explained, then illustrated through the examination of popular modules. It is intended as a one-stop introduction and reference to PAM. What you will learn from this book From Technologies to Solutions • Install, compile, and configure Linux-PAM on your system • Download and compile third-party modules • Understand the PAM framework and how it works • Learn to work with PAM’s management groups and control fl ags • Test and debug your PAM confi guration Pluggable Authentication Modules • Install and configure the pamtester utility -
Security Guide
Fedora 19 Security Guide A Guide to Securing Fedora Linux Johnray Fuller John Ha David O'Brien Scott Radvan Eric Christensen Adam Ligas Murray McAllister Scott Radvan Daniel Walsh Security Guide Dominick Grift Eric Paris James Morris Fedora 19 Security Guide A Guide to Securing Fedora Linux Edition 19.1 Author Johnray Fuller [email protected] Author John Ha [email protected] Author David O'Brien [email protected] Author Scott Radvan [email protected] Author Eric Christensen [email protected] Author Adam Ligas [email protected] Author Murray McAllister [email protected] Author Scott Radvan [email protected] Author Daniel Walsh [email protected] Author Dominick Grift [email protected] Author Eric Paris [email protected] Author James Morris [email protected] Copyright © 2007-2013 Fedora Project Contributors. The text of and illustrations in this document are licensed by Red Hat under a Creative Commons Attribution–Share Alike 3.0 Unported license ("CC-BY-SA"). An explanation of CC-BY-SA is available at http://creativecommons.org/licenses/by-sa/3.0/. The original authors of this document, and Red Hat, designate the Fedora Project as the "Attribution Party" for purposes of CC-BY-SA. In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for the original version. Red Hat, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law. Red Hat, Red Hat Enterprise Linux, the Shadowman logo, JBoss, MetaMatrix, Fedora, the Infinity Logo, and RHCE are trademarks of Red Hat, Inc., registered in the United States and other countries. -
Intel® SSF Reference Design: Intel® Xeon Phi™ Processor, Intel® OPA
Intel® Scalable System Framework Reference Design Intel® Scalable System Framework (Intel® SSF) Reference Design Cluster installation for systems based on Intel® Xeon® processor E5-2600 v4 family including Intel® Ethernet. Based on OpenHPC v1.1 Version 1.0 Intel® Scalable System Framework Reference Design Summary This Reference Design is part of the Intel® Scalable System Framework series of reference collateral. The Reference Design is a verified implementation example of a given reference architecture, complete with hardware and software Bill of Materials information and cluster configuration instructions. It can confidently be used “as is”, or be the foundation for enhancements and/or modifications. Additional Reference Designs are expected in the future to provide example solutions for existing reference architecture definitions and for utilizing additional Intel® SSF elements. Similarly, more Reference Designs are expected as new reference architecture definitions are introduced. This Reference Design is developed in support of the specification listed below using certain Intel® SSF elements: • Intel® Scalable System Framework Architecture Specification • Servers with Intel® Xeon® processor E5-2600 v4 family processors • Intel® Ethernet • Software stack based on OpenHPC v1.1 2 Version 1.0 Legal Notices No license (express or implied, by estoppel or otherwise) to any intellectual property rights is granted by this document. Intel disclaims all express and implied warranties, including without limitation, the implied warranties of merchantability, fitness for a particular purpose, and non-infringement, as well as any warranty arising from course of performance, course of dealing, or usage in trade. This document contains information on products, services and/or processes in development. All information provided here is subject to change without notice. -
SUSE Linux Enterprise High Performance Computing 15 SP3 Administration Guide Administration Guide SUSE Linux Enterprise High Performance Computing 15 SP3
SUSE Linux Enterprise High Performance Computing 15 SP3 Administration Guide Administration Guide SUSE Linux Enterprise High Performance Computing 15 SP3 SUSE Linux Enterprise High Performance Computing Publication Date: September 24, 2021 SUSE LLC 1800 South Novell Place Provo, UT 84606 USA https://documentation.suse.com Copyright © 2020–2021 SUSE LLC and contributors. All rights reserved. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”. For SUSE trademarks, see http://www.suse.com/company/legal/ . All third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its aliates. Asterisks (*) denote third-party trademarks. All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its aliates, the authors nor the translators shall be held liable for possible errors or the consequences thereof. Contents Preface vii 1 Available documentation vii 2 Giving feedback vii 3 Documentation conventions viii 4 Support x Support statement for SUSE Linux Enterprise High Performance Computing x • Technology previews xi 1 Introduction 1 1.1 Components provided 1 1.2 Hardware platform support 2 1.3 Support -
SUSE Linux Enterprise Server 15 SP2 Security and Hardening Guide Security and Hardening Guide SUSE Linux Enterprise Server 15 SP2
SUSE Linux Enterprise Server 15 SP2 Security and Hardening Guide Security and Hardening Guide SUSE Linux Enterprise Server 15 SP2 Introduces basic concepts of system security, covering both local and network security aspects. Shows how to use the product inherent security software like AppArmor, SELinux, or the auditing system that reliably collects information about any security-relevant events. Supports the administrator with security-related choices and decisions in installing and setting up a secure SUSE Linux Enterprise Server and additional processes to further secure and harden that installation. Publication Date: September 24, 2021 SUSE LLC 1800 South Novell Place Provo, UT 84606 USA https://documentation.suse.com Copyright © 2006– 2021 SUSE LLC and contributors. All rights reserved. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”. For SUSE trademarks, see https://www.suse.com/company/legal/ . All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its aliates. Asterisks (*) denote third-party trademarks. All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its -
Version Control Graphical Interface for Open Ondemand
VERSION CONTROL GRAPHICAL INTERFACE FOR OPEN ONDEMAND by HUAN CHEN Submitted in partial fulfillment of the requirements for the degree of Master of Science Department of Electrical Engineering and Computer Science CASE WESTERN RESERVE UNIVERSITY AUGUST 2018 CASE WESTERN RESERVE UNIVERSITY SCHOOL OF GRADUATE STUDIES We hereby approve the thesis/dissertation of Huan Chen candidate for the degree of Master of Science Committee Chair Chris Fietkiewicz Committee Member Christian Zorman Committee Member Roger Bielefeld Date of Defense June 27, 2018 TABLE OF CONTENTS Abstract CHAPTER 1: INTRODUCTION ............................................................................ 1 CHAPTER 2: METHODS ...................................................................................... 4 2.1 Installation for Environments and Open OnDemand .............................................. 4 2.1.1 Install SLURM ................................................................................................. 4 2.1.1.1 Create User .................................................................................... 4 2.1.1.2 Install and Configure Munge ........................................................... 5 2.1.1.3 Install and Configure SLURM ......................................................... 6 2.1.1.4 Enable Accounting ......................................................................... 7 2.1.2 Install Open OnDemand .................................................................................. 9 2.2 Git Version Control for Open OnDemand -
Extending SLURM for Dynamic Resource-Aware Adaptive Batch
1 Extending SLURM for Dynamic Resource-Aware Adaptive Batch Scheduling Mohak Chadha∗, Jophin John†, Michael Gerndt‡ Chair of Computer Architecture and Parallel Systems, Technische Universitat¨ Munchen¨ Garching (near Munich), Germany Email: [email protected], [email protected], [email protected] their resource requirements at runtime due to varying computa- Abstract—With the growing constraints on power budget tional phases based upon refinement or coarsening of meshes. and increasing hardware failure rates, the operation of future A solution to overcome these challenges and efficiently utilize exascale systems faces several challenges. Towards this, resource awareness and adaptivity by enabling malleable jobs has been the system components is resource awareness and adaptivity. actively researched in the HPC community. Malleable jobs can A method to achieve adaptivity in HPC systems is by enabling change their computing resources at runtime and can signifi- malleable jobs. cantly improve HPC system performance. However, due to the A job is said to be malleable if it can adapt to resource rigid nature of popular parallel programming paradigms such changes triggered by the batch system at runtime [6]. These as MPI and lack of support for dynamic resource management in batch systems, malleable jobs have been largely unrealized. In resource changes can either increase (expand operation) or this paper, we extend the SLURM batch system to support the reduce (shrink operation) the number of processors. Malleable execution and batch scheduling of malleable jobs. The malleable jobs have shown to significantly improve the performance of a applications are written using a new adaptive parallel paradigm batch system in terms of both system and user-centric metrics called Invasive MPI which extends the MPI standard to support such as system utilization and average response time [7], resource-adaptivity at runtime. -
Guide to the Secure Configuration of Red Hat Enterprise Linux 5
Guide to the Secure Configuration of Red Hat Enterprise Linux 5 Revision 4.2 August 26, 2011 Operating Systems Division Unix Team of the Systems and Network Analysis Center National Security Agency 9800 Savage Rd. Suite 6704 Ft. Meade, MD 20755-6704 2 Warnings Do not attempt to implement any of the recommendations in this guide without first testing in a non- production environment. This document is only a guide containing recommended security settings. It is not meant to replace well- structured policy or sound judgment. Furthermore this guide does not address site-specific configuration concerns. Care must be taken when implementing this guide to address local operational and policy concerns. The security changes described in this document apply only to Red Hat Enterprise Linux 5. They may not translate gracefully to other operating systems. Internet addresses referenced were valid as of 1 Dec 2009. Trademark Information Red Hat is a registered trademark of Red Hat, Inc. Any other trademarks referenced herein are the property of their respective owners. Change Log Revision 4.2 is an update of Revision 4.1 dated February 28, 2011. Added section 2.5.3.1.3, Disable Functionality of IPv6 Kernel Module Through Option. Added discussion to section 2.5.3.1.1, Disable Automatic Loading of IPv6 Kernel Module, indicating that this is no longer the preferred method for disabling IPv6. Added section 2.3.1.9, Set Accounts to Disable After Password Expiration. Revision 4.1 is an update of Revision 4 dated September 14, 2010. Added section 2.2.2.6, Disable All GNOME Thumbnailers if Possible. -
Distributed HPC Applications with Unprivileged Containers Felix Abecassis, Jonathan Calmels GPUNVIDIA Computing Beyond Video Games
Distributed HPC Applications with Unprivileged Containers Felix Abecassis, Jonathan Calmels GPUNVIDIA Computing Beyond video games VIRTUAL REALITY SCIENTIFIC COMPUTING MACHINE LEARNING AUTONOMOUS MACHINES GPU COMPUTING 2 Infrastructure at NVIDIA DGX SuperPOD GPU cluster (Top500 #20) x96 3 NVIDIA Containers Supports all major container runtimes We built libnvidia-container to make it easy to run CUDA applications inside containers We release optimized container images for each of the major Deep Learning frameworks every month We use containers for everything on our HPC clusters - R&D, official benchmarks, etc Containers give us portable software stacks without sacrificing performance 4 Typical cloud deployment e.g. Kubernetes Hundreds/thousands of small nodes All applications are containerized, for security reasons Many small applications running per node (e.g. microservices) Traffic to/from the outside world Not used for interactive applications or development Advanced features: rolling updates with rollback, load balancing, service discovery 5 GPU Computing at NVIDIA HPC-like 10-100 very large nodes “Trusted” users Not all applications are containerized Few applications per node (often just a single one) Large multi-node jobs with checkpointing (e.g. Deep Learning training) Little traffic to the outside world, or air-gapped Internal traffic is mostly RDMA 6 Slurm Workload Manager https://slurm.schedmd.com/slurm.html Advanced scheduling algorithms (fair-share, backfill, preemption, hierarchical quotas) Gang scheduling: scheduling and -
Security Guide Security Guide SUSE Linux Enterprise Server 12 SP4
SUSE Linux Enterprise Server 12 SP4 Security Guide Security Guide SUSE Linux Enterprise Server 12 SP4 Introduces basic concepts of system security, covering both local and network security aspects. Shows how to use the product inherent security software like AppArmor or the auditing system that reliably collects information about any security-relevant events. Publication Date: September 24, 2021 SUSE LLC 1800 South Novell Place Provo, UT 84606 USA https://documentation.suse.com Copyright © 2006– 2021 SUSE LLC and contributors. All rights reserved. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or (at your option) version 1.3; with the Invariant Section being this copyright notice and license. A copy of the license version 1.2 is included in the section entitled “GNU Free Documentation License”. For SUSE trademarks, see https://www.suse.com/company/legal/ . All other third-party trademarks are the property of their respective owners. Trademark symbols (®, ™ etc.) denote trademarks of SUSE and its aliates. Asterisks (*) denote third-party trademarks. All information found in this book has been compiled with utmost attention to detail. However, this does not guarantee complete accuracy. Neither SUSE LLC, its aliates, the authors nor the translators shall be held liable for possible errors or the consequences thereof. Contents About This Guide xvi 1 Available Documentation xvi 2 Giving Feedback xviii 3 Documentation Conventions xviii 4 Product Life Cycle