Scyld ClusterWare Documentation Release 7.6.2

Penguin Computing
June 11, 2019

CONTENTS

1 Release Notes
  1.1 Release Notes: Scyld ClusterWare Release v7.6.2-762g0000
    1.1.1 About This Release
      Important: Recommend using /usr/sbin/install-scyld script
      Important for clusters using 3rd-party drivers or applications
      Important for clusters using Panasas storage
    1.1.2 First Installation of Scyld ClusterWare 7 On A Server
    1.1.3 Upgrading An Earlier Release of Scyld ClusterWare 7 to 7.6
    1.1.4 Post-Installation Configuration Issues
      Resolve *.rpmnew and *.rpmsave configuration file differences
      Disable SELinux and NetworkManager
      Edit /etc/beowulf/conf.d/sysctl.conf as needed
      Optionally reduce size of /usr/lib/locale/locale-archive
      Optionally configure and enable compute node CPU speed/power management
      Optionally install a different TORQUE package
      Optionally enable job manager
      Optionally enable TORQUE scheduler
      Optionally enable Ganglia monitoring tool
      Optionally enable NFS locking
      Optionally adjust the size limit for locked memory
      Optionally increase the max number of processes per user
      Optionally enable SSHD on compute nodes
      Optionally allow IP Forwarding
      Optionally increase the nf_conntrack table size
      Optionally configure vm.zone_reclaim_mode on compute nodes
      Optionally configure automount on compute nodes
      Optionally reconfigure node names
    1.1.5 Post-Installation Configuration Issues For Large Clusters
      Optionally increase the number of nfsd threads
      Optionally increase the max number of processID values
      Optionally increase the max number of open files
      Issues with Ganglia
    1.1.6 Post-Installation Release of Updated Packages
    1.1.7 Notable Feature Enhancements And Bug Fixes
      v7.6.2 - June 11, 2019
      v7.6.1 - April 8, 2019
      v7.6.0 - December 14, 2018
      v7.5.4 - October 2, 2018
      v7.5.3 - July 27, 2018
      v7.5.2 - July 14, 2018
      v7.5.1 - June 4, 2018
      v7.5.0 - June 4, 2018
      v7.4.6 - October 3, 2018
      v7.4.5 - March 23, 2018
      v7.4.4 - February 8, 2018
      v7.4.3 - January 11, 2018
      v7.4.2 - December 26, 2017
      v7.4.1 - October 31, 2017
      v7.4.0 - October 11, 2017
      v7.4.0 - October 6, 2017
      v7.3.7 - October 13, 2017
      v7.3.6 - August 7, 2017
      v7.3.5 - June 28, 2017
      v7.3.4 - June 16, 2017
      v7.3.3 - April 27, 2017
      v7.3.2 - April 3, 2017
      v7.3.1 - March 8, 2017
      v7.3.0 - January 20, 2017
      v7.2.0 - November 14, 2016
    1.1.8 Known Issues And Workarounds
      Issues with bootmodule firmware
      Kernel panic using non-invpcid old Intel nodes
      Managing environment modules .version files
      Installing and managing concurrent versions of packages
      Issues with OpenMPI
      Issues with Scyld ClusterWare process migration in heterogeneous clusters
      Issues with MVAPICH2 and mpirun_rsh or mpispawn
      Issues with ptrace
      Issues with IP Forwarding
      Issues with kernel modules
      Issues with port numbers
      Issues with Spanning Tree Protocol and portfast
      Issues with Gdk
      Caution when modifying Scyld ClusterWare scripts
      Caution using tools that modify config files touched by Scyld ClusterWare
      Running nscd service on master node may cause kickbackdaemon to misbehave
      Beofdisk does not support local disks without partition tables
      Issues with bproc and the getpid() syscall
2 Installation Guide
  2.1 Preface
  2.2 Scyld ClusterWare System Overview
  2.3 Recommended Components
  2.4 Assembling the Cluster
  2.5 Software Components
  2.6 Quick Start Installation
    2.6.1 Introduction
    2.6.2 Network Interface Configuration
      Cluster Public Network Interface
      Cluster Private Network Interface
    2.6.3 Network Security Settings
    2.6.4 Red Hat RHEL7 or CentOS7 Installation
    2.6.5 Scyld ClusterWare Installation
      Configure Yum To Support ClusterWare
      Install ClusterWare
    2.6.6 Trusted Devices
    2.6.7 Compute Nodes
  2.7 Detailed Installation Instructions
    2.7.1 Red Hat Installation Specifics
      Network Interface Configuration
      Network Security Settings
      Package Group Selection
    2.7.2 Updating Red Hat or CentOS Installation
      Updating Using Yum
      Updating Using Media
    2.7.3 Scyld ClusterWare Installation
      Configure Yum To Support ClusterWare
      Install ClusterWare
    2.7.4 Trusted Devices
    2.7.5 Enabling Access to External License Servers
    2.7.6 Post-Installation Configuration
    2.7.7 Scyld ClusterWare Updates
      Updating ClusterWare
  2.8 Cluster Verification Procedures
    2.8.1 Monitoring Cluster Status
      bpstat
      BeoStatus
    2.8.2 Running Jobs Across the Cluster
      Directed Execution with bpsh
      Dynamic Execution with beorun and mpprun
  2.9 Troubleshooting ClusterWare
    2.9.1 Failing PXE Network Boot
    2.9.2 Mixed Uni-Processor and SMP Cluster Nodes
    2.9.3 IP Forwarding
    2.9.4 SSH Traffic
    2.9.5 Device Driver Updates
    2.9.6 Finding Further Information
  2.10 Compute Node Disk Partitioning
    2.10.1 Architectural Overview
    2.10.2 Operational Overview
    2.10.3 Disk Partitioning Procedures
      Typical Partitioning
      Default Partitioning
      Generalized, User-Specified Partitions
      Unique Partitions
    2.10.4 Mapping Compute Node Partitions
  2.11 Changes to Configuration Files
    2.11.1 Changes to Red Hat Configuration Files
    2.11.2 Possible Changes to ClusterWare Configuration