LoadLeveler: Using and Administering

Total pages: 16

File type: PDF, size: 1020 KB

IBM LoadLeveler
Version 5 Release 1

Using and Administering

SC23-6792-04

Note: Before using this information and the product it supports, read the information in "Notices" on page 423.

This edition applies to version 5, release 1, modification 0 of IBM LoadLeveler (product numbers 5725-G01, 5641-LL1, 5641-LL3, 5765-L50, and 5765-LLP) and to all subsequent releases and modifications until otherwise indicated in new editions. This edition replaces SC23-6792-03.

© Copyright 1986, 1987, 1988, 1989, 1990, 1991 by the Condor Design Team.
© Copyright IBM Corporation 1986, 2012.
US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Figures  vii
Tables  ix

About this information  xi
    Who should use this information  xi
    Conventions and terminology used in this information  xi
    Prerequisite and related information  xii
    How to send your comments  xiii

Summary of changes  xv

Part 1. Overview of LoadLeveler concepts and operation  1

Chapter 1. What is LoadLeveler?  3
    LoadLeveler basics  4
    LoadLeveler: A network job management and scheduling system  4
    Job definition  5
    Machine definition  5
    How LoadLeveler schedules jobs  7
    How LoadLeveler daemons process jobs  8
    The master daemon  9
    The Schedd daemon  10
    The startd daemon  12
    The region manager daemon  14
    The resource manager daemon  15
    The kbdd daemon  15
    The negotiator daemon  15
    The LoadLeveler job cycle  16
    LoadLeveler job states  19
    Consumable resources  22
    Consumable resources and Workload Manager  23
    Overview of reservations  24
    Fair share scheduling overview  27

Chapter 2. Getting a quick start using the default configuration  29
    What you need to know before you begin  29
    Using the default configuration files  29
    LoadLeveler for Linux quick start  30
    Quick installation  30
    Quick configuration  31
    Quick verification  31
    Post-installation considerations  32
    Starting LoadLeveler  32
    Directory considerations  33

Chapter 3. What operating systems are supported by LoadLeveler?  35
    LoadLeveler for AIX and LoadLeveler for Linux compatibility  35
    Restrictions for LoadLeveler for Linux  36
    Features not supported in LoadLeveler for Linux  36
    Restrictions for LoadLeveler for AIX and LoadLeveler for Linux mixed clusters  36

Part 2. Configuring and managing the LoadLeveler environment  37

Chapter 4. Configuring the LoadLeveler environment  39
    The master configuration file  40
    Setting the LoadLeveler user  40
    Setting the configuration source  41
    Overriding the shared memory key  41
    File-based configuration  42
    Database configuration option  43
    Understanding remotely configured nodes  43
    Using the configuration editor  44
    Modifying configuration data  45
    Defining LoadLeveler administrators  45
    Defining a LoadLeveler cluster  45
    Defining LoadLeveler machine characteristics  59
    Defining security mechanisms  60
    Defining usage policies for consumable resources  65
    Gathering job accounting data  65
    Managing job status through control expressions  72
    Tracking job processes  73
    Querying multiple LoadLeveler clusters  74
    Handling switch-table errors  75
    Providing additional job-processing controls through installation exits  75

Chapter 5. Defining LoadLeveler resources to administer  89
    Defining machines  89
    Planning considerations for defining machines  90
    Machine_group stanza format and keyword summary  90
    Machine substanza format and keyword summary  91
    Machine stanza format and keyword summary  91
    Default values for machine_group and machine stanzas  92
    Examples of machine_group and machine stanzas  92
    Dynamic adapter discovery  93
    LoadLeveler adapter and node status monitoring  94
    Defining classes  94
    Using limit keywords  94
    Allowing users to use a class  97
    Class stanza format and keyword summary  97
    Examples: Class stanzas  98
    Defining user substanzas in class stanzas  99
    Examples: Substanzas  99
    Defining users  102
    User stanza format and keyword summary  102
    Examples: User stanzas  102
    Defining groups  103
    Group stanza format and keyword summary  104
    Examples: Group stanzas  104
    Defining clusters  104
    Cluster stanza format and keyword summary  104
    Examples: Cluster stanzas  105
    Defining regions  106
    Region stanza format and keyword summary  106
    Examples: Region stanzas  106

Chapter 6. Performing additional administrator tasks  109
    Setting up the environment for parallel jobs  110
    Scheduling considerations for parallel jobs  110
    Steps for reducing job launch overhead for parallel jobs  111
    Steps for allowing users to submit interactive POE jobs  112
    Setting up a class for parallel jobs  112
    Striping when some networks fail  113
    Setting up a parallel master node  113
    Using the BACKFILL scheduler  114
    Tips for using the BACKFILL scheduler  116
    Example: BACKFILL scheduling  117
    Data staging  117
    Configuring LoadLeveler to support data staging  118
    Using an external scheduler  119
    Replacing the default LoadLeveler scheduling algorithm with an external scheduler  120
    Customizing the configuration file to define an external scheduler  121
    Example: Retrieving specific information  122
    Example: Changing scheduler types  122
    Preempting and resuming jobs  122
    Overview of preemption  123
    Planning to preempt jobs  124
    Steps for configuring a scheduler to preempt jobs  126
    Configuring LoadLeveler to support reservations  127
    Steps for configuring reservations in a LoadLeveler cluster  128
    Steps for integrating LoadLeveler with the Workload Manager  133
    LoadLeveler support for checkpointing jobs  135
    Checkpoint keyword summary  136
    Planning considerations for checkpointing jobs  136
    Additional planning considerations for checkpointing MetaCluster HPC jobs on AIX  138
    Checkpoint and restart limitations  138
    Submitting a MetaCluster HPC checkpoint job to LoadLeveler  138
    job_1.cmd - A checkpointable job command file  138
    Using the llckpt command to checkpoint a job step  139
    Restarting a job step from a checkpoint  140
    Making periodic checkpoints  142
    Using the ckpt_dir and ckpt_subdir keywords  143
    Removing old checkpoint files  144
    Using the ckpt_execute_dir keyword  144
    Initiating a checkpoint using the ll_ckpt() API  146
    LoadLeveler scheduling affinity support  147
    Configuring LoadLeveler to use scheduling affinity  148
    LoadLeveler multicluster support  149
    Configuring a LoadLeveler multicluster  150
    LoadLeveler Blue Gene support  153
    Configuring LoadLeveler Blue Gene support  155
    Blue Gene reservation support  157
    Blue Gene fair share scheduling support  157
    Blue Gene heterogeneous memory support  157
    Blue Gene preemption support  157
    Using fair share scheduling  158
    Fair share scheduling keywords  158
    Reconfiguring fair share scheduling keywords  161
    Example: three groups share a LoadLeveler cluster  161
    Example: two thousand students share a LoadLeveler cluster  162
    Querying information about fair share scheduling  163
    Resetting fair share scheduling  163
    Saving historic data  163
    Restoring saved historic data  164
    Procedure for recovering a job spool  164
    Configuring and using island scheduling  165
    Energy aware job support  166
    S3 state support  166

Part 3. Submitting and managing LoadLeveler jobs  169

Chapter 7. Building and submitting jobs  171
    Building a job command file  171
    Using multiple steps in a job command file  172
    Examples: Job command files  173
    Editing job command files  176
    Defining resources for a job step  177
    Submitting jobs requesting data staging  177
    Working with coscheduled job steps  178
    Submitting coscheduled job steps  178
    Determining priority for coscheduled job steps  178
    Supporting preemption of coscheduled job steps  179
    Coscheduled job steps and commands and APIs  179
    Termination of coscheduled steps  179
    Using bulk data transfer  180
    Preparing a job for checkpoint/restart  180
    Preparing a job for preemption  183
    Submitting a job command file  183
    Job state monitoring  184
    Submitting a job using a submit-only machine  184
    Working with parallel jobs  184
    Step for controlling whether LoadLeveler copies environment variables to all executing nodes  185
    Ensuring that parallel jobs in a cluster run on the correct levels of PE and LoadLeveler software  185
    Task-assignment considerations  186
    Submitting jobs that use striping

64-bit support for configuration file keywords and expressions  232
Configuration keyword descriptions  233
User-defined keywords  284
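The Chapter 7 entries above revolve around the job command file: an ordinary script in which LoadLeveler directives are embedded as lines beginning with "# @", submitted with the llsubmit command. As a rough illustration only, not taken from this manual, the following is a minimal sketch of a serial job command file; the keyword set shown is a commonly used subset, and the class name is an assumption that must match a class your administrator has defined (see Chapter 7 for the authoritative syntax).

    #!/bin/sh
    # Minimal LoadLeveler job command file (illustrative sketch only).
    # Lines beginning with "# @" are LoadLeveler directives; other "#" lines
    # are ordinary comments. The class name "serial" is an assumed example.
    # @ job_name = hello
    # @ class = serial
    # @ output = hello.$(jobid).out
    # @ error = hello.$(jobid).err
    # @ wall_clock_limit = 00:05:00
    # @ queue
    # With no executable keyword, the remainder of this file is run as the job step.
    echo "Hello from $(hostname)"

Such a file would typically be submitted with "llsubmit hello.cmd" and monitored with "llq"; the exact keywords and commands supported in Version 5 Release 1 are documented in the manual itself.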