Sun HPC Clustertools 8.2.1C Software User's Guide
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The Component Architecture of Open Mpi: Enabling Third-Party Collective Algorithms∗
THE COMPONENT ARCHITECTURE OF OPEN MPI: ENABLING THIRD-PARTY COLLECTIVE ALGORITHMS∗ Jeffrey M. Squyres and Andrew Lumsdaine Open Systems Laboratory, Indiana University Bloomington, Indiana, USA [email protected] [email protected] Abstract As large-scale clusters become more distributed and heterogeneous, significant research interest has emerged in optimizing MPI collective operations because of the performance gains that can be realized. However, researchers wishing to develop new algorithms for MPI collective operations are typically faced with sig- nificant design, implementation, and logistical challenges. To address a number of needs in the MPI research community, Open MPI has been developed, a new MPI-2 implementation centered around a lightweight component architecture that provides a set of component frameworks for realizing collective algorithms, point-to-point communication, and other aspects of MPI implementations. In this paper, we focus on the collective algorithm component framework. The “coll” framework provides tools for researchers to easily design, implement, and exper- iment with new collective algorithms in the context of a production-quality MPI. Performance results with basic collective operations demonstrate that the com- ponent architecture of Open MPI does not introduce any performance penalty. Keywords: MPI implementation, Parallel computing, Component architecture, Collective algorithms, High performance 1. Introduction Although the performance of the MPI collective operations [6, 17] can be a large factor in the overall run-time of a parallel application, their optimiza- tion has not necessarily been a focus in some MPI implementations until re- ∗This work was supported by a grant from the Lilly Endowment and by National Science Foundation grant 0116050. -
Sun Microsystems: Sun Java Workstation W1100z
CFP2000 Result spec Copyright 1999-2004, Standard Performance Evaluation Corporation Sun Microsystems SPECfp2000 = 1787 Sun Java Workstation W1100z SPECfp_base2000 = 1637 SPEC license #: 6 Tested by: Sun Microsystems, Santa Clara Test date: Jul-2004 Hardware Avail: Jul-2004 Software Avail: Jul-2004 Reference Base Base Benchmark Time Runtime Ratio Runtime Ratio 1000 2000 3000 4000 168.wupwise 1600 92.3 1733 71.8 2229 171.swim 3100 134 2310 123 2519 172.mgrid 1800 124 1447 103 1755 173.applu 2100 150 1402 133 1579 177.mesa 1400 76.1 1839 70.1 1998 178.galgel 2900 104 2789 94.4 3073 179.art 2600 146 1786 106 2464 183.equake 1300 93.7 1387 89.1 1460 187.facerec 1900 80.1 2373 80.1 2373 188.ammp 2200 158 1397 154 1425 189.lucas 2000 121 1649 122 1637 191.fma3d 2100 135 1552 135 1552 200.sixtrack 1100 148 742 148 742 301.apsi 2600 170 1529 168 1549 Hardware Software CPU: AMD Opteron (TM) 150 Operating System: Red Hat Enterprise Linux WS 3 (AMD64) CPU MHz: 2400 Compiler: PathScale EKO Compiler Suite, Release 1.1 FPU: Integrated Red Hat gcc 3.5 ssa (from RHEL WS 3) PGI Fortran 5.2 (build 5.2-0E) CPU(s) enabled: 1 core, 1 chip, 1 core/chip AMD Core Math Library (Version 2.0) for AMD64 CPU(s) orderable: 1 File System: Linux/ext3 Parallel: No System State: Multi-user, Run level 3 Primary Cache: 64KBI + 64KBD on chip Secondary Cache: 1024KB (I+D) on chip L3 Cache: N/A Other Cache: N/A Memory: 4x1GB, PC3200 CL3 DDR SDRAM ECC Registered Disk Subsystem: IDE, 80GB, 7200RPM Other Hardware: None Notes/Tuning Information A two-pass compilation method is -
Cray Scientific Libraries Overview
Cray Scientific Libraries Overview What are libraries for? ● Building blocks for writing scientific applications ● Historically – allowed the first forms of code re-use ● Later – became ways of running optimized code ● Today the complexity of the hardware is very high ● The Cray PE insulates users from this complexity • Cray module environment • CCE • Performance tools • Tuned MPI libraries (+PGAS) • Optimized Scientific libraries Cray Scientific Libraries are designed to provide the maximum possible performance from Cray systems with minimum effort. Scientific libraries on XC – functional view FFT Sparse Dense Trilinos BLAS FFTW LAPACK PETSc ScaLAPACK CRAFFT CASK IRT What makes Cray libraries special 1. Node performance ● Highly tuned routines at the low-level (ex. BLAS) 2. Network performance ● Optimized for network performance ● Overlap between communication and computation ● Use the best available low-level mechanism ● Use adaptive parallel algorithms 3. Highly adaptive software ● Use auto-tuning and adaptation to give the user the known best (or very good) codes at runtime 4. Productivity features ● Simple interfaces into complex software LibSci usage ● LibSci ● The drivers should do it all for you – no need to explicitly link ● For threads, set OMP_NUM_THREADS ● Threading is used within LibSci ● If you call within a parallel region, single thread used ● FFTW ● module load fftw (there are also wisdom files available) ● PETSc ● module load petsc (or module load petsc-complex) ● Use as you would your normal PETSc build ● Trilinos ● -
Introduchon to Arm for Network Stack Developers
Introducon to Arm for network stack developers Pavel Shamis/Pasha Principal Research Engineer Mvapich User Group 2017 © 2017 Arm Limited Columbus, OH Outline • Arm Overview • HPC SoLware Stack • Porng on Arm • Evaluaon 2 © 2017 Arm Limited Arm Overview © 2017 Arm Limited An introduc1on to Arm Arm is the world's leading semiconductor intellectual property supplier. We license to over 350 partners, are present in 95% of smart phones, 80% of digital cameras, 35% of all electronic devices, and a total of 60 billion Arm cores have been shipped since 1990. Our CPU business model: License technology to partners, who use it to create their own system-on-chip (SoC) products. We may license an instrucBon set architecture (ISA) such as “ARMv8-A”) or a specific implementaon, such as “Cortex-A72”. …and our IP extends beyond the CPU Partners who license an ISA can create their own implementaon, as long as it passes the compliance tests. 4 © 2017 Arm Limited A partnership business model A business model that shares success Business Development • Everyone in the value chain benefits Arm Licenses technology to Partner • Long term sustainability SemiCo Design once and reuse is fundamental IP Partner Licence fee • Spread the cost amongst many partners Provider • Technology reused across mulBple applicaons Partners develop • Creates market for ecosystem to target chips – Re-use is also fundamental to the ecosystem Royalty Upfront license fee OEM • Covers the development cost Customer Ongoing royalBes OEM sells • Typically based on a percentage of chip price -
Sun Microsystems
OMPM2001 Result spec Copyright 1999-2002, Standard Performance Evaluation Corporation Sun Microsystems SPECompMpeak2001 = 11223 Sun Fire V40z SPECompMbase2001 = 10862 SPEC license #:HPG0010 Tested by: Sun Microsystems, Santa Clara Test site: Menlo Park Test date: Feb-2005 Hardware Avail:Apr-2005 Software Avail:Apr-2005 Reference Base Base Peak Peak Benchmark Time Runtime Ratio Runtime Ratio 10000 20000 30000 310.wupwise_m 6000 432 13882 431 13924 312.swim_m 6000 424 14142 401 14962 314.mgrid_m 7300 622 11729 622 11729 316.applu_m 4000 536 7463 419 9540 318.galgel_m 5100 483 10561 481 10593 320.equake_m 2600 240 10822 240 10822 324.apsi_m 3400 297 11442 282 12046 326.gafort_m 8700 869 10011 869 10011 328.fma3d_m 4600 608 7566 608 7566 330.art_m 6400 226 28323 226 28323 332.ammp_m 7000 1359 5152 1359 5152 Hardware Software CPU: AMD Opteron (TM) 852 OpenMP Threads: 4 CPU MHz: 2600 Parallel: OpenMP FPU: Integrated Operating System: SUSE LINUX Enterprise Server 9 CPU(s) enabled: 4 cores, 4 chips, 1 core/chip Compiler: PathScale EKOPath Compiler Suite, Release 2.1 CPU(s) orderable: 1,2,4 File System: ufs Primary Cache: 64KBI + 64KBD on chip System State: Multi-User Secondary Cache: 1024KB (I+D) on chip L3 Cache: None Other Cache: None Memory: 16GB (8x2GB, PC3200 CL3 DDR SDRAM ECC Registered) Disk Subsystem: SCSI, 73GB, 10K RPM Other Hardware: None Notes/Tuning Information Baseline Optimization Flags: Fortran : -Ofast -OPT:early_mp=on -mcmodel=medium -mp C : -Ofast -mp Peak Optimization Flags: 310.wupwise_m : -mp -Ofast -mcmodel=medium -LNO:prefetch_ahead=5:prefetch=3 -
Best Practice Guide - Generic X86 Vegard Eide, NTNU Nikos Anastopoulos, GRNET Henrik Nagel, NTNU 02-05-2013
Best Practice Guide - Generic x86 Vegard Eide, NTNU Nikos Anastopoulos, GRNET Henrik Nagel, NTNU 02-05-2013 1 Best Practice Guide - Generic x86 Table of Contents 1. Introduction .............................................................................................................................. 3 2. x86 - Basic Properties ................................................................................................................ 3 2.1. Basic Properties .............................................................................................................. 3 2.2. Simultaneous Multithreading ............................................................................................. 4 3. Programming Environment ......................................................................................................... 5 3.1. Modules ........................................................................................................................ 5 3.2. Compiling ..................................................................................................................... 6 3.2.1. Compilers ........................................................................................................... 6 3.2.2. General Compiler Flags ......................................................................................... 6 3.2.2.1. GCC ........................................................................................................ 6 3.2.2.2. Intel ........................................................................................................ -
Pathscale ENZO - 2011-04-15 by Vincent - Streamcomputing
PathScale ENZO - 2011-04-15 by vincent - StreamComputing - http://www.streamcomputing.eu PathScale ENZO by vincent – Friday, 15 April 2011 http://www.streamcomputing.eu/blog/2011-04-15/pathscale-enzo/ My todo-list gets too large, because there seem so many things going on in GPGPU-world. Therefore the following article is not really complete, but I hope it gives you an idea of the product. ENZO was presented as the alternative to CUDA and OpenCL. In that light I compared them to Intel Array Building Blocks a few weeks ago, not that their technique is comparable in a technical way. PathScale’s CTO mailed me and tried to explain what ENZO really is. This article consists mainly of what he (Mr. C. Bergström) told me. Any questions you have, I will make sure he receives them. ENZO ENZO is a complete GPGPU solution and ecosystem of tools that provide full support for NVIDIA Tesla (kernel driver, runtime, assembler, code generation, front-end programming model and various other things to make a developer’s life easier). Right now it supports the HMPP C/Fortran programming model. HMPP is an open standard jointly developed by CAPS and PathScale. I’ve mentioned HMPP before, as it can translate Fortran and C-code to OpenCL, CUDA and other languages. ENZO’s implementation differs from HMPP by using native front-ends and does hardware optimised code generation. You can learn ENZO in 5 minutes if you’ve done any OpenMP-like programming in the past. For example the Fortran-code (sorry for the missing indenting): !$hmpp simple codelet, target=TESLA1 -
Infiniband and 10-Gigabit Ethernet for Dummies
The MVAPICH2 Project: Latest Developments and Plans Towards Exascale Computing Presentation at OSU Booth (SC ‘19) by Hari Subramoni The Ohio State University E-mail: [email protected] http://www.cse.ohio-state.edu/~subramon Drivers of Modern HPC Cluster Architectures High Performance Accelerators / Coprocessors Interconnects - InfiniBand high compute density, high Multi-core <1usec latency, 200Gbps performance/watt SSD, NVMe-SSD, Processors Bandwidth> >1 TFlop DP on a chip NVRAM • Multi-core/many-core technologies • Remote Direct Memory Access (RDMA)-enabled networking (InfiniBand and RoCE) • Solid State Drives (SSDs), Non-Volatile Random-Access Memory (NVRAM), NVMe-SSD • Accelerators (NVIDIA GPGPUs and Intel Xeon Phi) • Available on HPC Clouds, e.g., Amazon EC2, NSF Chameleon, Microsoft Azure, etc. NetworkSummit Based Computing Sierra Sunway K - Laboratory OSU Booth SC’19TaihuLight Computer 2 Parallel Programming Models Overview P1 P2 P3 P1 P2 P3 P1 P2 P3 Memo Memo Memo Logical shared memory Memor Memor Memor Shared Memory ry ry ry y y y Shared Memory Model Distributed Memory Model Partitioned Global Address Space (PGAS) SHMEM, DSM MPI (Message Passing Interface)Global Arrays, UPC, Chapel, X10, CAF, … • Programming models provide abstract machine models • Models can be mapped on different types of systems – e.g. Distributed Shared Memory (DSM), MPI within a node, etc. • PGAS models and Hybrid MPI+PGAS models are Network Based Computing Laboratorygradually receiving importanceOSU Booth SC’19 3 Designing Communication Libraries for Multi-Petaflop and Exaflop Systems: Challenges Co-DesignCo-Design Application Kernels/Applications OpportuniOpportuni tiesties andand Middleware ChallengeChallenge ss acrossacross Programming Models VariousVarious MPI, PGAS (UPC, Global Arrays, OpenSHMEM), LayersLayers CUDA, OpenMP, OpenACC, Cilk, Hadoop (MapReduce), Spark (RDD, DAG), etc. -
User Manual Revision: C96e096
Bright Cluster Manager 8.1 User Manual Revision: c96e096 Date: Fri Sep 14 2018 ©2018 Bright Computing, Inc. All Rights Reserved. This manual or parts thereof may not be reproduced in any form unless permitted by contract or by written permission of Bright Computing, Inc. Trademarks Linux is a registered trademark of Linus Torvalds. PathScale is a registered trademark of Cray, Inc. Red Hat and all Red Hat-based trademarks are trademarks or registered trademarks of Red Hat, Inc. SUSE is a registered trademark of Novell, Inc. PGI is a registered trademark of NVIDIA Corporation. FLEXlm is a registered trademark of Flexera Software, Inc. PBS Professional, PBS Pro, and Green Provisioning are trademarks of Altair Engineering, Inc. All other trademarks are the property of their respective owners. Rights and Restrictions All statements, specifications, recommendations, and technical information contained herein are current or planned as of the date of publication of this document. They are reliable as of the time of this writing and are presented without warranty of any kind, expressed or implied. Bright Computing, Inc. shall not be liable for technical or editorial errors or omissions which may occur in this document. Bright Computing, Inc. shall not be liable for any damages resulting from the use of this document. Limitation of Liability and Damages Pertaining to Bright Computing, Inc. The Bright Cluster Manager product principally consists of free software that is licensed by the Linux authors free of charge. Bright Computing, Inc. shall have no liability nor will Bright Computing, Inc. provide any warranty for the Bright Cluster Manager to the extent that is permitted by law. -
Setting up a Beowulf Cluster Using Open MPI on Linux
Setting up a Beowulf Cluster Using Open MPI on Linux https://techtinkering.com/2009/12/02/setting-up-a-beowulf-c... TechTinkering ============= Twitter YouTube Email GitHub RSS Setting up a Beowulf Cluster Using Open MPI on Linux 2 December 2009 Lawrence Woodman #Beowulf Clusters #Distributed Processing #High Performance Computing #Linux #MPI #Parallel Processing I have been doing a lot of work recently on Linear Genetic Programming. This requires a great deal of processing power and to meet this I have been using Open MPI to create a Linux cluster. What follows is a quick guide to getting a cluster running. The basics really are very simple and, depending on the size, you can get a simple cluster running in less than half an hour, assuming you already have the machines networked and running Linux. What is a Beowulf Cluster? A Beowulf Cluster is a collection of privately networked computers which can be used for parallel computations. They consist of commodity hardware running open source software such as Linux or a BSD, often coupled with PVM (Parallel Virtual Machine) or MPI (Message Passing Interface). A standard set up will consist of one master node which will control a number of slave nodes. The slave nodes are typically headless and generally all access the same files from a server. They have been referred to as Beowulf Clusters since before 1998 when the original Beowulf HOWTO was released. What is Open MPI? Open MPI is one of the leading MPI-2 implementations used by many of the TOP 500 Supercomputers. It is open source and is developed and maintained by a consortium of academic, research, and industry partners. -
Qlogic Pathscale™ Compiler Suite User Guide
Q Simplify QLogic PathScale™ Compiler Suite User Guide Version 3.0 1-02404-15 Page i QLogic PathScale Compiler Suite User Guide Version 3.0 Q Information furnished in this manual is believed to be accurate and reliable. However, QLogic Corporation assumes no responsibility for its use, nor for any infringements of patents or other rights of third parties which may result from its use. QLogic Corporation reserves the right to change product specifications at any time without notice. Applications described in this document for any of these products are for illustrative purposes only. QLogic Corporation makes no representation nor warranty that such applications are suitable for the specified use without further testing or modification. QLogic Corporation assumes no responsibility for any errors that may appear in this document. No part of this document may be copied nor reproduced by any means, nor translated nor transmitted to any magnetic medium without the express written consent of QLogic Corporation. In accordance with the terms of their valid PathScale agreements, customers are permitted to make electronic and paper copies of this document for their own exclusive use. Linux is a registered trademark of Linus Torvalds. QLA, QLogic, SANsurfer, the QLogic logo, PathScale, the PathScale logo, and InfiniPath are registered trademarks of QLogic Corporation. Red Hat and all Red Hat-based trademarks are trademarks or registered trademarks of Red Hat, Inc. SuSE is a registered trademark of SuSE Linux AG. All other brand and product names are trademarks or registered trademarks of their respective owners. Document Revision History 1.4 New Sections 3.3.3.1, 3.7.4, 10.4, 10.5 Added Appendix B: Supported Fortran intrinsics 2.0 New Sections 2.3, 8.9.7, 11.8 Added Chapter 8: Using OpenMP in Fortran New Appendix B: Implementation dependent behavior for OpenMP Fortran Expanded and updated Appendix C: Supported Fortran intrinsics 2.1 Added Chapter 9: Using OpenMP in C/C++, Appendix E: eko man page. -
Hybrid MPI & Openmp Parallel Programming
Hybrid MPI & OpenMP Parallel Programming MPI + OpenMP and other models on clusters of SMP nodes Rolf Rabenseifner1) Georg Hager2) Gabriele Jost3) [email protected] [email protected] [email protected] 1) High Performance Computing Center (HLRS), University of Stuttgart, Germany 2) Regional Computing Center (RRZE), University of Erlangen, Germany 3) Texas Advanced Computing Center, The University of Texas at Austin, USA Tutorial M02 at SC10, November 15, 2010, New Orleans, LA, USA Hybrid Parallel Programming Slide 1 Höchstleistungsrechenzentrum Stuttgart Outline slide number • Introduction / Motivation 2 • Programming models on clusters of SMP nodes 6 8:30 – 10:00 • Case Studies / pure MPI vs hybrid MPI+OpenMP 13 • Practical “How-To” on hybrid programming 48 • Mismatch Problems 97 • Opportunities: Application categories that can 126 benefit from hybrid parallelization • Thread-safety quality of MPI libraries 136 10:30 – 12:00 • Tools for debugging and profiling MPI+OpenMP 143 • Other options on clusters of SMP nodes 150 • Summary 164 • Appendix 172 • Content (detailed) 188 Hybrid Parallel Programming Slide 2 / 169 Rabenseifner, Hager, Jost Motivation • Efficient programming of clusters of SMP nodes SMP nodes SMP nodes: cores shared • Dual/multi core CPUs memory • Multi CPU shared memory Node Interconnect • Multi CPU ccNUMA • Any mixture with shared memory programming model • Hardware range • mini-cluster with dual-core CPUs Core • … CPU(socket) • large constellations with large SMP nodes SMP board … with several sockets