Inside the RS/6000 SP

Marcelo R. Barrios, Mohammad Arif Kalem, Yoshimichi Kosuge, Philippe Lamarche, Belinda Schmolke, George Sohos, Juan Jose Vasquez, Tim Wherle

International Technical Support Organization
http://www.redbooks.ibm.com

SG24-5145-00

July 1998

Take Note! Before using this information and the product it supports, be sure to read the general information in Appendix A, “Special Notices” on page 379.

First Edition (July 1998)

This edition applies to Version 2 Release 4 of IBM Parallel System Support Programs for AIX (5765-529) for use with the AIX Version 4 Operating System.

Comments may be addressed to:
IBM Corporation, International Technical Support Organization
Dept. HYJ Mail Station P099
522 South Road
Poughkeepsie, New York 12601-5400

When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

© Copyright International Business Machines Corporation 1998. All rights reserved.
Note to U.S. Government Users – Documentation related to restricted rights – Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.

Contents

Figures
Tables
Preface
  The Team That Wrote This Redbook
  Comments Welcome

Part 1. The Big Picture

Chapter 1. History and Design Philosophy
  1.1 Ultra-Large Computing Problems
  1.2 Origins of the SP
  1.3 High-Level System Characteristics
    1.3.1 Scalability
    1.3.2 Use of Mainstream Architectures and Technologies
    1.3.3 Flexibility
    1.3.4 Manageability

Chapter 2. System Architectures
  2.1 Way of Categorizing Computers
  2.2 Single Instruction, Single Data Uniprocessor
    2.2.1 Everyday Examples
    2.2.2 Limitations of SISD Architecture
  2.3 Parallelism
  2.4 Single Instruction, Multiple Data (SIMD) Machines
    2.4.1 Everyday Examples of SIMD Machines
    2.4.2 Limitations of the SIMD Architecture
  2.5 Multiple Instruction, Single Data Machines
  2.6 Multiple Instruction, Multiple Data Machines
    2.6.1 Shared Memory MIMD Machines
    2.6.2 Shared Nothing MIMD Machines

Part 2. System Implementation

Chapter 3. Hardware Components
  3.1 Frames
    3.1.1 Power Supplies
    3.1.2 Hardware Control and Supervision
  3.2 Nodes
    3.2.1 Internal Nodes
    3.2.2 Extension Nodes
  3.3 Control Workstation
    3.3.1 High Availability Control Workstation
    3.3.2 Supported Control Workstations
  3.4 High-Performance Communication Network
    3.4.1 Hardware Concepts
    3.4.2 SP Switch Network
    3.4.3 SP Switch Products
  3.5 Peripheral Devices
  3.6 Configuration Rules
    3.6.1 Hardware Components
    3.6.2 Basic Configuration Rules
    3.6.3 Short Frame Configurations
    3.6.4 Tall Frame Configurations
    3.6.5 Numbering Rules

Chapter 4. Software Components
  4.1 System Data Repository (SDR)
    4.1.1 Introduction
    4.1.2 SDR Data Model
    4.1.3 User Interface to SDR
    4.1.4 SDR Daemons
    4.1.5 Client/Server Communication
    4.1.6 Locking
    4.1.7 Manipulating SDR Data
    4.1.8 Useful SDR Commands
    4.1.9 The SDR Log File
    4.1.10 Backup and Restore of SDR
  4.2 The Hardware Control Subsystem
    4.2.1 What Can Be Monitored and Controlled?
    4.2.2 How It Works
    4.2.3 User Interfaces
  4.3 Overview of Switch Software
    4.3.1 Introduction to the Switch
    4.3.2 Switch Operating Principles
    4.3.3 Primary Node Behavior
    4.3.4 Secondary Nodes Behavior
    4.3.5 Switch Topology File
    4.3.6 Routing in SP Switch
    4.3.7 Switch Initialization
    4.3.8 Switch Clocks
    4.3.9 Switch Log Files
    4.3.10 The out.top File
    4.3.11 Managing the Switch
  4.4 Time Service
    4.4.1 Why Network Time Protocol?
    4.4.2 NTP Architecture
    4.4.3 NTP Modes
  4.5 High Availability Infrastructure (HAI)
    4.5.1 "In the Beginning ..."
    4.5.2 HAI Packaging
    4.5.3 HAI Architecture Overview - "The Picture"
    4.5.4 Topology Services
    4.5.5 Group Services
    4.5.6 Event Management
    4.5.7 Putting It All Together - A Simple Example of Exploiting HAI
  4.6 SP Security
    4.6.1 General Security Concepts
    4.6.2 AIX Security
    ...
