IBM High Performance Computing Cluster Health Check

Learn how to verify cluster functionality

See sample scenarios

Understand best practices

Dino Quintero, Ross Aiken, Shivendra Ashish, Manmohan Brahma, Murali Dhandapani, Rico Franke, Jie Gong, Markus Hilger, Herbert Mehlhose, Justin I. Morosi, Thorsten Nitsch, Fernando Pizzano

ibm.com/redbooks

International Technical Support Organization

IBM High Performance Computing Cluster Health Check

February 2014

SG24-8168-00

Note: Before using this information and the product it supports, read the information in “Notices” on page vii.

First Edition (February 2014)

This edition applies to Red Hat Enterprise Linux 6.3, IBM Parallel Environment (PE) (Versions 1.3.0.5, 1.3.0.6, and 1.3.0.7), GPFS 3.5.0-9, xCAT 2.8.1 and 2.8.2, Mellanox OFED 2.0-2.0.5 and 1.5.3-4.0.22.3, Mellanox UFM 4.0.0 build 19, iperf 2.0.5, IOR version 2.10.3 and 3.0.1, mdtest version 1.9.1, and stream version 5.10.

© Copyright International Business Machines Corporation 2014. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Notices ...... vii
Trademarks ...... viii

Preface ...... ix
Authors ...... ix
Now you can become a published author, too! ...... xi
Comments welcome ...... xii
Stay connected to IBM Redbooks ...... xii

Chapter 1. Introduction ...... 1
1.1 Overview of the IBM HPC solution ...... 2
1.2 Why we need a methodical approach for cluster consistency checking ...... 2
1.3 Tools and interpreting their results for HW and SW states ...... 2
1.3.1 General Parallel File System ...... 3
1.3.2 Extreme Cloud Administration Toolkit ...... 3
1.3.3 The OpenFabrics Enterprise Distribution ...... 4
1.3.4 Red Hat Package Manager ...... 4
1.4 Tools and interpreting their results for identifying performance inconsistencies ...... 4
1.5 Template of diagnostics steps that can be used (checklists) ...... 5

Chapter 2. Key concepts and interdependencies ...... 7
2.1 Introduction to High Performance Computing ...... 8
2.2 Rationale for clusters ...... 8
2.3 Definition of an HPC Cluster ...... 8
2.4 Definition of a “healthy cluster” ...... 9
2.5 HPC preferred practices ...... 10

Chapter 3. The health lifecycle methodology ...... 13
3.1 Why a methodology is necessary ...... 14
3.2 The health lifecycle methodology ...... 14
3.3 Practical application of the health lifecycle methodology ...... 15
3.3.1 Deployment phase ...... 16
3.3.2 Verification or pre-production readiness phase ...... 19
3.3.3 Production phase (monitoring) ...... 21

Chapter 4. Cluster components reference model ...... 23
4.1 Overview of installed cluster systems ...... 24
4.2 ClusterA nodes hardware description ...... 25
4.3 ClusterA description ...... 25
4.4 ClusterB nodes hardware description ...... 26
4.5 ClusterB software description ...... 27
4.6 ClusterC nodes hardware description ...... 27
4.7 ClusterC software description ...... 28
4.8 Interconnect infrastructure ...... 28
4.8.1 InfiniBand ...... 28
4.8.2 Ethernet Infrastructure ...... 30
4.8.3 IP Infrastructure ...... 30
4.9 GPFS cluster ...... 33

Chapter 5. Toolkits for verifying health (individual diagnostics) ...... 37
5.1 Introduction to CHC ...... 38
5.1.1 Requirements ...... 38
5.1.2 Installation ...... 38
5.1.3 Configuration ...... 38
5.1.4 Usage ...... 43
5.2 Tool output processing methods ...... 44
5.2.1 The plain mode ...... 44
5.2.2 The xcoll mode ...... 44
5.2.3 The compare (config_check) mode ...... 45
5.3 Compute node ...... 48
5.3.1 The leds check ...... 48
5.3.2 The cpu check ...... 49
5.3.3 The memory check ...... 49
5.3.4 The os check ...... 50
5.3.5 The firmware check ...... 50
5.3.6 The temp check ...... 50
5.3.7 The run_daxpy check ...... 51
5.3.8 The run_dgemm check ...... 52
5.4 Ethernet network: Port status, speed, bandwidth, and port errors ...... 53
5.4.1 Ethernet firmware and drivers ...... 53
5.4.2 Ethernet port state ...... 54
5.4.3 Network settings ...... 55
5.4.4 Bonding ...... 56
5.5 InfiniBand: Port status, speed, bandwidth, port errors, and subnet manager ...... 57
5.5.1 The hca_basic check ...... 57
5.5.2 The ipoib check ...... 58
5.5.3 The switch_module check ...... 59
5.5.4 The switch_ntp check ...... 60
5.5.5 The switch_inv check ...... 60
5.5.6 The switch_health check ...... 61
5.5.7 The switch_clk check ...... 61
5.5.8 The switch_code check ...... 61
5.5.9 The run_ppping check ...... 62
5.5.10 The run_jlink check ...... 63
5.5.11 The run_ibtools check ...... 64
5.6 File system: Accessibility, usage, and read/write performance ...... 66
5.6.1 The fs_usage check ...... 67
5.6.2 NFS file system ...... 67
5.6.3 GPFS file system ...... 69

Appendix A. Commonly used tools ...... 75
Overview of non-CHC tools ...... 76
InfiniBand-related tools ...... 76
Example of ibdiagnet ...... 76
Examples of ib_send_bw, ib_read_bw, and ib_read_lat ...... 80
GPFS related tools ...... 83
Example of nsdperf ...... 83
Example of gpfsperf ...... 86
Network-related tools ...... 87
Example of iperf ...... 87
Disk benchmarks ...... 89
Example of IOR ...... 90

Example of mdtest ...... 93
Node-specific tools ...... 94
Example of stream ...... 94
Example of dgemm ...... 97
Example of daxpy ...... 98
IBM HPC Central ...... 98

Appendix B. Tools and commands outside of the toolkit ...... 99
Remote management access ...... 100
Firmware versions ...... 100
IBM Parallel Operating Environment verification ...... 102

Related publications ...... 105
IBM Redbooks ...... 105
Online resources ...... 105
Help from IBM ...... 106

Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.

Trademarks

IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

AIX®, AIX 5L™, BladeCenter®, developerWorks®, eServer™, Global Technology Services®, GPFS™, HACMP™, IBM®, iDataPlex®, Intelligent Cluster™, Jazz™, POWER®, Power Systems™, POWER7®, PowerHA®, PureFlex™, Rational®, Redbooks®, Redbooks (logo)®, System i®, System p®, System x®, SystemMirror®

The following terms are trademarks of other companies:

Intel, Intel Xeon, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Microsoft and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Other company, product, or service names may be trademarks or service marks of others.

Preface

This IBM® Redbooks® publication provides information about aspects of performing infrastructure health checks, such as checking the configuration and verifying the functionality of the common subsystems (nodes or servers, switch fabric, parallel file system, job management, problem areas, and so on).

This IBM Redbooks publication documents how to monitor the overall health of the cluster infrastructure, to deliver cost-effective, highly scalable, and robust solutions to technical computing clients.

This IBM Redbooks publication is targeted toward technical professionals (consultants, technical support staff, IT Architects, and IT Specialists) responsible for delivering cost-effective Technical Computing and IBM High Performance Computing (HPC) solutions to optimize business results, product development, and scientific discoveries. This book provides a broad understanding of a new architecture.

Authors

This book was produced by a team of specialists from around the world working at the International Technical Support Organization (ITSO), Poughkeepsie Center.

Dino Quintero is a complex solutions project leader and IBM Senior Certified IT Specialist with the ITSO in Poughkeepsie, NY. His areas of expertise include enterprise continuous availability, enterprise systems management, system virtualization, technical computing, and clustering solutions. He is an Open Group Distinguished IT Specialist. Dino holds a Master of Computing Information Systems degree and a Bachelor of Science degree in computer science from Marist College.

Ross Aiken is an Emerging Technologies Solution Architect for IBM Systems and Technology Group (STG) Worldwide Software Defined Systems. He has extensive experience in High Performance Computing (HPC) optimization and large file system scalability performance, for both x86-64 and IBM Power-based cluster architectures. His primary focus is on working with clients to develop solution requirements around emerging technologies. However, he often manages first-of-a-kind field integration plans, identifying and mitigating technical risks throughout clients’ first testing, and then supporting them on deployments. He holds a B.S. in statistics from Purdue University, an M.P.A in public administration from Purdue, and a DPhil in political science from Nuffield College at the University of Oxford.

Shivendra Ashish is a software developer in IBM PowerHA® Development, and holds a master’s degree in computer science and applications. He has over six years of experience working across various Power virtualization and clustering technologies. He also has extensive experience in configuring and managing IBM SAN storage subsystems, and has applied this expertise to various client use cases and environments.

Manmohan Brahma has been with IBM for the past 3 years. He works as an Architect for High Performance Computing with the IBM System and Technology Group in Bangalore, India. His team focuses on the design of HPC solutions for customers across India and South Asia. He has designed major supercomputing systems such as the fastest supercomputing system in India, to be commissioned at the Ministry of Earth Sciences, and VIRGO, commissioned at the Indian Institute of Technology, Madras. Apart from IBM, he has worked as a research scientist of HPC applications in various Government of India research and development (R&D) labs, which include the Council of Scientific and Industrial Research (CSIR) Fourth Paradigm Institute and the Defence Research & Development Organization (DRDO). He holds a bachelor’s degree in computer science & engineering from Roland Institute of Technology, India. He is dedicated and enthusiastic about his major areas of interest: HPC, cloud, and distributed computing.

Murali Dhandapani is a Certified IT Specialist in Systems Management with IBM India. He is working for the IBM India Software Lab Operations team, where he is a technical lead for IBM Rational® Jazz™ products infrastructure, high availability, and disaster recovery deployment. He has 10 years of experience, and his areas of expertise include Linux, IBM AIX®, IBM POWER® Virtualization, PowerHA SystemMirror®, System Management, and Rational tools. Murali has a Master of Computer Science degree. He is an IBM developerWorks® Contributing Author, IBM Certified Specialist in IBM System p® administration, and an IBM eServer™ Certified Systems Expert - pSeries High Availability Cluster Multi-Processing (IBM HACMP™).

Rico Franke is an IT Specialist for x86, IBM PureFlex™, and HPC Systems in Germany. He has more than 10 years of experience in supporting IBM System x® and IBM System i® solutions. Currently, he leads the operational support team at the Leibniz Supercomputing Centre, servicing the warm-water cooled SuperMUC. He holds an engineering degree in information technology.

Jie Gong is a Software Engineer in Cluster System for HPC in the IBM China Systems and Technology Laboratory (CSTL). Since joining the team two years ago, he has worked with the IBM HPC Clustering with IBM Power 775, and IBM HPC Clustering with the InfiniBand switch and IBM Flex Systems. He has eight years of experience with IBM Power Systems™. He received Red Hat Certified Engineer (RHCE) certification in 2003.

Markus Hilger has been a corporate student of business information technology in Berlin, Germany since 2011. He has experience in supporting System x and programming. He has worked in the operational support team at the Leibniz Supercomputing Centre (LRZ) for 3 months, and developed a cluster health check tool for it. He helped develop the IBM HPC cluster health check tool.

Herbert Mehlhose is an IT Specialist working for IBM Germany. He has been with IBM for 29 years, working in many different areas, including networking and storage hardware and software. Currently, he is concentrating on HPC clusters working for IBM Global Technology Services® (GTS), and has been a member of teams responsible for large cluster installations since 2012.

Justin I. Morosi is a Consulting IT Specialist working for IBM STG as a Worldwide Solution Architect. He has worked for IBM for over 15 years, and has more than 22 years of consulting and solution design experience. He holds both a bachelor's and master’s degree from California Polytechnic State University - San Luis Obispo. He also holds numerous industry-recognized certifications from Cisco, Microsoft, VMware, Red Hat, and IBM. His areas of expertise include HPC, storage, systems management, and virtualization.

Thorsten Nitsch is an IT Specialist working for IBM Germany. He joined an IBM subsidiary in 2001, and has worked with Linux-based and UNIX-based data center infrastructures since then. During the past six years, he has worked mainly with IBM SAP appliances, such as SAP HANA and SAP BWA, from planning to implementation in client projects. In addition, he worked with a large BlueGene/P HPC cluster at a large German research center, and was part of the IBM Systems solution for SAP HANA development team. Thorsten holds Linux Professional Institute (LPIC-3), RHCE, and Novell Certified Linux Professional (NCLP) certifications, and is a SAP Certified Technology Associate - SAP HANA 1.0.

Fernando Pizzano is a hardware and software bring-up (the initial deployment of a cluster) Team Lead in the IBM Advanced Clustering Technology Development Lab, Poughkeepsie, New York. He has over 10 years of IT experience, the last five years in HPC development. His areas of expertise include AIX, pSeries High Performance Switch, and IBM System p hardware. He holds an IBM certification in pSeries AIX 5L™ System Support.

Thanks to the following people for their contributions to this project:

Ella Buslovich International Technical Support Organization, Poughkeepsie Center

Mark Atkins, John Lewars, John Dorfner IBM US

Ravindra Sure IBM India

Torsten Bloth IBM Germany

Now you can become a published author, too!

Here’s an opportunity to spotlight your skills, grow your career, and become a published author—all at the same time! Join an ITSO residency project and help write a book in your area of expertise, while honing your experience using leading-edge technologies. Your efforts will help to increase product acceptance and customer satisfaction, as you expand your network of technical contacts and relationships. Residencies run from two to six weeks in length, and you can participate either in person or as a remote resident working from your home base.

Learn more about the residency program, browse the residency index, and apply online at: ibm.com/redbooks/residencies.html

Comments welcome

Your comments are important to us!

We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways:
 Use the online Contact us review Redbooks form: ibm.com/redbooks
 Send your comments in an email: [email protected]
 Mail your comments: IBM Corporation, International Technical Support Organization, Dept. HYTD, Mail Station P099, 2455 South Road, Poughkeepsie, NY 12601-5400

Stay connected to IBM Redbooks

 Find us on Facebook: http://www.facebook.com/IBMRedbooks
 Follow us on Twitter: http://twitter.com/ibmredbooks
 Look for us on LinkedIn: http://www.linkedin.com/groups?home=&gid=2130806
 Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks weekly newsletter: https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm
 Stay current on recent Redbooks publications with RSS Feeds: http://www.redbooks.ibm.com/rss.html


Chapter 1. Introduction

This IBM Redbooks publication mainly provides information about IBM High Performance Computing (HPC) clusters. Therefore, in the rest of this book, unless otherwise mentioned, the term cluster represents an IBM HPC cluster. A computer cluster is a group of connected computers that work together. In many respects, they act as a single system.

We describe the concepts, processes, and methodologies used to achieve and maintain a “healthy” state for an IBM HPC system, in both the pre-production stage and the production stage. In the context of this book, healthy means performing as expected, or well-configured and working correctly.

The following topics are described in this chapter:
 Overview of the IBM HPC solution
 Why we need a methodical approach for cluster consistency checking
 Tools and interpreting their results for HW and SW states
 Tools and interpreting their results for identifying performance inconsistencies
 Template of diagnostics steps that can be used (checklists)

1.1 Overview of the IBM HPC solution

HPC, also known as supercomputing or Technical Computing, is implemented using computers that have the fastest contemporary processing capacity. Usually, all of the computers in an HPC cluster are connected tightly with a high-throughput network.

For many years, IBM has provided HPC solutions at different scales that deliver extreme performance for demanding workloads, including weather forecasting, petroleum reservoir modeling, and computational chemistry. These systems are used in many types of industries.

1.2 Why we need a methodical approach for cluster consistency checking

The whole HPC cluster is a complex system that involves a variety of hardware and software. Different types of computer architectures, network intercommunication techniques, storage equipment, operating systems, and administrative tools are used in an HPC environment. Because of this complexity, you need to carefully plan the installation and bring-up (the initial deployment of a cluster) to ensure that the deployment and maintenance work can be performed smoothly, and that mistakes can be prevented or recovered from quickly.

There are additional considerations:
 The HPC cluster systems are expensive.
 They use a large amount of electricity.
 Most of the hardware components require appropriate cooling systems. Some of the HPC cluster systems use water-cooling technologies that enable a significant portion of the energy spent on HPC to be recovered in the form of chilled water, which can then be used to cool other parts of the computing center.

To implement a fully working HPC cluster system, all of these different components require correct deployment, and they need to work together. Usually, it takes a group of people to work for dozens of weeks to complete the full deployment of a complex cluster system.

People who plan and organize the cluster system should use a cluster health check methodology to improve the efficiency of deployment and maintenance, and to prevent problems. For the system administrators who work on cluster maintenance daily, we provide advice and guidelines for cluster health checking and verification. We also provide suggestions for which tools to use.

1.3 Tools and interpreting their results for HW and SW states

You use tools to verify hardware and software states and configuration. Normally, the health check involves a series of examination and verification steps. You need to keep uniformity among different individual computers within a cluster system. Usually, thousands of individual computers are involved in one HPC cluster system. To check them manually, one by one, is not a good approach, because manual operations are needlessly time-consuming and error prone. Therefore, some kind of automated checking and verification facilities are necessary.

However, for a production implementation, a single, fully automated checking and verification facility for an HPC cluster system does not exist. This is because it is too difficult to develop such a comprehensive facility. There are too many different computer architectures, different types of operating systems, different types of input/output (I/O) adapters, and different types of storage systems.

To implement such a fully automated checking and verification facility, the verification tool must be tested and verified in an environment that includes various combinations of all of these types of different components. This is impossible, because there are too many possible combinations. For these reasons, the administrators of an HPC cluster system should implement their own checking and verification tools; that is, use and adapt existing tools to their unique cluster environment.

Most of the contents of this book describe some fundamentals about maintaining and preserving a healthy state for an HPC cluster system. We explain the following concepts:
 What to do
 Why to do these things
 Examples illustrating certain points

Therefore, based on these principles, system administrators can develop their own checking and verification tools.

In the latter part of this chapter, we introduce a few simple tools used in the HPC cluster environments. Some of these tools are part of the HPC system software, and some of them are actually HPC cluster management tools. We introduce them here because they appear in most of the later chapters and examples.

1.3.1 General Parallel File System

The IBM General Parallel File System (IBM GPFS™) is a high-performance, shared-disk file system that provides data access from all nodes in a homogeneous or heterogeneous cluster of IBM UNIX servers running either the AIX or the Linux operating system.
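As a minimal illustration, the following standard GPFS administration commands give a quick view of cluster membership and daemon state. This is only a sketch: run the commands from a node that is already part of a GPFS cluster, and expect output details to vary by GPFS release.

    # Show the GPFS cluster definition: member nodes, quorum nodes, and manager roles
    mmlscluster

    # Show the GPFS daemon state on all nodes (active, down, or arbitrating)
    mmgetstate -a

    # Show which nodes currently have each GPFS file system mounted
    mmlsmount all -L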

1.3.2 Extreme Cloud Administration Toolkit

IBM Extreme Cloud Administration Toolkit (xCAT) is open-source cluster computing management software used for deployment and administration:
 Provisions operating systems on different architectures:
– Linux
– AIX
– IBM System x
– IBM Power Systems
 Creates and manages clusters
 Installs and manages many cluster machines in parallel
 Remotely manages the system:
– Remote console
– Distributed shell

For additional information see the xCAT website:

http://xcat.sourceforge.net/
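As a brief, hedged sketch of how xCAT is typically used for these tasks (the node name node01 and the node group compute below are hypothetical; substitute your own node definitions):

    # List the nodes that are defined in the xCAT database
    nodels

    # Show the full definition of one node (hardware control, OS image, network data)
    lsdef node01

    # Query the power state of a node group, then run a command on all of its members in parallel
    rpower compute stat
    xdsh compute "uptime"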

1.3.3 The OpenFabrics Enterprise Distribution

The OpenFabrics Enterprise Distribution (OFED) is open-source software for Remote Direct Memory Access (RDMA) and operating system kernel bypass applications. OFED includes kernel-level drivers, channel-oriented RDMA and send/receive operations, operating system kernel bypasses, and kernel-level and user-level application programming interfaces (APIs).

OFED is available for many Linux distributions, including Red Hat Enterprise Linux (RHEL), SUSE Linux Enterprise Server (SLES), and so on.

OFED includes many user-space tools that can be used to verify and diagnose the state of InfiniBand interfaces and topology.
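For example, the following commonly available OFED diagnostics can be used for a first look at the local adapters and the fabric. Tool availability and output format can vary by OFED version and distribution, so treat this only as an illustration:

    # Show the local HCA ports, their state, and the link rate
    ibstat

    # Show device, firmware, and capability details for the local HCA
    ibv_devinfo

    # Report the state and speed of every link in the fabric (requires access to the subnet)
    iblinkinfo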

For additional information, see the OpenFabrics website:

http://www.openfabrics.org/

For additional information about the Mellanox-enhanced version of OFED, see the Mellanox website:

http://www.mellanox.com/page/products_dyn?product_family=26

1.3.4 Red Hat Package Manager

The Red Hat Package Manager (RPM) is a powerful package management system capable of installing, removing, querying, verifying, and updating software packages. Each software package consists of an archive of files, along with information about the package.

RPM is widely used in different Linux distributions, including RHEL, SLES, and others. RPM is also a built-in component of the IBM AIX operating system.
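The following query and verify operations illustrate typical health-check uses of RPM. The openssh package and the /usr/bin/ssh path are only examples; substitute the packages and files that matter in your environment:

    # List all installed packages with their versions
    rpm -qa

    # Find which package owns a file, and display that package's metadata
    rpm -qf /usr/bin/ssh
    rpm -qi openssh

    # Verify an installed package against the RPM database (size, checksum, permissions, and so on)
    rpm -V openssh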

The RPM archive format is an official part of the Linux Standard Base (LSB):

http://linuxfoundation.org/collaborate/workgroups/lsb

RPM is released as no-charge software under the GNU Lesser General Public License (LGPL) distribution license:

http://rpm5.org/

1.4 Tools and interpreting their results for identifying performance inconsistencies

As we mentioned in the previous section, there is no fully automated tool for HPC cluster system checking and verification. In addition, no such automated tool exists for interpreting system checking results and diagnostics.

In a complex, gigantic cluster environment, sometimes the point of failure can be detected and fixed easily. However, most of the time it is difficult to detect and fix.

1.5 Template of diagnostics steps that can be used (checklists)

For quick reference, this section includes a few example checklists for basic verification, checking, and diagnostics. Before you diagnose your problem, we suggest that you go through these checklists.

Management nodes
Use the following checklist for management nodes:
 There are at least two management nodes for redundancy (optional but suggested).
 There is enough available disk space for service nodes, compute nodes, and utility nodes.
 The clock and time zone settings are correct, and there is a utility to maintain them. A management node can act as a Network Time Protocol (NTP) server.
 The hardware connections are functional in an xCAT environment. Use the lshwconn -l command to verify this (see the example after this checklist).
 There are redundant Ethernet connections.
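A minimal sketch of how some of these items can be checked from the management node. The lshwconn command is the xCAT command referenced in the checklist; /install is the default xCAT installation directory and might differ on your system:

    # Verify the hardware control connections that xCAT uses
    lshwconn -l

    # Check free disk space in the directory that serves operating system images
    df -h /install

    # Check the clock, the time zone, and NTP peer status
    date
    ntpq -p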

Service nodes (if any)
Use the following checklist for service nodes:
 There are at least two service nodes for redundancy.
 The service nodes can be accessed from the management node using Ethernet.
 There is enough available disk space for compute nodes.
 The clock and time zone settings are correct, and an NTP client is synchronized to an NTP server.

Ethernet switch
Use the following checklist for an Ethernet switch:
 All of the Ethernet switch LEDs of ports with a cable on them are flashing.
 All switches have remote execution configured from xCAT (you can run xdsh to them).
 All switches have the correct clock and time zone setting, and an NTP client is synchronized to an NTP server.

InfiniBand switch (if any)
Use the following checklist for an InfiniBand switch:
 All of the LEDs of ports with a cable on them are flashing.
 None of the service LEDs indicate a problem (shown in the switch hardware documentation).
 All switches are pingable and Secure Shell (SSH)-able from the management server (see the example after this checklist).
 The naming convention for each switch is based on its physical location.
 All switches of the same type are running at the same firmware and software levels.
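A simple reachability sweep along the lines of this checklist can be scripted as follows. The switch names and the admin user are hypothetical, and password-less SSH to the switches is assumed:

    # Confirm that every InfiniBand switch answers ping and accepts SSH
    for sw in ibsw01 ibsw02 ibsw03; do
        ping -c 1 -w 2 "$sw" > /dev/null 2>&1 && echo "$sw: ping OK" || echo "$sw: ping FAILED"
        ssh -o ConnectTimeout=5 admin@"$sw" exit > /dev/null 2>&1 && echo "$sw: ssh OK" || echo "$sw: ssh FAILED"
    done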

InfiniBand Unified Fabric Manager (if any)
Use the following checklist for InfiniBand Unified Fabric Manager (UFM):
 No errors are displayed on the UFM dashboard.
 The UFM health check function returns no errors.

 The UFM dashboard traffic monitor shows average congestion is less than maximum congestion.

Compute nodes
Use the following checklist for compute nodes (an example of checking several of these items in parallel follows the list):
 The naming convention for each node is based on its physical location.
 All of the compute nodes have the same hardware configuration.
 All of the compute nodes run the same operating system level.
 The software packages on different compute nodes are the same.
 The clock and time zone settings are correct, and an NTP client is synchronized to an NTP server.
 The version levels of all of the software packages on different compute nodes are the same.
 Each of the compute nodes can ping every other compute node successfully through Ethernet.
 Each of the compute nodes can ping every other compute node successfully through InfiniBand.
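Several of these uniformity checks can be run in parallel with xdsh and summarized with xcoll, which groups nodes that return identical output so that outliers stand out. The node group compute is hypothetical; use your own xCAT group names:

    # Compare operating system level, kernel, and installed package count across all compute nodes
    xdsh compute "cat /etc/redhat-release" | xcoll
    xdsh compute "uname -r" | xcoll
    xdsh compute "rpm -qa | wc -l" | xcoll

    # Compare clock and time zone settings (to the minute) across all compute nodes
    xdsh compute "date '+%Y-%m-%d %H:%M %Z'" | xcoll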

Utility nodes (if any)
Use the following checklist for utility nodes:
 There are at least two utility nodes for redundancy.
 The clock and time zone settings are correct, and an NTP client is synchronized to an NTP server.

Login node (if any)
There are at least two login nodes for redundancy.

Hardware Management Console (if any)
Use the following checklist for Hardware Management Consoles (HMCs):
 There are at least two HMCs for redundancy.
 All physical machines can be accessed from the HMC.
 Each physical machine has at least one logical partition (LPAR) on it.
 There is no alert from the HMC.


Chapter 2. Key concepts and interdependencies

This chapter provides key concepts and interdependencies. The following topics are covered in this chapter:
 Introduction to High Performance Computing
 Rationale for clusters
 Definition of an HPC Cluster
 Definition of a “healthy cluster”
 HPC preferred practices

2.1 Introduction to High Performance Computing

In its most simple form, a modern multi-processor supercomputer is just a high-speed network of computers working together in parallel to solve a problem. The reality is quite a bit more complex, full of new technologies and perhaps even using innovations in building facilities not often associated with traditional computer science.

For example, in SuperMUC, hot water cools 150,000 processor cores to provide a peak performance of up to three petaflops, which is equivalent to the work of more than 110,000 personal computers working in unison.

Building clustered systems at this scale is indeed a daunting task, especially if you don’t first ensure that they can be properly configured and tuned to provide trouble-free operation and optimal performance. As High Performance Computing (HPC) cluster environments grow to many thousands of processors, hundreds of subtle configuration and design decisions made early in the process become critical to ensuring overall system reliability.

In addition, although the difficulty of bringing up just such an HPC cluster seems intimidating, many cluster problems are in fact easily resolved through careful verification steps.

IBM has built enough of the world’s largest HPC clusters to know that when teams make poor configuration choices, or decisions that are not in compliance with a preferred practices design, during deployment, those problems become more difficult to find and correct as the system scales in terms of nodes, applications, and users. The following sections describe a toolkit that can aid system administrators in the initial stages of installing their own cluster.

2.2 Rationale for clusters

So, if you’re not out to build the world's largest HPC clusters, why did you pick up this IBM Redbooks publication? Probably because, even if your cluster system environment is small today, this might be the ideal time to lay a solid foundation that helps ensure that your computing environment can scale up with the growing needs of your systems.

The reader will see the term cluster health repeated often in the subsequent chapters. Cluster health describes a correctly working cluster that has:
 Been deployed in accordance with leading practices
 Had settings configured and confirmed uniformly
 Demonstrated that major components are individually working as expected
 Identified and isolated any components that do not meet expectations

At this point the cluster system is presumed capable of uniformly and repeatedly yielding an expected outcome.

2.3 Definition of an HPC Cluster

To reduce these complex systems into more common terms, we all too often use jargon to refer to them. HPC cluster is spoken of throughout the halls of almost every major research university today. HPC people often refer to the individual computers within a cluster as nodes. This type of technical jargon actually helps understanding. It relates something by using a metaphor. The jargon becomes a figure of speech in which an implied comparison is made between two unlike things that actually have something in common.

Using this language trick, thinking about all of the common components that are in HPC clusters is an easy way to help understand what they are. You might be surprised to learn that SuperMUC systems have all of the elements you would find on your own personal notebook (processors, memory, disk, and operating system), just many more of them. Each individual node in a commonly configured HPC cluster has more than one processor (for example, four processors), and today’s processor technologies typically have multiple cores (for example, eight cores).

There is at least one very distinct component of an HPC cluster that you might not commonly find on your personal computer. The history of HPC clusters closely parallels the development of high-speed, low-latency networks. Before we jump into the complexity of these cluster interconnect fabrics, let us again make a personification of the typical HPC cluster, in which this inanimate system is given human qualities or abilities.

Similar to people, the nodes need to be able to talk to one another to work in a coordinated and meaningful way, in unison. That is a simplistic, but hopefully meaningful, description of interconnect fabrics. For many readers, that definition does not suffice. For them, a more precise technical definition follows.

InfiniBand is a type of network communications link for data flow between processors and I/O devices that offers throughput of up to multiple gigabytes per second (GBps), and support for up to 64,000 addressable devices. Because it is also scalable and supports quality of service (QoS) and failover, InfiniBand is currently favored as a cluster interconnect in high-performance clustering environments.

As the authors of this IBM Redbooks publication can attest, when people take on the task of interconnecting all of the nodes in an HPC cluster, these machines seem to take on human-like qualities, such as stubbornness, that at times can slow progress. There is also a risk of losing objectivity when technical people choose to assume that these machines possess any degree of human traits. Nevertheless, one more anthropomorphic explanation can help increase your understanding of HPC clusters.

The early phases of bringing up a large cluster are often chaotic, before each node is checked and the cluster becomes healthy. It is a common and often repeated task for administrators to find nodes that are determined to be unhealthy and mark them as down or offline, so as to enable the overall cluster deployment to continue.

When a node is bad, further problem determination is then done, but the general idea here is to prevent any jobs from being scheduled or run on these sick nodes. This helps increase the overall reliability and throughput of a cluster by reducing preventable job failures due to misconfiguration, known hardware failures, and so on.

2.4 Definition of a “healthy cluster”

The following characteristics summarize the concept of a healthy computing environment:
 Has been deployed in accordance with preferred practices
 Had its application and hardware settings uniformly configured and independently verified
 Can repeatedly process workloads within predetermined expectations

More succinctly stated, a healthy compute environment is one that is performing as expected. This does not necessarily mean that every component is operational, or that all of the components are capable of processing. It merely means that the components required to process workloads are functioning without error and working within acceptable criteria. Therefore, individual unhealthy components can exist within a healthy compute environment so long as they are isolated and not impacting overall health.

The intent of including unhealthy (or inconsistent) components in the overall health description of a compute environment's state is to acknowledge the fact that most computing environments are in a constant state of fluctuation. Components can be down for maintenance reasons, quarantined for problem determination, or merely shut off to save power.

The key difference between a component’s state and the overall compute environment’s state is expectation. A node that has been shut down sets an expectation of not being operational. However, a node that has intermittent network problems might be operational, but still not meeting expectations.

A compute environment could be considered healthy or unhealthy in either of the previous two scenarios. If the expectation is to have 100 nodes available for processing workloads but one was shut down to save power, the compute environment could be deemed unhealthy. This is despite the fact that all nodes, network, and storage work perfectly.

Similarly, the compute environment could be characterized as healthy if the expectation is merely to have it processing workloads without errors. Justifiable arguments can be made for either scenario, but ultimately the determination of health is derived from expectations.

2.5 HPC preferred practices

The phrase preferred practices for HPC clusters is used in the most general sense, to describe the practice of developing and following a standard way of accomplishing things that other practitioners can use. It is important to maintain realistic expectations when seeking a preferred practice, and to be wary of internal validity problems. Such problems include the fact that the research required to identify the absolute “best” practice for any broad use is almost never feasible.

Our suggested practices are tailored for HPC use cases and based on observed results, but that does not necessarily mean they are always going to create good results. The suggested preferred practices used herein are part of the Redbooks publication program, providing a narrative of solutions that engineers have created while supporting our clients.

Because clusters can be used in a wide range of both known and custom scenarios to meet different business needs, this IBM Redbooks publication outlines and describes certain conditions that fit HPC use, and that do not warrant further review by IBM to determine whether a cluster is supportable.

An exception to a preferred practice does not necessarily mean that a cluster is not supportable by IBM. Although deploying with some exceptions can result in unsupportable scenarios, most other changes only need careful review and analysis before proceeding.

There are significant challenges concerning how to get to pertinent health data, which resides in many layers of these clustered systems. Reporting and correlating critical system data, and transforming it into actionable information, is not trivial. Today’s cluster health check tools mostly test components and a little point-to-point network performance. They lack the following abilities:
 Comprehensively monitor and assess overall system health.
 Assess true aggregate performance.
 Reconcile conflicting use, user expectations, and demands.

In our initial release, this health check toolkit first attempts to establish a few reference baselines of performance, and verify prerequisite services at the component level. This enables productive testing and detection of anomalous conditions, and can even extend to periodic consistency checking at the component level when the tests are not invasive. This approach is particularly important when trying to isolate the cause of performance problems as initial deployment system testing uncovers them.

We do not intend to bash benchmarking and performance-based modeling as an alternative approach. Some researchers have had success with performance modeling results used as strict cluster bring-up and acceptance criteria. It does, however, extend the time it takes to bring a cluster up, and adds considerable expense.

Even those who are successful admit that there remain significant challenges in setting reasonable performance expectations for broadly diversified workloads on new HPC systems, to realistically set accurate expectations for performance at all stages of system integration.


Chapter 3. The health lifecycle methodology

This chapter describes the requirement for a methodology to address the entire lifecycle of the compute environment components. This chapter guides you through the different phases of a cluster’s life, and gives suggestions based on the experiences of the authors.

This chapter covers the following topics:
 Why a methodology is necessary
 The health lifecycle methodology
 Practical application of the health lifecycle methodology

3.1 Why a methodology is necessary

Due to the complexity, interdependence, and magnitude of potential differences within a compute environment, there needs to be a way of addressing and managing expectations. The simplest way would be to establish test and pass criteria for every expectation, but due to the multidimensional nature of compute environments this could become untenable quickly.

A more prudent approach is to employ a methodology that is dynamic enough to measure and evaluate expectations at any point in time. It then becomes the framework of the methodology that ensures that every component is always uniformly configured, operating correctly, and performing as expected.

An HPC cluster passes through different stages during its lifetime, and the methodology guides you through them by indicating tasks, procedures, and suggested practices. The methodology should also assist with defining operational procedures and responsibilities. These are mandatory for maintaining a healthy state during day-to-day operations, and are the foundation that ensures that recurring tasks are handled with constant care.

3.2 The health lifecycle methodology

The health lifecycle methodology is a holistic approach to managing an entire compute environment's consistency and stability. It consists of a collection of processes and procedures that evaluate the functionality and homogeneity of individual components.

The implicit assumption is that if the individual components are performing as expected, then their aggregate sum should yield similar results. This extrapolation is useful in terms of establishing stability and predictability for your environment, and also in terms of relative performance and troubleshooting.

Taking into account that compute environments are always in a state of flux, there is a need to quickly and easily identify unhealthy components within an otherwise healthy compute environment. The health lifecycle methodology does this by using a suite of tools to evaluate key expectations and identify outliers or inconsistent settings.

The expectations can include uniformity of settings, relative network performance, and memory bandwidth performance. Through measurement and evaluation, the health state of any component can be determined, or at the least compared.
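As a minimal sketch of this measure-and-evaluate idea, a single expectation can be encoded as a baseline plus a tolerance, and any component that falls outside it is flagged. The node names, the run_membw helper, and the numbers below are all hypothetical; substitute whatever benchmark and baseline your environment uses:

    # Flag nodes whose measured memory bandwidth falls more than 5% below the expected baseline
    EXPECTED=12000    # expected memory bandwidth in MBps (hypothetical baseline)
    for node in node01 node02 node03; do
        value=$(ssh "$node" "/usr/local/bin/run_membw")    # hypothetical helper that prints MBps
        if [ "$value" -lt $(( EXPECTED * 95 / 100 )) ]; then
            echo "$node: $value MBps is below expectation ($EXPECTED MBps)"
        fi
    done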

Figure 3-1 on page 15 depicts a high-level overview of the health lifecycle methodology. Note the cyclical nature of verification, which continues until expected performance is achieved.

Figure 3-1 shows the three phases of the lifecycle and the transitions between them:
 Deployment: Hardware and software are set up with preferred practices, and maintenance issues are resolved.
 Verification: Hardware and software are evaluated for consistency, and compared against expected performance. A consistency or diagnostic performance failure returns a component to the deployment phase.
 Production: Hardware and software are passively monitored, and checked at intervals. A monitored failure, or an intrusive test being required, returns a component to the verification phase.

Figure 3-1 Diagram of the health lifecycle methodology

The benefit of using the health lifecycle methodology is that it provides a defined set of repeatable steps and tools that ensure consistency and stability. The focus of this book is on the methodology as it pertains to reducing initial environment deployment time, minimizing efforts required for health state checking, and the tools that assist in decreasing performance variation. The actual steps, and the corollary tools, are described later in this book.

3.3 Practical application of the health lifecycle methodology

From buildup to final decommissioning, a cluster goes through different phases during its lifetime in iterative cycles. Updates, changes, maintenance, or outages require methods to run through the stages consistently. Each phase has its own processes, characteristics, and needs. The goal of the methodology is to reduce nonproductive time to a minimum.

The lifecycle starts with the deployment stage, where all the different components are assembled, connected, powered, installed, and configured. As the cluster grows, it has to be carefully and continuously examined. Basic functions are verified, and uniformity between similar devices has to be ensured.

Discrepancies and abnormalities must be analyzed and eliminated. Otherwise, they increase with the size of the clusters, and ultimately result in negative impacts to the ability to meet expectations. IBM publishes preferred practices and suggestions to prevent you from running into known problems and traps.

More and more components are collated and tested until the whole cluster is ready for verification. During that phase, you have to prove that a cluster is able to perform estimated workloads and meet expectations. That is the time to confirm functionality and performance through extensive testing and evaluation.

Verification starts with the smallest entity in the cluster, and then scales to the whole cluster size. For example, it is a good practice to prove the performance of each single compute node before running a test at large. Reducing the size and complexity to a minimum is one of the basic rules about health. If a test fails, the source of the failure has to return to the deployment stage until it is able to meet the requirements.

Finally, after all of the inspections are successful, a cluster reaches the production phase and has to fulfill its function, which is to run user jobs and return results. To keep the cluster at that state, it is mandatory to monitor all key characteristics continuously. Failures or discrepancies from standards must be quickly identified, the cause investigated, and the source isolated.

If it is not possible to separate the cause of a problem, then the cluster on the whole must be verified again. In addition, if a system-wide change is required to solve a problem, it starts again at the beginning in the deployment phase. In other words, insufficient results in one of the stages causes a start over at the previous phase.

However, the lifecycle does not describe only the phases of the cluster. It also applies to each single component. There is no difference between a large cluster or a single device when it comes to changes. So even if a single cable has to be replaced, the old one must be isolated from the remaining cluster to do no harm. Afterward, a new cable has to be deployed according to preferred practices and carefully verified before it goes to production again.

It is an HPC rule that the slowest device determines overall performance. Therefore, extensive verification and continuous monitoring are crucial to obtaining and maintaining a healthy cluster. After a maintenance action, there is no excuse for overlooking one poorly performing DIMM that throttles down all parallel jobs.

It is important to have tests in place to ensure that all critical components work according to their expectations. Most of the testing should be done before combining them with other similar components. In the following chapter, we present a few tests that should be useful in your cluster as well.

3.3.1 Deployment phase

The initial deployment of a cluster is often called bring-up. In that fast-paced time, all of the components are delivered and assembled onsite. A careful planning of schedules and activities is required to optimize the installation time. Many different teams have to work together, so you also want quick and open communication between them.

Deployment of a cluster
Typically, the bring-up starts with installation of the management server, together with the core network and storage. These are vital parts, because they are needed to power on, configure, and install all of the other components.

Because all of the other parts of the cluster depend on this core infrastructure, any outage discovered at a later time has a direct effect on the installation activities of the entire cluster. In terms of the lifecycle, the core infrastructure should already be at a production stage before deploying the compute environment.

Important: Build up the management infrastructure ahead of the remaining cluster. Plan enough time to customize, verify, and stabilize. Did you test backup, failover, and recovery? If any disruptive activity does not happen directly at the beginning, it will most likely not take place. After the cluster implementation starts, it is almost impossible to introduce changes on the management server. Also, after the cluster is in production, nobody is happy about a maintenance window (downtime) for a failover test.

A cluster is usually built up in small entities, such as rack by rack. After a compute rack is connected to the network and power, the hardware can be configured and examined. Afterward, with the start of the operating system image, that portion is ready to be validated.

In that stage, it is sometimes impossible to determine if a component is in the deployment or verification stage, because it swings forward and backward in between as problems are discovered and resolved. After a configuration is applied, it has to be verified before starting the next installation step. In iterative cycles, each component, in addition to the whole cluster, converges to a stable condition and can be validated in its entirety.

The following practices can help with the cluster bring-up approach:
 Read preferred practice guides.
 Bring the entire team together, and plan responsibilities and dependencies on one another.
 Agree upon standard communication methods, and methods to hand off responsibilities.
 Start working on delimited groups of cluster devices, such as a frame or a rack.
 Deploy changes in defined steps and orders, for example: update firmware, change hardware settings, start the operating system, join InfiniBand.
 Verify each deployment step afterward, and make sure that all functional criteria are met at the current stage.
 Deployment steps should be large enough to gain progress, but not so large that sudden errors cannot be traced back to the last change.
 Each node must undergo extensive testing before it is allowed to join common resources, such as the file system, Ethernet, and InfiniBand networks.
 Define a schema to flag and isolate components if a test fails, so that these do not delay the deployment of group members.
 Unverified components must be kept as isolated as possible, to avoid side effects on other systems that can spread through network connections.
 Deployment activities must use the system management infrastructure, and all actions should be done as parallel as possible. This minimizes the time required, and the number of errors introduced by repetitive tasks.

At a later time, the cluster returns to the deployment stage during maintenance. The planning of maintenance requires a trade-off between the number of changes to deploy and the time required, versus the number of maintenance windows required. To minimize unproductive cluster time, maintenance activities are often accumulated in large chunks. This introduces an inherent risk of unforeseeable results, because these changes might not have been tested at large scale before.

A cluster is a complex entity, and all possible impacts of a change can hardly be predicted. If a later malfunction cannot be traced back to a certain change activity, the whole maintenance has to be rolled back. This raises the further point that there should always be a rollback plan in place.

The following practices can help with the cluster maintenance approach:
 Review preferred practice guides for updates.
 Verify if software or firmware changes are approved for your cluster (see the best recipe).
 Do not overload maintenance windows with changes.
 Provide time for removing changes should something go wrong.
 Keep records of all planned changes, the progress, and the results.
 Validate that all the changes come into effect.
 Validate that unchanged cluster components behave as they did before.
 Retain and document the current cluster state for rollback purposes, but also for comparing with the new cluster state (see the sketch after this list).
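To support the rollback and comparison practices in the preceding list, a simple approach is to capture the software and firmware state of the nodes immediately before the maintenance window, and again afterward. The following sketch assumes an xCAT node group named compute (the group name and the target directory are only examples):

# mkdir -p /root/maint/$(date +%Y%m%d)
# xdsh compute "uname -r; rpm -qa | sort" > /root/maint/$(date +%Y%m%d)/sw-before.txt
# rinv compute firm > /root/maint/$(date +%Y%m%d)/fw-before.txt

Repeating the same commands into *-after.txt files after the maintenance, and comparing the files with diff, validates that the intended changes took effect and that nothing else changed unexpectedly.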

The purpose of a High Performance Computing (HPC) cluster is to solve scalable problems in a shorter time through parallelism. It is worth mentioning that improper setup and small inconsistencies scale just as much as correct configuration and consistency do: their effects are often magnified by the size of the cluster. To avoid these types of obstacles, IBM publishes preferred practices and best recipes for HPC clusters, and updates them frequently. Therefore, a cluster should be deployed according to these documents.

Note: Preferred practices are published at IBM developerWorks in the HPC Central Wiki:

https://www.ibm.com/developerworks/community/wikis/home/wiki/Welcome%20to%20High%20Performance%20Computing%20(HPC)%20Central

Best recipe refers to a suggested set of cluster-wide software and firmware levels that have been verified to operate together. These supported combinations can be found on HPC Central Cluster Service Packs, which might refer to other best recipe sites as well: https://www.ibm.com/developerworks/community/wikis/home/wiki/Welcome%20to%20High%20Performance%20Computing%20(HPC)%20Central/page/IBM%20High%20Performance%20Computing%20Clusters%20Service%20Packs

Best recipe files are available for download at IBM Fix Central (for example, by searching for Intelligent Cluster): http://www-933.ibm.com/support/fixcentral

Deployment of a component
An HPC cluster accumulates a huge number of components, so it is to be expected that a few of them will fail during day-to-day operation. For overall cluster health, the negative effect is usually not critical, provided that a failing device can quickly be identified and isolated. The fault can then be compensated for by redundancy or fault tolerance. For instance, if a disk drive fails, it is covered by the disk subsystem through RAID protection techniques. If a compute node or a single network link goes down, a job starts on another one.

But what does this mean for the failing component? From the lifecycle perspective, it first moves to the verification phase to determine the reason for the fault, and then moves to the deployment phase to get repaired, replaced, updated, or reconfigured. Because the remaining cluster components are still in operation, it requires much more care than the initial deployment. Any side effects to shared resources, such as file system, network, or InfiniBand, can cause other components to fail, or the overall cluster expectations to no longer be met.

Therefore, it is mandatory to have isolation procedures in place. Powering off troubled compute nodes or unplugging broken network cables are obvious actions. Switches or part of the management infrastructure must be isolated with care, because of their dependencies with other devices. Sometimes, it is necessary to delay the isolation of critical components to avoid even more negative impacts.

This is because, even if a critical device is redundant, deactivating it might trigger rerouting activities or InfiniBand heavy sweeps that cause an unwanted interruption of network traffic and job communication. In that case, the skilled operator has to weigh which action affects the cluster less. For example, if a link is running at a slower speed, you might not unplug it if it is in the middle of the network, but you might isolate it if it is the only link into a particular node.

The following practices can help with the component maintenance approach:
 Isolate the component from the remaining cluster.
 Define a schema to distinguish between healthy and unhealthy compute nodes (see the sketch after this list).
 Keep track of the failures for each component, to identify recurring errors.
 Perform diagnosis and repair actions in a safe condition or sandbox.
 Establish criteria, tests, and measurements to validate if a component can safely rejoin the cluster.
 Increase monitoring for the particular component after it has rejoined the cluster.
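One possible schema for flagging suspect components, referenced in the list above, is to track them through xCAT node group membership, so that deployment and verification commands never address them together with healthy nodes. The following sketch assumes node groups named compute and suspect (both names are only examples) and uses the standard xCAT chdef options -m (remove a value) and -p (add a value):

# chdef -t node -o computeA5 -m groups=compute
# chdef -t node -o computeA5 -p groups=suspect

After the repair and the validation tests have passed, the node is moved back the same way. Workload schedulers typically offer an equivalent drain or offline mechanism that should be used in parallel, so that no user job is dispatched to the suspect node.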

It is a suggested practice to establish operational procedures for the handling of suspicious or faulty components:
 How do you distinguish between a good and a bad compute node?
 How do you make sure that user jobs run only on validated nodes?
 How do you notify the responsible technician to take care of a failed device?
 How do you keep a log file of all maintenance activities and root causes?

Although these operational procedures are not covered in this book, we suggest defining the necessary workflows around repair activities.

3.3.2 Verification or pre-production readiness phase

Verification is the small but highly important phase between deployment and production. From a single device to the whole cluster, it determines whether a user job can run reliably. Any faulty or misconfigured component that slips through will certainly cause trouble, which usually means job cancellations or degraded performance. Careful examination must ensure that all of the devices of the same type are in a confirmed, consistent, and uniform condition. The goal of verification is to gain confidence in hardware and software before introducing it to the user.

For verification, we must understand the applicable parameters and the final expectations. Some of these are obvious, for example, all error indicators are supposed to be off. A few settings are strongly suggested in the published preferred practices and the cluster service pack. For example, in System x-based clusters, you find preferred integrated management module (IMM) and Unified Extensible Firmware Interface (UEFI) settings among the firmware update packages on IBM Fix Central.

Other settings are determined by the cluster’s architecture and its intended use. Performance expectations are tricky because of the gap between theoretical peak results and practical achievements. But what is an allowed variation, and where is the boundary to a faulty condition?

Health as a bottom-up approach
There are different levels of verification, from basic hardware examination to user application performance. These different stages can be grouped into three major categories:
Environment    Consists of all hardware-related items.
Operation      Covers all runtime parameters.
Performance    Summarizes the system’s behavior under load conditions.

Our experience from different cluster deployments has taught us that a healthy cluster is built from the bottom to the top. Therefore, you first must make sure that each single device works as expected before performing the next step. Not until all hardware and software have been checked is it worth running a benchmark job.

This approach leads to the pyramid model of verification shown in Figure 3-2 on page 20. Each level of verification depends on the correctness of the underlying stages. The higher in the pyramid you are before a problem is found, the more difficult it is to attribute it to a more foundational problem.

The following practices can help with the pyramid verification approach shown in Figure 3-2:
 For environment, ensure uniformity:
– Verify attributes of hardware, such as type, model, and firmware level.
– Verify hardware-oriented settings, such as hardware management controller and UEFI settings.
– Verify hardware status, such as temperature, port state, memory, and PCI bus speed.
– Verify infrastructure, such as InfiniBand topology, correct cabling, line speed, width, and error rate.
 For operation, validate consistency:
– Verify mandatory resources, such as network connectivity and file systems.
– Verify software parameters, such as kernel settings, available memory, required configurations, error rates, correct speeds, and widths (see the sketch after this list).
 For performance, evaluate expectations:
– Verify expected performance, such as throughput, usable memory, or latency.
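As a small illustration of the consistency checks named in the list above, the same query can be issued on all nodes in parallel and collated with xcoll, so that any deviating node is immediately visible. The sketch below assumes an xCAT node group named compute and a Mellanox OFED installation (both are assumptions; the commands are examples only):

# xdsh compute "ofed_info -s" | xcoll
# xdsh compute "sysctl -n kernel.shmmax" | xcoll

Any node whose OFED level or kernel setting differs from the rest appears in its own xcoll group and can be investigated before performance tests are run.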

During the verification phase, the component in question is supposed to be isolated. That enables you to run intrusive tests. These tests consume essential resources, and are inapplicable if other jobs share the same resource: either the test can harm the other job, or the result of the test can be inaccurate.

An example is measuring memory bandwidth on a compute node while a user job is running on the same node. The measurement results are misleading, and the user job might fail. For this reason, all performance tests are considered intrusive and must be handled with care.

The perception of health depends on yielding expected results. Thus, conduct both intrusive and passive tests.

Figure 3-2 Verification pyramid (from bottom to top: DEPLOYMENT, install and configure; REVIEW, examine and collate the environment; VALIDATE, consistency check of the operation; EVALUATE, meets performance expectations; PRODUCTION, monitor key conditions)

Burn-in
One method to reduce the probability of hardware failures before the cluster is brought into operation is a burn-in cycle. For a limited time, the cluster environment is placed under a high workload condition, so that all of the devices warm up to operational temperature and all of the electrical and mechanical components are stressed. That provokes early-life failures (so-called infant mortality) of the hardware, and helps to identify loose cable connectors, insufficient electrical power, open fuses, or thermal issues.

Figure 3-3 shows a widely used model of failure probability over a product’s lifetime, known as the bathtub curve. The challenge is to define the optimal duration for the burn-in phase. If it is too short, there will still be a high number of faults after you go into production. Alternatively, an overly long stress condition causes faster aging and wastes energy. It is important to note that the model refers only to hardware faults. Software stability is not vulnerable to mechanical erosion or heat, but to many other age-independent factors.

Tip: A good practice is to chart failure rates. You should see a marked increase followed by a drop-off to a much lower and more steady rate, which is an indicator that you have exited the early-life failure cycle. Because of the statistical nature of failure rates, you will find that this technique works much better for large clusters than for smaller clusters.

Figure 3-3 Example of a bathtub curve (probability of failures over the hardware lifetime: early-life failures during burn-in, random failures during the production phase, and aging failures during wear-out)

3.3.3 Production phase (monitoring)

The following list includes suggestions for the production phase:
 Monitoring, keeping log files, correlating events
 Conducting passive tests until tests show an anomaly (monitoring)
 Interval checking (effect on performance)
 Spot checking (sacrificing performance for sanity)


Chapter 4. Cluster components reference model

This chapter describes the High Performance Computing (HPC) cluster environment used by the team for this IBM Redbooks publication. In this chapter, we provide a detailed overview of the installed IBM hardware and software, and describe the configuration settings.

The following topics are described in this chapter:
 Overview of installed cluster systems
 ClusterA nodes hardware description
 ClusterA software description
 ClusterB nodes hardware description
 ClusterB software description
 ClusterC nodes hardware description
 ClusterC software description
 Interconnect infrastructure
 GPFS cluster installation in ClusterB

4.1 Overview of installed cluster systems

The lab environment used for this IBM Redbooks publication consists of three clusters with different architecture and hardware to perform the various system tests. These clusters are referred to as ClusterA, ClusterB, and ClusterC throughout this IBM Redbooks publication.

All clusters are based on Red Hat Enterprise Linux (RHEL) release 6.3. The following systems are used:
 ClusterA is based on IBM System x iDataPlex® with eight compute nodes.
 ClusterB is based on IBM System x x3650 with six compute nodes.
 ClusterC is based on IBM Power Systems 740 with four compute nodes.

All nodes are installed using RHEL Server release 6.3 with kernel version 2.6.32-279.el6.x86_64 (ClusterA and ClusterB) and 2.6.32-279.el6.ppc64 (ClusterC). The only exception is the management server for ClusterA, which has RHEL Server release 6.4 installed, but is booted with the 6.3 kernel 2.6.32-279 as well.

All management servers are running the IBM Extreme Cloud Administration Toolkit (xCAT) software. The xCAT software is an open-source tool for highly scalable management and deployment of clusters.

More detailed information about xCAT can be found on the xCAT wiki website: http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Main_Page

You can also find more information about xCAT at the following IBM website: http://www-03.ibm.com/systems/software/xcat/index.html

Each of the clusters is managed by one xCAT server. The details of the xCAT setup are not explained in this publication. The cluster nodes are equipped with local hard disk drives (HDDs), and are deployed as stateful nodes by the xCAT management nodes.
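As a brief illustration of a stateful xCAT deployment, the following sketch shows how a single node could be reinstalled and rebooted from the management node. The osimage name is an assumption (xCAT generates names of the form osver-arch-install-compute); the actual names can be listed with lsdef -t osimage:

# nodeset computeA1 osimage=rhels6.3-x86_64-install-compute
# rpower computeA1 boot

After the installation finishes, the node boots from its local disk and can be verified with the health checks described in Chapter 5.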

The clusters consist only of compute nodes and one management node. There are no special utility nodes, such as login nodes or scheduler nodes.

No external storage subsystems are available in the lab environment. Therefore, an IBM General Parallel File System (GPFS) cluster has been defined only on ClusterB, which provides a parallel file system for test purposes. This file system is defined on local disks built in two Network Shared Disk (NSD) servers.

More detailed information about GPFS can be found on the following website: http://www-03.ibm.com/systems/software/gpfs/

You can also find more information about GPFS on the developerWorks website: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29

All of the cluster compute nodes are connected by an Ethernet and an InfiniBand infrastructure, which is actually physically shared to some degree between the three clusters and even more systems in the lab. The other systems are not part of this IBM Redbooks publication.

4.2 ClusterA nodes hardware description

ClusterA consists of eight IBM iDataPlex dx360 M4 compute nodes and the xcatA1 management node. Table 4-1 provides a short overview of the components.

Table 4-1 Node hardware overview for ClusterA

Node        Model                             Memory (GB)                          Processor
computeA1   IBM iDataPlex dx360 M4 7912-AC1   32 (16x 2 GB)                        E5-2680 0 @ 2.70 GHz, two packages
computeA2   IBM iDataPlex dx360 M4 7912-AC1   32 (16x 2 GB)                        E5-2680 0 @ 2.70 GHz, two packages
computeA3   IBM iDataPlex dx360 M4 7912-AC1   32 (16x 2 GB)                        E5-2680 0 @ 2.70 GHz, two packages
computeA4   IBM iDataPlex dx360 M4 7912-AC1   32 (16x 2 GB)                        E5-2680 0 @ 2.70 GHz, two packages
computeA5   IBM iDataPlex dx360 M4 7912-AC1   32 (16x 2 GB)                        E5-2680 0 @ 2.70 GHz, two packages
computeA6   IBM iDataPlex dx360 M4 7912-AC1   32 (16x 2 GB)                        E5-2680 0 @ 2.70 GHz, two packages
computeA7   IBM iDataPlex dx360 M4 7912-AC1   28 (14x 2 GB, 2 DIMMs are missing)   E5-2680 0 @ 2.70 GHz, two packages
computeA8   IBM iDataPlex dx360 M4 7912-AC1   32 (16x 2 GB)                        E5-2680 0 @ 2.70 GHz, two packages
xcatA1      IBM x3550 M4 7042-CR7             32 (16x 2 GB)                        E5-2640 0 @ 2.50 GHz, one package

The compute nodes have two processor packages installed with eight cores each. Hyperthreading is enabled, therefore resulting in 32 processors made available to the operating system.

The management node has one processor package installed with six cores. Hyperthreading is disabled.

The InfiniBand attachment is 4X Fourteen Data Rate (FDR10) using host channel adapters (HCAs) of type Mellanox Technologies MT27500 Family [ConnectX-3] at firmware level 2.11.1260. The boards have product-set identification (PSID) IBM0F40110019. The xcatA1 management node does not have a connection to the InfiniBand fabric.
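One way to confirm that all ClusterA compute nodes report the same HCA firmware level and PSID is to query the adapters in parallel and collate the results. The sketch below assumes an xCAT node group named clusterA that contains the compute nodes (the group name is an example); ibv_devinfo reports the firmware level as fw_ver and the PSID as board_id:

# xdsh clusterA "ibv_devinfo | egrep 'fw_ver|board_id'" | xcoll

All nodes should collapse into a single xcoll group showing fw_ver 2.11.1260 and board_id IBM0F40110019.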

Important: In the computeA7 node, two dual inline memory modules (DIMMs) are missing. For example, this shows up in performance tests such as daxpy. See “Example of daxpy” on page 98.

4.3 ClusterA software description

xCAT version 2.8.2 is installed on the xcatA1 management node. This xCAT server also manages other nodes that are not part of ClusterA.

The InfiniBand software stack on the compute nodes is Mellanox OpenFabrics Enterprise Distribution (OFED) 2.0-2.0.5. The fabric management is done by the Subnet Manager Mellanox UFM 4.0.0 build 19. The Unified Fabric Manager (UFM) software is installed as a single instance in non-high availability (HA) mode.

For parallel code execution, IBM Parallel Environment (PE) 1.3.0.6-s006a is installed on all compute nodes. This software is essentially the IBM implementation of the Message Passing Interface (MPI).

For more information about IBM PE see the following website: http://www-03.ibm.com/systems/software/parallel/

4.4 ClusterB nodes hardware description

This section describes the servers installed in ClusterB. ClusterB consists of four IBM x3650 M2 and two IBM x3650 M3 compute nodes, which also differ in their installed memory.

Table 4-2 provides a short overview of the components.

Table 4-2 Node hardware overview for ClusterB

Node        Model                   Memory (GB)                        Processor
computeB1   IBM x3650 M2 7947-AC1   32 (16x 2 GB)                      Intel Xeon X5560 @ 2.80 GHz, two packages
computeB2   IBM x3650 M2 7947-AC1   32 (16x 2 GB)                      X5560 @ 2.80 GHz, two packages
computeB3   IBM x3650 M2 7947-AC1   32 (16x 2 GB)                      X5560 @ 2.80 GHz, two packages
computeB4   IBM x3650 M2 7947-AC1   32 (16x 2 GB)                      X5560 @ 2.80 GHz, two packages
computeB5   IBM x3650 M3 7945-AC1   68 (17x 4 GB, 1 DIMM is missing)   X5680 @ 3.33 GHz, two packages
computeB6   IBM x3650 M3 7945-AC1   72 (18x 4 GB)                      X5680 @ 3.33 GHz, two packages
xcatB1      IBM x3650 M3 7945-AC1   32 (16x 2 GB)                      E5630 @ 2.53 GHz, one package

The compute nodes have two processor packages installed with four cores (X5560) or six cores (X5680) each. Hyperthreading is enabled, therefore resulting in 16 (X5560) and 24 (X5680) processors made available to the operating system.

The management node has one processor package installed with four cores. Hyperthreading is disabled.

The InfiniBand attachment is 4X Quad Data Rate (QDR) using HCAs of type Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE] at firmware level 2.9.1000. The system boards have PSID MT_0D81120009, except for computeB2 which has PSID MT_0BB0120003 installed.

The xcatB1 management node does not have a connection to the InfiniBand fabric.

Note that in the computeB5 node, one 4 GB DIMM is missing.

4.5 ClusterB software description

xCAT Version 2.8.1 is installed on the xcatB1 management node. This xCAT server is responsible for other nodes as well.

The InfiniBand software stack on the compute nodes is Mellanox OFED 1.5.3-4.0.22.3. The fabric management is done by opensm-4.0.0 as provided by Mellanox OFED.

For parallel code execution, the MPI version IBM PE 1.3.0.6-s006a is installed on all compute nodes.

In addition, a GPFS cluster with GPFS Version 3.5.0-9 is installed on the compute nodes. More details about the GPFS setup are described in section 4.9, “GPFS cluster” on page 33.

4.6 ClusterC nodes hardware description

This section describes the servers installed in ClusterC. ClusterC consists of four IBM Power 740 compute nodes. Table 4-3 provides a short overview of the components.

Table 4-3 Node hardware overview for ClusterC

Node        Model                    Memory (GB)   Processor
computeC1   IBM Power 740 8205-E6B   128           IBM POWER7® @ 3550 MHz, 16 processors
computeC2   IBM Power 740 8205-E6B   128           POWER7 @ 3550 MHz, 16 processors
computeC3   IBM Power 740 8205-E6B   256           POWER7 @ 3550 MHz, 16 processors
computeC4   IBM Power 740 8205-E6B   256           POWER7 @ 3550 MHz, 16 processors
xcatC1      IBM Power 750 8233-E8B   32            POWER7 @ 3300 MHz, 8 processors

The compute nodes have 16 processors installed. Simultaneous multithreading SMT4 is enabled, therefore resulting in 64 processors made available to the operating system.

The management node has 8 processors installed. Simultaneous multithreading SMT4 is enabled, resulting in 32 processors made available to the operating system.

The InfiniBand attachment is 4X QDR using HCAs of type Mellanox Technologies MT26428 [ConnectX VPI PCIe 2.0 5 GT/s - IB QDR / 10 GigE] (rev b0).

Nodes computeC1 and computeC2 have two adapters installed at firmware level 2.9.1000.

Nodes computeC3 and computeC4 have one adapter installed at firmware level 2.9.1316.

All system boards have PSID IBM0FA0000009, except for computeC1, which has PSID MT_0D80120009 installed.

The xcatC1 management node does not have a connection to the InfiniBand fabric.

4.7 ClusterC software description

xCAT Version 2.8.2 is installed on the xcatC1 management node. This xCAT server is responsible for other nodes as well.

The InfiniBand software stack on the compute nodes is Mellanox OFED 2.0-2.0.5. The fabric management is done by opensm-4.0.0 as provided by Mellanox OFED.

For parallel code execution, the MPI version IBM PE 1.3.0.5-s005a is installed on compute nodes computeC1 and computeC2. IBM PE 1.3.0.7-s007a is installed on computeC3 and computeC4.

4.8 Interconnect infrastructure

The cluster components are connected by Ethernet and InfiniBand. InfiniBand is used for native InfiniBand traffic and IP over InfiniBand (IPoIB). This section outlines the details from an InfiniBand and IP view.

4.8.1 InfiniBand

The nodes for all of the clusters are connected to a single InfiniBand switch. In terms of the InfiniBand topology, each of the clusters can therefore be called a fat-free topology. For ClusterB and ClusterC, which are connected to a single 36-port switch, this is a non-blocking configuration.

The compute nodes in ClusterA are connected to a modular InfiniBand switch of type Mellanox SX6512 216-Port InfiniBand Director Switch (ibswA1). Note that the ClusterA nodes are distributed across two leaf boards within the modular switch, as shown in Figure 4-1. Therefore, depending on the InfiniBand routes that are calculated by the Subnet Manager, this configuration is not necessarily non-blocking.

Figure 4-1 InfiniBand connections for ClusterA

The connections in ClusterA are 4X FDR10 for all nodes, supporting a rate of 40 Gbps. The cluster is run using InfiniBand maximum transmission unit (MTU) size 4096.

The compute nodes in ClusterB and ClusterC are connected to a 36-port InfiniBand switch of type Mellanox SX6036.

Figure 4-2 shows the InfiniBand connections of the compute nodes in ClusterB. All six nodes are connected to ports on one single switch (ibswB1, a Mellanox SX6036 36-port InfiniBand switch).

Figure 4-2 InfiniBand connections for ClusterB

Figure 4-3 shows the InfiniBand connections of the compute nodes in ClusterC. All four nodes are connected to ports on one single switch (ibswC1, a Mellanox SX6036 36-port InfiniBand switch).

Figure 4-3 InfiniBand connections for ClusterC

The connections in ClusterB and ClusterC are 4X QDR for all nodes, supporting a signaling rate of 40 Gbps. Because QDR uses 8b/10b encoding, the effective data rate that can be achieved is limited to 32 Gbps.

Note: ibswB1 and ibswC1 are actually the same physical machine that is shared between these clusters. For the sake of simple naming, the real names of components are translated to the respective cluster context in which they are used.

ClusterB nodes are running with InfiniBand MTU size 2048, and ClusterC nodes are running with InfiniBand MTU size 4096.
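A quick way to confirm the active link rate and InfiniBand MTU on the nodes is to read them from the adapter and collate the results. The sketch below assumes an xCAT node group named clusterB and the HCA device name mlx4_0, which is the device referenced later in the GPFS verbsPorts setting (both are assumptions for this example):

# xdsh clusterB "cat /sys/class/infiniband/mlx4_0/ports/1/rate" | xcoll
# xdsh clusterB "ibv_devinfo | grep active_mtu" | xcoll

For ClusterB, every node should report a 40 Gbps (4X QDR) rate and an active MTU of 2048.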

Subnet manager
ClusterA has a subnet manager based on Mellanox UFM, which is running on a separate node in the fabric, smA1.

There are no configuration options changed from the default values.

ClusterB and ClusterC are running the opensmd daemon as subnet manager on an IBM BladeCenter® node that is connected to the fabric. Because the fabric is shared between both clusters, there is actually one single subnet manager running, shared by both clusters. For the sake of simplicity, this single server is named smB1 or smC1, depending on the cluster context that is being described.

There are no configuration options changed from the default values.

4.8.2 Ethernet Infrastructure

The underlying physical Ethernet infrastructure, with details about switches and their configuration, is not the subject of this IBM Redbooks publication. All components are connected to a lab-wide network setup shared by many other components, and documenting its details is out of scope. For the Ethernet view, it is sufficient to describe the IP layer. See 4.8.3, “IP Infrastructure” on page 30 for details.

4.8.3 IP Infrastructure

In terms of IP networking, the nodes are interconnected by different subnetworks. These networks are based on Ethernet and InfiniBand using the IPoIB protocol.

Table 4-4 shows the IP subnetworks used by the clusters, with a short description.

Table 4-4 IP subnet overview

IP subnet     Cluster                                Description
10.0.0.0/8    A, B, and C (B and C on the same L2)   Base network for management and provisioning by xCAT
20.0.0.0/8    A and B                                IPoIB network for ib0 interfaces in ClusterA and ClusterB
21.0.0.0/8    A                                      IPoIB network for ib1 interfaces in ClusterA
50.0.0.0/8    A and B                                Management VLAN
60.0.0.0/8    A, B, and C                            Network for login to the xCAT servers from outside
70.0.0.0/8    C                                      IPoIB network for ib0 interfaces in ClusterC
71.0.0.0/8    C                                      IPoIB network for ib1 interfaces in ClusterC

IP subnet 10.0.0.0/8 is the base network to which all nodes in all clusters have connections. This network is the xCAT internal provisioning network. ClusterB and ClusterC share the same layer 2 network infrastructure. ClusterA runs this subnet in a separate layer 2 network.

IP subnets 20.0.0.0/8 and 21.0.0.0/8 are IPoIB subnetworks used by ClusterA and ClusterB. Any ib0 interfaces are in subnet 20.0.0.0/8. If configured, any ib1 interfaces are placed into subnet 21.0.0.0/8.

IP subnet 50.0.0.0/8 is used only by ClusterA and ClusterB, and is separated on layer 2 by employing different VLANs. Only the xCAT management nodes of ClusterA and ClusterB are connected to this network, using tagged VLAN interfaces.

IP subnet 60.0.0.0/8 is the network from which users access the cluster management nodes from within the lab infrastructure. This is VLAN255. Note that the compute nodes are also connected to this VLAN.

IP subnets 70.0.0.0/8 and 71.0.0.0/8 are IPoIB subnetworks used by ClusterC. Any ib0 interfaces are in 70.0.0.0/8. If configured, ib1 interfaces are placed into 71.0.0.0/8.
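A simple way to verify that the IPoIB interfaces carry addresses from the expected subnets is to list them in parallel from the management node. The sketch below assumes an xCAT node group named clusterC (an example name):

# xdsh clusterC "ip -4 -o addr show ib0"
# xdsh clusterC "ip -4 -o addr show ib1"

Each ClusterC node should report an ib0 address in 70.0.0.0/8; only computeC1 and computeC2 are expected to report an ib1 address in 71.0.0.0/8.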

The following figures show the clusters’ interconnections from the IP view.

The IP network view for ClusterA is shown in Figure 4-4.

Figure 4-4 IP network view of ClusterA (xcatA1 at 10.2.0.221 and smA1 at 10.2.0.232; computeA1 through computeA8 at 10.2.4.51 through 10.2.4.58; networks: xCAT management 10.0.0.0/8, VLAN 255 60.0.0.0/8, VLAN 502 50.0.0.0/8, IPoIB 20.0.0.0/8 on ib0 and 21.0.0.0/8 on ib1)

The IP network view for ClusterB is shown in Figure 4-5.

Figure 4-5 IP network view for ClusterB (xcatB1 at 10.3.0.221 and smB1 at 10.3.2.11; computeB1 through computeB6 at 10.3.1.31, 10.3.1.33, 10.3.1.35, 10.3.1.37, 10.3.1.39, and 10.3.1.41; networks: xCAT management 10.0.0.0/8, VLAN 255 60.0.0.0/8, VLAN 503 50.0.0.0/8, IPoIB 20.0.0.0/8 on ib0 and 21.0.0.0/8 on ib1)

Exception: VLAN502 is not used in this cluster. Instead, VLAN503 is used for 50.0.0.0/8.

The IP network view for ClusterC is shown in Figure 4-6.

Figure 4-6 IP network view for ClusterC (xcatC1 at 10.3.0.222 and smC1 at 10.3.2.11; computeC1 through computeC4 at 10.3.1.1, 10.3.1.5, 10.3.1.9, and 10.3.1.13; networks: xCAT management 10.0.0.0/8, VLAN 255 60.0.0.0/8, VLAN 502 50.0.0.0/8, IPoIB 70.0.0.0/8 on ib0 and 71.0.0.0/8 on ib1)

Note: smC1 has the same IP as smB1. This is because it is actually the same physical machine that is shared between these clusters. For the sake of simple naming, the real names of the servers have been changed to provide easier reading.

Also note that two of the servers, computeC1 and computeC2, have a second IPoIB interface, ib1.

4.9 GPFS cluster

A GPFS cluster using software version 3.5.0.9 is installed on the ClusterB compute nodes. There is no external storage subsystem available on any of the clusters. Two nodes in ClusterB have therefore been configured to export their local disk /dev/sdb as an NSD into the GPFS file system.

The GPFS cluster is defined as shown in Example 4-1. The cluster has been defined in the 10.0.0.0/8 subnet.

Example 4-1 GPFS cluster overview
[root@computeB5 ~]# /usr/lpp/mmfs/bin/mmlscluster

GPFS cluster information
========================
  GPFS cluster name:         gpfsB.computeB5
  GPFS cluster id:           9191008596370922699
  GPFS UID domain:           gpfsB.computeB5
  Remote shell command:      /usr/bin/ssh
  Remote file copy command:  /usr/bin/scp

GPFS cluster configuration servers:
-----------------------------------
  Primary server:    computeB5
  Secondary server:  computeB6

 Node  Daemon node name  IP address  Admin node name  Designation
------------------------------------------------------------------
   1   computeB1         10.3.1.31   computeB1
   2   computeB2         10.3.1.33   computeB2
   3   computeB3         10.3.1.35   computeB3
   4   computeB4         10.3.1.37   computeB4        quorum
   5   computeB5         10.3.1.39   computeB5        quorum-manager
   6   computeB6         10.3.1.41   computeB6        quorum-manager

[root@computeB5 ~]#

The GPFS cluster configuration settings are shown in Example 4-2. No special settings have been applied to the configuration, except for enabling RDMA support, which is disabled by default. This is done by configuring verbsRdma enable and verbsPorts mlx4_0/1 to define the InfiniBand port to be used for communication.

Example 4-2 Cluster configuration settings
[root@computeB5 ~]# /usr/lpp/mmfs/bin/mmlsconfig
Configuration data for cluster gpfsB.computeB5:
-----------------------------------------------
myNodeConfigNumber 5
clusterName gpfsB.computeB5
clusterId 9191008596370922699
autoload no
dmapiFileHandleSize 32
minReleaseLevel 3.5.0.7
verbsRdma enable
verbsPorts mlx4_0/1
adminMode central

File systems in cluster gpfsB.computeB5:
----------------------------------------
/dev/fsB
[root@computeB5 ~]#
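The RDMA-related values shown in Example 4-2 can be set with the mmchconfig command. The following lines are a sketch of how such a configuration could be applied; note that a change to verbsRdma or verbsPorts normally only takes effect after GPFS has been restarted on the affected nodes:

# /usr/lpp/mmfs/bin/mmchconfig verbsRdma=enable
# /usr/lpp/mmfs/bin/mmchconfig verbsPorts="mlx4_0/1"
# /usr/lpp/mmfs/bin/mmgetstate -a

The mmgetstate -a command confirms that the GPFS daemon is active on all cluster nodes after the restart.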

One file system consisting of two local server disks has been created, because no external storage system is available. Each NSD is handled by one of the NSD servers, computeB5 and computeB6. The NSDs are defined as shown in Example 4-3.

Example 4-3 NSD definitions for GPFS cluster gpfsB.computeB5
[root@computeB5 ~]# /usr/lpp/mmfs/bin/mmlsnsd -M

 Disk name    NSD volume ID      Device     Node name    Remarks
------------------------------------------------------------------
 nsdB5        0A03012752406843   /dev/sdb   computeB5    server node
 nsdB6        0A03012952402347   /dev/sdb   computeB6    server node

[root@computeB5 ~]#

These NSDs are put in the file system named fsB. This file system has been created with the settings shown in Example 4-4.

Example 4-4 File system settings for GPFS file system fsB
[root@computeB5 ~]# /usr/lpp/mmfs/bin/mmlsfs fsB
flag                value                      description
------------------- -------------------------- -----------------------------------
 -f                 8192                       Minimum fragment size in bytes
 -i                 512                        Inode size in bytes
 -I                 16384                      Indirect block size in bytes
 -m                 1                          Default number of metadata replicas
 -M                 1                          Maximum number of metadata replicas
 -r                 1                          Default number of data replicas
 -R                 1                          Maximum number of data replicas
 -j                 cluster                    Block allocation type
 -D                 nfs4                       File locking semantics in effect
 -k                 nfs4                       ACL semantics in effect
 -n                 10                         Estimated number of nodes that will mount file system
 -B                 262144                     Block size
 -Q                 user;group;fileset         Quotas enforced
                    none                       Default quotas enabled
 --filesetdf        No                         Fileset df enabled?
 -V                 13.23 (3.5.0.7)            File system version
 --create-time      Mon Sep 23 12:23:50 2013   File system creation time
 -u                 Yes                        Support for large LUNs?
 -z                 No                         Is DMAPI enabled?
 -L                 4194304                    Logfile size
 -E                 Yes                        Exact mtime mount option
 -S                 No                         Suppress atime mount option
 -K                 whenpossible               Strict replica allocation option
 --fastea           Yes                        Fast external attributes enabled?
 --inode-limit      140288                     Maximum number of inodes
 -P                 system                     Disk storage pools in file system
 -d                 nsdB5;nsdB6                Disks in file system
 --perfileset-quota no                         Per-fileset quota enforcement
 -A                 no                         Automatic mount option
 -o                 none                       Additional mount options
 -T                 /fsB                       Default mount point
 --mount-priority   0                          Mount priority
[root@computeB5 ~]#

There are no special configuration options set for the file system. Note that the inode limit has not been specified explicitly, so a relatively low limit is calculated by default.
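To verify the accessibility and usage of the file system, the standard GPFS query commands can be used; this is a minimal sketch rather than a complete verification procedure:

# /usr/lpp/mmfs/bin/mmlsmount fsB -L
# /usr/lpp/mmfs/bin/mmdf fsB

The mmlsmount command lists the nodes that currently have fsB mounted, and mmdf shows the capacity and free space per NSD, which makes an unbalanced or missing disk easy to spot.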


Chapter 5. Toolkits for verifying health (individual diagnostics)

To determine the health of a cluster, it is necessary to get the current state of all of the components that make it up. In most installations, these are the following components:
 Compute nodes
 Ethernet network
 InfiniBand network
 Storage

In this chapter, we describe the IBM Cluster Health Check (CHC) toolkit, which is used to perform checks on these components. Working with these results, we are able to state if the cluster is healthy.

This chapter provides information about the following topics:
 Introduction to CHC
 Tool output processing methods
 Compute node
 Ethernet network: Port status, speed, bandwidth, and port errors
 InfiniBand: Port status, speed, bandwidth, port errors, and subnet manager
 File system: Accessibility, usage, and read/write performance

5.1 Introduction to CHC

The CHC framework provides an environment for integrating and running a set of individual checks, which can be performed on a single node or a group of nodes. The tests are categorized into different tool types and groups, so that checks can be performed for only a specific part of the cluster. Because the framework and toolset are extensible, users can also create their own groups and tools for checking different components of the cluster.

The primary wrapper tool is hcrun, which provides access to all tools and passes them the environment.

5.1.1 Requirements

The CHC toolkit should be installed on the IBM Extreme Cloud Administration Toolkit (xCAT) server, because many of the individual checks and the framework itself use xCAT commands. In addition, some of the health checks are Message Passing Interface (MPI) parallel applications that run using the Parallel Operating Environment (POE) runtime environment.

The following list includes the required software packages:
 xCAT
 POE environment
 Python 2.6

Note: The IBM CHC tools can be downloaded from the IBM High Performance Computing (HPC) Central website:

https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Welcome%20to%20High%20Performance%20Computing%20%28HPC%29%20Central/page/IBM%20Cluster%20Health%20Check

5.1.2 Installation

The CHC toolkit is delivered as a Red Hat Package Manager (RPM) package, and is installed as user root with the rpm command:
# rpm -ihv cluster-healthcheck-1.1.0.0-0.noarch.rpm

The rpm command installs to /opt/ibmchc.

To run the tool directly from the command line, there are two possibilities:
 Edit ~/.profile and add the following line:
# export PATH=$PATH:/opt/ibmchc/bin
 Create a symbolic link (symlink) to /opt/ibmchc/bin/hcrun with the following command:
# ln -s /opt/ibmchc/bin/hcrun /usr/bin/hcrun

5.1.3 Configuration

The CHC and verification tools are developed to check the health of individual cluster components, or a set of cluster components. There are three different types of configuration files: global, tool, and group.

The general format of the configuration file is keyword value pairs defined under sections. The global settings of the suite can be changed in /etc/opt/ibmchc/chc.conf, as shown in Example 5-1.

Example 5-1 The /etc/opt/ibmchc/chc.conf file
[MasterConfig]

# The list of high-level directories under
# which configuration of different health
# checking tools and groups are defined.
configpath = /opt/ibmchc/tools:/opt/ibmchc/ppc64/tools:/opt/ibmchc/x86_64/tools:/opt/ibmchc/conf

# stdout results of health checking tools
toolsoutput = /var/opt/ibmchc/log/toolsoutput

# The summary of health checking tools
# is logged in this file
summaryfile = /var/opt/ibmchc/log/hcrun_sum.log

# The hcrun logs are stored in this file.
hcrunlog = /var/opt/ibmchc/log/hcrun.log

# The user id which is used to run MPI jobs
# Tools can access user name using env variable $HC_USER
# No default user
hcuser =

# The comma-separated UFM server host names or ip
# addresses. No default values.
# Tools can access subnet Managers using env variable $HC_SUBNET_MANAGERS
subnet_managers =

# The comma-separated IB Switch node groups
# Tools can access IB Swicth groups using env variable $HC_IB_SWITCHES
# No default values.
ib_switches =

The defaults are fine for most users. However, the following parameters need to be updated to match each unique cluster’s environment:
 The hcuser parameter
Some of the tests run with the IBM Parallel Environment (PE). These tests must not be submitted as root. Therefore, a dedicated user must be available. The value is exported through the HC_USER environment variable. The user must have authority to run applications in the PE environment.
 The subnet_managers parameter
If an InfiniBand subnet manager is available, its host name can be specified in the configuration file. Specify multiple servers as a comma-separated list. The value is exported through the HC_SUBNET_MANAGERS environment variable. Tools that require access to subnet manager servers will use this environment variable.

 The ib_switches parameter
Specify the xCAT node groups for the InfiniBand switches as a comma-separated list. Note that this requires the switches to be in the xCAT database and have Secure Shell (SSH) access enabled. Different groups for different models of switches should be created in the xCAT database. The value is exported through the HC_IB_SWITCHES environment variable. Tools that require access to InfiniBand switches will use this environment variable. (A filled-in sketch of these parameters follows.)
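For reference, the following lines show how these parameters might look after they have been adapted to a cluster such as ClusterA. The values are only examples: hpcuser is a hypothetical non-root user that is allowed to run PE jobs, smA1 is the UFM server introduced in Chapter 4, and ibswitches is a hypothetical xCAT node group that contains the InfiniBand switches:

hcuser = hpcuser
subnet_managers = smA1
ib_switches = ibswitches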

Tool configuration files
Each of the available tools has its own configuration file. Each tool configuration needs to be somewhere in or under the path that is specified by the configpath parameter in the global configuration file. All of the tool configuration files must have the .conf extension. Table 5-1 shows all of the available parameters for the tool configuration file.

Table 5-1 Health check tool-specific configuration file

[HealthCheckTool] (Required: Yes; Value type: n/a)
Section header for the configuration file.

name (Required: Yes; Value type: String)
Name of the check.

description (Required: Yes; Value type: Text)
Short description of the tool.

type (Required: Yes; Value type: String)
Tool type to which this tool belongs. For display purposes only.

executable (Required: No; Value type: Path to executable script or binary)
Specifies the program to be run. Default if not set: tool config path without the .conf extension.

copy (Required: No; Value type: yes | no)
Specifies if the tool needs to be copied to the node or node group (uses xdsh -e). Default if not set: yes.

starttool (Required: No; Value type: all | local)
Specifies if the tool needs to be run on all nodes (provided in a node list to hcrun) or just on the node where hcrun is running. Default if not set: all.

arguments (Required: No; Value type: String)
Optional command-line arguments for the executable file. It is possible to specify non-default arguments in the hcrun command line after the tool name argument. If any arguments have multiple words separated with a space, then they need to be enclosed within double quotation marks.

environment (Required: No; Value type: String)
Optional environment variables exported for the tool. Declare multiple variables delimited with a semicolon (;). Each environment variable is defined as variable name and value pairs separated by an equal sign (=). If hcrun -n is specified, the node range is exported as HC_NODERANGE.

precheck (Required: No; Value type: Toolnames)
Optional comma-separated list of toolnames that run before this tool.

predatacollectiontool (Required: No; Value type: Toolnames)
Optional comma-separated list of toolnames that run before this tool to collect data.

postdatacollectiontool (Required: No; Value type: Toolnames)
Optional comma-separated list of toolnames that run after this tool to collect data.

processmethod (Required: No; Value type: xcoll | compare | xcoll,compare)
Optional comma-separated list of the supported tool output processing methods.

passexitcode (Required: No; Value type: Numeric)
Optional comma-separated list of exit codes that consider the test passed. The exit code 0 is considered the default pass exit code.

failexitcode (Required: No; Value type: Numeric)
Optional comma-separated list of exit codes that consider the test failed. Non-zero exit codes are considered fail by default if no pass exit codes are defined. If pass exit codes, warning exit codes, or error exit codes are defined, exit codes that are not members of those lists are considered fail exit codes.

warningexitcode (Required: No; Value type: Numeric)
Optional comma-separated list of exit codes that consider the test result a warning.

errorexitcode (Required: No; Value type: Numeric)
Optional comma-separated list of exit codes that consider the test result an error.

Example 5-2 shows that the executable file is copied to all of the specified nodes, runs on all of the nodes, and that only an exit code of zero (0) is considered to be a successful check. The fs_usage check is part of the tool type node. The fs_usage check supports both tool output processing methods (xcoll and compare), as shown in “Tool output processing methods” on page 44.

Example 5-2 The /opt/ibmchc/tools/node/fs_usage.conf file
[HealthCheckTool]
name=fs_usage
description=Checks the usage of the provided file systems
type=node
executable=/opt/ibmchc/tools/node/fs_usage
arguments=/tmp /var/tmp
copy=yes
starttool=all
processmethod=xcoll,compare
passexitcode=0
errorexitcode=128

The configuration file makes it easy to implement new checks in the hcrun framework. Even the scripting language can be chosen freely, for example a Bash or Python script, if the interpreter is available on the target node.

There are two types of scripts:
 Simple query scripts that query values and output them
 Check scripts that have logic implemented and provide exit codes

Follow these steps to run a new script:
1. Create the check script.
2. Copy your script to the appropriate arch folder (/opt/ibmchc/tools, /opt/ibmchc/ppc64/tools, or /opt/ibmchc/x86_64/tools).
3. Create the configuration file for the script in the same folder, with a .conf extension.
4. Make the script executable.

The new tool shows up when hcrun -l is run.
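The following minimal sketch shows what such a new check could look like. The script name swap_check, its location, and its behavior are purely illustrative and are not part of the delivered toolkit. The script follows the check-script pattern: it prints the value that it measures and signals the result through its exit code.

A hypothetical /opt/ibmchc/x86_64/tools/node/swap_check script:

#!/bin/bash
# Report the amount of swap space in use and fail if any swap is used.
used=$(awk '/SwapTotal/ {t=$2} /SwapFree/ {f=$2} END {print t-f}' /proc/meminfo)
echo "Swap in use (kB): $used"
# Exit 0 (pass) only if no swap is in use; any other value is treated as a failure.
[ "$used" -eq 0 ]

A matching /opt/ibmchc/x86_64/tools/node/swap_check.conf file, using only parameters from Table 5-1:

[HealthCheckTool]
name=swap_check
description=Fails if any swap space is in use on the node
type=node
executable=/opt/ibmchc/x86_64/tools/node/swap_check
copy=yes
starttool=all
processmethod=xcoll
passexitcode=0

After the script is made executable, it appears in the hcrun -l output and can be run with, for example, hcrun -n <noderange> -m xcoll -t swap_check.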

Group configuration files
The group configuration file defines multiple tools that should run together in a specific order. There are two types of groups:
 Predefined groups that come with the RPM
Predefined groups are in /opt/ibmchc/conf/groups/chc. These configuration files should not be edited.
 User-defined groups
User-defined groups are in /opt/ibmchc/conf/groups/user.

Example 5-3 on page 43 shows a group configuration file example:
 All tools are run in the given order.
 The section header is called [HealthCheckToolsGroup].
 The configuration file has name and description attributes.
 tools is a comma-separated list of toolnames in a specific order.

Example 5-3 The /opt/ibmchc/conf/groups/chc/node_check.conf file
[HealthCheckToolsGroup]
name=node_check
description=This group has node check tools
tools=cpu, memory, gpfs_state, leds, temp, nfs_mounts, firmware, lsf_state, fs_usage, os

To overwrite tool keywords of the tool configuration files (such as arguments or environment), add the following line to the group configuration file, as shown in Example 5-4:

<toolname>.<keyword>=<value>

Example 5-4 Same as the previous example, but with overwritten fs_usage arguments
[HealthCheckToolsGroup]
name=node_check
description=This group has node check tools
tools=cpu, memory, gpfs_state, leds, temp, nfs_mounts, firmware, lsf_state, fs_usage, os
fs_usage.arguments=/tmp

To create such a group configuration file, only one step is needed: create the configuration file, with the .conf extension, in /opt/ibmchc/conf/groups/user. The new group shows up when hcrun -l is run.

5.1.4 Usage

The primary program to run is in /opt/ibmchc/bin, and is called hcrun. The program should be run as a superuser. When called with the -h flag, a list of the supported options is displayed, as shown in Example 5-5.

Example 5-5 The hcrun supported options list
Usage: hcrun [options] [[ -s seed_node ] -m { xcoll | compare }] { group[,group,...] | -t tool[,tool,...] [tool_args] }

The interactive tool to check cluster health

Options:
  -h                show this help message and exit
  -l                List the configured Cluster Health Checking tools and groups
  -v                Run health check and hcrun command in verbose mode
  -c                Run all health checks of a group even if one or more checks fail
  -n NODERANGE      The node range on which the health checks would be run
  -s SEED_NODE      The seed node against which the query health check results are compared
  -m PROCESS_MODE   The method ( xcoll | compare ) used to process the health check results
  -p                Preview the order in which the tools of the specified groups are ran
  -f                Force the execution of health checks on nodes ignoring nodes can not be communicated

The hcrun -l command lists the available check tools, sorted by tool type and groups. All intrusive tools have a run_ prefix. Intrusive tools should not be run when the nodes are in production. All tools are covered in detail in the following sections of this chapter.

5.2 Tool output processing methods

The CHC framework provides different methods to process the output of a tool. The current version provides the following modes:
 The plain (no post processing) mode
 The xcoll mode
 The compare mode

Some tools cannot support compare or xcoll, because the output of those tools might not be in a format suitable to use the xcoll and compare methods. Therefore, the tools must define supported methods, as shown in “Tool configuration files” on page 40. To run a tool with one of these methods, use the following syntax:

hcrun -m {xcoll|compare}

This works for groups as well. If a tool in a group does not support the specified method, the tool falls back to plain mode. A run without the -m flag runs the tool or group without further processing (plain).

5.2.1 The plain mode

The plain mode is the easiest of all three output processing methods. In plain mode, hcrun runs the tool and shows the plain output without changing anything about it. This is useful to check values of individual nodes. The following configuration keywords are used:
 Tool config keyword value: processmethod=‘’ (not defined)
 No hcrun option

5.2.2 The xcoll mode

In xcoll mode, hcrun runs the tool. Afterward, it pipes the output to the xCAT xcoll command, and xcoll summarizes identical output of different nodes to one output group. Example 5-6 shows the firmware tool with the xcoll method. The following configuration keywords are used:
 Tool config keyword value: processmethod=xcoll
 The hcrun command -m xcoll option

For more information about xcoll, see the following web page:

http://xcat.sourceforge.net/man1/xcoll.1.html

Example 5-6 The firmware tool with the xcoll method
# ./hcrun -n redbook -m xcoll -t firmware
====================================
computeA1,computeA2,computeA4,computeA6,computeA7,computeA8
====================================
BMC Firmware: 3.45 (1AOO47P 2013/10/04 12:15:42)
UEFI Version: 1.10 (FHE105LUS 2013/10/17)
====================================
computeA3,computeA5
====================================
BMC Firmware: 3.45 (1AOO47P 2013/10/04 12:15:42)
UEFI Version: 1.00 (FHD102GHI 2013/03/12)

5.2.3 The compare (config_check) mode

The compare processing method uses /opt/ibmchc/tools/config/config_check to compare the output of multiple nodes for tools that should output the same values each time. The following configuration keywords are used:
 Tool config keyword value: processmethod=compare
 The hcrun command -m compare option

The config_check process is a front end to the sinv xCAT command, which collects the output and creates a summary of the results. This check is not done automatically by sinv. For more information about sinv, see the following web page:

http://xcat.sourceforge.net/man1/sinv.1.html

The output from each node is compared with a baseline template1. The baseline can be created by the CHC framework on the initial run with the seed node variable. This creates a baseline template with the output of this node. The seed node value can be set with the $HC_SEED_NODE environment variable, or with the -s SEED_NODE command-line option of hcrun. On further invocations of the check, the results of the different nodes are checked against this baseline if no new seed node is specified.

Important: It is extremely important that the seed node has the expected results, so that subsequent invocations only reveal mismatches to a known good result.

In Example 5-7, a simplified version of the cpu check (only CPU model and governor) is run in compare mode with computeA1 as the baseline (seed node). The check fails because there are three different outputs. Note that computeA2 and computeA4 have the same output as computeA1, so these nodes are fine, but computeA5 has different output. The computeA6 and computeA3 nodes have the same output as each other, but it is different from the computeA1 output.

Example 5-7 First run with non-existing template, with computeA1 as the seed node
# ./hcrun -n redbook -s computeA1 -m compare -t cpu2

The following nodes match /opt/ibmchc/conf/templates/cpu2.template.config_check:
computeA2,computeA1,computeA4
The following nodes match /opt/ibmchc/conf/templates/cpu2.template.config_check_1:
computeA5
The following nodes match /opt/ibmchc/conf/templates/cpu2.template.config_check_2:
computeA6,computeA3

Info:Details can be found in file:
/var/opt/ibmchc/log/cpu2.template/config_check.log.20131218_184859
The health check tool cpu2                  [ FAILED ]

1 Templates are a sinv concept. The same results from a group of nodes are gathered into a single result called a template. If a node range has multiple results, multiple templates are created.

In Example 5-8, the log file shows the whole process. The config_check process creates a new template with the output of computeA1. Afterward, the outputs of all other nodes are compared with this template.

Example 5-8 Log file with baseline template generation
# /var/opt/ibmchc/log/cpu2.template/config_check.log.20131218_184859
- Create template in /opt/ibmchc/conf/templates/cpu2.template.config_check
- Use node computeA1
Remove old template
Copy /opt/ibmchc/x86_64/tools/node/cpu2 to computeA1:/tmp
* Template Results in /opt/ibmchc/conf/templates/cpu2.template.config_check
Copy /opt/ibmchc/x86_64/tools/node/cpu2 to redbook:/tmp
Building Report.

Command Complete.
Check report in /var/opt/ibmchc/log/cpu2.template/cpu2.template.sinvout.20131218_184859.

-- Checking results --
Command started with following input.
xdsh cmd:xdsh redbook -v /tmp/cpu2 .
Template path:/opt/ibmchc/conf/templates/cpu2.template.config_check.
Template cnt:10.
Remove template:NO.
Output file:/var/opt/ibmchc/log/cpu2.template/cpu2.template.sinvout.20131218_184859.
Exactmatch:NO.
Ignorefirst:NO.
Seed node:None.
file:None.

The following nodes match /opt/ibmchc/conf/templates/cpu2.template.config_check:
computeA2,computeA1,computeA4
The following nodes match /opt/ibmchc/conf/templates/cpu2.template.config_check_1:
computeA5
The following nodes match /opt/ibmchc/conf/templates/cpu2.template.config_check_2:
computeA6,computeA3
===============================================
Comparison against baseline for computeA3

- CPU Model: Intel(R) Xeon(R) CPU X5560 @ 2.80GHz
?                                 ------
+ CPU Model: Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
?                                 ++++++++
Changed from computeA1 to computeA3

===============================================
Comparison against baseline for computeA5

- scaling_governor: ondemand
?                    ^^^ ^
+ scaling_governor: performance
?                   ++++ ^  ^^
Changed from computeA1 to computeA5

===============================================

In Example 5-9, no new seed node is specified, and there are baseline templates from the previous run, so the CHC framework compares the output to the existing template.

Example 5-9 The same tool with existing templates # ./hcrun -n redbook -m compare -t cpu2

The following nodes match /opt/ibmchc/conf/templates/cpu2.template.config_check: computeA2,computeA1,computeA4 The following nodes match /opt/ibmchc/conf/templates/cpu2.template.config_check_1: computeA5 The following nodes match /opt/ibmchc/conf/templates/cpu2.template.config_check_2: computeA6,computeA3

Info:Details can be found in file: /var/opt/ibmchc/log/cpu2.template/config_check.log.20131218_193430

The health check tool cpu2 [ FAILED ]

# cat /var/opt/ibmchc/log/cpu2.template/config_check.log.20131218_193430 Copy /opt/ibmchc/x86_64/tools/node/cpu2 to redbook:/tmp Building Report.

Command Complete. Check report in /var/opt/ibmchc/log/cpu2.template/cpu2.template.sinvout.20131218_193430.

-- Checking results -- Command started with following input. xdsh cmd:xdsh redbook -v /tmp/cpu2 . Template path:/opt/ibmchc/conf/templates/cpu2.template.config_check. Template cnt:10. Remove template:NO. Output file:/var/opt/ibmchc/log/cpu2.template/cpu2.template.sinvout.20131218_184859. Exactmatch:NO. Ignorefirst:NO. Seed node:None. file:None.

The following nodes match /opt/ibmchc/conf/templates/cpu2.template.config_check: computeA2,computeA1,computeA4
The following nodes match /opt/ibmchc/conf/templates/cpu2.template.config_check_1: computeA5
The following nodes match /opt/ibmchc/conf/templates/cpu2.template.config_check_2: computeA6,computeA3
======
Comparison against baseline for computeA3

- CPU Model: Intel(R) Xeon(R) CPU X5560 @ 2.80GHz
? ------
+ CPU Model: Intel(R) Xeon(R) CPU X5680 @ 3.33GHz
? ++++++++
Changed from computeA1 to computeA3
======
-
======
Comparison against baseline for computeA5

- scaling_governor: ondemand
? ^^^ ^
+ scaling_governor: performance
? ++++ ^ ^^
Changed from computeA1 to computeA5
======
-

Remember: After any maintenance that changes the results, a new template has to be generated.

5.3 Compute node

In this section, we look at the hcrun tools that analyze a compute node’s CPU, memory, and hardware. Use the following syntax for all of the commands used in the following sections:
# hcrun -n <noderange> -t <tool> [additional tool args]

The following command-line examples use the plain processing method (see “The plain mode” on page 44) to show the actual output of each tool. In practice, with a large number of nodes, run in xcoll or compare mode:
# hcrun -n <noderange> -m <xcoll|compare> -t <tool> [additional tool args]
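For example, to run the cpu check on an entire node group and collapse identical results, the xcoll method can be combined with the check. A brief sketch, reusing the redbook node group from the examples that follow:
# ./hcrun -n redbook -m xcoll -t cpu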

Note: The following sections are not directly related to tool types or groups. Some of the compute node tools are x86_64 specific. In the following examples, hcrun was not added to $PATH and no symlink was created, which is why the commands are started as ./hcrun.

5.3.1 The leds check

The leds check has the following characteristics:
Intrusive: This check is non-intrusive.
Supported processing method: Plain.
Purpose: Checks the fault LED of the given node.

This check requires access to the remote management facility (integrated management module (IMM), flexible service processor (FSP), or similar), as shown in Example 5-10.

Example 5-10 Fault LED active
# ./hcrun -n computeA1 -t leds
Running rvitals for noderange computeA1
computeA1: LED 0x0000 (Fault) active to indicate system error condition.
computeA1: LED 0012 () active to indicate Sensor 0x71 (Power Supply 2) error.
rvitals FAIL!

In general, there should not be any fault LEDs active, as shown in Example 5-11. If there are any, the reason and effect must be considered.

Example 5-11 Fault LED is off
# ./hcrun -n computeA1 -t leds
Running rvitals for noderange computeA1
computeA1: No active error LEDs detected
rvitals Good.

5.3.2 The cpu check

The cpu check has the following characteristics:
Intrusive: This check is non-intrusive.
Supported processing methods: Plain, xcoll, and compare.
Purpose: Collects CPU and governor information about a node.

Example 5-12 shows sample output of this tool. The output shows the CPU model, Turbo mode and Hyper threading status, number of cores and sockets, and the governor settings. Turbo mode is not enabled on this node. Different outputs per node might be intentional, or they might indicate a configuration issue.

Example 5-12 The cpu tool run
# ./hcrun -n computeA1 -t cpu
computeA1: CPU Model: Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz
computeA1: Turbo HW: Off
computeA1: Turbo Engaged: No
computeA1: HyperThreading HW: Enable
computeA1: Socket(s): 2
computeA1: Core(s) per socket: 12
computeA1: Active Cores: 48
computeA1: scaling_governor: userspace
computeA1: scaling_max_freq: 2700000
computeA1: scaling_min_freq: 1200000

5.3.3 The memory check

The memory check has the following characteristics:
Intrusive: This check is non-intrusive.
Supported processing methods: Plain, xcoll, and compare.
Purpose: This tool returns the total memory available on the node, and details for each dual inline memory module (DIMM) that is installed in the system. Empty DIMM slots are shown as well.

Example 5-13 shows the total memory available to the kernel, the size of the installed DIMMs, and the speed and configured clock speed. DIMM 8 is empty. Different outputs per node might be intentional, or they might indicate a configuration issue.

Example 5-13 Memory check
# ./hcrun -n computeA1 -t memory
computeA1: Memory 56430 MB
computeA1:
computeA1: "8192 MB","DIMM 1","Bank 1","1866 MHz","1867 MHz"
computeA1: "8192 MB","DIMM 2","Bank 2","1866 MHz","1867 MHz"
computeA1: "8192 MB","DIMM 3","Bank 3","1866 MHz","1867 MHz"
computeA1: "8192 MB","DIMM 4","Bank 4","1866 MHz","1867 MHz"
computeA1: "8192 MB","DIMM 5","Bank 5","1866 MHz","1867 MHz"
computeA1: "8192 MB","DIMM 6","Bank 6","1866 MHz","1867 MHz"
computeA1: "8192 MB","DIMM 7","Bank 7","1866 MHz","1867 MHz"
computeA1: "No Module Installed","DIMM 8","Bank 8","Unknown","Unknown"

5.3.4 The os check

The os check has the following characteristics:
Intrusive: This check is non-intrusive.
Supported processing methods: Plain, xcoll, and compare.
Purpose: This tool checks the operating system and kernel version on a node.

Example 5-14 shows Red Hat Enterprise Linux (RHEL) Server 6.4 (Santiago) with Kernel 2.6.32 running. All nodes should run the same software versions.

Example 5-14 Operating system and kernel version
# ./hcrun -n computeA1 -t os
computeA1: Description: Red Hat Enterprise Linux Server release 6.4 (Santiago)
computeA1: Release: 6.4
computeA1: Codename: Santiago
computeA1: Kernel: 2.6.32-358.el6.x86_64

5.3.5 The firmware check

The firmware check has the following characteristics:
Intrusive: This check is non-intrusive.
Supported processing methods: Plain, xcoll, and compare.
Purpose: This tool checks the baseboard management controller (BMC) and Unified Extensible Firmware Interface (UEFI) firmware on a node.

This check requires access to the remote management facility (IMM, FSP, or similar). In Example 5-15, the BMC firmware is version 3.45 and UEFI is version 1.10. Different outputs per node might be intentional (firmware tests), or they might indicate a firmware update issue.

Example 5-15 Firmware versions
# ./hcrun -n computeA1 -t firmware
computeA1: BMC Firmware: 3.45 (1AOO47P 2013/10/04 12:15:42)
computeA1: UEFI Version: 1.10 (FHE105LUS 2013/10/17)

5.3.6 The temp check

The temp check has the following characteristics:
Intrusive: This check is non-intrusive.
Supported processing methods: Plain, xcoll, and compare.
Purpose: This tool checks all temperature sensors on a node.

This check requires access to the remote management facility (IMM, FSP, or similar).

Example 5-16 shows sample output. The number of sensors can vary per node.

Example 5-16 Temperature checking example output
# ./hcrun -n computeA1 -t temp
computeA1: Ambient Temp: 27 C (81 F)
computeA1: CPU1 VR Temp VCO: 50 C (122 F)
computeA1: DIMM AB Temp: 46 C (115 F)
computeA1: GPU Outlet Temp: N/A
computeA1: HDD Inlet Temp: 43 C (109 F)
computeA1: HDD Outlet Temp: N/A
computeA1: Mezz Card Temp: N/A
computeA1: PCH Temp: 44 C (111 F)
computeA1: PCI Riser 1 Temp: 32 C (90 F)
computeA1: PCI Riser 2 Temp: N/A
computeA1: PIB Ambient Temp: 37 C (99 F)

5.3.7 The run_daxpy check

The run_daxpy check has the following characteristics:
Intrusive: This check is intrusive.
Supported processing method: Plain.
Purpose: This tool checks the memory bandwidth on a compute node.

Further details about the underlying daxpy check can be found in Appendix A, “Commonly used tools” on page 75.

Example 5-17 uses default values and lets the daxpy test determine the expected performance rate. The example test failed because one node did not meet the expected result (see Example 5-18).

Example 5-17 The run_daxpy check failed
# ./hcrun -n computeA1+1,computeA7 -t run_daxpy
Error: daxpy did not achieve expected performance.
Info: daxpy output can be found in /var/opt/ibmchc/data/daxpy.out
The health check tool run_daxpy [ FAILED ]

Example 5-18 shows the created output file located under /var/opt/ibmchc/data/daxpy.out. There is a performance failure on computeA7. The computeA1 and computeA2 nodes met the expected values. Further analysis must be performed on computeA7 to find the reason.

Tip: Running run_daxpy with the -v option (# hcrun -n <noderange> -t run_daxpy -v) prints the output file to the console, too. For an example of additional arguments, see Example 5-19 on page 52.

Example 5-18 The run_daxpy output file
computeA1: estimated EXPECTED_PERF 85836
computeA1: computeA1: Expected_DAXPY=85836 OMP_NUM_THREADS=24 perf (DAXPY BW)= 93885
computeA1: computeA1: PERFORMANCE_SUCCESS: 93885 GF;
computeA2: estimated EXPECTED_PERF 85836

computeA2: computeA2: Expected_DAXPY=85836 OMP_NUM_THREADS=24 perf (DAXPY BW)= 93788
computeA2: computeA2: PERFORMANCE_SUCCESS: 93788 GF;
computeA7: estimated EXPECTED_PERF 85836
computeA7: computeA7: PERFORMANCE_FAILURE: 69105 MB/s;
computeA7: computeA7: Expected_DAXPY=85836 OMP_NUM_THREADS=24 perf (DAXPY BW)= 69105
computeA7: computeA7: PERFORMANCE_FAILURE: 69105 MB/s;

5.3.8 The run_dgemm check

The run_dgemm check has the following characteristics:
Intrusive: This check is intrusive.
Supported processing method: Plain.
Purpose: This tool checks the CPU performance on a compute node.

Further details about the underlying dgemm check can be found in Appendix A, “Commonly used tools” on page 75.

Example 5-19 lets the dgemm test determine the expected performance rate. All nodes meet the expected performance. The example also shows how additional arguments can be passed to override default values (the same syntax also works for “The run_daxpy check” on page 51).

Example 5-19 The run_dgemm check with optional arguments
# ./hcrun -n redbook -t run_dgemm -v -u poeuser -d /root/systemcheck/bin
computeA4: Cores=16 perf (GFlop)= 369
computeA2: Cores=16 perf (GFlop)= 368
computeA6: Cores=16 perf (GFlop)= 368
computeA8: Cores=16 perf (GFlop)= 368
computeA5: Cores=16 perf (GFlop)= 368
computeA3: Cores=16 perf (GFlop)= 368
computeA1: Cores=16 perf (GFlop)= 368
computeA7: Cores=16 perf (GFlop)= 367
PERFORMANCE_SUCCESS: EXPECTED_PERF 311 : computeA4 369 GF
PERFORMANCE_SUCCESS: EXPECTED_PERF 311 : computeA2 368 GF
PERFORMANCE_SUCCESS: EXPECTED_PERF 311 : computeA6 368 GF
PERFORMANCE_SUCCESS: EXPECTED_PERF 311 : computeA8 368 GF
PERFORMANCE_SUCCESS: EXPECTED_PERF 311 : computeA5 368 GF
PERFORMANCE_SUCCESS: EXPECTED_PERF 311 : computeA3 368 GF
PERFORMANCE_SUCCESS: EXPECTED_PERF 311 : computeA1 368 GF
PERFORMANCE_SUCCESS: EXPECTED_PERF 311 : computeA7 367 GF

5.4 Ethernet network: Port status, speed, bandwidth, and port errors

In this section, we show which tools can be used to determine the status of the network infrastructure. Because tests for the switch infrastructure differ from vendor to vendor, we focus on the compute nodes. At the time of writing this book, the CHC toolkit does not have any tools to check Ethernet, so this section does not use the CHC toolkit to perform the checks. However, the following examples could easily be added to the CHC framework as new tools.

5.4.1 Ethernet firmware and drivers

Firmware and driver versions of an Ethernet adapter can be checked with the ethtool Linux command. Using the -i parameter provides driver information for the specified interface:
# ethtool -i eth0
driver: bnx2
version: 2.2.1
firmware-version: bc 5.2.3 NCSI 2.0.10
bus-info: 0000:0b:00.0

Example 5-20 demonstrates how the ethtool command can be used in a clustered environment to get information about all of the Ethernet interfaces in the compute nodes. This helps in determining differences and, for example, might show back-level firmware or drivers. The example is limited because it does not take into account that different types of adapters might be in the compute nodes.

Example 5-20 Running ethtool on a group of nodes # xdsh redbook 'for i in $( -1d /sys/class/net/eth[0-255]); do DEVICE=$( $i); $DEVICE;ethtool -i $DEVICE; done'|xcoll ======computeB3,computeB2,computeB1,computeB4 ======eth0 driver: bnx2 version: 2.2.1 firmware-version: bc 5.0.6 NCSI 2.0.3 bus-info: 0000:0b:00.0 eth1 driver: bnx2 version: 2.2.1 firmware-version: bc 5.0.6 NCSI 2.0.3 bus-info: 0000:0b:00.1

======computeB5,computeB6 ======eth0 driver: bnx2 version: 2.2.1 firmware-version: bc 5.2.2 NCSI 2.0.6 bus-info: 0000:0b:00.0

eth1
driver: bnx2
version: 2.2.1
firmware-version: bc 5.2.2 NCSI 2.0.6
bus-info: 0000:0b:00.1

The lspci | grep Ethernet command (see Example 5-21) can be used to get information about the different types and models of Ethernet adapters in the compute nodes.

Example 5-21 Command to get information about the type and model of Ethernet adapters
# xdsh redbook 'lspci|grep Ethernet'|xcoll
======
redbook
======
0b:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
0b:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
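If the node group contains more than one adapter model, the firmware comparison can be narrowed down per interface. The following is a minimal sketch only; it assumes that the interfaces are named ethN and that the redbook node group is defined, as in the previous examples:
# xdsh redbook 'for i in /sys/class/net/eth[0-9]*; do d=$(basename $i); echo "$d: $(ethtool -i $d | grep -E "^(driver|firmware-version)" | tr "\n" " ")"; done' | xcoll
Nodes that report the same driver and firmware combination are grouped together by xcoll, so outliers become visible at a glance.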

5.4.2 Ethernet port state

The link state of a specific Ethernet interface is best checked with the ethtool command. The command shows the actual link speed, the duplex settings, and more, as shown in Example 5-22.

Example 5-22 The ethtool output with link up # ethtool eth0 Settings for eth0: Supported ports: [ TP ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Advertised pause frame use: No Advertised auto-negotiation: Yes Speed: 1000Mb/s Duplex: Full Port: Twisted Pair PHYAD: 1 Transceiver: internal Auto-negotiation: on MDI-X: Unknown Supports Wake-on: g Wake-on: g Link detected: yes

In contrast to Example 5-22 on page 54, Example 5-23 shows an adapter with no link.

Example 5-23 The ethtool output with link down # ethtool eth1 Settings for eth1: Supported ports: [ TP ] Supported link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Supports auto-negotiation: Yes Advertised link modes: 10baseT/Half 10baseT/Full 100baseT/Half 100baseT/Full 1000baseT/Full Advertised pause frame use: No Advertised auto-negotiation: Yes Speed: Unknown! Duplex: Half Port: Twisted Pair PHYAD: 1 Transceiver: internal Auto-negotiation: on MDI-X: Unknown Supports Wake-on: g Wake-on: g Link detected: no

Again, the ethtool command can be run on a couple of nodes to determine if all of the network connections at least have an up link at the wanted speed (Example 5-24). The command might also show if the expected network devices are all available. Additionally, if the remote shell is not working to a particular node, it shows that there might be an issue with the management network.

Example 5-24 Running ethtool through the cluster
# xdsh redbook 'ethtool eth1|grep -E "(Speed|Duplex|Link detected)"'|xcoll
======
redbook
======
Speed: Unknown!
Duplex: Half
Link detected: no

5.4.3 Network settings

Network-specific settings can be modified in several ways, depending on the layer that you are looking at. There are interface-specific settings, kernel module settings, and the settings for Transmission Control Protocol (TCP), Internet Protocol (IP), User Datagram Protocol (UDP), and so on. Again, it is vital that the settings are consistent throughout the cluster.

This does not mean that they must be the same on every node, because there might be differences in the hardware. However, in most cases, they should be the same on the same hardware and software levels. Especially in heterogeneous environments, it is possible that necessary changes to default settings are not propagated on all affected nodes.

A tool, such as the configuration check tool described in “Ethernet network: Port status, speed, bandwidth, and port errors” on page 53, might be helpful to ensure that network-specific settings are consistent. Table 5-2 shows some of the tools and options to change parameters in the different layers of the networking stack; a short consistency-check sketch follows the table.

Table 5-2 Network settings
Layer  Tool  File

Network adapter ethtool /etc/sysconfig/network/ifcfg-

Kernel module modprobe, insmod /etc/modprobe.d/*.conf

Protocol sysctl /etc/sysctl.conf /proc/sys/net/* (temporary changes)
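At the protocol layer, a quick consistency check is to compare a few representative kernel parameters across the cluster. This is only a sketch; the parameters shown are examples and should be replaced with the ones that were actually tuned for the environment:
# xdsh redbook 'sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem' | xcoll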

5.4.4 Bonding

Bonding or ether channeling is the mechanism that is used to tie network interfaces together so that they form a new device. The purpose is either to provide a higher level of redundancy, or to increase the network throughput. For example, if there are two network adapters in the compute node and each is connected to a different switch in the network, an active-backup configuration can be created. Then the network connection remains active even when an adapter, a cable, or a switch fails.

Note: The setup and configuration of bonding devices is beyond the scope of this document.

Example 5-25 shows the status of a bonding interface configured as active-backup with two network adapters. Each configured bond device has an entry in the /proc file system in the net/bonding directory.

Example 5-25 Status of a bonding interface configured as active backup # cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v3.6.0 (September 26, 2009)

Bonding Mode: fault-tolerance (active-backup) Primary Slave: None Currently Active Slave: eth0 MII Status: up MII Polling Interval (ms): 100 Up Delay (ms): 0 Down Delay (ms): 0

Slave Interface: eth0 MII Status: up Speed: 1000 Mbps Duplex: full Link Failure Count: 0 Permanent HW addr: e4:1f:13:32:4c:d4 Slave queue ID: 0

Slave Interface: eth1 MII Status: down Speed: Unknown

Duplex: Unknown
Link Failure Count: 0
Permanent HW addr: e4:1f:13:32:4c:d6
Slave queue ID: 0

In Example 5-25, the secondary interface is down for some reason. But eth0 is still up and running, and the IP address configured on the bond device is still available.

In the cluster context, a distributed command can be run to show all bonded interfaces where a link might be missing. This might not affect the cluster in an active-backup configuration, but can have performance impacts when an active-active device loses one interface.
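The following sketch shows one way to do that. It assumes that the bonded device is named bond0 on all nodes; nodes with a slave interface that is down fall into their own xcoll group:
# xdsh redbook 'grep -E "^(Bonding Mode|Currently Active Slave|Slave Interface|MII Status)" /proc/net/bonding/bond0' | xcoll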

5.5 InfiniBand: Port status, speed, bandwidth, port errors, and subnet manager

This section covers the tests that can be performed on the InfiniBand (IB) environment of the cluster. These checks run on the compute nodes and on the IB switches.

5.5.1 The hca_basic check

The hca_basic check has the following characteristics:
Intrusive: The check is non-intrusive.
Supported processing methods: Plain, xcoll, and compare.
Purpose: The hca_basic check lists information about the IB adapters in the compute node, as shown in Example 5-26. This includes the vendor, model, and firmware version. Port information and connection speed are also displayed. Running this check on a node group gives a quick overview of the existing host channel adapters (HCAs) in the cluster, and whether that is still the wanted setup.

In Example 5-26, the link rate of computeA7 dropped to 4X Quad Data Rate (QDR). Differing results can indicate a configuration or hardware error. For example, unsteady link rates could point to broken InfiniBand cables.

Example 5-26 The hca_basic check run on a node group with the xcoll processing method # ./hcrun -n redbook -m xcoll -t hca_basic ======computeA7 ======OFED level: MLNX_OFED_LINUX-2.0-2.0.5 (OFED-2.0-2.0.5) mlx4_0: Adapter type: Mellanox Technologies MT27500 Family [ConnectX-3] Adapter fw: 2.11.1260 HW revision: 1 PCI speed: Gen3 Port 1: State: PORT_ACTIVE Active_mtu: 4096 Rate: 40 Gb/sec (4X QDR)

Link layer: InfiniBand
Subnet prefix: fe80:0000:0000:0000
Port 2:
State: PORT_DOWN

======computeA1,computeA2,computeA3,computeA4,computeA5,computeA6 ======OFED level: MLNX_OFED_LINUX-2.0-2.0.5 (OFED-2.0-2.0.5) mlx4_0: Adapter type: Mellanox Technologies MT27500 Family [ConnectX-3] Adapter fw: 2.11.1260 HW revision: 1 PCI speed: Gen3 Port 1: State: PORT_ACTIVE Active_mtu: 4096 Rate: 40 Gb/sec (4X FDR10) Link layer: InfiniBand Subnet prefix: fe80:0000:0000:0000 Port 2: State: PORT_DOWN

5.5.2 The ipoib check

The ipoib check has the following characteristics:
Intrusive: The check is non-intrusive.
Supported processing methods: Plain, xcoll, compare.
Purpose: The ipoib check shows the state, maximum transmission unit (MTU), and mode of the configured IP over InfiniBand (IPoIB) devices on the compute node (Example 5-27).

Example 5-27 The ipoib with xcoll processing method
# ./hcrun -n redbook -m xcoll -t ipoib
======
redbook
======
IPoIB: ib0
State: UP
MTU: 2044
Mode: datagram
IPoIB: ib1
State: UP
MTU: 4092
Mode: datagram
Recv_queue_size: 512
Send_queue_size: 512

5.5.3 The switch_module check

All switch_ tools are non-intrusive, support only plain mode, and have the following syntax:
Usage: # hcrun [-n IB switches] [-v] -t switch_...

To see the output while running the tools, use the -v option. If no InfiniBand switches or InfiniBand switch node groups are specified with the -n parameter, the switches defined by the ib_switches setting in /etc/opt/ibmchc/chc.conf are used, as shown in Example 5-28. The command lists the different modules that are installed in the given switch chassis. The result is written to a log file and not to the console.

Example 5-28 Switch modules
# ./hcrun -n ibswA1 -t switch_module
Results will be logged to: /var/opt/ibmchc/log/swmodule.2013-12-11-17:23:17

# cat /var/opt/ibmchc/log/swmodule.2013-12-11-17:23:17 ======ibswA1: Module Type Present Power ======ibswA1: MGMT2 SX6000 1 N/A ibswA1: MGMT1 IS5600MDC 0 N/A ibswA1: S01 SX6002 1 1 ibswA1: S02 SX6002 1 1 ibswA1: S03 SX6002 1 1 ibswA1: S04 SX6002 1 1 ibswA1: S05 SX6002 1 1 ibswA1: S06 SX6002 1 1 ibswA1: L01 SX_LEAF 0 0 ibswA1: L02 SX_LEAF 0 0 ibswA1: L03 SX_LEAF 0 0 ibswA1: L04 SX_LEAF 0 0 ibswA1: L05 SX6001 1 1 ibswA1: L06 SX6001 1 1 ibswA1: L07 SX6001 1 1 ibswA1: L08 SX6001 1 1 ibswA1: L09 SX6001 1 1 ibswA1: L10 SX6001 1 1 ibswA1: L11 SX6001 1 1 ibswA1: L12 SX6001 1 1 ibswA1: FAN1 IS5X00_FAN 1 N/A ibswA1: FAN2 IS5X00_FAN 1 N/A ibswA1: FAN3 IS5X00_FAN 1 N/A ibswA1: FAN4 IS5X00_FAN 1 N/A ibswA1: PS1 PS1648 1 N/A ibswA1: PS2 PS1648 1 N/A ibswA1: PS3 PS1648 1 N/A ibswA1: PS4 PS1648 1 N/A ibswA1: CPU CPU 1 N/A

5.5.4 The switch_ntp check

The switch_ntp check verifies that Network Time Protocol (NTP) is configured and active on the InfiniBand switches, as shown in Example 5-29. Although not really necessary for the operation of the network, it eases problem determination if all of the time stamps in the logs are in sync throughout the cluster.

Example 5-29 NTP active but not synchronized
# ./hcrun -n ibswA1 -t switch_ntp
======
ibswA1
======
Node count = 1
======
NTP is enabled.
Clock is unsynchronized.
No NTP peers or servers configured.

5.5.5 The switch_inv check

This tool lists the installed modules in a switch, together with their part and serial number, as shown in Example 5-30. The output is written to a log file and not to the console.

Example 5-30 Switch inventory
# ./hcrun -n ibswA1 -t switch_inv
Results will be logged to: /var/opt/ibmchc/log/swinv.2013-12-11-17:23:17

# cat /var/opt/ibmchc/log/swinv.2013-12-11-17:23:17 ======ibswA1: Module Type Part number Serial Number ======ibswA1: CHASSIS SX6512 00W0012 00W0012YK5020C00001 ibswA1: MGMT2 SX6000 90Y3804 90Y3791YK502000004S ibswA1: S01 SX6002 90Y3846 90Y3789YK50200000HN ibswA1: S02 SX6002 90Y3846 90Y3789YK50200000HX ibswA1: S03 SX6002 90Y3846 90Y3789YK50200000HS ibswA1: S04 SX6002 90Y3846 90Y3789YK50200000KK ibswA1: S05 SX6002 90Y3846 90Y3789YK50200000F9 ibswA1: S06 SX6002 90Y3846 90Y3789YK50200000KB ibswA1: L05 SX6001 MSX6001FR MT1233X00822 ibswA1: L06 SX6001 MSX6001FR MT1245X07530 ibswA1: L07 SX6001 MSX6001FR MT1233X00824 ibswA1: L08 SX6001 MSX6001FR MT1232X03564 ibswA1: L09 SX6001 MSX6001TR MT1226X00086 ibswA1: L10 SX6001 MSX6001TR MT1226X00085 ibswA1: L11 SX6001 90Y3806 90Y3807YK5020000140 ibswA1: L12 SX6001 90Y3806 90Y3807YK5020000141 ibswA1: FAN1 IS5X00_FAN 81Y1500 81Y1500YK5020C00007 ibswA1: FAN2 IS5X00_FAN 81Y1500 81Y1500YK5020C00008 ibswA1: FAN3 IS5X00_FAN 81Y1500 81Y1500YK5020C00009 ibswA1: FAN4 IS5X00_FAN 81Y1500 81Y1500YK5020C00010 ibswA1: PS1 PS1648 900-1648-00 00322 ibswA1: PS2 PS1648 900-1648-00 00323 ibswA1: PS3 PS1648 900-1648-00 00324 ibswA1: PS4 PS1648 900-1648-00 00325 ibswA1: CPU CPU SA000203-B MT1229X01207

5.5.6 The switch_health check

The switch_health check reports the health of the switches given in the node list, as shown in Example 5-31. The output is written to a log file and not to the console. This example shows voltage problems of some leafs.

Example 5-31 Health report
# ./hcrun -n ibswA1 -t switch_health
Results will be logged to: /var/opt/ibmchc/log/swhealth.2013-12-11-17:43:23

# cat /var/opt/ibmchc/log/swhealth.2013-12-11-17:43:23
ibswA1: HEALTH DAEMON REPORT
ibswA1: ======
ibswA1: ERR 2013/04/15 13:55:52 : Leaf number 12 voltage is out of range
ibswA1: NTC 2013/04/15 13:56:52 : Leaf number 12 voltage is in range
ibswA1: ERR 2013/05/21 11:28:52 : Leaf number 11 voltage is out of range
ibswA1: NTC 2013/05/21 11:29:51 : Leaf number 11 voltage is in range

5.5.7 The switch_clk check

By using the time of the management station, and given a clock skew that can be set with the HC_CLKSKEW_ALLOWED environment variable, the switch_clk tool determines if the time is consistent throughout the cluster. In Example 5-32, a mismatch is detected.

Requirement: To see output, the verbosity option (-v) is needed.

Example 5-32 The switch_clk check reports an error
# ./hcrun -n ibswA1+1 -v -t switch_clk
Local timestamp 2013/12/11 17:56:28 EST; target skew <= 10 seconds

Switches in group ibswA1+1 out of skew ( >10 seconds )
Total = 2 switch(es)
Default clock skew was used; <= 10 seconds
Depending on the number of switches, a larger value may be required
- ibswA2 time 2013/12/11 22:51:22 UTC (-306 seconds)
- ibswA1 time 2013/07/14 22:13:25 UTC (-12962583 seconds)
The health check tool switch_clk [ FAILED ]
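If a larger skew is acceptable, for example on large fabrics, the threshold can be raised by setting the environment variable before starting the check. A minimal sketch, assuming that hcrun picks the variable up from the calling environment:
# HC_CLKSKEW_ALLOWED=60 ./hcrun -n ibswA1+1 -v -t switch_clk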

5.5.8 The switch_code check

The switch_code check evaluates whether the switch code is consistent, as shown in Example 5-33.

Example 5-33 Switch code check
# ./hcrun -n ibswA1+1 -t switch_code
Results will be logged to: /var/opt/ibmchc/log/swcode.2013-12-11-18:01:47

# cat /var/opt/ibmchc/log/swcode.2013-12-11-18:01:47 ======ibsw

======
Node count = 2
======
Product name: SX_PPC_M460EX
Product release: SX_3.2.0330-100

======ibswA2 ======Node count = 1 ======SX module Version S01 9.1.6516 S02 9.1.6516 S03 9.1.6516 S04 9.1.6516 S05 9.1.6516 S06 9.1.6516 L08 9.1.6516 L09 9.1.6516 L10 9.1.6516 L11 9.1.6516 L12 9.1.6516

======ibswA1 ======Node count = 1 ======SX module Version S01 9.1.6516 S02 9.1.6516 S03 9.1.6516 S04 9.1.6516 S05 9.1.6516 S06 9.1.6516 L05 9.1.6516 L06 9.1.6516 L07 9.1.6516 L08 9.1.6516 L09 9.1.6516 L10 9.1.6516 L11 9.1.6516 L12 9.1.6516

5.5.9 The run_ppping check

The run_ppping check has the following characteristics:
Intrusive: The check is intrusive.
Supported processing method: Plain.
Purpose: The run_ppping check determines whether the configured IPoIB devices reply to ping requests. All given nodes ping each other.

In Example 5-34, three nodes do not respond to the ping requests. An additional output file is created, and is located under /var/opt/ibmchc/data/ppping.ib0.out.

Example 5-34 The run_ppping failed
# ./hcrun -n redbook -t run_ppping
Error: ib interface ib0 is not active on node computeA1
Error: ib interface ib0 is not active on node computeA3
Error: ib interface ib0 is not active on node computeA7
Error: could not ping over all ib0 interfaces
Info: ppping output can be found in /var/opt/ibmchc/data/ppping.ib0.out
The health check tool run_ppping [ FAILED ]

Example 5-35 shows the output file.

Example 5-35 The run_ppping output file
computeA2: computeA1-ib0: noping
computeA2: computeA3-ib0: noping
computeA2: computeA7-ib0: noping
computeA3: computeA1-ib0: noping
computeA3: computeA3-ib0: noping
computeA3: computeA7-ib0: noping
...

The ib0 interfaces of computeA1, computeA3, and computeA7 need additional research.

5.5.10 The run_jlink check

The run_jlink check has the following characteristics:
Intrusive: The check is intrusive.
Supported processing method: Plain.
Purpose: The jlink tool is used to measure the bandwidth between compute nodes, and to discover bad or low-performing links. The tool has a wide variety of options, which are described in Appendix A, “Commonly used tools” on page 75.

When jlink is started through hcrun, the compute nodes must have a working PE installation. Because PE jobs should not be run as root, a dedicated user must be available to run this check. This user can either be specified in /etc/opt/ibmchc/chc.conf (see “Configuration” on page 38 for details), or on the command line when running this check.

Example 5-36 shows a run on an eight-node cluster using the jlink default settings. Because there is no user defined in the global configuration file, the hcrunuser user is provided on the command line. This user exists on all nodes.

Example 5-36 The jlink tool run on an eight-node cluster

# ./hcrun -n redbook -t run_jlink -u hcrunuser
User requested NUM_RUNS=30
User requested no thresholding THRESH_FLAG=
User requested REPS_STEADY=300
Based on highest ibstat reported 'Rate', selected threshold flag = -L 4250
poe /opt/ibmhpc/pecurrent/systemcheck/bin/jlink -s 1 -n 30 -r 300 -w 1 -u 1000 -U 1000000 -L 4250
0:Loop: 0 = 4 secs ( 4 total) Avg_bw=4292 MB/s Agg_bw=240358 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8

0:Loop: 1 = 4 secs ( 8 total) Avg_bw=4299 MB/s Agg_bw=240738 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 2 = 4 secs ( 12 total) Avg_bw=4300 MB/s Agg_bw=240799 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 3 = 4 secs ( 16 total) Avg_bw=4300 MB/s Agg_bw=240817 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 4 = 4 secs ( 20 total) Avg_bw=4298 MB/s Agg_bw=240685 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 5 = 4 secs ( 24 total) Avg_bw=4303 MB/s Agg_bw=240986 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 6 = 4 secs ( 28 total) Avg_bw=4302 MB/s Agg_bw=240887 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 7 = 4 secs ( 32 total) Avg_bw=4303 MB/s Agg_bw=240953 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 8 = 4 secs ( 35 total) Avg_bw=4300 MB/s Agg_bw=240804 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 9 = 4 secs ( 39 total) Avg_bw=4302 MB/s Agg_bw=240919 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 10 = 4 secs ( 43 total) Avg_bw=4301 MB/s Agg_bw=240833 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 11 = 4 secs ( 47 total) Avg_bw=4302 MB/s Agg_bw=240905 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 12 = 4 secs ( 51 total) Avg_bw=4301 MB/s Agg_bw=240829 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 13 = 4 secs ( 55 total) Avg_bw=4300 MB/s Agg_bw=240810 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 14 = 4 secs ( 59 total) Avg_bw=4300 MB/s Agg_bw=240787 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 15 = 4 secs ( 63 total) Avg_bw=4300 MB/s Agg_bw=240795 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 16 = 4 secs ( 67 total) Avg_bw=4300 MB/s Agg_bw=240778 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 17 = 4 secs ( 71 total) Avg_bw=4300 MB/s Agg_bw=240799 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 18 = 4 secs ( 75 total) Avg_bw=4300 MB/s Agg_bw=240803 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 19 = 4 secs ( 79 total) Avg_bw=4303 MB/s Agg_bw=240954 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 20 = 4 secs ( 83 total) Avg_bw=4301 MB/s Agg_bw=240857 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 21 = 4 secs ( 87 total) Avg_bw=4299 MB/s Agg_bw=240764 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 22 = 4 secs ( 91 total) Avg_bw=4300 MB/s Agg_bw=240825 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 23 = 4 secs ( 95 total) Avg_bw=4299 MB/s Agg_bw=240771 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 24 = 4 secs ( 99 total) Avg_bw=4301 MB/s Agg_bw=240844 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 25 = 4 secs ( 102 total) Avg_bw=4301 MB/s Agg_bw=240868 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 26 = 4 secs ( 106 total) Avg_bw=4301 MB/s Agg_bw=240879 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 27 = 4 secs ( 110 total) Avg_bw=4302 MB/s Agg_bw=240913 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 28 = 4 secs ( 114 total) Avg_bw=4302 MB/s Agg_bw=240885 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
0:Loop: 29 = 4 secs ( 118 total) Avg_bw=4301 MB/s Agg_bw=240838 MB/s Loop_bw= 38 MB/s Exchanges= 56 Active_nodes= 8
Checking file: out.all.8.1.1011.054239

Tested 56 IB paths and found 0 below threshold (0.00 %).
Tested 56 IB paths out of 56 (untested: ) (tested: 100.00 %).
The health check tool run_jlink [ PASS ]

5.5.11 The run_ibtools check

The run_ibtools check is not a single tool, it is a group of multiple tools. The syntax for groups is slightly different, because no -t option is used:
# hcrun -n <noderange> run_ibtools
Intrusive: The check is intrusive.
Supported processing method: Plain.
Purpose: This tool measures the bandwidth and reads bit error rates, then correlates both and shows suspicious links.

Before run_ibtools is able to work, you have to run the ibtools_install tool once. It installs the ibtools on the subnet manager, and creates the /var/opt/ibtools/data output directory on the management node. The ibtools scripts are delivered in the RPM and deployed on the management node. The ibtools_install tool takes these scripts and copies them to the subnet manager.

The run_ibtools group runs the following tools in order:
1. clearerrors: Clears the data and error counters on the subnet manager. This is needed before an ibqber can be run. The bit error rate (BER) is determined relative to the last time the error counters had been cleared on the subnet manager.
2. lastclear: This tool is used to display the last time the error and data counters were cleared on the subnet manager.
3. run_jlink: See “The run_jlink check” on page 63.
4. ibqber: This runs an ibquery and collects BER data on a subnet manager. This should be run after a thorough exercise of the fabric through run_jlink or some other exerciser to accumulate error counts on fabric links. This might or might not be the case, so meaningful output is not guaranteed. If the network is clean, there will not be any errors.
5. berlinks: This takes a file that has a BER links calculation that results from running ibqber on the subnet manager. The default is the last *.BER.1e-14 file for a particular subnet manager in /var/opt/ibtools/data on the management node. It shows links with BER data. The berlinks output files are input to jlinksuspects.
6. jlinkpairs: This tool takes a jlink output file produced by running jlink, or uses the default jlink.out file in /var/opt/ibmchc/data. It shows slow links. The jlinkpairs output file is input to jlinksuspects.
7. jlinksuspects: This is the final result of run_ibtools (Example 5-37). It takes a jlinkpairs output file and a berlinks output file to produce an output listing showing the links with a higher-than-expected BER that exist in the path between a pair of nodes that have lower-than-expected bandwidth.
8. ibtoolslog_clean: This tool archives the accumulated data files in /var/opt/ibtools/data on the management node and the subnet manager.

Example 5-37 The run_ibtools check with 2 nodes
# ./hcrun -n computeA1,computeA2 run_ibtools
Info: clearerrors output can be found in /var/opt/ibtools/data/sm01.clear.log
The health check tool clearerrors [ PASS ]
Info: lastclear output can be found in /var/opt/ibtools/data/sm01.log.counterclears
The health check tool lastclear [ PASS ]
Info: jlink output can be found in /var/opt/ibmchc/data/jlink.out
The health check tool run_jlink [ PASS ]
Info: ibqber output can be found in /var/opt/ibtools/data/ibq.20131218_093743
Info: ibqber output can be found in /var/opt/ibtools/data/ibq.20131218_093743.BER.1e-14
Info: ibqber output can be found in /var/opt/ibtools/data/ibq.20131218_093743.BER.5e-14
The health check tool ibqber [ PASS ]
Info: berlinks output can be found in /var/opt/ibtools/data/sm01.all.berlinks.out
The health check tool berlinks [ PASS ]
Info: jlinkpairs output can be found in /var/opt/ibtools/data/jlinkpairs.out
The health check tool jlinkpairs [ PASS ]
sm01: Pairs: computeA1<->computeA2 computeA2<->computeA1

Links:
c902ibsw01/L12/U1:13<->c902f04x25 HCA-1:1
computeA1 HCA-1:1<->c902ibsw01/L10/U1:12
computeA2 HCA-1:1<->c902ibsw01/L10/U1:9
c902ibsw01/L07/U1:13<->c902f02x55 HCA-1:1
c902f04x60-ib0 HCA-1:1<->c902ibsw01/L11/U1:5
------
Check if any node pair has a suspect node or link
------
Info: berlinks output can be found in /var/opt/ibtools/data/sm01.node.berlinks.out
computeA1 in node pairs
computeA2 in node pairs
------
Check if any node pair has a suspect switch-to-switch link
------
Check computeA1 HCA-1 to computeA2 HCA-1 computeA1 HCA-1 (29) port 1 computeA2 HCA-1 (26) port 1 -
Check computeA1 HCA-1 to computeA2 HCA-1 computeA1 HCA-1 (29) port 1 MF0;c902ibsw01:SXX512/L10/U1 (61) port 12 computeA2 HCA-1 (26) port 1 -
Check computeA1 HCA-1 to computeA2 HCA-1 computeA1 HCA-1 (29) port 1 MF0;c902ibsw01:SXX512/L10/U1 (61) port 9 computeA2 HCA-1 (26) port 1 -
Check computeA1 HCA-1 to computeA2 HCA-1 computeA1 HCA-1 (29) port 1 computeA2 HCA-1 (26) port 1 -
Check computeA1 HCA-1 to computeA2 HCA-1 computeA1 HCA-1 (29) port 1 computeA2 HCA-1 (26) port 1 -
Check computeA2 HCA-1 to computeA1 HCA-1 computeA2 HCA-1 (26) port 1 computeA1 HCA-1 (29) port 1 -
Check computeA2 HCA-1 to computeA1 HCA-1 computeA2 HCA-1 (26) port 1 MF0;c902ibsw01:SXX512/L10/U1 (61) port 12 computeA1 HCA-1 (29) port 1 -
Check computeA2 HCA-1 to computeA1 HCA-1 computeA2 HCA-1 (26) port 1 MF0;c902ibsw01:SXX512/L10/U1 (61) port 9 computeA1 HCA-1 (29) port 1 -
Check computeA2 HCA-1 to computeA1 HCA-1 computeA2 HCA-1 (26) port 1 computeA1 HCA-1 (29) port 1 -
Check computeA2 HCA-1 to computeA1 HCA-1 computeA2 HCA-1 (26) port 1 computeA1 HCA-1 (29) port 1
The health check tool jlinksuspects [ PASS ]
Local: /var/opt/ibtools/data >1 day old 5
Info: Number of log files below quota of 10 here?
sm01: /var/opt/ibtools/data >1 day old 4
Info: Number of log files below quota of 10
The health check tool ibtoolslog_clean [ PASS ]

5.6 File system: Accessibility, usage, and read/write performance

In most cases, the cluster nodes share a common file system across all of the nodes. Possible solutions are the IBM General Parallel File System (GPFS), the Network File System (NFS), Lustre, GlusterFS, or similar cluster file systems. In this document, we focus on IBM GPFS and NFS.

In this section, we look at the basic instruments to check for the availability of required file systems. Tools, such as nsdperf or gpfsperf, are addressed in more detail in “GPFS related tools” on page 83.

5.6.1 The fs_usage check

The fs_usage check has the following characteristics:
Intrusive: The check is non-intrusive.
Supported processing methods: Plain, xcoll, and compare.
Purpose: The fs_usage check assesses the used file system space of the provided paths. By default, it checks /tmp and /var/tmp (Example 5-38). To check other paths, edit the /opt/ibmchc/tools/fs_usage.conf tool configuration file, or append your own arguments behind the tool name, as shown in Example 5-39.

Example 5-38 The fs_usage check with default arguments
./hcrun -n computeA1 -t fs_usage
computeA1: /tmp: 3%
computeA1: /var/tmp: 0%

Example 5-39 The fs_usage check with custom arguments
./hcrun -n computeA1 -t fs_usage "/home /root"
computeA1: /home: 6%
computeA1: /root: 6%

5.6.2 NFS file system

NFS file systems are provided by an NFS server and mounted on NFS clients over the network. Current Linux distributions support NFS versions 2, 3, and 4. In many cases, an NFS server is configured for high availability to avoid a single point of failure. The setup and configuration of an NFS server is nevertheless beyond the scope of this document.

To check if an NFS-mounted file system is accessible, a simple read/write check can be performed, as shown in Example 5-40.

Example 5-40 Simple read/write check
$ dd if=/dev/urandom of=dummy1.dat bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 11.752 s, 8.9 MB/s
$ dd if=dummy1.dat of=/dev/null
204800+0 records in
204800+0 records out
104857600 bytes (105 MB) copied, 0.112903 s, 929 MB/s
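The same check can be distributed over a node group by writing a unique file per node. The following is a sketch only; it assumes that the NFS file system is mounted at /u, as in Example 5-41, and the test files should be removed afterward:
# xdsh redbook 'dd if=/dev/urandom of=/u/nfscheck.$(hostname -s) bs=1M count=10 2>&1 | tail -n 1' | xcoll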

The following list includes some issues that might occur with an NFS resource:
- Inconsistent mounting on the compute nodes
- Mount options
- Network issues

NFS mounts
Although NFS mounts appear to be obvious, you should verify that they are actually mounted on all of the necessary compute nodes. Depending on how the NFS file system is mounted, the verification can include checking whether the respective files or scripts are in sync: for example, whether /etc/fstab, or a startup script that mounts a specific resource, is identical on all nodes. Even if this is the case, there might be other reasons why a mount is no longer valid.
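Whether /etc/fstab itself is identical across the nodes can be verified with a simple checksum comparison (a sketch that reuses the redbook node group; nodes with a different checksum end up in their own xcoll group):
# xdsh redbook 'md5sum /etc/fstab' | xcoll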

A simple distributed shell command checks for inconsistencies, as shown in Example 5-41.

Example 5-41 List NFS-mounted file systems
# xdsh redbook "mount -t nfs"|xcoll
======
computeA2,computeA8,computeA6,computeA4,computeA7,computeA1,computeA3
======
ugpfs01-ug:/gpfs/cnfs01/data/u on /u type nfs (rw,vers=3,addr=60.0.2.191)
ugpfs01-ug:/gpfs/cnfs01/data/admin on /ugadmin type nfs (rw,vers=3,addr=60.0.2.191)

======
computeA5
======
ugpfs01-ug:/gpfs/cnfs01/data/u on /u type nfs (rw,vers=3,addr=60.0.2.191)

In Example 5-41, the file system /ugadmin is not mounted on computeA5. This might lead to inconsistent behavior in the cluster, but does not necessarily lead to an immediate failure.

The CHC toolkit has an NFS mount check too. The tool is called nfs_mounts. The output is similar to that shown in Example 5-42. This tool is non-intrusive, and supports all three processing methods (plain, xcoll, and compare).

Example 5-42 CHC nfs_mounts tool
# ./hcrun -n computeA1 -t nfs_mounts
computeA1: ugpfs03-ug:/gpfs/cnfs01/data/u
computeA1: ugpfs03-ug:/gpfs/cnfs01/data/admin

Important: Although a file system appears to be mounted, it is not necessarily accessible.

If a file system is not mounted on all of the nodes, check whether the NFS server still exports the file system, or whether that has been changed. The following command shows all of the file systems that the NFS server exports:
# showmount -e

Mount options
NFS file systems are often mounted with specific parameters: for example rsize, wsize, proto, soft/hard, and so on. Usually, they should be the same on all nodes, so differences might arise when NFS mounts are performed manually (for example, /etc/fstab is not in sync, or similar situations).
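A quick way to spot diverging mount options is to compare the NFS entries in /proc/mounts across the nodes. This is only a sketch; the options column contains rsize, wsize, and the other parameters mentioned above:
# xdsh redbook 'grep -E " nfs4? " /proc/mounts' | xcoll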

Network issues
Most problems with the NFS file system have their origin in the underlying network. In case there are issues, such as poor performance or hangs when accessing the file system, and all the settings appear to be fine, a closer look at the network needs to be performed.

To determine network issues, consider the following questions:
- Are the local network interfaces up and configured correctly?
- Are the NFS server network interfaces up and configured correctly?
- Do they have a link?
- Is the NFS server reachable (ping)?
- Is there a route to the NFS server (traceroute, tracepath)?

Failure in one of these basic checks could point, for example, to cable problems, blocked ports on Ethernet switches, and so on. In other words, a careful end-to-end analysis is required.
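A first pass over these questions can be scripted. The following sketch checks basic reachability of the NFS server from every node; it uses the server name from Example 5-41 and must be adjusted for the actual environment:
# xdsh redbook 'ping -c 1 -w 5 ugpfs01-ug >/dev/null 2>&1 && echo "NFS server reachable" || echo "NFS server NOT reachable"' | xcoll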

5.6.3 GPFS file system

This section is not about the setup or configuration of a GPFS cluster. The information in this section is presented with the assumption that this has already been done, and that there are GPFS file systems available for the cluster. Similar to “NFS file system” on page 67, we look at a few basic checks to see if the GPFS file system is available and usable to the cluster. More tools for the measurement of the performance of a GPFS file system can be found in “GPFS related tools” on page 83.

GPFS status
This section provides GPFS status information.

Node status
The general status of the GPFS cluster is obtained with the mmgetstate command, as shown in Example 5-43. When run without any options, it displays the GPFS status of the node that the command is running on. Run with the -a option, the command displays the state of all of the nodes in the cluster.

Example 5-43 The mmgetstate command in a six-node cluster
# mmgetstate -a

 Node number  Node name  GPFS state
------------------------------------
       1      computeB1  active
       2      computeB2  active
       3      computeB3  active
       4      computeB4  active
       5      computeB5  active
       6      computeB6  active

In Example 5-43, all of the nodes are active. If there are nodes in a down or arbitrating state, this might indicate a problem.

Disk status
A GPFS file system requires Network Shared Disks (NSDs) to be available. Although a file system can work even when one or more disks are down or unavailable, the status should be known, and therefore verified with the mmlsdisk command.

Example 5-44 lists all of the NSDs for the fsB file system.

Example 5-44 Status of NSDs in a file system

# mmlsdisk fsB -L
disk     driver  sector  failure  holds     holds                         storage
name     type    size    group    metadata  data  status  availability  disk id  pool    remarks
-------- ------- ------- -------- --------- ----- ------- ------------- -------- ------- --------
nsd39    nsd     512     1        Yes       Yes   ready   up            1        system  desc
nsd41    nsd     512     1        Yes       Yes   ready   up            2        system  desc
Number of quorum disks: 2
Read quorum value: 2
Write quorum value: 2

The availability status shows that both disks are up and ready, so there should not be any issue with the file system.

Note: This is an oversimplified setup for demonstration purposes only.

GPFS mounts
GPFS file systems are mounted or unmounted with the GPFS mmmount or mmumount commands. To verify on which nodes a given file system is mounted, use the mmlsmount command. Without options, it lists the GPFS file systems currently mounted on the local node. With additional parameters (all -L), the command lists the nodes where the file systems are mounted, as shown in Example 5-45.

Example 5-45 List nodes and file system
# mmlsmount all -L

File system fsB is mounted on 6 nodes:
  10.3.1.39   computeB5
  10.3.1.37   computeB4
  10.3.1.33   computeB2
  10.3.1.41   computeB6
  10.3.1.35   computeB3
  10.3.1.31   computeB1

Example 5-45 helps to verify that a file system is mounted on all of the nodes in the cluster.

Whether a GPFS file system is mounted automatically when the node starts is controlled by the file system's automount setting. The file system settings can be listed with the mmlsfs command. If only the current value of the automount parameter is of interest, the mmlsfs -A command is sufficient. Example 5-46 shows that fsB is not mounted automatically.

Example 5-46 List automount setting for file system
# mmlsfs fsB -A
flag  value  description
------------------------------------
 -A   no     Automatic mount option

The mmlsconfig command can be used to list the file systems that are available in the local GPFS cluster, as shown in Example 5-47.

Example 5-47 List available file systems
# mmlsconfig
Configuration data for cluster gpfsB.computeB5:
------
myNodeConfigNumber 6
clusterName gpfsB.computeB5
clusterId 9191008596370922699
autoload no
dmapiFileHandleSize 32
minReleaseLevel 3.5.0.7
verbsRdma enable
verbsPorts mlx4_0/1
adminMode central

File system in cluster gpfsB.computeB5:
------
/dev/fsB

If using a remote GPFS cluster, use the mmremotecluster show all command to display the mounted remote GPFS file systems.

The gpfs_state check
The gpfs_state check has the following characteristics:
Intrusive: The check is non-intrusive.
Supported processing method: Plain, xcoll, and compare.
Purpose: The gpfs_state check determines if the GPFS file system is mounted, and whether verbsRDMA is enabled.

Provide file systems as arguments, or edit the tool configuration and add default arguments.

In Example 5-48, computeA5 does not have fsB mounted.

Example 5-48 Checking fsB mount with gpfs_state
# ./hcrun -n redbook -t gpfs_state fsB
computeA2: fsB : OK
computeA1: fsB : OK
computeA3: fsB : OK
computeA4: fsB : OK
computeA5: fsB : FAIL
computeA6: fsB : OK
computeA2: verbsRDMA : OK
computeA1: verbsRDMA : OK
computeA3: verbsRDMA : OK
computeA5: verbsRDMA : OK
computeA6: verbsRDMA : OK

GPFS settings
As described in “GPFS mounts” on page 70, the mmlsconfig command is used to list the various configurations that can be adjusted in a GPFS cluster setup. Most of these settings vary depending on the special needs of the GPFS cluster.

Regardless of the specific value of a parameter, note that some settings might only apply to a subset of nodes. There might be options that only need to be set on a quorum node, and that could lead to unwanted side effects when set on a non-quorum node. If settings only apply to some nodes, these appear in the mmlsconfig output, as shown in Example 5-49.

Example 5-49 Different settings on different nodes
# mmlsconfig
Configuration data for cluster gpfsB.computeB5:
------
myNodeConfigNumber 6
clusterName gpfsB.computeB5
clusterId 9191008596370922699
autoload no
[computeB5,computeB6]
autoload yes
[common]
dmapiFileHandleSize 32
minReleaseLevel 3.5.0.7
verbsRdma enable
verbsPorts mlx4_0/1
[computeB4]
unmountOnDiskFail yes
[common]
adminMode central

File system in cluster gpfsB.computeB5:
------
/dev/fsB

In Example 5-49, some settings only apply to some nodes. In large GPFS clusters, this might be a source for issues, because settings can easily be overlooked, and important settings might be missing on some nodes.

GPFS file system access
As shown in “NFS file system” on page 67, a simple check, such as the one in Example 5-40 on page 67, can be performed to determine if the mounted file system is accessible at all. As mentioned earlier, tools such as nsdperf or gpfsperf can be used to measure the performance of the file system.
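For completeness, a minimal read/write check analogous to Example 5-40 is sketched below. It assumes that fsB is mounted at /gpfs/fsB; adjust the path to the actual mount point and remove the test file afterward:
# dd if=/dev/urandom of=/gpfs/fsB/healthcheck.dat bs=1M count=100
# dd if=/gpfs/fsB/healthcheck.dat of=/dev/null bs=1M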

Tip: Accessing a GPFS file system that suffered a failure earlier might report Stale NFS file handle messages. This might cause confusion when talking to support teams, especially when they have little or no knowledge about GPFS.

The run_nsdperf check
Intrusive: The check is intrusive.
Supported processing method: Plain.
Purpose: The run_nsdperf check is a performance tool that mimics a GPFS client/server traffic pattern on the cluster without actually using GPFS. The tool checks GPFS network performance.

Further details about the underlying nsdperf check can be found in “Example of nsdperf” on page 83.


Appendix A. Commonly used tools

This appendix provides additional information about commonly used tools in addition to those contained in the Cluster Health Check (CHC), which is described in detail in Chapter 5, “Toolkits for verifying health (individual diagnostics)” on page 37.

These tools are a useful addition to the CHC tools. This overview is certainly not meant to be a complete reference. However, the chapter describes a few important tools, and provides basic information about how to run them.

This appendix also introduces IBM High Performance Computing (HPC) Central and HPC service packs.

The following sections are described in this appendix:
- Overview of non-CHC tools
- InfiniBand-related tools
- GPFS related tools
- Network-related tools
- Disk benchmarks
- Node-specific tools
- IBM HPC Central

Overview of non-CHC tools

This section gives a short outline of the available tools and their use cases. These tools can be grouped into categories. A short overview is shown in Table A-1.

Table A-1 Overview of useful tools
Category  Tools  Description

InfiniBand ibdiagnet, jlink InfiniBand Fabric tests, performance tests

General Parallel File System (GPFS)  gpfsperf, nsdperf  GPFS-specific test tools delivered with the GPFS product

Network iperf Internet Protocol network tools

Disk and file system IOR, mdtest General file system tools

Node stream, daxpy, dgemm, OS jitter Tools useful to test nodes

The tools shown are by no means a complete selection. Many more tools are publicly available.

InfiniBand-related tools

There are many utilities for InfiniBand that are included with OpenFabrics Enterprise Distribution (OFED) distributions. This IBM Redbooks publication focuses on the OFED distribution provided by Mellanox, which is installed in our clusters.

The ibdiagnet tool is one of the most important tools to check the fabric health. This tool is used to check the routing, and is good for finding credit loops, especially those caused by mis-wires or poor topology design. The tool also reveals slow links and other common problems. The ibdiagnet tool performs tests within different sections, for example:
- Routing
- Topology
- Partitions
- Bit error checks
- Cable reports

In addition to the ibdiagnet tool, OFED provides many more tools and commands, which are outlined briefly.

The jlink utility is included with the IBM Parallel Environment (PE). This tool is intended to isolate links with low performance, especially in large clusters, within a reasonable amount of time.

Example of ibdiagnet

Typically, ibdiagnet is run once to clear the counters. Then, for a period of time, such as 15-30 minutes, traffic needs to be generated by, for example, running a few applications. After this, the tool can be run again to generate a report that helps detect suspect and bad links.

During the initial setup, the tool usually is run on an idle fabric to detect link level errors for a period of time. After fixing errors for the idle fabric, the stress test should be run. Again, fix errors and repeat until ibdiagnet does not report any major errors. Note that it is not mandatory to have zero errors, but the number should stay at a very low level.

The following steps represent the basic workflow:
1. Run ibdiagnet -pc to clear the counters.
2. Create traffic by running, for example, a few benchmarks (typically for 20-30 minutes).
3. Run ibdiagnet -r after a period of time to collect the information again.

Example A-1 shows a run on ClusterB that reveals strongly increasing symbol errors and exceeded bit error rate (BER) values for two nodes in the fabric. Calling ibdiagnet as shown enables you to accomplish everything in one call. The pm_pause_time parameter determines how long this test is run. After starting ibdiagnet, traffic can be created in the fabric. The ibdiagnet call automatically stops after pm_pause_time seconds.

Note: The computeC4 node appears in the output of ibdiagnet as well, because ClusterB and ClusterC in reality share the same physical InfiniBand fabric.

The -lw and -ls parameters in ibdiagnet refer to the expected values for link width and link speed. For Fourteen Data Rate (FDR10) and Quad Data Rate (QDR), the link speed is 10. For FDR, a value of 14 is required for the -ls parameter.

Example A-1 The ibdiagnet command run on ClusterB
[root@computeB1 ~]# ibdiagnet -lw 4x -ls 10 -ber_test --ber_thresh 1000000000000 -P all=1 -pc -pm_pause_time 30
------
Load Plugins from: /usr/share/ibdiagnet2.1.1/plugins/
(You can specify more paths to be looked in with "IBDIAGNET_PLUGINS_PATH" env variable)

Plugin Name Result Comment libibdiagnet_cable_diag_plugin Succeeded Plugin loaded libibdiagnet_cable_diag_plugin-2.1.1 Failed Plugin options issue - Option "get_cable_info" from requester "Cable Diagnostic (Plugin)" already exists in requester "Cable Diagnostic (Plugin)"

------Discovery -I- Discovering ... 19 nodes (1 Switches & 18 CA-s) discovered. -I- Fabric Discover finished successfully

-I- Discovering ... 19 nodes (1 Switches & 18 CA-s) discovered. -I- Discovery finished successfully

-I- Duplicated GUIDs detection finished successfully

-I- Duplicated Node Description detection finished successfully

-I- Retrieving ... 19/19 nodes (1/1 Switches & 18/18 CA-s) retrieved. -I- Switch information retrieving finished successfully

------Lids Check -I- Lids Check finished successfully

------Links Check -I- Links Check finished successfully

------Subnet Manager -I- SM information retrieving finished successfully

-I- Subnet Manager Check finished successfully

------Port Counters -I- Resetting ... 19/19 nodes (1/1 Switches & 18/18 CA-s) reseted. -I- Ports counters reseting finished successfully

-I- Retrieving ... 19/19 nodes (1/1 Switches & 18/18 CA-s) retrieved. -I- Ports counters retrieving finished successfully

-I- Going to sleep for 30 seconds until next counters sample -I- Time left to sleep ... 1 seconds..

-I- Retrieving ... 19/19 nodes (1/1 Switches & 18/18 CA-s) retrieved. -I- Ports counters retrieving (second time) finished successfully

-E- Ports counters value Check finished with errors -E- lid=0x000a dev=26428 computeB2/U1/P1 Performance Monitor counter : Value symbol_error_counter : 6372 (threshold=1) -E- lid=0x0013 dev=26428 computeC4/U1/P1 Performance Monitor counter : Value symbol_error_counter : 5 (threshold=1)

-E- Ports counters Difference Check (during run) finished with errors -E- computeB2/U1/P1 - "symbol_error_counter" increased during the run (difference value=6363,difference allowed threshold=1) -E- computeC4/U1/P1 - "symbol_error_counter" increased during the run (difference value=5,difference allowed threshold=1)

-E- BER Check finished with errors -E- computeB2/U1/P1 - BER exceeds the threshold in port = computeB2/U1/P1(BER value=4.938338e-09, threshold=1.000000e-12) -E- computeC4/U1/P1 - BER exceeds the threshold in port = computeC4/U1/P1(BER value=3.880511e-12, threshold=1.000000e-12)

------Nodes Information -I- Retrieving ... 19/19 nodes (1/1 Switches & 18/18 CA-s) retrieved. -I- Nodes information retrieving finished successfully

-E- FW Check finished with errors -I- Errors/Warnings list will be reported in log file

------Speed / Width checks -I- Link Speed Check (Expected value given = 10) -I- Links Speed Check finished successfully

-I- Link Width Check (Expected value given = 4x) -I- Links Width Check finished successfully

------Partition Keys -I- Retrieving ... 19/19 nodes (1/1 Switches & 18/18 CA-s) retrieved. -I- Partition Keys retrieving finished successfully

-I- Partition Keys finished successfully

------Alias GUIDs -I- Retrieving ... 19/19 nodes (1/1 Switches & 18/18 CA-s) retrieved. -I- Alias GUIDs retrieving finished successfully

-I- Alias GUIDs finished successfully

------Summary
-I- Stage                     Warnings   Errors     Comment
-I- Discovery                 0          0
-I- Lids Check                0          0
-I- Links Check               0          0
-I- Subnet Manager            0          0
-I- Port Counters             0          6
-I- Nodes Information         0          8
-I- Speed / Width checks      0          0
-I- Partition Keys            0          0
-I- Alias GUIDs               0          0

-I- You can find detailed errors/warnings in: /var/tmp/ibdiagnet2/ibdiagnet2.log

-I- ibdiagnet database file : /var/tmp/ibdiagnet2/ibdiagnet2.db_csv
-I- LST file : /var/tmp/ibdiagnet2/ibdiagnet2.lst
-I- Subnet Manager file : /var/tmp/ibdiagnet2/ibdiagnet2.sm
-I- Ports Counters file : /var/tmp/ibdiagnet2/ibdiagnet2.pm
-I- Nodes Information file : /var/tmp/ibdiagnet2/ibdiagnet2.nodes_info
-I- Partition keys file : /var/tmp/ibdiagnet2/ibdiagnet2.pkey
-I- Alias guids file : /var/tmp/ibdiagnet2/ibdiagnet2.aguid

[root@computeB1 ~]#


The tool creates log files with detailed information under /var/tmp/ibdiagnet2. For example, a request for cable information using the --get_cable_info parameter writes a file named ibdiagnet2.cables into this directory. Each cable is documented, as shown in Example A-2. Note that each cable is reported from both ports that it is connected to.

Also note that the -H help option is needed to show extended options, such as --get_cable_info.
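For example, the following two calls (a sketch based on the options described above) list the extended options and request the cable report:

# List the extended options, including --get_cable_info
ibdiagnet -H

# Request the cable report; the result is written to /var/tmp/ibdiagnet2/ibdiagnet2.cables
ibdiagnet --get_cable_info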

Example A-2 The ibdiagnet command cable information ------Port=23 Lid=0x0003 GUID=0x0002c9030071d380 Port Name=ibswC1:SX90Y3245/U1/P23 ------Vendor: Mellanox OUI: 0x2c9 PN: 90Y3814 SN: 2381424T00V Rev: A1 Length: 3 m Type: Copper cable- unequalized SupportedSpeed: SDR/DDR/QDR

------Port=1 Lid=0x0013 GUID=0x0002c903000a7a7d Port Name=computeC3/U1/P1 ------Vendor: Mellanox OUI: 0x2c9 PN: 90Y3814 SN: 2381424T00V Rev: A1 Length: 3 m Type: Copper cable- unequalized SupportedSpeed: SDR/DDR/QDR

See also tests using ib_send_bw and ib_read_bw commands, which are described in the following section.

Examples of ib_send_bw, ib_read_bw, and ib_read_lat

These tools enable tests between node pairs to check for bandwidth and latency.
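A single pair test can be driven from one node over SSH, as in the following sketch. It assumes password-free SSH between the nodes, uses the same -F option as the examples that follow, and reuses the ClusterB node names; it is not part of the CHC toolkit:

# Start the ib_send_bw server on the remote node; it exits after one test run
ssh computeB1 "ib_send_bw -F" &

# Give the server a moment to start listening, then connect as the client
sleep 2
ib_send_bw -F computeB1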

The following examples show a run of send/read tests using a node pair that has one partner, computeB1, which is known to produce increasing symbol error counters.

Example A-3 shows the start of an ib_send_bw server on computeB1. Note that the server stops automatically after the test is run, which is different from the behavior of iperf (discussed in “Example of iperf” on page 87).

Example A-3 Running ib_send_bw server on computeB1 [root@computeB1 ~]# ib_send_bw -F ------Send BW Test Dual-port : OFF

Number of qps : 1 Connection type : RC RX depth : 600 CQ Moderation : 50 Mtu : 2048B Link type : IB Max inline data : 0B rdma_cm QPs : OFF Data ex. method : Ethernet ------local address: LID 0x0b QPN 0x00d0 PSN 000000 remote address: LID 0x0a QPN 0x00ba PSN 000000 ------#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps] Conflicting CPU frequency values detected: 1596.000000 != 2793.000000 Test integrity may be harmed ! Warning: measured timestamp frequency 2800.11 differs from nominal 1596 MHz 65536 1000 -nan 3253.54 0.052057 ------[root@computeB1 ~]#

Example A-4 is the log from the ib_send_bw client started on computeB2 and connecting to computeB1. The test does not show any problems.

Example A-4 Running ib_send_bw client on computeB2 toward computeB1 [root@computeB2 ~]# ib_send_bw -F computeB1 ------Send BW Test Dual-port : OFF Number of qps : 1 Connection type : RC TX depth : 300 CQ Moderation : 50 Mtu : 2048B Link type : IB Max inline data : 0B rdma_cm QPs : OFF Data ex. method : Ethernet ------local address: LID 0x0a QPN 0x00ba PSN 000000 remote address: LID 0x0b QPN 0x00d0 PSN 000000 ------#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps] 65536 1000 3235.39 3235.36 0.051766 ------[root@computeB2 ~]#

Example A-5 shows the start of an ib_read_bw server process on computeB1.

Example A-5 Running ib_read_bw server on computeB1 [root@computeB1 ~]# ib_read_bw -F ------RDMA_Read BW Test Dual-port : OFF Number of qps : 1 Connection type : RC CQ Moderation : 50 Mtu : 2048B Link type : IB Outstand reads : 16 rdma_cm QPs : OFF Data ex. method : Ethernet ------local address: LID 0x0b QPN 0x00cf PSN 0xf4caaa OUT 0x10 RKey 0x012600 VAddr 0x007f1fa829a000 remote address: LID 0x0a QPN 0x00b9 PSN 0x90ef4f OUT 0x10 RKey 0x00b300 VAddr 0x007f332eed5000 ------[root@computeB1 ~]#

Example A-6 shows the results of an ib_read_bw run from computeB2 toward the computeB1 server. Note that the throughput is drastically reduced, which demonstrates how the symbol errors affect the communication between the nodes.

Example A-6 Running ib_read_bw client on computeB2 toward computeB1 [root@computeB2 ~]# ib_read_bw -F computeB1 ------RDMA_Read BW Test Dual-port : OFF Number of qps : 1 Connection type : RC TX depth : 300 CQ Moderation : 50 Mtu : 2048B Link type : IB Outstand reads : 16 rdma_cm QPs : OFF Data ex. method : Ethernet ------local address: LID 0x0a QPN 0x00b9 PSN 0x90ef4f OUT 0x10 RKey 0x00b300 VAddr 0x007f332eed5000 remote address: LID 0x0b QPN 0x00cf PSN 0xf4caaa OUT 0x10 RKey 0x012600 VAddr 0x007f1fa829a000 ------

#bytes #iterations BW peak[MB/sec] BW average[MB/sec] MsgRate[Mpps] 65536 1000 3173.06 62.07 0.000993 ------[root@computeB2 ~]#

It is a good practice to monitor the fabric while performing such tests. For example, a run of ibdiagnet reveals any errors showing up during the tests. See “Example of ibdiagnet” on page 76 for an example of bad links producing bad throughput. Another option is to monitor the Unified Fabric Manager (UFM) dashboard to check the fabric health.

GPFS-related tools

A few tools are available for GPFS and are delivered as source code with GPFS. A functional development environment is needed to build tools such as nsdperf. A leading practice is to perform the build on one server, and then keep the executable file in a shared home directory.

The nsdperf utility is meant to be run for testing network performance. The difference between nsdperf and other tools is that nsdperf tries to generate traffic patterns matching the GPFS I/O behavior as closely as possible. Unlike iperf, nsdperf performs tests in a many-to-many configuration, where sets of server and client nodes can be defined for each test.

The gpfsperf utility enables you to measure the real I/O performance on disk. This is useful for testing the performance of the disk systems in conjunction with the network and the GPFS file system.

Disk performance should be verified in advance by using, for example, the dd utility. Only if dd or similar tools show good raw disk performance values does it make sense to proceed with other tools, such as gpfsperf.
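A simple read test with dd on an NSD server might look like the following sketch. The device name /dev/sdb is only a placeholder for one of the NSD disks; iflag=direct bypasses the page cache so that the disk itself is measured:

# Read 4 GiB directly from the disk device and discard the data
dd if=/dev/sdb of=/dev/null bs=1M count=4096 iflag=direct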

In addition, GPFS itself provides commands that can be used for checks, for example mmdiag.
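For example, the following mmdiag calls (a sketch, run on a GPFS node; not a complete health check) display outstanding waiters and the state of the GPFS network connections:

# Show long-running waiters, which often point to slow disks or network problems
mmdiag --waiters

# Show the status of this node's connections to the other GPFS nodes
mmdiag --network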

Example of nsdperf

The nsdperf utility is included as source code with the GPFS installation. The sources can be found in the /usr/lpp/mmfs/samples/net directory.

Note: Just running make works, but the resulting binary does not include Remote Direct Memory Access (RDMA) support. To add the RDMA support to your binary, use the following command:
g++ -O2 -DRDMA -o nsdperf -lpthread -lrt -libverbs -lrdmacm nsdperf.C

After the binary is created, it can be run on the nodes. It is useful to create some script framework around it. One example of how this can be done is found on the IBM developerWorks website: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/General%20Parallel%20File%20System%20%28GPFS%29/page/Testing%20network%20performance%20with%20nsdperf

To create the script framework, follow these steps (a typical calling sequence is sketched after this list):
1. Create one file named testnodes, containing all of the test node names, one name on each line.
2. Create one file named servers and one file named clients, which contain one node name on each line. These are subsets of the testnodes file.
3. Use the gencommand script to generate a command file from this information.
4. Use the control script to control and check the nsdperf server processes. The control start command starts the server processes.
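Using these files and scripts, a typical calling sequence might look like the following sketch (run from the directory that holds the nsdperf binary and the scripts):

# Start the nsdperf server processes on all nodes listed in testnodes
./control start

# Generate the commands file from the servers and clients files
./gencommand

# Run the test with the generated command file, then stop the server processes
./nsdperf -i commands
./control stop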

Because nsdperf relies on the network and produces high traffic, it is suggested that you monitor the network with either ibdiagnet or the UFM dashboard to determine if there are network faults while nsdperf runs.

A run without RDMA is shown in Example A-8. The command file used for this run is shown in Example A-7.

Example A-7 Listing of the nsdperf command file (non-RDMA)
server computeB5
server computeB6
client computeB1
client computeB2
client computeB3
client computeB4
tt 120
test
quit

The resulting test run is shown in Example A-8.

Example A-8 The nsdperf command run without RDMA
[root@computeB1 nsdperf]# ./nsdperf -i commands
Connected to computeB5
Connected to computeB6
Connected to computeB1
Connected to computeB2
Connected to computeB3
Connected to computeB4
Test time set to 120 seconds
4-2 write 184 MB/sec (43.8 msg/sec), cli 1% srv 1%, time 120, buff 4194304
4-2 read 237 MB/sec (56.4 msg/sec), cli 1% srv 1%, time 120, buff 4194304
[root@computeB1 nsdperf]#

A run with RDMA is shown in Example A-10 on page 85. For this run, the commands file has been changed to include the rdma on command, as shown in Example A-9.

Example A-9 Listing of the nsdperf command file (using RDMA)

server computeB5
server computeB6
client computeB1
client computeB2
client computeB3
client computeB4
tt 120
rdma on
test
quit

The resulting test run is shown in Example A-10.

Example A-10 The nsdperf command run with RDMA
[root@computeB1 nsdperf]# ./nsdperf -i commands
Connected to computeB5
Connected to computeB6
Connected to computeB1
Connected to computeB2
Connected to computeB3
Connected to computeB4
Test time set to 120 seconds
RDMA is now on
4-2 write 7140 MB/sec (1700 msg/sec), cli 1% srv 1%, time 120, buff 4194304, RDMA
4-2 read 5690 MB/sec (1360 msg/sec), cli 3% srv 1%, time 120, buff 4194304, RDMA
[root@computeB1 nsdperf]#

For completeness, the scripts used for this example are listed in the following examples. These are the scripts from the website referenced previously, with a change to the rootdir in the control script.

Example A-11 shows the control script that is used to control the nsdperf processes.

Example A-11 The control script for running nsdperf
#!/bin/bash
rootdir="/u/herbertm//nsdperf"

if [ "${1}" == "start" ]; then
  for i in `cat ${rootdir}/testnodes`
  do
    echo "Starting server on ${i}"
    ssh ${i} "${rootdir}/nsdperf -s < /dev/null > /dev/null 2>&1 &"
  done
elif [ "${1}" == "status" ]; then
  for i in `cat ${rootdir}/testnodes`
  do
    echo -n "${i} - "
    # Count the nsdperf server processes on the node
    nprocs=`ssh ${i} "ps -ef | grep nsdperf | egrep -v grep | wc -l"`
    if (($nprocs == 0 )); then
      echo The server is not running: $nprocs
    elif (($nprocs == 1 )); then
      echo The server is running: $nprocs
    elif (($nprocs > 1 )); then
      echo Error - there are $nprocs servers running.
    fi
  done
elif [ "${1}" == "stop" ]; then
  for i in `cat ${rootdir}/testnodes`
  do
    echo Stopping server on ${i}
    # Kill the nsdperf server processes on the node
    ssh ${i} "ps -ef | grep nsdperf | egrep -v grep | awk '{ print \$2 }' | xargs kill -9"
  done
fi

Example A-12 shows the gencommand script that can be used to create the nsdperf command file.

Example A-12 The gencommand script for generating the nsdperf command file
#!/bin/bash

echo "" > commands

for i in `cat servers`
do
  echo server ${i} >> commands
done

for i in `cat clients`
do
  echo client ${i} >> commands
done

echo tt 120 >> commands
echo rdma on >> commands
echo test >> commands
echo quit >> commands


Example of gpfsperf

The gpfsperf utility is included as source code with the GPFS installation. The sources can be found in the /usr/lpp/mmfs/samples/perf directory. To build the non-MPI version, just run make and gpfsperf is built.

The makefile provides the ability to create a Message Passing Interface (MPI) version of the gpfsperf utility as well. In this example, gpfsperf is built using the PE. This requires you to have the PE Runtime Edition installed, and a path set to the mpicc executable file. Run make with the correct parameters, as shown in Example A-13, to create the gpfsperf-mpi executable file.

Example A-13 Creating the gpfsperf-mpi executable file
[root@computeB6 perf]# export PATH=$PATH:/opt/ibmhpc/pecurrent/ppe.poe/bin
[root@computeB6 perf]# make gpfsperf-mpi
cc -c -O -DGPFS_LINUX -I/usr/lpp/mmfs/include irreg.c
mpicc -c -O -DGPFS_LINUX -I/usr/lpp/mmfs/include -DMULTI_NODE gpfsperf.c -o gpfsperf-mpi.o
mpicc gpfsperf-mpi.o irreg.o -O -DGPFS_LINUX -lmpich -lpthread -lrt -lgpfs -o gpfsperf-mpi
[root@computeB6 perf]#

A test run with the MPI executable file using a hostlist consisting of five compute nodes is shown in Example A-14.

Example A-14 Test run with MPI
-bash-4.1$ export MP_HOSTFILE=hostlist
-bash-4.1$ export MP_PROCS=`cat hostlist |wc -l`
-bash-4.1$ cat hostlist
computeB1
computeB2
computeB3
computeB5
computeB6
-bash-4.1$ poe /u/herbertm/gpfsperf/gpfsperf-mpi read strided /fsB/testfile
/u/herbertm/gpfsperf/gpfsperf-mpi read strided /fsB/testfile
  recSize 256K nBytes 999M fileSize 999M
  nProcesses 5 nThreadsPerProcess 1
  file cache flushed before test
  not using data shipping
  not using direct I/O
  offsets accessed will cycle through the same file segment
  not using shared memory buffer
  not releasing byte-range token after open
  Data rate was 107625.91 Kbytes/sec, thread utilization 0.911
-bash-4.1$

Network-related tools

There are many tools available for testing network performance.

Example of iperf

The iperf utility enables you to test network throughput using client/server connection pairs. Many options are available to run the tool. It is easily built and used for quick tests of network performance. The tool can be obtained as source code from the following website: http://sourceforge.net/projects/iperf/

The executable file might be put, for example, into /usr/local/sbin and then run in client/server relationships. This section shows short examples. For more details, see the iperf documentation. For testing, it is useful to set a fairly high TCP window size and to use some parallelism. The examples use a TCP window size of 256K and four connections. Output is formatted to show MBps, and the test time is set to five seconds.

Example A-15 shows the server started on computeB2, and it is waiting for incoming connections from iperf clients. The incoming connections are shown after a client connects. In this case, the computeB1 client connects to the server. The server must be stopped manually, because it remains in a listening state after test completion.

Example A-15 Starting an iperf server on computeB2 [root@computeB2 ~]# iperf -s -w 262144 -fM ------Server listening on TCP port 5001

TCP window size: 0.50 MByte (WARNING: requested 0.25 MByte) ------[ 4] local 10.3.1.33 port 5001 connected with 10.3.1.31 port 39690 [ 5] local 10.3.1.33 port 5001 connected with 10.3.1.31 port 39691 [ 6] local 10.3.1.33 port 5001 connected with 10.3.1.31 port 39692 [ 7] local 10.3.1.33 port 5001 connected with 10.3.1.31 port 39693 [ ID] Interval Transfer Bandwidth [ 4] 0.0- 5.0 sec 141 MBytes 28.2 MBytes/sec [ 5] 0.0- 5.0 sec 142 MBytes 28.2 MBytes/sec [ 6] 0.0- 5.0 sec 142 MBytes 28.4 MBytes/sec [ 7] 0.0- 5.0 sec 142 MBytes 28.3 MBytes/sec [SUM] 0.0- 5.0 sec 568 MBytes 113 MBytes/sec ^C[root@computeB2 ~]#

Example A-16 shows the corresponding iperf client run on computeB1, connecting to the computeB2 server from Example A-15 over the 1 Gbps Ethernet interface.

Example A-16 Starting an iperf client on ComputeB1 [root@computeB1 ~]# iperf -c computeB2 -w 256K -fM -t 5 -P 4 ------Client connecting to computeB2, TCP port 5001 TCP window size: 0.50 MByte (WARNING: requested 0.25 MByte) ------[ 5] local 10.3.1.31 port 39693 connected with 10.3.1.33 port 5001 [ 3] local 10.3.1.31 port 39690 connected with 10.3.1.33 port 5001 [ 4] local 10.3.1.31 port 39691 connected with 10.3.1.33 port 5001 [ 6] local 10.3.1.31 port 39692 connected with 10.3.1.33 port 5001 [ ID] Interval Transfer Bandwidth [ 5] 0.0- 5.0 sec 142 MBytes 28.4 MBytes/sec [ 3] 0.0- 5.0 sec 141 MBytes 28.3 MBytes/sec [ 4] 0.0- 5.0 sec 142 MBytes 28.3 MBytes/sec [ 6] 0.0- 5.0 sec 142 MBytes 28.5 MBytes/sec [SUM] 0.0- 5.0 sec 568 MBytes 113 MBytes/sec [root@computeB1 ~]#

Example A-15 and Example A-16 run over a 1 Gbit Ethernet interface. The iperf command can also be run using IP over InfiniBand (IPoIB). Example A-17 again shows the server started on computeB2, with the computeB1 client now connecting using IPoIB.

Example A-17 The iperf command run on computeB2 [root@computeB2 ~]# iperf -s -w 262144 -fM ------Server listening on TCP port 5001 TCP window size: 0.50 MByte (WARNING: requested 0.25 MByte) ------[ 4] local 20.1.1.33 port 5001 connected with 20.1.1.31 port 50518 [ 5] local 20.1.1.33 port 5001 connected with 20.1.1.31 port 50519 [ 6] local 20.1.1.33 port 5001 connected with 20.1.1.31 port 50520 [ 7] local 20.1.1.33 port 5001 connected with 20.1.1.31 port 50521 [ ID] Interval Transfer Bandwidth [ 6] 0.0- 5.0 sec 2196 MBytes 439 MBytes/sec [ 4] 0.0- 5.1 sec 1955 MBytes 386 MBytes/sec [ 7] 0.0- 5.1 sec 1974 MBytes 387 MBytes/sec

[ 5] 0.0- 5.1 sec 1866 MBytes 362 MBytes/sec [SUM] 0.0- 5.1 sec 7990 MBytes 1552 MBytes/sec ^C[root@computeB2 ~]#

Example A-18 shows the corresponding output of client computeB1 for the previous IPoIB connection to computeB2.

Example A-18 The computeB1 client connecting to ib0 [root@computeB1 ~]# iperf -c 20.1.1.33 -w 256K -fM -t 5 -P 4 ------Client connecting to 20.1.1.33, TCP port 5001 TCP window size: 0.50 MByte (WARNING: requested 0.25 MByte) ------[ 6] local 20.1.1.31 port 50521 connected with 20.1.1.33 port 5001 [ 3] local 20.1.1.31 port 50518 connected with 20.1.1.33 port 5001 [ 4] local 20.1.1.31 port 50519 connected with 20.1.1.33 port 5001 [ 5] local 20.1.1.31 port 50520 connected with 20.1.1.33 port 5001 [ ID] Interval Transfer Bandwidth [ 5] 0.0- 5.0 sec 2196 MBytes 439 MBytes/sec [ 3] 0.0- 5.1 sec 1955 MBytes 386 MBytes/sec [ 6] 0.0- 5.1 sec 1974 MBytes 388 MBytes/sec [ 4] 0.0- 5.1 sec 1866 MBytes 363 MBytes/sec [SUM] 0.0- 5.1 sec 7990 MBytes 1554 MBytes/sec [root@computeB1 ~]#

Throughput values are much higher than with the Ethernet-based connection. Note also that the throughput using IPoIB is lower than the maximum bandwidth that can be expected from the InfiniBand link. This is caused by the processing overhead of the IPoIB protocol.

Disk benchmarks

This section describes tools that can be used to benchmark file system performance.

The IOR utility is useful for testing parallel file system I/O performance. The utility enables you to use different application programming interfaces (APIs), such as Portable Operating System Interface (POSIX) or MPI-I/O.

The mdtest utility is useful for testing metadata performance of the file system. It performs create, read, write, stat, and delete operations on files and directories, and writes out statistics about the obtained results.

Both of these utilities, IOR and mdtest, must be built and run under MPI control. The following examples have been built with PE.

The tests are run on three Network Shared Disk (NSD) client nodes against the ClusterB GPFS file system fsB, using the hostfile contents shown in Example A-19.

Example A-19 MPI hostfile used for disk benchmarks
computeB1
computeB2
computeB3

Example of IOR

The IOR benchmark can be downloaded at the following site: http://sourceforge.net/projects/ior-sio/

Note that the IOR benchmark has moved from sourceforge to github. To obtain the latest versions, go to the following website: https://github.com/chaos/ior

The build process for the previous version (2.10.3) is simple. Change into the unpacked IOR directory and issue the following commands:
export PATH=$PATH:/opt/ibmhpc/pecurrent/ppe.poe/bin
gmake posix

These steps ensure that the mpicc script is found and builds the POSIX version. For more detailed information, see the readme file provided with the package.

For the current version (3.0.1) from github, automake is needed. First clone the git repository:
git clone https://github.com/chaos/ior.git

Change into the directory and run bootstrap to build the configure script, if not present. Then run configure to prepare the environment and build the executable file:
export PATH=$PATH:/opt/ibmhpc/pecurrent/ppe.poe/bin
./configure LIBS=/usr/lpp/mmfs/lib/libgpfs.so
make

These steps ensure that the mpicc script is found, and that the linker does not return errors about undefined references to gpfs_fcntl during the make process. See the readme file provided with the tool.

IOR version 3.0.1 supports some GPFS-specific features (for example: gpfsHintAccess) that are not described in this section.

Because IOR relies on the network, it is suggested that you monitor the network with either ibdiagnet or the UFM dashboard to determine if there are network faults while IOR runs. With the UFM dashboard, in addition to looking for error events, you can look for congestion using the traffic monitor. Example A-20 shows a sample run of IOR using three hosts, as shown in Example A-19. This test is started on one of the compute nodes, and parameters are passed by command line.

Example A-20 IOR example with command-line parameters -bash-4.1$ export MP_HOSTFILE=/u/herbertm/hostfile -bash-4.1$ poe /u/herbertm/IOR/src/C/IOR -w -r -k -Z -b 1G -i 5 -t 2M -o /fsB/iortest IOR-2.10.3: MPI Coordinated Test of Parallel I/O

Run began: Thu Oct 10 03:08:48 2013 Command line used: /u/herbertm/IOR/src/C/IOR -w -r -k -Z -b 1G -i 5 -t 2M -o /fsB/iortest Machine: Linux computeB1

Summary: api = POSIX test filename = /fsB/iortest

access = single-shared-file ordering in a file = sequential offsets ordering inter file=random task offsets >= 1, seed=0 clients = 3 (1 per node) repetitions = 5 xfersize = 2 MiB blocksize = 1 GiB aggregate filesize = 3 GiB

Operation Max (MiB) Min (MiB) Mean (MiB) Std Dev Max (OPs) Min (OPs) Mean (OPs) Std Dev Mean (s) ------write 178.65 144.48 160.96 12.86 89.32 72.24 80.48 6.43 19.20773 EXCEL read 232.23 145.62 177.85 32.11 116.11 72.81 88.92 16.06 17.79651 EXCEL

Max Write: 178.65 MiB/sec (187.33 MB/sec) Max Read: 232.23 MiB/sec (243.51 MB/sec)

Run finished: Thu Oct 10 03:11:57 2013 -bash-4.1$

Example A-21 uses a command file that can be passed to IOR. The command file replaces the command-line options that were passed in Example A-20 on page 90, and it also enables you to run multiple tests from one file. This example uses IOR version 3.0.1 and specifies the new GPFS flags.

Example A-21 Command file
IOR START
writeFile=1
readFile=1
keepFile=1
randomOffset=1
blockSize=1G
repetitions=2
transferSize=2M
testFile=/fsB/newtest
gpfsReleaseToken=0
gpfsHintAccess=0
RUN
repetitions=1
gpfsReleaseToken=1
gpfsHintAccess=1
RUN
IOR STOP

The result of a run using the command file is shown in Example A-22.

Example A-22 Run result using the command file -bash-4.1$ poe /u/herbertm/IOR3/src/ior -f /u/herbertm/ior3cmd IOR-3.0.1: MPI Coordinated Test of Parallel I/O

Began: Thu Oct 17 11:14:10 2013 Command line used: /u/herbertm/IOR3/src/ior -f /u/herbertm/ior3cmd Machine: Linux computeB2

Test 0 started: Thu Oct 17 11:14:10 2013 Summary: api = POSIX test filename = /fsB/newtest access = single-shared-file ordering in a file = random offsets ordering inter file= no tasks offsets clients = 3 (1 per node) repetitions = 2 xfersize = 2 MiB blocksize = 1 GiB aggregate filesize = 3 GiB

access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter ------write 228.39 1048576 2048.00 3.11 13.43 4.86 13.45 0 read 138.09 1048576 2048.00 0.000228 22.25 7.61 22.25 0 write 146.92 1048576 2048.00 0.485971 20.91 0.277916 20.91 1 read 160.04 1048576 2048.00 0.000438 19.19 4.78 19.19 1

Max Write: 228.39 MiB/sec (239.49 MB/sec) Max Read: 160.04 MiB/sec (167.82 MB/sec)

Test 1 started: Thu Oct 17 11:15:32 2013 Summary: api = POSIX test filename = /fsB/newtest access = single-shared-file ordering in a file = random offsets ordering inter file= no tasks offsets clients = 3 (1 per node) repetitions = 1 xfersize = 2 MiB blocksize = 1 GiB aggregate filesize = 3 GiB

access bw(MiB/s) block(KiB) xfer(KiB) open(s) wr/rd(s) close(s) total(s) iter

------write 68.84 1048576 2048.00 0.072261 44.62 9.67 44.62 0 read 152.41 1048576 2048.00 0.001203 20.16 3.32 20.16 0

Max Write: 68.84 MiB/sec (72.19 MB/sec) Max Read: 152.41 MiB/sec (159.81 MB/sec)

Summary of all tests: Operation Max(MiB) Min(MiB) Mean(MiB) StdDev Mean(s) Test# #Tasks tPN reps fPP reord reordoff reordrand seed segcnt blksiz xsize aggsize API RefNum write 228.39 146.92 187.66 40.74 17.17979 0 3 1 2 0 0 1 0 0 1 1073741824 2097152 3221225472 POSIX 0 read 160.04 138.09 149.06 10.98 20.72093 0 3 1 2 0 0 1 0 0 1 1073741824 2097152 3221225472 POSIX 0 write 68.84 68.84 68.84 0.00 44.62437 1 3 1 1 0 0 1 0 0 1 1073741824 2097152 3221225472 POSIX 0 read 152.41 152.41 152.41 0.00 20.15649 1 3 1 1 0 0 1 0 0 1 1073741824 2097152 3221225472 POSIX 0

Finished: Thu Oct 17 11:16:37 2013 -bash-4.1$

Example of mdtest

The code for mdtest can be downloaded at the following website: http://sourceforge.net/projects/mdtest/

The version used for this section is 1.9.1. Example A-23 shows a simple example of running mdtest on clusterB. The recursive directory depth is set to 4. The branch is set to 6, so that in each directory, 6 subdirectories are recursively created (-b parameter) until a depth of 4 (-z parameter) is reached. Within each of these directories, each task then creates 5 (-I parameter) empty directories and 5 empty files, which are used for testing, and the whole test is repeated 5 times (-i parameter).

These numbers grow quickly with the number of nodes used. The number of files and directories that will be created can easily be precalculated with the following formula: #nodes * I * (1 - b^(z+1)) / (1 - b)

For example, using depth z=5, I=10, b=6 and running mdtest on 1000 nodes produces 93 million files and 93 million directories.
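The calculation can be verified quickly with shell arithmetic, as in the following sketch using the values from the example:

# nodes * I * (b^(z+1) - 1) / (b - 1) with 1000 nodes, I=10, b=6, z=5
echo $(( 1000 * 10 * (6**6 - 1) / (6 - 1) ))
# prints 93310000, that is, roughly 93 million files (and the same number of directories)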

Example A-23 An mdtest run on clusterB
-bash-4.1$ export MP_HOSTFILE=/u/herbertm/hostfile
-bash-4.1$ export MP_PROCS=`cat \`echo $MP_HOSTFILE\`|wc -l`
-bash-4.1$ poe /u/herbertm/mdtest-1.9.1/mdtest -I 5 -z 4 -b 6 -i 5
-- started at 10/10/2013 05:57:37 --

mdtest-1.9.1 was launched with 3 total task(s) on 3 node(s) Command line used: /u/herbertm/mdtest-1.9.1/mdtest -I 5 -z 4 -b 6 -i 5 Path: /fsB FS: 273.5 GiB Used FS: 1.9% Inodes: 0.1 Mi Used Inodes: 2.9%

3 tasks, 23325 files/directories

SUMMARY: (of 5 iterations)
   Operation                  Max        Min        Mean       Std Dev
   ---------                  ---        ---        ----       -------
   Directory creation:        458.857    378.426    401.618    29.997
   Directory stat    :        1845.778   1617.817   1677.453   85.202
   Directory removal :        530.626    449.805    506.142    29.322
   File creation     :        638.294    550.174    602.203    29.341
   File stat         :        2222.771   1328.575   1685.229   336.146
   File read         :        2334.903   1517.298   1943.249   277.179
   File removal      :        541.109    468.839    513.678    27.210
   Tree creation     :        2573.000   460.983    1512.702   708.913
   Tree removal      :        258.909    191.060    214.722    25.638

-- finished at 10/10/2013 06:17:29 -- -bash-4.1$

Node-specific tools

Some tools are useful for checking the node health status. Especially after repair actions, these tests are useful before bringing a node back into production.

The stream benchmark enables you to measure sustainable memory bandwidth and helps to detect any issues with installed dual inline memory modules (DIMMs) that might influence performance of a node while running jobs. The National Energy Research Scientific Computing Center (NERSC) describes STREAM as a simple, synthetic benchmark designed to measure sustainable memory bandwidth (in MBps), and a corresponding computation rate for four simple vector kernels:

https://www.nersc.gov/systems/trinity-nersc-8-rfp/nersc-8-trinity-benchmarks/stream/

In addition, memory allocation tests are useful to check if enough memory is available and really usable on a node.

The dgemm and daxpy tests enable you to test compute performance. For these tests, it is important to have a reasonable performance value that can be used as a baseline to determine the success or failure of a given test on a node.

The linpack benchmark also enables you to detect any issues with compute performance, and can be run as a single-node linpack to check the capabilities of a compute node.

Example of stream

The stream program code can be downloaded from the following website: http://www.cs.virginia.edu/stream/ref.html

The source code and makefile are available on the following website: http://www.cs.virginia.edu/stream/FTP/Code

This version can be built from within the source directory using the makefile. Run the following command:
make stream_c.exe

In addition, an Open Multi-Processing (OpenMP) API source code version, stream_omp.c, is available on the following website: http://www.cs.virginia.edu/stream/FTP/Code/Versions

The OpenMP version can be built using the following command:
gcc -fopenmp -D OPENMP stream_omp.c -o stream_omp.exe

The version used in the examples is version 5.10.

Example A-24 shows the stream run for the OpenMP version that has been built for this test. Note that the environment variable for the number of OpenMP threads to be used is set in advance.

Example A-24 Stream run with the OpenMP version [root@computeA1 stream]# export OMP_NUM_THREADS=16 [root@computeA1 stream]# ./stream_omp.exe ------This system uses 8 bytes per DOUBLE PRECISION word. ------Array size = 200000000, Offset = 0 Total memory required = 4577.6 MB. Each test is run 10 times, but only the *best* time for each is used. ------Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 ------Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 56187 microseconds. (= 56187 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer.

------Function Rate (MB/s) Avg time Min time Max time Copy: 37365.3211 0.0866 0.0856 0.0873 Scale: 36643.0770 0.0883 0.0873 0.0889 Add: 38379.4300 0.1264 0.1251 0.1274 Triad: 37416.0836 0.1292 0.1283 0.1297 ------Solution Validates ------[root@computeA1 stream]#

Example A-25 shows the same run, but in this case the threads’ affinity to CPUs has been set. This is done by setting the GOMP_CPU_AFFINITY environment variable.

Example A-25 Run with the threads’ affinity to CPU set [root@computeA1 stream]# export OMP_NUM_THREADS=16 [root@computeA1 stream]# export GOMP_CPU_AFFINITY=0-31 [root@computeA1 stream]# ./stream_omp.exe ------This system uses 8 bytes per DOUBLE PRECISION word. ------Array size = 200000000, Offset = 0 Total memory required = 4577.6 MB. Each test is run 10 times, but only the *best* time for each is used. ------Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 Number of Threads requested = 16 ------Your clock granularity/precision appears to be 1 microseconds. Each test below will take on the order of 48813 microseconds. (= 48813 clock ticks) Increase the size of the arrays if this shows that you are not getting at least 20 clock ticks per test. ------WARNING -- The above is only a rough guideline. For best results, please be sure you know the precision of your system timer. ------Function Rate (MB/s) Avg time Min time Max time Copy: 43772.0268 0.0732 0.0731 0.0735

Scale: 43176.8176 0.0743 0.0741 0.0746 Add: 47566.6015 0.1016 0.1009 0.1019 Triad: 47228.8318 0.1020 0.1016 0.1023 ------Solution Validates ------[root@computeA1 stream]#

Example of dgemm

The dgemm utility is included in the PE installation in the following directory: /opt/ibmhpc/pecurrent/systemcheck/bin

The contents of this directory should be copied to a directory that is owned by the user running the tool. In this case, /tmp/systemcheck/bin is used. This is also the directory where the CHC toolset places the systemcheck utilities.
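The copy might be done as in the following sketch (poeuser is the non-root test user from other examples in this book and is only an example here):

# Copy the systemcheck utilities and hand them over to the test user
mkdir -p /tmp/systemcheck
cp -r /opt/ibmhpc/pecurrent/systemcheck/bin /tmp/systemcheck/
chown -R poeuser /tmp/systemcheck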

Example A-26 shows the run on a single node. The tool is run under PE, so a hostlist must be specified. In this case, the hostlist only contains a single node.

Example A-26 Running dgemm on a single node
-bash-4.1$ cat /u/herbertm/hostlist
computeA2
-bash-4.1$ export MP_HOSTFILE=/u/herbertm/hostlist
-bash-4.1$ ./run.dgemm -V
computeA2: Cores=16 perf (GFlop)= 368
PERFORMANCE_SUCCESS: EXPECTED_PERF 311 : computeA2 368 GF
-bash-4.1$

A wrapper script named do_poe_dgemm is provided. This either assumes that MP_HOSTFILE is already exported with a list of nodes, or that the host.list file is present in the current directory and contains the list of nodes.

Example A-27 shows the run of this wrapper script.

Example A-27 Running dgemm using the wrapper script
-bash-4.1$ ./do_poe_dgemm
computeA2: Cores=16 perf (GFlop)= 368
PERFORMANCE_SUCCESS: EXPECTED_PERF 311 : computeA2 368 GF
-bash-4.1$

An example run of dgemm using the hcrun command from the CHC toolkit is shown in 5.3.8, “The run_dgemm check” on page 52.

Example of daxpy

The daxpy utility is included in the PE installation in the following directory: /opt/ibmhpc/pecurrent/systemcheck/bin

From there, it can be run on a single node, as shown in Example A-28. The example has been run under root, because the daxpy test does not use MPI and PE. Daxpy is an OpenMP program, so the OMP_NUM_THREADS variable and CPU affinity settings are important. The run.daxpy script tries to set these variables automatically.

Example A-28 Running daxpy on a single node
[root@computeA2 bin]# ./run.daxpy -V
estimated EXPECTED_PERF 61318
computeA2: Expected_DAXPY=61318 OMP_NUM_THREADS=16 perf (DAXPY BW)= 64538
computeA2: PERFORMANCE_SUCCESS: 64538 GF;
[root@computeA2 bin]#

For run.daxpy, a wrapper named poe.run.daxpy also exists, similar to that for dgemm. This launches run.daxpy under PE control. It either assumes that MP_HOSTFILE is already exported with a list of nodes, or that the host.list file is present in the current directory and contains the list of nodes.

Example A-29 shows the run of do_poe_daxpy using a hostlist containing computeA2. This demonstrates a failing example.

Example A-29 Running do_poe_daxpy using a hostlist in computeA2
-bash-4.1$ export MP_HOSTFILE=/u/herbertm/hostlist
-bash-4.1$ ./do_poe_daxpy
computeA2: PERFORMANCE_FAILURE: 19906 MB/s;
computeA2: Expected_DAXPY=61500 OMP_NUM_THREADS=16 perf (DAXPY BW)= 19906
computeA2: PERFORMANCE_FAILURE: 19906 MB/s;
-bash-4.1$

An example run of daxpy using the hcrun command from the CHC toolkit is shown in 5.3.7, “The run_daxpy check” on page 51.

IBM HPC Central

This section provides a short overview of the available IBM sites for service packs, links to important sites such as the GPFS forum, and more. The primary entry point to these resources is the IBM HPC Central website.

The IBM HPC Central site is at the following website: https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Welcome%20to%20High%20Performance%20Computing%20%28HPC%29%20Central

Service Pack information can be obtained from the following site:

http://ibm.co/1eFt9bN


Appendix B. Tools and commands outside of the toolkit

This appendix provides a description of helpful tools and commands outside of the toolkit, although some of them are partly included in the toolkit.

This appendix provides information about the following topics:
 Remote management access
 Firmware versions
 Light-emitting diode status
 Baseboard firmware
 Component firmware
 UEFI settings [System x only]

Remote management access

Some of the following checks need access to the remote management adapter of the compute node. On System x, this is called the integrated management module (IMM or IMMv2) or the advanced management module (AMM) in a BladeCenter environment. The IBM Power Systems use either the Hardware Management Console (HMC) or the flexible service processor (FSP).

Example B-1 checks the access and power state of all the nodes in nodegroup f01.

Example B-1 Remote management access # rpower f01 state|xdshcoll

======c933f01x31,c933f01x33,c933f01x35,c933f01x37,c933f01x39,c933f01x41 ======on

Light-emitting diode status
After verifying that remote access to the baseboard management controller (BMC) works, verify that there are no critical errors on the compute nodes. The status of the error light-emitting diode (LED) can be checked with the rvitals command, as shown in Example B-2.

Example B-2 Error LED [System x] rvitals f01 leds|xcoll ======c933f01x35,c933f01x33,c933f01x37 ======LED 0x0000 (Fault) active to indicate system error condition. LED 0012 (PS) active to indicate Sensor 0x71 (Power Supply 2) error.

======c933f01x39,c933f01x41,c933f01x31 ======No active error LEDs detected

In Example B-2, there are three nodes with an active fault LED. The message (in this case Power Supply 2 error) should be investigated, because this might affect the cluster.

The hcrun command from the Cluster Health Check (CHC) toolkit provides the same functionality as shown in 5.3.1, “The leds check” on page 48.

Firmware versions

Firmware versions can have an important effect on the functionality or performance of a cluster. The versions of the baseboard firmware, such as Unified Extensible Firmware Interface (UEFI) and IMM, should be the same on all models of the same kind throughout the cluster. It is essential that the firmware in use is at the latest known good level, which is not necessarily the “latest and finest” version.

Interdependencies must be taken into account: the firmware levels of related components must also fit together. This is necessary to have a supported environment, but it should also increase the stability of your cluster.

Baseboard firmware
The firmware levels of the baseboard are usually easy to determine. In Example B-3, the first rinv command gets the compute node model and, based on that, the second command retrieves the firmware versions.

Example B-3 Model and firmware versions (System x) # rinv f01 model

======c933f01x31,c933f01x33,c933f01x35,c933f01x37 ======System Description: System x3650 M2 System Model/MTM: 7947AC1

======c933f01x39,c933f01x41 ======System Description: System x3650 M3 System Model/MTM: 7945AC1

# rinv c933f01x31,c933f01x33,c933f01x35,c933f01x37 firmware|xdshcoll ======c933f01x31,c933f01x33,c933f01x35,c933f01x37 ======BMC Firmware: 1.35 (YUOOE9B 2012/11/29 09:14:45) UEFI Version: 1.16 (D6E158A 2012/11/26)

The hcrun command from the CHC toolkit can check the firmware, too (see 5.3.5, “The firmware check” on page 50).

Component firmware
The actual firmware level of a component, such as a network or InfiniBand adapter, might be more difficult to determine. Linux does not provide a general approach to get this information for a Peripheral Component Interconnect (PCI) adapter. Table B-1 lists a few commands that help you gather the information.

Table B-1 Collecting firmware levels

Platform (System x or Power Systems)   Component                                          Command      Result
p / x                                  InfiniBand host channel adapter (HCA), Mellanox    ibstat       ... Firmware version: 2.9.1000 ...
p / x                                  Ethernet adapter                                   ethtool -i   ... firmware-version: 5.10-2 ...
p                                      Disks, RAID controller                             lsmcode -A   System firmware
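To gather these levels across a whole node group in one step, the commands from Table B-1 can be combined with xdsh and xcoll from xCAT, as in the following sketch. The node group f01 and the interface name eth0 are assumptions based on the earlier examples:

# Collect the InfiniBand HCA firmware level from all nodes and group identical output
xdsh f01 "ibstat | grep 'Firmware version'" | xcoll

# Collect the Ethernet adapter firmware level (the interface name can differ per node)
xdsh f01 "ethtool -i eth0 | grep firmware-version" | xcoll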

UEFI settings [System x only]
Depending on the type or model of compute nodes and the cluster they form, there might be suggested settings in the UEFI of the nodes. The IBM Advanced Settings Utility (ASU) tool is used to show or change settings without entering the UEFI setup from a graphical remote console. The IBM Extreme Cloud Administration Toolkit (xCAT) provides a front end to the ASU tool that enables parallel access to all nodes in the cluster. This requires the Red Hat Package Manager (RPM) version of ASU to be installed on the xCAT management server.

Example B-4 shows four query settings from a cluster with System x 3650 M2/M3 nodes.1 A batch file containing all of the commands is created, and afterward the pasu tool is called with the file shown in Example B-4 as input.

Example B-4 Query settings
# cat /tmp/asu.batch
show uEFI.ProcessorHyperThreading
show uEFI.OperatingMode
show uEFI.QPISpeed
show uEFI.PatrolScrub

# pasu -l USERID -p PASSW0RD -b /tmp/asu.batch f01 |xcoll
======
ib
======
Batch mode start.
[show uEFI.ProcessorHyperThreading]
uEFI.ProcessorHyperThreading=Enable

[show uEFI.OperatingMode] uEFI.OperatingMode=Custom Mode

[show uEFI.QPISpeed] uEFI.QPISpeed=Max Performance

[show uEFI.PatrolScrub] uEFI.PatrolScrub=Disable

Batch mode completed successfully.

IBM Parallel Operating Environment verification

Some of the tests for the CHC require a working IBM Parallel Operating Environment (POE) installation in the cluster. The ppe_rte_samples package delivers a number of small programs that can be used to verify if POE is functioning correctly. The samples are installed in /opt/ibmhpc/pecurrent/ppe.samples. They are accompanied by a readme file with further instructions for building and running the tools.

1 Unfortunately, the parameter names can vary from model to model, and between firmware versions.

Installation verification procedure
The installation verification procedure (IVP) verifies the POE installation. IVP checks for the existence of some files, and that files in /etc contain the correct information. IVP then compiles2 and runs a small program in which one task sends a message to all other tasks. This program should exit with a return code of 0.

Note: It is suggested to run the check with a non-root user. Also check that the user’s public key is distributed to all other nodes (including the local host), so that password-free access for Secure Shell (SSH) sessions is possible.
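Password-free SSH can be confirmed quickly with a loop such as the following sketch (the node names are only examples taken from this appendix):

# Every node should print its host name without prompting for a password
for node in computeA1 computeA2 computeA3
do
    ssh -o BatchMode=yes ${node} hostname
done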

Example B-5 shows a successful verification run. The compiler warnings can be ignored.

Example B-5 IVP run poeuser@computeA2:/opt/ibmhpc/pecurrent/ppe.samples/ivp$ ./ivp.linux.script Verifying the existence of the Binaries Partition Manager daemon /etc/pmdv13 is executable PE RTE files seem to be in order Compiling the ivp sample program Output files will be stored in directory /tmp/ivp3522 ivp.c: In function ‘main’: ivp.c:42: warning: incompatible implicit declaration of built-in function ‘strlen’ ivp.c:57: warning: incompatible implicit declaration of built-in function ‘exit’ ivp.c:94: warning: incompatible implicit declaration of built-in function ‘exit’ Creating host.list file for this node Setting the required environment variables Executing the parallel program with 2 tasks

SHM total segment size=15885440 num_common_tasks=2 slot_per_task=512 slot_size=12416 slot_data_size=12288 slot_offset=3171456 SHM total segment size=15885440 num_common_tasks=2 slot_per_task=512 slot_size=12416 slot_data_size=12288 slot_offset=3171456 PAMI job ID for this job is: 1916227334 64bit (MPI over PAMI) ppe_rcot MPICH2 library was compiled on Wed Aug 14 14:49:38 2013

POE IVP: running as task 1 on node computeA2 POE IVP: running as task 0 on node computeA2 POE IVP: there are 2 tasks running POE IVP: all messages sent POE IVP: task 1 received

Parallel program ivp.out return code was 0

If the test returns a return code of 0, POE IVP is successful.
To test message passing, run the tests in
/opt/ibmhpc/pe13xx/ppe.samples/poetest.bw and poetest.cast
To test threaded message passing, run the tests in
/opt/ibmhpc/pe13xx/ppe.samples/threads
End of IVP test

2 The script only checks for the existence of mpcc.

At the end of the IVP run, additional tests are mentioned. These tests are covered in the following sections.

The POE tests
Two POE tests are provided that check the communication between nodes.

The poetest.bw test
This POE test measures the bandwidth between two nodes. These two nodes need to be specified in a file named host.list in the working directory. Before running bw.run on one of the nodes, the binary needs to be compiled on the participating nodes.

A successful run is provided in Example B-6.

Example B-6 POE test measures
poeuser@computeB1:/opt/ibmhpc/pecurrent/ppe.samples/poetest.bw$ cat host.list
computeB1
computeB2
poeuser@computeB1:/opt/ibmhpc/pecurrent/ppe.samples/poetest.bw$ ./bw.run
hello from task 1
hello from task 0
MEASURED BANDWIDTH = 113.096451 *10**6 Bytes/sec

The poetest.cast test
This test performs a broadcast from task 0 to all nodes in the cluster. Similar to the poetest.bw test, a host.list file needs to be created. For this test, more than two nodes can be specified.

Example B-7 shows a successful run. The sequence of the resulting lines is unpredictable.

Example B-7 A poetest.cast test
poeuser@computeA1:/opt/ibmhpc/pecurrent/ppe.samples/poetest.cast$ cat host.list
computeA1
computeA2
computeA3
poeuser@computeA1:/opt/ibmhpc/pecurrent/ppe.samples/poetest.cast$ ./bcast.run
Broadcasting on 3 nodes
Hello from task 2
Hello from task 0
BROADCAST TEST COMPLETED SUCCESSFULLY
Hello from task 1

Related publications

The publications listed in this section are considered particularly suitable for providing more detailed information about the topics covered in this book.

IBM Redbooks

The following IBM Redbooks publications provide additional information about the topic in this document. Note that some publications referenced in this list might only be available in softcopy:
 HPC Clusters Using InfiniBand on IBM Power Systems Servers, SG24-7767

You can search for, view, download, or order these documents and other Redbooks, Redpapers, Web Docs, drafts, and additional materials at the following website: ibm.com/redbooks

Online resources

These websites are also relevant as further information sources:
 Cluster Products information center
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm.cluster.infocenter.doc/infocenter.html
 IBM High Performance Computing (HPC) Central
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Welcome%20to%20High%20Performance%20Computing%20%28HPC%29%20Central
 The stream program code
http://www.cs.virginia.edu/stream/ref.html
 The mdtest command
http://sourceforge.net/projects/mdtest/
 The interoperable object reference (IOR) benchmark
http://github.com/chaos/ior
 The iperf utility
http://sourceforge.net/projects/iperf/
 The Leibniz Supercomputing Centre (LRZ) in Garching, Germany
http://www.lrz.de/services/compute/supermuc/
 Official Red Hat Enterprise Linux (RHEL) documentation
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Deployment_Guide/index.html
 Mellanox OpenFabrics Enterprise Distribution for Linux (MLNX_OFED) documentation
http://www.mellanox.com/page/products_dyn?product_family=26
 IBM Extreme Cloud Administration Toolkit (xCAT)
http://xcat.sourceforge.net/
 IBM General Parallel File System (GPFS) documentation
http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.gpfs.doc/gpfsbooks.html

Help from IBM

IBM Support and downloads ibm.com/support

IBM Global Services ibm.com/services
