Failsafetm Administrator's Guide for SGI
Total Page:16
File Type:pdf, Size:1020Kb
FailSafeTM Administrator’s Guide for SGI® InfiniteStorage 007–3901–009 CONTRIBUTORS Written by Jenn Byrnes, Susan Ellis, Lori Johnson, Steven Levine Edited by Susan Wilkening Illustrated by Chrystie Danzer, Dany Galgani Production by Glen Traefald Engineering contributions by Gemma Exton, Scott Henry, Vidula Iyer, Ashwinee Khaladkar, Harald Kaul, Tony Kavadias, Linda Lait, Michael Nishimoto, Nate Pearlstein, Alain Renaud, Wesley Smith, Bill Sparks, Paddy Sreenivasan, Dan Stekloff, Rebecca Underwood, Manish Verma COPYRIGHT © 1999–2003, Silicon Graphics, Inc. All rights reserved; provided portions may be copyright in third parties, as indicated elsewhere herein. No permission is granted to copy, distribute, or create derivative works from the contents of this electronic documentation in any manner, in whole or in part,without the prior written permission of Silicon Graphics, Inc. LIMITED RIGHTS LEGEND The electronic (software) version of this document was developed at private expense; if acquired under an agreement with the USA government or any contractor thereto, it is acquired as "commercial computer software" subject to the provisions of its applicable license agreement, as specified in (a) 48 CFR 12.212 of the FAR; or, if acquired for Department of Defense units, (b) 48 CFR 227-7202 of the DoD FAR Supplement; or sections succeeding thereto. Contractor/manufacturer is Silicon Graphics, Inc., 1600 Amphitheatre Pkwy 2E, Mountain View, CA 94043-1351. TRADEMARKS AND ATTRIBUTIONS Silicon Graphics, SGI, the SGI logo, IRIS, IRIX, Onyx, Onyx2, Origin, and XFS are registered trademarks and CXFS, FailSafe, IRIS FailSafe, NUMAlink, Performance Co-Pilot, SGI SAN Server, and Silicon Graphics Fuel are trademarks of Silicon Graphics, Inc., in the United States and/or other countries worldwide. Informix is a registered trademark of IBM. Java is a trademark of Sun Microsystems, Inc. Netscape and Netscape FastTrack Server are trademarks of Netscape Communications Corporation. Oracle is a registered trademark of Oracle Corporation. UNIX is a registered trademark of The Open Group in the United States and other countries. Cover design by Sarah Bolles, Sarah Bolles Design, and Dany Galgani, SGI Technical Publications. New Features in This Guide This revision contains the following new information: • Support for IRIS FailSafe 2.1.6, which supports IRIX 6.5.22. • Up to 8 nodes in a cluster can run FailSafe. A cluster that is also running CXFS (a coexecution cluster) is supported with as many as 48 nodes. All of these nodes must run CXFS and up to 8 can also run FailSafe. FailSafe must be run on an IRIX CXFS administration node; FailSafe cannot run on a client-only node. • The ability to determine the heartbeat network that is currently in use by using the cmgr command, cluster_status command, and the FailSafe Manager graphical user interface (GUI). See "System Status" on page 283. • Support for running n-2 releases within one cluster; see "Release Support Policy" on page 29. • Overview of the best practices for system administration in a FailSafe cluster. See Chapter 3, "Best Practices" on page 57. • Information about allowing nodes time to join the cluster after a power failure; see "Power Failure" on page 336. • New section: "Updating the Checksum Version for 6.5.21 and Earlier Clusters" on page 322. • Details about software architecture and memberships has been moved to the FailSafe Architecture for SGI InfiniteStorage white paper, available from the SGI Technical Publications Library at http://docs.sgi.com. • Information about updating from IRIS FailSafe 1.2 to IRIS FailSafe 2.1.x has been moved to the Migrating from IRIS FailSafe 1.2 to IRIS FailSafe 2.1.X white paper. 007–3901–009 iii Record of Revision Version Description 002 December 1999 Published in conjunction with FailSafe 2.0 rollup patch. Supports IRIX 6.5.2 and later. 003 November 2000 Supports IRIS FailSafe 2.1 004 May 2001 Supports IRIS FailSafe 2.1.1 005 October 2001 Supports IRIS FailSafe 2.1.2 006 April 2002 Supports IRIS FailSafe 2.1.3 007 October 2002 Supports IRIS FailSafe 2.1.4 008 April 2003 Supports IRIS FailSafe 2.1.5 009 November 2003 Supports IRIS FailSafe 2.1.6 007–3901–009 v Contents About This Guide .....................xxxi Audience . ........................xxxi Assumptions . ........................xxxi Structure of This Guide . ...................xxxi Related Documentation . ...................xxxii Obtaining Publications . ...................xxxiv Conventions . ........................xxxv Reader Comments .......................xxxvi 1. Overview . ..................... 1 High Availability with FailSafe ................... 1 Complete Storage Solution: FailSafe, CXFS, DMF, and TMF . ......... 2 CXFS . ........................ 2 DMF.......................... 4 TMF .......................... 4 Cluster Environment . ................... 4 Terminology ........................ 5 Cluster . ........................ 5 Node . ........................ 5 Pool . ........................ 5 Cluster Database . ................... 6 Membership ....................... 6 Quorum ........................ 7 Private Network . ................... 9 Resource ........................ 9 007–3901–009 vii Contents Resource Type ....................... 10 Resource Name . ................... 10 Resource Group . ................... 10 Dependency ....................... 11 Failover . ........................ 12 Failover Policy . ................... 13 Failover Domain . ................... 13 Failover Attribute . ................... 13 Failover Scripts . ................... 14 Action Scripts ....................... 14 Cluster Process Group . ................... 15 Plug-In . ........................ 15 Hardware Components . ................... 16 Disk Connections . ................... 18 Supported Configurations . ................... 18 Additional Features ....................... 21 Dynamic Management . ................... 21 Fine-Grain Failover . ................... 21 Local Restarts ........................ 22 Administration ........................ 22 Highly Available Resources . ................... 23 Nodes . ........................ 23 Network Interfaces and IP Addresses . .............. 23 Disks . ........................ 25 Highly Available Applications ................... 27 Failover and Recovery Processes ................... 27 Overview of Configuring and Testing a New Cluster . ......... 28 Release Support Policy . ................... 29 viii 007–3901–009 FailSafeTM Administrator’s Guide for SGI® InfiniteStorage 2. Configuration Planning . ................. 31 Example of the Planning Process .................. 31 Disk Configuration ....................... 32 Planning Disk Configuration ................... 33 Configuration Parameters for Disks . .............. 38 XFS Filesystem Configuration ................... 38 Planning XFS Filesystems . ................... 38 XLV Logical Volume Configuration . .............. 41 Example Logical Volume Configuration . .............. 41 Resource Attributes for Logical Volumes .............. 42 Local XVM Volumes . ................... 42 XVM Resource Type for FailSafe . .............. 42 Resource Attributes for Local XVM Volumes ............. 43 Example Local XVM Volume Configuration .............. 44 Example XLV Filesystem Configuration . .............. 44 CXFS Filesystem Configuration ................... 46 Planning CXFS Filesystems ................... 46 Coexecution of CXFS and FailSafe . .............. 46 Size of the Coexecution Cluster . .............. 48 Cluster Type ....................... 48 Node Types for CXFS Metadata Servers . .............. 48 CXFS Metadata Servers and Failover Domain ............. 48 CXFS Resource Type for FailSafe . .............. 49 Separate CXFS and FailSafe GUIs . .............. 51 Conversion Between CXFS and FailSafe . .............. 51 Network Interfaces . ................... 51 HA IP Address Configuration ................... 52 007–3901–009 ix Contents Planning Network Interface and HA IP Address Configuration ........ 52 Determining if Re-MACing is Required . .............. 53 Example HA IP Address Configuration . .............. 54 Local Failover of HA IP Addresses . .............. 55 Using FailSafe with SGI Gigabit Ethernet Interfaces . ......... 55 3. Best Practices . ..................... 57 Planning and Installing a FailSafe Cluster . .............. 57 How Do You Want to Use FailSafe? . .............. 57 Hardware Requirements . ................... 59 Software Installation . ................... 61 FailSafe Plugins . ................... 61 Upgrades ........................ 62 Customer Education . ................... 63 Knowing the Tools ....................... 63 Physical Storage Tools . ................... 63 Cluster Configuration Tools ................... 64 Cluster Control Tools . ................... 64 Networking Tools . ................... 64 Cluster/Node Status Tools . ................... 64 Performance Monitoring Tools .................. 65 Log Files . ........................ 65 FailSafe Diagnostic Commands .................. 66 Configuration ........................ 66 System File Configuration . ................... 67 Cluster Database Configuration .................. 69 Using the Administration Tools . .............. 70 Determining the Number of Clusters . .............. 71 x 007–3901–009 FailSafeTM Administrator’s Guide for SGI® InfiniteStorage Node Names ....................... 71 Issues with a Two-Node Cluster . .............. 71 Determining Which Nodes Perform Resets .............. 72 Network Interface and Hostnames vs