Front cover

Achieving the Highest Levels of Parallel Sysplex Availability

The latest availability recommendations

Covers hardware, systems management, and software

One-stop shopping for availability information

Frank Kyne
Christian Matthys
Uno Bengtsson
Andy Clifton
Steve Cox
Gary Hines
Glenn McGeoch
Dougie Lawson
Geoff Nicholls
David Raften

ibm.com/redbooks

International Technical Support Organization

Achieving the Highest Levels of Parallel Sysplex Availability

December 2004

SG24-6061-00

Note: Before using this information and the product it supports, read the information in “Notices” on page ix.

First Edition (December 2004)

This edition applies to IBM Parallel Sysplex technology used with operating systems z/OS (program number 5694-A01) or OS/390 (program number 5647-A01).

© Copyright International Business Machines Corporation 2004. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Notices ...... ix
Trademarks ...... x

Preface ...... xi
The team that wrote this redbook ...... xi
Become a published author ...... xiii
Comments welcome ...... xiv

Chapter 1. Introduction ...... 1
1.1 Why availability is important to you ...... 2
1.2 Cost of an outage ...... 2
1.2.1 Component outage versus service outage ...... 4
1.2.2 Availability overview ...... 4
1.3 Continuous availability in a Parallel Sysplex ...... 5
1.3.1 Availability definitions ...... 5
1.3.2 Spectrum of availability factors ...... 6
1.4 What this book is all about ...... 8

Chapter 2. Hardware ...... 11
2.1 Environmental ...... 12
2.1.1 Power ...... 12
2.1.2 Cooling ...... 13
2.1.3 Geographic location ...... 13
2.1.4 Physical security ...... 14
2.1.5 Automation ...... 14
2.1.6 Physical configuration control ...... 14
2.2 Central Processing Complexes (CPCs) ...... 14
2.2.1 How many CPCs to have ...... 15
2.2.2 Availability features ...... 15
2.2.3 Concurrent upgrade ...... 17
2.2.4 Redundant capacity ...... 21
2.2.5 Hardware configuration ...... 23
2.3 Coupling Facilities ...... 24
2.3.1 Capacity ...... 25
2.3.2 Failure isolation ...... 27
2.3.3 Recovering from CF failure ...... 31
2.3.4 How many CFs ...... 36
2.3.5 Coupling Facility Control Code Level considerations ...... 37
2.3.6 CF maintenance procedures ...... 38
2.3.7 CF volatility ...... 39
2.3.8 Nondisruptive Coupling Facilities hardware upgrades ...... 40
2.4 9037 Sysplex Timers considerations ...... 41
2.4.1 Sysplex Timer® Models ...... 41
2.4.2 Recovering from loss of all timer signals ...... 43
2.4.3 Maximizing 9037 availability ...... 43
2.4.4 Message time ordering ...... 44
2.5 Intelligent Resource Director ...... 45
2.5.1 An IRD Illustration ...... 46
2.5.2 WLM LPAR CPU Management ...... 47

2.5.3 Dynamic Channel-path Management (DCM) ...... 48
2.5.4 Channel Subsystem I/O Priority Queueing ...... 50
2.6 Switches ...... 51
2.6.1 ESCON Directors ...... 51
2.6.2 FICON Switches ...... 53
2.7 DASD ...... 55
2.7.1 Peer to Peer Remote Copy (PPRC) ...... 57
2.7.2 Extended Remote Copy (XRC) ...... 57
2.8 Geographically Dispersed Parallel Sysplex™ ...... 58
2.8.1 Data consistency ...... 59
2.8.2 The HyperSwap ...... 59
2.9 Other hardware equipment ...... 60
2.9.1 3494 Tape library/VTS ...... 60
2.9.2 Stand-alone tape ...... 63
2.9.3 3174, 2074 ...... 63

Chapter 3. z/OS ...... 65
3.1 Configure software for high availability ...... 66
3.1.1 Couple Data Sets ...... 66
3.1.2 Other important data sets ...... 71
3.1.3 Sysres and master catalog sharing ...... 75
3.2 Consoles ...... 76
3.2.1 Addressing WTO and WTOR buffer shortages ...... 77
3.2.2 EMCS consoles ...... 79
3.2.3 Using the HMC as a console ...... 79
3.2.4 Hardware consoles ...... 80
3.2.5 Console setup recommendations ...... 80
3.3 Coupling Facility management ...... 85
3.3.1 Defining CFs and structures ...... 85
3.3.2 Structure placement ...... 86
3.3.3 Structure rebuild considerations ...... 87
3.3.4 Structure duplexing ...... 87
3.3.5 Structure monitoring ...... 90
3.3.6 Structure recommendations ...... 90
3.4 CF operations ...... 92
3.5 IBM Health Checker for z/OS and Sysplex ...... 92
3.5.1 Health Checker description ...... 93
3.5.2 IBM Health Checker recommendations ...... 96
3.6 z/OS msys for Operations ...... 97
3.6.1 Automated Recovery Actions ...... 97
3.6.2 Sysplex operation ...... 100
3.6.3 z/OS msys for Operations recommendations ...... 103
3.7 Sysplex Failure Management (SFM) ...... 103
3.7.1 Configuring for status update missing conditions ...... 103
3.7.2 Configuring for signaling connectivity failures ...... 104
3.7.3 Configuring for Coupling Facility failures ...... 105
3.7.4 SFM recommendations ...... 105

3.8 Automatic Restart Manager (ARM) ...... 106
3.8.1 Configuring for Automatic Restart Management ...... 107
3.8.2 ARMWRAP - The ARM JCL Wrapper ...... 108
3.8.3 ARM recommendations ...... 108
3.9 System Logger (LOGR) ...... 109
3.9.1 Logstream types ...... 110
3.9.2 CF structure considerations ...... 111
3.9.3 System-Managed CF Structure Duplexing ...... 112
3.9.4 DASD-based staging data set considerations (DASD-only) ...... 112
3.9.5 DASD-based staging data set considerations (Coupling Facility) ...... 113
3.9.6 DASD-based log data set considerations ...... 114
3.9.7 Offload considerations ...... 115
3.9.8 Log data retention ...... 115
3.9.9 GMT considerations ...... 116
3.9.10 System Logger recovery ...... 116
3.9.11 System Logger recommendations ...... 117
3.10 Cross-system Coupling Facility (XCF) ...... 118
3.10.1 XCF systems, groups, and members ...... 119
3.10.2 XCF signaling paths ...... 120
3.10.3 XCF Transport Classes ...... 121
3.10.4 XCF signal path performance problems ...... 124
3.10.5 XCF message buffer length performance problems ...... 125
3.10.6 XCF message buffer space performance problems ...... 127
3.10.7 XCF Coupling Facility performance problems ...... 128
3.10.8 XCF recommendations ...... 128
3.11 GRS ...... 128
3.11.1 GRS start options ...... 130
3.11.2 Dynamic RNLs ...... 130
3.11.3 GRS Ring Availability considerations - Fully connected complex ...... 131
3.11.4 GRS Ring Availability considerations - Mixed complex ...... 131
3.11.5 GRS Star Availability considerations ...... 132
3.11.6 SYNCHRES option ...... 133
3.11.7 Resource Name Lists (RNLs) ...... 134
3.11.8 RNL design ...... 134
3.11.9 GRS monitor (ISGRUNAU) ...... 135
3.11.10 RNL syntax checking ...... 136
3.11.11 GRS recommendations ...... 136
3.12 Tape sharing ...... 137
3.12.1 IEFAUTOS ...... 138
3.12.2 ATS Star ...... 138
3.12.3 Coexistence between Dedicated, IEFAUTOS, and ATS Star ...... 138
3.12.4 Tape-sharing recommendations ...... 139
3.13 JES2 ...... 139
3.13.1 JES2 SPOOL considerations ...... 140
3.13.2 JES2 Checkpoint considerations ...... 140
3.13.3 JES2 Checkpoint access ...... 142
3.13.4 JES2 Checkpoint performance ...... 143
3.13.5 JES2 Checkpoint management ...... 144
3.13.6 JES2 Health Monitor ...... 145
3.13.7 Scheduling environment ...... 146
3.13.8 WLM-managed initiators ...... 147
3.13.9 JESLOG SPIN data sets ...... 147
3.13.10 JES2 recommendations ...... 147
3.14 WLM ...... 148
3.14.1 Service classes ...... 148
3.14.2 WLM recommendations ...... 148
3.15 UNIX System Services ...... 148
3.15.1 Shared HFS ...... 149
3.15.2 Automove ...... 152

3.15.3 zFS ...... 154
3.15.4 BRLM issues ...... 156
3.15.5 UNIX System Services recommendations ...... 157
3.16 RACF ...... 157
3.16.1 RACF sysplex communication ...... 159
3.16.2 RACF non-data sharing mode ...... 159
3.16.3 RACF data sharing mode ...... 159
3.16.4 RACF read-only mode ...... 161
3.16.5 RACF recovery procedures ...... 161
3.16.6 PKI Services ...... 163
3.16.7 RACF recommendations ...... 164
3.17 DFSMShsm ...... 165
3.17.1 Common Recall Queue ...... 166
3.17.2 Hot standby (Secondary Host promotion) ...... 167
3.17.3 Use of record level sharing (RLS) for CDSs ...... 168
3.17.4 DFSMShsm recommendations ...... 168
3.18 Catalog ...... 169
3.18.1 VVDS mode catalog sharing ...... 169
3.18.2 Enhanced catalog sharing (ECS) ...... 170
3.18.3 Catalog integrity ...... 171
3.18.4 Catalog performance ...... 172
3.18.5 Catalog sizing ...... 173
3.18.6 Catalog backup and recovery ...... 174
3.18.7 Catalog security ...... 176
3.18.8 Catalog recommendations ...... 178
3.19 Software maintenance ...... 178
3.19.1 Types of maintenance ...... 178
3.19.2 Classification of maintenance ...... 179
3.19.3 Sources of maintenance ...... 180
3.19.4 Consolidated Service Test (CST) ...... 181
3.19.5 Enhanced Holddata ...... 182
3.19.6 Software maintenance recommendations ...... 183
3.20 Testing the sysplex ...... 183
3.20.1 Test Sysplex ...... 184
3.20.2 Sysplex testing recommendations ...... 184
3.21 Planned outages ...... 184
3.21.1 APPC/MVS configuration ...... 184
3.21.2 APPC/MVS Transaction Scheduler ...... 185
3.21.3 Authorized Program Facility (APF) ...... 185
3.21.4 Diagnostics ...... 186
3.21.5 Dump options ...... 186
3.21.6 Dump Analysis and Elimination (DAE) ...... 186
3.21.7 Console management ...... 187
3.21.8 Console group management ...... 187
3.21.9 Exits ...... 188
3.21.10 Global Resource Serialization (GRS) ...... 188
3.21.11 IODF management ...... 189
3.21.12 IOS ...... 190
3.21.13 LNKLST ...... 190
3.21.14 LOGREC error recording ...... 191
3.21.15 LPALST ...... 191
3.21.16 Message Processing Facility (MPF) ...... 192
3.21.17 MVS Message Service (MMS) ...... 192
3.21.18 Local Page Data Sets ...... 192
3.21.19 Parmlib concatenation ...... 193
3.21.20 Products ...... 193
3.21.21 Program properties table ...... 194
3.21.22 Run-time library services (RTLS) ...... 194
3.21.23 SLIP ...... 195
3.21.24 System Measurement Facility (SMF) ...... 195
3.21.25 Storage Management Subsystem (SMS) ...... 196
3.21.26 Subsystem Names (SSN) ...... 196
3.21.27 System Resources Manager (SRM) ...... 197
3.21.28 Time Sharing Option (TSO) ...... 197
3.21.29 UNIX System Services (USS) ...... 198
3.21.30 XCF ...... 198
3.21.31 Planned outages recommendations ...... 199
3.22 Unplanned outages ...... 199
3.22.1 Dump options ...... 199
3.22.2 ABEND dumps ...... 200
3.22.3 SVC dumps ...... 201
3.22.4 Stand-alone dump ...... 203
3.22.5 Dump suppression ...... 203
3.22.6 SLIP traps ...... 204
3.22.7 System Hardcopy log ...... 204
3.22.8 Environmental Record Editing and Printing (EREP) ...... 206
3.22.9 Unplanned outages recommendations ...... 208

Chapter 4. Systems Management ...... 209
4.1 Overall Availability Management processes ...... 211
4.1.1 Develop availability practices ...... 213
4.1.2 Develop standards ...... 213
4.2 Service Level Management ...... 214
4.2.1 Business requirements ...... 215
4.2.2 Negotiating objectives ...... 215
4.2.3 Documenting agreements - Managing expectations ...... 215
4.2.4 Building infrastructure - Technical and support ...... 216
4.2.5 Measuring availability ...... 216
4.2.6 Track and report availability ...... 217
4.2.7 Customer satisfaction ...... 217
4.3 Change Management ...... 218
4.3.1 Develop and prepare change ...... 219
4.3.2 Assess and minimize risk ...... 219
4.3.3 Testing ...... 220
4.3.4 Back-out planning ...... 222
4.3.5 Verify change readiness ...... 223
4.3.6 Schedule change ...... 223
4.3.7 Communicate change ...... 224
4.3.8 Implement and document ...... 224
4.3.9 Change record content ...... 224

4.3.10 Review quality ...... 226
4.4 Organization ...... 226
4.4.1 Skills ...... 226
4.4.2 Help desk activities ...... 227
4.4.3 Operations ...... 227
4.4.4 Automation ...... 228

4.4.5 Application testing ...... 229
4.4.6 Passive and active monitoring ...... 229
4.5 Recovery Management ...... 232
4.5.1 Terminology ...... 232
4.5.2 Recovery potential initiatives ...... 233
4.5.3 Recovery Management activities ...... 234
4.5.4 Event Management ...... 236
4.5.5 Incident Management ...... 236
4.5.6 Crisis Management ...... 237
4.6 Problem Management ...... 241
4.6.1 Data tracking and reporting ...... 241
4.6.2 Causal analysis ...... 243
4.6.3 Maintenance policies ...... 245
4.7 Performance Management ...... 245
4.8 Capacity planning ...... 245
4.9 Security Management ...... 246
4.9.1 Security policy ...... 246
4.9.2 Physical security ...... 248
4.10 Configuration Management ...... 249
4.10.1 Component Failure Impact Analysis ...... 250
4.11 Enterprise architecture ...... 251
4.11.1 Infrastructure simplification ...... 254
4.11.2 Ideal mainframe implementations ...... 255
4.11.3 Ideal BladeCenter implementation ...... 255
4.11.4 Examples and scenarios demonstrate real infrastructure simplification efforts ...... 256
4.12 IBM IGS High Availability Services ...... 256
4.12.1 More than just technology ...... 256
4.13 Summary ...... 259

Related publications ...... 261
IBM Redbooks ...... 261
Other publications ...... 261
Online resources ...... 262
How to get IBM Redbooks ...... 263
Help from IBM ...... 263

Index ...... 265

Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrates programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and distribute these sample programs in any form without payment to IBM for the purposes of developing, using, marketing, or distributing application programs conforming to IBM's application programming interfaces.

Trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: ibm.com®, Balance®, BladeCenter™, CICS®, DB2®, DFS™, DFSMSdfp™, DFSMSdss™, DFSMShsm™, Enterprise Storage Server®, ESCON®, Extended Services®, FICON®, GDPS®, Geographically Dispersed Parallel Sysplex™, HyperSwap™, IBM®, IBMLink™, IMS™, Magstar®, MQSeries®, MVS™, MVS/ESA™, NetView®, OpenEdition®, OS/390®, Parallel Sysplex®, PR/SM™, Processor Resource/Systems Manager™, RACF®, RAMAC®, Redbooks™, Redbooks (logo)™, Resource Link™, RMF™, S/390®, Sysplex Timer®, System/390®, Tivoli®, TotalStorage®, VM/ESA®, VTAM®, WebSphere®, xSeries®, z/OS®, z/VM®, zSeries®.

The following terms are trademarks of other companies:

Intel, Intel Inside (logos), MMX, and Pentium are trademarks of Intel Corporation in the United States, other countries, or both.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

SET, SET Secure Electronic Transaction, and the SET Logo are trademarks owned by SET Secure Electronic Transaction LLC.

Other company, product, and service names may be trademarks or service marks of others.

Preface

The increasingly competitive global economy is driving the need for higher and higher application availability. Designing and implementing a continuous availability solution to meet business objectives is not an easy task. It can involve considerable effort and expense.

Continuous application availability for zSeries® applications cannot be achieved without the use of Parallel Sysplex®. However, Parallel Sysplex on its own cannot provide a continuous application availability environment. Continuous or near-continuous application availability can only be achieved by properly designing, implementing, and managing the Parallel Sysplex systems environment.

In this document, we discuss how to configure hardware and software, and how to manage systems processes for maximum availability in a Parallel Sysplex environment. We discuss the basic concepts of continuous availability and describe a structured approach to developing and implementing a continuous availability solution.

This document provides a list of items to consider for trying to achieve near-continuous application availability and should be used as a guide when creating a high-availability Parallel Sysplex.

Information is provided as lists of recommendations that will help you configure and manage your IT environment to meet your availability requirements.

This publication is intended to help customers’ systems and operations personnel and IBM systems engineers to plan, implement, and use a Parallel Sysplex in order to get closer to a goal of continuous availability. It is not intended to be a guide to implementing or using Parallel Sysplex as such. It only covers topics related to continuous availability.

The team that wrote this redbook

This IBM Redbook was produced by a team of specialists from around the world working at the International Technical Support Organization, Poughkeepsie Center.

Frank Kyne is a Certified IT Specialist at the International Technical Support Organization, Poughkeepsie Center. He writes and teaches IBM classes worldwide on all areas of Parallel Sysplex. Before joining the ITSO six years ago, Frank worked in IBM Ireland as an MVS™ Systems Programmer.

Christian Matthys spent more than 20 years at IBM as a Systems Engineer, working with large mainframe-oriented customers in France. He spent three years as an ITSO project leader on assignment in Poughkeepsie, NY, writing extensively about performance and capacity topics in the S/390® environment. He then joined EMEA e-business marketing to promote Web technologies on the mainframe. In 2000 he joined the EMEA Design Center for On Demand Business, based in Montpellier, France, working on customer projects that make use of leading-edge technologies. He is a certified Consulting IBM IT Specialist.

Uno Bengtsson is a Senior IT Specialist with over 25 years of experience in the computing field. He worked as a z/OS® systems programmer for 20 years at one of the larger banks in Sweden, and joined IBM in 1999, where he started working for IGS. During that time he participated in several Parallel Sysplex projects, performing both implementations and Parallel Sysplex reviews.

© Copyright IBM Corp. 2004. All rights reserved. xi Andy Clifton has worked for IBM since 1987, having graduated from The University of Bath, England with a degree in Mathematics and Computing. Throughout this time, Andy has worked as a CICS tester on many releases of CICS, including products such as CICS/VM, CICS/VSE, CICS/ESA, and CICS Transaction Server. He has been involved in several IBM cross-lab projects working in Toronto, Poughkeepsie, and San Jose as a CICS expert. At

Hursley, Andy works on testing various areas of CICS including new function, and acts as an informal mentor and sounding board for new and experienced testers as well as receiving requests for assistance from a wide variety of IBM colleagues. He also specializes in automation of testing and development of test environments/workloads (a role that has resulted in patent filings for approaches to result validation).

Steve Cox is an Advisory IT Specialist in the IBM Software Group organization. He is based in Warwick, England, and works in zSeries Software Technical Sales. He previously spent four years in WBI Software Services as a WebSphere® MQ consultant. He has 15 years of computing experience across numerous platforms and disciplines including extensive work with CICS®. He also holds a degree in Computer Science from Wolverhampton University.

Gary Hines is an OS/390® Technical Consultant with IBM Australia. He has 20 years of experience in the IBM mainframe operating systems area, with particular focus on sysplex, product installation and maintenance, and security.

Dougie Lawson is a Senior Software Specialist in IMS™ with IBM Global Services in the United Kingdom where he is a member of the IGS TSRO Front Office team. He has 24 years of experience in the IT field and 23 years working with IMS. His areas of expertise include IMS, DB2®, Linux, and z/OS. Before he joined IBM in 1994 he was working as a systems programmer for a large United Kingdom bank, responsible for IMS systems and IMS-related projects.

Glenn McGeoch is a Senior DB2 Consultant for IBM's DB2 for z/OS Services organization in the United States, working out of San Francisco, CA. He has 26 years of experience in the software industry, with 18 years of experience working with DB2 for z/OS and OS/390. He holds a degree in Business Administration from the University of Massachusetts and an MBA from Rensselaer Polytechnic Institute. Glenn worked for 19 years as an IBM customer with a focus on CICS and DB2 application development, and has spent the last seven years with IBM assisting DB2 customers. His areas of expertise include application design and performance, DB2 data sharing, and DB2 migration planning. He has given presentations on data sharing at regional DB2 User Groups and he has presented to customers on DB2 stored procedures, migration planning, and application programming topics.

Geoff Nicholls is a Certified Senior IT Specialist, working in the IBM IMS Advocate team for the Silicon Valley Laboratory, based in Melbourne, Australia. Geoff has 15 years of experience supporting IMS customers in Australia, across Asia, and throughout the world. Previously, Geoff was an IMS specialist with the International Technical Support Organization, San Jose Center. Geoff has a Science degree from the University of Melbourne, Australia, majoring in Computer Science. He worked as an application programmer and database administrator for several insurance companies before specializing in database and transaction management systems with Unisys and IBM.

David Raften has been with IBM since 1982. Currently on the GDPS® team in Poughkeepsie, NY, David has worked with Parallel Sysplex since its inception and has helped many customers design their systems for availability. He is a member of the GDPS and Parallel Sysplex Product Development Teams. David has written several Parallel Sysplex white papers and helped write and review many Redbooks™ on Parallel Sysplex and Performance.

Tom Russell is a Consulting Systems Engineer in Canada. He has more than 30 years of experience at IBM, supporting MVS and OS/390. He has produced ITSO Redbooks on Parallel Sysplex implementation and performance, continuous availability, Oracle, and MVS. His areas of expertise include online systems design, continuous availability, hardware and software performance, and Parallel Sysplex implementation. Tom holds a degree in Mechanical Engineering from the University of Waterloo.

Thanks to the following people for their contributions to this project:

Julie Czubik International Technical Support Organization, Poughkeepsie Center

Paolo Bruni International Technical Support Organization, Almaden Center

Bart Steegmans International Technical Support Organization, Almaden Center

James Caffrey IBM USA

Angelo Corridori IBM USA

Scott Fagen IBM USA

Tony Giaccone IBM USA

Dick H Jorna IBM Netherlands

Jeff Nesbitt IBM Australia

Iain Neville IBM UK

Ewerson Palacio IBM Brazil

Frank Rodegeb IBM USA

Henrik Thorsen IBM Denmark

Kenneth Trowell IBM USA

Become a published author

Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners, and/or customers.

Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability.

Find out more about the residency program, browse the residency index, and apply online at: ibm.com/redbooks/residencies.html

Comments welcome

Your comments are important to us!

We want our Redbooks to be as helpful as possible. Send us your comments about this or other Redbooks in one of the following ways:
- Use the online Contact us review redbook form found at: ibm.com/redbooks
- Send your comments in an Internet note to: [email protected]
- Mail your comments to: IBM Corporation, International Technical Support Organization, Dept. HYJ, Mail Station P099, 2455 South Road, Poughkeepsie, NY 12601-5400



Chapter 1. Introduction

This chapter introduces the business importance and the concepts of continuous availability with emphasis on Parallel Sysplex. It defines the concepts of continuous availability and discusses the use of Parallel Sysplex to provide seven days a week, twenty-four hours a day availability of an application to the users.

A Parallel Sysplex allows the workload from an application system to be balanced across multiple processor images, each sharing the databases. Processors can be added to the sysplex to increase capacity, or removed from the sysplex for either planned or unplanned reasons, without an outage to the application.

1.1 Why availability is important to you

Availability of IT systems can be viewed much like safety in transportation or other industries. Safety does not just happen; it requires awareness and constant vigilance. Likewise, high availability does not happen by magic or by installing technology alone. Achieving high availability requires choosing the right hardware, configuring it adequately, setting up the right parameters, and managing the systems and their environment with an effective focus on service delivery.

No product, hardware, or software, is error free. The reliability of vendor products varies from product to product, whether those products are processors, DASD, network components, or software. Though software has become more reliable in recent years, the reliability characteristics of many product releases are still very cyclic in nature. Application software is typically focused on providing function and not on availability objectives. All too often, availability is not considered when the design specifications are defined and there is less than full exploitation of availability features. In addition, IT organizations are faced with constant change due to business growth or legal considerations. This change activity introduces instability.

A system is built through an integration of many diverse components that have probably not been fully tested in the production environment in which they are running. Sooner or later it is likely that the system will experience a failure. Plugging a bunch of pieces together and expecting them to run in such a complex environment puts IT and their business users at the mercy of their vendors and application programmers and makes them heavily dependent on luck.

To achieve the highest possible levels of availability, it is necessary to reduce the risk associated with changes and proactively prevent problems rather than wait for problems to cause impact.

1.2 Cost of an outage

Downtime costs tens of thousands to millions of dollars per hour. In the financial sector, security breaches have cost from hundreds of dollars to hundreds of millions of dollars per incident; fully thirty percent were greater than $500,000.¹

To measure the business impact of an outage, two complementary approaches can be considered: The first estimates the cost of missed business opportunities, and the second estimates the cost of lost productivity.
- To determine the value of missed business opportunities for outages, four factors are needed:
  – Number of users affected by the outage
  – Duration of the outage, from the end user's perspective
  – Average number of transactions executed per hour, per user
  – Estimated revenue per transaction

  For example, if an outage to an order entry application impacted a group of sixty telephone order entry clerks for two hours, the cost of missed business opportunities could be calculated as follows:
  a. 60 clerks x 12 transactions/hour x 2 hours = 1440 missed transactions
  b. 1440 missed transactions x $50/transaction = $72,000 missed revenue

1 Source: US Secret Service and CERT Coordination Center/SEI Insider Thread Study: Illicit Cyber Activity in the Banking and Finance Sector

  c. If this outage occurred once per month, then:
     i. 1440 missed transactions x 12 months/year = 17,280 transactions missed/year
     ii. 17,280 missed transactions annually x $50 = $864,000 missed revenue

  The estimated annual impact to the business for this two-hour outage was $864,000. Quantifying the impact of an outage in terms of lost revenue can guide IT in applying resources to the proper business areas.
- Another method of estimating the business cost of an outage is to determine lost productivity. To determine the value of lost productivity for outages, the factors needed are:
  – Number of users affected by the outage
  – Duration of the outage, from the end user's perspective
  – Average cost of an employee, including benefits and burden expenses

  The previous example can be examined from a lost-productivity perspective:
  – Employees in the order entry department have an annual cost of $35,000, which equates to $16.57/hour.
  – 60 clerks x $16.57/hour x 2 hours = $1,988/month.
  – An estimated annual value would be $1,988/month x 12 months/year = $23,856/year.
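Both estimates are simple enough to script. The following Python sketch is illustrative only: the clerk count, transaction rate, revenue per transaction, and hourly cost are the example figures from the text, not data from any real installation.

    def missed_revenue(users, txns_per_hour, outage_hours, revenue_per_txn):
        """Cost of missed business opportunities for a single outage."""
        missed_txns = users * txns_per_hour * outage_hours
        return missed_txns, missed_txns * revenue_per_txn

    def lost_productivity(users, hourly_cost, outage_hours):
        """Cost of lost end-user productivity for a single outage."""
        return users * hourly_cost * outage_hours

    # The worked example: a two-hour outage affecting 60 order entry clerks.
    txns, revenue = missed_revenue(users=60, txns_per_hour=12,
                                   outage_hours=2, revenue_per_txn=50)
    print(f"{txns} missed transactions, ${revenue:,} missed revenue per outage")   # 1440, $72,000
    print(f"${revenue * 12:,} missed revenue per year, at one such outage per month")  # $864,000

    monthly = round(lost_productivity(users=60, hourly_cost=16.57, outage_hours=2))
    print(f"${monthly:,} lost productivity per outage, ${monthly * 12:,} per year")  # $1,988 and $23,856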

Add to this the cost of customer dissatisfaction caused by the outage and possibly having customers move to a competitor.

Figure 1-1 summarizes the cost of an outage calculation.

The cost of an outage is the sum of the following factors:
- Productivity of users impacted: Lost end-user productivity = hourly cost of users affected x hours of disruption
- Lost IT productivity: Lost IT productivity = hourly cost of affected IT staff x hours of lost productivity
- Impact to customer service
- Lost revenue: Lost revenue = lost revenue per hour x hours of outage
- Other business losses incurred: Overtime payments (hourly wages x overtime hours) + wasted goods + financial penalties or fines

Figure 1-1 Cost outage factors
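These factors translate directly into a total-cost function. Below is a minimal sketch of that sum; the parameter names mirror the figure, all inputs are your own estimates, and the impact to customer service is deliberately omitted because, as noted below, it is the most subjective of the costs.

    def outage_cost(users_affected, user_hourly_cost, disruption_hours,
                    it_staff_affected, it_hourly_cost, it_lost_hours,
                    lost_revenue_per_hour, outage_hours,
                    overtime_wages=0.0, overtime_hours=0.0,
                    wasted_goods=0.0, penalties_or_fines=0.0):
        """Total cost of an outage as the sum of the Figure 1-1 factors."""
        lost_end_user_productivity = user_hourly_cost * users_affected * disruption_hours
        lost_it_productivity = it_hourly_cost * it_staff_affected * it_lost_hours
        lost_revenue = lost_revenue_per_hour * outage_hours
        other_losses = overtime_wages * overtime_hours + wasted_goods + penalties_or_fines
        return (lost_end_user_productivity + lost_it_productivity
                + lost_revenue + other_losses)

Plugging in the order entry example for the end-user factor alone (60 users at $16.57/hour for 2 hours) reproduces the $1,988 lost-productivity figure calculated above.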

The numbers here are just simple ways to calculate the cost of outages. The effects of impact to customer service are the most subjective of the costs. This depends a lot on how frequently outages occur and for how long. The more customers are affected by the outage, the more chance there is of them taking their business elsewhere. Putting it another way, the cost per minute of an outage increases with the duration of the outage.

If you do not know what an outage costs, how can you cost-justify measures to improve availability?

How much can be saved by better system management processes? As there is no way to measure the outages that you have not had because you had good practices, the best you can do is look back on the outages you did have and see which of those could have been avoided or shortened with better practices.

1.2.1 Component outage versus service outage

If a component such as a channel is broken, IT cares because they must fix it, but the end users should neither know nor care. On the other hand, if service is unavailable, everyone cares.

One of the strengths of Parallel Sysplex technology is that every server in a Parallel Sysplex cluster has access to all data resources and every cloned application can run on every server. Using the zSeries Coupling Technology, the Parallel Sysplex provides a shared data clustering technique that enables workloads to be dynamically balanced across all servers in the Parallel Sysplex cluster. In the event of a hardware or software outage, either planned or unplanned, workloads can be dynamically redirected to available servers, thus providing near-continuous application availability for the application.

Availability needs to be perceived through the eyes of the end user. Sometimes the host z/OS is up, yet the user cannot access the data due to network problems. Sometimes the application is down on one server, yet due to the benefits of a Parallel Sysplex, the user can still access the data. In a more extreme situation, the entire Parallel Sysplex may be unavailable, but due to pre-planning (via a front-end processor or a D/R fail-over) the customer can still do what he wants to. If the customer is serviced, there is no perceived outage.

1.2.2 Availability overview

Taking care of availability is much more than just restoring service following an outage.

Planning actions are activities to minimize the chance that an incident occurs in the first place and, if one does occur, to make sure the environment and procedures are already in place to minimize its impact. This is done by setting up the environment to eliminate single points of failure whenever possible, by code design and testing, by having predefined procedures, and by training and preparation. Management activities are presented in Chapter 4, “Systems Management” on page 209.

Recovery actions start when the incident first takes place and continue until service is restored. Although IT has historically concentrated its attention on this phase, in reality it is only a small part of the total availability plan. Corrective actions deal with fixing the trigger of the outage, while preventive actions attack the root cause of the outage.

Figure 1-2 on page 5 illustrates the cycle of an outage situation.


[Figure: The problem cycle of an outage, in four phases: planning actions (proactive; measured in months and years), recovery actions from recognition and notification through restoration of service (reactive; measured in minutes and hours), corrective actions (reactive; measured in days and weeks), and preventive actions such as root cause analysis and trend reporting (proactive; measured in weeks and months). The phases span recovery management, event management, incident and crisis management, fault management, post-incident review, and change management.]

Figure 1-2 The problem cycle of an outage

1.3 Continuous availability in a Parallel Sysplex

A major design goal for the System/390® Parallel Sysplex is to provide a continuously available environment with scalable parallel servers, building on the robustness of MVS and System/390. This book is about continuous availability for the Parallel Sysplex environment. Hopefully you will see the value that Parallel Sysplex brings to availability.

1.3.1 Availability definitions

Availability may have different definitions, depending on the context and the role of the people involved; therefore, we must define here what we want to address in this book. For example, all too often fault tolerance is still erroneously used as a synonym for high availability, and it is still common to encounter reliability and high availability being used interchangeably. This confusion can cause misinterpretations, and we think that straightforward definitions and a clear classification are necessary to avoid any misunderstanding.

High availability (HA)

This is the characteristic of a system that delivers an acceptable or agreed-upon level of service during scheduled periods. The availability level can be stated in a service level agreement between the end users' representative and the service provider; this is the agreed level. However, when the service is delivered to the general public, the availability level must be "acceptable", since there is no means of establishing a service agreement. Many installations consider an availability of 99.7 percent as an accepted minimum level for high-availability systems. The fact that the service is available only during scheduled periods implies that there is time available to apply changes and maintenance outside these periods.
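To make such percentages concrete, it helps to convert them into hours of downtime over the service period. The sketch below assumes, purely for illustration, a service period of 24 hours a day, 365 days a year; for an HA system the relevant period would be the scheduled hours only.

    def downtime_hours(availability_pct, service_hours_per_year=365 * 24):
        """Hours of downtime per year implied by an availability percentage."""
        return service_hours_per_year * (1 - availability_pct / 100)

    print(downtime_hours(99.7))  # about 26.3 hours of downtime per year
    print(downtime_hours(99.9))  # about 8.8 hours of downtime per year

At 99.7 percent over a 24x365 period, roughly 26 hours of outage per year is still consistent with the "high availability" label.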

Continuous operation (CO)

This is the characteristic of a system that operates 24 hours a day, 365 days a year, with no scheduled outages. This does not imply that the system is highly available. An application can run 24 hours a day, 7 days a week, and be available only 95 percent of the time because of unscheduled outages.

Continuous availability (CA)

This is HA 24 hours a day, 365 days a year. CA combines the characteristics of continuous operation and high availability to mask or eliminate all planned (scheduled) and unplanned (unscheduled) outages from the end user. Availability must be measured at the level of the end user, who is concerned only with getting his or her work done. In order for that to happen, the applications must be operational and available to the end user, including the hardware, software, network resources, and data required to run the applications.

It is immediately apparent that CA requires additional system capacity. This allows the workload requiring CA to be either executed in parallel on two or more systems, or switched nondisruptively from a primary to an alternate system. When it is necessary to apply service or maintenance, one of the systems (or the primary system, after switching its workload to the alternate) can be shut down without service disruption.

However, it is worth noting that many system components, including zSeries processors and z/OS, have CA attributes allowing changes and maintenance to be applied even while a system continues to run. Such CA attributes remove the need to switch the workload to an alternate system to perform changes and maintenance on the primary system.

It is important to note that availability, as defined here, is both a service and a system characteristic, and a component of service quality assessed by the end user. For this to be meaningful, availability must be measured at the level of the end user.

1.3.2 Spectrum of availability factors

Figure 1-3 on page 7 illustrates the different definitions of availability and the factors that influence them.


[Figure: The spectrum of availability factors (systems management, data, people, networks, hardware, environmentals, and software), with high availability and continuous operation combining to give continuous availability.]

Figure 1-3 Continuous availability factors

There are a number of factors that can affect the continuous availability of a system to the end user. Outages occur in either a planned or an unplanned manner:
- Planned outages are situations where the installation decides to interrupt the service to the end user by stopping components in order to make changes or updates to them. Examples of planned outages are anticipated application shutdowns, hardware or software maintenance, or database reorganization.
- Unplanned outages are the result of a failure of one or more components that causes the service to the end user to be interrupted for a period of time. Examples of unplanned outages are problems with the system hardware or software, an environmental problem such as a power failure, a failure in the TP network, a server data loss or data integrity problem, or an error caused by an operator or by a systems management procedure that was not invoked properly.

In order to achieve continuous availability, both planned and unplanned outages must be masked or eliminated from the point of view of the end user. With each type of outage, there is a price to pay. It comes in terms of how long and how often the service is interrupted, and how many users or critical applications are affected. For example, taking a system down for several hours in the middle of Sunday night may not be nearly as disruptive as five minutes in the middle of Monday morning.

A computer system is made up of many components, most of which must remain available in order to ensure the availability of the system. Each has a role to play. But these components are interrelated, so strengthening the availability characteristics of one will often improve another.

Eliminating unplanned outages, or achieving high availability, requires consideration of things like fault-tolerant hardware, resilient software, duplicated data, and alternate links to the system and to data. These are all fault tolerance techniques that mask failures, but they are not enough to achieve continuous availability. In addition, systems management techniques must be employed to provide policy-based automation to:
- Prevent errors by detecting conditions that require intervention.
- Automate recovery procedures to cut down on recovery time.
- Eliminate opportunities for operator error.

S/390 hardware and software running as a Parallel Sysplex offers you a unique continuous availability platform, handling both fault tolerance and system management issues.

Software comments

System failures often occur at the interfaces between components. Because software contains many interfaces, it is a major contributor to unplanned outages. This is true for system software as well as for application software. If applications are inadequately tested or poorly designed, many disruptions will occur. Availability should be one of the design factors from the early planning of a new application.

This is an area where IBM has invested heavily. For example, using a combination of hardware and software, we have created the ability to isolate the CICS system code and CICS applications from other CICS applications, and thus to prevent an inadvertent overlay of the code. As a result, most programming errors from storage overlays are found in test instead of production.

The number of planned outages can also be decreased by good planning, and IBM has invested heavily in this area as well. A number of new features help to eliminate system IPLs, for example:
- Dynamic I/O reconfiguration management allows addition, deletion, or changes of system devices without an IPL.
- Dynamic APF list and dynamic exits in MVS/ESA™ allow changes to software without an IPL.

In Chapter 3, “z/OS” on page 65, we discuss how the Parallel Sysplex allows for system maintenance without an outage to the end user.

1.4 What this book is all about

We have already seen that a number of activities must be combined to achieve the highest levels of availability, as illustrated in Figure 1-4 on page 9 and listed below.


[Figure: The four dimensions that must be addressed together: People, Technology, Application and Data, and Process.]

Figure 1-4 The different design and technique processes to approach the highest availability

The activities that must be used to achieve the highest levels of availability are:
- Quality of technology:
  – Reliable components
  – Redundancy in systems and network
  – Capacity to support failures
  – Exploitation of product availability features
  – Resiliency and fast initialization
  – Isolation of function
  – Controlled diversity
  – Security controls
  – Automation
  – End-to-end monitoring tools
- Robust applications and data:
  – Applications designed for high availability
  – Redundant data and applications
  – Minimal defects and fast recovery
- Skilled people:
  – Competence and experience
  – Awareness and forethought
  – Adequate depth of resources
  – Proactive focus
  – Management support
- Effective processes:
  – Service level objectives aligned with business
  – High availability strategy and architecture
  – Predefined recovery and incident procedures
  – Robust testing
  – Documented configuration and procedures
  – Effective incident management
  – Proactive problem prevention
  – Minimal risk associated with change

In this book we mainly address the technology aspects (hardware and related software) and the systems management aspects of achieving the highest availability.

This book aims to provide a checklist of best configuration practices and recommendations that will allow you to move toward continuous application availability in a Parallel Sysplex configuration. It is not a design tutorial. This checklist can be used to structure the assessment of an installation, pointing out areas where single points of failure can be reduced or eliminated. Items in the checklist are identified in a list format that looks like the following:

- Point to be checked when considered.

Most of this document is concerned with high availability. That is, it is focused on the proper use of Parallel Sysplex to remove single points of failure. The term high availability is defined in 1.3.1, “Availability definitions” on page 5.

As technologies and management processes progress every day, this list of recommendations is a work in progress, and is therefore incomplete. The information will need to be continuously updated as technologies and management tools change. In particular, we focus in this book on the major availability components: Hardware, the operating system, and systems management processes. Other components, such as middleware and applications, play an important role in high availability, but they are not covered in this book. Hopefully later editions of the book or other technical papers will include more of these components.

Please use the Reader’s Comment Form at the end of the book to provide us with feedback and any comments you have. We look forward to hearing from you.



Chapter 2. Hardware

In this chapter we discuss the hardware aspects of delivering a high-availability service, focusing on a Parallel Sysplex environment. Although hardware has become more reliable over the years, it is still vital to the availability of your applications that the underlying hardware is configured to enable you to deliver the required levels of service. It is important to remember that performance is an aspect of availability, so in this chapter we touch on capacity as well as availability topics.

Specifically, we discuss the following:
- Environmental
- CPCs
- Coupling Facilities
- Switches
- DASD and other devices

2.1 Environmental

One of the most basic requirements for continuity of service is that the computing hardware is operating in a safe and secure environment, within the environmental parameters specified by the manufacturer.

If you wish to attain the highest levels of availability for your hardware, you must start with sound fundamentals; in other words, you must provide a secure environment for the hardware to operate in without interruption.

2.1.1 Power

There are three basic requirements for your power infrastructure:
- The normal main power supply should be as reliable and robust as possible, with a minimal number of outages.
- The infrastructure you provide must be able to keep your equipment up and running over any power interruption.
- It must also be able to keep all your essential equipment running until main power is available again.

In order to satisfy these requirements, the following has to be carefully considered:

- Ensure that critical components have redundant power connections and that these components are actually connected to redundant power sources.
  – In order to minimize the risk of a complete loss of external power, you should avoid any single points of failure in the delivery of that power. For example, power should be delivered via at least two diverse routes, entering the building at different locations, feeding two power distribution panels, and so on. Each supply must be capable of delivering enough power to maintain the whole installation on its own. Some installations even go as far as sourcing their power from two different power supply companies to protect themselves from an event that would impact one of the vendors.
  – Some critical equipment (Sysplex Timer, CPCs, CFs, ESS, and so on) comes with two power supplies and two power cords. Make sure that for those devices, each power cord is connected back to a different circuit board, different external power supply, and so on.
- Provide an uninterruptible or redundant power supply for critical components.
  – The second requirement can only be met by an Uninterruptible Power Supply (UPS). You must ensure that the UPS has sufficient capacity for your configuration. You should also monitor the batteries, because as UPSs age, the battery life decreases. You must be sure that the UPS can keep all your equipment running over the time it takes for your generator to start up and take over the load.
  – The last requirement can only be met by a generator or set of generators. Even if you go to the trouble of sourcing your external power from more than one power company, there are situations where this can prove to be insufficient. For example, the power failure that impacted the whole northeast of the United States in the summer of 2003 is an event where even having two separate power providers would have been no help.
  – When looking at power supplies, you should consider which devices will be provided with power from your own source. The obvious items are the computer room equipment. However, do not forget other critical equipment that you cannot function without, for example:
    • The cooling equipment for all machine room areas must continue to operate.
    • Cordless phones will not function without power, and most computer centers and help desks have telephony equipment that will not function without power.
    • The security system. The system that controls badge access to the computer area must always be available, which means that the server running the system, and probably the badge readers themselves, require power.
    • You will need at least reduced lighting in order to be able to operate effectively.
    • At least some of the normal wall sockets should be connected to the generator(s).
  – While we discuss it again in 2.3.7, “CF volatility” on page 39, it is worth mentioning here that we recommend that Coupling Facilities always be ordered with internal battery backup features. These will not make any difference if just one CF loses power; however, if both CFs were to lose power, the presence of the internal battery will significantly shorten recovery times. Note that the Internal Battery Feature (IBF) can be ordered as an MES on 9672 and z990 processors; however, it can only be ordered as a feature on new 2064-100s. If a 2064-100 is ordered without the IBF, that feature cannot be added at a later time. The z800 does not support the IBF. Information about the expected “hold up” time for each model can be found in the Installation Manual - Physical Planning for the appropriate range of CPCs.
- Establish procedures to test failover to the redundant power source on a regular basis.
  – For example, you should test your UPS regularly to make sure it cuts in as expected. In particular, any time you make changes to the power configuration or install significant new equipment, make sure the UPS still behaves as you expect.
  – Just as with the UPS, you should test the generator regularly to ensure that it behaves as planned. And do not forget to ensure that you always have a sufficient supply of fuel for the generators.
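The recommendation to verify that dual power cords really are fed from different sources lends itself to a simple configuration check. The sketch below assumes a hypothetical inventory that records, for each critical device, the distribution panel feeding each power cord; the device and panel names are invented for illustration.

    # Hypothetical inventory: device -> distribution panel feeding each power cord.
    power_feeds = {
        "CPC1": ("PANEL-A", "PANEL-B"),
        "CPC2": ("PANEL-A", "PANEL-B"),
        "CF1": ("PANEL-A", "PANEL-B"),
        "Sysplex Timer 1": ("PANEL-A", "PANEL-A"),  # misconfigured: both cords on one panel
    }

    for device, panels in power_feeds.items():
        if len(set(panels)) < 2:
            print(f"WARNING: {device} has no power redundancy (all cords on {panels[0]})")

A check like this, run whenever the power configuration changes, catches exactly the situation warned about above: power cords that are redundant on paper only.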

2.1.2 Cooling

You have to provide sufficient cooling capacity and redundancy to ensure that environmental conditions can be maintained in case of an equipment failure. Even though modern mainframe equipment no longer requires water cooling, a temperature-controlled environment is still required.

- Therefore, make sure that the cooling capacity is sufficient, and install a redundant cooling unit that can take over in the event of equipment failure.
- Establish procedures to ensure speedy takeover in the event of an environmental failure.

2.1.3 Geographic location

If you are hoping to deliver a high-availability service, there is little point in locating the equipment somewhere that regularly suffers from power failures, severe storms, flooding, fires, or any of the many other natural disasters that some locations are prone to. You should also give consideration to neighboring establishments. For example, a chemical leakage at an adjoining plant can mean that physical access to your installation is denied, even if there is nothing actually wrong with your site itself.

At a minimum, you should ensure that all your equipment can be operated, monitored, and controlled from a remote location that is at least distant enough that access will not also be denied to that facility in case of a local disaster.

2.1.4 Physical security

There has always been a need for strict security at computer sites because of the value and sensitivity of the equipment contained therein. However, in the modern world, there is the additional risk of terrorism.

It is extremely expensive, perhaps even impossible, to protect a computer installation from all potential forms of attack. Certainly one of the ways of improving security is through anonymity and protection of confidential information; many companies will no longer divulge the location of their computer facilities. The next best line of defense is to use remote copy technology and GDPS to mirror all data to a remote site that can take over operations at very short notice in the event of a disaster impacting one of the sites. So, while it may not be possible to have a 100 percent secure primary site, the chances of losing two adequately equipped and protected sites should be low enough to constitute an acceptable business risk.

2.1.5 Automation
Nearly all hardware is designed so that some form of message or alert is issued in case of a problem or failure. However, there is no point issuing these alerts unless they reach the person responsible for that piece of equipment. It is not unknown for a redundant piece of hardware (a Sysplex Timer, for example) to be broken for months without anyone noticing. Therefore, automation should be put in place to trap and forward such messages and alerts to the person responsible for that item of hardware, and also to at least one other individual (to avoid single points of failure).
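One building block for this is the z/OS message processing facility. The following MPFLSTxx parmlib sketch marks hardware-related messages for automation; the entries shown are examples only, and you would substitute the message IDs produced by your own equipment and route them through your automation product:

/* MPFLSTxx sketch: Mark hardware alert messages for automation. */
/* The message IDs are illustrative; use the IDs issued by your  */
/* own equipment. AUTO(YES) makes the message eligible for the   */
/* installation's automation product to trap and forward.        */
.DEFAULT,SUP(NO),RETAIN(NO),AUTO(NO)
IXC518I,SUP(NO),RETAIN(YES),AUTO(YES)
IXL158I,SUP(NO),RETAIN(YES),AUTO(YES)

The automation product would then forward the trapped message to the responsible person and to at least one backup individual.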

2.1.6 Physical configuration control
Lift the false floor in most computer rooms, and you are faced with a snake pit of tangled cables of various types. To ensure that devices are actually connected in line with your plans, and most especially, to ensure that devices and cables can be easily identified in case of a problem, it is vital that all equipment and cables are clearly labelled. Ideally, an integrated cable management system will be employed. An example is HCM, which lets you manage not only your hardware devices, but also the cables used to connect them, cable cabinets, and so on.

2.2 Central Processing Complexes (CPCs)

There are many different terms used to describe the same thing: The piece of hardware that the z/OS operating system runs on. For consistency, we use the term CPC throughout this chapter; other frequently used terms are server, CPU, processor, CEC, box, footprint, and mainframe. We also distinguish between processors that can only run Coupling Facility Control Code (which we call stand-alone CFs) and those that can run z/OS and other operating systems, as well as CFCC.

The configuration in Figure 2-1 on page 15 shows the basics of high availability. We start off by discussing the CPCs in this section, and then proceed through the rest of the configuration.


Figure 2-1 Basics of a high-availability configuration (diagram: two CPCs, each with z/OS images running VTAM/TCP, CICS TORs, DB2, and IMS; two CFs; duplicated Sysplex Timers, consoles, and network connections; channel extenders and DWDM for remote connectivity)

2.2.1 How many CPCs to have
In order to avoid a single point of failure, you should always configure a minimum of two CPCs, even if a single CPC could provide sufficient processing power to handle the whole workload. Even though the reliability of CPCs is increasing all the time, there are still planned changes that require a shutdown or Power-On-Reset of a whole CPC, so having at least two CPCs allows you to maintain application availability at all times. For availability reasons, the CPCs should be placed in different locations or, if in the same location, at least separated from each other by a fireproof/flood-proof wall. When placing CPCs in different locations, it is important to be aware of the distance limitations of CF links, ESCON® channels, Sysplex Timers, and so on. Also be aware of the performance impact the distance can have on your applications. Each CPC should have at least one, and preferably two, LPARs that participate in the production sysplex. By doing this, you can lose one CPC and still keep your applications available. By having two production LPARs on each CPC, you still have access to the MIPS of that CPC even if you need to shut down one of those LPARs.

2.2.2 Availability features
Not all CPCs have the same availability features. Each new generation of CPC usually provides higher inherent availability, as well as new features and functions that can improve availability.

In addition, with the constantly growing functionality of z/OS and the ability to run Linux on zSeries CPCs, the applications that can run on zSeries extend beyond the traditional CICS/IMS/DB2 ones. As a result, when deciding where best to place a given application, you may actually be comparing the availability features of a zSeries CPC against a different architecture box, for example, an Intel® or UNIX system. For this reason, we have provided the following list of availability features on current generation zSeries CPCs, some of which are built in, and others of which are optional or require some effort to exploit:
• Independent dual power feeds.
• N+1 power supplies - A redundant power supply, able to take over in case of a failure.
• N+1 cooling.
• Internal Battery Feature - A backup power supply, packaged in the machine, that can be used to keep the CPC running in case of a power failure. This feature can be used in addition to a UPS.
• Concurrent maintenance and repair of selected components (such as fans, power supplies, and I/O cards) within the CPC. This also covers the ability to upgrade the CPC microcode concurrently with normal operations.
• Dynamic I/O reconfiguration management - Enhances system availability by supporting the dynamic addition, removal, or modification of channels, control units, I/O devices, and I/O configuration definitions to both hardware and software without requiring a planned outage.
• Nondisruptive replacement of I/O - Licensed Internal Code (LIC) enables z900 servers to remove traditional I/O cards (Parallel, ESCON, and OSA-2) and replace them with higher bandwidth I/O cards (FICON® and OSA-Express) in a nondisruptive manner. An Initial Machine Load (IML) or re-Initial Program Load (IPL) is not required when replacing Parallel or ESCON channels. Installations at or near the 256-CHPID limit within this type of server will find this capability a valuable enabler to maximize their configurations when adding higher bandwidth connections.
• The ability to share channels between multiple LPARs, known as Multiple Image Facility (MIF). Note that channels cannot be shared between partitions residing in different Logical Channel Subsystems (LCSSs).
• Console Integration Feature, which allows you to continue processing even if all channel-attached consoles are lost.
• Transparent CP/SAP/ICF sparing - Sparing is transparent in all modes of operation and requires no operator or operating system intervention to invoke a spare CP. It is effective on all models, including uniprocessor models. With transparent sparing, the application that was running on the failed CP is preserved and continues processing on a new CP with no customer intervention required.
• Dual cryptographic coprocessors - There are two Cryptographic Coprocessor Elements; if one fails, the operating system automatically reschedules and dispatches the failed instruction on the other CMOS Cryptographic Coprocessor Element (on 9672 and 2064 models).
• Concurrent CPC upgrades:
– Installed but unused memory can be activated (see “Capacity Back Up” on page 20 for more information).
– Installed but unused ESCON and ISC3 ports can be activated.
– Installed spare PUs can be activated (see “Concurrent upgrade” on page 17 for more information).
– Spare slots in existing I/O cages can be populated with new I/O cards.
– Capacity Upgrade on Demand can be used to enable these upgrades concurrently (see “Definitions” on page 18 for more information).
– Capacity BackUp provides the ability to quickly and nondisruptively bring additional CPs online on a temporary basis, to assist in recovery from a disaster (see “Capacity Back Up” on page 20 for more information).
– Customer Initiated Upgrade allows a customer to upgrade their CPC without the involvement of an IBM service representative (see “Definitions” on page 18 for more information).
– On/Off Capacity Upgrade on Demand allows spare PUs to be turned on and then off again to meet peak processing requirements.
• Plan Ahead for nondisruptive I/O upgrades provides for the installation of spare I/O cages on a new CPC, allowing later concurrent installation of new I/O cards. Plan Ahead can also be used to enable nondisruptive memory upgrades, up to the maximum amount of memory supported on each model.
• Support for CICS Subsystem Storage Protection.
• Support for the CICS Subspace Group Facility.
• Ability to run multiple independent operating systems on a CPC, known as Logical Partitions (LPARs).
• LPAR dynamic storage reconfiguration - Provides the ability to nondisruptively move CPC storage from one LPAR to another. This facility has been enhanced to remove the restriction that storage could only be reconfigured between adjacent LPARs.
• Automatic Reconfiguration Facility (ARF) - A facility that z/OS can invoke that automatically resets an LPAR, moves storage to that LPAR from a donor LPAR, and deactivates the donor LPAR. This would typically be used with hot standby or backup LPARs. If ARF is being used, the I/O Interface Reset facility should not be enabled in the PR/SM Reset profile.
• Automatic I/O Interface Reset Facility - Allows an operating system to reset its I/O configuration prior to entering a disabled wait, minimizing the risk of shared DASD causing a hang within a sysplex. This facility should not be used if ARF is being used.
• System Isolation Function, which is exploited by Sysplex Failure Management (SFM).
• Ability to move to a new CFCC level without requiring a Power-On-Reset.
• Ability to add an LPAR nondisruptively.

Note: Be aware that the above information may differ between different CPCs. To find the appropriate information for your CPC, read the Reference Guide for the associated CPC. There are also IBM Redbooks that discuss the subject for the z900 and z990: Refer to IBM eServer zSeries 900 Technical Guide, SG24-5975, and IBM eServer zSeries 990 Technical Guide, SG24-6947.
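As an illustration of the dynamic I/O reconfiguration management item in the preceding list, a new I/O definition is typically activated from the z/OS console with the ACTIVATE command. This is a sketch only; the IODF suffix shown is hypothetical:

  D IOS,CONFIG
  (displays the active IODF and the HSA space still available for dynamic changes)
  ACTIVATE IODF=23,TEST
  (verifies that the new definition could be activated dynamically, without making any changes)
  ACTIVATE IODF=23
  (activates the new hardware and software configuration definitions)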

2.2.3 Concurrent upgrade
Current CPCs have the capability of concurrent upgrades, providing additional capacity with no outage. In most cases, with prior planning and operating system support, a concurrent upgrade can also be nondisruptive to the operating system, meaning that capacity can be added (and in some cases removed) without impacting the running systems. Licensed Internal Code - Configuration Control (LIC-CC) provides for CPC upgrades with no hardware changes, by enabling the activation of additional installed capacity.

• Planned upgrades can be done using the Capacity Upgrade on Demand (CUoD), Customer Initiated Upgrade (CIU), or On/Off Capacity Upgrade on Demand functions. CUoD and CIU enable concurrent and permanent capacity growth of a server.
– CUoD can concurrently add processors, memory, and channels, up to the limit allowed by the existing configuration. CUoD requires IBM service personnel for the upgrade. The CPCs that support CUoD are listed in Table 2-1.
– CIU can concurrently add processors and memory, up to the limit of the installed MCM and memory cards (books). CIU is initiated by the customer via the Web, using IBM Resource Link™, and makes use of CUoD techniques. CIU requires a special contract.
– On/Off Capacity Upgrade on Demand provides the ability to temporarily add and then remove additional PUs. It is intended to be used to add capacity needed to get over peak processing periods. The CPCs that support On/Off Capacity Upgrade on Demand are listed in Table 2-1.
• Unplanned upgrades can be managed using Capacity BackUp (CBU) for emergency or disaster recovery situations. All IBM S/390 and zSeries CPCs since the 9672 G5 support CBU. In addition, some CPCs support concurrent CBU UNDO, allowing you to back off the CBU capacity without a Power-On-Reset.

The CPCs that support these features are listed in Table 2-1.

Table 2-1 Concurrent upgrade support matrix

Feature                              9672 G5/G6   z900   z990
Capacity BackUp                          Y          Y      Y
CBU UNDO                                            Y      Y
CUoD                                     Y          Y      Y
CIU                                                 Y      Y
On/Off Capacity Upgrade on Demand                          Y
Concurrent Memory Upgrade                           Y      Y
Concurrent CP Upgrade                    Y          Y      Y
Concurrent ICF Upgrade                              Y      Y
Concurrent IFL Upgrade                              Y      Y
Concurrent SAP Upgrade                              Y      Y
Concurrent I/O Upgrade                              Y      Y

Notes:
• For detailed information on the capabilities of 9672 G5 and G6 CPCs, refer to S/390 Parallel Enterprise Server and OS/390 Reference Guide, G326-3070.
• For detailed information on the capabilities of z900 CPCs, refer to IBM eServer zSeries 900 and z/OS Reference Guide, G326-3092.
• For detailed information on the capabilities of z990 CPCs, refer to IBM eServer zSeries 990 and z/OS Reference Guide, GM13-0229.

Definitions
The following sections define each of these upgrade capabilities.

Capacity Upgrade on Demand/Customer Initiated Upgrade
Capacity Upgrade on Demand (CUoD) allows for the nondisruptive addition of one or more Central Processors (CPs), Internal Coupling Facilities (ICFs), Integrated Facility for Linux (IFL) processors, and zSeries Application Assist Processors (zAAPs). CUoD can quickly add processors, up to the maximum number of available inactive engines. The CUoD functions are:
• Nondisruptive CP, ICF, IFL, and zAAP upgrades within minutes. Concurrent upgrades via LIC-CC can be done for CPs (which can be used by operating system or CF LPARs), IFLs (which can only be used by Linux, or by z/VM® if it only runs Linux guests), and ICFs (which can only be used by CF LPARs). A concurrent upgrade is only possible if the configuration contains spare PUs. For information about which configurations contain spare PUs, refer to the Reference Guide for your CPC.
• Dynamic upgrade of all I/O cards in the I/O cage. There are two options for nondisruptively adding new I/O capacity:
– The first involves enabling spare ports on already-installed channel cards. This is supported for ESCON and ISC-3 CF links.
– The other involves adding new channel cards to spare slots in an I/O cage. In this case, I/O configuration upgrades can be made concurrent by nondisruptively installing additional channel cards. The concurrent upgrade capability can be better exploited when a future target configuration is considered in the initial configuration. Using this plan-ahead concept, the required infrastructure for concurrent upgrades, up to the target configuration, can be included in the server's initial configuration. This process of adding spare capacity to allow future upgrades is known as Plan Ahead.
• Dynamic upgrade of installed spare memory - This capability allows a processor's memory to be increased without disrupting the processor operation. Whether a given upgrade is concurrent depends on the capacity of the memory cards already installed. A customer with 24 GB of storage, for example, will be able to concurrently upgrade to 32 GB, but will not be able to get to the next increment of 40 GB without a disruption.

Customer Initiated Upgrade (CIU) is the capability to initiate a processor and/or memory upgrade, when spare PUs or installed-but-unused memory are available, via the Web using IBM Resource Link. Customers can then download and apply the upgrade using functions on the Hardware Management Console, via the Remote Support Facility.

A Capacity Upgrade on Demand (CUoD) or Customer Initiated Upgrade (CIU) always results in a permanent capacity upgrade:
• With CUoD, you go through the usual ordering process for getting more storage or processors. Once the order has been processed, IBM service personnel can use the CUoD function to enable the additional capacity nondisruptively.
• CIU provides the capability for the zSeries customer to initiate a zSeries processor and/or memory upgrade (depending on the model) via the Web, using IBM Resource Link. When the order has been processed, the customer can download and automatically apply the upgrade using the Remote Support Facility functions on the hardware.

On/Off Capacity Upgrade on Demand
The On/Off Capacity Upgrade on Demand (On/Off CUoD) for z990 servers is the ability for the z990 user to temporarily turn on spare PUs available within the current model. On/Off CUoD uses the Customer Initiated Upgrade (CIU) process to request the upgrade via the Web, using IBM Resource Link. This capability is mutually exclusive with Capacity BackUp (CBU), because both use the same record type.

The only resources eligible for temporary use are CPs; temporary use of IFLs, ICFs, memory, and I/O ports is not supported. PUs that are currently spare can be temporarily and concurrently activated as CPs via LIC-CC, up to the limits of the physical CPC size. This means that an On/Off CUoD upgrade cannot change the z990 hardware model, as additional book installation is not supported. However, On/Off CUoD does change the server's software model.

Capacity Back Up
Capacity Back Up (CBU) provides reserved emergency backup processor capacity for unplanned situations where customers have lost capacity in another part of their establishment and want to recover by adding the reserved capacity on a designated CPC. CBU cannot be used for peak load management of customer workload. Also, CBU can only add CPs; it cannot be used to add memory or I/O.

The installation of the CBU code provides an alternate configuration that can be activated in the event of an actual emergency. The CBU contract provides for a number of test activations and an agreed maximum period for which the CBU capacity can be used following a real disaster.

When the emergency is over (or the CBU test is complete), the machine must be taken back to its original, permanent configuration. The CBU features can be deactivated by the customer at any time before the expiration date. If CBU is not deactivated before the expiration date, the performance of the system will automatically be degraded.

When a disaster occurs, CBU provides the extra capacity without disruption. When the disaster is over and CBU is deactivated, Concurrent UNDO CBU allows the system to return to its previous configuration without disruption.

For detailed instructions refer to the zSeries Capacity Backup User’s Guide, SC28-6810.

Software support for concurrent upgrades
Most CPCs allow concurrent upgrades, meaning that more capacity can be added to the CPC dynamically. If the operating system images running on the upgraded CPC do not need to be IPLed to use the new capacity, the upgrade is also nondisruptive: No Power-On-Reset, deactivation, or IPL is required. If an “image upgrade” to a logical partition is required, the operating system running in that partition must have the capability to concurrently configure more capacity online.

Be aware that to make use of the nondisruptive upgrade support, you must prepare by using the RESERVED options in the LPAR image profiles on the HMC.

If reserved processors are defined to a logical partition, z/OS, OS/390, z/VM, and VM/ESA® operating system images can dynamically configure more processors online. The Coupling Facility Control Code (CFCC) can also configure additional processors online to Coupling Facility logical partitions.

Memory can also be concurrently added to a server running in LPAR mode, up to the physical installed memory limit. Using the previously defined reserved memory, z/OS and OS/390 operating system images can dynamically configure more memory online, allowing nondisruptive memory upgrades.
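For example, after a concurrent upgrade, reserved resources can be brought online from the z/OS console with the CONFIG (CF) command. This is a sketch only; the CP number and storage element ID shown are illustrative:

  D M=CPU
  (shows which CPs are online, offline, or reserved)
  CF CPU(2),ONLINE
  (brings reserved CP 2 online, nondisruptively)
  D M=STOR
  (shows the status of the storage elements)
  CF STOR(E=1),ONLINE
  (brings reconfigurable storage element 1 online)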

Recommendations
Here is a list of things to consider to avoid disruptive upgrades in this environment:
• Define “spare” LPARs. It is possible to define more partitions than you need in the initial configuration, simply by including more partition names in the IOCP RESOURCE statement (a sketch follows these recommendations). The spare partitions do not need to be activated, so any valid partition configuration can be used in their definitions. The initial definitions (LPAR mode, processors, and so on) can be changed later to match the image type requirements. The only resource that spare partitions consume is subchannels, so careful planning must be done here. Note: A new feature, called dynamic partition naming and activation, is now available on the latest server models and may be considered instead.
• Configure as many Reserved Processors (CPs, IFLs, and ICFs) as possible. Configuring Reserved Processors for an LPAR enables them to be added nondisruptively. The operating system running in the LPAR must have the ability to dynamically configure processors online.
• Configure Reserved Storage for logical partitions. Configuring Reserved Storage for all logical partitions before their activation enables them to be upgraded nondisruptively. The operating system running in the logical partition must have the ability to configure memory online.
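As a sketch of the spare LPAR recommendation, the partition names and MIF image IDs in the following IOCP RESOURCE statement are hypothetical (on z990 CPCs, the partitions would additionally be qualified by a CSS):

  RESOURCE PARTITION=((PROD1,1),(PROD2,2),(SPARE1,3),(SPARE2,4))

Here PROD1 and PROD2 are active production partitions, while SPARE1 and SPARE2 are defined but left deactivated until they are needed.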

2.2.4 Redundant capacity
Even with all these features available on current CPCs, there is still the possibility that a CPC can fail, or that you might need to make an upgrade or apply service. To be able to continue to deliver an acceptable level of service after removing a system or a CPC from the sysplex, configure your CPCs for redundancy, making it possible for the remaining CPC(s) to handle the workload from a CPC that has failed or been shut down for maintenance.

You have a number of choices for how you provide the required backup capacity:
• You can configure all your CPCs with spare capacity that lies unused until required.
• You can configure other systems that can be shut down at short notice, so their capacity can be used to restore systems from the unavailable CPC.
• You can use features like CBU to turn on additional capacity only when and as needed.
• Rather than moving complete systems, you can just move workloads from the failed systems to other systems in the same sysplex.

The extra capacity you will require to handle a failover in the event of a server failure or a system outage may relate to several components:
• Storage
Make sure you have enough extra storage capacity available to handle the increased workload. Note that CBU provides the ability to turn on additional CPs, but not additional storage. On z990 and later CPCs, however, it may be possible to bring additional memory online dynamically using a feature known as Memory Upgrade on Demand (see “Capacity Back Up” on page 20 for more information). If you are going to deactivate some test or development LPARs in order to be able to start a production one, you need to ensure that sufficient storage will be available for the production LPAR. While it is not necessary for the storage of the two LPARs to be contiguous, there are considerations for an LPAR that will use storage that is not contiguous. For more information, refer to the section entitled “Determining the Characteristics” in Processor Resource/Systems Manager Planning Guide for the CPC in question. Note that there is a different level of this document for each generation of CPCs. Be aware that moving a workload to another LPAR could impact the virtual storage consumption in that LPAR, so you must ensure that the target LPARs have enough spare virtual as well as real storage capacity.

• Processor capacity
If you have empty LPARs that will be used to IPL the systems from the unavailable CPC, make sure those LPARs are set up with sufficient relative weight to get the capacity they require, and that they have enough logical CPs defined to deliver that capacity. If you are going to reuse LPARs that were previously used for test or development systems, you will probably need to adjust the weight of those LPARs to reflect the fact that they are now running production work. You may also want to define those LPARs with Reserved CPs that can be brought online when you IPL the production system in that LPAR. If you are going to use CBU to bring additional processing capacity online for the duration of the outage, you should test the process for bringing the additional CPs online. The CBU contract normally provides for an agreed number of test activations each year. If you are going to move workloads into existing LPARs, you may need to increase the weight and the number of logical CPs in those LPARs.
• Channel bandwidth
Make sure there is sufficient bandwidth to handle the increased workload. You must also ensure, as you add devices to the production LPARs, that you provide equivalent connectivity from the backup CPC, and that the LPARs that will be used for recovery are in the access lists of the required channels.
• Access to required resources
If your applications use special resources, such as cryptographic coprocessors, you need to make sure that those resources are available on any CPC the application might be moved to. Your applications may have connections to other resources at your installation, such as connections to remote locations. If you have a critical application that relies on such resources, you need to make sure that you have connectivity from all CPCs to that resource, or at least have a tested process for establishing a connection in the event of a CPC failure.

Recommendations
In the ideal sysplex configuration, you would have CPCs that are of equal size and able to handle each other's workload, even if service might be somewhat degraded in the event of a disaster or server failure. However, this is not always the case, so make plans for the critical applications and subsystems that must be recovered in the event of a z/OS system or CPC failure.

Remember as well that additional capacity, over and above the normal capacity requirement, will be needed during the recovery period. During this time, there is a spike of activity as all the systems and subsystems communicate to organize the recovery. You should remember to budget for this additional capacity. This capacity must be available immediately and should not rely on some workloads being manually stopped or quiesced.

We recommend that the following LPAR definitions be set:
• Enable the Automatic I/O Interface Reset Facility if you are not using the Automatic Reconfiguration Facility (ARF). The two facilities are incompatible, so if you are using ARF, the Automatic I/O Interface Reset Facility must not be enabled. ARF and the Automatic I/O Interface Reset Facility are described in the System Overview manual for the relevant processor.


• Cryptographic coprocessors - In order to make the addition of PCI cryptographic coprocessors concurrent in LPAR mode, logical partitions must be predefined with the appropriate PCI cryptographic coprocessor numbers selected in their candidate lists. To maximize concurrent upgrade possibilities in this area, we recommend that all LPARs that have a cryptographic coprocessor defined also define all possible PCI cryptographic coprocessors as candidates for the LPAR. This is possible even if there are no PCI cryptographic coprocessors currently installed on the machine.
• Support for dynamic reconfiguration - When you perform a POR, the hardware I/O configuration definition is loaded from an IOCDS into the HSA. When you activate a new configuration, the HSA is updated to reflect the new hardware I/O configuration definition. If you plan to use dynamic I/O reconfiguration to add equipment to your configuration, you must either specify an expansion factor for the HSA size before initiating the POR or, for z990 CPCs, define in HCD the maximum number of subchannels you want to have for the LPAR. The expansion factor defines how much additional HSA storage is reserved for future dynamic I/O configuration changes. Remember that if you make multiple dynamic I/O reconfigurations, you can eventually exhaust this reserved space and be unable to make further dynamic changes without first performing a POR; how many changes you can make depends on the expansion factor you entered for the server.
• Ensure that storage definitions are correct for your environment. Remember that the total amount of storage you can define across all images is the total installed amount of storage, minus the HSA, minus the size of one storage increment.
• Ensure that LPAR weight definitions are correct for your environment. An LPAR with dedicated CPs is not affected by processing weights. Particularly for an LPAR that is a member of a sysplex, you must ensure that the LPAR will receive enough processor cycles to play its part in the sysplex. Insufficient capacity will result in the LPAR not being able to keep up with the other sysplex members, eventually leading to sysplex disruptions. We recommend that any LPAR in a sysplex, especially a production sysplex, have a weight that guarantees it no less than 5 percent of the capacity of that CPC (a worked example follows this list).
• Give serious consideration to configuring two images from the sysplex on each server. Depending on your LPAR definitions, this may give you the ability to continue to utilize all the available MIPS on the processor even if one of the images has an outage (planned or otherwise).
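To illustrate the weight recommendation: An LPAR's guaranteed share of the shared CPs is its weight divided by the sum of the weights of all active LPARs using shared CPs. For example, with three LPARs weighted 700, 250, and 50, the third LPAR is guaranteed 50/(700+250+50), or about 5 percent, of the shared CP capacity. If the first weight were raised to 800, the third LPAR's guaranteed share would drop to 50/1100, or about 4.5 percent, below the recommended minimum.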

More information about LPAR definitions can be found in the Processor Resource/Systems Manager™ Planning Guide for your CPC type.

2.2.5 Hardware configuration
In addition to your physical configuration, there are things you can do from a logical perspective that can also have a positive impact on availability. Specifically, good configuration management practices lead to better understanding and fewer mistakes. This section contains a number of recommendations that contribute to good configuration management:
• Have a single IODF containing all the hardware in the installation. At a minimum, the IODF should contain all CPCs and devices used by systems in the same sysplex. For availability reasons you might decide to keep more than one physical copy, but the content should be the same. Having a single IODF allows you to do sysplex-wide dynamic I/O reconfiguration. It also helps you avoid duplicate device numbers, a common source of confusion and mistakes. Dynamic I/O reconfiguration is supported by some operating systems (z/OS, OS/390, and z/VM), allowing nondisruptive I/O upgrades. Note that it is not possible to perform dynamic I/O reconfiguration on a stand-alone Coupling Facility, because there is no operating system to drive the reconfiguration on that machine. Remember that dynamic I/O reconfiguration requires space in the HSA for expansion.
• Use the CHPID Mapping Tool to assign CHPID numbers for the channels on your processor. You can assign CHPID numbers by either of two methods:
– Manual, where you enter the new CHPID values individually. In this case, the CHPID Mapping Tool checks your input for errors.
– Much preferable is the Availability Mapping method, where the CHPID Mapping Tool assigns CHPIDs for maximum system availability, distributing channel paths across multiple channel cards, STIs, and SAPs.
The tool and documentation can be found at:
https://app-06.www.ibm.com/servers/resourcelink/hom03010.nsf
You will have to register and create a user ID to get access.
• Establish a meaningful naming convention for CPCs, LPARs, and OS configurations. The convention should communicate information that avoids any confusion about which entity is being discussed or changed.
• Always use the same device number for the same physical device across all images. Where possible, use the same CHPID numbers across all CPCs. Never use the same device number for different physical devices, or you could end up varying the wrong device offline (or worse, initializing the wrong volume).
• Try to maintain I/O symmetry across all systems within the sysplex, so that all I/O appears as one I/O pool that is shared among all images. Doing this simplifies operations and makes it much easier to move workloads between different systems as required.
• Configure at least two paths to every device, regardless of the bandwidth requirement. Obviously, heavily used devices should have more paths, to provide the required level of performance. Configuring paths through ESCON Directors and FICON switches provides much greater flexibility and better utilization of the available bandwidth, especially for FICON channels. Make sure the paths are configured through different directors and switches.
• Rather than dedicating channels to an LPAR, consider using MIF to share the channels between LPARs and provide greater redundancy. For example, rather than having two dedicated channels to each of two LPARs, use four shared channels. This should provide equivalent performance and gives better availability in case of a failure of any of those channels.
• Work with your IBM representative to review the Systems Assurance Product Review document for the associated CPCs, to ensure that you have followed all the recommendations to maximize availability.

2.3 Coupling Facilities

A Coupling Facility (CF) is a special LPAR that provides high-speed caching, list processing, and locking functions in a Parallel Sysplex. The LPAR can be configured to run within a CPC that runs other operating systems (it is often referred to as an ICF in this case), or it could be configured in a stand-alone processor that only runs Coupling Facility Control Code (CFCC).

It is important to understand the difference between a CF and an operating system LPAR. If an operating system LPAR dies, all the work that was running in that system is gone and must be restarted. On the other hand, depending on which structures are in a CF that suffers a failure, it is possible that only a short pause in processing will be encountered while the CF contents are rebuilt in an alternate CF. As a result, a failure may not impact a CF to the same degree as it would an operating system. What is more important, from an availability point of view, is whether the structures in the failed CF support rebuild and/or duplexing, and whether the failure also impacted any connected systems.

In this section we provide a set of recommendations to maximize the availability of the CF service. We also touch on performance, because good performance is especially important for a CF, and poor CF performance can be disruptive for the systems connected to it.

2.3.1 Coupling Facility Capacity
In order to be able to deliver acceptable response times, you must have enough capacity in your CFs. In addition to CF CPU utilization, you must also consider response times. It is possible for CF CPU utilization to be low, but CF response times to be so long that the cost of using the CF becomes unacceptable (this is especially the case if there is a large disparity in speed between the CF CPC and the CPCs the operating systems are running on).

Dedicated or shared engines
We always recommend using dedicated engines for production CFs if at all possible, and especially when doing data sharing. The use of shared engines for CFs impacts response times and the cost of using that CF. As a rule of thumb, the use of shared engines in a CF increases the cost of using the CF by 20 to 100 percent. Lock structures in particular should not be placed in a CF that uses shared engines. Except for this restriction on lock structures, the use of a shared engine will probably provide acceptable performance and overhead for a resource sharing environment.

An additional consideration is the use of shared CF engines with System-Managed CF Structure Duplexing. Because duplexing already impacts the response time of the duplexed requests, the additional elongation of response times as a result of using shared engines would likely result in response times that are not acceptable for production environments. Therefore, we recommend that you do not use shared engines in any CF that will contain structures duplexed with System-Managed CF Structure Duplexing.

If there is no alternative to sharing an engine between production and test CFs, enable Dynamic CF Dispatching for the test CF, disable it for the production CF, and give the production CF LPAR a sufficiently high weight to ensure that it always gets the required resources. If the production CF does not contain any System-Managed Duplexed structures, the weight of the production CF LPAR should guarantee that it gets at least 50 percent of the shared engine. If the production CF does contain System-Managed Duplexed structures, the weight should guarantee at least 95 percent of the shared engine.

The use of shared CF engines is discussed in more detail in a Hint and Tip entitled Use of Shared Engines for Coupling Facilities - Implementation and Impact, TIPS0237, on the Redbooks Web site.
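Dynamic CF Dispatching is controlled with the CFCC DYNDISP command, entered on the Operating System Messages panel on the HMC for the CF LPAR. A minimal sketch of the settings described above:

  DYNDISP ON
  (for the test CF: The shared engine is released when the CF is idle)
  DYNDISP OFF
  (for the production CF: The engine is retained, giving better responsiveness)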

How much memory
Ensuring that your CFs have sufficient storage is actually a much simpler process than storage planning for an operating system LPAR. Unlike z/OS, CFCC is not a virtual storage operating system. The amount of storage you need in your CF is the total of the storage requirements of all defined structures, plus some room for growth, taking into account whatever white space you may need to recover from a CF failure.

Which types of Coupling Facility links
There are several types of coupling links that can be used to connect an operating system LPAR to a CF LPAR. Which one is most appropriate for a given situation depends on your configuration and your performance and capacity requirements.

IC links
Internal Coupling (IC) links were introduced with the IBM 9672 G5 servers. These are microcoded “linkless” coupling channels that can be used to connect CF LPARs and OS/390 or z/OS LPARs in the same CPC. Because they use memory-to-memory communication rather than external links, they always provide greater performance than equivalent external links (ISC or ICB). We recommend that IC links always be used to connect a CF with other LPARs in the same CPC. A side benefit of IC links is that their speed increases in line with the speed of the CPC. So, if you are using IC links on a z900 base model and upgrade it to a Turbo, the speed of the links increases as well, ensuring good response times and a consistent cost for using that CF.

External CF links
There are two other types of coupling links available to connect a CF LPAR to an operating system or another CF LPAR: Integrated Cluster Bus (ICB) and Inter System Coupling (ISC) links. The significant differences between the two are the speed of the link and the maximum supported distance; from an availability point of view, both types can be considered equivalent, with the same considerations and recommendations.
• ICB links are high-speed external coupling links for short-distance connections. ICB links consist of a ten-meter copper cable (up to seven meters between CPCs) that connects directly to the Self-Timed Interface (STI) I/O bus. There are three different types of ICBs: ICB-2, ICB-3, and ICB-4. The one that is most appropriate for a given situation depends on the types of CPCs being connected. For information about the connectivity options, refer to the Reference Guide for the relevant CPCs.
• The other type of external CF link is the Inter System Coupling (ISC) link. These links run at lower speeds than ICB links, but support distances up to 100 km (when used in conjunction with a zSeries qualified WDM). Be aware, however, that each kilometer of distance on a CF link costs about 10 microseconds in extra response time.

In general, ICB links should be used when the CPCs to be connected are within the supported distances, with ISC links being used when the distance exceeds this amount.
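For reference, each link type corresponds to a different CHPID type in the I/O definition: TYPE=ICP for internal coupling (IC) links, TYPE=CBP for ICB peer links, and TYPE=CFP for ISC-3 peer links. The following IOCP fragment is illustrative only; the CHPID number is hypothetical, and details such as IC link pairing and partition access lists are normally generated for you by HCD:

  CHPID PATH=(C0),TYPE=CFP,SHARED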

Recommendations
Regardless of the type of link being used, you need to configure for:
• Availability. You need at least two links between each pair of CPCs. CF links can be shared using MIF. However, only one CF LPAR can share a given “end” of a link, and the operating system LPARs sharing a link must all be in the same sysplex. Remember that if you share a link between multiple LPARs, the loss of that link will impact all of those LPARs.

• Performance. While in the early days of Parallel Sysplex two CF links provided sufficient capacity for most users, the intensity with which CFs are used has increased to the point that in many sites two CF links are no longer sufficient. However, the rule of thumb for PATH BUSY and SUBCHANNEL BUSY remains the same: Both the number of PATH BUSY conditions and the number of requests delayed because of SUBCHANNEL BUSY should not exceed 10 percent of the number of requests. In addition, subchannel utilization (calculated by multiplying the number of requests on the subchannel by the average response time, divided by the number of seconds in the interval) should not exceed 30 percent. Configure sufficient links that these thresholds are not exceeded during peak processing windows (a worked example follows this list).
• Avoiding single points of failure. There should be as few single points of failure as possible in common between the links used to connect a set of LPARs. When you order CF links, the configuration provided will normally do its best to avoid single points of failure. For example, if you order two ICB4 links on a 2084-B16, one link will be connected to one book and the other link to the other book. However, depending on the configuration of your CPC, it is possible that some links will share a mother card, for example. When deciding which links to use to connect an OS LPAR to a CF LPAR, work with your IBM service representative to ensure that the links you select have the minimum number of single points of failure.
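As a worked example of the subchannel utilization calculation: If RMF shows 3,000,000 requests on a subchannel in a 900-second interval, with an average response time of 60 microseconds, the subchannel was busy for 3,000,000 x 0.000060 = 180 seconds. That gives a utilization of 180/900 = 20 percent, which is within the 30 percent guideline.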

2.3.2 Failure isolation
One of the most important characteristics of a CF is its location in relation to the systems that are connected to it. There are some advantages to it being in a stand-alone processor (we discuss this further in “ICF vs stand-alone CF” on page 30); however, the most important question is whether a single failure can impact both the CF and one or more systems connected to it. A CF in a CPC where none of the other images in that CPC are connected to that CF can provide nearly as good availability as one running in a stand-alone processor.

Many structures can be rebuilt even if there is a double failure. However, some of the most important structures in a data sharing environment cannot recover from such a failure without a data sharing group restart (unless the structure is duplexed). To determine your need for a failure-isolated CF, you must review the structures you use to see if any of them fall into that category. Table 2-2 lists all IBM structures available at the time of writing. Category 1 structures are largely unaffected by a double failure. Users of category 2 structures would notice a short impact to service while the exploiter recovers from a double failure. Users of category 3 structures would suffer a data sharing group-wide outage following a double failure (unless those structures are duplexed).

Table 2-2 Failure isolation requirements of CF structures

Structure                 Category 1    Category 2    Category 3
CICS Logger                                           X (Note 1)
CICS Data Tables                                      X (Note 1)
CICS Named Counter                                    X (Note 1)
CICS Temp Storage                                     X (Note 1)
DB2 GBP                                 X
DB2 Lock                                              X (Note 1)
DB2 SCA                                               X (Note 1)
Enhanced Cat Sharing      X
GRS Star                  X
HSM Common Recall Queue   X
IEFAUTOS (Note 4)         X
IMS Cache                               X
IMS Lock                                              X (Note 1)
IMS Logger                                            X (Note 1)
IMS Shared Msg Q                        X
IMS VSO                                 X
IMS Resource                            X
JES2                                    X
LOGREC                    X
MQSeries®                                             X (Note 1)
OPERLOG                   X
RACF®                     X
RRS Logger                                            X (Note 1)
VSAM/RLS Cache                          X
VSAM/RLS Lock                                         X (Note 1)
VTAM® GR                  X (Note 2)                  X (Note 1)
VTAM MNPS                 X (Note 3)                  X (Note 1)
WLM Enclaves              X
WLM LPAR Clust.           X
XCF Signalling            X

Note 1: The failure isolation requirement for this structure can be met if the structure is duplexed using System-Managed CF Structure Duplexing. If this structure is duplexed, it becomes a category 2 structure.
Note 2: If no VTAMs in the same failure domain as the CF use GR.
Note 3: If no VTAMs in the same failure domain as the CF use MNPS.
Note 4: The IEFAUTOS structure is only used by releases prior to z/OS 1.2.

To understand the need for failure isolation, let us look at an example of an exploiter with this requirement. To keep it simple, assume we have two general purpose CPCs and a single stand-alone CF. There is one z/OS image in each CPC, and one of the CPCs also has a CF LPAR. Within each z/OS image there is a DB2 subsystem, and both DB2s are members of the same data sharing group. This is all shown in Figure 2-2 on page 29.


Figure 2-2 DB2 data sharing and failure isolation (diagram: two z/OS systems, SYSFK01 with DB21 and SYSFK02 with DB22; CF01 is a stand-alone CF, while CF02 is an internal CF in one of the CPCs)

A data sharing DB2 will have a minimum of three CF structures:
• A lock structure
• An SCA structure
• At least one Group Buffer Pool (GBP) structure

In this example, assume the lock and SCA structures start off in the stand-alone CF. The IRLM address space associated with each DB2 is connected to the lock structure. Each IRLM maintains an in-storage copy of the locks it holds, and also has a copy of the same information in the lock structure.
• So, what happens if the stand-alone CF fails? The lock structure will be lost. However, all the information that was in the lock structure is also available by pooling the in-storage lock information from each of the connected IRLMs. In this case, a new lock structure will be allocated in the internal CF, all the connected IRLMs will forward a copy of their in-storage lock information to the new structure, and a few seconds later everything is recovered and both DB2s can continue processing.
• And what happens if a DB2 or one of the z/OS images fails? To protect your data, it is vital that any locks held by the failed DB2 remain in place until DB2 is recovered and can back out any work that was in progress at the time of the failure. Because the lock structure is in a failure-isolated CF, and the lock structure contains the superset of all lock information, all the locks held by the failed DB2 are still available and will remain so until the failed DB2 restarts.

Now, let us assume that the lock structure is in the internal CF rather than the stand-alone one, and the CPC containing that CF suffers a complete failure. In this case, both the lock structure and one of the connected IRLMs will be lost. We now have a problem because the remaining DB2 has no idea what locks were held by the DB2 that just disappeared. The only way that DB2 can guarantee that it will not touch the data that the failed DB2 was in the middle of updating is to immediately shut itself down. When that DB2 is restarted, it will read its own logs and the logs of the failed DB2 and reconstruct the lock information that way. However, we have now suffered a complete DB2 data sharing group-wide outage. If the lock structure had been in the stand-alone CF, only one DB2 would have died, and the other one could have continued operating. This is the reason that we say that failure isolation is a requirement for certain structures.

Washington Systems Center FLASH W98029, Parallel Sysplex Configuration Planning for Availability, discusses the different categories of CF structures and why some have a requirement for failure isolation. Note that this FLASH was written before System-Managed CF Structure Duplexing was announced. System-Managed CF Structure Duplexing provides the same level of recoverability for these structures as a failure-isolated CF would.

ICF vs stand-alone CF
As stated previously, you have the option of running your CF in a general-purpose CPC (one containing operating system LPARs) or in a stand-alone CF processor. The most important attribute is whether the CF is failure-isolated from the connected systems, as just discussed. Apart from this, however, there are some other differences between a stand-alone CF and one in a general-purpose CPC that you should be aware of:
• If the processor is only running CFCC, you eliminate the tiny chance that a problem in an operating system LPAR will crash the whole CPC, bringing down the CF in the process.
• Because CF level upgrades are disruptive on CPCs older than the z990, you need to Power-On-Reset those CPCs to upgrade the CF level. You will probably find it easier to get an opportunity to Power-On-Reset a stand-alone CF than a general-purpose CPC containing production operating system images.
• If the CPC has an Internal Battery Feature (IBF), the IBF will probably have a longer hold-up time on a stand-alone CF, because it would typically not be as fully configured as a general-purpose CPC.
• If you want to be able to add CF links nondisruptively, you can only do that if the CF is running in the same CPC as at least one MVS-based LPAR. This is because you need an OS/390 or z/OS system to drive the dynamic I/O reconfiguration process on the CPC.
• Whether you should use only ICFs or only stand-alone CFs depends on whether you are exploiting data sharing, or plan to do so in the foreseeable future. You can, of course, use a combination of stand-alone CFs and ICFs, with the structures that require failure isolation being placed in the stand-alone CF, and the remaining structures placed in either one.

Structure placement
Structures that have a failure isolation requirement should either be placed in a failure-isolated CF (normally a stand-alone CF) or be duplexed.

Another thing to beware of is structures ending up in a CF other than the one you intended. This can happen, for example, following CF maintenance, if the correct procedures are not followed to return the structures to their intended CF. This situation has caused problems for a number of customers. We give two examples:
• Let us say that you have one ICF and one stand-alone CF. DB2 lock structures should always be placed in a failure-isolated CF (the stand-alone one, in this case). However, assume that the lock structure has somehow ended up in the ICF. If the CPC containing the ICF fails, you would suffer a double failure (assuming the CPC also contained DB2 subsystems connected to that lock structure). The result would be an outage of the entire DB2 data sharing group. However, if the lock structure had been in the CF you intended it to be in, the only DB2 affected would have been the one in the failed CPC; all other DB2s in the data sharing group would have continued processing.
• Another example is where all your XCF signalling structures end up in the same CF. This can often happen following CF maintenance, if the POPULATECF option of the SETXCF START,REBUILD command is not used to move all structures back to their normal location. If all XCF signalling is via the structures, and the CF containing those structures were to fail, you would experience a complete sysplex-wide outage. On the other hand, if one XCF signalling structure was in each CF, as you would normally have, the systems would only suffer a brief slowdown while the affected structures are recovered.

Check regularly that all structures are actually in the first available CF in their PREFLIST to avoid this situation. One way to do this is with the IBM Health Checker for z/OS, which has a check for this condition. In addition, you might consider issuing a SETXCF START,REBUILD,POPCF=cfname command for each of your CFs on a regular basis (maybe once a week, on a Saturday night). This will ensure that all structures are in the first CF in their PREFLIST (all other things being equal).

Never use a REBUILDPERCENT value greater than 1. REBUILDPERCENT can be used to stop a structure from being moved in case of a partial connectivity failure. However, with the speed of modern CF technology, it would be very rare that you would not want to move a structure following a connectivity failure. For simplicity, the best thing is to not use the keyword at all; the default is to always rebuild a structure following such a failure.
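A minimal sketch of such a regular check, reusing CF and structure names from the examples in this chapter (your names will differ):

  D XCF,CF,CFNM=FACIL01
  (lists the structures currently allocated in CF FACIL01)
  D XCF,STR,STRNM=D#$#_LOCK1
  (shows where the structure currently resides and its PREFLIST)
  SETXCF START,REBUILD,POPCF=FACIL01
  (rebuilds into FACIL01 any structures that have FACIL01 higher in their PREFLIST than their current location)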

2.3.3 Recovering from CF failure
As stated previously, CFs differ from operating system LPARs in that, if a CF fails, it is possible that the CF contents can be recreated in an alternate CF, allowing operations to continue with just a pause in processing.

When planning for CF recovery, you need to consider whether the structures in the failed CF support recovery from a CF failure, and whether the remaining CFs have sufficient capacity to take over from the failed CF.

Structure recovery
Whether a structure can recover from a CF failure or not depends on the exploiter that uses that structure, and on whether it supports structure rebuild or duplexing.

Structure rebuild
There are two flavors of structure rebuild: User-managed rebuild and system-managed rebuild. Exploiters that support user-managed rebuild can generally recreate the structure contents following the failure of a CF or one of the connectors; however, in some cases they cannot recover from a double failure. Exploiters that only support system-managed rebuild cannot recreate the structure contents following a CF failure.
• User-managed rebuild
User-managed rebuild, supported by most of the earlier CF exploiters, means that the address spaces connected to a structure take responsibility for moving the structure in case of a planned move, or for recovering the structure in case of an unplanned move (the failure of a CF, for example). Most exploiters that support user-managed rebuild maintain an in-storage copy of their data from the structure, meaning that even if the CF containing the structure is lost, they can still create a new version by merging all their in-storage information. To determine whether a given connector supports user-managed rebuild, use the D XCF,STR,STRNM=structure_name,CONNAME=ALL command, as shown in Figure 2-3 on page 32. The line entitled ALLOW REBUILD in the output indicates whether that connector supports user-managed rebuild; YES indicates that it does.


D XCF,STR,STRNM=D#$#_LOCK1,CONNAME=ALL
IXC360I 11.03.10 DISPLAY XCF 191
STRNAME: D#$#_LOCK1
 STATUS: REASON SPECIFIED WITH REBUILD START:
         POLICY-INITIATED
         DUPLEXING REBUILD
 METHOD: SYSTEM-MANAGED
 AUTO VERSION: BA298A48 C5D0AA4C
 REBUILD PHASE: DUPLEX ESTABLISHED
 POLICY INFORMATION:
  POLICY SIZE    : 4096 K
  POLICY INITSIZE: 2048 K
  POLICY MINSIZE : 0 K
  FULLTHRESHOLD  : 80
  ALLOWAUTOALT   : NO
  REBUILD PERCENT: N/A
  DUPLEX         : ENABLED
  PREFERENCE LIST: FACIL01 FACIL02 FACIL03 FACIL04 FACIL05 FACIL06
  ENFORCEORDER   : NO
  EXCLUSION LIST IS EMPTY
 ...
 CONNECTION NAME : DR$#IRLM$DR$1001
  ID              : 02
  VERSION         : 0002000E
  CONNECT DATA    : 80BCB000 093B2048
  SYSNAME         : #@$1
  JOBNAME         : D#$1IRLM
  ASID            : 001F
  STATE           : DUPLEXING REBUILD ACTIVE NEW
                    NOT FAILURE ISOLATED FROM CF
                    DUPLEXING REBUILD ACTIVE OLD
                    NOT FAILURE ISOLATED FROM CF
  CONNECT LEVEL   : 00000000 00000000
  INFO LEVEL      : 01
  CFLEVEL REQ     : 00000002
  NONVOLATILE REQ : NO
  CONDISP         : KEEP
  ALLOW REBUILD   : YES
  ALLOW DUPREBUILD: NO
  ALLOW AUTO      : YES
  SUSPEND         : YES
  ALLOW ALTER     : YES
  USER ALLOW RATIO: NO

Figure 2-3 Determining rebuild support in CF structure connectors

• System-managed rebuild
System-managed rebuild was introduced with OS/390 2.8 and CF level 8, and makes it significantly easier for products to exploit the facilities provided by a CF. Whereas the connector is responsible for moving a structure when using user-managed rebuild, the operating system takes over that role with system-managed rebuild. In this case, the connected operating systems work together to copy the contents of a structure from one CF to another. The systems have no knowledge of the meaning of the contents of the structure; they just copy the structure in its entirety. To find out if a connector supports system-managed rebuild, check the line entitled ALLOW AUTO in the output from the D XCF,STR,STRNM command, as shown in Figure 2-3 on page 32. If the value is YES, the connector supports system-managed rebuild. Some exploiters support both system-managed and user-managed rebuild; in this case, the system gives preference to a user-managed rebuild of that structure. The downside of system-managed rebuild is that the old structure must still be intact and accessible in order for the systems to be able to copy it. If the reason for the attempted rebuild is a CF failure, system-managed rebuild cannot be used to recreate the structure's contents. If you have a structure whose connectors only support system-managed rebuild, the only way to protect that structure against the loss of a CF is to use System-Managed CF Structure Duplexing.

Structure duplexing
Similarly, there are two flavors of structure duplexing: User-managed duplexing and System-Managed CF Structure Duplexing. User-managed duplexing is only used by DB2, and then only for its Group Buffer Pool (GBP) structures. System-Managed CF Structure Duplexing is supported for any structure whose connectors support system-managed rebuild. Both types of structure duplexing provide the ability to continue processing following a failure (assuming there is no single point of failure shared by the CFs containing the two copies of the structure).
 User-managed structure duplexing
User-managed structure duplexing was introduced in OS/390 R3 with APAR OW28460 and was integrated into OS/390 R6. This capability is used by DB2 for its group buffer pool structures. With user-managed duplexing, setting up the second structure, maintaining synchronization between the two structures, and recovering from structure-related failures are the responsibility of the connector (DB2 in this case). Turning on user-managed duplexing for a DB2 GBP structure introduces very little additional overhead, but delivers significant benefits in terms of reduced recovery time should the GBP structure be lost. We highly recommend that user-managed duplexing be enabled for all production DB2 data sharing groups.
 System-Managed CF Structure Duplexing
System-Managed CF Structure Duplexing was announced as part of z/OS 1.2, with support in CF level 11 and later. With System-Managed CF Structure Duplexing, the operating system takes responsibility for setting up a duplex copy of a CF structure, maintaining the duplex relationship, and reverting to simplex mode in the event of a CF or CF link failure. Assuming that the two copies of a duplexed structure are failure-isolated from each other, System-Managed CF Structure Duplexing allows you to place a structure in the same failure domain as one or more of its connectors and still be protected from a single failure that would otherwise cause a data sharing group restart. However, while System-Managed CF Structure Duplexing provides substantial availability benefits, it can also have a significant impact on the response time of update requests to the duplexed structures. For more information about System-Managed CF Structure Duplexing, and to help determine whether this function is appropriate for your environment, refer to the white paper entitled System-Managed CF Structure Duplexing, available on the Web at:
http://www.ibm.com/servers/eserver/zseries/library/techpapers/pdf/gm130103.pdf
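To make this concrete, here is a minimal sketch of a CFRM policy fragment that enables duplexing for a structure, coded for the IXCMIAPU administrative data utility. The policy, structure, and CF names and the sizes are invented for the illustration:

DATA TYPE(CFRM) REPORT(YES)
DEFINE POLICY NAME(CFRM01) REPLACE(YES)
  STRUCTURE NAME(DSNDB0G_GBP1)
            SIZE(40960)
            INITSIZE(20480)
            DUPLEX(ENABLED)
            PREFLIST(FACIL01,FACIL02)

Once the policy is activated (SETXCF START,POLICY,TYPE=CFRM,POLNAME=CFRM01), DUPLEX(ENABLED) causes the system to start and maintain duplexing automatically; DUPLEX(ALLOWED) would instead leave duplexing to be started manually with SETXCF START,REBUILD,DUPLEX,STRNAME=DSNDB0G_GBP1.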

Testing
The types of rebuild supported by CF exploiters, and whether they support duplexing, can change over time. The best way to ensure that the structures you use can successfully recover from a CF failure is to actually test failure scenarios in your own environment.

Chapter 2. Hardware 33

We recommend that your test sysplex have workloads to exercise all the structures you use, and that you create system, CF link, and CF failures to ensure that the exploiters can recover successfully. The command sequence below is one way to drive such a test.
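For example, a planned rebuild test of a single structure can be driven and checked with operator commands like the following (the structure name is just an illustration; a CF failure test itself requires deactivating a CF, as described in 2.3.6):

SETXCF START,REBUILD,STRNAME=ISGLOCK,LOC=OTHER
D XCF,STR,STRNAME=ISGLOCK

The D XCF output shows which CF the structure is now allocated in, so you can confirm both that it moved and that the exploiter reconnected successfully.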

CF capacity for recovery
If you experience a CF failure, the normal recovery process is to move all the work from the failed CF to the other CF(s) in the sysplex.

“Good” CF response times on current hardware would be in the low teens or single digits of microseconds. As response times increase, systems will slow down and the coupling cost on all connected systems increases. If response times deteriorate too much, sysplex disruptions can occur.

This is especially the case during recovery from some sort of a failure. In the event of a CF failure, the alternate CF(s) must have sufficient capacity (MIPS, memory, and CF Link bandwidth) to not only take over the workload of the failed CF, but also the additional processing that results from all the systems communicating to coordinate recovery from the failure. Ensure that your configuration has sufficient capacity to be able to withstand the loss of capacity that results when a CF fails.

CF CPU capacity
Figure 2-4 shows the impact of CF CPU utilization on response times. This chart is based on standard queueing theory; experience shows that CFs actually react even more dramatically to very high CF CPU utilization.

[Chart: CF response time in microseconds (0 to 2500) plotted against CF CPU busy percentage (10 to 99); response time rises sharply beyond 90 percent busy.]

Figure 2-4 Impact of CF CPU utilization on CF response times

Based on this information, you can see that as CF CPU utilization exceeds 90 percent, the response time increases dramatically. If you have two CFs and run each CF at around 50 percent busy, when you lose one CF, the remaining CF must take on the workload of the failed CF, pushing its utilization close to 100 percent. In addition to this, you need to allow for the additional processing that takes place during recovery. You can see that CF response times can become so elongated that disruptions may occur.
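As a rough worked example, assume (purely for illustration) that the CF behaves like a simple single-server queue, where the average response time is approximately the service time divided by (1 - utilization). With a 10-microsecond service time, requests complete in about 20 microseconds at 50 percent busy, about 100 microseconds at 90 percent busy, and about 1000 microseconds at 99 percent busy. This is the order-of-magnitude explosion shown in the chart, and as noted above, real CFs degrade even faster at the top of the curve.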


If high availability is really important to you, it would be safer to keep your normal CF CPU utilization below 35 percent. This will give you the capacity to handle the workload of both CFs, and still have sufficient spare capacity to handle the recovery spike without getting into the area where response times become unacceptable.

One option for providing the additional MIPS required during a failover scenario is to use the Dynamic ICF Expansion capability. This feature lets you define a so-called “L-shaped” CF LPAR. That is, you can define a CF LPAR that has a dedicated engine plus a portion of a shared engine. In normal processing, the dedicated engine will pick up most of the incoming requests, ensuring good average response times. As the utilization of the dedicated engine increases, as in a failover scenario, the shared engine starts to pick up more of the incoming requests, thereby providing capacity on top of that available through the dedicated engine. This can help you ensure that you are able to continue to deliver acceptable response times in a failover situation, without having to have that amount of capacity constantly dedicated to the production CF.

If you decide to use Dynamic ICF Expansion to provide the capacity you require for failover of a production CF, it is very important that the CF LPAR is defined with a high weight relative to the LPAR(s) it will be sharing that engine with. If the production CF does not have enough weight, it is possible for that LPAR to lose access to an engine in the middle of processing a request; in some cases, this can happen while that engine is holding a CF latch, resulting in delays to the dedicated engine as well.

CF memory
In order to be able to successfully take over the work from a failed CF, it is necessary for the remaining CF to have enough spare memory (also known as “white space”). Before the advent of duplexing, it was quite easy to check this; basically, you made sure that each CF was not using more than 50 percent of its storage. As long as utilization was below 50 percent, you knew that you had enough capacity for each CF to hold all the allocated structures.

With the advent of duplexing, this planning becomes a little more complex. Figure 2-5 shows an example of how you would map out your structures to allow for duplexed structures.

CF01                      CF02                      CF01 (after CF02 fails)
Structure      Stor       Structure      Stor       Structure      Stor
GBP1           260        GBP1'          260        GBP1           260
GBP2'          150        GBP2           150        GBP2           150
Lock           32         Lock'          32         Lock           32
CICS Log'      102        CICS Log       102        CICS Log       102
XCF1           32         XCF2           48         XCF1           32
GRS            64                                   XCF2           48
                                                    GRS            64
Total          640        Total          592        Total          688

Figure 2-5 Planning for CF storage to handle CF failures

In this example, CF01 contains duplex copies of structures GBP2 and CICS Log, and CF02 contains duplex copies of structures GBP1 and Lock. If either of these CFs were to fail, the total storage requirement for structures would reduce by the amount of storage consumed by these structures. So, in the example, we see that CF01 normally requires 640 MB for its structures, and CF02 normally requires 592 MB for its structures. If either CF were to fail, the

total storage requirement would only be 688 MB, and not 1232 MB (640+592) as you might initially think.

If you are running CFs with a CF level lower than CF level 12, you must also consider the control storage requirement. In 31-bit CFs (CF level 11 and lower), you are limited to 2 GB of control storage, and a certain amount of each structure must reside in control storage. While it is possible to determine how much control storage is being used in each CF (using the D CF command), the control storage requirement of each structure is not reported anywhere.

If you issue the D CF command for each of your CFs and find that the total control storage use across all CFs is less than 2 GB, then you do not have to worry. On the other hand, if the total control storage use exceeds 2 GB, there is a possibility that you may not be able to rebuild all structures in case of a CF failure.

If you have some structures duplexed, the control storage associated with the duplex copy of a structure will not be needed after a CF failure. The best way to determine your total control storage requirement in this situation is to use an expanded version of the table shown in Figure 2-5 on page 35 to help you identify which structures will exist after the failure, and the amount of control storage you will require for each. Use the following formulas to calculate the amount of control storage (a worked example follows the list):
– If the structure is a lock structure, the whole structure must reside in control storage.
– For other structures, multiply the total number of LST and Directory entries by 200 bytes and add 2 MB. The number of LST and Directory entries can be found in the column entitled LST/DIR ENTRIES TOT/CUR in the RMF™ Structure Summary report.
– Add up the control storage requirement for each structure and ensure that the total comes to less than 2 GB. If it is more than 2 GB, you will need to identify some structures that will not be rebuilt in case of a failure; those structures would then be defined with just a single CF in their PREFLIST.
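As a worked example (the numbers are invented for illustration): A 256 MB lock structure requires the full 256 MB of control storage. A list structure whose RMF report shows 500000 total LST/DIR entries requires approximately 500000 x 200 bytes, or about 100 MB, plus 2 MB, for roughly 102 MB. Just these two structures would already consume about 358 MB of the 2 GB control storage limit.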

CF Links
“Which types of Coupling Facility links” on page 26 discusses what type and how many CF links to have. In order to ensure acceptable performance following a failover, you should use the same methodology to ensure that your CF links will not be over-utilized. While it is normal to expect slightly elongated response times when in failover mode, you must still ensure that the links will not be so overwhelmed that response times become unacceptable. Check your RMF reports: A good indication is the current level of subchannel and path busy as reported by RMF. If your busy counts already approach the recommended thresholds, it is unlikely that those links will be able to deliver acceptable response times when their loads are doubled.

2.3.4 How many CFs
As a general rule, you should always have at least two CFs. There are a small number of structures whose loss does not have a significant impact on the sysplex (OPERLOG and LOGREC are two examples), but in most cases, the loss of a structure will have some negative or disabling effect on the connected systems. As a result, it is vital that there is always an alternative CF available for a structure to rebuild into in the case of a planned or unplanned CF outage. This is why we recommend that every Parallel Sysplex has at least two CFs, even if you are only exploiting the Resource Sharing capability of the sysplex.

Some larger customers, especially those doing extensive exploitation of data sharing and those that have very high availability requirements, are now starting to deploy three CFs in their production Parallel Sysplexes. This provides greater capacity to cope with unexpected workload spikes, and also ensures that there is still no single point of failure, even if one CF is unavailable for some reason. Whatever number of CFs you have, make sure that all of them are specified on the PREFLIST for all critical structures.

2.3.5 Coupling Facility Control Code Level considerations
The level of Coupling Facility Control Code (CFCC) running in your CF determines the functions that the CF supports. An example is CF level 9, which is a prerequisite for MQ queue sharing in the CF.

Prior to the z990, CF level upgrades were always disruptive. Therefore, make sure to plan all CF level upgrades well in advance. In fact, it is a good idea to upgrade to the latest CF level available on your CPCs as it becomes available and as you get the opportunity to do a Power-On-Reset. This ensures that, should a need arise for the functionality in that CF level, there is a good chance that it will already be installed and available.

When planning for CF level changes, remember that new CF levels generally result in more space in each structure being used for CF control information, meaning that the amount of space available for application use within the structure will decrease unless you adjust your CFRM policy accordingly. The Hint and Tip entitled Determining Structure Size Impact of CF Level Changes on the Redbooks Web site contains a recommended procedure for adjusting structure sizes when upgrading to a new CF level.
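If the CFRM policy SIZE value already leaves room for growth, one way to expand a structure after a CF level upgrade is the SETXCF ALTER command; the structure name and target size (in units of 1 KB) below are illustrative:

SETXCF START,ALTER,STRNAME=IXC_DEFAULT_1,SIZE=40960

If the structure is already at its policy SIZE limit, you must instead update the CFRM policy, activate it, and rebuild the structure.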

Figure 2-6 on page 38 contains a summary of the CFCC functions.


Figure 2-6 CF level supported by each processor

CF level coexistence
IBM does not have a formal coexistence policy for CF levels. It is possible to run any combination of supported CF levels in the same sysplex. However, there are considerations if you plan to run in that configuration for more than a few weeks. One is that you cannot exploit any of the new features of the higher CF level; if you were to exploit any of these features, it would not be possible to rebuild the associated structures in the lower-level CF. Also, the differing storage requirements of different CF levels complicate CF storage management and increase the risk of structures being allocated with inappropriate amounts of storage.

2.3.6 CF maintenance procedures
Some customers maintain three sets of CFRM policies: One that contains both CFs, one that contains just one CF, and a third that contains just the other. To empty a CF, they switch to the policy containing just the CF that will remain in service, and then rebuild all the structures that now have a POLICY CHANGE PENDING status.

Another method is to simply issue a SETXCF START,RB,CFNM=cfname,LOC=OTHER for the CF that is to be emptied. If you do this, you will need to move any XCF structures out of the CF with individual SETXCF START,RB commands.

Once the work has been completed and the CF LPAR is activated and available to all systems, you should always use the SETXCF START,RB,POPCF=cfname command to repopulate the CF.

Our preferred method for preparing CFs for service, and then bringing them back into production after the change, is to use the CF management capabilities in System Automation for OS/390 Version 2 (if you are licensed for that product) or in msys for Operations, which is included with z/OS as of z/OS 1.2.

This provides a much easier to use interface and does everything from moving the structures, through to deactivating the CF LPAR, and then bringing everything back to normal after service is complete.

In any case, maintain a well-documented and well-understood procedure for emptying the CF in question and then repopulating it following the changes, as it may be necessary from time to time to take a CF offline to implement hardware or microcode changes. Before you hand the CF over to the service representative, you should check to ensure that the CF is in fact empty. Once you are happy that it is empty, configure off the CF links from all systems to the affected CF, and then deactivate that CF LPAR on the HMC. A sample command sequence follows.
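As a sketch only (the CF name, structure name, and CHPIDs are invented, and your own procedure should be validated in a test environment), the manual sequence might look like this:

SETXCF START,REBUILD,CFNAME=CF02,LOCATION=OTHER
SETXCF START,REBUILD,STRNAME=IXC_DEFAULT_2,LOC=OTHER
D XCF,CF,CFNAME=CF02
RO *ALL,CF CHP(A0,A1),OFFLINE

The first command moves the structures, the second moves an XCF structure individually, and the D XCF command verifies that no structures remain in the CF. The routed CONFIG (CF) command configures the CF links offline on every system before you deactivate the CF02 LPAR at the HMC. After the service action, reverse the process and finish with SETXCF START,REBUILD,POPULATECF=CF02.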

2.3.7 CF volatility
When we talk about CF volatility, we are actually referring to the capability to keep the contents of the CF’s memory intact across a power failure. The volatility of a CF is important to some CF exploiters, and is also important from an overall recovery point of view. Some exploiters, like JES2 and System Logger, prefer to use a nonvolatile CF, and may behave differently if forced to use a volatile one.

You have a number of options relating to CF volatility:
 One is to order a CF with a built-in battery backup unit, referred to as the Internal Battery Feature (IBF). Except for the z800, all IBM 9672 and zSeries CPCs have this option.
 Another option is to provide an external UPS, in which case you must use the MODE NONVOLATILE CFCC command to tell the CF that it has a UPS and is therefore nonvolatile.

If possible, we recommend having both a UPS and an IBF. The reason for this is simple: The life of these devices diminishes over time, and neither is as reliable as typical mainframe equipment, so having both is simply insurance that you will be able to maintain the CF storage should you have a power failure and one of the IBFs or UPSs fail.

If you lose just one CF because of a power failure, the contents would normally be automatically rebuilt in the alternate CF. In this case, processing continues with the alternate CF, and there is really no benefit from the fact that the contents of the failed CF are still intact. However, if there is a power failure that impacts both CFs, the contents of both CFs will be intact when power is restored. This can have a significant benefit when restarting subsystems that had data in the CF. For example, when restarting DB2, if all the locks are still intact in the CF, it is not necessary to read through the DB2 logs to recreate this information, thus enabling DB2 to restart in significantly less time.

Depending on the zSeries model, you may also be able to use a specific feature called POWERSAVE:

– On CPCs prior to the z990, there was a capability called POWERSAVE. If you select this option using the MODE POWERSAVE CFCC command, the CF uses the UPS or IBF power to keep the whole CF running for the RIDEOUT time (which defaults to 10 seconds), after which all power is diverted to maintaining the CF memory. Once the RIDEOUT interval expires, the CF appears unavailable to any system that tries to access it.

– On z990 CPCs, POWERSAVE is not available, meaning that the whole CF will remain powered up until the battery runs out or external power is restored. As a result, the amount of time that an IBF will maintain your CF on a z990 is less than on a previous generation CPC.
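For reference, the volatility mode is set from the CF console with the CFCC MODE command. The following is a minimal sketch; which mode is appropriate depends on your power configuration, and POWERSAVE applies only to pre-z990 CPCs:

MODE NONVOLATILE     (the CF is protected by a UPS)
MODE POWERSAVE       (pre-z990 with IBF: ride out short outages, then preserve memory)
MODE VOLATILE        (no battery or UPS protection)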

2.3.8 Nondisruptive Coupling Facilities hardware upgrades
The CFCC code includes commands giving you the ability to vary CPs and CF links online and offline. Whether you can nondisruptively upgrade the CPC containing the CF and start using that new hardware depends on whether the CPC supports the dynamic upgrade capability, and possibly on whether there is an operating system running on the CPC that can drive the dynamic reconfiguration process. Table 2-3 summarizes CF dynamic upgrade capabilities.

Table 2-3 CF dynamic upgrade capabilities

Resource          CF shares CPC with an OS that can       CPC only contains CF LPARs
                  drive dynamic reconfiguration

ICF or CP         If RESERVED ICFs are defined in the     If RESERVED ICFs are defined in the
                  PR/SM™ profile for the CF LPAR, you     PR/SM profile for the CF LPAR, you
                  can bring the new engines online        can bring the new engines online
                  using the CP xx ONLINE CF command.      using the CP xx ONLINE CF command.

CF link (note 1)  Activate the new IOCDS using an OS      Do a POR using the new IOCDS. When
                  that supports dynamic                   the CF LPAR comes up, it will have
                  reconfiguration, then issue the         the new links available.
                  CONFIGURE chpid ONLINE CF command
                  to vary the new link online.

Memory            Change the LPAR profile, then           Change the LPAR profile, then
                  deactivate and reactivate the CF        deactivate and reactivate the CF
                  LPAR.                                   LPAR.

Note 1: The ability to dynamically add CF links is available on all z990s and z890s. It is also available on the z800 and z900 at current service levels. CF links cannot be dynamically added to a 9672.

Adding CPs nondisruptively
To add engines to a CF without restarting the CF, you must define RESERVED CPs in advance in the LPAR profile for the CF. If this has been done, and you add an appropriate type of PU (CP or ICF) to the CPC, the CP cp_addr ONLINE CF command can be used to bring the additional engine online.

Adding CF links nondisruptively
Adding CF links requires that the zSeries CPC the CF is running on also contains an MVS-based LPAR, because the dynamic reconfiguration process must be driven by an MVS-based system. Once the new configuration has been activated, and assuming the CF LPAR has been placed in the access list for the new links, the links can be added to the CF LPAR nondisruptively using the CONFIGURE chpid ONLINE CFCC command.

CF storage
It is not possible to add storage to a CF LPAR without deactivating and reactivating the CF LPAR.

2.4 9037 Sysplex Timers considerations

The 9037 provides the synchronization for the Time-of-Day (TOD) clocks of multiple CPCs, and thereby allows events started by different CPCs to be properly sequenced in time. When multiple CPCs update the same database and database recovery is necessary, all updates are required to be time-stamped in proper sequence.

In a sysplex environment, the allowable difference between the TOD clocks in different CPCs is limited to the inter-CPC signalling time, which is very small. Some environments also require that TOD clocks be accurately set to an international time standard. The Sysplex Timer and the Sysplex Timer attachment feature enable these requirements to be met by providing an accurate clock-setting process, a common clock-stepping signal, and an optional capability for attaching to an external time source.

It is a requirement that all systems in a multi-CPC sysplex are connected to the same set of Sysplex Timers. If one of these systems loses access to the signals from both timers, it will go into either a disabled or an enabled wait state. It is for this reason that the availability of the Sysplex Timers is so important in a sysplex. To minimize the risk of having no functioning Sysplex Timer, you should always have two Sysplex Timers. In addition, if your sysplex is spread over two sites, you should place one timer in each site.

2.4.1 Sysplex Timer® Models
The 9037-001 Sysplex Timer reached end of service on December 31, 2003, and therefore is not discussed further in this document. Any customers still using 9037-001 timers should plan on upgrading to 9037-002s as soon as possible. Note that the upgrade is disruptive: You should only plan on being able to keep one CPC running during the migration. An updated migration procedure entitled Migration Planning for the IBM 9037 Model 2 Sysplex Timer is available on the Web at:
http://www.redbooks.ibm.com/redpapers/pdfs/redp3666.pdf

The recommended configuration in a Parallel Sysplex environment is the 9037 Expanded Availability configuration. This configuration is fault-tolerant to single points of failure and minimizes the possibility that a failure can cause a loss of time synchronization information to the attached CPCs. In an Expanded Availability configuration, the TOD clocks in the two 9037s are synchronized using the hardware on the Control Link Oscillator (CLO) card and the CLO links between the 9037s. Both 9037s simultaneously transmit the same time synchronization information to all attached CPCs. The connections between the 9037 units are duplicated to provide redundancy, and critical information is exchanged between the two 9037s every 1.048576 seconds, so if one of the 9037 units fails, the other will continue transmitting to the attached CPCs. Figure 2-7 on page 42 contains an example of an Expanded Availability configuration. Note that this example only supports a maximum of 3 km between the Sysplex Timers. For greater distances, some form of repeater or DWDM must be used.


[Diagram: Two 9037-2 units (24 ports each) in an Expanded Availability configuration, connected by redundant CLO fiber links (3 km maximum), each with an RS232 connection to active and standby consoles on token rings, preferred and standby external time sources (ETS), and fiber links (3 km maximum) from each 9037 to ports 0 and 1 of the Sysplex Timer attachment feature in each CPC.]

Figure 2-7 Sysplex Timer Expanded Availability configuration

Redundant fiber optic cables are used to connect each 9037 to the Sysplex Timer attachment feature on each CPC. Each Sysplex Timer attachment feature consists of two ports: The active or stepping port, and the alternate port. If the CPC hardware detects that the stepping port is non-operational, it forces an automatic switchover to the alternate port, so that the TOD then steps to the signals received from that port. This switchover takes place without disrupting CPC operation. Note that the 9037s do not switch over, and are unaware of the port change at the CPC end. For an effective fault-tolerant Expanded Availability configuration, Port 0 and Port 1 of the Sysplex Timer attachment feature in each CPC must be connected to different 9037 units.

There are three options for configuring 9037-002s attached to each other over distances greater than the 3 meters that was supported with 9037-001s:
 For distances up to 3 km, the 9037-002s can be connected to each other directly using either 62.5/125-micrometer or 50/125-micrometer multimode fiber, without requiring any additional hardware or RPQs.
 For distances greater than 3 km and less than 26 km, you have two choices:
– You can use 9036-003 repeaters, orderable via RPQ 8K1919. Each RPQ 8K1919 provides one 9036-003 repeater, and two are required for each link (remember that there are two links connecting the 9037-002s, so you will need four RPQs to connect the two 9037-002s). Note that the length of each multimode cable cannot exceed 3 km, and the distance between the repeaters cannot exceed 20 km.
– You can use an IBM-qualified Dense Wavelength Division Multiplexer (DWDM). Note that the length of each multimode cable cannot exceed 3 km and the total fiber distance between the 9037-002s cannot exceed 26 km.

 For distances greater than 26 km and less than 100 km, you must use an IBM-qualified DWDM, and you must order RPQ 8P1955. The length of each multimode cable cannot exceed 3 km. More importantly, even though the maximum supported distance between a Sysplex Timer and a connected CPC is 100 km, the maximum distance between the two Sysplex Timers is still 40 km. Figure 2-8 shows a sample extended distance sysplex configuration.

[Diagram: An extended distance sysplex. CPCs in two sites are connected through DWDMs and an amplifier, with the ETR links, CLO links, and CF links all carried over the DWDM connections. Maximum distances: Timer to Timer 40 km, Timer to CPC 100 km, CPC to CF 100 km.]

Figure 2-8 Extended distance sysplex support

2.4.2 Recovering from loss of all timer signals
If a CPC loses access to the Sysplex Timer signals on one of its two ETR ports, all OS/390 and z/OS systems on that CPC will issue message IEA262I, informing you that one of the Sysplex Timer signals has been lost. It is very important that this message is trapped and acted upon, so that the signal is restored as quickly as possible.

If the second Sysplex Timer signal is lost, message IEA015A is issued (assuming APAR OW44231 is applied or integrated into your system), giving you the option to restore the signal and let the system continue processing, or, if you are unable to restore the signal in a timely manner, to place the system in a disabled wait. It is important that operators understand the importance of this message and take action immediately. See WSC FLASH W10041 for more details about the changes introduced by this APAR. You should consider implementing some automation to monitor for this message and send an e-mail or an alert to at least two responsible people.
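As a starting point for that automation, the MPFLSTxx entries below (a sketch, not a complete definition) keep both messages displayed and mark them eligible for your automation product, which would then generate the e-mail or alert:

IEA262I,SUP(NO),AUTO(YES)
IEA015A,SUP(NO),AUTO(YES)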

2.4.3 Maximizing 9037 availability
The 9037-002 has two power supplies and dual power cords. You should ensure that there are as few single points of failure as possible between the power sources used for the two power cords. Also ensure that both are connected to a UPS to protect the device from power failures and glitches.

The 9037 has its own console. It also supports a standby console that can take over in case of a failure in the active console. We recommend that the same PCs be used for the Sysplex Timer console and the ESCON Director console, with one console located close to the Timers and another placed in the operations area, close to the operators. In normal operation, the active console should be in the operations area, where the operators can see it. However, if a service representative is working on a Timer, the alternate console beside the Timer can be activated at that time. In addition, even though it is possible to run the console application on the HMC, we recommend using dedicated PCs for the consoles, rather than merging the function onto the HMCs.

If possible, place the two 9037s in locations that are isolated from each other, so that an environmental problem (flooding or fire) will not result in both Sysplex Timers being unavailable. If fiber optic extenders or DWDMs are being used, ensure that the power source of at least one of the extender pairs (on the same CLO link) is different from the power source of the 9037 in the primary data center. If you have more than two Sysplex Timers, ensure that each ETR network has a unique network ID. Physically separate the routing of the Control Link Oscillator (CLO) link and Sysplex Timer port cables.

We recommend that the 9037s be set to GMT (not local time) and that all time changes (for Daylight Saving Time) be implemented using the Sysplex Timer time zone facility. Your IBM representative can access a suggested procedure for implementing the time changes at the following Web site:
http://nascpok.pok.ibm.com/miscellaneous/misc.htm

2.4.4 Message time ordering
As processor, CF, and CF link technologies have improved over the years, we are approaching the point where it is possible to exchange signals between CPCs in less time than the maximum time difference that can exist in an ETR network. To help ensure that any exchange of time-stamped information between systems in a sysplex involving the Coupling Facility observes the correct time ordering, time stamps are now included in the message-transfer protocol between sufficiently fast systems and CFs.

If the operating system and the CF it is connected to are both running on a z900 Turbo (2064-2xx) or faster CPC, the system and the CF will time stamp all messages exchanged between them. In order to be able to time stamp its messages, any CF running on such a CPC must now be connected to the same Sysplex Timer as the systems it serves. To enable this support in z/OS, you must apply APAR OW53831 to any system running on an appropriately fast CPC. Figure 2-9 on page 45 shows the required connections when all CPCs are z900 Turbos or z990s, and at least one CF is in a stand-alone processor.


Figure 2-9 Example configuration for Sysplex Timer with Message Time Ordering

2.5 Intelligent Resource Director

Intelligent Resource Director (IRD), announced on October 3, 2000, is one of the capabilities available on the IBM zSeries range of processors and delivered as part of z/OS. You could say that IRD is the next stage of Parallel Sysplex evolution: Where Parallel Sysplex gives you the ability to share data among several operating systems, not all applications are sysplex-enabled, and IRD complements this by giving you the ability to move resources between workloads within one processor.

IRD uses facilities in z/OS Workload Manager, Parallel Sysplex, and PR/SM. Compared to other platforms, z/OS with WLM already provides benefits from the ability to drive a processor at 100 percent, while still providing acceptable response times for your critical applications. Intelligent Resource Director amplifies this advantage by helping you make sure that all those resources are being utilized by the right workloads, even if the workloads exist in different Logical Partitions (LPARs).

Intelligent Resource Director is not actually a product or a system component; rather, it is three separate but mutually supportive functions:
 WLM LPAR CPU Management
 Dynamic Channel-path Management (DCM)
 Channel Subsystem I/O Priority Queueing (CSS IOPQ)

Intelligent Resource Director is implemented by new functions in:
 z/OS (in 64-bit mode)
 Workload Manager (WLM), which must be in goal mode
 IBM zSeries 900 and later CPCs

2.5.1 An IRD Illustration
To illustrate how IRD works in a mixed workload environment, here is an example:

 You have three workloads running on one server:
– Online transactions, your most important workload. This runs only during the day shift.
– Data mining, which has medium importance. This is always running, and will consume as much resource as you give it.
– Batch, which is your lowest importance work. Like data mining, it is always running, and will consume as much resource as you give it.
 In this example, the server is divided into two logical partitions (both partitions are in the same sysplex):
– Partition 1 runs both the online transactions and the batch work, as they happen to share the same database.
– Partition 2 runs the data mining work.

Figure 2-10 IRD example: Day shift

Figure 2-10 shows a day shift configuration. As the online transaction workload is the most important, partition 1 is given a high enough weight to ensure that the online transaction work does not miss its goals due to CPU delay. Within the partition, the existing Workload Management function is making sure that the online transaction work is meeting its goals before giving any CPU resource to the batch work. The DASD used by the online transaction work is given enough channel bandwidth to ensure that channel path delays do not cause the work to miss its goals. The channel subsystem I/O priority ensures that online transaction I/O requests are handled first. Even though the batch work is running in partition 1 (with the

increased partition weight and channel bandwidth), the data mining I/O requests will still take precedence over the batch I/O requests if the data mining work is not meeting its goals.

Figure 2-11 IRD Example: Night shift

Figure 2-11 shows the night shift, when there are no more online transactions. If the partition weights had remained the same, then the batch work would be consuming most of the CPU resource, and using most of the I/O bandwidth, even though the more important data mining work may still be missing its goals. LPAR CPU Management automatically adjusts to this change in workload, adjusting the partition weights accordingly. Now the data mining work will receive the CPU resource it needs to meet its goals. Similarly, dynamic channel path management will move most of the I/O bandwidth back to the data mining work.

2.5.2 WLM LPAR CPU Management
Prior to IRD, a few options were available for controlling the distribution and allocation of CPU resources in an LPAR environment.

You had the following options:
 Use shared or dedicated CPs.
– A dedicated CP is only available to one LP.
– A shared CP is available to up to 15 LPARs.
 Use capping, which limits an LP's use of the CPU resource defined in the LPAR's image profile.
 Have an operator change the weights and capping status of LPARs dynamically using the HMC.
 Have an operator vary logical CPs online or offline to z/OS dynamically,

using the CF CPU (x),ONLINE | OFFLINE command.

The problem with handling all of this manually is that you have to decide in advance what you need, bearing in mind the total available capacity, the total requirements, the requirements of each LP, the "spikiness" of each workload, minimizing overhead, and minimizing future disruptions. Changing an LPAR from dedicated to shared CPs requires deactivation of the LP. Capping can prevent available CP resource from being utilized. How does an operator know that an LP's weight needs to be increased, or that an LPAR is suffering a CPU resource problem? Does an operator know the importance of one workload relative to another? Does an operator have detailed knowledge of workloads and monitoring tools? Do operators have the time and skills for this type of analysis?

WLM LPAR CPU Management is implemented by z/OS Workload Manager (WLM) goal mode and the IBM zSeries PR/SM LPAR scheduler Licensed Internal Code (LIC). WLM LPAR CPU Management actually consists of two parts:
 WLM LPAR Weight Management
– Automatically changes the weight of a logical partition.
– Based on analysis of the current workloads by WLM.
 WLM Vary CPU Management
– Automatically varies a logical CP online or offline in an LP.
– Based on analysis and requests from WLM.

To take advantage of WLM LPAR CPU Management, the following prerequisites have to be fulfilled:
– Be running z/OS in 64-bit mode.
– Be running on an IBM z900 or later in LPAR mode.
– Not be running under VM.
– Be using shared CPs.
– Not be capped using traditional LPAR capping.
– Be in WLM goal mode.
– Have access to a Coupling Facility.

Note that from z/OS 1.2, WLM LPAR CPU Management can also manage the weights of Linux LPARs that are using CPs (as opposed to IFLs).

More information on prerequisites and considerations can be found on the WLM/IRD Web site:
http://www-1.ibm.com/servers/eserver/zseries/zos/wlm/documents/ird/ird.html

2.5.3 Dynamic Channel-path Management (DCM)
Dynamic Channel-path Management (DCM) is implemented by exploiting new and existing functions in software components (implemented in z/OS 1.1 and later z/OS versions), such as WLM, IOS, HCD, and Dynamic I/O Reconfiguration; and in hardware components, such as the IBM zSeries CPC, ESCON Directors, and DASD controllers. DCM provides the ability to have the system automatically manage the number of ESCON and FICON Bridge (FCV) I/O paths available to supported DASD subsystems.

DCM provides a number of benefits:
 Improved overall I/O performance.
– Prior to DCM, you had to manually balance your available channels across your I/O devices, trying to provide sufficient paths to handle the average load on every controller. This means that at any one time, some controllers probably have more I/O paths available than they need, while other controllers possibly have too few.

– DCM can provide improved performance by dynamically moving the available channel bandwidth to where it is most needed. DCM attempts to balance the responsiveness of the available channels by moving channels to the controllers that require additional bandwidth.
 A simplified I/O configuration definition task.
– Prior to DCM, you had to decide how many paths were required for each CU to provide acceptable performance; decide which CUs should share channels, trying to identify ones that are busy at different times; decide which paths to use for each CU to balance utilization; select paths that minimize points of failure; define up to eight paths to each CU; and monitor and tune on an ongoing basis.
– With DCM, you only need to estimate the maximum channel bandwidth required to handle the workload on the managed CUs at the peak time, and define at least two non-managed paths plus the maximum number of managed paths you are likely to need for each CU.
 Reduced skills required to manage z/OS. DCM is self-tuning, so it should be possible to spend considerably less time on configuration management, monitoring, and capacity planning.
 Maximized utilization of installed hardware. DCM can help you drive more traffic to your DASD subsystems without necessarily having to invest in additional channels. DCM may let you increase the overall average utilization of your channels without adversely impacting the response time for the connected subsystems.
 Enhanced availability.
 A reduced need for more than 256 channels.

Requirements
To be able to fully benefit from DCM, you have to fulfill the following requirements:
– An IBM 2064 or later CPC, in Basic or LPAR mode.
• If running in Basic mode, a Coupling Facility (CF) is not needed for DCM. This is because DCM only uses the CF to share information between LPARs that are in the same LPAR cluster; as no LPAR cluster exists on a CPC in Basic mode, there is no need for a CF.
• If you are running in LPAR mode, a CF is required if you wish to use DCM in any LPAR containing a system that is a member of a multisystem sysplex, even if that LPAR is the only member of the sysplex on this CPC.
– A Coupling Facility (with CFCC level 9 or higher), if in LPAR mode with a multisystem sysplex.
• The WLM structure that resides in the CF does not have any specific failure isolation requirements; however, the CF must be using CF level 9 or higher. CF level 9 is only supported on the IBM 9672-Rn6 (including the 9672-R06), Yn6, Xn7, and Zn7 (that is, 9672 G5 and G6), or any model of the IBM zSeries 900. It is perfectly acceptable to place the structure in the same failure domain as the connected LPARs, and there are no specific considerations for a double failure (that is, a single failure taking out both the CF and one or more connected operating systems). You do not need a CF if all the LPARs using DCM are running in XCFLOCAL or Monoplex mode; because these systems are not in a multisystem sysplex, there is no other LPAR that they will share the managed channels with, and therefore no need for the WLM structure.

– ESCON channels. As stated previously, DCM currently only works with ESCON and FICON Bridge (FCV) channels.
– An ESCON Director (any model).

The managed channels must be connected to an ESCON Director.

– DASD control units with the required support. To support DCM, a DASD control unit needs to:
• Fully support non-synchronous I/O operations. This means that the device should never transfer data directly to or from the channel; fully non-synchronous devices always transfer data to and from the channel via the cache.
• Return valid node descriptor information. The information that the control unit places in the tag field of the node descriptor is used to identify single points of failure. There is no pre-defined layout for the contents of this field, so it is possible that a given control unit may not provide information in the format expected by DCM.

Check the IRD PSP subset of the 2064 DEVICE PSP bucket for information about any IBM microcode upgrades that may be required. At the time of writing, the following IBM devices are supported by Dynamic Channel-path Management when connected through an ESCON or FICON Bridge channel:
– IBM 9393 RAMAC® Virtual Array
– IBM 2105 Enterprise Storage Server®

For non-IBM devices, please contact the device manufacturer to determine if the device is supported by Dynamic Channel Path Management.

For the latest news on prerequisites and requirements go to the WLM/IRD Web site at: http://www-1.ibm.com/servers/eserver/zseries/zos/wlm/documents/ird/ird.html
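To give a flavor of the definition side, a managed channel is defined in HCD/IOCP as shared, with no explicit control unit connections, and with the IOCLUSTER keyword naming the sysplex that will manage it. The CHPID number, switch, and cluster name in this sketch are invented:

CHPID PATH=(40),TYPE=CNC,SWITCH=61,SHARED,IOCLUSTER=PRODPLEX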

2.5.4 Channel Subsystem I/O Priority Queueing
Channel Subsystem I/O Priority Queueing is an extension of the existing concept of I/O priority queueing. Previously, I/O requests were handled by the channel subsystem on a first-in, first-out basis. This could at times cause high-priority work to be delayed behind low-priority work. With Channel Subsystem I/O Priority Queueing, if important work is missing its goals due to I/O contention on channels shared with other work, it is given a higher channel subsystem I/O priority than the less important work. This function goes hand in hand with the Dynamic Channel-path Management described above: As additional channel paths are moved to control units to help an important workload meet its goals, Channel Subsystem I/O Priority Queueing ensures that the important workload receives the additional bandwidth before less important workloads that happen to be using the same channels.

WLM sets the priorities using the following basic scheme:
 System-related work is given the highest priority.
 High-importance work missing its goals is given a higher priority than other work.
 Work meeting its goals is managed so that light I/O users have a higher priority than heavy I/O users.

 Discretionary work is given the lowest priority in the system.

Channel Subsystem I/O Priority Queueing requires z/OS and a zSeries server in 64-bit mode. It works in both Basic mode and LPAR mode, does not require a Coupling Facility structure (there is no requirement for a CF, regardless of the mode), and must be enabled at the CPC level. All device types and channel types are supported, and Channel Subsystem I/O Priority Queueing is transparent to devices.

For further discussion of the planning and implementation of IRD, refer to z/OS Intelligent Resource Director, SG24-5952. The latest information regarding prerequisites and considerations can be found on the WLM/IRD Web site:
http://www-1.ibm.com/servers/eserver/zseries/zos/wlm/documents/ird/ird.html

2.6 Switches

Depending on the type of channels in your configuration, you will require different kinds of switches:
 For ESCON channels, you can use the IBM 9032 ESCON Director.
 For FICON channels, IBM markets different models. Two types of switch are available:
– Those that provide a Control Unit Port are referred to as FICON Directors.
– Those that do not provide a Control Unit Port are referred to as FICON-capable Fibre Channel Switches.

When discussing availability and the role played by the switches, you need to bear in mind the type of device you are connecting. If a device only supports a single host connection (like an old printer or a 3174, for example), the only way to ensure the highest level of availability for those devices is to have more than one of them. Having multiple switches will not help the device availability in these cases. For devices that support two host attachments, you should provide connectivity to those devices through two switches. For devices that support more than two host attachments, you should provide connectivity through at least two switches, and also exploit the high-availability features of the switch where appropriate. In this section, we concentrate on those features.

2.6.1 ESCON Directors
IBM produced multiple models of ESCON Directors: The 9033-001, 9032-002, 9032-003, 9032-004, and 9032-005. The only model still being marketed is the Model 5, so that is the one we concentrate on. Detailed information about the 9032-005 can be found in the IBM ESCON Director 9032-5 Presentation, SG24-2005.

ESCON Director availability features
One of the optional features that can be ordered on a 9032-005 is the Enhanced Availability Feature. This feature consists of three parts:
 Control Processor (CTP) Enhanced Availability Feature
The CTP Enhanced Availability Feature provides redundancy for the CTP card function, allowing concurrent replacement of a CTP card and concurrent LIC changes. LIC updates are disruptive if the CTP Enhanced Availability Feature is not installed.
 Matrix Controller/Matrix Switch (MXC/MXS) Enhanced Availability Feature
The MXC/MXS Enhanced Availability Feature provides redundancy for the MXC/MXS card function and allows for concurrent replacement of MXC and MXS cards.
 Token Ring (TKRG) Enhanced Availability Feature
The TKRG Enhanced Availability Feature provides redundancy for the TKRG card function. While the TKRG card can be replaced nondisruptively even if this feature is not

installed, it would not be possible for the Director to communicate with the ESCON Director console until the card is replaced. This feature provides a second token ring connection.

The ESCON Director provides the ability to swap a connection from one port to another without requiring any changes in HCD. This function (known as port swapping) can be invoked from the ESCON Director console and requires that you have an unused port available that you can swap the connection to. If you have a fully configured ESCON Director (that is, one that already has 248 ports), we recommend installing the spare port feature. This feature provides four additional ESCON ports that can be used for port swapping in the event of port failures, or for problem determination. This eliminates the potential difficulty of trying to obtain an unused ESCON port if problems arise. The spare ports are not available to provide additional connectivity; they can only be used as substitutes for existing ESCON port addresses.

Even with all of these features installed, there are still some parts that cannot be replaced while the ESCON Director is operational. So, you should always have at least two ESCON Directors to be able to maintain connectivity to your devices in case of a planned or unplanned outage of an entire Director.

Configuring for resiliency
In addition to the things you can do to improve the availability of the Director itself, there are also things you can do to ensure that the failure of a Director does not result in complete loss of access to one or more devices. We recommend that the paths from a given system to any multi-path device be spread across at least two Directors. This ensures that if you lose a Director, you at least maintain connectivity. In addition, the more Directors you can spread the paths across, the smaller the impact on each device (in terms of reduced bandwidth) should you lose a Director.

Within a Director, we recommend that you spread the connections from a given system or a given control unit across different Director Device Port (DVP) cards. This ensures that the loss of a DVP card will not remove all paths to a device, and will have minimal impact in terms of lost bandwidth. You should also attempt to spread connections from a given system or a given control unit across multiple quadrants in the Director. The Director port cards are divided into quadrants, with each quadrant representing a potential failure boundary. It is possible to lose an entire quadrant, so spreading the connections across multiple quadrants once again limits the impact of a quadrant failure.

Plan the Director configurations for future expansion. If the installation of a new device requires the installation of new DVPs, you could end up with all the paths to the new CU/device on a single DVP or a single quadrant.

Managing the Directors
We highly recommend that you define a Control Unit Port for each Director in your configuration. There are two management points for an ESCON Director: One is the ESCON Director console, and the other is the z/OS system(s) it is connected to. Any errors in the Director will be reported on the Director console. However, if a Control Unit Port is defined in HCD for the Director, all internal Director problems will also be reported back to the operating system.

We recommend having at least two PCs that contain the ESCON Director console application. It is not necessary to have two per ESCON Director; one PC can act as the console for up to 16 ESCON Directors.


We also recommend that you do not use the HMC as the ESCON Director console. We do recommend using the same PC to act as the console for both the ESCON Directors and the Sysplex Timers. There should be one console close to the Directors, for the service representative to use while working on the Directors. There should also be a console in the operator’s bridge area, so the operators can watch for errors or alerts.

The ESCON Director console is not required for continuous operation of the Directors, nor is it required to recover after a Director is powered off and on. One matrix configuration file, the IPL file, is stored in the CTP card. If the defaults are accepted (the Active = Saved option is selected), this file will be the most recently activated matrix configuration. After a power off/on, Director operation will be restored and the matrix configuration stored in the CTP card will be active. We recommend that the Director console always be available in order to:
– Maintain copies of matrix configuration files.
– Provide error reporting and log functions.
– Provide the current status of the entire Director network.

The console is not required for activating matrix configuration changes from the host using the I/O Operations component of System Automation for OS/390 (SA/390). The use of I/O Operations reduces the requirement for physical access to the ESCON Director console, but it does not remove the requirement for the console to be available. The console is required if matrix configuration changes are to be saved to the hard disk in the console PC. For maximum availability, a backup console should be maintained if the Directors provide access to critical system or application data. There are two options for configuring the backup console:
– Maintain a PC with all current configuration files but without attachment to the LAN. This PC may have the same IP address as the primary Director console. In the event of a failure of the primary console, its LAN connection can be transferred to the secondary console. This is referred to as a replacement console.
– Maintain a PC with all current configuration files and LAN attachment. An additional unique IP address will be required for this PC. The ESCON Director console application on the backup PC must not be active. In the event of a failure of the primary console, the ESCON Director console application can be started on the secondary console. This is referred to as a backup console.

For normal operations and management, we recommend the use of I/O Operations rather than the Director console.

2.6.2 FICON Switches
At the time of writing, IBM markets multiple FICON switches: The Inrange FC9000, the McDATA Intrepid, and the IBM 2109 Model 12.

The FC/9000 and the Intrepid feature N+1 redundancy for all of their elements. This redundancy results in high availability: Even if an internal element fails, the FICON switch remains operational. All activities, including code loads, code activation, and the replacement of failed parts, can be done nondisruptively. Nevertheless, to protect against the loss of an entire switch, you should always configure at least two FICON switches and spread the channels between the two Directors.

Before installing the FICON switch, consider where to connect your FICON channels from the server and the control unit ports, as well as the ISLs (if applicable), based on your requirements:
– Distribute the channels among different cards.

If two channels are defined to access the same CU, plug both fiber optic cables into cards in different switches.
– Distribute the CU ports among different cards. If two paths are defined to attach the CU to the server via a switch, connect both fiber optic cables to ports in cards in different switches.
– Distribute the ISLs across different cards. If two or more ISLs are to be attached between the switches, connect the fiber optic cables to ports as per the switch vendor's recommendations.

Following these simple rules will ensure that there is always one path available between the server and the CU in case of a defective card in the switch. It is also recommended to have two separate, independent, and uninterruptible power sources and circuit breakers for the switches.

A FICON channel in FICON native (FC) mode allows access to FICON native interface control units, either directly by a FICON channel in FC mode (point-to-point), or from a FICON channel in FC mode connected in series through one or two Fibre Channel switches (FICON Switches). Channel-to-channel (FCTC) is also supported in this mode.

A Cascaded FICON Director
Though the IBM 9672 G5/G6 processors only support a single-switch topology, known as switched point-to-point, the zSeries processors support single- and dual-switch topologies. A two-switch configuration is known as cascaded FICON Switches. The cascaded FICON Director configuration is supported by the zSeries (z800, z890, z900, and z990) servers only. In a cascaded environment, both FICON Directors must be from the same vendor and be at the same firmware level.

In a cascaded FICON Director connection:
– At least three Fibre Channel (FC) links are needed in the channel-to-control unit path: one between the FICON channel card (N_Port) and the FICON Director port (F_Port), then internally within the switch (through the backplane) to another port (E_Port) that connects to the second FICON Director's E_Port via the second FC link, and then to a FICON adapter card in the control unit (N_Port) via the third FC link. With this configuration, the connection between sites can consist of multiple FC links. A sample configuration of cascaded FICON Directors is shown in Figure 2-12.
– Multiple channel images and multiple control unit images can share the resources of the Fibre Channel link and Fibre Channel switches, such that multiplexed I/O operations can be performed.
– Channels and control unit links can be attached to the Fibre Channel switches in any combination, depending on configuration requirements and on available switch ports.

Sharing a control unit through a Fibre Channel switch means that communication from a number of channels to the control unit can take place either over one switch-to-CU link (where the control unit has only one link to the Fibre Channel switch), or over multiple link interfaces (where the control unit has more than one link to the Fibre Channel switch). Just one Fibre Channel link is attached to a FICON channel in a cascaded FICON Director configuration. However, from the FC switch (FICON Director), the FICON channel can communicate with a number of FICON CUs on different ports of the second FC switch. Once at the control unit, the same control unit and device addressing capability exists as for a point-to-point configuration. However, the communication and addressing capability is greatly increased for the channel when connected to an FC switch, with the ability to use the domain and port address portion of the 24-bit N_Port address (8 bits for the domain and 8 bits for the port) to access multiple control units. Note that the domain address portion of the FC 24-bit port address differs between the two ends, since there are two FC switches in the channel-to-control unit path.
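To make the addressing scheme concrete, consider a hypothetical 24-bit N_Port ID of X'610400' (the values are illustrative placeholders, not taken from a real configuration):

   Domain (bits 23-16):  X'61'  - identifies the switch (FICON Director) in the fabric
   Port   (bits 15-8):   X'04'  - identifies the port on that switch
   Low byte (bits 7-0):  X'00'  - arbitrated loop/constant field for fabric-attached ports

In a cascaded configuration, the two-byte link address X'6104' is what allows a channel attached to the first Director to reach a control unit port on the second Director (domain X'61').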

Figure 2-12 Cascaded FICON Director configuration

One benefit of a cascaded configuration is that you can reduce the number of fibers between the two locations. However, you need at least four FICON Directors in such an environment to achieve a reasonable level of availability.

For detailed information about the various FICON switches, refer to the following IBM Redbooks:
– Getting Started with the INRANGE FC/9000 FICON Director, SG24-6858
– Getting Started with the IBM 2109 M12 FICON Director, SG24-6089
– Getting Started with the McDATA Intrepid FICON Director, SG24-6857

2.7 DASD

As with all the other hardware components in a sysplex configuration, when looking at our DASD, we have two objectives from an availability perspective:
– We want to create a configuration that will survive the loss of any single component within the subsystem.
– We want to configure so that we can survive, or at least recover from, a complete loss of the whole subsystem.

When choosing which DASD an application should use, there are a number of things to consider. We have to select a device that meets the availability and performance requirements of the application. The most important devices within the sysplex are probably the system and sysplex volumes—if the sysplex is not up, there is little benefit in having the application volumes available. The following lists availability-related features and capabilities of modern DASD subsystems. Some of the features are standard, some are automatically enabled if present, some are optional, and some are not available on all DASD types:
– Independent dual power feeds
– N+1 power supply technology/hot-swappable power supplies and fans
– N+1 cooling
– Battery backup
– Non-volatile subsystem cache, to protect writes that have not yet been hardened to DASD
– Nondisruptive maintenance
– Concurrent LIC activation
– Concurrent repair and replace actions
– RAID architecture
– Redundant microprocessors and data paths
– Concurrent upgrade support (that is, the ability to add disks while the subsystem is online)
– Redundant shared memory
– Spare disk drives
– FlashCopy/SnapShot/Concurrent Copy
– Remote copy (PPRC and/or XRC)
– HyperSwap™ support (GDPS/PPRC function)

With all of the available features above in mind, we still need to make the configuration fault tolerant and ensure that we have continuous availability for the applications that use the device. Following are some recommendations:
– Establish multiple/redundant channel paths to all critical devices to eliminate single points of failure. For redundant channel paths, ensure the following:
  – Configure paths through different ESCON/FICON Directors.
  – When using EMIF to share channels between LPs, remember that this means that the failure of a single channel can now impact a number of LPs instead of just one.
  – Ensure that I/O connections for redundant paths are not connected to the same I/O driver card or board at the control unit.
  – The number of paths you should configure depends on the device capabilities and the performance requirements of the users of that device. Regardless of capacity considerations, however, you should always configure at least two paths to every device. (A sample path verification command is shown after this list.)
– In a high-availability environment we strongly recommend implementing some kind of DASD mirroring, such as PPRC, XRC, or a non-IBM equivalent. Note that at the time of writing, both HDS and EMC have licensed the PPRC and XRC architectures from IBM, so it should be possible to use a common remote copy mechanism across all your IBM, EMC, and HDS DASD, assuming the installed devices provide the required support.
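To check that a device actually has multiple online paths, you can use the z/OS DISPLAY MATRIX command from a console. A minimal sketch, assuming a hypothetical device number of 1200:

   D M=DEV(1200)

The response (message IEE174I) lists each defined channel path to the device and its status, so you can verify that at least two paths, routed through different Directors, are online.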

We briefly describe the PPRC/XRC concept and implementation on the Enterprise Storage Server (ESS). For a description of the PPRC/XRC implementation on DASD other than ESS, consult the appropriate documentation. Before deciding which technique to use, you need to determine the business needs of your applications. IBM remote copy is designed to shadow or mirror data from one site to another in either an asynchronous or synchronous fashion. In the event of a disaster that renders the application or primary site unusable, the recovery or secondary site can use the remote data to take over the workload. When the site containing the primary DASD is repaired, the data can be resynchronized and the workload can be moved back. This continuous copying of data from disks in one location to another is achieved through remote copy. Quite often you will hear the terms Recovery Time Objective (RTO) and Recovery Point Objective (RPO) when discussing a customer's disaster recovery implementation:
– The Recovery Time Objective is the amount of time your business can afford to function without access to its applications. This determines how much time is available, following a disaster, to get all the applications up and running again.
– The Recovery Point Objective is the amount of data that you can afford to recreate or lose as a result of a disaster.
If your RPO is zero data loss, a synchronous remote copy solution is generally the only option. If your RPO is a few minutes, and there is a significant distance between your sites, XRC may be an appropriate solution.
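As a worked example (the figures are hypothetical, not recommendations): suppose a business decides it can tolerate at most four hours without its applications (RTO = 4 hours) and can afford to lose at most five minutes of committed updates (RPO = 5 minutes). A tape-based recovery that takes 24 hours to restore fails the RTO, and any asynchronous solution whose secondary site routinely lags more than five minutes behind fails the RPO. In that case an asynchronous solution such as XRC, which typically keeps the secondary only seconds to minutes behind, could meet both objectives; an RPO of zero would instead force a synchronous solution such as PPRC.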

2.7.1 Peer to Peer Remote Copy (PPRC)
PPRC performs synchronous real-time mirroring of data from one logical volume to another. The target logical volumes can be in the same ESS, or they can be in another ESS located at a site up to 103 km away. It is important to remember, however, that the greater the distance between the sites, the greater the impact on the response times of those volumes. As sites get farther apart in a synchronous environment, it takes longer to receive an acknowledgment back. As a result, the distance between ESS controllers may have an effect on whether your applications meet their performance requirements.

PPRC is designed to be application independent. Because the copy function occurs at the storage subsystem level (in storage hardware, not in the server), the application does not need to know that it exists. The PPRC protocol is designed to keep the secondary copy up-to-date by requiring that the write I/O is not returned as completed until the primary storage subsystem receives acknowledgment that the secondary copy has been written.

PPRC uses the following sequence when copying records from one volume to another:
1. Write to primary storage volume at the local site. The local application server writes data to a local volume on an ESS at the primary site.
2. Write to secondary storage volume at the remote site. PPRC sends the write over an ESCON or Fibre Channel link to the secondary ESS at the recovery site.
3. Signal write complete on secondary storage. The recovery or secondary site ESS signals write complete to the application site or primary ESS when the data has been committed to the secondary volumes.
4. Post I/O complete. When the application site ESS receives the write complete from the recovery site ESS, it returns an I/O complete status to the application server.
This process preserves the consistency of the data on both the primary and secondary storage.
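Once a PPRC pair has been established, its state can be checked with the TSO CQUERY command provided with the ESS PPRC support. A minimal sketch, assuming a hypothetical primary device number of 1200:

   CQUERY DEVN(X'1200')

The output reports the pair status (for example, PENDING while the initial copy is still running, or DUPLEX once the secondary is fully synchronized), which is the state the write sequence above maintains on every update.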

2.7.2 Extended Remote Copy (XRC)
XRC offers some of the highest levels of data integrity and data availability in a disaster recovery, workload movement, and device migration environment. It differs from PPRC in that it is a combination of hardware and software: It requires the XRC hardware microcode feature and a software component in DFSMSdfp™, known as the System Data Mover (SDM), to provide the remote copy solution. XRC is asynchronous and is therefore designed to span thousands of kilometers.

Like most asynchronous implementations, Extended Remote Copy returns an immediate acknowledgment from the primary storage controller to the primary application server. XRC manages the process of passing the updates to the recovery site after they have completed at the application site. Because of its use of asynchronous transfers, the currency of the data at the recovery site will usually lag slightly behind that of the primary or application site. XRC promotes data integrity and consistency at the recovery site by applying the secondary updates, which are time stamped by z/OS, in the same sequence as they were applied at the primary site. This can be done across many storage controllers in order to help preserve global consistency.

XRC is designed to maintain minimal data loss and an RPO of seconds to minutes. As mentioned earlier, the System Data Mover (SDM) is a software component that resides in the remote site enterprise server. As updates occur to primary storage, the SDM manages the process of copying those updates to secondary volumes. This helps ensure that updates to secondary volumes are made in the same order in which they were made to the primary volumes, maintaining update sequence consistency. The SDM then groups the updates according to consistency times, and writes them out to journal data sets.

XRC uses the following sequence when copying records from one volume to another remotely:
1. Application write I/Os. Multiple time-stamped writes are directed to the primary disk volumes.
2. The ESS maintains the update group. The write I/Os in step 1 are targeted at records on XRC-managed volumes. The ESS maintains the updates in subsystem cache until they are transferred to the SDM system asynchronously.
3. The SDM requests data transfer. The ESS and SDM have several flexible methods that trigger the transfer of updated records from the primary site disk to the secondary site disk. One method is for the SDM to poll the ESS at regular intervals. Alternatively, the ESS can interrupt the SDM whenever a certain threshold of updated records has been reached and request that this group of records be read.
4. The SDM reads the updates from the ESS. When the record thresholds have been reached, or SDM polling intervals expire, groups of updated records are transferred from the primary disk volumes to the SDM.
5. The SDM forms consistency groups (CGs). The updated records are processed into consistency groups by the SDM. A CG contains records that have their order of update preserved, even across multiple storage controllers. This preservation of order is vital for applications that process dependent write I/Os. The creation of CGs helps ensure that XRC will shadow data to a remote site in real time with update sequence integrity.
6. The SDM journals CGs. When a CG is formed, it is written to the journal data sets residing on volumes at the secondary site.
7. The SDM writes to the secondary volumes. Immediately after the updates have been hardened on the journal data sets, the records are written from the SDM's real storage buffers to their corresponding secondary volumes.
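The XRC session itself is managed with TSO commands provided with DFSMSdfp. A minimal sketch of starting a session and adding a volume pair, with all names (session ID HASXRC, volsers PRIM01 and SEC001) as hypothetical placeholders:

   XSTART HASXRC ERRORLEVEL(SESSION) SESSIONTYPE(XRC)
   XADDPAIR HASXRC VOLUME(PRIM01 SEC001)
   XQUERY HASXRC VOLUME(ALL)

XSTART establishes the SDM session, XADDPAIR places a primary/secondary volume pair under XRC control (the initial copy then runs asynchronously), and XQUERY reports the status of the volumes in the session.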

2.8 Geographically Dispersed Parallel Sysplex™

Geographically Dispersed Parallel Sysplex (GDPS) is an automation package that promotes data consistency while maintaining a customer's RPO and RTO. It works in conjunction with PPRC or XRC.

GDPS is a combination of software and services that is built on the IBM Enterprise Server zSeries and z/OS. GDPS provides for a single, automated solution to dynamically manage storage subsystem mirroring (disk and tape), processors, and network resources to help a business attain “near continuous availability” and “near transparent business continuity (disaster recovery)” with minimal or no data loss.

It is designed to minimize, and potentially eliminate, the impact of any failure, including disasters or planned outages. Additionally, it can provide the ability to perform a controlled site switch from the production data center to the recovery data center for both planned and unplanned site outages, with minimal (GDPS/XRC) or no data loss (GDPS/PPRC), helping maintain data integrity across multiple volumes and storage subsystems, and providing the ability to perform a normal Data Base Management System (DBMS) restart (rather than a lengthy and manual DBMS recovery) at the opposite site.

GDPS is application independent and therefore covers the customer's complete application environment. It can automate many of the design flow steps mentioned in this book. If a customer is looking to improve its RTO and RPO, as well as reduce human intervention, GDPS is the answer, complementing the IBM remote copy services and supporting the customer's recovery needs.

2.8.1 Data consistency
Data consistency across all primary and secondary volumes, spread across any number of storage subsystems, is essential in providing data integrity and the ability to do a normal database restart in the event of a disaster.

The main focus of GDPS automation is to make sure that, whatever happens in site 1, the secondary copy of the data in site 2 is data consistent (the primary copy of data in site 1 will be data consistent for any site 2 failure).

Data consistent means that, from an application’s perspective, the secondary disks contain all updates until a specific point in time, and no updates beyond that specific point in time. The fact that the secondary copy of the data is data consistent means that applications can be restarted in the secondary location without having to go through a lengthy and time-consuming data recovery process.

Data recovery involves restoring image copies and logs to disk and executing forward recovery utilities to apply updates to the image copies. Since applications only need to be restarted, an installation can be up and running quickly, even when the primary site has been rendered totally unusable.

2.8.2 The HyperSwap
Together with GDPS/PPRC there is a new function called GDPS/PPRC HyperSwap. This function is designed to broaden the continuous availability attributes of GDPS/PPRC by extending the Parallel Sysplex redundancy to disk subsystems. The HyperSwap function can significantly increase the speed of switching sites and switching disk between sites. The HyperSwap function is designed to be controlled by complete automation, allowing all aspects of the site switch to be controlled via GDPS.

Stage 1 of the HyperSwap function provides the ability to transparently switch all primary PPRC disk subsystems with the secondary PPRC disk subsystems for a planned switch reconfiguration. Stage 1 provides the ability to perform disk configuration maintenance and planned site maintenance without requiring any applications to be quiesced. Large configurations can be supported, as HyperSwap has been designed to provide the capacity and capability to swap a large number of disk devices very quickly. The important ability to resynchronize incremental disk data changes, in both directions, between primary and secondary PPRC disks is provided as part of this function.

HyperSwap swaps SSIDs in parallel, but within each SSID devices are swapped serially. The duration of a HyperSwap is therefore determined by the SSID with the largest number of devices.

If you plan to use HyperSwap, there are certain things you need to be aware of:
– You have to run GRS star and convert all reserves to global enqueues. (A quick way to verify the GRS mode is sketched after this list.)
– The disk subsystem must support PPRC level 3 (extended query).
– All disk volumes have to be mirrored with PPRC, except the couple data set volumes.
– The PPRC configuration must have a one-to-one correspondence between each primary PPRC SSID and secondary PPRC SSID.
– HyperSwap devices cannot be attached to systems outside the sysplex.
– Production systems must have sufficient channel bandwidth to both the primary and secondary PPRC disk subsystems.
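GRS star mode is a hard prerequisite, so it is worth checking before enabling HyperSwap. A minimal sketch using the standard z/OS console command:

   D GRS

The response (message ISG343I) shows the GRS mode of each system in the complex; every system must report star mode. Reserves that still need converting can be identified from your GRS Reserve Conversion RNL definitions.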

For further information about GDPS see: http://www-1.ibm.com/servers/eserver/zseries/library/whitepapers/gf225114.html http://www.ibm.com/servers/eserver/zseries/announce/april2002/gdps.html

2.9 Other hardware equipment

What we have discussed so far are the things you have to consider when implementing a high-availability solution on zSeries. We have focused on the different hardware components that make a Parallel Sysplex possible, such as servers and Coupling Facilities, and on the need to configure all devices for redundancy.

A modern application environment will most likely include parts or functions that run on servers other than zSeries. When building an application that demands high availability, you have to consider its dependencies on those other servers, and investigate whether there are solutions that allow such a server to cooperate in a Parallel Sysplex environment.

You might have installed a fully redundant Parallel Sysplex with hardware in two different sites, with applications made redundant with two instances making use of data sharing and workload distribution spread across multiple sites. If such an application has any dependency on a server outside the scope of the Parallel Sysplex, that server becomes the weakest link. If the outside server fails and there is no redundancy, your application will stop. If possible, make sure that the outside server can communicate with multiple instances of your application. If not, you will have a single point of failure for that specific application.


2.9.1 3494 Tape Library/VTS
When configuring your tape library or VTS for high availability, make sure you configure enough paths to the device. If possible, configure paths through different ESCON Directors. If running in an LPAR environment, use EMIF to reduce the number of required channels, but be aware of the implications of using EMIF—if one channel fails, it could affect more than one LPAR. When installing a 3494 Tape Library, plan for expansion of the tape library with additional frames, and plan to use the High Availability feature of the 3494 Tape Library.

The IBM TotalStorage® Enterprise 3494 Model HA1 High Availability Frames consist of two service bay unit frames that are installed adjacent to both ends of the other 3494 tape library frames. The left-most HA1 frame, next to the IBM 3494 Model L12 frame, provides a garage for the active accessor in case it fails. The right-most HA1 frame provides a second accessor and a second library manager. The HA1 components are:

– A second library manager, located in the right-hand service frame.
– A second cartridge accessor.
– Communication links between the two library managers, and a LAN hub if a VTS is integrated with the IBM 3494.
– Shared nonvolatile random access memory (NVRAM) holding the current state of the active and standby library managers. It is used to decide which library manager is active when a library manager cannot communicate with the second library manager on either of the links.
– Hardware switches to switch the operator panel from one library manager to the other, or to switch hosts or tape controllers to either library manager.
– Digital input/digital output (DI/DO) lines to communicate component commands and status between both library managers and the library components, such as the accessors.
– Two service frames (or bays) for storage of the inactive cartridge accessor, service diagnostics, or accessor repair. Service frames contain rail extenders (to allow the accessor to be stored within the frame) and a barrier door. The right service frame (when you look at the library from the front) contains the second library manager. A barrier door in the service frame is used to keep the functioning cartridge accessor from entering the service frame during service. Each service frame has its own AC power control compartment, which is separate from the compartment of the 3494 Model L12.
– A second unit emergency power off (EPO) switch, as in the 3494 Model L12, to drop power to the entire library.

For system-managed tape, information about volumes is stored in the TCDB (tape configuration database), which is an integrated catalog facility (ICF) catalog of type VOLCAT. The TCDB consists of one or more volume catalogs. A volume catalog contains entries for tape volumes and tape libraries, but does not contain entries for individual data sets. At least one general volume catalog must be defined, plus any number of specific volume catalogs. Storing the information for a particular range of volume serial numbers in a specific volume catalog aids performance in accessing the TCDB and may ease the use of TCDBs across systems and applications. Even though Access Method Services (IDCAMS) commands can change the content of a TCDB, we highly recommend that you use ISMF to perform functions against a tape library. Changes made to the TCDB through IDCAMS commands are not transferred to the library manager in the 3494 tape library, so discrepancies can occur. Use the IDCAMS CREATE, ALTER, and DELETE commands only to recover from volume catalog errors. If you want to use the 3494 tape library without system-managed tape, you have to use Basic Tape Library Support (BTLS).
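For illustration, the general volume catalog is defined with IDCAMS using the VOLCATALOG parameter. A minimal sketch (the data set name follows the common SYS1.VOLCAT.VGENERAL convention; the volser CATVOL and the space allocation are placeholders):

   //DEFVCAT  EXEC PGM=IDCAMS
   //SYSPRINT DD SYSOUT=*
   //SYSIN    DD *
     DEFINE USERCATALOG -
       (NAME(SYS1.VOLCAT.VGENERAL) -
        VOLCATALOG -
        VOLUME(CATVOL) -
        CYLINDERS(1 1))
   /*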

Your 3494 tape library installation depends heavily on the availability of your TCDB or BTLS databases. An extended outage of a database can be extremely disruptive, because tape data stored in the IBM 3494 cannot be retrieved without access to the database. Remember to allocate these databases on high-availability DASD. The most important task is to make sure that the databases are included in the backup job or job stream for catalogs. We recommend that you run IDCAMS EXPORT for the backup. You may use other programs, such as DFSMShsm™ or DFSMSdss, to back up your databases. However, if you also want to use ICFRU for recovery, then IDCAMS is the backup program to use. If you use general and specific volcats and use DFSMSdss for backup, backing up all the volcats at the same time is required for consistency reasons. If you use IDCAMS EXPORT, this is not required, but it can reduce your outage. For example, there may be a disaster with the direct access storage device (DASD) where the volcats reside, and you can only sample the right SMF data once. We recommend that you place at least one backup of all TCDBs on DASD. If you lose the TCDB, and the backup is on a tape in a library environment (including native and VTS), you must perform some manual reconstruction to access the library with the backup tape. Performing a backup on DASD is the easiest way to avoid this manual intervention. Whether you are using VTS or a native 3494 Tape Library, you need to consider that all of your tape volumes are now placed in the same location. For disaster recovery purposes, you need to place your backups or other critical tape data sets at another location. With a native 3494 Tape Library, you export the physical volumes and move them to another location. The Import/Export feature of the VTS gives you the opportunity to export logical volumes, for example, to store them in a disaster vault, or to import them into another VTS subsystem. You should consider, however, that exporting logical volumes involves writing them onto exported stacked volumes, which are then ejected from the 3494. If you need to export a large number of logical volumes frequently, you might consider placing them on native 3490 or Magstar® cartridges, or writing them to a 3494 installed at the vault location.
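A minimal sketch of an IDCAMS EXPORT backup job for a TCDB (the catalog and backup data set names are hypothetical placeholders; TEMPORARY keeps the source catalog intact):

   //BKUPTCDB EXEC PGM=IDCAMS
   //BACKUP   DD DSN=BACKUP.VOLCAT.VGENERAL,DISP=(NEW,CATLG),
   //            UNIT=SYSDA,SPACE=(CYL,(5,1))
   //SYSPRINT DD SYSOUT=*
   //SYSIN    DD *
     EXPORT SYS1.VOLCAT.VGENERAL -
       OUTFILE(BACKUP) -
       TEMPORARY
   /*

Allocating the backup on DASD, as recommended above, means the TCDB can later be recovered with IDCAMS IMPORT without first having to regain access to the tape library.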

For an installation with high-availability demands, the best solution is to use Peer-to-Peer VTS. A Peer-to-Peer VTS is designed to eliminate all single points of failure and provide higher performance than the stand-alone IBM TotalStorage Enterprise Virtual Tape Server. IBM TotalStorage Peer-to-Peer VTS couples two separate IBM TotalStorage Enterprise Virtual Tape Servers into a single image with additional components and features. The two VTSs can be located at the same site, or can be geographically separated for disaster tolerance. Peer-to-Peer VTS allows you to select when to create the secondary copy of a virtual volume:
– Immediately during close processing (immediate copy mode). This creates a copy of the logical volume in the companion connected Virtual Tape Server prior to completion of a Rewind/Unload command. This mode provides the highest level of data protection.
– Asynchronously at a later time (deferred copy mode). This creates a copy of the logical volume in the companion connected Virtual Tape Server as activity permits, after receiving a Rewind/Unload command. This mode provides protection that is superior to most currently available backup schemes.

There are other possible alternatives for copying volumes, but these are the most important for an installation with demands for high availability.

For further discussion of the 3494 Tape Library and VTS, see the following documentation:
– 3494 Tape Library Dataserver Introduction and Planning Guide, GA32-0279
– IBM TotalStorage Virtual Tape Server: Planning, Implementing, and Monitoring, SG24-2229
– IBM TotalStorage Peer-to-Peer Virtual Tape Server Planning and Implementation Guide, SG24-6115

GDPS and tapes
GDPS also supports the Peer-to-Peer Virtual Tape Server. By extending GDPS support to data resident on tape, the GDPS solution provides continuous availability and near-transparent business continuity benefits for both disk- and tape-resident data. Enterprises will no longer be forced to develop and utilize processes that create duplex tapes and maintain the tape copies in alternate sites.

For example, previous techniques created two copies of each DBMS image copy and archive log as part of the batch process, with manual transportation of each set of tapes to a different location. Operational data (data that is used directly by applications supporting end users) is normally found on disk. However, there is another category of data that supports the operational data, which is typically found on tape subsystems. Support data typically covers migrated data, point-in-time backups, archive data, and so on. For sustained operation at the failover site, the support data is indispensable. Furthermore, several enterprises have mission-critical data that resides only on tape. The PtP VTS provides a hardware-based duplex tape solution, and GDPS automatically manages the duplexed tapes in the event of a planned site switch or a site failure. Control capability has been added to allow GDPS to “freeze” copy operations, so that tape data consistency can be maintained across GDPS-managed sites during a switch between the primary and secondary VTSs.

For further information see: http://www-1.ibm.com/servers/eserver/zseries/library/whitepapers/gf225114.html http://www.ibm.com/servers/eserver/zseries/announce/april2002/gdps.html

2.9.2 Stand-alone tape
Even though you probably have some kind of tape library or virtual tape system for the daily work, you probably also have some stand-alone tape drives that may be important for the daily batch, whether for reading input from external sources or sending data externally. Make sure you connect all stand-alone tape drives through an ESCON Director. In the event of a system outage or a processor outage, you then have the ability to switch tape drives to another system at another location. For a site disaster, you have to plan for having stand-alone tape drives at the secondary site.

2.9.3 3174, 2074
Most modern S/390 installations use multiple LPARs. Each LPAR requires one or more local 3270 devices/sessions for z/OS consoles and a few TSO terminals. The local 3270 devices require a local IBM 3174 Control Unit for each LPAR. As the number of LPARs grows, the number of local 3174s also grows, and a lack of additional 3174s can inhibit the use of additional LPARs for test and development purposes. IBM is no longer producing 3174 Control Units. Most existing local 3174 Control Units were built for parallel channels. Channel cabling and switching arrangements for these, in a situation where LPAR flexibility is important, can become difficult. Even where sufficient ESCON 3174 control units are available, the switching arrangements (through ESCON Directors) and IOCDS definitions for many control units, potentially used by many LPARs, can become unreasonable. Even with ESCON-capable 3174s, you need at least two per LPAR whenever you use them for z/OS consoles.

The IBM 2074 is a replacement for local, non-SNA 3174 Control Units. The 2074 is a shared control unit that can be used simultaneously by multiple LPARs, using EMIF-capable channels. For example, a single 2074 might provide z/OS consoles (and TSO sessions) for ten LPARs. The LPARs might be in a single S/390 or spread across multiple S/390 systems.

The 2074 does not use existing real 3270 devices or connections. That is, it does not use coax cables or any of the family of real 3270 terminals. Instead, it uses TCP/IP connections (TN3270E) over LANs to connect terminals (usually PCs) to the 2074, and ESCON channels to connect the 2074 to S/390 channels, either directly or through ESCON Directors. Figure 1-1 on page 3 provides a conceptual view of a 2074 installation. Practical details, with multiple channels, multiple systems, and ESCON Directors, can make for a more complex environment, but the basic concept remains the same.

Configuring a high-availability environment with 2074s would mean that you need at least two 2074s.



Chapter 3. z/OS

At this point, we have discussed the things you should do to ensure the computing environment is supportive of continuous processing, and we have discussed how to configure your hardware to remove as many single points of failure as possible. The next area to address is software.

Before we move on to talk about the transaction and database managers, we need to cover the next layer up from the hardware—specifically, the operating system and related products. Therefore, in this chapter, we discuss:
– The considerations for providing maximum sysplex availability. If you have implemented data sharing and workload balancing, your applications should be able to survive the loss of a system; however, it is vital that the sysplex remain available. Therefore we will first concentrate on the issues that affect sysplex availability.
– The considerations for providing maximum system availability. This is to ensure that each individual system has the highest possible availability. While data-sharing applications can survive the loss of a system, there will obviously be an impact on users using that system at the time of the failure, so we want to do all we can to avoid system outages as well.
– The features of z/OS that provide support for dynamic workload balancing, providing the ability to continue to run your applications, even though one system may be unavailable.

Note for GDPS users: Some of the recommendations in this chapter do not apply to sysplexes where GDPS is being used. This is because of the unique way the DASD are configured with GDPS/PPRC, and because of the automation functions that GDPS provides. If there are contradictions between any of the recommendations in this chapter and those provided in the GDPS documentation, the GDPS documents should be viewed as the definitive recommendation.

3.1 Configure software for high availability

One of the first things to consider is the definition, use, and placement of data sets critical to the availability of the whole sysplex. As far as possible, you need to make sure that there are no single points of failure that could affect these data sets. You also have to be careful to define and place the data sets so that they achieve the required performance.

Tip: There are various recommendations throughout this chapter to have just one copy of many system elements (sysres, master catalog, and so on). Having just one copy significantly eases systems management, results in fewer entities that can break, and reduces the systems programmer’s workload. However, having just one copy can also create a single point of failure.

Rather than addressing this issue over and over, we want to say here that the use of HyperSwap, provided with GDPS/PPRC, removes the related DASD volumes as single points of failure. GDPS/PPRC HyperSwap allows you to nondisruptively swap between PPRC primary and secondary devices. If HyperSwap is enabled, a failure of a device or a whole DASD subsystem results in a pause in processing of between 30 seconds and one minute, rather than requiring an IPL. As a result, you can have, for example, a single sysres volume shared between all members in the sysplex, and still not have a single point of failure.

3.1.1 Couple Data Sets
z/OS requires a sysplex to have a sysplex Couple Data Set (CDS), and optionally a number of policy CDSs, depending upon which sysplex features are being utilized. These data sets are shared by all members of the sysplex, and are used to store information relating to the status of the sysplex, or to manage certain aspects of the sysplex.

All types of CDSs may be defined as a primary data set only, or as a primary/alternate data set pair. Updates are made to both the primary and alternate data set concurrently. This allows XCF to automatically switch to the alternate CDS should the primary CDS fail or if there is any type of transient or permanent I/O error related to the primary CDS.
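The alternate CDS can be added, and a switch forced, with the SETXCF operator command. A minimal sketch (the data set and volume names are hypothetical placeholders):

   SETXCF COUPLE,TYPE=SYSPLEX,ACOUPLE=(SYS1.XCF.CDS02,VOLBBB)
   SETXCF COUPLE,TYPE=SYSPLEX,PSWITCH

The first command makes the named data set the alternate sysplex CDS; the second makes the current alternate the primary, after which a new alternate should be added immediately, since the sysplex is temporarily running without one.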

Important: Even though it is possible to run with just a primary CDS, you should never do so, as this constitutes a single point of failure.

In this section, whenever we refer to a CDS, we always assume that you will actually have a failure-isolated pair (primary/alternate) of each type of CDS you use.

Sysplex Couple Data Sets
The sysplex CDS is used to store information about the sysplex, about the members of the sysplex, and about the XCF groups and XCF group members currently active within the sysplex. Heartbeat monitoring of systems in the sysplex is primarily done via the sysplex CDS. As such, performance of the volumes containing the CDSs is important, as is choosing volumes with minimal possibility of any lockouts (for example, RESERVEs). These data sets are critical to the sysplex, as any member that loses access to the sysplex CDSs will be placed in a non-restartable wait state.

Policy Couple Data Sets
The policy CDSs are defined only when particular sysplex features are being utilized, and are categorized as CFRM, SFM, ARM, WLM, LOGR, or BPXMCDS:
– The Coupling Facility Resource Manager (CFRM) CDS contains the CFRM policy, which is used for defining the Coupling Facilities (CFs) the sysplex can use and the structures that may be allocated in those CFs. It also contains information about the state of all structures. During a structure rebuild, you will see a lot of activity to this data set as the state information is checked and updated by all the systems in the sysplex. The performance improvements made in z/OS 1.4 will decrease the contention on this data set during rebuild, and can have a significant positive effect on recovery time following the failure of a CF, where many structures need to be rebuilt in parallel. These improvements are rolled back to previous releases via APAR OW48624. We recommend that you apply this APAR if you are on a release prior to z/OS 1.4. If any system loses connectivity to the CFRM CDS, it will be placed in a non-restartable wait state (0A2, reason code 09C) if it is using a CF structure.
– The Sysplex Failure Management (SFM) CDS contains the SFM policy that is used for defining how system failures, signaling connectivity failures, and PR/SM reconfiguration actions are to be managed. If any system loses connectivity to the SFM CDS, the SFM policy becomes inactive in the whole sysplex. SFM is discussed in more detail in 3.7, “Sysplex Failure Management (SFM)” on page 103.
– The Automatic Restart Management (ARM) CDS contains the ARM policy that is used for defining how z/OS is to manage restarts for started tasks and specific batch jobs that are registered as elements of ARM. If any system loses connectivity to the ARM CDS, the ARM policy becomes inactive on that system and all elements registered with ARM on that system are deregistered. Depending on the exploiter, they may or may not re-register with ARM when/if the CDS becomes available again. Some products will have to be restarted in order to re-register. ARM is discussed in more detail in 3.8, “Automatic Restart Manager (ARM)” on page 106.
– The Workload Manager (WLM) CDS contains the WLM policy that defines the service goals and other performance-related parameters used by WLM to manage the sysplex workload. If any system loses connectivity to the WLM CDS, the system continues to run using the WLM policy information that was in effect at the time of the failure. In that case, WLM is described as being in independent mode, operating only on local data, and does not transmit data to other members of the sysplex. While the system will not stop, the sysplex-wide performance achieved by critical applications may be impacted if this situation is allowed to continue for any period of time.
– The System Logger (LOGR) CDS contains the LOGR policy that defines the log streams and related structures, status information about all log streams, and information about the staging and offload data sets associated with each log stream. If any system loses access to the LOGR CDS, the IXGLOGR address space terminates on that system and requests to System Logger are rejected. After an abend, System Logger will normally make a few attempts to restart (assuming APAR OW53349 is applied). However, if it still cannot access the LOGR CDS, the restart will obviously fail.
– The z/OS UNIX (BPXMCDS) CDS is used to support the Shared HFS facility in a sysplex environment (as of OS/390 2.9). While there is no policy associated with this data set, the data set must exist and must be formatted with the IXCL1DSU utility (a sample format job is sketched after this list). The CDS contains (primarily) the file system mount table. Anything that changes the file system hierarchy obtains exclusive access to the CDS to read, then write it. This includes all mounts, unmounts, remounts, moves, system recovery, and system initialization. Systems that specify SYSPLEX(YES) in their BPXPRMxx member use the CDS for all mounts, regardless of whether the file system is using sysplex sharing support or not. Systems that specify SYSPLEX(NO) never use the CDS and are never impacted by any CDS-related problems. If a system loses access to the BPXMCDS, message BPXF214E will be issued on that system, and no system in the sysplex will be able to mount or unmount any file system until either access is restored (and a SETXCF COUPLE,TYPE=BPXMCDS,PCOUPLE=xxx is issued to reestablish connectivity), or else that system is partitioned out of the sysplex.
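All CDS types are formatted with the same IXCL1DSU utility. A minimal sketch for a CFRM CDS (the sysplex name, data set name, volser, and all ITEM counts are hypothetical placeholders—size them for your own configuration, keeping in mind the advice below about not over-specifying):

   //FMTCDS   EXEC PGM=IXCL1DSU
   //SYSPRINT DD SYSOUT=*
   //SYSIN    DD *
     DEFINEDS SYSPLEX(PLEX1)
       DSN(SYS1.XCF.CFRM01) VOLSER(VOLAAA)
       DATA TYPE(CFRM)
         ITEM NAME(POLICY) NUMBER(8)
         ITEM NAME(CF) NUMBER(4)
         ITEM NAME(STR) NUMBER(200)
         ITEM NAME(CONNECT) NUMBER(32)
   /*

The same job, run three times with different DSN and VOLSER values, produces the primary, alternate, and spare data sets recommended below.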

If you have automation in place to define and allocate alternate CDSs, you should never get in the situation where a system completely loses access to a CDS (assuming the primary and alternate are failure-isolated from each other).

IXC253I will be issued when a couple data set is being removed from use. If it is the last couple data set of its type, then this is followed quickly by IXC220W and a wait state.

Recommendations for Couple Data Sets
CDSs are critical to the ongoing availability of the sysplex, and should be created and managed with availability considerations in mind.

Current DASD technology, such as the use of RAID arrays, means that the failure of an individual DASD volume is unlikely. In fact, it is more likely that an entire DASD subsystem will be lost than an individual volume. If this happens, the size of modern storage subsystems means that such a failure will, in all likelihood, render the entire system or sysplex unusable, unless a remote copy solution such as XRC or PPRC is in place. The use of HyperSwap with PPRC connected devices is especially attractive in these cases, as the systems can nondisruptively swap over to the secondary DASD in the remote site.

Logical availability, on the other hand, depends on how and where you define the CDSs. To maximize CDS availability, the following considerations should be kept in mind:
– Always define alternate CDSs to cater for possible failure of the primary CDS. You should also define spare CDSs at the same time, and using the same parameters, as the corresponding primary and alternate CDSs. These spare CDSs are available to be immediately activated as valid alternate CDSs should the existing alternate CDSs be required to assume the primary role. This means that each CDS type should have three data sets—primary, alternate, and spare. Alternate and spare CDSs must not be smaller than the primary.
– Do not excessively over-specify parameters when formatting CDSs. This results in wasted space and degraded CDS performance. More important is that over-specifying values significantly elongates IPL times. If you need to increase the values used to format a CDS, this can easily be done nondisruptively at any time by defining new CDSs and switching to those. Therefore, there is no benefit in specifying values significantly larger than you actually require. Note that the only way to switch to CDSs smaller than the ones currently being used is to do a sysplex IPL.
– Use a clear naming convention to identify the CDS type and owning sysplex.
– You should have automation that will automatically add a new alternate CDS should the current one get switched to being the primary one, or if the current alternate becomes unavailable for some other reason. msys for Operations provides automation to automatically allocate and add a new alternate CDS should one be required. Refer to “z/OS msys for Operations” on page 97 for information about this function.

Attention: It is important to understand that msys for Operations provides a subset of the functions provided in the full System Automation for OS/390 V2 (SA/390) product, so any time we refer to a function provided by msys for Operations, you should understand that that function is also provided in SA/390 if you are licensed for that product.

– Place the primary, alternate, and spare CDSs on different volumes, and on different physical DASD subsystems, if at all possible. There are certain error recovery situations that can cause a whole DASD control unit to stop processing any I/O requests. It usually rejects I/O requests with a status of “busy” (called a “long busy”) during this time. If the couple data set gets a permanent I/O error, it will be removed immediately with message IXC253I. If there is no alternate, then the system will issue message IXC220W and enter a non-restartable wait state. If the I/O gets delayed to the point that the I/O timing facility indicates an I/O time-out, then the request for the couple data set is queued by XCF, and retried every 15 seconds for a period of up to 5 minutes. If the condition still persists after this period, the couple data set will be removed, and a wait state will result as above.
– Allocate the primary sysplex CDS on a different volume from the primary CFRM and LOGR CDSs. You probably also want to spread your other CDSs over different volumes, so that fewer CDSs are affected in case of the failure of one volume. A possible mix is shown in Figure 3-1.

   VOLAAA           VOLBBB           VOLCCC
   Sysplex - Pri    Sysplex - Alt    Sysplex - Spare
   CFRM - Spare     CFRM - Pri       CFRM - Alt
   SFM - Alt        SFM - Spare      SFM - Pri
   ARM - Pri        ARM - Alt        ARM - Spare
   WLM - Spare      WLM - Pri        WLM - Alt
   LOGR - Alt       LOGR - Spare     LOGR - Pri
   SHCDS - Pri      SHCDS - Alt      SHCDS - Spare

Figure 3-1 CDS placement

– Avoid allocating CDSs on DASD volumes that are subject to reserve activity, or on volumes with high I/O activity (such as those containing paging and JES2 spool data sets), as both of these situations may adversely affect sysplex performance and could potentially cause CDS failure. If at all possible, we recommend defining the CDSs on volumes that only contain CDS data sets. With the flexibility to define volumes of just about any size, you can define volumes of the size required for the CDSs, and thereby avoid wasting DASD capacity if you decide to have these CDS-only volumes.

– Make sure that DASD Fast Write and caching are enabled on the volumes containing the CDSs; in particular, those containing the sysplex and CFRM CDSs. Perform regular reviews to ensure that these features have not been disabled. This can be done with your batch scheduler submitting a weekly job to check this, and an automated task scanning the output to verify that these features are enabled. (A sample check job is sketched at the end of this section.)
– Ensure there are redundant paths to the CDSs, where possible, through different ESCON/FICON Directors, storage clusters, I/O bays, and channel cards, to minimize or preferably eliminate CDS-related single points of failure. The XISOLATE tool can assist with this. The tool is available on the Web at:
  ftp://ftp.software.ibm.com/s390/mvs/tools/
– While different policy types may be allocated in the same physical CDS, we strongly recommend that each CDS type has its own dedicated data sets.
– If your disaster recovery mechanism involves backing up to tape and restoring at a remote site, there is limited value in backing up the CDSs, as most of them contain mainly transient information. In addition, using CDSs that are not synchronized with the related data (the LOGR CDS and associated log streams, for example) can actually cause problems during recovery. For this reason, we recommend keeping a spare set of CDSs that are only used in case of a disaster (remember that this means having a COUPLExx member that will only be used in case of a disaster). For those CDSs that contain policy information (the CFRM, WLM, SFM, ARM, and LOGR ones), you will need to ensure that either the disaster recovery spares are kept up to date with the normal copies, or else there is a procedure in place to update the CDSs with the current policies as part of the disaster recovery restore process.
– Be aware that some sysplex monitoring tools may generate a large amount of I/O to the CFRM CDS; this may affect system performance. If you decide to use these tools, be aware of the performance implications.
– Avoid using DFSMSdss™ or other utilities that issue reserves against the volumes containing the CDSs. If there is a need to use DFSMSdss (or any other data mover) to back up a volume that includes a CDS, we recommend that you:
  – Convert the SYSVTOC reserve on the volume to a global enqueue through specification of QNAME(SYSVTOC) RNAME(volser) in the GRS Reserve Conversion RNL. Refer to z/OS MVS Planning: Global Resource Serialization, SA22-7600, for further information regarding this facility.
  – Use logical volume backup rather than physical backup. Refer to z/OS DFSMSdss Storage Administration Guide, SC35-0423, for further information regarding this facility.
– Do not use Peer-to-Peer Remote Copy (PPRC) or other synchronous remote copy mechanisms for the sysplex and CFRM CDSs. Synchronous data copy requires the I/O subsystem to update not only the primary volume but also the secondary volume before an update can be committed; this will invariably impact sysplex performance. In addition, the contents of these CDSs are unlikely to be of value in a disaster recovery situation, as the sysplex CDS contains information regarding systems that are no longer active, while the CFRM CDS refers to CFs at the original site rather than those at the disaster site, and is therefore potentially invalid. Note, however, that other types of CDSs, notably LOGR and WLM, do contain information that would be valid at the disaster recovery site, and therefore should be considered for synchronous data copy. For LOGR remote copy to be effective, the log streams need to be duplexed to DASD that are included in the remote copy configuration. If you are going to remote copy your CDSs, ensure that the PTFs for APARs OA05025, OA05391, and OW56611 are applied to all systems.
– CDSs may be allocated on SMS-managed volumes through specification of the STORCLAS, MGMTCLAS, and VOLSER keywords in the format utility; however, you should do the following:
  – Use a Storage Class setup with GUARANTEED SPACE=YES to allow specific volume selection.


  – Use a Management Class with the following attributes to prevent accidental deletion or migration by DFSMShsm:
    • EXPIRE AFTER NON-USAGE=NOLIMIT
    • EXPIRE AFTER DATE/DAYS=NOLIMIT
    • COMMAND OR AUTO MIGRATE=NONE

The XISOLATE tool helps you ensure that critical data sets have no single points of failure. The tool is not intended for use with CDSs only; it can be used for any of your vital data sets.
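A minimal sketch of the weekly cache-status check mentioned in the recommendations above, using the IDCAMS LISTDATA command (the volser is a hypothetical placeholder):

   //CHKDFW   EXEC PGM=IDCAMS
   //SYSPRINT DD SYSOUT=*
   //SYSIN    DD *
     LISTDATA STATUS VOLUME(VOLAAA) UNIT(3390)
   /*

The report shows whether caching and DASD Fast Write are active for the device, so an automated task can scan the output and raise an alert if either has been disabled.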

3.1.2 Other important data sets
In addition to the CDSs, there are a number of other data sets that are critical to the overall availability of your system, and maybe your sysplex (if those data sets are shared by all systems in the sysplex).

Many software components use multiple copies of data sets to allow functions to continue to operate after one of the copies becomes unavailable. Some products automatically switch to the alternate, while others may rely on an automation solution to initiate the switch. In either case, an alert should be raised to highlight both the failure and the exposure.

Recommendations for system data sets
Recommendations for system data sets follow:
– Master catalog. We recommend that you have a duplicate master catalog (same contents but a different name) and appropriate procedures to keep the duplicate in sync, as well as the necessary definitions in LOADxx to allow an IPL from the duplicate. It is also recommended that the number of data set entries in the master catalog be kept to a minimum, to minimize the level of change activity to the catalog.
– User catalogs. It is impractical to have duplicate user catalogs, so tested and documented backup and recovery procedures are vital. Regular backups should be taken. You may consider taking two backups, each cataloged in a different catalog. Similarly, you may decide to keep two copies of the SMF data required to recover the catalogs, again cataloged in two different catalogs. This ensures that all catalogs can be recovered, irrespective of which user catalog it was that failed. Refer to the IBM Redbook ICF Catalog Backup and Recovery: A Practical Guide, SG24-5644, for more information about backing up and recovering catalogs.

Tip: Ensure that critical user IDs, such as the system programmer ones and ones used to run recovery jobs, are split across more than one catalog.

– If a common PROCLIB is used to start JES2, VTAM, TCP, and TSO on all systems in the sysplex, the data sets required to start JES2, VTAM, and TCP, and to log onto TSO, are critical resources. These can be addressed by:
  – Having an alternate JES2 proc, probably containing the minimum number of DD statements. Remember to define the alternate proc name in the IEFSSN member. In case of emergency, when there is no other way to fix the proc or to get JES2 up, you can start JES2 without a proc via the Common System Address Space procedure (that is, S IEESYSAS,JOBNAME=JES2,PROG=HASJES20). This is not a simple procedure and should not be relied on unless you have tested and documented its use. Information about this facility can be found in z/OS JES2 Commands, SA22-7526.
  – Having alternate VTAM and TCP procedures that reference backup versions of all required libraries.
  – Maintaining a TSO logon proc with the minimal data set allocation to support an ISPF session.
– SYS1.PARMLIB and SYS1.PROCLIB. Just as having a single shared master catalog reduces administration and the chance of different versions getting out of sync with each other, having just one PARMLIB and PROCLIB also makes for simpler system administration, which should result in fewer human errors. However, it also creates a single point of failure. Having copies of these data sets is of limited value, as the data sets are cataloged in the master catalog, and you would need to be able to update the catalog if you want to pick up the copy. However, if either of these data sets is broken, you may not be able to access the catalog to make that change.
– RACF database. Ensure that the RACF backup database is failure-isolated from the primary database and is configured to receive all updates except for statistics. Refer to “RACF” on page 157 for more information about the RACF databases.
– JES2 Checkpoint. If there is a single MAS in the sysplex, the JES2 Checkpoint potentially represents a single point of failure. However, JES2 can (and should) be configured with two checkpoint data sets, and has the ability to automatically switch over to an alternate should one fail. Especially in a sysplex of eight or more systems, or any size sysplex with high levels of JES2 Checkpoint activity, we strongly recommend placing the JES2 primary checkpoint in a CF, with the alternate checkpoint on DASD. Ensure that JES2 Checkpoints are defined with valid NEWCHKPTn values and with MODE=DUPLEX, DUPLEX=ON, and OPVERIFY=NO for automated reconfiguration in failure situations. Refer to “JES2” on page 139 for more information.
– JES2 spool data sets. The $T SPOOL,SYSAFF command can be used to partition the JES2 spool, specifying which spool volumes may be used by each system. This can be useful if you have two sites and have the spool volumes spread over both sites. You can also use spool fencing, which allows you (via the SPOOLDEF statement and JES2 Exits 11 and 12) to specify a subset of spool volumes from which a job or job class can allocate track groups. For more information on both of these options, refer to the section entitled “Spool Partitioning” in z/OS JES2 Initialization and Tuning Guide, SA22-7532.
– DFSMShsm control data sets and journal. Backups of the DFSMShsm control data sets and journal should be taken regularly. The normal process would be that the CDSs and journal are backed up at the start of the auto backup cycle by the primary HSM. See 3.17, “DFSMShsm” on page 165, for more information about DFSMShsm.

– SMS ACDS, SCDS, and COMMDS. Unlike the DFSMShsm CDSs, which are backed up as part of the daily AUTOBACKUP cycle, the SMS CDSs will not be backed up unless you specifically add procedures to do so. To have an alternate ACDS that can be switched to, you should define a spare ACDS, and use the SETSMS SAVEACDS command regularly to save the currently active SMS configuration to this alternate CDS. If the normal ACDS is lost, you can switch to the new data set using the SETSMS ACDS(datasetname) command. There is no facility to copy the contents of the COMMDS; however, you can predefine a spare one, and switch to that dynamically using the SETSMS COMMDS command. If the SCDS is lost, it can be recreated using information in the ACDS with the SETSMS SAVESCDS command in z/OS 1.5 or later. Prior to z/OS 1.5, you must have a backup that you can use as a base to recover that data set. Obviously the alternate SCDS, ACDS, and COMMDS should be failure-isolated from the primaries. (A sketch of the SETSMS sequence is shown after this list.)
– If you are using VSAM/RLS, the active SHCDS control data sets should be failure-isolated from each other (they all contain the same information and are backups for each other). Similarly, the spare SHCDS data sets should be failure-isolated from each other, and from the active SHCDS data sets. In normal operation, all the active SHCDS data sets are identical to each other, while the spare CDSs are formatted but not used. If access to an active SHCDS is lost, the system will automatically switch to a spare if one is available, and it is synchronized with the other active data sets at that time. We recommend having at least two active SHCDS data sets, and at least one spare SHCDS. Note that you can define up to five active and five spare SHCDS data sets. You should also put automation in place to notify you should SMSVSAM automatically switch one of the spare data sets to be an active one (as this indicates the loss of an active data set). The messages to monitor for are IGW609A, IGW614W, IGW615W, and IGW619I.
– If you have a CICSplex, that is, multiple related CICS regions that support the same applications, the DFHCSD should be shared between all those regions. If you are using EJB in the CICSplex, the DFHEJOS and DFHEJDIR data sets should be shared between all those regions. While CICS does not provide the ability to duplex the CSD data set, you can create a (failure-isolated) copy and REPRO the CSD any time you update it. You can also use DFSMShsm to do a backup-while-open of the CSD. In your CICS started task JCL, you can use a JCL variable, which you can override when you start CICS, to indicate which data set you wish to use.
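A minimal sketch of the SETSMS commands described above (the data set names are hypothetical placeholders):

   SETSMS SAVEACDS(SMS.ACDS.SPARE)     <- periodically save the active configuration
   SETSMS ACDS(SMS.ACDS.SPARE)         <- switch to the spare if the normal ACDS is lost
   SETSMS COMMDS(SMS.COMMDS.SPARE)     <- switch to a predefined spare COMMDS

Because SAVEACDS only captures the configuration at the moment it is issued, it is worth driving it from your automation on a regular schedule rather than relying on manual use.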

The following data sets are critical to the availability of a single system or subsystem, rather than having a cross-system or multi-system impact.
 PLPA and COMMON page data sets
Ensure that spare PLPA and COMMON page data sets are available for IPL and that they can be picked up via an alternate IEASYSxx member.
 Local page data sets
Ensure that all the page data sets are sized to handle the loss of one of the page data sets, unexpected levels of system activity, the data spaces used by the System Logger, and a number of system dumps. We recommend that the amount of space in the page data sets be at least three times the amount of processor storage used by the z/OS system. It is not uncommon for installations to be using the same paging configuration

that they had when their systems had less than half the processor storage they are using now. For more information on recommended paging configurations, refer to the section entitled “Page Data Set Allocation - Size and Number” in the white paper z/OS Performance: Managing Processor Storage in an all “Real” Environment, available on the Web at: ftp://ftp.software.ibm.com/software/mktsupport/techdocs/allreal_v11.pdf


As well as providing the capacity you need, the paging subsystem should be designed to deliver the required levels of performance. z/OS 1.3 and later provides support for the use of Parallel Access Volumes (PAV) with paging volumes. We have traditionally recommended that paging volumes should only contain a single data set—this is so that SUSPEND/RESUME can be used by the Auxiliary Storage Manager (ASM) to

provide improved performance for the page data sets. However, the use of Dynamic PAVs on paging volumes allows you to place multiple page data sets on a single volume and still get the benefits of SUSPEND/RESUME. Note that the PAVs must be WLM-managed; static PAVs will not be used by ASM.

Allocate one or more spare local page data sets that can be activated in the event of an auxiliary storage shortage. This can be done manually or, preferably, automatically using a product such as SA/390 or msys for Operations. Even though msys for Operations provides the ability to dynamically allocate new page data sets, we still recommend having some spare pre-defined ones; this avoids the delay associated with the formatting that takes place when you define a new page data set.

Also make sure there is adequate redundancy in the connection of these devices, and that each system's page data sets are not spread across multiple physical control units. The reason for this recommendation is that the loss of a page data set results in a dead system. If you spread the page data sets for all systems over all your physical control units, the loss of one control unit will kill all systems. Keeping all the page data sets for a system in one physical control unit should have no measurable performance impact, but it limits the number of systems that are damaged should one of those control units fail.

Migration to z/Architecture (64-bit) mode increases the potential size of your auxiliary storage requirement. Refer to WSC FLASH 10185, available on the Web at:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/Flash10185
For recommendations about configuring your paging subsystem, refer to the white paper entitled z/OS Performance: Managing Processor Storage in an all Real Environment, available on the Web at:
ftp://ftp.software.ibm.com/software/mktsupport/techdocs/allreal_v11.pdf

 CICS data sets
In addition to the shared CICS files, the CICS journals, GCD, and LCD are critical files whose loss can impact a single CICS region.
 DB2 data sets
The DB2 active logs, bootstrap data sets (BSDS), and archive logs all support duplexing, and we recommend using duplexing for these data sets. If those data sets are SMS-managed, you should also consider using the DFSMS Data Set Separation feature to ensure that the data sets are allocated on separate physical control units. The Data Set Separation feature is described in the section entitled “Specifying the DS Separation Profile” in z/OS DFSMSdfp Storage Administration Reference, SC26-7402.
 IMS data sets
IMS provides support for duplexing its Online Log Data Sets (OLDS), Write Ahead Data Sets (WADS), and Multiple Area Data Sets (MADS). While we recommend the use of duplexed OLDS and MADS, many customers decide not to duplex the WADS because of the CPU cost of duplexing and the very small likelihood that the duplex copy would ever be needed. Some customers rely on RAID technology to protect them from a DASD volume failure, and therefore do not duplex the OLDS or MADS either. However, RAID will not protect you from a failure at the control unit level. For this reason, we recommend duplexing those data sets and spreading them over multiple physical control units.
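As a sketch of the spare local page data set recommendation above: Assuming a spare was defined in advance with IDCAMS DEFINE PAGESPACE (the data set name is hypothetical), it can be brought into use with the PAGEADD command, and the D ASM command displays the resulting paging configuration:

   PAGEADD PAGE=SYS1.SYSA.LOCAL4
   D ASM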

You must use dual RECON data sets, with a spare always defined. Each data set should be in a different catalog, and the catalog should be on the same volume as the respective data set. More information about how to set up the RECONs can be found in the sections entitled “Planning Considerations for the RECON” and “Hints and Tips for DBRC” in IMS V8 DBRC Guide and Reference, SC27-1295.

You should ensure that the two copies of these data sets are kept failure-isolated from each other. If these data sets are SMS-managed, you can use the DFSMS Data Set Separation feature to ensure that the data sets are allocated on separate physical control units. For IMS Version 7 and later, databases can be defined as HALDB partitions, which allows the partitions to be managed and maintained individually without impacting the other partitions.
 Application databases, other application data, and data sets containing vendor products and data.

3.1.3 Sysres and master catalog sharing
Sharing a master catalog, the sysres, and other common infrastructure data sets is generally a good thing to do, but there are some things you must consider.

On the one hand, the benefits of sharing a common entity (sysres, catalog, or data set) are:  You will have fewer versions of that entity to manage.  There are fewer opportunities to make a mistake.  There is less chance of the entity being impacted by a hardware failure.  There is no chance of multiple copies of the same entity getting out of synch with each other.  It is easier to keep track of where changes and maintenance have been applied.

While every installation likes to think that they have strong Systems Management practices, in reality more outages are caused by Systems Management-related failures than by any other single reason. If your Systems Management practices are not foolproof, you will receive a bigger benefit from having only one copy of each of these entities. The IBM Redbook Parallel Sysplex - Managing Software for Availability, SG24-5451, discusses the benefits and drawbacks of sharing these system files.

On the other hand, the obvious exposure of only having a single version of each of these entities is that you now have a single point of failure. The entity is a single point of failure from two perspectives:  The device that the entity resides on could suffer a hardware or connectivity failure.  A mistake could be made while updating that entity, “breaking” the only version of that entity.

You can protect yourself against hardware failures by using GDPS/PPRC HyperSwap or by having a cloning process to create two identical versions of the one entity. You would only ever manually update the “master” copy, and then use your cloning process to propagate those changes to the second copy. You would then run half the systems off each copy.

Cloning in this manner will not, however, protect you from logic errors. If you apply a bad PTF to a sysres and then clone that sysres, every system that is IPLed using either physical sysres will potentially suffer the same problem, and therefore not be protected by the fact that they are running off a different physical copy.

The best way to protect yourself in these cases is through good Systems Management practices. For example, every time you update a Parmlib member, there should be a convention for what the old member is renamed to. Then, if the new member is found to be in error, everyone knows how to back out to the old member. For sysres volumes, we would never

recommend IPLing a number of systems off a new sysres until that sysres has been well tested; this, again, is a matter of good Systems Management practices.

Another consideration for using just one copy of a given entity is that it could potentially have an impact on performance. In normal operation, most of these entities would not be very intensively accessed. However, in the very unusual situation that a sysplex IPL is required, the IPL process could be slowed down if a very large number of systems are all IPLing from the same physical sysres at the same time. This is an unusual and unlikely scenario; however, you should be aware of this when making a final decision.

Recommendations for sysres sharing
 Read the IBM Redbook Parallel Sysplex - Managing Software for Availability, SG24-5451. This book discusses many of the concepts for designing your system data set layout.
 Use a cloning process to implement a shared sysres. This will allow you to have a spare copy that can be swung into production quickly.
 Establish good naming conventions, and use the facilities of the parmlib member IEASYM00 to create system symbols that will allow the sysres data sets to be shared easily (a sketch follows).
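A minimal, hypothetical IEASYM00 sketch of this technique follows; the volume serial, system names, LPAR names, and IEASYSxx suffixes are illustrative only, not a definitive layout:

   /* IEASYM00 - one member shared by all systems                   */
   SYSDEF   SYMDEF(&SYSR2='ZOSR2A')    /* 2nd volume of shared sysres */
   SYSDEF   LPARNAME(PROD1)
            SYSNAME(SYSA)
            SYSPARM(A1)                /* SYSA uses IEASYSA1          */
   SYSDEF   LPARNAME(PROD2)
            SYSNAME(SYSB)
            SYSPARM(B1)                /* SYSB uses IEASYSB1          */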

3.2 Consoles

Consoles are a critical resource in a Parallel Sysplex. The complete loss of consoles removes the operator's ability to control and interact with the system. If there is a prolonged loss of consoles, the CONSOLE address space buffers will eventually fill up, causing loss of messages and giving the appearance that the system is no longer functional. The Console Restructure code, delivered as a separate FMID (JBB7727) on top of z/OS 1.4, and in the base of z/OS 1.5, changes the design of console buffer handling so that this type of problem should no longer be encountered. We recommend that you install this level as soon as possible.

Another reason that the correct console configuration is so important in a sysplex is that the way consoles are managed in a sysplex is fundamentally different from how they were managed in a multisystem environment that was not a sysplex. Many installations are not fully aware of the impact of these changes, and have not changed their console configuration accordingly as part of their migration to a sysplex environment.

We recommend that installations use the IBM z/OS Health Checker to check their console configuration and setup on a regular basis. The Health Checker has many checks specifically relating to consoles. You should also ensure that you install new levels of the Health Checker as they become available, and check the z/OS and Sysplex Health Checker User’s Guide, SA22-7931, for the latest checks related to consoles. If it is run on a regular basis (weekly or even daily), many aspects of console setup and use will be monitored, and automation can be set up to act on any out-of-line situations the Health Checker detects. Alerts sent either to the automation console or directly to systems programmers allow appropriate actions to be taken before a problem escalates, potentially into a complete system failure. Refer to “IBM Health Checker for z/OS and Sysplex” on page 92 for more information about the Health Checker. This section discusses some considerations for the console configuration.

3.2.1 Addressing WTO and WTOR buffer shortages
The recommendations are:

 We recommend implementing the Console Restructure code, either by adding the feature to a z/OS 1.4 system or by migrating to z/OS 1.5. The need for aggressive WTO/WTOR buffer shortage automation is substantially reduced by this feature. Messages destined for (S)MCS consoles are buffered in 24-bit storage in the console address space until that area is filled, and are then buffered above the line in the console address space until that area is filled in turn. When both areas are filled, message queueing to (S)MCS consoles stops and, if the message cache wraps, messages may not be delivered to (S)MCS consoles. They will still be delivered to the SSI and the log.
 You should monitor your consoles periodically to detect console problems that can cause messages not to be removed from message buffer storage.
 If you do not have existing automation in place to handle WTO and WTOR buffer shortages, we recommend using the WTO automation provided in SA/390 V2 and msys for Operations. These deliver pre-defined automation that addresses WTO and WTOR buffer shortage situations. If a WTO buffer shortage situation occurs, msys for Operations:
– Increases the number of buffers to relieve the immediate shortage
– Changes the properties of the affected console
– Cancels the jobs that are consuming all the WTO or WTOR buffer space, if you permit that action
 If you do not exploit this capability, you should implement your own automation to react to the following messages:
– IEA405E WTO BUFFER SHORTAGE - 80% FULL
– IEA404A SEVERE WTO BUFFER SHORTAGE - 100% FULL
– IEA230E WTOR BUFFER SHORTAGE. 80% FULL
– IEA231A SEVERE WTOR BUFFER SHORTAGE. 100% FULL
– IEA652A WTO STORAGE EXHAUSTED - WTOs WILL BE DISCARDED
– IEA653I JOBNAME= jobname ASID= asid HAS REACHED nn% OF THE WTO BUFFER LIMIT
Note that this message is replaced by the following message when the Console Restructure code is installed:
CNZ3011I JOBNAME= jjjjjjjj JOBID= jjjjjjjj ASID=aaaa HAS REACHED 50% OF THE WTO BUFFER LIMIT
– IEA654A WTO BUFFER SHORTAGE ON SYSTEM sysname1 IS AFFECTING MESSAGES ON SYSTEM sysname2
– IEA099A JOBNAME= jobname ASID= asid HAS REACHED THE WTO BUFFER LIMIT
Note that this message is replaced by the following message when the Console Restructure code is installed:
CNZ3012A JOBNAME= jjjjjjjj JOBID= jjjjjjjj ASID=aaaa HAS REACHED THE WTO BUFFER LIMIT
 The following are possible causes of WTO buffer shortages, and they are additional areas that should be considered for automation:
– The buffer limit is too low to handle the message traffic.
– A console might not be ready because a multiple-line WTO message is hung.

Note that hung multiple-line WTO situations cannot occur with the Console Restructure. Although the individual lines of an in-progress multi-line WTO will continue to be seen by MPF and the SSI as they are created, the multi-line message will be held in a data space until an end-line is received, or until an end-line is forced by a time-out, end-of-task, or end-of-memory. Only complete multi-line WTOs will be sent to consoles or to the

SYSLOG and/or OPERLOG.
– An intervention-required condition exists on one of the consoles.
– A console has been powered off.
– The path to the device is not fully functional; for example, an I/O interface is disabled.
– The console is not in roll mode.
– The console has a slow roll time.
– Someone has pressed Enter on a console, putting that console into Hold mode. Any console in Hold mode will automatically come out of Hold mode when 80 percent of the buffers are in use. There is no way to define a console so that it will not go into Hold mode when Enter is pressed.
– The console is in roll deletable (RD) mode, but the screen is filled with action messages.
– An application is in a loop, issuing messages with or without a needed reply.
– There is a WTO/WTOR buffer shortage on another system in the sysplex. If the Console Restructure code is applied, a message buffer shortage on one system in the sysplex cannot affect another system in the sysplex: Messages are sent to other systems directly from the message issuer's address space and no longer occupy CONSOLE address space message buffers.
– JES2 might have a problem with the continuous writing of the syslog, causing the hardcopy function to hang. With the Console Restructure, messages being written to the SYSLOG and/or OPERLOG no longer occupy CONSOLE address space message buffers.
 In order to enable the efficient use and management of the consoles, the operators should have the following documentation:
– Mapping of MCS, SMCS, and system consoles
– Console operations and recovery procedures, including:
• The use of command prefixes, the ROUTE command, and so on
• Recovering from console failures and performing console switches
• Switching console modes and recovering from buffer shortages. The CONTROL Q (K Q) command can be used to recover from console buffer shortage and console failure situations. However, if the Console Restructure feature is installed, the CONTROL Q command can no longer be used to requeue messages from one console to another. The CONTROL Q command may still be used to remove (free) messages queued to a console, but the R= (requeue) parameter is no longer supported.
– How to use the VARY CN(*),ACTIVATE command to enable the system console when needed for recovery
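A hedged sketch of the manual relief actions mentioned above follows; the limit values are purely illustrative and should be chosen to suit your own message rates:

   K M,MLIM=6000
   K M,RLIM=50
   K Q
   V CN(*),ACTIVATE

The first two commands raise the WTO and WTOR buffer limits, the K Q command frees the messages queued to a console, and the last command activates the system console for recovery.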

A document discussing console performance in a sysplex can be found at the following URL:
http://www.ibm.com/servers/eserver/zseries/library/techpapers/pdf/gm130166.pdf

3.2.2 EMCS consoles
Once an EMCS console is defined and activated, it lives for the life of the sysplex, whether it remains active or not. If the number of EMCS consoles (including inactive consoles) becomes very large, console initialization during IPL can be extended by minutes. This may occur due to an error in NetView® setup, or if a CLIST does not reuse EMCS console names, resulting in an ongoing increase in the number of EMCS consoles defined.

If this condition is not detected early enough, the only way to reduce the number of EMCS consoles is a sysplex IPL.

Controlling EMCS console characteristics
The attributes of an Extended MCS (EMCS) console can be specified in the OPERPARM segment of the RACF profile for the associated user ID.

The default values for all attributes are listed in Table 3-1.

Table 3-1   Default EMCS attributes

Attribute      Default
ALTGRP         No default
AUTH           INFO
AUTO           NO
CMDSYS         *
DOM            NORMAL
KEY            NONE
LEVEL          ALL
LOGCMDRESP     SYSTEM
MFORM          M
MIGID          NO
MONITOR        JOBNAME SESS
MSCOPE         *ALL
ROUTCODE       NONE
UD             NO

You can see that quite a few of the defaults go against our recommendations (MSCOPE=*ALL, for example). Therefore, take care when setting up user IDs associated with EMCS consoles to ensure that you specify the attributes that you actually require.
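For example, a minimal sketch of setting the OPERPARM segment for an automation user ID; the user ID and the attribute values are hypothetical and should reflect what that console actually needs:

   ALTUSER AUTOOPER OPERPARM(AUTH(MASTER) MSCOPE(*) ROUTCODE(NONE) MONITOR(JOBNAMES) UD(NO))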

3.2.3 Using the HMC as a console
Configure the hardware system console (DEVNUM(SYSCONS)) in your CONSOLxx member. This allows the system to be operated from the Hardware Management Console (HMC) should all other consoles be unavailable for some reason. The system console function on the 9672 G5/G6 and zSeries CPCs can be used just like any other console; however, the interface is not conducive to normal, line-mode operation like the MCS console is. The performance shortcomings that affected earlier implementations have been removed. With current HMC microcode levels, the biggest usability issue has

been fixed: It now provides an unlocked command entry area. Previously, you had to open a command window every time you wanted to issue a command.

The Console Availability Feature, available as a non-priced feature for z/OS 1.4 and integrated into z/OS 1.5, further improves system console support by providing an automatic means of enabling and disabling the flow of messages to the system console. The consoles specified in the new AUTOACT console group control when messages flow to the system console:
 If none of the consoles in the group are active, the system console is automatically put into problem determination (PD) mode. (PD mode is discussed in z/OS MVS Planning: Operations, SA22-7601.)
 If any console in the group is active, the system console is taken out of PD mode, if it had previously been automatically put into PD mode.
The consoles specified in the group can be of any type (MCS, SMCS, or EMCS) and may be attached to any system in the sysplex. For installations using SMCS consoles, this support ensures that messages are sent to the system console when VTAM (required by SMCS) is not available.
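A minimal sketch of such a group in a CNGRPxx parmlib member follows; the console names are hypothetical, and the member syntax should be verified against z/OS MVS Planning: Operations, SA22-7601:

   /* CNGRPxx - consoles that gate message flow to the system console */
   GROUP NAME(AUTOACT)
         MEMBERS(MCSA01,MCSB01)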

If only SMCS consoles are used, the system console plays an important role in availability: NIP messages and SYNCHDEST messages are displayed on the system console. If VTAM fails and all SMCS consoles become unavailable, the system console becomes the “console of last resort” and must be used to re-establish system availability.

Even though we do not in general recommend the use of the HMC as a console, there is a situation where it can provide valuable benefits. An advantage of using the HMC is that you can scroll back and see messages issued earlier in the NIP (IPL) process, whereas with a channel-attached NIP device, once the messages roll off the screen, they cannot be viewed until the system is fully up and running. This can be very helpful when trying to diagnose problems early in the IPL process.

3.2.4 Hardware consoles
Certain sysplex-related hardware devices have their own consoles; specifically, the ESCON Directors and Sysplex Timers. The consoles for both of these devices are LAN-attached, and both devices support more than one console. We recommend having at least one console for each device type (it may be acceptable to use a single PC to fulfill both console functions) in the operators' area, and at least one other console close to the hardware for the service representative to use during maintenance or repair.

3.2.5 Console setup recommendations
When the console component of MVS was originally designed, 1 MIPS was a large system. As the size of CPCs has grown, and the console component has grown to include all systems in a sysplex, the console component must handle volumes of messages hundreds or thousands of times higher than it was designed for. As a result, it is possible to encounter console-related problems that, in some cases, can impact availability.

The Console Restructure, initially delivered as a separate feature on z/OS 1.4, is intended to change the way the console component works, bearing in mind these new loads. Until the Console Restructure code is installed, there are things that can be done to reduce your chances of having a console-related problem. This section contains an extensive list of things you should or should not do to minimize the risk of encountering a console problem.
 Ensure that there are a minimum of two MCS or SMCS consoles configured in the sysplex, and that there are no single points of failure related to these consoles. While it is possible to use the HMC as an operating system console, we strongly recommend against this. The HMC is the primary mechanism for alerting operators to hardware problems, and

using the HMC as a console will mask that capability. See “Using the HMC as a console” on page 79 for a more detailed discussion of using the HMC as a console.
 If you have a limited number of consoles, you should ensure that those consoles are attached to different CPCs, are separate physical devices, have separate power supplies, are connected to different 2074s, and so on. If you are using 2074s for console connectivity, you should have at least two of them.
 The maximum number of consoles (MCS, SMCS, and subsystem) in a system or a sysplex is 99. Ensure that your configuration does not exceed this limit. Verify that software products (automation, TSO, NetView, and so on) using console services are exploiting Extended MCS console support, and not subsystem consoles. EMCS consoles do not count towards the limit of 99. There are 150 additional 1-byte migration IDs available for software products that do not support the Extended MCS 4-byte console IDs. Note that 1-byte console IDs and 1-byte migration console IDs will no longer be supported in z/OS 1.7. Use the 1-byte console ID tracking facility provided in z/OS 1.4 and later to identify users of 1-byte console IDs and migration console IDs. You should convert to 4-byte console IDs wherever feasible, migrate to product levels that exploit the 4-byte console IDs, or encourage your software vendor to provide 4-byte console ID support.
 Review console configurations and, wherever possible, consolidate consoles that perform duplicate functions to eliminate unnecessary consoles. This is especially relevant when bringing an existing system (with its own set of consoles) into the sysplex. With the ability to route messages from all systems to a single console, it should be possible to reduce the total number of consoles required to manage multiple systems.
 Ensure that systems with attached MCS consoles are the first systems IPLed into the sysplex, and the last systems to leave the sysplex. Update your startup and shutdown IPL procedures to ensure that systems that have consoles attached are never all (intentionally) deactivated at the same time, to protect against potential “no console” conditions.
 Assign a unique name to all MCS and subsystem consoles. SMCS consoles always require a name. If the Console Restructure support is installed on z/OS 1.4, and in all subsequent releases, console names are required for all consoles (if the system console is not named, its name defaults to the name of the system). If a system has a console that is not named, that system will not be allowed to join the sysplex. Because unnamed subsystem consoles do not get released when the owning system is IPLed, repeated system IPLs may result in the 99-console limit being exceeded. Should this happen, console slots already in use may be freed either by using the IEARELCN program supplied in SYS1.SAMPLIB and documented in z/OS MVS Planning: Operations, SA22-7601, or by the much less desirable action of performing a sysplex-wide IPL (where all systems are partitioned out of the sysplex before any system is re-IPLed back into the sysplex).
 Ensure that the console attributes defined in the CONSOLxx member match these recommendations: If possible, use a single shared CONSOLxx member for all systems in the sysplex. Use system symbols if you need to tailor CONSOLxx to be system-unique.

Note that if you use the same OS CONFIG and the same CONSOLxx member on all systems in a sysplex, and you do not assign a name to each console, you will end up with many more consoles than you probably expect. For example, if you have a console defined with DEVNUM(2200), and the CONFIG used by each system has a screen at device number 2200, you will get x number of consoles defined at device number 2200, where x is the number of systems in your sysplex. On the other hand, if you name your

consoles (as you must do when you implement the Console Restructure code), the first system to be IPLed with device 2200 online would get a console at device number 2200; a console at that address can only be online to one of the systems in the sysplex at a time. If you actually want to have multiple consoles with the same device number (which we would not recommend), you could use a console name like &SYSNAME.2200

(assuming that &SYSNAME is four characters or less; otherwise you might choose a user-defined symbol that resolves to four characters or less, is unique on each system, and will resolve to a string that is valid as a console name). This would resolve to a different console name on each system and thereby allow you to have multiple consoles with the same device number. When operating in a multi-system sysplex, certain CONSOLxx keywords have a sysplex scope. When the first system is IPLed, the values specified on these keywords take effect for the entire sysplex. Table 3-2 summarizes the scope (system or sysplex) of each CONSOLxx keyword. If multiple CONSOLxx members are used, ensure that at least the sysplex-wide parameters are consistent across the sysplex members, to avoid confusion. For example, if one system has RLIM set to 999, then all systems should have RLIM set to 999.

Table 3-2   Scope of CONSOLxx keywords

Statement   Keywords
CONSOLE     DEVNUM, NAME, UNIT, ALTGRP, PFKTAB, AUTH, USE, CON, SEG, DEL, RNUM, RTME, AREA, UTME, MSGRT, MONITOR, ROUTCODE, LEVEL, MFORM, UD, MSCOPE, CMDSYS, SYSTEM, LOGON, LU
INIT        PFK, AMRF, MONITOR, RLIM, CMDDELIM, CNGRP, MPF, ROUTTIME, UEXIT, GENERIC, MMS, MLIM, LOGLIM, NOCCGRP, APPLID
DEFAULT     LOGON, RMAX, HOLDMODE, ROUTCODE, SYNCHDEST
HARDCOPY    All keywords

(The system or sysplex scope of each individual keyword is documented in z/OS MVS Planning: Operations, SA22-7601.)

 Define the system console (DEVNUM(SYSCONS)) explicitly in the CONSOLxx member. The HMC can still be used even if a console definition for DEVNUM(SYSCONS) does not exist, but specifying it allows you to control the attributes of that console. The system console should be defined with AUTH=MASTER; this is especially important if the system console is used as a backup to the sysplex master console.
 Try to avoid using MSCOPE=(*ALL), as this will generate a lot of console messages and system-to-system console traffic. If you must use MSCOPE=(*ALL), you should have implemented a high level (aim for a target of 95 percent) of message suppression to limit the volume of messages sent to any console defined in this way. Use MSCOPE=(*ALL) only on the few, select consoles that really require a sysplex-wide scope. Use MSCOPE=(*), or MSCOPE with a specific subset of systems, as much as possible. The objective should be to avoid having all the messages routed to all the systems in the sysplex.
 Ideally, no consoles will be defined with ROUTCODE=ALL. If you do have consoles defined in this way, they should at least have an MSCOPE pointing at only one system. ROUTCODE 11 messages are intended for systems programmers, and therefore there is no need to display those messages on any console (they will show up in SYSLOG). For this reason, ensure that no consoles are defined to receive this route code.
 Utilize alternate console groups to ensure that MCS and SMCS consoles have backups, by using ALTGRP instead of ALTERNATE when defining consoles. Note that SMCS consoles only support ALTGRP. The Console Restructure code removes support for the ALTERNATE keyword on the CONSOLE statement; similarly, the ALTCONS keyword is no longer supported on the VARY command. The ALTGRP mechanism is now the only way of defining backup consoles. The sysplex has only one MASTER (COND=M) console. The sysplex master console is assigned when the first system is IPLed into the sysplex or, if no MCS consoles are available, it is the first SMCS console with master authority to be activated. It is therefore possible for the sysplex master console to change depending on which system is IPLed first. Using ALTGRPs will minimize the risk of the sysplex entering a “no master console” (IEE141A) condition. As part of defining the ALTGRPs, ensure that all MCS and SMCS consoles have backups on other systems in the sysplex.
 Undelivered messages may no longer be queued to a “console of last resort” or to the hardcopy message set. The Console Restructure feature removed support for the undelivered message (UD) keyword on the CONSOLE and HARDCOPY statements and on the VARY command.

Undelivered messages will continue to be written to the SYSLOG and/or OPERLOG. If you need to view these messages, you must configure a log browsing product, such as SDSF, to ensure your ability to do so. The Console Restructure feature also removed the ability for the hardcopy message set to be assigned to an output-only console device, by removing the DEVNUM parameter

from the HARDCOPY statement. The hardcopy message set can only be directed to SYSLOG or OPERLOG. The HCPYGRP parameter is also no longer supported.
 Define the SYNCHDEST group to provide predictability in the rare event of a synchronous (emergency, disabled) message. SMCS and EMCS consoles will be ignored if specified in the SYNCHDEST group.
 Define the AUTOACT group if the Console Restructure feature is installed. The AUTOACT console group controls the automatic enabling and disabling of the flow of messages to the system console.
 Minimize the route codes that are defined to a console to only those that are needed by the console's function. The objective is to avoid having all route codes defined to all the consoles defined to a system.
 Ensure that all MCS and SMCS consoles have backups on other systems in the sysplex. For every system that has a console connected to it, at least one of those consoles should be defined with AUTH=MASTER. Ensure that all of these consoles have an ALTGRP defined.
Use the following functions for operational efficiency:
 Implement message suppression (MPFLST) on all systems. You should suppress from display on the consoles all non-essential message traffic. Suppression rates of 95 percent and higher are typically achieved on well-managed systems. In addition to suppressing messages, you may wish to consider selectively deleting messages. Messages that have been deleted will not be displayed, will not be logged, and will not be presented to automation, so you need to think carefully about any messages that you wish to delete. Prior to the Console Restructure feature, deleting a message (using an MPF exit or a subsystem on the subsystem interface) caused the message to be marked for deletion, but the message still traversed the entire message path and occupied below-the-line message buffer storage. When the Console Restructure code is installed, deletion occurs immediately after MPF and subsystem interface (SSI) processing has occurred. Deleted messages are no longer sent to other systems in the sysplex and no longer occupy below-the-line message buffers, so deleting messages can now provide performance and storage benefits if the message is not really needed.
 The Command Prefix Facility (CPF) allows you to enter a subsystem command on any system and have the command automatically routed to the system in the sysplex that supports the command, without any need for the operator to know which system that is.
 Use CMDSYS to automatically route commands to a target system other than the system to which the console is attached. This reduces the need to have consoles configured on all the systems in the sysplex.
 The Action Message Retention Facility (AMRF) saves action messages across the sysplex so that they can be retrieved later via a D R command. This list of messages is scanned every time a Delete Operator Message (DOM) is issued in the system, and it can impact console performance if allowed to grow too large. Verify that you are retaining only the messages you need to retain; this can be controlled by the MPFLSTxx member. We recommend that you do not retain any ‘E’ type (eventual) action messages.
 Ensure that the CONSOLE address space is assigned to the default service class of SYSTEM.


 Verify automation procedures to determine whether modifications are required. Automated message activities may need to be modified to determine the originating system that issued a message before taking action.
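To pull several of these recommendations together, the following is a minimal, hypothetical CONSOLxx fragment; the device number, names, group names, and route codes are illustrative only, and the console name assumes &SYSNAME resolves to six characters or fewer:

   CONSOLE  DEVNUM(2200)           /* one named console per system      */
            NAME(&SYSNAME.C1)
            AUTH(MASTER)
            ALTGRP(MSTGRP)         /* backups via ALTGRP, not ALTERNATE */
            MSCOPE(*)              /* avoid *ALL                        */
            ROUTCODE(1,2,9,10)     /* not ALL, and not route code 11    */
   CONSOLE  DEVNUM(SYSCONS)        /* define the system console         */
            AUTH(MASTER)
   INIT     RLIM(999) AMRF(Y) CNGRP(00) MPF(00)
   DEFAULT  RMAX(9999) SYNCHDEST(SYNCGRP)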

3.3 Coupling Facility management

There are a number of aspects to consider in relation to the role your Coupling Facilities play in the availability of your sysplex, systems, and subsystems (and therefore your applications). The first one is to ensure the CFs are physically configured so as to provide the level of failure isolation required by your CF exploiters. This topic is discussed in detail in 2.3, “Coupling Facilities” on page 24.

The next one is to ensure that you define the CFs and structures in a manner that will help achieve the desired levels of availability and performance. We discuss this in 3.3.1, “Defining CFs and structures” on page 85.

The final aspect is how you monitor and manage your CFs. The CF provides great flexibility; for example, you can dynamically move the contents of a CF to another CF, recycle the CF, and then repopulate it, all without impacting the applications using the structures in that CF. However, to exploit this flexibility, you must understand the correct ways of interacting with the CF. This will be discussed in 3.3.5, “Structure monitoring” on page 90.

3.3.1 Defining CFs and structures
Your active CFRM policy contains the definitions of the CFs and structures you will be using. In fact, you can have a number of CFRM policies, but only one will be active at any given time. If you need to make a change to your CFRM policy, you should make the change to a policy other than the current one. For example, if you are currently using policy CFRM04, you should make the change to CFRM05. This allows you to easily back out to a working policy should there be any problem with the new policy.

Defining the CFs
The policy should contain all the CFs used in normal operations. You may consider having a third CF that is only used if one of the normal CFs is down (to avoid having a single point of failure during that time). In that case, careful thought must be given to the PREFLISTs to ensure that all structures end up in an appropriate CF in case of a failure of either of the “normal” CFs. While it is possible to also include any disaster recovery CFs in your policy, we recommend maintaining a separate set of CDSs that are only used in an emergency, so you may decide to have a separate disaster recovery policy that contains only those CFs.

You also need to set aside an amount of CF storage to contain structure control information should a dump be taken that specifies that structure data is to be included. The reserved storage in the CF is used to capture a copy of the structure control information as quickly as possible. After serialization on the structure is released, the dump information is sent to the operating system, where it is included in the dump data set. As a rule of thumb, we recommend setting aside about 5 percent of the total amount of storage available in the CF. This value is then specified on the DUMPSPACE parameter in your CFRM policy.
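A minimal sketch of defining a CF, with its DUMPSPACE, in a new CFRM policy using the IXCMIAPU administrative data utility follows; the policy name, CF identification values, and dump space amount are all hypothetical and must match your own hardware:

   //CFRMPOL  EXEC PGM=IXCMIAPU
   //SYSPRINT DD  SYSOUT=*
   //SYSIN    DD  *
     DATA TYPE(CFRM) REPORT(YES)
     DEFINE POLICY NAME(CFRM05) REPLACE(YES)
       CF NAME(CF01)
          TYPE(002084)
          MFG(IBM)
          PLANT(02)
          SEQUENCE(000000012345)
          PARTITION(E)
          CPCID(00)
          DUMPSPACE(2000)
   /*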

Defining structures
The CFRM policy contains information about your structures, and about how those structures are to be distributed across the available CFs. As such, it contains information that has a

fundamental impact on the availability and performance of those structures. The way you place your structures affects the load balancing across your CFs:
 CF CPU utilization
To deliver consistent, acceptable response times, it is important that the CPU utilization of your CFs is fairly evenly balanced. This is especially important if you are exploiting System-Managed CF Structure Duplexing. It is important to understand that all CF requests are not created equal: Just because two CFs are processing a similar number of requests does not mean that both will have similar CPU utilizations. As a rule of thumb, if a lock request consumes one unit of CF CPU, a cache request will consume two units, and a list request will consume three units. Bear this in mind when identifying the preferred CF for each structure.
 CF subchannel utilization
High levels of contention on CF subchannels result in delayed requests and elongated CF response times. In order to achieve optimum performance and availability, you want to aim for the shortest response times possible within the capabilities of your configuration. Therefore, you should monitor CF subchannel utilization and consider adding more CF links, or converting to peer mode links, if subchannel utilization exceeds 30 percent or the PATH BUSY counts in the SUBCHANNEL ACTIVITY section of the RMF COUPLING FACILITY ACTIVITY report exceed 10 percent.
 CF storage utilization
Of the three types of resources associated with a CF (CPU, links and subchannels, and storage), storage is the least important one to keep balanced. You must ensure, however, that all CFs contain enough storage to be able to hold all critical structures in case of a planned or unplanned CF outage. This is discussed in more detail in “CF memory” on page 35.

Bearing these resources in mind, and giving priority to any failure-isolation requirements, you should plan for the placement of your structures. If you get it wrong, or if the usage of a structure changes over time, it is very easy to fix (just update the PREFLIST in the policy and rebuild the structure), so you do not need to be overly concerned about getting everything exactly right on the first attempt.

3.3.2 Structure placement
The placement of structures is very important to availability. Structure placement is specified using the preference list (PREFLIST) and exclusion list (EXCLLIST) parameters of the CFRM policy definition.

Spreading the structures among the different CFs takes a good deal of planning. If you are not using System-Managed CF Structure Duplexing, which is discussed later in this chapter, you have to consider what types of CFs you are using when allocating structures: whether you have two stand-alone CFs, one internal and one stand-alone CF, or two internal CFs (ICFs). Using an ICF creates the possibility of a double failure, that is, the sysplex losing both an ICF and a z/OS LPAR running one or more subsystems at the same time. What we describe here is a recommendation for structure placement if you have configured your sysplex with an ICF and a z/OS image within the same CEC. However, if the z/OS image does not have any subsystem that is connected to structures in the ICF, it will not be in the same failure domain, and a double failure cannot occur.

The WSC document W98029 discusses structure placement:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/W98029

For a discussion of Failure Independence, please refer to Coupling Facility Configuration Options, GF22-5042.

3.3.3 Structure rebuild considerations

To be able to rebuild a structure, the following criteria must be met:
 There is at least one active connection to the original structure instance.
 All the connected instances of the exploiter code agree to rebuilding the structure (that is, all of them have connected with rebuild allowed).
 For the rebuild process to be considered a viable means of recovery or maintenance:
– There must be enough physical resources available, in terms of the coupling facilities defined in the active preference list and their processor storage, to allow the creation of the new (and temporarily duplicate) instance of the structure.
– All the potential and active exploiters of the structure must have connectivity to the new instance of the structure.
When repopulating a CF, always use the POPULATECF parameter of the SETXCF START,REBUILD command (see the example below). This rebuilds the structures serially, thereby having less impact on performance. Also, POPULATECF will move XCF structures if they need to be moved; using SETXCF START,REBUILD with LOC=NORMAL will not move XCF structures.
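For example, after a CF has been returned to service following maintenance, the following command (the CF name is hypothetical) moves back, one at a time, the structures whose preference lists favor that CF:

   SETXCF START,REBUILD,POPULATECF=CF01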

3.3.4 Structure duplexing
Structure duplexing involves maintaining a duplicate copy of a structure in another CF, which may be used in the event of an unplanned outage.

It is specified through the DUPLEX parameter on the STRUCTURE statement in the CFRM policy, which can have the following values:  DISABLED Neither user-managed nor system-managed structure duplex rebuild can be started for the specified structure. This is the default.  ALLOWED An application can initiate its own user-managed or system-managed structure duplex rebuild for the specified structure, or it can be requested via the SETXCF START,REBUILD,DUPLEX operator command. No attempt is made by the system to restart duplexing should it be stopped for any reason.  ENABLED The system will initiate a user-managed or system-managed structure duplex rebuild for the specified structure. In addition, the system will initiate a restart of duplexing should it be stopped for any reason. To stop structure duplexing requires a CFRM policy change to one of the other options.

User-managed structure duplexing

User-managed structure duplexing was introduced with OS/390 1.3 with APAR OW28460 and is valid for cache structures only. List and lock structures are not supported. It was developed to address the shortcomings associated with user-managed structure rebuild where it is impossible or impractical for the structure exploiters to reconstruct the structure data when it is lost as a result of failure.

With user-managed structure duplexing, the structure exploiter can build and maintain a duplexed copy of the data in advance of any failure, and simply switch over to using the unaffected structure instance if and when a failure occurs. User-managed structure duplexing requires significant exploiter support, both to create the duplex structure and then to manage the duplexing process.

When requested by the application, by operator command, or automatically if DUPLEX(ENABLED) is specified, z/OS coordinates the duplex structure rebuild process with all of the active connected exploiters of the structure. Each of the exploiters participates in the steps of allocating the new structure instance, propagating the necessary structure data to the new structure, and then keeping both structure instances allocated indefinitely.

Once the duplex structure has been established, the connected exploiters must then duplex their ongoing structure updates into both structure instances, using their own unique serialization or other protocols for ensuring synchronization of the data in the two structure instances.

Stopping structure duplexing via the SETXCF STOP,REBUILD,DUPLEX command requires the KEEP option to be specified to determine which structure instance is to be retained: KEEP=NEW specifies that processing should switch to the new structure instance, while KEEP=OLD retains the original structure instance. If the CFRM policy specifies DUPLEX(ENABLED) for the structure, the system will attempt to initiate duplexing again, so a change to the CFRM policy is required if this is not desired.
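For example, to revert a structure to simplex mode and keep the original instance (the structure name is hypothetical):

   SETXCF STOP,REBUILD,DUPLEX,STRNAME=DSNDB0G_GBP1,KEEP=OLD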

The only IBM structures that can use user-managed structure duplexing are DB2 Global Buffer Pools.

System-managed structure duplexing
System-managed structure duplexing is a general-purpose, easy-to-exploit mechanism for duplexing CF structure data. It provides a robust recovery mechanism through rapid failover to the unaffected structure instance of the duplex pair, with very little disruption to the ongoing execution of work by the structure exploiter.

System-managed structure duplexing was made available in z/OS 1.2 with APAR OW41617. HCD APAR OW45976 is also required in order to allow sender CHPIDs to be defined in a CF partition.

In addition to the software, at least two CFs are required with CFLEVEL=11 or higher installed, and CF-to-CF links installed between each CF that will participate in a system-managed structure duplex pair as follows:  The IBM zSeries 900 and z990 CFs are able to support a single bidirectional link; however, multiple bidirectional links are recommended for the highest availability.  The S/390 G5, G6, or R06 CFs require a sender-to-receiver link in each direction to provide the bidirectional connectivity. Again, multiple links in each direction are recommended.

System-managed structure duplexing works by z/OS splitting a single exploiter’s request into two distinct CF requests, one destined for each of the two structure instances. Once the two commands arrive at their respective CFs, they need to coordinate their execution so that the update to the duplexed structure is synchronized between the two structure instances. To do this, the CFs exchange signals over CF-to-CF links. Once both the CFs have exchanged all the required signals and completed execution, they each return their individual command responses to the z/OS system that originated them. z/OS inspects the responses, validates

that the results are consistent between the two CFs, and then merges the results of the operations into a single consolidated response to the exploiter.

System-managed structure duplexing was implemented to provide robust recovery capability for structures whose exploiters do not support user-managed structure duplexing and/or user managed structure rebuild. It is designed to handle structure failures, CF failures, or losses of CF connectivity through:  Masking the observed failure condition from the active connectors to the structure, so that they do not perform any unnecessary recovery actions.  Switching over to the structure instance that did not experience the failure.  Re-establishing a new duplex copy of the structure, if appropriate, as the CF becomes available again, or in a third CF if available.

Failures are largely transparent to the structure exploiters; however, z/OS does provide a notification of transitions into and out of the duplexed state so that the exploiter may react in other ways. For example, the exploiter may maintain another backup mechanism when the structure is operating in simplex state.

To implement system-managed structure duplexing, a new set of CFRM CDSs must be formatted with the ITEM NAME(SMDUPLEX) NUMBER(1) parameter specified. Also, new LOGR CDSs must be formatted with the SMDUPLEX(1) parameter if system-managed structure duplexing is to be used for a System Logger CF structure. Note that once the new CFRM and LOGR CDSs have been implemented, it is not possible to fall back nondisruptively to a down-level CDS that is not system-managed structure duplex capable; doing so requires a sysplex-wide IPL from the down-level CDSs.
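A minimal sketch of formatting a duplex-capable CFRM CDS with the IXCL1DSU format utility follows; the sysplex name, data set name, volume, and ITEM counts are hypothetical and must match your own configuration:

   //FMTCDS   EXEC PGM=IXCL1DSU
   //SYSPRINT DD  SYSOUT=*
   //SYSIN    DD  *
     DEFINEDS SYSPLEX(PLEX1)
        DSN(SYS1.XCF.CFRM02) VOLSER(CDS002)
        DATA TYPE(CFRM)
          ITEM NAME(POLICY) NUMBER(6)
          ITEM NAME(CF) NUMBER(4)
          ITEM NAME(STR) NUMBER(200)
          ITEM NAME(SMREBLD) NUMBER(1)
          ITEM NAME(SMDUPLEX) NUMBER(1)
   /*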

While system-managed structure duplexing is a robust fail-over solution, IBM does not recommend its use in all situations due to the performance cost as a result of:  Increased z/OS CPU utilization  Increased CF CPU utilization  Increased CF link utilization

As a result, a cost/benefit analysis needs to be done for each structure type as a part of the implementation process. Also, not all structure types support system-managed structure duplexing, including:  Allocation Shared Tape (IEFAUTOS)  Enhanced Catalog Sharing  DB2 Global Buffer Pools (support user-managed structure duplexing instead)  GRS Star  IMS VSAM cache  RACF shared databases  DFSMS RLS Cache  XCF signalling

Also, in a GDPS/PPRC multi-site configuration, system-managed structure duplexing should not be used to duplex CF structures between CFs located in different sites. The distance effect for CF-to-CF communication is highly undesirable, and the CF structure data is not preserved in GDPS site fail over situations in any event.

For further information, refer to the System-Managed CF Structure Duplexing technical paper at: http://www.ibm.com/servers/eserver/zseries/library/techpapers/gm130103.html

3.3.5 Structure monitoring
Make use of Structure Full Monitoring, and automate it so that it can alert the appropriate personnel when there has been a problem with a structure's size. Structure Full Monitoring was first introduced in OS/390 2.9.

Enable Auto Alter for the recommended structures. Starting with OS/390 2.10, you can specify whether you want the system to automatically alter a structure when it reaches an installation-defined or defaulted-to percent-full threshold, as determined by Structure Full Monitoring.

When you activate a new CFRM policy, structures might end up in a POLICY CHANGE PENDING state. Check regularly for any such situations and, if there are any, rebuild those structures at a convenient time, as illustrated below.
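A sketch of the commands involved (the structure name is hypothetical); rebuilding a structure causes its pending policy changes to take effect:

   D XCF,STRUCTURE,STATUS=POLICYCHANGE
   SETXCF START,REBUILD,STRNAME=APP1_CACHE1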

3.3.6 Structure recommendations
Apart from the placement of the structures, there are a number of other recommendations that should be kept in mind when defining your structures in the policy (a policy sketch follows this list):
 Every structure should have at least two CFs in the PREFLIST.
 Ensure that each structure has an adequate initial size (INITSIZE). Use the CFSizer to get a “ball park” figure for each structure size, or to help find out if any of your structures have a size that is grossly different from that indicated by CFSizer. The CFSizer can be accessed at:
http://www.ibm.com/servers/eserver/zseries/pso/tools.html
On a regular basis, check the RMF CF reports for the peak period to ensure that each structure is large enough. Especially ensure that the XCF signalling structures are large enough: Nearly every system component or subsystem uses XCF services, so delays within XCF can impact many components across the sysplex. The IBM Redbook S/390 Parallel Sysplex Performance, SG24-4356, will help you interpret the RMF CF reports. It is also wise to enable Structure Full Monitoring. This will issue a warning message should the associated structure exceed the threshold specified on the FULLTHRESHOLD keyword for the structure in the CFRM policy. For more information about Structure Full Monitoring, refer to the section entitled “Monitoring Structure Utilization” in z/OS MVS Setting Up a Sysplex, SA22-7625.
 Do not specify a SIZE value that is very much larger than the INITSIZE. Figure 3-2 on page 91 shows how control information used by the CF is allocated in each structure. When a structure is allocated, the CF first reserves whatever amount of space is required within the structure to contain the control blocks that will be used by the CFCC. One of the things that affects the size and number of control blocks is the maximum size of the structure (as specified on the SIZE statement). So, if SIZE is very much larger than the structure is likely to grow to, you are wasting storage in the structure and decreasing the amount of storage left for the connector's use. A reasonable rule of thumb is that SIZE should not be more than twice the INITSIZE. And remember, if it transpires that you want to make the structure larger than this value, all you have to do is update the CFRM policy with larger INITSIZE and SIZE values.
 New CF levels often result in an increase in the size of the CFCC control blocks to support new functions (such as 64-bit support) delivered in the new level. As we just discussed, the CFCC control blocks are allocated out of the storage you specify for the structure, and they reduce the amount of space available for the connector's use. If the same structure were to be allocated in two different CF levels, there would normally be more space taken up by control blocks in the higher CF level. This is shown in Figure 3-2 on page 91. For this reason, it is important to always review structure sizes when moving to a new CF level. For more information, and a suggested procedure, refer to the Technote entitled

Determining Structure Size Impact of CF Level Changes, available on the IBM Redbooks Web site.

The three numbered callouts in Figure 3-2 describe the allocation flow:
1. IXLCONN is issued, specifying the structure name, INITSIZE, SIZE, entry-to-element ratio, number of EMCs, and various other information.
2. The target CF determines how much space it needs for control blocks, based on the IXLCONN parameters and the CFLevel.
3. The structure is allocated, and the space for the control blocks is removed from the space specified on the INITSIZE.
The figure compares the same INITSIZE=10M structure allocated at two CF levels: at CFLevel 10, the CFCC control information occupies 4 M, leaving 6 M for 5000 elements and 500 entries; at CFLevel 12, the control information occupies 5 M, leaving only 5 M for 4166 elements and 416 entries.
Figure 3-2   CF Structure storage use

 If storage in a CF becomes over-committed (over 90 percent in use), XES will start to reduce the size of structures in order to free up storage. There is no way to turn off this process; however, you can protect important structures from having storage stolen by specifying a MINSIZE for those structures. This is documented in the section entitled “Allowing a Structure to Be Altered Automatically” in z/OS MVS Setting Up a Sysplex, SA22-7625.
 If the storage within a structure becomes constrained, perhaps because use of the structure has increased but the structure size has not been increased accordingly, the users of that structure may suffer performance degradation. One way to protect against that is to enable Auto Alter for the structure, via the ALLOWAUTOALT keyword in the CFRM policy. Auto Alter will automatically increase the size of a structure, up to the SIZE value, when storage use within the structure exceeds the FULLTHRESHOLD value. This can protect your performance; however, because the change is dynamic and automatic, the INITSIZE specified in your CFRM policy will no longer reflect the actual structure size. The next time the structure is deleted and then allocated again, it will once again be allocated at INITSIZE, and will have to go through Auto Alter processing again to bring it back to the current “happy” size. To protect against this, you should put automation in place so that any time a structure is altered, the person responsible for maintaining the CFRM policy is automatically notified and can make a corresponding adjustment to the structure's INITSIZE and SIZE values in the CFRM policy. Message IXC588I is issued every time an Auto Alter is started, and it contains the current structure size and the target size.
 There are some structures that should not co-reside in the same CF, mainly for single-point-of-failure reasons. For those structures, you should use the exclusion list in the CFRM policy to identify the structures that should not normally be in the same

Chapter 3. z/OS 91 CF. Examples would be XCF structures that are used for the same transport class, the structures for the RACF primary and backup databases, and the normal and alternate JES2 Checkpoint structures (if the NEWCKPT1 or CKPT2 checkpoints are CF structures). If you have a small number of extremely busy structures, you might specify those structures on each other’s exclusion lists, to ensure that one CF does not end up containing all the busiest structures. Never specify a REBUILDPERCENT value other than 1. Ideally, do not specify REBUILDPERCENT at all, and allow it to default to 1. Ensure that System-Managed Rebuild is enabled. To enable System-Managed Rebuild, you must be running OS/390 2.8 or later in all systems in the sysplex, CF level 8 in all CFs in the sysplex, and you must format the CFRM CDS with either the SMREBLD or SMDUPLEX keywords. See “System-managed rebuild” on page 32 for more information about System-Managed Rebuild.
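To make these recommendations concrete, the following is a minimal sketch of a structure definition in IXCMIAPU input for the CFRM policy. All names and sizes are illustrative placeholders (sizes are in units of 1 KB), and the CF NAME definitions that a complete policy requires are omitted:

   DATA TYPE(CFRM) REPORT(YES)
   DEFINE POLICY NAME(CFRMPOL1) REPLACE(YES)
     STRUCTURE NAME(IXC_SIG1)
       INITSIZE(40960)
       SIZE(65536)
       MINSIZE(40960)
       FULLTHRESHOLD(80)
       ALLOWAUTOALT(YES)
       REBUILDPERCENT(1)
       PREFLIST(CF01,CF02)
       EXCLLIST(IXC_SIG2)

Note how SIZE is less than twice INITSIZE, MINSIZE protects the structure from storage reclaim, ALLOWAUTOALT(YES) with FULLTHRESHOLD(80) enables Auto Alter, and the second signalling structure is kept out of the same CF via the exclusion list.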

3.4 CF operations

Operation of a Coupling Facility (CF) is covered in detail in 2.3, “Coupling Facilities” on page 24.

You must carefully plan which of your structures must be failure-isolated from the systems that connect to them. For example, extended outages will result if a non-duplexed structure used by DB2 data sharing is lost at the same time as the DB2 subsystem using the data in that structure. See the discussion on failure isolation in 2.3.2, “Failure isolation” on page 27.

As discussed in “Dedicated or shared engines” on page 25, we always recommend the use of dedicated CPs for Coupling Facilities. If you must share an engine, at least make sure that no lock structures are in the CF with the shared engine. See zSeries 890 and 990 Processor Resource/Systems Manager Planning Guide, SB10-7036, for more discussion on this.

3.5 IBM Health Checker for z/OS and Sysplex

The IBM Health Checker for z/OS and Sysplex is a tool that checks the current active z/OS and sysplex settings and definitions for a system, and compares their values to those either suggested by IBM or defined by you as your own criteria.

The objectives of the IBM Health Checker for z/OS and Sysplex are to:
 Identify potential problems before they impact your availability or, in worst cases, cause outages. These checks are based upon IBM analysis of common problems reported through the IBM Support Center.
 Provide the latest guidance from IBM regarding system defaults and recommendations, and advise customers when recommendations change.
 Avoid situations where no one knows or can remember why a default was overridden.

The IBM Health Checker for z/OS and Sysplex is easy to install and run. It is available as a free Web download from:

http://www.ibm.com/servers/eserver/zseries/zos/downloads/

Even though the Health Checker is free, you are required to register prior to the download. This allows for automatic notifications to be sent whenever an update is available. When you are informed of a new level of the Health Checker, we recommend that you download it as soon as possible to ensure that you get the benefits of IBM’s latest experiences.

This tool will be updated periodically with additional functions and runs on OS/390 2.10 and all z/OS releases.

Tip: At least half the value of the Health Checker is that it will inform you should something change in your environment so that it no longer matches the IBM guidelines or your own overrides to those guidelines. Therefore, we strongly encourage you to run it on a regular basis. Once you get it customized to your environment, it should always run with return code 0 unless something has changed. Therefore, you only need to review the output in case of a non-zero return code.

3.5.1 Health Checker description

The Health Checker should be run daily, as a number of the following checks are for dynamic resources whose status may change within the life of an IPL.

Where the Health Checker detects a conflict, messages are written to the z/OS console, allowing for automated response if applicable.

Automove setup verification

In a sysplex using Shared HFS (refer to “Shared HFS” on page 149), this check verifies that no AUTOMOVE file systems are mounted under a NOAUTOMOVE or UNMOUNT file system. If any are, the file system defined as AUTOMOVE will not be available until the failing system is restarted. This is because a sysplex file system that changes ownership as a result of a system failure will only be accessible if its mount point is also accessible, which will not be the case if NOAUTOMOVE or UNMOUNT is specified.

The AUTOMOVE/NOAUTOMOVE/UNMOUNT parameters on the ROOT and MOUNT statements indicate what happens to a file system if the system that owns it is partitioned out of the sysplex. AUTOMOVE (the default) specifies that the file system is automatically moved to another system; NOAUTOMOVE prevents the move and leaves the file system inaccessible; UNMOUNT specifies that the file system is unmounted when the system leaves the sysplex.
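As an illustration, the following is a sketch of a BPXPRMxx MOUNT statement that keeps a shared file system movable after a system failure; the data set name and mount point are placeholders:

   MOUNT FILESYSTEM('OMVS.USERS.HFS')
         MOUNTPOINT('/u')
         TYPE(HFS)
         MODE(RDWR)
         AUTOMOVE

The check flags the case where a file system like this is mounted below one defined with NOAUTOMOVE or UNMOUNT, because its mount point would become inaccessible after a failure.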

Available Frame Queue threshold

This check determines whether the MCCAFCTH value is hard coded in the IEAOPTxx PARMLIB member (that is, not defaulted), and if so, verifies that the values specified are sufficient for 64-bit mode of operation, to avoid situations where the system does not start to reclaim storage frames soon enough. MCCAFCTH specifies the low and OK threshold values for central storage, while MCCAECTH specifies the low and OK threshold values for expanded storage. The low value is where Real Storage Manager (RSM) begins frame stealing, and the OK value is where RSM ceases frame stealing.

– For 31-bit mode, the defaults are MCCAFCTH=(50,100) and MCCAECTH=(150,300).
– For 64-bit mode, after the installation of APARs OW55902 and OW55729, the default is MCCAFCTH=(400,600), and MCCAECTH is not used.

IBM recommends using the default values or higher.


If these parameters are left to default, there should be no problem; however, if they were explicitly specified for 31-bit mode, they may need either modification or removal to meet the 64-bit minimum recommendations.
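For example, if MCCAFCTH must be coded at all, an IEAOPTxx fragment matching the 64-bit defaults would simply be:

   MCCAFCTH=(400,600)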

Couple Data Set separation

 Checks that the primary sysplex, CFRM, and LOGR CDSs are not on the same volume, due to the high I/O activity that can be associated with each of these data sets.
 Checks that the primary and alternate CDSs for all CDS types are not on the same volume, for single-point-of-failure reasons.
 Checks that each primary CDS has an active alternate, for single-point-of-failure reasons.

Coupling Facility Structure attributes and location

 Checks the CF operating status and volatility status for each defined CF.
 Checks the CF structure location of each allocated structure for current placement in comparison to the preference list as specified in the CFRM policy.

XCF signalling

Performs the following checks in relation to XCF signalling parameters and the pathing environment, with a view to avoiding a single point of failure:
 Verifies that all transport classes are set up to service the pseudo-group name UNDESIG, which allows any XCF message to use any transport class.
 Verifies that all transport classes are assigned to at least one pathout.
 Verifies that at least one transport class is defined with a large class length.
 Verifies that multiple pathout/pathin pairs are operational for each system in the sysplex connected to the current system.
 Checks that a MAXMSG value of a minimum size is defined for each transport class.
 Verifies that each pathin can support a minimum number of messages from the sending system.
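Much of this can also be verified manually with the XCF DISPLAY commands, for example:

   D XCF,CLASSDEF          (transport class definitions, including MAXMSG)
   D XCF,PATHOUT           (outbound signalling paths)
   D XCF,PATHIN            (inbound signalling paths)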

XCF structure location

Performs the following checks in relation to the XCF signalling structures, with a view to avoiding a single point of failure:
 Where multiple XCF signalling structures are in use, verifies that they are in different Coupling Facilities.
 Verifies that there are a minimum of two online operational links to each CF.
 Verifies that there are not fewer operational links (that is, CHPIDs) than there are active links.

Sysplex consoles

Performs the following checks in relation to sysplex consoles:
 Verifies that all MCS, SMCS, and subsystem consoles are assigned names, to assist in reducing the number of console IDs and to address the limit of 99 consoles per sysplex.
 Verifies that each sysplex console is using alternate groups (ALTGRP parameter) rather than alternate consoles (ALTERNATE parameter), which increases availability in console failure situations.
 Verifies that each sysplex system has a console with master authority that has been defined with command association.


 Verifies that the sysplex master console is active within the sysplex.
 Verifies console MSCOPE and ROUTCODE settings to ensure that consoles with multisystem MSCOPE have limited ROUTCODE settings or, conversely, that consoles with ROUTCODE(ALL) have limited MSCOPE settings. Consoles configured with both multisystem MSCOPE and ROUTCODE(ALL) may impact sysplex performance through console traffic volumes, which may result in WTO(R) buffer shortages.
 Verifies AMRF and MPF settings to keep the number of retained eventual action messages to a minimum.
 Verifies that consoles are not set to receive ROUTCODE(11) (programmer information messages), to prevent unnecessary console traffic.

Extended Master Console (EMCS) checks

EMCS consoles may be used by applications to access messages and send commands. Once defined, these consoles remain for the life of the sysplex, and therefore their numbers accumulate until the next sysplex initialization. If the numbers become very large, through normal accumulation or through programming error, the IPL process is impacted, as console initialization during NIP may take considerable time.
 Verifies that consoles do not have both multisystem MSCOPE and ROUTCODE(ALL) attributes. Setting both of these attributes can impact sysplex performance through console traffic volumes, which may result in WTO(R) buffer shortages.
 Verifies that EMCS consoles with master authority are not receiving hardcopy messages and are not defined as backup devices for the hardcopy medium.
 Verifies that the number of active and inactive EMCS consoles is not excessive, which elongates NIP during IPL.

z/OS system console (SYSCONS) checks

The z/OS system console (SYSCONS) is used as a backup for recovery purposes, and can also be used to IPL a system or, if necessary, run the sysplex.
 Verifies that the z/OS system console has a local MSCOPE, to minimize message traffic, especially during recovery actions.
 Verifies that the z/OS system console has minimal route codes defined (that is, either (1,2,10) or NONE), to minimize message traffic, especially during recovery actions.
 Verifies that the z/OS system console is not running in problem determination mode (that is, V CN,ACTIVATE) during normal operations, as this mode degrades performance.

Real Storage checks

If there is no requirement to run V=R jobs, checks that REAL=0 is specified in the IEASYSxx PARMLIB member to increase performance.

Reconfigurable storage checks

If there is no requirement to reconfigure storage, checks that RSU=0 is specified in the IEASYSxx PARMLIB member to increase performance.
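A minimal IEASYSxx fragment reflecting both the real storage and reconfigurable storage checks (assuming no V=R jobs and no storage reconfiguration requirement) would be:

   REAL=0,
   RSU=0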

Virtual Storage checks

The virtual storage map is established during IPL based upon the specification of system parameters for CSA and SQA, and the sizes of modules in LPA and NUCLEUS. Unexpected changes in the virtual storage map (through maintenance or modification of the LPALST), such as a reduction of the private area, can cause system component or application failure, and will require a system outage to resolve.


 Checks that the user-supplied minimum values for CSA, SQA, and private are being met.
 Checks that threshold usage values for CSA and SQA have not been exceeded.
 Compares the current sizes of CSA and private with the equivalent values from the previous IPL.

XCF cleanup value check

The XCF CLEANUP value specifies the maximum amount of time a system that is being removed from a sysplex waits between notifying XCF exploiters to perform cleanup processing and loading a non-restartable wait state. Checks that the CLEANUP value in the COUPLExx PARMLIB member has the recommended value of 15.

Sysplex failure detection interval check

The sysplex failure detection interval (specified by the INTERVAL parameter in the COUPLExx PARMLIB member) is the amount of elapsed time that XCF on another system waits before initiating system failure processing for a system that has not updated its status.

This value must be coordinated with the spin recovery time, as specified by the SPINTIME parameter in the EXSPATxx PARMLIB member, to prevent spin loops from causing sysplex failure processing to partition the spinning system out of the sysplex. Checks that the IBM formula, INTERVAL greater than (2*SPINTIME)+5, is being met. Note that SPINTIME settings depend upon the physical hardware configuration, as follows:
– A system in Basic mode, or in LPAR mode with dedicated CPs, should use 10 seconds.
– A system in LPAR mode with shared CPs should use 40 seconds.
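As a sketch, a COUPLExx COUPLE statement reflecting the CLEANUP and INTERVAL checks might look like the following, where the sysplex and CDS names are placeholders and INTERVAL(85) satisfies the formula for a 40-second SPINTIME ((2*40)+5 = 85):

   COUPLE SYSPLEX(PLEX1)
          PCOUPLE(SYS1.XCF.CDS01)
          ACOUPLE(SYS1.XCF.CDS02)
          INTERVAL(85)
          OPNOTIFY(88)
          CLEANUP(15)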

Sysplex Failure Management (SFM) check

Checks that SFM has defined actions for:
 XCF signalling connectivity failures
 System failures as indicated by a status update missing condition
 PR/SM environment reconfiguration settings

SVC Dump parameter check

SVC dump data sets may be predefined static data sets, or dynamically allocated when required. IBM recommends dynamic dump data set usage to prevent SVC dump data set full conditions, which may lead to partial dumps when data sets are undersized, or uncaptured dumps if all defined data sets are full. Checks that SVC dump parameters have been defined for dynamic usage.
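For example, dynamic SVC dump data sets can be enabled with DUMPDS (DD) commands, typically issued automatically from a COMMNDxx member at IPL; the name pattern and SMS class shown are placeholders:

   DD NAME=SYS1.DUMP.&SYSNAME..D&DATE..T&TIME..S&SEQ.
   DD ADD,SMS=(DUMPCLAS)
   DD ALLOC=ACTIVE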

Global Resource Serialization (GRS) STAR configuration check

IBM recommends usage of GRS STAR for reasons of availability, real storage consumption, processing capacity, and response time benefits. Checks that the GRS STAR configuration is being used.

3.5.2 IBM Health Checker recommendations

Use the IBM Health Checker daily on all systems to ensure that all coded availability checks are being met, and act on those that are not.

3.6 z/OS msys for Operations

msys for Operations is a subset of System Automation for OS/390 V2 that provides automation of critical events that can have an impact on system or sysplex availability. It also provides a full-screen interface to simplify sysplex operations. It is a base element of z/OS, and comes automatically on all z/OS 1.2 or later ServerPacs (that is, it does not have to be explicitly ordered).

The automation actions that msys for Operations provides are based on IBM’s analysis of customer outages and their root causes. Each function that it performs addresses an area where multiple failures occurred due to complexity or lack of automation.

One of the benefits of using msys for Operations is that the automation is based on IBM’s experience across many customers, and is designed with the involvement of IBM’s sysplex architects and developers. However, even if you do not use the automation functions of msys for Operations, you should study the events that it automates and ensure that you have equivalent automation for those events.

msys for Operations is also available for OS/390 2.10 and z/OS 1.1 as a free Web download from: http://www.ibm.com/servers/eserver/zseries/zos/downloads/

3.6.1 Automated Recovery Actions

A number of recovery actions can be completely automated. These include:
 Creating or re-creating missing alternate Couple Data Sets (CDSs)
 Expanding the CDSs for the System Logger in case of a directory shortage
 Management of System Log offload and failure situations
 Long-running ENQ management to prevent resource lockout
 Management of WTO(R) buffer shortages
 Sysplex partitioning
 Management of Auxiliary Storage shortages

Each automatic recovery action can be enabled or disabled separately and a number of the recovery actions may be customized. By default all supplied actions are disabled and require enablement and possibly customization through editing a special data set member should their function be required.

Alternate Couple Data Set management

When a primary CDS fails, XCF makes the alternate the primary to support continued operation. After the switch, however, an alternate no longer exists; therefore the current primary is now a single point of failure for the sysplex as a whole.

msys for Operations provides a recovery mechanism that tries to ensure that an alternate CDS is always available, through the creation of a new alternate in the following situations:
 During initialization of msys for Operations, a check is made to ensure that an alternate exists for every primary defined. If an alternate is not found, msys for Operations automatically creates it.
 At run time, msys for Operations automatically creates a new alternate whenever the current alternate has been removed or switched to a primary.

System Logger Couple Data Set management

Applications that use the System Logger write their log data into logstreams, which are usually associated with a CF structure in a Parallel Sysplex environment.

When data is written to the logstream, it is first stored in either a CF structure (for a CF logstream) or a local buffer (for a DASD-only logstream). When the HIGHOFFLOAD percentage is reached, the data is offloaded into a log data set automatically allocated by the System Logger; when this fills up, another log data set is automatically allocated, and so on.

The LOGR CDS contains control information for the System Logger, which is specified when the CDS is formatted. In addition, it also maintains a directory of the log data sets for every logstream defined, which is updated every time a new log data set is created.

There are two potential error situations for which msys for Operations provides automatic recovery actions:
 Resizing the LOGR CDSs
The directory that the System Logger uses to maintain information about the logstream data sets is divided into directory extents, each of which can hold a maximum of 168 log data set entries. The number of directory extents is specified via the DSEXTENT parameter when the LOGR CDSs are formatted. When all available directory extents have been used, the System Logger can no longer allocate new log data sets and therefore cannot offload log data, which will impact the logging application. msys for Operations automatically reformats both the primary and alternate LOGR CDSs with an increased DSEXTENT parameter whenever the System Logger reports a directory shortage, thus avoiding potential application impact.
 Reporting incorrect VSAM share options
The System Logger is a sysplex-wide facility that may be used by applications on different systems to write log data to the same logstream. Because the log data sets are VSAM, it is important that they are defined with SHAREOPTIONS(3,3) if they are to be shared across multiple systems. When logstream data sets are managed by the Storage Management Subsystem (SMS), it is possible that the data class used during log data set allocation specifies SHAREOPTIONS(1,3), which does not support cross-system data set update; the logstream offload may subsequently fail. msys for Operations checks the share options of the log data sets on a daily basis, and if incorrect share options are detected, an operator alert is generated.

System Log management

msys for Operations automatically tracks various system messages associated with the system log, and may be used after customization for day-to-day system log offload management, as well as system log recovery if required. Messages tracked include:
 IEE037D LOG NOT ACTIVE, in which case a WRITELOG START command is issued.
 IEE041I THE SYSTEM LOG IS NOW ACTIVE - MAY BE VARIED AS HARDCOPY LOG, in which case a V SYSLOG,HARDCPY command is issued.

 IEE043I A SYSTEM LOG DATASET HAS BEEN QUEUED TO SYSOUT CLASS class, which may be used after customization as a trigger for a system log offload procedure to be activated.
 IEE533I SYSTEM LOG INITIALIZATION HAS FAILED, which may be used after customization to provide an operator alert or take recovery action as required.

 IEE769E SYSTEM ERROR IN SYSTEM LOG, which may be used after customization to provide an operator alert or take recovery action as required.

WTO(R) buffer shortage management

WTO(R) buffer shortages can lead to situations where operator commands can no longer be processed, which may impact the ongoing operation of the sysplex. msys for Operations will automatically attempt recovery from WTO(R) buffer shortages by using, in order, the following recovery methods:
1. Extend the buffer and modify the console characteristics, if applicable.
2. Cancel the jobs that issue the WTO(R)s.

Long-running ENQ management

Long-running jobs may hold resources (as evidenced by ENQs) while other jobs wait for access to those resources. This is a potential error situation: It could represent a looping job, or a job with insufficient dispatching priority that may, depending upon the importance of the ENQueued resource, begin to impact other applications, or potentially the sysplex as a whole. msys for Operations may be customized to handle long-running ENQ situations by specifying the ENQs to be monitored, an ENQ time limit, and a recovery action to be performed, which may be:
 Cancel or keep the jobs that are holding the resource.
 Dump and then cancel the jobs that are holding the resource.

Pending I/O management during sysplex partitioning

Under certain circumstances, Sysplex Failure Management (SFM) cannot complete the hardware isolation of a failed system and defers resetting the system image until manual operator intervention occurs.

Message IXC102A tells the operator to manually reset the hardware and then reply DOWN, after which SFM safely partitions the system out of the sysplex. When the operator performs the system reset function, an I/O reset occurs that clears the channel subsystem for the failed system of any pending I/Os that may have associated reserves and contingent allegiances.

As the failing system could potentially be holding a number of critical system resources, it is important that there is minimal delay between the system failure and the system reset/reply DOWN recovery action, or a sysplex-wide failure could occur. msys for Operations automatically manages this recovery using the following two-step process:
1. Clear any pending I/O operations in the channel subsystem of the failing system by sending a hardware command to the relevant support element. Note that this requires msys for Operations to have a cross-reference of which systems are running on which hardware, as the system that issued the IXC102A message may not have access to the relevant support element, and is therefore reliant on routing the hardware command to another system that does.

2. Automatically replying DOWN once the I/O reset is complete.

Auxiliary storage shortage management

Auxiliary storage shortages have the potential to cause a system outage. msys for Operations may be customized to prevent page data set exhaustion by:

 Dynamically adding spare local page data sets when needed. The spare local page data sets may be preallocated, or dynamically allocated by msys for Operations when needed; however, dynamic allocation is not recommended, as formatting new local page data sets is time consuming, which may impact the recovery time. This facility requires the PAGTOTL parameter in the IEASYSxx PARMLIB member to be set such that new local page data sets can be added to the current local page data set configuration.
 If no extra local page data sets can be added, msys for Operations reviews which address spaces are causing the local page data set exhaustion and cancels them accordingly.

3.6.2 Sysplex operation

msys for Operations may also be used to perform day-to-day management of various sysplex components through the use of two commands: INGPLEX and INGCF.

Both of these commands may be entered from either a z/OS console or an msys for Operations VTAM session. In addition, both commands support full-screen and line mode formats. However, line mode, as used from the z/OS console, is restricted to display commands only.

INGCF command

The INGCF command supports all of the msys for Operations functions that relate to Coupling Facilities (CFs).

Manual management of CFs is prone to error as it often involves complex command sequences that must be performed in the required order, and repeated on multiple systems. INGCF allows you to use function keys to perform appropriate command sequences based upon information displayed (for example, draining a CF or reintegrating a CF into the sysplex), and this can be done sysplex-wide with a single keystroke.

The INGCF command associates a status with every CF, and a condition with every structure (instance) that is allocated on the target CF. The structure condition is influenced by the release level of the system that allocated the structure. As a result, the INGCF functions are able to use the CF state and the structure conditions to determine which action can be performed in any given situation, and to enforce the correct sequence of actions for complex tasks like draining and restoring a CF.

Functions provided by INGCF include:
 INGCF DRAIN
Used to remove the target CF from the sysplex by following a step-by-step process where the results of each step are displayed, allowing the user the option of proceeding with the next step or exiting if required. The step order is:
a. Rebuild structures in an alternate CF.
Structures may be user-managed, where the active connectors are responsible for reconstructing the data, or, from OS/390 2.8 onwards, system-managed, where Cross-system Extended Services® (XES) is responsible for transferring the data to the new structure instance. As a result, the scope of the rebuild action depends partly on the release level of the systems from which the structures were allocated:
• Structures that were allocated from a system with OS/390 2.7 or below can only be rebuilt if they have at least one active connector and all the active connectors support user-managed rebuild.

• Structures that were allocated from a system with OS/390 2.8 or later can be rebuilt if they have an active connector and support either user-managed or system-managed rebuild, or if they have no active connector.
Note that INGCF DRAIN rebuilds structures one at a time rather than concurrently, and always on a CF that is different from the target CF.
b. Force the deallocation of structures that have no active connectors and could not be rebuilt.
c. Disconnect the target CF from the systems to which it is connected.
d. Inactivate the target CF.
 INGCF ENABLE
Used to integrate a new CF into the sysplex, or reintegrate an existing CF that has previously been removed, by performing the following sequence of tasks:
a. Activate the target CF.
b. Activate the sender paths to the target CF from all systems of the sysplex.
c. Switch to another CFRM policy if required for the definition of a new CF.
d. Populate the target CF by rebuilding all structures whose preference list starts with the target CF and that are not otherwise excluded.
As with the INGCF DRAIN command, a step-by-step order is enforced to ensure that the prerequisite steps have completed successfully across the sysplex before the next step is attempted.
 INGCF PATH
Used to set the sender paths either ONLINE or OFFLINE.
 INGCF STRUCTURE
Used to display information about, or modify the status of, the allocated structures within a CF, as follows:
– Display detailed structure information.
– Perform a structure rebuild on an alternate CF.
– Force the deletion of the structure.
– Start and stop structure duplexing.

INGPLEX command

The INGPLEX command supports all of the msys for Operations functions that relate to the sysplex.

Functions provided by INGPLEX include:
 INGPLEX CDS
Used to manage CDSs by providing the following facilities:
– Replacement of the current alternate CDS for the selected CDS type, either through automatic allocation (assuming customization of spare volume information has previously been performed) or through activation of an existing data set.
– Display information about the channel paths for the selected CDS type.
– Display CDS information, including formatting parameters and policies if applicable. If the CDS of the selected type contains policies, the following additional actions are provided:
• Display policy

Display details about the selected policy.
• Start policy
Activate the selected policy after confirmation.
– Switch the alternate CDS to be the primary CDS. Note that once this has been completed, there will no longer be an alternate CDS for the selected CDS type; therefore, msys for Operations automatically invokes alternate CDS replacement processing, either through automatic allocation or through activation of an existing data set as specified on the confirmation panel.
 INGPLEX CF
This is equivalent to the INGCF command, as previously described.
 INGPLEX CONSOLE
Used to display information about the sysplex console environment, including:
– The name of the master console
– The WTO and WTOR buffer utilization
– The number of queued messages (replies) of various types
– Each console and information relating to that console
 INGPLEX DUMP
Used to display or modify the default ABEND dump options for each or all of the sysplex systems from a single location.
 INGPLEX IPL
Used to display and compare the IPL information of a system. The IPL recording function allows you to identify parameters that were changed since the last IPL, as well as compare different IPL scenarios.
 INGPLEX SDUMP
Used to display or modify the default SDUMP dump options for each or all of the sysplex systems from a single location.
 INGPLEX SLIP
Used to view, enable, disable, and delete all SLIP traps on all systems in the sysplex.
 INGPLEX SVCDUMP
Used to issue a multisystem dump of up to 15 address spaces on a single sysplex system, including individual selection of their related data spaces and CF structures.
 INGPLEX SYSTEM
Used to display the target sysplex name, its GRS mode, and information for each member system, including status, monitor timestamp, interval, and SFM settings. Commands may be entered against each member system to perform the following functions:
– Display the ONLINE or OFFLINE status of one or more processors and any vector facilities or ICRFs attached to those processors.
– Display timer synchronization mode and ETR ports.
– Display IPL information.
– Display IOS-related configuration information.
– Display the amounts of central and expanded storage assigned and available.
– Display information about the XCF signalling paths via CTC devices.

– Display detailed signalling path information for all CF structures.

3.6.3 z/OS msys for Operations recommendations

Use msys for Operations to manage your sysplex-specific automation by customizing and exploiting the supplied automated recovery actions. Use the INGCF command as a sysplex-wide single point of management for the addition and removal of CFs, CF paths, and CF structures. Use the INGPLEX command as a sysplex-wide single point of management for other sysplex-related resources.

3.7 Sysplex Failure Management (SFM)

SFM allows you to define a sysplex-wide policy that specifies the actions that z/OS is to take when certain failures occur. A number of situations might occur during the operation of a sysplex when one or more systems need to be removed so that the remaining sysplex members can continue to do work.

The goal of SFM is to allow these reconfiguration decisions to be made and carried out with little or no operator involvement by defining automated responses for:
 Signaling connectivity failures
 System failures as indicated by status update missing conditions
 Reconfiguring systems in a PR/SM environment

If the sysplex contains a CF, signaling connectivity failures and system isolation can be fully automated; otherwise, operator involvement is required. System isolation consists of terminating I/O and CF accesses, resetting channel paths, and then loading a non-restartable wait state to prevent corruption of shared resources.

3.7.1 Configuring for status update missing conditions

Each system in a sysplex is required to update its status information regularly. A status update missing condition occurs when a system does not update its status information within the time specified on the INTERVAL keyword in the COUPLExx PARMLIB member.

The SFM policy is defined by the Administrative Data Utility, IXCMIAPU, and stored in the SFM CDS. Parameters that may be specified to manage the status update missing condition are:
 PROMPT
Notify the operator with message IXC402D after waiting for the length of the operator notification interval, as specified on the OPNOTIFY parameter in the COUPLExx PARMLIB member. Note that PROMPT is the default if DEACTTIME, RESETTIME, or ISOLATETIME is not specified.
 ISOLATETIME(isolate-interval)
Specifies how long SFM will wait after detecting a status update missing condition before attempting to automatically isolate the failing system. The processing of the ISOLATETIME parameter was altered with APAR OW30926, to prevent the isolation of systems experiencing temporary conditions that prevent the status information update from occurring, as follows:

– With or without OW30926, if a system has not updated its status information and is producing no XCF signalling traffic, SFM will automatically isolate the failing system at the expiration of the ISOLATETIME interval.
– With OW30926, if a system has not updated its status information but is continuing to produce XCF signalling traffic, the ISOLATETIME parameter is ignored and the operator is prompted via IXC426D, and may elect to remove the system or retry.
If the isolation attempt is not successful (for example, if the failing system is not connected through a CF to another system in the sysplex), the operator is prompted via message IXC102A to manually system reset the failing system and then reply DOWN to allow the isolation to continue.
 DEACTTIME(deact-interval)
Specifies the amount of time SFM will wait after detecting a status update missing condition before deactivating the LPAR of the failing system. Note that this action is only performed if there is an active system in the sysplex running in LPAR mode on the same PR/SM CPC as the failing system.
 RESETTIME(reset-interval)
Specifies the amount of time SFM will wait after detecting a status update missing condition before system resetting the LPAR of the failing system. Note that this action is only performed if there is an active system in the sysplex running in LPAR mode on the same PR/SM CPC as the failing system.

3.7.2 Configuring for signaling connectivity failures

All systems in the sysplex must have signaling paths to and from every other system at all times. Loss of signaling connectivity between sysplex systems can result in one or more systems being removed from the sysplex so that the remaining systems retain full signaling connectivity to one another.

The SFM policy is defined by the Administrative Data Utility, IXCMIAPU, and stored in the SFM CDS. Parameters that may be specified to manage signaling connectivity failures are:
 WEIGHT(weight)
Specifies a value that represents the relative importance of this system in comparison to the other systems in the sysplex.
 CONNFAIL(YES | NO)
Specifying NO indicates that the system is to prompt the operator with IXC409D to manually determine which systems to partition from the sysplex. Specifying YES indicates that SFM is to automatically determine which systems to remove from the sysplex, and then attempt to implement the decision by system isolation. In sysplexes with more than two members, multiple scenarios may exist as to which members may be removed in order to regain full sysplex connectivity among the remaining members. In this situation, SFM aggregates the system WEIGHT values for each scenario and retains the scenario with the highest aggregate value. Note that it is particularly important to set appropriate weights to ensure that important systems are protected at all times. Where the weights of the possible surviving sysplex combinations are equal, SFM’s decision on which combination to keep becomes arbitrary, and this may cause an undesirable result.
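The following is a minimal sketch of IXCMIAPU input for an SFM policy with automatic partitioning; the policy name, system names, and weights are placeholders to be adjusted to reflect the relative importance of your systems:

   DATA TYPE(SFM)
   DEFINE POLICY NAME(SFMPOL01) CONNFAIL(YES) REPLACE(YES)
     SYSTEM NAME(PROD1)
       ISOLATETIME(0) WEIGHT(100)
     SYSTEM NAME(*)
       ISOLATETIME(0) WEIGHT(10)

Here CONNFAIL(YES) automates partitioning decisions for connectivity failures, ISOLATETIME(0) isolates a failed system as soon as the status update missing condition is recognized, and the SYSTEM NAME(*) entry provides defaults for systems not explicitly named.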

3.7.3 Configuring for Coupling Facility failures

Systems may lose access to a CF through either a CF failure or a CF link attachment failure.

Whether z/OS attempts recovery for each affected structure, by rebuilding the structure into another CF, depends upon a number of factors:
 The presence of an alternate CF with sufficient space.
 The percentage loss of connectivity. z/OS uses the WEIGHTs defined in the SFM policy to determine the percentage loss of connectivity, by comparing the WEIGHTs of the affected systems with the aggregate WEIGHTs of all the sysplex systems.
 The presence or absence of an active SFM policy.
 The structure’s support for structure rebuild protocols.
 The structure’s REBUILDPERCENT setting in the CFRM policy.
 OS/390 1.3 or later with APAR OW30814, which removed REBUILDPERCENT restrictions if no SFM policy was active.

Assuming that an alternate CF is available with sufficient space and APAR OW30814 is applied, z/OS processing is as follows:
 For a full CF outage (that is, percentage loss of connectivity = 100 percent), z/OS attempts structure rebuild for all structures that support structure rebuild protocols, irrespective of whether an SFM policy exists and regardless of any REBUILDPERCENT specifications.
 For a partial CF outage (that is, percentage loss of connectivity < 100 percent):
– If an SFM policy is active, a structure rebuild is only attempted for structures that support structure rebuild protocols if the percentage loss of connectivity is greater than or equal to the REBUILDPERCENT specified in the CFRM policy.
– If there is no active SFM policy, structure rebuilds are attempted for all structures that support structure rebuild protocols.

3.7.4 SFM recommendations

Format SFM CDSs and define the SFM policy as required. Ensure that all systems in the sysplex have access to the SFM CDSs; loss of access by any system to the SFM CDSs inactivates the SFM policy for the whole sysplex. Specify CONNFAIL(YES) to automate sysplex partitioning actions for connectivity failures. Ensure that critical systems are protected at all times through the setting of appropriate WEIGHT values. Define an ISOLATETIME in line with installation requirements. Ensure that the fixes for the following APARs are applied:
 OW30926, to prevent isolation of systems that are still signaling via XCF but are unable to perform status update processing

 OW30814, to ensure structure rebuild for structures affected by REBUILDPERCENT in a CF failure when no SFM policy is active
 OW33615, to resolve recovery issues with internal CFs
 OW41959, to change the default for REBUILDPERCENT from 100 percent to 1 percent, so that loss of connectivity from any system, rather than only loss of connectivity from all systems, will force a rebuild


Ensure that adequate CF space is available in the remaining CFs to hold all the structures of a failed or removed CF. Remove the CFRM policy REBUILDPERCENT parameter for all structures such that it defaults to 1, allowing structure rebuild to occur in all connectivity failure scenarios.

3.8 Automatic Restart Manager (ARM)

ARM is a z/OS recovery function that can improve the availability of specific batch jobs or started tasks by restarting them without operator intervention. The goals of SFM and ARM are complementary: The aim of SFM is to keep the sysplex running, while the aim of ARM is to keep specific work running within the sysplex.

Batch jobs and started tasks must register as elements with ARM via the IXCARM macro if they wish to be subject to automatic restart management.

When a registered job or started task fails, ARM attempts to restart it on the same system. If the system on which the registered job or started task is running fails, ARM attempts to restart the work on one of the other systems in the sysplex that is in the same JES XCF group (that is, a cross-system restart).

ARM allows an installation to define groups of elements known as a RESTART_GROUP. If a cross-system restart is required, all elements of the restart group are restarted on the same system.

In choosing the system on which to perform the restart, ARM checks the applicable restart group for any target system restrictions. After that, if there are still multiple candidate systems, ARM consults WLM to determine the system on which to restart the work, as follows:
 If WLM is in goal mode, the work is restarted on the system with the greatest available CPU capacity (and the required available CSA, if the FREE_CSA parameter has been specified).
 If WLM is in compatibility mode, the work may be restarted on any candidate system, irrespective of CPU capacity (or CSA considerations, if the FREE_CSA parameter has been specified).

Restart groups may also specify:
 Whether elements need to be started in a particular order (via RESTART_ORDER)
 Whether elements need to be restarted at intervals (via RESTART_PACING)
 Whether target systems must have a minimum amount of available CSA (via FREE_CSA)

Users of ARM may control both when and how a batch job or started task is to be restarted, as follows:
 When (that is, conditions for restart):
– Restart only on ABENDs (for example, application development).
– Restart on ABENDs and on system failure (for example, production systems).
– Never restart.
 How (that is, the batch job JCL or start command to be used):
– Persistent (that is, use the same JCL or command text)
– JOB (that is, use different JCL)
– STC (that is, use different command text)

Note that the choices may be different for ABENDs and system failures, and that the restart method may also change, so a failed started task can be restarted as a batch job, and vice versa, if required.

3.8.1 Configuring for Automatic Restart Management

The ARM policy is defined by the Administrative Data Utility, IXCMIAPU, and stored in the ARM CDS. Parameters that may be specified to control restart processing include:
 RESTART_ORDER
This parameter applies to all restart groups and specifies the order in which elements in the same restart group are to become ready after they are restarted. It has the following subparameters:
– LEVEL
Specifies the level associated with elements that must be restarted in a particular order. The elements are restarted from the lowest level to the highest level.
– ELEMENT_NAME
Specifies the name of each element to be restarted at the LEVEL specified.
– ELEMENT_TYPE
Specifies the element types that are to be restarted at the LEVEL specified.
 RESTART_GROUP
Identifies related elements that are to be restarted as a group if the system on which they are running fails. It has the following subparameters:
– TARGET_SYSTEM
Specifies the candidate systems on which elements may be started in a cross-system restart. It can include specific systems or a ‘*’ to represent all systems.
– FREE_CSA
Specifies the minimum amount of CSA and ECSA that must be available (requires WLM goal mode).
– RESTART_PACING
Specifies the interval between restarts of each element in the same restart group.
– ELEMENT
Specifies a batch job or started task that can register as an element of this restart group, and attaches attributes via the following parameters:
• RESTART_ATTEMPTS
Specifies the number of times ARM should attempt to restart the specified element within a given interval.
• RESTART_TIMEOUT
Specifies the maximum amount of time ARM will allow between restart and re-registration, after which the element is deregistered and message IXC803I is issued.
• READY_TIMEOUT
Specifies the maximum amount of time ARM will allow between restart and ready-for-work notification, after which ARM will remove this element from any restart dependencies for other elements.
• TERMTYPE

Specifies the conditions for element restart. ALLTERM indicates both element failure and system failure, while ELEMTERM limits the restart to element failure only.
• RESTART_METHOD
Specifies how ARM is to restart the element by providing different methods for each of the failure scenarios.
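The following is a minimal sketch of IXCMIAPU input combining several of these parameters; the policy, group, element, and system names are placeholders:

   DATA TYPE(ARM)
   DEFINE POLICY NAME(ARMPOL01) REPLACE(YES)
     RESTART_ORDER
       LEVEL(1) ELEMENT_NAME(DBPROD)
       LEVEL(2) ELEMENT_NAME(CICSPROD)
     RESTART_GROUP(PRODGRP)
       TARGET_SYSTEM(SYS1,SYS2)
       RESTART_PACING(30)
       ELEMENT(DBPROD)
         RESTART_ATTEMPTS(3)
         TERMTYPE(ALLTERM)
       ELEMENT(CICSPROD)
         RESTART_ATTEMPTS(3)
         TERMTYPE(ELEMTERM)

In this sketch, the database element is brought to ready status before the transaction manager element, both are restarted as a group on SYS1 or SYS2 with 30 seconds between restarts, and only the database element is eligible for restart after a system failure.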

In addition to the ARM policy, exits are provided that may be used either to modify ARM processing or to interface with other system components beyond ARM control:
 Element Restart Exit (IXC_ELEM_RESTART)
This exit may be used to modify or cancel the ARM-initiated restart of an element, or to coordinate the restart of an element with other automation packages. It is invoked once for each element to be restarted, on the system where it will be restarted.
 Workload Restart Exit
This exit may be used to prepare a system to receive additional workload from a failing system, for example, by canceling low-priority work. It is invoked once on each system that is selected to restart work from a failing system.
 Event Exit

3.8.2 ARMWRAP - The ARM JCL Wrapper

When ARM was first delivered in MVS/ESA 5.2, any program that wished to use ARM services needed to be APF authorized, which severely limited ARM’s use, as most installations prefer that application programs do not run authorized.

To resolve this problem, IBM changed ARM through APAR OW32480 to remove the APF authorization requirement for most ARM functions. However, SAF security was introduced to allow installations to limit usage.

IBM also introduced a product called ARMWRAP, which can be implemented without making any changes to application code whatsoever, although application JCL changes are required.

The premise behind ARMWRAP is to add an ARMWRAP step to the JCL, prior to the application step, that registers the application with ARM, and an ARMWRAP step after the application step that deregisters the application.

Example 3-1 ARMWRAP JCL

//ARMREG   EXEC PGM=ARMWRAP,PARM=('REQUEST=REGISTER,etc')
//APPL     EXEC PGM=application,etc.
//ARMDEREG EXEC PGM=ARMWRAP,PARM=('REQUEST=DEREGISTER')

Note that not all IXCARM parameters are supported by ARMWRAP (including EVENTEXIT, ANSAREA, REQUEST=ASSOCIATE, STARTTXT, RESTARTTIMEOUT, and ELEMBIND); therefore, code changes may be required for applications that wish to use these particular functions.

3.8.3 ARM recommendations

Format ARM CDSs and define the ARM policy as required. Determine which batch jobs and started tasks are applicable for ARM restart.


Determine whether elements are interdependent and therefore need to be part of the same restart group. Determine whether elements need to be restarted in a specific order, and whether elements need to be restarted at intervals. Determine whether element restart occurs only on element failure, or on both element failure and system failure. Determine the method of restart (that is, original JCL, new JCL, or a new started task command). For system failure, determine the candidate target systems and set free CSA limitations if applicable. Determine the maximum number of restart attempts to prevent recursive abend/restart scenarios. Determine time-out values between restart and re-registration with ARM, and, for cross-system restarts, between restart and the ready-for-work notification that is used to coordinate restart order. For cross-system restarts, ensure that all of the required resources are available for the restarted component. Use ARMWRAP to provide ARM support for products that do not support ARM internally.

3.9 System Logger (LOGR)

The System Logger is a facility that allows a z/OS component, a subsystem, a program product, or an application to log data from single or multiple systems in a sysplex.

Current IBM exploiters of the System Logger include:
 Operations Log (OPERLOG)
Uses a logstream as the hard copy (HARDCPY) medium, rather than the system log (SYSLOG), to provide a sysplex-wide view of operations log messages.
 EREP Log (LOGREC)
Uses a logstream instead of the LOGREC data set to record hardware failures, selected software errors, and selected system conditions across the sysplex.
 APPC/MVS
Uses a logstream to record events relating to protected conversations. An installation-defined logstream is required for APPC/MVS to protect conversations through its participation in resource recovery.
 CICS Log Manager
Uses a logstream as a replacement for the CICS journal control management function. It provides a focal point for all CICS system log, forward recovery log, and user journal output within a sysplex, and flexible mapping of CICS journals onto logstreams. The CICS log manager also enables faster CICS restart, dual logging, and flexible log and journal archiving.
 Resource Recovery Services (RRS)
Uses a logstream to record events relating to protected resources. RRS records these events in five logstreams shared by systems in the sysplex.
 IMS Common Queue Server (CQS) Log Manager

Uses a logstream to record information necessary for CQS to recover structures and restart. CQS writes log records into a separate logstream for each CF list structure pair that it uses.

3.9.1 Logstream types

The System Logger writes log data into a logstream that consists of:
 Interim storage, a medium where the log data is available for short-term access without incurring DASD I/O. When the interim storage medium reaches a user- or installation-defined upper threshold, the log data is offloaded to permanent storage on a record-by-record basis (oldest to newest) until the user-defined lower threshold is reached. As a failure of the interim storage medium would result in loss of log data, the System Logger enforces duplexing of this log data; the location of the duplexed log data depends upon the interim storage medium and, in some cases, user-defined options.
 Permanent storage, where the log data is written to log data sets for long-term access. The medium for permanent storage is linear VSAM data sets, where the log data is retained until deleted.

CF logstream (DASDONLY(NO))

For a CF logstream, the interim storage is a CF list structure.
 Duplexing of CF interim storage log data is specified through parameters in the logstream definition in the LOGR policy, and can be:
– Unconditionally to local storage buffers on each system, via the STG_DUPLEX(NO) parameter. This is the default. Use this option only if an application is not vulnerable to the log data loss that could occur if the system and the CF are not failure-isolated.
– Unconditionally to staging data sets (one per logstream per system), via the STG_DUPLEX(YES) and DUPLEXMODE(UNCOND) parameters.
– Conditionally to staging data sets (one per logstream per system), via the STG_DUPLEX(YES) and DUPLEXMODE(COND) parameters. With the conditional setting, duplex logging to staging data sets only occurs when the System Logger determines that the system and the CF are not failure-isolated; otherwise, normal duplex logging to local storage buffers applies. Failure isolation between the system and the CF means that there are no single-point-of-failure situations that would affect both components, such as:
• The System Logger is running on a system that is on the same CPC as the CF, in which case failure of the CPC would cause loss of log data in both the local storage buffers and the CF structure.
• The CF is volatile. CFs maintain a volatility status that is available to connected applications. A CF is considered to be volatile if it has only one power source, which may be either main power or battery backup. When a CF has a dual source of power, it is non-volatile, as a power failure will not immediately destroy structure contents. The System Logger assumes that a power failure could affect both the CPC and a volatile CF, and that a volatile CF is therefore a single point of failure.

With this parameter, different systems using the same logstream may have different duplex logging methods; for example, where one system is on the same CPC as the CF and another system is not. Also, be aware that this is a dynamic status, and the System Logger may change duplexing methods in response to changes in configuration. For example, a CF structure may be rebuilt in a different CF, or a CF may change volatility status.
– Conditionally allow log data duplexing in a system-managed structure duplexing environment, via STG_DUPLEX(YES) and LOGGERDUPLEX(COND). Note that this has both software and hardware requisites, and may involve a performance cost due to additional overhead.
 When the CF structure reaches the user-defined high threshold, the System Logger offloads the log data from the CF structure to the log data sets.
 As CF list structures may be shared across systems, this type of logstream can provide a sysplex-wide logging facility by merging log data from multiple systems in the sysplex.
 When the last connection to a CF-based logstream disconnects from the logstream, the System Logger offloads all remaining data in the CF structure to the log data sets, deletes the staging data sets if they exist, and returns the CF structure space to Cross-system Extended Services (XES) for reallocation.
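For example, a CF logstream with conditional staging data set duplexing might be defined in the LOGR policy as in the following sketch; the structure and logstream names and sizes are placeholders, and the structure must also be defined in the CFRM policy:

   DATA TYPE(LOGR)
   DEFINE STRUCTURE NAME(LOG_APPL01)
     LOGSNUM(1)
     AVGBUFSIZE(4096)
     MAXBUFSIZE(64000)
   DEFINE LOGSTREAM NAME(APPL.PROD.LOG)
     STRUCTNAME(LOG_APPL01)
     STG_DUPLEX(YES) DUPLEXMODE(COND)
     HIGHOFFLOAD(80) LOWOFFLOAD(0)
     HLQ(LOGR)

With DUPLEXMODE(COND), staging data sets are only used when the System Logger determines that the system and the CF are not failure-isolated; the low LOGSNUM value also keeps the CF control space reserved in the structure to a minimum.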

DASD-Only logstream (DASDONLY(YES))

For a DASD-Only logstream, the interim storage is local storage buffers. These are data spaces associated with the System Logger address space, IXGLOGR.
 Duplexing of the DASD-only interim storage log data is automatic, to staging data sets.
 When the staging data sets reach the user-defined high threshold, the System Logger offloads the log data from the local storage buffers to the log data sets.
 This type of logstream is only system-wide in scope, as the interim storage is system-specific local storage buffers. However, multiple applications on the same system may connect to the logstream concurrently.
 When the last connection to a DASD-Only logstream disconnects from the logstream, the System Logger offloads all remaining data in the local storage buffers to the log data sets, and deletes the staging data sets.
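By contrast, a DASD-Only logstream sketch (again with placeholder names and sizes; STG_SIZE is in 4 KB blocks) might be:

   DEFINE LOGSTREAM NAME(APPL.TEST.LOG)
     DASDONLY(YES)
     MAXBUFSIZE(64000)
     STG_SIZE(8192)
     HIGHOFFLOAD(80) LOWOFFLOAD(0)
     HLQ(LOGR)

Here the staging data sets themselves act as the offload trigger, which is why STG_SIZE deserves particular care (see 3.9.4, “DASD-based staging data set considerations (DASD-Only)”).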

3.9.2 CF structure considerations

Sizing the CF structure is important. Small structures with high logging rates will require frequent offloads (as per the HIGHOFFLOAD and LOWOFFLOAD settings), and therefore will sustain higher system overhead, while large structures may be in excess of application requirements and waste valuable CF space.

A single CF structure can be used by more than one logstream. In this case, the System Logger divides the structure storage evenly among the logstreams that have at least one connection. This is a dynamic process that occurs as the first connection to a new logstream is made, and as the last connection is broken.

The maximum number of active logstreams that may connect to a CF structure is determined by the LOGSNUM parameter in the LOGR policy. This value determines how much CF control space is reserved in each System Logger structure. IBM recommends that this value be kept as low as possible, to maximize the structure space available for log data.

Keeping the number of logstreams in a CF structure small also aids performance, as the System Logger connect processes and logstream recovery processing have been optimized to provide parallelism at the structure level. Defining multiple logstreams in a single structure decreases performance, as the System Logger must process these requests sequentially rather than execute them in parallel. However, this is offset by recovery considerations: A peer system can perform logstream recovery for a failed system if it has a connection to the CF structure containing the logstream. Without this, logstream recovery cannot occur until the failing system is restarted.

The structure size is defined in the CFRM policy via the SIZE and INITSIZE parameters. To simplify the task of calculating the size, IBM provides the CF Structure Sizer, a tool that estimates the storage size for each structure by asking questions based on your existing or planned configuration. The CF Structure Sizer uses the selected structure input to calculate the SIZE and INITSIZE values for the CFRM policy. Refer to: http://www.ibm.com/servers/eserver/zseries/pso

3.9.3 System-Managed CF Structure Duplexing
The use of System-Managed CF Structure Duplexing for Logger structures may be specified via the STG_DUPLEX(YES) and LOGGERDUPLEX statements in the LOGR policy. The use of this function requires the LOGR CDS to be formatted at the HBB7705 (z/OS 1.2) level using the ITEM SMDUPLEX keyword, as well as the appropriate CF levels, z/OS levels and required PTFs, and the required hardware connectivity.

The use of System-Managed CF Structure Duplexing may allow the System Logger to provide the required levels of data protection without requiring the use of staging data sets. This is controlled using logstream definition parameters as follows:
- LOGGERDUPLEX(UNCOND) causes the System Logger to use staging data sets as per the STG_DUPLEX and DUPLEXMODE specifications, even if the logstream is in a structure that is duplexed using System-Managed CF Structure Duplexing.
- LOGGERDUPLEX(COND) causes the System Logger to stop using staging data sets if the logstream is in a structure that is duplexed using System-Managed CF Structure Duplexing and the CFs are failure-isolated from each other.

System-Managed CF Structure Duplexing can be used for products that require persistent log data, like CICS, without incurring the overhead of using staging data sets.

Note that a cost/benefit analysis should be performed for each structure prior to enabling it for System-Managed CF Structure Duplexing as it can lead to increased z/OS CPU utilization, increased CF CPU utilization, and increased CF link utilization. Refer to 3.3.4, “Structure duplexing” on page 87.
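As an illustrative sketch, enabling this combination for an existing logstream could look as follows. The logstream name is hypothetical, the structure must also be enabled for duplexing via DUPLEX(ENABLED) or DUPLEX(ALLOWED) in the CFRM policy, and updates of this kind generally require that the logstream have no active connections:

  DATA TYPE(LOGR) REPORT(NO)
  UPDATE LOGSTREAM NAME(PLEX1.TEST.CFLOG)
         STG_DUPLEX(YES)
         LOGGERDUPLEX(COND)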

3.9.4 DASD-based staging data set considerations (DASD-only)
Interim log data from a DASD-only logstream is maintained in local storage buffers on the system, and duplexed in DASD-based staging data sets.

The size of the staging data sets is determined by the STG_SIZE parameter, which:
- May be explicitly specified.
- May be set to the same value as another defined logstream via the LIKE(like_streamname) parameter if specified.
- May use the values from the SMS data class named by the STG_DATACLAS parameter, or assigned by the SMS ACS routines.
- May use z/OS dynamic allocation defaults as specified or defaulted in the ALLOCxx PARMLIB member if SMS is not available.

It is important that the STG_SIZE parameter be chosen carefully, as staging data set usage triggers the offload process as per the HIGHOFFLOAD and LOWOFFLOAD parameter settings.

If the staging data sets are defined too small, the System Logger will offload log data from local storage buffers to the DASD-based log data sets frequently, which incurs overhead. In addition, small staging data sets may fill up completely depending upon the logging rates and the amount of unused staging space remaining when the HIGHOFFLOAD threshold is reached.

As a result, it is recommended that the staging data sets are allocated as large as possible to minimize overhead and avoid short-term logstream-full conditions and related application impact.

3.9.5 DASD-based staging data set considerations (Coupling Facility)
Depending upon the options specified in the LOGR policy, the System Logger may use DASD-based staging data sets to duplex the CF structure log data.

The size of these data sets is determined by the STG_SIZE parameter, which:
- May be explicitly specified.
- May be set to the same value as another defined logstream via the LIKE(like_streamname) parameter if specified.
- May be left to default, in which case the maximum CF structure size for the logstream structure is used.

It is best to let this parameter default in most circumstances. The default value of CF structure size is most efficient where the CF structure contains only a single logstream.

If multiple logstreams are active within a single CF structure, then these data sets will each be allocated at the CF structure size rather than the logstream data size and therefore data set space wastage will occur.

Explicitly specifying the size of these data sets can influence the maximum amount of log data that can be retained in the CF structure and the offload frequency, and can limit flexibility:
- The HIGHOFFLOAD and LOWOFFLOAD settings apply to all interim storage, which includes both the CF structure and the duplex media in use. As a result, offload processing, logstream capacity threshold messages, and logstream full conditions occur when either of these media reaches the specified percentages. Therefore, specifying the STG_SIZE parameter smaller than the CF structure size will cause structure space wastage, and will increase offload frequency and associated overhead.
- The CF structure space is distributed proportionally amongst the defined logstreams with at least one active connection. As a result, any logstream has the potential to use all of the CF structure space if the other defined logstreams do not have active connections. Explicit specification of the STG_SIZE parameter could limit this flexibility and prevent use of available CF structure space in this situation.

The CI size of staging data sets must be 4096.

3.9.6 DASD-based log data set considerations
Both CF and DASD-only logstreams offload log data from interim storage to log data sets when the HIGHOFFLOAD threshold is reached.

These data sets are allocated as VSAM linear data sets, and the data set size is determined by the LS_SIZE parameter, which:
- May be explicitly specified.
- May be set to the same value as another defined logstream via the LIKE(like_streamname) parameter if specified.
- May use the values from the SMS data class named by the LS_DATACLAS parameter, or assigned by the SMS ACS routines.
- May use z/OS dynamic allocation defaults as specified or defaulted in the ALLOCxx PARMLIB member if SMS is not available.

We recommend that these data sets are made as large as possible to minimize the overall number of data sets and minimize the overhead associated with allocating and switching to a new data set when the current data set becomes full.

The CI size of log data sets should be 24576 for 3390 format to realize optimal I/O performance.

By default, each logstream is limited to a maximum of 168 log data sets. Message IXG257I is issued when the logstream data set directory reaches 90 percent of capacity. LOGR CDSs formatted at OS/390 1.3 or above can specify the DSEXTENT parameter to define additional directory extents. Each directory extent allows an extra 168 DASD-based log data sets to exist concurrently. The IXG261E and IXG262A messages are issued when the used directory extent records reach 85 percent and 95 percent of capacity, respectively.
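The following IXCL1DSU fragment sketches formatting a LOGR CDS with additional directory extents. The data set name, volume, and counts are illustrative only; the LSR and LSTRR items size the logstream and structure records, as described in z/OS MVS Setting Up a Sysplex.

//FMTLOGR  EXEC PGM=IXCL1DSU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINEDS SYSPLEX(PLEX1)
     DSN(SYS1.LOGR.CDS01) VOLSER(CDS001)
     DATA TYPE(LOGR)
          ITEM NAME(LSR) NUMBER(200)
          ITEM NAME(LSTRR) NUMBER(50)
          ITEM NAME(DSEXTENT) NUMBER(20)
          ITEM NAME(SMDUPLEX) NUMBER(1)
/*

ITEM NAME(SMDUPLEX) NUMBER(1) also formats the CDS at the HBB7705 level required for System-Managed CF Structure Duplexing, as noted in 3.9.3.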

A log data set directory is dedicated to every logstream; however, the log data set directory extents are defined in a pool and may be used by any of the defined logstreams.

When the logstream data set directory is full and all available directory extents are full, the System Logger is unable to create a new DASD-based log data set and is therefore unable to offload data, which will impact the connected application. Recovery options for this situation include:
- Formatting a new LOGR CDS at OS/390 1.3 or higher with additional directory extents via the DSEXTENT keyword.
- Deleting log data from the logstream. The log data delete process marks log data for deletion.
  – When all log data in a DASD-based log data set has been marked for deletion, the System Logger physically deletes that data set, which frees an entry in either the log data set directory or a log data set directory extent.

Important: Manually deleting a DASD-based log data set has no effect, as the deletion is unknown to the System Logger, and the data set directory and data set directory extent entries therefore remain unchanged.

– When all entries in a log data set directory extent have been freed, it is returned to the pool of directory extents and may then be used by any of the defined logstreams.

To determine the number of log data set directory extents currently in use, run an IXCMIAPU LOGR report (a sample job follows this list). To ensure adequate scope for expansion, it is advised that used directory extent records be kept below 85 percent of those formatted.

Like other data sets, the log data sets may be subject to HSM migration and recall. It is important to assess the impact to the connected application when log data from a migrated log data set is required:
- For log data access, the application has no option but to wait until the DASD-based log data set containing the required log data is recalled.
- For an offload situation where the current logstream offload data set has been migrated, the System Logger may bypass the recall operation based upon settings in the LOGR policy:
  – OFFLOADRECALL(YES) requires the System Logger to wait until the data set is recalled, which will impact the application if all available log space in the CF structure is used before the recall completes.
  – OFFLOADRECALL(NO) allows the System Logger to bypass the recall operation and move to a new offload data set. While this protects the connected application, it may cause DASD space wastage when the migrated log data set is eventually recalled, as it was only partially filled, and it may cause problems at the logstream data set directory and data set directory extent level due to proliferation of under-utilized log data sets.
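A minimal job to produce the LOGR inventory report, including directory extent usage, might look like the following sketch; only the scope of the LIST request is an installation choice:

//LOGRRPT  EXEC PGM=IXCMIAPU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DATA TYPE(LOGR) REPORT(YES)
  LIST LOGSTREAM NAME(*) DETAIL(YES)
/*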

3.9.7 Offload considerations
The frequency of log data offload from interim storage to the log data sets is determined by the HIGHOFFLOAD (default=80%) and LOWOFFLOAD (default=0%) parameters in the logstream definition.

The default values are suitable for applications that primarily write to the logstream and do not retrieve data often (for example, LOGREC and OPERLOG), and serve to minimize the offload frequency and associated overhead. Alternatively, applications that read log data frequently may want larger amounts of log data to be available for quick access; for these, HIGHOFFLOAD and LOWOFFLOAD values of 80 percent and 60 percent may be more appropriate. However, this will cause higher offload frequency and associated overhead.

Regardless, the HIGHOFFLOAD value should allow sufficient capacity to avoid logstream full conditions, as the System Logger rejects further logging requests when a logstream is at 100 percent capacity. IBM recommends a maximum value of 80 percent for this parameter.

3.9.8 Log data retention
In the original implementation, log data retention and deletion were the responsibility of the owning application. New parameters defined in OS/390 1.3 and later allow a retention period and automatic deletion policy to be specified for each logstream.

Physical deletion of log data is done at the log data set level and requires all log data within the log data set to be eligible for deletion with regard to the RETPD and AUTODELETE parameters, as follows:
- RETPD specifies the retention period for log data within a logstream. This represents the minimum time the log data will be retained, even if the data has been marked for deletion by the application.

- AUTODELETE(NO) means no automatic deletion policy is used, so log data can only be physically deleted after being marked for deletion and after expiry of the retention period, if specified. This is the default.
- AUTODELETE(YES) defines an automatic deletion policy such that log data can be physically deleted when the retention period expires. This applies irrespective of whether the log data has been marked for deletion by the application.
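For example, a hypothetical audit logstream that must keep data for at least 30 days, and where only the owning application may mark data for deletion, could be defined as follows (names and sizes are illustrative):

  DEFINE LOGSTREAM NAME(PLEX1.TEST.AUDIT)
         DASDONLY(YES) STG_SIZE(8192)
         RETPD(30) AUTODELETE(NO)
         HLQ(LOGR)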

3.9.9 GMT considerations
The System Logger uses the system clock GMT value as an authorization key when writing to the CF on behalf of the logstream. If the system clock GMT value is turned back, the System Logger is unable to write to the logstream until the new GMT value is greater than the old GMT value.

The System Logger must write log data to the logstream in timestamp order to avoid corruption, and therefore will be unable to write a record until the current GMT time is greater than the value last used. In the initial implementation, the System Logger would wait until the GMT time problem was resolved, possibly holding critical system resources for the duration. APAR OW41306 changed this processing such that the IXGWRITE request is rejected with return code X'0C' and ABEND1C5 reason code 00040003, which will result in a dump being requested.

To prevent such problems, delete the logstream prior to the GMT date change and then redefine it once the new GMT date has been established.

3.9.10 System Logger recovery
The System Logger performs recovery differently for DASD-only logstreams than for CF logstreams, because DASD-only logstreams are only system-wide in scope and because their log data is always protected in staging data sets.

DASD-only logstream recovery
The System Logger does not perform any system-level recovery in a DASD-only logstream failure scenario, either immediately after the failure or at system initialization time.

As there are no peer systems in a DASD-only logstream environment, recovery only takes place when an application reconnects to the logstream. As part of the connect processing, the System Logger reads the log data from the staging data set into the local storage buffers of the current connecting system.

This can be done on a different system than where the failure occurred assuming staging data set accessibility. This allows logging applications to be restarted on different systems in system failure situations if required.

CF logstream recovery
In this section we discuss CF logstream recovery.

System failure
Where a system fails, the System Logger on a peer system in the sysplex tries to safeguard the log data still resident in the CF structure, either by offloading it to DASD-based log data sets, or by ensuring that the data is secure in an active system-managed structure duplexing environment.

In order to do this, the System Logger on the peer system must have a connection to the same CF structure as the failed system. It does not have to have a connection to the same logstream, just to the same structure containing the logstream. If there is no peer system available that has the necessary connection, recovery of the log data from the CF structure is delayed until either the failed system re-IPLs, or another system in the sysplex connects to the structure. Therefore, for recovery purposes, the best configuration is one where multiple systems are connected to each structure.

If the log data from the CF structure cannot be recovered, the System Logger attempts to use staging data sets if they exist. If this is successful for all connectors then the logstream is considered to be recovered; otherwise if staging data sets were not in use by some or all of the connectors then the logstream is marked as damaged.

In this case the logstream needs to be deleted and recreated. Log data that was successfully recovered will be written to the DASD-based log data sets while log data that was not recovered is lost.

CF failure
Where a CF structure in a system-managed structure duplexing environment becomes unusable, XES automatically switches from duplex mode to simplex mode and continues to use the surviving instance of the structure.

If the structure was in simplex mode at the time of the failure, structure rebuild processing will be initiated (assuming structure REBUILDPERCENT limitations are satisfied) and all systems in the sysplex that are connected to the structure at the time of failure will participate in the rebuild process.

3.9.11 System Logger recommendations
- Avoid placing the primary LOGR CDS on the same volume as the primary sysplex CDS or the primary CFRM CDS.
- Ensure the LOGR CDS is formatted at OS/390 1.3 or later and make use of the DSEXTENT facility to allow more than 168 log data sets.
- For CF logstreams with non-critical data, use STG_DUPLEX(NO) to avoid staging data set overhead. Log data loss may occur if the physical configuration contains single points of failure.
- For CF logstreams with critical data, use STG_DUPLEX(YES) to ensure log data duplexing to staging data sets.
- Consider using system-managed structure duplexing via the STG_DUPLEX(YES) LOGGERDUPLEX(COND) parameters to implement external log data duplexing and avoid the performance impact of staging data sets. Note that this has both software and hardware requisites, and other performance considerations that need to be assessed.
- For a CF logstream in a GDPS environment, use STG_DUPLEX(YES) and DUPLEXMODE(UNCOND) to unconditionally use staging data sets and ensure remote site log data availability for restart after a freeze.
- Ensure that both the log data sets and the staging data sets are allocated with VSAM SHAREOPTIONS(3,3).
- Use the IBM CF Structure Sizer to assist in calculating INITSIZE and SIZE values to be used in the CFRM policy.
- Do not undersize CF logstream structures for logstreams with high logging rates, due to the frequent offload overhead. Use SMF type 88 records to assist in monitoring LOGR structure utilization.
- Specify ALLOWAUTOALT(NO) if you are not CF space constrained. Otherwise, if ALLOWAUTOALT(YES) is specified, then also specify MINSIZE to prevent short-term logstream-full conditions, which could impact exploiters and increase offload overhead should XES resize the structure.
- Do not overspecify LOGSNUM, as CF control space is reserved in each CF log structure based upon this number.
- For CF logstreams, let STG_SIZE default so staging data sets (if used) are allocated with the same size as the structure. If coded, avoid small values, as these cause increased offload activity and overhead. Ensure a CI size of 4096.
- For DASD-only logstreams, specify STG_SIZE as large as possible to minimize offload overhead and avoid short-term logstream-full situations. Avoid small values, as these cause increased offload activity, and avoid default values, as these are usually unsuitable. Ensure a CI size of 4096.
- For all logstreams, specify LS_SIZE as large as possible to minimize the number of log data sets, preventing log data set allocation and switch overhead, and log data set directory and directory extent exhaustion. For 3390 formats, ensure a CI size of 24576 for optimal I/O performance.
- Do not set the HIGHOFFLOAD parameter above 80 percent, as this increases the possibility of logstream exhaustion.
- Set the LOWOFFLOAD parameter based upon log data access requirements. Set a low or zero value for infrequently accessed log data that will tolerate slower access times from permanent media, and a higher value to retain recent data in fast-access interim media.
- Use the OFFLOADRECALL parameter if there is a possibility that the current log data set may be migrated and the remaining structure or staging data set space may be exhausted prior to recall completion.
- Monitor for the IXG261E and IXG262A messages, which are issued when log data set directory extent records reach 85 percent and 95 percent of capacity, and take action accordingly.
- Use RETPD to ensure minimum requirements for log data retention. Use AUTODELETE(YES) only for applications where log data may be deleted under any circumstances after retention period expiry. For other situations, set AUTODELETE(NO) and let the application mark the data as deleted as appropriate.
- Delete logstreams prior to any reverse GMT time changes. Failure to do so will cause logging requests by an application to be rejected until the GMT time prior to the reversal is reached.
- Try to have at least two active logstreams per CF structure, connected to more than one system, to allow peer recovery in case of failure.
- Try to start different logstreams for the same LOGR structure at the same time. When the first connection is made to another logstream in the CF structure, the existing logstreams will be resized, which could potentially cause short-term logstream-full conditions and impact exploiters.
- Ensure adequate auxiliary storage to back System Logger local storage buffers.
- Use non-volatile CFs, and ensure failure isolation between exploiters and CFs if possible.

3.10 Cross-system Coupling Facility (XCF)

XCF is a component of OS/390 and z/OS that allows authorized programs (for example, a z/OS component or subsystem, a program product, or an installation-defined program) on one system to communicate with other authorized programs on either the same system, or on other systems within the same sysplex.

Logical communication is achieved by the authorized programs defining an XCF group, which is used to relate the component members across the sysplex. A member is the part of the authorized program that resides on one system in the sysplex and uses XCF services to communicate with other members of the same group. Note that authorized programs are not limited to defining one member per system; they may define multiple members if required (for example, to communicate between different subcomponents), and may even define multiple groups if required.

Physical communication between systems in a multisystem sysplex is achieved through physical paths, which may be either or both of:
- Channel-to-channel (CTC) connections (either ESCON channels operating in CTC mode or parallel channels connected by a 3088 Multi Channel Control Unit (MCCU))
- CF list structures

3.10.1 XCF systems, groups, and members
The sysplex-wide maximum numbers of systems, XCF groups, and members within an XCF group are defined when the sysplex CDSs are formatted with the IXCL1DSU utility. If required, these numbers may be changed by formatting and activating new sysplex CDSs via the SETXCF COUPLE command or via a sysplex-wide IPL.

Systems
The MAXSYSTEM parameter (default=8, maximum=32) specifies the maximum number of systems allowed to be concurrently active within the sysplex.

To determine the current MAXSYSTEM value, use the DISPLAY XCF,COUPLE command and refer to the MAXSYSTEM setting. To determine the number of systems currently in the sysplex use the DISPLAY XCF,SYSPLEX command.

It is important that the MAXSYSTEM value is chosen carefully. Defining a value that caters for future growth is desirable.

Groups
The ITEM NAME(GROUP) NUMBER( ) parameter (default=50, minimum=10, maximum=2045) specifies the maximum number of XCF groups allowed to be concurrently active within the sysplex.

To determine the maximum number of XCF groups currently defined, use the DISPLAY XCF,COUPLE command and refer to the MAXGROUP(PEAK) values. To determine the number and names of XCF groups that are currently active, use the DISPLAY XCF,GROUP command.

It is important for availability purposes that this value is chosen carefully, as it needs to incorporate:
- The number of XCF groups used by z/OS components
- The number of XCF groups used by subsystems and program products
- The number of XCF groups used by multisystem applications
- A contingent number of XCF groups for future growth

It is worthwhile to regularly review the MAXGROUP(PEAK) values to ensure the peak values are not approaching the maximum available. An XCF group full condition will prevent further XCF groups from being activated, which will impact the authorized program that was attempting to use XCF services. Temporary recovery may be to shut down a lower priority address space on all systems in the sysplex in order to free up XCF group slots, while permanent recovery is to format and activate new sysplex CDSs with a higher group specification.

Members
As with XCF groups, there is a maximum number of XCF members that may be activated within an XCF group, as defined by the ITEM NAME(MEMBER) NUMBER( ) parameter (minimum=8, default=device dependent, maximum=1023).

The default number of members in an XCF group is calculated to be the number of members that can be accommodated on one track on the device type on which the sysplex CDS resides. The actual number defined may be larger than the number specified, as the format utility adds one extra member for XCF usage, and then rounds the number of members to the next larger multiple of four.

To determine the maximum number of members allowed within an XCF group, use the DISPLAY XCF,COUPLE command and refer to the MAXMEMBER(PEAK) values. To determine the number and names of the members currently active within a particular group, use the DISPLAY XCF,GROUP,groupname,ALL command.

As per the XCF group setting, it is important for availability purposes that the maximum number of members is chosen carefully as it needs to cater for both current requirement and future expansion. Regular review of the MAXMEMBER(PEAK) values will help protect against a member full situation.
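As an illustrative sketch, the following IXCL1DSU fragment formats a sysplex CDS with explicit values for all three limits. The sysplex name, data set name, volume, and numbers are examples only, and the GRS item shown anticipates a GRS star configuration, discussed in 3.11, “GRS”:

//FMTSYS   EXEC PGM=IXCL1DSU
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINEDS SYSPLEX(PLEX1)
     MAXSYSTEM(12)
     DSN(SYS1.XCF.CDS01) VOLSER(CDS001)
     CATALOG
     DATA TYPE(SYSPLEX)
          ITEM NAME(GROUP) NUMBER(100)
          ITEM NAME(MEMBER) NUMBER(203)
          ITEM NAME(GRS) NUMBER(1)
/*

The newly formatted data sets are typically brought into use as alternates via the SETXCF COUPLE,ACOUPLE command and switched via SETXCF COUPLE,PSWITCH, or at the next sysplex-wide IPL.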

3.10.2 XCF signaling paths
It is a requirement of z/OS that there be full signaling connectivity between all systems in a sysplex, such that operational outbound and inbound paths exist between each pair of systems.

If complete signaling connectivity is lost between two systems, then z/OS will begin isolation processing to partition one of the two systems out of the sysplex, either:
- Automatically, via activation of the Sysplex Failure Management (SFM) policy, which will retain the system that contributes to the configuration with the highest aggregate WEIGHT value and full signaling connectivity. Refer to 3.7, “Sysplex Failure Management (SFM)” on page 103.
- Manually, via prompting the operator with IXC426D to allow specification of which system to remove.

Physical configuration
Due to the severe impact of a full signaling connectivity failure, IBM recommends a configuration that consists of redundant connections between each system within the sysplex, through either:
- Multiple physical CTC channel connections
  Signaling paths defined through CTC connections are unidirectional and, as such, must be exclusively defined as either inbound or outbound. This is done by specification of CTC device unit addresses via the PATHIN and PATHOUT statements in the COUPLExx PARMLIB member.
- XCF structures in multiple physical Coupling Facilities
  Unlike CTC connections, CF structures are bidirectional, such that the same structure may be used for both inbound and outbound signaling paths; therefore both the PATHIN and PATHOUT statements may reference the same structure name. z/OS automatically establishes a signaling path by linking the outbound path with every other system in the sysplex that has the structure defined for inbound message traffic.
- A combination of CTC channel connections and XCF structures
  There is no requirement that the signaling methodology be uniform across the sysplex. It is quite valid for a system to use either of the above methods, or both, to connect to other systems. In addition, note that where multiple signaling paths are available between two systems, XCF does not attempt to balance the load, but instead uses the path that is achieving the best performance. A sample COUPLExx fragment follows this list.
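A COUPLExx fragment combining both path types might look like the following sketch; the device numbers, CDS names, and structure name are illustrative, and the transport class definitions are omitted:

COUPLE   SYSPLEX(PLEX1)
         PCOUPLE(SYS1.XCF.CDS01)
         ACOUPLE(SYS1.XCF.CDS02)
PATHOUT  DEVICE(4500)
PATHIN   DEVICE(4600)
PATHOUT  STRNAME(IXC_DEFAULT)
PATHIN   STRNAME(IXC_DEFAULT)

Note how the same structure name appears on both PATHOUT and PATHIN, while the CTC devices are defined in one direction only.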

Signal path failures
Signal path failures will occur if the underlying hardware fails. A properly configured sysplex with full redundancy for each inbound and outbound path has a different recovery scenario depending upon which signaling method has failed:
- Where a CTC signaling path fails due to hardware failure, the recovery scenario is for XCF to use other available paths to maintain signaling connectivity until such time as the hardware has recovered.
- Where a CF structure path fails, the recovery is dependent on the type of failure:
  – If a CF fails, then z/OS can rebuild the structure in another available CF and reestablish signaling paths. When the CF has been repaired, the structure may be rebuilt in its original location via the SETXCF START,REBUILD command.
  – If the structure itself fails, z/OS can rebuild the structure in either the same CF or in another available CF and reestablish signaling paths. Note that during the structure rebuild process, all signaling connectivity via that structure is lost until the rebuild is complete. If the failed structure was the only signaling path between all members of the sysplex, it is possible that Sysplex Failure Management (SFM), if active, could take action prior to signaling being reestablished and partition all members except one out of the sysplex. Therefore, signaling path redundancy is strongly recommended, particularly when SFM is being used.

XCF attempts retry processing before failing a signaling path completely. A retry value may be specified on either the COUPLE statement (minimum=3, default=10, maximum=255), or on the PATHIN and PATHOUT statements. Whenever an error occurs on a path, XCF retries the path. If the retry is unsuccessful, XCF adds one to the retry count; if the retry is successful, XCF subtracts one from the retry count. XCF stops the path when the retry count reaches the RETRY value. The retry limit for a signaling path via a CF structure applies to each path in the specified direction (inbound or outbound).

When selecting a retry limit, consider that:
- A large retry limit delays the removal of an inoperative or error-prone path.
- A small retry limit hastens the removal of an inoperative or error-prone path.

For most installations, the default retry limit of 10 should be satisfactory.

3.10.3 XCF Transport Classes
Transport classes are a method of linking XCF messages to XCF resources such as message buffers and signaling paths. This can be done either by the explicit specification of XCF group names (that is, all messages for a particular application), or on the basis of message length (using the CLASSDEF statement in the COUPLExx PARMLIB member), or both. Each transport class has its own resources, which consist of a buffer pool and one or more outbound signaling paths. Some early product documentation recommended separate transport classes for GRS and RMF to ensure performance by isolating these workloads; however, this is no longer advised. In most cases it is more efficient to pool the signaling resources for all applications and define transport classes based upon message size.

Where a transport class is explicitly defined for a number of groups, each group has equal access to the signaling resources. If the signaling resources for a transport class become unavailable, the messages for the related groups can use the signaling resources of other transport classes; however, this degrades the performance of the signaling service.

DEFAULT transport class
z/OS automatically defines a default transport class with the following attributes:
CLASSDEF CLASS(DEFAULT) CLASSLEN(956) GROUP(UNDESIG) MAXMSG(maxmsg)

Where:
- CLASSLEN(956) = the length of messages to be accepted by this class.
- GROUP(UNDESIG) = the workloads to be accepted by this class. Where a specific workload (for example, SYSGRS, SYSRMF, etc.) is not explicitly specified on any CLASSDEF statement, it is classified as undesignated and placed in group UNDESIG, which applies to all CLASSDEFs that either specify GROUP(UNDESIG) or do not specify a group at all.
- MAXMSG(maxmsg) = the maximum amount of buffer space that this CLASSDEF may use, where maxmsg is either the value coded on the COUPLE statement, or the default of 750 K.

The DEFAULT transport class is adequate for most installations; however, these values may be modified if required either through explicit specification in the COUPLExx PARMLIB member or via the SETXCF MODIFY,CLASSDEF command.
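For instance, a hypothetical installation that also carries a significant volume of larger messages might define a second class aligned to the 16 K internal buffer size shown in Table 3-3, with its own signaling paths. The class name, structure name, and MAXMSG values here are illustrative only:

CLASSDEF CLASS(DEFAULT) CLASSLEN(956) GROUP(UNDESIG) MAXMSG(1000)
CLASSDEF CLASS(BIGMSG) CLASSLEN(16316) GROUP(UNDESIG) MAXMSG(2000)
PATHOUT  STRNAME(IXC_BIG) CLASS(BIGMSG)
PATHIN   STRNAME(IXC_BIG)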

Message lengths
The CLASSLEN(classlen) parameter on the CLASSDEF statement is used to determine the length of the individual message buffers formatted for a transport class.

This value is also used by XCF to select which transport class to use for each message. Where a message could be assigned to more than one transport class, XCF will select the one with the smallest message buffer length that will hold the message in order to minimize buffer wastage.

Internally, XCF uses buffers of 17 different fixed sizes. Within each buffer, 68 bytes are used for internal control blocks and the remaining buffer space is available for application message data; therefore the CLASSLEN value is rounded up to the next available message size.

Table 3-3   XCF internal buffer sizes

Buffer size   Message size      Buffer size   Message size
1 K           956               36 K          36796
4 K           4028              40 K          40892
8 K           8124              44 K          44988
12 K          12220             48 K          49084
16 K          16316             52 K          53180
20 K          20412             56 K          57276
24 K          24508             60 K          61372
28 K          28604             61 K (*)      62464
32 K          32700

(*) The 61 K buffer does not have the 68-byte overhead.

Where a message is larger than the CLASSLEN of any of the assigned transport classes, performance degradation occurs as additional processing is required to deliver the oversized message.

In this situation, XCF dynamically expands the size of the individual message buffers of a transport class and then processes the message. In early implementations, XCF always chose transport class DEFAULT to expand; however, this could have negative effects on the valid messages for that transport class, especially if they were small, so APAR OW16903 changed XCF to expand the transport class with the largest buffer size instead.

If no additional messages of this size are received within a short time then dynamic contraction occurs to reinstate the original sizes. In both of these cases buffers need to be reformatted and extra XCF signals are generated to communicate these changes.

Message buffers
Message buffers are used to hold messages in central storage as they are being processed. These buffers are allocated as needed to support the message traffic load. Certain situations, like runaway applications, non-operational remote systems, or structure rebuilds, can cause message traffic to back up to the point that the amount of virtual storage acquired for signaling would degrade the rest of the system.

The MAXMSG parameter is used to limit the total amount of message buffer space available to a transport class or signaling path. The MAXMSG keyword (default=750) can be specified on the COUPLE, CLASSDEF, PATHOUT, PATHIN, and LOCALMSG statements in the COUPLExx PARMLIB member.

The MAXMSG value provided (or defaulted) on the COUPLE statement applies to the CLASSDEF and PATHIN statements if not specifically provided. The PATHOUT statement uses the MAXMSG specified or defaulted for the associated CLASSDEF if not provided, and the LOCALMSG has a MAXMSG default of 0 if not provided.

Message buffers may be categorized as:
- Outbound message buffers, used to send messages to another system. These consist of one set of transport class buffers for each connected system in the sysplex (of size MAXMSG specified or defaulted on the CLASSDEF statement), and one set of message buffers for each outbound signaling path in that class (of size MAXMSG specified or defaulted on the PATHOUT statement).
- Inbound message buffers, used to receive messages from another system. These consist of one set of signal buffers for each inbound signaling path (of size MAXMSG specified or defaulted on the PATHIN statement). Note that there are no transport classes associated with inbound messages.
- Local message buffers, used to send and receive messages within the same system. These consist of one set of transport class buffers (of size MAXMSG specified or defaulted on the CLASSDEF statement), and additional local message buffers (of size MAXMSG specified on the LOCALMSG statement).

The various types of message buffers are segregated from one another so that the supply of one type does not affect the supply of any other type. Message buffers used to communicate with one system are segregated from those used to communicate with other systems.

3.10.4 XCF signal path performance problems
XCF messages are sent out on the PATHOUT paths and received on the PATHIN paths. If there are too few paths, then signaling performance is degraded, while too many signaling paths may waste system resources. Tuning is achieved by altering either the number of signal paths, the type of signal paths, or both.

No paths for a transport class
Outbound signaling path resources are usually dedicated to a single transport class, via explicit specification of the CLASS parameter on the PATHOUT statement, or by the CLASS(DEFAULT) default.

Situations may arise where a transport class has no operational signaling paths. This is not a fatal condition, as XCF will route the messages over signaling paths in another transport class that has the required connectivity; however, this may result in performance degradation due to the additional overhead.

If a small message is redriven over signaling paths that use large message buffers, then significant buffer wastage could occur, possibly leading to outbound and/or inbound buffer depletion scenarios.

Alternatively, if a large message is redriven over signaling paths that use small message buffers, then additional overhead is required to support buffer expansion and contraction, unless the message rate is sufficient to maintain the expanded buffer size, in which case the normal small message traffic will experience significant buffer wastage, which again may cause outbound and/or inbound buffer depletion scenarios.

Diagnosis
Examine the RMF XCF ACTIVITY report (XCF USAGE BY SYSTEM section) for non-zero values for “ALL PATHS UNAVAIL” for outbound traffic to remote systems.

Resolution
This indicates either a configuration error, a transient or permanent hardware error, offline channels, or possibly operator intervention via the SETXCF STOP, SETXCF MODIFY, CONFIG, or VARY commands.

In all situations further investigation is required to determine the cause of the signal path unavailability with rectification as appropriate. If warranted and if resources permit, the transport class can be redirected to other operational signaling resources via the SETXCF MODIFY command.

Insufficient paths for a transport class
Signal paths may be via either CF structures or CTCs; therefore, due to the different architectures of these paths, different methods are required to evaluate performance problems relating to capacity.

Diagnosis (Coupling Facility paths)
Examine the RMF XCF ACTIVITY report (XCF PATH STATISTICS section) for high “AVG Q LNGTH” and “BUSY” counts relative to “AVAIL” counts, indicating that XCF has to delay message transmission because the selected subchannel is still transferring the previous message.

Diagnosis (CTC paths)
Due to the nature of XCF channel programs, queued requests are added to the CCW chain to increase efficiency; however, this distorts the “AVG Q LNGTH” value on the RMF XCF ACTIVITY report (XCF PATH STATISTICS section), and therefore this is not a good indicator.

Instead, use the DISPLAY XCF,PI,DEV=ALL command, which was updated by XCF APAR OW38138 to display the “MXFER TIME” for each signal path as an indicator of path response time. The “MXFER TIME” is the mean transfer time in microseconds for up to the last 64 signals received within the last minute. A value of less than 2 milliseconds (2000 microseconds) is an indicator of sufficient CTC capacity for the current workload.

As the DISPLAY XCF command is a snapshot of a small period of time, it may be worthwhile to generate an RMF Channel Activity report for an extended period (24 hours, 7 days, etc.) to determine regular periods of peak activity during which the DISPLAY XCF command results may be assessed for capacity and performance under load conditions.

Resolution (Coupling Facility paths and CTC paths)
This is generally resolved through capacity changes:
- Add new signaling paths to the transport class to increase capacity via the SETXCF START command, or redirect existing underutilized paths via the SETXCF MODIFY command.
- Review the RMF XCF ACTIVITY report (XCF PATH STATISTICS section) for the connected system for non-zero values for “BUFFER UNAVAIL” for inbound traffic from connected systems. If this is non-zero, it may indicate that the outbound path “BUSY” count is due to insufficient inbound message buffer capacity on the connected system, causing delayed message delivery.
- Review the RMF XCF ACTIVITY report (XCF USAGE BY MEMBER section) to determine group message counts. Where a transport class supports multiple group workloads, it may be worthwhile routing a high message sending group through a dedicated transport class with dedicated signaling resources.

Too many paths for a transport class
Where there are multiple outbound signaling paths assigned to a transport class leading to the same system, it is possible that the transport class is overconfigured, and valuable signaling resources that could be used by other transport classes to the same system may be wasted.

Diagnosis
Examine the RMF XCF ACTIVITY report (XCF PATH STATISTICS section) for disparity between the “REQ OUT” counts for multiple signaling paths through the same transport class to the same system. XCF attempts to use the signaling path that is providing the best performance, and regularly choosing the same path is an indication of excess capacity.

Resolution
Relocate an underutilized signaling path to another transport class via the SETXCF MODIFY command, with due consideration to redundancy.

3.10.5 XCF message buffer length performance problems
Message buffer space is formatted into buffers of a set length as determined by the CLASSLEN value for the related CLASSDEF entry, as follows:
- Outbound message buffers consist of transport class buffers and outbound signaling path buffers.
- Inbound message buffers consist of inbound signaling buffers only.
- Local message buffers consist of transport class buffers and, optionally, specific local buffers if specified on the LOCALMSG statement.

Small message lengths
XCF will never dynamically decrease a buffer size for a transport class or signaling path below the value specified on the related CLASSLEN parameter; therefore it is possible that small messages may be occupying large buffers, leading to buffer wastage, poor signaling performance, and possible buffer depletion.

Diagnosis
Examine the RMF XCF ACTIVITY report (XCF USAGE BY SYSTEM section) for large values for “%SML” for outbound traffic to remote systems. The “%SML” value indicates the percentage of messages sent that could have used a smaller internal buffer size if one was available to the transport class.

Rectification
This is usually resolved by either reducing the CLASSLEN for the related transport class to a lower internal buffer size, as shown in Table 3-3 on page 122, or defining a new transport class to cater for the smaller-sized messages.

Large message lengths
If a message is larger than the CLASSLEN of any of the assigned transport classes, performance degradation occurs, as additional processing is required to deliver the oversized message. XCF expands the buffer size of the transport class with the largest CLASSLEN (and its related signaling resources) and delivers the message. In an attempt to eliminate this overhead, XCF retains the expanded buffer size for a short period of time; when no larger buffers are being processed, a buffer contraction process occurs.

Diagnosis
Examine the RMF XCF ACTIVITY report (XCF USAGE BY SYSTEM section) for non-zero values for “%BIG” for outbound traffic to remote systems. The “%BIG” value indicates the percentage of total messages sent that needed a message buffer larger than the CLASSLEN size for the related transport class.

XCF tries to eliminate buffer expansion overhead by leaving the expanded buffers in place for a short period of time so a message larger than the CLASSLEN size does not always cause buffer expansion. To assess the effectiveness of this process, refer to the “%OVR” value, which represents the percentage of BIG messages (not total messages) that caused a buffer expansion.

To see a current message count of FIT and BIG messages for a transport class, in terms of XCF internal buffer sizes, issue the DISPLAY XCF,CLASSDEF,CLASS=class command.

Rectification
This is usually resolved by either increasing the CLASSLEN for the related transport class to a higher internal buffer size, as shown in Table 3-3 on page 122, or defining a new transport class to cater for the larger-sized messages.

3.10.6 XCF message buffer space performance problems
Insufficient XCF message buffer space can impact XCF message flow between two systems, if outbound or inbound message buffers are depleted or insufficient, or internally within a system, if local message buffers are affected.

Outbound message reject
For outbound messages to a particular system, if the sum of buffer space from the CLASSDEF and related PATHOUT statements is insufficient, then XCF will reject the message. Note that outbound buffer exhaustion may also be caused by messages backing up on the outbound side due to inbound buffer shortages on the connected system. Refer to “Inbound message reject” on page 127.

Diagnosis
Examine the RMF XCF ACTIVITY report (XCF USAGE BY SYSTEM section) for non-zero values for “REQ REJECT” for outbound traffic to remote systems.

Resolution
This is generally resolved by increasing the amount of outbound message buffer space through one or a combination of the following (illustrative commands follow this list):
- Increasing MAXMSG for the CLASSDEF buffers via the SETXCF MODIFY command
- Increasing MAXMSG for the PATHOUT buffers via the SETXCF MODIFY command
- Adding more outbound buffers by activating additional PATHOUT signaling paths via the SETXCF START command
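As an illustrative sketch, such changes might look like the following; the class, structure names, and MAXMSG values are hypothetical, the third command assumes an additional signaling structure has already been defined in the CFRM policy, and the complete syntax is in z/OS MVS System Commands:

SETXCF MODIFY,CLASSDEF=DEFAULT,MAXMSG=1500
SETXCF MODIFY,PATHOUT,STRNAME=IXC_DEFAULT,MAXMSG=1500
SETXCF START,PATHOUT,STRNAME=IXC_DEF2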

Inbound message reject
For inbound messages from a particular system, if the sum of the storage from the related PATHIN statements is insufficient, then the message will be delayed. This will cause the messages to back up on the outbound side, and eventually a signal could be rejected through lack of message buffer space.

Diagnosis
Examine the RMF XCF ACTIVITY report (XCF USAGE BY SYSTEM section) for non-zero values for “REQ REJECT” for inbound traffic from remote systems.

Examine the RMF XCF ACTIVITY report (XCF PATH STATISTICS section) for non-zero values for “BUFFERS UNAVAIL” for inbound traffic from remote systems.

Resolution
This is generally resolved by increasing the amount of inbound message buffer space, by increasing MAXMSG for the PATHIN buffers via the SETXCF MODIFY command.

Local message reject
For local messages within the same system, if the sum of storage for the CLASSDEF and LOCALMSG statements is insufficient, then XCF will reject the message.

Diagnosis
Examine the RMF XCF ACTIVITY report (XCF USAGE BY SYSTEM section) for non-zero values for “REQ REJECT” for local traffic for the system.

Resolution
This is generally resolved by increasing the amount of local message buffer space through one or both of the following:
- Increasing MAXMSG for the CLASSDEF buffers via the SETXCF MODIFY command
- Increasing MAXMSG for the LOCALMSG buffers via the SETXCF MODIFY command

Dynamic changes via the SETXCF command will only be valid until the next IPL; therefore modifications will need to be made to the COUPLExx PARMLIB member if permanent changes are required.

3.10.7 XCF Coupling Facility performance problems
Proper operation of the sysplex is extremely difficult if XCF is slowed down and there are delays in sending messages to other sysplex members. Monitoring the performance of XCF using the RMF XCF reports is very important.

The Washington Systems Center has created a flash (number 10011) that discusses this area in detail. This flash is available at: http://www-1.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10011

3.10.8 XCF recommendations
- Format sysplex CDSs with values for systems, XCF groups, and members that cater for both current requirements and future growth.
- Regularly use the DISPLAY XCF,COUPLE command and review the MAXGROUP(PEAK) and MAXMEMBER(PEAK) values to ensure the peak values of current usage are not approaching the maximum settings.
- Configure signaling paths to maximize redundancy. Ensure at least two distinct physical paths between each pair of sysplex systems. We recommend using both CTC paths and multiple CF structures.
- Use the PATHIN/PATHOUT RETRY default of 10.
- Minimize the number of transport classes. Remove separate transport classes for GRS and RMF, as this is no longer advised, and define transport classes based upon message sizes instead. In most cases the DEFAULT transport class with message length 956 should be sufficient; if warranted, define a second transport class to handle larger message lengths.
- Set up procedures to monitor the performance of XCF on an ongoing basis. See WSC flash 10011.

3.11 GRS

Global Resource Serialization (GRS) is the component of the z/OS operating system that controls serialization of shared resources between z/OS images. GRS can be used to serialize almost anything; however, typically this is most visible in relation to protecting the integrity of data sets in a shared DASD environment.

When access to a resource is required, a GRS ENQ is created, which is used by GRS to coordinate with all members of the GRS complex to determine whether access should be allowed.

GRS can run in one of two modes:
- Ring mode
  In this mode a logical (and sometimes physical) ring is created between all members of the GRS complex, and a ring system authority (RSA) message containing resource ENQs is continually passed from member to member in set sequence to coordinate the serialization. As the RSA message needs to be passed all the way around the ring before access to a resource may be granted, rings with larger numbers of systems may suffer performance problems. IBM recommends conversion to star mode for Parallel Sysplexes for performance reasons.
- Star mode
  In this mode, which is available with OS/390 1.2 and above, a CF lock structure called ISGLOCK is used to maintain the ENQ information, which is accessible by all members of the GRS complex. In terms of response time, the star configuration is far better, as requests for resources that are not in contention can be completed with only two signals to the CF, in comparison with having to pass the RSA message around the ring before access can be granted. For CPU time, the overhead required to process ENQ and DEQ requests is limited to the system on which the request originated and the CF; therefore the total processor time consumed across a star complex will be less than that consumed by a ring complex. Availability and recoverability are also improved with a star complex, since the systems that make up the GRS complex are not dependent on each other as they are in the ring configuration; that is, no changes to processing or adjustments are required when systems enter or leave the GRS complex.

There are a number of different GRS complex configurations possible:
- Non-sysplex systems only
  Where there are multiple systems that are not in a sysplex but share resources, serialization is achieved using ring mode, with GRS communicating with GRS on other systems using dedicated CTCs in a physical ring configuration. This is prone to error, as ring management is highly manual, requiring significant operator intervention during ring disruption scenarios. Full connectivity is not required between each member; however, it is recommended, as it allows the ring to be built in any order and allows any system to withdraw from the ring without affecting the other systems.
- Base sysplex systems only
  Where multiple systems are in a sysplex, XCF services are used as the communication method via CTC channels dedicated to XCF, and a logical ring is established. This is a far more robust configuration, as ring management is largely automated and operator intervention almost completely removed. It is a sysplex requirement that there is full connectivity between each system, which allows GRS to build the ring in any order and easily manage system withdrawal and recovery.
- Base sysplex and non-sysplex systems
  This is called a mixed complex and is allowed but not recommended. Communication between sysplex systems is achieved using XCF services via CTC channels dedicated to XCF, and between sysplex and non-sysplex systems using CTC channels dedicated to GRS. This is a very confusing environment, as ring management may be either automatic or manual depending upon which system is involved.
- Parallel Sysplex systems only using ring mode
  As per “Base sysplex systems only”, except that XCF services may use CF structures, CTC channels dedicated to XCF, or a combination of both in order to communicate with other sysplex systems.
- Parallel Sysplex systems and non-sysplex systems using ring mode
  As per “Base sysplex and non-sysplex systems”, except that XCF services may use CF structures, CTC channels dedicated to XCF, or a combination of both in order to communicate with other sysplex systems. Again, a highly confusing environment.
- Parallel Sysplex systems using star mode
  This is the best configuration in terms of performance, recoverability, and availability. As this option uses a CF lock structure, only systems in the sysplex may participate, and therefore the GRS complex is equal to the sysplex in all circumstances. In order to convert to a GRS star configuration, the sysplex CDS must be formatted with a special option, ITEM NAME(GRS) NUMBER(1), which allocates space in the sysplex CDS where the RNL information is retained for use by all sysplex members.

3.11.1 GRS start options
GRS has the following start options:
- GRS=NONE
  The system is not to participate in a GRS complex. This start parameter cannot be used in a sysplex, as GRS is a requirement for a sysplex.
- GRS=START
  The system is to start a GRS ring complex, which initially consists of a one-system ring, and to assist other systems in joining the complex.
- GRS=JOIN
  The system is to join an existing GRS ring complex and assist other systems in joining the complex.
- GRS=TRYJOIN
  The system is to join an existing GRS ring complex and assist other systems in joining the complex, or to start a GRS ring complex if no active GRS systems are found.
- GRS=STAR
  The system is to participate in a GRS star complex.
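The start option is normally supplied via the GRS system parameter in IEASYSxx (or in response to the IEA101A prompt). A minimal IEASYSxx fragment for a star complex might look like the following sketch, where the GRSRNL member suffix is illustrative:

GRS=STAR,
GRSRNL=00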

3.11.2 Dynamic RNLs
Resource Name Lists (RNLs) define how resources are to be managed from a GRS perspective. They are defined in the GRSRNLxx PARMLIB member and must be the same for all members of the GRS complex.

Prior to MVS/ESA SP 5, modifications to RNLs could only be specified at IPL, and because all RNLs had to be the same for all systems in the sysplex, a sysplex cold start was required to make changes.

With MVS/ESA SP 5 and later, dynamic RNLs allow for changes to be made across the entire GRS complex via a single SET GRSRNL=xx operator command. Note that this facility uses XCF services, and therefore cannot be used in mixed-complex configurations unless the non-sysplex systems are inactive.
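As a sketch, a GRSRNLxx member consists of RNLDEF statements such as the following; the qnames and rname shown are illustrative, not a recommended RNL setup:

RNLDEF RNL(INCL) TYPE(GENERIC) QNAME(SYSDSN)
RNLDEF RNL(EXCL) TYPE(GENERIC) QNAME(SYSDSN) RNAME(SYSTEM.LOCAL)

After the updated member (for example, GRSRNL01) is available to every system, a single SET GRSRNL=01 command propagates the change across the complex.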

GRS ensures that the integrity of all resources is maintained throughout the duration of the RNL change; therefore it is recommended that dynamic RNL changes be made outside peak periods to minimize the following impacts:
- Suspend
  An address space requesting access to an affected resource (that is, a resource that is changing as a result of the RNL change) will be suspended until the RNL change is complete. This will be accompanied by message ISG210E, and impacted address spaces may be identified via the D GRS,SUSPEND command.
- Delay
  An address space holding an affected resource will delay the RNL change until the affected resource has been released. This is accompanied by message ISG219E and message ISG220D, which may be replied to with ‘C’ to cancel the RNL change if required. Address spaces causing the RNL change to be delayed may be identified via the D GRS,DELAY command. Note that suspended address spaces will remain suspended until the RNL change either completes or is cancelled.

Note that for a dynamic RNL change, one system reads the updated GRSRNLxx PARMLIB member and coordinates the changes throughout the systems. It is critical that all sysplex systems have their PARMLIB members updated to use the modified RNLs at the next IPL; otherwise they will be placed in a disabled wait X'0A3', as their RNLs no longer match what is currently active in the sysplex.

3.11.3 GRS Ring Availability considerations - Fully connected complex
A fully connected complex is one where the GRS complex matches the sysplex, in which case all GRS communication and ring management is done through XCF services.

As a result, physical availability considerations can be directly tied to XCF availability, as discussed in “Cross-system Coupling Facility (XCF)” on page 118.

In a fully connected complex, the recommended start parameter of GRS=TRYJOIN should be used for all systems, as it minimizes operator intervention by catering both for IPLing a system into an existing GRS ring and for starting a new ring if no active members are found.

3.11.4 GRS Ring Availability considerations - Mixed complex
A mixed complex is an inherently unstable configuration and is not recommended from an availability perspective for the following reasons:
► Manual operator intervention is required to manage the CTC communication links used to provide GRS communication with non-sysplex systems, especially during ring disruption scenarios where systems enter or leave the GRS complex.
► Failure to manage the CTC communication links can result in a system being placed in a disabled wait X’0A3’ through GRS communication failure.
► An incorrect start option for a system joining an existing GRS complex can lead to a split ring situation if accompanied by CTC configuration errors.
► Careful physical CTC planning needs to be done to ensure that all ring scenarios have been catered for. The best configuration is redundant CTC connectivity between every member of the GRS complex. However, this may not be possible due to the number of connections required as the number of systems increases.
► Dynamic RNL changes are not available unless all non-sysplex systems are removed from the GRS complex for the duration of the change and then brought back in with updated GRSRNLxx PARMLIB members.

The parameter to be used at IPL time depends upon which system is being IPLed and whether or not a GRS complex already exists. The GRS=TRYJOIN option is not recommended, because it can cause a split ring situation if there are problems with the CTC configuration. Therefore operator intervention is required, using one of the following methods:
► Specify GRS=JOIN for all systems. Whenever there is a full GRS complex outage, specify an override of GRS=START in response to the IEA101A SPECIFY SYSTEM PARAMETERS message on the first system to be IPLed.
► Specify GRS=START for all systems, then respond to the ISG005I START OPTION INVALID message with JOIN when a GRS complex already exists.
► Specify GRS=JOIN for all systems, then respond to the ISG006I JOIN OPTION INVALID message with START when a GRS complex does not already exist. Unfortunately, this message can also appear when there are CTC configuration errors, and a split ring situation can occur if not managed correctly.

3.11.5 GRS Star Availability considerations
The GRS star configuration relies on XCF services to allow GRS to communicate with the GRS address spaces on other systems, and on the ISGLOCK structure in the CF, which is used to maintain ENQ information.

The availability considerations of XCF have already been discussed in “Cross-system Coupling Facility (XCF)” on page 118. Therefore the following discussion concentrates on the ISGLOCK structure.

Loss of system access to the ISGLOCK structure will result in a disabled wait X’0A3’. Therefore recovery options for this structure are critical to the availability of the sysplex systems and the sysplex as a whole.

While conversion from ring mode to star mode may be performed dynamically, fallback from star mode to ring mode requires a sysplex-wide IPL. Therefore ring mode cannot be used as an availability fallback, unlike XCF, for example, which can continue to use CTC communication should the XCF structure fail.
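Assuming the prerequisites described earlier are in place (a sysplex CDS formatted with the GRS item, a CFRM policy containing the ISGLOCK structure, and an active GRS ring), the dynamic conversion can be initiated with:

   SETGRS MODE=STAR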

As a result, IBM recommends that star mode not be used in configurations where there is only one Coupling Facility, as this has severe consequences for both planned and unplanned outages relating to the CF or connectivity to the CF.

Factors important to the rebuilding of the ISGLOCK structure include:
► REBUILDPERCENT
This parameter is specified in the CFRM policy definition for the ISGLOCK structure and is used during partial connectivity failures to determine whether to perform a structure rebuild or not. If the setting of this parameter prevents a structure rebuild, systems without connectivity will be placed in a disabled wait X’0A3’. Therefore it is recommended that the default value of 1 is used to force a rebuild in all circumstances.
► Available space in an alternate CF
Insufficient available space in an alternate CF will prevent a structure rebuild in the new location. In this situation, GRS will continue to use the existing structure, and systems without connectivity will be placed in a disabled wait X’0A3’.

The ISGLOCK structure must be sized correctly in the CFRM policy. The structure should be set, in most cases, to 33 MB or 65 MB, watching the false contention counts to see if the structure should be doubled (via a CFRM policy change and a rebuild of the structure) to reduce the false contention. The SIZE (and INITSIZE, if specified) parameters can be calculated using the following method:
a. #lock table entries = peak #resources * 100. The peak #resources can be determined using an IBM sample program (refer to the ISGCGRS SAMPLIB member), which reports the number of global resources in the existing GRS complex.
b. Round #lock table entries up to the next power of 2. GRS requires that the #lock table entries is at least 32768 (that is, 32 K). Therefore use this value if the calculated value is lower.
c. Lock table size (in bytes) = #lock table entries * 8.
d. Structure size (in K) = (lock table size / 1024) + 256.

We recommend a minimum ISGLOCK structure size of 8448 K. However, smaller values may be sufficient for test environments.
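As a hypothetical worked example, assume that the ISGCGRS sample program reports a peak of 10,000 global resources:

   a. #lock table entries = 10,000 * 100 = 1,000,000
   b. Next power of 2: 2**20 = 1,048,576
   c. Lock table size = 1,048,576 * 8 = 8,388,608 bytes
   d. Structure size = (8,388,608 / 1024) + 256 = 8192 K + 256 K = 8448 K

This result matches the recommended minimum size above.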

When the first system joins the star, GRS attempts to allocate the ISGLOCK structure using the SIZE (or INITSIZE) values from the CFRM policy and issues ISG337I to document the number of locks created.

If there is insufficient storage in the CF to meet the CFRM policy size specifications, a smaller structure size is allocated and ISG322A is issued to advise of this modification.

If the number of lock entries is too small, GRS issues ISG338W and places the system in a disabled wait X’0A3’, in which case the CFRM policy size may need to be increased, space freed up in the CF, or the ISGLOCK structure defined in a different CF.

Sometimes the ISGLOCK structure requires enlargement for performance reasons. As the ISGLOCK structure fills up, the rate of false contention increases. This can be seen in the RMF COUPLING FACILITY ACTIVITY report (COUPLING FACILITY STRUCTURE ACTIVITY section).

To increase the size of the ISGLOCK structure:
1. Update the CFRM policy with the new SIZE (or INITSIZE) value and activate it. Note that the new size may not be available in the existing CF; therefore, a different CF may need to be specified in the preference list, or another structure relocated to make the space available.
2. Initiate a structure rebuild via the SETXCF START,REBUILD,STRNAME=ISGLOCK command.
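For example (the policy name POLICY2 is hypothetical):

   SETXCF START,POLICY,TYPE=CFRM,POLNAME=POLICY2
   SETXCF START,REBUILD,STRNAME=ISGLOCK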

3.11.6 SYNCHRES option
The Synchronous RESERVE feature is available in OS/390 2.7 and later, provided in response to FIN APAR OW26833.

Prior to OS/390 R7, there was a timing window between when a RESERVE was requested (and the global ENQ generated) and when the RESERVE was actually performed, increasing the likelihood of a deadlock occurring between systems that shared DASD volumes. Address spaces that issued the RESERVE macro were given control back immediately; however, the physical RESERVE was not actually implemented until the next I/O to the device, which could be from any address space on the same system and could be some time later.

With OS/390 R7, an installation may choose to remove this window by specifying SYNCHRES=YES (default=NO), which causes GRS to issue a NOP I/O to the device to implement the RESERVE prior to returning control to the original address space.

Implementation of Synchronous RESERVE is either by specification of the SYNCHRES parameter in the GRSCNFxx PARMLIB member, or via the SETGRS SYNCHRES= operator command.
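For example, to enable the feature dynamically (the GRSDEF SYNCHRES(YES) keyword in GRSCNFxx is the persistent equivalent; treat the exact keyword form as an assumption to verify against your level of z/OS):

   SETGRS SYNCHRES=YES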

3.11.7 Resource Name Lists (RNLs)
Resource Name Lists (RNLs) are defined in the GRSRNLxx PARMLIB member and must be exactly the same for all systems participating in the GRS complex. The RNLs are used to alter GRS processing for the specified resources, depending upon which RNL is used:
► The SYSTEM inclusion RNL converts each matching local resource (with SYSTEM scope) to a global resource by changing the scope as specified on the ENQ or DEQ macro from SYSTEM to SYSTEMS.
► The SYSTEMS exclusion RNL converts each matching global resource (with SYSTEMS scope) to a local resource by changing the scope as specified on the ENQ, DEQ, or RESERVE macros from SYSTEMS to SYSTEM.
► The RESERVE conversion RNL suppresses the hardware reserve for each matching resource and retains the SYSTEMS scope as per the RESERVE macro default.

Each of the RNL types can contain specific entries (that is, full match), generic entries (that is, partial match), or pattern entries (that is, wildcard match, available in z/OS 1.2 and later, with toleration maintenance from OS/390 2.8 and later required for all members of the GRS complex).

When GRS encounters a request for a resource with a scope of SYSTEM or SYSTEMS, the system scans the RNLs looking for a match. Within each RNL type, specific RNL entries are searched first; if no specific match is made, then the first generic or pattern entry that matches for each RNL type is used.

The SYSTEM inclusion RNL is checked first, and where a match is found the resource has its scope changed from SYSTEM to SYSTEMS. As the scope is now SYSTEMS, the resource is then checked against the SYSTEMS exclusion RNL and may be changed back to SYSTEM if a match is found. This allows generic inclusion entries to be defined to change all resources, and specific exclusion entries to handle the exceptions if required.

Because a RESERVE has SYSTEMS scope by default, it is first checked against the SYSTEMS exclusion RNL and if matched it will be changed to scope SYSTEM and the RESERVE will be issued. No subsequent checking is performed against the RESERVE conversion RNLs in this case. If no match is found then the RESERVE conversion RNLs are processed and if a match is found the RESERVE is suppressed.
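Tying this together, a hypothetical GRSRNLxx fragment illustrating the three RNL types (the qnames shown are common examples, and the specific rname is illustrative only):

   RNLDEF RNL(INCL) TYPE(GENERIC)  QNAME(SYSDSN)
   RNLDEF RNL(EXCL) TYPE(SPECIFIC) QNAME(SYSDSN) RNAME(LOCAL.ONLY.DATASET)
   RNLDEF RNL(CON)  TYPE(GENERIC)  QNAME(SYSIGGV2)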

3.11.8 RNL design
Wherever a resource may be shared by multiple systems, it is important that a serialization mechanism and appropriate rules are put in place to ensure that data consistency and data integrity are maintained at all times.

Good RNL design is essential for sysplex availability for the following reasons:
► Integrity of resources shared within the GRS complex
As GRS allows resources to be changed from SYSTEMS to SYSTEM scope via the SYSTEMS exclusion RNL, it is critical that these resources are guaranteed not to be shared.
Rule: Ensure that all shared resources have a scope of SYSTEMS and are not modified via the SYSTEMS exclusion RNL.
► Integrity of resources shared beyond the GRS complex
In some installations, DASD resources may need to be shared with systems outside of the GRS complex. These DASD resources must be protected by hardware reserve to guarantee integrity.
Rule: Ensure that DASD resources shared beyond the GRS complex do not have the reserve converted via the RESERVE conversion RNL.
► Starvation of resources on reserved DASD volumes
Hardware reserves can cause starvation of resources on a DASD volume, as the whole volume is locked irrespective of whether shared or exclusive access is required. This prevents access to the data set being referenced, and to any other data set located on the same volume, for the duration of the reserve operation.
Recommendation: Convert hardware reserves using the RESERVE conversion RNL wherever possible. Note that consideration needs to be given to DASD resources shared beyond the GRS complex, and to performance implications in a GRS ring configuration.
► Performance in GRS ring configurations
High-frequency, short-term access data sets (for example, the RACF primary data set) may achieve better performance using DASD reserve than via reserve conversion in GRS ring configurations, due to RSA message transmission implications. Where hardware reserves are required for performance reasons, it is recommended that DASD volumes be dedicated to a particular function or purpose to minimize starvation situations.
Recommendation: Convert hardware reserves via the RESERVE conversion RNL for GRS ring configurations where performance will not be a factor.
► Performance in GRS star configurations
IBM recommends conversion of all hardware reserves in a GRS star configuration due to comparable (in some cases better) performance relative to hardware reserve, without the starvation implication. Conversion affords granularity at the data set level, discriminates between shared and exclusive requests, supports first-in-first-out access, and provides greater deadlock avoidance.
Recommendation: Convert hardware reserves via the RESERVE conversion RNL for GRS star configurations in all circumstances if using current technology.
► Deadlock avoidance
The default behavior for a reserve request is to both perform the reserve and perform the global ENQ, resulting in double serialization. The reserve is implemented on the next I/O operation, which may be some time later, leaving a window in which a deadlock could occur. The SYNCHRES=YES option may be used to minimize this window. Refer to “SYNCHRES option” on page 133.
Recommendation: Convert hardware reserves using the RESERVE conversion RNL wherever possible. Where this is not possible, add the reserve to the SYSTEMS exclusion RNL to keep the reserve but prevent the global ENQ, or use SYNCHRES=YES to minimize the deadlock window.

3.11.9 GRS monitor (ISGRUNAU)
IBM provides a GRS monitor that monitors ENQ, DEQ, and RESERVE requests. This may be used to assist in RNL planning for an initial GRS implementation, to review the existing RNLs for exposures, and to assist in GRS tuning.

The monitor executes as a Started Task or batch job (refer to the ISGRUNAU SAMPLIB member) and collects GRS-related information that can be reviewed by the related ISPF application (refer to SYS1.SBLSCLI0(ISGACLS0) for the invocation exec). To ensure that no events are missed, the monitor must run at a high dispatching priority; therefore, it is recommended that it only be used when required.

Data may be collected in:
► A data set, with a switch to a second data set when full (OUTPUT1 and OUTPUT2 allocated to data sets)
► A data set, with a switch to a data space when full (OUTPUT1 allocated to a data set and OUTPUT2 allocated to DUMMY)
► A data space only (OUTPUT1 and OUTPUT2 allocated to DUMMY)

Monitoring stops when the collection medium is full.

The monitor can be used to:
► Display current statistics regarding the performance of the GRS ring or star complex.
► Display reserves without RNLs. These should be added to the GRSRNLxx PARMLIB member, as they are currently being processed with double serialization (that is, hardware reserve and global ENQ):
– To convert the reserve, add the entry to the RESERVE conversion RNL, which will suppress the reserve and retain the SYSTEMS scope ENQ.
– To retain the reserve, add the entry to the SYSTEMS exclusion RNL, which will keep the reserve and change the scope from SYSTEMS to SYSTEM to minimize the risk of deadlock.
► Display resources with SYSTEM scope to ensure that they are not shared across the GRS complex.
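For example, the review application can be invoked from TSO/ISPF (ISPF option 6) using the exec named above:

   EXEC 'SYS1.SBLSCLI0(ISGACLS0)'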

3.11.10 RNL syntax checking
IBM provides a method for checking the syntax of the GRS RNLs prior to live implementation. However, this method has changed as of z/OS 1.4.

GRS RNL Checker (ISGRNLCK) - Valid up to z/OS 1.1
The GRS RNL Checker is supplied in SYS1.SAMPLIB(ISGRNLCK) as assembler source code, sample ASM/LKED JCL, and sample execution JCL.

Distribution of this code has been withdrawn by IBM, as it does not handle the new wildcard support introduced in z/OS 1.2.

SYS1.PARMLIB Processor ISPF Dialog
The SYS1.PARMLIB Processor ISPF Dialog, supplied in SYS1.SAMPLIB(SPPINST), is an ISPF panel-driven facility that may be used to parse PARMLIB and verify the syntax of supported members, including GRSRNLxx. It has been updated to support wildcard syntax. Please refer to z/OS: MVS Initialization and Tuning Reference, SA22-7592.

3.11.11 GRS recommendations
► Use star mode if possible. The performance improvements over ring mode are significant even with only two MVS systems in the sysplex. Ring mode with more than three nodes has significant performance issues. See S/390 Parallel Sysplex Performance, SG24-4356.
► Plan for and rehearse the use of the SET GRSRNL=xx command if changes to the RNLs are required. Proper use of this facility can remove the requirement for a planned outage of the sysplex.
► If ring mode must be used, use XCF signalling to connect the systems. GRS-only CTCs should be used only if there is no other option. Use multiple CTC paths to ensure that all systems are fully connected. This significantly reduces the recovery problems if a system goes down.
► In star mode, ensure that all XCF signalling issues in 3.10.8, “XCF recommendations” on page 128, are addressed.
► Ensure that all shared resources have a scope of SYSTEMS and are not modified via the SYSTEMS exclusion RNL.
► Ensure that DASD resources shared beyond the GRS complex do not have the reserve converted via the RESERVE conversion RNL.
► Convert hardware reserves using the RESERVE conversion RNL wherever possible. Note that consideration needs to be given to DASD resources shared beyond the GRS complex, and to performance implications in a GRS ring configuration.
► Convert hardware reserves via the RESERVE conversion RNL for GRS ring configurations where performance will not be a factor.
► Convert hardware reserves via the RESERVE conversion RNL for GRS star configurations in all circumstances if using current technology.
► Where reserves cannot be converted, add them to the SYSTEMS exclusion RNL to keep the reserve but prevent the global ENQ, or use SYNCHRES=YES to minimize the deadlock window.
► Use the GRS monitor on an ongoing basis to monitor and analyze GRS activity.

3.12 Tape sharing

Tape sharing is an important consideration in configuring an environment that uses multiple sysplex systems for availability.

Tape device resources are best utilized when drives can move automatically amongst sysplex systems to meet the workload needs at any given point in time. In planned and unplanned outage situations, the workload itself may move; therefore an automated and dynamic tape environment assists in matching the required resources to the system needs as they arise.

Within a sysplex, tape devices can be managed as either:
► Dedicated, where the tape device may be online to only one system at a time. For another system to use the tape device, the operator must issue VARY OFFLINE and VARY ONLINE commands to make the tape device available.
► Automatically switchable, where the tape device can be online to more than one system at a time. No VARY commands are required to move the tape device between the systems; however, communication between systems is required to ensure that the tape device is only used by one system at a time.

Autoswitch is a system-specific tape device attribute that may be set and removed for each tape device on each system as required by operator command:

   VARY device,AUTOSWITCH,ON|OFF

The above command requires the tape device to be offline. The autoswitch attribute can also be specified in the HCD Operating System definition for the tape device so that the attribute is automatically set during IPL.

An additional requirement is to update the GRSRNLxx PARMLIB member to add a SYSTEMS inclusion RNL:

   RNLDEF RNL(INCL) TYPE(GENERIC) QNAME(SYSZVOLS)

This definition prevents a system from holding a tape device while it waits for a mount of a volume that is in use by another system in the sysplex.
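For example, to give a (hypothetical) tape device 0580 the autoswitch attribute on a system:

   V 0580,OFFLINE
   V 0580,AUTOSWITCH,ON
   V 0580,ONLINE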

3.12.1 IEFAUTOS
Prior to z/OS 1.2, tape devices were managed as automatically switchable devices using the Parallel Sysplex Autoswitch facility, where tape device serialization was achieved through a CF structure (IEFAUTOS) and XCF services. Through this structure, sysplex systems track the availability of automatically switchable tape devices, select appropriate tape devices to satisfy requests, and control their use.

The IEFAUTOS structure size is defined in the CFRM policy and depends upon the number of sysplex systems participating in autoswitch, the number of tape devices that are to be autoswitch managed, the level of z/OS, and the level of the CF.

To simplify the task of calculating the size, IBM provides the CF Structure Sizer, a tool that estimates the storage size for each structure by asking questions based on your existing or planned configuration. The CF Structure Sizer uses the selected structure input to calculate the SIZE and INITSIZE values for the CFRM policy. Refer to: http://www.ibm.com/servers/eserver/zseries/pso

As the IEFAUTOS structure does not support structure alter, the INITSIZE value is not calculated and should not be specified.
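A hypothetical CFRM policy fragment for IEFAUTOS, with the SIZE value assumed to come from the CF Structure Sizer and the CF names illustrative:

   STRUCTURE NAME(IEFAUTOS)
             SIZE(1024)
             PREFLIST(CF01,CF02)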

3.12.2 ATS Star
Systems at z/OS 1.2 (with OW51103 and OW50900) or later automatically use ATS Star instead, which uses GRS and XCF services to maintain serialization when allocating tape devices.

Instead of using a CF structure, each system maintains information about each autoswitchable tape device in the Allocation address space (ALLOCAS).

While it is possible to use ATS Star in a GRS ring configuration, IBM recommends a GRS star configuration in order to maximize the performance of this facility.

3.12.3 Coexistence between Dedicated, IEFAUTOS, and ATS Star
Dedication of a tape device occurs when it is varied ONLINE to either a sysplex or non-sysplex system without the autoswitch attribute. When this occurs, the tape device cannot be used by any other system until it is varied OFFLINE.

Where another system has the same tape device defined as autoswitchable, it will be considered available for allocation. However, any attempt to use this tape device will result in an ‘assigned to foreign host’ message, and the tape device will be made unavailable until further notice. Allocation regularly checks each unavailable autoswitch tape device and will reclaim and make available any tape device that is no longer dedicated.

Both IEFAUTOS and ATS Star systems consider each other as foreign hosts as well and manage tape devices in much the same way, although a tape device is only considered to be ‘assigned to a foreign host’ when it is ONLINE and ALLOCATED, as distinct from being merely ONLINE as in the dedicated scenario.

3.12.4 Tape-sharing recommendations
► Where IEFAUTOS is in use, ensure an adequate structure size in the CFRM policy that caters to additional sysplex systems, additional tape devices, changes to the z/OS level, and changes to the CF level.
► Update the GRSRNLxx PARMLIB member to include an entry for SYSZVOLS in the SYSTEMS inclusion RNL to prevent tape device allocation for mount requests of tape volumes that are in use.

3.13 JES2

JES2 is one of the job entry subsystems that IBM provides to receive jobs into the operating system, schedule them for processing by z/OS, and control their output processing. In basic terms, JES2 manages jobs before and after running the program, and z/OS manages them during processing, while JES3, the other job entry subsystem supported by IBM, manages jobs throughout the entire process cycle.

JES2 maintains two different storage areas that contain job information and output queues:
► Simultaneous Peripheral Operations OnLine (SPOOL)
The JES2 SPOOL is a repository for all input jobs and all system output (SYSOUT) that JES2 manages, as well as containing JES2 control blocks. The SPOOL space consists of one or more DASD data sets that must be accessible from all member systems in a multi-access spool (MAS) configuration.
► Checkpoint
The JES2 Checkpoint contains a backup copy of the in-storage job and output queues, and this may be located either on DASD or in a CF structure. Checkpointing ensures that information about these in-storage queues is not lost during scheduled shutdown and restart, JES2 failure, or even system failure scenarios. In addition, the checkpoint is critical for normal processing, as it allows each member to communicate and be aware of the current workload. Similar to the JES2 SPOOL data sets, the checkpoint must be accessible to all members of the MAS. However, only one member may have access to the checkpoint at any one time. As each member obtains control of the checkpoint, it updates its in-storage queues with information from the other MAS members, then writes the updated in-storage queues (which include its own updates) back to the checkpoint for other members to use, then releases control.

3.13.1 JES2 SPOOL considerations
The JES2 SPOOL consists of a set of data sets that exist on DASD volumes that must be accessible to all members of the MAS. The data sets must all have the same name (default=SYS1.HASPACE) and must all reside on DASD volumes that meet a specific naming convention.

The SPOOLDEF statement specifies the parameters that define the SPOOL data sets:
► DSNAME (default=SYS1.HASPACE)
This parameter defines the name of the SPOOL data set. As a data set with the same name may exist on multiple volumes (meeting the VOLUME naming convention), it is usually not cataloged.
► VOLUME (default=SPOOL)
Specifies either a four- or five-character volume serial prefix used to identify volumes on which the SPOOL data sets may reside.
– Using a 5-character prefix, a maximum of 40 SPOOL volumes can be defined due to the limited suffixes available (that is, A-Z, 0-9, $, #, @, -).
– Using a 4-character prefix, a maximum of 253 SPOOL volumes may be defined.
In either case, the actual number of SPOOL volumes that can be defined at any one time is limited by the SPOOLNUM value.
► SPOOLNUM (default=32, maximum=253)
Specifies the actual number of SPOOL volumes (rounded up to the next multiple of 32) that can be defined at any one time. This parameter requires careful planning, as there is a performance impact: Each additional 32 volumes specified causes the JQE control block to increase by 4 bytes.

IBM recommends that each SPOOL volume be entirely devoted to JES2. Allocation of other frequently used data sets on a SPOOL volume can degrade the efficiency of JES2’s direct-access allocation algorithm. SPOOL data sets should be allocated as single-extent data sets (that is, contiguous), as JES2 will only use the first extent.

By default, JES2 uses absolute track addressing (from the start of the volume) to address spool volumes. Using this method, the SPOOL data sets must be fully contained within the first 64 K tracks of the volume, and by extension cannot be bigger than 64 K tracks.

Optionally, relative track addressing (from the start of the data set) may be specified, which allows placement of SPOOL data sets anywhere on volumes larger than 64 K tracks. However, the maximum size of 64 K tracks still applies. The RELADDR parameter (default=NEVER) that controls this behavior is specified on the SPOOLDEF statement and applies whenever a SPOOL volume is first started, or on a JES2 COLD start.
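Pulling these parameters together, a hypothetical SPOOLDEF statement (the defaults are shown explicitly; adjust the volume prefix and SPOOLNUM to your configuration):

   SPOOLDEF DSNAME=SYS1.HASPACE,
            VOLUME=SPOOL,
            SPOOLNUM=32,
            RELADDR=NEVER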

3.13.2 JES2 Checkpoint considerations
As the JES2 Checkpoint is a critical component, JES2 supports a two-checkpoint methodology that, if used, allows an installation to avoid the single point of failure that would otherwise exist.

JES2 checkpoints are defined on the CKPTDEF statement in the JES2 initialization deck using the following parameters:
► CKPTn=(DSNAME=dsn,VOLSER=vol,...) or CKPTn=(STRNAME=structure,...)
This defines CKPT1 or CKPT2 as being a DASD-based checkpoint named dsn on volume vol, or as a CF structure named structure, which must have previously been defined in the CFRM policy.
► NEWCKPTn=(DSNAME=dsn,VOLSER=vol,...) or NEWCKPTn=(STRNAME=structure,...)
This defines a checkpoint that can be used to replace the equivalent CKPTn should that checkpoint experience an I/O error or otherwise be unavailable. IBM recommends that this parameter be specified at all times for recovery purposes.
– Where NEWCKPTn refers to a DASD data set, it may either be preallocated or JES2 will allocate it when required.
– Where NEWCKPTn refers to a structure, it must be defined in the CFRM policy prior to use.
– A DASD-based checkpoint CKPTn may use a structure for NEWCKPTn and vice versa.

IBM recommends that CKPTn and NEWCKPTn refer to failure-isolated checkpoints wherever possible (that is, different DASD subsystems, different Coupling Facilities, etc.) to maximize the possibility of recovery for various failure scenarios.

The choice of whether two checkpoints are used, and how they are used, is made by specification of the MODE and DUPLEX parameters. The MODE parameter applies to the MAS as a whole, while the DUPLEX parameter applies only to the member on which it is specified, allowing individual members the option of assisting in the synchronization or not:
► MODE=DUPLEX,DUPLEX=ON
There is a defined primary checkpoint CKPT1 and a duplex checkpoint CKPT2 (that is, a backup). All read and write activity occurs against the primary checkpoint CKPT1. The duplex checkpoint CKPT2 is updated less frequently (that is, once for every 10 writes to the primary checkpoint). If the primary checkpoint CKPT1 fails, the duplex checkpoint CKPT2 is able to provide an exact or almost-current copy of the primary, depending on when it was last updated.
► MODE=DUPLEX,DUPLEX=OFF
There is a defined primary checkpoint CKPT1 only. While a backup (that is, duplex) checkpoint may exist in the MAS, this member is unaware of it and therefore does not participate in the synchronization activity.
► MODE=DUAL
There are two primary checkpoints (CKPT1 and CKPT2), and they are used in flip-flop fashion. The DUPLEX= parameter is ignored with this setting, as all members of the MAS are required to participate in DUAL checkpointing. Also, if any checkpoint (CKPTn, NEWCKPTn) resides in a CF structure, then MODE=DUAL is ignored and MODE=DUPLEX is enforced.
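Putting these parameters together, a hypothetical CKPTDEF with a CF primary, a DASD duplex, and failure-isolated DASD forwarding targets (all structure, data set, and volume names are illustrative):

   CKPTDEF CKPT1=(STRNAME=JES2CKPT1,INUSE=YES),
           CKPT2=(DSNAME=SYS1.JES2.CKPT2,VOLSER=CKPT2A,INUSE=YES),
           NEWCKPT1=(DSNAME=SYS1.JES2.NEWCKPT1,VOLSER=CKPT1B),
           NEWCKPT2=(DSNAME=SYS1.JES2.NEWCKPT2,VOLSER=CKPT2B),
           MODE=DUPLEX,DUPLEX=ON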

As each member has in-storage queues, failure of the primary checkpoint can usually be recovered from by copying this information from the member with the most recent copy of the in-storage queues to the NEWCKPTn checkpoint, if available. In a single-member MAS, failure of both the primary checkpoint and the member system will be reliant on the alternate checkpoint for recovery.

The process of activating a NEWCKPTn checkpoint is called forwarding, and it can occur either through an I/O error, or through explicit request via the JES2 Reconfiguration Dialog (for example, to move the checkpoint to a different location). Forwarding causes the specifications of CKPTn to be replaced with the equivalent values from NEWCKPTn; the new CKPTn checkpoint is then populated from the in-storage version.

When the forwarding process is complete, the old CKPTn checkpoint has been replaced by the NEWCKPTn equivalent, and the NEWCKPTn is now null. If a JES2 restart occurs, the CKPTDEF parameter in the JES2 initialization deck will no longer match the actual environment. JES2 tries to maintain a “forwarding trail” by updating the old CKPTn checkpoint with forwarding details. However, this may not be possible due to I/O errors, loss of the structure, etc. Where the forwarding trail remains intact, JES2 automatically performs forwarding at each restart until the JES2 initialization deck is updated to match the actual environment; otherwise, JES2 Checkpoint Reconfiguration is entered during restart. Therefore it is important that the JES2 initialization deck is updated accordingly whenever any forwarding is done.

Warning: Specifying RECONFIG as either a JES2 startup parameter, or in response to $HASP426 SPECIFY OPTIONS, will cause JES2 to ignore the forwarding trail and use whatever checkpoint is specified from the console. This could activate an out-of-date JES2 Checkpoint resulting in SPOOL corruption and data loss.

3.13.3 JES2 Checkpoint access
Each member of the JES2 MAS competes for exclusive access to the JES2 primary checkpoint in order to update its own in-storage tables with information from other MAS members, and to incorporate its own updates as well. The frequency and duration that each member holds the checkpoint are determined by the HOLD and DORMANCY parameters on the MASDEF statement:
► HOLD=(hold)
Specifies the minimum time interval that a member holds the JES2 Checkpoint after acquiring it. Specifying a low hold value can cause thrashing, while a high hold value can needlessly lock out other members.
► DORMANCY=(mindorm,maxdorm)
– Specifies the minimum time interval mindorm that a member must wait after releasing the checkpoint before attempting to acquire it again. A low mindorm will create excessive contention, while a high mindorm will cause excessive delays on this member.
– Specifies the maximum time interval maxdorm that a member is allowed to wait before attempting to acquire the checkpoint again. A low maxdorm will create excessive contention, while a high maxdorm will cause this member to bypass opportunities for JES2 to process work.

Using these values, JES2 Checkpoint access can be structured by the installation to be either:
► Controlled
In a controlled environment, each member is given a time-slice by specifying a mindorm parameter that prevents all the other members from attempting to acquire control of the checkpoint until every member has had its hold interval.
► Contention-driven
In a contention-driven environment, each member has the mindorm restriction removed by specifying a zero value, which allows the member to attempt to acquire the checkpoint as soon as it needs it. The maxdorm parameter is still required to be set to a reasonable value so that the member can periodically check for incoming work.

For a DASD-based CKPT1 checkpoint, a controlled environment is usually a much better arrangement. While a contention-driven environment may provide benefits in that a member attempts to acquire the checkpoint immediately when it is needed rather than having to wait for the mindorm period to expire, the disadvantage is that more powerful MAS members can dominate and lock out the smaller MAS members.

For a CF-based CKPT1 checkpoint, both the controlled environment and the contention-driven environment are suitable due to FIFO queueing of lock requests guaranteeing each member access either immediately or in the near future.

3.13.4 JES2 Checkpoint performance
Access to the JES2 Checkpoints needs to be serialized to guarantee integrity. JES2 always serializes against CKPT1 if possible, irrespective of which checkpoint is being updated.

For a DASD-based CKPT1, JES2 uses two methods of serialization:
► A hardware checkpoint lock, via RESERVE against the CKPT1 data set.
► A backup software checkpoint lock in the first record of the first track of the checkpoint data set. If the hardware lock fails, the software checkpoint lock prevents other members of the MAS from updating the checkpoint.

Because of the RESERVE, it is recommended that the CKPT1 checkpoint data set is not placed on a SPOOL volume, or other volume that has frequent activity or other RESERVE activity.

IBM recommends that the checkpoint data sets be added to the GRS SYSTEMS exclusion RNL to prevent the GRS global ENQ overhead from the RESERVE request, which could degrade performance. If the sysplex is operating in GRS star mode, however, the recommendation is to suppress the RESERVE by adding the data sets to the GRS RESERVE conversion RNL.

For CF-based CKPT1, serializing access to the checkpoint is greatly simplified because the CF uses its own lock, allowing RESERVE and software lock processing to be bypassed. Also, the CF lock enforces FIFO queueing of lock requests, guaranteeing each member access either immediately or in the near future.

An installation can use DASD checkpoints and CF checkpoints in any combination of CKPTn and NEWCKPTn, and there is no restriction on the checkpoint location.

Table 3-4 JES2 checkpoint cycle with MODE=DUPLEX: DASD CKPT1 versus CF CKPT1

MODE=DUPLEX - DASD CKPT1:
► Acquire RESERVE for CKPT1.
► Acquire software LOCK for CKPT1.
► Read CKPT1.
► Primary write to CKPT2.
► Intermediate writes to CKPT1, with every 10th write to CKPT2.
► Final write to CKPT1.
► Release software LOCK for CKPT1.
► Release RESERVE for CKPT1.

MODE=DUPLEX - CF CKPT1:
► Read latest CKPT (for example, CKPT2) and acquire CF lock in the process.
► Primary write to CKPT2.
► Intermediate writes to CKPT1, with every 10th write to CKPT2.
► Final write to CKPT1 and release CF lock in the process.

The optimum configuration is one where multiple checkpoints are in use, CKPT1 is failure isolated from CKPT2, and CKPTn is failure isolated from the equivalent NEWCKPTn.
► CKPT1=CF, CKPT2=CF, NEWCKPT1=DASD, NEWCKPT2=DASD
Placing both the primary and duplex checkpoints in CF structures gives the best performance. However, this should only be done where the Coupling Facilities are non-volatile (that is, have either UPS or battery backup power). Using DASD for NEWCKPT1 and NEWCKPT2 allows both planned and unplanned outages of the Coupling Facilities to be easily managed through JES2 Dynamic Reconfiguration.
► CKPT1=CF, CKPT2=DASD, NEWCKPT1=DASD, NEWCKPT2=DASD
Placing only the primary checkpoint in a CF structure also provides a performance increase, as the RESERVE and software lock overhead associated with acquiring the checkpoint is removed. Also, DASD I/O is minimized, as the duplex checkpoint is updated only once in every 10 writes. Again, specifying NEWCKPT1=DASD caters for planned and unplanned outages of the CF.
► CKPT1=DASD, CKPT2=CF, NEWCKPT1=DASD, NEWCKPT2=DASD
There is almost no benefit in this configuration, as all overheads associated with obtaining the RESERVE and software lock remain.

3.13.5 JES2 Checkpoint management
Managing a JES2 Checkpoint in a CF structure is different from managing a JES2 Checkpoint on DASD, due to the checkpoint information being retained in temporary memory versus the permanent storage of a DASD data set.

Planned or unplanned outages to a DASD checkpoint can only be handled through JES2 Dynamic Reconfiguration using forwarding of the CKPTn to the equivalent NEWCKPTn specification.

For planned outages to a CF, there is an additional option of performing a system-managed structure rebuild via the SETXCF START,REBUILD command to move the checkpoint structure to an alternative CF (assuming the CFRM policy allows the relocation and sufficient space is available). During the structure rebuild process, JES2 will not respond to commands or start new work, and other MAS members may issue messages about not being able to access the checkpoint.

Note that JES2 only supports system-managed structure rebuild. In an unplanned outage situation relating to a CF structure, JES2 will always use the Dynamic Reconfiguration dialog to perform the recovery via the NEWCKPTn specification.

Wherever a CF is used, the volatility of the CF needs to be taken into consideration:
► A volatile CF has a failure-dependent power supply; therefore, loss of power will cause loss of data.
► A non-volatile CF has a failure-independent power supply, either through a UPS or via a main supply with battery backup.

The CF is aware of its volatility status and can signal exploiting applications when the volatility status changes. JES2 caters for non-volatile to volatile changes of checkpoint structures through the VOLATILE parameter on the CKPTDEF statement:
► VOLATILE=(ONECKPT=WTOR | DIALOG | IGNORE)
Used where both checkpoints are in CF structures and a single checkpoint is subject to a volatility change.
► VOLATILE=(ALLCKPT=WTOR | DIALOG | IGNORE)
Used where all CF structure checkpoints are subject to a volatility change.

Valid options are to prompt the operator for direction (WTOR), automatically enter Dynamic Reconfiguration (DIALOG), or ignore the volatility status altogether (IGNORE).
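For example, to prompt the operator if a single CF checkpoint becomes volatile, but enter the Dynamic Reconfiguration dialog automatically if all of them do (a sketch only):

   CKPTDEF VOLATILE=(ONECKPT=WTOR,ALLCKPT=DIALOG)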

3.13.6 JES2 Health Monitor
The JES2 Health Monitor is a separate address space, provided in z/OS JES2 1.4 and later, that uses cross-memory services to examine data in the JES2 address space in order to detect conditions that could seriously impact JES2 performance. When a problem with JES2 is suspected, the monitor can be interrogated to determine if it has detected anything.

The monitor cannot be turned off. It starts when JES2 starts, shuts down when JES2 terminates normally, and is automatically restarted by JES2 if the monitor abnormally terminates or is shut down by operator command.

Incidents
The monitor samples the JES2 address space looking for particular conditions that may result in an incident. Each incident has a threshold associated with it that is used for categorization:
► Normal incident
Most conditions monitored can occur during the normal operation of JES2. If an incident lasts less than five seconds, it is considered part of normal processing.
► Tracked incident
Where an incident lasts longer than five seconds, it is tracked and a data element is created to describe the incident, which may be displayed via the $JD JES command.
► Alert incident
Where an incident that is being tracked crosses the sampling threshold for that incident type, it becomes an alert and a highlighted message is issued to the console. If the alert persists, the highlighted message is reissued at regular intervals.

Notices
The monitor can also display information about conditions that are not time related. These are called notices and are displayed only in response to the $JD STATUS and $JD JES commands.

Resource monitoring
As part of the sampling done by the monitor, resource utilization information is collected for the major JES2 resources and for JES2 CPU usage. The low, high, average, and current utilization of each resource is maintained and reset at the start of every hour.

Values for each resource for the current hour may be displayed via the $JD DETAILS command, and similar information for previous intervals may be displayed via the $JD HISTORY command.
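For example, when a JES2 problem is suspected, the monitor can be interrogated with the commands mentioned above:

   $JD STATUS    (notices and overall status)
   $JD JES       (tracked incidents)
   $JD DETAILS   (resource utilization for the current hour)
   $JD HISTORY   (resource utilization for previous intervals)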

3.13.7 Scheduling environment
A scheduling environment is defined in the sysplex-wide Workload Management policy and consists of a list of abstract resource names along with their required states (ON or OFF) to allow scheduling for execution.

Each system in the sysplex indicates the availability of each of the specified resources (also with ON and OFF). If all the scheduling environment resource requirements are satisfied on a system, then the job may be scheduled for execution on that system, assuming other requirements are satisfied, such as initiator availability, job class, and system affinity.

The scheduling environment is an installation-defined 16-character name that must be defined in the sysplex-wide Workload Management policy before it can be referenced, or else a JCL error will occur. Batch jobs are allocated a scheduling environment by usage of the SCHENV= keyword on the JOB statement, through an installation exit, automatically via a JOBCLASS default, or via the $T JOB JES2 command.

The resource names that make up the scheduling environment are also up to 16 characters in length, are abstract, and have no inherent meaning. In theory, each resource name represents the potential availability of a resource on a z/OS system.

The resource itself may be a physical resource (for example, a peripheral device) or an intangible concept; it does not matter, as the only important factor is each system in the sysplex flagging whether the resource is available (ON) or not available (OFF).

In Figure 3-3, job JOB1 specifies scheduling environment XYZ, which is defined with three resources that are currently set to different values on different systems. As a result, JOB1 will only be considered for scheduling for execution on SYSB, as this is the only system where the actual state of the resources meets the required state.

MVS Job JCL:

//JOB1 JOB (account),'pgmrname',CLASS=x,MSGCLASS=y,MSGLEVEL=(x,y), // NOTIFY=&SYSUID,SCHENV=XYZ

SDSF Scheduling Environment display:

SDSF SCHEDULING ENVIRONMENT DISPLAY MAS SYSTEMS      LINE 1-1 (1)
COMMAND INPUT ===>                                   SCROLL ===> PAGE
NP   SCHEDULING-ENV   Description       Systems
     XYZ              Product XYZ SE

SDSF Scheduling Environment Resource display:

SDSF RESOURCE DISPLAY MAS SYSTEMS XYZ                LINE 1-3 (3)
COMMAND INPUT ===>                                   SCROLL ===> PAGE
NP   RESOURCE          ReqState  SYSA  SYSB  SYSC
     XYZ_AVAILABLE     ON        OFF   ON    OFF
     TAPE_ONLINE       ON        ON    ON    OFF
     BUSINESS_HOURS    OFF       OFF   OFF   OFF

Figure 3-3 Scheduling environment

Note that the ON and OFF values for each system for the specified resources are not system determined. They need to be set either manually or via an automation process.
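For example, automation (or an operator) could set the state of a resource on a given system with the MODIFY WLM command (the resource name is taken from the example above):

   F WLM,RESOURCE=TAPE_ONLINE,ON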

3.13.8 WLM-managed initiators

WLM-managed initiators can be used quite effectively to ensure that important batch work continues to get service if one or more of the MVS systems in the sysplex becomes unavailable. For example, take the case where each of two systems in a sysplex has a number of initiators running important batch work and some initiators running discretionary batch work. If one of the systems is shut down, either planned or unplanned, the number of initiators running important work in the sysplex is reduced. If WLM manages the initiators, extra initiators for important work will be automatically started on the remaining MVS system, and the discretionary initiators drained. This process occurs without operator intervention over a period of time, as WLM notices that the important work is missing its goal.

3.13.9 JESLOG SPIN data sets
Long-running address spaces, especially Started Tasks that last for the duration of the IPL, can accumulate significant amounts of JESLOG data (specifically, JESMSGLG and JESYSMSG output groups), which uses valuable SPOOL track groups. This space is only recovered when the address space is recycled and the output purged.

With z/OS 1.2 and later, JES2 has the option of suppressing the JESLOG data altogether, or spinning off the JESLOG data periodically (or on demand) while the address space continues to execute. This may be used for Started Tasks, batch jobs, and TSO users if required.

To use this facility, the appropriate JES2 JOBCLASS(jobclass) needs to be modified with one of the appropriate SPIN options. This is best done in the JES2 initialization deck. However, it can also be done dynamically via the $T JOBCLASS(jobclass),JESLOG=(SPIN,option) command; subsequent address spaces that use this job class will then be JESLOG SPIN capable.

There are quite a few options that can be specified to control when the spin will occur:
► SPIN - On demand, via a $T...,SPIN command
► (SPIN,+hh:mm) - Every hh:mm interval
► (SPIN,hh:mm) - At hh:mm every day
► (SPIN,nnn) - Whenever JESMSGLG or JESYSMSG reaches nnn lines (this can be expressed as nnn, nnnK, or nnnM)
► SUPPRESS - Prevents writing to the JESLOG during execution
► NOSPIN - No JESLOG spinning allowed (default)

Because JES2 can run in a MAS with differing levels, the JESLOG=SPIN facility will only work for address spaces that are both converted and executed on MAS members at z/OS JES2 1.2 or later.
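For example, to have jobs in the (hypothetical) job class A spin their JESLOG data every two hours:

   $T JOBCLASS(A),JESLOG=(SPIN,+02:00)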

3.13.10 JES2 recommendations
► Allocate the SPOOL data sets on volumes dedicated to JES2.
► Allocate the SPOOL data sets in a single extent.
► Use duplexing of the JES2 Checkpoint. Use a CF structure as well as a DASD checkpoint.
► Ensure that the operations staff are familiar with the messages seen, and the commands required, to recover from a problem accessing the checkpoint.
► Establish automation to set or reset the Scheduling Environment values when a system is added to or leaves the sysplex.

3.14 WLM

The Workload Manager (WLM) provides sysplex-wide workload management capabilities based on installation-specified performance goals and the business importance of the workloads. The Workload Manager tries to attain the performance goals through dynamic resource distribution. WLM provides the Parallel Sysplex Cluster with the intelligence to determine where work needs to be processed and in what priority. The priority is based on the customer's business goals and is managed by sysplex technology.

All currently supported releases of z/OS require the use of WLM goal mode. This component of z/OS allows the system to classify all work in the sysplex, assigning a “service class” to each item of work.

3.14.1 Service classes
The service class is assigned by WLM based on classification rules in the WLM policy. These rules are supplied by the IBM customer. The service class has two important attributes: the goal of the work and the importance of the work. Each piece of work in the sysplex has a service class, and so MVS can ensure that the work that is the most important (has an importance value of 1) will meet its goal wherever possible. If all importance 1 work meets its goals, then lower importance work will be managed.

This management is completely automatic. WLM will ensure that work from the network is routed to the system with the most capacity. It will move initiators from a heavily loaded system to another more lightly loaded system.

3.14.2 WLM recommendations
► Ensure that all work sysplex-wide has the appropriate classification. This will ensure that the work is properly managed if systems enter or leave the sysplex.
► Ensure that you have properly configured the workload routing features in TCP/IP, CICS, and other subsystems so that WLM can direct incoming work to the most appropriate server.

3.15 UNIX System Services

UNIX System Services is IBM’s implementation of UNIX on the z/OS platform.

UNIX was first introduced in MVS/ESA SP 4.3 under the title OpenEdition®, which met some POSIX standards but offered no entry into the network and did not include support for TCP/IP.

In OS/390 1.2, OpenEdition received full UNIX95 branding, having met the requirements of the Open Systems Foundation, which allowed it to be called UNIX.

z/OS UNIX is divided into two distinct pieces:

► The kernel is part of the BCP element of z/OS. It sends instructions to the processor, schedules work, manages I/O, and tracks processes, open files, and shared memory, among other things. Other parts of the operating system or applications request the kernel’s services using assembler callable services (called syscalls). No work gets done in z/OS UNIX without involving the kernel.
► The shell is the interactive interface with the kernel; it is a command interpreter and has a programming language of its own. The shell’s work consists of programs and shell commands and utilities run by shell users. The shell is accessed by the OMVS command issued by a logged-on TSO/E user ID from a 3270 workstation, or from telnet or rlogin commands from a workstation via TCP/IP.

In UNIX, the file system is hierarchical in nature. To support this in the z/OS environment, a special file type called Hierarchical File System (HFS) was created. The HFS data set can only be accessed through z/OS UNIX and contains UNIX format directories and files.

Using the z/OS UNIX mount command (or the automount facility), an HFS data set can be attached at a mountpoint in the existing file structure, thus adding its directories and files to the hierarchical file system tree.
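As an illustration, an HFS data set can also be mounted from TSO with the MOUNT command (the data set name and mountpoint are hypothetical):

   MOUNT FILESYSTEM('OMVS.PROJ1.HFS') MOUNTPOINT('/u/proj1') TYPE(HFS) MODE(RDWR)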

3.15.1 Shared HFS
Prior to OS/390 2.9, applications and users on one system in a sysplex had read-write access only to the file system associated with that system.

In order for file systems to be shared, the owning HFS data sets had to be mounted on each of the systems in read-only mode. From a management, recovery, and availability point of view this was very limiting.

[Diagram: Systems SYS1, SYS2, and SYS3 with file systems FS1, FS2, and FS3. FS1 and FS3 are mounted read-write on SYS1 and SYS3 respectively, FS2 is mounted read-only on SYS1 and SYS3, and SYS2 has no access.]

Figure 3-4 Pre OS/390 2.9 shared HFS

In Figure 3-4, SYS1 and SYS3 each have their own dedicated read-write HFS and share FS2 in read-only mode. SYS2 does not have access to any file system.

OS/390 2.9 introduced support for Shared HFS, which allowed users on any system within the sysplex to access programs and data for both read and write access anywhere throughout the file system hierarchy within the sysplex.

With this support active, HFS data sets are still mounted on a single sysplex system in read-write mode, but the directories and files are available to the other sysplex systems by function shipping. As before OS/390 2.9, an HFS may still be mounted on multiple sysplex systems; however, this still requires read-only mode.

[Diagram: Systems SYS1, SYS2, and SYS3 with file systems FS1, FS2, and FS3. Each file system is mounted on one owning system, and the other systems access it via function shipping.]

Figure 3-5 OS/390 2.9 shared HFS

In Figure 3-5, all systems in the sysplex can access all file systems, either directly or via another sysplex system using function shipping.

This allows for greater availability of data in case of system outage. There is also greater flexibility for data placement as well as the potential for a single BPXPRMxx PARMLIB member to define all the file systems in the sysplex.

Function shipping involves using XCF Services to access the files and directories when they are not available directly. A request is passed to the system that owns the mounted file system to perform the I/O and return the results. This activity increases the I/O path length and will have a performance cost due to the additional overhead, particularly in file I/O and mount times.

Implementation of shared HFS requires:  The sysplex root HFS The sysplex root HFS is used as the sysplex-wide root. This data set must be mounted read-write and designated AUTOMOVE (refer to 3.15.2, “Automove” on page 152). Only one sysplex root is allowed for all systems participating in shared HFS. No programs or data reside in the sysplex root. It consists of directories and symbolic links only and is usually very small. The sysplex root provides access to all directories via symbolic links, which may contain the following substitutable symbols: – $SYSNAME - Resolves to the current system name – $VERSION - Resolves to the value specified for VERSION in the BPXPRMxx PARMLIB member By having substitutable strings in symbolic links, each sysplex system is able to reference different file structures using the same link. As a result, the sysplex root provides redirection to the appropriate directories for each sysplex system.

150 Achieving the Highest Levels of Parallel Sysplex Availability  The system-specific HFS The system-specific HFS is used by the system to mount system-specific data and should be mounted read-write and designated NOAUTOMOVE (refer to 3.15.2, “Automove” on page 152). One system-specific HFS is required for each system participating in shared HFS. It contains the necessary mount points for system-specific data (/etc., /var, /tmp, /dev) and the symbolic links to access sysplex-wide data (/bin, /usr, /lib, /opt, /samples). IBM recommends that the name of the system-specific data set contain the system name as one of the qualifiers. This allows a shared BPXPRMxx PARMLIB member to use the &SYSNAME symbolic.  The version HFS The version HFS is the IBM-supplied root HFS data set, which should be mounted read-only and designated AUTOMOVE (refer to 3.15.2, “Automove” on page 152). The version HFS contains system code and binaries including the /bin, /usr, /lib, /opt, and /samples directories. There is one version HFS for each set of systems participating in shared HFS and that are at the same release level (that is, using the same SYSRES volume). The mountpoint for the version HFS is dynamically created in the sysplex root HFS when the VERSION statement is used in the BPXPRMxx PARMLIB member. It is recommended that the VERSION statement use &SYSR1 symbolic, which will resolve to the SYSRES volume and tie all systems using the same SYSRES volume to the same version HFS.  OMVS CDSs Shared HFS requires CDSs of type BPXMCDS to hold the sysplex-wide mount table and information about all participating systems, and all mounted file systems in the sysplex. The parameters that need to be considered are: – MOUNTS, which specifies the number of concurrent mounts (including automounts) that can be supported by OMVS in the sysplex. – AMTRULES, which specifies the number of automount rules that can be supported by OMVS in the sysplex. This is a generic or specific entry in an automount map file. These values need to be carefully chosen, as over specification will over allocate the CDS, resulting in poor performance for reading and updating, while under specification will cause the function to fail once the limit is reached.  BPXPRMxx PARMLIB member A number of updates are required to the BPXPRMxx PARMLIB member to allow shared HFS: – SYSPLEX(YES) - Advises the system to join the SYSBPX XCF group to share HFS resources across the sysplex. The default is NO. Changing this parameter requires an IPL. VERSION(version) - A directory with the value version is dynamically created in the sysplex root HFS during IPL and is used as a mountpoint for the version HFS. The default is /. The version HFS contains system code and binaries including the /bin, /usr, /lib, /opt, and /samples directories. It is recommended that the VERSION statement use &SYSR1 symbolic, which will resolve to the SYSRES volume and tie all systems using the same SYSRES volume to the same version HFS.

  – SYSNAME is a new parameter on the ROOT and MOUNT statements that specifies the particular system on which a mount should be performed. That system then becomes the initial owner of the mounted file system. The default is the current system.
  – AUTOMOVE, NOAUTOMOVE, and UNMOUNT are new parameters on the ROOT and MOUNT statements used to decide what action is to occur when the system that owns the file system goes down. The default is AUTOMOVE. Refer to 3.15.2, “Automove” on page 152.

- Automount policy management

  Prior to shared HFS, automount policies could be system specific within a sysplex. With shared HFS, all mounted file systems are available to all systems participating in shared HFS; therefore the automount policies (/etc/auto.master and the associated map files maintained by each system) should be identical.
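To tie these statements together, the following is a minimal BPXPRMxx sketch for a shared HFS configuration, based on the statements described above. The data set names and mountpoints are hypothetical placeholders, not product defaults.

  SYSPLEX(YES)
  VERSION('&SYSR1.')
  ROOT  FILESYSTEM('OMVS.SYSPLEX.ROOT')            /* sysplex root    */
        TYPE(HFS) MODE(RDWR) AUTOMOVE
  MOUNT FILESYSTEM('OMVS.&SYSNAME..SYSTEM.HFS')    /* system-specific */
        MOUNTPOINT('/&SYSNAME.')
        TYPE(HFS) MODE(RDWR) NOAUTOMOVE
  MOUNT FILESYSTEM('OMVS.&SYSR1..VERSION.HFS')     /* version HFS     */
        MOUNTPOINT('/$VERSION')
        TYPE(HFS) MODE(READ) AUTOMOVE

Note the use of the &SYSNAME and &SYSR1 system symbols, which allow one BPXPRMxx member to be shared by all systems in the sysplex.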

3.15.2 Automove

Automove is a feature of shared HFS that determines how file system ownership is to be managed if the owning system goes down. Valid options are:
- AUTOMOVE specifies that ownership of the file system automatically moves to another system.
- NOAUTOMOVE specifies that ownership of the file system is not moved and, as a result, the file system becomes inaccessible.

Prior to z/OS 1.4, the relocation of file system ownership was random, and any of the other shared HFS participating systems could become the new owner. z/OS 1.4 provides new parameters on the MOUNT statement to give an installation more control over this process:
- AUTOMOVE(INCLUDE,sysname1,sysname2,...) may be used to specify a prioritized list of systems to which file system ownership may be moved.
- AUTOMOVE(EXCLUDE,sysname1,sysname2,...) may be used to specify a list of systems to which file system ownership may not be moved.
- UNMOUNT specifies that the file system should be unmounted, including any file systems mounted within its subtree.
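For example, the following MOUNT statement (hypothetical data set name, mountpoint, and system names) restricts ownership relocation to an ordered list of candidate systems:

  MOUNT FILESYSTEM('OMVS.APPL1.HFS')
        MOUNTPOINT('/appl1')
        TYPE(HFS) MODE(RDWR)
        AUTOMOVE(INCLUDE,SYS1,SYS2,SYS3)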

It is recommended that the sysplex root and version HFS be designated AUTOMOVE to allow another system to take ownership of these file systems when the owning system goes down.

The system-specific HFS, and other HFSs containing file systems that are managed on a system-specific basis (/dev, /tmp, /var, /etc) should be designated as NOAUTOMOVE to prevent change in ownership.

It is not advisable to mount AUTOMOVE file systems within NOAUTOMOVE file systems, as the file system concerned will not be recovered after a system failure until the failing system is restarted.

Specify NOAUTOMOVE for file systems that are mostly used by DFS™ clients. A file system can only be exported by the DFS server on the system that owns the file system. Once a file system has been exported by DFS it cannot be moved until it has been un-exported from DFS.

Unplanned outages

When a system is removed from a sysplex, there is a window of time during which any file systems it owned will become inaccessible to other systems. This window occurs while other

systems are being notified of the system’s exit from the sysplex, and before they start the cleanup for the system.

During this period, all file operations for the unowned file system will fail until a new owner is established. In addition, applications using files in unowned file systems will need to close (BPX1CLO) those files and reopen (BPX1OPN) them after the file system is recovered.

File systems that are mounted NOAUTOMOVE will become unowned when the file system owner exits the sysplex. The file system will remain unowned until the original owning system restarts, or until the unowned file system is unmounted. Note that since the file system still exists in the hierarchy, the file system mountpoint is still in use.

Planned outages

During planned outages, shared HFS file systems owned by the system to be removed from the sysplex should first be moved to an alternate owner as a system shutdown task; otherwise, the system's removal from the sysplex will be treated as an unplanned outage.

This can be done in a number of ways:
- Specifically, via the chmount shell command:
  chmount -d sysname mountpoint
- Specifically, via the SETOMVS command specifying the file system name:
  SETOMVS FILESYS,FILESYSTEM='dsn',SYSNAME=sysname
- Specifically, via the SETOMVS command specifying the file system mountpoint:
  SETOMVS FILESYS,MOUNTPOINT='mountpoint',SYSNAME=sysname
- Generically, via a system command using the automove options:
  F BPXOINIT,SHUTDOWN=FILESYS
  This system command removes file system ownership from the system in preparation for a system shutdown. File systems are moved according to the automove options. However, this command does not prevent the system from becoming a file system owner again; therefore it is eligible to gain ownership of file systems from other systems that are also being shut down, including the file systems that it may have previously owned. Usage of this command needs to be coordinated when multiple systems are being removed.
- Generically, via a system command using the automove options:
  F BPXOINIT,SHUTDOWN=FILEOWNER
  This system command is similar to SHUTDOWN=FILESYS; however, the system is disabled from becoming a file system owner until z/OS UNIX has been recycled. New mounts (where this system is the file owner) are not blocked, however. Usage of this command does not require coordination of shutdown processing with other systems in the sysplex, but new mounts during the shutdown process need to be avoided.
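As an illustration, with hypothetical system and path names, ownership of a single file system could be moved ahead of a shutdown and the result verified as follows:

  SETOMVS FILESYS,MOUNTPOINT='/appl1',SYSNAME=SYS2
  D OMVS,F

The D OMVS,F display shows the owner of each mounted file system, confirming that SYS2 now owns /appl1.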

Note that two sorts of shared HFS file systems should not be moved:
- System-specific file systems.
- File systems that are being exported by DFS. These need to be unexported first should a move be required.

3.15.3 zFS

The z/OS Distributed File Service (DFS) zSeries File System (zFS) is a z/OS UNIX file system introduced in z/OS 1.2, with retrofit support available for z/OS 1.1 and OS/390 2.10 via the PTF for APAR OW51780.

zFS provides significant performance gains in accessing files approaching 8 KB in size (and greater) that are frequently accessed and updated. The access performance of smaller files is equivalent to that of HFS.

zFS is not a replacement for HFS, as HFS is still required for the root file system and for z/OS installation.

From an application point of view, using zFS should be almost transparent compared to using an HFS, as the same APIs and commands are available and it is mounted in the same way (for compatibility mode aggregates), although there are some additional options specified via the PARM parameter on the MOUNT statement or command. From an administration point of view, however, zFS is quite different: It is allocated as a VSAM linear data set (VSAM LDS), requires formatting, and may require ongoing space management to enable growth.

zFS runs in a z/OS UNIX colony address space that is separate from the OMVS address space in which HFS runs. This requires a ZFS cataloged procedure in PROCLIB that references the zFS parameter data set via the IOEZPRM DD card.
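A minimal ZFS cataloged procedure might look like the following sketch. The parameter data set name is a hypothetical placeholder; BPXVCLNY is the colony address space initialization program.

  //ZFS      PROC REGSIZE=0M
  //ZFSGO    EXEC PGM=BPXVCLNY,REGION=&REGSIZE,TIME=1440
  //IOEZPRM  DD   DISP=SHR,DSN=SYS1.PARMLIB(IOEFSPRM)

The procedure name must match the ASNAME value on the FILESYSTYPE TYPE(ZFS) statement in the BPXPRMxx PARMLIB member.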

Aggregates

Data sets that contain zFS file systems are known as zFS file system aggregates. These data sets are allocated as VSAM LDSs and can contain one or more zFS file systems. Formatting of the aggregate is required before any file systems can exist inside it (a job sketch for allocating and formatting an aggregate follows this list). Once formatted, the zFS contains a root directory and may be mounted in the z/OS UNIX file system hierarchy. zFS aggregates may be formatted as one of two types:
- Compatibility mode aggregates
  – Can only contain one zFS file system, and therefore appear very similar to HFS data sets.
  – Can be mounted with the AUTOMOVE parameter in a shared HFS environment.
  – The VSAM LDS name is the same as the aggregate name and the file system name.
  – The file system size (that is, the quota) in a compatibility mode aggregate is set to the size of the aggregate.
  – Do not require attach of the aggregate before mount of the file system. Attach processing occurs automatically as part of the mount.
  – Support automount.
- Multi-file mode aggregates
  – Can contain multiple zFS file systems. Using this format it is possible to share space: As files are deleted in one file system, the freed space may be used by another file system in the same aggregate.
  – Not supported in a shared HFS environment.

  – The VSAM LDS name is the same as the aggregate name; however, each file system has a separate name, which is defined when the file system is created.
  – Each zFS file system has a predefined maximum size (that is, a quota).
  – Require the aggregate to be attached before mounting the file systems, and the file systems can only be mounted on the system where the attach was done.

  – Support automount only after the attach has been done; however, this is not recommended.

zFS aggregates must be attached by zFS before they can be used. Attach occurs in one of three ways:
- At IPL (or when zFS is started), by putting the aggregate name in the IOEFSPRM file
- On command, by issuing the zfsadm attach command
- Via mount of the file system, if the aggregate is a compatibility mode aggregate
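As a sketch, a compatibility mode aggregate could be allocated and formatted as follows. The data set name and space allocation are hypothetical; IOEAGFMT is the zFS format utility, and -compat requests compatibility mode formatting.

  //DEFINE   EXEC PGM=IDCAMS
  //SYSPRINT DD   SYSOUT=*
  //SYSIN    DD   *
    DEFINE CLUSTER (NAME(OMVS.APPL1.ZFS) -
           LINEAR CYL(25 5) -
           SHAREOPTIONS(3,3))
  /*
  //FORMAT   EXEC PGM=IOEAGFMT,REGION=0M,
  //         PARM='-aggregate OMVS.APPL1.ZFS -compat'
  //SYSPRINT DD   SYSOUT=*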

Metadata cache

The zFS file system also has a cache for file system metadata (which includes directory information such as owner information, permissions, and data block pointers), as well as for the data of files that are smaller than the aggregate block size. The size of the metadata cache is defined via the zFS parameters, and the setting of this value is important to performance because zFS references the file system metadata frequently.

Clones

zFS also supports file system clones, which are read-only point-in-time copies of a file system in the same aggregate. The clone operation happens relatively quickly and requires little additional space, as only the metadata is copied. Only one clone is allowed at a time, and the name of the file system clone is the same as that of the file system with a .bak suffix.

A separate mount is required for a compatibility mode aggregate to access the clone, which must be read-only and designated NOAUTOMOVE if shared HFS is active. Both the file system and its clone must be mounted on the same system in the sysplex.

As soon as the clone is complete, the original file system is immediately available for update. When the original file system is updated, new blocks are allocated for the updates but the original blocks also remain allocated and are still pointed to by the clone metadata. When a reclone is done, the original blocks that have since been updated are returned to free space.

Recovery

Every zFS aggregate contains a log file that is used to record transactions describing changes to the file system structure. The log file is created when the aggregate is formatted and defaults to 1 percent of the aggregate size, which is usually sufficient. zFS also provides a recovery mechanism that uses this log to verify or correct the structure of an aggregate. All changes to metadata are logged, and the log may be replayed if the system fails and the file system is restarted.

Sysplex considerations

zFS file systems can be shared in a sysplex environment in the same manner as HFS file systems. With shared HFS support, all file systems that are mounted by a participating system are available to all participating systems.

When all members of a sysplex are at z/OS 1.2 or later and some or all systems are running zFS:
- If a system running zFS is brought down or fails:
  – zFS compatibility mode file systems owned by the system that are defined AUTOMOVE are automoved to another system running zFS. If no other owner can be found, the file system becomes unowned.
  – zFS file systems that are defined NOAUTOMOVE become unowned.

- If zFS is brought down on a system:
  – zFS compatibility mode file systems owned by the system that are defined AUTOMOVE are automoved to another system running zFS. If no other owner can be found, the file system and file systems mounted under it are unmounted in the sysplex.
  – zFS file systems that are defined NOAUTOMOVE are unmounted in the sysplex.

File systems that are unowned are not visible in the file system hierarchy, but can be seen with the D OMVS,F operator command. To recover a file system that is mounted and unowned, the file system must be unmounted.

When not all members of the sysplex are at z/OS 1.2 or later, and some or all systems are running zFS:
- If a system running zFS is brought down or fails: The behavior is the same as when all systems are at z/OS 1.2 or later.
- If zFS is brought down on a system: All zFS file systems in the sysplex are unmounted. For this reason this configuration is not recommended. Alternatives are to restrict zFS usage to only one member of a shared HFS group, or to restrict zFS usage to images not participating in a shared HFS group.

Automove considerations

Compatibility mode file systems can be mounted AUTOMOVE to allow ownership of the file system to be automatically moved to another system in the sysplex if the current owning system fails, or if zFS is brought down on the owning system. The following restrictions apply:
- zFS must be running on the other system in the sysplex that will assume ownership.
- Both systems must have the zFS and z/OS UNIX support for aggregate awareness and for automove of zFS file systems.

Compatibility mode file systems with a clone are generally not able to use AUTOMOVE, as both the read-write file system and the read-only clone must be mounted on the same system, and AUTOMOVE does not guarantee this. However, this function will work if the owning system is at z/OS 1.4 or later and the two file systems specify AUTOMOVE and have compatible system lists.

zFS file systems in multi-file mode aggregates must be owned and mounted on the same system in a sysplex, with the NOAUTOMOVE option specified. z/OS UNIX is not aware of the relationship between zFS file systems in the same aggregate. Therefore the following tasks need to be done to move ownership of the file systems in a multi-file aggregate:
1. Manually unmount each file system in the multi-file aggregate.
2. Detach the aggregate.
3. Attach the aggregate on the other system.
4. Mount all the file systems on the other system.

Because all the file systems in a multi-file aggregate must be managed together, mounts for these file systems should specify NOAUTOMOVE. Also, file systems in multi-file mode aggregates should not be automounted, as automount always specifies AUTOMOVE, and automounted file systems might be mounted and owned by any system.

3.15.4 BRLM issues

Users can lock all or part of a file that they are accessing for read-write purposes by using the byte range lock manager (BRLM).

Prior to z/OS 1.4 (or OW52293)

In this environment, BRLM is initialized on only one system in the sysplex (that is, centralized BRLM). The first system that enters the sysplex does the initialization and becomes the owning system, and all lock requests are sent to that system.

If the system owning BRLM fails, all history of byte range locks is lost. BRLM is reinitialized on another system in the sysplex and new locking can begin once recovery has completed.

However, to maintain locking and data integrity for files open on surviving systems after the system that owned the BRLM goes down, z/OS UNIX prevents further locking or I/O on open files that were locked, causing these applications to fail. In this situation, files must be closed and reopened before usage can continue.

z/OS 1.4 (or OW52293)

While centralized BRLM is still the default, installations now have the option to implement distributed BRLM if desired.

Using this facility, BRLM is initialized on every system in the sysplex, and each BRLM is responsible for handling locking requests for files whose file systems are mounted locally on that system.

To implement distributed BRLM:
1. Ensure the appropriate support for APAR OW52293 is on each system in the sysplex.
2. Format new BPXMCDS CDSs with ITEM NAME(DISTBRLM) NUMBER(1) (see the sketch after this list).
3. Activate the new BPXMCDS CDSs via SETXCF COUPLE commands.
4. Remove the central BRLM server from the sysplex, which eliminates all locking history and allows the new distributed BRLM servers to start with a clean locking history.
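Step 2 uses the XCF CDS format utility, roughly as in the following sketch. The sysplex name, data set name, volume, and the MOUNTS and AMTRULES values are hypothetical placeholders that must match your environment.

  //FORMAT   EXEC PGM=IXCL1DSU
  //SYSPRINT DD   SYSOUT=*
  //SYSIN    DD   *
    DEFINEDS SYSPLEX(PLEX1)
             DSN(SYS1.OMVS.CDS01) VOLSER(CDSVOL)
             MAXSYSTEM(8)
             CATALOG
             DATA TYPE(BPXMCDS)
               ITEM NAME(MOUNTS) NUMBER(500)
               ITEM NAME(AMTRULES) NUMBER(50)
               ITEM NAME(DISTBRLM) NUMBER(1)
  /*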

Distributed BRLM eliminates a single point of failure by maintaining locking services on the same system as the locking applications that are using the service.

3.15.5 UNIX System Services recommendations

- Designate the sysplex root and version HFS as AUTOMOVE to allow another system to take ownership of these file systems when the owning system goes down.
- Specify NOAUTOMOVE for file systems that are mostly used by DFS clients. A file system can only be exported by the DFS server on the system that owns the file system. Once a file system has been exported by DFS, it cannot be moved until it has been unexported from DFS.
- Plan and practice the commands required to move shared HFS file systems to another system in the sysplex, so that planned outages can be done without losing access to the file systems.
- Use zFS in compatibility mode, that is, a single file system in the zFS aggregate.
- If BRLM is used, implement distributed BRLM to eliminate the single point of failure.

3.16 RACF

RACF retains information about users, resources, and access authorities in the RACF database and refers to the profiles when deciding which users should be permitted access to protected system resources.

There are a number of aspects to RACF that can affect your availability:

- RACF protects resources, so it should be set up to provide adequate protection of all critical system resources, especially system data sets or any other resource that can impact the whole system or sysplex.
- Because no resource can be accessed without first checking with RACF, it is critical that RACF is always available to respond to these queries, and also that it delivers responses in an acceptable time.
- The RACF SPECIAL and OPERATIONS user IDs are especially critical resources, together with the user IDs associated with your batch scheduler and critical started tasks. Care must be taken with who can use these user IDs, and also to ensure that they never get revoked.

For performance and sizing reasons, RACF supports the RACF database being divided into multiple physical data sets. The name of each data set is defined in the RACF data set name table (ICHRDSNT) and the division of information amongst the multiple data sets is defined in the RACF range table (ICHRRNG).

For availability purposes, RACF also supports a backup RACF database concept, which may be used if the primary RACF database fails. Note that a backup data set is required for every primary data set defined. Once the installation has created the backup RACF database, RACF is able to maintain it automatically depending upon the customization options specified.

In terms of recovery, the primary and backup RACF databases should be failure isolated. The degree to which updates to the primary RACF database are reflected on the backup RACF database is determined by flag settings in the RACF data set name table (ICHRDSNT), as follows:
- B'00......' - No updates are reflected.
- B'10......' - All updates except for statistics are reflected.
- B'11......' - All updates including statistics are reflected.

RACF also supports a multi-system environment using a shared RACF database concept where both the primary RACF database and backup RACF database are shared amongst the participating systems.

Two types of sharing are supported:
- Sharing between non-sysplex systems, and sharing between sysplex systems and non-sysplex systems
  In order to serialize access, RACF uses the hardware RESERVE facility. In this configuration it is recommended that DASD volumes be dedicated to the RACF databases to minimize contention.
- Sharing between sysplex systems
  Where all the sharing systems are in a sysplex, RACF sysplex communication may be enabled, which allows RACF to facilitate system administration by the automatic propagation of various commands to the other members of the sysplex. In addition, with sysplex communication enabled, RACF may also activate data sharing for performance benefits, which uses CF structures as a RACF database cache.

These options are also set by flag settings in the RACF data set name table (ICHRDSNT), as follows:
- B'....00..' - No sysplex communication
- B'....10..' - Sysplex communication but non-data sharing at IPL
- B'....11..' - Sysplex communication and data sharing at IPL
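Pulling these flags together, an ICHRDSNT entry might be coded as in the following sketch, assuming the bit positions described above. The database names and resident block count are hypothetical.

  ICHRDSNT CSECT
           DC    AL1(1)                Number of entries in the table
           DC    CL44'SYS1.RACFPRIM'   Primary RACF database name
           DC    CL44'SYS1.RACFBACK'   Backup RACF database name
           DC    AL1(255)              Number of resident data blocks
           DC    B'10001100'           Backup updated except statistics;
  *                                    sysplex communication and data
  *                                    sharing at IPL
           END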

Serialization of the RACF database where all members are in a sysplex depends upon whether data sharing has been enabled. Data sharing automatically causes RACF to use global ENQs instead of hardware RESERVEs.

3.16.1 RACF sysplex communication

When RACF is enabled for sysplex communication, an XCF group named IRRXCF00 is defined, and each sharing system joins the group as a member.

Certain RACF administration commands (for example, SETROPTS CLASSACT) update the inventory control block (ICB) in the RACF database, and therefore are propagated across the RACFplex automatically as each system reads the RACF database.

A number of other administration commands, however, are only local in scope and therefore must be entered on each system. With sysplex communication enabled, RACF is able to use XCF services to automatically propagate the following commands to all members of the RACFplex:
- RVARY SWITCH
- RVARY ACTIVE
- RVARY INACTIVE
- RVARY DATASHARE
- RVARY NODATASHARE
- SETROPTS RACLIST(classname) [ REFRESH ]
- SETROPTS NORACLIST(classname)
- SETROPTS GLOBAL(classname) [ REFRESH ]
- SETROPTS GENERIC(classname) REFRESH
- SETROPTS WHEN(PROGRAM) [ REFRESH ]

Because these commands need to be entered only once within the sysplex, security management is made easier and the potential for error is reduced.

When RACF is enabled for sysplex communication, the RACFplex operates in one of three modes:
- Non-data sharing mode
- Data sharing mode
- Read-only mode

Which mode is being used is dependent upon the options chosen at IPL or changed dynamically, and the availability of the CF structures.

3.16.2 RACF non-data sharing mode

In RACF non-data sharing mode, the CF is not used; therefore RACF uses the hardware RESERVE protocol to serialize access to the RACF database, as in the non-sysplex scenario.

However, as all systems are in a sysplex, the hardware RESERVE can be converted into a global ENQ by defining a SYSZRACF RESERVE conversion RNL in the GRSRNLxx PARMLIB member.
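For example, the following GRSRNLxx entry performs that conversion:

  RNLDEF RNL(CON) TYPE(GENERIC) QNAME(SYSZRACF)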

3.16.3 RACF data sharing mode

With RACF sysplex communication activated, RACF data sharing may also be enabled, either dynamically or automatically at IPL via flag bit settings in the RACF data set name table.

In this mode, RACF is able to cache the RACF database in CF structures in order to decrease the RACF data set I/O rates. To enable RACF data sharing, the following tasks need to be done:
1. Define the CF structures in the CFRM policy (a policy sketch follows this list).
   One structure is required for each physical RACF database as defined in the RACF data set name table. The following naming convention must be used:
   IRRXCF00_ayyy
   Where:
   – a is P for primary and B for backup.
   – yyy is the relative position of the data set in the data set name table (ICHRDSNT).
   There is no standard structure size for RACF. The structures operate as a cache for each RACF database, and therefore may range in size from the minimum required to allow RACF to connect to the structure (that is, 4 KB times the number of local buffers specified in ICHRDSNT, plus CF control information) through to a size that will cache the whole RACF database. Also, there is no requirement for the primary and backup structures to be of equal size. As the backup RACF database has far less activity, its structure should be sized at 20 percent of the primary database size, in line with the in-storage buffer allocation.
   It is often the case that a relatively small percentage of the profiles in the RACF database account for a relatively high percentage of the I/O. Defining structure sizes that will accommodate these profiles will give the best performance benefit due to I/O rate decreases. Providing structure space beyond what is needed for these frequently used profiles is less beneficial and may waste valuable CF storage. Therefore it is important that an initial size is calculated and then adjusted based upon the particular site requirements. By monitoring the RACF database I/O rates and gradually increasing the size of the structure, an optimum size can be determined at the point where there is no significant further reduction in the I/O rate.
   To simplify the task of calculating the size, IBM provides the CF Structure Sizer, a tool that estimates the storage size for each structure by asking questions based on your existing or planned configuration. The CF Structure Sizer uses the selected structure input to calculate the SIZE and INITSIZE values for the CFRM policy. Refer to:
   http://www.ibm.com/servers/eserver/zseries/pso
   Note that RACF does not support the structure alter function; therefore only the SIZE parameter should be specified.
2. Activate RACF data sharing, either:
   – Via IPL, with the appropriate flag bit set in the RACF data set name table.
   – Dynamically, via the RVARY DATASHARE command. This command is automatically propagated to all members of the RACFplex via sysplex communication; therefore it only needs to be entered once.
Once data sharing is activated, the RACFDS address space is started and remains alive for the life of the IPL, even if data sharing is deactivated.
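A CFRM policy definition for a single primary/backup database pair might look like the following sketch, run through the IXCMIAPU administrative data utility. The policy name, sizes, and CF names are hypothetical, and the CF definitions and any other structures in the policy are omitted. Because RACF does not support structure alter, only SIZE is specified.

  //CFRMPOL  EXEC PGM=IXCMIAPU
  //SYSPRINT DD   SYSOUT=*
  //SYSIN    DD   *
    DATA TYPE(CFRM) REPORT(YES)
    DEFINE POLICY NAME(CFRMPOL1) REPLACE(YES)
      /* CF definitions and other structures omitted */
      STRUCTURE NAME(IRRXCF00_P001)
                SIZE(8192)
                PREFLIST(CF01,CF02)
      STRUCTURE NAME(IRRXCF00_B001)
                SIZE(2048)
                PREFLIST(CF02,CF01)
  /*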

Note that where data sharing is activated, RACF automatically uses global ENQs to serialize access instead of hardware RESERVE; therefore the SYSZRACF RESERVE conversion RNL in the GRSRNLxx PARMLIB member is no longer required.

3.16.4 RACF read-only mode

RACF read-only mode is an emergency mode that is used when data sharing has been specified but a system is unable to access or use the CF structures.

Serialization in read-only mode is compatible with serialization in data sharing mode; therefore other systems in the sysplex that are not experiencing the error can continue to use the CF in data sharing mode while the problem is being rectified.

RACF prevents any updates from being made by a system in read-only mode, as updates cannot be reflected in the cache structures by the system that has lost access to the CF. However, read activity against the RACF database is still allowed.

Read-only mode remains until either the problem relating to the CF structures is resolved or data sharing is inactivated throughout the RACFplex via a RVARY NODATASHARE command.

Through ENF signaling, RACF is automatically notified of the Coupling Facility's availability once access has been reestablished, and will automatically connect to the CF structures and enter data sharing mode.

3.16.5 RACF recovery procedures

Errors can occur with both the RACF database and the CF structures, for which recovery procedures should be defined for availability purposes.

RACF database failure

RACF database errors can be logical or physical, and different recovery methods are available in each case. In either case, regular backups of the RACF database should be taken in case the error cannot be rectified and the only option is to restore.

Simple logical errors may be fixable using standard RACF commands. RACF supports the following utilities that may assist in determining the location of the error:
- IRRUT200 - RACF Database Verification Program
- IRRDBU00 - RACF Database Unload Utility

Where the error cannot be rectified through normal commands, RACF supports the BLKUPD utility, which may be used to edit the RACF database directly. This utility can be used to examine and modify any block in a RACF database and should be used with extreme caution, as improper use could cause severe and unrecoverable damage.

Physical RACF database errors may be resolved by removing the failed data set. RACF supports the following RVARY commands to assist in RACF database management:
- RVARY SWITCH
  Inactivates the current primary database and switches the current backup database to be the new primary. Requires entry of the switch password at the system console, as defined by the RACF administrator via SETROPTS RVARYPW(SWITCH(switch_password)).
- RVARY INACTIVE DATASET(dsn)
  Inactivates the database specified. Requires entry of the status password at the system console, as defined by the RACF administrator via SETROPTS RVARYPW(STATUS(status_password)).
- RVARY ACTIVE DATASET(dsn)

  Activates the database specified. Requires entry of the status password at the system console, as defined by the RACF administrator via SETROPTS RVARYPW(STATUS(status_password)).

The RVARY commands may be entered either as TSO commands or via the system console through the RACF address space. Management of the SWITCH and STATUS passwords should be part of an installation’s security management procedures.

RACF data sharing failure

Most data sharing errors relate to access of the CF structures, either through CF failure or link failure.

The RACF management of each of these errors is dependent upon whether data sharing has previously been established or not.

In an IPL situation where data sharing has been requested, if RACF cannot connect to the CF structure then the system is initialized in read-only mode until either the problem is resolved or data sharing is disabled. Refer to “RACF read-only mode” on page 161.

Failure to connect to a CF structure during IPL can be caused by:
- CF failure
- CF link failure
- Structure not defined in the CFRM policy
- Structure too small to allow RACF to connect

Where a system has previously established data sharing mode and a failure occurs, two recovery options are possible:
- If REBUILDPERCENT was specified in the CFRM policy for the RACF structure such that the percentage of system weight (as specified in the SFM policy) losing connectivity has exceeded the limit, z/OS initiates a rebuild for that structure. A structure rebuild can be initiated by:
  – CF failure
  – CF link failure
  – CF structure failure
  – A SETXCF START,REBUILD command
- If the conditions for REBUILDPERCENT are not satisfied, or if the rebuild fails, the system enters read-only mode until either the problem is resolved or data sharing is disabled. Refer to “RACF read-only mode” on page 161.

Sysplex failure

Total sysplex failure may require a system to be brought up in XCF-local mode in order to perform sysplex recovery.

If the RACF data set name table (ICHRDSNT) specifies that data sharing is required and the system is started in XCF-local mode, RACF will enter failsoft mode for the duration of the IPL.

Failsoft mode is extremely disruptive and degrades system performance and system security. In this mode RACF will not make decisions to grant or deny access. Instead each data set access is referred to the system operator, and for general resource classes a RC(4) is returned to the calling program, which can then decide whether access should be permitted or not.

Failsoft mode can also occur for other reasons (for example, failure of the RACF primary database, inactivation of the RACF primary database, RACF failure at initialization, etc.), and

therefore needs to be incorporated into availability considerations. Two methods are recommended to cater for failsoft mode:
- Emergency user ID in the TSO User Attributes Data Set (UADS)
  This involves definition of an emergency user ID in the SYS1.UADS data set that has all the information required to build a TSO session without having to refer to RACF. This must be done in advance of failsoft mode being invoked, and management of this user ID (including password management) should be incorporated into the installation's security procedures.
- Emergency RACF data set name table (ICHRDSNT)
  An alternate ICHRDSNT load module without the data sharing specification can be placed in a library and activated at IPL via emergency LNKLSTxx or PROGxx PARMLIB members, specified at IPL through LNK= and PROG= respectively.

3.16.6 PKI Services

PKI Services allows an installation to establish a Public Key Infrastructure (PKI) configuration and serve as a Certificate Authority (CA) for internal and external users, issuing and administering digital certificates in accordance with organization policy.

Users can use a PKI Services application to request and obtain digital certificates through their Web browsers, and the PKI administrators can approve, modify, or reject these requests through a Web browser interface as well. The PKI Services interface can be highly customized such that automatic approval for digital certificate requests can be granted to some users while other users may need to provide additional authentication such as RACF user IDs.

Digital certificates can be issued for Web browsers, servers, and other purposes such as virtual private network (VPN) devices, smart cards, and secure e-mail.

The CA acts as a trusted third party to ensure that users who engage in e-business are able to trust each other. The CA vouches for the identity of each party through the certificates it issues, and the certificate contains the public key of the certificate owner, which allows other users to encrypt communications.

All certificates issued by the CA are digitally signed using the CA private key. Any attempt to alter the certificate invalidates this signature and renders the certificate unusable.

PKI Services and related components

A typical PKI Services implementation requires the following components:
- PKI Services, installed in a z/OS HFS data set, at /usr/lpp/pkiserv by default. The PKI Services product consists of a daemon and the callable services API.
- An HTTP Web server to encrypt messages, authenticate requests, and transfer certificates to intended recipients.
- A Lightweight Directory Access Protocol (LDAP) server to manage the directory that maintains information about the valid and revoked digital certificates that have been issued.
- RACF, to control who can use the callable services API, as well as to create the certificate, key ring, and private key. RACF may also be used to store the CA private key if ICSF is not available.
- Integrated Cryptographic Service Facility (ICSF), an optional product that may be used to securely store the CA private key. ICSF is a software element of z/OS and OS/390 that

provides an API to the cryptographic hardware (that is, the CMOS Cryptographic Coprocessor) and is the preferred mechanism for securely storing the CA private key.
- Web administration application.
- Web user application.

Parallel Sysplex support

With z/OS 1.4 and later, PKI Services has been modified to exploit Parallel Sysplex environments by being able to start multiple independent instances of the PKI Services daemon on different systems in the sysplex.

Using this implementation, PKI Services can be configured to run in parallel on multiple sysplex systems using one common data store that provides both recoverability and availability features in a Parallel Sysplex environment.

To be able to use this feature:
- All systems in the sysplex that run PKI Services must be at z/OS 1.4 or later.
- All instances of PKI Services must share the same VSAM data sets using VSAM record level sharing (RLS), which requires a CF with cache and lock structures.

3.16.7 RACF recommendations

- Establish a backup RACF database that is failure isolated from the primary.
- Since RACF uses RESERVE/RELEASE to serialize access to a shared RACF database, ensure that the DASD volumes used for the RACF database are dedicated. If the volumes are not dedicated, take a careful look at the GRS RNLs.
- If using sysplex data sharing, spend time calculating the ideal size of the structures, to minimize the I/O to the CF structures.
- Plan and practice the use of the RACF commands required to recover from a failure in the RACF database.
- Install an emergency RACF data set name table that can be activated at IPL via emergency LNKLSTxx or PROGxx PARMLIB members specified at IPL.
- Ideally there should be only one RACF database for the whole sysplex. If there is more than one RACF database, DASD should only be accessible to systems in one RACFplex. If volumes are accessible to more than one RACFplex, it is possible, even likely, that the RACF profiles protecting those resources will not be identical in both RACFplexes. Sysplex DASD should not be accessible to any system outside the sysplex, for the same reason.
- Make sure operators know never to reply YES to the RACF message asking whether it is OK to revoke a SPECIAL user ID. Additionally, as soon as they see this message, they should immediately inform someone in a position of authority.
- Have a separate RACF profile for each APF library and for other critical system resources.
- Use RACF groups for all access to all system resources. This makes it much easier to add new system programmers and remove old ones.
- Perform regular independent penetration tests.
- Use the same RACF exits on each system, to ensure that resources are protected uniformly, regardless of which system a given job runs on.
- Ensure there are no single points of failure for the primary and backup RACF databases.


- If you ever need to recover an old version of your RACF database, you must be able to identify all the changes that took place since that backup was taken; the RACF SMF records can help here. Ensure that RACF SMF records are turned on and collected.
- Ensure that the critical RACF messages (lost database, going to read-only mode, etc.) are handled by an alert or by automation.
- Use NOPASSWORD for critical STCs so they cannot get revoked.
- Specify a password for the RVARY SWITCH command and ensure it is available somewhere if required.
- REVOKE IBMUSER, but do not delete it. It is not possible using normal RACF commands to delete IBMUSER, nor should you try to.

3.17 DFSMShsm

DFSMShsm is a licensed program that automatically performs space management and availability management in a storage device hierarchy.

Space management

Space management ensures that space is available for extending existing data sets or allocating new ones by:
- Expiration
  All DASD data sets support an expiration date. For non-SMS-managed data sets, the expiration date may be specified when the data set is allocated. For SMS-managed data sets, the expiration date is controlled by the SMS management class associated with the data set. Using expiration, space is made available on user volumes by deletion of data sets that have reached their expiry date.
- Migration
  Data sets that have not been used for a specified period of time may be migrated, which involves relocation to a different volume (either DASD or tape) in a compressed (and optionally compacted) form that saves space on the volume. Two levels of migration are supported:
  – Migration Level 1 (ML1), which is always DASD volumes
  – Migration Level 2 (ML2), which may be DASD or tape volumes
  Migrated data sets cannot be used directly due to the compressed format. As a result, a recall operation is required whenever a migrated data set is referenced, which reinstates the data set. Using migration, space is made available by moving data sets that have not been used recently to a different location, and by freeing overallocated space during migration and recall processes.

Availability Management

Availability Management ensures that backup copies of data sets are available for recovery purposes should the working copies be lost or corrupted. This occurs through two different backup mechanisms:
- Full volume backups

  DFSMShsm is able to take dumps of complete DASD volumes, irrespective of whether data has changed.
- Incremental backups
  Where a data set has changed since the last backup, a new backup is taken to ensure that the changes are not lost.

Should it be required, users may request DFSMShsm to recover a data set by retrieving a backup copy from either the full volume or incremental backups. By default, the most recent backup is retrieved; however, users may be able to select older backups, depending upon installation options.

Another component of Availability Management is the secondary host promotion function which may be used when a DFSMShsm host processor experiences a failure in a multi-system environment. Refer to 3.17.2, “Hot standby (Secondary Host promotion)” on page 167.

3.17.1 Common Recall Queue

From z/OS 1.3, DFSMShsm supports an HSMplex-wide common recall queue (CRQ), which balances the recall workload across the HSMplex by allowing any DFSMShsm host to process any recall request in the HSMplex, rather than just its own.

Benefits of the Common Recall Queue are:
- Workload balancing, by allowing all DFSMShsm hosts to participate in the recall workload. This optimizes resources and allows individual hosts to withdraw in either planned or unplanned scenarios with minimal impact. Alternatively, recall processing can be directed from resource-constrained systems to other systems to provide additional resources.
- Priority selection, which allows higher priority recall requests to be processed across the HSMplex rather than just within the scope of a single host.
- Minimized tape mounts for concurrent requests by different DFSMShsm hosts, and minimized contention between differing DFSMShsm tasks that require the same tape volumes.
- Support for multiple DFSMShsm hosts on the same system.
- Bypassing of resource restrictions, allowing recalls to occur for a system that may not have access to tape devices, for example.

This CRQ is implemented through the use of a CF list structure defined in the CFRM policy with the name SYSARC_basename_RCL.

A minimum CFLEVEL of 8 is required. Also, the CRQ is a persistent structure with nonpersistent connections, which means that the structure remains allocated even if all connections have been deleted.

The size of the CRQ list structure is dependent on the maximum number of concurrent recalls that may occur, and the percentage of recalls that require a unique ML2 tape. As this structure supports structure alter, it is recommended that INITSIZE value is calculated for normal recall activity and SIZE is calculated to cater for extreme situations.

To simplify the task of calculating the size, IBM provides the CF Structure Sizer, a tool that estimates the storage size for each structure by asking questions based on your existing or planned configuration. The CF Structure Sizer uses the selected structure input to calculate the SIZE and INITSIZE values for the CFRM policy. Refer to: http://www.ibm.com/servers/eserver/zseries/pso
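Because the CRQ structure supports alter, both INITSIZE and SIZE can be specified in the CFRM policy, as in this sketch (hypothetical base name, sizes, and CF names):

  STRUCTURE NAME(SYSARC_PLEX1_RCL)
            INITSIZE(2048)
            SIZE(4096)
            PREFLIST(CF01,CF02)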

The CRQ base name is provided to DFSMShsm via the SETSYS COMMONQUEUE(RECALL(CONNECT(basename))) command, which may be specified:
- As an initialization statement in ARCCMDxx, in which case DFSMShsm connects to the CRQ structure during startup processing
- As a command after startup, in which case DFSMShsm connects to the CRQ structure and moves any requests from its own local queue to the CRQ

A host processes a request selected from the CRQ just like a request submitted on its own system, with two exceptions. First, messages that are routed to TSO are sent to the original host to issue to the appropriate user. Second, when the processing host completes a recall request, it notifies the original host, which then posts any WAIT requests as complete.

The processing host issues console and operator messages, and messages directed to the DFSMShsm logs go to the logs for the processing host.

Finally, a recall request can be canceled up until the point that it has been selected for processing by a host in the HSMplex; however, the cancel request must be submitted on the same host that originated the recall request that is to be cancelled.

3.17.2 Hot standby (Secondary Host promotion)

DFSMShsm categorizes hosts based upon the functions that they perform:
- A DFSMShsm primary host is the only host in an HSMplex that can perform primary-level functions.
- A DFSMShsm SSM host is a host that performs secondary space management (SSM) functions. Usually SSM runs on a single host, which may be the primary host or a specific SSM host. However, it may be defined on multiple hosts, with either different or identical windows and cycles defined.
- A DFSMShsm secondary host is a host that is not assigned to perform primary host or SSM responsibilities.

Secondary host promotion is a DFSMShsm 1.5 and later function that uses XCF services to support a secondary host:
- Taking over the unique functions of a failed primary host
- Taking over the SSM functions from a failed SSM host
Note that an existing SSM host cannot take over SSM functions from a failed SSM host, because the failed host's SSM parameters may conflict with the SSM parameters already in use.

Each host in the HSMplex specifies the level to which it may participate in secondary host promotion via: SETSYS PROMOTE PRIMARYHOST(YES | NO) SSM(YES | NO)

Specifying YES indicates that the host concerned is able to take over the corresponding function.

When a primary or SSM host becomes disabled, all DFSMShsm hosts in the HSMplex are notified through XCF services. Any host that is eligible via the SETSYS PROMOTE setting to perform the functions of the failed host will attempt to take over the failed host. The first host that successfully takes over for the failed host becomes the promoted host. If a promoted host also fails, the process is repeated. Any remaining host that is eligible for promotion for the failed function will take over.

There is currently no means of assigning the order in which hosts take over the functions of a failed host.

3.17.3 Use of record level sharing (RLS) for CDSs

DFSMShsm control data sets are system-type data sets that DFSMShsm uses to keep track of all DFSMShsm-owned data. They consist of migration (MCDS), backup (BCDS), and offline (OCDS) control data sets.

In a multi-system environment, DFSMShsm needs to serialize access to the CDSs in order to maintain integrity. The serialization method used in the HSMplex is specified via initialization options (a startup sketch follows this list):
- CDSSHR=YES and CDSQ=YES cause the CDSs to be serialized with an exclusive GRS global ENQ. The major resource name (that is, the qname) is ARCENQG, and the minor resource name (that is, the rname) is ARCxCDS (where x = B, M, or O, depending on which CDS is being accessed).
- CDSSHR=YES and CDSR=YES cause the CDSs to be serialized via hardware RESERVE, with a qname of ARCGPA and an rname of ARCxCDS. This can be specified in combination with the CDSQ option.
- CDSSHR=RLS causes the CDSs to be accessed in VSAM record level sharing (RLS) mode. VSAM RLS uses the CF to perform data set level locking, record level locking, and data caching. VSAM RLS uses the conditional write and cross-invalidate functions of the CF cache structure, thereby avoiding the need for control interval (CI) locking.
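These keywords are passed to DFSMShsm in the PARM field of its startup procedure, roughly as in the following sketch (the ARCCMDxx member suffix is a hypothetical placeholder):

  //DFHSM    EXEC PGM=ARCCTL,REGION=0M,TIME=1440,
  //         PARM=('CMD=00','CDSSHR=RLS')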

All hosts in the HSMplex must use the same serialization technique. DFSMShsm will fail at startup if the selected serialization mode of a starting DFSMShsm is incompatible with the CDS serialization mode of another DFSMShsm host that is already actively using the same CDSs.

VSAM RLS

DFSMShsm supports the VSAM KSDS extended addressability capability that uses record level sharing (RLS).

By accessing the CDSs in VSAM RLS mode, DFSMShsm is able to:
- Exploit CF locking to enforce serialization of access.
- Use CF structure rebuild features to aid in CDS recovery and availability during system failure scenarios.
- Reduce contention between the DFSMShsm hosts, as the locking occurs at the record level, as opposed to the control interval, data set, or volume level as per the other serialization techniques.
- Use VSAM extended addressability to allow the CDSs to grow beyond existing maximum sizes. Currently, the BCDS and MCDS are limited to a maximum size of 16 gigabytes (that is, via multicluster support consisting of up to 4 x 4 gigabyte data sets), and the OCDS is limited to a maximum size of 4 gigabytes (multicluster is not supported).

3.17.4 DFSMShsm recommendations

- Set up HSM to ensure that good backups of application files are taken.
- Rehearse the recovery scenarios so that recovery of lost data can be automated, or at least performed quickly.


- Configure multiple HSM servers on separate MVS images so that HSM can continue if an MVS image is taken down.

3.18 Catalog

An ICF catalog consists of two separate kinds of data sets:
- A basic catalog structure (BCS)
  The BCS is a VSAM key-sequenced data set. It uses the data set name of entries to store and retrieve data set information. For VSAM data sets, the BCS contains volume, security, ownership, and association information. For non-VSAM data sets, the BCS contains volume, ownership, and association information.
- A VSAM volume data set (VVDS)
  The VVDS is a VSAM entry-sequenced data set. A VVDS resides on every volume that contains a VSAM or SMS-managed data set cataloged in an ICF catalog. It contains the data set characteristics, extent information, and the volume-related information of the VSAM data sets cataloged in the BCS. If you are using the Storage Management Subsystem (SMS), the VVDS also contains data set characteristics and volume-related information for the non-VSAM SMS-managed data sets on the volume.
  VVDS records for VSAM data sets are called VSAM volume records (VVRs). VVDS records for SMS-managed non-VSAM data sets are called non-VSAM volume records (NVRs). Where a non-VSAM data set spans multiple volumes, the associated NVR is located in the VVDS of the data set's first volume.

3.18.1 VVDS mode catalog sharing

A shared catalog is a BCS that is eligible to be used by more than one system.

Communication of information between the sharing systems is via the VVR record for the catalog, in the VVDS on the volume where the BCS is located. This record is used to:
- Ensure the consistency of the catalog records that are cached on any sharing system.
- Update the BCS control block structure in those cases where a sharing system has extended the BCS beyond the current high-used value or to a new extent.
- Invalidate BCS data and index buffers when they have been updated from a sharing system.

VVDS mode is the default mode of sharing and requires the BCS to be defined with SHAREOPTIONS(3,4) and reside on a volume that is defined as shared in the IODF of all sharing systems. Catalog corruption will occur if either of these prerequisites is not met.

Accessing the special record in the VVDS requires additional I/O, which may become significant in shared environments and impact performance.

Serialization of access to the catalogs is via hardware RESERVE:
- The SYSIGGV2 reserve serializes access to the BCS.
- The SYSVVDS reserve serializes access to the VVDS.

Where there are multiple systems sharing multiple catalogs, and the catalogs are on volumes that also contain data sets, it is possible for deadlocks to occur.

Figure 3-6 on page 170 shows a deadlock situation:

- SYS1 has VOLSER1 reserved (via the SYSIGGV2 reserve for CATALOG A) and is trying to obtain a reserve for VOLSER2 (either SYSVTOC or SYSVVDS for DATASET A).
- SYS2 has VOLSER2 reserved (via the SYSIGGV2 reserve for CATALOG B) and is trying to obtain a reserve for VOLSER1 (either SYSVTOC or SYSVVDS for DATASET B).

Prevention of such deadlocks can be achieved by making changes to the GRSRNLxx PARMLIB members (see the sketch following Figure 3-6):
- Add an entry to the RESERVE conversion RNL to always convert the SYSIGGV2 reserve to a SYSTEMS ENQ. This prevents the reserve on the volume that contains the BCS.
- Add an entry to the SYSTEMS exclusion RNL for SYSVVDS to prevent double serialization, by retaining the reserve but suppressing the SYSTEMS ENQ. There is no requirement to convert this reserve.

Figure 3-6 Shared catalog deadlock
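The corresponding GRSRNLxx entries might look like the following sketch:

  RNLDEF RNL(CON)  TYPE(GENERIC) QNAME(SYSIGGV2)  /* convert BCS reserve  */
  RNLDEF RNL(EXCL) TYPE(GENERIC) QNAME(SYSVVDS)   /* keep VVDS reserve,   */
                                                  /* suppress SYSTEMS ENQ */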

3.18.2 Enhanced catalog sharing (ECS)

Enhanced Catalog Sharing (ECS) has been designed to reduce the overhead associated with shared catalogs. ECS eliminates the I/O required to the VVR record for each eligible catalog by maintaining catalog change information in a CF cache structure, resulting in performance for shared catalogs that is similar to the performance of non-shared catalogs.

For a catalog to use ECS sharing, the following conditions need to be met:
- The catalog must be defined with SHAREOPTIONS(3,4).
- The catalog must be located on a shared volume, as defined in the IODF.
- The ECS cache structure (SYSIGGCAS_ECS) must be defined in the CFRM policy.

  The size of the ECS cache structure is dependent on the maximum number of catalogs that are to be ECS-managed. As this structure supports Structure Alter, it is recommended that the INITSIZE value be calculated for normal catalog requirements and SIZE be calculated to cater for future expansion. To simplify the task of calculating the size, IBM provides the CF Structure Sizer, a tool that estimates the storage size for each structure by asking questions based on your existing or planned configuration. The CF Structure Sizer uses the selected structure input to calculate the SIZE and INITSIZE values for the CFRM policy. Refer to:
  http://www.ibm.com/servers/eserver/zseries/pso
- A successful connection must have been made to the ECS cache structure.
  The Catalog Address Space (CAS) automatically attempts to connect to the ECS cache structure as soon as possible during its initialization. If a connection was not established at initialization, or if the connection was lost, it can be added dynamically via the F CATALOG,ECSHR(CONNECT) command and removed via the F CATALOG,ECSHR(DISCONNECT) command.
- The catalog must have the ECSHARING attribute set.
  The ECSHARING attribute of a catalog can be altered at any time via the IDCAMS ALTER statement, as shown in Example 3-2.

Example 3-2 IDCAMS Alter statement to change ECSHARING attribute

//ECSACT   EXEC PGM=IDCAMS
//SYSPRINT DD   SYSOUT=A
//SYSIN    DD   *
  ALTER ucatname ECSHARING
/*

If you remove the attribute while the catalog is using ECS mode, the catalog will be converted back to VVDS mode on all systems that are sharing it.
- The catalog must have been added to ECS.
  A catalog may be manually added to ECS via the F CATALOG,ECSHR(ADD,catalog) command if all of the previous conditions have been met. Alternatively, catalogs may be added automatically upon next reference via the F CATALOG,ECSHR(AUTOADD) command. This command will not add catalogs that have been specifically removed, or whose last access was from a non-ECS system.
- Once you decide that you want to continue using ECS in production, specify a “Y” in column 72 of the SYSCAT keyword in your LOADxx member. This ensures that ECS will be enabled automatically should there be a sysplex IPL.

3.18.3 Catalog integrity

Access Method Services (AMS) provides the EXAMINE command to check the structure of a BCS, and the DIAGNOSE command to check the content of a BCS or VVDS. Access to use these commands is controlled by STGADMIN.IGG.** FACILITY class profiles.

EXAMINE

The AMS EXAMINE command analyzes and reports on the structural integrity of the index and data components of a key-sequenced data set (KSDS) cluster and of a variable-length relative record data set (VRRDS) cluster. In addition, EXAMINE can analyze and report on the structural integrity of the basic catalog structure (BCS) of an ICF catalog.

To test the structure of a BCS, perform the following steps:

1. Use the AMS VERIFY command to verify that the VSAM information in the catalog is current.
2. Execute the AMS ALTER command to lock the catalog and prevent updates while the structure is being inspected.
3. Execute the AMS EXAMINE command to test the index and data components of the BCS for structural integrity.
4. If the BCS has structural errors, recover the catalog from the most recent structurally sound backup copy.
5. Unlock the catalog.
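A sketch of steps 1 through 3 and step 5 combined in one job follows. The catalog name is hypothetical, and the ALTER ... LOCK command requires authority to the IGG.CATLOCK FACILITY class profile.

  //EXAMBCS  EXEC PGM=IDCAMS
  //SYSPRINT DD   SYSOUT=*
  //SYSIN    DD   *
    VERIFY DATASET(UCAT.APPL1)
    ALTER UCAT.APPL1 LOCK
    EXAMINE NAME(UCAT.APPL1) INDEXTEST DATATEST
    ALTER UCAT.APPL1 UNLOCK
  /*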

DIAGNOSE

Error situations involving catalog recovery or full-volume restore may cause catalog entries to become unsynchronized, such that information about the attributes and characteristics of a data set differs between the BCS, VVDS, and VTOC.

The AMS DIAGNOSE command can check for synchronization errors between a BCS and VVDS, or between a VVDS, BCS, and VTOC. DIAGNOSE also checks for invalid data and invalid relationships between dependent entries. Note that this requires READ access to the facility class STGADMIN.IDC.DIAGNOSE.CATALOG profile.

To analyze a BCS entry:
1. Determine related VVDSs via LISTCAT LVL(SYS1.VVDS) CAT(catalog).
2. Use DIAGNOSE ICFCATALOG and specify the required VVDSs in the COMPAREDD parameter.

To analyze a VVDS entry:
1. Determine related BCSs via PRINT COUNT(1) to print the first record of the VVDS.
2. Use DIAGNOSE VVDS and specify the required BCSs in the COMPAREDD or COMPAREDS parameters.

A sketch of the BCS case follows.
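As an illustration only, the following job runs DIAGNOSE against the hypothetical catalog UCAT.PROD, comparing it with the VVDS on a hypothetical volume PROD01; COMPAREDS is used here because it accepts data set names directly:

//DIAGCAT  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DIAGNOSE ICFCATALOG -
           INDATASET(UCAT.PROD) -
           COMPAREDS(SYS1.VVDS.VPROD01)
/*

The user running the job needs READ access to the STGADMIN.IDC.DIAGNOSE.CATALOG FACILITY class profile, as noted above.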

IBM strongly recommends that the DIAGNOSE and EXAMINE functions be performed regularly for all catalogs in an installation. These commands can be combined with regular backup functions to ensure that the backups taken of catalogs are structurally valid, and to help identify as early as possible when a catalog may have been damaged.

3.18.4 Catalog performance
The simplest method of improving catalog performance is to use cache to maintain catalog records within main storage or a data space. Using cache reduces the I/O required to read from catalogs on DASD.

Two kinds of cache areas are available exclusively for catalogs; however, an individual catalog cannot use both types simultaneously:

- In-storage cache (ISC)

  This is the default caching mechanism. Each catalog will use ISC unless it is specified to use CDSC, or unless ISC caching has been explicitly stopped for that catalog. The ISC cache resides in main storage within the catalog address space (CAS). Space usage depends upon the type of catalog:

  – Master catalogs have unlimited access to ISC cache space, and all eligible records in the master catalog are cached as they are read. To prevent the ISC from using an excessive amount of storage, you should keep the number of entries in the master catalog to a minimum (that is, user catalogs, aliases, and system-related data sets only).
  – User catalogs are each allotted a fixed amount of ISC cache space, which is managed on a least-recently-used record basis once full.

  ISC caching for a catalog is started automatically unless the catalog is CDSC cached. Caching may be stopped via the F CATALOG,NOISC(catalog) command and restarted via the F CATALOG,ISC(catalog) command if required.

- Catalog data space cache (CDSC)

  The CDSC cache resides in a data space that is defined via the COFVLFxx PARMLIB member. This uses the virtual lookaside facility (VLF) and requires the VLF address space to be active. Unlike ISC, catalogs cached in CDSC are not limited to a specific amount of storage each. The total amount of CDSC cache available (defined by the MAXVIRT parameter) is shared by all eligible catalogs and is managed on a least-recently-used record basis once full. CDSC caching may be started or stopped by adjusting the COFVLFxx PARMLIB member and recycling the VLF address space. Also, the F CATALOG,NOVLF(catalog) and F CATALOG,VLF(catalog) commands are available to allow dynamic stop and restart if required. A sketch of a COFVLFxx member follows this list.
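As an illustration, a minimal COFVLFxx fragment that enables CDSC for two hypothetical user catalogs might look as follows. IGGCAS is the VLF class used by catalog management; MAXVIRT is specified in 4 KB blocks, and the value shown is an assumption to be tuned for your workload:

CLASS NAME(IGGCAS)
  EMAJ(UCAT.PROD)
  EMAJ(UCAT.TEST)
  MAXVIRT(4096)

After updating the member, the VLF address space must be stopped and restarted to pick up the change.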

Before each physical access to a shared catalog, special checking is performed to ensure that the ISC or CDSC cache contains current information. This checking also ensures that the access method control blocks for this catalog are updated in the event the catalog has been extended or otherwise altered from another system. While this checking maintains data integrity, it also affects performance because the VVR for a shared catalog must be read before using the ISC or CDSC cache. Enhanced Catalog Sharing minimizes this overhead by placing the VVR for the shared catalog in a CF cache structure.

If a catalog uses ISC and a sharing system updates any record in the catalog, catalog management releases the existing ISC and creates a new one for the catalog, irrespective of whether the updated record was in the ISC cache or not. Therefore, to minimize this effect, it is best to use ISC caching for catalogs, like the master catalog, that are predominantly read-only.

Where CDSC caching is used and a sharing system updates any record in the catalog, catalog management is able to reflect the change in the CDSC cache on a record-by-record basis. As a result, CDSC caching is good for catalogs that have significant update activity.

3.18.5 Catalog sizing
Because ICF catalogs use variable-length spanned records, it is not possible to precisely calculate the amount of space that a catalog requires. However, it is important to size catalogs with sufficient space to allow for growth well into the future. A full catalog can have a severe impact on a production environment, as can the outage required to perform the reallocation.

Catalogs can have up to 123 extents, but can only occupy space on a single volume. Generally, secondary extents do not cause problems for catalogs. However, if the number of secondary extents becomes excessive, performance may be impacted, in which case a catalog reorganization may become necessary.

BCS
Creation of a catalog is via the IDCAMS DEFINE USERCATALOG command, which allows for direct specification of parameters; alternatively, an SMS data class may be used, either through direct reference or assigned via the SMS ACS routines. The parameters relevant to catalogs are:

- Space

  May be specified in kilobytes, megabytes, tracks, cylinders, or records. Where records is used, the value specified is multiplied by the RECORDSIZE parameter, which defaults to 4086 if not specified. When allocating catalog space:

  – Try to minimize the number of extents by making the primary extent an adequate initial size. The manual DFSMS: Managing Catalogs, SC26-7409-01, has byte estimates for each type of catalog entry, which can be used to approximate the overall size requirements.
  – Always specify a secondary allocation to allow expansion.
  – Specify a minimum of 1 cylinder for both primary and secondary allocation. Refer to the control area size for details.

- Control interval size

  Selecting a value of 4096 or 8192 provides an acceptable compromise between minimizing data transfer time and the occurrence of records that span control intervals.

- Control area size

  The control area size for the data component is the smaller of the primary allocation quantity, the secondary allocation quantity, or 1 cylinder. Catalog performance is optimized with a control area size of 1 cylinder, which is the maximum size available.

A sketch of a catalog definition reflecting these values follows this list.
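For illustration, the following job defines a shared user catalog using the values discussed above; the catalog name, volume, and space quantities are assumptions to be adapted to your installation:

//DEFUCAT  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE USERCATALOG -
         (NAME(UCAT.PROD) -
          ICFCATALOG -
          VOLUME(PROD01) -
          CYLINDERS(10 1) -
          SHAREOPTIONS(3 4) -
          ECSHARING) -
         DATA(CONTROLINTERVALSIZE(4096))
/*

SHAREOPTIONS(3 4) and ECSHARING anticipate the sharing recommendations in 3.18.8.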

VVDS
VVDSs have the naming convention SYS1.VVDS.Vvolser, where volser is the volume serial number of the volume on which they are defined. VVDS data sets are not required to be defined in the master catalog.

A VVDS can be defined either:

- Implicitly, when the first VSAM or SMS-managed data set is defined on the volume

  This automatically creates a VVDS of size TRK(10,10) and relates the VVDS to the BCS in which the VSAM or SMS-managed data set is cataloged.

- Explicitly, via the IDCAMS DEFINE CLUSTER command

  An explicitly defined VVDS is not related to any BCS at allocation time. As data sets are allocated on the VVDS volume, each BCS with VSAM or SMS-managed data sets residing on that volume is related to the VVDS.

Where the VVDS is being explicitly defined, it is important that sufficient space is allocated to minimize the possibility of a full VVDS, which can be quite disruptive. The default implicit value of TRK(10,10) should be sufficient in most circumstances.

3.18.6 Catalog backup and recovery

Catalogs are important system data sets; therefore, a backup strategy needs to be in place in the event that a catalog recovery is required.

The age of the backup is important. An installation needs to make a trade-off between backup frequency and the time spent performing catalog recovery, should recovery be required.

Also, the backup strategy for the master catalog may need to be different from that for usercatalogs, as a system may not be available from which to perform the recovery.

Physical backup
A physical backup is taken as part of a full-volume DFSMSdss dump.

IBM recommends periodic backups of the volume that contains the master catalog, to cater for a complete system loss where the master catalog represents a single point of failure.

Possible backups could be:

- A DFSMSdss full-volume copy to a similar-geometry DASD volume with a different volume serial. Recovery could be an IPL with an alternate LOADxx PARMLIB or IPLPARM member referencing the different volume serial for the master catalog, or a stand-alone ICKDSF reformat procedure to change the different volume serial to the correct value prior to IPL.
- A DFSMSdss full-volume dump to tape media, with recovery using stand-alone DFSMSdss to perform the volume restore.

In either case, well-documented and practiced procedures for the recovery of the master catalog volume are necessary to minimize the duration of the recovery, especially for mission-critical systems.

Logical backup
A logical backup is where the catalog is backed up through direct specification, and may be done using:

- An AMS EXPORT TEMPORARY command
- A DFSMSdss logical DUMP command
- A DFSMShsm BACKDS command, or DFSMShsm backup via management class settings

Catalog aliases are automatically included in the backup for each of the logical backup methods outlined. Recovery of the catalog from the backup will automatically cause the aliases to be redefined and merged with the existing aliases when DFSMSdss or DFSMShsm are used, and aliases will also be processed by AMS IMPORT if the ALIAS keyword is specified.
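As an illustration, the following job takes a logical backup of the hypothetical catalog UCAT.PROD using AMS EXPORT. TEMPORARY leaves the catalog usable after the export, and the output data set name is an assumption:

//BACKCAT  EXEC PGM=IDCAMS
//PORTFILE DD DSN=BACKUP.UCAT.PROD.PORTCOPY,
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(30,10),RLSE),UNIT=SYSDA
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  EXPORT UCAT.PROD OUTFILE(PORTFILE) TEMPORARY
/*

This portable copy is also the input format required by ICFRU, described later in this section.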

Advanced Copy Services
Advanced Copy Services are a collection of functions that may be used in disaster recovery, data migration, and data duplication scenarios. Using these services, data is copied to another storage device on either a dynamic or a point-in-time basis, depending upon which product is used:

- Dynamic

  Two remote copy products are provided by IBM that work on a dynamic basis:

  – Extended Remote Copy (XRC)
  – Peer-to-Peer Remote Copy (PPRC)

  Both of these products constantly update the secondary copy as applications make updates to the primary data source. Installations should include the volumes that contain the master catalog and key usercatalogs in the volumes that are maintained via remote copy.

- Point-in-time

  These functions take an instantaneous copy of the data at a particular point in time:

  – Snapshot for the RAMAC Virtual Array (RVA)
  – FlashCopy for the Enterprise Storage Server (ESS)

  Both of these products work at the volume rather than the data set level, and therefore may be compared with the DFSMSdss full-volume backup.

Note that the functions provided by Advanced Copy Services only support media failure recovery scenarios. Logical structure errors will be propagated by these functions to the backup copy.

Integrated Catalog Forward Recovery Utility (ICFRU)
ICFRU is a catalog recovery tool that may be used to recover a catalog to currency by using a point-in-time catalog backup and applying the subsequent catalog modifications captured in SMF records.

The point-in-time catalog backup used by ICFRU must be the portable copy created by AMS EXPORT. No other backup type is supported. If an AMS EXPORT portable copy is not immediately available, it may be generated by recovering the catalog from the most recent backup irrespective of format, and then doing the AMS EXPORT.

The SMF data input to ICFRU must include record types 61, 65, and 66 from all systems to which the damaged catalog is connected. These SMF records contain the information required to update the catalog to the point of failure. A start date/time and end date/time must be specified such that only records after the time of the portable backup are used in the roll-forward process.

Output from ICFRU is a new portable file that may be used with an AMS IMPORT command to recover the catalog.
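A minimal recovery sketch using the ICFRU output follows; the data set and catalog names are placeholders:

//RECOVCAT EXEC PGM=IDCAMS
//PORTIN   DD DISP=SHR,DSN=BACKUP.UCAT.PROD.NEWPORT
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  IMPORT INFILE(PORTIN) -
         OUTDATASET(UCAT.PROD) -
         ALIAS
/*

The ALIAS keyword causes the aliases captured in the portable copy to be redefined, as described under “Logical backup” above.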

Note that a master catalog can be recovered using this process but only if it is being referenced as a usercatalog from a connected system.

3.18.7 Catalog security
Three areas of security relate to catalogs and should be reviewed to ensure that the personnel in charge of catalog definition, maintenance, and recovery have the appropriate authority:

- Catalog access

  Access to the data set profile protecting the catalog is important, as it is used in security checking for functions other than creating and deleting data sets:

  – To define entries in a catalog, users need UPDATE access to the catalog.
  – To delete entries in a catalog, users need ALTER access to the data set or ALTER access to the catalog.
  – To perform various catalog recovery tasks, users need ALTER access to the catalog.

- FACILITY class IGG.CATLOCK profile

  The IGG.CATLOCK profile, in conjunction with normal security checking, controls who can lock a catalog, and who can access a locked catalog. Users responsible for catalog maintenance and recovery should have READ access to IGG.CATLOCK and ALTER access to the catalog.

- FACILITY class STGADMIN.IDC.** and STGADMIN.IGG.** profiles

  A number of FACILITY class profiles may be used to restrict the ability of users to perform storage management tasks associated with catalogs.

  – STGADMIN.IDC.DIAGNOSE.CATALOG
    Restricts IDCAMS DIAGNOSE against catalogs.
  – STGADMIN.IDC.DIAGNOSE.VVDS
    Restricts IDCAMS DIAGNOSE against VVDSs.
  – STGADMIN.IDC.EXAMINE.DATASET
    Restricts IDCAMS EXAMINE against catalogs.
  – STGADMIN.IGG.ALTBCS
    Restricts the ability to alter BCS attributes.
  – STGADMIN.IGG.ALTER.SMS
    Restricts the ability to alter the STORCLASS or MGMTCLASS of an SMS-managed data set.
  – STGADMIN.IGG.ALTER.UNCONVRT
    Restricts the ability to alter an SMS-managed VSAM data set to an unmanaged VSAM data set.
  – STGADMIN.IGG.DEFDEL.UALIAS
    Allows the ability to delete or define aliases related to usercatalogs without further authorization checking.
  – STGADMIN.IGG.DEFNVSAM.NOBCS
    Restricts the ability to define an SMS-managed non-VSAM data set with no BCS entry. Only an NVR is created in the VVDS.
  – STGADMIN.IGG.DEFNVSAM.NONVR
    Restricts the ability to define an SMS-managed non-VSAM data set with no NVR entry in the VVDS. Only a BCS entry is created.
  – STGADMIN.IGG.DELETE.NOSCRTCH
    Restricts the ability to use DELETE NOSCRATCH for an SMS-managed data set.
  – STGADMIN.IGG.DELGDG.FORCE
    Restricts the ability to use DELETE FORCE on a GDG that contains an SMS-managed generation data set.
  – STGADMIN.IGG.DELGDG.RECOVERY
    Restricts the ability to use DELETE RECOVERY on a GDG that contains an SMS-managed generation data set.
  – STGADMIN.IGG.DELNVR.NOBCSCHK
    Restricts the ability to delete the NVR in the VVDS for an SMS-managed non-VSAM data set and bypass the associated catalog name and BCS entry checking.
  – STGADMIN.IGG.DIRCAT
    Restricts the ability to direct catalog requests to a specific catalog for an SMS-managed data set, bypassing the normal catalog search.
  – STGADMIN.IGG.DLVVRNVR.NOCAT
    Restricts the ability to delete a VVR or NVR without an associated catalog.
  – STGADMIN.IGG.LIBRARY
    Restricts the ability to DEFINE, DELETE, or ALTER tape library and tape volume entries.

The STGADMIN.IGG.** profiles are only valid for SMS-managed data sets. Access to these profiles should be provided to storage administrators only, as they allow standard SMS processing to be bypassed. They are intended for recovery purposes only, rather than day-to-day use by the general user population. A sketch of defining one such profile follows.
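For illustration, the following RACF commands define one of these profiles and permit a hypothetical storage administration group STGADM to use it; the group name and the choice of profile are assumptions for your installation:

RDEFINE FACILITY STGADMIN.IDC.DIAGNOSE.CATALOG UACC(NONE)
PERMIT STGADMIN.IDC.DIAGNOSE.CATALOG CLASS(FACILITY) ID(STGADM) ACCESS(READ)
SETROPTS RACLIST(FACILITY) REFRESH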

3.18.8 Catalog recommendations
Make careful plans for the GRS RNLs associated with the catalog ENQs. Lockouts can occur if the RNLs are not set up carefully.

Make sure that a shared catalog is defined with SHAREOPTIONS(3,4) and resides on a volume defined as shared in the IODF of all sharing systems. Use ECS to improve the performance of shared catalogs. Use VLF to provide caching to improve performance.

3.19 Software maintenance

IBM regularly makes available modifications to current releases of code. These modifications may be to fix a coding problem that has been reported to IBM by a customer or discovered internally within IBM, or to provide new function.

Where a legitimate coding problem, or bug, has been discovered, IBM opens an Authorized Program Analysis Report (APAR), which is given a unique number for tracking, and a target date for solution. When the support team responsible for the code identifies the problem, a Program Temporary Fix (PTF) is released that may be applied to solve the problem. The PTF is also supplied to the development group working on the next release of the product to ensure that the problem is rectified in future releases. As the PTF is temporary, the development group may decide to incorporate the PTF directly into the new release, or to redesign the code in the new release to eliminate the problem. A fix is not considered permanent until it is incorporated into the base release of a product.

When applied, the PTF closes the APAR. Note that it is possible for a single PTF to close multiple APARs. Also, a single APAR may have multiple PTFs, as the problem may occur in multiple releases of the product and a separate PTF is required for each release. In this situation it is important that installations order and apply the correct PTF for their level of software. Alternatively, installations may need to order and apply multiple PTFs where they are running multiple releases of the same product.

IBM may also use the APAR and PTF methodology to release new function modifications. Normally, new function modifications are incorporated into the next release of a product. However, in some cases IBM will make modifications to the current release, and possibly to previous releases that are still supported. An APAR is opened to describe the new function and, as in the error scenario, multiple PTFs may be designed and made available, depending upon the number of product releases for which the new function is being provided.

PTFs have a hierarchy to ensure that fixes with dependencies are processed correctly. A PTF may totally supersede a previous PTF if all the components of the previous PTF are being redelivered. Also, a PTF may have prerequisite PTFs that must be applied before or at the same time, and corequisite PTFs that must be applied at the same time. In all cases, a PTF supersedes its related APARs to indicate that the problem has been closed.

3.19.1 Types of maintenance

The types of maintenance are corrective or preventative.

Corrective
Corrective maintenance is where installations apply PTFs in response to a problem that has occurred at their installation. When a problem occurs, installations can report the problem to IBM, who can check their worldwide databases to see if the problem has been previously reported and, if so, immediately provide the appropriate fixing PTF if it is available.

If the problem has not been previously reported, IBM may request that the installation provide appropriate diagnostic information to assist in determining whether it is a legitimate coding problem and, if so, open an APAR. Note that many problems reported to IBM do not result in an APAR, as they may be due to errors in product installation, configuration, or usage, or to OEM hardware or software beyond IBM's control.

Due to the PTF hierarchy, it is possible that a single PTF may cause multiple PTFs to be applied to ensure that prerequisite and corequisite requirements are met. These other PTFs will resolve other problems, and could potentially also introduce new function.

Preventative
Preventative maintenance is where an installation applies PTFs to prevent problems that it may not yet have encountered. In addition, new function PTFs may also be applied as part of this maintenance, either directly or as a corequisite or prerequisite of other PTFs that resolve errors.

3.19.2 Classification of maintenance
In this section we discuss the classification of maintenance.

PUTyymm
All PTFs released by IBM are given a PUTyymm SMP/E SOURCEID that indicates the year and month of inclusion on the program update tape.

Originally, IBM supplied a PUT tape to clients on a monthly basis and recommended the application of all the related PUTyymm maintenance. The PUT tape delivery mechanism has since been superseded by other delivery mechanisms (ESO, SUF, ShopzSeries, etc.); however, the PUTyymm indication is still used.

The PUTyymm SOURCEID may be used in SMP/E to select maintenance to be applied or accepted on a collective basis rather than specifying each PTF; however, IBM’s current recommendation is to use the RSUyymm indication instead.

RSUyymm
In addition to the PUTyymm SOURCEID, IBM has defined an RSUyymm SOURCEID to highlight recommended maintenance.

The RSUyymm SOURCEID was originally designed to reduce the amount of maintenance that customers were advised to install. PTFs for low-impact problems that were unlikely to affect most customer systems were not given an RSUyymm indication, thus eliminating the installation of 20–30 percent of the available PTFs.

Like PUTyymm, RSUyymm indications are made available monthly, usually around the 15th of each month, and identify all recommended maintenance released in the previous month (that is, yymm). To qualify as recommended, a PTF must be one of the following:

- Severity 1 & 2 APAR fixes
- HIPERs
- Special attention fixes
- PTF-in-error (PE) fixes
- Security/Integrity APAR fixes

However, the introduction of Consolidated Service Test (refer to “Consolidated Service Test (CST)” on page 181) has caused RSUyymm to include additional maintenance at the end of every quarter (that is, RSUyy03, RSUyy06, RSUyy09, and RSUyy12). Under CST guidelines, all PTFs not already marked RSU that have successfully completed a 3-month CST test cycle are given the RSUyymm indication of the year/month at the end of the cycle.

RSU indications may be downloaded from the IBM Service site at Boulder. These are in SMP/E ++ASSIGN format and may be obtained via TCP/IP FTP and received into the GLOBAL zone to update the PTFs already on site. Refer to:

ftp://service.boulder.ibm.com/s390/assigns/

HIPER
HIPER is an SMP/E SOURCEID indication that may be used to gauge an APAR's severity. Using the HIPER indication, installations can select these PTFs for special processing and priority installation if required.

Originally, HIPER indicated High Impact and PERvasive APARs. However, many APARs were high impact but not pervasive, and so were not given this special indication.

In 1996, IBM changed the definition of HIPER to include high-impact APARs only, with pervasiveness (and other categorizations) being indicated by symptom flags and keywords, which include those shown in Table 3-5.

Table 3-5   Flags, keywords, and meanings

Flag       Keyword        Meaning
DAL        DATALOSS       Destruction and/or contamination of customer data.
FUL        FUNCTIONLOSS   Causes a major loss of function on the customer system.
IPL        SYSTEMOUTAGE   Causes the customer to re-IPL, reboot, recycle, or restart
                          one or more systems or subsystems.
PRF        PERFORMANCE    Causes severe impact to system performance/throughput.
PRV        PERVASIVE      Problem may affect many customers.
YR2000     YEAR2000       Identifies APARs that provide Year 2000 function, or fix a
                          Year 2000-related problem.
SYSPLXDS   SYSPLEXDS      Identifies HIPER fixes needed to support and implement
                          sysplex data sharing.
XSYSTEM    XSYSTEM        Identifies HIPER fixes that provide cross-system, migration,
                          compatibility, or toleration support.

3.19.3 Sources of maintenance
In this section we discuss sources of maintenance.

IBMLink
IBMLink™ is an electronic system that installations can use to interface with IBM for technical support and other purposes. Refer to:

http://www.ibmlink.ibm.com

Using IBMLink ServiceLink, customers can search IBM databases for problems and order PTFs to be delivered either electronically or via tape media, as appropriate.

Custom-Build Product Delivery Offering (CBPDO)
CBPDO is a packaging and delivery mechanism whereby installations can order a customized list of products and maintenance to be included in a single delivery. Maintenance-only CBPDOs are available; however, these are usually provided via the ESO mechanism.

CBPDOs are delivered on tape media as per the order specification.

Expanded Service Offering (ESO)
ESO is an IBM service delivery mechanism that allows customers to order PTF maintenance for products for which they are licensed.

ESO has a new service level every month, and installations can place an order when required, or automatically have an order generated at their requested frequency. Orders are delivered on tape media. Refer to:

http://service.software.ibm.com/holdata/390hdmvsinst.html#Header_3

Service Update Facility (SUF)
SUF is an Internet-based electronic solution that may be used to order and download required PTF maintenance. It is installed on the z/OS system that contains the SMP/E maintenance environment and has a Web browser interface.

Corrective service may be ordered by specifying particular PTFs, and preventative service orders are supported by selecting either monthly or quarterly RSU levels. SUF interfaces with SMP/E to determine which PTFs have already been applied, and this information, along with the required order, is electronically placed with IBM over the Internet via an HTTP session. The IBM SUF server processes the order by resolving all prerequisite and corequisite requirements and removing PTFs that have already been applied. The remaining PTFs are returned over the HTTP session and automatically received into the appropriate SMP/E global zone. Refer to:

http://www.ibm.com/servers/eserver/zseries/zos/suf/

ShopzSeries
ShopzSeries is the IBM recommended facility for ordering zSeries software products and service.

It is an electronic service that allows both corrective and preventative maintenance orders to be placed, and the resulting service may be automatically received into the SMP/E environment. Refer to:

http://www.ibm.com/software/ShopzSeries

3.19.4 Consolidated Service Test (CST)
In 2001, IBM introduced an additional service testing environment called Consolidated Service Test (CST), aimed at improving the overall quality of IBM service and at providing a common maintenance recommendation for the operating system and major subsystems on the z/OS and OS/390 platforms.

CST is focused on the Parallel Sysplex environment and data sharing. Testing is performed in a customer-like test environment that has batch and data sharing applications using two levels of subsystems within three levels of z/OS or OS/390 systems.

CST works on a 3-month cycle starting each quarter as follows:

- Month 1: Install all PUTyymm maintenance from the previous quarter, run existing workloads, and develop new test scenarios to exploit new product functions.
- Month 2: Run new test scenarios, identify problems, and apply fixes.
- Month 3: Perform recovery tests and run workloads in a high-stress environment.

The CST team provides maintenance recommendations in the form of a CST Test Report, and RSUyymm indications are assigned accordingly. While the main focus is on the quarterly report, intervening monthly reports are published for those installations whose preventative strategy requires more frequent updates.

The quarterly reports (RSUyy03, RSUyy06, RSUyy09, RSUyy12) include:

- All service as of the end of the prior quarter
- HIPER fixes, PE fixes, and Security/Integrity APAR fixes that have completed a 30-day CST cycle in that month
- CST corrective fixes

The monthly reports (RSUyy01, RSUyy02, RSUyy04, etc.) include:

- HIPER fixes, PE fixes, and Security/Integrity APAR fixes that have completed a 30-day CST cycle in that month
- CST corrective fixes

Further information regarding Consolidated Service Test can be found at: http://www.ibm.com/servers/eserver/zseries/zos/servicetst/

3.19.5 Enhanced Holddata
Holddata is an SMP/E concept that is used during apply and accept of PTFs to flag documentation changes, physical actions required before or after PTF processing, dependencies, and so on.

Holddata is also used to flag PTFs that have gone into error (that is, PE) and are prevented from being applied by SMP/E until a fixing PTF is available.

Holddata was originally provided by IBM only with product or service deliveries. In the mid-1990s, IBM superseded the previously delivered holddata with Enhanced Holddata, which has far more functionality:

- Available on demand via TCP/IP FTP download from an IBM Web site
- Automatically supplied with both physical media and electronic service orders
- Includes IBM Smart Alert information in the COMMENT and CLASS fields
- Updated daily
- User-selectable retrospective period of 1 month, 1 quarter, 1 year, 2 years, or 3 years

Holddata is a protective mechanism for customers to use when performing SMP/E maintenance. To prevent the installation of known-defect PTFs, it is recommended that your SMP/E GLOBAL zone be refreshed with the latest Enhanced Holddata before any SMP/E APPLY or ACCEPT processing is performed. To assist with SMP/E research, the SMP/E REPORT ERRORSYSMODS command may be used to process the Smart Alert information to help identify fixing PTFs if available, resolve fixing PTF chains, and determine APAR severity and scope via symptom flags. A sketch of this processing follows.
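As an illustration, the following job refreshes the GLOBAL zone with downloaded holddata and then reports on PTFs in error; the CSI and holddata data set names and the target zone name (ZOST100) are assumptions for your environment:

//HOLDCHK  EXEC PGM=GIMSMP
//SMPCSI   DD DISP=SHR,DSN=SMPE.GLOBAL.CSI
//SMPHOLD  DD DISP=SHR,DSN=SMPE.ENHANCED.HOLDDATA
//SMPCNTL  DD *
  SET BDY(GLOBAL).
  RECEIVE HOLDDATA.
  REPORT ERRORSYSMODS ZONES(ZOST100).
/*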

Further information regarding Enhanced Holddata can be found at:

http://service.software.ibm.com/holdata/390holddata.html

3.19.6 Software maintenance recommendations

Outage analysis at IBM has shown that about 15 percent of outages affecting multiple systems/subsystems could have been avoided by better preventative maintenance practices. That is, the cause of the outage was fixed by a PTF that had been available for six months or more.

As a result, we recommend that each customer develop a preventative maintenance strategy suitable for their own particular requirements. Issues to consider include:

- IBM recommends a quarterly maintenance schedule. In addition, IBM intends that no preventative or corrective maintenance should require a sysplex-wide IPL; therefore, it should be possible to install maintenance via rolling IPLs in a sysplex environment. Note that in some cases a toleration PTF may need to be on all systems in a sysplex environment before a subsequent PTF is installed on any system. PTFs that require toleration will contain a ++HOLD for ACTION to indicate the requirement.
- IBM suggests using only the RSUyymm SOURCEIDs to select PTF maintenance for installation. While this takes advantage of all the testing and verification performed by the Consolidated Service Test environment, customer-specific testing is always recommended. Applying up to and including the latest quarterly RSUyymm should be adequate for most installations (see the sketch after this list).
- Usage of Enhanced Holddata is also encouraged, in order to prevent the installation of known-defect APARs and to assist in resolving PTF corequisite/prerequisite chains.
- We recommend a weekly review of HIPER APARs via the SMP/E REPORT ERRORSYSMODS process, using the latest Enhanced Holddata and taking action as necessary.
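For illustration, the following SMP/E control statements first verify and then apply maintenance up to a given RSU level in a hypothetical target zone ZOST100; the zone name and RSU level are assumptions, and the CHECK run should be reviewed before the actual APPLY:

  SET BDY(ZOST100).
  APPLY SOURCEID(RSU0412) GROUPEXTEND CHECK.
  SET BDY(ZOST100).
  APPLY SOURCEID(RSU0412) GROUPEXTEND.

GROUPEXTEND tells SMP/E to include any requisite PTFs that were not explicitly selected.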

Further information regarding IBM maintenance suggestions may be found at: http://www.ibm.com/servers/eserver/zseries/library/whitepapers/psos390maint.html

3.20 Testing the sysplex

Having a sysplex provides many new opportunities to test new applications and systems. Some of the MVS images in the sysplex can be designated as production online images. These MVS systems would run the high-availability production online systems. Other MVS images in the sysplex can be designated as batch or less critical systems, and others can be used for application or system testing.

This functional isolation of different types of work improves the availability of the whole sysplex. The systems that are being brought up and down for testing are isolated from the systems that are providing the 7/24 online production.

A significant advantage of using Parallel Sysplex technology is the ability to perform hardware and software maintenance and installations in a nondisruptive manner. Through data sharing and dynamic workload management, servers can be dynamically removed from or added to the cluster allowing installation and maintenance activities to be performed while the remaining systems continue to process work. Furthermore, by adhering to IBM's software and hardware coexistence policy, software and/or hardware upgrades can be introduced one system at a time. This capability allows customers to roll changes through systems at a pace that makes sense for their business. The ability to perform rolling hardware and software

maintenance in a nondisruptive manner allows businesses to implement critical business functions and react to rapid growth without affecting customer availability.

The coexistence strategy allows the various z/OS systems in the sysplex to be up to two releases apart. Normally, we would expect a new release of z/OS to be built and brought into the sysplex as the least important image in the sysplex. After running for a period of time, this release can be moved to another image in the sysplex. This is called a rolling upgrade.

With careful planning it should be possible to avoid full sysplex IPLs. Bringing down one system in the sysplex at a time means that you will always have access to an MVS image from which to perform emergency repair of the down system.

3.20.1 Test Sysplex
Some customers have found it advantageous to have a complete test sysplex to test components or new releases that are outside the coexistence window, or that are deemed too risky to put on the production sysplex, even on a low-impact MVS image. If you decide to configure a test sysplex, then it is very important to ensure that the test and production sysplexes are completely isolated from one another. It is a false economy to attempt to share disk volumes, or any other hardware, between the two sysplexes.

If access to certain files in production is required in testing, then the testing should be done on a test MVS image that is part of the production sysplex.

3.20.2 Sysplex testing recommendations
Ensure that each MVS image in the sysplex has a defined, well-understood function. Do not try to run all work on all systems without careful planning for the recovery of work in an outage. Normally the online production systems should be isolated from testing. Use GRS star to protect resources in the sysplex. If a test sysplex is used, ensure that there are no disk volumes shared with the production sysplex. The test sysplex should be completely isolated from production, perhaps even in another city, where it can form the focus of a disaster recovery site.

3.21 Planned outages

Almost all of the critical system parameters are defined in members within the PARMLIB concatenation. Originally, system outages were required in order to change these parameters, even for minor changes.

With each successive release of z/OS, IBM has introduced modifications to many of these areas, such that changes can be performed dynamically without requiring an IPL.

It is important that installations understand the dynamic facilities available for making modifications to various system components, rather than relying on an IPL to bring in changes. In some cases, a dynamic facility is available but can only be used if the system is configured to allow it.

3.21.1 APPC/MVS configuration
The Advanced Program-to-Program Communication (APPC) facility allows system components and application programs to perform peer-to-peer transactions in a z/OS environment.

PARMLIB member
The APPCPMxx PARMLIB member contains a combination of statement types that define or modify the APPC/MVS configuration. These statements define APPC/MVS local LUs, indicate whether they are associated with a transaction scheduler, and name their associated administrative VSAM files.

Disruptive modification
APPC address space recycle.

Nondisruptive modification
The SET APPC=xx operator command may be used to reference a new APPCPMxx member to dynamically modify the APPC/MVS configuration. The new configuration modifies the existing configuration rather than replacing it.

3.21.2 APPC/MVS Transaction Scheduler
The APPC/MVS transaction scheduler is used with APPC.

PARMLIB member
The ASCHPMxx PARMLIB member contains scheduling information for the ASCH transaction scheduler. The statements define classes of transaction initiators and provide default scheduling information when it is missing from a TP profile.

Disruptive modification
ASCH address space recycle.

Nondisruptive modification
The SET ASCH=xx operator command may be used to reference a new ASCHPMxx member to dynamically modify the APPC/MVS transaction scheduler. The new configuration modifies the existing configuration rather than replacing it.

3.21.3 Authorized Program Facility (APF)
For a program to be APF-authorized, it must be link-edited with AC(1) and be located in an APF-authorized load library.

PARMLIB members
The IEAAPFxx PARMLIB member defines static APF data sets and can contain a maximum of 253 installation-defined entries.

IBM provides the PROGxx PARMLIB member as an alternative to IEAAPFxx, which allows you to update the APF list dynamically and specify an unlimited number of APF-authorized libraries.

The system will process both IEAAPFxx and PROGxx members if specified; however, IBM recommends full conversion to PROGxx format to allow full usage of the dynamic capabilities. To assist in the conversion process, IBM supplies the IEAAPFPR REXX exec, which will generate the equivalent PROGxx format definitions from an IEAAPFxx supplied member.

Disruptive modification
System IPL.

Nondisruptive modification
IBM provides two commands that may be used to make dynamic modifications to the APF list:

- The SET PROG=(xx,xx,...) operator command may be used to activate new PROGxx PARMLIB members, which modify the existing APF list rather than replacing it.
- The SETPROG APF operator command dynamically processes a single APF list entry, which is valid only until the next IPL.

A sketch of both approaches follows.
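For illustration, the following pair shows a PROGxx statement and the equivalent operator command for adding one hypothetical library to the APF list; the data set and volume names are placeholders, and for an SMS-managed library the VOLUME specification is replaced by the SMS keyword:

APF ADD DSNAME(SYS1.NEWAUTH) VOLUME(PROD01)

SETPROG APF,ADD,DSNAME=SYS1.NEWAUTH,VOLUME=PROD01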

3.21.4 Diagnostics
Exhaustion of common storage (CSA, ECSA, SQA, ESQA) or address space storage can cause system failure and address space failure. IBM provides a diagnostics PARMLIB member to allow these resources to be monitored if required.

PARMLIB member
The DIAGxx PARMLIB member is used to control common storage tracking and GETMAIN/FREEMAIN/STORAGE tracing.

Disruptive modification
System IPL.

Nondisruptive modification
The SET DIAG=xx operator command may be used to activate a new DIAGxx PARMLIB member, which modifies the existing tracing rather than replacing it.

3.21.5 Dump options
Starting dump options for the SYSABEND, SYSUDUMP, and SYSMDUMP ABEND dumps are set at IPL via PARMLIB members and may be dynamically altered by operator command. SVC dumps have no equivalent PARMLIB member; therefore, the starting dump options at IPL are empty.

PARMLIB member
The following PARMLIB members are used to set the starting dump options for the related ABEND dump type:

- IEAABDxx - SYSABEND dumps
- IEADMPxx - SYSUDUMP dumps
- IEADMRxx - SYSMDUMP dumps

Disruptive modification
System IPL.

Nondisruptive modification
The CHNGDUMP operator command may be used to change the dump mode and dump options for each of the dump types.

3.21.6 Dump Analysis and Elimination (DAE)
DAE may be used to suppress SVC dumps, SYSMDUMP dumps, and IEATDUMP transaction dumps when the symptom data of a dump matches that of a dump previously taken for the same dump type.

PARMLIB member
The ADYSETxx PARMLIB member is used to specify DAE-related options.

Disruptive modification
System IPL.

Nondisruptive modification
The SET DAE=xx operator command may be used to activate a new ADYSETxx PARMLIB member, which contains the new parameters that DAE is to use.

3.21.7 Console management
The system console configuration and console options for MCS and SMCS consoles are determined when the CONSOLE address space initializes at IPL.

PARMLIB members
The CONSOLxx PARMLIB member is used to specify the MCS and SMCS console configuration, console devices, console device attributes, console security options, and system hardcopy options, and to reference other console-related PARMLIB members such as:

- MPFLSTxx - For Message Processing Facility options
- PFKTABxx - For Program Function Key tables
- MMSLSTxx - For z/OS Message Service options
- CNGRPxx - For Console Group definitions

Disruptive modification
System IPL.

Nondisruptive modification
The CONTROL operator command may be used to modify the screen display options on existing MCS and SMCS consoles.

The VARY CN operator command may be used to set attributes for existing MCS, SMCS and EMCS consoles.

The VARY CONSOLE operator command is used to activate and set attributes for MCS consoles only. This command is still supported but is no longer being enhanced. IBM recommends using the VARY CN operator command instead.

The VARY command may be used to switch the master console function to an existing MCS or SMCS console.

3.21.8 Console group management
Console groups are used to select candidate consoles for switch selection in the event of a console failure. MCS, SMCS, and EMCS consoles are supported.

PARMLIB member
The CNGRPxx PARMLIB members are selected during IPL via the INIT statement in the CONSOLxx PARMLIB member. These contain the console group definitions and console names for each group. Multiple members may be specified, and the first specification for any console group is used where duplicates exist.

When a system joins a sysplex, the system inherits any console group definitions that are currently defined in the sysplex, and its own CNGRPxx specifications are ignored.

Disruptive modification
Sysplex-wide IPL.

Nondisruptive modification
The SET CNGRP=(xx,xx,..) operator command may be used to activate new CNGRPxx PARMLIB members, which replace the existing definitions. Alternatively, SET CNGRP=NO may be used to remove all active console group definitions from the sysplex.

3.21.9 Exits
The allocation installation exit list is used to make exceptions to the installation default policy for volume ENQ, volume mount, specific wait, and allocated/offline device processing.

PARMLIB member
The EXITxx PARMLIB member is used to associate an exit load module name with each of the following installation exit points:

- IEF_VOLUME_ENQ for the volume ENQ installation exit
- IEF_VOLUME_MNT for the volume mount installation exit
- IEF_SPEC_WAIT for the specific waits installation exit
- IEF_ALLC_OFFLN for the allocated/offline installation exit

IBM provides the PROGxx PARMLIB member as an alternative to EXITxx, which allows you to specify exits, control their use, and associate one or more exit routines with exits.

The system will process both EXITxx and PROGxx members if specified; however, IBM recommends full conversion to the PROGxx format to allow full usage of the dynamic capabilities. To assist in the conversion process, IBM supplies the IEFEXPR REXX exec, which will generate the equivalent PROGxx format definitions from a supplied EXITxx member.

Disruptive modification
System IPL.

Nondisruptive modification IBM provides two commands that may be used to make dynamic modifications to the APF list:  The SET PROG=(xx,xx,...) operator command may be used to activate new PROGxx PARMLIB members, which modify the existing exit list rather than replacing it.  The SETPROG EXIT operator command dynamically processes a single exit entry, which is valid only until the next IPL.

3.21.10 Global Resource Serialization (GRS)
GRS is used to serialize access to global resources amongst units of work on multiple systems. GRS processing parameters and resource name lists (RNLs) are set during GRS initialization at IPL.

PARMLIB member
The GRSCNFxx PARMLIB member is used to specify the system-specific GRS configuration and related parameters. Parameters for multiple systems may be specified in the one member using the MATCHSYS statement. Only the RESMIL, TOLINT, and SYNCHRES parameters may be modified dynamically.

The GRSRNLxx PARMLIB member contains the GRS resource name lists (RNLs). This member must match the values currently in use by the GRS complex during IPL, or else the system will be put into an X'0A3' disabled wait. A sample RNL definition follows.
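For illustration, a GRSRNLxx fragment that places a hypothetical resource in the SYSTEMS exclusion RNL; the QNAME and RNAME values are placeholders:

RNLDEF RNL(EXCL) TYPE(GENERIC) QNAME(SYSDSN) RNAME(TEST.DATASETS)

RNLDEF statements accept RNL(INCL|EXCL|CON) and TYPE(SPECIFIC|GENERIC|PATTERN); every sharing system must use identical RNLs.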

Disruptive modification
System IPL for changes to GRSCNFxx.

Sysplex-wide IPL for changes to GRSRNLxx.

Nondisruptive modification
The SET GRSRNL=(xx,xx,..) operator command may be used to activate new GRSRNLxx PARMLIB members, which replace the existing RNL definitions.

Important: Use extreme caution when issuing this command to change heavily used or highly critical resources. Work that requires resources for a critical application, or resources used by the operating system, may become suspended or delayed.

The SETGRS operator command may be used to migrate a currently active GRS ring complex to a GRS star complex, or to modify the current RESMIL, SYNCHRES or TOLINT values.

3.21.11 IODF management
The Input/Output Definition File (IODF) contains information about the software and hardware I/O configuration. It is generated by the Hardware Configuration Definition (HCD) dialog and is referenced during the IPL process. It is also used to generate the IOCDS, which is used by the processor during power-on reset.

IODFs have a naming convention of SYSx.IODFyy, where x is 1 through 9, and yy is x’00’ through x’FF’.

PARMLIB member
The LOADxx PARMLIB (or IPLPARM) member may reference the IODF directly through specification of the yy value on the IODF statement. Alternatively, asterisks (**), pluses (++), minuses (--), or equal signs (==) can be specified to use the last valid IODF name found in the hardware token, from either the last POR or the last dynamic hardware change.

Disruptive modification
System IPL for software changes.

Processor POR for hardware changes.

Nondisruptive modification
The ACTIVATE operator command may be used to dynamically activate or test a new I/O configuration definition, as sketched below.
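For illustration, assuming a new IODF with suffix A1 has been built, a dynamic change might be tested and then activated as follows; the suffix is a placeholder, and the exact operands available depend on your z/OS level:

ACTIVATE IODF=A1,TEST
ACTIVATE IODF=A1

The TEST form verifies whether the change can be made dynamically, without actually making it.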

3.21.12 IOS
The I/O subsystem parameters for each device class are set during IPL. These parameters relate to missing interrupt handler (MIH) timeout settings, I/O timing, and recovery actions for HOTIO and hung interface conditions.

PARMLIB member
The IECIOSxx PARMLIB member may be used to override the system defaults generically for each device class, or specifically for each device or range of devices.

Disruptive modification
System IPL.

Nondisruptive modification
The SET IOS=xx operator command may be used to activate a new IECIOSxx PARMLIB member, which replaces the existing IOS settings.

The SETIOS operator command can be used to dynamically add a parameter, as well as delete, modify, or replace any previously specified missing interrupt handler (MIH) or I/O timing (IOT) parameter.
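For illustration, the following command sets a hypothetical 30-second MIH interval for a range of devices; the device numbers are placeholders:

SETIOS MIH,DEV=(0380-038F),TIME=00:30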

3.21.13 LNKLST
The LNKLST is a list of load libraries that are concatenated to SYS1.LINKLIB.

PARMLIB member
The LNKLSTxx PARMLIB member defines static LNKLST data sets. During IPL the system opens and concatenates each data set to SYS1.LINKLIB in the order listed. The system creates a data extent block (DEB) that describes the data sets concatenated to SYS1.LINKLIB and their extents. The extents remain in effect for the duration of the IPL.

IBM provides the PROGxx PARMLIB member as an alternative to LNKLSTxx, which allows you to update the LNKLST dynamically.

The system will process both LNKLSTxx and PROGxx members if specified; however, a PROGxx member that contains a LNKLST ACTIVATE statement will cause the LNKLSTxx member to be ignored. IBM recommends full conversion to the PROGxx format to allow full usage of the dynamic capabilities. To assist in the conversion process, IBM supplies the CSVLNKPR REXX exec, which will generate the equivalent PROGxx format definitions from a supplied LNKLSTxx member.

Disruptive modification
System IPL.

Nondisruptive modification
IBM provides two commands that may be used to make dynamic modifications to the LNKLST:

- The SET PROG=(xx,xx,...) operator command may be used to activate a new PROGxx PARMLIB member that contains the necessary statements to define a new LNKLST set and activate it to replace the existing LNKLST set.
- The SETPROG LNKLST operator command allows for the definition of a new LNKLST set, followed by activation to replace the existing LNKLST set.

A sketch of the SETPROG LNKLST sequence follows.
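For illustration, the following sequence clones the current LNKLST, adds a hypothetical library, and activates the new set; the set and data set names are placeholders:

SETPROG LNKLST,DEFINE,NAME=LNKLST01,COPYFROM=CURRENT
SETPROG LNKLST,ADD,NAME=LNKLST01,DSNAME=SYS1.NEWLIB
SETPROG LNKLST,ACTIVATE,NAME=LNKLST01
SETPROG LNKLST,UPDATE,JOB=*

The UPDATE,JOB=* step requests that currently running jobs be switched to the new LNKLST set where possible.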

3.21.14 LOGREC error recording
LOGREC records are written to the LOGREC data set or LOGREC logstream when either a hardware or software error occurs. These records may be post-processed into history archives and reports using the Environmental Record Editing and Printing (EREP) facility.

PARMLIB member
The IEASYSxx PARMLIB member specifies whether LOGREC recording is to a data set, to a logstream, or is not to be done at all.

Disruptive modification
System IPL.

Nondisruptive modification
The SETLOGRC operator command may be used to change the LOGREC recording medium originally specified at IPL.

3.21.15 LPALST
The pageable link pack area (PLPA) contains read-only, reenterable programs that are loaded at IPL during cold starts or when CLPA is specified. The PLPA is part of the virtual storage of every address space, and is therefore available to be shared amongst all users of the system.

PARMLIB member
The LPALSTxx PARMLIB member defines static LPA data sets. During IPL the system opens and concatenates each data set to SYS1.LPALIB in the order listed, up to a maximum of 255 entries.

The SYS1.LPALIB data set is always first in the concatenation unless overridden by a SYSLIB statement in PROGxx.

IBM provides the PROGxx PARMLIB member to allow modifications to the LPALST after IPL. You can specify a data set from which the system is to load the module, you can request that the modules be placed into fixed common storage, and you can request that only the full pages within a load module be page-protected.

As part of the LPA search, the system finds modules that were added dynamically. A module added dynamically is found before one of the same name added during IPL.

The dynamic LPA function is intended to replace modules only in cases where the owning product verifies the replacement. Otherwise, replacement could result in partial updates, or if the module address has already been saved by its owning product, an LPA search will not be done and the updated module will not be found.

Disruptive modification
System IPL.

Nondisruptive modification
IBM provides two commands that may be used to make dynamic modifications to the LPALST:

- The SET PROG=(xx,xx,...) operator command may be used to activate a new PROGxx PARMLIB member that contains the necessary statements to add or delete LPA modules from the PLPA.
- The SETPROG LPA operator command allows for the addition or deletion of LPA modules from the PLPA.

A sketch of the latter follows.
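For illustration, the following command dynamically adds a hypothetical module to the LPA; the module and data set names are placeholders:

SETPROG LPA,ADD,MODNAME=(MYMOD01),DSNAME=SYS1.NEWLPA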

3.21.16 Message Processing Facility (MPF)

The Message Processing Facility (MPF) examines messages that are written to consoles and the system hardcopy medium and makes adjustments as per the settings specified.

PARMLIB member
The MPFLSTxx PARMLIB member contains statements to control:

- Message presentation, via highlighting, color, and intensity
- Message management, via suppression, retention, or automation
- Command processing, by calling an installation exit each time a command is issued

At IPL, the MPFLSTxx PARMLIB member is specified by the MPF parameter on the INIT statement in the CONSOLxx PARMLIB member.
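For illustration, an MPFLSTxx fragment that suppresses a hypothetical message from the consoles while still marking it eligible for automation might look as follows; the message ID is a placeholder, and the exact statement keywords available depend on your z/OS level:

.DEFAULT,SUP(NO),AUTO(NO)
IEF403I,SUP(YES),AUTO(YES)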

Disruptive modification
IPL.

Nondisruptive modification
The SET MPF=(xx,xx,..) operator command may be used to activate new MPFLSTxx PARMLIB members, which replace the existing MPF definitions. Alternatively, SET MPF=NO may be used to terminate MPF processing.

3.21.17 MVS Message Service (MMS)
The MVS Message Service provides language translation services for messages written to consoles and the system hardcopy medium, depending upon the settings specified.

PARMLIB member
The MMSLSTxx PARMLIB member contains statements to:

- Identify the languages available into which U.S. English messages can be translated
- Specify the default language into which U.S. English messages can be translated
- Specify the installation exits that get control either before or after the translation occurs

At IPL, the MMSLSTxx PARMLIB member is specified by the MMS parameter on the INIT statement in the CONSOLxx PARMLIB member.

Disruptive modification
System IPL.

Nondisruptive modification
The SET MMS=(xx,xx,..) operator command may be used to activate new MMSLSTxx PARMLIB members, which replace the existing MMS definitions. Alternatively, SET MMS=NO may be used to terminate MMS processing.

3.21.18 Local Page Data Sets
The LOCAL system page data sets are used to support private virtual storage for z/OS address spaces, and VIO pages that are not backed by expanded storage.

PARMLIB member
The PAGE statement in the IEASYSxx PARMLIB member specifies the page data sets to be used by the system, as follows:

- The first data set specified is used as the PLPA page data set, which contains pageable link pack area (PLPA) pages.
- The second data set specified is used as the COMMON page data set, which contains all of the common pages that are not PLPA pages.
- The third and all subsequent data sets specified are used as LOCAL page data sets, and contain all the system pages (including VIO pages) that are considered neither PLPA nor COMMON data set pages.

Disruptive modification
System IPL.

Nondisruptive modification
The PAGEADD operator command may be used to add local page data sets to the system. These data sets remain available until an IPL occurs with either the CLPA or CVIO parameter, or until a PAGEDEL operator command is issued. The number of page data sets that can be in use by the system is limited by the PAGTOTL parameter in the IEASYSxx PARMLIB member.

The PAGEDEL operator command may be used to delete, replace, or drain local page data sets without performing an IPL. When you replace a local page data set, the system migrates the in-use slots from the old data set to the new one. When you delete a local page data set, the system migrates the in-use slots to other data sets before it deletes the data set.
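For illustration, the following commands add and later drain a hypothetical local page data set; the data set name is a placeholder, and the data set must have been formatted as a page data set beforehand:

PAGEADD PAGE=(PAGE.SYS1.LOCALB)
PAGEDEL PAGE=(PAGE.SYS1.LOCALB),DRAIN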

3.21.19 Parmlib concatenation
The logical parmlib concatenation consists of up to 10 PARMLIB data sets that are concatenated to SYS1.PARMLIB. The parmlib concatenation is established during IPL and is used by the master scheduler. SYS1.PARMLIB is placed at the end of the concatenation unless it is specified as one of the 10 PARMLIB data sets.

PARMLIB member
The LOADxx PARMLIB (or IPLPARM) member specifies the logical parmlib concatenation through multiple PARMLIB statements. Each successive PARMLIB statement adds a data set to the concatenation. If no PARMLIB statements are specified, then only SYS1.PARMLIB is used by the master scheduler.

Disruptive modification
System IPL.

Nondisruptive modification
The SETLOAD operator command allows you to switch dynamically from one logical parmlib concatenation to another without having to IPL. The SETLOAD command specifies the LOADxx member that contains the PARMLIB statements to use for the switch.

3.21.20 Products
The product enablement policy is used to specify products or product features that support product enablement.

PARMLIB member
The IFAPRDxx PARMLIB member contains the product enablement policy, which lists the products and features, as well as the system environment in which they are enabled to run.

The system builds the enablement policy from the PRODUCT statements and WHEN statements in the active IFAPRDxx PARMLIB member. Each WHEN statement defines a system environment, and each PRODUCT statement identifies a product and product features that are enabled or disabled when running in the system environment defined on the preceding WHEN statement.

Disruptive modification
System IPL.

Nondisruptive modification
The SET PROD=(xx,xx,..) operator command may be used to reference new IFAPRDxx PARMLIB members to modify the existing product enablement policy.

3.21.21 Program properties table
The program properties table (PPT) is a list of programs that have special properties. IBM supplies a default PPT for system components and common subsystems. The default PPT may be added to or modified by the installation.

PARMLIB member
The SCHEDxx PARMLIB member may be used to specify programs and related properties for addition to the IBM-supplied default PPT. If the program already exists in the IBM-supplied default PPT, then it is replaced with the installation's specifications.

Disruptive modification
System IPL.

Nondisruptive modification
The SET SCHED=(xx,xx,..) operator command may be used to reference new SCHEDxx PARMLIB members to rebuild the program properties table. The current PPT is replaced with the IBM default PPT, and the modifications as specified by the SCHEDxx members are applied in order.

3.21.22 Run-time library services (RTLS)
The run-time library services (RTLS) configuration allows you to eliminate STEPLIBs from the JCL that runs your applications, as well as the system overhead involved in searching STEPLIB data sets when loading modules into storage. Instead of STEPLIBs, the CSVRTLS macro connects to and loads from a given RTLS logical library.

PARMLIB member
The CSVRTLxx PARMLIB member is used to specify the names of libraries to be managed, as well as storage limits for caching modules from the libraries.

Disruptive modification
System IPL.

Nondisruptive modification
The SET RTLS=(xx,xx,..) operator command may be used to reference new CSVRTLxx PARMLIB members, which contain the desired RTLS specifications.

3.21.23 SLIP
Serviceability level indication processing (SLIP) is used to set traps to catch problem data. SLIP can intercept program event recording (PER) or error events. When an event that matches a trap occurs, SLIP performs the problem determination action as specified:
• Requesting or suppressing a dump
• Writing a trace or a logrec data set record
• Giving control to a recovery routine
• Putting the system in a wait state

PARMLIB member
The IEASLPxx PARMLIB member is used to set SLIP traps through SLIP commands. The commands can span multiple lines, and the system processes the commands in order.

Disruptive modification
System IPL.

Nondisruptive modification
IBM provides two commands that may be used to set SLIP traps:
• The SET SLIP=xx operator command may be used to reference an IEASLPxx PARMLIB member that contains the required SLIP commands.
• The SLIP console command allows for addition, modification, or deletion of an individual SLIP trap (see the example that follows).
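As a hedged illustration (the job name is hypothetical), a trap that requests an SVC dump the next time job MYJOB takes an 0C4 abend could be set from the console with:

   SLIP SET,COMP=0C4,JOBNAME=MYJOB,ACTION=SVCD,END

The END keyword marks the end of the trap definition; the same command could equally be placed in an IEASLPxx member and activated with SET SLIP=xx.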

3.21.24 System Management Facilities (SMF)
System Management Facilities (SMF) collects and records system- and job-related information that your installation can use for billing, performance and tuning, auditing security, etc. SMF records are written to the SMF data sets for subsequent archival.

In addition, an installation can provide its own routines as part of SMF. These routines will receive control either at a particular point as a job moves through the system, or when a specific event occurs.

PARMLIB member
The SMFPRMxx PARMLIB member is used to control how SMF works at your installation.

Disruptive modification
System IPL.

Nondisruptive modification
IBM provides two commands that may be used to modify SMF options:
• The SET SMF=xx operator command may be used to reference an SMFPRMxx PARMLIB member that contains the required SMF parameters that SMF will use to replace the existing parameters.

• The SETSMF console command allows an installation to add a SUBPARM parameter, or replace any previously specified parameter in the active SMFPRMxx PARMLIB member, except the ACTIVE, PROMPT, SID, or EXITS parameters. The SETSMF operator command cannot add a parameter that was not previously specified, and cannot be used with an SMFPRMxx PARMLIB member that specified NOPROMPT.
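As a minimal sketch, the SMF global recording interval could be changed to 15 minutes without an IPL using:

   SETSMF INTVAL(15)

This assumes the INTVAL parameter was already specified in the active SMFPRMxx member, since SETSMF cannot introduce a parameter that was not previously set.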

3.21.25 Storage Management Subsystem (SMS)
System-Managed Storage is the IBM automated approach to managing storage resources. It uses software programs to manage data security, placement, migration, backup, recall, recovery, and deletion so that:
• Current data is available when needed.
• Space is made available for creating new data and for extending current data.
• Obsolete data is removed from storage.

PARMLIB member
The IGDSMSxx PARMLIB member contains the parameters that initialize SMS, including the names of the active control data set (ACDS) and the communications data set (COMMDS).

Disruptive modification
System IPL.

Nondisruptive modification
IBM provides two commands that may be used to set SMS options:
• The SET SMS=xx operator command may be used to reference an IGDSMSxx PARMLIB member that contains the required SMS parameters. Specifying SET SMS=xx also starts SMS if it was not started at IPL, or restarts SMS if it has stopped and cannot restart itself.
• The SETSMS console command may be used to change a subset of SMS parameters without changing the active IGDSMSxx PARMLIB member.

3.21.26 Subsystem Names (SSN)
The subsystem name table defines the primary subsystem and the various secondary subsystems.

PARMLIB member
The IEFSSNxx PARMLIB member contains the subsystem definitions for the subsystems to be initialized during IPL. These definitions also allow you to specify any initialization routine to be given control during master scheduler initialization, as well as parameters to be passed to the initialization routine.

The IEFSSNxx PARMLIB member supports two syntax formats; however, only one format can be used in a single IEFSSNxx member:
• Positional format, which consists of one line per entry
• Keyword format, which allows entries to span multiple lines

IBM recommends that you use the keyword format for defining subsystems, because subsystems defined in the positional format cannot be deactivated dynamically. A keyword-format entry is sketched below.
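As a hedged example (the subsystem name, initialization routine, and parameter are hypothetical), a keyword-format definition might read:

   SUBSYS SUBNAME(ABCD)
          INITRTN(ABCDINIT)
          INITPARM('WARM')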

Disruptive modification
System IPL.

Nondisruptive modification
The SETSSI operator command may be used to dynamically add, activate, or deactivate a subsystem.
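For instance (again with hypothetical names), the same subsystem could be added dynamically with:

   SETSSI ADD,SUBNAME=ABCD,INITRTN=ABCDINIT

and later deactivated with SETSSI DEACTIVATE,SUBNAME=ABCD, provided it was defined in keyword format.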

3.21.27 System Resources Manager (SRM)
Beginning with z/OS 1.3, WLM compatibility mode is no longer available; therefore, the following section is only relevant to back-level systems.

The System Resources Manager (SRM) is a part of the system control program. It determines which address spaces, of all active address spaces, should be given access to the system resources and the rate at which each address space is allowed to consume these resources.

PARMLIB members
The IEAICSxx PARMLIB member contains the installation control specification used to assign performance groups. This member is not used in WLM goal mode.

The IEAIPSxx PARMLIB member contains the installation performance specification that details performance parameters including service definition coefficients, general control keywords, domains, and performance groups. This member is not used in WLM goal mode.

The IEAOPTxx PARMLIB member contains several categories of information used by SRM. Most, but not all, of the parameters in this member are not valid for WLM goal mode.

Disruptive modification
System IPL.

Nondisruptive modification
The SET ICS=xx operator command may be used to reference an IEAICSxx PARMLIB member that contains the required installation control specification. This command is not valid when in WLM goal mode.

The SET IPS=xx operator command may be used to reference an IEAIPSxx PARMLIB member that contains the required installation performance specification. This command is not valid when in WLM goal mode.

The SET OPT=xx operator command may be used to reference an IEAOPTxx PARMLIB member that SRM is to use.

The SETDMN operator command may be used to change existing values of parameters in a single domain. This command is not valid when in WLM goal mode.

3.21.28 Time Sharing Option (TSO)
TSO/E allows users to interactively share computer time and resources.

PARMLIB member
The IKJTSOxx PARMLIB member identifies the following commands and programs that can be used in a TSO/E environment:
• Authorized commands and programs
• Commands that a user cannot issue in the background
• APF-authorized programs that users may call through the TSO/E service facility

Also, the IKJTSOxx PARMLIB member allows you to specify the defaults for the TSO/E ALLOCATE, SEND, RECEIVE, TRANSMIT, CONSOLE, and TEST commands.

Disruptive modification
TSO address space recycle.

Nondisruptive modification
The TSO/E PARMLIB UPDATE command may be used to reference an IKJTSOxx PARMLIB member that contains new specifications that TSO/E is to use.

The SET IKJTSO=xx operator command may be used to reference an IKJTSOxx PARMLIB member. This command performs processing similar to the TSO/E PARMLIB UPDATE command.

3.21.29 UNIX System Services (USS)
The UNIX System Services environment is part of the z/OS base control program.

PARMLIB member
The BPXPRMxx PARMLIB member contains parameters that control the z/OS UNIX System Services environment and the file systems. IBM recommends that you have two BPXPRMxx PARMLIB members, one defining the values to be used for system setup and the other defining the file systems. This makes it easier to migrate from one release to another.

Disruptive modification
System IPL.

Nondisruptive modification
The SET OMVS=(xx,xx,..) operator command may be used to reference BPXPRMxx PARMLIB members that UNIX System Services may use for changed specifications.

The SETOMVS operator command may be used to dynamically change the options that UNIX System Services is currently using.
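As an illustrative sketch (the member suffixes and the limit value are hypothetical), changed specifications could be activated from two members with:

   SET OMVS=(P1,P2)

while an individual limit, such as the maximum number of processes on the system, could be adjusted directly with:

   SETOMVS MAXPROCSYS=900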

3.21.30 XCF
The cross-system coupling facility (XCF) is used to perform intersystem communication within a sysplex environment.

PARMLIB member
The COUPLExx PARMLIB member specifies XCF-related configuration and performance parameters.

Disruptive modification
System IPL.

Nondisruptive modification
The SETXCF operator command may be used to dynamically modify the XCF configuration and performance parameters.
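Two hedged examples of such modifications (the interval value and structure name are illustrative): the XCF failure detection interval could be changed with

   SETXCF COUPLE,INTERVAL=85

and a new outbound signaling path through a CF structure could be started with

   SETXCF START,PATHOUT,STRNAME=IXC_DEFAULT_1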

3.21.31 Planned outages recommendations
Practice moving workloads from one MVS to another so that the operators know how to move applications away from an MVS image that is going to be shut down for maintenance.

3.22 Unplanned outages

It is important to configure your system such that all related information is captured if an error does occur.

When an error is reported to the IBM Support Center, you will often be asked for various kinds of supporting documentation to assist in the diagnosis process. The most useful items include dumps, system logs, and error logs; therefore it is critical that these are configured correctly, such that relevant information is captured at the time of error.

3.22.1 Dump options
z/OS provides the following types of dump that may be used to capture data during error conditions:
• SVC dump
• SYSABEND dump
• SYSUDUMP dump
• SYSMDUMP dump

z/OS initializes a dump mode and a dump options list for each of the four dump types each time you IPL the system.

Dump modes
The dump mode determines whether the system accepts a dump request, and how the options specified on the request are combined with the system dump options list. Valid dump modes are:
• ADD - The starting dump mode after IPL for each dump type is ADD. When a dump is requested for a dump type that is in ADD mode, the system merges the options specified on the dump request with the options specified in the dump options list for that dump type. Where merged options conflict, the dump options list takes precedence.
• OVER - When a dump is requested for a dump type that is in OVER (override) mode, the system ignores the options specified on the dump request and uses only the options specified in the system dump options list for that dump type to determine the data areas to dump.
• NODUMP - When a dump is requested for a dump type that is in NODUMP mode, the system ignores the request and does not take a dump.

Dump options
The dump options list determines the data areas to be dumped.

The starting dump options lists after IPL for the SYSABEND, SYSUDUMP, and SYSMDUMP types are specified in the IEAABD00, IEADMP00, and IEADMR00 PARMLIB members respectively. SVC dump starts with an empty dump options list because it has no corresponding PARMLIB member.

Two types of dump options are defined:
• SDATA=(options) for options related to system data areas
• PDATA=(options) for options related to problem program data areas, specifically for the problem program that issued the dump

The current dump mode and current dump options list for each dump type may be displayed using the D DUMP,OPTIONS command.

The CHNGDUMP command may be used to change the dump options for each dump type. Options can be individually added and removed, or totally reset to the starting dump options list as per the values set at IPL.

As SVC dumps do not have starting dump options, these need to be explicitly set.

IBM supplies a default IEACMD00 PARMLIB member that contains a CHNGDUMP operator command that adds the local system queue area (LSQA) and trace data (TRT) to every SVC dump requested by an SDUMP or SDUMPX macro or a DUMP operator command, but not to SVC dumps requested by the SLIP operator command.

Most installations usually add a number of other SVC dump options during the IPL process. Recommended values are listed below, followed by a command sketch:
• ALLPSA - Prefix storage areas for all processors
• CSA - Common storage area
• GRSQ - Global Resource Serialization (GRS) queues
• LPA - Link pack area
• LSQA - Local system queue area
• NUC - Non-page-protected areas of the DAT-on nucleus
• RGN - Entire private area
• SQA - System queue area
• SUM - Summary dump
• SWA - Scheduler work area
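A minimal sketch of setting these as the system defaults (the command could be placed in the IEACMD00 or a COMMNDxx member, or entered at the console):

   CHNGDUMP SET,SDUMP=(ALLPSA,CSA,GRSQ,LPA,LSQA,NUC,RGN,SQA,SUM,SWA)

The options then merge with, or override, those on individual dump requests according to the current dump mode.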

3.22.2 ABEND dumps
An ABEND dump shows the virtual storage for a program but does not include data in hiperspaces.

Typically an ABEND dump is requested when the program cannot continue processing and abnormally ends. An ABEND dump can also be requested when an address space is terminated via the CANCEL operator command.

ABEND dumps require DD statements in the JCL for the job step that ended. This DD is used to determine how to format the dump and the dump contents. If the DD is not allocated, then no dump is taken (see the JCL sketch after this list).
• SYSABEND - These are formatted dumps for diagnosis of complex errors in any program running on the system. z/OS sets the starting dump options list at IPL using the IEAABD00 PARMLIB member. SYSABEND dumps may be written to a SYSOUT data set for viewing or printing, to a tape or DASD data set of any valid name, or directly to a printer, although this is not recommended, as it dedicates the printer for the duration of the job step whether a dump is taken or not.
• SYSUDUMP - These are formatted dumps for diagnosis of problem programs needing simple problem data. z/OS sets the starting dump options list at IPL using the IEADMP00 PARMLIB member. SYSUDUMP dumps may be written to the same media as SYSABEND dumps.
• SYSMDUMP - A SYSMDUMP can be used for the diagnosis of system problems when the dump is requested in a program. These dumps are unformatted when written and require Interactive Problem Control System (IPCS) for formatting and analysis. SYSMDUMPs must be written to a tape or DASD data set. Any valid data set name may be used; however, SYSMDUMP treats the data set name SYS1.SYSMDPxx (where xx = 00 through FF) as special by allowing a dump to be written only if there is an end-of-file (EOF) mark as the first record. This prevents multiple dumps overwriting each other by retaining only the first dump written.
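A hedged JCL sketch of the two common cases (the data set name is illustrative):

   //SYSUDUMP DD SYSOUT=*
   //SYSMDUMP DD DSN=PROD.APPL1.SYSMDUMP,DISP=SHR

The first routes a formatted dump to SYSOUT; the second writes an unformatted dump to a preallocated DASD data set for later IPCS analysis.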

Dump modes and dump option lists for SYSUDUMP, SYSABEND, and SYSMDUMP may be modified or reset using the CHNGDUMP operator command.

3.22.3 SVC dumps
An SVC dump takes a copy of selected virtual storage and places it in an SVC dump data set for subsequent review and analysis using the Interactive Problem Control System (IPCS) service aid.

SVC dumps may be either requested by a system component or authorized program when an unexpected error occurs, or operator initiated via the DUMP or SLIP console commands.

SVC dumps requested from disabled, locked, or SRB-mode routines cannot be handled by SVC dump immediately, which allows subsequent system activity to overwrite possibly useful diagnostic data. To minimize this effect, users can request a summary dump, which is a copy of predefined data areas taken at the time of the request rather than when the dump is actually taken. IBM recommends a summary dump always be taken.

SVC dumps may be one of two types:
• Asynchronous (scheduled)
  a. The system issues an instruction, or the caller uses a combination of parameters on the SVC dump macro invocation. The system returns control to the requestor once the SVC dump has been scheduled.
  b. SVC dump captures all of the dump data into a set of data spaces, then writes the dump data from the data spaces into a dump data set.
  c. The system is available for another SVC dump upon completion of the capture phase of the dump.
  d. The summary dump is captured first and can be considered to be more useful for diagnosis.
• Synchronous
  a. The requester’s SVC dump macro invocation issues an instruction to obtain the dump under the current task.
  b. The system returns control to the requester once the dump data has been captured into a set of data spaces. SVC dump processing then writes the dump data from the data spaces into a dump data set.
  c. The system is available for another SVC dump upon completion of the capture phase of the dump.
  d. The summary dump is captured last.

The SVC dump contents may be customized by specifying the SDATA parameter on the SDUMP and SDUMPX macros, via the DUMP and SLIP operator commands, and through system defaults set via the CHNGDUMP operator command.

SVC dump data sets
When an SVC dump is taken, the output is written to an SVC dump data set. SVC dump supports two sorts of SVC dump data sets:
• Pre-allocated dump data sets - These have the format SYS1.DUMPnn (where nn is 00 through 99) and have to be allocated prior to being made available to SVC dump via the DUMPDS ADD command. Once a dump has been captured, a SYS1.DUMPnn data set is made unavailable for further use until the dump is cleared via a DUMPDS CLEAR command. As these must be preallocated before use, the sizing must cater for the largest dump expected, or else only a partial dump is written. Pre-allocated dump data sets support secondary extents, so you can size your data sets such that the primary extent caters for small dumps, and growth via secondary extents caters for the largest dump expected. The maximum sequential data set size is 65,536 tracks, which will hold about 3 gigabytes of data; this may not be sufficient, as regions can be up to 16 exabytes. Therefore IBM recommends using SMS-managed extended sequential data sets, as they have greater capacity (up to 128 gigabytes), support striping, and support compression. If all SYS1.DUMPnn data sets are in use, SVC dump issues message IEA793A and the dump remains in virtual storage until either a dump data set is cleared and the dump can be successfully written, or the dump is deleted, either by operator intervention or by the expiration of the interval set by the CHNGDUMP MSGTIME parameter.
• Automatically allocated dump data sets - Automatically allocated dumps may be named as per installation requirements via a name-pattern specified on the DUMPDS command. The dump data sets can be allocated as SMS-managed or non-SMS-managed, depending on the VOLSER or SMS classes defined on the DUMPDS ADD command. When a dump is written, SVC dump allocates a data set of the correct size and will use a system-determined blocksize to maximize DASD efficiency. As with the pre-allocated data sets, automatically allocated dump data sets may be extended sequential data sets if they are SMS-managed.

Using automatically allocated dump data sets removes the requirement to clear dump data sets once used, and also prevents partial dumps due to space errors relating to undersized data sets.

It also minimizes dump loss due to dump data sets not being available, which could be a critical factor in obtaining the necessary information for problem diagnosis.
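A non-authoritative sketch of enabling automatic allocation (the name pattern, its symbols, and the volume are illustrative and would need checking against your release):

   DUMPDS NAME=&SYSNAME..DUMP.D&DATE..T&TIME..S&SEQ.
   DUMPDS ADD,VOL=(DMP001)
   DUMPDS ALLOC=ACTIVE

The NAME pattern uses system symbols so each dump gets a unique, self-describing name; SMS classes may be specified on DUMPDS ADD instead of volumes; and ALLOC=ACTIVE turns automatic allocation on.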

3.22.4 Stand-alone dump
The stand-alone dump (SAD) program may be used in severe situations where a system has failed, or is in the process of failing and cannot be recovered. By IPLing the SAD program, you can produce a high-speed unformatted dump of central storage and parts of paged-out virtual storage that may be reviewed using Interactive Problem Control System (IPCS) or passed to IBM for further analysis.

The SAD program is created by coding the AMDSADMP macro in assembler format, which is then put through a generation process to produce the output program. It is important to make decisions at this stage to determine how SAD will operate at your installation:
• IPL SAD from DASD or tape media
  – For DASD IPL, the SAD program is written as IPLTXT on the volume specified.
  – For tape IPL, the SAD program is written to a NL tape.
• Output dump to DASD or tape media
  – For DASD output, a preallocated and formatted data set is required on the unit specified. This is done using the AMDSADD REXX utility, and the data set needs to be sufficiently large to contain all of central storage and the specified amounts of virtual storage for the active address spaces.
  – For tape output, specify a unit address where an output NL tape will be written.
• Console addresses used for operator communication. These allow for specification of between two and 21 console devices. If the system console is to be used, an entry of SYSC must be the first entry in the list.
• Dump options to choose what additional virtual storage is to be dumped. This option needs careful consideration, as there is a tradeoff between the amount of data captured versus the time taken to perform the dump. Also, large amounts of dump data may require multiple tape volumes or multiple output data sets, which may also slow the process due to the requirement for operator involvement.
• The level of operator involvement can be minimized through careful choice of options. A number of the parameters either solicit information from the operator, or allow operator override. While providing flexibility, these parameters can slow the dump process if the operator is unfamiliar with the valid responses, syntax, etc.

Stand-alone dump is usually used only in outage situations and often these are time constrained, especially in critical production environments. It is therefore very important that the SAD setup is well documented and rehearsed such that operations personnel are fully familiar with the process, the inputs, and the outputs.

3.22.5 Dump suppression
Recurrent problems or recursive abends may cause many dumps to be captured that are not required. If these dumps were allowed to be taken, they could waste system resources and exhaust available dump space, especially if pre-allocated dump data sets are being used.

Dump Analysis and Elimination (DAE)
DAE suppresses dumps that match a dump you already have. Each time DAE suppresses a duplicate dump, the system does not collect data for the duplicate, or write the duplicate to a dump data set. In this way, DAE can improve dump management by only dumping unique situations and by minimizing the number of dumps.

To perform dump suppression, DAE builds a symptom string, which can be used to determine if the dump is a duplicate. The symptom string consists of two required symptoms (load module and csect) and at least three of the optional symptoms before it is considered useful. When DAE is started, usually at IPL, parameters are read from the ADYSETxx PARMLIB member. This member references the DAE data set, which is read, and symptom strings that were active in the last 180 days are copied to virtual storage. Both the DAE data set and the virtual storage copies are updated as dumps are processed; new dumps require the addition of a new symptom string, and duplicate dumps require updating of last-occurrence information and the incidence count.

DAE may be set up with an individual data set for each system, or with a sysplex-wide shared data set. The latter is the IBM recommendation, as it allows for sysplex-wide duplicate dump suppression.

To prevent DAE from suppressing a dump in situations where a dump is required, you can:
• Use SLIP with an action of SVCD, TRDUMP, or NOSUP.
• Use the DUMP operator command.
• Use IPCS to tell DAE to take the next dump. This requires DAE to be stopped on all systems that are using the DAE data set while the IPCS DAE dialog panel is being used.
• Update the DAE data set using ISPF EDIT and the ADYUPDAT edit macro to update the relevant symptom string record. Again, this requires DAE to be stopped on all systems that are using the DAE data set for the duration of the ISPF EDIT session.
• Inactivate DAE.

DAE can suppress rapidly occurring dumps automatically, which may mask a significant problem. To be notified of such situations, a NOTIFY(number,minutes) parameter may be added to the SVCDUMP statement in the ADYSETxx PARMLIB member to establish a threshold for notification. The default threshold is three dumps requested in 30 minutes for the same symptom string; that is, NOTIFY(3,30). This is a sysplex-wide facility, and the system in the sysplex that crosses the notification threshold does the notify.
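A minimal ADYSETxx sketch (the record count and option mix are illustrative, not a recommendation):

   DAE=START,RECORDS(400),
       SVCDUMP(MATCH,UPDATES,SUPPRESS,NOTIFY(3,30)),
       SYSMDUMP(MATCH,SUPPRESS)

MATCH requests duplicate detection, UPDATES keeps the DAE data set current, and SUPPRESS enables the actual suppression of duplicate dumps.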

3.22.6 SLIP traps
SLIP traps may be used to suppress dumps for abend codes where no further research is required, as the problem is a known defect.

For these completion codes, add SLIP commands to the IEASLPxx PARMLIB member using one of the available dump suppression options (a sketch follows this list):
• ACTION=NODUMP - All dumps
• ACTION=NOSVCD - SVC dumps
• ACTION=NOSYSA - ABEND dumps (SYSABEND)
• ACTION=NOSYSM - ABEND dumps (SYSMDUMP)
• ACTION=NOSYSU - ABEND dumps (SYSUDUMP)
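For example, a hedged IEASLPxx entry (the completion code is illustrative) suppressing all dumps for a known E37 space abend might be:

   SLIP SET,C=E37,ACTION=NODUMP,END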

IBM supplies a default IEASLP00 member that contains SLIP commands to suppress ABEND dumps that are seldom needed.

3.22.7 System Hardcopy log
Hardcopy processing allows an installation to have a permanent record of system activity, and optionally an audit trail of commands entered and responses received.

It also provides a valuable tool that may assist in diagnosing the cause of an address space or system failure by recording error messages, diagnostic command responses, and abend information in the period leading up to the failure. Commands, command responses, and unsolicited messages that are recorded in the hardcopy medium are called the hardcopy message set and must have one or more of the following characteristics:
• Have the ‘hardcopy only’ message delivery attribute set
• Are WTOR messages
• Have descriptor codes of 1, 2, 3, 11, or 12
• Have no routing codes
• Have an installation-specified routing code
• Are command responses of the installation’s specified command level
• Have a message type specified

Messages for which ‘no hardcopy’ is requested are not included in the hardcopy message set regardless of their other characteristics.

During IPL, the system hardcopy medium is selected using the HARDCOPY statement in the CONSOLxx PARMLIB member. This can be either:
• An MCS printer
• The system log (SYSLOG)
• The operations log (OPERLOG)
• An MCS printer and the operations log (OPERLOG)
• The system log (SYSLOG) and the operations log (OPERLOG)

SYSLOG is the default hardcopy medium and is used where no HARDCOPY statement is provided, or where the HARDCOPY statement specifies an unusable medium.

Other parameters on the HARDCOPY statement may be used to set criteria for message inclusion in the hardcopy message set (see the sketch after this list):
• For commands and command responses, the CMDLEVEL parameter is used to control the types of commands included. The default is to include all operator and system commands and their responses, and static and dynamic status displays.
• For unsolicited system messages, the ROUTCODE parameter selects the messages to be included. The default is ALL route codes.
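As a hedged CONSOLxx sketch (the values shown correspond to the defaults described above, not a recommendation):

   HARDCOPY DEVNUM(SYSLOG)
            ROUTCODE(ALL)
            CMDLEVEL(CMDS)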

In a sysplex, the hardcopy medium has only system scope; therefore different systems can be using different media.

SYSLOG
The system log (SYSLOG) is a data set residing in the primary job entry subsystem’s spool space. It records system messages for a single system.

SYSLOG is queued for printing when the number of messages recorded reaches a threshold specified at IPL via the LOGLMT parameter in the IEASYSxx PARMLIB member. Alternatively, the WRITELOG operator command may be used at any time to force the system log to be queued for printing.

SYSLOG management is important, as it contains a historical record of system- and address space-generated records, which may be used to assist in problem diagnosis. In a sysplex environment, it is important to review the SYSLOGs from all connected systems when investigating an address space or system failure. As a sysplex requires all systems to be using the same timing source, SYSLOG records can be manually matched across systems using the message timestamp, so that an accurate order of events can be established.

If SYSLOG is being used as the hardcopy medium and is failing to work correctly, z/OS saves the messages in log buffers until the LOGLIM value is reached. At this point, messages remain on the WQE queue until the MLIM value is reached, at which point no new messages can be displayed and a re-IPL is required to reactivate the hardcopy medium.

In z/OS 1.2, a new parameter is available on the VARY command that allows hardcopy to be turned off:

   V SYSLOG,HARDCPY,OFF,UNCOND

When this has been entered, a WRITELOG CLOSE may be done and reactivation is allowed. Note that messages may be lost in this situation; however, the IPL is avoided.

OPERLOG
The operations log (OPERLOG) is a logstream that uses the System Logger to record and merge system messages from each connected system in a sysplex.

Generally, an OPERLOG implementation will involve a CF structure recording information for multiple systems. OPERLOG can be set up using the DASD-only logstream; however, the staging data sets and the log data sets only have system scope. Therefore this may only be used for a single-system implementation.

As per SYSLOG management, OPERLOG may assist in the diagnosis of address space or system failure. The advantage of OPERLOG relates to its format where it is already a merged copy of system activity from all connected systems in the sysplex. Therefore a sysplex-wide view of the events leading up to the failure is immediately available.

Also, OPERLOG is not dependent on the JES address space; therefore, conditions such as a full spool or a JES failure do not prevent the hardcopy medium from continuing to function. However, OPERLOG does require logstream management to prevent logstream-full conditions. The IEAMDBLG member in the SAMPLIB data set provides facilities for printing OPERLOG data, as well as marking old messages as expired so they may be removed.

3.22.8 Environmental Record Editing and Printing (EREP)
When an error occurs, the system records information about the error in the LOGREC data set or LOGREC logstream. This information provides you with a history of all hardware failures, selected software errors, and selected system conditions.

The EREP program is used to print reports using LOGREC data.

LOGREC data set
A LOGREC data set is associated with each individual system. It must be allocated, initialized, activated by IPL, then managed for daily offload, full conditions, archive processing, etc.

Also, when an error occurs in a sysplex environment, each of the individual LOGREC data sets (and maybe history archives) may need to be processed in order to obtain a sysplex-wide view of the problem.

The LOGREC data set should be marked as an unmovable data set when allocated, as changing its position after activation will cause system errors when writing records. Also, the data set specified by the LOGREC=dsn statement in the IEASYSxx PARMLIB member must exist, must be cataloged (or on the SYSRES), and must be initialized correctly; otherwise the system will fail to IPL.

LOGREC logstream
Using a CF LOGREC logstream overcomes the issues associated with LOGREC data set management, as a single logstream may be used by all systems in the sysplex. Also, issues associated with full conditions, daily offload, and archive management are eliminated.

The LOGREC logstream may be specified via the LOGREC=LOGSTREAM statement in the IEASYSxx PARMLIB member, which requires the System Logger to be available and a SYSPLEX.LOGREC.ALLRECS logstream to be defined.

IBM recommends that a CF logstream be used so that records from all sysplex systems are merged. It is possible to set up a DASD-only logstream; however, this is not recommended, as only one system will be able to access it. A definition sketch follows.
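A hedged sketch of defining the logstream with the IXCMIAPU administrative utility (the structure name, buffer sizes, and offload thresholds are illustrative, and the structure must also be defined in the CFRM policy):

   //DEFLOGR  EXEC PGM=IXCMIAPU
   //SYSPRINT DD SYSOUT=*
   //SYSIN    DD *
     DATA TYPE(LOGR)
     DEFINE STRUCTURE NAME(LOGREC) LOGSNUM(1)
            MAXBUFSIZE(4068) AVGBUFSIZE(4068)
     DEFINE LOGSTREAM NAME(SYSPLEX.LOGREC.ALLRECS)
            STRUCTNAME(LOGREC)
            HIGHOFFLOAD(80) LOWOFFLOAD(0)
   /*

With the logstream defined, each system can then specify LOGREC=LOGSTREAM in IEASYSxx.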

By using a logstream, LOGREC information may be left in the logstream rather than being off-loaded on a daily basis and placed into a history archive; although this can still be done if you wish to minimize the changes to the LOGREC management procedures at your installation. When the logstream is nearly full, data may be either deleted or off-loaded to history archives depending upon your requirements.

LOGREC recording control buffer
The LOGREC recording control buffer is an area in storage that serves as an interim storage location for the hardware and software error records that are queued to be written to the LOGREC data set.

This buffer is a wrap table, similar to the system trace table, with variable-size entries, and it is significant due to the history it contains. Often in a system failure situation, the latest entries are the most significant, and it is possible that these have not yet been written to the LOGREC data set.

When the system writes a dump, the dump includes the LOGREC recording control buffer. This buffer may be formatted using the IPCS subcommand VERBEXIT LOGDATA.

Using LOGREC data
LOGREC records are written for both hardware and software errors.

Log records include:
• Machine checks (hardware-detected hardware errors)
• Program checks (hardware-detected software errors)
• Restart errors (operator-detected errors)
• Lost record errors, which are a count of the records that did not fit in the buffer to be written to the LOGREC data set or logstream
• Software-detected errors such as abends, and other errors that did not result in an abend but for which a symptom record was written by the system component or application

Errors may be recoverable without installation involvement, or unrecoverable causing an address space failure or possibly system outage. Recoverable errors can easily go unnoticed, possibly for days or weeks. However, all recoverable errors consume system resources both in performing the recovery and in logging the error, and therefore should be minimized if possible.

It is recommended that EREP system summary reports and event history reports are run at regular intervals to review the error counts accordingly.

Large error counts for specific hardware devices should be reported to hardware support, as these can often highlight impending device hardware failure, or poor performance through excessive retry attempts.

Software errors may be reviewed using the symptom information, as detailed in the software records or symptom records relating to the error. These can be used as search arguments for IBM databases or reported to the IBM Support Center for further investigation.

When a major system or subsystem failure occurs, IBM may require EREP reports to be provided to assist in the diagnosis of the problem. It is therefore important that the EREP configuration at your site ensures that LOGREC data is being recorded and maintained so that it is easily retrieved when the need arises.

3.22.9 Unplanned outages recommendations
• Change the CHNGDUMP command supplied in the default IEACMD00 member of PARMLIB. Set the options as documented in “Dump options” on page 199.
• We recommend that you ensure that a SUMMARY dump is taken on all SVC dumps.
• Use automatically allocated SVC dump data sets. This removes the requirement to clear dump data sets once used, and also prevents partial dumps due to space errors relating to undersized data sets.
• Establish good stand-alone dump procedures, so that a dump is captured quickly the first time a problem occurs.



Chapter 4. Systems Management

Many studies have shown that a major percentage of all outages is caused directly by human error and is avoidable, as illustrated in Figure 4-1. Experience has shown that the impact of the remaining outages could have been significantly reduced had better procedures been used. These results are likely to be understating the impact of human/procedural problems, as customers do not always report problems that are self-inflicted. By taking the time to invest in developing and enforcing an Availability Management strategy, one has the potential to save an enormous amount of money.

Figure 4-1 Types of outages (Source: Gartner Group). The chart shows unscheduled outages by cause: operating systems 10%, hardware 10%, application 40%, and process 40%.

Availability typically is thought of as a technology reliability issue; in reality, application and process issues make up 80 percent of the total outages.

Continuous availability directly relates to bottom-line profit. Productivity losses are minor compared to the resulting revenue losses when systems are down. When the systems are down, it is as if the “doors are closed for business.” This is true even during what was once considered a “safe” time to stop applications. Customers can and will switch to the competitor if they cannot access the IT systems. Continuous availability is now the standard expectation for core IT services. People are always looking for new functions; they ask for better availability only in the aftermath of an outage. Better availability is obtained by planning for it before you need it.

The focus of this chapter is not on specific products and features, but rather on what can be done to manage complex systems around product problems and insulate the production systems from component failures. Although this chapter concentrates on the zSeries and z/OS environment, many of the concepts presented here are just as applicable in other environments. It includes the concepts of communication between organizations, minimizing the risk of change, managing problems when they do occur, and actions that need to be performed after service is restored.

4.1 Overall Availability Management processes

High levels of availability cannot be achieved without a continual focus. An IT installation might initiate an availability improvement project and get to high availability, but to sustain that level requires constant focus and attention. As soon as the availability effort is relaxed, the level of service will begin to deteriorate. To protect the service level achievements and the investment in availability improvements, it is essential to evolve from the availability project phase into an ongoing Availability Management process.

An Availability Management process provides an ongoing focus on continuously improving availability across systems, network, and applications, and ensures consistent reporting. The Availability Management process must move the organization from a reactive to a proactive environment and eliminate a fragmented approach to availability. The process ensures problem trends (in addition to problems) are being resolved and develops availability improvement action plans, which optimize resources across IT.

There are numerous activities that must be performed as part of an effective overall Availability Management process:
• Ensure IT vision includes availability.
• Create, maintain, and manage the availability strategy and plan.
• Establish target objectives.
• Monitor service level achievements.
• Monitor availability, identify availability issues, and follow up.
• Sensitize the organization to availability value and transition IT to a proactive environment.
• Assess current state and identify gaps.
• Monitor and guide related and dependent processes for availability considerations:
  – Problem Management
  – Change Management
  – Testing
  – Incident Management
  – Capacity Management
  – Configuration Management
  – Application design and development processes (phase reviews, availability guidelines)
• Ensure architecture and design includes availability.

For Availability Management to be successful, fundamental organizational issues are crucial:
• Executive support - There must be executive buy-in and support of the availability objectives, and the willingness to provide required resources.
• Management support - Line management in all departments must support and participate in all processes associated with Availability Management. It is vital to the success of Availability Management that management ensures that their staffs participate in all the processes.
• Cultural changes - Everyone associated with the process of providing the agreed-upon availability must understand the reasons for the required processes and actively participate in the execution of their roles and responsibilities.

Availability Management quickly pays for itself with interest and is cost effective, as illustrated in Figure 4-2 on page 212.

Figure 4-2 Availability cost value. The chart plots availability against cost, rising from current standard products, through the design of effective processes, to special high-availability solutions approaching 100% availability. Key points: designing effective processes is the most cost-effective way to improve availability; availability is more than just bringing service back after failures; it requires a culture change from everyone, plus executive and management support.

A successful Availability Management process can yield benefits of:
• Higher systems availability
• Improved systems management and controls
• Quality service delivery
• Improved customer relationships
• Cost-effective availability investment
• Availability tracking measurements
• Better understanding of availability exposures

This is done by defining and following set processes for the various aspects of Availability Management as described in this document. The key for all of these processes is the need to define the various plans, set an owner to the plans, verify that the plans are being followed, track and report on the performance of these plans through post-mortem and causal analysis, and through this analysis, update the processes to improve on them.

The availability plan focuses on ensuring availability objectives are consistent with all committed IT service agreements. The plan itself describes the actions or initiatives that must be taken in order to achieve the objectives. These actions include developing and communicating Availability Management practices and initiatives necessary to achieve the requisite levels of availability. The availability plan should include the following major categories:
• Develop availability practices and standards.
• Define Service Level Agreements.
• Track and report availability.
• Develop standards.
• Analyze outages.

4.1.1 Develop availability practices
The business objectives and management priorities need to be understood and accepted as the basis for determining user dependencies. These user dependencies should then be translated into requirements on IT implementation, processes, and organization. Activities related to this process can include:
• Create Service Level Agreements (see 4.2, “Service Level Management” on page 214) with key business customers and keep them current, expressed in customer business-oriented terms, and consistent with both the needs of the business customers and the capabilities of the IT service providers.
• Generate accurate service-level results (actual vs. committed), regularly reviewed with business customers, and consistent with customer perceptions.
• Monitor the level of customer satisfaction with service availability, with dissatisfaction issues identified and addressed.
• Identify the impact of each unplanned outage. The cost of unavailability of critical business applications needs to be known and considered in investment decisions.
• Verify that the organization's culture and priorities support process-oriented activities requiring support and cooperation among multiple organizational units. The organization's culture needs to support continuous learning and improvement. Mistakes are to be used as opportunities to improve, not as a basis for placing blame. Education classes are available to learn about new technologies and processes in the market.
• Verify processes with IBM’s High Availability Services.

There are three elements of an outage that impact availability as it is perceived by the users of IT services: frequency, duration, and scope of impact (that is, the number of users impacted). A large number of outages, outages of long duration, or outages impacting a large set of users can have a significant impact on the real availability achievements. It is important to understand the contributors to each of these elements as availability improvement actions are identified and implemented.

4.1.2 Develop standards
Create standards to help reduce system management complexity and improve problem determination time, which reduces the duration of unplanned outages. The standards should satisfy the broadest set of users in overall design and direction. Additionally, the set of standards needs to be flexible to handle new technologies and configurations. Standards should be considered and documented for all aspects of the data center operations, such as:
– IP addresses
– System/LPAR names
– Subsystem names
– Data sharing group names
– Transaction/job names
– Program names
– Data set names
– Volume names
– Security classifications

4.2 Service Level Management

Service Level Management is the process of defining, negotiating, monitoring, and reporting the service levels expected and achieved. The Service Level Management process ensures the understanding, consistency, and cost-effectiveness of quality service delivered by IT to meet the business needs.

Any business’s customers want 100 percent availability to 100 percent of the users 100 percent of the time. This continuous availability goal requires a significant investment by the business and in reality cannot be met.

Availability goals are based upon customer requirements within acceptable costs, as illustrated in Figure 4-3.

Figure 4-3 Availability strategy. The chart plots the cost of availability against the loss due to unavailability across common availability, high availability, and continuous availability, with the optimum availability investment at the point where the combined cost is lowest.

Set customer expectations of what goals can be met; at the same time, customers need to explicitly state what they need versus what they want. Plan for application, operating system, and hardware upgrades. One also needs to plan for the unexpected. As the availability requirements go up, so do the costs associated with providing the solution to meet the availability needs. The way to obtain customer satisfaction is to negotiate and set the expectations with customers, measure and report what was achieved, and update the agreements if needed, based upon changing technologies and business goals.

4.2.1 Business requirements
Not all businesses have the same requirements. A clothes retailer may be able to successfully function across an outage lasting a full day without significant impact to the business, while a bank may consider anything over 15 minutes as significant. Others may find long outages acceptable, as long as they are not too frequent. Determining the business requirements depends upon understanding the various business units and the impact of planned and unplanned down time for each unit.

4.2.2 Negotiating objectives
Negotiating objectives is a matter of balancing the cost of service disruption for each service provided against the cost of providing various levels of availability. The cost of a service disruption includes factors such as productivity losses, manufacturing delays, lost revenue, and corporate image. Determining the cost of supporting the desired service levels depends upon the hardware and software technologies used and the staffing requirements. With a fixed IT budget, it may be necessary to negotiate trade-offs. More investment in the production applications may mean less money available to invest in the development environment.

The objectives need to be specific to guide system support. Each objective must be measurable and will be reported upon. "If you can't measure it, you can't manage it."

Planned outages
Is it acceptable to have an outage every night after midnight to make system changes? Keep in mind that in the Internet world there is no such thing as a planned outage. If a user wants to access his account and the system is unavailable, he does not care if it was a planned or unplanned outage; the information is still unavailable and the user is impacted.

Unplanned outages
What is the mean time to recovery (MTR)? If it is known that any outage of more than one hour has an impact on the business, and that it takes 45 minutes to restart a system, then it follows that if an outage has lasted 15 minutes and no known fix is available, one must initiate a system restart.

4.2.3 Documenting agreements - Managing expectations
Information technology is an essential part of how business is conducted. With the current trend towards electronic commerce, an expectation of high availability and reliability is placed on the IT service. To document these expectations, a formal agreement between the service provider and the customer may be constituted. This is known as the Service Level Agreement (SLA).

The main benefit of the SLA is that it documents the customers' expectations of service quality, thus setting targets to be achieved by the service provider. By monitoring the service delivery, the service provider can measure how well they are doing in the eyes of the customer. Conversely, the customer can also measure if the service provider is delivering on their promise.

On the other hand, the SLA helps raise customer awareness of the cost of service and clearly defines the expected service quality. This has both short-term and long-term benefits. The service provider can validate customer complaints regarding poor service quality against the SLA, and new customer requirements can be incorporated and priced. It is important that the SLA agreements are periodically updated to reflect changes to the business environment over time. Define and document SLAs; they formally describe the objectives that have been agreed upon by the user and IT and assist with managing expectations; they should describe how achievements will be measured and reported.

4.2.4 Building infrastructure - Technical and support
Once the objectives are decided upon, the infrastructure can be set up to support the agreed-upon availability and disaster recovery objectives. This includes setting up the configuration, such as a Parallel Sysplex with redundancy, or training the support staff.

4.2.5 Measuring availability
Achievements must be constantly monitored, and immediate actions taken to address any issues, to meet the committed objectives. Measuring the amount of time a specific application or computer system is available can be a very complex task. Applications usually have many hardware and software components for processing, such as networks and data storage. In some cases a single component failure may cause the application to run in a degraded mode or become totally unavailable.

Develop an application view to achieve a true end-to-end availability measurement. Most measurements cover the host application, system, or network availability separately, and do not achieve a true end-to-end availability measurement. An application view for measuring availability focuses on an entire application and the components that comprise it. All of the key components must be identified and documented. This results in a Component Failure Impact Analysis (CFIA)-like view of a subset of the total configuration, centered around each key application, based on the user/application/data relationships.

List the major components for the application in question to ensure the entire application is covered. This can include components such as:
– Users/user groups using this application
– Workstations
– LAN/WAN components
– Network routers
– Communication paths connecting to the application platform
– Subsystems and platforms the application requires
– Web server
– Host system
– Database management system
– Data used by the application
– DASD control units
– DASD volumes

For each of these components, consider:
– What if a single component fails but an alternate end-to-end path is still available?
– Do you track “degraded” or “slowdown” conditions?
– Are there any relationships between multiple component outages when one component fails?
– How much time is needed to recover from each component outage?

To accurately track availability out to the network, active monitoring techniques need to be employed. Many of the components in the application view do not fall under the normal S/390 or zSeries architecture scope of control. This can be done with one of the following techniques:
– Automation-driven query commands
– Heartbeat detection
– PINGs (SNMP, APPN, NETBIOS, etc.)
– Remote commands
– User simulation

As well as just monitoring availability, if automation tools track problems, they can detect and take recovery actions quickly to reduce the duration of the outage.

4.2.6 Track and report availability
The hardest part of reporting (un)availability is getting an agreement on what classifies as an outage. For example, what if one user was affected? How about one department? One branch? What if one transaction failed? One CICS region, if there was a backup region available? What if there was no backup, but all other regions stayed up? How about poor performance? If so, then how degraded does it need to be? Are these definitions affected by the time and day in the week? How about by which LPAR and/or application is affected?

One way of tying all these together is to calculate the user-hours affected for the production systems. Knowing how many users are typically on each application at different points in the week, one can have a weighted indicator of unavailability. Using this methodology, a little-used application down for a long duration has the same impact as a heavily used application down for a short duration.

Track and report the specific availability parameters; these would include the parameters needed to establish whether both service level targets and any additional internal availability objectives have been met. Activities involved with tracking and reporting include:
• Generate availability measurements.
• Calculate actual availability levels.
• Identify exceptions from established targets.
• Gather availability data.
• Consolidate availability data from various sources, including Help Desk, operations log, SYSLOG, various monitoring tools, etc.
• Report and distribute availability measurements. This reports the actual level of availability delivered vs. objectives.
  – Produce reports.
  – Distribute reports.
  – Maintain distribution lists.

4.2.7 Customer satisfaction
Many times one may have measured availability results that meet objectives, but either the wrong thing is being measured, the requirements have changed, or the perception does not match reality. For example, if a development system is down for just one second in the middle of first shift, it will have a significant impact, as users need to log back onto the system, reorganize their thoughts, and get back to where they were, possibly resetting their test cases. The impact of this is significantly more than one second, and the users will remember it.


Customer satisfaction must be monitored to ensure that the business requirements are adequately understood. Hold regularly scheduled user group meetings to understand customer perception and easily obtain new requirements. Action plans should be developed to address any satisfaction issues with the service being delivered, with results communicated back to the users. Other actions that can be taken are simply to set customer expectations. If a planned outage is required but at an unexpected time, communicate this to the user community before taking the outage.

4.3 Change Management

The mission of the Change Management process is to facilitate the timely introduction of change whenever possible while protecting the stability of the production systems. A good Change Management process can improve programmer productivity as well as system availability.

The Change Management group is not an enforcement agency, although it is often viewed as a barrier rather than a facilitator of quality change. The Change Management group needs to assume accountability for the quality of changes on behalf of the whole IT organization. It does this by assessing the quality of the change itself, not by making sure all fields in the change records are filled in. This requires technical skills.

Change Management facilitates change and protects the stability of production systems. Figure 4-4 describes the objectives of the Change Management process.

Sample Objectives

The objectives of the Change Management process are to facilitate the timely introduction of change:
 Improve the capability of the IT organization to implement an increasing volume of changes.
 Reduce implementation delays caused by the change process.
 Minimize the number of changes that are not installed or are delayed due to implementation failures.
 Reduce the cost of change implementation.

To protect the stability of the production systems, all changes must be implemented successfully and with high quality:
 Reduce the number of unscheduled service impacts caused by changes.
 Reduce the number and duration of planned service windows.
 Reduce or eliminate disruptions of service due to change.
 Log all changes to support fast problem determination when problems are caused by changes.

Figure 4-4 Change Management objectives

A successful change is a change that becomes operational on the first installation attempt, does not cause unplanned impact to the users, and does not need to be backed out, bypassed, or modified. The Change Management process coordinates the introduction of change into the system. Its goals are to minimize the impact of the change; reduce the skill level needed to manage the change; and reduce the process to a series of small, repeatable steps that can be automated.

All changes to the production IT environment are managed by the Change Management process and are documented in a common, centralized repository.

Tasks that are categorized under the Change Management process include the following:
 Develop and prepare change.
 Assess and minimize risk.
 Build a test plan.
 Plan back-out.
 Verify change readiness.
 Schedule change.
 Communicate change.
 Document change.
 Review quality.

4.3.1 Develop and prepare change

Accept change requests from authorized change initiators. Store the requests for tracking purposes in a database.

4.3.2 Assess and minimize risk

Every change has an associated risk. The person requesting the change should assess the risk level of the change. The risk assessment methodology must be adequately defined and address system-level, business, and technical risk assessments. As an example, the following risk categories can be assigned to each change request:

High risk      The change has the highest impact on users, and may affect an entire site. Backing out of the change is time consuming or difficult.
Moderate risk  The change can significantly impact users, but backing out of the change is not an elongated process.
Low risk       The change has a minor impact on the user, and backing out of the change is easy.

Evaluate and approve the change request. This includes documenting the risk assessment. Test to verify the expected operation of the new or altered components. Some level of testing should be done on all hardware, software, or configuration changes. Validating security, automation, and system management aspects of the new solution should be part of this.

Identify the risk level. This helps ensure that the correct level of testing is undertaken prior to the change. For example:

Risk assessment   Testing recommended
High risk         Requires validation of the new solution, including testing new function, regression testing, assessing impact to the existing infrastructure, stress testing using a terminal simulator, performance impact testing, back-out tests, and completion of operations support documents. A design review prior to testing is recommended. Changes are communicated to the user community.
Moderate risk     Requires testing and review of all new functions, some regression testing to determine the impact to the existing environment, and a tested back-out plan.

Low risk          Requires testing and review of all new functions.
No risk           No testing required.

A common mistake is to have little difference in the process steps for low-risk and high-risk changes, resulting in an inefficient use of resources on low-risk changes and insufficient time allotted to review high-risk changes. Change Management should not unnecessarily add to the cycle time of changes. Trivial changes can be implemented quickly and with little delay imposed by the change process.
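As a minimal sketch of how such a policy could be made checkable (the risk levels and test names simply encode the recommendations above; nothing here comes from a product), a change could be verified against the tests its risk level requires:

   # Hypothetical mapping of risk level to required test activities.
   REQUIRED_TESTS = {
       "high": {"function", "regression", "stress", "performance", "backout"},
       "moderate": {"function", "regression", "backout"},
       "low": {"function"},
       "none": set(),
   }

   def missing_tests(risk_level, completed_tests):
       """Return the required tests that have not yet been completed."""
       return REQUIRED_TESTS[risk_level] - set(completed_tests)

   # A high-risk change with only function and regression testing done:
   print(missing_tests("high", ["function", "regression"]))
   # -> {'stress', 'performance', 'backout'} (set ordering may vary)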

4.3.3 Testing

Introducing change into a system decreases its stability, but if an organization is to survive, it must introduce changes to meet current and future requirements. One way to minimize the risk is to have a good testing process.

There are a number of conditions where testing adds value before introduction into a production environment:
 Upgrades to new releases of systems and applications code
 Introduction of functional enhancements to existing systems and applications code
 Introduction of maintenance and other changes
 Installation of new tools
 Verification of new or updated operational and recovery procedures
 Verification of new or updated automation routines

Many IT installations have a defined roll-out and migration process for new systems and applications, and depend on that as their primary systems testing process:
1. System build
– Used to build new system platforms
– Normally not fully integrated with applications
– Ensures the system can be initialized and start work
2. QA test
– Provides functional testing
3. Development (often erroneously called test)
– Production systems supporting the application development groups
4. Non-critical production
5. Critical production

The criteria for rolling from one system to the next vary by installation, but are frequently based on time rather than results. For example, a new release may run for one week on each system before rolling to the next.

Many times the test process falls short of effectively identifying problems before the systems roll into production. Typically, this is due to the following:
 The system programmers normally have their own systems, which are used primarily as a system build vehicle and seldom provide testing beyond initialization.

 Many quality assurance systems do not provide full volume and stress testing. Environments to support full systems and volume performance tests are expensive and difficult to justify.
 All too often, the application development systems become the primary test vehicle. Development systems, however, are truly production, since the application developers are dependent on these systems to do their job. As such, this prevents recovery testing.

 The transactions executed on a development machine are very different from those on the critical production systems supporting the business units, and do not test the same functions. The developers are running TSO transactions, editors, and compilers. The end users in the business units are running application transactions based on IMS, CICS, or DB2. Though application testers may also run application transactions, they run at different volume levels using smaller databases.
 Development systems may have a different system and database configuration, and run different utilities and different batch jobs.

Build a test plan

Testing needs to have a strategy and a process to expedite trivial changes and to ensure that non-trivial changes do not impact system stability. The following example shows the process for a major application upgrade. For trivial changes, most of these steps can be skipped, according to predefined agreements.

The foundation for any test begins with a solid test plan, where attacks targeted against new features of the operating system are devised and documented. The plan is typically a formal document. Input comes from several sources, including design documents, "postmortems" from prior tests of the same or similar components (where testers reflected upon and documented the strengths and weaknesses of their approach), and formal meetings between the test and development personnel responsible for each new feature to be tested. Planned test scenarios are reviewed, and gaps or overlaps are identified and removed.

Once the test plan has been approved and appropriate entry criteria met, testing commences.

The system test phase should include regression and migration testing, basic load and stress testing, new function (including scenario-driven) testing, and finally recovery testing.

Within this plan, the documented tasks can include some of the following:
 Environment setup. Verify that the test environment matches the production environment.
 Test requirements. Define what new function needs to be tested.
 Test variation development/inspection. Specifically target new/updated functions. As with application code, there need to be design/code reviews of the test cases. In some cases, the test organization may end up writing more lines of code than the application change.
 Test plan development/inspection. The test plan is reviewed with the various organizations. This may include application design/development, system support, operations, IT architects, etc.
 Function test. Application coders perform their own testing first. Once it passes the predefined exit criteria, the code is passed on to the test organization.
 System test. The independent test group takes over now.
 Run old test cases. Regression testing always needs to be done to verify there is no interaction with code that was not changed.
 Run new test cases. Any usability issues should also be considered test errors. Any changes to automation should also be tested.
 User acceptance test.
 Postmortem report. Summarize the results of the testing, lessons learned, etc.
 Archive test cases. All new test cases coded are prepared for future regression testing. A rich, varied, high-stress workload can flush out timing and serialization problems that are difficult to find in any other way. "Good test cases never die; they just increase your stress."


 Promote to development system.
 Quality assurance. Perform a release readiness review with Change Management before migrating to production systems. Any new or changed externals need to be well documented.
 Promote to production systems in order of system criticality.
 Track testing failures after the change is promoted. Production problems should initiate changes to the future test methodology.
 Keep old test cases for future regression testing. Usability problems are test case failures.

Test exit criteria

Additional considerations should include the test exit criteria. These can include:
 All user procedures and automated functions perform to specifications.
 The system meets I/S standards.
 Documentation is accurate and approved.
 All errors are resolved or have an action plan.
 No severity 1 or 2 problems.
 Planned resolutions exist for all severity 3 and 4 problems.
 The system is stable.
 All interested parties have given their approval for formal installation.

Assess the risk with management. The test process needs to be flexible. Many times the business dictates that changes must be in production before a specific date. If the documented exit criteria are not met by the scheduled end date, then a management review with a risk assessment is required before the change can be migrated to production.

Use a defect model. Experience has shown that one can anticipate how many defects will be uncovered in an application based upon the lines of code (LOC) being changed and the application complexity. Armed with this knowledge, a defect model can be used to track anticipated and actual problems found for each application. If this model is kept current, it can be used to help manage the testing process by:
– Tracking defects/LOC for each application
– Identifying how effective the current testing process is
– Identifying if insufficient testing has been performed on the change
– Identifying when sufficient testing has been performed and the change is ready to be migrated to production
– Identifying when design changes are required to reduce application problems
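A minimal sketch of such a defect model follows (illustrative Python only; the linear defects-per-KLOC model, the complexity factor, and the 90% readiness threshold are assumptions for the example, not a published model):

   # Hypothetical defect model: anticipated defects are estimated from
   # changed lines of code (KLOC) and a complexity factor, then compared
   # with the defects actually found so far in testing.

   def anticipated_defects(changed_kloc, defects_per_kloc, complexity=1.0):
       """Estimate total defects for a change (assumed linear model)."""
       return changed_kloc * defects_per_kloc * complexity

   def testing_status(found, anticipated, threshold=0.9):
       """Assume testing is sufficient once roughly 90% of anticipated
       defects have been found and fixed."""
       ratio = found / anticipated
       if ratio < threshold:
           return f"insufficient testing ({ratio:.0%} of anticipated defects found)"
       return f"ready to migrate ({ratio:.0%} of anticipated defects found)"

   # 12 KLOC changed; history suggests 2 defects/KLOC; complex application.
   est = anticipated_defects(12, 2, complexity=1.5)   # 36 anticipated
   print(testing_status(found=20, anticipated=est))   # insufficient testing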

4.3.4 Back-out planning

Provide back-out procedures for reversing an unsuccessful change. Depending upon the risk and complexity of the change, these procedures should also be tested before the change is scheduled for promotion to production. Remember to test both implementing and backing out changes.

4.3.5 Verify change readiness

For major changes, perform a Change Readiness Review with skilled personnel. This should address testing results, back-out testing, support skills, and external dependencies. Do new application changes support high availability (data sharing and dynamic workload balancing)? Often, management dictates the schedule for when changes are to take place without consideration of the current quality of the new application. The change review board must have the authorization to delay a change to protect the end users.

A Readiness Checklist expedites this process. A sample checklist:
 A change request record has been created and contains all required information.
 The change is technically complete and is ready for implementation.
 All associated documentation is complete.
 A performance and capacity review has been completed (if required).
 The effect on the disaster recovery plan has been reviewed.
 Verification procedures are in place and understood.
 Back-out plans are documented and understood.
 All required system and resource services (network, printers, etc.) have been identified and scheduled.
 The technical assessment is complete and the probability of success is optimized.
 The change and all associated procedures have been tested.
 The business assessment is complete and the scope and impact of a failure is minimized.
 Potential conflicts with business events are avoided.
 Other changes have been reviewed to optimize the scheduling and implementation of this change.
 The education plan is complete.
 The change has been communicated to the affected end users.
 All management approvals have been received.

4.3.6 Schedule change

Schedule the change to minimize impact and conflicts with other changes. Synchronize the order and timing of the change installation, along with recovery actions if the installation is not successful, with other changes scheduled for that time. Schedule the change, within the limits of the change plan, after accounting for the availability of resources, altered job schedules, and other operational considerations.

Often, the lead times are excessive, adding unnecessary time to introduce minor changes. The Change Management process should not be adding lead times of 7–10 days for a minor 2–4 hour change. The time imposed by the process must be measured and managed to a minimal level.

Emergency change

Sometimes emergency changes are needed to get fixes into the system as soon as possible to fix a problem. For these situations, there should be an emergency change procedure. The definition of an emergency change is frequently based on lead time rather than business need, and therefore the number of emergency changes is excessive. This results in too many changes that have bypassed necessary steps in the process. Very often, higher levels of approval are required for emergency changes. In a true emergency, when the systems are down, the staff should be empowered to apply the change to fix a high-severity problem, rather than losing time waiting for approvals.

4.3.7 Communicate change

Communicate the change plan to all affected groups. Communicate details of the change by setting expectations, aligning support resources, communicating operational requirements, and informing users. The risk level and potential impact to affected groups, as well as scheduled downtime as a result of the change, should dictate the communication requirements. Although it is not always possible to notify all users of all changes, end users should be notified of any changes that affect them directly. Define who will be affected by a change and what the potential downtime may be for each application, user group, or server. Keep in mind that different groups may require varying levels of detail about the change. For instance, support groups might receive communication with more detailed aspects of the change, new support requirements, and individual contacts, while user groups may simply receive a notice of the potential downtime and a short message describing the business benefit.

4.3.8 Implement and document

The steps needed to implement the change, the back-out plan, and who to call if there are any problems installing the change must all be documented in the Change Management procedure. Do not forget to provide additional documentation when the environment changes.

Sometimes putting in a change does not go smoothly, and it may take longer than originally estimated. There comes a point when a decision must be made: Back out the change and try again during the next scheduled change window, or continue trying to install the change, risking an unplanned outage as the change runs past its window. A predefined policy needs to be put in place to handle this situation. The issues involve the rate of progress being made, the cost to the business of not getting the change in on time, and when the next change window is available.

4.3.9 Change record content

The readiness checklist can be incorporated into a change record, flags for the results of the change can be added, and completion codes can be defined to track the causes of change failures.

A change record can include elements as described in Figure 4-5 on page 225.


Example Contents of Change Record
 Change Record Number
 Associated Problem Record Number
 Requester name, phone, and department
 Assignee name, phone, and department
 Technical contact
 Planned implementation date
 Estimated duration of implementation
 Estimated duration of planned system outage
 Change Class (hardware, software, application, etc.)
 Change Type (normal, expedited, or emergency)
 Risk Category
 Change phase (test, plan, etc.)
 Application or component affected
 System affected (actual system ID)
 Change description (short abstract and full narrative)
 Prerequisites or co-requisites (other related changes)
 Test plan (brief overview in freeform text)
 Overall installation plan
 Dependent operations activity
 Required documentation updates (available prior to installation)
 Operational procedures
 Recovery procedures
 CFIA
 Training plan (schedule if required)
 Verification procedure (steps to verify successful installation)
 Back-out and recovery plan (brief description)
 Criteria for executing back-out
 Actual back-out procedures
 Completion code (successful, failed, cancelled, backed-out)
 Indicators to describe success or failure: installed or not installed, unexpected problems during install, problems impacting production following install

Figure 4-5 A change record example
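As an illustration of how such a record might be represented in a tracking tool, the following sketch models a subset of the elements in Figure 4-5 (illustrative Python; the field names and sample values are hypothetical, and a real repository would carry all of the elements shown in the figure):

   from dataclasses import dataclass, field

   @dataclass
   class ChangeRecord:
       number: str
       requester: str
       description: str
       change_class: str          # hardware, software, application, ...
       change_type: str           # normal, expedited, or emergency
       risk_category: str         # high, moderate, or low
       planned_date: str
       test_plan: str = ""
       backout_plan: str = ""
       completion_code: str = ""  # successful, failed, cancelled, backed-out
       problems: list = field(default_factory=list)

   rec = ChangeRecord(
       number="CHG-0001", requester="J. Smith",
       description="Apply CF service level upgrade",
       change_class="hardware", change_type="normal",
       risk_category="moderate", planned_date="2004-12-05",
   )
   rec.completion_code = "successful"
   print(rec.number, rec.completion_code)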

The change record will be used to track the progress of the change through to successful implementation.
 Open a change request in the change database as early in the planning cycle as possible.
– Describe the purpose and procedures for the change.
– Records are typically opened by the person implementing the change, but can be opened by any requester.
– Document or point to plans for testing, training, installing, and backing out the change.
– Assess risk based upon the complexity and impact of the change.
– Assign a risk factor to be used to determine the level of scrutiny and approval for each change.
 Install and verify the change and update the change request record with the results.
– Clearly document whether the change is installed or backed out.
– Document any problems during or following the install.
– A review should be conducted for any change that resulted in impact to the user.
 Close the change request record.
– Notify the users.
– Ensure the record is properly updated to support later trend reports and root cause analysis.
 Develop reports, aligned with the Change Management process objectives, to track the volume and quality of changes made.
 Develop a user's guide to describe the Change Management process. The guide should be usable by a new employee to understand how to get changes into the production systems. It includes how to correctly assess risk, and how to document, update, and close a record. Include some information about what should be in a change request form: the minimum that you need to make it usable. If you ask for too much, people will not fill it out accurately.

4.3.10 Review quality

Good judgement comes from bad experiences. Bad experiences come from bad judgement.

Post-installation analysis reviews need to be completed for major changes to verify that they met the enterprise's objectives and to learn from experience, both good and bad. There should be no time limit on when a problem can be tied back to a change. Change quality is measured over time and actions identified where shortcomings dictate. Root cause of change failures should be used to provide feedback to help identify modifications needed to improve the change process and to meet objectives.

Change Management is a tool for system programmers and developers. It is not an obstacle to be circumvented.

4.4 Organization

The responsibility for availability should not fall solely into the hands of change managers or system programmers. Application developers, operations, system programmers, and management all need to be responsible and accountable for availability.

Does the overall Availability Manager have the authority to reject applications that do not support high availability (data sharing and dynamic workload balancing)? Is the change process adhered to, or does it yield to schedule pressures to get the application online above all other concerns? If not, it is like me having a sports car and someone else getting all my speeding tickets: Where is my disincentive to tear down the highway at 100 miles/hr (160 km/hr) every day?

A good recommendation is to tie pay, bonuses, and ratings to availability, so that everyone who can affect availability has a stake in the game.

Before a new application comes online, there are checklist items to follow outside the basic functional testing of the code. This is the organizational readiness:
 The physical infrastructure is in place to support the product.
 The required monitoring and health check tools have been installed and tested.
 All required problem determination tools have been installed and tested.
 Documentation library, war room, cell phones, home terminals, and access authority requirements have been identified and made available to the key support personnel.
 The organizational infrastructure (personnel) has been trained to support the product:
– Operations
– Technical
– Field services
– Help desk
 Support contracts with vendors have been negotiated and are in place.
 The support contracts with vendors meet response time objectives to achieve recovery within the stipulated requirement.

4.4.1 Skills

Provide a skills inventory for both internal staff and external contractors, and ensure that skills are of the level required and available. Application of this process determines the gap between the IT skills that are needed and those that are available, and manages the skills inventory to meet the required portfolio. Training and/or hiring requirements are defined, and training is conducted as appropriate.

4.4.2 Help desk activities

Ensure that the Help Desk is ready to take calls. The primary purpose of the help desk is to provide a main point of contact for the users of the services. Whenever the users experience problems, have questions, or need information regarding the use of services, they should contact the help desk, not the operators.

Inform the users. The help desk is also responsible for notifying the users of disruptions in service, planned outages, and the availability of new functions. The help desk serves as a two-way conveyor of information between the service users and the staff supporting the service.

Track the problem. It is the responsibility of the help desk to keep track of the incident to ensure that problems are solved within the time agreed in the SLA. If the help desk cannot identify a solution to the problem on their own, the incident is escalated to a problem, prioritized, and stored in the problem tracking database.

4.4.3 Operations

On a busy highway with the cars following closely behind each other, all going at 65 mph (100 km/hr), if one car slows down for a few seconds the car behind needs to brake, causing the car behind that to brake, and soon a traffic jam results, with all cars in all lanes having to stop. Similarly, in a busy computer system, if one component stops functioning (hangs) or even just runs slow, time is of the utmost importance to identify and fix unexpected events before they become a problem.

This requires automation tools with hooks into system monitors, and well-trained operators able to access the recovery procedures. This is lacking in all too many sites. If there is any small incident, the operators are trained to pick up the phone and call or page system support. This can easily add 15 minutes to any outage in its most critical time. IBM studies of multi-system failures have shown that many large problems can be totally avoided if the operators and/or automation take quick action.

Trained operators should be the first line of defense against unavailability. It takes time to enable this to happen. It requires education, empowerment, accountability, and compensation. Operators need education on the basics of z/OS, CICS, and DB2 commands and internals. They have to be authorized to enter various Display and Modify commands as needed. They need to be evaluated on the actions that they took to fix problems in a timely and efficient manner, and at the same time know their limits so they know when to call out for help on more complex problems.

Once trained, it is imperative that the operators stay within the company. This can be done by several possible means:
– Money. The most obvious.
– Life satisfaction. Experience has shown that people like working three 12-hour shifts per week without changing time of day. This allows long-term personal life planning, including getting a second job if desired.

– Career path. The system support organization can “hire” the best of the operators if a job opens. As well as being good for the operators who want this move, it is also good for the support staff, since the “new hire” is already familiar with the environment.

Many compare computer operators with commercial airline pilots. During a normal flight, the airplane flies itself. Pilots are there just for exception situations. Similarly, during normal business days, computers also “fly” themselves, with operators there for exception situations. While a plane is a lot more expensive than a computer, the impact of a prolonged service outage could add up to just as many dollars as a plane crash (although with less loss of life). Operators should be highly trained, just like pilots, to handle exception situations. Perversely, pilots are the highest-paid members of the flight crew, while operators are the lowest-paid members of the IT crew.

Update an Operating Procedure Guide. To help with knowing which commands to enter under which circumstances, many sites today have an online Standard Operating Procedure guide. This guide should be constantly updated in what it covers. This can be done by simple actions:
 Give operators update authority to the guide.
 Update the guide every time a problem occurs and is fixed. Entries should include symptoms of the problem, display commands entered to identify the problem, and actions taken to fix the problem.
 Update the guide every time change requests are made for planned changes. If the change management process requires detailed instructions on how to make changes (CF maintenance, for example), take these detailed instructions and put them into the operator's guide.
 Every time new applications or subsystem functions go through the test process, relevant entries can be added.
 Exchange information with other companies about entries in this guide (though not necessarily the detailed data).

Additional benefits would be seen by having operators practice these procedures during change windows, possibly freeing up personnel from the change management team at that time. In addition, change requests become easier to write: The detailed actions only need to be listed once, and can then be carried out by operators using their guide.

To reduce single points of failure, the Standard Operating Procedure guide needs to be taken off the host system and moved to a LAN-based system, with local copies downloaded to workstations.

One cannot expect to document every possible failure, but one can anticipate various failures that could happen.

4.4.4 Automation

The need to simplify operations increases as you add hardware and software products to your data center, data centers to your network, and personnel to your data-processing staff. Automation simplifies operations; provides a single point of control for systems management functions; and monitors, controls, and automates hardware and software resources. Bottom line: It improves availability.

Automation can improve the availability of your system and network. Automated operations can quickly and accurately respond to unexpected events. When outages do occur, whether planned or unplanned, automation can reduce your recovery time by decreasing the chances for operator errors. Some operator errors can cause failures and lengthen recovery times. For example, an operator might fail to see a message or might type a command incorrectly. Also, an operator might have to type long sequences of commands, remembering the command syntax of several programs or components (or take the time to look them up). The opportunities for operator error are many. Substituting automatic responses for operator-typed commands reduces the opportunities for error. When operator intervention is required, automation procedures can simplify the tasks, reduce the chances of mistakes, and ensure similar responses to similar events. Automation also expedites shutdown, initialization, and recovery procedures, reducing planned downtime.

 Automate recovery procedures. If an application experiences an error, attempt to recover from the situation as your policy defines. Recovery can include issuing commands or replies to a message, and restarting the application if it has abended. You can also specify selective conditions and thresholds under which the automation does not attempt to recover an abended application. Also, consider conditions that need to be satisfied before starting or stopping an application; for example, a certain application may need to be backed up before it can be started.
 Suppress messages. The first job of an automation project is to manage the messages that are directed to Operations. Log analysis programs can be used to analyze the logs and identify frequently issued messages to target for suppression or automation (a simple frequency analysis is sketched following this list).
 Monitor key batch job start and end times, network activity, and I/O errors to critical data sets.
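As a sketch of the log-analysis step (illustrative Python; the file name is hypothetical, and the message ID is assumed to be the first token of each SYSLOG line, which a real parser would need to refine):

   from collections import Counter

   def suppression_candidates(syslog_path, top_n=10):
       """Count message IDs and return the most frequent ones as
       candidates for suppression or automation."""
       counts = Counter()
       with open(syslog_path) as log:
           for line in log:
               tokens = line.split()
               if tokens:
                   counts[tokens[0]] += 1   # assumed: message ID is first token
       return counts.most_common(top_n)

   for msg_id, count in suppression_candidates("syslog.txt"):
       print(f"{msg_id:<10} {count:>8} occurrences")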

4.4.5 Application testing

New versions of z/OS and its subsystems typically have new and changed messages. As the new software versions are being tested, automation changes to handle the changed messages should be tested at the same time. Typically, automation changes consist of just suppressing the new informational messages. Decisions are needed on how to handle the error and action messages; having automation handle the action messages is best. Try to display messages only when operator action is required to protect availability. New versions of applications also have new and changed messages. Make sure automation is there for message suppression and control for these as well. This requires that all application messages have a message ID. As with all changes, test automation changes on test systems before promoting them to production.

4.4.6 Passive and active monitoring

Passive monitoring is the only type of monitoring found in many locations. It consists of simply receiving information from system messages, alerts, and notifications. These can cause updates to resource status displays, which drive operator action after an event has already started. It would be better if automation could notify operators of a potential problem before it starts. This is done by active monitoring.

Active monitoring tools can proactively identify and respond to system events that typically precede larger system problems, thereby preventing many types of common system failures, ranging from performance degradation to brief downtime to catastrophic failure. To maximize availability, tools can continually monitor system components, such as CPU, I/O rates, transaction rates, network connectivity, and their interactions, identifying events that are likely to occur and taking appropriate automatic actions to prevent a failure. Many problems can be averted by applying technical and business expertise and user-defined priorities, and by anticipating certain conditions that can affect the system. Tools that monitor resources in a complex environment need to use the system's capabilities to capture the vast quantities of available information. This information must then be presented to the operations staff in a simple, easy-to-use format that allows for quick and efficient notification and access to appropriate diagnostic tools.

Active monitoring also checks and compares, at regular user-defined intervals, the resource status that it receives from systems and resources within the enterprise with the desired resource status specified in your automation policy. This helps spot potential problems before they occur, which improves system and resource availability. The data received from resource monitoring can be used to trigger event-based automation. Individual monitoring granularity allows critical resources to be monitored more often than others.

Examples of proactive monitoring of critical functions can include (a threshold-checking sketch follows the list):
F IRLM,STATUS,ALLD    Health of IRLM and retained locks information. Compare to thresholds based on known normal values.
F IRLM,STATUS         Locks held and waiters.
D ETR                 Status of ETR ports.
D GRS,C               Enqueue contention.
D XCF,POL,TYPE=SFM    SFM status.
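The following sketch shows the pattern of issuing a display command and comparing the reply to a known-normal threshold (illustrative Python; issue_command() is a placeholder for a site-specific console or automation-product interface, and the reply text and threshold are assumptions):

   import re

   RETAINED_LOCK_THRESHOLD = 0   # assumed: any retained lock is abnormal

   def issue_command(cmd):
       """Placeholder for a site-specific console interface."""
       return "IRLM STATUS: RETAINED LOCKS = 0"   # canned reply for the sketch

   def check_irlm():
       reply = issue_command("F IRLM,STATUS,ALLD")
       match = re.search(r"RETAINED LOCKS = (\d+)", reply)
       if match and int(match.group(1)) > RETAINED_LOCK_THRESHOLD:
           print("ALERT: retained locks present; investigate IRLM")
       else:
           print("IRLM healthy")

   check_irlm()   # a real monitor would repeat this on a timed interval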

For example, in order to ensure that the applications running under the transaction managers (such as CICS) are operational, it is not enough to know the status of CICS or the condition of the links. Other resources must be available for the required programs to run. Automation can initiate a health check program by sending a request to the target CICS. CICS receives the request, invokes the program, and sends the results back to automation. The following are examples of what can be returned:
 An acknowledgment (ACK) stating that the transaction completed successfully.
 A negative acknowledgment (NACK) stating that the transaction did not complete successfully. If a NACK response is given, data can be passed back to automation describing the error condition.
 Automation can also monitor the transaction response time, from when the transaction started until it received an acknowledgment.

This type of monitoring could detect outages when connections between regions were not reestablished after subsystems were started out of order. It could also help detect a loss of connectivity between subsystems.
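A sketch of this health-check pattern follows (illustrative Python; send_transaction() is a placeholder for whatever interface drives the target CICS region, and the region name, transaction ID, and response-time threshold are hypothetical):

   import time

   def send_transaction(region, tranid):
       """Placeholder: invoke the health-check transaction and return its reply."""
       return ("ACK", None)   # or ("NACK", "DB2 connection unavailable")

   def health_check(region, tranid="HLTH", max_seconds=2.0):
       start = time.monotonic()
       status, detail = send_transaction(region, tranid)
       elapsed = time.monotonic() - start
       if status == "NACK":
           print(f"{region}: health check failed: {detail}")
       elif elapsed > max_seconds:
           print(f"{region}: responded, but took {elapsed:.1f}s (threshold {max_seconds}s)")
       else:
           print(f"{region}: healthy ({elapsed:.3f}s)")

   health_check("CICSPRD1")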

Other active monitoring can include:
 "PINGing" key TCP/IP addresses to verify the health of the network.
 Monitoring delays of system address spaces, started tasks, and batch jobs for exceptions.
 Monitoring DASD volume information for exceptions, which can help identify performance problems before they become critical.

Trend reports across multiple days, weeks, and months should be produced on a regular basis so that it is known when proactive action should be taken.

Automation planning

Other automation tasks can be put into place to support other system management functions.

For example:
 Generate problem records for each incident automatically. This should be done for unexpected batch job return codes, for planned outages, etc., as well as for major system incidents.
 Generate problem record reports automatically using data from the problem records. To enable trend analysis, the data should include:
– Total outages sorted by cause of incident (ABEND code, return code, etc.)
– Total outages sorted by application
– Details of the job names that caused the outages
– Trends of outages across multiple months
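A sketch of such trend reporting follows (illustrative Python; the record fields and sample data are hypothetical):

   from collections import Counter

   problem_records = [
       {"month": "2004-09", "application": "payroll", "cause": "ABEND S0C4", "job": "PAY101"},
       {"month": "2004-09", "application": "billing", "cause": "RC=12", "job": "BIL220"},
       {"month": "2004-10", "application": "payroll", "cause": "ABEND S0C4", "job": "PAY101"},
   ]

   def report(records, key):
       """Total outages sorted by the given field (cause, application, month)."""
       for value, n in Counter(r[key] for r in records).most_common():
           print(f"{value:<12} {n}")

   print("By cause:")
   report(problem_records, "cause")
   print("By application:")
   report(problem_records, "application")
   print("By month:")
   report(problem_records, "month")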

In general, automation reacts to messages. To ensure that all critical functions are automated, a considerable amount of research and implementation is required. Here are some guidelines to help in developing a comprehensive automation strategy:

 Obtain automation requirements from various sources:
– Analysis of outages
– Operations
– Product and component message manuals
– Test scenarios
– New product installations
– New function/feature implementations

 Analyze applications:
– Gather relevant application information.
– Should it be automated?
– How is it started?
– How does it indicate readiness for work?
– How is it stopped normally?
– What if it does not work?
– When should it run?
– Are there any dependencies?
– What messages does the application produce? Do they have a unique message ID?
– What WTORs does it issue?
– Should this application be grouped with others?
– How can this application be monitored?
– Can it exploit system symbolics?

 An analysis of critical functions (see 4.10.1, “Component Failure Impact Analysis” on page 250) should be performed and automation put in place to handle failure scenarios for those functions. An effective way to accomplish this is to examine the messages associated with each component/product and select candidates for automation. Work with the product specialist to validate candidate messages/conditions and the automation logic used to handle them.

 Ensure that your automation process provides the following functions:
– Automatic restart after (most) failures.
– Reestablishing connectivity to all other subsystems after recovery.
– Startup and shutdown procedures and triggers.
– Cold/warm/hot startup.
– Quiesce/immediate/force shutdown.
– Aborting a shutdown in progress.
– Handling errors during startup, or startup taking too long.
– Monitoring how long shutdown takes.
– Common commands/routines.

– Dump procedures.
– Reestablishing connectivity if any subsystem communication is broken.

Anything that is done more than once per month is a candidate for automation.

4.5 Recovery Management

The most valuable technical support person is not the traditional “hero,” but rather the system programmer whose objective is to not get called in the middle of the night. He equips the operations staff with the information and procedures to make them self-sufficient.

Too many times there is a problem (for example, a subsystem is down) and system support spends two hours trying to perform problem determination when, if the system had simply been re-IPLed, it could have been up and running again within one hour. Recovery Management deals with the formal processes that minimize the impact and scope of problems when they do occur. This is what most people think of when they think about Availability Management.

4.5.1 Terminology

There are many different terms used by different people. For this discussion, we use the following terminology:

Event      An abnormal occurrence in the IT environment that is resolved quickly, before it can impact end users and become an incident. Actions are performed by automation and/or operators. Example: A recursive error occurs, causing message buffer use in a LAN server to grow.

Incident   An event or series of events that disrupts (or has the potential to disrupt) IT production services to the user. Appropriate recognition and response is given to restore service in a timely manner and minimize the impact of service disruptions as seen by users. Actions are performed by operators or on-call system programmers. Example: All message buffers are used, causing the LAN server to stop functioning.

Crisis     An elongated outage that cannot be handled by level 1 operations and requires coordination of level 2 and level 3 support resources. Restoring service may require coordination among multiple organizations. Example: Operators are not able to determine what failed, due to a lack of documented procedures.

Outage     Any interruption of service that prevents the user from using the IT systems to conduct their business. It can affect one user or many. It can affect one component or the entire system.

Recovery   The act of restoring service to the IT production environment after an incident. Example: The LAN server is rebooted.

Problem      A fault or defect that requires further analysis after recovery, to determine and eliminate the cause of the incident. Example: A defect in the LAN server operating system caused the server to be rebooted.

Root cause   The underlying reason why a defect exists. Example: A defect in system software was not fixed due to poor maintenance practices.

Recovery Management   Proactive planning and preparation activity prior to an outage.

Problem Management    Reactive activity, following up after outage situations.

4.5.2 Recovery potential initiatives

The variety of incident impacts requires a variety of possible recovery actions, as described in Figure 4-6.

The failure types below are ordered from minor impact to severe impact, as in the original figure.

Failure type              Impact                            Possible actions
System Message            No user impact                    No recovery required
Single User               One user impacted                 1) Cancel user and sign on again, or 2) re-boot desktop
Server Outage             Some users impacted               1) Re-boot server, or 2) fail over to back-up server
Application Outage        One group of users impacted       Restart in place
Mainframe System Outage   Many users impacted               1) Re-IPL, or 2) move to back-up system
Network Outage            All users impacted                1) Repair and reinitialize, or 2) switch users to back-up site
Total Site Failure        All users impacted indefinitely   Execute disaster plan

Figure 4-6 Actions versus failure type

An event that is not handled expeditiously can result in impacting incidents, which, if not resolved immediately, may become a crisis. Frequently there are built-in delays that contribute to the duration of an incident:
 Waiting for support to respond.
 Waiting for management decisions or interpreting management policies.

Ensuring that the operations and support staff are trained and equipped to handle exception conditions, and developing and documenting problem determination and recovery procedures in preparation for an event that may be caused by a problem, can minimize these delays. Poor planning and training can cause events to become an incident or a crisis, as depicted in Figure 4-7 on page 234.


Figure 4-7 shows the progression from event to incident to crisis, and the capabilities needed at each stage: for events, monitoring tools, detection, actions to prevent impact, and automation; for incidents, documentation, training, problem determination checklists, and structured recovery; for a crisis, training and experience, communication, coordination of resources, and escalation criteria.

Figure 4-7 When an event becomes a crisis

The focus of recovery management should be on avoiding critical situations and crises, but it is still necessary to be prepared for any possible situation. In a resilient infrastructure, potential failures are anticipated and appropriate recovery actions are developed in advance of any failure. All staff must be prepared and aware of what is expected of them, educated on crisis management and crisis plans. Problem determination and recovery procedures are tested to verify effectiveness and efficiency. Recovery procedures and management decision policies are documented, maintained, and mapped to business objectives prior to production implementation.

4.5.3 Recovery Management activities

 Establish target objectives for restart of the systems and subsystems.
 Decide management policies in advance.
 Identify failure scenarios, failure impact, and recovery alternatives using a structured analysis of the configuration components.
 Develop a systematic approach to problem determination procedures based on symptoms.
 Define documentation requirements and an online repository tool for recovery plans.
 Write and test the identified recovery procedures.
 Define operator training requirements.
 Ensure problem management enables identification of problems with recovery procedures.
 Ensure change management provides for the identification of any associated change to recovery procedures.
 Enhance the efficiency of existing procedures.
 Tune the restart and recovery procedures to meet the service level objectives.
 Establish periodic reviews to continuously improve the recovery procedures and ensure their currency.


Monitor progress against the objectives and refine as needed. The process should also include reviews to continuously improve the recovery procedures and ensure their currency.

1. The first step in the recovery process is to recognize that a failure has occurred and that action is required. Sometimes problems are first identified by users calling the Help Desk. This requires the Help Desk personnel to be able to quickly open a problem record and notify system Operations. Other times, problems are first identified by system messages. Automation should be suppressing unwanted messages, but highlighting messages that indicate problems.

2. The second step is identification of failing component symptoms (initial problem determination). Operations and system support need the means to quickly identify the problem. This includes the use of monitors attuned to the system; the monitor thresholds must be set properly. The operational and support staff need to be prepared with the skills for handling exception conditions. They need to be able to enter and understand various display commands. This requires on-the-job training, cross training, and fire drills on test systems. Problem determination checklists are developed and used.

3. The third step is identification of actions to be taken to restore service. The operational staff should be familiar with the online repository tool for recovery plans so that they can quickly and easily identify the appropriate recovery procedure. Recovery procedures should be well documented and easily retrievable. Prior to implementation of any recovery procedure or associated change, the condition should be simulated to familiarize the operator with the procedure as well as the conditions for when to use it. Operations should sign off on any changes to the recovery procedures and documentation. Though every possible failure situation cannot be predicted, a comprehensive predefined recovery plan can mitigate the impact of a critical situation. The recovery procedures should be general enough to encompass a class of errors (re-IML a control unit, restart DB2, reset the LAN). Any time new recovery steps are taken for a problem, the documentation should be updated to include them. Include console responses if you may later choose to automate the procedure. Operations should have update capability for this guide. Wherever possible, the recovery procedures should be automated to reduce the possibility of human error as well as to reduce the duration of the incident: the total time starting from the first impact to a user and ending when all users are notified that service has been restored. If situations warrant it, there needs to be a well-defined escalation procedure. The crisis manager is the single key focal point.

4. The fourth step is to notify users that the system is back up. If the users are not aware that the system is available, they may think it is down when it is not. This impacts the perceived outage time and user satisfaction. Even before service is recovered, the users (via the Help Desk or an intranet site) will expect to be notified of the failure and the anticipated recovery time.

5. Finally, reports should be generated on the results and effectiveness of recovery procedures. These include:
– Mean time to recovery trends
– Missing recovery procedures
– Failing recovery procedures by cause

Post-incident analysis must ensure that recovery procedures are tuned and optimized for the fastest, most dependable recovery.

Periodic fire drills are held so that operations can practice problem determination and recovery procedures. The production IT operations staff will then be prepared to handle component failures and other exception conditions. Initial problem determination must be done to identify the failing component and associated symptoms.
– Once the component is identified, the actions to be taken must be determined and specific recovery procedures selected and executed to restore service.
– In the case of a broken hardware device that is also a single point of failure, the only choice may be to wait for repair.
– For non-critical devices, the device may be removed from the configuration and bypassed.
– For components or data with a back-up, it may be necessary to switch to the back-up.
– Many software components may simply require re-initialization.
– The recovery execution will need to be monitored and service restoration verified.
– The users will need to be notified that the system is back online.
– The end users may then need to log back onto the system and recover any work in progress.

4.5.4 Event Management

Event Management is the process of monitoring for abnormal IT conditions and taking action before they result in impact to the users. It supports quick identification of and response to internal IT situations. This requires careful end-to-end monitoring of the system to quickly detect abnormal conditions and raise alerts. These monitoring tools are integrated into the system and drive automatic or automated actions whenever possible.

As a prerequisite of event monitoring, the normal operating ranges for all components must be known. As applications and configurations change, the thresholds must be reevaluated and possibly reestablished. As with many activities, there needs to be a process to kick this off.

For procedures that cannot be automated, operators need to perform problem determination quickly and accurately. This requires skill in using the real-time tools and the software (z/OS, DB2, WebSphere, etc.) display and modify commands. Note that this activity needs to be done by operators. Waiting for system support to be called while the event goes unchecked will result in the event escalating to an incident. Operational skills must be constantly upgraded and practiced.

Events leading to impact are quickly recognized and provided to Incident Management.

4.5.5 Incident Management

A critical characteristic of a resilient Incident Management process is the ability to recognize and respond to incidents quickly and restore service in a structured, timely manner without the need to invoke support from outside organizations. Service failures are recognized quickly, and actions to protect service are begun without delay. Problem determination data is immediately available, the failing component is identified quickly and correctly on the first attempt, recovery actions are identified, and service is restored.

Effective Incident Management requires well-defined and well-rehearsed procedures for identifying what has failed, determining how to recover, and flawlessly executing successful recovery. It must be assumed that during a critical incident there is no time to wait for support organizations to respond. Operator and system support training and experience are a requirement for effective and efficient Incident Management.

Effective Incident Management has dependencies on other critical processes. Coordination with the Help Desk can alert Operations to a potential problem before it spreads. Configuration Management ensures all components and potential failure scenarios are known. Create well-defined problem determination and recovery procedures. Effective Change Management ensures currency of these recovery procedures. If the problem cannot be recovered quickly, management policies and business considerations must be known with pre-defined escalation procedures.

4.5.6 Crisis Management

An event that is not handled expeditiously can result in impacting incidents, which, if not resolved immediately, may become a crisis, progressively increasing the impact and the recovery time. A crisis is any failure or abnormal condition that results in impact to the users and cannot be handled in a routine, structured manner. It typically requires multiple levels of support, with multiple organizations working together to resolve it. This can include technical support, application support, and/or management coordination.

Entering “crisis mode” should be viewed as a failure of the Incident Management process: Problem determination procedures have failed to identify the correct recovery action or procedure; a predefined recovery procedure does not exist; the recovery action has failed due to an incorrect recovery procedure or failure of the subsystem to restart; or service has not been restored within a defined period, and restart objectives have been exceeded or are exposed.

A good Crisis Management process can keep communication channels open and reduce the overall duration of unanticipated outages.

During a crisis, the Help Desk is a critical communication link to the customer. It is used to keep the users informed of the incident and the estimated recovery time so they may plan accordingly. Users can plan their own back-up and recovery activities, and the business units can redeploy their staff for better utilization. The Help Desk also isolates Operations so they spend their time helping to recover the system, not answering the same questions over and over again (“Hello, is the system down? What happened? When can you bring it back up again?”).

A crisis plan and associated guidelines should be developed and documented to provide guidance to the crisis managers and all personnel involved in support of a crisis. The plan should describe what a crisis is and the criteria for declaring one. Management policies and guidelines should be clearly defined and documented. These guidelines need to include when to escalate problems and who needs to be involved (and who should not be involved). A contact or on-call list, including phone numbers and back-ups, should be documented.

Prevent uncontrolled problem determination with a clear recovery decision process. Problem determination is one of the most significant and unpredictable contributors to outage duration. Turning the system over to technical support without established time limits will elongate an outage. Strict controls on when to take a dump and recycle the subsystem (or LPAR) need to be followed. All too often one hears “Give me another minute and I'll have it,” but often this is not the case and 30 minutes later the system is recycled anyway.

Controversy Alert: One important issue that needs to be considered first is what happens if there is a system problem on one member of the Parallel Sysplex.


One option to consider is that if the problem cannot be fixed in five minutes (for example), kill the system (following the standard procedures for removing a member from the sysplex). This minimizes the chances of “sympathy sickness” affecting other members. Since there is data sharing, this should not be considered an outage at all. In addition, make sure Operations is aware of this policy so it can be acted upon without fear of losing their job. A prerequisite for this policy is to verify that all critical applications do in fact have data sharing with dynamic workload routing capabilities. Many times, that is not the case.

Figure 4-8 shows a Sample Incident Severity and Escalation Levels table.

Sample Incident Severity and Escalation Levels

Severity 1 - A critical IT service is unavailable or severely degraded
  Impact: Critical
  Recovery criteria: Procedures unusable or no alternate available
  Escalation: 5 min - Tech Support Mgr; 15 min - Data Center Mgr; 30 min - IT Executive

Severity 2 - A critical IT service is partially degraded or at risk
  Impact: Severe
  Recovery criteria: Alternate available, fail-over required
  Escalation: 15 min - Tech Support Mgr; 30 min - Data Center Mgr; 1 hr - IT Executive

Severity 3 - A non-critical IT service is unavailable or degraded
  Impact: Minimal
  Recovery criteria: Bypass available
  Escalation: 1 hr - Tech Support Mgr; 2 hr - Data Center Mgr

Severity 4 - A component is unavailable, but there is no impact to IT services
  Impact: None
  Recovery criteria: None; deferred maintenance acceptable
  Escalation: 2 hr - Tech Support Mgr; 4 hr - Data Center Mgr

Figure 4-8 Sample incident severity and escalation levels
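The escalation rules in a table like this are mechanical enough to automate. The following Python sketch encodes severity levels and thresholds modeled on Figure 4-8; the rule table, contact titles, and thresholds are illustrative assumptions, not part of any IBM product.

# Minimal escalation sketch based on the severity and escalation table above.
# Severity levels, thresholds (in minutes), and contacts are illustrative only.

ESCALATION_RULES = {
    1: [(5, "Tech Support Mgr"), (15, "Data Center Mgr"), (30, "IT Executive")],
    2: [(15, "Tech Support Mgr"), (30, "Data Center Mgr"), (60, "IT Executive")],
    3: [(60, "Tech Support Mgr"), (120, "Data Center Mgr")],
    4: [(120, "Tech Support Mgr"), (240, "Data Center Mgr")],
}

def escalation_contacts(severity: int, minutes_elapsed: int) -> list[str]:
    """Return everyone who should have been engaged by now for this incident."""
    return [contact for threshold, contact in ESCALATION_RULES[severity]
            if minutes_elapsed >= threshold]

# A severity-1 incident that has been open for 20 minutes has already reached
# two escalation levels:
print(escalation_contacts(1, 20))   # ['Tech Support Mgr', 'Data Center Mgr']

Automating this removes the judgment call of when to escalate, which is exactly the kind of decision that tends to slip during a crisis.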

Data capture

Data to be captured, event logging, and documentation requirements are predefined. This allows rapid first-data capture for later debugging. Note that IBM sample operator DUMP command parmlib members are distributed in SYS1.SAMPLIB. Data to be captured in a situation or crisis should include all of the following:
– Event chronology.
– Key decisions.
– Who was called and when.
– Alternatives considered.
– Actions taken, both successful and unsuccessful.
– Technical data for subsequent problem determination, problem source identification, and problem resolution.
– Accurate and timely records of outages.
– Problem determination time.
– Recovery time.

– Existence and effectiveness of recovery procedures.
– Records of the key sequence of events and subsequent results in the outage record, with reference to the detailed Crisis Log.
– For each secondary or related problem recorded in the crisis log, open a separate problem in the problem management database, so that these secondary problems can be resolved later.


The crisis manager should ensure that all activities relevant to the crisis are logged for a later post-incident review. Sources of this information include the operator’s and situation manager’s logs, the system logs (SYSLOG), and hardware (EREP) data. These should be copied to separate files to avoid loss through expiration of retention periods.

Actions to reduce impact

Actions that can reduce the frequency of unplanned outages include:
 Isolate components to avoid failures of critical services for reasons related to the less-critical services.
 Isolate the testing environment from the production environment to avoid failures related to nonproduction-level data and programs.
 Isolate user functions from database updates (that is, queue updates), so that user functions can continue even if the database is unavailable.
 Eliminate components that add a single point of failure but do not significantly contribute to providing the critical services.
 Add redundancy, such as alternate processing platforms, redundant (mirrored) data, and alternate paths for data, to allow critical functions to proceed during outages of critical components.
 Implement reduced-function alternatives (such as a read-only subset of transactions, queuing of updates to be applied later, use of back-level data, or use of partial data) when complete application function cannot be provided.
 Perform postmortem analyses of all unplanned outages and take appropriate preventive action to avoid similar problems in the future.
 Perform periodic analyses to identify outage trends and take appropriate action.

Actions that can reduce the duration of unplanned outages include:
 Develop an automated process for monitoring and detecting outages of critical system and application components.
 Ensure that affected recovery procedures (hardware, subsystem, application, data, connectivity, and environmentals) are updated, documented, and retested whenever a change is implemented.
 Automate decision-making and implementation procedures associated with recovery.
 Develop a clear fallback plan for each change.
 Switch to pre-initialized alternate components when this can be done faster than recovering on the original component.

The crisis manager

The crisis manager is a single focal point responsible for all activities associated with the situation. The role can be assumed by any experienced manager who is immediately available when the crisis is recognized, is trained and qualified in crisis management procedures, understands business priorities and the service level objectives and commitments, and brings business considerations and judgment to the decision-making process.

Depending upon the scope and magnitude of the situation, the crisis manager can be any one of a number of people, including Operations shift supervisor, technical support manager, network operations supervisor, or data center manager.


The person to be designated as the crisis manager for any given situation should be clearly understood. Many times when there is a problem, everyone wants to get involved, even if they are not needed. This only elongates the outage, adds chaos to the operational support room, and wastes time as events are repeated to personnel who should not even be involved. The crisis manager needs to be able to ensure that only the support staff needed are involved in the crisis.

The crisis manager should be the single key focal point responsible for all activities associated with the crisis. He is responsible for verifying that the correct people are involved (including vendors), coordinating the recovery efforts among the various organizations, communicating the situation to management, and authorizing any recovery actions that need to be executed. The crisis manager should also ensure that all pertinent data and logs are documented, and take part in all follow-up activities as appropriate.

Disaster recovery

In a disaster, Crisis Management policies, planning, and preparation are essential to reduce recovery delays and avoid incorrect recovery actions. The criteria for declaring a disaster and invoking recovery must first be clearly predefined to ensure complete control over switching to the recovery site. Consider more than an immediate catastrophic event, such as the possibility of a gradually deteriorating situation. Avoid letting unfolding events determine default actions, which can lead to inconsistent data and uncertainty about data integrity.

Confidence in the readiness of the recovery site can affect the decision process. If the back-up system is several hours from being ready to assume the workload, the decision to pull the plug on a running system is difficult. It is easier to make a decision to terminate a functioning system when there is a functioning back-up system.

The infrastructure must support IT during any crisis and is essential during recovery from a major disaster:
 Sufficient telephone capacity to support voice communication is needed.
 Cell phone service must reach into all parts of the recovery site buildings. Plans must consider the possibility that pagers, telephones, and e-mail may also be impacted by the disaster and may not be available for some period following the disaster.
 Workstations must be available to all essential personnel.
 There must be enough space, desks, phones, and workstations to support IT and business unit staff who may be sharing the recovery site.
 There must be enough ports to allow people to work from home.
 The internal e-mail system is essential to provide critical communication and should be considered a priority application.
 A configuration database describing the configuration prior to the disaster enhances the recovery process.
 Application development and testing systems may be necessary to support program changes and facilitate recovery.
 Recovery procedures must be developed, tested, and practiced prior to the disaster.

Hold a surprise test

Just as school teachers sometimes give surprise tests to verify that the students really know the material, the same thing can be set up for Operations and System Support on test systems.

Because this requires the attention of multiple support personnel, it should be done during a low-impact time. The following should be looked at:
 Responsiveness and accuracy of support staff.
 Use and accuracy of the problem determination checklist.
 Validation of the escalation procedures.
 Verification of access to needed system resources and tools.
 Verification of logon passwords.
 Use and accuracy of monitors.

4.6 Problem Management

Problem Management should be seen as a tool, not administrative red tape. For example, if there is a problem that was fixed by a colleague two weeks before and this person is now on vacation, you can get the fix right out of the Problem Management database. You will look like a hero rather than a fool.

Frequently the Problem Management group is focused on managing incidents rather than resolving and preventing problems. The problem managers frequently become crisis managers with a 24-hour view. Fighting the fires is obviously the top priority, but the scope of the problems does not end there. After service is restored with any fixes applied to the system, the post-incident activity must start. This is required if the objective is to reduce the impact of future problems by reducing the number of recurring incidents, the resolution time to correct problems, and the impact of any problems that do occur. Problem Management encompasses the entire problem cycle, as shown in Figure 1-2 on page 5.

4.6.1 Data tracking and reporting

Upon the first indication of an incident, a problem record must be opened. This record can be created from automation, the Help Desk, or an operator. As with everything else, the problem record cannot be a haphazard collection of letters and numbers; it must contain enough information that detailed information about the cause and resolution of problems can be found quickly in the future. This requires planning:
– The problem record fields should limit the amount of free-form text. This allows operators and system programmers to execute tools to locate similar problems in the past and what was done to correct them. Obviously, these tools must exist and be easily accessible.
– Include separate fields for the start and stop time of the incident as seen by the end user.
– Include a field to list the scope of the outage.
– Train all personnel to know how to create quality records.

Over the course of the outage, all the documentation on the incident data, event chronology, user impact, and recovery actions should be organized together into one incident record. If secondary problems arise during the incident, open a second problem record. This allows separate tracking of recovery actions and follow-up. The problem tracking system should maintain meaningful outage data in a problem database that can be used as the basis for a continuous improvement process. Records should be generated by Automation for all incidents, with as much data as possible being entered by Automation. Initially, this would generate a lot of records, but that should just drive the process to reduce the incidents and improve availability. The problem database should include:
– Duration of the outage as perceived by the end user.
– Scope of the outage (users affected/unaffected, applications affected/unaffected, etc.).
– Cross-references to related problem or change records.
– Recovery actions taken, including factors that delayed recovery.
– Adequate technical diagnosis and cause analysis information.
– Results of post-outage review/analysis.
– Action taken to avoid similar outages in the future or to facilitate their recovery (both technical fixes and process improvements).

Figure 4-9 illustrates what could be the content of an incident record.

Sample Incident Record Contents

Incident #
Incident Type
Descriptive Text
Symptom String
System Impacted
Location
Date Occurred
Time Occurred
Problem Recognition Time
Problem Determination Start Time
Problem Determination End Time
Recovery Start Time
Recovery Stop Time
Recovery Text
Detailed Event Chronology
Total Duration
Root Cause Code
Closing Code
Closed By
Closed Date
Closed Time

Figure 4-9 Sample incident record contents
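A record with the fields in Figure 4-9 maps naturally onto a fixed schema, which is what makes tool-based searching possible. The following Python sketch shows one possible shape; the field names follow the figure, but the types and the duration calculation are assumptions for illustration only.

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional

@dataclass
class IncidentRecord:
    """Fixed-schema incident record modeled on Figure 4-9 (types are assumed)."""
    incident_id: str
    incident_type: str
    system_impacted: str
    location: str
    descriptive_text: str                      # keep free-form text limited
    symptom_string: str
    occurred: datetime
    problem_recognition: Optional[datetime] = None
    pd_start: Optional[datetime] = None        # problem determination start
    pd_end: Optional[datetime] = None          # problem determination end
    recovery_start: Optional[datetime] = None
    recovery_stop: Optional[datetime] = None
    recovery_text: str = ""
    root_cause_code: str = ""
    closing_code: str = ""
    closed_by: str = ""
    closed: Optional[datetime] = None
    event_chronology: list[str] = field(default_factory=list)

    def total_duration_minutes(self) -> Optional[float]:
        """Outage duration as perceived by the end user, once recovery is complete."""
        if self.recovery_stop is None:
            return None
        return (self.recovery_stop - self.occurred).total_seconds() / 60

Coded fields such as root_cause_code and closing_code are what make the trend queries described in the next section possible; free-form text cannot be aggregated.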

Data reporting

Problem-related information in a structured format should be available for efficient queries, providing immediate access to data when required.

The three main categories for reporting are:
 Batch, to provide predefined reports (for example, the number of problems per month).
 Real time, to display the status of a severity one call currently open.
 Multi-dimensional analysis, allowing information to be analyzed along different dimensions, thus providing the user with different views of the information. By showing different views of the data you can perform true trend analysis and forecasting using Data Warehouse/Business Intelligence analysis techniques. This function helps in the outage trend analysis described in later chapters.

This enables further analysis of data collected about recent outages. For example:
 What patterns emerge from analysis of recent outage data?
 Do any of the following emerge as trends during recent outages:
– Repeat outages caused by the same problem?
– Outages that were induced by recently introduced changes?
– Outages that might have been avoided had more attention been paid to “early warning” symptoms?
– Outages that were prolonged by delays in detection, notification, diagnosis, decision-making, or implementation?

4.6.2 Causal analysis

Causal analysis is not about looking for someone to pin blame on. It is the way to identify the root cause of a problem so you can make sure it does not happen again. This is done by always asking “why?”

One of the most frequently misunderstood terms is root cause analysis. Root cause is often thought of as the triggering problem, and thus many people think that root cause has been addressed when the triggering problem has been identified and resolved. Frequently, people think they have gotten to root cause and suspend any further detailed analysis once they have replaced a broken hardware part or fixed a software coding error. Rather, root cause is defined as the most fundamental problem, which, if corrected, would have prevented the failure from occurring. Frequently the root cause is a process or procedural problem. When performing causal analysis, look at the three metrics of unavailability: Could the incident have been avoided, the duration reduced, or the impact lessened? Using the example of a software defect, root cause might be explained by understanding how the defect got into a production system. Digging further, one might also uncover a failure in recovery procedures, which might then be taken back to a “deeper root” such as inadequate skills or training.

The goal of causal analysis is to develop a list of recommended actions and a suggested implementation plan that, when implemented, will significantly improve the online availability of the applications during the prime and non-prime time hours of operation. It will also provide the justification for many of the recommendations by providing key input in the quantification of the expected improvement that results from implementation of each of the recommended actions. Figure 4-10 on page 244 is a structured list of elements to go through in a root cause analysis.


Causes and actions

Number of outages (frequency):
– Causes: Reinitialization; subsystem and application abnormal termination; human errors; hardware component failures; recurring problems; untested changes; complexity.
– Actions to decrease frequency: Proactive problem prevention; root cause analysis; system outage analysis; effective Change Management; robust system design; proactive monitoring; standardization.

Outage duration:
– Causes: Unchecked problem determination; informal recovery procedures; other secondary contributors; initialization design; indecision; lack of back-up capability.
– Actions to reduce duration: Effective recovery procedures; Component Failure Impact Analysis; automation; situation management; post-incident reviews; system and application design; warm back-up.

Number of users impacted (scope):
– Causes: System design; application design; data design; system configuration; common dependencies; unnecessary IPL or reboot.
– Actions to limit scope: System integration and design; application design; data design; isolation of vital applications; Component Failure Impact Analysis.

Figure 4-10 Measurement elements

One of the products of this analysis will be a determination of whether the outage was avoidable altogether, or whether some portion of the outage was avoidable. Avoidable outage time is the time that could reasonably be expected to be preventable with controlled processes and procedures, through exploitation of availability features and functions that are generally available, or by addressing the root cause following the first occurrence of a unique problem. Examples include:
 A recurrence of a problem that could have been fixed sooner to prevent the recurrence.
 An operator error that could have been prevented through better documented or automated procedures.
 Failures that could have been prevented through greater proactive monitoring.
 Recovery time that could have been shortened through the use of automation, better procedures, faster response by support, clearly defined management policies, etc.
 Outages that extended into the online hours due to late decisions to move to a backup.
 Outages that could have been prevented through better testing, better management of change, etc.
 Outages due to single points of failure in the existing configuration.
 Outages avoidable by exploiting generally available hardware and software functions.
 Problems that could have been investigated and corrected before they recurred.
 Storage, library space, or other resource problems that could have been avoided by proactive monitoring or recovered via a less-disruptive procedure.

Sometimes these items may not have contributed to causing the outage, but are rather secondary problems that affected the duration and/or impact.

4.6.3 Maintenance policies

You should establish a maintenance policy. What service level do you want to be at? Review your rollout practices. Would you have better availability if you left larger gaps between the new release reaching each system (are you hitting problems on systems that should have been discovered earlier in the rollout), or are you waiting too long between systems (are systems late in the rollout hitting problems that are already fixed on the new level)? When rolling in a new release, you should keep a system at the (n-1) level to fall back to in case there are problems.

4.7 Performance Management

There should be a defined performance threshold at which slow response times are considered an outage.

Performance Management ensures the tools, skills, and understanding of normal operating ranges needed to quickly identify potential bottlenecks that could lead to response time and availability issues:
 Tools are implemented to monitor end-to-end performance.
 Performance issues are identified as incidents and underlying problems are managed appropriately.
 Benchmarks of normal operating ranges are established to allow effective determination of performance problems.
 Performance monitoring probes are utilized to identify potential bottlenecks.
 Event management includes performance events.
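The performance-outage threshold mentioned above can be mechanized once benchmarks are agreed. The following is a minimal Python sketch; the three-second threshold, the three-breach rule, and the sample data are assumptions for illustration.

# Flag response-time samples that breach an agreed "performance outage" threshold.
# Threshold, breach count, and samples below are illustrative assumptions.

THRESHOLD_SECONDS = 3.0
CONSECUTIVE_BREACHES_FOR_INCIDENT = 3   # avoid raising incidents on one-off spikes

def detect_performance_incidents(samples: list[float]) -> list[int]:
    """Return sample indexes where a run of consecutive breaches becomes an incident."""
    incidents, run = [], 0
    for i, response_time in enumerate(samples):
        run = run + 1 if response_time > THRESHOLD_SECONDS else 0
        if run == CONSECUTIVE_BREACHES_FOR_INCIDENT:
            incidents.append(i - run + 1)   # the incident starts at the first breach
    return incidents

print(detect_performance_incidents([0.8, 3.2, 3.5, 4.1, 0.9, 3.4]))   # [1]

Requiring several consecutive breaches before opening an incident keeps one-off spikes from flooding the problem database.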

4.8 Capacity planning

Capacity Management uses performance data and business projections to identify potential capacity issues and proactively take actions to ensure that sufficient (reserve) capacity is available to handle peak volumes and recovery requirements:
 There is sufficient capacity to support peak loads (for example, processor, disk, and network).
 Policies for reserve capacity to support component failures are established and enforced.
 Systems are upgraded before it becomes necessary to cut into the reserve capacity.
 Business growth and workload volumes are projected and capacity requirements are determined.
 Capacity requirements of DR back-up systems are accurate and current.
 Spare capacity is needed both to take over work from a failed unit and to handle the workload spike during recovery processing.

There is a higher rate of incidents in systems that constantly run flat out.

The design point for many customers is oriented (skewed) towards price/performance by keeping the systems at 100 percent processor busy, and not towards high availability. This way of setting up and running a Parallel Sysplex is in direct conflict with reaching very high levels of availability. The IBM design point for achieving high availability with Parallel Sysplex is to have spare redundant capacity (white space) so that workload can be distributed when planned and unplanned outages occur, and so that recovery actions execute swiftly.

Running processors for online work consistently over 90 percent busy and near to 100 percent busy causes problems when there are insufficient resources to recover quickly and properly. Such an approach exposes more stress-related software defects, exposes stress-related setup problems, and results in inefficient performance at very high processor utilization. The design point for IBM processors is most definitely 100 percent. However, if a customer is committed to achieving very high availability using the Parallel Sysplex model, they should use 70–80 percent busy (average) and 90 percent busy (peak) as the design point for running their processors for online work.

Any future processor capacity upgrade needs to cover both latent pent-up demand and the additional “white space” required for workload distribution and recovery actions. It is a business decision to evaluate the cost/benefit of this recommendation as an insurance premium paid for achieving very high levels of availability through a less stressed environment and faster restarts after failures.
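The white-space recommendation can be checked with simple arithmetic. The sketch below estimates surviving-CPC utilization after one CPC in the sysplex is lost, assuming identical CPCs and workload that redistributes evenly; real sizing must also account for LPAR weights, CBU, and workload affinities.

# Estimate per-CPC utilization after losing one CPC, assuming n identical CPCs
# and workload that redistributes evenly (a simplification of real sysplex sizing).

def utilization_after_failure(n_cpcs: int, avg_busy: float) -> float:
    """Average utilization of the survivors when one of n CPCs is lost."""
    return avg_busy * n_cpcs / (n_cpcs - 1)

for n in (2, 3, 4):
    for busy in (0.70, 0.80, 0.90):
        after = utilization_after_failure(n, busy)
        flag = "  <-- work is delayed or lost" if after > 1.0 else ""
        print(f"{n} CPCs at {busy:.0%} -> survivors at {after:.0%}{flag}")

Even this crude model shows that the fewer the CPCs, the more white space each must carry: a two-CPC sysplex can absorb the loss of a CPC only if average utilization stays at or below 50 percent, while a four-CPC sysplex at 70 percent busy leaves its survivors at about 93 percent.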

4.9 Security Management

People accidentally screwing up production systems because they had access to something they should not have had access to is one of the most common causes of outages. People are the weakest link in security, yet people are the most valuable asset in an IT shop. Information security is needed to balance this dichotomy; it is used to protect the privacy of data and the integrity of systems from unauthorized or inappropriate use, whether external or internal, planned or accidental. The information security management objectives are Confidentiality, Integrity, and Availability (CIA).

The Security Management process is similar to the general Availability Management process as a whole:
 Planning actions (proactive)
 Recovery actions - Detecting and containing an intrusion, removing the intrusion, and restoring the system (reactive)
 Corrective actions - Fixing the security hole (reactive)
 Preventive actions - Causal and trend analysis (proactive)

The rest of this chapter concentrates on the proactive planning activities. Just like the Availability Management process described earlier, the key to this is the installation security policy.

4.9.1 Security policy

Before doing anything, obtain strong management support for security. This is a must. If management does not support security planning, it will not be done right. Too many people will ignore the policy, and it will collapse like a house of cards.


The Security Management policy should contain the following:
 Define why security is important. The goal of any security policy is to adequately protect business assets and resources with a minimal amount of administrative effort.
 Identify IT assets and protection requirements.

These assets can be any type of data object, such as files, directories, network servers, messages, databases, or Web pages. You must decide which users and groups of users should have access to these protected resources, and what type of access should be permitted. Finally, you must apply the proper security policy on these resources to ensure that only the right users can access them. This means:
• Keeping non-authorized individuals from accessing any part of the system
• Ensuring that even authorized individuals access only those resources for which they have further specific authorization
 Risk analysis: Identify threats and risks. The Quantitative Risk Analysis method assigns independent, objective dollar values to components of risk and potential losses; this is not completely possible. Two tools used to produce a Quantitative Risk Analysis are the Preliminary Security Examination (PSE) and an Evaluation Function. The PSE assigns a cost or value to each asset, rates the threats, and lists the security measures (safeguards to mitigate threats); the report is then reviewed and approved by management. The Evaluation Function determines how to delay loss, damage, and service outages, and how to detect IT fraud, unauthorized disclosure, and physical theft. Qualitative Risk Analysis is a large collection of scenarios (200–300) without dollar assignments, ranked by seriousness of threat and sensitivity of asset. The threats are matched to the assets in the scenarios. The scenarios also outline the mitigations and the actions taken to end or reverse the effects of the threat. The Delphi Risk Analysis method circulates independent and anonymous opinions to teams to reach consensus based on the value of data and the probability of events.
 Balance the cost of the data against the cost of mitigation. The cost of not securing the data is classically calculated as the single loss expectancy multiplied by the annual rate of occurrence, where the single loss expectancy is derived from the value of the data (see the sketch after this list). The value of the data is calculated based upon:
– Cost to acquire, develop, or maintain
– Value to owner, custodian, or adversary
– Real-world cost/value, price paid, or initial data load cost
 Establish IT security standards, measurements, practices, and procedures. The US TCSEC DoD standards focus on confidentiality. The TCSEC orange book is a metric of trust and defines security functions and requirements. The orange book defines the following security requirements and functions:
– Security policy enforced by the system
– Objects must use access control labels (passive security)
– Subject identification is required (active)
– Audit logs must be protected
– Evaluation mechanisms to measure the trust of the system
– Protected, tamper-proof mechanisms

The European ITSEC standards focus on integrity. ISO 15408 was derived from multiple standards, including the orange book. The document is called the Common Criteria and is based on defense in depth. ISO 15408 contains the following controls:
– Information classification, to ensure data has the appropriate level of protection and CIA.
– Data handling, which defines who classifies data, who reviews data classifications, and the rules of data use (on desktops, PCs, locked up, etc.).
 Define allowed deviations.
 Implement/build tools. The following is a list of countermeasure properties:
– Cost effective.
– Minimum human intervention.
– Sustainable (the more automated, the more sustainable).
– Override and fail-safe defaults.
– Absence of design secrecy in the solutions used.
– Enforce least privilege.
– No entrapment of users (authorized or not).
– Controls independent of subjects.
– Apply solutions uniformly (management and executives are not exempt).
– Compartmentalize defenses and use defense in depth (multiple layers of defenses).
– Isolate each safeguard (no interaction, so one cannot be used to defeat another).
– Minimize dependencies on common mechanisms (avoid a Single Point of Failure - SPoF).
– Simple, effective, and reliable.
– Completely define security policies.
– Consistently enforce and apply security policies.
– Monitor (instrument, log) and audit compliance.
– Management support of policies.
– User education on policies and acceptance of policies.
– Accountability to policies.
– Reaction and recovery to incidents built into countermeasures.
– Residual audit trails/logs resistant to destruction by the incident.
– Audit trails/logs that can be reset.
 Implement practices and procedures:
– Physical access to IT resources is monitored, documented, and controlled.
– Audit trails are maintained to log and track unauthorized IT resource changes.
– Electronic access to the production systems and databases for unauthorized or inappropriate use is prevented and tracked.
 Lock down production systems.
 Audit IT security. An independent security consultant should try to break the system using “ethical hacking.”
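The cost-balancing item in the list above is the classic annualized loss expectancy (ALE) calculation. A minimal sketch, with all figures invented for illustration:

# Annualized loss expectancy (ALE): single loss expectancy (SLE) multiplied by
# the annual rate of occurrence (ARO). All figures below are invented.

def annualized_loss_expectancy(sle: float, aro: float) -> float:
    return sle * aro

asset_value = 500_000.0      # cost to acquire, develop, or maintain the data
exposure_factor = 0.40       # fraction of the asset's value lost per incident
sle = asset_value * exposure_factor
aro = 0.5                    # expected incidents per year (one every two years)

ale = annualized_loss_expectancy(sle, aro)
print(f"ALE = ${ale:,.0f} per year")   # ALE = $100,000 per year

On this model, a safeguard costing less per year than the ALE it removes is worth implementing; one costing more is not.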

4.9.2 Physical security

Physical security means understanding the threats, vulnerabilities, and countermeasures related to physical security in order to protect sensitive information assets (people, facilities, data, equipment, and supplies). A key to physical security is understanding facilities management, which covers the planning, environment, and administration of physical security. Facilities management planning takes into account low visibility, location (hazards, crime, natural disaster), hazards (airplanes, roads, other tenants in the building, strikes), and police, fire, and medical services. When planning a site, the following should be considered for the environment: Heating, ventilation, and air conditioning (HVAC), air quality, contamination, and 70°F/50 percent humidity for systems to reduce static. Facilities administration should consider emergency response; periodic inspection and reporting of emergency readiness and the condition of the facility; rotation of duties; and training, drills, and exercises (fire, medical, etc.).

Construction

Construction should be in compliance with all building codes. The following should be reviewed:
 Building - Walls, floors, windows, doors, utilities, and HVAC dedicated to one company.
 Media (that is, tape) - Cabinets, rooms, vaults, inventory audit, and transportation.
 Electrical - Primary and alternate electricity sources; power monitors/regulators; orderly shutdown for power outages; prevention of loss, noise, transients, over/undervoltage, and spikes from hitting equipment (equipment can typically take a 10 percent fluctuation); shielding of power cables.
 Fire - Prevent, detect, and suppress, balancing personnel safety and equipment security:
– Fire prevention - Fire protection systems, drainage, HVAC power, safety procedures, training, and housekeeping.
– Fire detection - Alarm boxes, automatic detectors, or sensors.
– Data center fire risk (high to low) - Electrical distribution, other equipment, air conditioning, smoking, and open flame.

4.10 Configuration Management

Suppose an error message appears at the console indicating that the control unit at 10E0 is having problems and a part needs to be replaced. This would be simple if the operator knew who the vendor is and the telephone number to call. Once the engineer arrives, all that is needed is to find the device.

Configuration Management is a prerequisite to effective Event Management. All the components that are necessary to run transactions must be known. This includes the physical resources inside the data center (hardware), all vendor products (software), and everything that connects the two (the network). The following should be recorded for each component (see the sketch after this list):
– Component-specific information
– Serial number
– Attributes, such as size
– Vendor name and contact
– The location of each component
• For hardware, include the physical location of the component (address, phone, building, floor, room).
• For software, include the systems on which it runs.
– Connections and interfaces to other components
– Maintenance levels of each component
• EC levels for hardware
• Release and PTF levels for software
– Contacts for initialization, maintenance, and recovery of the component
– Index to the initialization, problem determination, and recovery procedures
– Users or business functions dependent on the component
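A configuration record carrying the fields just listed might look like the following Python sketch. The field names, types, and the sample 10E0 entry are illustrative assumptions, not the schema of any particular configuration database product.

from dataclasses import dataclass, field

@dataclass
class ConfigurationItem:
    """One entry in the configuration database; fields follow the list above."""
    name: str
    serial_number: str
    attributes: dict                 # e.g., {"type": "DASD control unit"}
    vendor_name: str
    vendor_contact: str              # the number to call when a part fails
    location: str                    # hardware: address/building/floor/room;
                                     # software: the systems on which it runs
    connections: list = field(default_factory=list)   # interfaces to other components
    maintenance_level: str = ""      # EC level (hardware) or release/PTF level (software)
    support_contacts: list = field(default_factory=list)
    procedure_index: str = ""        # pointer to init/PD/recovery procedures
    dependent_users: list = field(default_factory=list)

# The control unit 10E0 example from the start of this section:
cu = ConfigurationItem(
    name="Control unit 10E0",
    serial_number="78-12345",
    attributes={"type": "DASD control unit"},
    vendor_name="IBM",
    vendor_contact="1-800-IBM-SERV",
    location="Bldg 2, Floor 1, Room 104",
    procedure_index="RP-0042",
)
print(f"Call {cu.vendor_name} at {cu.vendor_contact}; the device is in {cu.location}")

With such a record, the operator in the opening example knows immediately whom to call and where to send the engineer.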

The Configuration Management process provides configuration information to the support staff within IT and is valuable during the recovery process.


This information should be maintained in a database that is kept current and accessible by operations and support staff. This database needs to be available to the Help Desk, Operations, Problem Management, and Change Management teams. Obviously, this information must be accessible during system outages to provide information for recovery decisions. Be sure to have a back-up version on a resilient platform.

Change Management, Problem Management, and recovery development processes maintain accuracy and currency of the component database. Where possible, use tools to automatically discover and document components and levels. Do not forget to update the Configuration Management database during emergency changes.

Benefits of a good configuration database span across several areas. As discussed, it assists in recovery execution by providing an index to the components. It also aids in the development of a formal recovery procedure by providing a cross-reference index for recovery procedures and in proactive problem prevention.

Secondary benefits include that the help desk is better able to determine which users are connected to which components. This improves Service Level Management. The Change Management process is also improved by understanding the interactions between components when one needs to be changed. In addition, it gives IT support a clear picture of what exactly is out there. Many times they would know about only the platform/technology that they support, with little understanding of the total picture and how it fits in.

In a resilient infrastructure all components in the end-to-end service delivery chain are known, failure impact understood, and recovery options determined. The connectivity and dependencies between components is known. Component failures are anticipated, the impact known, and recovery alternatives are developed. This is done by performing a Component Failure Impact Analysis.

4.10.1 Component Failure Impact Analysis

A Component Failure Impact Analysis (CFIA) is a systematic approach that can be used to predict and evaluate the impact of component failures for each critical application. Starting with the list of components identified by the Configuration Management process, the impact to the system if each component is unavailable is determined (how many users are affected, how long the recovery time is), along with possible actions to take to restore service, such as bypassing or recovering the failed component. Interactions with other components are also identified.

CFIAs can be used to assess systems designs to decrease the impact of exposures. A single point of failure (SPOF) is any one instance of software or hardware where if it was not available, end-user availability would be impacted. One purpose of a CFIA is to identify potential single points of failure.

The CFIA utilizes a matrix to graphically relate effects of hardware and software components to systems and applications, as illustrated in Figure 4-11 on page 251. It contains a list of components on one axis and a list of systems, subsystems, hardware, and related applications on the other axis. The impact of a component failure on a system or application is then recorded and a recovery procedure number is identified in the grid. The CFIA matrix can also be used as an index to components and associated recovery procedures. It can also be used to maintain configuration information when made available to other processes.


[Figure 4-11 shows a sample CFIA grid. Components, grouped as Software, Processor 1, Processor 2, and Shared DASD (item numbers 1-29), run along one axis together with the number available of each; the systems and applications (OS/390, IMS, DB2) run along the other. Each cell records the impact of that component's failure on that system or application (X = fails, A = alternate available, B = back-up available), and a row at the bottom gives the associated recovery procedure number.]

Figure 4-11 CFIA grid
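In code, a CFIA grid is just a sparse matrix keyed by (component, application), which makes single points of failure easy to query. The following is a minimal Python sketch using the legend from Figure 4-11; the component names, impact codes, and procedure numbers are invented for illustration.

# CFIA grid as a sparse matrix: (component, application) -> impact code.
# Legend from Figure 4-11: X = fails, A = alternate available, B = back-up available.
# The components, impacts, and procedure numbers here are invented.

cfia = {
    ("CPU-1", "OS/390"): "X", ("CPU-1", "IMS"): "X", ("CPU-1", "DB2"): "X",
    ("DASD-SHR-0", "IMS"): "A", ("DASD-SHR-0", "DB2"): "A",
    ("CF-1", "IMS"): "B", ("CF-1", "DB2"): "B",
}
recovery_procedure = {"CPU-1": 1, "DASD-SHR-0": 8, "CF-1": 25}

def single_points_of_failure(grid: dict) -> set:
    """Components whose failure ('X') takes down at least one system or application."""
    return {component for (component, _app), impact in grid.items() if impact == "X"}

for component in sorted(single_points_of_failure(cfia)):
    print(f"{component}: potential SPOF, see recovery procedure {recovery_procedure[component]}")

The same structure doubles as the index from components to recovery procedures that the text describes.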

A CFIA is initially conducted as a three-phase project of preparation, analysis, and implementation. The preparation phase involves obtaining executive support, determining the objectives and scope, and setting the schedules and checkpoints for the project. Since the scope of each grid is a specific application or set of applications, not all applications can have a CFIA done on them. Much of the content of each grid, though, will be the same between applications. The analysis phase involves doing the research, constructing the grids, reporting the findings, and recommending actions. Finally, the implementation phase includes reviewing with management and building implementation plans.

Questions that should be raised when doing the analysis are:
 What is the impact if this component fails?
 Is this a single point of failure?
 How many users would be impacted?
 What is the probability of failure?
 What do I do when it does fail?
 What recovery procedures need to be executed?
 Are the procedures documented?
 Can they be automated?
 What could I do differently to prevent impact?
 What system design alternatives exist for this component?
 Is redundancy possible?
 Can redundancy be justified?
 Is placement of critical data optimal?

Configuration information and Component Failure Impact Analysis provide valuable information to assist with recovery procedures and validation of design, and help proactively identify recovery problem areas before they become problems.

4.11 Enterprise architecture

All too frequently, there is a hodge-podge of servers on the floor, each one doing some task; many are redundant. If there is a problem, the problem determination process consists of pointing a finger at the component at fault. Unfortunately, it is not uncommon for a transaction to pass through multiple different platforms and back again, not to mention the network connections. Complexity leads to:
 Elongated problem determination. If a transaction fails, it is difficult to identify where the problem occurred. This is especially true if the problem involves the interaction between multiple platforms.
 Complex change management. A change in an application on one platform requires changes on other platforms that must occur at the same time.
 More problems. The more components there are involved in delivering a service, the less chance there is that anyone will know all the possible impacts of changing any given parameter, and the more chance of an application change affecting some unsuspected component.
 More difficult performance tuning. If there is a performance problem, it is difficult to pinpoint the problem system and difficult to predict how changes to one system will affect other platforms.
 Extra overhead, due to data extracts when applications are not running where the data is.
 More single points of failure.
 Increased system programmer skills needed to support the multiple platforms.
 Higher total cost of operations. Skills and resources are needed for all the platforms. Additional savings can be obtained through consolidation on the larger zSeries platforms.
 Longer end-user response times. Transactions must bounce from place to place instead of running and ending. In addition, data may need to be transferred to a remote server instead of accessed locally.
 Ineffective disaster recovery. It would be difficult, if not impossible, to replicate the configuration in a D/R site.
 More political maneuvering. The easiest thing to do is to point fingers at other groups and say, “Not my problem!”

Eventually, this may lead to an out-of-control situation where departments have their own systems, with their own standards, running ISV products differently from other groups, etc. A recommendation is to use an Enterprise IT Architect to maintain an agreed-upon architecture, as one way of trying to limit the number and level of components used by a given application.

An example of a complex configuration is shown in Figure 4-12 on page 253.


[Figure 4-12 shows a complex, multi-platform configuration: Web servers, WebSphere Application Server, a security gateway, and file servers spread across multiple Win/NT servers; FTP used for file access and database extracts to a z/OS system running IMS/CICS/DB2 with RACF and TN3270D; SNA traffic carried over LLC2 through a 374x FEP and a Cisco 7500 CIP; and storage attached through Fibre Channel switches to a Storage Area Network.]

Figure 4-12 A common configuration nowadays

One solution is to consolidate onto only two platforms, the zSeries and xSeries®, using the strengths of each. zSeries strengths are:
 High performance transaction processing
 I/O intensive workloads
 Large database serving
 High resiliency and security
 Unpredictable and highly variable workload spikes
 Low utilization infrastructure applications
 Rapid provisioning and re-provisioning

Workloads that run well on zSeries (z/OS and zLinux) include:
 Security and directory services
 Print/file servers
 Application servers
 DNS servers
 Web servers
 Database servers
 Transaction servers

Characteristics of work that runs well on a BladeCenter™ implementation include:
 Clustered workloads
 Distributed computing applications
 Infrastructure applications
 Small databases
 Processor and memory intensive workloads
 Centralized storage solutions

Workloads that run well on BladeCenter include:
 Application servers
 Terminal serving
 e-Commerce applications
 Corporate infrastructure
 Deep computing clusters
 Collaboration servers
 Print/file servers
 SSL appliances
 Web services

Figure 4-13 shows the configuration after the Web servers, file servers, security gateway, WebSpheres, and applications are consolidated onto just two platforms.

[Figure 4-13 shows the simplified configuration: a zLinux tier on the mainframe hosting Web servers, WebSphere Application Server, applications, NFS/CIFS file serving, and Kerberos/PKI security, connected to z/OS running IMS/CICS/DB2 with connectors, RACF with Kerberos, TN3270D, and Communications Server; a BladeCenter hosting Web servers, file servers, WAS, and applications with Enterprise Extender connectivity; and consolidated storage on a Storage Area Network.]

Figure 4-13 A simplified configuration

4.11.1 Infrastructure simplification

Simplify the infrastructure to create a more efficient, resilient, and responsive IT environment.

The Internet brought much greater awareness of, and access to, the power and breadth of computing solutions and the vast possibilities of managing and manipulating data.

Distributed hardware and software resources allowed for more flexibility in data manipulation and access, but the physical implementation of complex computing environments has left IT managers with the chore of trying to make sense of what became a massive raised-floor footprint, one that is largely fragmented, under-utilized, and inefficient. In other words, existing IT deployments can be typified by excess overhead and high degrees of under-utilization. For both enterprise IT and the industry, infrastructure simplification has become a necessity for survival. If the myriad existing computing resources are not simplified, the sheer cost of operation and maintenance could prevent future investment, and prevent achieving the ROI that would enable enterprises to seek out a competitive advantage in these challenging economic times.

The emergence of new key technologies makes it possible for you to simplify that physical architecture. While logical layers cannot just disappear, certainly the physical layers can be simplified, resulting in more integrated, cost-effective, and flexible environments, as well as more resilience.

Infrastructure simplification steps beyond server consolidation to focus on additional cost reductions by creating a more efficient, resilient, and responsive IT environment. With infrastructure simplification, multiple tiers of computing systems (servers, storage, networking, etc.) are collapsed into a more efficient, less complex infrastructure, often using mainframe and blade technology.

What are these enabling technologies? First of all, advances in networking allow you to think about centralizing computing resources. Secondly, there is the massive and rapid acceleration of software developers embracing Java™ and Linux, not only for infrastructure services like file, print, and mail, but increasingly for enterprise applications, such as SAP and Oracle. As the IT community has embraced Linux and Java, it is now possible to consolidate those application servers that are distributed around the environment, and to consolidate Web application, file, and print servers.

The other emerging technology is Grid, which is being adopted more rapidly than anyone expected, alongside the emergence of new scale-out technology in the BladeCenter and scale-up technology in the mainframe (Linux on zSeries is an excellent example of scale-out technology). We are seeing a market-wide move to consolidate and collapse these physical servers into two centers of gravity, the mainframe on one side and the BladeCenter on the other, and to virtualize and consolidate their storage into Storage Area Networks.

4.11.2 Ideal mainframe implementations

Simplifying infrastructure begins with an understanding of system task requirements, and structuring them in the optimal processing environments.

Figure 4-13 on page 254 shows examples of the applications and services that it makes sense to consolidate onto the mainframe. You can see high-performance transaction processing, highly I/O intensive workloads, large databases, those application servers that have a requirement for high resiliency and bulletproof security, or unpredictable and variable workload spikes. Those things make sense on the mainframe.

4.11.3 Ideal BladeCenter implementation

Conversely, on BladeCenter, there is another class of applications that make sense in that environment. Applications such as clustered workloads, distributed computing applications, infrastructure applications, small databases, processor and memory-intensive workloads, and centralized storage solutions make sense on the BladeCenter.

IBM plans to provide guidance to the market at large as to how to implement these solutions, and help you to simplify your infrastructure.

4.11.4 Examples and scenarios demonstrating real infrastructure simplification efforts

Below are some examples of infrastructure simplification. These are taken from real customer experiences.

 Scenario 1: Web serving with high availability and security
Leverage the strengths of Parallel Sysplex performance, availability, and security to take traditional DB2 with CICS and IMS transaction systems and re-engineer them using WebSphere Application Server with data connectors to build on demand Web applications. Tie in BladeCenter for continuing front-end services such as HTTP serving, traffic load balancing, Web caching, and firewalls. Combined with gigabit connections to a meshed network infrastructure, this is a viable model for delivering simplified Web services on re-engineered traditional transaction systems.
 Scenario 2: Using grid to balance workload across BladeCenter and zSeries resources
Grid technology enables prioritization and scheduling of compute- and I/O-intensive workloads to zSeries and BladeCenter resources. The classic split would be to use BladeCenter for compute-intensive and zSeries for I/O-intensive work. This scenario facilitates the use of zSeries capacity held in reserve to meet peaks. Typical grid applications run on Linux; however, using standard Java and grid middleware, even traditional CICS transactions can leverage BladeCenter for compute-intensive calculations, while zSeries manages data and high transaction rates.
 Scenario 3: Dynamic resource provisioning to improve efficiency and meet SLAs
Use the Tivoli® Intelligent Orchestrator to automatically provision an environment (for example, a Linux virtual machine on zSeries) triggered by a response time SLA provisioning trigger. Work can also be allocated to BladeCenter. When the work is complete, the orchestrator signals tear-down (re-purposing) of the zSeries resource back to an available resource pool. Automation speeds environment set up and tear down, so the user can get started on work faster, and the business can handle additional situations in far less time.
 Scenario 4: Storage consolidation with resiliency via SAN
Consolidated storage can benefit the zSeries/BladeCenter customer. IBM TotalStorage® provides the storage area network (SAN) fabric, Network Attached Storage, and devices for open storage. Both zSeries and BladeCenter customers can use a Fibre Channel Protocol (FCP) SAN-connected IBM TotalStorage Enterprise Storage Server® (ESS) to cost-effectively store data, with the option to leverage features like remote synchronous mirroring for high data resiliency. BladeCenter can use a NAS Gateway to store files on the SAN. NAS media sharing facilitates data storage and sharing at read-only levels of data security.

While the primary driver of simplification is to be able to control your environment and simplify system management and availability, the simplified environment also enables large cost savings by consolidating hardware, software, and personnel resources.

4.12 IBM IGS High Availability Services

In this section we discuss IBM IGS High Availability Services.

4.12.1 More than just technology

There is no such thing as a simple, easy, and inexpensive high-availability solution. Any given approach must strike a balance between true need and economics, and there are many ways to approach the problem.

Our approach to sizing up this as yet unknown entity is called the “bounded system,” which we define most simply as the environment (including hardware, software, processes, and more) to be encompassed by our support agreement or guarantee. More specifically, it is a group of interacting, interrelated, and interdependent system elements forming a complex whole and performing the complex procedures, work flows, and tasks associated with accomplishing business requirements.

A bounded system may be configured and measured as either a server-centric or a network-centric configuration:
 A server-centric configuration is typically a centralized server or server cluster configuration, with availability measured as the percentage of time the online services are functioning at the server level and are potentially available to a minimum group of clients anywhere within the system’s domain.
 A network-centric configuration is typically based on a distributed architecture. Availability is measured as the percentage of time the online services are functioning at the end-user level and a critical mass of end users are functioning during the customer’s specified online window.

Under either scenario, the bounded system configuration forms the basis for managing system availability and allows us to include or exclude certain system components.

Bounded system components

At its simplest, a bounded system might include nothing but a CPU or data on a direct access storage device (DASD). These historically have been among the first aspects of a system addressed in providing enhanced availability. In today’s environment, however, it is recognized that applications and/or communications paths will generally be included in a bounded system. So a more complex bounded system might include redundant CPUs, hot-swappable RAID drives with multiple access paths, controlling software for the high-availability cluster, multiple LANs and WANs with all the redundant hardware and lines this implies, end-user devices with multiple network adapters, and so forth. A bounded system might even extend to an application and its performance, with a requirement that the application be available at all times to x percent of users and that the average response to any user request be completed in no more than y seconds. This approach to availability differs significantly from that of other companies that attempt to sell availability on the basis of CPU reliability. IBM’s methodologies allow us to encompass the entirety of the business system, from the physical processor through and including the end-user application.

Technologies and processes in a bounded system

Understanding the bounded system is a crucial and complex task that forms the basis for further assessments of the computing environment. Only through the creation of a bounded system can the customer’s current availability level (CAL) be accurately determined. While determining the parameters of the bounded system, we are almost certain to uncover numerous deficiencies in the system and its associated system management practices. These can inhibit or prevent the system from achieving the customer’s target availability level (TAL). By determining the gap between the CAL and the TAL, and working with the customer, we can develop a unique solution to address the customer’s availability goals.
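The CAL/TAL gap is easy to quantify once outage minutes are measured across the bounded system. A minimal sketch follows, with the measured outage figure and the 99.9 percent target invented for illustration.

# Quantify the gap between the current availability level (CAL) and the
# target availability level (TAL). The figures below are invented.

MINUTES_PER_YEAR = 365 * 24 * 60

def availability(outage_minutes_per_year: float) -> float:
    return 1.0 - outage_minutes_per_year / MINUTES_PER_YEAR

outage_minutes = 1500                      # measured across the bounded system
cal = availability(outage_minutes)
tal = 0.999                                # customer's target availability level

budget = (1.0 - tal) * MINUTES_PER_YEAR
print(f"CAL = {cal:.4%}, TAL = {tal:.4%}")
print(f"Outage budget at TAL: {budget:.0f} min/yr; measured: {outage_minutes} min/yr")
print(f"Gap to close: {outage_minutes - budget:.0f} minutes of outage per year")

Expressing the gap in outage minutes rather than “nines” makes it actionable: each deficiency uncovered in the bounded system can be weighed by the minutes it contributes.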

Here are the technology and process areas we evaluate in the bounded system creation effort:
 Application Management
 Availability Management
 Capacity Management
 Change Management
 Metric Management
 Network Management
 Performance Management
 Problem Management
 Service Level Management
 System Recovery Management

A high-availability solution should focus on preventing and avoiding problems, in all the listed disciplines, that might lead to an interruption of service. It should also focus on recovering quickly and minimizing the impact of any outages that do occur. Taking a proactive approach requires not only the proper hardware, but also the right mix of software and services to arrive at a total solution.

The need for software and services, in addition to infrastructure, is an important point. High availability is not achieved through technology alone, although technology is a key component of any solution. Purchasing a high-cost, fault-tolerant, state-of-the-art system does not necessarily mean that your business will achieve its desired level of availability. The technology requires proper service, administration, preventative support, recovery management, and planning.

The key to any high-availability solution is knowledge and planning. Systems availability must be assessed, preventive and remedial measures taken, and outage control plans put in place. A high degree of expertise is required if the solution is to be effective. Supporting such a solution may require a wide range of services that reach far beyond the initial stages. For instance, the system and software must be kept updated to meet changing demands. Often outside vendors may provide key components of the solution, such as off-site disk mirroring, monitoring services, or fully equipped recovery facilities.

All of these aspects of high availability require serious commitment. As the requirements for systems availability increase, so does the magnitude of this commitment. Few organizations are able to muster the resources, experience, and expertise needed to properly deliver on the promise of high availability. Consequently, they must turn to others for help.

IBM Global Services Offerings

IBM Global Services is uniquely qualified to provide, support, and be an active part of high-availability solutions. It is one of the few vendors able to provide complete, end-to-end solutions for businesses and organizations in all industries. In addition to exclusive, world-class technologies, IBM offers an unparalleled breadth of high-availability skills and services. These include:
 IT consulting
 Systems and network management
 Business continuity services
 Hardware and software support services
 Operations and administrative services
 Site management and technology enablement

Depending on the need, IBM can provide everything from planning and assessments to fully operational disaster recovery centers that take over an organization’s entire data center operations in the event of catastrophe. IBM can also bring to bear its unique, proven methodologies that increase the effectiveness of its services while reducing cost. More information on IBM High Availability Services can be found at:
http://www.ibm.com/services/its/us/availability.html

4.13 Summary

In today’s world, 24x7 availability is coming to be expected by customers. The business environment is changing rapidly and radically: The globalization of business has created a worldwide, boundless marketplace, with explosive opportunities to enter new markets quickly, and serving that marketplace demands 24x7 availability. The Internet has increased both the emphasis and the opportunity in this global electronic marketplace, offering easy access and limitless choice to anyone with a PC. The consequences of unavailability can be both tangible and intangible. They include:
 Lost sales
 Lapsed customer service
 Reduced customer loyalty
 Lost employee productivity
 Damage to public image
 Fines

Technology is a key part of high availability, but it does not stop there: Planning and ongoing support of the solution are just as critical. Any environment requires an optimum balance between the cost of the availability solution and the cost of unavailability, as the sketch below illustrates. By far the most inexpensive way of improving availability, and the one that yields the greatest benefit, is to define, and use, effective systems management processes.
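
As a purely hypothetical illustration of that balance (every figure below is invented for the example), the following sketch compares the annual cost of unavailability with and without an availability solution:

    # Hypothetical cost comparison; all monetary figures and downtime
    # estimates are invented for this example.
    outage_cost_per_hour = 100_000.0     # lost sales, productivity, fines, ...
    downtime_without_solution = 40.0     # hours per year, before the solution
    downtime_with_solution = 5.0         # hours per year, after the solution
    solution_cost_per_year = 1_500_000.0

    cost_without = downtime_without_solution * outage_cost_per_hour
    cost_with = downtime_with_solution * outage_cost_per_hour + solution_cost_per_year

    print(f"Annual cost of unavailability, no solution:  ${cost_without:,.0f}")
    print(f"Annual cost, solution plus residual outages: ${cost_with:,.0f}")
    if cost_with < cost_without:
        print("The solution more than pays for itself.")
    else:
        print("The solution costs more than the outages it prevents.")

With these inputs the solution saves $2,000,000 per year net of its own cost; with different inputs the balance could tip the other way, which is exactly why the assessment described in this chapter matters.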


Related publications

The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.

IBM Redbooks

For information on ordering these publications, see “How to get IBM Redbooks” on page 263. Note that some of the documents referenced here may be available in softcopy only.
 Parallel Sysplex Application Considerations, SG24-6523
 Parallel Sysplex Continuous Availability Presentation Guide, SG24-4502
 IBM eServer zSeries 900 Technical Guide, SG24-5975
 IBM eServer zSeries 990 Technical Guide, SG24-6947
 z/OS Intelligent Resource Director, SG24-5952
 IBM ESCON Director 9032-005 Presentation, SG24-2005
 Getting Started with the INRANGE FC/9000 FICON Director, SG24-6858
 Getting Started with the McDATA Intrepid FICON Director, SG24-6857
 Getting Started with the IBM 2109 M12 FICON Director, SG24-6089
 DB2 UDB for OS/390 and Continuous Availability, SG24-5486
 DB2 for z/OS and OS/390 Version 7 Performance Topics, SG24-6129
 DB2 for z/OS and OS/390 Version 7 Selected Performance Topics, SG24-6894
 DB2 UDB for z/OS Version 8: Everything You Ever Wanted to Know, ...and More, SG24-6079
 DB2 for z/OS and OS/390 Version 7 Using the Utilities Suite, SG24-6289
 Application Design Guidelines for High Performance, SG24-2233
 WebSphere MQ in a z/OS Parallel Sysplex Environment, SG24-6864
 ICF Catalog Backup and Recovery: A Practical Guide, SG24-5644
 Parallel Sysplex: Managing Software for Availability, SG24-5451
 S/390 Parallel Sysplex Performance, SG24-4356

Other publications

These publications are also relevant as further information sources:
 MVS Setting Up a Sysplex, SA22-7625
 2064 zSeries 900 System Overview, SA22-1047
 2064 zSeries 900 PR/SM Planning Guide, SB10-7033
 IMS V8 DBRC Guide and Reference, SC27-1295
 z/OS and Sysplex Health Checker User’s Guide, SA22-7931
 z/OS MVS Planning: Operations, SA22-7601
 S/390 Parallel Enterprise Server and OS/390 Reference Guide, G326-3070
 IBM eServer zSeries 900 and z/OS Reference Guide, G326-3092
 IBM eServer zSeries 990 and z/OS Reference Guide, GM13-0229
 zSeries Capacity Backup User’s Guide, SC28-6810
 zSeries 890 and 990 Processor Resource/Systems Manager Planning Guide, SB10-7036
 DB2 UDB for z/OS V8 Installation Guide, GC18-7418
 DB2 UDB for z/OS Version 8 Administration Guide, SC18-7413
 DB2 UDB for z/OS Version 8 Data Sharing: Planning and Administration, SC18-7417
 DB2 UDB for z/OS Version 8 Utility Guide and Reference, SC18-7427
 DB2 UDB for z/OS Version 8 Application Programming and SQL Guide, SC18-7415
 DB2 Connect User’s Guide Version 8, SC09-4835
 DB2 UDB for z/OS Version 8 Command Reference, SC18-7416
 z/OS DFSMSdss Storage Administration Guide, SC35-0423
 z/OS MVS Planning: Global Resource Serialization, SA22-7600
 z/OS JES2 Initialization and Tuning Guide, SA22-7532
 z/OS JES2 Commands, SA22-7526
 z/OS DFSMSdfp Storage Administration Reference, SC26-7402
 z/OS MVS Initialization and Tuning Reference, SA22-7592
 Coupling Facility Configuration Options, GF22-5042

Online resources

These Web sites and URLs are also relevant as further information sources:
 To get the CHPID Mapping Tool:
https://app-06.www.ibm.com/servers/resourcelink/hom03010.nsf
 To get more information about IRD:
http://www-1.ibm.com/servers/eserver/zseries/zos/wlm/documents/ird/ird.html
 To get more detailed information about GDPS:
http://www-1.ibm.com/servers/eserver/zseries/library/whitepapers/gf225114.html
http://www.ibm.com/servers/eserver/zseries/announce/april2002/gdps.html
 To get more detailed information about System-Managed CF Structure Duplexing:
http://www.ibm.com/servers/eserver/zseries/library/techpapers/pdf/gm130103.pdf
 To get more detailed information about 64-bit mode:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/Flash10185
 To get a copy of the white paper z/OS Performance: Managing Processor Storage in an all Real Environment:
ftp://ftp.software.ibm.com/software/mktsupport/techdocs/allreal_v11.pdf
 To read a document discussing performance and consoles in a sysplex:
http://www.ibm.com/servers/eserver/zseries/library/techpapers/pdf/gm130166.pdf
 To read a WSC flash discussing structure placement:
http://www.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/W98029
 To access the CFSizer:
http://www.ibm.com/servers/eserver/zseries/pso/tools.html
 To download the IBM Health Checker for z/OS and Sysplex:
http://www.ibm.com/servers/eserver/zseries/zos/downloads/
 To download msys for Operations:
http://www.ibm.com/servers/eserver/zseries/zos/downloads/
 To get more information related to XCF Coupling Facility performance:
http://www-1.ibm.com/support/techdocs/atsmastr.nsf/WebIndex/FLASH10011
 To learn more about the IBMLink facility:
http://www.ibmlink.ibm.com
 To learn more about the ESO IBM service delivery mechanism:
http://service.software.ibm.com/holdata/390hdmvsinst.html#Header_3
 To learn more about the SUF IBM facility:
http://www.ibm.com/servers/eserver/zseries/zos/suf/
 To learn more about the IBM ShopzSeries facility:
http://www.ibm.com/software/ShopzSeries
 To get further information regarding Consolidated Service Test:
http://www.ibm.com/servers/eserver/zseries/zos/servicetst/
 To get further information regarding Enhanced HOLDDATA:
http://service.software.ibm.com/holdata/390holddata.html
 To get further information regarding IBM maintenance suggestions:
http://www.ibm.com/servers/eserver/zseries/library/whitepapers/psos390maint.html
 To get more information on IBM High Availability Services:
http://www.ibm.com/services/its/us/availability.html

How to get IBM Redbooks

You can search for, view, or download Redbooks, Redpapers, Hints and Tips, draft publications and Additional materials, as well as order hardcopy Redbooks or CD-ROMs, at this Web site: ibm.com/redbooks

Help from IBM

IBM Support and downloads:
ibm.com/support

IBM Global Services:
ibm.com/services


Back cover

Achieving the Highest Levels of Parallel Sysplex Availability

This IBM Redbook provides an example of the “ideal” Parallel Sysplex environment (one that is configured to deliver the highest levels of application availability), and describes the features and functions used in this environment. For each function, we briefly describe what it does and what its benefit is, and refer to the appropriate implementation documentation.

In this document, we discuss how to configure hardware and software, and how to manage systems processes, for maximum availability in a Parallel Sysplex environment. We discuss the basic concepts of continuous availability and describe a structured approach to developing and implementing a continuous availability solution.

This document provides a list of items to consider when trying to achieve near-continuous application availability, and should be used as a guide when creating a high-availability Parallel Sysplex. The information is provided in recommendations lists and will help you configure and manage your IT environment to meet your availability requirements.

This publication is intended to help customers’ systems and operations personnel and IBM systems engineers to plan, implement, and use a Parallel Sysplex in order to get closer to a goal of continuous availability. It is not intended to be a guide to implementing or using Parallel Sysplex as such; it only covers topics related to continuous availability.

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks

SG24-6061-00 ISBN 0738491721