Parallel Sysplex Availability Checklist

Total Page:16

File Type:pdf, Size:1020Kb

Parallel Sysplex Availability Checklist Parallel Sysplex Availability Checklist June 2020 Authors: David Raften Gene Sale 1 Contents Table of Contents Introduction .................................................................................................................................................. 5 Establishing availability objectives ................................................................................................................ 6 Designing a high availability Parallel Sysplex ................................................................................................ 8 Maximizing computer environment availability ......................................................................................... 10 Power ...................................................................................................................................................... 10 Cooling .................................................................................................................................................... 10 Building/Computer room design ............................................................................................................ 11 Configuring hardware for availability ......................................................................................................... 11 Availability features ................................................................................................................................ 11 Configuring for availability ...................................................................................................................... 12 Servers..................................................................................................................................................... 12 Coupling Facilities ................................................................................................................................... 13 DASD ....................................................................................................................................................... 15 Availability features ................................................................................................................................ 15 Configuring Disk for Availability .................................................................................................................. 16 Other critical devices .................................................................................................................................. 17 Server Time Protocol and Precision Time Protocol .................................................................................... 17 Server Time Protocol (STP) ..................................................................................................................... 17 IEEE 1588 Precision Time Protocol.......................................................................................................... 19 Consoles .................................................................................................................................................. 20 Maximizing software availability ................................................................................................................ 24 Configure software for high availability .................................................................................................. 24 Sysplex data sets ..................................................................................................................................... 24 Other important data sets ...................................................................................................................... 26 Sysres and Master Catalog sharing ......................................................................................................... 26 Consoles .................................................................................................................................................. 27 Naming conventions ............................................................................................................................... 27 Software symmetry ................................................................................................................................. 27 Sysplex Policies........................................................................................................................................ 28 Health Checker for z/OS and Sysplex ...................................................................................................... 29 Sysplex Failure Management (SFM)........................................................................................................ 29 2 Automatic Restart Manager (ARM) ........................................................................................................ 30 System Logger (LOGR) ............................................................................................................................. 30 Configuring individual components ............................................................................................................ 31 XCF Signaling ........................................................................................................................................... 31 XCF Controls ............................................................................................................................................ 32 GRS .......................................................................................................................................................... 32 JES2 ......................................................................................................................................................... 32 WLM ........................................................................................................................................................ 34 RACF ........................................................................................................................................................ 37 Db2 .......................................................................................................................................................... 37 Backup and Reorg Processes .............................................................................................................. 40 Application Design/Coding Techniques .................................................................................................. 42 CICS and CICSPlex SM (CP/SM) ............................................................................................................... 45 WebSphere MQ ...................................................................................................................................... 46 IMS .......................................................................................................................................................... 47 Communications Server for z/OS ................................................................................................................ 48 SNA .......................................................................................................................................................... 48 TCP/IP ...................................................................................................................................................... 49 Network .................................................................................................................................................. 50 Use of automation ...................................................................................................................................... 51 Automation tools .................................................................................................................................... 51 Message volume reduction ..................................................................................................................... 52 Managing complexity .............................................................................................................................. 53 Automation performance ....................................................................................................................... 53 Sysplex-wide automation ........................................................................................................................ 54 Startup and shutdown ............................................................................................................................ 54 Effective systems management processes ................................................................................................. 55 Availability Management ........................................................................................................................ 55 Change Management .............................................................................................................................. 55 Problem Management ............................................................................................................................ 57 Recovery Management ........................................................................................................................... 57 Crisis Management ................................................................................................................................
Recommended publications
  • MVS Planning: Workload Management
    z/OS Version 2 Release 3 MVS Planning: Workload Management IBM SC34-2662-30 Note Before using this information and the product it supports, read the information in “Notices” on page 235. This edition applies to Version 2 Release 3 of z/OS (5650-ZOS) and to all subsequent releases and modifications until otherwise indicated in new editions. Last updated: 2019-06-24 © Copyright International Business Machines Corporation 1994, 2019. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Figures................................................................................................................. xi Tables..................................................................................................................xv About this information....................................................................................... xvii Who should use this information..............................................................................................................xvii Where to find more information............................................................................................................... xvii Other referenced documents.................................................................................................................. xviii Information updates on the web...............................................................................................................xix How to send your comments to IBM.....................................................................xxi
    [Show full text]
  • MAIB Annual Report 2020
    2020 Marine Accident Recommendations and Statistics ANNUAL REPORT ANNUAL H NC RA N B IO GAT TI S INVE T DEN ACCI This document is posted on our website: www.gov.uk/maib Marine Accident Investigation Branch Email: [email protected] RINE First Floor, Spring Place, 105 Commercial Road Telephone: 023 8039 5500 MA Southampton, SO15 1GH United Kingdom 9 June 2021 MARINE ACCIDENT INVESTIGATION BRANCH The Marine Accident Investigation Branch (MAIB) examines and investigates all types of marine accidents to or on board UK vessels worldwide, and other vessels in UK territorial waters. Located in ofices in Southampton, the MAIB is a separate, independent branch within the Department for Transport (DfT). The head of the MAIB, the Chief Inspector of Marine Accidents, reports directly to the Secretary of State for Transport. © Crown copyright 2021 This publication, excluding any logos, may be reproduced free of charge in any format or medium for research, private study or for internal circulation within an organisation. This is subject to it being reproduced accurately and not used in a misleading context. The material must be acknowledged as Crown copyright and the title of the publication specified. Details of third party copyright material may not be listed in this publication. Details should be sought in the corresponding accident investigation report or [email protected] contacted for permission to use. CONTENTS CHIEF INSPECTOR'S STATEMENT 1 PART 1 - 2020: CASUALTY REPORTS TO MAIB 5 Statistical overview 5 Summary of investigations started
    [Show full text]
  • Critical Systems
    Critical Systems ©Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 3 Slide 1 Objectives ● To explain what is meant by a critical system where system failure can have severe human or economic consequence. ● To explain four dimensions of dependability - availability, reliability, safety and security. ● To explain that, to achieve dependability, you need to avoid mistakes, detect and remove errors and limit damage caused by failure. ©Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 3 Slide 2 Topics covered ● A simple safety-critical system ● System dependability ● Availability and reliability ● Safety ● Security ©Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 3 Slide 3 Critical Systems ● Safety-critical systems • Failure results in loss of life, injury or damage to the environment; • Chemical plant protection system; ● Mission-critical systems • Failure results in failure of some goal-directed activity; • Spacecraft navigation system; ● Business-critical systems • Failure results in high economic losses; • Customer accounting system in a bank; ©Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 3 Slide 4 System dependability ● For critical systems, it is usually the case that the most important system property is the dependability of the system. ● The dependability of a system reflects the user’s degree of trust in that system. It reflects the extent of the user’s confidence that it will operate as users expect and that it will not ‘fail’ in normal use. ● Usefulness and trustworthiness are not the same thing. A system does not have to be trusted to be useful. ©Ian Sommerville 2004 Software Engineering, 7th edition. Chapter 3 Slide 5 Importance of dependability ● Systems that are not dependable and are unreliable, unsafe or insecure may be rejected by their users.
    [Show full text]
  • Ask Magazine | 1
    National Aeronautics and Space Administration Academy Sharing Knowledge The NASA Source for Project Management and Engineering Excellence | APPEL SUMMER | 2013 Inside THE KNOWLEDGE MANAGEMENT JOURNEY KSC SWAMP WORKS PREDICTABLE PROJECT SURPRISES gnarta SuhsoJn amrir AoineS, ecroFr i. AS.: Utidero CtohP Utidero AS.: i. ecroFr AoineS, amrir SuhsoJn gnarta ON THE COVER Above Bear Lake, Alaska, the Northern Lights, or aurora borealis, are created by solar radiation entering the atmosphere at the magnetic poles. The appearance of these lights is just one way solar radiation affects us; it can also interfere with NASA missions in low-Earth orbit. To achieve long-duration human spaceflight missions in deeper space, several NASA centers are working to find better safety measures and solutions to protect humans from space radiation. ASK MAGAZINE | 1 Contents 21 5 9 DEPARTMENTS INSIGHTS STORIES 3 9 5 In This Issue Predictable Surprises: Back to the Future: BY DON COHEN Bridging Risk-Perception Gaps KSC Swamp Works BY PEDRO C. RIBEIRO Concerns about BY KERRY ELLIS Kennedy’s new Swamp project-threatening risks are often ignored Works encourages innovation through 4 or unspoken. experimentation and knowledge sharing. From the NASA CKO Our Knowledge Legacy BY ED HOFFMAN 21 13 The Knowledge Management Journey University Capstone Projects: BY EDWARD W. ROGERS Goddard’s CKO Small Investments, Big Rewards 42 reflects on what ten years on the job have BY LAURIE STAUBER Glenn’s partnership The Knowledge Notebook taught him. with universities generates first-rate Big Data—The Latest Organizational space-medicine research. Idea-Movement BY LAURENCE PRUSAK 25 Creating NASA’s Knowledge Map 17 BY MATTHEW KOHUT AND HALEY STEPHENSON One Person’s Mentoring Experience 44 A new online tool helps show who knows BY NATALIE HENRICH The author explains ASK Interactive what at NASA.
    [Show full text]
  • |||GET||| Group Interaction in High Risk Environments 1St Edition
    GROUP INTERACTION IN HIGH RISK ENVIRONMENTS 1ST EDITION DOWNLOAD FREE Rainer Dietrich | 9781351932097 | | | | | Looking for other ways to read this? Computational methods such as resampling and bootstrapping are also used when data are insufficient for direct statistical methods. There is little point in making highly precise computer calculations on numbers that cannot be estimated accurately Group Interaction in High Risk Environments 1st edition. Categories : Underwater diving safety. Thus, the surface-supplied diver is much less likely to have an "out-of-air" emergency than a scuba diver as there are normally two alternative air sources available. Methods for dealing with Group Interaction in High Risk Environments 1st edition risks include Provision for adequate contingencies safety factors for budget and schedule contingencies are discussed in Chapter 6. This genetic substudy included Two Sister Study participants and their parents, and some Sister Study participants who developed incident breast cancer invasive or DCIS before age 50 during the follow-up period. In terms of generalizability, our sample is predominantly non-Hispanic white women, and the results may not apply to other racial or ethnic groups. This third parameter may give some management support by establishing early warning indicators for specific serious risks, which might not otherwise have been established. ROVs may be used together with divers, or without a diver in the water, in which case the risk to the diver associated with the dive is eliminated altogether. Suitable equipment can be selected, personnel can be trained in its use and support provided to manage the foreseeable contingencies. However, taken together, there is the possibility that many of the estimates of these factors would prove to be too optimistic, leading.
    [Show full text]
  • Ten Strategies of a World-Class Cybersecurity Operations Center Conveys MITRE’S Expertise on Accumulated Expertise on Enterprise-Grade Computer Network Defense
    Bleed rule--remove from file Bleed rule--remove from file MITRE’s accumulated Ten Strategies of a World-Class Cybersecurity Operations Center conveys MITRE’s expertise on accumulated expertise on enterprise-grade computer network defense. It covers ten key qualities enterprise- grade of leading Cybersecurity Operations Centers (CSOCs), ranging from their structure and organization, computer MITRE network to processes that best enable effective and efficient operations, to approaches that extract maximum defense Ten Strategies of a World-Class value from CSOC technology investments. This book offers perspective and context for key decision Cybersecurity Operations Center points in structuring a CSOC and shows how to: • Find the right size and structure for the CSOC team Cybersecurity Operations Center a World-Class of Strategies Ten The MITRE Corporation is • Achieve effective placement within a larger organization that a not-for-profit organization enables CSOC operations that operates federally funded • Attract, retain, and grow the right staff and skills research and development • Prepare the CSOC team, technologies, and processes for agile, centers (FFRDCs). FFRDCs threat-based response are unique organizations that • Architect for large-scale data collection and analysis with a assist the U.S. government with limited budget scientific research and analysis, • Prioritize sensor placement and data feed choices across development and acquisition, enteprise systems, enclaves, networks, and perimeters and systems engineering and integration. We’re proud to have If you manage, work in, or are standing up a CSOC, this book is for you. served the public interest for It is also available on MITRE’s website, www.mitre.org. more than 50 years.
    [Show full text]
  • Z/OS Intelligent Resource Director
    Front cover z/OS Intelligent Resource Director WLM LPAR CPU Management Dynamic Channel-path Management Channel Subsystem I/O Priority Queueing Frank Kyne Michael Ferguson Tom Russell Alvaro Salla Ken Trowell ibm.com/redbooks International Technical Support Organization z/OS Intelligent Resource Director August 2001 SG24-5952-00 Take Note! Before using this information and the product it supports, be sure to read the general information in “Special notices” on page 401. First Edition (August 2001) This edition applies to Version 1 Release 1 of z/OS, Program Number 5694-A01. Note: This book is based on a pre-GA version of a product and may not apply when the product becomes generally available. We recommend that you consult the product documentation or follow-on versions of this redbook for more current information. Comments may be addressed to: IBM Corporation, International Technical Support Organization Dept. HYJ Mail Station P099 2455 South Road Poughkeepsie, NY 12601-5400 When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you. © Copyright International Business Machines Corporation 2001. All rights reserved. Note to U.S Government Users – Documentation related to restricted rights – Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp. Contents Preface . .ix The team that wrote this redbook. .ix Special notice . .xi IBM trademarks . xii Comments welcome. xii Part 1. Introduction to Intelligent Resource Director . 1 Chapter 1. Introduction to Intelligent Resource Director (IRD).
    [Show full text]
  • Zero Robotics Tournaments
    Collaborative Competition for Crowdsourcing Spaceflight Software and STEM Education using SPHERES Zero Robotics by Sreeja Nag B.S. Exploration Geophysics, Indian Institute of Technology, Kharagpur, 2009 M.S. Exploration Geophysics, Indian Institute of Technology, Kharagpur, 2009 Submitted to the Department of Aeronautics and Astronautics and the Engineering Systems Division in Partial Fulfillment of the Requirements for the Degrees of Master of Science in Aeronautics and Astronautics and Master of Science in Technology and Policy at the Massachusetts Institute of Technology June 2012 © 2012 Massachusetts Institute of Technology. All rights reserved Signature of Author ____________________________________________________________________ Department of Aeronautics and Astronautics and Engineering Systems Division May 11, 2012 Certified by __________________________________________________________________________ Jeffrey A. Hoffman Professor of Practice in Aeronautics and Astronautics Thesis Supervisor Certified by __________________________________________________________________________ Olivier L. de Weck Associate Professor of Aeronautics and Astronautics and Engineering Systems Thesis Supervisor Accepted by __________________________________________________________________________ Eytan H. Modiano Professor of Aeronautics and Astronautics Chair, Graduate Program Committee Accepted by __________________________________________________________________________ Joel P. Clark Professor of Materials Systems and Engineering Systems Acting Director,
    [Show full text]
  • Service Inquiry Into the Fatal Diving Incident at the National Diving and Activity Centre, Newport on 26 March 2018
    Service Inquiry SERVICE INQUIRY INTO THE FATAL DIVING INCIDENT AT THE NATIONAL DIVING AND ACTIVITY CENTRE, NEWPORT ON 26 MARCH 2018. NSC/SI/01/18 NAVY COMMAND OFFICIAL - SENSITIVE PART 1.1 – COVERING NOTE. NSC/SI/01/18 20 Feb 19 FLEET COMMANDER SERVICE INQUIRY INVESTIGATION INTO THE FATAL DIVING INCIDENT AT THE NATIONAL DIVING AND ACTIVITY CENTRE, NEWPORT ON 26 MARCH 2018. 1. The Service Inquiry Panel assembled at Navy Safety Centre, HMS EXCELLENT on 26 Apr 18 for the purpose of investigating the death of 30122659 LCpl Partridge on 26 Mar 18 and to make recommendations in order to prevent recurrence. The Panel has concluded its inquiries and submits the finalised report for the Convening Authority’s consideration. 2. The following inquiry papers are enclosed: Part 1 REPORT Part 2 RECORD OF PROCEEDINGS Part 1.1 Covering Note and Glossary Part 2.1 Diary of Events Part 1.2 Convening Order and TORs Part 2.2 List of Witnesses Part 1.3 Narrative of Events Part 2.3 Witness Statements Part 1.4 Findings Part 2.4 List of Attendees Part 1.5 Recommendations Part 2.5 List of Exhibits Part 1.6 Convening Authority Comments Part 2.6 Exhibits Part 2.7 List of Annexes Part 2.8 Annexes President Lt Col Army President Diving SI Members Lt Cdr RN Warrant Officer Class 1 Technical Member Diving SME Member 1.1 - 1 NSC/SI/01/18 OFFICIAL - SENSITIVE © Crown Copyright OFFICIAL - SENSITIVE Part 1.1 – Glossary 1. Those technical elements listed below without an explanation here are explained in full when they are first used.
    [Show full text]
  • Evaluating Electrical Distribution Equipment to Determine Replacement Needs
    ASHE Monograph Evaluating Electrical Distribution Equipment to Determine Replacement Needs ASHE Monograph Evaluating Electrical Distribution Equipment to Determine Replacement Needs Krista McDonald Biason, PE © 2013 ASHE American Society for Healthcare Engineering (ASHE) of the American Hospital Association 155 North Wacker Drive, Suite 400 Chicago, IL 60606 312-422-3800 [email protected] www.ashe.org ASHE members can download this monograph from the ASHE website under the Resources tab. Paper copies can be purchased from www.ashestore.com. ASHE catalog #: 055579 About the Author Krista McDonald Biason, PE, is associate vice president of HGA Architects and Engineers in Minneapolis. She focuses on electrical distribution design, specializing in health care facilities, and has designed and implemented numerous electrical infrastructure upgrades. She is a member of Code-Making Panel 13 of the National Fire Protection Association (NFPA) Technical Committee on the National Electrical Code®. NFPA Disclaimer Although Krista Biason is a member of Code-Making Panel 13 of the NFPA Technical Committee on the National Electrical Code®, the views and opinions expressed in this paper are purely those of the author and shall not be considered the official position of NFPA or any of its technical committees. Nor shall it be considered to be, nor be relied upon as, a formal interpretation of any NFPA documents. Readers are encouraged to refer to the entire text of all referenced documents. NFPA members can obtain staff interpretations of NFPA standards at www.nfpa.org. ASHE Disclaimer This document was prepared on a volunteer basis as a contribution to ASHE and is provided by ASHE as a service to its members.
    [Show full text]
  • Design to Fail: Controlling How Machinery and Systems React When Failure Happens
    Designed to Fail Controlling How Machinery and Systems React When Failure Happens Author: Melissa Simpson, P.E., Project Engineer, Envista Forensics Although the phrase “failure is not an option” may be a common philosophy, the prudent engineer will consider the effects of failure and the preferred Highlights outcome in order to mitigate dangerous or undesirable results. Though the • Foundation for Responsible future cannot be predicted with certainty, the certainty of failure is assured. Engineering • Nature of Failures When it comes to the design of machinery or systems, there is a reasonable • Physical vs. Functional Failures expectation for engineers to render a design with careful consideration to • Single Point, Cascade, and operation in both ideal and less-than ideal circumstances. Following industry standards and best practices are implicit in responsible engineering. Widespread Failures • Probability and Possibility of A Foundation for Responsible Engineering Failures • Practical Design Applications There are many large organizations that serve as a foundation for responsible • Anticipating Failure engineering for various industries and applications. For example, the International Code Council (ICC) has developed extensive requirements relevant to the construction of buildings and building systems. In addition, the American Society for Testing and Materials (ASTM), American Petroleum Institute (API), and International Electrotechnical Commission (IEC) have established best practices for the design and construction of raw materials and components. Other organizations have a more unique audience, such as an association for fabricators, manufacturers, or operators, that have publications to address specific concerns regarding failure and safety of a given application relevant to their areas of expertise. Guidelines, recommendations, videos, or case studies may be available that highlight the known risks involved in the design and operation of specific types of machinery.
    [Show full text]
  • MOBILE INTERACTION with SAFETY CRITICAL SYSTEMS: a Feasibility Study
    M¨alardalenUniversity School of Innovation Design and Engineering V¨aster˚as,Sweden Thesis for the Degree of Master of Computer Science with specialization in Embedded Systems 30.0 credits MOBILE INTERACTION WITH SAFETY CRITICAL SYSTEMS: A feasibility study Erik Jonsson [email protected] Examiner: Johan Akerberg˚ M¨alardalenUniversity, V¨aster˚as,Sweden Supervisor: Adnan Causevic M¨alardalenUniversity, V¨aster˚as,Sweden Company supervisor: Markus Wallmyr, Maximatecc, Uppsala, Sweden November 16, 2015 M¨alardalenUniversity Master Thesis Abstract Embedded systems exists everywhere around us and the number of applications seems to be ever growing. They are found in electrical devices from coffee machines to aircrafts. The common denominator is that they are designed for the specific purpose of the application. Some of them are used in safety critical systems where it is crucial that they operate correct and as intended in order to avoid accidents that can harm humans or properties. Meanwhile, general purpose Commercial Off The Shelf (COTS) devices that can be found in the retail store, such as smartphones and tablets, has become a natural part of everyday life in the society. New applications are developed every day that improves everyday living, but numerous are also coupled to specific devices in order to control its functionality. Interaction between embedded systems and the flexible devices do however not come without issues. Security, safety and ethical aspects are some of the issues that should be considered. In this thesis, a case study was performed to investigate the feasibility of using mobile COTS products in interaction with safety critical systems with respect to functional safety.
    [Show full text]