High Availability and Scalability with System z and z/OS

Joachim von Buttlar, Robert Vaupel IBM Deutschland Research & Development GmbH

© 2010 IBM Corporation

Who is Who?

▪ Joachim von Buttlar
  – System z Firmware Development
  – Joachim_von_buttlar@de..com

▪ Robert Vaupel
  – z/OS Workload Management Development and Design
  – IBM Senior Technical Staff Member
  – [email protected]

WS 2010/2011: Structure and Content

 CPU Architecture  z/OS – Register sets – Memory organization – Address space concept – Virtual storage – Task execution and serialization – Interrupt mechanism – Program communication and data exchange – Timing facilities – Data formats, data sets and I/O flow – Instruction set – Multiprocessing facilities – z/OS subsystems: TSO, ISPF, JES  I/O Architecture  z/OS Dispatching and Hiperdispatch – I/O infrastructure – Adapter types & channels  z/OS Workload Management – Control unit & devices – Extensions for large configurations  Parallel Sysplex  Partitioning and virtualization – Cluster concepts – LPAR versus z/VM – Parallel Sysplex structure and exploitation – Differences and commonalities – Data Mirroring and Global Dispersed Parallel Sysplex – Hardware facilities – Data mirroring – Storage management – Processor management – I/O management  Middleware Integration and Software Architecture

Date (always Fridays)  11:30-13:00                          14:00-15:30
22.10.2010             Introduction and Orientation         System z Architecture
5.11.2010              System z Architecture                System z Architecture
19.11.2010             System z Architecture                System z Architecture
3.12.2010              z/OS Introduction                    z/OS Introduction
17.12.2010             z/OS Dispatching and Virtualization  z/OS Dispatching and Virtualization
14.1.2011              z/OS Parallel Sysplex                z/OS Workload Management
28.1.2011              z/OS Workload Management             z/OS and System z Software Architecture
11.2.2011              Wrap-Up and Closing

What is System z?

IBM z Enterprise z196

IBM System z10 EC

System /360

A System z server is what businesses use to host the largest commercial databases, transaction servers, and applications that require a greater degree of security and availability than is commonly found on smaller-scale machines.

System z Architecture

▪ S/360 architecture is based on von Neumann's computing model:
▪ One hardware architecture
▪ One operating system
▪ For all IBM computers

▪ The S/360 architecture was defined and documented in the S/360 Principles of Operation in 1964 by:
  – Gene Amdahl
  – Fred Brooks
  – Gerrit Blaauw

[Figure: compass rose marked from 45° to 360°, illustrating S/360 = 360°]

http://publibz.boulder.ibm.com/epubs/pdf/dz9zr007.pdf

System z and z/OS History

[Figure: timeline from 1960 to 2005.
Operating systems: MFT, MVT, SVS, MVS/370, MVS/XA, MVS/ESA, OS/390, z/OS.
Hardware architectures: S/360 (introduced on 7 April 1964), S/370, S/390, z/Architecture.
Hardware milestones shown: virtual memory, symmetric multiprocessing, 2 GB addressing, expanded storage, access registers, data spaces, LPAR, Parallel Sysplex, CMOS technology, 64-bit addressing.
Software milestones shown include virtual storage, multiple virtual storage, dynamic I/O, TCP/IP, POSIX and Unix System Services, Parallel Sysplex, Workload Management, IEEE floating point, Java, WebSphere, 64-bit support, IRD, HiperDispatch, and GDPS.]

Mainframe Computing

Mainframes are computers which
  – Execute hundreds of applications
  – Connect to thousands of I/O devices
  – And serve thousands of users simultaneously

Mainframes can best be defined by their characteristics
  – The most important characteristic is reliable and predictable execution of transactions
  – Mainframes matter most for database transaction processing and as the back end of data centers

Economic Importance: Why System z and z/OS

▪ All companies that need to store huge amounts of data require
  – Security
  – Scalability
  – Compatibility
  – Availability
  – Reliability
  – Serviceability

▪ 95% of the world's 2000 largest companies use System z computers
▪ Around 65-70% of all relevant data is stored on System z computers
▪ 60% of all data accessed through the World Wide Web is stored in databases on System z (DB2, VSAM, and IMS)

High Availability and Scalability

▪ System z Hardware Overview and Introduction
▪ System z Usage in Customer Environments
▪ RAS capabilities
▪ What does High Availability and Scalability mean?

Elements of System z Architecture

▪ Central Processor Units
  – Up to 64 PUs
▪ Main Memory
  – Byte-wise addressable
  – 64-bit addressability
  – 'Shared' between all CPUs

▪ I/O Subsystem
  – 'Old': parallel (copper), 4.5 MB/sec
  – 1990: serial (fiber), 17 MB/sec
  – 1999: FICON (fiber), 270 MB/sec

▪ ESCON & FICON 'Director'
  – Switch
▪ Control Units (CU)
  – Managing unit
▪ Devices
  – Hard disk, tape, printer, etc.
▪ Network (GbE, ...)
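To put the three channel generations into perspective, the following minimal sketch computes an idealized transfer time for each nominal rate listed above. The 1000 MB data set size and the function name are assumptions chosen for illustration; protocol overhead, device speed, and contention are ignored, so real transfer times will be longer.

```python
# Nominal channel data rates in MB/sec, taken from the slide above.
CHANNEL_RATES_MB_PER_S = {
    "parallel (copper)": 4.5,
    "serial (fiber, 1990)": 17.0,
    "FICON (fiber, 1999)": 270.0,
}


def transfer_seconds(size_mb: float, rate_mb_per_s: float) -> float:
    """Idealized transfer time: size divided by nominal rate, no overhead."""
    if rate_mb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_mb / rate_mb_per_s


if __name__ == "__main__":
    size_mb = 1000.0  # hypothetical data set size, for illustration only
    for name, rate in CHANNEL_RATES_MB_PER_S.items():
        print(f"{name:22s} {transfer_seconds(size_mb, rate):8.1f} s")
```

Under these assumptions the same data set that occupies a parallel channel for several minutes moves over FICON in a few seconds.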

Heart of System z Architecture: MCM

▪ MCM = Multi-Chip Module
  – Processor Units (PU), Storage Controller (SC), SEEPROM (S) and clock functions
  – Integration increases with each generation, for example:
    • z9: 8 PUs per MCM with up to 2 cores each
    • z10: 5 PUs per MCM with up to 4 cores each
    • z196: 6 PUs per MCM with up to 4 cores each
▪ A single MCM can provide 24 processors on a z196, but a z196 can have up to 96 processors (80 usable for workloads)
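The processor counts above follow from simple multiplication. The sketch below reproduces that arithmetic; it assumes every PU on an MCM carries the maximum number of cores stated on the slide, and the function names are made up for this example.

```python
import math

# (PUs per MCM, maximum cores per PU), as listed on the slide above.
MCM_LAYOUT = {
    "z9": (8, 2),
    "z10": (5, 4),
    "z196": (6, 4),
}


def cores_per_mcm(model: str) -> int:
    """Maximum number of processor cores one MCM can provide."""
    pus, cores_per_pu = MCM_LAYOUT[model]
    return pus * cores_per_pu


def mcms_needed(model: str, total_cores: int) -> int:
    """Smallest number of fully populated MCMs that supplies total_cores."""
    return math.ceil(total_cores / cores_per_mcm(model))


if __name__ == "__main__":
    for model in MCM_LAYOUT:
        print(model, cores_per_mcm(model))
    print("MCMs for a 96-processor z196:", mcms_needed("z196", 96))
```

For the z196 this gives 6 x 4 = 24 processors per MCM, so a 96-processor system corresponds to four MCMs.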

z196 PU chip, SC chip and MCM

z196 Book Layout

[Figure: z196 book layout, front and rear views. Labels: MCM at 1800 W (refrigeration cooled with backup air plenum, or water cooled); memory (16 DIMMs and 14 DIMMs, 100 mm high); 8 I/O fanout cards; 2 FSP cards; 3 DCA power supplies; 11 VTM card assemblies (8 vertical, 3 horizontal); cooling from/to MRU.]

z196 Water Cooled: Under the Covers (Model M66 or M80), Front View

z196 Frames

▪ On z196: traditional System z operating systems: z/OS, z/VSE, z/VM
▪ On z196 blade extensions: POWER7 blades, System x blades
▪ Integration via Unified Resource Manager

Growth of System z Servers

▪ Growth encompasses
  – Speed: from z900 (770 MHz) to z196 (5.2 GHz)
  – Integration of processors and chips on the same MCM
  – Number of MCMs per system
  – And now with z196
    • Integration of blade servers

[Figure: capacity growth of System z servers in PCI (Processor Capacity Index). Chart notes: z/OS release used for LSPR measurements; z196 measurements are for an xx-way.]

A typical System z could look like this

[Figure: example configuration of a System z Enterprise Server. Several LPARs run z/OS, z/VSE, and z/VM images with their workloads, including Linux guests. The LPARs are served by standard processors (CP1 to CP4), offload engines (zIIP, zAAP), and Linux engines (IFL1 to IFL3).]

System z Processor Characterization

▪ Central Processor (CP)
  – Provides processing capacity for the z/Architecture and ESA/390 instruction sets
  – Runs z/OS, z/VM, z/VSE, z/TPF, Linux for System z
▪ System Assist Processor (SAP)
  – SAPs manage the start and end of I/O operations for all LPARs and all attached I/O
  – Each machine has at least one SAP
▪ Internal Coupling Facility (ICF, since 1997)
  – Provides additional processing capacity for the execution of the Coupling Facility Control Code (CFCC) in a CF LPAR
▪ Integrated Facility for Linux (IFL, since 2001)
  – Provides additional processing capacity for Linux workloads
▪ IBM System z Application Assist Processor (zAAP, since 2004)
  – Provides additional processing capacity for Java workloads under z/OS
▪ IBM System z Integrated Information Processor (zIIP, since 2006)
  – Provides additional processing capacity for certain DB2 workloads under z/OS
▪ Spares
  – Provide extra processing capacity if any PU fails

▪ SAP, ICF, IFL, zAAP, zIIP offer the same functionality as CPs
▪ Lower price than CP
▪ Do not affect traditional System z software charges

Why is System z different?

▪ Many different types of workloads
▪ Business-critical workloads
▪ Running systems at very high utilizations
▪ Access to systems is always required

System z Quality of Services

▪ RAS
  – Reliability
  – Availability
  – Serviceability
▪ Security / Integrity
▪ Scalability
▪ Manageability
  – Centralized control
  – Workload management
▪ Virtualization / Partitioning Technology
  – Workload separation
▪ Capacity
  – Evolving architecture
▪ Flexibility / Variety
  – Multiple workloads, multiple users
▪ Compatibility
▪ Capability
  – Autonomic features

System z: RAS Design Focus

▪ High Availability (HA)
  – The attribute of a system designed to provide service during defined periods at acceptable or agreed-upon levels and to mask UNPLANNED OUTAGES from end users. It employs fault tolerance, automated failure detection, recovery, bypass reconfiguration, testing, and problem and change management.
▪ Continuous Operations (CO)
  – The attribute of a system designed to operate continuously and to mask PLANNED OUTAGES from end users. It employs non-disruptive hardware and software changes, non-disruptive configuration, and software coexistence.
▪ Continuous Availability (CA)
  – The attribute of a system designed to deliver non-disruptive service to the end user 24 HOURS A DAY, 7 days a week (there are no planned or unplanned outages). It includes the ability to recover from a site disaster by switching computing to a second site.

[Figure: relationship between High Availability, Continuous Operations, and Continuous Availability]
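The availability levels discussed above are often quantified as a fraction of time in service. The sketch below is not taken from the slides: it assumes the common steady-state formula availability = MTBF / (MTBF + MTTR) and a 365-day year, and the function names are hypothetical. It shows how an availability target translates into a yearly downtime budget.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, leap years ignored (assumption)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures and
    mean time to repair (textbook formula, assumed here)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(avail: float) -> float:
    """Hours per year a system may be down at the given availability."""
    if not 0.0 <= avail <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999, 0.99999):
        minutes = downtime_hours_per_year(target) * 60
        print(f"{target:.5f} -> {minutes:9.1f} minutes of downtime per year")
```

Each additional "nine" cuts the permitted downtime by a factor of ten, which is why masking both planned and unplanned outages becomes necessary at the highest levels.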

Business Issue of “Non-Availability“

▪ On demand challenges
  – Downtime unaffordable
  – Heterogeneous by nature
  – Complex to manage

▪ Loss of business
▪ Loss of customers
  – The competition is just a mouse click away
▪ Loss of credibility, brand image and stock value

[Figure: Unplanned Outage Causes (IDC 2005). Pie chart with segments labeled Application Failures, Hardware Failures, and Operator Errors; percentages shown: 30%, 45%, 25%.]

E.g. “Toll Collect”: The German state and the company collecting tolls on the autobahn agreed on a contractual penalty of €30 million for each hour of downtime (equivalent to €500,000 per minute).
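The per-minute figure follows directly from the hourly penalty. A minimal sketch of the arithmetic is shown below; it assumes, purely for illustration, that the penalty is pro-rated linearly by the minute, which the slide does not state, and the function name is hypothetical.

```python
PENALTY_EUR_PER_HOUR = 30_000_000  # contractual penalty quoted above


def penalty_eur(downtime_minutes: float) -> float:
    """Penalty for an outage, assuming linear pro-rating per minute
    (an assumption made for this example)."""
    if downtime_minutes < 0:
        raise ValueError("downtime must be non-negative")
    return downtime_minutes * PENALTY_EUR_PER_HOUR / 60


if __name__ == "__main__":
    for minutes in (1, 15, 60, 90):
        print(f"{minutes:3d} min -> EUR {penalty_eur(minutes):,.0f}")
```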

Continuous Availability / Disaster Recovery

Single System
• MTBF in decades
• Built-in redundancy
• On/Off Capacity on Demand
• Capacity Backup
• Hot-pluggable I/O

Clustering in a Box
• Using an ICF, a single-CEC (Central Electronic Complex) Parallel Sysplex can be defined
• Maintenance on LPAR without loss of data
• Protection from software outages

Parallel Sysplex (1 to 32 systems)
• Addresses planned and unplanned HW/SW outages
• Flexible, non-disruptive growth
• Capacity beyond largest CEC
• Scales better than SMPs
• Dynamic workload / resource management

Geographically Dispersed Parallel Sysplex (Site 1, Site 2)
• Addresses site failure / maintenance
• Metro / Global data mirroring
• Sync (PPRC): 100 km
• Async (XRC): any distance
• Eliminates tape / disk Single Point of Failure (SPOF)
• No / some data loss
• Application independent
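One way to see why synchronous mirroring (PPRC) is tied to a distance such as 100 km while asynchronous mirroring (XRC) works over any distance: a synchronous write must wait for the remote acknowledgment, so every write pays at least one round trip. The sketch below estimates that propagation delay. The signal speed of about 200,000 km/s in optical fiber is a rule-of-thumb assumption introduced here, not a figure from the slides, and switch, protocol, and storage controller delays are ignored.

```python
FIBER_KM_PER_S = 200_000.0  # assumed signal speed in optical fiber


def round_trip_ms(distance_km: float) -> float:
    """Minimum propagation delay in milliseconds for one round trip
    between two sites that are distance_km apart."""
    if distance_km < 0:
        raise ValueError("distance must be non-negative")
    return 2.0 * distance_km / FIBER_KM_PER_S * 1000.0


if __name__ == "__main__":
    for km in (10, 100, 1000):
        print(f"{km:5d} km -> at least {round_trip_ms(km):.2f} ms added per write")
```

Under these assumptions, 100 km adds about one millisecond to each synchronous write, and the penalty grows linearly with distance. Asynchronous mirroring avoids the wait, at the price of possibly losing the most recent updates.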

Scalability

[Figure: chart for System z9 Model S08 with a vertical scale from 0 to 600, showing 1-way to 8-way configurations for the capacity settings 4xx, 5xx, 6xx, and 7xx.]

▪ Scale-up example for System z9
▪ Allows installations to choose the capacity they need in a granular fashion and to grow when business needs require it

What does the Course Encompass?

▪ How high availability and scalability are implemented on System z
  – System z technology and hardware
  – Operating system (z/OS) and partitioning technology
    • Focus: Dispatching
  – Cluster technology to achieve continuous availability
    • Parallel Sysplex
  – Capability to execute many different workloads at the same time and meet business objectives
    • Workload Management
  – Integration of software, operating system and hardware

What to Remember?

▪ Which technologies were developed to achieve high availability and scalability
▪ Why each technology is exploited in System z
▪ At a high level
  – How software, operating system and hardware work together
  – And why they work together

Literature

▪ Introduction to the New Mainframe: Large-Scale Commercial Computing
  – http://www.redbooks.ibm.com/abstracts/sg247175.html?Open
▪ ABCs of z/OS System Programming, Volume 11
  – http://www.redbooks.ibm.com/abstracts/sg246327.html
▪ Documents for Workload Management
  – http://www-03.ibm.com/servers/eserver/zseries/zos/wlm/documents/
    • z/OS Workload Manager: How It Works and How To Use It, April 2004
  – http://www.research.ibm.com/journal/sj/362/aman.html
    • Adaptive algorithms for managing a distributed data processing workload
▪ Das Betriebssystem z/OS und zSeries, M. Teuffel, R. Vaupel, ISBN 3-486-27528-3

