High Availability and Scalability with System z and z/OS
Joachim von Buttlar, Robert Vaupel IBM Deutschland Research & Development GmbH
© 2010 IBM Corporation
Who is Who?
Joachim von Buttlar – System z Firmware Development – Joachim_von_buttlar@de.ibm.com
Robert Vaupel – z/OS Workload Management Development and Design – IBM Senior Technical Staff Member – [email protected]
WS 2010/2011: Structure and Content
CPU Architecture
– Register sets
– Memory organization
– Interrupt mechanism
– Timing facilities
– Instruction set
– Multiprocessing facilities
z/OS
– Address space concept
– Virtual storage
– Task execution and serialization
– Program communication and data exchange
– Data formats, data sets and I/O flow
– z/OS subsystems: TSO, ISPF, JES
I/O Architecture
– I/O infrastructure
– Adapter types & channels
– Control unit & devices
– Extensions for large configurations
z/OS Dispatching and Hiperdispatch
z/OS Workload Management
Partitioning and virtualization
– LPAR versus z/VM
– Differences and commonalities
– Hardware facilities
– Storage management
– Processor management
– I/O management
Parallel Sysplex
– Cluster concepts
– Parallel Sysplex structure and exploitation
– Data Mirroring and Globally Dispersed Parallel Sysplex
– Data mirroring
Middleware Integration and Software Architecture
Date (always Fridays)   11:30-13:00                          14:00-15:30
22.10.2010              Introduction and Orientation         System z Architecture
5.11.2010               System z Architecture                System z Architecture
19.11.2010              System z Architecture                System z Architecture
3.12.2010               z/OS Introduction                    z/OS Introduction
17.12.2010              z/OS Dispatching and Virtualization  z/OS Dispatching and Virtualization
14.1.2011               z/OS Parallel Sysplex                z/OS Workload Management
28.1.2011               z/OS Workload Management             z/OS and System z Software Architecture
11.2.2011               Wrap-Up and Closing
What is System z?
IBM zEnterprise 196 (z196)
IBM System z10 EC
System /360
A System z server is what businesses use to host the largest commercial databases, transaction servers, and applications that require a greater degree of security and availability than is commonly found on smaller-scale machines.
System z Architecture
S/360 architecture is based on von Neumann's computing model
S/360 = 360°: One hardware architecture, one operating system, for all IBM computers
S/360 architecture was defined and documented in the S/360 Principles of Operation in 1964 by:
– Gene Amdahl
– Fred Brooks
– Gerrit Blaauw
http://publibz.boulder.ibm.com/epubs/pdf/dz9zr007.pdf
System z and z/OS History
[Timeline figure, 1960 to 2005]
Operating systems: MFT, MVT, SVS, MVS/370, MVS/XA, MVS/ESA, OS/390, z/OS
Operating system labels shown include: Virtual Storage, Multiple Virtual Storage, Address Spaces, 16 MB, 2 GB, Expanded Storage, Dynamic I/O, Parallel Sysplex, Workload Management, Unix System Services, POSIX, TCP/IP, Java, WebSphere, IEEE Float, 64 bit, IRD, Hiperdispatch, Security, GDPS
Hardware architectures: S/360 (introduced 7 April 1964), S/370, S/390, z/Architecture
Hardware labels shown include: Virtual Memory, Symmetric Multi Processing, 2 GB Addressing, Expanded Storage, Access Registers, Data-spaces, LPAR, CMOS Technology, Parallel Sysplex, 64-bit Addressing
Mainframe Computing
Mainframes are computers which – Execute hundreds of applications – Connect to thousands of I/O devices – And serve thousands of users simultaneously
Mainframes can best be defined by their characteristics – The most important characteristic is the reliable and predictable execution of transactions – Mainframes matter most for database transaction processing and as the back end in data centers
Economic Importance: Why System z and z/OS
All companies which have the need to store huge amounts of data require – Security – Scalability – Compatibility – Availability – Reliability – Serviceability
95% of the world's 2,000 largest companies use System z computers
Around 65-70% of all relevant data is stored on System z computers
60% of all data accessed through the World Wide Web is stored in databases on System z (DB2, VSAM, and IMS)
High Availability and Scalability
System z Hardware Overview and Introduction System z Usage in Customer Environments RAS capabilities What does High Availability and Scalability mean?
Elements of System z Architecture
Central Processor Units
– Up to 64 PUs
Main Memory
– Byte-wise addressable
– 64-bit addressability
– Shared between all CPUs
I/O Subsystem
– 'Old': parallel (copper), 4.5 MB/sec
– 1990: serial (fiber), 17 MB/sec
– 1999: FICON (fiber), 270 MB/sec
ESCON & FICON 'Director'
– Switch
Control Units (CU)
– Managing unit
Devices
– Hard disk, tape, printer, etc.
Network (GbE, ...)
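To make the three channel generations listed above easier to compare, the following minimal Python sketch computes an idealized transfer time from the nominal data rates on the slide. The payload size and the function name are hypothetical, and the calculation ignores protocol overhead, so it is only an illustration of the ratios, not a performance statement.

```python
# Nominal channel data rates as quoted on the slide, in MB/sec.
CHANNEL_RATES_MB_PER_SEC = {
    "parallel (copper)": 4.5,
    "serial (fiber, 1990)": 17.0,
    "FICON (fiber, 1999)": 270.0,
}


def transfer_seconds(size_mb: float, rate_mb_per_sec: float) -> float:
    """Idealized transfer time: payload size divided by the nominal rate."""
    if rate_mb_per_sec <= 0:
        raise ValueError("rate must be positive")
    return size_mb / rate_mb_per_sec


if __name__ == "__main__":
    payload_mb = 1000.0  # hypothetical payload, chosen only for illustration
    for name, rate in CHANNEL_RATES_MB_PER_SEC.items():
        print(f"{name}: {transfer_seconds(payload_mb, rate):.1f} s")
```

By these nominal figures, the 1999 FICON rate is 60 times the rate of the parallel copper channel.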
Heart of System z Architecture: MCM
MCM = Multi Chip Module
– Processor Units (PU), Storage Controller (SC), SEEPROM (S) and clock functions
– Integration increases with each generation, for example:
• z9: 8 PUs per MCM with up to 2 cores
• z10: 5 PUs per MCM with up to 4 cores
• z196: 6 PUs per MCM with up to 4 cores
A single MCM can provide 24 processors on a z196, but a z196 can have up to 96 processors (80 usable for workloads)
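The processor counts above follow from simple multiplication. The sketch below reproduces them, assuming that "PUs per MCM" means PU chips and that each chip carries the stated maximum number of cores. The number of MCMs in a fully configured z196 is not given on the slide, so it is derived here from 96 / 24. All names are illustrative.

```python
# Figures taken from the slide; the interpretation (chips x cores) is an assumption.
PU_CHIPS_PER_MCM = {"z9": 8, "z10": 5, "z196": 6}
MAX_CORES_PER_PU_CHIP = {"z9": 2, "z10": 4, "z196": 4}

Z196_MAX_PROCESSORS = 96           # maximum stated on the slide
Z196_MAX_WORKLOAD_PROCESSORS = 80  # usable for workloads, per the slide


def processors_per_mcm(generation: str) -> int:
    """Maximum processors on one MCM: PU chips times cores per chip."""
    return PU_CHIPS_PER_MCM[generation] * MAX_CORES_PER_PU_CHIP[generation]


def mcms_for(total_processors: int, generation: str) -> int:
    """Number of MCMs needed to reach a processor count (rounded up)."""
    per_mcm = processors_per_mcm(generation)
    return -(-total_processors // per_mcm)  # ceiling division


if __name__ == "__main__":
    print(processors_per_mcm("z196"))                          # 24
    print(mcms_for(Z196_MAX_PROCESSORS, "z196"))               # 4
    print(Z196_MAX_PROCESSORS - Z196_MAX_WORKLOAD_PROCESSORS)  # 16
```

The 16 processors that are not available for workloads would cover roles such as SAPs and spares, which the processor characterization slide describes.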
z196 PU chip, SC chip and MCM
z196 Book Layout
[Figure: z196 book, front and rear views. Labeled components: MCM @ 1800 W (refrigeration cooled or water cooled), backup air plenum, 8 I/O fanout cards, 2 FSP cards, memory (16x and 14x DIMMs, 100 mm high), 3x DCA power supplies, 11 VTM card assemblies (8 vertical, 3 horizontal), cooling from/to MRU]
z196 Water cooled: Under the covers (Model M66 or M80), front view
z196 Frames
On z196: Traditional System z operating systems: z/OS, Linux, z/VSE, z/VM
On z196 Blade Extensions: POWER7 blades, System x blades
Integration via Unified Resource Manager
Growth of System z Servers
Growth encompasses – Speed: from z900 (770MHz) to z196 (5.2 GHz) – Integration of processors and chips on same MCM – Number of MCMs per system – And now with z196 • Integration of Blade Server
[Chart: System z server capacity in PCI (Processor Capacity Index), by z/OS release used for LSPR measurements; z196 measurements are for a xx-way]
A typical System z could look like this
[Figure: a System z Enterprise Server divided into LPARs running z/OS, z/VM and z/VSE, with Linux guests under z/VM. Processors shown: standard processors (CP1 to CP4), offload engines (zIIP, zAAP) and Linux engines (IFL1 to IFL3)]
System z Processor Characterization
Central Processor (CP)
– Provides processing capacity for the z/Architecture and ESA/390 instruction sets
– Runs z/OS, z/VM, z/VSE, z/TPF, Linux for System z
System Assist Processor (SAP)
– SAPs manage the start and end of I/O operations for all LPARs and all attached I/O
– Each machine has at least one SAP
Internal Coupling Facility (ICF, since 1997)
– Provides additional processing capacity for the execution of the Coupling Facility Control Code (CFCC) in a CF LPAR
Integrated Facility for Linux (IFL, since 2001)
– Provides additional processing capacity for Linux workloads
IBM System z Application Assist Processor (zAAP, since 2004)
– Provides additional processing capacity for Java workloads under z/OS
IBM System z Integrated Information Processor (zIIP, since 2006)
– Provides additional processing capacity for certain DB2 workloads under z/OS
Spares
– Provide extra processing capacity in case of a failure of any PU

SAP, ICF, IFL, zAAP and zIIP offer the same functionality as CPs, but
– Have a lower price than a CP
– Do not affect traditional System z software charges
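The characterization above can be captured as a small lookup table. The following Python sketch is one possible way to model it. The class and field names are hypothetical, and the `affects_software_charges` flag encodes only what the slide states: CPs count toward traditional software charges, while the five listed engine types do not. Spares are left out because the slide does not characterize them in this respect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PuType:
    abbreviation: str
    purpose: str
    affects_software_charges: bool  # per the slide: True only for CPs


PU_TYPES = [
    PuType("CP", "z/Architecture and ESA/390 instruction sets", True),
    PuType("SAP", "start and end of I/O operations", False),
    PuType("ICF", "Coupling Facility Control Code in a CF LPAR", False),
    PuType("IFL", "Linux workloads", False),
    PuType("zAAP", "Java workloads under z/OS", False),
    PuType("zIIP", "certain DB2 workloads under z/OS", False),
]


def engines_without_charge_impact() -> list:
    """Abbreviations of PU types that do not affect software charges."""
    return [p.abbreviation for p in PU_TYPES if not p.affects_software_charges]


if __name__ == "__main__":
    print(engines_without_charge_impact())
```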
Why is System z different?
Many different types of workloads Business Critical workloads Running systems at very high utilizations Access to systems is always required
System z Quality of Services
RAS – Reliability – Availability – Serviceability Security / Integrity Scalability Manageability – Centralized control – Workload management Virtualization / Partitioning Technology – Workload separation Capacity – Evolving architecture Flexibility / Variety – Multiple workloads, multiple users Compatibility Capability – Autonomic features
System z: RAS Design Focus
High Availability (HA)
– The attribute of a system designed to provide service during defined periods at acceptable or agreed-upon levels, and to mask UNPLANNED OUTAGES from end users. It employs fault tolerance, automated failure detection, recovery, bypass reconfiguration, testing, and problem and change management.
Continuous Operations (CO)
– The attribute of a system designed to operate continuously and to mask PLANNED OUTAGES from end users. It employs non-disruptive hardware and software changes, non-disruptive configuration, and software coexistence.
Continuous Availability (CA)
– The attribute of a system designed to deliver non-disruptive service to the end user 7 days a week, 24 HOURS A DAY (there are no planned or unplanned outages). It includes the ability to recover from a site disaster by switching computing to a second site.
[Figure: relationship between High Availability, Continuous Operations and Continuous Availability]
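The three definitions differ in which kind of outage is masked from the end user. As a minimal sketch of that distinction, the Python function below maps the two masking properties to the matching term. The function and parameter names are hypothetical, and the site-disaster aspect of Continuous Availability is deliberately not modeled.

```python
def availability_attribute(masks_unplanned_outages: bool,
                           masks_planned_outages: bool) -> str:
    """Name the RAS attribute that matches the outages a system masks."""
    if masks_unplanned_outages and masks_planned_outages:
        return "Continuous Availability"  # no planned or unplanned outages
    if masks_unplanned_outages:
        return "High Availability"        # unplanned outages are masked
    if masks_planned_outages:
        return "Continuous Operations"    # planned outages are masked
    return "none of the three"


if __name__ == "__main__":
    print(availability_attribute(True, False))
    print(availability_attribute(False, True))
    print(availability_attribute(True, True))
```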
Business Issue of “Non-Availability”
On demand challenges – Downtime unaffordable – Heterogeneous by nature – Complex to manage
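Loss of business
Loss of customers – the competition is just a mouse click away
Loss of credibility, brand image and stock value
[Pie chart, source IDC 2005: Unplanned Outage Causes. Categories: Application Failures, Hardware Failures, Operator Errors. Shares shown: 30%, 45%, 25%; the extracted text does not show which share belongs to which category]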
Example “Toll Collect”: the German state and the company collecting tolls on the autobahn agreed on a contractual penalty of €30 million for each hour of downtime (equivalent to €500,000 per minute).
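The per-minute figure in the Toll Collect example is the hourly penalty divided by 60. The short Python sketch below reproduces that arithmetic and extends it to an outage of arbitrary length. The function names are illustrative, and linear scaling by the minute is an assumption; the slide states only the hourly penalty.

```python
PENALTY_EUR_PER_HOUR = 30_000_000  # contractual penalty quoted on the slide


def penalty_per_minute(penalty_per_hour: float) -> float:
    """Convert an hourly penalty into the equivalent per-minute amount."""
    return penalty_per_hour / 60


def penalty_for_outage(minutes: float,
                       penalty_per_hour: float = PENALTY_EUR_PER_HOUR) -> float:
    """Penalty for an outage of the given length, assuming linear scaling."""
    return minutes * penalty_per_minute(penalty_per_hour)


if __name__ == "__main__":
    print(penalty_per_minute(PENALTY_EUR_PER_HOUR))  # 500000.0
    print(penalty_for_outage(90))                    # 45000000.0
```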
Continuous Availability / Disaster Recovery
Single System
• MTBF – in decades
• Built-in redundancy
• On/Off Capacity on Demand
• Capacity Backup
• Hot pluggable I/O

Clustering in a Box
• Using an ICF, a single CEC (Central Electronic Complex) Parallel Sysplex can be defined
• Maintenance on LPAR without loss of data
• Protection from software outages

Parallel Sysplex (1 to 32 systems)
• Addresses planned and unplanned HW/SW outages
• Flexible, non-disruptive growth
• Capacity beyond largest CEC
• Scales better than SMPs
• Dynamic workload / resource management

Geographically Dispersed Parallel Sysplex (Site 1, Site 2)
• Addresses site failure / maintenance
• Metro / Global data mirroring
• Sync (PPRC) – 100 km
• Async (XRC) – any distance
• Eliminates tape / disk Single Point of Failure (SPOF)
• No / Some data loss
• Application independent
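The two mirroring options named for the two-site configuration differ mainly in the distance they cover: Sync (PPRC) up to 100 km, Async (XRC) at any distance. The Python sketch below encodes only that distance rule. The function name is hypothetical, and a real choice would also weigh the data-loss and performance trade-offs, which are not modeled here.

```python
SYNC_MAX_DISTANCE_KM = 100.0  # distance limit quoted on the slide for Sync (PPRC)


def mirroring_mode(site_distance_km: float) -> str:
    """Pick a mirroring mode from the site distance alone (illustrative rule)."""
    if site_distance_km < 0:
        raise ValueError("distance must not be negative")
    if site_distance_km <= SYNC_MAX_DISTANCE_KM:
        return "Sync (PPRC)"
    return "Async (XRC)"


if __name__ == "__main__":
    for km in (20, 100, 600):
        print(km, "km ->", mirroring_mode(km))
```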
Scalability
[Bar chart: capacity of System z9 Model S08 for capacity settings 4xx, 5xx, 6xx and 7xx, from 1-way to 8-way; vertical axis from 0 to 600]
Scale-up Example for System z9 Allows installations to choose the capacity they need in a granular fashion and to grow when business needs require it
What does the Course Encompass?
How High Availability and Scalability are implemented on System z
– System z Technology and Hardware
– Operating System (z/OS) and Partitioning Technology
• Focus: Dispatching
– Cluster Technology to achieve Continuous Availability
• Parallel Sysplex
– Capability to execute many different workloads at the same time and meet business objectives
• Workload Management
– Integration of Software, Operating System and Hardware
What to Remember?
Which technology steps were invented to reach high availability and scalability
Why a technology is exploited in System z
On a high level
– How software, operating system and hardware work together
– And why they work together
Literature
Introduction to the New Mainframe: Large-Scale Commercial Computing
– http://www.redbooks.ibm.com/abstracts/sg247175.html?Open
ABCs of z/OS System Programming Volume 11
– http://www.redbooks.ibm.com/abstracts/sg246327.html
Documents for Workload Management
– http://www-03.ibm.com/servers/eserver/zseries/zos/wlm/documents/
• z/OS Workload Manager: How It Works and How To Use It, April 2004
– http://www.research.ibm.com/journal/sj/362/aman.html
• Adaptive algorithms for managing a distributed data processing workload
Das Betriebssystem z/OS und zSeries, M. Teuffel, R. Vaupel, ISBN 3-486-27528-3