By Bill Bitner & Brian Wade, Ph.D.

he z/VM product has its roots in an IBM product named VM/370, released in August 1972. VM/370 grew from earlier unsold versions of virtual machine technology dating back to 1967. This article Texplores some of the current product’s constructs and concepts. Let’s start with the basics. Virtual Machines The IBM System z processors operate according to a data processing machine architecture defined in z/Architecture Principles of Operation. This book explains the machine’s instruction set, its I/O pro- cessing rules, and how it runs programs. Engineers implement z/Architecture in IBM System z machines such as those in the IBM z9 family. The role of many kernels is to create well-defined program execution con- texts in which applications can run concurrently without interfering with one another. For z/VM, these execution contexts are called virtual machines. Where an ordinary operating system might create execution contexts for application programs, z/VM creates execution contexts for System z operating systems. In other words, z/VM’s task is to create virtual machines adhering to z/Architecture. z/VM can create nearly any virtual configuration that could legitimately exist in the real hardware. Further, z/VM lets many such virtual machines exist and operate simultaneously on one instance of real hardware; in fact, in the most interesting cases, z/VM overcommits the real hardware. A virtual machine also is often known as a VM user ID, a VM logon, a VM guest, or a virtual . Just as a real System z machine consists of real processors, real memory, and real I/O devices, a virtual machine consists of virtual processors, , and virtual I/O devices. For example, z/VM might equip a virtual machine with two virtual processors, 1GB of virtual memory, a virtual Ethernet card, and a virtual 3270 terminal device. Because of the histories of z/VM and its guests, some of the virtual devices found in a typical virtual machine’s configuration correspond to real devices IBM no longer manufactures. For example, each virtual machine will typically have a virtual card reader, >

z/Journal • February/March  0 0 8 • 3 9 virtual card punch, and virtual printer. Virtualization of Processors called a SIE exit or a SIE break. CP Many guests still effectively use these Because z/Architecture defines that responds to the SIE break by handling antiquated virtual devices even today. a System z data processing system can the condition and eventually redispatch- Various System z operating systems house one to 64 processors, each virtual ing the virtual processor by issuing can run in virtual machines. One, machine z/VM creates can have one to another SIE instruction. Conversational Monitor System (CMS), 64 virtual processors. Usually, these are A customer obtains best utility from is sold as part of z/VM. CMS is an inter- all general-purpose instruction proces- a System z machine when the machine active, single-user operating system sors. However, z/VM does let the cus- spends its time in SIE running guests, meant to run in a virtual machine. tomer define some of a guest’s virtual instead of outside of SIE running CP. Though for many years CMS and its processors to be specialty processors z/VM performance engineers have applications were a mainstay of IBM’s such as System z Application Assist means to keep track of CPU time spent offerings in interactive general-purpose Processors (zAAPs), System z Integrated inside and outside of SIE and to keep computing, CMS recently has found its Information Processors (zIIPs), or track of the reasons why virtual proces- niche in supporting applications meant System z Integrated Facility for sors leave SIE. An important aspect of to help manage the z/VM system. The (IFL) processors. z/VM dispatches z/VM performance management is z/VM TCP/IP stack, for example, runs guests’ virtual processors on the proces- building guest operating systems and in a CMS virtual machine. CMS isn’t the sors of the System z machine it’s con- their applications to be aware they’re only System z operating system that can trolling. running virtually and to accomplish run virtually. Other System z operating z/VM provides control over processor their work using techniques that tend to systems, such as z/OS, z/VSE, z/TPF, resources by letting a system administra- promote remaining in SIE. IBM has and Linux for System z, can all run in tor assign a value to each virtual spent many years tuning its long-lived virtual machines on z/VM. machine. This share value typically sets System z operating systems, such as The heart of z/VM is a multi-pro- the minimum amount of processor CMS, z/OS, and z/VSE, to use SIE- gramming, multi-processing operating resource a virtual machine can expect CP friendly techniques. Linux for System z, system kernel known as the Control to allocate to it. z/VM also lets the system being comparatively young, is still Program (CP). (Be careful when using administrator define a maximum share evolving. the term “CP,” for in some communities, value to prevent a guest from excessively “CP” means “central processor.”) CP is consuming processor resource. The Virtualization of Memory the component of z/VM that creates z/VM system administrator or system Memory can be defined in the CP and dispatches virtual machines on the operator can adjust share settings while directory or can be changed with the real System z hardware. virtual machines are running. CP DEFINE command. The virtual CP supports hundreds of com- machine memory is usually defined as a mands. Some control the configura- The Heart of Virtualization contiguous range. However, one can tion of the virtual machine. For System z virtualization depends on a define gaps in the memory. This can be example, the CP DEFINE command hardware instruction called Start helpful in some situations. lets a customer create a virtual device Interpretive Execution (SIE, pronounced Of course, all guest memory is vir- and add it to a virtual machine’s con- “sigh”). This instruction gives CP a tual. To virtualize memory, z/VM har- figuration. Though CP lets customers means to tell System z hardware to run nesses the System z processor’s Dynamic change virtual machine configurations a virtual processor. CP and the System z Address Translation (DAT) facility. nearly at will, customers usually define hardware cooperate in their use of a With DAT, the System z processor trans- their virtual machines ahead of time memory-resident control structure— lates each instruction address or oper- in a user enrollment and definition file called the SIE State Descriptor or the and address from virtual to real, using called the CP directory. The CP direc- SIE Block—to track the state of the vir- translation tables CP maintains. CP tory describes virtual machines and tual processor. The SIE block contains overcommits physical memory by keep- sets attributes for each one, such as such things as the virtual processor’s ing resident only those guest pages that number of virtual processors, amount last register values, the pointers to the appear to have been needed in the of virtual memory, and complement of owning guest’s DAT tables, and other recent past, constantly adjusting the disk devices. state information that lets a virtual pro- DAT tables to preserve the memory Because of the nesting capabilities of cessor continue to run instructions. The illusion for the guests. When physical IBM’s System z virtualization technolo- address of the SIE block is an operand memory is scarce, CP moves stagnant gy, the System z computer on which of the SIE instruction. guest pages first to expanded storage (a z/VM perceives it’s running might itself Owing to the completeness of the high-speed page storage buffer) and be mythical. A commonly seen nesting virtual processor description stored in eventually to disk. CP brings these pages situation is the one where z/VM runs in the SIE block, the System z hardware back to memory if the guest ever needs a (LPAR) of a System z can handle many of the virtual proces- them again. computer. A System z hardware facility sor’s needs without requiring the help of CP manages the level of physical called the LPAR creates and CP. However, some conditions, such as memory overcommitment by consider- manages logical partitions in much the occurrence of a page fault or the guest’s ing a guest’s apparent memory need as same way that z/VM creates and man- use of an I/O instruction, are too com- part of deciding whether to run (dis- ages virtual machines. System z plex for the hardware to handle on its patch) the guest. If CP determines that machines contain special hardware that own. When the System z hardware letting a virtual machine run could lets this commonly found two-level encounters such a condition, it ends the cause the system to enter a thrashing nesting arrangement run with almost virtual processor’s time in SIE and gives condition because of an insurmountable no performance penalty. control back to CP. This phenomenon is lack of physical memory, CP prohibits

4 0 • z/Journal • February/March  0 0 8 the virtual machine from being dis- to a virtual machine. This gives the vir- idisks, it can use memory to improve patched for a period of time. Instead of tual machine exclusive use of the entire their performance. Central to z/VM’s moving the virtual machine to the dis- real device. Tape drives are typically minidisk strategy is the CP Minidisk patch list to run, CP places the virtual attached to virtual machines. CP also Cache (MDC). With MDC, CP uses machine on the eligible list, which is a can virtualize a device, which means it real memory or expanded memory to list of virtual machines that are ready to gives a guest a portion of a real device. cache recently read portions of min- run, but which CP is holding back This can be a portion in time, such as idisks. This greatly improves perfor- because there appears to be too little of a processor, or a portion of the mance for minidisks that are frequently physical resource available to run them. device’s storage capacity, such as of a read, such as those containing object The tendency of a virtual machine to disk drive. Simulation of devices is a code libraries or frequently used bina- incur page faults is directly related to third approach. Earlier we discussed ries. The minidisk cache is a write- the amount of physical memory CP has devices such as a virtual card reader. through cache, which means that if a set aside for the virtual machine’s pages. This is an example of a device where guest writes to blocks that are cached, Usually, CP makes this determination real hardware isn’t present, but CP sim- CP updates the cache and commits the on its own. However, the CP SET ulates it using memory and disk. The change to the minidisk before inform- RESERVE command lets the operator last approach we’ll mention, emulation, ing the guest that the write is complete. influence the determination, telling CP is when CP uses hardware of one type A z/VM system administrator or opera- to reserve at least said minimum num- to create the illusion of a similar type. tor can use the CP SET MDCACHE ber of real frames to hold the guest’s For example, CP uses modern SCSI command to control or configure the pages. When the operator uses SET disks to cause guests to believe that an minidisk cache. RESERVE, he’s offering a guest favored older, no-longer-manufactured style of Network connectivity is an impor- status with respect to memory con- disk, called Fixed Block Architecture tant concern in many environments. sumption. Using SET RESERVE is a (FBA), is present. z/VM meets customers’ network needs valuable performance tuning technique. z/VM provides disks to guests in by offering several networking options. z/VM lets virtual machines share various ways. While CP can dedicate CP can dedicate network devices to memory, which helps reduce memory entire disk volumes to virtual machines, virtual machines. The dedicated device requirements. z/VM has three kinds of more common is for CP to divide real can be a channel-to-channel adapter, an shared memory. The first, a disk volumes into disjoint, contiguous IBM Open Systems Adapter (OSA) that Discontiguous Saved Segment (DCSS), cylinder or block ranges called min- provides Ethernet connectivity, or a is a range of guest memory addresses idisks, thereby letting many guests each HiperSockets device, a kind of network for which all participating guests see the use some fraction of a real volume’s adapter that connects one LPAR to same physical memory pages. A z/VM storage capacity. Minidisks can be exclu- another. z/VM also has its own TCP/IP systems programmer can place com- sive to virtual machines, or many virtual stack, which guests can use as if it were monly used data or programs in a DCSS, machines can use a single minidisk an IP router. A common network letting many virtual machines share one simultaneously, thereby sharing data. option used today is the virtual switch. physical copy. One particularly interesting kind of Here, CP equips each virtual machine With the second type of shared disk z/VM can provide for a guest is the with a simulated IBM OSA and con- memory, a Named Saved System (NSS), Virtual Disk in Storage or VDISK. A nects all those simulated OSAs to a participating guests share a physical VDISK appears to the guest as an FBA simulated LAN segment called a guest copy of the data; in addition, the “data” disk drive with extremely fast perfor- LAN. Also connected to the guest LAN is typically a bootable operating system. mance. The VDISK performs well is a real OSA that CP manages. With This lets many guests share, for exam- because z/VM backs the VDISK in this configuration established, CP can ple, a single copy of the Linux kernel or paged memory instead of on real disk provide packet- or frame-switching CMS nucleus. Booting from memory hardware. Because CP doesn’t back a functions for the guests, just as a real offers speed advantages as well as mem- VDISK with permanent disk, the data is switch would in a real external net- ory economy. volatile. Even so, VDISKs are handy in work. In this way, the guest LAN The third type of shared memory, a several situations. In particular, VDISKs becomes an extension of a real external VM Data Space, is similar to a DCSS, are an especially good choice for Linux LAN segment. but offers much more addressability. swap space. With VM Data Spaces, the shared data One last disk feature worth mention- Diagnostic and Programming Services are in one or more distinct address ing is Temporary Disk (TDISK). The z/VM offers a variety of debug facili- spaces, each address space being entire- system administrator can assign CP a ties, making it invaluable for developing ly available for sharing. The participat- pool of disk volumes it can use to instan- operating systems for System z. ing guests access those address spaces tiate minidisks users need for only a Commands exist to display, search, or using a System z operand addressing short time. The CP DEFINE command modify virtual machine memory. In architecture called Access Register (AR) lets the z/VM user define such a min- displaying memory, CP can display the mode. With AR mode, a single System z idisk; when a user does so, CP finds memory in hexadecimal, ASCII, instruction can refer to operands locat- some free space in the pool and uses it EBCDIC, or even as disassembled ed in more than one address space. to create the minidisk. When the user assembler instructions. no longer needs the minidisk, he issues The tracing facility, whose documen- Virtualization of I/O Devices the CP DETACH command to dispose tation exceeds 40 pages, is extensive. CP z/VM uses various methods to pro- of it, and CP clears the space and returns TRACE can trap references to memory, vide devices to virtual machines. First, it to the pool. changes to registers, use of specific CP can dedicate, or attach, a real device Because CP mediates access to min- instructions, or arrival of interrupts, to

  • z/Journal • February/March  0 0 8 S av e e n e r gy

US e S D S

Best-of-Breed z/OS SDS FTP Manager IP Management Transforming z/OS FTP • Locate performance-sapping glitches. • Automate batch transfers to gain • Avoid resource-hog monitoring optimum throughput. techniques. • Eliminate FTP-related productivity • Maintain OSA and Enterprise shortfalls. Extender at peak performance. • Thwart security violations. • Perform live traces with ease. • Expedite transfer tracking.

Putting your time and MIPS to best use.

Free test drives available. www.sdsusa.com • 800-443-6183 name a few. CP TRACE also includes Inter-User Communication Vehicle saw CMS wither as a general-purpose the ability to issue commands when cer- (IUCV), also lets a virtual machine interactive environment, while Linux tain trace points are hit. The software communicate with CP. Over an IUCV ascended. At the turn of the century, developer can conduct tracing interac- connection to CP, a trusted virtual Linux on the mainframe gained a foot- tively, or let the trace run, collecting the machine can help CP accomplish cer- hold, again bringing VM’s virtualiza- trace records for later analysis. tain important system management tion capabilities to the forefront. In z/VM provides several programming functions such as accounting, perfor- response to the Linux boom, IBM interfaces to let guests interact with CP. mance monitoring, or security. again changed z/VM, improving its The System z architecture provides a spe- virtualization capabilities so that CP cial assembler instruction, Diagnose, The Future could begin to handle a large number which on real hardware performs diag- IBM’s VM product family has of Linux guests as easily as it once nostic functions. Recognizing the utility thrived for four decades because IBM handled many CMS guests. z/VM’s of such a trappable instruction, CP imple- has constantly improved VM to match ability to adapt to the needs of the sys- ments an entire Application Programming the market’s virtualization needs. In tems running in its virtual machines is Interface (API) built on Diagnose. To use the early ’70s, VM hosted a small a strength that should carry it forward the API, a guest builds a parameter list in number of ordinary guest operating in the next three decades. Z memory, puts the address of the parame- systems. In the late ’70s and early ’80s, ter list into a register, and then issues the the number grew modestly. In the About the Authors Diagnose instruction. CP performs the mid-80s, CMS became popular as a Bill Bitner is a senior software engineer in the IBM requested operation and returns control general-purpose interactive comput- Systems and Technology Group in Endicott, NY. He to the guest. ing platform, owing largely to a CMS- joined IBM in 1985, and has worked on performance CP provides more than 50 different based email and calendar package in various areas of VM. He currently leads the VM functions through Diagnose. These known as Professional Office System Performance team. Email: bitnerb@us..com functions include interrogating real (PROFS). To handle the growth, IBM Brian Wade is a senior software engineer in the device characteristics, performing I/O, sharpened VM’s ability to run many IBM Systems and Technology Group in Endicott, NY. He or managing memory segments. CP lightweight, single-user interactive vir- joined IBM in 1986 after earning his Ph.D. in Electrical also provides various communication tual machines; some customers ran Engineering from the University of Notre Dame. He APIs that connect virtual machines to 20,000 office users concurrently on a currently works in z/VM Performance. one another. One of these methods, the single hardware footprint. The late ’90s Email: [email protected]

BluePhoenix Migration Plus

Modernize your legacy mainframe databases and applications and cut growing maintenance costs now!

� Enhance user interfaces

� Update business processes

� Expand and improve mission critical applications

With Migration Plus, BluePhoenix helps you to move into the future with confidence. BLUEPHOENIX www.bphx.com

  • z/Journal • February/March  0 0 8