US 20130024659A1
(19) United States
(12) Patent Application Publication
Farrell et al.
(10) Pub. No.: US 2013/0024659 A1
(43) Pub. Date: Jan. 24, 2013

(54) EXECUTING AN INSTRUCTION FOR PERFORMING A CONFIGURATION VIRTUAL TOPOLOGY CHANGE

(71) Applicant: International Business Machines Corporation, Armonk, NY (US)

(72) Inventors: Mark S. Farrell, Pleasant Valley, NY (US); Charles W. Gainey, JR., Poughkeepsie, NY (US); Jeffrey P. Kubala, Poughquag, NY (US); Donald W. Schmidt, Stone Ridge, NY (US)

(73) Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION, Armonk, NY (US)

(21) Appl. No.: 13/628,413

(22) Filed: Sep. 27, 2012

Related U.S. Application Data

(63) Continuation of application No. 13/193,641, filed on Jul. 29, 2011, now Pat. No. 8,301,815, Continuation of application No. 12/636,200, filed on Dec. 11, 2009, now Pat. No. 8,015,335.

Publication Classification

(51) Int. Cl. G06F 5/76 (2006.01)

(52) U.S. Cl. ...... 712/30; 712/E09.003

(57) ABSTRACT

In a logically partitioned host computer system comprising host processors (host CPUs) partitioned into a plurality of guest processors (guest CPUs) of a guest configuration, a perform topology function instruction is executed by a guest processor specifying a topology change of the guest configuration. The topology change preferably changes the polarization of guest CPUs, the polarization related to the amount of a host CPU resource that is provided to a guest CPU.

[Representative drawing: table of requested information by configuration level: current configuration-level number; basic-machine configuration, basic-machine CPU, basic-machine CPUs; logical-partition CPU, logical-partition CPUs; virtual-machine CPUs; topology information of current configuration.]

Patent Application Publication Jan. 24, 2013 Sheet 1 of 15 US 2013/0024659 A1

[Sheet 1 of 15. Block diagram of a host computer: processor (CPU), computer memory (main store), and instruction execution unit, among other instruction-handling units.]

[Sheet 2 of 15. FIG. 2 (Prior Art). Emulated (virtual) host computer: memory, computer memory, emulated (virtual) processor (CPU), emulation routines, native instruction set architecture 'B', media, and network.]

[Sheet 3 of 15. General-register layouts showing function code (FC) and selector fields. FIG. 5: table of requested information: current configuration-level number; basic-machine configuration, basic-machine CPU, basic-machine CPUs; logical-partition CPU, logical-partition CPUs; virtual-machine CPUs; topology information of current configuration.]

[Sheet 4 of 15. FIG. 6 (Prior Art) and a further layout. Fields include manufacturer, type, model-capacity identifier, sequence code, plant of manufacture, model, and CPU address; remaining fields are reserved.]

[Sheet 5 of 15. SYSIB 1.2.2 layouts in two formats, one labeled FIG. 9 (Prior Art). Fields include format, secondary CPU capability, CPU capability, total CPU count, configured CPU count, standby CPU count, reserved CPU count, multiprocessing CPU-capability adjustment factors, alternate CPU capability, and alternate multiprocessing CPU-capability adjustment factors.]

[Sheet 6 of 15. FIG. 10 and further layouts: a SYSIB layout with reserved, length, and MAG fields, and a container entry layout.]

[Sheet 7 of 15. FIG. 15A. Hardware partitioning: partitions, each with applications and an operating system, over firmware, CPU-1 to CPU-4, and hardware.]

[Sheet 8 of 15. FIG. 15B. Software partitioning: partitions A to C with applications, over hardware.]

[Sheet 9 of 15. FIG. 15C. Logical partitioning: partitions with applications, over partitioning firmware and hardware.]

[Sheet 10 of 15. FIG. 15D. Second-level z/VM guest systems (guest-A, guest-B, guest-C) running in a first-level z/VM guest, alongside other first-level guest systems, over the z/VM operating system (OS) and hardware components (CPs).]

[Sheets 11 and 12 of 15. Drawings; no legible text.]

[Sheet 13 of 15. FIG. 18 (board) and FIG. 19 (container).]

[Sheet 14 of 15. FIG. 20. Flowchart: fetch a perform topology function instruction; obtain, from a general register specified by a field of the instruction, a function code value specifying a polarization change; request a change to the guest configuration topology according to the change specified by the fetched perform topology function instruction; set condition code to indicate change pending; check progress status; set condition code to indicate change not pending. Reference numerals 2001 to 2007.]

[Sheet 15 of 15. FIG. 21. Flowchart continuing from (A): if a topology change is already in process, set a reason code value indicating so; if the configuration is already polarized as requested, set a result code value indicating the configuration is already polarized according to the function code; set the condition code for the request with respect to the specified change of the configuration topology. Reference numerals 2102 to 2106.]

US 2013/0024659 A1 Jan. 24, 2013

EXECUTING AN INSTRUCTION FOR PERFORMING A CONFIGURATION VIRTUAL TOPOLOGY CHANGE

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This is a continuation of U.S. patent application Ser. No. 13/193,641 "EXECUTING AN INSTRUCTION FOR PERFORMING A CONFIGURATION VIRTUAL TOPOLOGY CHANGE" filed Jul. 29, 2011, which is a continuation of U.S. Pat. No. 8,015,335 issued Sep. 6, 2011, "PERFORMING A CONFIGURATION VIRTUAL TOPOLOGY CHANGE AND INSTRUCTION THEREFORE" filed Dec. 11, 2009, which is a continuation of U.S. Pat. No. 7,739,434, issued Jun. 5, 2010, "PERFORMING A CONFIGURATION VIRTUAL TOPOLOGY CHANGE AND INSTRUCTION THEREFORE" filed Jan. 11, 2008, assigned to IBM. The disclosures of the foregoing application and patents are incorporated herein by reference.

FIELD OF THE INVENTION

[0002] The present invention relates in general to virtualization of multi-processor systems. In particular, the present invention relates to enabling programs to change elements of the topology of their virtual environment.

BACKGROUND

[0003] Among the system control functions is the capability to partition the system into several logical partitions (LPARs). An LPAR is a subset of the system hardware that is defined to support an operating system. An LPAR contains resources (processors, memory, and input/output devices) and operates as an independent system. Multiple logical partitions can exist within a mainframe hardware system.

[0004] In the systems from IBM, including the S/390®, for many years there was a limit of 15 LPARs. More recent machines have 30 (and potentially more). Such machines are exemplified by those of the z/Architecture®. The IBM® z/Architecture® is described in the z/Architecture Principles of Operation SA22-7832-05 published April, 2007 by IBM and is incorporated by reference herein in its entirety.

[0005] The IBM® z/Architecture® teaches elements of a computer system including PSWs, Condition Codes and General registers.

PSW:

[0006] The program-status word (PSW) includes the instruction address, condition code, and other information used to control instruction sequencing and to determine the state of the CPU. The active or controlling PSW is called the current PSW. It governs the program currently being executed. The CPU has an interruption capability, which permits the CPU to switch rapidly to another program in response to exceptional conditions and external stimuli. When an interruption occurs, the CPU places the current PSW in an assigned storage location, called the old-PSW location, for the particular class of interruption. The CPU fetches a new PSW from a second assigned storage location. This new PSW determines the next program to be executed. When it has finished processing the interruption, the program handling the interruption may reload the old PSW, making it again the current PSW, so that the interrupted program can continue. There are six classes of interruption: external, I/O, machine check, program, restart, and supervisor call. Each class has a distinct pair of old-PSW and new-PSW locations permanently assigned in real storage.

General Registers:

[0007] Instructions may designate information in one or more of 16 general registers. The general registers may be used as base-address registers and index registers in address arithmetic and as accumulators in general arithmetic and logical operations. Each register contains 64 bit positions. The general registers are identified by the numbers 0-15 and are designated by a four-bit R field in an instruction. Some instructions provide for addressing multiple general registers by having several R fields. For some instructions, the use of a specific general register is implied rather than explicitly designated by an R field of the instruction. For some operations, either bits 32-63 or bits 0-63 of two adjacent general registers are coupled, providing a 64-bit or 128-bit format, respectively. In these operations, the program must designate an even-numbered register, which contains the leftmost (high-order) 32 or 64 bits. The next higher-numbered register contains the rightmost (low-order) 32 or 64 bits. In addition to their use as accumulators in general arithmetic and logical operations, 15 of the 16 general registers are also used as base-address and index registers in address generation. In these cases, the registers are designated by a four-bit B field or X field in an instruction. A value of zero in the B or X field specifies that no base or index is to be applied, and, thus, general register 0 cannot be designated as containing a base address or index.

[0008] The current program-status word (PSW) in the CPU contains information required for the execution of the currently active program. The PSW is 128 bits in length and includes the instruction address, condition code, and other control fields. In general, the PSW is used to control instruction sequencing and to hold and indicate much of the status of the CPU in relation to the program currently being executed. Additional control and status information is contained in control registers and permanently assigned storage locations. The status of the CPU can be changed by loading a new PSW or part of a PSW. Control is switched during an interruption of the CPU by storing the current PSW, so as to preserve the status of the CPU, and then loading a new PSW. Execution of LOAD PSW or LOAD PSW EXTENDED, or the successful conclusion of the initial-program-loading sequence, introduces a new PSW. The instruction address is updated by sequential instruction execution and replaced by successful branches. Other instructions are provided which operate on a portion of the PSW.

Program Status Word:

[0009] A new or modified PSW becomes active (that is, the information introduced into the current PSW assumes control over the CPU) when the interruption or the execution of an instruction that changes the PSW is completed. The interruption for PER associated with an instruction that changes the PSW occurs under control of the PER mask that is effective at the beginning of the operation.

Condition Code (CC):

[0010] Bits 18 and 19 are the two bits of the condition code. The condition code is set to 0, 1, 2, or 3, depending on the result obtained in executing certain instructions. Most arithmetic and logical operations, as well as some other operations, set the condition code. The instruction BRANCH ON CONDITION can specify any selection of the condition-code values as a criterion for branching.

Instruction Execution and Sequencing

[0011] According to the IBM z/Architecture, the program-status word (PSW) contains information required for proper program execution. The PSW is used to control instruction sequencing and to hold and indicate the status of the CPU in relation to the program currently being executed. The active or controlling PSW is called the current PSW.

[0012] Branch instructions perform the functions of decision making, loop control, and subroutine linkage. A branch instruction affects instruction sequencing by introducing a new instruction address into the current PSW. The relative-branch instructions with a 16-bit I2 field allow branching to a location at an offset of up to plus 64K - 2 bytes or minus 64K bytes relative to the location of the branch instruction, without the use of a base register. The relative-branch instructions with a 32-bit I2 field allow branching to a location at an offset of up to plus 4G - 2 bytes or minus 4G bytes relative to the location of the branch instruction, without the use of a base register.

Decision Making

[0013] Facilities for decision making are provided by the BRANCH ON CONDITION, BRANCH RELATIVE ON CONDITION, and BRANCH RELATIVE ON CONDITION LONG instructions. These instructions inspect a condition code that reflects the result of a majority of the arithmetic, logical, and I/O operations. The condition code, which consists of two bits, provides for four possible condition-code settings: 0, 1, 2, and 3.

[0014] The specific meaning of any setting depends on the operation that sets the condition code. For example, the condition code reflects such conditions as zero, nonzero, first operand high, equal, overflow, and subchannel busy. Once set, the condition code remains unchanged until modified by an instruction that causes a different condition code to be set.

Loop Control

[0015] Loop control can be performed by the use of BRANCH ON CONDITION, BRANCH RELATIVE ON CONDITION, and BRANCH RELATIVE ON CONDITION LONG to test the outcome of address arithmetic and counting operations. For some particularly frequent combinations of arithmetic and tests, BRANCH ON COUNT, BRANCH ON INDEX HIGH, and BRANCH ON INDEX LOW OR EQUAL are provided, and relative-branch equivalents of these instructions are also provided. These branches, being specialized, provide increased performance for these tasks.

[0016] Practical limitations of memory size, I/O availability, and available processing power usually limit the number of LPARs to less than these maximums.

[0017] The hardware and firmware that provides partitioning is known as PR/SM™ (Processor Resource/System Manager). It is the PR/SM functions that are used to create and run LPARs. This difference between PR/SM (a built-in facility) and LPARs (the result of using PR/SM) is often ignored and the term LPAR is used collectively for the facility and its results.

[0018] System administrators assign portions of memory to each LPAR and memory cannot be shared among LPARs. The administrators can assign processors (also known as central processors (CPs) or central processing units (CPUs)) to specific LPARs or they can allow the system controllers to dispatch any or all the processors to all the LPARs using an internal load-balancing algorithm. Channels (CHPIDs) can be assigned to specific LPARs or can be shared by multiple LPARs, depending on the nature of the devices on each channel.

[0019] A system with a single processor (CP processor) can have multiple LPARs. PR/SM has an internal dispatcher that can allocate a portion of the processor to each LPAR, much as an operating system dispatcher allocates a portion of its processor time to each process, thread, or task.

[0020] Partitioning control specifications are partly contained in the IOCDS and are partly contained in a system profile. The IOCDS and profile both reside in the Support Element (SE) which, for example, is simply a notebook computer inside the system. The SE can be connected to one or more Hardware Management Consoles (HMCs), which, for example, are desktop personal computers used to monitor and control hardware such as the mainframe microprocessors. An HMC is more convenient to use than an SE and can control several different mainframes.

[0021] Working from an HMC (or from an SE, in unusual circumstances), an operator prepares a mainframe for use by selecting and loading a profile and an IOCDS. These create LPARs and configure the channels with device numbers, LPAR assignments, multiple path information, and so forth. This is known as a Power-on Reset (POR). By loading a different profile and IOCDS, the operator can completely change the number and nature of LPARs and the appearance of the I/O configuration. However, doing this is usually disruptive to any running operating systems and applications and is therefore seldom done without advance planning.

[0022] Logical partitions (LPARs) are, in practice, equivalent to separate mainframes.

[0023] Each LPAR runs its own operating system. This can be any mainframe operating system; there is no need to run z/OS®, for example, in each LPAR. The installation planners may elect to share I/O devices across several LPARs, but this is a local decision.

[0024] The system administrator can assign one or more system processors for the exclusive use of an LPAR. Alternately, the administrator can allow all processors to be used on some or all LPARs. Here, the system control functions (often known as microcode or firmware) provide a dispatcher to share the processors among the selected LPARs. The administrator can specify a maximum number of concurrent processors executing in each LPAR. The administrator can also provide weightings for different LPARs; for example, specifying that LPAR1 should receive twice as much processor time as LPAR2.

[0025] The operating system in each LPAR is initialized (for example, IPLed) separately, has its own copy of its operating system, has its own operator console (if needed), and so forth. If the system in one LPAR crashes, there is no effect on the other LPARs.

[0026] In a mainframe system with three LPARs, for example, you might have a production z/OS in LPAR1, a test version of z/OS in LPAR2, and Linux® for S/390® in LPAR3. If this total system has 8 GB of memory, we might have assigned 4 GB to LPAR1, 1 GB to LPAR2, 1 GB to LPAR3, and have kept 2 GB in reserve. The operating system consoles for the two z/OS LPARs might be in completely different locations.

[0027] For most practical purposes there is no difference between, for example, three separate mainframes running z/OS (and sharing most of their I/O configuration) and three LPARs on the same mainframe doing the same thing. With minor exceptions, z/OS, the operators, and applications cannot detect the difference.

[0028] The minor differences include the ability of z/OS (if permitted when the LPARs were defined or anytime during execution) to obtain performance and utilization information across the complete mainframe system and to dynamically shift resources (processors and channels) among LPARs to improve performance.

[0029] Today's IBM® mainframes, also called a central processor complex (CPC) or central electronic complex (CEC), may contain several different types of z/Architecture® processors that can be used for slightly different purposes.

[0030] Several of these purposes are related to software cost control, while others are more fundamental. All of the processors in the CPC begin as equivalent processor units (PUs) or engines that have not been characterized for use. Each processor is characterized by IBM during installation or at a later time. The potential characterizations are:

[0031] Central Processor (CP)

[0032] This processor type is available for normal operating system and application software.

[0033] System Assistance Processor (SAP)

[0034] Every modern mainframe has at least one SAP; larger systems may have several. The SAPs execute internal code to provide the I/O subsystem. A SAP, for example, translates device numbers and real addresses of channel path identifiers (CHPIDs), control unit addresses, and device numbers. It manages multiple paths to control units and performs error recovery for temporary errors. Operating systems and applications cannot detect SAPs, and SAPs do not use any "normal" memory.

[0035] Integrated Facility for Linux® (IFL)

[0036] This is a normal processor with one or two instructions disabled that are used only by z/OS®. Linux does not use these instructions and can therefore operate on an IFL. Linux can be executed by a CP as well. The difference is that an IFL is not counted when specifying the model number of the system. This can make a substantial difference in software costs.

[0037] zAAP

[0038] This is a processor with a number of functions disabled (interrupt handling, some instructions) such that no full operating system can operate on the processor. However, z/OS can detect the presence of zAAP processors and will use them to execute Java™ code. The same Java code can be executed on a standard CP. Again, zAAP engines are not counted when specifying the model number of the system. Like IFLs, they exist only to control software costs.

[0039] zIIP

[0040] The System z9™ Integrated Information Processor (zIIP) is a specialized engine for processing eligible database workloads. The zIIP is designed to help lower software costs for select workloads on the mainframe, such as business intelligence (BI), enterprise resource planning (ERP) and customer relationship management (CRM). The zIIP reinforces the mainframe's role as the data hub of the enterprise by helping to make direct access to DB2® more cost effective and reducing the need for multiple copies of the data.

Integrated Coupling Facility (ICF)

[0041] These processors run only Licensed Internal Code. They are not visible to normal operating systems or applications. For example, a coupling facility is, in effect, a large memory scratch pad used by multiple systems to coordinate work. ICFs must be assigned to LPARs that then become coupling facilities.

Spare

[0042] An uncharacterized PU functions as a "spare." If the system controllers detect a failing CP or SAP, it can be replaced with a spare PU. In most cases this can be done without any system interruption, even for the application running on the failing processor.

[0043] In addition to these characterizations of processors, some mainframes have models or versions that are configured to operate slower than the potential speed of their CPs. This is widely known as "knee-capping," although IBM prefers the term capacity setting, or something similar. It is done, for example, by using microcode to insert null cycles into the processor instruction stream. The purpose, again, is to control software costs by having the minimum mainframe model or version that meets the application requirements. IFLs, SAPs, zAAPs, zIIPs, and ICFs always function at the full speed of the processor because these processors "do not count" in software pricing calculations.

[0044] Processor and CPU can refer to either the complete system box, or to one of the processors (CPUs) within the system box. Although the meaning may be clear from the context of a discussion, even mainframe professionals must clarify which processor or CPU meaning they are using in a discussion. IBM uses the term central processor complex (CPC) to refer to the physical collection of hardware that includes main storage, one or more central processors, timers, and channels. (Some system programmers use the term central electronic complex (CEC) to refer to the mainframe "box," but the preferred term is CPC.)

[0045] Briefly, all the S/390 or z/Architecture processors within a CPC are processing units (PUs). When IBM delivers the CPC, the PUs are characterized as CPs (for normal work), Integrated Facility for Linux (IFL), Integrated Coupling Facility (ICF) for Parallel Sysplex configurations, and so forth.

[0046] Mainframe professionals typically use system to indicate the hardware box, a complete hardware environment (with I/O devices), or an operating environment (with software), depending on the context. They typically use processor to mean a single processor (CP) within the CPC.

[0047] The z/VM® HYPERVISOR™ is designed to help clients extend the business value of mainframe technology across the enterprise by integrating applications and data while providing exceptional levels of availability, security, and operational ease. z/VM virtualization technology is designed to allow the capability for clients to run hundreds to thousands of Linux servers on a single mainframe running with other System z operating systems, such as z/OS®, or as a large-scale Linux-only enterprise server solution. z/VM V5.3 can also help to improve productivity by hosting non-Linux workloads such as z/OS, z/VSE, and z/TPF.
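
As an illustration of the processor-unit characterizations described above (CP, SAP, IFL, zAAP, zIIP, ICF, spare), the following Python sketch models a CPC as a list of characterized PUs. It is a minimal sketch under stated assumptions, not IBM's accounting or sparing logic: the names `PUType`, `priced_engine_count`, and `replace_failing` are hypothetical, and treating a spare as not counted for pricing is an assumption, since the text only lists IFLs, SAPs, zAAPs, zIIPs, and ICFs as engines that "do not count".

```python
from collections import Counter
from enum import Enum


class PUType(Enum):
    """Processor-unit characterizations named in the text."""
    CP = "central processor"
    SAP = "system assistance processor"
    IFL = "Integrated Facility for Linux"
    ZAAP = "zAAP"
    ZIIP = "zIIP"
    ICF = "Integrated Coupling Facility"
    SPARE = "uncharacterized spare"


# Assumption: only CPs count toward the model number and software pricing.
COUNTED_FOR_PRICING = frozenset({PUType.CP})


def priced_engine_count(pus):
    """Return how many PUs in the list count for software pricing."""
    tally = Counter(pus)
    return sum(n for kind, n in tally.items() if kind in COUNTED_FOR_PRICING)


def replace_failing(pus, failing_index):
    """Return a new PU list in which a spare takes over for a failing CP or SAP."""
    failing = pus[failing_index]
    if failing not in (PUType.CP, PUType.SAP):
        raise ValueError("the text only describes sparing for a failing CP or SAP")
    if PUType.SPARE not in pus:
        raise LookupError("no spare PU available")
    result = list(pus)
    spare_index = result.index(PUType.SPARE)
    result[spare_index] = failing   # the spare assumes the failing PU's role
    del result[failing_index]       # the failing PU leaves the configuration
    return result


if __name__ == "__main__":
    machine = [PUType.CP, PUType.CP, PUType.IFL, PUType.SAP, PUType.SPARE]
    print(priced_engine_count(machine))
    print([pu.name for pu in replace_failing(machine, 0)])
```

In this sketch, sparing leaves the priced engine count unchanged, which matches the description that a spare can take over from a failing CP without altering the configuration seen by software.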

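The LPAR weightings mentioned in paragraph [0024] can be made concrete with a small worked example. The Python sketch below assumes that each LPAR's share of processor time is its weight divided by the sum of all weights; that normalization is an illustrative assumption, not a description of the PR/SM dispatcher, and `processor_shares` is a hypothetical helper name.

```python
from fractions import Fraction


def processor_shares(weights):
    """Map each LPAR name to its fraction of processor time.

    `weights` maps LPAR names to positive integer weights. The share is
    weight / total weight (an assumed model, not PR/SM's actual algorithm).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {name: Fraction(w, total) for name, w in weights.items()}


if __name__ == "__main__":
    # LPAR1 is weighted to receive twice as much processor time as LPAR2.
    for name, share in processor_shares({"LPAR1": 2, "LPAR2": 1}).items():
        print(name, share)
```

Exact fractions are used instead of floats so that the "twice as much" relationship between LPAR1 and LPAR2 holds exactly.
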
0048 Z/VM provides each user with an individual work the hardware, while a guest operating system runs on the ing environment known as a . The virtual virtualization technology. FIG. 15D, illustrates a second level machine simulates the existence of a dedicated real machine, guest Z/VMOS loaded into a first level guest (guest-1) parti including processor functions, memory, networking, and tion. input/output (I/O) resources. Operating systems and applica 0057. In other words, there is a first-level Z/VM operating tion programs can run in virtual machines as guests. For system that sits directly on the hardware, but the guests of this example, you can run multiple Linux and Z/OS images on the first-level Z/VM system are virtualized. By virtualizing the same Z/VM system that is also supporting various applica hardware from the guests, we are able to create and use as tions and end users. As a result, development, testing, and many guests as needed with a small amount of hardware. production environments can share a single physical com 0.058 As previously mentioned, operating systems run puter. ning in virtual machines are often called "guests”. Other 0049 Referring to FIGS. 15A-15D, partitioning and vir terms and phrases you might encounter are: tualization involve a shift in thinking from physical to logical 0059) “Running first level” or “running natively’ means by treating system resources as logical pools rather than as running directly on the hardware (which is what Z/VM does). separate physical entities. This involves consolidating and 0060 “Running second level”, “running under VM’, or pooling system resources, and providing a “single system “running on (top of) VM’, or “running as a guest-1” means illusion' for both homogeneous and heterogeneous servers, running as a guest. Using the Z/VM operating system, it is also storage, distributed systems, and networks. 
possible to “run as a guest-2 when Z/VM itself runs as a 0050 Partitioning of hardware involves separate CPUs for guest-1 on a PR/SM . separate operating systems, each of which runs its specific 0061 An example of the functionality of Z/VM is, if you applications. Software partitioning employs a software-based have a first-level Z/VM system and a second-level Z/VM "hypervisor to enable individual operating systems to run on system, you could continue to create more operating systems any or all of the CPUs. on the second-level system. This type of environment is par 0051 allow multiple operating systems to ticularly useful for testing operating system installation run on a host computer at the same time. Hypervisor technol before deployment, or for testing or debugging operating ogy originated in the IBM VM/370, the predecessor of the systems. Z/VM we have today. Logical partitioning (LPAR) involves 0062 Virtual resources can have functions or features that partitioning firmware (a hardware-based hypervisor, for are not available in their underlying physical resources. FIG. example, PR/SM) to isolate the operating system from the 2 illustrates virtualization by resource emulation. Such func CPUS. tions or features are said to be emulated by the host program 0052 Virtualization enables or exploits four fundamental such that the guest observes the function or feature to be capabilities: resource sharing, resource aggregation, emula provided by the system when it is actually provided due to the tion of function, and insulation. We explore these topics in host-program assistance. more detail in the following sections. 0063 Examples include architecture emulation software 0053 Z/VM is an operating system for the IBM System Z that implements one processor's architecture using another; platform that provides a highly flexible test and production iSCSI, which implements a virtual SCSI bus on an IP net environment. 
The Z/VM implementation of IBM virtualiza work; and virtual-tape storage implemented on physical disk tion technology provides the capability to run full-function Storage. operating systems such as Linux on System Z, Z/OS, and 0064. Furthermore, the packing of central-processing others as “guests of z/VM. z/VM supports 64-bit IBM z/Ar units (CPUs) in modern technology is often hierarchical. chitecture guests and 31-bit IBM Enterprise Systems Archi Multiple cores can be placed on a single chip. Multiple chips tecture/390 guests. can be placed in a single module. Multiple modules can be 0054 Z/VM provides each user with an individual work packaged on a board often referred to as a book, and multiple ing environment known as a virtual machine. The virtual books can be distributed across multiple frames. machine simulates the existence of a dedicated real machine, 0065 CPUs often have several levels of caches, for including processor functions, memory, networking, and example each processor may have a cache (or possibly a split input/output (I/O) resources. Operating systems and applica Instruction cache and a data cache) and there may be addi tion programs can run in virtual machines as guests. For tional larger caches between each processor and the main example, you can run multiple Linux and Z/OSR images on memory interface. Depending upon the level of the hierarchy, the same Z/VM system that is also Supporting various appli caches are also placed in order to improve overall perfor cations and end users. As a result, development, testing, and mance, and at certain levels, a cache may be shared among production environments can share a single physical com more than a single CPU. The engineering decisions regarding puter. Such placement deal with space, power/thermal, cabling dis 0055. 
A virtual machine uses real hardware resources, but tances, CPU frequency, memory speed, system performance, even with dedicated devices (like a tape drive), the virtual and other aspects. This placement of elements of the CPU address of the tape drive may or may not be the same as the creates an internal structure that can be more or less favorable real address of the tape drive. Therefore, a virtual machine to a particular logical partition, depending upon where the only knows “virtual hardware' that may or may not exist in placement of each CPU of the partition resides. A logical the real world. partition gives the appearance to an operating system, of 0056. For example, in a basic-mode system, a first-level ownership of certain resources including processor utiliza Z/VM is the base operating system that is installed on top of tion where in actuality, the operating system is sharing the the real hardware FIG. 15D. A second-leveloperating system resources with other operating systems in other partitions. is a system that is created upon the base Z/VM operating Normally, software is not aware of the placement and, in a system. Therefore, Z/VM as a base operating system runs on symmetric-multiprocessing (SMP) configuration, observes a US 2013/0O24659 A1 Jan. 24, 2013

set of CPUs where each CPU provides the same level of performance. The problem is that ignorance of the internal packaging and “distance” between any two CPUs can result in software making less than optimum choices on how CPUs can be assigned work. Therefore, the full potential of the SMP configuration is not achieved.

[0066] The mainframe example of virtualization presented is intended to teach various topologies possible in virtualizing a machine. As mentioned, the programs running in a partition (including the operating systems) likely have a view that the resources available to them, including the processors, memory and I/O, are dedicated to the partition. In fact, programs do not have any idea that they are running in a partition. Such programs are also not aware of the topology of their partition and therefore cannot make choices based on such topology. What is needed is a way for programs to optimize for the configuration topology on which they are running.

SUMMARY

[0067] The shortcomings of the prior art are overcome, and additional advantages are provided, through the provision of a method, system, and computer program product for enabling a subset of dormant computer hardware resources in an upgradeable computer system having a set of dormant computer hardware resources.

[0068] A host computer comprising host CPUs can be partitioned into logical/virtual partitions having guest CPUs. The partitioning is preferably accomplished by firmware or by software as might be provided by an operating system such as z/VM from IBM. Each guest CPU is a virtual CPU in that the guest programs view the guest CPUs as actual CPU processors, but in fact, the underlying host is mapping each guest CPU to host CPU resources. In an embodiment, a guest CPU is implemented using a portion of a host CPU by the host designating a portion of CPU time for the guest CPU to utilize the host CPU. It is envisioned that a plurality of guest CPUs might be supported by a single host CPU but the opposite may also apply.

[0069] In another embodiment, the guest CPUs are emulated by software whereby emulation routines convert functions of the guest CPU (including instruction decode and execution) to routines that run on host CPUs. The host CPUs are provisioned to support guest CPUs.

[0070] In another embodiment, a first guest image may be the host of a second guest image, in which case the second guest CPUs are provisioned by first guest CPUs which are themselves provisioned by host CPUs. The topology of the configurations is a nesting of levels of guest CPUs and one or more host CPUs.

[0071] A new PERFORM TOPOLOGY FUNCTION (PTF) instruction is provided and the prior art STORE SYSTEM INFORMATION (STSI) instruction is enhanced to provide a new SYSIB (SYSIB identifier 15.1.2) which provides component affinity and logical packaging information to software. This permits the software to apply informed and intelligent selection on how individual elements, such as processing units of the multi-processor, are assigned to various applications and workloads, thus providing information to a program (OS) for improving performance by increasing the shared-cache hit ratios, for example.

[0072] A new PERFORM TOPOLOGY FUNCTION (PTF) instruction is used by a privileged program (such as a supervisor, an OS, a kernel and the like) to request that the CPU configuration topology within which the program is running be changed. In an embodiment, the guest CPU topology is switched between horizontal and vertical polarization.

[0073] By having the capability to learn the CPU topology information, the program understands the “distance” between any arbitrary two or more CPUs of a symmetric-multiprocessing configuration.

[0074] The capability provided for minimizing the aggregate distance of all CPUs in a configuration, and how particular application-program tasks are dispatched on the individual CPUs, provides supervisory programs with the ability to improve performance. The improved performance can result from one or more of the following attributes which are improved by better topology knowledge:

[0075] Shortening Inter-CPU Signaling Paths.

[0076] Shared storage, accessed by multiple CPUs, is more likely to be in caches that are closer to the set of CPUs. Therefore, inter-cache storage use is confined to a smaller subset of the overall machine and configuration, which allows faster cache-to-cache transfer. Presence of a storage location in the closest cache of a CPU (L1) is significantly more likely to occur.

[0077] Because of improved performance, the number of CPUs actually in the configuration can be fewer in number, while still getting the same job done in the same or less run time. Such reduction of CPUs lessens the number of communication paths that each CPU must use to communicate with the other CPUs of the configuration, thereby further contributing to overall performance improvement.

[0078] For example, if 10 CPUs need to execute a particular program, the inter-cache traffic is substantial whereas if the same program can be executed on one CPU, there is no inter-cache traffic. This indicates that the cache presence of desired storage locations is guaranteed to be in the cache of the single CPU, if that storage is in any cache at all.

[0079] When storage and associated cache hierarchy is local, as opposed to being distributed across multiple physical frames (i.e. boxes, etc.), signaling paths are shorter. Topology knowledge indicates the relative distance in selecting the appropriate subset of CPUs to assign to an application program such that, even within a larger set of CPUs in an SMP configuration, the subset optimizes the minimized distance among them. This is sometimes called an affinity group.

[0080] The notions of CPU-count reduction and distance between CPUs are informed by topology information which allows the program to optimize the assignment of CPUs to an affinity group.

[0081] An object of the present invention is to provide, in a logically partitioned host computer system comprising host processors (host CPUs), a method, system, and program product for a configuration change of a topology of a plurality of guest processors (guest CPUs) of a guest configuration. Preferably, a guest processor of the guest configuration fetches a PERFORM TOPOLOGY FUNCTION instruction defined for a computer architecture, the PERFORM TOPOLOGY FUNCTION instruction comprising an opcode field specifying a function to be performed. The function to be performed by execution of the perform topology function instruction comprising:

[0082] requesting a specified change of the configuration of the polarization of the guest processors of the guest configuration; and

[0083] responsive to the requested specified polarization change, changing the configuration of the topology of the guest processors of the guest configuration according to the specified polarization change.

[0084] In an aspect of the invention the PERFORM TOPOLOGY FUNCTION instruction further comprises a register field, wherein the executing the PERFORM TOPOLOGY FUNCTION instruction further comprises:

[0085] obtaining from a function code field of a register specified by the register field, a function code field value, the function code field value consisting of any one of a horizontal polarization instruction specifier, a vertical polarization instruction specifier or a check of the status of a topology change specifier;

[0086] responsive to the instruction specifying the horizontal polarization, initiating horizontal polarization of the guest processors of the computer configuration;

[0087] responsive to the instruction specifying the vertical polarization, initiating vertical polarization of the guest processors of the computer configuration; and

[0088] setting a result code value in a result field of the register.

[0089] In another aspect of the invention, initiated polarization is asynchronous to the completion of the execution, and responsive to the function code field value specifying a status check of a topology change, the completion status of the topology change is checked.

[0090] In an embodiment, horizontal polarization comprises providing substantially equal host processor resource to each guest processor resource, wherein vertical polarization comprises providing substantially more host processor resource to at least one guest processor of said guest processors than to at least another guest processor of said guest processors.

[0091] In another embodiment, the result code value specifies a reason code indicating an inability to accept the polarization request and consisting of:

[0092] responsive to the configuration being polarized as specified by the function code prior to execution, the result code value indicating the configuration is already polarized according to the function code; and

[0093] responsive to the configuration processing an incomplete polarization prior to execution, the result code value indicating a topology change is already in process.

[0094] In an embodiment, the execution further comprises:

[0095] responsive to a topology change being in progress, setting a condition code indicating a topology-change initiated; and

[0096] responsive to the request being rejected, setting a condition code indicating the request is rejected.

[0097] In an embodiment, the execution further comprises:

[0098] responsive to no topology change report being pending, setting a condition code indicating a topology-change-report not pending; and

[0099] responsive to a topology change report being pending, setting a condition code indicating a topology-change-report pending.

[0100] In an embodiment, the perform topology function instruction defined for the computer architecture is fetched and executed by a processor of an alternate computer architecture,

[0101] wherein the method further comprises interpreting the PERFORM TOPOLOGY FUNCTION instruction to identify a predetermined software routine for emulating the operation of the PERFORM TOPOLOGY FUNCTION instruction; and

[0102] wherein executing the PERFORM TOPOLOGY FUNCTION instruction comprises executing the predetermined software routine to perform steps of the method for executing the machine instruction.

[0103] Additional features and advantages are realized through the techniques of the present invention. Other embodiments and aspects of the invention are described in detail herein and are considered a part of the claimed invention. For a better understanding of the invention with advantages and features, refer to the description and the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

[0104] The subject matter which is regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of practice, together with further objects and advantages thereof, may best be understood by reference to the following description taken in connection with the accompanying drawings in which:

[0105] FIG. 1 depicts a Host Computer system of the prior art;

[0106] FIG. 2 depicts a prior art emulated Host Computer system;

[0107] FIG. 3 depicts an instruction format of a prior art STSI machine instruction;

[0108] FIG. 4 depicts implied registers of the prior art STSI instruction;

[0109] FIG. 5 depicts a function code table;

[0110] FIG. 6 depicts a prior art SYSIB 1.1.1 table;

[0111] FIG. 7 depicts a prior art SYSIB 1.2.1 table;

[0112] FIG. 8 depicts a prior art Format-1 SYSIB 1.2.2 table;

[0113] FIG. 9 depicts a prior art Format-2 SYSIB 1.2.2 table;

[0114] FIG. 10 depicts a SYSIB 15.1.2 table according to the invention;

[0115] FIG. 11 depicts a container type TLE;

[0116] FIG. 12 depicts a CPU type TLE;

[0117] FIG. 13 depicts an instruction format of a PTF machine instruction according to the invention;

[0118] FIG. 14 depicts a register format of the PTF instruction;

[0119] FIGS. 15A-15D depict prior art elements of partitioned computer systems;

[0120] FIG. 16 depicts an example computer system;

[0121] FIGS. 17-19 depict containers of the example computer system; and

[0122] FIGS. 20-21 depict a flow of an embodiment of the invention.

DETAILED DESCRIPTION

[0123] In a mainframe, architected machine instructions are used by programmers (typically writing applications in “C” but also Java®, COBOL, PL/I, PL/X, Fortran and other high level languages), often by way of a compiler application. These instructions stored in the storage medium may be executed natively in a z/Architecture IBM Server, or alternatively in machines executing other architectures. They can be emulated in the existing and in future IBM mainframe servers and on other machines of IBM (e.g. pSeries® Servers and

xSeries® Servers). They can be executed in machines running Linux on a wide variety of machines using hardware manufactured by IBM®, Intel®, AMD™, and others. Besides execution on that hardware under a z/Architecture®, Linux can be used as well as machines which use emulation by Hercules, UMX, FSI (Fundamental Software, Inc) or Platform Solutions, Inc. (PSI), where generally execution is in an emulation mode. In emulation mode, emulation software is executed by a native processor to emulate the architecture of an emulated processor.

[0124] The native processor typically executes emulation software comprising either firmware or a native operating system to perform emulation of the emulated processor. The emulation software is responsible for fetching and executing instructions of the emulated processor architecture. The emulation software maintains an emulated program counter to keep track of instruction boundaries. The emulation software may fetch one or more emulated machine instructions at a time and convert the one or more emulated machine instructions to a corresponding group of native machine instructions for execution by the native processor. These converted instructions may be cached such that a faster conversion can be accomplished. Notwithstanding, the emulation software must maintain the architecture rules of the emulated processor architecture so as to assure operating systems and applications written for the emulated processor operate correctly. Furthermore the emulation software must provide resources identified by the emulated processor architecture including, but not limited to, control registers, general purpose registers (often including floating point registers), dynamic address translation function including segment tables and page tables for example, interrupt mechanisms, context switch mechanisms, Time of Day (TOD) clocks and architected interfaces to I/O subsystems such that an operating system or an application program designed to run on the emulated processor can be run on the native processor having the emulation software.

[0125] A specific instruction being emulated is decoded, and a subroutine called to perform the function of the individual instruction. An emulation software function emulating a function of an emulated processor is implemented, for example, in a “C” subroutine or driver, or some other method of providing a driver for the specific hardware as will be within the skill of those in the art after understanding the description of the preferred embodiment. Various software and hardware emulation patents including, but not limited to, U.S. Pat. No. 5,551,013 for a “Multiprocessor for hardware emulation” of Beausoleil et al.; U.S. Pat. No. 6,009,261: Preprocessing of stored target routines for emulating incompatible instructions on a target processor, of Scalzi et al.; U.S. Pat. No. 5,574,873: Decoding guest instruction to directly access emulation routines that emulate the guest instructions, of Davidian et al.; U.S. Pat. No. 6,308,255: Symmetrical multiprocessing bus and chipset used for coprocessor support allowing non-native code to run in a system, of Gorishek et al.; U.S. Pat. No. 6,463,582: Dynamic optimizing object code translator for architecture emulation and dynamic optimizing object code translation method, of Lethin et al.; and U.S. Pat. No. 5,790,825: Method for emulating guest instructions on a host computer through dynamic recompilation of host instructions, of Eric Traut, and many others, illustrate a variety of known ways to achieve emulation of an instruction format architected for a different machine for a target machine available to those skilled in the art, as well as those commercial software techniques used by those referenced above.

[0126] Referring to FIG. 1, representative components of a host computer system 100 are portrayed. Other arrangements of components may also be employed in a computer system which are well known in the art.

[0127] The host computing environment is preferably based on the z/Architecture offered by International Business Machines Corporation (IBM®), Armonk, N.Y. The z/Architecture is more fully described in: z/Architecture Principles of Operation, IBM® Pub. No. SA22-7832-05, 6th Edition, (April 2007), which is hereby incorporated by reference herein in its entirety. Computing environments based on the z/Architecture include, for example, eServer™ and zSeries®, both by IBM®.

[0128] The representative host computer 100 comprises one or more CPUs 101 in communication with main store (computer memory 102) as well as I/O interfaces to storage devices 111 and networks 110 for communicating with other computers or SANs and the like. The CPU may have Dynamic Address Translation (DAT) 103 for transforming program addresses (virtual addresses) into real addresses of memory. A DAT typically includes a Translation Lookaside Buffer (TLB) 107 for caching translations so that later accesses to the block of computer memory 102 do not require the delay of address translation. Typically a cache 109 is employed between computer memory 102 and the processor 101. The cache 109 may be hierarchical, having a large cache available to more than one CPU and smaller, faster (lower level) caches between the large cache and each CPU. In some implementations the lower level caches are split to provide separate low level caches for instruction fetching and data accesses. In an embodiment, an instruction is fetched from memory 102 by an instruction fetch unit 104 via a cache 109. The instruction is decoded in an instruction decode unit 106 and dispatched (with other instructions in some embodiments) to instruction execution units 108. Typically several execution units 108 are employed, for example an arithmetic execution unit, a floating point execution unit and a branch instruction execution unit. The instruction is executed by the execution unit, accessing operands from instruction specified registers or memory as needed. If an operand is to be accessed (loaded or stored) from memory 102, a load store unit 105 typically handles the access under control of the instruction being executed.

[0129] In an embodiment, the invention may be practiced by software (sometimes referred to as Licensed Internal Code (LIC), firmware, micro-code, milli-code, pico-code and the like, any of which would be consistent with the present invention). Software program code which embodies the present invention is typically accessed by the processor, also known as a CPU (Central Processing Unit) 101 of computer system 100, from long term storage media 111, such as a CD-ROM drive, tape drive or hard drive. The software program code may be embodied on any of a variety of known media for use with a data processing system, such as a diskette, hard drive, or CD-ROM. The code may be distributed on such media, or may be distributed to users from the computer memory 102 or storage of one computer system over a network 110 to other computer systems for use by users of such other systems.

[0130] Alternatively, the program code may be embodied in the memory 102, and accessed by the processor 101 using the processor bus. Such program code includes an operating

system which controls the function and interaction of the various computer components and one or more application programs. Program code is normally paged from dense storage media 111 to high speed memory 102 where it is available for processing by the processor 101. The techniques and methods for embodying software program code in memory, on physical media, and/or distributing software code via networks are well known and will not be further discussed herein. Program code, when created and stored on a tangible medium (including but not limited to electronic memory modules (RAM), flash memory, compact discs (CDs), DVDs, magnetic tape and the like) is often referred to as a “computer program product”. The computer program product medium is typically readable by a processing circuit, preferably in a computer system, for execution by the processing circuit.

[0131] In FIG. 2, an example emulated host computer system 201 is provided that emulates a host computer system 100 of a host architecture. In the emulated host computer system 201, the host processor (CPUs) 208 is an emulated host processor (or virtual host processor) and comprises an emulation processor 207 having a different native instruction set architecture than that used by the processor 101 of the host computer 100. The emulated host computer system 201 has memory 202 accessible to the emulation processor 207. In the example embodiment, the memory 202 is partitioned into a host computer memory 102 portion and an emulation routines 203 portion. The host computer memory 102 is available to programs of the emulated host computer 201 according to host computer architecture. The emulation processor 207 executes native instructions of an architected instruction set of an architecture other than that of the emulated processor 208, the native instructions obtained from emulation routines memory 203, and may access a host instruction for execution from a program in host computer memory 102 by employing one or more instruction(s) obtained in a Sequence & Access/Decode routine which may decode the host instruction(s) accessed to determine a native instruction execution routine for emulating the function of the host instruction accessed.

[0132] Other facilities that are defined for the host computer system 100 architecture may be emulated by Architected Facilities Routines, including such facilities as General Purpose Registers, Control Registers, Dynamic Address Translation, and I/O Subsystem Support and processor cache, for example. The emulation routines may also take advantage of function available in the emulation processor 207 (such as General Registers and dynamic translation of virtual addresses) to improve performance of the emulation routines. Special hardware and Off Load Engines may also be provided to assist the processor 207 in emulating the function of the host computer 100.

[0133] U.S. patent application Ser. No. 11/972,734 “ALGORITHM TO SHARE PHYSICAL PROCESSORS TO MAXIMIZE PROCESSOR CACHE USAGE AND TOPOLOGIES” filed on the same day as the present invention, describes creation of topology information and is incorporated herein in its entirety by reference.

[0134] In order to provide topology information to programs, two instructions are provided. The first is an enhancement to a prior art instruction STSI (Store System Information) and the second is a new instruction PTF (Perform Topology Function).

CPU Topology Overview:

[0135] With the advent of the new IBM eSeries mainframes, and even previously, machine organization into nodal structures has resulted in a non-uniform memory access (NUMA) behavior (sometimes also called “lumpiness”). The purpose of the new SYSIB 15.1.2 function of the prior art STSI (Store System Information) instruction and the new PERFORM TOPOLOGY FUNCTION (PTF) instruction is to provide additional machine topology awareness to the program so that certain optimizations can be performed (including improved cache-hit ratios) and thereby improve overall performance. The amount of host-CPU resource assigned to a multiprocessing (MP) guest configuration has generally been spread evenly across the number of configured guest CPUs. (A guest CPU is a logical CPU provided to a program; all guest CPUs are supported by software/hardware partitioning on actual host CPUs.) Such an even spread implies that no particular guest CPU (or CPUs) are entitled to any extra host-CPU provisioning than any other arbitrarily determined guest CPUs. This condition of the guest configuration, affecting all CPUs of the configuration, is called “horizontal polarization”. Under horizontal polarization, assignment of a host CPU to a guest CPU is approximately the same amount of provisioning for each guest CPU. When the provisioning is not dedicated, the same host CPUs provisioning the guest CPUs also may be used to provision guest CPUs of another guest, or even other guest CPUs of the same guest configuration.

[0136] When the other guest configuration is a different logical partition, a host CPU, when active in each partition, typically must access main storage more because the cache-hit ratio is reduced by having to share the caches across multiple relocation zones. If host-CPU provisioning can alter the balance such that some host CPUs are mostly, or even exclusively, assigned to a given guest configuration (and that becomes the normal behavior), then cache-hit ratios improve, as does performance. Such an uneven spread implies that one or more guest CPUs are entitled to extra host-CPU provisioning versus other, arbitrarily determined guest CPUs that are entitled to less host-CPU provisioning. This condition of the guest configuration, affecting all CPUs of the configuration, is called “vertical polarization”. The architecture presented herein categorizes vertical polarization into three levels of entitlement of provisioning, high, medium, and low:

[0137] High entitlement guarantees approximately 100% of a host CPU being assigned to a logical/virtual CPU, and the affinity is maintained as a strong correspondence between the two. With respect to provisioning of a logical partition, when vertical polarization is in effect, the entitlement of a dedicated CPU is defined to be high.

[0138] Medium entitlement guarantees an unspecified amount of host CPU resource (one or more host CPUs) being assigned to a logical/virtual CPU, and any remaining capacity of the host CPU is considered to be slack that may be assigned elsewhere. The best case for the available slack would be to assign it as local slack if that is possible. A less-beneficial result occurs if that available slack is assigned as remote slack. It is also the case that the resource percentage assigned to a logical CPU of medium entitlement is a much softer approximation as compared to the 100% approximation of a high-entitlement setting.

[0139] Low entitlement guarantees approximately 0% of a host CPU being assigned to a logical/virtual CPU. However, if slack is available, such a logical/virtual CPU may still receive some CPU resource. A model of nested containers using polarization is intended to provide a level of intelligence about the machine's nodal structure as it applies to the requesting configuration, so that, generally, clusters of host CPUs can be assigned to clusters of guest CPUs, thereby improving as much as possible the sharing of storage and the minimizing of different configurations essentially colliding on the same host CPUs. Polarization and entitlement indicate the relationship of physical CPUs to logical CPUs or logical CPUs to virtual CPUs in a guest configuration, and how the capacity assigned to the guest configuration is apportioned across the CPUs that comprise the configuration.

[0140] Historically, a guest configuration has been horizontally polarized. For however many guest CPUs were defined to the configuration, the host-CPU resource assigned was spread evenly across all of the guest CPUs in an equitable, non-entitled manner. It can be said that the weight of a single logical CPU in a logical partition when horizontal polarization is in effect is approximately equal to the total configuration weight divided by the number of CPUs. However, with the introduction of the 2097 and family models, it becomes imperative to be able to spread the host-CPU resource in a different manner, which is called vertical polarization of a configuration, and then the degree of provisioning of guest CPUs with host CPUs being indicated as high, medium, or low entitlement. High entitlement is in effect when a logical/virtual CPU of a vertically-polarized configuration is entirely backed by the same host CPU. Medium entitlement is in effect when a logical/virtual CPU of a vertically-polarized configuration is partially backed by a host CPU. Low entitlement is in effect when a logical/virtual CPU of a vertically-polarized configuration is not guaranteed any host-CPU resource, other than what might become available due to slack resource becoming available.

CPU Slack:

[0141] For CPU resource, there are two kinds of slack CPU resource:

[0142] Local slack becomes available when a logical/virtual CPU of a configuration is not using all the resource to which it is entitled and such slack is then used within the configuration of that CPU. Local slack is preferred over remote slack as better hit ratios on caches are expected when the slack is used within the configuration.

[0143] Remote slack becomes available when a logical/virtual CPU of a configuration is not using all the resource to which it is entitled and such slack is then used outside the configuration of that CPU. Remote slack is expected to exhibit lower hit ratios on caches, but it is still better than not running a logical/virtual CPU at all.

The goal is to maximize the CPU cache hit ratios. For a logical partition, the amount of physical-CPU resource is determined by the overall system weightings that determine the CPU resource assigned to each logical partition. For example, in a logical 3-way MP that is assigned physical-CPU resource equivalent to a single CPU, and is horizontally polarized, each logical CPU would be dispatched independently and thus receive approximately 33% physical-CPU resource. If the same configuration were to be vertically polarized, only a single logical CPU would be run and would receive approximately 100% of the assigned physical-CPU resource (high entitlement) while the remaining two logical CPUs would not normally be dispatched (low entitlement). Such resource assignment is normally an approximation. Even a low-entitlement CPU may receive some amount of resource if only to help ensure that a program does not get stuck on such a CPU. By providing a means for a control program to indicate that it understands polarization, and to receive an indication for each CPU of its polarization and, if vertical polarization, the degree of entitlement, the control program can make more intelligent use of data structures that are generally thought to be local to a CPU vs. available to all CPUs of a configuration. Also, such a control program can avoid directing work to any low-entitlement CPU. The actual physical-CPU resource assigned might not constitute an integral number of CPUs, so there is also the possibility of one or more CPUs in an MP vertically-polarized configuration being entitled but not to a high degree, thereby resulting in such CPUs having either medium or low vertical entitlement. It is possible for any remaining low-entitlement CPUs to receive some amount of host-CPU resource. For example, this may occur when such a CPU is targeted, such as via a SIGP order, and slack host-CPU resource is available. Otherwise, such a logical/virtual CPU might remain in an un-dispatched state, even if it is otherwise capable of being dispatched.

[0144] Preferably, according to the invention, a 2-bit polarization field is defined for the new CPU-type “topology-list entry” (TLE) of the STORE SYSTEM INFORMATION (STSI) instruction. The degree of vertical-polarization entitlement for each CPU is indicated as high, medium, or low. The assignment is not a precise percentage but rather is somewhat fuzzy and heuristic.

[0145] In addition to vertical polarization as a means of reassigning weighting to guest CPUs, another concept exists, which is the creation and management of slack capacity (also called “white space”). Slack capacity is created under the following circumstances:

[0146] A high-vertical CPU contributes to slack when its average utilization (AU) falls below 100 percent (100-AU).

[0147] A medium-vertical CPU that has an assigned provisioning of M percent of a host CPU contributes to slack when its average utilization (AU) falls below M percent (M-AU>0).

[0148] A low-vertical CPU does not contribute to slack.

[0149] A high-vertical CPU is not a consumer of slack.

Store System Information Instruction:

[0150] An example embodiment of a format of a Store System Information instruction (FIG. 3) comprises an opcode field ‘B27D’, a base register field B2 and a signed displacement field D2. The instruction opcode informs the machine executing the instruction that there are implied general registers ‘0’ and ‘1’ associated with the instruction. An address is obtained of a second operand by adding the signed displacement field value to the contents of the general register specified by the base field. In an embodiment, when the base register field is ‘0’, the sign extended value of the displacement field is used directly to specify the second operand. When the Store System Information instruction is fetched and executed, depending on a function code in general register 0, either an identification of the level of the configuration executing the program is placed in general register 0 or information about a component or components of a configuration is stored in a system-information block (SYSIB). When information about a component or components is requested, the

information is specified by further contents of general register 0 and by contents of general register 1. The SYSIB, if any, is designated by the second-operand address. The machine is considered to provide one, two, or three levels of configuration. The levels are:

[0151] 1. The basic machine, which is the machine as if it were operating in the basic or non-interpretive instruction-execution mode.

[0152] 2. A logical partition, which is provided if the machine is operating in the LPAR (logically partitioned, interpretive instruction-execution) mode. A logical partition is provided by the LPAR hypervisor, which is a part of the machine. A basic machine exists even when the machine is operating in the LPAR mode.

[0153] 3. A virtual machine, which is provided by a virtual machine (VM) control program that is executed either by the basic machine or in a logical partition. A virtual machine may itself execute a VM control program that provides a higher-level (more removed from the basic machine) virtual machine, which also is considered a level-3 configuration.

The terms basic mode, LPAR mode, logical partition, hypervisor, and virtual machine, and any other terms related specifically to those terms, are not defined in this publication; they are defined in the machine manuals. A program being executed by a level-1 configuration (the basic machine) can request information about that configuration. A program being executed by a level-2 configuration (in a logical partition) can request information about the logical partition and about the underlying basic machine. A program being executed by a level-3 configuration (a virtual machine) can request information about the virtual machine and about the one or two underlying levels; a basic machine is always underlying, and a logical partition may or may not be between the basic machine and the virtual machine. When information about a virtual machine is requested, information is provided about the configuration executing the program and about any underlying level or levels of virtual machine. In any of these cases, information is provided about a level only if the level implements the instruction.

[0154] The function code determining the operation is an unsigned binary integer in bit positions 32-35 of general register 0 and is as follows:

Function Code   Information Requested
0               Current-configuration-level number
1               Information about level 1 (the basic machine)
2               Information about level 2 (a logical partition)
3               Information about level 3 (a virtual machine)
4-14            None: codes are reserved
15              Current-configuration-level information

Invalid Function Code:

[0155] The level of the configuration executing the program is called the current level. The configuration level specified by a nonzero function code is called the specified level. When the specified level is numbered higher than the current level, then the function code is called invalid, the condition code is set to 3, and no other action (including checking) is performed.

Valid Function Code:

[0156] When the function code is equal to or less than the number of the current level, it is called valid. In this case, bits 36-55 of general register 0 and bits 32-47 of general register 1 must be zero or 15; otherwise, a specification exception is recognized. Bits 0-31 of general registers 0 and 1 always are ignored. When the function code is 0, an unsigned binary integer identifying the current configuration level (1 for basic machine, 2 for logical partition, or 3 for virtual machine) is placed in bit positions 32-35 of general register 0, the condition code is set to 0, and no further action is performed. When the function code is valid and nonzero, general registers 0 and 1 contain additional specifications about the information requested, as follows:

[0157] Bit positions 56-63 of general register 0 contain an unsigned binary integer, called selector 1, that specifies a component or components of the specified configuration.

[0158] Bit positions 48-63 of general register 1 contain an unsigned binary integer, called selector 2, that specifies the type of information requested.

[0159] The contents of general registers 0 and 1 are shown in FIG. 4.

[0160] When the function code is valid and nonzero, information may be stored in a system-information block (SYSIB) beginning at the second-operand location. The SYSIB is 4K bytes and must begin at a 4K-byte boundary; otherwise, a specification exception may be recognized, depending on selector 1 and selector 2 and on whether access exceptions are recognized due to references to the SYSIB.

Selector 1 can have values as follows:

Selector 1      Information Requested
0               None: selector is reserved
1               Information about the configuration level specified by the function code
2               Information about one or more CPUs in the specified configuration level
3-255           None: selectors are reserved

[0161] When selector 1 is 1, selector 2 can have values as follows:

Selector 2 when
Selector 1 Is 1 Information Requested
0               None: selector is reserved
1               Information about the specified configuration level
2               Topology information about the specified configuration level
3-65,535        None: selectors are reserved

[0162] When selector 1 is 2, selector 2 can have values as follows:

Selector 2 when
Selector 1 Is 2 Information Requested
0               None: selector is reserved
1               Information about the CPU executing the program in the specified configuration level

-continued

Selector 2 when
Selector 1 Is 2 Information Requested
2               Information about all CPUs in the specified configuration level
3-65,535        None: selectors are reserved

[0163] Only certain combinations of the function code, selector 1, and selector 2 are valid, as shown in FIG. 5.

[0164] When the specified function-code, selector-1, and selector-2 combination is invalid (is other than as shown in FIG. 5), or if it is valid but the requested information is not available because the specified level does not implement or does not fully implement the instruction or because a necessary part of the level is uninstalled or not initialized, and provided that an exception is not recognized, the condition code is set to 3. When the function code is nonzero, the combination is valid, the requested information is available, and there is no exception, the requested information is stored in a system-information block (SYSIB) at the second-operand address.

[0165] Some or all of the SYSIB may be fetched before it is stored.

[0166] A SYSIB may be identified in references by means of "SYSIB fc.s1.s2," where "fc," "s1," and "s2" are the values of a function code, selector 1, and selector 2, respectively.

[0167] Following sections describe the defined SYSIBs by means of figures and related text. In the figures, the offsets shown on the left are word values (a word comprising 4 bytes). "The configuration" refers to the configuration level specified by the function code (the configuration level about which information is requested).

SYSIB 1.1.1 (Basic-Machine Configuration)

[0168] SYSIB 1.1.1 has the format shown in FIG. 6, where the fields have the following meanings:

[0169] Reserved: The contents of words 0-7, 13-15, and 29-63 are reserved and are stored as zeros. The contents of words 64-1023 are reserved and may be stored as zeros or may remain unchanged.

[0170] Manufacturer: Words 8-11 contain the 16-character (0-9 or uppercase A-Z) EBCDIC name of the manufacturer of the configuration. The name is left justified with trailing blanks if necessary.

[0171] Type: Word 12 contains the four-character (0-9) EBCDIC type number of the configuration. (This is called the machine-type number in the definition of STORE CPU ID.)

[0172] Model-Capacity Identifier: Words 16-19 contain the 16-character (0-9 or uppercase A-Z) EBCDIC model-capacity identifier of the configuration. The model-capacity identifier is left justified with trailing blanks if necessary.

[0173] Sequence Code: Words 20-23 contain the 16-character (0-9 or uppercase A-Z) EBCDIC sequence code of the configuration. The sequence code is right justified with leading EBCDIC zeros if necessary.

[0174] Plant of Manufacture: Word 24 contains the four-character (0-9 or uppercase A-Z) EBCDIC code that identifies the plant of manufacture for the configuration. The code is left justified with trailing blanks if necessary.

[0175] Model: When word 25 is not binary zeros, words 25-28 contain the 16-character (0-9 or uppercase A-Z) EBCDIC model identification of the configuration. The model identification is left justified with trailing blanks if necessary. (This is called the model number in programming note 4 on page 10-111 of STORE CPU ID.) When word 25 is binary zeros, the contents of words 16-19 represent both the model-capacity identifier and the model.

Programming Notes:

[0176] 1. The fields of the SYSIB 1.1.1 are similar to those of the node descriptor described in the publication Common I/O-Device Commands and Self Description. However, the contents of the SYSIB fields may not be identical to the contents of the corresponding node-descriptor fields because the SYSIB fields:

[0177] Allow more characters.

[0178] Are more flexible regarding the type of characters allowed.

[0179] Provide information that is justified differently within the field.

[0180] May not use the same method to determine the contents of fields such as the sequence code field.

2. The model field in a node descriptor corresponds to the content of the STSI model field and not the STSI model-capacity-identifier field.

3. The model field specifies the model of the machine (i.e., the physical model); the model-capacity-identifier field specifies a token that may be used to locate a statement of capacity or performance in the System Library publication for the model.

SYSIB 1.2.1 (Basic-Machine CPU)

[0181] SYSIB 1.2.1 has the format shown in FIG. 7, where the fields have the following meanings:

[0182] Reserved: The contents of words 0-19, bytes 0 and 1 of word 25, and words 26-63 are reserved and are stored as zeros. The contents of words 64-1023 are reserved and may be stored as zeros or may remain unchanged.

[0183] Sequence Code: Words 20-23 contain the 16-character (0-9 or uppercase A-Z) EBCDIC sequence code of the configuration. The code is right justified with leading EBCDIC zeros if necessary.

[0184] Plant of Manufacture: Word 24 contains the four-character (0-9 or uppercase A-Z) EBCDIC code that identifies the plant of manufacture for the configuration. The code is left justified with trailing blanks if necessary.

[0185] CPU Address: Bytes 2 and 3 of word 25 contain the CPU address by which this CPU is identified in a multiprocessing configuration. The CPU address is a 16-bit unsigned binary integer. The CPU address is the same as is stored by STORE CPU ADDRESS when the program is executed by a machine operating in the basic mode.

[0186] Programming Note:

[0187] Multiple CPUs in the same configuration have the same sequence code, and it is necessary to use other information, such as the CPU address, to establish a unique CPU identity. The sequence codes returned for a basic-machine CPU and a logical-partition CPU are identical and have the same value as the sequence code returned for the basic-machine configuration.

SYSIB 1.2.2 (Basic-Machine CPUs)

[0188] The format field in byte 0 of word 0 determines the format of the SYSIB. When the format field has a value of zero, SYSIB 1.2.2 has a format-0 layout as shown in FIG. 8. When the format field has a value of one, SYSIB 1.2.2 has a format-1 layout as shown in FIG. 9.
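The SYSIB 1.1.1 field layout described above can be illustrated with a minimal Python sketch. It assumes the 4K-byte block is already available as a `bytes` object and that Python's `cp037` codec is an acceptable stand-in for the machine's EBCDIC code page; the function names and the returned dictionary keys are hypothetical and are not part of the publication.

```python
WORD = 4  # a word comprises 4 bytes


def ebcdic_words(sysib: bytes, first_word: int, last_word: int) -> str:
    """Decode the EBCDIC text held in words first_word..last_word (inclusive)."""
    raw = sysib[first_word * WORD:(last_word + 1) * WORD]
    return raw.decode("cp037")  # assumption: code page 037 is close enough


def parse_sysib_1_1_1(sysib: bytes) -> dict:
    """Extract the named fields of a SYSIB 1.1.1 image (hypothetical helper)."""
    if len(sysib) != 4096:
        raise ValueError("a SYSIB is 4K bytes")
    info = {
        "manufacturer": ebcdic_words(sysib, 8, 11).rstrip(),       # left justified
        "type": ebcdic_words(sysib, 12, 12),
        "model_capacity_id": ebcdic_words(sysib, 16, 19).rstrip(),  # left justified
        "sequence_code": ebcdic_words(sysib, 20, 23),               # right justified
        "plant": ebcdic_words(sysib, 24, 24).rstrip(),              # left justified
    }
    # When word 25 is binary zeros, words 16-19 represent both the
    # model-capacity identifier and the model.
    if sysib[25 * WORD:26 * WORD] == b"\x00" * WORD:
        info["model"] = info["model_capacity_id"]
    else:
        info["model"] = ebcdic_words(sysib, 25, 28).rstrip()
    return info
```

The sketch only reads the fields named in the text; reserved words are ignored rather than checked, since the text allows words 64-1023 to remain unchanged.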

[0189] Reserved:

[0190] When the format field contains a value of zero, the contents of bytes 1-3 of word 0 and words 1-6 are reserved and stored as zeros. When the format field contains a value of one, the contents of byte 1 of word 0 and words 1-6 are reserved and stored as zeros. When fewer than 64 words are needed to contain the information for all the CPUs, the portion of the SYSIB following the adjustment-factor list in a format-0 SYSIB or the alternate-adjustment-factor list in a format-1 SYSIB, up to word 63, is reserved and is stored as zeros. The contents of words 64-1023 are reserved and may be stored as zeros or may remain unchanged. When 64 or more words are needed to contain the information for all the CPUs, the portion of the SYSIB following the adjustment-factor list in a format-0 SYSIB or the alternate-adjustment-factor list in a format-1 SYSIB, up to word 1023, is reserved and may be stored as zeros or may remain unchanged.

Format:

[0191] Byte 0 of word 0 contains an 8-bit unsigned binary integer that specifies the format of SYSIB 1.2.2.

Alternate-CPU-Capability Offset:

[0192] When the format field has a value of one, bytes 2-3 of word 0 contain a 16-bit unsigned binary integer that specifies the offset in bytes of the alternate-CPU-capability field in the SYSIB.

Secondary CPU Capability:

[0193] Word 7 contains a 32-bit unsigned binary integer that, when not zero, specifies a secondary capability that may be applied to certain types of CPUs in the configuration. There is no formal description of the algorithm used to generate this integer, except that it is the same as the algorithm used to generate the CPU capability. The integer is used as an indication of the capability of a CPU relative to the capability of other CPU models, and also relative to the capability of other CPU types within a model. The capability value applies to each of the CPUs of one or more applicable CPU types in the configuration. That is, all CPUs in the configuration of an applicable type or types have the same capability. When the value is zero, all CPUs of any CPU type in the configuration have the same capability, as specified by the CPU capability. The secondary CPU capability may or may not be the same value as the CPU-capability value. The multiprocessing [...]

[...] non-secondary CPUs in the configuration. That is, all non-secondary CPUs in the configuration have the same capability.

Total CPU Count:

[0195] Bytes 0 and 1 of word 9 contain a 16-bit unsigned binary integer that specifies the total number of CPUs in the configuration. This number includes all CPUs in the configured state, the standby state, or the reserved state.

Configured CPU Count:

[0196] Bytes 2 and 3 of word 9 contain a 16-bit unsigned binary integer that specifies the number of CPUs that are in the configured state. A CPU is in the configured state when it is in the configuration and available to be used to execute programs.

Standby CPU Count:

[0197] Bytes 0 and 1 of word 10 contain a 16-bit unsigned binary integer that specifies the number of CPUs that are in the standby state. A CPU is in the standby state when it is in the configuration, is not available to be used to execute programs, and can be made available by issuing instructions to place it in the configured state.

Reserved CPU Count:

[0198] Bytes 2 and 3 of word 10 contain a 16-bit unsigned binary integer that specifies the number of CPUs that are in the reserved state. A CPU is in the reserved state when it is in the configuration, is not available to be used to execute programs, and cannot be made available by issuing instructions to place it in the configured state. (It may be possible to place a reserved CPU in the standby or configured state by means of manual actions.)

Multiprocessing CPU-Capability Adjustment Factors:

[0199] Beginning with bytes 0 and 1 of word 11, the SYSIB contains a series of contiguous two-byte fields, each containing a 16-bit unsigned binary integer used to form an adjustment factor (fraction) for the value contained in the CPU-capability field. Such a fraction is developed by using the value (V) of the first two-byte field according to one of the following methods:

[0200] If V is in the range 0
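As an illustration of the SYSIB 1.2.2 header fields described above, the following Python sketch reads the format byte, the secondary CPU capability, and the four CPU counts from a block held in memory. It assumes big-endian byte order (as used by the architecture) and does not attempt to interpret the adjustment-factor list. The function name and dictionary keys are hypothetical.

```python
import struct


def parse_sysib_1_2_2_header(sysib: bytes) -> dict:
    """Read the fixed-position fields of a SYSIB 1.2.2 image (hypothetical helper)."""
    header = {"format": sysib[0]}  # byte 0 of word 0
    # Word 7: secondary CPU capability, a 32-bit unsigned binary integer.
    (header["secondary_cpu_capability"],) = struct.unpack_from(">I", sysib, 7 * 4)
    # Words 9 and 10: four consecutive 16-bit unsigned counts.
    total, configured, standby, reserved = struct.unpack_from(">4H", sysib, 9 * 4)
    header.update(total=total, configured=configured,
                  standby=standby, reserved=reserved)
    if header["format"] == 1:
        # Bytes 2-3 of word 0: offset in bytes of the alternate-CPU-capability field.
        (header["alternate_capability_offset"],) = struct.unpack_from(">H", sysib, 2)
    return header
```

Because the total count is stated to include the configured, standby, and reserved CPUs, a caller could use `total == configured + standby + reserved` as a plausibility check on a decoded block.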

Alternate CPU Capability:

[0204] When the format field has a value of one, if bits 0-8 of word N are zero, the word contains a 32-bit unsigned binary integer (I) in the range 0 ≤ I < 2^23 that specifies the announced capability of one of the CPUs in the configuration. If bits 0-8 of word N are nonzero, the word contains a 32-bit binary floating point short-format number instead of a 32-bit unsigned binary integer. Regardless of encoding, a lower value indicates a proportionally higher CPU capacity. Beyond that, there is no formal description of the algorithm used to generate this value. The value is used as an indication of the announced capability of the CPU relative to the announced capability of other CPU models. The alternate-capability value applies to each of the CPUs in the configuration. That is, all CPUs in the configuration have the same alternate capability.

Alternate Multiprocessing CPU-Capability Adjustment Factors:

[0205] Beginning with bytes 0 and 1 of word N+1, the SYSIB contains a series of contiguous two-byte fields, each containing a 16-bit unsigned binary integer used to form an adjustment factor (fraction) for the value contained in the alternate-CPU-capability field. Such a fraction is developed by using the value (V) of the first two-byte field according to one of the following methods:

[0206] If V is in the range 0

[...] greater than one, containing nesting levels above the Mag1 level are occupied only by container-type TLEs. A dynamic change to the topology may alter the number of TLEs and the number of CPUs at the Mag1 level, but the limits represented by the values of the Mag1-6 fields do not change within a model family of machines.

[0214] The topology is a structured list of entries where an entry defines one or more CPUs or else is involved with the nesting structure. The following illustrates the meaning of the magnitude fields:

[0215] When all CPUs of the machine are peers and no containment organization exists, other than the entirety of the central-processing complex itself, the value of the nesting level is 1, Mag1 is the only non-zero magnitude field, and the number of CPU-type TLEs stored does not exceed the Mag1 value.

[0216] When all CPUs of the machine are subdivided into peer groups such that one level of containment exists, the value of the nesting level is 2, Mag1 and Mag2 are the only non-zero magnitude fields, the number of container-type TLEs stored does not exceed the Mag2 value, and the number of CPU-type TLEs stored within each container does not exceed the Mag1 value.

[0217] The Mag3-6 bytes similarly become used (proceeding in a right-to-left direction) when the value of the nesting level falls in the range 3-6.

[0218] MNest: Byte 3 of word 2 specifies the nesting level
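The two encodings of the alternate-CPU-capability word described under "Alternate CPU Capability" can be told apart with a few lines of Python. This is a minimal sketch that assumes the word is supplied as four big-endian bytes and that the "binary floating point short format" corresponds to IEEE 754 binary32 as handled by `struct`; the function name is hypothetical.

```python
import struct


def decode_alternate_capability(word: bytes) -> float:
    """Return the capability value held in a 4-byte word (hypothetical helper)."""
    (as_int,) = struct.unpack(">I", word)
    # Bits 0-8 are the nine leftmost bits of the word. When they are all
    # zero, the word is a plain unsigned integer.
    if as_int >> 23 == 0:
        return float(as_int)
    # Otherwise the same four bytes are a short-format binary float.
    (as_float,) = struct.unpack(">f", word)
    return as_float
```

For example, a word holding the integer 1000 decodes to 1000.0, while a word holding the binary32 encoding of 1.5 has nonzero exponent bits among bits 0-8 and therefore decodes as the float 1.5. In both cases a lower result indicates a higher CPU capacity.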

[0223] Sibling TLEs have the same value of nesting level, which is equivalent to either the value of the nesting level minus one of the immediate parent TLE, or the value of MNest minus one, because the immediate parent is the topology list rather than a TLE.

[0224] Reserved, 0: For a container-type TLE, bytes 1-3 of word 0 and bytes 0-2 of word 1 are reserved and stored as zeros. For a CPU-type TLE, bytes 1-3 of word 0 and bits 0-4 of word 1 are reserved and stored as zeros.

Container ID:

[0225] Byte 3 of word 1 of a container-type TLE specifies an 8-bit unsigned non-zero binary integer whose value is the identifier of the container. The container ID for a TLE is unique within the same parent container.

Dedicated (D):

[0226] Bit 5 of word 1 of a CPU-type TLE, when one, indicates that the one or more CPUs represented by the TLE are dedicated. When D is zero, the one or more CPUs of the TLE are not dedicated.

Polarization (PP):

[0227] Bits 6-7 of word 1 of a CPU-type TLE specify the polarization value and, when polarization is vertical, the degree of vertical polarization, also called entitlement (high, medium, low), of the corresponding CPU(s) represented by the TLE. The following values are used:

PP Meaning:

[0228] 0 The one or more CPUs represented by the TLE are horizontally polarized.

[0229] 1 The one or more CPUs represented by the TLE are vertically polarized. Entitlement is low.

[0230] 2 The one or more CPUs represented by the TLE are vertically polarized. Entitlement is medium.

[0231] 3 The one or more CPUs represented by the TLE are vertically polarized. Entitlement is high.

[0232] Polarization is only significant in a logical and virtual multiprocessing configuration that uses shared host processors and addresses how the resource assigned to a configuration is applied across the CPUs of the configuration. When horizontal polarization is in effect, each CPU of a configuration is guaranteed approximately the same amount of resource. When vertical polarization is in effect, CPUs of a configuration are classified into three levels of resource entitlement: high, medium, and low.

[0233] Both subsystem reset and successful execution of the SIGP set-architecture order specifying ESA/390 mode place a configuration and all of its CPUs into horizontal polarization. The CPUs immediately affected are those that are in the configured state. When a CPU in the standby state is configured, it acquires the current polarization of the configuration and causes a topology change of that configuration to be recognized.

[0234] A dedicated CPU is either horizontally or vertically polarized. When a dedicated CPU is vertically polarized, entitlement is always high. Thus, when D is one, PP is either 00 binary or 11 binary.

CPU Type:

[0235] Byte 1 of word 1 of a CPU-type TLE specifies an 8-bit unsigned binary integer whose value is the CPU type of the one or more CPUs represented by the TLE. The CPU-type value specifies either a primary-CPU type or any one of the possible secondary-CPU types.

CPU-Address Origin:

[0236] Bytes 2-3 of word 1 of a CPU-type TLE specify a 16-bit unsigned binary integer whose value is the CPU address of the first CPU in the range of CPUs represented by the CPU mask, and whose presence is represented by the value of bit position 0 in the CPU mask. A CPU-address origin is evenly divisible by 64. The value of a CPU-address origin is the same as that stored by the STORE CPU ADDRESS (STAP) instruction when executed on the CPU represented by bit position 0 in the CPU mask.

CPU Mask:

[0237] Words 2-3 of a CPU-type TLE specify a 64-bit mask where each bit position represents a CPU. The value of the CPU-address origin field plus a bit position in the CPU mask equals the CPU address for the corresponding CPU. When a CPU mask bit is zero, the corresponding CPU is not represented by the TLE. The CPU is either not in the configuration or else must be represented by another CPU-type TLE.

[0238] When a CPU mask bit is one, the corresponding CPU has the modifier-attribute values specified by the TLE, is in the topology of the configuration, and is not present in any other TLE of the topology.

[0239] Thus, for example, if the CPU-address origin is a value of 64, and bit position 15 of the CPU mask is one, CPU 79 is in the configuration and has the CPU type, polarization, entitlement, and dedication as specified by the TLE.

TLE Ordering:

[0240] The modifier attributes that apply to a CPU-type TLE are CPU type, polarization, entitlement, and dedication. Polarization and entitlement (for vertical polarization) are taken as a single attribute, albeit with four possible values (horizontal, vertical-high, vertical-medium, and vertical-low).

[0241] A single CPU TLE is sufficient to represent as many as 64 CPUs that all have the same modifier-attribute values.

[0242] When more than 64 CPUs exist, or the entire range of CPU addresses is not covered by a single CPU-address origin, and the modifier attributes are constant, a separate sibling CPU TLE is stored for each CPU-address origin, as necessary, in ascending order of CPU-address origin. Each such TLE stored has at least one CPU represented. The collection of one or more such CPU TLEs is called a CPU-TLE set.

[0243] When multiple CPU types exist, a separate CPU-TLE set is stored for each, in ascending order of CPU type.

[0244] When multiple polarization-and-entitlement values exist, a separate CPU-TLE set is stored for each, in descending order of polarization value and degree (vertical high, medium, low, then horizontal). When present, all polarization CPU-TLE sets of a given CPU type are stored before the next CPU-TLE set of the next CPU type.

[0245] When both dedicated and not-dedicated CPUs exist, a separate CPU-TLE set is stored for each, dedicated appearing before not-dedicated. All TLEs are ordered assuming a depth-first traversal where the sort order from major to minor is as follows:

[0246] 1. CPU Type
[0247] a. Lowest CPU-type value
[0248] b. Highest CPU-type value
[0249] 2. Polarization-Entitlement
[0250] a. Vertical high
[0251] b. Vertical medium
[0252] c. Vertical low
[0253] d. Horizontal
[0254] 3. Dedication (When Applicable)
[0255] a. Dedicated
[0256] b. Not dedicated

[0257] The ordering by CPU-address origin and modifier attributes of sibling CPU TLEs within a parent container is done according to the following list, which proceeds from highest to lowest.

[0258] 1. CPU-TLE set of lowest CPU-type value, vertical high, dedicated
[0259] 2. CPU-TLE set of lowest CPU-type value, vertical high, not-dedicated
[0260] 3. CPU-TLE set of lowest CPU-type value, vertical medium, not-dedicated
[0261] 4. CPU-TLE set of lowest CPU-type value, vertical low, not-dedicated
[0262] 5. CPU-TLE set of lowest CPU-type value, horizontal, dedicated
[0263] 6. CPU-TLE set of lowest CPU-type value, horizontal, not-dedicated
[0264] 7. CPU-TLE set of highest CPU-type value, vertical high, dedicated
[0265] 8. CPU-TLE set of highest CPU-type value, vertical high, not-dedicated
[0266] 9. CPU-TLE set of highest CPU-type value, vertical medium, not-dedicated
[0267] 10. CPU-TLE set of highest CPU-type value, vertical low, not-dedicated
[0268] 11. CPU-TLE set of highest CPU-type value, horizontal, dedicated
[0269] 12. CPU-TLE set of highest CPU-type value, horizontal, not-dedicated

Other TLE Rules:

[0270] 1. A container-type TLE is located at nesting levels in the range 1-5.

[0271] 2. A CPU-type TLE is located at nesting level 0.

[0272] 3. The number of sibling container-type TLEs in a topology list or a given parent container does not exceed the value of the magnitude byte (Mag2-6) of the nesting level corresponding to the siblings.

[0273] 4. The number of CPUs represented by the one or more CPU-type TLEs of the parent container does not exceed the value of the Mag1 magnitude byte.

[0274] 5. The content of a TLE is defined as follows:

[0275] a. If a TLE is a container-type TLE, the content is a list that immediately follows the parent TLE, comprised of one or more child TLEs, and each child TLE has a nesting level of one less than the nesting level of the parent TLE or topology-list end.

[0276] b. If a TLE is a CPU-type TLE, the content is one or more CPUs, as identified by the other fields of a CPU TLE.

[0277] 6. When the first TLE at a nesting level is a CPU entry, the maximum nesting level 0 has been reached.

[0278] Programming Note:

[0279] A possible examination process of a topology list is described. Before an examination of a topology list is begun, the current-TLE pointer is initialized to reference the first or top TLE in the topology list, the prior-TLE pointer is initialized to null, and then TLEs are examined in a top-to-bottom order.

[0280] As a topology-list examination proceeds, the current-TLE pointer is advanced by incrementing the current-TLE pointer by the size of the current TLE to which it points. A container-type TLE is advanced by adding eight to the current-TLE pointer. A CPU-type TLE is advanced by adding sixteen to the current-TLE pointer. The process of advancing the current-TLE pointer includes saving its value as the prior-TLE pointer just before it is incremented. TLE examination is not performed if the topology list has no TLEs.

[0281] The examination process is outlined in the following steps:

1. If the current-TLE nesting level is zero, and the prior-TLE nesting level is null or one, the current TLE represents the first CPU-type TLE of a group of one or more CPU-type TLEs. The program should perform whatever action is appropriate for when a new group of one or more CPUs is first observed. Go to step 5.

2. If the current-TLE nesting level is zero, and the prior-TLE nesting level is zero, the current TLE represents a subsequent CPU-type TLE of a group of CPU-type TLEs that represent siblings of the CPUs previously observed in steps 1 or 2. The program should perform whatever action is appropriate for when the size of an existing sibling group of one or more CPUs is increased. Go to step 5.

3. If the current-TLE nesting level is not zero, and the prior-TLE nesting level is zero, the prior TLE represents a last or only CPU-type TLE of a group of one or more CPU-type TLEs. The program should perform whatever action is appropriate for when an existing group of one or more CPUs is completed. Go to step 5.

4. Go to step 5.

[0282] By elimination, this would be the case where the current-TLE nesting level is not zero, and the prior-TLE nesting level is not zero. If the current-TLE nesting level is less than the prior-TLE nesting level, the direction of topology-list traversal is toward a CPU-type TLE. If the current-TLE nesting level is greater than the prior-TLE nesting level, the direction of topology-list traversal is away from a CPU-type TLE. Container-type TLEs are being traversed leading to either (1) another group of CPU-type TLEs that are a separate group in the overall topology, or (2) the end of the topology list. In either case, no particular processing is required beyond advancing to the next TLE.

5. Advance to the next TLE position based upon the type of the current TLE. If the advanced current-TLE pointer is equivalent to the end of the topology list:

[0283] a. No more TLEs of any type exist.

[0284] b. If the prior-TLE nesting level is zero, the program should perform whatever action is appropriate for when an existing group of one or more CPUs is completed.

[0285] c. The examination is complete.

[0286] Otherwise, go to step 1.

[0287] In an example implementation (FIG. 16), a computer system comprises two physical frames (Frame 1 and Frame 2). Each frame contains two logic boards (Board 1 and Board

2), main storage (Memory), and I/O adapters (I/O Channels 1 and I/O Channels 2) for communicating with peripheral devices and networks. Each Board contains two multi-chip modules (Mod 1, Mod 2, Mod 3 and Mod 4). Each multi-chip module contains two processors (CPU1, CPU2, CPU3, CPU4, CPU5, CPU6, CPU7 and CPU8). Each module also contains a level-2 Cache (Cache 1, Cache 2, Cache 3 and Cache 4). Each processor (Central Processing Unit or CPU) includes a level-1 Cache and Translation-Lookaside-Buffer (TLB). The TLB buffers address translation information during dynamic address translation.
[0288] According to the invention, FIGS. 17, 18 and 19, the components of the computer system are assigned to "containers" according to proximity. Each module is assigned to inner-most containers 6, 7, 8 and 9 as the CPUs of each module are in closest proximity to each other. Since the modules are packaged on boards, the modules of the respective boards are assigned to next level containers 4 and 5. The next higher proximity grouping is the Frame. The boards of a frame are assigned to a container representing the frame (containers 2 and 3). The whole system is represented by container 1.
[0289] It can be seen that two CPUs on the same module such as CPU1 and CPU2 are in closest topological relationship or distance to each other and that both reside in one container (container 6) and no container boundary is crossed when only those CPUs are involved. However, if CPU1 and CPU8 are involved, there are 4 container boundaries crossed. CPU1 in container 6 is separated by container 4, 5 and 9 from CPU8. Therefore, knowing the container structure, a user can get a view of the topology of the system.
[0290] Of course, in a logically partitioned system, CPUs may be shared between operating systems as previously discussed. Therefore, if a logical partition was assigned 3 logical CPUs, each logical CPU assigned to 20% of each of three real CPUs, the partition would perform best if the 3 real CPUs were in closest proximity to each other as communication between CPUs and CPU resources (cache and memory for example) would perform best. In our example, CPU1 and CPU2 in a partition would experience less thrashing of cache lines in Cache 1 than if the two CPUs were CPU1 and CPU8.
[0291] In the Example, a partition is created including CPU1, CPU2 and CPU3. A program operating in the partition issues a STSI instruction and a SYSIB 15.1.2 (FIG. 10) table is returned to the program. In this case, the partition comprises container 6 and container 7 and therefore there are two levels of nesting. The SYSIB table values are:
[0292] The MNest field is set to 4 indicating that the topology includes 4 levels of nesting, which is an absolute maximum nesting for the model, and may or may not be fully exploited on any arbitrary topology list, depending on resource assignment to the guest configuration issuing STSI, and how resources are assigned to other guest configurations of the system.
[0293] The Mag1 field is set to 2 indicating 2 CPUs are available at the most primitive first level.
[0294] The Mag2 field is set to 2 indicating that 2 second level (Module) containers are available.
[0295] The Mag3 field is set to 2 indicating 2 third level (Boards) are available.
[0296] The Mag4 field is set to 2 indicating 2 fourth level (Frames) are available.
[0297] In our example, the 3 CPUs are contained in two modules on the same board, therefore the following 4 TLE entries are provided:
[0298] 1. NL=1; CtnrID=1 (container 6)
[0299] 2. NL=0; CPU type=1 (the type of CPUs); CPU Addr Origin=0; CPU Mask 0110...00 (CPU1 and CPU2 of the addressed CPUs)
[0300] 3. NL=1; CtnrID=2 (container 7)
[0301] 4. NL=0; CPU type=1 (the type of CPUs); CPU Addr Origin=0; CPU Mask 00010...00 (CPU3 of the addressed CPUs)
[0302] Thus, the program has a representation of the topology based on the container and CPU TLEs returned.

Perform Topology Function (PTF)
[0303] Referring to FIG. 13, a Perform Topology Function (PTF) instruction preferably comprises an opcode and a register field R1 identifying a first operand. The contents of the general register identified by R1 specify a function code in bit positions 56-63, as illustrated in FIG. 14.
[0304] The defined function codes are as follows:
FC   Meaning
0    Request horizontal polarization.
1    Request vertical polarization.
2    Check topology-change status.
[0305] Undefined function codes in the range 0-255 are reserved for future extensions.
[0306] Upon completion, if condition code 2 is set, a reason code is stored in bit positions 48-55 of general register R1. Bits 16-23 and 28-31 of the instruction are ignored.

Operation of Function Code 0:
[0307] Execution completes with condition code 2 for any of the following reasons and (reason codes):
No reason specified (0).
The requesting configuration is already horizontally polarized (1).
A topology change is already in process (2).
Otherwise, a process is initiated to place all CPUs in the configuration into horizontal polarization.
[0308] Completion of the process is asynchronous with respect to execution of the instruction and may or may not be completed when execution of the instruction completes.

Resulting Condition Code:
[0309]
0    Topology-change initiated
1    --
2    Request rejected
3    --

Operation of Function Code 1:
[0310] Execution completes with condition code 2 for any of the following reasons and (reason codes):
No reason specified (0).
The requesting configuration is already vertically polarized (1).
A topology change is already in process (2).
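The register conventions and rejection reasons given for function codes 0 and 1 can be illustrated with a minimal Python sketch. It is not the architected implementation: the names `GuestConfiguration`, `perform_topology_function` and `SpecificationException` are hypothetical, the order in which the two rejection reasons are tested is an assumption (the text does not state a priority), and function code 2 as well as the asynchronous completion of the change are left out. The 64-bit register is modelled as an integer with bit 0 as the leftmost bit, so bit positions 56-63 are the low-order byte and bit positions 48-55 the byte above it.

```python
from dataclasses import dataclass

HORIZONTAL, VERTICAL, CHECK_STATUS = 0, 1, 2  # function codes from the FC table


class SpecificationException(Exception):
    """Stands in for the architected specification exception (hypothetical name)."""


@dataclass
class GuestConfiguration:
    """Hypothetical model of the state PTF consults for a guest configuration."""
    polarization: int = HORIZONTAL
    change_in_process: bool = False
    requested: int = HORIZONTAL


def perform_topology_function(r1, cfg):
    """Model function codes 0 and 1; return (condition_code, updated_r1)."""
    if r1 >> 8:
        # Bit positions 0-55 of R1 must be zeros.
        raise SpecificationException("bit positions 0-55 of R1 must be zeros")
    fc = r1 & 0xFF  # function code in bit positions 56-63
    if fc == CHECK_STATUS:
        raise NotImplementedError("function code 2 is not modelled in this sketch")
    if fc not in (HORIZONTAL, VERTICAL):
        raise SpecificationException("undefined function code")

    # Assumption: an in-process change is reported before "already polarized".
    if cfg.change_in_process:
        reason = 2
    elif cfg.polarization == fc:
        reason = 1
    else:
        # Initiate the change; completion is asynchronous and not modelled.
        cfg.change_in_process = True
        cfg.requested = fc
        return 0, r1

    # Condition code 2: reason code stored in bit positions 48-55 of R1.
    return 2, r1 | (reason << 8)
```

A caller would test the returned condition code and, for condition code 2, extract the reason with `(r1 >> 8) & 0xFF`.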

[0311] Otherwise, when none of the rejection reasons for function code 1 applies, a process is initiated to place all CPUs in the configuration into vertical polarization. Completion of the process is asynchronous with respect to execution of the instruction and may or may not be completed when execution of the instruction completes.

Resulting Condition Code:
[0312]
0    Topology-change initiated
1    --
2    Request rejected
3    --

[0313] Operation of Function Code 2: The topology-change-report-pending condition is checked, and the instruction completes with the condition code set.

Resulting Condition Code:
[0314]
0    Topology-change-report not pending
1    Topology-change-report pending
2    --
3    --

[0315] A topology change is any alteration such that the contents of a SYSIB 15.1.2 would be different from the contents of the SYSIB 15.1.2 prior to the topology change.
[0316] A topology-change-report-pending condition is created when a topology-change process completes. A topology-change-report-pending condition is cleared for the configuration when any of the following is performed:
Execution of PERFORM TOPOLOGY FUNCTION specifies function code 2 and completes with condition code 1.
STORE SYSTEM INFORMATION for SYSIB 15.1.2 is successfully executed by any CPU in the configuration.
Subsystem reset is performed.

Special Conditions:
[0317] If bit positions 0-55 of general register R1 are not zeros, a specification exception is recognized. If an undefined function code is specified, a specification exception is recognized.

Program Exceptions:
[0318] Operation (Configuration-topology facility is not installed)
Privileged Operation
Specification

A Mainframe Example Environment:
[0319] As high end server architectures increase the number of physical processors and processor speeds continue to improve, the processor "nest" needed to build large machines continues to be made of smaller building blocks that are more nodal in nature. For instance, while the L2 cache of a z990 or z9 machine is fully cache coherent, the fully populated models actually have four (4) separate L2s that are connected by a fabric to present the appearance of a single L2 cache. The penalty for going off node to resolve a cache miss continues to increase. For instance, resolving an L1 miss in a remote L2 is more expensive than resolving it in the local L2. Missing in a CP's private, usually on chip, L1 cache is expensive to start with and having to go all the way out to memory can seem like an eternity. The increase in speed of memory and the connections to it are not keeping pace with increases in processor speed. While one might want to try to pack everything closer together "on chip" or the like, power consumption and cooling issues run counter to this.
[0320] With the introduction of z990, LPAR became aware of the machine topology and began optimizing the allocation of logical partition CP and storage resources to the physical resources. Enhancements to the capabilities for dynamically re-optimizing logical partition resource assignments were introduced with z9 GA-1 primarily in support of concurrent book repair.
[0321] The new support discussed here addresses the true start of having zSeries OS software become aware of the topology of the machine, presented as a logical partition topology, to then provide affinity dispatching with regards to CPU placement in the CEC book structure.
[0322] You can think of the way zSeries LPAR manages shared logical partitions today as being horizontally polarized. That is, the processing weight for the logical partition is equally divided between all the online logical CPs in the logical partition. This support introduces a new, optional form of polarization for managing the shared logical CPs of a logical partition called vertical polarization.
[0323] When a logical partition chooses to run in vertical mode, software issues a new instruction to inform the zSeries hypervisor of this and the hypervisor will change how it dispatches the logical partition.
[0324] Depending on the configuration of the vertical logical partition, logical processors would have high, medium or low polarity. Polarity describes the amount of physical processor share vertical logical processors are entitled to. Customers define weights for logical partitions, which effectively define the amount of physical processor cycles each logical partition in a machine is entitled to.
[0325] Polarity is measured by the ratio of a logical partition's current weight to the number of logical processors configured to the logical partition. High polarity processors have close to 100% CPU share. Medium polarity processors have >0 to 99% shares and low polarity processors have 0% share (or very close to it). High polarity logical CPs will be assigned a physical processor to run on, very similar to dedicated CPs, but the shared high polarity CP can still give up the physical resource and allow other shared CPs to use its excess cycles. The key here then becomes that software sees the logical topology and tries to exploit the highly polarized logical CPs for its work queues.
[0326] For example, a customer configures a three-way processor with 2 logical partitions, each with 2 logical processors and each with a weight of 50. If the first logical partition defined itself as vertical, it would have 1 high and 1 medium polarity logical CP.
[0327] Note that when a logical partition chooses to run in vertical mode, the entire logical partition runs in vertical mode. This includes all of its secondary processors such as zAAPs (IFAs) and/or zIIPs. It is the responsibility of the customer to define weights to all of the processor types for these logical partitions that will achieve the desired level of vertical processing for each type.

Logical Partition Topology Support
[0328] Establish infrastructure for a logical partition nodal topology.

[0329] Make any changes needed to LPAR's nodal assignment algorithms for existing horizontal partitions needed to provide a valid topology.
[0330] Provide instruction simulation for the new configuration topology information block for the Store System Information (STSI) instruction to provide logical partition nodal topology to the logical partition.
[0331] Examine changes to the physical or logical configuration to determine if a topology change is needed. These can occur when:
[0332] A physical CP is added or removed from the configuration.
[0333] A logical CP is added or removed from the configuration.
[0334] A logical partition is activated.
[0335] A logical partition is deactivated.
[0336] The logical partition weights are changed from the HMC/SE.
[0337] Software initiates a change to a logical partition's weight.
[0338] A logical partition is reset (switch to horizontal).
[0339] A logical partition switches to ESA/390 mode (switch to horizontal).

Algorithms of the Environment:
[0340] A topology must be assigned to a logical partition when it is first activated and then any changes in the nodal topology assigned to a logical partition must result in the logical partition being notified. The results of the nodal topology must be kept in a convenient new data structure to allow easier queries by the new STSI processing as well as limiting processing as best as possible when configuration changes are made. This new structure also allows for topology change processing completing in multiple steps with the required serialization for each step without introducing inconsistent views of the topology to the logical partition.

Topology Assignment
[0341] How the logical topology is chosen is not important for this disclosure. Suffice it to say that a determination must be made of how many of each type of logical processor are needed and which nodes or books they need to be assigned to. For a vertical mode partition, this means the count of vertical high, vertical medium, and vertical low processors for each processor type.

Assigning Polarization Values to Logical CPs
[0342] Once the above counts are determined, the polarization assignments are made from lowest online logical CP address of the cp type to highest in the order of (1) all vertical highs, (2) all vertical mediums, and (3) all vertical lows. The order in which this is done is arbitrary and other orders of selection are possible.

Store System Information Instruction Mappings
[0343] Add 3 structures to map the 15.1.2 response:

1. Mapping for STSI 15.1.2, the Response Block Mapping the Topology
[0344]
Dcl 1 syibk1512 char(4096) based(*),
      3 *                        char(2),
      3 syibk1512_length         fixed(16),   /* length            */
      3 syibk1512_mag6           fixed(8),    /* 6th level nest    */
      3 syibk1512_mag5           fixed(8),    /* 5th               */
      3 syibk1512_mag4           fixed(8),    /* 4th               */
      3 syibk1512_mag3           fixed(8),    /* 3rd               */
      3 syibk1512_mag2           fixed(8),    /* 2nd, nodes        */
      3 syibk1512_mag1           fixed(8),    /* 1st, cpus         */
      3 *                        char(1),
      3 syibk1512_mnest          fixed(8),    /* nesting level     */
      3 *                        char(4),
      3 syibk1512_topology_list  char(0);     /* topology list     */

2. Mapping for a Container-Type TLE for STSI 15.1.2
[0345]
Dcl 1 syibk_vcm_container char(8) based(*),
      3 syibknl                  fixed(8),    /* nesting level     */
      3 *                        char(3),
      3 *                        char(1),
      3 *                        char(2),
      3 syibk_container_id       fixed(8);    /* node id           */

3. Mapping for a CPU-Type TLE for STSI 15.1.2
[0346]
Dcl 1 syibk_vcm_cpu char(16) based(*),
      3 syibknl2                 fixed(8),    /* nesting level     */
      3 *                        char(3),
      3 syibk_ded_polarization   bit(8),      /* vcm byte          */
        5 *                      bit(5),
        5 syibk_dedicated        bit(1),      /* dedicated bit     */
        5 syibk_polarization     bit(2),      /* polarization bits */
      3 syibk_cputype            fixed(8),    /* cputype           */
      3 syibk_cpuaddrorg         fixed(16),   /* address origin    */
      3 syibk_cpumask            bit(64);     /* cpu mask entry    */

TOPBK Partition Topology
[0347] A summary of a logical partition's topology is kept current in this block by the nodal assignment routines. The data in this block is ordered in such a way that STSI processing can make one pass of the entire structure to create the logical partition topology response to the program, preserving the order and separation of CPU entries as required by architecture.
It consists of a 3 dimensional array (node, cp type, polarization classification) with a 64-bit CPU mask per entry.

A second working area, TOP_WORKING, is included for use in updating the topology.

DECLARE
  1 TOPBK BASED BDY(DWORD),
    3 TOP_CURRENT,
      5 TOPCPUMASK(1:MAXNODE,   /* Each of 1-4 nodes                        */
                   0:CPUTMAX,   /* CP type index                            */
                   0:3)         /* 4 possible topology classifications when */
                                /* the logical partition is vertical. There */
                                /* are only 2 classifications when the      */
                                /* partition is horizontal.                 */
          BIT(64),              /* Mask of the logical CPUs that fall into  */
                                /* this classification.                     */
    3 TOP_WORKING CHAR(LENGTH(TOP_CURRENT));  /* Work area for building new topology */

TO3PO Partition Topology Translation
[0348] These "constant" translation tables are used by the nodal assignment routines which build the topology (mapped by TOPBK) and STSI processing which reads the topology.

/*******************************/
/* STSI translation arrays:    */
/*******************************/
DECLARE
  1 TOPVERT(0:3) BIT(8)   /* For vertical partitions, translates the classification  */
                          /* index above to the architected D (dedication) and PP    */
                          /* (polarization) values to be returned for this group of  */
                          /* CPs in the STSI data                                     */
      STATIC INIT('00000'b||'1'b||PPVH,   /* Classification 0: Dedicated Vertical high */
                  '00000'b||'0'b||PPVH,   /* Classification 1: Shared Vertical high    */
                  '00000'b||'0'b||PPVM,   /* Classification 2: Shared Vertical Medium  */
                  '00000'b||'0'b||PPVL),  /* Classification 3: Shared Vertical Low     */
    3 * BIT(5),           /* Unused, need byte alignment for bit arrays */
    3 TOPDPP BIT(3),      /* Actual D, PP values to use                 */
  1 TOPHOR(0:1) BIT(8)    /* For horizontal partitions, translates the classification */
                          /* index above to the architected D (dedication) and PP     */
                          /* (polarization) values to be returned for this group of   */
                          /* CPs in the STSI data. Note only the first two            */
                          /* classifications can be used for a horizontal partition.  */
      STATIC INIT('00000'b||'1'b||PPH,    /* Classification 0: Dedicated Horizontal */
                  '00000'b||'0'b||PPH),   /* Classification 1: Shared Horizontal    */
    3 * BIT(5),           /* Unused, need byte alignment for bit arrays */
    3 TOPDPP BIT(3),      /* Actual D, PP values to use                 */
/*******************************/
/* NDI translation array       */
/*******************************/
  1 TOPDPP2CLASS(0:7) FIXED   /* Used by the nodal assignment routines to create the   */
                              /* topology information. LPDPP is used as an index into  */
                              /* this array to determine which classification index    */
                              /* the logical CP should use. This one array is used for */
                              /* both horizontal and vertical partitions               */
      STATIC INIT(1,    /* Shared, horizontal       */
                  3,    /* Shared, vertical low     */
                  2,    /* Shared, vertical medium  */
                  1,    /* Shared, vertical high    */
                  0,    /* Dedicated, horizontal    */
                  0,    /* Not applicable           */
                  0,    /* Not applicable           */
                  0),   /* Dedicated, vertical high */
    3 * CHAR(4);        /* Force to be non-simple item */

Logical Processor Block:
[0349] A 2-bit encoding of partition polarization can be tracked for each logical processor to reflect its polarization. Grouping this with a 1-bit indication of dedication allows a complete polarity picture for a logical processor in 3 bits:

    3 LPDPP BIT(3),       /* Polarization, including dedication */
      4 LPDED BIT(1),     /* =1, logical CP is dedicated        */
      4 LPPP BIT(2),      /* Processor Polarization             */
/* Encodings for Processor Polarization */
  PPH  BIT(2) CONSTANT('00'B),   /* Horizontally polarized     */
  PPVL BIT(2) CONSTANT('01'B),   /* Vertically polarized - Low */

  PPVM BIT(2) CONSTANT('10'B),   /* Vertically polarized - Medium */
  PPVH BIT(2) CONSTANT('11'B),   /* Vertically polarized - High   */

Update Topology Block
Clear local copy of CPU topology mask
DO for every logical CP in the target logical partition
  IF the logical CP is online THEN
  DO
    Polarity index = Polarity index appropriate for the logical CP's polarization value according to the translate polarization value to polarity index array
    Local copy of CPU topology mask (cpu address, node, cptype, Polarity index) = ON
  END
END
[0350] IF new topology block NOT = current topology block for partition THEN
[0351] Set topology change report bit and copy the new topology to the current.

Instruction simulation for STSI 15.1.2
[0352] Within syibk, mappings for the STSI 15.1.2 response block, a container-type TLE, and a CPU-type TLE have been added. Essentially, the data must be returned in container(s) with the entry at the lowest level being a CPU-type TLE. One can think of this as an array of arrays based on how the logical partition's resources have been subdivided or allocated. For the preferred embodiment, each container is essentially a node with a nesting level of 1 and includes CPU-type TLE(s) that each have a nesting level of 0. The CPU TLEs are ordered by CPU type followed by their classification. Vertical partitions have four classifications (vertical dedicated, vertical high shared, vertical medium shared, and vertical low shared) and horizontal partitions have two classifications (dedicated and shared).
[0353] The following steps illustrate a use case for how a STSI 15.1.2 is handled after all the upfront checks have validated the instruction input.
[0354] For the current embodiment a max of 4 nodes and 64 processors is assumed.
[0355] Start scan of topbk, and in a local variable called current node value maintain the value of the node index we are currently on. The reason we need this is because if all the 64 bit masks within a node are zero, we do not need to create a container-type TLE for that node.
[0356] Once the first non-zero entry is found within a node, first create a container-type TLE entry for that node. Within the container TLE entry, the nesting value is 1, followed by 48 reserved bits. The last bits are the node ID which is the index in topbk of the current node we are processing. After creating the container-type TLE, create a CPU-type TLE for the entry with a non-zero bit mask. Within this entry, the nesting level is 0, followed by 24 reserved bits. The next 8 bits include the dedicated bit and the polarization bits. If the partition is currently vertical, fill in the polarization value and dedicated bit as follows:
Classification 0 in topbk is vertical dedicated, store a 11 in PP and 1 in D
Classification 1 in topbk is vertical high shared, store a 11 in PP and 0 in D
Classification 2 in topbk is vertical medium shared, store a 10 in PP and 0 in D
Classification 3 in topbk is vertical low shared, store a 01 in PP and 0 in D
For horizontal partitions, only classification 0 and 1 are currently valid. Fill in the dedicated bit and polarization value as follows:
Classification 0 in topbk is horizontal dedicated, store a 00 in PP and 1 in D
Classification 1 in topbk is horizontal shared, store a 00 in PP and 0 in D
[0357] The CPU Type, the next value to be filled in the CPU-TLE, is just the index of the second array within topcpumask in topbk (0 GP, 2 IFA, 3 IFL, 4 ICF; 1 is currently unused).
1. The next value is the CPU address origin. This value is explicitly stored as 0, as 64 is the maximum number of CPUs available in the current embodiment.
2. The last value in syibk_vcm_cpu is the CPU mask, the non-zero 64 bit mask stored in the nested array of arrays topcpumask.
3. For each non-zero mask following the first non-zero bit mask within a node, create a separate CPU-type TLE entry and iterate through this process for all 4 nodes.
[0358] In an embodiment, the PTF instruction might request specific changes to the topology other than a change of polarization. Such changes include (but are not limited to) requesting more guest processors be added to the guest configuration, requesting fewer guest processors in the guest configuration, requesting one or more dedicated processors be added or removed from the guest configuration, requesting specific polarization of specific guest processors, requesting co-processors be added or removed from the guest configuration, requesting a temporary change of the topology, requesting a change of the topology for a predetermined period of time and the like.
[0359] Furthermore, the invention is not limited to topology of processors. It can be appreciated that the basic component of the invention could advantageously apply to components other than CPUs, including, but not limited to co-processors, Caches, TLBs, internal data paths, data path buffers, distributed memory and I/O communications adapters for example.

What is claimed is:
1. A computer program product, the computer program product comprising a tangible storage medium readable by a processing circuit and storing instructions for execution by the processing circuit for performing a method comprising:
executing, by a processor a perform topology function instruction for requesting a configuration change of a topology of a plurality of first processors (CPUs) of a first configuration, the execution comprising:
any one of initiating a change of the polarization of the processors of the guest configuration according to the requested polarization change or rejecting the polarization change request; and
setting a condition code, by the processor, to a value indicating whether a polarization change is initiated or rejected, wherein the condition code of an instruction set architecture provides branching criterion for branch on condition instructions.
2. The computer program product according to claim 1, wherein the first processors are guest processors and the first configuration is a guest configuration in a logically partitioned computer system, the method further comprising:
fetching, by a processor of the guest configuration, the perform topology function instruction defined for a computer architecture, wherein the perform topology

function instruction comprises an opcode field and a register field for requesting a polarization change; and
wherein the execution further comprises, based on the rejecting the polarization change request, setting the condition code to a second value indicating said change request is rejected.
3. The computer program product according to claim 2, wherein the executing the perform topology function instruction further comprises:
obtaining from a function code (FC) field of a register specified by the register field, a function code field value, the function code field value consisting of any one of a horizontal polarization specifier, a vertical polarization specifier or a check of a topology change specifier;
based on the FC field value specifying a horizontal polarization, initiating horizontal polarization of the processors of the guest configuration;
based on the FC field value specifying a vertical polarization, initiating vertical polarization of the processors of the guest configuration; and
based on the requested topology change being rejected, setting a reason code (RC) value in a register specified by said register field.
4. The computer program product according to claim 2, further comprising obtaining from a function code field of a register specified by the register field, a function code field value, wherein an initiated polarization request is asynchronous to completion of a previously executed perform topology function instruction; and
based on the function code field value specifying a check of a topology change, checking the completion status of the topology change.
5. The computer program product according to claim 3, wherein horizontal polarization comprises providing substantially equal host processor resource to each processor resource, wherein vertical polarization comprises providing substantially more host processor resource to at least one processor of said processors than to at least another processor of said processors.
6. The computer program product according to claim 3, wherein the RC value specifies a reason code consisting of:
based on the configuration being polarized as specified by the function code prior to execution, the RC value indicating the configuration is already polarized according to the function code; and
based on the configuration processing an incomplete polarization prior to execution, the RC value indicating a topology change is already in process.
7. The computer program product according to claim 3, further comprising:
based on the instruction specifying the vertical polarization, initiating vertical polarization of the processors of the computer configuration, thereby producing a resultant updated topology.
8. The computer program product according to claim 4, wherein the execution further comprises:
based on no topology change report being pending, setting a condition code indicating a topology-change-report not pending; and
based on a topology change report being pending, setting a condition code indicating a topology-change-report pending.
9. The computer program product according to claim 3, wherein the perform topology function instruction defined for the computer architecture is fetched and executed by a central processing unit of an alternate computer architecture,
wherein the method further comprises interpreting the perform topology function instruction to identify a predetermined software routine for emulating the operation of the perform topology function instruction; and
setting a condition code to a value indicating whether a polarization change is initiated or rejected.
10. A computer system comprising:
a memory; and
one or more processors (CPUs) in communication with said memory, wherein the computer system is configured to perform a method, the method comprising:
executing, by a processor a perform topology function instruction for requesting a configuration change of a topology of a plurality of first processors (CPUs) of a first configuration, the execution comprising:
any one of initiating a change of the polarization of the processors of the guest configuration according to the requested polarization change or rejecting the polarization change request; and
setting a condition code, by the processor, to a value indicating whether a polarization change is initiated or rejected, wherein the condition code of an instruction set architecture provides branching criterion for branch on condition instructions.
11. The system according to claim 10, wherein the first processors are guest processors and the first configuration is a guest configuration in a logically partitioned computer system, the method further comprising:
fetching, by a processor of the guest configuration, the perform topology function instruction defined for a computer architecture, wherein the perform topology function instruction comprises an opcode field and a register field for requesting a polarization change; and
wherein the execution further comprises, based on the rejecting the polarization change request, setting the condition code to a second value indicating said change request is rejected.
12. The system according to claim 11, wherein the executing the perform topology function instruction further comprises:
obtaining from a function code (FC) field of a register specified by the register field, a function code field value, the function code field value consisting of any one of a horizontal polarization specifier, a vertical polarization specifier or a check of a topology change specifier;
based on the FC field value specifying a horizontal polarization, initiating horizontal polarization of the processors of the guest configuration;
based on the FC field value specifying a vertical polarization, initiating vertical polarization of the processors of the guest configuration; and
based on the requested topology change being rejected, setting a reason code (RC) value in a register specified by said register field.
13. The system according to claim 11, further comprising obtaining from a function code field of a register specified by the register field, a function code field value, wherein an initiated polarization request is asynchronous to completion of a previously executed perform topology function instruction; and

based on the function code field value specifying a check of a topology change, checking the completion status of the topology change.
14. The system according to claim 12, wherein horizontal polarization comprises providing substantially equal host processor resource to each processor resource, wherein vertical polarization comprises providing substantially more host processor resource to at least one processor of said processors than to at least another processor of said processors.
15. The system according to claim 12, wherein the perform topology function instruction defined for the computer architecture is fetched and executed by a central processing unit of an alternate computer architecture,
wherein the method further comprises interpreting the perform topology function instruction to identify a predetermined software routine for emulating the operation of the perform topology function instruction; and
wherein executing the perform topology function instruction comprises executing the predetermined software routine to perform steps of the method for executing the perform topology function instruction.
16. A computer implemented method, the method comprising:
executing, by a processor a perform topology function instruction for requesting a configuration change of a topology of a plurality of first processors (CPUs) of a first configuration, the execution comprising:
any one of initiating a change of the polarization of the processors of the guest configuration according to the requested polarization change or rejecting the polarization change request; and
setting a condition code, by the processor, to a value indicating whether a polarization change is initiated or rejected, wherein the condition code of an instruction set architecture provides branching criterion for branch on condition instructions.
17. The method according to claim 16, wherein the first processors are guest processors and the first configuration is a guest configuration in a logically partitioned computer system, the method further comprising:
fetching, by a processor of the guest configuration, the perform topology function instruction defined for a computer architecture, wherein the perform topology function instruction comprises an opcode field and a register field for requesting a polarization change; and
wherein the execution further comprises, based on the rejecting the polarization change request, setting the condition code to a second value indicating said change request is rejected.
18. The method according to claim 17, wherein the executing the perform topology function instruction further comprises:
obtaining from a function code (FC) field of a register specified by the register field, a function code field value, the function code field value consisting of any one of a horizontal polarization specifier, a vertical polarization specifier or a check of a topology change specifier;
based on the FC field value specifying a horizontal polarization, initiating horizontal polarization of the processors of the guest configuration;
based on the FC field value specifying a vertical polarization, initiating vertical polarization of the processors of the guest configuration; and
based on the requested topology change being rejected, setting a reason code (RC) value in a register specified by said register field.
19. The method according to claim 18, further comprising obtaining from a function code field of a register specified by the register field, a function code field value, wherein an initiated polarization request is asynchronous to completion of a previously executed perform topology function instruction; and
based on the function code field value specifying a check of a topology change, checking the completion status of the topology change.
20. The method according to claim 18, wherein the perform topology function instruction defined for the computer architecture is fetched and executed by a central processing unit of an alternate computer architecture,
wherein the method further comprises interpreting the perform topology function instruction to identify a predetermined software routine for emulating the operation of the perform topology function instruction; and
wherein executing the perform topology function instruction comprises executing the predetermined software routine to perform steps of the method for executing the perform topology function instruction.