SYSTEM PERFORMANCE OPTIMIZATION R.J. Bednarz

SYSTEM PERFORMANCE OPTIMIZATION R.J. Bednarz CYFRONET, Swierk, Poland 0. Introduction 1. Hardware Overview The System Performance Optimization In this Chapter I would like to has become and important and difficult present the examples of hardware used for field for large scientific computer centres. the large scale scientific computations Important because the centres must satisfy /batch and interactive/. The examples will increasing user demands at the lowest be taken from CERN and INR computer centres, possible cost. Difficult because the which use IBM and CDC equipment. It is well System Performance Optimization requires known, that scientific computations are a deep understanding of hardware, software almost monopolized by these two and workload. The optimization is a manufacturers especially in the field of dynamic process depending on the changes physics. Assuming the elementary knowledge in hardware configuration, current level of computer structure, I will concentrate of the operating system and user generated on the hardware aspects, which are crucial workload. With the increasing complication for performance and programming. Since the of the computer system and software, the IBM and CDC computers differ very much in field for the optimization manoeuvres the organization, it is necessary to broadens. describe them separately. In the three hour lecture it is of course difficult to cover all aspects of 1.1. IBM Hardware the System Performance Optimization. First of all it was necessary to talk in the Figure i shows IBM 370/168-3 computer, Chapter i about the hardware of only two which is installed at CERN. The central manufacturers IBM and CDC. Chapter 2 processing unit is equiped with 3 megabytes contains the description of four IBM and memory, byte and block multiplexer channels, two operating system. The description controllers and devices. Up to 16 concentrates on the organization of the controllers can be linked to a shared operating systems, the job scheduling and channel, and therefore the reconfiguration I/O handling. The performance definitions, of the controllers linked to block workload specification and tools for the multiplexer channels can be easily done system stimulation are presented in the manually. The 3333 and 3350 disk storage Chapter 3. Chapter 4 is devoted to the can be accessed from two independent description of the measurement tools for storage controllers. Such feature is the System Performance Optimization. called dual access, and it causes a In this Chapter I am going to present complicated routing of disk requests software, hardware and hybrid monitors. through the system. The results of the measurement and various The central processing units of methods used for the operating system higher models from IBM 370 line include tuning will be discussed in the Chapter 5. a buffer storage. The buffer storage can Unfortunately it was not possible to sharply reduce the time required for cover during the lectures theoretical fetching currently used sections of main aspects related to the System Performance storage. On the Model 165, for example, Optimization. Therefore the author intends the CPU can obtain eight bytes from the to cover the subject of the computer buffer in two cycles /1G0 nanoseconds/, models and simulation in the separate and a request can be initiated every cycle. publication. This compares with 18 cycles /1440 nano- - 144 - seconds/ required to obtain eight bytes First, DAT obtains the address of the directly from main storage. On average, appropriate segment table from a system the high-speed buffer storage operates control register. To this segment table to make the effective system storage cycle address, DAT adds the segment address bits time one-third to one-quarter of the to obtain the segment table entry. Next actual main-storage cycle time. Buffer DAT obtains the page table address from operation is handled entirely by hardware the segment table entry and adds the page and it is transparent to the programmer, address bits to it in order to obtain the who doesn*t need to adhere to any page table entry. Finally, DAT forms the particular program structure in order to 24 bit real storage address by appending achieve close-to-optimum use of the buffer. the displacement to the page frame address. Very important hardware feature of To reduce the amount of time required for IBM 370 is the dynamic address translation address translation, DAT retains up to 128 /DAT/. This feature is essential for previously translated addresses in a virtual storage operating systems. The CPU translation lookaside buffer /TBL/. Prior can operate with their virtual storage to performing a translation using segment features disabled /Basic Control mode/ or and page tables, DAT searches the TBL for enabled /Extended Control mode/. For ease required address. in storage management, virtual storage, A program interruption occurs during real storage, and direct access storage address translation, if DAT attempts to used to contain virtual contents, are translate a virtual storage address to devided into contiguous fixed-length a real storage address and the required sections of equal size. Virtual storage is page is not in real storage. This devided into 64K-byte segments. A maximum interruption, called a page fault, alerts virtual storage of 16, 777, 216 bytes, the control program that the page must be therefore contains 256 segments. Each loaded from external page storage into a segment of virtual storage is devided into page frame of real storage. The transfer of 4K - byte pages. A page frame is a 4K-byte a page into real storage is a page-in. The block of real storage, that can obtain one page-in process is shown in Figure 3. First, page at a time. An equivalent of frame on when a needed page is not in real storage the direct access storage is called a slot. /indicated by a bit in page table entry/, In a virtual storage system, storage management automatically goes to a mechanism is required to associate the corresponding entry in a external page virtual storage addresses of data and table. The external page table entry gives instruction with their actual location in the slot location for page. real storage. This function is performed Next, storage management selects by DAT. To translate the addresses, DAT a frame in real storage to hold the uses tables in real storage. These tables, required page. To do so, it refers to the which are maintained by the control page frame table, which indicates, which program, are the segment table and the frames are allocated. Storage management page tables. One segment table and finds an available frame and brings in the a corresponding set of page tables exist required page from its slot in external for each address space in the system page storage. To complete the page-in /see Figure 2/. There is one page table process, storage management updates the for each segment in the address space. appropriate page frame table entry and The page table indicates which pages are page table entry. currently in real storage and the real In order to keep a supply of frames storage location of those pages. DAT available for page-in, the control program translates the virtual storage addresses removes pages from real storage that have contained in a instruction during execution not been recently referenced. Prior to of the instruction. removing a page from a frame, the control - 145 - program determines whether the page million instruction per second, 96K central contents were modified during processing. memory /60 bit words/ and fourteen If so, storage management performs a peripheral processors with 4K memory page-out. Otherwise an exact copy of the /12 bit words/. The peripheral equipment page already exists in external page may be attached to 24 channels with the storage. A page-out copies the modified maximum transfer rate 2 millions page from its real storage to a slot. The characters per second. Two types of disks slot need not be the one that contains the are used at INR. old version of the page. Storage The characteristics of the disk are the management need only update the external following : page table entry to designate the new slot. Disk CD 841 CD 844 At the end of this section I would Number of units 7 2 like to qoute some characteristics of the Unit capacity in Model 165. The Central Processing Unit millions of characters 36 118 has Basic Machine Time of SO nanoseconds. Transfer rate in Storage Cycle Time is 2 microseconds with thousands of characters 179 461 the 8 byte Storage Access Width and Access Time /miliseconds/: four-way interleaving. High-Speed Buffer Maximum 135 55 Storage can have 8192 bytes or 16384 bytes. Average 75 30 Block Multiplexor Channels are Minimum 25 S buffered to a width of 16 bytes for Average Rotational communication with storage. The maximum Latency 12.5 8.3 data rates for Block Multiplexor Channels Dual access is provided for CD 841 disks. is 3 million bytes a second. Five Low Speed Batch Terminals and Byte Multiplexor Channels are not buffered seven TTY are connected to Local and they are used for slow peripheral Communications Controller through modems. devices. Finally, the 3300 disk has the Four PDP-11/45 minicomputers are linked average access time 30 ms, the average to 6671 Multiplexer. Slow peripheral latency 8.4 ms and the data transfer rate devices include two printers, Card Reader 806 Kbyte per seconds. and Punch Paper Tape Reader/Punch, and Plotter. The devices form two separate 1.2. CDC Hardware lines linked to channels by means of controllers. Four 659 Magnetic Tape Units In this section I will describe the /9 track/ and one 657 Magnetic Tape Unit architecture of CD 6000, CD 7000, CYBER 70 /7 track/ are served by one controller and CYBER 170 computers.

SYSTEM PERFORMANCE OPTIMIZATION R.J. Bednarz

1.Operating Systems Overview

••••It•• G981 &911651, Elateferyl

Mi!!Lxlosalamos SCIENTIFIC LABORATORY

1. Introduction

NWG/RFC# 752 MRC 2-Jan-79 01:22 Nnnnn a Universal Host Table

Multics Extended I Mail System . User's Guide

Langley Vedit for Nos/Ve Usage Manual

Final Report and Recommendation on New

The UNIX Time- Sharing System

Sistemas Operacionais

Tops-10 Monitor Calls Manual, Vol. 1

Lecture 14 Data Level Parallelism (2) EEC 171 Parallel Architectures John Owens UC Davis Credits • © John Owens / UC Davis 2007–9