
NBS SPECIAL PUBLICATION 401

U.S. DEPARTMENT OF COMMERCE / National Bureau of Standards

Computer Performance Evaluation

NATIONAL BUREAU OF STANDARDS

The National Bureau of Standards was established by an act of Congress March 3, 1901.

The Bureau's overall goal is to strengthen and advance the Nation's science and technology and facilitate their effective application for public benefit. To this end, the Bureau conducts research and provides: (1) a basis for the Nation's physical measurement system, (2) scientific and technological services for industry and government, (3) a technical basis for equity in trade, and (4) technical services to promote public safety. The Bureau consists of the Institute for Basic Standards, the Institute for Materials Research, the Institute for Applied Technology, the Institute for Computer Sciences and Technology, and the Office for Information Programs.

THE INSTITUTE FOR BASIC STANDARDS provides the central basis within the United States of a complete and consistent system of physical measurement; coordinates that system with measurement systems of other nations; and furnishes essential services leading to accurate and uniform physical measurements throughout the Nation's scientific community, industry, and commerce. The Institute consists of a Center for Radiation Research, an Office of Measurement Services and the following divisions: Applied Mathematics — Electricity — Mechanics — Heat — Optical Physics — Nuclear Sciences — Applied Radiation — Quantum Electronics — Electromagnetics — Time and Frequency — Laboratory Astrophysics — Cryogenics.

THE INSTITUTE FOR MATERIALS RESEARCH conducts materials research leading to improved methods of measurement, standards, and data on the properties of well-characterized materials needed by industry, commerce, educational institutions, and Government; provides advisory and research services to other Government agencies; and develops, produces, and distributes standard reference materials. The Institute consists of the Office of Standard Reference Materials and the following divisions: Analytical Chemistry — Polymers — Metallurgy — Inorganic Materials — Reactor Radiation — Physical Chemistry.

THE INSTITUTE FOR APPLIED TECHNOLOGY provides technical services to promote the use of available technology and to facilitate technological innovation in industry and Government; cooperates with public and private organizations leading to the development of technological standards (including mandatory safety standards), codes and methods of test; and provides technical advice and services to Government agencies upon request. The Institute consists of a Center for Building Technology and the following divisions and offices: Engineering and Product Standards — Weights and Measures — Invention and Innovation — Product Evaluation Technology — Electronic Technology — Technical Analysis — Measurement Engineering — Structures, Materials, and Life Safety * — Building Environment * — Technical Evaluation and Application * — Fire Technology.

THE INSTITUTE FOR COMPUTER SCIENCES AND TECHNOLOGY conducts research and provides technical services designed to aid Government agencies in improving cost effectiveness in the conduct of their programs through the selection, acquisition, and effective utilization of automatic data processing equipment; and serves as the principal focus within the executive branch for the development of Federal standards for automatic data processing equipment, techniques, and computer languages. The Institute consists of the following divisions: Computer Services — Systems and Software — Computer Systems Engineering — Information Technology.

THE OFFICE FOR INFORMATION PROGRAMS promotes optimum dissemination and accessibility of scientific information generated within NBS and other agencies of the Federal Government; promotes the development of the National Standard Reference Data System and a system of information analysis centers dealing with the broader aspects of the National Measurement System; provides appropriate services to ensure that the NBS staff has optimum accessibility to the scientific information of the world. The Office consists of the following organizational units:

Office of Standard Reference Data — Office of Information Activities — Office of Technical Publications — Library — Office of International Relations.

Headquarters and Laboratories at Gaithersburg, Maryland, unless otherwise noted; mailing address Washington, D.C. 20234. Part of the Center for Radiation Research. Located at Boulder, Colorado 80302. * Part of the Center for Building Technology.

Computer Performance Evaluation


Proceedings of the Eighth Meeting of Computer Performance Evaluation Users Group [CPEUG]

Sponsored by United States Army Computer Systems Command and

Institute for Computer Sciences and Technology National Bureau of Standards Washington, D.C. 20234

Edited by

Dr. Harold Joseph Highland

State University Agricultural and Technical College at Farmingdale, New York 11735

U.S. DEPARTMENT OF COMMERCE, Frederick B. Dent, Secretary

NATIONAL BUREAU OF STANDARDS, Richard W. Roberts, Director

Issued September 1974

Library of Congress Cataloging in Publication Data

Computer Performance Evaluation Users Group. Computer performance evaluation.

(NBS Special Publication 401) "CODEN: XNBSAV." Sponsored by the U. S. Army Computer Systems Command and the Institute for Computer Sciences and Technology, National Bureau of Standards. Supt. of Docs. No.: C 13.10:401. 1. Electronic digital computers — Evaluation — Congresses. I. Highland, Harold Joseph, ed. II. United States. Army. Computer Systems Command. III. United States. Institute for Computer Sciences and Technology. IV. Title. V. Series: United States. National Bureau of Standards. Special Publication 401.

QC100.U57 No. 401 [QA76.5] 389'.08s [001.6'4] 74-13113

National Bureau of Standards Special Publication 401

Nat. Bur. Stand. (U.S.), Spec. Publ. 401, 155 pages (Sept. 1974) CODEN: XNBSAV

U.S. GOVERNMENT PRINTING OFFICE WASHINGTON: 1974

For sale by the Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402 (Order by SD Catalog No. C13.10:401). Price $1.80

FOREWORD

As part of the Federal Information Processing Standards Program, the Institute for Computer Sciences and Technology of the National Bureau of Standards sponsors the Federal Information Processing Standards Coordinating and Advisory Committee (FIPSCAC) and a series of Federal Information Processing Standards (FIPS) Task Groups. The Task Groups are composed of representatives of Federal departments and agencies, and in some cases of representatives from the private sector as well. Their purpose is to advise NBS on specific subjects relating to Information Processing Standards.

One of the Task Groups is FIPS Task Group 10 - Computer Component and Systems Performance Evaluation. Among FIPS Task Group 10's other responsibilities is to sponsor a self-governing Computer Performance Evaluation Users Group (CPEUG), whose purpose is to disseminate improved techniques in performance evaluation through liaison among vendors and Federal ADPE users, to provide a forum for performance evaluation experiences and proposed applications, and to encourage improvements and standardization in the tools and techniques of computer performance evaluation.

With this volume, the proceedings of the CPEUG are for the first time being made available in a form that is readily accessible not only to Federal agencies but to the general public as well. This is in recognition of the fact that computer performance evaluation is important not only for Federal ADP installations but also for those of the private sector.

It is expected, therefore, that this volume will be useful in improving ADP management through the use of performance evaluation in both the private and public sectors.

Acknowledgement is made of the assistance of Dr. Joseph O. Harrison, Jr., Chairman of the FIPSCAC, and Captain Michael F. Morris, USAF, Chairman of FIPS Task Group 10, in promoting and producing this volume.

Ruth M. Davis, Ph.D., Director Institute for Computer Sciences and Technology National Bureau of Standards U. S. Department of Commerce


ABSTRACT

The Eighth Meeting of the Computer Performance Evaluation Users Group (CPEUG), sponsored by the United States Army Computer Systems Command and the National Bureau of Standards, was held December 4-7, 1973, at NBS, Gaithersburg. The program chairman for this meeting was Merton J. Batchelder of the U.S. Army Computer Systems Command at Fort Belvoir, Virginia 22060 (CSCS-ATA Stop H-14).

About 150 attendees at this meeting heard the 17 papers presented on computer performance, evaluation and measurement. Among the papers presented were those dealing with hardware and software monitors, workload definition and benchmarking, a report of FIPS Task Group 13, computer scheduling and evaluation in time-sharing as well as MVT environments, human factors in performance analysis, dollar effectiveness in evaluation, simulation techniques in hardware allocation, and a FEDSIM status report, as well as other related topics.

These proceedings represent a major source in the limited literature on computer performance, evaluation and measurement.

Key words: Computer evaluation; computer performance; computer scheduling; hardware monitors; simulation of computer systems; software monitors; systems design and evaluation; time-sharing systems evaluation

PREFACE

The evolution and rapid growth of computer performance, evaluation and measurement has been the result of an amalgam of developments in the computer field, namely:

• the growing complexity of modern digital computer systems;

• the refinement of both multiprogramming and multiprocessing capabilities of an increasing number of computer systems;

• the concomitant expansion and complexity of systems programming;

• the accelerated trend toward the establishment of computer networks;

• the myriad of applications programs in business, technical, scientific, industrial and military fields; and

• the emphasis on massive data bases for information retrieval, data analysis, and administrative management.

Prior to proceeding, it is essential to define several basic terms so that the examination is not made in a vacuous environment.

Performance, according to the dictionary, is defined as "an adherence to fulfill, to give a rendition of, to carry out an action or a pattern of behavior."

Evaluation is defined as a "determination or the fixing of a value, to estimate or appraise, to state an approximate cost."

Measurement is defined as a "process of measuring, the fixing of suitable bounds, the regulation by a standard."

Computer performance, evaluation and measurement is now vital to the designer, the user and the management-owner of a modern computer system. To some, computer performance, evaluation and measurement is a tool, a marriage of abstract thought and logic combined with the techniques of statistical and quantitative methods. To others, it is a technique with very heavy reliance on modeling and simulation and simultaneously involves features of both classical experimentation and formal analysis. The problem of exact specification is made the more difficult by the recent birth and development of computer performance, evaluation and measurement as a discipline within computer science.

Among the early practitioners of this discipline were the members of CPEUG [Computer Performance Evaluation Users Group], individuals from many United States Governmental agencies involved in various phases of this field. At about the same time, there were a number of academicians as well as analysts from business and industry working in this area, and this gave rise to the formation within the Association for Computing Machinery of SIGME [Special Interest Group in Measurement and Evaluation], which is currently known as SIGMETRICS.

In its formative period of growth, computer performance, evaluation and measurement has relied heavily upon modeling and simulation. It has been concerned with the design of computer systems as well as the analysis of the behavior of these systems. The analyst, on the one hand, has used simulation to secure detailed information about a system that he has created, or about which his knowledge is limited. On the other hand, the analyst has used simulation to test various hypotheses about the system in an effort to improve its performance. It is a quixotic hope that as this discipline grows it will become more interdisciplinary and involve the work not only of modelers, simulators and statisticians, but also behavioral scientists, economic analysts and management scientists.

Computer performance, evaluation and measurement is replete with benefits for the computer community - the manufacturer of modern electronic processing equipment, the user of that equipment, especially the manager responsible for the complete operations, as well as the purchasers of this equipment. It is capable of providing many urgently needed answers to problems faced today, such as:

• procurement - the evaluation of various proposals for different systems configurations;

• planning - determination of the effects of projected future workloads upon an existing or planned system;

• costing - detailed information of cost data to users of an existing or planned computer system or network;

• scheduling - the allocation of systems resources to the various demands on the system;

• designing - how best to design a new or modify an existing system, and how to improve the operation of that system;

• optimization - determination of a systems mode to enhance the performance of a portion or of the entire system.

Within this volume are the technical papers presented at the Eighth Meeting of the Computer Performance Evaluation Users Group. The Meeting was sponsored by both the United States Army Computer Systems Command and the Institute for Computer Sciences and Technology of the National Bureau of Standards, and was held December 4th - 7th, 1973 at Gaithersburg, Maryland. Added to these technical papers are other papers which were presented at an earlier meeting of CPEUG but never published.

Dr. Harold Joseph Highland, Editor

Chairman, ACM SIGSIM [Special Interest Group in Modeling and Simulation of the Association for Computing Machinery]

Chairman, Data Processing Department of the State University Agricultural and Technical College at Farmingdale (New York)

June 1974

TABLE OF CONTENTS

* PREFACE Dr. Harold Joseph Highland, Editor New York State University College at Farmingdale v

* COMMENTS OF CHAIRMAN, CPEUG John A. Blue Department of Navy, ADPESO X

* COMMENTS OF PROGRAM CHAIRMAN

Merton J. Batchelder U. S. Army Computer Systems Command xi

* KEYNOTE ADDRESS Brigadier General R. L. Harris

Management Information Systems, U. S. Army 1

* INITIATING COMPUTER PERFORMANCE EVALUATION

GETTING STARTED IN COMPUTER PERFORMANCE EVALUATION

Philip J. Kiviat and Michael F. Morris Federal Computer Performance Evaluation and Simulation Center .... 5

A METHODOLOGY FOR PERFORMANCE MEASUREMENT Daniel M. Venese The MITRE Corporation 15

* ACCOUNTING PACKAGES

USE OF SMF DATA FOR PERFORMANCE ANALYSIS AND RESOURCE ACCOUNTING ON IBM LARGE-SCALE COMPUTERS Robert E. Betz Boeing Computer Services, Inc 23

USING SMF AND TFLOW FOR PERFORMANCE ENHANCEMENT James M. Graves U. S. Army Management Systems Support Agency 33

* SOFTWARE MONITORS

USACSC SOFTWARE COMPUTER SYSTEM PERFORMANCE MONITOR: SHERLOC Philip Balcom and Gary Cranson U. S. Army Computer Systems Command 37

* BENCHMARKS

BENCHMARK EVALUATION OF OPERATING SYSTEM SOFTWARE: EXPERIMENTS ON IBM'S VS/2 SYSTEM Bruce A. Ketchledge Bell Telephone Laboratories 45

REPORT ON FIPS TASK GROUP 13 WORKLOAD DEFINITION AND BENCHMARKING David W. Lambert The MITRE Corporation 49

* HARDWARE MONITORS

PERFORMANCE MEASUREMENT AT USACSC Richard Castle U. S. Army Computer Systems Command 55

A COMPUTER DESIGN FOR MEASUREMENT — THE MONITOR REGISTER CONCEPT Donald R. Deese Federal Computer Performance Evaluation & Simulation Center 63

* SIMULATION

THE USE OF SIMULATION IN THE SOLUTION OF HARDWARE ALLOCATION PROBLEMS W. Andrew Hesser USMC Headquarters, U. S. Marine Corps 73

* RELEVANT CONSIDERATIONS

HUMAN FACTORS IN COMPUTER PERFORMANCE ANALYSES A. C. (Toni) Shetler The Rand Corporation 81

DOLLAR EFFECTIVENESS EVALUATION OF COMPUTING SYSTEMS

Leo J. Cohen Performance Development Corporation 85

* SPECIFIC EXPERIENCE

COMPUTER SCHEDULING IN AN MVT ENVIRONMENT Daniel A. Verbois U. S. Army Materiel Command 99

PERFORMANCE EVALUATION OF TIME SHARING SYSTEMS T. W. Potter Bell Telephone Laboratories 107

A CASE STUDY IN MONITORING THE CDC 6700 - A MULTI- PROGRAMMING, MULTI-PROCESSING, MULTI-MODE SYSTEM Dennis M. Conti Naval Weapons Laboratory 115

* FEDERAL COMPUTER PERFORMANCE EVALUATION AND SIMULATION CENTER

FEDSIM STATUS REPORT Michael F. Morris and Philip J. Kiviat Federal Computer Performance Evaluation & Simulation Center 119

* PREVIOUS MEETING: SPECIAL PAPERS

DATA ANALYSIS TECHNIQUES APPLIED TO PERFORMANCE MEASUREMENT DATA G. P. Learmonth Naval Postgraduate School 123

A SIMULATION MODEL OF AN AUTODIN AUTOMATIC SWITCHING CENTER COMMUNICATIONS DATA PROCESSOR LCDR Robert B. McManus Naval Postgraduate School 127

* ATTENDEES: EIGHTH MEETING OF COMPUTER PERFORMANCE EVALUATION USERS GROUP 137

COMMENTS OF CHAIRMAN

COMPUTER PERFORMANCE EVALUATION USERS GROUP

This publication provides papers presented at the eighth meeting of the Computer Performance Evaluation Users Group held at the National Bureau of Standards, sponsored by the United States Army Computer Systems Command and the Institute for Computer Sciences and Technology of the National Bureau of Standards; and some papers presented at the meeting held in March 1973 at Monterey, California, and sponsored by the Naval Postgraduate School.

As the now past chairman of CPEUG, I am very pleased at the qualities of these papers. They provide valuable information on a wide spectrum of computer performance evaluation techniques and tools. The authors deserve a great deal of credit for the time and effort they have expended on the preparation and presentation of their papers. I thank them very much for making the meetings that I have chaired so successful. I know that with that kind of support, future meetings will be even more interesting and informative. I look forward to having continued contact with all members of CPEUG, and I wish the new officers all the best.

John A. Blue

COMMENTS OF PROGRAM CHAIRMAN

EIGHTH MEETING OF

COMPUTER PERFORMANCE EVALUATION USERS GROUP

I wish to express my appreciation to the members of government, industry and the academic community who contributed their time and resources to make this a highly successful conference.

Brigadier General Richard L. Harris, Director of Management Information Systems for the Army, set the tone of the conference with his keynote address. Authors of the 16 technical papers shared the benefits and problems of their endeavors with Computer Performance Evaluation tools and techniques.

The conference technical program began with an overview of items for consideration when initializing a computer performance evaluation program. This was followed by applications of the CPE tools in order of increasing complexity. Philip Kiviat, Technical Director of the Federal Computer Performance Evaluation and Simulation Center, concluded with a summary of contributions by FEDSIM.

The conference scheduling of a presentation each hour allowed time for questions and participation from the audience. Ample coffee breaks also permitted time to pursue specific CPE problems and discussions and stimulate new ideas.

This exchange of performance evaluation experiences should have widespread effect on our ADP community. Each agency is striving to improve its application of ADP systems. Interchange of experience should continue through contacts with others met at the conference.

Good luck, and let us hear from you.

Mert Batchelder

COMPUTER PERFORMANCE EVALUATION USERS GROUP

EIGHTH MEETING HELD 4-7 DECEMBER 1973

AT NATIONAL BUREAU OF STANDARDS

GAITHERSBURG, MARYLAND

Program Committee

Mr. Merton J. Batchelder, Chairman Mr. John A. Blue MAJ Richard B. Ensign Mr. William J. Letendre Mr. William C. Slater

CPEUG Officials 1971 - 1973

Mr. John A. Blue, Chairman

Mr. William J. Letendre, Vice Chairman MAJ Richard B. Ensign, Secretary

CPEUG Officials 1974 - 1975

Mr. William J. Letendre, Chairman MAJ Richard B. Ensign, Vice Chairman Mr. Jules B. DuPeza, Secretary

KEYNOTE ADDRESS

Brigadier General R. L. Harris

Director, Management Information Systems, US Army

Good Morning Ladies and Gentlemen:

I am pleased to have a chance to welcome you to this meeting of the Computer Performance Evaluation Users Group. I believe such meetings are most beneficial to all users of ADP through the sharing of information on mutual problems and because they provide us the chance to leave our desks and, for a little while, to look at the bigger problems on the horizon. Since becoming the Army's Director of Management Information Systems in July, I have had an opportunity to review some of the history and current philosophy which underlies the Army's approach to the management of ADP. I have also discussed with my staff the future trends and areas for exploration over the next five to ten years. I would like to share some of these thoughts with you this morning.

The Army has had a long association with the development and use of digital computers. Initially we used these machines to assist us in processing complex scientific computations and, more recently, our attention has been focused on automating the bread and butter functions: logistics, personnel administration, and financial management. The Army's first interest in digital computers dates back to 1942 when we commissioned the University of Pennsylvania's Moore School of Electrical Engineering to design and produce the electronic numerical integrator and computer (the ENIAC). The installation of the ENIAC in 1947 in the Ballistic Research Labs at Aberdeen, Maryland marked the beginning of the widespread use of electronic computing machines within the Federal Government and the civilian economy as well. The ENIAC was closely followed by the installation of our second computer, the EDVAC, or Electronic Discrete Variable Automatic Computer, operational at BRL since 1949. These initial computers were used primarily for solution of ballistic equations, fire control problems, data reduction, and related scientific problems. From these beginnings, the Army expanded its use of both unique and off-the-shelf computers to other scientific and administrative areas, so that by the beginning of the sixties, the Army had over 100 computers in its inventory.

During the sixties, the Army expanded its use of ADP at a rate of almost fifteen percent annually. This expansion was caused by two major factors--the conflict in Southeast Asia and the increasing complexity of our weaponry and the personnel skills and logistic base necessary to support them. This complexity was the price we had to pay for the greater move, shoot, and communicate equipment of the Army in the field. More importantly, the attendant requirement for faster, more responsive ways of supporting this combat power became quickly apparent.

We have, however, been conscious of the need to economize and have made significant gains in our efforts to stem this spiral of ever-increasing ADP costs. Like most of you, we have been helped along by Congress, of course. However, in spite of these efforts, our resource commitment is still significant. We are currently expending more than three hundred and eighty million dollars and nineteen thousand manyears on this effort, while operating almost 950 computers of all shapes and sizes. This commitment to automation has been so pervasive that there is probably no manager at any level within the Army whose decisions are not influenced by the products of our automated systems.

To continue to provide the qualitative support required by today's managers in an era of decreasing resource availability and increasing constraints on flexibility, we all must become better at managing the resources we do have. Our increased management efforts in the Army will fall in three major areas: better planning, better management of systems development, and, lastly, increased productivity in software.

From an organization viewpoint, we shall continue our commitment to the centralization of software development through increased emphasis on the development of standard systems in more functional areas. The benefits which the Army has gained since the establishment of its central design agencies have permitted us to reduce the total number of personnel in ADP by almost one-third while at the same time increasing the qualitative support to managers at our major operating activities. Combined with our centralization efforts is our continuing search for better methods of defining functional requirements. Part of this effort has been to place squarely in the hands of the functional user the responsibility for determining these requirements and for preparation of the supporting economic justification for undertaking the software development effort.

Better software development is predicated upon better planning methods which can help us to allocate our planned resources to meet new user requirements. Along these lines, we believe that we can no longer sell automation by using subjective arguments. We all recognize that the day of automation for automation's sake has passed. We must now support our requirements with quantitative economic and benefit analyses, as well as workload projections. Use of both of these tools is predicated upon an adequate means of data gathering for predictive purposes. In line with this, one of the current major efforts of my office is to improve our dollar and utilization reporting systems to provide the data we now feel is necessary to help us make these critical resource allocation decisions. One such effort which we have just completed is a hardware requirements projection model called SCORE. This model enables us to project needed upgrades in computer capability dependent upon increasing workload over time.

Turning now to the question of software productivity, it is this area in which we can most benefit from the discussions which will take place over the next four days. Increased productivity will not come from increased or better management alone. Technology will assist us and we in the Army intend to lean on the technologists. In order to do this, however, we will have to increase our management of R&D expenditures in information science. One of the investments we are making is in the area of performance monitors. We are excited about the possibilities inherent in performance monitors, and have been since we first started using them in 1970. We are currently decreasing our operating costs through the use of performance monitors at our central design agencies. Details of these efforts are included in two presentations by the Computer Systems Command. We also have two monitors at our Computer Systems Support and Evaluation Command (our ADPE evaluation and selection command). CSSEC is using these monitors in an ombudsman role to help determine the right hardware configuration to match user processing requirements. We are using the monitors in a number of ways -- no different, I'm sure, than how they are being used by others. We have found the use of these monitors most effective as aids in ADPE configuration balancing and the optimization of data base design.

We are particularly interested in their potential in software optimization. The trend in software development, including systems software, is toward potential use of a higher order language. (UNIVAC, for example, for their new, but unannounced, 1100 system are building most of their compilers and operating system in PL-1). In this environment, monitors are a prerequisite in achieving effectiveness; that is, finding the critical code and optimizing it. We are attempting to build our software in a similar manner. We are still subjected to the argument that software effectiveness can only be achieved through use of an assembly language. I believe that management, ever sensitive to total life cycle costs, will find that the higher level language approach, in conjunction with an effective monitor, will prove the more economic solution for most systems.

I certainly cannot be classified as a computer expert, but it is clear to me that the automation field in this next decade will undergo considerable change. There will certainly be greater emphasis on the computer-communications interrelationship and continued moves toward distributive processing with hierarchies of processing power. Revolutionary changes are on the horizon in software. We all recognize the accelerating cost imbalance between hardware and software. Steps must be taken to redress this imbalance. Software development must move from its present cottage industry stage if it is to continue to exist. The engineers, armed with microprogramming techniques and a well established manufacturing discipline, are a force to be reckoned with. The general purpose off-the-shelf computer as we know it may very well give way in the eighties to microprogrammed special purpose devices capable of solving well defined problems at minimal cost.

What does the software industry need to meet this challenge? From my perspective, it appears to me that the list includes:

a. A simple management-oriented language for describing automation requirements--a tool which can be placed in the hands of the computer neophyte to enable him to bridge the functional analyst-computer analyst communication gap.

b. Means of increasing the productivity of our programmers. Many attempts have been undertaken to achieve this goal. Monitors used in conjunction with a procedure-oriented language will help. The team programmer concept, structured programming, the machine tool approach to software development and automatic programming are all attempts which have been undertaken to increase programming productivity.

c. Programming is estimated to represent only fifteen percent of the system life cycle time. Approximately fifty percent of a programmer's time is spent trying to find logical (and illogical) errors. Very little attention is now being focused on the validation problem, and yet, it is precisely here where our major resources are expended. This subject area demands our professional attention.

d. Improved control systems for management use are required. A prerequisite to management control is adequate planning. Planning, in turn, requires refined estimating techniques. The computer industry has been embarrassingly deficient in its failure to develop a system to help management estimate system costs. The DOD community has made minimal attempts in this area with little success to show for our money. This area, too, demands much more of our managerial attention.

The challenges I have outlined are substantial, but the potential return is great. We have no alternative but to approach them -- if necessary -- one at a time.

This symposium is certainly a step in that direction, and I urge all of you to keep these objectives in mind as you participate in these discussions.

Thank you.

GETTING STARTED IN COMPUTER PERFORMANCE EVALUATION

Philip J. Kiviat and Michael F. Morris

FEDSIM

1. WHEN TO START A PROGRAM

A. EXPENDITURE LEVEL DECISION

When the cost of a computer installation reaches some threshold value, a CPE program may be examined as an investment problem. The installation costs include all computer and peripheral equipment; supplies such as cards, tapes and paper; all computer operations personnel; and any systems programmers that are used solely to keep the computer system operating. The value of the system's products must be established. Where no price is already associated with existing products, a break-even price per customer-ordered piece of output must be calculated and assumed as the start-up value. Once a system's cost exceeds its value, or value could be higher if the system could produce more salable products, CPE is needed. When value exceeds cost, but cost is high, CPE aimed at cost reduction is in order. As long as no value is associated with the system's products, there will be no real cost-reduction motivation to attempt to improve a system's performance (except, perhaps, as an academic exercise).

B. PROBLEM DRIVEN DECISION

More typical of the reason most CPE efforts are begun: another set of applications are to be added to an already-busy computer system; projected workload growth was an underestimate; the director of operations says he needs another or a larger computer; etc. Someone must look deeply and carefully into the way that existing resources are being applied to meet existing demands. Then the demands and resources must be aligned so that room for growth is available. A simple example may be seen that really made a substantial difference in work performed at no real change in products produced: A system of 14 batch programs that had been operational on various equipment for nearly 16 years was modelled using a simulation package. This system ran an average of 4 hours and 30 minutes and interfaced with several other systems so that neither inputs nor outputs could be changed. Several proposed changes were tested on the simulator. Two very simple changes were found that made substantial differences in the system's performance: first, the programs were allowed to run simultaneously (multiprogram) wherever possible; and, second, records stored on disk were blocked to the average track size. The average system run time went down to 2 hours 5 minutes giving better than a 50% reduction each time the system is run. Total CPE project time was about six manweeks; five in the simulation phase and one to alter and test the programs. No magic or profound intellect was involved in this effort, just a thorough examination of the demands and resources.
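The expenditure-level test and the run-time figures above are simple arithmetic, and a short sketch can make the decision rule concrete. The following Python fragment is a hypothetical illustration only: the dollar figures, the high-cost threshold, and the function names are invented for the example and do not come from the paper.

# Hypothetical sketch of the expenditure-level decision rule described above;
# all dollar figures and the threshold are invented for illustration.

def installation_cost(equipment, supplies, operations_staff, dedicated_sysprogs):
    """Sum the cost elements the paper lists: computer and peripheral equipment,
    supplies (cards, tapes, paper), operations personnel, and systems
    programmers kept solely to run the system."""
    return equipment + supplies + operations_staff + dedicated_sysprogs

def cpe_decision(cost, value, high_cost_threshold):
    """Apply the decision rule: cost above value calls for CPE; value above
    cost, with cost still high, calls for cost-reduction CPE."""
    if cost > value:
        return "CPE is needed: cost exceeds the value of the system's products"
    if cost > high_cost_threshold:
        return "CPE aimed at cost reduction is in order"
    return "No cost-based motivation for CPE yet"

cost = installation_cost(equipment=600_000, supplies=40_000,
                         operations_staff=250_000, dedicated_sysprogs=110_000)
print(cpe_decision(cost, value=900_000, high_cost_threshold=750_000))

# The 14-program example: 4 hours 30 minutes reduced to 2 hours 5 minutes is
# indeed better than a 50% reduction, as stated.
before, after = 4 * 60 + 30, 2 * 60 + 5
print(f"Run-time reduction: {100 * (before - after) / before:.1f}%")   # about 53.7%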
C. PRELIMINARY ANALYSES: SETTING GOALS, DEFINING APPLICATIONS

One of the worst ways to start a CPE effort is to purchase a hardware monitor and assign a person to use it to "do good things". Many starts have been made just like this and few, if any, have succeeded. The monitors are stored away, collecting only dust. It is much more reasonable if specific systems are identified for analysis with modest goals set such as -- decrease the time that is used by one old program by 5%; or, document the flow of jobs from the input desk to the output window and reduce the total turnaround time for one large job by 5%. No sophisticated tools are needed and any standard time study text will provide guidelines for the test. What these types of projects allow a manager to accomplish are, first, operations personnel become aware of a systematic improvement effort; and, second, recommendations will be made regarding information that is needed, but very difficult to obtain. Setting only one or two simple improvement goals causes unknowns to become visible. These unknowns may then be categorized into application areas for solution by specific CPE tools or techniques. The usual result of these initial efforts is establishment of a systematic analysis of the accounting package information and some physical rearrange-


ment of computer support room layout. Once Next, and at times most important, is the analysis effort is underway, specific a person who can explain and convince. needs for additional CPE capability will Perhaps an "evangelist" would come closer become apparent. This approach is really to describing this person than "salesman", quite conservative, and it may take six but "evangelist" would be a misleading out- months or more before the effort begins line heading. (That is, no divine guidance to produce useful results. But, once is necessary for CPE success--it does help, underway, very valuable results will be though.) This person is particularly im- produced portant at the beginning and end of each CPE project. The idea of performing each

2 . SELECTING TEAM MEMBERS project usually needs to be "sold" all the way down to the lowest involved working The work "team" is used continuously level. In fact, to be of lasting worth, to stress the point that a lone individual it is more important that these working seldom succeeds for very long in a CPE level people be convinced that CPE is effort. This is not to imply that more worthwhile than that higher management than one full-time person is always neces- be so convinced. Further, a sales type sary, but rather that at least three dif- is essential when a project is completed ferent types of individuals need to be (to make certain that those responsible involved and the background that leads for authorizing implementation of team to CPE successes is seldom found in any recommendations are thoroughly convinced one person. to do so) . Unless changes are made, the CPE effort will be wasted. A. NECESSARY BACKGROUND As a CPE team must produce useful work Implementing improvements is often in a timely manner, a worker is a must. This more difficult than identifying improve- is the one individual who should be full ments. Nearly all significant improve- time in the CPE field. He need not be ments require either programming skill or particularly well educated or intelligent a background in systems engineering. but he must not be lazy. He will probably Since systems programmers usually evolve be both the easiest one to find and the from applications programmers, system men least expensive to pay. This is the team are ideal CPE specialists. A systems member who should have the strong system programmer with a field engineering his- programming background as he will be the tory is even better. Education in a one that will sift through reams of output scientific discipline usually helps a and determine feasible changes in programs. person to examine situations systematic- ally. A scientific education nearly al- It should by now be clear that good ways exposes an individual to fundamentals CPE teams aren't easy to find. This is of mathematics and statistics. These true. In two cases of large CPE activities, traits--systematic thinking and knowledge about 100 detailed resumes of people who of math and statistics —are also neces- think they are qualified are examined to sary for a CPE team. An additional feature find each CPE specialist who turns out to that adds immeasurably to a CPE team's be good. Most often, it's faster to offer effectiveness is an academically diversi- the duty to someone in your shop that has fied background. Reasoning here is like the background and is of the type described in any other developing field —the broader earlier (or maybe to steal someone who the investigator's background, the more is already a success - but this expensive) likely that parallels will be seen in other fields where similar problems have C. IMPORTANCE OF ATTITUDE already been solved. It is often true that very good B. TYPES OF PEOPLE systems programmers are extremely reluct- ant to change anything that they didn't Considerable amounts of nearly think of first (or, at least to change it original or, at least, innovative thought the. way someone else has) . This is a is a major part of any disciplined investi- very poor attitude to bring into a CPE gation. (And the CPE team will indeed team. The best team members are open- appear to be investigators if they are minded and eager to examne suggested changes properly placed in the organizational from whatever source. Be careful that an structure.) 
Picking a person and making individual who is too conservative in his him think as a routine part of a task is work doesn't defeat the purpose of your a nearly impossible approach. It is far CPE team. It is an imaginative and innova- better to pick a reaearch type who takes tive field. every assignment as a challenge to his thought process. (If no researchers are D. IN-HOUSE OR CONSULTANTS available, pick a lazy person with the experience and education mentioned earlier.) If time is available to decide to : go into CPE based on return on investment report to the corporate-level controller. (as opposed to an immediate operational This placement insures that recommendations problem) it is certainly more beneficial that are accepted will be implemented. to develop the CPE capability in-house. Also, it will encourage the team to view If the CPE need is immediate and if you their activity in a cost-benefit light. have no capability in-house and can't hire The more cost conscious the CPE team, the a proven performer as a regular employee, more likely that sound recommendations will then the only alternative is to use a be made. Unless the data processing center consultant. There are certainly cases is extremely large, the CPE team should not where a return-on-investment decision report to the manager of data processing. requires that the CPE investment be less (Extremely large: more than three separate than the cost of an established in-house and different computer systems and monthly effort. Such cases could be handled by least cost— or equivalent--of more than consultants (although these aren't the $100,000.) If there is a good argument for jobs consultants try very hard to get...). using consultants continuously rather than As a general rule, solve immediate one- in-house personnel, it is in this reporting time problems with consultants. Solve authority region: A CPE team is really a immediate, but general problems with a working level activity. It is not custom- consulting contract that requires the ary to have this type activity reporting at consultant to train your in-house CPE such a high level. However, it is customary team. Solve continuing, non-urgent for consultants and outside auditors to problems totally in-house. The suggestion circulate at the working level and report here is no different for a CPE team than at very high management levels. Except in it would be for any other effort — the cases of very small companies (the president best way is in-house. acting as data processing manager) , company size has little to do with the CPE team's 3. ORGANIZATIONAL PLACEMENT OF THE CPE organizational placement. It is more a TEAM matter of scaling the CPE effort to match the computer effort, which is generally Let's assume here that computer system dependent on company size. expenditures are large enough to justify a continuing 1 1/2 - 2 manyear CPE invest- C . CHARTER ment per year on an in-house basis. (In the consultant case, the following could This is the CPE team's license to be put in terms of "Who should the operate. When duties to identify improve- consultant report to?"). ments in computer system operations are assigned to a group outside the direct A. LINE VERSUS STAFF control of the computer center manager

(as recommended earlier) , it is mandatory A CPE team performs a staff that clear notice of the team's operational function: it investigates existing arena be given. This notice should be methods, finds and recommends alternative formal (written) and signed by the highest methods, and outlines ways of implementing possible corporate officer. Such a notice methods. Although the team must be able will be a charter or license to become to implement any recommendation, it is involved in even the most routine aspects usually a misallocation of scarce talent of providing computer support. The charter for the team to do this. Usually, and need not be long or particularly specific. particularly in very large data processing General direction is usually better because environments, the CPE team will periodi- it's never known ahead of time exactly which cally be challenged (generally by a areas will produce the most plentiful middle-manager in applications programming savings. The ideas regarding system bottle- surroundings) to do the implementation job necks and workload characteristics that themselves. About every 8 to 12 months, grow in the computer center without benefit this should be done. In the process of of CPE tools, are almost always wrong. implementing its own recommendations, the (Explain: scientific vs. business mix, team not only gains operational credibility, printer-bound, etc.) A charter should it also has the opportunity to sharpen and include, as a minimum, the following update the team's own programming (or statements systems design) skills. This keeps the CPE team from becoming a group of former "No new programs or systems of experts. And this is important. programs and no substantial changes to existing programs are to be implemented B. REPORTING AUTHORITY without first having the design or change reviewed by the CPE team. If in the CPE Organizations are so very differ- team's opinion it is necessary, the design ent that the only way of describing or change must be simulated to predict its placement, in general, is in relative impact on the existing workload and to test terms. The team should, at the very any reasonable design alternatives before lowest, report to the same person that the committing resources to programming and data processing or computer center manager implementing the system or change. reports to. Ideally, the CPE team should


"Before ordering new equipment to devices or tools as are available; (4) replace, to add to, or to enhance existing develop and use simulation and modeling equipment's speed or capacity, the CPE techniques; and (5) document activities and team must be called upon to measure the recommendations for use by management. levels of activity and contentions of the These may be elaborated as required to fit system or portions of the system that each personnel department's peculiar needs. would be affected by the new equipment." But these five basic areas are necessary to insure that the CPE team has the minimal This type of charter will allow capability to perform useful projects. the team to examine both existing systems and proposed changes before new workloads B. RESPONSIBILITIES or equipment acquisitions are committed. Determining the impact of such changes A CPE team just as any other group requires a thorough knowledge of existing of employees, must be expected to produce workloads and systems. The benefit from results that are worth more than they cost. this initial learning phase for the CPE A detailed listing of responsibilities and team will be significant: it is almost procedures will be given in a moment, in always the first time that a review of all classical management terms. At this point, facets of the computer center's activity it is enough to say that the team is will have been attempted by knowledgeable responsible for "paying its own way." In and impartial persons. This phase is, in the early start-up phase, it isn't reason- itself, sound justification for establish- able to demand this of the team. But, by ing a CPE team. the end of its first year of existence, a CPE team that hasn't documented savings D. CONTROLS equal to or greater than its costs should be dissolved. By itself, this is a heavy Since a good CPE team that is responsibility to meet. properly placed in the organization will represent a high level of technical C . PROCEDURES competence and have the endorsement of upper management, there will be a tendency The day-to-day operating procedures to react quickly to the team's sugges- of the CPE team are essentially the same as tions. It is, therefore, important that for an audit group or, in times past, for rather strict controls be established to a time-study effort: define a problem, insure that the team follows a total examine existing methods, postulate changes, project plan (more about this later) to test change, recommend best changes, and its conclusion before discussing possible oversee implementation of accepted recom- recommendations with those who will be mendations. Written procedures are important implementing the changes. Otherwise, especially for groups that visit detached there may be changes well underway before sites all aspects of the changes are examined. And these may turn out to be unacceptable D. RECOMMENDING CHANGES changes when everything is considered. It is quite easy to avoid this situation As pointed out earlier, changes by adding a statement to the "charter" should be recommended to management, not that establishes a need for written at the working level. The recommendations directive by some specified manager should only reach the working level in (above the CPE team) before any recommen- clear-cut, directive form. dation is to be implemented. Whenever possible, management should insist that E. FORMAL REPORTS the CPE team provide at least two recommended approaches to the solution of These will be the only real any problem. 
And the approach should be (tangible) product of a properly run CPE selected by a manager above the CPE team team. The changes (except in the rare or within the area that will be most cases mentioned already) will be made by affected by the recommended change. other programmers and engineers. Only by carefully and completely documenting each

4 . PROJECT ADMINISTRATION project can the CPE team progress in an orderly manner. A format for project A. JOB DESCRIPTIONS reports that has proven useful in several environments is that of the classical The formal writeups that cover "staff study": Purpose, Scope, Procedure, CPE team members must mention the need to Findings, Conclusions, and Recommendations (1) work with documentation describing the — preceded by a short summary of major logical and physical flow of signals findings and recommendations. Hopefully, through the system and its components; the entire report, but at least the (2) perform measurement analyses and cod- summary and recommendations, should be ing changes of applications and control completely "jargon" free. (The best programs; (3) install or connect such CPE report is worthless if only the authors can understand it.)

8 . . . .

F. PROJECT MANAGEMENT 6. Insuring that schedules will be met. It has been implied several times that the CPE team is most effective if it 7. Establishing quality operates on a project basis. CPE projects control procedures. tend to be ad hoc in nature: each one has a single purpose. The following remarks 8. Thoroughly understanding the are therefore framed in this project kinds of management abilities that will be orientation. In classical management required in a project. terms the CPE project manager's job is to: 9. Determining the quantity of — PLAN: Determine requirements of management required in a project. the project in terms of people, facilities, equipment, etc.; estimate the project's 10. Determining what must be duration and relationships between work- provided by all parties involved, e.g., load of the CPE team and the project's sutomer, other suppliers. deliverables 11. Knowing what is needed to —ORGANIZE: Request and insure "solve" the problem. availability of the necessary facilities, equipment, personnel, material, and (most 12. Understanding the customer's important) authority. problem and translating it into a solvable CPE problem. —DIRECT: Set time and cost milestones and bounds for what must be 13. Insuring that a clear, done; make all operating, project-specific consistent, and appropriate plan for each decisions study is produced.

—CONTROL: Measure performance 14. Knowing how the results of against plan and take any necessary the study will fill the customer's needs. actions to correct malperf ormance. 15. Insuring that the planned —COORDINATE: Insure that all approach to the project is logical and involved activities are aware of, and realizable receptive to the project's efforts, goals, and recommendations. 16. Understanding the details of the approach. To make certain that the full impact of this project administration 17. Insuring that all essential section has been made clear, a few minutes tasks are included. will be spent on an exhaustive listing of the responsibilities of a CPE project 18. Insuring that no unnecessary manager. First, it is pointed out that if tasks are included. the CPE team is large, or if team members have specific skills that not all team 19. Knowing the output from members have, each project or study may each task. have a different project manager. Each project manager is responsible for the 20. Judging whether or not each technical content and conduct of the study. task is the best way to achieve the output. His ultimate responsibility is twofold: to the managers that receive the team's 21. Knowing what resources are recommendations and to the individual or required for each task. group who will implement the team's recommendations. For lack of a better 22. Periodically reviewing the term, these two groups are referred to as adequacy of the skills, quantity of the "customer". The individual CPE project personnel, facilities, equipment, and manager is responsible for: information.

1. Insuring that he can 23. Insuring that commitments detect malperf ormance. of all resources made to the project are honored 2. Establishing measures to prevent malperformance. This is a rather long list for each CPE project but the team's tasks are complex and 3. Assuring that he can detailed enumeration of the individual correct malperformance. project manager's responsibilities is important. What, then, is left for the 4. Insuring that all ideas leader of the CPE team to do if he is not will be explored and exploited. always the project leader? — Even a longer list of non-technical, management tasks. 5. Exercising positive cost control.

9 .

The CPE team leader acts to: -Strive for a minimal, consistent, level of detail, 1. Manage objectives of each adding detail only on evidence CPE study. that it is required.

-Does the customer know what 6. Control costs. he wants? -Establish control procedures. -Does the customer have the authority to perform its -Guard against inefficient use tasks? of people and equipment.

-Are all the results the -Obtain CPE resources best suited customer requests needed? to a project at a cost that is consistent with project size -Is the customer adequately and importance. motivated to support CPE efforts over the long haul? 7. Manage validation phase.

2. Schedule the project tasks. -Expose all assumptions,

-Use a documented, quantita- -Determine the "level of noise" tive (graphic) method to inherent in different methods. schedule people, equipment, facilities, tasks. -Establish procedures for debugging and testing. -Act to acquire resources when needed. -Specify nominal outputs in advance of experimentation. -Establish procedures to monitor and control progress. -Insure sensitivity analyses are performed. 3. Select and organize personnel. -Specify levels of confidence -Determine when outside required to demonstrate support is required. success of a method.

-When necessary, establish a -Insist a validation plan be project management structure. developed in advance of its need. -Motivate personnel to the goals of the project. 8. Manage documentation effort.

-Distribute workload to make -Insure that all assumptions the best use of present and limitations are specified. talents and promote cross training by on-the-job -State the purpose and use of experience. each recommendation.

4. Manage the scope of each study. -Describe each recommendation clearly in narrative form. -Determine whether a study should be specific to a -Establish a project notebook project or made more general. that is a chronological log of all project decisions, -Determine essential components actions, trip reports, etc. and boundaries of the study. -Insure that a good channel of -Insure all boundary effect communication exists between assumptions are considered CPE team members and their and stated explicitly in customers project reports. 9. Manage experimentation effort. 5. Control the level of detail for each study. -Develop an experiment plan that is consistent with project -Derive an evolutionary objectives before starting a development plan that permits new type of project. continual assessment and redirection. 10 .

-Establish validity of statistical analysis procedures when they are applied.
-Insure analyses are consistent and correct.
-Insure summaries and presentations are clear, illuminating and concise.
-Insure experiments produce required results.

5. ACQUIRING TOOLS AND TECHNIQUES

(Most of the rest of the outlined topics will be covered in detail by other presentations. They are mentioned here, briefly, simply to introduce them as important topics that require concern if a viable CPE effort is to be established at any installation.)

A. FREE TOOLS - ACCOUNTING PACKAGES

Before any special CPE tools are purchased or techniques are learned, the output of the system accounting package should be studied in detail and thoroughly understood. A tremendous amount of information that can lead to improved computer performance is contained in accounting data. In almost every case, it is a "free" good. That is, it is collected as a routine task. If nothing is now being done to study your facility's capacity, eliminate the accounting package. But if you are seriously considering the establishment of a CPE activity, you'll have to reinstall this package at least periodically. Nevertheless, if you don't use its output now, stop generating it.

B. COMMERCIALLY AVAILABLE TOOLS

A list of all CPE suppliers is given in the following table. There are three general product categories:

-Accounting package reduction programs, referred to in the table as "Other".
-Monitors, broken into hardware and software monitors.
-Simulation, which is in the form of general purpose CPE packages, specialized simulation languages, and simulation studies.

C. SURVEYING THE FIELD

Several individuals have compared and contrasted the various CPE tools that are now available. Such surveys should be examined before commitments are made to any one CPE supplier. User's groups are the best sources of this survey information. Figure 8-10 lists several CPE product suppliers as an indication of the broad availability of such products.

D. LEARNING TECHNIQUES

All CPE techniques require some training before they can be used. In order of least to most training, these are:

-Accounting system analysis
-Software monitors
-Hardware monitors
-Simulation languages
-Simulation packages

While CPE techniques can be learned by self-study, classroom training is advised. On-the-job training is imperative, as classroom courses can teach only technique, and not guidelines or standards for its application in the field.

E. PROCURING TOOLS

As mentioned earlier, it is not wise to begin a CPE effort by purchasing CPE equipment. First, use the available accounting data, then acquire a software monitor or accounting data reduction package (lease one, don't buy, as it will soon be unused in most shops), then develop a simulation capability, and finally, when the CPE team justifies it for a specific project, acquire a hardware monitor. This procurement of tools should span at least one year and probably two or three.

6. A FLOW CHART FOR CPE DECISIONS

The chart shown in Figure 8-17 describes many situations where CPE may be useful. It's certainly not exhaustive, but it covers enough situations to give a new CPE team an ambitious start. Its use can help an organization to decide where the application of CPE may produce worthwhile results.

CONTINUING THE EVALUATION ACTIVITY

Once established, the CPE team loops through a series of continuing activities:

A. Examine and select new CPE tools

1. To keep up with developments.

2. To interchange information with CPE teams

B. Determine CPE measurement frequencies with respect to:

1. The characteristics of each installation

a. Capital investment
b. Workload stability
c. Personnel

2. The aging of systems


3. Planned workload changes

a. New systems
b. New users
c. New user mix, e.g., RJE, interactive, batch

C. Maintain a close watch over the distribution of computer activity by device and by program to:

1. Relate equipment costs to level of use

2. Determine trade-offs possible between cost and performance

3. "Tune" the total system as a routine task

a. Data set allocation
b. Selection of programs for multi-programming
c. Operating system and applications

D. Develop a CPE data base

1. CPE information collected by one tool will usually be used in conjunction with that collected by another tool.

2. Historical data can be used to predict trends in workload, usage, etc.

3. As new analysis tools and techniques are developed, data from well-understood situations is needed to test their discriminant and predictive power.

4. A common data format and structure is needed to compare similar installations for normative and management purposes.

5. Customized interfaces can be built between the data base and the CPE tools used to standardize and simplify tool usage and add additional features, e.g., editing.

6. "Trigger" reports can be incorporated into data base processing or interface programs to indicate that something is happening in the system that should be studied in detail.

E. Integrate operations and measurement activities

1. Educate operations personnel to be aware of visual cues to system malperformance

2. Use existing and new operations reports

a. Console logs
b. System Incidence Reports
c. "Suggestion Box"

3. Educate operators in use of tools to improve selection of jobs for multiprogramming, mounting of tapes and disk packs upon request, etc.

Operator performance is an overriding factor in the operation of an efficient third-generation computer installation.

F. Establish good relations with the computer system vendor

1. May require assistance for probe point development

2. May require permission to attach a hardware monitor

3. First instinct of maintenance personnel is to resist measurement activities

4. Acquire a knowledge of his activities and the acquaintance of his research staff

G. Kiviat Graphs for Diagnosing Computer Performance (Refer to reprint of "Computerworld" article)

SPECIAL CONSIDERATIONS AND FINANCIAL ASPECTS

A. A minimum investment in CPE tools and technicians will be one manyear per year and an equivalent in tool and training expenditures, computer costs, travel expenses, etc. Assume this is $40,000 based on proration of the following costs experienced at the Federal Computer Performance Evaluation and Simulation Center (FEDSIM):

1. Typical FEDSIM staff salary: $19,700

2. Software monitor: $1-15,000

3. Hardware monitor: $4-100,000

4. Accounting system analysis program: $3-15,000


5. Simulation packages: $12-40,000

B. CPE savings must be at least 5%, probably no more than 20%, on an annualized basis, although many individual projects will show larger returns. For example (see the sketch following this outline):

$40,000/.05 = $800,000 per year - to justify on low expected CPE value;

$40,000/.20 = $200,000 per year - to justify on high expected CPE value.

C. CPE savings are difficult to quantify in many instances, e.g., system design analysis via simulators, configuration studies for system enhancement, measurement of workload to prepare benchmarks for a new RFP, etc.

CPE savings are easy to quantify in other instances, e.g., equipment returned after a reconfiguration prompted by a measurement analysis, computer time saved by system and program tuning, etc. But what is extra computer capacity, improved turnaround or better response worth to the company?

D. Manufacturers will not assist most smaller installations in their CPE efforts, as their goal is to increase these to larger systems. CPE is counter to this goal.

E. There are many small systems using general purpose computers for highly specialized applications. CPE effort in these installations should be spent in stripping all unused or unnecessary parts out of the control program to match it to the special application. In the case of serial, batch systems (or serial, batch equipment) there is little that CPE can do for performance. In these cases, the best available programming techniques are called for.

F. CPE should include:

1. Software package evaluation
2. Software validation and verification
3. Equipment reliability studies
4. Studies of programming techniques
5. Management and organizational effects

But it rarely does. CPE should be viewed from a total systems perspective.
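The break-even arithmetic in item B generalizes to any assumed annual CPE cost and expected savings rate. The sketch below is illustrative only; the $40,000 figure and the 5%-20% range come from the text, everything else is an assumption:

```python
def breakeven_budget(annual_cpe_cost, expected_savings_rate):
    """Annual computing budget above which CPE pays for itself."""
    return annual_cpe_cost / expected_savings_rate

ANNUAL_CPE_COST = 40_000            # one man-year plus tools, per the FEDSIM proration
for rate in (0.05, 0.20):           # low and high expected CPE value
    budget = breakeven_budget(ANNUAL_CPE_COST, rate)
    print(f"at {rate:.0%} expected savings, CPE is justified above ${budget:,.0f} per year")
```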


A METHODOLOGY FOR PERFORMANCE MEASUREMENT

Daniel M. Venese

The MITRE Corporation

I. INTRODUCTION

This paper presents a methodology for performance evaluation. A general framework is developed to orient and direct a performance evaluation task. Within this framework, operating systems can be classified and compared, hypotheses formulated, experiments performed, and cost/benefit criteria applied. This methodology has the following underlying assumptions:

(a) Most systems can be modified or tuned to achieve improved performance.

(b) Although some optimizations will not degrade other areas of system performance (removing a system-wide bottleneck), most performance gains will be the result of compromise and cause degradation in another area of system performance.

(c) Relatively small changes can often lead to dramatic changes in system performance.

(d) Modifications indicated by a performance evaluation study should be cost effective.

(e) In many cases when a performance bottleneck is removed, another will immediately become the limiting factor on performance.

The aim of this methodology is to account for the significant factors that affect a performance evaluation effort. Inadequate performance evaluations result from limited outlooks. Purely technical evaluations concentrate on hardware and software monitors, but ignore the human factors. Other efforts have obtained huge quantities of data without a framework for analysis. In general, two of the most common inadequacies are to take a narrow approach to performance evaluation and to gather data without having a means to analyze it.

II. OUTLINE OF METHODOLOGY

The following outline describes the steps in the performance evaluation methodology:

(a) Understand the system. Classify the subject operating system according to the framework provided in paragraph III.

(b) Survey the environment. Examine and analyze the environment surrounding the computer installation including management attitudes and goals, mode of operation, workload characteristics, and personnel experience.

(c) Evaluate system performance:

(1) examine problem types,
(2) formulate performance evaluation hypothesis,
(3) conduct benefits analysis, and
(4) test performance evaluation hypothesis.

III. OPERATING SYSTEMS FRAMEWORK

In order to conduct an effective performance evaluation, it is necessary to understand the operating system. Although the precise implementation of each operating system differs in many respects, they have many facets in common. The identification and classification of these common functions into a framework for understanding operating systems is an important step in computer performance evaluation (CPE).

This framework for operating systems is not intended to be all inclusive since operating systems can be examined and classified from other meaningful points of view. Examining and classifying from the user point of view is an example of another framework. The framework presented here isolates those portions that are important in CPE and allows various systems to be compared by means of this common classification.

Large-scale third-generation computers typically have more resources than any single program is likely to need. The rationale for an operating system is to manage the utilization of these resources, allowing many programs to be executing at one time. The resource manager concept of operating systems is the product of J. Donovan and S. Madnick of MIT.

In a computing system with finite resources and a demand for resources that periodically exceeds capacity, the resource manager (operating system) makes many policy decisions. Policy decisions are strategies for selecting a course of action from a number of alternatives. There is general agreement that as many relevant factors as possible should be included in policy decisions. For example, the core allocator algorithm might consider such factors as the amount requested, the amount available, the job priority, the estimated run time, other outstanding requests, and the availability of other requested peripherals. Disagreement arises as to how factors should be weighted and the strategies that are most appropriate for the installation workload.

The component of the operating system that decides which jobs will be allowed to compete for the CPU is the job scheduler. The scheduling policy might be first-in, first-out (FIFO) or estimated run time with the smallest jobs scheduled first. The FIFO strategy is one of the simplest to design and implement, but it has many disadvantages. The order of arrival is not necessarily the order in which jobs should be selected for execution, nor can it be assured that a FIFO scheduling policy will provide a balanced job mix and adequate turnaround for priority jobs.
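To make the two scheduling policies concrete, the following sketch is illustrative only: the job names, time estimates, and interface are invented, and a real job scheduler would also weigh priority, core, and peripheral requirements as the text notes.

```python
# Each pending job: (job_name, estimated_run_minutes); values are invented for illustration.
pending = [("PAYROLL", 45), ("COMPILE", 3), ("REPORT", 12), ("SORT", 7)]

def fifo_order(jobs):
    """First-in, first-out: simplest to design and implement."""
    return [name for name, _ in jobs]

def smallest_first_order(jobs):
    """Estimated run time, with the smallest jobs scheduled first."""
    return [name for name, _ in sorted(jobs, key=lambda job: job[1])]

print("FIFO order:          ", fifo_order(pending))            # PAYROLL, COMPILE, REPORT, SORT
print("Smallest-first order:", smallest_first_order(pending))  # COMPILE, SORT, REPORT, PAYROLL
```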

Although general purpose algorithms can provide an acceptable level of service for a wide spectrum of situations, tuning to the needs of the particular DPI can realize significant gains in efficiency and ease of use.

Once a job has been allocated and is eligible for execution, the process scheduler (system dispatcher) decides when jobs gain control of the CPU and how long they maintain control. The process scheduler queue may be maintained in order by priority, by the ratio of I/O to CPU time, or in a number of different ways. The job scheduler decides which jobs will be eligible to compete for CPU time while the process scheduler decides which jobs will receive CPU time. A number of other decision points and important queues are found in operating systems. These include I/O supervisors, output queues, interrupt processing supervisors, and data management supervisors.

Another important facet of operating systems is the non-functional software which carries out the policy decisions and comprises the bulk of the operating system software. The gains to be effected in the areas of policy making are primarily the result of tuning to the particular Data Processing Installation (DPI) while the gains from the non-functional software are primarily the result of improvements in efficiency. For example, the routine that physically allocates core has a number of minor housekeeping decisions to make, and any improvements to be realized in this area will be from an improved chaining procedure or a faster way of searching tables or a similar type of gain in efficiency.

Once this framework for computer systems has been adopted, systems are no longer viewed as a collection of disparate components, but as a collection of resources. The operating system is the manager of these resources and attempts to allocate them in an efficient manner through policy decisions and non-functional software to carry out these decisions. System bottlenecks are not mysterious problems, but are the result of excess demand for a limited resource. By stating performance problems in terms of the relevant resources instead of in terms of the effects of the problem, the areas in need of investigation are clearly delineated.

IV. SURVEY THE ENVIRONMENT

Understanding and classifying the subject operating system is one facet of understanding the total system. Computers do not operate in a vacuum. Their performance is influenced by the attitudes and abilities of the operations personnel, programmers and managers. Although improving the performance of a system by streamlining the administrative procedures is not as dramatic as discovering a previously unknown inefficiency in the operating system, the net gain can be just as worthwhile.

Computer Operations

Parts of the following survey are based on the Rand document, "Computer Performance Analysis: Framework and Initial Phases for a Performance Improvement Effort," by T. E. Bell.

(a) Determine the prime operational goal of the computer center.

(b) Determine which job or group of jobs constitutes a significant part of the installation's workload.

(c) Describe the explicit actions taken by the computer operations to help schedule the computer, e.g., using job priorities to promote a good job mix.

(d) Determine the most frequent causes for computer reruns, e.g., mounting the wrong tape for a production run.

(e) Determine from interviewing computer operations personnel the extent to which the following are done:

(1) scheduling of jobs,
(2) premounting of tapes and disks,
(3) interfacing with users,
(4) card handling,
(5) responding to console commands, and
(6) documenting operational difficulties.

(f) Determine the extent and impact of on-line applications on the computer system.

(g) Determine from interviewing users the adequacy of the following: turnaround on batch jobs, response time for time-sharing and on-line applications, and the availability of the system.

(h) Determine the average number of system failures per day, the average time before the system is available, and the average time between system failures.

(i) Determine a typical working schedule for the computer center and include preventative maintenance, dedicated block time, availability of time-sharing and on-line applications, and time devoted to classified jobs.

(j) Determine the approximate lines of print output per day, number of cards punched, and number of tapes processed.

(k) State how priority users are scheduled.

(l) State the number of special forms required each day.

Computer System

(a) Determine the approximate CPU utilization on a daily basis.

(b) Determine the average number of jobs processed per day.


(c) Determine the average number of jobs being multiprogrammed/multiprocessed.

(d) Construct a diagram of the computer configuration.

(e) Determine the degree of use of on-line (disk) storage as compared to tape for the following:

(1) temporary (scratch),
(2) object programs,
(3) source programs,
(4) short-term storage,
(5) permanent storage, and
(6) scratch storage for sorts.

Job Characteristics

(a) Determine the typical peripheral requirements for the median computer job.

(b) Determine the typical core and CPU requirements for the median computer job.

(c) Determine the percentage of jobs that are production and debug.

(d) Determine the percentage of jobs that are classified or that require special processing.

(e) Determine the percentage of jobs that use a language processor.

(f) Determine the percentage of jobs that are batch, time-sharing, and on-line.

(g) Determine and examine in detail the program or programs that require(s) the most significant percentage of computer resources.

(h) Determine the language and percentage of use for the applications programs.

Measurement Activities

(a) State which performance evaluation tools are used to evaluate system performance.

(b) Determine the current or last performance evaluation project.

(c) State how hardware and software problems are recorded, monitored, and resolved.

(d) State what system modifications have been made to improve performance.

(e) Determine what reports are available on a periodic or as-required basis on system performance.

(f) Determine what performance evaluation tools are available at the installation.

V. PROBLEM TYPES

This section describes the various kinds of problem areas encountered in a CPE effort. The classification of problems as operational, administrative, hardware or software is essentially for the purpose of discussion. Although problems can impact in many areas, corrective action should be taken at the cause instead of treating the effects.

Operations

The area of computer operations is one of the most important to CPE. In reality, computer operations can have a dramatic effect on system performance. The typical net gain of an improvement in software may be 4% to 8% while the actions of computer operations in the areas of scheduling, job control, peripheral operation, etc., can affect performance by 10% to 20%.

There are several measures historically used to gauge the effectiveness and productivity of computer installations. These indicators of performance include the number of jobs processed per day, the average CPU utilization, the average turnaround time, and the system availability. Although these measures apply with varying degrees to other areas, they seem most appropriate for computer operations since they are most affected by computer operations. Since examples can be found of systems delivering overall poor performance that have high CPU utilization or that process many jobs a day, these indicators should be considered in the context of the total installation.

In general, well run computer operations will have the following characteristics in common:

(a) The number of informal or undocumented procedures is at a minimum.

(b) The existing procedures are well publicized to the users, are easy to use, and are kept current.

(c) Whether the DPI is an "open" or "closed" shop, access to the computer room is based primarily on actual need.

(d) Procedures are established to manage and monitor the computer configuration. This might include down time and error rate per device, utilization per device, and projections of system changes based on workload.

(e) The service to users of the DPI meets their needs.

(f) The areas of responsibility for the operators, supervisors, and job control personnel are well defined.


Administrative

Administrative problems are non-technical factors that degrade system performance. Included under this heading are the attitudes and goals of management, the organizational structure of the DPI, the training and education of the data processing staff, and the level of documentation for computer systems. Initial analysis performed on data gathered by monitoring tools will not indicate that administrative problems are limiting performance, but further analysis supplemented by the experience of the analyst will usually narrow the potential list of causes to a few, and through the process of elimination administrative problems can be isolated.

In a system of any kind, and especially in computer systems, each of the components should work harmoniously with others. To some degree, the operator can compensate for inadequate documentation and the job scheduler can provide a reasonable job mix even when not tuned to the DPI workload, but beyond that point the effectiveness and performance of the total system will be degraded. When the operators spend a disproportionate amount of time trying to decipher documentation, other areas will be neglected; and when the job scheduler makes assumptions about the DPI workload that are incorrect, the total throughput will be reduced.

Software

The types of software problems that can occur are of two kinds: system software problems, as in operating systems or access methods, and application software problems, or those that are associated with a particular set of user developed programs. System problems relating to performance are potentially more critical since they can affect all the programs run on the system, while application program problems usually do not have a system-wide impact. If, however, the application program is a real-time application that runs 24 hours a day, then its performance assumes a high degree of importance.

System Software

Third generation computers have general purpose operating systems that can accommodate a wide spectrum of workloads with varying degrees of efficiency. A problem is that inefficiencies in large-scale computers are often transparent to the user. A heightened awareness of tuning techniques can lead to improved performance.

Performance problems associated with system software include:

(a) The suitability of the scheduling algorithm to the DPI workload.

(b) The selection of resident and transient system routines.

(c) The interface of the system to the operator via the computer console.

(d) The tape creation, processing, and label checking routines.

(e) System recovery and restart procedures.

(f) Direct access file management techniques, e.g., file creation, purging, access.

(g) The operation of data management routines.

(h) The suitability of the access routines, e.g., indexed sequential, random, to the DPI workload.

Applications Software

While the tuning of system software has a system-wide benefit, the tuning of applications programs has benefit for that particular application. At first, it seems that the investigation of performance problems for application programs would be an area of low payoff, since tuning the entire system has an inherently greater impact than tuning a subset of the system. The amount of payoff is dependent upon the size of the application program and the amount of resources used. Tuning a system dedicated to a single application is virtually equivalent to tuning the system software. The characteristics of the DPI workload will dictate the selection of application programs for tuning and performance evaluation. Care should be taken not to lose sight of the areas of high payoff while concentrating on areas of intrinsically limited payoff. An example is a system that was tuned in regard to record and block sizes and file placements on disk and still delivered poor performance. What was overlooked was the fact that the system contained 12 sorts. Reducing the number of sorts would have equaled all the other performance gains so painstakingly achieved.

The following list itemizes tuning techniques for application programs:

(a) Tuning programs to the special features and optimization capabilities of language processors.

(b) Attempting to balance the ratio of I/O to CPU.

(c) Adjusting file volumes, record lengths, and block sizes to the peculiarities of the physical media.

(d) Ordering libraries so that the most frequently used members are searched first.

(e) For virtual paged environments the following guidelines are applicable:

(1) Locating data and the instructions that will use it in close proximity to maximize the chance of being on the same page.

(2) Structuring the program logic so that the flow of control is linear and that references are concentrated in the same locus.

(3) Minimizing page breakage by avoiding constants, data, and instructions that overlap pages.

(4) Isolating exception and infrequently used routines on separate pages from the mainline program logic.

(5) Attempting to construct programs around a working set of pages that will most likely be maintained in core by the page replacement algorithm.

(f) Determine the feasibility of allocating scratch, sort, and intermediate files on permanently mounted media.

(g) Key indicators that may indicate inefficiently written application programs are the following:

(1) Unusually high CPU utilization may indicate unnecessary mathematic operations and excessive data conversions.

(2) Imbalanced channel utilization may indicate poor file dispersal.

(3) High seek time may indicate poorly organized files.

Hardware

Indicators that hardware problems exist include: poor overall reliability, failure of key components, frequent backlogs of work, imbalanced channel utilization, low channel overlap, and I/O contention for specific devices. These indicators can be grouped into three general problem types:

(a) misuse of configuration, which implies that the basic hardware is adequate, but that it is improperly structured so that some channels sit idle while others are saturated, or that the availability of key components is low because of poor configuration planning,

(b) inadequate configuration, which implies that the hardware components lack the capacity, speed, or capability to provide a minimum level of service and that it is advisable to procure an expanded or revised configuration, and

(c) excess capacity, which implies that the DPI workload is being processed inefficiently.

Hardware problems are corrected by upgrading, eliminating or reconfiguring specific components. Alternatives exist to hardware changes, such as reducing or smoothing the workload, tuning the system, or adopting a new operating system, but the increasing cost of software versus the decreasing cost of hardware often dictates that the cost effective solution is a hardware change.

VI. FORMULATE HYPOTHESIS OF PROBLEM

Working hypotheses are guideposts to system evaluation. The purpose of the hypothesis is to provide a working solution to a specific problem as related to an understanding of the total system. The hypothesis directs the collection of data that will be used to affirm or deny its validity. Because of the complexity of computer systems, a number of hypotheses will be required to investigate interrelated system problems. Figure 1 is a chart which may be used to relate symptoms to causes and aid in the formulation of relevant hypotheses.

VII. BENEFITS ANALYSIS

This section provides guidelines to be used in determining the benefits to be realized from a performance improvement. In almost every case a performance improvement will come at some cost, whether it is the time and resources necessary to implement it, the degradation that will result in some other area of system performance, or the change in mode of operation. Certain changes will result in such a dramatic gain in system performance that a detailed benefits analysis is not necessary to determine their value, and other changes will produce such a marginal gain in performance that they are obviously not worth the effort. The difficult decisions are the borderline cases where a number of factors must be weighed before a judgement is reached.

The cost to implement a proposed change can be expressed in time and resources, degradation in some other area of system performance, or changes in operating and administrative procedures. Different types of costs can be associated with CPE changes. The operating system efficiency might be improved by recoding a frequently used system module, in which case the cost would be the one-time expense to replace the module. A change in job log-in procedures might improve the ability to trace and account for jobs, and this would be a recurring cost (since it would be done each time a job was submitted). In general, one-time or non-recurring costs are usually effected in the areas of computer system hardware and software while recurring costs are usually effected in administrative and operating procedures.

VIII. TEST HYPOTHESIS

In a structured approach starting with an understanding of the total system, an analysis of the problem types, and a well formulated hypothesis, the amount and type of data needed to test the hypothesis should be limited and relatively easy to obtain. Data to test the hypothesis can be derived from personal observations and accounting data or explicit performance monitoring tools such as hardware and software monitors, simulations, analytical models, or synthetic programs.

The hypothesis is intended to be no more than a tentative working solution to the problem and should be reformulated as necessary when additional data is collected. To test a specific hypothesis, data should be collected and experiments performed in a controlled environment.

Even when the data gathered tends to support the validity of the performance evaluation hypothesis, the net gain in performance may be surprisingly small or even negligible. The reason for this is that a second restraint on performance became the primary limiting factor once the most obvious was removed. The performance analyst should be aware that there are many limitations on performance and as one bottleneck is eliminated another will become evident.


[Figure 1 is a matrix relating observed symptoms (high CPU use, low CPU use (wait), low CPU/channel overlap, large seek time, high channel use, low channel use, low channel overlap, channel imbalance, low multiprogramming, inadequate response) to the numbered causes listed below; the matrix entries are not legible in this reproduction.]

CAUSES

1. IMBALANCED SCHEDULING
2. POOR FILE PLACEMENT
3. INEFFICIENT CODING (APPLICATIONS)
4. INEFFICIENT OPS
5. EXCESSIVE PROGRAM LOADING
6. INEFFICIENT DATA BLOCKING
7. EXCESSIVE
8. POOR DEVICE/CHANNEL PLACEMENT
9. CPU BOUND WORKLOAD
10. I/O BOUND WORKLOAD
11. LIMITED CORE CAPACITY (INADEQUATE)
12. LIMITED CPU CAPACITY (INADEQUATE)
13. LIMITED I/O CAPACITY (INADEQUATE)

FIGURE 1
RELATING SYMPTOMS TO CAUSES

If, for example, the primary impediment to multiprogramming is the scheduling algorithm, then its redesign may achieve a significant gain in the multiprogramming factor, but only a small improvement in total performance, because the addition of more jobs causes a key channel to become saturated and the I/O wait time to become excessive. A total approach will not stop with discovering the most obvious limitations to performance, but will delve deeply into the system, postulate a number of limitations on performance, and approach them in a concerted manner.

BIBLIOGRAPHY

ACM-COMMUNICATIONS

1. A Method for Comparing the Internal Operating Speeds of Computers, E. Raichelson and G. Collins, (May 64, vol. 7, no. 5, pp. 309-310).

2. Relative Effects of Central Processor and Input-Output Speeds Upon Throughput on the Large Computer, Peter White, (December 64, vol. 7, no. 12, pp. 711-714).

3. Determining a Computing Center Environment, R. F. Rosin, (July 65, vol. 8, no. 7, pp. 463-468).

4. Economies of Scale and IBM System/360, M. B. Solomon, Jr., (June 66, vol. 9, no. 6, pp. 435-440). Correction: (February 67, vol. 10, no. 2, p. 93).

5. Systems Performance Evaluation: Survey and Appraisal, Peter Calingaert, (January 67, vol. 10, no. 1, pp. 12-18). Correction: (April 67, vol. 10, no. 4, p. 224).

6. An Experimental Comparison of Time-Sharing and Batch Processing, M. Schatzoff, R. Tsao, and R. Wiig, (May 67, vol. 10, no. 5, pp. 261-265).

7. Further Analysis of a Computing Center Environment, E. S. Walter and V. L. Wallace, (May 67, vol. 10, no. 5, pp. 266-272).

8. An Experimental Model of System/360, Jesse H. Katz, (November 67, vol. 10, no. 11, pp. 694-702).

9. A Methodology for Calculating and Optimizing Real-Time System Performance, S. Stimler and K. A. Brons, (July 68, vol. 11, no. 7, pp. 509-516).

10. On Overcoming High-Priority Paralysis in Multiprogramming Systems: A Case History, David F. Stevens, (August 68, vol. 11, no. 8, pp. 539-541).

11. Some Criteria for Time-Sharing System Performance, Saul Stimler, (January 69, vol. 12, no. 1, pp. 47-53).

12. A Note on Storage Fragmentation and Program Segmentation, B. Randell, (July 69, vol. 12, no. 7, pp. 365-372).

13. The Instrumentation of Multics, Jerome H. Saltzer and John W. Gintell, (August 70, vol. 13, no. 8, pp. 495-500).

14. Comparative Analysis of Disk Scheduling Policies, T. Teorey, (March 72, pp. 177-184).

15. The Role of Computer System Models in Performance Evaluation, Stephen R. Kimbleton, (July 72, vol. 15, no. 7, pp. 586-590).

ACM-COMPUTING SURVEYS

16. Performance Evaluation and Monitoring, Henry C. Lucas, Jr., (September 71, pp. 79-91).

ACM-JOURNAL

17. Analysis of Drum I/O Queue Under Scheduled Operation in a Paged System, E. G. Coffman, (January 69, pp. 73-99).

ACM-PROCEEDINGS

18. Application Benchmarks: The Key to Meaningful Computer Evaluations, E. O. Joslin, (20th National Conference 65, pp. 27-37).

19. The Program Monitor - A Device for Program Performance Measurement, C. T. Apple, (20th National Conference 65, pp. 66-75).

20. Hardware Measurement Device for IBM System/360 Time-Sharing Evaluation, Franklin D. Schulman, (22nd National Conference 67, pp. 103-109).

21. A Hardware Instrumentation Approach to Evaluation of a Large Scale System, D. J. Roek and W. C. Emerson, (24th National Conference 69, pp. 351-367).

ACM-SIGOPS

22. Workshop on System Performance Evaluation, (April 71, Harvard University).

AFIPS-CONFERENCE PROCEEDINGS: SPARTAN BOOKS, NEW YORK

23. Cost-Value Technique for Evaluation of Computer System Proposals, E. O. Joslin, (1964 SJCC, vol. 25, pp. 367-381).

24. System Aspect: System/360 Model 92, C. Conti, (1964 FJCC, vol. 26, pt. II, pp. 81-95).

COMPUTER DECISIONS

25. How to Find Bottlenecks in Computer Traffic, (April 70, pp. 44-48).

26. A Pragmatic Approach to Systems Measurement, Sewald, (July 71).


27. Use Measurement Engineering for Better Systems Performance, Philip G. Bookman, Barry A. Brotman, and Kurt L. Schmitt, (April 72, pp. 28-32).

COMPUTERS AND AUTOMATION

28. A Methodology for Computer Selection Studies, O. Williams, (May 63, vol. 12, no. 5, pp. 18-23).

29. Methods of Evaluating Computer Systems Performance, Norman Statland, (February 64, vol. 13, no. 2, pp. 18-23).

30. Computer Analysis and Thruput Evaluation, R. A. Arbuckle, (January 66, vol. 15, no. 1, pp. 12-15, 19).

31. Standardized Problems Measure Computer Performance, John R. Hillegass, (January 66, vol. 15, no. 1, pp. 16-19).

MODERN DATA

32. The Systems Scene: Tuning for Performance, J. Wiener and Thomas DeMarco, (January 70, p. 54).

NATIONAL COMPUTER CONFERENCE

33. Performance Determination - The Selection of Tools, If Any, Thomas E. Bell, (1973, pp. 31-38).

34. A Method of Evaluating Mass Storage Effects on Systems Performance, M. A. Diethelm, (1973, pp. 69-74).

35. Use of the SPASM Software Monitor to Evaluate the Performance of the Burroughs B6700, Jack M. Schwartz and Donald S. Wyner, (1973, pp. 109-120).

36. A Structural Approach to Computer Performance Analysis, P. H. Hughes and G. Moe, (1973, pp. 109-120).

RAND REPORTS

37. Computer Systems Analysis Methodology: Studies in Measuring, Evaluating, and Simulating Computer Systems, B. W. Boehm, (September 70, R-520-NASA).

38. Experience with the Extendable Computer System Simulator, D. W. Kosy, (December 70, R-560-NASA/PR).

39. Computer Performance Analysis: Measurement Objectives and Tools, T. E. Bell, (February 71, R-584-NASA/PR).

40. Computer Performance Analysis: Applications of Accounting Data, R. A. Watson, (May 71, R-573-NASA/PR).

41. Computer Performance Analysis: Framework and Initial Phases for a Performance Improvement Effort, T. E. Bell, B. W. Boehm, R. A. Watson, et al., (November 72, R-549-1-PR).

SHARE

42. Study of OS 360 MVT System's CPU Timing, L. H. Zunich, SHARE Computer Measurement and Evaluation Newsletter, (February 7, 70, no. 3).

43. Computer Measurement and Evaluation Committee Reports, Thomas E. Bell, Chairman, (SHARE XXXIV Proceedings, March 70, vol. 1, pp. 492-547).

44. Systems Measurement - Theory and Practice, K. W. Kolence, (SHARE XXXIV Proceedings, March 70, vol. 1, pp. 510-521).

45. Hardware Versus Software Monitors, T. Y. Johnston, (SHARE XXXIV Proceedings, March 70, vol. 1, pp. 523-547).

SYSTEMS DEVELOPMENT CORPORATION REPORTS

46. Two Approaches for Measuring the Performance of Time-Sharing Systems, Arnold D. Karush, (May 69, SP-3364).

THE COMPUTER BULLETIN

47. A Review and Comparison of Certain Methods of Computer Performance Evaluation, J. A. Smith, (May 68, pp. 13-18).

UNIVERSITY REPORTS

48. Evaluation of Time-Sharing Systems, Georgia Institute of Technology, (GITIS-69-72).

49. Performance Evaluation and Scheduling of Computers, Engineering Summer Conference, (July 1972, University of Michigan).

50. Statistical Computer Performance Evaluation, W. F. Freiberger, (1972, University of Rhode Island).

BOOKS

51. Joslin, E. O., Computer Selection, Addison-Wesley Publishing Co. (1968).


USE OF SMF DATA FOR PERFORMANCE ANALYSIS AND RESOURCE ACCOUNTING ON IBM LARGE-SCALE COMPUTERS

R. E. Betz

Boeing Computer Services, Inc.

INTRODUCTION

The search for ways to measure the performance of computer systems has led to the development of sophisticated hardware and software monitoring techniques. These tools provide visibility of the utilization of resources, such as CPU, channels and devices, and also, with proper analysis, an indication of internal bottlenecks within the computer system. By themselves, however, these utilization values tell a data processing manager very little about his current capacity, how much and what type of additional work can be processed, or whether the configuration can be reduced without suffering throughput degradation. The manager has a further problem in knowing when to apply the monitoring tools, in determining beforehand that the system is operating inefficiently, or in identifying those applications which are causing the problem.

Boeing Computer Services, Inc. (BCS), when faced with the problem of managing many nationwide large scale IBM computers, turned to the use of System Management Facility (SMF) accounting data. Through experience and slight modifications to SMF, a reporting system was developed that quantifies performance and capacity. This reporting system is known as the SARA (System Analysis and Resource Accounting) system and is in use in all computing centers managed by BCS.

This paper briefly describes the SARA system, discusses some of its uses in past studies, and discusses the impact of virtual systems on SMF reporting. The SMF data base is explained in order to put the SARA function in perspective with the other monitoring techniques.

SYSTEM OVERVIEW

Figure 1 is a simple overview of a large scale BCS computer system, OS/MVT with TSO. As application programs are processed they make demands on the supervisor in the form of supervisor calls (SVC's) such as obtain core, allocate devices, execute channel program (EXCP) and set timer. Modules to perform these functions either are already resident in memory and are executed, or are loaded from the system device and executed. Other portions of the supervisor read the job card deck; schedule, initiate, and terminate the job; and print the output.

The capability to perform online dataset editing and online application program execution is provided via the Time Sharing Option (TSO). TSO is a special type of application program which is given the ability to execute some privileged instructions usually restricted to the supervisor.

The System Management Facility is so named because it is a series of system exit points that allow the user to program decisions peculiar to his system. For example, an exit is provided immediately after a JOB card is read. The computing facility may have decided to abort all jobs with invalid account numbers and make a check of the job's account number at this exit point. When an invalid account number is encountered, an appropriate message is written and the job is terminated.

As the application program is processed, SMF data is collected and stored in memory buffers. Wall clock time data is collected on such items as when the job was read in, when the job step was initiated, when device allocation started, and when the job step was terminated. Resource utilization data is collected such as how much memory was allocated, location and type of data sets or files allocated, and number of I/O accesses (EXCP's) made to each data set. When the job step terminates, the SMF data collected on it is output to a file. At the completion of all job steps for a job, another type of record is written to the file to describe the job.
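As a schematic illustration of the record flow just described, a post-processing program can dispatch on the record type number. Real SMF records are binary and far more detailed; the dictionary layout below is a simplified stand-in, and while the type numbers follow the usual OS convention (4 = step termination, 5 = job termination, 6 = output writer), they are used here only for illustration.

```python
# Simplified stand-ins for SMF records; each record carries a type number (0 through 255).
records = [
    {"type": 4, "job": "PAYROLL", "step": 1, "cpu_sec": 12.4, "excps": 830},
    {"type": 4, "job": "PAYROLL", "step": 2, "cpu_sec": 3.1,  "excps": 120},
    {"type": 5, "job": "PAYROLL", "read_in": "08:02", "ended": "08:17"},
    {"type": 6, "job": "PAYROLL", "lines_printed": 4200},
]

def process(record):
    """Dispatch on record type, as an SMF reporting program must."""
    if record["type"] == 4:     # job step termination
        print(f"step  {record['job']}.{record['step']}: "
              f"{record['cpu_sec']} CPU sec, {record['excps']} EXCP's")
    elif record["type"] == 5:   # job termination
        print(f"job   {record['job']}: read in {record['read_in']}, ended {record['ended']}")
    elif record["type"] == 6:   # output writer (printing)
        print(f"print {record['job']}: {record['lines_printed']} lines")

for r in records:
    process(r)
```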


Similar data is collected, causing different record types to be written at the completion of printing or of a TSO terminal session. All types of records are identified by a unique number (0 through 255) contained in the record. Note that the supervisor does not collect data on the functions it must perform in support of the application programs, and thus SMF data is only indicative of resources used for application programs.

FIGURE 1. SYSTEM OVERVIEW, OS/MVT WITH TSO

[Figure 1 is a block diagram showing the CPU, the supervisor (supervisor functions and supervisor modules), application programs 1 and 2 with their job steps, the Time Sharing Option, and the SMF data file.]

MONITORING TECHNIQUES

Software monitors are usually used to intercept supervisor activity (SVC's) for module loading activity, to sample internal tables for various queue lengths/times, and/or to sample devices for device busy time.

Hardware monitors physically attach to the computer system. They usually measure resource utilizations by timing the duration of pulses of the internal circuitry.

The disadvantages of the above monitors are that they tend to be expensive if not needed continually and that it is difficult to correlate the workload impact on the system. When a computer manager wants to know what his present performance is and what additional workload can be processed, it means very little to him that the channels are being used 30% and the CPU 55%.

Accounting data usually includes devices allocated, CPU time, wall clock time of critical events and memory allocated. This type of data can provide information on the work mix and on the resource utilizations of the application programs. The data has the further advantages of (1) continual availability and (2) reporting in terms familiar to the computer manager.

BCS EXTENSIONS TO SMF DATA

BCS began using SMF for performance reporting as soon as the data became available with IBM system releases. As weaknesses of data content became apparent, programs were written by BCS for various SMF exits and modifications were made to internal system code to supplement the data. This additional data included the start time of device allocation and the start time of problem program loading (this data is present in the latest system releases). It also included counts of tape mounts due to end of volumes, counts of disk mounts, roll-out status, TS time in, and the absolute memory address of region allocation.

Two significant additions to the SMF data were a calculation of the job step single stream run time, called Resource Utilization Time (RUT), and a calculation of the proportionate resources used by the job step, called Computer Resource Units (CRU's).

The RUT and CRU concepts were originated to provide repeatable billing to customers. A job can be processed on any BCS computer system and in any workload environment and be billed the same cost. RUT and CRU soon became important as performance indicators, as discussed in the following sections.

THE SARA SYSTEM

The BCS entries of RUT and CRU and the optional SMF extensions allow the SARA system to overcome the limitations of accounting data for system analysis. In addition, during the processing of SMF data, SARA provides the capability of calibrating the data to the computer configuration and workload so that EXCP's can be transformed into device and channel utilizations. The results of SMF data are now calibrated to unique computing environments and can be different for device locations as well as device types. Although the device/channel utilizations are estimated, they help to pinpoint trouble spots much more readily than do counts of EXCP's. The calibration constants of each BCS computer system are recalibrated periodically.
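The calibration idea described above can be sketched as follows. The per-EXCP service times and the device-to-channel assignments are hypothetical calibration constants (the paper does not give its values); the point is only that EXCP counts become estimated utilizations once such constants are applied.

```python
# Hypothetical calibration constants: average seconds of device busy time per EXCP,
# and the channel each device type is attached to.
SERVICE_SEC_PER_EXCP = {"3330": 0.038, "2314": 0.060, "2400": 0.045}
DEVICE_CHANNEL = {"3330": 1, "2314": 2, "2400": 3}

def estimated_utilizations(excp_counts, interval_hours):
    """Transform EXCP counts into estimated device and channel utilizations."""
    interval_sec = interval_hours * 3600.0
    device_util, channel_busy = {}, {}
    for device, excps in excp_counts.items():
        busy_sec = excps * SERVICE_SEC_PER_EXCP[device]
        device_util[device] = busy_sec / interval_sec
        channel = DEVICE_CHANNEL[device]
        channel_busy[channel] = channel_busy.get(channel, 0.0) + busy_sec
    channel_util = {ch: busy / interval_sec for ch, busy in channel_busy.items()}
    return device_util, channel_util

devices, channels = estimated_utilizations(
    {"3330": 250_000, "2314": 40_000, "2400": 60_000}, interval_hours=11.0)
print(devices)    # estimates only, e.g. the 3330's busy about 24% of the interval
print(channels)
```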


During the report program processing, the SMF data is analyzed by the machine states of the multiprogramming environment. The CPU and I/O activities of a job step are averaged over the period from program load time to job step termination. A snapshot of system processing at any time period would show a level of activity, with a number of jobs processing at a specific CPU and I/O activity level with a specific amount of memory and devices allocated. These parameters represent a machine state until one of the job steps stops or another starts. The activities of all machine states are accounted and reported, as illustrated in a later section which shows some of the report types.

Using CRU's, one can calculate CRU's per hour (the CRU's generated per hour of system active time) or the problem program CRU rate (the CRU's generated per hour of problem program run time). These figures quantify the machine capacity required to process the given workload. CRU can thus be used as a figure of merit for computer performance.

Using RUT, one can calculate RUT hours per system active hour (this is termed "throughput" in SARA and is the accumulated RUT of all job steps over an interval of system active time). One can also calculate a job-lengthening-factor by dividing job step RUT into the actual job step run time. These figures indicate internal conflicts caused by poor machine loading or a poorly tuned system, and thus give an indication of multiprogramming efficiency.

SARA also calculates "memory wait time" and "device wait time". Memory wait time is the time from termination of the last job step to start of device allocation for the current job step. (This wait time also includes some device deallocation and initiate time, but long periods indicate a wait for memory.) Device wait time is the time from start of device allocation to start of problem program load.

TYPICAL SARA STUDIES

BCS makes a technical audit of its large scale computers at least once a year. The audit is performed by personnel who are familiar with hardware/software monitors and SARA. The audit establishes the CRU capacity of the machine and its performance in meeting priority demands, identifies bottlenecks, and makes recommendations for system and/or configuration changes.

The following sample reports were taken from audits and are presented not because of major benefits or savings that resulted, but because they demonstrate some of the uses of SMF type data when organized properly.

JOB SUMMARY REPORT

Before proceeding to a system study, it may be well to introduce the type of information provided by SARA. The JOB SUMMARY REPORT, shown in Figure 2, gives an overview of system performance and shows total job resource requirements. Figure 2 summarizes an eleven hour, second shift period of operation for a 370/165, although the report period is selectable and may cover any number of hours, days and/or weeks.

This particular report (Figure 2) came from the main processor of a dual 370/165 configuration driven by a LASP operating system. The 24002 (seven-track tapes) and the 24003 (nine-track tapes) device types are unique to this facility, since they were so named during the calibration input phase of SARA (the device type notation extracted from the operating system by SMF denotes that a tape drive is a 2400 (or 3400) device type and makes no distinction between different types of tape drives).

This system is displaying relatively good performance with good "throughput" and "CRU/hour" values. Two criteria which indicate possible performance improvements are the relatively low "percent CPU" and "job lengthening factor". Memory size does not appear to be a problem per the low value of "memory wait", but "device wait" could be excessive. This system will be studied further in the following examples of SARA reports.

FIGURE 2
SARA JOB SUMMARY REPORT

SARA JOB SUMMARY REPORT FROM 73118 TO 73118, TOTALING 1 DAY, MACHINE=165A

          NO. JOBS    STEPS    MEMORY WAIT    DEVICE WAIT
TOTALS      117        975        2.000          6.194

          CPU HOURS   RUN HOURS   RUT HOURS   CRU
TOTALS      4.945       50.896      35.189     6488.25

          24003 HOURS   24003 NUMBER   24002 HOURS   24002 NUMBER
TOTALS       9.565           499          4.992           306

          3330 HOURS   3330 NUMBER   2314 HOURS   2314 NUMBER
TOTALS      4.967          2778         0.197          103

THROUGHPUT = 3.20
AVERAGE CRU/HOUR = 589.75
LENGTHENING FACTOR OF AVERAGE JOB = 1.45
PERCENT CPU = 44.9
UP TIME = 11.002
TOTAL TIME ACCOUNTED FOR = 11.002
NUMBER OF TAPE MOUNTS = 861, NUMBER OF DISK MOUNTS = 7
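The summary quantities defined above can be computed directly from job-step records once RUT and CRU have been attached to each step. The sketch below uses hypothetical field names and invented values; run time is taken from program load to step termination, as in the text.

```python
# Hypothetical job-step records (times in hours); RUT and CRU assumed already calculated.
steps = [
    {"prev_step_end": 8.00, "alloc_start": 8.05, "load_start": 8.15, "end": 9.00,
     "rut": 0.60, "cru": 95.0},
    {"prev_step_end": 9.00, "alloc_start": 9.02, "load_start": 9.10, "end": 10.40,
     "rut": 0.90, "cru": 210.0},
]
SYSTEM_ACTIVE_HOURS = 11.0

throughput   = sum(s["rut"] for s in steps) / SYSTEM_ACTIVE_HOURS   # RUT hours per active hour
cru_per_hour = sum(s["cru"] for s in steps) / SYSTEM_ACTIVE_HOURS

for s in steps:
    run_time = s["end"] - s["load_start"]
    s["lengthening_factor"] = run_time / s["rut"]               # internal conflict indicator
    s["memory_wait"] = s["alloc_start"] - s["prev_step_end"]    # includes some deallocation time
    s["device_wait"] = s["load_start"] - s["alloc_start"]

print(f"throughput = {throughput:.2f}, CRU/hour = {cru_per_hour:.1f}")
print([round(s["lengthening_factor"], 2) for s in steps])       # e.g. [1.42, 1.44]
```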


RESOURCE ALLOCATION DISTRIBUTION

To effectively manage a configuration, it is necessary to know the utilization of resources and specifically how that utilization is distributed. To configure the proper number of tape drives, one must know how much time the minimum number and subsequent numbers of tape drives are required. The "Resource Allocation Distribution" report, shown in Figure 3, illustrates this distributive data.

As the multiprogramming machine states are analyzed, SARA records the discrete number or amount of resources allocated or used at different machine states. Figure 3 is an example of this type of data for allocated nine-track tape drives. The report period was based on a predominantly scheduled production period when tape drive demand was the highest. The 24003 device type is unique to this facility since it was so named during the calibration input phase. The report shows:

-The discrete increments of the resource (in this example zero to twenty-eight 24003 tape drives)

-The time in hours that exactly 0, 1, 2, ..., 28 tape drives were allocated

-The percent of the total time period that is represented by the allocation time of the discrete increment

-The accumulated percent

-A graphic distribution to the nearest one percent of the discrete increment percentage.

The arithmetic mean number of tape drives allocated is reported, as well as the "number of units or less" that are allocated 95% of the time.

This particular distribution is from the same main processor of a dual 370/165 configuration as was discussed in Figure 2. The total configuration had 84 tape drives: 28 nine-track and 20 seven-track on the main processor and 20 nine-track and 16 seven-track on the support processor. In addition, the configuration had thirty-two 3330 disk spindles and sixteen 2314 disk spindles. Facility management wanted to decrease their dependence on tape and add more 3330 modules, but they were constrained to maintain the same configuration costs with no schedule slides. A logical tradeoff was: How many tape drives can be released to recoup the costs of additional 3330 modules?

The Resource Distribution Reports showed that four 9-track drives (with minimal risk) and no 7-track drives could be released from the main processor. No 9-track drives but four 7-track drives could be released from the support processor. To quantify the risk, the eight drives were "taped off" and restricted from operations use. "Before" and "after" runs of SARA showed no discernible degradation of capacity or throughput and resulted in a trade of tape drives for disk.

These distribution reports are available for other devices, memory, CPU, channels, concurrent job streams, throughput, CRU's and direct access data sets.

FIGURE 3
RESOURCE ALLOCATION DISTRIBUTION

SARA RESOURCE DISTRIBUTION REPORT FROM 73118 TO 73118, TOTALING 1 DAY, MACHINE=165A
(The graphic distribution bars printed in the original report are omitted here.)

ALOC 24003    TIME    %TIME    ACCUM
      0       0.002    0.02      0.02
      1       0.003    0.03      0.04
      2       0.005    0.04      0.08
      3       0.018    0.16      0.24
      4       0.048    0.44      0.68
      5       0.359    3.27      3.95
      6       0.179    1.63      5.58
      7       0.195    1.77      7.35
      8       0.464    4.22     11.58
      9       0.417    3.79     15.37
     10       0.637    5.79     21.16
     11       0.212    1.93     23.08
     12       0.810    7.36     30.44
     13       0.980    8.91     39.35
     14       0.479    4.36     43.71
     15       0.836    7.59     51.30
     16       0.815    7.41     58.71
     17       0.358    3.26     61.97
     18       0.905    8.23     70.20
     19       0.789    7.17     77.37
     20       0.256    2.33     79.70
     21       0.536    4.87     84.57
     22       0.497    4.52     89.09
     23       0.575    5.23     94.32
     24       0.097    0.88     95.20
     25       0.232    2.11     97.30
     26       0.165    1.50     98.80
     27       0.071    0.65     99.45
     28       0.061    0.55    100.00

MEAN UTILIZATION = 15.39
95% UTILIZATION UNDER 23.78
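Figure 3 is, in effect, a duration-weighted histogram over machine states. The sketch below (machine-state durations and drive counts are invented) shows how the time, percent, accumulated percent, mean, and 95% lines could be produced; the printed report's fractional 95% figure is interpolated, whereas this sketch simply reports the first discrete level at which the accumulation reaches 95%.

```python
# Each machine state: (duration_hours, tape_drives_allocated); values are invented.
states = [(0.5, 12), (1.2, 15), (0.8, 18), (0.3, 22), (0.7, 15), (0.5, 9)]

total_hours = sum(duration for duration, _ in states)
hours_at_level = {}
for duration, drives in states:
    hours_at_level[drives] = hours_at_level.get(drives, 0.0) + duration

mean_allocated = sum(duration * drives for duration, drives in states) / total_hours
accumulated, level_95 = 0.0, None
for drives in sorted(hours_at_level):
    percent = 100.0 * hours_at_level[drives] / total_hours
    accumulated += percent
    if level_95 is None and accumulated >= 95.0:
        level_95 = drives        # "number of units or less" allocated 95% of the time
    print(f"{drives:3d}  {hours_at_level[drives]:5.2f} h  {percent:5.1f}%  {accumulated:6.1f}%")

print(f"MEAN UTILIZATION = {mean_allocated:.2f}")
print(f"95% UTILIZATION UNDER {level_95}")
```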


JOB STREAM EFFICIENCY REPORT

A question frequently asked of an OS/MVT environment is: What is an optimum number of active concurrent jobs? Too many active jobs may produce internal conflicts sufficient to degrade capacity compared to a smaller number of jobs. Too few active jobs may not achieve maximum capacity.

As concurrent job streams are determined from the multiprogramming machine state analysis, SARA records the average number or amount of resources allocated or used and the performance criteria at the different levels of concurrent job streams. Figure 4 is an abbreviated example of this type of data showing only the performance related criteria.

The report shows for each level of concurrent job streams:

-the level of job streams
-the percent of time that exactly 0, 1, 2, ..., 7 job streams were active
-the throughput
-CPU percent
-CRU's/hour

The SMF data was collected from the same dual 370/165 configuration as in Figure 3, but for a period representing more of a demand (as opposed to scheduled) workload.

This facility was previously running ten initiators on each system. The BCS software monitor showed heavy device and channel queuing, and so the active initiators were reduced to seven. The report in Figure 4 shows vividly that neither ten nor seven active initiators was correct.

The support processor that must handle the spooling activity (reading cards and printing output) and scheduling of jobs seems to have bogged down at more than 4 job streams, as witnessed by the throughput and CRU's/hour (note that data points at less than 10% of the active time are usually discarded as too small a sample). In addition, LASP response to Remote Job Entry (RJE) and a BCS online system seems to have slowed down drastically during the demand workload periods when short jobs were running and much initiating, terminating, card reading and printing output was occurring. This was verified by timing the duration for LASP to refurnish the RJE print buffers and timing its response time to the online/LASP interface commands such as status of job, backlog, submit, etc. The support processor, while providing OS services for a large number of job streams and LASP support for both systems, generated internal supervisor queuing such that the problem programs were seriously degraded.

The main processor, however, had not yet peaked at seven initiators and continued to produce more work with each increase of concurrent jobs. It is possible that an increase in initiators would produce an increase in throughput and CRU's per hour.

It was recommended that the number of initiators on the support processor be reduced by one and on the main processor be increased by one. Although the recommendation was followed, the results of the changes are not yet available.

FIGURE 4
JOB STREAM EFFICIENCY REPORT, DUAL 370/165's

SUPPORT PROCESSOR

Concurrent Job Streams   % Time   Thruput   % CPU   CRU's/Hr.
          1                 7       .81       16       284
          2                 6      1.66       30       482
          3                20      2.66       48       651
          4                23      3.35       49       581
          5                29      2.77       35       457
          6                13      3.21       26       448
          7                 2      4.09       20       432

MAIN PROCESSOR

Concurrent Job Streams   % Time   Thruput   % CPU   CRU's/Hr.
          1                 0       0          0         0
          2                 0       0          0         0
          3                 2      2.84       39       447
          4                 7      3.06       46       484
          5                13      3.96       45       567
          6                35      4.09       47       574
          7                43      4.55       49       619
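The Figure 4 analysis amounts to grouping machine states by the number of concurrent job streams and averaging the performance criteria within each group, discarding levels observed for less than 10% of the active time. A minimal sketch with invented state records:

```python
# Each machine state: (duration_hours, concurrent_job_streams, cru_generated); invented values.
states = [(2.0, 3, 1300), (3.0, 4, 2100), (1.5, 4, 900), (0.2, 7, 150), (2.3, 5, 1200)]

total_time = sum(duration for duration, _, _ in states)
by_level = {}
for duration, streams, cru in states:
    hours, cru_sum = by_level.get(streams, (0.0, 0.0))
    by_level[streams] = (hours + duration, cru_sum + cru)

for streams in sorted(by_level):
    hours, cru_sum = by_level[streams]
    percent_time = 100.0 * hours / total_time
    if percent_time < 10.0:
        continue                              # too small a sample, as the report notes
    print(f"{streams} job streams: {percent_time:4.1f}% of time, {cru_sum / hours:6.1f} CRU/hour")
```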

27 . .

The interval average report shows • the deducted initiators should be chronological intervals of average activated at 1600 per the reduction activity throughout the report period. in job streams (this is about the Figure 5 is an abbreviated example of time that terminal users quit sub- this report showing problem program mitting small jobs since they won't hourly averages of CPU per cent, CRU's/ be processed by quitting time and hour, throughput, core allocated, number submit large overnight jobs; this of job streams and calculated channel is also the time of highest job utilization for the time period from backlog) 0800 to 1800. The initiator job class assignments were This data was collected from a 370/165 reworked (this is a separate study in running HASP RJE and BCS TSO. The work- itself but relies heavily on job class load is primarily priority demand jobs and job priority resource requirements which are submitted via RJE with up to as reported by SARA) . The result was 20 communication lines and TSO support- an increase in CRU's/hour generation ing 60 to 70 users. As jobs are sub- from less than 300 to 350, better satis- mitted, a "job class" is calculated and faction of priority demands and little assigned by the system based on job effect on the improved TSO response times resource requirements. The system with four initiators was doing better than the previous operation with Initiators are given job class assign- six initiators (although the new job ments to best satisfy the priority class assignments were primarily demands (e.g. an initiator assigned job responsible, a concurrent effort to classes A, B, C must process all jobs reorganize the system packs was also of class A before B or C; if no class A beneficial) jobs are in the queue, it may process class B; if no class A's or B's, it may FIGURE 5 process class C) . The calculated job SARA INTERVAL AVERAGE REPORT classes and the initiator class assign- ments are such that small jobs are given TIME PRDB CRU's THRU CORE preferential treatment. The larger the inter\;al CPU /Hr. PUT ALLOC job, the longer the committed time to process the job. 0800-0900 6 142 1.23 198 1000 19 365 3.79 524 This facility had been experiencing 1100 8 195 2.11 323 unacceptable TSO response at 70 users. 1200 14 267 2.24 392 To reduce internal conflicts and/or TSO 1300 31 351 3.01 347 degradation, the number of active 1400 21 350 2.90 402 initiators was reduced from six to 1500 17 251 2.24 402 four during peak TSO periods from 0800 1600 11 234 1.96 351 to 1800. Thus, four problem programs 1700 8 257 2.30 335 would be the most that could run con- 1800 36 340 2.55 246 currently. The two deducted initiators had been primarily assigned to process- JOB CHAN ing large jobs. STMS 2 3 4 5

The reduction in active initiators did 1 20 9 4 4 result in reduced TSO response time. 4 26 2 8 2 SARA showed a reduced job lengthening 3 10 6 5 7 factor, and the BCS software monitor 3 36 27 8 15 showed a significant reduction in 3 26 1 3 0 device/channel queuing. There was also 3 59 3 3 25 a reduction in processing CRU's/hour. 3 22 4 10 11 3 18 8 27 10 Figure 5 is typical of processing with 2 14 3 47 0 four active initiators. It is apparent 2 16 5 3 11 from the Interval Average Report that: THE FOLLOW-UP • the job class assignments were far too weighted to small jobs as The foregoing studies resulted in witnessed by the poor core utili- recommendations for system improvements, zation which in turn produced few but a study should not end at that point. CRU's It is relatively easy to obtain a tape 0 the input workload could not keep of SMF data for post change analysis the initiators active with their even at an offsite location. Analysis present class assignments per of the results of change not only allows the low number of job streams further tuning of the original recom- mendation but provides experience in the most significant types of changes.

28 .

Also, the studies make mention of number of swaps, swap page ins and swap throughput, CRU ' s per hour and the job page outs are recorded. A system type lengthening factor as performance indi- SMF record gives counts of system page cators. With experience and a variable ins, page outs, page reclaims and TSO number of computer configurations to region swapping statistics. investigate, BCS performance analysts have come to recognize the performance FIGURE 6 capability of a given machine config- SYSTEM OVERVIEW, 0S/VS2 uration. This allows an almost instan- taneous analysis of a "machine in trouble" and if the problem cannot be found, the hardware/software monitors CPU are used for a thorough analysis.

VIRTUAL MEMORY SYSTEM DAT Virtual memory systems provide up to 16 megabytes of virtual core, a faster and larger real memory, and a more integra- ted operating system with internal processing algorithms that can be tailor- ed to the workload. The concept of job VIRIUAL STEP 2 VIRIUAL processing does not change. The follow- MEMORY MEMORY ing discussion is based on 0S/VS2- Release 1.6 unless specific details are known on Release 2.0. For example. Release 2 offers a System Activity REAL PROBLEM PROGRAM

Measurement Facility (MFl) , a monitoring MEMDRY ] PAGES function that sounds like a limited On software monitor, but little is known REAL SUPERVISOR-PERFORMS I/O ADDRESS TRANSLATIOJ about it at present.

Figure 6 is a simplified overview of the 0S/VS2 system operation. The primary functional features and differences over OS/MVT are:

9 the Dynamic Address Translator

(DAT) , a hardware device that translates CPU "virtual memory" accesses into "real memory" address locations « the 10 supervisor additional soft- ware function which translates 10 virtual memory addresses into real memory address locations • virtual memory allocation to job steps in 64K byte blocks VS2 OR MVI % real memory utilization by job stqps The decision to change frcm a basic MVI systan in 4K byte pages. (i.e. 370/155,165) to a VS2 paging system (i.e. 370/158,168) involves many The virtual system philosophy then is to considerations including cost, perfor- increase the user's (a memory size mance and/or special applications. The bottleneck heretofore) by giving him up cost tradeoffs are well defined but one to 16 megabytes of usable core and to must still determine what performance decrease the a real memory needed by gains new, faster memory will produce particular the job by making the job's what paging overhead will be suffered most active 4K pages reside in real and what the performance impact will be memory. This along with the internal on applications programs and/or time priority schemes should allow the sharing. Before proceeding with replace- user to load up the system with jobs ment, BCS chose to benchmark the new and let the system work them off as systems efficiently as possible. Benchmark jobs were obtained from each Under 0S/VS2, SMF records for job steps BCS large scale computer. The BCS TSO and TSO sessions include page ins and stimulator was rewritten for 0S/VS2 page outs. In addition, for TSO, the

29 .

(the TSO stimulator called TIOS, TSO Input lower total CPU than under the 2 mega- Output Simulator, emulates the TSO user's byte system) and results in inefficient activity per predefined scripts) . The job processing. Note that the thrashing BCS software monitor was rewritten for threshold algorithm which is supposed to 0S/VS2. Where possible, tests were con- limit paging was not available during ducted using the hardware monitor. SARA these tests. was modified to incorporate the new paging data. FIGURE 7 370/158 TEST RESULTS Testing was done at IBM data centers on both VS2 and MVT. OS/MVT(21.6) VS2 (1.6) VS2 (1.6) 2 MEGABYTE 2 MEGABYTE 1 MEGABYTE APPLICATION PROGRAMS BENCHMARK RESULTS 4 INITIATORS 7 INITIATORS 7 INITIATORS

Figure 7 shows benchmark results of 1. ELAPSED application programs (jobs) run under OS/ TIME HRS 1.907 1.893 1.916* MVT and VS2 on a 370/158, 2 megabyte (DATA FRCH system and a special case of VS2 on a SARA) 370/158, 1 megabyte system. This particu- lar benchmark contains 56 jobs and comes 2. CHJ's/HR 311 306 230 closest to representing an overall BCS (SARA) workload although other benchmarks of CPU and I/O boundness were also run. The 3. THRDUaiPUT 2.03 2.03 1.56 test results are somewhat tailored to BCS (SARA) test requirements (such as for initiators with MVT) and BCS fine tuning. The test 4. J0B- of the 1 megabyte system was prematurely LENGTHINING 1.60 2.70 4.10 aborted at 75% completion because excess- FACTOR (SARA) ive paging extended the elapsed time beyond the scheduled time. 5. PROGRAM CPU% 64 57 44 The test data show some interesting (SARA) results. The CRU generation and through- put rates of MVT and VS2 under these test 6. TOTAL CPU % 75 88 75 environments were about equal. The job (HDWE MONITOR) lengthening factor was very much less under MVT and this was an objective of 7. PROBLEM STATE/ that test. The CPU accounted to applica- SUPERVISOR 54/21 55/33 38/37 tion programs, as shown by SARA, was less STATE-CPU % under the VS2 system. The total CPU, (HDWE MONITOR) as shown by the hardware monitor, was less under the MVT system. This apparent CPU 8. CHANNEL HRS 1.57 1.47 1.48 contradiction is a result of IBM's (HDWE MONITOR) decision to delete from SMF accounting those routines that produced variable CPU times under variable paging requirements. * TEST ABORTED PREMATURELY

The ratio of SMF CPU% to total CPU% on the MVT test results is typical of past TSO BENCHMARK RESULTS MVT system studies. The lower channel hours required under VS2 are reflective TSO was tested using the BCS written TIOS of very little paging activity. (TSO Input Output Simulator) . TIOS simulates any number of terminal users The VS2,1 and 2 megabyte systems test each performing a set of unique commands results make another interesting compari- per a pre-established script. The script son. The excessive job lengthening factor requirements were developed based on a of 4.10 indicates inefficient job process- survey of experienced BCS terminal users. ing for the 1 megabyte system. This TIOS allows simulation of a proportionate system required 260,000 pages as compared TSO user load across different increments to 1800 pages for the 2 megabyte system. of users and thus provides a consistent Approximately 40% of the paging was basis of analysis (quite different from for the performed two jobs. Although analysis of the production TSO system are same, channel hours about the much with its random user load) of the channel time for the 1 megabyte system was spent in paging. The excess- Figure 8 shows an abbreviated SARA inter- ive paging shows increased CPU overhead val report (much like Figure 5 for (as evidenced by the increased ratio application programs) depicting TSO of supervisor CPU to total CPU) and some activity only. The variables that were time waiting for (as evidenced pages by important in analyzing system performance

30 .

for application programs are also impor- IMPLICATIONS OF VS2 ON SARA tant for analyzing TSO. This report shows the number of users who started their session during the interval, the The test results show that excessive pag- average number of users logged on, the ing produces system degradation. This percent of time all users actually means that paging activity and perform- occupied memory (i.e. were swapped in), ance relative to the paging activity will CPU percent, throughput and CRU's somehow have to be monitored. This also per hour. A thorough analysis of TSO holds true for the priority/scheduling can be made by using SARA type data, schemes and the trashing threshold response time data provided by the setting. analysis program of TIOS, and hard- ware/ software monitor data. SARA now contains paging activity for application programs and for the total results show that In general, the test system. This data, at a minimum, allows TS0/VS2 at the present time cannot one to recognize specific application compete with TSO/MVT. TS0/VS2 generates programs that require excessive paging much paging activity, uses three times (and should be reorganized) and to estab- as much CPU as MVT (e.g., from 16% lish performance levels at different to 52% under VS2) produces under MVT , paging levels. With experience, it is 30% less CRU's per hour, and gives hoped that performance correlations can longer response times. These observa- be found that will evolve into more tions are valid for as few as 15 users concise parameters depicting overall to as many as 60 users. system performance.

TEST RESULT CONCLUSIONS As new SMF data becomes available with new 0S/VS2 releases, notably MF/1 the VS2 The test results showed that under Release 2, SARA should have the paging system is not yet an acceptable flexibility to accommodate the data in- replacement for the MVT system but is compatibility with its reporting sufficient to handle some of the special requirements. For example, MF/1 data applications within BCS (e.g., a large may replace our current calibration memory on line system) . The excessive function or if overhead to collect overhead will undoubtedly be reduced in MF/1 data is high, it may be used later system releases. These releases infrequently to support the adjustment will warrant testing because of the of the calibration constants. operating system advantages e.g., job scheduling, device allocation, priority grouping.

Other test results showed that with either operating system: • The 370/158 produced better performance than the 370/155 SUMMARY • the software translation of virtual addresses in a strictly I/O bound benchmark produced more overhead on 370/158 than was BCS has found that the use of accounting balanced by the improvement in data provides a relatively quick and CPU speed over the 370/155 flexible method of system analysis and resource accounting. With experience Automatic Priority Grouping « and appropriate modifications to SMF data, does produce better performance. BCS has been able to establish capaci- ties, determine more cost effective FIGURE 8 configurations and resolve workload SARA TSO INTERVAL REPORT scheduling conflicts for large scale START USER NO. % M % THRU CRU computers TIME STRT USER STOR CPU -PUT /HR 20.750 0 0.0 0 0 0.0 0 Incorporating performance indicators 21.000 0 0.0 0 0 0.0 0 into the SARA reporting system not only 21.250 0 0.0 0 0 0.0 0 aids in evaluating the current production the 21.500 44 17.8 20 4 0.04 20 systems, but also aids in predicting 21.750 0 45.0 179 36 0.29 48 capability of new systems. 22.000 1 44.8 175 33 0.27 46 22.250 1 45.0 174 32 0.27 46 Recently announced operating system 22.500 0 45.0 174 32 0.27 47 changes with their associated scheduling 22.750 0 45.0 174 32 0.27 47 and priority parameters will make more 23.000 0 38.1 141 24 0.22 39 important the quantitative measurement 23.250 0 0.0 0 0 0.0 0 of work done and will increase the value 23.500 0 0.0 0 0 0.0 0 of SARA in studying these systems.

31 BIBLIOGRAPHY 4. BCS Memo, Technical Audit of 1. BCS Docximent No. 10058, SARA System Northwest District 370/165 A&B User's Guide 5. BCS Memo, Technical Report - Data 2. BCS Document No. 10068, 370/158 Services District 370/165 Evaluation Advanced Technology Testing 6. IBM Publications GC35-0004-OS/VS 3. BCS Document No. 10077, TIOS Users SMF, GC-6712-OS SMF, GC28-0667- Guide 0S/VS2 Planning Guide for Release 2.

32 .

USING SMF AND TFLOW FOR PERFORMANCE ENHANCEMENT

J . M . Graves

U.S. Army Management Systems Support Agency

Agency Overview Core Availability

Before I get too involved in this presentation The first area of system performance we examined of how we use SMF and TFLOW for performance was core availability, which on most machines, is enhancement, let me give you a few words of back- one of the most important constraints on the de- ground on my Agency, USAMSSA, The U. S. Army gree of multiprogramming, and therefore on the Management Systems Support Agency. We were maximum utilization of the machine. The more created several years ago to serve the Army tasks that are in core executing concurrently, Staff's Data Processing requirements. From our the higher will be the utilization of the CPU, site in the Pentagon, we handle the full range of the channels, and the devices, provided of course commercial and scientific applications, with a that device allocation problems do not arise from slight bias toward the commercial. Applications the additional concurrently executing tasks. If are processed via batch, TSO, and RJE on a 3 allocation problems do occur as a result of an megabyte IBM 360/65 and 1 and 1/2 megabyte IBM increased level of multiprogramming, a rigidly en- 360/50 using OS/MVT and HASP. As you might forced resource oriented class structure will expect, we have a large number of peripheral help. If this approach fails to reduce allo- devices attached to the two machines, some of cation lockouts to an acceptable level, device which are shared. We have a number of oper- procurement is about the only other avenue of ational conventions, but I need mention here approach. One way of justifying (or perhaps I only those two which have an impact on systems should say rationalizing) the acquisition of performance. The first operating convention is additional peripheral devices to support the what I call the SYSDATA convention, whereby all additional tasks now multiprogramming is to temporary disk data sets are created on per- regard the CPU cost and the core cost per addi- manently mounted work packs given the generic tional task as essentially zero, since these costs name of SYSDATA at System's Generation Time. were a fixed item before you began system tuning. The second operating convention to bear in mind In summary, by making more core available, one is the fact that of the total of 80 spindles on should be able to support extra applications at the two machines, only 41 are available for no cost, or in the worst case, the cost of mounting our 150 private mountable packs, the enough new devices to lower allocation conten- remainder being used by permanently resident tion to an acceptable level. Short of bolting packs on more core, there are a number of approaches one can adopt to increase the average amount of Problem Areas Defined core available at any one time, and I will de- scribe two of our approaches. We have in the past made use of a number of tools to analyze system perfomance: two hardware The Weighted Core Variance Report monitors - DYNAPROBE and XRAY, a software monitor CUE, and last but not least, personal observation In our computing environment, since i\fe do not the computer room. These tools are all excel- bill for resources consumed, there is no econo- lent in defining those areas of system perform- mic incentive for a user to request only that ance that need optimizing, e.g., the CPU is not amount of core which he actually needs to ex- 1001 busy, the channels are not uniformly busy at ecute his program. 
To the contrary, many con- a high level, certain disk packs show more head siderations in the applications programmer's movement than others, and there are an excessive mileau influence his REGION request to be on number of private pack mounts. However, none of the high side. First, he wants to avoid S 80A the usual tools tell you what to do to eliminate abends -core required exceeds core requested. or alleviate defined system bottlenecks. Our ap- Second, he ivants to provide enough core over and proach to filling this information gap, is to use above the requirements of his program to provide free IBM software - SMF and TFLOW - to produce a dump in the event of an abend. Next, he wants data for subsequent reduction into reports which to provide room for future expansion of his at least point you in the right direction. SMF program mthout the necessity of also changing is a SYSGEN option and TFLOW, which records his JCL. This consideration is particularly everything that goes through f^e trace table, is Important to him if his JCL has been stored on available from your IBM FE as a PTE. After tiie catalogued procedure librar>', S\'S1.PR0CLIB. making the indicated system changes, v.'e i^ack Wext, t^^ere is f^p cf^rtain knnwledae that cod- to the monitors to validate results. ing and keypunching effort can be saved by

33 specifying the REGION request at a higher level One of the most interesting design features of than the step EXEC statement, e.g., the EXEC OS is the use of re-entrant modules resident PROC statement, or on the JOB statement. In in a shareable area of core designated the either case, the REGION specified is applied to Link Pack Area. If these modules were not so each step of a group of steps, and must there- located and designed, there would have to be fore be large enough to enable the largest step multiple copies of the same module in core at of the group to execute. Obviously this practice the same time - one copy for each task that can lead to gross core waste if there is a large required its services. By having these modules variance between the REGION requirement of the core resident, the various tasks multipro- largest and smallest steps. For exanple, we gramming can share the one copy, thus effec- have experienced 300K compile, link, and go jobs tively saving an amount of core equal to the when the 30 OK was required only for the go step. number of concurrently executing tasks, minus 1, times the size of the module. The higher One approach to this problem would have been to the degree of multiprogramming and the larger introduce a JCL scan to prohibit the use of the the module in question, the greater is the ben-

REGION parameter on the JOB statement or the efit in core savings , or looking at the EXEC PROC statement. I personally preferred situation from another point of view, the this approach, but it was the consensus of our greater is the core space availability. This management that there was nothing wrong per se increase in availability can be anything but with specifying the REGION requirement at a trivial on a machine with a healthy compliment higher JCL level than the STEP statement. Only of core, since some access method modules are when such a specification results in wasted core substantial in size - 2 to 4K for some BISAM should the user be critisized. Since there is no § QISAM modules for example. Our criteria for way to know how much core actually will be used validating the RAM list is to select for link until after it is used, and after the fact re- pack residency only those modules which show porting system had to be developed to identify concurrent usage by two or more tasks more daily, on an exception basis, those users lAose than 501 of the time. Unfortunately, we were wasteful REGION requests were causing the great- unable to locate any free software that would est inq^act on core availability. This mechanism give us a report of concurrent utilization of is the Weighted Core Variance Report. The idea access methods. We then looked to see if we of the report is to produce an index number for could generate such a report from the data we each job step whose magnitude is a measure of the had on hand. Such data would have to provide degree of waste his region request has produced a time stamp giving the date and time that each in thd system. Acceptance of the report by task began using each access method module, and management was contingent upon the repeatability another time stamp when the task stopped using of the computed index number. From SNIP type 4 the module. If we had this information, it step termination records, the variance bet^^^een would be a relatively simple manner to cal- the requested core and the used core is confuted, culate for each access method module the length a 6K abend/growth allowance is subtracted, and of time there were from zero to 15 concurrent the result is weighted by the amount of CPU time users and then relate the total time for each used by the step. This product is further level of multiprogramming to the total measure- weighted by dividing by the relative speed of the ment time as a percentage. hierarchy of core where executed. Negative re- sults were not considered, since this result It becomes clear that the stop and start time- means the programmer was able to come within 6K stamps were already available in the form of on his core request. CPU time was used rather SMF type 4 Step Termination records and the than thru time since it was felt that this SMF type 34 TSO Step Termination records. figure was more independent of job mix consi- What was not available was the all important derations. The weighted variances are listed in access method module names invoked by each descending order with a maximum of 20 job steps step. This is where TFLOW gets into the act. per administrative unit. It was felt that 20 We had used TFLOW previously to get a more job steps was about the maximum that a manager conplete count of SVC requests than CUE sup- would attend to on one day. A minimum index plies, and so were familiar with the fact value of 500 was established to eliminate re- that when SVC 8 (load) is detected, the porting on trivial index numbers. 
The report is module loaded was provided in a 32 byte TFLOW distributed daily to 40 or 50 management level record. Therefore, in the TFLOW data, we personnel. Implementation of the teport was met had the access method module name, but no with widespread grousing which has continued up time stamps or step identification. The pro- to the present. This, I feel, is a strong indi- blem was how to relate the TFLOW data and the cation that the report is serving its purpose. SMF data. Fortunately, TFLOW gives the ad- dress of the Task Control Block (TCB) of the Access Method Module Utilization Report task requesting the access method. This information is provided to an optional user- Lest I give the impression that my idea of v^^itten exit at execution time. By seardiing system tuning is to attack problem programming, a short chain of control blocks - from the TCB let me describe the second of our approaches to to the TCT to the JMR - we were able to locate achieve a higher level of core availability - the SMF JOBLOG field consisting of the JOBNAME,

34 READER on date, READER on time. This JOBLOG maybe the required infonnation is somewhere out data is reproduced in every SMF record pro- in the SMF data set. We certainly were paying duced by the job in execution, and is unique a considerable system overhead in having SMF to that job. Also located in the JMR is the in the first place; wouldn't it be nice if we current step number. By copying the JOBLOG got something more out of it than job account- and step number to a TFLOW user record also ing and the core availability uses I've al- containing the access method module name, we ready described. have a means of matching the TFLOW user record to the SMF step termination record ivritten some- Disk Multiprogramming Report time after the access method module is loaded. By suppressing the output of the standard TFLOW We developed a report, the Disk Concurrent record, and recording only those user records Usage Report, which gives an approximation of that represent a load request for an access the CUE device busy statistic. We started method module, the overhead attendant with with the assumption that if there are a large TFLOW' s BSAM tape ^vriting is kept to a minimum. number of concurrent users of a single disk pack, it is virtually certain that there will RAM List Validation Results be a correspondingly high device busy stati- stic and that probably a large component of As a direct result of this analysis of access this busy figure is seek time. Also for method module utilization at USA^^SSA, we have private packs not permanently mounted, one recommended that tlie following changes be made can reasonably infer that a medium to high to the standard IBM RA^^ list: degree of concurrent use requires many oper- ator interventions for mounts. Getting the Additions --Totaling------bytes measure of concurrent disk utilization follows the same approach as concurrent access method Deletions " -1^538..bytes module utilization. Again type 4 SMF records were used to supply the start and end time Some of these additions and deletions were sur- stamps for each step run in the system. In- prising. The additional BISAM and QISAM modules stead of having to go to TFLOW to pick up the indicated a much heavier use of ISAM than we had name of the shared resource, this time we imagined. The deletion of the backward read on were able to find the volume serial number tape, and the various unit record modules is (VOL/SER) of each disk pack containing each surprising only in that we should have thought data set on which end of volume processing of it before but didn't, since practically occurred in the SMF types 14 and 15, or EOV everyone at USAMSSA uses disk sort work files records. The types 4, 14, and 15 SMF records instead of tape sort work files where backward are matched to one another by the same common reading is done, and having HASP in the system denominator as was used in the Access Method precludes a number of imit record operations, report -- the SMF JOBLOG, consisting of Job- since HASP uses the EXCP level of I/O. name, Reader On Date, and Reader On Time. Locating the step which processed the data set What Increased Core Availability has meant to ivhich came to end of volume is accomplished by USAMSSA "fitting" the record ivrite tim.e of the type 14 or 15 record to the step start and end times. 
Use of these reports, plus other systems tuning Once we have the time a step starts to use the techniques, has enabled us to go from 12 to 1^ pack and the time is stops using the packs, it initiators in our night batch setup on the ircdel is a simple matter to resequence the combined 65, and from 3 to 5 TSO swap regions plus an data and compute the percentage of time there extra background initiator in our daytime setup. were varying numbers of concurrent users on the CPU utilization has gone from the 65% - 75% pack. busv range to 90 - 100% busy. Except for week- ends, we are able to sustain this level of We have used this report in several different activity around the clock on the model 65. ways. One way is to validate the permanently resident (PRESRES) list for private packs; So much for core availability, the increase of that is, are all private packs that are per- which was intended to tap the unused capacity manently resident really busy enough to justi- of our CPU. To a large degree, we have suc- fy their mount status, and are there any non- ceeded in this area. However, the increased resident packs which are so busy they would be degree of multiprogramming has tended to exag- coo'^ candidates for addition to the PRESRES gerate whatever I/O imbalances previously exist- list? Another iray we've used this report is ed. We wanted to be able to keep a continuous to check the activity of our SYSDATA, or work eye on the disk usage area, since we had noted packs. Because OS bases its allocation of that some packs were busier than others and tenporary data sets on the number of open that some packs were frequently mounted and DCB's per channel, one may wind up with con- dismounted". We did not want to run Boole 5 siderably more activity on some work packs than get this Babbage's CUE every day, all day to on others . Our circumx^ention of this problem infonnation, for several reasons: systems is to move a frequently allocated but low I/O overhead for one, and the uneasy feeling that activit)^ data set from a relatively busy channel

35 to the channel containing the high activity ator Halt command) must be subtracted from this SYSDATA work pack. This data set is typically a number if the pack was mounted at IPL time or popular load module library. A big problem in ZEOD since a type 19 record is also written at USAMSSA as far as system tuning goes is the these times. dynamic nature of the use of private data sets as systems come and go. A data set that is very We have also used this report to reorganize active one month may be very inactive the next disk packs, particularly where one data set is month. The only way to keep up with the effects the cause for most of the pack mounting. If that this produces in the system is by constant the data set is very heavily used, we may try monitoring and shuffling of data sets. to find space for it on a peimanently mounted pack. Data Set Activity by Disk Pack Report To aid us in getting a better idea of the nature or qual- ity of the disk activity noted in the previous Conclusion report, we produce a report which gives a history of the activity on each pack by data We feel that the reports just described can be set name, the number of EOV's recorded and the used as a vehicle for system tuning, in that DHMAME of the data set, where that information they suggest courses of action by pointing out would be significant. We wanted to know which system anomalies. We can increase core availa- data sets were being used as load module librar- bility by monitoring programmer core requests ies so that we would have good candidates for with the Weighted Core Variance Report and by relocating to a channel containing an over-used modifying the standard IBM RAM list to accu- SYSDATA pack. We also wanted to know which data rately reflect USAMSSA's use of access methods. sets were used by batch only, TSO only, and We can reduce disk I/O contention by monitoring both batch and TSO in order to have the best disk utilization with the Disk Concurrent Usage group of data sets placed on the 2314 spindles Report and by monitoring data set activity which are shared between our two computers. All with the Pack/DSN Activity Report. In a more of the above information is contained in the passive sense, the reports can be used as a 9vtF type 14 and IS records, with the exception daily verification that the system stayed in tune without the attendant inconvenience of of the TSO versus batch breakout , which we accomplish by analysis of our highly structured running a hardware or software montior. We jobname. Additionally, the number of mounts find these reports to be useful. If you are for the pack is assumed to be the number of dis- interested in acquiring them, see me during mounts, which are recorded in the type 19 SMF the coffee break, or see Mert Batchelder and record. The number of IPL's and ZEOD's (oper- he will give you my address.

36 USACSC SOFTWARE COMPUTER SYSTEM PERFORMANCE MONITOR: SHERLOC

Philip Balcom and Gary Cranson

U.S. Army Computer Systems Command

1. ABSTRACT . SHERLOC Something to Help Everybody This technical paper describes the internal and Reduce Load On Computers external characteristics of SHERLOC, a 360 DOS software monitor written by the US Army Computer SECRIT WATSON Systems Command. SHERLOC primarily displays SHERLOC' s External Way of Accumulating 1) the number of times each core image module is Communication Routine and Taking Samples loaded, and 2) CPU active and WAIT time, by core Initiator and Terminator Obviously Needed address, for the supervisor and each partition. The major advantage of SHERLOC over most, if not FIGURE 1 all, other software monitors is that it displays supervisor CPU time by core address. In this

paper emphasis is placed on the concepts required A. SHERLOC Initialization . (Figure 2). When for a knowledgable system programmer to develop SHERLOC is started by the operator (Box Al), his own tailor-made monitor. Results of using SECRIT receives control and requests information SHERLOC since completion of its first phase in such as sampling rate from the operator (Box Bl), August 1973 are discussed. SHERLOC stands for SECRIT then attains supervisor state and key of S^omething to Help Everybody Reduce Load On zero via a B-transient. SECRIT modifies the ex- Computers. ternal new program status work (PSW) and the supervisor call (SVC) new PSW to point to WATSON.

2. GENERAL DESCRIPTION OF SHERLOC . SECRIT makes all of DOS timer-interruptable by turning on the external interrupt bit in operands SHERLOC is started as a normal DOS job in any 40K of the supervisor's Set System Mask instructions partition. During initialization, SHERLOC re- and in all new PSW's except machine-check. SECRIT quests information such as the sampling rate from sets the Sample flag = don't-sample which WATSON the operator. When the operator then presses the will later use to determine whether to take a sam- external interrupt key, SHERLOC begins sampling ple. SECRIT then sets the operator designated at the specified rate, accumulating, in core, all interval in the hardware timer (Box LI). SECRIT samples. When the operator presses the external adjusts the DOS software clock so that the soft- interrupt key a second time, SHERLOC stops sam- ware clock minus hardware time will always give pling and allows the operator to indicate whether DOS the real time of day. SECRIT then WAITs on he wants SHERLOC to terminate, or to record the an Event Control Block (Box B2). gathered data on tape or printer, or a combina- tion of the above. After satisfying these re- B. When the Timer Underflows . When the timer quests, if SHERLOC has not been told to termi- interval elapses, the hardware timer underflows nate, the operator can press external interrupt (Box A3) causing an external interrupt. The ex- to cause SHERLOC to continue sampling without ternal new PSW becomes the current PSW immediately zeroing its internal buckets, upon timer underflow since SECRIT has modified DOS (Box Gl) so that external interrupts are always A separate program named HOUIES exists to print enabled. WATSON receives control (Box B3) because data from the tape or to combine data from the SECRIT changed the address in the external new PSW tape by addition or subtraction before printing. during initialization (Box Fl). WATSON then Thus, in monitoring for two hours, if data is checks the Sample flag (Box B3) previously set by written to tape at the end of each hour, a print- SECRIT (Box Hi). Since this flag initially indi- out for the last hour alone can be obtained by cates samples are not to be taken, WATSON sets running HOLMES and indicating that the first data another interval in the hardware timer (Box G3), is to be subtracted from the second. Naturally, adjusts the software clock, and loads the external HOmES stands for Handy Off-Line Means of Extend- old PSW (Box H3), This returns control to what- ing SHERLOC. ever was occurring when the previous interval elapsed. This loop (Boxes A3, B3, G3, H3) in the interval elapses but no sample is taken 3. SHERLOC PRINTOUT AND INTERNAL LOGIC . which will continue until the external interrupt key is SHERLOC consists of two modules: SECRIT and pressed (Box A5)„ When pressed, an external WATSON. SECRIT stands for SHERLOC' s External interrupt occurs, giving control to WATSON. Communication Routine, Initiator and Terminator. WATSON inverts the Sample flag (Box B5) and then example, WATSON stands for Way of Accumulating~and ~Taking tests it (Box F5). At this time in our £amples Obviously Needed. Sample flag = sample. Therefore, WATSON returns

37 .

SVC Operator TlMER INTERRUPT PRESSES UNDERFLOWS OCCURS EXTERNAL INTERRUPT

Take a SAMPLE

flAKE ALL Pass control OF DOS Set TIMER TO DOS SVC POST SECRIT TIMER- INTERVAL HANDLER COMPLETE INTERRUPTABLE

Set Return to Return to sample fu\g interrupted interrupted = DON T PROCESS PROCESS SAMPLE

AT SO Set timer Reset interval SYSTEM 4 ® Terminate

S E C R I T to the interrupted process (Box H5) by loading untarily) waiting for the CPU because some higher the external old PSW. The next time the timer priority task is using the CPU, From the SVC 7 internal elapses (Box A3), WATSON will test the and SVC 2 bound bits in each task's Program

Sample flag (Box B3) and decide to take a sample Information Block (PIB) , WATSON can determine (Box F3) which buckets to increment in the EXPLICIT and IMPLICIT WAIT columns of the TASK TABLE. If the

C. Taking a Sample Due to Timer Underflow . A external old PSW indicates the computer was in sample is taken by incrementing various buckets the CPU state, then the Program Interrupt Key in core. (See Appendix 1). In the TOTAL TABLE, (PIK) indicates which bucket in the CPU column to the SAMPLE INTERRUPTS and TOTAL INTERRUPTS increment. buckets are incremented during every sampling. The CPU SAMPLE INTERRUPTS or WAIT SAMPLE In the CORE TABLE, the first column gives the INTERRUPTS bucket is incremented based on hexadecimal address of the starting byte of 16 whether the external old PSW indicates the byte intervals throughout core. When WATSON is computer was in the CPU or WAIT state, taking a sample, it finds the resume PSW of each respectively. explicitly waiting partition in that partition's save area. It determines whether a partition is Regarding the TASK TABLE, there are seven tasks explicitly WAITing from the SVC 7 and SVC 2 bits in DOS with no multi-tasking. Three are the in the PIB, WATSON then increments the WAIT SAM- partitions. The combination of the other four PLES bucket associated with the address in each are generally thought of as the supervisor. Each resume PSW. The resume PSW address is the address task may be using the CPU, or may be explicitly of the instruction next to be executed. Further- (ie voluntarily) waiting for something other than more, if the external old PSW indicates the com- the CPU (eg I/O), or may be implicitly (ie invol- puter was in the CPU state, WATSON increments the

38 . , .

CPU SAMPLES bucket associated with the address in 4. USES TO DATE . the external old PSW. The remaining columns of the CORE TABLE are not created during sampling. There have been primarily two uses of SHERLOC to At this point the sampling process (Box F3) is date. complete. A. Supervisor Improvement. The Computer Systems

D. After Taking a Sample Due to Timer Underflow . Command had decided to change from its previous WATSON then sets another interval in the hardware 8K supervisor to a 20K supervisor, primarily due time (Box G3) , adjusts the software clock, and to additional required and desired features. On loads the external old PSW to resume normal sys- a 360-30, it was found that the additional features tem processing (Box H3). in the 20K system caused a decrease in computer thruput of between 207„ and 307o compared with the E. When an SVC Interrupt Occurs. SVC instruc- 8K system. SHERLOC was developed to improve this tions are executed by both the supervisor and thruput. problem programs. Problem programs execute SVC's to ask the supervisor to initiate I/O, to fetch (See Appendix 3). The supervisor options and other core image modules from disk into core, etc. computer system characteristics for the 8K and 20K When an SVC is executed, an SVC interrupt occurs systems are shown in the columns labelled 1 and 8, (Box A4) and the SVC new PSW becomes the current respectively. The general plan was to compare, PSW. WATSON then receives control since SECRIT for each option, the degradation against the ben- changed the address in the SVC new PSW during efits gained. The largest unknown was the degra- initialization (Box Fl). If the Sample flag in- dation of each option. SHERLOC was used to obtain dicates not to take a sample (Box B4) , control is a rough approximation of the degradation of each passed to the DOS SVC interrupt handler (Box G4) option. For supervisor options, the approximations to handle the SVC. On the other hand, if a sample were developed from matching, by absolute address, is to be taken (Box F4) (See Appendix 2), either the supervisor source coding for each option with the SUPERVISOR or PROBLEM PROGRAM bucket at the the CORE TABLE printout. This provided the CPU time top of the FETCH TABLE is incremented based on the spent in each option. This CPU time was in some supervisor state bit in the SVC old PSW. If the cases only approximate since for some options the SVC is an SVC 1, 2 or 4, the FETCH bucket is code is scattered in the supervisor. This CPU incremented, and the name pointed to by register 1 time was then multiplied by an educated estimate is either found in the table and its bucket incre- of the percentage of constraint of this CPU time mented, or else added to the table with bucket on thruput. This converted CPU time into approx- equal to one. This completes the SVC sampling imate run time. process (Box F4) , Control is then passed to the DOS SVC interrupt handler (Box 04). In the case of the job accounting option, JA=YES, it was decided, with help from the CORE TABLE, that

F . When the Operator Presses External Interrupt this option should be treated as two suboptions. to Record Data. If the operator presses the The first, (STEP) in column 8, keeps track of the external interrupt key (Box A5) when the Sample start and stop time of each step. The second flag = sample, WATSON receives control, inverts (CPU, WAIT, OVHD), keeps track of CPU and WAIT the Sample flag to don' t-sample, POSTs SECRIT time by partition, and overhead time. To implement complete (Box G5) and returns to the interrupted the separation of these two suboptions, the full process. SECRIT then (Box F2) reads commands option would be generated and the (CPU, WAIT, OVHD) from the operator. If the operator indicates coding no-op'ed. that the accumulated samples in core are to be recorded on tape and/or printer, SECRIT accom- A non-supervisor option under consideration was on plishes the recording (Box G2) which library each module would reside and in what order their names would appear in the directory of (See Appendix 2) During printing, the last six a given library. This option affects search time. columns of the CORE TABLE are calculated. The This is the next to last option in the figure. CPU SAMPLE PERCENT column is the CPU SAMPLE column From DOS DSERV listings and the FETCH TABLE, it divided by the SAMPLE INTERRUPTS in the TOTAL was concluded that the sort modules should be on TABLE. The WAIT SAMPLE PERCENT column is the WAIT the Private Core Image Library and that their names SAMPLES column divided by the SAMPLE INTERRUPTS. should be at the beginning of its directory. It The TOTAL SAMPLE PERCENT column is the sum of the was also concluded that no other library adjustments previous two columns. The next three columns are need be made. The improvement (or negative degrad- running totals of the corresponding previous three ation) due to this change was easily approximated columns by calculation.

After recording on tape and/or printer (Figure 2, Now that an approximate run time degradation was

Box G2) , if the operator has not indicated that known for each option, tests 2 thru 7 could be SHERLOC should terminate, SECRIT again WAITs designed, ie , options could be grouped rationally, (Box B2). The operator may then press external based on approximate degradation and feature interrupt (Box A5) to resume sampling. On the desirability. An 'X' in columns 2 thru 7 indicates other hand, if the operator has indicated that that the option was specified as in column 8 rather

SHERLOC should terminate (Box H2) , SECRIT resets than as in column 1. The various supervisors and the previously modified new PSW's and Set System libraries were then created and run with live data. Mask operands to their original status, and goes The results are designated as "Eustis" near the to End-of-Job (Box N2) bottom of the figure. Test 1 took 3007 seconds.

39 The degradation for the other tests used 3007 as 3.98% and the background degradation to 9%, their denominator. None of the times in the figure Since this SPOOL currently runs at about 20 sites include operator intervention. The options in and is expected to soon run at 40, this improve- test 4B were considered optimal. Tests were then ment will be substantial, run with an average volume of live data. The results are designated as "Meade" at the bottom 5. FUTURE SHERLOC IMPROVEMENTS . of the figure. The resulting "Meade" 4B degrad- ation was -0.77o instead of the original 20K We intend to change SHERLOC so that it can be run supervisor degradation of between 207o and 30%, without usual operator assistance. This means This improved supervisor currently runs at about that parameters will be entered by card rather 20 sites and is expected to soon run at 40. than console, and that SHERLOC will start and stop Assuming the following factors, the results of sampling based not on pressing external interrupt, this study will save about $50 per hour * 20 hours but perhaps on the starting and stopping of the per day * 300 days per year * 40 sites * 25% job being monitored. We also intend to write each degradation reduction = $3,000,000 per year. sample to tape rather than incrementing appropriate buckets. Later another program would read these

B. SPOOL Improvement . The second use of SHERLOC samples from tape, increment buckets and print. has been in speeding up a SPOOL program on 360-30fe, This would reduce core required from the current Generally our user programs write print data to 40k to about 4K. We would also complete the docu- tape or disk and writes to a printer. It was menting of SHERLOC. At that time it could be used discovered that a background process would run by others. Later, we may add capabilities such as about 377o slower when SPOOL was running in fore- monitoring the percentage of time each device and ground than when nothing was running in foreground. channel is busy. (See Appendix 1). By SHERLOCing this SPOOL running alone, the CORE TABLE revealed that supervisor 6. CONCLUSION, CPU was 33, 57% and problem program CPU was l,267o.

Each of these percentages was calculated by sub- In conclusion, SHERLOC ' s major advantage over most, tracting the CUMULATIVE CPU SAMPLE PERCENT at the if not all, other software monitors is that it beginning of the appropriate core area from that displays supervisor CPU time by core address. at the end. Furthermore, the CORE TABLE indicated This is possible because WATSON operates above the that the large supervisor CPU usage was due to the operating system. SHERLOC has assisted in increas- unchained printer channel command words since the ing thruput on twenty 360-30' s and 40' s by indicat- problem program was asking the supervisor to han- ing modifications to our supervisor and libraries, dle each print or space operation separately. By and is currently aiding us in improving our SPOOL changing SPOOL to chain up to 20 print and space program. Developing SHERLOC took one man-month. operations together, the supervisor would handle Improving the supervisor and libraries took three only one twentieth as many requests. In our test man-months. These four man-months of effort are environment, this reduced the supervisor CPU to expected to save $3 million per year.

40 1 0 9 £ B

jU r FRjii^ t • • JOB_.Mn,i / MNmLl_^/-iAX!JJ^Lt_,^''^Z/C .J2 . _ J^f.. JJ'±^Z ^}X i'^/l l M

O SPiiuL IN r 1 . 9H91 , R_ F(;,.uR LO D3._. _ SPOuL rE:sj__. frL. _filLi?

SA:^1PLE_INIS _CPiL. s 'V'lPL.r. I NTS wait sahple_ ints -TliTAL INTS 00070O3 001059:? o'oCJTO'j TOTAL T/'BLE

CPU ex' wait im wait ALL rtOUNO 0001 0000000 00093? I " " BG 0000000 0000001 0000000 ' F2 00 000 3 0010993 0000000_<- ' -M TABLE Fl 000 2 22^ oooai2^" 00 005 3 5 ATTN koutime: 00 0000 0 00 00 00 000 JO 0_ 00 " Qij'iF.SCr 0000000 0 00 00 00 oooo()bo CORE TABLE OJOOOOO 0 000000 00 000 00

C_U'M_ CPU WAIT TOTAL C3U WA it' tot'al^ C PU WAIT SAMPl S A"PLE SAMPLE SAMPLE SAMPLE A 0 D H S A Mi'L r s S A-'fPL E S PFI'CCNT PFKCEMT P t^CE'^T Pl-t

9" 79" 00 1 5 0 0 ) 1 9 5 0 01. 7 001 . 79 001 . 79 001 . 0016 0 0 0 3 7 J 0 03.47 0 0 3.47 005.?^-> 00 6. 26 ... , 00 170 J ) 1 1 00 1.01 001 .or ''0(/6.2 7 00 6. 27

00 18 0 jOJ3d1 i "; 0 1 . 2 2 003.22 009. 60 009. 50 . _ . . ^ " .' ' 00190 002 06 0 i3 1 . H 9 'SUPERVISOR 001.89 oi l 39 01 1.39 00 1 A 0 000 3 5 COO. -(2 AREA: 0 00.3 2 011.71 Oil. 71 71' 00 ICO J0:l7;i 0 00. 71 33.57-0.00 0 00. 012.43 012.43 00 IDO 0 Ju3 7 0 00. 3 3 =33.57 0 00.3 3 012. 76 012. 76 OOIE 0 0 0 132 OO0.T3 "000.93 U'i3.70 '013-70

OlOAO ) )0n2 300.01 000.01 03 1.17 031. 17

01F50 00153 3 01. 40 001.40 03 2. 5 032 . 5 i 01^6 0 0 0004 000.03 0 00.0 3 0 3 2.61 032. 61 01-^70 00041 000. 3 7 000.37 03 2.99 032.99 OIF8 0 00063 _ 000. 5 7^ 000. 5 7 033. 57 033. 57 3 12;3C0 ToTi 93 lu;3.oo 1 o'o.oo 0 i 3 . 5 7 100. 'O 133.57 10/^60 00001 03 3 .5-3 1 30 . u 0 13 1.53 10370 JJOlO "o'oj.ot' 300. 09' 0:s'i.67 lOvJ.OO 13 3.67 ll)Sf30 OJOLl nou. 10 000.10 03 3.7 7 113. 0 0 133. 77

I 0 T 9 0 0)021 0 00. 1 000. 19 03 1. 96 100.90 133.96 108 AO 0 JU 02 000.91 000.01 033. 98 100. 00 13 3.9"? najo 300 02 9 00.01 000.01 034.00 100.0 0 134. 00 lOHCO 0 )J10 0 00. 0 0 000.09 03 4. 09 loo.n 3 1 J-f. 09 loan 3 0 3:304 000. 0 ^ > PROBLEM 000.0 3 0 3 4.13 1 00. 0 0 13 4.13 10 9o0 0 00 32 000. 0 1 PROGRAM 000. 0 1 03 4. 15 100. 0 0 134. 15 ^ 1 0 17 0 00 0J9 OOO. 9 AREA: 000. 08 034.23 100.0 0 134.23 1 '3 A 3 0 00 OOP 000. 07 34.83-33.57 000.0 7 034.30 1 oo. 90 13-+. 30 10 A50 000 3 7 0 00. 3 i =1.26 OOO. 3 3 0-S4. 64 1 00.00 13 4.64 lOABO 000 11 0 Oo . 1 0 0 00. 10 03 4 . 74 130.0 0 134. 74 lOEEO JOO J^i 0 00. 0 4 000.0 4 0 14.73 100.0 0 134. 79 IDFAO 0 300} 0G129 000. 02 07 4. f'2 0 7h. 6 5 054. 32 1 74. 62 2 0'>.44 1 p 'p 0 0 30 01 034. 32 174.6 2 20'-i.46 1€32 0 3 30 01 03^. 8 3 174.62 239.46 . , w

APPENDIX I

41 1 J

FECH TABLE SVC FETCH COUNTS =FETCH 0000000000065074 =SUPV 0000000000044623 =PPGM 0000000000006026

$$BATTNG 00001 $JOBCTLG 00 108 $$GTPIKC 00036 t$BEO J4

$$8E0J 00 106 tJOBCTLA 00105 $$BATTNU 0000 1 $JOBCTL

$$GTPIK 00035 $$BOPE!MR 00 106 $$B0PfMR2 00038 $$60PEN

$$BOMS G2 00005 $t80MSG5 0000 1 $$BE0J2 00001 ttBPSW

$JOBCTLn 00019 $$BaCP01 00041 $$BCLGSE 00 26 1 $ JOBCTLK

$$BCBL IS 00027 $$BCBUSR 00133 $$B0PEN2 00098 $$BQMT01 tSBOMTfcS 00030 $$BOIS0l 00044 $iBOIS02 00044 t*BOI 503

$tBOI S04 00044 $$HOIS05 00044 $$B0 IS06 00044 $$B01S07

$$BSETFL 00010 tSBSETFF 00010 $$Bi>bTFG 00010 $$8OSD00

$$BOSI GN 00157 t$BUSD12 00066 i$B0SDI4 00066 $$BUSOC 1

$$BCMT0l 00028 $$BENDFL 00012 t$BENOFF 000 10 $$8CL0S2

$$B0UR01 00003 SORT 00016 SURTRCL 00016 SORTRCB

SORTRCD 00025 SORTRCC 00016 $$Bosuai 00091 *$30SD02

SORTRCN 00016 SORTRSU 00016 SORT ASA 0001b SORTROA

SORTRBA 00016 SURTRPB 00016 SORTRSM 00016 SCRTROC

SSBUMTOi'l OOOll $$BUMT04 00024 LSF100R2 00001 $tBCBUSH

L12ATINJ19 00003 L12ATN13 00015 L12ATN 17 00004 L12ATN18

I.SF13200 00001 LSF134E0 00001 SORTRSN 00012 SOKTRGB

LI 1ATN30 00001 LSF13oEl 0000 1 LSF13bRl 00001 LSF136R2

BALIOVRA 00002 HALIOVRB 00001 $$BSETL GO 157 BAL lOVRC

LSFl3a30 0000 1 L11AT(\136 0000 1 L12ATN15 00001 LSF13835

LSF13850 0000 1 LSF13860 00001 LSF13870 00001 LSF 140B1

LSF14200 00001 LSF14300 00001 LSF14350 00001 LSF14400

L12ATN 14 00001 LSF14800 00001 LSF160EI 00001 LSF16000

APPENDIX 2

42 u •H 3 D* 3 • CO CO (U U o. o •H OOrl

>> m m ^4-1 >^ U) 4-1 4-lXJ rH O O •H U T-l O Q) XI H C O C 3 4J CO nj 0) ^ 00 (do- nJ CO o C >-l 4J O "-I Q> CO o n) n) t/) Q) C/) P O >»*H CJ •H 01 4-1 ^ 4-) Ij «1 (-1 4-1 O. -H 01 CO ui-H C "~i 4J JJ iH c «) c QJ QJ CJ iH D.UScOH-H> CO O o -Q n) p. Q) cu cn -H CO I 00 C > O > •H

TEST # = 1 2 3 4 4A 4B 5 6 7 8

MPS= YES X X X X X X X X BJF TP= NO X X X X X X BTAM EU= NO X X X X X X X X YES ERRLOG= NO X X X YES IT= - NO X X X X X X X F2 TEB= 5 X X X X X X X X 13 TEBV= NO X X X (DASD,13) SKSEP= NO X X X X 16 CE= NO X X X X X X X 800 JA= NO X X X X X X X [STEP] JA= NO X X X [SIO] JA= NO X [CPU,WAIT,OVHD] PTO= NO X X X X X X X YES PCIL= NO X X X X X X X X YES CBF= NO X X X X X X X 5 CCHAIN= NO X X X X X X YES AB= NO X X X X X X X YES WAITM= NO X X X X X X X YES DASDFP= NO X X X X (1,2,2314) [PGM= 25,10,5 X X X X X X X X 100,100,22] JIB= 25 X X X X X X X X 80 CHANQ= 15 X X X X X X X X 40 IODEV= 17 X X X X X X X X 40 [peripherals single X X double] [SVC 51 NO X X X X X X X YES] [assgn tapes to 2 X X X X X X X X 1] [sorts on PCIL NO YES YES YES YES YES YES YES YES NO] [JCL, file alloc, 8K X X X X X X X X 20K] libraries ,workarea

fRun t ime in seconds. 3007 2671 2701 2710 2748 2717 2905 2950 3101 3794

Eustis Degradation in I l_=(T-3007)/3007. 0.0 -11.2 -10.2 -9.9 -8.6 -9.6 -3.4 -1.9 +3.1 +26.2

fRun time

I in seconds. 8245 7987 8186 Meade < Degradation in % l^=(T-8245)/8245. 0.0 -3.1 -0.7

[ ] Logical, not actual, specifications. APPENDIX

43

: .

BENCHMARK EVALUATION OF OPERATING SYSTEM SOFTWARE: EXPERIMENTS ON IBM'S VS/2 SYSTEM

Bruce A. Ketchledge

Bell Telephone Laboratories

This paper concerns the problem of evaluating a new and unfamiliar operating system using benchmarking techniques. A methodology for conducting a general evaluation of such a system is presented, and illustrated with a recent study of IBM's new VS/2 system. This paper is broken into two major sections:

1. An Operating System Evaluation Methodology.

2. A Case Study: Evaluation of IBM's VS/2 Operating System.

In the first section, an operating system evaluation methodology developed in the Business Information Systems Support Division at Bell Laboratories is presented. The second section illustrates the application of this technique in a recently-completed VS evaluation project.

Before plunging into the main parts of this paper, a little background (and a few caveats) are needed.

This paper will concern, specifically, methods of conducting a general evaluation of a batch-only operating system. The use of the word 'general' implies that the type of evaluation project dealt with would not be aimed at predicting the performance of an already-coded application on the system (utilizing a particular central processor). Rather, the situation we envision is the case where the general performance-affecting features of the operating system are to be identified and probed to a sufficient depth to allow 'ball-park' estimates of the system's performance (e.g., overheads) in a given processing environment. Moreover, this paper looks at batch-only systems, although much of the general methodology described might well apply in time-shared or batch/time-shared environments.

As this paper is the outgrowth of an evaluation project concerning a particular operating system (IBM's VS/2 Operating System), it might be well at this point to briefly sketch the historical background of this project. Shortly after the IBM announcement of virtual systems on August 2, 1972, the Business Information Systems Support Division at Bell Laboratories made a decision to do a general evaluation of these new systems. IBM's clear commitment to virtual systems, combined with the many new features of these systems, made the need for such an evaluation imperative.

Up until the time an IBM virtual system was available in-house (April 1973), the main part of the evaluation effort centered, necessarily, on reading IBM-supplied documentation concerning VS/1 and VS/2. As will be discussed in Section 2, once a 'live' VS system was at hand, an experimental/benchmark approach was used in gaining familiarity with the system. Once familiarity with the system was gained, formal benchmark tests were conducted and the resulting data were analyzed. Seen in its entirety, it was felt that this evaluation project followed a pattern of sufficient generality and interest to be abstracted in the form of a general approach to operating system evaluations. It is this methodology we discuss in the next section.

1. An Operating System Evaluation Methodology

We propose to attack the problem of conducting a general evaluation of a batch operating system in four phases. The principal motivation behind formalizing the evaluation process and dividing it into four functional phases is to assure that no major characteristic of the system being examined nor any potentially useful approach to that system is excluded from consideration. As will be pointed out below, the particular approach proposed also has the benefit of facilitating management of the project in cases where more than one person is involved.

Some caveats to be borne in mind concerning the methodology to be proposed are the following:

. Vendor cooperation.
. Available system.
. Internal documentation.

Our approach depends heavily on the vendor's cooperation, and having available (perhaps through the vendor) a 'live' system to test as well as internal documentation of the operating system. Generally, the vendors have been cooperative, especially if the system to be tested is a new offering. In any event, the most crucial requirement for a successful evaluation effort is the availability of a 'hands-on' system. In the sequel we shall assume all of the above elements are present during the project.

The phases of the evaluation project are listed below, along with the rough percentage of the total evaluation effort each might be expected to consume.


EVALUATION PROJECT PHASES

1. Analysis of Vendor Documentation/Establishment of Project Goals. (30%)
2. Development of Experiments/Tools. (20%)
3. Testing of Experiments/Tools. (20%)
4. Conduct formal benchmark experiments, analyze data. (30%)

The phases are named in such a way as to suggest (roughly) their basic functions. Note that the percentages shown represent only typical amounts of the total effort (in man-months) devoted to that phase. Also note that according to this table, 70% of the effort is concentrated in the pre-benchmark phases... the significance of this fact will be discussed below.

The first phase of the evaluation project involves analysis of vendor documentation and development of 'working hypotheses'. This is a crucial phase of the project (and should consume at least 30% of the total effort). During this phase the following activities occur:

. Analysis of vendor 'high level' (overview) documentation.
. Analysis of vendor 'low-level' (internals) documentation.
. Establishment of reasonable goals for the project, taking into consideration factors such as vendor support, complexity of the system to be modeled, benchmark time available, etc.
. Development of one or more 'working hypotheses' about the system.

Analysis of vendor overview documents is especially important if a human factors evaluation of the system is desired (i.e., ease of use, complexity, debugging aids, etc.). Internal documentation must be analyzed with special care if potential sources for one or more software or hardware probes are to be recognized. For example, if the system keeps tables of important internal events or queues, it may be worthwhile to gather that data (with software) in order to aid in performance evaluation. This technique will be illustrated in Section 2.

Goals can be set for the project only when some familiarity with the operating system has been developed. Influencing the goals are factors including:

. Available resources (time, manpower, computer time).
. Complexity (novelty) of the operating system.

Once some familiarity with the system has been gained, and goals for the project set, it is important to structure the project around one or more central 'themes'. In many cases this central theme, or 'working-hypothesis', is suggested by an outstanding or novel feature of the system to be examined. In some cases a 'straw-man' may be set up, which then motivates experiments designed to knock it down. In any event, such working hypotheses about the system serve to keep the experiments proposed in Phase 2 of the effort relevant to the goals set in Phase 1. Section 2 of this paper offers an example in which a comparison with the vendor's other major operating system provided an all-important working-hypothesis.

Phase 2 of the project involves capitalizing on the knowledge gained in Phase 1 by developing specific experiments and tools designed to probe important areas of the operating system. Some, though not all, of the experiments may be aimed at challenging the working hypothesis developed in Phase 1. Other experiments may be aimed at directly comparing throughput of the new operating system with that of the same vendor's other systems. Generally, each experiment will be handled by one individual (including its design, testing, and benchmark runs) and will include three elements:

. Software to stimulate the system ('job mix').
. Measurement tools - hardware monitor, software monitor, log files, etc. This category includes software to reduce the collected data.
. Detailed specification of the experiment.

The evaluation team may decide to allocate one member of the team to handle some functions common to all the experiments (i.e., hardware monitoring). The detailed specification of each experiment, however, must be done on an individual basis, especially if one goal of the benchmark is the development of a real familiarity and self-confidence about the system by members of the team (these experimental designs must be reviewed by the team leader, of course). During the second phase new tools needed to probe sensitive areas of the system may have to be developed. The tools may serve either or both of the validation and measurement functions.

The validation function concerns the checking of important system routines for operation that is consistent with the system's logic (as documented by the vendor) and consistent with system initialization parameters. For instance, in a VS/2 system initialization ('IPL') parameters may be set so as to effectively prevent the VS/2 Dispatcher from 'deactivating' (marking non-dispatchable) user tasks due to excessive paging. During the VS/2 evaluation project discussed in this paper, the system parameters were set so as to 'turn off' task deactivation, and the effect of these parameter settings was checked independently by a software monitor which examined the system's Nucleus region and validated that no user tasks were being deactivated. (The use of this monitor is further illustrated in Section 2 of this paper.) Particularly with a new operating system release, 'bugs' in the system may cause entire benchmark runs to be invalidated without causing the system to 'crash' (thus signalling the difficulty). Passive protection can be provided by a software monitor or trace facility, such as described above, which checks for normal operation of the system internals, especially immediately after system initialization. The measurements function, of course, is central to the entire effort.
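The validation function described above amounts to re-checking, at run time, that the system really behaves the way its initialization parameters say it should. The fragment below is a minimal sketch of that idea in Python; the task list, its 'deactivated' flag, and the read_task_table() routine are hypothetical stand-ins for whatever vendor-documented control blocks a real monitor would sample.

    import time

    def read_task_table():
        # Hypothetical stand-in for one read of the system's task table.
        # A real monitor would decode the documented control blocks; here we
        # simply return a list of (task_name, deactivated_flag) pairs.
        return [("JOB01", False), ("JOB02", False)]

    def validate_no_deactivation(duration_sec=60, interval_sec=5):
        # With task deactivation 'turned off' at initialization, no sample
        # should ever show a user task marked non-dispatchable.
        end = time.time() + duration_sec
        while time.time() < end:
            offenders = [name for name, deactivated in read_task_table() if deactivated]
            if offenders:
                print("VALIDATION FAILURE: deactivated tasks:", offenders)
                return False
            time.sleep(interval_sec)
        print("Validation passed: no user task was deactivated.")
        return True

    validate_no_deactivation(duration_sec=2, interval_sec=1)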


Software monitor techniques, in particular, are useful for measuring logical queues in the operating system as well as determining efficiency and stability of operating system resource-allocation algorithms. The integration of these measurements with more conventional measurement techniques (hardware monitors and log files) is illustrated in Section 2.

Phase 3 of the evaluation project involves testing and debugging of each experiment and tool on the system to be benchmarked. This is an excellent opportunity for each member of the team to acquaint himself with the system and gain a real feel for the response of the system to various stimuli (job streams). Ideally, this phase of the evaluation project should extend over a considerable period of time, to allow intuition about the system to grow as well as allow revamping of tools or experiments which prove invalid. This period also allows the sensitivity of the system to initialization parameters (in IBM jargon, 'IPL' parameters) to be determined (especially if nominal parameter values are to be selected for the benchmark itself).

Phase 4 of the project involves carrying out the formal experiments developed in the earlier phases, analyzing the resulting data, and documenting the results. In most cases, by the time this phase has been reached, most (70% or more) of the work has been done and the experimental designs crystallized. Experience has shown that relatively little can be done to correct or revamp experiments yielding invalid or unreproducible results in this phase... this 'debugging' must be done in Phase 3.

Certainly the most interesting part of the project is the data analysis which follows the benchmark runs. The working hypothesis must be refuted or validated at this point, and alternative explanations (based on internals knowledge gained in Phase 1) of system behavior advanced if necessary. It is crucial that all the experimental results be understood in terms of the internal structure of the operating system in order to be able to say the project was a success. Documentation of the results should include those explanations, for they may help others unfamiliar with the system comprehend and appreciate the benchmark results. Conclusions and recommendations about the operating system should be written up as a separate section of the final report, and should consider the experimental results obtained in their totality, as well as any other knowledge or intuition gained about the system during Phase 3.

While many of the points made above were very general, it is easy to give specific examples of their implementation in the context of the recently-completed evaluation of VS/2 done at Bell Laboratories. In the following section the course of this evaluation project is followed in light of the discussion of Section 1.

2. A Case Study: Evaluation of IBM's VS/2 Operating System

Phase 1 (analysis of documentation) of the VS/2 project began with collection of VS/2 documentation. Primarily internals documentation of the operating system was examined, especially prominent in this role being the IBM-supplied 'Features Supplements' for VS/1 and VS/2. The hardware changes IBM introduced with the VS announcement also were examined, primarily through the 'Guide to the IBM 168' manual.

After the necessary background in VS was obtained, goals for the evaluation effort were set. It is worthwhile at this point to discuss in some detail the considerations which led to the goals established for the study, and to list the goals themselves.

The VS benchmark had several goals. Various factors inherent in the environment in which the tests were run conditioned these goals and restricted the type of tests which could be run. In particular, the hardware configuration used was a 370/145 CPU with 512K bytes of memory. The restricted amount of main storage available made it impossible to do a worthwhile comparison of OS and VS throughput. This was because OS/MVT 21.6 was found to occupy over half of main memory when running, thus restricting the memory available to user programs in such a way as to limit the number of user programs running simultaneously to one or two at most. Thus any OS results obtained would be mostly a function of the lack of multiprogramming depth in the runs. The only alternative would have been to run jobs of unrepresentatively small memory (Region) requirements so as to allow more jobs to be active simultaneously under OS. This alternative was discarded.

Another conditioning factor for both the experiments and the resulting conclusions was the size, speed, and architecture of the 370/145 CPU. The 370/145 does not have the highly centralized control register structure, pipelining features, outboard channels, or cache memory of the larger 370 CPU's. Thus questions relating, for example, to the effect of a triple-hierarchy memory (cache, main memory, and virtual memory) on system performance could not be explored because of the 145's lack of a cache memory. Hardware monitoring was also more difficult on the 145 than on more 'centralized' CPU's like the 155 or 165. This fact, however, did not significantly hinder the study.

With the above factors in mind, and with an eye on Bell Laboratories' particular interests as a consultant to Operating Telephone Companies, the following goals were set for the benchmark:

1. Develop a better understanding of VS internals and spread this knowledge among others in Bell Laboratories as a whole.
2. Develop an 'intuitive feel' for the general performance of the 145 running under a VS operating system.
3. Lay the foundation for further benchmark and simulation efforts directed at VS through the development of tools and techniques for VS analysis.
4. Develop a data base of experimental results from a VS system that would be useful in validating future simulation and analytic models of VS.
5. Highlight areas of strength and weakness of VS to help Operating Telephone Companies (OTC's) make informed decisions concerning acquisition of VS systems.
6. Shed light on possible areas of further exploration by Bell Laboratories personnel concerning VS performance factors.

The above goals were, as discussed earlier in this section, conditioned by the hardware restrictions of the system tested. In particular no comparison of OS and VS throughput was attempted. Rather the above goals emphasized in-depth probing of VS in its own right (goals 1 and 2), as well as the building of a foundation for a multi-faceted approach to the predictive question itself (goals 3 and 4). Goal 5 was included to meet the immediate needs of OTC data processing managers who must make dollars-and-cents decisions, based on such studies, concerning selection of hardware and software. Finally, any good study results in more new questions being asked than old questions being answered, and goal 6 reflected this fact. In total, these goals represent a rather typical example of goals for a general batch operating system evaluation project.

After these goals were set, the development of a working-hypothesis about VS/2 was facilitated by comparison with OS/MVT (on which VS/2 is based). Clearly, the main difference between these systems is in the area of VS/2's virtual memory capability. Hence the working hypothesis arrived at was a rather simple one, i.e., that due to the overheads inherent in managing virtual memory the paging rates associated with the system would be the main determinant of system overhead and would have a great effect on overall throughput. Implicit in this hypothesis was the assumption that paging would be rather costly (in CPU cycles) in VS/2 and hence the system would degrade as soon as any significant amount of paging occurred. This hypothesis was shown later to be false, but in the process of designing experiments that tested the hypothesis, much was learned about the system.

Phase 2 of the project was dominated by development of tools and experiments to test the working hypothesis as well as answer other questions about the VS/2 system. One highlight of this phase was the development of a software monitor, appropriately named VSMON, which was used both to validate the VS/2 operating system and measure its performance. The development of VSMON began in the first phase of the project, when specific tables used by the VS/2 Paging Supervisor to record the status of each 4K-byte page of main storage were discovered. These tables (Page Vector Table and Page Frame Table) also record the status of various logical queues in the VS/2 operating system. Hence it was decided that in order to validate and measure VS/2 Paging Supervisor performance a software monitor which would periodically access those tables would be built. The result was a tool useful in both measuring the ownership of real memory by user and system programs and in checking for correct operation of the Paging Supervisor. It is worth noting that in one instance a potentially harmful 'bug' in VS/2 was found using the VSMON monitor; the affected benchmark experiment was rerun after the 'bug' was fixed. The measurements from the monitor have made possible (in combination with log file data and hardware monitor information) a much more complete picture of the VS/2 system than heretofore had been obtained.

Phase 3 of the project was spread over almost two months of time, during which each of the 5 major experiments was tested, and the VSMON monitor checked out. As a result of this extended 'dress-rehearsal' the last phase of benchmark came off without incident, each experiment operating as expected. As of the time of this writing, documentation of the project is in progress.

We believe that the methodology set forth in Section 1, and the case study presented in Section 2, provide a useful set of guidelines for those in the field who are anticipating general operating system evaluations.
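VSMON's measurement side reduces to periodically sampling a page-frame table and accumulating, per owner, how many 4K frames each program holds. The sketch below shows that accumulation in Python; the frame table (a list of frame owners) and its sampler are hypothetical stand-ins, since the real tool read the VS/2 Page Frame Table directly.

    from collections import Counter

    PAGE_SIZE = 4096  # VS/2 pages are 4K bytes

    def sample_frame_owners():
        # Hypothetical stand-in for one read of the page frame table: the owner
        # of each real-storage frame ('SYSTEM', a job name, or 'FREE').
        return ["SYSTEM"] * 40 + ["JOB01"] * 20 + ["JOB02"] * 50 + ["FREE"] * 18

    def ownership_report(samples):
        # Average number of frames (and kilobytes) held by each owner over all samples.
        totals = Counter()
        for frames in samples:
            totals.update(frames)
        n = len(samples)
        for owner, count in totals.most_common():
            avg_frames = count / n
            print(f"{owner:8s} {avg_frames:7.1f} frames {avg_frames * PAGE_SIZE / 1024:8.1f} KB")

    ownership_report([sample_frame_owners() for _ in range(10)])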

REPORT ON FIPS TASK GROUP 13 WORKLOAD DEFINITION AND BENCHMARKING

David W. Lambert

The MITRE Corporation

ABSTRACT

Benchmark testing, or benchmarking, one of several methods for measuring the performance of computer systems, is the method used in the selection of computer systems and services by the Federal Government. However, present benchmarking techniques not only have a number of known technical deficiencies, but they also represent a significant expense to both the Federal Government and the computer manufacturers involved. Federal Information Processing Standards Task Group 13 has been established to provide a forum and central information exchange on benchmark programs, data, methodology, and problems. The program of work and preliminary findings of Task Group 13 are presented in this paper. The issue of application programs versus synthetic programs within the selection environment is discussed. Significant technical problem areas requiring continuing research and experimentation are identified.

INTRODUCTION

Earlier this year the National Bureau of Standards approved the formation of FIPS Task Group 13 to serve as an interagency forum and central information exchange on benchmark programs, data, methodology, and problems. The principal focus of Task Group 13 is to be on procedures and techniques to increase the technical validity and reduce the cost of benchmarking as used in the selection of computer systems and computer services by the Federal Government.

In May, invitations were issued to Federal Government Agencies, through the interagency Committee on Automatic Data Processing and the FIPS Coordinating and Advisory Committee, for nominees for membership to Task Group 13. Invitations for industry participation were also issued through CBEMA. In response to these invitations we have received 26 nominees from various Federal agencies and 6 from industry. These names are being submitted to the Assistant Secretary of Commerce for approval and we hope to be able to start holding meetings in late January or early February of next year.

This morning I would like to cover the five main topics shown in the first vugraph (Figure 1). First, a brief review of the approved program of work for Task Group 13. Second, a review of some of the techniques and typical problems involved in the current selection procedures. Next, I will discuss some alternatives to the present methods used to develop benchmark programs and attempt to clarify a major issue in the selection community at this time: application programs or synthetic programs for benchmarking? As a fourth topic, I would like to review the general methodology of synthetic program development and some specific tools and techniques currently being used. I would like to conclude with an overview of some key technical problems in workload definition and benchmarking which require further research and experimentation.

FIGURE 1
REPORT ON FEDERAL INFORMATION PROCESSING STANDARDS TASK GROUP 13
WORKLOAD DEFINITION AND BENCHMARKING

• PROGRAM OF WORK
• REVIEW OF THE CURRENT SELECTION PROCESS
• BENCHMARK PROGRAMS: APPLICATION OR SYNTHETIC?
• THE METHODOLOGY FOR SYNTHETIC PROGRAM DEVELOPMENT
• SUMMARY OF PROBLEMS REQUIRING FURTHER STUDY AND EXPERIMENTATION

49 PROGRAM OF WORK FIGURES STEPS IN CURRENT SELECTION PROCESS The presently approved program of work is sum- 1. USER DETERMINES TOTAL WORKLOAD REQUIREMENTS marized in the next vugraph (Figure The first three 2). FOR PROJECTED LIFE CYCLE OF SYSTEM. tasks should be self-explanatory. Task 4 is probably 2. USER SELECTS TEST WORKLOAD, TYPICALLY WITH ASSISTANCE OF APPROPRIATE SELECTION AGENCY. the most controversial at this time because it is not 3. SELECTION AGENCY PREPARES AND VALIDATES BENCH- really clear what type of benchmark programs should MARK PROGRAMS, DATA AND PERFORMANCE REQUIREMENTS FOR RFP. be included in any sharing mechanism such as a library. 4. VENDOR MAKES NECESSARY MODIFICATIONS TO BENCH- I will discuss this issue later in my talk. As part of MARK PROGRAMS AND RUNS ON PROPOSED EQUIPMENT this task we plan to test and evaluate a number of avail- CONFIGURATIONS. 5. VENDOR RUNS BENCHMARK TEST FOR USER AND able benchmark programs. If you have programs and/ SELECTION AGENCY FOR PROPOSAL VALIDATION or a test facility and would like to participate in such an PURPOSES.

6. VENDOR RERUNS BENCHMARK TESTS AS PART OF activity, we would be pleased to hear from you. In POST-INSTALLATION ACCEPTANCE TESTING. Task 5, initially we plan to capitalize upon the experi- Step 1 is typically only one aspect of the require- ences of people who have been involved in previous ments analysis and preliminary system design phase selections and to prepare a preliminary set of guidelines for any new system. For upgrades or replacements to to be made available to users and others faced with an existing system, there are a variety of hardware selection problems. Federal Guidelines and Standards and software monitors and other software aids that can will possibly come later. The bibliography (Task 6) is be used to collect data on the existing system. Various intended to include articles and reports covering all programs and statistical analysis techniques can be aspects of workload characterization, workload genera- used to analyze this data to identify the dominant char- tion, benchmarking, comparative evaluation, etc. — acteristics of the present workload. For example, a but limited to the selection environment. This task is number of workers are beginning to use a class of well under way with more than 75 articles having been cluster algorithms to aid in this step. The major identified at this time. problems occur for conceptual systems or systems in

which the workload is expected to change radically in FIGURE2 PROGRAM OF WORK the future: for example, when changing from a batch environment to a time- sharing environment. In these 1. FUNCTION AS A FORUM AND CENTRAL INFORMATION

EXCHANGE cases, it is extremely difficult to estimate the future 2. REVIEW CURRENTLY USED SELECTION PROCEDURES workload, particularly for the later years in the pro- AND TECHNIQUES jected life cycle of the system. 3. IDENTIFY AND EVALUATE NEW TECHNICAL APPROACHES

4. DEVELOP AND RECOMIVIEND A MECHANISM TO FACILITATE In step 2, the general goal is to determine peak SHARING OF BENCHMARK PROGRAMS — 5. PREPARE FEDERAL GUIDELINES AND STANDARDS load periods or to compress long-term workloads

6. DEVELOP A SELECTED BIBLIOGRAPHY such as a 30-day workload — into a representative test

workload that can be run in a reasonable length of

time — say two hours. For batch, multi- programmed THE SELECTION PROCESS systems, the test workload is generally designed to

For the benefit of those of you not familiar with have the same mix of jobs (and in some cases, se-

the selection process, I have listed the typical steps quence of jobs) as the total workload determined in

used in most competitive procurements today in the step 1. For on-line systems, the test workload must

next vugraph (Figure 3). also be designed to represent the mix of terminal jobs

in the total workload. For an upgrade or replacement

5 of an existing system, the peak period or representa- reflected in the costs of the computers to the Federal

tive mix can generally be identified or selected using Government.

workload can also resource utilization data. The test BENCHMARK PROGRAM DEVELOPMENT be verified by comparii^ its resource utilization data Most of the technical problems and costs In the against the original data. For conceptual systems, of current selection process are generally attributed to course, none of this experimental activity can talce having to select, prepare, and run a new set of pro- place. grams for each new procurement. What are some of Within the selection environment, the principal the alternative approaches that could be addressed by goal in step 3 is to develop machine- independent bench- Task Group 13 ? The first one is probably the most mark programs. In current selection procedures, the obvious, and that is to develop tools and techniques benchmark programs are generally drawn from among to simplify the process of sanitizing a given user pro- application programs obtained from the user's installa- gram and then translating it to other machines. This tion. This has a number of drawbacks due to differ- would include preprocessors to flag or convert machine- ences in job control techniques, program languages, dependent job control cards, dimension statements, and data base structures from machine to machine. etc. I am not aware of any serious attempts or pro- Also, in many cases the original programs were posed attempts along this line; however, it seems to tailored to take advantage of some of the particular have some merit for consideration by Task Group 13. features of the user's machine. Most of these prob-

A second approach , which has been proposed but lems have been satisfactorily solved for batch, multi-

never implemented even on an experimental basis , is programmed systems. However, there are a host of to develop and maintain a library of application bench- new problems foreseen with the advent of general pur- mark programs. These programs would then only have pose data management systems, time- sharing and to be translated to other machines once. For each new on-line transaction processing applications. procurement, the user would select some mix of bench- In step 4, the principal objective is to collect marks from this library which best approximates his timing data for each configuration (original plus aug- desired test workload. The library would probably mentations to meet a growth in workload) to be proposed have to be extensive, since it would have to contain in response to the RFP. Most problems and costs in programs for a great variety of engineering, scientific this step are attributed to trying to get the benchmark and business applications. One proposal that I am programs to run on the vendor's own machine because familiar with specified 20 Fortran and 20 Cobol pro- of language differences and data base structure differ- grams, all to have been parameterized to some extent ences. In my own experience, seemingly simple prob- to permit tailoring to a specific user's test workload. lems such as character codes and dimension statements This approach seems to have been set aside by the have been the cause for much aggravation in trying to selection community for a variety of reasons, the get one program to run on another machine. primary one probably being the cost to develop and

In steps 4, 5 and 6, there are major costs on the maintain such a library. There is also some question part of the vendors to maintain the benchmark facility, as to the acceptance of this approach by the users. I

including equipment and personnel. There can also be believe that if the costs and benefits were analyzed on

significant scheduling problems if the vendor is involved a Government- wide basis, this approach may be more

in a nimiber of procurements at the same time. These favorable than it has appeared in the past.

costs , and in fact all the vendor costs in preparing his The third approach, and the one receiving the proposals and running benchmarks, eventually get most attention these days in the selection community,'.

51 is the synthetic program, or S3T3thetic job, approach. FIGURE 4 As most of you know, a synthetic program is a highly ELEMENTS OF SYNTHETIC JOB APPROACH

parameterized program which is designed to represent 1. COLLECT JOB AND SYSTEM DATA ON ACTUAL SYSTEM load either (1) a real program or (2) the on a system 2. SELECT SYNTHETIC PROGRAM TASKS OR MODULES

from a real program. The first form, which is a task- 3. CALIBRATE SYNTHETIC WORKLOAD

oriented program, is generally designed so that it can • SET INDIVIDUAL PROGRAM PARAMETERS

be adjusted to place functional demands on the system • SET JOB MIX AND/OR JOB TIMING PARAMETERS

• to be tested, such as compile, edit, sort, update, and RUN SYNTHETIC WORKLOAD ON TEST SYSTEM • COLLECT AND COMPARE DATA calculate. The second form is a re source- oriented • REPEAT TO ACHIEVE DESIRED DEGREE OF CORRELATION program which can be adjusted to use precisely speci-

fied amoimts of computer resources such as CPU Accounting programs are commonly used for

processing time and l/ O transfers. Neither does any data collection, although there is increasing use of

"useful" processing from an operational standpoint. hardware monitors, trace programs and other soft-

ware packages to get more detailed timing data on the Typically, the test workload can be specified in real (and the synthetic) jobs in execution. As has been terms of say 10 and 15 different types of task-oriented

pointed out earlier in this meeting, it is often neces- synthetics which. Incidentally, may contain common sary to use special techniques or develop special pro- parameterized modules such as tape read, compute, or grams to be able to make specific measurements for print. In the resource-oriented synthetics, there is

the problem at hand. For example , in an experimental normally one original program which is duplicated and program sponsored by the Air Force Directorate of adjusted to represent individual programs in the test ADPE Selection, we are using a Fortran frequency workload. analysis program to obtain run time data on actual pro- The motivation toward synthetics in the selection grams. This data is in turn used to set the parameters environment is of course the same as for an applica- of a synthetic Fortran program, while accounting data tion benchmark library. The benchmark program or (SMF on an IBM 370/155) is used to compare the syn- programs need only be translated and checked out once thetic job to the original user job. For TSO jobs, we by each computer manufacturer, thus minimizing the use a system program called TS TRACE to collect costs involved in future procurements. The most terminal session data, use this data in developing a serious technical question regarding ssmthetics seems synthetic session, and compare the two sessions on the to be their transferability from machine to machine, basis of SMF data. particularly the resource-oriented synthetics, since SUMMARY OF TECHNICAL PROBLEMS demands on resources are obviously dependent on the

architecture of each machine. The task-oriented syn- In this last vugraph (Figure 5) , I have listed thetics seem to offer the most promise within the some of the major problems affecting computer per-

selection environment primarily because they seem formance measurement and evaluation in general and least system dependent. Also, user acceptance would computer selection in particular. be more likely if some means can be developed to ex- The lack of common terminology and measures press or map usei>oriented programs in terms of affects all areas of selection. Including workload task-oriented synthetic programs. characterization, performance specification, and data

SYNTHETIC PROGRAM METHODOLOGY collection and analysis. The semantics problems

alone make communication difficult between users and The general methodology of the synthetic job ap- selection agencies and selection agencies and vendors. proach is stmimarized in the next vugraph (Figure 4).

52 FIGURE 5 Directorate of ADPE Selection, there are also major

SUMMARY OF PROBLEMS implementation problems due to the large variety of

terminals and the lack of standard • STANDARD TERMS AND MEASURES ACROSS MACHINES data communications

• WORKLOAD DETERMINATION AND REPRESENTATION FOR control procedures. CONCEPTUAL SYSTEMS

• TRANSFERABILITY OF SYNTHETIC PROGRAMS On the last point (in Figure 5) there is a new set ON-LINE WORKLOAD SPECIFICATION AND GENERATION FOR of problems as a result of the trend toward virtual SYSTEMS memory machines, computer networks and other . EFFECTS OF NEW COMPUTER SYSTEM ARCHITECTURES ON PRESENT METHODS computer-communications systems. In virtual memory

systems, for example, synthetic benchmark programs

have to be carefully constructed to accurately represent

The next two problem areas have been discussed the dynamic addressing characteristics of user pro-

grams. There are practical problems of to earlier. The fourth problem area is essentially how how mea- sure their addressing characteristics, to benchmark on-line systems, particularly those with as well as how

large numbers of terminals. There are a number of to represent them in a synthetic program. Most syn-

techniques in use at this time: operators at terminals thetics currently under investigation for virtual memory

following a given script; special processors to gener- machines contain matrix calculations which have the

ate the terminal load; and special programs to process property that page fault rates can be adjusted (by changing the location of the data) while keeping the a magnetic tape with pre-stored message traffic. CPU execution times constant. However, because of the lack of common terminal

languages, it is necessary to develop a different Problems such as these cannot be neglected by

script for each vendor's computer, which is not only those involved in the selection process. It is hoped

costly but raises questions of equivalency across that Task Group 13 can serve an effective role in their

machines. For external drivers, such as the Remote- clarification and in identifying and evaluating possible

Terminal Emulator being developed for the Air Force solutions.



PERFORMANCE MEASUREMENT AT USACSC

Richard Castle

U.S. Army Computer Systems Command

INTRODUCTION o Acquire and use existing monitors to evaluate the state-of-the-art. The United States Army Computer Systems Command (CSC) was activated 31 March 1969 under the o Evaluate the experience gained in using direction of the Assistant Vice Chief of Staff the existing monitors. of the Army (AVCSA) and charged with the respon- sibility for the design, development, programming, o Based upon this experience, develop installation, maintenance, and improvement of specifications for a Command combinational Army multicommand automatic data processing (ADP) hardware/software monitor to be used on information systems. These multicommand auto- multicommand ADP systems. matic data processing information systems must satisfy the information management requirements o Develop the monitor and place it in for as many as 41 identical data processing operation. installations (DPI) located throughout the Continental United States, the Pacific and Task IV progressed to the point at which a hard- Europe. These systems include both management ware monitor was purchased, the X-RAY Model 160; data systems and tactical data systems. The several software monitors were leased, DPPE, DCUE, hardware configuration of the DPI's, on which CUE and Johnson Job Accounting; the monitors were the management information systems are executed, used in operational environments; and specification are not controlled by CSC, but rather are selected development began from the experiences gained in by the Army Computer Systems Support and Evalu- the use of the monitors. At this point, it became ation Command under the direction of the AVCSA evident that industry was keeping abreast of and and in accordance with cost effectiveness studies in some instances leading our specification develop to determine the optimum hardware distribution ment in their monitor development programs. Indeed for multicommand systems. Therefore, since it appeared that by the time milestone 4 was reache hardware configuration changes would involve the desired monitor would already be available off- costly and time consuming optimization studies the-shelf from industry. and acquisition procedures, the continually growing Army data processing needs requires that Consequently, the decision was recently made to the throughput of the multicommand DPI's be attempt to leap-frog industry and start development maximized without reconfiguration. of automated evaluation of performance monitor data This research task was assigned to the Advanced This obviously can only be accomplished by reduc- Technology Directorate and the performance monitors ing the run time of existing data processing currently in the inventory were to be put to more systems and insuring that new systems, currently intensive operational use by the Quality Assurance under development or soon to be fielded, are Directorate optimized for minimum run time. Thus, the need exists in the Computer Systems Command for program performance measurement, to reduce system program APPLICATION OF PERFORMANCE MEASUREMENT AT CSC run time, and for configuration performance measurement, to identify hardware /software bottle- The inventory of performance monitors currently necks, caused by program changes or added workload being used at USACSC are shown in Figure 1. from newly fielded systems. These performance monitors are used by CSC through- APPROACH TO PERFORMANCE MEASUREMENT out the multicommand ADP systems life cycle as shown in Figure 2. 
Figure 2 also shows the appli- The charter of the Computer Systems Command con- cation of performance monitors by the Quality tains a requirement to conduct ADP research and Assurance Directorate during the life cycle. development in order to keep abreast of the cur- rent state-of-the-art and to insure that the Early in the programming and documentation phase, Command and its customers benefitted from tech- as soon as object programs are available, program nology advances. Recognizing the importance of performance measurement can begin and program run performance measurement, the Commanding General time minimized. In the later phases of system assigned the responsibility for a research task integration and prototype testing both problem to the Quality Assurance Directorate to develop program and system performance monitors are used a Command combinational hardware /software perform- to aid in integration and reduce run time. Later and ance monitor. This task, R&D Task IV, Computer in the life cycle, during system installation System Performance Monitor, was organized to after the system is operational, performance accomplish the following milestones: measurements and analyses are made to insure that no bottlenecks have crept into the system.

MONITOR                  TYPE       VENDOR              OWNED/LEASED   TRAVEL LIMIT         CAPABILITIES

X-RAY 160                Hardware   Tesdata             Owned          None                 32 Counter/Timers, 2 Distributors, 96 Probes

DPPE                     Software   Boole & Babbage     Owned          Multicommand Sites   IBM 360 DOS Problem Programs

DCUE                     Software   Boole & Babbage     Leased         1 Site Per Lease     IBM 360 DOS Systems

Johnson Job Accounting   Software   Johnson Sys., Inc.  Leased         1 Site Per Lease     IBM 360 OS or DOS Systems

LEAP                     Software   Lambda Corp.        To Be Leased   None                 IBM 360 OS Systems, Programs & Accounting

FIGURE 1 Quality Assurance Monitor Inventory

[Figure 2 is a chart, not legibly reproduced here, relating performance measurement and Quality Assurance activities to the multicommand ADP system life cycle: definition; development (system design, detailed design, programming and documentation, test and integration); installation; and operation and maintenance.]

FIGURE 2 Life Cycle


These performance monitors are used by CSC through- The system utilizes two IBM S360/50 I mainframes out the multlcommand ADP systems life cycle as with 32 IBM 2401 tape drives and 7 shared IBM 2314 shown in Figure 2. Figure 2 also shows the disk units. The operating system is MET II and is application of performance monitors by the Quality supported on line with five IBM 1403 printers, one Assurance Directorate during the life cycle. IBM 2501 card reader, two IBM 2540 card reader punches and sixteen IBM 2260 remote display stations Early in the programming and documentation phase, for on-line inquiry to the system. The Boole & as soon as object programs are available, program Babbage CUE Software monitor was used to monitor performance measurement can begin and program run the configuration utilization. The system was time minimized. In the later phases of system monitored for forty-nine and one-ha^if hours and integration and prototype testing both problem it was found that the channel usage was unbalanced program and system performance monitors are used and the number of loads from the SVC-Library were to aid in integration and reduce run time. Later excessive. After recommendations for device/ in the life cycle, during system installation channel reassignment and changing the resident and after the system is operational, performance SVC's, the measured time saved was 100 seconds measurements and analyses are made to insure that per hour on each of the CPU's in the number of SVC no bottlenecks have crept into the system. loads and 88 seconds per hour per CPU by reduction in the number of seeks required for SVC loads. The total measured savings was 188 seconds per SOME APPLICATION EXAMPLES hour per CPU, Other recommendations were made and are being implemented, The monitors available to CSC have been used for several different applications in the past two SIDPERS, Ft, Riley. years. Some of the applications that were import- ant for their initial instructiveness in the use Monitoring of SIDPERS Prototype Tests at Ft. Riley, of monitors were the Base Operations System Kansas, was the first trip into the field with the (BASOPS) II Timing Study, the Theater Army Support X-RAY Hardware Monitor. The X-RAY Hardware Monitor Command (Supply) System (TASCOM) (S) monitoring is approximately 5.5 feet tall, 2 feet square and effort and the Standard Installation and Division weighs 460 pounds. Although it is mounted on Personnel System (SIDPERS) prototype tests. The casters and is classed as portable, unless there Standard Army Intermediate Level (Supply) System are elevators or ramps leading to the computer (SAILS) Run-Time Improvement Study is the most site, difficulty can arise in locating the monitor recent and most successful application of perform- in the computer room. If this particular monitor ance measurement and therefore, is presented in is crated and shipped to the computer site by air the greatest detail here, freight, the memory unit of the monitor invariably becomes dislodged and must be reseated in its

BASOPS II Timing Study , connector

The purpose of the BASOPS II project was to The SIDPERS Prototype Tests were conducted on an incorporate expanded functional capabilities into IBM S360/30 with 128K bytes of core and a single the SAILS program to incorporate the operational set of peripherals consisting of four IBM 2401 Standard Finance (STANFIN) programs and to include tape drives, four IBM 2314 disk units, one card a newly developed SIDPERS package. The purpose reader/punch and a single printer. There were no of the timing study was to predict if the three vendor supplied monitor probe points for the IBM subsystems would operate in a targeted 16 hour 360/30. Consequently, probe points had to be time frame on the standard BASOPS S360/30 con- developed from the IBM logic diagrams before figuration with core extended to 128K bytes. monitoring could be continued. This trip was The Boole & Babbage DPPE software monitor was extremely beneficial in revealing problem areas used to form the basis for the computations of in the field shipment, installation and monitoring the timing study. Completed SAILS programs with a hardware monitor, were monitored on an IBM 3360/30 times. Based on these figures the remaining program run times SAILS RUN TIME REDUCTION STUDY were computed and it was determined that the

proposed system daily cycles would require in GENERAL . A study designed to improve the perform- excess of 20 hours at five of the largest BASOPS ance capabilities of a Command standard system installations. Nine medium size installations through the use of performance measurement and would require more than 15, but less than 20 evaluation tools was conducted in June 1973. As hours. The remaining installations were estimated the responsible agent for the development of at less than 15 hours run time. As a result of multicommand standard software systems, any ineff- this study, the SAILS and SIDPERS subsystems are iciencies contained in USACSC developed software currently undergoing run time reduction studies, are multiplied many times over when the software is released for operation at numerous Army sites. TASCOM (S). Generally, programs are released to the field when they run in a specified hardware and software The Theater Army Support Command (Supply), environment and satisfy specified functional TASCOM(S), supports the US Army Material Manage- requirements; performance goals are usually afford- ment Agency, Europe (USAMMAE), which is a theater ed a lower priority. inventory control center having centralized accountability for command stocks in Europe,


The Standard Army Intermediate Level Supply (SAILS) Task Force Development. To accomplish this study

System, a multicommand , integrated automatic data in an expeditious manner, a team was formed whose processing system, was selected for a run time members had the prerequisite skills and capabil- reduction study. This system was developed for ities as described below. implementation on IBM 360/30-40 systems under DOS. Although SAILS is a newly developed supply and o One individual proficient in the use of related financial management system, selected JCL and performing DOS system tasks (i.e., modules from two other Army Supply Systems are library functions, use and application of included in the total package. As a result of executive software); also knowledgeable in this amalgamation, a fairly complex system has the COBOL language, programming techniques emerged, with 70 programs comprising the daily and hardware timing and performance charact-

cycle. Some of the larger programs, with as many eristics . as 30,000 COBOL source statements, contain numerous overlays to accommodate the 98K core allocation o One individual knowledgeable in the func- constraint designed for the problem program. At tional aspects of the SAILS system as well the time of this study, the system was undergoing as the ADP system functions (i.e., overall prototype testing and the basic cycle run times logic flow and processing concepts; file far exceeded expectations. In fact, it seemed formats, organization and accessing methods; apparent that approval for extension of SAILS and interfaces with the remainder of the to other sites would be contingent upon some SAILS system or other external systems). streamlining of the system. In general, serious doubts were expressed as to whether the existing o One individual capable of operating the hardware at 41 Army bases could handle this add- SAILS system on an IBM 360/30-40; also itional workload. capable of handling the scheduling, management of tapes and disk files, and Although the need for a run time reduction study some statistics collection. surfaced, actually selling the idea of initiating a full scale study was met with some resistance. o Three individuals proficient in the use of To understand this resistance, a brief review of performance monitors, their operation and some of the problems experienced is presented. implementation, as well as the interpret- ation of analysis data and subsequent o Management Support. Difficulty was implementation of various optimization encountered in convincing management that techniques allocating resources to an optimization

effort was as important as supporting other Baseline Cycle Selection . A baseline cycle was priority commitments. Meeting present selected against which all subsequent testing was milestone schedules took precedence over to be performed. Primary considerations in the efficiency of operation. The continual selection of this cycle for monitoring, analysis, receipt of functional modifications also and trial modifications were: was frequently cited as reason for deferring the proposed study. o A cycle for which detail statistical and job accounting information was available. o Past Usage of Performance Measurement Tools. Until recently, the use of software perform- o One which had processed a representative ance monitors at USACSC was limited. For a volume and mix of transactions, long period of time, vendor releases of the software monitors either contained errors o One which had processed the broadest or when employed in a particular environ- possible range of transactions In terms ment, presented some special problem. It of transaction type and logical processing was not always clear who was at fault; situations. This qualification was to however, a fair assessment of the situation insure the highest degree of reliability is that success was only obtained through that trial program modification which were a process of trial and error. The documen- to be made would not adversely affect tation, training and vendor support did not functional logic. prepare the user for the "out of the ordin- ary" situation. A cycle processed during prototype testing, which met the above criteria, was selected. All master o Experience with Optimizing Techniques. files, libraries and inputs necessary for rerunning Once the problems of getting the monitors in a controlled test environment were available. operational were overcome, few personnel For accounting purposes, both console logs and had the necessary training and experience Johnson Job Accounting reports were available. to effectively analyze the trade-offs of

implementing various optimizing alternatives. Subject Program Selection . After giving appropriate The development of ability to identify consideration to the objectives of the effort, improvement potential and make experimental resources available and time limitations, it was modifications took time. obvious that initial efforts would have to be restricted. This restriction was in terms of both However, once these problems were resolved, the the number of programs to be studied and the degree study proceeded smoothly. The approach taken and of optimization attempted. For these reasons, only results obtained follows. nine of the basic cycle programs were selected for study. These programs were selected on the basis

58 of their average run time from statistics accumul- which could be accomplished with three or four ated on all cycles processed during the prototype manhours of analysis and reprogramming effort were testing. The rationale used in selecting these considered for immediate implementation and test- programs was that they would provide the greatest ing. Changes meeting this criteria were not made potential for immediate reductions in cycle run unless they could be accomplished within the con- time. Each program had averaged between one and fines of system resources presently used by the six hours effective production time while the program. Specifically, no changes were considered group as a whole was consuming approximately 60- which would require more than 98K core, more tape 70% of the total cycle time. drives, additional disk space, or which would involve a major impact on the remainder of the

Object Computer for Experiment. Since most extension sites currently have S360/30's, it was recognized that the ideal environment for conducting the planned tests would be a S360/30 configuration with peripherals identical to those on which the system will run; however, dedicated time in the required quantity and time frame could not be obtained. Consequently, it was decided that a S360/40 with identical peripherals was the most suitable alternative. Since the nature of many actions to optimize a program are actually "tuning" it to a particular environment, results obtained on the S360/40 were later validated on a S360/30.

Tools Employed. Three proprietary software products were used to analyze and monitor the subject programs.

o STAGE II - Tesdata Corporation. This package is a "COBOL" optimizer which accepts as input the source program statements and produces a series of reports which identifies possible inefficiencies and potential areas for improvement of both run time and core storage requirements.

o DOS Problem Program Evaluator (DPPE) - Boole & Babbage, Inc. This product consists of two basic elements, the Extractor and Analyzer. The Extractor resides within the same partition as the object program during its execution. Statistical samples of activities within the object program are collected via a time based interrupt and recorded on a temporary file. The Analyzer then uses this recorded information to produce reports and histograms showing percentage of CPU activity within the program's procedure at 32 byte intervals, quantifies both file oriented and non-file oriented wait time and identifies that to which it is attributable.

o DOS Configuration Utilization Evaluator (DCUE) - Boole & Babbage, Inc. This software monitor also consists of an Extractor and an Analyzer. The techniques used are similar to DPPE, but in this case, resulting reports are relative to performance and loading of the hardware and system software as a whole. Detailed data is presented regarding such items as input and output device utilization, channels, disk arm movements and SVC library.

Experiment Constraints. In order to insure a reasonable degree of confidence that proven benefits would result and could be quantified with available time and resources, certain constraints were self-imposed. Only those program modifications were considered which could be applied directly to the fielded SAILS system. These constraints assured that any significant identified savings could be incorporated and issued to the field with a minimum of further analysis. Caution was used so as not to make changes which would adversely affect the program's functional logic. Although potential for improvement was indicated in these areas, they were not addressed due to the risk of creating "bugs" in the programs and thereby jeopardizing the objective.

Methodology. Preparatory to actual testing, arrangements were made with Tesdata Corporation to acquire STAGE II on a 30-day trial basis. STAGE II reports were then obtained for the nine selected programs. Initially, it was planned to proceed immediately with changes indicated by STAGE II and appraise results through monitoring as the first step in program testing. This course was subsequently abandoned, however, when two factors became apparent. First, several errors and limitations were discovered in the STAGE II reports which indicated the information was somewhat less than totally reliable. Second, the volume of diagnostics created by STAGE II made selection of prime optimization areas difficult. As verified later by testing with DPPE, certain routines, although employing other than the most efficient techniques, simply were not executed frequently enough to consume significant amounts of time. Therefore, efforts were concentrated in accordance with DPPE and DCUE indications, using STAGE II reports as an aid in determining alternative methods and techniques to incorporate.

Initially, the entire benchmark cycle was run on the S360/40 in a controlled environment and all intermediate files necessary to test trial modifications for the nine selected programs were captured. Timing statistics were collected and execution times for each of the subject programs were recorded as the baseline time to which all comparisons would be made. All timed tests conducted throughout the effort were in a dedicated environment. Worthy of special note is the fact that these baseline times were obtained without a monitor in operation, while all times recorded for modified programs were obtained with a monitor in operation and are therefore inclusive of monitor overhead (estimated at no more than 5%). This means that reductions of run times resulting from subsequent testing of modified programs are, if anything, understated.

After completion of the benchmark cycle test, programs were modified on the basis of DPPE and DCUE reports, tested while remonitoring, and modified still further when this was feasible under the constraints mentioned earlier.
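Because the modified-program times include the monitor overhead while the baseline times do not, the bias works only in one direction. As a purely illustrative calculation (the 40:00 figure is invented for the example): a modified run measured at 40:00 under a monitor costing the full 5% would have taken roughly 38:06 unmonitored, so a reduction computed from the monitored time can only understate the true saving.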


FIGURE 3  SUMMARIZATION OF RESULTS

                     360/40 Benchmark Tests                  360/30 Verification Tests
PROGRAM    UNMODIFIED    MODIFIED     %             UNMODIFIED    MODIFIED     %
           RUN TIME      RUN TIME     REDUCTION     RUN TIME      RUN TIME     REDUCTION

P08ALB        27:10        25:05        7.7%           47:36        45:50        3.7%
P69ALB        21:33        18:02       16.3%           35:31        26:50       24.5%
P22ALB      2:27:17      2:06:44       14.0%         3:46:30      3:16:47       13.1%
P23ALB      1:22:23      1:16:40        6.9%         2:01:00      1:52:46        6.8%
P50ALB      1:00:52        41:23       32.0%         1:28:38      1:07:19       24.1%
P72ALB        15:08        10:14       32.4%           22:06        15:07       31.6%
P27ALB        43:16         3:35       91.7%         1:30:05         3:27       96.2%
P10ALF        22:39        12:52       43.2%           36:46        21:01       42.8%
P11ALF      1:48:19      1:38:48        8.8%         2:06:28      2:05:13        0.6%

TOTAL       8:48:37      6:53:23       21.7%        13:14:40     10:14:20       22.7%


FIGURE 4  TEST PROFILE - RUN TIME, CORE USAGE, CPU ACTIVITY, WAIT TIME

                          EXECUTION TIME           CORE STORAGE                         WAIT
PROGRAM    SUBJECT RUN    ELAPSED     CHANGE       USED      CHANGE    CPU ACTIVE    ACCUM FOW   ACCUM NON-FOW

P08ALB     Baseline         27:10       NA         93,531      NA        80.00%        3.10%       10.28%
           Modified         25:05      -7.7%       89,903     -3.9%      85.12%        3.48%       11.40%
P69ALB     Baseline         21:33       NA         85,525      NA        84.46%         .89%       14.57%
           Modified         18:02     -16.3%       89,525     +4.7%      72.76%        5.48%       18.15%
P22ALB     Baseline       2:27:17       NA         85,651      NA        64.20%       28.14%         .58%
           Modified       2:06:43     -14.0%       91,859     +7.3%      67.78%       31.91%         .31%
P23ALB     Baseline       1:22:23       NA         89,951      NA        63.48%       35.58%         .90%
           Modified       1:16:40      -6.9%       97,723     +8.6%      65.59%       34.20%         .48%
P50ALB     Baseline       1:00:52       NA         85,525      NA        53.58%       37.33%        7.85%
           Modified         41:23     -32.0%       89,525     +4.7%      84.97%        1.90%       11.42%
P72ALB     Baseline         15:08       NA         82,095      NA        91.27%        1.)1%        6.88%
           Modified         10:14     -32.4%       73,825    -10.1%      89.15%        3.68%        6.98%
P27ALB     Baseline         43:16       NA         12,098      NA        97.58%        1.06%        2.29%
           Modified          3:35     -91.7%       10,034    -17.1%      41.60%       38.94%       18.80%
P10ALF     Baseline         22:39       NA         69,751      NA        34.76%       62.95%        2.26%
           Modified         12:52     -43.2%       85,549    +22.7%      74.40%       20.61%        4.87%
P11ALF     Baseline       1:48:19       NA         96,761      NA        16.73%       10.51%       72.76%
           Modified       1:38:48      -8.8%       97,609      +.9%      Not Monitored
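The percentage reductions in Figures 3 and 4 correspond to (baseline time - modified time) / baseline time. For P08ALB, for example, the benchmark figures give (27:10 - 25:05) / 27:10, or 125 seconds out of 1,630 seconds, which is the 7.7% shown.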

Verification that each modified program had performed correctly during the tests consisted of comparing input and output record counts between the benchmark test cycle and the respective program test. In addition to control count comparisons, one program was further verified by an automated tape comparison of output files between the baseline and test runs.

Analysis and Findings. A review of the selected programs revealed many cases of inefficient COBOL usage. It should be noted, however, that this situation is not unusual or unique to this particular system or organization. Very often, application programmers have little exposure to efficient COBOL implementation. Although numerous alternative optimizing techniques exist, for this study only a few of the many available ones were utilized. Only those techniques which were potential candidates for large scale benefits with a minimal expenditure of effort were employed. To assess the merits of various techniques and select the most appropriate ones, a knowledge of the efficiencies/inefficiencies inherent in the COBOL language, the compiler, the operating system and the hardware itself was applied. Major modification areas for this study included:

o Buffering and Blocking. In the SAILS system, core limitation necessitated single buffering several key files. Since this system runs in the background partition with only SPOOLing processed in the foreground, the use of double buffered files in place of single buffered files to reduce file oriented wait greatly improves total system performance. In those cases where core was available or could be made available, files were double buffered on a priority basis according to the amount of wait time indicated by DPPE. Reducing the block size of key files required an impact analysis of possible external interface problems or possible degradation to other programs in the SAILS system which also accessed the key files.

o Table Handling. Many tables processed by the SAILS system were searched using iterative subscripted loops. To reduce program execution time, three alternative searching techniques were employed: serial search, partition scan, and binary search. Depending upon the number of entries and the table organization, the most appropriate one was selected.

o Subscript Usage. The misuse of subscripts was identified as a contributor to excessive run time. Techniques employed to either eliminate or reduce the number of subscripts in a particular program included serial coding, use of intermediate work areas, use of literal subscripts, and combining multiple subscripts to a single level by redefining the matrix as one dimension and applying a mapping function.

o Data Patterns Analysis. Data patterns occurring within the problem program were identified and, depending upon their frequency of execution, were altered. The importance of testing with representative data cannot be overstated when tuning a system through the rearrangement of data patterns.

o Data Definition and Usage. Special attention was given to assigning the most efficient usage for data item representation in core storage for those cases where the applicable item had a significant impact on CPU activity. Also, steps were taken to eliminate or at least reduce to a minimum mixed mode usage in both arithmetic expressions and data move statements.

o Console Messages. It was found that a considerable amount of time was being consumed printing console messages. Those messages that required no monitoring or response by the operator were diverted to the SYSLST in lieu of the console printer. To even further eliminate console printout, it was suggested that eventually the system be modified so that simple comparisons and checks of control counts become an integral part of the system logic and not the responsibility of an operator.

In general, the modifications employed during this study were concentrated in the areas listed above. At Figure 3 is a summarization by program of both the baseline and modified program run time for the benchmark test on the IBM 360/40 and the verification test on the IBM 360/30. Notice that for the benchmark test the degree of success ranged from 6.9% to 91.7%. The net result was a 21.7% reduction in the processing time of the nine selected programs. Resources used to accomplish this effort were 114 man days and 135 hours of computer time. The chart at Figure 4 is a test profile showing run time, core usage, CPU activity and wait time for each program before and after modifications were made.

For three of the selected COBOL programs, some examples of the analysis, techniques employed and results achieved are presented.

P50ALB. This program originally ran 1:00:52. The initial DPPE analysis showed 37.33% of the total run time attributable to File Oriented Wait (FOW), of which 8:42 or 14.10% was wait time on the input Document History (DH) file and an additional 9:43 or 15.74% was wait time on the output DH file. The DH file contained variable length records ranging from 175 to 2100 characters with an original block size of 8400 characters. Due to core limitations, both the input and output files were single buffered. To reduce the high FOW, the DH file was reblocked from 8400 characters to 6300 characters, thereby permitting both files to be double buffered. Active CPU time was recorded at 53.58% of the total run time. A review of the DPPE histogram showed two areas where a considerable amount of this time was being consumed. One of these areas was attributable to a routine which moved individual segments of the DH file to a work area via an iterative subscripted loop. A routine was substituted which accomplished the movement of these segments in mass. The second large time consuming area was identified as the processing time of two IBM modules used for handling variable length records. Since these modules, which accounted for 12% of the total processing time, were not accessible to the application programmer, no action was taken.

The first test after these modifications were made ran 46:12, which showed a 24% reduction from the unmodified version's run time. Total FOW time had decreased to 5:37 or 12.19% of the run time and the CPU time attributable to processing the DH segment move had decreased significantly. Still more areas for possible improvement were identified. Two single buffered files which previously had no FOW now had wait time of 4:23 or 9.51% of the total run time. Consequently, both files were double buffered for the second test. After analyzing the impact of block size modification on 14 other programs using the DH file, it was decided to further decrease the block size to 4200 characters. By cutting the block size in half, all programs using the file could double buffer it without any alteration of core requirements. The second test ran 41:23, a net decrease of 32% from the unmodified version's run time. FOW had diminished to a level of 1.90% or less than a minute of the total run time. Active CPU time was reported as 84.97% of the total run time. The IBM variable length record modules now accounted for 19% of the time. The remainder of CPU active time was spread across numerous small routines. Because of the amount of time required to research each item, additional changes, which could bear heavily on CPU active time, were not pursued during this study.

P22ALB. In the baseline cycle, this program ran 2:27:17. The initial DPPE analysis showed FOW at 28.14% of the total run time. The bulk of FOW was on disk resident master files, all of which are accessed by a Command written Data Management Routine (DMR). SELECT clauses, data descriptions and the actual reads and writes of these files are not part of the problem program and therefore, opportunities for optimization were limited. To reduce FOW, three single buffered tape files, which constituted 2.5% or 4 minutes of the wait time, were double buffered. Also, some nonresponse type console messages were diverted from SYSLOG to SYSLST. Active CPU time was recorded at 64.20% or 1 hour, 21 minutes. A breakout of the CPU time revealed that 35.43% or 53 minutes was spent in an IBM Direct Access logic module. An analysis of the CPU active time spent in the problem program showed that the three most time-consuming routines accounted for only 1.94% of the total processing time. In other words, practically all processing which was affecting the run time was taking place outside the problem program. Therefore, only minor changes were made to improve computational efficiency. The first modified test ran 2:11:15, an 11% reduction from the baseline test. As already stated, the fact that the routines causing the most wait or most CPU activity fell outside the realm of the problem program made short range modifications difficult. To help identify possible improvements in the area of disk organization, P22ALB was rerun using DCUE. Information obtained from the DCUE analysis was then used to reallocate disk extents in an attempt to reduce arm contention. The results of this second test, which ran in 2:06:43, showed only a 4 1/2 minute reduction. This reduction was somewhat less than expected. At this time, it became increasingly clear that the DMR currently in use was extremely costly in terms of total SAILS processing time. As a result, no additional tests were made and a major study of available file accessing routines was recommended.

P27ALB. During the baseline cycle run, the execution time of this program was 43:16; however, it had run as long as 6 hours during prototype testing. A review of the most frequently executed routines revealed that 95%, or 40 minutes, of the total processing time was spent in a single routine which was designed to clear an array of 324 counters via a subscripted loop, PERFORM VARYING. This routine was replaced by straight-line coding a series of conditional statements and moves to clear 9 counters. When P27ALB was tested with the new routine, the results were startling. It ran only 3 minutes, 35 seconds, a reduction of 92%. The routine which previously had caused the excessive run time (40 minutes) was reduced to a total of 35 seconds. This test provided a classic example of the resulting differences between simple straightforward coding and usage of the more complex COBOL options with all their inherent inefficiencies.

Conclusions:

This study was successful in reducing the processing time requirements for the SAILS system. It also provided a good example of effective utilization of performance monitors. At the completion of the study, management could actually see the practicality of using monitors. It also served as the catalyst to initiate action on some ADP related problems, specifically, the need for a tutorial document providing programmer guidelines and conventions and the need for a reappraisal of the Command Data Management Routines. As verified by benchmark testing, the 21.7% average reduction in processing time for the nine programs studied actually eliminated three hours from the daily cycle processing time. This savings, when multiplied by the number of systems affected (41 Army bases), represents a substantial savings to the Army.

An optimization effort such as the one just described is not a simple task. It requires extensive planning, encompassing a review of the entire system, both software and hardware. In order to operate in the most expeditious manner and perform the greatest service, an independent group of qualified personnel with training and experience in all phases of optimization is necessary. Provisions for a dedicated and controlled test environment assure the highest reliability and predictability of results. As evidenced by this study, a substantial investment and management commitment is necessary to successfully complete an optimization effort.
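The paper does not reproduce the actual SAILS source, but the P27ALB change can be sketched in COBOL roughly as follows. The data names, the table layout and the choice of which nine counters are cleared are hypothetical, invented only to contrast the subscripted PERFORM VARYING loop with the straight-line replacement; the real modification also used conditional statements to decide which counters required clearing.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. CLRDEMO.
      * ILLUSTRATIVE SKETCH ONLY - NAMES AND LAYOUT ARE HYPOTHETICAL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  COUNTER-TABLE.
           05  COUNTER-ITEM  OCCURS 324 TIMES
                             PICTURE S9(7)  COMPUTATIONAL-3.
       01  CTR-SUB           PICTURE S9(4)  COMPUTATIONAL.
       PROCEDURE DIVISION.
       MAIN-LINE.
      *    ORIGINAL PATTERN - ONE SUBSCRIPTED MOVE PER ITERATION,
      *    EXECUTED 324 TIMES EVERY TIME THE ROUTINE RUNS.
           PERFORM CLEAR-ONE-COUNTER
               VARYING CTR-SUB FROM 1 BY 1 UNTIL CTR-SUB > 324.
      *    REVISED PATTERN - STRAIGHT-LINE MOVES WITH LITERAL
      *    SUBSCRIPTS, CLEARING ONLY THE NINE COUNTERS THAT
      *    ACTUALLY REQUIRE IT.
           MOVE ZERO TO COUNTER-ITEM (1)  COUNTER-ITEM (2)
                        COUNTER-ITEM (3)  COUNTER-ITEM (4)
                        COUNTER-ITEM (5)  COUNTER-ITEM (6)
                        COUNTER-ITEM (7)  COUNTER-ITEM (8)
                        COUNTER-ITEM (9).
           STOP RUN.
       CLEAR-ONE-COUNTER.
           MOVE ZERO TO COUNTER-ITEM (CTR-SUB).

The saving comes from eliminating the loop control and subscript computation performed on every one of the 324 iterations; the table search and subscript-reduction techniques listed under Analysis and Findings follow the same reasoning.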


A COMPUTER DESIGN FOR MEASUREMENT—THE MONITOR REGISTER CONCEPT

Donald R. Deese

Federal Computer Performance Evaluation and Simulation Center

Introduction. This paper is divided into two parts - in the first part I will describe (on a fairly high-level basis) the Monitor Register Concept; in the second part, I will describe a specific implementation of the Monitor Register Concept within the TRIDENT Submarine Fire Control System.

I. DESCRIPTION OF MONITOR REGISTER CONCEPT.

A. Hardware Monitor Concept. First, a bit of background on the concept of hardware monitoring so that the desirability of the Monitor Register Concept will be apparent. (I will not try to describe the justification for computer performance monitoring or the reason for using a hardware monitor; these subjects are adequately addressed in numerous published works.)

From a simplistic view, the hardware monitoring concept is as illustrated in Slide 1. When using a hardware monitor, very small electronic devices (called sensors or probes) attach to test points or wire-wrap pins on the back planes of computer equipment. These electronic devices use a differential amplifier technique to detect voltage fluctuations at the locations to which they are attached. (The voltage fluctuations represent changes in the status of computer components - Busy/Not Busy, True/False, etc.) The signal state, as detected, is transmitted by the electronic device over a cable to the hardware monitor. The hardware monitor measures the signal in one of several ways: it counts the occurrence of a signal state (e.g., count instructions executed, count seek commands, etc.); it times the duration of a signal state (e.g., time CPU Busy, time Channel Busy, etc.); or, it captures the state of one or more signals (e.g., capture contents of Instruction Address Register, capture contents of Storage Data Register, etc.). Many variations on this basic concept are commercially available.

B. Operational Problems with Hardware Monitors. There are a number of operational problems (Slide 2) which have plagued hardware monitor users:

1. Locating Test Points. There are thousands of test points on a computer system. It is often difficult for users to determine which test point (or combination of test points) reflects precisely the signal desired for measurement.

2. Signals Not Available at Test Points. Some signals - especially those involved with more esoteric measurements - are not available at test points.

3. Signals Not Synchronized. As users of hardware monitors begin measuring "logical" aspects of their computer system (e.g., program activity, file activity, operating system activity, etc.), the simultaneous measurement of many signals is frequently required. The desired signals are not always synchronized; sophisticated measurement techniques are required to synchronize the signals - and frequently the measurement strategy must be abandoned because of synchronization failure.

4. Attaching Probes. The process of attaching probes entails a risk of crashing the computer system and prudence requires a dedicated system; the process disrupts operation and alarms management.

5. Changing Probes. When measurement objectives change, often probes must be changed. The problems caused by changing probes are the same as initial probe attachment.

6. Large Number of Probes Required. An increasingly large number of probes is required with measurement strategies. This is especially true with the more sophisticated strategies which require the capture of data register or address register contents.

C. Monitor Register Concept. The Monitor Register Concept solves these problems.

o Slide 1 --- HARDWARE MONITORING CONCEPT

o Slide 2 --- OPERATIONAL PROBLEMS: LOCATING TEST POINTS; SIGNALS NOT AVAILABLE AT TEST POINTS; SIGNALS NOT SYNCHRONIZED; ATTACHING PROBES; CHANGING PROBES; LARGE NUMBER OF PROBES OFTEN REQUIRED


1. Data Flow. Before specifically discussing the Monitor Register Concept, let us consider the flow of data values within a computer system. Although actually quite complex, the data flow may be considerably simplified by considering only those registers containing the data or address values which reflect the internal operations of the computer. If the contents of these registers could be captured, and properly sequenced, the internal operation of the computer could be duplicated. (If these registers could be captured, all of the important computer functions could be measured, all the data could be captured, all addresses could be recorded, etc.) Attaching hardware monitor sensors to each bit in these registers would require a large number of sensors (at least one sensor to each bit in every significant register) and the problem of signal synchronization would still exist.

2. Monitor Register. What is needed to solve these problems is some sort of funnel which would collect the contents of all of these registers-of-interest and funnel them to a common register which could be easily monitored. This is the essence of the Monitor Register Concept. As shown by Slide 3, with the Monitor Register Concept, the contents of a number of registers are collected and funneled to a single "interface" register. As each register value changes, the contents of the register are presented (one register at a time) to the interface register.

Of course, there really isn't a funnel inside computers. But there is something which is the electronic equivalent. I'll discuss that when I discuss a specific implementation of the Monitor Register Concept. To illustrate the concept, however, let us shrink the funnel in Slide 3 and place it inside a CPU (Slide 4) where it funnels register contents to a Monitor Register Interface. A hardware monitor can then be attached to this single common interface (Slide 5) and acquire the contents of all of the registers which are input to the "funnel."

The Monitor Register Concept eliminates the operational problems listed earlier. The test points are replaced with a specific Monitor Register Interface. All relevant signals are available at the Monitor Register Interface. The signals are synchronized when presented to the Monitor Register Interface. There is no need to change probes since all relevant signals are made available at the Monitor Register Interface, and, since all relevant signals are available at the Monitor Register Interface, a large number of probes is not required.

This is a simple concept. What about implementation of the concept into a specific computer design? The second part of this paper describes an implementation of the Monitor Register Concept under a project FEDSIM performed for the Naval Weapons Laboratory, Dahlgren, Virginia.

II. IMPLEMENTATION OF THE MONITOR REGISTER CONCEPT.

The Naval Weapons Laboratory (NWL) is located at Dahlgren, Virginia. One of NWL's missions (Slide 6) is to develop and check out the Fire Control System software for the TRIDENT submarine.

A. Fire Control System Software. The Fire Control System software must perform a number of complex tasks associated with the preparation for missile launch. Some of these tasks are computation of missile trajectory, missile alignment and erection computation, and guidance computer shadowing. These are just a few of the many functions performed by the Fire Control System software.

The Fire Control System software will operate in a real-time tactical system. This system has specific and critical time-dependent requirements; if the software does not respond to the external stimuli within a certain time, input data may be simply lost, or output may no longer be accepted. Further, real-time systems, by the very fact that time is a central factor, present unique problems. Real-time systems encounter unpredictable sequences of unrelated requests for system services. Since the timing interactions often are difficult to predict and may be impossible to consistently duplicate, program errors will frequently mysteriously and erratically appear.

Compounding the real-time problems facing NWL are the implications of the new systems for the TRIDENT submarine. When some problem occurs in the Fire Control System software, it may be difficult to determine whether the problem is caused by the application program or caused by the new computer system or the new operating system. The new computer system might have insufficiently documented features or peculiarities, design problems, or actual hardware failure. The new operating system might have coding errors, logic errors, insufficiently documented features, and so forth. Even the assembler and compiler are new; these could generate erroneous code or data, they could have poorly documented features, etc.

1 1

° Slide 3

Slide 4 --- MONITOR REGISTER INTERFACE MONITOR REGISTER INTERFACE

66 o Slide 5

HARDWARE MONITOR

y

MONITOR REGISTER INTERFACE

THE NAVAL WEAPONS LABORATORY

ONE MISSION

In this difficult environment, NWL must develop the TRIDENT Fire Control System software. The software must undergo extensive test and debug. Since timing is often critical, the software must be efficiently tuned.

B. Performance Measurement of the TRIDENT Basic Processor. In this environment and with these tasks, NWL decided that performance measurement tools and techniques are needed. As a result, NWL asked FEDSIM to assist them with the design of appropriate performance measurement features for the TRIDENT Submarine Basic Processor.

As you might imagine, a nuclear submarine is not the typical place to find a hardware monitor. The environment poses some rather unique problems.

There are unique problems even when the Basic Processor is located in the Test Berth environment at the Naval Weapons Laboratory in Dahlgren. (For example, the computer will be ruggedized; it will be surrounded by the "bathtub enclosure" and there will be no possibility of connecting hardware monitor probes to the processor. If any signals are ever to be monitored, the signals must be made available to some external interface.) When we were considering the computer design approach to solving the problems of monitoring the TRIDENT Basic Processor, we quickly realized that the environment was ideal for the Monitor Register Concept.

C. Monitor Register Concept in the TRIDENT Submarine. As you know, computers do not really have a funnel into which data can be poured. However, the computer logic does have something very much like a funnel: the multiplexor. For the TRIDENT Basic Processor, we selected a multiplexor (Slide 7) having eight inputs and one output. Under microprogram control, at every minor cycle, the multiplexor selects one of the eight input registers and presents it to the output register.

The microprogram control is facilitated by five "performance monitoring" bits in the microprogram control word. At each minor cycle, three of these bits (value 000 to 111) are provided to the multiplexor. The multiplexor uses these three bits as a binary address to select the appropriate input register and provides the register bits, in parallel, to the Monitor Register.

The other two bits of the five in the microprogram control word are used to identify certain unique characteristics of the specific data contained in the selected register. (For example, the Program Status Word (PSW) has a three-bit address of "001"; a two-bit characteristic code of "00" indicates a normal PSW, a code of "01" indicates an instruction-string-terminator PSW, a code of "10" indicates a successful-branch PSW, a code of "11" indicates an interrupt PSW.)

The programmer writing the microprogram does not have to continually worry about the "performance monitoring" bits in the microprogram control word; normally, these bits will be set to zero. The bits will be set to non-zero only at those few easily identified points in the microprogram routines where the appropriate registers are being altered.

The eight registers which are input to the multiplexor are identified in Slide 7. The Program Status Word provides the virtual instruction address, task identifier, condition designator, machine state, and interrupt status. The Memory Data Register provides operand values from either the memory or local processor registers. The Operand Address Register provides intermediate and final operand address values (both virtual and absolute). The Look-Ahead Program Counter provides the absolute instruction address. The Instruction Buffer Register provides the instruction word at the initial stage, intermediate stages and final stage of instruction execution. The Executive Input/Output Register provides the port and device address for all input and output. The Interrupt Register provides the interrupt level, interrupt conditions, etc. The CPU Timer Register provides the means of time stamping various events at the resolution of the Basic Processor minor cycle clock.
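Putting the pieces of the control word together, for example: a performance-monitoring field whose three address bits are 001 and whose two characteristic bits are 10 would gate the Program Status Word onto the Monitor Register and tag the captured value as a successful-branch PSW, while the all-zero setting used at points of no measurement interest is what lets the microprogrammer ignore the field everywhere else.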

o Slide 7 --- MONITOR REGISTER LOGIC: INTERNAL CPU REGISTERS (PROGRAM STATUS WORD, MEMORY DATA REGISTER, OPERAND ADDRESS REGISTER, LOOK-AHEAD PROGRAM COUNTER, INSTRUCTION BUFFER REGISTER, EXECUTIVE INPUT/OUTPUT REGISTER, INTERRUPT REGISTER, CPU TIMER REGISTER) FEED AN 8:1 MULTIPLEXOR UNDER MICROPROGRAM CONTROL; THE MULTIPLEXOR OUTPUT IS A SINGLE DATA REGISTER

o Slide 8 --- PROGRAM DEBUG: PROGRAM TRACE; MODULE TRACE; IDENTIFY NON-EXECUTED INSTRUCTIONS; RECORD VALUES OF SELECTED VARIABLES; RECORD ADDRESSES OF INSTRUCTIONS REFERENCING SELECTED VARIABLES; CREATE DETAILED PROGRAM PROFILE

o Slide 9 --- PROGRAM TUNING: CPU EXECUTION BY AREA OF CORE; USE OF SUPERVISOR SERVICES; PAGING ACTIVITY; CODING TECHNIQUES; INDEXING LEVELS

o Slide 10 --- INSTRUCTION ANALYSIS: OP-CODE ANALYSIS; INSTRUCTION PATTERNS; INSTRUCTION SEQUENCES; INSTRUCTION STRING LENGTH ANALYSIS

o Slide 11 --- COMPUTER COMPONENT MEASUREMENTS: CPU BUSY/WAIT; EXEC BUSY; INPUT/OUTPUT TIMING BY CHANNEL AND DEVICE; DEVICE ERRORS; INTERRUPTS; RACE CONDITIONS DURING PEAK I/O; MEMORY CONFLICTS

o Slide 12 --- THE MAJOR POINT: EVERY SIGNIFICANT ADDRESS, INSTRUCTION OR DATA VALUE IS AVAILABLE AT THE MONITOR REGISTER INTERFACE; IF ALL THIS DATA WERE RECORDED, THE COMPUTER PROGRAMS' EXECUTION COULD BE SYNTHESIZED

o Slide 13 --- TEST BERTH MODE: INTERACTIVE MONITORING


D. Uses of Data Available at Monitor Register. How can all of the data provided by these registers be used?

1. Program Debug (Slide 8). Practically all of the data provided at the Monitor Register could be used to debug programs. The Program Status Word could be used to trace program logic steps and module execution, and to identify non-executed program legs. The Operand Address Register could be used to identify references to specific program variables or data areas and the Memory Data Register could be used to analyze the values contained in the variables. Or, for example, the instruction address in the Program Status Word could be captured whenever selected variables were referenced. This could be used to determine which instructions were referencing certain variables.

2. Program Tuning (Slide 9). The data provided at the Monitor Register could be used to "tune" or optimize specific user programs. The Program Status Word could be used to obtain a distribution of instruction execution by area of core, so that highly used segments of code could be examined for optimization potential. The Instruction Buffer Register could be used to analyze programmer coding techniques, the use of supervisor services by module, and the number of levels of indirect indexing. These two registers could be used together to analyze paging activity.

3. Instruction Analysis (Slide 10). The data provided at the Monitor Register could be used to analyze the instructions executed by the computer. The Instruction Buffer Register contains the instruction as it was fetched from memory, and reflects any modification made to the instruction because of, for example, indirect addressing. This data could be used for op-code analysis, examining instruction patterns, analyzing instruction sequences, and so forth. The Program Status Word is uniquely identified whenever program control is transferred; this feature facilitates instruction string length analysis.

4. Computer Component Measurements (Slide 11). Finally, the data provided at the Monitor Register could be used to measure the utilization of the computer components. The Program Status Word could be used to measure the amount of time the CPU is in Busy or Wait states, and the amount of time that the CPU is executing in Executive Mode or in a User Program. The Executive Input/Output registers contain I/O channel and device addresses when input or output is initiated or terminated. Therefore it is easy to determine the channel and device busy times, overlap times, CPU Wait on Channel or Device, and so forth. The Interrupt Register could be used to determine interrupt levels, interrupt types, device errors, and so forth. A combination of registers associated with memory, along with the CPU time, could be used to determine memory conflicts.

These are just examples of the uses for data available at the Monitor Register. The major point (Slide 12) which these examples tried to illustrate is that every significant address, instruction or data value is available at the Monitor Register. If all this data were recorded, the computer programs' execution could be synthesized!

For a normal monitoring session, it is unlikely that all of the possible data would be actually captured or analyzed. The massive amount of data would preclude this except for very unusual measurement objectives. Each monitoring session likely would be collecting data to solve a particular problem.

E. Test Berth Configuration. How is NWL going to use this design? Slide 13 is a simplification of the Software Development System components which will be in the Test Berth at Dahlgren.

A hardware monitor is attached to the Monitor Register and interfaces between the TRIDENT Basic Processor and a mini-computer. The hardware monitor provides a high-speed electronic filtering, decoding, comparison and buffering capability. The main reason for the hardware monitor is that the mini-computer would not, of course, keep up with the rapid flow of register data provided at the Monitor Register. The hardware monitor passes on to the mini-computer only the data required for a specific measurement objective. In this way, the mini-computer can effectively process the data it receives.

The mini-computer processes data from the hardware monitor in a number of ways depending on the specific measurement objectives (program debug, program tuning, and so forth). The mini-computer has its own peripherals - tapes, disks, and a CRT; the mini-computer is connected directly to the TRIDENT Basic Processor by an I/O channel.

Programmers at NWL will use the mini-computer, via the CRT, for interactive monitoring. On command from a programmer, the mini-computer will load programs from its disks into the TRIDENT Basic Processor via the I/O channel. The mini-computer will set up control information for the Monitor Register logic and for the hardware monitor. Then, the mini-computer will initiate execution of the program just loaded. The programmer can have monitored data displayed on the CRT and/or have it written to one of the mini-computer's tapes for later analysis.

F. At-Sea Monitoring. When the TRIDENT submarine goes to sea, the Fire Control System software has been exhaustively debugged. However, we all know that software is never really completely debugged ("debug" means that all known errors have been corrected); the best software usually eventually fails. Therefore, it is desirable to have some sort of monitoring capability at sea so that when the software does fail, the personnel at the Naval Weapons Laboratory have something to aid them in finding the error. This is especially important in that, since the software has been exhaustively debugged, error causes are difficult to determine.

There is a dual configuration in the TRIDENT submarine (to accommodate preventive maintenance, etc.). Slide 14 shows the implications of this dual configuration from a monitoring view - the two systems are cross-connected and monitor each other.

When the TRIDENT is configured for cross-connected monitoring, a port of the independent I/O control processor ("intelligent channel") would be connected to the Monitor Register of the other Basic Processor.

In the primary Basic Processor, control logic would be initialized such that only selected registers would be presented to the Monitor Register. In the secondary Basic Processor, a utility program would be loaded, request allocation of memory, and issue a Read I/O to its I/O port connected to the Monitor Register of the primary Basic Processor. Conceptually, the Read I/O would have two Data Control Words (DCWs) chained together. The first DCW would be a Read I/O operation with a data length the size of the memory allocated to the utility program. The second DCW would be a transfer to the first DCW. In operation, data would be read from the Monitor Register of the primary Basic Processor and placed in the memory area. When the memory area was full, the second DCW would transfer I/O control to the first DCW to begin the Read I/O again. The memory area would serve as a circular queue containing history data from the primary Basic Processor Monitor Register. Whenever some problem occurred with the primary Basic Processor, the memory in the back-up Basic Processor would be dumped to some external device for transmission to the Naval Weapons Laboratory. (As an illustration of this method, the utility program might be allocated 50,000 words of memory; control logic would be initialized such that only successful branch operation addresses would be presented to the primary Basic Processor Monitor Register; the history data available would be a record of the last 50,000 program logic legs executed.)

If information about the performance of future systems is to be available to the "outside world," specific design is required to make the information available. The Monitor Register Concept is one method whereby a variety of information about the internal computer performance can be made available at a common location and in an easily defined interface mechanism. I believe that the Monitor Register Concept has a place in future computer systems.

o Slide 14 TACTICAL MODE — CROSS - CONNECTED MONITORING

THE USE OF SIMULATION IN THE SOLUTION OF HARDWARE ALLOCATION PROBLEMS

Major W. Andrew Hesser, USMC

Headquarters, U.S. Marine Corps

INTRODUCTION

During the past year, Marine Corps personnel have been concerned with the redistribution of Marine Corps owned computer hardware at five major installations. This paper discusses the results of simulations of these installations.

All simulation models involved were created directly from hardware monitor data gathered with the SUM. The simulation language is a proprietary product, SAM, leased by the Marine Corps; the use of SAM in constructing simulation models was presented at the Spring, 1973 CPEUG Conference.

THE PROBLEM

The problems facing the Marine Corps were:

(1) A major installation presently has two System/360 Model 40's. QUESTION: Can the workloads of both be combined onto a System/360 Model 65? A Model 50? Will there be sufficient capacity on each to process proposed war games?

(2) A System/360 Model 65I was procured for the use of the operating forces of the Marine Corps in the Far East. The present computer, also a Model 65I, will be used only for garrison forces and all jobs currently being processed for the Fleet Marine Forces (about 60% of the present workload) will be shifted to the new computer. QUESTION: What size computer will be needed to process the remaining workload?

(3) The Marine Corps central supply center is being closed and relocated to another installation. The center has two System/360 Model 50I's, and the new location presently has a Model 50I. QUESTION: Can a Model 65J process the workloads of all three Model 50's combined? What size Model 65 can process the workload of the two Model 50's at the present supply center?

These questions were answered using simulation models.

QUANTICO

There are two installations at Quantico, Virginia, each of which has a System/360 Model 40. In October of 1972, a hardware monitoring team measured the Model 40 HGF with 448K bytes of core memory at the Computer Sciences School. Peripherals included eight CD-12 (2314-type) disk drives and six 2420 tape drives. Table 1 contains the results of the system measurement and the simulation validation.

TABLE 1 CSS QUANTICO VALIDATION

ACTIVITY MEASURED SIMULATED

CPU ACTIVE 76.1% 78.2%

SUPERVISOR 48.7% 51.0%

PROGRAM 27.4% 27.2%

CHANNEL 1 38.3% 37.7%

CHANNEL 2 6.0% 7.0%

MPX .4% .3%

TIME 160 MIN 160.1 MIN

In early 1973, another team measured the Automated Services Center at Quantico. This Model 40 HG with 384K bytes of core memory has eight 2314-type disks and eight tape drives. Table 2 contains the results of the system measurement and the validated simulation results.

TABLE 2  ASC QUANTICO VALIDATION

ACTIVITY MEASURED SIMULATED

CPU ACTIVE 70.5% 69.1%

SUPERVISOR 54.3% 52.5%

PROGRAM 16.2% 16.6%

CHANNEL 1 (DISK) 46.9% 45.6%

CHANNEL 2 (TAPE) 6.8% 6.8%

MPX .6% .8%

TIME 150 MIN 149 MIN

To determine what size system would be required to process the combined workload of the two installations, simulations were done on a Model 50I, then a Model 65I. The simulated configuration contained two banks of eight disks and one of eight tapes. The simulation results are contained in Table 3. The Tj≠0 notation means synthetic jobs arrived throughout the simulation; Tj=0 implies all jobs were in the job queue at time zero.

TABLE 3 COMBINED WORKLOADS OF THE CSS AND ASC

ACTIVITY          MODEL 50I (Tj≠0)   MODEL 65I (Tj≠0)   MODEL 65I (Tj=0)

CPU ACTIVE        45.7%              17.6%              34.0%

SUPERVISOR        30.6%              12.2%              22.3%

PROGRAM           15.1%              5.4%               10.7%

CHANNEL 1 (D)     33.0%              33.3%              67.7%

CHANNEL 2 (D)     30.4%              30.7%              60.2%

CHANNEL 3 (T)     13.7%              13.9%              27.3%

MPX               1.4%               .5%                1.1%

TIME 148 MIN 147 MIN 76 MIN

It was concluded from these results that a Model 50I CPU would be adequate for processing the combined Model 40HG and Model 40HGF workloads. There would be unused CPU capacity available for processing small war games. A Model 65I would have considerable unused capacity available for processing large scale war games. However, additional processor storage would be required for either 512K byte machine if war games were to be added.

OKINAWA

During November of 1972, the existing system at the Automated Services Center on Okinawa was measured. The system consisted of a System/360 Model 65I (512K bytes of core storage), with two banks of eight disk drives and two banks of eight tapes on four separate channels. The measurement and validation results are contained in Table 4.


TABLE 4 OKINAWA VALIDATION

ACTIVITY MEASURED SIMULATED

CPU ACTIVE 24.4% 23.8%

SUPERVISOR 14.2% 14.2%

PROGRAM 10.2% 9.6%

CHANNEL 1 (D) 34.4% 41.3%

CHANNEL 2 (D) 26.5% 21.5%

CHANNEL 3 (T) 32.5% 28.2%

CHANNEL 4 (T) 25.4% 29.0%

MPX 1.7% 1.3%

TIME 180 MIN 182 MIN

There were four approaches taken toward the problem of matching a hypothetical computer to the workload, which would be reduced by an estimated factor of 60%.

(1) Simulate using the total SUM measurement figures (entire existing workload).

(2) Simulate with the SUM results reduced by some constant percentage proportionate to the amount of workload which was to be diverted to the new computer.

(3) Simulate with the present SUM results reduced in a more sophisticated manner to reflect probable changes in processing characteristics.

(4) 'Conventional' simulation using SAM Language to build detailed models of all remaining processes

For all simulations the currently installed peripherals were attached to the simulated CPU. The results of simulating the total workload on a Model 50I are shown in Table 5.

TABLE 5  SIMULATION OF 100% WORKLOAD ON A MODEL 50I

ACTIVITY

CPU ACTIVE 64.1% 66.8%

SUPERVISOR 38.7% 39.3%

PROGRAM 25.4% 27.5%

CHANNEL 1 (DISK) 30.2% 32.8%

CHANNEL 2 (DISK) 39.9% 43.7%

CHANNEL 3 (TAPE) 24.9% 27.1%

MPX 2.9% 3.2%

TIME 206 MIN 189 MIN

The results show that the Model 50I would be capable of handling the total present workload within almost the same amount of time the Model 65I requires now. Considering that a highly active period was selected for input, the Model 50I certainly would be able to process the reduced workload.

The next two approaches were used to simulate the workload on a Model 40HGF (448K bytes) with two channels. Channel one had all sixteen 2314-type disks, and channel two had the sixteen 2420 tape drives.

The Data Processing Officer at the installation verbally indicated that the removal of jobs would leave about 40% of the total workload to be processed at the Automated Services Center (ASC). Using this information, models of the remaining ASC workload were created by reducing each hardware monitor input by 60%. Thus, each synthetic job made CPU and channel demands on the system equal to 40% of the originally measured data. It was realized that this did not represent the real world situation because some activity (like disk) may be reduced more than 60%. Nevertheless, this method would provide an estimate of the effect of removing the Fleet Marine Force processing.
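To make the flat reduction concrete (the magnitudes here are invented for illustration): a job that the SUM had measured at 100 CPU seconds and 2,000 disk accesses was represented in the reduced model as a synthetic job demanding 40 CPU seconds and 800 accesses.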

The ASC provided SMF statistics for all jobs processed during September and October 1972. This data, coupled with a listing of the job names of all jobs that would remain at the ASC, was used to reduce the SUM data that created the synthetic jobs. In this reduction, instead of a 60% reduction in all activity, each of CPU, disk, tape and MPX activity was reduced in accordance with the actual amount of activity that would be shifted as determined by SMF.

The formula for the reduction percentage is shown in Table 6.

TABLE 6  SMF REDUCTION TECHNIQUE

% CPU ACTIVITY REMAINING = ASC JOB CPU TIME / (ASC CPU TIME + FMF CPU TIME)

% CHANNEL ACTIVITY REMAINING = ASC ACCESSES / (ASC ACCESSES + FMF ACCESSES)
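As a hypothetical illustration of the technique (the hour figures are invented, chosen only to reproduce the CPU result reported below): if the jobs remaining at the ASC had used 3.23 hours of CPU time in the sample period and the departing FMF jobs had used 6.77 hours, the CPU activity remaining would be 3.23 / (3.23 + 6.77) = 32.3%, a reduction of 67.7%.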

The final reduction figures were:

(1) CPU activity reduced 67.7%

(2) Disk activity reduced 68.2%

(3) Tape activity reduced 65.2%

(4) MPX activity reduced 72.6%

Considering these figures with the 60% reduction above, it appears that the estimation of the Data Processing Officer was high for the processing that would remain at the ASC. The results of these two techniques are summarized in Table 7.

TABLE 7 SIMULATION OF REDUCED WORKLOAD ON MODEL 40HGF

ACTIVITY            60% REDUCTION      SMF REDUCTION

CPU ACTIVE          90.6%              79.9%

SUPERVISOR          58.8%              53.2%

PROGRAM             31.8%              26.7%

CHANNEL 1 (DISK)    23.4%              18.8%

CHANNEL 2 (TAPE)    21.4%              19.0%

MPX                 1.3%               .9%

TIME 193 MIN 187 MIN


It is concluded that the Model 40 could process the reduced workload in about the same time that is required to process the total workload on the Model 50I now. The system would be CPU bound and would have little excess capacity for growth or contingencies.

The final step in the simulation analysis of the ASC workload was to be the construction of a model of each program that would be processed at the ASC. The bases for the models were the AUTOFLOW charts of the individual programs. Personnel at the ASC also provided probabilities of transaction flows for major decision points and file sizes and characteristics for the files used by the programs. Models were then constructed in the SAM language using detailed coding techniques.

The final simulations of each of the daily models were not completed due to software problems, excessively large (700K bytes) core requirements, and limited computer time. The models of the individual systems however can be used in future simulation applications.

The overall conclusions of the Okinawa study were that a Model 40 HGF would be adequate for processing the reduced workload but there would be little excess capacity for growth or contingencies. A Model 50I could process the reduced workload without difficulty.

PHILADELPHIA AND ALBANY

In June of this year, a monitoring team visited the Marine Corps Supply Activity at Philadelphia. The installation had two System/360 Model 50I CPU's which shared most of the eighty-eight disk drives and twenty-two tape drives. The measurement and simulation validation figures are summarized in Table 8 below.

TABLE 8 PHILADELPHIA VALIDATION

ACTIVITY        MEASURED CPU "A"   SIMULATED CPU "A"   MEASURED CPU "B"   SIMULATED CPU "B"

CPU ACTIVE      63.4%              59.6%               90.1%              87.9%

PROGRAM         22.5%              22.4%               21.3%              20.3%

SUPERVISOR      40.9%              37.2%               68.8%              67.6%

CHANNEL 1       29.5%              16.6%               23.9%              29.5%

CHANNEL 2       51.1%              51.5%               13.2%              13.1%

CHANNEL 3       8.7%               20.2%               29.7%              26.8%

MPX             .9%                1.2%                .9%                .8%

TIME (SECONDS) 14400 14550 14400 15222

Simulations were then done on two Model 65's, an IH (768K) and a J (1024K). Each had two selector subchannels and 3 selector channels. The twenty-two tapes were accessible via either selector subchannel and all disk control units had a two channel switch. The results of the simulation using the combined workloads are summarized in Table 9 below.

TABLE 9  PHILADELPHIA SIMULATION

ACTIVITY        MODEL 65J (Tj≠0)   MODEL 65IH (Tj≠0)   MODEL 65IH (Tj=0)

CPU ACTIVE      47.1%              47.1%               60.5%

PROGRAM         15.6%              15.6%               20.0%

SUPERVISOR      31.5%              31.5%               40.5%

CHANNEL 1       45.2%              54.6%               61.7%

CHANNEL 2       22.1%              12.7%               24.8%

CHANNEL 3       30.2%              32.0%               40.0%

CHANNEL 4       27.8%              26.4%               33.7%

TABLE 9 (CONTINUED)

ACTIVITY        MODEL 65J (Tj≠0)   MODEL 65IH (Tj≠0)   MODEL 65IH (Tj=0)

CHANNEL 5       31.6%              31.2%               41.3%

MPX 2.4% 2.4% 3.0%

TIME (SECONDS) 14352 14356 11164

These results indicate:

(1) That with jobs arriving throughout the interval, the larger 1024K byte Model 65J has no advantage over the 768K byte Model 65IH. Both machines could process four hours of input data without difficulty in the same time. The activity levels of system components do not indicate any serious processing bottlenecks. It is noted that Channel 1 is the primary data path to the 22 tape drives and is used by the model whenever it is free. The channel imbalance between channels 1 and 2 therefore is of little significance.

(2) That the Model 65IH would have some excess capacity available as indicated by the results of having four hours of synthetic jobs all queued at the start of the simulation. The Tj=0 simulation does not consider job and job step interdependencies and file contention, however, and therefore represents the shortest possible processing time. Nevertheless, even with all partitions filled and jobs making maximum use of the system resources, there are no significant system bottlenecks. This simulation indicates that the Model 65IH processing the combined workload would not be a saturated system.

In February of 1973, a team of analysts measured the computer system at the Marine Corps Supply Center, Albany, Georgia. The Model 50I at that installation has forty disk drives supported by a two channel switch between channels one and two. Channels two and three have a total of fifteen tape drives. Table 10 below summarizes the results of the measurement and the simulation validation.

TABLE 10 ALBANY VALIDATION

ACTIVITY MEASURED SIMULATED

CPU ACTIVE 72.0% 69.6%

SUPERVISOR 53.2% 50.0%

PROGRAM 18.7% 19.5%

CHANNEL 1 53.5% 48.1%

CHANNEL 2 34.7% 48.4%

CHANNEL 3 6.1% 9.6%

MPX 1.1% 1.6%

TIME (SECONDS) 14400 14093

The workloads of Philadelphia and Albany were then combined into one workload to be simulated on a Model 65J with jobs arriving throughout and with jobs queued. The results of the two runs are summarized in Table 11.

TABLE 11 COMBINED ALBANY AND PHILADELPHIA

ACTIVITY            Tj≠0               Tj=0

CPU ACTIVE          70.7%              74.7%

SUPERVISOR          48.5%              51.3%

PROGRAM             22.2%              23.4%


TABLE 11 (CONTINUED)

ACTIVITY            Tj≠0               Tj=0

CHANNEL 1 (T)       61.2%              60.6%

CHANNEL 2 (T)       34.0%              40.1%

CHANNEL 3 (D)       58.1%              57.0%

CHANNEL 4 (D)       55.5%              62.7%

MULTIPLEXOR         3.8%               4.1%

TIME (SECONDS)      14406              13630

These results indicate that a System/360 Model 65J would be saturated if it were to try to process the total combined workload. The disk channel activity is higher than could be reasonably expected and is higher than any total channel activity ever measured in the Marine Corps. For example, the activity on the HQMC disk channels seldom exceeds 40 percent. The simulation with all jobs queued shows only a 6% improvement over the Tj≠0 simulation with regard to throughput time. It is emphasized that this is the best that could ever be expected and clearly indicates a saturated system.
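The improvement can be checked from Table 11: the queued run saves 14,406 - 13,630 = 776 seconds, roughly 5 to 6 percent of the Tj≠0 elapsed time, so even the idealized all-jobs-queued case recovers very little.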

RECOMMENDATIONS

There were four events recommended for each of 1973 and 1976. For 1973, the first event was the actual installation of the Model 65I at the Third Marine Amphibious Force on Okinawa. Next was the shifting of all processing presently at Camp Butler, Okinawa to the III MAF system. The third event was to ship the Model 65I CPU/CHANNELS from Camp Butler, Okinawa to the Marine Corps Supply Activity, Philadelphia and install the Model 65I CPU/CHANNELS with no other changes. One of the Model 50I's would then be moved from Philadelphia to Okinawa and installed.

In 1976, the target date for most of the changes, the four events would be as follows: First, shift some processing from Philadelphia to Albany, Georgia; then ship the System/360 Model 65I with part of the peripherals to Albany and install it with 256K additional core. The Model 50I at Philadelphia would continue to process. The third event for the year would be the shifting of the remainder of the processing from Philadelphia to Albany. Finally, the remaining Model 50I from Philadelphia could be installed at Quantico, Virginia. This would then release the two Model 40's there for re-utilization elsewhere.

SUMMARY

The Marine Corps has been able to use simulation as an effective tool for determining equipment allocation. The capability of building models from hardware monitor data has allowed models to be built in a short period when a considerably longer time would have been required otherwise.


HUMAN FACTORS IN COMPUTER PERFORMANCE ANALYSES

A. C. (Toni) Shetler

The Rand Corporation

ABSTRACT

The results of computer performance experiments executed in a normal work environment can be invalidated by unexpected activities of the people involved (operators, users, and system personnel). Human behavior during an investigation can confound the most careful analyst's results unless this behavior is considered during experimental planning, test execution, and data analysis. Human actions need not thwart efforts and can, in some instances, be used profitably.

In general, an analyst can design an experiment that precludes dysfunctional human behavior, can collect data to indicate when it occurs, or can plan the experiment so the behavior is useful.

INTRODUCTION

This paper considers human behavior during experimental planning, test execution, and data analysis for computer performance evaluation. Human problems in the testing environment during a computer performance evaluation effort can be likened to a field test for a product verified in a laboratory. That is, a hypothesis that explains user behavior or system relationships is usually first examined in a controlled environment. Once a hypothesis has been explained in the controlled environment, it is usually necessary to expand the explanation to the normal environment; this involves exposing the hypothesis to the variability of human reactions. A computer performance experiment using the normal system environment requires that an analyst be aware of the human problems associated with testing hypotheses.

Five suggestions to consider when conducting such experiments include:

o Ensuring participant awareness,
o Defining the measures,
o Testing for objectivity,
o Validating the environment,
o Using multiple criteria.

The format of this paper is to identify each consideration and present a brief illustration.

ENSURING PARTICIPANT AWARENESS

When an experiment relies on overt actions of people other than the analyst, it is critical that their orientation include an awareness of the experiment's relationship to the hypotheses; lack of awareness can lead to inconclusive results. The negative effect was clearly illustrated in an experiment where normal operations were required to validate system specific hypotheses.

The investigation involved increasing the peripheral capacity of an IBM 360/65 MVT system. The workload processed by the computer system indicated that projected workloads would saturate the system. A channel was available to attach additional peripheral devices to validate a hypothesis that suggested a solution to this problem.

The hypothesis was stated as follows: Adding peripheral devices to the available channel and directing spooled data to those devices will result in an increased capacity; proving and implementing this solution should result in an increased machine capacity. The experiment included 1) testing the hypothesis using a specialized job stream that led to stating the hypothesis; 2) if these controlled tests were positive, following them with a modified environment for normal operations; and 3) then verifying a capacity increase by examining accounting data. This test could only be executed once because a special device hookup was required which was borrowed from another computer system and had to be returned at the conclusion of the experiment - the time frame was a 15-hour period.

The first part of the experiment, to verify the hypothesis in a controlled environment, was dramatic in its conclusiveness. Therefore, we prepared for the second part, to validate the new peripheral configuration the following morning. The operators had modified start-up procedures and were told an experiment required them to use these procedures until 10:00 o'clock. The users were not informed of the experiment because their participation included submitting jobs - a task they performed normally. The results were examined after the 10:00 o'clock completion and were very positive. In fact, substantially more work, in terms of jobs processed and computer resources used (CPU time, I/O executions, etc.), had been accomplished with the modified peripheral configuration when compared to the equivalent time period of the previous week (morning start-up to 10:00 AM). Only later did we discover that the operators, in an attempt to "help" the experiment, got started much faster than they had the previous week and thus made the system available for 25% more elapsed time than is usual during the time prior to 10:00 AM. The test results were invalidated; nothing could be said about the effect of a modified peripheral configuration for the normal environment, and the addition of peripheral devices could only be based on the non-system specific testing in the controlled environment using the test job stream.

This experience most emphatically demonstrated the problem of active participants not being aware of their role in the experiment -- help from the operators invalidated the experiment. This problem is termed the "Hawthorne effect", and it must be considered whenever it may invalidate results.

DEFINING THE MEASURES

When using people to measure the results of an experiment, the measures should be selected to require the least overt action on their part. A particularly vivid illustration of this occurred in an experiment to test the resolution of priorities among on-line users.

The on-line user jobs locked each other out for long periods (5-10 minutes) and these users were extremely vocal in their concern that something be done to eliminate the lockout. The problem resolution constraints were:

o No user was to be singled out as the culprit and prevented from executing;
o No user program should require modification to implement a solution;
o No job could lockout other jobs for a prolonged period (greater than 5 minutes).

The solution adjusted the handling of job priorities, preventing any one job from gaining control for a prolonged time. The system modification was first tested in a controlled environment with special tests to verify its execution characteristics, and was then used for several days in the normal work environment.

Aside from verifying that on-line users were actively using the system (through accounting and observation), the validity of the hypothesis was confirmed by noting the abrupt end to the complaints about lockout. This experiment emphasized the need for an appropriate measure that was easy to collect from the users.

TESTING FOR OBJECTIVITY

Users are not always good subjective evaluators of system changes, and the analyst should not trust them. An experiment to evaluate the effects of reducing the memory dedicated to an on-line system required soliciting user evaluation to identify the effect of reducing the available memory. The users were told that the reduced memory configuration would be available Monday, Wednesday, and Friday, and standard memory would be available Tuesday and Thursday. Following the experiments, the users claimed that the Monday/Wednesday/Friday service was unacceptable and the Tuesday/Thursday service was very good. Only subsequently were they informed that Monday/Wednesday/Friday service was based on the reduced memory configuration. Self-interest, combined with changed perceptions of reality, make users poor subjective evaluators of system modifications.

VALIDATING THE ENVIRONMENT

Test periods can coincide with a particular event and can invalidate the experiment. Spurious workload changes can also invalidate experimental results. If a testing period coincides with a particular event, such as annual accounting and reports, the results can be deceiving when applied to the standard working environment. Normal fluctuations in workload over short periods (one or two days) can result in the same effect.

One short test (one-half day) was

to 10 seconds) ; seriously affected by a user who had just completed debugging several simulations and The solution must be simple to imple- o submitted them for execution. The execu- ment and inexpensive to run -- the tion of these simulations resulted in facilities available time-slicing extraordinary CPU utilization (approximately were not acceptable. double) , definitely out of proportion to the workload characteristics when data over The measure defined was quite simple: a longer period were examined. Fortunately, the intensity the on-line users' com- of this workload shift was discovered, though plaints about lockouts — these users many analysts forget that a computer system would complain until their problem was seldom reaches a "steady state". The work- solved. The procedure for testing the load characteristics of a system during a also very simple hypothesized solution was short test period must be compared with — a system modification to the interrupt characteristics over a longer period. A pri- logic that rotated jobs of the same formal comparison period with appropriate


controls should be used to reduce the probability of invalid results caused by autonomous workload changes.

Accounting data should be verified to validate the normalcy of the environment. The measures that can be included are:

CPU utilization
Mass store channel utilization
Tape channel utilization
Memory requested per activity
Memory-seconds used per second
Average number of activities per job
Activities initiated per hour
Tape and disk files allocated per activity
Average number of aborts per 100 activities
Cards read and punched per activity
Lines printed per activity

If the objectives, hypotheses, and procedures for an experiment are clearly defined, many of the metrics listed above may be irrelevant and the analyst can ignore them.

USING MULTIPLE CRITERIA

Validating hypotheses about major changes to the computer system presents a serious problem. The magnitude and direction of the proposed modification usually dictate the level of investigation into the impact on the users. This level is usually quite high, particularly if the objective of an investigation is to reduce the availability of resources. People appear to accept results more readily when a number of different indicators all point toward the same conclusion.

An investigation to remove a memory module, at a savings of $7,500.00 per month, was definitely desirable from the computer center's viewpoint if the following could be validated:

o Batch turnaround would not increase beyond one hour for the majority of the jobs.
o On-line system response time would not increase significantly.
o The quality of response to on-line systems would permit them to still be viable.

The experiment involved collecting measures from the system, systems personnel, and a hardware stimulator. Measurements from a three week period were analyzed. The test plan called for the first and third week to be used as a control period, while the second week provided a test environment that simulated the less expensive hardware configuration. The results of the experiment were:

o Batch turnaround had increased from 0.5 to 0.7 hours, with 60-80%, rather than 90%, of the work being accomplished within an hour (determined from statistics on batch turnaround).
o On-line system response time had no statistically significant change; a variety of on-line system commands were tested using a special purpose hardware stimulator.
o BIOMOD, a full vector graphics system, was used by systems personnel to evaluate the quality of the on-line response - the quality was found to be degraded, but not disabled. The users of these services could still function.
o CPU utilization had gone up 5-10% above the 60-65% level it had been operating at previously (data from system accounting).

By combining the different criteria it was established, with reasonable certainty, that the memory module could be removed without imposing on the productivity of the users of that system. Virtually no one complained that the conclusion was incorrect, although several users (including one experimenter) vehemently argued against the core reduction until the multiplicity of unanimous indicators was known.

CONCLUSIONS

Introducing human variability into the arena of computer performance evaluation experimentation requires that the analyst take more than the usual precautions, as illustrated above. In addition, the general procedures to validate hypotheses about a particular system requiring testing in the uncontrolled environment should include verifying the hypotheses in a controlled environment and then exposing the hypotheses to the uncontrolled environment. The normalcy of the environment should be validated through checks on the accounting data and through use of a control period, as appropriate. The inclusion of these should result in more accuracy in testing hypotheses involving people interactions.

REFERENCES

Bell, T.E., B.W. Boehm, and R.A. Watson, "Computer Performance Analysis: Framework and Initial Phases for a Performance Improvement Effort," The Rand Corp., R-549-1-PR, November, 1972.

Bell, T.E., "Computer Measurement and Evaluation - Artistry, or Science?", The Rand Corp., P-4888, August, 1972.

Bell, T.E., "Computer Performance Analysis: Measurement Objectives and Tools," The Rand Corp., R-584-NASA/PR, February, 1971.

Kolence, K.W., "A Software View of Measurement Tools," DATAMATION, Vol. 17, No. 1, January 1, 1971, pp. 32-38.

Lockett, J.A., A.R. White, "Controlled Tests for Performance Evaluation," The Rand Corp., P-5028, June, 1973.

Mayo, E., THE HUMAN PROBLEMS OF AN INDUSTRIAL CIVILIZATION, Macmillan, New York, 1933.

Roethlisberger, F.J., and W.J. Dickson, MANAGEMENT AND THE WORKER, Cambridge, Mass., Harvard University Press, 1939.

Seven, M.J., B.W. Boehm, and R.A. Watson, "A Study of User Behavior in Problem Solving with an Interactive Computer," The Rand Corp., R-513-NASA, April, 1971.

Sharpe, William F., ECONOMICS OF COMPUTING, Columbia University Press, 1969.

Watson, R.A., "The Use of Computer System Accounting Data to Measure the Effects of a System Modification," The Rand Corp., P-4536-1, March, 1971.


DOLLAR EFFECTIVENESS EVALUATION OF COMPUTING SYSTEMS

Leo J. Cohen

Performance Development Corporation

ABSTRACT

Our technical evaluations of computing system performance tend to concentrate on specific aspects of capability or capacity. The dimensions for such measurements are scientific in nature: CPU-seconds, byte-seconds, and so on. Generally speaking, these dimensions are incompatible and must be conjoined via analysis into a synthesized view of the whole. Of particular interest to this paper is the problem of creating a dollar evaluation that is related to the configuration as a capacity, and to the use of that capacity by loads in all equipment categories.

The approach adopted is the development of the concept of dollar use of the various configuration equipments relative to the dollar capacity that these equipments make available. The use dollars are developed from observed measures of technical performance, and the method conjoins these various measures into a set of dollar related measures of overall performance.

The measures of overall performance in dollar value terms lead naturally to a set of figures of merit that are referred to as dollar effectiveness. It is then shown, with a 3-system example, that these dollar effectiveness measures provide absolute measures of comparability between different systems in terms of the value of technical performance relative to total dollars of cost.

INTRODUCTION

In dealing with the CPU, the measurement dimension is often the CPU-second, CPU-minute, and so on, depending on the load unit that is most convenient. For the peripheral equipments, we measure data transfer load in terms of device-minutes, etc., while for the memory the dimensions are usually of the form word-second or byte-second. That is, the dimensions chosen for measurements made in each of the equipment categories best serve those categories without overt concern for their compatibility. This means that there is no direct approach available, based on these dimensions, that will allow us to unify the performance figures for each of the equipment categories into a homogeneous measurement that is capable of providing a scale of performance suited to comparative evaluation. Therefore, we seek a new dimension which subsumes performance measurements in all equipment categories and conjoins them into a single measurement value that can then be compared. As a consequence, on such an "absolute" scale of performance the relative capabilities of two systems can be evaluated. To accomplish this the dimension required is "dollars". Thus we cast dollars - the universal measure of material value - in the role of performance measurement unit for the entire system. In the broadest view we will say that the "dollar capacity" of our system is its total cost per hour, per month, or as depreciated, and we will distribute that capacity into "use" categories. We will then measure the useful work obtained from these dollars against the totality of dollars involved.

Use Categories

The three major categories of a computer configuration are the CPU, the memory, and the peripheral equipments. For the CPU there are four load categories into which we can distribute the total CPU capacity. These categories are application, operating system, cycled and idled CPU load. Thus, in a particular instance of CPU utilization there will be, in each of these categories, a certain load representing a portion of this capacity. We define this load, determined after the fact, as a "use". For example, suppose that in a period of observation of sixty seconds, 25 seconds was given over to the execution of application programs. We will then say that the application use of the CPU was 25 seconds. Similarly, there will be an operating system use of the CPU, and so on, for each of the four CPU categories.

Memory use will be defined in an analogous fashion. If 120 kilobyte-seconds of memory load are observed for, say, application programs over a given performance period, then the application use of the memory was 120 KBS. Thus, the notion of use is after the fact, whereas the concept of load appears as a specification of a requirement on the system given in advance. This is precisely the context in which the notion of equipment use should be understood. That is, we seek to determine ways in which the dollars of the system are used, and the implication is that the determination can only be made after an observation of the system has occurred.
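To make the load, rate, and use bookkeeping concrete, here is a minimal Python sketch. It is not from the paper; the function and category names are invented for illustration, and it reproduces the sixty-second CPU observation mentioned above (the operating-system and cycled figures are made up purely to complete the example).

    # Illustrative sketch (not from the paper): turning the loads observed for
    # one equipment during a performance period into "uses" and load rates for
    # the four categories defined above.

    def uses_and_rates(observed, period):
        """observed maps the application, operating-system and cycled categories
        to the load seen during the period (e.g. CPU-seconds); the idled load is
        whatever capacity remains."""
        loads = dict(observed, idled=period - sum(observed.values()))
        rates = {category: load / period for category, load in loads.items()}
        return loads, rates

    # The sixty-second CPU observation from the text: 25 seconds went to
    # application programs (the other two figures are invented here).
    cpu_uses, cpu_rates = uses_and_rates(
        {"application": 25.0, "operating_system": 15.0, "cycled": 5.0}, 60.0)
    print(cpu_uses["application"])              # 25.0 -> the application use of the CPU
    print(round(cpu_rates["application"], 3))   # 0.417 -> the corresponding load rate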

Each of these equipments can be associated with a capacity, and the loads imposed on them can be distributed over the four use categories described above. The discussion of dollar effectiveness that follows is dependent on this distribution of loads by equipment type and usage, and in the appendix a detailed accounting of the usage categories is made for the various equipments. However, the discussion of the dollar effectiveness techniques should provide a suitable contextual definition of these elements, and it is therefore recommended that the reader reserve his review of their formal specifications.

DOLLAR USE

The equipments of the computer configuration can be associated with a cost per unit of time. Since the load for each of these elements is already distributed over the four load categories, it follows that an equitable distribution of the individual costs would be on the basis of proportional load in each category. Thus, that proportion of the total CPU load which is due to execution of application programs will be said to account for a proportionate amount of the cost for the CPU itself. For example, suppose that 50% of the total CPU load is due to applications, 20% due to the operating system, 5% due to cycling, and 25% to idling. If the total cost for the CPU were $90 per hour, then this cost would be distributed into four "dollar use" categories: application use, operating system use, cycle use, and idle use. In the present example, the $90 would therefore be distributed into $45 of application use, $18 of operating system use, $4.50 of cycle use, and $22.50 of idle use. It should be noted that just as the four load categories are all-inclusive and mutually exclusive, so also are the four use categories. This means that these use categories must have their individual dollar values summing up to the total cost per unit time for the equipment.

It should be obvious that the technique described above can be extended to all of the equipments of the configuration. This follows directly from the fact that for each type of equipment a definition of each of the four load categories has been established. Dividing the load observed by the period of the observation results in the load rate, and it is this rate that then gives rise to associated dollar use for the equipment.

The Distribution Matrix

Figure 1 is an example of a distribution matrix. In the left hand column of the matrix the various equipments of the configuration have been listed. In this example, the configuration consists of the CPU and memory, two channels, a disk, and three tapes. For the sake of simplicity the controllers for the devices have been excluded from the example.

The next column of Figure 1 gives the total cost per hour for each equipment listed in column one. This is then followed by four column pairs, one pair for each of the rate and use categories. The data of the first row thus represents the distribution of the $90 per hour total cost for the CPU into the application, operating system, cycle and idle dollar use categories. The second row shows the distribution of the $110 per hour cost of the memory. Here the distribution matrix indicates that $33 of this total cost was used in the execution of application programs or, equivalently, by users in solving their problems. Of the total cost per hour, $2.20 was spent on the operating system, and $30.80 represents the cost of cycling of the total $110 cost per hour for the memory; 40% of the memory was unused, and as a consequence $44 of that expense represents the price paid for this idle capacity.

Looking at the two rows associated with channels one and two, it can be seen that their individual cost per hour is $25. Channel one evidently connects both application and operating system files to the core memory since there are rates shown in both of these categories. As a result, there is an operating system use as well as an application use for the employment of channel one. These figures are $0.75 and $4.50 respectively. On the other hand, channel two apparently connects no operating system files to the core memory, and therefore the operating system rate and associated operating system dollar use for this channel are both zero. The figures indicated in the row for the disk show that the operating system and application files are both resident on that device, whereas the tape equipments all indicate that only application programs make reference to tape files.

Overall, each equipment of the configuration is associated with a total cost, a portion of which is attributed to application use. In one sense, the best possible system is that one in which the user, through applications programs, derived a maximum dollar benefit. To measure the degree of dollar benefit derived by users from the system, all dollar use values in the application category are added together.
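The proportional cost split described above is simple arithmetic; the following Python sketch (ours, not the paper's - the function name is invented) reproduces the $90-per-hour CPU example.

    # Illustrative sketch (not from the paper): distributing an equipment's cost
    # per unit time over the four use categories in proportion to its load rates.

    def dollar_use(cost, rates):
        """cost: total cost of the equipment per unit time.
        rates: load rate in each category; the four rates sum to 1.0."""
        return {category: round(cost * rate, 2) for category, rate in rates.items()}

    # The $90-per-hour CPU example from the text: 50% application, 20% operating
    # system, 5% cycling and 25% idling.
    print(dollar_use(90.0, {"application": 0.50, "operating_system": 0.20,
                            "cycle": 0.05, "idle": 0.25}))
    # {'application': 45.0, 'operating_system': 18.0, 'cycle': 4.5, 'idle': 22.5}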

86 .

              $ per Hour    Appl. Rate   Appl. $ use   O/S Rate   O/S $ use   Cycle Rate   Cycle $ use   Idle Rate   Idle $ use

CPU 90 .60 54.00 .20 18.00 .05 4.50 .15 13.50

MEMORY 110 .30 33.00 .02 2.20 .28 30.80 .40 44.00

CHANNEL 1 25 .18 4.50 .03 .75 .54 13.50 .25 6.25

CHANNEL 2 25 .26 6.50 0 0 .34 8.50 .40 10.00

DISK 40 .18 7.20 .03 1.20 .54 21.60 .25 10.00

TAPE 1 40 .11 4.40 0 0 .15 6.00 .74 29.60

TAPE 2 40 .09 3.60 0 0 .11 4.40 .80 32.00

TAPE 3 40 .06 2.40 0 0 .08 3.20 .86 34.40

TOTALS $410 $115.60 $22.15 $92.50 $179.75

THE DISTRIBUTION MATRIX

FIGURE 1

In the example this is $115.60. Thus, of the total $410 per hour cost for the system, application jobs use these dollars to the extent of $115.60 for problem solving. Adding up the operating system dollar use values we see that $22.15 of the $410 cost per hour is necessary to produce the results of application program execution. Furthermore, the example shows that a total of $92.50 was expended in cycling the various facilities of the system. This must be accounted as a cost for producing user results as well. Finally, the total cost of the idle facilities is $179.75.

A CONFIGURATION EXAMPLE

As mentioned earlier, the concept of dollar effectiveness is to be applied system wide. In order to derive measures that conjoin the effects of each of the equipment categories over each of the loading categories, we demonstrate the application of the dollar effectiveness technique to a system configuration problem.

We will first consider a system having the "Model A" mainframe, with two channels, a disk set consisting of a number of disk packs on a common controller, and two tapes. The configuration is shown in Figure 2. It also includes a multiplexor connecting a printer and a card reader. As the figure shows, both the application programs and the operating system have files on the disk set, and therefore there will be a degree of conflict between the application jobs and the operating system for references to the disk set. On the other hand, channel 2 connects the two tapes through the common controller and thereby makes application program files available. In particular it should be noted that the operating system has no files on these tapes and therefore makes no reference to them.

For the second configuration the same mainframe will be employed but an additional disk set and channel will be added. These additional equipments will be used for files associated with the application programs only. In the third configuration these two channels and associated disk sets are retained, but the mainframe is changed from a Model A to the slower "Model B".

[Figure 2 diagram: the Model A mainframe with CH 1 to DISK SET 1 (O/S and application files), CH 2 to tapes T1 and T2 (application files), and MPLX to a printer and card reader.]

SYSTEM #1

FIGURE 2


We will consider the distribution matrix for each of these configurations. However, for the sake of simplicity we have excluded all details of controller performance, assuming instead that since there is one controller per channel, the controller performance is sufficiently represented as part of the channel cost. The figures for the multiplexor and its attached devices have also been left out of the examples. This is done in the interests of simplicity and economy of discussion. However, in a formal analysis, all equipment categories should be represented, particularly where their individual costs may be a significant fraction of the whole. Finally, the several disk packs of a unit have been brought together under a single heading and designated as a disk set. That is, in the discussions that follow they are to be treated as a composite entity. A more accurate analysis would, of course, separate these, but this simplifying representation is sufficient for our purposes here.

Benchmark Loads

If a comparison of performance between systems is to be conducted, a standard load characterizing the installation's activities must be imposed on both of them. Such loads are usually referred to as benchmarks.

Benchmarks are most often in the form of programs. However, in (1) a rather different technique is put forward that involves CPU, memory and data transfer loads. Whichever the method, the basic objective is the same; namely, to establish a standard load that is reasonably descriptive of the way in which the computer is employed.

It is just such a benchmark load that is required for a dollar effectiveness evaluation. This benchmark will be the same for all systems studied and, when imposed, will create loads in all of the equipment categories.

System 1

The first configuration, shown in Figure 2, is referred to as System 1, and Figure 3 is the distribution matrix for this system. In this figure all equipments of the configuration, with the exception of the controllers, multiplexor, printer and card reader, are listed in the left hand column. The next column lists all of the costs for these equipments, showing them as monthly figures. Thus, the CPU cost is $6740.00 per month, the memory $10,060.00 per month, the channels $700.00 per month, and so on. The total monthly cost is $21,964.00.

The next four column pairs represent each of the four use categories for these equipments. For example, the rate at which the CPU was employed for application jobs is .432. Therefore, of the total $6,740.00 cost for the CPU, application jobs have gotten .432*6740=2912 dollars of use per month.

With the operating system rate (O/S) at .212, the operating system use of the total CPU dollars is $1429.00. In the last two column pairs, we see that the cycle use and idle use of the CPU, in dollar terms, are $613.00 and $1786.00 respectively.

The distribution for the $10,060.00 per month cost of the memory is along the same lines, based on the rate figures shown. It might be noted that the operating system memory use is particularly high, suggesting that the operating system is of extreme size. The next largest of these is for cycle use, followed by the application and idle use, in that order. The size of the cycle figure, in fact, suggests that users are requesting larger blocks of storage than they can employ.

Consider now the figures for channel 1 and disk set 1. In the application category the rate at which application jobs employed the disk was .297. However, channel 1 was employed at a rate of .231. The disparity in these rates can be accounted for in terms of seek operations on the disk packs. That is, during seek activity there were periods when no data was being transferred over the channel and therefore the application rate for the channel in that period is zero. It should be noted that the operating system rate for disk set 1 is .126 and that there is also a reduction in the operating system rate for channel 1 as well. Now consider the cycle rate for disk set 1. This indicates that 31.2% of the time, files on this disk set were active but unused. Looking at the cycle rate for channel 1, the figure .384 represents an increase over the cycle rate for the disk set that is equal to the decrease in the channel rates for application and operating system together. Thus, the decreased utilization of the channel in data transfer activities has turned up as increased cycle rate.

It should be noticed that the idle rate for channel 1 and its disk set is equal to the idle rate for the CPU. The implication is that whenever an application job was present in the system the programs executed by that job had active files on the disk set.

CPU 6740 .432 2912 .212 1429 .091 613 .265 1786

MEMORY 10060 .248 2495 .375 3772 .261 2626 .116 1167

CHANNEL 1 700 .231 162 .120 84 .384 269 .265 185

CHANNEL 2 700 .205 144 0 0 .203 142 .592 414

DISK SET 1 2624 .297 779 .126 331 .312 819 .265 695

TAPE 1 570 .091 52 0 0 .076 44 .833 474

TAPE 2 570 .114 65 0 0 .127 73 .759 432

TOTALS 21964 6609 5616 4586 5153

DISTRIBUTION MATRIX

SYSTEM 1

FIGURE 3

This is not the case with tapes 1 and 2, however, since their idle rates are both a good deal higher than the idle CPU rate. This means, of course, that some jobs did not execute application programs having active files on the tapes. On the other hand, the idle rate for channel 2 is .592 and this is significantly lower than the idle rates for either of these tapes. The reason for this becomes clear if we consider the application rates for the channel and the two tapes. The channel 2 application rate is .205 and is the sum of the two application rates for the tapes. Thus, the tapes were necessarily utilized sequentially since there is only a single channel. However, the cycle rate for channel 2 is .203, and this is the sum of the cycle rates for these two tapes. This means that files were active but unused on these tapes "sequentially"; i.e., at no point did any job execute requiring the files on both tapes, and furthermore at no time were there two tape jobs executing concurrently. Thus, the cycle rate for channel 2 is maximum while its idle rate is minimized.

Measures of Effectiveness

By summing each of the dollar use columns in the distribution matrix for System 1, we arrive at total dollar use figures in each of the four categories. Thus, of the total $21,964.00 per month cost for the system, $6609.00 of application use is obtained. Operating system use is $5616.00, and the cycle use is $4586.00. These two together represent an overhead dollar use of $10,202.00. Finally, out of the $21,964.00 of monthly cost, a total of $5153.00 is idle.

Now consider the ratio of the total application use of $6609.00 to the total monthly cost of $21,964.00. Expressed as a percentage, we have that 30.0% of the total cost was used for application job execution. On the other hand, the $10,202.00 of overhead cost, combining operating system and cycle use dollars, represents a 46.4% use of the total monthly dollars expended. And, by similar calculation, 23.5% of the total dollars of system cost are idle.

We will refer to the ratio of application use to total cost as the system's "dollar effectiveness". The ratio of the sum of operating system and cycle dollars to the total cost will be referred to as the "effectiveness cost", and the ratio of the total idle dollars to total cost as the "excess capacity cost". We will develop from these effectiveness figures a basis for comparison of systems, or configurations of a given system. It is toward this objective that we are working in the sections to follow.

System 2

The configuration for System 2 augments that of System 1 by the addition of a second disk set and an associated channel. Figure 4 shows this configuration. Note that the mainframe remains a Model A, and that disk set 1 is now devoted entirely to the operating system files, while all application program files have been moved to disk set 2. The new channel is referred to as channel 3, and the configuration retains channel 2 and attached tapes, as well as the multiplexor and printer/reader combination.
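A short Python sketch (not from the paper; the names are ours) of the three ratios defined above under Measures of Effectiveness, using the System 1 column totals from Figure 3:

    # Illustrative sketch (not from the paper): the three effectiveness ratios
    # computed from the column totals of a distribution matrix.

    def effectiveness_ratios(application, operating_system, cycle, idle):
        total = application + operating_system + cycle + idle
        return {
            "dollar effectiveness": application / total,               # application use / total cost
            "effectiveness cost": (operating_system + cycle) / total,  # overhead dollars / total cost
            "excess capacity cost": idle / total,                      # idle dollars / total cost
        }

    # System 1 column totals from Figure 3.
    for name, value in effectiveness_ratios(6609.0, 5616.0, 4586.0, 5153.0).items():
        print(f"{name}: {value:.1%}")
    # roughly 30%, 46.4% and 23.5% - the figures quoted in the text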

[Figure 4 diagram: the Model A mainframe with CH 1 to DISK SET 1 (O/S files), CH 3 to DISK SET 2 (application files), CH 2 to the tapes, and MPLX to a printer and card reader.]

SYSTEM #2

FIGURE 4

The distribution matrix for System 2 is shown in Figure 5. The equipment list in the left hand column has been modified to include the new channel and disk set along with the additional cost of these requirements. More specifically, the equipment costs have increased by the $700.00 for the third channel and the $2624.00 for disk set 2, to a total of $25,288.00. But this column of figures is not the only place where there is an effect in the distribution matrix. The addition of data transfer capability will also have a general effect that will be reflected in the dollar effectiveness figures.

The first observation is with regard to the cycle rate for the CPU. With the installation of the additional peripheral devices there is a reduction in data transfer conflicts between the operating system and the application jobs, and this results in a lower cycle rate. As the figures show, the cycle rate goes from .091 down to .087. However, since the test is made while executing the same set of jobs as used in System 1, and since the period of observation is kept constant, it is clear that the CPU rates for both the application and operating system jobs in System 2 must be the same as that in System 1. Therefore, the decrease in the cycle rate from System 1 to System 2 appears entirely as an increase in idle rate in System 2.

The effect on the cycle rate of the equipment reconfiguration is rather small because the application jobs executed for these experiments do not create a high volume of data transfer. In fact the positive effect of lower cycle rate for the CPU is lost to higher cycle rates and associated cycle dollars for the peripheral equipment. This can be seen by looking at Figure 5 for disk set 1. The effect of the reconfiguration has been to make the operating system files available to the operating system jobs without interference from application jobs. On the other hand, the only activity on this disk set is with respect to operating system files in order to operate the application jobs. Thus disk set 1 is idle only when there are no application jobs in the system, and therefore the idle rate is .269. Note that the data transfer load for this disk set remains at .126, representing the data transfer load created by the operating system in making reference to its files. Since there is no application job reference to this disk set in the configuration, it follows that the remaining load appears in the cycle category, and as the distribution matrix shows, the cycle rate for this disk set is .605.

Now consider disk set 2. This disk set is devoted entirely to the files for programs executed by application jobs, and therefore the data transfer load created on this disk set will be the same as in System 1 since these are the same jobs. The rate for the application jobs is therefore .297. However, the absence of operating system data transfer load on this disk set means that the remainder of this capacity must be distributed between cycle and idle loads. For this configuration, the cycle rate for disk set 2 is .481.


              Cost      Application         OVERHEAD: O/S       OVERHEAD: Cycle      Idle
                        Rate      Use       Rate      Use       Rate      Use        Rate      Use

CPU 6740 .432 2912 .212 1429 .087 586 .269 1813

MEMORY 10060 .246 2474 .375 3771 .259 2605 .120 1210

CHANNEL 1 700 0 0 .121 85 .610 427 .269 188

CHANNEL 2 700 .205 144 0 0 .203 142 .592 414

CHANNEL 3 700 .233 163 0 0 .222 155 .546 382

DISK SET 1 2624 0 0 .126 331 .605 1588 .269 705

ni<5K SET 2 2624 .297 779 0 0 .222 583 .481 1262

TAPE 1 570 .091 52 0 0 .076 43 .833 475

TAPE 2 570 .114 65 0 0 .127 72 .759 433

TOTALS 25288 6589 5616 6201 6882

DISTRIBUTION MATRIX

SYSTEM 2

FIGURE 5

Thus we see that the reduction in cycle rate of the CPU, as a consequence of the addition of the new disk set and attendant channels, must be associated with some increases in both the cycle and idle rates of these equipments and, as the figures of the distribution matrix show, with significant additional costs in these categories.

A similar situation applies to the channels of this configuration. As the distribution matrix shows, there is no longer an application rate for channel 1, while there is a .121 operating system rate and an idle rate of .269. This means that the remaining capacity for the channel is concentrated in its cycle rate, and that is .610. It should be noted that the idle rate for the channel is equal to the idle rate for the CPU. This channel is associated with the operating system files, which are active whenever there is an application job in the system.

Channel 3, the new channel of the configuration, is associated with disk set 2 and the files of the application jobs. We see that the application rate is .233 for this channel, with the difference between this rate and that for disk set 2 being attributed to seeking activity on the disk in which the channel is not involved. The cycle rate for this channel is .222 and its idle rate is .546. Thus, the reconfiguration has added dollars in the cycle and idle categories for the channels as well.

The overall effect of the reconfiguration can be appreciated in terms of the dollar effectiveness measures. By summing each of the dollar use columns in our four categories and making the effectiveness calculations that were applied earlier to System 1, we find that the dollar effectiveness is 26.1% and that the effectiveness cost is 46.7%. This is derived from the sum of the operating system and cycle use figures, and is significantly different for System 2 simply because the improvement in the mainframe cycle rates and associated dollars is more than offset by the increase in the cost of cycled peripheral equipment. The indication is, of course, that too much of the increased dollar capacity of the system has gone into overhead use, while the application use of these dollars has remained almost the same. To put it differently, the dollar effectiveness has dropped while the effectiveness cost has increased. Finally, our calculations also show that a significant portion of the increased cost of the system has gone into idle dollars, since the cost of the excess capacity is computed to be 27.2%.

System 3

It was stated earlier that the amount of CPU activity relative to the data transfer activity is high. Yet, as the distribution matrix figures show, there is a significant amount of idled CPU load remaining in the period of observation.

Let us suppose that our long range interests are potentially satisfied by an excess storage capability represented by the configuration of System 2. On the other hand, if our processing load at the CPU is not expected to rise appreciably, it is reasonable to consider changing the configuration of System 2 to the extent that it involves a slower but less expensive CPU. We refer to this configuration, shown in Figure 6, as System 3. The slower CPU is referred to as the "Model B", but the remainder of the configuration is unchanged from that of System 2. That is, we retain the two disk sets and the tapes with their associated channels. The application and operating system files are separated on the disk sets as before.

The major property of System 3 is, of course, the change in the CPU from a Model A to a Model B. There will consequently be a number of effects since the CPU for Model B is approximately one half as fast as the one for Model A, and its cost is approximately 1/3 less. This gives a total cost of $23,368.00 as compared to $25,288.00 for System 2.

[Figure 6 diagram: the Model B mainframe with CH 1 to DISK SET 1 (O/S files), CH 3 to DISK SET 2 (application files), CH 2 to the tapes (application files), and MPLX to a printer and card reader.]

SYSTEM #3

FIGURE 6

Both the rate and the dollar effects are immediately apparent in the first line of Figure 7, showing the distributions for the CPU. As can be seen, the total monthly cost for the CPU is $4,820.00. The application rate has jumped to .603, the operating system rate to .296, while the cycle rate has fallen to .013. The idle rate, representing the remainder capacity for the CPU, is nearly at zero.

As a consequence of the slower CPU there will be an extension in the execution time of application jobs, so that there will also be a corresponding increase in the memory loads associated with the application jobs. Thus, the memory application rate is .270, the operating system rate .412 and the cycle rate .284. These increases have been derived from the remainder capacity of the memory, so that the memory idle rate is at .034.

The data transfer loads and associated rates for the application jobs have remained the same in this configuration. This is also true for the operating system data transfer load relative to disk set 1 and the associated channel. However, because the application jobs are present in the system for a greater execution time, the files associated with these jobs are active for a correspondingly greater time period. Because the application load has remained the same, however, this implies an increase in the cycled load and associated cycle rates. The distribution matrix indicates that these cycle rate increases are significant.

Again, each of the dollar use figures is totaled at the bottom of the distribution matrix. This shows the dollars of application use at $6,825.00, while the total overhead dollars are $13,968.00 and the total idle dollars are $2,575.00. Our effectiveness calculations then show that the dollar effectiveness is 29.2%, the effectiveness cost 59.8%, and the idle cost 11.0%.

ABSOLUTE EFFECTIVENESS

The sense in which we have been pursuing the concept of dollar effectiveness is related to the degree of effective use, or application, that is derived from the total dollars spent on the system. Thus, the dollar effectiveness figure that was defined earlier is a measure of the percentage of these total dollars that are effective in problem solving. But to compare the performance of two systems on a dollar basis requires that we take into account not only the effectiveness of the system's cost, but also the specific cost to achieve that effectiveness. That is, a high dollar effectiveness at a high effectiveness cost implies a low excess capacity cost. If the two systems under consideration have equal total cost, then clearly the higher the dollar effectiveness and the lower the effectiveness cost, the better the effectiveness per unit cost.


              Cost      Application         OVERHEAD: O/S       OVERHEAD: Cycle      Idle
                        Rate      Use       Rate      Use       Rate      Use        Rate      Use

CPU 4820 .603 2906 .296 1427 .013 63 .088 424

MEMORY 10060 .270 2716 .412 4145 .284 2857 .034 342

CHANNEL 1 700 0 0 .121 85 .791 553 .088 62

CHANNEL 2 700 .205 144 0 0 .616 431 .179 125

CHANNEL 3 700 .233 163 0 0 .597 418 .170 119

DISK SET 1 2624 0 0 .126 331 .786 2062 .088 231

DISK SET 2 2624 .297 779 0 0 .552 1449 .151 396

TAPE 1 570 .091 52 0 0 .096 54 .813 464

TAPE 2 570 .114 65 0 0 .165 93 .721 412

TOTALS 23368 6825 5988 7980 2575

DISTRIBUTION MATRIX

SYSTEM 3

FIGURE 7

We can formalize this expression by dividing the dollar effectiveness figure by the effectiveness cost figure for each of the computing systems under consideration. We can then normalize their different total costs to derive a measure that shall be referred to as "absolute effectiveness", which is useful in the comparative evaluation of systems.

Sample Comparisons

To demonstrate the absolute effectiveness idea, we have listed the three effectiveness computations for each of the three systems in Figure 8. This shows dollar effectiveness, effectiveness cost, and excess capacity cost for each of the systems and, with these figures, several other figures that are derived from them. Let us first, however, recapitulate some of the evaluation parameters that we established in the development of this configuration problem.

We began with a given load of jobs which was seen to be relatively "compute bound", and with a significant amount of idled load available in our first system, designated as System 1. We then adjusted the configuration to System 2 and stated that an excess data transfer capability was desirable for future requirements. It was presumed that the CPU requirement would not appreciably increase, and we considered a lower cost configuration having a slower CPU that was designated as System 3. For the scope of the present problem, we will consider the requirement for increased data transfer capability, used to justify the configuration of System 2, as being of critical interest, so that we now seek to evaluate the relative merits of System 2 and System 3 in dollar effectiveness terms.

Note first that the dollar effectiveness for these two systems is 26.1% and 29.2% respectively, while their respective effectiveness costs are 46.7% and 59.8%. By inspection of these figures one can see that the improvement in dollar effectiveness in System 3 is more than offset by a coincident increase in effectiveness cost. For System 2 this means that for every dollar spent to run the system (overhead), 55.8 cents were employed in problem solving. On the other hand, for System 3, only 48.8 cents of each dollar was employed in problem solving. To put this another way, the ratio of dollar effectiveness to effectiveness cost is best when maximum. This occurs when the dollar effectiveness is maximum and the effectiveness cost is minimum. Hence the effectiveness per unit cost represents a measure of application use relative to the cost of system operation. However, effectiveness per unit cost does not take into account the total cost of the system. That is, the dollar effectiveness and effectiveness cost values are both derived relative to total system cost, so that their ratio removes this cost factor. If, therefore, the effectiveness per unit cost is divided by the total cost, we will get a figure that is referred to as "unit effectiveness per dollar", or more simply as "unit effectiveness".

                                          System #1        System #2        System #3

DOLLAR EFFECTIVENESS (A) 30.0% 26.1% 29.2%

EFFECTIVENESS COST (B) 46.4% 46.7% 59.8%

EXCESS CAPACITY COST 23.5% 27.2% 11.0%

EFFECTIVENESS/UNIT COST (C=A/B) .646 .558 .488

TOTAL COST (D) $21,964 $25,288 $23,368

EFFECTIVENESS/DOLLAR (10,000 C/D) .294 .220 .208

ABSOLUTE EFFECTIVENESS

FIGURE 8

This unit effectiveness is actually the dollar effectiveness divided by the product of the effectiveness cost and the total cost. Thus the higher the dollar effectiveness and the lower the effectiveness and total costs, the greater will the unit effectiveness figure be.

More broadly, if unit effectiveness represents the number of dollars of use relative to the dollars of cost, then unit effectiveness measures this on a per dollar basis, and as such it is a figure of merit that is directly comparable for two systems of differing total cost.
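The derived figures of merit can be expressed in a few lines of Python. The sketch below is illustrative only (the function name is ours); it uses the Figure 8 values for Systems 2 and 3, so small differences from the printed figures are rounding.

    # Illustrative sketch (not from the paper): effectiveness per unit cost and
    # the scaled "unit effectiveness per dollar" figure defined above.

    def absolute_effectiveness(dollar_effectiveness, effectiveness_cost, total_cost):
        per_unit_cost = dollar_effectiveness / effectiveness_cost      # C = A / B
        unit_effectiveness = 10_000 * per_unit_cost / total_cost       # per-dollar figure on a convenient scale
        return per_unit_cost, unit_effectiveness

    print(absolute_effectiveness(0.261, 0.467, 25_288))   # System 2: roughly .558 and .220
    print(absolute_effectiveness(0.292, 0.598, 23_368))   # System 3: roughly .488 and .208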

In the table of Figure 8, effectiveness per unit cost for System 2 is .558 and for System 3, .488. Dividing these by their respective total costs and multiplying the result by 10,000 to place the dividend on a convenient scale gives the unit effectiveness value .220 for System 2 and .208 for System 3. Thus, the cheaper system does not provide as much performance for the dollar as the more expensive one. However, we may not interpret this as meaning that for accomplishing the same job, as represented by the CPU and the data transfer loads exerted by application jobs on the two systems, the performance of System 2 relative to its total cost of $25,288.00 is better than that for System 3 at its total cost of $23,368.00. That is, the total number of dollars that are unused, and which is referred to in Figure 8 as the excess capacity cost, is significantly higher in System 2 than in System 3. This means that in System 2, 27.2% of the total cost of the system is unused and represents dollars that are available for employment. Of course not all of these dollars can be turned into application dollars. However, the greater this excess cost figure is, the greater the potential for conversion into application dollars. This implies, then, that for System 3 the 11.0% excess capacity cost figure leaves little room for such conversion. Furthermore, since the CPU capacity of System 3 is almost entirely employed by the current job load, employment of the available data transfer capacity, represented by cycled data transfer loads, will have to be realized by increasing the data transfer loads created by the current jobs. This means changing the nature of these jobs, which provided one of the parameters of our evaluation problem as stated earlier. This would not be the case for System 2, however, since the excess CPU capacity implies that we might run our current jobs plus new jobs that create additional data transfer load.

Lost Dollars

In order to be able to employ idled peripheral equipment load we must have some idled CPU load available. That is, to use the peripheral equipments it is necessary to first execute instructions on the mainframe. Therefore what of the case where the rate of utilization of the CPU leaves the idled CPU load at or near zero? This is effectively the circumstance in Figure 7, where the idled CPU rate is only 8.8%. For all intents and purposes we may say in this situation that there is virtually no more CPU "room" for the execution of additional jobs, and as a consequence the various idled rates for the peripheral equipment represent wasted assets and their associated idled dollars are wasted as well. We will refer to these as "lost dollars".

Of course the same may be said for the memory loads that cannot be utilized because there is no more available CPU capacity. This is hardly a question in the example of Figure 7 since the idled memory rate is only 3.4%, and this suggests that even if there were an available idled CPU rate there is no room in the memory for the execution of any further jobs.

94 raises the argument that even though APPENDIX idled load isn't available at the CPU for the introduction of additional jobs, there In this section we turn our atten- may be a significant cycle load available. tion to a more detailed investigation That is, by increasing the average degree of the data transfer characteristics of of multiprogramming we might indeed be the equipment in the data transfer chain. able to have its CPU load requirements These equipments will include the channels, satisfied by the available cycled CPU the controllers, and the devices. The ob- load. Of course for this to have any jective is to define categories of use for effectiveness whatsoever it is necessary these equipments so that these categories that the new jobs introduced be CPU only, are compatible with the application, opera- or highly CPU bound. Such jobs do not ting system, cycled and idled categories use the peripheral equipments and as a defined for the CPU and memory. consequence we will still have our idled load and again the same lost dollars. Channels and Active Files In summary then, once the idled CFTJ rate gets particularly low (and/or the The channel is considered as a idled memory rate) we can expect that necessary connector between file and core idled capacity in the peripheral equip- memory. This implies a connection to a ment translates into lost dollars. particular device or set of devices. We define the channel as being idle when The significance of lost dollars is either one of two conditions exist. The that they represent money paid for the first of these is that there are no jobs configuration which can never be utilized. at all in the system; the second condi- If tuning has been accomplished so that tion is that no job currently present in both the main frame and peripheral equip- the system executes a program that makes ments are operating optimally for the reference to any file on any device con- loads we impose, then it follows that nected by the channel. The condition for little short of a juggling of the job an idle channel is of course obvious when schedule can do much to recover any of the device has a removable media, such as these lost dollars. In fact, it is most a magnetic tape or disk pack. In such likely the case that modification of a cases, when no media is mounted, the de- current schedule that produces satisfac- vice clearly has no file present that will tory thruput and turnaround will modify be referenced by a program execution. For these performance figures of merit as devices with permanently mounted media, well, so that in exchange for fewer lost however, such as a drum, the files carried dollars the installation pays the price by the device are permanently available. of lesser technical performance. In this case, the criteria is whether or not that availability is of interest to any job currently executing in the system. When some job in the system executes a CLOSING REMARK program making reference to a given file, we assume that a data transfer reference The dollar effectiveness approach to to that file is potentially likely and performance analysis is designed to give that the job could not run with out the presence of the file on the device. By performance management a means by which the total effectiveness of the system extension, then, the job could not run without the presence of the channel for over all equipment categories - CPU, memory, channel, controllers, and peri- communicating the data of that file to the core memory. 
Thus the channel is pheral devices can be measured. It furthermore can be thought of as an deemed necessary when a file associated overview approach to technical performance with a given job execution is present on peripheral device, and in that it relates performance in all of the associated It is time these equipment categories in terms of unnecessary otherwise. the distribution of the various loads im- period over which the latter condition posed on the system by applications, and exists that is referred to as "idled time, and we will such the loads that occur as a result. This channel" call therefore provides an opportunity for files "inactive". On the other hand, performance management to gauge the value files associated with current jobs will of future needs; i.e., growth capability, be called "active". and to take this into account when plan- ning the acquisition of new equipment to Consider now the case where the channel connects core memory to a device modify the current system, or a complete system to replace it. holding a file referenced by one of the jobs currently executing in the system; under these conditions the channel is not idled. To distinguish between a channel's active time, that is the time it is transferring data, and its inactive

95 but non-idled time, we shall refer to the vice. In fact, if there were a second, latter as "cycled channel" time. The separated group of devices, with only a non-idled, non-cycled time for the chan- single channel connection and a much nel can then be distributed into two cate- higher activity, the meaning of cycled gories. The first of these is the "appli- load on the unused channel relative to cation channel" time, which applies when- the device group with very low activity ever data is transferred between core is reasonably clear. That is, in this memory and the device for an application case the allocated but unused channel job. The second category is "operating facility is not only wasted but mis- system channel" time, and is applied placed, and designation of this unused whenever the data transferred across the facility time as cycled load rather than channel is associated with an operating idled load underlines this situation. system file.

Thus, we have distributed the total Floating Channels time of channel availability into four categories: When the channel is trans- A possible configuration for data mitting data associated with an applica- transfer channels in a system is as tion file, application channel time is "floating" channels. This means that accumulated; when a file associated with any channel not currently involved in the operating system is employing the a data transfer activity may be assigned channel to transfer data, operating sys- to any specified device in the confi- tem channel time is accumulated; if no guration. Under such circumstances all jobs are present in the sys tem or no file channels are creating idled load when on the device connected by the channel is there are no jobs in the system or no associated with any job currently in the files mounted on any devices reachable system, the channel is idled; all time by the floating channels. Suppose now remaining in the period of observation that there are C floating channels, and is therefore cycled channel time. that there are F active files in the system. If the number of active files Suppose now that the channel is is less than the niomber of floating associated with not just one device, but channels, we will say that C - F channels with a set of devices. Such a situation are idled. Otherwise, if the number of occurs when the channel leads to a set active files is greater than or equal of disk packs or a set of magnetic tapes. to the number of floating channels, no It is then sufficient for the purpose of channels will be inactive and no idled channel load definition to consider all channel load is created. Figure 9 de- of the devices that are accessible by picts floating channels under different the channel as a single device. That is, usage conditions. if no device of the group is associated with any job currently active in the sys- Rather than define cycled load di- tem, we will also say that the channel rectly in an environment involving is idled. floating channels, it is better to com- pute the cycled load for all channels as the remainder after the application, Multiple Channels operating system, and idled loads have been accounted for. Thus we can think The same definition of idled channel of channel capacity in terms of, say, load also applies to the case where more so many "channel-seconds". The distri- than one channel is associated with a bution of this capacity must then be over common group of devices. Thus, suppose each of the four load categories, leaving that with a suitable controller design, one of them, in this case cycled load, to two channels could connect to the same be calculated from the rest. group of disk packs. Then both channels are idle whenever no disk pack is mounted or when no files on any of the mounted Multiplexors disk packs pertain to any jobs currently operating in the system. This parallel Multiplexor sub-channels operating configuration, however, does present some in a non-burst mode can be considered as minor difficulty in the definition of a collection of individual floating cycled channel load. That is, suppose channels relative to the devices that they that the level of activity in the system connect. The distribution of the load is sufficiently low so that only one of on the multiplexor channel is then propor- the two channels to the set of devices it tional to the utilization of its sub- is also inactive. Therefore, we must de- channels in the four load categories. 
fine this unused channel capacity as When the multiplexor operates in a burst cycled load. The argument here is that mode, however, it acts as if it were a the definition of cycled load is indepen- selector channel. That is, for the period dent of the degree of activity at a device, of burst mode operation the entire set of or relative to the files held by the de- sub-channels of the multiplexor is con-

96 . .

ACTIVE FILES

F1 F2 F3

3 ACTIVE CHANNELS

9 INACTIVE CHANNELS

ACTIVE FILES

ALL CHANNELS ACTIVE

FLOATING CHANNELS

FIGURE 9 nected to, or are available for, data activity over all channels will be asso- transfer to one specific device. All ciated with the controller, each in its sub-channels, therefore, accrue either own load category. Furthermore, a con- application or operating system load. troller may be busy throughout the period Otherwise the cycled and idled loads of certain operations, while the asso- accumulated by the multiplexor are the ciated channel is cycling. An example same as those defined for its non-burst is a controller for a disk pack that is operation. busy through the time of arm movement. Over this time however, the attached chan- nels may be inactive and cycling. The Controllers

A controller for a single device or The Devices group of devices that is connected to a single selector channel defines its idled The distribution of loads for a peri- and cycled loads in exactly the same way pheral device follows the same pattern as as the channel. The application and that for the other equipments in the data operating system loads to be associated transfer chain between file and core with a controller are, of course, defined memory. As before, we will say that the during the periods of their respective device is idled if no media is mounted on activities. Thus, a controller is idled it, or if no jobs are currently present when all devices with which the controller in the system. The device is also idled is associated are either unmounted or con- if it holds no active files. tain no active files. It is, of course, idled when there are no jobs in the sys- When the device holds active files tem. The controller is active, but un- the load on the device is distributed used, when there are active but currently into application, operating system and quiescent files on its devices. In this cycled categories, according to the same case the controller is creating cycled criteria applied to the channels and load controllers
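As a concrete illustration of this accounting, the following sketch (illustrative Python, not part of the original paper; the function name and parameters are invented) spreads an observation period, expressed in channel-seconds, over the four load categories, taking the idled time of a floating-channel pool as max(0, C - F) channels and leaving cycled load as the remainder.

```python
def distribute_channel_load(period_seconds, num_channels,
                            application_seconds, os_seconds,
                            floating_channels=0, active_files=0,
                            other_idled_seconds=0.0):
    """Distribute total channel capacity (channel-seconds) over the four
    load categories: application, operating system, idled, and cycled.

    Illustrative assumptions: measured application and operating system
    channel-seconds are supplied directly, and max(0, C - F) floating
    channels are taken to be idled for the whole period."""
    total = period_seconds * num_channels
    idled = (max(0, floating_channels - active_files) * period_seconds
             + other_idled_seconds)
    # Cycled load is not measured directly; it is whatever capacity remains
    # after the application, operating system, and idled loads are removed.
    cycled = total - (application_seconds + os_seconds + idled)
    return {"application": application_seconds,
            "operating system": os_seconds,
            "idled": idled,
            "cycled": cycled}

# Example: 12 floating channels observed for 60 seconds with 3 active files.
print(distribute_channel_load(60.0, 12, application_seconds=110.0,
                              os_seconds=25.0, floating_channels=12,
                              active_files=3))
```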

REFERENCE

Leo J. Cohen, System Performance Measurement and Analysis, Performance Development Corp., Trenton, N.J., May 1973.



COMPUTER SCHEDULING IN AN MVT ENVIRONMENT

Daniel A. Verbois

U.S. Army Materiel Command

INTRODUCTION requirement, run time (calculated, estimated or observed), the earliest time the job can be Modern day business data processing systems have started and the latest allowable time for job changed dramatically over the past ten years. completion, percentage of CPU time, devices re- With the advent of third-generation machine quired to support the job, predecessor jobs that power, sophisticated data management and opera- must be completed prior to the beginning of each ting systems, and with more extensive use being job, number of disk tracks that will be required made of teleprocessing, the modern systems are to support transactions into and out of each job, impressive, indeed. But, along with these number of disk tracks that will be free at the advances, a problem has appeared that is signi- successful completion of the job and any special ficantly different from the early days of data conditions that must be considered, such as: processing. That problem is efficient and restrictions on updating the same file by two effective machine scheduling. The difficulty programs simultaneously or the physical mounting lies in the fact that it is not a simple matter of a given file before job initiation. The job to translate human requirements and human thought profile can be established at any level preferred. processes into complex machine requirements in For jobs that will be processed on computer such a way so as to realize the maximum utiliza- systems running under an MFT (Multiprogramming tion of expensive machine resources. The problem with a Fixed number of Tasks) operating system, it is most often solved by simply allowing the compu- is sufficient to define the profile at the job ter to accept jobs on a first-come, first-serve level as opposed to the step level (jobs are basis or by utilizing an unsophisticated software comprised of one or more job steps which are job selection procedure. An example of the defined as that unit of work associated with one former is the practice of loading the job queue processing program) . For those jobs that are to with the jobs that are to be executed over a be processed on machines which operate under an given period of time and executing these jobs in MVT (Multiprogramming with a variable number of a first-in, first-out manner. An example of the Tasks) operating system, it may be convenient to latter is utilization of a feature of the IBM define the profile at the step level. In either 360/Operating System (OS) termed HASP (Houston case, the job profiles can be collected in such a Automatic Spooling & Priority) . This feature is manner that one or more jobs may be scheduled by primarily used in those installations that pro- using an indexing technique. For example, suppose cess large quantities of relatively independent a given application. Weekly Billing, comprises jobs; it is unsuitable for long running mutually five separate jobs. In building the APS data base dependent processes that access integrated files. the five jobs would be individually described, The intent of the following discussion is not to each with a separate name. In addition, the five pursue the pros and cons of the various schedul- job profiles could be "collected" under a separate ing methodologies; rather, it is to describe a name to facilitate their manipulation in a remote program which helps greatly to eliminate many of environment the problems associated with scheduling of the AMC Commodity Command Standard System (CCSS). 
Jobs may be scheduled for completion in any period up to 24 hours in length by entering the job names AUTOMATED PRODUCTION SCHEDULER DESCRIPTION (or collection names) into the computer via a

terminal (see Appendix B) . The APS program will The Automated Production Scheduler (APS) is a mix and match the job profiles in such a way so as COBOL program incorporating many of the basic to produce the optimum sequence of execution. principles of PERT (Program Evaluation and Review Those jobs that cannot be scheduled within the Technique) . The scheduler applies these princi- time frame established are indicated on a separate ples to descriptions of computer jobs to evaluate APS report. the relationship between a variety of activities that compete for resources and time. A computer APS REPORTS job in IBM terminology is defined as "a total processing application comprising one or more Several reports are provided by the scheduler. related processing programs, such as a weekly Each report is optional in the sense that, when payroll, a day's business transactions or the operating the terminal in interactive mode, the reduction of a collection of test date." There- opportunity is provided to print or suppress fore, in order to apply the principles of PERT, printing of each report. The basic reports are: it is necessary to describe the workload for the computer in terms of computer jobs, i.e., job -Status Report (Appendix C) . This report is profiles must be constructed. essentially an extract from the APS data base and reflects major parameters assigned to describe Within the APS data base, each job to be each job selected for scheduling. The option is scheduled is identified by a name that is provided to change any parameter for any job on compatible with the IBM job control language either a temporary (one time usage) or permanent naming conventions (See Appendix A) . For each (physical data base change) basis. name, important operational and scheduling parameters are provided. These are: £ore

99 changes in processing -Unscheduled Runs (Appendix D) . As previously priority tend to disrupt indicated, all jobs that cannot be scheduled with- the best of schedules. Operational environments in the time frame or hardware configuration that are basically unstable may find that pre- provided will be shown on a separate report. scheduling has limited value in defining optimum job sequence. This is because of the bias thrown

-Release Sequence (Appendix E) . This report in by unpredictable run times. Although this is lists the optimum sequence of job release along a disadvantage, it can be compensated for in part with estimated start time (SETUP) estimated time by the workload simulation aspect and in part by of completion (REMOVE) and jobs that must complete the ease in which rescheduling can be accomplished prior to release of a given job (PRED) . A message APS retains the last schedule produced so that it is printed at the end of this report which is possible at any time to enter, through the indicates the degree of multiprogramming that can terminal, the specific conditions at a point in be expected. time and obtain a new schedule in a few minutes (3-5).

-Resource Map (Appendix F) . A listing of resources remaining for each time increment is CURRENT APS USAGE provided by this report, along with a theoretical multiprogramming map. Column headings refer to The USAMC Automated Logistics Management Systems resources remaining available after assignment Agency presently utilizes the APS system in two of those required to support the scheduled jobs. ways. The first way was addressed briefly above; i.e., used as a pre-scheduling planning tool to TERMINAL CONSIDERATIONS insure preproduction considerations are honored. A second and growing use of the scheduler is in

There are a number of commercial scheduling the area of "simulation" (Appendix G) . In this systems available in the market-place that will latter role the scheduler plays an increasingly perform in a manner similar to APS; however, APS important role in the area of equipment configur- is believed to be the only scheduling system that ation predictions. These predictions are made in operates from a terminal. The primary advantage the following manner: in having capability to operate the scheduler from a terminal is the ability to dynamically -The workload requirements of a given program reschedule on an as-required basis without are described in terms of transaction type and interruption of the host computer. volume, disk storage requirements and master file activity estimates. These basic statistics are Before discussing the scheduling/rescheduling utilized to produce program run time estimates. aspect of APS, it would be beneficial to discuss general advantages and disadvantages of auto- -The run time estimates are entered into an matic scheduling. First, it is perhaps APS data base along with other job profile para- unfortunate that the term "scheduler" is used meters described earlier. Model schedules are to describe the subject system. A more accurate then produced which attempt to show a theoretical term would be "pre-scheduler" because, basically, sequence of work that can occur on a given equip- that is precisely what happens. By utilizing ment configuration. automated scheduling (pre-scheduling) we can determine the best job start sequence to most -The reports produced by the scheduler are effectively utilize equipment resources. Auto- analyzed to determine whether additional hard- mated scheduling will also allow determination of ware is required or too much hardware is provided the best time to start "one-time job" after a for an estimated workload at a given location. normal production day has been scheduled. This For example, if the estimated workload cannot be is readily accomplished by examining the resource accommodated on a given configuration, the sched- map to choose the time frame wherein resources uler reports will show all jobs that can be run are available for "one-time jobs." If described and will further list the jobs that cannot be in the APS data base, it can automatically be scheduled under the constraints provided. A fitted into the current schedule by simply quick analysis of the output reports will reveal entering its name through the terminal, thereby that, perhaps the reasons scheduling cannot occur providing the dynamic rescheduling capability. are a limitation or inavailability of some system resource (i.e., core, tape drive, disk drives, A secondary benefit of automatically pre- etc.). The overall effect of the "simulation" scheduling a periodic computer workload is the is a quick look at the impact of additional work- forced attention to detail necessary to properly load on a given equipment configuration. This utilize APS. All too frequently a given work methodology has proven to be extremely accurate period is scheduled in a most cursory manner. By and effective to date with the qualification identifying the requirement to the scheduler, that, as with a "normal simulator", the results more assurance is given that the proper homework are only as accurate as the input provided. is done before a job is entered into the computer for execution. 
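To suggest the kind of information a job profile carries and how a release sequence can be derived from it, here is a deliberately simplified sketch (illustrative Python; APS itself is a COBOL program, and the field names, jobs, and single-stream scheduling rule below are assumptions, not the APS algorithm).

```python
from dataclasses import dataclass, field

@dataclass
class JobProfile:
    name: str
    run_minutes: int                 # calculated, estimated, or observed run time
    core_kb: int                     # core requirement (profile data only here)
    predecessors: list = field(default_factory=list)

def release_sequence(jobs):
    """Deliberately simplified pre-scheduler: produce a serial release order
    in which every job follows its predecessors, with SETUP (start) and
    REMOVE (completion) times accumulated on a single job stream."""
    remaining = {j.name: j for j in jobs}
    finished = set()
    clock = 0
    schedule = []                    # (SETUP, REMOVE, name, PRED)
    while remaining:
        ready = [j for j in remaining.values()
                 if all(p in finished for p in j.predecessors)]
        if not ready:
            raise ValueError("circular or unsatisfiable predecessor chain")
        job = min(ready, key=lambda j: j.run_minutes)   # shortest ready job first
        setup, remove = clock, clock + job.run_minutes
        schedule.append((setup, remove, job.name, job.predecessors))
        finished.add(job.name)
        del remaining[job.name]
        clock = remove
    return schedule

# Example: three jobs of a hypothetical Weekly Billing collection.
jobs = [JobProfile("BILL01", 30, 200),
        JobProfile("BILL02", 45, 150, predecessors=["BILL01"]),
        JobProfile("BILL03", 20, 100, predecessors=["BILL01"])]
for setup, remove, name, pred in release_sequence(jobs):
    print(f"{name}: SETUP {setup:4d}  REMOVE {remove:4d}  PRED {pred}")
```

A real pre-scheduler must, of course, also honor the core, device, and disk-track constraints, the allowable start and completion windows, and the degree of multiprogramming described above.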
For example, one of the uses made APS DATA BASE CONSIDERATIONS of the APS system is to simulate a given scheduling period to determine whether more work As may be noted in Appendix G, a powerful has been planned than can actually be performed. feature of the automated scheduler is the capability to update the scheduling data base Of course, as with all planning, the real world with actual history data. Update is accomplished frequently will not allow exact execution. Un- by using data generated by the System Management expected job failures, equipment failures or Facilities (SMF) feature of the IBM OS. This


APS AVAILABILITY feature automatically records operational para- meters for each job and job step executed. It The APS program described in this paper is a provides all of the basic information included in proprietary package of Value Computing, Inc. the APS data base except the predecessor con- of Cherry Hill, New Jersey. In October 1972, siderations. Not only is the capability to up- the USAMC ALMSA acquired a license for this date the data base to reflect actual history on package. Provisions in the license include the a recurring basis important, but in addition, use of the package throughout the Department of those installations that must build the job pro- the Army without additional charge. file from scratch will find the SMF data a valuable source of information. Since SMF is the The program as it stands under the licensing source of the machine utilization accounting and arrangement was designed around the requirements billing reports, a secondary benefit is obtained. of the cess. In this respect, it is built to The APS data base can be used to serve not only operate under the IBM MVT operating system the scheduler, but also as the repository of Release 20.6 and higher. The access method utilization statistics specified is TCAM (telecommunications access method)

PERFORMANCE EVALUATION OF TIME SHARING SYSTEMS

T. W. Potter

Bell Telephone Labs

Introduction

One of the most severe problems we face today in a new in- house time sharing system is that it becomes overloaded and degraded overnight. If It were possible to determine under Figure 1 what load conditions time-sharing performance degrades Concurrent beyond acceptance then one could estimate when to Users upgrade. More important than knowing when to upgrade 80 is knowing what to upgrade in order to obtain the most cost-effective performance possible. This paper presents techniques that have helped to solve these problems. It begins by describing methods of analyzing how the user is using the time-sharing system. Through these methods one can create reasonable synthetic jobs that represent the user load. These synthetic jobs can be used in benchmarks under elaborate experimental design to obtain response curves. Jan April July Oct Jan The response curves will aid in determining under what load conditions the time-sharing system will degrade beyond acceptance and the most cost-effective method of upgrading. This information together with a reasonable growth study, Response Figure 2 which is also discussed, will keep the time sharing system Time (sec) from severe performance degradation. 20

The response data obtained from benchmarks can be used to create an analytical model of the time-sharing system as well as validating other models. Such models and their usefulness will be discussed.

Some new in-house computer systems can provide batch, remote batch, transaction processing and time-sharing ser- vices simultaneously. These sophisticated systems do not lend themselves properly to performance analysis. There- No. of concurrent fore, the task of keeping the time-sharing system tuned to Users maximum performance or minimum response time is dif- ficult. In order to standardize more stringently the tech- niques available to keep time-sharing response minimal we Suppose that the curve in Figure 2 is based upon a system must separate time-sharing from the other environments or with processor P-j, memory M-j and disk D-]. Then we re- dimensions and present the resulting techniques in this benchmark using the same processor P-] and memory Mi but paper. These techniques can then be extended to the change disk Di to Now we obtain curves like Figure 3. multi-dimensional case.

Techniques Response Figure 3 Time (sec) Let us assume we are given the task of creating a five-year growth plan for an existing in-house time-sharing system.

If we know what kind of users are on the system and how they are using it we would know the usage characteristic or user profiles of the time-sharing system. Now given the estimated growth of time-sharing we could determine the systems reaction to these users.

1 1 1 1 Therefore if we know the growth (Figure 1) and we know how the time-sharing users are using the system at this time 40 60 80

(user profile) then by benchmarking we can determine the No. of concurrent system reaction at higher loads and obtain a response curve Users (Figure 2). This assumes of course we can define response of possi- time. If we re-benchmark using different configurations we We could re-benchmark again changing any number can then see the effect on performance of alternate con- ble hardware or software factors. Hardware factors are dis- figurations. cussed in this paper.

107 The hardware monitor can collect information relating to Let us assume that a hardware monitor has been connected the instruction mix which can be very helpful in reproducing to the processor(s), channel(s), and communication con- the processor requirement. The monitor can also collect troller or processor. This monitor should at least collect information relative to the terminal user in terms of utilization data to determine peak periods of resource average number of characters received/transmitted per user utilization. The monitor should also collect or sample the interaction. Since the monitor can also collect information instructions (op codes) executing within the processor(s). relative to the peak periods of utilization we can determine The monitor could also collect terminal related data by when to study the log files. In order to guarantee per- sampling a register within the communication processor formance one needs to reproduce the peak period loads and which contains information on each terminal I/O. This therefore information on non-peak periods need not be information could consist of the terminal number; whether used in most cases. the terminal was presently sending or receiving data; and the number of characters transferred. The terminal data Typically the log files give information relative to average could also be collected via a software monitor. Now let us memory requirement, average processor time and average assume that there is only one synthetic job which will disk and terminal I/O. Most log files account for the time- represent the entire load. What might we observe as typical sharing processor time, disk and terminal I/O and number of hardware monitor data? commands or languages requested. For example, if Fortran was used then those resources used by Fortran during the In creating the synthetic job we need to determine not only sample period would be recorded. Typically by studying the amount of processor time used but also the type of the commands or languages we can find a small number of instructions executed (scientific or data processing). We them that account for the majority of the resource utiliza- might observe the following hardware monitor data for in- tion. A user survey is very helpful in determining the most struction mix. frequent programming languages, the program size in arrays versus code, and the types and sizes of disk files. The OP Code Usage in Percentage console log usually gives an indication of the number of concurrent users during different periods for an entire day. OP Code Day 1 Day 2 Day 3 Console log information can be used to make sure that the time-sharing system is relatively consistent from day to day. Load 29.6 28.7 30.7 This can be done by graphing the maximum number of concurrent user during each hour of a day and comparing the Store 19.7 19.9 19.4 daily graphs. (See Figure 4C). If large fluctuations do occur then one must compare the user profiles that are created Compare 13.5 10.0 12.1 during the different peak periods and determine which profile to use as a pessimistic representation of the user load Transfer 25.5 24.0 24.9 (a harder load on the system). Arith 7.8 7.9 7.1

Number MISC. 3.9 9.5 5.9 Concurrent Figure 4C Users This data might indicate to the analyst that the users are data processing oriented as opposed to scientific.
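A small reduction of such samples might look like the following sketch (illustrative Python, using the hypothetical percentages above): average the op-code usage over the sampled days and report the data-movement share that suggests a data processing rather than scientific workload.

```python
# Hypothetical op-code usage percentages from three days of hardware monitoring.
op_code_usage = {
    "Load":     [29.6, 28.7, 30.7],
    "Store":    [19.7, 19.9, 19.4],
    "Compare":  [13.5, 10.0, 12.1],
    "Transfer": [25.5, 24.0, 24.9],
    "Arith":    [ 7.8,  7.9,  7.1],
    "Misc":     [ 3.9,  9.5,  5.9],
}

averages = {op: sum(days) / len(days) for op, days in op_code_usage.items()}
for op, avg in sorted(averages.items(), key=lambda kv: -kv[1]):
    print(f"{op:8s} {avg:5.1f}%")

# A crude classification: heavy load/store/compare/transfer relative to
# arithmetic hints at a data processing (rather than scientific) mix.
data_share = sum(averages[k] for k in ("Load", "Store", "Compare", "Transfer"))
print(f"Data-movement share of instruction mix: {data_share:.1f}%")
```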

Schematically, we have used the monitor to collect data Day 1 on the instruction mix:

Hardware Monitor Day 2

1 1 1 1 1 1 I 1 8 9 10 11 12 13 14 15 16 17 Instruction TIME Mix

These four major tools (monitors, log file, surveys and Now by obtaining terminal data we could collect the fol- console logs) now allow us to create one or more synthetic lowing distributions on each terminal: jobs that reasonably represent the user load. A synthetic job is like any other job in that it requires a programming Percent language, processor time, disk and terminal I/O, and memory of requirements. If more than one programming language has Input large usage then more than one synthetic job is required. Let us work through an example on how these tools are used to create synthetic jobs. Data contained within this 10 20 30 40 50 example are purely hypothetical. Characters per input on Terminal 1

108 Let us assume that disk D2 produces the desired minimum From Figure 1 we see that the additional module of memory response time. Response time is defined as the time it takes should be on the system by January. the system to respond with the first printable character after the user returns the carriage on his terminal. We shall This process of benchmarking and considering alternative assume that the system was originally engineered for configurations can be continued over and over until a reason- minimum response time of say 2.5 seconds. This assumption able five-year plan takes shape. creates a number of unanswered questions. What expenses are associated with the best response time as compared to Therefore given the usage characteristics of the present

'good' response time (2.5 seconds vs. 5 seconds)? At what in-house time-sharing users and the grovrth estimates we response time does the user become irritated? The answers can find the systems reaction and produce a rgowth plan. to these questions might allow the system to be engineered in a more cost-effective manner. This can be represented schematically as:

Let us assume that an average response of ten (10) seconds or greater irritates the users. Our goal then in this five-year User growth plan is to keep average response time below ten seconds. Note that human factor methods must be utilized Profile in order to obtain the unknown irritation level.
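Turning the irritation threshold into an upgrade point can be as simple as interpolating the benchmarked response curve, as in this sketch (illustrative Python; the benchmark points are invented).

```python
def upgrade_point(curve, threshold_seconds=10.0):
    """Given benchmark points (users, mean response seconds) in increasing
    user order, return the interpolated number of concurrent users at which
    mean response time first reaches the threshold."""
    for (u0, r0), (u1, r1) in zip(curve, curve[1:]):
        if r0 <= threshold_seconds <= r1:
            # Linear interpolation between the two bracketing benchmark points.
            return u0 + (threshold_seconds - r0) * (u1 - u0) / (r1 - r0)
    return None   # threshold not reached within the benchmarked range

# Hypothetical response curve for disk D1: (concurrent users, seconds).
curve_d1 = [(20, 2.5), (40, 8.0), (50, 12.0), (60, 18.0)]
print(upgrade_point(curve_d1))   # about 45 users, the point to upgrade
```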

This irritation level would indicate in Figure 4A that at 45 Growth terminals the configuration needs upgraded possibly from Estimate disk D-) to disk This decision to upgrade is based only on performance. One could base a decision to upgrade on cost-performance, in other words it might be cheaper to Find accept a penalty in dollars for responses beyond 10 seconds System up to say 15 seconds before replacing the D-j with a costly Reaction disk This assumes of course one can attach a price to degradation.

Response Figure 4A The major limitation to such an ideal scheme as outlined Time above is that we usually are not given the usage characteris- tics or the growth estimates.

Therefore, we must determine the user profile and growth estimate and then proceed finding the systems reaction.

User Profile

The user profile or characteristic can be obtained via hardware monitors, log files, survey data and console logs. also used but are not No. of concurrent Note that software monitors can be in the Users addressed in this paper. These tools interrelate following manner:

Now given the magic number of 45 (the point to upgrade).

Figure 1 indicates that the disk D2 should be on the system in October.

Let us assume that the in-house time-sharing system will perform adequately with the new disk D2 until the growth exceeds 72 concurrent users (Figure 4A). In order to keep response time below ten seconds when the number of users exceed 72 we must reconfigure the hardware and re-bench- mark. We might find that an additional module of memory M2 provides the needed response time as in Figure 4B.

Response Figure 4B Time 20

40 60 80 100 No. of concurrent Users

109 On some log files the average program size for each language processor can be obtained while on other log files only the Percent overall average program size exists for time-sharing jobs. of If Fortran and Basic are the most frequently used languages Output then an average size Fortran and Basic job needs to be created as synthetic jobs. Let us assume the log files indicate

_L _L that the average Fortran programs object code is 12K words 50 100 150 200 (K=1024) on some machine and the average BASIC pro-

Characters per output gram's object code is 4K words. We might find that Fortran on Ternninal X is executed n times during the peak period and Basic is

These distributions could be summarized as follows: executed m times according to the log file. With this information and the number of disk l/O's performed per Avg. Avg. execution of Fortran or Basic we might find that Fortran Input Percent In Term No. Output Percent Out has 30 disk I/O and Basic has 50 disk l/O's on the average. 18 100 3.58 4.97 If possible we should separate disk l/O's into input and output. 29 76 2.93 6.11 In time-sharing we often find that program execution only occurs a small percentage of the time (30%) whereas com- mand execution occurs most frequently (70%). Therefore,

we must study all the possible commands (old, new, list, etc.) and select a subset which will account for a large percentage of resource utilization. In many time-sharing systems where the users are data processing oriented the

Total Input - 30% three commands: request a file; edit a file; and list a file, account for the majority of command related resources.

Total Output - 70% This, however, forces the analyst to create three synthetic jobs for the commands and one for each language used.

Weighted Avg Input - 30 characters Therefore, if Fortran and Basic are highly used then there

are two more synthetic jobs or five in all.

Weighted Avg Output - 90 characters Typically, the time-share commands process file and the Collecting this data over three or more days would allow the files can be reproduced. For example listing a file by the analyst to conclude that the terminal user is data processing LIST command can be reproduced by obtaining an average or report writing at his terminal. file size (to recreate the terminal and disk I/O) and the

average processing time required to list the file. Each

In order to obtain detailed data for the synthetic program command is studied in order to reproduce it as closely as we need to determine the peak periods of resource utiliza- possible. Let us assume that five synthetic jobs are created tion. Here we might find that the peak period is from 2:00 which are data processing oriented then the next question is p.m. to 3:00 p.m. each day. We could average the synthetic how are these five synthetic jobs mixed together? In order job characteristics over the sample days or create the job to determine how the synthetic jobs are mixed together we characteristics from the day where most resources were used. use the log files to determine how often each command or Assuming the latter case, we could then extract from the log language is executed. We sometimes see that these five files the peak period data. jobs account for 90% of all system resources but are only
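Reducing the per-terminal distributions to the weighted averages quoted in this section might be done as in this sketch (illustrative Python; the per-terminal figures are invented, since the original table is only partly legible).

```python
# Hypothetical per-terminal summaries: (terminal number, share of all
# interactions, average input characters, average output characters).
terminals = [
    (1, 0.18, 22.0,  75.0),
    (2, 0.29, 35.0, 110.0),
    (3, 0.53, 30.0,  85.0),
]

# Weight each terminal's average by its share of the interactions.
weighted_avg_input  = sum(share * avg_in  for term, share, avg_in, avg_out in terminals)
weighted_avg_output = sum(share * avg_out for term, share, avg_in, avg_out in terminals)

print(f"Weighted average input : {weighted_avg_input:5.1f} characters/interaction")
print(f"Weighted average output: {weighted_avg_output:5.1f} characters/interaction")
# Values of roughly 30 input and 90 output characters, with output dominating,
# would again suggest report-writing or data processing use of the terminals.
```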

referenced 40% of the time. If we assume that these five We need to determine how large the synthetic program are the only jobs on the system and adjust the 40% figure to should be and how much peripheral I/O is generated. Schem- 100%) we obtain the frequency of executing the various atically the log file and hardware monitor relate as follows: jobs. For example, we might find:

Hardware Basic 14% Monitor

Fortran - 10%

Instruction Terminal Peak List - 20% Mix Characteristic Period

Edit - 10%

Log File Old - 46%
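Scaling the observed reference frequencies, which cover only part of all activity, up to a complete mix is a simple normalization, sketched below (illustrative Python; the raw shares are invented so that they normalize to the percentages above).

```python
# Share of all command/language references attributable to each of the five
# synthetic jobs (hypothetical figures; together they cover only 40% of the
# total references seen in the log file).
raw_share = {"Basic": 0.056, "Fortran": 0.040, "List": 0.080,
             "Edit": 0.040, "Old": 0.184}

total = sum(raw_share.values())            # 0.40 in this example
mix = {job: share / total for job, share in raw_share.items()}

for job, p in mix.items():
    print(f"{job:8s} {p:5.1%}")
# The normalized mix (Basic 14%, Fortran 10%, List 20%, Edit 10%, Old 46%)
# is then used as the steady-state probability of being in each synthetic job.
```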

This percentage can represent the steady state probability of being in any one synthetic job and as such represents this job mix. This assumes we can define steady state. From a hardware monitor and the log files we can reasonably create synthetic jobs and determine their mix. There are

110 still a few unanswered questions like what kind of files do different problem. This is the problem of predicting an the programs use (random or sequential)? This question initial cutover load on a new time-sharing system when all and others can be reasonably answered via the user survey. users are outside on commercial service bureaus. Since we The survey helps to fill gaps with synthetic jobs and should assume the in-house system already exists we will present the be designed to find out what the user thinks his program methods required to determine the above regulatory per- does. centage.

The console log allows the analyst to determine how consis- First, obtain past bills for time-sharing services charged to tent the users are from day to day and how often the system your company. Some of these bills will contain information goes down. The later item is important because a false peak on log-on/log-off time by user-id. Other bills will only have period is usually observed when a system regains conscious- dollar resources used and each billing unit will be different. ness and as such should be ignored.

If we were to graph the number of connects on each (if After creating the synthetic jobs by studying the usage possible) of the time-sharing service bureaus by days within characteristics we now must determine the anticipated each month (Figure 5) we can obtain peak days within each growth. month. A connect is where a user logs onto the time-sharing system. Growth Study Figure 5 Total Number An in-house time-sharing system grows because: new appli- Of Connects cations are created on the system; more users become 100 familiar with the in-house system; and outside users (com- mercial service bureaus) are requested by company policy lb — to move in-house whenever possible. In the first two cases the anticipated growrth of time-sharing is directly 50 — related to the amount of money in the budget. Actually we assume that increased budget money means increased usage 25 but we do not actually know whether a $2000/month in- crease equals 10 concurrent users or 2 concurrent users. For this reason we must collect past budget estimates and relate 1 2 3 4 5 . . . them to past usage. For example we might find a 10% August increase in dollar usage in the past year (hopefully budgeted) and a real time-share growth like Figure 4D which might Vendor X indicate a 20% growth in concurrent users. Now by observing, which day in the month was the peak day we can graph the total number of connects by hours on that Figure 4D day (Figure 6). Concurrent Users 100

Total Figure 6 No. of Connects

20 —

15 — J FMAMJ J ASOND Months 10 -

Now our anticipated growth without service bureau users 5 — moving inside is estimated at 20%. What happens to this anticipated growth if outside users are requested or desire to I move inside? 10 11 12 13 14 15 16 17

If this in-house system is regulated in such a manner that the Hours - Busy day - August outside users can only move in-house by going through desig- Vendor X nated channels then a regulatory body can control the per- centage of outside users to move inside. This percentage would be based upon the number of concurrent users from your company on each of the service bureaus. Once this percentage has been obtained then the regulatory body can The hour given on Figure 6 with the largest number of con- limit the number of outside users to move inside or plan the nects should be studied in more detail. If we are given the move over a period of time. The techniques used to obtain log-on time and elapsed time we can compute the log-off the above percentage can be used to solve an entirely time and create the following time diagram:

111 must be scheduled according to some predetermined plan. This plan might indicate that two concurrent users from vendor X-], one from vendor X2 and so on can be moved

inside per month according to available manpower. If all time zones are the same we can assume that our in-house 10:45 log-on peak number of concurrent users will increase by the two 2 11:00 1 concurrent users from vendor X^, one from vendor X2, etc. which move 11:15 users inside per month. The number of concurrent users moving inside now allows us to determine the regulatory percentage 11:30 discussed earlier as say 20%. We now conclude our growth 11:45 estimate with the following growth chart:

12:00 log-off 12:15 Concurrent Users

100 Inside & 80 outside growth 60 y 40 This now says that for the month of August we could see as Inside growth many as six concurrent users from your company on this 20 only at system one time. J L J L I I I F M A M J J A S 0 N D We can obtain reasonable estimate of current outside growth Months like Figure 7 for each vendor by studying their billing data over a four to six month period.
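The peak number of simultaneous users implied by the log-on/log-off times in a vendor's bill can be counted as in the sketch below (illustrative Python; the session times are invented and follow the time diagram above).

```python
from datetime import datetime, timedelta

def peak_concurrent(sessions):
    """sessions: list of (log_on, log_off) datetimes taken from the bill.
    Returns the maximum number of simultaneously connected users."""
    events = []
    for on, off in sessions:
        events.append((on, +1))
        events.append((off, -1))
    # Process log-offs before log-ons that fall at the same instant.
    events.sort(key=lambda e: (e[0], e[1]))
    peak = current = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak

day = datetime(1973, 8, 14)
sessions = [(day + timedelta(hours=10, minutes=45), day + timedelta(hours=12)),
            (day + timedelta(hours=11),             day + timedelta(hours=12, minutes=15)),
            (day + timedelta(hours=11, minutes=15), day + timedelta(hours=11, minutes=45))]
print(peak_concurrent(sessions))   # 3 simultaneous users in this small example
```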

This chart indicates that if the outside growth moves Number of inside and the inside system observes some regular increase Concurrent then the system soon gets out of hand. This could mean that Users the schedule of moving outside users inside namely the

regulatory percentage of 20 is too progressive.

One of the main reasons for discussing outside time-sharing

usage is that a company which has inside time-sharing probably has some outside usage. This outside usage usually grows as does the inside usage since more computer users seem to be turning to time-sharing. For this reason any growth plan for in-house time-sharing would be incomplete Jan without some consideration of outside users migrating inside.

The growth procedure discussed above will give us a reasonable estimate of growth in the immediate future.

However what happens in the long term is never known. This might indicate that your company has an estimated Setting up this procedure will allow you to follow your outside growth of 25% on vendor X. growth and at least estimate out at three to six month intervals. We could now tabulate the outside growth in the following manner: System Reaction

Present Predicted We can find the systems reaction by setting up experiments Service Bureau Concurrent Users Growth on a controlled machine and collecting performance data like response time. Experiments to collect response time data Xi 5 25% can be set up using a terminal emulator which makes another computer look like 'n' terminals. This of course assumes that X2 10 10% we can recreate the load via synthetic jobs in a mix (20%

job 1, 10% job 2, etc.). For example we can start 20terminals executing jobs for say one hour during which we can force synthetic job one to execute 20% of the time.

Xp 7 1 5% The terminal emulator is designed to time stamp and store

all data sent or received onto a magnetic tape for later

If the regulatory group (limits outside to inside moves) has analysis. During the one hour period to run 20 termianis we two people to help move the outside users inside then they must allow the system to stabilize. This stabilization can be

U2 defined in a number of ways, however I find it convenient do we determine how much better disk D2 is than disk in data analysis to run the system long enough for the jobs Di in Figure 3? One way is to eye-ball the curves and esti- to reach a minimum variance on their individual response mate, but this does not account for the variance on each times. For example synthetic job one has a mean response point on the two curves. For example, we will not find at time and a variance. Once the system is stable the variance 40 terminals on disk D2 a response of exactly 4.5 seconds. on response time for synthetic job one is minimal. The time We will find a mean of 4.5 plus or minus some deviation. required to reach minimal variance for synthetic job one Since some variance exists for 40 terminals of D2 some vari- will differ from the other synthetic jobs assuming of course ance also exists for 40 terminals of disk D-]. Could this steady state exists. Note of course the difficulty in observing Indicate that there is no difference between disk Di and a running variance and deciding on steady state. 2 at 40 terminals? It depends on the variance of the two means typically referred to as statistical significance testing.
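One way to make the stabilization and significance ideas concrete is a running mean and variance per synthetic job, with a rough 95% interval around the mean, as in this sketch (illustrative Python; the response values and the stopping rule are assumptions).

```python
import math

class ResponseStats:
    """Running mean and variance of one synthetic job's response times
    (Welford's method), with a rough 95% interval for the mean."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def add(self, response_seconds):
        self.n += 1
        delta = response_seconds - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (response_seconds - self.mean)

    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else float("inf")

    def interval95(self):
        # Normal approximation; adequate only for a reasonably large n.
        half = 1.96 * math.sqrt(self.variance() / self.n)
        return self.mean - half, self.mean + half

stats = ResponseStats()
for r in [4.1, 4.6, 5.0, 4.3, 4.8, 4.4, 4.7]:     # hypothetical responses (seconds)
    stats.add(r)
    # A simple stabilization rule: stop once the variance of the mean is small.
    if stats.n > 5 and stats.variance() / stats.n < 0.01:
        break
print(stats.mean, stats.variance(), stats.interval95())
```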

It is extremely hard to analyze the effect of five synthetic How do we take into account the variance on D2 at 40 jobs and 20 terminals with a measurement criteria like terminals and at the same time 60 terminals since the response time unless one analyzes each synthetic jobs variances will also differ? One way is to create an adequate response time as a function of the mix. The raw data should statistical experimental design for the benchmark so that a always accompany the analysis in the following manner: regression analysis or analysis of variance can be performed

on the data . The advantage of regression analysis is that

Synthetic Job No. 1 it can relate the difference between the two curves in terms Observation of additional terminals at the same response. For example, disk D2 in Figure 3 may give 40 more terminals at the same 2 3 4 N Terminal response time than the disk D-]. The 40 more terminal figure 1 Ml is plus or minus say 2 terminals. This is very useful informa- 2 tion because if disk D2 cost $100 more than disk D^ 3 per month then it cost $2.50/terminal to replace disk Measurement Criteria D^ at 40 terminals with D2. Comparing the S2.50/terminal for the disk to say $5.00/terminal for a memory increase 20 we would find the disk enhancement to be more cost-

effective. This regression analysis can be very useful in

comparing curves but it completely depends on the bench-

mark and how well the experiment was designed. I recom- mend reviewing your design with statisticians prior to This table of raw data indicates that we observed the running the benchmark. Therefore by creating a user load, a response time or measurement criteria for synthetic job one growth estimate and an adequate experimental design we can on each terminal a number of times. Keep in mind that at make a reasonable growth plan for a time-sharing system. the same time synthetic job one is running it is possible for This procedure for a growth plan is repetitive and must be the other jobs to be running. so in order to be reasonably approximate.
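The comparison described can be sketched as follows (illustrative Python; the benchmark points are invented, and chosen so the result matches the 40-terminal, $2.50-per-terminal example in the text): fit a least-squares line to each configuration's response curve, estimate how many additional terminals the new disk supports at the same response time, and convert the monthly price difference into a cost per additional terminal.

```python
def fit_line(points):
    """Ordinary least-squares fit: response = a + b * terminals."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a = (sy - b * sx) / n
    return a, b

def terminals_at(response, a, b):
    return (response - a) / b

# Hypothetical benchmark points (terminals, mean response seconds).
d1 = [(20, 3.0), (40, 8.0), (60, 13.0), (80, 18.0)]
d2 = [(20, 1.5), (40, 4.0), (60,  6.5), (80,  9.0)]

a1, b1 = fit_line(d1)
a2, b2 = fit_line(d2)
target = 10.0                                    # seconds of mean response
gain = terminals_at(target, a2, b2) - terminals_at(target, a1, b1)
monthly_cost_difference = 100.0                  # dollars per month, D2 over D1
print(f"Extra terminals at {target:.0f} s response: {gain:.1f}")
print(f"Cost per additional terminal: ${monthly_cost_difference / gain:.2f}/month")
```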

Let us assume we have determined a growth estimate and a The techniques presented in this paper to recreate the load, user profile. By benchmarking alternate configurations estimate growth and find the system's reaction were not with varying number of concurrent users up to the maximum designed to be totally encompassing primarily because the predicted by the growth estimate we can obtain various methods of data analysis and decision making in performance response curves. These curves will then be useful in evaluation are barbaric. determining when to upgrade. Since the growth estimate may only be reliable from three to six months in advance The remainder of this paper addresses the real life problems then these curves should be very accurate for that period. associated with using response time as a measurement of Now if we can be assured that as the growth increases the time-sharing performance. In this we shall assume that the user profile stays reasonably the same then the curves can be load can be reproduced via synthetic jobs and accurate useful throughout the entire growth period. growth forecasts can be made. With these assumptions we can design experiments in order to collect response curves One should not benchmark every time the user profile or like Figure 3. A terminal emulator or load driver would be growth estimate changes. Typically a by-product of bench- used to implement these experiments. The real problems are marking will be an awareness of how sensitive the response in the benchmark results. Let us study the curves in Figure curves are to certain factors (e.g., processor requirement). 3. Those factors that make up the synthetic jobs should be tracked or followed on the in-house time-sharing system so First give response time a simple definition. Create a as to observe any great change. This usually can be done via synthetic job written in Fortran with a 'DO' loop in which a a log file analysis program. If the user profile changes terminal read statement is placed. In program execution greatly and there is not a clear cut alternative for a system data is requested from the user; the user inputs his data and upgrade then re-benchmarking is usually required. There- hits a carriage return, then the program requests more data. fore by determining a user load and growth estimates we can This may be viewed on a time scale as: analyze the systems reaction and as a result upgrade our system based on cost-effectiveness. There are a number of problems not yet addressed in this paper, for instance if Request Response Request there are five synthetic jobs (three commands, two jobs) Data Input &CR Time Data Input &CR what is response time? Another interesting problem is how

113 Therefore response time is the time from a carriage return the mean response time at 40 terminals is a certain amount 'CR' to the time the system responds with a printable we could create an interval in which this mean would lie character. We know as a fact that if the 'DO' loop requests 95% of the time. The more you want to guarantee a ten entries of data from the terminal then we will have ten mean response time the wider the interval. Therefore in different response times. Now if this one terminal executes order to guarantee a difference between the curves in Figure the synthetic program over and over we will obtain yet 8 we must use both the mean and variance at each possible different response times. Ten different terminals all exe- point on the X-axis. We are now faced with comparing cuting separate copies of this synthetic program will also intervals. Now the problem of determining when to upgrade observe different response times. and what to upgrade is more complex. We could compare

alternative curves via regression analysis and if given the cost Three separate sources of variability exist: within the syn- of various alternatives we could create econometric models to thetic job; across executions of synthetic jobs within determine the most cost effective upgrade. In addition, if a terminals; and across terminals. Yet this is the simplest 95% guarantee is not needed then one might be willing to definition of response time possible. Now Figure 3 looks risk an 80% or 51% guarantee in which case risk analysis or like Figure 7 where with each mean response time plotted statistical decision theory becomes useful. Keep in mind we we have also plotted a window or variance around the have defined response time in the simplest way possible but mean. Therefore Figure 7 looks like Figure 8. it really doesn't represent the real world because a single synthetic job usually cannot recreate the load. What

then . . . multiple synthetic jobs? Figure 7 Response Time If we have more than one synthetic job or multiple synthetic jobs how is response time defined? If we are to recreate the real load on the system then these synthetic jobs must have different response times. For example one program may accept data, compute and then respond where as another may accept a command and immediately respond.

This source of variability between different jobs is usually so large that response time has no real meaning. One could create a graph like Figure 8 for each of the synthetic jobs. 20 40 60 80 Now the response at twenty terminals would mean that Concurrent Users when twenty terminals were connected to the system Response „ Figures running the mix of synthetic jobs we obtained a mean Time^. ^ response time of x for synthetic job number one with a variance of Y. We are now faced with the problem of determining with not one synthetic job but many synthetic jobs when to upgrade and what to upgrade. One might find

for synthetic job one that a new disk is required after 40 concurrent users whereas synthetic job two indicates more memory at 35 concurrent users. Well we could get around this problem by some cost trade-offs but we immediately 80 20 40 60 land into another pit. We assumed that the multiple Concurrent Users synthetic job recreated the load, today. What about tomorrow? Will our decisions be good if synthetic job one changes in its compute requirements? How good?

The problem is that we usually show our management how All these problems because we used response time as our different curve Di and D2 are from Figure 3, when in fact measurement criteria in time-sharing. We can't use processor the real difference between D-j and D2 is shown in figure 8. utilization as the measurement criteria so what now?

If we were to base a decision to upgrade on Figure 3 we Most of us will ignore variances and return to the simple could really make a mistake. Instead we should consider definition of response time with one synthetic job and

mean response time and variance. Curve D-] in Figure 8 is results like Figure 3. If this is the case then the technique now a very fat curve or interval. This interval comes about outlined for a time-sharing growth plan are quite adequate.

because we want to be sure that the mean at Y terminals is If we wish to use variances and multiple synthetic jobs then in fact within the interval a certain percent of the time. we must first decide how to use the data. Then we can pro-

For example if we want to guarantee 95% of the time that ceed to make the growth plans more rigorous.

A CASE STUDY IN MONITORING THE CDC 6700 - A MULTI-PROGRAMMING, MULTI-PROCESSING, MULTI-MODE SYSTEM

Dennis H. Conti, Ph.D.

Naval Weapons Laboratory

ABSTRACT

The complexity of many present day computing systems has posed a special challenge to existing performance measurement techniques. With hardware architectures allowing for several independent processors, and with operating systems designed to support several classes of service in a multi- programming environment, the problem of measuring the performance of such systems becomes increasingly difficult. The architecture of the CDC 6700 poses such a challenge. With two CPU's and twenty peripheral processors (independent, simultaneously-executing processors), monitoring the CDC 6700 becomes an exceptionally difficult task. With the operating system supporting four modes of service - batch (local and remote), graphics, time-sharing, and real-time - all in a mult i -programming environment, the monitoring task becomes even more complex. This paper presents a case study of an on-going effort to monitor the CDC 6700. The goals, approach, and future plans of this monitoring effort are outlined, in addition to the benefits already accrued as a result of this study. Several software monitors used in the study are discussed, together with some proposed hardware monitoring configurations. The performance measurement study described here has proved to be an extremely worthwhile venture, not only in terms of its direct impact on improved system performance, but also in terms of "spin-off" benefits to other areas (benchmark construction, measurement of operator impact, feedback to on-site analysts and customer engineers).

I. Background

As a large R&D center with requirements for real-time applications. batch, interactive, and real-time computing, the computational needs of the Naval Weapons Labora- Up to fifteen jobs may be active at one time. tory (NWL) are particularly demanding. A CDC Each active job is said to reside at a "control 6700 computer with a modified SCOPE 3.3 operating point" and may be in one of five stages of system is currently in use to support these needs. execution (executing with one of the CPU's, wait- In order to fully appreciate the complexity of ing for a CPU, waiting for some PP activity to monitoring such a system, a brief description of complete, waiting for an operator action, or its hardware and software architecture is in order. rolled out). Both CPU's may never be assigned to the same control point at the same time (i.e., The CDC 6700 consists of two CPU's (CPU-A the operating system does not support parallel- approximately three times faster than CPU-B) , and processing of a single job). twenty peripheral processors (PP's). The periph- eral processors are virtual machines with their The system monitor, MTR, resides in one of the own CPU and memory, operating independently of PP's and oversees the total operation of the system each other and the two CPU's. The PP's may (scheduling the CPU's, scheduling the other PP's, access both central memory and their own 4K of honoring CPU and PP requests, advancing the clock). core. Central memory consists of 131,000 60-bit As there are no hardware interrupts in the system, words. Twenty-four 841 disk drives, three 844 all requests for system resources are done through disk units, and two 6638 disks account for over MTR. Of the remaining nineteen PP's, some have 970 million characters of permanent file space fixed tasks assigned to them (e.g., one is dedi- and over 360 million characters of temporary cated to driving the operator's display), while scratch space. the others are available for performing a wide range of tasks (input-output, control-card pro- The modified SCOPE 3.3 operating system sup- cessing, job initiation). A PP program may re- ports four concurrent modes of service - batch side either on the system device or in central (local and remote), graphics, time-sharing, and memory. When an available PP is assigned a task real-time. The time-sharing subsystem, INTERCOM, by MIR, the relevant PP program is loaded into the provides both a file-editing and a remote batch PP memory and execution begins. Upon completion, capability. The graphics subsystem supports two the PP is again available for a system task. graphic terminals via a pair of CDC 1700 com- Clearly, the complexity of both the hardware puters. The real-time subsystem provides a and software architectures of the 6700 poses a hybrid computing capability in addition to other tremendous challenge to existing performance


measurement techniques. In light of the "multi" finally, the number of absolute, CPU program (e.g., aspects of the system (multi-programming, multi- the FORTRAN compiler) loads. Due to limitations processing, multi-mode), not only is the acqui - in the size of PP memory, several Items recorded sition of monitoring data difficult, but its in the original ISA had to be deleted in order to interpretation is even more so. facilitate the above changes. In addition, the output medium of ISA data was changed from In spite of these seemingly overwhelming punched cards to mass storage files. difficulties, such a monitoring effort was under- Because several taken at NWL in the fall of 1972. The goals of critical items in the system could not be software this effort were to: monitored (e.g., memory conflicts), a feasibility study was undertaken to 1) determine any existing bottlenecks in employ a hardware monitor to capture some of the the system; data. Relevant probe points were acquired, with 2) provide a day-to-day "thermometer" with details of the following hardware monitoring which any abnormal aberrations could be experiments completed by 1 April 1973: detected; 1) CPU state; 3) aid in the planning of equipment con- 2) CPU idle vs. I-O activity; figurations ; 3) number of seeks, number of transfers, 4) aid in determining the need for and in the average seek time, average length of selection of new equipment. transfers for all mass storage devices; What follows is a description of: (1) the 4) CPU and PP wait time due to memory sequence of events leading up to the monitoring contention. effort; (2) the monitoring effort itself; and (3) the results and future plans of the effort. III. The Monitoring Effort

Full scale use of ISA began in May, 1973. The Pre-monitoring Analysis II. The following describes the data collection and analysis procedures of ISA as they were and still Prior to the monitoring effort, serious are being used. consideration was given to: (1) available mon- itoring tools; and (2) the system activities to ISA is automatically loaded Into a PP each be monitored. Several software monitors were time the system is brought up - whether the first considered. Due to its flexibility, scope, and time each day or after a system malfunction. The availability, the software monitor ISA written at system is normally up 24 hours a day, six days a Indiana University was chosen as the primary week with two hours a day allocated for software software monitoring tool. ISA resides in a maintenance and two hours for hardware maintenance. dedicated PP and samples various system tables Input parameters to ISA allow for n batches of and flags. Due to some basic design differences data to be dvraiped, where each batch represents m between the hardware and operating system at minutes worth of monitoring data. After some Indiana University with that at NWL, approximately experimentation, n and m were set to 24 and 50 three man-months of effort was required to imple- respectively - thereby minimizing the number of ment ISA and make its data collection and analysis dumps, but yet still maintaining a reasonable as automatic as possible. sampling period.
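The flavor of ISA's operation can be suggested by a sampling sketch like the one below (illustrative Python; ISA itself is a peripheral-processor program that samples the real system tables, and the sampled states, rates, and batch sizes here are stand-ins).

```python
import random

def sample_system_state():
    """Stand-in for reading the real system tables: returns the state of the
    two CPUs and the number of control points waiting for a CPU."""
    return {"cpu_a_busy": random.random() < 0.75,
            "cpu_b_busy": random.random() < 0.45,
            "waiting_for_cpu": random.randint(0, 5)}

def run_batch(samples_per_batch):
    """Accumulate one batch of samples and reduce it to summary statistics,
    the way each m-minute batch of ISA data is dumped and later analyzed."""
    busy_a = busy_b = waiting_total = 0
    for _ in range(samples_per_batch):
        s = sample_system_state()
        busy_a += s["cpu_a_busy"]
        busy_b += s["cpu_b_busy"]
        waiting_total += s["waiting_for_cpu"]
    return {"cpu_a_utilization": busy_a / samples_per_batch,
            "cpu_b_utilization": busy_b / samples_per_batch,
            "avg_control_points_waiting": waiting_total / samples_per_batch}

# n = 24 batches of m = 50 minutes each, as in the setting described above
# (one sample per second here purely for illustration); show a few batches.
for batch in range(3):
    print(batch, run_batch(samples_per_batch=50 * 60))
```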

A second software monitor was written to A GRAB program is run once a day to collect the extract information from the system dayfile (a dumped batches of data and consolidate them detailed, running log of major system events). into one large file. An ANALIZE program then And finally, a third software monitor which reads this file and produces a listing of mean- records individual user CPU activity was acquired. ingful statistics. Some system activities recorded by ISA as a function of Termed SPY, this routine resides in a PP for the time are: duration of user's job, while continually a 1) percent of time Interactive users were "spying" on his control point. editing text vs. executing a program; 2) average and maximum central memory used by Choosing the type of system activities to interactive users; monitor was one of the more difficult parts of 3) central memory utilization; the pre-monitoring analysis. Without knowing a 4) CPU utilization (for both CPU-A and CPU-B) priori what the bottlenecks in the system were control point activity (e.g., percent of (if, indeed, any existed at alll), the choice of 5) time n control points were waiting for activities to monitor was based primarily on an a CPU); intuitive feeling (on the part of several systems percent of free space on each of the mass personnel) as to the major points of contention 6) in the system, backed up by some rough empirical storage devices; 1-0 data. Some of the activities to be monitored 7) average and maximum number of requests were determined in part by the particular mon- outstanding for each mass storage device; itors chosen. Other activities were found to be 8) PP activity; best monitored by hardware monitoring. 9) number of PP program loads; 10) number of absolute CPU program loads. Before software monitoring could take place, An additional option was subsequently added to changes were required of ISA and its associated ANALIZE to summarize all ISA data collected with- analysis program to record information on: two in a given hourly time frame between any two given CPU's; 131K of central memory; 29 mass storage dates. devices; additional control point activity; and

The dayfile analysis program was and still is being run daily. Data collected by the dayfile analyzer includes: turnaround time and load statistics for the various batch classes, frequency of tape read and write errors, abnormal system and user errors, and frequency of recoverable and unrecoverable mass storage errors.

The SPY program was made available to the general user population in December, 1972. Although its use was originally limited to a few users, others immediately began to see its worth. Requiring only two control cards to execute and produce a CPU distribution map, SPY became extremely easy for the average programmer to use.

After some initial problems with mating probe tips to probe points were overcome, several system activities (CPU state, memory contention) were successfully hardware monitored and their respective probe points verified. This verification procedure required two weekends of dedicated machine time. As a result of this feasibility study, an effort was immediately undertaken to acquire a dedicated, on-site hardware monitor. At the time of this writing, attempts to acquire such a monitor are still proceeding.

IV. Results

The number and type of benefits accrued as a direct result of the monitoring effort unquestionably proved its worth. Some of these benefits were expected, while others were rather pleasant surprises.

The dayfile analysis program proved to be an especially worthwhile tool for detecting abnormal hardware and software errors. Upon noticing such abnormalities, the personnel monitoring the data would immediately notify the on-site analysts or CE's, who then undertook corrective actions. Each week the daily turnaround time and load statistics would be summarized and compared with data from previous weeks. A relative increase in turnaround time and decrease in number of jobs run per day was usually directly proportional to the number of system interruptions. These numbers were thus good indicators of the relative "health" of the system.

The SPY program did provide some dramatic results. One particular CPU-bound program, for example, was found to be spending over 25% of its execution time in the SIN and COS routines.

The hardware monitoring study demonstrated the feasibility of an expanded hardware monitoring effort. In addition, it emphasized the need for the cooperation of local CE's in helping to develop and verify probe points, and for an extensive pre-monitoring analysis period.

The most fruitful results of the monitoring effort undoubtedly came from the ISA software monitor. At least seven major benefits were directly attributable to ISA data collection:

1. After the first two weeks of running ISA, it became apparent that a number of PP programs were continually being read from disk and loaded into a PP (some on the order of 8 times a second). These most frequently loaded PP programs were immediately made central memory resident, thereby significantly reducing traffic on the channel to the system device, in addition to decreasing the PP wait time.

2. With the daily runs of ISA indicating the amount of free permanent file space, a means existed for assessing the permanent file space situation. When space appeared to be running out, appropriate measures were taken before space became so critical that all operations ceased (as was sometimes the case in the past).

3. Two situations arose which dramatically showed the merit of ISA as a feedback mechanism to on-site analysts and CE's. Immediately after the introduction of a new mass storage driver, ISA data indicated that a certain PP overlay was being loaded on the average of 74 times a second - almost as frequently as the other 400 PP routines combined! The analyst who implemented the driver was easily persuaded to incorporate the overlay into the main body of the driver, thereby eliminating unnecessary loads. On another occasion ISA data indicated that one particular scratch device was being used half as much as its identical counterpart. In discussing this situation with one of the CE's, it was learned that he had installed a "slow speed" valve in the unit while waiting for a "high speed" valve to be shipped. He had neglected mentioning this action to any of the operations personnel.

4. Several times throughout the monitoring period the effect of operator interaction on the system was clearly reflected in the ISA data. One specific example which occurred intermittently was an operator error in configuring the scratch devices. Specifically, the operator was told to 'OFF' a particular scratch device. However, ISA data showed that scratch was still being allocated to that device. The operator was notified and corrective action was taken.

5. A few months into the monitoring effort three 844 disk drives were installed. With a storage capacity and transfer rate approximately three times that of an 841, their impact on system performance was anxiously awaited. Summary runs of ISA data collected before and after the introduction of the 844's showed, in fact, a 10% increase in CPU utilization.

6. With the introduction of the 844's, a method was also needed for determining where to put the system device. Should it reside on an 841 and compete with seven other 841's for channel use? Should it reside on a dedicated 844 with its faster transfer rate, but compete with two other 844's and take up only a small portion of available disk space (with the remaining space on the pack wasted)? Or should it reside on a 6638 with its dedicated channel, but yet even more wasted space? ISA has proved to be an invaluable tool in helping to make the above decision. Full configuration testing using ISA is still continuing at the time of this writing.

7. Finally, ISA has been a useful tool in the design of a representative benchmark. In benchmarking the SCOPE 3.4 operating system, a one-hour job mix was desired whose resource requirements matched as closely as possible those of a "typical" one-hour period of everyday operation. ISA data averaged over a random two-week period was used as a "standard". Next, a random mix of jobs was obtained from the input queue and run with ISA monitoring the system's performance. Monitoring data from this mix was then compared with the "standard". Noting that the random mix was too CPU bound, several synthetic I-O bound jobs were included. This new mix was run and its ISA data was compared with the "standard". This iteration process continued until a mix was obtained whose ISA data matched as closely as possible that of the "standard". This final mix was then declared a "representative benchmark" and was used in the benchmarking studies.

V. Future Plans

As the monitoring effort at NWL is a continuing one, enhancements to the monitoring tools are constantly being made. Future plans call for: full-scale hardware monitoring; implementation of a new, more flexible version of SPY; provisions for graphical output from the ISA data analyzer; a parametric study of system resources based on data collected from ISA; and dynamic interaction between the various monitors and the system resource allocators. A combination of the latter two efforts would effectively eliminate the human from the data-collection -> human-interpretation -> system-modification sequence. The performance monitors, observing the state of the machine at time t, could appropriately set various system parameters in order to optimize the allocation of resources at time t + Δt. An effort is currently underway to employ this technique in order to balance scratch allocation among the three different types of mass storage devices.

VI. Conclusions

This paper summarized the monitoring effort at the Naval Weapons Laboratory, the pre-monitoring analysis required for this effort, and its resultant benefits and future plans. Three software monitoring tools were discussed: a dayfile analysis program, the SPY program, and the ISA monitor. In addition, several hardware monitoring configurations were proposed. Of all the monitoring tools used to date, the ISA software monitor was shown to be the most rewarding. Several concrete benefits directly attributable to the use of ISA were discussed.

Results of the monitoring effort at NWL show that the previously stated goals have been achieved and that, for a minimum investment in man-effort, a computer system as complex as the CDC 6700 can be reasonably measured with existing monitoring tools. In addition, the future incorporation of feedback from the monitors to the operating system should make the system even more responsive to the resource requirements of its dynamically changing workload. A machine capable of "learning" from its own internal monitors would indeed be an important innovation.
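Benefit 7 above is an iterative matching procedure: measure a candidate mix with ISA, compare its profile against the two-week "standard", and adjust the mix until the profiles agree. The loop can be pictured as follows (illustrative Python only; the metric names, the tolerance, and the adjust_mix helper are assumptions, not part of the NWL procedure).

def distance(profile, standard):
    """Relative difference between a candidate mix's ISA profile and the
    two-week 'standard', summed over the metrics being matched."""
    return sum(abs(profile[k] - standard[k]) / standard[k] for k in standard)

def build_representative_benchmark(standard, initial_mix, run_and_measure,
                                   adjust_mix, tolerance=0.05, max_rounds=10):
    """Iterate: run the mix under ISA, compare to the standard, adjust the
    mix (e.g., add synthetic I-O bound jobs if it is too CPU bound)."""
    mix = initial_mix
    for _ in range(max_rounds):
        profile = run_and_measure(mix)      # one monitored one-hour run
        if distance(profile, standard) <= tolerance:
            return mix                      # declared the representative benchmark
        mix = adjust_mix(mix, profile, standard)
    return mix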


FEDSIM STATUS REPORT

Michael F. Morris and Philip J. Kiviat

FEDSIM

FEDERAL COMPUTER PERFORMANCE EVALUATION AND SIMULATION CENTER

FEDSIM is the Data Automation Agency's newest and, by far, smallest center. Government-wide interest in this center is evidenced by the fact that FEDSIM's full name was changed from the Federal ADP Simulation Center to the Federal Computer Performance Evaluation and Simulation Center--a change made in response to a request from Congressman Jack Brooks to make FEDSIM's name better fit its mission.

MISSION

Provide reimbursable technical assistance, support, and services throughout the Federal Government for simulation, analysis, and performance evaluation of computer systems.

FEDSIM is unusual in many respects. It provides consultant services throughout the Federal Government to improve the performance of computer systems--both existing and proposed--on a fully cost-recoverable basis. FEDSIM has no budgetary impact on the USAF. All services provided are charged for at a rate that enables FEDSIM to meet its payroll, rent, utility, and contracted costs. FEDSIM's mission addresses a pressing need within the government computer community. This need has been so well recognized by computer managers that FEDSIM is now in an "oversold" condition based almost entirely on customer demands, without a "marketing" effort in any formal sense.

FEDSIM's goals and the criteria under which it operates are aimed specifically at fulfilling its mission.

GOALS AND CRITERIA

* Minimize cost of procuring/utilizing computer systems.
* Maximize productivity of computer systems--efficiency.
* Act on request of user agencies.
* Maintain confidential FEDSIM-Customer relationship.
* Provide responsive service.
* Cost recovery.
* Provide centralized information/advisory service.

In projects involving proposed systems, FEDSIM strives to make the cost of both acquiring and of using the new computer as low as possible, while meeting all system requirements. This is done through detailed design testing using simulation and mathematical modeling to examine the system as proposed, and the system as it might be improved--before committing resources to creation of any system. Examination of existing systems involves use of devices and techniques that make the fit of resources and demands clear, so that alternate methods may be proposed and tested that will enable the installed equipment to accomplish more work in the same time or the same work in less time.

FEDSIM responds to requests of government computer users; FEDSIM cannot go into an installation without an invitation. To insure that FEDSIM will help and not hurt its customers, a confidential relationship is maintained. This encourages the customer to give FEDSIM analysts all of the information about their problem(s) without fear that any practices will be "reported" to higher authorities. As reports prepared are the property of the customer, they are not released by FEDSIM without prior written customer approval.

Because of the nature of FEDSIM's mission, customers usually have a significant problem before they call for FEDSIM assistance. The FEDSIM goal has been to respond to every call as quickly as possible. To do this without an extremely large staff requires that FEDSIM maintain contracts with private companies with specialized computer performance evaluation capabilities that can provide analyst support to meet peak demands. Costs of this support are passed on to customers at the prices paid by FEDSIM. Overhead costs are recovered out of the hourly rate charged for FEDSIM analysts. This insures that no government agency pays more for contractor support through FEDSIM than this same support would cost if purchased commercially. Also included in FEDSIM's overhead is the cost of providing information and advice to Federal agencies on the availability, price, and usefulness of particular performance evaluation techniques and products.

MILESTONES

Sept. 1970 - GSA requests USAF to act as Executive Agent.
May 1971 - USAF agrees.
29 Feb. 1972 - FEDSIM activated.
1 July 1972 - Initial operational capability.

This availability of technical expertise on an "as needed" basis for improving the use of government computer systems was one of the reasons behind the General Services Administration's effort to establish FEDSIM and other Federal Data Processing Centers. GSA recognized the need for special skills and experience necessary to provide computer performance evaluation services to all agencies of the government. Because of the record of achievement in this specialty within the Air Force, GSA asked that the Air Force operate FEDSIM. After examining the impact of this unusual offer made in late 1970, the Air Staff agreed in mid-1971 to assume the executive agent role for GSA. To insure that FEDSIM had the necessary authority, GSA made a full delegation of procurement authority for CPE products to the USAF for FEDSIM to exercise. The initial FEDSIM cadre started work towards establishing the needed CPE capability in February 1972; its initial operational capability was achieved on 1 July 1972. This is a remarkably short time for the establishment of a new government agency. The earliest record of a suggestion that there be such an agency places the seed of the idea in early 1969. This rapid step from idea to reality suggests that the "time was ripe" for the birth of this type of organization.

ORGANIZATION & POLICY GUIDANCE

Chart: Chief of Staff, U.S. Air Force; FEDSIM Joint Policy Committee (DOD, GSA, NBS, USAF, AFDAA); Commander; AFDSDC, AFDSC, FEDSIM; customers and other Federal users.

Along with the establishment of FEDSIM, the Air Force organized its centralized computer support activities into the Air Force Data Automation Agency. In recognition of FEDSIM's different mission, a Joint Policy Committee with members from the Air Force, the Department of Defense, GSA, and the National Bureau of Standards was established. This committee sets FEDSIM policy, approves capital asset acquisitions, and resolves any priority disputes that may arise between FEDSIM users. FEDSIM is allowed to interact directly with its customers so that no intermediate organizational barriers will be encountered that might decrease FEDSIM's responsiveness to customer demands.

ORGANIZATION

Chart: Commander; Technical Director; Program Development; Administration; Applications, Models, and Analysis Divisions.

Internally, FEDSIM is organized by natural differences in performance evaluation tools. The three operational divisions are each comprised of particular types of specialists: the Applications Division deals with problems that tend to occur at every computer installation--problems that can be handled with package simulators capable of replicating most commercially available computers running typical data processing workloads; the Models Division creates special simulation models for analysis of unusual computer systems or applications; the Analysis Division performs projects that require special data collection techniques and mathematical analysis.


The staff positions are few at FEDSIM. The Technical Director, aided by the Program Development Officer, insures that FEDSIM methods and products represent state-of-the-art use of available technology, and that projects undertaken by FEDSIM are within FEDSIM's capability and jurisdiction. Administration performs the usual administrative tasks and the unusual task of insuring that all project-related time is properly billed to FEDSIM customers.

RESOURCES

* Simulators: COMET (SCERT), CASE
* Hardware Monitors: Microsum, Dynaprobe 7900, X-Ray
* Simulation Languages: ECSS, SIMSCRIPT II.5, GPSS
* Software Monitors: SUPERMON, PROGLOOK, CUE, PPE
* Skilled People: Analysts, Programmers, Technicians, Model Builders, Statisticians

FEDSIM's resources include most of the commercially available performance evaluation products. However, FEDSIM's greatest resource is the skilled people that use these products.

PROJECTS COMPLETED FOR:

General Services Administration
Atomic Energy Commission
Department of Commerce
Internal Revenue Service
Social Security Administration
U.S. Navy (Naval Technical Training Center)
Civil Service Commission
Housing and Urban Development
U.S. Postal Service
National Bureau of Standards
Department of Labor
U.S. Air Force (Accounting & Finance Center)

Through careful use of these resources, FEDSIM had completed projects for 12 Federal Departments and agencies by the end of its first year in operation.

PROJECTS CURRENT FOR:

General Services Administration (3)
U.S. Coast Guard
U.S. Air Force (ACDB/DSC)
Joint Technical Support Agency
Department of Interior
Federal Communications Commission
U.S. Navy (Naval Weapons Laboratory)
U.S. Army (Military Personnel Center)
Department of Transportation
Tennessee Valley Authority

At that time, FEDSIM had computer performance evaluation projects underway for 10 agencies that would result in an income during FY 74 of over $1 million.

TYPICAL PROJECT OBJECTIVES

Technical Evaluation of Proposals
Equipment Reconfiguration
Program Performance Improvement
Facility Study
Computer Sizing/Measurement
Computer System Design
Present and Future Capacity Study

FEDSIM's projects have generally fallen into 7 areas: technical evaluation of proposals, equipment reconfiguration, program performance improvement, facility study, computer sizing and measurement, system design, and studies of existing and future capacity. Hence, FEDSIM is addressing problems of computer performance throughout the life cycle of computer systems, from the examination of vendors' proposals to projection of an installation's future computer needs.

AGENCY RELATIONSHIPS

Chart: GSA (ADP Fund); project reimbursements; operating expenses; capital expenditures; customers; project services; FEDSIM; manpower; USAF.


The relationships between FEDSIM and the customer are usually the only visible activities associated with FEDSIM support to a customer. However, the full mechanism of relationships is shown in this diagram. The GSA ADP Fund serves as the "financial accumulator" in this system. The Air Force furnishes operating manpower and funding and charges these at cost to the GSA Fund on a monthly basis. GSA reimburses the Air Force each quarter. FEDSIM provides project services to its customers and reports all manhour and other costs by project to GSA. GSA bills these services to customer agencies to reimburse the GSA Fund. Capital expenditures for equipment are normally billed directly to GSA for payment from the Fund.

To re-emphasize the simplicity of the FEDSIM-customer relationship, the procedure for obtaining support normally starts by a telephone contact and briefing on FEDSIM's capability and discussion of the customer's problem.

PROCEDURES

Contact
Preliminary Project Plan
Signed Agreement
Project
Deliverables
Follow-up

Based on this contact, FEDSIM works with the customer to prepare a "preliminary project plan" which spells out the problem, service and support, deliverables, and any special arrangements necessary to conduct the project. Once mutual agreement is reached on the plan, it is signed by both FEDSIM and the customer and becomes an "agreement." At this point, the project is begun, and periodic reports are made during the project to inform the customer of project status and costs until the work is completed. Deliverables are presented on a schedule that is spelled out in the agreement. FEDSIM has begun a follow-up procedure that involves contacting the customer 30 days after a project is completed to insure that all information has been understood, and to give the customer an opportunity to ask any questions that may have come up relative to the project. A second follow-up approximately 180 days after a project is completed is made to determine the value of the project to the customer, and to find out what actions resulted from FEDSIM's recommendations. This second contact is also intended to encourage any suitable additional projects that may have come up in the interim. These follow-ups serve to insure that the customer got what he expected, and aid FEDSIM's marketing effort.

SERVICES

* Computer performance evaluation consultant services and technical assistance.
* Contractual assistance for purchase, lease, or use of computer performance evaluation products.
* Training in the application of computer performance measurement and evaluation techniques.

To summarize, FEDSIM provides service in computer performance evaluation that is specifically tailored to each individual customer's needs. These services are available for existing installations in the form of technical assistance or consultant services, and for the purchase or acquisition of CPE products. FEDSIM also provides training at both the management and technical levels to encourage wider use of computer performance evaluation techniques to improve computer usage throughout the Federal Government.

DATA ANALYSIS TECHNIQUES APPLIED TO PERFORMANCE MEASUREMENT DATA

G. P. Learmonth

Naval Postgraduate School

ABSTRACT

The methodology of system performance measurement and evaluation generally requires statistical analysis of data resulting from benchmarking, on-line measurement, or simulations. This paper will survey the pertinent topics of statistical analysis as they relate to system performance measurement and evaluation.

INTRODUCTION

The purpose of this conference is to bring together people with the common interest of measuring and evaluating computer system performance. We have been forced into measuring and evaluating computer system designs by the ever increasing complexity of both hardware and software systems. A quick survey of documented performance studies reveals that nearly every study uses a different approach, usually based on a mixture of experience and intuition. In order to make performance measurement and evaluation a viable discipline, we need to replace experience with theory and intuition with standard research methodology.

The two divergent approaches to computer performance evaluation are the analytic modelling approach and the empirical measurement and evaluation approach. The first approach relies on mathematics, queuing theory, and probability theory to provide both the theoretical framework and the necessary methodology to perform an analysis. We are concerned here, however, with the second approach. Empirical experiments lack both a theory and a methodology, and hence we find ourselves proceeding along tangents, making little or no headway.

The theoretical framework of empirical performance evaluations can be defined as soon as our collective experience is filtered and assembled into a cohesive body of knowledge. For a research methodology, we may do as many other sciences, both physical and behavioral, have done; that is, use the existing methods and tools of statistical analysis.

Present day statistical analysis provides two broad areas which may be exploited for performance evaluation. Classical statistical inference techniques require distributional assumptions before data collection takes place. When the data is on hand, systematic application of statistical methods enables the testing of a broad range of hypotheses which may be specified. In the absence of a well-defined theory, this area of statistical methodology is somewhat limited in application to performance measurement problems. Recent developments in the area of exploratory data analysis, however, are extremely promising. In data analysis we are more concerned with indication rather than inference. Data analysis techniques are very flexible and will enable us to incorporate our experience and intuition into an iterative process of analyzing performance data.

This paper intends to survey some pertinent topics in statistical analysis as they might apply to computer system performance measurement and evaluation.

BENCHMARKING: THE CONTROLLED EXPERIMENT

Perhaps the most widely used method of performance measurement and evaluation is benchmarking. This form of experiment is employed to make relative judgments concerning the performance of two or more hardware and/or software configurations. The experiment is accomplished by running either a standard mix of programs or a synthetic mix through each configuration and comparing the results on several criteria.

A critical area in benchmarking is the selection of the job mix. The wrong mix will obviously invalidate any conclusions made. A so-called standard mix consists of a selection of real programs deemed to be representative of the installation's work load.


A synthetic job mix involves the use of a highly parameterized program which essentially does nothing constructive but can be tuned to behave in any desired fashion. Any theory of performance evaluation must necessarily contain a precise definition of work load characterization.

Setting aside the problem of proper workload characterization, we may concentrate on the problem of data collection. Pertinent system performance data may be gathered externally via hardware monitors at no system overhead, or internally via software monitors with varying degrees of overhead. In most benchmarking experiments, copious data is recorded for later analysis. However, this "shotgun" approach to data collection is generally wasteful and unnecessary. In general, one carefully chosen measure will suffice to answer any single question, for example, throughput, elapsed time, number of page faults, response time, etc.

Typically, a benchmark experiment aims to measure and assess system performance under varying system parameters. These system parameters are controllable by the experimenter and are set to the desired levels of comparison. After all combinations of system parameter settings are subjected to the benchmark, the experimenter must make conclusions regarding the relative performance of the system under these varying conditions.

The description of benchmarking given above is conducive to formal statistical analysis under the category of the design of experiments and analysis of variance. The properly designed statistical experiment encompasses the proper choice of the measured, or dependent, variable; the settings of levels of the controlled, or independent, variables; the control of external influences; the analysis of the remaining variance; and the testing of the hypotheses under consideration.

The measured variable should be the one which will be directly affected by the choice of independent variables. Similarly, the independent variables should be set at levels which will enable the effect of alteration to be readily determined. Control of external variation should be accounted for in the experimental design in order to balance out those effects which may unknowingly influence the results. For example, consider the problem of determining the effect of main store size and page replacement algorithm on the page replacement rate. Three different benchmark job mixes are to be used. One mix represents, say, batch jobs running in the background, another represents interactive user tasks, and the third, a mixture of both batch and interactive tasks. The controlled variables are main store size and page replacement algorithm. The three different job mixes will influence the response variable, namely, the page replacement rate. However, for this experiment we are not concerned with this effect. In order to compensate for the extraneous variance induced by the job mixes, the following design is adopted.

Assuming three levels of main store (100, 150, and 200 pages) and three replacement algorithms - first-in first-out (FIFO), least recently used (LRU), and a predetermined optimal (BEST) - we have nine combinations. We establish the design as follows:

          FIFO    LRU     BEST
   100     J1      J2      J3        J1 = job mix 1
   150     J2      J3      J1        J2 = job mix 2
   200     J3      J1      J2        J3 = job mix 3

Rather than submit each job stream to each combination of variables, thereby performing 27 experiments, we assign each mix once and only once to each row and column. In this way each level of independent variable is subjected to all three job mixes and, by virtue of the arrangement, any external influence of the job mix has been balanced out over the design.

Analysis of variance techniques are available to compute the necessary test statistics. With these statistics the effect of different store sizes may be tested independently of replacement algorithm. Similarly, replacement algorithm may be tested independently of store size. Lastly, any interaction effect between store size and replacement algorithm may also be tested.
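As an illustration of how the balanced design above is exploited, the sketch below lays out the 3x3 Latin square and computes the row (store size), column (algorithm), and job-mix effects that feed an analysis-of-variance table. It is Python used purely for exposition; the page-replacement rates are invented numbers, not data from any experiment.

# 3x3 Latin square: rows = store size, columns = algorithm, cell = job mix.
store_sizes = [100, 150, 200]
algorithms  = ["FIFO", "LRU", "BEST"]
layout = {100: {"FIFO": "J1", "LRU": "J2", "BEST": "J3"},
          150: {"FIFO": "J2", "LRU": "J3", "BEST": "J1"},
          200: {"FIFO": "J3", "LRU": "J1", "BEST": "J2"}}

# Hypothetical page-replacement rates observed in the nine runs.
rate = {(100, "FIFO"): 42.0, (100, "LRU"): 35.0, (100, "BEST"): 28.0,
        (150, "FIFO"): 30.0, (150, "LRU"): 25.0, (150, "BEST"): 20.0,
        (200, "FIFO"): 22.0, (200, "LRU"): 18.0, (200, "BEST"): 14.0}

grand_mean = sum(rate.values()) / 9

# Because each job mix appears once in every row and every column, row and
# column means are free of the job-mix effect and can be compared directly.
row_effect = {s: sum(rate[(s, a)] for a in algorithms) / 3 - grand_mean
              for s in store_sizes}
col_effect = {a: sum(rate[(s, a)] for s in store_sizes) / 3 - grand_mean
              for a in algorithms}
mix_effect = {m: sum(v for (s, a), v in rate.items() if layout[s][a] == m) / 3 - grand_mean
              for m in ("J1", "J2", "J3")}

print(row_effect, col_effect, mix_effect)   # inputs to the analysis-of-variance table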


Texts on the design of experiments and analysis of variance are available and go into greater detail than has been done here. See, for example, Kirk and Cox. For a very thorough designed experiment in performance analysis see Tsao, Comeau, and Margolin and Tsao and Margolin.

This classical statistical inference technique can be seen to be quite useful. It suffers, however, from the lack of an underlying theory in performance evaluation, particularly in workload characterization and in the necessary distributional assumptions associated with the response variable. However, in the data analysis sense, where the experiment might lack inferential value, it certainly provides great indicative value. The results of an analysis as described above may turn out to be inconclusive, but most certainly they will indicate the direction in which further analysis should be taken. By systematic and orderly design of experiments, much unnecessary and fruitless labor can be avoided.

ON-LINE MEASUREMENT

Benchmarking is widely used, but in many instances an installation cannot afford the time and resources to perform stand-alone experiments like those described above. An equally popular measurement technique involves the on-line collection of system data through either hardware or software monitors. Vast amounts of data are collected and stored on tape or disk during normal operating hours.

The stored data is usually treated in two different ways. First, data is edited and reduced for simple description and presentation. Most everyone has seen tabulated data from computer centers listing the number of batch jobs, number of terminal sessions, breakdown of programs by compiler, etc. Elementary statistics such as means, standard deviations, and ranges usually accompany these reports.

It is sometimes desired to assess the impact of major system changes such as operating system releases, addition of hardware, or the rewriting of a major application program. Statistical techniques are available to analyze this impact when sufficient data has been gathered after the change has been made. Classical statistical methods enable the comparison of means and variances to test whether changes in selected variables are significant.

An example of this kind of measurement analysis is given in Margolin, Parmalee, and Schatzoff. Selected on-line measurements automatically recorded by a software monitor within the CP/67 operating system were used to assess the impact on supervisor time of changes to the free storage management algorithm. The authors were careful to design their experiment according to the methods outlined in the preceding section. Their experiment consisted of changing the free storage management algorithm on successive days. The system was not altered in any other way during the experiment, and normal user activity took place. To avoid a possible source of extraneous variation, the two-week experiment was designed to have different versions of the free storage management algorithm randomized over the days of the week. It was known that user activity varied over the days of the week, with low usage on Fridays and weekends.

While a formal analysis of variance was not performed, the authors did rely on classical as well as data analysis techniques to guide their research effort.

A second way of treating collected on-line measurements is to use the data to uncover interrelationships among system parameters. In this way, the impact of changes in the system may be predicted based on knowledge of these interrelationships.

A statistical method which can be applied in this area of evaluation is regression analysis. As an inferential tool, regression analysis does impose certain assumptions beforehand. However, as in the statistically designed experiment, these prior assumptions may not be strictly met, but we still use regression as an indicative tool.

In general, regression analysis attempts to form a functional relationship between a response, or dependent, variable and a set of one or more independent variables. The function is most often taken to be a linear one, thereby keeping the mathematics relatively straightforward.

In the field of performance measurement, the work of Bard at the IBM Cambridge Scientific Center is the most noteworthy application of regression analysis. In his paper Bard specifies all of the measurements taken by the CP/67 System at the Scientific Center. In four separate sections, he presents the results of four regression analyses. The first application is an attempt to relate system overhead (CPTIME) to other measured variables in the system. The second analysis expresses throughput as a function of the number of active users. The third analysis is concerned with the saturation of the system. Having constructed a regression function for throughput as a measure of overall system performance, it was possible to define the points at which an increase in one variable could only be made at the expense of some other independent variable. The last analysis of his paper is a regression study of the free storage management data previously analyzed by Margolin, Parmalee, and Schatzoff.

While these are only a few examples of the analysis of data measured on-line, the potential usefulness of statistical methods and data analysis is evident.
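Bard's second analysis - throughput expressed as a linear function of the number of active users - is the simplest of the four to picture. The least-squares fit below is a generic illustration in Python; the numbers are invented, and nothing here reproduces Bard's actual model or measurements.

# Hypothetical hourly observations: active users vs. jobs completed per hour.
users      = [ 4,  8, 12, 16, 20, 24, 28, 32]
throughput = [21, 39, 55, 68, 79, 86, 90, 92]

n      = len(users)
mean_x = sum(users) / n
mean_y = sum(throughput) / n

# Ordinary least squares for throughput ~ b0 + b1 * users.
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(users, throughput))
      / sum((x - mean_x) ** 2 for x in users))
b0 = mean_y - b1 * mean_x

# R^2 shows how much variation the linear relation explains; the flattening
# of the points at high user counts hints at the saturation behavior that
# is the subject of Bard's third analysis.
ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(users, throughput))
ss_tot = sum((y - mean_y) ** 2 for y in throughput)
r2 = 1 - ss_res / ss_tot

print(f"throughput ~ {b0:.1f} + {b1:.2f} * users  (R^2 = {r2:.3f})")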


SIMULATION

Performance measurement and evaluation studies which involve simulation of the computer system are generally undertaken either (1) because analytic models are too complex, or (2) because the system under investigation is not yet operational.

Analytic modelling requires a great number of factors to be expressed and their interrelationships defined in a concise mathematical form. Often this is impossible or simply too complicated, and simulations are built which try to mimic the system. Simulations are subject to many possible difficulties concerning the level of detail which will be incorporated into the model. Additionally, the job mix which is run through the simulation must also be properly characterized to cause the model to behave as the real system would.

The lack of a proper theory as to how one builds a model and how to characterize workload constitutes the difficulty with simulations. For the reasons stated above, however, simulation is often the only recourse available to derive estimates of system performance.

But for the lack of realism, performance measurement and evaluation studies are conducted using the simulation model much as they would be conducted on a real system. The arguments advanced in the two preceding sections for the use of statistical methods and data analysis techniques are equally valid when analyzing the output from a simulation of a computer system.

CONCLUSIONS

We began this paper by pointing out that computer performance measurement and evaluation lacked an underlying theory and research methodology. It was not the aim of the paper to propose a theory, but to call attention to an existing research methodology which has found widespread use in other disciplines.

It is hoped that convincing arguments have been made to support the assertion that performance measurement and evaluation may find great potential in the application of the statistical and data analytic methods. We feel that, by its inherent flexibility, data analysis is perhaps the more promising tool.

When adequate theory is available, the inferential methods of classical statistical analysis may be used to test whether what has been observed conforms to that theory.

REFERENCES

Bard, Y., "CP/67 Measurement and Analysis: Overhead and Throughput," in Workshop on System Performance Evaluation, New York, Association for Computing Machinery, March 1971.

Cox, D. R., Planning of Experiments, New York: John Wiley & Sons, Inc., 1958.

Kirk, R. E., Experimental Design Procedures for the Behavioral Sciences, Belmont, California: Brooks/Cole Publishing Company, 1968.

Margolin, B. H., R. P. Parmalee, and M. Schatzoff, "Analysis of Free Storage Algorithms," in Statistical Computer Performance Evaluation, Walter Freiberger (Ed.), New York: Academic Press, 1972.

Tsao, R. H., L. W. Comeau, and B. H. Margolin, "A Multi-factor Paging Experiment: I. The Experiment and Conclusions," in Statistical Computer Performance Evaluation, Walter Freiberger (Ed.), New York: Academic Press, 1972.

Tsao, R. H., and B. H. Margolin, "A Multi-factor Paging Experiment: II. Statistical Methodology," in Statistical Computer Performance Evaluation, Walter Freiberger (Ed.), New York: Academic Press, 1972.


A SIMULATION MODEL OF AN AUTODIN AUTOMATIC SWITCHING CENTER COMMUNICATIONS DATA PROCESSOR

LCDR Robert B. McManus

Naval Postgraduate School

I. INTRODUCTION

As part of my thesis for a Master's Degree at the Naval Postgraduate School, with the assistance of Professor Norman Schneidewind, my advisor, I developed a simulation model of the AUTODIN Automatic Switching Center (ASC) Communications Data Processor (CDP) located at McClellan Air Force Base, California.

This paper presents a brief network description focusing on the CDP, in order to give an idea of the operation of the AUTODIN as a whole and how the CDP fits into the overall picture. This network description will be followed by a discussion of the simulation program construction, including the more important restrictions required. Next will be highlighted some of the anticipated uses of this and like simulation models. Finally, some conclusions will be drawn concerning the AUTODIN as a result of experiments conducted using the model.

II. BACKGROUND

A. General Network Description

The Automatic Digital Network (AUTODIN) is a switched network which, along with the Automatic Voice Network (AUTOVON), constitutes a major portion of the worldwide Defense Communications System (DCS). The AUTODIN processes 95% of the Department of Defense record traffic, which amounts to about one million messages a day. It currently costs about $80 million a year to operate this system for over 1300 subscribers located throughout the world.

The AUTODIN is made up of Automatic Switching Centers; ASC tributaries, which comprise the 1300 subscribers; and the interconnecting circuits, trunks, transmission links, and terminal facilities.

The ASC's are the interconnected message and circuit switching points of the AUTODIN, each originally designed to accommodate 250 message switching users and 50 circuit switching users. There are eight leased ASC's in the United States and eleven government owned ASC's overseas. The geographic location of each ASC is shown in Figure 1. While viewing this figure, it should be pointed out that each ASC is not connected to every other ASC in the network. Rather, the ASC is connected by trunks to the closest five or six ASC's, with relay capabilities through these ASC's to the other units in the network.

B. ASC Major Functions

The four major functions performed by the ASC are: message switching, circuit switching, message processing, and message bookkeeping and protection. The message switching function is the only one to be covered in this paper in that it is by far the most important function performed and is the only one dealt with in the simulation model.

The message switching function involves taking the received message from the various incoming channels of the ASC, converting the code and speed as necessary to conform with the intended outgoing channel, and delivering the message to the proper outgoing channel. The message transmitted by the system conforms to certain specified formats, all messages being composed of a header, text, and ending.

Message switching employs the store-and-forward concept of transmission, in which each message is accumulated in its entirety at each ASC on its route, then routed and retransmitted as outgoing channels become available. Selection of messages for retransmission over available outgoing channels is made on a first-in-first-out (FIFO) basis, according to message precedence.
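The selection rule just described - highest precedence first, and FIFO within a precedence level - is essentially a priority queue keyed on (precedence, arrival order). A minimal sketch, in Python rather than the switching center's actual software, might look like the following; the precedence names and message fields are illustrative only.

import heapq
import itertools

# Lower number = higher precedence (e.g., FLASH before ROUTINE).
PRECEDENCE = {"FLASH": 0, "IMMEDIATE": 1, "PRIORITY": 2, "ROUTINE": 3}
_arrival = itertools.count()          # tie-breaker preserves FIFO order

class OutgoingChannelQueue:
    """Messages awaiting retransmission on one outgoing channel."""
    def __init__(self):
        self._heap = []

    def enqueue(self, message, precedence):
        # (precedence, arrival sequence) orders by precedence, then FIFO.
        heapq.heappush(self._heap, (PRECEDENCE[precedence], next(_arrival), message))

    def next_for_transmission(self):
        # Called when the outgoing channel becomes available.
        if self._heap:
            _, _, message = heapq.heappop(self._heap)
            return message
        return None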

Figure 1. Geographic locations of the AUTODIN Automatic Switching Centers (map not reproducible; only its station labels survive in the source).

C. ASC Major Equipment

The major functions of the ASC are accomplished by utilizing several special purpose computers, related peripheral devices, and necessary communications equipment. A block diagram of the major components of an ASC operating system is provided in Figure 2, which will be a helpful reference in the following discussion of some of the more important pieces of equipment and their operation.

The message arrives at the ASC over incoming channel lines from the ASC tributaries or interconnecting trunks. The message transmission along these circuits is in a carrier form up to the MODEM and a low level D.C. signal thereafter. The MODEMs are also used to convert this analog signal, used for transmission in the voice frequency band, back to digital signals for ASC internal processing. The purpose of monitoring and measuring instruments is to indicate communication line continuity and signal quality. The cryptographic devices provide traffic security protection, and the switching facilities connect buffering equipment to the corresponding communication channels.

The Buffer System links the Technical Control facility with the Accumulation and Distribution Units (ADU), provides temporary storage for speed matching, and contains the controls necessary to interface the ADU with the communication channels.

The Accumulation and Distribution Unit is a special purpose computer operating with combined wired-in and stored programs. Its purpose is to provide the necessary controls and storage for data both being received from and transferred to communications channels. Each ASC has three ADU's, with two being on-line and one in stand-by. The ADU holds the messages destined for the CDP in one of three ADU zones, which are the terminus of the input transfer channels and starting point for outgoing transfer channels. These zones have been labeled as High, Medium, and Low Speed Zones. The ADU transfers messages from and to the CDP, translates received message characters into Fieldata code for use in the CDP, and stores the lineblocks in the ADU core memory until the complete message is forwarded to the CDP. For outgoing messages, the ADU translates the message from Fieldata to an appropriate code to match the recipient's equipment, and passes the message to the line termination buffer, where it passes out of the ASC.

The Communications Data Processor (CDP) is a large-scale digital computer designed especially to perform and control communications functions. The CDP controls the flow of data through the ASC and processes the data according to the AUTODIN program and commands from the system console. The CDP receives the message one lineblock at a time from the ADU. The CDP, upon receipt of the first lineblock, validates the message header and determines proper routing for the message. The lineblocks are then accumulated in memory one lineblock at a time, with an acknowledgment returned after each, until the end of the message is received. Acknowledgment of message receipt is then transmitted to the sender. The message is then linked in the appropriate address queue to await channel assignment and movement on a FIFO basis, subject to message precedence and line availability. There are two CDP's at each ASC, with one on-line and the other in stand-by or utilized for off-line work. The CDP is made up of a Basic Processor Unit (BPU), a High-Speed Memory (HSM), a Processor Operator's Console, and a series of Transfer Channels which interface various peripheral devices with the BPU. Two of the most outstanding features of the CDP are its simultaneity and interrupt features. The CDP has the ability to operate several of the peripheral devices asynchronously with internal processing. At the termination of each peripheral device operation, the current CDP program is automatically interrupted, and all register settings are stored. The interrupt feature enables return to this point in the program at a later time without loss of continuity. These features allow the system to initiate peripheral device operation and continue with other tasks, with a resulting reduction in message transit times and increase in computer throughput.

Of the many peripheral devices contained in Figure 2, the Intransit Storage Unit or Mass Memory Unit (MMU) and Overflow Tape (OVT) are the two most important to the message switching function. The MMU is the temporary storage area where messages received at the ASC reside until outgoing channels become available and complete retransmission out of the ASC is effected. The MMU consists of disc or drum storage units. The OVT is used to hold messages when "overload" conditions exist, i.e., when the capacity of the MMU storage has been exhausted.
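The receive side of the CDP just described - validate the header on the first lineblock, accumulate and acknowledge each lineblock, then acknowledge the whole message and queue it by destination - can be pictured as a small state machine. The following is an illustrative Python sketch only; the field names, header check, and queue structure are assumptions, not the AUTODIN program.

from collections import defaultdict, deque

class MessageAssembler:
    """Accumulates a message one lineblock at a time, acknowledging each,
    then links the complete message into its destination (address) queue."""
    def __init__(self):
        self.partial = {}                       # incoming channel -> list of lineblocks
        self.address_queues = defaultdict(deque)

    def receive_lineblock(self, channel, block):
        if channel not in self.partial:
            # First lineblock: validate header and determine routing.
            if "routing_indicator" not in block:
                return "REJECT"                 # invalid header
            self.partial[channel] = [block]
        else:
            self.partial[channel].append(block)

        if block.get("end_of_message"):
            blocks = self.partial.pop(channel)
            destination = blocks[0]["routing_indicator"]
            self.address_queues[destination].append(blocks)
            return "ACK-MESSAGE"                # acknowledgment of message receipt
        return "ACK-BLOCK"                      # per-lineblock acknowledgment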

Figure 2. ASC Major Components. (Block diagram of the circuit switching and message switching paths through the MODEMs, technical control, cryptographic devices, circuit switch unit, and buffer; the diagram itself is not reproducible.)

III. SIMULATION PROGRAM CONSTRUCTION

A. Simulation Model Coverage

The portion of the message switching operation covered by the simulation model is contained in Figure 3. Messages are transferred from the three ADU zones via the ADU/CDP transfer channel into the HSM of the CDP. Input processing of the message is performed, which includes interpretation of the destination from the Routing Indicators contained in the heading, acknowledgment of receipt of the message to the sending ASC or tributary, plus other administrative functions.

Upon completion of Input Processing, the message is transferred to Intransit Storage on the MMU via the CDP/MMU transfer channel. Upon availability of the proper channel for the outgoing message, the message is transferred to the HSM again for Output Processing via the MMU/CDP transfer channel. Upon completion of Output Processing, the message is transferred out of the CDP to the appropriate ADU zone via the CDP/ADU transfer channel.

B. Characteristics of the Simulation Model

The simulation model was written in the General Purpose Simulation System (GPSS) language and run on the IBM 360/67 located at the Naval Postgraduate School.

The characteristics of the CDP are treated in this simulation model from the functional point of view. This reduced the level of detail to within manageable limits, enabled the use of the GPSS language, and, in fact, made the simulation effort feasible at all. The individual instructions are not simulated, but times required to perform the various functions are calculated and appropriate delays utilized. If the simulation had been formulated to the level of detail of the individual instruction, the use of GPSS would not have been possible and the CDP simulation program execution would have run slower than the actual instruction execution time. Even with this simplification, one day of real time required over 30 minutes of computer time to execute.

In addition, as was pointed out in the background section, one of the important features of the CDP is its simultaneity of operation and multiprogramming ability. The result of this is that messages are not actually handled serially through the system, but rather, many operations occur simultaneously on different messages throughout the system. The approach taken in the simulation is again a departure from the way things actually happen in the system. Messages are generated and flow serially through the steps of the simulation, with statistics gathered on such items as queues, transit times, storage and facility utilizations, and message sizes. A factor derived from actual system through-put was computed to measure the degree of simultaneity which is present in the CDP. This factor is the ratio of through-put using multiprogramming to through-put using monoprogramming. The simulation times were modified by this factor so that the model does in fact approximately reflect simultaneity of operation. The simultaneity feature of the CDP was incorporated so that mean service times computed by the simulation model would be accurate representations of those actually experienced by the ASC.

In the simulation model, messages were generated at the actual mean arrival rate of messages at the McClellan ASC, calculated over a four-month period. The mean message size was ascertained in the same way.

The characteristics of the actual CDP functions and values assigned are summarized below. See Figure 4, which shows the model components and processing steps, in conjunction with the description below; a discrete-event sketch built from these same values follows the experiment summary in Section V.

1. Messages were generated at the rate of .35 messages per second under normal conditions. In the absence of specific information on the distribution of interarrival times, an exponential distribution was assumed.

2. The mean message length was 33 lineblocks. An exponential distribution was assumed. Messages vary in length, but a lineblock is standardized at 80 information carrying characters, plus 4 framing characters. Each character is 8 binary digits in length.

3. The transfer channel between ADU and CDP has a transfer rate of 1240 lineblocks per second.

4. Input processing consists of a mean of 20 thousand computer instructions, exponentially distributed; with from 1 to 77 cycles per instruction, uniformly distributed; and from 1.2 to 2 microseconds of execution time per cycle, uniformly distributed. Input processing is carried out in the HSM, which has a capacity of 6461 lineblocks.

5. The transfer between the CDP and MMU has a transfer rate of 4166 lineblocks per second.

Figure 3. Portion of the message switching operation covered by the simulation model (diagram not reproducible).

Figure 4. Simulation model components and processing steps (diagram not reproducible).

6. The MMU has a capacity of 112896 lineblocks, of which 75% is allocated to intransit storage. When 80% of the space allocated to intransit storage is filled, emergency procedures are initiated and low precedence traffic is diverted to the Overflow Tapes. Therefore, the actual capacity of the intransit storage under normal conditions is 64512 lineblocks. Three user chains were constructed in the MMU to correspond with the three ADU zones. Time delays in each chain and the amount of traffic which occupied each chain were determined by calculating the actual percentage of traffic which was handled by each tributary in that zone. The high speed zone has a capacity of 28 lineblocks and is the terminus of all 4800 and 2400 baud circuits and trunks. A total of 74.3% of the traffic occurs on these lines. The medium speed zone has a capacity of 18 lineblocks and is the terminus of all 1200 and 600 baud lines. A total of 18.6% of the traffic handled by the ASC occurs on these lines. The low speed zone has a capacity of 6 lineblocks and is the terminus of all 300, 150, 75, and 45 baud lines. A total of 7.1% of the traffic occurs on these circuits.

7. Output processing consists of a mean of 10 thousand computer instructions with the same characteristics as for input processing.

With regard to the simulation output, GPSS automatically provides an output which contains the number of transactions which occupied each block of the simulation, plus a number of performance measurement criteria for items in the simulation. Other items are available, but must be programmed separately. See Figure 5 for a sample of the GPSS output. The more important performance measurement criteria include: mean service times, mean input and output processing times, utilization percentages, mean wait times, and average storage contents.

IV. ANTICIPATED USES OF THIS AND LIKE SIMULATIONS

A. Aiding Problem Definition

The analysis necessary to construct the simulation model forces the manager and programmer to state clearly and explicitly his understanding of his system.

B. Isolation of Critical Areas

Systematic testing of the model provides valuable insights into the most sensitive and critical areas of system performance. By using the available data the model can be tested and examined for errors in data values, logic, and functions. Upon satisfactory completion of the model, actual system testing with data values ranging from optimistic to pessimistic can give indications of areas where system performance is weak and where changes are needed or additional analysis is required.

C. Ascertaining Configuration Possibilities

Since the AUTODIN is a relatively complex real-time computer system, there is a variety of design, equipment, and configuration possibilities available. Once a model is developed, it can be used to evaluate possible configuration changes. This is vastly superior to actually carrying out the modification of the system and then ascertaining the effect.

V. CONCLUSIONS

A. Experiments Conducted and Their Results

The various experiments conducted using the simulation model were combinations and variations of two basic experiments. First, the rate of inputting transactions was varied. This had the effect in this simulation of changing the system traffic loads into the ASC. Second, the ability of the tributary to receive traffic was varied by modifying the values in the three user chains within the MMU.

From these experiments the following conclusions were drawn:

1. The present CDP is more than adequate to handle even drastic increases in traffic volume.

2. Tributary caused backlogs are going to build up in the CDP, and the available storage on the MMU may well be exhausted.

3. The need for an upgrade of the ASC tributaries, on an as-required basis, is indicated.
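The two basic experiments above amount to turning two knobs in the model of Figure 4: the transaction generation rate and the behavior of the tributaries. A compressed discrete-event sketch of the message flow, using the parameter values quoted in Section III.B, is given below. It is written in Python purely as a stand-in for the GPSS model; queueing for the shared channels is omitted, the simultaneity factor is a placeholder value, and the whole thing illustrates the structure of the experiment rather than reproducing the thesis model.

import random

ARRIVAL_RATE   = 0.35     # messages per second (exponential interarrivals)
MEAN_LENGTH    = 33.0     # lineblocks per message (exponential)
ADU_CDP_RATE   = 1240.0   # lineblocks per second
CDP_MMU_RATE   = 4166.0   # lineblocks per second
SIMULTANEITY   = 0.5      # assumed multiprogramming factor applied to service times

def processing_time(mean_instructions):
    """Input/output processing: instruction count, cycles per instruction,
    and cycle time drawn as described in Section III.B."""
    instructions = random.expovariate(1.0 / mean_instructions)
    cycles       = random.uniform(1, 77)
    cycle_time   = random.uniform(1.2e-6, 2.0e-6)
    return instructions * cycles * cycle_time

def simulate(duration_s, arrival_rate=ARRIVAL_RATE):
    clock, transit_times = 0.0, []
    while clock < duration_s:
        clock += random.expovariate(arrival_rate)          # next message arrives
        length = random.expovariate(1.0 / MEAN_LENGTH)     # lineblocks
        service = (length / ADU_CDP_RATE                   # ADU -> CDP transfer
                   + processing_time(20_000)               # input processing
                   + 2 * length / CDP_MMU_RATE             # to and from the MMU
                   + processing_time(10_000)               # output processing
                   + length / ADU_CDP_RATE)                # CDP -> ADU transfer
        transit_times.append(service * SIMULTANEITY)
    return sum(transit_times) / len(transit_times)         # mean transit time

print(simulate(3600))     # experiment 1: raise arrival_rate to load the ASC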

Figure 5. Sample of the GPSS computer output (listing not reproducible).


In the final analysis, the simulation program developed and discussed in this paper permits a limited analysis of the ASC and its CDP. As follow-on efforts provide tools for analyzing additional portions of the AUTODIN ASC, more comprehensive analyses can be made. Hopefully, one day all segments will be completed and can be put together so that their interaction can give some indication of overall system performance.

BIBLIOGRAPHY

AUTODIN System and Equipment Configuration, Technical Training Manual 4ALT 29530-1-1-1, Department of Communications Training, Sheppard AFB, Texas, February 1971.

Boehm, B. W., Computer Systems Analysis Methodology: Studies in Measuring, Evaluating, and Simulating Computer Systems, Rand Corporation Report R-520-NASA, September 1970.

Computer Sciences Corporation Report R41368I-2-1, DIN Simulator Program Reports, May 1972.

DCA AUTODIN CONUS-System Instruction Manual, RCA/DCS Division, Defense Electronics Products, Camden, N.J., 1 August 1969.

DCS AUTODIN Switching Center and Tributary Operations, Defense Communications Agency Circular 310-D70-30, 10 June 1970.

DCS Traffic Engineering Practices, Defense Communications Agency Circular 300-70-1, May 1971.

Martin, James, Design of Real-Time Computer Systems, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1967.

Martin, James, Programming Real-Time Computer Systems, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1965.

Martin, James, Telecommunications Network Organization, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1970.

Nabb, Alan, Captain, USAF, Norton ASC Communications Operating Performance Summary, Norton AFB, California, May 1972.

Norman, Eugene, and Lloyd Rice, "Research Report: AUTODIN and its Naval Communications Interface," Management Quarterly, May 1972.

Reddig, Gary A., Captain, USAF, Communications Operating Performance Summary, McClellan AFB, California, 13 October 1972.

Reitman, Julian, Computer Simulation Applications, John Wiley and Sons, Inc., New York, 1971.

Rubin, Murry, and Haller, C. E., Communications Switching Systems, Reinhold Publishing Corp., New York, 1966.

Sasieni, Maurice, Yaspan, Arthur, and Friedman, Lawrence, Operations Research: Methods and Problems, John Wiley and Sons, Inc., New York, 1959.

Schriber, Thomas J., General Purpose Simulation System/360: Introductory Concepts and Case Studies, The University of Michigan, Ann Arbor, Michigan, September 1968.

Schulke, H. A., BGEN, USA, "The DCS Today and How It Operates," Signal, July 1971.

Seaman, P. H., "On Teleprocessing System Design: The Role of Digital Simulation," IBM Systems Journal, Vol. 5, No. 3, 1966.

Study Guide, AUTODIN Switching Center Operations: General Description of the AUTODIN Program, Sheppard AFB, Texas, 15 July 1970.

Wagner, Harvey, Principles of Management Science with Applications to Executive Decisions, Prentice-Hall, Inc., Englewood Cliffs, N.J., 1970.

Personal interview with R. A. Ondracek, Western Union Maintenance Analyst, ASC McClellan AFB, California, 3 November 1972.

Personal interview with LCDR Hansen, OINC, Naval Message Center, Naval Postgraduate School, Monterey, California, 18 January 1973.

Personal interview with Alan Nabb, Captain, USAF, OINC, ASC Norton AFB, California, 14 July 1972.

ATTENDEES

EIGHTH MEETING OF

COMPUTER PERFORMANCE EVALUATION USERS GROUP

Marshall D. Abrams, National Bureau of Standards, Technology B216, Washington, D.C. 20234
PFC Edward B. Allen, US Army Computer Systems Command, CSCS-AT, Stop H-14, Ft. Belvoir, Va. 22060
Russell M. Anderson, 1900 Lyttonsville 1002, Silver Spring, Md. 20910
Philip Balcom, US Army Computer Systems Command, Stop H-6, Ft. Belvoir, Va. 22060
Dr. N. Addison Ball, National Security Agency, Ft. Meade, Md. 20755
Rudie C. Bartel, HQ USAF (ACDC), Washington, D.C. 20715
Merton J. Batchelder, US Army Computer Systems Command, 7719 Bridle Path Lane, McLean, Va. 22101
T. K. Berge, FEDSIM/CC, Washington, D.C. 20022
Bob Betz, Boeing Computer Services, 12601 SE 60 St., Bellevue, Wash. 98006
Raymond S. Blanford, Defense Supply Agency, Cameron Station, Rm. 4A-612, Alexandria, Va. 22314
John A. Blue, US Navy ADPESO, Crystal Mall, Washington, D.C. 20376
John J. Bongiovanni, AFDSDC/SYO, Gunter AFB, Al. 36114
Jim N. Bowie, AMC ALMSA, 210 North 12th Street, St. Louis, Mo. 63188
N. J. Brown, NAVCOSSACT, Code 10.1, Washington Navy Yard, Washington, D.C.
Richard A. Castle, US Army Computer Systems Command, Ft. Belvoir, Va. 22060
John F. Cavanaugh, NAVORD SYSCOM CENO, US NAVORDSTA, Indian Head, Md. 20640
Dr. Yen W. Chao, FEDSIM, US Air Force, 9724 Eldwick Way, Potomac, Md. 20854
Dennis R. Chastain, USGAO, 441 G St. NW, Rm. 7131, Washington, D.C. 20548
Leo J. Cohen, President, Performance Development Corp., 32 Scotch Rd., Trenton, N.J. 08540
Dr. Dennis M. Conti, Naval Weapons Lab, Code KPS, Dahlgren, Va. 22448
L. J. Corio, Western Electric Co., 222 Broadway, New York, N.Y. 10038
Joe M. Dalton, Control Data Corp., 5272 River Road, Washington, D.C. 20016

Jim Dean, Social Security Administration, 6401 Security Blvd., Baltimore, Md. 21235
Donald R. Deese, FEDSIM/AY, HQ USAF, Washington, D.C. 20330
Jules B. Dupeza, DOT, 11900 Tallwood Ct., Potomac, Md. 20854
Gordon Edgecomb, Chemical Abstracts Service, Ohio State University, Columbus, Ohio 43210
Emerson Eitner, Chemical Abstracts Service, Ohio State University, Columbus, Ohio 43210
Richard B. Ensign, FEDSIM, Washington, D.C. 20330
Joseph Escavage, Dept of Defense, 9800 Savage Rd., C403, Ft. Meade, Md. 20755
Robert H. Follett, IBM, 1401 Fernwood Road, Bethesda, Md. 20034
Dr. K. E. Forry, USACCOM-COMPT-SA, Ft. Huachuca, Ariz. 85635
E. A. Freiburger, Senior Marketing Engineers, 3718 Tollgate Terrace, Falls Church, Va. 22041
Robert Gechter, Dept of Interior, CCD, 19th & C Sts NW, Washington, D.C. 20242
CPT Jean-Paul Gosselin, DND Computer Centre, Tunney's Pasture, Ottawa, Ontario, Canada K1A 0K2
James M. Graves, USAMSSA, Room BD1040, The Pentagon, Washington, D.C. 20330
Donald A. Greene, DSA-DESC-AST, 1507 Wilmington Pike, Dayton, Ohio 45444
Lowell R. Greenfield, NASA FRC, Box 273, Edwards, California 93523
Robert A. Grossman, NAVCOSSACT, Bldg. 196, Washington Navy Yard, Washington, D.C. 20374
Vincent C. Guidace, Dept of AF, FEDSIM, Washington, D.C. 20330
Col Michael J. Bamberger, DSA, Cameron Station, 11207 Leatherwood Dr., Reston, Va. 22070
I. Trotter Hardy, General Services Administration, CDSG, Rm. 6407, ROB3, 7th & D Sts SW, Washington, D.C. 20407
Norman Hardy, IRS, PR:SD, Room 3137, 1111 Constitution Ave. NW, Washington, D.C. 20007
MAJ W. L. Hatt, CAF, USAF/ESD/MCST-2, L. G. Hanscom Field, Bedford, Mass. 01730
CPT Harry A. Haught, Navy Annex, Arlington, Va.
Herman H. Heberling, Fleet Material Support Office, Mechanicsburg, Pa. 17055
MAJ Andrew Hesser, USMC HQTRS (ISMDS), Washington, D.C. 20380

William J. Hiehle, Defense Electronics Supply Center, 1102 E. Linden Ave., Miamisburg, Ohio 45342
Dr. Harold J. Highland, Chairman, SIGSIM, 562 Croydon Road, Elmont, N.Y. 11003
Lt. Donald W. Hodge, Jr., HQ SAC (ADISE), Offutt AFB, NE 68113
John N. Hoffman, Boeing Computer Services, 955 L'Enfant Plaza North SW, Washington, D.C. 20024
John V. Holberton, GSA, 10130 Chapel Road, Rockville, Md. 20854
Marjorie E. Horan, MITRE, 1820 Dolley Madison Blvd., McLean, Va. 22101
Larry G. Hull, NASA-GSFC, Code 531.2, Goddard Space Flight Center, Greenbelt, Md. 20771
George Humphrey, Naval Underwater Systems, Newport Naval Base, Newport, R.I. 02871
Frank Hunter, US Army Computer Systems Command, 7704 Random Run Lane, Apt. 104, Falls Church, Va. 22042
Jau-Fu Huoang, NAVCOSSACT, Dept of Navy, Bldg. 196, Washington Navy Yard, Washington, D.C.
LT Robert L. James, USAF-AFDSC, AFDSC/GMJ, The Pentagon, Washington, D.C. 20330
W. Richard Johnsen, US Army Computer Systems Command, CSCS-ATA-S, Stop H-14, Ft. Belvoir, Va. 22060
Lt. Raymond S. Jozwik, USAMSSA, 2500 N. Van Dorn St., Alexandria, Va. 22302
B. A. Ketchledge, Bell Laboratories, Piscataway, N.J. 08704
Philip J. Kiviat, FEDSIM, Dept of Air Force, Washington, D.C. 20330
John T. Klaras, Dept of Air Force, Rm. 1D118, Pentagon, Washington, D.C. 20330
Ronald W. Korngiebel, Connecticut General Life, Bloomfield, Connecticut 06002
Mitchel Krasny, NTIS, 5285 Port Royal Rd., Springfield, Va. 22151
David W. Lambert, Mitre Corporation, P.O. Box 208, Bedford, Mass. 01730
John H. Langsdorf, HQ USAF (ACD), Washington, D.C. 20330
Don Leavitt, Computer World, 797 Washington St., Newton, Mass. 02160
William J. Letendre, HQ ESD/MCDT, Stop 36, Hanscom AFB, Ma. 01730
William P. Levin, Defense Supply Agency, Cameron Station, Rm. 4B587, Alexandria, Va. 22314
Jack R. Lewellyn, HQ NASA, Code TN, Washington, D.C. 20546

Allen Li, Federal Hwy Adm, 400 7th St. S.W., Washington, D.C. 20590
MAJ J. B. Lloyd, AFDSC, The Pentagon, Washington, D.C. 20330
Alex Machina, CENG, P.O. Box 258, Indianhead, Md. 20640
Neil Marshall, Pinkerton Computer Consultants, 8810 Walutes Circle, Alexandria, Va. 22309
Sally Mathay, Central Nomix Ofc, NAVORDSTA, Indianhead, Md. 20640
CPT Norm Hejstrik, Air Force Comm Service/L-93, Tinker AFB, Okla. 73145
Ann Melanson, USAF, Hanscom Field, Bedford, Mass. 01730
CPT Kenneth Melendez, USAF, ESD, L. G. Hanscom Field, ESD/MCI, Bedford, Mass. 01730
Anne Louise Moran, Dept of Defense, C403, Ft. Meade, Md.
Michael F. Morris, FEDSIM, 1919 Freedom Lane, Falls Church, Va. 22043
Deanna J. Nelson, DOD Computer Institute, Rm. 4A1105, Washington Navy Yard, Washington, D.C. 20374
George R. Olson, Control Data Corp., 8100 34th Ave. SO, Box 0 HQNIOU, Minneapolis, Minn. 55440
W. J. O'Malley, Internal Revenue Service, 1111 Constitution Ave. NW, Washington, D.C.
J. P. O'Rourke, NAVCOSSACT, Washington Navy Yard, Washington, D.C. 20374
Chris Owers, Social Security Administration, 6401 Security Blvd., Baltimore, Md. 21235
Charles W. Pabst, NAVCOSSACT, Washington Navy Yard, Washington, D.C. 20374
Frank J. Pajerski, NASA-Goddard, Code 531, NASA-Goddard SFC, Greenbelt, Md. 20771
Frank V. Parker, US Navy (NAVCOSSACT), 819 Duke St., Rockville, Md. 20850
Richard D. Pease, IBM Corp., 10401 Fernwood Rd., Bethesda, Md. 20034
Emery G. Peterson, USATRADOC-DPFO, Fort Leavenworth, Kansas 66027
MAJ Ernest F. Philipp, USAF/SAC, 7910 Greene Cir., Omaha, Ne. 68147
Charles Pisano, Fleet Material Support Office, Mechanicsburg, Pa. 17055
Terry W. Potter, Bell Telephone Laboratories, Box 2020, New Brunswick, N.J. 08903
Marvin D. Raines, US Army Computer Systems Command, 513 Shelfar Pl., Oxon Hill, Md. 20022

Mr. C. D. Ritter, US Geological Survey, 4615 Gramlee Circle, Fairfax, Va. 22030
G. F. Rizzardi, Dept of Air Force, AFDSC/SF, Pentagon, Washington, D.C. 20330
E. E. Seitz, US Army R&D, 3205 Plantation Pkwy., Fairfax, Va. 22030
William Selfridge, US Army Computer Systems Command, Stop H-6, Ft. Belvoir, Va. 22060
A. C. Shetler, The Rand Corporation, 1700 Main Street, Santa Monica, Calif. 90406
Ronald L. Shores, Atomic Energy Commission, Washington, D.C. 20545
Eileen Silo, Dept of Defense, 9800 Savage Rd., Ft. Meade, Md. 20755
George J. Skurna, Defense Elect. Supply Ctr., 1507 Wilmington Pike, Dayton, Ohio 45444
Charles L. Smith, The Mitre Corporation, Westgate Research Park, McLean, Va. 22101
George M. Sokol, Deputy for Engineering, US Army Computer Systems Command, Ft. Belvoir, Va. 22060
David V. Sommer, NSRDC, Annapolis, Md. 21401
John W. H. Spencer, Bureau of the Census, Statistical Research Division, Washington, D.C. 20233
James Sprung, FEDSIM, 5240 Port Royal Rd., Springfield, Va. 22151
Don W. Stone, TESDATA Systems Corp., 235 Wayne Street, Waltham, Mass. 02154
Hal Stout, COMRESS, 2 Research Ct., Rockville, Md. 20850
Herbert D. Strong, Jet Propulsion Laboratory, 4800 Oak Grove Dr., Pasadena, Calif. 91011
COL H. J. Suskin, Defense Electronics Supply Center, Wilmington Pike, Dayton, Ohio
Marvin Sendrow, Fed. Home Loan Bank Board, 101 Indiana Ave. N.W., Washington, D.C. 20552
Kay Shirey, US Army MILPERCEN, DAPC-PSB-B, Hoffman II Bldg., 200 Stovall St., Alexandria, Va. 22332
Nora M. Taylor, Naval Ship R&D Center, Code 189.1, Bethesda, Md. 20034
Harold J. Timmons, DSA, Data Systems Automation Office, c/o Defense Construction Supply Ctr., Columbus, Ohio 43215
Terry L. Traylor, NAVCOSSACT, Bldg. 196, Code 101, Washington Navy Yard, Washington, D.C.
John B. Trippe, OA-914, The Pentagon, Washington, D.C. 20330
Ray G. Tudor, SAC HQ (USAF), 814 Matthies Dr., Papillion, Ne. 68046

Ralph R. Turner, US Army TRADOC - DPFO, Bldg. 136, Ft. Leavenworth, Kansas 66027
Daniel M. Venese, The Mitre Corporation, Westgate Research Park, McLean, Va. 22101
Daniel A. Verbois, USAMC - ALMSA, P.O. Box 1578, AMXAL-TL, St. Louis, Mo. 63188
H. C. Walder, Western Electric Co., Inc., 222 Broadway, New York, N.Y. 10038
F. C. Warther, TESDATA, 7900 Westpark Dr., McLean, Va. 22101
Michael E. Wentworth, Jr., Social Security Administration, 6401 Security Blvd., Baltimore, Md. 21235
David P. Wheelwright, GSA/ADTS/CDSD, 7th & D Sts. S.W., Washington, D.C. 20407
Brian R. Young, 1502 Kennedy Plaza, Bellevue, Ne. 68005

NBS-114A (REV. 7-73)

U.S. DEPT. OF COMM. BIBLIOGRAPHIC DATA SHEET

1. Publication or Report No.: NBS Special Publ. 401
4. Title and Subtitle: Computer Performance Evaluation
5. Publication Date: September 1974
7. Author(s): Various; edited by Dr. Harold Joseph Highland, State University Agricultural and Technical College, N.Y.
9. Performing Organization Name and Address: National Bureau of Standards, Department of Commerce, Washington, D.C. 20234
12. Sponsoring Organization Name and Address: FIPSCAC, National Bureau of Standards, Room B264, Building 225, Washington, D.C. 20234
13. Type of Report & Period Covered: Final
15. Supplementary Notes: Library of Congress Catalog Number 74-13113
16. Abstract: The Eighth Meeting of the Computer Performance Evaluation Users Group [CPEUG], sponsored by the United States Army Computer Systems Command and the National Bureau of Standards, was held December 4-7, 1973 at NBS, Gaithersburg. The program chairman for this meeting was Merton J. Batchelder of the US Army Computer Systems Command at Fort Belvoir, Va. 22060 [CSCS-ATA Stop H-14]. About 150 attendees at this meeting heard the 17 papers presented on computer performance, evaluation and measurement. Among the papers presented were those dealing with hardware and software monitors, workload definition and benchmarking, a report of FIPS Task Force 13, computer scheduling and evaluation in time-sharing as well as MVT environments, human factors in performance analysis, dollar effectiveness in evaluation, simulation techniques in hardware allocation, and a FEDSIM status report, as well as other related topics. These proceedings represent a major source in the limited literature on computer performance, evaluation and measurement.
17. Key Words: Computer evaluation; computer performance; computer scheduling; hardware monitors; simulation of computer systems; software monitors; systems design and evaluation; time-sharing systems evaluation
18. Availability: Unlimited. Order from Superintendent of Documents, U.S. Government Printing Office, Washington, D.C. 20402 (SD Cat. No. C13.10:401)
19. Security Class (This Report): UNCLASSIFIED
20. Security Class (This Page): UNCLASSIFIED
21. No. of Pages: 155 pages
22. Price: $1.80
